Antigen prediction method, apparatus, device, and storage medium

By fusing the gene, sequence, and three-dimensional structural features of immune cell receptors for antigen prediction, the problem of predicting the specific binding of immune cell receptors to antigens has been solved, improving prediction accuracy and research efficiency.

CN115171787B · Active · Publication Date: 2026-06-16 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2022-07-08
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing technologies struggle to effectively predict the specific binding of immune cell receptors to antigens, impacting the efficiency of immunotherapy and vaccine design.

Method used

By inputting the genetic information, sequence information, and three-dimensional structural features of immune cell receptors into the antigen prediction model, feature extraction and fusion are performed. The probability of candidate antigens corresponding to immune cell receptors is output using fully connected and normalized methods to determine the target antigen.

Benefits of technology

Improves the accuracy of antigen prediction, reduces the number of experiments required, and increases the efficiency of scientific research and vaccine design.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses an antigen prediction method, apparatus, device, and storage medium, and belongs to the technical field of computers. In the technical solution provided by the embodiments of the application, the antigen prediction model extracts features from the gene information and sequence information of the immune cell receptor to obtain the gene features and sequence features of the immune cell receptor. In the process of obtaining the receptor features of the immune cell receptor, the gene features, sequence features, and three-dimensional structural features are fused. Introducing the three-dimensional structural features enriches the content of the receptor features and improves their expressive power, so that antigen prediction based on the receptor features yields a more accurate target antigen.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to an antigen prediction method, apparatus, device, and storage medium.

Background Technology

[0002] The human immune system consists of innate immunity and adaptive immunity. The adaptive immune system is mediated by various immune cells, which can respond specifically to particular pathogens. Immune cell receptors are the regions through which immune cells recognize antigens. Successful antigen recognition activates the immune system to eliminate pathogens, which plays a crucial role in maintaining human health. Immune cell receptors exhibit antigen specificity, meaning that a particular immune cell receptor can bind only to a specific antigen. Studying the antigen specificity of immune cell receptors is essential for understanding the immune system and can further promote the design and development of immunotherapies and vaccines. Therefore, there is an urgent need for a method of predicting the antigens that can specifically bind to immune cell receptors.

Summary of the Invention

[0003] This application provides an antigen prediction method, apparatus, device, and storage medium, which can predict antigens that specifically bind to immune cell receptors. The technical solution is as follows:

[0004] In one aspect, an antigen prediction method is provided, the method comprising:

[0005] The gene information, sequence information, and three-dimensional structural features of an immune cell receptor are input into the antigen prediction model;

[0006] The antigen prediction model is used to extract features from the gene and sequence information of the immune cell receptor to obtain the gene and sequence features of the immune cell receptor.

[0007] The antigen prediction model is used to fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptor to obtain the receptor features of the immune cell receptor.

[0008] The antigen prediction model is used to apply fully connected processing and normalization to the receptor features of the immune cell receptor, and outputs the probability that the immune cell receptor corresponds to each of multiple candidate antigens.

[0009] Based on the probability that the immune cell receptor corresponds to multiple candidate antigens, a target antigen is determined from the multiple candidate antigens, wherein the target antigen is an antigen that can specifically bind to the immune cell receptor.
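
The last two steps of the method above (fully connected processing, normalization, and selection of the target antigen) can be illustrated with a minimal sketch. It assumes that the normalization is a softmax and that the target antigen is the candidate with the highest probability; the application states neither choice explicitly. The function name, the candidate labels, and the random weights are hypothetical placeholders.

```python
import numpy as np

def predict_antigen(receptor_feature, weight, bias, candidate_antigens):
    """Fully connected layer followed by softmax normalization (assumed)."""
    logits = weight @ receptor_feature + bias            # one score per candidate
    shifted = logits - logits.max()                      # for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()      # probabilities sum to 1
    target = candidate_antigens[int(np.argmax(probs))]   # most probable candidate
    return target, probs

rng = np.random.default_rng(0)
candidates = ["antigen_A", "antigen_B", "antigen_C"]     # hypothetical labels
feature = rng.normal(size=8)                             # stand-in receptor feature
W = rng.normal(size=(len(candidates), 8))                # untrained weights
b = np.zeros(len(candidates))

target, probs = predict_antigen(feature, W, b, candidates)
print(target, probs.round(3))
```

With untrained weights the selected antigen is arbitrary; the sketch only shows how a receptor feature vector becomes a probability per candidate antigen.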

[0010] In another aspect, a training method for an antigen prediction model is provided, the method comprising:

[0011] Input the gene information, sequence information, and three-dimensional structural features of the immune cell receptors in the sample into the antigen prediction model;

[0012] The antigen prediction model is used to extract features from the gene and sequence information of the immune cell receptors in the sample, thereby obtaining the gene and sequence features of the immune cell receptors in the sample.

[0013] The antigen prediction model is used to fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptors in the sample to obtain the receptor features of the immune cell receptors in the sample.

[0014] The antigen prediction model is used to apply fully connected processing and normalization to the receptor features of the immune cell receptors in the sample, and outputs the probability that the immune cell receptors in the sample correspond to each of multiple sample candidate antigens.

[0015] Based on the probability that the sample immune cell receptor corresponds to multiple sample candidate antigens, the predicted antigen corresponding to the sample immune cell receptor is determined from the multiple sample candidate antigens;

[0016] The antigen prediction model is trained based on the difference between the predicted antigen and the labeled antigen corresponding to the immune cell receptor of the sample.
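
As an illustration of training on the difference between the predicted and labeled antigens, the following sketch performs gradient descent on a cross-entropy loss. The loss function, the learning rate, and the restriction to the output layer are assumptions made for brevity; the application does not name a specific loss or optimizer.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(W, b, receptor_feature, label_index, lr=0.1):
    """One gradient-descent step on the cross-entropy loss of the output layer."""
    probs = softmax(W @ receptor_feature + b)
    loss = -np.log(probs[label_index])
    grad_logits = probs.copy()
    grad_logits[label_index] -= 1.0          # d(loss)/d(logits) = probs - one_hot
    W_new = W - lr * np.outer(grad_logits, receptor_feature)
    b_new = b - lr * grad_logits
    return W_new, b_new, loss

rng = np.random.default_rng(0)
x = rng.normal(size=8)                       # stand-in sample receptor feature
W, b = rng.normal(size=(3, 8)), np.zeros(3)  # three sample candidate antigens
losses = []
for _ in range(20):
    W, b, loss = train_step(W, b, x, label_index=2)   # index 2 = labeled antigen
    losses.append(loss)
print(losses[0], losses[-1])
```

In a full model the same gradient would be propagated back through the fusion module and the encoders as well.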

[0017] In another aspect, an antigen prediction device is provided, the device comprising:

[0018] The input unit is used to input the gene information, sequence information, and three-dimensional structural features of an immune cell receptor into the antigen prediction model;

[0019] The feature extraction unit is used to extract features from the gene information and sequence information of the immune cell receptor through the antigen prediction model, so as to obtain the gene features and sequence features of the immune cell receptor.

[0020] The feature fusion unit is used to fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptor through the antigen prediction model to obtain the receptor features of the immune cell receptor.

[0021] The antigen prediction unit is used to apply fully connected processing and normalization to the receptor features of the immune cell receptor through the antigen prediction model, and to output the probability that the immune cell receptor corresponds to each of multiple candidate antigens; and, based on these probabilities, to determine a target antigen from the multiple candidate antigens, wherein the target antigen is an antigen that can specifically bind to the immune cell receptor.

[0022] In one possible implementation, the feature extraction unit is used to encode the VDJ information of the immune cell receptor using the gene encoder of the antigen prediction model to obtain the gene features of the immune cell receptor, wherein V denotes the variable gene segment, D the diversity gene segment, and J the joining gene segment; and to encode the amino acid sequence of the immune cell receptor using the sequence encoder of the antigen prediction model to obtain the sequence features of the immune cell receptor.

[0023] In one possible implementation, the feature extraction unit is configured to perform any of the following:

[0024] When the immune cell receptor is a B cell receptor, the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor are encoded to obtain the gene features of the immune cell receptor.

[0025] When the immune cell receptor is a T cell receptor, the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor are encoded to obtain the gene features of the immune cell receptor.

[0026] In one possible implementation, when the immune cell receptor is a B cell receptor, the feature extraction unit is used to apply fully connected processing to the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor to obtain the gene features of the immune cell receptor, wherein the gene features include the light chain gene features and the heavy chain gene features. When the immune cell receptor is a T cell receptor, the feature extraction unit is used to apply fully connected processing to the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor to obtain the gene features of the immune cell receptor, wherein the gene features include the α chain gene features and the β chain gene features.
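
A minimal sketch of a gene encoder for the T cell receptor case follows. It assumes that each gene segment is one-hot encoded over a small vocabulary before the fully connected layer and that a tanh activation is applied; both are illustrative assumptions. The vocabularies list only a few gene names as examples, and the weights are random placeholders.

```python
import numpy as np

def one_hot(name, vocabulary):
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(name)] = 1.0
    return vec

def encode_chain(genes, vocabularies, weight):
    """Concatenate one-hot gene segments, then apply a fully connected layer."""
    x = np.concatenate([one_hot(g, v) for g, v in zip(genes, vocabularies)])
    return np.tanh(weight @ x)               # tanh is an assumed activation

# Illustrative vocabularies; a real model would cover every known segment.
V_ALPHA, J_ALPHA = ["TRAV12-2", "TRAV21"], ["TRAJ42", "TRAJ33"]
V_BETA, D_BETA, J_BETA = ["TRBV7-9", "TRBV20-1"], ["TRBD1", "TRBD2"], ["TRBJ2-1", "TRBJ2-7"]

rng = np.random.default_rng(0)
W_alpha = rng.normal(size=(4, 4))            # input width: 2 V + 2 J entries
W_beta = rng.normal(size=(4, 6))             # input width: 2 V + 2 D + 2 J entries

alpha_feat = encode_chain(["TRAV12-2", "TRAJ33"], [V_ALPHA, J_ALPHA], W_alpha)
beta_feat = encode_chain(["TRBV7-9", "TRBD1", "TRBJ2-7"], [V_BETA, D_BETA, J_BETA], W_beta)
gene_features = np.concatenate([alpha_feat, beta_feat])   # α and β chain gene features
print(gene_features.shape)
```

The B cell receptor case has the same shape, with the light chain taking the place of the α chain (VJ) and the heavy chain that of the β chain (VDJ).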

[0027] In one possible implementation, the feature extraction unit is configured to perform any of the following:

[0028] In the case where the immune cell receptor is a B cell receptor, the sequence encoder of the antigen prediction model encodes the amino acid sequence of the light chain and the amino acid sequence of the heavy chain of the immune cell receptor based on the attention mechanism to obtain the sequence features of the immune cell receptor. The sequence features of the immune cell receptor include the light chain sequence features and the heavy chain sequence features of the immune cell receptor.

[0029] In the case where the immune cell receptor is a T cell receptor, the sequence encoder of the antigen prediction model encodes the amino acid sequence of the α chain and the amino acid sequence of the β chain of the immune cell receptor based on the attention mechanism to obtain the sequence features of the immune cell receptor. The sequence features of the immune cell receptor include the α chain sequence features and the β chain sequence features of the immune cell receptor.
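
The attention-based sequence encoding described above can be sketched with a single self-attention layer over amino acid embeddings. The single head, the mean pooling, the embedding size, and the two example sequences are assumptions chosen for illustration; the application only states that the encoder is based on the attention mechanism.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"         # the 20 standard residues

def self_attention_encode(sequence, embed, Wq, Wk, Wv):
    """Single-head self-attention, mean-pooled into one chain feature vector."""
    x = embed[[AMINO_ACIDS.index(a) for a in sequence]]     # (L, d) embeddings
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])                  # scaled dot product
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # attention weights
    return (weights @ v).mean(axis=0), weights

rng = np.random.default_rng(0)
d = 16
embed = rng.normal(size=(len(AMINO_ACIDS), d))              # untrained embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Illustrative amino acid strings for the two chains.
alpha_seq_feat, alpha_attn = self_attention_encode("CAVRDSNYQLIW", embed, Wq, Wk, Wv)
beta_seq_feat, _ = self_attention_encode("CASSIRSSYEQYF", embed, Wq, Wk, Wv)
sequence_features = np.concatenate([alpha_seq_feat, beta_seq_feat])
print(sequence_features.shape)
```

Each row of `alpha_attn` sums to 1 and indicates how strongly one residue attends to every other residue in the chain.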

[0030] In one possible implementation, the feature fusion unit is used to concatenate the gene features and sequence features of the immune cell receptor through the feature fusion module of the antigen prediction model to obtain the gene-sequence fusion features of the immune cell receptor; and to perform weighted fusion of the gene-sequence fusion features and the three-dimensional structural features of the immune cell receptor based on a gated attention mechanism to obtain the receptor features of the immune cell receptor.
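
One plausible reading of this two-stage fusion is sketched below: the gene and sequence features are concatenated, projected to the dimension of the structural features, and mixed with them through a sigmoid gate. The exact form of the gate is an assumption, since the application only refers to weighted fusion based on a gated attention mechanism; all names and dimensions are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(gene_feat, seq_feat, struct_feat, W_proj, W_gate):
    """Concatenate gene and sequence features, then gate them against structure."""
    gene_seq = np.concatenate([gene_feat, seq_feat])     # gene-sequence fusion feature
    h = W_proj @ gene_seq                                # project to structure dimension
    gate = sigmoid(W_gate @ np.concatenate([h, struct_feat]))   # per-dimension weight
    fused = gate * h + (1.0 - gate) * struct_feat        # weighted fusion
    return fused, gate

rng = np.random.default_rng(0)
gene_feat = rng.normal(size=8)
seq_feat = rng.normal(size=32)
struct_feat = rng.normal(size=16)
W_proj = rng.normal(size=(16, 40)) * 0.1                 # untrained weights
W_gate = rng.normal(size=(16, 32)) * 0.1

receptor_feat, gate = gated_fusion(gene_feat, seq_feat, struct_feat, W_proj, W_gate)
print(receptor_feat.shape, gate.min(), gate.max())
```

A gate value near 1 lets the gene-sequence information dominate a dimension, while a value near 0 lets the structural feature dominate.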

[0031] In one possible implementation, the device further includes:

[0032] A three-dimensional structural feature acquisition unit is used to acquire the target amino acid sequence of the immune cell receptor, the target amino acid sequence including the CDR3 region of the immune cell receptor; perform multiple sequence alignment on the target amino acid sequence of the immune cell receptor to obtain at least one reference amino acid sequence, the similarity between the reference amino acid sequence and the target amino acid sequence meeting the similarity condition; acquire the homologous template corresponding to the target amino acid sequence, the homologous template including the structural information of the homologous sequence of the target amino acid sequence; and perform multiple iterations based on the target amino acid sequence, the at least one reference amino acid sequence and the homologous template to obtain the three-dimensional structural features of the immune cell receptor.
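
The selection of reference sequences that meet a similarity condition can be illustrated as follows. This is not a multiple sequence alignment: it uses the standard-library `difflib.SequenceMatcher` ratio as a simple stand-in for an alignment-based similarity score, and the threshold of 0.7 and the small sequence database are hypothetical.

```python
from difflib import SequenceMatcher

def select_reference_sequences(target, database, min_similarity=0.7):
    """Keep database sequences whose similarity to the target meets the threshold."""
    references = []
    for candidate in database:
        if candidate == target:
            continue                                     # skip the query itself
        similarity = SequenceMatcher(None, target, candidate).ratio()
        if similarity >= min_similarity:
            references.append((candidate, similarity))
    return sorted(references, key=lambda item: item[1], reverse=True)

target = "CASSIRSSYEQYF"                                 # illustrative CDR3 string
database = ["CASSIRSAYEQYF", "CASSLGQAYEQYF", "CAWSVGGDTQYF", "CASSIRSSYEQYF"]
references = select_reference_sequences(target, database)
for sequence, similarity in references:
    print(sequence, round(similarity, 3))
```

In practice the reference sequences and homologous templates would come from dedicated alignment and template-search tools, and the iterative structure prediction step is beyond the scope of this sketch.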

[0033] In one possible implementation, the device further includes:

[0034] A three-dimensional structural feature acquisition unit is used to acquire the three-dimensional structural information of the immune cell receptor, wherein the three-dimensional structural information includes the three-dimensional coordinates of multiple amino acids in the immune cell receptor;

[0035] The three-dimensional structural feature acquisition unit is used to perform any of the following:

[0036] The three-dimensional structural information of the immune cell receptor is subjected to graph convolution to obtain the three-dimensional structural features of the immune cell receptor.

[0037] The three-dimensional structural information of the immune cell receptor is encoded based on the attention mechanism to obtain the three-dimensional structural features of the immune cell receptor.
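
For the graph convolution option, a minimal sketch is given below. It assumes that residues are graph nodes, that two residues are connected when their coordinates lie within a distance cutoff (8 Å is an arbitrary choice), and that a single symmetrically normalized graph convolution layer with mean pooling is used. None of these details is specified in the application, and the coordinates and node features are random placeholders.

```python
import numpy as np

def graph_conv_structure(coords, node_feats, weight, cutoff=8.0):
    """One normalized graph-convolution layer over a residue contact graph."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dist <= cutoff).astype(float)       # contact graph; diagonal gives self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    hidden = np.maximum(norm_adj @ node_feats @ weight, 0.0)   # ReLU activation
    return hidden.mean(axis=0)                 # pool residues into one feature vector

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 20.0, size=(10, 3))  # stand-in 3D coordinates of 10 residues
node_feats = rng.normal(size=(10, 5))          # stand-in per-residue features
weight = rng.normal(size=(5, 4))               # untrained layer weights

struct_feat = graph_conv_structure(coords, node_feats, weight)
print(struct_feat.shape)
```

Because every residue is within the cutoff of itself, each node has degree of at least 1 and the normalization never divides by zero.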

[0038] In one possible implementation, the feature fusion unit is further configured to fuse the gene features, sequence features, three-dimensional structural features, and physicochemical information of amino acids in the immune cell receptor using the antigen prediction model to obtain the receptor features of the immune cell receptor.

[0039] In another aspect, a training device for an antigen prediction model is provided, the device comprising:

[0040] The training information input unit is used to input the gene information, sequence information, and three-dimensional structural features of the immune cell receptors of the sample into the antigen prediction model;

[0041] The training feature extraction unit is used to extract features from the gene information and sequence information of the immune cell receptor of the sample through the antigen prediction model, so as to obtain the gene features and sequence features of the immune cell receptor of the sample.

[0042] The training feature fusion unit is used to fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptor of the sample through the antigen prediction model to obtain the receptor features of the immune cell receptor of the sample.

[0043] The antigen prediction output unit is used to apply fully connected processing and normalization to the receptor features of the sample immune cell receptor through the antigen prediction model, and to output the probability that the sample immune cell receptor corresponds to each of multiple sample candidate antigens; and, based on these probabilities, to determine the predicted antigen corresponding to the sample immune cell receptor from the multiple sample candidate antigens.

[0044] The training unit is used to train the antigen prediction model based on the difference information between the predicted antigen and the labeled antigen corresponding to the immune cell receptor of the sample.

[0045] In another aspect, a computer device is provided, the computer device including one or more processors and one or more memories, the one or more memories storing at least one computer program, the computer program being loaded and executed by the one or more processors to implement the antigen prediction method or the training method of the antigen prediction model.

[0046] In another aspect, a computer-readable storage medium is provided, wherein at least one computer program is stored in the computer-readable storage medium, the computer program being loaded and executed by a processor to implement the antigen prediction method or the training method of the antigen prediction model.

[0047] In another aspect, a computer program product or computer program is provided, which includes program code stored in a computer-readable storage medium. A processor of a computer device reads the program code from the computer-readable storage medium and executes the program code, causing the computer device to perform the antigen prediction method or the training method of the antigen prediction model described above.

[0048] With the technical solution provided in this application, the antigen prediction model extracts features from the gene information and sequence information of immune cell receptors to obtain the gene features and sequence features of the immune cell receptors. In the process of obtaining the receptor features of the immune cell receptors, the gene features, sequence features, and three-dimensional structural features are fused. Introducing the three-dimensional structural features enriches the content of the receptor features and improves their expressive power, so that antigen prediction based on the receptor features yields a more accurate target antigen.

Brief Description of the Drawings

[0049] To illustrate the technical solutions in the embodiments of this application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below illustrate only some embodiments of this application. Those skilled in the art can obtain other drawings from these drawings without creative effort.

[0050] Figure 1 is a schematic diagram of the implementation environment of an antigen prediction method provided in an embodiment of this application;

[0051] Figure 2 is a flowchart of an antigen prediction method provided in an embodiment of this application;

[0052] Figure 3 is a flowchart of another antigen prediction method provided in an embodiment of this application;

[0053] Figure 4 is a flowchart illustrating the determination of three-dimensional structural features provided in an embodiment of this application;

[0054] Figure 5 is a flowchart of another antigen prediction method provided in an embodiment of this application;

[0055] Figure 6 is a schematic diagram of an experimental result provided in an embodiment of this application;

[0056] Figure 7 is a flowchart of a training method for an antigen prediction model provided in an embodiment of this application;

[0057] Figure 8 is a schematic diagram of the structure of an antigen prediction device provided in an embodiment of this application;

[0058] Figure 9 is a schematic diagram of the structure of a training device for an antigen prediction model provided in an embodiment of this application;

[0059] Figure 10 is a schematic diagram of the structure of a terminal provided in an embodiment of this application;

[0060] Figure 11 is a schematic diagram of the structure of a server provided in an embodiment of this application.

Detailed Description

[0061] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.

[0062] In this application, the terms "first," "second," etc., are used to distinguish identical or similar items with essentially the same function. It should be understood that there is no logical or temporal dependency between "first," "second," and "nth," nor are there any restrictions on quantity or execution order.

[0063] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.

[0064] Artificial intelligence (AI) is a comprehensive discipline encompassing a wide range of fields, including both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies primarily include computer vision, speech processing, natural language processing, and machine learning / deep learning.

[0065] Machine learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, and algorithmic complexity theory. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and how they can reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; its applications span all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

[0066] Embedding: mathematically, an embedding represents a correspondence in which data in a space X are mapped to a space Y by a function F. The function F is injective and structure-preserving. Injective means that each mapped value corresponds to a unique original value; structure-preserving means that the order of the original data is maintained. For example, if data X1 and X2 are mapped to Y1 and Y2 respectively and X1 > X2 before mapping, then Y1 > Y2 after mapping. For words, embedding means mapping words to another space to facilitate subsequent machine learning and processing.

[0067] Attention weights represent the importance of a piece of data during training or prediction. Importance indicates the magnitude of the influence of input data on output data. Data with high importance corresponds to higher attention weights, while data with low importance corresponds to lower attention weights. The importance of data varies in different scenarios, and training the model to assign attention weights is essentially the process of determining data importance.

[0068] Immune cells: commonly known as white blood cells, including innate lymphocytes, various phagocytes, and lymphocytes that can recognize antigens and produce specific immune responses.

[0069] T cells, also known as T lymphocytes, originate from pluripotent stem cells in the bone marrow (or from the yolk sac and liver during the embryonic period). During the embryonic and early life stages, some pluripotent stem cells or pre-T cells in the bone marrow migrate to the thymus and differentiate and mature under the induction of thymic hormones, becoming immune-active T cells.

[0070] TCR: T cell receptor (TCR) is a characteristic marker on the surface of all T cells. The function of TCR is to recognize antigens.

[0071] B cells, also known as B lymphocytes, originate from pluripotent stem cells in the bone marrow. Progenitor cells of B lymphocytes are found in the hematopoietic cell islands of the fetal liver (at embryonic day 14 in mice, or at 8-9 weeks of gestation in humans). Subsequently, the bone marrow gradually takes over as the site of B lymphocyte production and differentiation. Mature B cells primarily reside in the lymphoid follicles of the superficial cortex of lymph nodes and in the red and white pulp of the spleen. Under antigenic stimulation, B cells can differentiate into plasma cells, which synthesize and secrete antibodies (immunoglobulins) and are primarily responsible for humoral immunity.

[0072] BCR: The B-cell receptor (BCR) is a molecule located on the surface of B cells that is responsible for the specific recognition and binding of antigens. It is essentially a membrane surface immunoglobulin. BCR has antigen-binding specificity.

[0073] Antigen: refers to any substance that can stimulate the body to produce a specific immune response (humoral immunity and cellular immunity).

[0074] Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks within a wide area network or local area network to achieve data computing, storage, processing, and sharing.

[0075] The technical solutions provided in this application can also be combined with cloud technology. For example, the trained antigen prediction model can be deployed on a cloud server. Specifically, the medical cloud in cloud technology refers to the use of cloud computing to create a healthcare service cloud platform, based on new technologies such as cloud computing, mobile technology, multimedia, 4G communication, big data, and the Internet of Things, combined with medical technology, thereby achieving the sharing of medical resources and the expansion of medical coverage.

[0076] It should be noted that all information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.), and signals involved in this application have been authorized by the user or fully authorized by all parties, and the collection, use, and processing of related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the genetic information involved in this application was obtained with full authorization.

[0077] Figure 1 is a schematic diagram of the implementation environment of an antigen prediction method provided in an embodiment of this application. Referring to Figure 1, the implementation environment may include terminal 110 and server 140.

[0078] Terminal 110 is connected to server 140 via a wireless or wired network. Optionally, terminal 110 may be a smartphone, tablet, laptop, desktop computer, smartwatch, etc., but is not limited to these. Terminal 110 has an application installed and running that supports antigen prediction.

[0079] Server 140 is a standalone physical server, or a server cluster or distributed system consisting of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN), and big data and artificial intelligence platforms.

[0080] Those skilled in the art will understand that the number of terminals and servers described above may be greater or smaller. For example, there may be only one terminal, or there may be dozens or hundreds of terminals, or even more, in which case the implementation environment also includes those other terminals. This application does not limit the number of terminals or the type of devices.

[0081] After introducing the implementation environment of the embodiments of this application, the technical solutions provided by the embodiments of this application will be described below in conjunction with the above implementation environment. In the following description, the terminal is the terminal 110 in the above implementation environment, and the server is the server 140 in the above implementation environment.

[0082] The antigen prediction method provided in this application can be applied in scientific research and vaccine design, specifically in determining the antigen specificity of immune cell receptors, that is, in identifying the target antigen that can specifically bind to an immune cell receptor. Using the technical solution provided in this application, technicians upload the gene information, sequence information, and three-dimensional structural features of the immune cell receptor to a server via a terminal. The server processes this information using a trained antigen prediction model to obtain the receptor features of the immune cell receptor. The gene information of the immune cell receptor includes the VDJ information, the sequence information is the amino acid sequence of the immune cell receptor, and the three-dimensional structural features represent the three-dimensional structure of the immune cell receptor. The server uses the antigen prediction model to predict the antigen based on the receptor features and outputs the target antigen corresponding to the immune cell receptor. This target antigen is the antigen that can specifically bind to the immune cell receptor, and technicians can conduct further scientific research or vaccine design based on it. The technical solution therefore reduces the number of experiments that technicians need to perform on immune cell receptors and improves the efficiency of scientific research and vaccine design.

[0083] After introducing the implementation environment and application scenarios of the embodiments of this application, the antigen prediction method provided by the embodiments of this application will be described below. The technical solution provided by the embodiments of this application can be executed by a terminal, by a server, or jointly by a terminal and a server. In the following description, the server is used as the execution subject as an example. Referring to Figure 2, the method includes the following steps.

[0084] 201. The server inputs the gene information, sequence information, and three-dimensional structural features of immune cell receptors into the antigen prediction model.

[0085] The immune cell receptor is either a T cell receptor or a B cell receptor. In some embodiments, the gene information of the immune cell receptor includes the VDJ information, where V denotes the variable gene segment, D the diversity gene segment, and J the joining gene segment. The sequence information of the immune cell receptor is its amino acid sequence. The three-dimensional structural features of the immune cell receptor are determined from its three-dimensional structure, which represents the positions of multiple amino acids within the receptor and reflects the overall three-dimensional structure. The antigen prediction model is trained on the gene information, sequence information, and three-dimensional structural features of sample immune cell receptors and has the function of predicting the antigen corresponding to an immune cell receptor.

[0086] 202. The server uses the antigen prediction model to extract features from the gene and sequence information of the immune cell receptor, and obtains the gene and sequence features of the immune cell receptor.

[0087] The process of extracting features from the gene and sequence information of the immune cell receptor is also the process of abstracting and expressing the gene and sequence information of the immune cell receptor. The resulting gene and sequence features can not only represent the gene and sequence information of the immune cell receptor, but also facilitate subsequent processing by the server.

[0088] 203. The server uses the antigen prediction model to fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptor to obtain the receptor features of the immune cell receptor.

[0089] The receptor features of this immune cell receptor are obtained by fusing gene features, sequence features and three-dimensional structural features, which means that the immune cell receptor can be represented from three aspects: gene, sequence and structure. Therefore, the expression ability of this receptor feature is strong.

[0090] 204. The server uses the antigen prediction model to perform full connection and normalization on the receptor features of the immune cell receptor, and outputs the probabilities that the immune cell receptor corresponds to multiple candidate antigens.

[0091] The process of performing full connection and normalization on the receptor features of the immune cell receptor is also the process of performing antigen prediction based on those receptor features.

[0092] 205. Based on the probability that the immune cell receptor corresponds to multiple candidate antigens, the server determines the target antigen from the multiple candidate antigens, which is an antigen that can specifically bind to the immune cell receptor.
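Steps 203 to 205 can be illustrated with a minimal sketch. It assumes that fusion is a plain concatenation of the three feature vectors (the fusion operator is not fixed at this point in the description), that "full connection" is a single linear layer with one output per candidate antigen, and that "normalization" is a softmax. The function names, weights, and candidate labels are hypothetical and serve only as an illustration.

```python
import math

def fuse(gene_feat, seq_feat, struct_feat):
    # Assumption: fusion by concatenation of the three feature vectors.
    return list(gene_feat) + list(seq_feat) + list(struct_feat)

def fully_connected(x, weights, bias):
    # One weight row and one bias per candidate antigen.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    # Normalization: subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_antigen(gene_feat, seq_feat, struct_feat, weights, bias, candidates):
    receptor_feat = fuse(gene_feat, seq_feat, struct_feat)        # step 203
    probs = softmax(fully_connected(receptor_feat, weights, bias))  # step 204
    best = max(range(len(candidates)), key=probs.__getitem__)      # step 205
    return candidates[best], probs

# Toy inputs (hypothetical values, not taken from the application).
candidates = ["antigen_A", "antigen_B", "antigen_C"]
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]
bias = [0.0, 0.0, 0.0]
target, probs = predict_antigen([0.2], [0.1, 0.5], [0.9], weights, bias, candidates)
print(target, [round(p, 3) for p in probs])
```

In this sketch the target antigen is simply the candidate with the highest probability; a probability threshold could be used instead if several candidates are to be retained.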

[0093] With the technical solution provided in this application, the antigen prediction model extracts features from the gene information and sequence information of the immune cell receptor to obtain the gene features and sequence features of the immune cell receptor. In the process of obtaining the receptor features of the immune cell receptor, the gene features, sequence features, and three-dimensional structural features are fused. The introduction of three-dimensional structural features enriches the content of the receptor features and improves their expression ability, so that the target antigen obtained when antigen prediction is performed based on the receptor features is more accurate.

[0094] Steps 201-205 above are a brief description of the antigen prediction method provided in the embodiments of this application. The antigen prediction method provided in the embodiments of this application is further described below with reference to some examples. Referring to Figure 3, and taking the server as the execution subject by way of example, the method includes the following steps.

[0095] 301. The server obtains the three-dimensional structural features of immune cell receptors.

[0096] The immune cell receptor is either a T cell receptor or a B cell receptor. Immune cell receptors recognize antigens and bind specifically to them, thereby activating the immune system. Immune cell receptors are proteins, and proteins consist of multiple amino acids. The three-dimensional structural features of an immune cell receptor represent the spatial positions of these multiple amino acids.

[0097] In one possible implementation, the server obtains the target amino acid sequence of the immune cell receptor, which includes the CDR3 region of the immune cell receptor. The server performs multiple sequence alignment on the target amino acid sequence of the immune cell receptor to obtain at least one reference amino acid sequence, the similarity between the reference amino acid sequence and the target amino acid sequence meeting a similarity condition. The server obtains a homologous template corresponding to the target amino acid sequence, the homologous template including structural information of homologous sequences of the target amino acid sequence. Based on the target amino acid sequence, the at least one reference amino acid sequence, and the homologous template, the server performs multiple rounds of iteration to obtain the three-dimensional structural features of the immune cell receptor.

[0098] Immune cell receptors contain complementarity-determining regions (CDRs), which comprise three subregions: CDR1, CDR2, and CDR3. CDR3 is the most variable and plays a key role in antigen recognition.

[0099] In this implementation, the server can determine the three-dimensional structural features of the immune cell receptor based on the target amino acid sequence of the immune cell receptor without the need for observation using other equipment such as cryo-electron microscopy, thereby improving the efficiency of obtaining three-dimensional structural features and reducing the cost of obtaining three-dimensional structural features.

[0100] For example, the server acquires the sequencing data of the immune cell receptor. This sequencing data includes multiple amino acids of the immune cell receptor and their sequence. This sequencing data is obtained by a technician using gene sequencing equipment, and this embodiment does not limit its scope. The server performs data preprocessing on the sequencing data of the immune cell receptor to obtain reference sequencing data. This preprocessing includes eliminating erroneous data and converting the sequencing data into a format that is easy for the server to process. The preprocessing rules are set by the technician according to the actual situation, and this embodiment does not limit their scope. The server performs quality control on the reference sequencing data to obtain the target sequencing data of the immune cell receptor. This quality control includes filtering out dead cells, background estimation, pairing chains, dextramer signal correction, log-rank test, and receptor gene aggregation. The server extracts a target length amino acid sequence containing the CDR3 region from the target sequencing data. This target length amino acid sequence containing the CDR3 region is the target amino acid sequence. The target length is set by technicians according to actual conditions, such as being greater than 50 amino acids, which is not limited in this embodiment. The server searches a gene database based on the target amino acid sequence to obtain at least one reference amino acid sequence. This at least one reference amino acid sequence is an amino acid sequence with a similarity greater than or equal to a similarity threshold with the target amino acid sequence. The similarity between amino acid sequences is determined by comparing the types and order of amino acids in the amino acid sequences. 
Multiple sequence alignment (MSA) retrieves sequences similar to the input amino acid sequence from a large database and aligns them simultaneously. Since similar amino acid sequences generally have similar folding patterns, multiple sequence alignment can add the structural information of similar sequences to the features. The server searches a structure database based on the target amino acid sequence to obtain the homologous template corresponding to the target amino acid sequence. The homologous template includes the structural information of homologous sequences of the target amino acid sequence. Based on an attention mechanism, the server performs multiple rounds of iterative encoding on the target amino acid sequence, the at least one reference amino acid sequence, and the homologous template to obtain the distance distribution between each pair of amino acids in the target amino acid sequence and the angles of the chemical bonds connecting them. The server uses an attention mechanism to encode the distance distribution between each pair of amino acids in the target amino acid sequence and the angles of the chemical bonds connecting them, outputting the three-dimensional structural information of the immune cell receptor. This three-dimensional structural information includes the three-dimensional positions of multiple amino acids within the immune cell receptor. The server then extracts features from the three-dimensional structural information of the immune cell receptor, for example by processing it with a graph network, to obtain its three-dimensional structural features.
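The selection of reference amino acid sequences by a similarity threshold can be sketched as follows. This is only an illustration: it uses the ratio of Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure, whereas a real pipeline would typically rely on a dedicated sequence search and alignment tool. The database contents, the threshold of 0.8, and the function names are assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in similarity in [0, 1] that depends on the types and order of residues.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def select_reference_sequences(target, database, threshold=0.8):
    # Keep database sequences whose similarity to the target meets the threshold;
    # the target itself is skipped because it carries no additional information.
    return [s for s in database if s != target and similarity(target, s) >= threshold]

# Toy data (hypothetical): a CDR3-like target and a tiny "database".
target_sequence = "CASSLGQAYEQYF"
database = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSPGQAYEQYF", "AAAAAAAAAAAAA"]
reference_sequences = select_reference_sequences(target_sequence, database)
print(reference_sequences)
```

The two single-substitution variants pass the threshold, while the unrelated poly-alanine sequence is rejected.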

[0101] To illustrate the above implementation more clearly, it is described below with reference to Figure 4.

[0102] Referring to Figure 4, the server performs preprocessing 401 on the sequencing data of the immune cell receptor to obtain reference sequencing data for the immune cell receptor. The server performs quality control 402 on the reference sequencing data to obtain target sequencing data for the immune cell receptor. Quality control 402 includes dead cell removal 4021, background estimation 4022, chain pairing 4023, signal correction 4024, log-rank test 4025, and receptor gene aggregation 4026. The server performs sequence truncation 403 on the target sequencing data to obtain the target amino acid sequence. The server performs multiple sequence alignment 404 based on the target amino acid sequence to obtain at least one reference amino acid sequence. The server searches a structure database based on the target amino acid sequence to obtain the homologous template corresponding to the target amino acid sequence. Based on an attention mechanism, the server performs multiple rounds of iterative encoding 405 on the target amino acid sequence, the at least one reference amino acid sequence, and the homologous template to obtain the three-dimensional structural information of the immune cell receptor.
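The sequence truncation step, which extracts an amino acid sequence of the target length containing the CDR3 region, can be sketched as below. The description does not state how the window is positioned, so this sketch assumes that the window is centered on CDR3 and then clamped to the ends of the chain. The function name and the toy chain are hypothetical.

```python
def extract_target_sequence(chain, cdr3, target_length):
    """Return a window of `target_length` residues of `chain` that contains `cdr3`."""
    start = chain.find(cdr3)
    if start < 0:
        raise ValueError("CDR3 region not found in the chain")
    if target_length < len(cdr3):
        raise ValueError("target length is shorter than the CDR3 region")
    if target_length >= len(chain):
        return chain  # the whole chain already fits
    # Assumption: split the extra residues evenly around CDR3, then clamp to the chain.
    extra = target_length - len(cdr3)
    left = start - extra // 2
    left = max(0, min(left, len(chain) - target_length))
    return chain[left:left + target_length]

# Toy chain (hypothetical): flanking residues around a CDR3-like segment.
chain = "M" * 10 + "CASSLGQAYEQYF" + "G" * 10
window = extract_target_sequence(chain, "CASSLGQAYEQYF", 21)
print(window)
```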

[0103] The above implementation describes a method for a server to determine the three-dimensional structural features of an immune cell receptor based on the target amino acid sequence of the receptor. In other possible implementations, the server can use a trained structure prediction model to obtain three-dimensional structural features based on the amino acid sequence. The structure prediction model includes models such as RoseTTAFold, AlphaFold, and AlphaFold2. Of course, with the development of science and technology, other structure prediction models can also be used, and this application does not limit this.

[0104] The following describes a method for obtaining the three-dimensional structural features of an immune cell receptor based on its three-dimensional structural information, wherein the three-dimensional structural information includes the three-dimensional positions of multiple amino acids in the immune cell receptor.

[0105] In one possible implementation, the server acquires the three-dimensional structural information of the immune cell receptor, which includes the three-dimensional coordinates of multiple amino acids in the immune cell receptor. The server performs graph convolution on the three-dimensional structural information of the immune cell receptor to obtain the three-dimensional structural features of the immune cell receptor.

[0106] The three-dimensional structural information refers to the three-dimensional structure file of the immune cell receptor. In some embodiments, the three-dimensional structural information is obtained from images captured by cryo-electron microscopy, or from a structure prediction model based on the amino acid sequence of the immune cell receptor; this application does not limit the method. Graph convolution, as performed by a Graph Convolutional Network (GCN), is used to extract features from a graph. In this embodiment, the nodes in the graph represent amino acids in the immune cell receptor, and the edges (lines) in the graph represent the relative positional relationships between amino acids.

[0107] In this implementation, the server can obtain the three-dimensional structural features of the immune cell receptor by directly performing graph convolution on the three-dimensional structural information of the immune cell receptor, without having to determine the three-dimensional structural information of the immune cell receptor first, thus making the determination of the three-dimensional structural features more efficient.

[0108] For example, the server acquires the three-dimensional structural information of the immune cell receptor. Based on this information, the server generates a three-dimensional structural map of the immune cell receptor. Nodes in this map correspond to amino acids of the receptor, and lines represent the connections between amino acids. The node features include the type of the corresponding amino acid and its three-dimensional coordinates. The server then performs graph convolution on the map to obtain the three-dimensional structural features of the immune cell receptor.
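A minimal sketch of this example follows. It assumes that two amino acids are connected when their coordinates lie within a distance cutoff (the description only says that lines represent connections), and it uses a simplified mean-aggregation graph convolution followed by an average readout rather than any specific GCN variant. The cutoff of 8.0, the weights, and all names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def node_features(residue, coord):
    # Node feature = one-hot amino acid type followed by the 3D coordinates.
    one_hot = [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]
    return one_hot + list(coord)

def build_graph(residues, coords, cutoff=8.0):
    feats = [node_features(r, c) for r, c in zip(residues, coords)]
    neighbors = [[i] for i in range(len(residues))]  # self-loops
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            # Assumption: an edge exists when two residues are within the cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return feats, neighbors

def graph_conv(feats, neighbors, weight):
    # Average each node's neighborhood, then apply a linear map and ReLU.
    dim = len(feats[0])
    out = []
    for nbrs in neighbors:
        agg = [sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        out.append([max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weight])
    return out

def readout(node_out):
    # Average node outputs into a single structural feature vector.
    return [sum(col) / len(node_out) for col in zip(*node_out)]

residues = "AEG"
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
feats, neighbors = build_graph(residues, coords)
weight = [[0.1] * 23, [-0.1] * 23]  # toy 23 -> 2 projection
struct_feat = readout(graph_conv(feats, neighbors, weight))
print(neighbors, struct_feat)
```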

[0109] In one possible implementation, the server acquires the three-dimensional structural information of the immune cell receptor, which includes the three-dimensional coordinates of multiple amino acids in the immune cell receptor. The server encodes the three-dimensional structural information of the immune cell receptor based on an attention mechanism to obtain the three-dimensional structural features of the immune cell receptor.

[0110] In this implementation, the server can obtain the three-dimensional structural features of the immune cell receptor by directly encoding the three-dimensional structural information of the immune cell receptor based on the attention mechanism, without having to determine the three-dimensional structural information of the immune cell receptor first, thus making the determination of the three-dimensional structural features more efficient.

[0111] For example, the server acquires the three-dimensional structural information of the immune cell receptor. The server performs embedding encoding on multiple amino acids in this three-dimensional structural information to obtain multiple amino acid embedding features. This embedding encoding process also involves representing the multiple amino acids in a discretized form, facilitating subsequent server processing. The server uses an attention mechanism to encode these multiple amino acid embedding features based on the three-dimensional structural information, obtaining attention weights for each amino acid. Based on these attention weights, the server fuses the multiple amino acid embedding features to obtain the three-dimensional structural features of the immune cell receptor. In some embodiments, the server can use a Transformer model encoder to encode the three-dimensional structural information of the immune cell receptor to obtain its three-dimensional structural features.
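The attention-weighted fusion in this example can be sketched as follows. The sketch assumes that each amino acid receives a scalar score computed from its embedding concatenated with its three-dimensional coordinates, that the scores are normalized by a softmax into attention weights, and that the structural feature is the weighted sum of the embeddings. The scoring vector and all names are hypothetical; a Transformer encoder would compute richer pairwise attention.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(embeddings, coords, score_weights):
    # Score each residue from [embedding ; coordinates] (assumed scoring function).
    scores = [
        sum(w * v for w, v in zip(score_weights, list(e) + list(c)))
        for e, c in zip(embeddings, coords)
    ]
    weights = softmax(scores)  # one attention weight per amino acid
    dim = len(embeddings[0])
    fused = [sum(a * e[k] for a, e in zip(weights, embeddings)) for k in range(dim)]
    return fused, weights

# Toy inputs (hypothetical): two residues with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
score_weights = [1.0, 0.0, 0.0, 0.0, 0.0]
struct_feat, attn = attention_pool(embeddings, coords, score_weights)
print(attn, struct_feat)
```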

[0112] It should be noted that the above two implementation methods are illustrated by the example of the server encoding the three-dimensional structural information of the immune cell receptor using graph convolution and attention mechanisms respectively to obtain the three-dimensional structural features. In other possible implementation methods, the server can also use other models to encode the three-dimensional structural information of the immune cell receptor, and this application embodiment does not limit this.

[0113] It should be noted that step 301 above is an optional step.

[0114] 302. The server inputs the gene information, sequence information, and three-dimensional structural features of immune cell receptors into the antigen prediction model.

[0115] The genetic information of immune cell receptors includes the VDJ information, where V encodes the variable region, D encodes the hypervariable region, and J encodes the crosslinking region. The sequence information of the immune cell receptor is its amino acid sequence; for example, AEGAL is an amino acid sequence where A represents alanine, E represents glutamate, G represents glycine, and L represents leucine. Immune cell receptors are proteins, and their amino acid sequences are also referred to as the one-dimensional structure of proteins. The antigen prediction model is a model trained based on the genetic information, sequence information, and three-dimensional structural features of the sample's immune cell receptors, and it has the function of predicting the antigens corresponding to the immune cell receptors.

[0116] In one possible implementation, the antigen prediction model includes three information encoding channels: a first information encoding channel is a gene information encoding channel, which includes a gene encoder for encoding gene information; a second information encoding channel is a sequence information encoding channel, which includes a sequence encoder for encoding sequence information; and a third information encoding channel is a structural feature encoding channel, which includes a structural encoder for encoding structural features. The server inputs the gene information of the immune cell receptor into the gene information encoding channel of the antigen prediction model, and subsequently encodes the gene information using the gene encoder in that channel. The server also inputs the sequence information of the immune cell receptor into the sequence information encoding channel of the antigen prediction model, and subsequently encodes the sequence information using the sequence encoder in that channel. Finally, the server inputs the three-dimensional structural features of the immune cell receptor into the structural feature encoding channel, and subsequently encodes the three-dimensional structural features using the structural encoder in that channel.

[0117] In some embodiments, before inputting the sequence information of the immune cell receptor into the antigen prediction model, the server can preprocess the sequence information so that every sequence input into the antigen prediction model has the same length. If the length of the sequence information is greater than a length threshold, the server truncates the portion that exceeds the length threshold, obtaining sequence information whose length equals the length threshold; this truncated sequence information is then input into the antigen prediction model. If the length of the sequence information is less than the length threshold, the server pads the sequence information with a target symbol, obtaining sequence information whose length equals the length threshold; this padded sequence information is then input into the antigen prediction model. The target symbol is set by the technician according to the actual situation, for example, to 0.
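The length normalization described above can be sketched in a few lines. The sketch assumes that truncation keeps the leading residues and that padding is appended at the end; the description does not specify either side. The function name is hypothetical.

```python
def fix_length(sequence, length_threshold, pad_symbol="0"):
    # Truncate sequences longer than the threshold (assumption: keep the leading part).
    if len(sequence) > length_threshold:
        return sequence[:length_threshold]
    # Pad shorter sequences with the target symbol (assumption: pad on the right).
    return sequence + pad_symbol * (length_threshold - len(sequence))

print(fix_length("AEGAL", 3))  # AEG
print(fix_length("AEGAL", 8))  # AEGAL000
```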

[0118] It should be noted that steps 301-302 above are illustrated using the example of the server obtaining the three-dimensional structural features of the immune cell receptor in advance. In other possible implementations, the server may also obtain the three-dimensional structural information of the immune cell receptor in advance, input the three-dimensional structural information into the structural feature encoding channel of the antigen prediction model, and then obtain the three-dimensional structural features of the immune cell receptor through the structural encoder of the structural feature encoding channel. This application embodiment does not limit this.

[0119] Furthermore, the above steps 301-302 are illustrated by taking the server obtaining the three-dimensional structural features of the immune cell receptor and inputting the gene information, sequence information and three-dimensional structural features of the immune cell receptor into the antigen prediction model as an example. In other possible implementations, if the server does not obtain the three-dimensional structural features of the immune cell receptor, it is also possible to input only the gene information and sequence information of the immune cell receptor into the antigen prediction model.

[0120] 303. The server uses the antigen prediction model to extract features from the gene and sequence information of the immune cell receptor, and obtains the gene and sequence features of the immune cell receptor.

[0121] The process of extracting features from the gene and sequence information of the immune cell receptor is also the process of abstracting and expressing the gene and sequence information of the immune cell receptor. The resulting gene and sequence features can not only represent the gene and sequence information of the immune cell receptor, but also facilitate subsequent processing by the server.

[0122] In one possible implementation, the antigen prediction model includes a gene encoder and a sequence encoder. The server uses the gene encoder of the antigen prediction model to encode the VDJ information of the immune cell receptor, obtaining the gene characteristics of the immune cell receptor, where V encodes a variable region, D encodes a hypervariable region, and J encodes a crosslinking region. The server uses the sequence encoder of the antigen prediction model to encode the amino acid sequence of the immune cell receptor, obtaining the sequence characteristics of the immune cell receptor.

[0123] In this implementation, the server can encode the gene information and sequence information of the immune cell receptor through the gene encoder and sequence encoder of the antigen prediction model, respectively. That is, it can extract features from the gene information and sequence information. The resulting gene features and sequence features can represent the immune cell receptor from different dimensions.

[0124] To provide a clearer explanation of the above embodiments, the following description will be divided into two parts.

[0125] In the first part, the server encodes the VDJ information of the immune cell receptor through the gene encoder of the antigen prediction model to obtain the gene characteristics of the immune cell receptor.

[0126] In one possible implementation, when the immune cell receptor is a B cell receptor, the server encodes the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor through the gene encoder of the antigen prediction model to obtain the gene characteristics of the immune cell receptor.

[0127] The B-cell receptor consists of two identical heavy chains (H chains) and two identical light chains (L chains), linked by interchain disulfide bonds to form a four-peptide-chain structure. The heavy chain has a molecular weight of approximately 50–75 kDa and is composed of 450–550 amino acid residues. The light chain has a molecular weight of approximately 25 kDa and is composed of 214 amino acid residues.

[0128] To illustrate the above embodiments more clearly, three examples will be used below.

[0129] Example 1: The server uses the gene encoder of the antigen prediction model to perform a full connection on the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor to obtain the gene characteristics of the immune cell receptor. The gene characteristics of the immune cell receptor include the light chain gene characteristics and the heavy chain gene characteristics of the immune cell receptor.

[0130] In one possible implementation, the antigen prediction model includes two gene encoders. The server uses the first gene encoder of the antigen prediction model to splice the VJ information of the light chain of the B cell receptor to obtain the light chain gene information of the B cell receptor. The server uses the second gene encoder of the antigen prediction model to splice the VDJ information of the heavy chain of the B cell receptor to obtain the heavy chain gene information of the B cell receptor. The server uses the first gene encoder of the antigen prediction model to perform two full connections on the light chain gene information of the B cell receptor to obtain the light chain gene features of the B cell receptor. The server uses the second gene encoder of the antigen prediction model to perform two full connections on the heavy chain gene information of the B cell receptor to obtain the heavy chain gene features of the B cell receptor. The light chain gene features and the heavy chain gene features of the B cell receptor constitute the gene features of the B cell receptor.
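The splice-then-fully-connect scheme of Example 1 can be sketched as below. It assumes that each gene segment is represented as a one-hot vector over a small vocabulary, that splicing is concatenation of those vectors, and that each full connection is a linear layer followed by ReLU (the activation is not stated in the description). The vocabularies are a toy selection of gene names, and the random weights stand in for trained parameters.

```python
import random

LIGHT_V, LIGHT_J = ["IGKV1-39", "IGKV3-20"], ["IGKJ1", "IGKJ2"]
HEAVY_V, HEAVY_D, HEAVY_J = ["IGHV3-23", "IGHV1-69"], ["IGHD3-10", "IGHD6-13"], ["IGHJ4", "IGHJ6"]

def one_hot(name, vocabulary):
    return [1.0 if name == v else 0.0 for v in vocabulary]

def splice(genes, vocabularies):
    # Concatenate the one-hot encodings of the gene segments of one chain.
    out = []
    for gene, vocab in zip(genes, vocabularies):
        out += one_hot(gene, vocab)
    return out

def dense(x, weights, bias):
    # Full connection followed by ReLU (the activation is an assumption).
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(weights, bias)]

def make_layer(n_in, n_out, rng):
    # Random weights as placeholders for trained parameters.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def gene_encoder(genes, vocabularies, layers):
    x = splice(genes, vocabularies)
    for weights, bias in layers:  # two full connections
        x = dense(x, weights, bias)
    return x

rng = random.Random(0)
light_layers = [make_layer(4, 6, rng), make_layer(6, 4, rng)]  # first gene encoder
heavy_layers = [make_layer(6, 6, rng), make_layer(6, 4, rng)]  # second gene encoder
light_feat = gene_encoder(("IGKV1-39", "IGKJ1"), (LIGHT_V, LIGHT_J), light_layers)
heavy_feat = gene_encoder(("IGHV3-23", "IGHD3-10", "IGHJ4"), (HEAVY_V, HEAVY_D, HEAVY_J), heavy_layers)
gene_features = (light_feat, heavy_feat)
```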

[0131] Example 2: The server uses the gene encoder of the antigen prediction model to convolve the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor to obtain the gene features of the immune cell receptor. The gene features of the immune cell receptor include the light chain gene features and the heavy chain gene features of the immune cell receptor.

[0132] In one possible implementation, the antigen prediction model includes two gene encoders. The server uses the first gene encoder of the antigen prediction model to splice the VJ information of the light chain of the B cell receptor to obtain the light chain gene information of the B cell receptor. The server uses the second gene encoder of the antigen prediction model to splice the VDJ information of the heavy chain of the B cell receptor to obtain the heavy chain gene information of the B cell receptor. The server uses the first gene encoder of the antigen prediction model to perform two convolutions on the light chain gene information of the B cell receptor to obtain the light chain gene features of the B cell receptor. The server uses the second gene encoder of the antigen prediction model to perform two convolutions on the heavy chain gene information of the B cell receptor to obtain the heavy chain gene features of the B cell receptor. The light chain gene features and heavy chain gene features of the B cell receptor constitute the gene features of the B cell receptor.
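The two convolutions of Example 2 can be sketched on an already spliced gene information vector. The sketch assumes one-dimensional convolutions without padding, each followed by ReLU; the kernel values and the input vector are hypothetical toy numbers, not trained parameters.

```python
def conv1d(signal, kernel):
    # Valid (unpadded) 1D cross-correlation followed by ReLU.
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
        for i in range(len(signal) - k + 1)
    ]

def conv_gene_encoder(gene_info, kernels):
    x = gene_info
    for kernel in kernels:  # two convolutions in sequence
        x = conv1d(x, kernel)
    return x

# Toy spliced heavy chain gene information: one-hot V, D, J over 2-entry vocabularies.
heavy_gene_info = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
kernels = [[0.5, 0.5], [1.0, -1.0]]
heavy_gene_feat = conv_gene_encoder(heavy_gene_info, kernels)
print(heavy_gene_feat)  # [0.0, 0.5, 0.5, 0.0]
```

Each unpadded convolution with a kernel of width 2 shortens the vector by one element, so the 6-element input yields a 4-element feature.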

[0133] Example 3: The server uses the gene encoder of the antigen prediction model to encode the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor based on the attention mechanism, thereby obtaining the gene features of the immune cell receptor. The gene features of the immune cell receptor include the light chain gene features and the heavy chain gene features of the immune cell receptor.

[0134] In one possible implementation, the antigen prediction model includes two gene encoders. The server uses the first gene encoder of the antigen prediction model to splice the VJ information of the light chain of the B cell receptor to obtain the light chain gene information of the B cell receptor. The server uses the second gene encoder of the antigen prediction model to splice the VDJ information of the heavy chain of the B cell receptor to obtain the heavy chain gene information of the B cell receptor. The server uses the first gene encoder of the antigen prediction model to encode the light chain gene information of the B cell receptor based on an attention mechanism to obtain the light chain gene features of the B cell receptor. The server uses the second gene encoder of the antigen prediction model to encode the heavy chain gene information of the B cell receptor based on an attention mechanism to obtain the heavy chain gene features of the B cell receptor. The light chain gene features and the heavy chain gene features of the B cell receptor constitute the gene features of the B cell receptor.

[0135] The above explanation uses B cell receptors as an example of immune cell receptors. The following explanation uses T cell receptors as an example of immune cell receptors.

[0136] In one possible implementation, when the immune cell receptor is a T cell receptor, the server encodes the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor through the gene encoder of the antigen prediction model to obtain the gene characteristics of the immune cell receptor.

[0137] Some T-cell receptors consist of an α chain and a β chain; these are also known as αβ-TCRs. Other T-cell receptors consist of a γ chain and a δ chain; these are also known as γδ-TCRs. Since αβ-TCRs far outnumber γδ-TCRs in the human body, the following description uses αβ-TCRs as an example. γδ-TCRs have a structure similar to that of αβ-TCRs, both being composed of two chains, and their processing follows the same inventive concept; the implementation process is described below.

[0138] To illustrate the above embodiments more clearly, three examples will be used below.

[0139] Example 1: The server uses the gene encoder of the antigen prediction model to perform a full connection on the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor to obtain the gene features of the immune cell receptor. The gene features of the immune cell receptor include the gene features of the α chain and the gene features of the β chain of the immune cell receptor.

[0140] In one possible implementation, the antigen prediction model includes two gene encoders. The server uses the first gene encoder of the antigen prediction model to splice the VJ information of the α chain of the T cell receptor to obtain the α chain gene information of the T cell receptor. The server uses the second gene encoder of the antigen prediction model to splice the VDJ information of the β chain of the T cell receptor to obtain the β chain gene information of the T cell receptor. The server uses the first gene encoder of the antigen prediction model to perform two full connections on the α chain gene information of the T cell receptor to obtain the α chain gene features of the T cell receptor. The server uses the second gene encoder of the antigen prediction model to perform two full connections on the β chain gene information of the T cell receptor to obtain the β chain gene features of the T cell receptor. The α chain gene features and β chain gene features of the T cell receptor constitute the gene features of the T cell receptor.

[0141] Example 2: The server uses the gene encoder of the antigen prediction model to convolve the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor to obtain the gene features of the immune cell receptor. The gene features of the immune cell receptor include the gene features of the α chain and the gene features of the β chain of the immune cell receptor.

[0142] In one possible implementation, the antigen prediction model includes two gene encoders. The server uses the first gene encoder of the antigen prediction model to splice the VJ information of the α chain of the T cell receptor to obtain the α chain gene information of the T cell receptor. The server uses the second gene encoder of the antigen prediction model to splice the VDJ information of the β chain of the T cell receptor to obtain the β chain gene information of the T cell receptor. The server uses the first gene encoder of the antigen prediction model to perform two convolutions on the α chain gene information of the T cell receptor to obtain the α chain gene features of the T cell receptor. The server uses the second gene encoder of the antigen prediction model to perform two convolutions on the β chain gene information of the T cell receptor to obtain the β chain gene features of the T cell receptor. The α chain gene features and β chain gene features of the T cell receptor constitute the gene features of the T cell receptor.

[0143] Example 3: The server uses the gene encoder of the antigen prediction model to encode the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor based on the attention mechanism, thereby obtaining the gene features of the immune cell receptor. The gene features of the immune cell receptor include the gene features of the α chain and the gene features of the β chain of the immune cell receptor.

[0144] In one possible implementation, the antigen prediction model includes two gene encoders. The server uses the first gene encoder of the antigen prediction model to splice the VJ information of the α chain of the T cell receptor to obtain the α chain gene information of the T cell receptor. The server uses the second gene encoder of the antigen prediction model to splice the VDJ information of the β chain of the T cell receptor to obtain the β chain gene information of the T cell receptor. The server uses the first gene encoder of the antigen prediction model to encode the α chain gene information of the T cell receptor based on an attention mechanism to obtain the α chain gene features of the T cell receptor. The server uses the second gene encoder of the antigen prediction model to encode the β chain gene information of the T cell receptor based on an attention mechanism to obtain the β chain gene features of the T cell receptor. The α chain gene features and β chain gene features of the T cell receptor constitute the gene features of the T cell receptor.

[0145] The second part involves the server encoding the amino acid sequence of the immune cell receptor using the sequence encoder of the antigen prediction model, thereby obtaining the sequence characteristics of the immune cell receptor.

[0146] In one possible implementation, when the immune cell receptor is a B cell receptor, the server encodes the amino acid sequences of the light chain and heavy chain of the immune cell receptor using the sequence encoder of the antigen prediction model, based on an attention mechanism, to obtain the sequence features of the immune cell receptor. These sequence features include both the light chain and heavy chain sequence features of the immune cell receptor. In some embodiments, the sequence encoder is an encoder of a Transformer model.

[0147] For example, the antigen prediction model includes two sequence encoders. When the immune cell receptor is a B cell receptor, the server uses the first sequence encoder of the antigen prediction model to encode the amino acid sequence of the light chain of the B cell receptor, obtaining light chain embedding features, where one light chain embedding feature corresponds to one amino acid on the light chain. The server then uses the first sequence encoder to encode multiple light chain embedding features based on the sequence of multiple amino acids in the B cell receptor's amino acid sequence, obtaining attention weights corresponding to each light chain embedding feature. The server then uses the first sequence encoder to weight and fuse these multiple light chain embedding features based on the attention weights corresponding to each light chain embedding feature, obtaining the light chain sequence features of the B cell receptor. The server uses the second sequence encoder of the antigen prediction model to encode the amino acid sequence of the heavy chain of the B cell receptor, obtaining heavy chain embedding features, where one heavy chain embedding feature corresponds to one amino acid on the heavy chain. The server then uses the second sequence encoder to encode multiple heavy chain embedding features based on the sequence of multiple amino acids in the B cell receptor's amino acid sequence, obtaining attention weights corresponding to each heavy chain embedding feature. The server, through the second sequence encoder, weights and fuses multiple heavy chain embedding features based on the attention weights corresponding to each heavy chain embedding feature to obtain the heavy chain sequence features of the B cell receptor. The light chain sequence features and the heavy chain sequence features of the B cell receptor constitute the sequence features of the B cell receptor. 
In some embodiments, the embedding encoding can adopt a one-hot encoding method or other methods, which are not limited in this application embodiment.
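The embed, weight, and fuse steps for one chain can be sketched as follows. This is a simplified stand-in rather than the encoder described above: it uses one-hot embeddings and a single query vector (a hypothetical learned parameter named `query`) to score each residue, whereas a Transformer encoder would use multi-head self-attention and positional information.

```python
import math
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def embed(sequence: str) -> List[List[float]]:
    """One-hot embedding: one vector per amino acid on the chain.

    Letters outside AMINO_ACIDS map to an all-zero vector.
    """
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in sequence]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    embeddings: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, weighted fusion of the embeddings)."""
    # One score per residue: dot product between its embedding and the query.
    scores = [sum(e * q for e, q in zip(emb, query)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings)) for d in range(dim)
    ]
    return weights, pooled
```

For example, `attention_pool(embed("CAS"), [0.0] * 20)` gives every residue the same weight, because an all-zero query scores all positions equally.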

[0148] In one possible implementation, when the immune cell receptor is a T cell receptor, the server encodes the amino acid sequence of the α chain and the amino acid sequence of the β chain of the immune cell receptor based on the attention mechanism using the sequence encoder of the antigen prediction model to obtain the sequence features of the immune cell receptor, which include the α chain sequence features and the β chain sequence features of the immune cell receptor.

[0149] For example, the antigen prediction model includes two sequence encoders. When the immune cell receptor is a T-cell receptor, the server uses the first sequence encoder of the antigen prediction model to encode the amino acid sequence of the α-chain of the T-cell receptor, obtaining the α-chain embedding features, where one α-chain embedding feature corresponds to one amino acid on the α-chain. The server then uses the first sequence encoder to encode multiple α-chain embedding features based on the sequence of multiple amino acids in the T-cell receptor's amino acid sequence, obtaining the attention weights corresponding to each α-chain embedding feature. The server then uses the first sequence encoder to weight and fuse the multiple α-chain embedding features based on the attention weights corresponding to each α-chain embedding feature, obtaining the α-chain sequence features of the T-cell receptor. The server uses the second sequence encoder of the antigen prediction model to encode the amino acid sequence of the β-chain of the T-cell receptor, obtaining the β-chain embedding features, where one β-chain embedding feature corresponds to one amino acid on the β-chain. The server then uses the second sequence encoder to encode multiple β-chain embedding features based on the sequence of multiple amino acids in the T-cell receptor's amino acid sequence, obtaining the attention weights corresponding to each β-chain embedding feature. The server, through the second sequence encoder, weights and fuses multiple β-chain embedding features based on the attention weights corresponding to each β-chain embedding feature to obtain the β-chain sequence features of the T cell receptor. The α-chain sequence features and the β-chain sequence features of the T cell receptor constitute the sequence features of the T cell receptor.

[0150] 304. The server uses the antigen prediction model to fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptor to obtain the receptor features of the immune cell receptor.

[0151] The receptor features of this immune cell receptor are obtained by fusing gene features, sequence features and three-dimensional structural features, which can represent the immune cell receptor from three aspects: gene, sequence and structure. The receptor features can represent the immune cell receptor relatively completely.

[0152] In one possible implementation, the server uses the feature fusion module of the antigen prediction model to splice together the gene features and sequence features of the immune cell receptor to obtain the gene sequence fusion feature of the immune cell receptor. Then, using the feature fusion module of the antigen prediction model, the server performs a weighted fusion of the gene sequence fusion feature and the three-dimensional structural features of the immune cell receptor based on a gated attention mechanism to obtain the receptor feature of the immune cell receptor.

[0153] In this implementation, the server first fuses the gene features and sequence features of the immune cell receptor using a feature fusion module to obtain the gene sequence fusion feature of the immune cell receptor. The server then uses a gated attention mechanism to fuse the gene sequence fusion feature and the three-dimensional structural feature, ultimately obtaining the receptor feature of the immune cell receptor. The gated attention mechanism allows the model to focus on the more important content. With the feature fusion method provided by the above implementation, gene features, sequence features, and three-dimensional structural features are combined into receptor features with stronger expressive power.

[0154] In the case where the immune cell receptor is a B cell receptor, the gene characteristics of the B cell receptor include the light chain gene characteristics and the heavy chain gene characteristics, and the sequence characteristics of the B cell receptor include the light chain sequence characteristics and the heavy chain sequence characteristics. The server, through the feature fusion module, adds the light chain gene characteristics and the light chain sequence characteristics of the B cell receptor to obtain the light chain gene sequence characteristics. The server, through the feature fusion module, adds the heavy chain gene characteristics and the heavy chain sequence characteristics of the B cell receptor to obtain the heavy chain gene sequence characteristics. The server, through the feature fusion module, splices the light chain gene sequence characteristics and the heavy chain gene sequence characteristics of the B cell receptor to obtain the gene sequence fusion characteristics of the B cell receptor. The server, through the feature fusion module, uses an attention mechanism to encode the gene sequence fusion characteristics and the three-dimensional structural characteristics of the B cell receptor, obtaining a first attention weight for the gene sequence fusion characteristics encoding the three-dimensional structural characteristics and a second attention weight for the three-dimensional structural characteristics encoding the gene sequence fusion characteristics. The server, through the feature fusion module, processes the first attention weight and the second attention weight using a gating function to obtain a first gating weight and a second gating weight. These first and second gating weights are used to control the information flow during feature fusion. 
The server, through the feature fusion module, uses the first gating weight to perform a weighted fusion of the gene sequence fusion feature and the three-dimensional structural feature of the B cell receptor to obtain the target gene sequence fusion feature of the B cell receptor. In some embodiments, this is achieved by multiplying the first gating weight by the three-dimensional structural feature and then adding the result to the gene sequence fusion feature to obtain the target gene sequence fusion feature. The server, through the feature fusion module, uses the second gating weight to perform a weighted fusion of the gene sequence fusion feature and the three-dimensional structural feature of the B cell receptor to obtain the target three-dimensional structural feature of the B cell receptor. In some embodiments, this is achieved by multiplying the second gating weight by the gene sequence fusion feature and then adding the result to the three-dimensional structural feature to obtain the target three-dimensional structural feature. The server uses the feature fusion module to perform tensor fusion of the target gene sequence fusion feature and the target three-dimensional structural feature. For example, it multiplies the target gene sequence fusion feature by the target three-dimensional structural feature to obtain the initial receptor feature of the B cell receptor. The server then uses the feature fusion module to perform at least two full connections on the initial receptor feature of the B cell receptor to obtain the receptor feature of the B cell receptor.

[0155] In the case where the immune cell receptor is a T-cell receptor, the gene features of the T-cell receptor include the α-chain gene features and the β-chain gene features, and the sequence features of the T-cell receptor include the α-chain sequence features and the β-chain sequence features. The server, through the feature fusion module, adds the α-chain gene features and the α-chain sequence features of the T-cell receptor to obtain the α-chain gene sequence features. The server, through the feature fusion module, adds the β-chain gene features and the β-chain sequence features of the T-cell receptor to obtain the β-chain gene sequence features. The server, through the feature fusion module, splices the α-chain gene sequence features and the β-chain gene sequence features of the T-cell receptor to obtain the gene sequence fusion feature of the T-cell receptor. The server, through the feature fusion module, uses an attention mechanism to encode the gene sequence fusion feature and the three-dimensional structural feature of the T-cell receptor, obtaining a third attention weight for the gene sequence fusion feature encoding the three-dimensional structural feature and a fourth attention weight for the three-dimensional structural feature encoding the gene sequence fusion feature. The server, through the feature fusion module, processes the third and fourth attention weights using a gating function to obtain third and fourth gating weights. These third and fourth gating weights are used to control the information flow during feature fusion. The server, through the feature fusion module, uses the third gating weight to perform a weighted fusion of the gene sequence fusion feature and the three-dimensional structural feature of the T cell receptor to obtain the target gene sequence fusion feature of the T cell receptor. In some embodiments, this is achieved by multiplying the third gating weight by the three-dimensional structural feature and then adding the result to the gene sequence fusion feature. The server, through the feature fusion module, uses the fourth gating weight to perform a weighted fusion of the gene sequence fusion feature and the three-dimensional structural feature of the T cell receptor to obtain the target three-dimensional structural feature of the T cell receptor. In some embodiments, this is achieved by multiplying the fourth gating weight by the gene sequence fusion feature and then adding the result to the three-dimensional structural feature to obtain the target three-dimensional structural feature. The server uses the feature fusion module to perform tensor fusion of the target gene sequence fusion feature and the target three-dimensional structural feature. For example, it multiplies the target gene sequence fusion feature by the target three-dimensional structural feature to obtain the initial receptor feature of the T cell receptor. The server then uses the feature fusion module to perform at least two full connections on the initial receptor feature of the T cell receptor to obtain the receptor feature of the T cell receptor.
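The gating and fusion arithmetic described in the two cases above can be sketched as follows. The text does not state how the attention weights are computed or which gating function is used, so this sketch takes the two attention weights as scalar inputs (`attn_1`, `attn_2`) and assumes a sigmoid gate, element-wise multiplication for the tensor fusion, equal feature dimensions, and a ReLU between the two fully connected layers. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (weight rows, bias)


def sigmoid(x: float) -> float:
    """Assumed gating function: squashes an attention weight into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer with one weight row per output unit."""
    weight, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def gated_fusion(
    h_bio: Sequence[float], h_stru: Sequence[float], attn_1: float, attn_2: float
) -> List[float]:
    """Fuse the gene sequence fusion feature with the structural feature."""
    gate_1 = sigmoid(attn_1)
    gate_2 = sigmoid(attn_2)
    # Target gene sequence fusion feature: h_bio plus gated structural feature.
    target_bio = [b + gate_1 * s for b, s in zip(h_bio, h_stru)]
    # Target structural feature: h_stru plus gated gene sequence fusion feature.
    target_stru = [s + gate_2 * b for b, s in zip(h_bio, h_stru)]
    # Tensor fusion, assumed here to be an element-wise product.
    return [tb * ts for tb, ts in zip(target_bio, target_stru)]


def receptor_feature(h_fusion: Sequence[float], fc1: Layer, fc2: Layer) -> List[float]:
    """Apply two fully connected layers to the initial receptor feature."""
    hidden = [max(0.0, v) for v in linear(h_fusion, fc1)]
    return linear(hidden, fc2)
```

With both attention weights at zero, each gate is 0.5, so each feature receives half of the other before the two are multiplied.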

[0156] In one possible implementation, the server uses the feature fusion module of the antigen prediction model to add the gene features and sequence features of the immune cell receptor to obtain the gene sequence fusion feature of the immune cell receptor. The server then uses the feature fusion module to splice the gene sequence fusion feature and the three-dimensional structural features of the immune cell receptor and perform at least one full connection on the result to obtain the receptor feature of the immune cell receptor.

[0157] In this implementation, the server utilizes the feature fusion module to quickly fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptor through addition, splicing, and full connection, thereby obtaining the receptor features of the immune cell receptor with high efficiency.

[0158] In the case where the immune cell receptor is a B cell receptor, the gene characteristics of the B cell receptor include the light chain gene characteristics and the heavy chain gene characteristics, and the sequence characteristics of the B cell receptor include the light chain sequence characteristics and the heavy chain sequence characteristics. The server, through the feature fusion module, adds the light chain gene characteristics and the light chain sequence characteristics of the B cell receptor to obtain the light chain gene sequence characteristics. The server, through the feature fusion module, adds the heavy chain gene characteristics and the heavy chain sequence characteristics of the B cell receptor to obtain the heavy chain gene sequence characteristics. The light chain gene sequence characteristics and the heavy chain gene sequence characteristics constitute the gene sequence fusion characteristics of the B cell receptor. The server, through the feature fusion module, splices the gene sequence fusion characteristics and the three-dimensional structural characteristics of the B cell receptor to obtain the initial receptor characteristics of the B cell receptor. The server, through the feature fusion module, performs at least one full connection on the initial receptor characteristics of the B cell receptor to obtain the receptor characteristics of the B cell receptor.

[0159] In the case where the immune cell receptor is a T-cell receptor, the genetic characteristics of the T-cell receptor include the α-chain gene characteristics and the β-chain gene characteristics, and the sequence characteristics of the T-cell receptor include the α-chain sequence characteristics and the β-chain sequence characteristics. The server, through the feature fusion module, adds the α-chain gene characteristics and the α-chain sequence characteristics of the T-cell receptor to obtain the α-chain gene sequence characteristics. The server, through the feature fusion module, adds the β-chain gene characteristics and the β-chain sequence characteristics of the T-cell receptor to obtain the β-chain gene sequence characteristics. The α-chain gene sequence characteristics and the β-chain gene sequence characteristics constitute the gene sequence fusion characteristics of the T-cell receptor. The server, through the feature fusion module, splices the gene sequence fusion characteristics and the three-dimensional structural characteristics of the T-cell receptor to obtain the initial receptor characteristics of the T-cell receptor. The server, through the feature fusion module, performs at least one full connection on the initial receptor characteristics of the T-cell receptor to obtain the receptor characteristics of the T-cell receptor.
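The addition, splicing, and full connection steps above can be sketched as follows. It assumes that the gene feature and sequence feature of each chain have the same dimension (so they can be added) and that "splice" means concatenation; the function names and the single fully connected layer with weights `weight` and `bias` are hypothetical.

```python
from typing import List, Sequence


def add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise addition; both features must have the same length."""
    if len(a) != len(b):
        raise ValueError("features must have the same dimension to be added")
    return [x + y for x, y in zip(a, b)]


def fully_connected(
    x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]
) -> List[float]:
    """Fully connected layer with one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def simple_fusion(
    gene_1: Sequence[float],
    seq_1: Sequence[float],
    gene_2: Sequence[float],
    seq_2: Sequence[float],
    h_stru: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Chains 1 and 2 are light/heavy for a BCR or alpha/beta for a TCR."""
    chain_1 = add(gene_1, seq_1)  # e.g. light chain or alpha chain
    chain_2 = add(gene_2, seq_2)  # e.g. heavy chain or beta chain
    # Splice the gene sequence fusion feature with the structural feature.
    initial = chain_1 + chain_2 + list(h_stru)
    return fully_connected(initial, weight, bias)
```

This variant has no attention or gating step, which is consistent with the text presenting it as the faster of the two fusion options.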

[0160] It should be noted that the above description is based on the example of the server fusing the gene features, sequence features, and three-dimensional structural features of the immune cell receptor to obtain the receptor features of the immune cell receptor. In other possible implementations, in addition to fusing the gene features, sequence features, and three-dimensional structural features of the immune cell receptor, the server can also fuse other information to obtain the receptor features of the immune cell receptor, as described in the following implementations.

[0161] In one possible implementation, the server uses the feature fusion module of the antigen prediction model to fuse the gene features, sequence features, three-dimensional structural features, and physicochemical information of amino acids in the immune cell receptor to obtain the receptor features of the immune cell receptor.

[0162] The physicochemical information of amino acids in this immune cell receptor includes their physical and chemical properties. Physical properties include basic composition and structure, solubility, melting point, boiling point, optical behavior, and optical rotation. Chemical properties include acidity / basicity and hydrophobicity. Introducing the physicochemical information of amino acids into the receptor characteristics of this immune cell receptor can improve the expression of receptor characteristics, allowing the receptor characteristics to more completely represent the immune cell receptor.

[0163] For example, the server uses this feature fusion module to splice together the gene features and sequence features of the immune cell receptor to obtain the gene sequence fusion feature of the immune cell receptor. The server then uses the feature fusion module of the antigen prediction model, based on a gated attention mechanism, to perform a weighted fusion of the gene sequence fusion feature and the three-dimensional structural features of the immune cell receptor to obtain the initial receptor feature. Finally, the server uses the feature fusion module to add the initial receptor feature of the immune cell receptor to the physicochemical information of the amino acids in the immune cell receptor to obtain the receptor feature of the immune cell receptor.

[0164] 305. The server uses the antigen prediction model to apply full connection and normalization to the receptor features of the immune cell receptor, and outputs the probabilities that the immune cell receptor corresponds to each of multiple candidate antigens.

[0165] In one possible implementation, the server uses the classification module of the antigen prediction model to perform a fully connected operation on the receptor features of the immune cell receptor, obtaining a classification matrix for the immune cell receptor. The server then uses this classification module to normalize the classification matrix of the immune cell receptor, obtaining a probability set corresponding to the immune cell receptor. This probability set includes multiple probabilities, each corresponding to a candidate antigen. This classification module is also referred to as the classification head.
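A minimal sketch of such a classification head follows. It assumes the normalization is a softmax, which the text does not name explicitly, and uses hypothetical antigen labels and weights.

```python
import math
from typing import Dict, List, Sequence


def fully_connected(
    x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]
) -> List[float]:
    """One output score per candidate antigen."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalise scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    receptor_feature: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
    candidates: Sequence[str],
) -> Dict[str, float]:
    """Map a receptor feature to a probability for each candidate antigen."""
    logits = fully_connected(receptor_feature, weight, bias)
    return dict(zip(candidates, softmax(logits)))
```

Each row of `weight` corresponds to one candidate antigen, so the number of rows fixes the size of the probability set.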

[0166] 306. The server determines the target antigen from among the multiple candidate antigens based on the probability that the immune cell receptor corresponds to multiple candidate antigens.

[0167] In one possible implementation, the server uses the classification module to identify, as the target antigen, the candidate antigen whose probability in the probability set meets the target condition. The probability set includes multiple probabilities, each corresponding to a candidate antigen. In some embodiments, the probability that meets the target condition is the highest probability in the probability set, or any probability in the probability set that is greater than or equal to a probability threshold. The probability threshold is set by a technician based on actual conditions, and this application embodiment does not limit this. In some embodiments, the classification module includes a multilayer perceptron (MLP).
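The two target conditions can be sketched as follows, with a hypothetical function name and hypothetical antigen labels. When no threshold is given, the candidate with the highest probability is returned; otherwise every candidate at or above the threshold is returned.

```python
from typing import Dict, List, Optional


def select_targets(
    probabilities: Dict[str, float], threshold: Optional[float] = None
) -> List[str]:
    """Pick target antigens from a probability set."""
    if threshold is None:
        # Target condition 1: the highest probability in the set.
        return [max(probabilities, key=probabilities.get)]
    # Target condition 2: probability greater than or equal to the threshold.
    return [name for name, p in probabilities.items() if p >= threshold]


example = {"antigen_a": 0.2, "antigen_b": 0.7, "antigen_c": 0.1}
top = select_targets(example)
above = select_targets(example, threshold=0.2)
```

Note that the threshold condition can return several antigens, or none, for a single receptor.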

[0168] In this implementation, the server uses the classification module of the antigen prediction model to make predictions based on the receptor characteristics, and can finally obtain the target antigen corresponding to the immune cell receptor without repeated experiments, which is highly efficient.

[0169] Steps 301-306 above are explained below with reference to Figure 5.

[0170] Referring to Figure 5, the server inputs the gene information, sequence information, and three-dimensional structural information of the immune cell receptor into the antigen prediction model. The antigen prediction model includes a gene encoder 501, a sequence encoder 502, and a structural encoder 503. The server encodes the gene information of the immune cell receptor using the gene encoder 501 to obtain its gene features. The server encodes the sequence information of the immune cell receptor using the sequence encoder 502 to obtain its sequence features. The server encodes the three-dimensional structural information of the immune cell receptor using the structural encoder 503 to obtain its three-dimensional structural features. The antigen prediction model also includes a feature fusion module 504. The server uses the feature fusion module 504 to concatenate the gene features and sequence features of the immune cell receptor to obtain the gene sequence fusion feature h_bio of the immune cell receptor. The server, through the feature fusion module 504, uses a gated attention mechanism to perform a weighted fusion of the gene sequence fusion feature h_bio and the three-dimensional structural feature h_stru, obtaining the target gene sequence fusion feature h'_bio and the target three-dimensional structural feature h'_stru of the immune cell receptor. The server uses the feature fusion module 504 to multiply the target gene sequence fusion feature h'_bio by the target three-dimensional structural feature h'_stru to obtain the initial receptor feature h_fusion of the B cell receptor. The server uses the feature fusion module 504 to apply two fully connected layers (FC1, FC2) to the initial receptor feature h_fusion to obtain the receptor feature of the B cell receptor. The antigen prediction model also includes a classification module. The server uses this module to predict the antigen based on the receptor features of the immune cell receptor, identifying the target antigen 505 corresponding to the immune cell receptor from multiple candidate antigens.

[0171] It should be noted that the above description is based on the example of a server executing steps 301-306. In other possible implementations, steps 301-306 can also be executed by a terminal, and this application embodiment does not limit this.

[0172] All of the above-mentioned optional technical solutions can be combined in any way to form the optional embodiments of this application, and will not be described in detail here.

[0173] Figure 6 shows the results of testing the antigen prediction method provided in the embodiments of this application on a public dataset.

[0174] Figure 6 shows the accuracy of the antigen prediction model provided by the antigen prediction method in this application when tested on a public dataset. As can be seen from Figure 6, the accuracy of the antigen prediction model provided in this application is higher than that of other models in the related art.

[0175] The technical solution provided in this application allows the antigen prediction model to extract features from the gene information and sequence of immune cell receptors, obtaining the gene and sequence characteristics of the immune cell receptors. In the process of obtaining the receptor characteristics of immune cell receptors, gene features, sequence features, and three-dimensional structural features are integrated. The introduction of three-dimensional structural features enriches the content of receptor features and improves the expression ability of receptor features, thus resulting in higher accuracy of target antigens obtained when performing antigen prediction based on receptor features.

[0176] To more clearly illustrate the antigen prediction method provided in the embodiments of this application, the training method of the antigen prediction model provided in the embodiments of this application is described below. Referring to Figure 7, and taking the server as the executing entity as an example, the method includes the following steps.

[0177] 701. The server inputs the gene information, sequence information, and three-dimensional structural features of the immune cell receptors in the sample into the antigen prediction model.

[0178] Step 701 and step 302 above belong to the same inventive concept. The implementation process is described in the relevant description of step 302 above, and will not be repeated here.

[0179] 702. The server uses the antigen prediction model to extract features from the gene and sequence information of the immune cell receptors in the sample, thereby obtaining the gene and sequence features of the immune cell receptors in the sample.

[0180] Step 702 and step 303 above belong to the same inventive concept. The implementation process is described in the relevant description of step 303 above, and will not be repeated here.

[0181] 703. The server uses the antigen prediction model to fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptor of the sample to obtain the receptor features of the immune cell receptor of the sample.

[0182] Step 703 and step 304 above belong to the same inventive concept. The implementation process is described in the relevant description of step 304 above, and will not be repeated here.

[0183] 704. The server uses the antigen prediction model to apply full connection and normalization to the receptor features of the immune cell receptors of the sample, and outputs the probabilities that the immune cell receptors of the sample correspond to each of multiple sample candidate antigens.

[0184] Step 704 and step 305 above belong to the same inventive concept. The implementation process is described in the relevant description of step 305 above, and will not be repeated here.

[0185] 705. Based on the probability that the immune cell receptor of the sample corresponds to multiple sample candidate antigens, the server determines the predicted antigen corresponding to the immune cell receptor of the sample from the multiple sample candidate antigens.

[0186] Step 705 and step 306 above belong to the same inventive concept. The implementation process is described in the relevant description of step 306 above, and will not be repeated here.

[0187] 706. The server trains the antigen prediction model based on the difference information between the predicted antigen and the labeled antigen corresponding to the immune cell receptor of the sample.

[0188] In one possible implementation, the server constructs a cross-entropy loss function based on the difference between the predicted antigen and the labeled antigen corresponding to the immune cell receptor. The server then uses gradient descent to train the antigen prediction model using this cross-entropy loss function, that is, to adjust the model parameters of the antigen prediction model.
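One training step of this kind can be sketched as follows. For brevity the sketch updates only a single fully connected classification layer, whereas the method described above adjusts the parameters of the whole antigen prediction model; the function names and the learning rate `lr` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalise scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label_index: int) -> float:
    """Loss is the negative log-probability assigned to the labeled antigen."""
    return -math.log(probs[label_index])


def train_step(
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
    feature: Sequence[float],
    label_index: int,
    lr: float = 0.1,
) -> Tuple[List[List[float]], List[float], float]:
    """One gradient descent update; returns new weights, new bias, and the loss."""
    logits = [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weight, bias)]
    probs = softmax(logits)
    loss = cross_entropy(probs, label_index)
    # For softmax plus cross-entropy, d(loss)/d(logit_i) = prob_i - one_hot_i.
    grad = [p - (1.0 if i == label_index else 0.0) for i, p in enumerate(probs)]
    new_weight = [[w - lr * g * x for w, x in zip(row, feature)] for row, g in zip(weight, grad)]
    new_bias = [b - lr * g for b, g in zip(bias, grad)]
    return new_weight, new_bias, loss
```

Repeating `train_step` on the same sample should lower the loss from one call to the next, which is a quick sanity check on the gradient.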

[0189] It should be noted that steps 701-706 above are illustrated using the example of the server performing one round of training on the antigen prediction model. The process of performing multiple rounds of training on the antigen prediction model belongs to the same inventive concept as steps 701-706 above, and will not be described in detail here.

[0190] Figure 8 is a schematic diagram of the structure of an antigen prediction device provided in an embodiment of this application. Referring to Figure 8, the device includes: an input unit 801, a feature extraction unit 802, a feature fusion unit 803, and an antigen prediction unit 804.

[0191] The input unit 801 is used to input the gene information, sequence information and three-dimensional structural features of immune cell receptors into the antigen prediction model.

[0192] The feature extraction unit 802 is used to extract features from the gene information and sequence information of the immune cell receptor through the antigen prediction model, so as to obtain the gene features and sequence features of the immune cell receptor.

[0193] The feature fusion unit 803 is used to fuse the gene features, sequence features and three-dimensional structural features of the immune cell receptor through the antigen prediction model to obtain the receptor features of the immune cell receptor.

[0194] The antigen prediction unit 804 is used to apply full connection and normalization to the receptor features of the immune cell receptor through the antigen prediction model, and output the probabilities that the immune cell receptor corresponds to each of multiple candidate antigens; and, based on these probabilities, to determine the target antigen from the multiple candidate antigens, the target antigen being an antigen that can specifically bind to the immune cell receptor.

[0195] In one possible implementation, the feature extraction unit 802 is used to encode the VDJ information of the immune cell receptor using the gene encoder of the antigen prediction model to obtain the gene features of the immune cell receptor, wherein V encodes a variable region, D encodes a hypervariable region, and J encodes a joining region. The feature extraction unit 802 is also used to encode the amino acid sequence of the immune cell receptor using the sequence encoder of the antigen prediction model to obtain the sequence features of the immune cell receptor.

[0196] In one possible implementation, the feature extraction unit 802 is configured to perform any of the following:

[0197] In the case that the immune cell receptor is a B cell receptor, the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor are encoded to obtain the gene characteristics of the immune cell receptor.

[0198] In the case that the immune cell receptor is a T cell receptor, the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor are encoded to obtain the gene characteristics of the immune cell receptor.

[0199] In one possible implementation, the feature extraction unit 802 is used to perform full connection on the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor to obtain the gene features of the immune cell receptor, where the gene features include the gene features of the light chain and the gene features of the heavy chain. The feature extraction unit 802 is likewise used to perform full connection on the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor to obtain the gene features of the immune cell receptor, where the gene features include the gene features of the α chain and the gene features of the β chain.

[0200] In one possible implementation, the feature extraction unit 802 is configured to perform any of the following:

[0201] In the case that the immune cell receptor is a B cell receptor, the sequence encoder of the antigen prediction model encodes the amino acid sequence of the light chain and the amino acid sequence of the heavy chain of the immune cell receptor based on the attention mechanism to obtain the sequence features of the immune cell receptor. The sequence features of the immune cell receptor include the light chain sequence features and the heavy chain sequence features of the immune cell receptor.

[0202] In the case where the immune cell receptor is a T cell receptor, the sequence encoder of the antigen prediction model encodes the amino acid sequence of the α chain and the amino acid sequence of the β chain of the immune cell receptor based on the attention mechanism, thereby obtaining the sequence features of the immune cell receptor. The sequence features of the immune cell receptor include the α chain sequence features and the β chain sequence features of the immune cell receptor.
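As a rough sketch of attention-based sequence encoding, the example below applies scaled dot-product self-attention to each chain and mean-pools the result. It assumes a toy deterministic embedding and uses the inputs directly as queries, keys, and values (no learned projections); the application does not state the encoder architecture, so these choices and the function names are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def embed(residue, dim=8):
    """Toy deterministic embedding (assumption); a trained model would learn this."""
    idx = AMINO_ACIDS.index(residue)
    return [math.sin((idx + 1) * (k + 1)) for k in range(dim)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = input."""
    dim = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)])
    return out

def encode_chain_sequence(sequence):
    """Attend over residues, then mean-pool into one fixed-length chain feature."""
    attended = self_attention([embed(r) for r in sequence])
    return [sum(col) / len(attended) for col in zip(*attended)]

def encode_receptor_sequences(chain_a, chain_b):
    """Sequence features = features of the first chain followed by the second."""
    return encode_chain_sequence(chain_a) + encode_chain_sequence(chain_b)

# Example CDR3-like sequences for an alpha and a beta chain (illustrative input).
sequence_features = encode_receptor_sequences("CAVRDSNYQLIW", "CASSLGQGNTEAFF")
```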

[0203] In one possible implementation, the feature fusion unit 803 is used to splice the gene features and sequence features of the immune cell receptor through the feature fusion module of the antigen prediction model to obtain the gene sequence fusion feature of the immune cell receptor. Based on a gated attention mechanism, the gene sequence fusion feature and the three-dimensional structural feature of the immune cell receptor are weighted and fused to obtain the receptor feature of the immune cell receptor.
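One way to read the splicing and gated weighted fusion described above is sketched below. The sigmoid gate, the element-wise convex combination, and the requirement that the spliced feature and the structural feature share a dimension are assumptions made for illustration; the application does not give the exact form of the gated attention mechanism.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(x, weights, bias):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def gated_fusion(gene_feat, seq_feat, struct_feat, gate_w, gate_b):
    """Splice gene and sequence features, then gate them against the structural feature."""
    gene_seq = gene_feat + seq_feat                      # splicing = concatenation
    if len(gene_seq) != len(struct_feat):
        raise ValueError("this sketch assumes equal dimensions")
    gate = [sigmoid(z) for z in linear(gene_seq + struct_feat, gate_w, gate_b)]
    # Element-wise convex combination: a gate near 1 favours gene/sequence information.
    return [g * a + (1.0 - g) * s for g, a, s in zip(gate, gene_seq, struct_feat)]

gene_feat, seq_feat = [0.2, 0.4], [0.6, 0.8]
struct_feat = [1.0, 0.0, 1.0, 0.0]
dim = len(struct_feat)
gate_w = [[0.0] * (2 * dim) for _ in range(dim)]   # zero weights -> every gate is 0.5
gate_b = [0.0] * dim
receptor_feat = gated_fusion(gene_feat, seq_feat, struct_feat, gate_w, gate_b)
```

With all gate weights at zero, each gate equals 0.5 and the receptor feature is the plain average of the two inputs; trained gate weights would let the model decide, per dimension, how much structural information to admit.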

[0204] In one possible implementation, the device further includes:

[0205] A three-dimensional structural feature acquisition unit is used to acquire the target amino acid sequence of the immune cell receptor, which includes the CDR3 region of the immune cell receptor. Multiple sequence alignment is performed on the target amino acid sequence of the immune cell receptor to obtain at least one reference amino acid sequence, the similarity between the reference amino acid sequence and the target amino acid sequence meeting a similarity condition. A homologous template corresponding to the target amino acid sequence is acquired, the homologous template including structural information of homologous sequences of the target amino acid sequence. Based on the target amino acid sequence, the at least one reference amino acid sequence, and the homologous template, multiple iterations are performed to obtain the three-dimensional structural features of the immune cell receptor.
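The similarity condition used to select reference amino acid sequences might look like the sketch below. It does not perform the multiple sequence alignment itself: it assumes the candidates are already aligned to the target (equal length, gaps written as `-`), and the identity measure and the 0.7 threshold are hypothetical choices, not values from this application.

```python
def identity(a, b):
    """Fraction of aligned positions with the same residue (gaps '-' never match)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)

def select_references(target, candidates, min_identity=0.7):
    """Keep the candidates whose identity to the target meets the threshold."""
    return [c for c in candidates if identity(target, c) >= min_identity]

target = "CASSLGQGNTEAFF"
candidates = ["CASSLGQGNTEAFF", "CASSLGTGNTEAFF", "CAWSVGGYEQYF--"]
references = select_references(target, candidates)
```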

[0206] In one possible implementation, the device further includes:

[0207] The three-dimensional structural feature acquisition unit is used to acquire the three-dimensional structural information of the immune cell receptor, which includes the three-dimensional coordinates of multiple amino acids in the immune cell receptor.

[0208] This 3D structural feature acquisition unit is used to perform any of the following:

[0209] The three-dimensional structural information of the immune cell receptor is subjected to graph convolution to obtain the three-dimensional structural features of the immune cell receptor.

[0210] The three-dimensional structural information of the immune cell receptor is encoded based on the attention mechanism to obtain the three-dimensional structural features of the immune cell receptor.
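The graph-convolution option can be illustrated with a minimal sketch that builds a residue contact graph from the three-dimensional coordinates and applies one mean-aggregation layer. The 8.0 distance cutoff, the single layer, the ReLU activation, the mean pooling, and the per-residue input features are all assumptions; the application only states that graph convolution is applied to the three-dimensional structural information.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Adjacency with self-loops: residues within `cutoff` of each other are neighbours."""
    n = len(coords)
    return [[1.0 if math.dist(coords[i], coords[j]) <= cutoff else 0.0
             for j in range(n)] for i in range(n)]

def graph_convolution(adj, feats, weights):
    """One layer: ReLU(D^-1 A X W), i.e. mean over neighbours then a linear map."""
    out = []
    for row in adj:
        deg = sum(row)  # at least 1 because of the self-loop
        mean = [sum(a * f[k] for a, f in zip(row, feats)) / deg
                for k in range(len(feats[0]))]
        out.append([max(0.0, sum(m * w for m, w in zip(mean, col)))
                    for col in zip(*weights)])
    return out

def structure_feature(coords, feats, weights):
    """Mean-pool the node outputs into one structural feature vector."""
    nodes = graph_convolution(contact_graph(coords), feats, weights)
    return [sum(col) / len(nodes) for col in zip(*nodes)]

# Four residues: the first three are in contact, the fourth is isolated.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]   # identity map, for readability
struct_feat = structure_feature(coords, feats, weights)
```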

[0211] In one possible implementation, the feature fusion unit 803 is further configured to fuse the gene features, sequence features, three-dimensional structural features, and physicochemical information of amino acids in the immune cell receptor using the antigen prediction model to obtain the receptor features of the immune cell receptor.

[0212] It should be noted that the antigen prediction device provided in the above embodiments is only illustrated by the division of the above functional modules when predicting antigens. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the computer device can be divided into different functional modules to complete all or part of the functions described above. In addition, the antigen prediction device and the antigen prediction method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.

[0213] The technical solution provided in this application allows the antigen prediction model to extract features from the gene information and sequence of immune cell receptors, obtaining the gene and sequence characteristics of the immune cell receptors. In the process of obtaining the receptor characteristics of immune cell receptors, gene features, sequence features, and three-dimensional structural features are integrated. The introduction of three-dimensional structural features enriches the content of receptor features and improves the expression ability of receptor features, thus resulting in higher accuracy of target antigens obtained when performing antigen prediction based on receptor features.

[0214] Figure 9 is a schematic diagram of the structure of a training device for an antigen prediction model provided in an embodiment of this application. Referring to Figure 9, the device includes: a training information input unit 901, a training feature extraction unit 902, a training feature fusion unit 903, an antigen prediction output unit 904, and a training unit 905.

[0215] The training information input unit 901 is used to input the gene information, sequence information, and three-dimensional structural features of the immune cell receptor of the sample into the antigen prediction model.

[0216] The training feature extraction unit 902 is used to extract features from the gene information and sequence information of the immune cell receptor of the sample through the antigen prediction model, so as to obtain the gene features and sequence features of the immune cell receptor of the sample.

[0217] The training feature fusion unit 903 is used to fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptor of the sample through the antigen prediction model to obtain the receptor features of the immune cell receptor of the sample.

[0218] The antigen prediction output unit 904 is used to apply full connection and normalization to the receptor features of the immune cell receptor of the sample using the antigen prediction model, and to output the probabilities that the immune cell receptor of the sample corresponds to multiple sample candidate antigens. Based on these probabilities, the predicted antigen corresponding to the immune cell receptor of the sample is determined from the multiple sample candidate antigens.
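The full connection, normalization, and selection steps can be sketched as a linear layer followed by softmax and an argmax. Treating the normalization as softmax is an assumption, and the weights, candidate names, and function name below are illustrative placeholders.

```python
import math

def predict_antigen(receptor_feat, weights, bias, candidates):
    """Return the most probable candidate antigen and the full probability list."""
    # Fully connected layer: one logit per candidate antigen.
    logits = [sum(w * x for w, x in zip(row, receptor_feat)) + b
              for row, b in zip(weights, bias)]
    # Normalization (softmax, computed stably by subtracting the maximum logit).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return candidates[best], probs

candidates = ["antigen_A", "antigen_B", "antigen_C"]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]
predicted, probs = predict_antigen([2.0, 0.5], weights, bias, candidates)
```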

[0219] Training unit 905 is used to train the antigen prediction model based on the difference information between the predicted antigen and the labeled antigen corresponding to the immune cell receptor of the sample.
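A training step driven by the difference between prediction and label could be sketched as below. The application does not name a loss function or optimiser; this example assumes cross-entropy loss, plain stochastic gradient descent, and a single linear output layer standing in for the whole model, with hypothetical names throughout.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, weights, bias):
    """Probabilities over candidate antigens for one receptor feature vector."""
    return softmax([sum(w * v for w, v in zip(row, x)) + b
                    for row, b in zip(weights, bias)])

def train_step(x, label, weights, bias, lr=0.5):
    """One SGD step on cross-entropy; returns the loss before the update."""
    probs = forward(x, weights, bias)
    loss = -math.log(probs[label])
    for k, row in enumerate(weights):
        grad = probs[k] - (1.0 if k == label else 0.0)   # d(loss)/d(logit_k)
        for j in range(len(row)):
            row[j] -= lr * grad * x[j]
        bias[k] -= lr * grad
    return loss

weights = [[0.0, 0.0] for _ in range(3)]
bias = [0.0] * 3
sample_feat, labeled_antigen = [1.0, -1.0], 2   # index of the labeled antigen
losses = [train_step(sample_feat, labeled_antigen, weights, bias) for _ in range(20)]
```

In a full model the same gradient would be propagated back through the fusion module and the encoders rather than stopping at the output layer.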

[0220] It should be noted that the training device for the antigen prediction model provided in the above embodiments is only illustrated by the division of the above functional modules when training the antigen prediction model. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the computer device can be divided into different functional modules to complete all or part of the functions described above. In addition, the training device for the antigen prediction model and the embodiments of the training method for the antigen prediction model provided above belong to the same concept, and their specific implementation process can be found in the method embodiments, which will not be repeated here.

[0221] This application provides a computer device for performing the above-described method. This computer device can be implemented as a terminal or a server. The structure of the terminal will be described below:

[0222] Figure 10 is a schematic diagram of the structure of a terminal provided in an embodiment of this application. The terminal 1000 can be a smartphone, tablet computer, laptop computer, or desktop computer. The terminal 1000 may also be referred to as user equipment, portable terminal, laptop terminal, desktop terminal, or other names.

[0223] Typically, terminal 1000 includes one or more processors 1001 and one or more memories 1002.

[0224] Processor 1001 may include one or more processing cores, such as a quad-core processor, an octa-core processor, etc. Processor 1001 may be implemented using at least one hardware form selected from DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 1001 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 1001 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, processor 1001 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.

[0225] The memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage media in the memory 1002 are used to store at least one computer program, which is executed by the processor 1001 to implement the antigen prediction method or antigen prediction model training method provided in the method embodiments of this application.

[0226] In some embodiments, the terminal 1000 may also optionally include a peripheral device interface 1003 and at least one peripheral device. The processor 1001, memory 1002, and peripheral device interface 1003 can be connected via a bus or signal line. Each peripheral device can be connected to the peripheral device interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes at least one of the following: a radio frequency circuit 1004, a display screen 1005, a camera assembly 1006, an audio circuit 1007, and a power supply 1008.

[0227] Peripheral device interface 1003 can be used to connect at least one I / O (Input / Output) related peripheral device to processor 1001 and memory 1002. In some embodiments, processor 1001, memory 1002 and peripheral device interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of processor 1001, memory 1002 and peripheral device interface 1003 can be implemented on separate chips or circuit boards, which is not limited in this embodiment.

[0228] The radio frequency (RF) circuit 1004 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The RF circuit 1004 communicates with communication networks and other communication devices via electromagnetic signals. The RF circuit 1004 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals back into electrical signals. Optionally, the RF circuit 1004 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, etc.

[0229] Display screen 1005 is used to display a user interface (UI). This UI may include graphics, text, icons, video, and any combination thereof. When display screen 1005 is a touch display screen, it also has the ability to collect touch signals on or above its surface. These touch signals can be input as control signals to processor 1001 for processing. In this case, display screen 1005 can also be used to provide virtual buttons and / or a virtual keyboard, also known as soft buttons and / or a soft keyboard.

[0230] The camera assembly 1006 is used to capture images or videos. Optionally, the camera assembly 1006 includes a front-facing camera and a rear-facing camera. Typically, the front-facing camera is located on the front panel of the terminal, and the rear-facing camera is located on the back of the terminal.

[0231] The audio circuit 1007 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, and convert the sound waves into electrical signals that are input to the processor 1001 for processing, or input to the radio frequency circuit 1004 to realize voice communication.

[0232] The power supply 1008 is used to supply power to the various components in the terminal 1000. The power supply 1008 can be AC power, DC power, a disposable battery, or a rechargeable battery.

[0233] In some embodiments, the terminal 1000 further includes one or more sensors 1009. The one or more sensors 1009 include, but are not limited to: an acceleration sensor 1010, a gyroscope sensor 1011, a pressure sensor 1012, an optical sensor 1013, and a proximity sensor 1014.

[0234] The acceleration sensor 1010 can detect the magnitude of acceleration on the three coordinate axes of a coordinate system established with the terminal 1000.

[0235] The gyroscope sensor 1011 can detect the orientation and rotation angle of the terminal 1000. The gyroscope sensor 1011 can work in conjunction with the acceleration sensor 1010 to collect the user's 3D movements on the terminal 1000.

[0236] The pressure sensor 1012 can be installed on the side bezel of the terminal 1000 and / or on the lower layer of the display screen 1005. When the pressure sensor 1012 is installed on the side bezel of the terminal 1000, it can detect the user's grip signal on the terminal 1000, and the processor 1001 can perform left / right hand recognition or quick operation based on the grip signal collected by the pressure sensor 1012. When the pressure sensor 1012 is installed on the lower layer of the display screen 1005, the processor 1001 can control the operable controls on the UI interface based on the user's pressure operation on the display screen 1005.

[0237] The optical sensor 1013 is used to collect ambient light intensity. In one embodiment, the processor 1001 can control the display brightness of the display screen 1005 based on the ambient light intensity collected by the optical sensor 1013.

[0238] The proximity sensor 1014 is used to detect the distance between the user and the front of the terminal 1000.

[0239] Those skilled in the art will understand that the structure shown in Figure 10 does not constitute a limitation on the terminal 1000, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.

[0240] The aforementioned computer equipment can also be implemented as a server. The structure of a server is described below:

[0241] Figure 11 is a schematic diagram of the structure of a server provided in an embodiment of this application. The server 1100 can vary significantly depending on its configuration or performance. It may include one or more processors (Central Processing Units, CPUs) 1101 and one or more memories 1102. The one or more memories 1102 store at least one computer program, which is loaded and executed by the one or more processors 1101 to implement the methods provided in the above-described method embodiments. Of course, the server 1100 may also have wired or wireless network interfaces, a keyboard, and input / output interfaces for input and output. The server 1100 may also include other components for implementing device functions, which will not be elaborated upon here.

[0242] In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including a computer program that can be executed by a processor to perform the antigen prediction method or the antigen prediction model training method in the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, or an optical data storage device.

[0243] In an exemplary embodiment, a computer program product or computer program is also provided, which includes program code stored in a computer-readable storage medium. A processor of a computer device reads the program code from the computer-readable storage medium and executes the program code, causing the computer device to perform the above-described antigen prediction method or antigen prediction model training method.

[0244] In some embodiments, the computer program involved in the present application embodiments may be deployed and executed on a computer device, or executed on multiple computer devices located in one location, or executed on multiple computer devices distributed in multiple locations and interconnected through a communication network. Multiple computer devices distributed in multiple locations and interconnected through a communication network may constitute a blockchain system.

[0245] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0246] The above are merely optional embodiments of this application and are not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. An antigen prediction method, characterized in that the method includes: inputting the gene information, sequence information, and three-dimensional structural features of an immune cell receptor into an antigen prediction model, wherein the sequence information is the amino acid sequence of the immune cell receptor, and the three-dimensional structural features represent the three-dimensional coordinates of multiple amino acids in the immune cell receptor; extracting features from the gene information and sequence information of the immune cell receptor using the antigen prediction model to obtain the gene features and sequence features of the immune cell receptor; fusing the gene features, sequence features, and three-dimensional structural features of the immune cell receptor using the antigen prediction model to obtain the receptor features of the immune cell receptor; applying full connection and normalization to the receptor features of the immune cell receptor using the antigen prediction model, and outputting the probabilities that the immune cell receptor corresponds to multiple candidate antigens; and determining, based on the probabilities that the immune cell receptor corresponds to the multiple candidate antigens, a target antigen from the multiple candidate antigens, wherein the target antigen is an antigen that can specifically bind to the immune cell receptor.

2. The method according to claim 1, characterized in that, The step of extracting features from the gene and sequence information of the immune cell receptor using the antigen prediction model to obtain the gene and sequence features of the immune cell receptor includes: The VDJ information of the immune cell receptor is encoded by the gene encoder of the antigen prediction model to obtain the gene characteristics of the immune cell receptor, wherein V is the coding variable region, D is the coding hypervariable region, and J is the coding crosslinking region. The sequence encoder of the antigen prediction model encodes the amino acid sequence of the immune cell receptor to obtain the sequence characteristics of the immune cell receptor.

3. The method according to claim 2, characterized in that, The encoding of the VDJ information of the immune cell receptor to obtain the gene characteristics of the immune cell receptor includes any one of the following: When the immune cell receptor is a B cell receptor, the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor are encoded to obtain the gene characteristics of the immune cell receptor. When the immune cell receptor is a T cell receptor, the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor are encoded to obtain the gene characteristics of the immune cell receptor.

4. The method according to claim 3, characterized in that the encoding of the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor to obtain the gene characteristics of the immune cell receptor includes: applying a full connection to the VJ information of the light chain and the VDJ information of the heavy chain of the immune cell receptor to obtain the gene characteristics of the immune cell receptor, which include the light chain gene characteristics and the heavy chain gene characteristics of the immune cell receptor; and the encoding of the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor to obtain the gene characteristics of the immune cell receptor includes: applying a full connection to the VJ information of the α chain and the VDJ information of the β chain of the immune cell receptor to obtain the gene characteristics of the immune cell receptor, which include the α chain gene characteristics and the β chain gene characteristics of the immune cell receptor.

5. The method according to claim 2, characterized in that, The sequence encoder of the antigen prediction model encodes the amino acid sequence of the immune cell receptor to obtain the sequence characteristics of the immune cell receptor, including any one of the following: In the case where the immune cell receptor is a B cell receptor, the sequence encoder of the antigen prediction model encodes the amino acid sequence of the light chain and the amino acid sequence of the heavy chain of the immune cell receptor based on the attention mechanism to obtain the sequence features of the immune cell receptor. The sequence features of the immune cell receptor include the light chain sequence features and the heavy chain sequence features of the immune cell receptor. In the case where the immune cell receptor is a T cell receptor, the sequence encoder of the antigen prediction model encodes the amino acid sequence of the α chain and the amino acid sequence of the β chain of the immune cell receptor based on the attention mechanism to obtain the sequence features of the immune cell receptor. The sequence features of the immune cell receptor include the α chain sequence features and the β chain sequence features of the immune cell receptor.

6. The method according to claim 1, characterized in that fusing the gene features, sequence features, and three-dimensional structural features of the immune cell receptor using the antigen prediction model to obtain the receptor features of the immune cell receptor includes: splicing the gene features and sequence features of the immune cell receptor through the feature fusion module of the antigen prediction model to obtain the gene sequence fusion feature of the immune cell receptor; and performing, based on a gated attention mechanism, weighted fusion of the gene sequence fusion feature and the three-dimensional structural features of the immune cell receptor to obtain the receptor features of the immune cell receptor.

7. The method according to claim 1, characterized in that, Before inputting the gene information, sequence information, and three-dimensional structural features of the immune cell receptor into the antigen prediction model, the method includes: Obtain the target amino acid sequence of the immune cell receptor, wherein the target amino acid sequence includes the CDR3 region of the immune cell receptor; Multiple sequence alignment is performed on the target amino acid sequence of the immune cell receptor to obtain at least one reference amino acid sequence, and the similarity between the reference amino acid sequence and the target amino acid sequence meets the similarity condition. Obtain the homology template corresponding to the target amino acid sequence, wherein the homology template includes the structural information of the homology sequence of the target amino acid sequence; Based on the target amino acid sequence, the at least one reference amino acid sequence, and the homologous template, multiple rounds of iteration are performed to obtain the three-dimensional structural features of the immune cell receptor.

8. The method according to claim 1, characterized in that, Before inputting the gene information, sequence information, and three-dimensional structural features of the immune cell receptor into the antigen prediction model, the method includes: Obtain the three-dimensional structural information of the immune cell receptor, wherein the three-dimensional structural information includes the three-dimensional coordinates of multiple amino acids in the immune cell receptor; The method further includes any of the following: The three-dimensional structural information of the immune cell receptor is subjected to graph convolution to obtain the three-dimensional structural features of the immune cell receptor. The three-dimensional structural information of the immune cell receptor is encoded based on the attention mechanism to obtain the three-dimensional structural features of the immune cell receptor.

9. The method according to claim 1, characterized in that, After extracting features from the gene and sequence information of the immune cell receptor using the antigen prediction model to obtain the gene and sequence features of the immune cell receptor, the method further includes: The antigen prediction model integrates the gene characteristics, sequence characteristics, three-dimensional structural characteristics, and physicochemical information of amino acids in the immune cell receptor to obtain the receptor characteristics of the immune cell receptor.

10. A training method for an antigen prediction model, characterized in that the method includes: inputting the gene information, sequence information, and three-dimensional structural features of a sample immune cell receptor into the antigen prediction model, wherein the sequence information is the amino acid sequence of the sample immune cell receptor, and the three-dimensional structural features represent the three-dimensional coordinates of multiple amino acids in the sample immune cell receptor; extracting features from the gene information and sequence information of the sample immune cell receptor using the antigen prediction model to obtain the gene features and sequence features of the sample immune cell receptor; fusing the gene features, sequence features, and three-dimensional structural features of the sample immune cell receptor using the antigen prediction model to obtain the receptor features of the sample immune cell receptor; applying full connection and normalization to the receptor features of the sample immune cell receptor using the antigen prediction model, and outputting the probabilities that the sample immune cell receptor corresponds to multiple sample candidate antigens; determining, based on the probabilities that the sample immune cell receptor corresponds to the multiple sample candidate antigens, the predicted antigen corresponding to the sample immune cell receptor from the multiple sample candidate antigens; and training the antigen prediction model based on the difference information between the predicted antigen and the labeled antigen corresponding to the sample immune cell receptor.

11. An antigen prediction device, characterized in that the device includes: an input unit, used to input the gene information, sequence information, and three-dimensional structural features of an immune cell receptor into an antigen prediction model, wherein the sequence information is the amino acid sequence of the immune cell receptor, and the three-dimensional structural features represent the three-dimensional coordinates of multiple amino acids in the immune cell receptor; a feature extraction unit, used to extract features from the gene information and sequence information of the immune cell receptor through the antigen prediction model to obtain the gene features and sequence features of the immune cell receptor; a feature fusion unit, used to fuse the gene features, sequence features, and three-dimensional structural features of the immune cell receptor through the antigen prediction model to obtain the receptor features of the immune cell receptor; and an antigen prediction unit, used to apply full connection and normalization to the receptor features of the immune cell receptor through the antigen prediction model, to output the probabilities that the immune cell receptor corresponds to multiple candidate antigens, and to determine, based on those probabilities, a target antigen from the multiple candidate antigens, wherein the target antigen is an antigen that can specifically bind to the immune cell receptor.

12. A training device for an antigen prediction model, characterized in that the device includes: a training information input unit, used to input the gene information, sequence information, and three-dimensional structural features of a sample immune cell receptor into the antigen prediction model, wherein the sequence information is the amino acid sequence of the sample immune cell receptor, and the three-dimensional structural features represent the three-dimensional coordinates of multiple amino acids in the sample immune cell receptor; a training feature extraction unit, used to extract features from the gene information and sequence information of the sample immune cell receptor through the antigen prediction model to obtain the gene features and sequence features of the sample immune cell receptor; a training feature fusion unit, used to fuse the gene features, sequence features, and three-dimensional structural features of the sample immune cell receptor through the antigen prediction model to obtain the receptor features of the sample immune cell receptor; an antigen prediction output unit, used to apply full connection and normalization to the receptor features of the sample immune cell receptor through the antigen prediction model, to output the probabilities that the sample immune cell receptor corresponds to multiple sample candidate antigens, and to determine, based on those probabilities, the predicted antigen corresponding to the sample immune cell receptor from the multiple sample candidate antigens; and a training unit, used to train the antigen prediction model based on the difference information between the predicted antigen and the labeled antigen corresponding to the sample immune cell receptor.

13. A computer device, characterized in that, The computer device includes one or more processors and one or more memories, wherein at least one computer program is stored in the one or more memories, and the computer program is loaded and executed by the one or more processors to implement the antigen prediction method as described in any one of claims 1 to 9, or to implement the training method for the antigen prediction model as described in claim 10.

14. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores at least one computer program, which is loaded and executed by a processor to implement the antigen prediction method as described in any one of claims 1 to 9, or to implement the training method for the antigen prediction model as described in claim 10.

15. A computer program product, comprising a computer program, characterized in that, When executed by a processor, the computer program implements the antigen prediction method according to any one of claims 1 to 9, or implements the training method for the antigen prediction model according to claim 10.