Method and assay device for predicting T cell activation of peptide-MHC

By analyzing patient genetic data and using neural network models to predict T cell activation of peptide-MHC, the challenge of discovering highly reactive neoantigens has been solved, enabling high-precision prediction of T cell activation and the development of personalized anti-cancer vaccines, thus improving the effectiveness of cancer immunotherapy.

CN117121109B · Active · Publication Date: 2026-06-30 · PETMEDIX GMBH +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PETMEDIX GMBH
Filing Date
2021-12-16
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Current technologies struggle to efficiently discover novel antigens that are highly reactive to T cells, thus affecting the effectiveness of cancer immunotherapy.

Method used

By analyzing patients' genetic data, a neural network model is used to predict T cell activation of peptide-MHC, and interferon-γ secretion is used as a reference to quickly select new antigens with high T cell activation.

Benefits of technology

This enables high-precision prediction of T-cell activation of antigenic peptide-MHC, supports the development of personalized anti-cancer vaccines, and improves the targeting of cancer immunotherapy.

✦ Generated by Eureka AI based on patent content.


Abstract

A method for predicting T cell activation by peptide-MHC includes the following steps, performed by an analytical device: receiving genetic data from a patient; identifying, based on the genetic data, a first amino acid sequence of the major histocompatibility complex (MHC) and a second amino acid sequence of an antigen produced by tumor cells; generating a matrix indicating the relationship between the first and second amino acid sequences in units of individual amino acids; and inputting the matrix into a trained neural network model to determine whether T cells secrete at least a threshold amount of cytokines due to the binding of the MHC to the antigen.

Description

Technical Field

[0001] The following description relates to techniques for predicting T cell activation of antigen peptide-major histocompatibility complex (MHC).

Background Technology

[0002] Neoantigens are tumor cell-specific proteins that are expressed as a result of tumor cell-specific mutations. Neoantigen epitopes are presented on the major histocompatibility complex (MHC) located on the surface of tumor cells, and T cells recognize the MHC-presented epitopes to trigger an immune response.

[0003] Cancer immunotherapy is a treatment that activates the body's immune system to kill tumor cells. Research is underway to discover neoantigens that are effective in the field of cancer immunotherapy.

Summary of the Invention

[0004] [Technical Issues]

[0005] The following description provides computer techniques for discovering novel antigens that are highly reactive to T cells.

[0006] [Technical Solution]

[0007] In one general aspect, there is a method for predicting T cell activation of the peptide-major histocompatibility complex (MHC), comprising: receiving genetic data of a patient via an analytical device; identifying, via the analytical device, a first amino acid sequence of the MHC and a second amino acid sequence of an antigen produced by tumor cells based on the genetic data; generating, via the analytical device, a matrix indicating the relationship between the first amino acid sequence and the second amino acid sequence in a single amino acid unit; and inputting the matrix via the analytical device into a trained neural network model to determine whether the T cells secrete cytokines greater than or equal to a threshold due to the binding of the MHC to the antigen.

[0008] In another aspect, there is an analytical device for predicting T cell activation of peptide-MHC, comprising: an input device configured to receive genetic data of a patient; a storage device configured to store a neural network model that predicts the amount of cytokine secreted by T cells based on a matrix representing the relationship between the amino acid sequences of MHC and the amino acid sequences of an antigen produced by tumor cells; and a computing device configured to identify a first amino acid sequence of the MHC and a second amino acid sequence of the antigen produced by the tumor cells from the genetic data, generate a matrix representing the relationship between the first amino acid sequence and the second amino acid sequence on a per-amino acid basis, and input the generated matrix into the neural network model to determine whether the patient's MHC-antigen induces interferon-γ secretion by the T cells.

[0009] [Beneficial Effects]

[0010] The techniques described below use deep learning models to rapidly select neoantigens with high T-cell activation from a patient's candidate peptides. These techniques also use interferon-γ secretion levels as a reference to accurately predict T-cell activation of the antigen peptide-major histocompatibility complex (MHC).

Attached Figure Description

[0011] Figure 1 is an example of a system for predicting T cell activation of the peptide-major histocompatibility complex (MHC).

[0012] Figure 2 is an example of the process of developing personalized cancer vaccines.

[0013] Figure 3 is an example of the process of training a neural network model.

[0014] Figure 4 is an example of a process that generates a matrix illustrating peptide-MHC interactions.

[0015] Figure 5 is an example of the process used to predict T cell activation of peptide-MHC.

[0016] Figure 6 is an example of an analytical device used to predict T cell activation of peptide-MHC.

[0017] Figure 7 is an example of experimental results that validate a neural network model.

[0018] Figure 8 is another example of experimental results validating the neural network model.

Detailed Implementation

[0019] This disclosure can be modified in various ways and has multiple exemplary embodiments. Therefore, specific exemplary embodiments of this disclosure will be shown and described in detail in the accompanying drawings. However, it should be understood that the invention is not limited to the specific exemplary embodiments, but includes all modifications, equivalents, and substitutions without departing from the scope and spirit of the invention.

[0020] Terms such as “first,” “second,” “A,” and “B” may be used to describe various components, but these components should not be construed as limited to these terms and are used only to distinguish one component from others. For example, without departing from the scope of this disclosure, a first component may be named a second component, and a second component may similarly be named a first component. The term “and / or” includes a combination of or any one of a plurality of related descriptive terms. It should be understood that singular expressions include plural expressions unless the context clearly indicates otherwise, and it should also be understood that the terms “comprising” and “having” as used in this specification specify the presence of the stated features, steps, operations, components, parts, or combinations thereof, but do not exclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

[0021] Before describing the accompanying drawings in detail, it is intended to clarify that the components in this specification are distinguished only by their primary functions. That is, two or more components described below can be combined into one component, or a component can be divided into two or more parts for each sub-function. Furthermore, each component described below, in addition to its primary function, may also perform some or all of the functions of other components, and some primary functions of a component may be performed specifically by other components.

[0022] Furthermore, when executing a method or operation, each process constituting the method may occur in a different order than specified unless a particular order is explicitly described in the context. That is, the individual steps may be executed in the same order as described, may be executed at substantially the same time, or may be executed in the reverse of the described order.

[0023] The terminology used in the following description will be described.

[0024] Antigens are substances that induce an immune response.

[0025] Neoantigens are tumor cell-specific antigens generated by mutations or post-translational modifications in tumor cells. Neoantigens can include polypeptide or nucleotide sequences. Mutations can include any genomic or expression alterations that result in frameshifts, insertions, deletions, substitutions, splice site changes, genomic rearrangements, gene fusions, or de novo open reading frames (ORFs). Additionally, mutations can include splice variants. Tumor cell-specific post-translational modifications may include aberrant phosphorylation. Tumor cell-specific post-translational modifications can also include proteasome-generated splice antigens.

[0026] Epitopes can refer to specific parts of an antigen that antibodies or T-cell receptors typically bind to.

[0027] The major histocompatibility complex (MHC) is a peptide structure that acts as a mediator so that target substances of an immune response are recognized as antigens. The human MHC is called human leukocyte antigen (HLA). In the following text, the term MHC is used to include human HLA.

[0028] A peptide is an amino acid polymer. The techniques described below correspond to techniques used for discovering neoantigens. The peptides used in the following description are amino acid polymers or amino acid sequences expressed in tumor cells. Therefore, the following peptides may be tumor-specific amino acid polymers or amino acid sequences expressed on the surface of tumor cells.

[0029] Peptide-MHC (pMHC) or peptide-MHC complexes are structures of peptides and MHC expressed on the surface of tumor cells. T cells recognize peptide-MHC complexes and execute immune responses.

[0030] Binding degree is the degree of binding between MHC and peptide. Binding preference or binding affinity is the degree of binding affinity between MHC molecules and peptides.

[0031] The sample is a single cell or multiple cells, cell debris, body fluid, etc. from the object to be analyzed.

[0032] The object can be a cell, tissue, or organism. Generally, the object is obtained from a patient with a specific tumor. The object is essentially a person, but is not limited to this.

[0033] An exome is a subset of the genome that encodes proteins. An exome can refer to a group of exons present in a cell, cell population, or organism.

[0034] Genetic data refers to genetic information obtained through the analysis of samples. For example, genetic data can include base sequences obtained from deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or proteins from cells, tissues, etc.; gene expression data; genetic variations relative to reference genetic data; DNA methylation, etc. Genetic data can be obtained through traditional sequencing methods, next-generation sequencing (NGS), etc. Genetic data is usually digital data and can be produced as files in a specific format (e.g., FASTQ).

[0035] Machine learning, a field of artificial intelligence, refers to the area of algorithms developed to enable computers to learn. Learning models include decision trees, random forests (RF), k-nearest neighbors (KNN), Naive Bayes, support vector machines (SVM), and artificial neural networks. The techniques described below utilize artificial neural networks. The following descriptions will focus on artificial neural networks or neural network models.

[0036] Artificial neural networks are statistical learning algorithms that mimic biological neural networks. Various neural network models are being researched. Recently, deep neural networks (DNNs) have attracted considerable attention. A DNN is an artificial neural network model consisting of multiple hidden layers between the input and output layers. Like general artificial neural networks, DNNs can model complex nonlinear relationships. Various types of DNN models have been studied. Examples of DNNs include convolutional neural networks (CNNs), recurrent neural networks (RNNs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs), generative adversarial networks (GANs), relation networks (RL), and so on.

[0037] An analytical device is a tool that identifies specific tumor neoantigens in patient samples. It predicts T-cell activation associated with specific peptide-MHC. The device can process and analyze data using installed programs or code.

[0038] Figure 1 is an example of a system 100 for predicting T cell activation of peptide-MHC. In Figure 1, analysis devices 130, 140, and 150 predict T cell activation. In Figure 1, the analysis devices are shown in the form of a server 130 and computer terminals 140 and 150. However, analysis devices 130, 140, and 150 can be implemented in various forms.

[0039] Gene analysis device 110 generates genetic data by analyzing a patient's sample. For example, gene analysis device 110 may be an NGS analysis device. Since peptides expressed by tumor cells are the analysis targets, gene analysis device 110 can perform exome sequencing. Detailed description of whole-exome sequencing is omitted. Gene analysis device 110 can store the generated genetic data in a separate DB 120.

[0040] Server 130 receives genetic data from gene analysis device 110 or DB 120. Server 130 provides services for predicting T cell activation by analyzing the genetic data.

[0041] Computer terminal 140 receives genetic data from gene analysis device 110 or DB 120. Computer terminal 140 analyzes the genetic data to predict T cell activation.

[0042] Computer terminal 150 receives genetic data through a medium (e.g., Universal Serial Bus (USB), Secure Digital (SD) card, etc.) in which genetic data generated by gene analysis device 110 is stored. Computer terminal 150 analyzes the genetic data to predict T cell activation.

[0043] Analysis devices 130, 140, and 150 predict the T cell activation level of the peptide-MHC currently being analyzed, and when T cell activation is greater than or equal to a threshold, the current peptide can be identified as a neoantigen candidate. Analysis devices 130, 140, and 150 can input information about the peptide-MHC into a previously constructed neural network model to predict the level of T cell activation. The process of using neural network models with analysis devices 130, 140, and 150 to predict T cell activation of a specific peptide-MHC is described below.

[0044] Users 10, 20, and 30 may be researchers or medical personnel developing neoantigens and vaccines. Users 10, 20, and 30 can confirm the degree of T cell activation of specific peptide-MHC in the sample. Additionally, users 10, 20, and 30 can identify effective neoantigens in the sample. User 10 can access server 130 and confirm the analysis results performed by server 130 via a user terminal (PC, smartphone, etc.). User 20 can confirm the analysis results via computer terminal 140 used by user 20. User 30 can confirm the analysis results via computer terminal 150 used by user 30.

[0045] Figure 2 is an example of a process (200) for developing personalized cancer vaccines. Figure 2 includes the process in which an analytical device uses a neural network model to predict T cell activation and neoantigens are discovered based on the prediction results.

[0046] The analysis device constructs a neural network model by training it with training data (210). The process of training the neural network model is described below. Training can also be performed by a separate computer device other than the analysis device. That is, the entity that trains the neural network model and the entity that uses the neural network model for analysis can be different from each other.

[0047] The analytical device receives the results of gene sequence analysis (genetic data) from patient samples (e.g., tumor tissue). The analytical device can identify tumor-specific mutation sequences in the genetic data. The analytical device can identify mutation sequences in tumor tissue sequences based on normal tissue sequences or reference sequences. The analytical device can identify tumor-specific mutation sequences as tumor-specific antigens (220). That is, the analytical device can select tumor-specific mutation sequences as neoantigen candidates. The analytical device can select multiple neoantigen candidates. It is assumed and described that multiple neoantigen candidates have been identified.
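As a rough illustration of the candidate-selection step in [0047], the following sketch compares a tumor protein sequence with a reference sequence and lists the fixed-length peptides that span a substituted position. The function name `mutant_peptides`, the toy sequences, and the restriction to substitutions of equal-length sequences are assumptions made for illustration; the description does not specify how mutation sequences are extracted.

```python
def mutant_peptides(reference, tumor, length=9):
    """Return tumor peptides of the given length that cover a substitution.

    Assumes both sequences have equal length, so only substitutions
    (not insertions, deletions, or frameshifts) are handled.
    """
    if len(reference) != len(tumor):
        raise ValueError("this sketch only handles substitutions")
    mutated = [i for i, (r, t) in enumerate(zip(reference, tumor)) if r != t]
    peptides = []
    for start in range(len(tumor) - length + 1):
        # Keep the window only if it contains at least one mutated position.
        if any(start <= pos < start + length for pos in mutated):
            peptides.append(tumor[start:start + length])
    return peptides


reference = "MKTAYIAKQRQISFVK"  # toy reference (normal tissue) sequence
tumor = "MKTAYIAKQLQISFVK"      # toy tumor sequence, R -> L at index 9
candidates = mutant_peptides(reference, tumor)
```

Each returned peptide would then be treated as one neoantigen candidate in the steps that follow.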

[0048] The analytical device predicts T cell activation by selecting specific candidates from a pool of neoantigen candidates. It can use genetic data to identify the gene sequences of specific candidates. Based on these gene sequences, the device can determine the amino acid sequences of the candidate antigens. Additionally, the device can use genetic data to identify the patient's MHC sequence. Based on this MHC sequence, the device can determine the amino acid sequences of the MHC.

[0049] The analysis device uses a previously constructed neural network model to predict T cell activation for candidate antigens (230). The analysis device predicts T cell activation by inputting information about the candidate antigen into the neural network model. The analysis device can predict T cell activation by inputting the amino acid sequence of the candidate antigen and the amino acid sequence of the MHC into the neural network model. The analysis device can generate a matrix representing the interaction or affinity between the candidate antigen and the MHC based on the amino acid sequence of the candidate antigen and the MHC. The analysis device can predict T cell activation of the candidate antigen by inputting the generated matrix into the neural network model. The operation of the neural network model is described below.

[0050] The neural network model outputs information about T cell activation for the input candidate antigen. For example, the neural network model can output whether T cells are active or inactive in response to the candidate antigen being analyzed. T cell activity can be determined by whether the activity is greater than or equal to a certain threshold.

[0051] The analytical device determines whether T cell activation is greater than or equal to a threshold (240). Here, T cell activity can be determined based on the amount of cytokines secreted by T cells. When T cell activation for the current candidate antigen is greater than or equal to the threshold (YES in 240), the analytical device adds the current candidate antigen (peptide) to the target candidate group (250). The target candidate group consists of tumor-specific neoantigen candidates that may become targets for immunotherapy. The analytical device can confirm whether the prediction of T cell activation has been completed for all candidate antigens of the sample (260). When it has not (NO in 260), the analytical device selects the next antigen from among the candidate antigens for which T cell activation has not yet been predicted (270) and repeats the process of determining T cell activation for that antigen. The analytical device thus performs the process up to extracting the target candidate group.
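The loop over steps 230 to 270 can be sketched as follows. This is a minimal sketch, not the patented implementation: `screen_candidates`, the 0.5 threshold, and the lookup table standing in for the trained model are all hypothetical.

```python
def screen_candidates(candidates, predict_activation, threshold=0.5):
    """Collect candidates whose predicted activation reaches the threshold."""
    target_group = []
    for peptide in candidates:               # steps 260/270: next unprocessed candidate
        score = predict_activation(peptide)  # step 230: model prediction
        if score >= threshold:               # step 240: compare with threshold
            target_group.append(peptide)     # step 250: add to target candidate group
    return target_group


# Hypothetical stand-in for the trained model: a fixed table of scores.
toy_scores = {"KTAYIAKQL": 0.91, "TAYIAKQLQ": 0.32, "AYIAKQLQI": 0.50}
selected = screen_candidates(toy_scores, toy_scores.get)
```

Because the comparison is "greater than or equal to", a candidate scoring exactly at the threshold is kept.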

[0052] Once the prediction of T-cell activation for all candidate antigens is complete, researchers can conduct additional validation experiments on the current target candidate group (280). Furthermore, researchers can design vaccines targeting neoantigens with high T-cell activation for current patients (290). Cancer vaccines produced through this process are patient-specific and target only tumor cells.

[0053] The process of constructing the aforementioned neural network model will be described. The following describes the process by which the researchers actually constructed a neural network model.

[0054] Researchers collected information on peptide-MHC (pMHC) from open databases, including IEDB, etc. They collected pMHC data from humans and mice from IEDB, IMMA2, MHCBN, and various other sources. In addition, they collected information on pMHC-related T cell activation. T cell activation is determined based on the amount of cytokines secreted by T cells. More specifically, they assessed T cell activation for specific pMHCs based on the amount of interferon-gamma (IFNγ) secreted by T cells. For this purpose, they selected data on IFNγ secretion levels from T cells with the corresponding pMHCs from open database pMHC data.

[0055] The data collected by the researchers included HLA type and peptide length. MHC class I peptides were 9-mers in length, and MHC class II peptides were 15-mers in length. In addition, immunogenicity label values for each pMHC were used.

[0056] Unlike MHC I, HLA-DP and HLA-DQ in MHC II are heterodimers, so experimental data are presented as HLA-DQA / HLA-DQB pairs. As described below, because the molecular distance between the antigen and the MHC was used as training data, researchers only used the β-chain that directly acts on the antigen. The training data were adjusted to balance antigen and non-antigen data. Finally, researchers prepared 13,128 MHC I data and 6,650 MHC II data. The data collected by researchers are shown in Table 1 below. In Table 1, "Individual research" refers to data obtained through individual studies. Immunogenic peptides refer to tumor-specific neoantigens.

[0057] [Table 1]

[0058]
Type                 Source               Total peptides  Immunogenic peptides  Peptide length
Human MHC Class I    IEDB                 15925           4045                  9
Human MHC Class I    Individual research  2613            557                   8~11
Human MHC Class I    IMMA2                1085            558                   9
Mouse MHC Class I    IEDB                 4373            1162                  9
Mouse MHC Class I    MHCBN                324             260                   9
Mouse MHC Class I    Individual research  258             165                   9~15
Human MHC Class II   IEDB                 6841            3968                  15
Human MHC Class II   MHCBN                1014            416                   15
Mouse MHC Class II   IEDB                 3461            452                   15
Mouse MHC Class II   MHCBN                128             125                   15

[0059] Figure 3 is an example of a process (300) for training a neural network model.

[0060] Peptide DB A can store information related to peptide-MHC. Figure 3 shows only an open database such as IEDB. However, training data can include both open databases and data obtained by researchers (developers) through separate experiments. Peptide DB A can be a device existing on a network. Alternatively, peptide DB A can be a device connected to or embedded in computer device B.

[0061] Computer device B can use training data to learn a neural network model. Computer device B can be an analytical device for predicting T cell activation or a dedicated device for training.

[0062] Computer device B extracts training data (310) from peptide DB A. For example, as shown in Figure 3, training data can include the MHC class, the amino acid sequences of antigen candidates, and the amount of IFNγ secreted by T cells for the corresponding peptide-MHC. IFNγ secretion can be categorized into cases greater than or equal to a threshold (high, H) and cases less than the threshold (low, L), with the threshold serving as a reference for determining T cell activation.
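The H/L categorization described in [0062] amounts to a single threshold comparison per record. The sketch below assumes a record layout of (MHC allele, peptide, IFNγ amount); the function name `label_ifng`, the numeric values, their units, and the threshold of 50.0 are hypothetical and are not taken from the source data.

```python
def label_ifng(records, threshold):
    """Attach 'H' (>= threshold) or 'L' (< threshold) to each training record."""
    labeled = []
    for mhc, peptide, ifng in records:
        labeled.append((mhc, peptide, "H" if ifng >= threshold else "L"))
    return labeled


# Hypothetical records: (MHC allele, peptide, measured IFN-gamma amount).
records = [
    ("HLA-A*02:01", "KTAYIAKQL", 120.0),
    ("HLA-A*02:01", "TAYIAKQLQ", 15.0),
    ("HLA-A*02:01", "AYIAKQLQI", 50.0),  # exactly at the threshold
]
training_labels = label_ifng(records, threshold=50.0)
```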

[0063] Computer device B generates a matrix (320) representing the interaction between the peptide of a specific antigen to be analyzed and the MHC. Interaction can refer to the affinity between a peptide and an MHC. The process of generating the matrix is described below.

[0064] Computer device B predicts T cell activation of the current peptide-MHC by inputting the generated matrix into a neural network model. The neural network model outputs information indicating T cell activation (high IFNγ secretion) or inactivity (low IFNγ secretion) for the peptide-MHC. Computer device B updates the weights (330) of the neural network model based on the label value (IFNγ secretion) of the current input peptide-MHC. The neural network model can be trained through a backpropagation process.

[0065] Researchers used a CNN model to predict T cell activation of peptide-MHC. Of course, other neural network models besides CNN can also be used to predict T cell activation. The following description focuses on neural network models using CNN.

[0066] CNNs can include convolutional layers (Conv), pooling layers, and fully connected layers (FC). Multiple convolutional and pooling layers can be repeatedly arranged.

[0067] CNN model 400 predicts peptide-MHC binding levels based on input data (interaction graph). CNN model 400 includes multiple convolutional layers 410 and 420, a fully connected layer 430, and an output layer 440. As shown in Figure 5, the convolutional part can consist of two layers.

[0068] Convolutional layers perform convolution operations on the input data and output values obtained by applying the Rectified Linear Unit (ReLU) function to the convolution values. The convolution operation multiplies the input values by a weight matrix. The weights can be updated during the training process. Convolutional layers extract peptide-MHC interaction features. Here, the input data can include parameters representing the degree of interaction between amino acid pairs.

[0069] Fully connected layers integrate input information. They receive the output values from convolutional layers as input. Fully connected layers can perform ReLU operations.

[0070] The output layer uses the sigmoid function to output information about the level of T cell activation or whether T cells are active for a given peptide-MHC.

[0071] The final output value of the neural network model can be a value between 0 and 1. Analysis devices can determine whether T cells are active or inactive by comparing the output value of the neural network model with a threshold.

[0072] The model is explained further with reference to Figure 3.

[0073] Convolutional layers use a specific number of kernels or weight matrices to perform convolution. Convolution can be a one-dimensional or two-dimensional operation, etc. All convolution results are transformed by ReLU. ReLU converts negative values to zero. Figure 3 shows two convolutional layers.

[0074] The first convolutional layer detects combined patterns in the input data. The first convolutional layer can use a window with a stride of 1. The operation of the convolutional layer is shown in Equation 1 below. The second convolutional layer can have the same structure as the first convolutional layer. Alternatively, the second convolutional layer can have a different window size or stride width than the first convolutional layer.

[0075] [Equation 1]

[0076] $$\mathrm{conv}(X)_{i,k} = \mathrm{ReLU}\left(\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} W^{k}_{m,n}\, X_{i+m,\,n}\right)$$

[0077] X represents the input data, i represents the index of the output position, and k represents the kernel index. Each convolution kernel W^k corresponds to a weight matrix of size M×N, where M represents the window size and N represents the number of input channels.
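To make the roles of i, k, M, and N concrete, the following sketch implements a stride-1 convolution followed by ReLU in plain Python. It assumes the common form in which each output is the ReLU of the sum, over window offsets m and channels n, of W^k[m][n] · X[i+m][n]; treating one axis of the input as positions and the other as channels is likewise an assumption, as are the function names and toy values.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def conv1d_relu(X, kernels):
    """Valid 1-D convolution with stride 1 followed by ReLU.

    X       : list of L positions, each a list of N channel values
    kernels : list of K kernels, each an M x N weight matrix
    returns : list of (L - M + 1) positions, each a list of K activations
    """
    L, N = len(X), len(X[0])
    M = len(kernels[0])
    out = []
    for i in range(L - M + 1):          # output position index i
        row = []
        for W in kernels:               # kernel index k
            total = 0.0
            for m in range(M):          # offset inside the window
                for n in range(N):      # input channel
                    total += W[m][n] * X[i + m][n]
            row.append(relu(total))
        out.append(row)
    return out


X = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.0, 1.0]]  # L=4 positions, N=2 channels
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],    # kernel 0 (M=2, N=2)
    [[-1.0, 0.0], [0.0, -1.0]],  # kernel 1: negated copy, clipped to zero by ReLU here
]
features = conv1d_relu(X, kernels)
```

With no pooling and a stride of 1, the output keeps one row per window position, which matches the stated aim of preserving positional information.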

[0078] Pooling layers can be omitted. Pooling is a process of reducing the dimensionality of data. Even amino acids that are far apart can affect the interaction between the peptide-MHC complex and the T cell receptor. Therefore, CNNs can extract features without using pooling layers while maintaining the size of the input data.

[0079] A fully connected (FC) layer takes all the outputs of the second convolutional layer as input. The FC layer integrates the input values from the outputs of the previous layer. The FC layer performs the ReLU(WX) function. X represents the input value, and W represents the weight matrix of the FC layer.

[0080] The output layer can output a value between 0 and 1 based on the sigmoid function. The output value indicates whether T cells are active (H) or inactive (L). The output layer executes the sigmoid function Sigmoid(WX). X represents the input value, and W represents the weight matrix of the output layer. Alternatively, the output layer can use activation functions other than sigmoid, such as softmax or ReLU.
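The ReLU(WX) and Sigmoid(WX) operations of the FC and output layers can be sketched in a few lines. This is an illustrative forward pass only: the weights, the two-element input, the absence of bias terms, and the 0.5 decision threshold are assumptions, and the helper names are hypothetical.

```python
import math


def dense_relu(x, W):
    """Fully connected layer without bias: ReLU(W x)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W]


def sigmoid_output(x, w):
    """Single-unit output layer without bias: Sigmoid(w . x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def classify(score, threshold=0.5):
    """Map the model output to active (H) or inactive (L)."""
    return "H" if score >= threshold else "L"


flattened = [1.0, 2.0]             # toy stand-in for the flattened convolution output
W_fc = [[1.0, -1.0], [0.5, 0.5]]   # toy FC weights
w_out = [1.0, 2.0]                 # toy output-layer weights
hidden = dense_relu(flattened, W_fc)
score = sigmoid_output(hidden, w_out)
label = classify(score)
```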

[0081] A CNN model is trained so as to minimize the objective function. The training process corresponds to the process of optimizing the weights used in the CNN model. For example, weight optimization can be achieved using gradient descent.
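As a small worked example of gradient descent, the sketch below updates the weights of a single sigmoid unit rather than a full CNN. It relies on the standard result that, for a sigmoid output trained with negative log-likelihood, the gradient with respect to each weight is (prediction − label) × input. The data, learning rate, and step count are hypothetical.

```python
import math


def predict(w, x):
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def gradient_step(w, data, learning_rate):
    """One batch gradient-descent update for the NLL of a sigmoid unit."""
    grad = [0.0] * len(w)
    for x, y in data:
        error = predict(w, x) - y      # derivative of the NLL w.r.t. the pre-activation
        for j, xj in enumerate(x):
            grad[j] += error * xj
    return [wj - learning_rate * gj for wj, gj in zip(w, grad)]


# Toy data: (features, label); the first feature is a constant 1.0 acting as a bias input.
data = [([1.0, 0.5], 1), ([1.0, -1.5], 0), ([1.0, 2.0], 1), ([1.0, -0.5], 0)]
weights = [0.0, 0.0]
for _ in range(50):
    weights = gradient_step(weights, data, learning_rate=0.1)
```

In a multi-layer model the same update rule is applied to every weight, with backpropagation supplying the gradients for the earlier layers.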

[0082] The objective function is defined as the sum of the negative log-likelihood (NLL) and the regularization term. The objective function of a CNN model can be expressed as Equation 2 below.

[0083] [Equation 2]

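[0084] $$\text{Objective} = \mathrm{NLL} + \text{regularization term}$$

[0085] $$\mathrm{NLL} = -\sum_{s}\sum_{t}\left[\,Y_t^s \log f_t(X^s) + \left(1 - Y_t^s\right)\log\left(1 - f_t(X^s)\right)\right]$$

[0086] s represents the index of the training data, and t represents the index of the interaction feature. Y_t^s represents the label value (0 or 1) for T cell activation in training data s. f_t(X^s) represents the neural network model's T cell activation prediction for input data X^s.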
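The objective described in [0082] can be evaluated numerically as in the sketch below. Two points are assumptions: the NLL is written in its usual binary cross-entropy form for 0/1 labels, and the regularization term is taken to be an L2 penalty with a hypothetical coefficient `lam`, since the description does not state which regularizer is used.

```python
import math


def negative_log_likelihood(labels, predictions, eps=1e-12):
    """Sum of binary cross-entropy terms over samples s and outputs t."""
    total = 0.0
    for y_row, p_row in zip(labels, predictions):   # index s
        for y, p in zip(y_row, p_row):              # index t
            p = min(max(p, eps), 1.0 - eps)         # guard against log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def objective(labels, predictions, weights, lam=0.01):
    """NLL plus an assumed L2 regularization term."""
    penalty = lam * sum(w * w for w in weights)
    return negative_log_likelihood(labels, predictions) + penalty


labels = [[1], [0]]           # toy labels Y for two training samples
predictions = [[0.9], [0.2]]  # toy model outputs for the same samples
loss = objective(labels, predictions, weights=[1.0, -2.0])
```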

[0087] Meanwhile, MHC I and MHC II have different functional characteristics and different protein lengths. Therefore, separate neural network models are needed for MHC I and MHC II. Researchers also constructed separate neural network models using training data for MHC I and MHC II respectively.

[0088] The neural network model receives a matrix for a peptide-MHC. The computer device generates the peptide-MHC matrix during training, and the analytical device generates a matrix for each peptide-MHC during analysis. Figure 4 is an example of a process (400) that generates a matrix representing peptide-MHC interactions. For ease of description, Figure 4 assumes that computer device B generates the matrix. The computer device can be a PC, server, etc.

[0089] Computer device B receives the amino acid sequences of the peptide-MHC (410). The computer device may receive the amino acid sequences via an input device, storage medium, or communication. The amino acid sequences are the amino acid sequence of the MHC and the amino acid sequence of the antigen. Alternatively, the computer device may pre-store a specific MHC amino acid sequence and receive only the amino acid sequence of the antigen.

[0090] Computer device B generates a matrix of MHC amino acid sequences and antigen amino acid sequences. In the MHC amino acid sequence, each amino acid can be sequentially labeled from 1 to n. Similarly, in the antigen amino acid sequence, each amino acid can be sequentially labeled from a to z.

[0091] Computer device B determines the interaction value of each amino acid pair between the amino acid sequence of the MHC (referred to as the first amino acid sequence) and the amino acid sequence of the antigen (referred to as the second amino acid sequence). For example, the computer device determines the interaction value of amino acid 1 of the first amino acid sequence and amino acid a of the second amino acid sequence. In this way, the computer device determines the interaction value of all amino acid pairs that can be composed of the first amino acid sequence and the second amino acid sequence.
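The all-pairs enumeration in [0091] maps naturally onto a nested loop. In the sketch below, `pair_value` is a hypothetical callback that returns the interaction value for one amino acid pair; the placeholder scoring used here (1.0 for identical residues) has no biological meaning and only demonstrates the shape of the result. The two short sequences are toy inputs.

```python
def interaction_matrix(mhc_seq, antigen_seq, pair_value):
    """Build a matrix with one row per antigen residue and one column per MHC residue."""
    return [[pair_value(m, a) for m in mhc_seq] for a in antigen_seq]


def toy_value(mhc_residue, antigen_residue):
    # Placeholder scoring for illustration only.
    return 1.0 if mhc_residue == antigen_residue else 0.0


mhc_seq = "GSHSMRY"       # toy MHC fragment (positions 1..n)
antigen_seq = "SIINFEKL"  # toy antigen peptide (positions a..z)
matrix = interaction_matrix(mhc_seq, antigen_seq, toy_value)
```

Replacing `toy_value` with a lookup into structure-derived statistics gives the kind of matrix the description feeds to the neural network.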

[0092] Computer device B can determine the interaction values for specific amino acids by referring to previously known protein structures. Protein structure database A stores information about the structures of previously known proteins. Protein structure database A can store information about the amino acids that make up the protein structure and the distances between them. Protein structure database A can store information about multiple protein structures.

[0093] Computer device B can determine the distance (proximity) between a specific first amino acid in the first amino acid sequence and a specific second amino acid in the second amino acid sequence by referring to protein structure DB A. Protein structure DB A can store distance information for multiple instances of the same amino acid pair, so computer device B can determine the interaction value of the first amino acid-second amino acid pair based on various criteria. For example, (i) computer device B can determine the interaction value as the average distance of the first amino acid-second amino acid pair in protein structure DB A. Alternatively, (ii) computer device B can determine the interaction value based on the proximity frequency of the pair in protein structure DB A. Here, computer device B can treat an amino acid pair as close when the first and second amino acids lie within a predetermined reference distance in secondary or tertiary space. Computer device B can determine the interaction value as the number of times the first amino acid and the second amino acid are close to each other in protein structure DB A, or as a value obtained by further processing that proximity frequency.
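The two criteria in [0093] can be compared side by side on a small example. The sketch assumes that the structure database has already been reduced to a list of observed distances per amino acid pair; the dictionary `observed`, its values, the angstrom unit, and the 8.0 cutoff are all hypothetical.

```python
from statistics import mean


def mean_distance(distances):
    """Criterion (i): average observed distance for one amino acid pair."""
    return mean(distances)


def proximity_frequency(distances, cutoff):
    """Criterion (ii): number of observations within the reference distance."""
    return sum(1 for d in distances if d <= cutoff)


# Hypothetical distances (angstroms) at which each residue pair was observed.
observed = {
    ("R", "E"): [4.2, 6.8, 3.9, 11.5],
    ("L", "K"): [9.7, 12.1],
}
cutoff = 8.0
by_mean = {pair: mean_distance(d) for pair, d in observed.items()}
by_count = {pair: proximity_frequency(d, cutoff) for pair, d in observed.items()}
```

Note that the two criteria point in opposite directions: a smaller mean distance and a larger proximity count both indicate a closer pair, so any downstream scaling has to account for which one is used.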

[0094] The interaction values between amino acid pairs can be determined in units of regions into which the protein structure is continuously divided. The interaction values between amino acids can also be determined based on the distances between Cα (α-carbon) atoms present in the protein structure.
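As an illustration of the Cα-based option in [0094], the distance between two residues can be taken as the Euclidean distance between their Cα coordinates. The coordinates, function names, and the 8.0 angstrom reference distance below are hypothetical values chosen for the sketch.

```python
import math

# Hypothetical C-alpha coordinates (x, y, z) in angstroms for two residues.
ca_mhc_residue = (12.0, 4.5, -3.0)
ca_antigen_residue = (15.0, 8.5, -3.0)


def ca_distance(ca_a, ca_b):
    """Euclidean distance between two C-alpha atoms."""
    return math.dist(ca_a, ca_b)


def is_close(ca_a, ca_b, reference_distance=8.0):
    """True when the C-alpha atoms lie within the assumed reference distance."""
    return ca_distance(ca_a, ca_b) <= reference_distance
```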

[0095] Computer device B extracts proximity information (distance or proximity frequency, etc.) for specific amino acid pairs by referencing protein structure DB A (420). Computer device B generates a matrix (430) by determining the interaction value of each amino acid pair constituting the first and second amino acid sequences. This matrix represents the interaction of amino acid sequences and can also be called an interaction matrix. The matrix of the first and second amino acid sequences consists of information representing the degree of interaction (affinity or proximity) of each amino acid pair.

[0096] An example of the interaction map (interaction matrix) is shown at the bottom of Figure 4. The interaction map is a two-dimensional matrix with horizontal and vertical axes. The horizontal axis corresponds to the amino acids of the MHC sequence, labeled 1 to n, and the vertical axis corresponds to the amino acids of the antigen sequence, labeled a to z.
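The layout described in [0095] and [0096] can be sketched as a nested list in which each row follows one antigen residue and each column follows one MHC residue. The lookup table `pair_value`, the function name, and the default of 0.0 for pairs missing from the table are assumptions made for this example.

```python
def build_interaction_matrix(mhc_seq, antigen_seq, pair_value, default=0.0):
    """Build a matrix with one row per antigen residue and one column per MHC residue.

    pair_value maps (mhc_residue, antigen_residue) to an interaction value,
    for example an average distance or a proximity frequency.
    """
    return [
        [pair_value.get((m, a), default) for m in mhc_seq]
        for a in antigen_seq
    ]


# Hypothetical interaction values for two residue pairs.
example_pair_value = {("G", "S"): 0.8, ("H", "I"): 0.3}
example_matrix = build_interaction_matrix("GSH", "SI", example_pair_value)
```

Here `example_matrix` has two rows (antigen residues S and I) and three columns (MHC residues G, S, and H).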

[0097] The matrix includes the interaction value for each pair of amino acids. The interaction value can be numerical. Alternatively, the matrix can be rendered as a map in which the degree of interaction is represented by color.

[0098] Meanwhile, the amino acid sequence length of the antigen may vary depending on the source data or the MHC class. Therefore, the computing device can pad the matrix to the size of the largest input data.
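One way to read the padding step in [0098] is to append rows of a fill value until every matrix has the same number of antigen rows. The source does not specify the fill value or where the padding goes, so zero-filling at the end, the function name, and the `max_rows` parameter are assumptions of this sketch.

```python
def pad_matrix(matrix, max_rows, pad_value=0.0):
    """Append rows of pad_value until the matrix has max_rows rows.

    Rows are assumed to follow the antigen sequence, so max_rows stands for
    the longest antigen length found in the input data.
    """
    if not matrix:
        raise ValueError("matrix must contain at least one row")
    if len(matrix) > max_rows:
        raise ValueError("matrix has more rows than max_rows")
    n_cols = len(matrix[0])
    padded = [list(row) for row in matrix]  # copy so the input is not modified
    padded.extend([pad_value] * n_cols for _ in range(max_rows - len(matrix)))
    return padded
```

Padding to a fixed shape lets matrices for antigens of different lengths be fed to the same neural network model.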

[0099] Figure 5 is an example of a process 500 for predicting T cell activation of peptide-MHC.

[0100] The analytical device receives the genetic data of a sample (510). The sample may be tissue from a patient with a specific tumor. The genetic data may include information about multiple antigens. For ease of description, the following is described for a single peptide-MHC.

[0101] Meanwhile, the analysis device can select a previously constructed neural network model based on the MHC class. As mentioned above, different neural network models can be prepared for different MHC classes. Accordingly, the analysis device can select the neural network model that matches the MHC class of the current analysis target and then perform the analysis process.

[0102] The analytical device extracts the amino acid sequences of the MHC and antigens from genetic data. The device can use programs or models to predict MHC structure. For example, it can use HLAminer to predict HLA structure. Additionally, the device can use specific programs to identify the amino acid sequences of antigens from genetic data. For instance, it can detect the amino acid sequences of antigens by using the idfetch program to search for flanking amino acid sequences of non-synonymous mutations in the genetic data.
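The idea of taking the flanking amino acids around a non-synonymous mutation can be illustrated with plain string slicing. This sketch does not reproduce the behavior of HLAminer or of the idfetch program named in [0102]; the function name, the 0-based `position` convention, and the `flank` width are hypothetical.

```python
def mutant_peptide_window(protein_seq, position, new_residue, flank=4):
    """Return the mutated residue with up to `flank` residues on each side.

    position is a 0-based index into protein_seq. The substitution must change
    the amino acid, which is what makes the mutation non-synonymous.
    """
    if not 0 <= position < len(protein_seq):
        raise IndexError("position is outside the protein sequence")
    if protein_seq[position] == new_residue:
        raise ValueError("substitution does not change the amino acid")
    mutated = protein_seq[:position] + new_residue + protein_seq[position + 1:]
    start = max(0, position - flank)
    end = min(len(mutated), position + flank + 1)
    return mutated[start:end]
```

For a made-up sequence, `mutant_peptide_window("MKTAYIAKQRQ", 5, "V", flank=2)` returns `"AYVAK"`. The window is shorter when the mutation lies near either end of the protein.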

[0103] As shown in Figure 4, the analytical device can generate a matrix of the amino acid sequence of the MHC and the amino acid sequence of the antigen (520).

[0104] The analysis device performs the analysis by inputting the generated matrix into the neural network model (530). Based on the information that the neural network model outputs for the input matrix (T cell activation or inactivity), the analysis device can determine whether the current analysis target activates T cells (540).

[0105] The analysis device determines T cell activity by comparing the output of the neural network model with a threshold. The researchers constructed separate neural network models for MHC I and MHC II. With the models trained on the data described in Table 1, an output greater than 0.5 from the MHC I model was judged to indicate T cell activation, and an output greater than 0.7 from the MHC II model was judged to indicate T cell activation.
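The class-specific decision rule in [0105] can be written as a small lookup. The thresholds 0.5 and 0.7 come from the paragraph above, while the dictionary keys and the function name are assumptions. The comparison is strict ("greater than"), following [0105].

```python
# Thresholds reported in [0105]; the key names are chosen for this sketch.
ACTIVATION_THRESHOLDS = {"MHC I": 0.5, "MHC II": 0.7}


def is_t_cell_activating(model_output, mhc_class):
    """Return True when the model output exceeds the threshold for the MHC class."""
    return model_output > ACTIVATION_THRESHOLDS[mhc_class]
```

The same score can therefore lead to different calls: an output of 0.6 counts as activation for MHC I but not for MHC II.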

[0106] Furthermore, when T cells with peptide-MHC are active, the analytical device can identify the antigen as a target candidate for anti-cancer vaccines.

[0107] Figure 6 is an example of an analytical device 600 used to predict T cell activation by peptide-MHC. The analytical device 600 corresponds to the analysis device 130, 140, or 150 of Figure 1.

[0108] The analytical device 600 can use the aforementioned neural network model to predict the degree of peptide-MHC binding. The analytical device 600 can be physically implemented in various forms. For example, the analytical device 600 can take the form of a PC, a smart device, a web server, or a computer device with only a data processing chipset.

[0109] The analysis device 600 may include a storage device 610, a memory 620, a computing device 630, an interface device 640, a communication device 650, and an output device 660.

[0110] Storage device 610 stores a neural network model predicting the level of T cell activation. The neural network model is as described above. The neural network model should be trained in advance. The neural network model can output the amount of cytokine secretion corresponding to the measure of T cell activation. For example, the neural network model can output the amount of IFNγ secreted by T cells. The neural network model can output information indicating whether T cells are activated (secreting large amounts of IFNγ) or inactive (secreting small amounts of IFNγ or not secreting IFNγ).

[0111] In addition, the storage device 610 can store programs, source code, etc. required for data processing.

[0112] Storage device 610 can store input genetic data. Storage device 610 can store the amino acid sequence of the antigen to be analyzed. Storage device 610 can store the amino acid sequence of the MHC to be analyzed.

[0113] Storage device 610 can store programs for identifying sequences of MHC and / or antigens from genetic data.

[0114] Storage device 610 can store the T cell activation level of a specific peptide-MHC as an analytical result. Storage device 610 can also store the aforementioned neoantigen candidates.

[0115] The memory 620 can store data and information generated while the analysis device 600 analyzes T cell activation.

[0116] Interface device 640 is a device for receiving predetermined commands and data from the outside. Interface device 640 can receive the patient's genetic data from a physically connected input device or an external storage device.

[0117] Alternatively, the interface device 640 may receive the amino acid sequence of the MHC and / or the amino acid sequence of the antigen to be analyzed.

[0118] The interface device 640 can receive a learning model for data analysis. The interface device 640 can also receive training data, information, and parameter values for training the learning model.

[0119] The interface device 640 can receive the distance or proximity frequency of a specific amino acid pair in the protein structure DB.

[0120] Communication device 650 refers to a configuration for receiving and transmitting predetermined information via a wired or wireless network. Communication device 650 can receive genetic data from an external object. Communication device 650 can also receive data used for training models. Communication device 650 can receive the amino acid sequence of MHC and / or the amino acid sequence of an antigen to be analyzed.

[0121] The communication device 650 can transmit the analysis results of the input sample to an external object. The analysis results could be T cell activation of a specific peptide-MHC. Alternatively, the analysis results could be whether the corresponding peptide is a neoantigen candidate in a specific peptide-MHC.

[0122] The communication device 650 can receive the distance or proximity frequency of a specific amino acid pair in the protein structure DB.

[0123] The communication device 650 or interface device 640 is a device for receiving predetermined data or commands from the outside. The communication device 650 or interface device 640 may be referred to as an input device.

[0124] Output device 660 is a device for outputting predetermined information. Output device 660 can output interfaces, analysis results, etc., required for the data processing process.

[0125] The computing device 630 can identify the first amino acid sequence of MHC and the second amino acid sequence of antigens produced by tumor cells from genetic data. The computing device 630 can use a specific program to identify the first amino acid sequence and / or the second amino acid sequence from the genetic data.

[0126] As described above, the computing device 630 can generate a matrix of a first amino acid sequence and a second amino acid sequence by referring to known protein structure information. The computing device 630 can calculate the distance or proximity frequency of a specific amino acid pair to be evaluated by referring to previously known protein structures from the protein structure database. The computing device 630 can determine the interaction value based on the distance or proximity frequency of the specific amino acid pair.

[0127] The computing device 630 can predict the presence of T cell activation for a specific peptide-MHC by inputting an interaction matrix into a neural network model. The computing device 630 can predict the amount or presence of IFNγ secreted by T cells with a specific peptide-MHC. Furthermore, when T cell activation for a specific peptide-MHC is high, the computing device 630 can identify the corresponding peptide as a neoantigen candidate.

[0128] The computing device 630 may be a device such as a processor, an AP, or a chip embedded with a program that processes data and performs predetermined calculations.

[0129] Experimental results verifying the effectiveness of the above-described method for predicting T cell activation are described below.

[0130] Researchers performed an ELISPOT analysis on EMT6 to validate the neural network model. They selected mutations with a variant allele frequency (VAF) greater than 0.3 from the genes in the sample. Based on the neural network's prediction scores, they selected the 25 peptides with the highest scores (neoantigen candidates) and the 5 peptides with the lowest scores (control group).
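The selection procedure in [0130] can be sketched as a filter followed by a ranking. The record fields `vaf` and `score` and the function name are hypothetical, and the source does not say how ties or short candidate lists were handled.

```python
def select_peptides(records, vaf_cutoff=0.3, n_top=25, n_bottom=5):
    """Keep mutations with VAF above the cutoff, then rank by prediction score.

    Returns (candidates, controls): the n_top highest-scoring peptides and
    the n_bottom lowest-scoring peptides among the eligible records.
    """
    eligible = [r for r in records if r["vaf"] > vaf_cutoff]
    if len(eligible) < n_top + n_bottom:
        raise ValueError("not enough eligible peptides for disjoint groups")
    ranked = sorted(eligible, key=lambda r: r["score"], reverse=True)
    candidates = ranked[:n_top]
    controls = ranked[len(ranked) - n_bottom:]
    return candidates, controls
```

Requiring at least `n_top + n_bottom` eligible peptides keeps the candidate and control groups from overlapping.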

[0131] Researchers performed ELISPOT analyses on each of the 25 and 5 peptides. They also performed ELISPOT analyses on H2-Dd / H2-Ld (allele class 1) and H2-IAd, H2-IEd (allele class 2). The researchers calculated the ELISPOT results (ELISPOT.count) for all 30 peptides. Furthermore, a computer model used to measure peptide-MHC binding was used as a reference model. NetMHCIIpan was used as the reference model.

[0132] Figure 7 is an example of experimental results that validate the neural network model. Figure 7 shows the ELISPOT analysis results for the above 30 peptides. The peptides are arranged in ascending order of their ELISPOT.count values, so peptides with higher T cell activation are located on the right side of Figure 7. The lower half of Figure 7 shows the prediction results of the aforementioned neural network model (labeled as the target) and of the reference model (labeled as the reference). White blocks represent non-immunogenic peptides, and shaded blocks represent immunogenic peptides. Figure 7 shows that the reference model used in conventional studies fails to accurately predict actual T cell activation for many peptides. In contrast, the neural network model described above exhibits high accuracy overall, except for two peptides in the control group (non-immunogenic). The neural network model developed by the researchers therefore demonstrates significantly superior performance compared with the conventionally and widely used computer model.

[0133] The data previously collected by the researchers is shown in Table 1. The researchers selected some of the collected data as training data, training models for MHC I and MHC II respectively. In addition, the researchers used some data as validation data. They selected 13,128 data points for MHC I and 6,650 data points for MHC II. The selected data were then divided into training and validation data in a 7:3 ratio.
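A 7:3 division such as the one in [0133] can be sketched as a shuffle followed by a cut at 70% of the items. The source does not state how the split was randomized or rounded, so the fixed seed, the round-half-up integer arithmetic, and the function name are assumptions.

```python
import random


def split_7_to_3(items, seed=0):
    """Shuffle the items and split them into training and validation parts (7:3)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic for round(0.7 * n), avoiding floating-point edge cases.
    cut = (len(shuffled) * 7 + 5) // 10
    return shuffled[:cut], shuffled[cut:]
```

With this rounding, 13,128 MHC I data points divide into 9,190 training and 3,938 validation items, and 6,650 MHC II data points divide into 4,655 and 1,995. The source does not report the exact counts used.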

[0134] The accuracy of the neural network model was verified by comparing the results calculated by a neural network trained on human or mouse peptide-MHC pairs with known experimental values.

[0135] Figure 8 is another example of experimental results validating the neural network models. The neural network models were trained separately for MHC I and MHC II. Figure 8A shows the experimental results for MHC I. The validation result shows that the area under the curve (AUC) of the MHC I neural network model is 0.7787. Figure 8B shows the experimental results for MHC II. The AUC of the MHC II neural network model is 0.8083. Therefore, it can be said that the prediction accuracy of the developed neural network model is quite high.
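For reference, the AUC of a receiver operating characteristic curve equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way on toy data. It is not the evaluation code behind Figure 8, and the function name and inputs are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels are 1 (T cell activation) or 0 (no activation); scores are model outputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported values of 0.7787 and 0.8083.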

[0136] Furthermore, the methods for predicting T cell activation or discovering neoantigens as described above can be implemented as a program (or application) including an executable algorithm that can be executed on a computer. This program can be stored and provided in a non-transitory computer-readable medium.

[0137] A non-transitory computer-readable medium is not a medium that stores data only briefly, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, the various applications or programs described above can be stored and provided on non-transitory readable media such as compact discs (CDs), digital video discs (DVDs), hard disks, Blu-ray discs, USB drives, memory cards, read-only memory (ROM), programmable read-only memory (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory.

[0138] Transitory readable media refers to various types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

[0139] This embodiment and the accompanying drawings clearly illustrate only some of the technical ideas included in the above-described technology. Therefore, it is obvious that all modifications and specific embodiments that can be readily deduced by those skilled in the art within the scope of the technical spirit included in the above-described technology are included within the scope of the above-described technology.

Claims

1. A method for predicting T cell activation of the peptide-major histocompatibility complex (MHC), comprising: receiving, by an analysis device, genetic data of a patient; identifying, by the analysis device, a first amino acid sequence of the MHC and a second amino acid sequence of an antigen produced by tumor cells based on the genetic data; generating, by the analysis device, a matrix indicating the relationship between the first amino acid sequence and the second amino acid sequence on a per-amino-acid basis; and inputting, by the analysis device, the matrix into a trained neural network model to determine whether T cells secrete a cytokine in an amount greater than or equal to a threshold due to the binding of the MHC to the antigen; wherein the neural network model is trained using training data, the training data including amino acid sequence pairs of MHC-neoantigens as input values and the amount of cytokine secreted by T cells for each amino acid sequence pair as label values; wherein the neural network model outputs the level of cytokine secretion by T cells based on the interaction between the MHC and the antigen; and wherein the cytokine is interferon-γ.

2. The method of claim 1, wherein the matrix comprises, for each amino acid pair between the first amino acid sequence and the second amino acid sequence, the proximity of the amino acid pairs in the actual protein structure based on previously known protein structure information.

3. The method according to claim 1, wherein when the output of the neural network model is that cytokine secretion is greater than or equal to the threshold, the analysis device identifies the antigen as a target candidate for an anticancer vaccine.

4. The method according to claim 1, wherein the neural network model is a convolutional neural network (CNN).

5. An analytical device for predicting T cell activation of the peptide-major histocompatibility complex (MHC), comprising: an input device configured to receive genetic data of a patient; a storage device configured to store a neural network model that predicts the amount of cytokine secreted by T cells based on a matrix representing the relationship between the amino acid sequence of the MHC and the amino acid sequence of an antigen produced by tumor cells; and a computing device configured to identify a first amino acid sequence of the MHC and a second amino acid sequence of the antigen produced by the tumor cells from the genetic data, generate a matrix representing the relationship between the first amino acid sequence and the second amino acid sequence in units of individual amino acids, and input the generated matrix into the neural network model to determine whether the patient's MHC-antigen induces the secretion of interferon-γ by the T cells; wherein the neural network model is trained using training data, the training data including amino acid sequence pairs of MHC-neoantigens as input values and the amount of interferon-γ secreted by T cells for each amino acid sequence pair as label values; and wherein the neural network model outputs the level of interferon-γ secretion by T cells based on the interaction between the MHC and the antigen.

6. The analytical apparatus of claim 5, wherein the matrix comprises, for each amino acid pair between the first amino acid sequence and the second amino acid sequence, the proximity of amino acid pairs in the actual protein structure based on previously known protein structure information.

7. The analytical apparatus of claim 5, wherein when the output of the neural network model is that interferon-γ secretion is greater than or equal to a threshold, the analytical apparatus identifies the antigen as a target candidate for an anticancer vaccine.

8. The analysis device according to claim 5, wherein the neural network model is a convolutional neural network (CNN).