A method for predicting T lymphocyte antigen binding based on residual graph attention
Using a residual graph attention network and a hard negative sampling mechanism, combined with a pairwise AUC loss and a multi-optimizer collaborative strategy, the method addresses class imbalance, vanishing gradients, and low training efficiency in T lymphocyte antigen binding prediction, improving prediction accuracy and robustness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- LUDONG UNIVERSITY
- Filing Date
- 2026-04-28
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies for predicting T lymphocyte antigen binding suffer from class imbalance, vanishing gradients, conflicts in multi-loss joint optimization, insufficient use of hard negative samples, and low training efficiency, which limit prediction accuracy and efficiency.
We employ a residual graph attention network to enhance feature propagation, introduce a hard negative sampling mechanism and a pairwise AUC loss function, combine them with a multi-optimizer collaborative strategy, and speed up training by pre-computing the positive sample weight.
It significantly improves the accuracy and robustness of T lymphocyte-antigen binding prediction, enhances training efficiency, and provides an efficient and reliable computational tool for tumor neoantigen screening and personalized immunotherapy.

Figure CN122117008B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of bioinformatics and specifically relates to a method for predicting T lymphocyte antigen binding based on residual graph attention.
Background Technology
[0002] Specific antigen recognition by T lymphocytes is a core mechanism of the adaptive immune system. T lymphocytes initiate the downstream immune response by binding, through their complementarity-determining regions (CDRs), to the antigen peptide-MHC complex. High-precision prediction of T lymphocyte-antigen binding specificity is crucial for novel vaccine design, tumor neoantigen screening, and personalized immunotherapy. Traditional experimental methods such as the enzyme-linked immunosorbent assay (ELISA) and surface plasmon resonance (SPR) are accurate and reliable, but they have low throughput, long turnaround times, and high costs. With the rapid development of high-throughput sequencing, immune repertoire data are growing explosively, which creates a need for efficient and accurate computational prediction methods.
[0003] In recent years, graph neural networks (GNNs) have been successfully applied to biomolecular interaction prediction because they model relations well. The GTE model was the first to represent T lymphocytes, antigen peptides, and MHC molecules as nodes of a heterogeneous graph and their associations as edges, using graph convolutional networks to classify and predict edges. However, most existing methods rely on static, homogeneous neighborhood aggregation and cannot finely model the differences among nodes and among interaction relationships in the graph. Graph attention networks can weight neighbor information by attention coefficients, but they suffer from vanishing gradients when stacked deeply, and the biological interpretation of their attention weights has not been studied systematically.
[0004] Current technologies face four major challenges in model training. First, extreme class imbalance. Positive binding pairs typically account for less than 10% of a dataset, so the traditional cross-entropy loss is easily dominated by the majority class and the model neglects the minority class. Focal loss can alleviate this problem, but its key hyperparameters α and γ are set by hand, with no adaptive adjustment strategy. Second, a mismatch between the loss function and the evaluation objective. Predicting whether a T lymphocyte binds an antigen peptide is essentially a ranking problem, and AUC (the area under the ROC curve) is the core evaluation metric. Most existing models, however, are trained with cross-entropy loss, whose optimization objective differs from AUC. An AUC loss can optimize ranking performance directly, but used alone it is highly unstable in the early stages of training. Third, gradient conflict in multi-loss joint optimization. When focal loss and AUC loss are optimized simultaneously, their gradient directions may conflict, and a single optimizer struggles to reconcile them. Adam is highly adaptive but handles AUC gradients inefficiently, PESG is tailored to AUC but lacks global coordination, and SGD explores well but converges slowly. Current technologies lack a training mechanism that combines the strengths of multiple optimizers. Fourth, negative sample quality. Easy negative samples contribute little to training; the truly valuable ones are hard negative samples whose features resemble those of positive samples. Existing random sampling methods cannot guarantee coverage of hard negative samples, which limits improvement of the model's discriminative ability.
Furthermore, when implementing focal loss, existing methods recount the positive and negative samples and recompute the positive sample weight in every training iteration. This dynamic calculation reflects the data distribution of each batch, but it adds considerable computational overhead and makes training inefficient, particularly on large-scale immune repertoire data. In summary, existing technologies fall short in five areas: fine-grained modeling of feature interactions, efficient gradient propagation in deep networks, coordinated training under multi-loss joint optimization, use of hard negative samples, and training computation efficiency. A prediction method that innovates at both the model architecture and training strategy levels is therefore needed.
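The ranking view of AUC in [0004] can be made concrete. AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The minimal pure-Python sketch below computes AUC from that definition; the function name `pairwise_auc` is hypothetical and does not come from the patent. The second call also shows why a predictor that ignores the minority class gains nothing in AUC, even when its accuracy looks high.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for s_pos, s_neg in product(pos, neg):
        if s_pos > s_neg:
            correct += 1.0
        elif s_pos == s_neg:
            correct += 0.5
    return correct / (len(pos) * len(neg))


# 2 positives and 3 negatives: 5 of the 6 pairs are ordered correctly.
print(pairwise_auc([0.9, 0.8, 0.3, 0.2, 0.1], [1, 0, 1, 0, 0]))

# Imbalanced set (1 positive, 9 negatives) given one constant score:
# calling everything "non-binding" is 90% accurate, yet AUC is only 0.5.
print(pairwise_auc([0.1] * 10, [1] + [0] * 9))
```

Because AUC depends only on how pairs are ordered, a loss defined directly on positive-negative pairs targets this metric more closely than per-sample cross-entropy does.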
[0005] To address these issues, a novel T lymphocyte antigen binding prediction method based on residual graph attention innovates in both model architecture and training strategy. At the model level, residual connections are introduced to build a deep graph attention network that promotes gradient flow and feature reuse. At the training level, a hard negative sampling mechanism selects the negative samples to which the model assigns the highest binding probabilities, and a pairwise AUC loss function optimizes ranking performance directly. A multi-optimizer collaborative strategy balances the multiple optimization objectives. In particular, a positive sample weight pre-computation method is proposed: the positive sample weight is calculated once outside the training loop and then fixed. This avoids repeated calculation in each iteration and improves training efficiency.
Summary of the Invention
[0006] Existing graph attention network approaches have four technical problems: inefficient feature propagation through deep layers, training bias caused by extreme class imbalance, insufficient optimization of ranking performance, and redundant computation during training. To address them, this invention proposes a T lymphocyte antigen binding prediction method based on residual graph attention. The method improves feature propagation through a two-layer graph attention network with residual connections. It introduces a hard negative sampling strategy to improve discrimination of boundary samples and directly optimizes ranking performance with a pairwise AUC loss function. It also incorporates a multi-optimizer collaborative mechanism for multi-objective optimization. To improve training efficiency, it adopts a positive sample weight pre-computation strategy: the weight is calculated from the global numbers of positive and negative samples outside the training loop and then fixed, avoiding recomputation in every round. The technical solution comprises the following four steps:
[0007] Step 1: Input the node feature matrix and process it through two graph attention layers. Add the first layer output features and the second layer output features element-wise to form a residual connection. Pass the residual-connected features through a rectified linear unit (ReLU) and dropout to obtain the final node representation vectors. For each node pair, concatenate the two representation vectors along the feature dimension and input them into a multilayer perceptron. After a linear transformation and ReLU activation, the perceptron outputs a single scalar, which the sigmoid function converts into a binding probability.
[0008] Step 2: Before training begins, calculate and fix the positive sample weight based on the numbers of positive and negative samples in the entire training set. Combine the focal loss, the AUC optimization loss, and the pairwise AUC loss into the total loss function. Update the parameters by calling, in sequence, the adaptive moment estimation optimizer (Adam), the AUC-specific optimizer, and the stochastic gradient descent optimizer with momentum.
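The two class-imbalance measures in Step 2 can be sketched as follows. The sketch assumes that the positive sample weight is the global ratio of negatives to positives and that it multiplies the positive term of a standard binary focal loss. The patent gives neither formula, so both formulas and all function names are illustrative assumptions. What the sketch shows is where the weight is computed: once, outside the training loop.

```python
import math


def stable_sigmoid(z):
    """Sigmoid that does not overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def precompute_pos_weight(train_labels):
    """Run ONCE before training. Assumed formula: #negatives / #positives."""
    n_pos = sum(1 for y in train_labels if y == 1)
    n_neg = len(train_labels) - n_pos
    if n_pos == 0:
        raise ValueError("training set contains no positive samples")
    return n_neg / n_pos


def focal_loss(logits, labels, alpha, gamma, pos_weight, eps=1e-12):
    """Mean binary focal loss; pos_weight scales the positive term (assumption)."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = stable_sigmoid(z)  # predicted binding probability
        if y == 1:
            # (1 - p)^gamma shrinks the loss of positives that are already easy
            total += -pos_weight * alpha * (1.0 - p) ** gamma * math.log(p + eps)
        else:
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p + eps)
    return total / len(logits)


train_labels = [1] * 10 + [0] * 90                # 10% positives
POS_WEIGHT = precompute_pos_weight(train_labels)  # computed once: 9.0
for epoch in range(3):
    # Each batch reuses POS_WEIGHT; nothing is recounted inside the loop.
    batch_loss = focal_loss([2.0, -1.5], [1, 0], alpha=0.25, gamma=2.0,
                            pos_weight=POS_WEIGHT)
```

In the described method, α and γ come from a grid search scored by validation AUC. The values 0.25 and 2.0 above are placeholders only.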
[0009] Step 3: Evaluate the model with five-fold cross-validation. Extract the attention weights of the last graph attention layer, map them back to the amino acid residue positions of the original sequences, and select key residue interaction pairs by applying a contribution threshold. Output a structured report containing the sequences, predicted probability, key residue pairs, and contribution scores.
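The five-fold protocol in Step 3, detailed in [0018], partitions the samples into five disjoint subsets and rotates which subset serves as the test fold. Below is a minimal sketch with hypothetical names. It assumes a seeded random shuffle over sample indices. A real implementation would also have to decide how negative edges are assigned to folds, which the patent does not specify.

```python
import random


def five_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k non-overlapping test folds.

    Every index appears in exactly one test fold; the seeded shuffle is an assumption.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin keeps fold sizes within 1
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


for fold_no, (train_idx, test_idx) in enumerate(five_fold_splits(23)):
    print(fold_no, len(train_idx), len(test_idx))
```

The reported AUC would then be averaged (or pooled) over the five test folds. The patent does not say which.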
[0010] Step 4: After determining the optimal hyperparameter configuration, retrain the final model; during deployment, the system receives the user-input T lymphocyte sequence and antigen peptide sequence, automatically executes the entire process, and outputs the predicted binding probability, binary classification suggestions, and a list of key residue interaction pairs.
[0011] This method combines three elements: the residual graph attention network, hard negative sampling with the pairwise AUC loss, and the positive sample weight pre-computation strategy. Together they improve the accuracy and robustness of T lymphocyte-antigen binding prediction as well as training efficiency. The method provides an efficient and reliable computational tool for tumor neoantigen screening, personalized immunotherapy strategy formulation, and infectious disease vaccine development.
Attached Figure Description
[0012] Figure 1 is a flowchart of the method for predicting T lymphocyte antigen binding based on residual graph attention.
[0013] Figure 2 is a schematic diagram of the residual graph attention network structure.
[0014] Figure 3 is a schematic diagram of the multi-loss fusion and multi-optimizer collaborative strategy.
Detailed Implementation
[0015] The present invention is described in detail below with reference to the accompanying drawings and embodiments. The overall process of the T lymphocyte antigen binding prediction method based on residual graph attention is shown in Figure 1 and comprises the following four steps:
[0016] Step 1: Residual Graph Attention Network Modeling. Input preparation follows the previous data processing and node feature construction methods. T lymphocyte and antigen peptide sequences are obtained from the pMTnet database, cleaned, and standardized. For T lymphocytes, 768-dimensional semantic features extracted with the TCRpeg model are combined with 24-dimensional biophysical features to form a 792-dimensional node vector. For antigen peptides, 1280-dimensional structural features are extracted with the ESM-2 model. Positive and negative sample labels are then constructed. The graph is an undirected heterogeneous network containing T lymphocyte nodes and antigen peptide nodes. Positive edges connect known binding pairs, negative edges connect randomly generated non-binding pairs, and the graph structure remains fixed during training. The core prediction model is a two-layer graph attention network with residual connections (Residual Graph Attention Network, ResGAT). Figure 2 shows the network structure; its forward propagation proceeds as follows. First, the input node feature matrix X enters the first graph attention convolutional layer. This layer uses 8 independent attention heads. Through a self-attention mechanism, it computes the association strength between each node and each of its neighbors. It then sums the neighbor features, weighted by the normalized attention coefficients, to generate a new representation for the node. The output dimension equals the hidden layer size (128 dimensions). The first-layer output passes through the rectified linear unit (ReLU) activation function and dropout to give the first layer output features.
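The attention step of the first layer can be sketched for a single head in pure Python. The sketch follows the widely used GAT formulation: each neighbor is scored by LeakyReLU applied to a learned vector dotted with the concatenated projected features, the scores are softmax-normalized over the neighbors, and the projected neighbor features are summed with those weights. The patent says only "self-attention", so this formulation, the function names, and the toy sizes are assumptions. In the described model, 8 such heads run independently; the text does not say how their outputs are merged into 128 dimensions.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_head(h, neighbors, W, a):
    """One attention head in the standard GAT form (assumed, not taken from the patent).

    h:         list of node feature vectors
    neighbors: dict node -> neighbour list (include the node itself as a self-loop)
    W:         shared projection matrix (out_dim rows x in_dim columns)
    a:         attention vector of length 2 * out_dim
    Returns the new node features and the attention coefficient of every edge.
    """
    Wh = [matvec(W, x) for x in h]
    out_dim = len(W)
    new_h, attn = [], {}
    for i in range(len(h)):
        nbrs = neighbors[i]
        # e_ij = LeakyReLU(a . [W h_i || W h_j]); list "+" concatenates the vectors
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, Wh[i] + Wh[j]))) for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        coeffs = [e / total for e in exps]  # normalised attention over neighbours
        for j, c in zip(nbrs, coeffs):
            attn[(j, i)] = c  # coefficient of edge j -> i
        new_h.append([sum(c * Wh[j][d] for c, j in zip(coeffs, nbrs))
                      for d in range(out_dim)])
    return new_h, attn


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # 3 toy nodes, 2 features each
neighbors = {0: [0, 2], 1: [1, 2], 2: [0, 1, 2]}  # self-loops included
W = [[0.5, -0.2], [0.1, 0.4]]
a = [0.3, -0.1, 0.2, 0.6]
new_h, attn = gat_head(h, neighbors, W, a)
```

The `attn` dictionary returned here is the kind of per-edge coefficient that Step 3 later reads from the last layer for interpretability. ReLU and dropout would follow this layer as described above.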
Next, the first layer output features are fed into the second graph attention convolutional layer, which also uses 8 attention heads and keeps the 128-dimensional output, to further refine the features. The second-layer output passes through ReLU and dropout to give the second layer output features. A residual connection is then applied: the first and second layer output features are added element-wise, so the residual-connected feature equals the sum of the two. This design lets gradients propagate directly through the residual path, which mitigates the vanishing gradient problem in deeper networks. It also fuses and reuses low-level and high-level features, strengthening the model's feature representation. The residual-connected features then pass through ReLU and dropout again to give the final node representation vectors. Finally, for each T lymphocyte-antigen peptide node pair to be predicted, the representation vectors of the two nodes are concatenated along the feature dimension (2 × 128 = 256 dimensions) and fed into a multilayer perceptron (MLP). The MLP consists of a first linear layer (256-dimensional input, 256-dimensional output), a ReLU activation, and a second linear layer (256-dimensional input, 1-dimensional output). Its output is a single scalar, the logit (log-odds) of binding, which the sigmoid function converts into a binding probability between 0 and 1. The residual connections allow the model to fuse multi-layer semantic information more effectively, improving its ability to capture complex immune recognition patterns.
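The residual fusion and pair-scoring head can be sketched in pure Python using the sizes stated above: 128-dimensional node vectors, a concatenated 256-dimensional input, and an MLP of 256 → 256 → 1. The layer outputs `h1` and `h2` and all weights are random placeholders rather than trained values, and the helper names are hypothetical. Dropout is written as standard inverted dropout, an assumption since the patent gives no rate or variant.

```python
import math
import random

HIDDEN = 128  # hidden size stated in the text


def relu(v):
    return [max(0.0, x) for x in v]


def dropout(v, p, training, rng):
    """Inverted dropout: identity at inference, zero-and-rescale during training."""
    if not training or p == 0.0:
        return list(v)
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in v]


def linear(W, b, x):
    """Affine map W x + b with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def node_representation(h1, h2, p, training, rng):
    """h1, h2: first- and second-layer outputs, each already after ReLU + dropout."""
    residual = [a + b for a, b in zip(h1, h2)]        # element-wise residual sum
    return dropout(relu(residual), p, training, rng)  # ReLU and dropout once more


def predict_pair(z_tcr, z_pep, mlp):
    """Concatenate the two node vectors and score them with the two-layer MLP."""
    x = z_tcr + z_pep                                 # list "+" = concatenation
    hidden = relu(linear(mlp["W1"], mlp["b1"], x))
    logit = linear(mlp["W2"], mlp["b2"], hidden)[0]   # single scalar (log-odds)
    return logit, sigmoid(logit)


# Toy run with random placeholder weights (not trained parameters).
rng = random.Random(0)


def rand_vec(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


mlp = {
    "W1": [[rng.gauss(0.0, 0.05) for _ in range(2 * HIDDEN)] for _ in range(2 * HIDDEN)],
    "b1": [0.0] * (2 * HIDDEN),
    "W2": [[rng.gauss(0.0, 0.05) for _ in range(2 * HIDDEN)]],
    "b2": [0.0],
}
z_t = node_representation(rand_vec(HIDDEN), rand_vec(HIDDEN), 0.1, False, rng)
z_p = node_representation(rand_vec(HIDDEN), rand_vec(HIDDEN), 0.1, False, rng)
logit, prob = predict_pair(z_t, z_p, mlp)
```

Because the residual path is a plain sum, its gradient with respect to `h1` is the identity. This is the property the text relies on to ease gradient flow.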
[0017] Step 2: Multi-Loss Joint Optimization and Hard Negative Sampling Strategy. As shown in Figure 3, this step comprises four parts: loss function design, the hard negative sampling mechanism, positive sample weight pre-computation, and the multi-optimizer collaborative strategy. Focal loss and positive sample weight pre-computation: focal loss introduces a weighting factor α and a focusing parameter γ to down-weight easily classified samples, so that the model concentrates on hard-to-classify minority class samples. The values of α and γ are determined automatically by grid search before training, selecting the combination with the highest AUC on the validation set. Existing methods recompute the positive sample weight in every training iteration; this method instead pre-computes it. Before the training loop starts, the positive sample weight is calculated once from the total numbers of positive and negative samples in the entire training set and then held fixed throughout training. This avoids repeated counting and calculation in every iteration and reduces computational overhead. In addition, a weight derived from the global data distribution is more stable than one derived from each mini-batch, which benefits convergence. Hard negative sampling mechanism: traditional random negative sampling cannot provide effective supervision signals, so this method designs a hard negative sampling function.
This function uses the model's current predictions to select the negative samples with the highest predicted binding probabilities, that is, non-binding pairs that the model wrongly scores as likely to bind. These serve as hard negative samples in the current round of training. Specifically, the function computes the predicted probabilities of all negative samples and sorts them in descending order. It then selects a number of samples equal to the number of positive samples multiplied by the hard negative ratio. This keeps the model focused on boundary samples throughout training and improves its ability to distinguish easily confused samples. Loss function design: to optimize ranking performance directly, the method introduces a pairwise AUC loss function. This loss widens the gap between the predicted scores of positive and negative samples by constructing positive-negative sample pairs. Its formula (not reproduced in this text) is defined over such pairs, where $s_i$ and $s_j$ are the raw model outputs for a positive sample and a negative sample, respectively. To reduce computational cost, the function randomly samples positive and negative samples to form the pairs, which keeps cost under control while preserving optimization effectiveness. The total loss function is a weighted combination of the focal loss, the AUC optimization loss, and the pairwise AUC loss, $L_{total} = \lambda_1 L_{focal} + \lambda_2 L_{AUC} + \lambda_3 L_{pair}$, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are adjustable weight coefficients. The focal loss handles class imbalance, the AUC optimization loss stabilizes ranking performance, and the pairwise AUC loss further sharpens discrimination of boundary samples.
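The hard negative rule and the loss combination above can be sketched as follows. The sketch makes three assumptions. First, the pairwise AUC loss uses the common logistic form log(1 + exp(-(s_i - s_j))), purely for illustration, because the patent's own formula is not reproduced here. Second, the AUC optimization loss is passed in as a precomputed number, since this text does not define it. Third, all function names are hypothetical.

```python
import math
import random


def select_hard_negatives(neg_ids, neg_probs, n_pos, ratio):
    """Return the negatives the model currently scores as most likely to bind.

    Count = int(n_pos * ratio), capped at the number of available negatives.
    """
    k = min(len(neg_ids), int(n_pos * ratio))
    ranked = sorted(zip(neg_probs, neg_ids), key=lambda t: t[0], reverse=True)
    return [nid for _, nid in ranked[:k]]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_auc_loss(pos_logits, neg_logits, n_pairs, rng):
    """Mean logistic ranking loss over randomly sampled (positive, negative) pairs.

    ASSUMED form: log(1 + exp(-(s_i - s_j))); the patent's exact formula is not given here.
    """
    total = 0.0
    for _ in range(n_pairs):
        s_i = rng.choice(pos_logits)  # raw output for a positive sample
        s_j = rng.choice(neg_logits)  # raw output for a negative sample
        total += softplus(s_j - s_i)  # near 0 when s_i is well above s_j
    return total / n_pairs


def total_loss(l_focal, l_auc, l_pair, lam1, lam2, lam3):
    """L_total = lam1 * L_focal + lam2 * L_AUC + lam3 * L_pair."""
    return lam1 * l_focal + lam2 * l_auc + lam3 * l_pair


rng = random.Random(0)
neg_ids = ["n0", "n1", "n2", "n3", "n4", "n5"]
neg_probs = [0.05, 0.62, 0.10, 0.81, 0.33, 0.07]  # current model probabilities
hard = select_hard_negatives(neg_ids, neg_probs, n_pos=2, ratio=1.5)  # ['n3', 'n1', 'n4']
l_pair = pairwise_auc_loss([2.1, 1.4], [0.3, -0.8, 1.0], n_pairs=64, rng=rng)
```

Sampling a fixed number of pairs keeps the cost at O(n_pairs) rather than O(#positives × #negatives). This matches the cost-control motivation given in the text.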
Regarding optimizer collaboration, three optimizers are executed in sequence: the adaptive moment estimation optimizer Adam, the AUC-specific optimizer PESG, and stochastic gradient descent with momentum (SGD). They provide, respectively, stable baseline updates, AUC-focused optimization, and exploratory momentum-based fine-tuning. In each iteration, a forward pass computes the total loss and backpropagation yields the gradients. The parameter update method of each optimizer is then called in the collaborative order, so that the optimizers complement one another in multi-objective optimization.
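The "one gradient, several optimizer steps" loop can be sketched in pure Python on a toy quadratic. Adam and SGD with momentum follow their standard update rules. PESG is not implemented here: a plain gradient step named `PlaceholderAUCStep` (hypothetical) stands in for it only to show the calling order. A real PESG, such as the one in the LibAUC library, also maintains auxiliary variables of an AUC min-max objective.

```python
import math


class Adam:
    """Standard Adam update rule (bias-corrected first and second moments)."""

    def __init__(self, n, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = [0.0] * n, [0.0] * n, 0

    def step(self, params, grads):
        self.t += 1
        for k, g in enumerate(grads):
            self.m[k] = self.b1 * self.m[k] + (1 - self.b1) * g
            self.v[k] = self.b2 * self.v[k] + (1 - self.b2) * g * g
            m_hat = self.m[k] / (1 - self.b1 ** self.t)
            v_hat = self.v[k] / (1 - self.b2 ** self.t)
            params[k] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


class PlaceholderAUCStep:
    """Hypothetical stand-in for PESG: a plain gradient step, NOT the PESG algorithm."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, params, grads):
        for k, g in enumerate(grads):
            params[k] -= self.lr * g


class SGDMomentum:
    """SGD with (heavy-ball) momentum."""

    def __init__(self, n, lr=0.01, momentum=0.9):
        self.lr, self.mu, self.buf = lr, momentum, [0.0] * n

    def step(self, params, grads):
        for k, g in enumerate(grads):
            self.buf[k] = self.mu * self.buf[k] + g
            params[k] -= self.lr * self.buf[k]


def train(params, grad_fn, optimizers, n_iters):
    """One gradient evaluation per iteration, then every optimizer steps in order."""
    for _ in range(n_iters):
        grads = grad_fn(params)      # stands in for forward pass + backpropagation
        for opt in optimizers:       # Adam -> AUC-specific step -> SGD with momentum
            opt.step(params, grads)
    return params


# Toy objective sum((x - 3)^2), whose gradient is 2 * (x - 3).
result = train([0.0, 10.0], lambda p: [2.0 * (x - 3.0) for x in p],
               [Adam(2), PlaceholderAUCStep(), SGDMomentum(2)], n_iters=500)
```

All three steps reuse one gradient, so each iteration's effective update is the sum of three rules. Learning rates therefore have to be chosen jointly rather than tuned per optimizer in isolation.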
[0018] Step 3: Model Validation and Interpretability Output. Model performance is evaluated by five-fold cross-validation. The pMTnet dataset is randomly divided into five non-overlapping subsets; in each round, four subsets form the training set and the remaining subset forms the test set. The process is repeated five times so that every sample is used once as test data. The core evaluation metric is AUC. Interpretability analysis uses the attention weights of the last graph attention layer. The attention coefficient of each edge is recorded during forward propagation. These coefficients are mapped back to the amino acid residue positions of the original sequences to identify the CDR3 loop residues and the potential anchor positions on the antigen peptide that contribute most to the binding decision. Specifically, for each node, the attention coefficients of all its incoming edges are weighted and averaged to obtain an importance score for each residue in that node. A contribution threshold (the top 10% of residues) then selects key residue interaction pairs, which form a visual binding hotspot map. The final output is a structured report containing the T lymphocyte sequence, the antigen peptide sequence, the predicted binding probability, and the key residue pairs with their contribution scores. This report provides clear targets for subsequent experimental validation or rational design.
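Step 3 states that per-edge attention is averaged over each node's incoming edges and that residue pairs above a top-10% threshold go into a structured report. It does not spell out how node-level attention becomes residue-pair scores. The sketch below therefore takes residue-pair scores as a given (hypothetical) input and shows only the averaging, the top-10% cut, and the report assembly. All function and field names are illustrative.

```python
def incoming_attention_mean(attn, node):
    """Average the attention coefficients of all edges (src, node) entering `node`."""
    vals = [c for (_src, dst), c in attn.items() if dst == node]
    return sum(vals) / len(vals) if vals else 0.0


def top_percent(scores, percent=10):
    """Return the highest-scoring items: the top `percent`% (rounded up, at least one)."""
    k = max(1, (len(scores) * percent + 99) // 100)  # integer ceiling, no float error
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def build_report(tcr_seq, pep_seq, prob, pair_scores, percent=10):
    """Assemble the structured report described in Step 3 (field names are hypothetical)."""
    return {
        "tcr_sequence": tcr_seq,
        "peptide_sequence": pep_seq,
        "binding_probability": prob,
        "key_residue_pairs": [
            {"tcr_pos": t, "peptide_pos": p, "contribution": s}
            for (t, p), s in top_percent(pair_scores, percent)
        ],
    }


# Hypothetical inputs: edge attention (src, dst) -> coefficient, and
# residue-pair scores (CDR3 position, peptide position) -> contribution.
edge_attn = {(0, 2): 0.2, (1, 2): 0.4, (2, 2): 0.4, (0, 1): 1.0}
node2_score = incoming_attention_mean(edge_attn, 2)
pair_scores = {(3, 4): 0.31, (5, 1): 0.12, (6, 4): 0.27, (8, 7): 0.05, (9, 2): 0.09}
report = build_report("CASSIRSAYEQY", "GILGFVFTL", 0.87, pair_scores)
```

The probability 0.87 and the pair scores are placeholders, not model outputs.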
[0019] Step 4: Application Deployment and Result Evaluation. After the optimal hyperparameter configuration is determined, the final model is retrained on all available training data so that it makes full use of all known information. During deployment, the system receives the user-submitted T lymphocyte and antigen peptide sequences and automatically executes the complete prediction process. First, the sequences are preprocessed (cleaning, standardization, and length unification). Next, the pre-trained models extract features: TCRpeg for T lymphocyte features and ESM-2 for antigen peptide features. A heterogeneous network graph containing the single edge to be predicted is then constructed. Finally, the trained residual graph attention network performs forward inference to obtain the logit (log-odds) of binding, which the sigmoid function converts into a binding probability. At the same time, the system extracts the attention weights, traces them back to residue positions, and outputs a list of key residue interaction pairs. The final prediction report includes three items:
- the predicted binding probability of each input pair (a continuous value between 0 and 1);
- a binary classification suggestion (binding if the probability is ≥ 0.5, otherwise non-binding);
- the identified key residue interaction pairs with their spatial location information.
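The two ends of the Step 4 pipeline, preprocessing and the 0.5 decision rule, can be sketched as below. The patent names cleaning, standardization, and length unification without giving rules. Upper-casing, keeping only the 20 standard amino acids, and padding with `X` to a fixed length are therefore assumptions, as are the function names. The 0.5 threshold is taken from the text.

```python
import math

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(seq, max_len, pad="X"):
    """Hypothetical cleaning: upper-case, drop non-standard letters, pad/truncate to max_len."""
    s = "".join(ch for ch in seq.strip().upper() if ch in STANDARD_AA)
    return s[:max_len].ljust(max_len, pad)


def to_prediction(logit, threshold=0.5):
    """Convert the model's logit into a probability and a binary suggestion."""
    if logit >= 0:
        prob = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        prob = e / (1.0 + e)
    return {
        "binding_probability": prob,
        "suggestion": "binding" if prob >= threshold else "non-binding",
    }


tcr = clean_sequence(" cassirsayeqy ", max_len=20)
peptide = clean_sequence("GILGFVFTL", max_len=15)
# 1.3 is a placeholder for the trained model's output logit.
report = {"tcr_sequence": tcr, "peptide_sequence": peptide, **to_prediction(1.3)}
```

The model inference step between these two functions would consist of feature extraction, graph construction, and the ResGAT forward pass. It is omitted here.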
[0020] Systematic experiments on the pMTnet dataset validated the method, which achieved an AUC of 0.944, compared with 0.911 for the existing representative method GTE. The residual connections make deep feature extraction more effective, the hard negative sampling mechanism improves discrimination of boundary samples, and the pairwise AUC loss further enhances ranking performance. Together, these three components yield more stable and efficient model training.
[0021] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.
Claims
1. A method for predicting T lymphocyte antigen binding based on residual graph attention, characterized in that the specific steps are as follows: Step 1: input the node feature matrix; after processing, add the first layer output features and the second layer output features element-wise to achieve a residual connection; pass the residual-connected features through a rectified linear unit and dropout to obtain the final node representation vectors; concatenate the representation vectors along the feature dimension and input them into a multilayer perceptron, which, after linear transformation and rectified linear unit activation, outputs a single scalar that is converted into a binding probability by the sigmoid function. Step 2: before training begins, calculate and fix the positive sample weight based on the numbers of positive and negative samples in the entire training set; fuse the focal loss, the AUC optimization loss, and the pairwise AUC loss into the total loss function; and sequentially call the adaptive moment estimation optimizer, the AUC-specific optimizer, and the stochastic gradient descent optimizer with momentum to update the parameters. This step involves multi-loss joint optimization and a hard negative sampling strategy and comprises four parts: loss function design, a hard negative sampling mechanism, pre-calculation of the positive sample weight, and a multi-optimizer collaborative strategy. Focal loss and positive sample weight pre-calculation: focal loss introduces a weighting factor α and a focusing parameter γ to down-weight easily classified samples, so that the model focuses on hard-to-classify minority class samples; the optimal values of α and γ are determined automatically by grid search before training, selecting the parameter combination with the highest AUC on the validation set.
Positive sample weight pre-calculation strategy: before the training loop starts, the positive sample weight is calculated once based on the total numbers of positive and negative samples in the entire training set, and this value is kept fixed throughout training. Hard negative sampling mechanism: because traditional random negative sampling cannot provide effective supervision signals, a hard negative sampling function is designed. Based on the model's current predicted probabilities, this function selects the negative samples with the highest predicted probabilities and uses them as hard negative samples in the current round of training: the predicted probabilities of all negative samples are calculated and sorted in descending order, and a number of samples equal to the number of positive samples multiplied by the hard negative ratio is selected as hard negative samples. Loss function design: to optimize ranking performance directly, a pairwise AUC loss function is introduced, which widens the gap between the prediction scores of positive samples and those of negative samples by constructing positive-negative sample pairs; its formula (not reproduced in this text) is defined over such pairs, where $s_i$ and $s_j$ are the raw model outputs for a positive sample and a negative sample, respectively. To reduce computational complexity, this function randomly samples the positive and negative samples. The total loss function is a weighted combination of the focal loss, the AUC optimization loss, and the pairwise AUC loss, $L_{total} = \lambda_1 L_{focal} + \lambda_2 L_{AUC} + \lambda_3 L_{pair}$, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are adjustable weight coefficients. The focal loss handles the class imbalance problem, the AUC optimization loss stabilizes ranking performance, and the pairwise AUC loss further enhances the ability to distinguish boundary samples.
In terms of optimizer collaboration, three optimizers are executed in sequence, namely the adaptive moment estimation optimizer Adam, the AUC-specific optimizer PESG, and stochastic gradient descent with momentum (SGD), achieving a threefold collaboration of stable baseline updates, AUC-enhancing optimization, and exploratory momentum-based fine-tuning. In each iteration, forward propagation first computes the total loss, backpropagation then yields the gradients, and finally the parameter update method of each optimizer is called in the collaborative order. Step 3: evaluate the model with five-fold cross-validation; extract the attention weights of the last graph attention layer, map them back to the amino acid residue positions of the original sequences, select key residue interaction pairs by applying a contribution threshold, and output a structured report containing the sequences, predicted probability, key residue pairs, and contribution scores. Step 4: after determining the optimal hyperparameter configuration, retrain the final model; during deployment, the system receives the user-input T lymphocyte sequence and antigen peptide sequence, automatically executes the entire process, and outputs the predicted binding probability, a binary classification suggestion, and a list of key residue interaction pairs.
2. The method for predicting T lymphocyte antigen binding based on residual graph attention according to claim 1, characterized in that the forward propagation process of the residual graph attention network is as follows: first, the input node feature matrix X enters the first graph attention convolutional layer, which uses 8 independent attention heads to compute, through a self-attention mechanism, the association strength between each node and each of its neighbors, and then sums the neighbor features weighted by the normalized attention coefficients to generate a new feature representation for the node; the output dimension is the hidden layer size; the first-layer output is passed through a rectified linear unit and dropout to obtain the first layer output features. Then, the first layer output features are input into the second graph attention convolutional layer, which also uses 8 attention heads and keeps an output dimension of 128, to further refine the features; the second-layer output is passed through a rectified linear unit and dropout to obtain the second layer output features. Next, a residual connection is performed: the first layer output features and the second layer output features are added element-wise, that is, the residual-connected feature equals the sum of the first and second layer output features. The residual-connected features are then passed through a rectified linear unit and dropout again to obtain the final node representation vectors. Finally, for each T lymphocyte-antigen peptide node pair to be predicted, the representation vectors of the two nodes are concatenated along the feature dimension and input into a multilayer perceptron.
The multilayer perceptron consists of a first linear layer, a rectified linear unit activation function, and a second linear layer; its output is a single scalar, the logit (log-odds) of binding, which is converted into a binding probability between 0 and 1 by the sigmoid function.