A deep learning-based glycopeptide mass spectrum prediction device

By using a deep learning-based glycopeptide mass spectrum prediction device, peptide and glycan features are separated and processed, solving the problem of inaccurate glycopeptide mass spectrum prediction in existing technologies and achieving accurate prediction of glycopeptide mass spectra.

CN116825200B | Active | Publication Date: 2026-06-12 | ZJU HANGZHOU GLOBAL SCI & TECH INNOVATION CENT

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: ZJU HANGZHOU GLOBAL SCI & TECH INNOVATION CENT
Filing Date: 2023-07-07
Publication Date: 2026-06-12

Application Information

IPC: G16B35/00; G16B15/20; G16B40/00; G06N3/045; G06N3/0442

AI Tagging

Application Domain

Biostatistics; Biological models

AI Technical Summary

⚠Technical Problem

Existing technologies cannot effectively handle the nonlinear structure of glycopeptides, leading to inaccurate predictions of glycopeptide mass spectra.

⚗Method used

A deep learning-based glycopeptide mass spectrum prediction device is used. The glycopeptide is separated into a peptide part and a glycan part, whose features are processed by a sequence neural network and a tree recurrent neural network, respectively. Feature fusion and spectrum merging modules then generate the complete mass spectrum of the glycopeptide.

🎯Benefits of technology

It enables accurate prediction of the nonlinear structure of glycopeptides, improving the prediction accuracy of glycopeptide mass spectra.

Abstract

The application discloses a deep learning-based glycopeptide mass spectrum prediction device comprising a peptide encoding module, a glycan encoding module, a feature fusion module, a peptide spectrum output module, a glycan spectrum output module, and a spectrum merging module. The prediction process is as follows: the glycopeptide is divided into a peptide part and a glycan part; the peptide part is input into the peptide encoding module to generate multiple feature representations; the glycan part is input into the glycan encoding module to generate multiple feature representations; the feature representations of the peptide part and the glycan part are input into the feature fusion module for feature fusion; the peptide feature representations are input into the peptide spectrum output module, which outputs the peptide partial spectrum; the glycan feature representations are input into the glycan spectrum output module, which outputs the glycan partial spectrum; and the peptide partial spectrum and the glycan partial spectrum are input into the spectrum merging module and combined into the complete mass spectrum of the glycopeptide. The application can handle the nonlinear structure of glycopeptides and accurately predict glycopeptide mass spectra.

Description

Technical Field

[0001] This invention belongs to the fields of mass spectrometry, proteomics and deep learning, and in particular relates to a deep learning-based glycopeptide mass spectrum prediction device.

Background Technology

[0002] Glycosylation is a widespread post-translational modification of proteins that plays a crucial role in many biological processes. Mass spectrometry has become the primary tool for glycosylation proteomics analysis. Developing more accurate glycopeptide mass spectrometry prediction models is one of the effective methods to improve the reliability of protein glycosylation identification.

[0003] Currently, many methods exist for predicting the spectra of conventional peptides, including pDeep (Analytical Chemistry, 2017, 89(23):12690-12697; Analytical Chemistry, 2019, 91(15):9724-9731; Analytical Chemistry, 2021, 93(14):5815-5822), Prosit (Nature Methods, 2019, 16(6):509-518), DeepDIA (Nature Communications, 2020, 11(1):146), DeepPhospho (Nature Communications, 2021, 12(1):6685), and AlphaPeptDeep (Nature Communications, 2022, 13(1):7238). These methods utilize deep neural networks to identify characteristic patterns in the amino acid sequences of peptides, predicting and outputting the peak intensities of peptide fragment ions.

[0004] However, the aforementioned tools can only handle linear peptide sequence data and cannot handle the nonlinear structures of glycopeptides. Therefore, developing new methods to predict the spectra of glycopeptides is an urgent technical problem to be solved.

Summary of the Invention

[0005] To address the problem that existing technologies cannot handle the nonlinear structure of glycopeptides, this invention provides a deep learning-based glycopeptide mass spectrum prediction device that can accurately predict glycopeptide mass spectra.

[0006] A deep learning-based glycopeptide mass spectrum prediction device includes a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor. The computer memory stores a trained glycopeptide mass spectrum prediction model, which includes a peptide encoding module, a glycan encoding module, a feature fusion module, a peptide spectrum output module, a glycan spectrum output module, and a spectrum merging module.

[0007] The computer processor executes the computer program to perform the following steps:

[0008] (1) Glycopeptides are divided into two parts: peptide segments and glycan chains. The peptide segment part includes the amino acid sequence of the glycopeptide and other modifications on each amino acid except for the glycan chain. The glycan chain part is a tree-like representation of monosaccharides and their connection methods.

[0009] (2) Input the peptide segment into the peptide encoding module, and after processing by the sequence neural network model in the peptide encoding module, generate multiple feature representations of the peptide segment.

[0010] (3) Input the glycan part into the glycan encoding module, and after processing by the tree-type recurrent neural network model in the glycan encoding module, generate multiple feature representations of the glycan part;

[0011] (4) Input the multiple feature representations of the peptide part and the multiple feature representations of the glycan part into the feature fusion module for feature fusion to generate new feature representations of the peptide part and glycan part;

[0012] (5) Input the peptide segment feature representation into the peptide spectrum output module, and generate the mass spectrum peak intensity of the corresponding peptide fragment ions according to the correspondence between each theoretical break position of the peptide segment and the amino acid in the sequence, and output the peptide segment spectrum.

[0013] (6) Input the glycan part feature representation into the glycan spectrum output module, perform an aggregation operation based on the correspondence between each theoretical break position of the glycan and the monosaccharides, generate the mass spectrum peak intensity of the corresponding glycan fragment ions, and output the glycan partial spectrum.

[0014] (7) Input the partial mass spectrum of peptide and the partial mass spectrum of glycan into the mass spectrum merging module to synthesize the complete mass spectrum of glycopeptide.
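
Step (1) above can be sketched as a small data structure. This is a minimal illustration, not the patented implementation: the class names `Monosaccharide` and `Glycopeptide`, the helper `split_glycopeptide`, and the example peptide and glycan are all hypothetical, and only the split into a peptide part (sequence plus non-glycan modifications) and a glycan tree comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Monosaccharide:
    """One node of the glycan tree; children are the monosaccharides linked to it."""
    kind: str  # illustrative label such as "HexNAc" or "Hex"
    children: List["Monosaccharide"] = field(default_factory=list)


@dataclass
class Glycopeptide:
    sequence: str                  # amino acid sequence
    modifications: Dict[int, str]  # 0-based residue index -> non-glycan modification
    glycan_site: int               # 0-based index of the glycosylated residue
    glycan: Monosaccharide         # root = monosaccharide attached to the peptide


def split_glycopeptide(gp):
    """Return the peptide part and the glycan part as separate inputs."""
    peptide_part = [(aa, gp.modifications.get(i)) for i, aa in enumerate(gp.sequence)]
    return peptide_part, gp.glycan


def count_nodes(node):
    """Number of monosaccharides in the tree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


# Hypothetical example: a five-residue glycan tree on the third residue.
core = Monosaccharide("HexNAc", [
    Monosaccharide("HexNAc", [
        Monosaccharide("Hex", [Monosaccharide("Hex"), Monosaccharide("Hex")]),
    ]),
])
example = Glycopeptide("LCNGSK", {1: "Carbamidomethyl"}, 2, core)
peptide_part, glycan_part = split_glycopeptide(example)
```

The peptide part stays a linear sequence suitable for a sequence model, while the glycan part keeps its branching, which is what motivates the tree recurrent network in step (3).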

[0015] In step (2), when the peptide segment is input into the peptide encoding module, its amino acid sequence is represented by amino acid type one-hot encoding or amino acid type embedding vector. The other modifications on each amino acid, except for the sugar chain, are represented by modification type one-hot encoding, modification type embedding vector, or vector representation composed of the number of atoms of each element in the modification.

[0016] The sequence neural network model is a unidirectional recurrent neural network, a bidirectional recurrent neural network, a unidirectional long short-term memory network, a bidirectional long short-term memory network, a gated recurrent unit network, a one-dimensional convolutional neural network, a Transformer, or a combination of the above networks.

[0017] In step (3), when the sugar chain part is input into the sugar chain encoding module, its monosaccharide representation is monosaccharide type one-hot encoding, monosaccharide type embedding vector, or vector representation composed of the number of atoms of each element in the monosaccharide.
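
Two of the monosaccharide representations named above, one-hot encoding and the per-element atom-count vector, can be sketched as follows. The vocabulary of four monosaccharide types and the element order are assumptions chosen for illustration; the atom counts are those of the glycosidically linked residues (the free sugar minus one water). The third option, an embedding vector, would be a learned lookup table and is not shown.

```python
# Hypothetical vocabulary and element order; a real model may use others.
MONOSACCHARIDES = ["Hex", "HexNAc", "Fuc", "NeuAc"]
ELEMENTS = ["C", "H", "N", "O"]

# Elemental composition of each residue as it occurs inside a glycan chain.
COMPOSITION = {
    "Hex": {"C": 6, "H": 10, "O": 5},
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},
    "Fuc": {"C": 6, "H": 10, "O": 4},
    "NeuAc": {"C": 11, "H": 17, "N": 1, "O": 8},
}


def one_hot(kind):
    """Monosaccharide type one-hot encoding."""
    return [1 if kind == name else 0 for name in MONOSACCHARIDES]


def atom_count_vector(kind):
    """Vector of the number of atoms of each element in the monosaccharide."""
    return [COMPOSITION[kind].get(element, 0) for element in ELEMENTS]
```

The atom-count form has the practical advantage that a monosaccharide type unseen in training still receives a meaningful input vector, whereas one-hot encoding requires a fixed vocabulary.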

[0018] The tree-based recurrent neural network model employs tree-structured models such as Tree-RNN, Tree-LSTM, and Tree-GRU. When processing the input glycan tree structure, the method for traversing the monosaccharides within the glycan chain can be as follows:

[0019] Traverse from the leaf nodes of the sugar chain to the root node, and generate the feature vector of each node from the input of each node and the feature vector of its child nodes.

[0020] Alternatively, traverse from the root node of the sugar chain to the leaf nodes, and generate the feature vector of each node from the input of each node and the feature vector of its parent node.

[0021] Alternatively, first traverse in one of the directions mentioned above, merge the input of each node with the feature vector generated by that traversal, and then traverse in the other direction to update the feature vector of each node.
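
The traversal orders described above can be sketched with a toy update rule. In this sketch an element-wise sum stands in for the Tree-RNN, Tree-LSTM, or Tree-GRU cell, which is an assumption made purely to keep the example short; the class `Node` and the attribute names `up` and `down` are hypothetical.

```python
def add(u, v):
    """Element-wise sum, used here as a stand-in for a learned tree cell."""
    return [a + b for a, b in zip(u, v)]


class Node:
    def __init__(self, x, children=()):
        self.x = list(x)            # input vector of this monosaccharide
        self.children = list(children)
        self.up = None              # state from the leaf-to-root pass
        self.down = None            # state from the root-to-leaf pass


def bottom_up(node):
    """Leaf-to-root: combine a node's input with its children's states."""
    state = node.x
    for child in node.children:
        state = add(state, bottom_up(child))
    node.up = state
    return state


def top_down(node, parent_state=None):
    """Root-to-leaf: merge input with the bottom-up state, then add the parent's state."""
    merged = add(node.x, node.up)
    node.down = merged if parent_state is None else add(merged, parent_state)
    for child in node.children:
        top_down(child, node.down)


leaf_a = Node([1, 0])
leaf_b = Node([0, 1])
root = Node([1, 1], [leaf_a, leaf_b])
bottom_up(root)   # first pass: leaves to root
top_down(root)    # second pass: root to leaves
```

Running both passes corresponds to the combined option: after the second pass every node's state reflects both its subtree and its ancestors.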

[0022] In step (4), the feature fusion module performs feature fusion by splicing, element-wise addition, element-wise multiplication, or weighted averaging using an attention mechanism.
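
The four fusion options can be illustrated on a single pair of vectors. This is a simplified sketch: it assumes the peptide and glycan features have already been reduced to two vectors of equal length, and the attention scores are computed as dot products with a hypothetical `query` vector, which the text does not specify.

```python
import math


def fuse(u, v, mode, query=None):
    """Fuse vectors u and v by splicing, addition, multiplication, or attention."""
    if mode == "concat":        # splicing
        return list(u) + list(v)
    if mode == "add":           # element-wise addition
        return [a + b for a, b in zip(u, v)]
    if mode == "mul":           # element-wise multiplication
        return [a * b for a, b in zip(u, v)]
    if mode == "attention":     # softmax-weighted average of u and v
        scores = [sum(q * a for q, a in zip(query, u)),
                  sum(q * b for q, b in zip(query, v))]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        w_u, w_v = weights[0] / total, weights[1] / total
        return [w_u * a + w_v * b for a, b in zip(u, v)]
    raise ValueError(f"unknown fusion mode: {mode}")
```

Splicing doubles the feature width, while the other three modes keep it unchanged, so the choice affects the input size of the downstream output modules.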

[0023] In step (6), for each break position of the glycan, the glycan spectrum output module performs an aggregation operation on the feature representations of the lost and remaining monosaccharides, based on the correspondence between the break position and the monosaccharides lost and remaining after the break, generating a feature representation for each break position; the feature representations of the break positions are then aggregated and passed through a fully connected layer and an activation function to generate the mass spectrum peak intensities of the corresponding glycan fragment ions.

[0024] The aggregation operation is taking the maximum, summation, averaging, or weighted aggregation using an attention mechanism.
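
The first aggregation stage, building one feature per break position from the lost and remaining monosaccharides, might look like the sketch below. Several details are assumptions: the glycan is given as a hypothetical `children` dictionary of node ids, each break cuts one parent-child bond, summation is the aggregation, and the lost and remaining aggregates are concatenated. The later aggregation across break positions, the fully connected layer, and the activation are omitted.

```python
def subtree(children, node):
    """Ids of `node` and all monosaccharides below it."""
    ids = [node]
    for child in children.get(node, []):
        ids += subtree(children, child)
    return ids


def vec_sum(vectors, dim):
    """Summation aggregation over a list of feature vectors."""
    out = [0.0] * dim
    for vec in vectors:
        out = [a + b for a, b in zip(out, vec)]
    return out


def break_features(children, root, feats, dim):
    """Map each break (parent, child) to [sum of lost features | sum of remaining features]."""
    all_ids = subtree(children, root)
    result = {}
    for parent, kids in children.items():
        for kid in kids:
            lost = subtree(children, kid)
            remaining = [n for n in all_ids if n not in lost]
            result[(parent, kid)] = (vec_sum([feats[n] for n in lost], dim)
                                     + vec_sum([feats[n] for n in remaining], dim))
    return result


# Hypothetical four-node glycan: 0 is the root, 1 its child, 2 and 3 branch from 1.
children = {0: [1], 1: [2, 3]}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [2.0, 0.0]}
per_break = break_features(children, 0, feats, dim=2)
```

Replacing `vec_sum` with a maximum, a mean, or an attention-weighted sum would give the other aggregation options listed in the text.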

[0025] In step (7), the spectrum merging module passes the feature representations of the peptide segment and the glycan segment after feature fusion through a fully connected layer and an activation function to generate a scaling factor. After adjusting the peak intensities of the peptide segment spectrum and the glycan segment spectrum according to this scaling factor, they are then combined to form the complete mass spectrum of the glycopeptide.
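
A minimal sketch of this merging step follows. The text states only that a fully connected layer and an activation function produce a scaling factor; how that factor is applied is not specified, so this sketch assumes a sigmoid activation and assumes the peptide peaks are multiplied by the factor `r` and the glycan peaks by `1 - r`. The names `weights`, `bias`, and the ion labels are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def merge_spectra(peptide_peaks, glycan_peaks, fused, weights, bias):
    """Scale the two partial spectra by a predicted factor and combine them."""
    # Fully connected layer with one output unit, followed by a sigmoid.
    r = sigmoid(sum(w * f for w, f in zip(weights, fused)) + bias)
    merged = {ion: r * intensity for ion, intensity in peptide_peaks.items()}
    merged.update({ion: (1.0 - r) * intensity for ion, intensity in glycan_peaks.items()})
    return r, merged


factor, spectrum = merge_spectra(
    peptide_peaks={"b2": 1.0, "y3": 0.5},
    glycan_peaks={"Y1": 0.8},
    fused=[0.3, -0.2],
    weights=[0.0, 0.0],   # zero weights and bias give r = 0.5
    bias=0.0,
)
```

Because each partial spectrum is predicted on its own intensity scale, a single learned factor is enough to restore the relative heights of peptide and glycan fragment peaks in the combined spectrum.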

[0026] The training process of the glycopeptide mass spectrometry prediction model is as follows:

[0027] Collect glycopeptide mass spectra in sufficient quantities, label the fragment ion peaks of the peptide and glycan portions, and construct a training dataset.

[0028] Construct a glycopeptide mass spectrum prediction model, and use the prediction of the peak intensities of the complete glycopeptide spectrum, the peptide part, and the glycan part as three objectives for multi-task learning to optimize the model parameters. The loss function of multi-task learning is obtained by combining the loss functions of the complete glycopeptide spectrum, the peak intensities of the peptide part, and the peak intensities of the glycan part.

[0029] When the available glycopeptide training dataset is small, as a preferred technical solution, the glycopeptide mass spectrometry prediction model can be pre-trained on a large-scale non-glycosylated peptide dataset and then trained using the glycopeptide dataset. The process is as follows:

[0030] Collect non-glycosylated mass spectra in sufficient quantities, label the peptide fragment ion peaks, and construct a pre-training dataset; construct the peptide encoding module and peptide spectrum output module in the glycopeptide mass spectrum prediction model, and train the model parameters using the pre-training dataset;

[0031] Collect glycopeptide mass spectra in sufficient quantities, label the fragment ion peaks of the peptide and glycan portions, and construct a training dataset; calculate the peak intensity ratio coefficients of fragment ions in the peptide and glycan portions, and then normalize the peak intensities of fragment ions in the peptide and glycan portions respectively.

[0032] Construct a complete glycopeptide mass spectrum prediction model, in which the peptide encoding module and the peptide spectrum output module adopt the parameters obtained by pre-training; use the prediction of the peak intensities of the complete glycopeptide spectrum, the peptide part, and the glycan part, together with the peak intensity ratio coefficient, as four objectives for multi-task learning to optimize the model parameters;

[0033] The loss function for multi-task learning is obtained by combining the loss functions of the complete glycopeptide spectrum, the peak intensities of the peptide part, the peak intensities of the glycan part, and the peak intensity ratio coefficient.
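
The preprocessing in [0031], computing a ratio coefficient and then normalizing each part separately, can be sketched as below. The text does not define the ratio coefficient, so this sketch assumes it is the maximum peptide peak divided by the sum of the maximum peptide and maximum glycan peaks; the function name `preprocess` and the ion labels are hypothetical.

```python
def preprocess(peptide_peaks, glycan_peaks):
    """Return (ratio coefficient, normalized peptide peaks, normalized glycan peaks)."""
    pep_max = max(peptide_peaks.values())
    gly_max = max(glycan_peaks.values())
    # Assumed definition: share of the peptide base peak in the two base peaks.
    ratio = pep_max / (pep_max + gly_max)
    # Each part is scaled so that its own largest peak equals 1.
    pep_norm = {ion: value / pep_max for ion, value in peptide_peaks.items()}
    gly_norm = {ion: value / gly_max for ion, value in glycan_peaks.items()}
    return ratio, pep_norm, gly_norm


ratio, pep_norm, gly_norm = preprocess(
    {"b2": 200.0, "y3": 100.0},
    {"Y1": 600.0, "Y2": 300.0},
)
```

Normalizing the two parts independently lets the peptide modules reuse targets on the same scale as the non-glycosylated pre-training data, while the ratio coefficient carries the information that normalization removes.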

[0034] Compared with the prior art, the present invention has the following beneficial effects:

[0035] (1) This invention provides a glycopeptide mass spectrum prediction model for processing the nonlinear structure of glycopeptides, thereby achieving accurate prediction of glycopeptide mass spectra;

[0036] (2) This invention provides a training method for a glycopeptide mass spectrum prediction model, which utilizes multi-task learning to simultaneously train the prediction modules for complete glycopeptide spectra, partial peptide spectra, and partial glycan spectra.

Attached Figure Description

[0037] Figure 1 is a schematic diagram of the framework of the deep learning-based glycopeptide mass spectrometry prediction device of the present invention.

Detailed Implementation

[0038] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be noted that the embodiments described below are intended to facilitate the understanding of the present invention and do not constitute any limitation thereof.

[0039] A deep learning-based glycopeptide mass spectrometry prediction device includes a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor. The computer memory stores a trained glycopeptide mass spectrometry prediction model. During prediction, the glycopeptide is first divided into two parts: peptide segments and glycans. The peptide segment includes the amino acid sequence of the glycopeptide and other modifications on each amino acid except for the glycan. The glycan segment is a tree representation of the monosaccharide and its linkage. The two parts are processed by different modules in the device to generate partial mass spectra, which are then spliced together to obtain the glycopeptide mass spectrum.

[0040] As shown in Figure 1, the glycopeptide mass spectrometry prediction model includes:

[0041] The peptide encoding module processes the input peptide sequence using a sequence neural network model to generate feature representations of the peptide part.

[0042] The glycan encoding module processes the input glycan tree structure using a tree recurrent neural network model to generate feature representations of the glycan part.

[0043] The feature fusion module is used to fuse the generated peptide partial feature representations and glycan partial feature representations to generate new peptide partial feature representations and glycan partial feature representations.

[0044] The peptide spectrum output module processes the peptide part feature representations using a sequence neural network model and generates the mass spectrum peak intensities of the corresponding peptide fragment ions based on the correspondence between each theoretical break position of the peptide and the amino acids in the sequence.

[0045] The glycan spectrum output module takes the glycan part feature representations generated using the tree recurrent neural network model and, based on the correspondence between each theoretical break position of the glycan and the monosaccharides, performs an aggregation operation to generate the mass spectrum peak intensities of the corresponding glycan fragment ions.

[0046] The spectrum merging module is used to combine the generated peptide partial spectrum and the generated glycan partial spectrum into a complete mass spectrum of the glycopeptide.
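
The correspondence between theoretical break positions of the peptide and the amino acids in the sequence can be made concrete for b and y fragment ions. The sketch below uses the standard convention that breaking a peptide of length n after residue i yields the b ion containing the first i residues and the y ion containing the last n - i residues; the function name and the example sequence are hypothetical.

```python
def peptide_break_positions(sequence):
    """List each theoretical backbone break with the b and y fragments it produces."""
    n = len(sequence)
    return [
        {
            "position": i,              # break between residue i and residue i + 1
            "b_ion": f"b{i}",           # N-terminal fragment: first i residues
            "y_ion": f"y{n - i}",       # C-terminal fragment: last n - i residues
            "b_residues": sequence[:i],
            "y_residues": sequence[i:],
        }
        for i in range(1, n)
    ]


breaks = peptide_break_positions("LCNGSK")
```

A peptide of length n therefore has n - 1 break positions, so an output module can emit one set of fragment intensities per position along the sequence.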

[0047] The sequence neural network model can be a unidirectional or bidirectional recurrent neural network (RNN), one of its variants such as a long short-term memory (LSTM) network or a gated recurrent unit (GRU) network, a one-dimensional convolutional neural network (CNN), a Transformer, or a combination of the above networks. The tree recurrent neural network model can be a tree-structured RNN (Tree-RNN) or one of its variants, Tree-LSTM and Tree-GRU.

[0048] Tree-RNN and its variants can traverse monosaccharides in sugar chains in one of the following ways:

[0049] (1) Traverse from the leaf nodes of the sugar chain to the root node, and generate the feature vector of each node from the input of each node and the feature vector of its child nodes.

[0050] (2) Traverse from the root node of the sugar chain to the leaf nodes, and generate the feature vector of each node from the input of each node and the feature vector of its parent node.

[0051] (3) First, traverse in one direction and merge the input of each node with the feature vector generated by the traversal. Then, traverse in the other direction and update the feature vector of the node.

[0052] The glycan spectrum output module, based on the correspondence between each break point in the glycan and the lost and remaining monosaccharides after breakage, performs an aggregation operation on the feature representations of the lost and remaining monosaccharides to generate a feature representation for each break point. Then, the feature representations for each break point are aggregated and passed through a fully connected layer and an activation function to generate the mass spectrometry peak intensities of the corresponding glycan fragment ions. The aggregation operation can be one of the following: maximization, summation, averaging, or weighted aggregation incorporating an attention mechanism.

[0053] The fusion method for peptide feature representation and glycan feature representation in the feature fusion module can be splicing, element-wise addition or multiplication, or weighted averaging using attention mechanism.

[0054] The amino acid representation in the peptide sequence input can be one of the following: (1) one-hot encoding of amino acid type; (2) embedding vector of amino acid type.

[0055] The modification of peptide sequence input other than glycan can be represented in one of the following ways: (1) one-hot encoding of modification type; (2) modification type embedding vector; (3) vector representation composed of the number of atoms of each element in the modification.

[0056] The representation of monosaccharides in the glycan input can be one of the following: (1) monosaccharide type one-hot encoding; (2) monosaccharide type embedding vector; (3) vector representation composed of the number of atoms of each element in the monosaccharide.

[0057] The spectrum merging module can generate a scaling factor by passing the fused peptide and glycan feature representations through a fully connected layer and an activation function. The peak intensities of the peptide and glycan spectra are then adjusted according to this scaling factor before being combined to form a complete mass spectrum of the glycopeptide.

[0058] This invention also provides a training method for this glycopeptide mass spectrometry prediction model, comprising the following steps:

[0059] (1) Collect glycopeptide mass spectra that meet the requirements, label the fragment ion peaks of the peptide and glycan parts, and construct a training dataset;

[0060] (2) Construct a glycopeptide mass spectrometry prediction model based on deep neural networks;

[0061] (3) The prediction of the peak intensity of the complete glycopeptide spectrum, the peak intensity of the peptide segment and the peak intensity of the glycan segment are used as three objectives for multi-task learning to optimize the model parameters.

[0062] In some embodiments of the present invention, a pre-trained model can be constructed using mass spectrometry data of non-glycosylated peptides, and then the model can be trained using glycopeptide mass spectrometry data. Specifically, this includes the following steps:

[0063] (1) Collect non-glycosylated mass spectra that meet the requirements, label the peptide fragment ion peaks in them, and construct a pre-training dataset;

[0064] (2) Construct the peptide encoding module and peptide spectrum output module in the glycopeptide mass spectrometry prediction model, and train the model parameters;

[0065] (3) Construct the entire glycopeptide mass spectrum prediction model, wherein the peptide encoding module and the peptide spectrum output module use the parameters obtained from pre-training, and then train the entire model.

[0066] During model training, the loss function for multi-task learning can be obtained by combining the loss functions for the peak intensities of the complete glycopeptide spectrum, the peptide portion, and the glycan portion. The loss functions for the complete glycopeptide spectrum, the peptide portion, and the glycan portion can be the mean squared error (MSE), the mean absolute error (MAE), or the spectral angle (SA) between the predicted and target peak intensities.

[0067]

[0068] In the formula, x and y are the predicted peak intensity vector and the target peak intensity vector, respectively. The loss functions can be combined by summation, arithmetic mean, geometric mean, uncertainty weighting (UW), or dynamic weight averaging (DWA), in which the weights are adjusted dynamically according to the changes in each loss function.
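
The three candidate losses and the simpler combination rules can be sketched as below, with x as the predicted and y as the target peak intensity vector. The formula of paragraph [0067] is not reproduced in this text, so the definitions here are the commonly used ones and are an assumption: in particular, the spectral angle is taken as the arccosine of the cosine similarity, in radians. Uncertainty weighting and DWA are not shown.

```python
import math


def mse(x, y):
    """Mean squared error between predicted and target intensities."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def mae(x, y):
    """Mean absolute error between predicted and target intensities."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def spectral_angle(x, y):
    """Angle between x and y in radians (0 = same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        return math.pi / 2          # convention chosen here for an empty spectrum
    cosine = max(-1.0, min(1.0, dot / norm))   # clamp against rounding error
    return math.acos(cosine)


def combine(losses, how="sum"):
    """Combine per-task losses by summation, arithmetic mean, or geometric mean."""
    if how == "sum":
        return sum(losses)
    if how == "arithmetic":
        return sum(losses) / len(losses)
    if how == "geometric":
        return math.prod(losses) ** (1.0 / len(losses))
    raise ValueError(f"unknown combination: {how}")
```

Unlike MSE and MAE, the spectral angle ignores the overall scale of the two vectors, which suits spectra whose peak intensities have been normalized.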

[0069] In some embodiments of the present invention, the glycopeptide mass spectrum prediction device also predicts the peak intensity ratio coefficients of fragment ions of peptide segments and fragment ions of glycan chains in the glycopeptide spectrum, and its model training method includes the following steps:

[0070] (1) Collect glycopeptide mass spectra that meet the requirements, label the fragment ion peaks of the peptide segment and the glycan segment, and construct a training dataset; preprocess the collected training data, calculate the peak intensity ratio coefficient of fragment ions of peptide segment and fragment ions of glycan segment, and then normalize the peak intensity of fragment ions of peptide segment and fragment ions of glycan segment respectively.

[0071] (2) Construct a glycopeptide mass spectrum prediction model;

[0072] (3) The prediction of peak intensity of complete glycopeptide spectrum, peak intensity of peptide segment, peak intensity of glycan segment, and peak intensity ratio coefficient are used as four objectives for multi-task training to optimize model parameters.

[0073] The loss function for multi-task learning is obtained by combining the loss functions of the complete glycopeptide spectrum, the peak intensities of the peptide and glycan portions, and the peak intensity ratio coefficient; the loss function for the peak intensity ratio coefficient can be either MSE or MAE.

[0074] In this embodiment, a glycopeptide mass spectrometry prediction model is constructed, specifically selected as follows:

[0075] (1) The sequence neural network model of the peptide encoding module adopts bidirectional LSTM;

[0076] (2) The tree recurrent neural network model in the glycan encoding module adopts a Tree-LSTM. It first traverses from the leaf nodes to the root node and, after fusing the input of each node with the feature vector generated by that traversal, traverses from the root node to the leaf nodes to generate the feature vector of each node.

[0077] (3) The fusion method of peptide feature representation and glycan feature representation in the feature fusion module is splicing;

[0078] (4) The sequence neural network model of the peptide spectrum output module adopts bidirectional LSTM;

[0079] (5) The aggregation operation in the glycan spectrum output module is summation;

[0080] (6) In the spectrum merging module, the peptide feature representation and glycan feature representation are processed by a fully connected layer and an activation function to generate the spectrum merging ratio coefficient.

[0081] Constructing a pre-trained model using mass spectrometry data of non-glycosylated peptides includes the following steps:

[0082] (1) Collect a large number (>100,000) of mass spectra of non-glycosylated peptides from the ProteomeXchange website, label the peptide b and y fragment ion peaks, and construct a pre-training dataset.

[0083] (2) Construct the peptide encoding module and peptide spectrum output module in the glycopeptide mass spectrum prediction model, and train the model parameters, using the spectral angle as the loss function and Adam as the optimizer.

[0084] The model is further trained using mass spectrometry data of glycosylated peptides, specifically including the following steps:

[0085] (1) Collect mass spectra of mouse tissue glycopeptides in the required quantity (>10,000 spectra), label the peaks of the peptide b and y fragment ions and the glycan Y fragment ions, and construct a training dataset; preprocess the collected training data, calculate the peak intensity ratio coefficient of peptide fragment ions and glycan fragment ions, and then normalize the peak intensities of peptide fragment ions and glycan fragment ions respectively (so that the maximum peak intensity is 1).

[0086] (2) Construct a complete glycopeptide mass spectrum prediction model, wherein the parameters of the peptide encoding module and the peptide spectrum output module are the parameters of the pre-trained model;

[0087] (3) The predictions of the peak intensity of the complete glycopeptide spectrum, the peak intensity of the peptide segment, the peak intensity of the glycan segment, and the peak intensity ratio coefficient are used as four objectives for multi-task training. The peak intensity loss function is the spectral angle, the ratio coefficient loss function is MSE, and the total loss function is obtained by summing the loss functions of the four objectives with uncertainty weights. The optimizer is Adam.
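
The uncertainty-weighted sum of the four losses can be sketched as follows. The text does not give the weighting formula, so this sketch assumes one common formulation in which each task i has a learnable log-variance s_i and contributes exp(-s_i) * L_i + s_i to the total. The function name and the numeric values are hypothetical, and in training the s_i would be updated by the optimizer together with the model parameters.

```python
import math


def uncertainty_weighted_sum(losses, log_vars):
    """Total loss = sum over tasks of exp(-s_i) * L_i + s_i (assumed formulation)."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


# Hypothetical per-task losses: full spectrum, peptide part, glycan part, ratio coefficient.
task_losses = [0.20, 0.10, 0.30, 0.05]
total_equal = uncertainty_weighted_sum(task_losses, [0.0, 0.0, 0.0, 0.0])
total_down = uncertainty_weighted_sum(task_losses, [math.log(2.0), 0.0, 0.0, 0.0])
```

With all s_i equal to zero the result reduces to plain summation; raising s_i lowers the weight of a noisy task, and the added s_i term keeps the model from driving every weight toward zero.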

[0088] Glycopeptide mass spectra of other samples were collected, and the peaks of the peptide b and y fragment ions and the glycan Y fragment ions were labeled to construct an external validation dataset. The glycopeptides from the validation dataset were input into the trained model, and the peak intensity similarity between the model-predicted glycopeptide spectra and the experimental spectra in the validation dataset was computed to evaluate the model's prediction performance. The prediction similarity is generally 0.9.

[0089] The embodiments described above provide a detailed explanation of the technical solutions and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit the present invention. Any modifications, additions, and equivalent substitutions made within the scope of the principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A deep learning-based glycopeptide mass spectrometry prediction device, comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, characterized in that: the computer memory stores a trained glycopeptide mass spectrum prediction model, which includes a peptide encoding module, a glycan encoding module, a feature fusion module, a peptide spectrum output module, a glycan spectrum output module, and a spectrum merging module; and the computer processor executes the computer program to perform the following steps:
(1) Glycopeptides are divided into two parts: peptide segments and glycan chains. The peptide segment part includes the amino acid sequence of the glycopeptide and other modifications on each amino acid except for the glycan chain. The glycan chain part is a tree-like representation of monosaccharides and their connection methods.
(2) Input the peptide segment into the peptide encoding module, and after processing by the sequence neural network model in the peptide encoding module, generate multiple feature representations of the peptide segment.
(3) Input the glycan part into the glycan encoding module, and after processing by the tree-type recurrent neural network model in the glycan encoding module, generate multiple feature representations of the glycan part;
(4) Input the multiple feature representations of the peptide part and the multiple feature representations of the glycan part into the feature fusion module for feature fusion to generate new feature representations of the peptide part and glycan part;
(5) Input the peptide segment feature representation into the peptide spectrum output module, and generate the mass spectrum peak intensity of the corresponding peptide fragment ions according to the correspondence between each theoretical break position of the peptide segment and the amino acid in the sequence, and output the peptide segment spectrum.
(6) Input the glycan part feature representation into the glycan spectrum output module, perform an aggregation operation based on the correspondence between each theoretical break position of the glycan and the monosaccharides, generate the mass spectrum peak intensity of the corresponding glycan fragment ions, and output the glycan partial spectrum.
(7) Input the partial mass spectrum of peptide and the partial mass spectrum of glycan into the mass spectrum merging module to synthesize the complete mass spectrum of glycopeptide.

2. The deep learning-based glycopeptide mass spectrometry prediction device according to claim 1, characterized in that, In step (2), when the peptide segment is input into the peptide encoding module, its amino acid sequence is represented by amino acid type one-hot encoding or amino acid type embedding vector. The other modifications on each amino acid, except for the sugar chain, are represented by modification type one-hot encoding, modification type embedding vector, or vector representation composed of the number of atoms of each element in the modification.

3. The deep learning-based glycopeptide mass spectrometry prediction device according to claim 1, characterized in that, in step (2), the sequence neural network model is a unidirectional recurrent neural network, a bidirectional recurrent neural network, a unidirectional long short-term memory network, a bidirectional long short-term memory network, a gated recurrent unit network, a one-dimensional convolutional neural network, a Transformer, or a combination of the above networks.

4. The deep learning-based glycopeptide mass spectrometry prediction device according to claim 1, characterized in that, in step (3), when the glycan part is input into the glycan encoding module, each monosaccharide is represented by a one-hot encoding of the monosaccharide type, a monosaccharide type embedding vector, or a vector composed of the number of atoms of each element in the monosaccharide.
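
Two of the monosaccharide representations listed in claim 4 can be sketched as follows. The set of monosaccharide types, their order, and the element order are assumptions chosen for illustration; the compositions are the standard residue compositions (free sugar minus one water).

```python
ELEMENTS = ("C", "H", "N", "O")  # assumed element order

# Elemental compositions of common monosaccharide residues.
MONOSACCHARIDES = {
    "Hex": {"C": 6, "H": 10, "O": 5},
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},
    "Fuc": {"C": 6, "H": 10, "O": 4},
    "NeuAc": {"C": 11, "H": 17, "N": 1, "O": 8},
}
TYPES = tuple(MONOSACCHARIDES)  # fixed type order used for one-hot encoding


def one_hot(name):
    """One-hot encoding of the monosaccharide type."""
    return [1.0 if t == name else 0.0 for t in TYPES]


def atom_counts(name):
    """Vector of the number of atoms of each element in the monosaccharide."""
    comp = MONOSACCHARIDES[name]
    return [float(comp.get(el, 0)) for el in ELEMENTS]
```

The atom-count form has the practical advantage that a monosaccharide type unseen during training still maps to a meaningful vector, whereas a one-hot code has no slot for it.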

5. The deep learning-based glycopeptide mass spectrometry prediction device according to claim 1, characterized in that, in step (3), the tree-structured recurrent neural network model adopts a tree-structured architecture such as Tree-RNN, Tree-LSTM, or Tree-GRU. When processing the input glycan tree structure, the tree-structured recurrent neural network model traverses the monosaccharides in the glycan in one of the following ways: traverse from the leaf nodes of the glycan to the root node, generating the feature vector of each node from the input of that node and the feature vectors of its child nodes; or traverse from the root node of the glycan to the leaf nodes, generating the feature vector of each node from the input of that node and the feature vector of its parent node; or first traverse in one of the above directions, merge the input of each node with the feature vector generated by that traversal, and then traverse in the other direction to update the feature vector of each node.
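
The third traversal option in claim 5 (one pass in each direction) can be sketched with a plain Tree-RNN cell. This is a minimal illustration, not the patented model: the cell form `tanh(W_in x + W_ctx ctx)`, the summation of child features, the concatenation used to merge the node input with the bottom-up feature, and all weight values are assumptions.

```python
import math


class Node:
    def __init__(self, x, children=()):
        self.x = list(x)                # input vector of this monosaccharide
        self.children = list(children)
        self.h_up = None                # feature from the leaf-to-root pass
        self.h_down = None              # feature from the root-to-leaf pass


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def cell(w_in, w_ctx, x, ctx):
    """Plain Tree-RNN cell: tanh(W_in x + W_ctx ctx)."""
    return [math.tanh(p + q) for p, q in zip(matvec(w_in, x), matvec(w_ctx, ctx))]


def bottom_up(node, w_in, w_ctx, hidden):
    """Leaf-to-root pass: context is the sum of the child feature vectors."""
    ctx = [0.0] * hidden
    for child in node.children:
        child_h = bottom_up(child, w_in, w_ctx, hidden)
        ctx = [c + h for c, h in zip(ctx, child_h)]
    node.h_up = cell(w_in, w_ctx, node.x, ctx)
    return node.h_up


def top_down(node, w_in, w_ctx, hidden, parent_h=None):
    """Root-to-leaf pass over inputs merged with the bottom-up features."""
    ctx = parent_h if parent_h is not None else [0.0] * hidden
    node.h_down = cell(w_in, w_ctx, node.x + node.h_up, ctx)
    for child in node.children:
        top_down(child, w_in, w_ctx, hidden, node.h_down)


# N-glycan core fragment: HexNAc - HexNAc - Hex with two Hex branches.
hexnac, hexose = [1.0, 0.0], [0.0, 1.0]
leaf_a, leaf_b = Node(hexose), Node(hexose)
root = Node(hexnac, [Node(hexnac, [Node(hexose, [leaf_a, leaf_b])])])

W_UP_IN = [[0.5, -0.5], [0.25, 0.75]]
W_UP_CTX = [[0.1, 0.2], [-0.3, 0.4]]
W_DOWN_IN = [[0.2, 0.1, -0.4, 0.3], [-0.1, 0.6, 0.5, 0.2]]
W_DOWN_CTX = [[0.3, -0.2], [0.1, 0.1]]

bottom_up(root, W_UP_IN, W_UP_CTX, hidden=2)
top_down(root, W_DOWN_IN, W_DOWN_CTX, hidden=2)
```

After both passes, each node's `h_down` depends on its own input, on its subtree (through `h_up`), and on its ancestors, so every monosaccharide feature reflects the whole glycan.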

6. The deep learning-based glycopeptide mass spectrometry prediction device according to claim 1, characterized in that, in step (4), the feature fusion module performs feature fusion by concatenation, element-wise addition, element-wise multiplication, or weighted averaging using an attention mechanism.
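
The four fusion operations named in claim 6 can be sketched on plain Python lists. The function names and the use of dot-product scores with a softmax for the attention variant are assumptions for this example; the claim only requires weighted averaging with an attention mechanism.

```python
import math


def fuse_concat(p, g):
    """Concatenation of a peptide feature vector and a glycan feature vector."""
    return p + g


def fuse_add(p, g):
    """Element-wise addition (vectors must have equal length)."""
    return [a + b for a, b in zip(p, g)]


def fuse_mul(p, g):
    """Element-wise multiplication (vectors must have equal length)."""
    return [a * b for a, b in zip(p, g)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention(query, vectors):
    """Weighted average of `vectors`, weighted by softmax(query . vector)."""
    weights = softmax([sum(q * v for q, v in zip(query, vec)) for vec in vectors])
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


# One peptide position attends over two glycan node features.
glycan_nodes = [[1.0, 0.0], [3.0, 2.0]]
context = fuse_attention([0.0, 0.0], glycan_nodes)  # zero query gives a plain mean
```

Concatenation preserves both inputs unchanged but doubles the feature width; the other three keep the width fixed, which matters when the fused features feed layers with a fixed input size.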

7. The deep learning-based glycopeptide mass spectrometry prediction device according to claim 1, characterized in that, in step (6), for each break position of the glycan, the glycan spectrum output module performs an aggregation operation on the feature representations of the lost monosaccharides and of the remaining monosaccharides, according to the correspondence between the break position and the monosaccharides lost and remaining after the break, generating a feature representation for each break position; the feature representations of the break positions are then aggregated and passed through a fully connected layer and an activation function to generate the mass spectrum peak intensity of the corresponding glycan fragment ion. The aggregation operation takes the maximum, the sum, or the average, or is a weighted aggregation using an attention mechanism.
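
The two-level aggregation in claim 7 can be sketched as below. Several details are assumptions for illustration: the lost and remaining aggregates are concatenated to form the break-position feature, the fully connected layer has a single output unit, the activation is a sigmoid, and the attention-weighted variant is omitted.

```python
import math


def aggregate(vectors, how="mean"):
    """Aggregate a list of equal-length vectors by max, sum, or mean."""
    columns = list(zip(*vectors))
    if how == "max":
        return [max(c) for c in columns]
    if how == "sum":
        return [sum(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    raise ValueError(f"unknown aggregation: {how}")


def fragment_intensity(node_feats, breaks, weights, bias, how="mean"):
    """Peak intensity of one glycan fragment ion.

    breaks: list of (lost_ids, kept_ids) pairs, one per break position.
    """
    break_reps = []
    for lost_ids, kept_ids in breaks:
        lost = aggregate([node_feats[i] for i in lost_ids], how)
        kept = aggregate([node_feats[i] for i in kept_ids], how)
        break_reps.append(lost + kept)       # assumed: concatenate both sides
    frag = aggregate(break_reps, how)        # aggregate over break positions
    z = sum(w * f for w, f in zip(weights, frag)) + bias
    return 1.0 / (1.0 + math.exp(-z))        # sigmoid keeps intensity in (0, 1)


# Three monosaccharides; the fragment loses node 2 and keeps nodes 0 and 1.
node_feats = {0: [0.2, 0.4], 1: [0.6, 0.0], 2: [1.0, 0.8]}
intensity = fragment_intensity(
    node_feats, [([2], [0, 1])], weights=[0.5, -0.5, 1.0, 0.25], bias=0.0
)
```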

8. The deep learning-based glycopeptide mass spectrometry prediction device according to claim 1, characterized in that, in step (7), the spectrum merging module passes the fused feature representations of the peptide part and the glycan part through a fully connected layer and an activation function to generate a scaling factor, adjusts the peak intensities of the peptide part spectrum and the glycan part spectrum according to this scaling factor, and then combines the two adjusted spectra to form the complete mass spectrum of the glycopeptide.
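
A minimal sketch of the merging step in claim 8 follows. The claim does not say how the scaling factor is applied; this example assumes a sigmoid output `s` that scales peptide peaks by `s` and glycan peaks by `1 - s`. The ion labels and weight values are hypothetical.

```python
import math


def scaling_factor(fused, weights, bias):
    """Single-unit fully connected layer followed by a sigmoid."""
    z = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def merge_spectra(peptide_peaks, glycan_peaks, scale):
    """Rescale both partial spectra and combine them into one spectrum.

    Assumption: peptide peaks are multiplied by `scale` and glycan peaks
    by `1 - scale`, so the factor sets their relative abundance.
    """
    merged = {ion: inten * scale for ion, inten in peptide_peaks.items()}
    merged.update(
        {ion: inten * (1.0 - scale) for ion, inten in glycan_peaks.items()}
    )
    return merged


s = scaling_factor([0.3, -0.1], weights=[0.0, 0.0], bias=0.0)  # zero weights: s = 0.5
spectrum = merge_spectra({"b2": 1.0, "y1": 0.5}, {"Y1": 0.8}, s)
```

The factor is needed because each partial spectrum is predicted on its own intensity scale, so the relative abundance of peptide and glycan fragments has to be restored before the peaks are combined.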

9. The deep learning-based glycopeptide mass spectrometry prediction device according to claim 1, characterized in that, the training process of the glycopeptide mass spectrum prediction model is as follows: collect a sufficient number of glycopeptide mass spectra, label the fragment ion peaks of the peptide part and the glycan part, and construct a training dataset; construct the glycopeptide mass spectrum prediction model, and use the prediction of the peak intensities of the complete glycopeptide spectrum, of the peptide part, and of the glycan part as three objectives for multi-task learning to optimize the model parameters; the loss function for multi-task learning is obtained by combining the loss functions for the complete glycopeptide spectrum, the peak intensities of the peptide part, and the peak intensities of the glycan part.
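
The combined loss in claim 9 can be sketched as a weighted sum of three per-task losses. The claim does not name the per-task loss or the combination rule; mean squared error, the weighted sum, and the dictionary keys are assumptions for this example.

```python
def mse(pred, target):
    """Mean squared error between two equal-length intensity lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(pred, target, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the losses for the three training objectives."""
    tasks = ("full", "peptide", "glycan")  # hypothetical key names
    terms = [mse(pred[k], target[k]) for k in tasks]
    return sum(w * t for w, t in zip(weights, terms))


pred = {"full": [1.0, 2.0], "peptide": [0.0], "glycan": [1.0]}
target = {"full": [1.0, 4.0], "peptide": [1.0], "glycan": [1.0]}
loss = multitask_loss(pred, target, weights=(1.0, 0.5, 0.5))
```

Here the full-spectrum term contributes 2.0, the peptide term 0.5, and the glycan term 0.0.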

10. The deep learning-based glycopeptide mass spectrometry prediction device according to claim 9, characterized in that, the training process of the glycopeptide mass spectrum prediction model is as follows: collect a sufficient number of mass spectra of non-glycosylated peptides, label the peptide fragment ion peaks, and construct a pre-training dataset; construct the peptide encoding module and the peptide spectrum output module of the glycopeptide mass spectrum prediction model, and train their parameters on the pre-training dataset; collect a sufficient number of glycopeptide mass spectra, label the fragment ion peaks of the peptide part and the glycan part, and construct a training dataset; calculate the peak intensity ratio coefficient between the fragment ions of the peptide part and those of the glycan part, and then normalize the fragment ion peak intensities of the peptide part and of the glycan part separately; construct the complete glycopeptide mass spectrum prediction model, in which the peptide encoding module and the peptide spectrum output module adopt the parameters obtained by pre-training; use the prediction of the peak intensities of the complete glycopeptide spectrum, the peak intensities of the peptide part, the peak intensities of the glycan part, and the peak intensity ratio coefficient as four objectives for multi-task learning to optimize the model parameters; the loss function for multi-task learning is obtained by combining the loss functions for the complete glycopeptide spectrum, the peak intensities of the peptide part, the peak intensities of the glycan part, and the peak intensity ratio coefficient.
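
The label preparation described in claim 10 can be sketched as follows. The claim does not define the ratio coefficient or the normalization; this example assumes the coefficient is the peptide share of the total labeled intensity and that each part is normalized by its own maximum peak.

```python
def ratio_coefficient(peptide, glycan):
    """Assumed definition: peptide share of the total labeled intensity."""
    total = sum(peptide) + sum(glycan)
    return sum(peptide) / total if total else 0.0


def normalize(peaks):
    """Assumed normalization: divide by the largest peak in the same part."""
    top = max(peaks, default=0.0)
    return [p / top for p in peaks] if top else list(peaks)


peptide_peaks = [2.0, 4.0]  # labeled peptide fragment ion intensities
glycan_peaks = [1.0, 1.0]   # labeled glycan fragment ion intensities

ratio = ratio_coefficient(peptide_peaks, glycan_peaks)
peptide_target = normalize(peptide_peaks)
glycan_target = normalize(glycan_peaks)
```

Normalizing the two parts separately discards their relative scale, which is why the ratio coefficient is kept as its own training target.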