Machine learning algorithm for identifying peptides that contain features positively associated with natural endogenous or exogenous cellular processing, transportation and major histocompatibility complex (MHC) presentation

A machine learning technique from the field of histocompatibility and genomics, based on specific mathematical models, that addresses the limited utility of HLA/MHC-restricted prediction tools.

Active Publication Date: 2019-03-01
NEC OncoImmunity AS
AI Technical Summary

Problems solved by technology

This difference in binding may yield algorithms that have learned features of both processing and HLA/MHC binding, rather than the processing features alone. Furthermore, the HLA/MHC-restricted nature of these tools limits their utility in antigen discovery.


Examples

Embodiment 1

[0096] Example 1: Advantage of using matched pairs from the same source protein, and subsequent optimization of the matched-pair training set.

[0097] To investigate the benefit of selecting matched negative peptides from the same proteins as the positive peptides, different training sets were generated in which the matched negative member of each pair was selected either from the same protein or from a random protein. Negative peptides were selected on the basis that their predicted binding affinity was within 10%, within 100%, or within a range of 10%-100% of that of their corresponding positive partner. The different training sets were then used to train SVM algorithms, using as training features the VHSE descriptors and frequency vectors (dimers) spanning the entire peptide length, together with peptide flanking regions 3 amino acids long extracted from the parent protein (subsequently referred to as the "broad" configuration).
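The two sequence-derived features named above, dimer frequency vectors and 3-residue flanking regions, can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the use of overlapping dimers, and the padding character `X` for flanks that run off the end of the protein are all assumptions. VHSE descriptors are omitted because their numeric values are not given in this text.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered amino-acid pairs, in a fixed order.
DIMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIMER_INDEX = {d: i for i, d in enumerate(DIMERS)}


def dimer_frequencies(peptide):
    """Return a 400-element vector of overlapping dimer frequencies."""
    counts = [0.0] * len(DIMERS)
    n_dimers = len(peptide) - 1
    if n_dimers <= 0:
        return counts
    for i in range(n_dimers):
        idx = DIMER_INDEX.get(peptide[i:i + 2])
        if idx is not None:  # skip dimers containing non-standard residues
            counts[idx] += 1.0
    return [c / n_dimers for c in counts]


def flanking_regions(protein, start, length, flank=3, pad="X"):
    """Return the (N-terminal, C-terminal) flanks of protein[start:start+length].

    Flanks that extend past either end of the protein are padded with
    `pad` (an assumption made for this sketch).
    """
    end = start + length
    left = protein[max(0, start - flank):start].rjust(flank, pad)
    right = protein[end:end + flank].ljust(flank, pad)
    return left, right
```

For a peptide occupying positions 3 to 6 of the hypothetical protein `MKTAYIAKQR`, `flanking_regions("MKTAYIAKQR", 3, 4)` gives the flanks `MKT` and `KQR`.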

[0098] Each algorithm was then tested using three different independent test sets called melanoma, thymus...

Embodiment 2

[0100] Example 2: Investigating the impact on performance of the difference in predicted binding affinity between positive and negative members of the training set.

[0101] To study the relationship between the positive and negative members of the matched pairs used for training, different training sets were generated in which the matched negative members were selected on the basis listed in the table below; the training sets were created with binding differences of increasing width.

[0102] Table 1: Creation of training sets with different binding variances

[0103]

[0104] Once the training sets had been generated, they were balanced in size simply by selecting the matched pairs whose positive member was common to all the different sets. The balanced training sets were then used to train 8 different SVM algorithms (using the training features described above). Each algorithm was then tested using the melanoma, thymus, and sample 10 test sets, and the results are shown in Figure 2 ...
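The balancing step described in [0104] amounts to intersecting the positive peptides across all training sets. A minimal sketch, assuming each training set is held as a list of `(positive, negative)` tuples in a dictionary keyed by a set name (this data layout and the function name are hypothetical):

```python
def balance_training_sets(training_sets):
    """Keep only pairs whose positive peptide occurs in every training set.

    training_sets: dict mapping a set name to a list of (positive, negative)
    pairs. If a positive appears more than once within a set, only its first
    pair is kept, so all balanced sets end up the same size.
    """
    positive_sets = [{pos for pos, _ in pairs} for pairs in training_sets.values()]
    common = set.intersection(*positive_sets) if positive_sets else set()

    balanced = {}
    for name, pairs in training_sets.items():
        seen = set()
        kept = []
        for pos, neg in pairs:
            if pos in common and pos not in seen:
                seen.add(pos)
                kept.append((pos, neg))
        balanced[name] = kept
    return balanced
```

Because every balanced set then contains exactly one pair per shared positive, differences in downstream performance can be attributed to how the negatives were chosen rather than to training-set size.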

Embodiment 3

[0106] Example 3 - Optimizing the composition of the negative training set to improve performance.

[0107] To find the best criteria for selecting the negative training set, we created a series of negative datasets in which each negative peptide was selected on the basis that its predicted binding affinity fell within a predefined range of that of its corresponding matched positive partner (as defined in Table 2 below).
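The selection criterion in [0107] can be illustrated with a small filter. The thresholds of Table 2 are not reproduced in this text, so the sketch below takes the range as parameters; it also assumes that "within a range" means the relative difference between the negative's and the positive's predicted IC50, which is an interpretation rather than something the text states. All names are hypothetical.

```python
def relative_difference(pos_ic50, neg_ic50):
    """Absolute difference in predicted IC50, as a fraction of the positive's value."""
    if pos_ic50 <= 0:
        raise ValueError("predicted IC50 of the positive peptide must be positive")
    return abs(neg_ic50 - pos_ic50) / pos_ic50


def eligible_negatives(pos_ic50, candidates, low, high):
    """Return candidate peptides whose relative difference lies in [low, high].

    candidates: list of (peptide, predicted_ic50) tuples.
    low, high: fractional bounds, e.g. 0.10 and 1.00 for a 10%-100% range.
    The result is sorted from the most to the least similar affinity.
    """
    scored = [
        (relative_difference(pos_ic50, ic50), peptide)
        for peptide, ic50 in candidates
    ]
    return [peptide for diff, peptide in sorted(scored) if low <= diff <= high]
```

With a positive peptide predicted at 100 nM and candidates at 105, 150 and 250 nM, a 0%-10% range keeps only the first candidate and a 10%-100% range keeps only the second.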

[0108] Table 2: Different binding thresholds and criteria used to select the negative training set

[0109]

[0110]

[0111] The 28 different training sets were then used to train SVM algorithms. Each algorithm was tested using the sample 10 test set (in which the predicted binding IC50 values of all positive peptides are below 500 nM) and the sample 10 complementary test set (in which the predicted binding IC50 values of all positive peptides are above 500 nM), containing 608 and 5200 peptides respectively.
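The division into a test set and a complementary test set in [0111] is a partition on the predicted IC50 of the positive peptides at 500 nM. A minimal sketch, assuming each test entry is a dictionary with a `positive_ic50` field in nM (a hypothetical layout); the text does not say where a value of exactly 500 nM belongs, so this sketch arbitrarily places it in the complementary set:

```python
IC50_THRESHOLD_NM = 500.0


def split_by_positive_affinity(entries, threshold=IC50_THRESHOLD_NM):
    """Partition test entries by the predicted IC50 of their positive peptide.

    Returns (below, complementary): entries strictly below the threshold,
    and all remaining entries (assumption: ties go to the complementary set).
    """
    below = [e for e in entries if e["positive_ic50"] < threshold]
    complementary = [e for e in entries if e["positive_ic50"] >= threshold]
    return below, complementary
```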

[0112] As shown in Figure 3, panels A to D (red lines), for the sampl...

Abstract

The present invention provides a method for identifying peptides that contain features positively associated with natural endogenous or exogenous cellular processing, transportation and major histocompatibility complex (MHC) presentation. In particular, the method controls for the influence of protein abundance, stability and HLA/MHC binding on processing and presentation, so that a machine-learning algorithm or statistical inference model trained using the method can be applied to any test peptide regardless of its HLA/MHC restriction; that is, the algorithm operates in an HLA/MHC-agnostic manner. This is attained by building positive and negative data sets of peptide sequences (peptides identified or inferred from surface-bound or secreted MHC/peptide complexes in the literature, and those which are not). Specifically, the positive and negative data sets comprise a multiplicity of pairings between individual entries, in which both sequences of a pair are of equal or similar length, and are derived from the same source protein, and/or have similar binding affinities with respect to the HLA/MHC molecule to which the positive peptide of the pair is restricted.
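The pairing constraint in the abstract (same source protein, equal length) can be illustrated by enumerating candidate negative partners with a sliding window over the source protein. This is a sketch under stated assumptions: the function name and the `known_positives` exclusion set are hypothetical, and only the equal-length case is handled, not the "similar length" case the abstract also allows.

```python
def candidate_negatives(protein, positive, known_positives=frozenset()):
    """List same-length windows of `protein` that could partner `positive`.

    Windows equal to the positive peptide, or to any peptide in
    `known_positives`, are excluded. Returns (start_index, sequence) tuples
    in order of first occurrence, without duplicate sequences.
    """
    k = len(positive)
    seen = set()
    candidates = []
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        if window == positive or window in known_positives or window in seen:
            continue
        seen.add(window)
        candidates.append((start, window))
    return candidates
```

A further filter on predicted binding affinity relative to the positive peptide would then narrow these candidates to a single matched negative.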

Description

Technical field

[0001] The present invention relates to methods of identifying peptides comprising features associated with successful cellular processing, transport and major histocompatibility complex presentation by using machine learning algorithms or statistical inference models.

Background

[0002] For decades, the identification of immunogenic antigens in pathogens and tumors has played a central role in vaccine development. Over the past 15 to 20 years, this process has been simplified and enhanced by employing computational methods that reduce the number of antigens that need to be tested. Although the key features determining immunogenicity are not fully understood, it is known that most immunogenic class I peptides (antigens) are produced in the classical pathway by proteasomal cleavage of their parental polypeptide/protein in the cytosol. The peptide is subsequently transported into the endoplasmic reticulum by the TAP transporter, then packaged into empty HLA / ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/20, G16B40/00, G16B40/20, G16B40/30
CPC: G16B40/30, G16B40/20, G06N20/10, G06N20/00, G16B40/00, G16B20/30
Inventor: Richard Stratford, Trevor Clancy
Owner: NEC OncoImmunity AS