Method for predicting potential sensitization in protein

The prediction method belongs to the field of computational biology prediction of protein characteristics. It addresses the high false positive rate of existing methods, their failure to meet practical needs, and their inability to reveal allergen characteristic information, and it improves both sensitivity and specificity.

Status: Inactive. Publication date: 2013-04-17
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

The sequence-based prediction method proposed by FAO/WHO (Food and Agriculture Organization of the United Nations/World Health Organization) distinguishes allergens by the amino acid sequence similarity between the protein under test and known allergen proteins. This method predicts allergen proteins effectively, but its false positive rate is very high. The motif-based prediction method compares the protein under test with characteristic allergen motifs. Compared with the sequence-based method, it improves specificity to a certain extent and reduces the...
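As a rough illustration of why a pure sequence-similarity rule tends to over-predict, the sketch below flags a protein whenever it shares any short exact peptide with a known allergen. The text above does not state the FAO/WHO criteria; the window of six contiguous identical amino acids used here is the commonly cited FAO/WHO 2001 rule and is an assumption of this example, as are the function name and the toy sequences.

```python
def shares_exact_kmer(query, allergens, k=6):
    """Return True if `query` shares any exact k-residue peptide with an allergen.

    Assumption: k=6 mirrors the commonly cited FAO/WHO contiguous-match rule;
    the source text does not give the threshold.
    """
    # Collect every k-residue window found in the known allergen sequences.
    allergen_kmers = {
        seq[i:i + k]
        for seq in allergens
        for i in range(len(seq) - k + 1)
    }
    # Flag the query as soon as one of its windows appears in that set.
    return any(
        query[i:i + k] in allergen_kmers
        for i in range(len(query) - k + 1)
    )


# Toy, made-up sequences: a single shared hexapeptide is enough to flag the query.
print(shares_exact_kmer("MKTAYIAKQR", ["GGTAYIAKGG"]))
```

Because one short chance match is sufficient, unrelated proteins are easily flagged, which is consistent with the high false positive rate described above.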

Examples

Embodiment 1

[0024] Embodiment 1, Dataset Preparation for Prediction Methods

[0025] Allergen protein dataset preparation: the source databases are the Swiss-Prot Allergen Index (an authoritative protein database), the IUIS Allergen Nomenclature (International Union of Immunological Societies Allergen Nomenclature), SDAP (Structural Database of Allergenic Proteins), and ADFS (Allergen Database for Food Safety). After removing redundancy, a total of 1176 allergen protein sequences were obtained, and these sequences were used as the positive set for training the model.

[0026] Non-allergen protein dataset preparation: to construct a feasible negative set, the following steps are taken: 1. Download all protein sequences of Swiss-Prot (version 2010_11), a total of 522,019; 2. Remove all sequences whose similarity to any known allergen is >= 30%; 3. Remove all sequences shorter than 50 amino acids; 4. Randomly select the same number of protein sequences as in the positive set from the res...
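Steps 2 to 4 above can be sketched as a simple filter followed by random sampling. This is only a minimal sketch: the text does not say how similarity is computed, so the example assumes a precomputed table giving, for each candidate, its highest similarity to any known allergen (as a fraction between 0 and 1). The function name, argument names, and the treatment of missing table entries are hypothetical.

```python
import random


def build_negative_set(candidates, max_allergen_similarity, n_positive,
                       min_length=50, similarity_cutoff=0.30, seed=0):
    """Select a negative set of protein IDs.

    candidates: dict mapping protein ID -> amino acid sequence.
    max_allergen_similarity: dict mapping protein ID -> highest similarity to
        any known allergen (0..1). Assumption: IDs missing from this table
        had no detectable similarity and are treated as 0.0.
    n_positive: size of the positive set to match.
    """
    kept = [
        pid for pid, seq in candidates.items()
        # Step 2: drop sequences with >= 30% similarity to a known allergen.
        if max_allergen_similarity.get(pid, 0.0) < similarity_cutoff
        # Step 3: drop sequences shorter than 50 amino acids.
        and len(seq) >= min_length
    ]
    # Step 4: randomly draw as many sequences as the positive set contains
    # (or all remaining sequences if fewer are available).
    rng = random.Random(seed)
    return sorted(rng.sample(kept, min(n_positive, len(kept))))
```

A fixed seed is used here only so that the sampled negative set is reproducible between runs.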

Embodiment 2

[0027] Embodiment 2, Input feature vector construction of SVM

[0028] Physicochemical characteristics: the physicochemical characteristics of the protein collected in this example cover nine aspects: 1. Amino acid composition; 2. Molecular weight; 3. Secondary structure tendency; 4. Hydrophobicity; 5. Polarizability; 6. Solubility; 7. Normalized van der Waals volume; 8. Polarity; 9. Sequence length. The formula for amino acid composition is $$\text{Fraction of amino acid } i = \frac{\text{total number of amino acids } (i)}{\text{total number of amino acids in protein}}$$ Apart from amino acid composition, molecular weight and sequence length, the other six attributes are properties of individual amino acids and can be divided into 2 to 3 catego...
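The amino acid composition formula can be illustrated with a short sketch. It assumes the 20 standard one-letter residue codes and divides by the full sequence length, so any non-standard letters still count toward the denominator; the function name is hypothetical.

```python
from collections import Counter

# Assumption: only the 20 standard amino acids are reported as features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    total = len(seq)  # total number of amino acids in the protein
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}


composition = amino_acid_composition("AAGG")
print(composition["A"], composition["G"])
```

This yields a 20-dimensional block of the feature vector, one entry per amino acid.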

Embodiment 3

[0034] Embodiment 3, Feature selection for building models

[0035] The mRMR (Maximum Relevance Minimum Redundancy) method is used to rank the features before modeling, with the parameters of the mRMR program set to λ=1, m=MID. The ranked feature list is used in the subsequent incremental feature selection process.
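The ranking idea can be sketched as a greedy loop using the mutual information difference (MID) criterion: at each step, pick the feature whose relevance to the class label, minus its average redundancy with the features already chosen, is largest. This is a minimal pure-Python sketch, not the mRMR program referred to above: it assumes features are already discretized, it does not model the λ parameter, and all names are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_mid_rank(features, labels):
    """Rank feature names by the greedy MID criterion.

    features: dict mapping feature name -> list of discrete values.
    labels: list of class labels, same length as each feature column.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    ranked, remaining = [], set(features)

    def score(name):
        if not ranked:
            return relevance[name]
        # Mean mutual information with the already selected features.
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in ranked) / len(ranked)
        return relevance[name] - redundancy

    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked
```

The returned list plays the role of the ranked feature list that is passed on to incremental feature selection.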

[0036] The IFS (Incremental Feature Selection) process is used to build models, evaluate performance, and obtain the optimal feature set. Specifically, the top-ranked feature is first used for modeling and its 10-fold cross-validation performance parameters are calculated; then the top 2 features are used for modeling and their 10-fold cross-validation performance parameters are calculated; and so on, adding the next-ranked feature each time until all features have been added. This yields as many sets of performance parameters as there are features, from which a performance curve is drawn. The performance curve obtained by this method gradually rises from lo...
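The IFS loop described above can be sketched as follows. The `evaluate` callback is a hypothetical stand-in for "train an SVM on these features and return a 10-fold cross-validation score"; how that score is computed, and which performance parameter is used to pick the optimum, are assumptions of this example.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Evaluate growing prefixes of a ranked feature list.

    ranked_features: feature names ordered from most to least important.
    evaluate: callable taking a list of feature names and returning a
        single performance score (higher is better). Hypothetical stand-in
        for 10-fold cross-validation of the model.
    Returns (curve, best_subset), where curve is a list of
    (number_of_features, score) pairs.
    """
    curve = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]  # top-k features
        curve.append((k, evaluate(subset)))
    # max() keeps the first maximum, so ties favor the smaller feature set.
    best_k, _ = max(curve, key=lambda point: point[1])
    return curve, ranked_features[:best_k]


# Toy scoring function with made-up values, used only to show the call shape.
toy_scores = {1: 0.70, 2: 0.90, 3: 0.85}
curve, best = incremental_feature_selection(
    ["f1", "f2", "f3"], lambda subset: toy_scores[len(subset)]
)
print(curve, best)
```

Plotting the `curve` pairs gives the performance curve, and the prefix with the highest score is taken as the selected feature set.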

Abstract

The invention provides a method for predicting the potential sensitization of proteins. The method includes: first, preparing a positive training set and a negative training set; second, encoding various attributes of the proteins to build feature vectors; third, ranking the features with the maximum relevance minimum redundancy (mRMR) method and selecting the optimal features with the incremental feature selection (IFS) method; and fourth, statistically analyzing the selected features and reporting the features significantly related to protein sensitization. The method predicts the potential sensitization of proteins effectively and is more accurate than existing computational biology prediction methods. It also allows effective analysis of the protein characteristics related to sensitization, and is of significance for allergen prediction and for research on protein sensitization.

Description

Technical field

[0001] The invention relates to a computational biology prediction method for protein properties, in particular to a prediction method for the potential allergenicity of proteins.

Background

[0002] Allergies and other hypersensitivity reactions to food and environmental factors are a major cause of chronic disease, affecting approximately 25% of the world's population. Allergens include proteins in food, cold air, hot air, ultraviolet rays, metals, etc. Among them, allergenic proteins may cause great harm to human health. In addition, more and more genetically modified foods have entered daily life, and the potential risk of food allergies has increased accordingly. Therefore, it is necessary to evaluate and predict the potential allergenicity of proteins.

[0003] At present, there are three main methods of allergen prediction in computational biology: the first is the sequence-based method, the second is the motif-based method, and the third is SVM (support...

Application Information

IPC(8): G06F19/18
Inventor: 李婧, 王婧, 张大兵
Owner: SHANGHAI JIAO TONG UNIV