Prediction method for protein sequence disulfide bond connection mode based on forest regression model

A technology concerning protein sequences and their connection modes, applied in the field of disulfide bond prediction in protein sequences in bioinformatics, to achieve the effects of improving prediction accuracy and increasing prediction speed.

Active Publication Date: 2014-09-24
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

In summary, although some progress has been made in the prediction of disulfide bond patterns in the prior art, there is still room for improvement in the prediction accuracy.

Examples

Embodiment Construction

[0027] In order to better understand the technical content of the present invention, specific embodiments are given together with the attached drawings for description as follows.

[0028] As shown in Figure 1, according to a preferred embodiment of the present invention, a method for predicting the disulfide bond connection pattern of a protein sequence based on the regression forest model is implemented through the following steps:

[0029] Step 1, feature extraction: based on the input protein sequence information, perform multi-view feature extraction and feature combination, namely:

[0030] The PSI-BLAST algorithm is used to extract the evolutionary information of the protein sequence, and the PSIPRED algorithm is used to extract its secondary structure information. A sliding window and serial feature combination are then applied to the aforementioned evolutionary information and secondary structure information to extract the features of each cysteine. Mult...
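The sliding-window and serial-combination step described in [0030] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the window half-width, the zero-padding at sequence ends, and the order in which the per-view vectors are concatenated are all assumptions, and the toy matrices stand in for real PSI-BLAST and PSIPRED output.

```python
from typing import Dict, List, Sequence, Tuple


def window_features(profile: Sequence[Sequence[float]],
                    center: int, half_width: int) -> List[float]:
    """Flatten the rows of `profile` in a window around `center`.

    Positions that fall outside the sequence are zero-padded
    (an assumption; the source does not say how ends are handled).
    """
    n_cols = len(profile[0])
    feats: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(profile):
            feats.extend(profile[pos])
        else:
            feats.extend([0.0] * n_cols)
    return feats


def cysteine_pair_features(sequence: str,
                           pssm: Sequence[Sequence[float]],
                           ss_probs: Sequence[Sequence[float]],
                           half_width: int = 2
                           ) -> Dict[Tuple[int, int], List[float]]:
    """Build one feature vector per cysteine pair (i < j).

    Each cysteine contributes its evolutionary window followed by its
    secondary-structure window; the two cysteines' vectors are then
    concatenated serially (hypothetical ordering).
    """
    cys = [i for i, aa in enumerate(sequence) if aa == "C"]
    pairs: Dict[Tuple[int, int], List[float]] = {}
    for a in range(len(cys)):
        for b in range(a + 1, len(cys)):
            i, j = cys[a], cys[b]
            pairs[(i, j)] = (
                window_features(pssm, i, half_width)
                + window_features(ss_probs, i, half_width)
                + window_features(pssm, j, half_width)
                + window_features(ss_probs, j, half_width)
            )
    return pairs


# Toy inputs: 20 columns per residue for the evolutionary profile and
# 3 columns for secondary structure. The values are made up.
toy_seq = "CAC"
toy_pssm = [[float(i + 1)] * 20 for i in range(len(toy_seq))]
toy_ss = [[1.0, 0.0, 0.0] for _ in toy_seq]
toy_pairs = cysteine_pair_features(toy_seq, toy_pssm, toy_ss, half_width=1)
```

With a half-width of 1, each cysteine contributes 3 × 20 + 3 × 3 = 69 values, so each pair vector in this toy setup has 138 entries.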

Abstract

The invention discloses a method for predicting the disulfide bond connection pattern of a protein sequence based on a random forest regression model. The method comprises the following steps: step 1, the feature vector of each cysteine residue pair in the protein sequence is obtained through multi-view feature extraction and feature combination; step 2, for the protein sequence to be predicted and for the training dataset, the feature vectors of all cysteine residue pairs are generated, forming a to-be-predicted sample set and a training sample set respectively; step 3, the distribution of the cysteine samples in the feature space is learned with the random forest algorithm, producing a random forest regression model; step 4, the feature vectors of the to-be-predicted sample set are scored by the random forest regression model to obtain the propensity of each cysteine residue pair to form a disulfide bond, and the disulfide bond connection pattern with the highest score is taken as the final predicted connection pattern of the protein sequence.
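The final selection in step 4 can be illustrated with a short sketch. It assumes that every cysteine takes part in exactly one bond and that the score of a connection pattern is the sum of its pair propensities; the abstract does not spell out either point, so both are assumptions, as are the function names and the toy scores (which stand in for the regression model's outputs). The sketch enumerates every pairing exhaustively, which is practical only for small numbers of cysteines.

```python
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]


def perfect_matchings(items: List[int]) -> Iterator[List[Pair]]:
    """Yield every way to split `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_pattern(cysteines: List[int],
                 scores: Dict[Pair, float]) -> Tuple[List[Pair], float]:
    """Return the pairing with the highest summed pair score.

    `scores` maps (i, j) with i < j to a propensity value; summing the
    values is an assumed way of scoring a whole pattern.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("an even number of cysteines is required")
    best: List[Pair] = []
    best_score = float("-inf")
    for matching in perfect_matchings(sorted(cysteines)):
        total = sum(scores[pair] for pair in matching)
        if total > best_score:
            best, best_score = matching, total
    return best, best_score


# Toy propensities for four cysteines at hypothetical positions.
toy_scores = {
    (5, 12): 0.2, (5, 30): 0.9, (5, 44): 0.1,
    (12, 30): 0.3, (12, 44): 0.8, (30, 44): 0.4,
}
pattern, pattern_score = best_pattern([5, 12, 30, 44], toy_scores)
```

Here the three candidate patterns score roughly 0.6, 1.7, and 0.4, so the selected pattern is (5, 30) and (12, 44). The number of candidate patterns for 2n cysteines is (2n − 1)!!, so a maximum-weight matching algorithm would be the usual substitute for enumeration on larger inputs.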

Description

Technical field

[0001] The invention relates to the technical field of disulfide bond prediction in protein sequences in bioinformatics, and in particular to a method for predicting the disulfide bond connection pattern of a protein sequence based on a regression forest model.

Background technique

[0002] Disulfide bonds are one of the most important structural properties of proteins. They are the primary covalent bonds formed between two cysteine residues in a protein polypeptide chain, and they can be either interchain or intrachain. Disulfide bonds play a very important role in protein folding and stability. Therefore, predicting how the cysteine residues in a protein form disulfide bonds plays a pivotal role in predicting protein structure and function.

[0003] There are many methods for predicting disulfide bonds, for example, the DISULFIND method (A. Ceroni, A. Passerini, A. Vullo et al., "DISULFIND: a disulfide bonding state and cysteine ...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F19/16
Inventor: 李阳, 於东军, 胡俊, 沈红斌, 杨静宇
Owner: NANJING UNIV OF SCI & TECH