DNA binding protein prediction method based on local evolution information

A DNA-binding protein prediction method in the field of bioinformatics. It addresses three problems: a large number of local-information parameters weakens the contribution of global information to the model, the model is large and redundant, and prediction efficiency is insufficient. The stated effects are improved training and prediction efficiency and improved model accuracy.

Active Publication Date: 2021-03-12
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] However, most prediction models extract a large number of parameters for local information. The resulting input is so large that it weakens the contribution of global information to the model.
Although algo



Examples


Example Embodiment

[0049] Example

[0050] As shown in Figure 1, in this embodiment, a DNA-binding protein prediction method based on local evolution information includes the following steps:

[0051] Step 1: Feature Extraction

[0052] Given a protein sequence S, denoted as $S_1 S_2 S_3 \ldots S_L$, where $S_i$ ($1 \le i \le L$) is the amino acid (residue) at position i and L is the length of the protein sequence S. Use PSI-BLAST to obtain the protein evolution information as a position-specific scoring matrix (PSSM). The PSSM is an L×20 matrix (L rows and 20 columns) with the following format:

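[0053] $$\mathrm{PSSM} = \begin{bmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,20} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,20} \\ \vdots & \vdots & \ddots & \vdots \\ p_{L,1} & p_{L,2} & \cdots & p_{L,20} \end{bmatrix}$$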

[0054] where L is the length of the original protein sequence, and $p_{i,j}$ ($i = 1, 2, \ldots, L$; $j = 1, 2, \ldots, 20$) is a score reflecting the probability that the residue at position i of the protein sequence mutates into amino acid type j (one of the 20 standard amino acids) during evolution.
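The L×20 matrix described above can be read from a PSI-BLAST ASCII PSSM file. The following is a minimal sketch, not the patent's own code. It assumes each data line is whitespace-separated and starts with a 1-based position, a one-letter residue, and then 20 integer scores in the column order `ARNDCQEGHILKMFPSTWYV`. The function name `parse_ascii_pssm` and the two example rows are hypothetical and for illustration only.

```python
# Column order assumed for PSI-BLAST ASCII PSSM files.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def parse_ascii_pssm(text):
    """Return (sequence, rows), where rows is an L x 20 list of integer scores.

    Assumes each data line looks like: <position> <residue> <20 scores> ...
    Header, blank, and footer lines are skipped.
    """
    sequence = []
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # A data line starts with a position number and a one-letter residue.
        if len(tokens) < 22 or not tokens[0].isdigit() or len(tokens[1]) != 1:
            continue
        try:
            scores = [int(t) for t in tokens[2:22]]
        except ValueError:
            continue  # not a score line after all
        sequence.append(tokens[1])
        rows.append(scores)
    return "".join(sequence), rows


# Made-up two-residue example (illustrative values, not real PSI-BLAST output).
example = """
Last position-specific scoring matrix computed
            A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
    1 M    -1  -1  -2  -3  -1   0  -2  -3  -2   1   2  -1   5   0  -2  -1  -1  -1  -1   1
    2 K    -1   2   0  -1  -3   1   1  -2  -1  -3  -2   5  -1  -3  -1   0  -1  -3  -2  -2
"""
seq, pssm = parse_ascii_pssm(example)
```

Here `pssm[i][j]` plays the role of $p_{i+1,j+1}$, since Python indices start at 0.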

[0055] Dividing the PSSM by rows into k equal sub-matrices gives the following sub-matrix formula:

[0056]

[0057] where $d = (\lambda - 1) \times U(\lambda)$ indicates the starting sequence position of each sub...
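The row-wise segmentation of the PSSM can be sketched as follows. This excerpt does not reproduce the patent's sub-matrix formula or define the term U(λ), so the sketch rests on two assumptions: the k blocks are consecutive and of near-equal size, with earlier blocks one row longer when L is not divisible by k, and each block is summarized by its 20 column means. The function names `split_rows`, `column_means`, and `local_features` are hypothetical.

```python
def split_rows(pssm, k):
    """Split an L x 20 matrix into k consecutive row blocks of near-equal size."""
    length = len(pssm)
    if not 1 <= k <= length:
        raise ValueError("k must be between 1 and the sequence length")
    base, extra = divmod(length, k)
    blocks = []
    start = 0
    for block_index in range(k):
        # The first `extra` blocks get one additional row (an assumption).
        size = base + (1 if block_index < extra else 0)
        blocks.append(pssm[start:start + size])
        start += size
    return blocks


def column_means(block):
    """Average each of the columns over the rows of one block."""
    n_rows = len(block)
    n_cols = len(block[0])
    return [sum(row[j] for row in block) / n_rows for j in range(n_cols)]


def local_features(pssm, k):
    """Concatenate the column means of all k blocks into one k*20 vector."""
    features = []
    for block in split_rows(pssm, k):
        features.extend(column_means(block))
    return features


# Toy 5 x 20 matrix in which every entry of row i equals i.
toy = [[i] * 20 for i in range(5)]
blocks = split_rows(toy, 2)
features = local_features(toy, 2)
```

With k = 2 and L = 5, the toy matrix is split into blocks of 3 and 2 rows, and the resulting vector has 2 × 20 = 40 entries.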



Abstract

The invention discloses a DNA-binding protein prediction method based on local evolution information. The method comprises the following steps: extracting the evolution information of a protein, segmenting it into local evolution information, and obtaining a feature vector for prediction; ranking the features by their contribution to the model using the SVM-RFE + CBR feature selection method, and removing irrelevant features; dividing the filtered feature vectors into five parts by five-fold cross-validation, and inputting four of the parts as a training set to train an SVM model; and, after processing a protein to be predicted in the same way, inputting its feature vector into the SVM model to obtain a prediction result. The invention combines several kinds of protein sequence features, namely the local evolution information, the original evolution information, the amino acid composition, and the dipeptide information of the protein, so that both local and global information about the protein is included and the accuracy of the DNA-binding protein prediction model is improved.
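Two pieces of the abstract can be illustrated with a short sketch: the sequence-level features (amino acid composition and dipeptide composition) and the five-fold partition in which four parts are used for training. This is a simplified illustration under stated assumptions, not the patent's implementation. The compositions are the standard 20- and 400-dimensional frequency vectors, the fold assignment is a plain deterministic interleave without shuffling or stratification, and SVM-RFE + CBR and the SVM itself are not implemented. All function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def amino_acid_composition(seq):
    """20 values: fraction of each standard amino acid in seq."""
    total = max(len(seq), 1)  # avoid division by zero on an empty sequence
    return [seq.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400 values: fraction of each ordered pair of adjacent residues."""
    counts = {a + b: 0 for a in AMINO_ACIDS for b in AMINO_ACIDS}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are ignored
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return [counts[a + b] / total for a in AMINO_ACIDS for b in AMINO_ACIDS]


def five_fold_splits(n_samples, n_folds=5):
    """Yield (train_indices, test_indices), holding each fold out once."""
    folds = [list(range(f, n_samples, n_folds)) for f in range(n_folds)]
    for held_out in range(n_folds):
        train = [i for f in range(n_folds) if f != held_out for i in folds[f]]
        yield sorted(train), folds[held_out]


seq = "MKRAK"  # made-up sequence for illustration
aac = amino_acid_composition(seq)
dpc = dipeptide_composition(seq)
splits = list(five_fold_splits(10))
```

Each `(train, test)` pair in `splits` uses four fifths of the sample indices for training and the remaining fifth for evaluation. In practice the samples would usually be shuffled, and often stratified by class, before being split.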

Description

Technical field

[0001] The invention belongs to the field of DNA-binding protein prediction in bioinformatics, and specifically relates to a DNA-binding protein prediction method based on local evolution information.

Background

[0002] Identifying DNA-binding proteins from sequence information is one of the most challenging problems in genome annotation. DNA-binding proteins play crucial roles in various cellular processes, such as gene expression and transcription. However, identifying them experimentally is time-consuming and expensive. Given the ever-growing volume of data in the post-genome era, a method that quickly and accurately predicts whether a protein is a DNA-binding protein (DBP) is extremely important.

[0003] In recent years, many DBP prediction methods have emerged. They can be roughly divided into two categories: structure-based methods and sequence-based methods. Structure-based methods main...


Application Information

IPC(8): G16B20/00, G16B30/00
CPC: G16B20/00, G16B30/00
Inventor: 於东军, 韩阳
Owner: NANJING UNIV OF SCI & TECH