Method based on support vector machine for on-line prediction of interaction of protein and nucleic acid

A technology combining a support vector machine with protein sequence analysis, applied in the field of bioinformatics, that achieves high accuracy, short prediction time and low cost.

Status: Inactive
Publication Date: 2010-01-20
SHANGHAI UNIV

AI Technical Summary

Problems solved by technology

For predicting proteins interacting with rRNA, RNA, and DNA, the correct rates of 10-fold cross-validation are 84%, 78%, and 72% respec




Embodiment Construction

[0016] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0017] As shown in Figure 1, the support vector machine-based online prediction method for protein-nucleic acid interaction includes the following steps:

[0018] (1) Establish the training samples for the protein sequence data sets: collect protein sequences from the online protein database SWISS-PROT and construct the training samples from them. The training samples comprise a data set of proteins interacting with DNA, a data set of proteins interacting with RNA, and a data set of proteins interacting with rRNA; data sets can be added or updated as needed. Each data set contains two types of sequences: sequences of proteins that interact with the nucleic acid (DNA, RNA, or rRNA) and sequences of proteins that do not. The specific distribution is as follows in Table...
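The two-type structure of each data set in step (1) can be sketched roughly as follows. This is only an illustrative sketch: the `Sample` class, the `build_dataset` helper, the +1/-1 label convention, the de-duplication step, and the toy sequences are all assumptions introduced here, not details taken from the patent or from SWISS-PROT.

```python
from dataclasses import dataclass

TARGETS = ("DNA", "RNA", "rRNA")  # the three nucleic acid types named in step (1)


@dataclass(frozen=True)
class Sample:
    sequence: str  # one-letter amino acid sequence
    label: int     # +1: interacts with the target nucleic acid, -1: does not


def build_dataset(interacting, non_interacting):
    """Merge both sequence types into one labeled list, skipping blanks and repeats."""
    seen, samples = set(), []
    for label, sequences in ((+1, interacting), (-1, non_interacting)):
        for raw in sequences:
            seq = raw.strip().upper()
            if seq and seq not in seen:
                seen.add(seq)
                samples.append(Sample(seq, label))
    return samples


# Hypothetical toy sequences for illustration only, not real SWISS-PROT entries.
toy_sources = {
    "DNA": (["MKRRKA", "MARTKQ"], ["MGSSLD"]),
    "RNA": (["MKKRGS"], ["MDEELL", "MDEELL"]),  # the repeated entry is dropped
    "rRNA": (["MRKKAR"], ["MEEDLV"]),
}
datasets = {target: build_dataset(*toy_sources[target]) for target in TARGETS}

for target, samples in datasets.items():
    positives = sum(s.label == 1 for s in samples)
    print(target, "interacting:", positives, "non-interacting:", len(samples) - positives)
```

Keeping one binary data set per nucleic acid type mirrors the text's description of separate DNA, RNA and rRNA data sets, and makes it straightforward to add or update a data set independently.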



Abstract

The invention discloses a support vector machine-based method for online prediction of protein-nucleic acid interaction. The method includes the following steps: (1) establishing a training sample set of protein sequence data sets; (2) converting the protein sequence data sets into protein feature data sets; (3) training the support vector machine on the generated protein feature data sets; and (4) reading and converting the protein sequence to be predicted, then predicting online the classification type of its interaction with nucleic acid. The invention can predict whether a protein interacts with nucleic acid in cases where that interaction has not yet been detected. Verification results show that the 10-fold cross-validation accuracy for predicting proteins that interact with rRNA, RNA and DNA reaches 93.75%, 83.41% and 81.85% respectively, and that the accuracy of the models on an external test set is 93.8%, 84.52% and 81.9% respectively. During online prediction, the user only needs to provide the protein sequence on the prediction web page; the sequence data is converted, the support vector machine is trained and the target type is predicted, and the prediction result is output.
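The four steps summarized above (sequence conversion, SVM training, 10-fold cross-validation, and prediction for a user-supplied sequence) can be illustrated with a minimal, standard-library-only sketch. Several things here are assumptions rather than the patent's method: the text available here does not say how sequences are converted, so amino acid composition is used as a stand-in encoding; the trainer is a simple bias-free linear SVM fitted with Pegasos-style sub-gradient descent, not whatever SVM implementation or kernel the inventors used; and the sequences are artificial toy strings.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Assumed encoding: fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


def score(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def predict(w, x):
    """+1 = predicted to interact, -1 = predicted not to interact."""
    return 1 if score(w, x) >= 0 else -1


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on hinge loss with L2 penalty."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            margin = y[i] * score(w, X[i])
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink (regularization)
            if margin < 1:                             # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def k_fold_accuracy(X, y, k=10, seed=0):
    """Hold out each of k folds in turn and return the overall accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in test)
    return correct / len(X)


# Artificial toy sequences (hypothetical): lysine/arginine-rich "interacting"
# strings versus aspartate/glutamate-rich "non-interacting" strings.
positives = ["K" * i + "R" * (6 - i) for i in range(1, 6)]
negatives = ["D" * i + "E" * (6 - i) for i in range(1, 6)]
X = [composition(s) for s in positives + negatives]
y = [1] * len(positives) + [-1] * len(negatives)

cv_accuracy = k_fold_accuracy(X, y, k=10)
print("10-fold cross-validation accuracy:", cv_accuracy)

# "Online" step: convert a newly supplied sequence and classify it.
model = train_linear_svm(X, y)
print("KKRRKR ->", predict(model, composition("KKRRKR")))
```

The toy data is deliberately trivial to separate, so its accuracy says nothing about real proteins; the point is only the shape of the pipeline, where the same `composition` conversion is applied to the training data and to the sequence submitted for prediction.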

Description

Technical field

[0001] The invention relates to a method, based on a support vector machine, for online prediction of the classification type of protein-nucleic acid (DNA, RNA, rRNA) interaction, and belongs to the field of bioinformatics.

Background technique

[0002] Proteins that interact with nucleic acids play extremely important roles in many aspects of gene function. Proteins that interact with DNA play key roles in processes such as transcription, packaging, rearrangement, and repair. Proteins that interact with RNA control protein synthesis by interacting with various RNAs during that process. Proteins that interact with nucleic acids have therefore received extensive interest over the past three decades. Since the Human Genome Project, the number of determined protein sequences has steadily increased, and protein data resources have expanded rapidly. Determining protein-nucleic acid interactions experimentally would be time...


Application Information

IPC(8): G06F19/00, G06N1/00, G06F19/18
Inventor: 袁友浪, 陆文聪, 刘亮, 钮冰, 彭淳容
Owner: SHANGHAI UNIV