Novel protein sequence representation method based on gene ontology information

A protein sequence representation technique based on gene ontology (GO) information. It addresses the low success rate of multi-label protein subcellular localization prediction, broadens the range of problems the representation can be applied to, and improves the prediction success rate.

Active Publication Date: 2017-06-13
上海司默迪医学信息科技有限公司

AI Technical Summary

Problems solved by technology

[0008] The technical problem to be solved by the present invention is to provide a new protein sequence representation method based on gene ontology (GO) information. The method fuses the GO information of other proteins into a new vector description of protein P, in order to address the low success rate of multi-label protein subcellular localization prediction.

Examples


Embodiment Construction

[0020] To make the object, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to an example. The specific example described here only serves to explain the present invention and is not intended to limit it. The example is an algorithm for multi-label prediction of the subcellular localization of animal proteins.

[0021] Using the new gene ontology information-based protein sequence representation method of the present invention, the specific steps are as follows:

[0022] 1) Use the BLAST program to search the Swiss-Prot database for all protein sequences similar to protein sequence P.

[0023] Protein P can be submitted directly to the BLAST tool web page for the Swiss-Prot database at http://www.uniprot.org/blast/ with the default BLAST parameters. BLAST can also be downloaded from NCBI for local configurati...
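Step 1 with a locally installed BLAST can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's own implementation: the query file name and the local database name `swissprot` are hypothetical, and the parser assumes BLAST+ tabular output (`-outfmt 6`) with its default twelve columns. No E-value flag is passed unless requested, so the BLAST defaults apply as in paragraph [0023].

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastp_command(query_fasta, db_name, evalue=None):
    """Build a blastp argument list.

    query_fasta and db_name are hypothetical names chosen by the caller;
    db_name must refer to a locally formatted Swiss-Prot database.
    """
    cmd = ["blastp", "-query", query_fasta, "-db", db_name, "-outfmt", "6"]
    if evalue is not None:
        # Only override the BLAST default when explicitly asked to.
        cmd += ["-evalue", str(evalue)]
    return cmd


def parse_hits(tabular_text):
    """Return distinct subject IDs from tabular output, in order of first appearance."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=OUTFMT6_FIELDS, delimiter="\t"
    )
    hits = []
    for row in reader:
        subject_id = row["sseqid"]
        if subject_id and subject_id not in hits:
            hits.append(subject_id)
    return hits
```

The argument list can be passed to `subprocess.run(..., capture_output=True, text=True)` and the captured standard output handed to `parse_hits`; the resulting IDs are the homologs whose GO information is looked up in the later steps.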


Abstract

The invention relates to a novel protein sequence representation method based on gene ontology (GO) information. The method comprises: using the BLAST program to search the Swiss-Prot database for all protein sequences similar to protein sequence P; submitting all proteins in a training dataset to the GO database and retrieving the GO information of each protein; searching the GO database for the localization-related GO information of protein P; and defining protein P as a discrete vector of M elements, where M is the number of labels in the prediction problem. The GO information of the proteins in the sequence set is thereby fused into a new vector description of protein P, and the dimensionality of the GO representation is greatly reduced. Applied to multi-label protein subcellular localization prediction and multi-label prediction of antimicrobial peptide function, the method can significantly increase the prediction success rate of the corresponding predictor, and it has promising application prospects.
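The fusion step summarized in the abstract can be illustrated with a short sketch. The exact formula is not given in the text available here, so the following is only one plausible reading and an assumption throughout: each of the M elements is taken as the fraction of annotated homologs whose GO terms map to the corresponding label. The `go_to_label` mapping and the function name are hypothetical.

```python
from collections import Counter


def go_fused_vector(homolog_go, go_to_label, labels):
    """Fuse homolog GO annotations into one M-element vector for protein P.

    homolog_go:  dict mapping each homolog ID to its set of GO terms
    go_to_label: dict mapping a GO term to one of the M labels (hypothetical)
    labels:      ordered list of the M labels of the prediction problem
    """
    counts = Counter()
    annotated = 0
    for go_terms in homolog_go.values():
        # Labels supported by at least one GO term of this homolog.
        hit_labels = {go_to_label[t] for t in go_terms if t in go_to_label}
        if hit_labels:
            annotated += 1
            counts.update(hit_labels)
    if annotated == 0:
        # No usable GO information: fall back to an all-zero vector.
        return [0.0] * len(labels)
    return [counts[label] / annotated for label in labels]
```

The point of the sketch is the dimensionality: the vector length equals the number of labels M, regardless of how many distinct GO terms occur in the database.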

Description

Technical field

[0001] The invention relates to the technical fields of bioinformatics, protein pseudo-amino acid composition and traditional protein sequence analysis, and in particular to a new protein sequence representation method based on gene ontology information.

Background technique

[0002] With the advancement of sequencing technology over the past two decades, bioinformatics has entered the post-genome era. How to analyze hundreds of millions of genome sequences is a current research hotspot: in which subcellular compartments a protein functions, what function it has, what secondary, tertiary and quaternary structure it adopts, how these genes make living organisms active, and which proteins may be potential drug targets.

[0003] Due to the time-consuming and labor-intensive reasons for the above-mentioned problems using biological experiment techniques, bioinformatics has been greatly developed in recent yea...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/16, G06F19/18
CPC: G16B15/00, G16B20/00
Inventor: 肖绚, 程翔
Owner: 上海司默迪医学信息科技有限公司