Method for predicting signal peptides and their breakpoints (cleavage sites) using a kernel function based on sequence alignment

A sequence alignment technique and implementation method, applied in the field of bioengineering, that addresses problems such as the low accuracy of breakpoint position prediction, the lack of a clear physical interpretation of the problem, and the difficulty of predicting signal peptides.

Inactive Publication Date: 2006-08-16
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

This diversity makes signal peptide prediction difficult.
2) Encoding problem: an amino acid sequence is formally a string of letters, and it usually has to be further encoded as numerical attributes before it can be processed.
3) Accuracy problem: to be meaningful, signal peptide prediction must reach an accuracy of at least 90%, and prediction of the signal peptide breakpoint must reach at least 70%.
However, the weight matrix algorithm cannot achieve high accuracy on today's data.
In 2000, Nakai K proposed that a neural network can achieve a high signal peptide prediction rate, but its accuracy in locating the breakpoint position is not very high. In addition, it lacks a clear physical interpretation of the problem and is prone to overfitting.
The HMM method is better at distinguishing signal peptides from signal anchors, but its cleavage site predictions are not as good as those of other classical methods.
[0004] A search of the prior-art literature found "Prediction of protein signal sequences and their cleavage sites", published by Chou in "Proteins: Structure, Function, and Genetics", 2001, 42, pp. 136-139, and "Prediction of protein signal sequences and their cleavage sites by statistical rulers", published by Liu et al. Both use a sliding window to split the signal peptide sequence into amino acid fragments of equal length, so that traditional pattern recognition algorithms can be used for prediction. This approach achieves a higher signal peptide prediction rate than the neural network, but its accuracy in locating the breakpoint is still not high.
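The sliding-window idea and the encoding problem described above can be illustrated with a minimal sketch. The function names, the toy sequence, the window width of 4, and the one-hot encoding are all assumptions made for illustration; the cited papers' actual window sizes and encodings are not given here.

```python
# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sliding_windows(sequence, width):
    """Return (start, fragment) for every contiguous fragment of `width` residues."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [(start, sequence[start:start + width])
            for start in range(len(sequence) - width + 1)]


def one_hot(fragment):
    """Encode a fragment as a flat 0/1 vector with 20 entries per residue.

    One-hot coding is only one possible numerical encoding (an assumption here).
    """
    vector = []
    for residue in fragment:
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(residue)] = 1  # raises ValueError on non-standard letters
        vector.extend(row)
    return vector


toy_sequence = "MKTLLLAVAG"  # made-up sequence, not taken from any database
windows = sliding_windows(toy_sequence, 4)
encoded = [one_hot(fragment) for _, fragment in windows]
```

Every fragment has the same length, so every encoded vector has the same dimension, which is what lets a conventional fixed-length classifier be applied to sequences of different lengths.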



Examples


Detailed Description of Embodiments

[0027] The technical solution of the present invention will be further described in detail below in conjunction with specific embodiments.

[0028] The database used in the present invention is that of Nielsen (Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G.: "Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites", "Protein Eng.", 1997, 10, pp. 1-6). The present invention makes predictions on the Human, E.coli, Gram- and Gram+ databases. The numbers of signal peptide sequences and non-signal peptide sequences in these data sets are 416 and 251, 105 and 119, 266 and 186, and 141 and 64, respectively. Each amino acid sequence record includes the sequence category, the amino acid arrangement of the sequence, and the breakpoint position.

[0029] The whole system implementation process is as follows:

[0030] 1. Digitization of attributes.

[0031] Each group of data is processed separately, and the E.c...


Abstract

The invention is a method in the technical field of bioengineering for predicting signal peptides and their breakpoint positions using a kernel function based on sequence alignment. It uses global sequence alignment to handle amino acid sequences of unequal length, and computes the correlation between two amino acid sequences as a measure of their similarity. The similarity matrix is first converted into a positive semidefinite matrix, and new coordinates are then obtained by a spatial transformation; this solves the problem of turning a matrix that does not satisfy the positive semidefinite condition into a kernel matrix. For a new amino acid sequence, the method predicts in the new feature space whether it is a signal peptide and estimates its breakpoint position. The invention improves the accuracy of both signal peptide prediction and breakpoint prediction, which helps in understanding the causes of some diseases and in exploring effective treatments.
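The two ideas in the abstract, an alignment score as a similarity between sequences of unequal length and a positive semidefinite matrix derived from the similarity matrix, can be sketched as follows. This is only an illustration under stated assumptions: the Needleman-Wunsch recurrence with unit match, mismatch and gap scores stands in for the patent's alignment scoring, and the product S·Sᵀ is one standard way to obtain a positive semidefinite matrix, not necessarily the transformation the patent uses. All function names and the toy sequences are hypothetical.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best end-to-end alignment of a and b.

    The unit scores are assumptions; a substitution matrix could be used instead.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against ""
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_matrix(sequences):
    """Pairwise alignment scores; symmetric, but not guaranteed positive semidefinite."""
    return [[global_alignment_score(x, y) for y in sequences] for x in sequences]


def gram_from_similarity(S):
    """Return K = S * S^T, which is positive semidefinite by construction.

    Each sequence is represented by its row of similarities to all sequences,
    and K holds the inner products of those rows (an assumed construction).
    """
    n = len(S)
    return [[sum(S[i][k] * S[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


toy_sequences = ["MKT", "MT", "MKKT"]  # made-up sequences of unequal length
S = similarity_matrix(toy_sequences)
K = gram_from_similarity(S)
```

A matrix such as `K` could then be passed to any kernel classifier that accepts a precomputed kernel; which classifier the patent uses is not stated in this excerpt.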

Description

Technical field

[0001] The present invention relates to a method in the technical field of bioengineering, in particular to a method for predicting signal peptides and their breakpoint positions based on a sequence alignment kernel function.

Background art

[0002] At present, the study of signal peptides has become a hot topic in bioinformatics. Signal peptides play an important role in controlling the secretion pathway of proteins and directing proteins to specific locations, so they have become a key tool in the development of new drugs for gene therapy. However, with the sharp increase in signal peptide sequences entering the databases, identifying signal peptides by experiment alone requires considerable funding and time. It is therefore necessary to develop pattern recognition and machine learning algorithms that automatically identify signal peptides in newly synthesized proteins. Algorithms based on pattern recognition and ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/00, G06F19/22
Inventor: 刘惠, 刘丹青, 姚莉秀, 杨杰
Owner: SHANGHAI JIAO TONG UNIV