A Protein Subcellular Interval Prediction Method Using Bag-of-Words Model

A technology combining subcellular localization prediction with a bag-of-words model, applied in the field of biology, that addresses the low accuracy of existing methods and improves recognition accuracy.

Status: Inactive; Publication Date: 2018-01-26
JIANGNAN UNIV +1
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] A review of previous research shows that prediction accuracy is low when a traditional protein sequence feature extraction algorithm such as amino acid composition (AAC) is simply used to extract features that are then fed to a classifier for localization prediction.
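
For reference, the AAC baseline represents a whole sequence by the relative frequency of each of the 20 standard amino acids, so residue order is discarded entirely. A minimal Python sketch, assuming the standard one-letter alphabet and simply skipping non-standard letters such as X (the text shown here does not say how these are handled); the function name `aac` is illustrative:

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a sequence.

    Entry i is the fraction of residues equal to AMINO_ACIDS[i].
    Non-standard letters are ignored (an assumption of this sketch).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues at all: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

For example, `aac("ACDA")` gives 0.5 for A and 0.25 each for C and D, with every other entry zero.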

Examples

Detailed Description of the Embodiments

[0049] The present invention will be further described below in conjunction with specific examples.

[0050] Taking as an example a dataset of 317 apoptosis protein sequences obtained from the SWISS-PROT database, the bag-of-words model and the AAC algorithm are used to extract bag-of-words features from each protein sequence, and these features are sent to a support vector machine (SVM) multi-class classifier for localization prediction. Figure 1 is a schematic diagram of the bag-of-words feature extraction process; as shown in Figure 1, the specific steps are as follows. In the formulas of the present invention, the symbol Λ denotes omitted items in a sequence.

[0051] 1. After the dataset has been obtained from the source database, a sliding window is first used to segment every protein sequence in the dataset into a number of sequence words, and the features of all sequence words are then extracted. The specific steps are as follows:

[0052] First, the protein sequence is segment...
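
The segmentation and word-feature steps can be sketched in Python as follows. The visible text is cut off before the window parameters are given, so `width` and `step` are hypothetical parameters, and treating a sequence shorter than the window as a single word is likewise an assumption. Each word is described by its 20-dimensional AAC vector, as the abstract states; all function names are illustrative.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sliding_window_words(sequence: str, width: int, step: int = 1) -> list[str]:
    """Cut a protein sequence into overlapping 'sequence words'.

    A window of `width` residues starts every `step` positions (hypothetical
    parameters). A sequence no longer than the window yields one word.
    """
    if width < 1 or step < 1:
        raise ValueError("width and step must be positive")
    if len(sequence) <= width:
        return [sequence]
    return [sequence[i:i + width] for i in range(0, len(sequence) - width + 1, step)]


def word_aac(word: str) -> list[float]:
    """20-dimensional amino acid composition of one sequence word."""
    counts = Counter(ch for ch in word.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def sequence_word_features(sequence: str, width: int, step: int = 1) -> list[list[float]]:
    """AAC feature vector for every sliding-window word of one sequence."""
    return [word_aac(w) for w in sliding_window_words(sequence, width, step)]
```

For example, `sliding_window_words("ACDEFG", 3)` returns `["ACD", "CDE", "DEF", "EFG"]`. With `step > 1`, trailing residues that cannot start a full window are not covered by a word of their own.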

Abstract

The invention discloses a method for predicting protein subcellular localization using a bag-of-words model. A sliding window is used to segment protein sequences into a large collection of sequence words; amino acid composition is used to obtain the feature of each sequence word; cluster analysis of the sequence-word features is used to construct a dictionary; the bag-of-words feature of each protein sequence is obtained through statistical calculation; and finally the bag-of-words features are sent to a support vector machine multi-class classifier for subcellular localization prediction. Experiments show that the invention effectively improves recognition accuracy, with significant gains on the subcellular classes for which traditional methods achieve low prediction accuracy, which is important for accurately predicting the subcellular location of unknown proteins.
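
The dictionary and histogram steps are sketched below. The text says only "cluster analysis" and "statistical calculation", so plain k-means clustering and a normalized count of nearest-centroid assignments are assumptions here, as are the names `build_dictionary` and `bag_of_words` and the dictionary size `k`. Whatever the sequence length, each protein ends up as a fixed-length vector of `k` frequencies, which is what makes it usable as classifier input.

```python
import random


def _sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(vec: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(vec, centroids[j]))


def build_dictionary(features: list[list[float]], k: int, iters: int = 50,
                     seed: int = 0) -> list[list[float]]:
    """Cluster sequence-word features with plain k-means (an assumption).

    The k centroids play the role of the dictionary's 'visual words'.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(features, k)]
    for _ in range(iters):
        clusters: list[list[list[float]]] = [[] for _ in range(k)]
        for v in features:
            clusters[_nearest(v, centroids)].append(v)
        new = []
        for old, members in zip(centroids, clusters):
            if members:
                # Column-wise mean of the vectors assigned to this cluster.
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(old)  # keep an empty cluster's previous centroid
        if new == centroids:
            break  # assignments have stabilized
        centroids = new
    return centroids


def bag_of_words(word_features: list[list[float]],
                 dictionary: list[list[float]]) -> list[float]:
    """Normalized histogram of nearest-dictionary-word counts for one protein."""
    hist = [0.0] * len(dictionary)
    for v in word_features:
        hist[_nearest(v, dictionary)] += 1
    n = len(word_features)
    return [h / n for h in hist] if n else hist
```

In use, the word features of all training proteins would be pooled to build one dictionary, and `bag_of_words` would then be applied to each protein's own word features to produce the vectors passed to the SVM.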

Description

Technical Field
[0001] The invention relates to the field of biology, and in particular to a method for predicting protein subcellular localization using a bag-of-words model.

Background
[0002] Research in the life sciences has changed dramatically with the rapid development of computer technology. Since the beginning of the post-genome era, large-scale nucleic acid and protein sequence data have become available, and extracting effective information from these data has become an inevitable trend. In previous studies, researchers in China and abroad mainly used mathematical methods to describe feature information extracted from protein sequences, represented each protein sequence as a high-dimensional feature vector, and then designed efficient classifiers for predictive analysis.

[0003] At present, the algorithms used for protein sequence feature extraction mainly include amino acid composition (AAC), physicochemical properties of amino acids, dipeptide and polypeptide composition, pseudo am...
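
By contrast with AAC, the dipeptide composition listed in [0003] counts adjacent residue pairs and so keeps some local order information, at the cost of a 400-dimensional vector. A minimal sketch, assuming frequencies are normalized over pairs made only of the 20 standard residues; the names are illustrative:

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs in a fixed order: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence: str) -> list[float]:
    """400-dimensional frequency vector of adjacent residue pairs.

    Pairs containing a non-standard residue are skipped, and frequencies
    are taken over the valid pairs only (assumptions of this sketch).
    """
    seq = sequence.upper()
    vec = [0.0] * len(DIPEPTIDES)
    valid = 0
    for i in range(len(seq) - 1):
        idx = _INDEX.get(seq[i:i + 2])
        if idx is not None:
            vec[idx] += 1
            valid += 1
    return [v / valid for v in vec] if valid else vec
```

For example, `"AAC"` contains the pairs AA and AC, so each receives a frequency of 0.5.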

Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F19/24
Inventors: Zhang Liang (张梁), Xue Wei (薛卫), Zhao Nan (赵南)
Owner: JIANGNAN UNIV