Method for predicting membrane protein beta-barrel transmembrane area based on sparse coding and chain training

The technology applies sparse coding to transmembrane region prediction and is classified under special data processing applications, instruments, and electrical digital data processing. It addresses problems such as reduced prediction accuracy, and its stated effects are improved prediction accuracy, fewer erroneous judgments, and fewer burrs.

Active Publication Date: 2015-05-13
SHANGHAI JIAO TONG UNIV
Cites: 4; Cited by: 13

AI Technical Summary

Problems solved by technology

At present, protein structures are predicted with methods based on statistical information and on the physicochemical properties of membrane proteins (Freeman, T. and Wimley, W. (2010) A highly accurate statistical approach for the prediction of transmembrane beta-barrels. Bioinformatics). Such methods are limited to a small number of structurally simple protein types, such as membrane proteins with few beta-strands. With the rapid development of machine learning, methods such as those based on hidden Markov models (Singh, N. et al. (2011) Tmbhmm: a frequency profile based HMM for predicting the topology of transmembrane beta barrel proteins and the exposure status of transmembrane residues. Biochim. Biophys. Acta BBA Proteins Proteomics, 1814, 664-670) have improved prediction accuracy. For strands of unusual length, however, such as shorter strands, the false positive rate is too high. System noise and other factors that reduce prediction accuracy during feature extraction also remain to be resolved.



Examples


Embodiment 1

[0025] As shown in Figure 1, this embodiment includes the following steps:

[0026] 1) Obtain the position-specific scoring matrix (PSSM), which contains evolutionary information, and the Z-coordinate value, which represents amino acid distance information, as features. The PSSM generated by the PSI-BLAST sequence alignment tool and the residue distance information (Z-score) calculated by the Z-pred software are then normalized. The PSSM represents the evolutionary information of the amino acid sequence and reflects the conservation of amino acids; it has been proven to be an effective protein feature. The Z-score represents amino acid distance information, giving the distance of each residue relative to the membrane center, and is likewise used as an effective protein feature for the transmembrane structure under study.

[0027] The normalization formula for the feature vectors is: f...
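The normalization formula is cut off in this text, so it cannot be reproduced here. As a rough sketch of how normalized PSSM rows and Z-scores might be fused into per-residue feature vectors with the sliding window named in step [3] of Embodiment 2, the Python below substitutes a logistic function for the missing formula. The function names, the window width of 3, and the zero padding at the sequence ends are all assumptions made for illustration and are not taken from the patent.

```python
import math


def logistic(x):
    """Squash a raw score into (0, 1). Assumed stand-in, not the patent's formula."""
    return 1.0 / (1.0 + math.exp(-x))


def window_features(pssm, zscores, window=3, normalize=logistic):
    """Build one feature vector per residue from a centered sliding window.

    pssm:    list of rows, one per residue, each holding the raw PSSM scores
    zscores: list of floats, one per residue
    window:  odd window width (hypothetical value; the text does not state one)
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if len(pssm) != len(zscores):
        raise ValueError("pssm and zscores must have one entry per residue")

    # Per-residue vector: normalized PSSM row followed by the normalized Z-score.
    rows = [[normalize(v) for v in row] + [normalize(z)]
            for row, z in zip(pssm, zscores)]
    width = len(rows[0]) if rows else 0
    pad = [0.0] * width  # assumed padding for positions beyond the termini
    half = window // 2

    features = []
    for i in range(len(rows)):
        vec = []
        for j in range(i - half, i + half + 1):
            vec.extend(rows[j] if 0 <= j < len(rows) else pad)
        features.append(vec)
    return features
```

With a 20-column PSSM and a window of 3, each residue is described by 3 × 21 = 63 values: the residue itself plus one neighbor on each side.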

Embodiment 2

[0051] This embodiment includes the following steps:

[0052] [1] Protein data set: select the data set from the protein database and divide it into a training set and a test set;

[0053] [2] Feature selection: select the position-specific scoring matrix (PSSM), which represents protein evolution information, and the Z-score, which represents the distance of amino acids relative to the membrane;

[0054] [3] Feature extraction: extract feature vectors through normalization and a sliding window;

[0055] [4] Sparse coding: sparse coding is applied to the extracted features, and the resulting sparse coefficients and basis vectors are used to represent the original data. Based on experimental statistics, the number of basis vectors and the number of iterations are set to 128 and 1000 respectively;

[0056] [5] Support vector machine (SVM): the optimal parameters c and g are determined through 5-fold cross-validation and grid search, sele...
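The parameter search in step [5] can be pictured with the minimal Python sketch below, which pairs a 5-fold split with an exhaustive grid over c and g. It does not train an SVM: `score_fold` is a hypothetical callback that is assumed to train a classifier on the training indices and return its accuracy on the held-out fold, and both function names are invented for illustration. The folds are contiguous here; shuffling is left out for brevity.

```python
def k_fold_indices(n_samples, k=5):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for f in range(k):
        size = n_samples // k + (1 if f < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(n_samples, c_values, g_values, score_fold, k=5):
    """Return (mean_score, c, g) for the best pair under k-fold cross-validation.

    score_fold(c, g, train_idx, test_idx) is a hypothetical callback that
    trains on train_idx and returns a score measured on test_idx.
    """
    folds = k_fold_indices(n_samples, k)
    best = None
    for c in c_values:
        for g in g_values:
            total = 0.0
            for f, test_idx in enumerate(folds):
                # Every fold except the held-out one forms the training set.
                train_idx = [i for h, fold in enumerate(folds) if h != f
                             for i in fold]
                total += score_fold(c, g, train_idx, test_idx)
            mean = total / k
            if best is None or mean > best[0]:
                best = (mean, c, g)
    return best
```

A common convention, assumed here rather than stated in the text, is to try powers of two for both parameters, for example `c_values = [2 ** e for e in range(-5, 16, 2)]`.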


Abstract

The invention provides a method for predicting the beta-barrel transmembrane region of membrane proteins based on sparse coding and chain training, and relates to sparse coding technology, a chain learning algorithm, and a support vector machine. The structure of the beta-barrel transmembrane region is predicted computationally, which provides important information for research on protein structure and function. The method introduces a concept from digital image processing: sparse coding is applied to the protein feature matrix to achieve feature dimensionality reduction and denoising. A membrane protein beta-barrel data set is assembled from the protein database PDB, and a position-specific scoring matrix and a Z-score are extracted as features. The position-specific scoring matrix represents amino acid evolution information, and the Z-score represents the position information of amino acid residues. Feature vectors are extracted through a sliding window to achieve multi-feature fusion, and a chain learning algorithm based on an SVM classifier is provided for training the model. The prediction effect is remarkably improved, and jackknife cross-validation shows that the precision can reach 92.5%.
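To make the sparse coding step more concrete, the sketch below computes sparse coefficients for one feature vector against a fixed set of basis vectors using iterative soft-thresholding (ISTA). This is a stand-in chosen for illustration: the abstract does not say which solver the method uses, learning the basis vectors themselves is omitted, and the penalty weight `lam` and all function names are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, basis, lam=0.1, iterations=1000):
    """Approximately minimize 0.5 * ||x - sum_j a_j * basis[j]||^2 + lam * ||a||_1.

    x:     feature vector of length d
    basis: list of k basis vectors, each of length d (assumed already learned)
    lam:   hypothetical sparsity weight; larger values give more zeros
    """
    k, d = len(basis), len(x)
    # The squared Frobenius norm bounds the gradient's Lipschitz constant,
    # so 1 / lipschitz is a safe (if conservative) step size.
    lipschitz = sum(b * b for vec in basis for b in vec)
    if lipschitz == 0:
        raise ValueError("basis must contain a nonzero vector")
    step = 1.0 / lipschitz

    coeffs = [0.0] * k
    for _ in range(iterations):
        recon = [sum(coeffs[j] * basis[j][i] for j in range(k))
                 for i in range(d)]
        residual = [recon[i] - x[i] for i in range(d)]
        grad = [sum(basis[j][i] * residual[i] for i in range(d))
                for j in range(k)]
        # Gradient step on the squared error, then shrinkage for the L1 term.
        coeffs = [soft_threshold(coeffs[j] - step * grad[j], step * lam)
                  for j in range(k)]
    return coeffs
```

Small coefficients are driven to exactly zero, which is how this kind of coding suppresses low-amplitude noise while keeping the dominant components of a feature vector.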

Description

Technical field

[0001] The invention relates to a technology in the field of membrane protein structure prediction and computational intelligence, specifically a method for predicting the beta-barrel transmembrane region of membrane proteins based on sparse coding and chain learning.

Background technique

[0002] At present, with the rapid development of proteome databases, the number of proteins with known structures is increasing, which plays an important role in the study of protein functions. Membrane proteins are embedded in the biomembrane and span the phospholipid bilayer. They are strongly hydrophobic and difficult to crystallize, and solving their structures experimentally is both expensive and time-consuming. Therefore, using computational methods to predict protein structure is a ... However, traditional machine learning methods still have some problems to be solved in the field of protein prediction, such as feature selection and...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/18
Inventors: 沈红斌, 殷曦
Owner: SHANGHAI JIAO TONG UNIV