
Protein domain detection method and system based on cost-sensitive LSTM network

A cost-sensitive LSTM-based technology, applied in the biomedical field, addresses problems of earlier methods such as unbalanced sample sets, the inability to model long-range correlations in protein sequences, and the dependence of distance-correlation handling on sliding-window size. Its effects are improved accuracy, fewer false positive results, and better adaptability.

Active Publication Date: 2019-03-26
SHANGHAI JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

However, the existing technology cannot model the long-range correlations of protein sequences. Its ability to handle distance correlation depends on the size of the sliding window, and within the window the perception between amino acid residues decays exponentially as the distance increases. It also has no mechanism for dealing with unbalanced sample sets.



Examples


Embodiment 2

[0058] This embodiment follows steps similar to those in Embodiment 1 and takes as an example the protein sequence T0780 from CASP11, the international protein structure prediction competition. The sequence identity between T0780 and the model training set is less than 25%. The protein sequence of T0780 is:

[0059] MKKNSLYIISSLFFACVLFVYATATNFQNSTSARQVKTETYTNTVTNVPIDIRYNSDKYFISGFASEVSVVLTGANRLSLASEMQESTRKFKVTADLTDAGVGTIEVPLSIEDLPNGLTAVATPQKITVKIGKKAQKDKVKIVPEIDPSQIDSRVQIENVMVSDKEVSITSDQETLDRIDKIIAVLPTSERITGNYSGSVPLQAIDRNGVVLPAVITPFDTIMKVTTKPVAPSSSTSNSSTSSSSETSSSTKATSSKTN

[0060] The structural domains of this protein as defined in CASP11 are residues 1-134 and 135-259; that is, the domain boundary is at position 134.
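As a rough illustration of how the domain definition in [0060] translates into a labeling problem, the sketch below parses the range string, derives the boundary position, and builds per-residue 0/1 labels. The function names and the `tolerance` parameter (how many residues on each side of a boundary count as positive) are hypothetical and are not taken from the patent; the value 8 is chosen only to show how few positive samples a single boundary produces.

```python
def parse_domains(spec):
    """Parse a string such as '1-134, 135-259' into (start, end) tuples.

    Positions are 1-based and inclusive, as in the CASP11 definition.
    """
    domains = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        domains.append((int(start), int(end)))
    return domains


def boundary_positions(domains):
    """A boundary is the last residue of every domain except the final one."""
    return [end for _, end in domains[:-1]]


def boundary_labels(length, boundaries, tolerance=0):
    """Per-residue 0/1 labels; residues within `tolerance` of a boundary are 1.

    `tolerance` is an assumption made for this sketch, not a patent parameter.
    """
    labels = [0] * length
    for b in boundaries:
        lo = max(1, b - tolerance)
        hi = min(length, b + tolerance)
        for pos in range(lo, hi + 1):
            labels[pos - 1] = 1  # 1-based position -> 0-based index
    return labels


domains = parse_domains("1-134, 135-259")
boundaries = boundary_positions(domains)
labels = boundary_labels(domains[-1][1], boundaries, tolerance=8)
print(boundaries)                 # [134]
print(sum(labels), len(labels))   # 17 259
```

Even with a generous tolerance, positives make up well under a tenth of the residues, which is the imbalance the cost-sensitive loss is meant to address.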

[0061] The input to the bidirectional LSTM network consists of a 20-dimensional PSSM, a 3-dimensional predicted secondary structure, a 1-dimensional predicted solubility, and a 1-dimensional predicted disorder value, for a total of 25 feature dimensions.
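The 25-dimensional input described in [0061] can be pictured as a simple concatenation of the four feature tracks. The sketch below assumes each track is given per residue and uses plain Python lists with placeholder values; the function name `build_features` and the toy data are hypothetical, and how the patent actually computes or normalizes each track is not shown here.

```python
def build_features(pssm, secondary, solubility, disorder):
    """Concatenate four per-residue feature tracks into 25-dimensional rows.

    pssm:       one list of 20 values per residue
    secondary:  one list of 3 values per residue
    solubility: one number per residue
    disorder:   one number per residue
    """
    lengths = {len(pssm), len(secondary), len(solubility), len(disorder)}
    if len(lengths) != 1:
        raise ValueError("all feature tracks must cover the same residues")
    rows = []
    for p, s, sol, dis in zip(pssm, secondary, solubility, disorder):
        if len(p) != 20 or len(s) != 3:
            raise ValueError("expected 20 PSSM and 3 secondary-structure values")
        rows.append(list(p) + list(s) + [sol, dis])
    return rows


# Toy input for a 4-residue sequence (placeholder values, not real predictions).
n = 4
features = build_features(
    pssm=[[0.0] * 20 for _ in range(n)],
    secondary=[[1.0, 0.0, 0.0] for _ in range(n)],
    solubility=[0.5] * n,
    disorder=[0.1] * n,
)
print(len(features), len(features[0]))  # 4 25
```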

[00...


Abstract

The invention discloses a protein domain detection method based on a cost-sensitive LSTM network. A bidirectional LSTM network is adopted as the protein domain detection model. A cost-sensitive loss function is provided that assigns different costs to the misclassification of positive and negative samples. The invention further provides a method for updating the cost weight in the loss function: for any given initial value, the cost weight is updated adaptively during network training. Smoothing, deletion, and merge operations are applied as post-processing to the raw output of the model. The advantages of the method are that the bidirectional LSTM network effectively models the long-range correlations of a protein sequence; training with the cost-sensitive loss function adapts the model to the unbalanced nature of protein domain datasets; the adaptive cost-weight update makes the method applicable to any dataset; and post-processing of the model output reduces false positive results.
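The abstract does not give the exact form of the loss or of the post-processing operations, so the following is only a minimal sketch under stated assumptions. The loss is taken to be a weighted binary cross-entropy in which a missed positive costs `pos_cost` times as much as a misclassified negative. Smoothing is taken to be a moving average, deletion to mean dropping predicted regions shorter than `min_len`, and merging to mean joining regions separated by at most `max_gap` residues. All function names, parameters, and thresholds are hypothetical, and the adaptive cost-weight update is not sketched because the source does not describe its rule.

```python
import math


def cost_sensitive_loss(probs, labels, pos_cost, neg_cost=1.0, eps=1e-12):
    """Mean weighted binary cross-entropy (an assumed form of the loss)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            total -= pos_cost * math.log(p)
        else:
            total -= neg_cost * math.log(1.0 - p)
    return total / len(probs)


def smooth(scores, half_window=1):
    """Moving average over a window of up to 2 * half_window + 1 scores."""
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half_window), min(len(scores), i + half_window + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def positive_runs(scores, threshold=0.5):
    """Return (start, end) index pairs of consecutive scores >= threshold."""
    runs, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(scores) - 1))
    return runs


def delete_short(runs, min_len=2):
    """Drop runs shorter than min_len, treating them as likely false positives."""
    return [(a, b) for a, b in runs if b - a + 1 >= min_len]


def merge_close(runs, max_gap=2):
    """Merge neighbouring runs separated by at most max_gap positions."""
    merged = []
    for a, b in runs:
        if merged and a - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], b)
        else:
            merged.append((a, b))
    return merged


def post_process(scores):
    """Apply smoothing, deletion, and merge in the order the abstract lists them."""
    return merge_close(delete_short(positive_runs(smooth(scores))))
```

With `pos_cost` greater than 1, a confident miss on a boundary residue is penalized more heavily than a false alarm on a non-boundary residue, which is one common way to counter class imbalance.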

Description

Technical field

[0001] The present invention relates to a technique in the field of biomedicine, in particular to a protein domain detection method and system based on a cost-sensitive LSTM network.

Background

[0002] A protein domain is a unit in the tertiary structure of a protein and the basic unit of protein structure and function. Domains are highly conserved protein regions separated by loose border regions in proteins with multiple domains. Prediction of protein domains is an important part of protein structure research and a preparatory step for protein tertiary structure prediction. In recent years, the rapid development of data acquisition technology has produced massive amounts of biological data, while segmenting protein domains on protein molecules by experimental methods costs a great deal of money and time. Therefore, computational methods are needed to predict protein domains to make up for the ability of biological data generation and the gap between speed o...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B15/00, G16B50/00
CPC: G16B15/00, G16B40/00
Inventor: 沈红斌, 陈晓
Owner: SHANGHAI JIAOTONG UNIV