Protein domain detection method and system based on cost-sensitive LSTM network
A cost-sensitive technology for proteins, applied in the biomedical field, addresses problems such as unbalanced sample sets, the inability to capture long-range correlations in protein sequences, and dependence on the capacity to process long-distance correlations, so as to improve accuracy, reduce false positive results, and improve adaptability.
Embodiment 2
[0058] This embodiment follows steps similar to those in Embodiment 1 and takes as an example the protein sequence T0780 provided by CASP11, the international protein structure prediction competition. The sequence identity between T0780 and the model training set is less than 25%. The protein sequence of T0780 is:
[0059] MKKNSLYIISSLFFACVLFVYATATNFQNSTSARQVKTETYTNTVTNVPIDIRYNSDKYFISGFASEVSVVLTGANRLSLASEMQESTRKFKVTADLTDAGVGTIEVPLSIEDLPNGLTAVATPQKITVKIGKKAQKDKVKIVPEIDPSQIDSRVQIENVMVSDKEVSITSDQETLDRIDKIIAVLPTSERITGNYSGSVPLQAIDRNGVVLPAVITPFDTIMKVTTKPVAPSSSTSNSSTSSSSETSSSTKATSSKTN
[0060] The structural domains of this protein as defined in CASP11 are residues 1-134 and 135-259; that is, the domain boundary is at position 134.
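The relationship between the domain ranges and the boundary position can be illustrated with a minimal Python sketch. The function names `domain_boundaries` and `boundary_labels` and the `margin` parameter are hypothetical and are not taken from the patent text; only the ranges 1-134 and 135-259 come from paragraph [0060]. The sketch also shows why the sample set is unbalanced: with no margin, only one residue out of 259 is labeled as a boundary.

```python
def domain_boundaries(domains):
    """Return the last residue of every domain except the final one.

    `domains` is a list of (start, end) pairs using 1-based, inclusive
    residue numbering, as in the CASP11 domain definition.
    """
    ordered = sorted(domains)
    return [end for (_start, end) in ordered[:-1]]


def boundary_labels(length, boundaries, margin=0):
    """Build per-residue 0/1 labels for a sequence of `length` residues.

    Residues within `margin` positions of a boundary are labeled 1.
    The margin is an illustrative assumption, not a value from the patent.
    """
    labels = [0] * length
    for b in boundaries:
        lo = max(1, b - margin)
        hi = min(length, b + margin)
        for pos in range(lo, hi + 1):
            labels[pos - 1] = 1  # convert 1-based position to 0-based index
    return labels


# Domain definition for T0780 as stated in paragraph [0060].
domains = [(1, 134), (135, 259)]
boundaries = domain_boundaries(domains)
length = domains[-1][1]  # end of the last domain
labels = boundary_labels(length, boundaries, margin=0)

print("boundaries:", boundaries)
print("boundary residues:", sum(labels), "of", length)
```

Widening `margin` increases the number of positive residues, which is one common way to soften the imbalance before any cost-sensitive weighting is applied.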
[0061] The input to the bidirectional LSTM network is set to 25 features per residue: a 20-dimensional PSSM, a 3-dimensional predicted secondary structure, a 1-dimensional predicted solubility, and 1-dimensional predicted disorder information.
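As a sketch of how the 25-dimensional input described above could be assembled, the following Python example concatenates the four feature groups for each residue. The function name `build_features` and the placeholder values are hypothetical; the patent text shown here specifies only the dimensions (20 + 3 + 1 + 1), not how the features are computed or ordered.

```python
PSSM_DIM = 20  # position-specific scoring matrix columns
SS_DIM = 3     # secondary structure features
SOL_DIM = 1    # solubility feature
DIS_DIM = 1    # disorder feature
TOTAL_DIM = PSSM_DIM + SS_DIM + SOL_DIM + DIS_DIM  # 25


def build_features(pssm, secondary, solubility, disorder):
    """Concatenate per-residue features into rows of length 25.

    pssm:       list of L rows, each with 20 values
    secondary:  list of L rows, each with 3 values
    solubility: list of L scalars
    disorder:   list of L scalars
    The concatenation order is an assumption made for illustration.
    """
    length = len(pssm)
    if not (len(secondary) == len(solubility) == len(disorder) == length):
        raise ValueError("all feature groups must cover the same residues")

    rows = []
    for i in range(length):
        if len(pssm[i]) != PSSM_DIM or len(secondary[i]) != SS_DIM:
            raise ValueError(f"unexpected feature width at residue {i + 1}")
        row = list(pssm[i]) + list(secondary[i]) + [solubility[i], disorder[i]]
        rows.append(row)
    return rows


# Placeholder values for a 3-residue toy sequence (not real predictions).
toy_len = 3
features = build_features(
    pssm=[[0.0] * PSSM_DIM for _ in range(toy_len)],
    secondary=[[1.0, 0.0, 0.0] for _ in range(toy_len)],
    solubility=[0.5] * toy_len,
    disorder=[0.1] * toy_len,
)
print(len(features), "residues x", len(features[0]), "features")
```

The resulting list of rows has one 25-value vector per residue, which is the shape a sequence model such as a bidirectional LSTM would consume one time step at a time.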
[00...