Protein folding identification method based on deep metric learning

A protein fold recognition technique based on metric learning, applied in neural learning methods, informatics, and bioinformatics. It addresses the problem of making protein features more discriminative, and it shortens recognition time, which improves recognition speed.

Active Publication Date: 2020-12-22
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

However, these methods have the following problems: how to effectively measure the distance


Embodiment Construction

[0030] In order to better understand the technical content of the present invention, the present invention will be further described below in conjunction with the accompanying drawings.

[0031] As shown in Figure 2, the protein fold recognition method based on deep metric learning is implemented in the following steps:

[0032] Step 1: Data preprocessing. Encode each of the N groups of protein training data with one-hot encoding to obtain a numerical representation of each protein sequence;
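
The one-hot encoding in Step 1 can be illustrated with a minimal sketch. The source does not specify the alphabet or how unknown residues are handled, so the 20-letter alphabet `AMINO_ACIDS`, the all-zero row for unknown residues, and the function name `one_hot_encode` are all assumptions made for illustration only.

```python
# Assumption: the 20 standard amino acids, in an arbitrary fixed order.
# The patent text does not state which alphabet or ordering it uses.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return an L x 20 matrix (list of rows), one row per residue."""
    encoded = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        index = AA_INDEX.get(residue)
        if index is not None:
            row[index] = 1
        # Assumption: residues outside the alphabet (e.g. 'X') stay all-zero.
        encoded.append(row)
    return encoded


# Hypothetical four-residue sequence, used only to show the output shape.
matrix = one_hot_encode("ACDX")
```

Each protein of length L becomes an L x 20 matrix, which is the kind of numerical input that Step 2 passes to the SSA tool.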

[0033] Step 2: Input the one-hot encoding of the protein sequence into the SSA tool for predicting protein residue-residue contact maps (the SSA program used in the present invention is available at https://github.com/tbepler/protein-sequence-embedding-iclr2019) to predict the non-standardized residue-residue contact map. In the present invention, the output of the layer immediately preceding the SSA model's output layer is used as the potential protein residue-residue rela...


Abstract

The invention discloses a protein fold recognition method based on deep metric learning. The method comprises the following steps: encoding a protein to obtain a numerical representation of its sequence; inputting that representation into an SSA model to obtain a potential residue-residue relation map, and resizing the map to a set size; inputting the relation map into a trained convolutional neural network and taking the output of the layer preceding the classification layer as the deep feature; inputting the deep feature into a trained twin (Siamese) network to obtain the final protein feature; and calculating the Euclidean distance between the query protein and the template proteins based on these features, then assigning the fold type of the template protein closest to the query protein to the query protein. Using a twin network brings protein pairs of the same fold type closer together and pushes protein pairs of different fold types farther apart.
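
The metric-learning idea in the abstract can be sketched in a few lines. The abstract does not name the loss used to train the twin network, so the pairwise contrastive loss below (with a hypothetical `margin` parameter) is an assumption chosen because it is commonly paired with Siamese networks; it is not presented as the patent's own loss. The nearest-template assignment follows the abstract directly. All function names, labels, and feature values are illustrative.

```python
import math


def contrastive_loss(feat_a, feat_b, same_fold, margin=1.0):
    """Assumed training objective: small distance for same-fold pairs,
    distance of at least `margin` for different-fold pairs."""
    distance = math.dist(feat_a, feat_b)  # Euclidean distance
    if same_fold:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def assign_fold(query_feat, templates):
    """Return the fold label of the template closest to the query.

    `templates` is a list of (fold_label, feature_vector) pairs.
    """
    best_label, _ = min(templates, key=lambda t: math.dist(query_feat, t[1]))
    return best_label


# Hypothetical 2-D features standing in for the twin network's output.
templates = [("fold_a", [0.0, 0.0]), ("fold_b", [3.0, 4.0])]
predicted = assign_fold([0.5, 0.5], templates)

same_pair_loss = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_fold=True)
far_pair_loss = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_fold=False)
near_pair_loss = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_fold=False, margin=6.0)
```

Under this assumed loss, a same-fold pair is penalized by its squared distance, while a different-fold pair is penalized only when it falls inside the margin. That matches the stated goal of pulling same-fold pairs together and pushing different-fold pairs apart.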

Description

Technical field

[0001] The invention belongs to the field of bioinformatic prediction of protein structure, and in particular relates to a protein fold recognition method based on deep metric learning.

Background

[0002] With the continuous advancement of genetic engineering, the protein sequence information known to humans has increased exponentially, but little is known about the biological characteristics and structure of these proteins. This is because understanding the function and three-dimensional structure of even a single protein is a daunting task. The best way to make sense of all these sequences is therefore to search databases and link them to other proteins of known function and structure, and improving the algorithms for doing so remains one of the great challenges in bioinformatics today. The core idea of template matching for protein fold recognition is based on this. Its goal is to match a new protein (with known sequence) by similarity comparison. The template pro...


Application Information

IPC(8): G16B15/20, G06N3/04, G06N3/08
CPC: G16B15/20, G06N3/08, G06N3/045
Inventor: 於东军, 刘岩
Owner: NANJING UNIV OF SCI & TECH