Fixed-backbone protein sequence design method based on graph network mask node classification

A protein sequence design method, applied in sequence analysis, neural learning methods, and biological neural network models. It addresses the insufficient structural feature constraints of ProteinSolver and the limited efficiency of its network model by adding structural constraints.

Pending Publication Date: 2022-08-02
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

[0004] To address the insufficient structural feature constraints of the existing ProteinSolver and the limited efficiency of its network model, the present invention proposes a fixed-backbone protein sequence design method based on graph network mask node classification. The method adds more structural constraints: distance and relative-angle features are attached to each connected amino acid pair in the protein graph, and dihedral-angle features are attached to each amino acid. A neighbor graph is then built, a more efficient graph network based on the Transformer multi-head attention mechanism is implemented, and the optimal amino acid mask ratio is explored.
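As an illustration of the dihedral-angle features mentioned above, the following is a minimal sketch of the standard four-point dihedral computation. The function name `dihedral_degrees` and the choice of which four backbone atoms to pass in are assumptions made for this example; the text here does not specify them.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 axis.

    Hypothetical helper: which atoms are used for each residue is an
    assumption, not something stated in the source text.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

A planar cis arrangement gives an angle of 0 degrees and a planar trans arrangement gives a magnitude of 180 degrees. The sign of intermediate angles depends on the convention chosen.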



Examples


Embodiment Construction

[0025] As shown in Figure 1, this embodiment relates to a fixed-backbone protein sequence design method based on graph network mask node classification. It uses the CATH 40% non-redundant data set, selects sequences with a length of 50-500, and divides them into a training set, a validation set, and a test set in a ratio of 80:15:5. A nearest-neighbor graph is then built using the distance between the Cα coordinates of the amino acids as the criterion: when the distance between the Cα coordinates of two amino acids is less than a threshold (the value is not given here), the two amino acids are neighbor nodes of each other, and when a node has more than 32 neighbor nodes, the 32 amino acids with the nearest Cα atoms are selected as its neighbor nodes. In the resulting graph, the node features have dimension 24 and the edge features have dimension 4.
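The neighbor-graph construction described in [0025] can be sketched roughly as follows. This is an illustrative sketch only: the function name `build_neighbor_graph` is hypothetical, and because the distance threshold is not given in the text, `cutoff` is left as a required parameter rather than given a default. Only the cap of 32 neighbors comes from the description.

```python
import math


def build_neighbor_graph(ca_coords, cutoff, max_neighbors=32):
    """Map each residue index to its neighbor indices, nearest first.

    ca_coords: list of (x, y, z) C-alpha coordinates, one per residue.
    cutoff: distance threshold; its value is NOT stated in the source
            text, so the caller must supply it (assumption).
    max_neighbors: cap on neighbors per node (32 in the description).
    """
    neighbors = {}
    for i, ci in enumerate(ca_coords):
        # Collect every other residue whose C-alpha lies within the cutoff.
        close = []
        for j, cj in enumerate(ca_coords):
            if j == i:
                continue
            d = math.dist(ci, cj)
            if d < cutoff:
                close.append((d, j))
        # Keep only the nearest max_neighbors residues.
        close.sort()
        neighbors[i] = [j for _, j in close[:max_neighbors]]
    return neighbors
```

Once the cap is applied, the neighbor relation is no longer guaranteed to be symmetric: residue A may keep B among its nearest neighbors while B drops A. Whether the original method symmetrizes the graph is not stated.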

[0026] As shown in Figure 1, the graph network in this embodiment includes an input layer, an encodi...


Abstract

A fixed-backbone protein sequence design method based on graph network mask node classification comprises the following steps. In the offline stage, a graph neural network is trained using the protein nearest-neighbor graphs constructed from samples as the training set. In the online stage, the trained graph neural network processes the nearest-neighbor graph of a protein structure whose sequence has deleted (masked) residues, yielding the class probabilities of each deleted amino acid; these probabilities are then sampled to obtain the predicted amino acids at the deleted positions. The method adds more structural constraints: distance and relative-angle features are attached to each connected amino acid pair in the protein graph, and dihedral-angle features are attached to each amino acid. A neighbor graph is then built, a more efficient graph network based on the Transformer multi-head attention mechanism is implemented, and the optimal amino acid mask ratio is explored.
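The masking and sampling steps in the abstract can be illustrated with a small sketch. The names `mask_sequence`, `sample_residues`, and the mask symbol `X` are hypothetical, and the `class_probs` argument simply stands in for the output of the trained graph network, which is not reproduced here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
MASK = "X"  # hypothetical placeholder symbol for a deleted residue


def mask_sequence(sequence, mask_ratio, rng):
    """Replace a random fraction of residues with MASK.

    Returns the masked sequence and the sorted masked positions.
    """
    n_mask = round(len(sequence) * mask_ratio)
    positions = sorted(rng.sample(range(len(sequence)), n_mask))
    masked = list(sequence)
    for pos in positions:
        masked[pos] = MASK
    return "".join(masked), positions


def sample_residues(masked, class_probs, rng):
    """Fill masked positions by sampling from per-position probabilities.

    class_probs: {position: list of 20 probabilities over AMINO_ACIDS}.
    In the described method these would come from the trained network;
    here they are supplied by the caller (assumption).
    """
    filled = list(masked)
    for pos, probs in class_probs.items():
        filled[pos] = rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return "".join(filled)
```

Passing a seeded `random.Random` instance as `rng` keeps both the choice of masked positions and the sampled residues reproducible, which helps when comparing different mask ratios.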

Description

Technical field

[0001] The invention relates to a technology in the field of bioengineering, in particular to a fixed-backbone protein sequence design method based on graph network mask node classification.

Background

[0002] Protein design is an important method for studying the relationship between protein sequence and protein structure. It creates novel proteins by designing amino acid sequences that fold into specific structures. At present, deep-learning approaches generally divide protein design into two steps: first, a protein backbone is generated according to the required function; then a sequence is designed for that fixed backbone, aiming for a specific and stable three-dimensional structure.

[0003] ProteinSolver (https://github.com/ostrokach/proteinsolver) is a fixed-backbone protein design method that models protein design as a constraint satisfaction problem (CSP), specifically using the...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/04, G06N3/08, G16B30/00, G16B40/00
CPC: G06N3/08, G16B30/00, G16B40/00, G06N3/047, G06F18/2414, G06F18/2415
Inventors: 刘炎, 沈红斌, 袁野
Owner: SHANGHAI JIAO TONG UNIV