Amino acid sequence design method taking given protein main chain structure as target

A sequence design technique that targets a given main chain structure, applied in the field of protein design. It addresses the high number of degrees of freedom in all-atom design, which makes designs difficult to optimize and difficult to verify, and achieves rapid design with a simplified optimization process.

Pending Publication Date: 2022-05-03
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0006] Models based on three-dimensional convolutional neural networks redesign side chain types and conformations by mapping atom types and atomic coordinates onto a voxel grid. Their main limitation lies in the design process itself: it depends heavily on precise atomic coordinate information that is not available during design, and all-atom sequence design has many degrees of freedom and is difficult to optimize.
Autoregressive models have greatly improved metrics such as the native sequence recovery rate. However, when a long sequence is designed, poor design choices made upstream in the autoregressive process accumulate and propagate downstream, so the resulting designs may be difficult to verify experimentally.

Examples

Embodiment 1

[0086] Embodiment 1 Encoder-decoder network construction

[0087] The encoder part of the ABACUS-R encoder-decoder network is a Transformer. Its input comprises the side chain types and 3D backbone structure information of all other residues that are structurally close to the central residue. Notably, neither the side chain type of the central residue nor the side chain conformations of neighboring residues are used as input. The encoder output constitutes the desired vector representation, which is decoded into various attributes of the central residue. The encoder-decoder network is trained on a selected set of PDB structures. Figure 1 shows the self-consistent iteration used to design a full sequence for a given target backbone structure. The method can start from a randomly selected initial sequence, then apply the pretrained encoder-decoder to its local environment (depending on the side chain types of surrounding residues in the current sequence) to one ...
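
As a rough illustration of what the encoder sees for one central residue, the following Python sketch collects the nearest residues and their side chain types while never reading the central residue's own type. It is a simplified assumption, not the ABACUS-R featurization: the function name `local_environment`, the use of C-alpha coordinates only, and the neighbor count `k` are all hypothetical.

```python
import math


def local_environment(ca_coords, sequence, center, k=3):
    """Toy encoder input for one central residue (hypothetical featurization).

    ca_coords: list of (x, y, z) C-alpha positions, one per residue.
    sequence:  current side chain types, one letter per residue.
    Only the k nearest other residues are kept; sequence[center] is never read.
    """
    origin = ca_coords[center]
    ranked = sorted(
        (math.dist(origin, xyz), j)
        for j, xyz in enumerate(ca_coords)
        if j != center
    )
    environment = []
    for distance, j in ranked[:k]:
        x, y, z = ca_coords[j]
        environment.append({
            "index": j,
            "side_chain_type": sequence[j],  # neighbor type is an input
            "offset": (x - origin[0], y - origin[1], z - origin[2]),
            "distance": distance,
        })
    return environment


coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (9.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
for entry in local_environment(coords, "ACDE", center=1, k=2):
    print(entry)
```

A real encoder would also need backbone orientation and the relative positions of neighbors along the chain; the sketch only shows that the central residue's type and all side chain conformations stay out of the input.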

Embodiment 2

[0088] Embodiment 2 The training of encoder-decoder network

[0089] A set of non-redundant PDB structures was used to train the encoder-decoder networks. We split the selected PDB structures into training and testing sets in two different ways, thus learning two sets of network parameters. The first set of network parameters (Model_eval) was obtained by using about 95% of the protein structures for training and the remaining about 5% for testing, where the structures used for testing belong to single-domain topology types (CATH 4.2 classification). With this selection of test proteins, none of the test structures belongs to the same CATH topology as any training structure. Model_eval can therefore be used for unbiased computational evaluation. The second set of network parameters, Model_final, was learned by randomly splitting the protein structures into roughly 95% for training and 5% for testing, disregarding their CATH structural classification. Model_final Sequences that...
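
The topology-disjoint split used for Model_eval can be sketched as follows. This is a minimal illustration under assumptions: the function name `split_by_topology`, the `(structure_id, topology)` input format, and the way whole topologies are assigned to the test set until the target fraction is reached are hypothetical, and the single-domain filter mentioned in the text is not modeled.

```python
import random
from collections import defaultdict


def split_by_topology(structures, test_fraction=0.05, seed=0):
    """Split structures so that no CATH topology appears in both sets.

    structures: list of (structure_id, cath_topology) pairs (assumed format).
    Whole topologies go to the test set until it holds about
    test_fraction of all structures; the rest go to training.
    """
    by_topology = defaultdict(list)
    for structure_id, topology in structures:
        by_topology[topology].append(structure_id)

    topologies = sorted(by_topology)
    random.Random(seed).shuffle(topologies)

    target = test_fraction * len(structures)
    train, test = [], []
    for topology in topologies:
        bucket = test if len(test) < target else train
        bucket.extend(by_topology[topology])
    return train, test


# Hypothetical data: 200 structures spread over 40 topologies.
structures = [(f"s{i}", f"topo{i % 40}") for i in range(200)]
train, test = split_by_topology(structures)
print(len(train), len(test))
```

The random split used for Model_final would instead shuffle individual structures, so the same topology can appear on both sides.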

Embodiment 3

[0093] Embodiment 3 Convergence of sequence design iterations

[0094] The present invention applies Model_eval and self-consistent iteration to design full sequences for 100 target structures taken from the Model_eval test set. These target structures cover the three main CATH classes. For each target structure, 10 sequences were designed in 10 separate runs, each starting from a different random initial sequence. Since the iterative method is effectively a greedy algorithm that maximizes the (predicted) probability of the side chain types, we monitored the negative logarithm of the probability of the designed side chain types (the -logP value) over the course of each run. The average per-residue -logP value decreases and converges to a plateau. At the same time, the side chain types of most residues converge to the corresponding types in the final sequence. For all target structures, iterative runs can produce self-consistent sequenc...
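
The greedy self-consistent iteration and the -logP monitoring described above can be sketched in Python. Everything here is an assumption for illustration: `make_toy_model` is a hypothetical stand-in for the pretrained encoder-decoder (a small random pairwise model, not a Transformer), neighbors are defined along the chain rather than in 3D, and residues are updated in sequential sweeps, which the source text does not specify.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)


def make_toy_model(n_residues, seed=0):
    """Hypothetical stand-in for the pretrained encoder-decoder."""
    rng = random.Random(seed)
    bias = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(n_residues)]
    coupling = [[0.0] * K for _ in range(K)]
    for a in range(K):
        for b in range(a, K):
            coupling[a][b] = coupling[b][a] = rng.gauss(0, 0.5)
    # Toy neighbors: residues within two positions along the chain.
    neighbors = [[j for j in range(n_residues) if j != i and abs(i - j) <= 2]
                 for i in range(n_residues)]

    def predict(seq, i):
        """Probabilities of the 20 types at position i; seq[i] is never read."""
        logits = [bias[i][a] + sum(coupling[a][seq[j]] for j in neighbors[i])
                  for a in range(K)]
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]
        total = sum(weights)
        return [w / total for w in weights]

    return predict


def mean_neg_log_p(predict, seq):
    """Average per-residue -logP of the current side chain types."""
    return sum(-math.log(predict(seq, i)[seq[i]])
               for i in range(len(seq))) / len(seq)


def design(predict, n_residues, seed=0, max_sweeps=1000):
    """Greedy iteration from a random sequence until no residue changes."""
    rng = random.Random(seed)
    seq = [rng.randrange(K) for _ in range(n_residues)]
    history = [mean_neg_log_p(predict, seq)]
    for _ in range(max_sweeps):
        changed = 0
        for i in range(n_residues):
            probs = predict(seq, i)
            best = max(range(K), key=probs.__getitem__)
            if probs[best] > probs[seq[i]]:  # switch only on strict improvement
                seq[i] = best
                changed += 1
        history.append(mean_neg_log_p(predict, seq))
        if changed == 0:  # self-consistent: every residue is its own argmax
            break
    return seq, history


predict = make_toy_model(30, seed=1)
seq, history = design(predict, 30, seed=2)
print("".join(AMINO_ACIDS[a] for a in seq))
print([round(v, 3) for v in history])
```

Calling `design(predict, 30, seed=s)` with ten different values of `s` mirrors the ten runs per target from different random initial sequences; comparing the resulting sequences shows how strongly the final design depends on the starting point in this toy setting.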

Abstract

The invention provides an amino acid sequence design method that takes a given protein main chain structure as its target. In this method, only the side chain types of the surrounding residues, not their side chain conformations, are used as side chain input to the encoder. As a result, only side chain types need to be updated during sequence design, and side chain conformations do not need to be reconstructed or optimized. This markedly reduces the complexity of the sequence design problem, so the complete sequence can be optimized by simple iteration. The designed sequences fold into the target structure with a high success rate and show high structural stability.

Description

Technical field

[0001] The invention belongs to the field of protein design, and specifically relates to an amino acid sequence design method targeting a given protein main chain structure, that is, automatically designing all or part of the amino acid sequence of a protein according to a preset target main chain structure. The method uses a pre-trained deep learning neural network encoder to encode the three-dimensional local structural environment of a single central residue into a real-valued vector, and pre-trains an encoder-decoder to decode this vector into the side chain type of the central residue. The input to this encoder contains the side chain type information of other residues spatially adjacent to the central residue. In sequence design, we start from an arbitrarily set initial sequence, apply the encoder-decoder to different central residues, and update the side of the central residue according to the decoding output of the local environment of the central res...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B15/20, G16B15/00, G16B40/00
CPC: G16B15/20, G16B15/00, G16B40/00
Inventor: 刘海燕, 陈泉, 李厚强, 刘宇枫, 王炜伦
Owner: UNIV OF SCI & TECH OF CHINA