Protein and nucleic acid binding site prediction method based on graph neural network characterization

The invention applies graph neural network technology to binding site prediction in the field of bioengineering. It addresses the low recognition accuracy of protein-nucleic acid binding sites, improving prediction accuracy and achieving good generalization performance.

Pending Publication Date: 2022-07-19
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

[0003] To address the low recognition accuracy of existing methods for protein-nucleic acid binding sites, the present invention proposes a protein-nucleic acid binding site prediction method based on graph neural network representation. It uses graph representations of residues based on their structural context, together with a hierarchical graph neural network model, to learn the key structures and characteristic patterns of binding sites from those graph representations.



Examples


Embodiment Construction

[0042] In this example, protein-DNA and protein-RNA binding site data sets are used as the training and test sets. For the DNA-binding data set, the training set contains 573 proteins, yielding 14479 binding residues and 145404 non-binding residues after data augmentation; the test set contains 129 proteins, with 2249 binding residues and 35275 non-binding residues. For the RNA-binding data set, the training set contains 495 proteins, yielding 14609 binding residues and 122290 non-binding residues after data augmentation; the test set contains 117 proteins, with 2031 binding residues and 35314 non-binding residues. The data augmentation process computes the sequence and structural similarity of the proteins, groups proteins with sequence similarity greater than 0.8 and TM-score greater than 0.5 into a cluster, and transfers the binding site labels of the proteins in each cluster onto the protein in that cluster with the most residues.
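The augmentation step described above can be illustrated with a minimal Python sketch. It is not the patent's implementation. The text gives only the two thresholds and the rule of keeping the longest protein, so the following are assumptions made for illustration: clusters are formed transitively (single linkage via union-find), pairwise scores are supplied precomputed, and a hypothetical `mapping` dictionary (for example, derived from a structural alignment) translates residue positions of each cluster member onto the representative. The function names `cluster_proteins` and `merge_labels` are likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def cluster_proteins(
    lengths: Dict[str, int],
    pair_scores: Dict[Pair, Tuple[float, float]],
    seq_cut: float = 0.8,
    tm_cut: float = 0.5,
) -> List[List[str]]:
    """Group proteins whose sequence similarity > seq_cut AND TM-score > tm_cut.

    Assumption: clustering is transitive (single linkage); the source does
    not say how clusters are formed beyond the two thresholds.
    """
    parent = {p: p for p in lengths}

    def find(p: str) -> str:
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), (seq_sim, tm_score) in pair_scores.items():
        if seq_sim > seq_cut and tm_score > tm_cut:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for p in lengths:
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values())


def merge_labels(
    cluster: List[str],
    lengths: Dict[str, int],
    binding: Dict[str, Set[int]],
    mapping: Dict[Pair, Dict[int, int]],
) -> Tuple[str, Set[int]]:
    """Transfer binding-site labels onto the longest protein in the cluster.

    `mapping[(member, rep)]` maps residue positions of `member` onto `rep`
    (hypothetical input; positions without a counterpart are skipped).
    """
    rep = max(cluster, key=lambda p: lengths[p])
    merged = set(binding.get(rep, set()))
    for member in cluster:
        if member == rep:
            continue
        position_map = mapping.get((member, rep), {})
        for pos in binding.get(member, set()):
            target = position_map.get(pos)
            if target is not None:
                merged.add(target)
    return rep, merged
```

With three proteins where only one pair exceeds both thresholds, `cluster_proteins` returns one two-member cluster and one singleton, and `merge_labels` returns the longer member of the pair together with the union of its own labels and the mapped labels of the other member.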

[0043...



Abstract

A protein and nucleic acid binding site prediction method based on graph neural network characterization comprises the following steps: constructing a protein-nucleic acid interaction data set; after sample fusion processing, extracting the position and feature information of each residue in the protein and of its structural context; and constructing a graph representation of each residue's structural context from that information. The graph representation of the protein to be predicted is then passed through a hierarchical graph neural network to obtain the probability that each residue binds DNA/RNA, thereby predicting the binding sites between the protein and the nucleic acid. The key structures and feature patterns of binding sites are learned from graph representations of residues based on structural context, using a hierarchical graph neural network model.
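As an illustration of what a graph representation of a residue's structural context might look like, the following Python sketch builds a local graph around one target residue. It is a sketch under stated assumptions, not the patented construction: the abstract does not say how the context or the edges are defined, so the spherical neighborhood, the `context_radius` and `edge_cutoff` values, the use of one coordinate per residue, and the function name `structural_context_graph` are all hypothetical. Node features and the hierarchical graph neural network itself are omitted.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def structural_context_graph(
    coords: Sequence[Coord],
    target: int,
    context_radius: float = 10.0,  # assumed value, not from the source
    edge_cutoff: float = 5.0,      # assumed value, not from the source
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Build a local graph around residue `target`.

    Nodes are the residues whose coordinate lies within `context_radius`
    of the target (the target itself included). Edges join every pair of
    nodes whose distance is at most `edge_cutoff`. Both are returned as
    indices into `coords`.
    """
    center = coords[target]
    nodes = [
        i for i, c in enumerate(coords)
        if math.dist(c, center) <= context_radius
    ]
    edges = []
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            if math.dist(coords[i], coords[j]) <= edge_cutoff:
                edges.append((i, j))
    return nodes, edges
```

Calling this function once per residue yields one small graph per residue, which matches the abstract's description of a per-residue prediction: each graph would be scored by the network to give that residue's binding probability.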

Description

Technical field

[0001] The present invention relates to a technology in the field of bioengineering, in particular to a method for predicting binding sites between proteins and nucleic acids based on graph neural network representations of protein local structural contexts.

Background technique

[0002] Protein-nucleic acid interactions play an important role in many life activities, such as DNA replication, transcription, translation, gene expression, and signal transduction and recognition. Studying these interactions has significant implications for the analysis of gene and protein function and for drug design. Because experimental methods for analyzing protein-nucleic acid interactions are expensive and time-consuming and cannot meet the current need for large-scale protein analysis, computational methods have become increasingly important. Current computational methods can be divided into protein ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/30
CPC: G16B20/30
Inventors: 夏莹, 沈红斌, 潘小勇, 夏春秋
Owner: SHANGHAI JIAO TONG UNIV