Self-supervised graph neural network pre-training method based on contrastive learning

A neural network and pre-training technology, applied in the field of deep learning, can solve problems such as insufficient generalization performance and achieve good generalization effects

Pending Publication Date: 2022-02-11
JINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

[0005] Existing graph neural network techniques generalize poorly when models are trained in settings that lack labeled drug molecule data. To address this technical defect, the present invention provides a self-supervised graph neural network pre-training method based on contrastive learning.



Examples


Embodiment 1

[0050] As shown in Figure 1, a self-supervised graph neural network pre-training method based on contrastive learning includes the following steps:

[0051] S1: Preprocessing the compound molecules in the public database to screen out organic molecules;

[0052] S2: Decompose and extract the structure of the screened organic molecules, use the obtained substructures as identifiers, and build a corpus of substructures;

[0053] S3: Treat each decomposed substructure as a super node and construct the corresponding subgraph data. The subgraph data and the original molecular graph data form a positive sample pair, and ten subgraph data are randomly selected to form negative sample pairs with the original molecular graph data;

[0054] S4: Construct a graph convolutional neural network based on an attention mechanism, together with a multi-level gated recurrent unit and a multi-layer perceptron module for transforming whole-graph features, to form a self-supervised learning model...
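
Steps S1 and S2 can be illustrated with a minimal sketch. It assumes each molecule has already been reduced to a list of element symbols and a list of substructure identifiers (in the described pipeline a cheminformatics toolkit would supply these). The allowed-element set, the function names `is_organic` and `build_corpus`, and the frequency-based ordering of the corpus are illustrative assumptions, not details given in the patent text.

```python
from collections import Counter

# Assumption: "organic" means the molecule contains carbon and only
# elements from this set. The source does not state its exact criterion.
ALLOWED_ELEMENTS = {"C", "H", "N", "O", "S", "P", "F", "Cl", "Br", "I"}


def is_organic(elements):
    """S1 (sketch): keep molecules that contain carbon and no disallowed element."""
    present = set(elements)
    return "C" in present and present <= ALLOWED_ELEMENTS


def build_corpus(decomposed_molecules):
    """S2 (sketch): map each substructure identifier to an integer id.

    Identifiers are ordered by descending frequency, with ties broken
    alphabetically so that the result is deterministic.
    """
    counts = Counter(s for subs in decomposed_molecules for s in subs)
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {s: i for i, s in enumerate(ordered)}


# Hypothetical inputs for illustration only.
molecules = {
    "benzoic_acid": ["C", "C", "C", "C", "C", "C", "C", "O", "O"],
    "table_salt": ["Na", "Cl"],
}
kept = [name for name, elems in molecules.items() if is_organic(elems)]
corpus = build_corpus([["benzene", "carboxyl"], ["benzene", "amine"]])
```

Here `kept` contains only the carbon-based molecule, and `corpus` assigns the most frequent identifier the smallest id, much as a token vocabulary would.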

Embodiment 2

[0080] Figure 3 shows the flow by which a molecule is turned into substructure subgraphs through substructure decomposition.

[0081] In this example, benzoic acid is used to briefly illustrate the steps of substructure decomposition. Benzoic acid in the ZINC database is represented by the SMILES string (C1=CC=CC=C1C(=O)O), which is converted to a molecular structure format by RDKit, the open-source Python cheminformatics toolkit. First, RDKit is used to obtain the atom indices belonging to every ring and functional group in the molecule; each such group of atoms is treated as one super node and stored in a hash table. Similarly, every bond in the molecule is treated as a super node and stored in the hash table. The connection relationships between super nodes are then recorded. By convention, nodes where more than three super nodes intersect are treated as intermediate nodes and stored in the hash table, and the connected edges are ad...
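
The benzoic acid example can be sketched in plain Python. This is a simplified illustration under stated assumptions: atoms are indexed in SMILES order, and the ring and carboxyl-group memberships are hard-coded rather than obtained from RDKit. The name `decompose` is hypothetical, only bonds not already covered by a ring or functional group become their own super nodes, and the intermediate-node rule is omitted.

```python
from itertools import combinations

# Benzoic acid, C1=CC=CC=C1C(=O)O, atoms indexed 0..8 in SMILES order.
# Atoms 0-5 are the ring carbons, 6 is the carboxyl carbon, 7 and 8 are oxygens.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (5, 6), (6, 7), (6, 8)]
rings = [frozenset(range(6))]              # assumption: supplied by a toolkit
functional_groups = [frozenset({6, 7, 8})]  # assumption: carboxyl group


def decompose(bonds, rings, functional_groups):
    """Build a hash table of super nodes and the edges between them."""
    super_nodes = {}
    for i, ring in enumerate(rings):
        super_nodes[f"ring{i}"] = ring
    for i, group in enumerate(functional_groups):
        super_nodes[f"group{i}"] = group

    # A bond becomes its own super node only if no ring or group contains it.
    covered = list(super_nodes.values())
    for a, b in bonds:
        if not any({a, b} <= atoms for atoms in covered):
            super_nodes[f"bond{a}-{b}"] = frozenset({a, b})

    # Two super nodes are connected when they share at least one atom.
    edges = [(u, v) for u, v in combinations(super_nodes, 2)
             if super_nodes[u] & super_nodes[v]]
    return super_nodes, edges


super_nodes, edges = decompose(bonds, rings, functional_groups)
```

For this molecule the sketch yields three super nodes (the ring, the carboxyl group, and the bond joining them) and two edges, since the joining bond shares atom 5 with the ring and atom 6 with the carboxyl group.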

Embodiment 3

[0085] As shown in Figure 2, positive and negative sample pairs are constructed in order to design the corresponding self-supervised training task. After the preprocessing and substructure decomposition steps, each original molecular graph and its own substructure graph form one positive sample pair. Ten substructure graphs are then randomly selected from the set of substructure graph data and paired with the same original molecular graph to form 10 negative sample pairs, so that the ratio of positive to negative samples is 1:10. These pairs are used as input to the subsequent graph attention convolutional neural network.
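
The 1:10 sampling scheme can be sketched as follows. The function name `build_pairs`, the `(molecule, subgraph owner, label)` tuple layout, and the fixed seed are illustrative assumptions. The sketch also assumes that negatives are drawn from the subgraphs of other molecules, which the text implies but does not state outright.

```python
import random


def build_pairs(molecule_ids, num_negatives=10, seed=0):
    """Return (molecule_id, subgraph_owner_id, label) triples.

    label 1: the molecule paired with its own substructure subgraph.
    label 0: the molecule paired with a subgraph drawn from another molecule.
    Requires at least num_negatives + 1 molecules, otherwise
    random.sample raises ValueError.
    """
    rng = random.Random(seed)
    pairs = []
    for mol in molecule_ids:
        pairs.append((mol, mol, 1))  # one positive pair per molecule
        others = [m for m in molecule_ids if m != mol]
        for neg in rng.sample(others, num_negatives):
            pairs.append((mol, neg, 0))  # ten negative pairs per molecule
    return pairs


molecule_ids = [f"mol{i}" for i in range(12)]
pairs = build_pairs(molecule_ids)
```

With 12 molecules this produces 12 positive and 120 negative pairs, matching the stated 1:10 ratio.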


Abstract

The invention relates to a self-supervised graph neural network pre-training method based on contrastive learning. The method comprises: preprocessing the compound molecules of a public database and screening out organic molecules; performing structural decomposition and extraction on the screened organic molecules, taking the obtained substructures as identifiers, and constructing a corpus of the substructures; taking the decomposed substructures as super nodes and constructing corresponding subgraph data, wherein the subgraph data and the original molecular graph data form positive sample pairs, and a plurality of subgraph data are randomly selected to form negative sample pairs with the original molecular graph data; constructing a graph convolutional neural network based on an attention mechanism, and forming a self-supervised learning model together with a multi-level gated recurrent unit and a multi-layer perceptron module; and inputting all positive and negative sample pair data into the self-supervised learning model for pre-training and saving the result, which facilitates fine-tuning on downstream tasks. The method addresses the insufficient generalization of deep learning models trained in settings that lack labeled drug molecules.

Description

Technical field

[0001] The present invention relates to the field of deep learning, and more specifically to a self-supervised graph neural network pre-training method based on contrastive learning.

Background

[0002] Drug research and development is a multidisciplinary, long-term, high-investment systems engineering effort. It faces high R&D costs, long cycle times, and high failure rates, so researchers have begun to use artificial intelligence to assist drug R&D. In recent years, graph neural networks, an emerging deep learning technology, have shown excellent performance on graph data. A compound molecule is naturally a graph, which opens a new path for deep learning research in support of drug development.

[0003] Graph neural networks based on supervised learning have achieved great success in the past few years. In order to learn a strong expressive ability, it relies on a large amount of manually labeled gra...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16C20/70, G16C20/20, G06N3/04, G06N3/08
CPC: G16C20/70, G16C20/20, G06N3/088, G06N3/084, G06N3/045
Inventor: 官全龙, 叶贤斌, 赖兆荣, 罗伟其, 汪超男, 方良达
Owner: JINAN UNIVERSITY