A Molecular Design Approach Based on Small-Scale Datasets and Generative Models

A technology relating to generative models and molecular design, applied in molecular design, computational models, calculations, etc. It addresses problems such as too little data to train generative models, insufficient datasets, and the separation of generative models from scoring models. It aims to improve results and efficiency, reduce error, and reduce overfitting.

Active Publication Date: 2022-06-03
SICHUAN UNIV
Cites: 12; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a molecular design method based on small-scale datasets and generative models. It addresses the following problems in the prior art: the lack of datasets limits the application of deep-learning-based molecular design methods in broader and more specialized material fields; when the amount of data is very small, an effective generative model cannot be trained; and the generative model and the scoring model are separated from each other.



Examples

Embodiment

[0059] As shown in Figure 1, a molecular design method based on a small-scale dataset and a generative model comprises:

[0060] Step S100: build an extended dataset D_a from the initial dataset D_o. All molecules in D_o are split into molecular fragments, and all non-repeating fragments are collected into a molecular fragment set. Fragments from this set are randomly combined into molecular structures, and the structures that pass the rationality check and do not appear in D_o form the extended dataset D_a;
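
Step S100 can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: molecules in D_o are assumed to arrive already split into SMILES fragment strings (a real pipeline might use a fragmentation scheme such as RDKit's BRICS decomposition, which the patent text here does not name), recombination simply concatenates 1 to `max_frags` fragments, and the hypothetical `is_rational` function only checks parentheses and ring-closure digits in place of a real chemical validity check. Novelty is tested by exact string match; a real pipeline would compare canonical SMILES.

```python
import random
from typing import Callable, Iterable


def build_fragment_set(fragmented: Iterable[list[str]]) -> list[str]:
    """Collect all non-repeating fragments, keeping first-seen order."""
    return list(dict.fromkeys(f for frags in fragmented for f in frags))


def is_rational(smiles: str) -> bool:
    """Hypothetical stand-in for a chemical validity check (syntax only)."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
    # Each ring-closure digit must appear in pairs (open + close).
    rings_paired = all(smiles.count(d) % 2 == 0 for d in "123456789")
    return depth == 0 and rings_paired and not smiles.startswith("(")


def extend_dataset(d_o: list[list[str]], n_target: int, max_frags: int = 3,
                   max_tries: int = 10_000, seed: int = 0,
                   check: Callable[[str], bool] = is_rational) -> set[str]:
    """Randomly recombine fragments of D_o; keep rational structures absent from D_o."""
    rng = random.Random(seed)
    fragments = build_fragment_set(d_o)
    originals = {"".join(frags) for frags in d_o}  # D_o molecules as plain strings
    d_a: set[str] = set()
    for _ in range(max_tries):
        if len(d_a) >= n_target:
            break
        k = rng.randint(1, max_frags)
        candidate = "".join(rng.choice(fragments) for _ in range(k))
        if candidate not in originals and check(candidate):
            d_a.add(candidate)
    return d_a


if __name__ == "__main__":
    # Hypothetical, pre-fragmented initial dataset D_o.
    d_o = [["C", "C", "O"], ["c1ccccc1", "O"], ["C", "(=O)", "O"]]
    print(sorted(extend_dataset(d_o, n_target=5)))
```

Because every candidate is assembled only from fragments of D_o, the extended set cannot introduce symbols that D_o lacks, which matches the property the abstract claims for D_a.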

[0061] Step S200: initialize the generative model and train it on the extended dataset D_a to tune its parameters;
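
A minimal sketch of Step S200, assuming a character-level bigram model as a stand-in for the generative model; the excerpt does not specify the architecture, and a recurrent or transformer language model over SMILES would be a more typical choice. `BigramGenerator`, its methods and the `^`/`$` delimiters are hypothetical names. Because the model only learns transitions between symbols it has seen, it can only emit symbols present in its training data.

```python
import math
import random
from collections import defaultdict

START, END = "^", "$"  # sequence delimiters (assumed, not from the patent)


class BigramGenerator:
    """Toy character-level bigram model standing in for the generative model."""

    def __init__(self, smoothing: float = 0.1) -> None:
        self.smoothing = smoothing
        self.counts: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
        self.vocab: set[str] = {END}

    def fit(self, smiles: list[str], weight: float = 1.0) -> "BigramGenerator":
        """Add weighted transition counts; calling fit again continues training."""
        for s in smiles:
            seq = [START, *s, END]
            self.vocab.update(s)
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += weight
        return self

    def prob(self, a: str, b: str) -> float:
        """Additively smoothed P(b | a) over the known vocabulary."""
        row = self.counts[a]
        total = sum(row.values()) + self.smoothing * len(self.vocab)
        return (row.get(b, 0.0) + self.smoothing) / total

    def log_likelihood(self, s: str) -> float:
        seq = [START, *s, END]
        return sum(math.log(self.prob(a, b)) for a, b in zip(seq, seq[1:]))

    def sample(self, rng: random.Random, max_len: int = 40) -> str:
        symbols = sorted(self.vocab)  # never contains START, so it is never emitted
        out, cur = [], START
        for _ in range(max_len):
            cur = rng.choices(symbols, weights=[self.prob(cur, b) for b in symbols])[0]
            if cur == END:
                break
            out.append(cur)
        return "".join(out)


if __name__ == "__main__":
    d_a = ["CCCO", "CCCN", "OCCO", "CCC(=O)O"]  # hypothetical extended dataset D_a
    gen = BigramGenerator().fit(d_a)  # Step S200: train on D_a
    print([gen.sample(random.Random(i)) for i in range(3)])
```

Per the abstract, the trained generator is later adjusted on D_o. In this count-based toy that corresponds to another call such as `gen.fit(d_o, weight=5.0)`, which is only a loose analogue of gradient-based fine-tuning.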

[0062] Step S300: initialize the scoring model and introduce information from the trained generative model into the scoring model, u...
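
The excerpt is cut off before saying how information from the trained generator enters the scoring model, so the sketch below shows one possible reading, labeled as an assumption: the generator's mean per-character log-probability of a molecule is used as an input feature to a small ridge regression trained on D_o. The `unigram_generator_ll` function is a deliberately crude hypothetical stand-in for the trained generator, the length feature is an arbitrary illustrative descriptor, and the property values in the usage example are invented for illustration.

```python
import math
from collections import Counter
from typing import Callable

Scorer = Callable[[str], float]


def unigram_generator_ll(train: list[str]) -> Scorer:
    """Crude stand-in for the trained generator: mean per-character log-probability."""
    counts = Counter("".join(train))
    denom = sum(counts.values()) + len(counts) + 1  # add-one smoothing, one bucket for unseen chars
    return lambda s: sum(math.log((counts[c] + 1) / denom) for c in s) / max(len(s), 1)


def features(s: str, gen_ll: Scorer) -> list[float]:
    # [bias, generator-derived feature, assumed hand-made descriptor]
    return [1.0, gen_ll(s), float(len(s))]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_scorer(smiles: list[str], y: list[float], gen_ll: Scorer, lam: float = 1e-3) -> Scorer:
    """Ridge regression on D_o via the normal equations; the bias term is not penalized."""
    X = [features(s, gen_ll) for s in smiles]
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j and i > 0 else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    w = solve(A, b)
    return lambda s: sum(wi * xi for wi, xi in zip(w, features(s, gen_ll)))


if __name__ == "__main__":
    d_a = ["CCCO", "CCCN", "OCCO", "CCC(=O)O"]  # hypothetical extended dataset
    d_o = ["CCO", "CC(=O)O", "OCO", "CCN", "NCCO"]  # hypothetical initial dataset
    y_o = [0.2, 0.9, 0.4, 0.1, 0.3]  # invented property values
    score = fit_scorer(d_o, y_o, unigram_generator_ll(d_a))
    print([round(score(s), 3) for s in d_o])
```

Leaving the bias unpenalized means the fitted scorer can never fit D_o worse than simply predicting the mean label, while the ridge term keeps the tiny system well conditioned when D_o is very small.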

Abstract

The invention discloses a molecular design method based on a small-scale dataset and a generative model. An extended dataset D_a is built from an initial dataset D_o; the generative model is trained on D_a to adjust its parameters; information from the trained generative model is introduced into a scoring model, which is trained on D_o to adjust its parameters and obtain an optimized scoring model; the parameters of the trained generative model are then adjusted on D_o to obtain the final generative model; the final generative model generates new molecular structures; and the optimized scoring model evaluates and screens these structures to obtain candidate molecules. The extended dataset is constructed from the initial dataset without reference to additional data, so it contains no symbols that are absent from the initial dataset. Compared with methods that assemble molecules from predefined atoms or fragments, molecules generated by this method have better properties, are more natural and are easier to synthesize.
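
The final stages in the abstract (generate with the final generative model, then evaluate and screen with the optimized scoring model) can be sketched as a generic filter-and-rank loop. The generator, scorer and validity check are passed in as callables; `toy_sample`, `toy_score` and `toy_rational` are hypothetical placeholders rather than the patent's models, and ranking by a single predicted score with a fixed `top_k` is an assumed screening rule.

```python
import random
from typing import Callable


def screen(sample: Callable[[random.Random], str], score: Callable[[str], float],
           is_rational: Callable[[str], bool], known: set[str],
           n_samples: int = 200, top_k: int = 5, seed: int = 0) -> list[tuple[str, float]]:
    """Generate, keep novel rational structures, rank by predicted score, return top_k."""
    rng = random.Random(seed)
    novel: set[str] = set()
    for _ in range(n_samples):
        s = sample(rng)
        if s and s not in known and is_rational(s):
            novel.add(s)
    # Highest predicted score first; ties broken alphabetically for determinism.
    ranked = sorted(((s, score(s)) for s in novel), key=lambda t: (-t[1], t[0]))
    return ranked[:top_k]


def toy_sample(rng: random.Random) -> str:
    """Hypothetical stand-in for the final generative model."""
    return "".join(rng.choice("CON") for _ in range(rng.randint(2, 5)))


def toy_score(s: str) -> float:
    """Hypothetical stand-in for the optimized scoring model."""
    return s.count("O") / len(s)


def toy_rational(s: str) -> bool:
    """Hypothetical stand-in for a rationality check."""
    return "NN" not in s


if __name__ == "__main__":
    for smi, sc in screen(toy_sample, toy_score, toy_rational, known={"CCO", "OCO"}):
        print(smi, round(sc, 2))
```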

Description

Technical field

[0001] The invention relates to the technical field of computer-aided molecular design, and in particular to a molecular design method based on small-scale datasets and generative models.

Background

[0002] The goal of molecular design for materials or drugs is to identify molecules with desirable properties. The task consists of two core steps: first, creating molecules; second, scoring and filtering the created molecules. The first step should produce rational, effective and novel molecular structures. Previously there were two main strategies: (1) assembling molecules from predefined atoms or fragments, whose drawback is that the resulting molecules are often difficult to synthesize; and (2) virtual chemical reactions based on expert-coded rules, whose drawback is that the reactions can fail and unnecessarily restrict the chemical space.

[0003] In recent years, models using deep learning for molecular ...

Application Information

Patent type & authority: Patents (China)
IPC(8): G16C20/50; G06K9/62; G06N20/00
CPC: G16C20/50; G06N20/00; G06F18/214
Inventors: 李川, 曾严, 蒲雪梅, 刘江亭
Owner: SICHUAN UNIV