
Compound image molecular structural formula extraction method based on adversarial learning

A molecular structure extraction method in the fields of neural learning methods, neural architectures, and character and pattern recognition. It addresses the low recognition rate and accuracy, limited adaptability and generalization, and sensitivity to low image resolution of existing methods, and aims to improve recognition rate, adaptability, generalization, accuracy, and robustness.

Active Publication Date: 2020-10-30
CHONGQING INST OF GREEN & INTELLIGENT TECH CHINESE ACADEMY OF SCI

AI Technical Summary

Problems solved by technology

[0003] Currently, most publications containing molecular data do not provide molecular structures in computer-readable formats such as the Simplified Molecular Input Line Entry System (SMILES) or connection tables.
Existing extraction methods rely on hand-crafted rules and hand-designed recognition features. These rules and features maintain a high recognition rate on conventional, simple compound structures, but their recognition rate and accuracy are relatively low in practical scenarios, for example when the images contain complex chemical structure patterns, come from publications with different drawing styles, carry various types of noise, or have a resolution too low for recognition.
Hand-designed rules and features rarely achieve high adaptability and generalization. They are also interdependent: segmentation of the compound's molecular formula, chemical bond features, and chemical symbol features depend on one another, so poor chemical bond segmentation usually leads to missed or wrong identifications.


Examples


Embodiment

[0030] To extract molecular structural formulas from compound images in existing journal databases, this embodiment provides a method for extracting molecular structural formulas from compound images based on adversarial learning.

[0031] With reference to Figure 1, the method for extracting molecular structural formulas from compound images based on adversarial learning comprises the following steps:

[0032] S1. Build a data set;

[0033] S101. Use the SMILES codes of 300,000 compounds from the database of the compound image generation tool RDKit as the input SMILES code database;

[0034] S102. Use RDKit to generate a 2D compound structure image for every SMILES code in the database, and preprocess the images;

[0035] S103. Pair each of the 300,000 SMILES codes one-to-one with its compound image, and use the resulting data pairs as the data set.

[0036] Further, all molecular structure images of compounds need to be preprocessed, specifically including: grayscale processing, norm...
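Steps S101 to S103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `render_with_rdkit` and `build_dataset`, the 256x256 image size, and the choice to skip unparsable SMILES codes are all hypothetical. Only the grayscale step of the preprocessing is shown, because the remaining steps are truncated in the text above.

```python
from typing import Callable, Iterable, List, Tuple

# A renderer maps one SMILES string to an image object (hypothetical type alias).
Renderer = Callable[[str], object]


def render_with_rdkit(smiles: str, size: Tuple[int, int] = (256, 256)):
    """Render a SMILES string as a grayscale 2D structure image (S102)."""
    # Imported lazily so the pairing logic below also works without RDKit.
    from rdkit import Chem
    from rdkit.Chem import Draw

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        raise ValueError(f"invalid SMILES: {smiles!r}")
    image = Draw.MolToImage(mol, size=size)  # PIL image; size is an assumption
    return image.convert("L")  # grayscale preprocessing


def build_dataset(
    smiles_codes: Iterable[str], render: Renderer
) -> List[Tuple[object, str]]:
    """Pair each SMILES code one-to-one with its rendered image (S103)."""
    pairs = []
    for smiles in smiles_codes:
        try:
            image = render(smiles)
        except ValueError:
            continue  # assumption: drop codes that cannot be rendered
        pairs.append((image, smiles))
    return pairs
```

With RDKit installed, a call such as `build_dataset(["CCO", "c1ccccc1"], render_with_rdkit)` would return two (image, SMILES) pairs. Passing the renderer as a parameter keeps the pairing logic independent of the drawing tool.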


Abstract

The invention relates to a method for extracting molecular structural formulas from compound images based on adversarial learning, and belongs to the fields of deep learning, image recognition, and compound molecular formula extraction. The method comprises the following steps: S1, constructing a data set of data pairs, each composed of a compound image and its SMILES code; S2, constructing an adversarial network composed of a SMILES code generator and a SMILES code discriminator, and initializing the network weights; S3, alternately training the two parts of the adversarial network, and testing it; S4, inputting the compound image whose molecular structural formula is to be extracted into the SMILES code generator to generate a SMILES code. The method improves the adaptability and generalization of compound image feature extraction, incorporates a judgment of compound generation rules, and improves the recognition rate, precision, and robustness of molecular structural formula extraction.
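The abstract does not state the training loss used in steps S2 and S3. Purely as an assumption, the alternating training of the SMILES code generator G and the SMILES code determiner D (the discriminator, in standard GAN terminology) could follow the standard conditional adversarial objective. Here x denotes a compound image and s its paired SMILES code; these symbols are illustrative and do not come from the patent:

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{(x,s)\sim p_{\text{data}}}\bigl[\log D(x,s)\bigr]
+\mathbb{E}_{x\sim p_{\text{data}}}\bigl[\log\bigl(1-D(x,G(x))\bigr)\bigr]
```

Under this reading, alternating training means updating D with G held fixed to increase the value, then updating G with D held fixed to decrease it. Because SMILES codes are discrete token sequences, a practical system would also need a way to pass a training signal through the generated tokens; the text here does not say how that is done.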

Description

Technical field

[0001] The invention relates to a method for extracting molecular structural formulas from compound images based on adversarial learning. It belongs to the fields of deep learning, image recognition, and compound molecular formula extraction, and is especially suitable for extracting molecular structural formulas from compound images.

Background

[0002] Drug research and development often requires reading a large volume of literature, such as journal and magazine articles and patents. These documents contain structural information for many compounds, usually presented as pictures. Although such graphical chemical structures are convenient to browse, they cannot be edited directly. A structure can be redrawn by hand in a chemical editor from the picture, but this manual extraction is time-consuming, labor-intensive, and prone to errors, especially the large ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/32, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/049, G06N3/08, G06V20/62, G06V30/10, G06N3/047, G06N3/045, G06F18/241, G06F18/2415, Y02D10/00
Inventors: 陈琳, 尚明生, 朱帆
Owner: CHONGQING INST OF GREEN & INTELLIGENT TECH CHINESE ACADEMY OF SCI