Chemical information extraction method and device, equipment and storage medium

A chemical information extraction technology, applied in the field of chemical information technology, which addresses the problems that manually copying information is time-consuming and error-prone and that up-to-date data sets are difficult to maintain.

Pending Publication Date: 2021-05-18
GUANGZHOU YINNOVATOR BIOTECH CO LTD

AI Technical Summary

Problems solved by technology

Copying information manually is time-consuming and error-prone.
In addition, the rapid growth of publications makes it difficult to keep datasets up to date.



Examples


Embodiment 1

[0067] Figure 1 is a flow chart of a chemical information extraction method provided by Embodiment 1 of the present invention. This embodiment is applicable to extracting structured data from chemical literature whose chemical information is unstructured or semi-structured. The method can be performed by the chemical information extraction device provided by an embodiment of the present invention. The device can be implemented in software and/or hardware and integrated into the computer equipment provided by an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0068] S101. Obtain chemical documents.

[0069] By way of example, in some embodiments of the present invention, documents and materials related to chemical components and their reactions can be collected from across the internet. The document format may include a Word document, an RTF document, an Excel document, an HTML webpage, a PDF document...
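As an illustration of step S101, collected files could first be sorted by format so that each one is routed to a suitable parser. The sketch below is not taken from the patent: the extension table, the function names, and the "unknown" fallback are assumptions, and the source list of formats is truncated, so other formats may also be supported.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the format families named in [0069].
# The source list ends with "...", so this table is illustrative, not exhaustive.
FORMAT_BY_SUFFIX = {
    ".doc": "word", ".docx": "word",
    ".rtf": "rtf",
    ".xls": "excel", ".xlsx": "excel",
    ".htm": "html", ".html": "html",
    ".pdf": "pdf",
}


def classify_document(path: str) -> str:
    """Return the format family for a file path, or "unknown" if unrecognized."""
    return FORMAT_BY_SUFFIX.get(Path(path).suffix.lower(), "unknown")


def group_by_format(paths):
    """Group file paths by format family, preserving input order within a group."""
    groups = {}
    for p in paths:
        groups.setdefault(classify_document(p), []).append(p)
    return groups
```

Matching on the lowercased extension keeps the check cheap. A production system would more likely inspect file contents, since extensions can be missing or wrong.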

Embodiment 2

[0101] Figure 2 is a schematic structural diagram of a chemical information extraction device provided in Embodiment 2 of the present invention. As shown in Figure 2, the chemical information extraction device includes:

[0102] A chemical document acquisition module 201, configured to acquire chemical documents;

[0103] A separation module 202, configured to separate images and texts from the chemical documents;

[0104] A label extraction module 203, configured to extract, from the image, a chemical structure and a label annotating the chemical structure;

[0105] A mapping relationship establishment module 204, configured to establish a mapping relationship between the chemical structure and the label to obtain first storage information;

[0106] An association relationship extraction module 205, configured to extract chemical entities and association relationships between chemical entities from the text to obtain second storage information;

[0107] A storage mo...
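The data flow between modules 202 to 205 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class and function names are hypothetical, chemical structures are represented as plain strings (for example SMILES), and the actual image and text extractors are passed in as callables because the source does not specify them here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class SeparatedDocument:
    """Output of the separation module 202: images and text from one document."""
    images: List[bytes] = field(default_factory=list)
    text: str = ""


@dataclass
class ExtractionResult:
    first_storage: Dict[str, str]                # label -> chemical structure
    second_storage: List[Tuple[str, str, str]]   # (entity, relation, entity)


def run_extraction(
    doc: SeparatedDocument,
    extract_structures: Callable[[bytes], Iterable[Tuple[str, str]]],
    extract_relations: Callable[[str], Iterable[Tuple[str, str, str]]],
) -> ExtractionResult:
    # Module 203: extract (structure, label) pairs from each image.
    # Module 204: map each label to its structure (first storage information).
    first: Dict[str, str] = {}
    for image in doc.images:
        for structure, label in extract_structures(image):
            first[label] = structure
    # Module 205: entities and their relations from the text
    # (second storage information).
    second = list(extract_relations(doc.text))
    return ExtractionResult(first, second)
```

Passing the extractors as parameters keeps the sketch independent of any particular image recognition or text mining library.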

Embodiment 3

[0130] Embodiment 3 of the present invention provides a computer device. Figure 3 is a schematic structural diagram of the computer device. As shown in Figure 3, the computer device includes a processor 301, a memory 302, a communication module 303, an input device 304, and an output device 305. The computer device may have one or more processors 301; Figure 3 shows one processor 301 as an example. The processor 301, memory 302, communication module 303, input device 304, and output device 305 may be connected by a bus or by other means; Figure 3 shows a bus connection as an example. These components may be integrated on the control board of the computer device.

[0131] The memory 302, as a computer-readable storage medium, can be used to store software programs, computer-executable programs an...


Abstract

The invention discloses a chemical information extraction method and device, equipment, and a storage medium. The method comprises: obtaining a chemical document; separating an image and a text from the chemical document; extracting, from the image, a chemical structure and a label annotating the chemical structure; establishing a mapping relationship between the chemical structure and the label to obtain first storage information; extracting, from the text, chemical entities and the association relationships between them to obtain second storage information; and storing the first storage information and the second storage information in a chemical database. With this technical scheme, chemical documents can be scanned automatically and structured data can be extracted from unstructured or semi-structured data, which facilitates data management and supports scientific research, production, and experiments in the chemical industry. Because no manual operation is needed, labor cost is saved, input errors are reduced, and data is updated faster.
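The final storage step could look roughly like the sketch below, which uses Python's built-in sqlite3 module as a stand-in for the chemical database. The table names, column names, and the choice of SQLite are assumptions made for illustration; the source does not describe the database schema.

```python
import sqlite3


def create_chemical_db(conn: sqlite3.Connection) -> None:
    """Create two hypothetical tables, one per kind of storage information."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS structure_labels (label TEXT, structure TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entity_relations "
        "(head_entity TEXT, relation TEXT, tail_entity TEXT)"
    )


def store(conn, first_storage, second_storage) -> None:
    """Insert (label, structure) pairs and (entity, relation, entity) triples."""
    with conn:  # commits on success, rolls back if an insert raises
        conn.executemany(
            "INSERT INTO structure_labels (label, structure) VALUES (?, ?)",
            first_storage,
        )
        conn.executemany(
            "INSERT INTO entity_relations (head_entity, relation, tail_entity) "
            "VALUES (?, ?, ?)",
            second_storage,
        )
```

Calling `store(conn, [("1a", "CCO")], [("ethanol", "dissolves", "aspirin")])` on a connection from `sqlite3.connect(":memory:")` would write one row to each table in a single transaction.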

Description

Technical field

[0001] Embodiments of the present invention relate to chemical information technology, and in particular to a method, device, equipment, and storage medium for extracting chemical information.

Background

[0002] Accurate chemical data management is essential for cheminformatics. Today, researchers and discovery software can query internal or public external databases to retrieve the information they need, although the main source of knowledge remains the scientific literature. However, the information in the literature is written in natural language and is therefore unstructured or semi-structured. Chemical structures are embedded as images in reports, journals, and patents, so they cannot be entered directly into chemical databases or chemical software. Copying information manually is time-consuming and error-prone. Furthermore, the rapid growth of publications makes it difficult to keep datasets up to date.

Contents of the invention

[0003] The in...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/166, G06F40/169, G06F40/279, G06F40/284, G06K9/00, G06N3/04, G06N3/08, G06F16/31
CPC: G06F40/166, G06F40/284, G06F40/279, G06F40/169, G06N3/049, G06N3/08, G06F16/31, G06V30/40
Inventor: 钟实, 张睿哲, 宋悦飞, 潘志锋
Owner: GUANGZHOU YINNOVATOR BIOTECH CO LTD