Method and system for automatically extracting scientific and technical literature data based on text mining

An automatic extraction technology for literature data, applied in electrical digital data processing, natural language data processing, instruments, etc. It addresses the slow data accumulation and large time overhead of existing collection methods, improving accumulation efficiency and offering strong operability and practicality.

Pending Publication Date: 2021-12-10

UNIV OF SCI & TECH BEIJING

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a method and system for automatically extracting scientific and technological literature data based on text mining, to solve the technical problems of large time overhead and slow data accumulation in existing methods of collecting scientific and technological literature data.

Examples

Example 1

[0054] This embodiment provides a method for automatically extracting scientific and technological literature data based on text mining. Preprocessing, word segmentation, text classification, and named entity recognition of the literature corpus allow the target entities in the corpus to be identified quickly. The semantic relationships between target entities form entity relationships, which capture the events described in the sentences of the corpus, so that the key data in the scientific and technological literature can be extracted automatically.

[0055] The method of this embodiment can be implemented by an electronic device, which can be a terminal or a server. The execution flow of the method is shown in Figure 1 and includes the following steps:

[0056] S1, obtaining the file of the data to be extracted; wherein, th...

Example 2

[0094] This embodiment provides a method for automatically extracting scientific and technological literature data based on text mining. Preprocessing, word segmentation, text classification, and named entity recognition of the literature corpus allow the target entities in the corpus to be identified quickly. The semantic relationships between target entities form entity relationships, which capture the events described in the sentences of the corpus, so that the key data in the scientific and technological literature can be extracted automatically.

[0095] Next, the automatic extraction of superalloy literature data in the field of materials science is taken as an example to illustrate the method of this embodiment. As shown in Figure 2, the process includes:

[0096] 1) Document acquisition: automatically deter...

Example 3

[0117] This embodiment provides a system for automatically extracting scientific and technological literature data based on text mining, including:

[0118] A document acquisition module, configured to acquire a file of data to be extracted, wherein the format of the file is XML, HTML or plain text;

[0119] A text preprocessing module, configured to extract the plain text content from files in XML and HTML format, filter out the publication information and URL information in the plain text content, and use the filtered plain text content to form the text corpus;

[0120] A target text screening module, configured to screen out sentences in the text corpus that contain preset information as target sentences, and to perform table recognition and table analysis on files in XML and HTML format, convert the table information into nested lists, and screen out tables containing preset information as target tables;

[0121] An entity recognition...
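The text preprocessing and sentence screening modules described in [0119] and [0120] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the names `TextExtractor`, `extract_plain_text`, `split_sentences` and `screen_target_sentences` are hypothetical, the "preset information" is assumed to be a plain keyword list, the URL pattern and the punctuation-based sentence splitter are simplifications, and filtering of publication information is not attempted. The standard-library HTML parser is used for brevity; a dedicated XML parser may be a better fit for XML input.

```python
import re
from html.parser import HTMLParser

# Simplified URL pattern (an assumption; the source does not specify one).
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


class TextExtractor(HTMLParser):
    """Collects character data, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_plain_text(markup):
    """Strip markup, remove URLs, and collapse whitespace."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    text = URL_PATTERN.sub("", " ".join(parser.chunks))
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def screen_target_sentences(sentences, keywords):
    """Keep sentences containing at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]
```

In a real corpus the keyword test would likely be replaced by the text classification step mentioned in the embodiment; the keyword list here only stands in for that step.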

Abstract

The invention discloses a method and a system for automatically extracting scientific and technical literature data based on text mining. The method comprises the following steps of: obtaining a file (XML (Extensible Markup Language), HTML (Hypertext Markup Language) or plain text) of data to be extracted; extracting plain texts in the XML and HTML files and filtering out publication information and URL (Uniform Resource Locator) information in the plain texts to form a text corpus; screening sentences containing preset information in the text corpus as target sentences; carrying out table recognition and table analysis on the XML and HTML files, and screening a table containing preset information as a target table; performing named entity identification on the target sentence and the target table respectively, identifying target entities contained in the target sentence and the target table, and determining a relationship between the target entities; and splicing the mutually associated target entities in the same literature to form a complete structured data set. According to the scheme, the extraction precision is high, and the whole process is automatic and easy to implement.
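The table analysis step in the abstract can be sketched as follows, using the nested-list representation that the system embodiment mentions. This is an illustrative assumption rather than the patented method: `TableParser`, `parse_tables` and `screen_target_tables` are hypothetical names, the preset information is again assumed to be a keyword list, and nested tables and merged cells are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Turns each <table> into a nested list: table -> rows -> cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join fragments and normalize internal whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def parse_tables(markup):
    """Return every table in the markup as a list of rows of cell strings."""
    parser = TableParser()
    parser.feed(markup)
    parser.close()
    return parser.tables


def screen_target_tables(tables, keywords):
    """Keep tables in which any cell contains a keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        t for t in tables
        if any(k in cell.lower() for row in t for cell in row for k in lowered)
    ]
```

Keeping the header row as the first inner list leaves the column names available to a later entity recognition step, which can pair each cell with its column header.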

Description

Technical field

[0001] The invention relates to the field of computer application technology, in particular to a method and system for automatically extracting scientific and technological literature data based on text mining.

Background

[0002] Artificial intelligence and machine learning techniques have been successfully applied in many fields of natural sciences, such as biology, medicine, chemistry and materials. A large amount of structured data is a prerequisite for the implementation of artificial intelligence and machine learning technologies. Usually, scientists collect data by manually reading published scientific literature, which is time-consuming and slow in data accumulation. Therefore, it is urgent to develop an automatic extraction technology of scientific and technological literature data to realize the automatic extraction of scientific and technological literature data, and provide a new method for the rapid accumulation and acquisition of scie...

Application Information

IPC(8): G06F40/295, G06F40/247, G06F40/242, G06F40/151, G06K9/00

CPC: G06F40/295, G06F40/247, G06F40/242, G06F40/151, Y02D10/00

Inventors: 宿彦京, 姜雪, 王伟仁, 田少晗, 谢建新

Owner: UNIV OF SCI & TECH BEIJING
