Method for automatically identifying document research on the basis of text

An automatic identification technique for literature research objects, applied in natural language data processing, special data processing applications, instruments, etc. It addresses the poor flexibility of rule-based methods and the high recall but low accuracy of statistical methods, improving the machine learning model's effectiveness and indexing ability while keeping the method simple.

Status: Inactive. Publication date: 2017-10-24

《中国学术期刊(光盘版)》电子杂志社有限公司

2 Cites, 16 Cited by


Problems solved by technology

[0002] A scientific paper is written by its author to summarize and refine the presentation of research work. It generally includes several research elements, such as the research background, research object, research process, research method, and research conclusion. The research object is the core subject of the paper's research goal; it efficiently and clearly locates the focus of the article and covers things such as objective entities, theories, events, processes, and relationships. Extracting research objects presents a paper's main research targets in an intuitive form, helping researchers quickly grasp the relevant information and conveniently retrieve and compare related research. Existing approaches include rule-based methods. These have achieved some results, but because natural-language sentence patterns are so diverse, the rules cannot cover every case when extracting research objects: too many objects are missed, and the rules cannot be updated in real time, so flexibility is poor. Extraction with statistical learning methods, by contrast, often introduces uncertainty, giving high recall but low accuracy. The practicality of purely rule-based or purely statistical methods is therefore very limited.

Embodiment

[0028] A text-based method for automatically identifying literature research objects. A CRF model is first trained on a small amount of labeled data and then used to predict labels for the unlabeled data. From the resulting predicted-label set, as small a portion of the data as possible is selected for manual annotation, and the annotated results are added to the original corpus to retrain the model. This process is iterated as appropriate to obtain the final model, which can be used to extract the research objects of scientific and technical literature. The specific steps are as follows:

[0029] Step 1: Obtain titles of scientific and technical literature and perform initial annotation

[0030] Obtain a large collection S of scientific and technical literature titles, extract a small subset S1 (containing more than 2000 titles), and manually annotate these titles by marking the research objects mentioned in them, m...
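The iterative procedure in [0028] (train on a small labeled set, predict on unlabeled titles, send a small batch for manual annotation, retrain) resembles pool-based active learning. The following is a minimal sketch of that loop, not the patented implementation. Everything named here is an assumption for illustration: `ToyTagger` is a trivial stand-in for the CRF model, the `OBJ`/`O` tags and the `oracle` callback (representing the human annotator) are hypothetical, and least-confidence sampling is one plausible selection rule, since the source does not say how the batch is chosen.

```python
from typing import Callable, Dict, List, Tuple

Title = List[str]   # a tokenized title
Labels = List[str]  # one tag per token ("OBJ" or "O"; hypothetical tag set)


class ToyTagger:
    """Stand-in for a CRF (assumption): memorizes the last tag seen per token."""

    def __init__(self) -> None:
        self.tag_of: Dict[str, str] = {}

    def fit(self, data: List[Tuple[Title, Labels]]) -> "ToyTagger":
        self.tag_of = {}
        for tokens, tags in data:
            for tok, tag in zip(tokens, tags):
                self.tag_of[tok] = tag
        return self

    def predict(self, tokens: Title) -> Tuple[Labels, float]:
        """Return predicted tags and a confidence in [0, 1].

        Confidence here is simply the share of tokens seen in training;
        a real CRF would expose sequence or marginal probabilities instead.
        """
        tags = [self.tag_of.get(tok, "O") for tok in tokens]
        known = sum(tok in self.tag_of for tok in tokens)
        confidence = (known / len(tokens)) if tokens else 1.0
        return tags, confidence


def active_learning(
    labeled: List[Tuple[Title, Labels]],
    unlabeled: List[Title],
    oracle: Callable[[Title], Labels],
    rounds: int = 3,
    batch_size: int = 1,
) -> Tuple[ToyTagger, List[Tuple[Title, Labels]]]:
    """Train, pick the least-confident titles, have them annotated, retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = ToyTagger().fit(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Least-confidence sampling: lowest-confidence titles come first.
        pool.sort(key=lambda title: model.predict(title)[1])
        batch, pool = pool[:batch_size], pool[batch_size:]
        # The oracle plays the role of the manual annotation step.
        labeled.extend((title, oracle(title)) for title in batch)
        # Re-model on the enlarged corpus.
        model = ToyTagger().fit(labeled)
    return model, labeled
```

Keeping `batch_size` small reflects the stated goal of annotating as little data as possible per iteration; in practice the loop would stop once the evaluated accuracy is acceptable rather than after a fixed number of rounds.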


Abstract

The invention discloses a text-based method for automatically identifying literature research objects. First, a CRF (Conditional Random Field) model is trained on a small quantity of labeled data; the model then predicts labels for unlabeled data, and as small a portion as possible of the predicted-label set is selected for manual annotation; the annotated results are then added to the original corpus and the model is retrained. The process is iterated as appropriate to obtain a final model, which can be used to extract the research objects of scientific and technical literature. The method comprises the following steps: S1: obtaining scientific and technical literature titles and performing initial annotation; S2: standardizing the data; S3: extracting model features; S4: training on the data; S5: extracting part of the unlabeled data and annotating it; and S6: evaluating model accuracy. By incorporating manual assessment into the machine learning model in an optimized way, the method effectively improves the model's performance gains while keeping manual annotation cost as low as possible.
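Steps S2, S3, and S6 can be illustrated with a short sketch. The source does not specify the normalization rules, the feature templates, or the accuracy metric, so everything below is an assumption: `normalize`, `token_features`, and `token_accuracy` are hypothetical helpers, the features follow the per-token dictionary style that CRF toolkits commonly accept, and whitespace tokenization is used only for simplicity (Chinese titles would need a word segmenter or character-level tokens instead).

```python
import unicodedata
from typing import Dict, List


def normalize(title: str) -> List[str]:
    """S2 (assumed): Unicode-normalize, lowercase, and split on whitespace."""
    return unicodedata.normalize("NFKC", title).lower().split()


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """S3 (assumed): features for token i with a one-token context window."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "word": tok,
        "suffix3": tok[-3:],
        "is_digit": tok.isdigit(),
        "BOS": i == 0,                 # beginning of the title
        "EOS": i == len(tokens) - 1,   # end of the title
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    return feats


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """S6 (assumed metric): share of tokens whose predicted tag matches gold."""
    total = sum(len(tags) for tags in gold)
    correct = sum(
        g == p
        for gold_tags, pred_tags in zip(gold, pred)
        for g, p in zip(gold_tags, pred_tags)
    )
    return correct / total if total else 0.0


# Usage: one feature dictionary per token of a normalized title.
tokens = normalize("Study of Graphene Oxide")
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Token-level accuracy is the simplest choice; because most tokens in a title are not part of a research object, span-level precision and recall would likely be more informative in a real evaluation.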

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a text-based method for automatically identifying literature research objects.

Background

[0002] A scientific paper is written by its author to summarize and refine the presentation of research work. It generally includes several research elements, such as the research background, research object, research process, research method, and research conclusion. The research object is the core subject of the paper's research goal; it efficiently and clearly locates the focus of the article and covers things such as objective entities, theories, events, processes, and relationships. Extracting research objects presents a paper's main research targets in an intuitive form, helping researchers quickly grasp the relevant information and to retrieve and co...


Application Information


IPC(8): G06F17/30, G06F17/27

CPC: G06F16/313, G06F40/253

Inventor: 贺惠新, 刘丽娟, 曹宇

Owner: 《中国学术期刊(光盘版)》电子杂志社有限公司
