Method for automatically identifying document research objects on the basis of text

An automatic document identification technology, applied in natural language data processing, special-purpose data processing applications, instruments, etc. It addresses problems such as high recall rate, poor flexibility, and low accuracy rate, and improves the model's effectiveness and indexing ability with a simple method.

Status: Inactive. Publication date: 2017-10-24
《中国学术期刊(光盘版)》电子杂志社有限公司 (China Academic Journals (CD Edition) Electronic Publishing House Co., Ltd.)
Cites: 2; Cited by: 16

AI Technical Summary

Problems solved by technology

[0002] A scientific paper is written by its author to summarize the research work and present it in refined form. A scientific paper generally contains different research elements, such as the research background, research object, research process, research method, and research conclusion. The research object refers to the main content of the paper: as the core subject of the research, it efficiently and clearly locates the focus of the corresponding article, and its instances include objective things, theories, events, processes, and relationships. Extracting research objects presents the main research goals of a paper in an intuitive form. It helps researchers to quickly grasp the relevant information about an object...



Examples


Embodiment

[0028] A text-based method for automatically identifying the research objects of literature works as follows. First, a model is built on a small amount of labeled data using a CRF (conditional random field) model. The model then predicts labels for the unlabeled data, and as small a portion as possible of the predicted set is selected for manual annotation. The annotated results are added to the original corpus and the model is rebuilt. Iterating this process an appropriate number of times yields the final model, which can be used to extract the research objects of scientific and technological literature. The specific steps are as follows:
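The loop described in [0028] can be sketched in Python as below. This is a minimal illustration, not the patent's implementation: `ToyTagger` is a hypothetical stand-in for the CRF (a real system would train a CRF model at that point), and sending the least-confident predictions to the annotator is an assumed selection rule, since the text only says that as small a portion as possible is selected. The `annotate` callback represents the human annotator; all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Tags = List[str]  # one tag per character, e.g. "B-OBJ", "I-OBJ", "O"


class ToyTagger:
    """Hypothetical stand-in for the CRF: predicts each character's most frequent training tag."""

    def __init__(self) -> None:
        self.best: Dict[str, str] = {}

    def fit(self, data: List[Tuple[str, Tags]]) -> None:
        counts: Dict[str, Dict[str, int]] = {}
        for text, tags in data:
            for ch, tag in zip(text, tags):
                per_char = counts.setdefault(ch, {})
                per_char[tag] = per_char.get(tag, 0) + 1
        self.best = {ch: max(c, key=c.get) for ch, c in counts.items()}

    def predict(self, text: str) -> Tuple[Tags, float]:
        tags = [self.best.get(ch, "O") for ch in text]
        # Crude confidence proxy: share of characters seen during training.
        known = sum(ch in self.best for ch in text)
        return tags, known / max(len(text), 1)


def active_learning(
    labeled: List[Tuple[str, Tags]],
    unlabeled: List[str],
    annotate: Callable[[str], Tags],
    batch_size: int = 2,
    rounds: int = 3,
) -> Tuple[ToyTagger, List[Tuple[str, Tags]]]:
    """Model, predict, pick a few samples for manual annotation, retrain; repeat."""
    labeled, pool = list(labeled), list(unlabeled)
    model = ToyTagger()
    for _ in range(rounds):
        model.fit(labeled)  # (re)build the model on the current corpus
        if not pool:
            break
        # Predict the unlabeled pool; least confident titles come first.
        pool.sort(key=lambda text: model.predict(text)[1])
        picked, pool = pool[:batch_size], pool[batch_size:]
        labeled += [(text, annotate(text)) for text in picked]  # manual annotation
    model.fit(labeled)  # final model on the enlarged corpus
    return model, labeled
```

Replacing `ToyTagger` with a real CRF only requires the same `fit`/`predict` pair; the loop itself stays unchanged.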

[0029] Step 1: Obtain titles of scientific and technological literature and perform initial annotation

[0030] Obtain a large collection S of titles of scientific and technological literature, and extract a small subset S1 of these titles (S1 contains more than 2000 titles). Manually annotate these titles by marking the research objects mentioned in each title, m...
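Step 1 can be sketched as below, assuming character-level BIO tags (`B-OBJ`, `I-OBJ`, `O`), a common encoding for CRF sequence labeling; the excerpt does not state the patent's actual tag scheme. The function names are hypothetical, and `size` should follow the text (more than 2000 titles).

```python
import random
from typing import List, Sequence, Tuple


def sample_seed_set(titles: Sequence[str], size: int, seed: int = 0) -> List[str]:
    """Draw the small subset S1 from the full title collection S without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(titles), min(size, len(titles)))


def spans_to_bio(title: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Convert manually marked research-object spans [start, end) into per-character tags."""
    tags = ["O"] * len(title)
    for start, end in spans:
        tags[start] = "B-OBJ"
        for i in range(start + 1, end):
            tags[i] = "I-OBJ"
    return tags


title = "Research on deep learning"
print(spans_to_bio(title, [(12, 25)])[10:14])  # → ['O', 'O', 'B-OBJ', 'I-OBJ']
```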



Abstract

The invention discloses a text-based method for automatically identifying the research objects of documents. First, a model is built on a small amount of labeled data using a CRF (Conditional Random Field) model. The model then predicts labels for unlabeled data, and as small a portion as possible of the predicted set is selected for manual labeling. The labeled results are added to the original corpus and the model is rebuilt; iterating this process an appropriate number of times yields the final model, which can be used to extract the research objects of scientific and technical literature. The method comprises the following steps: S1: obtain titles of scientific and technical literature and perform preliminary labeling; S2: standardize the data; S3: extract model features; S4: train on the data; S5: extract part of the unlabeled data and label it; and S6: estimate model accuracy. By incorporating manual assessment into the machine learning model in an optimal way, the method effectively improves the model's performance while saving as much manual labeling cost as possible.
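Steps S2, S3 and S6 are only named in the abstract, so the sketch below fills them with common choices that are assumptions, not the patent's specification: Unicode NFKC normalization plus whitespace collapsing for S2, character-window feature dictionaries (the list-of-dicts format used by CRF toolkits such as sklearn-crfsuite) for S3, and exact-match span precision, recall and F1 over BIO tags for S6. All function names are hypothetical.

```python
import unicodedata
from typing import Dict, List, Optional, Set, Tuple


def normalize_title(title: str) -> str:
    """S2 (assumed): NFKC folds full-width forms to half-width; whitespace runs collapse."""
    return " ".join(unicodedata.normalize("NFKC", title).split())


def char_features(text: str, i: int, window: int = 2) -> Dict[str, str]:
    """S3 (assumed): the character at i plus its neighbours within +/- window."""
    feats = {"bias": "1", "char": text[i], "is_digit": str(text[i].isdigit())}
    for off in range(-window, window + 1):
        j = i + off
        if off != 0:
            feats[f"char[{off:+d}]"] = text[j] if 0 <= j < len(text) else "<pad>"
    return feats


def bio_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Collect [start, end) spans; a stray I- tag without a preceding B- is ignored."""
    spans: Set[Tuple[int, int]] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("B") or tag == "O":
            if start is not None:
                spans.add((start, i))
                start = None
            if tag.startswith("B"):
                start = i
    return spans


def span_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """S6 (assumed): exact-match span precision, recall and F1 over a test set."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

In practice `normalize_title` would run before feature extraction, and `span_prf` would be computed on a held-out, manually annotated set after each retraining round.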

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, and in particular to a text-based method for automatically identifying the research objects of literature.

Background art

[0002] A scientific paper is written by its author to summarize the research work and present it in refined form. A scientific paper generally contains different research elements, such as the research background, research object, research process, research method, and research conclusion. The research object refers to the main content of the paper: as the core subject of the research, it efficiently and clearly locates the focus of the corresponding article, and its instances include objective things, theories, events, processes, and relationships. Extracting research objects presents the main research goals of a paper in an intuitive form. It helps researchers to quickly grasp the relevant information about an object, and to retrieve and co...


Application Information

IPC(8): G06F17/30; G06F17/27
CPC: G06F16/313; G06F40/253
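Inventors: 贺惠新 (He Huixin), 刘丽娟 (Liu Lijuan), 曹宇 (Cao Yu)
Owner: 《中国学术期刊(光盘版)》电子杂志社有限公司 (China Academic Journals (CD Edition) Electronic Publishing House Co., Ltd.)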