
Paper keyword extraction system and method

An extraction method and extraction system, applied in the fields of instruments, electrical digital data processing, and calculation. It addresses problems such as slow running speed, labor cost, and low accuracy, with the effects of reducing length, increasing accuracy, and reducing time consumption.

Active Publication Date: 2021-05-07
XIHUA UNIV

AI Technical Summary

Problems solved by technology

[0003] (1) The unsupervised method requires neither a manually generated and maintained vocabulary nor a manually annotated corpus for training, which greatly increases the operating efficiency of the system and reduces labor costs. The TF algorithm is an unsupervised, statistics-based keyword extraction algorithm. It is used to evaluate the importance of a word to a document within a document set, and it can also count how often a word appears in a document: the more often a word appears in a document, the more strongly it is taken to represent the article. The disadvantage of the unsupervised method is that its accuracy is not high;
[0004] (2) The supervised method trains the weight ratios, which yields more precise and accurate weights and reduces the possibility of error in the result. Its disadvantage is that it runs slowly.
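
The frequency counting described in [0003] can be illustrated with a minimal Python sketch. The function names `term_frequency` and `top_keywords` and the sample tokens are hypothetical and do not come from the patent; the sketch only shows the general idea of ranking words by how often they occur in one document.

```python
from collections import Counter

def term_frequency(tokens):
    """Return each token's relative frequency within one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}

def top_keywords(tokens, k=3):
    """Rank tokens by descending frequency; ties are broken alphabetically."""
    tf = term_frequency(tokens)
    return sorted(tf, key=lambda w: (-tf[w], w))[:k]

# Hypothetical, already-segmented document used only for illustration.
tokens = ["keyword", "extraction", "keyword", "paper", "keyword", "paper"]
print(top_keywords(tokens, k=2))
```

Because this ranking relies on counts alone, frequent but uninformative words can rank highly, which is consistent with the accuracy limitation noted in [0003].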



Examples


Embodiment Construction

[0048] The present invention will be further described below in conjunction with the accompanying drawings. Embodiments of the present invention include, but are not limited to, the following examples.

[0049] The paper keyword extraction system includes:

[0050] A training set, which contains several papers used for training;

[0051] A word screening module, which obtains the words in the body text of each paper to form a word training set. Since keywords inevitably appear in the body text, building the word training set from the body text alone is sufficient and improves running speed;

[0052] A Jieba tokenizer, which performs word segmentation to cut out all words of the word training set that appear in the abstract, body text, or summary of the paper;

[0053] A cleaning module, which cleans the words extracted by the Jieba tokenizer to obtain key words, and removes the stop words in the paper, so that words with no practical meaning su...
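
The screening, segmentation, and cleaning modules in [0051] to [0053] might be sketched as follows. This is one possible reading under stated assumptions, not the patented implementation: `segment` is a regex stand-in for the Jieba tokenizer (a real system for Chinese text would call something like `jieba.lcut`), and `STOP_WORDS`, `screen_words`, `clean`, and the sample texts are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would load a much fuller one.
STOP_WORDS = {"the", "a", "of", "is", "and", "in", "to"}

def segment(text):
    """Stand-in for the Jieba tokenizer: split lowercased text into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def screen_words(body, section):
    """Keep the tokens of `section` that also occur in the paper body."""
    body_vocab = set(segment(body))  # plays the role of the word training set
    return [t for t in segment(section) if t in body_vocab]

def clean(tokens):
    """Drop stop words, leaving candidate key words."""
    return [t for t in tokens if t not in STOP_WORDS]

# Hypothetical paper fragments used only for illustration.
body = "The model extracts keywords. Keyword extraction is the goal of the model."
abstract = "A keyword extraction model and system."
candidates = clean(screen_words(body, abstract))
print(candidates)
```

Restricting candidates to words that occur in the body keeps the vocabulary small, which matches the running-speed motivation given in [0051].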


Abstract

The invention relates to the field of artificial intelligence, and in particular to a paper keyword extraction system comprising a training set, a Jieba tokenizer, a cleaning module, a weight calculation model, a frequency calculation model, and an output model. The paper keyword extraction method comprises the following steps:
S1, acquiring the words of the paper body in the training set with a word screening module to form a word training set;
S2, performing word segmentation with the Jieba tokenizer to obtain all words of the word training set that appear in the abstract, the text, or the summary of the paper, and outputting key words through the cleaning module;
S3, inputting the result of step S2 into the weight calculation model for training;
S4, inputting the result of step S2 into the frequency calculation model for training;
S5, inputting the output results of steps S3 and S4 into the output model for training;
S6, inputting the target paper into the weight calculation model, the frequency calculation model, and the output model to obtain its keywords.
The invention thereby provides a way of combining an unsupervised method and a supervised method to acquire the keywords of papers.
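
How the outputs of the weight and frequency models could feed an output model at inference time (step S6) can be sketched as below. The abstract does not disclose the model internals, so everything here is an assumption made for illustration: the linear blend, the mixing parameter `alpha`, the `learned` weight table, and all function names are hypothetical.

```python
from collections import Counter

def frequency_scores(tokens):
    """Unsupervised part: relative frequency of each token in the target paper."""
    total = len(tokens)
    if total == 0:
        return {}
    return {w: c / total for w, c in Counter(tokens).items()}

def weight_scores(tokens, learned_weights, default=0.0):
    """Supervised part: look up a per-word weight assumed to come from training."""
    return {w: learned_weights.get(w, default) for w in set(tokens)}

def output_model(freq, weight, alpha=0.5, k=2):
    """Assumed combination rule: linear blend of both scores, then take the top k."""
    combined = {w: alpha * freq[w] + (1 - alpha) * weight.get(w, 0.0) for w in freq}
    return sorted(combined, key=lambda w: (-combined[w], w))[:k]

# Hypothetical cleaned tokens of a target paper and hypothetical trained weights.
tokens = ["keyword", "extraction", "keyword", "model",
          "paper", "paper", "paper", "keyword"]
learned = {"keyword": 0.9, "extraction": 0.8, "paper": 0.1, "model": 0.3}

keywords = output_model(frequency_scores(tokens), weight_scores(tokens, learned))
print(keywords)
```

In this toy input, "paper" is as frequent as "keyword" but carries a low assumed weight, so the blended score ranks "extraction" above it. This illustrates why a supervised signal can correct a purely frequency-based ranking.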

Description

Technical field

[0001] The invention relates to the field of artificial intelligence, and in particular to a paper keyword extraction system and a method thereof.

Background technique

[0002] A paper is divided into abstract, keywords, table of contents, text, acknowledgments, references and summary. Readers need to find a paper's keywords quickly when searching or for work. Although papers include a keyword item, these keywords are not always accurate, and readers must judge for themselves. Since keywords mainly appear in the abstract, text and summary, the existing technology usually uses the following two methods for extraction:

[0003] (1) The unsupervised method requires neither a manually generated and maintained vocabulary nor a manually annotated corpus for training, which greatly increases the operating efficiency of the system and reduces labor costs. The TF algorithm is an unsupervised, statistics-based keyword extraction algorit...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/216
CPC: G06F40/289, G06F40/216, Y02D10/00
Inventor: 李显勇, 李齐治, 杜亚军, 范永全, 陈晓亮
Owner: XIHUA UNIV