Keyword Automatic Extraction Method Based on Distributed Expression Word Vector Calculation

A technology for automatic keyword extraction, applied in computing, computer parts, character and pattern recognition, etc. It addresses the lack of a good solution for keyword extraction, the poor extraction of keyword phrases, and unbalanced keyword information labeling, and achieves higher extraction accuracy and better extraction performance.

Status: Inactive
Publication Date: 2019-11-19
SHANGHAI UNIV
5 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0004] An analysis of research at home and abroad shows that current automatic keyword extraction techniques still have limitations:
[0005] (1) Existing automatic keyword extraction algorithms face problems such as polysemous words, redundant synonymous expressions, dynamically updated thesauri, and the complexity of cross-domain content.
[0006] (2) Most automatic keyword extraction algorithms are built on small-scale experimental samples or single documents. There is currently no good solution for keyword extraction in large-scale data applications, which also face the problem of unbalanced keyword information labeling.
[0007] (3) Phrases generalize better than single words and carry richer information. In practical applications, extracting keyword phrases is more valuable than extracting single words, but current research extracts keyword phrases poorly.

Examples

Detailed Description of the Embodiments

[0031] Preferred embodiments of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0032] The data sets in this embodiment are four collections of English-language papers, each from a different field of computer science, obtained from the IEEE digital library. The following table lists, for each data set, the number of papers, the number of keywords, and the number of words in the word vector vocabulary after training. From each data set, 50 papers are held out as the test sample set, and the rest form the initial training set, as shown in Table 1.
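The hold-out split described above can be sketched as follows. This is only an illustration: the text does not say how the 50 test papers are chosen, so random sampling with a fixed seed is an assumption, and `split_dataset` and the placeholder paper IDs are hypothetical names.

```python
import random


def split_dataset(papers, test_size=50, seed=0):
    """Hold out `test_size` papers as the test sample set; the rest train.

    Random selection is an assumption; the source only says 50 are extracted.
    """
    if len(papers) <= test_size:
        raise ValueError("need more papers than test_size")
    shuffled = list(papers)                # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    return shuffled[test_size:], shuffled[:test_size]


# Placeholder IDs standing in for one data set's papers.
papers = [f"paper_{i}" for i in range(200)]
train, test = split_dataset(papers)
```

Applying the same function to each of the four data sets yields one test set of 50 papers and one initial training set per field.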

[0033] Table 1

[0034]

[0035] Among them, the Data Mining, Information Extraction, and Recommendation data sets are each concentrated in a single field, so their corpora are relatively homogeneous.

[0036] In this embodiment, the experiments on the automatic keyword extraction method use Google's Word2vec tool; the program is implemented in C and runs under Ubuntu. ...
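For readers who want to inspect the trained word vector table outside the C program, a minimal sketch of a loader follows. It assumes the vectors were saved in Word2vec's plain-text output format (a header line with vocabulary size and dimension, then one word and its components per line); the source does not say which output format was used, and `load_word_vectors` and the sample values are hypothetical.

```python
import io


def load_word_vectors(stream):
    """Read a Word2vec text-format table into a dict of word -> list of floats."""
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    table = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        table[parts[0]] = [float(x) for x in parts[1:]]
    return table, dim


# Tiny in-memory sample; the values are made up for illustration.
sample = "3 2\nmining 0.1 0.9\nextraction 0.2 0.8\nfilter 0.7 0.1\n"
table, dim = load_word_vectors(io.StringIO(sample))
```

In practice the stream would be an opened vector file rather than an in-memory string.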

Abstract

The present invention relates to a method for automatically extracting keywords based on distributed expression word vector calculation. The method generates features automatically in order to better solve the problem of automatic keyword extraction. The steps are as follows: Step 1, obtain the original training data set; Step 2, preprocess the training set and the test text, including removal of punctuation, numbers, and stop words, and part-of-speech filtering; Step 3, convert the training set into a word vector table through language model training; Step 4, use a distance calculation method to compute the distance from each keyword word vector to the text to be tested; Step 5, using different distance calculation methods, obtain the arithmetic mean semantic distance from the distributed expression word vector of each keyword in the domain keyword set to the distributed expression word vectors of all words in the test text, and use it for selection and sorting. This method provides a new idea for keyword extraction, makes full use of the semantic information of the data set, and significantly improves the accuracy of automatic extraction.
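Steps 2, 4, and 5 can be sketched roughly as below. This is one reading of the abstract, not the patented implementation: each keyword in the domain keyword set is scored by its arithmetic mean distance to every word of the preprocessed test text, and keywords are sorted by that score. The stop-word list, the toy two-dimensional vectors, the choice of cosine and Euclidean distance, and all function names are assumptions; part-of-speech filtering is omitted, and vectors are assumed to be nonzero.

```python
import math
import re

STOP_WORDS = {"the", "of", "a", "and", "for", "is", "in", "to"}  # tiny illustrative list


def preprocess(text):
    """Lowercase, drop punctuation and numbers, then drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_keywords(domain_keywords, text_tokens, vectors,
                  distance=cosine_distance, top_k=3):
    """Sort domain keywords by mean distance to all in-vocabulary text words."""
    words = [w for w in text_tokens if w in vectors]
    if not words:
        return []
    scored = []
    for kw in domain_keywords:
        if kw not in vectors:
            continue  # keyword has no trained vector
        mean = sum(distance(vectors[kw], vectors[w]) for w in words) / len(words)
        scored.append((mean, kw))
    scored.sort()  # smallest mean distance first
    return [kw for _, kw in scored[:top_k]]


# Toy vectors, made up for illustration only.
vectors = {
    "clustering": [0.9, 0.1], "mining": [0.8, 0.2], "patterns": [0.85, 0.15],
    "data": [0.7, 0.3], "recipe": [0.1, 0.9],
}
tokens = preprocess("Mining frequent patterns in data, 2019.")
ranked = rank_keywords(["recipe", "clustering"], tokens, vectors)
ranked_euclid = rank_keywords(["recipe", "clustering"], tokens, vectors,
                              distance=euclidean_distance)
```

Passing the distance function as a parameter mirrors the abstract's mention of different distance calculation methods, so the same ranking routine can be compared across metrics.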

Description

Technical field

[0001] The invention relates to a method for automatically extracting keywords based on distributed expression word vector calculation, belonging to the field of text mining (Text Mining).

Background

[0002] The continuous development of information technology has led to explosive growth of information in many fields, and a large amount of text information has been digitized. Electronic information resources such as digital libraries, electronic thesis databases, e-books, etc. have brought great convenience to people in collecting, storing and using information, and have become an indispensable part of modern life. With the continuous increase of electronic information, how to quickly and accurately obtain the required information from large-scale text information has become a huge challenge. Keyword extraction is an effective means to solve the above problems. It is one of the core technologies in the field of text mining and plays a very importa...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06K9/62
CPC: G06F16/35, G06F18/24
Inventor: 朱文浩, 刘懿霆, 陈洁, 郭心怡, 丁庆功, 缪慧
Owner: SHANGHAI UNIV