Method for extracting keywords based on K-MEANS and WORD2VEC

A keyword extraction technology that combines K-MEANS and WORD2VEC. It addresses the inability of existing methods to reflect the relevance between words, and aims for effective, widely adaptable, and accurate results.

Active Publication Date: 2017-09-01
CHENGDU SEFON SOFTWARE CO LTD
6 Cites, 23 Cited by

AI Technical Summary

Problems solved by technology

However, this algorithm cannot reflect the degree of correlation between words in the word vector space.




Embodiment Construction

[0043] The technical solution of the present invention will be further described in detail below in conjunction with specific examples, but the protection scope of the present invention is not limited to the following description.

[0044] A method for extracting keywords based on K-MEANS and WORD2VEC. The method integrates the global semantics with the theme of each branch, uses the WORD2VEC algorithm to construct a space vector, and uses the K-means algorithm to cluster the words in the multi-dimensional space. After clustering evaluation and removal of fuzzy words, high-quality keywords are obtained, and by raising the weight value, the thesaurus is dynamically optimized, so that keyword extraction is able to learn and evolve. As shown in Figure 1, the method for extracting keywords based on K-MEANS and WORD2VEC comprises the following steps:

[0045] S1: text preprocessing;

[0046] S2: constructing the space vector;

[0047] S3: clustering;

[004...
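The clustering step can be illustrated with a minimal sketch. It is not the patented procedure: the two-dimensional `word_vectors` are hand-made stand-ins for WORD2VEC output, the centroids are seeded from the first k vectors for reproducibility, and picking the word nearest each centroid as a keyword is an assumption made here for illustration. The names `kmeans` and `nearest_words` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids (a simplification)."""
    centroids = [tuple(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment

def nearest_words(words, vectors, centroids, assignment):
    """Assumed selection rule: one keyword per cluster, the word closest to its centroid."""
    keywords = []
    for c, centroid in enumerate(centroids):
        members = [
            (euclidean(v, centroid), w)
            for w, v, a in zip(words, vectors, assignment)
            if a == c
        ]
        if members:
            keywords.append(min(members)[1])
    return keywords

# Toy vectors standing in for trained WORD2VEC embeddings (illustrative values only).
word_vectors = {
    "cluster": (0.9, 0.1),
    "database": (0.1, 0.9),
    "centroid": (0.8, 0.2),
    "query": (0.2, 0.8),
    "vector": (1.0, 0.0),
    "index": (0.0, 1.0),
}
words = list(word_vectors)
vectors = list(word_vectors.values())

centroids, assignment = kmeans(vectors, k=2)
keywords = nearest_words(words, vectors, centroids, assignment)
print(assignment)
print(keywords)
```

In practice the vectors would come from a trained embedding model and have far more dimensions; the clustering logic is unchanged.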



Abstract

The present invention discloses a method for extracting keywords based on K-MEANS and WORD2VEC. The method integrates the global semantics and the theme of each branch; constructs a space vector using the WORD2VEC algorithm; uses the K-means algorithm to remove fuzzy words; calculates the centroid distances; obtains high-quality keywords after clustering evaluation; and, by raising the weight value, dynamically optimizes the lexicon so that keyword extraction is able to learn and evolve. The extracted keywords reflect the internal classification themes of the document, and each keyword reflects its classification well; the method offers high-quality final keywords, wider adaptability, more accurate results, and the like.
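The abstract does not state how fuzzy words are identified from the centroid distances. One plausible reading, sketched here purely as an assumption, treats a word as fuzzy when it sits almost as close to a second centroid as to its nearest one. The function name `split_fuzzy`, the ratio test, and the `max_ratio` threshold are all hypothetical and do not come from the patent text.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split_fuzzy(word_vectors, centroids, max_ratio=0.6):
    """Separate clearly clustered words from fuzzy ones (needs at least two centroids).

    Assumed rule: a word is fuzzy when the ratio of its nearest-centroid
    distance to its second-nearest-centroid distance exceeds max_ratio.
    """
    clear, fuzzy = [], []
    for word, vec in word_vectors.items():
        d1, d2 = sorted(euclidean(vec, c) for c in centroids)[:2]
        ratio = d1 / d2 if d2 > 0 else 1.0
        (fuzzy if ratio > max_ratio else clear).append(word)
    return clear, fuzzy

# Illustrative centroids and toy word vectors (not real embeddings).
centroids = [(0.9, 0.1), (0.1, 0.9)]
word_vectors = {
    "cluster": (0.9, 0.1),   # sits on the first centroid
    "query": (0.2, 0.8),     # close to the second centroid
    "data": (0.5, 0.5),      # equidistant from both
    "system": (0.55, 0.45),  # only slightly closer to the first
}

clear, fuzzy = split_fuzzy(word_vectors, centroids)
print(clear)
print(fuzzy)
```

A ratio test is scale-independent, so the same threshold can be reused across embedding models with different vector norms; an absolute distance cutoff would need retuning.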

Description

Technical field

[0001] The present invention relates to a method for extracting keywords, in particular to a method for extracting keywords based on K-MEANS and WORD2VEC.

Background technique

[0002] Document keywords allow readers to quickly grasp the main content of documents, efficiently control and retrieve documents, and have many applications in search result ranking, text summarization, document classification, document clustering, user portraits, and building document association networks.

[0003] Usually, authors in fields such as press releases and academic papers will actively propose keywords for documents, but most known documents do not have keywords. With the increase of data in the information age, people's demand for methods for automatically processing documents and generating keywords is increasing day by day. At present, a large number of methods or devices for automatically processing document generation and extracting keywords have emerged in the indu...


Application Information

IPC(8): G06F17/27, G06F17/30, G06K9/62
CPC: G06F16/35, G06F40/284, G06F18/23213
Inventor: 蓝科, 王纯斌, 覃进学, 潘小东
Owner: CHENGDU SEFON SOFTWARE CO LTD