Keyword extracting method based on Word2Vec and Query log

A keyword extraction technology, applied in the field of information processing, that addresses the high computational complexity, limited scope of application, and low efficiency of existing keyword extraction methods, with the aim of improving search results and yielding high-quality keywords.

Active Publication Date: 2015-07-15
CHEZHI HULIAN BEIJING SCI & TECH CO LTD
4 Cites, 48 Cited by

AI Technical Summary

Problems solved by technology

Because of the high computational complexity, this algorithm is rarely used in large-scale text keyword extraction.
[0008] In summary, the traditiona

Examples

Example Embodiment

[0049] Example

[0050] Referring to Figure 1, the method for extracting keywords based on Word2Vec and Query log described in this embodiment includes the following steps:

[0051] S1: Use query log data to construct a specific vocabulary for the target field;

[0052] S2: Based on the document collection and the specific vocabulary, obtain candidate keywords for each document in the document collection;

[0053] S3: Train a Word2Vec model for the target field, and look up the candidate keywords of each document in the model to obtain a multi-dimensional word vector for each candidate keyword;

[0054] S4: Calculate the cosine similarity between the word vector of any candidate keyword L in any document A and the center vector of document A, and determine whether candidate keyword L appears in the specific vocabulary; if it does, proceed directly to S5; if it does not, go to S6;

[0055] S5: Multiply the cosine similarity of the candidate k...
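Step S4 compares each candidate keyword's word vector with the center vector of its document. The following is a minimal sketch of that comparison in plain Python. The visible text does not state how the center vector is computed, so the sketch assumes it is the component-wise mean of the candidate word vectors; the function names and the toy two-dimensional vectors are hypothetical and stand in for vectors looked up in a trained Word2Vec model.

```python
import math

def center_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors.

    Assumption: the document's center vector is the mean of its
    candidate keyword vectors; the source text does not define it.
    """
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors standing in for Word2Vec lookups (step S3).
word_vectors = {
    "engine": [1.0, 0.0],
    "turbo": [0.8, 0.6],
    "weather": [0.0, 1.0],
}

center = center_vector(list(word_vectors.values()))
similarities = {
    word: cosine_similarity(vector, center)
    for word, vector in word_vectors.items()
}
```

In this toy data the candidate whose vector lies closest in direction to the mean receives the highest similarity, which is the quantity the later steps rank on.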

Abstract

The invention discloses a keyword extraction method based on Word2Vec and a Query log, and relates to the field of information processing. The method includes the following steps: S1, building a specific word list for a target field; S2, obtaining candidate keywords for the documents in a document set; S3, obtaining a multi-dimensional word vector for each candidate keyword; S4, calculating the cosine similarity between the word vector of any candidate keyword L and a center vector, and determining whether candidate keyword L appears in the specific word list: if it does, proceeding to step S5, and if it does not, proceeding directly to step S6; S5, multiplying the obtained cosine similarity by a weighting factor i to obtain a new cosine similarity, then proceeding to step S6; S6, ranking the cosine similarity values from largest to smallest and outputting the candidate keywords corresponding to the m largest values as the final keywords. With this method, keywords of good quality can be extracted quickly and efficiently from texts in specific fields, the introduction of colloquial words is avoided, and the extracted keywords are of high quality.
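The branching in steps S4 to S6 can be illustrated with a short Python sketch. It assumes the cosine similarities have already been computed, and it treats the weighting factor i as a multiplier greater than 1 so that words found in the specific word list are boosted; the abstract does not give a value for i or for m, so the numbers below, the function name `rank_keywords`, and the sample words are all hypothetical.

```python
def rank_keywords(similarities, specific_word_list, weight, m):
    """Return the top-m (keyword, score) pairs.

    similarities: dict mapping candidate keyword -> cosine similarity (S4)
    specific_word_list: set of domain words built from the query log (S1)
    weight: weighting factor i applied to words in the list (S5)
    m: number of keywords to output (S6)
    """
    scored = []
    for word, similarity in similarities.items():
        if word in specific_word_list:
            score = similarity * weight  # S5: boost domain words
        else:
            score = similarity           # skip S5, go straight to S6
        scored.append((word, score))
    # S6: sort from largest to smallest score and keep the first m.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:m]

# Hypothetical inputs; the values are illustrative only.
example_similarities = {"engine": 0.5, "turbo": 0.75, "weather": 0.625}
top_keywords = rank_keywords(
    example_similarities,
    specific_word_list={"engine"},
    weight=2.0,
    m=2,
)
```

With these inputs, "engine" has the lowest raw similarity but moves to first place after weighting, which illustrates how membership in the specific word list can outweigh a small similarity gap.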

Description

Technical field

[0001] The invention relates to the field of information processing, in particular to a method for extracting keywords based on Word2Vec and Query log.

Background

[0002] Document keywords let people quickly understand the content of a text and grasp the theme of the document. Keywords are widely used in news reports, scientific papers and other fields to help people manage and retrieve documents efficiently. In addition to helping people quickly filter content of interest, document keywords can also be used in higher-level applications such as search result ranking, text summarization, document classification, document clustering, and user modeling.

[0003] Traditional keyword extraction methods fall into two types: unsupervised methods and supervised methods. The unsupervised methods include TF-IDF, Chi-squared, TextRank, LDA and other methods, while the supervised method converts the keyword e...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor 张平
Owner CHEZHI HULIAN BEIJING SCI & TECH CO LTD