Inquiring term rewriting method merging term vector model and naive Bayes

A query rewriting technology, applied in the field of query word rewriting, which can solve problems such as ignoring the connection between query words and search recall results, and weak semantic correlation between query words and rewritten words.

Active Publication Date: 2015-09-23
重庆麦吉卡电子有限公司

AI Technical Summary

Problems solved by technology

[0004] The Skip-Gram model of the Hierarchical Softmax algorithm is used for query rewriting. The rewritten words are calculated only from semantic relevance, without considering the connection between the query word and the context of the search recall results; there are not many semantically related words, requires a lo...




Embodiment Construction

[0017] The present invention will be further described below in conjunction with the accompanying drawings:

[0018] After the word2vec word vector model is established, it is combined with the Naive Bayes algorithm. The specific implementation steps are as follows:

[0019] Step 1: Build and train the word2vec word vector model according to the obtained corpus, and calculate the candidate words for query rewriting.

[0020] Using the Skip-gram model based on the Hierarchical Softmax algorithm in word2vec, the context-related words of a query word are predicted from the input user query word. For example, for each input query word, word2vec can be used to find its 50 most related words. If the number of related words is set to 50, the correlation between these related words and the input query word varies from strong to weak, and some words are even irrelevant; the naive Bayes algorithm is therefore further used to filter the related words....
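The filtering step above can be sketched with a naive Bayes relevance score. This is a hedged sketch under assumptions: the recall snippets, background documents, candidate list, and the 0.5 threshold are illustrative, and the two-class formulation (relevant vs. background, uniform priors) is one plausible reading of the truncated description, not the patent's exact model.

```python
# Sketch of Step 2: filter word2vec candidates with a naive Bayes
# relevance score computed from the query's search recall results.
from collections import Counter

# Documents recalled by the original query (tokenized snippets).
recall_docs = [
    ["cheap", "flight", "ticket", "deal"],
    ["airline", "ticket", "discount"],
]
# Background documents unrelated to the query.
background_docs = [
    ["hotel", "room", "service"],
    ["weather", "forecast", "rain"],
]

def word_probs(docs):
    """Return a smoothed unigram probability function over docs."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    vocab = len(counts)
    # Laplace smoothing so unseen words get nonzero probability.
    return lambda w: (counts[w] + 1) / (total + vocab + 1)

p_rel = word_probs(recall_docs)
p_bg = word_probs(background_docs)

def relevance_score(word):
    # Naive Bayes posterior P(relevant | word) with uniform priors.
    pr, pb = p_rel(word), p_bg(word)
    return pr / (pr + pb)

candidates = ["discount", "deal", "rain"]
kept = [w for w in candidates if relevance_score(w) > 0.5]
print(kept)  # candidates that co-occur with the query's recall results
```

Candidates that appear often in the query's recall results score above the threshold and are kept as rewrite words; candidates that look like background noise are dropped.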



Abstract

The invention provides a query word rewriting method merging a word vector model and naive Bayes, and relates to information processing technology. The method includes the following steps: word2vec is used to train the vector model, the N terms most similar to a query term are calculated, and an initial related lexicon is formed; then relevancy calculation and analysis are conducted, the candidate terms of the query-rewriting lexicon are filtered, and terms with high relevancy are retained. The method can effectively improve the precision and recall of query results, and effectively alleviates the problem that a search query returns no results or only a few valid results.

Description

technical field

[0001] The invention relates to the technical field of computer information processing, in particular to a query word rewriting method in data mining technology.

Background technique

[0002] The word vector model is a technology that uses a neural network to map each word from a high-dimensional discrete space (whose dimension is the number of words in the dictionary) into a real-valued vector in a low-dimensional continuous space (i.e., a word embedding). In natural language processing tasks, word embeddings provide a better semantic-level distributed feature representation of words, which brings many conveniences to text processing tasks. The goal of word embedding representation is to learn a vector representation for each word and to use this representation in different text processing tasks. The learned word vectors can either be fed into task-specific supervised learning algorithms as complete word features, or can be useful extensions that rely on specific e...
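The background idea above, words mapped to low-dimensional real-valued vectors whose closeness reflects semantic similarity, can be illustrated in a few lines. The vectors below are made-up placeholders, not trained embeddings; cosine similarity is the standard closeness measure assumed here.

```python
# Minimal illustration of word embeddings: each word is a short
# real-valued vector, and semantic similarity is measured by the
# cosine of the angle between vectors. Vectors are toy placeholders.
import math

embeddings = {
    "king":  [0.8, 0.3, 0.1],
    "queen": [0.7, 0.4, 0.1],
    "apple": [0.1, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words end up closer in the vector space.
print(cosine(embeddings["king"], embeddings["queen"]))  # high
print(cosine(embeddings["king"], embeddings["apple"]))  # low
```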

Claims


Application Information

IPC(8): G06F17/30
CPC: G06F16/2448; G06F16/245
Inventor 唐贤伦, 周家林, 刘安静, 周冲, 彭永嘉, 朱俊, 张毅
Owner 重庆麦吉卡电子有限公司