A Query Word Rewriting Method Fusing a Word Vector Model and Naive Bayes

A query word rewriting technology, applied in the field of query rewriting, that addresses two problems of existing methods: weak semantic correlation between query words and rewritten words, and failure to consider the connection between query words and search recall results. Its effects are an improved search experience, preserved search accuracy, and expanded recall.

Active Publication Date: 2018-02-06
重庆麦吉卡电子有限公司
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] When the Skip-gram model with the Hierarchical Softmax algorithm is used for query rewriting, rewritten words are derived from semantic relevance alone, without considering the connection between the query word and the context of the search recall results. Semantically related words are also scarce, so a large corpus is needed to mine them effectively. The query rewriting method based on Naive Bayes instead mines rewritten words from the co-occurrence probability between the query word and the context of the search recall results. Although it takes that context into account, the semantic correlation between the query word and the rewritten word is weak.


Examples


Embodiment Construction

[0017] The present invention will be further explained below in conjunction with the drawings:

[0018] After establishing the word2vec word vector model and combining with the naive Bayes algorithm, the specific implementation steps are as follows:

[0019] Step 1: Establish and train the word2vec word vector model based on the acquired corpus, and calculate the candidate words for query rewriting.

[0020] Using the Skip-gram model based on the Hierarchical Softmax algorithm in word2vec, the model predicts the context-related words of each input user query word. For example, word2vec can be used to find the 50 most related words for each input query word. The correlation between these related words and the input query word may be large or small, and some may not be related at all, so the naive Bayes algorithm is further used to screen the related words. The screening criteria can be ...
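The candidate lookup in Step 1 can be sketched as a nearest-neighbour search over word vectors. The sketch below is illustrative only: it assumes the vectors have already been trained, uses cosine similarity as the relatedness measure (a common choice, not one the text confirms), and the names `top_n_related` and `toy_vectors` and the three-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n_related(query, vectors, n=50):
    """Return the n words most similar to `query`, best first.

    `vectors` maps each word to its embedding; in practice these would
    come from a trained word2vec model (assumption: toy data here).
    """
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical toy embeddings, for illustration only.
toy_vectors = {
    "phone": [0.9, 0.1, 0.0],
    "mobile": [0.8, 0.2, 0.1],
    "handset": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}

candidates = top_n_related("phone", toy_vectors, n=2)
```

With n=50 and a real vocabulary, the tail of this list is where weakly related or unrelated words appear, which is why a second screening stage is applied.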


Abstract

The invention provides a query word rewriting method that fuses a word vector model with naive Bayes, and relates to information processing technologies. In the method, word2vec is used to train the word vector model, the N words most similar to a query word are calculated, and an initial related-word lexicon is formed. Relevance calculation and analysis are then performed to filter the candidate words for query rewriting in the lexicon, and words with high relevance are retained. The method can effectively improve the accuracy and recall of query results and addresses the problem of searches returning no results or few valid results.
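The filtering stage can be sketched as follows. This page does not give the actual screening criterion, so the scoring rule here is an assumption: each candidate c is scored with a naive Bayes style log-probability, log P(c) + Σ log P(q_i | c), estimated with add-one smoothing from co-occurrence counts in recalled results. The function names, the toy `docs` data, and the `keep` cutoff are all hypothetical.

```python
import math


def nb_score(candidate, query_words, docs):
    """Naive Bayes style score: log P(c) + sum_i log P(q_i | c).

    `docs` is a list of token sets, one per recalled search result.
    Probabilities use add-one (Laplace) smoothing. This scoring rule is
    an illustrative assumption, not the patent's stated criterion.
    """
    docs_with_c = [d for d in docs if candidate in d]
    prior = (len(docs_with_c) + 1) / (len(docs) + 2)
    score = math.log(prior)
    for q in query_words:
        both = sum(1 for d in docs_with_c if q in d)
        score += math.log((both + 1) / (len(docs_with_c) + 2))
    return score


def filter_candidates(candidates, query_words, docs, keep=2):
    """Rank candidates by score and keep the top `keep` (hypothetical cutoff)."""
    ranked = sorted(
        candidates,
        key=lambda c: nb_score(c, query_words, docs),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical recalled results, each reduced to a set of tokens.
docs = [
    {"phone", "mobile", "case"},
    {"phone", "mobile", "charger"},
    {"phone", "handset"},
    {"banana", "fruit"},
    {"mobile", "phone", "screen"},
]

kept = filter_candidates(["mobile", "handset", "banana"], ["phone"], docs, keep=2)
```

Candidates that rarely co-occur with the query word in recalled results receive a low score and are dropped, while semantically similar words that also appear alongside the query are retained.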

Description

Technical field

[0001] The invention relates to the technical field of computer information processing, in particular to a query word rewriting method in data mining technology.

Background technique

[0002] The word vector model is a technology that uses neural networks to map each word from a high-dimensional discrete space (whose dimension is the number of words in the dictionary) to a real-valued vector in a low-dimensional continuous space (that is, a word embedding). In natural language processing tasks, word embeddings provide a better semantic-level distributed feature representation of words, which is convenient for many text processing tasks. The goal of word embedding representation is to learn a vector representation of each word and use it for different text processing tasks. The learned word vector can be used as a complete word feature input into the supervised learning algorithm of some specific tasks, or it can be used as a beneficial extension that depend...
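The mapping from the discrete space to the continuous space described in [0002] can be illustrated with a small sketch: multiplying a one-hot vector by an embedding matrix selects one row of that matrix. The vocabulary, matrix values, and function names below are hypothetical and chosen only for illustration.

```python
# Hypothetical three-word vocabulary and 2-dimensional embedding matrix.
vocab = ["query", "rewrite", "search"]
word_to_id = {w: i for i, w in enumerate(vocab)}
embedding = [
    [0.2, -0.1],  # "query"
    [0.4, 0.3],   # "rewrite"
    [-0.5, 0.8],  # "search"
]


def one_hot(word):
    """High-dimensional discrete representation: one slot per vocabulary word."""
    vec = [0.0] * len(vocab)
    vec[word_to_id[word]] = 1.0
    return vec


def embed_by_matmul(word):
    """Low-dimensional vector computed as one_hot(word) times the embedding matrix."""
    x = one_hot(word)
    dim = len(embedding[0])
    return [
        sum(x[i] * embedding[i][j] for i in range(len(vocab)))
        for j in range(dim)
    ]


def embed_by_lookup(word):
    """Equivalent shortcut: read the row for the word directly."""
    return embedding[word_to_id[word]]
```

Both functions return the same vector for any word in the vocabulary; real implementations use the lookup form because the one-hot vector grows with the dictionary size.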

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/2448, G06F16/245
Inventor: 唐贤伦, 周家林, 刘安静, 周冲, 彭永嘉, 朱俊, 张毅
Owner: 重庆麦吉卡电子有限公司