A Short Text Query Expansion and Retrieval Method Based on Word Vector

A short text query expansion technology, applied to word-vector-based short text query expansion and retrieval. It addresses problems such as reduced retrieval accuracy, topic drift, and noise. Its stated effects are reduced time complexity, improved richness, and meeting the requirements of clustering.

Active Publication Date: 2018-04-27
DALIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

The disadvantage of this type of method is that, when the user gives a search term, the search engine can only return documents that contain the term. It cannot return documents that are semantically related but expressed in different words.
The disadvantage of this type of method is that, when the user gives a search term, the search engine introduces a great deal of noise. The recall of the retrieval system improves to some extent, but a large amount of irrelevant text is also returned, which reduces retrieval precision.
[0016] These methods only enrich the semantic representation of the query words. They do not attempt to understand the user's query intent; instead they find words similar to each individual query word and use them for expansion, which easily leads to topic drift and introduces noise.



Examples


Embodiment

[0068] To illustrate the working process of the system in detail, its steps are described below with specific examples.

[0069] A. Short text corpus information preprocessing

[0070] Short texts and forwarded texts shorter than 20 characters are deleted directly. The remaining texts in the corpus are segmented into words. A corpus dictionary is built that records the number of occurrences of each word, and words that occur too infrequently are removed. An inverted index is then created over the remaining short texts.
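
The preprocessing in step A can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the whitespace `tokenize` function stands in for a real word segmenter, `MIN_COUNT` is an assumed cutoff for "too infrequent" (the text gives no number), and the detection of forwarded texts is not modeled. Only the 20-character threshold comes from the text.

```python
from collections import Counter, defaultdict

MIN_CHARS = 20   # length threshold stated in the text
MIN_COUNT = 2    # assumed cutoff for "too infrequent" words (not given in the text)


def tokenize(text):
    # Placeholder segmenter (assumption): lowercase and split on whitespace.
    # A real system would use a proper word segmentation tool here.
    return text.lower().split()


def preprocess(corpus):
    """Filter short texts, build the dictionary, and build an inverted index."""
    # Drop texts shorter than the length threshold.
    kept = [t for t in corpus if len(t) >= MIN_CHARS]
    tokenized = [tokenize(t) for t in kept]

    # Corpus dictionary with occurrence counts; rare words are removed.
    counts = Counter(w for doc in tokenized for w in doc)
    vocab = {w for w, c in counts.items() if c >= MIN_COUNT}

    # Inverted index: word -> set of ids of the kept texts that contain it.
    index = defaultdict(set)
    for doc_id, doc in enumerate(tokenized):
        for w in doc:
            if w in vocab:
                index[w].add(doc_id)
    return kept, vocab, dict(index)


corpus = [
    "too short",
    "word vectors improve short text retrieval",
    "query expansion with word vectors helps retrieval",
]
kept, vocab, index = preprocess(corpus)
```

Document ids in the index refer to positions in `kept`, so the filtered texts and the index stay aligned.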

[0071] B. Training a model to represent each word in the corpus dictionary as a word vector

[0072] As shown in Figure 2, each word is encoded and classified, and a logistic regression model is trained on its context information to obtain the vector representation of each word.

[0073] For convenience of illustration, assume that the input data is X = [0.2, -0.1, 0.3, -0.2]^T, training to...
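
To make the role of logistic regression concrete, here is a hedged sketch of a single stochastic gradient step of binary logistic regression on the input vector X given above. The parameter vector `theta`, the label, and the learning rate are hypothetical values chosen for illustration; the text is truncated before it gives the actual ones. Updating the input vector together with the classifier parameters is an assumption, modeled on how word-embedding training commonly works.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_step(x, theta, label, lr=0.1):
    """One SGD step of binary logistic regression.

    Returns the predicted probability before the update, the updated
    classifier parameters, and the updated input (word) vector.
    """
    p = sigmoid(sum(xi * ti for xi, ti in zip(x, theta)))
    g = lr * (label - p)  # scaled gradient of the log-likelihood
    new_theta = [ti + g * xi for xi, ti in zip(x, theta)]
    new_x = [xi + g * ti for xi, ti in zip(x, theta)]
    return p, new_theta, new_x


X = [0.2, -0.1, 0.3, -0.2]     # input vector from the text
theta = [0.0, 0.0, 0.0, 0.0]   # hypothetical initial parameters
p, theta, X_new = logistic_step(X, theta, label=1)  # label 1 is an assumed target
```

With zero-initialized parameters the first prediction is 0.5 and only `theta` moves; the word vector starts to change on later steps, once `theta` is non-zero.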


Abstract

A short text query expansion and retrieval method based on word vectors, comprising: A. preprocessing of the short text corpus; B. training a model to represent each word in the corpus dictionary as a word vector; C. query expansion; D. using the query expansion word set and the BM25 retrieval model to obtain a candidate set of texts; E. topic extraction for the short texts; F. calculation of text vectors for the short texts; G. re-ranking of the short texts returned by the traditional retrieval model. The invention meets the user's retrieval requirements more accurately and effectively, and the query expansion module uses the existing data to find words that express the user's intent and expands the query with them.
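
Steps C and D can be illustrated with a small sketch. The abstract does not say how expansion words are chosen, so the version below makes an assumption: it picks the words whose vectors are closest, by cosine similarity, to the mean vector of the whole query. The BM25 scoring uses one common formulation with the widely used defaults `k1=1.2` and `b=0.75`; these values, the function names, and the toy data are all illustrative and do not come from the patent.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_query(query, vectors, k=2):
    """Append the k words closest to the mean query vector (assumed strategy)."""
    known = [w for w in query if w in vectors]
    if not known:
        return list(query)
    dim = len(vectors[known[0]])
    centroid = [sum(vectors[w][i] for w in known) / len(known) for i in range(dim)]
    candidates = [w for w in vectors if w not in query]
    candidates.sort(key=lambda w: cosine(vectors[w], centroid), reverse=True)
    return list(query) + candidates[:k]


def bm25_scores(terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(w for d in docs for w in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


# Toy word vectors and documents, invented for illustration only.
vectors = {"car": [1.0, 0.0], "automobile": [0.9, 0.1], "banana": [0.0, 1.0]}
docs = [["automobile", "repair"], ["banana", "bread"], ["car", "wash"]]

terms = expand_query(["car"], vectors, k=1)
scores = bm25_scores(terms, docs)
```

In this toy run the expanded query also matches the document that says "automobile" but never "car", which is the recall gain that expansion is meant to provide. Documents with a non-zero score would form the candidate set passed on to the later re-ranking step.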

Description

Technical field

[0001] The invention relates to the technical fields of data mining and search engines, and in particular to a short text query expansion and retrieval method based on word vectors.

Background

[0002] With the rapid development of computers and the Internet, it is becoming more and more difficult to obtain information accurately from massive information resources. A large part of this information exists in the form of short text, which is also an indispensable form of data in people's daily lives. Short text information mainly includes blog messages, microblog posts, short messages, and chat records. It is characterized by short length, flexible language, huge data scale, strong timeliness, and fast updates. Traditional search engines are not very accurate when retrieving such short texts and cannot meet people's need for accurate information acquisition. Therefore, the present invention designs and implements a search en...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 林鸿飞, 王琳
Owner: DALIAN UNIV OF TECH