Short text query expansion and indexing method based on word vector

A short text query expansion and retrieval technique based on word vectors. It addresses problems such as reduced retrieval accuracy, topic drift, and noise. It avoids having to specify the number of clusters and run an iterative clustering process, which reduces time complexity while still meeting the clustering requirements.
CN104765769A | Active | Publication Date: 2015-07-08 | DALIAN UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
DALIAN UNIV OF TECH
Publication Date
2015-07-08


Abstract

The invention discloses a short text query expansion and retrieval method based on word vectors. The method comprises: A, preprocessing the corpus of short texts; B, representing every word in the corpus dictionary as a word vector by means of a trained model; C, query expansion; D, obtaining a candidate set of texts from the query expansion word set and a BM25 retrieval model; E, extracting the subject of each short text; F, calculating the text vector of each short text; G, re-ranking the short texts returned by the traditional retrieval model. The method can satisfy the user's retrieval needs more accurately and effectively; moreover, the query expansion module can find words that express the user's intent from the existing data and use them to expand the query.
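
As a rough illustration of how steps C, D, F and G could fit together, here is a minimal Python sketch. It is not the patented procedure: the word vectors are hand-made toy values standing in for the trained model of step B, query expansion is assumed to pick the nearest words by cosine similarity, BM25 uses the standard Okapi formula with the common defaults k1 = 1.5 and b = 0.75, and the text vector is assumed to be the mean of the word vectors. Preprocessing (step A) and subject extraction (step E) are left out, and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy 2-d word vectors (hypothetical values; step B would learn these from the corpus).
VECS = {
    "phone": (1.0, 0.1), "mobile": (0.9, 0.2), "screen": (0.8, 0.4),
    "rain": (0.1, 1.0), "weather": (0.2, 0.9), "today": (0.5, 0.5),
}

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_query(terms, k=1):
    """Step C (assumed form): append the k nearest vocabulary words per query term."""
    expanded = list(terms)
    for t in terms:
        if t not in VECS:
            continue
        neighbours = sorted(
            (w for w in VECS if w not in expanded),
            key=lambda w: cosine(VECS[t], VECS[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:k])
    return expanded

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Step D: standard Okapi BM25 score of each tokenised document."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(w for d in docs for w in set(d))  # document frequency per word
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def text_vector(tokens):
    """Step F (assumed form): mean of the word vectors of the known tokens."""
    known = [VECS[t] for t in tokens if t in VECS]
    if not known:
        return (0.0, 0.0)
    return tuple(sum(c) / len(known) for c in zip(*known))

def search(query, docs):
    """Steps C, D, F, G: expand, keep BM25 hits, re-rank by vector similarity."""
    expanded = expand_query(query)
    scores = bm25_scores(expanded, docs)
    candidates = [i for i, s in enumerate(scores) if s > 0]
    qv = text_vector(expanded)
    return sorted(candidates, key=lambda i: cosine(qv, text_vector(docs[i])), reverse=True)

DOCS = [
    ["mobile", "screen", "today"],
    ["rain", "weather", "today"],
    ["phone", "screen"],
]
```

Calling `search(["phone"], DOCS)` expands the query with its nearest neighbour in the toy vocabulary, so the first document becomes a BM25 candidate even though it never contains the word "phone"; the candidates are then ordered by the similarity between the query vector and each text vector.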

Description

Technical field

[0001] The invention relates to the technical fields of data mining and search engines, in particular to a short text query expansion and retrieval method based on word vectors.

Background art

[0002] With the rapid development of computers and the Internet, it has become increasingly difficult to obtain accurate information from massive information resources. A large part of this information exists in the form of short text, which is also an indispensable form of data in people's daily lives. Short text mainly includes blog comments, microblog posts, short messages, chat records, etc., and is characterized by short length, flexible language, huge data scale, strong timeliness, and fast updates. Traditional search engines are not very accurate at retrieving such short texts and cannot meet people's need for accurate information. Therefore, the present invention designs and implements a search en...
