Chinese query expansion method based on pattern mining and word vector similarity calculation

A similarity calculation and Chinese query technology, applied in the field of information retrieval

Inactive Publication Date: 2020-11-06
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to propose a Chinese query expansion method based on pattern mining and word vector similarity calculation. The method is intended for information retrieval applications such as practical Chinese search engines and web information retrieval systems. It can improve the query performance of an information retrieval system and reduce the query topic drift and word mismatch problems in information retrieval.


Examples

Embodiment Construction

[0073] I. To better illustrate the technical scheme of the present invention, the relevant concepts involved are introduced below:

[0074] 1. Itemset

[0075] In text mining, a text document is regarded as a transaction, each feature word in the document is called an item, a set of feature word items is called an itemset, and the number of items in an itemset is called the itemset length. A k_itemset is an itemset containing k items, where k is the length of the itemset.
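The itemset terminology can be illustrated with a minimal Python sketch. The helper name `k_itemsets` and the toy document are assumptions made for illustration only; they do not come from the patent text, and English tokens stand in for segmented Chinese feature words.

```python
from itertools import combinations


def k_itemsets(document_terms, k):
    """Return every k_itemset (as a sorted tuple) built from one document.

    The document is treated as a transaction and each distinct feature
    word as an item. `k_itemsets` is a hypothetical helper name.
    """
    items = sorted(set(document_terms))  # duplicates collapse into one item
    return list(combinations(items, k))


# Toy transaction: a document already reduced to its feature words.
doc = ["query", "expansion", "retrieval", "query"]
pairs = k_itemsets(doc, 2)  # all 2_itemsets of this document
```

Each tuple in `pairs` has length 2, which is the itemset length k in the sense defined above.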

[0076] 2. Antecedent and Consequent of Association Rules

[0077] Suppose x and y are arbitrary feature word itemsets. An implication of the form x→y is called an association rule, where x is called the antecedent of the rule and y is called the consequent of the rule.
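As a rough illustration of the antecedent and consequent roles, the sketch below represents a rule x→y in Python. It scores the rule with the classical frequency-based support and confidence from textbook association rule mining, used here only as a stand-in: this is not the Copulas-based measure that the patent defines. The names `Rule`, `support`, and `confidence` and the toy documents are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # x in the rule x -> y
    consequent: frozenset  # y in the rule x -> y


def support(itemset, transactions):
    """Classical support: fraction of transactions containing the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(rule, transactions):
    """Classical confidence: support(x and y together) / support(x)."""
    base = support(rule.antecedent, transactions)
    if base == 0:
        return 0.0
    return support(rule.antecedent | rule.consequent, transactions) / base


# Toy transactions: each document is a set of feature words.
docs = [
    frozenset({"query", "expansion", "retrieval"}),
    frozenset({"query", "retrieval"}),
    frozenset({"query", "expansion"}),
    frozenset({"index"}),
]
rule = Rule(frozenset({"query"}), frozenset({"expansion"}))
rule_confidence = confidence(rule, docs)
```

In a query expansion setting, a rule whose antecedent is a query term suggests its consequent as a candidate expansion word.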

[0078] 3. Support and confidence of feature word itemsets based on the Copulas function

[0079] Copulas theory (see literature: Sklar A. Fonctions de répartition à n dimensions e...


Abstract

The invention provides a Chinese query expansion method based on pattern mining and word vector similarity calculation. The method comprises the following steps: first, a Chinese document set is retrieved with the user query to obtain an initial retrieval document set, and word vector semantic learning is performed on the initial retrieval document set to obtain a word vector set comprising query terms and non-query terms; next, expansion words are mined from the pseudo-relevance feedback document set using a Copulas-function-based association expansion word mining method, and an association expansion word set is established; cosine similarity between pairs of vectors in the word vector set is then computed to obtain a word embedding expansion word set and a word vector association expansion word set; finally, the word embedding expansion word set and the word vector association expansion word set are fused to obtain the final expansion words, the final expansion words are combined with the original query to form a new query, and the document set is retrieved again to realize query expansion. By fusing association pattern mining with word vector learning, the method can mine high-quality expansion words and improve information retrieval performance, and it has good application value and popularization prospects.
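The cosine similarity step and the final fusion can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the two-dimensional toy vectors, the 0.8 threshold, the helper name `embedding_expansion_words`, the hard-coded `association_words` placeholder (standing in for the output of the association mining step), and the use of a plain set union for fusion are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_expansion_words(query_terms, vectors, threshold):
    """Keep non-query terms whose best similarity to a query term meets the threshold."""
    known = [q for q in query_terms if q in vectors]
    scores = {}
    for term, vec in vectors.items():
        if term in query_terms or not known:
            continue
        best = max(cosine(vec, vectors[q]) for q in known)
        if best >= threshold:
            scores[term] = best
    return scores


# Toy word vectors; a real system would learn them from the initial retrieval set.
vectors = {
    "search": [1.0, 0.0],
    "retrieval": [0.9, 0.1],
    "fabric": [0.0, 1.0],
}
query = ["search"]
embedding_words = embedding_expansion_words(query, vectors, threshold=0.8)

# Placeholder for words produced by association mining (assumption).
association_words = {"index"}

# Fusion shown as a simple union (assumption), then appended to the original query.
final_words = sorted(set(embedding_words) | association_words)
new_query = query + final_words
```

Here "retrieval" passes the threshold because its vector is nearly parallel to that of "search", while "fabric" is orthogonal and is dropped. The new query would then be submitted to the retrieval system for the second pass.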

Description

Technical field

[0001] The invention relates to a Chinese query expansion method based on pattern mining and word vector similarity calculation, belonging to the technical field of information retrieval.

Background technique

[0002] Query expansion refers to modifying the weights of the original query or adding words related to the original query, in order to compensate for the lack of information in the user query and improve the recall and precision of the information retrieval system. Query expansion is one of the core techniques for solving the problems of query topic drift and word mismatch.

[0003] In the past ten years, with the development of network technology and the advent of the era of big data, how to accurately retrieve the information needed by users from massive big data resources has become a focus of attention in academic and industrial circles at home and abroad. As a result, query expansion technology has been greatly developed, and some new query expansion methods have b...


Application Information

IPC(8): G06F16/33, G06F16/332
CPC: G06F16/3325, G06F16/3334, G06F16/3335, G06F16/3338, G06F16/334
Inventor 黄名选
Owner GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS