Text retrieval method based on word vector learning and pattern mining fusion extension

Pattern mining and word vector technology, applied in the field of information retrieval

Status: Inactive; Publication Date: 2020-11-06
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS
Cites: 13; Cited by: 1

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to propose a text retrieval method based on the fusion and extension of word vector learning and pattern mining. The method is intended for the field of information retrieval, for example practical Chinese search engines and web information retrieval systems, where it can improve the query performance of the information retrieval system and reduce the problems of query topic drift and word mismatch.

Method used


Examples

Experimental program
Comparison scheme
Effect test

Detailed Description of the Embodiments

[0074] I. To better illustrate the technical scheme of the present invention, the relevant concepts involved are introduced below:

[0075] 1. Itemset

[0076] In text mining, a text document is regarded as a transaction, each feature word in the document is called an item, a set of feature word items is called an itemset, and the number of items in an itemset is called the itemset length. A k_itemset is an itemset containing k items, where k is the itemset length.
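The transaction/item/itemset vocabulary above can be illustrated with a minimal Python sketch. The toy documents and the helper names `k_itemsets` and `count_k_itemsets` are hypothetical; the count shown is a plain document frequency, not the Copulas-based support the invention defines later.

```python
from itertools import combinations

# Hypothetical toy corpus: each document (transaction) is reduced to
# its set of feature words (items).
documents = [
    {"retrieval", "query", "expansion"},
    {"query", "expansion", "vector"},
    {"retrieval", "vector"},
]

def k_itemsets(transaction, k):
    """Return every k_itemset (itemset of length k) contained in one transaction."""
    return [frozenset(c) for c in combinations(sorted(transaction), k)]

def count_k_itemsets(transactions, k):
    """Count how many transactions contain each k_itemset (plain frequency)."""
    counts = {}
    for t in transactions:
        for itemset in k_itemsets(t, k):
            counts[itemset] = counts.get(itemset, 0) + 1
    return counts

pair_counts = count_k_itemsets(documents, 2)
print(pair_counts[frozenset({"query", "expansion"})])  # -> 2
```

Using `frozenset` keeps itemsets unordered and hashable, so the same 2_itemset found in different documents maps to one dictionary key.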

[0077] 2. Antecedent and Consequent of an Association Rule

[0078] Suppose x and y are arbitrary feature word itemsets. An implication of the form x→y is called an association rule, where x is the antecedent of the rule and y is the consequent of the rule.
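As a minimal sketch, the following Python code enumerates every rule x→y that can be formed from one itemset by splitting it into a non-empty antecedent and a non-empty consequent. The helper `consequent_candidates` is a hypothetical illustration of how consequents could supply expansion words: it assumes that useful rules have antecedents made only of query terms, which the source text does not state.

```python
from itertools import combinations

def rules_from_itemset(itemset):
    """Split one itemset into every rule x -> y with non-empty, disjoint x and y."""
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            x = frozenset(antecedent)
            y = frozenset(items) - x
            rules.append((x, y))
    return rules

def consequent_candidates(itemset, query_terms):
    """Hypothetical filter: keep rules whose antecedent consists of query terms
    and whose consequent has no query terms; return the consequent words."""
    query = set(query_terms)
    words = set()
    for x, y in rules_from_itemset(itemset):
        if x <= query and not (y & query):
            words |= y
    return words

rules = rules_from_itemset({"query", "expansion", "vector"})
print(len(rules))  # -> 6
print(sorted(consequent_candidates({"query", "expansion", "vector"}, ["query"])))
# -> ['expansion', 'vector']
```

A 3_itemset yields 2^3 - 2 = 6 rules; a real miner would also discard rules that fail the support and confidence thresholds.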

[0079] 3. Support and Confidence of Feature Word Itemsets Based on the Copulas Function

[0080] Copulas theory (see literature: Sklar A. Fonctions de répartition à n dimensions e...
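For orientation, the result usually cited from Sklar's work is stated below in its bivariate form. It is general background only. How the invention derives itemset support and confidence from a Copulas function is in the truncated text and is not reconstructed here.

```latex
% Sklar's theorem (bivariate case): for random variables X and Y with joint
% distribution function H and marginal distribution functions F and G,
% there exists a copula C : [0,1]^2 \to [0,1] such that
H(x, y) = C\bigl(F(x), G(y)\bigr) \quad \text{for all } x, y,
% and C is unique when F and G are continuous.
```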


Abstract

The invention provides a text retrieval method based on the fusion and extension of word vector learning and pattern mining. The method comprises the following steps: a user's query is run against a Chinese document set to obtain the initially retrieved documents, and word embedding semantic learning is trained on these documents to obtain a word vector set for the initial retrieval documents; a pseudo-relevance feedback document set is constructed, expansion words are mined with a Copulas-function-based rule consequent expansion word mining method to build a rule consequent expansion word set, the vector cosine similarity between each rule consequent expansion word and the original query is calculated, and a word vector rule consequent expansion word set is extracted; the vector cosine similarity between the non-query terms and the original query is calculated, and a word vector expansion word set is extracted; finally, the word vector expansion word set and the word vector rule consequent expansion word set are fused by taking their union to obtain the final expansion words, which are combined with the original query into a new query to realize query expansion. The method realizes query expansion through a mechanism of two retrieval passes and two word vector similarity calculations, and improves text information retrieval performance.
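The two similarity-filtering steps and the union-based fusion in the abstract can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the word vectors are hard-coded instead of learned, the original query is represented by the mean of its term vectors (the abstract does not say how the query vector is formed), and the two thresholds are hypothetical parameters. Retrieval, pseudo-relevance feedback set construction, and Copulas-based rule mining are omitted; `rule_consequents` stands in for their output.

```python
import math

# Hypothetical toy word vectors. In the invention they come from word
# embedding training on the initially retrieved documents.
vectors = {
    "query": [1.0, 0.0],
    "expansion": [0.9, 0.1],
    "vector": [0.6, 0.8],
    "fabric": [0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def query_vector(query_terms, vectors):
    """Assumption: represent the original query by the mean of its term vectors."""
    known = [vectors[t] for t in query_terms if t in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]

def select_by_similarity(candidates, query_terms, vectors, threshold):
    """Keep non-query candidates whose cosine similarity to the query is >= threshold."""
    q = query_vector(query_terms, vectors)
    if q is None:
        return set()
    return {w for w in candidates
            if w in vectors and w not in query_terms
            and cosine(vectors[w], q) >= threshold}

def expand_query(query_terms, rule_consequents, vectors, rule_threshold, vec_threshold):
    """Union the two expansion word sets and append them to the original query."""
    # Word vector rule consequent expansion words: filter the mined consequents.
    rule_set = select_by_similarity(rule_consequents, query_terms, vectors, rule_threshold)
    # Word vector expansion words: filter every non-query term in the vocabulary.
    vec_set = select_by_similarity(vectors.keys(), query_terms, vectors, vec_threshold)
    return list(query_terms) + sorted(rule_set | vec_set)

print(expand_query(["query"], {"expansion", "vector"}, vectors, 0.5, 0.9))
# -> ['query', 'expansion', 'vector']
```

In this toy run, `vector` enters the new query only through the rule consequent path, while `expansion` passes both filters; the union keeps both, which is the point of fusing the two sets.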

Description

Technical field

[0001] The invention relates to a text retrieval method based on the fusion and extension of word vector learning and pattern mining, and belongs to the technical field of information retrieval.

Background art

[0002] In the current field of information retrieval, the problems of query topic drift and word mismatch persist; they degrade query performance and hinder users' access to the information resources they need. Query expansion technology can be used in information retrieval to address these problems. Query expansion modifies the weights of the original query terms, or adds other feature words semantically related to the original query, to compensate for the semantic information missing from an overly brief original query and thereby improve information retrieval performance. In the past ten years, scholars have researched information retrieval methods based on query expansion from different ...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/33; G06F16/332
CPC: G06F16/3338; G06F16/3334; G06F16/334; G06F16/3335; G06F16/3325
Inventor: 黄名选
Owner: GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS