Text retrieval method based on association rule and word vector fusion extension

A technology of word vectors and rules, applied in unstructured text data retrieval, text database query, digital data information retrieval, etc.

Status: Inactive; Publication Date: 2020-11-06
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS

AI Technical Summary

Problems solved by technology

Over the past ten years, scholars have studied query-expansion-based information retrieval from different perspectives and produced some effective methods. For example, Yue Wen et al. proposed an information retrieval method based on query expansion and classification (see: Yue Wen, Chen Zhiping, Lin Yaping. Information Retrieval Algorithms Based on Query Expansion and Classification [J].), and Srivastava proposed an information retrieval method based on pseudo-relevance feedback expansion (see: Srivastava N. Query Expansion Strategy based on Pseudo Relevance Feedback and Term Weight Scheme for Monolingual Retrieval [J]. International Journal of Computer Applications, 2015, 105(8): 1-6.). Experiments have verified the effectiveness of these methods, but they have not fully solved technical problems such as query topic drift and word mismatch in information retrieval.



Examples


Embodiment Construction

[0057] 1. In order to better illustrate the technical scheme of the present invention, the related concepts involved in the present invention are introduced as follows:

[0058] 1. Itemset

[0059] In text mining, a text document is regarded as a transaction, each feature word in the document is called an item, a collection of feature word items is called an itemset, and the number of items in an itemset is called the itemset length. A k_itemset is an itemset containing k items, where k is the length of the itemset.
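
The itemset terminology can be illustrated with a minimal Python sketch. The helper name `k_itemsets` and the toy feature words are hypothetical and chosen only for illustration; the source does not specify how itemsets are enumerated.

```python
from itertools import combinations


def k_itemsets(document_terms, k):
    """Return every k_itemset (as a sorted tuple) formed from one document's feature words."""
    # A document is a transaction; duplicate feature words count as a single item.
    items = sorted(set(document_terms))
    return list(combinations(items, k))


# Hypothetical toy document (one transaction); the words are illustrative only.
doc = ["query", "expansion", "retrieval", "query"]
two_itemsets = k_itemsets(doc, 2)
print(two_itemsets)
```

Each tuple returned for `k = 2` is a 2_itemset, that is, an itemset whose length is 2.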

[0060] 2. Antecedents and Consequents of Association Rules

[0061] Let x and y be arbitrary sets of feature terms. An implication of the form x→y is called an association rule, where x is the antecedent of the rule and y is the consequent.
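
As a rough illustration of a rule x→y, the sketch below scores one rule with the classical frequency-based support and confidence. This is a deliberately swapped-in, textbook measure: it is not the Copulas-based Cop_Sup used by the invention, whose formula is not reproduced here. The function names and the toy documents are assumptions.

```python
def support(itemset, transactions):
    """Classical support: fraction of transactions (documents) containing every item."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Classical confidence of antecedent -> consequent: sup(x ∪ y) / sup(x)."""
    sup_x = support(antecedent, transactions)
    if sup_x == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / sup_x


# Hypothetical documents, each reduced to its set of feature words.
docs = [
    {"query", "expansion", "retrieval"},
    {"query", "expansion"},
    {"query", "drift"},
    {"retrieval", "index"},
]
x, y = {"query"}, {"expansion"}          # antecedent and consequent
rule_support = support(x | y, docs)      # 2 of 4 documents contain both words
rule_confidence = confidence(x, y, docs)  # 2 of the 3 documents containing "query"
```

A rule is usually kept only when both values exceed chosen minimum thresholds; the thresholds themselves are not given in this excerpt.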

[0062] 3. Support based on Copulas function

[0063] Copulas based Support is represented as Cop_Sup().

[0064] Feature word itemsets based on Copulas function (T 1 ∪T 2...


Abstract

The invention provides a text retrieval method based on association rule and word vector fusion extension. The method comprises the following steps: a user query is run against the original Chinese document set, and the returned documents form the initial retrieval document set; a deep learning tool performs word vector semantic learning and training on the initial retrieval document set to obtain a feature word vector set; the top m documents of the initial retrieval document set are extracted as the pseudo-relevance feedback document set; candidate expansion words are mined from the pseudo-relevance feedback document set using support and confidence based on a Copulas function, and a candidate expansion word set is established; finally, the vector cosine similarity between the candidate expansion words and the original query is calculated, the final expansion words are extracted and combined with the original query to form a new query, and the original document set is retrieved again to obtain the final retrieval result. Experimental results show that the retrieval performance of the method is superior to that of existing methods, that the problems of query topic drift and word mismatch can be effectively solved, that information retrieval performance is improved, and that the method has good application value and popularization prospects.
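
The last step of the abstract, ranking candidate expansion (extension) words by vector cosine similarity to the original query and appending the best ones, might look like the following sketch. Several details are assumptions rather than the patent's method: the score of a candidate is taken as the sum of its cosine similarities to each query term, the cut-off is a simple `top_n`, and the two-dimensional vectors are made-up stand-ins for trained word vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_expansion_words(query_terms, candidates, vectors, top_n):
    """Rank candidates by summed cosine similarity to the query terms (assumed aggregation)."""
    scored = []
    for word in candidates:
        if word in query_terms or word not in vectors:
            continue  # skip words already in the query or without a vector
        score = sum(cosine(vectors[word], vectors[q]) for q in query_terms if q in vectors)
        scored.append((score, word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by word
    return [word for _, word in scored[:top_n]]


# Made-up vectors standing in for the trained feature word vector set.
vectors = {
    "retrieval": [1.0, 0.0],
    "search": [0.9, 0.1],
    "index": [0.6, 0.8],
    "weather": [0.0, 1.0],
}
query = ["retrieval"]
candidates = ["search", "index", "weather"]  # e.g. words mined from feedback documents
expansion = select_expansion_words(query, candidates, vectors, top_n=2)
new_query = query + expansion  # the new query used for the second retrieval pass
```

Here `new_query` keeps the original terms first, so any later term weighting can still distinguish original query words from expansion words.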

Description

Technical field

[0001] The invention relates to a text retrieval method based on the fusion and expansion of association rules and word vectors, and belongs to the technical field of information retrieval.

Background technique

[0002] With the development of network technology and the rapid growth of digital resources, how network users can quickly and accurately find the information resources they need, and how to reduce query topic drift and word mismatch so as to meet their information needs, is an important problem that urgently needs to be solved in the field of information retrieval. Query expansion technology can address these problems. Query expansion refers to adjusting the weights of the original query, or adding other feature words semantically related to the original query, to make up for the lack of semantic information caused by the original query being too simple, so as to improve the Purpose of Information Retr...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/332
CPC: G06F16/3325, G06F16/3334, G06F16/3335, G06F16/3338, G06F16/334
Inventor: 黄名选
Owner: GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS