A Method for Mining Weighted Positive and Negative Association Patterns between English Words by Combining Item Weight and Frequency

A pattern-mining technique for English text, applied to text database querying and unstructured text data retrieval, that addresses the problem of ignoring feature-word frequency.

Inactive Publication Date: 2019-07-09
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS

AI Technical Summary

Problems solved by technology

However, the third method treats each document as a collection of feature-word weights: it considers only the effect of feature-word item weights on support and ignores the effect of feature-word frequency on support.


Examples


[0084] Example: Suppose C_k = (t_1 ∪ t_2 ∪ t_3 ∪ t_4) has support 0.65, and its single items t_1, t_2, t_3 and t_4 have supports 0.82, 0.45, 0.76 and 0.75 respectively. Its 2_sub-itemsets and 3_sub-itemsets (t_1 ∪ t_2), (t_1 ∪ t_3), (t_1 ∪ t_4), (t_2 ∪ t_3), (t_2 ∪ t_4), (t_1 ∪ t_2 ∪ t_3), (t_1 ∪ t_2 ∪ t_4) and (t_2 ∪ t_3 ∪ t_4) have supports 0.64, 0.78, 0.75, 0.74, 0.67, 0.66, 0.56 and 0.43 respectively. The single item with the largest support (0.82) is therefore t_1, and the sub-itemset with the largest support (0.78) among the 2_sub-itemsets and 3_sub-itemsets is (t_1 ∪ t_3). Applying formula (14) to the positive itemset (t_1 ∪ t_2 ∪ t_3 ∪ t_4) then gives 0.81. The calculation process is as follows:

[0085]
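The selection step in the example of paragraph [0084] can be sketched in Python as follows. This is only an illustration of how the largest single-item support and the largest sub-itemset support are picked out. Formula (14) itself is not reproduced in this excerpt, so the sketch does not attempt to compute the final value of 0.81. The dictionary layout and the helper name `argmax_support` are assumptions made for illustration.

```python
# Supports taken from the example in paragraph [0084].
single_support = {"t1": 0.82, "t2": 0.45, "t3": 0.76, "t4": 0.75}

# Supports of the listed 2_sub-itemsets and 3_sub-itemsets, in the order given.
sub_itemset_support = {
    ("t1", "t2"): 0.64,
    ("t1", "t3"): 0.78,
    ("t1", "t4"): 0.75,
    ("t2", "t3"): 0.74,
    ("t2", "t4"): 0.67,
    ("t1", "t2", "t3"): 0.66,
    ("t1", "t2", "t4"): 0.56,
    ("t2", "t3", "t4"): 0.43,
}


def argmax_support(table):
    """Return (key, support) for the entry with the largest support.

    Hypothetical helper, not part of the patent text.
    """
    key = max(table, key=table.get)
    return key, table[key]


best_item, best_item_support = argmax_support(single_support)
best_subset, best_subset_support = argmax_support(sub_itemset_support)

print(best_item, best_item_support)
print(best_subset, best_subset_support)
```

These two maxima, t_1 and (t_1 ∪ t_3), are the inputs that the text says are passed to formula (14).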

[0086] 3. Improvement of weighted association rules

[0087] The limitation of the traditional association rule evaluation framework (support-confidence) is that it ignores the itemset support in the rule's consequen...
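To illustrate why ignoring the support of the consequent can be misleading, the following minimal sketch uses the standard, unweighted textbook definitions of confidence and lift. These are not the weighted measures defined in the patent, which are not shown in this excerpt, and the support values are hypothetical numbers chosen for the illustration.

```python
def confidence(supp_ab, supp_a):
    """Standard confidence of the rule A -> B: supp(A ∪ B) / supp(A)."""
    return supp_ab / supp_a


def lift(supp_ab, supp_a, supp_b):
    """Standard lift of the rule A -> B: supp(A ∪ B) / (supp(A) * supp(B))."""
    return supp_ab / (supp_a * supp_b)


# Hypothetical supports: the consequent B is very common on its own.
supp_a, supp_b, supp_ab = 0.5, 0.8, 0.4

rule_confidence = confidence(supp_ab, supp_a)
rule_lift = lift(supp_ab, supp_a, supp_b)

print(f"confidence = {rule_confidence:.2f}, lift = {rule_lift:.2f}")
```

With these numbers the confidence equals the support of B, so the lift is 1: the rule looks strong under support-confidence alone even though A carries no information about B.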


Abstract

The invention discloses a method for mining weighted positive and negative association patterns between English words by combining item weight and frequency. The method first preprocesses the English text dataset to be mined (removing stop words, extracting stems, calculating feature-word weights and so on) to construct an English text index library and a feature-word library. It then uses a weighted support calculation that combines item weight and frequency, together with a support-relevancy evaluation framework, to mine weighted positive and negative itemsets of feature words and to construct a positive itemset library and a negative itemset library. Finally, it uses a support-lift-confidence evaluation framework to mine weighted positive and negative association rule patterns between words and to construct a positive rule pattern library and a negative rule pattern library of weighted feature words. The method overcomes shortcomings of existing weighted association pattern mining techniques, effectively combines item weight and frequency, and mines more practical and more reasonable positive and negative association rule patterns between text feature words. Applying these inter-word association patterns to query expansion can improve the performance of text information retrieval.
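The preprocessing stage named in the abstract (stop-word removal, stemming, feature-word weighting) might look roughly like the sketch below. Everything specific here is an assumption for illustration: the tiny stop-word list, the naive suffix-stripping stemmer (a stand-in for a real stemmer), the TF-IDF-style weight, and the function names. The abstract does not state which stemmer or weighting scheme the patent uses.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption, not from the patent).
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}


def naive_stem(word):
    """Crude suffix stripping, used only as a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def build_feature_library(docs):
    """Return (sorted feature-word list, per-document weight dicts).

    The weight tf * log(1 + N / df) is an assumed TF-IDF-style scheme.
    """
    term_counts = [Counter(preprocess(d)) for d in docs]
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n_docs = len(docs)
    weights = [
        {t: tf * math.log(1 + n_docs / doc_freq[t]) for t, tf in counts.items()}
        for counts in term_counts
    ]
    return sorted(doc_freq), weights


docs = ["Mining the rules of words", "Weighted rules and mining"]
lexicon, weights = build_feature_library(docs)
print(lexicon)
```

Keeping both the per-document term counts and the per-document weights matters for the method described above, since its support calculation is said to combine item weight with frequency.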

Description

Technical field

[0001] The invention belongs to the field of text mining. Specifically, it relates to a method for mining weighted positive and negative association patterns between words in English text by combining item weight and frequency. The method is suited to discovering association patterns among feature words in English text mining and can be applied to query expansion in monolingual and cross-language information retrieval, among other fields.

Background

[0002] Weighted association patterns are divided into patterns with fixed item weights and patterns with variable item weights; the pattern based on varying item weights is called the weighted association pattern. In recent years, weighted association pattern mining has been studied in depth, and its core problem is the weighted support calculation of itemsets. In the existing research, there are mainly three methods for calculating the sup...


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F16/33
Inventor 黄名选
Owner GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS