Method and system for mining fully weighted positive and negative association patterns between words in text

A technique for mining fully weighted positive and negative association patterns, applied in special data processing applications, electrical digital data processing, and related fields. It addresses the failure of existing mining methods to account for fully weighted data, which leads them to produce uninteresting patterns.

Status: Inactive | Publication Date: 2017-03-22
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS

AI Technical Summary

Problems solved by technology

Traditional unweighted association pattern mining does not take item weights into account, which often yields a large number of redundant, uninteresting, and invalid association patterns.
Item-weighted positive and negative association rule mining accounts for the differing importance of items, but ignores the fact that an item's weight varies from one transaction record to another in the database.
When the traditional unweighted method is applied to such fully weighted data, it considers only item frequency and not the inherent characteristics of the data, so it often produces many redundant, invalid, and false association patterns. The item-weighted mining method cannot be applied to fully weighted data.

Examples

[0121] Example: In the example of Table 2, $\mathrm{wdR}(i_1) = (0.85 + 0.93 + 0.65 + 0.75) / 1 = 3.18$, $\mathrm{wdR}(i_2) = 0.61$, $\mathrm{wdR}(i_1, i_2) = (0.93 + 0.21 + 0.65 + 0.35 + 0.75 + 0.05) / 2 = 1.47$, and $\mathrm{awsup}(i_1, i_2) = 1.47 / 5 = 0.29$.
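The arithmetic in this example can be checked with a short Python sketch. Table 2 itself is not reproduced in this excerpt, so the records below are a partial reconstruction limited to items i1 and i2: the pairing of weights within each record, the record order, and the function names `wdr` and `awsup` are assumptions made for illustration. The weight dimension ratio is read here as the summed weights of an itemset's items over all records containing the whole itemset, divided by the number of items in the itemset.

```python
# Partial, assumed reconstruction of Table 2 (items i1 and i2 only).
# Each dict maps an item to its weight in that transaction record.
transactions = [
    {"i1": 0.85},
    {"i1": 0.93, "i2": 0.21},
    {"i1": 0.65, "i2": 0.35},
    {"i1": 0.75, "i2": 0.05},
    {},  # fifth record: contains neither i1 nor i2 (other items omitted)
]


def wdr(itemset, db):
    """Weight dimension ratio: sum the weights of the itemset's items over
    every record that contains the whole itemset, then divide by the
    number of items in the itemset (its dimension)."""
    items = list(itemset)
    total = sum(
        sum(record[item] for item in items)
        for record in db
        if all(item in record for item in items)
    )
    return total / len(items)


def awsup(itemset, db):
    """Fully weighted support: weight dimension ratio divided by the
    number of records in the database."""
    return wdr(itemset, db) / len(db)


print(round(wdr(["i1"], transactions), 2))          # 3.18
print(round(wdr(["i2"], transactions), 2))          # 0.61
print(round(wdr(["i1", "i2"], transactions), 2))    # 1.47
print(round(awsup(["i1", "i2"], transactions), 2))  # 0.29
```

The printed values match the figures quoted in the example, which suggests the reading of wdR used here is consistent with the source.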

[0122] Definition 3

[0123] Fully weighted frequent itemsets and negative itemsets: let the minimum support threshold be minsup and the minimum itemset weight dimension ratio threshold be minwdR, where $\mathrm{minwdR} = n \times \mathrm{minsup}$ and $n$ is the number of transaction records. If the fully weighted itemset support satisfies $\mathrm{awsup}(I) \ge \mathrm{minsup}$, or equivalently $\mathrm{wdR}(I) \ge \mathrm{minwdR}$, then the itemset $I$ is a fully weighted frequent itemset. For a fully weighted itemset $(I_1, I_2)$, if its sub-itemsets $I_1$ and $I_2$ are frequent itemsets and $\mathrm{awsup}(I_1, I_2) < \mathrm{minsup}$, or $\mathrm{wdR}(I_1, I_2) < \mathrm{minwdR}$, then $(I_1, I_2)$ is a fully weighted negative itemset.
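The relation between the two thresholds can be spelled out in one line. The step below assumes that fully weighted support is the weight dimension ratio divided by the number of records $n$, which is how the preceding example computes it (1.47 / 5 = 0.29); the earlier definitions are not part of this excerpt, so this is a reading rather than a quotation.

```latex
\mathrm{awsup}(I) = \frac{\mathrm{wdR}(I)}{n}
\quad\Longrightarrow\quad
\mathrm{awsup}(I) \ge \mathrm{minsup}
\iff \mathrm{wdR}(I) \ge n \times \mathrm{minsup} = \mathrm{minwdR}
```

Under that reading the two tests in the definition are interchangeable, so an implementation can compare wdR against minwdR directly and skip the division by $n$.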

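[0124] Example: Set $\mathrm{minsup} = 0.1$; then $\mathrm{minwdR} = 5 \times 0.1 = 0.5$. From the example above, $\mathrm{wdR}(i_1, i_2) = 1.47 > \mathrm{minwdR}$, so $(i_1, i_2)$ is a fully weighted frequent itemset. Likewise $\mathrm{wdR}(i_1) = 3.18 > \mathrm{minwdR}$ and $\mathrm{wdR}(i_4) = 0.96 > \mathrm{minwdR}$, while $\mathrm{wdR}(i_1, i_4) = 0.38 < \mathrm{minwdR}$ ...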
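A minimal sketch of the classification in Definition 3, applied to the numbers quoted in this example. It assumes the definition reads as follows: an itemset is frequent when its wdR reaches minwdR, and negative when it falls below minwdR while both of its sub-itemsets are frequent. The function name `classify` and its return labels are hypothetical, and the wdR values are passed in directly because Table 2 is not reproduced here.

```python
def classify(wdr_itemset, wdr_parts, n, minsup):
    """Label an itemset (I1, I2) under the assumed reading of Definition 3.

    wdr_itemset: weight dimension ratio of the whole itemset (I1, I2)
    wdr_parts:   weight dimension ratios of the sub-itemsets I1 and I2
    n:           number of transaction records in the database
    minsup:      minimum support threshold
    """
    minwdr = n * minsup  # minimum weight dimension ratio threshold
    if wdr_itemset >= minwdr:
        return "frequent"
    if all(w >= minwdr for w in wdr_parts):
        return "negative"
    return "neither"


n, minsup = 5, 0.1  # values used in the example, giving minwdR = 0.5

# (i1, i2): wdR = 1.47, with wdR(i1) = 3.18 and wdR(i2) = 0.61
print(classify(1.47, [3.18, 0.61], n, minsup))  # frequent

# (i1, i4): wdR = 0.38, with wdR(i1) = 3.18 and wdR(i4) = 0.96
print(classify(0.38, [3.18, 0.96], n, minsup))  # negative
```

The third label, "neither", covers itemsets that fall below the threshold but have at least one infrequent sub-itemset; the excerpt does not say how the source treats that case.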

Abstract

The invention discloses a method for mining fully weighted positive and negative association patterns between text terms, and a mining system that applies the method. The method comprises the following steps: a Chinese text preprocessing module preprocesses the text to build a text database and a feature word item library; a feature word frequent itemset and negative itemset mining module mines fully weighted feature word candidate itemsets from the text database, calculates their weight dimension ratios, and prunes uninteresting itemsets with a multi-interestingness threshold pruning strategy, yielding interesting fully weighted feature word frequent itemset and negative itemset patterns; a fully weighted positive and negative association rule mining module then mines valid fully weighted positive and negative association rules between terms from the frequent itemsets and negative itemsets; and a result display module outputs the mined positive and negative association rules to the user. The method and system greatly reduce unnecessary frequent itemsets, negative itemsets, and association rules, improve the efficiency of Chinese feature word association rule mining, and yield high-quality association patterns between Chinese terms.

Description

Technical field

[0001] The invention belongs to the field of data mining. Specifically, it is a method, based on the weight dimension ratio, for mining fully weighted positive and negative association patterns between text words, together with a mining system for the method. It is applicable to fields such as the discovery of association patterns between feature words in text mining and query expansion in text information retrieval.

Background art

[0002] Over the past 20 years, research on association pattern mining has made remarkable progress and has gone through three stages: unweighted item mining, item-weighted mining, and fully weighted item mining.

[0003] Stage 1: Research on mining unweighted positive and negative association patterns

[0004] The main feature of unweighted positive and negative association pattern mining is that the probability of an itemset appearing in the database is taken as its support...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/313, G06F16/3335
Inventor: 黄名选
Owner: GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS