Partial-sequence itemset based Chinese-English text word association rule mining method and system

An itemset-based method for mining association rules between Chinese and English text words, and a corresponding mining system, in the field of text technology. It addresses problems of existing approaches: redundant, invalid, and uninteresting association patterns, which make it harder for users to select the patterns they want.

Status: Inactive; Publication Date: 2014-12-03
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS
Cites: 4; Cited by: 19

AI Technical Summary

Problems solved by technology

[0004] Although frequency-based mining has been studied extensively, it has the following defect: it considers only how often items occur and ignores item weights, which increases the number of redundant, invalid, and uninteresting association patterns.
Existing mining methods based on varying item weights still produce a very large number of association patterns, which makes it harder for users to select the patterns they want, and many of these patterns remain uninteresting, spurious, or invalid. At the application level, mining algorithms based on fixed item weights are not suitable for fully weighted data, so most such data are still processed with frequency-based methods, again yielding large numbers of redundant, invalid, and uninteresting association patterns.


Examples


[0081] Example: a fully weighted data instance of Chinese text data is Text = (TR, IS, IW), where TR = {r1, r2, r3, r4, r5} is the set of 5 document records, IS = {i1, i2, i3, i4, i5} is the set of 5 feature word items, and IW is the set of <item, record, weight> triples:
IW = {<i1, r1, 0>, <i2, r1, 0.83>, <i3, r1, 0.81>, <i4, r1, 0>, <i5, r1, 0.01>,
<i1, r2, 0>, <i2, r2, 0.94>, <i3, r2, 0.7>, <i4, r2, 0.23>, <i5, r2, 0>,
<i1, r3, 0>, <i2, r3, 0.35>, <i3, r3, 0.5>, <i4, r3, 0.63>, <i5, r3, 0>,
<i1, r4, 0.95>, <i2, r4, 0>, <i3, r4, 0.85>, <i4, r4, 0>, <i5, r4, 0>,
<i1, r5, 0.73>, <i2, r5, 0.02>, <i3, r5, 0>, <i4, r5, 0.06>, <i5, r5, 0.9>}.
The IW collection can be represented as shown in Figure 1.
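To make the triple set concrete, the sketch below loads the [0081] weights into a Python dictionary and, for each item, counts the records where its weight is nonzero and sums its weights. The dictionary layout and the helper names `item_frequency` and `item_weight_sum` are illustrative assumptions; the weight sum is not the patent's Definition 2, which is truncated here.

```python
from typing import Dict, List, Tuple

RECORDS: List[str] = ["r1", "r2", "r3", "r4", "r5"]
ITEMS: List[str] = ["i1", "i2", "i3", "i4", "i5"]

# Rows are records r1..r5, columns are items i1..i5 (values copied from [0081]).
# A weight of 0 means the feature word does not occur in that record.
WEIGHT_TABLE: List[List[float]] = [
    [0.00, 0.83, 0.81, 0.00, 0.01],  # r1
    [0.00, 0.94, 0.70, 0.23, 0.00],  # r2
    [0.00, 0.35, 0.50, 0.63, 0.00],  # r3
    [0.95, 0.00, 0.85, 0.00, 0.00],  # r4
    [0.73, 0.02, 0.00, 0.06, 0.90],  # r5
]

# IW as a mapping (item, record) -> weight, mirroring the <item, record, weight> triples.
IW: Dict[Tuple[str, str], float] = {
    (item, rec): WEIGHT_TABLE[r][c]
    for r, rec in enumerate(RECORDS)
    for c, item in enumerate(ITEMS)
}


def item_frequency(item: str) -> int:
    """Hypothetical helper: number of records in which the item has a nonzero weight."""
    return sum(1 for rec in RECORDS if IW[(item, rec)] > 0)


def item_weight_sum(item: str) -> float:
    """Hypothetical helper: sum of the item's weights over all records (illustration only)."""
    return sum(IW[(item, rec)] for rec in RECORDS)


for item in ITEMS:
    print(item, item_frequency(item), round(item_weight_sum(item), 2))
```

Running it shows that i2 and i3 both occur in 4 records but have weight sums of 2.14 and 2.86, while i5 occurs in 2 records with weights 0.01 and 0.9: a frequency count alone cannot tell these cases apart.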

[0082]

[0083] Figure 1: Fully weighted data instance

[0084] Definition 2 (Itemset Weight and Item Weight): Fully Weig...


Abstract

Disclosed is a partial-sequence itemset based Chinese-English text word association rule mining method and system. A text information preprocessing module performs preprocessing to build a text information database and a feature word item base. A feature word frequent partial-sequence item implementation module mines feature word candidate itemsets and derives the partial-sequence itemsets of those candidates; the candidate partial-sequence itemsets are pruned with a new itemset pruning method, their weights are calculated, and their supports are calculated with a new calculation method, yielding the frequent partial-sequence itemsets.
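The abstract describes a level-wise flow: generate candidate itemsets, prune them, compute weights, then compute supports. As a point of reference only, the sketch below implements standard Apriori join-and-prune candidate generation over weighted records. It does not implement the patent's partial-sequence itemsets or its new pruning and support methods, which are not given here. The support measure (sum over records of the minimum item weight, divided by the number of records) and all function names are assumptions, chosen because that measure is anti-monotone, so Apriori subset pruning stays sound.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Record = Dict[str, float]  # feature word -> weight; weight > 0 means present

# Weights from example [0081], one dict per record r1..r5 (zero weights omitted).
RECORDS_0081: List[Record] = [
    {"i2": 0.83, "i3": 0.81, "i5": 0.01},
    {"i2": 0.94, "i3": 0.70, "i4": 0.23},
    {"i2": 0.35, "i3": 0.50, "i4": 0.63},
    {"i1": 0.95, "i3": 0.85},
    {"i1": 0.73, "i2": 0.02, "i4": 0.06, "i5": 0.90},
]


def weighted_support(itemset: FrozenSet[str], records: List[Record]) -> float:
    """ASSUMED measure (not the patent's): for every record that contains all
    items, add the smallest item weight; divide by the number of records."""
    total = 0.0
    for rec in records:
        if all(rec.get(i, 0.0) > 0 for i in itemset):
            total += min(rec[i] for i in itemset)
    return total / len(records)


def apriori_gen(frequent_k: Set[FrozenSet[str]], k: int) -> Set[FrozenSet[str]]:
    """Standard Apriori join + prune: keep (k+1)-itemsets whose k-subsets are all frequent."""
    candidates: Set[FrozenSet[str]] = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent_k for sub in combinations(union, k)
            ):
                candidates.add(union)
    return candidates


def mine(records: List[Record], min_wsup: float) -> Dict[FrozenSet[str], float]:
    """Level-wise search returning every itemset whose weighted support >= min_wsup."""
    items = {i for rec in records for i, w in rec.items() if w > 0}
    level = {frozenset([i]) for i in items}
    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        frequent: Set[FrozenSet[str]] = set()
        for cand in level:
            s = weighted_support(cand, records)
            if s >= min_wsup:
                frequent.add(cand)
                result[cand] = s
        level = apriori_gen(frequent, k)
        k += 1
    return result


if __name__ == "__main__":
    for itemset, s in sorted(mine(RECORDS_0081, 0.3).items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), round(s, 3))
```

With a threshold of 0.3 on the [0081] data, this returns {i1}, {i2}, {i3}, and {i2, i3}; the pairs {i1, i2} and {i1, i3} fall below the threshold, so no 3-itemset candidates are generated.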

Description

Technical Field

[0001] The invention belongs to the field of data mining. Specifically, it is a method for mining association rules between Chinese and English text words based on partially ordered itemsets, together with its mining system. It is suitable for discovering feature word association patterns in Chinese and English text mining, for query expansion in Chinese and English text information retrieval, for Chinese-English cross-language information retrieval, and for other fields.

Background Art

[0002] Over the past 20 years, association rule mining research has produced notable technical results, concentrated in two areas: mining based on item frequency and mining based on item weight.

[0003] Mining based on item frequency, also called unweighted association rule mining, treats all itemsets equally and uniformly, and uses the probability and conditional probability of itemsets appearing...
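Paragraph [0003] describes frequency-based (unweighted) mining in terms of the probability and conditional probability of itemsets, which are the usual support and confidence measures. The sketch below computes both on a presence-only view of the [0081] records, treating any nonzero weight as an occurrence; that binarization and the helper names are assumptions for illustration.

```python
from typing import FrozenSet, List

# Presence-only view of the [0081] records: an item occurs in a record
# when its weight there is greater than zero (weights themselves are dropped).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"i2", "i3", "i5"}),        # r1
    frozenset({"i2", "i3", "i4"}),        # r2
    frozenset({"i2", "i3", "i4"}),        # r3
    frozenset({"i1", "i3"}),              # r4
    frozenset({"i1", "i2", "i4", "i5"}),  # r5
]


def count(itemset: FrozenSet[str]) -> int:
    """Number of records that contain every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def support(itemset: FrozenSet[str]) -> float:
    """Probability that a record contains the itemset."""
    return count(itemset) / len(TRANSACTIONS)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str]) -> float:
    """Conditional probability of the consequent given the antecedent."""
    base = count(antecedent)
    return count(antecedent | consequent) / base if base else 0.0


print(support(frozenset({"i2", "i3"})))                  # 0.6
print(confidence(frozenset({"i2"}), frozenset({"i3"})))  # 0.75
```

Under this view, i5 counts as present in r1 even though its weight there is only 0.01, which is the loss of information that [0004] attributes to frequency-based methods.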


Application Information

Patent Type & Authority: Application (China)
IPC (IPC8): G06F17/30
CPC: G06F16/316, G06F16/334, G06F16/353, G06F40/216
Inventor: 黄名选 (Huang Mingxuan)
Owner: GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS