Method and system for mining association rules between Chinese and English words based on partially ordered itemsets

Chinese-English itemset technology, applied in special data processing applications, instruments, and electrical digital data processing, addresses problems such as ineffective mining, difficulty advancing the technology to the application level, and uninteresting association patterns.

Inactive Publication Date: 2017-07-18
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] Although frequency-based mining has been studied extensively, it has a defect: it considers only the frequency of items and ignores item weights, which increases the number of redundant, invalid and uninteresting association patterns.
The disadvantage of existing mining methods based on changing item weights is that the number of association patterns they mine is still huge, which makes it harder for users to select the desired patterns, and many uninteresting, false and invalid association patterns remain. As a result, the technology is difficult to advance to the application level.
Mining algorithms based on fixed item weights are not suitable for processing fully weighted data. At present, most such data is still processed with frequency-based mining methods, which produces a large number of redundant, invalid and uninteresting association patterns.



Examples


[0082] Example: a fully weighted data instance of Chinese text data is Text = (TR, IS, IW), where TR = {r1, r2, r3, r4, r5} is the set of 5 document records, IS = {i1, i2, i3, i4, i5} is the set of 5 feature word items, and IW is the set of <item, record, weight> triples:

IW = {<i1, r1, 0>, <i2, r1, 0.83>, <i3, r1, 0.81>, <i4, r1, 0>, <i5, r1, 0.01>,
      <i1, r2, 0>, <i2, r2, 0.94>, <i3, r2, 0.7>, <i4, r2, 0.23>, <i5, r2, 0>,
      <i1, r3, 0>, <i2, r3, 0.35>, <i3, r3, 0.5>, <i4, r3, 0.63>, <i5, r3, 0>,
      <i1, r4, 0.95>, <i2, r4, 0>, <i3, r4, 0.85>, <i4, r4, 0>, <i5, r4, 0>,
      <i1, r5, 0.73>, <i2, r5, 0.02>, <i3, r5, 0>, <i4, r5, 0.06>, <i5, r5, 0.9>}

The IW set can be represented as shown in Figure 1.

[0083] [Figure 1: image not reproduced]

[0084] Figure 1: Fully weighted data instance
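Since the image for Figure 1 is not available, the following minimal Python sketch shows one way the IW triples from [0082] could be held in memory and printed as a record-by-item table. The weights are transcribed from [0082]; the names `WEIGHTS_BY_RECORD`, `build_iw` and `format_table`, and the row/column layout, are illustrative assumptions and not part of the patent.

```python
# Weights transcribed from the IW set in [0082]; one row per document record,
# one column per feature word item. The layout itself is an assumption, since
# the original Figure 1 image is not available.
ITEMS = ["i1", "i2", "i3", "i4", "i5"]
RECORDS = ["r1", "r2", "r3", "r4", "r5"]
WEIGHTS_BY_RECORD = {
    "r1": [0, 0.83, 0.81, 0, 0.01],
    "r2": [0, 0.94, 0.7, 0.23, 0],
    "r3": [0, 0.35, 0.5, 0.63, 0],
    "r4": [0.95, 0, 0.85, 0, 0],
    "r5": [0.73, 0.02, 0, 0.06, 0.9],
}


def build_iw(weights_by_record):
    """Expand the rows into (item, record, weight) triples, as in [0082]."""
    return [
        (item, rec, weight)
        for rec in RECORDS
        for item, weight in zip(ITEMS, weights_by_record[rec])
    ]


def format_table(weights_by_record):
    """Render the weights as a plain-text record-by-item table."""
    header = "     " + " ".join(f"{item:>5}" for item in ITEMS)
    rows = [
        f"{rec:<5}" + " ".join(f"{w:>5.2f}" for w in weights_by_record[rec])
        for rec in RECORDS
    ]
    return "\n".join([header] + rows)


IW = build_iw(WEIGHTS_BY_RECORD)
print(format_table(WEIGHTS_BY_RECORD))
```

A weight of 0 in this layout corresponds to a triple such as <i1, r1, 0>, which the sketch keeps explicitly rather than dropping, so the 25 triples of [0082] are preserved one for one.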

[0085] Definition 2 (itemset weight and item weight): A fully weighted itemset I is a set of distinct items i1, i2, ..., ip, that is, I = (i1, i2, ..., ip) (1 ≤ p ≤ m). The itemset weight of I means that all items of itemset I appear in the same transa...
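Definition 2 is cut off in this listing, so the exact formula for the itemset weight is not available here. The Python sketch below only illustrates the part that is visible, namely that all items of I must appear in the same record. Two things are assumptions made for illustration: an item "appears" in a record when its weight is greater than 0, and the weights are aggregated by a plain sum. The names `cooccurrence_records` and `summed_weight` are hypothetical.

```python
# Item weights per record, transcribed from the example data in [0082].
ITEMS = ("i1", "i2", "i3", "i4", "i5")
W = {
    "r1": (0, 0.83, 0.81, 0, 0.01),
    "r2": (0, 0.94, 0.7, 0.23, 0),
    "r3": (0, 0.35, 0.5, 0.63, 0),
    "r4": (0.95, 0, 0.85, 0, 0),
    "r5": (0.73, 0.02, 0, 0.06, 0.9),
}


def cooccurrence_records(itemset):
    """Records in which every item of `itemset` has a nonzero weight.

    ASSUMPTION: 'appears in a record' is read as 'weight > 0'.
    """
    idx = [ITEMS.index(item) for item in itemset]
    return [rec for rec, row in W.items() if all(row[k] > 0 for k in idx)]


def summed_weight(itemset):
    """Sum the weights of the itemset's items over its co-occurrence records.

    ASSUMPTION: the plain sum is a stand-in; the source definition is
    truncated and may aggregate or normalize differently.
    """
    idx = [ITEMS.index(item) for item in itemset]
    return sum(W[rec][k] for rec in cooccurrence_records(itemset) for k in idx)


print(cooccurrence_records(("i2", "i3")))  # ['r1', 'r2', 'r3']
print(round(summed_weight(("i2", "i3")), 2))
```

For the itemset (i2, i3), records r4 and r5 are excluded because one of the two items has weight 0 there, which is the co-occurrence condition that the visible part of the definition describes.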


Abstract

Disclosed are a method and system for mining association rules between Chinese and English text words based on partially ordered itemsets. A text information preprocessing module performs preprocessing to establish a text information database and a feature word item base. A feature word frequent partially ordered itemset module mines feature word candidate itemsets and derives the partially ordered itemsets of the candidate itemsets; the candidate partially ordered itemsets are pruned by a new itemset pruning method, their weights are calculated, and their supports are calculated by a new calculation method so as to obtain the frequent partially ordered itemsets.

Description

Technical field

[0001] The invention belongs to the field of data mining. Specifically, it is a method, and a corresponding mining system, for mining association rules between Chinese and English text words based on partially ordered itemsets. It is suitable for discovering feature word association patterns in Chinese and English text mining, for query expansion in Chinese and English text information retrieval, for cross-language information retrieval of Chinese and English texts, and for other fields. Used in search engines (such as Baidu and Google), the method can obtain high-quality expansion words for user query expansion and improve recall and precision.

Background technique

[0002] Over the past 20 years, association rule mining research has achieved remarkable results, mainly in two areas: mining based on item frequency and mining based on item weight.

[0003] Mining based on item frequency is also called unweig...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/316, G06F16/334, G06F16/353, G06F40/216
Inventor: 黄名选
Owner: GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS