A Method for Mining Association Rules of Large-Scale Data

A technology for mining association rules from large-scale data, applied in the fields of distributed computing and data mining. It addresses the long running time of data mining operations, improving mining efficiency, scaling well, and meeting user needs.

Status: Inactive. Publication date: 2016-04-20

UNIV OF ELECTRONICS SCI & TECH OF CHINA



Problems solved by technology

The Apriori algorithm generates a large number of candidate itemsets while it runs, which makes data mining operations slow. This is a major shortcoming of Apriori-based methods.
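To give a sense of scale for this problem, the following minimal sketch computes worst-case upper bounds on the number of candidate itemsets when every item is frequent. The function names are hypothetical and the bounds are plain combinatorics, not figures from the patent.

```python
from math import comb


def max_candidates(n_items: int, k: int) -> int:
    """Upper bound on candidate k-itemsets when all n_items items are frequent."""
    return comb(n_items, k)


def max_total_candidates(n_items: int) -> int:
    """Upper bound on candidates summed over all levels k = 1..n_items."""
    return sum(comb(n_items, k) for k in range(1, n_items + 1))


# Print the bound on candidate pairs for a few item counts.
for n in (5, 20, 100):
    print(n, "items ->", max_candidates(n, 2), "candidate pairs at most")
```

In practice Apriori prunes many of these candidates, but the pair level alone already grows quadratically with the number of frequent items.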


Examples


Example 1

[0030] The input data table shown in Table 1 has 9 records (T1, T2, ..., T9), each containing a subset of the items I1, I2, I3, I4, I5:

[0031] Table 1: Record table

[0032]

Record number    Collection of items
T1               I1, I2, I5
T2               I2, I4
T3               I2, I3
T4               I1, I2, I4
T5               I1, I3
T6               I2, I3
T7               I1, I3
T8               I1, I2, I3, I5
T9               I1, I2, I3

[0033] To simplify the calculation of the similarity between items in the data, the input data table is converted into a 0/1 state table, as shown in Table 2. A 0 means that the item does not appear in the corresponding record, and a 1 means that it does:

[0034] Table 2: 0/1 state table

[0035]

      I1   I2   I3   I4   I5
T1    1    1    0    0    1
T2    0    1    0    1    0
T3    0    1    1    0    0
T4    1    1    0    ...
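The conversion from Table 1 to a 0/1 state table can be sketched as follows. This is an illustrative sketch only: the names `ITEMS`, `RECORDS` and `to_state_table` are hypothetical, and the data is copied from Table 1.

```python
# Item order used for the columns of the state table.
ITEMS = ["I1", "I2", "I3", "I4", "I5"]

# Records copied from Table 1.
RECORDS = {
    "T1": {"I1", "I2", "I5"},
    "T2": {"I2", "I4"},
    "T3": {"I2", "I3"},
    "T4": {"I1", "I2", "I4"},
    "T5": {"I1", "I3"},
    "T6": {"I2", "I3"},
    "T7": {"I1", "I3"},
    "T8": {"I1", "I2", "I3", "I5"},
    "T9": {"I1", "I2", "I3"},
}


def to_state_table(records, items):
    """Map each record to a 0/1 row: 1 if the item occurs in the record, else 0."""
    return {
        record_id: [1 if item in present else 0 for item in items]
        for record_id, present in records.items()
    }


state = to_state_table(RECORDS, ITEMS)
for record_id, row in state.items():
    print(record_id, *row)
```

Each row is a fixed-length vector, so any vector-based similarity measure can be applied to pairs of rows.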

Example 2

[0069] Taking frequent itemset mining for a category (T2, T8) as an example, the default minimum support is 0.22.

[0070] The 0,1 state tables of records T2 and T8 are shown in Table 5:

[0071] Table 5: 0/1 state table

[0072]-[0073]

      I1   I2   I3   I4   I5
T2    1    1    0    0    1
T8    0    1    0    1    0

[0074] In the first scan, each item contained in this category (I1, I2, I4, I5) is treated as a candidate itemset on its own. The corresponding supports, all greater than the minimum support of 0.22, are shown in Table 6:

[0075] Table 6: Support after the first scan

[0076]

Item    Support
I1      50%
I2      100%
I4      50%
I5      50%

[0077] The frequent 1-itemsets generated by the first scan are: I1, I2, I4, I5
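The first scan can be sketched as a simple counting pass. This is a minimal sketch under stated assumptions: the two records are entered as the rows of Table 5 are read here (1 1 0 0 1 and 0 1 0 1 0, the reading consistent with Table 6), the names `category`, `MIN_SUPPORT` and `first_scan` are hypothetical, and the threshold test uses `>=`, which is the common convention rather than something the source states.

```python
from collections import Counter

# The two records of the category, as encoded by the rows of Table 5 (assumed reading).
category = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
]
MIN_SUPPORT = 0.22


def first_scan(records, min_support):
    """Return {item: support} for every single item meeting the minimum support."""
    counts = Counter(item for record in records for item in record)
    total = len(records)
    return {
        item: count / total
        for item, count in counts.items()
        if count / total >= min_support  # assumption: inclusive threshold
    }


frequent_1 = first_scan(category, MIN_SUPPORT)
for item in sorted(frequent_1):
    print(item, f"{frequent_1[item]:.0%}")
```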

[0078] In the second scan, the candidate 2-itemsets ({I1, I2}, {I1, I4}, {I1, I5}, {I2, I4}, {I2, I5}, {I4, I5}) are generated from the frequent 1-itemsets, and the corresp...
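Generating the candidate 2-itemsets from the frequent 1-itemsets amounts to enumerating all pairs. The sketch below is illustrative: `candidate_pairs` and `support` are hypothetical helper names, and the category data is again the assumed reading of Table 5 rather than a result quoted from the source.

```python
from itertools import combinations

# Frequent 1-itemsets from the first scan.
frequent_items = ["I1", "I2", "I4", "I5"]

# The two records of the category (assumed reading of Table 5).
category = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
]


def candidate_pairs(items):
    """Enumerate every 2-item combination of the frequent items."""
    return [frozenset(pair) for pair in combinations(sorted(items), 2)]


def support(itemset, records):
    """Fraction of records that contain every item of the itemset."""
    return sum(1 for record in records if itemset <= record) / len(records)


candidates = candidate_pairs(frequent_items)
for candidate in candidates:
    print(sorted(candidate), support(candidate, category))
```

Candidates whose support falls below the minimum support would be dropped before the next scan.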


Abstract

The present invention provides a method for mining association rules from large-scale data, comprising the following steps: 1) performing similarity-based classification preprocessing on the input data, so that records in the same category are highly similar; 2) mining the data in each category with the Apriori algorithm to obtain the frequent itemsets of each category; 3) merging the frequent itemsets of all categories, and taking the association rules derived from these frequent itemsets whose confidence exceeds the minimum confidence as strong association rules. The invention reduces unnecessary candidate itemsets with low correlation, thereby improving the efficiency of mining association rules over the whole data set, and it scales well.
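The three steps of the abstract can be sketched end to end as follows. This is a rough sketch, not the patented procedure: the visible text does not specify the similarity measure, the grouping strategy, or how confidence is computed after merging, so Jaccard similarity, greedy grouping against the first record of each group, a `threshold` parameter, and confidence measured over the full data set are all assumptions. All function names are hypothetical; the data is from Table 1.

```python
from itertools import combinations

RECORDS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def jaccard(a, b):
    """Jaccard similarity of two records (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify(records, threshold):
    """Step 1 (assumed strategy): greedily group records similar to a group's first record."""
    groups = []
    for record in records:
        for group in groups:
            if jaccard(record, group[0]) >= threshold:
                group.append(record)
                break
        else:
            groups.append([record])
    return groups


def support(itemset, records):
    """Fraction of records that contain every item of the itemset."""
    return sum(1 for record in records if itemset <= record) / len(records)


def apriori(records, min_support):
    """Step 2: level-wise Apriori on one category; returns all frequent itemsets."""
    items = sorted({item for record in records for item in record})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), records) >= min_support]
    frequent = set(level)
    k = 2
    while level:
        previous = set(level)
        # Join step: unions of two (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c, records) >= min_support]
        frequent.update(level)
        k += 1
    return frequent


def strong_rules(frequent, records, min_confidence):
    """Step 3: keep rules lhs -> rhs whose confidence meets the minimum confidence."""
    rules = []
    for itemset in frequent:
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                lhs_support = support(lhs, records)
                if lhs_support == 0:
                    continue
                confidence = support(itemset, records) / lhs_support
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


def mine(records, threshold, min_support, min_confidence):
    """Classify, mine each category, merge the frequent itemsets, then derive rules."""
    merged = set()
    for group in classify(records, threshold):
        merged |= apriori(group, min_support)
    return strong_rules(merged, records, min_confidence)


for lhs, rhs, confidence in mine(RECORDS, 0.3, 0.22, 0.7):
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```

Because each category is mined independently, the per-category calls to `apriori` could in principle run on separate nodes, which fits the distributed setting the description refers to.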

Description

Technical field

[0001] The invention relates to distributed computing and data mining technology.

Background art

[0002] Research on massive data management is not a new topic, but the definition of "massive" keeps changing with the rapid development of storage devices.

[0003] For large-scale data, database management systems index the data by means such as hashing and B+ trees, which effectively reduces the cost of reading and writing external storage and improves query efficiency. To process even larger volumes of data, the Parallel Database System (PDBS) and the Distributed Database System (DDBS) emerged one after another. In both, multiple data processing nodes are connected through a network to form a whole that can process massive data effectively.

[0004] Association rules were proposed by Agrawal et al. in the literature in 199...


Application Information


Patent Type & Authority: Patents (China)

IPC: IPC(8): G06F17/30

Inventor: 罗光春, 田玲, 秦科, 陈爱国, 段贵多

Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA
