
A Spark-based Apriori parallelization method, system and device

A frequent-set and transaction-database technology, applied in data mining, instrumentation, computing, etc. It addresses slow frequent-set generation, inflexible pruning, and increased network overhead, and thereby improves the speed and efficiency of candidate-set generation, speeds up frequent-set generation, and reduces network overhead.

Active Publication Date: 2021-06-01
SOUTH CHINA NORMAL UNIVERSITY +1
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The YAFIM (Yet Another Frequent Itemset Mining) algorithm parallelizes the Apriori association algorithm on the Spark computing framework and uses a hash tree to screen candidate sets and generate frequent sets. In the merging step it still uses the original local generation method, which is slow and inefficient; in the pruning step it broadcasts the transaction database and uses the hash tree to filter candidate sets and output frequent sets, so frequent sets are generated slowly.
The R-Apriori algorithm is an optimization of the YAFIM algorithm. It differs from YAFIM in using a Bloom filter instead of the hash tree to speed up frequent-set generation, but it offers only a single, inflexible way of generating frequent sets: when faced with frequent sets of different dimensions, network overhead increases and efficiency is very low.
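
To make the Bloom-filter idea concrete, here is a minimal single-machine sketch in plain Python. It is not R-Apriori itself and not Spark code: the class `BloomFilter`, the function `count_candidates`, and the sizes `num_bits` and `num_hashes` are hypothetical names and values chosen for illustration. Candidates are assumed to be sorted tuples of string items. Because a Bloom filter can report false positives, the sketch confirms the surviving itemsets against the exact candidate set before returning counts.

```python
import hashlib
from collections import Counter
from itertools import combinations


class BloomFilter:
    """Tiny Bloom filter over itemsets (illustrative sizes, not tuned)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, itemset):
        # Canonical key: the sorted items joined with a separator character.
        key = "\x1f".join(sorted(itemset)).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, itemset):
        for pos in self._positions(itemset):
            self.bits[pos] = 1

    def might_contain(self, itemset):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(itemset))


def count_candidates(transactions, candidates, k):
    """Count the support of candidate k-itemsets, screening with a Bloom filter."""
    bloom = BloomFilter()
    for cand in candidates:
        bloom.add(cand)
    counts = Counter()
    for transaction in transactions:
        # Enumerate the k-item subsets of the transaction as sorted tuples.
        for subset in combinations(sorted(transaction), k):
            if bloom.might_contain(subset):
                counts[subset] += 1
    exact = set(candidates)  # discard any Bloom-filter false positives
    return {c: n for c, n in counts.items() if c in exact}


transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
candidates = [("a", "b"), ("b", "c")]
support = count_candidates(transactions, candidates, k=2)
```

The filter answers membership queries in constant time and fixed memory, which is the property that makes it attractive as a replacement for a hash tree; the cost is the extra exact check needed to rule out false positives.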



Examples


Embodiment 1

[0096]The existing parallel Apriori methods perform the merging operation locally. Because the resources of a single machine are limited, candidate sets are generated slowly and inefficiently. In addition, the existing pruning operation only broadcasts the transaction database, so when the transaction database is large, network overhead increases significantly and the speed of frequent-set generation drops significantly. The present invention proposes a Spark-based Apriori parallelization method, system and device that overcomes these disadvantages of the prior art, improves operating speed and efficiency, and reduces network overhead.
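
The trade-off described in this paragraph can be illustrated with a small plain-Python simulation. It is only a sketch under stated assumptions: partitions are simulated with list slices rather than a Spark cluster, and the `broadcast` switch, the helper `split`, and the `shipped` counter are hypothetical. This excerpt does not spell out which options the invention's pruning configuration actually exposes, so the two modes below are one plausible reading, not the patented method.

```python
from collections import Counter


def split(seq, n):
    """Split a list into n slices, standing in for cluster partitions."""
    return [seq[i::n] for i in range(n)]


def count_support(transactions, candidates, broadcast="candidates", partitions=2):
    """Count candidate support; return (counts, number of records broadcast)."""
    transactions = [frozenset(t) for t in transactions]
    candidates = [frozenset(c) for c in candidates]
    totals = Counter()
    if broadcast == "candidates":
        # Transactions are partitioned; every partition gets all candidates.
        for part in split(transactions, partitions):
            for t in part:
                for c in candidates:
                    if c <= t:
                        totals[c] += 1
        shipped = len(candidates) * partitions
    elif broadcast == "transactions":
        # Candidates are partitioned; every partition gets all transactions.
        for part in split(candidates, partitions):
            for c in part:
                totals[c] += sum(1 for t in transactions if c <= t)
        shipped = len(transactions) * partitions
    else:
        raise ValueError("broadcast must be 'candidates' or 'transactions'")
    # Drop zero counts so both modes return the same dictionary.
    return {c: n for c, n in totals.items() if n > 0}, shipped


txns = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}]
cands = [("a", "b"), ("b", "c"), ("a", "d")]
counts_c, shipped_c = count_support(txns, cands, broadcast="candidates")
counts_t, shipped_t = count_support(txns, cands, broadcast="transactions")
```

Both modes produce the same support counts; what changes is how many records are copied to every partition. When the transaction database is much larger than the candidate set, broadcasting the transactions is the more expensive choice, which is the overhead the paragraph attributes to the existing pruning operation.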

[0097]The Spark-based Apriori parallelization method of the present invention is described in detail below, first by explaining the terms and then by describing the specific implementation.

[0098](1) Explanation of terms

[0099]The terms specific to the present invention are as follows:

[0100]Spark computing framework: Spark is a framework f...


Abstract

The invention discloses a Spark-based Apriori parallelization method, system and device. The method includes: obtaining a transaction database and generating candidate sets and frequent sets according to the transaction database; distributing the candidate sets and frequent sets to a cluster for merging operations and configurable pruning operations; and generating, according to the results of the merging and pruning operations, a frequent set that satisfies the minimum support. The system includes a pre-acquisition module, a processing module and a generation module. The device includes a memory and a processor. By distributing frequent sets to the cluster for merging operations, the invention improves the speed and efficiency of candidate-set generation; by adopting configurable pruning operations, it improves the speed of frequent-set generation and reduces network overhead. As a Spark-based Apriori parallelization method, system and device, the present invention can be widely used in the field of data mining.
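
The merge, prune, and minimum-support steps named in the abstract follow the classic level-wise Apriori pattern, which can be sketched on a single machine as below. This is textbook Apriori in plain Python, not the distributed method the invention claims; the function names `merge_step`, `prune_step`, `support_counts`, and `apriori` are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from itertools import combinations


def merge_step(frequent_prev, k):
    """Join frequent (k-1)-itemsets that share their first k-2 items."""
    prev = sorted(frequent_prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:k - 2] == b[:k - 2]:
                candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates


def prune_step(candidates, frequent_prev, k):
    """Keep candidates whose (k-1)-item subsets are all frequent."""
    return {
        c for c in candidates
        if all(sub in frequent_prev for sub in combinations(c, k - 1))
    }


def support_counts(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    return {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    counts = support_counts(transactions, [(i,) for i in items])
    current = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(current)
    k = 2
    while current:
        frequent_prev = set(current)
        candidates = prune_step(merge_step(frequent_prev, k), frequent_prev, k)
        counts = support_counts(transactions, candidates)
        current = {c: n for c, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent = apriori(baskets, min_support=2)
```

In this toy data set every single item and every pair is frequent, while the triple (bread, butter, milk) appears in only one basket and is dropped. In a cluster setting, the nested join in `merge_step` and the scan in `support_counts` are the parts that would be distributed.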

Description

Technical field

[0001]The present invention relates to the field of data mining, in particular to a Spark-based Apriori parallelization method, system and device.

Background technique

[0002]Existing parallelizations of the Apriori algorithm on the Spark computing framework include the YAFIM algorithm and the R-Apriori algorithm. The YAFIM algorithm parallelizes the association algorithm Apriori on the Spark computing framework and uses a hash tree to screen candidate sets and generate frequent sets. In the merging step it uses the original local generation method, which is slow and inefficient; in the pruning step it broadcasts the transaction database and uses the hash tree to screen candidate sets and output frequent sets, which is slow. The R-Apriori algorithm is an optimization of the YAFIM algorithm. It differs from YAFIM in using a Bloom filter data structure in place of the hash tree, which improves the speed of generating frequent sets, but this ge...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/2458
CPC: G06F2216/03, G06F16/2465
Inventor: 赵淦森, 张海明, 王欣明, 庄序填, 林成创, 蔡斯凯, 李振宇, 李胜龙, 唐华, 张奇支
Owner: SOUTH CHINA NORMAL UNIVERSITY