
MapReduce-based FP-Growth load balance parallel computing method

A parallel computing and load balancing technology, applied in computing, special data processing applications, instruments, etc. It addresses problems such as the inability of existing approaches to store and mine massive data and their lack of load balancing considerations, and achieves good load balancing capability and execution efficiency.

Inactive Publication Date: 2015-06-24
JIANGSU CAS JUNSHINE TECH
3 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0004] With the progress of society and the development of science and technology, data is growing explosively. A single-machine FP-Growth algorithm for mining association rules falls far short of what is needed to store and mine massive data. Some existing parallel FP-Growth algorithms solve the two problems of database partitioning and subsequent parallel computation, but they show obvious differences and deficiencies in parallel computing efficiency, memory consumption, communication cost, and the performance variation caused by differences in FP-Tree sparsity. These shortcomings are largely attributable to the lack of load balancing considerations when partitioning the database transaction set.



Examples


Embodiment Construction

[0020] The present invention will be further described below in conjunction with specific drawings and embodiments.

[0021] As shown in Figure 1, to achieve better load balancing capability and execution efficiency, the load balancing parallel computing method of the present invention includes the following steps:

[0022] Step 1. Input the database transaction set D and the minimum support count, divide the database transaction set D into consecutive, distinct partitions, and store the resulting sub-transaction sets on multiple nodes;

[0023] The database transaction set D is divided into several consecutive parts that are stored on different computing nodes. Each resulting sub-transaction set is called a data shard. This process is handled directly by Hadoop: users only need to copy the database transaction set to HDFS, and the Hadoop framework divides the input file into multiple data shards (Block). Stored on unconnecte...
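The idea of consecutive partitions in Step 1 can be illustrated with a minimal sketch. In the method itself the splitting is done by Hadoop on HDFS blocks, so the function below is only a stand-in for illustration: the name `split_contiguous` and the count-based (rather than byte-based) split are assumptions, not part of the source.

```python
def split_contiguous(transactions, num_shards):
    """Split a list of transactions into contiguous, near-equal slices.

    Illustrative stand-in only: Hadoop splits input files by block size,
    not by transaction count.
    """
    base, extra = divmod(len(transactions), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        # The first `extra` shards take one additional transaction.
        size = base + (1 if i < extra else 0)
        shards.append(transactions[start:start + size])
        start += size
    return shards


if __name__ == "__main__":
    for shard in split_contiguous(list(range(7)), 3):
        print(shard)
```

Each slice keeps the original transaction order, which mirrors the "consecutive parts" wording of the description.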



Abstract

The invention relates to a MapReduce-based FP-Growth load balancing parallel computing method. The method comprises the following steps: 1, a database transaction set D is divided into different consecutive partitions, and the sub-transaction sets are stored on multiple nodes; 2, support counts are computed in parallel to obtain the list of all frequent 1-itemsets, FList; 3, the items of the frequent 1-itemsets are divided into M groups according to a load balancing method to obtain a new list, GList; 4, the database transaction set D is likewise divided into M groups according to GList, a local FP-Tree is created for each transaction set DB once the division of D is finished, and the corresponding GList[gid_i] is mined on each local FP-Tree to obtain the frequent patterns of all the items in the frequent 1-itemsets; 5, the frequent patterns of all the items in the frequent 1-itemsets obtained on each node are aggregated and output. The MapReduce-based FP-Growth load balancing parallel computing method has good load balancing capability and execution efficiency.
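Steps 2 to 5 of the abstract can be sketched on a single machine as follows. This is a hedged illustration, not the patented method. The load estimate (an item's position in FList) and the greedy "lightest group first" assignment are assumptions, since the visible text does not specify the balancing rule. The per-group mining is a brute-force stand-in for building and mining a local FP-Tree. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import heapq


def build_flist(transactions, min_count):
    """Step 2: count supports; keep frequent items, most frequent first."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(i, c) for i, c in counts.items() if c >= min_count]
    return sorted(frequent, key=lambda p: (-p[1], p[0]))


def balanced_groups(flist, m):
    """Step 3 (assumed heuristic): give each item to the lightest group.

    Assumed load of an item = its 1-based position in FList.
    """
    heap = [(0, g) for g in range(m)]
    heapq.heapify(heap)
    glist = {}
    for rank in range(len(flist) - 1, -1, -1):  # heaviest items first
        load, g = heapq.heappop(heap)
        glist[flist[rank][0]] = g
        heapq.heappush(heap, (load + rank + 1, g))
    return glist


def shard_by_group(transactions, flist, glist):
    """Step 4a: send each group the longest prefix ending in one of its items."""
    order = {item: r for r, (item, _) in enumerate(flist)}
    shards = defaultdict(list)
    for t in transactions:
        items = sorted((i for i in set(t) if i in order), key=order.get)
        seen = set()
        for j in range(len(items) - 1, -1, -1):
            g = glist[items[j]]
            if g not in seen:
                seen.add(g)
                shards[g].append(items[: j + 1])
    return shards


def mine_group(prefixes, gid, glist, min_count):
    """Step 4b: brute-force stand-in for mining a local FP-Tree.

    Counts only patterns whose last item (in FList order) belongs to gid.
    """
    counts = Counter()
    for p in prefixes:
        for j, last in enumerate(p):
            if glist[last] != gid:
                continue
            for k in range(j + 1):
                for head in combinations(p[:j], k):
                    counts[head + (last,)] += 1
    return {pat: c for pat, c in counts.items() if c >= min_count}


def parallel_fp_sketch(transactions, min_count, m):
    """Steps 2-5 run sequentially; on a cluster each group would be a reducer."""
    flist = build_flist(transactions, min_count)
    glist = balanced_groups(flist, m)
    shards = shard_by_group(transactions, flist, glist)
    result = {}
    for gid, prefixes in shards.items():  # Step 5: aggregate group outputs
        result.update(mine_group(prefixes, gid, glist, min_count))
    return result


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
    for pattern, count in sorted(parallel_fp_sketch(db, 2, 2).items()):
        print(pattern, count)
```

Because every pattern is counted only in the group that owns its last item, the groups produce disjoint outputs and the final aggregation is a plain union. The brute-force enumeration is exponential in transaction length and is only suitable for toy inputs.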

Description

Technical field

[0001] The invention relates to a load balancing parallel computing method, in particular to a MapReduce-based FP-Growth load balancing parallel computing method, and belongs to the technical field of data mining.

Background technique

[0002] Association rule mining reflects the interdependence and correlation between one thing and other things, and is an important topic in data mining. Association rule mining proceeds in two steps: the generation of frequent itemsets and the generation of association rules. The overall performance of association rule mining is mainly determined by the first step. Classic association rule mining algorithms include the Apriori algorithm, the FP-Growth algorithm and the Eclat algorithm. The first two mine data in a horizontal format, while the last mines data in a vertical format. Compared with the Apriori algorithm, the FP-Growth algorithm uses a divide-and-conquer strategy to mine t...
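The second of the two steps named in the background, generating association rules from frequent itemsets, can be sketched as below. This is a minimal illustration under stated assumptions: the confidence threshold `min_conf` and the function name `generate_rules` are not from the source, and the input is assumed to contain the support count of every subset of each frequent itemset (which holds for the complete output of a frequent itemset miner).

```python
from itertools import combinations


def generate_rules(supports, min_conf):
    """Derive rules lhs -> rhs with confidence = support(itemset) / support(lhs).

    `supports` maps frozenset itemsets to their support counts.
    """
    rules = []
    for itemset, count in supports.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), k):
                lhs = frozenset(lhs_items)
                conf = count / supports[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


if __name__ == "__main__":
    supports = {
        frozenset("a"): 4,
        frozenset("b"): 4,
        frozenset("ab"): 3,
    }
    for lhs, rhs, conf in generate_rules(supports, 0.7):
        print(sorted(lhs), "->", sorted(rhs), conf)
```

This step only reads support counts that the first step has already computed, which is consistent with the statement that overall performance is dominated by frequent itemset generation.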


Application Information

IPC(8): G06F17/30
CPC: G06F16/176, G06F16/182
Inventor: 杨勇, 陈曙东
Owner: JIANGSU CAS JUNSHINE TECH