Method and device for realizing association rule mining algorithm supporting distributed computation

A distributed computing and mining algorithm technology applied in the computer field. It addresses slow computation and incomplete rule mining results, and achieves faster computation.

Active Publication Date: 2013-02-27
杭州斯凯网络科技有限公司

AI Technical Summary

Problems solved by technology

[0011] The present invention aims to solve the problems that existing technology cannot handle massive data mining, computes very slowly, and produces rule mining results that are not comprehensive enough. It combines the PA association algorithm with the Hadoop distributed computing framework to provide a method and device for implementing an association rule mining algorithm supporting distributed computing. The method can handle massive data mining, computes quickly, and can mine association rule results for business support from massive business data more comprehensively and efficiently.

Examples

Embodiment

[0035] Embodiment: this embodiment provides a method for implementing an association rule mining algorithm that supports distributed computing. As shown in Figure 1, the MapReduce programming model of the Hadoop distributed system is used to decompose the association rule mining algorithm PA into two stages, a map function stage (9) and a reduce function stage (10). The decomposition steps are as follows:

[0036] Step 1: Configure the job scheduler Recomjob (1);

[0037] Step 2: Use the prior probability mapping module PriorMap (3) to read the data set (2), and convert the data rows of the data set into key-value pairs through the map function;

[0038] Step 3: Use the prior probability reduction module PriorReduce (4) to read the key-value pairs processed in Step 2, use the reduce function to randomly generate the sorting rule TopN (8) containing the i-item set, and calculate the prior probability distribution value (5) of the confidence;

[0039] Step 4: Use the rule mapping module ParMap (6) to rea...
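The map and reduce stages in Steps 2 and 3 can be illustrated with a small in-memory sketch. This is not the patent's implementation and does not use Hadoop. The function names (`map_row`, `shuffle`, `reduce_counts`, `confidence`), the choice of itemsets as keys and `1` as the value, and the sample rows are all assumptions made for illustration, since the text does not specify what the key-value pairs contain.

```python
from collections import defaultdict
from itertools import combinations


def map_row(row, max_size=2):
    """Map phase (assumed form): emit (itemset, 1) for each itemset in a row."""
    items = sorted(set(row.split()))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce phase: sum the values of each key to get itemset counts."""
    return {key: sum(values) for key, values in grouped.items()}


def confidence(counts, body, head):
    """Confidence of body -> head: count(body union head) / count(body)."""
    joint = tuple(sorted(set(body) | set(head)))
    return counts.get(joint, 0) / counts[tuple(sorted(body))]


# Hypothetical transaction rows, one per line of the data set.
rows = ["bread milk", "bread butter", "bread milk butter", "milk"]
pairs = [pair for row in rows for pair in map_row(row)]
counts = reduce_counts(shuffle(pairs))
bread_to_milk = confidence(counts, ("bread",), ("milk",))
```

In a real Hadoop job the shuffle is performed by the framework, and each reducer sees only the keys assigned to it. The sketch keeps everything in one process so the data flow is easy to follow.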

Abstract

The invention discloses a method and a device for implementing an association rule mining algorithm supporting distributed computation. The MapReduce programming model of Hadoop is used to decompose the association rule mining algorithm into two stages, a map function stage and a reduce function stage. The decomposition comprises the following steps: step 1, a job scheduler is configured; step 2, a prior probability mapping module reads a data set, and a map function converts the data rows of the data set into key-value pairs; step 3, a prior probability reduction module reads the key-value pairs processed in step 2, a reduce function randomly generates an ordering rule Top N containing an i-item set, and the prior probability distribution value of the confidence is calculated at the same time; step 4, a rule mapping module reads the same data set, and the map function converts the data rows of the data set into key-value pairs; and step 5, a rule reduction module reads the key-value pairs processed in step 4 and the prior probability distribution value from step 3, and the reduce function calculates the prediction accuracy value of the ordering rule Top N. The method and the device are mainly applied to PA (Predictive Apriori) distributed computing technology.
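Steps 3 and 5 of the abstract can be sketched as follows. The text here does not give a formula for the prediction accuracy. The sketch assumes the expected-accuracy formula commonly associated with Predictive Apriori: a posterior mean of the rule's accuracy, with a binomial likelihood of the observed counts weighted by a prior estimated from the confidences of sampled rules. The function names, the 10-bin discretization, and the sample numbers are hypothetical.

```python
from math import comb


def binomial(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prior_from_confidences(confidences, bins=10):
    """Step 3 analogue: normalized histogram of sampled rule confidences."""
    hist = [0] * bins
    for c in confidences:
        hist[min(int(c * bins), bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist]


def predictive_accuracy(body_count, joint_count, prior):
    """Step 5 analogue: posterior expected accuracy of a rule.

    body_count is the number of rows containing the rule body,
    joint_count is the number of rows containing body and head.
    """
    bins = len(prior)
    centers = [(i + 0.5) / bins for i in range(bins)]
    weights = [
        binomial(joint_count, body_count, c) * p
        for c, p in zip(centers, prior)
    ]
    norm = sum(weights)
    if norm == 0:
        return 0.0
    return sum(c * w for c, w in zip(centers, weights)) / norm


# Hypothetical inputs: a flat prior, and two rules with confidence 1.0
# that differ only in how many rows support them.
uniform_prior = [0.1] * 10
sample_prior = prior_from_confidences([0.12, 0.17, 0.93, 1.0])
pa_small = predictive_accuracy(1, 1, uniform_prior)
pa_large = predictive_accuracy(100, 100, uniform_prior)
```

Under these assumptions, a rule observed once with confidence 1.0 gets a lower score than a rule observed 100 times with the same confidence, which is the intended trade-off between support and confidence. In the distributed setting described above, the prior would be produced by the first reduce stage and passed to the second.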

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for implementing an association rule mining algorithm supporting distributed computing.

Background

[0002] With the advent of the "big data" era, the amount of enterprise business data has increased sharply. Data analysts are trying various data analysis and data mining methods, aiming to discover potential user behavior patterns with business value from massive data. Data mining is a technique for finding patterns in large amounts of data by analyzing each piece of data. In addition, "big data", "massive data", and "data sets" as mentioned in the present invention have the same meaning.

[0003] Association rule mining is a widely used and influential data mining method, which can be used in various recommendation systems to recommend items of interest to users. Currently, the various versions of association rule mining algorithms that can...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 杨进张金伟
Owner: 杭州斯凯网络科技有限公司