Hadoop-based big data association rule mining method

A technology for big data and large data sets, applied in the fields of electrical digital data processing, special data processing applications, and digital data information retrieval. It addresses the problems of long calculation times and the inability to store intermediate results, and avoids low efficiency and heavy memory and I/O consumption.

Pending Publication Date: 2019-11-22
XIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0003] With the rapid development of information technology, the volume of data that must be stored and analyzed is growing explosively, and we have entered the era of big data. Traditional association rule mining algorithms can no longer meet the requirements of big data mining. The ...



Example Embodiment

[0044] The present invention will be described in detail below with reference to the drawings and specific embodiments.

[0045] The specific operation process of the Hadoop-based big data association rule mining method of the present invention, as shown in Figure 1, includes the following steps:

[0046] Step 1. Input the big data set to be mined, and divide the big data set into blocks;

[0047] The specific process of step 1 is as follows: use HDFS, the Hadoop core component, to split the big data set into blocks. To ensure data integrity, the replication factor (number of copies) is set to 3.
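To make the effect of step 1 concrete, the following minimal sketch estimates how many blocks a file occupies and how much raw storage it consumes with 3 copies. The 128 MiB block size is an assumption (it is the default in Hadoop 2 and later; the source does not state a block size), and `hdfs_footprint` is a hypothetical helper, not part of any Hadoop API.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MiB); not stated in the source
REPLICATION = 3                 # number of copies, as stated in paragraph [0047]

def hdfs_footprint(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, total bytes stored across all replicas).

    The last block only occupies as much space as the data it holds,
    so the raw storage is simply file size times the number of copies.
    """
    blocks = math.ceil(file_size_bytes / block_size)
    return blocks, file_size_bytes * replication

# A hypothetical 1 GiB data set
blocks, stored = hdfs_footprint(1024 ** 3)
print(blocks, stored)
```

Each block can then be handed to a separate Map task, which is what allows the mining in step 2 to run in parallel.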

[0048] Step 2: Use a two-stage MapReduce process to complete the task of mining association rules in a big data set;

[0049] Step 2 includes the following process:

[0050] Step 2.1: use the Map function to generate local candidate frequent itemsets, then use the Reduce function to merge all local candidate frequent itemsets and eliminate those that do not meet the support requirement;
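The idea of step 2.1 can be illustrated with a small in-memory sketch in plain Python rather than actual Hadoop code. Several points are assumptions, because the source text is truncated after this step: the map side enumerates itemsets by brute force up to a hypothetical `max_size`, the support threshold is expressed as a ratio (`min_ratio`), and the elimination in the reduce side recounts each merged candidate over all blocks. The patent's own data structures, such as the cross-linked list mentioned in the abstract, are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def map_local_candidates(block, min_ratio, max_size=2):
    """Map: return the itemsets whose support within this block meets min_ratio."""
    counts = Counter()
    for transaction in block:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {s for s, n in counts.items() if n >= min_ratio * len(block)}

def reduce_merge_and_filter(local_sets, blocks, min_ratio):
    """Reduce: union the local candidates, then drop those below global support.

    Recounting over all blocks is an assumption of this sketch.
    """
    candidates = set().union(*local_sets)
    total = sum(len(b) for b in blocks)
    result = {}
    for cand in candidates:
        n = sum(1 for b in blocks for t in b if set(cand) <= set(t))
        if n >= min_ratio * total:
            result[cand] = n
    return result

# Two hypothetical blocks of transactions
blocks = [
    [["a", "b"], ["a", "c"], ["a", "b", "c"]],
    [["b", "c"], ["a", "b"], ["b"]],
]
local = [map_local_candidates(b, 0.5) for b in blocks]
frequent = reduce_merge_and_filter(local, blocks, 0.5)
print(frequent)
```

The sketch relies on a standard property of partition-based mining: an itemset that is frequent in the whole data set must be frequent in at least one block, so the union of local candidates cannot miss a globally frequent itemset.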

[0051]...


Abstract

The invention discloses a Hadoop-based big data association rule mining method comprising the following steps: first, input the big data set to be mined and partition it into blocks; then, use a two-stage MapReduce process to complete the association rule mining task on the big data set; finally, evaluate the frequent itemsets using the Kulczynski measure and the imbalance ratio, and eliminate the frequent itemsets that do not meet the thresholds for these two measures, so that the mined frequent patterns are positively correlated. The method avoids the heavy memory and I/O consumption of a one-stage MapReduce process and the low efficiency of a multi-stage MapReduce process. It reduces the number of candidate itemsets, obtains the support of candidate itemsets quickly by exploiting the properties of a cross-linked list, and does not need to scan the database multiple times. The positive correlation of the mined itemsets is ensured, and the method can be applied to practical decision-making.
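The two evaluation measures named in the abstract have standard textbook definitions, sketched below for a pair of itemsets A and B given their support counts. The threshold values and the direction of the filter in `passes_thresholds` (keep when the Kulczynski value is high enough and the imbalance ratio is low enough) are assumptions for illustration; the source does not give the thresholds it uses.

```python
def kulczynski(sup_a, sup_b, sup_ab):
    """Kulczynski measure: the average of P(B|A) and P(A|B)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)

def imbalance_ratio(sup_a, sup_b, sup_ab):
    """Imbalance ratio: |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A and B))."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)

def passes_thresholds(sup_a, sup_b, sup_ab, min_kulc=0.6, max_ir=0.5):
    """Hypothetical filter: the thresholds and their direction are assumed."""
    return (kulczynski(sup_a, sup_b, sup_ab) >= min_kulc
            and imbalance_ratio(sup_a, sup_b, sup_ab) <= max_ir)

# Balanced, strongly associated pair
print(kulczynski(1000, 1000, 800), imbalance_ratio(1000, 1000, 800))
# Skewed pair: B almost always implies A, but not the other way round
print(kulczynski(1000, 100, 100), imbalance_ratio(1000, 100, 100))
```

A Kulczynski value near 0.5 is neutral, so the imbalance ratio is reported alongside it to show whether the two conditional probabilities differ sharply.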

Description

Technical field

[0001] The invention belongs to the technical field of large-scale data mining, and in particular relates to a Hadoop-based method for mining association rules in big data.

Background technique

[0002] Traditional association rule mining algorithms fall mainly into three categories. The first is the "generate-and-test" method, which iteratively generates candidate frequent itemsets and counts each of them to obtain the frequent itemsets. The second is the "pattern growth" method, which does not need to generate candidate itemsets; instead, it compresses all frequent items into a special data structure (usually a tree) and generates frequent itemsets directly by traversing that structure. The third is the "vertical format" method, which converts the data set from horizontal format into vertical format and obtains frequent itemsets through intersection operations.

[0003] With the rapid development of ...
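As an illustration of the "vertical format" category described in paragraph [0002], the following minimal sketch converts a small horizontal transaction list into item-to-transaction-ID sets and computes support by set intersection. The function names and the sample data are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Convert horizontal data (one item list per transaction) to item -> set of transaction IDs."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)

def support(itemset, tidsets):
    """Support count of an itemset: the size of the intersection of its items' ID sets."""
    return len(set.intersection(*(tidsets[i] for i in itemset)))

transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
vertical = to_vertical(transactions)
print(vertical["a"], support(("a", "b"), vertical))
```

Once the data is in vertical form, counting the support of a larger itemset needs only set intersections rather than another scan of the transactions.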


Application Information

IPC(8): G06F16/2455, G06F16/2458, G06F17/16
CPC: G06F17/16, G06F16/24553, G06F16/2465
Inventors: 邢毓华, 李明星
Owner: XIAN UNIV OF TECH