A frequent item set mining method based on a Spark platform

A frequent itemset mining technology, applied in the fields of instruments, file system types, computing, etc. It addresses the problems of long running time, large memory consumption, and excessive candidate item sets, with the effects of accelerating mining speed, keeping results accurate, and improving execution efficiency.

Active Publication Date: 2019-05-21
KUNMING UNIV OF SCI & TECH
7 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

With the advent of the big data era, stand-alone algorithms can no longer meet the time and space requirements of data processing. Distributed algorithms based on the Hadoop and Spark cloud computing platforms continue to appear, but they still cannot break away from the Apriori and FP-Growth algorithms. The Apriori algorithm scans the database multiple times and generates too many candidate item sets; when the amount of data is too large, the FP-Growth algorithm also spends a lot of time recursively extracting the conditional pattern tree.
For this reason, many algorithms ported to the Hadoop platform continue to emerge. Compared with Spark, however, Hadoop has an I/O bottleneck, and frequent reading and writing of data sets consumes a lot of unnecessary time.
In summary, whether an algorithm is based on Apriori, based on FP-Growth, or ported to the Hadoop platform, when faced with massive data it suffers from excessively long frequent itemset mining time and excessive memory consumption, and cannot meet the performance requirements of fast and efficient mining.
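
The Apriori drawback described above (one pass over the data per itemset length, plus many candidates) can be illustrated with a minimal single-machine sketch. This is a textbook level-wise Apriori written for illustration, not code from the patent; the function name `apriori`, the toy transactions, and the counters `scans` and `candidates_generated` are assumptions introduced here.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori. Returns (frequent itemsets with counts,
    number of support-counting passes, number of candidates generated)."""
    transactions = [frozenset(t) for t in transactions]
    scans = 0
    candidates_generated = 0
    # Level-1 candidates: every distinct item.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        candidates_generated += len(candidates)
        scans += 1  # one full pass over the data for each itemset length
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent, scans, candidates_generated

# Toy data (hypothetical): five transactions over items a, b, c.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent, scans, candidates_generated = apriori(toy, min_support=3)
```

Even on this toy input the data is traversed once per level, and the candidate `{a, b, c}` is generated and counted although it turns out to be infrequent; on massive data both costs grow quickly.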

Examples

Embodiment Construction

[0056] In order to make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0057] As shown in Figure 1, Spark partitions the massive data and assigns the processing tasks for that data to the Worker nodes. Each Worker node starts a corresponding number of Executor processes to execute the tasks, and the computation results of all Worker nodes are finally integrated to obtain the final result.
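
The partition, execute, and integrate flow in paragraph [0057] can be sketched in plain Python. This is only an analogy under stated assumptions: it does not use the Spark API, a `ThreadPoolExecutor` stands in for the Executor processes, and the names `partition`, `count_items`, and `run_job` are hypothetical helpers introduced for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(data, num_partitions):
    """Split the data into num_partitions slices (round-robin)."""
    return [data[i::num_partitions] for i in range(num_partitions)]

def count_items(part):
    """Task executed on one partition: count the transactions containing each item."""
    local = Counter()
    for transaction in part:
        local.update(set(transaction))
    return local

def run_job(data, num_partitions=3):
    """Run one task per partition, then integrate the partial results."""
    parts = partition(data, num_partitions)
    # The thread pool is a stand-in for executors on worker nodes.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_items, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total

# Toy data (hypothetical): five transactions.
data = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["a"], ["b"]]
totals = run_job(data)
```

Because each task only sees its own partition and the partial counts are additive, the merged totals match what a single pass over the whole data set would produce.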

[0058] As shown in Figure 2, for a Spark-based frequent itemset mining method,...

Abstract

The invention relates to a frequent item set mining method based on a Spark platform, and belongs to the technical field of data mining. Based on the Spark big data processing framework, the invention provides a novel BitMapFIM-Miner algorithm. The data set is processed in parallel and does not need to be scanned many times. Following a divide-and-conquer approach, transactions whose length exceeds a certain threshold are segmented, bit operations are then used to compute and generate frequent item sets, and finally the frequent item sets obtained from all parts are summarized and combined. Using bit operations accelerates the mining of frequent item sets and greatly improves the execution efficiency of the algorithm. Theoretical analysis and experimental verification show that segmenting overlong transactions allows frequent item sets to be obtained efficiently while the accuracy of the result is guaranteed, which provides a new idea for frequent item set mining.
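
The abstract does not spell out the BitMapFIM-Miner steps, so the following is not the patented algorithm. It is a minimal sketch of the general idea of computing supports with bit operations, using the well-known vertical bitmap representation: each item gets an integer whose bit i is set when transaction i contains the item, and the support of an itemset is the number of set bits in the AND of its items' bitmaps. The segmentation of long transactions and the Spark parallelism are not modeled, and the names `build_bitmaps` and `mine` are assumptions introduced here.

```python
def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def popcount(bits):
    """Number of set bits, i.e. the number of supporting transactions."""
    return bin(bits).count("1")

def mine(transactions, min_support):
    """Depth-first mining: extend each frequent itemset with later items,
    intersecting bitmaps with AND instead of rescanning the transactions."""
    bitmaps = build_bitmaps(transactions)  # the only pass over the raw data
    items = sorted(i for i, b in bitmaps.items() if popcount(b) >= min_support)
    result = {}

    def extend(prefix, prefix_bits, start):
        for idx in range(start, len(items)):
            bits = prefix_bits & bitmaps[items[idx]]
            count = popcount(bits)
            if count >= min_support:
                itemset = prefix + (items[idx],)
                result[itemset] = count
                extend(itemset, bits, idx + 1)

    all_transactions = (1 << len(transactions)) - 1
    extend((), all_transactions, 0)
    return result

# Toy data (hypothetical): five transactions over items a, b, c.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = mine(toy, min_support=3)
```

In this sketch the raw transactions are read once to build the bitmaps; every later support count is a bitwise AND followed by a bit count, which is the kind of saving the abstract attributes to the bit operation method.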

Description

Technical field

[0001] The invention relates to a frequent item set mining method based on a Spark platform, belonging to the technical field of data mining.

Background

[0002] How to improve the mining performance of frequent itemsets in massive data sets is a current research hotspot. The traditional Apriori algorithm, the FP-Growth algorithm and their improved variants all show certain advantages when dealing with small-scale data. With the advent of the big data era, stand-alone algorithms can no longer meet the time and space requirements of data processing. Distributed algorithms based on the Hadoop and Spark cloud computing platforms continue to appear, but they still cannot break away from the Apriori and FP-Growth algorithms. The Apriori algorithm scans the database multiple times and generates too many candidate item sets; when the amount of data is too large, the FP-Growth algorithm will also spend a lot of time recursive...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/182, G06F16/18
Inventor: 丁家满, 李海滨
Owner: KUNMING UNIV OF SCI & TECH