Big data association rule mining method based on Spark

A big data association rule mining technology, applied in the fields of electrical digital data processing, special data processing applications, and digital data information retrieval, that achieves fast and efficient mining with less memory and I/O usage.

Inactive Publication Date: 2020-02-21
HARBIN UNIV OF SCI & TECH
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] To solve the problem of mining association rules from big data, the present invention discloses a Spark-based method for mining big data association rules, which improves running speed when mining large data sets.



Examples


Detailed Description of the Embodiments

[0026] The technical scheme of the present invention is further described below with reference to the accompanying drawings:

[0027] Figure 1 shows a flow chart of the method of the present invention; each step is described in detail below, following the flow chart.

[0028] First, read the converted vertical database file from the local file system or HDFS to obtain an RDD. Call the filter() transformation operator on the RDD to remove the entries whose support is less than the minimum support, which yields the frequent 1-itemsets. The frequent 1-itemsets are then intersected with one another to obtain the frequent 2-itemsets. To obtain frequent K-itemsets (K > 2), the frequent 2-itemsets are first divided by prefix, and the frequent 2-itemsets together with their prefixes are then joined within the divided data to obtain the frequent 3-itemsets. By analogy, to find frequent K-itemsets, it is necessary to extra...
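The steps in [0028] can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the patented implementation: plain Python sets stand in for the RDD and its TidSets, the dictionary comprehension plays the role of the filter() step, and the names `vertical_db`, `MIN_SUPPORT`, `frequent_1_itemsets`, `extend`, and `mine` are hypothetical. Because the source paragraph is truncated, the prefix-based extension here follows the common Eclat-style join of itemsets that share all but their last item, which may differ in detail from the method described.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical vertical database: item -> set of transaction ids (TidSet).
vertical_db = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 4},
    "c": {2, 3, 4},
    "d": {5},
}
MIN_SUPPORT = 2  # assumed absolute minimum support count


def frequent_1_itemsets(db, min_support):
    """Keep items whose TidSet size meets the minimum support (the filter step)."""
    return {(item,): tids for item, tids in db.items() if len(tids) >= min_support}


def extend(level, min_support):
    """Build (k+1)-itemsets by joining k-itemsets that share the same prefix."""
    groups = defaultdict(list)
    for itemset in sorted(level):
        groups[itemset[:-1]].append(itemset)  # prefix = all items but the last
    next_level = {}
    for members in groups.values():
        for left, right in combinations(members, 2):
            tids = level[left] & level[right]  # TidSet intersection
            if len(tids) >= min_support:
                next_level[left + (right[-1],)] = tids
    return next_level


def mine(db, min_support):
    """Collect frequent itemsets of every size, level by level."""
    level = frequent_1_itemsets(db, min_support)
    result = {}
    while level:
        result.update(level)
        level = extend(level, min_support)
    return result


frequent = mine(vertical_db, MIN_SUPPORT)
for itemset in sorted(frequent, key=lambda s: (len(s), s)):
    print(itemset, sorted(frequent[itemset]))
```

Grouping by prefix means each group can be extended independently of the others, which is what makes the division a natural unit of parallel work in a framework such as Spark.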



Abstract

The invention provides a Spark-based big data association rule mining method. The method runs on the Spark framework: it reads a data set from an address given by the user, converts it into a vertical database, and then reads and filters the converted vertical database to obtain the frequent 1-itemsets. Intersecting the frequent 1-itemsets yields the frequent 2-itemsets; throughout the process, TidSets are stored as bitmaps to make intersection more efficient. A prefix division principle is then applied to the frequent 2-itemsets to obtain the frequent K-itemsets. Together, data preprocessing and prefix division give the method higher operating efficiency, including on large data sets.
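To make the vertical-database conversion and the bitmap TidSet idea concrete, here is a small sketch. It assumes transactions arrive as lists of item names and uses Python integers as bitmaps (bit i is set when transaction i contains the item); the function names `to_vertical_bitmaps` and `support` and the sample data are illustrative only, and the abstract does not specify the bitmap encoding actually used.

```python
def to_vertical_bitmaps(transactions):
    """Convert a horizontal data set into item -> bitmap TidSet."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            # Set bit `tid` in this item's bitmap.
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap):
    """Support count = number of set bits in the bitmap."""
    return bin(bitmap).count("1")


# Hypothetical horizontal data set: one list of items per transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]

bitmaps = to_vertical_bitmaps(transactions)

# Intersecting two TidSets is a single bitwise AND.
bread_and_milk = bitmaps["bread"] & bitmaps["milk"]
print(format(bread_and_milk, "04b"), support(bread_and_milk))
```

With this encoding, intersecting two TidSets costs one AND over machine words rather than an element-by-element set comparison, which is consistent with the efficiency gain the abstract attributes to bitmaps.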

Description

Technical field

[0001] With the rapid development of the Internet industry, data is accumulating faster than at any previous time, and we have entered the era of big data. In this era, data mining has become a popular technology. Within data mining, association rule mining is an important and widely studied model. The purpose of association rule mining is to find frequent patterns in data sets, that is, patterns and co-occurrence relationships that recur multiple times. Association rules were originally motivated by the market basket analysis problem. Association rule mining has a very wide range of applications, including the financial industry, retail marketing, biopharmaceuticals, environmental protection, image classification, network traffic analysis, and online learning. The present invention proposes a Spark-based big data association rul...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2458
CPC: G06F16/2465
Inventor: 李成严, 辛雪, 赵帅
Owner: HARBIN UNIV OF SCI & TECH