Big data association rule mining method based on Spark

A big data association rule technology, applied in the fields of electrical digital data processing, special data processing applications, and digital data information retrieval, that achieves fast and efficient mining with less memory and I/O usage.

Status: Inactive; Publication Date: 2020-02-21

HARBIN UNIV OF SCI & TECH

5 Cites, 0 Cited by


AI Technical Summary

Problems solved by technology

[0004] To address the problem of mining association rules in big data, the present invention discloses a Spark-based method for mining big data association rules, which improves running speed when mining large data sets.

Method used



Embodiment Construction

[0026] The technical scheme of the present invention is further described below in conjunction with the accompanying drawing:

[0027] Figure 1 shows a flow chart of the method of the present invention; each step is described in detail following the flow chart.

[0028] First, read the converted vertical database file from local storage or HDFS to obtain an RDD. Call the filter() transformation operator on the RDD to remove entries whose support is less than the minimum support, which yields the frequent 1-itemsets. The frequent 1-itemsets are then repeatedly intersected to obtain the frequent 2-itemsets. To obtain frequent K-itemsets (K>2), the frequent 2-itemsets are first divided by prefix, and then the frequent 2-itemsets and their prefixes are added to the divided data to obtain the frequent 3-itemsets. By analogy, to find frequent K-itemsets, it is necessary to extra...
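The steps in paragraph [0028] can be sketched in a few lines. The following is a minimal illustration in plain Python rather than Spark, so that it runs without a cluster; it is not the patent's implementation. The function names (`frequent_1_itemsets`, `extend_by_prefix`, `mine`) and the sample data are hypothetical, TidSets are modeled as ordinary Python sets, and the prefix grouping follows the standard Eclat-style join, which is an assumption about what the prefix division step does.

```python
from collections import defaultdict
from itertools import combinations


def frequent_1_itemsets(vertical, min_support):
    """Keep items whose TidSet holds at least min_support transactions.

    Plays the role of the filter() step; itemsets are sorted tuples.
    """
    return {(item,): tids for item, tids in vertical.items()
            if len(tids) >= min_support}


def extend_by_prefix(level, min_support):
    """Build (k+1)-itemsets from k-itemsets that share the same prefix."""
    groups = defaultdict(list)
    for itemset in sorted(level):
        groups[itemset[:-1]].append(itemset)  # group by all items but the last
    next_level = {}
    for members in groups.values():
        for a, b in combinations(members, 2):
            tids = level[a] & level[b]  # TidSet intersection
            if len(tids) >= min_support:
                next_level[a + (b[-1],)] = tids
    return next_level


def mine(vertical, min_support):
    """Return {itemset: support} for every frequent itemset."""
    level = frequent_1_itemsets(vertical, min_support)
    result = {}
    while level:
        result.update({itemset: len(tids) for itemset, tids in level.items()})
        level = extend_by_prefix(level, min_support)
    return result


if __name__ == "__main__":
    # Hypothetical vertical database: item -> set of transaction ids.
    sample = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {1, 2, 3, 4}, "d": {5}}
    for itemset, count in sorted(mine(sample, 2).items()):
        print(itemset, count)
```

Because itemsets in one prefix group are only joined with each other, each group could in principle be processed independently, which is what makes this kind of division a natural fit for a distributed framework.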


Abstract

The invention provides a big data association rule mining method based on Spark. The method adopts the Spark computing framework: a data set is read from an address given by the user and converted into a vertical database, and the converted vertical database is read and filtered to obtain the frequent 1-itemsets. The frequent 1-itemsets are intersected to obtain the frequent 2-itemsets, with a bitmap used to store each TidSet throughout the process to make intersection more efficient. A prefix division principle is then applied to the frequent 2-itemsets to obtain the frequent K-itemsets. Together, data preprocessing and prefix division make the method more efficient to run, including on large data sets.
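To make the vertical database and bitmap TidSet ideas concrete, here is a small sketch in plain Python. It is not the patent's code: the function names (`to_vertical_bitmaps`, `intersect`, `support`) and the sample transactions are hypothetical, and a Python integer stands in for the bitmap, with bit i set when transaction i contains the item.

```python
def to_vertical_bitmaps(transactions):
    """Convert a horizontal database (list of item lists) to item -> bitmap."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in set(items):  # ignore duplicate items within a transaction
            vertical[item] = vertical.get(item, 0) | (1 << tid)
    return vertical


def intersect(bitmap_a, bitmap_b):
    """TidSet intersection is a single bitwise AND on the two bitmaps."""
    return bitmap_a & bitmap_b


def support(bitmap):
    """Support is the number of set bits, i.e. matching transactions."""
    return bin(bitmap).count("1")


if __name__ == "__main__":
    # Hypothetical horizontal database; the list index is the transaction id.
    transactions = [["bread", "milk"], ["bread", "beer"], ["milk", "beer", "bread"]]
    vertical = to_vertical_bitmaps(transactions)
    both = intersect(vertical["bread"], vertical["milk"])
    print(bin(both), support(both))
```

Storing a TidSet as a bitmap turns intersection into a word-wise AND and support counting into a bit count, which is the usual reason this representation is cheaper than intersecting lists of transaction ids.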

Description

Technical field

[0001] With the rapid development of the Internet industry, data is accumulating faster than at any previous time, and we have entered the era of big data. In this era, data mining has become a popular technology. Within data mining, association rule mining is an important and widely studied model. The purpose of data mining based on association rules is to find frequent patterns in data sets, that is, patterns and co-occurrence relationships that recur many times. Association rules were originally motivated by the market basket analysis problem. Association rule mining has a very wide range of applications, including the financial industry, retail marketing, biopharmaceuticals, environmental protection, image classification, network traffic analysis, and online learning. The present invention proposes a Spark-based big data association rul...

Claims


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/2458

CPC: G06F16/2465

Inventor: 李成严, 辛雪, 赵帅

Owner: HARBIN UNIV OF SCI & TECH
