Method for quickly mining rare item sets from supermarket data

The method concerns data sets and rare item set mining, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problem of low execution efficiency, reducing the number of candidate item sets and the scale of data scanning to achieve high mining efficiency.

Status: Inactive. Publication date: 2018-08-21
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0013] Aiming at the problem of low execution efficiency of the existing algorithms mentioned above when mining rare ite


Examples


Embodiment 1

[0055] The original data set of a supermarket is shown in Table 1, where TID is the serial number of the transaction and Items lists the items contained in the transaction. The item codes are as follows: a is orange, b is milk, c is cooked food, d is TV, e is juicer, f is humidifier, g is microwave oven, i is biscuit, k is hardware, l is mineral water, and m is recipe.

[0056] Two support thresholds are defined, the minimum rare support and the minimum frequent support: minRareSup=2, minFreSup=4.

[0057] Table 1 Original transaction table

[0058]

TID   Items
1     {a,b,c,d,f}
2     {a,g}
3     {a,c,d,f,g}
4     {a,b,e}
5     {a,c,e,f}
6     {a,b,d,g}
7     {i,b,c,l}
8     {a,i,l}
9     {i,l,k}
10    {i,b,l}
11    {i,m}
12    {i}

[0059] First, a vertical data set is obtained through steps such as flatMap() and Map(). Then,...
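As a rough illustration of the first step on the Table 1 data, the sketch below builds a vertical data set (item mapped to the set of TIDs containing it) and then splits it using the two thresholds. It uses plain Python in place of the flatMap() and Map() operations named in the text. The helper names `build_vertical` and `split_by_support` are hypothetical, and the split rule (frequent if support >= minFreSup, rare if minRareSup <= support < minFreSup) is an assumption based on the usual two-threshold definition, since the text shown here does not spell it out.

```python
from collections import defaultdict

# Table 1 from the text: TID -> items in the transaction
transactions = {
    1: {"a", "b", "c", "d", "f"},
    2: {"a", "g"},
    3: {"a", "c", "d", "f", "g"},
    4: {"a", "b", "e"},
    5: {"a", "c", "e", "f"},
    6: {"a", "b", "d", "g"},
    7: {"i", "b", "c", "l"},
    8: {"a", "i", "l"},
    9: {"i", "l", "k"},
    10: {"i", "b", "l"},
    11: {"i", "m"},
    12: {"i"},
}
MIN_RARE_SUP = 2
MIN_FRE_SUP = 4


def build_vertical(transactions):
    """Hypothetical helper: turn TID -> items into item -> set of TIDs."""
    # flatMap-like step: one (item, tid) pair per item occurrence
    pairs = [(item, tid) for tid, items in transactions.items() for item in items]
    # grouping step: collect the TIDs of each item
    vertical = defaultdict(set)
    for item, tid in pairs:
        vertical[item].add(tid)
    return dict(vertical)


def split_by_support(vertical, min_rare, min_fre):
    """Hypothetical helper: split by single-item support (assumed rule)."""
    frequent = {i: t for i, t in vertical.items() if len(t) >= min_fre}
    rare = {i: t for i, t in vertical.items() if min_rare <= len(t) < min_fre}
    return frequent, rare


vertical = build_vertical(transactions)
frequent, rare = split_by_support(vertical, MIN_RARE_SUP, MIN_FRE_SUP)
print("frequent items:", sorted(frequent))
print("rare items:", sorted(rare))
```

Under this assumed rule, items whose support falls below minRareSup (here k and m, each appearing once) land in neither set.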

Abstract

The invention provides a method for quickly mining rare item sets from supermarket data, and belongs to the technical field of information mining and analysis. The method includes the following steps: step 1, using an original data set to generate a vertical data set; step 2, dividing the vertical data set into a frequent vertical data set and a rare vertical data set according to the support degrees of single items; step 3, obtaining the rare 1-item sets from the rare vertical data set, and deleting from the original data set the transactions that do not contain a rare 1-item, to obtain an original data set containing rare 1-items; step 4, iteratively mining rare k-item sets from the original data set containing the rare 1-items, where k >= 2; and step 5, storing all the mined rare item sets in the rare vertical data set. The method adopts the idea of a vertical data set: dividing the vertical data set into a frequent vertical data set and a rare vertical data set reduces the scale of data scanning, and storing the already-obtained rare item sets together with their support degrees decreases the number of candidate item sets.
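The five steps can be sketched end to end on the Table 1 data as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the function name `mine_rare_itemsets` is hypothetical, a rare item set is assumed to be one that contains at least one rare 1-item and has minRareSup <= support < minFreSup, and candidates are simply enumerated from the retained transactions, because the detailed candidate generation is not given in the text shown here.

```python
from itertools import combinations

# Table 1 from the text: TID -> items in the transaction
transactions = {
    1: {"a", "b", "c", "d", "f"}, 2: {"a", "g"},
    3: {"a", "c", "d", "f", "g"}, 4: {"a", "b", "e"},
    5: {"a", "c", "e", "f"}, 6: {"a", "b", "d", "g"},
    7: {"i", "b", "c", "l"}, 8: {"a", "i", "l"},
    9: {"i", "l", "k"}, 10: {"i", "b", "l"},
    11: {"i", "m"}, 12: {"i"},
}


def mine_rare_itemsets(transactions, min_rare, min_fre):
    """Hypothetical sketch of steps 1-5; returns item set -> set of TIDs."""
    # Step 1: vertical data set (item -> TIDs)
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)

    # Step 2: keep the rare part, keyed by item set (assumed split rule)
    rare_vertical = {
        frozenset([item]): tids
        for item, tids in vertical.items()
        if min_rare <= len(tids) < min_fre
    }
    rare_items = {item for key in rare_vertical for item in key}

    # Step 3: delete transactions that contain no rare 1-item
    reduced = [items for items in transactions.values() if items & rare_items]

    # Step 4: iterate over k >= 2 until no candidates remain
    k = 2
    while True:
        candidates = set()
        for items in reduced:
            for combo in combinations(sorted(items), k):
                if rare_items.intersection(combo):
                    candidates.add(frozenset(combo))
        if not candidates:
            break
        for cand in candidates:
            # Support via TID-set intersection on the vertical data set
            tids = set.intersection(*(vertical[i] for i in cand))
            if min_rare <= len(tids) < min_fre:
                # Step 5: store the rare item set with its TIDs
                rare_vertical[cand] = tids
        k += 1
    return rare_vertical


result = mine_rare_itemsets(transactions, min_rare=2, min_fre=4)
for itemset, tids in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), "support =", len(tids))
```

Because every candidate contains a rare 1-item, any transaction supporting it survives step 3, so intersecting TID sets from the full vertical data set gives the same support as counting over the reduced data.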

Description

Technical field

[0001] The invention belongs to the technical field of information mining and analysis, and in particular relates to a method for quickly mining rare item sets from supermarket data.

Background art

[0002] Association rule mining is one of the important problems in knowledge discovery. Since Professor Agrawal proposed the concept of association rules in 1993, research on association rules has continued without interruption. The key step in existing association rule mining is to discover frequent item sets, that is, to mine the patterns that appear frequently in the data set, in the hope that frequent item sets reveal the regularities contained in the data. Data mining is widely used in supermarket transaction analysis, drug research, network access analysis and other settings.

[0003] In the existing research on data mining, Agrawal et al. proposed the Apriori algorithm in 1994, which uses joining and pruning to process candidate item sets and obtain frequent item sets. In...

Application Information

IPC(8): G06F17/30
CPC: G06F16/2465
Inventors: 胡军, 刘赛男, 潘皓安, 邵瑞, 于洪
Owner: CHONGQING UNIV OF POSTS & TELECOMM