Data set frequent item set mining availability evaluation method

A technology of frequent itemset mining and maximal frequent itemsets, applied in special data processing applications, digital data protection, digital data processing, etc., with the stated effects of improving operational efficiency, avoiding a large amount of repeated calculation, and reducing the search space.

Pending Publication Date: 2021-10-29
NANJING NORMAL UNIVERSITY
Cites: 0; Cited by: 0

AI Technical Summary

Problems solved by technology

The similarity of frequent itemsets is inseparable from both item similarity and support similarity. Using the two evaluation indicators at the same time cannot co...

Examples

Example Embodiment

[0032] Embodiment 1: referring to Figure 1, the present invention is a method for evaluating the availability, for publication, of privacy-preserving frequent itemsets, comprising the following steps:

[0033] Step 1: Given data sets D1 and D2, use the Apriori algorithm to mine the maximal frequent itemsets of D1 and D2, denoted FIS1 and FIS2, where l1 and l2 are the cardinalities of the itemsets in FIS1 and FIS2, respectively;

[0034] In this example, the support threshold is set to 3, and mining yields FIS1 = {{a,b,c}:4, {a,c,d}:4, {b,d,e}:3, {a,d,e}:3, {b,d,f}:3}, where the number after each colon is the itemset's support,

[0035] and FIS2 = {{a,b,c,d}:3, {b,c,d,e}:4, {a,d,e,f}:3, {b,d,g,h}:3}; hence l1 = 3 and l2 = 4;
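
Step 1 can be illustrated with a minimal Python sketch of level-wise Apriori mining followed by a maximality filter. It assumes absolute support counts (as in the example above, where the threshold is 3). The function names `apriori` and `maximal` are hypothetical, and since the source does not say how the maximal itemsets are extracted, filtering the full frequent set afterwards is only one possible approach.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support.

    Support is an absolute count of transactions containing the itemset
    (assumption, matching the threshold of 3 used in the example).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge two frequent (k-1)-itemsets into a k-itemset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count step: one pass over the data per level.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


def maximal(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return {s: c for s, c in frequent.items()
            if not any(s < other for other in frequent)}
```

Under these assumptions, `maximal(apriori(D1, 3))` and `maximal(apriori(D2, 3))` would play the roles of FIS1 and FIS2. Each Apriori level rescans the data once, which is the classic cost of candidate generation.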

[0036] Step 2: Pair an itemset I1 of FIS1 with an itemset I2 of FIS2, obtain the pairing result <I1, I2, score1>, and add it to Pairs, where score1 denotes the item similarity of I1 and I2. The matching steps are as follows:

[0037] (a) For itemset I1 of FIS1 and itemset I2 of FIS2, if I1...
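
The matching rule of step (a) is cut off above, so the following is only a hypothetical stand-in: each itemset I1 of FIS1 is paired with the FIS2 itemset of highest Jaccard similarity |I1 ∩ I2| / |I1 ∪ I2|, which is used here as score1. Both `jaccard` and `pair_itemsets` are illustrative names, and the patent's matching algorithm F may differ (for instance, it may enforce one-to-one matching).

```python
from typing import FrozenSet, List, Optional, Tuple

Itemset = FrozenSet[str]


def jaccard(a: Itemset, b: Itemset) -> float:
    """Stand-in item similarity |a & b| / |a | b| (assumption, not from the patent)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pair_itemsets(fis1: List[Tuple[Itemset, int]],
                  fis2: List[Tuple[Itemset, int]]
                  ) -> List[Tuple[Itemset, Optional[Itemset], float]]:
    """Pair each FIS1 itemset with its most similar FIS2 itemset.

    Returns (I1, I2, score1) triples; ties go to the earlier FIS2 entry.
    """
    pairs = []
    for i1, _ in fis1:
        best_i2, best_score = None, -1.0
        for i2, _ in fis2:
            s = jaccard(i1, i2)
            if s > best_score:
                best_i2, best_score = i2, s
        pairs.append((i1, best_i2, best_score))
    return pairs


# Example data from [0034] and [0035]: (itemset, support).
FIS1 = [(frozenset("abc"), 4), (frozenset("acd"), 4), (frozenset("bde"), 3),
        (frozenset("ade"), 3), (frozenset("bdf"), 3)]
FIS2 = [(frozenset("abcd"), 3), (frozenset("bcde"), 4),
        (frozenset("adef"), 3), (frozenset("bdgh"), 3)]

for i1, i2, score1 in pair_itemsets(FIS1, FIS2):
    print(sorted(i1), sorted(i2), round(score1, 2))
```

On the example data, {a,b,c}, {a,c,d}, {b,d,e} and {a,d,e} each find a partner with score 0.75, while {b,d,f} scores 0.4 against every FIS2 itemset and is paired with the first one, {a,b,c,d}.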

Abstract

The invention discloses a method for evaluating the availability of data sets for frequent itemset mining, comprising the following steps: (1) let C = {I1, I2, ..., In} be a set of items; given transaction data sets D1 and D2, mine D1 and D2 with the Apriori algorithm to obtain their sets of maximal frequent itemsets, denoted FIS1 and FIS2; (2) match any itemset MIS1 of FIS1 with any itemset MIS2 of FIS2 using an itemset matching algorithm F to obtain a paired-itemset table Pairs, which consists of itemset pairs <MIS1, MIS2, score1>, where score1 is the item similarity of MIS1 and MIS2, computed during matching; (3) for each entry <MIS1, MIS2, score1> in Pairs, compute the support similarity score2 of MIS1 and MIS2, then compute their composite similarity score, and update the entry to <MIS1, MIS2, score>; and (4) sum the composite similarity scores of all entries in Pairs and divide the sum by the number of entries in Pairs to obtain the similarity score SCORE of D1 and D2, whose value lies in [0, 1].
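
Steps (3) and (4) can be sketched as follows. The abstract does not give the formulas for score2 or for the composite score, so both are assumptions here: score2 is taken as the ratio of the smaller to the larger support, and the composite score as the product score1 × score2. Because both factors lie in [0, 1], the averaged SCORE also stays in [0, 1], matching the stated range. The function names are hypothetical.

```python
from typing import List, Tuple


def support_similarity(sup1: float, sup2: float) -> float:
    """Hypothetical score2: smaller support divided by larger support, in [0, 1]."""
    if sup1 == sup2:
        return 1.0  # also covers the degenerate case of two zero supports
    return min(sup1, sup2) / max(sup1, sup2)


def composite(score1: float, score2: float) -> float:
    """Hypothetical composite score: product of item and support similarity."""
    return score1 * score2


def dataset_score(pairs: List[Tuple[float, float, float]]) -> float:
    """Step (4): mean composite score over entries (score1, sup1, sup2).

    Returning 0.0 for an empty Pairs table is an assumption.
    """
    if not pairs:
        return 0.0
    total = sum(composite(s1, support_similarity(a, b)) for s1, a, b in pairs)
    return total / len(pairs)
```

For example, `dataset_score([(0.75, 4, 3), (1.0, 3, 3)])` gives (0.5625 + 1.0) / 2 = 0.78125. A product is one simple way to make a pair score high only when both items and supports agree; a weighted sum would be another choice.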

Description

Technical field

[0001] The invention relates to a method for evaluating the usability of data sets for frequent itemset mining, that is, for evaluating how usable a data set is for frequent itemset mining and analysis.

Background

[0002] Frequent itemset mining and analysis have been studied extensively; however, evaluating the availability of data sets for frequent itemset mining is still in its infancy. Existing evaluation indicators include precision and relative error (RE).

[0003] However, the commonly used indicator precision is based mainly on the item similarity of frequent itemsets, while RE uses the median of support similarity to represent the support similarity between frequent itemsets. These two indicators are relatively independent of each other, and each is one-sided. The similarity of frequent itemsets is inseparable from both item similarity and support similarity. Using two e...
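
For contrast, the two existing indicators can be sketched separately. The definitions below are assumptions drawn from common usage in the privacy-preserving frequent itemset literature, not from this patent: precision as the fraction of published itemsets that are truly frequent, and RE as the median, over itemsets present in both results, of |published support − true support| / true support. Each function looks at only one aspect (items or supports), which illustrates the one-sidedness described in [0003].

```python
from statistics import median
from typing import Dict, FrozenSet

Supports = Dict[FrozenSet[str], int]


def precision(true_fis: Supports, published: Supports) -> float:
    """Share of published itemsets that also appear in the true result (items only)."""
    if not published:
        return 0.0
    return len(true_fis.keys() & published.keys()) / len(published)


def relative_error(true_fis: Supports, published: Supports) -> float:
    """Median relative support error over itemsets present in both results (supports only)."""
    common = true_fis.keys() & published.keys()
    if not common:
        raise ValueError("no itemsets in common")
    return median(abs(published[x] - true_fis[x]) / true_fis[x] for x in common)


true_fis = {frozenset("ab"): 10, frozenset("ac"): 8, frozenset("bc"): 6}
published = {frozenset("ab"): 9, frozenset("ac"): 10, frozenset("bd"): 5}
print(precision(true_fis, published))       # 2 of the 3 published itemsets are true
print(relative_error(true_fis, published))  # median of 1/10 and 2/8
```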

Application Information

IPC(8): G06F16/2458; G06F21/62; G06K9/62
CPC: G06F16/2465; G06F21/6245; G06F18/22
Inventor: 吴卓超 (Wu Zhuochao)
Owner: NANJING NORMAL UNIVERSITY