
Probability frequent item set excavating method based on MapReduce

A frequent itemset mining technology, applied in special data processing applications and electrical digital data processing, that addresses the inability of existing methods to handle uncertain data and large-scale data.

Active Publication Date: 2014-08-20
NANJING UNIV
Cites: 3 · Cited by: 22

AI Technical Summary

Problems solved by technology

However, the problem that method focuses on is determining the frequent itemsets in the data: whether an itemset is frequent is decided by its support across all transactions, not by its frequent probability. ...
Another patent, "A method and system for mining association rules" (101799810B), is also aimed at mining frequent itemsets in certain data. On the one hand, it cannot handle large-scale data because it is not parallelized with MapReduce; on the other hand, it cannot handle uncertain data. The currently known patents on frequent itemset mining are therefore unsuitable for mining probabilistic frequent itemsets in complex uncertain data.



Examples


Embodiment 1

[0072] This embodiment illustrates the advantages of probabilistic frequent itemsets in uncertain data, using Jack's online shopping data in Table 1 as an example. The items in each transaction in Table 1 are the items Jack purchases on a shopping website in one week, and the value following each item is the probability that Jack purchases that item in that week. For example, in the first week, represented by transaction t1, Jack completed P transactions after browsing a shopping website (P is a positive integer) and purchased CDs in 0.7×P of them, so the probability of CD in t1 is 0.7; food was purchased in every transaction, so the probability of food in t1 is 1.0. Probabilities are used to represent shopping information because real shopping data is very large: to store it efficiently, the data often has to be compressed, which produces a large amount of uncertain data. Also,...
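To make "frequent probability" concrete, here is a minimal sketch under the uncertain-data model commonly used with UApriori, where items are assumed to occur independently: an itemset X occurs in a transaction with probability equal to the product of its items' probabilities, sup(X) is a sum of independent Bernoulli variables, and the frequent probability of X is Pr(sup(X) ≥ min_sup). The sketch computes that exactly with the standard dynamic program. The independence assumption, the helper names (`itemset_prob`, `support_pmf`, `expected_support`, `frequent_probability`) and all transactions except t1's CD = 0.7 and food = 1.0 are hypothetical, since Table 1 is not reproduced here.

```python
def itemset_prob(transaction, itemset):
    """Pr(itemset occurs in transaction), assuming items are independent."""
    prob = 1.0
    for item in itemset:
        prob *= transaction.get(item, 0.0)   # an absent item has probability 0
    return prob


def support_pmf(probs):
    """Exact distribution of sup(X): pmf[k] = Pr(sup(X) = k)."""
    pmf = [1.0]                              # before any transaction, support is 0
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1 - p)            # X absent from this transaction
            nxt[k + 1] += q * p              # X present in this transaction
        pmf = nxt
    return pmf


def expected_support(db, itemset):
    """Mean of sup(X): the sum of per-transaction occurrence probabilities."""
    return sum(itemset_prob(t, itemset) for t in db)


def frequent_probability(db, itemset, min_sup):
    """Pr(sup(itemset) >= min_sup) over the uncertain database db."""
    pmf = support_pmf([itemset_prob(t, itemset) for t in db])
    return sum(pmf[min_sup:])


# Hypothetical weekly transactions; only t1's CD = 0.7 and food = 1.0 come from the text.
db = [
    {"CD": 0.7, "food": 1.0},
    {"CD": 0.5, "food": 0.8, "book": 0.4},
    {"food": 0.9, "book": 0.6},
]
print(expected_support(db, {"CD", "food"}))         # about 1.1
print(frequent_probability(db, {"CD", "food"}, 2))  # about 0.28
```

On this sample data {CD, food} has an expected support of about 1.1 but a frequent probability of only about 0.28 for min_sup = 2, whereas {food} reaches about 0.98: the frequent probability measures how likely the itemset is to be frequent, not merely its average count. As written, the exact program costs O(n²) per itemset, which is one reason approximate methods are attractive on large data.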

Embodiment 2

[0080] Take the uncertain data collected by wireless sensors in agricultural greenhouse production as an example. In traditional agriculture, farmland information can be obtained in only limited ways, mainly by manual measurement, which requires a lot of manpower; wireless sensor networks can effectively reduce both the manpower required and the impact on the farmland environment. However, because of defects in the sensors themselves, the collected temperature and humidity readings contain errors and missing values, and the transmitted information is easily affected by noise, resulting in a large amount of uncertain data. How to mine the latent rules in this uncertain data is the problem the present invention addresses. A certain farm greenhouse grows a vegetable that can be cultivated in multiple seasons. The present invention obtains the uncertain temperature and humidity data of the vegetable for N...



Abstract

The invention discloses a MapReduce-based method for mining probabilistic frequent itemsets. The method includes the following steps:
(1) read in an uncertain data set T1;
(2) at the Map end, process all transactions in T1 in turn, mapping each item in a transaction, together with its probability value, to a key-value pair (key, value);
(3) at the Reduce end, receive the output of the Map end and generate the probabilistic frequent 1-itemsets using a normal approximation method;
(4) generate a list, F-list, from the probabilistic frequent 1-itemsets output in step (3);
(5) read in the uncertain data set T1 stored in the distributed file system HDFS and process it according to F-list to generate an uncertain data set T2;
(6) run a MapReduce-based UApriori method on the uncertain data set T2 obtained in step (5) to generate candidate itemsets, then generate probabilistic frequent itemsets from the candidates with the normal approximation method, repeating until all probabilistic frequent itemsets have been generated.
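The six steps can be simulated in memory to show how data flows between the Map and Reduce ends. This is a minimal sketch, not the patented implementation: `run_mapreduce` is a hypothetical stand-in for a Hadoop job (no HDFS is involved), and all helper names are illustrative. The abstract does not state the exact normal approximation, so the sketch assumes a common form: sup(X) is approximated by a normal variable with mean Σp and variance Σp(1−p), with a continuity correction, and X is kept when the approximated Pr(sup(X) ≥ min_sup) reaches a probability threshold `tau`. Ordering F-list by descending expected support and reading step (5) as "drop items not in F-list" are further assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def normal_frequent_prob(probs, min_sup):
    """Approximate Pr(sup >= min_sup) with a continuity-corrected normal."""
    mu = sum(probs)                            # expected support
    var = sum(p * (1 - p) for p in probs)      # variance of a sum of Bernoullis
    if var == 0:                               # support is deterministic
        return 1.0 if mu >= min_sup else 0.0
    z = (min_sup - 0.5 - mu) / math.sqrt(var)
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))  # 1 - Phi(z)


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    out = {}
    for key, values in groups.items():
        result = reducer(key, values)
        if result is not None:
            out[key] = result
    return out


def mine(t1, min_sup, tau):
    def reducer(key, probs):
        # Keep the itemset (valued by expected support) only if its
        # approximated frequent probability reaches the threshold tau.
        return sum(probs) if normal_frequent_prob(probs, min_sup) >= tau else None

    # Steps (2)-(3): emit ((item,), probability) pairs; reduce to frequent 1-itemsets.
    level = run_mapreduce(t1, lambda t: (((i,), p) for i, p in t.items()), reducer)
    # Step (4): F-list, assumed here to be ordered by descending expected support.
    f_list = [x[0] for x in sorted(level, key=lambda x: -level[x])]
    # Step (5): T2 keeps only F-list items (assumed reading of the step).
    keep = set(f_list)
    t2 = [{i: p for i, p in t.items() if i in keep} for t in t1]
    t2 = [t for t in t2 if t]
    result, k = dict(level), 1
    # Step (6): level-wise UApriori until no new frequent itemsets appear.
    while level:
        cands = set()
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:               # join k-itemsets sharing a prefix
                c = tuple(sorted(set(a) | set(b)))
                if all(s in level for s in combinations(c, k)):  # Apriori pruning
                    cands.add(c)
        if not cands:
            break

        def mapper(t, cands=cands):
            # Items are assumed independent, so Pr(c occurs in t) is a product.
            for c in cands:
                if all(i in t for i in c):
                    yield c, math.prod(t[i] for i in c)

        level = run_mapreduce(t2, mapper, reducer)
        result.update(level)
        k += 1
    return f_list, result


# Hypothetical uncertain data set T1: item -> existence probability.
t1 = [
    {"a": 0.9, "b": 0.8, "c": 0.3},
    {"a": 0.8, "b": 0.9},
    {"a": 1.0, "b": 0.7, "c": 0.2},
    {"a": 0.6, "c": 0.1},
]
f_list, frequent = mine(t1, min_sup=2, tau=0.7)
print(f_list)            # ['a', 'b']
print(sorted(frequent))  # [('a',), ('a', 'b'), ('b',)]
```

On the sample data, item c is pruned in step (3), so T2 contains only a and b, and {a, b} survives step (6) with an approximated frequent probability of about 0.79 (the exact value is about 0.80). Because a transaction that lacks X contributes p = 0 to both the mean and the variance, the mappers only need to emit non-zero probabilities, which keeps the shuffled data small.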

Description

Technical field [0001] The invention relates to a computer data mining method, in particular to a MapReduce-based approximate method for mining probabilistic frequent itemsets in large-scale uncertain data. Background [0002] In recent years, driven by new applications such as sensor network monitoring, moving-object search, protein-protein interaction network analysis, data integration and data cleaning, uncertain data mining has become a new research hotspot in data mining. Uncertain data mining mainly covers clustering, classification, association rule mining, outlier detection, etc. Among these, frequent itemset mining is fundamental to the field of data mining, so mining probabilistic frequent itemsets in uncertain data has become a research hotspot. For example, the currently popular wireless sensor networks collect large amounts of data; however, the collected data is usually imprecise due to t...


Application Information

IPC(8): G06F17/30
CPC: G06F16/182
Inventors: 杨育彬 (Yang Yubin), 徐静 (Xu Jing), 王苏琦 (Wang Suqi)
Owner: NANJING UNIV