Stochastic distributed data stream frequent item set mining system and method thereof

A frequent itemset mining technology, applied in fields such as electrical digital data processing, special data processing applications, and instruments. It addresses the difficulty of mining and updating frequent patterns in data streams, where existing methods cannot meet users' real-time requirements with adequate time and space efficiency. It meets real-time requirements, improves mining accuracy, and ensures pattern coverage.

Status: Inactive; Publication Date: 2010-11-17
NORTHEAST DIANLI UNIVERSITY
Cites: 0; Cited by: 10

AI Technical Summary

Problems solved by technology

[0003] Frequent pattern mining has been studied extensively in data mining, both in theory and in application, and has produced many results. Many classic algorithms have emerged, but they are difficult to update incrementally and are not suited to data stream mining. Because mining frequent patterns over a stream is a continuous operation, the computation for any itemset cannot be completed until all past and future data have been seen, which makes it difficult to mine and update frequent patterns in a data stream environment. Moreover, compared with mining static data sets, data streams carry more information to track and more complex situations to handle: frequent itemsets change over time, and infrequent items may later become frequent, so the storage structure must be adjusted dynamically to reflect how frequent itemsets change over time.
[0004] Existing data stream frequent itemset mining methods are based on transactional data items and adopt a centralized mining mode. Their time and space efficiency is relatively low, and they cannot meet users' real-time requirements when the data volume is large.

Embodiment Construction

[0019] The present invention is further described below with reference to the accompanying drawings and embodiments.

[0020] Referring to Figure 1, a random distributed data stream frequent itemset mining system includes:

[0021] A data item splitter, which splits each transaction into the distinct items it contains and sends the resulting data items to the n frequent itemset miners;

[0022] n frequent itemset miners, which mine frequent itemsets from the data items they receive, based on the transactions in which each frequent item occurs;

[0023] A frequent itemset memory, which aggregates and stores the frequent itemsets mined by the miners;

[0024] A random mixer for frequent items, which randomly shuffles the order of the data items and feeds them back to the data item splitter n+1 times. Here: $n$ is the number of frequent itemset miners; $N$ is the number of transactions in a basic window; $w$ denotes the basic window; $i_m$ is the $m$-th data item; $m$ is the number of data items, i.e., the number of one-dimensional arrays; $s_m$...
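The four components above can be illustrated for a single basic window with the following minimal Python sketch. It rests on assumptions the excerpt does not confirm: transactions are lists of hashable, sortable items; the splitter uses a vertical layout that maps each item to the set of numbers of the transactions containing it (suggested by the abstract's remark that transaction numbers are attached to data items); items are dealt round-robin to the n miners (a hypothetical policy); each miner runs an Eclat-style transaction-set intersection, which is a swapped-in technique rather than the patent's confirmed algorithm; `min_support` is an absolute count; and `rounds` stands in for the number of feedback passes. All function names are hypothetical.

```python
import random
from collections import defaultdict


def split_items(window):
    """Data item splitter (sketch): map each item to the set of numbers of
    the transactions in this basic window that contain it."""
    tidsets = defaultdict(set)
    for tid, transaction in enumerate(window):
        for item in transaction:
            tidsets[item].add(tid)
    return dict(tidsets)


def assign_to_miners(items, n):
    """Hypothetical policy: deal items round-robin to n miners in list order."""
    groups = [[] for _ in range(n)]
    for k, item in enumerate(items):
        groups[k % n].append(item)
    return groups


def mine(items, tidsets, min_support):
    """One miner: Eclat-style depth-first search over the items it received,
    computing support by intersecting transaction-number sets."""
    found = {}

    def extend(prefix, prefix_tids, candidates):
        for k, item in enumerate(candidates):
            tids = prefix_tids & tidsets[item] if prefix else tidsets[item]
            if len(tids) >= min_support:
                itemset = prefix | {item}  # frozenset | set -> frozenset
                found[itemset] = len(tids)
                extend(itemset, tids, candidates[k + 1:])

    frequent = sorted(i for i in items if len(tidsets[i]) >= min_support)
    extend(frozenset(), set(), frequent)
    return found


def mine_window(window, n, min_support, rounds, seed=0):
    """Run `rounds` passes over one basic window. Before every pass after the
    first, the random mixer shuffles the item order so the next split groups
    items differently. The frequent itemset memory is a dict itemset -> support."""
    rng = random.Random(seed)
    tidsets = split_items(window)
    items = list(tidsets)
    memory = {}
    for r in range(rounds):
        if r > 0:
            rng.shuffle(items)  # random mixer: reorder, then feed back to the splitter
        for group in assign_to_miners(items, n):
            memory.update(mine(group, tidsets, min_support))
    return memory


window = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
exact = mine_window(window, n=1, min_support=2, rounds=1)
print(len(exact))                         # 7
print(exact[frozenset({"a", "b", "c"})])  # 2
partial = mine_window(window, n=2, min_support=2, rounds=3)
```

With `n=1` the single miner sees every item, so the result matches exhaustive mining of the window. With `n=2`, an itemset is found only in a round where all of its items land on the same miner; further shuffled feedback rounds give itemsets that an earlier partition split apart another chance to be mined, while supports stay exact because every miner intersects the full transaction-number sets.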

Abstract

The invention discloses a stochastic distributed data stream frequent itemset mining system. The system adopts a new distributed data stream mining mode based on frequent-item transactions and introduces a stochastic frequent item mixer to improve mining accuracy. When the number of new transactions arriving in the data stream reaches the size of a basic window, a data item dividing method splits the transactions according to their distinct items and sends the items to the n frequent itemset mining devices, with each data item tagged with the numbers of the transactions it belongs to. The frequent itemset mining devices mine frequent itemsets by operating on the transactions of different frequent items, and the mined frequent itemsets are stored in a frequent itemset storage. Finally, the data items are fed back to the data item divider through the stochastic frequent item mixer for deeper mining. Compared with other methods, the method has advantages such as a small memory footprint and fast response. In addition, the coverage of pattern mining can be ensured by increasing the number of frequent itemset mining devices or the number of feedback passes.
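The first step described in the abstract, accumulating new transactions until a full basic window of N transactions is available and then attaching transaction numbers to the split data items, can be sketched in Python as below. This is a minimal illustration under stated assumptions: transactions are iterables of hashable, sortable items; transaction numbers are global sequence numbers starting at 0; and a partially filled window is held back rather than mined. `basic_windows` and `tag_items` are hypothetical names.

```python
from collections import defaultdict
from itertools import islice


def basic_windows(stream, N):
    """Yield basic windows of exactly N transactions from a (possibly unbounded)
    stream, pairing each transaction with a global transaction number."""
    it = iter(stream)
    start = 0
    while True:
        batch = list(islice(it, N))
        if len(batch) < N:
            return  # assumption: a partially filled window is held back, not mined
        yield list(enumerate(batch, start))
        start += N


def tag_items(window):
    """Split one basic window into data items, each tagged with the numbers
    of the transactions it belongs to."""
    tagged = defaultdict(list)
    for tid, transaction in window:
        for item in sorted(set(transaction)):
            tagged[item].append(tid)
    return dict(tagged)


stream = [["a", "b"], ["b"], ["a", "c"], ["c"], ["a"]]
for window in basic_windows(stream, N=2):
    print(tag_items(window))
# {'a': [0], 'b': [0, 1]}
# {'a': [2], 'c': [2, 3]}
```

The fifth transaction is not emitted because it does not yet fill a basic window; because `basic_windows` is a generator built on `islice`, the same code also works on an unbounded stream.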

Description

Technical field
[0001] The invention belongs to data stream processing technology, and in particular relates to a random distributed data stream frequent itemset mining system and method.

Background art
[0002] A data stream is a sequence of data that arrives continuously, at high speed, in real time, in order, and without bound. Data streams are widespread in many areas of daily life, such as network traffic monitoring, meteorological monitoring, sensor network data management, and Web log analysis. In these areas, discovering frequent patterns in transaction data streams is of great significance. For example, in network traffic monitoring, frequent patterns may indicate network congestion, which may in turn be a symptom of a network attack: when a large number of IP packets with the same address appear, a denial-of-service attack may have occurred. In meteorological monitoring, frequent patterns may reveal important meteorological information such as the distribution of precipitation; in the sensor network...

Application Information

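Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 曲朝阳 (Qu Zhaoyang), 王敬东 (Wang Jingdong), 孟凡奇 (Meng Fanqi), 董如意 (Dong Ruyi), 李鹏 (Li Peng), 张亮 (Zhang Liang), 程成 (Cheng Cheng)
Owner: NORTHEAST DIANLI UNIVERSITY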