Multi-layer frequent pattern discovery algorithm with high space extensibility and high time efficiency for mining mass data

A technology involving frequent patterns and massive data, applied in electrical digital data processing, special data processing applications, computing, etc. It addresses problems such as large computing-time and storage-space costs and growing space overhead.

Status: Inactive. Publication date: 2011-10-12
ZHEJIANG GONGSHANG UNIVERSITY
Cites: 0. Cited by: 8.

AI Technical Summary

Problems solved by technology

FPM-cross [4] inherits the main drawback of FPGrowth [6]: a large number of pattern support sub-databases must be generated during mining, and maintaining them consumes a great deal of computing time and storage space.
In addition, FPM-cr...
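To make the cost described above concrete, the sketch below contrasts a materialized projected database (a fresh copy of each containing transaction's suffix, the kind of sub-database the text says must be generated and maintained) with a generic pseudo-projection that stores only (transaction id, offset) pairs into the original data. This is a minimal illustration in the style of pseudo-projection techniques, not the patent's own "virtual projection" method, whose details are not in this excerpt. It assumes every transaction is already sorted in one global item order; all function names are hypothetical.

```python
from collections import Counter

# Assumption: each transaction is a list sorted in one fixed global item order.

def materialized_projection(db, item):
    """Copy the suffix after `item` of every transaction containing it."""
    return [t[t.index(item) + 1:] for t in db if item in t]

def virtual_projection(db, item):
    """Record only (transaction id, start offset) pairs; no items are stored."""
    return [(tid, t.index(item) + 1) for tid, t in enumerate(db) if item in t]

def virtual_subprojection(db, proj, item):
    """Project an existing virtual projection further on `item`."""
    out = []
    for tid, start in proj:
        t = db[tid]
        for pos in range(start, len(t)):
            if t[pos] == item:
                out.append((tid, pos + 1))
                break
    return out

def count_in_virtual(db, proj):
    """Count item supports inside a virtual projection by reading the original db."""
    counts = Counter()
    for tid, start in proj:
        counts.update(db[tid][start:])
    return counts
```

Both representations yield the same local support counts; the difference is that the virtual form holds two integers per transaction instead of a copy of its items.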

Embodiment Construction

[0086] The present invention, "a multi-layer frequent pattern discovery algorithm with high spatial scalability and high time efficiency for mining massive data", proposes three new technologies. Figure 1 summarizes the technical route that integrates them.

[0087] The technical solution is described in further detail below as two processes, with reference to the accompanying drawings and a worked example: the transactional database D shown in Figure 2, the item concept hierarchy tree H shown in Figure 3, and a minimum support minsup = 3.
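The contents of Figures 2 and 3 are not reproduced in this excerpt, so the sketch below uses a small hypothetical database and hierarchy (milk and bread with two subtypes each) purely to show what support means across concept levels with minsup = 3. It applies the usual multi-level convention, assumed here: a transaction supports a higher-level concept if it contains any descendant of that concept.

```python
# Hypothetical stand-ins for D (Figure 2) and H (Figure 3); not the patent's data.
PARENT = {
    "milk": None, "bread": None,              # top-level concepts
    "2%milk": "milk", "skimmilk": "milk",
    "wheatbread": "bread", "whitebread": "bread",
}
D = [
    ["2%milk", "wheatbread"],
    ["skimmilk", "whitebread"],
    ["2%milk", "whitebread"],
    ["skimmilk"],
    ["wheatbread"],
]
MINSUP = 3

def ancestors(item):
    """All proper ancestors of `item` in the hierarchy, nearest first."""
    out, p = [], PARENT[item]
    while p is not None:
        out.append(p)
        p = PARENT[p]
    return out

def generalized_support(db, itemset):
    """Count transactions that contain each item of `itemset` or a descendant of it."""
    count = 0
    for t in db:
        covered = set(t)
        for x in t:
            covered.update(ancestors(x))
        if set(itemset) <= covered:
            count += 1
    return count
```

On this toy data, {2%milk, wheatbread} has support 1 and is infrequent, while its generalization {milk, bread} has support 3 and meets minsup, which is the sparsity effect that motivates mining across levels.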

[0088] Process 1: Use an FP-tree [6] extended with hierarchical labels to represent the transactional database and the concept-hierarchy information in an integrated manner.
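The excerpt does not define the hierarchical label format. One common way to attach hierarchy information to items at small cost, assumed here for illustration only, is a Dewey-style path label: the k-th child of a node labelled L receives L + (k,). Ancestor tests and generalization to a given level then reduce to prefix operations on the label, so no extra nodes are needed to represent ancestors. The hierarchy and function names below are hypothetical.

```python
# Hypothetical hierarchy: parent concept -> ordered list of children.
CHILDREN = {
    "milk": ["2%milk", "skimmilk"],
    "bread": ["wheatbread", "whitebread"],
}

def assign_labels(children, roots):
    """Assign Dewey-style path labels (tuples of 1-based child positions)."""
    labels = {}

    def visit(node, label):
        labels[node] = label
        for k, child in enumerate(children.get(node, []), start=1):
            visit(child, label + (k,))

    for k, root in enumerate(roots, start=1):
        visit(root, (k,))
    return labels

def is_ancestor(a, b):
    """True if label `a` is a proper prefix of label `b`."""
    return len(a) < len(b) and b[:len(a)] == a

def generalize(label, level):
    """Label of the ancestor at depth `level` (1 = top level)."""
    return label[:level]
```

With this encoding, an item stored in an FP-tree node can be generalized to any level by truncating its label, rather than by keeping a separate tree per level.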

[0089] The specific steps of Process 1 are as follows:

[0090] 1.1 RepresentDBbyFPtree: Use the basic FP-tree [6] to represent transactional database D. The basic FP-tree consists of two parts: the prefix forest and the header table. Among them, the prefix forest exp...
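Step 1.1 relies on the basic FP-tree. The sketch below builds one in the standard way: count item supports, drop items below minsup, sort each transaction by descending support, and insert it as a path into a prefix tree whose root's children form the prefix forest. As a simplification, the header table here maps each item to a Python list of its nodes instead of the linked node-links of the original FP-tree; the class and function names are hypothetical, and this is not the patent's hierarchical-label extension.

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}          # item -> Node

def build_fp_tree(db, minsup):
    """Return (root, header table, global item order) for a basic FP-tree."""
    freq = Counter(item for t in db for item in set(t))
    freq = {i: c for i, c in freq.items() if c >= minsup}
    # Global order: descending support, ties broken by item name.
    order = sorted(freq, key=lambda i: (-freq[i], i))
    rank = {i: r for r, i in enumerate(order)}

    root = Node(None, None)         # children of this dummy root form the prefix forest
    header = defaultdict(list)      # item -> every node carrying that item
    for t in db:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for i in items:
            child = node.children.get(i)
            if child is None:
                child = Node(i, node)
                node.children[i] = child
                header[i].append(child)
            child.count += 1
            node = child
    return root, header, order
```

Summing the counts of an item's header-table nodes recovers that item's support, which is what later mining steps read from the header table.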

Abstract

The invention discloses a multi-layer frequent pattern discovery algorithm with high space extensibility and high time efficiency for mining mass data. It relates to the field of intelligent information processing and has wide application prospects in mass data mining, particularly network information search and knowledge discovery. Existing multi-layer algorithms simply extend single-layer algorithms, which creates time and space overhead bottlenecks when mining massive data. To address this, the invention provides three new technologies: first, a hierarchical labeling technique that integrates concept-hierarchy information into a variety of data representations at minimal additional cost, resolving the space bottleneck; second, an extended virtual projection method that avoids repeatedly generating pattern support sets and achieves higher space utilization; third, an inverted set-enumeration tree for organizing multi-layer patterns, together with a pruning technique for it, which greatly reduces the search space of frequent patterns and thereby resolves the running-time bottleneck. The time efficiency of the disclosed algorithm is about 5 times and 1 to 3 orders of magnitude higher than that of two reference algorithms, respectively, and its space overhead is the lowest. Its high performance makes applications such as mass Web mining, multimedia mining, and text mining feasible.
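The third technique builds on the set-enumeration tree. The sketch below shows only the plain, single-level version of that idea: each node extends its itemset with items that come later in a fixed order, so every itemset is reached at most once, and because support is anti-monotone an infrequent node's whole subtree can be pruned. Support is computed here by intersecting transaction-id sets (an Eclat-style choice made for brevity). It does not reproduce the "inverted" ordering or the multi-layer pruning rules of the patent, which this excerpt does not describe; the function name is hypothetical.

```python
def mine_set_enumeration(db, minsup):
    """Depth-first search of a set-enumeration tree with anti-monotone pruning."""
    tidsets = {}
    for tid, t in enumerate(db):
        for i in set(t):
            tidsets.setdefault(i, set()).add(tid)
    # Fixed item order; only frequent single items can start a branch.
    items = sorted(i for i, s in tidsets.items() if len(s) >= minsup)
    result = {}

    def expand(prefix, prefix_tids, candidates):
        for k, i in enumerate(candidates):
            tids = prefix_tids & tidsets[i]
            if len(tids) < minsup:
                continue            # prune this node and its entire subtree
            itemset = prefix + (i,)
            result[itemset] = len(tids)
            # Children may only add items that come after i in the order.
            expand(itemset, tids, candidates[k + 1:])

    expand((), set(range(len(db))), items)
    return result
```

For the four transactions in the test, {a, b, c} occurs only once, so that node is pruned and the search never descends below it.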

Description

Technical field

[0001] The invention relates to the field of intelligent information processing. It provides an algorithm that discovers frequent patterns spanning multiple concept levels with high spatial scalability and high time efficiency. The algorithm has broad application prospects in massive data mining, especially network information search and knowledge discovery, including Web mining, text mining, and multimedia mining.

Background

[0002] Frequent pattern mining is the most basic task in many data mining applications, such as Web mining [8], text mining, and multimedia mining [3]. When mining massive data, frequent patterns are hard to find at the lower (raw) concept levels because the data are sparse, while frequent patterns at the highest concept level are often just common-sense knowledge. It is therefore valuable to discover frequent patterns that span multiple concept levels. However, the resu...
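A straightforward way to find patterns spanning several concept levels, and the kind of single-layer extension the Abstract identifies as a bottleneck, is to add every item's ancestors to each transaction and then run an ordinary single-level miner. The sketch below does this with brute-force enumeration on a hypothetical hierarchy and database, skipping itemsets that pair an item with its own ancestor because such a pattern always has the same support as the item alone. It is a baseline for comparison under these assumptions, not the patent's algorithm.

```python
from itertools import combinations

# Hypothetical hierarchy (child -> parent) and database, for illustration only.
PARENT = {
    "milk": None, "bread": None,
    "2%milk": "milk", "skimmilk": "milk",
    "wheatbread": "bread", "whitebread": "bread",
}
D = [["2%milk", "wheatbread"], ["skimmilk", "whitebread"],
     ["2%milk", "whitebread"], ["skimmilk"], ["wheatbread"]]

def ancestors(item):
    out, p = [], PARENT[item]
    while p is not None:
        out.append(p)
        p = PARENT[p]
    return out

def extend(t):
    """Add every ancestor of every item: the transaction grows with hierarchy depth."""
    ext = set(t)
    for x in t:
        ext.update(ancestors(x))
    return frozenset(ext)

def naive_multilevel(db, minsup, max_len=2):
    ext_db = [extend(t) for t in db]
    universe = sorted(set().union(*ext_db))
    found = {}
    for k in range(1, max_len + 1):
        for cand in combinations(universe, k):
            # Skip itemsets containing an item together with one of its ancestors.
            if any(a in ancestors(b) for a in cand for b in cand):
                continue
            sup = sum(1 for t in ext_db if set(cand) <= t)
            if sup >= minsup:
                found[cand] = sup
    return found
```

Even on this toy data the extended transactions roughly double in size, which illustrates why copying ancestors into the data scales poorly on massive databases.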

Application Information

IPC(8): G06F17/30
Inventor: 刘君强 (Liu Junqiang)
Owner: ZHEJIANG GONGSHANG UNIVERSITY