Parallelizing method of association analytical algorithm

An association analysis algorithm technology, applied in computing, special data processing applications, instruments, etc. It addresses the problem that Apriori does not adapt well to parallelization, with the effects of saving time, improving generation speed, and reducing network pressure.

Active Publication Date: 2014-07-09
NANJING UNIV OF POSTS & TELECOMM
6 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

[0007] Technical problem: The classic association rule analysis algorithm Apriori does not adapt well to parallelization. The purpose of the present invention is to design a parallelization method for association analysis algorithms that reduces the synchronization dependence and network communication burden between nodes, improves the speed of database scanning and calculation, and uses cloud computing to address the difficulties and bottlenecks of massive data analysis.

Embodiment Construction

[0036] Details:

[0037] k-itemset: An association rule algorithm finds rules such as A→B from sets such as {A, B, C, D}, {A, B}, and so on. For example, {A} and {C} are each called a 1-itemset, {A, B} is called a 2-itemset, and {A, B, C...} is called a k-itemset, where k is the number of items in the set.

[0038] Frequent k-itemsets: A 1-itemset whose occurrence frequency meets the threshold is called a frequent 1-itemset, and a 2-itemset whose occurrence frequency meets the threshold is called a frequent 2-itemset. Similarly, a k-itemset whose occurrence frequency meets the threshold is called a frequent k-itemset.
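
The two definitions above can be illustrated with a minimal Python sketch. It is not the patent's method: it enumerates every k-item subset of each transaction by brute force, and the function name `frequent_itemsets`, the sample transactions, and the use of an absolute count (`min_count`) as the threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, k, min_count):
    """Return {k-itemset: count} for k-itemsets found in at least min_count transactions.

    Hypothetical helper: brute-force counting, not the Apriori candidate search.
    """
    counts = Counter()
    for t in transactions:
        # Every k-item subset of a transaction is one occurrence of that k-itemset.
        for itemset in combinations(sorted(set(t)), k):
            counts[frozenset(itemset)] += 1
    # Keep only the itemsets whose occurrence count meets the threshold.
    return {s: c for s, c in counts.items() if c >= min_count}


# Illustrative data only; the first two sets echo the examples in [0037].
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
]
frequent_1 = frequent_itemsets(transactions, 1, min_count=2)
frequent_2 = frequent_itemsets(transactions, 2, min_count=2)
```

With this sample data and a threshold of 2, every single item is a frequent 1-itemset, while only {A, B}, {A, C}, and {B, D} are frequent 2-itemsets.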

[0039] Candidate frequent k-itemsets: 2-itemsets that are obtained by joining sets and that may become frequent 2-itemsets are called candidate frequent 2-itemsets. Likewise, k-itemsets that are obtained by joining sets and that may become frequent k-itemsets are called candidate frequent k-itemsets.
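
As a rough sketch of the joining step described above, candidate k-itemsets can be formed by taking unions of frequent (k-1)-itemsets. The pruning check is the conventional Apriori step and is an assumption here, since the source text does not spell it out; the function name `candidate_itemsets` and the sample input are also hypothetical.

```python
from itertools import combinations


def candidate_itemsets(frequent_prev, k):
    """Join frequent (k-1)-itemsets into candidate frequent k-itemsets.

    Hypothetical helper; frequent_prev is an iterable of frozensets of size k-1.
    """
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Conventional Apriori pruning (assumed): a k-itemset can only be
            # frequent if every one of its (k-1)-item subsets is frequent.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Illustrative input: four frequent 2-itemsets.
frequent_2_example = {
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"B", "D"}),
}
candidates_3 = candidate_itemsets(frequent_2_example, 3)
```

Here {A, B, C} survives as a candidate frequent 3-itemset, whereas {A, B, D} and {B, C, D} are pruned because {A, D} and {C, D} are not frequent. A candidate still has to be counted against the data before it is confirmed as frequent.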

[0040] Confidence: Indicates how credible...
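
The definition above is cut off in the source. The sketch below therefore uses the conventional textbook definition as an assumption, not the patent's wording: the confidence of a rule A→B is the number of transactions containing both A and B divided by the number containing A. The function name `confidence` and the sample data are hypothetical.

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (conventional definition, assumed)."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    # Count transactions containing the antecedent, and those containing both sides.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return n_both / n_antecedent


# Illustrative data only.
sample = [
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
]
conf_a_to_b = confidence(sample, {"A"}, {"B"})
```

In this sample, A occurs in three transactions and A together with B in two, so the rule A→B holds in two of the three transactions that contain A.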

Abstract

The invention provides a novel parallelization scheme, specifically a parallelization method for association analysis algorithms, to overcome the defect that the conventional association rule analysis algorithm Apriori does not adapt well to parallelization. In the method, a master control node divides the computation task into blocks and distributes them to the subsidiary computation nodes; the subsidiary computation nodes compute in parallel to screen frequent itemsets; the nodes' results are then combined and returned for statistics, generating the frequent itemsets; finally, the frequent itemsets are distributed again and the nodes generate the rules. Because each computation node processes only part of the computation task, the method avoids the problem that massive data cannot be read into the memory of a single machine and that processing is too slow. The nodes work in parallel, which improves processing efficiency. Synchronization dependence between nodes, network communication overload, and frequent I/O (input/output) operations during computation are reduced, and database scanning and computation speed are improved.
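
The block, distribute, count, and combine flow in the abstract can be sketched on a single machine as follows. This is a simplified illustration under stated assumptions, not the patented scheme: threads in a `ThreadPoolExecutor` stand in for the subsidiary computation nodes, the threshold is applied once after the partial counts are merged, and all function names (`split_into_blocks`, `count_block`, `parallel_frequent_itemsets`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def split_into_blocks(transactions, n_blocks):
    """Master step: partition the transaction list into n_blocks roughly equal blocks."""
    return [transactions[i::n_blocks] for i in range(n_blocks)]


def count_block(block, k):
    """Worker step: count every k-itemset that occurs inside one block."""
    counts = Counter()
    for t in block:
        for itemset in combinations(sorted(set(t)), k):
            counts[frozenset(itemset)] += 1
    return counts


def parallel_frequent_itemsets(transactions, k, min_count, n_workers=2):
    """Count k-itemsets per block in parallel, merge the counts, then apply the threshold."""
    blocks = split_into_blocks(transactions, n_workers)
    # Threads stand in for compute nodes here (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(lambda block: count_block(block, k), blocks))
    # Combine step: add up the per-block counts into global counts.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {s: c for s, c in total.items() if c >= min_count}


# Illustrative data only.
data = [
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
]
result = parallel_frequent_itemsets(data, k=2, min_count=2, n_workers=2)
```

Each worker touches only its own block, so no block has to fit the whole data set in memory, and the only data exchanged is one table of counts per block. In a real cluster the worker function would run on separate machines and the merge would happen on the master control node.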

Description

Technical field

[0001] The present invention addresses the defect that Apriori, a classic association rule analysis algorithm, does not adapt well to parallelization. It designs a new parallelization method that reduces the synchronization dependence and network communication burden between nodes and improves database scanning and calculation speed. It belongs to the field of distributed computing and cloud computing.

Background technique

[0002] Cloud computing is an emerging business computing model that distributes computing tasks across a resource pool composed of a large number of computers, enabling application systems to obtain computing power, storage space, and software services as needed. It is the result of continuous evolution in data management technology. At the end of the last century, distributed processing, parallel processing, and grid computing were quite mature. They are the technical basis for the development of cloud computi...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/24532
Inventor: 张琳, 邵天昊, 王汝传, 韩志杰, 付雄, 季一木
Owner: NANJING UNIV OF POSTS & TELECOMM