Improved Canopy parallel algorithm implementation structure

Inactive Publication Date: 2015-11-18
FUDAN UNIV

AI Technical Summary

Problems solved by technology

The traditional structure adopts a Hadoop-style master-slave architecture, which has to move the global clusters multiple times, so the communication overhead is too large.



Embodiment Construction

[0029] The general processing principle is as follows. When a data point is judged to be a strong point of the cluster belonging to the current node, the point is assigned to that cluster and the node moves on to the next data point. Otherwise, the data point is passed to the next node, which starts working on it. This work is repeated until all the data are processed.
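The forwarding rule in [0029] can be sketched as a short sequential simulation. This is only an illustration under stated assumptions: the source does not say how a "strong" point is tested, so the sketch borrows the tight threshold `t2` of the standard Canopy algorithm, and the names `is_strong` and `assign_along_chain` are hypothetical. What happens to a point that no node accepts is also not described in the source; here it is simply collected in an `unclaimed` list.

```python
import math


def is_strong(point, center, t2):
    """Assumed test: a point is 'strong' if it lies within t2 of the center."""
    return math.dist(point, center) <= t2


def assign_along_chain(points, centers, t2):
    """Walk each point along the chain of nodes, one cluster center per node.

    The first node that judges the point strong keeps it; otherwise the
    point is handed to the next node in the chain.
    """
    strong = [[] for _ in centers]   # strong points kept by each node
    unclaimed = []                   # assumption: points no node accepted
    for point in points:
        for node_id, center in enumerate(centers):
            if is_strong(point, center, t2):
                strong[node_id].append(point)
                break                # go on to the next data point
        else:
            unclaimed.append(point)  # passed through every node once
    return strong, unclaimed


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 0.0)]
    points = [(0.5, 0.0), (9.0, 0.0), (5.0, 0.0)]
    print(assign_along_chain(points, centers, t2=2.0))
```

In the structure described by the source the nodes work concurrently; the nested loop above only mirrors the order in which a single point would visit them.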

[0030] The chain structure is composed of nodes connected in series that share the work equally, and the chain closes to form a ring. Each node continuously scans the messages sent by the previous node; any data stored in the data buffer waits there to be processed. If all the data in the data buffer have been processed and no new data arrive within a certain period of time, the node automatically sleeps until it is next woken up.
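The ring behaviour in [0030] can be illustrated with a small discrete-time simulation. It is a sketch, not the patented implementation: the class `RingNode`, the `idle_limit` parameter, and the `handle` callback are hypothetical names, time is modelled as calls to `tick`, and the sketch assumes that a sleeping node is woken by the arrival of new data in its buffer, which the source does not state explicitly.

```python
from collections import deque


class RingNode:
    """One node of the ring, with a data buffer and an idle timeout."""

    def __init__(self, node_id, idle_limit):
        self.node_id = node_id
        self.idle_limit = idle_limit  # idle ticks tolerated before sleeping
        self.buffer = deque()         # data waiting to be processed
        self.idle_ticks = 0
        self.asleep = False
        self.next_node = None         # set by build_ring

    def receive(self, item):
        """Store incoming data; assumed to wake the node if it is asleep."""
        self.buffer.append(item)
        self.idle_ticks = 0
        self.asleep = False

    def tick(self, handle):
        """Process one buffered item, or count one idle tick."""
        if self.asleep:
            return
        if self.buffer:
            item = self.buffer.popleft()
            self.idle_ticks = 0
            if not handle(self, item):        # not accepted here
                self.next_node.receive(item)  # forward along the ring
        else:
            self.idle_ticks += 1
            if self.idle_ticks >= self.idle_limit:
                self.asleep = True


def build_ring(n, idle_limit):
    """Create n nodes in series and close the chain into a ring."""
    nodes = [RingNode(i, idle_limit) for i in range(n)]
    for i, node in enumerate(nodes):
        node.next_node = nodes[(i + 1) % n]
    return nodes
```

A driver would call `tick` on every node in turn, passing a `handle` function that returns `True` when the node keeps the item. A real deployment would run the nodes as separate processes or threads with blocking queues; the tick loop only keeps the sketch deterministic.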

[0031] The strong clustering points of each cluster are stored under their respective nodes. The weakly clustered point sets and center points of all clusters are globally visible. The specific implementation method is to update t...
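The split between node-local and globally visible data in [0031] can be pictured with two small containers. This is a minimal sketch under assumptions: the source text is truncated before it describes how the global data are updated, so no update mechanism is reproduced here. The names `GlobalView`, `NodeLocal`, and `record`, and the thresholds `t1` (loose) and `t2` (tight), are hypothetical and borrowed from the standard Canopy formulation, in which a "weak" point lies within `t1` but outside `t2` of a center.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GlobalView:
    """Data visible to every node: cluster centers and weak point sets."""
    centers: list
    weak_points: list = field(default_factory=list)

    def __post_init__(self):
        if not self.weak_points:
            # one weak point set per cluster
            self.weak_points = [[] for _ in self.centers]


@dataclass
class NodeLocal:
    """Data kept under one node: the strong points of its own cluster."""
    cluster_id: int
    strong_points: list = field(default_factory=list)


def record(point, node, shared, t1, t2):
    """Store a point locally (strong), globally (weak), or not at all."""
    d = math.dist(point, shared.centers[node.cluster_id])
    if d <= t2:
        node.strong_points.append(point)  # stays under this node only
        return "strong"
    if d <= t1:
        shared.weak_points[node.cluster_id].append(point)  # globally visible
        return "weak"
    return "none"
```

Keeping strong points local means they never have to travel between nodes, which is consistent with the reduced communication traffic the source claims; only the smaller weak sets and the centers are shared.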


Abstract

The invention belongs to the field of algorithm parallelization and relates in particular to an improved Canopy parallel algorithm implementation structure. The structure adopts a chain node connection. When a data point is judged to be a strong point of the cluster belonging to the current node, the point is assigned to that cluster and the next data point is processed; otherwise, the data point is transmitted to the next node, which then starts working on it. This work is repeated until all the data are processed. The chain is formed from nodes connected in series that share the work equally, and it closes into a ring. Each node continuously scans the messages sent by the previous node; data stored in the data buffer waits there to be processed. If all the data in the data buffer have been processed and no new data arrive within a certain time, the node goes dormant automatically until it is next woken up. The strong clustering points of each cluster are stored under its own node, while the weak clustering point sets and center points of all clusters are globally visible. With this structure, communication traffic and power consumption can be greatly reduced and the operation speed is increased.

Description

Technical field

[0001] The invention belongs to the field of algorithm parallelization, and specifically concerns an improved Canopy parallel algorithm implementation structure.

Background technique

[0002] With the advent of the big data era, the growing demand for big data analysis places ever higher requirements on the performance of analysis algorithms. At the same time, with the development of multi-core technology, algorithm parallelization has become an efficient way to improve algorithm performance. As a result, big data analysis algorithms with higher parallel efficiency have gradually become the mainstream solution.

[0003] Among the many big data analysis algorithms, Canopy is a fairly general clustering algorithm. In business, cluster analysis is used to discover different customer groups and find customer markets. On the Internet, cluster analysis is used to classify documents on the Internet to restore information. In e-commerce, cluster analysis is also a very important aspect in...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/285; Y02D10/00
Inventor: 荆明娥, 周力君, 田书东, 谢志成, 尹颖颖, 王洁琳, 杨建伟
Owner: FUDAN UNIV