Cluster implementation method and system

A clustering implementation method and system, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the low processing efficiency of the prior art and its inability to cluster massive data, speeding up computation and shortening waiting time.

Active Publication Date: 2011-03-30
CHINA MOBILE COMM GRP CO LTD

AI Technical Summary

Problems solved by technology

[0009] The embodiments of the present invention provide a clustering implementation method and a clustering implementation system that use multiple nodes for parallel processing, solving the prior-art problems of being unable to cluster massive data and of low processing efficiency.




Embodiment Construction

[0037] The clustering implementation method and system provided by the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0038] Figure 1 is a flow chart of a clustering implementation method provided by an embodiment of the present invention. The method includes the following steps:

[0039] Step S101: the master control node determines a core sample from the samples in the sample database that are not yet marked as belonging to a cluster, marks each sample in the ε-neighborhood of the core sample as belonging to the current cluster, and stores each sample in that ε-neighborhood in the candidate queue.

[0040] Step S102: the master control node shards the candidate samples in the candidate queue and distributes the shards to at least two computing nodes.

[0041] Step S103: each computing node separately determines whether each sample in the a...
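The flow of steps S101 to S103 can be illustrated with a minimal Python sketch. It is not the patented implementation: a thread pool stands in for the computing nodes, the round-robin `shard` policy and the names `cluster`, `check_shard`, and `neighborhood` are hypothetical, and the step that feeds the neighbors of newly found core samples back into the candidate queue follows ordinary DBSCAN expansion, which is an assumption because the source text is truncated at S103.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def neighborhood(samples, i, eps):
    """Indices of all samples within eps of sample i (sample i included)."""
    return [j for j, s in enumerate(samples) if dist(samples[i], s) <= eps]


def shard(items, n_shards):
    """Round-robin split of a list into n_shards slices (assumed policy)."""
    return [items[k::n_shards] for k in range(n_shards)]


def check_shard(samples, shard_ids, eps, min_pts):
    """Computing-node work (S103): for each assigned candidate, return its
    eps-neighborhood if the candidate turns out to be a core sample."""
    found = {}
    for i in shard_ids:
        nbrs = neighborhood(samples, i, eps)
        if len(nbrs) >= min_pts:
            found[i] = nbrs
    return found


def cluster(samples, eps, min_pts, n_nodes=2):
    """Master-node loop. Returns one label per sample; None means unmarked."""
    labels = [None] * len(samples)
    cluster_id = 0
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        for seed in range(len(samples)):
            if labels[seed] is not None:
                continue
            nbrs = neighborhood(samples, seed, eps)
            if len(nbrs) < min_pts:
                continue  # not a core sample, stays unmarked for now
            # S101: mark the eps-neighborhood and fill the candidate queue.
            queue = []
            for j in nbrs:
                if labels[j] is None:
                    labels[j] = cluster_id
                    queue.append(j)
            while queue:
                # S102: shard the candidate queue across the computing nodes.
                shards = shard(queue, n_nodes)
                results = pool.map(
                    check_shard,
                    [samples] * n_nodes, shards,
                    [eps] * n_nodes, [min_pts] * n_nodes,
                )
                # Assumed continuation: neighbors of new core samples join
                # the current cluster and form the next candidate queue.
                queue = []
                for found in results:
                    for new_nbrs in found.values():
                        for j in new_nbrs:
                            if labels[j] is None:
                                labels[j] = cluster_id
                                queue.append(j)
            cluster_id += 1
    return labels
```

In this sketch only the master mutates `labels`; the workers return neighborhoods and never write shared state, which keeps the parallel step free of locking.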


Abstract

The invention discloses a clustering implementation method and system. In the method, a master control node shards the candidate samples in a candidate queue, and at least two computing nodes determine in parallel, according to a preset ε-neighborhood and minimum density, whether each sample in their allocated shard is a core sample. Because the computing nodes work in parallel, the cluster to which each sample in the sample database belongs is marked more quickly. The invention also discloses another clustering implementation method and system. In that method, a master control node shards the samples in a sample database that are currently unmarked and distributes the shards to at least two computing nodes; the computing nodes process the candidate samples in a candidate queue in parallel; and merge nodes combine the processing results obtained from the computing nodes. Because each computing node processes only part of the samples, the problem that massive data cannot be processed by a single computer is solved, and because the massive data can be processed in parallel by multiple computing nodes and multiple merge nodes, processing efficiency is greatly improved.
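The abstract does not say how the merge nodes combine the computing nodes' results. As one possible reading, the sketch below assumes each computing node reports partial clusters as sets of sample IDs, and that partial clusters sharing any sample belong together. Both the data shape and the merge rule are assumptions, as is the name `merge_partial_clusters`; a stricter rule would merge only when the shared sample is a core sample.

```python
def merge_partial_clusters(partials):
    """Merge partial clusters (sets of sample IDs) that share a sample.

    Uses a union-find over the partial clusters. The "shared sample" merge
    rule is an assumption, not taken from the source.
    """
    parent = list(range(len(partials)))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner = {}  # sample ID -> index of the first partial cluster holding it
    for idx, members in enumerate(partials):
        for sample in members:
            if sample in owner:
                parent[find(idx)] = find(owner[sample])
            else:
                owner[sample] = idx

    merged = {}
    for idx, members in enumerate(partials):
        merged.setdefault(find(idx), set()).update(members)
    return sorted(sorted(m) for m in merged.values())
```

For example, `merge_partial_clusters([{1, 2, 3}, {3, 4}, {7, 8}])` joins the first two partial clusters through the shared sample 3 and leaves the third untouched.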

Description

Technical field

[0001] The invention relates to the field of data mining, in particular to a method for realizing clustering of massive sample data and a corresponding system.

Background art

[0002] In the current field of data mining, existing clustering algorithms can be divided into several categories, including partition-based methods, hierarchical-based methods, density-based methods, grid-based methods, and model-based methods.

[0003] When performing data mining, it is necessary to calculate and analyze all the data one by one, and the time complexity of the algorithm is high. Massive data is a challenge to various clustering algorithms. Most of the existing clustering algorithms are still in the laboratory stage. For massive data, some algorithms either cannot be processed effectively, or the processing efficiency is very low.

[0004] The DBSCAN algorithm is a clustering algorithm based on spatial density. The algorithm divides regions with sufficiently h...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 徐萌, 高丹, 邓超, 罗治国, 周文辉, 孙少陵, 何清, 赵卫中, 马慧芳
Owner: CHINA MOBILE COMM GRP CO LTD