Method and system for clustering optimization based on Canopy algorithm

A clustering optimization method, applied in computing, special data processing applications, instruments, etc. It addresses the lack of a reference basis for value selection and the influence of that selection, and it improves calculation efficiency, reduces the number of comparisons, and eliminates that influence.

Status: Inactive; Publication Date: 2015-11-25
INST OF ACOUSTICS CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

The disadvantage is that K-means clustering requires the user to specify the number of clusters in advance. The selection of k is generally based on some empirical values and multiple experiment...


Examples

Embodiment

[0054] Figure 1 is the overall flow chart of the method of the present invention, which is divided into two main steps:

[0055] 1) Use the simple, computationally cheap Canopy clustering method to compute object similarity and place similar objects in a subset; such a subset is called a Canopy. A series of these calculations yields several Canopies. Canopies may overlap, but every object belongs to at least one Canopy. This stage can be regarded as data preprocessing. After Canopy clustering of the data set is complete, the result resembles Figure 2:

[0056] 2) The K-means clustering algorithm is applied within each Canopy, and no similarity calculations are performed between objects that do not belong to the same Canopy.

[0057] The main idea of generating Canopies: initially, suppose we have a set of points S and two preset distance thresholds, T1 and T2 (T1 > T2). Then select a point and calculate the distance between it and the other points in S (a very low-co...
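The paragraph above is cut off, so the following is only a minimal sketch of Canopy generation as it is commonly described, not necessarily the patent's exact procedure. It assumes Euclidean distance, takes the first remaining point as each new center, and treats T1 as the loose threshold (membership) and T2 as the tight threshold (removal from the candidate list). The function name `canopy` and its parameters are illustrative and do not come from the source.

```python
import math


def canopy(points, t1, t2, dist=math.dist):
    """Group points into possibly overlapping canopies.

    Returns a list of (center, members) pairs. Assumes t1 > t2.
    """
    if t1 <= t2:
        raise ValueError("expected t1 > t2")
    remaining = list(points)  # points that may still seed or join a canopy
    canopies = []
    while remaining:
        # Assumption: the first remaining point becomes the next center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = dist(center, p)
            if d < t1:
                members.append(p)  # within the loose threshold: joins this canopy
            if d >= t2:
                still_remaining.append(p)  # not within the tight threshold: stays available
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (3, 0), (10, 0), (11, 0)]
    for center, members in canopy(data, t1=4, t2=2):
        print(center, members)
```

In this sketch a point leaves the candidate list only when it lies within T2 of some center, and T2 < T1 means it has already joined that center's canopy. Every point therefore ends up in at least one Canopy, while points between T2 and T1 can appear in several.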


Abstract

The invention provides a method for clustering optimization based on the Canopy algorithm. The method comprises the following steps: step (101), grouping all original data with the Canopy algorithm to obtain N Canopy sets and the center of each Canopy set; and step (102), taking the number N of Canopy sets as the number of partitions k for the K-means clustering algorithm, taking the center of each Canopy set as the initial center of the corresponding cluster, running the K-means clustering algorithm on all original data with the determined cluster number k and cluster centers, and outputting the clustering optimization result. In the method and system, the Canopy clustering algorithm serves as preprocessing for the K-means clustering algorithm and is used to find a proper k value and cluster centers. The running time of the whole clustering process is greatly reduced, the computational efficiency of the algorithm is improved, and the fault tolerance of the algorithm is increased.
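Step (102) can be illustrated with a small sketch of K-means that starts from externally supplied centers. This is a plain Lloyd-style iteration written for illustration, not the patent's implementation: Euclidean distance, the stopping rule (centers stop changing or `max_iter` is reached), and the name `kmeans_seeded` are all assumptions. The `centers` argument stands in for the Canopy centers from step (101), so k is simply `len(centers)`.

```python
import math


def kmeans_seeded(points, centers, max_iter=100):
    """Lloyd-style K-means with k = len(centers) and the given initial centers.

    Returns (final_centers, labels), where labels[i] is the cluster index of points[i].
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest current center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            cluster = [p for p, lab in zip(points, labels) if lab == j]
            if cluster:
                new_centers.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        if new_centers == centers:
            break  # assumed stopping rule: centers no longer move
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (3, 0), (10, 0), (11, 0)]
    canopy_centers = [(0, 0), (10, 0)]  # hypothetical output of the Canopy step
    print(kmeans_seeded(data, canopy_centers))
```

Because both k and the starting centers are fixed by the preprocessing step, the sketch needs no random initialization and no repeated trials over different k values.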

Description

Technical field

[0001] The invention relates to a clustering method in the field of data mining, and in particular to a clustering optimization method and system based on the Canopy algorithm.

Background technique

[0002] With the rapid development of computer technology, the amount of data has grown exponentially. How to find hidden, previously unknown, and potentially valuable information in a large amount of data has become a problem of increasing concern. This is how data mining came about. Cluster analysis is a very important part of it. Clustering is the process of dividing a group of data into classes so as to minimize the distance within each class and maximize the distance between classes; that is, the data in the same class are as similar as possible, and the data in different classes are as different as possible.

[0003] K-means clustering is a typical distance-based exclusive partitioning method: given a data set of n objects, it can con...


Application Information

IPC(8): G06F17/30
Inventor: 韩锐, 崔创雄
Owner INST OF ACOUSTICS CHINESE ACAD OF SCI