Clustering algorithm based on minimal spanning tree

A clustering algorithm based on spanning-tree technology, applied in computing, computer components, instruments, etc., that addresses problems such as data-volume reduction, expensive running time, and sensitivity to input parameters, and achieves good stability.

Inactive Publication Date: 2017-08-25
SHANGHAI NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

The hierarchical process of a hierarchy-based clustering algorithm makes the clustering process clear, but the time cost of the algorithm is very high.
To address the time complexity, Zhang et al. proposed the BIRCH agglomerative clustering algorithm, which compresses data through clustering features and clustering feature trees. This method reduces the amount of data to be processed, and the compressed data still carry all the information needed by the BIRCH algorithm. However, the algorithm is only applicable to spherical data sets.
The CURE algorithm proposed by Guha et al. can identify data of many complex shapes and handles outliers well. However, the algorithm is particularly sensitive to its input parameters.



Examples


Embodiment

[0074] (1) Artificially generate a data set D = {d1, d2, ..., d90} containing 90 data points. The number of categories in this data set is K = 3, and each data point has 2 attribute dimensions. The specific attributes of the data points are listed below:

[0075] d1 (0.6497, 1.7818), d2 (1.6068, 1.0395), d3 (1.9584, 0.6588), d4 (1.8344, 0.3428), d5 (0.9730, 0.8138), d6 (0.6096, 1.4670), d7 (0.0519, -0.1745), d8 (1.0918, 0.9471), d9 (1.7432, 1.6060), d10 (0.7359, -0.1005), d11 (0.8657, 1.5185), d12 (1.4774, 1.5292), d13 (1.6692, 0.3167), d14 (-0.0975, -0.1654), d15 (1.1549, 1.3355), d16 (1.1517, 1.6261), d17 (0.8829, 1.4484), d18 (1.6915, 1.5909), d19 (0.7179, 0.0843), d20 (0.3600, 1.4522), d21 (1.2138, 0.5507), d22 (1.3562, 0.5652), d23 (0.8315, -0.2052), d24 (1.6167, 1.8522), d25 (-0.1759, 1.1356), d26 (0.1674, 1.7774), d27 (-1.3797, 2.1704), d28 (1.4039, 0.1191), d29 (0.2324, 1.4981), d30 (0.8039, 0....


Abstract

The invention discloses a clustering algorithm based on a minimal spanning tree. The clustering algorithm includes the following steps: S1, inputting a data set to be clustered and the number of categories K; S2, constructing a minimal spanning tree of the data set; S3, traversing the minimal spanning tree in descending order of the priority values of its nodes; S4, merging nodes; S5, calculating the attributes of each new node after merging; S6, judging whether to end the current traversal; and S7, judging whether to end the clustering. According to the invention, a skeleton of the data points is constructed using the minimal spanning tree, multiple rounds of traversal and node merging are carried out according to the priority values of the nodes, and the clustering does not end until the total number of nodes equals the number of categories K.
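The overall shape of steps S1, S2 and S7 can be illustrated with a minimal Python sketch. It is not the patented procedure: this excerpt does not define the node priority values or the attributes of merged nodes, so the sketch swaps in a simpler, well-known merge rule (join the two groups linked by the shortest remaining spanning-tree edge, which is equivalent to single-linkage clustering). The function names `prim_mst` and `cluster_by_mst` and the use of Euclidean distance are assumptions made for illustration.

```python
import math

def prim_mst(points):
    """Build a minimal spanning tree over the complete Euclidean graph.

    Returns a list of edges (weight, i, j). Uses Prim's algorithm in O(n^2).
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges

def cluster_by_mst(points, k):
    """Merge groups along MST edges until exactly k groups remain.

    ASSUMPTION: merge order is shortest MST edge first. The patent instead
    orders merges by node priority values that this excerpt does not define.
    Expects 1 <= k <= len(points).
    """
    root = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    count = len(points)
    for _, i, j in sorted(prim_mst(points)):
        if count == k:          # stop once the node count equals K (cf. S7)
            break
        root[find(j)] = find(i)  # MST edges never join an already-merged pair
        count -= 1

    # Relabel group representatives as 0, 1, 2, ... in order of appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]
```

Calling `cluster_by_mst(points, 3)` on a list of 2-D tuples returns one integer label per point. Stopping when the group count reaches K mirrors the termination condition stated in the abstract; everything between S3 and S6 would need the full patent text to reproduce.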

Description

Technical field

[0001] The invention relates to the fields of machine learning and data mining, and in particular to a clustering algorithm based on minimum spanning trees.

Background

[0002] Clustering divides a set of data points into multiple clusters of similar objects according to a similarity principle, so that data belonging to the same cluster are similar to each other and data in different clusters differ from each other. Clustering has been applied very successfully in many fields, such as business, biology, geography, and the Internet. In the current era of big data, research on fast and accurate clustering algorithms is particularly important. So far, typical clustering algorithms include hierarchy-based, partition-based, density-based, and grid-based clustering algorithms.

[0003] The k-means clustering algorithm proposed by Forgy bel...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/23213
Inventor: 马燕, 吕晓波, 张相芬, 李顺宝, 张玉萍
Owner: SHANGHAI NORMAL UNIVERSITY