Parameterization-free clustering algorithm and system based on minimum spanning tree

A clustering algorithm and spanning-tree technique in the field of clustering algorithms. It addresses three problems: the algorithm's dependence on empirical parameters, the dependence of clustering accuracy and computational complexity on initial parameter choices, and the handling of large-scale data sets.

Pending Publication Date: 2020-04-24
NANJING NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

However, the K-means algorithm also has obvious defects: the accuracy and computational complexity of clustering depend heavily on the choice of the initial number of clusters k and the initial cluster centers.
In many practical application scenarios, the data set is not only large in scale but also changing dynamically, so the number...



Examples


Embodiment 1

[0099] To evaluate the beneficial effects of the present invention, the experiment compared three algorithms on two-dimensional random data sets of different shapes. The algorithms were the traditional K-means algorithm, the MSTCluster algorithm based on the minimum spanning tree, and the MNC algorithm of the present invention. The results are shown in Figure 5. The traditional K-means is a parametric algorithm. Besides the data set to be clustered, its input requires additional parameters, namely the number of clusters k and the initial cluster centers. Compared with the traditional K-means, the minimum-spanning-tree-based MSTCluster is a non-parametric clustering algorithm, because it does not need the number of clusters k or the initial cluster centers. However, it still needs one input parameter, the adjustment factor, to determine the pruning threshold, so it is not completely parameter-free. However...



Abstract

The invention provides a parameterization-free clustering algorithm and system based on a minimum spanning tree. The method comprises the following steps:
1. Abstract the data set to be clustered into a weighted complete graph (WCG). Vertices represent data vectors, and weighted edges represent the similarity between data points.
2. Convert the WCG into a minimum spanning tree (MST) that connects all vertices.
3. Cluster the one-dimensional space of MST edge weights with the K-means algorithm using k = 2 to obtain a pruning threshold.
4. Prune and noise-filter the MST. Each resulting connected component is a cluster.

The algorithm thus converts the original high-complexity clustering problem, posed in a multi-dimensional space with an unknown number of clusters, into a low-complexity clustering problem in a one-dimensional space with exactly two clusters. The method overcomes the defects of the K-means algorithm and achieves truly non-parameterized clustering. It improves clustering efficiency while shortening clustering time and eliminates the algorithm's dependence on empirical parameters.
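The four steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' MNC implementation. Euclidean distance is assumed as the edge weight, and a dense Prim's algorithm builds the MST. The pruning threshold is assumed to be the midpoint between the two 1-D centroids found by k = 2 K-means. Because the excerpt does not describe the noise filter, singleton components are simply labeled as noise (-1). All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Edge = Tuple[int, int, float]  # (u, v, weight)


def mst_complete_graph(points: List[Point]) -> List[Edge]:
    """Dense Prim's algorithm on the implicit weighted complete graph, O(n^2).

    Assumption: the edge weight is the Euclidean distance between two points.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting i to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Relax the edges from u to every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def two_means_threshold(weights: List[float], max_iter: int = 100) -> float:
    """1-D K-means with k = 2 (Lloyd iterations) on the MST edge weights.

    Returns the decision boundary between the two centroids, used here as
    the pruning threshold (an assumed rule).
    """
    lo, hi = min(weights), max(weights)  # initial centroids
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        short = [w for w in weights if w <= mid]
        long_ = [w for w in weights if w > mid]
        if not long_:  # all weights equal: nothing to split
            break
        new_lo = sum(short) / len(short)
        new_hi = sum(long_) / len(long_)
        if (new_lo, new_hi) == (lo, hi):  # assignments stable: converged
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def mst_cluster(points: List[Point]) -> List[int]:
    """Return a cluster id per point; -1 marks points filtered as noise."""
    n = len(points)
    edges = mst_complete_graph(points)
    if not edges:
        return [-1] * n
    threshold = two_means_threshold([w for _, _, w in edges])

    # Union-find over the MST edges that survive pruning.
    root = list(range(n))

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v, w in edges:
        if w <= threshold:
            root[find(u)] = find(v)

    components: Dict[int, List[int]] = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)

    labels = [-1] * n
    next_id = 0
    for members in components.values():
        if len(members) > 1:  # assumption: singleton components are noise
            for i in members:
                labels[i] = next_id
            next_id += 1
    return labels
```

For example, `mst_cluster([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (5.0, 9.0)])` returns `[0, 0, 0, 1, 1, 1, -1]`. That is two clusters of three points each, with the isolated point (5.0, 9.0) marked as noise. The sketch uses dense Prim's algorithm because the complete graph has n(n-1)/2 edges. This approach runs in O(n^2) time and never stores those edges explicitly.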

Description

Technical field
[0001] The invention relates to a non-parameterized clustering algorithm based on the minimum spanning tree and belongs to the field of clustering algorithms.
Background art
[0002] Clustering is a very effective unsupervised machine learning technique and an important branch of data mining. Traditional clustering algorithms can be roughly divided into partition-based, hierarchical, density-based, grid-based, and model-based methods. K-means is a partition clustering algorithm with many advantages: a simple principle, easy description, high time efficiency, and suitability for large-scale data. It is therefore widely used in many fields. However, it also has obvious defects: the accuracy and computational complexity of clustering depend heavily on the choice of the initial number of clusters k and the initial cluster centers...


Application Information

IPC (8th edition): G06K9/62
CPC: G06F18/22, G06F18/23213
Inventors: 吴怀岗 (Wu Huaigang), 陈靖飒 (Chen Jingsa), 窦万峰 (Dou Wanfeng), 程开丰 (Cheng Kaifeng)
Owner: NANJING NORMAL UNIVERSITY