Clustering method, system and medium for automatically confirming cluster number based on coefficient of variation

A clustering technology based on the coefficient of variation, applied in the fields of instruments, character and pattern recognition, computer components, etc. It addresses the problems of manually setting the number of clusters and of poorly chosen initial centroids, and improves the quality of clustering.

Active Publication Date: 2018-12-21
UNIV OF JINAN
Cites: 6 | Cited by: 11

AI Technical Summary

Problems solved by technology

[0004] To overcome the deficiencies of the prior art, the present invention provides a clustering method, system and medium that automatically confirm the number of clusters based on the coefficient of variation. It addresses two defects of the traditional k-means++ clustering algorithm: the number of clusters must be set manually, and the initial centroids may be poorly chosen. The invention uses the coefficient of variation and a density index to improve the partition-based k-means++ clustering algorithm, so that the number of clusters need not be set manually while the accuracy of the clustering results is preserved.
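
For reference, the coefficient of variation in its standard statistical form is the ratio of the standard deviation to the mean of a sample. The formula below is only that textbook definition, with x_1, ..., x_n as a generic sample; how the invention applies it to intra-cluster and inter-cluster quantities is not stated in this excerpt.

```latex
\mathrm{CV} = \frac{\sigma}{\mu}, \qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}
```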



Examples


Detailed Description of Embodiments

[0059] It should be pointed out that the following detailed description is exemplary and intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0060] It should be noted that the terminology used here is only for describing specific implementations and is not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly dictates otherwise, the singular form is intended to include the plural. It should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they indicate the presence of features, steps, operations, means, components and/or combinations thereof.

[0061] As shown in Figure 1, the clustering method that automatically confirms the number of clusters based o...



Abstract

The invention discloses a clustering method, system and medium for automatically confirming the number of clusters based on the coefficient of variation. The density value of each data point in a data set is calculated, a density index is calculated from the density value, and the data point with the largest density index is selected as the first cluster center. The shortest distance between each data point and the existing cluster centers is then calculated, the probability that each data point is selected as a cluster center is calculated from that shortest distance, and the next cluster center is preselected by the roulette-wheel method; this is repeated until the set number of cluster centers has been selected. The selected initial cluster centers are used for k-means clustering to generate a corresponding number of clusters. The average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation are calculated, and their difference is compared with a set value: if the difference is less than the set value, the two clusters with the minimum inter-cluster coefficient of variation are merged. This is repeated until the difference is greater than or equal to the set value, at which point the clustering result is output.
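
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the excerpt does not define the density value, the density index, or the two coefficients of variation, so the sketch assumes a neighbour count within a hypothetical `radius` as the density index, squared-distance weights as in standard k-means++, and caller-supplied `intra_cv` and `inter_cv` functions. The function names and the sign of the difference are also assumptions.

```python
import math
import random


def seed_centers(points, k, radius, rng):
    """Return indices of k initial centers: densest point first, then roulette-wheel draws."""
    n = len(points)
    # ASSUMPTION: density index = number of points within `radius` (not defined in the excerpt).
    density = [sum(math.dist(p, q) <= radius for q in points) for p in points]
    chosen = [max(range(n), key=density.__getitem__)]
    while len(chosen) < k:
        # Shortest distance from each point to the centers chosen so far.
        d_min = [min(math.dist(p, points[c]) for c in chosen) for p in points]
        # ASSUMPTION: squared-distance weights, as in standard k-means++.
        weights = [d * d for d in d_min]
        # Roulette-wheel selection: index i is drawn with probability weights[i] / sum(weights).
        chosen.append(rng.choices(range(n), weights=weights)[0])
    return chosen


def centroid(cluster):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def kmeans(points, centers, n_iter=100):
    """Plain Lloyd iterations from the given centers; returns one label per point."""
    centers = list(centers)
    labels = []
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to become empty.
            updated.append(centroid(members) if members else old)
        if updated == centers:
            break
        centers = updated
    return labels


def merge_until_separated(points, labels, intra_cv, inter_cv, threshold):
    """Merge the pair with the smallest inter-cluster CV while the gap is below `threshold`."""
    labels = list(labels)
    while len(set(labels)) > 1:
        clusters = {c: [p for p, lab in zip(points, labels) if lab == c]
                    for c in sorted(set(labels))}
        avg_intra = sum(intra_cv(m) for m in clusters.values()) / len(clusters)
        ids = list(clusters)
        pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
        a, b = min(pairs, key=lambda ab: inter_cv(clusters[ab[0]], clusters[ab[1]]))
        # ASSUMPTION: gap = minimum inter-cluster CV minus average intra-cluster CV.
        if inter_cv(clusters[a], clusters[b]) - avg_intra >= threshold:
            break
        # Fold cluster b into cluster a and re-evaluate.
        labels = [a if lab == b else lab for lab in labels]
    return labels
```

A caller would deliberately over-estimate `k`, run `seed_centers` and `kmeans`, and then let `merge_until_separated` reduce the cluster count. Passing the two coefficient functions as arguments keeps the control flow of the abstract visible without committing to definitions that this excerpt does not give.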

Description

Technical field

[0001] The invention relates to a clustering method, system and medium for automatically confirming the number of clusters based on the coefficient of variation.

Background

[0002] With the rapid development of information technology, many sectors, such as commerce, enterprises, scientific research institutions and government departments, have accumulated massive amounts of data stored in different forms. These data often contain a variety of useful information that is difficult to obtain by relying only on database query and retrieval mechanisms and statistical methods, so data mining technology has also developed rapidly. Clustering analysis is an important research field in data mining and has been widely used in many applications, including pattern recognition, data analysis, image processing, and market research.

[0003] Clustering analysis technology is an unsupervised learning method, i...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23213
Inventor: 刘腾腾, 曲守宁, 张坤, 杜韬, 王凯, 郭庆北, 朱连江, 王钦
Owner UNIV OF JINAN