Density-based partitioning and clustering method for K center points in data mining

A clustering method in the field of data mining that addresses three problems of K-means: sensitivity to the initial cluster centers, the influence of those centers on the clustering result, and results that fall into a local optimum. The claimed effects are reduced computing time, stable clustering results, and improved computing efficiency.

Inactive Publication Date: 2015-07-08
无锡中科泛在信息技术研发中心有限公司

AI Technical Summary

Problems solved by technology

However, the original K-means algorithm has several defects: 1) it requires the user to supply the number of clusters K, which is chosen mainly from experience and is therefore difficult to determine; 2) it is sensitive to the initial cluster centers, whose selection affects both the clustering result and the running efficiency of the algorithm; 3) it is sensitive to abnormal data, which can cause the result to fall into a local optimum.
Another problem is that several initial center points may be selected from the same cluster: a point may have a relatively high density, yet another point in its cluster has already been selected as a center. Representative points should instead be chosen from the other clusters; otherwise the result easily falls into a local optimum.
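
The effect of placing two initial centers in the same cluster can be illustrated with a minimal sketch. The code below is not taken from the patent: it is a plain Lloyd-style K-means on hypothetical one-dimensional data, and the function name `kmeans_1d` and the sample points are assumptions chosen only for illustration.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain Lloyd iterations on 1-D data; returns (centers, sse)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    # Sum of squared distances from each point to its nearest center.
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


# Hypothetical data: three well-separated pairs of points.
points = [0, 1, 9, 10, 20, 21]

# One initial center per cluster.
good_centers, good_sse = kmeans_1d(points, [0, 9, 20])

# Two initial centers inside the first cluster.
bad_centers, bad_sse = kmeans_1d(points, [0, 1, 9])

print(good_centers, good_sse)
print(bad_centers, bad_sse)
```

With one seed per cluster the iterations settle on the three pair means. With two seeds in the first cluster, the remaining four points are merged under a single center and the iterations stop at a much larger squared error, which is the local optimum described above.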



Embodiment Construction

[0021] The present invention will be further described below in conjunction with specific drawings and embodiments.

[0022] As shown in Figure 5, the clustering method of the present invention, which aims to improve classification accuracy, stability, and convergence speed, comprises the following steps:

[0023] Step 1: provide the required data set and determine the number of clusters K;

[0024] In the embodiment of the present invention, the data set is $X = \{x_i \mid i = 1, 2, \dots, n\}$, where each data object has m-dimensional features; $C_j$ ($j = 1, 2, \dots, K$) denotes the K cluster categories, and $c_j$ ($j = 1, 2, \dots, K$) denotes the initial cluster centers.

[0025] Step 2, calculate the density of all data objects in the data set, and calculate the average density of the data set according to the density of the obtained data objects;

[0026] In the embodiment of the present invention, the density of data objects in the data set is

[0027] density ...


Abstract

The invention relates to a density-based partitioning clustering method for K center points in data mining. The method comprises the following steps: 1) a required data set is given and the number of clusters K is determined; 2) the density of each data object and the average density are calculated; 3) the minimum density distance value of each data object in the data set is calculated; 4) the minimum density distance values are sorted in descending order and, according to the determined number of clusters K, the K data objects with the largest minimum density distance values whose densities are larger than the average density are selected as the initial cluster centers; 5) each data object in the data set is assigned to its closest initial cluster center, yielding the clustering result. The method selects high-quality center points and does not need the subsequent iterative update steps of the K-means algorithm, which lowers computational complexity and improves classification accuracy, stability, and operating efficiency.
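
The five steps of the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented formulas, which are not reproduced in this text. Assumed here: Euclidean distance; density taken as the number of other objects within a hypothetical `radius` parameter; the "minimum density distance" taken as the distance to the nearest object of higher density, with the farthest distance used for the top-ranked object; and ties in density broken by index. The function names and the sample data are also hypothetical.

```python
import math


def select_centers(points, k, radius):
    """Pick k initial centers; returns their indices into `points`."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 2 (assumed definition): density = neighbours within `radius`.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
               for i in range(n)]
    avg_density = sum(density) / n

    def outranks(j, i):
        # Assumed tie-break: equal densities are ordered by index.
        return density[j] > density[i] or (density[j] == density[i] and j < i)

    # Step 3 (assumed definition): distance to the nearest higher-ranked object.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if j != i and outranks(j, i)]
        # The top-ranked object has nothing above it; use its farthest distance.
        delta.append(min(higher) if higher else max(dist[i]))

    # Step 4: keep objects denser than average, sort by delta descending, take k.
    candidates = [i for i in range(n) if density[i] > avg_density]
    ranked = sorted(candidates, key=lambda i: delta[i], reverse=True)
    return ranked[:k]


def assign(points, center_idx):
    """Step 5: label each object with the position of its nearest center."""
    return [min(range(len(center_idx)),
                key=lambda c: math.dist(p, points[center_idx[c]]))
            for p in points]


# Hypothetical data: three compact squares plus one isolated outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1),
          (5, 30)]

center_idx = select_centers(points, k=3, radius=1.5)
labels = assign(points, center_idx)
print(center_idx, labels)
```

In this sketch the outlier has a large minimum density distance but a density below the average, so the filter in step 4 keeps it from becoming a center. It is still assigned to its nearest center in step 5, and no iterative update follows.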

Description

Technical field

[0001] The invention relates to a clustering method, in particular to a density-based K center point partitioning clustering method in data mining, and belongs to the technical field of cluster analysis.

Background

[0002] Data mining is one of the hot topics in computer research today. Cluster analysis, an unsupervised machine learning method, automatically divides a set of data objects into clusters so that objects in the same cluster are highly similar under some measure, while objects in different clusters have low similarity. Cluster analysis is widely used in fields such as machine learning, data mining, speech recognition, image segmentation, business analysis, and bioinformatics processing. At present, traditional clustering algorithms fall into five main categories: partition-based clustering algorithms, hierarchical clustering algorithms, density-based clus...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 袁启龙, 史海波, 周晓锋
Owner: 无锡中科泛在信息技术研发中心有限公司