
Improved K-means clustering algorithm based on density radius

A K-means clustering technique based on density radius, applied in the field of clustering algorithms, that addresses problems such as inaccurate selection of the k value and sensitivity to noise and outliers.

Status: Inactive. Publication date: 2018-09-18
成都康乔电子有限责任公司 +1

AI Technical Summary

Problems solved by technology

[0003] The technical problem to be solved by the present invention is to provide an improved K-means clustering algorithm based on density radius that addresses the shortcomings of the existing K-means clustering algorithm: convergence to local optima, sensitivity to noise and outliers, and inaccurate selection of the k value.


Examples


Embodiment

[0055] This embodiment provides an improved K-means clustering algorithm based on density radius, comprising the following steps:

[0056] 1. Data set preparation: assume the data set contains m sample points, each of dimension v, where v ∈ Z*. The data set is denoted T = {n_1, n_2, ..., n_m}, where n_i is the i-th sample point and m is the number of sample points. The coordinates of sample point n_i are written (x_{i,1}, x_{i,2}, ..., x_{i,v}), where v is the dimension;

[0057] 2. Data preprocessing: use the LOF (local outlier factor) method to remove noise and outliers;

[0058] 3. Normalize the data: divide each coordinate of a sample point by the maximum value of that coordinate over all sample points in the corresponding dimension. The calculation is given in formula (1), so that each normalized coordinate satisfies x'_{i,j} ∈ [0,1]:

[0059] x'_{i,j} = x_{i,j} / \max_{1 \le k \le m} x_{k,j}    (1)

[0060] 4. After normalization, calculate the Euclidean distance between all sample points, where the i-th samp...
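Steps 2 to 4 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the neighbourhood size `k`, the threshold `1.5`, and all function names are hypothetical; ties at the k-th neighbour are ignored; the data are assumed to contain no duplicate points; and normalization assumes non-negative coordinates with a positive maximum in every dimension.

```python
import math

def pairwise_distances(points):
    """Step 4: symmetric m x m matrix of Euclidean distances."""
    m = len(points)
    dist = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist

def lof_scores(points, k=3):
    """Step 2: local outlier factor of every point (k is an assumed parameter).

    Simplification: exactly k neighbours are used, so ties at the k-th
    neighbour are broken by index order. Requires len(points) > k.
    """
    m = len(points)
    dist = pairwise_distances(points)
    # Indices of the k nearest neighbours of each point, excluding itself.
    neighbours = [
        sorted((j for j in range(m) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(m)
    ]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(m)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(m):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(m)
    ]

def remove_outliers(points, k=3, threshold=1.5):
    """Keep only points whose LOF does not exceed an assumed threshold."""
    scores = lof_scores(points, k)
    return [p for p, s in zip(points, scores) if s <= threshold]

def normalize(points):
    """Step 3: divide each coordinate by the maximum of its dimension."""
    col_max = [max(col) for col in zip(*points)]
    return [tuple(x / mx for x, mx in zip(p, col_max)) for p in points]
```

A typical call order would be `clean = remove_outliers(data)`, then `norm = normalize(clean)`, then `dist = pairwise_distances(norm)`. The distance matrix costs O(m²) time and memory, so a production version for large data sets would likely need a spatial index or a vectorized library instead.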


Abstract

The invention relates to the field of clustering algorithms and discloses an improved K-means clustering algorithm based on density radius. It addresses three problems of the existing K-means clustering algorithm: convergence to a local optimum, high sensitivity to noise and outliers, and inaccurate selection of the k value. All sample points are ranked by density radius, and the sample point with the largest density radius is taken as an initial point; these steps are repeated until all initial points and the number of categories k have been selected, and the clustering operation is then started. Among the resulting category centroids, the two that are closest to each other are selected; their two categories are taken separately and treated as a two-way split, and the Bayesian score of this split is calculated. The two categories are then combined into one, and the Bayesian score after combination is calculated. Whether the two categories need to be combined is determined from the scores, and these steps are repeated until no further combination is needed. The clustering algorithm is suitable for big data clustering.
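The merge stage described in the abstract can be sketched as below. The abstract does not define its "Bayesian score", so this sketch assumes the Bayesian information criterion (BIC) of a hard-assignment spherical Gaussian model with one shared variance; that choice, the tie rule (merge when the scores are equal), and every function name are assumptions for illustration, not the patent's stated method.

```python
import math

def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def bic_score(clusters):
    """Assumed score: BIC of a spherical Gaussian model, higher is better."""
    n = sum(len(c) for c in clusters)
    d = len(clusters[0][0])
    k = len(clusters)
    # Total squared distance of every point to its own cluster centroid.
    sse = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    # Maximum-likelihood shared variance; the floor avoids log(0).
    var = max(sse / (n * d), 1e-12)
    log_lik = (
        sum(len(c) * math.log(len(c) / n) for c in clusters)
        - 0.5 * n * d * math.log(2 * math.pi * var)
        - sse / (2 * var)
    )
    # Free parameters: k-1 mixing weights, k*d centroid coordinates, 1 variance.
    n_params = (k - 1) + k * d + 1
    return log_lik - 0.5 * n_params * math.log(n)

def should_merge(a, b):
    """Merge when the single-cluster model scores at least as well."""
    a, b = list(a), list(b)
    return bic_score([a + b]) >= bic_score([a, b])

def merge_until_stable(clusters):
    """Repeatedly test the two nearest centroids; stop when no merge is needed."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        cents = [centroid(c) for c in clusters]
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: math.dist(cents[ij[0]], cents[ij[1]]))
        if not should_merge(clusters[i], clusters[j]):
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

Because only the nearest pair of centroids is tested in each round, the loop stops as soon as that pair fails the test, which mirrors the "repeated until no combination is needed" wording of the abstract. A different score definition would change which pairs merge.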

Description

Technical field

[0001] The invention relates to the field of clustering algorithms, in particular to an improved K-means clustering algorithm based on density radius.

Background

[0002] Clustering divides physical or abstract objects into several clusters according to the similarity between objects, so that data in the same cluster are highly similar and data in different clusters have low similarity. Clustering is an unsupervised learning method that groups unlabeled data without prior information. The K-means algorithm is the most commonly used partitioning algorithm in cluster analysis. It divides the data according to a chosen similarity measure so that the distance between each data point and the centroid of the cluster to which it belongs is as small as possible. The algorithm is widely used because of its simplicity and high efficiency. At the same time, it also has some defects, such as the n...
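For reference, the standard K-means procedure described in the background can be sketched as below. This is the baseline that the invention sets out to improve, not the improved algorithm itself; the function name `kmeans`, the explicit `init_centroids` argument, and the `max_iter` cap are assumptions for illustration.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Standard K-means (Lloyd's iteration) from caller-supplied start points."""
    centroids = [tuple(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

The result depends on `init_centroids`, and k must be fixed in advance: a poor starting choice can leave the iteration in a local optimum, which is one of the defects the invention targets.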


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/23213
Inventors: 万思思, 刘丹, 王永松, 伍功宇
Owner: 成都康乔电子有限责任公司