K-MEANS clustering method and system based on centroid median zone

A clustering method based on a centroid median zone, applied in fields such as instruments, character and pattern recognition, and computer parts. It addresses the difficulty K-MEANS has in converging on some data sets, the hard-to-control selection of the K value, and poor clustering results, with the aim of reducing overfitting and improving generalization.

Status: Inactive
Publication Date: 2020-11-03
青岛网信信息科技有限公司

AI Technical Summary

Problems solved by technology

The main shortcomings of the current K-Means algorithm are: 1) the K value is not easy to choose; 2) it is difficult to converge on non-convex data sets; 3) if the hidden categories are unbalanced, for example if their data volumes are seriously unbalanced or their variances differ, the clustering effect is poor; 4) because the method is iterative, the result obtained is only a local optimum; 5) it is comparatively sensitive to noise and outliers.

Examples

Embodiment 1

[0037] As shown in Figure 1, the K-MEANS clustering method based on the centroid median zone of the present invention comprises the following steps:

[0038] S1: Defining the centroid median zone: the centroid median zone is a zone set where the centroid wavers from side to side, that is, the region in which the difference between the distances to the centroids is less than a set threshold; this threshold is the minimum recognition threshold Y;
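
The zone test in S1 can be illustrated with a minimal sketch. It assumes that the "distance difference" is taken between a point's two nearest centroids and that distances are Euclidean; neither is stated in the source, and the function name `in_median_zone` is hypothetical.

```python
import math

def in_median_zone(point, centroids, y):
    """Return (is_in_zone, nearest_idx, second_idx) for one sample point.

    Assumption: the "distance difference" is measured between the point's
    two nearest centroids; y is the minimum recognition threshold Y.
    Needs at least two centroids.
    """
    dists = [math.dist(point, c) for c in centroids]
    order = sorted(range(len(centroids)), key=dists.__getitem__)
    nearest, second = order[0], order[1]
    return (dists[second] - dists[nearest]) < y, nearest, second

centroids = [(0.0, 0.0), (10.0, 0.0)]
print(in_median_zone((4.9, 0.0), centroids, y=0.5))  # distances 4.9 and 5.1
print(in_median_zone((2.0, 0.0), centroids, y=0.5))  # distances 2.0 and 8.0
```

Under these assumptions, a point almost equidistant from two centroids falls inside the zone, while a point clearly closer to one centroid does not.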

[0039] As shown in Figure 2, S2: selecting the minimum recognition threshold Y, which covers the following cases:

[0040] Case 1: A fixed value is given based on an understanding of the data, that is, prior experience;

[0041] Case 2: The threshold is incremented or decremented within a certain range according to the number of iterations of the algorithm;

[0042] Case 3: The default is biased toward certain categories, that is, preference clustering;
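
The three cases for choosing Y might be sketched as follows. The source does not give formulas, so the linear schedule in Case 2 and the preference rule in Case 3 are assumptions, and all function and parameter names are hypothetical.

```python
def threshold_fixed(y0):
    """Case 1: a fixed value chosen from prior experience."""
    return y0

def threshold_scheduled(y0, iteration, step, y_min, y_max):
    """Case 2: change Y with the iteration count, clamped to [y_min, y_max].

    A positive step increments Y per iteration, a negative step decrements it.
    The linear rule is an assumption; the source only says the change stays
    "within a certain range".
    """
    return max(y_min, min(y_max, y0 + step * iteration))

def pick_preferred(candidates, preferred):
    """Case 3: for a point in the zone, choose a preferred cluster if present.

    Returns None when no candidate is preferred, so the caller can fall back
    to another rule.
    """
    for idx in candidates:
        if idx in preferred:
            return idx
    return None

# Y decays from 1.0 by 0.25 per iteration and is held at the floor of 0.2.
print([threshold_scheduled(1.0, t, -0.25, 0.2, 1.0) for t in range(5)])
```

A shrinking Y narrows the zone over time, so fewer points are treated as ambiguous in later iterations; whether the patent intends this direction is not stated in the visible text.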

[0043] Such as ima...

Embodiment 2

[0057] The K-MEANS clustering system based on the centroid median zone of the present invention comprises the following modules:

[0058] A main control module, which implements the K-MEANS clustering method based on the centroid median zone and acts as the main control of the system;

[0059] A storage control module, used to control the transmission and storage of data;

[0060] An initialization module, which computes the initial centroids through the K-MEANS algorithm and initializes the minimum recognition threshold;

[0061] A calculation module, which calculates the distance from each sample point to each centroid and the minimum recognition threshold;

[0062] A centroid update module, which updates the centroids by comparing the two classifications with the minimum distance;

[0063] A determination module, which calculates the distance between the centroids before and after the update, the comparison of the two classifications with the minimum distance, and the output variance of the centroid middle zon...

Abstract

The invention relates to a K-MEANS clustering method and system based on a centroid median zone, and belongs to the technical field of data mining and cluster analysis. The method comprises the following steps: S1, defining a centroid median zone, which is a zone set where the centroid wavers from side to side, that is, the region in which the difference between a point's distances to the centroids is smaller than a set threshold, this threshold being the minimum recognition threshold Y; S2, selecting the minimum recognition threshold Y; S3, selecting the two classifications with the minimum distance; and S4, outputting a determined value of the variance. Points inside the centroid median zone are classified randomly. The method can reduce overfitting and improve generalization, and is a new improvement of the K-MEANS algorithm.
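
Steps S1 to S3 can be put together in a rough sketch of one possible reading. It is not the patented implementation: it assumes Euclidean distance, a zone test on the two nearest centroids, a uniform random choice between those two for points in the zone, and a stopping rule based on centroid movement (the variance output of S4 is not detailed in the visible text and is not modelled). The name `median_zone_kmeans` and its parameters are hypothetical.

```python
import math
import random

def median_zone_kmeans(points, centroids, y, max_iter=100, tol=1e-6, seed=0):
    """K-means variant in which points inside the median zone get a random label.

    Assumptions: the zone is tested on the two nearest centroids, the random
    choice is uniform between those two, and iteration stops once no centroid
    moves more than tol (or after max_iter passes).
    """
    rng = random.Random(seed)
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step with the median-zone rule.
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            order = sorted(range(len(centroids)), key=dists.__getitem__)
            a, b = order[0], order[1]
            in_zone = dists[b] - dists[a] < y
            labels[i] = rng.choice((a, b)) if in_zone else a
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        shift = max(math.dist(o, n) for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels

points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (9.0, 0.0), (9.5, 0.0), (10.0, 0.0)]
centroids, labels = median_zone_kmeans(points, [(0.0, 0.0), (10.0, 0.0)], y=0.5)
print(centroids, labels)
```

Because labels in the zone are drawn at random, the centroids may keep moving slightly from one pass to the next, which is why the sketch also caps the number of iterations with `max_iter`.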

Description

Technical field

[0001] The invention relates to a K-MEANS clustering method and system based on a centroid median zone, and belongs to the technical field of data mining and cluster analysis.

Background

[0002] Cluster analysis is a statistical analysis method for studying classification problems and an important method of data mining. The K-MEANS algorithm is a partition-based clustering algorithm. The main shortcomings of the current K-Means algorithm are: 1) the K value is not easy to choose; 2) it is difficult to converge on non-convex data sets; 3) if the hidden categories are unbalanced, for example if their data volumes are seriously unbalanced or their variances differ, the clustering effect is poor; 4) because the method is iterative, the result obtained is only a local optimum; 5) it is comparatively sensitive to noise and outliers.

Contents of the invention

[0003] Aiming at the abo...

Application Information

IPC(8): G06K9/62
CPC: G06F18/23213
Inventor: 周书田, 薛雁, 于海洋
Owner: 青岛网信信息科技有限公司