Data clustering method for rapidly determining clustering center

A data clustering method in the field of data analysis that determines clustering centers rapidly. It addresses three problems: clustering centers must be determined manually, results depend heavily on parameters, and clustering accuracy is low. It offers good applicability and scalability, reduced parameter sensitivity, and good clustering results.

Status: Inactive. Publication date: 2016-10-26
ZHEJIANG UNIV OF TECH
Cites: 0; cited by: 20

AI Technical Summary

Problems solved by technology

[0006] Existing data clustering methods have several shortcomings: cluster centers must be determined manually, clustering accuracy is low, clustering quality varies across data sets, and results depend heavily on parameters. To address them, the present invention proposes a density-based data clustering method that determines cluster centers quickly, achieves high accuracy, varies less in clustering quality across data sets, and depends less on parameters.

Embodiment Construction

[0040] The present invention will be further described below in conjunction with the accompanying drawings.

[0041] Referring to Figures 1 to 5, the clustering method for quickly determining cluster centers includes the following steps:

[0042] 1) Read the original data set and perform dominance analysis on it. Based on the result, select the corresponding distance calculation method and use it to compute the distance matrix of the entire data set. The process is as follows:

[0043] 1.1 If the input data set has p-dimensional numerical attributes and q-dimensional categorical attributes, the data set is classified as either numerically dominant or categorically dominant by comparing p and q.

[0044] 1.2 According to the result of the dominance analysis, the corresponding distance formula is applied to the data set, and the similarity distance matrix of the data...
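Steps 1.1 and 1.2 can be sketched as below. The excerpt does not give the distance formulas (paragraph [0044] is cut off), so `mixed_distance` is a hypothetical stand-in: Euclidean distance on the numerical attributes plus a count of categorical mismatches, with the smaller weight (0.5) given to whichever attribute type does not dominate. The tie rule in `dominance` (p == q counts as numerically dominant), the record layout, and all function names are likewise assumptions made for illustration.

```python
import math
from typing import Sequence

# Hypothetical record layout: (numerical attributes, categorical attributes).
Record = tuple[tuple[float, ...], tuple[str, ...]]


def dominance(p: int, q: int) -> str:
    """Step 1.1: compare the p numerical and q categorical attribute counts.
    Assumption: a tie (p == q) is treated as numerically dominant."""
    return "numeric" if p >= q else "categorical"


def mixed_distance(a: Record, b: Record, kind: str) -> float:
    """Hypothetical stand-in for the patent's distance formulas.
    numeric-dominant:     Euclidean + 0.5 * categorical mismatches
    categorical-dominant: categorical mismatches + 0.5 * Euclidean"""
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[0], b[0])))
    mismatches = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    if kind == "numeric":
        return euclid + 0.5 * mismatches
    return mismatches + 0.5 * euclid


def distance_matrix(data: Sequence[Record]) -> list[list[float]]:
    """Step 1.2: symmetric n x n distance matrix with a zero diagonal."""
    p, q = len(data[0][0]), len(data[0][1])
    kind = dominance(p, q)
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = mixed_distance(data[i], data[j], kind)
    return d
```

With two numerical attributes and one categorical attribute, the data set counts as numerically dominant, so two records that differ only in their category are 0.5 apart.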

Abstract

Provided is a data clustering method for rapidly determining clustering centers, comprising the steps of: 1) reading an original data set, selecting a corresponding distance calculation method through dominance analysis, and computing the distance matrix of the whole data set; 2) rapidly determining the clustering centers; and 3) selecting the optimal dc: 3.1 finding the maximum value dmax and the minimum value dmin in the similarity distance matrix and calculating the current dc value by setting a percent value; 3.2 after dc is selected and a clustering result is obtained, designing a Fitness function as the evaluation index; 3.3 employing a hill-climbing algorithm to select the optimal dc; and 3.4 outputting the optimal dc and its clustering result. The method offers higher accuracy, smaller variation in clustering quality across data sets, and lower parameter dependence.
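A minimal sketch of steps 2 and 3 follows. It assumes a density-peaks-style scheme, which is one common way a cutoff distance dc is used, but the abstract does not confirm it: a Gaussian local density rho, a distance delta to the nearest denser point, and centers chosen by the largest rho × delta. Several other details are also assumptions: dc = dmin + percent × (dmax − dmin), a caller-supplied number of clusters k (the patent's own rule for picking centers automatically is not described here), and hill-climbing moves of ±step on the percent value with step halving. The Fitness function is a hypothetical stand-in (closest pair of centers divided by the mean point-to-center distance), since its real definition is not given. All names are illustrative.

```python
import math

Matrix = list[list[float]]


def cutoff_distance(d: Matrix, percent: float) -> float:
    """Step 3.1 (assumed formula): dc = dmin + percent * (dmax - dmin),
    taken over the off-diagonal entries of the distance matrix."""
    off = [d[i][j] for i in range(len(d)) for j in range(len(d)) if i != j]
    dmin, dmax = min(off), max(off)
    return dmin + percent * (dmax - dmin)


def density_peaks(d: Matrix, dc: float, k: int) -> tuple[list[int], list[int]]:
    """Step 2 (assumed density-peaks scheme): returns (labels, centers)."""
    if dc <= 0 or k < 1:
        raise ValueError("dc must be positive and k at least 1")
    n = len(d)
    # Gaussian-kernel local density of each point.
    rho = [sum(math.exp(-((d[i][j] / dc) ** 2)) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # densest point: its largest distance
            continue
        # Nearest point with higher density (earlier in `order`).
        parent[i] = min(order[:rank], key=lambda j: d[i][j])
        delta[i] = d[i][parent[i]]
    # Hypothetical center rule: the k largest rho * delta. The densest point
    # is forced in so every other point has a labelled denser ancestor.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:k]
    if order[0] not in centers:
        centers[-1] = order[0]
    labels = [-1] * n
    for c_idx, c in enumerate(centers):
        labels[c] = c_idx
    for i in order:  # each point inherits its nearest denser neighbour's label
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


def fitness(d: Matrix, labels: list[int], centers: list[int]) -> float:
    """Step 3.2 stand-in (hypothetical): closest pair of centers divided by
    the mean point-to-own-center distance; higher is better."""
    if len(centers) < 2:
        return 0.0
    compact = sum(d[i][centers[labels[i]]] for i in range(len(d))) / len(d)
    separation = min(d[a][b] for a in centers for b in centers if a != b)
    return separation / (compact + 1e-12)


def hill_climb_dc(d: Matrix, k: int, start: float = 0.02, step: float = 0.01,
                  min_step: float = 1e-3) -> tuple[float, float, list[int]]:
    """Steps 3.3-3.4: hill-climb the percent value; returns (percent, dc, labels)."""
    def evaluate(p: float):
        dc = cutoff_distance(d, p)
        labels, centers = density_peaks(d, dc, k)
        return fitness(d, labels, centers), dc, labels

    best_p, best = start, evaluate(start)
    while step >= min_step:
        moved = False
        for cand in (best_p - step, best_p + step):
            if 0.0 < cand <= 1.0:
                result = evaluate(cand)
                if result[0] > best[0]:  # accept strict improvements only
                    best_p, best, moved = cand, result, True
        if not moved:
            step /= 2  # no better neighbour: refine the step size
    return best_p, best[1], best[2]
```

Because only strict improvements are accepted and the fitness can take finitely many values, the search always terminates. On a matrix built from two well-separated groups of points, `hill_climb_dc(d, k=2)` returns a percent value, the matching dc, and labels that separate the two groups.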

Description

Technical field

[0001] The invention belongs to the field of data analysis technology, and in particular relates to a data clustering method.

Background

[0002] With the development of big data technology, the amount of data generated has grown rapidly, and cluster analysis, an important technique for analyzing many kinds of data, has again become a research hotspot. Cluster analysis is widely used in fields such as finance, marketing, information retrieval, information filtering, scientific observation, and engineering. Traditional clustering algorithms include partition-based, hierarchy-based, and density-based algorithms.

[0003] Partition-based algorithms include the k-means algorithm and the PAM algorithm. The k-means algorithm measures similarity based on the mean of the objects in a cluster, and its goal is to divide the data set into k clusters according to the input parameter k. The algorithm adopts an ...
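For background, the standard k-means (Lloyd) iteration described in [0003] can be sketched as below: each point is assigned to its nearest center, then each center moves to the mean of its assigned points, until the centers stop changing. Initializing from k randomly chosen input points and capping the loop at `max_iter` are choices made for this sketch; the truncated paragraph does not say which k-means variant it refers to.

```python
import math
import random

Point = tuple[float, ...]


def k_means(points: list[Point], k: int, max_iter: int = 100,
            seed: int = 0) -> tuple[list[Point], list[int]]:
    """Lloyd-style k-means: returns (centers, label of each point)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct input points as initial centers

    def nearest(p: Point) -> int:
        # Index of the closest current center (ties go to the lower index).
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(max_iter):
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Mean of each cluster; an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[c]
            for c, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments cannot change
            break
        centers = new_centers
    return centers, [nearest(p) for p in points]
```

The need to supply k up front, and the dependence of the result on the initial centers, are the kind of limitations the background goes on to discuss.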

Application Information

IPC (8): G06K9/62
CPC: G06F18/2321
Inventors: Chen Jinyin (陈晋音), Lin Xiang (林翔), Zheng Haibin (郑海斌), Bao Xingtong (保星彤)
Owner: ZHEJIANG UNIV OF TECH