Method for determining optimum cluster number

A method for determining the optimal number of clusters, applied to text database clustering/classification, relational databases, database models, etc. It addresses the poor computational efficiency of existing approaches, the difficulty of accurately determining the cluster number k, and their limited applicability.

Inactive Publication Date: 2014-04-09
XIAN UNIV OF TECH
2 Cites, 8 Cited by
AI Technical Summary

Problems solved by technology

The trial-and-error process has several deficiencies. First, determining the cluster number k is difficult for users who lack rich experience in cluster analysis, so a method for producing a reasonable cluster number k is needed. Second, many indices for testing clustering validity have been proposed, chiefly the $V_{xie}$ index and the $V_{wsj}$ index.
Because these indices were proposed on the basis of a specific clustering algorithm, their practical application is greatly limited.
In addition, the trial-and-error approach has poor computational efficiency on large-scale, high-dimensional data sets.

Examples

Detailed description of embodiments

[0047] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0048] The present invention uses a validity index Q(C) to evaluate the clustering quality of a data set. The index measures clustering quality mainly through the compactness of data objects within a class and the degree of separation between data objects of different classes. The related concepts are introduced below.

[0049] 1. Validity index

[0050] Suppose that, for a data set DB, one clustering partition is $C_k = \{C_1, C_2, \ldots, C_k\}$. The intra-class compactness of the partition $C_k$ is obtained by summing the squared distances between any two data objects in the same class, and is denoted $\mathrm{Scat}(C_k)$:

[0051] $\mathrm{Scat}(C_k) = \sum_{i = \ldots}$ ...
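
As a rough illustration of the verbal definition in [0050], the sketch below sums the squared distance over every pair of data objects that share a class. The formula itself is truncated in this text, so the use of Euclidean distance, the counting of each unordered pair once, and the names `squared_distance` and `scat` are assumptions made for illustration, not the patent's exact definition.

```python
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scat(partition):
    """Intra-class compactness of a partition.

    Sums the squared distance over every unordered pair of objects in the
    same class. Counting each pair once is an assumption; the source
    formula is truncated.
    """
    return sum(
        squared_distance(p, q)
        for cluster in partition
        for p, q in combinations(cluster, 2)
    )


# Toy partition with two classes (illustrative data, not from the patent).
partition = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(5.0, 5.0), (6.0, 5.0)],
]
print(scat(partition))
```

For this toy partition the first class contributes 1 + 1 + 2 and the second contributes 1, so the sketch prints 5.0. A smaller value indicates tighter classes.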

Abstract

Disclosed is a method for determining an optimum cluster number. The clustering quality of a data set is evaluated through a validity index Q(C), and the cluster number corresponding to the minimum value of Q(C) is the optimum cluster number. The method provides a new similarity measure and, in combination with hierarchical clustering, generates all possible cluster partitions bottom-up; the validity index value of each partition is calculated as it is generated, a cluster quality curve over the different partitions is built from these values, and the partition corresponding to the extreme point of the curve is the optimum cluster partition. Repeated clustering on a large data set is thereby avoided, and the method does not rely on any specific clustering algorithm. Experimental results and theoretical analysis both show that the method performs well and is feasible, and that computational efficiency can be greatly improved.
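
The bottom-up sweep described in the abstract might look roughly like the following sketch. It merges clusters one pair at a time, scores each intermediate partition, and reads the optimum off the resulting curve. The definition of Q(C) and the new similarity measure are not given in this text, so `stand_in_index` (a compactness-to-separation ratio) and the centroid-distance merge rule are placeholders chosen for illustration, and all function names are hypothetical.

```python
from itertools import combinations


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def stand_in_index(partition):
    """Placeholder for Q(C), NOT the patent's index: within-cluster scatter
    divided by (n * smallest squared centroid distance). Smaller is better.
    Assumes no two centroids coincide."""
    centers = [centroid(c) for c in partition]
    n = sum(len(c) for c in partition)
    within = sum(sqdist(p, m) for c, m in zip(partition, centers) for p in c)
    separation = min(sqdist(a, b) for a, b in combinations(centers, 2))
    return within / (n * separation)


def sweep_partitions(points, index=stand_in_index):
    """Merge bottom-up from singletons; score every partition with
    k = n-1 ... 2 clusters in a single pass."""
    partition = [[p] for p in points]
    curve, partitions = {}, {}
    while len(partition) > 2:
        centers = [centroid(c) for c in partition]
        # Assumed merge rule: join the two clusters with the closest centroids.
        i, j = min(
            combinations(range(len(partition)), 2),
            key=lambda ij: sqdist(centers[ij[0]], centers[ij[1]]),
        )
        merged = partition[i] + partition[j]
        partition = [c for t, c in enumerate(partition) if t not in (i, j)]
        partition.append(merged)
        curve[len(partition)] = index(partition)
        partitions[len(partition)] = partition
    return curve, partitions


# Three well-separated toy groups (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0),
          (0, 10), (0, 11), (1, 10)]
curve, partitions = sweep_partitions(points)
best_k = min(curve, key=curve.get)  # cluster number at the curve's minimum
print(best_k)
```

The singleton and one-cluster partitions are skipped because the stand-in index is degenerate there. The point of the sketch is that every candidate k is scored during one merge sequence, rather than by rerunning a clustering algorithm for each k.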

Description

Technical field

[0001] The invention belongs to the technical field of data mining and relates to a method for determining the optimal number of clusters.

Background

[0002] The optimal number of clusters is usually determined by an iterative trial-and-error process. On a given data set, a specific clustering algorithm is run with different parameters (usually the number of clusters k) to produce different partitions of the data set, and the validity index value of each partition is then calculated. By comparing the index values, the number of clusters whose index value meets a predetermined condition is selected as the optimal number of clusters. The trial-and-error process has several deficiencies. First, determining the cluster number k is difficult for users who lack rich experience in cluster analysis, so a method for producing a reasonable cluster number k is needed; the second is that m...
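
For contrast, the conventional trial-and-error process described in the background can be sketched as below: a clustering algorithm is rerun for each candidate k and the partitions are compared by an index. This is only a minimal illustration. The k-means routine with farthest-point seeding, the `stand_in_index` ratio, and the candidate range are all assumptions, not taken from the patent.

```python
from itertools import combinations


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        )
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sqdist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters, centers


def stand_in_index(clusters, centers):
    """Placeholder validity index: within-cluster scatter divided by
    (n * smallest squared centroid distance). Smaller is better."""
    n = sum(len(c) for c in clusters)
    within = sum(sqdist(p, m) for c, m in zip(clusters, centers) for p in c)
    separation = min(sqdist(a, b) for a, b in combinations(centers, 2))
    return within / (n * separation)


def trial_and_error(points, k_values):
    """Rerun the clustering algorithm once per candidate k."""
    return {k: stand_in_index(*kmeans(points, k)) for k in k_values}


# Three well-separated toy groups (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0),
          (0, 10), (0, 11), (1, 10)]
curve = trial_and_error(points, range(2, 6))
best_k = min(curve, key=curve.get)
print(best_k)
```

Each candidate k costs a full clustering run over the data set, which is the inefficiency the background points to for large data sets. The outcome also depends on the particular algorithm and index that were chosen.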

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35, G06F16/285
Inventor: 周红芳, 王啸, 赵雪涵, 段文聪, 郭杰, 张国荣, 王心怡, 何馨依
Owner: XIAN UNIV OF TECH