Semi-supervised hierarchical clustering method based on ultrametric distance matrix

A semi-supervised hierarchical clustering technique based on an ultrametric distance matrix, applied in the field of clustering. It addresses the high time complexity of HAC, imprecise estimates of the optimal number of clusters, and limited effectiveness.

Inactive Publication Date: 2015-03-04
NANJING UNIV OF SCI & TECH
0 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0006] The hierarchical agglomerative clustering (HAC) algorithm is conceptually simple and can find clusters of different shapes with a single approach, but it also has disadvantages: (1) HAC has high time complexity; for example, the centroid algorithm (priority-queue implementation) runs in $O(N^2 \log N)$ time; (2) the effectiveness of extracting clusters from a dendrogram is limited. Many effectiveness measures show patterns that shift toward the lower levels of the dendrogram, which can lead to imprecise estimates of the optimal number of clusters.

Examples

Embodiment Construction

[0031] With reference to Figure 3, a semi-supervised hierarchical clustering method based on an ultrametric distance matrix includes the following steps:

[0032] Step 1: define a closed convex set of inequality constraints and project C and E onto it. Here an $m \times 1$ vector is used to represent the $n \times n$ symmetric dissimilarity matrix $D$, and $C$ is an $m \times r$ dissimilarity matrix with entries $x_{1,1}, x_{1,2}, \ldots, x_{1,r}, x_{2,1}, x_{2,2}, \ldots$

Abstract

The invention provides a semi-supervised hierarchical clustering method based on an ultrametric distance matrix. The method comprises the following steps: step 1, a closed convex set of inequality constraints is defined, and the parameter estimate is projected onto the closed convex set; step 2, the estimated solution vector is updated by subtracting the vector of changes produced by the projection; and step 3, the projection is iterated over the given set of constraints until it converges to the least-squares optimal solution. Taking a semi-supervised hierarchical clustering framework based on ultrametric dendrogram distances as its research background, the method adopts an optimization approach to improve the efficiency and accuracy of solving the semi-supervised hierarchical clustering problem.
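
The three steps in the abstract resemble Dykstra's cyclic projection scheme for finding the least-squares closest point in an intersection of closed convex sets. The following is a minimal sketch under that assumption, not the patented procedure itself: each constraint is taken to be a half-space `a . x <= b`, and the names `project_halfspace`, `dykstra`, and `corrections` are hypothetical. The toy example uses simple order constraints rather than real ultrametric constraints.

```python
def project_halfspace(z, a, b):
    """Euclidean projection of z onto the half-space {x : a . x <= b}."""
    violation = sum(ai * zi for ai, zi in zip(a, z)) - b
    if violation <= 0:
        return list(z)  # already feasible, nothing to change
    scale = violation / sum(ai * ai for ai in a)
    return [zi - scale * ai for ai, zi in zip(a, z)]


def dykstra(x0, constraints, tol=1e-10, max_sweeps=10_000):
    """Approximate the point closest to x0 (least squares) that satisfies
    every (a, b) constraint, using Dykstra's cyclic projections."""
    x = list(x0)
    # One stored change vector per constraint (assumed reading of step 2).
    corrections = [[0.0] * len(x0) for _ in constraints]
    for _ in range(max_sweeps):
        x_prev = list(x)
        for idx, (a, b) in enumerate(constraints):
            # Undo the change this constraint made on the previous sweep.
            z = [xi + ci for xi, ci in zip(x, corrections[idx])]
            # Step 1: project onto the closed convex set.
            x = project_halfspace(z, a, b)
            # Step 2: remember the change produced by this projection.
            corrections[idx] = [zi - xi for zi, xi in zip(z, x)]
        # Step 3: iterate until a full sweep no longer moves the estimate.
        if max(abs(u - v) for u, v in zip(x, x_prev)) < tol:
            break
    return x


# Toy example: force x1 <= x2 <= x3 on the vector (3, 1, 0).
order_constraints = [([1.0, -1.0, 0.0], 0.0), ([0.0, 1.0, -1.0], 0.0)]
fitted = dykstra([3.0, 1.0, 0.0], order_constraints)
print(fitted)  # each entry should approach 4/3
```

Keeping a separate change vector per constraint is what distinguishes this scheme from plain alternating projections, which reach a feasible point but not necessarily the least-squares closest one.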

Description

Technical field

[0001] The invention belongs to clustering technology in data mining, and in particular relates to a semi-supervised hierarchical clustering method based on an ultrametric distance matrix, implemented with an optimization technique.

Background art

[0002] The process of grouping a collection of physical or abstract objects into classes of similar objects is called clustering. Clustering problems arise in many disciplines and are widely used. Basically, the purpose of clustering is to assign given samples to clusters so that samples in the same cluster are similar to each other and samples in different clusters are different from each other. Based on the way clusters are generated, clustering methods can be divided into two categories: partitional clustering and hierarchical clustering. Partitional clustering generally decomposes a data set into disjoint clusters, and this decomposition is usually optimal in terms of some pre-defined objective...
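
As a concrete illustration of the term "ultrametric distance matrix", the sketch below checks the standard ultrametric inequality d(i, j) <= max(d(i, k), d(j, k)) for every triple of objects, which is equivalent to requiring that the two largest of the three pairwise distances are equal. The function name `is_ultrametric` and the tolerance `tol` are hypothetical and do not come from the patent text.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-12):
    """Return True if the square matrix d is symmetric, has a zero diagonal,
    is non-negative, and satisfies the ultrametric inequality."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
        for j in range(n):
            if d[i][j] < -tol or abs(d[i][j] - d[j][i]) > tol:
                return False
    for i, j, k in combinations(range(n), 3):
        # In an ultrametric, the two largest distances in a triple coincide.
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if largest - middle > tol:
            return False
    return True


# Distances read off a small dendrogram: objects 0 and 1 merge at height 1,
# and object 2 joins them at height 3.
tree_like = [[0, 1, 3],
             [1, 0, 3],
             [3, 3, 0]]

# An ordinary dissimilarity matrix that no dendrogram can reproduce exactly.
not_tree_like = [[0, 1, 2],
                 [1, 0, 3],
                 [2, 3, 0]]

print(is_ultrametric(tree_like), is_ultrametric(not_tree_like))
```

A matrix that passes this check corresponds to the merge heights of some dendrogram, which is why fitting an ultrametric matrix to the data amounts to fitting a hierarchy.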

Application Information

IPC(8): G06F17/30
CPC: G06F16/285
Inventor: 徐建, 李涛, 周文强, 张宏, 许福, 李千目
Owner: NANJING UNIV OF SCI & TECH