
Partition-Based Similarity Propagation Data Clustering Method

A similarity-based data clustering technology, applied in the field of data clustering, that addresses problems such as running time growing with the amount of data.

Status: Inactive; Publication date: 2008-07-09
ZHEJIANG UNIV
Cites: 0; Cited by: 7

AI Technical Summary

Problems solved by technology

[0022] However, for relation-intensive (dense) data sets, that is, data sets in which the similarity between any two objects is finite, the running time of the similarity propagation data clustering method AP grows as a cubic polynomial of the amount of data.
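To make the cost concrete, here is a minimal pure-Python sketch of the standard affinity propagation message updates (Frey and Dueck, 2007), which appears to be what this text calls the similarity propagation data clustering method AP. It is an illustration under that assumption, not code from the patent; the function name `affinity_propagation` and the parameters `damping`, `max_iter` and `convergence_iter` are our own choices. Each iteration alone updates all N² responsibility and availability entries, so a dense similarity matrix makes every pass expensive as N grows.

```python
import statistics


def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=15):
    """Standard affinity propagation on a dense N x N similarity matrix S.

    S[i][k] is the similarity of point i to candidate exemplar k; the diagonal
    S[k][k] holds each point's preference. Returns the exemplar indices.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    exemplars, last, stable = [], None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]: N^2 updates.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max((v for k, v in enumerate(vals) if k != best),
                         default=float("-inf"))
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))),
        # a(k,k) = sum_{i' != k} max(0, r(i',k)): another N^2 updates.
        for k in range(n):
            pos = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                new = pos if i == k else min(0.0, R[k][k] + pos - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # Points whose self-evidence a(k,k) + r(k,k) is positive are exemplars.
        exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        stable = stable + 1 if exemplars == last else 0
        last = exemplars
        if exemplars and stable >= convergence_iter:
            break
    return exemplars


# Usage: two well-separated 1-D groups, similarity = negative squared distance,
# preference = median of the off-diagonal similarities (a common default).
xs = [0.0, 0.3, 0.5, 10.0, 10.2, 10.7]
S = [[-(a - b) ** 2 for b in xs] for a in xs]
pref = statistics.median(S[i][j] for i in range(6) for j in range(6) if i != j)
for k in range(6):
    S[k][k] = pref
exemplars = affinity_propagation(S)
```

The damping factor is the usual guard against oscillating messages; the median preference is only a common starting point and controls how many exemplars emerge.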

Examples


Detailed Embodiment

[0080] FIG. 3 shows a flow chart for clustering a randomly generated data set of three-dimensional data points distributed on a manifold. The concrete steps of this example are described in detail below in conjunction with the method of the present invention:

[0081] 1) Input the similarity matrix $S_{2000\times 2000}$ of the 2000 randomly generated three-dimensional data objects, distributed on a manifold, that are to be clustered, with entries $s(i, j)$, $i \in \{1, \ldots, 2000\}$, $j \in \{1, \ldots, 2000\}$, $i \neq j$;
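The source does not say how the manifold data are generated or which similarity measure fills $S$. As a hedged sketch, the code below assumes a Swiss-roll manifold (the helper `swiss_roll` is hypothetical) and the negative squared Euclidean distance common in affinity propagation, with the diagonal preference set to the median off-diagonal similarity. It uses 200 points instead of 2000 so the pure-Python demo stays fast.

```python
import math
import random
import statistics


def swiss_roll(n, seed=0):
    """HYPOTHETICAL stand-in for the patent's manifold data: n random 3-D points on a Swiss roll."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        t = 1.5 * math.pi * (1 + 2 * rng.random())  # angle along the roll
        h = 20 * rng.random()                        # position across the roll
        pts.append((t * math.cos(t), h, t * math.sin(t)))
    return pts


def similarity_matrix(points):
    """s(i, j) = -||x_i - x_j||^2 for i != j (assumed measure); diagonal = median preference."""
    n = len(points)
    S = [[-math.dist(p, q) ** 2 for q in points] for p in points]
    pref = statistics.median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for k in range(n):
        S[k][k] = pref
    return S


points = swiss_roll(200)  # the embodiment uses 2000 points
S = similarity_matrix(points)
```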

[0082] 2) Divide the matrix $S_{2000\times 2000}$ into 8 parts:

[0083] $S = \begin{bmatrix} S_{11} & S_{12} & \cdots \\ \vdots & \ddots & \vdots \\ \cdots & \cdots & S_{88} \end{bmatrix}$ ...
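One reading of this step, consistent with the block indices $S_{11}$ to $S_{88}$, is that the 2000 objects are split into 8 groups so that $S$ becomes an 8 × 8 grid of blocks, and each diagonal block $S_{pp}$ is the similarity matrix of one sub-data set. The sketch below assumes contiguous, near-equal groups; the helper names `partition_indices`, `block` and `diagonal_blocks` are hypothetical, and the source does not confirm how rows are assigned to parts (shuffling first may be preferable when the input order is not random).

```python
def partition_indices(n, parts=8):
    """Split indices 0..n-1 into `parts` contiguous groups of near-equal size (assumed scheme)."""
    base, extra = divmod(n, parts)
    groups, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def block(S, rows, cols):
    """Return the block S_pq of S restricted to the given row and column indices."""
    return [[S[i][j] for j in cols] for i in rows]


def diagonal_blocks(S, parts=8):
    """Return (original indices, S_pp) for each diagonal block S_11, ..., S_pp."""
    return [(g, block(S, g, g)) for g in partition_indices(len(S), parts)]


# Usage on a toy 16 x 16 matrix; a 2000 x 2000 matrix would give 250 rows per part.
toy = [[-(i - j) ** 2 for j in range(16)] for i in range(16)]
blocks = diagonal_blocks(toy, 8)
```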


Abstract

The invention discloses a method for accelerating the similarity propagation data clustering method. The method comprises the following steps: first, the similarity matrix of the input data set is partitioned, and the sub-data set on each sub-matrix is clustered with the similarity propagation data clustering method; then, the clustering results of the sub-data sets are combined in a certain way, and on that basis the whole data set is clustered again with the similarity propagation data clustering method. The invention handles dense relational data sets with large amounts of data and obtains essentially the same result as the similarity propagation data clustering method in a shorter time. Compared with the similarity propagation data clustering method, it noticeably accelerates the clustering of large, dense relational data sets.
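The abstract leaves the combining step unspecified, so the end-to-end sketch below substitutes a plainly HYPOTHETICAL rule: pool the exemplars found in every diagonal block, run affinity propagation again on the pooled exemplars only, and assign each point to its most similar final exemplar. This swapped-in rule is ours, not the patent's, and it re-clusters only the candidates rather than the whole data set. The helper `partitioned_ap` is hypothetical; `affinity_propagation` is the standard Frey and Dueck update, repeated here so the block runs on its own.

```python
import statistics


def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=15):
    """Standard affinity propagation; S[k][k] holds the preferences. Returns exemplar indices."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    exemplars, last, stable = [], None, 0
    for _ in range(max_iter):
        for i in range(n):  # responsibility messages
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max((v for k, v in enumerate(vals) if k != best), default=float("-inf"))
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):  # availability messages
            pos = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                new = pos if i == k else min(0.0, R[k][k] + pos - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        stable = stable + 1 if exemplars == last else 0
        last = exemplars
        if exemplars and stable >= convergence_iter:
            break
    # Fall back to the single strongest candidate if no exemplar emerged.
    return exemplars or [max(range(n), key=lambda k: A[k][k] + R[k][k])]


def partitioned_ap(S, parts=8):
    """Cluster each diagonal block with AP, then re-cluster the pooled exemplars.

    The pooling step is a HYPOTHETICAL stand-in for the patent's unspecified
    way of combining the sub-results. Returns one exemplar label per point.
    """
    n = len(S)
    size = -(-n // parts)  # ceiling division: at most `parts` contiguous blocks
    candidates = []
    for start in range(0, n, size):
        idx = list(range(start, min(start + size, n)))
        sub = [[S[i][j] for j in idx] for i in idx]  # diagonal block S_pp
        candidates += [idx[k] for k in affinity_propagation(sub)]
    pooled = [[S[i][j] for j in candidates] for i in candidates]
    final = [candidates[k] for k in affinity_propagation(pooled)]
    chosen = set(final)
    # Each exemplar labels itself; every other point joins its most similar exemplar.
    return [i if i in chosen else max(final, key=lambda e: S[i][e]) for i in range(n)]


# Usage: three tight 1-D groups of 8 points, negative squared distance, median preference.
xs = [c + 0.1 * j for c in (0.0, 50.0, 100.0) for j in range(8)]
S = [[-(a - b) ** 2 for b in xs] for a in xs]
pref = statistics.median(S[i][j] for i in range(24) for j in range(24) if i != j)
for k in range(24):
    S[k][k] = pref
labels = partitioned_ap(S, parts=8)
```

The speed-up idea is that each sub-run works on a matrix 1/parts the size, and the second run sees only the pooled candidates; whether results match full AP depends on the combining rule, which the source does not give.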

Description

Technical field

[0001] The invention relates to the field of massive multimedia data processing, and in particular to a data clustering method.

Background

[0002] In the age of the information explosion, people face massive amounts of data. Searching Google™ for the keyword "car" returns 217,000,000 results; searching for "racing car" returns only 13,600,000; and searching for "blue racing car" narrows the results further, to 455,000. Clustering existing data into groups whose members share common characteristics therefore makes further processing of the data much more convenient.

[0003] There are many clustering methods, the most commonly used being k-means. k-means is easy to implement, but it is very sensitive to the choice of the initial cluster centers: if t...
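The sensitivity of k-means to its initial centers can be seen on a tiny example. The sketch below is our own illustration, not from the patent: plain Lloyd iterations on six 1-D points (the function name `kmeans_1d` and the data are hypothetical), run from two different sets of initial centers.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd's k-means in one dimension from the given initial centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous center.
        new = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    sse = sum(min((x - c) ** 2 for c in centers) for x in points)
    return centers, sse


points = [0, 1, 10, 11, 20, 21]
print(kmeans_1d(points, [0, 1, 10]))   # ([0.0, 1.0, 15.5], 101.0): stuck in a poor local optimum
print(kmeans_1d(points, [0, 10, 20]))  # ([0.5, 10.5, 20.5], 1.5): recovers the three pairs
```

Both runs converge, but the first splits one natural pair and merges two others, with an SSE about 67 times larger.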


Application Information

IPC(8): G06F17/30
Inventors: 吴飞 (Wu Fei), 庄越挺 (Zhuang Yueting), 张绪青 (Zhang Xuqing), 郭同强 (Guo Tongqiang), 夏丁胤 (Xia Dingyin)
Owner ZHEJIANG UNIV