Approximate quick clustering and index method for mass data

A clustering method in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses problems such as the growth in running time when clustering large data sets.

Inactive Publication Date: 2009-01-07
ZHEJIANG UNIV
0 Cites, 25 Cited by

AI Technical Summary

Problems solved by technology

[0016] However, the pairwise distances between the data points form a dense matrix, and when the message-passing clustering method AP (affinity propagation) is applied to it, the running time rises sharply as the amount of data grows, because every iteration has to update a message for each of the N×N pairs of points.
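To make the cost concrete, here is a minimal sketch of message-passing clustering on a dense similarity matrix. It assumes that "AP" refers to affinity propagation with the standard responsibility and availability updates; the function name `affinity_propagation`, the `damping` and `iterations` parameters, and the toy data are illustrative choices, not taken from the patent. Both update loops visit every entry of the N×N matrix, which is the per-iteration cost that becomes prohibitive for massive data.

```python
def affinity_propagation(S, damping=0.5, iterations=100):
    """Cluster with dense message passing; S[i][i] holds the preference of point i.

    Requires at least two points. Returns, for each point, the index of its exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities  a(i, k)
    for _ in range(iterations):
        # Responsibility update: touches all n*n entries.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                # Strongest competing candidate for point i, excluding k itself.
                competitor = second if k == best else first
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # Availability update: again touches all n*n entries.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos_sum = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                if i == k:
                    new = pos_sum
                else:
                    new = min(0.0, R[k][k] + pos_sum - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate exemplar with the largest combined message.
    return [max(range(n), key=lambda k, i=i: A[i][k] + R[i][k]) for i in range(n)]


# Toy data (hypothetical): two well-separated groups on a line.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
n = len(points)
S = [[-(points[i] - points[j]) ** 2 for j in range(n)] for i in range(n)]
off_diag = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
preference = off_diag[len(off_diag) // 2]  # a median off-diagonal similarity
for i in range(n):
    S[i][i] = preference
labels = affinity_propagation(S)
```

In this sketch the similarities are negative squared distances, which matches the document's convention that every entry of S is non-positive. The memory footprint is also quadratic, since S, R, and A are all full N×N tables.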

Examples

Embodiment Construction

[0058] An example of the recursive block message-passing data clustering method proposed by the present invention is shown in Figure 1 and Figure 2.

[0059] The details are as follows:

[0060] The first clustering method, based on recursive block message passing, includes the following steps:

[0061] The input consists of a collection of N data objects and the similarity matrix $S_{N \times N}$ between these objects, where $S[i, j] \le 0$ ($i = 1, \dots, N$; $j = 1, \dots, N$). Clustering based on recursive block message passing then proceeds as follows:

[0062] 1) Divide the similarity matrix $S_{N \times N}$ evenly into k parts, then divide each part evenly into m parts, and so on:

[0063] $S = [\, S_{11} \ \cdots \ ]$ ...
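The recursive division in step 1) can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the parts are chosen, so the sketch splits the object indices into contiguous, near-equal ranges and takes the sub-block of S that each range induces. The helper names `split_evenly`, `recursive_partition`, and `sub_block`, and the `fanouts` parameter (the list k, m, ...), are hypothetical.

```python
def split_evenly(indices, parts):
    """Split a list of indices into `parts` contiguous chunks of near-equal size."""
    n = len(indices)
    bounds = [n * p // parts for p in range(parts + 1)]
    return [indices[bounds[p]:bounds[p + 1]] for p in range(parts)]


def recursive_partition(indices, fanouts):
    """Split into fanouts[0] parts, each of those into fanouts[1] parts, and so on.

    Returns nested lists whose innermost lists are the index sets of the leaves.
    """
    if not fanouts:
        return indices
    return [recursive_partition(chunk, fanouts[1:])
            for chunk in split_evenly(indices, fanouts[0])]


def sub_block(S, rows, cols):
    """Extract the block of S restricted to the given row and column indices."""
    return [[S[i][j] for j in cols] for i in rows]


# Example with N = 8 objects, k = 2 and m = 2 (values chosen for illustration).
N = 8
S = [[-abs(i - j) for j in range(N)] for i in range(N)]
tree = recursive_partition(list(range(N)), [2, 2])
first_leaf = tree[0][0]
leaf_block = sub_block(S, first_leaf, first_leaf)
```

Each leaf block is much smaller than S, so a clustering pass over one leaf only touches a small fraction of the full matrix. How the results of the leaves are combined at the higher levels is not visible in this excerpt and is not modeled here.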

Abstract

The invention discloses a recursive block message-passing clustering and indexing method for massive data. The method achieves accurate and fast clustering of many kinds of massive data and produces an index structure suited to querying and updating. Users can cluster and index massive, unordered data quickly, which benefits later querying, searching, maintenance, and updating. The invention can be applied to the fast clustering and indexing of massive Internet text, images, video, audio, and the like, and also to similarity comparison of massive biological gene sequences and to homologous protein detection. The invention also discloses an approximate fast clustering method for massive data. This clustering method can increase clustering speed exponentially with little loss of clustering quality, and it makes it easier to cluster, insert, and update data outside the training set. The method can therefore be applied generally to the fast clustering and indexing of all kinds of complex massive data.

Description

Technical field

[0001] The invention relates to an approximate fast clustering and indexing method for massive data. By passing similarity information between data within local regions, the method provides a basis for judgment in the cluster analysis of massive complex data, and thereby achieves approximate fast clustering and indexing of massive data. It belongs to the field of multimedia information processing and data mining algorithms.

Background art

[0002] With today's highly developed science and technology, people often have to deal with massive amounts of data. The hundreds of millions of web pages, pictures, videos, and audio recordings on the Internet, and the gene sequences obtained by sequencing various organisms, are all huge collections containing massive amounts of information, and these data are still growing dynamically and rapidly. Faced with such a huge amount of data to process, data mining becomes particularly important, and clustering is one of the most...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 庄越挺, 吴飞, 夏丁胤, 郭同强, 张绪青
Owner: ZHEJIANG UNIV