Method for efficiently clustering massive data

A clustering technology for large-scale data, applied in the fields of electrical digital data processing, special-purpose data processing applications, instruments, etc., that offers broad applicability and accurate clustering results.

Status: Inactive
Publication date: 2011-11-16
XI AN JIAOTONG UNIV
Cites 1; cited by 18

AI Technical Summary

Problems solved by technology

However, applying existing clustering methods to large-scale data still raises many problems, chiefly timeliness (running time) and accuracy.

Detailed Description of the Embodiments

[0033] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts fall within the protection scope of the present invention.

[0034] Figure 1 is a schematic flow chart of the efficient large-scale data clustering method of the present invention. First, the original data is sampled hierarchically (graded sampling). The number of sampling levels is determined by the number of levels to be clustered, and the volume of sampled data increases level by level. The data in the last level must amount to at least 5% of the original data and to no less than 30 times the total number of cluster centers. The total numb...
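The sampling rule in [0034] can be illustrated with a short Python/NumPy sketch. The excerpt does not say how the intermediate level sizes are chosen, so the schedule below is an assumption: each level gets 30 times the total number of centers produced up to that level (with `centers_per_level` a hypothetical list such as `[10, 10]`, giving 10 * 10 = 100 centers in total), and the last level is enlarged to at least 5% of the original data if needed. All function and parameter names are illustrative.

```python
import numpy as np

def sample_sizes(n_points, centers_per_level, min_percent=5, min_ratio=30):
    """Per-level sample sizes for graded sampling (hypothetical schedule).

    Level i gets min_ratio times the total number of centers produced up to
    level i, so sizes grow level by level; the last level is then raised,
    if needed, to at least min_percent % of the original data.
    """
    sizes = []
    total_centers = 1
    for k in centers_per_level:
        total_centers *= k  # centers after this level: k1 * k2 * ...
        sizes.append(min_ratio * total_centers)
    # Integer ceiling of n_points * min_percent / 100 (no float rounding).
    min_last = (n_points * min_percent + 99) // 100
    sizes[-1] = max(sizes[-1], min_last)
    if sizes[-1] > n_points:
        raise ValueError("too little data for the requested number of centers")
    return sizes

def graded_sample(data, sizes, rng):
    """Draw one independent random sample per level, without replacement."""
    return [data[rng.choice(len(data), size=s, replace=False)] for s in sizes]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((100_000, 8))
    sizes = sample_sizes(len(data), [10, 10])
    print(sizes)  # [300, 5000]
    samples = graded_sample(data, sizes, rng)
```

With 100,000 points and two levels of 10 centers, the 30x rule alone would give 3,000 points for the last level, so the 5% condition (5,000 points) is the binding one.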

Abstract

The invention provides a method for efficiently clustering massive data. The method comprises the following steps: performing graded (hierarchical) sampling on the original data, wherein the number of sampling levels is determined by the number of levels to be clustered, the amount of sampled data increases level by level, and the size of the last level of data must satisfy two conditions: it exceeds 5% of the original data, and it exceeds 30 times the total number of cluster centers; clustering the first level of sampled data with the K-means clustering algorithm; quantizing the next level of sampled data to all centers of the current level with a fast quantization method; clustering each group of quantized data at the current level separately with an adaptive K-means clustering algorithm; and merging the cluster centers of all groups at the current level into one large set of centers. By combining graded sampling, level-wise clustering with a small number of centers per level plus adaptive clustering, and fast quantization, the invention shortens clustering time.
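The steps in the abstract can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the excerpt does not describe the fast quantization method or how the adaptive K-means chooses its number of centers, so `quantize` is a plain brute-force nearest-center search, and the size-based rule for `k_g` (at most `k`, and at most one center per `min_ratio` points) is hypothetical, as are all function and parameter names.

```python
import numpy as np

def quantize(x, centers):
    """Index of the nearest center for each row of x (brute-force stand-in)."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(x, k, rng, iters=20):
    """Plain Lloyd's K-means; returns the k centers."""
    x = np.asarray(x, dtype=float)
    k = min(k, len(x))
    centers = x[rng.choice(len(x), size=k, replace=False)]  # copy of k rows
    for _ in range(iters):
        labels = quantize(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers

def hierarchical_cluster(samples, centers_per_level, rng, min_ratio=30):
    """Level-by-level clustering of graded samples (sketch of the abstract)."""
    # Level 1: ordinary K-means on the first, smallest sample.
    centers = kmeans(samples[0], centers_per_level[0], rng)
    for level_data, k in zip(samples[1:], centers_per_level[1:]):
        level_data = np.asarray(level_data, dtype=float)
        # Quantize this level's sample to all centers of the current level.
        labels = quantize(level_data, centers)
        merged = []
        for g in range(len(centers)):
            group = level_data[labels == g]
            if len(group) == 0:
                merged.append(centers[g:g + 1])  # assumption: keep parent center
                continue
            # Hypothetical adaptive rule: fewer sub-centers for small groups.
            k_g = max(1, min(k, len(group) // min_ratio))
            merged.append(kmeans(group, k_g, rng))
        # Merge every group's centers into one large center set.
        centers = np.vstack(merged)
    return centers
```

Here `samples` is a list of per-level samples in increasing size, and `centers_per_level` gives how many centers each group is split into at each level; with two levels of 10, the result holds between 10 and 100 centers. Each per-group K-means only sees the points quantized to its parent center and uses a small `k`, which is where the time savings of this level-wise scheme come from.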

Description

Technical field [0001] The invention relates to the technical fields of cluster analysis, data mining, knowledge discovery in data, etc., and in particular to an efficient clustering method for large-scale data.

Background art [0002] In recent years, the rapid development of computer and communication technology has brought tremendous changes to human society. People can obtain and store data faster, more conveniently and more cheaply, and the scale, scope and depth of database applications keep expanding: large numbers of databases are used in business management, government administration, scientific research and engineering development. This trend will continue, making the amount of data and information grow exponentially. However, although we now hold large-scale data, we lack sufficient understanding and use of the information it contains, and the application of traditional data...

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30
Inventors: 廖开阳, 刘贵忠, 惠有师, 肖莉, 王喆, 南楠
Owner: XI AN JIAOTONG UNIV