Rapid mass data cluster processing method for computer

A technique for processing massive data, applied in the field of data processing, achieving reduced computational complexity, convenient, fast and effective processing, and a good structure.

Inactive Publication Date: 2014-04-23
NORTH CHINA ELECTRIC POWER UNIV (BAODING)
Cites: 2; Cited by: 15

AI Technical Summary

Problems solved by technology

[0004] The object of the present invention is to address the shortcomings of the prior art by providing a fast clustering method for massive data with data profile analysis capability, solving the problems of efficiency and of profiling the clustered data when a computer clusters a large amount of data.


Embodiment Construction

[0028] The object of the present invention is to provide a fast clustering method for massive data on a computer, with data profile analysis capability. For the data objects to be clustered, a single run of merge calculations yields the clustering results for any number of clusters, together with the specific composition of the data objects contained in each sub-category and the centroid of the sub-category (that is, the arithmetic mean of the attribute values of the contained data objects). The method is characterized by fast calculation speed and strong data analysis capability.
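How one recorded run could serve any number of clusters can be sketched in Python. This is only an illustration under stated assumptions: the record format `(left_id, right_id, new_id)`, the sample data, and the names `clusters_at` and `centroid` are hypothetical and do not come from the patent text.

```python
def clusters_at(k, n_objects, merge_records):
    """Replay the first n_objects - k merges to get exactly k sub-categories."""
    groups = {i: [i] for i in range(n_objects)}
    for left, right, new in merge_records[: n_objects - k]:
        # The two merged objects disappear; the new object holds their members.
        groups[new] = groups.pop(left) + groups.pop(right)
    return sorted(sorted(g) for g in groups.values())


def centroid(member_ids, data):
    """Arithmetic mean of the attribute values of the contained objects."""
    rows = [data[i] for i in member_ids]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


# Hypothetical data and merge record, for illustration only.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
merge_records = [(0, 1, 4), (2, 3, 5), (4, 5, 6)]

two = clusters_at(2, len(data), merge_records)    # [[0, 1], [2, 3]]
three = clusters_at(3, len(data), merge_records)  # [[0, 1], [2], [3]]
centroids = [centroid(c, data) for c in two]      # [(0.0, 1.0), (10.0, 11.0)]
```

Because the record is written once and only replayed afterwards, asking for a different number of clusters needs no new distance calculations in this sketch.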

[0029] In order to achieve the above object, the technical solution adopted in the present invention comprises the following steps:

[0030] Step 1. Data object preprocessing. All data objects to be analyzed (the number is ) are preprocessed. The specific method of preprocessing is: for any given data dimension is The data obje...

Abstract

The invention relates to a rapid clustering method for massive data on a computer. The method comprises the following steps: first, the data objects to be analyzed are preprocessed to complete their grouping; then the similarity matrices of the data objects within each group are calculated, and data objects are merged according to similarity to generate new data objects; each merge and the object it generates are recorded, and the original data objects are deleted; these operations are repeated until the number of data objects equals the number of clusters expected by the user; finally, the clustering results are obtained by querying the merge records. In a single run, the method yields, for any number of clusters, the specific composition of each sub-category, the number of data objects in it, and its centroid. The general distribution and characteristics of the data objects inside each sub-category can also be queried, which greatly facilitates rapid and effective processing of massive data.
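The merge-and-record loop described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It makes several assumptions that the text above does not confirm: the grouping step is skipped (all objects form one group), similarity is measured by Euclidean distance, and a merged object is the size-weighted mean of the pair it replaces. The names `merge_until` and `members` are hypothetical.

```python
import math


def merge_until(points, k):
    """Merge the closest pair repeatedly until k objects remain.

    Assumptions (not from the source): one group, Euclidean distance,
    merged object = size-weighted mean of the two merged objects.
    Returns (active, records): active maps id -> (centroid, size),
    records is the list of (left_id, right_id, new_id) merges.
    """
    active = {i: (tuple(p), 1) for i, p in enumerate(points)}
    records = []
    next_id = len(points)
    while len(active) > k:
        ids = sorted(active)
        # Smallest distance means greatest similarity.
        a, b = min(
            ((i, j) for n, i in enumerate(ids) for j in ids[n + 1:]),
            key=lambda pair: math.dist(active[pair[0]][0], active[pair[1]][0]),
        )
        (ca, na), (cb, nb) = active.pop(a), active.pop(b)  # delete the originals
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        active[next_id] = (merged, na + nb)
        records.append((a, b, next_id))  # record the merge
        next_id += 1
    return active, records


def members(obj_id, records, n_points):
    """Query the merge records for the original objects inside obj_id."""
    if obj_id < n_points:
        return [obj_id]
    left, right = next((l, r) for l, r, new in records if new == obj_id)
    return sorted(members(left, records, n_points) + members(right, records, n_points))


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (5.0, 5.0)]
active, records = merge_until(points, k=2)
clusters = sorted(members(i, records, len(points)) for i in active)
```

The exhaustive pairwise search here is quadratic per merge and is meant only to make the steps visible; the grouping during preprocessing that the abstract mentions is presumably what keeps the similarity matrices small in the actual method.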

Description

Technical field

[0001] The invention relates to a fast massive data analysis method with data profile analysis capability, and belongs to the technical field of data processing.

Background art

[0002] When a computer processes massive data, the data often needs to be clustered in order to improve processing speed. Clustering divides a data set into different classes, or clusters, according to the similarity of the data itself (generally a distance criterion: the smaller the distance, the greater the similarity), so that data objects within a class are as similar as possible while data objects in different classes differ as much as possible. Clustering can help people discover potential patterns hidden in massive data, which is of great significance for information processing and knowledge discovery, and it has been widely used in many fields such as data mining, machine learning, pattern recognition, s...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/284
Inventor: 李中, 杨宏, 张珂
Owner: NORTH CHINA ELECTRIC POWER UNIV (BAODING)