Big-data clustering algorithm based on cloud computing platform

A clustering algorithm technology based on a cloud computing platform, applied in computing, electrical digital data processing, special data processing applications, etc. It addresses problems such as not considering the different effects of big data points on knowledge discovery tasks, not considering the relative distance between data points, and uneven data distribution, and it achieves the effects of reducing data processing costs, facilitating development, and improving processing capacity and speed.

Status: Inactive
Publication Date: 2014-06-04
INNER MONGOLIA UNIV OF SCI & TECH
2 Cites, 22 Cited by

AI Technical Summary

Problems solved by technology

For big data processing, methods based on sampling probability are generally adopted, but sampling does not consider the overall relative distance between data points or intervals, nor the uneven distribution of data, resulting in the problem of ...


Examples


Embodiment Construction

[0044] The technical solutions of the present invention will be further described in detail below in conjunction with specific embodiments.

[0045] Refer to Figures 1, 2, and 3. In Figure 1, T: the distance between points; M: the number of points included in the cluster; N: the number of points in the cluster; SUM: the vector sum of all points in each dimension; SUMSQ: the sum of squares of all points in each dimension. In Figure 3, N1: the number of points in the initial data source; N2: the number of points in the new data source; K1: the number of initial clusters; K2: the number of new preprocessed clusters; Pi: the center point of the initial cluster; K = [(K1 + K2) / 2].
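The quantities N, SUM, and SUMSQ defined above are enough to recover a cluster's centroid and per-dimension variance without keeping the individual points, and two such summaries can be merged by simple addition. The following is a minimal sketch of that idea. The class name `ClusterSummary` and its methods are hypothetical, and the assumption that the summaries are used this way (as in BIRCH- or BFR-style clustering) is not confirmed by the text shown here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusterSummary:
    """Hypothetical summary of a cluster using N, SUM and SUMSQ."""
    n: int                # N: number of points in the cluster
    sum_: List[float]     # SUM: per-dimension sum of all points
    sumsq: List[float]    # SUMSQ: per-dimension sum of squares

    @classmethod
    def from_points(cls, points: Sequence[Sequence[float]]) -> "ClusterSummary":
        dims = len(points[0])
        return cls(
            n=len(points),
            sum_=[sum(p[d] for p in points) for d in range(dims)],
            sumsq=[sum(p[d] ** 2 for p in points) for d in range(dims)],
        )

    def centroid(self) -> List[float]:
        # Centroid in each dimension is SUM / N.
        return [s / self.n for s in self.sum_]

    def variance(self) -> List[float]:
        # Population variance in each dimension: SUMSQ/N - (SUM/N)^2.
        return [sq / self.n - (s / self.n) ** 2
                for s, sq in zip(self.sum_, self.sumsq)]

    def merge(self, other: "ClusterSummary") -> "ClusterSummary":
        # Summaries are additive, so merging needs no access to raw points.
        return ClusterSummary(
            n=self.n + other.n,
            sum_=[a + b for a, b in zip(self.sum_, other.sum_)],
            sumsq=[a + b for a, b in zip(self.sumsq, other.sumsq)],
        )
```

Because the summaries are additive, partial results computed on separate data partitions can be combined cheaply, which is one reason this kind of representation suits parallel processing.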

[0046] A big data clustering algorithm based on cloud computing platform, comprising the following steps:

[0047] (1) Preprocessing the raw data;

[0048] The basic idea is: first, scan the entire data source for null values and fill in any missing values; the selection of missing values...


Abstract

The invention discloses a big data clustering algorithm based on a cloud computing platform. The raw data is preprocessed; the data is divided into M subsets and distributed to M Map functions, and local clustering is carried out on each subset; clusters with the same key are merged; if the actual number of clusters R is smaller than the number of clusters k, the number of representative points c and the shrink factor a are adjusted and clustering is carried out again until the termination condition is met. If a new data set is generated, local clustering is carried out according to the judgment condition that the number of new data source centers is larger than the number of clusters K obtained before updating, or that the number of new data source points is larger than the number of data source points before updating. The algorithm uses the parallel computing capacity of a high-performance cloud computing cluster system to handle the massive data volumes that clustering must process, so that relationships in the data can be mined rapidly and efficiently.
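The parameters c (number of representative points) and a (shrink factor) resemble those of the CURE clustering algorithm, although the abstract does not name it. Under that assumption, the sketch below shows how c well-scattered points might be picked from one cluster and then moved toward the centroid by the factor a. The function names `centroid` and `representatives` are hypothetical and are not taken from the patent.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Per-dimension mean of the points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def representatives(points: Sequence[Point], c: int, a: float) -> List[List[float]]:
    """Pick up to c scattered points, then shrink them toward the centroid.

    Assumption: a is a fraction in [0, 1]; a = 0 leaves the points
    unchanged and a = 1 collapses them onto the centroid.
    """
    center = centroid(points)
    chosen: List[Point] = []
    remaining = list(points)
    while remaining and len(chosen) < c:
        if not chosen:
            # First representative: the point farthest from the centroid.
            nxt = max(remaining, key=lambda p: math.dist(p, center))
        else:
            # Later ones: the point farthest from those already chosen.
            nxt = max(remaining,
                      key=lambda p: min(math.dist(p, q) for q in chosen))
        chosen.append(nxt)
        remaining.remove(nxt)
    # Move each chosen point a fraction a of the way to the centroid.
    return [[x + a * (m - x) for x, m in zip(p, center)] for p in chosen]
```

A larger c lets the representatives trace non-spherical cluster shapes more closely, while a larger a damps the influence of outliers. Adjusting both and re-clustering, as the abstract describes, trades off between those two effects.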

Description

Technical field

[0001] The invention belongs to the technical field of data mining and relates to a big data clustering algorithm based on a cloud computing platform.

Background technology

[0002] As an interdisciplinary subject spanning statistics, machine learning, and data mining, cluster analysis has attracted many researchers, making it a very active research topic in data mining. So far, researchers at home and abroad have proposed many clustering algorithms. The main clustering methods can be divided into partition-based, hierarchical, density-based, grid-based, and model-based methods.

[0003] At the "Sixth International Symposium on Mobile Internet" held on August 21, 2012, Deng Kan, who holds a Ph.D. in computer robotics from Carnegie Mellon, said that discovering the value in big data depends on data mining algorithms, and there must be data mining algorithms plus cloud computing parallel...


Application Information

IPC(8): G06F17/30
CPC: G06F16/355
Inventors: 孟海东, 任敬佩, 宋宇辰
Owner: INNER MONGOLIA UNIV OF SCI & TECH