Big-data clustering algorithm based on cloud computing platform

A cloud computing platform and clustering algorithm technology, applied in computing, electrical digital data processing, special data processing applications, etc. It addresses three shortcomings of existing methods: they ignore the different effects that individual data points have on knowledge discovery tasks, they ignore the relative distance between data points, and they handle uneven data distribution poorly. The stated benefits are lower data processing costs, easier development, and improved processing capacity and speed.

Status: Inactive; Publication Date: 2014-06-04

INNER MONGOLIA UNIV OF SCI & TECH

Cites: 2; Cited by: 22

Problems solved by technology

When processing big data, methods based on sampling probability are generally adopted. However, sampling considers neither the overall relative distance between data points or intervals nor the uneven distribution of the data, which leads to hard partitioning of intervals.

Clustering, fuzzy concepts, and cloud models were later introduced to improve interval partitioning and achieved good results, but these methods do not consider the different effects that individual data points have on knowledge discovery tasks.

Embodiment Construction

[0044] The technical solutions of the present invention will be further described in detail below in conjunction with specific embodiments.

[0045] Refer to Figures 1, 2, and 3. In Figure 1, T: the distance between points; M: the number of points included in the cluster; N: the number of points in the cluster; SUM: the vector sum, per dimension, of all points; SUMSQ: the sum of squares, per dimension, of all points. In Figure 3, N1: the number of points in the initial data source; N2: the number of points in the new data source; K1: the number of initial clusters; K2: the number of new preprocessed clusters; Pi: the center point of an initial cluster; K = [(K1 + K2) / 2].
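The quantities N, SUM, and SUMSQ are enough to recover a cluster's centroid and spread without revisiting its points, and two such summaries can be merged by simple addition. The following is a minimal sketch of that idea, not the patent's implementation: the class name `ClusterSummary`, the `radius` definition (root of the summed per-dimension variance), and the reading of the brackets in K = [(K1 + K2) / 2] as integer floor are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusterSummary:
    """Hypothetical container for the per-cluster quantities N, SUM, SUMSQ."""
    n: int               # N: number of points in the cluster
    sum_: List[float]    # SUM: per-dimension vector sum of all points
    sumsq: List[float]   # SUMSQ: per-dimension sum of squares of all points

    @classmethod
    def from_points(cls, points: Sequence[Sequence[float]]) -> "ClusterSummary":
        dims = len(points[0])
        return cls(
            n=len(points),
            sum_=[sum(p[d] for p in points) for d in range(dims)],
            sumsq=[sum(p[d] ** 2 for p in points) for d in range(dims)],
        )

    def merge(self, other: "ClusterSummary") -> "ClusterSummary":
        # All three quantities are additive, so merging needs no raw points.
        return ClusterSummary(
            n=self.n + other.n,
            sum_=[a + b for a, b in zip(self.sum_, other.sum_)],
            sumsq=[a + b for a, b in zip(self.sumsq, other.sumsq)],
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.sum_]

    def radius(self) -> float:
        # Assumed spread measure: sqrt of the per-dimension variances summed,
        # using variance = SUMSQ/N - (SUM/N)^2 in each dimension.
        var = sum(q / self.n - (s / self.n) ** 2
                  for s, q in zip(self.sum_, self.sumsq))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


def combined_k(k1: int, k2: int) -> int:
    # Assumption: the brackets in K = [(K1 + K2) / 2] denote the integer part.
    return (k1 + k2) // 2
```

Because the summaries are additive, this kind of structure suits a setting where clusters are built on separate workers and combined afterwards.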

[0046] A big data clustering algorithm based on cloud computing platform, comprising the following steps:

[0047] (1) Preprocessing the raw data;

[0048] The basic idea is as follows: first, scan the entire data source for null values and fill in any missing values; the selection of missing values...
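The scan-and-fill step can be illustrated with a short sketch. The source text is truncated before it says how replacement values are chosen, so the column mean used below is purely a stand-in assumption, and the function name `fill_missing` is hypothetical.

```python
from typing import List, Optional


def fill_missing(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Scan all rows for None entries and replace each with its column mean.

    The column mean is an assumed strategy; the source does not state
    which replacement rule the method actually uses.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]
```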

Abstract

The invention discloses a big data clustering algorithm based on a cloud computing platform. The raw data is preprocessed; the data is divided into M sub-datasets, which are distributed to M Map functions, and local clustering is carried out on each sub-dataset; clusters with the same key are merged; if the actual number of clusters R is smaller than the number of clusters k, the number of representative points c and the shrink factor a are adjusted and clustering is carried out again until the termination condition is reached. If a new data set is generated, local clustering is carried out according to the following conditions: the number K of new data source centers is larger than the number K of clusters obtained before the update, or the number of new data source points is larger than the number of data source points before the update. The algorithm uses the parallel computing capacity of a high-performance cloud computing cluster to handle the mass of data that clustering must process, so that relationships in the data can be mined rapidly and efficiently.
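The split, local-cluster, and merge-by-key flow described in the abstract can be sketched in plain Python without a MapReduce framework. This is an illustration under stated assumptions, not the patented method: the round-robin split, the leader-style local clustering with a distance threshold `t`, and the grid-cell key are all hypothetical choices, since the abstract does not specify them. The adjustment of c and a when R < k is not shown.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Sequence[float]
Summary = Tuple[int, List[float]]  # (point count, per-dimension sum)


def split(points: List[Point], m: int) -> List[List[Point]]:
    """Divide the data into m sub-datasets (round-robin; an assumed scheme)."""
    return [points[i::m] for i in range(m)]


def map_local_cluster(chunk: List[Point], t: float) -> Iterator[Tuple[tuple, Summary]]:
    """Map step: local clustering of one sub-dataset.

    Assumed method: a point joins the first cluster whose centroid lies
    within distance t, otherwise it starts a new cluster.
    """
    clusters: List[list] = []  # each entry: [count, sum_vector]
    for p in chunk:
        for c in clusters:
            centroid = [s / c[0] for s in c[1]]
            if math.dist(p, centroid) <= t:
                c[0] += 1
                c[1] = [s + x for s, x in zip(c[1], p)]
                break
        else:
            clusters.append([1, list(p)])
    for n, sum_vec in clusters:
        # Hypothetical key: the grid cell (side length t) holding the centroid.
        key = tuple(math.floor(s / n / t) for s in sum_vec)
        yield key, (n, sum_vec)


def reduce_merge(pairs) -> Dict[tuple, Summary]:
    """Reduce step: merge local clusters that share the same key."""
    merged: Dict[tuple, Summary] = {}
    for key, (n, sum_vec) in pairs:
        if key in merged:
            n0, s0 = merged[key]
            merged[key] = (n0 + n, [a + b for a, b in zip(s0, sum_vec)])
        else:
            merged[key] = (n, list(sum_vec))
    return merged


def cluster(points: List[Point], m: int, t: float) -> Dict[tuple, Summary]:
    pairs = []
    for chunk in split(points, m):  # on a real platform, chunks run in parallel
        pairs.extend(map_local_cluster(chunk, t))
    return reduce_merge(pairs)
```

For example, `cluster(points, m=2, t=1.0)` returns a dictionary whose length plays the role of the actual cluster count R that the abstract compares against k.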

Description

Technical field

[0001] The invention belongs to the technical field of data mining and relates to a big data clustering algorithm based on a cloud computing platform.

Background technique

[0002] As an interdisciplinary subject spanning statistics, machine learning, and data mining, cluster analysis has attracted many researchers, making it a very active topic in data mining research. So far, researchers at home and abroad have proposed many clustering algorithms. The main clustering methods can be divided into partition-based methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.

[0003] At the "Sixth International Symposium on Mobile Internet" held on August 21, 2012, Deng Kan, a Ph.D. in computer robotics from Carnegie Mellon, said that discovering the value in big data depends on data mining algorithms, and there must be data mining algorithms plus cloud computing parallel...

Application Information

IPC(8): G06F17/30

CPC: G06F16/355

Inventors: 孟海东, 任敬佩, 宋宇辰

Owner: INNER MONGOLIA UNIV OF SCI & TECH
