GMM clustering method for high-dimensional massive data under Hadoop framework

A clustering technology for massive data, applied in the computer field. It addresses the poor scalability of existing methods, their failure to handle the curse of dimensionality in large-scale data processing, and their inability to process massive data effectively. Its effects are to relieve the I/O bottleneck and to improve clustering efficiency.

Active Publication Date: 2015-09-30

西安电子科技大学青岛计算技术研究院


Problems solved by technology

The shortcoming of this prior method is that it cannot effectively handle massive data during clustering, and its efficiency is limited by its time and space complexity.

The disadvantage of this other prior clustering method is that it does not solve the curse of dimensionality in large-scale data processing, and its scalability is poor.

Embodiment Construction

[0053] The present invention will be further described below in conjunction with the accompanying drawings.

[0054] Referring to Figure 1, the present invention comprises the following steps:

[0055] Step 1, set up a local area network

[0056] Connect multiple computers to the same LAN, with each computer acting as a node, to establish a cluster whose nodes can communicate with one another.

[0057] Step 2, build Hadoop platform

[0058] Configure the Hadoop 0.20.2 files on each node in the cluster. Setting the attribute parameters dfs.namenode and dfs.datanode in these files makes the cluster contain one name node and multiple data nodes; setting the attribute parameters mapred.jobtracker and mapred.tasktracker makes the cluster contain one scheduling node and multiple task nodes. Together these settings establish an open-source Hadoop platform.
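As a rough illustration of this configuration step, the sketch below renders Hadoop-style `<configuration>` files from Python. It is an assumption-laden example, not the configuration used in the patent: the property names `fs.default.name`, `mapred.job.tracker` and `dfs.replication` are the ones found in stock Hadoop 0.20.x site files and differ from the attribute names quoted above, while the host name `master`, the ports and the helper `hadoop_site_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical host name of the node acting as name node and scheduling node.
MASTER = "master"

# core-site.xml: where clients and data nodes find the name node (assumed port).
core_site = hadoop_site_xml({"fs.default.name": f"hdfs://{MASTER}:9000"})

# mapred-site.xml: where task nodes find the scheduling node (assumed port).
mapred_site = hadoop_site_xml({"mapred.job.tracker": f"{MASTER}:9001"})

# hdfs-site.xml: number of copies kept of each block (assumed value).
hdfs_site = hadoop_site_xml({"dfs.replication": 3})
```

Each string would be written to the corresponding file in the Hadoop `conf` directory of every node, so that all nodes agree on which machine hosts the name node and the scheduling node.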

[0059] The specific steps to establish the Hadoop platform are as follows: first, install the Ubuntu 10.04 operating system for each ...

Abstract

The invention discloses a Gaussian Mixture Model (GMM) clustering method for high-dimensional massive data under the Hadoop framework. Addressing the shortcomings of existing clustering algorithms, the method clusters high-dimensional massive data by formulating the clustering problem on a distributed platform. The method comprises the following steps: 1. constructing a local area network; 2. establishing a Hadoop platform; 3. uploading data to the cluster; 4. initial clustering; 5. calculating the parameters and the discriminant function of each cluster; 6. determining whether clustering is complete; 7. clustering again; 8. calculating the mean and the weight of each class in the new clustering; 9. calculating the variance of each class in the new clustering; and 10. outputting the clustering results. By exploiting the MapReduce programming model of the Hadoop framework, processing the parallelizable parts of the clustering with parallel Map tasks, and using two Map/Reduce tasks to calculate the mean and the variance respectively, the method achieves efficient and accurate clustering with better scalability and fault tolerance.
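The two Map/Reduce tasks of steps 8 and 9 can be sketched locally as follows. This is a minimal illustration rather than the patented implementation: it assumes hard cluster assignments and per-dimension (diagonal) variances, and every name in it (`map_sum`, `map_sq_dev`, `reduce_sum`, `run_job`, `update_parameters`) is hypothetical. `run_job` is a single-process stand-in for Hadoop's shuffle and reduce phases.

```python
from collections import defaultdict
from functools import reduce


def map_sum(point, label):
    # Task 1 mapper: emit (cluster, (count, component-wise sum)).
    return label, (1, list(point))


def map_sq_dev(point, label, means):
    # Task 2 mapper: emit (cluster, (count, squared deviation from the mean)).
    mu = means[label]
    return label, (1, [(x - m) ** 2 for x, m in zip(point, mu)])


def reduce_sum(a, b):
    # Shared reducer: add counts and add vectors component-wise.
    return a[0] + b[0], [x + y for x, y in zip(a[1], b[1])]


def run_job(pairs, reducer):
    # Local stand-in for shuffle + reduce: group by key, then fold each group.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(reducer, values) for key, values in groups.items()}


def update_parameters(points, labels):
    n = len(points)
    # Task 1: weight and mean of each class.
    sums = run_job((map_sum(p, l) for p, l in zip(points, labels)), reduce_sum)
    weights = {k: c / n for k, (c, s) in sums.items()}
    means = {k: [x / c for x in s] for k, (c, s) in sums.items()}
    # Task 2: variance of each class, which needs the means from task 1.
    devs = run_job(
        (map_sq_dev(p, l, means) for p, l in zip(points, labels)), reduce_sum
    )
    variances = {k: [x / c for x in s] for k, (c, s) in devs.items()}
    return weights, means, variances


weights, means, variances = update_parameters(
    [(0.0, 0.0), (2.0, 2.0), (10.0, 10.0)], [0, 0, 1]
)
```

Splitting the update into two passes mirrors the abstract: the variance mappers need the class means, so the means must be fully reduced before the second task starts.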

Description

Technical field

[0001] The invention belongs to the field of computer technology, and further relates to a Gaussian Mixture Model (GMM) clustering method for high-dimensional massive data under the Hadoop framework in the field of data mining. The invention can complete the clustering of high-dimensional massive data conveniently and efficiently, and overcomes the inefficiency and the curse of dimensionality encountered when processing massive data in stand-alone mode.

Technical background

[0002] MapReduce (MR) is a computing framework widely used in massive data processing (Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters [J]. Communications of the ACM, 2008, 51(1): 107-113). The framework was invented by Google. It is a parallel programming model that has emerged in recent years. It puts parallelization, fault tolerance, data distribution, load balancing, etc. in a library, and boils down all operations of the syste...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F17/30

Inventor: 崔江涛, 李林, 司蓁, 彭延国, 史玮, 陈煜, 崔小利, 王博

Owner: 西安电子科技大学青岛计算技术研究院
