Big-data-oriented distributed density clustering method

A density clustering technology for the field of big data processing that addresses problems such as low algorithmic efficiency.

Active Publication Date: 2015-05-13

浙江银江交通技术有限公司 (Zhejiang Yinjiang Traffic Technology Co., Ltd.)

Cites: 3; Cited by: 19

AI Technical Summary

Problems solved by technology

[0007] Existing density clustering methods are inefficient on large data sets. To solve this problem, the present invention proposes a big-data-oriented distributed density clustering method. It uses distributed computation to calculate each point's density and its minimum distance to a point of higher density, and then clusters the points using threshold values for these two quantities. The method is fast and effective on large data sets, and its clustering result is robust to the choice of input parameters.
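[0007] states that point density and the minimum distance to a higher-density point are computed in a distributed way, but this excerpt does not show how the work is split. What follows is one possible MapReduce-style decomposition, emulated in a single Python process. The point set is cut into blocks, each mapper handles one block pair of the distance matrix, and a reducer merges the partial results per point. The round-robin partitioning, the cut-off kernel (count of neighbors closer than `d_c`), and all function names are hypothetical choices for illustration, not the patented implementation.

```python
import math
from collections import defaultdict
from itertools import product


def partition(points, n_parts):
    """Split (index, point) pairs into n_parts blocks, round-robin (hypothetical scheme)."""
    blocks = [[] for _ in range(n_parts)]
    for i, p in enumerate(points):
        blocks[i % n_parts].append((i, p))
    return blocks


def map_density(block_a, block_b, d_c):
    """Mapper: for each i in block_a, count neighbors in block_b closer than d_c."""
    for i, p in block_a:
        yield i, sum(1 for j, q in block_b if j != i and math.dist(p, q) < d_c)


def map_delta(block_a, block_b, rho):
    """Mapper: for each i, emit (distance to nearest strictly denser point in block_b,
    distance to farthest point in block_b)."""
    for i, p in block_a:
        nearest, farthest = math.inf, 0.0
        for j, q in block_b:
            d = math.dist(p, q)
            farthest = max(farthest, d)
            if rho[j] > rho[i]:
                nearest = min(nearest, d)
        yield i, (nearest, farthest)


def reduce_delta(values):
    """Reducer: overall nearest denser point; points with no denser point
    fall back to their farthest distance."""
    nearest = min(v[0] for v in values)
    farthest = max(v[1] for v in values)
    return farthest if nearest == math.inf else nearest


def run_job(blocks, mapper, reducer, *args):
    """Emulate one MapReduce job: map every block pair, shuffle by key, reduce per key."""
    grouped = defaultdict(list)
    for block_a, block_b in product(blocks, repeat=2):
        for key, value in mapper(block_a, block_b, *args):
            grouped[key].append(value)
    return {key: reducer(values) for key, values in grouped.items()}


def density_and_delta(points, d_c, n_parts=4):
    blocks = partition(points, n_parts)
    rho = run_job(blocks, map_density, sum, d_c)           # job 1: local density
    delta = run_job(blocks, map_delta, reduce_delta, rho)  # job 2: distance to a denser point
    return rho, delta
```

For example, `density_and_delta([(0, 0), (0.2, 0), (-0.2, 0), (5, 5), (5.2, 5), (4.8, 5)], d_c=0.3)` gives density 2 to the two central points and 1 to the others. On a real Hadoop cluster each block pair would be a separate map task; here `run_job` only imitates the shuffle-and-reduce step. Points of equal density are not treated as "higher", so the densest points fall back to their largest distance, which is one common convention.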

Examples

[0118] Example: this method is further illustrated by an application example that clusters travel density points derived from floating car data.

[0119] Referring to Figure 5, the main steps of this method are:

[0120] Step 1: Virtualize the environment

[0121] Eight virtual machines are created on one blade server. Each virtual machine is placed on a different hard disk and assigned an IP address so that the machines can communicate with one another. The system runs CentOS 6.5 with four 64-bit CPUs and 8 GB of memory.

[0122] Step 2: Build the Hadoop platform

[0123] Install Hadoop-2.2.0 on each virtual machine and edit the configuration files in the /etc/hadoop directory of every node in the cluster. Set the attribute parameters dfs.namenode and dfs.datanode in these files so that the cluster contains two master nodes (one active node and one hot-standby node) and multiple data nodes (datanode); through the setting of attribute parameters mapred.jobtracker and mapred.tasktrack...

Abstract

A big-data-oriented distributed density clustering method comprises the following steps. First, the environment is virtualized and a Hadoop platform is built. Second, the data are preprocessed and loaded: the original data table is extracted from a database, the required fields are selected with a Sqoop query command, and the preprocessed data are loaded directly into HDFS. Third, the distance matrix is calculated. Fourth, the cut-off distance and the point density are calculated. Fifth, the minimum distance between each point and any point of higher density is calculated. Sixth, the critical density and critical distance that identify critical density points are determined. Seventh, the points are clustered to obtain the final clustering result. Eighth, outlier points are removed. The method is fast and effective on large data sets, and its clustering result is robust to the input parameters.
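Steps three to eight of the abstract closely resemble density-peaks clustering (Rodriguez and Laio, 2014). In that approach each point gets a local density ρ and a distance δ to its nearest denser point, and points that are high in both are taken as cluster centers. What follows is a minimal single-machine sketch of that idea. It assumes a cut-off kernel for ρ, a percentile heuristic for the cut-off distance d_c, and hypothetical parameters `rho_min` and `delta_min` for the critical density and critical distance, plus `rho_noise` for outlier removal. The patent's exact formulas and thresholds are not given in this excerpt.

```python
import math


def pairwise(points):
    """Step 3: full distance matrix (brute force, single machine)."""
    return [[math.dist(p, q) for q in points] for p in points]


def cutoff_distance(dist, percent=2.0):
    """Step 4 heuristic (assumption): d_c is the distance below which about
    `percent`% of all point pairs fall."""
    n = len(dist)
    values = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    k = max(0, min(len(values) - 1, round(len(values) * percent / 100) - 1))
    return values[k]


def density_peaks(points, rho_min, delta_min, d_c=None, rho_noise=0):
    dist = pairwise(points)
    n = len(points)
    if d_c is None:
        d_c = cutoff_distance(dist)
    # Step 4: local density = number of other points closer than d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Step 5: delta = distance to the nearest point ranked earlier in density order
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Step 6: critical points (hypothetical rule: both thresholds must be met)
    centers = [i for i in order if rho[i] >= rho_min and delta[i] >= delta_min]
    if order[0] not in centers:
        centers.insert(0, order[0])  # densest point must be a center
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Step 7: in decreasing density, each point inherits its parent's label
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    # Step 8: mark low-density points as outliers (hypothetical rule)
    for i in range(n):
        if rho[i] < rho_noise:
            labels[i] = -1
    return labels, rho, delta
```

On two tight groups plus one isolated point, `pts = [(0, 0), (0.2, 0), (-0.2, 0), (0, 0.2), (5, 5), (5.2, 5), (4.8, 5), (5, 5.2), (10, 0)]`, the call `density_peaks(pts, rho_min=2, delta_min=1.0, d_c=0.25, rho_noise=1)` returns labels `[0, 0, 0, 0, 1, 1, 1, 1, -1]`: two clusters, with the isolated point marked as an outlier. Assigning points in decreasing density order guarantees that each point's parent already carries a label.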

Description

Technical field

[0001] The invention relates to the field of big data processing, and in particular to a distributed density clustering method.

Background art

[0002] Density-based clustering methods regard clusters as regions of high-density objects in the data space, separated by regions of low density. They can discover clusters of arbitrary shape, identify noise points in a data set, are insensitive to the order of input objects, and scale well, so they have important applications in cluster analysis. However, most density-based clustering algorithms cannot find clusters in data sets of non-uniform density, are sensitive to input parameters, and require complex iteration over large-scale data. These shortcomings limit the application of density-based algorithms to a certain extent.

[0003] DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. The result of DBSCAN clustering is a...
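For comparison with the invention, a minimal textbook DBSCAN sketch (not taken from the patent) shows the two input parameters to which the background says density-based algorithms are sensitive: the neighborhood radius `eps` and the minimum neighbor count `min_pts`. The neighbor count includes the point itself, following the original DBSCAN definition; the brute-force neighbor search is a simplification.

```python
import math
from collections import deque

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
    return labels
```

With `eps=1.5` and `min_pts=3`, two compact triples of points form clusters 0 and 1 and a distant point is labeled `-1`; changing `eps` or `min_pts` can merge, split, or dissolve clusters, which is the parameter sensitivity noted in [0002].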

Application Information

IPC(8): G06F17/30

CPC: G06F16/182; G06F16/2471; G06F18/2321

Inventors: 王兴武 (Wang Xingwu), 李建元 (Li Jianyuan), 赵贝贝 (Zhao Beibei)

Owner: 浙江银江交通技术有限公司 (Zhejiang Yinjiang Traffic Technology Co., Ltd.)
