Big-data-oriented distributed density clustering method

A density clustering technology, applied in the field of big data processing, that addresses problems such as low algorithm efficiency.

Active Publication Date: 2015-05-13
浙江银江交通技术有限公司

AI Technical Summary

Problems solved by technology

[0007] To address the low efficiency of existing density clustering methods when processing large data sets, the present invention proposes a distributed density clustering method for big data, which uses distributed...



Examples


[0118] Example: An application that clusters travel density points from floating car data is used to further illustrate the method.

[0119] Referring to Figure 5, the main steps of the method are:

[0120] Step 1: Virtualize the environment

[0121] On one blade server, 8 virtual machines are created. The virtual machines are allocated on different hard disks and assigned IP addresses so that they can communicate with one another. The system is CentOS 6.5 with four 64-bit CPUs and 8 GB of memory.

[0122] Step 2: Build the Hadoop platform

[0123] Install Hadoop-2.2.0 on each virtual machine. For each node in the cluster, edit the configuration files in the /etc/hadoop directory and set the attribute parameters dfs.namenode and dfs.datanode so that the cluster contains two master nodes (one active node and one hot standby node) and multiple data nodes (datanode); through the setting of attribute parameters mapred.jobtracker and mapred.tasktrack...


Abstract

A big-data-oriented distributed density clustering method comprises the following steps: first, the environment is virtualized and a Hadoop platform is established; second, data are pre-processed and loaded, wherein an original data table is extracted from a database, the needed fields are selected with a sqoop-query command, and the pre-processed data are extracted directly to HDFS; third, a distance matrix is calculated; fourth, a cut-off distance and the point densities are calculated; fifth, the minimum distance between each point and any higher-density point is calculated; sixth, the critical density point and its critical distance are determined; seventh, the points are clustered to obtain the final clustering result; eighth, outliers are removed. The method is fast and effective when processing a big data set, and the clustering result is robust to the input parameters.
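The third through seventh steps resemble density-peak style clustering, so a single-machine sketch may help make them concrete. The following is only an illustration under stated assumptions: the function name `density_peaks`, the percentile rule for the cut-off distance, and the "top `n_clusters` by density times distance" rule for picking cluster centers are all hypothetical stand-ins, since the abstract does not give the patent's actual formulas for the cut-off distance or the critical density point. The distributed (Hadoop) execution and the outlier-removal step are not modeled.

```python
import numpy as np

def density_peaks(points, cutoff_percentile=2.0, n_clusters=2):
    """Single-machine sketch of steps 3-7; all parameter rules are assumptions."""
    n = len(points)

    # Step 3: pairwise Euclidean distance matrix.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 4: cut-off distance taken as a percentile of all pairwise
    # distances (assumed rule), then density = neighbors within the cut-off.
    d_c = np.percentile(dist[np.triu_indices(n, k=1)], cutoff_percentile)
    rho = (dist < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself

    # Step 5: distance from each point to its nearest higher-density point.
    order = np.argsort(-rho, kind="stable")  # indices in descending density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()  # densest point: use its largest distance
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Step 6 (simplified stand-in): choose centers with the largest rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    # Step 7: each remaining point inherits the label of its nearest
    # higher-density point, visited in descending density order.
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(30, 2))
    blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(30, 2))
    labels, rho, delta = density_peaks(np.vstack([blob_a, blob_b]), 5.0, 2)
    print(labels)
```

The full distance matrix needs memory quadratic in the number of points, which is one reason a method aimed at big data would split this work across a cluster.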

Description

Technical field

[0001] The invention relates to the field of big data processing, and in particular to a distributed density clustering method.

Background

[0002] Density-based clustering methods view clusters as regions of high-density objects in data space, separated by regions of low density. They can discover clusters of arbitrary shape, identify noise points in data sets, are insensitive to the order of input objects, and scale well, so they have important applications in cluster analysis. However, most density-based clustering algorithms cannot find clusters in data sets of non-uniform density, and they have shortcomings such as sensitivity to input parameters and complex iteration on large-scale data, which limits their application to some extent.

[0003] DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. The result of DBSCAN clustering is a...
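To make the density-based idea in the background concrete, here is a minimal single-machine DBSCAN sketch. It is not taken from the patent: the function name `dbscan` and the parameter names `eps` and `min_pts` are illustrative choices, and the brute-force distance matrix is used only for brevity. Changing `eps` or `min_pts` changes which points count as core points, which is the parameter sensitivity the background mentions.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch; returns one label per point, -1 meaning noise."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # A point's eps-neighborhood includes the point itself.
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbors])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        # Start a new cluster only from an unlabeled core point.
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(30, 2))
    blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(30, 2))
    outlier = np.array([[20.0, 20.0]])
    print(dbscan(np.vstack([blob_a, blob_b, outlier]), eps=1.0, min_pts=4))
```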


Application Information

IPC(8): G06F17/30
CPC: G06F16/182, G06F16/2471, G06F18/2321
Inventor: 王兴武, 李建元, 赵贝贝
Owner: 浙江银江交通技术有限公司