C-DBSCAN-K clustering algorithm under Hadoop platform

The C-DBSCAN-K clustering algorithm is applied in the field of clustering algorithms. It addresses problems such as low clustering efficiency, and achieves faster operation and improved execution efficiency.

Active Publication Date: 2017-11-10
SANMENG TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a C-DBSCAN-K clustering algorithm under the Hadoop platform that solves the low clustering efficiency of the prior-art DBSCAN clustering algorithm on large-scale data sets.

Examples

Embodiment Construction

[0041] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0042] As shown in Figure 1, the C-DBSCAN-K clustering algorithm under the Hadoop platform includes the following steps:

[0043] Step 1, connect multiple computers to the same local area network, with each computer acting as a node, to establish a cluster whose nodes can communicate with each other;

[0044] Step 2, build the Hadoop platform for the cluster;

[0045] Step 2 is specifically: first, install the redhat6.2 operating system on each node in the cluster; then install the Hadoop2.2.0 files and the jdk1.8.0_65 files on each node in the cluster; configure the .bashrc file of the redhat6.2 system on each node so that the redhat6.2 system associates the Hadoop2.2.0 files on the node with the jdk1.8.0_65 files on the node; configure, in the Hadoop2.2.0 files on each node, hadoop-env.s...

Abstract

The invention discloses a C-DBSCAN-K clustering algorithm under a Hadoop platform. The algorithm comprises the following steps: step 1, establishing a cluster of nodes that can communicate with each other; step 2, building the Hadoop platform for the cluster; step 3, using the dfs -put command to upload the data set A to be clustered to HDFS; step 4, executing the Canopy clustering algorithm to perform an initial clustering of the data in A and obtain a coarse-grained clustering result; step 5, constructing a k-d tree on the clusters obtained in step 4; step 6, executing the DBSCAN algorithm on the clusters obtained in step 4, using the k-d tree to query the epsilon-neighborhood of each data object in each cluster, and outputting a clustering result; and step 7, merging the clusters from step 6 that contain the same data and outputting the final clustering result. The algorithm solves the low clustering efficiency of the prior-art DBSCAN clustering algorithm on large-scale data sets.
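Steps 4 to 7 can be illustrated with a minimal single-process sketch in Python. It is not the patented Hadoop implementation: it runs in memory on one machine, and the function names (`canopy`, `build_kdtree`, `region_query`, `dbscan`, `merge_overlapping`, `c_dbscan_k`) and the parameters `t1`, `t2`, `eps`, and `min_pts` are assumptions chosen for illustration. The text above does not specify thresholds or a distance measure, so Euclidean distance is assumed.

```python
import math

def canopy(points, t1, t2):
    """Step 4 (sketch): coarse, possibly overlapping groups of indices; assumes t1 > t2."""
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        dists = {i: math.dist(points[center], points[i]) for i in remaining}
        canopies.append([i for i in remaining if dists[i] <= t1])
        # Points within the tight threshold t2 can no longer seed a canopy.
        remaining = [i for i in remaining if dists[i] > t2]
    return canopies

def build_kdtree(points, idxs, depth=0):
    """Step 5 (sketch): k-d tree over the given indices, as nested tuples."""
    if not idxs:
        return None
    axis = depth % len(points[idxs[0]])
    idxs = sorted(idxs, key=lambda i: points[i][axis])
    mid = len(idxs) // 2
    return (idxs[mid], axis,
            build_kdtree(points, idxs[:mid], depth + 1),
            build_kdtree(points, idxs[mid + 1:], depth + 1))

def region_query(node, points, q, eps, out):
    """Collect the indices of all tree points within eps of q."""
    if node is None:
        return out
    i, axis, left, right = node
    if math.dist(points[i], q) <= eps:
        out.append(i)
    diff = q[axis] - points[i][axis]
    if diff <= eps:    # the eps-ball around q reaches into the left half-space
        region_query(left, points, q, eps, out)
    if diff >= -eps:   # the eps-ball around q reaches into the right half-space
        region_query(right, points, q, eps, out)
    return out

def dbscan(points, idxs, eps, min_pts):
    """Step 6 (sketch): DBSCAN on one canopy, using the k-d tree for neighborhoods."""
    tree = build_kdtree(points, idxs)
    label, clusters = {}, []
    for i in idxs:
        if i in label:
            continue
        neigh = region_query(tree, points, points[i], eps, [])
        if len(neigh) < min_pts:
            label[i] = -1  # provisional noise; may become a border point later
            continue
        cid = len(clusters)
        clusters.append({i})
        label[i] = cid
        queue = list(neigh)
        while queue:
            j = queue.pop()
            if label.get(j, -1) != -1:
                continue  # already assigned to a cluster
            label[j] = cid
            clusters[cid].add(j)
            nj = region_query(tree, points, points[j], eps, [])
            if len(nj) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nj)
    return clusters

def merge_overlapping(clusters):
    """Step 7 (sketch): merge clusters that share at least one point."""
    merged = []
    for c in clusters:
        c = set(c)
        keep = []
        for m in merged:
            if m & c:
                c |= m
            else:
                keep.append(m)
        merged = keep + [c]
    return merged

def c_dbscan_k(points, t1, t2, eps, min_pts):
    local = []
    for members in canopy(points, t1, t2):
        local.extend(dbscan(points, members, eps, min_pts))
    return merge_overlapping(local)
```

In this sketch each canopy is processed independently, which is the property a distributed version would rely on to run the per-canopy DBSCAN passes in parallel. Because canopies may overlap, a point can appear in more than one local cluster, and the final merge reconciles those overlaps.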

Description

Technical field

[0001] The invention belongs to the technical field of computer data mining, and relates to a C-DBSCAN-K clustering algorithm under the Hadoop platform.

Background

[0002] With the rapid development of Internet technology and the penetration of the Internet into people's lives, modern society has entered an information age, and a large amount of data is scattered all over the world. Faced with massive data, the first task is to classify it reasonably, and cluster analysis is one such method. Using clustering, people can intelligently and automatically identify valuable classification knowledge from a data set containing a large number of objects, obtain the distribution of the data, observe the differences between clusters, and on this basis select specific sets of clusters for deeper analysis. In business intelligence, image pattern recognition, Web search and other fields, clustering anal...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/182, G06F16/285
Inventor: 王彬, 安涛, 吕征
Owner: SANMENG TECH CO LTD