Distributed density peak value clustering algorithm based on z value

A density peak clustering technique, applied in the field of big data processing, that addresses problems such as increased computing overhead, highly random seed objects, and unbalanced load across computer clusters.

Inactive Publication Date: 2017-12-26

SHENYANG POLYTECHNIC UNIV



Problems solved by technology

To improve the efficiency of the algorithm, the paper "EDDPC: An Efficient Distributed Density Central Clustering Algorithm" uses Voronoi partitioning to divide the data set into disjoint groups, which are then sent to different machines for execution. The weakness of this grouping method is that the seed objects are chosen with a high degree of randomness, which may cause unbalanced load across the computer cluster.

Second, when calculating the density value ρ and the repulsion group value δ, a large number of redundant copies of the data are still produced, which increases the computational overhead.
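The load-balance concern can be illustrated with a small sketch. It is not EDDPC's implementation; it only assumes that Voronoi grouping means "assign each point to its nearest seed" and that seeds are drawn at random from the data. The function name `voronoi_partition` and the synthetic 2-D data are hypothetical and chosen for illustration.

```python
import math
import random


def voronoi_partition(points, seeds):
    """Assign every point to the group of its nearest seed (Euclidean distance)."""
    groups = [[] for _ in seeds]
    for p in points:
        nearest = min(range(len(seeds)), key=lambda k: math.dist(p, seeds[k]))
        groups[nearest].append(p)
    return groups


# Hypothetical data: 1,000 uniform random 2-D points, 4 seeds picked at random.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
seeds = rng.sample(points, 4)

# Group sizes depend entirely on where the random seeds happen to fall,
# so nothing forces the four groups to be of similar size.
sizes = [len(g) for g in voronoi_partition(points, seeds)]
print(sizes)
```

Running the sketch with different random seeds changes the group sizes, which is the kind of variation that can translate into uneven work per machine.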


Embodiment Construction

[0037] The specific embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0038] Referring to Figures 1 to 4, high-dimensional big data is clustered using the z-value-based density peak clustering method, which includes the following steps:

[0039] Step 1: Data set selection.

[0040] This implementation uses three data sets: KDD'99_10%, FCoverType and Facial. The KDD'99_10% data set consists of 494,021 data points with 42 attributes, such as connection time and transmitted data volume; this implementation uses 34 real-valued attributes from it. FCoverType is a data set of 581,012 data points with 54 attributes, including latitude and longitude. The Facial data set consists of 27,936 face images, each of which comprises 300 pixels.

[0041] Step 2: Construction of software and hardware environment.

[0042] Step 2.1: Build a hardware computing platform.

[0043] Under the Ubu...


Abstract

A distributed density peak clustering algorithm based on the z value is disclosed. The algorithm comprises the following steps: (1) preparing a data set; (2) constructing the software and hardware environment; (3) preprocessing the data; (4) sampling the data set; (5) determining the cut-off distance parameter of the density formula in the z-value-based density clustering algorithm and selecting the subgroup quantile; (6) sending the points in the data set to different groups according to the size of their z values; (7) calculating the density value in the z-value-based distributed density peak clustering algorithm; (8) calculating the global outlier in the z-value-based distributed density peak clustering algorithm; and (9) clustering big data with the z-value-based density peak clustering algorithm in a Hadoop environment. By exploiting the properties of the z value and applying a filtering strategy when subgroups exchange data, the algorithm avoids a large number of unnecessary distance calculations and much of the data transmission cost, which effectively improves execution efficiency.
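Steps (4) to (6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: it assumes that "z value" refers to the Z-order (Morton) code obtained by interleaving the bits of non-negative integer coordinates, and that the group boundaries are quantiles of the z values of a sample. The names `z_value`, `quantile_pivots` and `assign_group`, and the `bits` parameter, are hypothetical.

```python
from bisect import bisect_right


def z_value(coords, bits=8):
    """Interleave the low `bits` bits of each non-negative integer coordinate."""
    z = 0
    dims = len(coords)
    for bit in range(bits):
        for d, c in enumerate(coords):
            # Bit `bit` of dimension `d` goes to position bit * dims + d.
            z |= ((c >> bit) & 1) << (bit * dims + d)
    return z


def quantile_pivots(sample_z, num_groups):
    """Pick num_groups - 1 boundaries that split the sorted sample evenly."""
    s = sorted(sample_z)
    return [s[(i * len(s)) // num_groups] for i in range(1, num_groups)]


def assign_group(z, pivots):
    """Return the index of the group whose z-value range contains z."""
    return bisect_right(pivots, z)


# Hypothetical usage: quantized 2-D points, boundaries from a sample, 4 groups.
points = [(x, y) for x in range(16) for y in range(16)]
sample = points[::5]
pivots = quantile_pivots([z_value(p, bits=4) for p in sample], 4)
groups = [assign_group(z_value(p, bits=4), pivots) for p in points]
```

Because the boundaries are taken from the sample's quantiles rather than from random seeds, the groups tend to receive similar numbers of points when the sample is representative.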

Description

Technical field

[0001] The invention relates to the field of big data processing and to distributed density clustering algorithms, in particular to a distributed density peak clustering algorithm based on the z value.

Background technique

[0002] Cluster analysis is a widely studied problem in the fields of data mining and pattern recognition. The density peak clustering algorithm (Density Peaks Clustering, DPC), published in the academic journal "Science", is a typical density-based clustering algorithm. It clusters the data set according to the property that each cluster has a point of maximum density. The algorithm can find clusters of any shape and does not depend on the dimension of the data set. Implementing it only requires calculating two attribute values for each point: (1) the density value ρ (by a certain range (2) the repulsion group value δ (characterized by the minimum value of the distance from a point whose density value is...
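The two per-point attributes can be sketched on a single machine as below. Since the passage above is cut off, the sketch falls back on the commonly used DPC definitions as an assumption: ρ counts the other points closer than a cut-off distance, and δ is the minimum distance to any point of higher density, with the densest point receiving its largest distance to any other point. The function name `dpc_rho_delta` and the parameter `d_c` are hypothetical, and the quadratic distance table is only suitable for small inputs.

```python
import math


def dpc_rho_delta(points, d_c):
    """Return (rho, delta) lists for a small in-memory point set."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho[i]: number of other points strictly closer than the cut-off d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    delta = []
    for i in range(n):
        # Distances to all points that are denser than point i.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser point gets its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical usage: a tight cluster around (0.4, 0.4) plus one far point.
pts = [(0, 0), (0, 1), (1, 0), (0.4, 0.4), (10, 10)]
rho, delta = dpc_rho_delta(pts, d_c=1.0)
```

Points with both a high ρ and a high δ are the candidates for cluster centers; here that is the point at (0.4, 0.4).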


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06K9/62, G06F17/30

CPC: G06F16/285, G06F18/23

Inventor: 段勇, 卢晶

Owner: SHENYANG POLYTECHNIC UNIV
