Mass spatial data density clustering method based on elastic distribution dataset

Spatial data and elastic distribution dataset (resilient distributed dataset, RDD) technology, applied to structured data retrieval, electronic digital data processing, and geographic information databases. It addresses the high I/O overhead and high latency of existing algorithms, which make them unusable for real-time clustering, and achieves low-latency, fast clustering.

Inactive Publication Date: 2017-08-11
杭州杨帆科技有限公司

AI Technical Summary

Problems solved by technology

In pursuit of high throughput, parallel clustering algorithms built on the Hadoop MapReduce framework must repeatedly read and write intermediate results to disk. This causes large I/O overhead and high latency, so such algorithms cannot be used for real-time clustering.

Embodiment Construction

[0036] The present invention is further described below with reference to a specific embodiment, but the scope of protection of the invention is not limited to it:

[0037] Embodiment: In this embodiment, suppose $D = \{p_1, p_2, \ldots, p_n\}$ is the data set to be clustered, and its k-dimensional space region S is the calculation space of D. Each $p_i$ ($1 \le i \le n$) is a data object in k-dimensional space, with a projection on each of the k dimension axes. The main steps of the mass spatial data density clustering method based on elastic distribution data sets are shown in Figure 1. The realization of the invention is based on the following basic concepts:

[0038] Definition 1 (dc neighborhood): the k-dimensional space within radius dc centered on a given data object $p_i$ is called the dc neighborhood of $p_i$. For a data object $p_j$ in the dc neighborhood, with $\mathrm{dist}(p_i, p_j)$ ...

[0039] Definition 2 (local density of $p_i$): for $p_i$, the number of data obje...
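The two definitions above can be illustrated with a minimal Python sketch. Because the source text is truncated, several details here are assumptions rather than the patent's wording: `dist` is taken to be Euclidean distance, the neighborhood boundary is taken as inclusive (`<= dc`), and a point is not counted as its own neighbor. The function names `dc_neighborhood` and `local_density` are hypothetical.

```python
import math

def dc_neighborhood(data, i, dc):
    """Indices j of the objects lying in the dc neighborhood of data[i].

    Assumptions (not confirmed by the source): Euclidean distance,
    inclusive boundary, and the point itself is excluded.
    """
    return [j for j, q in enumerate(data)
            if j != i and math.dist(data[i], q) <= dc]

def local_density(data, i, dc):
    """Local density of data[i], taken as the size of its dc neighborhood."""
    return len(dc_neighborhood(data, i, dc))

# Three points close together and one far away, in 2-dimensional space.
D = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
densities = [local_density(D, i, dc=1.0) for i in range(len(D))]
```

With these inputs the three nearby points each have two neighbors within dc, while the isolated point has none, which is the contrast a density-based method relies on.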

Abstract

The invention relates to a mass spatial data density clustering method based on an elastic distribution dataset (a resilient distributed dataset, RDD). Aiming at rapid mining of the aggregation characteristics of large-scale spatial data, the method follows the design idea of "RDD partitioning, intra-partition parallel computing, local result merging". First, the space is meshed automatically and the data are distributed according to their spatial distribution, so that the data volumes in the meshes are relatively balanced and the load across computing nodes is balanced. Second, a definition of local density suited to parallel computing is proposed and the way cluster centers are computed is improved, which removes the original algorithm's need to draw a decision graph in order to identify cluster center objects. Finally, fast clustering of large-scale spatial data is achieved through optimization strategies such as clustering and merging within and between meshes. The method clusters large-scale spatial data quickly and effectively, with high precision and better system processing performance than traditional density clustering methods.
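The "partition, intra-partition computing, local result merging" idea can be sketched in plain Python, which stands in here for Spark RDD operations. The abstract does not state the meshing rule or how neighbors across mesh borders are handled, so everything specific below is an assumption: cells are produced by recursive median splits on the widest axis, each cell also inspects a "halo" of points within dc of its bounding box, and density is a neighbor count under Euclidean distance. All function names are hypothetical.

```python
import math

def split_balanced(indices, points, max_size):
    """Halve a cell at the median of its widest axis until it holds <= max_size points."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(indices) <= max_size:
        return [indices]
    k = len(points[0])
    # Pick the axis along which this cell's points are most spread out.
    axis = max(range(k), key=lambda a: max(points[i][a] for i in indices)
                                       - min(points[i][a] for i in indices))
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return (split_balanced(ordered[:mid], points, max_size)
            + split_balanced(ordered[mid:], points, max_size))

def local_densities(cell, points, dc):
    """Neighbor counts for the points owned by one cell, using a dc-wide halo."""
    k = len(points[0])
    lo = [min(points[i][a] for i in cell) for a in range(k)]
    hi = [max(points[i][a] for i in cell) for a in range(k)]
    # Halo: every point within dc of the cell's bounding box on each axis.
    candidates = [j for j, q in enumerate(points)
                  if all(l - dc <= x <= h + dc for x, l, h in zip(q, lo, hi))]
    return {i: sum(1 for j in candidates
                   if j != i and math.dist(points[i], points[j]) <= dc)
            for i in cell}

def merged_densities(points, dc, max_size):
    """Run the local step on every cell and merge the per-cell results."""
    result = {}
    for cell in split_balanced(list(range(len(points))), points, max_size):
        result.update(local_densities(cell, points, dc))
    return result

pts = [(x * 0.3, (x * 7 % 5) * 0.4) for x in range(20)]
cells = split_balanced(list(range(len(pts))), pts, max_size=5)
densities = merged_densities(pts, dc=1.0, max_size=5)
```

Median splitting keeps cell sizes nearly equal regardless of how the points are distributed, which is one simple way to balance load. The halo makes each cell's counts agree with a global computation, at the cost of reading some points more than once.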

Description

Technical field

[0001] The invention relates to spatial data mining, in particular to a mass spatial data density clustering method based on an elastic distribution data set.

Background art

[0002] Cluster analysis plays an important role in spatial data mining. Spatial clustering analysis divides spatial data into several clusters according to their aggregation characteristics, so that data in the same cluster are more similar to one another while data in different clusters differ more. According to their underlying principles, clustering algorithms can be divided into partition-based, hierarchy-based, density-based, grid-based, and model-based clustering. The classic partitioning algorithm k-means and its improved variants k-medoids and k-means++ determine the cluster centers and classify the data through multiple iterations. The algorithm is simple to implement, but it is sensitive to noise and handle...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/29, G06F18/23
Inventor: 沈晔, 周天和, 李思剑, 任培荣
Owner: 杭州杨帆科技有限公司