Density-based large-scale spatial data clustering algorithm K-DBSCAN

A spatial data clustering technology, applied in computing, computer components, instruments, etc., that addresses the problem that DBSCAN cannot be applied to large data sets.

Active Publication Date: 2017-05-24
CHINA TOBACCO GUANGXI IND

AI Technical Summary

Problems solved by technology

[0004] The technical problem the present invention aims to solve is that, in the prior art, DBSCAN cannot be applied when a large amount of data needs to be clustered.

Examples

Embodiment 1

[0119] This embodiment provides a density-based large-scale spatial data clustering algorithm, K-DBSCAN, which, as shown in Figure 1, includes:

[0120] S101: Divide the data set into K1 data subsets, where K1 is a natural number greater than 1.

[0121] S102: Obtain the reachable subsets of each data subset, and form a reachable subset index corresponding to that data subset.

[0122] S103: Perform density-based spatial clustering on the data of each data subset according to the reachable subset index.

[0123] In the above scheme, the data set is first divided into multiple data subsets, a reachable subset index is then built to guide clustering, and finally a density-based spatial clustering algorithm is applied to each data subset. This reduces the computational complexity of density-based spatial data clustering, making the algorithm applicable to clustering massive data sets.
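
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the grid partition in `partition_by_grid` stands in for the division of S101, the bounding-box test in `reachable_index` is one assumed way to decide which subsets are reachable from each other, and all function names as well as the `cell` parameter are hypothetical. `radius` and `min_n` correspond to the radius R and minimum neighbor number Min_N named in the abstract; counting a point as its own neighbor is likewise an assumption.

```python
from collections import deque
from math import hypot


def partition_by_grid(points, cell):
    """S101 stand-in (assumption): bucket point indices into square grid cells."""
    subsets = {}
    for idx, (x, y) in enumerate(points):
        subsets.setdefault((int(x // cell), int(y // cell)), []).append(idx)
    return subsets


def reachable_index(subsets, points, radius):
    """S102 stand-in (assumption): subset B is reachable from subset A when
    their bounding boxes lie within `radius` of each other on both axes."""
    boxes = {}
    for key, idxs in subsets.items():
        xs = [points[i][0] for i in idxs]
        ys = [points[i][1] for i in idxs]
        boxes[key] = (min(xs), min(ys), max(xs), max(ys))
    index = {}
    for a, (ax0, ay0, ax1, ay1) in boxes.items():
        index[a] = [
            b for b, (bx0, by0, bx1, by1) in boxes.items()
            if bx0 <= ax1 + radius and bx1 >= ax0 - radius
            and by0 <= ay1 + radius and by1 >= ay0 - radius
        ]
    return index


def k_dbscan_sketch(points, radius, min_n, cell):
    """S103 stand-in: plain DBSCAN whose neighbor search only scans the
    subsets listed in the reachable subset index."""
    subsets = partition_by_grid(points, cell)
    index = reachable_index(subsets, points, radius)
    owner = {i: key for key, idxs in subsets.items() for i in idxs}

    def neighbors(i):
        px, py = points[i]
        return [
            j for key in index[owner[i]] for j in subsets[key]
            if hypot(points[j][0] - px, points[j][1] - py) <= radius
        ]

    labels = [None] * len(points)  # None = unvisited, -1 = noise
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)  # includes point i itself (assumption)
        if len(seeds) < min_n:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_n:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


pts = [(0, 0), (0.5, 0), (1, 0), (10, 10), (10.5, 10), (11, 10), (50, 50)]
labels = k_dbscan_sketch(pts, radius=0.6, min_n=2, cell=5.0)
# labels == [0, 0, 0, 1, 1, 1, -1]
```

Restricting each neighbor query to reachable subsets is what keeps the search from scanning the whole data set, and a cluster can still grow across subset boundaries because neighboring subsets appear in each other's index entries.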

Embodiment 2

[0125] In step S101 above, the data set can be divided in various ways, provided that each resulting data subset covers a specific spatial region and contains data points. This embodiment provides one implementation, which uses an improved k-means clustering algorithm to spatially divide and cluster the data set.

[0126] Specifically, as shown in Figure 2, the division includes the following steps:

[0127] S201: Calculate the maximum longitude LN_max, the minimum longitude LN_min, the maximum latitude LA_max and the minimum latitude LA_min over all data points in the data set. Obtain the maximum spatial range value of the data set, D_max = LN_max + LA_max, and the minimum spatial range value, D_min = LN_min + LA_min, and then calculate the actual spatial range value of the data set, D_len = D_max - D_min.
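
As a small worked example of S201, the sketch below computes the three range values from a list of (longitude, latitude) pairs. The function name `spatial_extent` and the sample coordinates are illustrative assumptions; only the formulas come from the text.

```python
def spatial_extent(points):
    """Return (D_max, D_min, D_len) for an iterable of (longitude, latitude)."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    d_max = max(lons) + max(lats)  # D_max = LN_max + LA_max
    d_min = min(lons) + min(lats)  # D_min = LN_min + LA_min
    return d_max, d_min, d_max - d_min  # D_len = D_max - D_min


# Hypothetical sample points, not data from the patent.
sample = [(108.0, 22.0), (110.0, 25.0), (109.0, 23.0)]
d_max, d_min, d_len = spatial_extent(sample)
# d_max == 135.0, d_min == 130.0, d_len == 5.0
```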

[0128] S202: According to the actual spatial range value D_len, the data points in the data set are initially divided, and the specific a...

Embodiment 3

[0212] Figure 9 is a schematic diagram of the hardware structure of an electronic device that implements the density-based large-scale spatial data clustering algorithm K-DBSCAN provided in this embodiment. As shown in Figure 9, the device includes:

[0213] One or more processors 701 and a memory 702; Figure 9 takes one processor 701 as an example.

[0214] The device for executing the density-based large-scale spatial data clustering algorithm K-DBSCAN may further include an input device 703 and an output device 704.

[0215] The processor 701, the memory 702, the input device 703 and the output device 704 may be connected via a bus or in other ways; Figure 9 takes connection via a bus as an example.

[0216] The memory 702, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the density-based large-scale spatial data aggregation in the embodim...

Abstract

The invention relates to a density-based large-scale spatial data clustering algorithm, K-DBSCAN. The algorithm comprises the following steps: the density-based clustering parameters are preset, namely the radius R, the minimum neighbor number Min_N, the pre-division number K and the number of division iterations T; the data set is divided into K1 subsets according to its spatial distribution; the reachable subsets of each data subset are calculated to form a reachable subset index; and, based on the reachable subset index, density-based spatial clustering is carried out on the data of each subset. According to the technical scheme provided by the invention, density-based unsupervised and semi-supervised clustering can be carried out on large spatial data sets, with efficient and fast parallel clustering computation.

Description

Technical field

[0001] The invention relates to the fields of data mining and big data analysis, in particular to a density-based large-scale spatial data clustering algorithm K-DBSCAN.

Background technique

[0002] Spatial data clustering is widely used in many information technology fields, such as data mining, pattern recognition, machine learning, artificial intelligence, visual analysis, geographic information systems, etc. Especially in the era of big data, it can be used to explore meaningful but unknown potential patterns and phenomena, and can be applied to many disciplines, such as social network analysis, economic network analysis, transportation network analysis, meteorological analysis, smart city development, etc. There are three traditional spatial data clustering methods based on distance calculation: 1) partition-based clustering; 2) density-based clustering; 3) hierarchical clustering.

[0003] Density-based clustering can effectively process noise point...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/2321, G06F18/23213
Inventor: 邓超, 陈智斌, 郭晓惠, 农英雄, 韦屹, 黄聪, 汪倍贝, 钱方远, 李喆
Owner: CHINA TOBACCO GUANGXI IND