Rapid DBSCAN clustering method based on grids

A grid-based clustering method, applicable to instruments, character and pattern recognition, computer components, etc., that addresses the high time complexity of the DBSCAN clustering method.

Pending Publication Date: 2020-09-22
中科海拓(北京)科技有限公司

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to address the high time complexity of the DBSCAN clustering method by proposing a grid-based fast DBSCAN clustering method and system.


Embodiment Construction

[0020] The flow chart of the method is shown in Figure 1.

[0021] 1. Read data points

[0022] Read data points that need to be clustered from a database, file, or network.

[0023] 2. Set grid resolution

[0024] Each grid occupies N units of memory. With a total memory budget of M, at most M / N grids can be created. The minimum grid resolution (that is, the side length of a grid) is therefore:

[0025] L = sqrt(Area × N / M)

[0026] where Area is the area of the spatial extent in which the data points lie.

[0027] Because the grid resolution does not need to be smaller than eps, the final grid resolution is:

[0028] L=max(L,eps)
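
The resolution rule in step 2 can be sketched as a small function. This is an illustrative sketch, not the patent's implementation: it assumes the minimum side length is sqrt(Area × N / M), which follows from fitting at most M / N square grids into Area, and the function and parameter names are hypothetical.

```python
import math


def grid_resolution(area, total_memory, memory_per_grid, eps):
    """Return the grid side length L = max(sqrt(area * N / M), eps).

    area            -- area of the spatial extent of the data points
    total_memory    -- memory budget M
    memory_per_grid -- memory N occupied by one grid
    eps             -- DBSCAN neighbourhood radius
    All names are hypothetical; the source only gives the symbols.
    """
    if min(area, total_memory, memory_per_grid, eps) <= 0:
        raise ValueError("all arguments must be positive")
    max_grids = total_memory / memory_per_grid   # at most M / N grids fit in memory
    l_min = math.sqrt(area / max_grids)          # smallest side length that stays within budget
    return max(l_min, eps)                       # no need to go finer than eps
```

When memory is plentiful the memory bound falls below eps and the function returns eps; when memory is tight the memory bound dominates and the grids become coarser than eps.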

[0029] 3. Construct the grid object and set the member variables of the grid object

[0030] Construct a grid object according to the set grid resolution and set its member variables: pointers to the 8 grids surrounding the grid (see Figure 3).
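
One way step 3 might look in code is sketched below for 2-D points. The class name `GridCell`, the functions `build_grid` and `eps_neighbors`, and the ordering of the 8 neighbour slots are all assumptions made for illustration; the source only states that each grid holds 8 pointers to its surrounding grids. The sketch also shows why this layout helps: when the side length is at least eps, every point within eps of a query point lies in the query point's own grid or one of its 8 neighbours.

```python
import math

# Offsets (row, col) of the 8 surrounding grids; the ordering is an assumption.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]


class GridCell:
    """Hypothetical grid object: its points plus 8 neighbour pointers."""

    def __init__(self, row, col):
        self.row = row
        self.col = col
        self.points = []
        self.neighbors = [None] * 8   # None where the neighbour is outside the extent


def build_grid(points, side):
    """Bin 2-D points into square grids of the given side and link neighbours."""
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    max_x = max(p[0] for p in points)
    max_y = max(p[1] for p in points)
    n_cols = int((max_x - min_x) // side) + 1
    n_rows = int((max_y - min_y) // side) + 1
    cells = [[GridCell(r, c) for c in range(n_cols)] for r in range(n_rows)]
    for x, y in points:
        cells[int((y - min_y) // side)][int((x - min_x) // side)].points.append((x, y))
    for r in range(n_rows):
        for c in range(n_cols):
            for k, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cells[r][c].neighbors[k] = cells[rr][cc]
    return cells, (min_x, min_y)


def eps_neighbors(cells, origin, side, point, eps):
    """Points within eps of `point`, scanning only its grid and the 8 around it.

    Assumes side >= eps and that `point` lies inside the gridded extent.
    """
    cell = cells[int((point[1] - origin[1]) // side)][int((point[0] - origin[0]) // side)]
    candidates = [cell] + [n for n in cell.neighbors if n is not None]
    return [q for g in candidates for q in g.points
            if math.hypot(q[0] - point[0], q[1] - point[1]) <= eps]
```

For example, `cells, origin = build_grid(points, side)` followed by `eps_neighbors(cells, origin, side, points[0], eps)` inspects at most 9 grids instead of the whole data set.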

[0031] 4. Construct the data point object collection S, and set the member variables of the data point

[0032] Firs...


Abstract

A clustering algorithm can discover potential aggregation patterns in data, support further mining of causes and influencing factors, and provide a scientific basis for decision-making. The method is suited to the field of spatio-temporal data mining. A spatial clustering algorithm divides a data set into several clusters according to a spatial measurement scale so that the difference between clusters is maximized and the similarity between data within a cluster is maximized, thereby forming outlier clusters that differ significantly from the global or local distribution. A grid-based clustering algorithm quantizes the space into a finite number of cells to form a grid structure, and all clustering is performed on the grid. Multiple layers of rectangular cells correspond to different resolutions and form a hierarchy in which each higher-level cell is divided into cells of the next lower level. Statistical information (such as the mean, maximum and minimum) about the attributes of each grid cell is pre-calculated and stored as statistical parameters. These statistical parameters are useful for query processing and other data analysis tasks.
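
The pre-calculated per-cell statistics mentioned in the abstract could be computed along the following lines. This is a minimal single-level sketch, not the patent's multi-layer structure; the function name `cell_statistics` and the `(x, y, value)` record layout are assumptions.

```python
from collections import defaultdict


def cell_statistics(records, side):
    """Pre-compute count, mean, min and max of an attribute per grid cell.

    records -- iterable of (x, y, value) tuples (hypothetical layout)
    side    -- side length of a square grid cell
    Returns a dict keyed by (row, col).
    """
    buckets = defaultdict(list)
    for x, y, value in records:
        buckets[(int(y // side), int(x // side))].append(value)
    return {
        key: {
            "count": len(values),
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in buckets.items()
    }
```

A query that only needs a cell-level summary can then read the stored parameters instead of rescanning the points in that cell.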

Description

Technical field

[0001] The invention belongs to the field of information technology and relates to a grid-based fast DBSCAN clustering method that is particularly suited to spatio-temporal data mining.

Background technique

[0002] Clustering algorithms can discover potential aggregation patterns in data and support further mining of causes and influencing factors, providing a scientific basis for decision-making. A spatial clustering algorithm divides the data set into several clusters according to a spatial measurement scale so that the difference between clusters is maximized and the similarity between data within a cluster is maximized, thus forming outlier clusters that differ significantly from the global or local distribution. At present, clustering methods have been widely used in many fields such as seismic source analysis, bird migration analysis, logistics employment group analysis, soil CO2 concentration influencing factor analysi...


Application Information

IPC(8): G06K9/62
CPC: G06F18/232, G06F18/29
Inventor 陈静华
Owner 中科海拓(北京)科技有限公司