
Density clustering method and device based on dynamic grid hash index

A hash index and density clustering technology in the field of data processing. It addresses three problems: the high time complexity of the PDBSCAN algorithm, its high space complexity, and the lack of an incremental clustering algorithm for it.

Pending Publication Date: 2020-09-01
SHENZHEN UNIV

AI Technical Summary

Problems solved by the technology

[0017] 1. The time complexity of the PDBSCAN algorithm is too high, at the O(n^2) level;
[0018] 2. The space complexity of the PDBSCAN algorithm is too high, also at the O(n^2) level;
[0019] 3. No incremental clustering algorithm for dynamic uncertain data has yet been proposed for the PDBSCAN algorithm.



Examples


Example 1

[0111] Refer to Figure 3: efficiency comparison between the GH-PDBSCAN algorithm and the PDBSCAN algorithm.

[0112] The experiments use four datasets. Two of them, image and abalone, come from the UCI repository, and the negative values in these two datasets were deleted; the other two are synthetic, as shown in Table 1. All four datasets use the method of Gullo et al. to generate attribute-uncertainty data, and each comes in two forms: random distribution (random) and normal distribution (normal). The test platform is Windows 7 with 32 GB of memory and a 32-core CPU; the development tool is Visual Studio 2012 and the programming language is C++.
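The source does not describe the Gullo et al. procedure, so the sketch below is only a simplified stand-in, not their method. It assumes each object's uncertain attribute (index `un_attr`, a hypothetical parameter name) is represented by a set of possible instances drawn either uniformly from an interval around the observed value ("random") or from a normal distribution centred on it ("normal"). The `width`, standard deviation and sample count are illustrative assumptions.

```python
import random
from typing import List, Sequence

def make_uncertain(points: Sequence[Sequence[float]], un_attr: int,
                   form: str = "random", width: float = 1.0,
                   n_samples: int = 10, seed: int = 0) -> List[List[List[float]]]:
    """Expand each point into n_samples possible instances.

    Hypothetical stand-in for attribute-uncertainty generation: only the
    dimension `un_attr` is perturbed, every other attribute stays exact.
    """
    if form not in ("random", "normal"):
        raise ValueError("form must be 'random' or 'normal'")
    rng = random.Random(seed)  # seeded so generated datasets are reproducible
    result = []
    for p in points:
        instances = []
        for _ in range(n_samples):
            q = list(p)
            centre = p[un_attr]
            if form == "random":
                # uniform over [centre - width/2, centre + width/2]
                q[un_attr] = rng.uniform(centre - width / 2, centre + width / 2)
            else:
                # normal around the observed value; std dev = width/2 is an assumption
                q[un_attr] = rng.gauss(centre, width / 2)
            instances.append(q)
        result.append(instances)
    return result
```

For three-dimensional data whose third dimension is uncertain, `un_attr=2` would select that dimension.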


[0115] Table 1

[0116] The efficiency of the GH-PDBSCAN algorithm relative to the PDBSCAN algorithm is tested separately along two dimensions: data value range and data volume. The results are shown in Figure 3.

[0117] ima...

Example 2

[0153] Refer to Figures 4.1 and 4.2: efficiency comparison between the Incremental GH-PDBSCAN algorithm and the GH-PDBSCAN algorithm.

[0154] Because the GH-PDBSCAN algorithm can handle large data volumes, this experiment uses 1 million data objects. Each object is a three-dimensional spatial point, and the third dimension of each object is designated as the uncertain attribute; for the specific implementation, refer to the 2008 article by Gullo et al. The experimental method for the Incremental PDBSCAN algorithm proposed here is similar to the one used for the Incremental DBSCAN algorithm proposed by Ester et al. in 1998.
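To illustrate why incremental clustering can be cheaper than re-clustering everything, the sketch below follows the general insertion idea of Ester et al.'s Incremental DBSCAN on deterministic points; it is not the patent's Incremental GH-PDBSCAN. It assumes each stored point keeps its Eps-neighbour count, so inserting a point needs one range search for the new point plus one for each point that has just become a core point. The class and attribute names are hypothetical, and cluster relabelling and merging are left out.

```python
import math
from typing import List, Sequence, Set, Tuple

class IncrementalCoreTracker:
    """Hypothetical helper: tracks Eps-neighbour counts so an insertion only
    range-searches the new point and the points that just became core."""

    def __init__(self, eps: float, min_pts: int) -> None:
        self.eps, self.min_pts = eps, min_pts
        self.points: List[Tuple[float, ...]] = []
        self.counts: List[int] = []  # |N_eps(p)|, including p itself
        self.range_searches = 0

    def _range(self, p: Sequence[float]) -> List[int]:
        # brute-force range search, counted so the cost can be inspected
        self.range_searches += 1
        return [j for j, q in enumerate(self.points) if math.dist(p, q) <= self.eps]

    def insert(self, p: Sequence[float]) -> Set[int]:
        """Insert p; return the core objects whose clusters may need updating."""
        pid = len(self.points)
        self.points.append(tuple(p))
        self.counts.append(0)
        neighbours = self._range(p)  # includes pid itself
        self.counts[pid] = len(neighbours)
        new_cores = []
        for j in neighbours:
            if j != pid:
                self.counts[j] += 1
                if self.counts[j] == self.min_pts:
                    new_cores.append(j)  # j just crossed the core threshold
        if self.counts[pid] >= self.min_pts:
            new_cores.append(pid)
        seeds: Set[int] = set()
        for c in new_cores:
            # core objects in the neighbourhood of a new core point are the update seeds
            seeds.update(q for q in self._range(self.points[c])
                         if self.counts[q] >= self.min_pts)
        return seeds
```

Inserting into a region that creates no new core point costs a single range search, whereas full re-clustering costs one range search per object.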

[0155] In the GH-PDBSCAN algorithm, the clustering time of each data object depends on the time of its range search. Measured in range searches, the cost of clustering n data objects is denoted $\mathrm{Cost}_{\mathrm{DBSCAN}}(n)$, namely

[0156] $\mathrm{Cost}_{\mathrm{DBSCAN}}(n) = n$ (5)
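Equation (5) counts one range search per object. The sketch below is a plain DBSCAN over deterministic points with a brute-force range search (not the patent's GH-PDBSCAN), instrumented to count range searches so that property can be checked. One assumption keeps the count at exactly n: a point already marked as noise is relabelled as a border point without being searched again, since its neighbourhood is already known to be below Minpts.

```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1

def dbscan_counting(points: Sequence[Sequence[float]], eps: float,
                    min_pts: int) -> Tuple[List[int], int]:
    """DBSCAN that also returns how many range searches it issued."""
    queries = 0

    def range_search(i: int) -> List[int]:
        nonlocal queries
        queries += 1
        # brute-force O(n) scan; the result includes the point itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels: List[Optional[int]] = [None] * len(points)  # None = unclassified
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = range_search(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        seeds = [j for j in neighbours if labels[j] is None]
        for j in neighbours:
            if labels[j] is None or labels[j] == NOISE:
                labels[j] = cluster
        while seeds:
            j = seeds.pop()
            result = range_search(j)  # each point enters seeds at most once
            if len(result) >= min_pts:  # j is a core point: absorb its neighbours
                for k in result:
                    if labels[k] is None:
                        seeds.append(k)
                        labels[k] = cluster
                    elif labels[k] == NOISE:
                        labels[k] = cluster  # former noise becomes a border point
        cluster += 1
    return labels, queries
```

Each object is searched either in the outer loop or when it is popped from `seeds`, never both, so the returned count equals `len(points)`.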

[0157] The number of range searches for the Incremental GH-PDBSCAN algorithm depends on the specific applica...



Abstract

The invention provides a density clustering method and device based on a dynamic grid hash index. The method comprises: obtaining incremental preset information, including D (the incremental data set), Eps (the search radius), Minpts (the threshold for deciding whether a point is a core point) and unAttr (the dimension whose value is uncertain); performing incremental density clustering on the basis of the original data set according to the obtained incremental preset information to generate an incrementally clustered data set; and obtaining the data set with incremental clustering complete once the loop finishes. By introducing a new index structure that transforms uncertain data accordingly, the time complexity of the algorithm is reduced from O(n^2) to O(n) and the space complexity from O(n^2) to O(1). The algorithm is suitable for dynamic data sets, on which incremental clustering is more efficient than clustering the whole data set again. Building on the newly proposed GH-PDBSCAN algorithm and combining it with the DGrid Hash index structure, the Incremental GH-PDBSCAN algorithm is proposed, making it suitable for clustering dynamic uncertain data sets.
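The abstract names a DGrid Hash index without describing its layout. A common way to speed up Eps-range searches in density clustering, shown here as an assumption rather than the patent's structure, is to hash every point into a grid cell of side Eps: any point within Eps of a query lies in the query's cell or one of its 3^d adjacent cells, so only those cells need scanning. Because cells are dictionary keys, points from an incremental batch can be inserted one at a time. The class and method names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]

class GridHashIndex:
    """Hypothetical grid hash index: cell side = eps, cells kept in a dict."""

    def __init__(self, eps: float, dim: int) -> None:
        self.eps = eps
        self.dim = dim
        self.cells: Dict[Cell, List[int]] = defaultdict(list)
        self.points: List[Tuple[float, ...]] = []

    def _cell(self, p: Sequence[float]) -> Cell:
        # integer grid coordinates of the cell containing p
        return tuple(math.floor(v / self.eps) for v in p)

    def insert(self, p: Sequence[float]) -> int:
        """Add one point (e.g. from an incremental batch) and return its id."""
        pid = len(self.points)
        self.points.append(tuple(p))
        self.cells[self._cell(p)].append(pid)
        return pid

    def range_search(self, p: Sequence[float]) -> List[int]:
        """Ids of all stored points within Euclidean distance eps of p."""
        base = self._cell(p)
        found = []
        # a point within eps differs by at most one cell in every coordinate
        for offset in product((-1, 0, 1), repeat=self.dim):
            key = tuple(b + o for b, o in zip(base, offset))
            for pid in self.cells.get(key, ()):  # .get avoids creating empty cells
                if math.dist(self.points[pid], p) <= self.eps:
                    found.append(pid)
        return found
```

If all points fall into one cell, a query still scans every point; the saving comes when the data are spread over many cells.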

Description

Technical field

[0001] This application relates to the field of data processing, in particular to a density clustering method and device based on a dynamic grid hash index.

Background

[0002] In computer science, uncertain data refers to data containing noise that makes it deviate from the correct values. When such data exists in a database, probability calculations must be introduced.

[0003] PDBSCAN is currently a clustering algorithm for attribute-uncertainty data. Its idea comes from the DBSCAN algorithm; however, DBSCAN is only suitable for deterministic data, whereas PDBSCAN replaces the previously fixed values with probabilities, which makes it suitable for uncertain data. The steps of the PDBSCAN algorithm are as follows:

[0004] Algorithm 1: PDBSCAN

[0005] Input:

[0006] D: uncertain dataset; Eps: search radius;

[0007] Minpts: the decision thres...
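The background is cut off before the PDBSCAN steps are listed, so the sketch below only illustrates the general idea of replacing DBSCAN's exact neighbour test with a probability. It assumes, without support from the source, that each uncertain object is a list of equally likely instances, that the neighbour probability is the fraction of instance pairs within Eps, and that an object is a core point when its expected number of neighbours reaches Minpts. The function names and the expected-count rule are hypothetical.

```python
import math
from typing import List, Sequence

Instances = Sequence[Sequence[float]]  # equally likely possible positions of one object

def neighbour_probability(a: Instances, b: Instances, eps: float) -> float:
    """P(dist(a, b) <= eps) for independent objects with equally likely instances."""
    hits = sum(1 for x in a for y in b if math.dist(x, y) <= eps)
    return hits / (len(a) * len(b))

def is_core(objects: List[Instances], i: int, eps: float, min_pts: float) -> bool:
    """Core test on the expected neighbour count (the object itself counts as 1)."""
    expected = sum(neighbour_probability(objects[i], o, eps) if j != i else 1.0
                   for j, o in enumerate(objects))
    return expected >= min_pts
```

With deterministic objects (one instance each) every probability is 0 or 1, and the test reduces to DBSCAN's ordinary core-point condition.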


Application Information

IPC(8): G06K9/62
CPC: G06F18/23; G06F18/241
Inventors: 毛睿, 张贺, 陆敏华, 廖好, 王毅, 刘刚 (Mao Rui, Zhang He, Lu Minhua, Liao Hao, Wang Yi, Liu Gang)
Owner: SHENZHEN UNIV