Data stream clustering method and device based on density peak value

A density-peak-based data stream clustering technique, applied in the field of data processing, that addresses problems such as clustering performance degradation and achieves the effects of reducing impact, clustering data streams quickly, and ensuring freshness.

Inactive Publication Date: 2021-08-17
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to overcome the deficiencies of the prior art by providing a data stream clustering method and device based on the density peak algorithm. It addresses the clustering performance degradation that arises because the prior art has difficulty identifying concept drift and arbitrarily shaped clusters.


Examples


Embodiment 1

[0053] Figure 1 is a flowchart of a density-peak-based data stream clustering method provided by an embodiment of the present invention.

[0054] The method for clustering data streams based on density peaks comprises the following steps:

[0055] Obtain data source information, access the data source, and obtain the data stream;

[0056] Preprocess the data stream to obtain data segments;

[0057] Compute the local density of each data point in a data segment using a density formula that combines the Jaccard similarity distance with a Gaussian kernel function;

[0058] Search the range of candidate cluster numbers with a heuristic strategy and, using the local density of the data points, select the optimal cluster centers to obtain the local cluster center information;

[0059] Using variable-scale bucket structure t...
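The density step in [0057] can be illustrated with a minimal sketch. The patent's exact density formula is not reproduced in this excerpt, so the code below assumes the Gaussian-kernel density commonly used in density peak clustering, rho_i = sum over j of exp(-(d_ij / d_c)^2), with d_ij taken as the Jaccard distance between set-valued data points. The names `jaccard_distance`, `local_density`, the cutoff parameter `d_c`, and the sample `segment` are hypothetical.

```python
import math

def jaccard_distance(a, b):
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B| between two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)

def local_density(points, d_c):
    """Assumed formula: rho_i = sum_{j != i} exp(-(d_ij / d_c)^2).

    d_c is a hypothetical cutoff (bandwidth) parameter; the patent's own
    formula may weight or normalise the terms differently.
    """
    n = len(points)
    rho = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = jaccard_distance(points[i], points[j])
            w = math.exp(-(d / d_c) ** 2)
            rho[i] += w  # the kernel is symmetric, so both points
            rho[j] += w  # receive the same contribution
    return rho

# One data segment of set-valued points (illustrative data only).
segment = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {7, 8, 9}]
rho = local_density(segment, d_c=0.5)
```

In this toy segment the point that overlaps most with its neighbours receives the largest density, while the disjoint point `{7, 8, 9}` receives the smallest, which is the behaviour a density-peak method relies on when looking for candidate centers.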

Embodiment 2

[0080] Based on the same inventive concept, the present invention also proposes a data stream clustering device based on density peaks. Figure 2 is a structural block diagram of such a device provided by an embodiment of the present invention. The device for clustering data streams based on density peaks includes:

[0081] A data acquisition module 110, configured to access a data source and acquire a data stream;

[0082] A first processing module 120, configured to preprocess the data stream to obtain data segments;

[0083] A density calculation module 130, configured to compute the local density of each data point in a data segment using a density formula that combines the Jaccard similarity distance with a Gaussian kernel function;

[0084] The second processing module 140 is used to search the range of the number of clusters through a heuristic ...

Embodiment 3

[0093] The embodiment of the present invention also provides a data stream clustering device based on a density peak algorithm, including a processor and a storage medium;

[0094] The storage medium is used to store instructions;

[0095] The processor is configured to operate according to the instructions to perform the steps of the method in Embodiment 1:

[0096] Obtain data source information, access the data source, and obtain the data stream;

[0097] Preprocess the data stream to obtain data segments;

[0098] Compute the local density of each data point in a data segment using a density formula that combines the Jaccard similarity distance with a Gaussian kernel function;

[0099] Search the range of candidate cluster numbers with a heuristic strategy and, using the local density of the data points, select the optimal cluster centers to obtain the local cluster center information;

[0100] Using variable-scale bucket...
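The center-selection step in [0099] can be sketched as follows. The patent's heuristic is not spelled out in this excerpt, so this is only a stand-in: it uses the standard density-peak quantities (delta_i, the distance from point i to the nearest point of higher density, and the score gamma_i = rho_i * delta_i) and then searches a range of cluster numbers `k_min..k_max` for the largest gap in the sorted scores. The function `select_centers`, its parameters, and the largest-gap rule are all assumptions made for illustration.

```python
import math

def select_centers(dist, rho, k_min=1, k_max=None):
    """Pick cluster centers from a distance matrix and local densities.

    Assumed heuristic: rank points by gamma = rho * delta and choose the
    cluster count k in [k_min, k_max] with the largest drop in gamma.
    """
    n = len(rho)
    if n < 2:
        return list(range(n))
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: farthest distance
        else:
            # distance to the nearest point that ranks higher in density
            delta[i] = min(dist[i][j] for j in order[:rank])
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: -gamma[i])
    k_max = min(k_max or n - 1, n - 1)
    best_k = max(range(k_min, k_max + 1),
                 key=lambda k: gamma[ranked[k - 1]] - gamma[ranked[k]])
    return ranked[:best_k]

# Illustrative 1-D data with two well-separated groups.
points = [0.0, 1.0, 2.0, 50.0, 51.0, 52.0]
dist = [[abs(p - q) for q in points] for p in points]
rho = [sum(math.exp(-(d / 5.0) ** 2) for j, d in enumerate(row) if j != i)
       for i, row in enumerate(dist)]
centers = select_centers(dist, rho)
```

The function takes a precomputed distance matrix, so the same sketch applies whether the distances are Euclidean, as in this toy example, or Jaccard distances as in the method described above.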


Abstract

The invention provides a data stream clustering method and device based on density peaks, and relates to the technical field of data processing. The method combines the Jaccard similarity distance with a Gaussian kernel function to calculate the local density of an evolving data stream, introduces a heuristic strategy based on kernel density estimation, and selects local cluster centers from among the points with high local density. The local cluster centers are recorded in a variable-scale bucket sequence; when a clustering request arrives, the local cluster centers are merged to obtain the global cluster centers. The method can identify concept drift in the data stream, cluster the data stream quickly, and achieve good clustering performance.
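The bucket-sequence idea in the abstract can be illustrated with a rough sketch. The excerpt does not describe how the variable-scale buckets are organised, so the code assumes an exponential-histogram-style scheme: each new segment's local-center summary enters at level 0, and whenever a level holds more than `max_per_level` buckets, its two oldest buckets are merged into one bucket at the next level. The class `BucketSequence`, its parameters, and the caller-supplied `merge` function are hypothetical.

```python
class BucketSequence:
    """Hypothetical variable-scale bucket sequence (exponential-histogram style)."""

    def __init__(self, merge, max_per_level=2):
        self.merge = merge                  # (summary, summary) -> summary
        self.max_per_level = max_per_level  # assumed capacity per level
        self.buckets = []                   # (level, summary), oldest first

    def add(self, summary):
        """Record the local-center summary of the newest data segment."""
        self.buckets.append((0, summary))
        level = 0
        while True:
            idx = [i for i, (lv, _) in enumerate(self.buckets) if lv == level]
            if len(idx) <= self.max_per_level:
                break
            i, j = idx[0], idx[1]           # two oldest buckets at this level
            merged = self.merge(self.buckets[i][1], self.buckets[j][1])
            self.buckets[i] = (level + 1, merged)
            del self.buckets[j]
            level += 1                      # the merge may overflow the next level

    def query(self):
        """On a clustering request, fold all buckets into one global summary."""
        result = None
        for _, summary in self.buckets:
            result = summary if result is None else self.merge(result, summary)
        return result

# Toy usage: each summary is a list of local centers; merging concatenates.
seq = BucketSequence(merge=lambda a, b: a + b)
for segment_id in range(7):
    seq.add([segment_id])
levels = [level for level, _ in seq.buckets]
global_summary = seq.query()
```

Under this scheme older data is held in coarser buckets and recent data in finer ones, so memory grows roughly logarithmically with the number of segments. A real merge function would combine nearby local centers rather than concatenate lists; concatenation is used here only to keep the example easy to follow.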

Description

Technical field

[0001] The invention relates to a density-peak-based data stream clustering method and device, belonging to the technical field of data processing.

Background technique

[0002] With the development of big data technology, the data stream has become a new form of data, and how to mine the information contained in data streams has gradually become a research hotspot. Data stream clustering requires no prior information and is an efficient data mining method that groups highly similar objects together. However, data stream clustering faces many challenges: the number of clusters cannot be determined in advance, the shape of the data stream is variable, the memory available for storing data is limited, outliers interfere with clustering, and high-dimensional data streams must be clustered.

[0003] At present, scholars have proposed many data stream clustering algorithms. The AdaptiveKMeans algorithm adaptively estimates the...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06K9/62
CPC: G06F18/23213
Inventor: 郎非周伟清
Owner: NANJING UNIV OF POSTS & TELECOMM