Method for clustering and defining cluster boundaries in real-time data streams containing noise points

A technique for clustering noisy real-time data streams, applied in electrical digital data processing, special data processing applications, instruments, and related fields. It addresses high algorithmic complexity, the lack of a cluster boundary definition method for real-time data streams, and low execution efficiency.

Status: Inactive. Publication date: 2012-06-13
WUHAN UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

Current techniques for defining and detecting cluster boundary points have the following defects: (1) existing algorithms extract cluster boundary points only from static data sets, and no method exists for defining cluster boundaries in real-time data streams; (2) clustering and boundary detection are treated as separate tasks and processed separately; (3) algorithmic complexity is high when processing large data sets.
The BORDER algorithm can detect cluster boundary points in data sets without noise points, but it has the following disadvantages: (1) it cannot correctly identify boundary points in data sets containing noise, because noise points have fewer reverse k-nearest neighbors than cluster boundary points; (2) it must find the k nearest neighbors of every object and then count each object's reverse k-nearest neighbors, so execution efficiency is low; (3) the user needs prior knowledge and must supply the number n of boundary points in the data set.
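
The reverse k-nearest-neighbor count mentioned above can be illustrated with a small brute-force sketch. This is not the BORDER implementation; the function name `reverse_knn_counts` and the toy data are assumptions made for illustration only. It shows why an isolated noise point ends up with a lower reverse count than any point on the edge of a cluster, which is the case the text says BORDER handles incorrectly.

```python
from math import dist

def reverse_knn_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbours."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        # Rank every other point by distance to p and keep the k closest.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts

# Toy data (hypothetical): a tight 3x3 block of points plus one isolated noise point.
cluster = [(float(x), float(y)) for x in range(3) for y in range(3)]
noise = (10.0, 10.0)
points = cluster + [noise]
counts = reverse_knn_counts(points, k=2)
```

The brute-force ranking costs on the order of n² distance computations per pass, which is consistent with the efficiency concern raised in point (2).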



Examples


Embodiment 1

[0055] This embodiment describes a method for clustering and cluster boundary definition for real-time data streams containing noise points. The symbols used in this embodiment are defined as follows:

[0056] D is the real-time data stream containing noise points; λ is the attenuation factor; β is the threshold adjustment coefficient; k is the number of intervals in each dimension of the data space; δ is the similarity threshold; X is a data point; G is the set of all grids in the data space; g is the grid to which the data point X is mapped; g_h is a high-density grid; g_l is a low-density grid; g_max is the not-yet-clustered high-density grid with the maximum density value, g_max ∈ g_h; g_l' is a low-density grid greater than or equal to the similarity threshold δ, g_l' ∈ g_l; g_l'' is a low-density grid smaller than the similarity threshold δ, with g_l − g_l' = g_l'' and g_l' ∪ g_l'' = g_l; speed is the flow rate of the data stream; N is the total number of gr...
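
As a rough illustration of how a data point X might be mapped to a grid g when each dimension is split into k intervals, consider the sketch below. The excerpt does not give the mapping or the density update rule, so everything here is an assumption: the per-dimension bounds `lower` and `upper`, the equal-width intervals, and in particular the exponential decay by the attenuation factor λ (written `lam`), which is borrowed from common density-grid stream clustering rather than taken from the patent.

```python
def grid_index(x, lower, upper, k):
    """Map point x to a tuple of per-dimension interval indices in [0, k - 1]."""
    idx = []
    for value, lo, hi in zip(x, lower, upper):
        cell = int((value - lo) / (hi - lo) * k)
        idx.append(min(max(cell, 0), k - 1))  # clamp points on or outside the bounds
    return tuple(idx)

def update_density(density, last_update, g, t, lam):
    """Assumed rule: decay grid g's density by lam per elapsed time unit, then add 1."""
    elapsed = t - last_update.get(g, t)
    density[g] = density.get(g, 0.0) * (lam ** elapsed) + 1.0
    last_update[g] = t
    return density[g]

# Usage with hypothetical values: a unit square split into k = 4 intervals per dimension.
density, last_update = {}, {}
g = grid_index((0.25, 0.9), lower=(0.0, 0.0), upper=(1.0, 1.0), k=4)
first = update_density(density, last_update, g, t=0, lam=0.5)
second = update_density(density, last_update, g, t=2, lam=0.5)
```

Storing only occupied grids in a dictionary keeps memory proportional to the number of non-empty grids rather than to the full grid space.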


Abstract

The invention relates to a method for clustering and cluster boundary definition for real-time data streams containing noise points. The scheme is as follows: the density of the grid g to which each data point maps is updated; grids g whose density is greater than or equal to the density threshold (t) are marked as high-density grids g_h; adjacent high-density grids g_h, or low-density grids g_l' that exceed the similarity threshold, are marked as clustering grids g_grid; all clustering grids g_grid are clustered; grids g_l' that are not clustered and are adjacent to clustering grids g_grid, and clustering grids g_grid located at the edge of the grid space, are marked as cluster boundary grids g_boundary, and all cluster boundary grids g_boundary together form the cluster boundaries; all clusters and cluster boundaries are output; the next clustering request time t_next is calculated; and the above steps are repeated on the data stream at time t_next until the data stream D has been fully processed. The method offers high clustering efficiency and good boundary definition.
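
The marking and clustering steps in the abstract can be sketched for a single clustering request as follows. This is a simplified illustration under stated assumptions, not the patented procedure: adjacency is taken to mean sharing a face, the similarity test is replaced by a hypothetical stand-in (density divided by the threshold, compared with `delta`), clusters are grown from high-density grids only, and the computation of the next request time is omitted because the excerpt does not define it.

```python
from collections import deque

def neighbours(g, k):
    """Yield the face-adjacent grids of g inside a space with k intervals per dimension."""
    for d in range(len(g)):
        for step in (-1, 1):
            v = g[d] + step
            if 0 <= v < k:
                yield g[:d] + (v,) + g[d + 1:]

def cluster_and_boundary(density, k, threshold, delta):
    """Return (cluster label per high-density grid, set of boundary grids)."""
    high = {g for g, d in density.items() if d >= threshold}
    # Assumed stand-in for the similarity test: density relative to the threshold.
    similar_low = {g for g, d in density.items()
                   if d < threshold and d / threshold >= delta}
    labels, next_label = {}, 0
    # Seed each cluster from the densest high-density grid not yet clustered.
    for seed in sorted(high, key=lambda g: -density[g]):
        if seed in labels:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            g = queue.popleft()
            for n in neighbours(g, k):
                if n in high and n not in labels:
                    labels[n] = next_label
                    queue.append(n)
        next_label += 1
    # Boundary: similar low-density grids next to a cluster, plus cluster grids on the edge.
    boundary = {g for g in similar_low if any(n in labels for n in neighbours(g, k))}
    boundary |= {g for g in labels if any(c == 0 or c == k - 1 for c in g)}
    return labels, boundary

# Hypothetical densities on a 4x4 grid space.
density = {(0, 0): 5.0, (0, 1): 4.0, (1, 1): 1.5, (3, 3): 6.0, (2, 2): 0.2}
labels, boundary = cluster_and_boundary(density, k=4, threshold=3.0, delta=0.4)
```

In this toy run the grid (2, 2) is neither clustered nor marked as boundary, so it would be treated as noise, while (1, 1) is kept as a boundary grid because it passes the assumed similarity test and touches a cluster.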

Description

Technical field

[0001] The invention belongs to the technical field of data stream processing. Specifically, it relates to a method for clustering and cluster boundary definition for real-time data streams containing noise points.

Background

[0002] Clustering and defining cluster boundaries for real-time data streams containing noise points can improve the accuracy of clustering and data classification and can quickly discover clusters and cluster boundaries in real-time data streams. It can also be used to monitor real-time production process data in industrial production systems, which helps with monitoring production equipment and product quality. Current techniques for defining and detecting cluster boundary points have the following defects: (1) existing algorithms extract cluster boundary points only from static data sets, and there is no method for defining cluster boundaries for real-ti...


Application Information

Patent type & authority: Applications (China)
IPC: IPC(8): G06F19/00
Inventors: 张晓龙, 梁小波, 曾伟
Owner: WUHAN UNIV OF SCI & TECH