Clustering model based high-dimensional data stream outlier detection method

A technique for outlier detection in high-dimensional data streams, applicable to structured data retrieval, database models, relational databases, and similar fields. It addresses the low processing efficiency of existing methods and the loss of meaning of distance measures in high dimensions, and achieves faster processing, fewer errors, and good outlier detection accuracy.

Inactive Publication Date: 2016-08-17
UNIV OF ELECTRONICS SCI & TECH OF CHINA
0 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

In high-dimensional data streams, traditional outlier detection methods based on a sliding window have low processing efficiency, and similarity calculations based on Euclidean distance become meaningless in high-dimensional data sets.
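The claim about Euclidean distance can be illustrated with a small experiment. The sketch below is not part of the patent: it assumes points drawn uniformly from the unit hypercube (a hypothetical data model) and measures the relative gap between the farthest and nearest neighbour of a query point. The function name `relative_contrast` and all sizes are chosen for illustration only.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    Points are drawn uniformly from the unit hypercube; this data model
    is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast shrinks as the dimension grows: nearest and farthest
    # neighbours become almost equally distant from the query.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

When the contrast approaches zero, a distance threshold can no longer separate "near" from "far" points, which is the motivation for restricting the calculation to a few relevant dimensions.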



Examples


Embodiment Construction

[0028] The method of the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that these examples are only intended to illustrate the present invention and not to limit the scope of the present invention.

[0029] Several parameters are initialized before the algorithm is executed: the maximum number of clusters K that the algorithm tolerates; the minimum distance threshold r1 between a data point and a cluster; the minimum distance threshold r2 between two clusters (mindist in step 2); and, for judging outlier clusters, the minimum number of data points m in a cluster (N_min in step 3) and the maximum time interval t (Tr in step 3).
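The parameters listed in [0029] could be grouped as in the following minimal sketch. The class name `DetectorParams`, the field names, and the validation rules are assumptions introduced here for illustration; the text names only the symbols K, r1, r2, m, and t.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorParams:
    """Hypothetical container for the parameters named in [0029]."""
    max_clusters: int            # K: maximum number of clusters tolerated
    point_cluster_dist: float    # r1: point-to-cluster distance threshold
    cluster_cluster_dist: float  # r2: cluster-to-cluster threshold (mindist)
    min_points: int              # m: minimum points in a cluster (N_min)
    max_idle_time: float         # t: maximum time interval (Tr)

    def __post_init__(self):
        # Basic sanity checks; the patent text does not state these.
        if self.max_clusters < 1 or self.min_points < 1:
            raise ValueError("K and m must be at least 1")
        if min(self.point_cluster_dist, self.cluster_cluster_dist,
               self.max_idle_time) <= 0:
            raise ValueError("r1, r2 and t must be positive")


# Placeholder values, not taken from the patent.
params = DetectorParams(max_clusters=20, point_cluster_dist=0.5,
                        cluster_cluster_dist=1.0, min_points=5,
                        max_idle_time=60.0)
```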

[0030] The concrete steps of the present invention include:

[0031] Step 1: As shown in Figure 1, data preprocessing is performed: the input training set is clustered, and then the feature dimension of each cluste...



Abstract

The present invention belongs to the application of data mining algorithms in the field of high-dimensional data stream processing, and in particular relates to a high-dimensional data stream outlier detection method based on a clustering model. The method first clusters the sample data stream and then analyzes the feature dimensions of each cluster in the clustering result. When determining which cluster a point of the test data set belongs to, only the attributes related to the feature dimensions of that cluster are used and redundant attributes are skipped, which effectively reduces the amount of calculation. If a data point does not belong to any cluster, the point forms a new cluster. If a cluster has not attracted a new data point for a long time and contains only a small number of data points, it is regarded as a cluster that contains outliers. The technical effect is that, for high-dimensional data stream outlier detection, both the efficiency and the accuracy of the method are higher than those of the conventional outlier detection algorithm based on a sliding window.
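The flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Cluster`, `process_point`, and `outlier_clusters` are hypothetical, the feature dimensions are supplied by the caller because their derivation is not shown in this excerpt, a new cluster is assumed to use all dimensions, and Euclidean distance over the reduced dimensions stands in for whatever similarity measure the patent actually uses.

```python
import math


class Cluster:
    def __init__(self, point, feature_dims, now):
        self.centroid = list(point)
        self.feature_dims = list(feature_dims)  # indices used for distance
        self.count = 1
        self.last_update = now                  # time of the last absorbed point

    def distance(self, point):
        """Distance restricted to this cluster's feature dimensions."""
        return math.sqrt(sum((point[i] - self.centroid[i]) ** 2
                             for i in self.feature_dims))

    def absorb(self, point, now):
        """Update the running mean of the centroid and refresh the timestamp."""
        self.count += 1
        for i, x in enumerate(point):
            self.centroid[i] += (x - self.centroid[i]) / self.count
        self.last_update = now


def process_point(clusters, point, now, r1):
    """Assign the point to the nearest cluster within r1, else start a new one."""
    best = min(clusters, key=lambda c: c.distance(point), default=None)
    if best is not None and best.distance(point) <= r1:
        best.absorb(point, now)
        return best
    # Assumption: a freshly created cluster uses every dimension.
    new_cluster = Cluster(point, range(len(point)), now)
    clusters.append(new_cluster)
    return new_cluster


def outlier_clusters(clusters, now, min_points, max_idle):
    """Clusters that are both small and idle for longer than max_idle."""
    return [c for c in clusters
            if c.count < min_points and now - c.last_update > max_idle]
```

A cluster whose feature dimensions are `[0, 1]` ignores the third attribute entirely, so a point such as `[0.1, 0.0, 99.0]` can still join a cluster centred at `[0.0, 0.0, 5.0]`; this is the saving on redundant attributes that the abstract refers to.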

Description

Technical field

[0001] The invention belongs to the application of data mining algorithms in the field of high-dimensional data stream processing, and particularly relates to a high-dimensional data stream outlier detection method based on a clustering model.

Background

[0002] With the popularization of sensor networks and the advent of the "big data" era, more and more data is shifting from traditional static data to dynamic data streams. This brings new opportunities and challenges for outlier detection methods based on static data, especially when the data dimensionality is very high. Compared with static data, dynamic data streams are characterized by massive volume, real-time arrival, and dynamic variability.

[0003] Outlier detection, also known as outlier mining, is one of the key topics of data stream mining. The purpose of outlier detection is to detect noise points in the data set for data cleaning, or to discover potentially meaningful information in the data ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2465, G06F16/285
Inventor: 罗光春, 陈爱国, 段贵多, 邓璇
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA