Sliding-window multi-data-stream anomaly detection method based on hierarchical clustering

A multi-data-stream anomaly detection method using sliding-window technology. It addresses the reduced accuracy of data stream anomaly detection results and achieves effective data storage.

Active Publication Date: 2013-11-20
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0008] In order to solve the problem that the accuracy of data stream anomaly detection results is reduced due to the influence of expired data and historical data, the present inv


Examples


Embodiment 1

[0046] Embodiment 1. This embodiment is described with reference to Figure 1. The hierarchical clustering-based sliding window multi-data stream anomaly detection method described in this embodiment includes the following steps:

[0047] Step 1. Set the sliding window size N. Collect, through the sensors, the data stream elements of the first window in the multi-data stream as offline data, perform initial K-means clustering on them to obtain k cluster structures, and thereby complete the offline initialization of the cluster structures. Then perform step 2;

[0048] Here N is a positive integer with N greater than or equal to 1000 (the value can be chosen by the user), and k is the preset maximum number of clustering feature index histograms used for online clustering.

[0049] Step 2. Collect the T-th data stream element in the multi-data stream through the sensors and perform online clustering according to the k cluster structures obtained in step 1, obtaining k' cluster ...
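The offline initialization in step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes plain Lloyd-style K-means with squared Euclidean distance, and the names `kmeans`, `offline_init`, `iters`, and `seed` are hypothetical. The online clustering of step 2 is not modeled because its details are not given in the text above.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means with squared Euclidean distance.

    Returns (centers, labels). `iters` and `seed` are illustrative
    parameters, not values taken from the patent.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def offline_init(stream, window_size, k):
    """Use the first `window_size` stream elements as offline data (step 1)."""
    first_window = [next(stream) for _ in range(window_size)]
    return kmeans(first_window, k)
```

For example, `offline_init(iter(elements), 1000, k)` would consume the first 1000 elements of a hypothetical iterable `elements` and return k centers together with a cluster label for each of those elements.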

Embodiment 2

[0128] Embodiment 2. This embodiment differs from the hierarchical clustering-based sliding window multi-data stream anomaly detection method described in Embodiment 1 in the following respect: in step 7, the k' clustering feature index histograms obtained after the arrival of the previous data element are clustered online, where the previous data element is the (T-1)-th data element. The specific process of obtaining the updated k' clustering feature index histograms is as follows:

[0129] Step 1. Obtain the T-th data stream element in the data stream, and perform step 2;

[0130] Step 2. Determine whether the amount of data in the sliding window is greater than the sliding window size N. If it is greater, perform step 3; if it is less than or equal, perform step 5;

[0131] Step 3. Delete the temporal feature vector with the smallest data number (that is, the bucket with the smallest data number) among the k' clustering feature index histograms, and perform step 4;

[0132] Step 4. Update the ...
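Steps 1 to 3 above can be sketched as follows. This is a rough illustration under stated assumptions: each histogram is modeled as a deque of buckets ordered oldest first, each bucket is a hypothetical dict holding the data number of its first element, a count, and a linear sum, and the choice of which histogram receives the new element (`target`) is supplied by the caller. Steps 4 and 5 are truncated in the text and are not modeled.

```python
from collections import deque


def window_count(histograms):
    """Total number of elements held across all histograms."""
    return sum(b["count"] for h in histograms for b in h)


def expire_oldest(histograms):
    """Step 3: remove the bucket with the smallest data number."""
    nonempty = [h for h in histograms if h]
    oldest = min(nonempty, key=lambda h: h[0]["first"])
    return oldest.popleft()


def on_arrival(histograms, target, t, x, window_size):
    """Handle the T-th element `x`, routed to histogram index `target`.

    The bucket layout and the `target` argument are assumptions made
    for this sketch; the patent text shown here does not specify them.
    """
    # Step 1: take in the new element as a single-element bucket.
    histograms[target].append({"first": t, "count": 1, "sum": list(x)})
    # Step 2: compare the amount of data in the window with N.
    if window_count(histograms) > window_size:
        # Step 3: drop the bucket holding the oldest data.
        expire_oldest(histograms)
```

Because every bucket here holds one element, a single deletion per arrival is enough to keep the window at size N. A structure that merges buckets would need to account for buckets that only partly overlap the window.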

Embodiment 3

[0176] Embodiment 3. The difference between this embodiment and the hierarchical clustering-based sliding window multi-data stream anomaly detection method described in Embodiment 1 is that the offline K-means clustering described in step 8 is:

[0177] In the multi-data stream, online clustering yields k' clustering feature index histograms, and the means of their head nodes are used as the data for offline clustering. The head-node means of the k' cluster structures are {EHCF_1.mean, EHCF_2.mean, ..., EHCF_k'.mean}; these k' means are the input of the K-means clustering algorithm (that is, the original data to be clustered). macro_k is the number of categories for offline clustering, and the similarity measure function in the K-means clustering algorithm is taken to be the Euclidean distance and the cosine of the included angle, respectively.
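The two similarity measures can lead to different assignments of the same head-node means. The sketch below shows only the assignment step of K-means under each measure; the function names and the sample vectors are hypothetical, and the cosine measure is expressed as a distance (1 minus the cosine of the included angle) so that smaller always means more similar.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cos(angle between u and v); assumes neither vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def assign(means, centers, distance):
    """Assign each head-node mean to its nearest center under `distance`."""
    return [
        min(range(len(centers)), key=lambda c: distance(m, centers[c]))
        for m in means
    ]


# Hypothetical head-node means and offline cluster centers.
head_means = [[1.0, 1.0], [9.0, 1.0]]
centers = [[10.0, 10.0], [2.0, 0.0]]
by_euclidean = assign(head_means, centers, euclidean_distance)
by_cosine = assign(head_means, centers, cosine_distance)
```

With these sample values the Euclidean measure assigns both means to the second center, while the cosine measure assigns the first mean to the first center because the two vectors point in the same direction. The cosine measure compares direction only and ignores magnitude.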

[0178] In order to verify the detection performance of the sliding window multi...


Abstract

The invention relates to a sliding-window multi-data-stream anomaly detection method based on hierarchical clustering, and aims to solve the problem that the accuracy of data stream anomaly detection results is reduced by the influence of expired data and historical data. The method uses a layered clustering algorithm: the final clustering result need not be considered during clustering, so arriving data are processed at a higher speed, and the offline layer uses only the cluster structures to answer user queries, so its data volume is far smaller than the amount of original data. As a result, the data can be stored effectively and a more accurate clustering result can be obtained. For the sliding window model, a clustering feature index histogram structure is adopted, so that insertion of new data and deletion of expired data are handled better. Taking the cosine coefficient as the metric function yields good clustering and anomaly detection results. The method is applicable to fields such as sensors, network click streams, and stock trading.

Description

Technical field

[0001] The invention relates to a multi-data-stream anomaly detection method, in particular to a sliding-window multi-data-stream anomaly detection method based on hierarchical clustering.

Background

[0002] With the development and wide application of network technology, information collection, and sensor technology, a large number of data stream models have emerged. Their characteristics (potentially unbounded size, fast arrival, and continuous ordering) pose great challenges to traditional data anomaly detection methods. In particular, the rapid growth in the amount of information has produced large numbers of multiple data streams in many applications, such as sensor networks, stock transaction information, and network intrusion data. In the field of satellite telemetry, a large number of sensors are distributed across different subsystems and within the same subsystem. The parameters collected by the sensors reflect the changes of physical parameters in t...


Application Information

IPC(8): G06K9/62
Inventor: 刘大同, 庞景月, 彭宇, 罗清华, 彭喜元
Owner: HARBIN INST OF TECH