A Nearest Neighbor Based Time Sensitive Anomaly Detection Method in Big Data Streams

An anomaly detection technique for data streams, classified under instruments, computing, and character and pattern recognition. It addresses anomalies that existing approaches fail to detect, and achieves low update cost together with high space efficiency and update efficiency.

Active Publication Date: 2021-05-25
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

This approach is easy to implement, but cannot detect anomalies where the distribution deviates from the normal

Examples

Detailed Description of the Embodiments

[0032] In order to make the above-mentioned features and advantages of the present invention more comprehensible, the following specific embodiments are described in detail in conjunction with the accompanying drawings.

[0033] Before giving an example, the outliers in the data stream are defined as follows:

[0034] Definition 1. An outlier in a data stream: given a data stream DS, the current window W, a data item x_t in DS, and two threshold parameters α and β, let NN() and V() be the neighbor calculation and variance calculation functions, respectively. If either of the two threshold conditions holds (the two formulas are missing from this text), then x_t is an outlier; otherwise it is normal data.

[0035] The technical principle of the present invention is as follows:

[0036] 1. Following the principle of nearest-neighbor-based anomaly detection, a locality-sensitive hashing (LSH) algorithm is used to find the neighbors of each data item in the big data stream. Data with high similarity are called neighbors; normal data usually have high similarity, and abnormal data have low sim...
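The idea of using LSH to judge how many neighbors a data item has can be illustrated with a minimal sketch. It assumes random-hyperplane LSH (cosine similarity), which is one common LSH family; the patent text shown here does not say which family it uses. The class name `HyperplaneLSH`, its parameters, and the averaging rule in `neighbor_estimate` are hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH with several hash tables (illustrative sketch)."""

    def __init__(self, dim, num_tables=8, bits_per_table=6, seed=0):
        rng = random.Random(seed)
        # One independent set of random hyperplanes per table.
        # num_tables and bits_per_table are assumed tuning parameters.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(bits_per_table)]
            for _ in range(num_tables)
        ]
        # Each table maps a bucket key to the number of items hashed there.
        self.tables = [defaultdict(int) for _ in range(num_tables)]

    @staticmethod
    def _key(planes, x):
        # Each bit records on which side of a hyperplane x falls.
        return tuple(
            sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0.0 for p in planes
        )

    def insert(self, x):
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, x)] += 1

    def neighbor_estimate(self, x):
        # Average bucket occupancy across tables: a rough proxy for the
        # number of stored items similar to x. Only counts are stored,
        # not per-item neighbor lists.
        counts = [
            table.get(self._key(planes, x), 0)
            for planes, table in zip(self.planes, self.tables)
        ]
        return sum(counts) / len(counts)
```

In this sketch, an item that lands in well-populated buckets gets a high estimate and looks normal, while an item whose buckets are nearly empty gets a low estimate and is a candidate outlier. How the estimate is compared against a threshold is left open here, since the threshold conditions are not given in the text.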

Abstract

The invention provides a nearest-neighbor-based, time-sensitive anomaly detection method for big data streams, belonging to the technical field of big data streams and anomaly detection. The core is a statistical estimator based on LSH sampling; over the sliding window, a deterministic wave model estimates the count and variance of multiple random time intervals within the window, in order to monitor the distribution of data in different time intervals. The method can quickly find the neighbors of each data item in a big data stream, which reduces computational overhead. It does not need to store neighbor information separately for each data item, which saves space and improves update efficiency. Being time sensitive, it can quickly judge whether the data distribution is abnormal and the time range in which the abnormality occurred.
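Monitoring count and variance over different time intervals of a sliding window can be sketched as follows. This is a deliberately simpler stand-in, not the window model of the invention: the window is cut into fixed-length time buckets, and each bucket keeps only a count, a sum, and a sum of squares. The class name `BucketedWindow`, integer non-decreasing timestamps, and the use of a single scalar value per item are all assumptions made for illustration.

```python
from collections import deque


class BucketedWindow:
    """Sliding window of fixed-length time buckets (illustrative sketch).

    Each bucket stores (count, sum, sum of squares), so the count and
    variance of any interval made of whole buckets can be computed
    without storing individual items.
    """

    def __init__(self, window_len, bucket_len):
        self.window_len = window_len
        self.bucket_len = bucket_len
        self.buckets = deque()  # entries: [bucket_start, n, s, ss]

    def add(self, t, value):
        # Timestamps are assumed to be non-decreasing integers.
        start = (t // self.bucket_len) * self.bucket_len
        if not self.buckets or self.buckets[-1][0] != start:
            self.buckets.append([start, 0, 0.0, 0.0])
        bucket = self.buckets[-1]
        bucket[1] += 1
        bucket[2] += value
        bucket[3] += value * value
        # Expire buckets that ended at or before t - window_len.
        while (self.buckets
               and self.buckets[0][0] + self.bucket_len <= t - self.window_len):
            self.buckets.popleft()

    def stats(self, t_from, t_to):
        """Return (count, population variance) over buckets whose start
        time satisfies t_from <= start < t_to."""
        n = 0
        s = 0.0
        ss = 0.0
        for start, bn, bs, bss in self.buckets:
            if t_from <= start < t_to:
                n += bn
                s += bs
                ss += bss
        if n == 0:
            return 0, 0.0
        mean = s / n
        return n, max(ss / n - mean * mean, 0.0)
```

Comparing `stats` over several intervals of the same window shows when the distribution changed: an interval whose variance jumps relative to earlier intervals marks the time range in which the abnormality occurred. The trade-off in this sketch is that intervals can only be resolved to bucket boundaries.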

Description

Technical field

[0001] The invention belongs to the technical field of big data streams and anomaly detection, and in particular relates to a time-sensitive anomaly detection method.

Background

[0002] Anomaly detection in data streams is an important task in several domains such as fraud detection, computer network security, and medical and public health anomaly detection. The goal of anomaly detection is to detect data whose behavior or distribution differs greatly from other data, that is, outliers. For example, in the detection of liver tumors, once the alpha-fetoprotein content in the blood greatly exceeds the normal value, the patient is very likely to have liver cancer. Anomaly detection helps to find such unusual data that do not conform to expected behavior.

[0003] A data stream is a special data model that is often unbounded, high-speed, multi-dimensional, and dynamically changing. The new characteristics o...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06K9/62
Inventor: 吴广君, 贾思宇, 张磊, 赵志慧, 李军
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI