Anomaly detection method and device based on Flink parallelization of the isolation forest algorithm

Isolation forest and anomaly detection technology, applied in computing, computing models, and other database retrieval. It addresses problems such as poor detection effect and performance, high computational complexity, and the inability to account for changes in local density, and it achieves real-time anomaly detection and faster computation.

Pending Publication Date: 2020-04-17
中电福富信息科技有限公司

AI Technical Summary

Problems solved by technology

Existing detection methods all have their own shortcomings. Statistics-based methods are effective for single-dimensional anomaly detection, but their detection effect and performance are relatively poor in multiple dimensions. Proximity-based methods cannot handle data sets whose regions have different densities: because density is a local feature and distance is a global feature, they cannot take into account chan...



Embodiment Construction

[0031] IForest (isolation forest) is a method based on similarity. It partitions the data space with randomly selected hyperplanes until each subspace contains only one data point. The advantages of IForest are that it has linear time complexity and that each tree is generated independently, so it can be accelerated with distributed computing. As shown in Figure 1 and Figure 2, IForest performed well in the test results given by Python's open-source machine learning library scikit-learn and the anomaly detection library pyod. Although IForest is well suited to anomaly detection on massive data, currently only stand-alone implementations in R, Java, and Python exist. In actual business scenarios the data must be processed in parallel; otherwise real-time detection tasks on massive data cannot be met.
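The random partitioning described in [0031] can be illustrated with a minimal, stand-alone Python sketch. It is not the patent's implementation: the function names (`build_tree`, `path_length`, `mean_path_length`, `build_forest`) and the defaults (50 trees, subsample of 64, depth cap of 8) are assumptions chosen for illustration, and the sketch uses axis-parallel cuts and omits the leaf-size adjustment and score normalization of the published algorithm. A point that is isolated after few cuts (a short path) is more likely to be an anomaly.

```python
import random


def build_tree(points, rng, depth=0, max_depth=8):
    """Split points with random axis-parallel cuts until each is isolated."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}            # leaf node
    dim = rng.randrange(len(points[0]))         # pick a random feature
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                # feature is constant: stop here
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)                   # pick a random split value
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(tree, point):
    """Count the cuts needed to route a point down to a leaf."""
    depth = 0
    while "dim" in tree:
        side = "left" if point[tree["dim"]] < tree["cut"] else "right"
        tree = tree[side]
        depth += 1
    return depth


def mean_path_length(forest, point):
    """Average path length over all trees; shorter means more anomalous."""
    return sum(path_length(t, point) for t in forest) / len(forest)


def build_forest(data, n_trees=50, sample_size=64, seed=0):
    """Build each tree from its own random subsample of the data."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    return [build_tree(rng.sample(data, size), rng) for _ in range(n_trees)]


if __name__ == "__main__":
    gen = random.Random(42)
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
    data.append((10.0, 10.0))                   # one obvious outlier
    forest = build_forest(data)
    print("outlier:", mean_path_length(forest, (10.0, 10.0)))
    print("inlier: ", mean_path_length(forest, (0.0, 0.0)))
```

Because every tree depends only on its own subsample, the list comprehension in `build_forest` is the natural place to distribute work across machines.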

[0032] Because real-time data anomaly detection is required, among Apache Flink (hereinafter referred to as Flink), Apache Storm, Apache Spark, and Apache Kafka, several mainstream fr...


Abstract

The invention discloses an anomaly detection method and device based on Flink parallelization of the isolation forest algorithm. The isolation forest algorithm is implemented on the real-time processing platform Flink, which greatly improves its calculation speed and removes the data-volume limit that applies when isolation forest is used for anomaly detection in stand-alone mode. The system scales horizontally by configuring and adding machines, and the anomaly detection task can also run in real-time scenarios. Because the method is implemented with Flink, it uses Flink's real-time processing capability to detect anomalies in data in real time. The parallelism of the algorithm can be scaled horizontally by setting the parallelism of the Flink operators, so that it is not limited by the data volume when detecting anomalies in massive data.
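The idea of a configurable degree of parallelism can be sketched in plain Python, with a thread pool standing in for Flink operator subtasks. This is an illustration under stated assumptions, not the patented method or the Flink API: `grow`, `build_one`, `build_forest`, and the `parallelism` parameter are hypothetical names, and Python threads show the task structure rather than a real speedup. Each tree gets its own seed, so the resulting forest is the same whatever parallelism is chosen.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def grow(points, rng, depth, max_depth):
    """Grow one isolation tree as nested tuples; None marks a leaf."""
    if len(points) <= 1 or depth >= max_depth:
        return None
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return None
    cut = rng.uniform(lo, hi)
    left = grow([p for p in points if p[dim] < cut], rng, depth + 1, max_depth)
    right = grow([p for p in points if p[dim] >= cut], rng, depth + 1, max_depth)
    return (dim, cut, left, right)


def build_one(task):
    """Independent unit of work: one tree from one seeded subsample."""
    data, seed, sample_size, max_depth = task
    rng = random.Random(seed)                   # per-tree seed, no shared state
    sample = rng.sample(data, min(sample_size, len(data)))
    return grow(sample, rng, 0, max_depth)


def build_forest(data, n_trees, parallelism, sample_size=64, max_depth=8, base_seed=0):
    """Build n_trees trees using `parallelism` concurrent workers."""
    tasks = [(data, base_seed + i, sample_size, max_depth) for i in range(n_trees)]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(build_one, tasks))  # map keeps task order


if __name__ == "__main__":
    gen = random.Random(1)
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(300)]
    serial = build_forest(data, n_trees=20, parallelism=1)
    parallel = build_forest(data, n_trees=20, parallelism=4)
    print("same forest:", serial == parallel)
```

Seeding each task separately is the design choice that keeps the result reproducible when the number of workers changes; a single shared random generator would make the forest depend on scheduling order.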

Description

Technical field

[0001] The invention relates to the field of anomaly detection computation, in particular to an anomaly detection method and device based on Flink parallelization of the isolation forest algorithm.

Background technique

[0002] Anomaly detection usually refers to an unsupervised process of finding data that deviates far from the expected data. The detected data are usually called anomalies or outliers. Anomaly detection plays an important role in many fields, such as intrusion detection systems, credit card anti-fraud, and intelligent operation and maintenance.

[0003] Generally, there are several kinds of detection methods for anomaly detection: statistics-based methods, proximity-based methods, density-based methods, cluster-based methods, etc. However, they all have their own shortcomings. Statistics-based methods are effective for single-dimensional anomaly detection, but for multiple dimensions, the detection effect and performance are relatively poo...


Application Information

IPC(8): G06F16/903, G06N20/00
CPC: G06F16/90335, G06N20/00
Inventor: 陈伟, 黄东豫, 刘欣, 刘国伟
Owner: 中电福富信息科技有限公司