Distributed cross-dimensional abnormal data detection method in big data environment

An abnormal data detection and distributed computing technology, applied in the field of big data processing, that addresses problems such as excessive overhead, unreasonable data segmentation, and heavy influence of human factors, with the effect of accelerating computation and reducing system overhead.

Active Publication Date: 2019-11-29
ZHEJIANG GONGSHANG UNIVERSITY

AI Technical Summary

Problems solved by technology

However, these algorithms suffer from poor portability, an inability to handle abnormal data points in local data, cumbersome parameter setting, heavy influence of human factors, and an inability to handle multidimensional data sets.
In addition, existing distributed abnormal data point detection techniques adapt poorly to heterogeneous distributed parallel computing environments. To a large extent, data segmentation is unreasonable, data allocation is not collapsed, disk I/O and network I/O are high, and the overhead is too large.

Embodiment Construction

[0044] As shown in Figure 1, the distributed cross-dimensional abnormal data detection method in a big data environment provided by this embodiment includes five steps. Data segmentation: the input data set is segmented based on its dimensions to form multiple data buckets (step S1). An unbalanced binary coding tree is formed from the data buckets segmented in each dimension (step S2). Based on the unbalanced binary coding tree, the segmented data buckets are distributed to the computing nodes (step S3). The relative outlier amount of each data point is calculated on each computing node (step S4). Data points whose relative outlier amount is greater than or equal to the set threshold are filtered out to form the set of abnormal points (step S5).
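
The flow of steps S1, S3, S4 and S5 can be illustrated with a minimal single-process sketch in Python. The details are assumptions, not the embodiment's method: the excerpt does not define the segmentation rule or the relative outlier amount, so this sketch uses equal-width buckets per dimension, round-robin placement in place of the coding tree of step S2, and a histogram-style sparsity score. All function names and parameters (`segment`, `distribute`, `score_on_node`, `detect`, `n_buckets`, `n_nodes`, `threshold`) are hypothetical.

```python
from collections import defaultdict

def segment(points, n_buckets):
    """S1 (assumed scheme): equal-width buckets, built separately for each dimension."""
    buckets = defaultdict(list)  # (dimension, bucket index) -> indices of points
    for d in range(len(points[0])):
        values = [p[d] for p in points]
        low, high = min(values), max(values)
        width = (high - low) / n_buckets or 1.0  # fall back to 1.0 for constant columns
        for i, v in enumerate(values):
            b = min(int((v - low) / width), n_buckets - 1)
            buckets[(d, b)].append(i)
    return buckets

def distribute(buckets, n_nodes):
    """S3 stand-in: round-robin placement (the source uses the coding tree of S2)."""
    nodes = [dict() for _ in range(n_nodes)]
    for k, key in enumerate(sorted(buckets)):
        nodes[k % n_nodes][key] = buckets[key]
    return nodes

def score_on_node(node_buckets, n_points, n_buckets):
    """S4 stand-in: how sparse a point's bucket is relative to a uniform spread."""
    expected = n_points / n_buckets
    scores = {}
    for members in node_buckets.values():
        sparsity = expected / len(members)
        for i in members:
            scores[i] = max(scores.get(i, 0.0), sparsity)
    return scores

def detect(points, n_buckets=4, n_nodes=2, threshold=2.0):
    """Run S1, S3, S4, then S5: keep points whose score reaches the threshold."""
    buckets = segment(points, n_buckets)
    merged = defaultdict(float)
    for node in distribute(buckets, n_nodes):  # each loop pass plays the role of one node
        for i, s in score_on_node(node, len(points), n_buckets).items():
            merged[i] = max(merged[i], s)
    return sorted(i for i, s in merged.items() if s >= threshold)

# Hypothetical data: 20 clustered points plus one distant point at index 20.
points = [(0.1 * k, 0.05 * k) for k in range(20)] + [(10.0, 10.0)]
outliers = detect(points)
print(outliers)
```

In this sketch each node needs only its own buckets and the global point count, which is one plausible reason to segment before scoring; the actual scoring rule of the embodiment may differ.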

[0045] The method begins with step S1. To ensure computing efficiency when processing high-dimensional big data in a distributed environment, the input data set needs to be segmented. In this embodimen...

Abstract

The invention provides a distributed cross-dimensional abnormal data detection method in a big data environment. The method comprises the following steps: data segmentation, in which an input data set is segmented on the basis of its dimensions to form a plurality of data buckets; forming an unbalanced binary coding tree from the data buckets segmented in each dimension; distributing the segmented data buckets to the compute nodes on the basis of the unbalanced binary coding tree; calculating the relative outlier amount of each data point on each compute node; and screening out the data points whose relative outlier amounts are greater than or equal to a set threshold value to form an abnormal point set.
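
To make the idea of an unbalanced binary coding tree over data buckets more concrete, the following Python sketch builds one with Huffman-style merging and places buckets on nodes by code prefix. This is a swapped-in technique chosen for illustration: the excerpt does not say how the tree is constructed or how codes map to compute nodes. The names `coding_tree_codes`, `assign_to_nodes`, `prefix_bits` and the bucket sizes are hypothetical.

```python
import heapq
from itertools import count

def coding_tree_codes(bucket_sizes):
    """Huffman-style merge (assumed construction): returns {bucket: binary code}.

    Repeatedly joins the two lightest groups, so large buckets end up near the
    root with short codes and the tree is unbalanced when sizes are skewed.
    """
    codes = {b: "" for b in bucket_sizes}
    tie = count()  # unique tiebreaker so the heap never compares the lists
    heap = [(size, next(tie), [b]) for b, size in bucket_sizes.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        for b in left:
            codes[b] = "0" + codes[b]
        for b in right:
            codes[b] = "1" + codes[b]
        heapq.heappush(heap, (w0 + w1, next(tie), left + right))
    return codes

def assign_to_nodes(codes, prefix_bits=1):
    """Assumed placement rule: node id = first `prefix_bits` bits of the code.

    `prefix_bits` must be at least 1; shorter codes are padded with zeros.
    """
    placement = {}
    for bucket, code in codes.items():
        prefix = code[:prefix_bits].ljust(prefix_bits, "0")
        placement[bucket] = int(prefix, 2)
    return placement

# Hypothetical bucket sizes: two dense buckets and two sparse ones.
sizes = {"b0": 20, "b1": 1, "b2": 1, "b3": 20}
codes = coding_tree_codes(sizes)
placement = assign_to_nodes(codes)
loads = [sum(sizes[b] for b, n in placement.items() if n == node) for node in range(2)]
print(codes, placement, loads)
```

With these hypothetical sizes the two nodes receive loads of 20 and 22 points, which shows how a size-aware tree can keep the split roughly even despite skewed buckets.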

Description

Technical field

[0001] The invention relates to the technical field of big data processing, and in particular to a distributed cross-dimensional abnormal data detection method in a big data environment.

Background

[0002] With the continuous development of big data analysis and data mining technology, the volume of data keeps growing, and the amount of abnormal data grows with it. Abnormal data differ from the conventional data in a data set: their characteristics deviate from those of conventional data, so their presence causes obvious errors in the results of data analysis methods. Before traditional data mining activities, the data is carefully selected to ensure its integrity and consistency. However, the massive data in a big data environment cannot be selected manually, so abnormal data detection plays a very important role. At the same time, abno...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F11/07
CPC: G06F11/0709
Inventor: 刘东升, 许翀寰
Owner: ZHEJIANG GONGSHANG UNIVERSITY