Distributed outlier detection method and system based on automatic coding machine

An outlier detection technology based on an autoencoder (automatic coding machine), applied in the field of information security. It addresses the slow solution speed, the difficulty of handling large-scale data, and the high global data dependence of existing approaches, and it achieves good scalability and reduced training time.

Status: Inactive; Publication date: 2014-08-27
INST OF INFORMATION ENG CAS
Cites: 2; Cited by: 19

AI Technical Summary

Problems solved by technology

Autoencoder-based outlier detection algorithms usually use stochastic gradient descent to solve for the encoding and decoding parameters…
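A common textbook way to write the per-sample stochastic gradient descent update for an autoencoder is shown below. It is an assumed formulation for illustration, not taken from the patent: f(x) = s(Wx + b) is the encoder, g(h) = s(W'h + b') the decoder, s an elementwise activation, η the learning rate, and i_t the index of the sample drawn at step t. Each update reads the parameters θ_t written by the previous step, so the loop over samples is inherently sequential, which is consistent with the slow solution speed and global data dependence noted above.

```latex
\begin{aligned}
\ell(x;\theta) &= \lVert x - g(f(x)) \rVert_2^2, \qquad \theta = (W, b, W', b'),\\
\theta_{t+1} &= \theta_t - \eta \, \nabla_{\theta}\, \ell\big(x_{i_t};\theta_t\big), \qquad t = 0, 1, 2, \dots
\end{aligned}
```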




Embodiment Construction

[0058] The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.

[0059] MapReduce is a software framework proposed by Google to support distributed computing on large clusters, and is used for parallel computation over large data sets (larger than 1 TB). The concepts "Map" and "Reduce", and their main ideas, are borrowed from functional programming languages, along with features borrowed from vector programming languages. In current software implementations, the user specifies a Map function that maps a set of key-value pairs into a new set of intermediate key-value pairs, and a concurrent Reduce function that processes, for each key, all mapped key-value pairs that share that key. The Hadoop-based MapReduce framework has a high degree of parallelism, ...
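The Map, group-by-key, and Reduce dataflow described in [0059] can be illustrated with a small single-process Python sketch. It only mimics the programming model; it is not Hadoop's API, and the helper names (map_phase, shuffle, reduce_phase, run_mapreduce) are hypothetical. Word count is used as the example workload.

```python
from collections import defaultdict
from itertools import chain


def map_phase(records, mapper):
    # Each input record is turned into zero or more (key, value) pairs.
    return list(chain.from_iterable(mapper(r) for r in records))


def shuffle(pairs):
    # Group all values that share the same key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    # The reducer sees one key together with every value emitted for it.
    return {key: reducer(key, values) for key, values in groups.items()}


def run_mapreduce(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Word count, the canonical MapReduce example.
lines = ["map reduce map", "reduce hadoop"]
counts = run_mapreduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)  # {'map': 2, 'reduce': 2, 'hadoop': 1}
```

In a real framework the shuffle is performed by the runtime across machines, and map and reduce tasks run concurrently on different nodes.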



Abstract

The invention relates to a distributed outlier detection method and system based on an autoencoder (automatic coding machine). The method includes the following steps: a training data set and a test data set are defined; the training data are randomly distributed to a plurality of computation units; the computation units execute in parallel, each solving for its own encoding and decoding parameters; the encoding and decoding parameters of all computation units are aggregated into a final set of encoding and decoding parameters, from which a self-replicating model is built; the self-replicating model is applied to the test data set, and the reconstruction errors of all test data are computed concurrently; the test data are sorted in descending order of reconstruction error, and test data whose reconstruction error exceeds a predetermined threshold are reported as outliers. With this method, the total processing time is independent of the number of samples processed and depends only on the required accuracy of the parameter solution. The method and system are well suited to detecting outliers in large-scale data sets on MapReduce frameworks, and offer good flexibility and scalability.
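The steps in the abstract can be sketched end to end in a single Python process. This is a minimal illustration under several assumptions that the patent text here does not state: a linear autoencoder with a one-unit bottleneck instead of a general network; threads from concurrent.futures standing in for the MapReduce computation units; every unit starting from the same initial parameters; and the "aggregate" step implemented as a plain element-wise average of the per-unit parameters (simple parameter averaging). All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def init_params(dim, seed=0):
    """Linear autoencoder with a 1-unit bottleneck: h = w.x + b, x_hat = v*h + c."""
    rng = random.Random(seed)
    return {
        "w": [rng.uniform(-0.5, 0.5) for _ in range(dim)],  # encoder weights
        "b": 0.0,                                           # encoder bias
        "v": [rng.uniform(-0.5, 0.5) for _ in range(dim)],  # decoder weights
        "c": [0.0] * dim,                                   # decoder bias
    }


def reconstruct(p, x):
    h = sum(wi * xi for wi, xi in zip(p["w"], x)) + p["b"]
    return h, [vi * h + ci for vi, ci in zip(p["v"], p["c"])]


def reconstruction_error(p, x):
    _, x_hat = reconstruct(p, x)
    return sum((xi - yi) ** 2 for xi, yi in zip(x, x_hat))


def train_shard(p0, shard, epochs=200, lr=0.02, seed=0):
    """One computation unit: plain SGD on its own shard, starting from a copy of p0."""
    p = {"w": list(p0["w"]), "b": p0["b"], "v": list(p0["v"]), "c": list(p0["c"])}
    rng = random.Random(seed)
    order = list(shard)
    for _ in range(epochs):
        rng.shuffle(order)
        for x in order:
            h, x_hat = reconstruct(p, x)
            e = [yi - xi for yi, xi in zip(x_hat, x)]        # grad of 0.5*||x_hat - x||^2 w.r.t. x_hat
            g_h = sum(vi * ei for vi, ei in zip(p["v"], e))  # backpropagated into h (uses the old v)
            p["v"] = [vi - lr * ei * h for vi, ei in zip(p["v"], e)]
            p["c"] = [ci - lr * ei for ci, ei in zip(p["c"], e)]
            p["w"] = [wi - lr * g_h * xi for wi, xi in zip(p["w"], x)]
            p["b"] -= lr * g_h
    return p


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def average_params(models):
    """Assumed aggregation step: element-wise mean of every unit's parameters."""
    return {
        "w": [_mean(col) for col in zip(*(m["w"] for m in models))],
        "b": _mean(m["b"] for m in models),
        "v": [_mean(col) for col in zip(*(m["v"] for m in models))],
        "c": [_mean(col) for col in zip(*(m["c"] for m in models))],
    }


def detect_outliers(train, test, n_units=4, threshold=1.0, seed=0):
    data = list(train)
    random.Random(seed).shuffle(data)                    # randomly distribute the training data...
    shards = [data[i::n_units] for i in range(n_units)]  # ...across n_units computation units
    p0 = init_params(len(data[0]), seed)                 # shared starting point (assumption)
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        local = list(pool.map(lambda i: train_shard(p0, shards[i], seed=i), range(n_units)))
    model = average_params(local)
    with ThreadPoolExecutor(max_workers=n_units) as pool:  # reconstruction errors computed concurrently
        errors = list(pool.map(lambda x: reconstruction_error(model, x), test))
    ranked = sorted(zip(test, errors), key=lambda pair: pair[1], reverse=True)  # descending error
    outliers = [(x, err) for x, err in ranked if err > threshold]
    return outliers, ranked


# Usage: 41 training points on the line y = 2x; the last test point lies far off that line.
train = [[t / 20, t / 10] for t in range(-20, 21)]
test = [[0.5, 1.0], [-0.3, -0.6], [1.0, -2.0]]
outliers, ranked = detect_outliers(train, test)
print(outliers)
```

Starting every unit from the same initialization matters for the averaging step: autoencoder parameters are only defined up to symmetries (for example, flipping the sign of both the encoder and the decoder weights gives the same reconstruction), so averaging models trained from unrelated starting points can cancel them out.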

Description

Technical Field

[0001] The invention relates to the field of information security, and in particular to a distributed outlier detection method and system based on an autoencoder (automatic coding machine).

Background Art

[0002] In data mining and statistics, outliers are observations or sample points that deviate significantly from the rest of the data. Many data mining applications filter out and discard outliers. From the perspective of knowledge discovery, however, rare events are often more valuable than ordinary ones, for example in fraud detection and network intrusion detection, so outlier detection is a research field of high practical value. Current computer-based outlier detection methods, in China and abroad, fall into four categories: statistical-distribution-based, depth-based, distance-based, and density-based local outlier detection.

[0003] Autoencoders are a special type of neural network. An autoencoder consists of encoding an...
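For reference, a standard single-hidden-layer autoencoder and the reconstruction-error outlier score can be written as follows. This is a common textbook formulation assumed here because the passage above is truncated before the patent's own definitions: s is an elementwise activation such as the sigmoid (which assumes inputs scaled to [0, 1]), x ∈ R^d, and τ is the user-chosen threshold.

```latex
\begin{aligned}
h &= f(x) = s(Wx + b) && \text{(encoder)}\\
\hat{x} &= g(h) = s(W'h + b') && \text{(decoder)}\\
E(x) &= \lVert x - \hat{x} \rVert_2^2 = \sum_{j=1}^{d} \big(x_j - \hat{x}_j\big)^2 && \text{(reconstruction error)}\\
x \ \text{is an outlier} &\iff E(x) > \tau
\end{aligned}
```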


Application Information

IPC (8): G06N3/08
Inventors: Ma Yunlong, Zhang Peng, Cao Yanan, Zhai Lidong, Du Yuejin (马云龙, 张鹏, 曹亚男, 翟立东, 杜跃进)
Owner: INST OF INFORMATION ENG CAS