The invention relates to a distributed outlier detection method and system based on an autoencoder. The method comprises the following steps: a training
data set and a testing
data set are defined; training data of the training
data set are randomly distributed to a plurality of computing units; the computing units execute in parallel, and each computing unit solves for its own encoding and decoding parameters; the encoding and decoding parameters of all computing units are aggregated to obtain the final encoding and decoding parameters, and a self-replicating (autoencoder) model is built; the model is applied to the testing data set, and
the reconstruction errors of all testing data are computed concurrently; the testing data are sorted in descending order of reconstruction error, and testing data whose reconstruction error exceeds a predetermined threshold are identified as outliers. With this method, the total processing time is independent of the number of samples processed and depends only on the required accuracy of the parameter solution. The distributed
outlier detection method and
system based on the autoencoder are well suited to detecting outliers in large-scale data sets using MapReduce frameworks, and offer good flexibility and scalability.
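
The steps described above can be sketched in a few lines of Python with NumPy. This is only a minimal single-process illustration under several assumptions that the abstract does not state: the autoencoder is linear with one hidden layer, each computing unit is simulated by a loop iteration instead of a real MapReduce task, every unit starts from the same initial parameters, and the per-unit parameters are aggregated by simple averaging. The names train_shard, n_units, tol and threshold are hypothetical and chosen for illustration.

```python
import numpy as np

def train_shard(X, W_enc, W_dec, lr=0.1, tol=1e-8, max_iter=2000):
    """Fit a linear autoencoder on one shard by gradient descent.

    Iteration stops when the loss changes by less than `tol`, so the work
    per unit is governed by the requested accuracy (assumed stopping rule).
    """
    W_enc, W_dec = W_enc.copy(), W_dec.copy()
    prev_loss = np.inf
    for _ in range(max_iter):
        H = X @ W_enc                      # encode
        R = H @ W_dec - X                  # reconstruction residual
        loss = np.mean(np.sum(R ** 2, axis=1))
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        grad_dec = (2.0 / len(X)) * (H.T @ R)
        grad_enc = (2.0 / len(X)) * (X.T @ R @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec

rng = np.random.default_rng(0)

# Synthetic data (assumption): inliers lie near a 2-D subspace of a 4-D space.
basis = np.array([[1.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0]]) / np.sqrt(2.0)

def make_inliers(n):
    return rng.normal(size=(n, 2)) @ basis + 0.05 * rng.normal(size=(n, 4))

X_train = make_inliers(400)
X_test = np.vstack([
    make_inliers(50),
    [[3.0, -3.0, -3.0, 3.0],               # test index 50: off the subspace
     [-4.0, 4.0, 4.0, -4.0]],              # test index 51: off the subspace
])

# Randomly distribute the training data to the computing units.
n_units = 4
shards = np.array_split(rng.permutation(len(X_train)), n_units)

# Common starting point for every unit (assumption, keeps averaging meaningful).
W_enc0 = rng.normal(scale=0.1, size=(4, 2))
W_dec0 = rng.normal(scale=0.1, size=(2, 4))

# "Map": each unit solves its own encoding and decoding parameters.
results = [train_shard(X_train[idx], W_enc0, W_dec0) for idx in shards]

# "Reduce": aggregate the per-unit parameters by averaging (assumption).
W_enc = np.mean([r[0] for r in results], axis=0)
W_dec = np.mean([r[1] for r in results], axis=0)

# Score the test set: reconstruction error per sample, sorted in descending order.
errors = np.sum((X_test @ W_enc @ W_dec - X_test) ** 2, axis=1)
order = np.argsort(-errors)
threshold = 1.0                            # hypothetical predetermined threshold
outliers = [int(i) for i in order if errors[i] > threshold]
print(outliers)
```

In a real deployment the list comprehension would be replaced by map tasks and the averaging by a reduce step; scoring the test set is itself embarrassingly parallel because each reconstruction error depends on only one sample.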