Abnormal data detection method and device and data pre-processing method and system

An abnormal data detection technology, applied in the computer field, that addresses problems such as high feature dimensions, large differences in sample attributes, and limits on the data that can be processed, with the effects of avoiding interference, ensuring stability, and providing strong reliability and versatility.

Active Publication Date: 2017-03-29
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0011] The purpose of the embodiments of the present invention is to provide an abnormal data detection method and device and a data preprocessing method and system, to solve the problem that existing outlier detection methods are limited when processing data with a large number of missing values, high feature dimensions, and large differences in sample attributes.

Examples

Embodiment 1

[0043] An embodiment of the present invention proposes a method for detecting abnormal data, which is used to find abnormal data in the data to be detected. Referring to Figure 1, the method of this embodiment includes the following steps:

[0044] S101. Perform dimensionality reduction processing on the data set to be detected using a principal component algorithm to form a first data set.

[0045] S102. Reconstruct the first data set using a principal component algorithm to form a second data set, where the second data set has the same dimension as the data set to be detected.

[0046] S103. Calculate the correlation between corresponding data in the data set to be detected and in the second data set.

[0047] S104. Obtain, from the data to be detected, the abnormal data that differs greatly from the corresponding data in the second data set.
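
Steps S101 to S104 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the data is mean-centered before projection, obtains the principal directions from an SVD of the centered data, and uses the Euclidean distance between each sample and its reconstruction as the measure of "difference". The function names and the parameters j (number of retained components) and top_m (number of samples to flag) are hypothetical.

```python
import numpy as np

def pca_reconstruct(X, j):
    """S101/S102: reduce X to j dimensions, then map it back to k dimensions."""
    mean = X.mean(axis=0)
    Xc = X - mean                       # assumption: center the data first
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:j].T                        # (k, j) projection matrix
    first = Xc @ P                      # first data set, shape (n, j)
    second = first @ P.T + mean         # second data set, shape (n, k)
    return first, second

def detect_abnormal(X, j, top_m):
    """S103/S104: score each sample against its reconstruction, flag the worst."""
    _, second = pca_reconstruct(X, j)
    # Assumed difference measure: per-sample Euclidean reconstruction error.
    score = np.linalg.norm(X - second, axis=1)
    idx = np.argsort(score)[::-1][:top_m]   # indices of the top_m largest errors
    return idx, score
```

Samples that lie close to the subspace spanned by the retained components are reconstructed almost exactly, so a large score indicates a sample that the dominant structure of the data does not explain.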

[0048] In step S101, the data to be detected in this embodiment can be, for example, big data such as an image processing system, a...

Embodiment 2

[0076] In the present invention, an algorithm that simplifies a high-dimensional data set based on principal component matrix decomposition may be used, preferably singular value decomposition (SVD). Figure 2 is a flowchart of another abnormal data detection method according to an embodiment of the present invention, which includes the following steps:

[0077] S201. Calculate the covariance matrix of the data set to be detected.

[0078] S202. Decompose the covariance matrix of the data set to be detected by singular value decomposition to obtain a (k, k)-dimensional orthogonal matrix, where k is the dimension of the data set to be detected.

[0079] S203. Take the first j dimensions of the orthogonal matrix to form the projection matrix.

[0080] S204. Calculate the first data set according to the acquired projection matrix and the data set to be detected.

[0081] S205. Reconstruct the first data set using a prin...
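
Steps S201 to S204 can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the rows of X are samples and the columns are the k features, "the first j dimensions" is read as the first j columns of the orthogonal matrix, and the data is centered before projection. The function names are hypothetical.

```python
import numpy as np

def projection_matrix(X, j):
    """S201-S203: build a (k, j) projection matrix from the covariance of X."""
    cov = np.cov(X, rowvar=False)        # S201: (k, k) covariance matrix
    U, _, _ = np.linalg.svd(cov)         # S202: U is a (k, k) orthogonal matrix
    return U[:, :j]                      # S203: keep the first j columns

def reduce_dimension(X, P):
    """S204: compute the first data set from X and the projection matrix P."""
    # Assumption: subtract the column means before projecting.
    return (X - X.mean(axis=0)) @ P
```

Because the covariance matrix is symmetric and positive semidefinite, its SVD returns the singular values in decreasing order, so the first j columns of U correspond to the directions of largest variance.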

Embodiment 3

[0107] The embodiment of the present invention also proposes a data preprocessing method, which uses principal component analysis to find and filter out abnormal data in a large amount of data. It is especially suitable for preprocessing the input data of systems such as image processing, credit card fraud detection, and credit early warning. In the data preprocessing method of this embodiment, the abnormal data in the data to be detected is first obtained using the abnormal data detection method, and that abnormal data is then filtered out of the data to be detected. The abnormal data detection method is the same as in the first and second embodiments and is not repeated here.

[0108] The data preprocessing method in this embodiment can pick out abnormal sample points without assuming that the data to be processed obeys a certain distribution, and is suitable for cases where there are a large number of missin...
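
The filtering step of the preprocessing method can be sketched as follows. The sketch assumes that an abnormality score per sample is already available from a detection step (for example, a per-sample reconstruction error) and that the top_m highest-scoring samples are the ones treated as abnormal; both the function name and the top_m cutoff are hypothetical, since the text does not fix a selection rule.

```python
import numpy as np

def filter_abnormal(X, scores, top_m):
    """Drop the top_m highest-scoring rows of X; return the rest and the dropped indices."""
    X = np.asarray(X)
    scores = np.asarray(scores, dtype=float)
    removed = np.argsort(scores)[::-1][:top_m]   # indices of the largest scores
    keep = np.ones(len(X), dtype=bool)
    keep[removed] = False                        # mask out the abnormal rows
    return X[keep], np.sort(removed)
```

Returning the removed indices alongside the cleaned data lets a downstream system log or review what was discarded.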

Abstract

The invention proposes an abnormal data detection method and device and a data preprocessing method and system. The abnormal data detection method comprises the following steps: performing dimensionality reduction on a data set to be detected using a principal component algorithm to form a first data set; reconstructing the first data set using the principal component algorithm to form a second data set, wherein the second data set has the same dimension as the data set to be detected; calculating the correlation between corresponding data in the data set to be detected and in the second data set; and obtaining, from the data to be detected, abnormal data that differs greatly from the corresponding data in the second data set. The invention does not need to assume that the data set to be analyzed conforms to any specific distribution, and it offers high reliability, universality, and stability.

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to an abnormal data detection method and device, and a data preprocessing method and system.

Background technique

[0002] In fields such as image processing systems, credit card fraud detection systems, and credit early warning systems, outlier detection is often involved. Outlier detection (also called anomaly detection) is the process of finding objects whose behavior is very different from what is expected; such objects are called outliers or anomalous points. The most common outlier detection methods are statistical, and can be divided into univariate and multivariate cases according to the number of variables processed, for example:

[0003] 1) Univariate outlier detection method based on normal distribution

[0004] Suppose there are n sample points (x_1, x_2, ..., x_n), then the mean μ and variance σ of the...
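
A univariate, normal-distribution-based check of the kind described in the background can be sketched as follows. The cutoff used here is an assumption: the sketch applies the common three-sigma convention (flag points more than k = 3 standard deviations from the mean), which the text above does not spell out. The function name and the parameter k are hypothetical.

```python
import numpy as np

def three_sigma_outliers(samples, k=3.0):
    """Return indices of samples lying more than k standard deviations from the mean."""
    x = np.asarray(samples, dtype=float)
    mu = x.mean()        # sample mean
    sigma = x.std()      # population standard deviation
    # Assumed rule: |x_i - mu| > k * sigma marks x_i as an outlier.
    return np.flatnonzero(np.abs(x - mu) > k * sigma)
```

This kind of rule depends on the data being roughly normally distributed, which is the distributional assumption that the PCA-based method avoids.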

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/00
Inventor: 张戎, 赵伟, 冯亚兵, 廖宇, 赖俊斌, 柴海霞, 潘宣良, 刘黎春
Owner: TENCENT TECH (SHENZHEN) CO LTD