Local outlier detection method based on density clustering

An outlier detection technique based on density clustering, applied in the field of outlier detection. It addresses the problems that results are very sensitive to the selection of parameter values, local outliers cannot be effectively detected, and it is difficult to determine

Inactive Publication Date: 2015-03-25
INFORMATION & TELECOMM COMPANY SICHUAN ELECTRIC POWER +1

AI Technical Summary

Problems solved by technology

Many algorithms were originally designed for cluster analysis and only later extended to identify outliers from the clustering results.
At present, most of the outlier detection methods based on clustering technology are often not optimal, and many o



Examples


Embodiment

[0059] As shown in Figure 1 and Figure 2, the local outlier detection method based on density clustering described in this embodiment includes the following steps:

[0060] (a) Obtain the number of data clusters and the cluster centers of the data set to be detected;

[0061] (b) Calculate the mean and standard deviation of the descriptive features of each data object in the different data clusters;

[0062] (c) Use the 3-sigma criterion to detect the outliers of each data cluster.
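
Steps (b) and (c) can be illustrated with a minimal sketch, under several assumptions that do not come from the text: the cluster labels and cluster centers from step (a) are taken as given, the "descriptive feature" of a data object is assumed to be its Euclidean distance to its own cluster center, and the population standard deviation is used. The function name `three_sigma_outliers` is hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def three_sigma_outliers(points, labels, centers):
    """Return sorted indices of points flagged by a per-cluster 3-sigma test.

    Assumption: the descriptive feature of a point is its distance to the
    center of the cluster it was assigned to in step (a).
    """
    feature = [dist(p, centers[k]) for p, k in zip(points, labels)]
    outliers = []
    for k in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == k]
        values = [feature[i] for i in members]
        mu, sigma = mean(values), pstdev(values)  # step (b)
        if sigma == 0:
            continue  # all members identical in this feature; nothing to flag
        # Step (c): flag members more than three standard deviations from the mean.
        outliers += [i for i in members if abs(feature[i] - mu) > 3 * sigma]
    return sorted(outliers)


# Cluster 0: twenty points at distance 1 from (0, 0) plus one point at distance 10.
# Cluster 1: twelve points at distance 1 from (100, 100).
points = [(1, 0), (0, 1), (-1, 0), (0, -1)] * 5 + [(10, 0)]
points += [(101, 100), (100, 101), (99, 100), (100, 99)] * 3
labels = [0] * 21 + [1] * 12
centers = {0: (0, 0), 1: (100, 100)}
flagged = three_sigma_outliers(points, labels, centers)
```

Because the statistics are computed separately for each cluster, a point is judged only against its own cluster, which is what makes the test local. Note that with the population standard deviation a cluster needs more than ten members before any single point can exceed three standard deviations.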

[0063] Step (a) is refined as follows:

[0064] (a1) Data set preprocessing, which includes data cleaning and data normalization. In data cleaning, noise data and records with missing values can be deleted manually. To reduce the influence of differences in measurement scale on the data clustering and outlier detection process, the data must be normalized. Data normalization methods include min-max normalization, z-score normalization, and decimal scaling normalization.

[0065] (a2) Calculating the dissimilarity betwe...
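
The three normalization methods named in step (a1) can be sketched as follows for a single feature column. This is a minimal illustration rather than the patent's implementation: the function names are hypothetical, the z-score uses the population standard deviation as an assumption, and constant columns are mapped to zeros by choice.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on the mean and scale by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by the smallest power of ten that brings every |value| below 1."""
    peak = max(abs(v) for v in values)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


scaled = min_max([2, 4, 6])
standardized = z_score([1, 2, 3])
shifted = decimal_scaling([-986, 217])
```

Min-max scaling is sensitive to extreme values, since a single large value compresses the rest of the column, whereas z-score scaling keeps the relative spread; which one suits a given data set is a judgment the text leaves open.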



Abstract

The invention discloses a local outlier detection method based on density clustering. The method includes the following steps: (a) obtaining the number of data clusters and the cluster centers of the data set to be detected; (b) calculating the mean and standard deviation of the descriptive features of each data object in the different data clusters; (c) applying the 3-sigma criterion to detect the outliers of each data cluster. With this method, parameters are easy to set, data sets with regions of different density and arbitrary shape can be handled, local outliers can be detected, the detection results are highly accurate, and the algorithm is insensitive to the choice of parameter values and therefore robust.

Description

Technical field

[0001] The invention relates to the field of outlier detection, in particular to a local outlier detection method based on density clustering.

Background art

[0002] Outlier detection is a branch of data mining whose task is to identify observations whose characteristics differ significantly from those of other data objects. It is important in data mining because, when the anomalies arise from inherent variation in the data, analyzing them can reveal deeper, potentially valuable information hidden in the data. Outlier detection is therefore a meaningful research direction.

[0003] Data mining expert Hawkins defines outliers as follows: "Outliers are data objects that are distinctive in the data set, and their performance is so different from other data objects that people suspect that these data objects are not random deviations, but are produced by a completely different mechanism". This definition reveals the n...


Application Information

IPC(8): G06F19/00
Inventor: 王电钢黄林黄昆常健陈龙潘可佳
Owner: INFORMATION & TELECOMM COMPANY SICHUAN ELECTRIC POWER