Angle-based high dimensional data outlier detection method

A technique for outlier detection in high-dimensional data, applied in multi-dimensional databases, structured data retrieval, database models, and related fields. It addresses problems such as the curse of dimensionality that affects distance-based methods.

Inactive Publication Date: 2015-12-09
HOHAI UNIV
0 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

At present, existing outlier mining methods are mainly based on the concepts of distance or nearest neighbors. In high-dimensional data, distances and nearest neighbors no longer behave as they do in low-dimensional Euclidean space, and distance-based methods suffer from the curse of dimensionality.
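The distance problem described here can be illustrated numerically. The following is a minimal sketch, not part of the patent: it assumes uniformly random points in the unit cube and Euclidean distance, and the helper name `relative_contrast` is hypothetical. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import numpy as np

def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (max - min) / min of the Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))   # assumed data model: uniform cube
    query = rng.random(dim)
    dists = np.linalg.norm(data - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())

# As the dimension grows, nearest and farthest distances become similar,
# so a distance threshold separates points less and less well.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(1000, dim), 3))
```

Under these assumptions the printed contrast should shrink as the dimension increases, which is the effect that motivates replacing distances with angles.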

Embodiment Construction

[0032] The present invention will be further described below in conjunction with the accompanying drawings. The following examples are only used to illustrate the technical solution of the present invention more clearly, but not to limit the protection scope of the present invention.

[0033] As shown in Figure 1, an angle-based outlier detection method for high-dimensional data includes the following steps:

[0034] 1) In the data set D, for each data point A∈D, obtain the k nearest neighbor points of A;

[0035] To obtain the k nearest neighbors of each data point, a formal description of the high-dimensional data and of the k-nearest-neighbor calculation is needed. These are given in turn:

[0036] 1-1) Formalize the data set. The high-dimensional data is formalized as follows:

[0037] For a given high-dimensional dataset D, the norm ‖·‖ is defined as a map R^d → R^+ and the inner product as a map R^d × R^d → R. For points A, B ∈ D, representation vec...
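Step 1) can be sketched as a brute-force search. This is an illustration only: the patent's own k-nearest-neighbor calculation is truncated in this excerpt, so the sketch assumes Euclidean distance, and the function name `k_nearest_neighbors` is hypothetical.

```python
import numpy as np

def k_nearest_neighbors(data: np.ndarray, k: int) -> np.ndarray:
    """Return an (n, k) array holding, for each row of `data`, the indices
    of its k nearest other rows under Euclidean distance (assumed metric)."""
    n = data.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    # Pairwise distance matrix; O(n^2 * d) memory, fine for small data sets.
    diff = data[:, None, :] - data[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)       # a point is not its own neighbor
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
print(k_nearest_neighbors(points, k=1))
```

For large data sets a spatial index or an approximate search would normally replace the full distance matrix; the brute-force version is used here only because it is short and easy to check.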

Abstract

The invention discloses an angle-based outlier detection method for high-dimensional data, belonging to the technical field of outlier data mining. The method comprises the following steps: 1) for each data point A in a data set D, the k nearest neighbor points of A are obtained; 2) an angle-based outlier factor is calculated for each data point; 3) the data points are ranked by outlier factor, and the set of points with the smallest outlier factors is selected as the set with the highest degree of outlierness; 4) the outlier data are determined. The method can efficiently find outlier data hidden in large-scale high-dimensional data, effectively addresses the curse of dimensionality that affects outlier detection methods based on high-dimensional distance, nearest neighbors and the like, and can be widely applied to high-dimensional data in credit card fraud detection, traffic accident detection, anomaly detection in scientific measurement data, and similar tasks.
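The four steps in the abstract can be sketched end to end. This is a minimal sketch under stated assumptions, not the patented formula, which is not visible in this excerpt: the outlier factor is taken here to be the unweighted variance of the cosines of the angles between pairs of neighbor vectors, distances are Euclidean, and the names `angle_outlier_factors` and `top_outliers` are hypothetical.

```python
import numpy as np

def angle_outlier_factors(data: np.ndarray, k: int) -> np.ndarray:
    """Steps 1-2: for each point, find its k nearest neighbors and compute
    the variance of the cosines of the angles between neighbor vectors."""
    n = data.shape[0]
    if not 3 <= k < n:
        raise ValueError("k must satisfy 3 <= k < n")
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1, kind="stable")[:, :k]
    pair_rows, pair_cols = np.triu_indices(k, 1)   # each neighbor pair once
    factors = np.empty(n)
    for i in range(n):
        vecs = data[neighbors[i]] - data[i]        # vectors from A to neighbors
        norms = np.linalg.norm(vecs, axis=1)
        unit = vecs / np.maximum(norms, 1e-12)[:, None]  # guard duplicates
        cosines = unit @ unit.T
        factors[i] = np.var(cosines[pair_rows, pair_cols])
    return factors

def top_outliers(data: np.ndarray, k: int, m: int):
    """Steps 3-4: rank by factor and return the m points with the smallest
    factors as outlier candidates, together with all factors."""
    factors = angle_outlier_factors(data, k)
    order = np.argsort(factors, kind="stable")
    return order[:m], factors

# Usage: a 5 x 5 grid of points plus one far-away point (index 25).
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
data = np.vstack([grid, [[100.0, 100.0]]])
candidates, factors = top_outliers(data, k=5, m=1)
print(candidates, factors[candidates])
```

In this toy data the far-away point sees all of its neighbors in nearly the same direction, so its factor is close to zero and it is ranked first. Published angle-based detectors often weight the variance by distance; that refinement is left out of this sketch.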

Description

Technical field

[0001] The invention relates to an angle-based outlier detection method for high-dimensional data, which belongs to the technical field of outlier data mining.

Background technique

[0002] Outlier data mining is currently one of the research hotspots in data mining and is widely used in network traffic intrusion detection, traffic accident detection, anomaly detection in scientific measurement data, and other fields. At present, existing outlier mining methods are mainly based on the concepts of distance or nearest neighbors. In high-dimensional data, distances and nearest neighbors no longer behave as they do in low-dimensional Euclidean space, and distance-based methods suffer from the curse of dimensionality. In high-dimensional data, because outliers lie far away from the other data points, the angles between the vectors formed by an outlier and other points do not vary much, while the non-outliers are surrounded...
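The angle argument in the background can be written compactly. The following is a sketch rather than the patent's own formula, which is not visible in this excerpt. For a point A and two other points B and C, the cosine of the angle at A follows from the inner product and the norm; taking a variance over pairs of neighbors is one assumed way to quantify how much those angles vary. The symbols N_k(A) (the k nearest neighbors of A) and AOF (a name for the factor) are introduced here for illustration only.

```latex
\cos\theta_A(B, C)
  = \frac{\langle B - A,\; C - A \rangle}{\lVert B - A \rVert \, \lVert C - A \rVert},
\qquad
\mathrm{AOF}(A)
  = \operatorname{Var}_{\substack{B,\, C \in N_k(A) \\ B \neq C}}
    \bigl[ \cos\theta_A(B, C) \bigr].
```

A small value of AOF(A) means the neighbors of A all lie in roughly the same direction, which matches the description of an outlier given in the background.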

Application Information

IPC(8): G06F17/30
CPC: G06F16/283
Inventor 刘文婷
Owner HOHAI UNIV