Outlier detection method based on uncertain data set

A technology for uncertain data and outlier detection, applied in instruments, character and pattern recognition, computer parts, etc. It addresses the difficulty of accurately judging whether uncertain data is abnormal, and therefore of identifying exact outlier data.

Inactive Publication Date: 2016-03-02
HOHAI UNIV

AI Technical Summary

Problems solved by technology

[0002] Outlier data mining is currently one of the research hotspots in the field of data mining. Existing outlier mining methods mainly rely on distance or nearest-neighbor concepts to identify outliers. With the widespread adoption of the Internet and mobile Internet, large amounts of uncertain data are now used in fields such as financial and economic analysis, electronic communication, and modern logistics. The uncertainty of the data itself makes it difficult to judge accurately whether a data object is abnormal, and therefore difficult to identify outlier data exactly.
In uncertain data sets, even if a data object does not itself appear to be an outlier, a very high uncertainty level makes it likely to be suspected as abnormal.



Examples


Detailed Description of the Embodiments

[0053] For a better understanding of the technical features, technical content, and technical effects of the present invention, the invention is described in more detail below with reference to the accompanying drawings and the embodiments.

[0054] The invention is further described below with reference to the accompanying drawings and embodiments.

[0055] As shown in Figure 1, the present invention provides an outlier detection method based on uncertain data sets, which includes the following steps:

[0056] Step 1) Calculate the k-distance and the k-distance neighborhood of each data point o in the uncertain data set D. The specific calculation process is as follows:

[0057] 1-1) Formalize the data set;

[0058] The uncertain data set D is expressed as D = {o_1, o_2, ..., o_i, ..., o_n}, where n is the size of the uncertain data set D and o_i is a data point in the data set. Each data point has d dimensions, that is, d attribute valu...
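The k-distance and k-distance neighborhood named in Step 1 can be illustrated with a minimal Python sketch. The excerpt is cut off before the patent's own distance definition, so the sketch makes two assumptions: each uncertain data point is represented by a single d-dimensional vector (for example, its expected value), and the distance is Euclidean. The names `euclidean` and `k_distance_and_neighborhood` are hypothetical and do not come from the patent.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two d-dimensional points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_distance_and_neighborhood(data, i, k):
    """Return (k-distance of data[i], indices in its k-distance neighborhood).

    The neighborhood holds every other point whose distance to data[i] is
    at most the k-distance, so ties can make it larger than k.
    """
    # Distances from point i to every other point, nearest first.
    dists = sorted(
        (euclidean(data[i], p), j) for j, p in enumerate(data) if j != i
    )
    k_dist = dists[k - 1][0]  # distance to the k-th nearest point
    neighborhood = [j for d, j in dists if d <= k_dist]
    return k_dist, neighborhood


# Four corners of a unit square plus one distant point (illustrative data).
D = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
k_dist, nbrs = k_distance_and_neighborhood(D, 0, 2)
print(k_dist, nbrs)
```

For the point (0, 0) with k = 2, the two nearest points are (0, 1) and (1, 0), both at distance 1. With k = 1 the same two points are returned, because they are tied at the k-distance.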


Abstract

The invention discloses an outlier detection method based on an uncertain data set and belongs to the technical field of outlier data mining. The method comprises: step 1, computing the k-distance and the k-distance neighborhood of each data point in the uncertain data set; step 2, computing the probability that a data point q becomes a neighbor of a data point o in the k-distance neighborhood; step 3, computing the reachable distance of each data point; step 4, computing the reachable density of each data point; and step 5, computing the outlier factor of each data point in order to determine the outlier data. The method can effectively find outlier data concealed in the uncertain data set.
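Steps 1, 3, 4, and 5 of the abstract have the same shape as the classical Local Outlier Factor (LOF) computation, so a rough sketch of classical LOF can show how the quantities fit together. This is not the patented method. The patent's formulas are not part of this excerpt, and step 2 (the neighbor probability for uncertain data) is left out because its definition is not given here. The sketch also assumes Euclidean distance and one vector per data point. The names `dist` and `lof_scores` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lof_scores(data, k):
    """Classical LOF score per point; assumes no point has k exact duplicates."""
    n = len(data)

    # Step 1: k-distance and k-distance neighborhood of every point.
    k_dist, nbrs = [], []
    for i in range(n):
        ds = sorted((dist(data[i], data[j]), j) for j in range(n) if j != i)
        kd = ds[k - 1][0]
        k_dist.append(kd)
        nbrs.append([j for d, j in ds if d <= kd])

    # Step 3: reachability distance of o with respect to a neighbor q.
    def reach(o, q):
        return max(k_dist[q], dist(data[o], data[q]))

    # Step 4: reachability density = inverse of the mean reachability distance.
    lrd = []
    for o in range(n):
        mean_reach = sum(reach(o, q) for q in nbrs[o]) / len(nbrs[o])
        lrd.append(1.0 / mean_reach)

    # Step 5: outlier factor = mean neighbor density divided by own density.
    return [
        sum(lrd[q] for q in nbrs[o]) / (len(nbrs[o]) * lrd[o])
        for o in range(n)
    ]


# Four corners of a unit square plus one distant point (illustrative data).
D = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(D, 2)
print(scores)
```

On this toy data the four square corners score close to 1, while the isolated point (5, 5) receives the largest score. Scores well above 1 mark points that are sparser than their neighbors.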

Description

Technical field

[0001] The invention relates to the technical field of outlier data mining, in particular to an outlier detection method based on an uncertain data set.

Background

[0002] Outlier data mining is currently one of the research hotspots in the field of data mining. Existing outlier mining methods mainly rely on distance or nearest-neighbor concepts to identify outliers. With the widespread adoption of the Internet and mobile Internet, large amounts of uncertain data are now used in fields such as financial and economic analysis, electronic communication, and modern logistics. The uncertainty of the data itself makes it difficult to judge accurately whether a data object is abnormal, and therefore difficult to identify outlier data exactly. In uncertain data sets, even if a data object itself does not seem to be an outlier, if its degree of uncertainty is very high, the data is likely to be suspected to be abnorma...


Application Information

IPC(8): G06K9/62
CPC: G06F18/22
Inventor: 刘文婷
Owner: HOHAI UNIV