Outlier detection method based on clustering

An outlier detection technique based on a clustering algorithm, applied in structured data retrieval, special-purpose data processing applications, instruments, and similar fields. It addresses the difficulty of judging accurately whether data is abnormal, and therefore of identifying outlier data.

Inactive Publication Date: 2016-04-20
HOHAI UNIV
0 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

With the widespread adoption of the Internet and mobile Internet, large amounts of data are used in fields such as financial and economic analysis, electronic communication, and modern logistics. The complexity of the data itself makes it difficult to judge accurately whether a data object is abnormal, and therefore difficult to identify outlier data exactly.

Examples


Embodiment Construction

[0051] The present invention is further described below with reference to the accompanying drawings.

[0052] As shown in Figure 1, the present invention provides a clustering-based outlier detection method comprising the following steps:

[0053] 1) Obtain the data set and use the improved k_means clustering algorithm to calculate k clusters;

[0054] 1-1) Obtain data set D;

[0055] The data set is denoted $D = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$, $i = 1, 2, \ldots, n$, where $n$ is the size of data set D and $x_i$ is a data object in the data set;

[0056] 1-2) Using the maximum and minimum clustering method, initialize m cluster centers;

[0057] 1-2-a) According to formula (1), calculate the distance $d_i$ from each data object $x_i$ in data set D to the sample center, forming a distance sample;

[0058] d i = Σ j = 1 , i ...
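The initialization in step 1-2) can be illustrated with a generic max-min distance heuristic. The following is a minimal sketch, not the patent's procedure: formula (1) is truncated in this text, so the choice of the first center (the object farthest from the sample mean) is an assumption, and the names `euclidean` and `maxmin_init` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def maxmin_init(data, m):
    """Pick m initial centers with a max-min distance heuristic.

    Assumption: the first center is the object farthest from the sample
    mean. The source's formula (1) is truncated, so this choice is only
    illustrative.
    """
    if not 1 <= m <= len(data):
        raise ValueError("m must be between 1 and len(data)")
    dim = len(data[0])
    mean = [sum(x[j] for x in data) / len(data) for j in range(dim)]
    first = max(range(len(data)), key=lambda i: euclidean(data[i], mean))
    chosen = [first]
    # nearest[i] is the distance from object i to its closest chosen center.
    nearest = [euclidean(x, data[first]) for x in data]
    while len(chosen) < m:
        # The next center is the object whose closest center is farthest away.
        nxt = max(range(len(data)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, euclidean(x, data[nxt])) for d, x in zip(nearest, data)]
    return [data[i] for i in chosen]


points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0)]
centers = maxmin_init(points, 3)
```

Each new center is the object that is farthest from all centers chosen so far, which tends to spread the initial centers across the data before k_means refinement begins.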


Abstract

The invention discloses a clustering-based outlier detection method. A data set is obtained and k clusters are computed with an improved k_means clustering algorithm; the k clusters are divided into a big-cluster set BC and a small-cluster set SC; an outlier factor is calculated for each data object based on the big and small clusters, and the factors are collected into an outlier factor sequence; outlier data is then determined from that sequence. The method can effectively discover outlier data hidden in large data sets and determines the degree of outlierness of each data object. It is accurate and efficient, and can be widely applied in finance, economic analysis, electronic communication, modern logistics, and similar fields.
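The big-cluster and small-cluster step can be sketched as follows. This is a hedged illustration only: the text available here does not give the patent's split rule or its outlier factor formula, so the sketch borrows a common cluster-based convention (clusters are taken largest first until they cover a fraction `alpha` of the objects; the factor is the distance to the object's own center if it is in a big cluster, otherwise the distance to the nearest big-cluster center). The names `split_big_small`, `outlier_factors`, and `alpha` are hypothetical, and cluster labels and centers are assumed to come from a prior k_means run.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def split_big_small(sizes, alpha):
    """Return (BC, SC) as lists of cluster ids.

    Assumed rule: take clusters largest first until they cover a
    fraction alpha of all objects; the remainder are small clusters.
    """
    order = sorted(sizes, key=sizes.get, reverse=True)
    total = sum(sizes.values())
    big, covered = [], 0
    for c in order:
        if covered >= alpha * total:
            break
        big.append(c)
        covered += sizes[c]
    small = [c for c in order if c not in big]
    return big, small


def outlier_factors(data, labels, centers, alpha):
    """Compute one assumed outlier factor per data object."""
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    big, _ = split_big_small(sizes, alpha)
    big_set = set(big)
    factors = []
    for x, lab in zip(data, labels):
        if lab in big_set:
            # Object in a big cluster: distance to its own center.
            factors.append(dist(x, centers[lab]))
        else:
            # Object in a small cluster: distance to the nearest big center.
            factors.append(min(dist(x, centers[b]) for b in big))
    return factors


data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11), (30, 30)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2]
centers = {0: (0.5, 0.5), 1: (10.5, 10.5), 2: (30.0, 30.0)}

factors = outlier_factors(data, labels, centers, alpha=0.8)
# Outlier factor sequence: object indices sorted by decreasing factor.
ranked = sorted(range(len(data)), key=lambda i: factors[i], reverse=True)
```

In this toy data the lone object at (30, 30) forms a small cluster and ranks first in the sequence, since its factor is measured against the nearest big-cluster center rather than its own.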

Description

Technical field

[0001] The invention relates to a method for detecting outliers, in particular to a clustering-based method for detecting outliers, and belongs to the technical field of outlier data mining.

Background

[0002] Outlier data mining is currently one of the research hotspots in the field of data mining, and clustering methods have a solid research foundation in that field.

[0003] At present, existing outlier data mining is mainly based on the concept of distance or nearest neighbor. With the widespread adoption of the Internet and mobile Internet, large amounts of data are used in fields such as financial and economic analysis, electronic communication, and modern logistics. The complexity of the data itself makes it difficult to judge accurately whether a data object is abnormal, and therefore difficult to identify outlier data exactly.

Contents of the invention

[0004] The main purpose of...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/254, G06F16/285
Inventor: 刘文婷
Owner: HOHAI UNIV