Data classification method based on improved local abnormal factor detection

A technology combining local abnormal factor detection and data classification, applied in the field of data processing. It addresses problems such as ignoring the correlation of data within clusters, poor stability in the accuracy of clustering results, and failure to meet expected requirements, and it improves accuracy.

Pending Publication Date: 2019-08-02
GUIZHOU NORMAL UNIVERSITY
Cites: 0; Cited by: 13

AI Technical Summary

Problems solved by technology

However, the above improvements to the K-means algorithm do not take into account the correlation of data within a cluster, which often makes the accuracy of the clustering results unstable, so they fail to meet the expected requirements.

Method used



Examples


Experimental example

[0086] Experimental example: to demonstrate the practicability of the method of the invention, the specific steps are as follows:

[0087] Six public data sets from the UCI database, namely Iris, Wine, Seeds, Wifi Localization, CMC, and Abalone, are selected, and the K-means++, FCM, OFMMK-means, and proposed optimized algorithms are each tested on them. The data sets are described in detail in Table 1.
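The excerpt does not state how clustering accuracy is scored. A common choice for labeled benchmarks such as these UCI data sets is to map each cluster to a class one-to-one in whichever way maximizes agreement, then report the fraction of matching points. The sketch below assumes that definition; `clustering_accuracy` is a hypothetical helper, not taken from the patent.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_labels):
    """Fraction of points labeled correctly under the best one-to-one
    mapping from cluster ids to class ids (assumed metric, brute force)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes; no one-to-one mapping")
    best = 0
    # Try every injective assignment of cluster ids to class ids.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        best = max(best, hits)
    return best / len(true_labels)


# Cluster ids are arbitrary, so a relabeled but otherwise perfect result scores 1.0.
print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```

The permutation search grows quickly with the number of clusters; for larger problems, `scipy.optimize.linear_sum_assignment` solves the same matching in polynomial time.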

[0088] Table 1: Data sets used in the experiments

[0089]

[0090] In the LOF algorithm, the parameter k_dist is the number of neighborhood points examined. The larger its value, the more sample points are selected, and the more easily the clustering accuracy is affected by the LOF values. Experiments on the value of k_dist were carried out on the six data sets above, as shown in Figure 1.
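To make the role of k_dist concrete, the following sketch computes standard local outlier factor scores (the definition of Breunig et al., 2000) for a neighborhood size k, which is assumed here to correspond to k_dist. The function name `lof_scores` is hypothetical, ties among neighbors are broken by index, duplicate points (zero reachability distance) are not handled, and the patent's adaptive adjustment of k_dist is not reproduced.

```python
import math


def lof_scores(points, k):
    """Local outlier factor of every point, using exactly the k nearest
    neighbours (ties broken by index). Duplicate points are not handled."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]
    # Local reachability density: inverse mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = sum(max(k_distance[o], dist[i][o]) for o in neighbors[i])
        lrd.append(k / reach)
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(n)]


square_plus_outlier = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(lof_scores(square_plus_outlier, k=2))
```

Scores near 1 mark points about as dense as their neighbors; larger values mark outliers. Re-running with different k shows how the scores, and hence any clustering that depends on them, change with the neighborhood size.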

[0091] The K-means++, FCM, OFMMK-means, and proposed optimized algorithms are run on the sample data sets Iris, Wine, Seeds, Wifi L...



Abstract

The invention discloses a data classification method based on improved local abnormal factor detection. The method comprises the following steps: outlier factor detection; similarity measurement; selection of initial cluster center points, in which a local outlier factor (LOF) detection algorithm with an adaptively adjusted k-distance parameter screens out data with relatively small outlier factors as the candidate set for the initial cluster centers; and iterative optimization of the cluster centers. In the iterative stage of optimizing the cluster centers, the outlier factors of the data are normalized by outlier standardization, so that the new outlier factor new_ri takes values greater than or equal to 1. The invention improves the accuracy of cluster center positioning and cluster division.
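The abstract names the steps but, in this excerpt, not their formulas. The sketch below strings them together under explicitly assumed rules: Euclidean distance as the similarity measure; a candidate set made of the points whose outlier factor falls within the lowest `candidate_fraction` (ties included); farthest-first selection of initial centers from that set; standardization by dividing every factor by the smallest one, which guarantees new_ri ≥ 1 for positive LOF values; and center updates weighted by 1/new_ri so that outlying points pull the centers less. All function names and these rules are hypothetical, chosen only to illustrate the structure, and the outlier factors are passed in precomputed.

```python
import math


def standardize_outlier_factors(lof):
    """Assumed standardization: divide by the smallest factor, so new_ri >= 1."""
    smallest = min(lof)
    return [r / smallest for r in lof]


def select_initial_centers(points, lof, n_clusters, candidate_fraction=0.5):
    """Choose initial centers from the low-outlier-factor candidate set."""
    n = len(points)
    size = min(n, max(n_clusters, math.ceil(n * candidate_fraction)))
    cutoff = sorted(lof)[size - 1]
    # Candidate set: every point whose factor does not exceed the cutoff.
    candidates = [i for i in range(n) if lof[i] <= cutoff]
    # Start from the most inlying point, then repeatedly add the candidate
    # farthest from its nearest chosen center (farthest-first traversal).
    first = min(candidates, key=lambda i: lof[i])
    centers = [tuple(points[first])]
    while len(centers) < n_clusters:
        nxt = max(candidates,
                  key=lambda i: min(math.dist(points[i], c) for c in centers))
        centers.append(tuple(points[nxt]))
    return centers


def weighted_kmeans(points, lof, n_clusters, max_iter=100):
    """K-means whose center update weights each point by 1 / new_ri."""
    weights = [1.0 / r for r in standardize_outlier_factors(lof)]
    centers = select_initial_centers(points, lof, n_clusters)
    dim = len(points[0])
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(n_clusters), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(n_clusters):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep an empty cluster's center unchanged
                new_centers.append(centers[c])
                continue
            total = sum(weights[i] for i in members)
            new_centers.append(tuple(
                sum(weights[i] * points[i][d] for i in members) / total
                for d in range(dim)))
        if new_centers == centers:  # assignments and centers have stabilized
            break
        centers = new_centers
    return labels, centers
```

For example, with factors near 1 for two tight groups and a large factor for one distant point, the distant point still joins the nearer cluster, but it shifts that cluster's center much less than an unweighted mean would.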

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to a data classification method based on improved local abnormal factor detection.

Background art

[0002] At present, using cluster analysis to classify data has become an indispensable technique in data mining, with broad application prospects in commerce, insurance, biology, e-commerce, and other fields.

[0003] There are many kinds of clustering algorithms, including the K-means algorithm, which partitions data by distance, and FCM fuzzy clustering, which partitions data by degree of membership. Among them, the K-means algorithm has the advantages of a simple idea, easy implementation, and fast clustering, but its cluster centers are easily affected by outliers and abnormal points, which can cause the clustering to fall into a local optimum. Therefore, the application and optimization of this algorithm in data classification h...
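A short numeric illustration of the outlier sensitivity described in [0003]: a K-means center is the arithmetic mean of its members, so a single distant point can move it substantially. The values below are made up for illustration.

```python
from statistics import fmean

# A K-means cluster center is the arithmetic mean of the cluster's members.
cluster = [1.0, 2.0, 3.0, 4.0]
with_outlier = cluster + [50.0]  # one abnormal point joins the cluster

center_clean = fmean(cluster)          # 2.5
center_shifted = fmean(with_outlier)   # 12.0, pulled toward the outlier
print(center_clean, center_shifted)
```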

Claims


Application Information

Patent type & authority: Application (China)
IPC(8): G06F16/28; G06K9/62
CPC: G06F16/285; G06F18/22
Inventor: 游子毅 (You Ziyi)
Owner: GUIZHOU NORMAL UNIVERSITY