Classification method based on improved DBSCAN-SMOTE algorithm

A classification method for unbalanced samples. It addresses shortcomings of existing approaches, which do not take imbalance within a class into account and whose classification performance suffers as a result.

Inactive. Publication Date: 2016-09-07
SHENZHEN ETTOM TECH CO LTD
0 Cites, 33 Cited by

AI Technical Summary

Problems solved by technology

[0021] The Boosting algorithm also has certain disadvantages. In each iteration, the strategy used to assign sample weights can strongly affect the final classification result.
In addition, the voting weight coefficient of each base classifier is usually determined only from that classifier's own classification result. The base classifiers in an ensemble are often related to one another, however, so how to determine the optimal combination of these coefficients, rather than relying on individual classification accuracy alone, is also a problem that needs to be considered.
[0022] The most important drawback of existing methods is that most of them deal only with imbalance between classes and do not consider imbalance within a class, so the final result is often far from ideal.

Embodiment Construction

[0035] The present invention will be further described below in combination with specific embodiments.

[0036] Overlapping class boundaries are one of the reasons unbalanced samples are difficult to classify. The present invention therefore pays particular attention to the influence of boundary samples on the final classification result and takes measures to eliminate it. Two measures are used. First, boundary samples in the minority class are oversampled at a higher rate than non-boundary samples. Second, boundary samples in the majority class are clustered, and each cluster of majority-class boundary samples is replaced by its centroid, with the other original samples in the cluster removed. After processing from these two aspects, the boundary samples of the original data space are clearer, and the influence of boundary samples on the classification of unbalanced samples...
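
The second measure in [0036] (replacing clusters of majority-class boundary samples with their centroids) can be illustrated with a minimal sketch. Several parts are assumptions that do not come from the patent text: the boundary test (a sample counts as a boundary sample if any of its k nearest neighbours has a different label), the helper names, the toy data, and `group_by_eps`, which is a simplified distance-chaining stand-in for the improved DBSCAN clustering that this excerpt does not specify.

```python
import math

def knn(i, X, k):
    """Indices of the k nearest neighbours of X[i], excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

def is_boundary(i, X, y, k=3):
    """Assumed rule: boundary if any of the k nearest neighbours has another label."""
    return any(y[j] != y[i] for j in knn(i, X, k))

def group_by_eps(points, eps):
    """Chain together points lying within eps of each other.
    Simplified stand-in for density-based clustering (no noise handling)."""
    labels = [-1] * len(points)
    cluster = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            p = stack.pop()
            for q in range(len(points)):
                if labels[q] == -1 and math.dist(points[p], points[q]) <= eps:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(members) for col in zip(*members))

def condense_majority_boundary(X, y, majority, k=3, eps=1.0):
    """Replace each cluster of majority boundary samples by its centroid."""
    boundary = [i for i in range(len(X))
                if y[i] == majority and is_boundary(i, X, y, k)]
    boundary_set = set(boundary)
    X_new = [X[i] for i in range(len(X)) if i not in boundary_set]
    y_new = [y[i] for i in range(len(X)) if i not in boundary_set]
    pts = [X[i] for i in boundary]
    labels = group_by_eps(pts, eps)
    for c in sorted(set(labels)):
        X_new.append(centroid([p for p, l in zip(pts, labels) if l == c]))
        y_new.append(majority)
    return X_new, y_new

# Toy data (hypothetical): label 0 is the majority class, label 1 the minority.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 0), (3, 0.5), (3.6, 0.2), (3.8, 0.4)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_new, y_new = condense_majority_boundary(X, y, majority=0)
```

In this toy set the two majority samples next to the minority points are the only majority boundary samples; they fall into one cluster and are replaced by a single centroid, while the four interior majority samples and both minority samples are left untouched.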

Abstract

The invention relates to a classification method based on an improved DBSCAN-SMOTE algorithm that addresses intra-class imbalance in a data sample space. First, the samples in the data set that are boundary samples are identified and divided into majority-class boundary samples and minority-class boundary samples, and the boundary samples in the majority-class boundary sample space are clustered. A PSO algorithm is then used to optimize the oversampling rates of the boundary samples and the safe samples in each cluster, and the SMOTE algorithm oversamples the minority-class boundary samples at different sampling rates. The clustering is based on the improved DBSCAN algorithm, which can generate clusters of the minority class so that oversampling is performed within each sample cluster. This can resolve the problems of uneven distribution and of data fragments (small disjuncts) in intra-class imbalance.
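
The SMOTE step with different rates for boundary and safe samples can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the function `smote`, its parameters, the toy points, and the two rates (3 and 1) are hypothetical, and the rates are fixed by hand here, whereas the abstract describes tuning them with a PSO algorithm.

```python
import math
import random

def smote(seeds, pool, n_per_seed, k=2, rng=None):
    """Create n_per_seed synthetic points for every seed by interpolating
    towards one of its k nearest neighbours taken from pool."""
    rng = rng or random.Random(0)
    synthetic = []
    for s in seeds:
        neighbours = sorted((p for p in pool if p != s),
                            key=lambda p: math.dist(s, p))[:k]
        for _ in range(n_per_seed):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from s to nb
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(s, nb)))
    return synthetic

# Hypothetical minority samples, already split into boundary and safe groups.
minority_boundary = [(3.6, 0.2), (3.8, 0.4)]
minority_safe = [(5.0, 0.0), (5.2, 0.3), (5.1, 0.6)]
minority = minority_boundary + minority_safe

# Assumed rates: boundary samples are oversampled more heavily than safe ones.
rng = random.Random(42)
new_points = (smote(minority_boundary, minority, n_per_seed=3, rng=rng)
              + smote(minority_safe, minority, n_per_seed=1, rng=rng))
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours. Passing only the members of one minority cluster as `pool` would keep the interpolation inside that cluster, which corresponds to the abstract's description of oversampling within each sample cluster.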

Description

Technical field

[0001] The invention belongs to the field of classification technology optimization in data mining, and in particular relates to a classification method applicable to unbalanced samples.

Background

[0002] In unbalanced sample classification, unbalanced data means that, within the sample space of the entire data set, the number of samples of one class differs greatly from the number of samples of the other class or classes. In this situation the minority class often deserves more attention. In medical diagnosis, for example, the data sample space for cancer or heart disease is unbalanced, and the samples of interest are usually the diseased ones. By accurately classifying the attributes of these samples, we can accurately diagnose the patient's condition and give these patients timely and targeted treatment.

[0003] In order to pursue a higher global classification accuracy when deali...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06K9/62
CPC: G06F18/2321, G06F18/241
Inventor: 张春慨
Owner: SHENZHEN ETTOM TECH CO LTD