Unbalanced data classification method based on adaptive upsampling

A classification method using upsampling technology, applied in the field of pattern recognition. It addresses the unsatisfactory results of existing approaches on unbalanced data, enhancing learning and improving overall performance.

Inactive Publication Date: 2016-09-28
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

However, because the traditional Adaboost algorithm does not itself pay special attention to positive samples, the results are still not ideal.
[0006] From the above analysis, it can be se

Examples

Embodiment Construction

[0017] The present invention is inspired by the adaptive upsampling algorithm and by the Adaboost algorithm shown in Figure 1; the two are combined to form an ensemble classifier. The present invention is described in further detail below with reference to the accompanying drawings.
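To make the boosting half of this combination concrete, the following is a minimal sketch of generic discrete AdaBoost. This excerpt does not specify the weak learner, so the single-feature threshold stump used here is an assumption, as are the function names `train_stump`, `adaboost`, and `predict`, the label encoding (+1 for positive, -1 for negative), and the toy data. It is an illustration of the general technique, not the patent's implementation.

```python
import math


def stump_predict(stump, x):
    """Predict +1 or -1 with a single-feature threshold rule."""
    _, feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                candidate = (0.0, feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(candidate, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost(X, y, T=10):
    """Run T boosting rounds; return a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    model = []
    for _ in range(T):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


# Toy one-dimensional data (hypothetical): positives sit in the middle interval.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [-1, -1, 1, 1, -1, -1]
model = adaboost(X, y, T=3)
predictions = [predict(model, x) for x in X]
print(predictions)
```

No single stump can separate the middle interval, but the weighted vote of three stumps can, which is the sense in which boosting builds an ensemble from weak classifiers.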

[0018] (1) Obtaining test and training data: the present invention uses the vehicle type identification data set from the KEEL database, which contains 846 samples in total. The positive samples are the small-truck data, 199 in total, i.e. n_p = 199. The negative samples comprise the data of three vehicle types (buses, Opel cars, and Saab cars), 647 in total, i.e. n_n = 647. Each sample has 18 features, such as torque, steering radius, and maximum braking distance. The imbalance ratio is calculated according to formula (1):

[0019] IR = n_n / n_p    (1)

[0020] It can be obtained that the imbalance ratio in this experiment should b...
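Formula (1) can be checked directly against the sample counts given in paragraph [0018]. The short sketch below does only that arithmetic; the function name `imbalance_ratio` is a hypothetical helper, not something named in the patent.

```python
def imbalance_ratio(n_negative: int, n_positive: int) -> float:
    """Return IR = n_n / n_p, the ratio of negative to positive sample counts."""
    if n_positive <= 0:
        raise ValueError("n_positive must be greater than zero")
    return n_negative / n_positive


n_p = 199  # positive (small-truck) samples, from paragraph [0018]
n_n = 647  # negative (bus, Opel, Saab) samples, from paragraph [0018]
ir = imbalance_ratio(n_n, n_p)
print(f"IR = {ir:.3f}")  # prints "IR = 3.251"
```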

Abstract

The invention relates to an unbalanced data classification method based on adaptive upsampling. The method comprises the following steps: calculating the total number of positive samples to be newly generated; calculating the probability density distribution for each positive sample, using the Euclidean distance as the metric; determining the number of new samples to generate from each positive sample; generating the new positive samples and adding them to the original unbalanced training set so that the positive and negative samples are equal in number, yielding a new balanced training set of n_n positive samples and n_n negative samples; and training on the new balanced training set with the Adaboost algorithm, obtaining the final classification model after T iterations. The invention improves classification performance on unbalanced data sets.
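The upsampling steps listed in the abstract can be sketched as follows. The exact formulas are not given in this excerpt, so the sketch assumes the well-known ADASYN pattern: the density of a positive sample is taken as the share of negatives among its k nearest neighbours, and new samples are produced by linear interpolation towards a nearby positive. The function name `adaptive_upsample`, the parameter `k=5`, and the toy data are all hypothetical.

```python
import math
import random


def adaptive_upsample(positives, negatives, k=5, seed=0):
    """Return synthetic positives that bring the positive count up to len(negatives)."""
    rng = random.Random(seed)
    n_p, n_n = len(positives), len(negatives)
    total_new = n_n - n_p  # step 1: total number of positives to generate
    if total_new <= 0 or n_p < 2:
        return []

    # Step 2 (assumed ADASYN-style): share of negatives among the k nearest
    # neighbours of each positive, by Euclidean distance, then normalised.
    labelled = [(p, 1) for p in positives] + [(q, 0) for q in negatives]
    shares = []
    for i, p in enumerate(positives):
        nearest = sorted(
            (math.dist(p, x), label)
            for j, (x, label) in enumerate(labelled) if j != i
        )[:k]
        shares.append(sum(1 for _, label in nearest if label == 0) / len(nearest))
    total_share = sum(shares)
    if total_share > 0:
        density = [s / total_share for s in shares]
    else:  # no positive has a negative neighbour: fall back to uniform
        density = [1.0 / n_p] * n_p

    # Step 3: per-sample counts; largest-remainder rounding keeps the sum exact.
    raw = [d * total_new for d in density]
    counts = [int(r) for r in raw]
    by_remainder = sorted(range(n_p), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[: max(0, total_new - sum(counts))]:
        counts[i] += 1

    # Step 4: interpolate between each positive and a random nearby positive.
    synthetic = []
    for i, p in enumerate(positives):
        neighbours = sorted(
            (q for j, q in enumerate(positives) if j != i),
            key=lambda q: math.dist(p, q),
        )[:k]
        for _ in range(counts[i]):
            q = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(p, q)])
    return synthetic


# Toy data (hypothetical): 10 positives near (2, 2), 40 negatives near (0, 0).
data_rng = random.Random(1)
pos = [[data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)] for _ in range(10)]
neg = [[data_rng.gauss(0, 1.0), data_rng.gauss(0, 1.0)] for _ in range(40)]
new_pos = adaptive_upsample(pos, neg)
balanced_pos = pos + new_pos
print(len(balanced_pos), len(neg))
```

Positives that sit among many negatives receive a larger share of the new samples, so the synthetic data concentrates near the class boundary, where a classifier has the most difficulty.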

Description

Technical field

[0001] The invention relates to pattern recognition technology, in particular to a classifier for unbalanced data sets.

Background technique

[0002] With the rapid development of data mining, pattern recognition and machine learning technologies, data classification has been applied and played an important role in many fields such as image retrieval, medical detection and diagnosis, polygraph detection, text classification and crude oil spill detection. However, classical classification algorithms such as support vector machines, artificial neural networks, and linear discriminant analysis are designed with the assumption that the training data set contains approximately the same number of samples for each class. But in fact, in the above-mentioned fields, the number of abnormal samples (positive samples) is often much less than that of normal samples (negative samples). At this time, in order to obtain a higher overall accuracy, the classical classifier wi...


Application Information

IPC(8): G06K9/62
CPC: G06F18/2148, G06F18/24
Inventor: 吕卫, 李喆, 褚晶辉
Owner TIANJIN UNIV