Classification method for unbalanced data set

A classification method for imbalanced data sets, applicable to computing models and machine learning. It addresses class imbalance and the loss of useful information caused by under-sampling the majority class, and improves the ensemble effect.

Pending Publication Date: 2019-05-21
XIAMEN UNIV

AI Technical Summary

Problems solved by technology

[0017] The purpose of the present invention is to address class imbalance in the data source. Most existing methods down-sample the majority class samples, which discards a large amount of useful information. The invention provides a classification method for imbalanced data sets that, based on ensemble learning and logistic regression, makes full and reasonable use of the majority class samples and thereby further improves the classification effect.



Examples


Embodiment Construction

[0032] The following embodiments further illustrate the present invention with reference to the accompanying drawings. Without loss of generality, in describing the specific implementation, negative samples are taken as the majority class samples and positive samples as the minority class samples.

[0033] Figure 1 shows the construction process of the present invention. The process consists of two stages: data preparation and model training. The data preparation stage prepares the relevant data: a number of negative sample sets, hyperparameter combinations, and feature sets matching the number of weak learners used in the model. The model training stage trains the multiple weak learners and the logistic regression classifier in the model.
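The data preparation stage described above can be sketched as follows. This is a minimal illustration and not the patented procedure: the function name `prepare_learner_inputs`, the disjoint split of the negatives, the random choice of feature indices, and the shape of `param_grid` are all assumptions made for the example.

```python
import random


def prepare_learner_inputs(negatives, n_learners, n_features,
                           feature_fraction, param_grid, seed=0):
    """Build one (negative set, feature set, hyperparameters) plan per weak learner.

    Hypothetical helper: the names and the sampling scheme are assumptions,
    not taken from the patent text.
    """
    rng = random.Random(seed)

    # Shuffle a copy, then deal the negatives out so that every learner
    # receives a different, non-overlapping subset and no sample is discarded.
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    neg_subsets = [shuffled[i::n_learners] for i in range(n_learners)]

    # Each learner sees a random subset of the feature indices.
    k = max(1, int(n_features * feature_fraction))

    plans = []
    for i in range(n_learners):
        features = sorted(rng.sample(range(n_features), k))
        # Pick one value per hyperparameter at random for this learner.
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        plans.append({
            "negatives": neg_subsets[i],
            "features": features,
            "params": params,
        })
    return plans
```

Each plan would then be paired with the positive samples to train one weak learner. Dealing the negatives out in disjoint slices is only one way to give every learner different majority class samples.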

[0034] The data variables and parameters used by the present invention are shown in Tables 1 and 2, respectively:

[0035] Table ...



Abstract

The invention discloses a classification method for imbalanced data sets and relates to class imbalance. Most methods that address class imbalance in a data source down-sample the majority class samples, which discards a large amount of useful information. The invention provides a classification method for imbalanced data sets that, based on ensemble learning and logistic regression, makes full and reasonable use of the majority class samples so as to further improve the classification effect. The method comprises data preparation and model training. Each weak learner learns from a completely different set of majority class samples, so compared with traditional under-sampling the model makes full use of the information in the majority class. Each weak learner also uses a different feature set and different training parameters, which increases diversity and so improves the ensemble effect. The outputs of the weak learners are combined adaptively using logistic regression, which is more reasonable and more robust to interference than the traditional simple average of outputs.
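The adaptive combination of weak learner outputs can be illustrated with a small logistic regression fitted on the learners' scores. This is a sketch under stated assumptions and not the invention's implementation: `fit_combiner` and `combine` are hypothetical names, training uses plain batch gradient descent, and the learning rate and epoch count are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_combiner(scores, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights over weak-learner scores.

    scores: list of rows, one score per weak learner in each row.
    labels: list of 0/1 class labels, one per row.
    Hypothetical helper; optimiser and defaults are assumptions.
    """
    n = len(scores)
    n_learners = len(scores[0])
    w = [0.0] * n_learners
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_learners
        grad_b = 0.0
        for row, y in zip(scores, labels):
            # Prediction error for this row under the current weights.
            err = sigmoid(sum(wi * s for wi, s in zip(w, row)) + b) - y
            for j, s in enumerate(row):
                grad_w[j] += err * s
            grad_b += err
        # Average-gradient step on the log-loss.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def combine(w, b, row):
    """Combined probability of the positive class for one row of scores."""
    return sigmoid(sum(wi * s for wi, s in zip(w, row)) + b)


def simple_average(row):
    """Baseline: equal weight for every weak learner."""
    return sum(row) / len(row)
```

Unlike `simple_average`, the fitted weights can down-weight a weak learner whose scores carry little signal, which is the sense in which the combination is adaptive.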

Description

Technical field

[0001] The present invention relates to class imbalance, in particular to a classification method for imbalanced data sets.

Background

[0002] Class imbalance refers to the situation in which the numbers of training samples of the different classes in a classification task differ greatly. In reality, most classification tasks face imbalanced data sets. At present, there are mainly two types of mitigation methods for class imbalance in binary classification data sets. The first type directly "under-samples" the majority class samples in the training set, that is, removes some majority class samples so that the class sizes are close [1-4], and then learns. The second type "over-samples" the minority class samples in the training set [5-6], that is, increases the number of minority class samples so that the class sizes are close, and then learns. Oversampling methods are prone to overfitting, especially when the clas...
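The two mitigation strategies in the background can be contrasted with a short sketch. It uses plain random sampling, which is only the simplest variant of each strategy and is not necessarily what the cited methods do; the function names are hypothetical.

```python
import random


def undersample(majority, minority, seed=0):
    """Randomly keep only as many majority samples as there are minority samples."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))
    return kept, list(minority)


def oversample(majority, minority, seed=0):
    """Randomly duplicate minority samples until both classes have the same size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra
```

With 100 majority and 10 minority samples, `undersample` throws away 90 majority samples, while `oversample` adds 90 duplicated minority samples. The first case is the information loss the invention aims to avoid.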


Application Information

IPC(8): G06N20/00
Inventor: 张仲楠, 杨杰
Owner: XIAMEN UNIV