Two-stage resampling method for unbalanced data

A two-stage resampling technology, applied in neural learning methods, instruments, biological neural network models, etc. It addresses problems such as unbalanced data and the difficulty of learning the distribution characteristics of samples and synthesizing new ones. Its effects are improved sample quality, improved classification performance indicators, and richer, more diverse generated samples.

Pending Publication Date: 2022-02-22
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is as follows: traditional sampling methods often produce many overlapping samples when constructing a balanced data set, and CGAN, which is better suited to handling unbalanced data, often c

Method used



Examples


Example Embodiment

[0038] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0039] The technical solution adopted in the present invention is a two-stage resampling method for unbalanced data. The overall process of the invention is shown in Figure 1, and its specific steps and the related pseudocode are given in Algorithm 1.

[0040] Algorithm 1: Imbalanced data classification preprocessing method based on CGAN and SMOTEENN

[0041] Input: the normalized original unbalanced data set S;

[0042] Output: classification results after the sample set has been processed

[0043] Step 1: Divide the normalized unbalanced data set S into a training set Strain and a test set Stest; record the numbers of positive and negative samples in the training set as S0 and S1, respectively, and set the number of samples to be generated to add_num = S1 - S0.

[0044] Step 2: Use the SMOTEENN method to oversample the original unbal...
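Steps 1 and 2 of Algorithm 1 can be sketched in Python with NumPy as below. This is a minimal, illustrative sketch under stated assumptions, not the inventors' implementation: the stratified split with test_size=0.3, the helper names (split_and_count, knn_indices, smote, enn, smoteenn), the neighbour counts, and the majority-vote ENN rule are all hypothetical, and the class labelled 1 is assumed to be the positive (minority) class. Because Step 2 is truncated in the source, SMOTEENN appears here in its generic form (SMOTE oversampling followed by Edited Nearest Neighbours cleaning), which may differ from the patent's settings and from the defaults of imbalanced-learn's `SMOTEENN`.

```python
import numpy as np


def split_and_count(X, y, test_size=0.3, minority_label=1, seed=0):
    """Step 1 (sketch): stratified Strain/Stest split, then add_num = S1 - S0.

    test_size, seed and minority_label are hypothetical parameters.
    """
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    train_idx = np.array(train_idx, dtype=int)
    test_idx = np.array(test_idx, dtype=int)
    y_train = y[train_idx]
    s0 = int(np.sum(y_train == minority_label))  # positive (minority) count
    s1 = int(np.sum(y_train != minority_label))  # negative (majority) count
    return X[train_idx], y_train, X[test_idx], y[test_idx], s1 - s0


def knn_indices(X, k):
    """Indices of the k nearest other rows of X (Euclidean, brute force)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE: interpolate between a minority sample and one of its minority
    neighbours. Requires at least two minority samples."""
    rng = np.random.default_rng(0) if rng is None else rng
    k = min(k, len(X_min) - 1)
    nn = knn_indices(X_min, k)
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # uniform position along the segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def enn(X, y, k=3):
    """Edited Nearest Neighbours (majority-vote variant, an assumption):
    drop every sample whose label disagrees with most of its k neighbours."""
    nn = knn_indices(X, k)
    agree = (y[nn] == y[:, None]).sum(axis=1)
    keep = agree > k // 2
    return X[keep], y[keep]


def smoteenn(X, y, minority_label=1, k_smote=5, k_enn=3, seed=0):
    """Step 2 (sketch): SMOTE the minority class up to parity, then clean with ENN."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = max(int(np.sum(y != minority_label)) - len(X_min), 0)
    X_syn = smote(X_min, n_new, k_smote, rng)
    X_res = np.vstack([X, X_syn])
    y_res = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return enn(X_res, y_res, k_enn)


# Usage on a toy 90:10 data set.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(2.5, 1.0, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_train, y_train, X_test, y_test, add_num = split_and_count(X, y)
X_se, y_se = smoteenn(X_train, y_train)
print(add_num)  # 56
```

With 90 negative and 10 positive samples, a 30% stratified split leaves 63 negatives and 7 positives in the training set, so add_num = 63 - 7 = 56. For real use, imbalanced-learn's `imblearn.combine.SMOTEENN(...).fit_resample(X, y)` provides a tested implementation of the same SMOTE-plus-ENN combination.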



Abstract

The invention discloses a two-stage resampling method for unbalanced data. The specific steps are as follows: first, the SMOTEENN method is used to increase the number of minority-class samples so that the data set tends toward balance; then, the positive samples and labels produced by SMOTEENN are used as the input of the generative network, so that the CGAN can fully learn the distribution characteristics of the positive samples during training; finally, the generative network of the CGAN is used to augment the minority-class samples and synthesize a new balanced data set. The method was applied in classification experiments on several public standard unbalanced data sets. The experimental results show that, compared with other classical methods for processing unbalanced data sets, the samples generated by this method have a more reasonable distribution and the resulting classifiers perform better.
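The second stage can be illustrated by how a CGAN generator is conditioned on the class label and then used to draw add_num positive samples that balance the training set. This is a hedged NumPy sketch, not the patent's network: the architecture (one ReLU hidden layer and a sigmoid output, assuming min-max-normalized features in [0, 1]), the names ConditionalGenerator and balance_with_generator, and all sizes are hypothetical. The weights are random stand-ins because the adversarial training loop on the SMOTEENN output is omitted.

```python
import numpy as np


class ConditionalGenerator:
    """Toy CGAN generator G(z, c) with a hypothetical architecture.

    The noise vector z is concatenated with a one-hot class label c, which is
    what lets one network generate samples of a requested class. Weights are
    random placeholders; in the described method they would be learned by
    adversarial training on the SMOTEENN-resampled positive samples.
    """

    def __init__(self, noise_dim, n_classes, n_features, hidden=32, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_dim, self.n_classes = noise_dim, n_classes
        self.W1 = self.rng.normal(0.0, 0.1, (noise_dim + n_classes, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = self.rng.normal(0.0, 0.1, (hidden, n_features))
        self.b2 = np.zeros(n_features)

    def __call__(self, labels):
        z = self.rng.normal(size=(len(labels), self.noise_dim))
        c = np.eye(self.n_classes)[labels]  # one-hot condition vectors
        h = np.maximum(0.0, np.concatenate([z, c], axis=1) @ self.W1 + self.b1)
        # Sigmoid keeps outputs in (0, 1), assuming min-max-normalized features.
        return 1.0 / (1.0 + np.exp(-(h @ self.W2 + self.b2)))


def balance_with_generator(X, y, generator, minority_label=1):
    """Append add_num = (#majority - #minority) generated minority samples."""
    add_num = int(np.sum(y != minority_label)) - int(np.sum(y == minority_label))
    if add_num <= 0:
        return X, y
    labels = np.full(add_num, minority_label)
    return np.vstack([X, generator(labels)]), np.concatenate([y, labels])


# Usage on a toy 80:20 training set with 4 normalized features.
X = np.random.default_rng(1).random((100, 4))
y = np.array([0] * 80 + [1] * 20)
G = ConditionalGenerator(noise_dim=8, n_classes=2, n_features=4)
X_bal, y_bal = balance_with_generator(X, y, G)
print(np.bincount(y_bal))  # [80 80]
```

Only the class counts are meaningful in this toy run: with untrained weights, the generated rows carry no learned distribution. Concatenating the label to the noise vector is the standard CGAN conditioning mechanism; the discriminator would receive the same one-hot label alongside each real or generated sample.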

Description

Technical field

[0001] The invention belongs to the technical field of classification methods for unbalanced data sets in machine learning, and relates to a preprocessing method for unbalanced data classification based on conditional generative adversarial networks (CGAN) and SMOTEENN.

Background art

[0002] As a core technology in the field of artificial intelligence, machine learning is widely used in big data analysis. By building classification or regression models on massive, complex data, valuable information and patterns can be learned from them. Traditional classification methods usually rest on the basic assumption that every category in the data set has the same (or a similar) number of samples and that all misclassifications carry equal cost. However, data collected in practical application scenarios are often imbalanced. Under data imbalance, the traditional classification learning method with the overall ...
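A small worked example with hypothetical counts shows how overall accuracy can look good on imbalanced data while the minority class is never recognized. The function name and the 95:5 split are illustrative assumptions, not values taken from the patent.

```python
def accuracy_and_minority_recall(y_true, y_pred, minority_label=1):
    """Return (overall accuracy, recall on the minority class)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    minority_preds = [p for t, p in pairs if t == minority_label]
    recall = sum(p == minority_label for p in minority_preds) / len(minority_preds)
    return accuracy, recall


# Hypothetical 95:5 test set and a classifier that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(accuracy_and_minority_recall(y_true, y_pred))  # (0.95, 0.0)
```

The classifier is 95% accurate yet finds none of the minority samples, which is why imbalance-aware preprocessing and metrics such as minority-class recall matter.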

Claims


Application Information

IPC(8): G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/08; G06N3/045; G06F18/2411; G06F18/2414; G06F18/24147; G06F18/214
Inventors: 朱波 (Zhu Bo), 刘宁 (Liu Ning), 徐淼 (Xu Miao), 陈春梅 (Chen Chunmei), 李岫宸 (Li Xiuchen)
Owner: KUNMING UNIV OF SCI & TECH