Two-stage resampling method for unbalanced data

A two-stage resampling technique, applied to neural network learning methods and models, that addresses unbalanced data whose distribution characteristics are hard to learn and for which high-quality samples are difficult to synthesize. It aims to improve the quality and diversity of the generated samples and to improve classification performance indicators.

Pending Publication Date: 2022-02-22

KUNMING UNIV OF SCI & TECH

Problems solved by technology

[0008] The purpose of the present invention is to address the following problems. Traditional sampling methods often produce many overlapping samples when constructing a balanced data set. CGAN, which is otherwise better suited to dealing with unbalanced data, often cannot fully learn the distribution characteristics of the positive class because the number of positive samples is limited, so it is difficult for it to generate high-quality synthetic samples. To this end, a two-stage resampling method for imbalanced data is proposed.
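For orientation, the conditional GAN objective as it is commonly stated in the literature is sketched below. It is not reproduced from this patent, and the symbols G (generator), D (discriminator), z (noise vector), p_z (noise prior) and y (class label) are notation assumed here for illustration only.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x \mid y) \right]
  + \mathbb{E}_{z \sim p_z(z)} \left[ \log \left( 1 - D(G(z \mid y)) \right) \right]
```

Under this formulation, the first expectation for the positive label is estimated only from the available positive samples, which is one way to read the limitation described above: with few positive samples, the discriminator sees little of the positive-class distribution.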

Embodiment Construction

[0038] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0039] The technical solution adopted in the present invention is a two-stage resampling method for unbalanced data. The overall process of the invention is shown in Figure 1, and the specific steps and related pseudocode are given in Algorithm 1.

[0040] Algorithm 1: Imbalanced data classification preprocessing method based on CGAN and SMOTEENN

[0041] Input: the original unbalanced dataset S after normalization;

[0042] Output: classification results after sample set processing

[0043] Step 1: Divide the normalized unbalanced data set S into a training set S_train and a test set S_test. Record the numbers of positive and negative samples in the training set as S0 and S1, respectively, and let add_num = S1 - S0 be the number of samples to generate.

[0044] Step 2: Use the SMOTEENN method to oversample the original unbal...
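Step 1 of Algorithm 1 can be sketched roughly as follows. This is a minimal illustration rather than the patent's own code: the function name `split_and_count`, the `test_ratio` and `seed` parameters, the plain random (non-stratified) split, and the convention that label 1 marks the positive (minority) class are all assumptions made for this example. The remainder of Step 2 is truncated in the source and is not covered here.

```python
import random


def split_and_count(samples, labels, test_ratio=0.3, seed=0):
    """Split a normalized data set into train/test parts and compute add_num.

    Assumed convention (not stated in the source): label 1 is the positive
    (minority) class and label 0 is the negative (majority) class.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)

    # Hypothetical split rule: the first test_ratio share goes to the test set.
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    s_train = [(samples[i], labels[i]) for i in train_idx]
    s_test = [(samples[i], labels[i]) for i in test_idx]

    # S0 = number of positive samples, S1 = number of negative samples,
    # both counted on the training set only, as in Step 1.
    s0 = sum(1 for _, y in s_train if y == 1)
    s1 = sum(1 for _, y in s_train if y == 0)

    # add_num = S1 - S0: how many positive samples must be generated
    # for the training set to become balanced.
    add_num = s1 - s0
    return s_train, s_test, s0, s1, add_num


if __name__ == "__main__":
    # Toy data: 10 positive and 90 negative one-dimensional samples.
    toy_samples = [(i / 100.0,) for i in range(100)]
    toy_labels = [1 if i < 10 else 0 for i in range(100)]
    _, _, s0, s1, add_num = split_and_count(toy_samples, toy_labels)
    print("positives:", s0, "negatives:", s1, "add_num:", add_num)
```

A stratified split would keep the class ratio identical in both parts. The plain shuffle is used here only to keep the sketch free of external dependencies.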

Abstract

The invention discloses a two-stage resampling method for unbalanced data. The method comprises the following steps: first, the SMOTEENN method is used to increase the number of minority class samples so that the data set tends toward balance; next, the positive samples produced by SMOTEENN, together with their labels, are introduced as the input of a generative network, so that the CGAN can fully learn the distribution characteristics of the positive samples during training; finally, the generative network in the CGAN is used to augment the minority class samples and synthesize a new balanced data set. The method has been applied in classification experiments on several public standard unbalanced data sets, and the experimental results show that, compared with other classical methods for processing unbalanced data sets, the samples it generates are more reasonably distributed and lead to better classifier performance.
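The first stage described above can be illustrated with a simplified, dependency-free sketch of the two ideas behind SMOTEENN: SMOTE-style interpolation between minority neighbours, followed by edited-nearest-neighbours (ENN) cleaning. This is not the patent's implementation and not a library API. The function names, the default `k=3`, the majority-vote ENN rule, and the convention that label 1 is the minority class are assumptions for this example, and library implementations may use different defaults. The second stage (CGAN training and generation) is not shown.

```python
import math
import random


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    dists = [(math.dist(points[idx], p), j)
             for j, p in enumerate(points) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours.
    Assumes the minority class has at least two points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()  # position along the segment from point i to point j
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


def enn(points, labels, k=3):
    """Keep a sample only if more than half of its k nearest neighbours
    share its label (majority-vote variant, assumed here)."""
    keep = []
    for i in range(len(points)):
        neigh = k_nearest(i, points, k)
        agree = sum(1 for j in neigh if labels[j] == labels[i])
        if agree * 2 > len(neigh):
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


def smote_enn(points, labels, k=3, seed=0):
    """Oversample label 1 up to the size of label 0, then clean with ENN."""
    minority = [p for p, y in zip(points, labels) if y == 1]
    n_new = labels.count(0) - len(minority)
    synthetic = smote(minority, n_new, k, seed)
    all_points = list(points) + synthetic
    all_labels = list(labels) + [1] * len(synthetic)
    return enn(all_points, all_labels, k)


if __name__ == "__main__":
    # Toy data: 20 majority points near the origin, 4 minority points near (5, 5).
    majority = [(i * 0.1, j * 0.1) for i in range(5) for j in range(4)]
    minority = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    pts, lbs = smote_enn(majority + minority, [0] * 20 + [1] * 4)
    print("majority:", lbs.count(0), "minority:", lbs.count(1))
```

In the method described in the abstract, the positive samples that survive this stage, together with their labels, would then be used to train the CGAN.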

Description

Technical field

[0001] The invention belongs to the technical field of classification methods for unbalanced data sets in machine learning, and relates to a preprocessing method for unbalanced data classification based on conditional generative adversarial networks (CGAN) and SMOTEENN.

Background

[0002] As a core technology in the field of artificial intelligence, machine learning is widely used in big data analysis. By building classification or regression models on massive and complex data, valuable information and patterns can be learned from it. Traditional classification methods usually rest on the basic assumptions that every category in the data set has the same or a similar number of samples and that all misclassifications carry equal cost. However, the data that can be collected in practical application scenarios is often imbalanced. Under the condition of data imbalance, the traditional classification learning method with the overall ...

Application Information

IPC(8): G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/045, G06F18/2411, G06F18/2414, G06F18/24147, G06F18/214

Inventor: 朱波, 刘宁, 徐淼, 陈春梅, 李岫宸

Owner: KUNMING UNIV OF SCI & TECH
