Class-imbalance problem classification method based on expansion training data set

A training-data-set and problem-classification technology, applied in character and pattern recognition, instruments, and computing components. It addresses the limited accuracy gains of existing low-time-complexity oversampling methods and aims to improve classification accuracy.

Inactive Publication Date: 2018-08-31

SOUTH CHINA UNIV OF TECH

Cites: 0; Cited by: 41


AI Technical Summary

Problems solved by technology

Existing algorithms generally have low time complexity, but the improvement they deliver in experiments is relatively limited. Random oversampling re-samples some of the existing minority-class samples; although this increases the number of minority samples, it also raises the risk of overfitting to some extent. SMOTE-based oversampling expands the minority class by generating new samples according to fixed rules (interpolating between neighboring minority samples). Such rules cannot simulate the distribution of the original data well and do not suit every data set, so the gain in accuracy is limited.
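To make the contrast concrete, here is a minimal NumPy sketch of the two baseline oversamplers criticised above. The function names `random_oversample` and `smote` are illustrative and come neither from the patent nor from any library; the SMOTE shown is the textbook variant (interpolation toward one of the k nearest minority neighbours). The sketch shows why both are limited: duplicated rows add no new information, and every SMOTE point falls on a segment between two existing minority samples.

```python
import numpy as np

def random_oversample(X_min, n_new, rng):
    """Random oversampling: duplicate randomly chosen minority rows (with replacement)."""
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx]

def smote(X_min, n_new, k, rng):
    """Textbook SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours (requires k < len(X_min))."""
    # Pairwise squared Euclidean distances among minority samples.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

rng = np.random.default_rng(0)
X_min = rng.normal(size=(10, 2))             # toy minority class
dup = random_oversample(X_min, 20, rng)
syn = smote(X_min, 20, k=3, rng=rng)
print(dup.shape, syn.shape)                  # (20, 2) (20, 2)
```

Because each synthetic point is a convex combination of two real minority samples, SMOTE can never place data outside the region already spanned by the minority class, which is one way to read the claim that it cannot simulate the original distribution well.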



Embodiment

[0033] Class imbalance is a common problem when data sets are collected. It manifests as follows: the number of samples of one class in the data set differs greatly from the number of samples of the other classes. For example, in a credit card fraud data set, the behavior of most users is normal, and only a very small number of users are judged to be fraudulent. If neither the data set nor the algorithm is adapted accordingly and a classifier is trained directly, the minority-class samples will not receive sufficient attention; in severe cases the classifier even ignores them as noise, so the classification results are seriously biased.
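The bias described in [0033] can be illustrated with a toy example. The 990/10 split below is invented for illustration and is not data from the patent: a classifier that treats every fraudulent user as noise still scores 99% accuracy while detecting none of them.

```python
import numpy as np

# Hypothetical fraud-like labels: 990 normal (0) and 10 fraudulent (1) users.
y_true = np.array([0] * 990 + [1] * 10)

# A degenerate classifier that ignores the minority class entirely
# and predicts "normal" for every sample.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
minority_recall = (y_pred[y_true == 1] == 1).mean()
print(f"accuracy={accuracy:.3f}, minority recall={minority_recall:.3f}")
# prints accuracy=0.990, minority recall=0.000
```

This is why overall accuracy alone is a poor measure on imbalanced data, and why per-class metrics such as minority recall are usually reported alongside it.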

[0034] In this context, how to obtain the desired results from class-imbalanced data has become a problem that requires in-depth study. At present, there are two main types of optimization methods for the imbalance problem: (1) change the original d...


Abstract

The invention discloses a class-imbalance problem classification method based on an expanded training data set. The method comprises the following steps: obtaining the real data set required by a classification task; selecting the minority-class samples from the real data set and distinguishing the samples that are close to the decision boundary from those that are far from it; feeding these samples into a generative adversarial network and running it to obtain artificial samples similar to the real data; adding a certain number of artificial samples to the real data set to obtain a mixed data set; and feeding the mixed data set to a classifier for classification. The method combines a CycleGAN model with the boundary information of the original data set, and thus effectively simulates the distribution features of the real data. By expanding the minority-class data, the method improves classifier precision and effectively prevents the class-imbalance problem from affecting the classification task.
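The data-preparation steps in the abstract (split minority samples by distance to the decision boundary, generate artificial samples, mix them into the real set) can be sketched as below. This is a rough sketch under stated assumptions, not the patented implementation. The boundary criterion (a minority sample is "near" when at least half of its k nearest neighbours belong to another class) is borrowed from Borderline-SMOTE because the abstract does not define one. The CycleGAN is replaced by a hypothetical `jitter_generator` that merely adds Gaussian noise; it learns nothing. All function names are hypothetical.

```python
import numpy as np

def split_by_boundary(X, y, minority_label, k=5):
    """Split minority samples into near-boundary and far-from-boundary sets.
    ASSUMPTION: Borderline-SMOTE-style rule, "near" if >= k/2 of the k nearest
    neighbours (searched over the whole data set) are from another class."""
    min_idx = np.flatnonzero(y == minority_label)
    X_min = X[min_idx]
    d = ((X_min[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    d[np.arange(len(min_idx)), min_idx] = np.inf   # exclude each sample itself
    nn = np.argsort(d, axis=1)[:, :k]
    foreign = (y[nn] != minority_label).sum(axis=1)
    near = foreign >= k / 2
    return X_min[near], X_min[~near]

def jitter_generator(X_near, X_far, n_new, rng, scale=0.1):
    """HYPOTHETICAL stand-in for a trained CycleGAN generator: adds Gaussian
    noise to randomly chosen minority samples; it does not model the data."""
    pool = np.vstack([X_near, X_far])
    idx = rng.integers(0, len(pool), size=n_new)
    return pool[idx] + rng.normal(scale=scale, size=(n_new, pool.shape[1]))

def build_mixed_dataset(X, y, minority_label, generator, n_new, rng):
    """Add n_new artificial minority samples to the real data set."""
    X_near, X_far = split_by_boundary(X, y, minority_label)
    X_art = generator(X_near, X_far, n_new, rng)
    X_mix = np.vstack([X, X_art])
    y_mix = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_mix, y_mix

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
X_mix, y_mix = build_mixed_dataset(X, y, 1, jitter_generator, 100, rng)
print(X_mix.shape, np.bincount(y_mix))   # (320, 2) [200 120]
```

Passing the generator as a parameter keeps the boundary split and the mixing step independent of how artificial samples are produced, so a real generative model could be swapped in; the mixed set would then go to the classifier, as in the last step of the abstract.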

Description

Technical field

[0001] The invention relates to the technical field of classification optimization in data mining, and in particular to a classification method for class-imbalance problems based on an expanded training data set.

Background art

[0002] With the continuous deepening of network informatization, the total amount of data on the Internet keeps growing. How to fully explore and use the useful information contained in this data has become a hot topic in computer science in recent years. Various machine learning methods have achieved good results on massive data sets, but many obstacles that are hard to overcome remain. Imbalance of sample classes is a common problem when data sets are collected: the number of samples of one class differs greatly from the number of samples of the other classes. For example, in a credit card fraud data set, the behavior of most users is normal, ...


Application Information


IPC(8): G06K9/62

CPC: G06F18/2411, G06F18/214

Inventors: 俞彬 (Yu Bin), 王家兵 (Wang Jiabing)

Owner: SOUTH CHINA UNIV OF TECH
