Class-imbalance problem classification method based on expansion training data set

A classification technique for class-imbalance problems based on an expanded training data set, applied in character and pattern recognition and related fields. It addresses the limited accuracy gains of existing low-time-complexity methods and improves classification accuracy.

Inactive Publication Date: 2018-08-31
SOUTH CHINA UNIV OF TECH
0 Cites, 41 Cited by

AI Technical Summary

Problems solved by technology

These algorithms generally have low time complexity, but the improvement they bring in experiments is relatively limited. Random oversampling re-samples some existing samples; although this increases the number of minority samples, it also increases the risk of over-fitting to some extent. SMOTE-based oversampling typically works within the minority-class samples and expands the data according to fixed rules. It cannot simulate the distribution of the original data well and is not applicable to all data sets, so the improvement in accuracy is limited.
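
The two baselines named above can be sketched in a few lines of NumPy. This is only an illustration, not code from the patent: the function names, the toy data and the choice of `k` are assumptions, and `smote_style` follows the usual SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import numpy as np

def random_oversample(X_min, n_new, rng):
    """Random oversampling: draw existing minority rows with replacement."""
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx]

def smote_style(X_min, n_new, k, rng):
    """SMOTE-style sampling: interpolate between a minority sample and one
    of its k nearest minority neighbours (requires k < len(X_min))."""
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

# Toy minority class (hypothetical data): 20 points in 2-D.
rng = np.random.default_rng(0)
X_min = rng.normal(size=(20, 2))
copies = random_oversample(X_min, 50, rng)
synthetic = smote_style(X_min, 50, k=5, rng=rng)
print(copies.shape, synthetic.shape)
```

Random oversampling only produces exact copies of existing rows, which is where the over-fitting risk comes from. The SMOTE-style points always lie on line segments between existing minority samples, so they cannot leave the region those samples already span.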

Embodiment

[0033] Class imbalance is a common problem when collecting data sets. It manifests as follows: the number of samples of one class in the data set differs greatly from the number of samples of the other classes. For example, in a credit card fraud data set, the behavior of most users is normal, and only a very small number of users are judged fraudulent. If classification training is carried out directly, without adjusting the data set or the algorithm accordingly, the minority-class samples do not receive sufficient attention; in severe cases the classifier even ignores them as noise, which seriously biases the classification results.
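
A small worked example shows why a classifier that ignores the minority class can still look good on paper. The class counts below are hypothetical and chosen only for illustration; they do not come from the patent.

```python
import numpy as np

# Hypothetical fraud data set: 9,900 normal users (label 0), 100 fraudulent (label 1).
n_normal, n_fraud = 9900, 100
y_true = np.concatenate([np.zeros(n_normal, dtype=int), np.ones(n_fraud, dtype=int)])

# A degenerate classifier that treats every minority sample as noise
# and always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
minority_recall = (y_pred[y_true == 1] == 1).mean()

print(f"accuracy:        {accuracy:.2%}")         # 99.00%
print(f"minority recall: {minority_recall:.2%}")  # 0.00%
```

Overall accuracy is 99% even though not a single fraudulent user is detected, which is why per-class metrics matter on imbalanced data.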

[0034] In this context, how to obtain the desired results from class-imbalanced data has become a problem that requires in-depth exploration. At present, there are two main types of optimization methods for the imbalance problem: (1) change the original d...

Abstract

The invention discloses a classification method for class-imbalance problems based on an expanded training data set. The method comprises the following steps: obtaining the real data set needed by a classification task; screening minority-class samples from the real data set and distinguishing samples that are close to the decision boundary from those far away from it; inputting these samples into a generative adversarial network and running it to obtain artificial samples similar to the real data; adding a certain number of artificial samples to the real data set to obtain a mixed data set; and inputting the mixed data set into a classifier for classification. The method combines a CycleGAN model with the boundary information of the original data set, thereby effectively simulating the distribution features of the real data. By oversampling the minority-class data, the method improves classifier precision and effectively prevents the class-imbalance problem from affecting the classification task.
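
The flow of the steps in the abstract can be sketched as follows. This is a rough outline under stated assumptions, not the patented implementation: the abstract does not say how boundary proximity is measured, so the k-nearest-neighbour rule here is an assumption; `stand_in_generator` is a simple interpolation placeholder and is not a CycleGAN; the toy data, the number of artificial samples and the nearest-centroid classifier are likewise illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: a toy "real" data set (hypothetical): 500 majority, 30 minority points.
X_maj = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))
X_min = rng.normal(loc=[2.5, 2.5], scale=0.7, size=(30, 2))

def split_by_boundary(X_min, X_maj, k=5):
    """Step 2 (assumed criterion): a minority sample counts as 'near' the
    boundary when at least half of its k nearest neighbours are majority."""
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.arange(len(X_all)) >= len(X_min)
    d = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    d[np.arange(len(X_min)), np.arange(len(X_min))] = np.inf  # skip self
    nn = np.argsort(d, axis=1)[:, :k]
    near = is_maj[nn].mean(axis=1) >= 0.5
    return X_min[near], X_min[~near]

def stand_in_generator(X_near, X_far, n_new, rng):
    """Step 3 placeholder, NOT a CycleGAN: interpolates between a sample
    from one group and a sample from the other."""
    a = X_near if len(X_near) else X_far
    b = X_far if len(X_far) else X_near
    pa = a[rng.integers(0, len(a), size=n_new)]
    pb = b[rng.integers(0, len(b), size=n_new)]
    t = rng.random((n_new, 1))
    return pa + t * (pb - pa)

X_near, X_far = split_by_boundary(X_min, X_maj)

# Step 4: add artificial samples; balancing the classes is an assumption,
# the abstract only says "a certain number".
X_fake = stand_in_generator(X_near, X_far, len(X_maj) - len(X_min), rng)
X_mix = np.vstack([X_maj, X_min, X_fake])
y_mix = np.concatenate([np.zeros(len(X_maj), dtype=int),
                        np.ones(len(X_min) + len(X_fake), dtype=int)])

# Step 5: any classifier can be trained on the mixed set; a nearest-centroid
# rule stands in here.
centroids = np.stack([X_mix[y_mix == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

print(len(X_near), len(X_far), X_mix.shape)
```

Replacing `stand_in_generator` with a trained generative model is the part the abstract attributes to CycleGAN; the screening, mixing and classification steps around it keep the same shape.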

Description

Technical field

[0001] The invention relates to the technical field of classification optimization in data mining, and in particular to a classification method for class-imbalance problems based on an expanded training data set.

Background

[0002] As network informatization continues to deepen, the total amount of data on the Internet keeps increasing. How to fully explore and use the useful information contained in the data has become a hot topic in computer science in recent years. On massive data sets, various machine learning methods have achieved good results, but many obstacles remain. Imbalance between sample classes is a common problem when collecting data sets. It manifests as follows: the number of samples of one class in the data set differs greatly from the number of samples of the other classes. For example, in a credit card fraud data set, the behavior of most users is normal, ...

Application Information

IPC(8): G06K9/62
CPC: G06F18/2411, G06F18/214
Inventor: 俞彬, 王家兵
Owner: SOUTH CHINA UNIV OF TECH