Classification of class-imbalanced data

A technique for classifying class-imbalanced data, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses problems such as the dependence of the classification effect on the resampling algorithm, the small proportion of illegal transactions in monitored data, and the difficulty of determining the optimal distribution of a data set.

Inactive Publication Date: 2015-09-23
CHINA UNIONPAY
3 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

For example, in credit card transaction fraud detection, the fraudulent customers are the cases of interest, yet most of the monitored data consists of normal credit card transaction records, and the proportion of illegal transactions is very small.
Many existing techniques deal with classification problems, such as decision trees, Bayesian networks, and support vector machines, but most of them are designed for balanced data and do not consider the large difference between the distributions of positive and negative data, so they perform poorly on unbalanced data.
[0004] At present, the classification of unbalanced data mainly follows two approaches. The first changes the distribution of the training set samples to reduce the degree of imbalance, mainly through resampling methods. Its disadvantage is that the classification effect depends on the resampling algorithm, and for many applications it is difficult to determine the optimal distribution of the data set. The second constructs new algorithms or adapts existing algorithms to the characteristics of unbalanced data (such as cost-sensitive learning methods, feature selection methods, and single-class learning methods). The disadvantage of cost-sensitive learning is that it is difficult to give an accurate estimate of the cost of misclassification, so an overall performance improvement cannot be guaranteed. Feature selection methods are better suited to text classification problems, so their scope of application is relatively limited. The disadvantage of single-class learning is that only a small number of positive data are used, and the useful information contained in the negative data is completely ignored.
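As a rough illustration of the first approach, the sketch below applies random undersampling to toy data. It is not taken from the patent: the function name `random_undersample`, the `ratio` parameter, and the 990/10 class split are all assumptions chosen for illustration. It shows how the resulting class distribution is set entirely by a resampling parameter that the user has to pick.

```python
import random
from collections import Counter


def random_undersample(samples, labels, ratio=1.0, seed=0):
    """Keep every positive (minority, label 1) sample and a random subset of
    negative (majority, label 0) samples, so that the number of negatives is
    at most ratio * number of positives. Hypothetical helper for illustration."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), int(ratio * len(pos))))
    keep = sorted(pos + keep_neg)
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data (assumed): 990 normal transactions and 10 fraudulent ones.
labels = [0] * 990 + [1] * 10
samples = [[float(i)] for i in range(1000)]

_, balanced = random_undersample(samples, labels, ratio=1.0)
_, skewed = random_undersample(samples, labels, ratio=5.0)
print(Counter(balanced))  # class counts after 1:1 undersampling
print(Counter(skewed))    # class counts after 5:1 undersampling
```

Both calls discard most of the negative samples, and nothing in the data itself indicates whether a 1:1 or a 5:1 distribution is the better target, which is the difficulty the paragraph above describes.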



Embodiment Construction

[0036] The present invention will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. This invention may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. These embodiments are given so that this disclosure is comprehensive and complete and conveys the protection scope of the present invention fully and accurately.

[0037] Words such as "comprising" and "including" mean that, in addition to the units and steps that are directly and explicitly stated in the specification and claims, the technical solution of the present invention does not exclude other units and steps that are not directly or explicitly stated.

[0038] According to one aspect of the invention, the classification of unbalanced class data is based on the splitting of ob...


Abstract

The present invention relates to data mining technology, and in particular to a method for training a class-imbalanced data classifier, a class-imbalanced data classifier, and a method for classifying class-imbalanced data. According to one embodiment of the training method, the data to be classified by the class-imbalanced data classifier has a plurality of properties. The method comprises the following steps: the properties are divided into a plurality of property groups, each property group corresponds to one sub-classifier, and each sub-classifier is suited to classifying the data based on its corresponding property group, so that a final classification result is obtained from the classification results of the sub-classifiers according to preset rules; the training data samples are divided into multiple test sets; and, for each property group, the corresponding sub-classifier is trained using a different test set.
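The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not specify: the sub-classifier is a simple nearest-centroid model, the training samples are split by giving each subset all positive samples plus a distinct slice of the negative samples, and the preset combination rule is a majority vote. All names (`CentroidSubClassifier`, `split_training_sets`, `train_ensemble`, `classify`) are hypothetical.

```python
from collections import Counter


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


class CentroidSubClassifier:
    """Nearest-centroid classifier that only looks at one property group.
    Assumed stand-in for whatever sub-classifier the patent actually uses."""

    def __init__(self, attrs):
        self.attrs = attrs      # column indices of this property group
        self.centroids = {}     # label -> centroid in the projected space

    def _project(self, row):
        return [row[a] for a in self.attrs]

    def fit(self, X, y):
        for label in set(y):
            rows = [self._project(x) for x, t in zip(X, y) if t == label]
            self.centroids[label] = centroid(rows)
        return self

    def predict(self, row):
        p = self._project(row)

        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(p, c))

        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def split_training_sets(X, y, k):
    """Assumed split: every subset gets all positives (label 1) and a
    distinct round-robin slice of the negatives (label 0)."""
    pos = [(x, t) for x, t in zip(X, y) if t == 1]
    neg = [(x, t) for x, t in zip(X, y) if t == 0]
    subsets = []
    for i in range(k):
        part = pos + neg[i::k]
        subsets.append(([x for x, _ in part], [t for _, t in part]))
    return subsets


def train_ensemble(X, y, property_groups):
    """Train one sub-classifier per property group, each on a different subset."""
    subsets = split_training_sets(X, y, len(property_groups))
    return [CentroidSubClassifier(g).fit(Xs, ys)
            for g, (Xs, ys) in zip(property_groups, subsets)]


def classify(ensemble, row):
    """Assumed preset rule: majority vote over the sub-classifier outputs."""
    votes = Counter(clf.predict(row) for clf in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (assumed): 30 negative samples with small values, 3 positive
# samples with large values, and four properties split into three groups.
X = [[i % 3, (i * 7) % 3, (i * 5) % 3, (i * 11) % 3] for i in range(30)]
X += [[10, 10, 10, 10], [9, 11, 10, 9], [11, 9, 9, 10]]
y = [0] * 30 + [1] * 3
ensemble = train_ensemble(X, y, property_groups=[[0, 1], [2], [3]])
print(classify(ensemble, [10, 9, 11, 10]), classify(ensemble, [1, 0, 2, 1]))
```

An odd number of property groups is used here so that a two-class majority vote cannot tie; a different preset rule, such as weighted voting, would slot into `classify` without changing the training code.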

Description

Field of invention

[0001] The invention relates to data mining technology, in particular to a training method for an unbalanced data classifier, an unbalanced data classifier, and a method for classifying unbalanced data.

Background technique

[0002] Classification is one of the most commonly used techniques in data mining and machine learning: a classifier is trained on a set of objects of known classes, and objects of unknown classes are then applied to the classifier to determine their classes. In unbalanced class data, the number of samples of a certain class is much larger than that of the other classes; the former is called negative class data, and the latter is called positive class data.

[0003] In practical applications (such as credit card transaction fraud detection, network intrusion detection, medical disease diagnosis, etc.), the classification problem of unbalanced data is often encountered. The common point of these problems is that minor...
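The definitions in [0002] can be made concrete with a small sketch. It uses assumed toy labels (990 negative, 10 positive) and a trivial classifier that always predicts the negative class. The point is only to show why overall accuracy says little on unbalanced data; the numbers do not come from the patent.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positive samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return tp / actual if actual else 0.0


# Assumed toy labels: negative class (0) is the majority, positive class (1) the minority.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # a classifier that always answers "negative"

imbalance_ratio = y_true.count(0) / y_true.count(1)
print(imbalance_ratio)             # 99.0
print(accuracy(y_true, y_pred))    # 0.99
print(recall(y_true, y_pred))      # 0.0
```

The always-negative classifier scores 99% accuracy while detecting none of the positive samples, which is why evaluation on unbalanced data usually looks at the positive class separately.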


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 杨鸿超, 赵金涛, 邱雪涛, 王骏
Owner: CHINA UNIONPAY