A sample balancing method in data mining

A sample balancing technique for data mining, classified under electrical digital data processing, special data processing applications, and digital data information retrieval. It addresses problems such as noise amplification and wrong prediction results, with the effects of enhancing generalization ability, avoiding noise amplification, and avoiding information loss.

Inactive Publication Date: 2019-02-22
SUNING CONSUMER FINANCE CO LTD

AI Technical Summary

Problems solved by technology

[0004] 2. Sample weighting: During training of the classification algorithm, samples of rare classes are given higher weights according to the ratio of positive to negative samples. This method is similar to over-sampling, and it is likewise prone to noise amplification and over-fitting.
This technique has a dangerous flaw: if the process that generates the random samples is not as random as it is supposed to be, and instead follows some subtly non-random pattern, then the entire simulation (and its predictions) can be wrong.
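The sample weighting described in [0004] can be illustrated with a minimal sketch. The function name `inverse_frequency_weights` and the inverse-frequency formula are assumptions chosen for illustration; the source only says that rare classes receive higher weights according to the class ratio.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to class frequency.

    Assumption: weight = n_samples / (n_classes * class_count), a common
    "balanced" heuristic. The source does not specify a formula.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical data: 990 normal transactions and 10 fraudulent ones.
labels = ["normal"] * 990 + ["fraud"] * 10
weights = inverse_frequency_weights(labels)
print(weights)
```

With these counts, each fraud sample carries roughly 99 times the weight of a normal sample, so a single mislabeled fraud sample has an outsized influence on training. This is the noise amplification the paragraph warns about.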



Examples


Embodiment Construction

[0028] The present invention is described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention, which is not limited to them.

[0029] As shown in Figure 1, the embodiment of the present invention takes the detection of normal and fraudulent samples in the consumer finance industry as an example to illustrate the specific operation of sample balancing.

[0030] A sample balancing method in data mining, comprising the following steps:

[0031] Step 1: Prepare fraud samples and normal samples according to the project task, and divide the samples into a training set and a test set. Perform Steps 2 through 5 on the training-set samples.

[0032] Step 2: Count the number of normal samples (positive samples) pos_num and the number of fraudulent samples (negative samples) neg_num, and calculate the ratio of norm...
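Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions: the helper `stratified_split`, the 30% test fraction, and the per-class (stratified) split are hypothetical choices, since the source only says the samples are divided into training and test sets. The sketch stops at the counts `pos_num` and `neg_num` because the rest of Step 2 is truncated in the source.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test, keeping class proportions.

    Assumption: splitting per class is an illustrative choice; the source
    does not say how the training and test sets are drawn.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical data: normal samples are "positive", fraud samples "negative",
# following the naming used in Step 2.
labels = ["normal"] * 970 + ["fraud"] * 30
train_idx, test_idx = stratified_split(labels)

# Step 2: count each class on the training set only.
pos_num = sum(1 for i in train_idx if labels[i] == "normal")
neg_num = sum(1 for i in train_idx if labels[i] == "fraud")
print(pos_num, neg_num)
```

Counting on the training set only keeps the test set untouched, which matches the instruction in Step 1 to apply the later steps to training samples.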



Abstract

The invention discloses a sample balancing method in data mining. The method randomly partitions the majority-class samples into equal-sized groups according to the ratio of positive to negative samples. A classification model is built for each group by combining that group of majority-class samples with all of the rare-class samples. Finally, a model ensemble method is used to fuse the multiple models. The invention combines the advantages of over-sampling and under-sampling and improves the generalization ability of the model.
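The procedure in the abstract can be sketched in a few lines. Everything named here is hypothetical: the group count is assumed to be the rounded majority-to-minority ratio, the nearest-centroid scorer is a toy stand-in for whatever classification algorithm is used, and score averaging is one possible fusion rule. The source specifies none of these details.

```python
import math
import random

def partition_majority(majority, n_minority, seed=0):
    """Shuffle majority samples and deal them into near-equal groups.

    Assumption: the number of groups is round(majority / minority).
    """
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    n_groups = max(1, round(len(shuffled) / n_minority))
    return [shuffled[i::n_groups] for i in range(n_groups)]

def fit_centroid_model(majority_group, minority):
    """Toy classifier: score a point by its relative distance to class centroids."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c_major, c_minor = centroid(majority_group), centroid(minority)

    def score(x):
        # Close to the minority centroid -> score near 1; close to majority -> near 0.
        d_major, d_minor = math.dist(x, c_major), math.dist(x, c_minor)
        total = d_major + d_minor
        return d_major / total if total else 0.5
    return score

def fit_balanced_ensemble(majority, minority, seed=0):
    """Train one model per majority group, each paired with all minority samples."""
    groups = partition_majority(majority, len(minority), seed)
    return [fit_centroid_model(group, minority) for group in groups]

def ensemble_score(models, x):
    """Fuse the models by averaging their scores (an assumed fusion rule)."""
    return sum(model(x) for model in models) / len(models)

# Hypothetical 2-D data: 200 majority points near (0, 0), 20 minority points near (3, 3).
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
models = fit_balanced_ensemble(majority, minority)
print(len(models), ensemble_score(models, [3.0, 3.0]), ensemble_score(models, [0.0, 0.0]))
```

Each model sees a balanced training set, as in under-sampling, yet every majority sample is used by exactly one model, so no majority information is discarded and no minority sample is duplicated within a training set.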

Description

Technical Field

[0001] The invention relates to a sample balancing method, in particular to a sample balancing method in data mining.

Background

[0002] In risk control modeling in the consumer finance industry, positive and negative samples are extremely imbalanced: there are far more normal customers than overdue customers, and far more normal transactions than fraudulent transactions. In this case, correct predictions for rare classes are more valuable than correct predictions for the majority class, but current classification algorithms assume balanced samples and treat positive and negative samples equally. The imbalance of the class distribution seriously degrades classifier performance. For example, if 1% of transactions are fraudulent, a classifier can predict every transaction as normal and still reach 99% prediction accuracy, even if it is...
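The 1% example above can be checked with a short calculation. The data are synthetic and the sample size of 10,000 is an arbitrary assumption; only the 1% fraud rate and the "predict everything as normal" classifier come from the text.

```python
# Synthetic labels: 1% fraudulent, 99% normal (the sample size is an assumption).
n_total = 10_000
n_fraud = n_total // 100
y_true = ["fraud"] * n_fraud + ["normal"] * (n_total - n_fraud)

# A degenerate classifier that labels every transaction as normal.
y_pred = ["normal"] * n_total

# Overall accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total

# Fraud recall: fraction of true fraud cases that were flagged as fraud.
fraud_recall = sum(
    t == "fraud" and p == "fraud" for t, p in zip(y_true, y_pred)
) / n_fraud

print(accuracy, fraud_recall)
```

Accuracy comes out at 0.99 while fraud recall is 0.0, which is why accuracy alone is a misleading metric on imbalanced data.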


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2458
Inventors: 黄付杰, 戚文平
Owner: SUNING CONSUMER FINANCE CO LTD