Unbalanced data classification method based on unbalanced classification indexes and integrated learning

A classification index and ensemble learning technology, applied in the field of data processing. It addresses the problems that the minority class samples cannot reflect the actual class distribution well, that the classification accuracy of the minority class is low, and that the data is mixed with noise. Its effects are a reduced amount of calculation, avoidance of overfitting, and improved calculation speed.

Inactive Publication Date: 2015-09-30
XIDIAN UNIV
3 Cites, 42 Cited by

AI Technical Summary

Problems solved by technology

[0004] In learning from unbalanced data sets, the main difficulty comes from the characteristics of the unbalanced data itself: the minority class samples are insufficient, so their distribution cannot reflect the actual distribution of the entire class well, and the majority class is usually mixed with noisy data, so that the two classes of samples overlap to varying degrees.
In addition, when traditional classification methods from the field of machine learning are applied directly to unbalanced data without taking the imbalance into account, minority class samples are easily misclassified into the majority class. The overall classification accuracy is then relatively high, but the classification accuracy of the minority class is very low.
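
The gap between overall accuracy and minority class accuracy can be illustrated with a small worked example. The following is a minimal sketch on hypothetical toy data (990 majority samples, 10 minority samples, and a degenerate classifier that always predicts the majority class); none of these numbers come from the patent.

```python
# Hypothetical toy data: 990 majority-class samples (label 0)
# and 10 minority-class samples (label 1).
y_true = [0] * 990 + [1] * 10

# A degenerate classifier that ignores the imbalance and
# always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_accuracy(y_true, y_pred, label):
    """Fraction of the samples of one class that are classified correctly."""
    hits = [t == p for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


overall = accuracy(y_true, y_pred)
majority_acc = class_accuracy(y_true, y_pred, 0)
minority_acc = class_accuracy(y_true, y_pred, 1)
print(overall, majority_acc, minority_acc)  # prints: 0.99 1.0 0.0
```

The overall accuracy is 99% even though not a single minority sample is recognized, which is why overall accuracy alone is a poor measure on unbalanced data.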


Examples


Embodiment Construction

[0022] Referring to Figure 1, the specific implementation steps of the present invention are as follows:

[0023] Step 1, select the training set and test set, and set the maximum number of iterations T.

[0024] Input an unbalanced data set containing two classes of data. Record the class with more samples as the majority class and the class with fewer samples as the minority class. Randomly select nine-tenths of the samples in the unbalanced data set as training samples, use the remaining samples as test samples, and set the maximum number of iterations T.
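
Step 1 might be sketched as follows. This is only an illustration under stated assumptions: the function name `split_unbalanced`, the fixed random seed, the toy data, and the value chosen for T are all hypothetical and not taken from the patent.

```python
import random
from collections import Counter


def split_unbalanced(samples, labels, train_fraction=0.9, seed=0):
    """Name the majority/minority class and randomly split into train/test."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # most_common orders classes by sample count, largest first.
    (majority, _), (minority, _) = counts.most_common(2)

    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # plain random split, not stratified
    n_train = int(round(train_fraction * len(indices)))

    train = [(samples[i], labels[i]) for i in indices[:n_train]]
    test = [(samples[i], labels[i]) for i in indices[n_train:]]
    return train, test, majority, minority


# Hypothetical toy data: 90 samples of class "a" and 10 of class "b".
samples = [[float(i)] for i in range(100)]
labels = ["a"] * 90 + ["b"] * 10

train, test, majority, minority = split_unbalanced(samples, labels)
T = 10  # maximum number of iterations; an arbitrary placeholder value
```

Because the split is purely random, as the step describes, a small test set may contain very few minority samples; whether the method guards against this is not stated in the text available here.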

[0025] Step 2, initialize the weights of the training samples.

[0026] Assume that the initial weights of the training samples follow a uniform distribution, that is, D_t(i) = 1/N for each (x_i, y_i) ∈ S, where i = 1, 2, ..., N and t = 1; N represents the number of training samples, S represents the training set, x_i indicates the i-th training sample, y_i indicates the class label of the i-th training sample, and D_t indicates...
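
A uniform initial distribution over N training samples gives every sample the weight 1/N. A minimal sketch of this initialization, with the hypothetical helper name `init_weights` and a toy N of 8:

```python
def init_weights(n_samples):
    """Return the uniform initial weight distribution over n_samples samples."""
    if n_samples <= 0:
        raise ValueError("need at least one training sample")
    return [1.0 / n_samples] * n_samples


# Hypothetical example with N = 8 training samples.
D1 = init_weights(8)
print(D1[0], sum(D1))  # prints: 0.125 1.0
```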


Abstract

The invention discloses an unbalanced data classification method based on unbalanced classification indexes and integrated learning, which mainly solves the problem of low classification accuracy on the minority class of unbalanced data in the prior art. The method comprises the following steps: (1) a training set and a test set are selected; (2) the training sample weights are initialized; (3) part of the training samples are selected according to the training sample weights to train a weak classifier, and the trained weak classifier is used to classify all training samples; (4) the classification error rate of the weak classifier on the training set is calculated, compared with a set threshold value, and optimized; (5) the voting weight of the weak classifier is calculated from the error rate, and the training sample weights are updated; (6) whether the training of weak classifiers has reached the maximum number of iterations is judged; if so, a strong classifier is computed from the weak classifiers and their voting weights; otherwise, the operation returns to step (3). The method improves the classification accuracy of the minority class and can be applied to the classification of unbalanced data.
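
The six steps above have the shape of a boosting loop with weighted resampling. The sketch below is not the patented method: the abstract does not give the formulas, so the voting weight and weight update are borrowed from standard AdaBoost, the weak classifier is a simple decision stump, and the threshold handling (skip the round and reset the weights) is an assumption. The unbalanced classification index itself is not defined in the text available here and is not modeled. All names (`train_stump`, `boost`, `sample_fraction`) and the toy data are hypothetical.

```python
import math
import random


def train_stump(data):
    """Fit a one-feature threshold classifier on (x, y) pairs, y in {-1, +1}."""
    best = None
    for f in range(len(data[0][0])):
        for thr in sorted({x[f] for x, _ in data}):
            for sign in (1, -1):
                errors = sum(1 for x, y in data
                             if (sign if x[f] > thr else -sign) != y)
                if best is None or errors < best[0]:
                    best = (errors, f, thr, sign)
    _, f, thr, sign = best
    return lambda x: sign if x[f] > thr else -sign


def boost(train, T, sample_fraction=0.5, threshold=0.5, seed=0):
    rng = random.Random(seed)
    n = len(train)
    k = max(1, int(sample_fraction * n))
    weights = [1.0 / n] * n                      # step (2): uniform weights
    ensemble = []                                # (voting weight, classifier)
    for _ in range(T):                           # step (6): at most T rounds
        # Step (3): draw part of the training set according to the weights.
        # Assumption: retry a bounded number of times until both classes appear.
        for _attempt in range(100):
            subset = rng.choices(train, weights=weights, k=k)
            if len({y for _, y in subset}) == 2:
                break
        else:
            continue                             # no usable subset this round
        h = train_stump(subset)
        preds = [h(x) for x, _ in train]         # classify all training samples
        # Step (4): weighted error rate on the whole training set.
        err = sum(w for w, p, (_, y) in zip(weights, preds, train) if p != y)
        if err >= threshold:                     # assumption: discard and reset
            weights = [1.0 / n] * n
            continue
        err = max(err, 1e-10)                    # avoid division by zero
        # Step (5): AdaBoost-style voting weight and sample weight update.
        alpha = 0.5 * math.log((1.0 - err) / err)
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, (_, y) in zip(weights, preds, train)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((alpha, h))

    def strong(x):
        """Weighted vote of all kept weak classifiers."""
        score = sum(a * h(x) for a, h in ensemble)
        return 1 if score > 0 else -1

    return strong, ensemble


# Hypothetical toy data: 18 majority samples (-1) and 2 minority samples (+1).
train = [([float(v)], -1) for v in range(18)] + [([100.0], 1), ([101.0], 1)]
strong, ensemble = boost(train, T=10)
print(strong([-5.0]), len(ensemble))
```

Misclassified samples gain weight after each round, so later weak classifiers are more likely to be trained on the samples that earlier ones got wrong.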

Description

Technical field

[0001] The invention belongs to the field of data processing and relates to an ensemble (integrated) learning classification method, in particular to an unbalanced data classification method based on unbalanced classification indexes and integrated learning, which can be used for the classification and identification of unbalanced data.

Background technique

[0002] With the rapid development of global information technology, powerful computers, data collection equipment, and storage equipment provide people with large amounts of data for transaction management, information retrieval, and data analysis. Although the amount of data obtained is extremely large, the data that is useful to people often accounts for only a small part of the total. A data set in which the number of samples of one class is significantly smaller than that of the other classes is called an unbalanced data set, and the classification problems of unbalanced data se...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/2411
Inventor: 张向荣, 焦李成, 宋润青, 李阳阳, 白静, 马文萍, 侯彪, 马晶晶
Owner: XIDIAN UNIV