Imbalanced Data Classification Method Based on Local Mean

A data classification technique based on local means, applicable to computer components, instruments, and character and pattern recognition, which addresses problems of existing methods such as overfitting, high process complexity, and unstable classification performance.

Active Publication Date: 2018-08-31
XIDIAN UNIV
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

One existing method iterates with active learning; its process is complicated and it is prone to overfitting.
Another existing method updates the sample set using fuzzy clustering and support vector machine self-training; its process complexity is high, and part of the sample information may be lost.
A third existing method, when applied to imbalanced data classification, biases the recognition rate toward the majority class, and its classification performance is unstable because it computes the local mean using only a single fixed number of samples.

Method used



Examples

Embodiment Construction

[0088] The steps of the present invention are described in further detail below with reference to Figure 1.

[0089] Step 1, input training samples and test samples.

[0090] Input an imbalanced training sample set containing samples of two different categories, and label the two categories as the minority class and the majority class according to their sample counts.

[0091] Input the test sample set.

[0092] In the embodiment of the present invention, the input training sample set, which is imbalanced data containing two classes of different sizes, is selected from the KEEL data set (http://www.keel.es/imbalanced.php).
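Step 1 can be illustrated with a minimal sketch that labels the two classes by size. The function name `split_by_class_size` and the list-based sample format are assumptions made for illustration; the source does not specify a data format.

```python
from collections import Counter


def split_by_class_size(samples, labels):
    """Return (minority_samples, majority_samples) for a two-class training set.

    Hypothetical helper: the class with fewer samples is treated as the
    minority class, the other as the majority class.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # most_common() sorts by descending count: majority first, minority last.
    (majority_label, _), (minority_label, _) = counts.most_common()
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority = [s for s, y in zip(samples, labels) if y == majority_label]
    return minority, majority
```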

[0093] Step 2, normalization processing.

[0094] Using Min-Max normalization, normalize each feature dimension of all samples in the training sample set and the test sample set to obtain normalized feature values. The Min-Max normalization formula is as fol...
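Since the formula is cut off in this text, the following sketch uses the standard Min-Max definition, x' = (x - min) / (max - min), applied per feature dimension. It assumes the minimum and maximum are taken from the training set and reused for the test set; the source does not say which samples supply these statistics, and the function name `min_max_normalize` is hypothetical.

```python
import numpy as np


def min_max_normalize(train, test):
    """Scale each feature to [0, 1] using the training set's min and max.

    Assumption: statistics come from the training set only, so test values
    can fall outside [0, 1] if they exceed the training range.
    """
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    # Constant features have zero range; divide by 1 so they map to 0.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (train - lo) / span, (test - lo) / span
```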


Abstract

The invention discloses an imbalanced data classification method based on local means. It mainly addresses the low recognition rate that traditional classification algorithms achieve for the minority class on imbalanced data sets. The implementation steps are: 1. input training samples and test samples; 2. normalize; 3. construct the feature weighting vector; 4. obtain minority class validation samples and majority class validation samples; 5. obtain the validation test sample set and the validation training sample set; 6. compute the validation local mean set; 7. compute the validation weighted distance; 8. obtain the validation result; 9. determine whether validation is complete; 10. obtain the validated number of local means; ... compute the weighted distance; 13. obtain the decision result. The invention effectively improves the recognition rate of minority class samples on imbalanced data sets and can be applied to the classification and recognition of imbalanced data.
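To make the core idea concrete, here is a minimal sketch of a generic local-mean nearest-neighbor rule with a feature-weighted distance. It is not the patented procedure: how the feature weighting vector is built and how validation selects the number of local means are not given in this text, so `weights` and `k` are hypothetical parameters supplied by the caller.

```python
import numpy as np


def local_mean_classify(x, minority, majority, k, weights=None):
    """Assign x to the class whose local mean is closest.

    For each class, the k samples nearest to x are averaged into a local
    mean; x is assigned to the class with the smaller weighted distance
    to its local mean. `weights` (one value per feature) is an assumed
    input; with weights=None the distance is plain Euclidean.
    """
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)

    def weighted_dist(a, b):
        # Feature-weighted Euclidean distance along the last axis.
        return np.sqrt(np.sum(w * (a - b) ** 2, axis=-1))

    def local_mean_distance(class_samples):
        samples = np.asarray(class_samples, dtype=float)
        d = weighted_dist(samples, x)
        nearest = samples[np.argsort(d)[:k]]  # k nearest samples of this class
        return weighted_dist(nearest.mean(axis=0), x)

    d_min = local_mean_distance(minority)
    d_maj = local_mean_distance(majority)
    # Ties go to the minority class, an arbitrary choice for this sketch.
    return "minority" if d_min <= d_maj else "majority"
```

Because each class contributes its own local mean regardless of how many samples it has, the decision depends on local geometry rather than on raw neighbor counts, which is what makes this family of rules attractive for imbalanced data.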

Description

Technical field

[0001] The invention belongs to the technical field of computer data processing, and further relates to an imbalanced data classification method based on local means within the field of data classification. The invention can be used for classifying imbalanced data to improve the recognition rate of minority class samples.

Background

[0002] Imbalanced data refers to data in which the number of training samples is unevenly distributed across categories. For example, in fault detection, faulty samples are usually few while normal-operation samples are many. Traditional classification algorithms pursue the overall recognition rate and therefore naturally favor the majority class, but in practice the minority class is the focus of attention, so traditional classification algorithms need to be adjusted to improve the recognition rate of minority class samples. Imbalanced cl...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62
CPC: G06F18/24133
Inventor: 刘靳, 孙宽宏, 姬红兵, 阿鹏仁, 刘艳丽, 葛倩倩, 王芳
Owner: XIDIAN UNIV