Classification method of unbalanced data

A classification method for unbalanced data, applied in the fields of instruments, character and pattern recognition, and computer components. It addresses the problems that minority-class samples cannot be correctly identified and are misclassified, which degrades the classification results of the classifier. Its effects are a reduced number of samples participating in training, high classification accuracy, and lower computing consumption.

Pending Publication Date: 2021-10-01
NANJING UNIV OF POSTS & TELECOMM
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

If a conventional algorithm is used to deal with this problem, the classification results tend to be biased toward the majority class, so the minority class cannot be correctly identified.
However, most traditional algori...
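A small worked example shows why this majority-class bias is easy to miss when only overall accuracy is reported. The 95/5 split below is hypothetical and was chosen only for illustration. A classifier that always predicts the majority class scores 95% accuracy, yet it identifies none of the minority samples.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of samples of class `positive` that are predicted as that class."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


y_true = [0] * 95 + [1] * 5   # 0 = majority class, 1 = minority class (hypothetical data)
y_pred = [0] * 100            # a classifier fully biased toward the majority class
print(accuracy(y_true, y_pred))            # 0.95
print(recall(y_true, y_pred, positive=1))  # 0.0
```

Per-class recall exposes the bias that a single accuracy figure hides.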




Embodiment Construction

[0036] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0037] As shown in Figure 1, the present invention provides a classification method for unbalanced data that combines an active learning method with an oversampling method. The unbalanced data includes a first type of data and a second type of data, and the first type of data and/or the second type of data includes labeled data and unlabeled data. In other words, the first type of data may or may not contain labeled data, and the second type of data may or may not contain labeled data. Labeled data may be absent from one type, or present in both the first type and the second type of data.
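The data layout in [0037] consists of two classes, each of which may or may not contain labeled samples, plus a pool of unlabeled samples. This layout can be represented by giving each sample an optional label. The following is a hypothetical sketch: `Sample` and `split_pool` are illustrative names, and the integers 0 and 1 stand in for the first and second types of data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Sample:
    features: Tuple[float, ...]
    label: Optional[int] = None  # 0 = first type, 1 = second type, None = unlabeled


def split_pool(samples: List[Sample]) -> Tuple[Dict[int, List[Sample]], List[Sample]]:
    """Group labeled samples by class and collect the unlabeled ones separately."""
    labeled: Dict[int, List[Sample]] = {}
    unlabeled: List[Sample] = []
    for s in samples:
        if s.label is None:
            unlabeled.append(s)
        else:
            labeled.setdefault(s.label, []).append(s)
    return labeled, unlabeled


# Hypothetical pool: the second type (1) has no labeled samples yet.
pool = [Sample((0.1, 0.2), 0), Sample((0.3, 0.1), 0), Sample((0.9, 0.8)), Sample((0.7, 0.9))]
labeled, unlabeled = split_pool(pool)
print(sorted(labeled), len(unlabeled))  # [0] 2
```

A class key is simply missing from `labeled` when that type has no labeled data. This covers the case, allowed by [0037], in which labeled data exist in only one type.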

[0038] Unbalanced data refers to data whose categories are unbalanced, that is, the amount of the first type of data is u...
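The degree of imbalance described in [0038] is commonly summarized by the imbalance ratio, which is the size of the largest class divided by the size of the smallest class. The sketch below assumes the labels arrive as a flat list. The function name is hypothetical, and the 90/10 split is illustrative only.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (size of the largest class) / (size of the smallest class)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return max(counts.values()) / min(counts.values())


labels = ["majority"] * 90 + ["minority"] * 10  # hypothetical 90/10 split
print(imbalance_ratio(labels))  # 9.0
```

A ratio of 1.0 means the classes are balanced. Larger values mean a stronger imbalance.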


Abstract

The invention discloses a method for classifying unbalanced data and belongs to the technical field of machine learning. The method combines an active learning method with an oversampling method, and the unbalanced data comprises labeled data and unlabeled data. The method specifically comprises the following steps:

1. Preprocess the labeled data and calculate distance features to obtain an initial training set.
2. Train on the initial training set to obtain an initial classifier.
3. Use the initial classifier to calculate the uncertainty of the unlabeled data.
4. Sort the unlabeled data by uncertainty and manually label it to obtain a labeled data set.
5. Perform probability oversampling on the labeled data set to obtain a balanced data set.
6. Train on the balanced data set to obtain a classifier for classifying unbalanced data.

By combining active learning with oversampling, the method reduces the number of samples participating in training. At the same time, it ensures that the classifier achieves high classification precision on both majority-class and minority-class data.
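The uncertainty and sorting steps can be sketched as uncertainty sampling for a binary classifier. The abstract does not state which uncertainty measure the invention uses. The least-confidence measure, 1 - max(p, 1 - p), is therefore an assumption here, as are the hypothetical names `least_confidence` and `select_for_labeling`. In the method, the probabilities would come from the initial classifier.

```python
from typing import Dict, List


def least_confidence(p_minority: float) -> float:
    """Uncertainty of a binary prediction: 1 - max(p, 1 - p), from 0 (certain) to 0.5 (coin flip)."""
    if not 0.0 <= p_minority <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1.0 - max(p_minority, 1.0 - p_minority)


def select_for_labeling(probs: Dict[str, float], k: int) -> List[str]:
    """Rank unlabeled sample ids from most to least uncertain and return the top k."""
    ranked = sorted(probs, key=lambda sid: least_confidence(probs[sid]), reverse=True)
    return ranked[:k]


# Hypothetical minority-class probabilities produced by an initial classifier.
pool = {"s1": 0.95, "s2": 0.52, "s3": 0.10, "s4": 0.40}
print(select_for_labeling(pool, k=2))  # ['s2', 's4']
```

The selected ids would go to a human annotator, and the remaining samples stay in the unlabeled pool. Querying only the most uncertain samples is how active learning keeps the number of manually labeled training samples small.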

Description

Technical field

[0001] The invention relates to a classification method for unbalanced data and belongs to the field of machine learning.

Background art

[0002] At present, research on the data imbalance problem is carried out mainly at three levels: data preprocessing, features, and classification algorithms. The aim at each level is to ensure that the classifier has high classification accuracy for both majority-class and minority-class data.

At the data preprocessing level, the imbalance is reduced or eliminated by changing the sample distribution of the training set, typically through a range of undersampling and oversampling techniques.

At the feature level, an imbalance in the number of samples is often accompanied by an imbalanced distribution of feature attributes. Feature selection methods are therefore used to choose discriminative features and improve the classification accuracy of the minority class.

At the level of classification algorithms, ac...
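The data preprocessing level described in [0002] can be illustrated with the two simplest resampling techniques. Random oversampling duplicates minority samples, drawn with replacement. Random undersampling discards majority samples. This is a minimal pure-Python sketch, and the function names are hypothetical. It is not the invention's "probability oversampling", which the available text does not define.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))  # sampling with replacement
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):  # sampling without replacement
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


X = list(range(10))
y = [0] * 8 + [1] * 2  # hypothetical 8/2 split
print(Counter(random_oversample(X, y)[1]))   # Counter({0: 8, 1: 8})
print(Counter(random_undersample(X, y)[1]))  # Counter({0: 2, 1: 2})
```

Oversampling keeps all the data but repeats minority samples, which can encourage overfitting. Undersampling discards majority samples, which loses information. This trade-off is one motivation for combining sampling with other strategies, such as active learning.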


Application Information

IPC (8): G06K9/62
CPC: G06F18/2415, G06F18/214
Inventors: 赵正旦 (Zhao Zhengdan), 章韵 (Zhang Yun)
Owner: NANJING UNIV OF POSTS & TELECOMM