Method for classifying unbalanced data sets

A technique for classifying unbalanced data sets, applied in the areas of instruments, character and pattern recognition, and computing models. It addresses the lack of integration between data set processing and the subsequent classification algorithm, and aims at higher classification accuracy, general applicability, and improved classification performance.

Pending Publication Date: 2020-04-10
UNIV OF ELECTRONICS SCI & TECH OF CHINA
2 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

Such a method focuses only on processing the data set and is not integrated with, or improved together with, the subsequent classification algorithm, which limits its effectiveness.

Embodiment Construction

[0043] When classifying an imbalanced data set, finding and exploiting the relationship between the minority class and the majority class in the original data set can effectively improve classification performance. The choice of classifier is also important: if the classification algorithm can be combined with the information in the data set so that the two jointly classify the unbalanced data, performance improves substantially and generalizes well. Based on these ideas, the present invention uses the well-known SMOTE and K-nearest-neighbor algorithms to process the original data, oversampling the minority class and undersampling the majority class. Two online random forests of the same size are then trained as classifiers, one on the original data and one on the newly constructed data, and finally merged into a single forest that is evaluated on the test set.
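
The data-processing step described above can be illustrated with a minimal sketch. It assumes numeric feature tuples, Euclidean distance, and a selection rule in which a majority-class sample counts as "related" when it is one of the k nearest majority neighbors of some minority sample. That rule, and the names `smote`, `related_majority`, and `build_new_set`, are hypothetical choices made for illustration; the source text does not give these details.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return up to k candidates closest to point (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, k=3, n_new=None, rng=None):
    """SMOTE-style oversampling: interpolate between a minority sample and
    one of its k nearest minority neighbours. Needs >= 2 minority samples."""
    rng = rng or random.Random(0)
    n_new = len(minority) if n_new is None else n_new
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def related_majority(minority, majority, k=3):
    """Undersampling (assumed rule): keep only majority samples that are
    among the k nearest majority neighbours of some minority sample."""
    kept = set()
    for m in minority:
        kept.update(k_nearest(m, majority, k))
    return sorted(kept)


def build_new_set(minority, majority, k=3, rng=None):
    """Return (samples, labels) with 1 = minority class, 0 = majority class."""
    new_minority = list(minority) + smote(minority, k=k, rng=rng)
    new_majority = related_majority(minority, majority, k=k)
    samples = new_minority + new_majority
    labels = [1] * len(new_minority) + [0] * len(new_majority)
    return samples, labels


# Toy data: 3 minority points surrounded by a 5 x 5 grid of majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
majority = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]
samples, labels = build_new_set(minority, majority, rng=random.Random(42))
print(labels.count(1), "minority-class and", labels.count(0), "majority-class samples")
```

In this sketch the new set doubles the minority class and discards majority samples far from any minority sample, so the class ratio is much closer to balanced than in the original data.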

[0044] As shown in Figure 1...

Abstract

The invention discloses a method for classifying unbalanced data sets. The method is applied to fields such as network intrusion detection, animal age prediction, and vehicle performance evaluation. To address the low classification precision of minority classes in the prior art, the invention starts from the original training data and exploits the relationship between the minority and majority classes in the original data set: SMOTE and the K-nearest-neighbor algorithm are used to process the original training data set and construct a new set that focuses on the minority class and on the majority-class samples related to it. Two random forests of the same size are then constructed, one from the original training data and one from the new set. The decision trees of the two forests are combined into one large forest, which classifies the test set; the resulting classification precision is greatly improved compared with the prior art.
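
The forest-merging step can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: each "tree" is a one-split decision stump trained on a bootstrap sample, standing in for a full decision tree, and the merged forest predicts by simple majority vote. The function names (`train_stump`, `train_forest`, `merged_forest`, `predict`) are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples, labels, rng):
    """Fit a one-split stand-in 'tree' on a bootstrap sample."""
    n = len(samples)
    idx = [rng.randrange(n) for _ in range(n)]        # bootstrap indices
    feature = rng.randrange(len(samples[0]))          # random feature
    threshold = samples[rng.choice(idx)][feature]     # random split point
    fallback = Counter(labels[i] for i in idx).most_common(1)[0][0]

    def majority_label(rows):
        # Most common label on one side; fall back if that side is empty.
        counts = Counter(labels[i] for i in rows)
        return counts.most_common(1)[0][0] if counts else fallback

    left = majority_label([i for i in idx if samples[i][feature] <= threshold])
    right = majority_label([i for i in idx if samples[i][feature] > threshold])
    return lambda x: left if x[feature] <= threshold else right


def train_forest(samples, labels, n_trees, rng):
    """A forest is simply a list of trained trees."""
    return [train_stump(samples, labels, rng) for _ in range(n_trees)]


def merged_forest(original, new_set, n_trees=25, seed=0):
    """Train two forests of equal size, one per data set, and pool their trees.
    `original` and `new_set` are (samples, labels) pairs."""
    rng = random.Random(seed)
    forest_a = train_forest(*original, n_trees, rng)
    forest_b = train_forest(*new_set, n_trees, rng)
    return forest_a + forest_b


def predict(forest, x):
    """Classify x by majority vote over every tree in the forest."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy usage: label 1 is the minority class.
original = ([(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0)], [0, 0, 0, 1])
new_set = ([(5.0, 5.0), (5.1, 4.9), (0.2, 0.1)], [1, 1, 0])
forest = merged_forest(original, new_set, n_trees=10, seed=3)
print("trees in merged forest:", len(forest))
print("predicted label for (5.0, 5.0):", predict(forest, (5.0, 5.0)))
```

Because both forests have the same size, trees trained on the original data and trees trained on the minority-focused set carry equal weight in the vote.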

Description

Technical Field

[0001] The invention belongs to the fields of network intrusion detection, animal age prediction, vehicle performance evaluation, etc., and particularly relates to a technology for classifying unbalanced data sets.

Background Art

[0002] Classification is one of the important research directions in machine learning. After years of development, a number of relatively mature algorithms have emerged and have been applied successfully in practice. These traditional classification algorithms treat overall classification accuracy as the primary goal and assume that the number of samples in each category of the data set is roughly balanced. In practical problems, however, a data set may contain two classes of data in which one class has far fewer samples than the other. The former is called the minority class and the latter the majority class. Due to the difficulties encountere...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00, G06K9/62
CPC: G06N20/00, G06F18/24323
Inventors: 简玉琳, 叶茂, 闵艳
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA