Efficient imbalanced data set classification method

A technology involving data balancing and classification methods, applied in the field of data processing. It addresses problems such as the poor classification of unbalanced data sets, failure to account for the characteristics of such data sets, and failure to allocate more resources to minority-class samples, and it improves accuracy and recall.

Status: Inactive
Publication date: 2016-07-13
UNIV OF SCI & TECH OF CHINA
Cites: 1; cited by: 29

AI Technical Summary

Problems solved by technology

Although the traditional random forest algorithm ensures randomness, it does not account for the particular characteristics of unbalanced data sets and does not allocate more resources to minority-class samples. Moreover, when the decision forest is formed, every tree is included in the vote, which hinders the classification of unbalanced data sets.
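The voting issue can be illustrated with a toy example. The sketch below contrasts plain majority voting over every tree with voting over a selected subset of trees. The per-tree predictions are invented, and the selection rule (keep the trees with the highest minority-class recall on labeled data) is a hypothetical stand-in; the text available here does not state which criterion the invention uses. The names `majority_vote`, `select_trees`, and `minority_recall` are illustrative only.

```python
import numpy as np


def minority_recall(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of minority-class samples (label 1) that are predicted as 1."""
    pos = y_true == 1
    return float((y_pred[pos] == 1).mean()) if pos.any() else 0.0


def majority_vote(tree_preds: np.ndarray) -> np.ndarray:
    """tree_preds has shape (n_trees, n_samples) with 0/1 labels; a strict majority of 1-votes yields 1."""
    return (tree_preds.mean(axis=0) > 0.5).astype(int)


def select_trees(tree_preds: np.ndarray, y_true: np.ndarray, n_keep: int) -> np.ndarray:
    """Hypothetical rule: indices of the n_keep trees with the best minority recall (ties keep original order)."""
    scores = np.array([minority_recall(y_true, p) for p in tree_preds])
    return np.argsort(-scores, kind="stable")[:n_keep]


# Toy data: 2 minority samples (label 1) and 4 majority samples (label 0).
y = np.array([1, 1, 0, 0, 0, 0])
preds = np.array([
    [1, 1, 0, 0, 0, 0],  # tree 0 finds both minority samples
    [1, 0, 0, 0, 0, 0],  # tree 1 finds one
    [0, 0, 0, 0, 0, 0],  # trees 2 and 3 always predict the majority class
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],  # tree 4 finds the other one
])

all_trees = majority_vote(preds)
chosen = select_trees(preds, y, n_keep=3)
selected = majority_vote(preds[chosen])
print(minority_recall(y, all_trees), minority_recall(y, selected))  # 0.0 1.0
```

Because the trees that ignore the minority class outnumber the ones that detect it, the all-tree vote misses every minority sample. Selecting trees on the same samples used for evaluation, as in this toy, is optimistic; a real pipeline would choose trees on a held-out validation set.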

Embodiment Construction

[0039] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0040] An embodiment of the present invention provides an efficient method for classifying unbalanced data sets. As shown in Figure 1, it mainly includes the following steps:

[0041] Step 11. Based on the BSMOTE sampling technique, perform k-nearest-neighbor and linear interpolation calculations on the majority-class and minority-class samples in the unbalanced data set, and combine several new minority-class samples obtained through cal...
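A minimal sketch of Step 11, assuming that BSMOTE denotes the Borderline-SMOTE1 scheme. Each minority sample is labeled by its m nearest neighbors in the whole training set as safe (fewer than half are majority), borderline (at least half but not all are majority), or isolated noise (all are majority), and only borderline samples are oversampled by linear interpolation toward one of their k nearest minority neighbors. This is one way to account for the boundary samples and isolated points mentioned in the abstract. The function `borderline_smote`, its parameters, and their defaults are hypothetical, and the invention's exact procedure may differ, since the step text is truncated.

```python
import numpy as np


def borderline_smote(X_min, X_maj, m=5, k=5, n_per_sample=1, seed=0):
    """Borderline-SMOTE1-style oversampling sketch (hypothetical helper, not the patent's code).

    X_min: minority samples, shape (n_min, d); X_maj: majority samples, shape (n_maj, d).
    m: neighbours used to label each minority sample as safe, borderline, or noise.
    k: minority neighbours used as interpolation partners.
    Returns an array of synthetic minority samples with shape (n_new, d).
    """
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.concatenate([np.zeros(len(X_min), bool), np.ones(len(X_maj), bool)])
    m = min(m, len(X_all) - 1)
    k = min(k, len(X_min) - 1)
    if k < 1:
        return np.empty((0, X_min.shape[1]))
    synthetic = []
    for i, p in enumerate(X_min):
        # m nearest neighbours of p in the full training set (p itself excluded).
        dist = np.linalg.norm(X_all - p, axis=1)
        dist[i] = np.inf
        n_maj = int(is_maj[np.argsort(dist)[:m]].sum())
        # All neighbours are majority: isolated point (noise), so skip it.
        # Fewer than half are majority: safe sample, so skip it too.
        if n_maj == m or n_maj < m / 2:
            continue
        # Borderline sample: interpolate toward a random one of its k nearest minority neighbours.
        dist_min = np.linalg.norm(X_min - p, axis=1)
        dist_min[i] = np.inf
        partners = np.argsort(dist_min)[:k]
        for _ in range(n_per_sample):
            q = X_min[rng.choice(partners)]
            synthetic.append(p + rng.random() * (q - p))
    return np.array(synthetic).reshape(-1, X_min.shape[1])


# Toy 2-D data on a line: the minority point at x = 0.8 sits next to majority points,
# while the one at x = 3.1 is surrounded by majority points (isolated).
X_min = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.8, 0.0], [3.1, 0.0]])
X_maj = np.array([[1.0, 0.0], [1.2, 0.0], [3.0, 0.0], [3.2, 0.0], [4.0, 0.0], [5.0, 0.0]])

new_min = borderline_smote(X_min, X_maj, m=3, k=2, n_per_sample=2)
X_min_balanced = np.vstack([X_min, new_min])  # original + synthetic minority samples
```

With m=3, only the point at x = 0.8 is borderline (two of its three nearest neighbors are majority), so two synthetic points are generated on the segments toward its minority neighbors at x = 0.2 and x = 0.1. The isolated point at x = 3.1 is not used as a seed, which keeps noise from being amplified.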

Abstract

The invention discloses an efficient method for classifying imbalanced data sets. Building on the traditional SMOTE method, it additionally takes boundary samples and isolated points into account to obtain an approximately balanced data set. It then applies a subspace-selection and tree-model scheme based on an ensemble of random forests, designed around the types of data, to clearly separate the two classes. This improves accuracy and recall, yields results close to the actual situation, and makes the invention applicable to real-world industry analysis.

Description

Technical field
[0001] The invention relates to the technical field of data processing, in particular to an efficient method for classifying unbalanced data sets.
Background
[0002] Classification is one of the most important problems in data analysis, and real-world data sets often have unbalanced class sizes. On unbalanced data sets, traditional classification methods such as decision trees, SVMs, and Bayesian networks perform poorly because these algorithms assume that the classes are balanced. When a data set is unbalanced, majority-class samples dominate the training of the classification model, so minority-class samples are hard to identify. Yet classification problems are usually most concerned with the minority samples, as in practical applications such as predicting telecommunication customer churn, detecting anomalous credit card transactions, network intrusion p...
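Why minority samples are easily overlooked can be made concrete with accuracy and recall, the two measures the invention aims to improve. In this toy example (invented data, illustrative helper names `accuracy` and `recall`), a classifier that always predicts the majority class scores 95% accuracy yet finds none of the minority samples, while a classifier that catches most minority samples at the cost of a few false alarms has similar accuracy and much higher recall.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, minority=1):
    """Fraction of minority-class samples that are classified as the minority class."""
    hits = [p == minority for t, p in zip(y_true, y_pred) if t == minority]
    return sum(hits) / len(hits)


# 95 majority samples (label 0) followed by 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class.
always_majority = [0] * 100
# Classifier B raises 3 false alarms but finds 4 of the 5 minority samples.
minority_aware = [1] * 3 + [0] * 92 + [1] * 4 + [0]

print(accuracy(y_true, always_majority), recall(y_true, always_majority))  # 0.95 0.0
print(accuracy(y_true, minority_aware), recall(y_true, minority_aware))    # 0.96 0.8
```

Accuracy alone barely separates the two classifiers, which is why recall on the minority class is reported alongside it.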

Application Information

IPC(8): G06K9/62
CPC: G06F18/241
Inventors: 陈宗海 (Chen Zonghai), 曹璨 (Cao Can), 王鹏 (Wang Peng)
Owner: UNIV OF SCI & TECH OF CHINA