Unbalanced ensemble classification method based on data partition hybrid sampling

A hybrid sampling and data partitioning technique in the field of machine learning that addresses the classification of imbalanced positive and negative samples.

Status: Inactive. Publication date: 2020-05-01
BEIJING UNIV OF POSTS & TELECOMM
Cites: 0. Cited by: 16.

AI Technical Summary

Problems solved by technology

[0003] In view of this, the embodiment of the present invention proposes an unbalanced ensemble classification method based on data partition hybrid sampling. The method addresses the classification of imbalanced positive and negative samples: by adjusting the data distribution it generates different classification models, which improves classification performance on imbalanced data and the overall performance of the classification model.




Embodiment Construction

[0016] In order to better understand the technical solutions of the present invention, the embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0017] It should be clear that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

[0018] The embodiment of the present invention provides an unbalanced ensemble classification method based on data partition hybrid sampling. Referring to Figure 1, which is a schematic flow chart of the method proposed by the embodiment of the present invention, the method includes the following steps:

[0019]...


Abstract

The embodiment of the invention provides an unbalanced ensemble classification method based on data partition hybrid sampling. The method comprises the following steps: dividing the sample space into four regions, including the minority class safe region, the boundary region and the minority class noise region, according to the majority class proportion in each minority class neighborhood; generating a weight for each minority class neighborhood as the ratio of its majority class proportion to the sum of all such proportions, determining from this weight the number of samples to synthesize in that neighborhood, and oversampling the minority class samples of the boundary region by random linear interpolation; randomly undersampling the majority class samples of the safe region, removing the minority class samples of the noise region and retaining those of the minority class safe region, thereby generating a balanced data set; and constructing three ensemble learning models, namely an original model biased toward the majority class, a local-neighborhood reinforcement-and-weakening model, and a hybrid model biased toward the peripheral boundary, and adaptively selecting the corresponding model according to the imbalance degree of the test point's neighbors in the original data set.
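The data-level steps in the abstract (region partition, weighted oversampling of boundary minority samples, noise removal, safe-region undersampling) and the neighbor-based model selection can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the inventors' implementation: binary labels with the minority class labelled 1, a brute-force k-nearest-neighbour search with k = 5, and hypothetical region thresholds (a minority sample is noise if all k neighbours are majority, safe if fewer than half are, boundary otherwise; a majority sample is "majority safe" if none of its neighbours is minority). The synthesis budget (half of the class gap, the rest closed by undersampling), the function names `partition`, `hybrid_sample` and `select_model`, and the mapping from neighbourhood imbalance to the three models are all assumptions; the three ensemble models themselves are not built here.

```python
import numpy as np


def knn_indices(X, k):
    """Brute-force k nearest neighbours of every sample, excluding itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def partition(X, y, k=5, minority=1):
    """Assign each sample to a region (thresholds are hypothetical)."""
    nn = knn_indices(X, k)
    maj_ratio = (y[nn] != minority).mean(axis=1)  # majority share among the k neighbours
    is_min = y == minority
    region = np.full(len(y), "boundary", dtype="<U9")
    region[is_min & (maj_ratio == 1.0)] = "min_noise"  # only majority neighbours
    region[is_min & (maj_ratio < 0.5)] = "min_safe"    # mostly minority neighbours
    region[~is_min & (maj_ratio == 1.0)] = "maj_safe"  # no minority neighbours
    return region, maj_ratio, nn


def hybrid_sample(X, y, k=5, minority=1, seed=0):
    """Oversample boundary minority samples, drop minority noise and
    randomly undersample majority safe samples (illustrative budget)."""
    rng = np.random.default_rng(seed)
    region, maj_ratio, nn = partition(X, y, k, minority)
    is_min = y == minority
    keep = region != "min_noise"            # remove minority noise samples
    n_min = int((is_min & keep).sum())
    n_maj = int((~is_min).sum())
    n_synth = max(n_maj - n_min, 0) // 2    # assumption: oversampling closes half the gap

    border = np.where(is_min & (region == "boundary"))[0]
    synth = []
    if n_synth > 0 and len(border) > 0:
        # weight = neighbourhood majority ratio / sum of ratios over boundary minority samples
        w = maj_ratio[border] / maj_ratio[border].sum()
        raw = w * n_synth
        counts = np.floor(raw).astype(int)
        rem = n_synth - counts.sum()        # give rounding leftovers to the largest remainders
        counts[np.argsort(raw - counts)[::-1][:rem]] += 1
        for i, g in zip(border, counts):
            mates = nn[i][y[nn[i]] == minority]  # minority neighbours of x_i (non-empty here)
            for _ in range(g):
                j = rng.choice(mates)
                lam = rng.random()               # random linear interpolation factor in [0, 1)
                synth.append(X[i] + lam * (X[j] - X[i]))

    # randomly undersample majority safe samples until the class counts match
    excess = n_maj - (n_min + len(synth))
    maj_safe = np.where(region == "maj_safe")[0]
    n_drop = min(max(excess, 0), len(maj_safe))
    keep[rng.choice(maj_safe, size=n_drop, replace=False)] = False

    X_syn = np.array(synth).reshape(-1, X.shape[1])
    X_bal = np.vstack([X[keep], X_syn])
    y_bal = np.concatenate([y[keep], np.full(len(X_syn), minority)])
    return X_bal, y_bal


def select_model(x, X, y, k=5, minority=1, low=0.3, high=0.7):
    """Pick one of three ensemble members from the imbalance of the test
    point's k nearest neighbours in the original data (mapping is illustrative)."""
    nn = np.argsort(np.linalg.norm(X - x, axis=1), kind="stable")[:k]
    r = (y[nn] != minority).mean()
    if r >= high:
        return "original"   # majority-dominated neighbourhood
    if r <= low:
        return "local"      # minority-dominated neighbourhood
    return "hybrid"         # mixed neighbourhood near the class boundary
```

In this sketch synthetic points are drawn only between a boundary minority sample and one of its minority neighbours, so new samples concentrate where the classes overlap, while undersampling touches only majority samples that have no minority neighbours and therefore carry the least boundary information.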

Description

【Technical field】 [0001] The invention relates to a classification method for handling class imbalance in the field of machine learning, in particular to an unbalanced ensemble classification method based on data partition hybrid sampling.

【Background art】 [0002] Classification is a central topic in machine learning and supports data analysis and prediction in many application fields. Under class imbalance, the data are unevenly distributed across classes: one or several classes (the minority classes) contain few samples, while the other classes (the majority classes) contain many. Given a training set with imbalanced classes, it is difficult to train an effective classification model; this is known as the imbalanced classification problem. Many methods have been proposed to address it, falling mainly into data-level methods, algorithm-level methods, and methods that combine data processing with algorithms. T...


Application Information

Patent type & authority: Application (China)
IPC (8): G06N20/20; G06K9/62
CPC: G06N20/20; G06F18/2431; G06F18/214
Inventors: 高欣, 任昺, 何杨, 李康生, 井潇, 纪维佳, 查森, 王锋
Owner BEIJING UNIV OF POSTS & TELECOMM