Deep learning-based balanced sampling and modeling method for unbalanced data

A deep learning-based balanced sampling technology, applied in the fields of informatics, medical informatics, and character and pattern recognition, that addresses poor classification performance and inaccurate prediction models, with the effects of reducing workload, improving accuracy, and improving problem-handling ability.

Inactive
Publication Date: 2018-11-30
TIANJIN UNIV
0 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

[0008] The technical problem to be solved by the present invention is to provide a deep learning-based balanced sampling and modeling method for unbalanced data. The method addresses the problem that ordinary classifiers perform poorly on the minority class when processing unbalanced medical data, which in turn makes the resulting prediction model inaccurate.



Examples


Detailed Description of the Embodiments

[0020] The method for balanced sampling and modeling of unbalanced data based on deep learning of the present invention will be described in detail below in conjunction with the embodiments and drawings.

[0021] The deep learning-based balanced sampling and modeling method for unbalanced data of the present invention consists of the KE-SMOTE algorithm and the AE-DBN algorithm. In the steps below, steps 1) to 5) make up the KE-SMOTE algorithm, and steps 6) to 7) make up the AE-DBN algorithm.

[0022] As shown in Figure 1, the deep learning-based balanced sampling and modeling method for unbalanced data of the present invention includes the following steps:

[0023] 1) Take out the majority class and minority class sample sets in the data set, and count them separately;

[0024] This step first preprocesses the original data, including: removing missing values, format processing, removing useless features, correcting illegal values, normalization, and data transformation.

[0025] 2) ...
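Step 1) above (preprocess, then separate and count the majority and minority classes) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names `preprocess` and `split_by_class`, the toy data, the choice of min-max scaling for "normalization", and the restriction to exactly two classes are all hypothetical. Only two of the listed preprocessing operations (removing missing values and normalization) are shown.

```python
from collections import Counter


def preprocess(rows):
    """Drop rows with missing values, then min-max scale each feature.

    rows: list of (features, label); a missing feature is None.
    Min-max scaling is an assumed choice of normalization.
    """
    clean = [(f, y) for f, y in rows if all(v is not None for v in f)]
    if not clean:
        return []
    n_features = len(clean[0][0])
    lows = [min(f[j] for f, _ in clean) for j in range(n_features)]
    highs = [max(f[j] for f, _ in clean) for j in range(n_features)]
    scaled = []
    for f, y in clean:
        row = [
            (f[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ]
        scaled.append((row, y))
    return scaled


def split_by_class(rows):
    """Return (majority, minority, counts); assumes exactly two classes."""
    counts = Counter(y for _, y in rows)
    (maj_label, _), (min_label, _) = counts.most_common(2)
    majority = [f for f, y in rows if y == maj_label]
    minority = [f for f, y in rows if y == min_label]
    return majority, minority, counts


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
data = [
    ([1.0, 10.0], 0),
    ([2.0, None], 0),  # dropped: contains a missing value
    ([3.0, 30.0], 0),
    ([5.0, 20.0], 1),
    ([2.0, 15.0], 0),
]
majority, minority, counts = split_by_class(preprocess(data))
print(len(majority), len(minority))
```

The two returned sample sets are what the later steps would consume: the majority set is the one to be reduced, and the minority set is the one to be oversampled.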



Abstract

The invention relates to a deep learning-based balanced sampling and modeling method for unbalanced data. The method comprises the steps of extracting majority class and minority class sample sets in a data set, and performing counting respectively; carrying out multi-time K-Means clustering on the majority class sample set to obtain R clustering results; performing clustering fusion on the R clustering results by adopting a clustering fusion algorithm based on an incidence matrix, thereby obtaining a new majority class sample set; performing over-sampling on the minority class sample set, thereby obtaining a new minority class sample set; combining the obtained new majority class sample set and new minority class sample set, thereby forming a new data set with class balance; extracting an abstract feature of the new data set with the class balance, and adding the abstract feature as a new feature into a feature set of the new data set with the class balance, thereby forming a new feature set; and training a DBN model by adopting the obtained new feature set, thereby obtaining an optimal DBN model. The method overcomes the shortcomings in a single processing method, has a better processing capability, and has relatively high accuracy.
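The clustering-fusion step can be illustrated with a small Python sketch. The text does not define the "incidence matrix", so this sketch assumes it is a co-association matrix: entry (i, j) is the fraction of the R clusterings in which samples i and j fall in the same cluster. The function names, the 0.5 threshold, the connected-components grouping, and the hard-coded label vectors standing in for R = 3 K-Means runs are all assumptions for illustration. How the new majority class sample set is then derived from the fused clusters is not stated here, so the sketch stops at the consensus groups.

```python
def co_association(labelings):
    """Fraction of clusterings in which samples i and j share a cluster."""
    n = len(labelings[0])
    r = len(labelings)
    together = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / r for count in row] for row in together]


def consensus_groups(matrix, threshold=0.5):
    """Group samples linked by co-association above the threshold.

    Groups are the connected components of the graph that joins i and j
    whenever matrix[i][j] > threshold (an assumed fusion rule).
    """
    n = len(matrix)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and matrix[i][j] > threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Hypothetical label vectors from R = 3 clustering runs over 5 samples.
# Cluster ids are arbitrary per run; only co-membership matters.
runs = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
matrix = co_association(runs)
groups = consensus_groups(matrix)
print(groups)
```

Working on co-membership rather than raw cluster ids is what makes the fusion insensitive to the arbitrary label numbering of each individual K-Means run.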

Description

Technical field

[0001] The invention relates to a balanced sampling and modeling method for unbalanced data, and in particular to a balanced sampling and modeling method for unbalanced data based on deep learning.

Background technique

[0002] For the problem of data imbalance, current research falls mainly into two directions: methods that address the imbalance at the data level, and methods that address it at the algorithm level.

[0003] At the data level, the imbalance problem is mainly handled by two methods: oversampling and undersampling. Oversampling increases the number of minority class samples until it is balanced with the number of majority class samples. Correspondingly, undersampling reduces the number of majority class samples, or extracts a subset of them, until it is balanced with the number of minority class samples.

[0004] A commonly used oversampling method is Adaptive Synthetic Sampling Approach (ADAS...
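The two data-level strategies described in [0003] can be illustrated with a minimal Python sketch. It uses plain random resampling, which is the simplest form of each strategy; it is not ADASYN and not the KE-SMOTE algorithm of this invention. The function names and the toy sample sets are hypothetical.

```python
import random


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority samples until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra


def random_undersample(majority, target_size, rng):
    """Keep a random subset of the majority samples, without replacement."""
    return rng.sample(majority, target_size)


rng = random.Random(0)            # fixed seed so the sketch is repeatable
majority = list(range(100))       # 100 hypothetical majority samples
minority = list(range(100, 110))  # 10 hypothetical minority samples

oversampled = random_oversample(minority, len(majority), rng)
undersampled = random_undersample(majority, len(minority), rng)
print(len(oversampled), len(undersampled))
```

Either call yields a class-balanced pair: oversampling grows the minority set to the majority size, while undersampling shrinks the majority set to the minority size.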


Application Information

IPC(8): G06K9/62, G16H50/20
CPC: G16H50/20, G06F18/23213, G06F18/214
Inventor: 喻梅, 邓锐, 徐天一, 赵满坤, 高洁, 赵永伟
Owner: TIANJIN UNIV