Method for processing two-category unbalance medical data

A binary classification technology for medical data, applied in the field of data classification. It addresses problems such as the loss of useful information, achieves accurate feature selection of attributes, solves the data imbalance problem, and has broad application prospects.

Status: Inactive; Publication Date: 2018-09-07
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

Undersampling the majority class can be blind and discard much of the useful information, while oversampling the minority class adds new information to the samples and can cause overfitting.



Examples


Embodiment 1

[0030] Embodiment 1: A method for processing two-category unbalanced medical data. The specific steps are as follows: first, preprocess the data by deleting data in the original data set that is irrelevant to the classification subject, together with duplicate data, then smoothing the noisy data and processing outliers and missing values; second, integrate the data from different data sources, resolve the entity recognition and attribute redundancy problems, and standardize the data; finally, apply the ROSE method to balance the data, thereby solving the imbalance problem of the two-category medical data.
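The text names the ROSE method but does not spell out its sampling details. ROSE is generally described as a smoothed bootstrap: pick a class with equal probability, pick one observation of that class, and draw a new point from a kernel centered on it. The following is a minimal Python sketch under stated assumptions, not the patent's implementation and not the API of the ROSE R package: the kernel is Gaussian with a diagonal bandwidth from a Silverman-type rule of thumb, and the names `rose_sample` and `shrink` are hypothetical.

```python
import random
import statistics


def rose_sample(X, y, n_new, seed=0, shrink=1.0):
    """Draw n_new synthetic rows with a ROSE-style smoothed bootstrap.

    X is a list of numeric feature rows, y the matching class labels.
    Assumption: Gaussian kernel, per-class diagonal bandwidth.
    """
    rng = random.Random(seed)
    d = len(X[0])

    # Group the rows by class label.
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    labels = sorted(by_class)

    # Per-class, per-feature bandwidth (Silverman-type rule of thumb).
    bandwidth = {}
    for label, rows in by_class.items():
        n = len(rows)
        factor = shrink * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        bandwidth[label] = [
            factor * (statistics.stdev(col) if n > 1 else 0.0)
            for col in zip(*rows)
        ]

    X_new, y_new = [], []
    for _ in range(n_new):
        label = rng.choice(labels)            # classes drawn with equal probability
        center = rng.choice(by_class[label])  # one observed row of that class
        X_new.append(
            [rng.gauss(c, h) for c, h in zip(center, bandwidth[label])]
        )
        y_new.append(label)
    return X_new, y_new


# Illustrative data only: 95 majority rows (label 0), 5 minority rows (label 1).
X_demo = [[float(i), float(i % 7)] for i in range(95)]
X_demo += [[100.0 + i, 50.0 + i] for i in range(5)]
y_demo = [0] * 95 + [1] * 5
X_bal, y_bal = rose_sample(X_demo, y_demo, n_new=1000, seed=1)
```

Because each class is drawn with equal probability, the synthetic set is roughly balanced regardless of the original class ratio, and the kernel noise means minority rows are not simply duplicated.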

[0031] The specific operation steps are as follows:

[0032] (1) Data cleaning: Preprocess the raw medical data sets from multiple data sources that need to be classified; delete duplicate data and data irrelevant to the classification theme from the original data set; smooth the noisy data; and then process missing values. If more than 30% of an attribute's values are missing, the att...

Embodiment 2

[0040] Embodiment 2: As shown in Figures 1~6, this embodiment uses the 10-year diabetic patient readmission data set from 130 hospitals in the United States, taken from the UCI machine learning data, to handle the imbalance of the original medical data. The specific steps are as follows:

[0041] (1) Data cleaning: Preprocess the 10-year raw medical data sets of the 130 hospitals; delete duplicate data and data irrelevant to the readmission of diabetic patients from the original data set; smooth the noisy data; and then process missing values. If more than 30% of an attribute's values are missing, the attribute is deleted directly. If less than 30% are missing, the missing values are filled in by Lagrange interpolation. Outliers are handled in the same way as missing values. This embodiment shows the matrix diagram of real values and missing values by row, as shown in Figure 2...
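The missing-value rule in this step can be sketched in Python as follows. This is an illustration under assumptions rather than the patent's implementation: the row index is used as the interpolation variable, each gap is filled from up to two known neighbours on each side, an attribute with exactly 30% missing is kept, and the names `lagrange_interpolate`, `fill_column`, `clean_table`, and `window` are hypothetical.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_column(values, window=2):
    """Fill None entries from up to `window` known neighbours on each side."""
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for idx, v in enumerate(values):
        if v is None:
            before = [p for p in known if p[0] < idx][-window:]
            after = [p for p in known if p[0] > idx][:window]
            points = before + after
            if points:
                xs, ys = zip(*points)
                filled[idx] = lagrange_interpolate(xs, ys, idx)
    return filled


def clean_table(columns, max_missing=0.30):
    """columns maps attribute name -> list of values, with None for missing."""
    cleaned = {}
    for name, values in columns.items():
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share > max_missing:
            continue  # more than 30% missing: delete the attribute
        cleaned[name] = fill_column(values)
    return cleaned


# Illustrative table only; the attribute names are made up.
table = {
    "mostly_present": [1.0, 2.0, None, 4.0, 5.0],   # 20% missing: interpolated
    "mostly_missing": [None, None, 3.0, 4.0, 5.0],  # 40% missing: deleted
}
result = clean_table(table)
```

Keeping the interpolation window small is a deliberate choice in this sketch: high-degree Lagrange polynomials fitted through many points can oscillate strongly between them.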


Abstract

The present invention relates to a method for processing two-category unbalanced medical data, belonging to the technical field of data classification. The method comprises the steps of: preprocessing the data by preliminarily deleting irrelevant and duplicate data in the original data set, smoothing noisy data, and processing outliers and missing values; integrating data from different data sources, resolving the entity identification and attribute redundancy problems, and normalizing the data; and employing the ROSE method to balance the data. On the basis of data preprocessing, the ROSE and Boruta algorithms are employed to improve the classification precision on unbalanced medical data and to solve the problem that the correct classification rate for minority-class samples in unbalanced medical data is low.

Description

Technical field

[0001] The present invention relates to a method for processing binary-classification unbalanced medical data, and in particular to a method that combines ROSE data balancing with the Boruta algorithm for feature selection and then performs data classification. It belongs to the technical field of data classification.

Background technique

[0002] At present, most classification algorithms assume that the proportions of the different classes are balanced, but most real-world data sets are unbalanced, for example in advertisement click prediction, product recommendation, or credit card fraud detection. Some data sets exhibit extreme class imbalance. For example, if 1% of people are bad and 99% are good, a classification model that distinguishes good people from bad people will naturally classify everyone as good. At the same time, the accuracy of the model obtained by ...
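The 1% versus 99% example can be made concrete with a short Python sketch. It scores a model that labels everyone as good and reports both overall accuracy and the recall on the minority class; the sample size of 1000 and the helper name `evaluate` are assumptions chosen for illustration.

```python
def evaluate(y_true, y_pred, positive):
    """Return (accuracy, recall on the `positive` class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    positives = sum(t == positive for t in y_true)
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    recall = hits / positives if positives else 0.0
    return accuracy, recall


# 1% "bad" (minority) and 99% "good" (majority), as in the example above.
y_true = ["bad"] * 10 + ["good"] * 990
# A degenerate model that predicts the majority class for everyone.
y_pred = ["good"] * 1000

accuracy, minority_recall = evaluate(y_true, y_pred, positive="bad")
print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

The model scores 99% accuracy while identifying none of the minority cases, which is why accuracy alone is a misleading measure on unbalanced data.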


Application Information

IPC(8): G06K9/62, G16H10/00
CPC: G16H10/00, G06F18/2415, G06F18/214
Inventor: 马磊, 杜国栋
Owner KUNMING UNIV OF SCI & TECH