Binary classification method for processing non-small cell lung cancer data with missing values and imbalance

A technique for non-small cell lung cancer data with missing values, applicable to database models, relational databases, and structured data retrieval. It addresses missing values and class imbalance in medical data and their impact on classification accuracy, with the aim of improving data quality, classification accuracy, and medical decision-making.

Inactive Publication Date: 2020-02-21
KUNMING UNIV OF SCI & TECH
7 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

Usually, there are missing values and class imbalance problems in ...



Examples


Embodiment 1

[0022] Embodiment 1: As shown in Figures 1-4, a binary classification method for processing non-small cell lung cancer data with missing values and class imbalance includes the following steps. First, the data are preprocessed: samples with more than 70% missing values are deleted, the missing values of samples with a missing-value ratio below 70% are filled with the median, outliers are removed with Tukey's method, and the data are normalized by standardization. Second, the SMOTEENN comprehensive sampling method, which combines oversampling and undersampling, is used to balance the data and resolve the class imbalance in the data set. Finally, the balanced data set is used to train a random forest classifier, and the classification effect is tested on the test set. The result is a binary classification method for survival prediction that effectively handles non-small cell lung cancer data with missing values and class imbalance.
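The preprocessing stage described in this embodiment can be sketched in plain Python as follows. This is an illustrative sketch, not the patent's implementation. Several details are assumptions because the text does not fix them: missing values are represented as None, a sample at exactly 70% missing is kept, medians are computed per feature column, Tukey's fences use the conventional multiplier k = 1.5 with inclusive quartiles, and the normalization is read as min-max scaling to [0, 1]. All function names are hypothetical.

```python
from statistics import median, quantiles


def drop_sparse_rows(rows, max_missing=0.7):
    """Keep samples whose fraction of missing (None) values is at most max_missing."""
    return [r for r in rows if sum(v is None for v in r) / len(r) <= max_missing]


def fill_median(rows):
    """Replace each None with the median of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    cols = list(zip(*rows))
    meds = [median([v for v in c if v is not None]) for c in cols]
    return [[meds[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def tukey_filter(rows, k=1.5):
    """Drop samples with any feature outside [Q1 - k*IQR, Q3 + k*IQR]."""
    fences = []
    for col in zip(*rows):
        q1, _, q3 = quantiles(col, n=4, method="inclusive")
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    return [r for r in rows
            if all(lo <= v <= hi for v, (lo, hi) in zip(r, fences))]


def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns are mapped to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(r, lows, highs)] for r in rows]


def preprocess(rows):
    """Apply the four steps in the order given in the embodiment."""
    return min_max_scale(tukey_filter(fill_median(drop_sparse_rows(rows))))


if __name__ == "__main__":
    # Made-up toy data: two features, one fully missing sample, one outlier.
    toy = [[1.0, 10.0], [2.0, None], [3.0, 30.0],
           [4.0, 40.0], [None, None], [100.0, 20.0]]
    for row in preprocess(toy):
        print(row)
```

In a real pipeline the medians, fences, and scaling ranges would normally be estimated on the training split only and then reused on the test split; the sketch omits that separation for brevity.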

[0023] The specific steps of the binary classification method for process...



Abstract

The invention relates to a binary classification method for processing non-small cell lung cancer data with missing values and class imbalance, and belongs to the technical field of data classification. The method comprises the following steps: first, the data are preprocessed by filling missing values with medians, eliminating outliers with Tukey's method, and normalizing the data with deviation standardization (min-max normalization); second, the data are balanced with the SMOTEENN comprehensive sampling method, which combines oversampling and undersampling; finally, the balanced data set is used to train a random forest classifier, and the classification effect is tested on the test set. The result is a binary classification method for non-small cell lung cancer survival prediction that effectively addresses missing values and class imbalance. Experiments on a non-small cell lung cancer data set demonstrate the effectiveness and superiority of the method: it improves classification accuracy on non-small cell lung cancer data with missing values and class imbalance and supports more accurate medical decisions.
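The balancing step named in the abstract, SMOTEENN, chains SMOTE oversampling with Edited Nearest Neighbours (ENN) cleaning. The following is a simplified pure-Python sketch of that idea, not the patent's code and not the API of any library. The function names, the neighbour counts (5 for SMOTE, 3 for ENN), the majority-vote rule in ENN, and the fixed random seed are all assumptions made for illustration; it also assumes two classes and at least two minority samples.

```python
import math
import random


def _nearest(idx, points, candidates, k):
    """Indices of the k candidates closest to points[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, k=5, seed=0):
    """Add synthetic minority samples until both classes have the same size.

    Each synthetic point lies on the segment between an original minority
    sample and one of its k nearest original minority neighbours.
    """
    rng = random.Random(seed)
    X, y = [list(p) for p in X], list(y)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)
        j = rng.choice(_nearest(i, X, min_idx, k))
        gap = rng.random()
        X.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y.append(minority)
    return X, y


def enn(X, y, k=3):
    """Drop samples whose label disagrees with most of their k nearest neighbours."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        nbrs = _nearest(i, X, all_idx, k)
        agree = sum(y[j] == y[i] for j in nbrs)
        if agree * 2 > len(nbrs):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority=1, k_smote=5, k_enn=3, seed=0):
    """Oversample with SMOTE, then clean both classes with ENN."""
    X_over, y_over = smote(X, y, minority, k_smote, seed)
    return enn(X_over, y_over, k_enn)
```

The oversampling step restores class balance, while the cleaning step removes samples that sit inside the other class's neighbourhood, which is why the combined output is not guaranteed to be exactly balanced. The resampled data would then be passed to the random forest classifier, with the test set left untouched.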

Description

Technical field

[0001] The invention relates to a binary classification method for processing non-small cell lung cancer data with missing values and class imbalance, and in particular to a data balancing method that combines median filling of missing values with SMOTEENN comprehensive sampling. It belongs to the technical field of data classification.

Background technique

[0002] Lung cancer is a malignant tumor that has become one of the deadliest diseases in the world. Non-small cell lung cancer accounts for about 85% of all lung cancer cases. Because of its high morbidity and mortality, non-small cell lung cancer accounts for a large share of medical expenditure and imposes a heavy burden on families and communities. It is therefore particularly important to predict cancer patient survival more accurately and to make better clinical decisions in diagnosis and treatment, including the choice of treatment, the timing of treatment, and subsequent visits, which can have an ...


Application Information

IPC(8): G06F16/28
CPC: G06F16/285
Inventor: 赵阳, 马磊, 张力
Owner: KUNMING UNIV OF SCI & TECH