Heart disease prediction method based on dual feature selection and XGBoost algorithm

A feature selection and prediction technology applied in the field of medical data analysis. It addresses shortcomings of the prior art, namely a single type of base classifier, the inability to handle missing values, and a tendency to overfit, and it improves prediction accuracy.

Active Publication Date: 2020-06-05
HEBEI UNIV OF TECH
Cites: 10 · Cited by: 12

AI Technical Summary

Problems solved by technology

However, the GBDT algorithm uses only the classification and regression tree (CART) as its base classifier. Compared with the XGBoost algorithm, its choice of base classifier is limited, it cannot handle missing values, and it is prone to overfitting.



Examples


Embodiment

[0075] The heart disease prediction method based on dual feature selection and the XGBoost algorithm in this embodiment comprises the following specific steps:

[0076] Step 1: Preprocess the open-source heart disease dataset to obtain a sample dataset D of size N.

[0077] The preprocessing works as follows. The original heart disease dataset may contain missing values, abnormal values, and features with multiple categories. The original data are therefore processed by filling in missing values, deleting abnormal records, applying ordinal mapping or one-hot encoding to multi-category features, and normalizing the data.
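The preprocessing operations listed above can be sketched in plain Python as follows. The patent does not state which filling strategy or which plausibility ranges are used, so median filling, the `low`/`high` bounds, and the example column name `chol` are assumptions for illustration only.

```python
from statistics import median


def fill_missing(values):
    """Replace None with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def drop_out_of_range(rows, col, low, high):
    """Delete records whose value in `col` lies outside [low, high].

    The range is a hypothetical plausibility check for abnormal data.
    """
    return [r for r in rows if low <= r[col] <= high]


def one_hot(values, categories):
    """One-hot encode a nominal (unordered) multi-category column."""
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal(values, order):
    """Map ordered categories to integers 0..k-1 (ordinal mapping)."""
    index = {c: i for i, c in enumerate(order)}
    return [index[v] for v in values]
```

Ordinal mapping suits categories with a natural order (e.g. severity levels), while one-hot encoding avoids imposing a false order on nominal categories.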

[0078] The above standardization transforms each feature column to have mean 0 and variance 1, so that all features share a common scale. (This only shifts and rescales the values; it does not by itself make a feature normally distributed.) The standardization formula is:

[0079]

$$x' = \frac{x - \mu_x}{\sigma_x} \tag{2}$$

[0080] In formula (2) above, $\mu_x$ and $\sigma_x$ are the mean and standard deviation of a feature ...
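Formula (2) applied to one feature column can be sketched as below. The sketch assumes the population standard deviation (as in scikit-learn's `StandardScaler`), since the patent text does not say which estimator is used. Mapping a constant column to zeros is an added assumption to avoid division by zero.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Z-score a feature column: x' = (x - mu_x) / sigma_x."""
    mu = fmean(column)       # mu_x: mean of the feature
    sigma = pstdev(column)   # sigma_x: population standard deviation
    if sigma == 0:
        # Constant feature: no spread to rescale (assumed handling).
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]
```

For example, the column `[2, 4, 4, 4, 5, 5, 7, 9]` has mean 5 and population standard deviation 2, so its first value maps to `(2 - 5) / 2 = -1.5`.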


Abstract

The invention discloses a heart disease prediction method based on dual feature selection and the XGBoost algorithm. The method comprises the following steps. The original data are preprocessed. The processed data are analyzed with a random forest algorithm and with feature correlation analysis. A feature index is calculated from the importance ranking of the features, the correlation among features, and the correlation between the features and the sample labels. Features are then selected according to this index for model training. This overcomes the shortcomings of existing heart disease prediction methods, which require too many features and have poor accuracy.
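The abstract names the three quantities that feed the feature index but not the formula that combines them. The sketch below therefore uses a hypothetical linear index, score = a·importance + b·|corr(feature, label)| − c·mean|corr(feature, other features)|, rewarding important, label-relevant features and penalizing redundant ones. Random forest importances are assumed to be precomputed (for example from scikit-learn's `RandomForestClassifier.feature_importances_`). The weights `a`, `b`, `c` and all function names are illustrative assumptions, not the patented formula.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def feature_index(features, labels, importance, a=1.0, b=1.0, c=1.0):
    """Hypothetical index combining RF importance, relevance, redundancy."""
    names = list(features)
    scores = {}
    for j in names:
        relevance = abs(pearson(features[j], labels))
        others = [abs(pearson(features[j], features[k])) for k in names if k != j]
        redundancy = fmean(others) if others else 0.0
        scores[j] = a * importance[j] + b * relevance - c * redundancy
    return scores


def select_features(features, labels, importance, k, **weights):
    """Return the names of the k features with the highest index."""
    scores = feature_index(features, labels, importance, **weights)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With a strong redundancy penalty (`c=2.0`), a feature that duplicates an already strong feature can rank below a weaker but non-redundant one, which is the motivation for combining the two selection criteria.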

Description

Technical field [0001] The invention belongs to the technical field of medical data analysis, and in particular relates to a heart disease prediction method based on dual feature selection and the XGBoost algorithm. Background [0002] Heart disease is a common and serious cardiovascular disease. Cardiovascular disease is one of the greatest threats to public health in China and worldwide, and it places a heavy burden on China's medical system. The "Global Burden of Disease Report 2013" published in The Lancet evaluated patient deaths in 190 countries between 1990 and 2013 and pointed out that coronary heart disease, chronic lung disease, and stroke are the three most common diseases among Chinese people. The mortality rate was as high as 46% in that year, and this figure is still rising. Based on existing medical data, heart disease prediction models can be trained to provide health guidance fo...


Application Information

Patent type & authority: Applications (China)
IPC(8): G16H50/70; G16H50/30; G06K9/62
CPC: G16H50/70; G16H50/30; G06F18/24323
Inventors: 孙昊 (Sun Hao), 崔子超 (Cui Zichao)
Owner: HEBEI UNIV OF TECH