Cardiovascular disease prediction system and method based on machine learning and deep learning

By constructing a cardiovascular disease prediction model using machine learning and deep learning algorithms, the problems of difficult feature extraction and high false negative rate in existing technologies have been solved, achieving high-performance disease risk prediction and early intervention, and improving the model's medical interpretability.

CN122201744A · Pending · Publication Date: 2026-06-12 · ANHUI NORMAL UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ANHUI NORMAL UNIV
Filing Date
2026-01-22
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing methods for predicting cardiovascular diseases struggle to extract features highly correlated with disease risk from complex physical examination data, and existing models have a high false negative rate, making it difficult to achieve high-performance early diagnosis and intervention.

Method used

By employing machine learning and deep learning algorithms, and through data preprocessing, feature engineering, model building and optimization modules, combined with recursive feature elimination, convolutional neural networks, Bayesian optimization and ensemble voting strategies, a cardiovascular disease prediction model is constructed to reduce the false negative rate and improve prediction accuracy.

Benefits of technology

It significantly reduces the false negative rate, improves the accuracy of cardiovascular disease risk prediction, assists medical professionals in early intervention and treatment, and improves patients' quality of life.



Abstract

The application provides a cardiovascular disease prediction system and method based on machine learning and deep learning, which comprises a data preprocessing module for preprocessing original cardiovascular disease related data sets; a feature engineering module for screening important features from the preprocessed data through a recursive feature elimination (RFE) algorithm and extracting deep learning features through a convolutional neural network (CNN) to generate combined features; a model construction and optimization module for training various machine learning algorithms and deep learning models based on the combined features, optimizing hyperparameters through Bayesian optimization, and fusing prediction results through an ensemble voting strategy to obtain an optimal prediction model; and a prediction analysis module for calling the optimal prediction model to predict disease risk and outputting prediction results and key feature contribution analysis based on an explainability analysis method. The application can accurately predict the risk of individuals suffering from cardiovascular diseases, assist early intervention and treatment, and improve the quality of life of patients.

Description

Technical Field

[0001] This invention relates to the fields of medical data analysis and artificial intelligence technology, specifically to a cardiovascular disease prediction system and method based on machine learning and deep learning.

Background Technology

[0002] In the current field of cardiovascular disease prediction, a significant challenge lies in extracting features highly correlated with cardiovascular disease risk from complex physical examination data and constructing high-performance predictive models using machine learning and deep learning techniques to achieve early diagnosis and intervention. Furthermore, combining the predictive capabilities of multiple models through ensemble learning methods and further reducing the false negative rate through hyperparameter optimization remains a key research area.

[0003] Therefore, there is an urgent need to provide a method for building cardiovascular disease prediction models using machine learning and deep learning algorithms to solve the above problems.

Summary of the Invention

[0004] The purpose of this invention is to provide a cardiovascular disease prediction system and method based on machine learning and deep learning, which can accurately predict an individual's risk of developing cardiovascular disease, assist medical professionals in early intervention and treatment, and improve patients' quality of life.

[0005] To achieve the above objectives, embodiments of the present invention provide a cardiovascular disease prediction system based on machine learning and deep learning, the system comprising:

A data preprocessing module, used to perform null value processing, categorical feature encoding, numerical feature standardization, and class imbalance processing on the original cardiovascular disease-related dataset.

A feature engineering module, connected to the data preprocessing module, used to select important features from the preprocessed data through the recursive feature elimination (RFE) algorithm, extract deep learning features through a convolutional neural network (CNN), and fuse the important features with the deep learning features to generate combined features.

A model building and optimization module, connected to the feature engineering module, used to train multiple machine learning algorithms and deep learning models based on the combined features, perform hyperparameter tuning using Bayesian optimization, and fuse the prediction results of at least the XGBoost model and the random forest model through an ensemble voting strategy to obtain the optimal prediction model.

A predictive analysis module, connected to the model building and optimization module, used to receive the data to be predicted, call the optimal prediction model to predict disease risk, and output the prediction results and a key feature contribution analysis based on interpretability analysis methods.

[0006] Preferably, in the data preprocessing module, null values are handled by mean imputation, categorical features are encoded with one-hot encoding, numerical features are standardized with Z-score standardization, and class imbalance is handled with the synthetic minority oversampling technique (SMOTE).

[0007] Preferably, the key features selected by the feature engineering module include: resting blood pressure, serum cholesterol, maximum heart rate, ST segment depression value, number of major blood vessels, type of chest pain, electrocardiogram results, and ST segment slope.

[0008] Preferably, the various machine learning algorithms in the model building and optimization module include XGBoost and Random Forest, and the deep learning model includes Convolutional Neural Network (CNN).

[0009] Preferably, the ensemble voting strategy in the model building and optimization module is soft voting, and the final predicted probability is the weighted average of the probability values output by the XGBoost model and the random forest model.

[0010] In another aspect, the present invention provides a cardiovascular disease prediction method based on machine learning and deep learning, applied to the system described above, the method comprising:

S1: Obtain the original cardiovascular disease-related dataset and perform data preprocessing;

S2: Perform feature selection and deep feature extraction on the preprocessed data, and generate combined features;

S3: Based on the combined features, construct and train multiple base models, use Bayesian optimization to perform preliminary hyperparameter tuning on each base model, and fuse the prediction results of at least two base models through ensemble voting; after cross-validation and multi-metric evaluation, determine the optimal prediction model;

S4: Fine-tune the parameters of the optimal prediction model and use interpretability analysis tools to analyze the model's prediction decision-making process, outputting prediction results and explanations of feature importance.

[0011] Preferably, S2 includes:

S21: Use the recursive feature elimination (RFE) algorithm based on random forest to select a preset number of important features;

S22: Input the preprocessed data into a one-dimensional convolutional neural network (CNN) and extract deep learning features from the fully connected layers of the network;

S23: Combine the important features with the deep learning features to form high-dimensional combined features.

[0012] Preferably, the multi-metric evaluation in S3 includes accuracy, precision, recall, F1 score, ROC-AUC value, and confusion matrix.

[0013] Preferably, the parameters tuned by Bayesian optimization in S3 include the learning rate, maximum depth, number of iterations of the XGBoost model, and the maximum depth and number of trees of the random forest.

[0014] Preferably, the interpretability analysis methods in S4 include SHAP and LIME.

[0015] Through the above technical solution, this invention first obtains a raw dataset by integrating cardiovascular disease-related physical examination data. Then, it performs null value processing, one-hot encoding of categorical features, numerical feature standardization, and class imbalance handling on the dataset. Next, it uses recursive feature elimination to filter out important features, extracts deep learning features using a convolutional neural network (CNN), and combines the deep learning features with the original features to form a combined feature set. Then, based on the selected features and the combined features, it uses various machine learning algorithms and deep learning models for modeling. The model's performance is evaluated through cross-validation and performance evaluation metrics to identify the best-performing model. This includes initial hyperparameter tuning using Bayesian optimization and prediction using an ensemble voting model combined with XGBoost and random forest. Finally, the optimal model undergoes further parameter fine-tuning, and its medical interpretability is improved through interpretability analysis. Thus, this method can more accurately predict an individual's risk of cardiovascular disease, helping medical professionals to intervene and treat early, improving patients' quality of life. Furthermore, this method significantly reduces the false negative rate through ensemble voting models and hyperparameter optimization, providing a powerful tool for medical research. It enables cardiovascular disease risk analysis and prediction on large-scale datasets, promoting progress in disease prevention and management.

[0016] Other features and advantages of the embodiments of the present invention will be described in detail in the following detailed description section.

Attached Figure Description

[0017] The accompanying drawings are provided to further illustrate embodiments of the present invention and form part of the specification. They are used together with the following detailed description to explain the embodiments of the present invention, but do not constitute a limitation thereof. In the drawings:

Figure 1 is a simplified flowchart of the cardiovascular disease prediction method based on machine learning and deep learning provided by the present invention;

Figure 2 is a detailed flowchart of the cardiovascular disease prediction method based on machine learning and deep learning provided by the present invention;

Figure 3 is a flowchart of recursive feature elimination according to an embodiment of the present invention.

Detailed Implementation

[0018] The specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for illustration and explanation only and are not intended to limit the scope of the present invention.

[0019] It should be noted that the acquisition, transmission, storage, use, and processing of data in the technical solution of this application all comply with relevant laws and regulations. In the embodiments of this application, certain existing industry solutions such as software, components, and models may be mentioned. These should be considered exemplary, intended only to illustrate the feasibility of implementing the technical solution of this application, and do not imply that the applicant has already used or necessarily used such solutions.

[0020] This invention provides a cardiovascular disease prediction system based on machine learning and deep learning, the system comprising:

A data preprocessing module, used to perform null value processing, categorical feature encoding, numerical feature standardization, and class imbalance processing on the original cardiovascular disease-related dataset.

A feature engineering module, connected to the data preprocessing module, used to select important features from the preprocessed data through the recursive feature elimination (RFE) algorithm, extract deep learning features through a convolutional neural network (CNN), and fuse the important features with the deep learning features to generate combined features.

A model building and optimization module, connected to the feature engineering module, used to train multiple machine learning algorithms and deep learning models based on the combined features, perform hyperparameter tuning using Bayesian optimization, and fuse the prediction results of at least the XGBoost model and the random forest model through an ensemble voting strategy to obtain the optimal prediction model.

A predictive analysis module, connected to the model building and optimization module, used to receive the data to be predicted, call the optimal prediction model to predict disease risk, and output the prediction results and a key feature contribution analysis based on interpretability analysis methods.

[0021] Specifically, in the data preprocessing module, null values are handled by mean imputation, categorical features are encoded with one-hot encoding, numerical features are standardized with Z-score standardization, and class imbalance is handled with the synthetic minority oversampling technique (SMOTE).

[0022] Meanwhile, in this embodiment, the important features selected by the feature engineering module include: resting blood pressure, serum cholesterol, maximum heart rate, ST segment depression value, number of major blood vessels, type of chest pain, electrocardiogram results, and ST segment slope.

[0023] The model building and optimization module includes various machine learning algorithms such as XGBoost and Random Forest, and deep learning models such as Convolutional Neural Networks (CNN).

[0024] Furthermore, in this embodiment, the preferred ensemble voting strategy in the model building and optimization module is soft voting, and the final predicted probability is the weighted average of the probability values output by the XGBoost model and the random forest model.

[0025] Referring to Figure 1 and Figure 2, in another aspect, the present invention provides a cardiovascular disease prediction method based on machine learning and deep learning, applied to the aforementioned system, the method comprising:

S1: Obtain the original cardiovascular disease-related dataset and perform data preprocessing;

S2: Perform feature selection and deep feature extraction on the preprocessed data, and generate combined features;

S3: Based on the combined features, construct and train multiple base models, use Bayesian optimization to perform preliminary hyperparameter tuning on each base model, and fuse the prediction results of at least two base models through ensemble voting; after cross-validation and multi-metric evaluation, determine the optimal prediction model;

S4: Fine-tune the parameters of the optimal prediction model and use interpretability analysis tools to analyze the model's prediction decision-making process, outputting prediction results and explanations of feature importance.

[0026] Specifically, S2 above includes:

S21: Use the recursive feature elimination (RFE) algorithm based on random forest to select a preset number of important features;

S22: Input the preprocessed data into a one-dimensional convolutional neural network (CNN) and extract deep learning features from the fully connected layers of the network;

S23: Combine the important features with the deep learning features to form high-dimensional combined features.
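The elimination loop of S21 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the importance function here is the absolute Pearson correlation with the label, swapped in for the random forest importance named in the text so that the example needs only the standard library. The function names, the `oldpeak` and `noise` column names, and the toy data are hypothetical.

```python
from statistics import mean

def abs_correlation(rows, labels, columns):
    """Stand-in importance: |Pearson r| between each column and the label."""
    y_bar = mean(labels)
    var_y = sum((y - y_bar) ** 2 for y in labels)
    scores = {}
    for c in columns:
        xs = [row[c] for row in rows]
        x_bar = mean(xs)
        cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, labels))
        var_x = sum((x - x_bar) ** 2 for x in xs)
        scores[c] = abs(cov) / ((var_x * var_y) ** 0.5) if var_x and var_y else 0.0
    return scores

def rfe(rows, labels, n_features_to_select, importance=abs_correlation):
    """Repeatedly score the remaining features and drop the weakest one."""
    remaining = list(rows[0].keys())
    while len(remaining) > n_features_to_select:
        scores = importance(rows, labels, remaining)  # re-score what is left
        remaining.remove(min(remaining, key=scores.get))
    return remaining

# Hypothetical toy data: `oldpeak` and `restingBP` track the label, `noise` does not.
rows = [
    {"oldpeak": 0.1, "restingBP": 120, "noise": 5},
    {"oldpeak": 2.3, "restingBP": 150, "noise": 1},
    {"oldpeak": 0.2, "restingBP": 125, "noise": 1},
    {"oldpeak": 2.9, "restingBP": 160, "noise": 5},
]
labels = [0, 1, 0, 1]
selected = rfe(rows, labels, n_features_to_select=2)
```

With a model-based importance such as a random forest, the scores would change after each removal because the model is refit on the remaining columns; the correlation stand-in does not have that property, so it only shows the control flow.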

[0027] In this embodiment, the multi-index evaluation in S3 preferably includes accuracy, precision, recall, F1 score, ROC-AUC value, and confusion matrix.

[0028] Furthermore, the parameters for Bayesian optimization tuning in S3 above include the learning rate, maximum depth, number of iterations of the XGBoost model, and the maximum depth and number of trees of the random forest.
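A search space of this kind can be written down as plain data. The sketch below uses the XGBoost ranges stated in the specific embodiment of this description; the dictionary layout, the `sample` helper, and the choice of five initial points are assumptions made for illustration. It shows only the initial random sampling that typically seeds Bayesian optimization, not the optimization itself.

```python
import random

# Ranges as stated in the embodiment; the dict layout itself is an assumption.
XGB_SPACE = {
    "colsample_bytree": (0.5, 1.0),
    "gamma": (0.0, 0.5),
    "learning_rate": (0.01, 0.1),
    "max_depth": (3, 15),
    "n_estimators": (100, 300),
    "reg_lambda": (0.0, 1.0),
    "subsample": (0.5, 1.0),
}
INTEGER_PARAMS = {"max_depth", "n_estimators"}

def sample(space, rng):
    """Draw one configuration uniformly from the box-shaped search space."""
    return {
        name: rng.randint(lo, hi) if name in INTEGER_PARAMS else rng.uniform(lo, hi)
        for name, (lo, hi) in space.items()
    }

rng = random.Random(42)  # fixed seed so the draw is repeatable
initial_points = [sample(XGB_SPACE, rng) for _ in range(5)]
```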

[0029] In addition, in this embodiment, the interpretability analysis methods in S4 above include SHAP and LIME.

[0030] According to the above technical solution, this invention first obtains a raw dataset by integrating cardiovascular disease-related physical examination data. Then, it performs null value processing, one-hot encoding of categorical features, standardization of numerical features, and class imbalance handling on the dataset. Next, it uses recursive feature elimination to filter out important features, extracts deep learning features through a convolutional neural network (CNN), and combines the deep learning features with the original features to form a combined feature. Then, based on the selected features and the combined features, it uses various machine learning algorithms and deep learning models for modeling. The model's performance is evaluated through cross-validation and performance evaluation metrics to identify the best-performing model. This includes initial hyperparameter tuning using Bayesian optimization and prediction using an ensemble voting model combined with XGBoost and random forest. Finally, the optimal model undergoes further parameter fine-tuning, and its medical interpretability is improved through interpretability analysis. Thus, this method can more accurately predict an individual's risk of cardiovascular disease, helping medical professionals to intervene and treat early, improving patients' quality of life. Furthermore, this method significantly reduces the false negative rate through ensemble voting models and hyperparameter optimization, providing a powerful tool for medical research. It enables cardiovascular disease risk analysis and prediction on large-scale datasets, promoting progress in disease prevention and management.

[0031] The following provides a specific embodiment to illustrate the present invention: Step 1: Integrate cardiovascular disease-related physical examination data to obtain a raw dataset containing 1000 samples, including features such as resting blood pressure, serum cholesterol, maximum heart rate, ST segment depression value, number of major blood vessels, chest pain type, electrocardiogram results, and ST segment slope.

[0032] Step 2: Perform null value processing, one-hot encoding of categorical features, and standardization of numerical features on the original dataset, while addressing class imbalance. Missing values are filled using mean imputation. Categorical features (such as chest pain type, ECG results, and ST segment slope) are converted to numerical form (0 or 1) using one-hot encoding. For example, chest pain types are divided into 4 categories, generating features such as chestpain_1 and chestpain_2. Numerical features (such as resting blood pressure and maximum heart rate) are standardized using the following formula:

$$Z = \frac{X - \mu}{\sigma}$$

[0033] Where Z is the standardized value, X is the value of the original data point, $\mu$ is the mean of the dataset, and $\sigma$ is the standard deviation of the dataset. The SMOTE method is then used to balance the class distribution.
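The three per-column operations of Step 2 can be sketched in a few lines. This is a minimal standard-library illustration, assuming missing values are represented as `None` and that the population standard deviation is used; the helper names and the toy values are hypothetical and are not data from this application.

```python
from statistics import mean, pstdev

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def zscore(column):
    """Z = (X - mu) / sigma, using the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]

def one_hot(column, prefix):
    """Expand a categorical column into 0/1 indicator columns."""
    categories = sorted(set(column))
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in column] for c in categories}

# Hypothetical toy records (illustrative only).
resting_bp = [120, None, 140, 160]
chest_pain = [1, 2, 1, 4]

bp_filled = impute_mean(resting_bp)   # the missing entry becomes the column mean
bp_scaled = zscore(bp_filled)         # zero mean, unit variance
pain_encoded = one_hot(chest_pain, "chestpain")
```

A constant column has a standard deviation of zero and would need to be dropped or special-cased before `zscore` is applied.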

[0034] Step 3: Use recursive feature elimination to filter out important features (see Figure 3). Simultaneously, deep learning features are extracted using a convolutional neural network (CNN), and these deep learning features are combined with the original features to form combined features. The estimator used for recursive feature elimination is a random forest, with the parameter n_features_to_select set to 10. The final selected features include resting blood pressure, serum cholesterol, maximum heart rate, ST segment depression value, number of major blood vessels, chest pain type, ECG results, and ST segment slope. The CNN model contains convolutional layers (filters=64, kernel_size=3), max pooling layers (pool_size=2), dropout layers (0.5), and fully connected layers (128 neurons). Deep learning features (128 dimensions) are extracted from the fully connected layers. The deep learning features are combined with the original features by column concatenation to form combined features (the original feature dimension plus 128).
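The dimension bookkeeping of Step 3 can be checked with simple arithmetic. The sketch below assumes one convolutional layer, one pooling layer, and one fully connected layer, with no padding, stride 1 for the convolution, and a pooling stride equal to the pool size; none of these details are stated in the text. The input width of 13 columns is likewise hypothetical.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a 1-D convolution without padding."""
    return (length - kernel_size) // stride + 1

def pool1d_out_len(length, pool_size):
    """Output length of non-overlapping max pooling."""
    return length // pool_size

n_input = 13  # hypothetical number of preprocessed input columns
filters, kernel_size, pool_size, dense_units = 64, 3, 2, 128

after_conv = conv1d_out_len(n_input, kernel_size)  # positions per filter
after_pool = pool1d_out_len(after_conv, pool_size)
flattened = after_pool * filters                   # input width of the dense layer
combined_dim = n_input + dense_units               # original columns plus 128

def concat_columns(original_row, deep_row):
    """Column-wise concatenation of one sample's two feature vectors."""
    return list(original_row) + list(deep_row)

row = concat_columns([0.0] * n_input, [0.0] * dense_units)
```

The dropout layer does not change any shape, so it does not appear in the arithmetic.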

[0035] Step 4: Based on the features and combined features selected in Step 3, various machine learning algorithms and deep learning models are used for modeling. 10-fold cross-validation and performance evaluation metrics are used to evaluate the model's performance, identifying the best-performing model. In the experiment, XGBoost and Random Forest models are first trained separately, and Bayesian optimization is used for initial hyperparameter tuning. For XGBoost, the optimized parameters include colsample_bytree (range: 0.5-1.0), gamma (range: 0-0.5), learning_rate (range: 0.01-0.1), max_depth (range: 3-15), n_estimators (range: 100-300), reg_lambda (range: 0-1.0), and subsample (range: 0.5-1.0). After 45 iterations, the optimal parameters are: colsample_bytree=0.6677, gamma=0.2689, learning_rate=0.0685, max_depth=13, n_estimators=243, reg_lambda=0.0881, and subsample=0.8477. For the random forest, the optimized parameters include max_depth (range: 10-20), min_samples_leaf (range: 1-5), min_samples_split (range: 2-10), n_estimators (range: 100-200), and class_weight (option: 'balanced' or None). After 30 iterations, the optimal parameters are: class_weight='balanced', max_depth=17, min_samples_leaf=1, min_samples_split=3, n_estimators=132.

Subsequently, an ensemble voting model is used to combine the prediction results of both. The ensemble voting model adopts soft voting, and the final prediction is made by a weighted average of the prediction probabilities of XGBoost and random forest. The specific process of the ensemble voting model is as follows: first, the optimized XGBoost and Random Forest models are applied to the combined features to obtain the predicted probability of each sample; second, the weighted average probability is calculated, where the weights are determined by 10-fold cross-validation to balance the contributions of both.

Experimental results show that the combined feature model (combining original features and CNN features) has the fewest false negatives under the ensemble voting model: an accuracy of 98%, an ROC-AUC of 0.9970, and only 1 false negative, compared with the models that use only original features (accuracy 98%, ROC-AUC 0.9982, 2 false negatives) or only CNN features (accuracy 96%, ROC-AUC 0.9960, 2 false negatives).
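The weighted soft-voting step described above amounts to a few lines of arithmetic. The sketch below is illustrative only: the weights, the 0.5 decision threshold, and the per-sample probabilities are hypothetical, whereas in the embodiment the weights are determined by 10-fold cross-validation.

```python
def soft_vote(p_xgb, p_rf, w_xgb=0.5, w_rf=0.5, threshold=0.5):
    """Weighted average of two positive-class probabilities, then threshold."""
    p = (w_xgb * p_xgb + w_rf * p_rf) / (w_xgb + w_rf)
    return p, int(p >= threshold)

# Hypothetical per-sample probabilities from the two tuned base models.
xgb_probs = [0.92, 0.40, 0.10]
rf_probs = [0.80, 0.75, 0.20]

results = [soft_vote(a, b, w_xgb=2.0, w_rf=1.0) for a, b in zip(xgb_probs, rf_probs)]
predicted_labels = [label for _, label in results]
```

Lowering `threshold` trades more false positives for fewer false negatives, which is one way such a model can be biased toward avoiding missed diagnoses.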

[0036] Step 5: Further fine-tune the parameters of the optimal model (i.e., the ensemble voting model) and improve its medical interpretability through interpretability analysis. The fine-tuning mainly targets the weight ratio of XGBoost and Random Forest in the ensemble voting model, adjusting the weighting parameters of soft voting to further reduce the false negative rate. Parameter tuning uses Bayesian optimization, optimizing XGBoost and Random Forest separately. The specific process of Bayesian optimization is as follows: first, define the parameter range and prior distribution of each model, and construct the posterior distribution using Gaussian process regression; second, evaluate the objective function (the F1 score under 10-fold cross-validation) based on prior knowledge and initial random sampling, and select the next set of hyperparameters for evaluation by maximizing the expected improvement (EI); finally, iteratively update the posterior distribution until convergence or the maximum number of iterations is reached. Interpretability analysis uses the SHAP and LIME methods. The results show that slope_2, slope_3, and restingBP are the key features affecting cardiovascular disease prediction; slope_2 has the largest SHAP value (0.1649), indicating the greatest contribution to the prediction.
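The expected improvement criterion mentioned above has a closed form when the posterior at a candidate point is Gaussian. The sketch below uses the standard formula for a maximization objective; the exploration parameter `xi`, the candidate names, and the posterior means and standard deviations are hypothetical values chosen for illustration, not results from this application.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for maximization, given a Gaussian posterior N(mu, sigma^2) at a candidate."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best - xi) * cdf + sigma * pdf

# Hypothetical posterior (mean, std) of the cross-validated F1 score at three
# candidate hyperparameter settings; `best` is the best F1 observed so far.
candidates = {"A": (0.95, 0.01), "B": (0.94, 0.05), "C": (0.90, 0.02)}
best = 0.95
scores = {k: expected_improvement(m, s, best) for k, (m, s) in candidates.items()}
next_point = max(scores, key=scores.get)
```

In this toy case the candidate with a slightly lower mean but a much larger uncertainty is chosen, which illustrates how EI balances exploitation against exploration.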

[0037] The above methods are used to integrate cardiovascular disease-related physical examination information. After data preprocessing, deep learning features are combined with original features, and multiple machine learning and deep learning algorithms are used for modeling. The model's performance is evaluated using a combination of Bayesian optimization and ensemble voting models with XGBoost and random forest predictions, along with multiple performance metrics. The optimal model is then further fine-tuned, and its medical interpretability is improved through interpretability analysis.

[0038] In step 2, missing values are filled using the mean imputation method, the categorical features are numerically processed using the one-hot encoding method, the numerical features are standardized, and the SMOTE method is used to balance the class distribution.
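The core idea of SMOTE is to create each synthetic minority sample on the line segment between a minority sample and one of its nearest minority neighbours. The following is a minimal standard-library sketch of that idea, not a full implementation; the function name, the neighbour count `k=2`, the seed, and the toy points are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical 2-D minority-class points (illustrative only).
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]
new_points = smote(minority, n_new=4)
```

Oversampling is normally applied to the training folds only, so that synthetic samples do not leak into the data used for evaluation.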

[0039] In step 3, recursive feature elimination is used to filter out important features, deep learning features are extracted through convolutional neural networks (CNN), and the deep learning features are combined with the original features through column concatenation to form combined features.

[0040] In step 4, XGBoost uses gbtree as the boosting type; Random Forest uses Gini impurity as the splitting criterion; and the CNN model uses the Adam optimizer and the binary cross-entropy loss function. The ensemble voting model employs soft voting, making the final prediction by weighted averaging of the prediction probabilities of XGBoost and Random Forest. When evaluating the model's prediction results, a confusion matrix is used for quantification. Performance metrics such as accuracy, precision, recall, F1-Score, and ROC-AUC are calculated using the confusion matrix to comprehensively evaluate the model's performance. The confusion matrix of the combined feature model under the ensemble voting model is as follows:

[0041] There is only 1 false negative, which suits the strict requirement of medical scenarios to prevent missed diagnoses.
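The threshold-based metrics named above follow directly from the four cells of a binary confusion matrix, as the sketch below shows (ROC-AUC additionally requires the predicted probabilities and is not covered here). The counts used are hypothetical and are not the confusion matrix of this application; the sketch also makes the distinction between a false negative count and a false negative rate explicit.

```python
def metrics_from_confusion(tp, fp, tn, fn):
    """Derive the standard binary-classification metrics from four counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_negative_rate": fn / (fn + tp),
    }

# Hypothetical counts for illustration; NOT the patent's confusion matrix.
m = metrics_from_confusion(tp=48, fp=3, tn=47, fn=2)
```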

[0042] In step 5, the optimal model is further fine-tuned in terms of parameters, and interpretability analysis is performed using SHAP and LIME methods. SHAP analysis shows that slope_2 (SHAP value 0.1649), slope_3 (SHAP value 0.1117), and restingBP (SHAP value 0.0911) are key features, and LIME analysis further verifies the contribution of these features to individual predictions.
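SHAP attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values by brute force for a tiny hypothetical linear risk score, replacing "absent" features with baseline values. It illustrates the underlying definition only: the model weights, inputs, and baseline are invented for the example, and the cost grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values; absent features are replaced by baseline values."""
    names = list(x)
    n = len(names)

    def value(subset):
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = value(set(subset) | {f})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi

# Hypothetical linear risk score; weights are illustrative, not from the patent.
def toy_model(row):
    return 0.3 * row["slope_2"] + 0.2 * row["slope_3"] + 0.01 * row["restingBP"]

x = {"slope_2": 1, "slope_3": 0, "restingBP": 150}
baseline = {"slope_2": 0, "slope_3": 0, "restingBP": 130}
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the property that makes per-feature values such as those reported above comparable with one another.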

[0043] In summary, this invention utilizes machine learning and deep learning algorithms to build a cardiovascular disease prediction model. This method integrates multiple techniques to screen for relevant factors associated with cardiovascular disease. One-hot encoding, feature combination, Bayesian optimization, and ensemble voting models enhance the model's predictive ability. This approach can be used to explore new risk factors, treatment methods, and prevention strategies for cardiovascular disease, thereby advancing medical research. Furthermore, a comprehensive evaluation of multiple experimental indicators of the cardiovascular disease prediction model allows for a thorough comparison of the predictive capabilities of various models. In addition, further parameter fine-tuning and interpretability analysis enhance the medical interpretability of the method, further improving the model's predictive ability on this dataset.

[0044] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0045] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0046] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0047] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0048] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.

[0049] Memory may include non-persistent memory in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, like read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0050] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0051] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0052] The above are merely embodiments of this application and are not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.

Claims

1. A cardiovascular disease prediction system based on machine learning and deep learning, characterized in that the system comprises: a data preprocessing module, configured to perform null value processing, categorical feature encoding, numerical feature standardization, and class imbalance processing on an original cardiovascular disease-related dataset; a feature engineering module, connected to the data preprocessing module and configured to select important features from the preprocessed data using a recursive feature elimination (RFE) algorithm, extract deep learning features using a convolutional neural network (CNN), and fuse the important features with the deep learning features to generate combined features; a model building and optimization module, connected to the feature engineering module and configured to train multiple machine learning algorithms and deep learning models on the combined features, perform hyperparameter tuning using Bayesian optimization, and fuse the prediction results of at least an XGBoost model and a random forest model through an ensemble voting strategy to obtain an optimal prediction model; and a predictive analysis module, connected to the model building and optimization module and configured to receive data to be predicted, call the optimal prediction model to predict disease risk, and output the prediction results together with a key feature contribution analysis based on an interpretability analysis method.

2. The cardiovascular disease prediction system based on machine learning and deep learning according to claim 1, characterized in that, in the data preprocessing module, the null value processing adopts mean imputation, the categorical feature encoding adopts one-hot encoding, the numerical feature standardization adopts Z-score standardization, and the class imbalance processing adopts the synthetic minority oversampling technique (SMOTE).
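
As an illustrative, non-limiting sketch of the four preprocessing operations named in claim 2, the following Python fragment uses only the standard library. The function names (`mean_impute`, `z_score`, `one_hot`, `smote_like`), the neighbour count `k`, and the random seed are assumptions introduced here for illustration and do not appear in the application. The `smote_like` function shows only the core interpolation idea of SMOTE, not a complete implementation of the published technique.

```python
import random
from statistics import mean, pstdev


def mean_impute(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def z_score(column):
    """Standardize a numeric column to zero mean and unit population std."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no signal
    return [(v - mu) / sigma for v in column]


def one_hot(column):
    """Encode a categorical column as 0/1 indicator rows.

    Returns the encoded rows and the sorted category order used.
    """
    categories = sorted(set(column))
    rows = [[1 if v == c else 0 for c in categories] for v in column]
    return rows, categories


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples (simplified SMOTE idea).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic samples do not leak into the data used for evaluation.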

3. The cardiovascular disease prediction system based on machine learning and deep learning according to claim 1, characterized in that the key features selected by the feature engineering module include: resting blood pressure, serum cholesterol, maximum heart rate, ST segment depression value, number of major blood vessels, chest pain type, electrocardiogram results, and ST segment slope.

4. The cardiovascular disease prediction system based on machine learning and deep learning according to claim 1, characterized in that the model building and optimization module includes machine learning algorithms such as XGBoost and random forest, and deep learning models such as a convolutional neural network (CNN).

5. The cardiovascular disease prediction system based on machine learning and deep learning according to claim 4, characterized in that the ensemble voting strategy in the model building and optimization module is soft voting, and the final predicted probability is generated as a weighted average of the probability values output by the XGBoost model and the random forest model.
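
The soft voting recited in claim 5 can be sketched as a weighted average of two probability lists. This is a minimal illustration under stated assumptions: the function name `soft_vote`, the default equal weights, and the decision threshold of 0.5 are introduced here and are not specified in the application.

```python
def soft_vote(prob_xgb, prob_rf, w_xgb=0.5, w_rf=0.5, threshold=0.5):
    """Fuse two models' positive-class probabilities by weighted averaging.

    prob_xgb, prob_rf: per-sample probabilities of disease from each model.
    w_xgb, w_rf: non-negative weights (assumed values, normalized below).
    Returns the fused probabilities and the thresholded 0/1 labels.
    """
    total = w_xgb + w_rf
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [
        (w_xgb * p + w_rf * q) / total
        for p, q in zip(prob_xgb, prob_rf)
    ]
    labels = [1 if p >= threshold else 0 for p in fused]
    return fused, labels


if __name__ == "__main__":
    fused, labels = soft_vote([0.9, 0.2], [0.7, 0.4])
    print(fused, labels)
```

Lowering `threshold` below 0.5 flags more samples as positive, which reduces false negatives at the cost of more false positives.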

6. A cardiovascular disease prediction method based on machine learning and deep learning, applied to the system as described in any one of claims 1-5, characterized in that the method includes: S1: obtaining the original cardiovascular disease-related dataset and performing data preprocessing; S2: performing feature selection and deep feature extraction on the preprocessed data, and generating combined features; S3: based on the combined features, constructing and training multiple base models, using Bayesian optimization to perform preliminary hyperparameter tuning on each base model, fusing the prediction results of at least two base models through ensemble voting, and determining the optimal prediction model after cross-validation and multi-metric evaluation; S4: fine-tuning the parameters of the optimal prediction model, and using interpretability analysis tools to analyze the prediction decision process of the model, outputting the prediction results and explanations of feature importance.

7. The cardiovascular disease prediction method based on machine learning and deep learning according to claim 6, characterized in that S2 includes: S21: using the recursive feature elimination (RFE) algorithm based on random forest to select a preset number of important features; S22: inputting the preprocessed data into a one-dimensional convolutional neural network (CNN) and extracting deep learning features from the fully connected layers of the network; S23: concatenating the important features with the deep learning features to form a high-dimensional combined feature.
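
The elimination loop of S21 and the concatenation of S23 can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the claim ranks features with a random forest, whereas this standard-library example substitutes absolute Pearson correlation with the label as a stand-in importance score. The CNN of S22 is not shown, and its output is represented by placeholder values. The names `abs_correlation`, `rfe`, and `concatenate` are hypothetical.

```python
from statistics import mean


def abs_correlation(xs, ys):
    """Stand-in importance score: |Pearson correlation| of feature and label."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / ((vx * vy) ** 0.5) if vx and vy else 0.0


def rfe(columns, labels, n_keep, importance_fn=abs_correlation):
    """Recursive feature elimination over a dict of named feature columns.

    Each round re-scores the remaining features and drops the weakest one,
    until only n_keep features remain. Returns the surviving feature names.
    """
    remaining = dict(columns)
    while len(remaining) > n_keep:
        scores = {
            name: importance_fn(values, labels)
            for name, values in remaining.items()
        }
        weakest = min(scores, key=scores.get)
        del remaining[weakest]
    return list(remaining)


def concatenate(selected_rows, deep_rows):
    """S23: join selected features and deep features sample by sample."""
    return [list(s) + list(d) for s, d in zip(selected_rows, deep_rows)]
```

Because the stand-in score evaluates each feature independently, the loop here reduces to a top-k ranking. With a model-based importance such as that of a random forest, the scores change each time a feature is removed, which is what makes the elimination recursive.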

8. The cardiovascular disease prediction method based on machine learning and deep learning according to claim 6, characterized in that the multi-metric evaluation in S3 includes accuracy, precision, recall, F1 score, ROC-AUC value, and confusion matrix.
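
The metrics listed in claim 8 all follow from the confusion matrix and the predicted scores. The sketch below computes them with the standard library, assuming binary labels with 1 denoting disease. The function names are introduced here for illustration. The false negative rate is added because it is the quantity the application aims to reduce, and it equals 1 minus recall.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall, F1 and false negative rate."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```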

9. The cardiovascular disease prediction method based on machine learning and deep learning according to claim 6 or 8, characterized in that the parameters tuned by Bayesian optimization in S3 include the learning rate, maximum depth, and number of iterations of the XGBoost model, and the maximum depth and number of trees of the random forest.

10. The cardiovascular disease prediction method based on machine learning and deep learning according to claim 6, characterized in that the interpretability analysis methods in S4 include SHAP and LIME.