Stroke prognosis prediction method and system, computer device, and storage medium

By screening key features in stroke research datasets, constructing an XGBoost model and combining it with SHAP analysis, the problems of data acquisition and model generalization in existing stroke prognosis prediction methods are solved, enabling efficient and accurate prognosis prediction and personalized rehabilitation guidance for stroke patients.

WO2026137555A1 · PCT designated stage · Publication Date: 2026-07-02 · SUN YAT SEN UNIV +1

Patent Information

Authority / Receiving Office: WO
Patent Type: Application
Current Assignee / Owner: SUN YAT SEN UNIV
Filing Date: 2025-01-25
Publication Date: 2026-07-02

Figure CN2025075031_02072026_PF_FP_ABST

Abstract

A stroke prognosis prediction method and system, a computer device, and a storage medium. The method comprises: acquiring feature data to be analyzed of a stroke patient, said feature data comprising an age, a gender, a place of birth, an mRS score at discharge, a stroke-related medical history feature, and a change feature of an NIHSS score between admission and discharge; and inputting the data to be analyzed into a pre-constructed stroke prognosis prediction model for prognosis prediction to obtain a corresponding prognosis prediction result, the stroke prognosis prediction model being obtained by training on a data set having the same features as the feature data to be analyzed. By screening key data features that are readily obtainable and can also be used to accurately predict different post-stroke outcomes, and training a stroke prognosis prediction model on these features, the efficiency and accuracy of post-stroke outcome prediction can be effectively improved while the generalization and ease of use of the model are ensured, thereby providing reliable technical support for the health management of stroke patients.

Description

A method, system, computer device, and storage medium for predicting stroke prognosis.

Technical Field

[0001] This invention relates to the field of stroke prognosis prediction technology, and in particular to a stroke prognosis prediction method, system, computer device and storage medium.

Background Art

[0002] Stroke is characterized by high incidence, disability, mortality, and recurrence rates, and by a limited treatment window. It is a leading cause of death and disability worldwide, and its treatment costs impose a heavy economic burden on countries, societies, and families. Timely and reliable stroke prognosis prediction is therefore crucial for the health management of stroke patients.

[0003] Existing stroke prognosis prediction methods mainly build stroke outcome prediction models either with traditional regression analysis or with machine learning. However, existing stroke outcome prediction models have significant limitations that hinder their widespread adoption: 1) model construction relies mainly on patients' imaging data or laboratory test results, which involve patient privacy and are often unavailable in practical applications; 2) different stroke outcomes require models built on different data features, so a single feature set cannot conveniently be used to predict several outcomes; 3) the area under the curve (AUC) of the models' predictions is low, which does not meet the needs of clinical application. There is therefore an urgent need for a highly reliable stroke prognosis prediction method that does not depend entirely on imaging data or laboratory test results and that uses a common feature set to predict different stroke outcomes.

Summary of the Invention

[0004] The purpose of this invention is to provide a stroke prognosis prediction method. By analyzing stroke research datasets, key data features are selected that are independent of imaging data or laboratory test results, have high data availability, and can be applied to accurately predict different post-stroke outcomes. Based on these key data features, an efficient, accurate, and risk-discriminating stroke prognosis prediction model is trained and constructed for use in predicting the prognosis of stroke patients. This method can effectively improve the efficiency and accuracy of stroke outcome prediction while ensuring the generalizability and ease of use of the prediction model, facilitating its widespread application and providing reliable technical support for the health management of stroke patients.

[0005] To achieve the above objectives, the present invention provides a stroke prognosis prediction method, system, computer device, and storage medium that address the technical problems described above.

[0006] In a first aspect, embodiments of the present invention provide a method for predicting stroke prognosis, the method comprising the following steps:

[0007] Acquire the characteristic data to be analyzed of stroke patients; the characteristic data to be analyzed includes age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, and NIHSS score change characteristics from admission to discharge;

[0008] The data to be analyzed is input into a pre-constructed stroke prognosis prediction model for prognosis prediction, and the corresponding prognosis prediction results are obtained; the stroke prognosis prediction model is trained based on a dataset with the same features as the data to be analyzed.

[0009] Furthermore, the stroke-related medical history features include the TOAST classification, whether there is a history of diabetes, and whether there is a history of transient ischemic attack (TIA); the NIHSS score change features between admission and discharge include the change in the total NIHSS score and the corresponding changes in several individual NIHSS item scores between admission and discharge.

[0010] Furthermore, the steps for constructing the stroke prognosis prediction model include:

[0011] Obtain a stroke research dataset; the stroke research dataset includes a first stroke patient dataset and a second stroke patient dataset; both the first stroke patient dataset and the second stroke patient dataset include age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, total NIHSS score upon admission and several corresponding individual NIHSS scores upon admission, and total NIHSS score upon discharge and several corresponding individual NIHSS scores upon discharge.

[0012] After preprocessing the stroke research dataset, a first training set, a first external validation set, and a second external validation set are obtained.

[0013] Based on the first training set and the XGBoost model, key features for stroke prognosis prediction are extracted from the data features of the first training set.

[0014] Based on the key features for stroke prognosis prediction and the corresponding prognostic outcomes in the first training set, the first external validation set, and the second external validation set, respectively, the corresponding second training set, first validation set, and second validation set are obtained.

[0015] The XGBoost model is trained using the second training set, and the optimal parameter model is obtained through K-fold cross-validation to obtain the stroke prognosis prediction model. The stroke prognosis prediction model is then validated using a combination of homologous and non-homologous data based on the first and second validation sets to obtain the prediction model validation results.

[0016] Furthermore, the step of preprocessing the stroke patient data in the stroke research dataset to obtain the first training set, the first external validation set, and the second external validation set includes:

[0017] For each patient record in the stroke research dataset, the total discharge NIHSS score is subtracted from the total admission NIHSS score to obtain the change in the total NIHSS score between admission and discharge, and each individual discharge NIHSS item score is subtracted from the corresponding admission item score to obtain the change in that item score between admission and discharge.

[0018] The changes in the total NIHSS score upon admission and discharge and the corresponding changes in individual NIHSS scores upon admission and discharge for each stroke patient in the stroke research dataset are added to the stroke research dataset to obtain the updated stroke research dataset.

[0019] The first stroke patient dataset portion of the updated stroke research dataset is divided into a first training set and a first external validation set, and a predetermined number of patient data are randomly selected from the second stroke patient dataset portion of the updated stroke research dataset to obtain a second external validation set.

[0020] Further, the step of extracting key features for stroke prognosis prediction from the data features of the first training set based on the first training set and the XGBoost model includes:

[0021] The XGBoost model is trained based on the first training set, and after training is completed, the corresponding feature importance ranking table is obtained.

[0022] Based on the feature importance ranking table, the first important features for predicting stroke prognosis are obtained; the first important features for predicting stroke prognosis include the changes in NIHSS scores upon admission and discharge and the mRS score at discharge.

[0023] Based on clinical experts' recommendations on feature accessibility, a second important feature for predicting stroke prognosis is obtained from the data features of the first training set; the second important feature for predicting stroke prognosis includes age, gender, place of origin, and stroke-related medical history.

[0024] The first important feature for predicting stroke prognosis and the second important feature for predicting stroke prognosis are combined to obtain the key features for predicting stroke prognosis.

[0025] Further, the steps of training the XGBoost model based on the second training set, obtaining the optimal parameter model through K-fold cross-validation to obtain the stroke prognosis prediction model, and validating the stroke prognosis prediction model using a combination of homologous and non-homologous data based on the first and second validation sets to obtain the prediction model validation results include:

[0026] The XGBoost model is trained using the second training set and nested (inner and outer) K-fold cross-validation. The optimal parameters of the XGBoost model, which include the learning rate and the maximum depth, are obtained through grid search, yielding a stroke prognosis prediction model corresponding to the optimal parameters.

[0027] The stroke prognosis prediction model is validated on same-source data using the first validation set to obtain the corresponding same-source validation result.

[0028] The stroke prognosis prediction model is validated on different-source data using the second validation set to obtain the corresponding different-source validation result.

[0029] Furthermore, the method also includes:

[0030] The SHAP attribution analysis method is used to obtain the feature importance ranking corresponding to the prognostic prediction results, and corresponding rehabilitation exercise guidance suggestions are generated based on the feature importance ranking.
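One way to turn SHAP attributions into patient-specific guidance is sketched below. It assumes a SHAP explainer has already produced one attribution value per feature for the patient (for example, one row of the output of `shap.TreeExplainer(model).shap_values(X)` for the trained XGBoost model). The feature identifiers and the `GUIDANCE` table are hypothetical placeholders, not content from this disclosure. Features are ranked by the magnitude of their contribution, and guidance is looked up for the top-ranked ones.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature-to-guidance table; real guidance text would be curated by clinicians.
GUIDANCE: Dict[str, str] = {
    "delta_nihss_motor_arm": "prioritize upper-limb motor rehabilitation",
    "delta_nihss_motor_leg": "prioritize gait and lower-limb training",
    "delta_nihss_best_language": "refer for speech and language therapy",
}


def rank_features(shap_row: Sequence[float], names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank one patient's features by the absolute size of their SHAP value (largest first)."""
    if len(shap_row) != len(names):
        raise ValueError("shap_row and names must have the same length")
    return sorted(zip(names, shap_row), key=lambda pair: abs(pair[1]), reverse=True)


def guidance_for(shap_row: Sequence[float], names: Sequence[str], top_k: int = 3) -> List[str]:
    """Guidance for those of the top_k ranked features that have an entry in GUIDANCE."""
    ranked = rank_features(shap_row, names)[:top_k]
    return [GUIDANCE[name] for name, _ in ranked if name in GUIDANCE]
```

Ranking by absolute value treats a feature as important whether it pushes the predicted risk up or down; whether a positive SHAP value means a worse outcome depends on how the outcome label is coded.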

[0031] In a second aspect, embodiments of the present invention provide a stroke prognosis prediction system, the system comprising:

[0032] The data acquisition module is used to acquire the characteristic data to be analyzed of stroke patients; the characteristic data to be analyzed includes age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, and NIHSS score change characteristics from admission to discharge.

[0033] The prognosis prediction module is used to input the data to be analyzed into a pre-constructed stroke prognosis prediction model to perform prognosis prediction and obtain the corresponding prognosis prediction result; the stroke prognosis prediction model is trained based on a dataset with the same features as the data to be analyzed.
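A minimal sketch of how the two modules might be wired together, assuming the feature data arrive as a flat dict and the trained model is exposed as any callable that maps an ordered feature vector to an outcome probability. The class names mirror the module names above; the feature identifiers and the callable interface are illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence


class DataAcquisitionModule:
    """Collects one patient's features in the fixed order used during training."""

    def __init__(self, feature_order: Sequence[str]):
        self.feature_order = list(feature_order)

    def acquire(self, record: Dict[str, float]) -> List[float]:
        missing = [name for name in self.feature_order if name not in record]
        if missing:
            raise KeyError(f"missing features: {missing}")
        # Extra fields are ignored; the order must match the training data.
        return [float(record[name]) for name in self.feature_order]


class PrognosisPredictionModule:
    """Wraps a trained model; here any callable returning a probability."""

    def __init__(self, model: Callable[[List[float]], float]):
        self.model = model

    def predict(self, features: List[float]) -> float:
        return self.model(features)


class StrokePrognosisSystem:
    """Acquisition followed by prediction, as in the two-module system."""

    def __init__(self, acquisition: DataAcquisitionModule, prediction: PrognosisPredictionModule):
        self.acquisition = acquisition
        self.prediction = prediction

    def run(self, record: Dict[str, float]) -> float:
        return self.prediction.predict(self.acquisition.acquire(record))


# Abbreviated, hypothetical feature list; the full model also uses the remaining
# medical-history feature and the 15 per-item NIHSS changes.
ORDER = ["age", "gender", "place_of_origin", "mrs_discharge",
         "history_diabetes", "history_tia", "delta_nihss_total"]
system = StrokePrognosisSystem(DataAcquisitionModule(ORDER),
                               PrognosisPredictionModule(lambda v: 0.2))  # stand-in model
```

Fixing the feature order in the acquisition module keeps inference inputs aligned with the columns the model was trained on, which is the point of training on "a dataset with the same features".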

[0034] In a third aspect, embodiments of the present invention also provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the above-described method.

[0035] In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above-described method.

[0036] The present invention provides a method, system, computer device, and storage medium for predicting stroke prognosis. The method acquires feature data of a stroke patient, including age, gender, place of origin, mRS score at discharge, stroke-related medical history features, and changes in NIHSS scores from admission to discharge, and inputs these data into a stroke prognosis prediction model pre-trained on a dataset with the same features to obtain the corresponding prognosis prediction result. Compared with the prior art, the method analyzes stroke research datasets to select key data features that are independent of imaging data and laboratory test results, are readily available, and can be used simultaneously to accurately predict different post-stroke outcomes. Based on these key features, an efficient, accurate stroke prognosis prediction model with strong risk discrimination is trained for predicting the prognosis of stroke patients. This not only effectively improves the efficiency and accuracy of stroke outcome prediction but also ensures the generalizability and ease of use of the prediction model, facilitating widespread adoption and providing reliable technical support for the health management of stroke patients.

Brief Description of the Drawings

[0037] Figure 1 is a flowchart illustrating the stroke prognosis prediction method in an embodiment of the present invention;

[0038] Figure 2 is a schematic diagram of the feature data to be analyzed in an embodiment of the present invention;

[0039] Figure 3 is a schematic diagram of the feature importance rankings used to screen the first important features for stroke prognosis prediction in an embodiment of the present invention;

[0040] Figure 4 is a schematic diagram comparing the prediction results of the DISCO model with those of the RRE model commonly used in clinical practice in an embodiment of the present invention;

[0041] Figure 5 is a schematic diagram of the DISCO model's effectiveness in stratifying 3-month recurrence risk on the second training set, the first validation set, and the second validation set in an embodiment of the present invention;

[0042] Figure 6 is another flowchart of the stroke prognosis prediction method in an embodiment of the present invention;

[0043] Figure 7 is a schematic diagram of the feature importance ranking results of the DISCO model obtained by SHAP attribution analysis in an embodiment of the present invention;

[0044] Figure 8 is a schematic diagram of the stroke prognosis prediction system in an embodiment of the present invention;

[0045] Figure 9 is another structural schematic diagram of the stroke prognosis prediction system in an embodiment of the present invention;

[0046] Figure 10 is an internal structural diagram of a computer device in an embodiment of the present invention.

Detailed Description of the Embodiments

[0047] To make the objectives, technical solutions, and beneficial effects of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. Obviously, the embodiments described below are only part of the embodiments of this invention and are used to illustrate the invention, but are not intended to limit the scope of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0048] In one embodiment, as shown in Figure 1, a stroke prognosis prediction method is provided, including the following steps:

[0049] S11. Obtain the feature data to be analyzed for a stroke patient. The feature data to be analyzed are the key features for stroke prognosis prediction identified by analyzing existing stroke research datasets; they are patient indicators available at discharge, derived from the reports recorded at the patient's admission and discharge, and the same set of features can be used to predict different prognostic outcomes. In this embodiment, the feature data to be analyzed preferably include age, gender, place of origin, mRS (modified Rankin Scale) score at discharge, stroke-related medical history features, and the NIHSS score change features between admission and discharge. The stroke-related medical history features are medical history data that may affect the course of stroke; specifically, they include the TOAST classification, whether there is a history of diabetes, and whether there is a history of transient ischemic attack (TIA). In practical applications, the history indicators take the value 0 (no) or 1 (yes) and can be obtained from the patient's chief complaint or the information filled in on the registration form.

[0050] In this embodiment, the NIHSS (National Institutes of Health Stroke Scale) score change features between admission and discharge are the change information obtained by comparing the NIHSS assessments recorded at the patient's admission and discharge, that is, by subtracting the discharge score from the admission score. Specifically, these features include the change in the total NIHSS score between admission and discharge and the corresponding changes in several individual NIHSS item scores. The number of individual item changes can be chosen according to actual application needs; to ensure a comprehensive analysis, as shown in Figure 2, the changes in all 15 individual items of the NIHSS can be included, covering aspects such as consciousness, speech, visual field, facial movement, upper limb movement, lower limb movement, coordination, sensation, language comprehension and reading.
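For reference, the standard NIHSS has 15 scored items whose maximum scores sum to 42. The sketch below encodes them as a lookup table and validates one score sheet before its total is used. The item identifiers are illustrative, and their correspondence to the labels in Figure 2 is an assumption.

```python
from typing import Dict

# Standard NIHSS scored items and their maximum scores (total maximum 42).
# Identifiers are illustrative; the labels in Figure 2 may be worded differently.
NIHSS_MAX: Dict[str, int] = {
    "1a_loc": 3, "1b_loc_questions": 2, "1c_loc_commands": 2,
    "2_best_gaze": 2, "3_visual": 3, "4_facial_palsy": 3,
    "5a_motor_arm_left": 4, "5b_motor_arm_right": 4,
    "6a_motor_leg_left": 4, "6b_motor_leg_right": 4,
    "7_limb_ataxia": 2, "8_sensory": 2, "9_best_language": 3,
    "10_dysarthria": 2, "11_extinction_inattention": 2,
}


def nihss_total(sheet: Dict[str, int]) -> int:
    """Check that all 15 items are present and in range, then return the total score."""
    missing = set(NIHSS_MAX) - set(sheet)
    if missing:
        raise KeyError(f"missing NIHSS items: {sorted(missing)}")
    for item, max_score in NIHSS_MAX.items():
        if not 0 <= sheet[item] <= max_score:
            raise ValueError(f"{item}={sheet[item]} outside 0..{max_score}")
    return sum(sheet[item] for item in NIHSS_MAX)
```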

[0051] S12. Input the data to be analyzed into a pre-constructed stroke prognostic prediction model to perform prognostic prediction and obtain the corresponding prognostic prediction results. The stroke prognostic prediction model can, in principle, be constructed using existing machine learning model construction methods capable of prognostic prediction analysis. However, to ensure the efficiency, accuracy, and generalization of the predictive analysis, this embodiment preferably employs the following approach: first, analyze and screen key features for stroke prognostic prediction based on multiple existing classic stroke research datasets using the XGBoost algorithm; then, train and validate the XGBoost model based on the key features for prognostic prediction from different datasets to obtain the final required stroke prognostic prediction model. Specifically, the steps for constructing the stroke prognostic prediction model include:

[0052] Obtain a stroke research dataset; the stroke research dataset includes a first stroke patient dataset and a second stroke patient dataset; both the first stroke patient dataset and the second stroke patient dataset include age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, total NIHSS score upon admission and several corresponding individual NIHSS scores upon admission, and total NIHSS score upon discharge and several corresponding individual NIHSS scores upon discharge. To ensure the reliability of the research data, the first stroke patient dataset in this embodiment is the ischemic stroke patient data after excluding transient ischemic attack (TIA) patients from the China National Stroke Registry Cohort III (CNSR-III, which included 15,166 patients; this is a nationwide stroke or TIA registry cohort involving 201 hospitals in China, with registration from August 2015 to March 22, 2018). The second stroke patient dataset is the ischemic stroke patient data after excluding TIA patients from the CHANCE-2 cohort study on the efficacy of clopidogrel combined with aspirin in the treatment of acute minor stroke or transient ischemic attack. It should be noted that the admission and discharge NIHSS individual scores in both the first and second stroke patient datasets are consistent with the NIHSS individual scores in the aforementioned feature data to be analyzed.

[0053] After preprocessing the stroke research dataset, a first training set, a first external validation set, and a second external validation set are obtained. The preprocessing mainly consists of comparing, for each patient, the total admission NIHSS score and the individual admission NIHSS item scores with the total discharge NIHSS score and the individual discharge item scores to obtain the changes in the total score and in each item score between admission and discharge; these changes are then added to the current dataset to form a new dataset. Specifically, the steps of preprocessing the stroke patient data in the stroke research dataset to obtain the first training set, the first external validation set, and the second external validation set include:

[0054] For each patient record in the stroke research dataset, the total discharge NIHSS score is subtracted from the total admission NIHSS score to obtain the change in the total NIHSS score between admission and discharge, and each individual discharge NIHSS item score is subtracted from the corresponding admission item score to obtain the change in that item score between admission and discharge.

[0055] The changes in the total NIHSS score and in the individual NIHSS item scores between admission and discharge for each stroke patient are added to the stroke research dataset to obtain the updated stroke research dataset. The updated stroke research dataset is thus the original stroke research dataset augmented with the admission-to-discharge changes in the total NIHSS score and in the individual NIHSS item scores computed above.
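The preprocessing in [0054] and [0055] amounts to adding admission-minus-discharge columns to every record. A minimal sketch, assuming each record is a flat dict with hypothetical `adm_<item>` and `dis_<item>` columns (including `adm_total` and `dis_total`). Under this sign convention a positive change means the NIHSS score fell, that is, the deficit decreased during the hospital stay.

```python
from typing import Dict, List, Sequence


def add_nihss_changes(records: List[Dict[str, float]], items: Sequence[str]) -> List[Dict[str, float]]:
    """Return copies of the records with delta_<item> = adm_<item> - dis_<item> added.

    The total score is handled as one more item named "total".
    """
    updated = []
    for rec in records:
        new = dict(rec)  # copy so the original dataset is left unchanged
        for item in list(items) + ["total"]:
            new[f"delta_{item}"] = rec[f"adm_{item}"] - rec[f"dis_{item}"]
        updated.append(new)
    return updated
```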

[0056] The first stroke patient dataset portion of the updated stroke research dataset is divided into a first training set and a first external validation set, and a predetermined number of patients are randomly selected from the second stroke patient dataset portion to obtain a second external validation set. The ratio between the first training set and the first external validation set can be set according to actual application needs. For example, if the first stroke patient dataset is the CNSR-III ischemic stroke patient data with TIA patients removed, as described above, 11,813 patients from 169 hospitals in the CNSR-III cohort can be randomly assigned to the first training set, and 2,127 patients from another 40 independent hospitals can be assigned to the first external validation set. The predetermined number of patients in the second external validation set can likewise be determined according to actual application requirements and is not specifically limited here. For example, if the second stroke patient dataset is the CHANCE-2 ischemic stroke patient data with TIA patients removed, as described above, 5,158 ischemic stroke patients can be selected as research subjects.
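The example split in [0056] is by hospital: every patient from a held-out hospital goes to the external validation set, so the two sets share no sites. A minimal sketch under that reading, assuming each record carries a hypothetical `hospital_id` field; `sample_patients` draws the fixed-size second external validation set from the other cohort.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, object]


def hospital_level_split(records: List[Record], n_external_hospitals: int,
                         seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Assign all patients of n randomly chosen hospitals to external validation."""
    hospitals = sorted({r["hospital_id"] for r in records})
    if not 0 < n_external_hospitals < len(hospitals):
        raise ValueError("need at least one training hospital and one external hospital")
    external = set(random.Random(seed).sample(hospitals, n_external_hospitals))
    train = [r for r in records if r["hospital_id"] not in external]
    validation = [r for r in records if r["hospital_id"] in external]
    return train, validation


def sample_patients(records: List[Record], n: int, seed: int = 0) -> List[Record]:
    """Draw n patients without replacement (raises ValueError if n > len(records))."""
    return random.Random(seed).sample(records, n)
```

Splitting by site rather than by patient makes the first external validation set a test of transfer to unseen hospitals, while the second cohort tests transfer to a different study population.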

[0057] Based on the first training set and the XGBoost model, key features for stroke prognosis prediction are extracted from the data features of the first training set; wherein, key features for stroke prognosis prediction can be understood as data features that have a significant impact on the post-stroke outcome; specifically, the step of extracting key features for stroke prognosis prediction from the data features of the first training set based on the first training set and the XGBoost model includes:

[0058] The XGBoost model is trained on the first training set, and a corresponding feature importance ranking table is obtained after training. The ranking table can be produced by the feature importance analysis and ranking function built into the XGBoost algorithm, by using SHAP attribution analysis to assess each feature's contribution to the XGBoost model's predictions on the first training set, or by combining the two methods and retaining the features that rank highly under both. No specific limitation is made here.

[0059] According to the feature importance ranking table, the first important features for predicting stroke prognosis are obtained. The first important features for predicting stroke prognosis can be understood as the data features that rank relatively high in the feature importance ranking table. The specific number of features can be determined according to the actual application requirements. In order to improve the convenience and efficiency of data acquisition while ensuring the reliability of subsequent model training, this embodiment preferably selects two data features with particularly prominent importance. As shown in Figure 3, the NIHSS score change feature upon admission and discharge and the mRS score at discharge both rank in the top two in the feature importance ranking table obtained by the feature importance analysis ranking function of the XGBoost algorithm and the feature importance ranking table based on SHAP attribution analysis, and the corresponding scores are very high. Therefore, it can be determined that the first important features for predicting stroke prognosis include the NIHSS score change feature upon admission and discharge and the mRS score at discharge.

[0060] Based on clinical experts' recommendations on feature accessibility, a second important feature for predicting stroke prognosis is obtained from the data features of the first training set. These recommendations are the results of surveys of, and consultations with, relevant clinical experts on the factors that influence post-stroke outcomes. Likewise, for convenient and efficient data acquisition, this embodiment preferably sets the second important feature for predicting stroke prognosis to include age, gender, place of origin (the patient's geographical region in China), and stroke-related medical history features. The stroke-related medical history features include the TOAST classification, whether there is a history of diabetes, and whether there is a history of transient ischemic attack (TIA).

[0061] The first important feature and the second important feature for predicting stroke prognosis are combined to obtain the key features for predicting stroke prognosis. The key features preferably include age, gender, place of origin, mRS (modified Rankin Scale) score at discharge, TOAST classification, history of diabetes, history of transient ischemic attack (TIA), the change in total NIHSS score between admission and discharge, and the corresponding changes in the 15 individual NIHSS item scores between admission and discharge.

[0062] Based on the key features for stroke prognosis prediction and the corresponding prognostic outcomes in the first training set, the first external validation set, and the second external validation set, respectively, a second training set, a first validation set, and a second validation set are obtained; that is, the second training set, the first validation set, and the second validation set are datasets constructed from the key features for stroke prognosis prediction and the corresponding prognostic outcomes selected from the first training set, the first external validation set, and the second external validation set, respectively.
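The screening and combination steps in [0058] to [0062] can be sketched as plain list operations. The importance dicts stand in for the XGBoost built-in ranking and the SHAP ranking (in practice their values would come from the trained model), and all feature names are illustrative. `top_in_both` keeps features that rank in the top k under both methods, `combine` forms the order-preserving union with the clinically recommended features, and `project` builds the second training set and validation sets by keeping only the key features plus the outcome column.

```python
from typing import Dict, List, Sequence


def top_k(importance: Dict[str, float], k: int) -> List[str]:
    """Feature names sorted by descending importance, truncated to k."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def top_in_both(builtin: Dict[str, float], shap: Dict[str, float], k: int) -> List[str]:
    """Features in the top k of both rankings, in the order of the built-in ranking."""
    shap_top = set(top_k(shap, k))
    return [name for name in top_k(builtin, k) if name in shap_top]


def combine(first: Sequence[str], second: Sequence[str]) -> List[str]:
    """Order-preserving union of the first and second important features."""
    seen, key_features = set(), []
    for name in list(first) + list(second):
        if name not in seen:
            seen.add(name)
            key_features.append(name)
    return key_features


def project(records: List[Dict[str, float]], key_features: Sequence[str],
            outcome: str = "outcome") -> List[Dict[str, float]]:
    """Keep only the key features and the prognostic outcome for each patient."""
    return [{**{f: r[f] for f in key_features}, outcome: r[outcome]} for r in records]
```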

[0063] The XGBoost model is trained using the second training set, and the optimal parameter model is obtained through K-fold cross-validation to obtain the stroke prognosis prediction model. The stroke prognosis prediction model is then validated using a combination of homologous and non-homologous data based on the first and second validation sets to obtain the prediction model validation results. Specifically, the steps of training the XGBoost model using the second training set, obtaining the optimal parameter model through K-fold cross-validation to obtain the stroke prognosis prediction model, and validating the stroke prognosis prediction model using a combination of homologous and non-homologous data based on the first and second validation sets to obtain the prediction model validation results include:

[0064] The XGBoost model is trained on the second training set using nested (inner and outer) K-fold cross-validation, and the optimal parameters of the XGBoost model, namely the learning rate and the maximum depth, are found by grid search; the model with the optimal parameters is taken as the stroke prognosis prediction model. The number of folds K, which determines the ratio of training to validation data in each fold, can be set according to actual application requirements; in this embodiment, ten-fold cross-validation is preferred.
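The nested tuning loop can be sketched without committing to a particular library. Here `fit_score` is a hypothetical caller-supplied function that would, for example, fit an `xgboost.XGBClassifier` with the given `learning_rate` and `max_depth` on the training rows and return the AUC on the test rows. The inner loop selects parameters by mean score, and the outer loop reports how the tuned model scores on folds it never saw during tuning. scikit-learn's `GridSearchCV` nested inside `cross_val_score` is a common off-the-shelf equivalent.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]
# fit_score(train_rows, test_rows, params) -> score such as AUC; supplied by the caller.
FitScore = Callable[[list, list, Params], float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(data: Sequence, folds: List[List[int]], i: int) -> Tuple[list, list]:
    """Fold i is the test part; all other folds form the training part."""
    test = [data[j] for j in folds[i]]
    train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
    return train, test


def param_grid(grid: Dict[str, Sequence[float]]) -> List[Params]:
    """Every combination of the listed values (a full grid search)."""
    keys = sorted(grid)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(grid[k] for k in keys))]


def inner_search(data: Sequence, grid: Dict[str, Sequence[float]],
                 fit_score: FitScore, k: int, seed: int) -> Params:
    """Inner loop: the parameter set with the best mean score over k folds."""
    folds = k_fold_indices(len(data), k, seed)
    best, best_score = None, float("-inf")
    for params in param_grid(grid):
        scores = [fit_score(*split(data, folds, i), params) for i in range(k)]
        mean = sum(scores) / k
        if mean > best_score:
            best, best_score = params, mean
    return best


def nested_cv(data: Sequence, grid: Dict[str, Sequence[float]], fit_score: FitScore,
              outer_k: int = 10, inner_k: int = 10, seed: int = 0) -> List[Tuple[Params, float]]:
    """Outer loop: tune on each outer-training part, then score on the held-out outer fold."""
    folds = k_fold_indices(len(data), outer_k, seed)
    results = []
    for i in range(outer_k):
        train, test = split(data, folds, i)
        params = inner_search(train, grid, fit_score, inner_k, seed + 1)
        results.append((params, fit_score(train, test, params)))
    return results
```

The outer scores estimate how well the whole tuning procedure generalizes; a final model would then typically be tuned with the same inner search on the entire second training set.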

[0065] To ensure efficient and reliable model training, when the XGBoost classifier (XGBClassifier) was fitted on the second training set with K-fold cross-validation, grid search was used to tune the learning rate (eta) and the maximum depth. For each parameter combination, the precision, recall, F1 score (the harmonic mean of precision and recall, F1 score = [(2 * precision * recall) / (precision + recall)]), and the area under the receiver operating characteristic (ROC) curve (AUC; an AUC of 0.5 indicates random guessing, a larger area is better, and the ideal value is 1) were computed. The optimal parameters were selected by AUC, and the XGBoost model with the optimal parameters was used as the initial prognosis prediction model.
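The metrics named above can be computed directly from labels and predictions. A minimal sketch, assuming binary labels coded 0/1, thresholded predictions for precision, recall and F1, and predicted probabilities for AUC. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half; this pairwise form is O(n²) and meant for illustration rather than large datasets.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```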

[0066] The stroke prognosis prediction model is validated on same-source data using the first validation set to obtain the corresponding same-source validation result. This result can be understood as an index of the model's predictive performance on the first validation set, which comes from the same source as the training set of the initial prognosis prediction model. The index can be the mean squared error (MSE) of the model's predictions.

[0067] The stroke prognosis prediction model is validated on different-source data using the second validation set to obtain the corresponding different-source validation result. This result can be understood as an index of the model's predictive performance on the second validation set, which comes from a different source than the training set of the initial prognosis prediction model. The index is the same as the one chosen for the same-source validation result, for example the mean squared error (MSE) of the model.
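A minimal sketch of the same-source and different-source validation step follows, assuming the index is the MSE between the 0/1 outcome labels and the model's predicted probabilities (for binary outcomes this quantity is also known as the Brier score). `predict` is a hypothetical callable standing in for the trained model, and the dictionary keys are illustrative names.

```python
from typing import Callable, Dict, Sequence, Tuple

Dataset = Tuple[Sequence[object], Sequence[float]]  # (feature rows, 0/1 labels)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between true labels and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def validate(predict: Callable[[Sequence[object]], Sequence[float]],
             same_source: Dataset, different_source: Dataset) -> Dict[str, float]:
    """Return the MSE on the same-source and on the different-source validation sets."""
    results = {}
    for name, (rows, labels) in (("same_source", same_source),
                                 ("different_source", different_source)):
        results[name] = mse(labels, predict(rows))
    return results
```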

[0068] Furthermore, since in practical applications the stroke prognosis prediction model trained on the second training set may perform poorly on the validation sets, this embodiment preferably fine-tunes the parameters of the stroke prognosis prediction model based on the same-source and different-source validation results, in order to improve the generalization ability of the prediction model as much as possible. Specifically, the fine-tuning process includes comparing the same-source and different-source validation results with their corresponding validation thresholds. If both validation results are below their corresponding thresholds, the generalization ability of the stroke prognosis prediction model is considered adequate, and the model can be used directly as the final stroke prognosis prediction model. Conversely, if either validation result exceeds its corresponding threshold, the generalization ability of the model needs further improvement; in that case, the model parameters of the stroke prognosis prediction model are further optimized on the second training set according to a preset loss function to obtain the final stroke prognosis prediction model. The preset loss function is expressed as follows:
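The acceptance rule above can be written as a small decision function. This is a sketch: `fine_tune` is a hypothetical callable that would re-optimize the model on the second training set with the preset loss, the threshold values are left to the caller, and a result exactly equal to its threshold is treated here as not acceptable, a case the text does not specify.

```python
from typing import Callable, Dict, TypeVar

Model = TypeVar("Model")


def finalize_model(model: Model,
                   results: Dict[str, float],
                   thresholds: Dict[str, float],
                   fine_tune: Callable[[Model], Model]) -> Model:
    """Keep the model if every validation error is below its threshold, else fine-tune it.

    `results` and `thresholds` map 'same_source' and 'different_source' to an
    error index such as MSE, where lower is better.
    """
    acceptable = all(results[key] < thresholds[key]
                     for key in ("same_source", "different_source"))
    return model if acceptable else fine_tune(model)
```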

[0069] In the formula, L represents the model prediction loss value; M1 and M2 represent the training sample subset and the validation sample subset, respectively, in the K-fold cross-training on the second training set; n1 and n2 represent the numbers of samples in the training sample subset and the validation sample subset, respectively; y_{i,true} and y_{i,pre} represent the true value and the predicted value of the i-th sample in the training sample subset M1, respectively; y_{j,true} and y_{j,pre} represent the true value and the predicted value of the j-th sample in the validation sample subset M2, respectively; λ represents the weighting coefficient.
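The loss formula itself does not survive in this text. The following sketch assumes one form consistent with the symbols defined above, namely a squared-error loss on the training subset plus a λ-weighted squared-error loss on the validation subset:

L = (1/n1) · Σ_{i∈M1} (y_{i,true} − y_{i,pre})² + λ · (1/n2) · Σ_{j∈M2} (y_{j,true} − y_{j,pre})²

Both the squared-error terms and the placement of λ are assumptions for illustration, not a reproduction of the preset loss function of this embodiment.

```python
from typing import Sequence


def _mean_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """(1/n) * sum of squared differences over one sample subset."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("subset must be non-empty, with equal-length labels and predictions")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def composite_loss(train_true: Sequence[float], train_pred: Sequence[float],
                   val_true: Sequence[float], val_pred: Sequence[float],
                   lam: float) -> float:
    """ASSUMED form: MSE over the training subset M1 (n1 samples) plus
    lam times the MSE over the validation subset M2 (n2 samples)."""
    return _mean_squared(train_true, train_pred) + lam * _mean_squared(val_true, val_pred)
```

Under this assumed form, penalizing the validation-subset error alongside the training-subset error discourages parameter choices that fit the training folds but generalize poorly, with λ setting the trade-off.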

[0070] By following the above steps, a stroke prognosis prediction model with high accuracy and risk discrimination can be obtained by validating on different data sources. For ease of subsequent description, the obtained stroke prognosis prediction model is named DISCO (Delta-NIHSS based post-Stroke Composite Outcome) model.

[0071] It should be noted that the outcome labels used to construct the stroke prognosis prediction model depend on the prediction target: for stroke recurrence, the corresponding recurrence indicator is used as the training label; for stroke disability, a binary mRS indicator (1 if mRS > 2, 0 if mRS ≤ 2) is used as the training label; and for stroke death, the corresponding death indicator is used as the training label. Prediction models for each of these prognostic outcomes can be constructed using the method described above. Accordingly, a model that predicts stroke recurrence outputs the probability of stroke recurrence, a model that predicts stroke death outputs the probability of stroke death, and a model that predicts stroke disability outputs the probability that mRS > 2.
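The label construction can be sketched as a function that maps each patient record to a binary training label for the chosen outcome. The record keys (`recurrence_3m`, `death_3m`, `mrs_3m`) are hypothetical, and the sketch assumes that disability is assessed with the mRS at the follow-up time point being predicted.

```python
from typing import Mapping


def make_label(outcome: str, record: Mapping[str, int]) -> int:
    """Return the 0/1 training label for the requested prognostic outcome."""
    if outcome == "recurrence":
        return 1 if record["recurrence_3m"] else 0
    if outcome == "death":
        return 1 if record["death_3m"] else 0
    if outcome == "disability":
        # mRS > 2 is coded as 1 (disability), mRS <= 2 as 0.
        return 1 if record["mrs_3m"] > 2 else 0
    raise ValueError(f"unknown outcome: {outcome!r}")
```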

[0072] This invention provides a technical solution for obtaining characteristic data of stroke patients, including age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, and changes in NIHSS scores from admission to discharge. This data is then input into a stroke prognostic prediction model pre-trained on a dataset with the same characteristics as the target data for prognostic prediction, yielding corresponding prognostic prediction results. By analyzing stroke research datasets, key data features are selected that are independent of imaging data or laboratory test results, have high data availability, and can be applied simultaneously to accurately predict different post-stroke outcomes. Based on these key data features, an efficient, accurate, and risk-discriminating stroke prognostic prediction model is trained for use in predicting the prognostic outcomes of stroke patients. This not only effectively improves the efficiency and accuracy of stroke outcome prediction but also ensures the generalizability and ease of use of the prediction model, facilitating widespread adoption and providing reliable technical support for the health management of stroke patients.

[0073] To verify the effect of the stroke prognosis prediction method provided by the present invention, this embodiment uses a first stroke patient dataset constructed from the data of the China National Stroke Registry Cohort III (CNSR-III) and a second stroke patient dataset constructed from the data of the CHANCE-2 cohort study on the efficacy of clopidogrel combined with aspirin in the treatment of acute minor stroke or transient ischemic attack, from which the first training set, the first external validation set, and the second external validation set are obtained. Taking the prediction of 3-month stroke recurrence risk as an example, the DISCO model is compared with the RRE model commonly used in clinical practice, and the results are shown in Figure 4. As Figure 4 shows, the DISCO model provided by the present invention achieves better predictive performance.

[0074] In addition, the predictive performance of the DISCO model for three 3-month outcomes (recurrence, death, and disability) was validated on the second training set, the first validation set, and the second validation set; the prediction metrics are shown in Tables 1-3. The ability of the DISCO model to stratify 3-month recurrence risk was also validated on these three datasets, and the results are shown in Figure 5.

[0075] Table 1. Predictive metrics of the DISCO model on the second training set.

[0076] Table 2. DISCO model prediction results on the first validation set.

[0077] Table 3. DISCO model prediction results on the second validation set.

[0078] Table 1 shows that, on the second training set, the DISCO model predicts the clinically most relevant outcome, 3-month stroke recurrence, well, with AUC = 0.80, precision = 0.98, recall = 0.93, and F1 = 0.95. Its prediction of death within 3 months is better still, with AUC > 0.8, precision > 0.99, recall > 0.8, and F1 > 0.89. Although its prediction of 3-month disability (mRS > 2) is weaker than that of recurrence, the AUC remains between 0.7 and 0.8.

[0079] Similarly, Table 2 shows that, on the first validation set, the DISCO model achieves an AUC of 0.74, precision of 0.96, recall of 0.93, and F1 of 0.94 for 3-month stroke recurrence.

[0080] Table 3 shows that, on the second validation set, the DISCO model achieves an AUC of 0.80, precision of 0.96, recall of 0.93, and F1 score of 0.95 for predicting stroke recurrence within 3 months. Furthermore, on both validation sets, the DISCO model predicts death within 3 months better than recurrence, with AUC > 0.8, precision > 0.98, recall > 0.80, and F1 score > 0.88. Although its prediction of 3-month disability (mRS > 2) is weaker than that of recurrence, it still maintains an AUC of around 0.7.

[0081] As shown in Figure 5, each patient's risk score for stroke recurrence within 3 months was calculated with the model. The relative risks (RR) of the highest-risk group compared with the reference group were 33.6, 27.46, and 46.3 on the second training set, the first validation set, and the second validation set, respectively. These results show that the DISCO model identifies patients at high risk of stroke recurrence within 3 months very well.
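The stratification behind these relative risks can be sketched as follows. The text does not state how the risk groups or the reference group are defined, so this sketch assumes that patients are sorted by predicted risk and split into equal-size groups, with the lowest-risk group as the reference; the number of groups, `n_groups`, is a hypothetical parameter.

```python
from typing import Sequence


def top_vs_reference_rr(scores: Sequence[float], outcomes: Sequence[int],
                        n_groups: int = 5) -> float:
    """Relative risk = event rate in the highest-risk group / rate in the lowest-risk group."""
    if len(scores) != len(outcomes) or len(scores) < n_groups:
        raise ValueError("need equal-length inputs with at least n_groups patients")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    # Equal-size (up to rounding) groups of patient indices, lowest risk first.
    groups = [order[g * n // n_groups:(g + 1) * n // n_groups] for g in range(n_groups)]

    def event_rate(idx: Sequence[int]) -> float:
        return sum(outcomes[i] for i in idx) / len(idx)

    reference = event_rate(groups[0])
    if reference == 0:
        raise ZeroDivisionError("no events in the reference group; RR is undefined")
    return event_rate(groups[-1]) / reference
```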

[0082] Furthermore, to further enhance the application value of the stroke prognosis prediction model, this embodiment preferably not only predicts the required prognostic outcomes from the feature data to be analyzed of discharged stroke patients, but also provides personalized rehabilitation exercise guidance based on the importance ranking of each patient's feature data and on clinicians' professional advice. Specifically, as shown in Figure 6, the method further includes:

[0083] S13. Use SHAP attribution analysis to obtain the feature importance ranking corresponding to the prognostic prediction result, and generate corresponding rehabilitation exercise guidance suggestions based on that ranking. The feature importance ranking can be understood as the ranking of how much each feature in the data provided by the stroke patient at discharge contributes to the predicted outcome. After the feature importance ranking of the DISCO model shown in Figure 7 is obtained through SHAP attribution analysis, targeted rehabilitation exercise guidance can be given based on the highest-ranked features combined with clinicians' experience. Alternatively, the obtained feature importance ranking can be matched against a pre-built knowledge base of guidance suggestions indexed by different feature ranking results. To make the generated suggestions more scientifically rigorous and intelligent, a reinforcement learning model can also be constructed, with the feature importance ranking as the state variable and the rehabilitation exercise guidance suggestions as the corresponding actions. In practical applications, the obtained feature importance ranking is input into the trained reinforcement learning model to obtain the required rehabilitation exercise guidance suggestions.
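The knowledge-base variant of step S13 can be sketched as below. The sketch takes one patient's SHAP values as given (for a tree model they could come from the `shap` library's `TreeExplainer(model).shap_values(X)`), ranks the features by absolute contribution, and looks up suggestions for the top-ranked features. The feature names, the knowledge-base contents, and `top_n` are hypothetical; real suggestions would be authored by clinicians. The reinforcement-learning variant is not shown.

```python
from typing import Dict, List, Sequence


def rank_features(shap_row: Sequence[float], feature_names: Sequence[str]) -> List[str]:
    """Order one patient's features by absolute SHAP contribution, largest first."""
    pairs = sorted(zip(feature_names, shap_row), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in pairs]


def suggest(ranking: Sequence[str], knowledge_base: Dict[str, str],
            top_n: int = 3) -> List[str]:
    """Return knowledge-base suggestions for the top_n ranked features that have an entry."""
    return [knowledge_base[f] for f in ranking[:top_n] if f in knowledge_base]


# Hypothetical placeholders; actual guidance text would come from clinicians.
KNOWLEDGE_BASE = {
    "nihss_delta_total": "guidance keyed to neurological recovery (placeholder)",
    "mrs_discharge": "guidance keyed to functional status at discharge (placeholder)",
}
```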

[0084] This invention provides a technical solution that acquires feature data of stroke patients, including age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, and changes in NIHSS scores from admission to discharge. These data are input into a stroke prognosis prediction model pre-trained on a dataset with the same features as the data to be analyzed, which yields the corresponding prognostic prediction results; SHAP attribution analysis is then used to obtain the feature importance ranking corresponding to those results, and rehabilitation guidance suggestions are generated based on that ranking. This approach not only effectively improves the efficiency and accuracy of stroke outcome prediction but also ensures the generalizability and usability of the prediction model, facilitating widespread adoption. Furthermore, it provides targeted rehabilitation guidance suggestions based on the prediction results, which has significant value for health management guidance and provides reliable technical support for the health management of stroke patients.

[0085] It should be noted that although the steps in the flowchart above are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless otherwise specified in this document, there is no strict order requirement for the execution of these steps, and they can be executed in other orders.

[0086] In one embodiment, as shown in Figure 8, a stroke prognosis prediction system is provided, the system comprising:

[0087] Data acquisition module 1 is used to acquire the feature data to be analyzed of stroke patients; the feature data to be analyzed include age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, and NIHSS score change characteristics from admission to discharge.

[0088] The prognosis prediction module 2 is used to input the data to be analyzed into a pre-constructed stroke prognosis prediction model to perform prognosis prediction and obtain the corresponding prognosis prediction results.

[0089] In one embodiment, as shown in Figure 9, a stroke prognosis prediction system is provided, the system further comprising:

[0090] Recommendation module 3 is used to obtain, through SHAP attribution analysis, the feature importance ranking corresponding to the prognostic prediction result, and to generate corresponding rehabilitation guidance suggestions based on that ranking.

[0091] For specific limitations of the stroke prognosis prediction system, refer to the limitations of the stroke prognosis prediction method described above; the corresponding technical effects are the same and are not repeated here. Each module in the stroke prognosis prediction system can be implemented wholly or partly in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, the processor of a computer device in hardware form, or stored as software in the memory of the computer device so that the processor can call them and perform the corresponding operations of each module.
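The three-module structure of the system can be sketched as plain Python classes. This is an illustrative composition only: the feature names are hypothetical flattened versions of the listed feature data, and the `model` and `explain` callables stand in for the trained stroke prognosis prediction model and a SHAP-style attribution routine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

# Hypothetical flattened feature names for the data to be analyzed.
DEFAULT_FEATURES: Tuple[str, ...] = (
    "age", "gender", "place_of_origin", "mrs_discharge",
    "history_tia", "history_diabetes", "nihss_delta_total",
)


@dataclass
class DataAcquisitionModule:
    required: Tuple[str, ...] = DEFAULT_FEATURES

    def acquire(self, raw: Dict[str, float]) -> Features:
        """Keep only the required features and fail loudly if any is missing."""
        missing = [k for k in self.required if k not in raw]
        if missing:
            raise ValueError(f"missing features: {missing}")
        return {k: raw[k] for k in self.required}


@dataclass
class PrognosisPredictionModule:
    model: Callable[[Features], float]  # stand-in returning an outcome probability

    def predict(self, x: Features) -> float:
        return self.model(x)


@dataclass
class RecommendationModule:
    explain: Callable[[Features], Dict[str, float]]  # stand-in per-feature attributions
    knowledge_base: Dict[str, str]
    top_n: int = 3

    def recommend(self, x: Features) -> List[str]:
        """Rank features by absolute attribution and return matching suggestions."""
        attributions = self.explain(x)
        ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
        return [self.knowledge_base[f] for f in ranked[: self.top_n]
                if f in self.knowledge_base]
```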

[0092] Figure 10 illustrates the internal structure of a computer device in one embodiment, which may specifically be a terminal or a server. As shown in Figure 10, the computer device includes a processor, memory, network interface, display, camera, and input device connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface is used to communicate with external terminals via a network connection. When the computer program is executed by the processor, it implements a stroke prognosis prediction method. The display screen of the computer device may be a liquid crystal display (LCD) or an e-ink display. The input device may be a touch layer covering the display screen, buttons, a trackball, or a touchpad mounted on the computer device casing, or an external keyboard, touchpad, or mouse.

[0093] Those skilled in the art will understand that the structure shown in Figure 10 is merely a block diagram of part of the structure related to the solution of the present invention and does not limit the computer device to which the present invention is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

[0094] In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method described above.

[0095] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described method.

[0096] In summary, the stroke prognosis prediction method, system, computer device, and storage medium provided by this invention have the following advantages. The stroke prognosis prediction method acquires the feature data to be analyzed of stroke patients, including age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, and changes in NIHSS scores from admission to discharge. The data to be analyzed are input into a stroke prognosis prediction model pre-trained on a dataset with the same features as the data to be analyzed, and the corresponding prognosis prediction results are obtained. Furthermore, the feature importance ranking corresponding to the prognosis prediction results is obtained through SHAP attribution analysis, and corresponding rehabilitation guidance suggestions are generated based on that ranking. This method not only effectively improves the efficiency and accuracy of stroke outcome prediction but also ensures the generalizability and ease of use of the prediction model, facilitating widespread application. It can also provide targeted rehabilitation guidance suggestions based on the prediction results, which is of great significance for health management guidance and thus provides reliable technical support for the health management of stroke patients.

[0097] The various embodiments in this specification are described in a progressive manner. For identical or similar parts of the embodiments, refer to one another; each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so their description is relatively brief; for relevant parts, refer to the descriptions of the method embodiments. It should be noted that the technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

[0098] The above-described embodiments are merely preferred embodiments of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various improvements and substitutions without departing from the principles of the present invention, and these improvements and substitutions should also be considered within the scope of protection of the present invention. Therefore, the scope of protection of this invention should be determined by the scope of the claims.

Claims

1. A method for predicting stroke prognosis, characterized in that, The method includes the following steps: Acquire the characteristic data to be analyzed of stroke patients; the characteristic data to be analyzed includes age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, and NIHSS score change characteristics from admission to discharge; The data to be analyzed is input into a pre-constructed stroke prognosis prediction model for prognosis prediction, and the corresponding prognosis prediction results are obtained; the stroke prognosis prediction model is trained based on a dataset with the same features as the data to be analyzed.

2. The stroke prognosis prediction method as described in claim 1, characterized in that, The stroke-related medical history features include whether there is a history of transient ischemic attack (TIA), whether there is a history of diabetes, and whether there is a history of TIA. The NIHSS score change features from admission to discharge include the change in the total NIHSS score from admission to discharge and the corresponding changes in several individual NIHSS item scores from admission to discharge.

3. The stroke prognosis prediction method as described in claim 1, characterized in that, The steps for constructing the stroke prognosis prediction model include: Obtain a stroke research dataset; the stroke research dataset includes a first stroke patient dataset and a second stroke patient dataset; both the first stroke patient dataset and the second stroke patient dataset include age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, total NIHSS score upon admission and several corresponding individual NIHSS scores upon admission, and total NIHSS score upon discharge and several corresponding individual NIHSS scores upon discharge. After preprocessing the stroke research dataset, a first training set, a first external validation set, and a second external validation set are obtained. Based on the first training set and the XGBoost model, key features for stroke prognosis prediction are extracted from the data features of the first training set. Based on the key features for stroke prognosis prediction and the corresponding prognostic outcomes in the first training set, the first external validation set, and the second external validation set, respectively, the corresponding second training set, first validation set, and second validation set are obtained. The XGBoost model is trained using the second training set, and the optimal parameter model is obtained through K-fold cross-validation to obtain the stroke prognosis prediction model. The stroke prognosis prediction model is then validated using a combination of homologous and non-homologous data based on the first and second validation sets to obtain the prediction model validation results.

4. The stroke prognosis prediction method as described in claim 3, characterized in that, The step of preprocessing the stroke patient data in the stroke research dataset to obtain the first training set, the first external validation set, and the second external validation set includes: For each stroke patient record in the stroke research dataset, subtracting the total discharge NIHSS score from the total admission NIHSS score to obtain the change in the total NIHSS score from admission to discharge, and subtracting each individual discharge NIHSS score from the corresponding individual admission NIHSS score to obtain the change in that individual NIHSS score from admission to discharge; adding the changes in the total and individual NIHSS scores from admission to discharge for each stroke patient to the stroke research dataset to obtain an updated stroke research dataset; dividing the first stroke patient dataset portion of the updated stroke research dataset into the first training set and the first external validation set; and randomly selecting a predetermined number of patient records from the second stroke patient dataset portion of the updated stroke research dataset to obtain the second external validation set.
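The preprocessing step of claim 4 can be sketched as follows. The record layout (`nihss_admission` and `nihss_discharge` dictionaries keyed by `total` and item names), the 80/20 split fraction, the predetermined sample size `n_second`, and the fixed seed are all hypothetical choices for illustration; the claim does not specify them.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def add_nihss_deltas(record: Record) -> Record:
    """Return a copy of the record with admission-minus-discharge NIHSS changes.

    Assumes hypothetical keys 'nihss_admission' and 'nihss_discharge', each a dict
    mapping 'total' and every item name (e.g. '1a') to a score.
    """
    admission = record["nihss_admission"]
    discharge = record["nihss_discharge"]
    updated = dict(record)
    updated["nihss_delta"] = {item: admission[item] - discharge[item] for item in admission}
    return updated


def build_splits(first: Sequence[Record], second: Sequence[Record],
                 train_frac: float = 0.8, n_second: int = 1000,
                 seed: int = 0) -> Tuple[List[Record], List[Record], List[Record]]:
    """Return (first training set, first external validation set,
    second external validation set)."""
    rng = random.Random(seed)
    first_updated = [add_nihss_deltas(r) for r in first]
    second_updated = [add_nihss_deltas(r) for r in second]
    rng.shuffle(first_updated)
    cut = int(len(first_updated) * train_frac)
    external_2 = rng.sample(second_updated, min(n_second, len(second_updated)))
    return first_updated[:cut], first_updated[cut:], external_2
```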

5. The stroke prognosis prediction method as described in claim 3, characterized in that, The step of extracting key features for stroke prognosis prediction from the data features of the first training set based on the first training set and the XGBoost model includes: Training the XGBoost model on the first training set and, after training is completed, obtaining the corresponding feature importance ranking table; obtaining, based on the feature importance ranking table, first important features for predicting stroke prognosis, which include the changes in NIHSS scores from admission to discharge and the mRS score at discharge; obtaining, from the data features of the first training set and based on clinicians' recommendations regarding feature accessibility, second important features for predicting stroke prognosis, which include age, gender, place of origin, and stroke-related medical history; and combining the first important features and the second important features to obtain the key features for predicting stroke prognosis.
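The feature-combination step of claim 5 can be sketched as below. The importance scores are taken as given (with the xgboost scikit-learn interface they could be read from a trained `XGBClassifier`'s `feature_importances_` attribute), and choosing the first important features as the `top_k` highest-ranked ones is an assumption, as are the feature names used in the example.

```python
from typing import Dict, List, Sequence


def select_key_features(importance: Dict[str, float],
                        clinical_features: Sequence[str],
                        top_k: int) -> List[str]:
    """Union of the top_k most important model features and clinically accessible ones.

    `importance` maps feature name -> importance score from a trained model's
    importance table; order is preserved and duplicates are dropped.
    """
    ranked = sorted(importance, key=lambda f: importance[f], reverse=True)
    key_features = list(ranked[:top_k])      # first important features
    for name in clinical_features:           # second important features
        if name not in key_features:
            key_features.append(name)
    return key_features
```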

6. The stroke prognosis prediction method as described in claim 3, characterized in that, The steps of training the XGBoost model based on the second training set, obtaining the optimal parameter model through K-fold cross-validation to obtain the stroke prognosis prediction model, and validating the stroke prognosis prediction model on same-source and different-source data based on the first and second validation sets to obtain the prediction model validation results include: Training the XGBoost model on the second training set using nested (inner and outer) K-fold cross-validation and obtaining the optimal parameters of the XGBoost model through grid search, resulting in a stroke prognosis prediction model corresponding to the optimal parameters, the optimal parameters including the learning rate and the maximum depth; validating the stroke prognosis prediction model on same-source data using the first validation set to obtain the corresponding same-source validation result; and validating the stroke prognosis prediction model on different-source data using the second validation set to obtain the corresponding different-source validation result.

7. The stroke prognosis prediction method as described in claim 1, characterized in that, The method further includes: The SHAP attribution analysis method is used to obtain the feature importance ranking corresponding to the prognostic prediction results, and corresponding rehabilitation exercise guidance suggestions are generated based on the feature importance ranking.

8. A stroke prognosis prediction system, characterized in that, The system includes: The data acquisition module is used to acquire the characteristic data to be analyzed of stroke patients; the characteristic data to be analyzed includes age, gender, place of origin, mRS score at discharge, stroke-related medical history characteristics, and NIHSS score change characteristics from admission to discharge. The prognosis prediction module is used to input the data to be analyzed into a pre-constructed stroke prognosis prediction model to perform prognosis prediction and obtain the corresponding prognosis prediction result; the stroke prognosis prediction model is trained based on a dataset with the same features as the data to be analyzed.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 7.