A method for constructing a stroke recurrence risk assessment model

This method constructs a stroke recurrence risk assessment model based on Logistic and Cox proportional hazards regression models. It addresses the lack of accurate recurrence risk assessment for patients with atrial fibrillation and cerebral infarction, supporting recurrence risk prediction and optimization of treatment plans.

CN122201745A · Pending · Publication Date: 2026-06-12 · THE FIRST AFFILIATED HOSPITAL OF GUANGXI MEDICAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
THE FIRST AFFILIATED HOSPITAL OF GUANGXI MEDICAL UNIVERSITY
Filing Date
2026-01-23
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Current technologies lack effective models for assessing the risk of stroke recurrence, especially for patients with atrial fibrillation and cerebral infarction, making it impossible to accurately assess their recurrence risk and resulting in a lack of suitable tools for clinical risk prediction.

Method used

A risk assessment model for stroke recurrence was constructed. Independent risk factors were screened using a logistic regression model, and a risk prediction model was established using a Cox proportional hazards regression model. A nomogram model was then used for visualization scoring to predict the patient's survival probability and risk.

Benefits of technology

It significantly improves the accuracy and specificity of assessing the risk of stroke recurrence, can screen out key factors affecting recurrence, guide the optimization of anticoagulation/antiplatelet therapy regimens, and reduce the recurrence rate, disability and mortality rate of stroke.

Generated by Eureka AI based on patent content.


Abstract

The application belongs to the field of biomedicine and relates in particular to a method for constructing a stroke recurrence risk assessment model, comprising the following steps: determining the standard population by selecting patients with atrial fibrillation combined with cerebral infarction; determining the sample size by estimating it with a Logistic regression model; formulating a follow-up plan, including the follow-up cycle and content; data management, in which follow-up data are entered and analyzed; statistical analysis, in which single-factor Logistic regression is performed on the association between each variable and stroke recurrence, and variables with P value < 0.1 are screened; multi-factor regression, in which stroke recurrence within 3 years is taken as the dependent variable, the variables from the single-factor analysis and confounding factors are entered, and a stepwise forward method is used to construct the model; and risk prediction, in which a risk function and a survival function are obtained based on Cox regression and used to predict survival probability. The method aims to improve the accuracy of recurrence risk assessment and to help reduce stroke recurrence and mortality.

Description

Technical Field

[0001] This invention belongs to the field of biomedicine and specifically relates to a method for constructing a risk assessment model for stroke recurrence.

Background Technology

[0002] Stroke is the leading cause of death in China, and its high incidence, high disability rate and high recurrence rate have brought a heavy disease burden.

[0003] Atrial fibrillation is a major contributing factor to cerebral infarction, and it is associated with about one-third of ischemic strokes. These patients not only have large infarct areas, are prone to hemorrhagic transformation, and have high mortality and disability rates, but also have a recurrence rate of 5%-8% within 2 weeks, and the risk of long-term recurrence is significantly higher than that of non-atrial fibrillation patients.

[0004] However, there is currently a lack of large-sample studies on the risk factors, recurrence patterns, and long-term prognosis of patients with atrial fibrillation and cerebral infarction, as well as a lack of effective risk assessment models to evaluate the risk of stroke recurrence in these patients.

Summary of the Invention

[0005] The purpose of this invention is to overcome the above-mentioned shortcomings and provide a method for constructing a stroke recurrence risk assessment model.

[0006] To address the aforementioned technical problems, this invention provides the following technical solution: a method for constructing a stroke recurrence risk assessment model, comprising the following steps:
Standard population determination step: select patients with atrial fibrillation and cerebral infarction who meet the criteria;
Sample size determination step: use the Logistic regression model to estimate the sample size required for independent risk factors;
Follow-up plan development step: determine the follow-up cycle and follow-up content;
Data management and statistical analysis step: based on the follow-up cycle and content, the follow-up data are entered and archived, and statistical analysis is performed on the follow-up data; the statistical analysis includes inferential statistical analysis, which includes:
Univariate analysis step: a Logistic regression model is used to analyze the association between each variable and stroke recurrence, the OR value (odds ratio) and 95% confidence interval are calculated, and variables with P-value < 0.1 are screened to enter the multivariate regression model;
Multivariate regression model construction step: with "stroke recurrence within 3 years" as the dependent variable, the variables screened by univariate analysis and clinically important confounding factors are included, and the final model is constructed using the stepwise forward method;
Risk prediction model construction step: based on the multi-factor regression coefficients, a nomogram model is constructed and the weights of each variable are transformed into a visual scoring system.
The nomogram model is based on a constructed Cox proportional hazards regression model and is used to predict the patient's survival probability. The Cox proportional hazards regression model includes a hazard function and a survival function.

[0007] The risk function is used to quantify the instantaneous event risk of an individual at time t, and the survival function is used to convert the risk into the probability that the individual survives to time t.

[0008] Preferably, the risk function formula is: h(t|X) = h0(t)exp[β1·rcs(age, 3) + β2·sex + β3·history of diabetes + β4·history of hypertension + β5·history of hyperlipidemia + β6·carotid and cerebral arteriosclerosis + β7·anticoagulant drugs + β8·lipid-lowering drugs + β9·ejection fraction + β10·hemoglobin + β11·platelet count + β12·fasting blood glucose + β13·direct bilirubin + β14·total protein + β15·bicarbonate + β16·creatinine clearance + β17·retinol-binding protein + β18·high-sensitivity C-reactive protein + β19·drinking habits]; in the formula, h0(t) represents the baseline risk function, exp(·) is the natural exponential function whose value here is the hazard ratio relative to baseline, and β1 to β19 represent the regression coefficients.

[0009] Preferably, the survival function formula is: S(t|X) = S0(t)^exp(β'X); In the formula, S0(t) represents the baseline survival function, and exp(β'X) represents the combined hazard ratio.

[0010] Preferably, the logic of the nomogram model is as follows: based on the Cox proportional hazards regression model, define the survival probability calculation method for multiple key time points, hide intermediate indicators with no clinical significance, and set survival probability scales and labels.

[0011] Preferably, the statistical analysis further includes: data preprocessing and variable definition, descriptive statistical analysis, model validation and sensitivity analysis, and survival analysis.

[0012] Preferably, the data preprocessing and variable definition include: importing data into Excel using a double entry method, performing logical verification, and handling missing values; using multiple imputation for missing continuous variables, classifying missing categorical variables as unexposed or nonexistent, standardizing continuous variables, converting categorical variables to dummy variables, and maintaining the original level for ordinal variables.

[0013] Preferably, the descriptive statistical analysis includes: baseline characteristics: normally distributed continuous variables are represented by mean ± standard deviation, and non-normally distributed variables are represented by median; categorical variables are represented by number of cases, and inter-group comparisons are performed using chi-square test or Fisher's exact test; comparisons are made by grouping according to stroke recurrence status to screen relevant factors.

[0014] Preferably, the model validation and sensitivity analysis include: Internal validation steps: split-half cross-validation is repeated to calculate the mean area under the curve (AUC) and the stability index, and the Bootstrap resampling method is used to evaluate the robustness of the model parameters; Subgroup analysis steps: stratify the analysis by gender, age, atrial fibrillation type, and anticoagulant type, and compare the predictive efficacy of the model in each subgroup; Sensitivity analysis steps: exclude lost-to-follow-up cases and refit the model to assess the impact of loss to follow-up on the results; remove variables with more than 10% missing values to verify model stability.

[0015] Preferably, the survival analysis includes: analyzing the time to recurrence, using a Cox proportional hazards regression model to calculate the hazard ratio and 95% CI, plotting Kaplan-Meier survival curves, and comparing differences between groups using the Log-rank test.

[0016] Preferably, in the follow-up plan development step, the follow-up period is three years, and multiple follow-up visits are conducted.

Compared with the prior art, the beneficial effects achieved by the present invention are as follows: The method for constructing a stroke recurrence risk assessment model provided by this invention can significantly improve the accuracy and specificity of patient recurrence risk assessment by analyzing the stroke recurrence rate, recurrence pattern and independent risk factors and establishing a risk assessment model, effectively solving the pain point of lacking suitable tools for clinical risk prediction.

[0017] This model can identify key factors affecting recurrence and construct a nomogram risk prediction model based on independent risk factors, thereby guiding the optimization of anticoagulation / antiplatelet therapy regimens and helping to reduce the recurrence rate, disability and mortality of stroke.

Brief Description of the Drawings

[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a schematic diagram illustrating the construction and validation of a recurrent ischemic stroke (IS) risk prediction model using least absolute shrinkage and selection operator (LASSO) regression in an embodiment of the present invention. (B) LASSO regression analysis identified key predictors of recurrent IS. (C) Receiver operating characteristic (ROC) curve analysis showed that the predictive model for recurrent ischemic stroke has excellent discriminative ability, with an area under the curve (AUC) of 0.917 (95% CI: 0.876–0.958).

[0020] Figure 2 is a schematic diagram of the nomogram and decision curve analysis for predicting recurrence risk in an embodiment of the present invention. (A) A nomogram of five key predictive indicators, hsCRP (high-sensitivity C-reactive protein), PLT (platelet count), RBP (retinol-binding protein), DB (direct bilirubin), and EF (ejection fraction), is used to calculate an individual recurrence risk score; the higher the total score, the greater the risk of recurrence.

[0021] (B) Decision curve analysis (DCA) compares the clinical net benefits of the LASSO regression-based predictive model (red curve) with the all-variable model (black curve), showing that the LASSO model performs better in identifying high-risk patients requiring early intervention.

Figure 3 is a schematic diagram illustrating the predictive efficacy and long-term stability of the LASSO-selected Cox model for the recurrence of ischemic stroke in an embodiment of the present invention. (A) Univariate predictive power. (B) Time-dependent C-index analysis shows that the C-index of the LASSO-selected Cox model increased from 0.650 at 1 year to 0.760 at 7 years, significantly outperforming random guessing (C-index = 0.5) and demonstrating continuously improving long-term predictive accuracy and stability.

Figure 4 is a Kaplan-Meier analysis chart of the recurrence probability for each risk group in an embodiment of the present invention.

Figure 5 shows that, according to the Cox proportional hazards model using restricted cubic splines (RCS) in an embodiment of the present invention, there is a significant nonlinear association between age and recurrence risk (overall P = 0.029, nonlinear P = 0.021).

Figure 6 is a schematic diagram comparing the cumulative recurrence rate of ischemic stroke over 7 years between the anticoagulation group and the non-anticoagulation group in an embodiment of the present invention.

Detailed Implementation

[0022] The embodiments of the present invention will be described in detail below with reference to the accompanying drawings. These embodiments are implemented based on the technical solution of the present invention and provide detailed implementation methods and specific operation processes. However, the scope of protection of the present invention is not limited to the following embodiments.

[0023] This embodiment discloses a method for constructing a stroke recurrence risk assessment model, including the following steps: Standard population determination steps: Inclusion criteria: ① Meets the 2013 American College of Cardiology / Stroke Association diagnostic criteria for stroke and the 2014 American College of Cardiology diagnostic criteria for atrial fibrillation; ② Has a history of non-valvular atrial fibrillation and is experiencing its first spontaneous stroke or transient ischemic attack (TIA); ③ Was hospitalized at the research center within 7 days of the onset of the disease; ④ Is older than 18 years and has voluntarily signed an informed consent form.

[0024] Exclusion criteria: ① Severe coagulation dysfunction, liver and kidney dysfunction (creatinine clearance <30 mL / min); ② Life expectancy <3 months; ③ Significant sequelae of previous stroke (mRS score >3); ④ Poor compliance and inability to complete follow-up.

[0025] Selection method and source: Inpatients who meet the inclusion criteria were searched through the electronic medical record system of each center and included after the investigators assessed their eligibility.

[0026] The cases were sourced from the neurology, emergency and stroke units of various centers.

[0027] Follow-up strategy: A combination of outpatient and telephone follow-up was adopted, with follow-up periods of 3 months, 6 months, 1 year, 2 years and 3 years after discharge. Each follow-up recorded relapse events, medication use, laboratory indicators and functional scores.

[0028] Loss to follow-up is defined as two consecutive failures to respond to follow-up visits and inability to be contacted. The last known status of the lost to follow-up person must be recorded.

[0029] Sample size determination steps: Use a logistic regression model to estimate the sample size required for independent risk factors. The parameter settings are as follows:
Expected recurrence rate: the 3-year recurrence rate for patients with atrial fibrillation and cerebral infarction is approximately 25% (P0 = 0.25).
Statistical power: 1-β = 0.80; significance level: α = 0.05.
It is expected that 5 independent risk factors will be tested, with at least 10 events required for each factor (EPV = 10). Number of events required = 5 × 10 = 50 cases.
Sample size calculation formula: N = [Z_{α/2}·√(2·P0·(1-P0)) + Z_β·√(2·P1·(1-P1))]² / (P1-P0)²
Assuming the minimum effect size OR = 1.8 (P1 = 0.35), the calculated N = 312 cases. Considering a 20% dropout rate, the final sample size was adjusted to 500 cases.
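The sample size formula and the events-per-variable rule above can be checked numerically. The following is a minimal Python sketch that only evaluates the stated formula with the stated parameters (P0 = 0.25, P1 = 0.35, α = 0.05, power = 0.80); the function names are illustrative and not part of the original method, and the calculation in the embodiment itself is attributed to dedicated software.

```python
from math import sqrt
from statistics import NormalDist


def sample_size(p0: float, p1: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Evaluate N = [Z_{a/2}*sqrt(2*P0*(1-P0)) + Z_b*sqrt(2*P1*(1-P1))]^2 / (P1-P0)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile matching power 1 - beta
    numerator = (z_alpha * sqrt(2 * p0 * (1 - p0))
                 + z_beta * sqrt(2 * p1 * (1 - p1))) ** 2
    return numerator / (p1 - p0) ** 2


def events_required(n_factors: int, events_per_variable: int = 10) -> int:
    """Events-per-variable (EPV) rule: number of factors times events per factor."""
    return n_factors * events_per_variable


n = sample_size(p0=0.25, p1=0.35)
events = events_required(n_factors=5)
print(f"N before dropout adjustment: {n:.1f}; events required: {events}")
```

With these inputs the formula evaluates to a value between 312 and 313, which matches the reported N = 312 when truncated.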

[0030] The calculation software is G*Power 3.1. The calculation is based on the main indicator "recurrence rate" and meets the needs of multifactor analysis and model validation.

[0031] Investigation content (collected using case report forms). Exposure and baseline indicators: ① General information: initials of name (in pinyin), gender, age, ethnicity, education level, body mass index, family income, etc.; ② History of atrial fibrillation: type (paroxysmal / persistent / permanent), treatment regimen (anticoagulant class, initiation time, dosage), control status (episode frequency, international normalized ratio (INR) value); ③ Stroke-related factors: onset time of symptoms, infarct location (results of head computed tomography (CT) / magnetic resonance imaging (MRI) scans), hemorrhagic transformation classification, etiological classification (Trial of Org 10172 in Acute Stroke Treatment (TOAST) criteria); ④ Risk factors: hypertension, diabetes, smoking history, dyslipidemia, history of stroke, etc.; ⑤ Medication history: antiplatelet / anticoagulation regimen and lipid-lowering therapy before the onset of the disease.

[0032] Outcomes and follow-up indicators: ① Main outcome: timing, type (ischemic / hemorrhagic), and imaging evidence of stroke recurrence within 3 years; ② Secondary outcomes: peripheral vascular events, massive hemorrhage, all-cause mortality; ③ Functional scores: National Institutes of Health Stroke Scale (NIHSS), Modified Rankin Scale (mRS), and Barthel Index (BI) scores at admission and follow-up.

[0033] Laboratory and imaging indicators: complete blood count, liver and kidney function tests, coagulation function (international normalized ratio (INR), D-dimer), blood lipids, homocysteine, and C-reactive protein; results of transcranial Doppler ultrasound (TCD) / CT angiography (CTA) / magnetic resonance angiography (MRA) of head and neck vessels, to determine whether there is vascular stenosis or occlusion.

[0034] Confounding and effect-modifying factors: age, sex, anticoagulation therapy adherence, blood pressure / blood glucose control, and complications (e.g., pneumonia, deep vein thrombosis).

[0035] Data collection method: Inpatient data were extracted by researchers who had received standardized training through the electronic medical record system. During follow-up, long-term data were collected through outpatient examinations and telephone questionnaires. The data was entered into the database using a double entry method, and the data integrity was checked monthly.

[0036] The Case Report Form (CRF) was designed with reference to the "CRF Template" and included items for the entire lifecycle of the screening, treatment and follow-up periods to ensure the accuracy of exposure-outcome association analysis.

[0037] Follow-up plan development steps: Determine the follow-up cycle and follow-up content. 1. Baseline follow-up (Visit 0, at enrollment) Time point: Within 7 days of admission (screening period).

[0038] content: Sign an informed consent form and collect basic information (name, age, medical history, etc.). Baseline tests: complete blood count, liver and kidney function tests, coagulation function (international normalized ratio (INR)), blood lipids, blood glucose, homocysteine, and C-reactive protein; Imaging: computed tomography (CT) / magnetic resonance imaging (MRI) of the head (to determine the location and area of the infarction), transcranial Doppler ultrasound (TCD) / CT angiography (CTA) of the head and neck vessels (to assess vascular stenosis); Functional scores: National Institutes of Health Stroke Scale (NIHSS), Glasgow Coma Scale (GCS), and Modified Rankin Scale (mRS) scores at admission.

[0039] 2. First follow-up visit (Visit 1, 3 months after discharge) Time point: 90±14 days after discharge.

[0040] content: Clinical events: Whether a recurrence of stroke or hemorrhagic complications occurred; Medication status: adherence to anticoagulants / antiplatelet drugs (dosage, regularity of medication), lipid-lowering / antihypertensive regimen; Laboratory tests: complete blood count, liver and kidney function tests, and coagulation function tests (international normalized ratio (INR)); Functional scores: Modified Rankin Scale (mRS) and Barthel Index (BI).

[0041] 3. Second follow-up (Visit 2, 6 months after discharge) Time point: 180±30 days after discharge.

[0042] content: Clinical events: For relapsed cases, record the time of relapse, type (ischemic / hemorrhagic), and imaging data; Atrial fibrillation control: record of episode frequency and anticoagulation therapy adjustments; Laboratory tests: blood lipids, blood glucose, INR; Imaging: If necessary, repeat head CT / MRI (if new symptoms appear).

[0043] 4. Third follow-up (Visit 3, 1 year after discharge) Time point: 365±60 days after discharge.

[0044] content: Comprehensive assessment: statistics on relapse cases, all-cause mortality events; Laboratory tests: complete blood count, liver and kidney function tests, coagulation function tests, and inflammatory markers (CRP); Functional scores: mRS, NIHSS, BI index (assessing long-term prognosis); Comorbidities: New risk factors such as hypertension and diabetes were recorded.

[0045] 5. Fourth follow-up visit (Visit 4, 2 years after discharge) Time point: 730±90 days after discharge.

[0046] content: Relapse events: Focus on risk factors for delayed relapse (>1 year); Medication adjustments: Long-term safety of anticoagulation therapy (e.g., gastrointestinal bleeding, liver damage); Imaging: CTA of head and neck vessels (to assess the progression of vascular lesions).

[0047] 6. Fifth follow-up visit (Visit 5, 3 years after discharge) Time point: 1095±120 days after discharge.

[0048] content: Final outcomes: Statistics on stroke recurrence rate and recurrence patterns (ischemic vs. hemorrhagic); Laboratory tests: Complete set of baseline indicators were re-examined (compared with those at enrollment); Model validation: The consistency between the relapse risk model prediction and the actual relapse rate; Data summary: Complete the final update of the Case Report Form (CRF) and collect the cause of death for each death (whether it is related to cardiovascular or cerebrovascular disease).
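The visit windows listed above (90±14, 180±30, 365±60, 730±90 and 1095±120 days after discharge) can be encoded so that each recorded visit is checked against its window. This is a minimal Python sketch; the table name `VISIT_WINDOWS`, the function `in_window`, and the example dates are hypothetical and only illustrate the schedule as stated.

```python
from datetime import date

# Visit number -> (target day after discharge, tolerance in days), as listed in the schedule
VISIT_WINDOWS = {
    1: (90, 14),
    2: (180, 30),
    3: (365, 60),
    4: (730, 90),
    5: (1095, 120),
}


def in_window(discharge: date, visit: date, visit_no: int) -> bool:
    """Return True if the visit date falls inside the window for the given visit number."""
    target, tolerance = VISIT_WINDOWS[visit_no]
    days_since_discharge = (visit - discharge).days
    return abs(days_since_discharge - target) <= tolerance


# Hypothetical example: discharge on 2024-01-01, visit on 2024-04-05 (95 days later)
discharge_day = date(2024, 1, 1)
visit_day = date(2024, 4, 5)
print(in_window(discharge_day, visit_day, visit_no=1))
print(in_window(discharge_day, visit_day, visit_no=2))
```

A check like this could be run at data entry to flag visits recorded outside their window; how such visits are handled is not specified in the source.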

[0049] Follow-up operation procedures and quality control. Follow-up method: Outpatient follow-up: Patients visit the neurology department of each center to complete laboratory and imaging examinations; Telephone follow-up: For patients unable to visit an outpatient clinic, information such as medication use and symptom changes is collected via telephone questionnaires, and laboratory data are recorded from scanned copies of the examination reports provided by the patient.

[0050] Handling of loss to follow-up: Definition: Two consecutive unresponsive follow-up visits and failed attempts to contact the individual via phone / SMS; Measures: Record the last known status (e.g., "follow-up until January 2024, no recurrence") and include it in the intention-to-treat (ITT) analysis.

[0051] Data Management: Complete the electronic entry of the CRF form within 72 hours after each follow-up visit, and double-check the copies; Laboratory data is integrated with the hospital information system (HIS) to ensure the accuracy of objective indicators; Imaging data are archived by the central radiology department and marked "for research purposes only".

[0052] Data management and statistical analysis steps: Data management: including paper / electronic forms, database creation and entry, whether double entry is required, whether electronic data capture is used, database cleaning and locking, data archiving, etc.

[0053] Statistical analysis section: 1. Data preprocessing and variable definition. Data management: The Case Report Form (CRF) data were entered into a spreadsheet using a double entry method, and logical validation was performed using Epidata 3.1 software to ensure that there were no formatting errors and that missing values were marked consistently ("ND" = not done, "UK" = unknown, "NA" = not applicable).

[0054] Missing values for continuous variables were handled using multiple imputation by chained equations (MICE). Missing values for categorical variables were classified as "not exposed" or "none". The plausibility of the imputation results was verified using the mice package in R.

[0055] Variable standardization: Continuous variables (such as age and blood lipid levels) are standardized using Z-scores, categorical variables (such as atrial fibrillation type and type of anticoagulant) are converted into dummy variables, and ordinal variables (such as modified Rankin Scale (mRS) scores) retain their original rank.
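The three encodings described above (Z-scores for continuous variables, dummy variables for categorical variables, unchanged ranks for ordinal variables) can be sketched in a few lines of Python. The helper names, the choice of the sample standard deviation, the choice of reference level, and the example values are all assumptions made for illustration; the source does not specify them.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a continuous variable: (x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def dummy_encode(values, reference):
    """One 0/1 indicator per non-reference level of a categorical variable."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]


# Hypothetical example data (not from the patent)
ages = [60, 70, 80]
af_types = ["paroxysmal", "persistent", "permanent"]
mrs_scores = [0, 2, 3]  # ordinal variable: kept at its original rank

age_z = z_scores(ages)
af_dummies = dummy_encode(af_types, reference="paroxysmal")
print(age_z)
print(af_dummies)
print(mrs_scores)
```

Dropping one reference level avoids perfect collinearity among the indicators when the dummies enter a regression model.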

[0056] 2. Descriptive statistical analysis: Baseline feature description: Continuous variables: those conforming to a normal distribution are described by mean ± standard deviation (Mean ± SD), and those not conforming to a normal distribution are described by median (interquartile range) [M (Q1, Q3)]. Categorical variables: described by the number of cases (percentage) [n (%)].

[0057] Intergroup comparisons were performed using the chi-square (χ²) test or Fisher's exact test.

[0058] Group comparison: Baseline characteristics were compared based on stroke recurrence status (recurrence group vs. non-recurrence group) to screen for potentially related factors.
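For a categorical baseline characteristic with two levels, the comparison between the recurrence and non-recurrence groups reduces to a 2×2 table. The following Python sketch computes the Pearson chi-square statistic (without continuity correction) and its p-value, using the fact that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(√(x/2)). The counts are hypothetical, and Fisher's exact test, which would be preferred for small expected counts, is not implemented here.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # upper-tail probability for chi-square with 1 df
    return chi2, p_value


# Hypothetical counts: rows = factor present / absent, columns = recurrence / no recurrence
chi2, p_value = chi_square_2x2(30, 20, 20, 30)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```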

[0059] 3. Inferential statistical analysis: Univariate analysis: Logistic regression analysis was used to analyze the association between each variable and the recurrence rate. The OR value and 95% confidence interval (CI) were calculated, and variables with P < 0.1 were selected for inclusion in the multivariate model.
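For a single binary predictor, the maximum-likelihood odds ratio from a univariate logistic regression equals the cross-product ratio of the 2×2 table, and the Wald standard error of the log odds ratio is the square root of the sum of the reciprocal cell counts. The Python sketch below uses that shortcut to illustrate the screening rule (retain variables with P < 0.1); the counts and the function name are hypothetical, and continuous predictors would require fitting the regression itself.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_wald(a: int, b: int, c: int, d: int, level: float = 0.95):
    """OR, Wald confidence interval and two-sided p-value for a 2x2 table.

    a: exposed with recurrence      b: exposed without recurrence
    c: unexposed with recurrence    d: unexposed without recurrence
    """
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(log_or / se)))
    ci = (exp(log_or - z_crit * se), exp(log_or + z_crit * se))
    return exp(log_or), ci, p_value


# Hypothetical counts (not from the patent)
odds_ratio, (ci_low, ci_high), p = odds_ratio_wald(30, 20, 20, 30)
retained = p < 0.1  # screening threshold for entry into the multivariate model
print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f}), p = {p:.3f}, retained = {retained}")
```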

[0060] Multivariate Logistic Regression Model: With "recurrence of stroke within 3 years" as the dependent variable (yes = 1, no = 0), the variables selected by univariate analysis and clinically important confounding factors (such as age, sex, and anticoagulation therapy) were included, and the final model was constructed using a stepwise forward approach (αin = 0.05, αout = 0.1).

[0061] Model hypothesis testing: Hosmer-Lemeshow test (P>0.05 indicates a good model fit).

[0062] Risk prediction model construction: Based on multi-factor regression coefficients, a nomogram model is constructed, and the weights of each variable are transformed into a visual scoring system, which is implemented using the rms package in R language.

[0063] The nomogram model is constructed based on a pre-built Cox proportional hazards regression model to predict the patient's survival probability. The Cox proportional hazards regression model includes a hazard function and a survival function.

[0064] The risk function is used to quantify the instantaneous event risk of an individual at time t, and the survival function is used to convert the risk into the probability that the individual survives to time t.

[0065] By analyzing the recurrence rate, recurrence patterns, and independent risk factors of stroke and establishing a risk assessment model, the accuracy and specificity of patient recurrence risk assessment can be significantly improved, effectively addressing the pain point of lacking suitable tools for clinical risk prediction.

[0066] This model can identify key factors affecting recurrence and construct a nomogram risk prediction model based on independent risk factors, thereby guiding the optimization of anticoagulation / antiplatelet therapy regimens and helping to reduce the recurrence rate, disability and mortality of stroke.

[0067] In some embodiments, the risk function formula is: h(t|X) = h0(t)exp[β1·rcs(age, 3) + β2·sex + β3·history of diabetes + β4·history of hypertension + β5·history of hyperlipidemia + β6·carotid and cerebral arteriosclerosis + β7·anticoagulant drugs + β8·lipid-lowering drugs + β9·ejection fraction + β10·hemoglobin + β11·platelet count + β12·fasting blood glucose + β13·direct bilirubin + β14·total protein + β15·bicarbonate + β16·creatinine clearance + β17·retinol-binding protein + β18·high-sensitivity C-reactive protein + β19·drinking habits]; in the formula, h0(t) represents the baseline risk function, exp(·) is the natural exponential function whose value here is the hazard ratio relative to baseline, and β1 to β19 represent the regression coefficients. rcs(age, 3) indicates a restricted cubic spline with 3 knots fitted on the variable "age". Gender is a binary qualitative variable (e.g., male = 1 / female = 0). History of diabetes, hypertension, and hyperlipidemia are all binary qualitative variables (yes = 1 / no = 0). Carotid and cerebral artery atherosclerosis is a binary / multi-category qualitative variable (yes = 1 / no = 0; or mild / moderate / severe = 1 / 2 / 3). Anticoagulant medication (history of use) and lipid-lowering medication (history of use) are binary qualitative variables (yes = 1 / no = 0). Ejection fraction (a cardiac function indicator) is a continuous variable (e.g., 55%, 60%). Hemoglobin and total protein are continuous variables (unit: g / L). Platelet count is a continuous variable (unit: ×10^9 / L). Fasting blood glucose, direct bilirubin, and bicarbonate are also continuous variables (unit: μmol / L). Creatinine clearance rate is a continuous variable (reflecting renal function, unit: mL / min). Retinol-binding protein is a continuous variable (reflecting nutrition / renal function). High-sensitivity C-reactive protein is a continuous variable (inflammatory marker, unit: mg / L). Alcohol consumption is a binary / multi-category variable (drinking = 1 / not drinking = 0; or small amount / moderate amount / large amount = 1 / 2 / 3). The standard Cox model assumes a linear relationship between each covariate and the log hazard (e.g., risk increases by a fixed multiple for every additional year of age), but in actual clinical practice the effect of age on survival may be non-linear (e.g., risk increases slowly at ages ≤60 and rapidly at ages >60).

[0068] rcs(Age, 3) fits this nonlinear relationship with a smooth curve, so that the model more closely follows the patterns in the real data. β1 is the comprehensive regression coefficient after this nonlinear fitting (no longer a simple linear coefficient). In exp[β1·rcs(Age, 3) + ... + β19·drink], the expression inside the brackets is the linear combination of "all covariates × corresponding regression coefficients" (even though rcs(Age, 3) is a nonlinear fit, it ultimately enters the calculation as one term of the linear combination). After transformation by exp() (the natural exponential), the result is the comprehensive hazard ratio (HR), which represents how many times the individual's risk is relative to that of the baseline population (covariates at their reference values). For example, if this exponential term equals 2.0 for a patient, the patient's instantaneous risk of the endpoint event is 2 times that of the baseline population; if it equals 0.5, the risk is 50% of that of the baseline population (i.e., a 50% reduction).
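The step from the linear combination β'X to the hazard ratio exp(β'X) can be illustrated with a short Python sketch. The two coefficients below are hypothetical values chosen only so that the results reproduce the 2.0 and 0.5 examples in the text; they are not fitted coefficients from the model, and the spline term for age is omitted for brevity.

```python
from math import exp, log


def hazard_ratio(coefficients: dict, covariates: dict) -> float:
    """exp(beta'X): hazard relative to an individual with all covariates at reference (0)."""
    linear_predictor = sum(coefficients[name] * covariates[name] for name in coefficients)
    return exp(linear_predictor)


# Hypothetical coefficients for two binary covariates (illustration only)
betas = {"history_of_diabetes": log(2.0), "anticoagulant_drugs": log(0.5)}

hr_diabetes_only = hazard_ratio(betas, {"history_of_diabetes": 1, "anticoagulant_drugs": 0})
hr_anticoagulant_only = hazard_ratio(betas, {"history_of_diabetes": 0, "anticoagulant_drugs": 1})
hr_both = hazard_ratio(betas, {"history_of_diabetes": 1, "anticoagulant_drugs": 1})
print(hr_diabetes_only, hr_anticoagulant_only, hr_both)
```

Because the coefficients add on the log scale, the hazard ratios multiply: in this hypothetical case the two effects cancel and the combined hazard ratio is 1.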

[0069] In some embodiments, the survival function is: S(t|X) = S0(t)^exp(β'X). In the formula, S0(t) represents the baseline survival function, which corresponds one-to-one with the baseline hazard h0(t) and is the probability that the population survives to time t when all covariates take their reference values; it depends only on time t (for example, a 3-year survival probability of the baseline population of S0(3) = 0.8, that is, 80%). exp(β'X) represents the comprehensive hazard ratio and is used here as the power (exponential scaling factor) of S0(t), adjusting the baseline survival probability to obtain the survival probability of the individual. In this way the survival function converts "hazard" into an intuitive "survival probability". β'X represents the linear combination of covariates (i.e., β1·rcs(Age, 3) + ... + β19·drink), where β' is the transpose of the regression coefficient vector and X is the covariate vector; in essence it is the comprehensive hazard score of the covariates.

[0070] Mathematically, the survival function applies a "power adjustment" to the baseline survival probability through the hazard ratio. The scaling rules are: ① if exp(β'X) = 1 (the individual has the same risk as the baseline population), then S(t|X) = S0(t)^1 = S0(t), and the individual's survival probability equals that of the baseline population; ② if exp(β'X) > 1 (the individual's risk is higher than the baseline), then S(t|X) < S0(t), because for a base 0 < S0(t) < 1 a larger exponent gives a smaller result, so the individual's survival probability is lower than that of the baseline population; ③ if exp(β'X) < 1 (the individual's risk is lower than the baseline), then S(t|X) > S0(t), so the individual's survival probability is higher than that of the baseline population.

[0071] Specific example: Assuming the baseline population's 5-year survival probability is S0(5) = 0.7 (70%) and a patient's exp(β'X) = 1.5 (the risk is 1.5 times that of the baseline), the patient's 5-year survival probability is S(5|X) = 0.7^1.5 ≈ 0.586 (about 59%), which intuitively reflects the difference in patients' prognoses.
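The power adjustment and the worked example can be checked with a few lines of Python. This is a sketch only; the function name `individual_survival` is introduced here for illustration and is not part of the original method description.

```python
def individual_survival(baseline_survival, hazard_ratio):
    """Apply S(t|X) = S0(t) ** exp(beta'X), where hazard_ratio stands for exp(beta'X)."""
    if not 0.0 < baseline_survival <= 1.0:
        raise ValueError("baseline survival must lie in (0, 1]")
    if hazard_ratio <= 0.0:
        raise ValueError("hazard ratio must be positive")
    return baseline_survival ** hazard_ratio


s_same = individual_survival(0.7, 1.0)  # same risk as the baseline population
s_high = individual_survival(0.7, 1.5)  # 1.5 times the baseline hazard
s_low = individual_survival(0.7, 0.5)   # half the baseline hazard

print(round(s_same, 3), round(s_high, 3), round(s_low, 3))
```

With S0(5) = 0.7, a hazard ratio of 1.5 gives a survival probability of about 0.586, while a hazard ratio below 1 raises the survival probability above the baseline, matching the three scaling rules.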

[0072] The intrinsic relationship between the survival function and the hazard (risk) function: S(t|X) and h(t|X) are dual (in one-to-one correspondence, each derivable from the other). The Cox model first quantifies individual risk as h(t|X) = h0(t)exp(β'X). Then, from the definition of the survival function, S(t|X) = exp[-∫_0^t h(u|X)du], it follows that S(t|X) = S0(t)^exp(β'X), where S0(t) = exp[-∫_0^t h0(u)du] is the integral transform of the baseline hazard. In essence, this converts the cumulative effect of "instantaneous risk" into the probability of "surviving to time t", which is closer to how clinicians and patients understand "prognosis" (they are more concerned with "how long to live and what the probability of survival is" than with the abstract "instantaneous risk").
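The derivation summarized in this paragraph can be written out step by step. The following is a sketch that assumes time-fixed covariates X; the symbol \Lambda_0(t) for the cumulative baseline hazard is introduced here for convenience and does not appear in the original text.

```latex
\begin{align}
\Lambda_0(t) &= \int_0^t h_0(u)\,du, \qquad S_0(t) = \exp[-\Lambda_0(t)] \\
S(t \mid X) &= \exp\Big[-\int_0^t h(u \mid X)\,du\Big]
             = \exp\Big[-\exp(\beta'X)\int_0^t h_0(u)\,du\Big] \\
            &= \big\{\exp[-\Lambda_0(t)]\big\}^{\exp(\beta'X)}
             = S_0(t)^{\exp(\beta'X)}
\end{align}
```

The factor exp(β'X) can be taken outside the integral only because it does not depend on u, which is where the proportional hazards assumption enters.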

[0073] In some embodiments, the nomogram model is based on the Cox proportional hazards regression model, defines the survival probability calculation method for multiple key time points, hides clinically insignificant intermediate indicators, and sets survival probability scales and labels. The specific function code for the nomogram model is as follows:

    nom_cox <- nomogram(
      cox_model,
      fun = list(
        function(x) predict(cox_model, newdata = as.data.frame(t(x)), type = "survival", times = 1),
        function(x) predict(cox_model, newdata = as.data.frame(t(x)), type = "survival", times = 2),
        function(x) predict(cox_model, newdata = as.data.frame(t(x)), type = "survival", times = 3)
      ),
      lp = FALSE,
      fun.at = seq(0.1, 0.9, by = 0.1),
      funlabel = c(
        "1-year Survival Probability",
        "2-year Survival Probability",
        "3-year Survival Probability"
      )
    )

Here, nom_cox <- nomogram(cox_model, ...) means: a nomogram object is created using the nomogram function (usually from the rms package, used to build clinical nomograms), and the result is assigned to nom_cox. The nomogram can then be displayed using plotting functions.

[0074] The core input is cox_model: a previously fitted Cox proportional hazards regression model that has already identified the key factors affecting the patient outcome (such as age, anticoagulant use, biochemical indicators, etc.).

[0075] fun = list( function(x) predict(cox_model, newdata = as.data.frame(t(x)), type = "survival", times = 1), function(x) predict(cox_model, newdata = as.data.frame(t(x)), type = "survival", times = 2), function(x) predict(cox_model, newdata = as.data.frame(t(x)), type = "survival", times = 3) ) means: The `fun` parameter defines the predictive metrics to be output by the nomogram (in this case, the survival probability at three time points). It accepts a list of three custom functions, each corresponding to the survival probability calculation at one time point. The specific logic is as follows: the input x of each custom function represents the combination of values of the predictor variables (such as age and ejection fraction) in the nomogram, i.e., the complete indicator data for a particular patient.

[0076] predict(cox_model, ...): Invokes the prediction function of the Cox model to calculate survival-related outcomes based on the input patient data.

[0077] `newdata = as.data.frame(t(x))`: Transposes the input `x` (`t(x)`) and converts it to a data frame. The purpose is to match the "data frame format" input required by the `predict()` function, ensuring that the Cox model can correctly identify the patient's various indicator values.

[0078] type = "survival": Specifies the prediction type as "survival probability" (instead of the hazard ratio, the linear predictor, etc.).

[0079] times = 1 / 2 / 3: Specifies the prediction time points as 1 year, 2 years, and 3 years respectively, and outputs the survival probability in 1 year, 2 years, and 3 years respectively.

[0080] The three list entries correspond to the 3 time points, and the final nomogram displays the prediction results for these 3 time points side by side for easy comparison.

[0081] `fun.at = seq(0.1, 0.9, by = 0.1)` defines the scale range of the survival probability. `fun.at` specifies the coordinate axis scale value of the final indicator "survival probability" in the nomogram. `seq(0.1, 0.9, by = 0.1)` generates a sequence from 0.1 to 0.9 with a step size of 0.1 (i.e., 0.1, 0.2, ..., 0.9), meaning that the survival probability scale of the nomogram will clearly mark these 9 values. The purpose is to make the reading of survival probability more intuitive and accurate, avoiding the difficulty of interpretation caused by overly dense or sparse scales, while covering the survival probability range (10%~90%) that is of most concern in clinical practice.

[0082] `funlabel = c(...)` adds clear labels (Chinese or English) to the survival probability indicators. The `funlabel` argument assigns readable labels to the three predictive indicators (survival probabilities at three time points) defined by the `fun` parameter. These labels are displayed on the nomogram, here as "1-year Survival Probability," "2-year Survival Probability," and "3-year Survival Probability," respectively. This allows those viewing the nomogram (such as clinicians) to quickly distinguish the survival probability results at different time points, avoids confusion, and improves the practicality of the nomogram.

[0083] Model performance evaluation: Discrimination: the area under the receiver operating characteristic curve (AUC) is calculated; an AUC > 0.7 indicates that the model has clinical value. Calibration: a calibration curve is plotted to compare the predicted probability with the actual recurrence rate. Clinical applicability: decision curve analysis (DCA) curves are plotted to evaluate the net benefit of the model at different threshold probabilities.
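To make the discrimination metric concrete, the AUC can be computed as the fraction of (recurrence, non-recurrence) patient pairs in which the recurrence case receives the higher predicted risk. The Python sketch below uses that pairwise definition; the six predicted probabilities and outcomes are hypothetical illustration data, not results from this application.

```python
def auc(scores, labels):
    """Probability that a random recurrence case outranks a random non-recurrence case.

    Ties between the two scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case in each outcome class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted recurrence probabilities and observed outcomes (1 = recurrence).
predicted = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
recurred = [1, 1, 1, 0, 0, 0]

model_auc = auc(predicted, recurred)
print(round(model_auc, 3), model_auc > 0.7)
```

In this toy data 8 of the 9 pairs are ordered correctly, so the AUC is 8/9 and exceeds the 0.7 threshold named in the paragraph.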

[0084] 4. Model Validation and Sensitivity Analysis. Internal validation: A split-half cross-validation method is used (the sample is randomly divided into two halves, which serve alternately as the modeling set and the validation set). The procedure is repeated 100 times, and the mean AUC and stability index (SI) are calculated.

[0085] The robustness of the model parameters was evaluated using the Bootstrap resampling method (1000 iterations).
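The Bootstrap idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a simple recurrence proportion stands in for a model parameter, the percentile method is one of several ways to form the interval, and the outcome vector (51 recurrences among 113 patients) is taken from the cohort figures reported in the embodiment of this specification.

```python
import random


def bootstrap_percentile_interval(values, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(values), resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def proportion(sample):
    """Share of patients with a recurrence (outcomes coded 1 = recurrence, 0 = none)."""
    return sum(sample) / len(sample)


# 51 recurrences among 113 patients, as reported in the embodiment.
outcomes = [1] * 51 + [0] * 62

point_estimate = proportion(outcomes)
lower, upper = bootstrap_percentile_interval(outcomes, proportion, n_resamples=1000)
print(round(point_estimate, 4), round(lower, 3), round(upper, 3))
```

For a regression model, `statistic` would refit the model on each resample and return the coefficient of interest; a narrow interval across the 1000 resamples indicates a stable parameter.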

[0086] Subgroup analysis: Stratified analyses are performed by gender, age (<65 years vs. ≥65 years), atrial fibrillation type (paroxysmal vs. persistent), and anticoagulant type (vitamin K antagonists vs. novel oral anticoagulants) to compare the predictive efficacy of the model in each subgroup.

[0087] Sensitivity analysis: Cases lost to follow-up (n=100) are excluded and the model is refitted to assess the impact of loss to follow-up on the results; variables with more than 10% missing values are removed to verify model stability.

[0088] 5. Survival analysis (for time-dependent outcomes). If the time to recurrence needs to be analyzed, the Cox proportional hazards model is used to calculate the hazard ratio (HR) and 95% CI. Kaplan-Meier survival curves are plotted, and the Log-rank test is used to compare differences between groups.
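For readers unfamiliar with the Kaplan-Meier estimator named above, the following Python sketch shows the product-limit calculation on a tiny hypothetical data set (follow-up times in years; 1 = recurrence observed, 0 = censored). In practice a statistical package would be used; this only illustrates the arithmetic.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event."""
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Observed events (not censorings) exactly at time t.
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up data: five patients, two of them censored.
follow_up_years = [1, 2, 2, 3, 4]
recurrence_observed = [1, 1, 0, 1, 0]

km_curve = kaplan_meier(follow_up_years, recurrence_observed)
for year, probability in km_curve:
    print(year, round(probability, 3))
```

Censored patients leave the risk set without lowering the curve, which is why the step at year 2 uses 4 patients at risk but only 1 event.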

[0089] 6. Statistical software and parameter settings. Software: SPSS 26.0 (descriptive statistics), R 4.2.1 (regression modeling and visualization); Significance level: two-sided α = 0.05, confidence interval 95%; Data visualization: the ggplot2 package is used to plot nomograms, ROC curves, and calibration curves, and the survminer package is used to plot survival curves.

[0090] 7. Expected Outcomes and Clinical Significance. Multivariate analysis is expected to identify 3-5 independent risk factors (such as age, CHA2DS2-VASc score, anticoagulation therapy adherence, LDL-C level, etc.), and the model AUC is expected to reach 0.75-0.80, indicating moderate predictive power.

[0091] Nomograms can integrate clinical indicators and laboratory parameters, providing clinicians with an intuitive tool for assessing recurrence risk and guiding the optimization of anticoagulation therapy strategies.

[0092] In this embodiment, 113 patients with atrial fibrillation (AF) complicated with ischemic stroke (IS) were followed up for 7 years.

[0093] The results showed that 51 patients (45.13%) experienced a recurrence of ischemic stroke.

[0094] Compared with the non-recurrence group, the recurrence group had significantly different clinical characteristics: a higher proportion of males (69%), a greater proportion of patients with a history of hypertension (73%), and a significantly lower rate of anticoagulant use (29%); all of these differences were statistically significant (P<0.05).

[0095] The results suggest that male sex, a history of hypertension, and non-standard anticoagulation therapy may be important risk factors for recurrent ischemic stroke in patients with atrial fibrillation.

[0096] For a detailed comparison of clinical features, please refer to Table 1 of this specification.

[0097] Table 1. Comparison of characteristics between patients with recurrent ischemic stroke (IS) and those without recurrence.

In this embodiment, LASSO regression combined with 10-fold cross-validation was used to screen predictors of ischemic stroke recurrence, and a total of 19 variables with non-zero coefficients were identified.

[0098] The results showed that anticoagulant use (β=-1.011) was a significant protective factor, while male sex (β=0.381), history of diabetes (β=0.178), history of hyperlipidemia (β=0.117), and history of hypertension (β=0.080) were significantly associated with an increased risk of recurrence.

[0099] Other predictive factors identified include clinical indicators such as age, hemoglobin, and platelet count.

[0100] In patients with atrial fibrillation and concurrent ischemic stroke, the use of anticoagulants significantly reduced the risk of recurrent ischemic stroke (P<0.001).

[0101] In addition, elevated platelet count and elevated diastolic blood pressure are also associated with an increased risk of recurrent ischemic stroke (Figure 1A, B).

[0102] Receiver operating characteristic (ROC) curve analysis showed that the predictive model for ischemic stroke recurrence based on LASSO regression screening variables had excellent discriminative ability, with an area under the curve (AUC) of 0.917 (95% confidence interval: 0.876-0.958) (Figure 1C).

[0103] This indicates that the model in this embodiment has important clinical application value in distinguishing whether a patient will experience a recurrence of ischemic stroke.

[0104] In this embodiment, the nomogram includes five key predictors: high-sensitivity C-reactive protein (hsCRP), platelet count (PLT), retinol-binding protein (RBP), diastolic blood pressure (DB), and ejection fraction (EF).

[0105] Each variable corresponds to a score based on its value, and the total score is obtained by adding these scores together.

[0106] The higher the total score, the greater the individual's risk of recurrent ischemic stroke (Figure 2A).
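The scoring logic of a nomogram can be sketched in Python as follows. This is a simplified illustration assuming a purely linear model: the two coefficients and the value ranges are hypothetical, only two of the five predictors are shown, and the rule "the predictor with the widest effect range spans 0 to 100 points" is the usual nomogram convention rather than something stated in this specification.

```python
def nomogram_points(coefficients, ranges, patient):
    """Convert each variable's contribution beta * x into points and sum them."""
    spans = {}
    for name, beta in coefficients.items():
        low, high = ranges[name]
        effects = (beta * low, beta * high)
        spans[name] = (min(effects), max(effects))
    # The variable with the widest effect range defines the 0-100 point scale.
    widest = max(high - low for low, high in spans.values())
    points = {
        name: 100.0 * (coefficients[name] * patient[name] - spans[name][0]) / widest
        for name in coefficients
    }
    return points, sum(points.values())


# Hypothetical coefficients and plausible value ranges (assumptions, not fitted values).
coefficients = {"hsCRP": 0.10, "EF": -0.02}
ranges = {"hsCRP": (0.0, 10.0), "EF": (30.0, 70.0)}
patient = {"hsCRP": 5.0, "EF": 40.0}

points, total_points = nomogram_points(coefficients, ranges, patient)
print({name: round(value, 1) for name, value in points.items()}, round(total_points, 1))
```

A negative coefficient (here EF) assigns more points to lower values, so the total still increases with risk, in line with the statement that a higher total score means a greater recurrence risk.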

[0107] Decision curve analysis (DCA) results showed that, across most risk thresholds, the LASSO regression-based predictive model (red curve) had a significantly higher net clinical benefit than the all-variable model (black curve).

[0108] This indicates that the LASSO model can more accurately identify patients at high risk of relapse who require early intervention (Figure 2B).

[0109] In this embodiment, the model constructed by screening variables through LASSO regression analysis has a significant predictive ability for recurrence of ischemic stroke.

[0110] In the univariate predictive efficacy evaluation, age and anticoagulant use were the strongest predictors, with C-indexes of 0.620 and 0.606, respectively, indicating that these two variables had the best predictive effect on the recurrence of ischemic stroke.

[0111] Other variables such as fasting blood glucose (FBG), history of hypertension, and ejection fraction (EF) also showed some predictive ability, but their predictive efficacy was relatively weak.

[0112] It is noteworthy that the C-index for the alcohol consumption variable was the lowest, at only 0.508, suggesting that it contributed the least to the prediction of ischemic stroke recurrence (Figure 3A).
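The C-index reported here can be understood through Harrell's pairwise definition: among patient pairs whose ordering of recurrence times is known, it is the share in which the patient with the earlier recurrence also has the higher predicted risk. The Python sketch below implements that definition on hypothetical data; it is an illustration and not necessarily the exact estimator used by the authors.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: concordant comparable pairs / all comparable pairs (score ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: years to recurrence or censoring, event flag, predicted risk.
years = [2, 4, 6, 8]
recurred = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(years, recurred, risk)
print(round(c_index, 3))
```

A variable that carries no information gives a value near 0.5, which is why a C-index of 0.508 is read as a minimal contribution.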

[0113] Time-dependent C-index analysis showed that the predictive efficacy of the Cox model constructed using variables selected by LASSO for recurrence of ischemic stroke gradually improved over time.

[0114] Over a time span of 1–7 years, the model’s C-index increased from 0.650 to 0.760, indicating that the model can better capture long-term risk factors, thereby improving prediction accuracy.

[0115] These results highlight the stability and reliability of the model in long-term risk assessment.

[0116] Furthermore, the C-index of the model was significantly higher than the baseline value of random guessing at all time points (C-index = 0.5), further confirming its effectiveness in predicting recurrence of ischemic stroke (Figure 3B).

[0117] Key variables were screened using LASSO regression, and patients' risk scores were calculated and divided into three risk groups: low, medium, and high.

[0118] Kaplan-Meier recurrence curve analysis showed that there were significant differences in recurrence probabilities among the different risk groups (P<0.0001).

[0119] The results showed that the recurrence rate in the low-risk group remained at a low level, while the recurrence rate in the high-risk group was significantly higher, and the medium-risk group was in between.

[0120] This result validates the effectiveness of the LASSO regression model in risk stratification, suggesting that the model can reliably distinguish patient groups with different recurrence probabilities (Figure 4).

[0121] This embodiment uses the Cox proportional hazards model combined with restricted cubic splines (RCS) to analyze the nonlinear relationship between age and the risk of recurrent ischemic stroke.

[0122] The results showed a significant non-linear association between age and hazard ratio (HR) (the overall association P-value was 0.029, and the non-linear part P-value was 0.021).

[0123] Restricted cubic spline analysis showed that the effect of age on recurrence risk peaks around age 70, at which point the hazard ratio is highest.

[0124] Before age 70, the hazard ratio gradually increases with age; after age 70, the hazard ratio begins to decline.

[0125] This finding suggests that patients around 70 years of age may face a higher risk of recurrent ischemic stroke (Figure 5).
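To show how a 3-knot restricted cubic spline of age can produce a curve that rises and then falls, the Python sketch below builds the spline basis using the standard restricted cubic spline construction. The knot locations (55, 70, 85 years) and the two coefficients are hypothetical, and the division by (t3 - t1)^2 is a scaling convention assumed here; none of these values come from the fitted model.

```python
def rcs_basis_3_knots(x, knots):
    """Return (linear term, nonlinear term) of a restricted cubic spline with 3 knots."""
    t1, t2, t3 = knots

    def cube_plus(v):
        # Truncated cubic: zero left of the knot, cubic to the right of it.
        return max(v, 0.0) ** 3

    nonlinear = (
        cube_plus(x - t1)
        - cube_plus(x - t2) * (t3 - t1) / (t3 - t2)
        + cube_plus(x - t3) * (t2 - t1) / (t3 - t2)
    )
    return x, nonlinear / (t3 - t1) ** 2


def log_hazard(age, knots, beta_linear, beta_nonlinear):
    """Contribution of age to the linear predictor: beta_linear * x + beta_nonlinear * spline."""
    linear, nonlinear = rcs_basis_3_knots(age, knots)
    return beta_linear * linear + beta_nonlinear * nonlinear


knots = (55.0, 70.0, 85.0)  # hypothetical knot locations
curve = {age: log_hazard(age, knots, 0.08, -0.25) for age in (50, 60, 70, 80, 90)}
for age, value in curve.items():
    print(age, round(value, 3))
```

With three knots the spline contributes two columns to the model (age itself and one nonlinear term), and the construction forces the curve to be linear beyond the outer knots, which keeps the tails from behaving erratically where data are sparse.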

[0126] In this embodiment, 114 patients with atrial fibrillation and concurrent ischemic stroke were followed up for 7 years.

[0127] The results showed a significant difference in the 7-year cumulative recurrence rate of ischemic stroke between the anticoagulation therapy group and the non-anticoagulation therapy group (log-rank test, P=0.001), indicating that anticoagulation therapy can significantly reduce the risk of ischemic stroke recurrence (Figure 6).

[0128] Subgroup analysis showed that anticoagulation therapy reduced the risk of recurrent ischemic stroke by 70% (HR=0.30, 95% confidence interval: 0.16-0.58, P<0.001), indicating that anticoagulation therapy is an independent protective factor for patients with atrial fibrillation and concurrent ischemic stroke.

[0129] Subgroup analysis also found that smoking status had a significant impact on the efficacy of anticoagulation therapy (interaction P=0.012): in non-smokers, anticoagulation therapy significantly reduced the risk of recurrence (HR=0.18, 95% confidence interval: 0.08-0.40, P<0.001), while in smokers a trend in the opposite direction was observed that did not reach statistical significance (HR=2.03, 95% confidence interval: 0.51-8.06, P=0.316).

[0130] Furthermore, the study found that female patients (HR=0.19, P=0.004) and patients without a history of heart failure (HR=0.26, P<0.001) benefited more significantly from anticoagulation therapy.

[0131] Notably, anticoagulation therapy showed consistent clinical benefit in both the high-inflammation (hsCRP ≥ 1 mg/L; HR = 0.31, 95% confidence interval: 0.15–0.64) and low-inflammation (hsCRP < 1 mg/L; HR = 0.13, 95% confidence interval: 0.02–1.06) subgroups, and no significant interaction was observed (interaction P = 0.997).

[0132] There was no statistically significant difference in the low-inflammation group (P=0.056), which is likely due to the small sample size (n=22) leading to insufficient statistical power, rather than the treatment being truly ineffective (Table 2).
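The way the hazard ratios above are read (percentage change in risk, and whether the 95% confidence interval excludes 1) can be made explicit with a short Python sketch. The three HR and interval values are those reported in the preceding paragraphs; the helper name `summarize_hazard_ratio` is introduced only for illustration.

```python
def summarize_hazard_ratio(hr, ci_low, ci_high):
    """Return (percent change in hazard, whether the 95% CI excludes 1)."""
    percent_change = round((hr - 1.0) * 100.0)
    excludes_one = ci_low > 1.0 or ci_high < 1.0
    return percent_change, excludes_one


# Values reported in this specification for anticoagulation therapy.
overall = summarize_hazard_ratio(0.30, 0.16, 0.58)
smokers = summarize_hazard_ratio(2.03, 0.51, 8.06)
low_inflammation = summarize_hazard_ratio(0.13, 0.02, 1.06)

print(overall, smokers, low_inflammation)
```

The overall estimate corresponds to a 70% reduction with an interval that excludes 1, whereas the intervals for smokers and for the low-inflammation subgroup both include 1, consistent with their non-significant P values.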

[0133] Table 2: Subgroup analysis of the efficacy of anticoagulation therapy in patients with atrial fibrillation and ischemic stroke.

[0134]

[0135] In summary, the method for constructing a stroke recurrence risk assessment model provided by this invention can significantly improve the accuracy and specificity of patient recurrence risk assessment by analyzing the recurrence rate, recurrence pattern and independent risk factors of stroke and establishing a risk assessment model, effectively solving the pain point of lacking suitable tools for clinical risk prediction.

[0136] This model can identify key factors affecting recurrence and construct a nomogram risk prediction model based on independent risk factors, thereby guiding the optimization of anticoagulation / antiplatelet therapy regimens and helping to reduce the recurrence rate, disability and mortality of stroke.

[0137] The foregoing has shown and described the basic principles, main features, and advantages of the present invention.

[0138] Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the present invention. Various changes and modifications can be made to the present invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of protection claimed by the present invention.

[0139] The scope of protection claimed by this invention is defined by the appended claims and their equivalents.

Claims

1. A method for constructing a stroke recurrence risk assessment model, characterized in that it comprises the following steps:
a standard population determination step: selecting patients with atrial fibrillation and cerebral infarction who meet the criteria;
a sample size determination step: using a logistic regression model to estimate the sample size required for the analysis of independent risk factors;
a follow-up plan development step: determining the follow-up cycle and follow-up content;
a data management and statistical analysis step: entering and archiving the follow-up data according to the follow-up cycle and content, and performing statistical analysis on the follow-up data; the statistical analysis includes inferential statistical analysis, which includes:
a univariate analysis step: using a logistic regression model to analyze the association between each variable and stroke recurrence, calculating the OR value and 95% confidence interval, and selecting variables with P value < 0.1 to enter the multivariate regression model;
a multivariate regression model construction step: with "stroke recurrence within 3 years" as the dependent variable, including the variables screened by univariate analysis and clinically important confounding factors, and constructing the final model using the stepwise forward method;
a risk prediction model construction step: based on the multivariate regression coefficients, constructing a nomogram model and transforming the weight of each variable into a visual scoring system;
wherein the nomogram model is based on a constructed Cox proportional hazards regression model and is used to predict the patient's survival probability; the Cox proportional hazards regression model includes a hazard function (risk function) and a survival function; the hazard function is used to quantify the instantaneous event risk of an individual at time t, and the survival function is used to convert the risk into the probability that the individual survives to time t.

2. The method for constructing the stroke recurrence risk assessment model according to claim 1, characterized in that the hazard function (risk function) formula is: h(t|X) = h0(t)exp[β1·rcs(age, 3) + β2·sex + β3·history of diabetes + β4·history of hypertension + β5·history of hyperlipidemia + β6·carotid and cerebral arteriosclerosis + β7·anticoagulant drugs + β8·lipid-lowering drugs + β9·ejection fraction + β10·hemoglobin + β11·platelet count + β12·fasting blood glucose + β13·direct bilirubin + β14·total protein + β15·bicarbonate + β16·creatinine clearance + β17·retinol-binding protein + β18·high-sensitivity C-reactive protein + β19·drinking habits]; in the formula, h0(t) represents the baseline hazard function, the exp[·] term represents the hazard ratio, and β1 ~ β19 represent the regression coefficients.

3. The method for constructing the stroke recurrence risk assessment model according to claim 1, characterized in that the survival function formula is: S(t|X) = S0(t)^exp(β'X); in the formula, S0(t) represents the baseline survival function, and exp(β'X) represents the comprehensive hazard ratio.

4. The method for constructing the stroke recurrence risk assessment model according to claim 1, characterized in that the logic of the nomogram model is as follows: based on the Cox proportional hazards regression model, it defines the survival probability calculation method for multiple key time points, hides intermediate indicators with no clinical significance, and sets survival probability scales and labels.

5. The method for constructing the stroke recurrence risk assessment model according to claim 1, characterized in that the statistical analysis also includes: data preprocessing and variable definition, descriptive statistical analysis, model validation and sensitivity analysis, and survival analysis.

6. The method for constructing the stroke recurrence risk assessment model according to claim 5, characterized in that the data preprocessing and variable definition include: importing data into Excel using a double entry method, performing logical validation, and handling missing values; using multiple imputation for missing continuous variables, classifying missing categorical variables as unexposed or nonexistent, standardizing continuous variables, converting categorical variables to dummy variables, and maintaining the original levels for ordinal variables.

7. The method for constructing the stroke recurrence risk assessment model according to claim 5, characterized in that the descriptive statistical analysis includes: for baseline characteristics, representing normally distributed continuous variables by mean ± standard deviation and non-normally distributed variables by median; representing categorical variables by number of cases, with inter-group comparisons performed using the chi-square test or Fisher's exact test; and making comparisons by grouping according to stroke recurrence status to screen relevant factors.

8. The method for constructing the stroke recurrence risk assessment model according to claim 5, characterized in that the model validation and sensitivity analysis include: an internal validation step: using the split-half cross-validation method to repeatedly calculate the mean area under the curve (AUC) and the stability index, and using the Bootstrap resampling method to evaluate the robustness of the model parameters; a subgroup analysis step: stratifying the analysis by gender, age, atrial fibrillation type, and anticoagulant type, and comparing the predictive efficacy of the model in each subgroup; a sensitivity analysis step: excluding lost-to-follow-up cases and refitting the model to assess the impact of loss to follow-up on the results, and removing variables with missing values > 10% to verify model stability.

9. The method for constructing the stroke recurrence risk assessment model according to claim 5, characterized in that the survival analysis includes: analyzing the time to recurrence, using a Cox proportional hazards regression model to calculate the hazard ratio and 95% CI, plotting Kaplan-Meier survival curves, and comparing differences between groups using the Log-rank test.

10. The method for constructing the stroke recurrence risk assessment model according to claim 1, characterized in that, in the follow-up plan development step, the follow-up period is three years and multiple follow-up visits are conducted.