Cardiovascular event prediction method, apparatus, device, storage medium, and program product
By acquiring multidimensional sleep breathing and cardiovascular metabolic parameters, and using a cardiovascular event prediction model for cluster analysis and scoring, the problem of insufficient accuracy in cardiovascular risk assessment for patients with sleep-disordered breathing in existing technologies has been solved, enabling accurate identification and risk assessment of patient subgroups in different physiological states.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- FUWAI HOSPITAL CHINESE ACAD OF MEDICAL SCI & PEKING UNION MEDICAL COLLEGE
- Filing Date
- 2026-02-14
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies have insufficient accuracy in assessing the cardiovascular risk of patients with sleep apnea, and cannot identify patient subgroups with the same AHI but different physiological conditions. This leads to the problem that high-risk patients are not identified in time while low-risk patients are over-intervened.
By acquiring multidimensional sleep breathing monitoring parameters and clinical cardiovascular metabolic parameters, cluster analysis is performed using a cardiovascular event prediction model to form naturally clustered preset phenotypic categories. These categories are then mapped to scores through a linear transformation layer, and risk levels are determined by combining preset rules, thereby achieving accurate risk assessment of cardiovascular events.
It effectively identifies patient subgroups with the same AHI but different physiological states, improving the accuracy and clinical suitability of cardiovascular event risk prediction and solving the problem of insufficient specificity and accuracy of risk prediction in existing technologies.
Description
Technical Field
[0001] This application belongs to the field of cardiovascular event prediction technology, and in particular relates to a cardiovascular event prediction method, apparatus, device, computer storage medium, and program product.
Background Technology
[0002] Sleep-disordered breathing (SDB) is a common sleep-related disorder that is closely associated with a significantly increased risk of various cardiovascular diseases, including hypertension, arrhythmia, heart failure, and stroke. Therefore, accurate cardiovascular risk assessment for patients with sleep-disordered breathing is of significant clinical importance for early identification of high-risk individuals, guiding individualized interventions, and improving patient prognosis.
[0003] Currently, the mainstream clinical method for assessing the severity of sleep-disordered breathing and the associated cardiovascular risk relies mainly on several key indicators obtained from polysomnography (PSG). The most important indicator is the apnea-hypopnea index (AHI), which measures the number of apneas and hypopneas occurring per hour of sleep and is used to classify patients as mild, moderate, or severe. In addition, indicators such as minimum oxygen saturation and the oxygen desaturation index are often used as supplementary references. In clinical practice, physicians typically combine these sleep monitoring parameters with the patient's age, body mass index, blood pressure, and other baseline information for a comprehensive assessment.
[0004] However, the aforementioned existing technologies have significant limitations in practical applications. Clinical practice frequently reveals that patients with the same AHI classification may exhibit significant differences in long-term cardiovascular prognosis, leading to some high-risk patients failing to be identified in a timely manner, while some low-risk patients may receive excessive intervention. Therefore, accurately predicting cardiovascular risk has become a pressing technical problem to be solved in this field.
Summary of the Invention
[0005] This application provides a method, apparatus, device, computer storage medium, and program product for predicting cardiovascular events, which can improve the accuracy of cardiovascular event risk prediction.
[0006] In one aspect, embodiments of this application provide a method for predicting cardiovascular events. The method includes: acquiring multi-dimensional first physiological characteristics of a first target object, the first physiological characteristics including at least sleep breathing monitoring parameters and clinical cardiovascular metabolic parameters; mapping the multi-dimensional first physiological characteristics to multiple values representing preset phenotypic categories through a linear transformation layer of a cardiovascular event prediction model, and obtaining a first score corresponding to each phenotypic category; wherein the preset phenotypic categories are obtained by cluster analysis of the multi-dimensional physiological characteristics of multiple object samples; the multiple object samples are multiple objects that have experienced cardiovascular events; and determining the first risk level of the first target object for experiencing cardiovascular events based on the first score corresponding to each phenotypic category and through preset risk level determination rules.
[0007] In some possible implementations, before mapping the multi-dimensional first physiological features to multiple numerical values representing preset phenotypic categories through the linear transformation layer of the cardiovascular event prediction model to obtain the first score corresponding to each phenotypic category, the method further includes: acquiring multi-dimensional physiological feature samples of multiple object samples, including sleep breathing monitoring parameter samples and clinical cardiovascular metabolic parameter samples; calculating the feature space distance between any two object samples based on the multi-dimensional physiological features of each object sample; assigning all object samples to multiple sets according to the feature space distance to obtain multiple target sets; and determining the corresponding phenotypic category based on the multi-dimensional physiological features of each target set.
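The feature space distance between any two object samples could be computed along the following lines. This is a minimal sketch under stated assumptions: the source does not name a distance metric, so Euclidean distance is assumed here, and the function names and sample values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equally long feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(samples):
    """Symmetric matrix of distances between every pair of object samples."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean_distance(samples[i], samples[j])
            dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical, already standardized feature vectors (illustrative values only).
samples = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
distances = pairwise_distances(samples)
```

The features are assumed to be on a common scale before the distance is taken; otherwise a variable with a large numerical range would dominate the result.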
[0008] In some possible implementations, all object samples are assigned to multiple sets based on feature space distance to obtain multiple target sets. This includes: randomly selecting the same number of object samples as the preset number of sets from the object samples, and setting their multi-dimensional physiological characteristics as the initial set center points; assigning each object sample to the set with the smallest feature space distance based on the feature space distance between each object sample and each initial set center point; obtaining the object samples contained in each set and recalculating the target center point of each set; updating the set affiliation of each object sample based on the target center point; and if the preset first convergence condition is not met, returning to obtaining the object samples contained in each set and recalculating the target center point of each set until the first convergence condition is met, thus obtaining multiple target sets.
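The iterative assignment described above resembles the standard k-means procedure, and can be sketched as follows. This is an illustration rather than the applicant's implementation: Euclidean distance, the stopping rule (set membership no longer changes, or an iteration cap is reached), the function name `kmeans`, and the sample values are all assumptions, since the source leaves the first convergence condition unspecified.

```python
import math
import random


def kmeans(samples, k, max_iter=100, seed=0):
    """Partition samples into k sets; returns (labels, centers)."""
    rng = random.Random(seed)
    # Randomly pick k object samples as the initial set center points.
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each sample to the set whose center is nearest.
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centers[c]))
            for s in samples
        ]
        # Assumed convergence condition: set membership no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Recalculate each center as the mean of the samples in its set.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # an empty set keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical standardized feature vectors forming two well-separated groups.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centers = kmeans(samples, k=2)
```

Each resulting set would then be inspected for its characteristic feature pattern to name the corresponding phenotypic category.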
[0009] In some possible implementations, based on the first score corresponding to each phenotype category, the first risk level of the first target object for cardiovascular events is determined by a preset risk level determination rule, including: calculating the first cardiovascular risk score of the target object based on the first score corresponding to each phenotype category by a preset scoring calculation rule; dividing the first cardiovascular risk score according to a first preset threshold and outputting the first risk level of the first target object.
[0010] In some possible implementations, before classifying the first cardiovascular risk score according to the first preset threshold, the method further includes: obtaining the first cardiovascular risk scores of multiple object samples; performing statistical analysis on the multiple first cardiovascular risk scores to determine one or more quantiles of their statistical distribution; and determining the one or more quantiles as the first preset threshold.
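Deriving the first preset threshold from quantiles of the sample score distribution, and then dividing a score into a risk level, might look like the sketch below. The three level names, the choice of tertiles, the rule that a score equal to a cut point falls into the higher level, and the sample scores are all hypothetical; the source only states that one or more quantiles are used.

```python
import bisect
import statistics

RISK_LEVELS = ["low", "moderate", "high"]  # hypothetical level names


def thresholds_from_samples(sample_scores):
    """Cut points at the quantiles that split the scores into len(RISK_LEVELS) groups."""
    return statistics.quantiles(sample_scores, n=len(RISK_LEVELS))


def risk_level(score, thresholds):
    """Map a risk score to a level; a score equal to a cut point goes to the higher level."""
    return RISK_LEVELS[bisect.bisect_right(thresholds, score)]


# Hypothetical cardiovascular risk scores of the object samples.
sample_scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
thresholds = thresholds_from_samples(sample_scores)
```

Because the cut points come from the observed distribution rather than from fixed numbers, they shift with the population on which the scores were computed.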
[0011] In some possible implementations, obtaining the multi-dimensional first physiological features of the first target object includes: obtaining the multi-dimensional second physiological features of the first target object; normalizing the continuous variables in the second physiological features to obtain normalized continuous variables, encoding the discrete variables to obtain encoded data of the discrete variables; and using the normalized continuous variables and the encoded data of the discrete variables as the first physiological features.
[0012] In some possible implementations, the method further includes: acquiring multiple training samples; wherein the training samples include multi-dimensional third physiological features and phenotypic category labels of the third target object; mapping the multi-dimensional third physiological features to multiple values representing preset phenotypic categories through the linear transformation layer of the cardiovascular event prediction model, obtaining the third score corresponding to each phenotypic category; converting the third score into the predicted probability of the third target object belonging to each phenotypic category through the normalized exponential layer of the cardiovascular event prediction model; extracting the target predicted probability corresponding to the phenotypic category label from the predicted probabilities of each phenotypic category; constructing a likelihood function based on the target predicted probability; and adjusting the parameters of the cardiovascular event prediction model by maximizing the likelihood function until the second convergence condition is met, thereby obtaining the trained cardiovascular event prediction model.
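The training procedure above (linear transformation layer, normalized exponential layer, likelihood maximization) corresponds to multinomial logistic regression. The following is a minimal sketch under stated assumptions: plain gradient ascent on the log-likelihood, a learning rate of 0.1, and a second convergence condition defined as a change in log-likelihood below a tolerance. The optimizer, hyperparameters, function names, and toy data are not taken from the source.

```python
import math


def linear_scores(W, b, x):
    """Linear transformation layer: one score per phenotypic category."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def softmax(scores):
    """Normalized exponential layer: scores to category probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(W, b, X, y):
    """Sum of log predicted probabilities of the labelled categories."""
    return sum(math.log(softmax(linear_scores(W, b, x))[label])
               for x, label in zip(X, y))


def train(X, y, n_classes, lr=0.1, max_epochs=500, tol=1e-6):
    """Gradient ascent on the log-likelihood (assumed optimizer)."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    previous = log_likelihood(W, b, X, y)
    for _ in range(max_epochs):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            probs = softmax(linear_scores(W, b, x))
            for k in range(n_classes):
                # Derivative of the log-likelihood with respect to score k.
                err = (1.0 if k == label else 0.0) - probs[k]
                grad_b[k] += err
                for j in range(n_features):
                    grad_W[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] += lr * grad_b[k] / len(X)
            for j in range(n_features):
                W[k][j] += lr * grad_W[k][j] / len(X)
        current = log_likelihood(W, b, X, y)
        if abs(current - previous) < tol:  # assumed second convergence condition
            break
        previous = current
    return W, b


# Hypothetical one-feature training samples with phenotypic category labels.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
W, b = train(X, y, n_classes=2)
```

Maximizing the log-likelihood is equivalent to maximizing the likelihood itself and is numerically more convenient, which is why the sketch works with the logarithm.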
[0013] In some possible implementations, sleep apnea monitoring parameters include at least one of the following: apnea-hypopnea index, lowest blood oxygen saturation, percentage of time blood oxygen saturation below a second preset threshold, arousal index, total sleep time, and percentage of REM sleep; clinical cardiovascular metabolic parameters include at least one of the following: age, sex, body mass index, waist circumference, systolic blood pressure, diastolic blood pressure, history of hypertension, history of diabetes, and history of dyslipidemia.
[0014] In another aspect, embodiments of this application provide a cardiovascular event prediction apparatus, comprising: a feature acquisition module, used to acquire multi-dimensional first physiological features of a first target object, the first physiological features including at least sleep breathing monitoring parameters and clinical cardiovascular metabolic parameters; a feature mapping module, used to map the multi-dimensional first physiological features into multiple values representing preset phenotypic categories through a linear transformation layer of a cardiovascular event prediction model, to obtain a first score corresponding to each phenotypic category; wherein the preset phenotypic categories are obtained by cluster analysis of multi-dimensional physiological features of multiple object samples; the multiple object samples are multiple objects that have experienced cardiovascular events; and a risk determination module, used to determine the first risk level of the first target object for experiencing cardiovascular events based on the first score corresponding to each phenotypic category and through preset risk level determination rules.
[0015] In another aspect, embodiments of this application provide an electronic device, the device including: a processor and a memory storing computer program instructions; the processor executes the computer program instructions to implement a cardiovascular event prediction method.
[0016] In another aspect, embodiments of this application provide a computer storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, a cardiovascular event prediction method is implemented.
[0017] In another aspect, embodiments of this application provide a computer program product in which instructions, when executed by the processor of an electronic device, cause the electronic device to perform a cardiovascular event prediction method.
[0018] The cardiovascular event prediction method, apparatus, device, and computer storage medium of this application acquire multi-dimensional first physiological features, including at least sleep apnea monitoring parameters and clinical cardiovascular metabolic parameters, to achieve a comprehensive characterization of the sleep apnea and cardiovascular metabolic status of the target subject, laying the foundation for accurate identification of patient subgroups. The preset phenotypic categories are not artificially divided, but rather formed by cluster analysis based on the multi-dimensional physiological features of multiple subject samples that have experienced cardiovascular events. This clustering process relies on the inherent structure of the sample data to achieve natural aggregation, while comprehensively considering all dimensions of sleep apnea monitoring parameters and clinical cardiovascular metabolic parameters. This effectively identifies patient subgroups with the same AHI but different physiological states and assigns them to different phenotypic categories, precisely solving the problem of different prognoses for the same AHI. Furthermore, through the linear transformation layer of the cardiovascular event prediction model, the multi-dimensional first physiological features are mapped to the first score corresponding to each phenotypic category, and then the first risk level is determined based on preset rules. This allows the risk assessment to fit the pathophysiological differences of different subgroups, effectively solving the problem of insufficient targeting and accuracy in risk prediction in the prior art, and improving the reliability and clinical adaptability of cardiovascular event risk prediction.
Attached Figure Description
[0019] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0020] Figure 1 is a schematic flowchart of a cardiovascular event prediction method provided in one embodiment of this application;
Figure 2 is a schematic flowchart of a cardiovascular event prediction method provided in another embodiment of this application;
Figure 3 is a schematic diagram showing the results of constructing phenotypic categories by clustering in a cardiovascular event prediction method provided in another embodiment of this application;
Figure 4 is a schematic flowchart of a cardiovascular event prediction method provided in another embodiment of this application;
Figure 5 is a schematic diagram of the interface output by the cardiovascular event prediction method provided in another embodiment of this application;
Figure 6 is a schematic diagram of the structure of a cardiovascular event prediction apparatus provided in another embodiment of this application;
Figure 7 is a schematic diagram of the structure of an electronic device provided in another embodiment of this application.
Detailed Implementation
[0021] The features and exemplary embodiments of various aspects of this application will be described in detail below. To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only intended to explain this application and not to limit it. For those skilled in the art, this application can be implemented without some of these specific details. The following description of the embodiments is merely to provide a better understanding of this application by illustrating examples.
[0022] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
[0023] It should be noted that the acquisition, storage, use, and processing of data in this application embodiment all comply with the relevant provisions of national laws and regulations.
[0024] It should be noted that in the embodiments of this application, certain software, components, models and other existing solutions in the industry may be mentioned. These should be regarded as exemplary and are only intended to illustrate the feasibility of implementing the technical solution of this application. However, it does not mean that the applicant has used or necessarily used the solution.
[0025] Existing technologies for cardiovascular risk assessment of sleep-disordered breathing rely on the apnea-hypopnea index as the core assessment indicator. This index can only quantify the frequency of apnea and hypopnea events during sleep. It cannot characterize key information that directly affects the physiological load of the cardiovascular system, such as the duration of respiratory events, the depth of hypoxia, and the characteristics of blood oxygen fluctuations. Nor can it reflect sleep physiological states that are closely related to cardiovascular prognosis, such as sleep structure integrity and wakefulness load. Its one-dimensional quantitative characteristics determine that this index cannot fully capture the multidimensional physiological load differences caused by sleep-disordered breathing, which is an important reason why existing assessment methods cannot accurately reflect the true risk of patients.
[0026] Meanwhile, current technologies fail to systematically integrate patients' cardiovascular metabolic characteristics with sleep monitoring indicators. Physical indicators such as age, body mass index, and waist circumference, blood pressure levels such as systolic and diastolic blood pressure, and comorbidities such as hypertension, diabetes, and dyslipidemia are all core risk factors influencing the probability of cardiovascular events. Patients with the same apnea-hypopnea index (AHI) grade exhibit significant individual differences in these indicators, resulting in varying susceptibility and burden levels of underlying cardiovascular diseases. Current assessment methods do not incorporate this multidimensional clinical information into the risk assessment system. Furthermore, current technologies use artificially set numerical limits to classify the AHI, a fixed demarcation method that ignores the individual heterogeneity of the disease and cannot adapt to the risk differences among patients with different physiological and metabolic states. Ultimately, this limits the ability of current assessment methods to predict long-term cardiovascular events, resulting in significant differences in prognosis among patients with the same grade, underdiagnosis of high-risk patients, and over-intervention in low-risk patients. Therefore, accurately predicting the cardiovascular risk of patients with sleep-disordered breathing has become a pressing technical problem to be solved in this field.
[0027] The cardiovascular event prediction method of this application acquires multi-dimensional first physiological features, including at least sleep apnea monitoring parameters and clinical cardiovascular metabolic parameters, to achieve a comprehensive characterization of the sleep apnea and cardiovascular metabolic status of the target subject, laying the foundation for accurate identification of patient subgroups. Its preset phenotypic categories are not artificially divided, but rather formed by cluster analysis based on the multi-dimensional physiological features of multiple subject samples that have experienced cardiovascular events. This clustering process relies on the inherent structure of the sample data to achieve natural aggregation, while comprehensively considering all dimensions of sleep apnea monitoring parameters and clinical cardiovascular metabolic parameters. This effectively identifies patient subgroups with the same AHI but different physiological states and assigns them to different phenotypic categories, precisely solving the problem of different prognoses for the same AHI. Furthermore, through the linear transformation layer of the cardiovascular event prediction model, the multi-dimensional first physiological features are mapped to the first score corresponding to each phenotypic category, and then the first risk level is determined based on preset rules. This allows the risk assessment to fit the pathophysiological differences of different subgroups, effectively solving the problem of insufficient targeting and accuracy in risk prediction in existing technologies, and improving the reliability and clinical adaptability of cardiovascular event risk prediction.
[0028] To address the problems of the prior art, embodiments of this application provide a method, apparatus, device, computer storage medium, and computer program product for predicting cardiovascular events. The method for predicting cardiovascular events provided in this application will be described first.
[0029] Figure 1 shows a schematic flowchart of a cardiovascular event prediction method according to an embodiment of this application. As shown in Figure 1, the method includes the following steps.
[0030] S101, Obtain the multi-dimensional first physiological characteristics of the first target object, the first physiological characteristics including at least sleep breathing monitoring parameters and clinical cardiovascular metabolic parameters.
[0031] As an example, the first target object can refer to an individual who needs to undergo a risk assessment for cardiovascular events and who is suspected of having, or has been diagnosed with, sleep-disordered breathing. Such individuals can differ in age, gender, underlying health conditions, and degree of sleep-disordered breathing.
[0032] As an example, the multi-dimensional first physiological characteristics can refer to a set of features that reflect the sleep breathing state and cardiovascular metabolic basis of the first target object from multiple dimensions such as physiology, pathology, and signs. This set consists of quantitative physiological indicators of different types and dimensions, and is the basic input data for predicting cardiovascular event risk.
[0033] As an example, sleep apnea monitoring parameters refer to quantitative indicators obtained through professional monitoring methods that can characterize an individual's breathing rhythm, ventilation status, blood oxygen level, and sleep structure during sleep. They are core parameters that reflect the characteristics of sleep apnea disorder.
[0034] As an example, clinical cardiovascular metabolic parameters refer to quantitative indicators and clinical information that can reflect the functional status of an individual's cardiovascular system and the level of metabolism, obtained through clinical examinations, medical history collection, etc. They are key parameters for assessing the basic risk of cardiovascular events.
[0035] Specifically, as an example, the multi-dimensional first physiological characteristics can be obtained through multi-source data acquisition methods. For sleep breathing monitoring parameters, polysomnography can be used to collect sleep data throughout the night, or portable sleep monitoring devices can be used to obtain home-based sleep breathing data. For clinical cardiovascular metabolic parameters, structured test indicators can be extracted from electronic medical record systems and clinical examination databases, while standardized medical history collection forms can be used to supplement medical history parameters. After collection, the data is formatted into a structured dataset of multi-dimensional first physiological characteristics, ensuring data integrity and identifiability.
[0036] As another implementation of S101, S101 may also include the following steps.
[0037] Obtain the multi-dimensional second physiological characteristics of the first target object; normalize the continuous variables in the second physiological characteristics to obtain normalized continuous variables, encode the discrete variables to obtain coded data of the discrete variables; use the normalized continuous variables and the coded data of the discrete variables as the first physiological characteristics.
[0038] As an example, the multi-dimensional second physiological characteristics can refer to a set of raw physiological indicators and clinical information collected directly from the first target object without any numerical processing. This set reflects the target object's sleep breathing state and cardiovascular metabolic level from multiple dimensions and serves as the raw input data for subsequent data preprocessing. Continuous variables can refer to quantified physiological indicators in the multi-dimensional second physiological characteristics that take continuous values and have clear units. Normalization refers to numerical transformation operations that eliminate differences in units and numerical ranges between continuous variables, making continuous variables with different units comparable. Discrete variables can refer to non-continuous clinical information in the multi-dimensional second physiological characteristics that represents categories and states and has neither units nor continuously varying values. Encoding refers to the transformation operation that converts discrete variables into quantified values that can participate in model calculations. Encoded data refers to the standardized quantified values obtained after encoding discrete variables, that is, a form of the discrete variables that an algorithm can process.
[0039] Specifically, as an example, the variables in the multi-dimensional second physiological characteristics can be sorted by type. Continuous variables such as age, body mass index, and apnea-hypopnea index are selected and normalized by Z-score standardization. The calculation formula is as follows:
$$z = \frac{\chi - \mu}{\delta}$$
where z is the standardized value, χ represents the original value of the variable, μ represents the mean of the variable in the training dataset, and δ represents the standard deviation. Discrete variables such as gender and history of hypertension are selected and processed using 0/1 binary encoding, with a positive clinical status assigned a value of 1 and a negative clinical status assigned a value of 0. Before processing, outliers of continuous variables are truncated to a clinically reasonable range, and missing values are imputed using the mean of the training dataset. Missing values of discrete variables are imputed using the mode of the training dataset to ensure the integrity of the processed data, ultimately yielding the normalized continuous variables and the encoded data of the discrete variables.
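The preprocessing described above (truncation, imputation, Z-score standardization, and 0/1 encoding) can be sketched as follows. The function names, the BMI bounds of 10 to 60, the use of the population standard deviation, and the toy values are hypothetical choices made for illustration; the source specifies none of them.

```python
import statistics


def clip(value, lower, upper):
    """Truncate an outlier to the given range."""
    return min(max(value, lower), upper)


def fit_continuous(train_values, lower, upper):
    """Mean and standard deviation of the clipped, observed training values."""
    observed = [clip(v, lower, upper) for v in train_values if v is not None]
    return statistics.fmean(observed), statistics.pstdev(observed)


def standardize(value, mean, std, lower, upper):
    """Clip, impute a missing value with the training mean, then apply the Z-score."""
    value = mean if value is None else clip(value, lower, upper)
    return (value - mean) / std if std > 0 else 0.0


def encode_binary(value, train_mode):
    """0/1 encoding; a missing value is imputed with the training mode."""
    if value is None:
        value = train_mode
    return 1.0 if value else 0.0


# Hypothetical training data and assumed clinically plausible BMI bounds.
bmi_train = [20.0, 30.0, None, 20.0, 30.0]
bmi_mean, bmi_std = fit_continuous(bmi_train, lower=10.0, upper=60.0)
hypertension_mode = statistics.mode([True, False, True])

# One subject: BMI 35, hypertension history unknown.
first_features = [
    standardize(35.0, bmi_mean, bmi_std, 10.0, 60.0),
    encode_binary(None, hypertension_mode),
]
```

The mean, standard deviation, and mode are fitted on the training dataset only and then reused for every new subject, so that a subject seen at prediction time is scaled in the same way as the samples the model was trained on.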
[0040] Specifically, as an example, the normalized continuous variables and discrete variables can be dimensionally integrated according to the original dimensional order of the multi-dimensional second physiological feature, maintaining a one-to-one correspondence between the feature dimensions and the original indicators, and avoiding dimensional confusion. All standardized numerical values after integration are format-validated to ensure that the numerical type is uniformly floating-point type recognizable by the algorithm, with no missing dimensions or numerical misalignment issues, forming a structured standardized feature vector. This feature vector is directly identified as the first physiological feature and can be stored in a data format matching the input interface of the cardiovascular event prediction model, achieving seamless conversion from preprocessed data to model input data and ensuring the continuity and effectiveness of subsequent model calculations.
[0041] The cardiovascular event prediction method of this application eliminates the differences in the dimensions and types of different physiological features by normalizing continuous variables and encoding discrete variables in multi-dimensional second physiological features. This allows multi-dimensional features to be uniformly input into the cardiovascular event prediction model, improving the stability and prediction accuracy of the model operation and ensuring the standardization of input data.
[0042] S102, through the linear transformation layer of the cardiovascular event prediction model, the multi-dimensional first physiological feature is mapped to multiple numerical values representing preset phenotypic categories, and the first score corresponding to each phenotypic category is obtained; wherein, the preset phenotypic category is obtained by cluster analysis of the multi-dimensional physiological features of multiple object samples; the multiple object samples are multiple objects that have experienced cardiovascular events.
[0043] As an example, a cardiovascular event prediction model can refer to an algorithmic model that takes multi-dimensional physiological characteristics related to sleep breathing and cardiovascular metabolism as input and a quantitative score or level related to the risk of cardiovascular events as output, and is used to predict the risk of cardiovascular events in patients with sleep-disordered breathing. This model includes functional modules such as feature mapping and category representation.
[0044] As an example, the linear transformation layer can refer to the functional module in a cardiovascular event prediction model that realizes linear mapping of multi-dimensional physiological features. This module transforms high-dimensional input physiological features into low-dimensional values that can characterize different phenotypic categories through linear operations, and is the core unit for the model to realize feature transformation.
[0045] As an example, a preset phenotypic category can refer to a phenotype classification formed by using patients with sleep-disordered breathing who have experienced cardiovascular events as research samples and grouping samples with similar physiological characteristic patterns into one category through cluster analysis. Each phenotype corresponds to a set of characteristic sleep breathing and cardiovascular metabolic physiological characteristics, which can characterize different pathophysiological states.
[0046] As an example, an object sample can refer to an individual who has been diagnosed with a cardiovascular event and also suffers from sleep-disordered breathing. The multi-dimensional physiological characteristic data of such individuals provide the basic sample support for constructing the preset phenotypic categories. Each sample must meet the clinical diagnostic criteria and data integrity requirements.
[0047] As an example, cluster analysis can refer to a data analysis method that groups and classifies object samples with similar characteristics. By calculating the feature similarity between samples, samples with similar features are divided into the same category, ultimately forming multiple phenotypic categories with clear feature differences.
[0048] As an example, the first score can refer to the quantitative value obtained after mapping the multi-dimensional first physiological features through a linear transformation layer, which can characterize the degree to which the first target object belongs to each preset phenotypic category. The value is correlated with the feature similarity between the target object and the corresponding phenotypic category.
[0049] Specifically, as an example, the parameters of the linear transformation layer need not be globally fixed. Instead, a pre-trained dedicated weight matrix and bias vector can be selected dynamically for first target objects of different age groups and with different underlying diseases. This makes the linear mapping better fit the physiological characteristics of each subgroup of target objects and improves the accuracy with which the first score represents the phenotypic categories.
[0050] Specifically, as an example, hierarchical clustering can be used to construct preset phenotypic categories during cluster analysis. First, primary clustering is performed according to the type of cardiovascular event. Then, for each type of cardiovascular event sample, secondary clustering is performed according to the characteristics of sleep breathing monitoring parameters. The final preset phenotypic category contains both the type of cardiovascular event and the characteristics of sleep breathing, enabling the first score to more accurately associate with the pathophysiological basis of cardiovascular events.
[0051] Specifically, as an example, the operation of the linear transformation layer can introduce feature interaction terms. On the basis of basic linear operation, statistically significant physiological feature interaction factors are added, and the mapping is completed by feature vector × weight matrix + bias vector + feature interaction terms, thereby improving the ability of numerical values to represent phenotypic category features.
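The mapping described above (feature vector × weight matrix + bias vector + feature interaction terms) can be sketched as follows. This is a minimal illustration only: the function name `linear_scores` and the `(i, j, coeffs)` layout of the interaction terms are assumptions introduced here, not details given in this application.

```python
def linear_scores(x, weights, bias, interactions=()):
    """Map a feature vector to one score per phenotypic category.

    x:            feature vector of length d
    weights:      d rows of k columns (a d x k weight matrix)
    bias:         k bias values
    interactions: hypothetical list of (i, j, coeffs); coeffs holds k
                  coefficients applied to the product x[i] * x[j]
    """
    d, k = len(weights), len(bias)
    if len(x) != d or any(len(row) != k for row in weights):
        raise ValueError("feature and weight dimensions do not match")
    # Basic linear operation: feature vector x weight matrix + bias vector
    scores = [bias[c] + sum(x[r] * weights[r][c] for r in range(d))
              for c in range(k)]
    # Optional feature interaction terms added on top of the linear part
    for i, j, coeffs in interactions:
        for c in range(k):
            scores[c] += coeffs[c] * x[i] * x[j]
    return scores
```

With an empty `interactions` argument the function reduces to the basic linear operation, so the same sketch covers both variants.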
[0052] S103, based on the first score corresponding to each phenotypic category, determine the first risk level of cardiovascular events for the first target object through a preset risk level determination rule.
[0053] As an example, the preset risk level determination rule can refer to a standardized judgment criterion, formulated on the basis of the first score corresponding to each phenotypic category, that is used to determine the risk level of cardiovascular events for the first target object. This criterion establishes the correspondence between the first score and the risk level and is the basis for converting the score into the risk level.
[0054] As an example, the first risk level can refer to the grade assigned to the likelihood of a cardiovascular event occurring in the first target object according to the preset risk level determination rule. Different levels correspond to different probabilities of cardiovascular events and provide a clear reference for clinical risk intervention.
[0055] Specifically, as an example, the preset risk level determination rule can adopt the phenotypic category scoring weight adaptation rule. Based on the correlation strength between each preset phenotypic category and the occurrence of cardiovascular events, different clinical weights are assigned to the first score of each phenotypic category. The comprehensive characterization value is calculated by multiplying the first score of each phenotypic category by the corresponding weight. Then, the first risk level is determined based on the distribution range of the comprehensive characterization value. The higher the correlation strength of the phenotypic category, the larger the weight coefficient is assigned, so that the risk assessment is more in line with the clinical pathological pattern.
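As a hedged illustration of the weight adaptation rule above, the sketch below multiplies each first score by a clinical weight, sums the products into the comprehensive characterization value, and maps that value to a level. The weights, the cutoffs, and the three level names are placeholder assumptions, not values from this application.

```python
def weighted_risk_level(scores, weights, cutoffs,
                        levels=("low", "medium", "high")):
    """Return (comprehensive value, risk level).

    scores:  first score of each phenotypic category
    weights: clinical weight of each category (assumed values)
    cutoffs: ascending boundaries between levels (assumed values)
    """
    if len(scores) != len(weights) or len(levels) != len(cutoffs) + 1:
        raise ValueError("inconsistent argument lengths")
    # Comprehensive characterization value: sum of score x weight
    composite = sum(s * w for s, w in zip(scores, weights))
    # Count how many boundaries the value has reached
    index = sum(composite >= c for c in cutoffs)
    return composite, levels[index]
```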
[0056] Specifically, as an example, the preset risk level determination rule can adopt the scoring trend determination rule. If the first target object has multiple physiological characteristic data collections, the first score of each phenotypic category corresponding to each collection can be calculated separately, the dynamic change trend of the score can be analyzed, and the first risk level can be determined by combining the value of the single score with the rate and direction of the score change. For target objects with a continuous upward trend in scores, the risk level determination result can be improved to achieve dynamic risk assessment.
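One possible reading of the scoring trend rule is sketched below. It assumes the rate and direction of change are summarized by a least-squares slope over the collection times, and that a slope above a chosen cutoff raises the level by one step. Both choices, and all numeric values, are illustrative assumptions.

```python
def trend_adjusted_level(times, scores, cutoffs, slope_cutoff,
                         levels=("low", "medium", "high")):
    """Combine the latest score with the score trend.

    times, scores: collection times and the score at each collection
    cutoffs:       ascending level boundaries for a single score (assumed)
    slope_cutoff:  slope above which the level is raised (assumed)
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    denom = sum((t - mean_t) ** 2 for t in times)
    # Least-squares slope; zero when only one distinct time is available
    slope = (sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
             / denom) if denom else 0.0
    index = sum(scores[-1] >= c for c in cutoffs)  # level from latest score
    if slope > slope_cutoff:                       # sustained upward trend
        index = min(index + 1, len(levels) - 1)
    return slope, levels[index]
```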
[0057] Specifically, as an example, the preset risk level determination rule can introduce clinical warning feature association rules, taking physiological features that are highly correlated with the occurrence of cardiovascular events as clinical warning features. If the first target object has such warning features, the risk level is adjusted upward based on the value of its first score, so that the risk level determination is more in line with the actual clinical risk warning needs.
[0058] As another implementation of S103, S103 may also include the following steps.
[0059] Based on the first score corresponding to each phenotypic category, the first cardiovascular risk score of the first target object is calculated according to a preset scoring calculation rule; the first cardiovascular risk score is then classified according to a first preset threshold, and the first risk level of the first target object is output.
[0060] As an example, the pre-defined scoring calculation rule can refer to a calculation criterion pre-established based on the correlation between phenotypic categories and the risk of cardiovascular events, which converts the first score of each phenotypic category into a single quantitative risk value. This criterion clarifies the participation method and calculation form of the first scores of different phenotypic categories. The first pre-defined threshold can refer to a numerical boundary determined based on the distribution characteristics of cardiovascular risk scores in the training dataset, used to divide different risk levels. This threshold is a fixed value and applicable to the risk level determination of all target subjects. The first cardiovascular risk score can refer to a continuous numerical value obtained by calculating the first scores of each phenotypic category according to the pre-defined scoring calculation rule, used to quantitatively characterize the risk of cardiovascular events in the target subject. The magnitude of the value is correlated with the risk of cardiovascular events.
[0061] Specifically, as an example, we can first combine clinical research results with phenotypic clustering analysis conclusions to determine the association strength between each phenotypic category and cardiovascular events, screen out high-risk and low-risk phenotypic categories, and formulate corresponding scoring calculation rules. Substituting the first score corresponding to each phenotypic category of the target subject into the rules, we perform a linear summation operation on the first scores of high-risk phenotypic categories, while the first scores of low-risk phenotypic categories are not included in the calculation, thus obtaining the first cardiovascular risk score. During the calculation process, the original precision of the score is preserved, without rounding or other simplifications, ensuring that the value accurately reflects the risk differences of the target subject. This calculation rule can be embedded in the risk scoring module to achieve automated calculation of the target subject's score.
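The summation rule in this paragraph can be illustrated with a short sketch. The category names `C1` to `C3` and the choice of which categories count as high-risk are hypothetical.

```python
def cardiovascular_risk_score(phenotype_scores, high_risk):
    """Sum the first scores of high-risk phenotypic categories only.

    phenotype_scores: mapping of category name -> first score
    high_risk:        set of category names treated as high-risk (assumed)
    Low-risk categories are left out, and no rounding is applied.
    """
    return sum(score for name, score in phenotype_scores.items()
               if name in high_risk)
```

For example, `cardiovascular_risk_score({"C1": 0.75, "C2": 0.5, "C3": 0.25}, {"C1", "C3"})` adds only the `C1` and `C3` scores.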
[0062] Specifically, as an example, the target objects can first be divided into different subgroups based on core clinical cardiovascular metabolic characteristics (such as whether they have diabetes or severe hypoxia). Based on sample data from each subgroup, the distribution characteristics of the first cardiovascular risk score within that subgroup are calculated, and a first preset threshold specific to the subgroup is determined. After the first target object is assigned to the corresponding subgroup, its first cardiovascular risk score is classified using that subgroup's specific threshold, and the matching first risk level is output. For example, in the subgroup with diabetes, the baseline risk of cardiovascular events is higher, so the high-risk threshold for this subgroup can be appropriately lowered, bringing the risk level determination closer to the underlying pathological state of this population. This avoids overlooking individual differences in risk because of a uniform threshold and improves the individualization and accuracy of risk level classification.
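A minimal sketch of subgroup-specific thresholds follows. The subgroup names and all threshold values are placeholders chosen for illustration; the only point carried over from the text is that the diabetes subgroup uses lower boundaries than the default.

```python
import bisect

# Hypothetical thresholds: [low/medium boundary, medium/high boundary]
SUBGROUP_THRESHOLDS = {
    "diabetes": [0.8, 1.5],   # lowered boundaries for a higher baseline risk
    "default": [1.0, 2.0],
}
LEVELS = ("low", "medium", "high")


def risk_level_for(score, subgroup):
    """Classify a risk score with the thresholds of the given subgroup."""
    thresholds = SUBGROUP_THRESHOLDS.get(subgroup,
                                         SUBGROUP_THRESHOLDS["default"])
    # bisect_right counts the boundaries that the score has reached,
    # so a score equal to a boundary falls into the upper level
    return LEVELS[bisect.bisect_right(thresholds, score)]
```

The same score can therefore fall into different levels depending on the subgroup, which is the behavior the paragraph describes.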
[0063] The cardiovascular event prediction method of this application converts the first score of each phenotype category into a first cardiovascular risk score through preset scoring calculation rules, and then divides the first risk level according to a first preset threshold. This can convert the abstract phenotype score into an intuitive risk level, making the risk assessment results standardized and interpretable, and facilitating the formulation and implementation of clinical intervention strategies.
[0064] As another implementation of this application, the method may further include the following steps before classifying the first cardiovascular risk score according to the first preset threshold.
[0065] Obtain the first cardiovascular risk score of multiple object samples; perform statistical analysis on the multiple first cardiovascular risk scores to determine one or more quantiles of their statistical distribution; and determine one or more quantiles as a first preset threshold.
[0066] As an example, statistical distribution refers to the distribution characteristics of the first cardiovascular risk scores of multiple sample subjects in the numerical space, reflecting the overall value pattern and numerical dispersion of the scores. Quantiles refer to the numerical division points that divide the sorted first cardiovascular risk scores into several equal parts according to a predetermined ratio. They are key indicators characterizing the statistical distribution characteristics of the scores and can be used as a numerical reference for risk level classification.
[0067] Specifically, as an example, multi-dimensional physiological features of all object samples can be extracted first, and after standardization, they can be input into the trained phenotypic prediction model to obtain the phenotypic scores corresponding to each sample. According to the preset scoring calculation rules, the scores of high-risk phenotypic categories are linearly summed, while the scores of low-risk phenotypic categories are not included in the calculation, to obtain the first cardiovascular risk score for each object sample. All scores are initially verified, and extreme score values caused by data anomalies are removed. The valid scores are organized by sample number to form a structured first cardiovascular risk score dataset, laying the foundation for subsequent statistical analysis.
[0068] Specifically, as an example, all first cardiovascular risk scores in the structured dataset can be sorted in ascending order to form a continuous sequence of score values. The quantile proportions are chosen according to the actual needs of clinical cardiovascular risk stratification, and linear interpolation is applied to the sorted sequence to obtain the quantiles at those proportions. If the risk needs to be divided into low, medium, and high levels, the tertiles of the score distribution are calculated. If more refined stratification is required, quartiles or quintiles can be calculated. During the calculation, the original precision of the quantiles is preserved and the characteristics of the score distribution are checked. If the distribution is only slightly skewed, no additional correction is required and the calculated results can be used directly.
[0069] Specifically, as an example, the number of thresholds can be set according to the number of levels required for clinical risk stratification, and the quantiles obtained from statistical analysis can be directly assigned as the first preset threshold. For a three-level risk classification, the two tertiles can be used as the thresholds between low and medium risk and between medium and high risk, respectively. The determined first preset threshold is fixed and entered into the risk stratification module of the cardiovascular risk prediction system, and set as the system's default judgment standard to ensure that the risk level judgment of all target objects is based on a unified threshold. At the same time, the sample size, quantile proportions, and score distribution characteristics corresponding to the threshold are recorded to form traceability data for threshold determination, providing a reference for subsequent validity verification and adjustment of the threshold.
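The quantile calculation can be sketched as below. The interpolation position `q * (n - 1)` is one common convention for linear interpolation and is an assumption here, since the application does not fix a particular convention.

```python
def quantile_linear(values, q):
    """Quantile of `values` at proportion q using linear interpolation."""
    data = sorted(values)                  # ascending score sequence
    pos = q * (len(data) - 1)              # fractional position (assumed rule)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


def tertile_thresholds(scores):
    """Two tertiles used as the low/medium and medium/high boundaries."""
    return [quantile_linear(scores, 1 / 3), quantile_linear(scores, 2 / 3)]
```

Quartiles or quintiles follow by calling `quantile_linear` with the corresponding proportions.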
[0070] The cardiovascular event prediction method of this application performs statistical analysis on the first cardiovascular risk scores of multiple object samples and determines the quantile of its statistical distribution as a first preset threshold. This method can classify risk levels based on the risk patterns of a large sample, improve the objectivity and consistency of the threshold, and avoid the subjective bias of human demarcation.
[0071] The cardiovascular event prediction method of this application acquires multi-dimensional first physiological features, including at least sleep apnea monitoring parameters and clinical cardiovascular metabolic parameters, to achieve a comprehensive characterization of the sleep apnea and cardiovascular metabolic status of the target subject, laying the foundation for accurate identification of patient subgroups. Its preset phenotypic categories are not artificially divided, but rather formed by cluster analysis based on the multi-dimensional physiological features of multiple subject samples that have experienced cardiovascular events. This clustering process relies on the inherent structure of the sample data to achieve natural aggregation, while comprehensively considering all dimensions of sleep apnea monitoring parameters and clinical cardiovascular metabolic parameters. This effectively identifies patient subgroups with the same AHI but different physiological states and assigns them to different phenotypic categories, precisely solving the problem of different prognoses for the same AHI. Furthermore, through the linear transformation layer of the cardiovascular event prediction model, the multi-dimensional first physiological features are mapped to the first score corresponding to each phenotypic category, and then the first risk level is determined based on preset rules. This allows the risk assessment to fit the pathophysiological differences of different subgroups, effectively solving the problem of insufficient targeting and accuracy in risk prediction in existing technologies, and improving the reliability and clinical adaptability of cardiovascular event risk prediction.
[0072] As another implementation of this application, in order to provide a clinically meaningful classification basis for risk prediction, as shown in Figure 2, before S102 the method may also include the following steps.
[0073] S201, Obtain multi-dimensional physiological feature samples from multiple object samples, including sleep breathing monitoring parameter samples and clinical cardiovascular metabolic parameter samples.
[0074] As an example, the subject sample can be individuals diagnosed with sleep-disordered breathing and who have experienced cardiovascular events. They must meet the clinical diagnostic criteria for the disease and have no key missing data on relevant physiological characteristics. This is a basic research sample for constructing the sleep-cardiometabolic phenotype.
[0075] As an example, multidimensional physiological characteristic samples can be a set of quantitative indicators collected from object samples that can reflect the physiological state of sleep respiration and the basic level of cardiovascular metabolism in multiple dimensions, serving as the raw data for subsequent cluster analysis.
[0076] As an example, sleep breathing monitoring parameter samples can be quantitative indicators that characterize the breathing, blood oxygenation, and sleep structure characteristics of a subject sample during sleep. They are core data reflecting the pathological characteristics of sleep-disordered breathing.
[0077] As an example, clinical cardiovascular metabolic parameter samples can be quantitative indicators and clinical information samples that characterize the cardiovascular function, metabolic level and related medical history of the subject sample, and are key data reflecting the risk factors of cardiovascular events.
[0078] Specifically, as an example, we can first establish inclusion and exclusion criteria for the target samples, removing samples with missing data rates exceeding a preset threshold or those with serious interfering underlying diseases; collect parameters covering demographics and physical characteristics, blood pressure and comorbidities, and sleep monitoring, obtaining data from polysomnography devices, electronic medical record systems, and cardiovascular disease follow-up databases, while manually entering non-electronic information through standardized forms; complete data format verification, standardize indicator units and recording formats, mark missing values and outliers, and form a structured multi-dimensional physiological characteristic sample dataset.
[0079] S202, based on the multi-dimensional physiological characteristics of each object sample, calculate the feature space distance between any two object samples.
[0080] As an example, the feature space distance can be a quantitative indicator that represents the degree of feature difference between different samples after mapping the multi-dimensional physiological features of the object sample to the feature space. Its value is negatively correlated with the similarity of physiological features between samples; the smaller the value, the higher the feature similarity between samples.
[0081] Specifically, as an example, multi-dimensional physiological feature samples can be preprocessed to eliminate differences in units and data types. The preprocessed features are then transformed into feature vectors in the feature space. The Euclidean distance is used to calculate the feature space distance between any two samples. If there is a significant correlation between feature dimensions, Mahalanobis distance can be used instead. If the features are sparsely distributed, Manhattan distance can be used to ensure that the distance calculation closely matches the actual distribution of the features.
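The distance calculation between any two samples can be sketched as follows for the Euclidean and Manhattan cases. Mahalanobis distance is omitted because it additionally requires the inverse covariance matrix of the features. Function names are illustrative.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two preprocessed feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pairwise_distances(samples, metric=euclidean):
    """Symmetric matrix of feature space distances between all samples."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = metric(samples[i], samples[j])
    return dist
```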
[0082] S203. Based on the feature space distance, all object samples are assigned to multiple sets to obtain multiple target sets.
[0083] As an example, a set can refer to a group of samples formed by classifying object samples whose feature space distances meet a preset similarity condition.
[0084] As an example, a target set can refer to the final sample classification result that achieves "high similarity of sample features within the set and significant differences in sample features between sets" after sample allocation is completed. Each target set corresponds to a group of object samples with similar sleep-cardiovascular metabolic physiological feature patterns.
[0085] Specifically, as an example, density clustering can be used to allocate object samples without pre-setting the number of sets, as sets can be naturally divided based on the feature density of the samples. First, a neighborhood distance threshold and a core sample density threshold are set in the feature space. The feature vectors of all object samples are traversed, and samples whose neighborhood numbers reach the density threshold are marked as core samples. Starting with a core sample, all samples within its neighborhood are grouped into a temporary set, and samples within the neighborhood of all samples in this temporary set are continuously added until no new samples can be added, forming a target set. The core sample identification and sample inclusion operations are repeated for unclassified samples until all object samples are allocated, ultimately resulting in multiple target sets. This method can effectively identify sample groups with different densities in the feature space, avoid sample classification bias caused by pre-setting the number of sets, and better reflect the actual physiological characteristic distribution of the object samples.
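The density-based allocation described above can be sketched as follows. The sketch follows the wording of this paragraph and grows a set through the neighborhoods of all its members, whereas the standard DBSCAN formulation expands only through core samples. Samples that are never reached from a core sample keep the label -1 here, which is an assumption about how isolated samples are handled.

```python
import math


def density_clusters(samples, eps, min_samples):
    """Assign samples to sets by neighborhood density.

    eps:         neighborhood distance threshold
    min_samples: core sample density threshold (the sample itself counts)
    Returns one label per sample; -1 marks unassigned samples.
    """
    n = len(samples)

    def neighbors(i):
        return [j for j in range(n)
                if math.dist(samples[i], samples[j]) <= eps]

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        # Start a new set only from an unassigned core sample
        if labels[i] != -1 or len(neighbors(i)) < min_samples:
            continue
        labels[i] = cluster
        pending = [i]
        while pending:                      # grow until nothing new is added
            current = pending.pop()
            for j in neighbors(current):
                if labels[j] == -1:
                    labels[j] = cluster
                    pending.append(j)
        cluster += 1
    return labels
```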
[0086] Specifically, as an example, as shown in Figure 3, principal component analysis (PCA) can be used to reduce the multi-dimensional physiological features to a two-dimensional space, allowing the clustering results of cardiovascular risk-related phenotypes in patients with sleep apnea to be visualized. The figure clearly presents three phenotypic clusters (C1, C2, and C3), identified by circles, triangles, and squares, respectively. Samples within each cluster exhibit similar characteristics, while significant differences exist between clusters. The cluster centers for each phenotypic cluster are also labeled, visually demonstrating the distribution and separation of different phenotypes in the feature space.
[0087] As another implementation of S203, S203 may also include the following steps.
[0088] Randomly select, from the object samples, a number of object samples equal to the preset number of sets, and set their multi-dimensional physiological features as the initial set center points; according to the feature space distance between each object sample and each initial set center point, assign each object sample to the set with the smallest feature space distance; obtain the object samples contained in each set and recalculate the target center point of each set; based on the target center points, update the set affiliation of each object sample; if the preset first convergence condition is not met, return to the step of obtaining the object samples contained in each set and recalculating the target center point of each set, until the first convergence condition is met, thereby obtaining multiple target sets.
[0089] As an example, the preset number of sets can refer to the total number of sample cluster sets defined in advance according to the classification requirements of the sleep-cardiovascular metabolic phenotype, and is a basic parameter for cluster analysis. An initial set center point can refer to the multi-dimensional physiological feature vector of one of the randomly selected object samples, whose count equals the preset number of sets; it serves as the feature reference point for a set in the initial stage of cluster analysis.
[0090] Specifically, as an example, the preset number of sets can first be chosen in view of the cardiovascular risk stratification needs of patients with sleep apnea, for example 3. Preprocessed object samples are used as the basis, where preprocessing includes Z-score standardization of continuous physiological features and 0/1 encoding of discrete features to eliminate dimensional differences. Three object samples with complete features and no missing data are randomly selected from the preprocessed samples, and their standardized multi-dimensional physiological feature vectors, covering demographics, blood pressure and comorbidities, and sleep monitoring, are extracted. Each vector is directly set as the center point of one initial set, and the feature dimension values of each center point are recorded to provide a data basis for subsequent distance calculation.
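The preprocessing and initial center selection can be sketched as below. The use of the population standard deviation in the Z-score and the fixed random seed are assumptions made so that the sketch is reproducible.

```python
import random
import statistics


def zscore(column):
    """Z-score standardization of one continuous feature column."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)         # population SD (assumed choice)
    return [(v - mean) / sd if sd else 0.0 for v in column]


def encode_binary(column, positive):
    """0/1 encoding of one discrete feature column."""
    return [1 if v == positive else 0 for v in column]


def initial_centers(vectors, k, seed=0):
    """Randomly pick k complete feature vectors as initial set centers."""
    rng = random.Random(seed)
    return [list(v) for v in rng.sample(vectors, k)]
```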
[0091] Specifically, as an example, all preprocessed object samples can be transformed into standardized multi-dimensional physiological feature vectors. Euclidean distance is used as the distance metric to calculate the feature space distance between each sample and the three initial set centroids. Multiple sets of distance values for a single sample are compared one by one, and the sample is assigned to the set with the smallest feature space distance. A unique identifier for each set is added to each sample, and a correspondence table between samples and sets is established to clearly record the sample affiliation results of the initial clustering, providing a basis for the recalculation of set centroids in the future.
[0092] As an example, the target center point can refer to the feature aggregation point obtained by statistical calculation based on the multi-dimensional physiological characteristics of all object samples in a certain set.
[0093] Specifically, as an example, standardized multi-dimensional physiological feature vectors of all object samples in each set can be extracted based on the correspondence table between samples and sets. For each feature dimension, the mean value of the feature values of all samples in the set in that dimension is calculated. The mean values of each feature dimension are combined in order to form a new feature vector, which is then used as the target center point of the corresponding set. The mean data of each dimension are retained during the calculation process to ensure that the target center point can accurately represent the overall pattern of physiological characteristics of the samples in the set, which meets the feature clustering requirements of the sleep-cardiovascular metabolic phenotype.
[0094] As an example, set affiliation update can refer to the process of replacing the initial center point with a recalculated target center point, re-determining the set to which the object sample belongs, and updating the correspondence between the sample and the set.
[0095] Specifically, as an example, the target center points of each set can be used as new clustering reference points. Euclidean distance is still used to recalculate the feature space distance between each object sample and all target center points. Following the principle of minimum distance, the set to which each object sample belongs is re-determined, and an updated sample-set correspondence table is generated. The set affiliation identifiers before and after the update are compared, and the count and sample numbers of the samples whose affiliation has changed are recorded. The changes in sample affiliation during the clustering iterations are thus clearly documented, providing core data for evaluating the first convergence condition.
[0096] As an example, the first convergence condition can refer to the quantitative standard for determining that cluster analysis has reached a stable state, and the clustering iteration terminates when this standard is met. The target set can refer to the final set of sample clusters obtained after the clustering iteration converges, where the sample characteristics within each set are highly similar and the sample characteristics between sets are significantly different.
[0097] Specifically, as an example, a dual first convergence condition can be preset: first, the proportion of samples whose set affiliation changes is lower than a preset threshold; second, the number of clustering iterations reaches a fixed value of no less than 1000. After each recalculation of the target center points and update of the set affiliation, the proportion of samples with changed affiliation relative to the total number of samples is calculated. If neither part of the first convergence condition is met, the target center point of each set is recalculated, the sample affiliation is updated, and the iteration continues. When the first convergence condition is met, the iteration terminates, and the three cluster sets obtained at that point are the target sets, each corresponding to one type of sleep-cardiovascular metabolic phenotype.
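The iterative procedure (random initial centers, minimum-distance assignment, mean-based center update, convergence check) can be sketched as follows. The sketch assumes that iteration stops as soon as either the change-rate condition or the iteration cap holds. The tolerance of 0.01 and the seed are placeholder values.

```python
import math
import random


def kmeans(vectors, k, change_tol=0.01, max_iter=1000, seed=0):
    """Return (labels, centers, iterations) for a basic k-means run."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]  # initial centers
    labels = [None] * len(vectors)
    for iteration in range(1, max_iter + 1):
        # Assign every sample to the set with the nearest center
        new_labels = [min(range(k), key=lambda c: math.dist(v, centers[c]))
                      for v in vectors]
        changed = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        # Recompute each target center as the per-dimension mean
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous center if a set is empty
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
        # Stop when the share of re-assigned samples is below the tolerance
        if changed / len(vectors) < change_tol:
            break
    return labels, centers, iteration
```

Because the initial centers are random, different seeds can yield different partitions on less well-separated data. Repeating the run with several seeds is a common safeguard, although this application does not describe one.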
[0098] The cardiovascular event prediction method of this application, by randomly selecting the initial set center point, allocating samples according to the minimum feature spatial distance, iteratively calculating the target center point and updating the sample assignment until the first convergence condition is met, can ensure that the sample feature similarity within the target set is high and the feature difference between sets is significant, thereby improving the stability and accuracy of phenotypic classification and avoiding clustering bias caused by random initial values.
[0099] S204, determine the corresponding phenotypic category based on the multidimensional physiological characteristics of each target set.
[0100] As an example, phenotypic categories can refer to classifications defined for each target set based on the common patterns and core features of multidimensional physiological characteristics. These classifications have clear sleep-cardiovascular metabolic pathophysiological representations.
[0101] Specifically, as an example, a comprehensive statistical analysis can be conducted on the physiological characteristic samples of each target set, calculating key statistics such as the mean, median, and percentage of positive features for each feature dimension, and extracting the core features of each set in sleep apnea monitoring and clinical cardiovascular metabolism. Based on these core features, each target set can be assigned a unique phenotypic category identifier, clearly defining the pathophysiological characteristics of sleep apnea and the distribution of cardiovascular metabolic risk factors corresponding to each phenotype, and solidifying the core feature statistics to form a reference standard for phenotypic category features. The effectiveness of the phenotypic categories can be validated by analyzing the association between different phenotypic categories and the types and frequencies of cardiovascular events. Phenotypic categories with unclear feature representations or no significant association with cardiovascular risk can be readjusted to ensure the clinical relevance and discriminative power of the phenotypic categories.
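The per-set statistics mentioned above (mean, median, and percentage of positive features) can be sketched as follows. The field names `ahi` and `diabetes` in the usage note are hypothetical examples of one continuous and one 0/1-encoded feature.

```python
import statistics
from collections import defaultdict


def summarize_sets(rows, labels, continuous, binary):
    """Summarize each target set.

    rows:       one dict of feature name -> value per object sample
    labels:     set label of each sample
    continuous: names of continuous features (mean and median reported)
    binary:     names of 0/1 features (percentage of positives reported)
    """
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    summary = {}
    for label, members in groups.items():
        stats = {}
        for name in continuous:
            values = [m[name] for m in members]
            stats[name] = {"mean": statistics.fmean(values),
                           "median": statistics.median(values)}
        for name in binary:
            positives = sum(m[name] for m in members)
            stats[name] = {"positive_pct": 100.0 * positives / len(members)}
        summary[label] = stats
    return summary
```

A call such as `summarize_sets(rows, labels, ["ahi"], ["diabetes"])` returns one statistics dictionary per target set, which can then be compared across sets when naming the phenotypic categories.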
[0102] The cardiovascular event prediction method of this application obtains multi-dimensional physiological feature samples of multiple object samples that have experienced cardiovascular events, calculates the feature spatial distance and assigns the samples to multiple target sets, and then determines the phenotypic category based on the set features. It can extract phenotypic classifications directly related to cardiovascular events from the pathophysiological level, providing a clinically meaningful classification basis for subsequent risk prediction and enhancing the pathological correlation of risk assessment.
[0103] As another implementation of this application, such as Figure 4 As shown, the method may also include the following steps.
[0104] S401, acquire multiple training samples; among which, the training samples include the multi-dimensional third physiological features and phenotypic category labels of the third target object.
[0105] As an example, training samples can refer to labeled data units used to train a cardiovascular event prediction model. These units consist of standardized physiological features and corresponding phenotypic category labels, serving as the foundational data for model parameter optimization. The third target group can refer to individuals diagnosed with sleep-disordered breathing who are included in model training. Their physiological feature data is complete, and phenotypic category determination has been completed, providing a data source for training samples. Multidimensional third physiological features can refer to a standardized set of multidimensional physiological features collected from the third target group and preprocessed, which can be directly input into the model, covering sleep breathing monitoring and clinical cardiovascular metabolic indicators. Phenotypic category labels can refer to the phenotypic category identifiers corresponding to the third physiological features, determined through cluster analysis. These serve as supervised learning labels for model training, used to determine the accuracy of the model's prediction results.
[0106] Specifically, as an example, patients with sleep apnea who have no critical missing data are selected as third target objects. Multiple physiological features are collected and preprocessed: continuous variables are standardized using Z-scores, and discrete variables are coded as 0 / 1, to obtain the multi-dimensional third physiological features. The phenotypic categories determined by cluster analysis are used as labels and matched with the corresponding third physiological features to form training samples. All training samples are divided into a training set and a validation set according to a preset ratio, and samples with abnormal features or incorrect labels are removed to ensure the validity of the sample set and provide reliable data support for model training.
[0107] S402: Map the multi-dimensional third physiological features, through the linear transformation layer of the cardiovascular event prediction model, into multiple numerical values representing the preset phenotypic categories, thereby obtaining the third score corresponding to each phenotypic category.
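The preprocessing described above can be illustrated with a minimal sketch. The function names, the use of the population standard deviation, the fixed shuffle seed, and the patient values are all assumptions made for illustration and are not specified by this application.

```python
import random
import statistics


def zscore(values):
    """Standardize a continuous variable: (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption, the text does not say which
    if sd == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mean) / sd for v in values]


def encode_binary(values, positive):
    """0/1 coding of a discrete variable: 1 if the value equals `positive`, else 0."""
    return [1 if v == positive else 0 for v in values]


def split_samples(samples, train_ratio, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical raw records for four patients (values invented for illustration).
ahi = [5.0, 15.0, 25.0, 35.0]
hypertension = ["yes", "no", "yes", "no"]
labels = [0, 0, 1, 1]  # phenotypic category labels taken from cluster analysis

ahi_z = zscore(ahi)
htn = encode_binary(hypertension, "yes")
samples = [([z, h], y) for z, h, y in zip(ahi_z, htn, labels)]
train_set, val_set = split_samples(samples, train_ratio=0.75)
```

In practice the mean and standard deviation would be estimated on the training set only and reused for new patients, so that the same scale is applied at prediction time.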
[0108] As an example, the linear transformation layer can refer to the functional module in a cardiovascular event prediction model that implements linear mapping of features, transforming high-dimensional physiological features into low-dimensional values representing phenotypic categories through linear operations. The third score can refer to the linear value representing the degree to which a third target object belongs to each preset phenotypic category, obtained after mapping multi-dimensional third physiological features through the linear transformation layer.
[0109] Specifically, as an example, the standardized multi-dimensional third physiological features can be transformed into fixed-dimensional feature vectors and input into the linear transformation layer of the cardiovascular event prediction model. The linear transformation layer initially loads a randomly initialized weight matrix and bias vector. Through the linear operation feature vector × weight matrix + bias vector, each high-dimensional feature vector is mapped to a low-dimensional vector whose length equals the number of preset phenotypic categories. Each value corresponds to one phenotypic category and is the third score for that category. During the operation, the feature dimension is required to match the corresponding dimension of the weight matrix, which avoids dimension mismatch errors.
[0110] S403: Convert the third scores, through the normalized exponential layer of the cardiovascular event prediction model, into the predicted probability of the third target object belonging to each phenotypic category.
[0111] As an example, the normalized exponential layer can refer to the functional module in the cardiovascular event prediction model that transforms scores into probabilities, converting linear scores into probability values in the range 0 to 1 through a normalization operation. The predicted probability can refer to the value obtained after normalizing the third score, representing the likelihood of the third target object belonging to each phenotypic category; the sum of the predicted probabilities over all phenotypic categories is 1.
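As a hedged illustration of the linear operation described above, the following sketch computes one score per phenotypic category from a feature vector. The function name, the fixed weights (used instead of random initialization so that the result is reproducible), and the toy dimensions are assumptions.

```python
def linear_layer(x, weights, bias):
    """Return scores[k] = sum_j x[j] * weights[j][k] + bias[k] for each category k."""
    n_classes = len(bias)
    if len(weights) != len(x):
        raise ValueError("feature dimension does not match the rows of the weight matrix")
    if any(len(row) != n_classes for row in weights):
        raise ValueError("weight matrix columns do not match the number of categories")
    return [
        sum(x[j] * weights[j][k] for j in range(len(x))) + bias[k]
        for k in range(n_classes)
    ]


# Three standardized features mapped to two phenotypic categories (toy values).
x = [1.0, -0.5, 1.0]
W = [[0.2, -0.1],
     [0.4, 0.0],
     [-0.3, 0.5]]
b = [0.1, -0.2]

scores = linear_layer(x, W, b)  # one third score per phenotypic category
```

The explicit shape checks mirror the requirement that the feature dimension match the matrix dimension; a mismatch raises an error instead of silently producing a wrong score.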
[0112] Specifically, as an example, all the third scores output by the linear transformation layer can be input into the normalized exponential layer. This layer uses the Softmax function as its core operation: it exponentiates each third score and normalizes the results, transforming the linear scores into predicted probabilities between 0 and 1 whose sum over all phenotypic categories is 1. Through this transformation, the abstract linear score is converted into a value with a practical probabilistic meaning that quantifies the probability of the third target object belonging to each phenotypic category, which suits the subsequent supervised training of the model.
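A minimal sketch of the Softmax operation follows. Subtracting the maximum score before exponentiating is a common numerical-stability step that does not change the result; it is an addition for illustration and is not stated in this application.

```python
import math


def softmax(scores):
    """Convert linear scores into probabilities that are positive and sum to 1."""
    m = max(scores)  # shift by the maximum so that exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([-0.2, 0.2, 1.0])  # toy third scores for three categories
```

The category with the largest score always receives the largest probability, so the ranking of categories is preserved by the transformation.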
[0113] S404: Extract the target predicted probability corresponding to the phenotypic category label from the predicted probabilities of each phenotypic category.
[0114] As an example, the target prediction probability can refer to the predicted probability corresponding to the phenotypic category label of the third target object in the training samples. It is a core value that reflects the accuracy of the model in predicting the true phenotypic category and provides the basis for constructing the likelihood function.
[0115] Specifically, as an example, a unique index identifier is assigned to each preset phenotypic category, establishing a one-to-one index relationship between phenotypic category labels and predicted probabilities. For a single training sample, based on its labeled phenotypic category label, the corresponding probability value is extracted from all predicted probabilities output by the normalized exponential layer through index matching, and this probability value is determined as the target predicted probability. During the extraction process, an index mapping table between labels and probabilities is established to avoid extraction errors and ensure that the target predicted probability of each training sample accurately corresponds to the true phenotypic category label.
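The index mapping between labels and probabilities can be sketched as follows. The category names and the function name are hypothetical placeholders; the application does not name the phenotypic categories at this point.

```python
# Hypothetical category identifiers; each receives a unique index.
categories = ["phenotype_A", "phenotype_B", "phenotype_C"]
label_to_index = {name: i for i, name in enumerate(categories)}


def target_probability(probs, label):
    """Pick the predicted probability that corresponds to the labeled category."""
    if len(probs) != len(label_to_index):
        raise ValueError("number of probabilities does not match number of categories")
    return probs[label_to_index[label]]


# One training sample: predicted probabilities and its phenotypic category label.
p_target = target_probability([0.1, 0.7, 0.2], "phenotype_B")
```

An unknown label raises a `KeyError`, which makes a mislabeled sample visible instead of silently extracting the wrong probability.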
[0116] S405: Construct the likelihood function based on the target predicted probability.
[0117] As an example, the likelihood function can refer to a function constructed based on the target prediction probability of training samples, which characterizes the reasonableness of the model's prediction of the true phenotype category under the current parameters. The magnitude of the function value is positively correlated with the model's prediction accuracy and is an important basis for optimizing model parameters.
[0118] Specifically, as an example, under the independent and identically distributed assumption, the predictions for the individual training samples are treated as mutually independent. The target predicted probabilities of all training samples are multiplied together to construct the overall likelihood function of the model. The larger the function value, the better the current parameters of the model fit the real data. To reduce the computational cost of the multiplication and to avoid numerical underflow, the natural logarithm of the overall likelihood function is taken, which turns the product into a sum and yields the log-likelihood function. Subsequent model training uses this log-likelihood function as the optimization objective, which simplifies solving for the parameters.
[0119] S406: Adjust the parameters of the cardiovascular event prediction model by maximizing the likelihood function until the second convergence condition is met, thereby obtaining the trained cardiovascular event prediction model.
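The relationship between the likelihood and the log-likelihood, and the underflow that the logarithm avoids, can be shown with a short sketch. The function names and the probability values are illustrative assumptions.

```python
import math


def likelihood(target_probs):
    """Product of the target predicted probabilities of all training samples."""
    return math.prod(target_probs)


def log_likelihood(target_probs):
    """Sum of the natural logarithms of the target predicted probabilities."""
    return sum(math.log(p) for p in target_probs)


target_probs = [0.7, 0.9, 0.6]        # toy values for three training samples
L = likelihood(target_probs)          # product form
log_L = log_likelihood(target_probs)  # sum form, equal to log(L)

# With many samples the product underflows to zero in floating point,
# while the sum of logarithms remains a finite number.
many = [0.01] * 200
underflowed = likelihood(many)
stable = log_likelihood(many)
```

Because the logarithm is strictly increasing, the parameters that maximize the log-likelihood are the same ones that maximize the likelihood itself.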
[0120] As an example, the second convergence condition can refer to a quantitative standard for determining whether a model has reached a stable state. When this standard is met, the iteration of model parameters terminates, ensuring the model's fitting effect and generalization ability. A trained cardiovascular event prediction model can refer to a cardiovascular event prediction model that has reached the maximum value of the likelihood function through iterative optimization of parameters and satisfies the second convergence condition. Its parameters are fixed and can be directly used for phenotypic prediction of new samples.
[0121] Specifically, as an example, the log-likelihood function is maximized. By iteratively calculating the gradient of the function, the weight matrix and bias vector of the linear transformation layer are gradually adjusted so that the log-likelihood value keeps increasing. The second convergence condition is preset as two alternative criteria: first, the change in the log-likelihood between successive iterations is less than a preset threshold; second, the number of iterations reaches a preset value. During training, the validation set is used to monitor model performance and avoid overfitting. When the model satisfies either criterion, parameter iteration is terminated and the model parameters at that point are saved, resulting in a trained cardiovascular event prediction model that can be used directly for feature mapping and score output for new target objects.
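The training loop can be sketched as plain gradient ascent on the mean log-likelihood with the two stopping criteria. This is a minimal sketch under stated assumptions: zero initialization is used instead of random initialization for reproducibility, the learning rate, tolerance, and iteration limit are arbitrary, the toy data is invented, and validation-set monitoring is omitted.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(x, W, b):
    """Linear transformation layer followed by the normalized exponential layer."""
    scores = [sum(x[j] * W[j][k] for j in range(len(x))) + b[k] for k in range(len(b))]
    return softmax(scores)


def mean_log_likelihood(X, y, W, b):
    return sum(math.log(predict_probs(x, W, b)[t]) for x, t in zip(X, y)) / len(X)


def train(X, y, n_classes, lr=0.5, tol=1e-6, max_iter=2000):
    n_features = len(X[0])
    W = [[0.0] * n_classes for _ in range(n_features)]
    b = [0.0] * n_classes
    prev = mean_log_likelihood(X, y, W, b)
    for iteration in range(1, max_iter + 1):  # criterion 2: iteration limit
        grad_W = [[0.0] * n_classes for _ in range(n_features)]
        grad_b = [0.0] * n_classes
        for x, t in zip(X, y):
            p = predict_probs(x, W, b)
            for k in range(n_classes):
                err = (1.0 if k == t else 0.0) - p[k]  # gradient w.r.t. score k
                grad_b[k] += err / len(X)
                for j in range(n_features):
                    grad_W[j][k] += err * x[j] / len(X)
        for k in range(n_classes):  # ascent step: move along the gradient
            b[k] += lr * grad_b[k]
            for j in range(n_features):
                W[j][k] += lr * grad_W[j][k]
        current = mean_log_likelihood(X, y, W, b)
        if abs(current - prev) < tol:  # criterion 1: change below threshold
            break
        prev = current
    return W, b, iteration, current


# Toy standardized features and phenotypic category labels (invented values).
X = [[-1.0, 0.0], [-0.8, 0.2], [1.0, 0.1], [0.9, -0.1]]
y = [0, 0, 1, 1]
W, b, n_iter, final_ll = train(X, y, n_classes=2)
```

Starting from all-zero parameters, every category has probability 0.5 for each sample, so the initial mean log-likelihood is log(0.5); training stops as soon as either criterion is satisfied.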
[0122] The cardiovascular event prediction method of this application, based on training samples with phenotypic category labels, obtains the prediction probability using a linear transformation layer and a normalized exponential layer, constructs and maximizes the likelihood function to optimize the model parameters until the second convergence condition is met. This enables the model to accurately learn the correlation between physiological characteristics and phenotypic categories, improves the accuracy and generalization ability of predicting phenotypic patterns of new samples, and ensures the reliability of the model.
[0123] As another implementation of this application, the sleep breathing monitoring parameters include at least one of the following: apnea-hypopnea index, lowest blood oxygen saturation, percentage of time when blood oxygen saturation is below a second preset threshold, arousal index, total sleep time, and percentage of REM sleep; the clinical cardiovascular metabolic parameters include at least one of the following: age, gender, body mass index, waist circumference, systolic blood pressure, diastolic blood pressure, history of hypertension, history of diabetes, and history of dyslipidemia.
[0124] Specifically, as an example, a second preset threshold of 90% for blood oxygen saturation can be set as the critical standard for determining hypoxia. Through polysomnography or portable sleep monitoring devices, overnight sleep monitoring is conducted on the target subject, simultaneously collecting raw data for six parameters, including the apnea-hypopnea index and lowest blood oxygen saturation. The device automatically performs quantitative calculations of these indicators. In practical applications, at least one parameter can be selected for model input based on the monitoring device's functionality and clinical needs. If the device supports multi-parameter acquisition, multiple parameters can be combined. After acquisition, the data format is calibrated, and the units of measurement for indicators such as time and percentage are standardized to ensure data standardization.
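As an illustration of two of the oxygen-related indicators, the following sketch derives the lowest blood oxygen saturation and the percentage of time below the threshold from a series of readings. It assumes evenly spaced samples, so that the share of samples equals the share of time; the function name and the readings are invented, and real monitoring devices may compute these values differently.

```python
def hypoxia_metrics(spo2_samples, threshold=90.0):
    """Lowest SpO2 and percentage of evenly spaced samples below the threshold."""
    if not spo2_samples:
        raise ValueError("no SpO2 samples provided")
    below = sum(1 for s in spo2_samples if s < threshold)
    return {
        "lowest_spo2": min(spo2_samples),
        "pct_time_below": 100.0 * below / len(spo2_samples),
    }


# Eight hypothetical overnight readings, in percent.
readings = [95, 92, 88, 85, 91, 89, 96, 94]
metrics = hypoxia_metrics(readings, threshold=90.0)
```

Keeping the threshold as a parameter reflects that 90% is given only as an example value of the second preset threshold.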
[0125] Specifically, as an example, quantitative indicators such as age, systolic blood pressure, and diastolic blood pressure can be collected through hospital electronic medical record systems and clinical examination databases, with the system directly extracting the raw values obtained from clinical testing. Body mass index and waist circumference are obtained through on-site measurement during physical examinations, ensuring the accuracy of the values. History of hypertension, diabetes, and dyslipidemia is confirmed using standardized medical history collection forms combined with the diagnostic records in the electronic medical record, and is converted into quantitative information using 0 / 1 coding. Gender is recorded using a binary identifier, while age, body measurement indicators, and blood pressure indicators retain their original continuous values. In practical applications, at least one parameter can be selected and included based on the availability of clinical data, taking into account demographic, body measurement, blood pressure, and medical history indicators to ensure a comprehensive representation of cardiovascular metabolic status.
[0126] The cardiovascular event prediction method of this application defines the specific scope of sleep breathing monitoring parameters and clinical cardiovascular metabolic parameters, limits the selection range of multi-dimensional physiological features, ensures the comprehensiveness and pertinence of feature selection, and makes the cardiovascular event prediction method have clear operability and repeatability in clinical applications, thereby improving the feasibility of technology implementation.
[0127] It should be noted that the embodiments of this application can realize at least three types of specific applications. In offline model construction, sample variables meeting the inclusion criteria are collected and preprocessed. After training with K-means clustering and a multinomial logistic regression model, the model parameters are extracted and fixed. At the same time, the tertile stratification thresholds are determined based on the risk score distribution of the training set. In online clinical applications, the model and thresholds are integrated into the hospital information system backend. After a new patient completes the examination, the system automatically retrieves the relevant variables, standardizes them, calculates the phenotypic scores and the composite risk score, completes risk stratification, and displays the results on the doctor's workstation interface. When extended to a follow-up management platform, the variables from each of a patient's follow-up visits are input into the system in turn. The system calculates the risk score and level at each time point, providing a visual presentation of the patient's cardiovascular risk trajectory. If the patient's risk score continues to rise and crosses a preset threshold, the system automatically issues an early warning, facilitating timely intervention by medical staff. The result output module of the system outputs the target object's phenotypic category, composite risk score, and cardiovascular risk stratification result in a structured manner, and can also output optional explanatory summaries. As shown in Figure 5, these include descriptive prompts such as hypertension combined with severe hypoxia or obesity combined with severe hypoxia, along with corresponding individualized clinical recommendations, which can serve as a reference for clinical diagnosis and follow-up strategies. Furthermore, the results are compatible with various interfaces, including hospital information systems, standalone software, and mobile terminals.
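The follow-up early warning can be sketched as a simple rule over a patient's sequence of risk scores. The interpretation used here, that the score rises at every visit and that the latest score reaches the threshold after starting below it, is an assumption; the application does not define the rule more precisely, and the function name and values are hypothetical.

```python
def should_warn(scores, threshold):
    """True if scores rise at every follow-up and cross the threshold from below."""
    if len(scores) < 2:
        return False  # a trajectory needs at least two time points
    rising = all(later > earlier for earlier, later in zip(scores, scores[1:]))
    crossed = scores[0] < threshold <= scores[-1]
    return rising and crossed


# Composite risk scores of one hypothetical patient at three follow-up visits.
trajectory = [0.20, 0.35, 0.60]
warn = should_warn(trajectory, threshold=0.5)
```

A looser rule, for example one that tolerates a single non-increasing visit, could be substituted without changing the surrounding workflow.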
[0128] The embodiments of this application can be deployed in various scenarios such as hospital clinical workstations, telemedicine and chronic disease management systems, wearable devices and sleep APP backends, medical device software, scientific research tools and big data platforms, and commercial health management. According to the actual needs of different scenarios, it can realize accurate assessment, large-scale screening and long-term follow-up management of cardiovascular risk for patients with sleep-disordered breathing.
[0129] Based on the cardiovascular event prediction method provided in the above embodiments, this application also provides specific implementation methods of the cardiovascular event prediction device. Please refer to the following embodiments.
[0130] Referring first to Figure 6, the cardiovascular event prediction device 60 provided in this application embodiment includes the following modules. The feature acquisition module 601 is used to acquire the multi-dimensional first physiological features of the first target object, the first physiological features including at least sleep breathing monitoring parameters and clinical cardiovascular metabolic parameters. The feature mapping module 602 is used to map the multi-dimensional first physiological features into multiple numerical values representing preset phenotypic categories through the linear transformation layer of the cardiovascular event prediction model, thereby obtaining the first score corresponding to each phenotypic category, wherein the preset phenotypic categories are obtained by cluster analysis of the multi-dimensional physiological features of multiple object samples, and the multiple object samples are multiple objects that have experienced cardiovascular events. The risk determination module 603 is used to determine the first risk level of cardiovascular events for the first target object based on the first score corresponding to each phenotypic category and through a preset risk level determination rule.
[0131] In some embodiments, the cardiovascular event prediction device 60 may further include the following modules: The sample acquisition module is used to acquire multi-dimensional physiological feature samples of multiple object samples, including sleep breathing monitoring parameter samples and clinical cardiovascular metabolic parameter samples. The distance calculation module is used to calculate the feature space distance between any two object samples based on the multi-dimensional physiological characteristics of each object sample. The sample allocation module is used to allocate all object samples to multiple sets based on the feature space distance, thus obtaining multiple target sets; The phenotypic determination module is used to determine the corresponding phenotypic category based on the multidimensional physiological characteristics of each target set.
[0132] In some embodiments, the sample allocation module includes the following modules. The initial setting module is used to randomly select, from the object samples, a number of object samples equal to the preset number of sets, and to set their multi-dimensional physiological features as the initial set center points. The sample classification module is used to assign each object sample to the set with the smallest feature space distance, based on the feature space distance between each object sample and the center point of each initial set. The center calculation module is used to obtain the object samples contained in each set and recalculate the target center point of each set. The attribution update module is used to update the set attribution of each object sample based on the target center points. The iterative convergence module is used, if the preset first convergence condition is not met, to return to obtaining the object samples contained in each set and recalculating the target center point of each set, until the first convergence condition is met, thereby obtaining the multiple target sets.
[0133] In some embodiments, the risk determination module includes the following modules. The score calculation module is used to calculate the first cardiovascular risk score of the first target object based on the first score corresponding to each phenotypic category and according to preset score calculation rules. The risk classification module is used to classify the first cardiovascular risk score according to the first preset threshold and output the first risk level of the first target object.
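The allocation procedure carried out by these modules corresponds to the K-means iteration named elsewhere in this application. A minimal sketch follows; the Euclidean distance, the stopping rule (assignments no longer change), the fixed seed, and the toy samples are assumptions, since the first convergence condition is not spelled out here.

```python
import math
import random


def kmeans(samples, k, max_iter=100, seed=0):
    """Assign samples to k sets by iterating nearest-center assignment and center updates."""
    rng = random.Random(seed)
    # Initial setting: k randomly chosen samples become the initial center points.
    centers = [list(s) for s in rng.sample(samples, k)]
    assignment = None
    for _ in range(max_iter):
        # Sample classification / attribution update: nearest center wins.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(s, centers[c])) for s in samples
        ]
        if new_assignment == assignment:  # assumed first convergence condition
            break
        assignment = new_assignment
        # Center calculation: each center becomes the mean of its members.
        for c in range(k):
            members = [s for s, a in zip(samples, assignment) if a == c]
            if members:  # keep the old center if a set is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Six hypothetical standardized samples with two features each.
samples = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
assignment, centers = kmeans(samples, k=2)
```

At termination every sample is assigned to its nearest center, which is the fixed point the iteration looks for; different seeds may number the sets differently.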
[0134] In some embodiments, the cardiovascular event prediction device 60 may further include the following modules: The scoring acquisition module is used to acquire the first cardiovascular risk scores of multiple object samples before dividing the first cardiovascular risk score according to the first preset threshold. The statistical analysis module is used to perform statistical analysis on multiple first cardiovascular risk scores to determine one or more quantiles of their statistical distribution; The threshold determination module is used to determine one or more quantiles as a first preset threshold.
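The threshold determination can be sketched with tertiles of the score distribution, followed by classification of a new score. The level names, the inclusive quantile method, the handling of scores that fall exactly on a threshold, and the example scores are assumptions for illustration.

```python
import statistics


def tertile_thresholds(scores):
    """Return the two cut points that divide the score distribution into thirds."""
    low_cut, high_cut = statistics.quantiles(scores, n=3, method="inclusive")
    return low_cut, high_cut


def classify(score, thresholds):
    """Map a cardiovascular risk score to one of three hypothetical risk levels."""
    low_cut, high_cut = thresholds
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


# Hypothetical first cardiovascular risk scores of seven object samples.
sample_scores = [10, 20, 30, 40, 50, 60, 70]
thresholds = tertile_thresholds(sample_scores)
level = classify(45, thresholds)
```

Computing the thresholds once from the object samples and then reusing them for new target objects keeps the risk levels comparable across patients.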
[0135] In some embodiments, the feature acquisition module includes: The original acquisition module is used to acquire the multi-dimensional second physiological characteristics of the first target object; The data processing module is used to normalize the continuous variables in the second physiological characteristic to obtain normalized continuous variables, and to encode the discrete variables to obtain coded data of the discrete variables. The feature determination module is used to take the coded data of normalized continuous and discrete variables as the first physiological feature.
[0136] In some embodiments, the cardiovascular event prediction device 60 may further include the following modules. The training acquisition module is used to acquire multiple training samples, where each training sample includes the multi-dimensional third physiological features and the phenotypic category label of a third target object. The third mapping module is used to map the multi-dimensional third physiological features into multiple numerical values representing the preset phenotypic categories through the linear transformation layer of the cardiovascular event prediction model, so as to obtain the third score corresponding to each phenotypic category. The probability transformation module is used to convert the third scores into the predicted probability of the third target object belonging to each phenotypic category through the normalized exponential layer of the cardiovascular event prediction model. The probability extraction module is used to extract the target predicted probability corresponding to the phenotypic category label from the predicted probabilities of the phenotypic categories. The function building module is used to construct the likelihood function based on the target predicted probability. The model training module is used to adjust the parameters of the cardiovascular event prediction model by maximizing the likelihood function until the second convergence condition is met, thereby obtaining the trained cardiovascular event prediction model.
[0137] In some embodiments, the sleep breathing monitoring parameters of the cardiovascular event prediction device 60 include at least one of the following: apnea-hypopnea index, lowest blood oxygen saturation, percentage of time when blood oxygen saturation is below a second preset threshold, arousal index, total sleep time, and percentage of REM sleep; and the clinical cardiovascular metabolic parameters include at least one of the following: age, gender, body mass index, waist circumference, systolic blood pressure, diastolic blood pressure, history of hypertension, history of diabetes, and history of dyslipidemia.
[0138] Figure 7 shows a schematic diagram of the hardware structure of the electronic device provided in an embodiment of this application.
[0139] The electronic device may include a processor 701 and a memory 702 storing computer program instructions.
[0140] Specifically, the processor 701 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits that can be configured to implement the embodiments of this application.
[0141] Memory 702 may include mass storage for data or instructions. By way of example and not limitation, memory 702 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, memory 702 may include removable or non-removable (or fixed) media. Where appropriate, memory 702 may be internal or external to the electronic device. In a particular embodiment, memory 702 is non-volatile solid-state memory.
[0142] Memory may include read-only memory (ROM), random access memory (RAM), disk storage media devices, optical storage media devices, flash memory devices, and electrical, optical, or other physical / tangible memory storage devices. Therefore, typically, memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software including computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the methods according to one aspect of this disclosure.
[0143] The processor 701 reads and executes computer program instructions stored in the memory 702 to implement any of the cardiovascular event prediction methods in the above embodiments.
[0144] In one example, the electronic device may also include a communication interface 703 and a bus 710. As shown in Figure 7, the processor 701, memory 702, and communication interface 703 are connected through bus 710 and communicate with each other over it.
[0145] The communication interface 703 is mainly used to realize communication between various modules, devices, units and / or equipment in the embodiments of this application.
[0146] Bus 710 includes hardware, software, or both, that couples the components of the electronic device together. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable buses, or combinations of two or more of these. Where appropriate, bus 710 may include one or more buses. Although specific buses are described and illustrated in embodiments of this application, this application contemplates any suitable bus or interconnect.
[0147] Furthermore, in conjunction with the cardiovascular event prediction methods in the above embodiments, an embodiment of this application can provide a computer storage medium to implement them. The computer storage medium stores computer program instructions; when these computer program instructions are executed by a processor, they implement any of the cardiovascular event prediction methods in the above embodiments.
[0148] This application also provides a computer program product, including a computer program that, when executed by a processor, implements any of the cardiovascular event prediction methods described in the above embodiments.
[0149] It should be clarified that this application is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this application is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this application.
[0150] The functional blocks shown in the above block diagram can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. Programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave. "Machine-readable medium" can include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, etc. Code segments can be downloaded via computer networks such as the Internet, intranets, etc.
[0151] It should also be noted that the exemplary embodiments mentioned in this application describe methods or systems based on a series of steps or apparatus. However, this application is not limited to the order of the above steps; that is, the steps can be performed in the order mentioned in the embodiments, or in a different order, or several steps can be performed simultaneously.
[0152] The aspects of this disclosure have been described above with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block in the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that these instructions, executable via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions / actions specified in one or more blocks of the flowchart illustrations and / or block diagrams. Such a processor can be, but is not limited to, a general-purpose processor, a special-purpose processor, a special application processor, or a field-programmable logic circuit. It is also understood that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can also be implemented by special-purpose hardware performing the specified functions or actions, or can be implemented by a combination of special-purpose hardware and computer instructions.
[0153] The above are merely specific embodiments of this application. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, modules, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here. It should be understood that the protection scope of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered within the protection scope of this application.
Claims
1. A method for predicting cardiovascular events, characterized in that it comprises: acquiring multi-dimensional first physiological features of a first target object, wherein the first physiological features include at least sleep breathing monitoring parameters and clinical cardiovascular metabolic parameters; mapping the multi-dimensional first physiological features, through the linear transformation layer of a cardiovascular event prediction model, into multiple numerical values representing preset phenotypic categories, thereby obtaining the first score corresponding to each phenotypic category, wherein the preset phenotypic categories are obtained by cluster analysis of the multi-dimensional physiological features of multiple object samples, and the multiple object samples are multiple objects that have experienced cardiovascular events; and determining, based on the first score corresponding to each phenotypic category and through a preset risk level determination rule, the first risk level of cardiovascular events for the first target object.
2. The method according to claim 1, characterized in that, before mapping the multi-dimensional first physiological features into multiple numerical values representing preset phenotypic categories through the linear transformation layer of the cardiovascular event prediction model to obtain the first score corresponding to each phenotypic category, the method further comprises: obtaining multi-dimensional physiological feature samples of the multiple object samples, including sleep breathing monitoring parameter samples and clinical cardiovascular metabolic parameter samples; calculating, based on the multi-dimensional physiological features of each object sample, the feature space distance between any two object samples; assigning, based on the feature space distance, all object samples to multiple sets to obtain multiple target sets; and determining the corresponding phenotypic category based on the multi-dimensional physiological features of each target set.
3. The method according to claim 2, characterized in that the step of assigning all object samples to multiple sets based on the feature space distance to obtain multiple target sets comprises: randomly selecting, from the object samples, a number of object samples equal to a preset number of sets, and using their multi-dimensional physiological features as initial set center points; assigning each object sample, based on the feature space distance between that object sample and each initial set center point, to the set with the smallest feature space distance; obtaining the object samples contained in each set and recalculating the target center point of each set; updating the set membership of each object sample based on the target center points; and, if a preset first convergence condition is not met, returning to the step of obtaining the object samples contained in each set and recalculating the target center point of each set, until the first convergence condition is met, thereby obtaining the multiple target sets.
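The set-assignment loop recited in claims 2 and 3 resembles a k-means style clustering procedure. The following is a minimal sketch under stated assumptions, not the applicant's implementation: the feature space distance is taken to be Euclidean, the first convergence condition is taken to be "no sample changes its set", and the names `euclidean`, `kmeans`, `max_iter`, and `seed` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Feature space distance between two feature vectors (Euclidean is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(samples, k, max_iter=100, seed=0):
    """Assign samples to k sets; returns (assignment, centers)."""
    rng = random.Random(seed)
    # Randomly pick as many samples as there are sets to serve as initial centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every sample to the set whose center is nearest.
        new_assignment = [
            min(range(k), key=lambda j: euclidean(s, centers[j])) for s in samples
        ]
        # Assumed convergence condition: set membership no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recalculate each center as the mean of the samples in its set.
        for j in range(k):
            members = [s for s, a in zip(samples, assignment) if a == j]
            if members:  # keep the old center if a set ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


if __name__ == "__main__":
    # Toy two-feature data, purely illustrative (not patient data).
    demo = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    print(kmeans(demo, k=2))
```

In practice the features would first be normalized so that no single parameter dominates the distance, and each resulting set would then be inspected to name its phenotypic category.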
4. The method according to claim 1, characterized in that the step of determining the first risk level of the first target object based on the first score corresponding to each phenotypic category and according to the preset risk level determination rule comprises: calculating, based on the first score corresponding to each phenotypic category and according to a preset score calculation rule, a first cardiovascular risk score of the first target object; and classifying the first cardiovascular risk score according to a first preset threshold to output the first risk level of the first target object.
5. The method according to claim 4, characterized in that, before classifying the first cardiovascular risk score according to the first preset threshold, the method further comprises: obtaining the first cardiovascular risk scores of the multiple object samples; performing statistical analysis on the multiple first cardiovascular risk scores to determine one or more quantiles of their statistical distribution; and determining the one or more quantiles as the first preset threshold.
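The quantile-based thresholding of claims 4 and 5 can be illustrated with a short sketch. It assumes the cardiovascular risk scores have already been computed (the claims leave the score calculation rule unspecified), and it assumes tertiles and three level labels; the function names `quantile_thresholds` and `risk_level` and the labels are hypothetical.

```python
import bisect
import statistics


def quantile_thresholds(sample_scores, n_levels=3):
    """Cut points that split the sample score distribution into n_levels groups."""
    # statistics.quantiles returns n_levels - 1 cut points in ascending order.
    return statistics.quantiles(sample_scores, n=n_levels)


def risk_level(score, thresholds, labels=("low", "medium", "high")):
    """Map a risk score to a level by counting how many thresholds it reaches."""
    return labels[bisect.bisect_right(thresholds, score)]


if __name__ == "__main__":
    # Illustrative scores only; real scores would come from the object samples.
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    cuts = quantile_thresholds(scores)
    print(cuts, risk_level(5, cuts))
```

Deriving the thresholds from the sample distribution ties the risk levels to the studied population rather than to fixed constants, so they would need to be recomputed if the sample set changes.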
6. The method according to claim 1, characterized in that acquiring the multi-dimensional first physiological features of the first target object comprises: obtaining multi-dimensional second physiological features of the first target object; normalizing the continuous variables in the second physiological features to obtain normalized continuous variables, and encoding the discrete variables in the second physiological features to obtain encoded data of the discrete variables; and using the normalized continuous variables and the encoded data of the discrete variables as the first physiological features.
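The preprocessing of claim 6 can be sketched as follows. The claim does not name the normalization or the encoding scheme, so z-score standardization and one-hot encoding are assumptions here, and the field names `ahi` and `sex` and the helper names are hypothetical.

```python
import statistics


def zscore(values):
    """Standardize a continuous variable to zero mean and unit variance (assumed scheme)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)  # constant column carries no information
    return [(v - mean) / std for v in values]


def one_hot(values):
    """Encode a discrete variable as one-hot vectors (assumed scheme)."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def build_features(records, continuous_keys, discrete_keys):
    """Concatenate normalized continuous and encoded discrete variables per record."""
    columns = []
    for key in continuous_keys:
        columns.append([[z] for z in zscore([r[key] for r in records])])
    for key in discrete_keys:
        columns.append(one_hot([r[key] for r in records]))
    return [sum((col[i] for col in columns), []) for i in range(len(records))]


if __name__ == "__main__":
    # Illustrative records only, not patient data.
    demo = [{"ahi": 5, "sex": "F"}, {"ahi": 15, "sex": "M"}, {"ahi": 25, "sex": "F"}]
    print(build_features(demo, ["ahi"], ["sex"]))
```

At inference time the mean, standard deviation, and category list would have to be those fitted on the training samples, not recomputed from the single target object.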
7. The method according to claim 1, characterized in that the method further comprises: obtaining multiple training samples, wherein each training sample includes multi-dimensional third physiological features and a phenotypic category label of a third target object; mapping the multi-dimensional third physiological features, through the linear transformation layer of the cardiovascular event prediction model, to multiple numerical values representing the preset phenotypic categories, thereby obtaining a third score corresponding to each phenotypic category; converting the third scores, through a normalized exponential layer of the cardiovascular event prediction model, into predicted probabilities of the third target object belonging to each phenotypic category; extracting, from the predicted probabilities of the phenotypic categories, the target predicted probability corresponding to the phenotypic category label; constructing a likelihood function based on the target predicted probability; and adjusting the parameters of the cardiovascular event prediction model by maximizing the likelihood function until a second convergence condition is met, thereby obtaining the trained cardiovascular event prediction model.
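The training procedure of claim 7 reads like multinomial logistic (softmax) regression fitted by maximum likelihood. The sketch below rests on that reading and on further assumptions: the normalizing layer is the softmax function, the optimizer is plain gradient ascent on the average log-likelihood, and the second convergence condition is a small change in log-likelihood. All names and hyperparameters (`lr`, `max_iter`, `tol`) are hypothetical.

```python
import math


def linear_scores(x, W, b):
    """Linear transformation layer: one score per phenotypic category."""
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def softmax(scores):
    """Convert scores to probabilities (max subtracted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(X, y, W, b):
    """Sum of log predicted probabilities of each sample's labelled category."""
    return sum(math.log(softmax(linear_scores(x, W, b))[c]) for x, c in zip(X, y))


def train(X, y, n_classes, lr=0.1, max_iter=500, tol=1e-6):
    """Gradient ascent on the log-likelihood; returns (W, b)."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    prev = log_likelihood(X, y, W, b)
    for _ in range(max_iter):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, c in zip(X, y):
            probs = softmax(linear_scores(x, W, b))
            for k in range(n_classes):
                # Gradient of the log-likelihood w.r.t. score k: indicator minus probability.
                err = (1.0 if k == c else 0.0) - probs[k]
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] += lr * gb[k] / n
            for j in range(d):
                W[k][j] += lr * gW[k][j] / n
        cur = log_likelihood(X, y, W, b)
        if abs(cur - prev) < tol:  # assumed second convergence condition
            break
        prev = cur
    return W, b


if __name__ == "__main__":
    # Toy features and labels, purely illustrative.
    X = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
    y = [0, 0, 1, 1]
    W, b = train(X, y, n_classes=2)
    print([softmax(linear_scores(x, W, b)) for x in X])
```

Working with the logarithm of the likelihood leaves the maximizer unchanged and avoids multiplying many small probabilities together.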
8. The method according to any one of claims 1-7, characterized in that the sleep breathing monitoring parameters include at least one of the following: apnea-hypopnea index, lowest blood oxygen saturation, percentage of time with blood oxygen saturation below a second preset threshold, arousal index, total sleep time, and percentage of REM sleep; and the clinical cardiovascular metabolic parameters include at least one of the following: age, gender, body mass index, waist circumference, systolic blood pressure, diastolic blood pressure, history of hypertension, history of diabetes, and history of dyslipidemia.
9. A cardiovascular event prediction device, characterized in that the device comprises: a feature acquisition module, configured to acquire multi-dimensional first physiological features of a first target object, wherein the first physiological features include at least sleep breathing monitoring parameters and clinical cardiovascular metabolic parameters; a feature mapping module, configured to map the multi-dimensional first physiological features, through a linear transformation layer of a cardiovascular event prediction model, to multiple numerical values representing preset phenotypic categories, thereby obtaining a first score corresponding to each phenotypic category, wherein the preset phenotypic categories are obtained by cluster analysis of the multi-dimensional physiological features of multiple object samples, and the multiple object samples are multiple objects that have experienced cardiovascular events; and a risk determination module, configured to determine, based on the first score corresponding to each phenotypic category and according to a preset risk level determination rule, a first risk level of the first target object for cardiovascular events.
10. An electronic device, characterized in that the electronic device comprises: a processor and a memory storing computer program instructions; wherein the processor, when executing the computer program instructions, implements the cardiovascular event prediction method according to any one of claims 1-8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program instructions that, when executed by a processor, implement the cardiovascular event prediction method according to any one of claims 1-8.
12. A computer program product, characterized in that, when the instructions in the computer program product are executed by a processor of an electronic device, the electronic device is caused to perform the cardiovascular event prediction method according to any one of claims 1-8.