Methods and systems for preventing cardiometabolic diseases
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- VALIDAE HEALTH LP
- Filing Date
- 2025-11-14
- Publication Date
- 2026-06-25
AI Technical Summary
Conventional risk assessment approaches for cardiovascular disease rely on static measurements, failing to account for dynamic trajectories of modifiable causes and personalized risk factors, leading to ineffective preventive strategies.
Utilizing machine learning models to process comprehensive cardiometabolic health data, estimate dynamic biomarker trajectories, and evaluate therapeutic interventions' effects on predicted risk trajectories, enabling personalized treatment strategies tailored to individual risk profiles.
Enables the identification of optimal therapeutic interventions to reduce cardiovascular disease risk by targeting modifiable causes, improving preventive medicine's effectiveness and personalizing insurance pricing based on individual risk profiles.
Abstract
Description
[0001] METHODS AND SYSTEMS FOR PREVENTING CARDIOMETABOLIC DISEASES
[0002] CROSS-REFERENCE TO RELATED APPLICATIONS
[0003] This application claims the benefit of priority under 35 U.S.C. 119 to U.S. Provisional Application No. 63/721,246, titled “Methods and Systems for Preventing Cardiometabolic Diseases”, filed on November 15, 2024, which is incorporated by reference in its entirety herein.
[0004] FIELD
[0005] Aspects of the present disclosure relate to methods and systems for identifying one or more interventions for an individual in furtherance of preventing development of cardiometabolic disease, such as cardiovascular disease (e.g., atherosclerotic cardiovascular disease (ASCVD)) and diabetes (e.g., type 2 diabetes).
[0006] BACKGROUND
[0007] Personalized preventive medicine focuses on tailoring preventive strategies to individual patients based on their unique risk factors, genetics, lifestyle, and environmental influences. A personalized approach to preventive medicine aims to prevent diseases before they develop, often through customized interventions.
[0008] SUMMARY
[0009] The present technology relates to computer-implemented methods and systems for personalized cardiovascular disease risk assessment and therapeutic intervention selection using machine learning models. The disclosed methods comprise obtaining comprehensive cardiometabolic health data for a subject, including clinical characteristics, physical measurements, and biochemical markers, and processing this data through trained machine learning models to generate predictive biomarker trajectories. Unlike conventional risk assessment approaches that rely on static measurements, the present technology estimates dynamic trajectories of modifiable causes of disease, such as low-density lipoprotein (LDL) cholesterol and systolic blood pressure (SBP), and other key cardiovascular risk factors across multiple time intervals throughout a subject’s lifetime.
[0010] The methods further comprise determining personalized cardiovascular disease risk measures by processing the predicted biomarker trajectories through trained survival models that utilize cumulative exposure metrics as intervals of follow-up. The potential benefit of various therapeutic interventions is then evaluated by modeling their effects on the subject’s predicted risk trajectory, enabling the identification of optimal treatment strategies tailored to individual risk profiles. The disclosed methods may be applied to various therapeutic modalities, including pharmacological interventions, DNA-based therapeutics, RNA-based therapeutics, and protein-based therapeutics. Additionally, the technology provides applications in insurance underwriting, where the personalized risk assessments may be utilized to determine appropriate pricing for insurance instruments based on individualized cardiovascular disease risk profiles rather than population-based actuarial models.
[0011] Some aspects of the technology provide a method for identifying an intervention for a subject in furtherance of preventing development of cardiovascular disease in the subject, the method comprising: using at least one computer hardware processor to perform: obtaining cardiometabolic health data for the subject; determining, using at least some of the cardiometabolic health data, a first trained machine learning (ML) model and for each of multiple time intervals, one or more measures of risk that the subject develops cardiovascular disease to obtain multiple measures of risk corresponding to the multiple time intervals; determining, using the multiple measures of risk that the subject develops cardiovascular disease and at least one second trained ML model, benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease; and identifying, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention to recommend being administered to the subject.
[0012] Other aspects of the technology provide a method for identifying an intervention for a subject in furtherance of preventing development of cardiovascular disease in the subject, the method comprising: using at least one computer hardware processor to perform: obtaining cardiometabolic health data for the subject; determining, using at least some of the cardiometabolic health data, a survival model with cumulative LDL exposure as interval of follow-up, and for each of multiple time intervals, one or more measures of risk that the subject develops cardiovascular disease to obtain multiple measures of risk corresponding to the multiple time intervals; determining, using the multiple measures of risk that the subject develops cardiovascular disease, benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease; and identifying, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention to recommend to be administered to the subject.
[0013] Yet other aspects of the technology provide a method for identifying an intervention for a subject in furtherance of preventing development of cardiovascular disease in the subject, the method comprising: using at least one computer hardware processor to perform: obtaining cardiometabolic health data for the subject, the cardiometabolic health data comprising clinical, physical, and biochemical measurements of the subject, the obtaining further comprising estimating a cumulative LDL exposure trajectory for the subject using a trained ML model and the physical and biochemical measurements; determining, using at least some of the cardiometabolic health data including the cumulative LDL exposure trajectory and for each of multiple time intervals, one or more measures of risk that the subject develops cardiovascular disease to obtain multiple measures of risk corresponding to the multiple time intervals; determining, using the multiple measures of risk that the subject develops cardiovascular disease, benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease; and identifying, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention to recommend to be administered to the subject.
[0014] Additional aspects of the technology provide a method of estimating a biomarker for a subject in furtherance of preventing development of cardiovascular disease in the subject, the method comprising: obtaining cardiometabolic health data for the subject, the cardiometabolic health data comprising subject characteristic and/or measurement data comprising: one or more values for one or more clinical characteristics of the subject, one or more values for one or more physical measurements of the subject, and/or one or more values for one or more biochemical measurements of the subject; estimating, using a trained machine learning (ML) model and the subject characteristic and/or measurement data, an LDL level trajectory for the subject, wherein the LDL level trajectory for the subject comprises an estimated LDL level for the subject for each of multiple prior ages of the subject and multiple future ages of the subject; and determining a cumulative LDL exposure trajectory for the subject as the biomarker for use in identifying a therapeutic intervention for the subject in furtherance of preventing development of cardiovascular disease in the subject.
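As an illustration of the final step above, the following is a minimal sketch of how a cumulative LDL exposure trajectory might be derived from an estimated LDL level trajectory. It assumes trapezoidal integration of LDL level over age, with exposure counted from the first age in the trajectory. The function name `cumulative_exposure`, the units (mg/dL-years), and the example values are hypothetical and are not specified in the disclosure.

```python
from typing import List, Sequence, Tuple


def cumulative_exposure(
    ages: Sequence[float], levels: Sequence[float]
) -> List[Tuple[float, float]]:
    """Integrate a biomarker level over age (trapezoidal rule).

    ages   -- strictly increasing ages in years (assumed input format)
    levels -- estimated biomarker level at each age, e.g. LDL in mg/dL
    Returns (age, cumulative exposure) pairs, e.g. in mg/dL-years.
    """
    if len(ages) != len(levels) or len(ages) == 0:
        raise ValueError("ages and levels must be non-empty and equal in length")

    total = 0.0
    trajectory = [(ages[0], 0.0)]  # no exposure counted before the first age
    for i in range(1, len(ages)):
        width = ages[i] - ages[i - 1]
        if width <= 0:
            raise ValueError("ages must be strictly increasing")
        # Area of the trapezoid between two consecutive ages.
        total += 0.5 * (levels[i] + levels[i - 1]) * width
        trajectory.append((ages[i], total))
    return trajectory


if __name__ == "__main__":
    # Hypothetical trajectory: LDL of 100 mg/dL until age 10, rising to 120 by age 20.
    for age, exposure in cumulative_exposure([0, 10, 20], [100, 100, 120]):
        print(age, exposure)
```

The same calculation would apply to a cumulative SBP or Lp(a) exposure trajectory by substituting the corresponding level trajectory.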
[0015] Further aspects of the technology provide a method of estimating a biomarker for a subject in furtherance of preventing development of cardiovascular disease in the subject, the method comprising: obtaining cardiometabolic health data for the subject, the cardiometabolic health data comprising subject characteristic and/or measurement data comprising: one or more values for one or more clinical characteristics of the subject, one or more values for one or more physical measurements of the subject, and/or one or more values for one or more biochemical measurements of the subject; estimating, using a trained machine learning (ML) model and the subject characteristic and/or measurement data, an SBP level trajectory for the subject, wherein the SBP level trajectory for the subject comprises an estimated SBP level for the subject for each of multiple prior ages of the subject and multiple future ages of the subject; and determining a cumulative SBP exposure trajectory for the subject as the biomarker for use in identifying a therapeutic intervention for the subject in furtherance of preventing development of cardiovascular disease in the subject.
[0016] Some aspects of the technology provide a method of estimating a biomarker for a subject in furtherance of preventing development of cardiovascular disease in the subject, the method comprising: obtaining cardiometabolic health data for the subject, the cardiometabolic health data comprising subject characteristic and/or measurement data comprising: one or more values for one or more clinical characteristics of the subject, one or more values for one or more physical measurements of the subject, and/or one or more values for one or more biochemical measurements of the subject; estimating, using a trained machine learning (ML) model and the subject characteristic and/or measurement data, an Lp(a) level trajectory for the subject, wherein the Lp(a) level trajectory for the subject comprises an estimated Lp(a) level for the subject for each of multiple prior ages of the subject and multiple future ages of the subject; and determining a cumulative Lp(a) exposure trajectory for the subject as the biomarker for use in identifying a therapeutic intervention for the subject in furtherance of preventing development of cardiovascular disease in the subject.
[0017] Other aspects of the technology provide a method of determining one or more measures of risk that a subject develops cardiovascular disease, the method comprising, for each of multiple time intervals: using at least one computer hardware processor to perform: (a) estimating, using a trained cardiovascular risk prediction machine learning model and cardiometabolic health data for the subject including a cumulative LDL exposure trajectory for the subject, values indicative of log hazard ratios for risk of the subject having a cardiovascular event at respective levels of cumulative LDL exposure, wherein the cardiovascular risk prediction machine learning model has been trained using training data comprising, for each of a plurality of participants enrolled in one or more prospective studies, multiple LDL measurements along with a recorded age or date at which a first cardiovascular event occurred, optionally wherein the first cardiovascular event is a first episode of a fatal or non-fatal myocardial infarction (MI), fatal or non-fatal ischemic stroke, or coronary revascularization; and (b) estimating, using the values indicative of the log hazard ratios, (i) absolute instantaneous hazard rates of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure, and (ii) cumulative lifetime risks of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure.
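Step (b) above can be illustrated with a minimal sketch that converts log hazard ratios into absolute instantaneous hazard rates and cumulative risks. It assumes a proportional-hazards form with a constant baseline hazard and equal-width intervals, and it uses the standard survival identity that cumulative risk equals one minus the exponential of the negative cumulative hazard. The function name `risk_measures`, the baseline hazard value, and the interval width are hypothetical; the disclosure does not specify them.

```python
import math
from typing import List, Sequence, Tuple


def risk_measures(
    log_hazard_ratios: Sequence[float],
    baseline_hazard: float,
    interval_years: float = 1.0,
) -> Tuple[List[float], List[float], List[float]]:
    """Convert per-interval log hazard ratios into risk measures.

    log_hazard_ratios -- one value per interval, relative to the baseline
    baseline_hazard   -- assumed constant baseline hazard (events per year)
    Returns (instantaneous hazards, cumulative hazards, cumulative risks).
    """
    # Proportional hazards: h_k = h_0 * exp(log HR_k).
    hazards = [baseline_hazard * math.exp(b) for b in log_hazard_ratios]

    cumulative_hazards = []
    running = 0.0
    for h in hazards:
        running += h * interval_years  # hazard treated as constant within an interval
        cumulative_hazards.append(running)

    # Cumulative risk of a first event: 1 - exp(-H).
    cumulative_risks = [1.0 - math.exp(-big_h) for big_h in cumulative_hazards]
    return hazards, cumulative_hazards, cumulative_risks


if __name__ == "__main__":
    # Hypothetical log hazard ratios that rise with cumulative LDL exposure.
    hazards, cum_hazards, risks = risk_measures([0.0, 0.2, 0.4], baseline_hazard=0.01)
    for row in zip(hazards, cum_hazards, risks):
        print(row)
```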
[0018] Yet other aspects of the technology provide a method of determining an expected proportional reduction and/or absolute reduction in risk of cardiovascular events for a subject in response to a particular therapeutic intervention sequence, the particular therapeutic sequence indicating magnitude, duration, and timing of one or more interventions associated with a reduction of LDL level and/or SBP level over an interval of follow up for the subject, the method comprising: using at least one computer hardware processor to perform: determining multiple measures of risk of cardiovascular events for the subject comprising absolute instantaneous hazard rates, cumulative hazard rates, and cumulative event rates of the subject having a cardiovascular event at respective ones of multiple time intervals, wherein the cumulative hazard rates and the cumulative event rates are not adjusted for the particular therapeutic intervention sequence, and determining the expected proportional reduction and/or the absolute reduction in the risk of cardiovascular events for the subject in response to the particular therapeutic intervention sequence using a method that comprises: (a) for each particular interval of follow-up for the subject from the subject’s current age to an upper threshold age, (i) determining, using the multiple measures of risk, a predicted instantaneous hazard rate of the subject having a cardiovascular event at the particular interval of follow up; (ii) determining, using a benefit prediction machine learning model, a time-averaged instantaneous log hazard ratio for a one unit lower LDL or SBP corresponding to duration of treatment at the particular interval of follow-up for the subject; (iii) determining an intervention-adjusted instantaneous hazard for the particular interval of follow-up by multiplying the instantaneous hazard rate of the subject determined at (a)(i) with the time-averaged instantaneous log hazard ratio determined at (a)(ii), thereby obtaining multiple intervention-adjusted instantaneous hazards for intervals of follow-up evaluated at (a); (b) determining, using the multiple intervention-adjusted instantaneous hazards, intervention-adjusted cumulative hazard rates and cumulative event rates of cardiovascular events for intervals of follow-up for the subject from the subject’s current age to the upper threshold age; and (c) determining predicted proportional reductions in the risk of experiencing a cardiovascular event as ratios of the intervention-adjusted cumulated hazard rates and the cumulative hazard rates that are not adjusted for the particular therapeutic intervention sequence.
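The benefit calculation described above can be sketched as follows. This is an illustrative reading, not the disclosed implementation: it assumes that the per-unit log hazard ratio is scaled by the number of units by which LDL or SBP is lowered in each interval and then exponentiated before being multiplied with the unadjusted hazard, and that the proportional reduction is reported as one minus the ratio of adjusted to unadjusted cumulative hazard. The function name `intervention_effect` and all inputs are hypothetical.

```python
import math
from typing import Dict, Sequence


def intervention_effect(
    hazards: Sequence[float],
    log_hr_per_unit: Sequence[float],
    units_lowered: Sequence[float],
    interval_years: float = 1.0,
) -> Dict[str, float]:
    """Compare unadjusted and intervention-adjusted risk over follow-up.

    hazards         -- unadjusted instantaneous hazard per interval
    log_hr_per_unit -- time-averaged log hazard ratio per one unit lower
                       LDL or SBP (negative when lowering is protective)
    units_lowered   -- units by which the biomarker is lowered per interval
    """
    if not (len(hazards) == len(log_hr_per_unit) == len(units_lowered)):
        raise ValueError("all inputs need one value per interval")
    if not hazards or sum(hazards) <= 0:
        raise ValueError("hazards must contain a positive value")

    cum_hazard = 0.0
    cum_hazard_adj = 0.0
    for h, beta, delta in zip(hazards, log_hr_per_unit, units_lowered):
        # Assumption: adjusted hazard = hazard * exp(beta * units lowered).
        h_adj = h * math.exp(beta * delta)
        cum_hazard += h * interval_years
        cum_hazard_adj += h_adj * interval_years

    risk = 1.0 - math.exp(-cum_hazard)          # cumulative event rate, unadjusted
    risk_adj = 1.0 - math.exp(-cum_hazard_adj)  # cumulative event rate, adjusted
    return {
        "hazard_ratio": cum_hazard_adj / cum_hazard,
        "proportional_reduction": 1.0 - cum_hazard_adj / cum_hazard,
        "absolute_reduction": risk - risk_adj,
    }


if __name__ == "__main__":
    # Hypothetical sequence: no treatment in the first interval, then a
    # one-unit reduction with a per-unit hazard ratio of 0.8.
    result = intervention_effect(
        hazards=[0.01, 0.012, 0.015],
        log_hr_per_unit=[math.log(0.8)] * 3,
        units_lowered=[0.0, 1.0, 1.0],
    )
    print(result)
```

Expressing the intervention sequence as one value per interval lets the magnitude, duration, and timing of treatment vary independently.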
[0019] Additional aspects of the technology provide a method for pricing an insurance instrument for a subject, the method comprising: using at least one computer hardware processor to perform: obtaining cardiometabolic health data for the subject; determining, using at least some of the cardiometabolic health data and for each of multiple time intervals, one or more measures of risk that the subject develops a disease to obtain multiple measures of risk corresponding to the multiple time intervals; and pricing the insurance instrument for the subject based on the multiple measures of risk corresponding to the multiple time intervals.
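The disclosure does not state a pricing formula. Purely as an illustration of how per-interval risk measures could feed a price, the sketch below computes a conventional actuarial net single premium: the discounted expected payout of a fixed benefit paid on a first event. The function name `net_single_premium`, the fixed benefit, the flat discount rate, and the end-of-interval payment timing are all assumptions introduced here.

```python
import math
from typing import Sequence


def net_single_premium(
    hazards: Sequence[float],
    benefit: float,
    discount_rate: float,
    interval_years: float = 1.0,
) -> float:
    """Discounted expected payout for a benefit paid on a first event.

    hazards       -- instantaneous hazard per interval (events per year)
    benefit       -- amount paid if the event occurs (assumed fixed)
    discount_rate -- assumed flat annual discount rate
    """
    survival = 1.0  # probability of no event before the current interval
    premium = 0.0
    for k, h in enumerate(hazards, start=1):
        # Probability of an event within this interval, given survival so far.
        event_prob = 1.0 - math.exp(-h * interval_years)
        # Payment assumed at the end of interval k.
        discount = (1.0 + discount_rate) ** (-k * interval_years)
        premium += survival * event_prob * discount * benefit
        survival *= 1.0 - event_prob
    return premium


if __name__ == "__main__":
    # Hypothetical hazards rising over five annual intervals.
    print(net_single_premium([0.005, 0.006, 0.008, 0.010, 0.012], 100_000.0, 0.03))
```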
[0020] Further aspects of the technology provide a system, comprising: at least one computer hardware processor; and at least one non-transitory computer readable storage medium storing software comprising: a cardiometabolic health data module comprising processor-executable instructions that, when executed by at least one computer hardware processor, cause at least one computer hardware processor to perform obtaining cardiometabolic health data for a subject; a risk assessment module comprising processor-executable instructions that, when executed by at least one computer hardware processor, cause at least one computer hardware processor to perform determining, using at least some of the cardiometabolic health data and for each of multiple time intervals, one or more measures of risk that the subject develops disease to obtain multiple measures of risk corresponding to the multiple time intervals; a therapeutic intervention benefit assessment module comprising processor executable instructions that, when executed by at least one computer hardware processor, cause at least one computer hardware processor to perform determining, using the multiple measures of risk that the subject develops the disease, benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of the disease by targeting one or more modifiable causes of the disease; and a therapeutic intervention selection module comprising processor executable instructions that, when executed by at least one computer hardware processor, cause at least one computer hardware processor to perform identifying, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention to recommend to be administered to the subject.
[0021] Some aspects of the technology provide a system, comprising: at least one computer hardware processor; and at least one non-transitory computer readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause at least one computer hardware processor to perform any one of the methods described herein.
[0022] Other aspects of the technology provide at least one non-transitory computer readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause at least one processor to perform any one of the methods described herein.
[0023] Yet other aspects of the technology provide a method for identifying an intervention for a subject in furtherance of preventing development of cardiovascular disease in the subject, the method comprising: using at least one computer hardware processor to perform: obtaining cardiometabolic health data for the subject; determining, using at least some of the cardiometabolic health data, a first trained machine learning (ML) model and for each of multiple time intervals, one or more measures of risk that the subject develops cardiovascular disease to obtain multiple measures of risk corresponding to the multiple time intervals; determining, using the multiple measures of risk that the subject develops cardiovascular disease and at least one second ML model, benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease; and identifying, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention to recommend being administered to the subject.
[0024] Additional aspects of the technology provide a system, comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by at least one computer hardware processor, cause at least one computer hardware processor to perform a method for identifying an intervention for a subject in furtherance of preventing development of cardiovascular disease in the subject, the method comprising: obtaining cardiometabolic health data for the subject; determining, using at least some of the cardiometabolic health data, a first trained machine learning (ML) model and for each of multiple time intervals, one or more measures of risk that the subject develops cardiovascular disease to obtain multiple measures of risk corresponding to the multiple time intervals; determining, using the multiple measures of risk that the subject develops cardiovascular disease and at least one second ML model, benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease; and identifying, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention to recommend being administered to the subject.
[0025] Further aspects of the technology provide at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by at least one computer hardware processor, cause at least one computer hardware processor to perform a method for identifying an intervention for a subject in furtherance of preventing development of cardiovascular disease in the subject, the method comprising: obtaining cardiometabolic health data for the subject; determining, using at least some of the cardiometabolic health data, a first trained machine learning (ML) model and for each of multiple time intervals, one or more measures of risk that the subject develops cardiovascular disease to obtain multiple measures of risk corresponding to the multiple time intervals; determining, using the multiple measures of risk that the subject develops cardiovascular disease and at least one second ML model, benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease; and identifying, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention to recommend being administered to the subject.
[0026] Some aspects of the technology provide a method for estimating a cumulative LDL exposure trajectory for a subject, the method comprising: using at least one computer hardware processor to perform: obtaining cardiometabolic health data for a subject comprising: one or more values for one or more clinical characteristics of the subject, and/or one or more values for one or more physical measurements of the subject, and/or one or more values for one or more biochemical measurements of the subject; encoding the cardiometabolic health data for the subject into a feature vector; estimating an LDL level trajectory for the subject by processing the feature vector using an LDL trajectory prediction machine learning (ML) model that has been trained to estimate an LDL level for a subject at each of multiple prior ages and each of multiple future ages using training data comprising for each of a plurality of participants, repeated longitudinal measures of LDL levels and values for the one or more clinical characteristics and/or the one or more physical measurements and/or the one or more biochemical measurements over at least 10, at least 20, at least 30, or at least 50 years of follow-up, wherein the LDL level trajectory for the subject comprises estimated LDL levels for the subject including an estimated LDL level for each of multiple prior ages of the subject and multiple future ages of the subject; estimating, using the LDL level trajectory, a cumulative LDL exposure trajectory for the subject with respect to a set of ages, wherein the cumulative LDL exposure trajectory comprises an estimated cumulative LDL exposure level for the subject at each age in the set of ages; and outputting the estimated cumulative LDL exposure trajectory for the subject.
[0027] Other aspects of the technology provide a system, comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by at least one computer hardware processor, cause at least one computer hardware processor to perform a method for estimating a cumulative LDL exposure trajectory for a subject, the method comprising: obtaining cardiometabolic health data for a subject comprising: one or more values for one or more clinical characteristics of the subject, and/or one or more values for one or more physical measurements of the subject, and/or one or more values for one or more biochemical measurements of the subject; encoding the cardiometabolic health data for the subject into a feature vector; estimating an LDL level trajectory for the subject by processing the feature vector using an LDL trajectory prediction machine learning (ML) model that has been trained to estimate an LDL level for a subject at each of multiple prior ages and each of multiple future ages using training data comprising for each of a plurality of participants, repeated longitudinal measures of LDL levels and values for the one or more clinical characteristics and/or the one or more physical measurements and/or the one or more biochemical measurements over at least 10, at least 20, at least 30, or at least 50 years of follow-up, wherein the LDL level trajectory for the subject comprises estimated LDL levels for the subject including an estimated LDL level for each of multiple prior ages of the subject and multiple future ages of the subject; estimating, using the LDL level trajectory, a cumulative LDL exposure trajectory for the subject with respect to a set of ages, wherein the cumulative LDL exposure trajectory comprises an estimated cumulative LDL exposure level for the subject at each age in the set of ages; and outputting the estimated cumulative LDL exposure trajectory for the subject.
[0028] Yet other aspects of the technology provide at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by at least one computer hardware processor, cause at least one computer hardware processor to perform a method for estimating a cumulative LDL exposure trajectory for a subject, the method comprising: obtaining cardiometabolic health data for a subject comprising: one or more values for one or more clinical characteristics of the subject, and/or one or more values for one or more physical measurements of the subject, and/or one or more values for one or more biochemical measurements of the subject; encoding the cardiometabolic health data for the subject into a feature vector; estimating an LDL level trajectory for the subject by processing the feature vector using an LDL trajectory prediction machine learning (ML) model that has been trained to estimate an LDL level for a subject at each of multiple prior ages and each of multiple future ages using training data comprising for each of a plurality of participants, repeated longitudinal measures of LDL levels and values for the one or more clinical characteristics and/or the one or more physical measurements and/or the one or more biochemical measurements over at least 10, at least 20, at least 30, or at least 50 years of follow-up, wherein the LDL level trajectory for the subject comprises estimated LDL levels for the subject including an estimated LDL level for each of multiple prior ages of the subject and multiple future ages of the subject; estimating, using the LDL level trajectory, a cumulative LDL exposure trajectory for the subject with respect to a set of ages, wherein the cumulative LDL exposure trajectory comprises an estimated cumulative LDL exposure level for the subject at each age in the set of ages; and outputting the estimated cumulative LDL exposure trajectory for the subject.
[0029] Additional aspects of the technology provide a computer-implemented method, comprising: obtaining cardiometabolic health data for a subject comprising: one or more values for one or more clinical characteristics of the subject, and/or one or more values for one or more physical measurements of the subject, and/or one or more values for one or more biochemical measurements of the subject; encoding the cardiometabolic health data for the subject into a first feature vector; estimating an LDL level trajectory for the subject by processing the first feature vector using an LDL trajectory prediction machine learning (ML) model that has been trained to estimate an LDL level for a subject at each of multiple prior ages and each of multiple future ages using training data comprising for each of a plurality of participants, repeated longitudinal measures of LDL levels and values for the one or more clinical characteristics and/or the one or more physical measurements and/or the one or more biochemical measurements over at least 10, at least 20, at least 30, or at least 50 years of follow-up, wherein the LDL level trajectory for the subject comprises an estimated LDL level for the subject for each of multiple prior ages of the subject and multiple future ages of the subject.
[0031] Further aspects of the technology provide a computer-implemented method of determining one or more measures of risk that a subject develops cardiovascular disease, the method comprising, for each of multiple time intervals: (a) estimating, using a trained cardiovascular risk prediction machine learning model and cardiometabolic health data for the subject including a cumulative LDL exposure trajectory for the subject, values indicative of log hazard ratios for risk of the subject having a cardiovascular event at respective levels of cumulative LDL exposure, wherein the cardiovascular risk prediction machine learning model has been trained using training data comprising, for each of a plurality of participants enrolled in one or more prospective studies, multiple LDL measurements along with a recorded age or date at which a first cardiovascular event occurred, optionally wherein the first cardiovascular event is a first episode of a fatal or non-fatal myocardial infarction (MI), fatal or non-fatal ischemic stroke, or coronary revascularization; and (b) estimating, using the values indicative of the log hazard ratios, (i) absolute instantaneous hazard rates of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure, and (ii) cumulative lifetime risks of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure.
[0032] Some aspects of the technology provide a computer-implemented method of determining an expected proportional reduction and / or absolute reduction in risk of cardiovascular events for a subject in response to a particular therapeutic intervention sequence, the particular therapeutic sequence indicating magnitude, duration, and timing of one or more interventions associated with a reduction of LDL level and / or SBP level over an interval of follow up for the subject, the method comprising: determining multiple measures of risk of cardiovascular events for the subject comprising absolute instantaneous hazard rates, cumulative hazard rates, and cumulative event rates of the subject having a cardiovascular event at respective ones of multiple time intervals, wherein the cumulative hazard rates and the cumulative event rates are not adjusted for the particular therapeutic intervention sequence, and determining the expected proportional reduction and / or the absolute reduction in the risk of cardiovascular events for the subject in response to the particular therapeutic intervention sequence using a method that comprises: (a) for each particular interval of follow-up for the subject from the subject’s current age to an upper threshold age, (i) determining, using the multiple measures of risk, a predicted instantaneous hazard rate of the subject having a cardiovascular event at the particular interval of follow up; (ii) determining, using a benefit prediction machine learning (ML) model, a time -averaged instantaneous log hazard ratio for a one unit lower LDL or SBP corresponding to duration of treatment at the particular interval of follow-up for the subject, wherein the benefit prediction machine learning model has been trained using training data from randomized trials of LDL lowering therapies and / or randomized trials of SBP therapies and Mendelian randomization studies evaluating genetic variants associated with lower LDL and / or lower SBP, said training 
data comprising for each of a plurality of participants in said trials, at least one LDL or SBP measurement along with a recorded age or date at which a first cardiovascular event occurred; (iii) determining an intervention-adjusted instantaneous hazard for the particular interval of follow-up by multiplying the instantaneous hazard rate of the subject determined at (a)(i) with the time-averaged instantaneous log hazard ratio determined at (a)(ii), thereby obtaining multiple intervention-adjusted instantaneous hazards for intervals of follow-up evaluated at (a); (b) determining, using the multiple intervention-adjusted instantaneous hazards, intervention-adjusted cumulative hazard rates and cumulative event rates of cardiovascular events for intervals of follow-up for the subject from the subject's current age to the upper threshold age; and (c) determining predicted proportional reductions in the risk of experiencing a cardiovascular event as ratios of the intervention-adjusted cumulative hazard rates and the cumulative hazard rates that are not adjusted for the particular therapeutic intervention sequence.
[0034] Other aspects of the technology provide a computer-implemented method for pricing an insurance instrument for a subject, the method comprising: obtaining cardiometabolic health data for the subject including a cumulative LDL exposure trajectory for the subject; determining multiple measures of risk that the subject develops cardiovascular disease using a method comprising: (a) estimating, using a trained cardiovascular risk prediction machine learning model and the cardiometabolic health data for the subject, values indicative of log hazard ratios for risk of the subject having a cardiovascular event at respective levels of cumulative LDL exposure, wherein the cardiovascular risk prediction machine learning model has been trained using training data comprising, for each of a plurality of participants enrolled in one or more prospective studies, at least one LDL measurement along with a recorded age or date at which a first cardiovascular event occurred, optionally wherein the first cardiovascular event is a first episode of a fatal or non-fatal myocardial infarction (MI), fatal or non-fatal ischemic stroke, or coronary revascularization; (b) estimating, using the values indicative of the log hazard ratios, (i) absolute instantaneous hazard rates of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure, and (ii) cumulative lifetime risks of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure; and (c) estimating, using the cumulative LDL exposure trajectory for the subject, multiple measures of risk to include: (i) absolute instantaneous hazard rates of the subject having a cardiovascular event at respective ones of multiple time intervals, and (ii) cumulative lifetime hazard and event rates of the subject having a cardiovascular event at the respective ones of the multiple time intervals; and pricing the insurance instrument for the subject based on the multiple measures of risk corresponding to the multiple time intervals.
[0035] Yet other aspects of the technology provide a computer-implemented method for pricing an insurance instrument for a subject, the method comprising: obtaining cardiometabolic health data for a subject comprising: one or more values for one or more clinical characteristics of the subject, and / or one or more values for one or more physical measurements of the subject, and / or one or more values for one or more biochemical measurements of the subject; generating a first feature vector representing the subject from the cardiometabolic health data for the subject, and / or a second feature vector representing the subject from the cardiometabolic health data for the subject; and performing one or both of:
[0036] estimating an LDL level trajectory for the subject by processing the first feature vector using an LDL trajectory prediction machine learning model that has been trained to estimate an LDL level for a subject at each of multiple prior ages and each of multiple future ages using training data comprising for each of a plurality of participants, repeated longitudinal measures of LDL levels and values for the one or more clinical characteristics and / or the one or more physical measurements and / or the one or more biochemical measurements over at least 10, at least 20, at least 30, or at least 50 years of follow-up, wherein the LDL level trajectory for the subject comprises an estimated LDL level for the subject for each of multiple prior ages of the subject and multiple future ages of the subject; and estimating an SBP level trajectory for the subject by processing the second feature vector using an SBP trajectory prediction machine learning model that has been trained to estimate an SBP level for a subject at each of multiple prior ages and each of multiple future ages using training data comprising, for each of a plurality of participants, repeated longitudinal measures of SBP levels and values for the one or more clinical characteristics and / or the one or more physical measurements and / or the one or more biochemical measurements over at least 10, at least 20, at least 30, or at least 50 years of follow-up, wherein the SBP level trajectory for the subject comprises an estimated SBP level for the subject for each of multiple prior ages of the subject and multiple future ages of the subject; and pricing the insurance instrument for the subject based on the estimated LDL level trajectory and / or the estimated SBP level trajectory.
[0037] Additional aspects of the technology provide a computer-implemented method for pricing an insurance instrument for a subject, the method comprising: determining an expected risk of cardiovascular events for a subject in response to a therapeutic intervention associated with a reduction of LDL level and / or SBP level over an interval of follow up for the subject using a method comprising: (a) determining multiple measures of risk of cardiovascular events for the subject comprising absolute instantaneous hazard rates, cumulative hazard rates, and cumulative event rates of the subject having a cardiovascular event at respective ones of multiple time intervals, wherein the cumulative hazard rates and the cumulative event rates are not adjusted for the therapeutic intervention; (b) for each particular interval of follow-up for the subject from the subject’s current age to an upper threshold age, (i) determining, using the multiple measures of risk, a predicted instantaneous hazard rate of the subject having a cardiovascular event at the particular interval of follow up; (ii) determining, using a benefit prediction machine learning model, a time-averaged instantaneous log hazard ratio for a one unit lower LDL or SBP corresponding to duration of treatment at the particular interval of follow-up for the subject, wherein the benefit prediction machine learning model has been trained using training data from randomized trials of LDL lowering therapies and / or randomized trials of SBP therapies, and / or Mendelian randomization studies evaluating genetic variants associated with lower LDL and / or lower SBP, said training data comprising for each of a plurality of participants in said trials, at least one LDL or SBP measurement along with a recorded age or date at which a first cardiovascular event occurred; (iii) determining an intervention-adjusted instantaneous hazard for the particular interval of follow-up by multiplying the instantaneous hazard rate of the 
subject determined at (b)(i) with the time-averaged instantaneous log hazard ratio determined at (b)(ii), thereby obtaining multiple intervention-adjusted instantaneous hazards for intervals of follow-up evaluated at (b); (c) determining, using the multiple intervention-adjusted instantaneous hazards, intervention-adjusted cumulative hazard rates and cumulative event rates of cardiovascular events for intervals of follow-up for the subject from the subject's current age to the upper threshold age; and (d) pricing the insurance instrument for the subject using the intervention-adjusted cumulative hazard rates and cumulative event rates of cardiovascular events obtained at (c).
[0038] Further aspects of the technology provide a computer-implemented method for pricing an insurance instrument for a subject, the method comprising: determining the value of one or more cardiometabolic health metrics associated with the subject using any one of the methods described herein; and pricing the insurance instrument for the subject based on the results of said determining.
[0039] Some aspects of the technology provide a system, comprising: at least one computer hardware processor; and at least one non-transitory computer readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause at least one processor to perform any one of the methods described herein.
[0040] Other aspects of the technology provide at least one non-transitory computer readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause at least one processor to perform any one of the methods described herein.
[0041] Some aspects of the technology provide a computer-implemented method, comprising: obtaining cardiometabolic health data for a subject comprising: one or more values for one or more clinical characteristics of the subject, and / or one or more values for one or more physical measurements of the subject, and / or one or more values for one or more biochemical measurements of the subject; encoding the cardiometabolic health data for the subject into a first feature vector; estimating an SBP level trajectory for the subject by processing the first feature vector using an SBP trajectory prediction ML model that has been trained to estimate an SBP level for a subject at each of multiple prior ages and each of multiple future ages using training data comprising, for each of a plurality of participants, repeated longitudinal measures of SBP levels and values for the one or more clinical characteristics and / or the one or more physical measurements and / or the one or more biochemical measurements over at least 10, at least 20, at least 30, or at least 50 years of follow-up, wherein the SBP level trajectory for the subject comprises an estimated SBP level for the subject for each of multiple prior ages of the subject and multiple future ages of the subject.
[0042] Yet other aspects of the technology provide a computer-implemented method of determining one or more measures of risk that a subject develops cardiovascular disease, the method comprising, for each of multiple time intervals: (a) estimating, using a trained cardiovascular risk prediction machine learning model and cardiometabolic health data for the subject including a cumulative LDL exposure trajectory for the subject, values indicative of log hazard ratios for risk of the subject having a cardiovascular event at respective levels of cumulative LDL exposure, wherein the cardiovascular risk prediction machine learning model has been trained using training data comprising, for each of a plurality of participants enrolled in one or more prospective studies, multiple LDL measurements along with a recorded age or date at which a first cardiovascular event occurred, optionally wherein the first cardiovascular event is a first episode of a fatal or non-fatal myocardial infarction (MI), fatal or non-fatal ischemic stroke, or coronary revascularization; and (b) estimating, using the values indicative of the log hazard ratios, (i) absolute instantaneous hazard rates of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure, and (ii) cumulative lifetime risks of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure; and (c) estimating, using the cumulative LDL exposure trajectory for the subject, the multiple measures of risk to include: (i) absolute instantaneous hazard rates of the subject having a cardiovascular event at respective ones of the multiple time intervals, and (ii) cumulative lifetime hazard and event rates of the subject having a cardiovascular event at the respective ones of the multiple time intervals.
[0043] Additional aspects of the technology provide a computer-implemented method of determining an expected proportional reduction and / or absolute reduction in risk of cardiovascular events for a subject in response to a particular therapeutic intervention sequence, the particular therapeutic sequence indicating magnitude, duration, and timing of one or more interventions associated with a reduction of LDL level and / or SBP level over an interval of follow up for the subject, the method comprising: determining multiple measures of risk of cardiovascular events for the subject comprising absolute instantaneous hazard rates, cumulative hazard rates, and cumulative event rates of the subject having a cardiovascular event at respective ones of multiple time intervals, wherein the cumulative hazard rates and the cumulative event rates are not adjusted for the particular therapeutic intervention sequence, and determining the expected proportional reduction and / or the absolute reduction in the risk of cardiovascular events for the subject in response to the particular therapeutic intervention sequence using a method that comprises: (a) for each particular interval of follow-up for the subject from the subject’s current age to an upper threshold age, (i) determining, using the multiple measures of risk, a predicted instantaneous hazard rate of the subject having a cardiovascular event at the particular interval of follow up; (ii) determining, using a benefit prediction machine learning model, a time-averaged instantaneous log hazard ratio for a one unit lower LDL or SBP corresponding to duration of treatment at the particular interval of follow-up for the subject, wherein the benefit prediction machine learning model has been trained using training data from randomized trials of LDL lowering therapies and / or randomized trials of SBP therapies and Mendelian randomization studies evaluating genetic variants associated with lower LDL and / or lower SBP, said training
data comprising for each of a plurality of participants in said trials, at least one LDL or SBP measurement along with a recorded age or date at which a first cardiovascular event occurred; (iii) determining an intervention-adjusted instantaneous hazard for the particular interval of follow-up by multiplying the instantaneous hazard rate of the subject determined at (a)(i) with the time-averaged instantaneous log hazard ratio determined at (a)(ii), thereby obtaining multiple intervention-adjusted instantaneous hazards for intervals of follow-up evaluated at (a); (b) determining, using the multiple intervention-adjusted instantaneous hazards, intervention-adjusted cumulative hazard rates and cumulative event rates of cardiovascular events for intervals of follow-up for the subject from the subject’s current age to the upper threshold age; (c) determining predicted proportional reductions in the risk of experiencing a cardiovascular event as ratios of the intervention-adjusted cumulative hazard rates and the cumulative hazard rates that are not adjusted for the particular therapeutic intervention sequence; and (d) determining predicted absolute reductions in the risk of experiencing a cardiovascular event as absolute differences between the intervention-adjusted cumulative event rates and the cumulative event rates that are not adjusted for the particular therapeutic intervention sequence.
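By way of non-limiting illustration only, steps (a)(iii) through (d) of the foregoing method can be sketched in Python as shown below. The sketch rests on several assumptions that are not stated in the disclosure: the log hazard ratio per unit is scaled by the number of units lowered and exponentiated before being multiplied with the unadjusted hazard; cumulative hazard is a simple sum of interval hazards times interval width; and cumulative event rate is obtained as 1 - exp(-H), which ignores competing risks. The function and parameter names are hypothetical, and the benefit prediction ML model itself is not reproduced; its outputs are passed in as plain numbers.

```python
import math
from typing import Dict, List, Sequence


def cumulative_hazards(hazards: Sequence[float], dt: float = 1.0) -> List[float]:
    """Running sum of interval hazards times interval width (assumed discretization)."""
    total = 0.0
    out: List[float] = []
    for h in hazards:
        total += h * dt
        out.append(total)
    return out


def cumulative_event_rates(cum_hazards: Sequence[float]) -> List[float]:
    """Cumulative event rate F = 1 - exp(-H); assumes no competing risks."""
    return [1.0 - math.exp(-H) for H in cum_hazards]


def intervention_effect(
    baseline_hazards: Sequence[float],   # unadjusted instantaneous hazard per interval
    log_hr_per_unit: Sequence[float],    # time-averaged log hazard ratio per unit lower LDL or SBP
    units_lowered: Sequence[float],      # magnitude of lowering in each interval (0 = untreated)
    dt: float = 1.0,                     # interval width, e.g. one year
) -> Dict[str, List[float]]:
    if not (len(baseline_hazards) == len(log_hr_per_unit) == len(units_lowered)):
        raise ValueError("all inputs must cover the same intervals")

    # Step (a)(iii): intervention-adjusted hazard (assumption: exponentiate the scaled log HR).
    adjusted = [
        h * math.exp(b * u)
        for h, b, u in zip(baseline_hazards, log_hr_per_unit, units_lowered)
    ]

    # Step (b): cumulative hazards and cumulative event rates, with and without intervention.
    H0 = cumulative_hazards(baseline_hazards, dt)
    H1 = cumulative_hazards(adjusted, dt)
    F0 = cumulative_event_rates(H0)
    F1 = cumulative_event_rates(H1)

    # Step (c): ratio of adjusted to unadjusted cumulative hazard
    # (1 minus this ratio is one way to express a proportional reduction).
    ratios = [h1 / h0 if h0 > 0 else 1.0 for h0, h1 in zip(H0, H1)]

    # Step (d): absolute difference in cumulative event rates.
    absolute = [f0 - f1 for f0, f1 in zip(F0, F1)]

    return {"cumulative_hazard_ratio": ratios, "absolute_reduction": absolute}
```

In this sketch, an intervention sequence that starts later is represented by zeros in `units_lowered` for the untreated intervals, so the cumulative quantities reflect both timing and duration of treatment.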
[0045] Further aspects of the technology provide a system, comprising: at least one computer hardware processor; and at least one non-transitory computer readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause at least one processor to perform any one of the methods described herein.
[0046] Some aspects of the technology provide at least one non-transitory computer readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause at least one processor to perform any one of the methods described herein.
[0047] The preceding Summary is non-limiting.
[0048] BRIEF DESCRIPTION OF DRAWINGS
[0049] Various aspects and embodiments will be described with reference to the following figures. It should be appreciated that the figures are not necessarily drawn to scale. Items appearing in multiple figures are indicated by the same or a similar reference number in all the figures in which they appear.
[0050] FIGs. 1A-1D are block diagrams illustrating aspects of an example architecture of an AI-model based reinforcement learning system, in accordance with some embodiments of the technology described herein.
[0051] FIG. 1E is a block diagram illustrating an example system 150 configured for preventing development of disease, in accordance with some embodiments of the technology described herein.
[0052] FIG. 2A is an example of an illustrative process 200 for identifying an intervention for a subject in furtherance of preventing development of cardiovascular disease in the subject, in accordance with some embodiments of the technology described herein.
[0053] FIG. 2B is an example of an illustrative process for obtaining cardiometabolic health data for a subject, in accordance with some embodiments of the technology described herein.
[0054] FIG. 2C is an example of an illustrative process for determining, using the cardiometabolic health data, a first trained ML model, and multiple time intervals, measure(s) of risk that the subject develops cardiovascular disease, in accordance with some embodiments of the technology described herein.
[0055] FIG. 2D is an example of an illustrative process for determining, using the measure(s) of risk and at least one second trained ML model, benefit of administering to the subject therapeutic intervention(s) to reduce risk of cardiovascular disease, in accordance with some embodiments of the technology described herein.
[0057] FIG. 2E is an example of an illustrative process for identifying, using determined benefit of administering the therapeutic intervention(s), at least one therapeutic intervention to recommend to be administered to the subject, in accordance with some embodiments of the technology described herein.
[0058] FIG. 3A shows an illustrative example of the subject characteristic and / or measurement data, in accordance with some embodiments of the technology described herein.
[0059] FIG. 3B shows an illustrative example of standardizing values that are part of the subject characteristic and / or measurement data, in accordance with some embodiments of the technology described herein.
[0060] FIG. 3C shows an illustrative example of positionally encoding the standardized values of the subject characteristic and / or measurement data, in accordance with some embodiments of the technology described herein.
[0061] FIG. 3D shows an illustrative example of an LDL trajectory predicted for a biological male subject, in accordance with some embodiments of the technology described herein.
[0062] FIG. 3E shows illustrative examples of LDL trajectories predicted for biological male and female subjects, in accordance with some embodiments of the technology described herein.
[0063] FIG. 3F shows an illustrative example of a cumulative LDL exposure trajectory representing cumulative exposure to LDL as a function of age, in accordance with some embodiments of the technology described herein.
[0064] FIG. 3G shows an illustrative example of an SBP trajectory predicted for a biological male subject, in accordance with some embodiments of the technology described herein.
[0065] FIG. 3H shows illustrative examples of SBP trajectories predicted for biological male and female subjects, in accordance with some embodiments of the technology described herein.
[0066] FIG. 3I shows an illustrative example of a cumulative SBP exposure trajectory representing cumulative exposure to SBP as a function of age, in accordance with some embodiments of the technology described herein.
[0067] FIG. 3J shows an illustrative example of Lp(a) trajectories predicted for men and women in various percentiles of the population, each Lp(a) trajectory indicating Lp(a) levels as a function of age, in accordance with some embodiments of the technology described herein.
[0068] FIG. 3K shows illustrative examples of HbA1c, weight, and waist circumference trajectories predicted for a subject, in accordance with some embodiments of the technology described herein.
[0069] FIG. 4A shows cumulative lifetime risk of major adverse cardiovascular events (MACE) at each age among persons randomized by nature to higher, average, or lower LDL levels.
[0070] FIG. 4B shows cumulative lifetime risk of MACE at each level of cumulative exposure to LDL among persons randomized by nature to higher, average, or lower LDL levels.
[0071] FIG. 4C-1 shows cumulative lifetime risk of MACE at each age among persons randomized by nature to higher, average, or lower LDL levels by inherited risk (family history of cardiovascular disease).
[0072] FIG. 4C-2 shows cumulative lifetime risk of MACE at each level of cumulative exposure to LDL among persons naturally randomized to higher, average, or lower LDL levels by inherited risk (family history of cardiovascular disease).
[0073] FIG. 4D-1 shows cumulative lifetime risk of MACE at each age among persons randomized by nature to higher, average, or lower LDL levels by exposure to endogenous arterial injury (Type 2 Diabetes).
[0074] FIG. 4D-2 shows cumulative lifetime risk of MACE at each level of cumulative exposure to LDL among persons randomized by nature to higher, average, or lower LDL levels by exposure to endogenous arterial injury (Type 2 Diabetes).
[0075] FIG. 4E-1 shows cumulative lifetime risk of MACE at each age among persons randomized by nature to higher, average, or lower LDL by exposure to exogenous arterial injury (tobacco smoking).
[0076] FIG. 4E-2 shows cumulative lifetime risk of MACE at each level of cumulative exposure to LDL among persons randomized by nature to higher, average, or lower LDL levels by exposure to exogenous arterial injury (tobacco smoking).
[0077] FIG. 4F illustrates a survival DNN trained at each level of cumulative exposure to LDL and combined piecemeal to train AI to learn the biology of how atherosclerosis develops and the risk of acute atherosclerotic cardiovascular events at each level of cumulative LDL exposure, conditional on exposure to other causes of arterial wall injury that reduce the capacity of the artery to tolerate accumulated plaque burden, in accordance with some embodiments of the technology described herein.
[0078] FIG. 4G shows remaining cumulative lifetime risk of MACE by cumulative LDL exposure among persons with and without hypertension, in accordance with some embodiments of the technology described herein.
[0079] FIG. 4H illustrates calculation of personal plaque thresholds from LDL cumulative exposure thresholds at which MACE occur among persons with and without hypertension, in accordance with some embodiments of the technology described herein.
[0080] FIG. 4I-1 shows an example of cumulative lifetime risk of MACE and corresponding LDL cumulative exposure thresholds for men in a reference population, in accordance with some embodiments of the technology described herein.
[0081] FIG. 4I-2 shows an example of cumulative lifetime risk of MACE and corresponding LDL cumulative exposure thresholds for women in a reference population, in accordance with some embodiments of the technology described herein.
[0082] FIG. 4J is another diagram illustrating cumulative exposure to LDL as an estimate of accumulated plaque burden and corresponding lifetime risk of MACE, in accordance with some embodiments of the technology described herein.
[0083] FIG. 5A illustrates randomized training data for training at least one second ML model - a causal DNN for ODE model in this example - used for estimating benefits of interventions, in accordance with some embodiments of the technology described herein.
[0084] FIG. 5B illustrates estimates of the benefit of lowering LDL or SBP by magnitude and duration generated using a causal deep neural network for ordinary differential equations, in accordance with some embodiments of the technology described herein.
[0085] FIG. 5C illustrates that a causal DNN for ODE model accurately predicts benefit of lower LDL during every year of life in time-to-event Mendelian randomization (MR) studies, in accordance with some embodiments of the technology described herein.
[0086] FIG. 5D illustrates that a causal DNN for ODE model accurately predicts benefit of lower SBP during every year of life in time-to-event MR studies, in accordance with some embodiments of the technology described herein.
[0087] FIG. 5E illustrates that a causal DNN for ODE model accurately predicts benefit of lower LDL and SBP during every year of life in time-to-event MR studies, in accordance with some embodiments of the technology described herein.
[0088] FIG. 5F illustrates that a causal DNN for ODE model accurately predicts benefit of lower LDL during every month of follow-up in randomized trials of LDL lowering therapies, in accordance with some embodiments of the technology described herein.
[0089] FIG. 5G illustrates that a causal DNN for ODE model accurately predicts benefit of lower SBP during every month of follow-up in randomized trials of SBP lowering therapies, in accordance with some embodiments of the technology described herein.
[0090] FIG. 5H illustrates that a causal DNN for ODE model accurately predicts benefit of lower LDL and SBP during every month of follow-up in randomized trials of combination LDL and SBP lowering therapies, in accordance with some embodiments of the technology described herein.
[0091] FIG. 6A illustrates an example benefit of lowering LDL by 35% beginning at the age of 40, in accordance with some embodiments of the technology described herein.
[0092] FIG. 6B illustrates an example benefit of lowering LDL by 35% beginning at the age of 40 as compared to lowering LDL by 35% beginning at the age of 65, in accordance with some embodiments of the technology described herein.
[0093] FIG. 6C illustrates example benefits of lowering LDL by 35% beginning at the ages of 40, 50, or 60 years, in accordance with some embodiments of the technology described herein.
[0094] FIG. 6D illustrates an example benefit of lowering LDL by 35% beginning at the age of 40 as compared to lowering LDL by 50% beginning at the age of 55, in accordance with some embodiments of the technology described herein.
[0095] FIGs. 7A-7C show screenshots of illustrative graphical user interfaces providing users with information about risk and / or recommended actions in furtherance of preventing development of cardiovascular disease, in accordance with some embodiments of the technology described herein.
[0096] FIG. 8 illustrates impact of non-compliance on cumulative lifetime risk of experiencing an ASCVD event, in accordance with some embodiments of the technology described herein.
[0097] FIG. 9A shows an illustrative example of updated subject characteristic and / or measurement data, in accordance with some embodiments of the technology described herein.
[0098] FIG. 9B illustrates effect on remaining lifetime risk of ASCVD events from age 55 to age 80 from lowering LDL by 35% beginning at the age of 40, both with and without quantifying the benefit of legacy effect, in accordance with some embodiments of the technology described herein.
[0099] FIG. 10A shows an illustrative example of a deep neural network.
[0100] FIG. 10B shows an illustrative example of a deep neural network with ordinary differential equations (ODE).
[0101] FIG. 11 is a block diagram of an illustrative computing system that may be used in implementing some embodiments of the technology described herein.
[0102] DETAILED DESCRIPTION
[0103] Atherosclerotic cardiovascular disease and hypertension are by far the leading causes of morbidity, mortality, and healthcare costs around the world. Atherosclerosis develops over several decades before the accumulated plaque burden becomes large enough to increase the risk of having a heart attack or stroke; and systolic blood pressure (SBP) begins to rise linearly with age several decades before the development of hypertension. Thus, it is possible to predict who is developing these diseases and then intervene to reduce exposure to modifiable causes of disease, such as low density lipoprotein (LDL), lipoprotein(a) (Lp(a)), and SBP early in the disease process to slow the trajectory of atherosclerosis and rising SBP enough to largely reduce or even eliminate the lifetime risk of heart attack, stroke, and hypertension, for example. This should extend the average healthy lifespan, for example, by 25 years or more.
[0104] To achieve this goal, most people will likely require modest sustained reductions in LDL and SBP over several years, even decades, to slow the trajectory of atherosclerosis and rising SBP enough to largely reduce or eliminate their lifetime risk of developing heart attack, stroke, and hypertension.
[0105] However, long-term compliance with the therapies needed to produce large enough reductions in LDL and SBP to accomplish this goal is likely to be very poor, thus substantially undermining the potential clinical and economic benefits that can be achieved through early intervention to prevent cardiometabolic disease.
[0106] This problem can be solved, in some instances, by developing therapeutic interventions that can be administered yearly, for example, to ensure that long-term sustained reductions in LDL, Lp(a), and SBP are being achieved. Aspects of the technology described herein, including those referred to herein as Deep Causal AI (see, e.g., Figure 1B), can be used to guide the use of these therapeutic interventions with quantitative precision, for example, by predicting each person’s remaining lifetime risk of developing heart attack, stroke and hypertension; predicting the benefit that each person will receive from lowering LDL, Lp(a), and / or SBP beginning at any age and extending for any duration; and then using this information to prescribe and personalize therapeutic interventions and regimens to slow the trajectory of atherosclerosis and rising SBP enough to prevent heart attacks, stroke, and hypertension. Thus, aspects of the technology described herein provide a practical way to personalize the prevention of cardiometabolic (e.g., cardiovascular) disease.
[0107] Deep Causal AI, for example, combines AI, data analytics, and the current understanding of the biology of human diseases, with deep clinical insight to embed randomized causal evidence into an ensemble of deep and machine learning algorithms that can predict risk and benefit. This advancement enables Deep Causal AI to both predict outcomes and prescribe actions to change those outcomes. As a result, Deep Causal AI, as provided herein, can be used, for example, to enhance drug discovery and development, reimagine trial design and evidence generation, transform the delivery of therapeutic interventions, and / or define the future of Precision Cardiometabolic Health. Any one or more of the foregoing goals can be achieved using Deep Causal AI or other aspects of the technology described herein by: unlocking the unique value of therapies to extend the healthy lifespan by preventing common human diseases; generating evidence to create new markets to predict and prevent rather than diagnose and treat disease; developing analytical tools and digital infrastructure for precision population health to deliver therapies to the right person, at the right time, and the right dose to prevent common human diseases; generating evidence to build an investment case and design innovative insurance and financial instruments to fund prevention and precision health at scale; predicting the trajectory of common human diseases over time by encoding biological cause and effect; quantifying the clinical benefit of reducing exposure to the modifiable causes of disease beginning at any age and extending for any duration; and / or using this information to prescribe specific actions that a person can take to effectively personalize the prevention of common diseases to extend a healthy lifespan.
[0108] The present technology, in some aspects, addresses the inability of existing computational systems to accurately predict personalized cardiovascular risk and identify optimal therapeutic interventions based on an individual’s unique biological trajectory over time. Conventional risk assessment tools rely on static, population-based models that fail to account for the dynamic, cumulative nature of cardiovascular disease development, leading to suboptimal treatment decisions and poor patient outcomes. Moreover, conventional methods do not model the underlying biology of cardiovascular disease development.
[0109] The technology described herein, by contrast, provides a significant advancement in the state of the art of predicting cardiovascular risk of individual patients and identification of optimal therapeutic interventions for those patients. The technology described herein provides this advancement through a combination of novel components, and each such component is an advancement in its own right both in terms of the function that each such component performs (because such functions were previously not possible) and the manner in which it performs it (because the technology enabling the function is also new). The present disclosure describes these individual components and how they operate together to predict personalized cardiovascular risk and identify personalized therapeutic interventions.
[0111] One component of the technology described herein includes technology for the determination of novel biomarkers for a subject, which novel biomarkers are subsequently used in quantifying the subject’s risk of experiencing major adverse cardiovascular events. The novel biomarkers include LDL, SBP, and / or Lp(a) trajectories for the subject that indicate estimated LDL, SBP, and / or Lp(a) levels for the subject at multiple ages (e.g., multiple prior and future ages, for example, from birth until 80 years old) and cumulative LDL, SBP, and / or Lp(a) exposure trajectories for the subject that indicate estimated cumulative LDL, SBP, and / or Lp(a) exposure levels for the subject at each of the multiple ages. These novel biomarkers are computationally derived using novel machine learning models and architectures developed by training on longitudinal data spanning years or decades. These biomarkers, together with conventional laboratory measurements and clinical data, create a comprehensive and precise cardiometabolic profile of an individual patient. The cardiometabolic profile of the patient may in turn be used to accurately predict the patient’s cardiovascular risk and identify therapeutic interventions personalized to the patient’s cardiovascular risk profile.
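As a minimal illustrative sketch (not the trained models of this disclosure), the relationship between a biomarker trajectory and the corresponding cumulative exposure trajectory can be pictured as a running integral of the level over age. The function name, the trapezoidal rule, the age grid, and the LDL values below are all assumptions made for illustration only; the disclosure derives the trajectories themselves with trained machine learning models.

```python
from typing import List, Sequence


def cumulative_exposure(ages: Sequence[float], levels: Sequence[float]) -> List[float]:
    """Running trapezoidal integral of a biomarker level over age.

    ages   -- strictly increasing ages in years (hypothetical grid)
    levels -- estimated biomarker level at each age (e.g., LDL in mg/dL)
    Returns the cumulative exposure at each age, in level-years.
    """
    if len(ages) != len(levels):
        raise ValueError("ages and levels must have the same length")
    if not ages:
        return []
    totals = [0.0]  # no exposure has accumulated at the first age
    for i in range(1, len(ages)):
        dt = ages[i] - ages[i - 1]
        if dt <= 0:
            raise ValueError("ages must be strictly increasing")
        # Area of the trapezoid between two consecutive ages.
        totals.append(totals[-1] + 0.5 * (levels[i] + levels[i - 1]) * dt)
    return totals


# Made-up trajectory, for illustration only (not patient data).
ages = [0, 20, 40, 60, 80]
ldl = [60, 100, 120, 130, 130]
exposure = cumulative_exposure(ages, ldl)
```

Under these assumed inputs, each entry of `exposure` is the estimated cumulative LDL exposure (in mg/dL-years) at the matching entry of `ages`.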
[0112] Another component of the technology described herein includes technology for using a patient’s cardiometabolic profile to estimate the patient’s cardiovascular risk. To this end, the disclosure describes a novel survival analysis enabled by new machine learning architectures. The novel survival analysis represents a fundamental departure from conventional approaches. Unlike conventional survival machine learning models, which use time (e.g., age in years) as intervals of follow-up, this disclosure introduces a family of new survival machine learning models that use cumulative exposure to LDL as intervals of follow-up. The family of new survival models includes various types of models, all using cumulative LDL exposure as interval of follow-up, including survival models with piecewise exponential modeling, deep neural networks, deep neural networks with piecewise exponential modeling, as well as other models, examples of which are provided herein. All these models are new and constitute improvements to conventional machine learning technology for survival analysis; these models did not exist prior to this disclosure and their use is not merely an application of existing models to new tasks - instead, the models described here constitute new machine learning architectures advancing the state of machine learning technology as part of survival analysis. As a result of the development of this new class of ML models, the technology described herein provides accurate estimates of an individual patient’s time-varying cardiovascular risk, while accounting for how it changes over time based on the time-varying biomarker trajectories described above.
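The idea of using cumulative LDL exposure, rather than time, as the interval of follow-up can be sketched with a plain piecewise exponential model. This is a simplified stand-in and not the architecture of the disclosure: the interval edges, the per-interval hazards, and the function name are hypothetical, and a trained model would supply hazards that depend on the patient's covariates.

```python
import math
from typing import Sequence


def survival_over_exposure(
    edges: Sequence[float],
    hazards: Sequence[float],
    start: float,
    stop: float,
) -> float:
    """Probability of remaining event-free while cumulative exposure
    moves from `start` to `stop`, under a piecewise-constant hazard.

    edges   -- increasing interval boundaries on the exposure axis
    hazards -- hazard per unit of exposure inside each interval
    """
    if len(edges) != len(hazards) + 1:
        raise ValueError("need exactly one hazard per interval")
    if not (edges[0] <= start <= stop <= edges[-1]):
        raise ValueError("start/stop must be ordered and inside the edges")
    cumulative_hazard = 0.0
    for lo, hi, hazard in zip(edges[:-1], edges[1:], hazards):
        # Amount of exposure accrued inside this interval.
        overlap = min(hi, stop) - max(lo, start)
        if overlap > 0:
            cumulative_hazard += hazard * overlap
    return math.exp(-cumulative_hazard)


# Hypothetical intervals (mg/dL-years) and hazards, for illustration only.
edges = [0.0, 2000.0, 5000.0, 10000.0]
hazards = [0.0, 1e-5, 4e-5]
p_event_free = survival_over_exposure(edges, hazards, start=3800.0, stop=6300.0)
```

Here the follow-up "clock" advances only as exposure accumulates, so two patients of the same age but different LDL histories occupy different positions on the follow-up axis.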
[0113] Another component of the technology described herein includes technology for predicting the expected benefit that a patient may receive in response to therapeutic interventions designed to reduce exposure to modifiable causes of cardiovascular disease (e.g., therapeutic interventions designed to lower LDL, SBP, or both LDL and SBP). One enabling part of this technology is a novel machine learning model architecture that provides, for the first time, the ability to quantify and estimate the benefit of lowering LDL and / or SBP for a particular patient (based on the patient’s cardiovascular risk profile determined as above). This architecture involves a novel family of ML models, each of which estimates the benefit of administering a particular type of intervention to the patient. The novel family of ML models includes a so-called “causal” deep neural network (DNN) for ordinary differential equations (c-DNN-ODE) that has a unique architecture and is trained using a dataset comprising data from a combination of (1) randomized trials of LDL and SBP lowering therapies, respectively; and (2) Mendelian randomization studies with individual participant follow-up data evaluating genetic variants associated with lower LDL (apoB) or SBP, respectively. The disclosure sets forth how such models can be used to determine the expected proportional and / or absolute reduction in risk of cardiovascular events for the subject if the LDL and / or SBP were lowered (e.g., responsive to a particular therapeutic intervention sequence of one or more therapeutic interventions designed to target LDL and / or SBP levels). These benefits may be estimated for various types of interventions or sequences of interventions, for varying magnitude, duration, and timing of such interventions. These models are also new and provide important advantages relative to other, conventional types of ML models, as described herein. Once again, these “causal” models constitute new machine learning architectures and are not merely an application of existing models to new tasks.
[0114] Yet another component of the technology described herein includes technology for discovering the optimal timing, type, intensity, combination, and / or sequence of interventions useful for personalizing the prevention of cardiometabolic disease to a patient. This technology involves various innovations including novel machine learning (e.g., reinforcement learning (RL)) methods for efficiently exploring the solution space of intervention sequences and scoring each intervention sequence based in part on the degree to which the intervention sequence lowers a patient’s risk of experiencing a major adverse cardiovascular event.
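To make the notion of scoring candidate intervention sequences concrete, the following sketch enumerates every sequence over a short horizon and keeps the highest-scoring one. It deliberately substitutes exhaustive search for the reinforcement learning methods referenced above, and the action names, baseline risk, effect sizes, and burden penalty are made-up placeholders rather than values from the disclosure.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Plan = Tuple[str, ...]


def best_sequence(
    options: Sequence[str],
    n_steps: int,
    score_fn: Callable[[Plan], float],
) -> Plan:
    """Return the highest-scoring sequence by brute-force enumeration.

    A stand-in for RL-based exploration; only feasible for tiny spaces.
    """
    return max(product(options, repeat=n_steps), key=score_fn)


BASELINE_RISK = 0.30           # hypothetical risk with no intervention
BURDEN_PER_ACTIVE_STEP = 0.04  # hypothetical penalty per active step


def toy_score(plan: Plan) -> float:
    """Risk reduction minus treatment burden (illustrative only)."""
    risk = BASELINE_RISK
    for step, action in enumerate(plan):
        if action == "lower_ldl":
            # Assumed: earlier lowering has more remaining steps to act over.
            risk *= 1.0 - 0.10 * (len(plan) - step)
    burden = BURDEN_PER_ACTIVE_STEP * sum(a != "observe" for a in plan)
    return (BASELINE_RISK - risk) - burden


best = best_sequence(("observe", "lower_ldl"), 3, toy_score)
```

With these assumed numbers the search prefers early intervention and then observation, which illustrates how timing can matter to the score; a real system would replace `toy_score` with model-based risk and benefit estimates.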
[0115] All these components, and others, are described herein and may be used individually or in any suitable combination. The component(s) may be used once or repeatedly, in the context of longitudinal patient monitoring and risk assessment. As the disclosure makes clear, the technology provided herein enables measurable improvements in medical practice, providing personalized treatment selection, objective insurance risk assessment, and clinical decision support with quantified benefits.
[0116] “Cardiometabolic disease” includes any metabolic, endocrine, or inflammatory disorder (herein used interchangeably with the term “disease”) or combination thereof, that increases the likelihood of adverse cardiovascular outcomes. Non-limiting examples of cardiometabolic diseases include cardiovascular diseases (e.g., atherosclerotic cardiovascular disease, stroke, arrhythmia, and heart failure), as well as hypertension, dyslipidemia, diabetes mellitus, metabolic syndrome, fatty liver disease, and obesity. Thus, cardiovascular disease represents a subset of cardiometabolic disease.
[0117] “Cardiovascular disease” includes disorders of the heart, blood vessels, or both, such as diseases of the coronary, peripheral, or cerebral vasculature, as well as structural, contractile, or electrical dysfunction of the myocardium. “Atherosclerotic cardiovascular disease” or “ASCVD” refers to cardiovascular conditions caused by the buildup of plaque (atherosclerosis) in the arterial walls. Non-limiting examples of ASCVD include coronary heart disease (also known as coronary artery disease, e.g., heart attack / myocardial infarction and angina), cerebrovascular disease (e.g., ischemic stroke and transient ischemic attack), peripheral artery disease, and aortic atherosclerotic disease (see, e.g., American Heart Association / heart.org).
[0118] “Cardiometabolic health” refers to the functional status of metabolic, endocrine, inflammatory, and / or cardiovascular systems (or any combination thereof), and exists on a continuous spectrum encompassing various states of health (e.g., optimal, intermediate, and / or poor health), as assessed by one or more cardiometabolic health data points.
[0119] “Cardiometabolic health data” refers to any qualitative or quantitative measurement indicative of cardiometabolic health, including, without limitation, one or more clinical characteristics of the subject (e.g., age, biological sex, family history of coronary heart disease (CHD), family history of hypertension (HTN), family history of type 2 diabetes (T2D), polygenic scores, inherited predisposition or predispositions, history of tobacco use, etc.), one or more values for one or more physical measurements of the subject (e.g., systolic blood pressure (SBP), diastolic blood pressure (DBP), weight, waist circumference, height, body mass index (BMI), waist-to-height ratio, etc.), and / or one or more values for one or more biochemical measurements of the subject (e.g., lipid profile, low-density lipoprotein (LDL) level, high-density lipoprotein (HDL) level, total cholesterol level, triglyceride (TG) level, non-HDL cholesterol level, apolipoprotein B (apoB) level, lipoprotein(a) (Lp(a)) level, glucose level, insulin level, glycated hemoglobin (HbA1c) level, liver function markers, and inflammatory markers such as C-reactive protein (CRP) level).
[0120] “Major adverse cardiac event” or “MACE” refers to a composite clinical endpoint commonly used in the evaluation of cardiovascular risk, disease progression, and / or therapeutic efficacy. MACE generally encompasses serious cardiovascular outcomes that reflect clinically meaningful morbidity and mortality. As used herein, MACE refers to one of the following events: (i) non-fatal myocardial infarction (MI); (ii) fatal MI; (iii) non-fatal ischemic stroke; (iv) fatal ischemic stroke; and (v) coronary revascularization (percutaneous coronary revascularization with or without a stent, or coronary artery bypass grafting (CABG)). The fatal MACE events (i.e., fatal MI and fatal ischemic stroke) can be referred to as cardiovascular death events.
[0121] Events constituting MACE can be adjudicated by predefined clinical criteria, optionally confirmed by electrocardiographic, biomarker, imaging, or procedural data. For example, myocardial infarction can be assessed by elevations in cardiac biomarkers (e.g., troponin), in some instances in combination with supporting clinical findings such as chest pain, electrocardiographic changes, or imaging evidence of new myocardial injury. As another example, stroke can include ischemic or hemorrhagic cerebrovascular events resulting in acute neurological deficit, optionally confirmed by imaging.
[0122] FIG. 1A and FIG. 1B are illustrative examples of a Deep Causal AI Model-Based Reinforcement Learning System, framing the longitudinal monitoring of cardiometabolic health and the provision of updated guidance on how to prevent cardiometabolic disease as a model-based reinforcement learning problem. This system learns, monitors, and optimizes an individual’s health trajectory by slowing the biological progression of disease to personalize the prevention of cardiometabolic diseases. This Deep Causal AI Agent includes a series of deep and machine learning algorithms that encode biological cause and effect and have been trained to learn the biology of how common cardiometabolic diseases develop over time. Aspects of this technology use a State-Analysis-Action-Reward design for reinforcement learning problems to guide precision health. As summarized in FIG. 1B, this predictive and prescriptive analysis quantitatively assesses the current state of health, predicts remaining lifetime risk of cardiometabolic disease over all time horizons, predicts the estimated benefit of specific interventions to reduce risk by magnitude, duration, and timing of intervention, and selects optimal sequences and timing of interventions needed to personalize prevention of cardiometabolic disease. This can be achieved as follows:
[0123] Current State
[0124] Initially, the current state of a person’s cardiometabolic health is characterized. This state can be characterized by measuring the following biomarkers, for example: (1) clinical characteristics, including age, sex, family history, inherited predisposition (which can include polygenic risk), and tobacco history; (2) physical measurements, including systolic blood pressure (SBP), diastolic blood pressure (DBP), weight, waist circumference, height, and the derived measurements of body mass index (BMI) and waist-to-height ratio; and (3) biochemical measurements, including plasma low density lipoprotein (LDL) (apoB), lipoprotein(a) (Lp(a)), high density lipoprotein (HDL), triglycerides (TG), HbA1c, and others. The system is designed to be flexible and can include any bespoke set of features depending on local context and objectives.
[0125] A person’s state of cardiometabolic health, however, is a dynamic process that evolves over time. FIGs. 1C-1, 1C-2, and 1C-3 schematize the evolution of cardiometabolic health over time. Additional LDL and other atherogenic apoB-containing lipoproteins become trapped within the arterial walls, leading to a progressively enlarging atherosclerotic plaque burden and a correspondingly increasing risk of having a clinical atherosclerotic cardiovascular event, including myocardial infarction (MI) and ischemic stroke. The SBP level rises over time, progressively injuring the arterial wall, thus reducing the capacity of the artery to tolerate the accumulated plaque burden and increasing the risk of having an atherosclerotic cardiovascular event, while also increasing the risk of developing hypertension and other pressure-related comorbidities. Weight or waist circumference may progressively increase over time due to chronic excess energy balance, leading to a variable rate of rise in HbA1c, which progressively injures the arterial wall, reducing its capacity to tolerate the accumulated plaque burden and thus increasing the risk of having an atherosclerotic cardiovascular event, while also increasing the risk of developing type 2 diabetes (T2D) and its related comorbidities. This understanding of the biology of how cardiometabolic diseases evolve over time, as provided herein, suggests a strategy to personalize prevention by slowing the progression of cardiometabolic disease.
[0126] First, LDL and other apoB-containing lipoproteins - including Lp(a) - are lowered to slow the progression of atherosclerosis enough to keep the size of the accumulated plaque burden below the threshold at which atherosclerotic cardiovascular events begin to occur. Next, an intervention to lower SBP is added when useful for preventing the accumulation of structural injury to the artery wall to maximize the capacity of the artery wall to tolerate the accumulated plaque burden and keep the risk of having an atherosclerotic cardiovascular event below the desired threshold; or when SBP exceeds 130 mmHg, for example, to prevent further rises in SBP and thus prevent the development of hypertension and its pressure-related comorbidities. Similarly, a low-dose nutrient stimulated hormone (NuSH) or other intervention is added to prevent further weight gain caused by excess energy balance and thus prevent further rises in HbA1c, when useful for preventing the accumulation of structural injury to the artery wall caused by elevated glucose levels and thus maximizing the capacity of the artery wall to tolerate the accumulated plaque burden to keep the risk of having an atherosclerotic cardiovascular event below the desired threshold; or when HbA1c exceeds 5.7-6.0% (depending on age), for example, to prevent further rises in HbA1c and thus prevent the development of T2D and its related comorbidities. This integrated strategy is designed to personalize the prevention of myocardial infarction (MI), stroke, hypertension, and type 2 diabetes (T2D) by slowing the progression of how common cardiometabolic diseases develop. Provided herein is a system that learns biology to guide precision health so that, in some aspects, the reasoning and biological rationale for all outputs can be explained and objectively tested.
[0127] Analysis by Deep Causal AI Agent
[0128] These measurements are passed into a Deep Causal AI Agent to perform a series of predictive and prescriptive analyses. The Deep Causal AI Agent includes a sequential stack of deep and machine learning algorithms designed to perform the following functions: (a) assess the current state of cardiometabolic health, including computing: (i) an estimate of the size of the atherosclerotic plaque burden that has accumulated, and the rate at which the plaque burden is progressing, (ii) the current SBP, with estimates of the rate at which SBP is rising over time, and how much structural injury caused by elevated SBP has accumulated, and (iii) the current weight, waist circumference, and HbA1c level, with estimates of trends for average changes in weight, waist circumference, and HbA1c over time; (b) predict the risk of developing cardiometabolic disease, including (i) the risk of developing an atherosclerotic cardiovascular event - including MI and stroke - based on the size of the accumulated plaque burden, and the combination of other exposures that impact the capacity of the artery to tolerate the accumulated plaque burden at any point in time (including, for example, the predicted 1-year, 2-year, 5-year, 10-year, 20-year, or remaining lifetime cumulative risk of developing an atherosclerotic cardiovascular event), and (ii) the risk of developing hypertension based on the current level of SBP and the predicted rate of rise in SBP over time (including, for example, the predicted age at which hypertension is likely to occur); (c) predict the expected benefit over any time period in response to actions (interventions) designed to reduce exposure to the modifiable causes of cardiometabolic disease, depending on the magnitude, duration, and timing of those interventions for all possible types, intensities, combinations, and sequences of possible interventions; and (d) discover the optimal timing, type, intensity, combination, and / or sequence of interventions useful for personalizing the prevention of cardiometabolic disease by, for example, lowering LDL and other apoB-containing lipoproteins including Lp(a), SBP, weight (excess energy balance), and HbA1c by the amount needed by each person - when they need it - to prevent MI, stroke, and hypertension.
[0129] Recommended Action
[0130] The output of the Deep Causal AI Agent is twofold: (1) a narrative explaining the reasoning and biological rationale for why a person is at risk of developing cardiometabolic disease, how they can minimize their risk, how much they will benefit from specific actions to minimize risk, and the recommended optimal sequence of actions useful for preventing cardiometabolic disease; and (2) a recommended immediate action (which can include observation) to personalize the prevention of cardiometabolic disease, with a description of the expected subsequent actions and when they are likely to be used based on the person’s predicted current cardiometabolic health trajectory.
[0131] Reward
[0132] The realized reward of the recommended action is determined by the actual achieved absolute reduction in exposure to the modifiable cause of disease targeted by the recommended intervention, which determines the corresponding expected reduction in the risk of cardiometabolic disease over each subsequent time interval, including the expected reduction in the risk of cardiometabolic disease over the next 1 year, 2 years, 5 years, 10 years, 20 years, remaining lifetime, or any other interval of interest.
[0133] Because the magnitude of the reward, or expected clinical benefit, is quantified by the absolute reduction in the modifiable cause of disease achieved in response to the recommended intervention(s) or sequence of interventions, the realized reward or clinical benefit depends on compliance with the recommended intervention(s) and how well the intervention(s) are implemented. This objective quantification of the reward function as the achieved absolute reduction in the targeted modifiable cause of disease, rather than the expected absolute reduction in the modifiable cause of disease and corresponding expected proportional reduction in the risk of cardiometabolic disease that was used to inform selection of the recommended action: (a) motivates the need to develop therapies that ensure compliance with the recommended action to maximize the realized reward or clinical benefit; (b) provides an objective metric to evaluate the success of an intervention, and provides a metric to evaluate iterative attempts to improve implementation strategies; and (c) provides an objective metric to establish longitudinal reimbursement schedules for interventions designed to prevent disease based on the achieved absolute reductions in exposure to the modifiable causes of disease and the corresponding expected clinical benefit over any time horizon.
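The distinction between the expected and the achieved absolute reduction can be illustrated with a small bookkeeping sketch. The class name, field names, and LDL values are hypothetical and chosen only to show how the realized reward might be compared with the reduction that informed the recommendation; the disclosure does not prescribe this particular representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardSummary:
    expected_reduction: float  # reduction assumed when the action was selected
    achieved_reduction: float  # reduction actually measured at follow-up
    fraction_realized: float   # achieved / expected


def realized_reward(
    baseline_level: float,
    expected_level: float,
    achieved_level: float,
) -> RewardSummary:
    """Compare achieved and expected absolute reductions in a modifiable cause."""
    expected = baseline_level - expected_level
    if expected <= 0:
        raise ValueError("expected level must be below the baseline level")
    achieved = baseline_level - achieved_level
    return RewardSummary(expected, achieved, achieved / expected)


# Made-up LDL values in mg/dL, for illustration only.
summary = realized_reward(baseline_level=130.0, expected_level=90.0, achieved_level=100.0)
```

In this made-up case three quarters of the expected LDL reduction was realized, the kind of shortfall that compliance or implementation issues could produce.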
[0135] Updated State
[0136] The updated state of cardiometabolic health can be determined by two dynamic processes: the achieved absolute reduction in the modifiable cause of disease targeted by the recommended intervention; and the absolute changes in other exposures not targeted by the recommended intervention(s) due to aging and the evolving biology of how common diseases develop during the same interval of longitudinal follow-up.
[0137] Updated Analysis
[0138] The updated values of a person’s clinical characteristics, physical biometrics, and biochemical measurements can then be passed back into the Deep Causal AI Agent to provide: an updated assessment of their current state of cardiometabolic health; an updated prediction of the risk of cardiometabolic disease over all subsequent time intervals based on the updated state of cardiometabolic health and the reductions in the modifiable causes of disease achieved in response to prior interventions; an updated prediction of the expected benefit of interventions designed to reduce exposure to the modifiable causes of disease based on the updated state of cardiometabolic health and the reductions in the modifiable causes of disease achieved in response to prior interventions (including quantification of the legacy benefit from the achieved absolute reductions in the modifiable causes of disease in response to earlier interventions); and updated identification of the optimal timing, type, intensity, combination, and sequence of actions to prevent MI, stroke, and hypertension, for example.
[0139] Recommended Next Action
[0140] The next output of the Deep Causal AI Agent includes an updated narrative explaining the reasoning and biological rationale for why a person is at risk, how their cardiometabolic health is evolving, how they can reduce their risk of cardiometabolic disease, how much they have benefited from previous actions, how much they would benefit from subsequent actions to prevent disease, and updated recommendations for the optimal sequence of actions useful for preventing cardiometabolic disease; and an updated recommendation for the optimal next immediate action, which could include continuing the current intervention, intensifying the current intervention, and / or adding another intervention. This process can repeat iteratively at regular follow-up intervals to monitor cardiometabolic health and adjust guidance about the optimal recommended actions and sequence of actions useful for preventing cardiometabolic disease (e.g., MI, stroke, hypertension, and / or T2D) based on the person’s evolving cardiometabolic health and achieved reductions in the modifiable causes of disease in response to the recommended actions.
[0141] FIG. 1D is an illustration of an example embodiment of a “Deep Causal AI Agent” composed of a stack of deep and machine learning algorithms, which are described herein including with reference to FIGs. 2A-2E.
[0142] FIG. 1E is a block diagram illustrating an example system 150 configured for preventing development of disease, in accordance with some embodiments of the technology described herein. The system may include one or more processors and at least one non-transitory computer-readable storage medium storing software comprising software modules. Each of the software modules may include processor-executable instructions that, when executed by the processor(s) of the system, cause the processor(s) of the system to perform the function or functions of the software module.
[0143] For example, in the illustrative embodiment of FIG. 1E, system 150 includes interface module 151, cardiometabolic health data module 152, risk assessment module 153, therapeutic intervention benefit assessment module 154, and therapeutic intervention selection module 155. The system 150 may be configured to perform one or more (e.g., all) acts of process 200 described with reference to FIG. 2A.
[0144] In some embodiments, the interface module 151 comprises processor-executable instructions that, when executed, provide functionality enabling interaction with the system 150. For example, interface module 151 may be configured to provide one or more user interfaces (e.g., graphical user interfaces) that allow input of information about a subject (e.g., clinical characteristics, physical measurements, biochemical measurements) and their treatment goals (e.g., a specified target risk level in connection with cardiovascular disease, weight, etc.). The user interface(s) may additionally provide output to user(s) of the system including determined risks, benefits of treatment, recommended actions, and the like. Examples of such user interfaces are described herein including with reference to FIGs. 7A-7C.
[0145] In some embodiments, the cardiometabolic health data module 152 comprises processor-executable instructions that, when executed, cause the system 150 to perform obtaining cardiometabolic health data for the subject (e.g., as described with reference to act 210 of process 200).
[0146] In some embodiments, the risk assessment module 153 comprises processor-executable instructions that, when executed, cause the system 150 to perform determining, using at least some of the cardiometabolic health data and for each of multiple time intervals (e.g., each time interval of follow-up), one or more measures of risk that the subject develops disease to obtain multiple measures of risk corresponding to the multiple time intervals (e.g., as described with reference to act 220 of process 200).
[0147] In some embodiments, the therapeutic intervention benefit assessment module 154 comprises processor-executable instructions that, when executed, cause the system 150 to perform determining, using the multiple measures of risk that the subject develops the disease, benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of the disease by targeting one or more modifiable causes of the disease (e.g., as described with reference to act 230 of process 200).
[0148] In some embodiments, the therapeutic intervention selection module 155 comprises processor-executable instructions that, when executed, cause the system 150 to perform identifying, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention to recommend to be administered to the subject (e.g., as described with reference to act 240 of process 200).
[0149] As shown in FIG. 1E, system 150 also includes machine learning model datastore 160, which is configured to store one or more trained ML models to be used by one or more of the cardiometabolic health data module 152, the risk assessment module 153, the therapeutic intervention benefit assessment module 154, and / or the therapeutic intervention selection module 155. For example, the ML model datastore 160 may store parameters for a first ML model (e.g., a survival DNN-ODE model) to be used by the risk assessment module 153, parameters for at least one second ML model (e.g., one or multiple c-DNN-ODE models) to be used by the therapeutic intervention benefit assessment module 154, and / or parameters for one or more ML models (e.g., one or more bi-directional LSTM models) to be used by the cardiometabolic health data module 152 to estimate one or more trajectories of biomarker levels.
[0150] FIG. 2A is an example of an illustrative process 200 for identifying an intervention for a subject in furtherance of preventing development of cardiovascular disease in the subject, in accordance with some embodiments of the technology described herein.
[0151] As shown in FIG. 2A, process 200 comprises: (1) obtaining, at act 210, cardiometabolic health data for the subject; (2) determining, at act 220, using at least some (e.g., all) of the cardiometabolic health data, a first trained machine learning (ML) model (which may be referred to herein as a trained “cardiovascular risk prediction machine learning model” and may be, for example, a trained survival model), and for each of multiple time intervals (e.g., each time interval of follow-up, for example, in years), one or more measures of risk that the subject develops cardiovascular disease to obtain multiple measures of risk corresponding to the multiple time intervals; (3) determining, at act 230, using the multiple measures of risk that the subject develops cardiovascular disease and at least one second trained ML model (each of which may be referred to herein as a “benefit prediction machine learning model” and may be, for example, a trained deep neural network), benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease (e.g., LDL levels, SBP levels); and (4) identifying, at act 240, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention (e.g., a particular sequence of interventions) to recommend to be administered to the subject. Examples of therapeutic interventions are described herein including in the Section titled “Therapeutic Interventions”. One or more of the identified therapeutic interventions may be recommended for administration to the subject. In some embodiments, the identified therapeutic intervention(s) may be administered to the subject.
[0152] In some embodiments, the subject may be monitored over time (e.g., longitudinally) and the analysis of acts 210-240 may be repeated. To this end, after act 240 is completed, process 200 may proceed to decision block 250, where it is determined whether the analysis of acts 210-240 is to be repeated, for example, after some amount of time passes (e.g., a year, multiple months, a period of time between visits of the subject to their clinician, etc.). When it is determined that the analysis of acts 210-240 is not to be repeated, process 200 ends.
[0153] On the other hand, when it is determined that the analysis of acts 210-240 is to be repeated, process 200 returns to act 210, where updated cardiometabolic health data is obtained for the subject. Next, process 200 involves, at act 220, determining, using at least some of the updated cardiometabolic health data, the first trained machine learning (ML) model and for each of multiple second time intervals, one or more updated measures of risk that the subject develops cardiovascular disease to obtain multiple updated measures of risk corresponding to the multiple second time intervals. Next, process 200 involves, at act 230, determining, using the multiple updated measures of risk that the subject develops cardiovascular disease and the at least one second trained ML model, benefit of administering to the subject one or more second therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease. Then, process 200 involves, at act 240, identifying, using the determined benefit of administering the one or more second therapeutic interventions, at least one second therapeutic intervention to recommend to be administered to the subject. The second therapeutic intervention may involve continuing to administer the initial therapeutic intervention sequence identified the first time through the process 200 (e.g., during the first time a subject’s cardiometabolic data is analyzed) and / or adding a new therapeutic to the initial therapeutic intervention sequence. Other modifications to the initial therapeutic intervention sequence are also envisaged.
[0154] In this way, process 200 may be used to longitudinally monitor the health of the subject and to provide regularly updated guidance about how to personalize, to the subject, the prevention of development of cardiometabolic disease in the subject.
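The control flow of acts 210-240 and decision block 250 can be illustrated with a minimal sketch. It assumes that each act is supplied as a callable; the function name run_process_200, its parameters, and the max_rounds safety limit are hypothetical and are not part of the disclosure. The sketch shows only the repetition logic, not any of the trained models.

```python
from typing import Any, Callable, List


def run_process_200(
    obtain_data: Callable[[], Any],              # act 210 (hypothetical callable)
    predict_risks: Callable[[Any], Any],         # act 220 (first trained ML model)
    predict_benefits: Callable[[Any], Any],      # act 230 (second trained ML model(s))
    select_intervention: Callable[[Any], Any],   # act 240
    should_repeat: Callable[[], bool],           # decision block 250
    max_rounds: int = 10,                        # assumed safety limit, not in the text
) -> List[Any]:
    """Repeat acts 210-240 until decision block 250 says to stop."""
    recommendations = []
    for _ in range(max_rounds):
        data = obtain_data()                 # (updated) cardiometabolic health data
        risks = predict_risks(data)          # measures of risk per time interval
        benefits = predict_benefits(risks)   # benefit of candidate interventions
        recommendations.append(select_intervention(benefits))
        if not should_repeat():
            break
    return recommendations
```

Each pass through the loop corresponds to one analysis of the subject, so the returned list holds one recommendation per monitoring visit.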
[0155] It should be appreciated that the process 200 is illustrative and that there are variations thereof. For example, in some embodiments, the process 200 may be performed for other diseases not just cardiovascular diseases, for example type 2 diabetes. Thus, for example, process 200 may be performed to personalize the prevention of cardiometabolic disease (e.g., cardiovascular disease, diabetes, etc.) in a subject.
[0156] Process 200 may be performed by any suitable computing system or systems, which may be a single computing system and / or multiple computing systems (e.g., cloud computing system, distributed computing system, one or more computing devices such as laptops, desktops, smartphones, etc.), as aspects of the technology described herein are not limited in this respect. For example, process 200 may be performed using the system 150 described with reference to FIG. 1E and / or any computing device(s) described with reference to FIG. 11.
[0157] It should also be appreciated that, in some embodiments, one or more acts of process 200 may be performed by a computing system or systems while one or more other acts of process 200 may be performed, at least in part, manually. For example, in embodiments where obtaining cardiometabolic data (at act 210) involves actually making physical measurements of the subject (e.g., measuring blood pressure, weight, waist circumference, etc.) or biochemical measurements of the subject (e.g., doing a blood test with a lipid panel to determine, for example, the subject’s LDL level), such actions may be performed in part by a clinician and / or laboratory technician. As another example, in embodiments in which process 200 includes further acts such as administering a therapeutic intervention or a sequence of therapeutic interventions, such acts may be performed in part by a clinician (administering a treatment to the subject) and / or the subject (e.g., a patient self-administering the treatment).
[0158] Obtaining Cardiometabolic Health Data for Subject
[0159] As shown in FIG. 2A, process 200 includes obtaining cardiometabolic health data for the subject at act 210. In some embodiments, this may be performed in accordance with the illustrative process shown in FIG. 2B. As shown in FIG. 2B, act 210 may involve: (1) obtaining, at act 211, subject characteristic and / or measurement data; (2) standardizing, at act 212, the received subject characteristic and / or measurement data; and (3) positionally encoding, at act 213, the data standardized at act 212.
[0160] Next, in some embodiments, the standardized data obtained at 212 together with the positionally encoded data obtained at act 213 may be processed, at act 214, to determine the subject’s LDL trajectory. The subject’s LDL trajectory may indicate estimated LDL levels for the subject including an estimated LDL level for each of multiple prior ages of the subject (e.g., to birth) and / or multiple future ages of the subject (e.g., up until some threshold age, for example, 80, 85, or 90). The processing to determine the subject’s LDL trajectory may be performed using a third trained ML model (e.g., a deep neural network such as a recurrent neural network, for example, a unidirectional or a bi-directional long short-term memory (LSTM) neural network, a neural network having a gated recurrent unit (GRU) architecture or a bi-directional GRU architecture).
[0161] Additionally, in some embodiments, the standardized data obtained at act 212 together with the positionally encoded data obtained at act 213 may be processed, at act 215, to determine the subject’s SBP trajectory. The subject’s SBP trajectory may indicate an estimated SBP level for the subject for each of multiple prior ages of the subject (e.g., to age 20) and / or multiple future ages of the subject (e.g., up until some threshold age, for example, 80, 85, or 90). The processing to determine the SBP trajectory may be performed using a fourth trained ML model (e.g., a deep neural network such as a recurrent neural network, for example, a unidirectional or a bi-directional LSTM neural network, a neural network having a gated recurrent unit (GRU) architecture or a bi-directional GRU architecture) different from the third trained ML model.
[0162] Additionally, in some embodiments, the standardized data obtained at act 212 together with the positionally encoded data obtained at act 213 may be processed, at act 216, to determine the subject’s Lp(a) trajectory. The subject’s Lp(a) trajectory may indicate an estimated Lp(a) level for the subject for each of multiple prior ages of the subject (e.g., to birth) and / or multiple future ages of the subject (e.g., up until some threshold age, for example, 80, 85, or 90). The processing to determine the Lp(a) trajectory may be performed using a fifth trained ML model (e.g., a deep neural network such as a recurrent neural network, for example, a unidirectional or a bi-directional LSTM neural network, a neural network having a gated recurrent unit (GRU) architecture or a bi-directional GRU architecture) different from the third and fourth trained ML models.
[0163] In turn, at act 217, the LDL, SBP, and Lp(a) trajectories (determined at acts 214-216) may be used to determine the cumulative LDL exposure trajectory (indicating cumulative exposure to LDL at each of multiple ages (e.g., each age from birth to 80)), the cumulative SBP exposure trajectory (indicating cumulative exposure to SBP at each of multiple ages (e.g., each age from 20 to 80)), and / or the cumulative Lp(a) exposure trajectory (indicating cumulative exposure to Lp(a) at each of multiple ages (e.g., each age from birth to 80)).
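One simple way to turn a biomarker trajectory into a cumulative exposure trajectory is a running sum of the estimated annual levels. The sketch below assumes one estimated level per year of age and expresses exposure in level-years (e.g., mg/dL-years for LDL); the function name cumulative_exposure and the start_age parameter are hypothetical, and the disclosure may compute cumulative exposure differently.

```python
from itertools import accumulate
from typing import Dict


def cumulative_exposure(levels_by_age: Dict[int, float], start_age: int = 0) -> Dict[int, float]:
    """Return {age: sum of annual levels from start_age up to and including age}."""
    # Keep only the ages at or after the starting age (e.g., 0 for LDL, 20 for SBP).
    ages = sorted(age for age in levels_by_age if age >= start_age)
    # Running total of the estimated level at each age.
    totals = accumulate(levels_by_age[age] for age in ages)
    return dict(zip(ages, totals))
```

For example, an LDL trajectory would be accumulated with start_age=0 (from birth), whereas an SBP trajectory would be accumulated with start_age=20, matching the example age ranges given above.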
[0164] Next, at act 218, one or more additional feature trajectories may be determined for the subject. For example, a weight trajectory, a waist-circumference trajectory, and / or an HbA1c trajectory may be determined for the subject (e.g., from the “average” trajectories in the reference population for a person of the same sex and age, as described herein).
[0165] Next, at act 219, the various cardiometabolic data obtained and / or determined at acts 211-218 is used for further processing. For example, the data may be stored, in memory (volatile or non-volatile), using any suitable data structure or data structures, and may be subsequently accessed when performing further processing (e.g., as part of act 220 of process 200). As another example, the data may be passed onto other software modules for processing (e.g., to software modules configured to implement the function of act 220 of process 200, for example, risk assessment module 153).
[0166] Aspects of some of the acts 211-219 are now further described.
[0167] As described herein, act 211 may involve obtaining subject characteristic and / or measurement data for the subject.
[0168] In some embodiments, obtaining the cardiometabolic health data for the subject comprises obtaining subject characteristic and / or measurement data comprising: one or more values for one or more clinical characteristics of the subject, one or more values for one or more physical measurements of the subject, and / or one or more values for one or more biochemical measurements of the subject. In some cases, multiple values of each type of data (i.e., clinical characteristics, physical measurements, and biochemical measurements) of the subject are obtained.
[0169] In some embodiments, the clinical characteristics of the subject may include demographic characteristics (e.g. age, biological sex, ethnicity, marital status, geographic location, etc.), genetic characteristics (e.g., presence of gene variants associated with cardiovascular disease, polygenic scores for coronary heart disease, etc.), family history characteristics (e.g., history of coronary heart disease), comorbidities (e.g., hypertension), and / or risk factors (e.g., history of tobacco use). For example, the clinical characteristics of the subject may include one or more values for one or more (e.g., one, some, or all) clinical characteristics of the subject selected from the group consisting of age, biological sex, family history of atherosclerotic cardiovascular disease (ASCVD) including coronary heart disease (CHD), family history of hypertension (HTN), family history of type 2 diabetes (T2D), one or more polygenic scores (e.g., polygenic score for ASCVD, polygenic score for CHD, polygenic score for HTN, polygenic score for T2D, polygenic score for body mass index (BMI), etc.), inherited predisposition or predispositions, and history of tobacco use. In some embodiments, act 211 may involve accessing clinical characteristic values (e.g., from a database or other data store, through a user interface such as a graphical user interface, from electronic health records (EHR), etc.) that were previously obtained. In other embodiments, act 211 may involve taking the clinical characteristic values in the first instance (e.g., by a clinician taking the subject’s history, the subject providing such information, etc.).
[0170] As discussed above, the clinical characteristics of the subject may, in some embodiments, include a polygenic score for ASCVD. Polygenic scores for ASCVD (ASCVD-PGS), e.g., for heart attack or stroke, are crude instruments that combine either: (a) a large number (n) of genetic variants associated with ASCVD; or (b) a very large number of unselected genetic variants (typically 1-10 million variants). When using a very large number of unselected variants, the ASCVD-PGS may be constructed as follows: (1) the association between each genetic variant and the outcome of interest (heart attack and stroke) is measured in a reference population; (2) the effect size (log hazard ratio or log odds ratio) of each genotype is recorded relative to the most common genotype for that variant; (3) an Effect Size Table is constructed for each of up to 10 million variants - where rows indicate each genetic variant and the columns are the possible genotypes for that variant, and each cell in the table contains the effect size (log hazard ratio or log odds ratio) for each genotype relative to the most common genotype for that variant (note that an assigned reference log HR or log OR of 0 is equivalent to a hazard ratio or odds ratio of 1); (4) for each subject, the genotype at each variant included in the ASCVD-PGS is determined; and (5) for each subject, the corresponding log HR or log OR values for those genotypes from the Effect Size Table are then summed across all variants included in the score.
[0171] This process creates an ASCVD-PGS (a specific polygenic score for the composite outcome of major adverse cardiovascular events (MACE), such as fatal or non-fatal MI, fatal or non-fatal ischemic stroke, and coronary revascularization) with a normal distribution by design and construction. Note that a polygenic score must correspond to the outcome being predicted to have any meaning. Alternatively, instead of or in addition to an ASCVD-PGS, one can include a variety of polygenic scores estimating the inherited risk of MI, CHD, stroke, LDL, SBP, T2D, BMI, etc. The ASCVD-PGS is then standardized to have a mean of 0 and a standard deviation of 1, and the ASCVD-PGS for each person may be adjusted to this scale. (As described below, the ASCVD-PGS may be further standardized, e.g., using min-max scaling as part of act 212).
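Steps (4)-(5) and the subsequent standardization can be sketched as follows. This is a minimal illustration that assumes the Effect Size Table is held as a nested dictionary of log effect sizes and that a set of raw scores from the reference population is available for standardization; the function names raw_pgs and standardize_pgs, and any variant identifiers used with them, are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def raw_pgs(genotypes: Dict[str, str], effect_size_table: Dict[str, Dict[str, float]]) -> float:
    """Sum the log HR / log OR of the subject's genotype at each variant in the score."""
    # The most common genotype of each variant is assumed to carry an effect size of 0.
    return sum(effect_size_table[variant][genotype] for variant, genotype in genotypes.items())


def standardize_pgs(score: float, reference_scores: Sequence[float]) -> float:
    """Express a raw score on a scale with mean 0 and standard deviation 1."""
    # The mean and standard deviation are taken from the reference population.
    return (score - mean(reference_scores)) / pstdev(reference_scores)
```

Because the effect sizes are on the log scale, summing them corresponds to multiplying hazard ratios or odds ratios across variants.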
[0172] In some implementations, an ASCVD-PGS for the subject may be calculated as part of act 211 of process 200. For example, the steps (1)-(3) may be performed prior to the start of process 200, while steps (4)-(5) may be performed for the subject as part of process 200. In other implementations, an ASCVD-PGS for the subject may have been determined prior to performance of process 200 and may be accessed (rather than calculated) as part of act 211 of process 200. Similarly, one or more other polygenic scores (e.g., for coronary heart disease, hypertension, etc.), when included, may be calculated as part of process 200 or calculated prior to the start of process 200 and instead accessed during process 200.
[0173] In some embodiments, the physical measurements of the subject may include physical measurements of quantities that are risk factors for cardiometabolic (e.g., cardiovascular) disease, and / or physiological measurements selected from blood pressure measurements (e.g., systolic and diastolic blood pressure) and measurements indicative of adiposity (e.g., weight, waist circumference, body mass index, etc.). For example, the subject characteristic and / or measurement data may comprise one or more values for one or more (e.g., one, some, or all) physical measurements of the subject selected from the group consisting of systolic blood pressure (SBP), diastolic blood pressure (DBP), weight, waist circumference, height, body mass index (BMI), and waist-to-height ratio. In some embodiments, act 211 may involve accessing physical measurement values (e.g., from a database or other data store, through a user interface such as a graphical user interface, from electronic health records, etc.) that were previously obtained. In other embodiments, act 211 may involve taking the physical measurements (e.g., by a clinician).
[0174] In some embodiments, the biochemical measurements of the subject may include biochemical measurements of quantities that are risk factors for cardiometabolic (e.g., cardiovascular) disease and / or biochemical measurements selected from measurements of one or more biochemical markers, optionally a protein, lipid or lipoprotein, in a blood, serum or plasma sample from the subject. For example, the subject characteristic and / or measurement data may comprise one or more values for one or more (e.g., one, some, or all) biochemical measurements of the subject selected from the group consisting of low-density lipoprotein (LDL) level, high-density lipoprotein (HDL) level, total cholesterol level, triglyceride (TG) level, non-HDL cholesterol level, apolipoprotein B (apoB) level, lipoprotein (a) (Lp(a)) level, hemoglobin A1c (HbA1c) level, and C-reactive protein (CRP) level. In some embodiments, act 211 may involve accessing biochemical measurement values (e.g., from a database or other data store, through a user interface such as a graphical user interface, from electronic health records, etc.) that were previously obtained. In other embodiments, act 211 may involve obtaining the biochemical measurements (e.g., by obtaining a biological sample of the subject; and determining, by processing the biological sample, the one or more values for the one or more biochemical measurements of the subject; or determining, by processing a biological sample previously obtained from the subject, the one or more values for the one or more biochemical measurements of the subject).
By way of example and not limitation, HbA1c may be derived from whole blood samples (e.g., by determining the percentage of total hemoglobin within red blood cells that is glycated), glucose may be measured from a plasma sample, and lipids (e.g., LDL and apolipoproteins including apoB and Lp(a)) may be measured from a serum sample, but can also be measured in heparinized plasma depending on the laboratory reference being used.
[0175] An illustrative example of the subject characteristic and / or measurement data is shown in FIG. 3A.
[0176] It should be appreciated that this example is merely illustrative and that, in other embodiments, the subject characteristic and / or measurement data may include one or more values of one or more other characteristics and / or measurements than those shown in FIG. 3A. This may include any subject characteristics and / or measurements (whether clinical, physical, or biochemical) that are of clinical relevance (e.g., either as stated in objective clinical guidelines and / or in the judgment of the subject’s clinician and / or clinical team).
[0177] It should be appreciated that it is not required that all of the aforementioned types of subject characteristic and / or measurement data be obtained for an individual in order for process 200 to be meaningfully applicable to the individual. Indeed, only a subset of the measurements may be available and be used to meaningfully perform the acts shown in FIG. 2B (e.g., to estimate LDL, SBP, and Lp(a) trajectories at acts 214-216) and process 200 more generally (e.g., to determine for each of multiple time intervals, one or more measures of risk that the subject develops cardiovascular disease to obtain multiple measures of risk corresponding to the multiple time intervals; determining, using the multiple measures of risk that the subject develops cardiovascular disease, benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease; and identifying, using the determined benefit of administering the one or more therapeutic interventions, at least one therapeutic intervention to recommend being administered to the subject).
[0178] For example, in some implementations, act 211 may involve obtaining only the subject’s age, biological sex, and LDL, SBP, and DBP measurements. Additionally, an Lp(a) measurement may be obtained if available (if not, then act 216 may be omitted). Additionally, any of the characteristics
[0179] It should be noted that even if not all possible types of measurements are initially available, they may be added at a future point in time during longitudinal follow-up monitoring if they become available then. At such future time points, an individual being monitored may have multiple measured biomarker levels at specific time points on their individual health trajectory for “untreated” targets. The updated measurements for “treated” targets (LDL, SBP, or both) would be used to quantify the legacy benefit from earlier interventions to slow plaque progression and arterial wall injury accumulation, as described herein. This is part of the rationale for longitudinal monitoring and continuously dynamically updating inputs based on each person’s evolving health trajectory of both treated and untreated exposures. With each update, the algorithms better learn each person’s individual trajectory based on the increasing number of longitudinal measurements for each person.
[0180] This flexibility is by design - and reflects the “principle of parsimony” that informs some of the design elements of the technology described herein. That is, we want to be able to use a minimal set of key features to predict evolving risk based on a person’s evolving health trajectory to make the system available to as many people as possible around the world as inexpensively as possible. Adding additional features simply refines the individual absolute estimate of risk at any given time. Of course, these additional features may be quite important (particularly for extreme values) - and therefore add additional information, when available.
[0181] Importantly, the estimates of benefit are not impacted. These are based on randomized evidence for an increment of LDL or SBP lowering - with the magnitude of proportional benefit determined by the magnitude and duration of exposure (increasing as disease progression is slowed over time). Because benefit estimated from randomized evidence represents biological causes and effects, we can assess if the benefit of lowering LDL, SBP, or both is conditional on other exposures or biological processes. All of the randomized evidence (both randomized clinical trials and nature’s randomized clinical trials) suggests that the benefit of lowering LDL, SBP, or both (and the increased risk caused by an increment of increased LDL, SBP, or both) is independent and therefore very similar regardless of any other exposures.
[0182] Returning to FIG. 2B, as described herein, act 212 may involve standardizing the subject characteristic and / or measurement data obtained at act 211. This may be done because raw measurements for the clinical, physical, and biochemical measurements used to characterize the current state of cardiometabolic health of a subject may vary widely in magnitude and may be measured on different scales. Indeed, the data obtained at act 211 may be first standardized in order to: (i) improve computational efficiency (e.g., of algorithms used downstream in process 200); (ii) stabilize gradient flow, and prevent exploding or vanishing gradients (e.g., in using various machine learning models used as part of process 200 and / or training such models for use as part of process 200); and (iii) mitigate bias toward features with larger numeric ranges. As such, the standardization performed at act 212 is designed to improve computational efficiency and to preserve use of the same dense input feature vector for multiple different sequential deep and machine learning algorithms, which may be used as part of process 200, as described herein. Moreover, standardization of the raw measurements in the processes described herein (e.g., as implemented by a Deep Causal AI Agent) preserves the possibility of recovering an estimate of the magnitude of the effect size for each input feature.
[0183] The standardization may be performed in any suitable way. Indeed, numerous standardization methods are available. In some embodiments, all measured inputs may be converted into respective numeric values between 0 and 1 to improve computational efficiency and eliminate the bias from measuring biological parameters on different scales that can vary by 1000-fold. To this end, dichotomous variables may be encoded as 0 or 1; ordinal variables may be one-hot encoded to create a series of dummy variables with values of 0 or 1; and continuous variables (e.g., polygenic scores, such as the ASCVD-PGS) may be min-max standardized to continuous values between 0 and 1, as detailed below. These input values create an n-dimensional input vector.
[0184] Family history may be encoded as a binary (dichotomous) variable in some embodiments, but as an ordinal variable in other embodiments. Using family history of CHD as an example, the input indicating whether the subject has a family history of coronary heart disease (CHD) may be coded as a binary variable, and may be standardized as having the values “1” and “0” indicating the presence or absence of family history of CHD, respectively.
[0185] In other embodiments, the input indicating whether the subject has a family history of CHD may be coded as an ordinal variable. For example, the variable may take on one value (e.g., 0) to indicate that no first degree relatives have ever had an MI, stroke, or coronary revascularization procedure (which demonstrates a substantial atherosclerotic burden producing impaired blood flow requiring mechanical intervention to reduce the obstruction and restore enough blood flow to meet oxygen demands with exertion), another value (e.g., 1) to indicate that either the biological mother or father has had an MI, stroke or revascularization, another value (e.g., 2) to indicate that both the biological mother and father have had an MI, stroke, or revascularization, another value (e.g., 3) to indicate that both the biological mother and father, and one biological sibling have had an MI, stroke or revascularization, and yet another value (e.g., 4) to indicate that both the biological mother and father, and two or more biological siblings have had an MI, stroke or revascularization. This is important because family history has a dose-dependent effect on the risk of having a major atherosclerotic cardiovascular event. The ordinal representation is capped at four because that value already reflects extreme inherited predisposition to atherosclerotic cardiovascular disease (and because very little data is available for persons with more extreme family histories). Interestingly, the dose-response effect of family history and the dose-response effect of polygenic predisposition (as measured by a crude first generation polygenic score) are independent and additive (multiplicative on the risk scale, and additive on the log-risk scale).
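The example ordinal coding above can be written as a small function. This is a sketch only: the function name family_history_code and its parameters are hypothetical, and combinations that the example values do not enumerate (e.g., an affected sibling when fewer than two parents are affected) are deliberately rejected rather than guessed.

```python
def family_history_code(mother_affected: bool, father_affected: bool, affected_siblings: int) -> int:
    """Map first-degree family history of MI, stroke, or revascularization to 0-4."""
    if affected_siblings < 0:
        raise ValueError("affected_siblings must be non-negative")
    parents = int(mother_affected) + int(father_affected)
    if parents == 2:
        # 2 = both parents; 3 = plus one sibling; 4 = plus two or more siblings (capped).
        return 2 + min(affected_siblings, 2)
    if affected_siblings == 0:
        # 0 = no affected first-degree relatives; 1 = one affected parent.
        return parents
    # The example coding in the text does not cover this combination.
    raise ValueError("combination not enumerated in the example coding")
```

Capping the sibling contribution at two reproduces the maximum value of 4 described above.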
[0186] With respect to continuous variables, min-max scaling (sometimes termed “min-max normalization”) may be used. This technique may be particularly useful when input features are naturally bounded (e.g., as is the case with much biological data). In some embodiments, min-max scaling may be performed using the formula:
[0187] $$ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} $$
[0190] where $x$ is the original feature value, $x'$ is the scaled value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature (e.g., in a reference population or on a given scale). An illustrative example of the standardized values (of some subject characteristic and / or measurement data from FIG. 3A) is shown in FIG. 3B.
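A minimal sketch of min-max scaling is given below. The function name min_max_scale is hypothetical, and the optional clipping of values that fall outside the reference range is an assumption that the text does not specify.

```python
def min_max_scale(x: float, x_min: float, x_max: float, clip: bool = True) -> float:
    """Scale x to [0, 1] using the feature's minimum and maximum values."""
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    scaled = (x - x_min) / (x_max - x_min)
    # Assumed behavior: values outside the reference range are clipped to [0, 1].
    return min(1.0, max(0.0, scaled)) if clip else scaled
```

For instance, with hypothetical reference bounds of 80 and 180 mmHg for SBP, a measured SBP of 130 mmHg scales to (130 - 80) / (180 - 80) = 0.5.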
[0191] As described herein, act 213 may involve positionally encoding the standardized data obtained at act 212 (though it should be appreciated that data can be positionally encoded even without being standardized, in some embodiments).
[0192] In some embodiments, act 213 involves positionally encoding feature values using age of the subject as the position function. Positional encoding by age injects a biological context into each included feature and is designed to permit the value of each feature to be interpreted within the context of the age of the subject at which the feature value was measured. This formulation of positional encoding provides unique biological information about the likely trajectory of prior values for physiologic and biochemical features that may dynamically change value over time. The feature values (e.g., from act 212) may be used together with their positionally encoded versions (e.g., as output at act 213) for subsequent processing as described herein (e.g., with respect to acts 214, 215, and 216 for calculating LDL, SBP, and Lp(a) trajectories for the subject).
[0193] In more detail, the interpretation of the absolute value of a particular biological parameter may vary substantially depending on biological context, including the subject’s age, at which the biological parameter was measured. For example, an SBP of 130 mmHg in a 25 year old man is 15 mmHg above the age-and-sex adjusted population median SBP level of 115 mmHg. By contrast, the same SBP level of 130 mmHg in a 65 year old man is 15 mmHg below the age-and-sex adjusted population median SBP level of 145 mmHg. Thus, the very same SBP level can have very different biological and physiological implications depending on the age and sex of the subject being evaluated.
[0194] Accordingly, in some embodiments, positional encoding may be added to the input features to provide a sense of the timing within the subject’s evolving health, biological context, and / or age at which the feature was measured. To this end, the standardized values obtained at act 212 may be positionally encoded by age and the obtained positionally-encoded standardized values may be appended to the set of standardized values (which may be referred to herein as a feature vector or a dense input feature vector). The original standardized measurements are not thrown away; instead, they are retained to preserve any information that may be contained in a feature interpreted without regard to the timing during the person’s cardiometabolic health trajectory at which the feature is measured.
[0195] Positional encoding by age may be performed using any of a variety of positional encoding methods. For example, sinusoidal encoding may be used. This encoding involves calculating the encoding using the following formula:
[0196] $$ PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d}}\right) $$
[0201] whereby $pos$ is the position in the sequence (age at measurement), $d$ is the dimension of the positional encoding vector (e.g., matching the dimension of the input vector of standardized values returned from act 212), and $i$ is the index of the feature within the positional encoding vector. For even indices ($2i$), the above uses the sine function to encode the position. For odd indices ($2i+1$), the above uses the cosine function to encode the position. The term $10000^{2i/d}$ defines a different frequency for each dimension; other constants may be used. Once the positional encoding is computed for each position in the input sequence, it is added to the input feature vector to obtain a positionally encoded version of the standardized feature values, which may be represented as $X_{input} = X_{embedding} + PE$. The positionally encoded values are in turn appended to the standardized feature values (which are not discarded, as described above). Thus, an input vector of standardized values having n values coming out of act 212 now has 2n values, with the positionally encoded values appended to it.
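The sinusoidal encoding by age and the resulting 2n-value vector can be sketched as follows. The sketch assumes that age in years is used directly as the position and that the encoding is added element-wise to the standardized values before being appended; the function names age_positional_encoding and encode_features are hypothetical.

```python
import math
from typing import List, Sequence


def age_positional_encoding(age: float, d: int) -> List[float]:
    """Sinusoidal encoding of a single position (age) into a vector of length d."""
    encoding = []
    for k in range(d):
        i = k // 2                               # even index k = 2i, odd index k = 2i + 1
        angle = age / (10000 ** (2 * i / d))     # one frequency per sine/cosine pair
        encoding.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return encoding


def encode_features(standardized: Sequence[float], age: float) -> List[float]:
    """Return the n standardized values followed by their n positionally encoded versions."""
    pe = age_positional_encoding(age, len(standardized))
    encoded = [x + p for x, p in zip(standardized, pe)]   # X_input = X_embedding + PE
    return list(standardized) + encoded                    # originals are retained
```

Keeping the original standardized values in the first n positions preserves the information carried by each feature independent of the age at which it was measured.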
[0202] An illustrative example of computing a positional encoding of the standardized values (from FIG. 3B) is shown in FIG. 3C, where the positional encoding vector (0.231, 0.478, 0.687, etc.) is added to the standardized values (0.47, 0.5, 0, etc.) to obtain the positionally encoded values.
[0203] As described herein, act 214 may involve determining an LDL trajectory for the subject using the standardized and positionally-encoded feature values (values of clinical characteristics, physical measurements, and / or biochemical measurements obtained at act 211) obtained at acts 212 and 213. The LDL level trajectory for the subject may include an estimated LDL level for the subject for each of multiple prior ages of the subject (e.g., every age from birth to current age) and multiple future ages (up to the age 80 years, or other pre-defined age limit) of the subject.
[0205] In the context of biomarker (e.g., LDL, SBP, Lp(a), weight, waist circumference, HbA1c) trajectories, “prior ages” refers to ages of the subject prior to the most recent age at which the biomarker was measured for the subject, and “future ages” refers to ages of the subject subsequent to the most recent age at which the biomarker was measured for the subject. In other words, the words “prior” and “future” in this context refer to ages before and after a point in time at which a most recent measurement of the biomarker is available, which may or may not be the same as the current age of the subject (i.e. their age at the time of running a method as described herein). When the analysis of estimating a biomarker trajectory, part of process 200, is performed when the subject is aged the same as when the biomarker was last measured, then “the most recent age at which the biomarker was measured” is the subject’s current age and “prior ages” and “future ages” are relative to the current age of the subject whose health is being analyzed.
[0206] In embodiments, the reference to estimates at “multiple prior ages” refers to estimates obtained for each of multiple ages of the subject between a predetermined age (e.g. birth) and the most recent age at which the biomarker was measured, or the earliest age at which a biomarker measurement is available for the subject. For example, multiple measurements of the biomarker at respective different ages may be available for a subject (also referred to as “historical measurements”), and estimates at multiple prior ages may be obtained between the predetermined age and the earliest age of the respective different ages. In embodiments, the reference to estimates a “multiple future ages” refers to estimates obtained for each of multiple ages of the subject between the most recent age at which the biomarker was measured and a predetermined cutoff age (e.g. 75, 80, 85 or 90 years old).
[0207] For example, in the context of the LDL level trajectory, “prior ages” refers to ages of the subject prior to the subject’s age at which their LDL was most recently measured. As one specific example, determining an LDL trajectory for a subject that is 45 years old and whose LDL was measured at age 45, may involve estimating LDL levels from every age from birth until age 44 (these would be estimated levels for “prior ages”) and LDL levels from age 46 until age 80 (these would be estimated levels for “future ages”). As another specific example, determining an LDL trajectory for a subject that is 45 years old and whose LDL was measured at age 43, may involve estimating LDL levels from every age from birth until age 42 (these would be estimated levels for “prior ages”) and LDL levels from age 44 until age 80 (these would be estimated levels for “future ages”).
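As a minimal illustration of the two specific examples above, the division of a subject’s ages into “prior ages” and “future ages” relative to the most recent measurement might be sketched as follows. The function name and the default limits (birth and age 80) are assumptions chosen to mirror the examples and are not required values.

```python
def split_ages(last_measured_age, lower_age=0, upper_age=80):
    """Split ages into those before and after the most recent measurement.

    The age of the most recent measurement itself is excluded from both
    lists because a measured value is available at that age.
    """
    prior_ages = list(range(lower_age, last_measured_age))
    future_ages = list(range(last_measured_age + 1, upper_age + 1))
    return prior_ages, future_ages


# Subject whose LDL was most recently measured at age 45.
prior_45, future_45 = split_ages(45)

# Subject aged 45 whose LDL was most recently measured at age 43.
prior_43, future_43 = split_ages(43)
```

Note that the split depends only on the age at the most recent measurement, not on the subject’s current age.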
[0208] In embodiments where more than a single LDL measurement is available for a patient (e.g., longitudinal LDL inputs are available for multiple ages of the subject, whether or not consecutive), the LDL level trajectory may include estimated LDL levels for prior ages (relative to most recent age at which an LDL measurement is available) for which historical measurements are not available, and for ages following the most recent age for which an LDL measurement is available for the subject.
[0209] In some embodiments, estimating the LDL level trajectory for the subject comprises: (a) generating a feature vector representing the subject from the subject characteristic and / or measurement data (e.g., by encoding the subject characteristic and / or measurement data as described herein); and (b) processing the feature vector generated from the subject characteristic / measurement data using a third trained ML model (which may be referred to as “an LDL trajectory prediction machine learning model”) to obtain the LDL level trajectory for the subject, the third trained ML model having been trained to estimate an LDL level for the subject at each of the multiple prior ages and each of the multiple future ages (relative to the age at which the most recent LDL level measurement was made). The feature vector may include both standardized values of feature values obtained at act 212 and positionally encoded versions thereof obtained at act 213.
[0210] The third trained ML model may be of any suitable type. For example, the third trained ML model may be a neural network model, such as a neural network having a transformer-based architecture or a recurrent neural network (RNN) model. For example, the third trained ML model may be a recurrent neural network such as a long short-term memory (LSTM) neural network model (e.g., a bi-directional LSTM neural network model), a gated recurrent unit (GRU) neural network model (e.g., a bi-directional GRU neural network model), or an ODE-RNN model. An ODE-RNN model is a recurrent neural network having continuous-time hidden dynamics defined by ordinary differential equations (ODEs), as described for example in: Y. Rubanova, R. T. Chen, and D. K. Duvenaud, “Latent ordinary differential equations for irregularly-sampled time series,” in Advances in Neural Information Processing Systems (NeurIPS 2019) and in E. De Brouwer, J. Simm, A. Arany, and Y. Moreau, “GRU-ODE-Bayes: Continuous modeling of sporadically-observed time series” in Advances in Neural Information Processing Systems (NeurIPS 2019).
[0211] For example, in some embodiments, the third trained ML model may be a bi-directional LSTM (bi-LSTM) neural network model. In this case, processing the feature vector using the third trained ML model includes: (a) estimating an LDL level for the subject at each of the multiple future ages (e.g., up to the age 80 years, or other pre-defined age limit) using a forward pass of the bi-LSTM neural network model; and (b) estimating an LDL level for the subject at each of the multiple prior ages (e.g., at all prior ages down to birth) using a backward pass of the bi-LSTM neural network model. In some embodiments, the bi-LSTM model may have hundreds, thousands, or tens of thousands of parameters (e.g., at least 100, at least 1000, at least 2500, at least 5000, at least 10,000, at least 25,000, at least 50,000, at least 100,000, between 100 and 1000, between 1000 and 5000, between 2500 and 25,000, between 10,000 and 100,000, between 50,000 and 250,000 or any other range within these ranges).
[0212] Accordingly, in some embodiments, a bi-directional LSTM model may predict the trajectory of a subject’s LDL (and, therefore, the subject’s cumulative exposure to LDL). In the forward pass (left-to-right), the bi-LSTM predicts the LDL level at all future ages (up to the age 80 years, or other pre-defined age limit) based on the measured LDL level at the age of the subject at which the most recent LDL measurement was obtained (which may be the subject’s current age if the prediction using the bi-LSTM is being performed in the same year as the year during which the most recent LDL measurement was obtained) and the combined effect of the other standardized and positionally-encoded input features. In the backward pass (right-to-left), the bi-LSTM predicts the LDL level at all previous ages from the age of the subject at which the most recent LDL measurement was obtained (which may be the subject’s current age) going backward to a lower age limit (e.g., birth) using the same neural network.
[0213] The predicted LDL level at all ages (e.g., from birth or other pre-defined lower age limit to 80 or other pre-defined upper age limit) is then plotted to provide the predicted trajectory of LDL levels throughout life for the subject being evaluated. Examples of this are shown in FIGs. 3D and 3E.
[0214] In particular, FIG. 3D shows the predicted LDL cholesterol levels (in mmol / L) for all ages of the male subject whose data was shown in FIG. 3A. In this example, the person’s LDL levels are predicted from their current age (40) up to age 80 years, and from their current age (40) back to birth. FIG. 3D shows the predicted LDL levels to reveal this person’s predicted trajectory of changing LDL levels throughout life.
[0215] It is important to note that the trajectory of LDL throughout life differs substantially between persons with male and female biological sex - even among persons who have identical levels of all other features at the same ages. Therefore, in some embodiments, one ML (e.g., bi-LSTM, bi-GRU, ODE-RNN) model may be trained to predict LDL levels on data from persons with male biological sex and another ML (e.g., bi-LSTM, bi-GRU, ODE-RNN) model may be trained to predict LDL levels on data from persons with female biological sex. And, indeed, the analyses described herein may use biological sex-specific algorithms, in some embodiments, to achieve greater accuracy. FIG. 3E shows estimates of LDL levels for two persons with different biological sex.
[0216] In more detail with respect to embodiments where a bi-LSTM is used to estimate LDL trajectories, a long short-term memory (LSTM) network is a type of recurrent neural network (RNN) designed to effectively learn and capture long-term dependencies in sequential data (as exemplified here through tasks such as predicting the trajectory of how levels of LDL and SBP change throughout a subject’s life).
[0217] LSTM networks are described by S. Hochreiter and J. Schmidhuber (“Long short-term memory.” Neural Comput. 9.8 (1997): 1735-1780). Bidirectional LSTM networks are described by A. Graves and J. Schmidhuber (“Framewise phoneme classification with bidirectional LSTM and other neural network architectures.” Neural Networks 18.5-6 (2005): 602-610). The neural network may be trained using any suitable neural network optimization software. The optimization software may be configured to perform neural network training by gradient descent, stochastic gradient descent, or in any other suitable way. In some embodiments, the Adam optimizer is used. The Adam optimizer is described by D. Kingma and J. Ba (“Adam: A Method for Stochastic Optimization.” Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015)).
[0218] A bidirectional LSTM (bi-LSTM) model processes input sequences in both forward and backward directions, providing a richer context for considering both past and future information. This makes the model well-suited for tasks where understanding the full context of a sequence is important, including for the tasks described herein such as predicting the future and past levels of LDL (similarly, SBP and Lp(a)) from a single measured value (or multiple longitudinal values) of LDL (similarly, SBP and Lp(a)) to provide an estimate of total cumulative exposure to LDL (similarly, SBP and Lp(a)).
[0219] Cumulative exposure to LDL, SBP, and Lp(a) are engineered features that have biological meaning that renders them informative biomarkers for determining risk that an individual develops cardiovascular disease and the benefit of treating same preventatively or otherwise.
[0220] In some embodiments, the bi-LSTM model for predicting a subject’s LDL levels may be trained using participant data for at least 10,000 participants enrolled in one or more prospective studies, whereby for each of the at least 10,000 participants repeated longitudinal measures of LDL, HDL, SBP, weight, waist circumference, BMI, and / or HbA1c are available over at least 10, at least 20, at least 30, or at least 50 years of follow-up, with censoring applied at time lost to follow-up, death, a first cardiovascular event, or initiation of lipid lowering therapy.
[0221] In one illustrative embodiment, the bi-LSTM model for predicting a subject’s LDL levels was trained on individual participant data from 16,235 participants enrolled in one of three (3) long-term prospective cohort studies (the Framingham Heart Study (FHS), the Framingham Offspring Study (FOS), and the Multi-Ethnic Study of Atherosclerosis (MESA)) for whom repeated longitudinal measures of LDL, HDL, SBP, weight, waist circumference, BMI, and HbA1c (or fasting plasma glucose) are available over 25-59 years of follow-up (participants censored at time lost to follow-up, death, first atherosclerotic cardiovascular event, or initiation of lipid lowering therapy). Such training data may also be used to train the alternative model types described herein, such as GRU-type or ODE-RNN models.
[0222] In some embodiments, the bi-LSTM may be trained using the mean-squared error (MSE) loss function. This loss may be calculated as follows.
[0223] First, the input sequence data are processed in both directions (forward and backward). In the forward direction, the bi-LSTM processes the input sequence from left to right. In the backward direction, the bi-LSTM processes the input sequence from right to left. At each time step t, the forward and backward hidden states ($\overrightarrow{h_t}$ and $\overleftarrow{h_t}$) are concatenated (or summed, depending on implementation) to form the full hidden state $h_t$. The combined hidden state is then passed to the next layer or used for prediction during inference.
[0226] After processing the input sequence through the bi-LSTM layers, the output at each time step may be passed through a dense (fully-connected) layer or a series of layers to produce the final prediction $\hat{y}_t$ at each time step t. The bi-LSTM model outputs a sequence of predictions, one for each time step in the input sequence.
[0227] In turn, to calculate the loss during training, for each time step t in the sequence, the squared difference between the predicted and measured values is computed as:
[0228] $$(y_t - \hat{y}_t)^2$$
[0229] where $y_t$ is the true value at time step t and $\hat{y}_t$ is the predicted value at time step t.
[0230] The MSE loss over the entire sequence is calculated by summing the squared errors across all time steps and taking the average:
[0231] $$\mathrm{MSE} = \frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2$$
[0233] where T is the total number of time steps in the sequence.
[0234] Once the MSE loss is computed, backpropagation is used to calculate the gradients of the loss with respect to the model parameters (LSTM weights, biases, etc.). These gradients are then used to update the model parameters using an optimization algorithm such as stochastic gradient descent (SGD) or Adam.
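For illustration only, the MSE calculation described above may be sketched in plain Python as follows. The function name and the example LDL values are hypothetical and are not taken from the training data described herein.

```python
def mse_loss(y_true, y_pred):
    """Mean-squared error over a sequence of T time steps."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("sequences must be non-empty and of equal length")
    # Squared difference between measured and predicted value at each step.
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    # Average the squared errors over all T time steps.
    return sum(squared_errors) / len(squared_errors)


# Hypothetical measured and predicted LDL levels (mmol / L) at three ages.
loss = mse_loss([3.0, 3.5, 4.0], [3.0, 3.0, 5.0])
```

In practice a deep learning framework would compute the same quantity and its gradients automatically; the sketch only makes the arithmetic explicit.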
[0235] In some embodiments, the LSTM models may be implemented using PyTorch's native libraries, which provide direct support with well-optimized functions for LSTM, including:
[0236] • torch.nn - The module includes nn.LSTM for implementing the LSTM layer, as well as nn.LSTMCell for finer control over LSTM cell computations. Additional modules like nn.Linear may be used to map the LSTM outputs to desired output dimensions.
[0237] • torch.optim - For defining optimization algorithms, such as Adam or SGD, to train the model.
[0238] • torch.utils.data - This is useful for managing time-series datasets and implementing custom data loaders or transformers.
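A minimal sketch of how the modules listed above might be combined is given below. It assumes the stock bidirectional nn.LSTM layer, in which the forward and backward hidden states are concatenated at every time step, followed by an nn.Linear head that outputs one value per age. The class name, layer sizes, learning rate, and random tensors are hypothetical placeholders and do not represent the trained models described herein.

```python
import torch
from torch import nn


class TrajectoryBiLSTM(nn.Module):
    """Bi-directional LSTM that emits one predicted level per time step."""

    def __init__(self, n_features, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden_size,
            batch_first=True,
            bidirectional=True,
        )
        # Forward and backward states are concatenated: 2 * hidden_size.
        self.head = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)              # (batch, ages, 2 * hidden_size)
        return self.head(out).squeeze(-1)  # (batch, ages)


torch.manual_seed(0)
model = TrajectoryBiLSTM(n_features=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder batch: 4 subjects, 81 ages (0 to 80), 6 features per age.
x = torch.randn(4, 81, 6)
y = torch.randn(4, 81)

# One training step: forward pass, MSE loss, backpropagation, update.
optimizer.zero_grad()
pred = model(x)
loss = loss_fn(pred, y)
loss.backward()
optimizer.step()
```

A real training loop would iterate over batches supplied by a torch.utils.data data loader and would typically mask ages for which no measurement is available.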
[0239] Returning to FIG. 2B, act 215 involves determining the SBP trajectory for the subject using a fourth trained ML model using the standardized and positionally encoded feature values (values of clinical characteristics, physical measurements, and / or biochemical measurements obtained at act 211) obtained at acts 212 and 213. The SBP level trajectory for the subject may include an estimated SBP level for the subject for each of multiple prior ages of the subject (e.g., every age down to 20 or other pre-defined lower age limit) and multiple future ages (up to the age 80 years, or other pre-defined upper age limit) of the subject. As noted above for LDL estimation, separate models may be used to estimate SBP levels for males and females.
[0240] In some embodiments, estimating the SBP level trajectory for the subject comprises: (a) generating a feature vector representing the subject from the subject characteristic and / or measurement data; and (b) processing the feature vector using a fourth trained ML model (which may be referred to as “an SBP trajectory prediction machine learning model”) to obtain the SBP level trajectory for the subject, the fourth trained ML model having been trained to estimate an SBP level for the subject at each of the multiple prior ages and each of the multiple future ages. The feature vector may include both standardized values of feature values obtained at act 212 and positionally encoded versions thereof obtained at act 213. The feature vector may be the same as the feature vector used by the third ML model to estimate the LDL level trajectory for the subject at act 214.
[0241] The fourth trained ML model may be of any suitable type including any of the types described above for the third trained ML model, including a recurrent neural network, such as a bi-directional LSTM (bi-LSTM) neural network model, a bi-directional GRU neural network model, or an ODE-RNN model. In embodiments where the fourth trained ML model is a bi-LSTM, processing the feature vector using the fourth trained ML model comprises: estimating an SBP level for the subject at each of the multiple future ages (relative to the age at which the most recent SBP measurement was taken) using a forward pass of the bi-LSTM model; and estimating an SBP level for the subject at each of the multiple prior ages using a backward pass of the bi-LSTM model. In some embodiments, the bi-LSTM model may have hundreds, thousands, or tens of thousands of parameters (e.g., at least 100, at least 1000, at least 2500, at least 5000, at least 10,000, at least 25,000, at least 50,000, at least 100,000, between 100 and 1000, between 1000 and 5000, between 2500 and 25,000, between 10,000 and 100,000, between 50,000 and 250,000 or any other range within these ranges).
[0242] The fourth ML model may be trained using similar training data as was used to train the third ML model for LDL level estimation, and may be trained using analogous training methods.
[0243] In more detail with respect to embodiments where a bi-LSTM is used to estimate SBP trajectories, the bi-LSTM may be designed to predict the SBP trajectory for an individual person which, in turn, may be used to estimate rate of rise and / or cumulative exposure to SBP for the individual. The forward pass (left-to-right) of this bi-LSTM predicts the SBP level at all future ages (up to the age 80 years, or other pre-defined upper age limit) based on the measured SBP level at the current age and the combined effect of the other input features. The backward pass (right-to-left) predicts the SBP level at all previous ages from the current age backwards to age 20 years (or other pre-defined lower age limit) using the same fully connected dense neural network. Of note, the SBP levels before age 20 are discounted because they evolve in response to the rapidly changing physiologic requirements during the rapid growth of childhood and adolescence, and because SBP levels high enough to cause accumulating irreversible structural injury to the arterial wall during this time are exceedingly rare.
[0244] The trained bi-LSTM may be used to provide the predicted SBP level (e.g., from ages 20 to 80), which can then be plotted to provide the predicted trajectory of SBP levels throughout life for the subject being evaluated. Examples of this are shown in FIGs. 3G and 3H.
[0245] In particular, FIG. 3G shows the predicted SBP levels (in millimeters of mercury (mmHg)) for all ages of the male subject whose cardiometabolic health data was shown in FIG. 3A. The prediction was made using a 2n-dimensional dense input vector obtained from acts 212 and 213 (with n values being standardized values and n values being the positionally encoded versions thereof). As with LDL trajectories, it is important to note that the trajectory of SBP throughout life differs substantially between persons with male and female biological sex - even among persons who have identical levels of all other features at the same ages. Therefore, in some embodiments, one ML model (e.g., one bi-LSTM) may be trained to predict SBP levels on data from persons with male biological sex and another ML model (e.g., another bi-LSTM) may be trained on data from persons with female biological sex. And, indeed, the analyses described herein may use biological sex-specific algorithms, in some embodiments, to achieve greater accuracy. FIG. 3H shows estimates of SBP levels for two persons with different biological sex.
[0246] Returning to FIG. 2B, act 216 involves determining the Lp(a) trajectory for the subject. As will be described next, a machine learning approach is used to estimate this trajectory even though Lp(a) levels for each person are fairly stable throughout life, are largely inherited rather than acquired, and are not changed by diet, exercise, or other lifestyle choices. Despite such stability, it is not accurate to estimate cumulative exposure to Lp(a) up to a given age by simply multiplying a person’s Lp(a) level measured at any point in time by that age. Nor is it accurate to simply apply a person’s measured Lp(a) level to every age to obtain their Lp(a) at each age and sum (or integrate) those values to calculate cumulative exposure to Lp(a).
[0247] The reason why such simple approximations will not provide accurate estimates is that, for almost everyone, there is a slight inflection point between the ages of 50 to 60 at which Lp(a) levels begin to steadily rise over time (co-incident with the corresponding plateau and then fall of LDL levels over time). Importantly, this change in Lp(a) is detectable (or at least quantitatively and biologically meaningful) for people with extreme elevations of Lp(a). By contrast, for most persons, this inflection point and subsequent rise in Lp(a) is largely undetectable and biologically irrelevant because most people have very low Lp(a) levels. As a result, a gradual rise in Lp(a) of 20-30% beginning between ages 50 to 60 years for most people will translate into such small absolute changes in Lp(a) as to have an almost imperceptible clinical impact.
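The difference between the crude estimate and an estimate that accounts for a late-life rise can be illustrated with the following sketch. The trajectory shape used here (flat until an inflection age of 55, then a linear rise reaching 25% above baseline at age 80) is a hypothetical stand-in chosen from within the ranges mentioned above; it is not the output of the trained model, and the levels are in arbitrary units.

```python
def lpa_trajectory(level, inflection_age=55, total_rise=0.25, upper_age=80):
    """Hypothetical Lp(a) level at each age from 0 to upper_age.

    The level is flat up to inflection_age and then rises linearly,
    reaching level * (1 + total_rise) at upper_age.
    """
    trajectory = []
    for age in range(upper_age + 1):
        if age <= inflection_age:
            trajectory.append(level)
        else:
            fraction = (age - inflection_age) / (upper_age - inflection_age)
            trajectory.append(level * (1 + total_rise * fraction))
    return trajectory


def cumulative_exposure(trajectory, up_to_age):
    """Sum of yearly levels for ages 0 through up_to_age - 1."""
    return sum(trajectory[:up_to_age])


def naive_exposure(level, age):
    """Crude estimate: a single measured level multiplied by age."""
    return level * age


high = lpa_trajectory(100.0)  # markedly elevated Lp(a), arbitrary units
low = lpa_trajectory(10.0)    # low Lp(a), arbitrary units

gap_high = cumulative_exposure(high, 80) - naive_exposure(100.0, 80)
gap_low = cumulative_exposure(low, 80) - naive_exposure(10.0, 80)
```

Under these assumptions the two estimates agree before the inflection age and diverge afterwards, with the absolute gap scaling with the baseline level, which is consistent with the observation that the rise matters most for persons with markedly elevated Lp(a).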
[0248] However, it is the people who inherit lifelong exposure to markedly elevated Lp(a) who experience the greatest biological impact on the risk of ASCVD events due to Lp(a) and therefore are the people who would benefit most from lowering Lp(a) when specific Lp(a)- lowering therapies become available in future; and these are the people who have been enrolled in all of the cardiovascular outcome randomized trials evaluating RNA-based (ASO and siRNA) therapies to reduce Lp(a).
[0249] Because Lp(a) almost certainly has a cumulative effect on the risk of ASCVD, like other atherogenic apoB-containing particles, it is important to estimate each person’s cumulative exposure to Lp(a) as accurately as possible to both predict the increased risk caused by cumulative exposure to Lp(a) at all time points and at each level of accumulated plaque burden estimated by cumulative exposure to LDL (and to triangulate the results of randomized trials when completed to accurately predict the expected individual benefit from lowering Lp(a) over time based on the reduction in cumulative exposure to Lp(a)).
[0250] Accordingly, while it is possible to use crude estimates such as those described above to estimate the Lp(a) trajectory at act 216, more accurate estimates can be obtained by using a trained machine learning model, as described herein. Thus, in some embodiments, act 216 involves estimating an Lp(a) trajectory for the subject using a fifth trained ML model using the standardized and positionally encoded feature values (values of clinical characteristics, physical measurements, and / or biochemical measurements obtained at act 211) obtained at acts 212 and 213. The feature values may include the subject’s measured Lp(a) level(s), which are standardized (using Min-Max standardization) and positionally encoded.
[0251] The Lp(a) level trajectory for the subject may include an estimated Lp(a) level for the subject for each of multiple prior ages of the subject (e.g., multiple ages prior to the age at which the most recent Lp(a) level was measured down to birth or other pre-defined lower age limit). Because multiple Lp(a) measurements may be available for the subject, the Lp(a) trajectory may, with respect to prior years, include an estimated Lp(a) level only for those years where Lp(a) measurements are not available; the measured Lp(a) values rather than estimates may be used for the years in which the measurements were obtained. For example, if a subject is 40 years old and Lp(a) measurements are available for the ages of 40, 35, and 33, then the Lp(a) trajectory may include Lp(a) values for all ages 6-40, with the values at years 33, 35, and 40 being measured values and all other values being estimated using the fifth trained ML model. As such, the Lp(a) trajectory may include both estimated and measured values.
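One way to combine measured and estimated values into a single trajectory, as in the example above, is sketched below. The function name, the dictionary representation, and the numeric values are hypothetical; the estimated values stand in for outputs of the trained model.

```python
def merge_trajectory(estimated, measured):
    """Combine model estimates with measurements, keyed by age.

    Where a measurement exists for an age it takes precedence over the
    estimate; each value is tagged with its source.
    """
    merged = {}
    for age in sorted(set(estimated) | set(measured)):
        if age in measured:
            merged[age] = (measured[age], "measured")
        else:
            merged[age] = (estimated[age], "estimated")
    return merged


# Placeholder estimates for ages 33 to 40 and measurements at 33, 35, 40.
estimated = {age: 50.0 for age in range(33, 41)}
measured = {33: 48.0, 35: 49.0, 40: 52.0}

trajectory = merge_trajectory(estimated, measured)
```

Tagging each value with its source keeps it possible to distinguish measured from estimated points when the trajectory is later plotted or summed.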
[0252] The Lp(a) level trajectory for the subject may also include an estimated Lp(a) level for each of multiple future ages of the subject (e.g., multiple ages subsequent to the age at which the most recent Lp(a) level was measured up to the age of 80 years, or other pre-defined upper age limit).
[0253] As noted above for LDL and SBP estimation, separate models may be used to estimate Lp(a) levels for males and females. FIG. 3J shows an illustrative example of Lp(a) trajectories predicted for men and women (in various percentiles of the population) with each Lp(a) trajectory indicating Lp(a) levels as a function of age, in accordance with some embodiments of the technology described herein.
[0254] In some embodiments, estimating the Lp(a) level trajectory for the subject comprises: (a) generating a feature vector representing the subject from the subject characteristic and / or measurement data; and (b) processing the feature vector using a fifth trained ML model (which may be referred to as “an Lp(a) trajectory prediction machine learning model”) to obtain the Lp(a) level trajectory for the subject, the fifth trained ML model having been trained to estimate an Lp(a) level for the subject at each of the multiple prior ages and each of the multiple future ages. The feature vector may include both standardized values of feature values obtained at act 212 and positionally encoded versions thereof obtained at act 213. The feature vector may be the same as the feature vector used by the third ML model to estimate the LDL level trajectory for the subject at act 214.
[0255] The fifth trained ML model may be of any suitable type including any of the types described herein for the third and fourth trained ML models, including a recurrent neural network, such as a bi-directional LSTM (bi-LSTM) neural network model or a bi-directional GRU neural network model. In embodiments where the fifth trained ML model is a bi-directional (e.g., bi-LSTM or bi-GRU) neural network (BNN) model, processing the feature vector using the fifth trained ML model comprises: estimating an Lp(a) level for the subject at each of the multiple future ages (relative to the age at which the most recent Lp(a) measurement was taken) using a forward pass of the BNN model; and estimating an Lp(a) level for the subject at each of the multiple prior ages (relative to the age at which the most recent Lp(a) measurement was taken) using a backward pass of the BNN model. In some embodiments, the bi-LSTM model may have hundreds, thousands, or tens of thousands of parameters (e.g., at least 100, at least 1000, at least 2500, at least 5000, at least 10,000, at least 25,000, at least 50,000, at least 100,000, between 100 and 1000, between 1000 and 5000, between 2500 and 25,000, between 10,000 and 100,000, between 50,000 and 250,000 or any other range within these ranges).
[0256] The fifth trained ML model for estimating Lp(a) trajectories may be trained using training data generated from patient data for patients who have had their Lp(a) assessed once or multiple times (e.g., longitudinally). In some embodiments, the training data may be obtained from UK Biobank, including approximately 460,000 participants with at least one Lp(a) value measured at enrollment, and approximately 19,000 participants who had a second Lp(a) measured at a median of 4.6 years after enrollment. In addition, laboratory measurement research data are available on approximately 1,000,000 persons from a network of Lipid Specialty Clinics around the world organized by the European Atherosclerosis Society and the International Atherosclerosis Society. Among patients in these clinics, approximately 70% have had two or more Lp(a) values measured at different ages, excluding those measurements for which the repeated measures were obtained to assess the impact of a therapeutic agent to lower Lp(a).
[0257] This network of specialty clinics is growing and organized to collaborate on lipid research questions. The measurements of Lp(a) in these lipid specialty clinic patients, in some instances but not all, were directed by a protocol to assess if and how Lp(a) varies over time. Although some persons are referred into these clinics because of elevated Lp(a) and some even receive treatments that can modestly lower Lp(a) or plasmapheresis to substantially lower Lp(a) and everything else, the vast majority of participants in these clinics just have Lp(a) measured as part of an advanced lipid and lipoprotein panel assessment. Lp(a) is unique because it is largely independently inherited and therefore not confounded by the other lipid disorders or their treatments. The fifth trained ML model for estimating Lp(a) trajectories may be trained using analogous training methods to those used to train the third and fourth ML models.
[0258] As may be appreciated from the foregoing discussion, different types of ML models may be used to estimate LDL, SBP, and Lp(a) trajectories as part of process 200. Within the class of recurrent neural networks, a variety of approaches are possible including LSTM, GRU, and ODE-RNN. Among these, any of the approaches may be used. However, in the context of predicting SBP trajectories (as well as Lp(a) trajectories), it may be of interest to detect an inflection point above which SBP (and Lp(a)) begins to rise independently over time due to a feedback cycle of increasing vascular injury. For this reason, an LSTM architecture may be preferable to the GRU architecture because the individual gates at different timepoints in an LSTM may be interrogated for possible clues about this inflection point. By contrast, the bi-directional GRU (or GRU more generally) hidden gate is much less accessible at specific time stamps. Indeed, the LSTM and GRU internal gating mechanisms and memory architectures are different, and because GRUs compress memory and gating behavior into fewer mechanisms, some of the temporal interpretability that LSTMs provide via distinct cell states (each state having input, forget, and output gates) is lost.
[0260] As between unidirectional vs. bi-directional RNN architectures, bi-directional architectures may be preferable because using a unidirectional architecture to predict LDL, SBP, or Lp(a) for only future years tends to underestimate the true biological effect of an exposure that produces irreversible structural injury that accumulates over time. That said, in some embodiments, unidirectional architectures may be used. In such embodiments, a simplifying assumption may then be made that an individual’s LDL (SBP and / or Lp(a)) followed the average trajectory for persons of the same sex up to the subject’s current age, and that the proportional difference in LDL (or SBP or Lp(a)) for this individual compared to the median for LDL (or SBP or Lp(a)) among persons of the same biological sex and age in the reference population was constant at all ages leading up to the current age. In such embodiments, the unidirectional RNN (e.g., LSTM, GRU, latent ODE GRU) would likely predict a unique future trajectory that is not the same ‘shape’ as the average trajectory of persons in the reference population. This would lead to assuming everyone has two different trajectories demarcated by the age at which they happened to input their LDL or SBP.
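The simplifying assumption described for unidirectional embodiments may be sketched as follows: the ratio of the individual’s measured level to the reference median at the same age is held constant and applied to the reference median at every earlier age. The function name and the reference median values are hypothetical and are used only to show the arithmetic.

```python
def backfill_from_reference(measured_level, measured_age, reference_median):
    """Estimate past levels by scaling a same-sex reference median curve.

    reference_median maps age -> median level in the reference population.
    The individual's proportional difference from the median at the
    measurement age is assumed constant at all earlier ages.
    """
    ratio = measured_level / reference_median[measured_age]
    return {
        age: ratio * median
        for age, median in reference_median.items()
        if age < measured_age
    }


# Hypothetical reference medians of LDL (mmol / L) at ages 38 to 40.
reference = {38: 3.0, 39: 3.1, 40: 3.2}

# Subject measured at 4.0 mmol / L at age 40 (1.25 times the median).
past_levels = backfill_from_reference(4.0, 40, reference)
```

Because the back-filled values inherit the shape of the reference curve while the future values come from the unidirectional model, the resulting trajectory has two differently shaped segments joined at the measurement age, which is the drawback noted above.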
[0261] Neural networks having a transformer architecture may also be used instead of RNNs. For example, neural networks having a temporal fusion transformer (TFT) architecture may be used. TFT neural networks are described in Bryan Lim, Sercan O. Arik, Nicolas Loeff, Tomas Pfister, “Temporal Fusion Transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, Volume 37, Issue 4, 2021, pp. 1748-1764. However, such neural networks generally require very large data sets for training and may not perform as well as the RNN architectures where few longitudinal data sets are available that have repeated longitudinal measurements of biomarkers for individual participants at regular intervals over an extended number of years.
[0262] In some embodiments, the third ML model (an LDL trajectory prediction ML model), the fourth ML model (an SBP trajectory prediction ML model), and the fifth ML model (an Lp(a) trajectory prediction ML model) may all operate on a same input vector generated at act 213. However, in other embodiments, the input vectors provided to these models need not be the same. For example, some feature values (e.g., Lp(a)) may be omitted from the inputs provided to the third ML model or the fourth ML model, but may be provided to the fifth ML model, as aspects of the technology described herein are not limited to having the trajectory prediction models operate on the same exact input vector at acts 214-216.
[0263] After acts 214-216, where LDL, SBP, and Lp(a) level trajectories are computed for the subject (e.g., using two bi-directional LSTMs, GRUs, or other algorithms), the LDL trajectory may be used to compute the cumulative LDL exposure trajectory, the SBP trajectory may be used to compute the cumulative SBP exposure trajectory, and the Lp(a) trajectory may be used to compute the cumulative Lp(a) exposure trajectory, at act 217. In addition, at act 217, the instantaneous rate of rise of the SBP levels may be determined.
[0264] In some embodiments, the cumulative LDL exposure trajectory may be determined by summing the predicted LDL level at each age of the subject (or by integrating the area under the curve of the subject’s LDL trajectory) to iteratively compute the subject’s cumulative exposure to LDL at every age (measured as Plaque Years of LDL in mmol / L, though other units may be used). An example cumulative LDL exposure trajectory, computed from the LDL levels shown in FIG. 3D, is shown in FIG. 3F.
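As a non-limiting illustration, the two options described above (a running sum of yearly levels, or integration of the area under the trajectory) might be sketched in Python as follows. The function name `cumulative_exposure`, the `method` argument, and the sample LDL values are hypothetical and are not taken from the disclosure; yearly spacing between levels is assumed.

```python
def cumulative_exposure(levels, method="sum"):
    """Return the running cumulative exposure for yearly biomarker levels.

    levels: predicted levels, one per year of age (e.g., LDL in mmol/L).
    method: "sum" adds each year's level to a running total;
            "trapezoid" integrates the area under the curve between
            consecutive ages (assumed to be one year apart).
    """
    total = 0.0
    trajectory = []
    for i, level in enumerate(levels):
        if method == "sum":
            total += level
        elif method == "trapezoid":
            if i > 0:
                total += 0.5 * (levels[i - 1] + level)
        else:
            raise ValueError("method must be 'sum' or 'trapezoid'")
        trajectory.append(total)
    return trajectory


# Hypothetical LDL levels (mmol/L) at four consecutive ages.
ldl = [2.0, 2.5, 3.0, 3.5]
by_sum = cumulative_exposure(ldl)                # [2.0, 4.5, 7.5, 11.0]
by_area = cumulative_exposure(ldl, "trapezoid")  # [0.0, 2.25, 5.0, 8.25]
```

The same routine could be applied unchanged to an SBP trajectory (yielding mmHg-years) or to an Lp(a) trajectory.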
[0266] Cumulative exposure to LDL is an important and dynamic biomarker that estimates the number of atherogenic LDL particles (and other apoB-containing lipoproteins) that have become trapped within the artery wall up to that point in time, and therefore represents the first biochemical parameter to estimate the size of a person’s accumulated plaque burden and track their rate of plaque progression. This marker is a direct estimate of the number of LDL and other atherogenic apoB-containing lipoproteins that the arterial wall has been exposed to over time and, therefore, provides an indirect estimate of the number of LDL particles that have become trapped within the artery wall over time. As a result, cumulative exposure to LDL (measured in mmol / L of total Plaque Years of exposure to LDL) can be used to estimate the size of the accumulated atherosclerotic plaque burden at any point in time, the rate of plaque progression, and / or the expected size of the accumulated plaque burden at all future time points.
[0267] In some embodiments, the values for the predicted LDL levels and corresponding cumulative exposure to LDL at all ages may be used together with the other features already computed (e.g., at acts 212 and 213) for further processing (e.g., as part of act 220).
[0268] Turning now to the cumulative SBP exposure trajectory, in some embodiments, the cumulative SBP exposure trajectory may be determined by summing the predicted SBP level at each age of the subject (or by integrating the area under the curve of the subject’s SBP trajectory) to iteratively compute the subject’s cumulative exposure to SBP at every age (measured in mmHg-years, though other units can be used). An example cumulative SBP exposure trajectory, computed from the SBP levels shown in FIG. 3G, is shown in FIG. 3I.
[0269] Cumulative exposure to SBP is another bio-engineered feature that represents a dynamic biomarker that provides unique biological information. Like elevated LDL, elevated SBP causes irreversible structural injury to the artery wall that accumulates over time. Specifically, elevated SBP causes non-laminar blood flow at vulnerable branch points within the arterial tree. The non-laminar flow, in turn, causes an increased flux of LDL and other atherogenic apoB-containing lipoproteins within the artery wall. In addition, the elevated SBP causes hypertrophy of vascular smooth muscle cells, which, in turn, causes the secretion of a greater concentration of the proteoglycans that trap LDL and other apoB-containing lipoproteins within the artery wall. This combination of increased flux of atherogenic lipoproteins into the artery wall and a greater concentration of proteoglycans that can trap the increased concentration of LDL within the artery wall results in progressively increasing focal SBP-induced plaque accumulation over time at vulnerable branch points within the arteries. As a result, SBP has a cumulative effect on the development of atherosclerotic plaque at specific vulnerable points within the vascular tree.
[0270] In addition, elevated SBP causes accumulating diffuse structural injury to the artery wall resulting in vascular stiffening and inflammation (arteriosclerosis), which reduces the capacity of the artery to tolerate the accumulated plaque burden by making it more likely that a thrombus overlying a disrupted atherosclerotic plaque will occlude the vessel at any level of accumulated plaque burden. Finally, elevated SBP also increases the risk of plaque disruption, thus increasing the risk of acute cardiovascular events. For these reasons, LDL and SBP appear to have independent, additive, causal, and cumulative effects on the risk of acute cardiovascular events. The bio-engineered cumulative exposure to SBP biomarker computed by the method described herein provides unique information by quantifying these biological effects.
[0271] In addition to determining the cumulative SBP exposure trajectory, the SBP levels predicted at act 215 may be used to predict the rate of rise in SBP for the individual person being evaluated (e.g., by estimating slope of the subject’s SBP trajectory using any suitable method).
[0272] The predicted rate of rise in SBP is another unique bio-engineered feature. It can be combined with the measured SBP at the current age to predict whether a person is likely to develop hypertension, and the age at which the person is likely to develop hypertension, based on how rapidly SBP is predicted to rise with age. This information is useful for informing selection of the optimal sequence of current and anticipated future actions needed to achieve the combined goal of preventing MI, stroke, hypertension, and / or diabetes.
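By way of a hedged sketch only, the slope estimate and the predicted age of hypertension onset might be computed from a predicted SBP trajectory as shown below. The disclosure permits any suitable slope method; the central-difference estimate, the function names, the sample trajectory, and the 140 mmHg threshold used here are all assumptions made for illustration.

```python
def rate_of_rise(ages, sbp):
    """Estimate the slope of the SBP trajectory (mmHg per year) at each age.

    Uses a central difference at interior points and a one-sided
    difference at the two ends of the trajectory.
    """
    n = len(ages)
    slopes = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slopes.append((sbp[hi] - sbp[lo]) / (ages[hi] - ages[lo]))
    return slopes


def age_at_threshold(ages, sbp, threshold=140.0):
    """Return the first age at which SBP reaches the threshold.

    The crossing age is linearly interpolated between trajectory points.
    Returns None if the trajectory never reaches the threshold.
    The 140 mmHg default is an illustrative assumption.
    """
    if sbp[0] >= threshold:
        return float(ages[0])
    for i in range(1, len(ages)):
        if sbp[i] >= threshold:
            frac = (threshold - sbp[i - 1]) / (sbp[i] - sbp[i - 1])
            return ages[i - 1] + frac * (ages[i] - ages[i - 1])
    return None


# Hypothetical predicted SBP trajectory (mmHg) at selected ages.
ages = [40, 50, 60, 70]
sbp = [120.0, 128.0, 138.0, 150.0]
slopes = rate_of_rise(ages, sbp)     # mmHg per year at each age
onset = age_at_threshold(ages, sbp)  # interpolated age of crossing
```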
[0273] In some embodiments, the values for the predicted SBP levels, rates of rise of SBP, and corresponding cumulative exposure to SBP at all ages, and positionally encoded (by age) versions thereof, may be used together with the other features already computed (e.g., at acts 212 and 213, LDL levels, and cumulative LDL levels) for further processing (e.g., as part of act 220).
[0274] In addition, as part of act 217, a cumulative Lp(a) exposure trajectory may be determined by summing the Lp(a) levels in the Lp(a) trajectory determined at act 216 (or the area under the curve for the subject’s Lp(a) trajectory may be integrated) to iteratively compute the subject’s cumulative Lp(a) exposure trajectory, which indicates for each of one or more ages of the subject (e.g., some or all of ages from 5 to 80), the subject’s cumulative exposure to Lp(a) at that age.
[0275] In some embodiments, the values for the predicted Lp(a) levels and corresponding cumulative exposure to Lp(a), and positionally encoded (by age) versions thereof, may be used together with the other features already computed (e.g., at acts 212 and 213, LDL levels, cumulative LDL exposure levels, SBP levels, rates of rise of SBP, cumulative SBP exposure levels) for further processing (e.g., as part of act 220).
[0276] After act 217, process 200 proceeds to act 218 where one or more other trajectories may be determined for the subject.
[0277] For example, in some embodiments, at act 218, a weight trajectory for the subject may be determined, the weight trajectory for the subject comprising estimated weight of the subject for each of multiple prior ages of the subject and / or multiple future ages of the subject. This estimate may be obtained in any suitable way, for example, based on the assumption that the subject’s current age-and-sex adjusted weight percentile remains constant throughout life.
[0278] As another example, in some embodiments, at act 218, a waist circumference trajectory for the subject may be determined, the waist circumference trajectory of the subject comprising the estimated waist circumference of the subject for each of multiple prior ages of the subject and / or multiple future ages of the subject. This estimate may be obtained in any suitable way, for example, based on the assumption that the subject’s current age-and-sex adjusted waist circumference remains constant throughout life.
[0279] As yet another example, in some embodiments, at act 218, an HbA1c level trajectory for the subject may be determined, the HbA1c level trajectory of the subject comprising estimated HbA1c levels of the subject for each of multiple prior ages of the subject and / or multiple future ages of the subject. This estimate may be obtained in any suitable way, for example, based on the assumption that the subject’s current age-and-sex adjusted HbA1c percentile remains constant throughout life.
[0280] An example of these trajectories for the subject (whose cardiometabolic data is shown in FIG. 3A) is shown in FIG. 3K.
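The constant-percentile assumption mentioned above can be illustrated with a short Python sketch. The reference table `REFERENCE`, its percentile grid `PCTS`, and all numeric values are invented placeholders standing in for a sex-specific reference population; linear interpolation between tabulated percentiles is likewise an assumption.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation with clamping; xs must be ascending."""
    if x <= xs[0]:
        return float(ys[0])
    if x >= xs[-1]:
        return float(ys[-1])
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


# Hypothetical reference weights (kg) for one sex at the 10th, 50th,
# and 90th percentiles, tabulated at three ages.
PCTS = [10, 50, 90]
REFERENCE = {
    30: [60.0, 75.0, 95.0],
    50: [64.0, 80.0, 102.0],
    70: [62.0, 78.0, 98.0],
}


def constant_percentile_trajectory(current_age, current_value, reference=REFERENCE):
    """Project a value to every tabulated age at the subject's current percentile."""
    # Step 1: locate the subject's percentile at the current age.
    pct = interp(current_value, reference[current_age], PCTS)
    # Step 2: read off the value at that same percentile for every age.
    return {age: interp(pct, PCTS, vals) for age, vals in sorted(reference.items())}


# A subject weighing 91 kg at age 50 sits at the 70th percentile here.
traj = constant_percentile_trajectory(50, 91.0)
```

The same approach would apply to waist circumference or HbA1c given a corresponding reference table.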
[0282] After completion of acts 214-218, a number of novel biomarkers providing an assessment of the subject’s current state of cardiometabolic health are provided. A trained machine learning model (e.g., a bi-LSTM, a bi-GRU neural network) was used to determine the size of the atherosclerotic plaque burden that has accumulated, and the rate at which the plaque burden is progressing. Another trained ML model (e.g., another bi-LSTM, bi-GRU neural network) was used to determine the rate at which SBP is rising over time, and how much structural injury caused by elevated SBP has accumulated. Yet another trained ML model (e.g., yet another bi-LSTM, bi-GRU neural network) was used to determine Lp(a) levels and cumulative Lp(a) levels. Additionally, estimates of the current weight, waist circumference, and HbA1c trajectories are obtained.
[0283] Next, at act 219, the various cardiometabolic data obtained and / or determined at acts 211-219 may be used for further processing. As described above, data may be stored, in memory (volatile or nonvolatile), using any suitable data structure or data structures, and may be subsequently accessed when performing further processing (e.g., as part of act 220 of process 200). As another example, the data may be passed onto other software modules for processing (e.g., to software modules configured to implement the function of act 220 of process 200).
[0284] It should be appreciated that the process shown in FIG. 2B is illustrative and that there are variations. For example, in the illustrated process, multiple biomarker trajectories are determined (e.g., LDL, SBP, Lp(a), cumulative LDL exposure, instantaneous rate of rise of SBP, cumulative SBP exposure, cumulative Lp(a) exposure, weight, waist circumference, and HbA1c). In other embodiments, only a subset (e.g., any subset) of these may be determined. For example, in some embodiments, the LDL and cumulative LDL exposure trajectories may be determined, but the SBP and cumulative SBP exposure trajectories may not be determined and therefore not used subsequently. As another example, in some embodiments, the LDL, cumulative LDL exposure, SBP, and cumulative SBP exposure trajectories may be determined, but the Lp(a) and cumulative Lp(a) trajectories may not be determined (or if the Lp(a) trajectories are determined, a simple approximation may be used instead of a machine learning model to do so - examples of such simple approximations are provided herein).
[0285] Determining Measure(s) of Risk That Subject Develops Cardiovascular Disease
[0286] As shown in FIG. 2A, process 200 includes act 220 in which at least some (e.g., all) of the cardiometabolic health data obtained for the subject at act 210 may be used, together with a first trained ML model (e.g., a trained survival model), to determine, for each of multiple time intervals (e.g., years), one or more measures of risk that the subject develops cardiovascular disease, thereby obtaining multiple measures of risk corresponding to the multiple time intervals.
[0287] In some embodiments, this may be performed in accordance with the illustrative process shown in FIG. 2C. As shown in FIG. 2C, act 220 may involve: (1) obtaining, at act 221, input feature data; (2) estimating, at act 222 and using the first trained ML model and the input feature data obtained at act 221, values indicative of log hazard ratios of risk of the subject having a cardiovascular event at respective levels of cumulative LDL exposure; (3) estimating, at act 223 and using the values indicative of the log hazard ratios, absolute instantaneous and cumulative hazard rates, and event rates, of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure; and (4) estimating, at act 224, using the cumulative LDL exposure trajectory for the subject, absolute instantaneous and cumulative hazard rates, and event rates, of the subject having a cardiovascular event at the respective ones of the multiple time intervals (e.g., years). Aspects of some of the acts 221-224 are now further described in greater detail.
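As a minimal sketch of the arithmetic behind acts 223 and 224, and not a statement of the actual implementation, the Python below converts log hazard ratios into absolute hazards, accumulates them across exposure intervals, and maps the result onto ages through the cumulative LDL exposure trajectory. It assumes a piecewise-constant hazard within each exposure interval and a known baseline hazard per interval; the baseline values, interval edges, and function names are hypothetical.

```python
import math


def absolute_hazards(log_hazard_ratios, baseline_hazards):
    """Absolute hazard per unit of cumulative LDL exposure in each interval.

    baseline_hazards is an assumed input: a hazard ratio is relative, so
    some reference hazard is needed to obtain an absolute rate.
    """
    return [h0 * math.exp(lhr) for lhr, h0 in zip(log_hazard_ratios, baseline_hazards)]


def cumulative_hazard_at(exposure, edges, hazards):
    """Cumulative hazard accrued up to a given cumulative LDL exposure.

    edges holds len(hazards) + 1 ascending exposure breakpoints; the
    hazard is treated as constant between consecutive breakpoints.
    """
    total = 0.0
    for k, hazard in enumerate(hazards):
        lower, upper = edges[k], edges[k + 1]
        if exposure <= lower:
            break
        total += hazard * (min(exposure, upper) - lower)
    return total


def event_probability_by_age(ages, cum_ldl, edges, hazards):
    """Cumulative event probability at each age, via the exposure trajectory."""
    return {
        age: 1.0 - math.exp(-cumulative_hazard_at(c, edges, hazards))
        for age, c in zip(ages, cum_ldl)
    }


# Hypothetical inputs: two exposure intervals (50-100 and 100-150).
edges = [50.0, 100.0, 150.0]
baseline = [0.0002, 0.0004]
log_hr = [0.0, math.log(2.0)]
hazards = absolute_hazards(log_hr, baseline)
risk = event_probability_by_age([30, 45, 60], [40.0, 100.0, 150.0], edges, hazards)
```

In this toy example the subject accrues no hazard at age 30, because the cumulative exposure of 40 lies below the first breakpoint.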
[0288] At act 221, the various cardiometabolic health data values for the subject (obtained at act 210, for example, as described with respect to FIG. 2B) may be accessed and pre-processed and / or organized, into the input feature data, for subsequent processing by the first ML model at act 222. Thus, in some embodiments, the process of FIG. 2C involves: (1) generating, at act 221, input feature data using at least some of the cardiometabolic health data for the subject; and (2) processing, at act 222, the input feature data using the first trained ML model to obtain, as output, the values indicative of the log hazard ratios for the risk of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure. In some embodiments, values indicative of the log hazard ratios may be values of the log hazard ratios themselves or other values, from which the log hazard ratios may be (e.g., uniquely) determined.
[0289] First, aspects of generating the input feature data at act 221 are described. Following this is a description of how that data is processed using the first trained ML model as well as aspects of the first ML model.
[0290] With respect to the input feature data, in some embodiments, the input feature data may be generated using at least some (e.g., all) of the feature values that are part of the subject’s cardiometabolic health data, including the subject’s characteristic and / or measurement data (e.g., including age, biological sex, family history, polygenic scores, etc.), LDL and cumulative LDL exposure trajectories, SBP and cumulative SBP exposure trajectories, Lp(a) and cumulative Lp(a) exposure trajectories, weight trajectory, waist circumference trajectory, HbA1c trajectory, and / or positionally encoded versions of one or more thereof.
[0291] For example, in some embodiments, the input feature data may comprise a set of input feature vectors (e.g., organized as a matrix or in any other suitable way), with each particular one of the input feature vectors corresponding to a particular cumulative LDL exposure level from among the respective levels of cumulative LDL exposure. In turn, the particular input feature vector in the set of input feature vectors that corresponds to the particular cumulative LDL exposure level may include: (a) subject characteristic and / or measurement data comprising: one or more values for one or more clinical characteristics of the subject, one or more values for one or more physical measurements of the subject, and / or one or more values for one or more biochemical measurements of the subject; (b) an LDL level trajectory for the subject; (c) a cumulative LDL exposure trajectory for the subject (e.g., determined from the LDL level trajectory); (d) an SBP level trajectory for the subject indicating an estimated SBP level for the subject at each of the respective levels of cumulative LDL exposure; (e) a cumulative SBP exposure trajectory for the subject indicating an estimated cumulative SBP exposure for the subject at each of the respective levels of cumulative LDL exposure; (f) an Lp(a) trajectory for the subject indicating an estimated Lp(a) level for the subject at each of the respective levels of cumulative LDL exposure; (g) a cumulative Lp(a) exposure trajectory for the subject indicating an estimated cumulative Lp(a) exposure for the subject at each of the respective levels of cumulative LDL exposure; (h) a weight trajectory for the subject indicating an estimate of the subject’s weight at each of the respective levels of cumulative LDL exposure; (i) a waist circumference trajectory for the subject indicating an estimate of the subject’s waist circumference at each of the respective levels of cumulative LDL exposure; (j) an HbA1c level trajectory for the subject indicating an estimate of the subject’s HbA1c level at each of the respective levels of cumulative LDL exposure; and / or (k) an age trajectory for the subject indicating an estimated age for the subject at each of the respective levels of cumulative LDL exposure. Additionally, the particular input feature vector may include: (l) a positionally encoded version of one or more of (a), (b), (d), (e), (f), (g), (h), (i), (j), and (k) obtained by positionally encoding the one or more of (a), (b), (d), (e), (f), (g), (h), (i), (j), and (k) by the age of the subject at which the particular cumulative LDL exposure level is predicted to occur. The subject’s cumulative LDL exposure trajectory is not positionally encoded, in some embodiments. Any combination of the foregoing features may be used to create the input feature data. Thus, a subset of the foregoing features may be used in some embodiments. For example, features (a), (b), (c), (d), and (e) may be used, along with their positional encoding, to constitute the input feature vectors. As another example, in some embodiments, all of the foregoing features, but without Lp(a) features (f) and (g) and their respective positional encodings, may be used.
[0292] As a specific illustrative example, the first trained ML model may be a survival model that is trained to use cumulative exposure to LDL as the increment of follow-up (instead of age or time) at the time of inference in order to estimate the natural log hazard ratio at each level of cumulative exposure to LDL for the subject being evaluated. (Further aspects of such models are described herein).
[0293] In this example, the first trained ML model may take as input a matrix of feature vectors, with each vector calculated for a respective level of cumulative LDL exposure. A particular feature vector for a particular level of cumulative LDL exposure may be composed of two parts: (1) a set of time-invariant or fixed inputs that do not vary over time (e.g., sex, family history, polygenic scores, etc.); and (2) a set of time-varying or changing inputs whose values do change over time - either naturally like LDL, SBP, and Lp(a) levels, or due to lifestyle choices like weight or waist circumference or tobacco smoking, or as a biological consequence of lifestyle choices, such as changes in HbA1c in response to a positive energy balance from consuming more calories than one expends.
[0294] As a result, the values of the time-invariant features included in particular feature vectors for particular levels of cumulative LDL exposure are the same across all the vectors. However, the positional encoding of these inputs (e.g., polygenic scores such as the ASCVD-PGS) does change at each level of cumulative exposure to LDL because the age at which the corresponding level of cumulative exposure occurs changes as plaque accumulates from trapping more LDL particles over time. In addition, the value of each time-varying feature and its positional encoding also changes at each level of cumulative exposure to LDL.
[0295] Thus, in this example, the input feature data may be generated (at act 221) by determining multiple feature vectors calculated for multiple respective levels of cumulative LDL exposure. To this end, first, a “mapping” function between age and cumulative LDL exposure levels may be obtained. This may be done as described with respect to FIG. 2B (e.g., by using the output of the trained ML model (e.g., bi-LSTM, bi-GRU, ODE-RNN) that calculates the LDL trajectory and, based on this output, determining a cumulative LDL exposure trajectory which provides a cumulative LDL exposure level for each age). Then, at each level of cumulative LDL exposure, the corresponding input feature vector may include:
[0296] (1) standardized clinical characteristics, physical measurements, and biochemical measurements for all time-invariant features and their positional encodings by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred (which varies for each person depending on how rapidly their cumulative exposure to LDL and corresponding plaque size are increasing and, therefore, is determined using the mapping between the subject’s age and cumulative LDL exposure trajectory);
[0298] (2) standardized LDL level at the corresponding increment of cumulative LDL exposure based on the subject’s predicted trajectory of LDL levels and the positional encoding of the LDL level by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred (in this example, the LDL levels are positionally encoded by the corresponding age at each level of cumulative exposure - but “cumulative exposure to LDL” levels are not, because this is the x-axis or survival 'time' variable in the 'survival' analysis);
[0299] (3) standardized SBP level at each increment of cumulative exposure to LDL based on the subject’s predicted trajectory of SBP levels (and the corresponding ages at which those SBP levels occur to match or map the age at which the corresponding level of cumulative exposure to LDL occurs) as well as the positional encoding of the standardized SBP levels by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred;
[0300] (4) cumulative exposure to SBP at each increment of cumulative exposure to LDL (using the corresponding ages at which those cumulative exposure to SBP levels occur to match or map the age at which the corresponding level of cumulative exposure to LDL occurs), and the positional encoding of cumulative exposure to SBP by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred, to capture the effect of the rate of rising SBP on the stability of the total accumulated plaque as estimated at each level of cumulative exposure to LDL (which turns out to be an important determinant of propensity for plaque disruption at vulnerable branch points and, as such, a powerful predictor of the instantaneous risk of having an acute cardiovascular event);
[0301] (5) standardized Lp(a) level at each increment of cumulative exposure to LDL based on the subject’s predicted trajectory of Lp(a) levels (and the corresponding ages at which those Lp(a) levels occur to match or map the age at which the corresponding level of cumulative exposure to LDL occurs) as well as the positional encoding of the standardized Lp(a) levels by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred;
[0302] (6) cumulative exposure to Lp(a) at each increment of cumulative exposure to LDL (using the corresponding ages at which those cumulative exposure to Lp(a) levels occur to match or map the age at which the corresponding level of cumulative exposure to LDL occurs), and the positional encoding of cumulative exposure to Lp(a) by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred;
[0303] (7) standardized weight at each increment of cumulative exposure to LDL based on the subject’s predicted trajectory of weight (at the corresponding ages at which those weight levels are predicted to occur to match or map the age at which the corresponding level of cumulative exposure to LDL occurs) and the positional encoding of weight by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred;
[0304] (8) standardized waist circumference at each increment of cumulative exposure to LDL based on the subject’s predicted trajectory of waist circumference (at the corresponding ages at which those waist circumference levels are predicted to occur to match or map the age at which the corresponding level of cumulative exposure to LDL occurs) and the positional encoding of waist circumference by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred;
[0305] (9) standardized HbA1c level at each increment of cumulative exposure to LDL based on the subject’s predicted trajectory of HbA1c levels (at the corresponding ages at which those HbA1c levels are predicted to occur to match or map the age at which the corresponding level of cumulative exposure to LDL occurs) and the positional encoding of HbA1c level by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred; and / or
[0306] (10) standardized age at each increment of cumulative exposure to LDL based on the age of the subject at which each increment of cumulative exposure to LDL occurs and the positional encoding of that age by the age of the subject at which the corresponding level of cumulative exposure to LDL occurred (as such, age is also being positionally encoded by age, which may help incorporate information indicating that the importance of the ‘age’ variable may not be the same at all ages).
[0307] It should be appreciated that while, in this example, each input feature vector may include each of the sets of values (1)-(10), any subset of these values may be used in other examples.
[0308] The set of input feature vectors for a subject, with each input feature vector determined for a respective level of the subject’s cumulative LDL exposure, constitutes the input feature data for the subject that may then be passed on to and be processed by the first ML model to obtain, as output, the values indicative of the log hazard ratios for the risk of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure. The set of input feature vectors may be organized into a matrix (with the vectors as rows or columns) or using any other suitable data structure(s), as aspects of the technology described herein are not limited in this respect.
[0309] As one example, an input feature vector may be determined for each level of cumulative exposure to LDL (the increment of survival), ranging from 50 to 350 Plaque Years of LDL in mmol / L. The 300 dense input vectors include the time-invariant features (e.g., biological sex, family history) and the predicted levels of time-varying exposures (e.g., SBP) at each level of cumulative exposure to LDL, together with their positional encoding by age to preserve biological context. These feature vectors may then be organized into a matrix (with the vectors as rows or columns) or using any other suitable data structure(s) and processed with a survival model (e.g., a causal survival deep neural network with piecewise exponential models, as described herein) to determine log hazard ratios of the risk that the subject has a cardiovascular event at respective levels of cumulative LDL exposure.
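The construction of one feature vector per exposure level might be sketched, under stated assumptions, as follows. The sketch inverts the age-to-cumulative-exposure mapping by linear interpolation, looks up a time-varying feature (SBP only, for brevity) at the resulting age, and appends a simple sinusoidal encoding of that age. This section does not specify the positional encoding scheme, so `age_encoding` is a placeholder; standardization is omitted, and all names and numbers are hypothetical.

```python
import math


def interp(x, xs, ys):
    """Piecewise-linear interpolation with clamping; xs must be ascending."""
    if x <= xs[0]:
        return float(ys[0])
    if x >= xs[-1]:
        return float(ys[-1])
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def age_encoding(age, period=100.0):
    """Placeholder sinusoidal encoding of age (an assumption, not the disclosed scheme)."""
    angle = 2.0 * math.pi * age / period
    return [math.sin(angle), math.cos(angle)]


def build_feature_matrix(levels, ages, cum_ldl, sbp, fixed):
    """Return one row per cumulative LDL exposure level.

    Each row holds: the time-invariant features, the SBP level and age
    at which that exposure level is reached, and the age encoding.
    """
    rows = []
    for level in levels:
        # Invert the age -> cumulative exposure mapping (monotone increasing).
        age = interp(level, cum_ldl, ages)
        sbp_at_level = interp(age, ages, sbp)
        rows.append(list(fixed) + [sbp_at_level, age] + age_encoding(age))
    return rows


# Hypothetical trajectories sampled every 20 years.
ages = [20, 40, 60, 80]
cum_ldl = [60.0, 120.0, 180.0, 240.0]  # Plaque Years of LDL (mmol/L)
sbp = [110.0, 120.0, 130.0, 140.0]     # mmHg
matrix = build_feature_matrix([90.0, 150.0], ages, cum_ldl, sbp, fixed=[1.0, 0.0])
```

Because the fixed features are copied into every row while the age changes, only the age-dependent entries differ between rows, which mirrors the distinction between time-invariant values and their age-dependent encodings.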
[0310] Having discussed aspects of how input feature data is obtained at act 221, we now turn to a discussion of how that data may be processed by the first trained ML model to determine log hazard ratios of the risk that the subject has a cardiovascular event at respective levels of cumulative LDL exposure.
[0311] First, some aspects regarding the first ML model are described. In some embodiments, the first trained ML model may be a trained survival model with cumulative LDL exposure (instead of age) as the interval of follow-up. A survival model to predict risk relative to cumulative LDL levels, as described herein, is an important innovation in the cardiovascular space. The first trained ML model may be a survival deep neural network (DNN) model with cumulative LDL exposure as the interval of follow-up. Other types of survival models may be used as well, examples of which include a Cox proportional hazard model, a random forest of survival trees model, a gradient boosted machine (GBM) for survival trees model, a survival deep neural network (DNN) model; and / or an ensemble of any of these survival models. The key is that it is cumulative LDL exposure - rather than age - that serves as the interval of follow up in any implementation, whether that implementation is a neural network survival model or another type of survival model.
[0312] Moreover, the trained survival model may be of a change-point type, allowing for independent estimates of the hazard function in different intervals of follow-up. As one example, the first ML model may be a piecewise model (PM), for example, a piecewise exponential model (PEM). For example, the first ML model may be a survival DNN model (or other type of survival model examples of which are provided above) with PEM using cumulative LDL exposure as the interval of follow-up. In some embodiments, the trained survival DNN may be a fully-connected neural network having at least 3, at least 4, at least 5, or at least six fully connected layers; the parameters (e.g., weights) of the DNN may be different for each interval of follow-up (i.e., each interval of cumulative LDL exposure). In that sense, the first ML model (e.g., the trained survival DNN with PEM model) may be considered to be a set of DNN models having a common architecture (e.g., fully connected, ReLU non-linearities, at least 3, 4, 5, layers, etc.), but each trained to make a prediction over a particular interval of cumulative LDL exposure. Thus, the DNN models in the set have identical architecture but have different weights for different intervals of follow-up (though we refer to this as a single ML model here for clarity of exposition).
[0313] In some embodiments, the DNN with PEM model may have at least 10K, at least 25K, at least 50K, at least 100K, at least 500K, or between 10K and 500K weight values for each interval of cumulative LDL exposure. In one illustrative example, the architecture of the DNN may comprise an input layer followed by 64, 128, 256, 128, 64, and 1-dimensional layers, all fully connected, with bias terms and ReLU non-linearities.
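For illustration only, the layer widths given above can be written out as a small fully connected network in plain Python. The input dimension of 20, the random initialization, the seed, and the choice to leave the final one-dimensional output linear (so that a log hazard ratio may be negative) are assumptions; the network below is untrained and is not the disclosed model.

```python
import random

# Layer widths from the illustrative architecture described above.
LAYER_SIZES = [64, 128, 256, 128, 64, 1]


def count_parameters(input_dim, sizes=LAYER_SIZES):
    """Total number of weights and biases for one interval's network."""
    total, fan_in = 0, input_dim
    for width in sizes:
        total += fan_in * width + width  # weight matrix plus bias vector
        fan_in = width
    return total


def init_network(input_dim, sizes=LAYER_SIZES, seed=0):
    """Randomly initialize (weights, biases) for each fully connected layer."""
    rng = random.Random(seed)
    layers, fan_in = [], input_dim
    for width in sizes:
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(width)]
        layers.append((weights, [0.0] * width))
        fan_in = width
    return layers


def forward(layers, x):
    """Forward pass: ReLU on hidden layers, linear final output (assumed)."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]


n_params = count_parameters(20)        # parameters for a 20-feature input
network = init_network(20)
log_hr = forward(network, [0.1] * 20)  # untrained output, for shape only
```

With a 20-dimensional input this architecture has 83,905 parameters per interval, which falls within the 10K to 500K range mentioned above. In a piecewise setting, one such network could be instantiated per interval of cumulative LDL exposure.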
[0314] As such, in the PEM approach, the first ML model may be considered to include a separate survival model estimated at each increment of cumulative exposure to LDL (e.g., at each increasing plaque year measured in mmol / L). In turn, the output of the first ML model may be a log hazard ratio for the risk of the subject having a cardiovascular event at each level of cumulative exposure to LDL, thus allowing for changes in exposures over time as plaque accumulates (i.e., time-varying inputs). Thus, in some examples, the first ML model may take in as input a set of input feature vectors (e.g., organized in a matrix) with one vector of input at each level of cumulative exposure to LDL (the piecewise interval).
[0315] It should be noted that a survival model (e.g., DNN) with PEM, and with cumulative LDL exposure as the increment of follow-up, is an important innovation and is designed to predict how much an individual person’s unique combination of exposures impacts the risk of having an acute cardiovascular event at all levels of accumulated plaque burden.
[0316] Cumulative exposure to LDL is used as the piecemeal interval of survival follow-up in this context because it provides an estimate of the size of the accumulated atherosclerotic plaque burden at any point in time, and because the size of the accumulated plaque burden, in turn, is the strongest determinant of the risk of having an acute atherosclerotic cardiovascular event.
[0317] However, the risk of having an acute cardiovascular event from a disrupted atherosclerotic plaque does not begin to increase in a measurable way until after the size of the accumulated plaque burden exceeds a specific threshold (measured in cumulative exposure to LDL). Furthermore, at all levels of plaque burden, the risk of having an acute cardiovascular event depends not only on the size of the accumulated plaque burden but also on how much other exposures combine to impact: (i) the capacity of the artery to tolerate the accumulated plaque burden; (ii) the propensity for plaque disruption within the artery; and (iii) the inherited vulnerability to trapping atherosclerotic particles.
[0318] Therefore, the atherosclerotic plaque size threshold above which cardiovascular events begin to occur, and the risk of having an acute cardiovascular event at all levels of accumulated plaque burden, vary substantially between individuals depending on their unique combination of other exposures.
[0319] This motivates using a survival model (e.g., a survival DNN) with PEM partitioned into increments of increasing cumulative exposure to LDL, measured in Plaque Years of LDL (mmol / L), which is used as an estimate of the size of the incrementally increasing plaque burden. As described herein, at each of these piecemeal increments of cumulative exposure to LDL, a fully-connected dense deep neural network (or another non-linear regression model) may be used to estimate the log hazard ratio of having a cardiovascular event at that size of accumulated plaque burden due to exposure to the combination of other features that impact the capacity of the artery to tolerate the accumulated plaque burden, conditional on surviving to that level of plaque burden without an event. The predicted log hazard ratios for the risks of having an atherosclerotic cardiovascular event at various levels of plaque burden provide a direct estimate of the biological effect of a subject’s combination of exposures on the risk that the subject has a cardiovascular event at every level of accumulated plaque burden size. In this context, the cardiovascular event may be a fatal myocardial infarction (MI), a non-fatal MI, a fatal ischemic stroke, a non-fatal ischemic stroke, or a coronary revascularization.
[0320] Additional aspects of a survival deep neural network with piecewise exponential modeling are described next. As described herein, a Survival Deep Neural Network with a Piecewise Exponential Model (Survival DNN with PEM) is a neural network designed for survival analysis, which involves predicting the time until an event occurs (in this case, an atherosclerotic cardiovascular event including MI or stroke). Unlike standard regression tasks, survival analysis accounts for censored data, where the event of interest has not occurred for some individuals by the end of the observation period.
[0321] In a PEM, the follow-up is divided into intervals, and within each interval, the hazard rate (the rate at which the event occurs during that interval) is assumed to be constant (the exponential assumption). In some embodiments, the follow-up is divided into intervals of cumulative exposure to LDL to estimate the size of the accumulated plaque burden. As such, the survival DNN model is designed to predict the log hazard ratio (log HR) during each follow-up interval, i.e., at each level of size of the accumulated plaque burden measured in Plaque Years (mmol / L) of cumulative exposure to LDL. The piecewise exponential model discretizes the time-to-event problem into intervals, allowing for flexible modeling of the hazard ratios over time, assuming only that the hazard rate is constant during each growing interval of plaque burden (interval of cumulative exposure to LDL), but may vary at different levels of plaque burden. This faithfully reflects the biology of atherosclerosis, where cardiovascular events begin to occur only after a specific accumulated plaque burden size accrues.
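As a non-limiting illustration of the piecewise-constant hazard assumption, the probability of remaining event-free through a sequence of follow-up intervals may be sketched as below. The function name and the numeric hazards and interval widths are hypothetical and are not taken from the disclosure; the zero hazards in the early intervals merely mimic the idea that events begin only after a plaque burden threshold is passed.

```python
import math


def survival_probability(hazards, widths):
    """Event-free probability under a piecewise exponential model.

    hazards[k] is the constant hazard in interval k and widths[k] is the
    width of that interval (here, an increment of cumulative LDL exposure).
    """
    if len(hazards) != len(widths):
        raise ValueError("hazards and widths must have the same length")
    cumulative_hazard = sum(h * w for h, w in zip(hazards, widths))
    return math.exp(-cumulative_hazard)


# Hypothetical values: no measurable hazard in the first two intervals.
example_hazards = [0.0, 0.0, 0.01, 0.02]
example_widths = [1.0, 1.0, 1.0, 1.0]
print(survival_probability(example_hazards, example_widths))
```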
[0322] In some embodiments, the trained survival DNN with PEM model may be trained using participant data for at least 1 million (M) or 1.5M participants enrolled in one or more prospective studies (e.g., IPD data from the UK Biobank (UKBB), the FinnGen Project, and the UK Clinical Practice Research Database (CPRD)), whereby for each of the at least 1M or 1.5M participants at least one LDL or SBP measurement (e.g., multiple LDL measurements and / or multiple SBP measurements) was available along with a recorded age or date at which a first episode of a fatal or non-fatal myocardial infarction (MI), fatal or non-fatal ischemic stroke, or coronary revascularization occurred, with censoring applied at time of last follow-up, death, or first cardiovascular event. The cardiovascular event may be a fatal myocardial infarction (MI), an episode of a non-fatal MI, a fatal ischemic stroke, an episode of a non-fatal ischemic stroke, or a coronary revascularization.
[0323] As one illustrative example, a survival DNN with PEM model was trained using individual participant data from 1,623,491 participants enrolled in one of three (3) long-term prospective biobank cohort studies for whom at least one LDL or SBP measurement was available, and for whom medical records were available recording the age (date) at which the first episode of a fatal or non-fatal MI, fatal or non-fatal ischemic stroke, or coronary revascularization occurred. Participants were censored at time of last follow-up, death, or first atherosclerotic cardiovascular event. In addition, summary data from 2.6 million participants with LDL measurements and ages (dates) of first atherosclerotic cardiovascular event were used for additional external validation experiments.
[0324] During training, the negative log-likelihood may be used as the loss function. In such an implementation, the Survival DNN with PEM predicts the log hazard ratio (log HR_i) for individual i based on their covariates (the input feature data for that individual) during each of K intervals (of cumulative exposure to LDL or, equivalently, at each measured size of the accumulated plaque burden).
[0325] In this case, the hazard rate λ_{k,i} for the i-th participant in the k-th follow-up interval is given by:
[0327] λ_{k,i} = λ_k · exp(log(HR_i)) = λ_k · HR_i
[0328] where: λ_k is the baseline hazard rate for the k-th interval, log(HR_i) is the predicted log hazard ratio for individual i, and HR_i is the hazard ratio obtained by exponentiating the predicted log hazard ratio.
[0329] Calculations in the uncensored data case (event occurs) may be performed as follows. For an individual i who experiences the event at time t_i (in interval k_i), the log-likelihood contribution is based on: (1) the predicted hazard ratio HR_i multiplied by the baseline hazard λ_{k_i}; and (2) the cumulative hazard up to that time. Accordingly, the log-likelihood for uncensored data is given by:
[0330] log L_i = log(λ_{k_i} · HR_i) - Σ_{k=1}^{k_i} λ_k · HR_i · Δt_k
[0333] where: HR_i = exp(log(HR_i)) is the predicted hazard ratio for individual i, λ_{k_i} is the baseline hazard in interval k_i, Δt_k is the length of the k-th interval, the first term represents the likelihood of the event occurring at time t_i, and the second term (the summation) represents the cumulative hazard up to t_i, derived from the hazard in each interval.
[0334] Calculations in the censored data case (event does not occur) may be performed as follows. For censored individuals, the event has not occurred, so the likelihood of surviving beyond the censoring time is calculated. The log-likelihood contribution is based on the survival probability, which is related to the cumulative hazard according to:
[0335] log L_i = - Σ_{k=1}^{k_i} λ_k · HR_i · Δt_k
[0337] where k_i here denotes the interval containing the censoring time. This expression is the log of the survival probability up to the censoring time, calculated using the cumulative hazard.
[0338] Putting the uncensored and censored components together provides the total log-likelihood for the entire dataset, which may be obtained as the sum of the log-likelihood contributions from all individuals, combining both uncensored and censored cases:
[0339] Log-Likelihood = Σ_{i=1}^{N} log L_i
[0341] where N is the number of participants. The loss function is the negative log-likelihood given by:
[0342] Loss = - Σ_{i=1}^{N} log L_i
[0344] Once the negative log-likelihood loss is computed, backpropagation may be used to calculate the gradients of the loss with respect to the (e.g., DNN) model parameters (weights, biases, etc.). These gradients are then used to update the model parameters using an optimization algorithm such as stochastic gradient descent (SGD) or Adam.
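As a non-limiting illustration, the negative log-likelihood described above may be sketched in plain Python as follows. The record layout (keys `log_hr`, `last_interval`, `event`), the zero-based interval indexing, the function names, and the numeric values are all assumptions made for this example. As in the expressions above, the full width of the final interval is used rather than the partial time spent in it, and the baseline hazard in an interval containing an event is assumed to be positive.

```python
import math


def log_likelihood(log_hr, baseline_hazards, widths, last_interval, event):
    """Log-likelihood contribution of one participant under the PEM.

    last_interval is the zero-based index of the interval in which the
    participant had the event or was censored.
    """
    hr = math.exp(log_hr)
    cumulative_hazard = sum(
        baseline_hazards[k] * hr * widths[k] for k in range(last_interval + 1)
    )
    contribution = -cumulative_hazard          # survival term (both cases)
    if event:                                  # uncensored: add log hazard
        contribution += math.log(baseline_hazards[last_interval] * hr)
    return contribution


def negative_log_likelihood(records, baseline_hazards, widths):
    """Loss = -sum_i log L_i over all participants."""
    return -sum(
        log_likelihood(r["log_hr"], baseline_hazards, widths,
                       r["last_interval"], r["event"])
        for r in records
    )


# Hypothetical baseline hazards, interval widths, and participants.
example_baseline = [0.01, 0.02]
example_widths = [1.0, 1.0]
example_records = [
    {"log_hr": 0.0, "last_interval": 1, "event": False},  # censored
    {"log_hr": 0.0, "last_interval": 0, "event": True},   # event
]
example_loss = negative_log_likelihood(
    example_records, example_baseline, example_widths
)
print(example_loss)
```

In a trained model, `log_hr` would be the network output for each participant, and an automatic differentiation framework would compute the gradients of this loss; the sketch only evaluates the loss value.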
[0345] In some embodiments, a survival deep neural network with PEM may be implemented using PyTorch together with libraries that support time-to-event data modelling, including:
[0346] 1. torch.nn - Core PyTorch library for neural network layers like nn.Linear, nn.ReLU, and other activation layers for building the model architecture.
[0347] 2. pycox - A package built on PyTorch specifically for survival analysis. It provides implementations of common survival loss functions (e.g., partial log-likelihood) and methods like PEM, enabling efficient training on time-to-event data.
[0348] 3. lifelines - A survival analysis library (not based on PyTorch) that provides some additional methods for data handling, that may be useful for preprocessing survival data.
[0349] As one example, therefore, to learn the biology of how atherosclerosis develops and how atherosclerotic clinical events occur as a complication of the disruption of the underlying plaque with the resulting formation of a thrombus to seal the disrupted plaque, a causal survival DNN with PEM uses cumulative exposure to LDL as the increment of ‘survival’ or follow-up (instead of age or time) to estimate the size of the accumulated plaque burden. At each level (piecemeal) of cumulative exposure to LDL (plaque burden), a separate fully-connected survival DNN is generated to predict the natural log hazard ratio (lnHR) of having an ASCVD event at that level of accumulated plaque burden caused by the combined effect of all other exposures at that point in time for the person under consideration (thus capturing the time-varying changes in these other exposures that impact the capacity of the artery to tolerate the accumulated plaque burden). The output of the causal survival DNN with PEM is a vector composed of the lnHR for the risk of having an ASCVD event at each level of cumulative exposure to LDL (or accumulated plaque burden) measured in Plaque Years of LDL in mmol / L caused by the combined effect of the changing levels of all other exposures for the person being evaluated.
[0350] An illustration is provided in FIG. 4F, which shows a survival DNN trained at each level of cumulative exposure to LDL and combined piecemeal to train AI to learn the biology of how atherosclerosis develops and how the risk of atherosclerotic cardiovascular events at each level of cumulative LDL exposure depends on exposure to other causes of arterial wall injury that reduce the capacity of the artery to tolerate accumulated plaque burden. It should be appreciated, however, that in other embodiments, a different model (instead of a DNN) may be used together with PEM to predict the log hazard ratio of having an ASCVD event at each level of accumulated plaque burden.
[0351] FIGs. 4A, 4B, 4C-1, 4C-2, 4D-1, 4D-2, 4E-1, and 4E-2 illustrate striking evidence justifying the use of increments of cumulative exposure to LDL as an estimate of increasing plaque burden size to train the algorithms to learn the biology of how atherosclerotic cardiovascular disease develops. In particular, this evidence may be derived from naturally randomized (Mendelian randomization) studies. As shown in FIG. 4A, using genetic variants associated with LDL (and apoB) as instrumental variables, participants randomized by nature to higher LDL have higher plasma LDL, higher cumulative exposure to LDL, and higher absolute cumulative risk of a major adverse cardiac event (MACE), such as an ASCVD event, at all ages as compared to persons randomized by nature to lower LDL.
[0352] However, when these three randomized groups are compared using cumulative exposure to LDL as the increment of survival follow-up - they have the same absolute cumulative lifetime risk of MACE (e.g., ASCVD events) at all levels of cumulative exposure to LDL regardless of the age at which the plaque size accrued, as shown in FIG. 4B. This demonstrates that the risk of atherosclerotic cardiovascular disease events is largely determined by the accumulated plaque size regardless of how fast or when that plaque size accrues - when all other exposures are the same.
[0353] However, the risk of having an ASCVD event at the same level of cumulative exposure to LDL varies based on other exposures that impact the capacity of the artery to tolerate the accumulated plaque burden - including the impact of family history - for example, inherited predisposition to trapping LDL (apoB) particles or propensity for plaque disruption, as shown in FIGs. 4C-1 and 4C-2. FIG. 4C-1 shows the cumulative risk of MACE at each age among persons randomized by nature to higher, average, or lower LDL levels by inherited risk (family history of cardiovascular disease). FIG. 4C-2 shows the cumulative lifetime risk of MACE at each level of cumulative exposure to LDL among persons naturally randomized to higher, average, or lower LDL levels by inherited risk (family history of cardiovascular disease).
[0354] The risk also varies due to other causes of endogenous injury to the artery wall - including T2D from chronically elevated HbA1c (glucose) that can cause negative remodeling leading to small-caliber, diffusely narrowed arteries, as shown in FIGs. 4D-1 and 4D-2. FIG. 4D-1 shows the cumulative lifetime risk of MACE at each age among persons randomized by nature to higher, average, or lower LDL levels by exposure to endogenous arterial injury (Type 2 Diabetes). FIG. 4D-2 shows the cumulative lifetime risk of MACE at each level of cumulative exposure to LDL among persons randomized by nature to higher, average, or lower LDL levels by exposure to endogenous arterial injury (Type 2 Diabetes).
[0355] The risk also varies due to other causes of exogenous injury to the artery wall - including tobacco smoking, which increases the risk of having an ASCVD event at all levels of cumulative exposure to LDL and corresponding size of accumulated plaque burden by increasing the probability of plaque disruption, as shown in FIGs. 4E-1 and 4E-2. FIG. 4E-1 shows cumulative lifetime risk of MACE at each age among persons randomized by nature to higher, average, or lower LDL levels by exposure to exogenous arterial injury (tobacco smoking). FIG. 4E-2 shows cumulative lifetime risk of MACE at each level of cumulative exposure to LDL among persons randomized by nature to higher, average, or lower LDL levels by exposure to exogenous arterial injury (tobacco smoking).
[0356] This evidence establishes that the survival models described herein, including the survival DNN with PEM, are “causal” models, because the justification for using cumulative exposure to LDL as the piecemeal increment of survival to learn the biology of how atherosclerosis develops is based on the randomized - and therefore ‘causal’ - evidence outlined above.
[0358] After act 222 is performed, the process of FIG. 2C proceeds to act 223, which involves estimating, using the values indicative of the log hazard ratios (obtained at act 222), absolute instantaneous rates, cumulative hazard rates, and event rates, of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure.
[0359] In some embodiments, estimating, using the values indicative of the log hazard ratios, the cumulative lifetime risks of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure comprises multiple steps, including:
[0360] (a) generating a data structure encoding a lifetable, the data structure comprising values indicating absolute instantaneous hazard rates of experiencing a cardiovascular event at the respective levels of cumulative LDL exposure, at average levels of all other exposures, in a reference population;
[0361] (b) determining the absolute instantaneous hazard rates of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure by multiplying: (i) the values, from (a), indicating absolute instantaneous hazard rates of experiencing a cardiovascular event at the respective levels of cumulative LDL exposure, at average levels of all other exposures, in the reference population; and (ii) the values indicative of the log hazard ratios for risk of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure;
[0362] (c) determining the cumulative hazard rates of the subject having a cardiovascular event at the respective levels of cumulative LDL using the absolute instantaneous hazard rates, determined at (b); and (d) determining the cumulative event rates of the subject having a cardiovascular event at the respective levels of cumulative LDL using the absolute instantaneous hazard rates, determined at (b).
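As a non-limiting illustration, steps (a) through (d) may be sketched as below. The reference-population hazards, log hazard ratios, and interval widths are hypothetical values. Two assumptions are made explicit: the reference hazard is multiplied by the exponentiated log hazard ratio (i.e., the hazard ratio), and the cumulative event rate is computed as 1 - exp(-cumulative hazard), which is one standard lifetable convention and is not mandated by the disclosure.

```python
import math


def subject_lifetable(baseline_hazards, log_hazard_ratios, widths):
    """Build per-interval rows of (hazard, cumulative hazard, event rate).

    baseline_hazards: reference-population hazards per exposure interval (a)
    log_hazard_ratios: subject-specific model outputs per interval
    widths: width of each cumulative LDL exposure interval
    """
    rows = []
    cumulative_hazard = 0.0
    for h0, log_hr, width in zip(baseline_hazards, log_hazard_ratios, widths):
        hazard = h0 * math.exp(log_hr)              # step (b)
        cumulative_hazard += hazard * width         # step (c)
        cumulative_event_rate = 1.0 - math.exp(-cumulative_hazard)  # step (d)
        rows.append((hazard, cumulative_hazard, cumulative_event_rate))
    return rows


# Hypothetical inputs: the subject's exposures double the hazard
# in the second interval relative to the reference population.
example_rows = subject_lifetable(
    baseline_hazards=[0.0, 0.01],
    log_hazard_ratios=[0.0, math.log(2.0)],
    widths=[1.0, 1.0],
)
for row in example_rows:
    print(row)
```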
[0363] Aspects of determining instantaneous hazard rates at the respective levels of cumulative LDL exposure, at average levels of all other exposures, in a reference population are described in more detail in Example 8.
[0364] These computations, in turn, allow for identifying a so-called “personal plaque burden” threshold for the subject. The personal plaque burden threshold may indicate a level of cumulative plaque burden at which cardiovascular events are predicted to begin to occur for the subject, and can be identified by using the cumulative event rates of the subject having a cardiovascular event at the respective levels of cumulative LDL, which are determined at act 223.
[0365] Relatedly, the results of the computations at act 223 may be used to identify plaque levels at which cumulative lifetime risk is below a given threshold (e.g., 10%). For example, in some embodiments, act 223 may further include identifying, for a specified level of cumulative lifetime risk, a level of cumulative plaque burden at which risk of occurrence of cardiovascular events for the subject is less than the specified level of cumulative lifetime risk.
[0366] Aspects of the computations performed at act 223 are described further below. These computations involve using the log hazard ratios determined at act 222 to estimate the absolute risk of having an acute cardiovascular event at all levels of accumulated plaque burden depending on a person’s combination of other exposures. In some embodiments, this may be accomplished as follows.
[0367] First, a lifetable may be constructed of the absolute instantaneous hazard of experiencing an atherosclerotic cardiovascular event at all levels of cumulative exposure to LDL (plaque burden) - at the average levels of all other exposures - in a reference population. These data are derived from naturally randomized experiments using genetic instrumental variable LDL scores. These studies demonstrate that persons randomized by nature to higher lifelong exposure to LDL have a higher measured LDL level, a higher corresponding cumulative exposure to LDL, and a higher absolute cumulative risk of having an atherosclerotic cardiovascular event at all ages as compared to persons randomized by nature to lower LDL. However, when using cumulative exposure to LDL as the increment of follow-up from the time of randomization, each of these groups has the same absolute cumulative risk of atherosclerotic cardiovascular events at the same cumulative exposure to LDL (accumulated plaque size) regardless of the age at which the cumulative exposure to LDL was achieved (as shown in FIGs. 4A-4B, for example). This naturally randomized biological evidence demonstrates that the absolute risk of atherosclerotic cardiovascular events is determined by the size of the accumulated plaque burden (measured in cumulative exposure to LDL), regardless of how rapidly the plaque burden is accumulating when all other exposures are equal. (Of note, the results of these naturally randomized trials also provide the motivation, evidence, and justification (as recognized by the inventor) for using cumulative exposure to LDL as the increment of follow-up in the causal survival DNN with PEM described above.)
[0368] Next, the absolute instantaneous hazard rate of having an atherosclerotic event for an individual person at each level of cumulative exposure to LDL (accumulated plaque burden) may be estimated by multiplying the instantaneous hazard rate in the reference group by the hazard ratio (i.e., the exponentiated log hazard ratio) derived from the first ML model (e.g., causal survival model with PEM, for example a DNN with PEM model), which estimates how much a person’s unique combination of exposures impacts the risk of having a cardiovascular event at each level of cumulative exposure to LDL (accumulated plaque burden).
[0369] Next, the absolute instantaneous hazard rates may be summed to compute the cumulative hazard rates and absolute cumulative event rates at each level of cumulative LDL exposure (plaque burden) for the person under evaluation, conditional on surviving without an event up to that level of plaque burden.
[0370] Next, the personal plaque threshold at which atherosclerotic cardiovascular events are predicted to occur for the person under evaluation may be determined based on the predicted absolute cumulative event rates for respective levels of cumulative LDL exposure. The level of cumulative exposure to LDL at which the size of the accumulated atherosclerotic plaque burden exceeds the threshold size at which atherosclerotic cardiovascular events begin to occur for a specific person may be considered their ‘personal plaque threshold.’ This may be visualized by plotting the predicted absolute cumulative event rates by cumulative exposure to LDL - the plot identifies the cumulative exposure to LDL corresponding to the size of the accumulated plaque burden at which atherosclerotic cardiovascular events are predicted to begin to occur for the person under study. That identified level of cumulative LDL exposure may be considered their ‘personal plaque threshold.’ It follows that this plot can also be used to identify the personal plaque threshold for any level of cumulative lifetime risk of atherosclerotic cardiovascular events by identifying the cumulative exposure to LDL and the corresponding size of the accumulated plaque burden at any specified level of cumulative lifetime risk. It should be noted that the values for the personal plaque threshold for any cumulative rate of cardiovascular events (risk) can also be obtained directly from the lifetables described herein.
[0371] Overall, the cumulative exposure to LDL corresponding to a specific cumulative lifetime risk of cardiovascular events (all these quantities are now available as a result of computations performed in acts 222 and 223) may be used as a therapeutic target to personalize the prevention of cardiovascular events by providing guidance about how much a person’s LDL level is to be reduced in order to slow their rate of plaque progression and / or to keep their cumulative exposure to LDL and the corresponding size of their accumulated plaque burden below the threshold required to achieve a desired level of cumulative lifetime risk. This, therefore, provides the basis for personalizing prevention by titrating each person’s LDL, SBP, and exposure to other modifiable causes of disease to slow the rate of disease progression enough to keep each person below their selected personal plaque threshold (and thus keep their cumulative lifetime risk of having an atherosclerotic cardiovascular event below the selected target).
[0372] An illustrative example of the results of calculations performed at act 223 is shown in FIG. 4G. As described herein, a subject’s remaining cumulative lifetime risk of cardiovascular (e.g., atherosclerotic cardiovascular) events at all levels of cumulative exposure to LDL (accumulated plaque size) caused by the combined effect of their levels of all other exposures at the time that level of cumulative exposure to LDL accrues may be estimated by: a) first constructing a lifetable of the instantaneous hazard of having a cardiovascular event at each increment of cumulative exposure to LDL derived from the naturally randomized evidence (for persons with average levels of all other exposures in the reference population); and then, b) multiplying the instantaneous hazard rate at each increment of cumulative exposure to LDL by the exponentiated form of the corresponding log hazard ratio (i.e., the hazard ratio) for a cardiovascular event caused by the subject’s combination of all other exposures at that level of cumulative exposure to LDL derived from the output of the first ML model (e.g., causal survival DNN with PEM). The cumulative hazard rates and cumulative event rates may then be calculated using standard lifetable methods and the cumulative lifetime risk can be plotted as shown in FIG. 4G (shown for a subject with and without hypertension). FIG. 4G shows the cumulative exposure to LDL and corresponding lifetime risk of MACE among persons with and without hypertension.
[0373] FIG. 4H illustrates calculating personal plaque thresholds from the LDL cumulative exposure thresholds at which a given lifetime risk of MACE occurs among persons with and without hypertension. As shown in FIG. 4H, a specific threshold for a person with or without hypertension is calculated by identifying the cumulative exposure to LDL above which lifetime risk of MACE exceeds the person’s lifetime risk goal (e.g., 10% as shown in FIG. 4H).
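As a non-limiting illustration, a personal plaque threshold for a chosen lifetime risk goal may be read off a lifetable as sketched below. The function name, the exposure levels, and the cumulative event rates are hypothetical; the sketch assumes the event rates are non-decreasing in cumulative exposure and returns the highest tabulated exposure level at which risk has not yet exceeded the goal.

```python
def personal_plaque_threshold(exposure_levels, cumulative_event_rates, risk_goal):
    """Return the highest exposure level whose cumulative risk is <= risk_goal.

    Returns None when even the first tabulated level already exceeds the goal.
    """
    threshold = None
    for level, rate in zip(exposure_levels, cumulative_event_rates):
        if rate > risk_goal:
            break                 # risk exceeds the goal above the last level kept
        threshold = level
    return threshold


# Hypothetical lifetable columns for a single subject.
example_levels = [50, 100, 150, 200]          # cumulative LDL exposure
example_rates = [0.01, 0.06, 0.12, 0.25]      # cumulative lifetime risk
example_threshold = personal_plaque_threshold(
    example_levels, example_rates, risk_goal=0.10
)
print(example_threshold)
```

A finer exposure grid, or interpolation between tabulated levels, would give a more precise threshold; the disclosure does not specify either choice.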
[0374] Returning to FIG. 2C, after act 223 is performed, the process of FIG. 2C proceeds to act 224, which involves estimating, using the cumulative LDL exposure trajectory for the subject, absolute instantaneous rates (determined at 223), cumulative hazard rates (determined at 223), and event rates (determined at 223), of the subject having a cardiovascular event at the respective ones of the multiple time intervals (e.g., years).
[0375] In some embodiments, this is performed by determining ages at which the respective levels of cumulative LDL exposure occur for the subject by using the cumulative LDL exposure trajectory for the subject (obtained, e.g., by using the first ML model for predicting the person’s trajectory of LDL levels over time). This enables understanding the cumulative lifetime risk of cardiovascular events as a function of age for the subject. For example, as shown in FIGs. 4I-1 and 4I-2, the lifetable of cumulative lifetime risk of cardiovascular events may be plotted against the subject’s age. FIG. 4I-1 shows the cumulative lifetime risk of MACE and corresponding LDL cumulative exposure thresholds for men in a reference population. FIG. 4I-2 shows the cumulative lifetime risk of MACE and corresponding LDL cumulative exposure thresholds for women in a reference population.
[0376] Aspects of mapping “risk by cumulative LDL exposure” to “risk by age” are now further described. In particular, mapping the age at which each corresponding level of cumulative exposure to LDL occurs for a subject may be performed by using the output from the LDL bi-LSTM (or any other type of model that predicts the subject’s trajectory of LDL levels over time).
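As a non-limiting illustration, the mapping from levels of cumulative LDL exposure to ages may be sketched as below. The sketch assumes, for this example only, that cumulative exposure is accumulated as a running sum of predicted yearly LDL values added to a prior exposure, and that each age marks the end of the corresponding year; the function name and all numeric values are hypothetical.

```python
from itertools import accumulate


def ages_at_exposure_levels(ages, ldl_by_year, exposure_levels,
                            initial_exposure=0.0):
    """Map each exposure level to the first age at which it is reached.

    ages[j] and ldl_by_year[j] describe one year of the predicted LDL
    trajectory. Levels never reached within the trajectory map to None.
    """
    # Running cumulative exposure at the end of each year.
    cumulative = list(
        accumulate(ldl_by_year, initial=initial_exposure)
    )[1:]
    result = {}
    for level in exposure_levels:
        result[level] = next(
            (age for age, total in zip(ages, cumulative) if total >= level),
            None,
        )
    return result


# Hypothetical predicted trajectory for one subject.
example_mapping = ages_at_exposure_levels(
    ages=[40, 41, 42, 43],
    ldl_by_year=[3.0, 3.0, 3.5, 3.5],
    exposure_levels=[125, 130, 140],
    initial_exposure=120.0,
)
print(example_mapping)
```

Joining this mapping with the lifetable indexed by cumulative exposure yields the corresponding hazards and event rates indexed by age.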
[0377] The resulting lifetable of risk by age provides the predicted absolute instantaneous hazard rates, cumulative hazard rates, and cumulative event rates at each age for the subject based on the size of the accumulated plaque burden (cumulative exposure to LDL), their rate of plaque progression, and their combination of other exposures.
[0378] The creation of a lifetable of absolute hazards by age enables: communicating risk in a clinically useful way, computing updated predictions of remaining lifetime risk over time as a person survives previous risk intervals without experiencing an event, and computing the benefit of reducing exposure to the modifiable causes of disease by magnitude, duration, and / or timing of intervention. In addition, the lifetable of absolute hazards by age allows for the cataloging and quantification of the legacy benefit derived from earlier interventions to reduce exposure to the modifiable causes of disease. This prevents forgetting of the benefit of earlier interventions and enables accurate prediction of remaining lifetime risk (of cardiovascular events) and benefit (of particular interventions or sequences of interventions), as well as guiding the appropriate selection of the optimal sequence of interventions required to achieve the desired therapeutic goals, and accounting for the accruing benefit of earlier interventions to reduce exposure to the modifiable causes of disease.
[0379] In summary, at the completion of the process shown in FIG. 2C, the results of the survival analysis (e.g., using a survival DNN with PEM) using cumulative exposure to LDL as the interval of follow-up (as described with respect to act 222), combined with the lifetable analyses informed by the results of the naturally randomized trials of cumulative exposure to LDL (as described with respect to act 223) provide the predicted risk of developing an atherosclerotic cardiovascular event based on the size of a person’s accumulated plaque burden, and their combination of other exposures that impact the capacity of the artery to tolerate the accumulated plaque burden, at all future time points up to any age of the subject (with the mapping from cumulative LDL levels to age performed as described with respect to act 224).
[0380] FIG. 4J provides an illustration of a subject’s cumulative lifetime risk of having a cardiovascular event at any point in time as a function of the size of the accumulated plaque and the combined effect of other causes of arterial wall injury that determine the capacity of the artery to tolerate the accumulated plaque burden.
[0381] In addition, in some embodiments, the predictions of the SBP level trajectory and the rate of rise of SBP (e.g., computed as described with reference to FIG. 2B) may be used to provide predictions for the risk of developing hypertension and the age at which hypertension is likely to occur.
[0382] Determining Benefit of Administering Therapeutic Intervention(s) to Reduce Risk That Subject Develops Cardiovascular Disease
[0383] As shown in FIG. 2A, process 200 includes act 230 that involves determining, using the multiple measures of risk (obtained at act 220) that the subject develops cardiovascular disease and at least one second trained ML model (e.g., at least one trained deep neural network), the benefit of administering to the subject one or more therapeutic interventions designed to reduce risk of cardiovascular disease by targeting one or more modifiable causes of the cardiovascular disease (e.g., LDL levels, SBP levels). For example, the at least one second trained ML model may include one trained ML model (e.g., a first c-DNN-ODE) to estimate the benefit of administering to the subject one or more interventions for lowering LDL and another trained ML model (e.g., a second c-DNN-ODE) to estimate the benefit of administering to the subject one or more interventions for lowering SBP. In some embodiments, the at least one second trained ML model may include yet another trained ML model (e.g., a third c-DNN-ODE) to estimate the benefit of administering to the subject an intervention for lowering both LDL and SBP.
[0384] In some embodiments, this may be performed in accordance with the illustrative process shown in FIG. 2D. As shown in FIG. 2D, act 230 may involve: (1) generating, at act 231, one or more therapeutic intervention sequences; (2) filtering, at act 232, the generated sequences using a domain expert clinical translation heuristic; and (3) determining, at act 233, the benefit of the filtered intervention sequence(s) using the at least one second trained ML model.
[0385] Before further describing each of the acts 231-233, which involve evaluating benefit of multiple sequences of interventions, we begin by describing how the benefit of targeting one or more modifiable causes of disease may be quantified and estimated. We will then return to a discussion of the acts 231-233.
[0386] Determining Benefit of Lowering LDL or SBP for Subject
[0387] In some embodiments, the benefit of targeting one or more modifiable causes of disease for a subject may be quantified and estimated. In some embodiments, the benefit of lowering LDL or SBP for a subject may be quantified and estimated, for example, by determining the expected proportional and / or absolute reduction in risk of cardiovascular events for the subject if the LDL or SBP were lowered (e.g., responsive to a particular therapeutic intervention sequence of one or more therapeutic interventions designed to target LDL and / or SBP levels).
[0388] In some embodiments, determining the expected proportional and / or absolute reduction in risk involves two distinct steps. The first step involves obtaining estimates of the magnitude of the hazard ratio per unit lower LDL, or per unit lower SBP, during each year of life (or other interval of follow-up) among participants randomized by nature to lifelong exposure to lower LDL or lower SBP, respectively. This may be performed using an innovative type of ML model described herein, which in some embodiments may be a causal DNN for ordinary differential equations (c-DNN-ODE) model. The second step involves combining the estimates provided by the c-DNN-ODE model(s) (which relate to benefits of lowering LDL and / or SBP in a population) with the predicted instantaneous hazard rates that the subject experiences a cardiovascular event in each interval of follow-up from the subject’s current age to age 80 or other upper threshold age (these hazard rates relate to the subject specifically and have been determined as already described with respect to FIG. 2C).
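By way of non-limiting illustration, the second step may be sketched as follows. This is a minimal sketch under stated assumptions, not the combination method of the disclosure: it assumes that the subject's per-interval hazards can be treated as conditional event probabilities, that the population-level log hazard ratio per unit scales linearly with the number of units lowered, and that cumulative risk is obtained by multiplying interval survival probabilities. The function names and all numbers are hypothetical.

```python
import math

def cumulative_risk(hazards):
    """Cumulative risk over all intervals, treating each hazard as the
    conditional probability of an event given survival to that interval."""
    surviving = 1.0
    for h in hazards:
        surviving *= (1.0 - h)
    return 1.0 - surviving

def treated_hazards(hazards, ln_hr_per_unit, units_lower):
    """Scale each subject-specific hazard by the population-level hazard
    ratio for the assumed reduction (units_lower) in that interval."""
    return [h * math.exp(b * units_lower)
            for h, b in zip(hazards, ln_hr_per_unit)]

# Hypothetical subject: 10% hazard in each of two intervals, and a
# hazard ratio of 0.5 per unit lower biomarker in both intervals.
base = [0.1, 0.1]
treated = treated_hazards(base, [math.log(0.5)] * 2, units_lower=1.0)

absolute_reduction = cumulative_risk(base) - cumulative_risk(treated)
proportional_reduction = absolute_reduction / cumulative_risk(base)
```

Under these assumptions, the absolute benefit depends on the subject's own baseline hazards, while the proportional benefit is driven by the population-level hazard ratio.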
[0389] With respect to the first step, in some embodiments, at least one machine learning model (e.g., the at least one second trained ML model referenced in FIGs. 2A and 2D) is trained on causal data from two sources: (1) randomized trials of LDL and SBP lowering therapies, respectively; and (2) Mendelian randomization studies with individual participant follow-up data evaluating genetic variants associated with lower LDL (apoB) or SBP, respectively. Each such ML model may be a deep neural network model and, for example, may be a causal DNN ODE (c-DNN-ODE) model as described herein. Thus, in some embodiments, the at least one second ML model may include a c-DNN-ODE model (or other type of model) for estimating the benefit of lowering LDL and another c-DNN-ODE model (or other type of model) for estimating the benefit of lowering SBP. Yet another c-DNN-ODE model (or other type of model) may be used for estimating the benefit of lowering both LDL and SBP.
[0390] In some embodiments, a c-DNN-ODE model includes a deep neural network to estimate a function describing a smooth curve that can be used to compute the instantaneous hazard ratio for a standard increment of lowering a biomarker associated with a modifiable cause of disease that causes accumulating irreversible structural injury over time, for example the instantaneous hazard ratio of lowering LDL (1 mg / dL) or lowering SBP (1 mmHg), during each interval of treatment over time (without the need to specify the closed form of the function). The DNN is further combined with an ordinary differential equation solver to estimate how much the benefit of lowering LDL or SBP changes during infinitesimally small increments of increasing duration of treatment.
[0391] In this way, the c-DNN-ODE model transforms the idea of passing data through a fixed sequence of layers in a neural network into a continuous process of transformations between layers where data evolves smoothly over time (analogous to a system governed by a differential equation), and therefore enables quantifying how much the magnitude of the instantaneous hazard ratio for an increment of lower LDL or SBP changes over time - and this dynamically changing hazard ratio provides a quantitative estimate of the expected proportional clinical benefit of sustained LDL or SBP lowering over time.
[0392] The intuition behind the c-DNN-ODE model (or other model that can be trained on similar data) is that the results of randomized trials can be reanalyzed by constructing a lifetable to recover the instantaneous hazard rates for the desired outcome event (or composite event) in either treatment arm at every interval of follow-up. The recovery of the instantaneous hazard rates permits the computation of the corresponding instantaneous hazard ratio within each increment of follow-up. In turn, combining the observed instantaneous hazard ratios during each interval of follow-up with the corresponding absolute difference in LDL or SBP between the randomized treatment arms observed during the same increment
[0393] of follow-up permits computation of the instantaneous hazard ratio per unit lower LDL, or per unit lower SBP, during each increment of follow-up during the trial.
[0394] Combining the estimated hazard ratio per unit lower LDL, or per unit lower SBP, during each increment of follow-up from multiple randomized trials in an inverse variance-weighted meta-analysis provides a robust estimate of the magnitude of the instantaneous hazard ratio per unit lower LDL, or per unit lower SBP, during each treatment interval. Moreover, plotting the magnitude of the summary hazard ratio during each interval of follow-up provides a visual and quantitative assessment of how the hazard ratio changes over time - thus providing a direct estimate of how much the benefit of lowering LDL or SBP increases over time.
[0395] Extending this same intuition to the analysis of nature’s randomized trials using individual participant data from Mendelian randomization studies evaluating genetic variants associated with lower LDL (apoB) or SBP, respectively, permits the construction of lifetables containing the instantaneous hazards and instantaneous hazard ratios during each year of life (or other interval of follow-up) among participants randomized by nature to lifelong exposure to lower LDL or lower SBP, respectively, from birth until 80 years of age. In particular,
[0396] combining the observed instantaneous hazard ratios during each interval of follow-up with the corresponding absolute difference in LDL or SBP observed between the groups randomized by nature to higher or lower LDL or SBP, respectively, during the same increment of follow-up, permits computation of the instantaneous hazard ratio per unit lower LDL, or per unit lower SBP, during each year of life. Moreover, combining the results of numerous different Mendelian randomization studies evaluating hundreds of different genetic variants associated with lower LDL or lower SBP, respectively, in an inverse variance-weighted meta-analysis provides a more robust estimate of the magnitude of the hazard ratio per unit lower LDL, or per unit lower SBP, during each year of life (or other interval of follow-up) among participants randomized by nature to lifelong exposure to lower LDL or lower SBP, respectively.
[0397] In some embodiments, the c-DNN-ODE takes as input the estimated instantaneous hazard ratios ordered by duration of follow-up from the analyses of both the randomized trials and Mendelian randomization studies; and processes these data to differentiate a continuous series of infinitesimally small increments of increasing follow-up time to provide a continuous estimate of the magnitude of the instantaneous hazard ratio for a one unit increment of lower LDL, or lower SBP, for any duration of intervention.
[0398] Finally, the time-averaged hazard ratio for a one unit lower LDL, or SBP, for all durations of sustained intervention of follow-up may be computed by iteratively calculating the weighted average of the c-DNN-ODE estimated instantaneous hazards in sequence from the start of therapy.
[0399] Accordingly, in some embodiments, the c-DNN-ODE model may be used to determine a vector of the ‘time-averaged’ log hazard ratio for a standardized decrement of LDL (1 mmol / L or other suitable unit) or SBP (10 mmHg or other suitable unit) (or both) during the corresponding duration of intervention extending from 1 month to 80 years.
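By way of non-limiting illustration, the running time-averaged log hazard ratio may be sketched as follows. This is a minimal sketch that assumes the weights are the lengths of the follow-up intervals and that the inputs are instantaneous log hazard ratios already estimated for each interval; the function name and the numbers are hypothetical and are not taken from the disclosure.

```python
def time_averaged_ln_hr(inst_ln_hr, interval_lengths):
    """Running weighted average of instantaneous log hazard ratios.

    inst_ln_hr[k] is the instantaneous log hazard ratio in interval k, and
    interval_lengths[k] is the duration of that interval (assumed weight).
    Entry k of the result is the average over intervals 0..k, i.e. from the
    start of therapy up to and including the current interval.
    """
    averaged = []
    weighted_sum = 0.0
    elapsed = 0.0
    for value, length in zip(inst_ln_hr, interval_lengths):
        weighted_sum += value * length
        elapsed += length
        averaged.append(weighted_sum / elapsed)
    return averaged

# Hypothetical instantaneous log hazard ratios that grow in magnitude
# with duration of treatment, over three one-year intervals.
vector = time_averaged_ln_hr([-0.1, -0.3, -0.5], [1.0, 1.0, 1.0])
```

Because early intervals with smaller effects remain in the average, the time-averaged value lags behind the instantaneous value whenever the benefit is still increasing.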
[0400] Aspects of the foregoing are illustrated in FIGs. 5A-5H. To begin, as shown in FIG. 5A, to estimate the benefit of reducing exposure to a modifiable cause of disease that causes irreversible structural injury that accumulates over time (such as LDL or SBP), a causal deep neural network for ordinary differential equations (c-DNN-ODE) is trained on randomized training data comprising the instantaneous hazard ratio of a standardized increment of lower LDL (or SBP) during every month of follow-up in randomized clinical trials (RCTs) (extending from 1-6 years), and every year of follow-up from Mendelian randomization studies extending from age 25-80 years. FIG. 5A illustrates these data by plotting the absolute value of the log hazard ratios against treatment duration (in years in this example). Note that despite the numerous randomized clinical trials and Mendelian randomization studies used to generate the training data, there is a gap in the data between 7 and 25 years, which influences the type of model that may be trained using such data without systematically underestimating the magnitude of the causal estimate of effect. Notably, the c-DNN-ODE method described herein is designed to avoid such systematic underestimates, whereas conventional machine learning models that are trained using regularization (e.g., L1, L2, Lasso, etc.) undesirably produce these underestimates.
[0401] FIG. 5B then shows the instantaneous log hazard ratio in each time-interval of follow-up (up through age 80) as estimated by the c-DNN-ODE model and the time-averaged log hazard ratio of all follow-up intervals up to and including the current interval. These non-linear curves show that the c-DNN-ODE “learned” the benefit of lowering LDL by magnitude and duration. Again, different c-DNN-ODE models would be used to predict benefits of lowering LDL and SBP.
[0402] FIGs. 5C-5H show empirical evidence validating this approach on randomized data by empirically demonstrating that it accurately estimates the benefit of lifelong exposure to lower LDL at every age of life in Mendelian randomization studies designed as naturally randomized trials.
[0403] FIGs. 5C-5E show empirical evidence validating the approach on data from Mendelian randomization studies.
[0404] FIG. 5C shows that the c-DNN-ODE model accurately estimates the benefit of life long exposure to lower LDL at every age of life in Mendelian randomization studies designed as naturally randomized trials. The average LDL level is shown by the solid black curve, the c-DNN-ODE estimate is shown by the dashed line and accurately tracks the lower cumulative lifetime risk line shown by the gray solid line.
[0405] FIG. 5D shows that the c-DNN-ODE model accurately estimates the benefit of life long exposure to lower SBP at every age of life in Mendelian randomization studies designed as naturally randomized trials. The average SBP level is shown by the solid black curve, the c-DNN-ODE estimate is shown by the dashed line and accurately tracks the lower cumulative lifetime risk line shown by the gray solid line.
[0406] FIG. 5E shows that the c-DNN-ODE model accurately estimates the benefit of life long exposure to combined lower LDL and SBP at every age of life in Mendelian randomization studies designed as naturally randomized trials. The average LDL and SBP level is shown by the solid black curve, the c-DNN-ODE estimate is shown by the dashed line and accurately tracks the lower cumulative lifetime risk line shown by the gray solid line.
[0407] FIGs. 5F-5H show empirical evidence validating the approach on data from randomized trials. FIG. 5F shows that the c-DNN-ODE model accurately estimates the benefit of pharmacologic short-term LDL lowering started later in life during every month of follow-up in randomized trials of LDL lowering therapies. The solid black line indicates a placebo, and the dashed line is the c-DNN-ODE estimate, which accurately tracks the LDL lowering therapy line shown by the gray solid line.
[0408] FIG. 5G shows that the c-DNN-ODE model accurately estimates the benefit of pharmacologic short-term SBP lowering started later in life during every month of follow-up in randomized trials of SBP lowering therapies. The solid black line indicates a placebo, the dashed line is the c-DNN-ODE estimate and accurately tracks the SBP lowering therapy line shown by the gray solid line.
[0409] FIG. 5H shows that the c-DNN-ODE model accurately estimates the benefit of pharmacologic short-term LDL and SBP lowering started later in life during every month of follow-up in randomized trials of combined LDL and SBP lowering therapies. The solid black line indicates a placebo, the dashed line is the c-DNN-ODE estimate and accurately tracks the LDL and SBP lowering therapy line shown by the gray solid line.
[0410] Before we proceed further to explain how the estimates provided by the c-DNN-ODE model (quantifying benefits of lowering LDL and / or SBP in a population) are combined with the predicted instantaneous hazard rates that the subject experiences a cardiovascular event in each interval of follow up, additional aspects of the c-DNN-ODE model are described.
[0411] Additional Aspects Relating to the c-DNN-ODE model
[0412] The c-DNN-ODE model is designed to find a function that describes the change in one variable (Y) over time caused by another variable (X) by modelling continuous changes in the data as it passes between layers of the neural network to capture the non-linear dynamics of how the causal effect of X on Y changes over time. In the context of the technology described herein, the biological cause and effect may be modelled using the observed log hazard ratio during every month (or other interval) of follow-up in randomized trials evaluating LDL or SBP lowering therapies, respectively, standardized for a one-unit absolute observed difference in LDL or SBP during the corresponding time interval; and the observed log HR during every year of life among participants randomized by nature to higher or lower LDL or SBP, respectively, standardized for a one-unit absolute observed difference in LDL or SBP during the corresponding time interval. Each unit lower LDL or SBP represents proportionally less incremental change from the previous time unit, and thus the log hazard ratio eventually approaches an asymptote, with the quantified benefit measured as an instantaneous log hazard ratio.
[0413] The c-DNN-ODE model may be designed to quantify how much the proportional reduction in cardiovascular events caused by lowering LDL or SBP changes over time. By using a DNN for ODE, we model both the relationship between x and y and the dynamics of the rate of change (the derivative, $\frac{dy}{dx}$). This differential information allows the network to infer a likely "true curve" representing the relationship governed by the biological impact of slowing the trajectory of atherosclerosis by reducing the number of atherogenic lipoproteins that become trapped within the artery wall over time, incorporating the inherent variability of data points.
[0414] The resultant DNN-ODE model is causal because it is trained exclusively on randomized data from: (1) randomized trials of LDL and SBP lowering therapies, and (2) Mendelian randomization studies evaluating genetic variants associated with LDL and SBP designed as naturally randomized trials. The c-DNN-ODE model not only matches the predicted values with experimental results, but also learns the trajectory of y over x, constrained by a smooth, plausible curve that reflects a governing law (biological effect). By simultaneously learning the derivative (or instantaneous slope) of the change in y at each value of x along with predictions of the output, the network constrains itself to a more consistent and smooth shape that aligns with the expected non-linear biological effect of reducing LDL or SBP on the underlying biological processes of atherosclerotic plaque progression and the propensity of the accumulated plaque to physically disrupt.
[0415] Using the ODE approach, the network is regularized by focusing not only on fitting y values, but also on matching the rate of change of y over x. This focus on differential information helps smooth out experimental variability, leading the model to converge toward the most likely true, smooth curve that represents the biological relationship. This robustness to variability enhances the model’s capacity to generalize, providing a clearer picture of the actual underlying curve. In addition, when learning a function by fitting randomized causal experimental data, using an ODE solver eliminates the need for other regularization techniques (including L1 and L2 regularization methods). It should be appreciated, however, that in other embodiments, L1 or L2 or other types of regularization methods may be used instead of or in addition to an ODE solver.
[0416] That said, one benefit of the ODE approach relative to some other types of regularization methods is that some regularization methods shrink estimates to reduce sensitivity to outliers, thereby biasing the predicted true causal effects toward the null and, therefore, systematically underestimating both the proportional and absolute magnitude of the benefit of reducing exposure to a modifiable cause of disease, which in turn may lead to systematically underestimating how much relative or proportional benefit increases over time. Thus, combining an ODE with a DNN reduces or outright eliminates the potential for learning estimates of the true causal effects that are systematically biased toward the null.
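The shrinkage toward the null referred to above can be illustrated with a deliberately simple example. The sketch below is not the c-DNN-ODE model; it assumes a one-parameter linear fit through the origin and compares the ordinary least squares slope with the closed-form L2-penalized (ridge) slope. The function names, the penalty value, and the data are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope for a line through the origin (no penalty)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def ridge_slope(xs, ys, penalty):
    """Closed-form slope minimizing squared error plus penalty * slope**2.
    The penalty enlarges the denominator, pulling the slope toward zero."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

# Hypothetical data: log hazard ratio falls by 0.2 per unit of x.
xs = [1.0, 2.0, 3.0]
ys = [-0.2, -0.4, -0.6]

unpenalized = ols_slope(xs, ys)
penalized = ridge_slope(xs, ys, penalty=14.0)
```

With these numbers the unpenalized slope recovers the effect of -0.2 per unit, whereas the penalized slope is half that magnitude, which is the kind of systematic underestimate of a true causal effect that the paragraph above describes.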
[0417] This is an especially important point because, although the training data available is derived from hundreds of randomized clinical trials and thousands of naturally randomized trials, there is little data in the gap between years 7 and 25 (e.g., as can be seen in FIGs. 5A and 5B). Thus, it is important to avoid systematically underestimating the magnitude of the causal estimate of effect by using regularization techniques common to most machine learning and deep learning methods.
[0418] As described above, in some embodiments, the c-DNN-ODE may be a DNN-ODE model trained on randomized data from: (a) randomized trials of LDL and SBP lowering therapies, and (b) Mendelian randomization studies evaluating genetic variants associated with lower LDL and SBP designed as naturally randomized trials.
[0419] In some embodiments, the c-DNN-ODE may be trained using randomized data obtained from: (a) participant data for at least 1 million (M) or 1.5M participants enrolled in one or more prospective studies, whereby for each of the at least 1M or 1.5M participants at least one LDL or SBP measurement was available along with a recorded age or date at which a first episode of a fatal or non-fatal myocardial infarction (MI), fatal or non-fatal ischemic stroke, or coronary revascularization occurred, with censoring applied at time of last follow-up, death, or first cardiovascular event; and (b) participant data from at least 250K or 500K participants enrolled in at least 25, 50, or 75 randomized trials evaluating LDL or BP lowering therapies that provided time-to-event curves, and measurements of absolute difference in LDL or SBP between randomized groups in the randomized trials.
[0420] In one illustrative example, the c-DNN-ODE may be trained using individual participant data (IPD) from 1,623,491 participants enrolled in one of three (3) long-term prospective biobank cohort studies for whom at least one LDL or SBP measurement was available, and for whom medical records were available recording the age (date) at which the first episode of a fatal or non-fatal MI, fatal or non-fatal ischemic stroke, or coronary revascularization occurred. Participants were censored at time of last follow-up, death, or first cardiovascular event. The training data also included data from 527,512 participants enrolled in 76 randomized trials evaluating LDL or BP lowering therapies that provided time-to-event curves and measurements of the absolute difference in LDL or SBP between the randomized groups. In addition, summary data from 2.6 million participants, with the age (date) at which a first atherosclerotic cardiovascular event was recorded, were used for additional external validation experiments.
[0421] In more detail, the training data may include training data derived from randomized trials and Mendelian randomization studies. In particular, for randomized trials, the training data may include individual participant data (IPD) and / or summary level data from all published randomized trials of LDL or SBP lowering therapies that met the following criteria: randomized, double blinded cardiovascular outcomes trial comparing an LDL or SBP lowering therapy, respectively, to either placebo or usual care; at least 1000 participants randomized to each treatment arm; at least 1 year of follow-up; provided data including cumulative event curves on cumulative rates of major adverse cardiovascular events; provided data on absolute magnitude of the difference in LDL or SBP between the treatment arms at more than one point in the trial. For Mendelian randomization studies, the training data may include IPD from UK Biobank and FinnGen Project - using genetic variants associated with LDL (and both directionally and proportionally concordant changes in apoB) or SBP to construct instrumental variables.
[0422] Further details regarding generating training data for training of the c-DNN-ODE, in some example implementations, are as follows. For a given randomized trial, we can use the reported cumulative event curve to recover the instantaneous hazard of having an outcome event during any interval of follow-up since randomization, conditional on surviving to that interval without having had an event, in both the treatment and control groups. Based on the number of participants surviving to any given time point without an event, we can recreate the lifetable and calculate both the instantaneous hazards and corresponding standard errors for all follow-up intervals. Comparing the instantaneous hazards during each time interval, we can obtain the instantaneous log hazard ratio (lnHR) estimating the benefit of intervention during that time interval (and corresponding standard errors) for the observed difference in absolute reduction of the biomarker targeted by the intervention during the same interval of time. By dividing the instantaneous lnHR (and its SE) by the absolute magnitude of the biomarker reduction achieved during each interval, we obtain the predicted benefit of a 1 unit (sustained) change in that biomarker during each interval.
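By way of non-limiting illustration, the lifetable reconstruction for a single trial may be sketched as follows. This is a minimal sketch under stated assumptions: the cumulative event proportions are hypothetical values read from a published curve at the end of each interval, losses to follow-up are ignored, and the standard error of the lnHR is approximated from the interval event counts as sqrt(1/events_treated + 1/events_control), which is a common approximation chosen here and is not specified in the disclosure. The function names are hypothetical.

```python
import math

def lifetable(cum_incidence, n_randomized):
    """Per-interval (hazard, events) recovered from a cumulative event curve.

    cum_incidence[k] is the cumulative proportion with an event by the end
    of interval k. The hazard is the event probability in the interval,
    conditional on being event-free at the start of the interval.
    """
    rows = []
    prev = 0.0
    for c in cum_incidence:
        at_risk = n_randomized * (1.0 - prev)
        events = n_randomized * (c - prev)
        rows.append((events / at_risk, events))
        prev = c
    return rows

def ln_hr_per_unit(cum_control, cum_treated, n_control, n_treated, biomarker_diff):
    """Per-interval (lnHR, SE), each divided by the absolute biomarker
    difference achieved between the arms during the same interval."""
    out = []
    control = lifetable(cum_control, n_control)
    treated = lifetable(cum_treated, n_treated)
    for (h_c, e_c), (h_t, e_t), diff in zip(control, treated, biomarker_diff):
        ln_hr = math.log(h_t / h_c)
        se = math.sqrt(1.0 / e_t + 1.0 / e_c)  # assumed approximation
        out.append((ln_hr / diff, se / diff))
    return out

# Hypothetical two-interval trial with 10,000 participants per arm and a
# 2-unit lower biomarker level in the treatment arm in both intervals.
per_unit = ln_hr_per_unit([0.02, 0.04], [0.01, 0.02], 10000, 10000, [2.0, 2.0])
```

Each entry of the result is the per-unit lnHR and its standard error for one follow-up interval, which is the form of data that the per-interval pooling across trials operates on.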
[0423] This process can be repeated for hundreds of LDL and SBP lowering trials. In the combined dataset, we can obtain an overall average summary estimate of the instantaneous lnHR (and corresponding standard error) for a one unit change in the targeted biomarker at each interval of follow-up by combining the lnHR (SE) for a one unit change in biomarker in all of the studies (that contributed data up to and including this interval) by calculating the inverse variance-weighted average (equivalent to a meta-analysis of each time interval).
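By way of non-limiting illustration, the per-interval pooling may be sketched as follows. This is a minimal sketch assuming a fixed-effect inverse variance-weighted average, in which each study's weight is 1/SE² and the pooled standard error is the square root of the reciprocal of the summed weights. The function names and the input values are hypothetical.

```python
import math

def inverse_variance_pool(estimates):
    """Pool (lnHR, SE) pairs from the studies contributing to one interval."""
    weights = [1.0 / (se * se) for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * ln_hr for w, (ln_hr, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

def pool_by_interval(studies):
    """studies is a list of per-study lists of (lnHR, SE), ordered by
    follow-up interval. A study contributes to interval k only if its
    follow-up extends that far."""
    n_intervals = max(len(study) for study in studies)
    summary = []
    for k in range(n_intervals):
        contributing = [study[k] for study in studies if len(study) > k]
        summary.append(inverse_variance_pool(contributing))
    return summary

# Hypothetical per-unit estimates: the second study has one interval only.
summary = pool_by_interval([
    [(-0.2, 0.1), (-0.3, 0.1)],
    [(-0.4, 0.1)],
])
```

In this example the first interval pools both studies, while the second interval carries forward the estimate from the only study that contributed data to it.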
[0424] We can extend this intuition to obtain the same information for longer durations of sustained interventions from nature’s randomized trials. This is accomplished by conducting a Mendelian randomization study designed as a longitudinal time-to-event naturally randomized trial using participant level data (or event curve data if necessary). This study design provides a lifetable of instantaneous hazards of the risk of having an outcome event during any interval of follow-up from the time of randomization (birth, or more precisely - conception).
[0425] For each interval of follow-up we can obtain the causal effect of the intervention (genetic instrument) on the risk of having an outcome event during that interval from the ratio of instantaneous hazards to produce an lnHR for that interval. From the number of participants at risk during each interval (from the lifetable), we can obtain the SE of the lnHR during each interval. By obtaining the corresponding magnitude of the absolute reduction in a biomarker during each interval caused by the genetic variant (or genetic instrumental variable composed of more than one independently inherited variant), we can obtain the lnHR for a one unit change in the biomarker due to the intervention during that interval by dividing the lnHR (and its SE) by the absolute change in the biomarker observed during that interval.
[0426] This process can be repeated for thousands of genetic variants associated with LDL (directionally consistent and proportionally consistent with changes in apoB) or SBP as instruments of both randomization and effect; and for hundreds of instrumental variable genetic LDL or SBP scores limited to independently inherited variants meeting specific inclusion criteria. In the combined dataset, we can obtain an overall average summary estimate of the instantaneous lnHR (and SE) for a one unit change in the targeted biomarker at each interval of follow-up by combining the lnHR (SE) for a one unit change in biomarker in all of the studies (that contributed data up to and including this interval) by calculating the inverse variance-weighted average (equivalent to a meta-analysis of each time interval).
[0427] The resulting final combined dataset is composed exclusively of randomized evidence estimating the magnitude of the causal effect of sustained reductions in LDL or SBP or both at all time intervals from 1 month to 80 years of follow-up from the time of randomization. Because the data is derived exclusively from randomized and therefore unbiased evidence, the included effect estimates represent unconfounded causal estimates of effect that permit the training of a ‘causal’ DNN for ODE. In other words, training a DNN for ODE using these training data is what renders the resulting model ‘causal.’ In some embodiments, the DNN for ODE model trained may have the architecture as described in “Neural Ordinary Differential Equations”, Chen, R. T. Q., Rubanova, Y., Bettencourt, J. and Duvenaud, D. K., in Advances in Neural Information Processing Systems, volume 31, 2018.
[0428] Further details regarding aspects of calculations with a c-DNN-ODE model and training c-DNN-ODE models are as follows. FIG. 10A shows a simple fully-connected two-layer deep neural network for ordinary differential equations (DNN for ODE). In a DNN for ODE, both a predicted value of y, and the derivative or instantaneous slope of the change in y at a given value of x are calculated at each node as shown in FIG. 10B.
[0430] The overall derivative $\frac{dy}{dx}$ for the change in Y for a given X can be calculated from the partial derivatives at each node using the chain rule:
[0431] [Mathematical derivative chain rule expression]
[0433] Note that the partial derivatives at each node are weighted by the same learned weights as those used to predict the value of Y at each node; and f () represents the activation function of the node output (e.g., ReLU, sigmoid, etc.), so that all derivatives are computed on the activated node output.
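By way of non-limiting illustration, the node-by-node calculation of the predicted value and its slope may be sketched as follows. This is a minimal sketch and not the network of FIGs. 10A-10B: it assumes a single input x, one hidden layer with a tanh activation (chosen here only because its derivative is simple), and a linear output node, so that $\frac{dy}{dx} = \sum_j w^{(2)}_j\, f'(z_j)\, w^{(1)}_j$ with $z_j = w^{(1)}_j x + b^{(1)}_j$. The function name, the symbols, and the weights are hypothetical.

```python
import math

def forward_with_slope(x, w1, b1, w2, b2):
    """Return (y, dy/dx) for a 1-input, 1-output network with one hidden
    tanh layer and a linear output node."""
    y = b2
    dy_dx = 0.0
    for w1_j, b1_j, w2_j in zip(w1, b1, w2):
        z = w1_j * x + b1_j
        h = math.tanh(z)               # activated node output
        dh_dx = (1.0 - h * h) * w1_j   # chain rule: f'(z) * dz/dx
        y += w2_j * h                  # w2_j weights the node's value ...
        dy_dx += w2_j * dh_dx          # ... and the same w2_j weights its slope
    return y, dy_dx

# Hypothetical weights for a two-node hidden layer.
w1, b1 = [0.5, -1.2], [0.1, 0.3]
w2, b2 = [1.5, -0.7], 0.2
y, slope = forward_with_slope(0.8, w1, b1, w2, b2)
```

The analytic slope can be checked against a finite-difference estimate of the same function, which is a convenient sanity test when implementing the derivative by hand rather than through automatic differentiation.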
[0434] One possible loss function for training a DNN-ODE model may be the sum of the squared differences between the predicted and actual values. Because the model simultaneously calculates both the predicted output y, and the derivative or the rate of change in y with respect to x, the loss function seeks to minimize the mean squared error from the absolute difference between the predicted and observed data for both of these parameters.
[0435] For example, the loss function may be calculated as follows:
[0436] • the neural network takes X as input and outputs Y and the rate of change of Y to fit a function:
[0437]
[0438] • during each epoch, a numerical integration method (like the Euler method or Runge- Kutta) is used to reconstruct the value of Y at different points along X
[0439] $y(t_1) = \text{ODE-Solver}(f, y(t_0), t_0, t_1)$
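[0440] • the predicted and observed values of Y, and the predicted and observed rate of change in Y at each value of X, are then compared to calculate the mean squared error as:
[0441] $$\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\left(y_{\text{pred},i} - y_{\text{true},i}\right)^{2} + \alpha\,\frac{1}{N}\sum_{i=1}^{N}\left(\frac{dy_{\text{pred},i}}{dx} - \frac{dy_{\text{true},i}}{dx}\right)^{2}$$
[0444] where: $y_{\text{pred}}$ and $y_{\text{true}}$ are the predicted and actual y values, $\frac{dy_{\text{pred}}}{dx}$ and $\frac{dy_{\text{true}}}{dx}$ are the predicted and actual slopes with respect to x, $N$ is the number of data points, and $\alpha$ is a weighting factor to balance the contribution of the two terms.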
[0447] • The network weights and biases are updated using backpropagation to simultaneously minimize the combined squared errors for both the predicted values of Y and the instantaneous rate of change in Y.
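By way of non-limiting illustration, the reconstruction and loss steps listed above may be sketched as follows. This is a minimal sketch under stated assumptions: forward Euler is used as the numerical integration method, the loss is taken to be the mean squared error on the values plus a weighting factor times the mean squared error on the slopes, and the slope function stands in for the output of a trained network. The function names, the parameter alpha, and the numbers are hypothetical; the weight update by backpropagation is not shown.

```python
def euler_reconstruct(slope_fn, y0, xs):
    """Rebuild y at each point in xs by forward Euler integration of
    dy/dx = slope_fn(x, y), starting from y0 at xs[0]."""
    ys = [y0]
    for x_prev, x_next in zip(xs, xs[1:]):
        ys.append(ys[-1] + (x_next - x_prev) * slope_fn(x_prev, ys[-1]))
    return ys

def combined_loss(y_pred, y_true, slope_pred, slope_true, alpha=1.0):
    """Mean squared error on values plus alpha times mean squared error
    on slopes (assumed form of the two-term loss)."""
    n = len(y_true)
    value_term = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n
    slope_term = sum((p - t) ** 2 for p, t in zip(slope_pred, slope_true)) / n
    return value_term + alpha * slope_term

# Hypothetical stand-in for the network's predicted slope: constant -0.05.
xs = [0.0, 1.0, 2.0, 3.0]
slope_pred = [-0.05 for _ in xs]
y_pred = euler_reconstruct(lambda x, y: -0.05, 0.0, xs)

# Hypothetical observed values and slopes at the same points.
y_true = [0.0, -0.06, -0.11, -0.15]
slope_true = [-0.06, -0.05, -0.04, -0.03]
loss = combined_loss(y_pred, y_true, slope_pred, slope_true, alpha=0.5)
```

A larger alpha places more emphasis on matching the observed rate of change, and a smaller alpha places more emphasis on matching the observed values.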
[0448] In some embodiments, the c-DNN-ODE may be implemented using a combination of standard PyTorch modules along with specialized libraries for solving differential equations, including: (1) torch.nn (e.g., for building the base layers of the neural network, e.g., nn.Linear, nn.Sequential, etc.); (2) torchdiffeq, an external library specifically for solving differential equations with PyTorch, which provides solvers such as torchdiffeq.odeint that can handle neural ODEs (the ODE function can be defined as a neural network and passed to odeint to integrate over time); and (3) autograd (automatic differentiation): PyTorch’s autograd functionality is essential for computing gradients in a neural ODE, as it enables the computation of the derivatives of the loss with respect to model parameters.
[0449] In some embodiments, the c-DNN-ODE model may be a fully connected deep neural network predicting Y and the instantaneous rate of change in Y. The deep neural network may have at least 3, 4, or 5 fully-connected layers. In one illustrative example, the architecture of the DNN may comprise an input layer (e.g., one dimension specifying the duration of time) followed by 8-, 16-, 8-, and 1-dimensional layers, all fully connected, with bias terms and ReLU non-linearities.
[0450] Notwithstanding the foregoing discussion of c-DNN-ODE models, it should be appreciated that, in
[0451] some embodiments, one or more other types of models may be trained on these types of training data (including data from randomized trials and Mendelian randomization studies) instead of a c-DNN-ODE model. For example, the at least one second ML model may comprise a non-linear regression model, an adaptive basis function regression model, a neural network regression model, a deep neural network regression model, a logistic regression model, a polynomial regression model, a decision tree regression model, a random forest regression model, and / or a gradient boosted decision tree regression model, with any one of these models being regularized in any suitable way. We provide a few comments regarding such alternatives next.
[0452] One example of an alternative method is a polynomial regression using the log hazard ratio lnHR as the dependent variable and increment of follow-up time as the independent variable. However, although this approach may be used, it has drawbacks that may limit its usefulness. First, selection of the order of the polynomial has to be limited to those that produce a biologically plausible curve without too many inflection points. Second, the most biologically plausible curve generally is not the one with the greatest r-squared value (i.e., it is not the one that minimizes the mean-squared error between the observed and predicted points on the curve). Selecting the correct polynomial therefore is a matter of judgement informed by domain area expertise, and generally requires an iterative process of trial and error. Naively minimizing the loss function would produce biologically implausible curves by overfitting the data (particularly when using the raw data for the lnHR at all time points in both randomized trials and Mendelian randomization studies designed as longitudinal time-to-event naturally randomized trials, rather than the inverse-variance-weighted meta-analysis or summary estimate of effect, or lnHR, at each increment of follow-up).
[0453] Alternatively, one can combine polynomial regression with other methods. For example, a gradient boosted machine can be used to predict the lnHR at each time point, but cannot make predictions for intervention follow-up times ranging from 6 or 7 years (where the data from RCTs end) to 30 years (where the naturally randomized trials begin to provide estimates of lnHR at yearly or other intervals up to age 80). As a result, gradient boosted machines and other tree-based methods cannot make estimates or create a continuous curve predicting proportional benefit (lnHR) by magnitude and duration of sustained intervention. One can attempt to solve this problem by running two gradient boosted machines: one on the trial data for intervention durations from 0-7 years, and one on the naturally randomized trials between years 30 and 80. We could then plot the two curves, and then try to connect the curves using polynomial regression or another similar method. While possible, it is difficult to have confidence in the shape of the curve estimating benefit during the missing years where data are not available (e.g., years 7-30); and the resulting curve is highly dependent on the order of the selected polynomial. And even if the order of the polynomial were selected by a statistical software package to minimize the loss or maximize fit, it may produce a biologically and clinically implausible curve. This hybrid method exacerbates the limitations of polynomial regression because the gradient boosted machines often overfit the data when modelling two separate situations (short-term trials, and long-term naturally randomized trials) - compared to using a single gradient boosted machine on the entire data set. Finally, the gradient boosted machine also uses L1 or L2 regularization to avoid this overfitting problem. However, these regularization methods are mere shrinkage functions that minimize the impact of outliers on overfitting - and systematically bias the results toward the null, thus systematically underestimating the causal magnitude of the benefit of the intervention. This is exactly what we wish to avoid. The entire point of creating a ‘causal’ model trained exclusively on randomized and therefore ‘causal’ data is to estimate the true magnitude of the ‘causal’ effects - that is, to have accurate, unbiased estimates of biological cause and effect - so that we can accurately design and predict the results of RCTs, make recommendations about the interventions needed to slow disease and prevent clinical events, and accurately predict the benefit of those interventions. Other ML and DL methods, including elastic net regression, DNN, random forests, and the like, suffer from the same drawbacks described above for either polynomial regression, gradient boosted machines, or both. In summary, though such alternatives may be employed, they have some drawbacks.
[0454] By contrast, the ‘causal’ DNN for ODE described herein (‘causal’ because it is trained exclusively on randomized and therefore ‘causal’ data) can take as input either all of the raw data, or the inverse-variance-weighted summary estimates of the lnHR at each increment of follow-up as the dependent variable and time as the independent variable (with or without other inputs - all of which should be equally distributed because we are exclusively using randomized data, and which therefore are unlikely to provide material information unless they are strong effect modifiers), and produce a smoothly differentiable curve that also spans the gap of absent data (e.g., as shown in FIGs. 5A and 5B). It does so without underestimating the causal magnitude of the benefit of the intervention, by avoiding a regularization shrinkage function and instead using differentiation as the method to minimize errors (by simultaneously minimizing the error between the predicted and observed lnHR, and the error between the predicted and observed change in lnHR from the previous increment, i.e., the differentials).
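As a non-limiting sketch of the two-part error described above, the value error and the differential error may be combined into one quantity as follows. The function name, the use of finite differences between adjacent follow-up increments, the squared-error form, and the equal weighting of the two terms are all assumptions made for illustration; the disclosure does not fix these choices.

```python
def value_and_differential_loss(times, observed_ln_hr, predicted_ln_hr):
    """Mean squared error on lnHR plus mean squared error on its
    increment-to-increment rate of change (finite differences)."""
    n = len(times)
    if n < 2 or len(observed_ln_hr) != n or len(predicted_ln_hr) != n:
        raise ValueError("need equal-length inputs with at least two points")

    # Term 1: error between predicted and observed lnHR.
    value_term = sum((p - o) ** 2
                     for p, o in zip(predicted_ln_hr, observed_ln_hr)) / n

    # Term 2: error between predicted and observed change in lnHR
    # from the previous increment of follow-up.
    diff_term = 0.0
    for i in range(1, n):
        dt = times[i] - times[i - 1]
        observed_rate = (observed_ln_hr[i] - observed_ln_hr[i - 1]) / dt
        predicted_rate = (predicted_ln_hr[i] - predicted_ln_hr[i - 1]) / dt
        diff_term += (predicted_rate - observed_rate) ** 2
    diff_term /= (n - 1)

    # Assumption: the two terms are weighted equally.
    return value_term + diff_term
```

A curve that matches every observed point but oscillates between them is penalized by the second term, which is one way to favor smooth curves without a shrinkage penalty on the parameters.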
[0455] Given the foregoing description of how estimates quantifying benefits of lowering LDL and/or SBP may be obtained by a c-DNN-ODE model or other model, we next describe how such estimates may be combined with the predicted instantaneous hazard rates that the subject experiences a cardiovascular event in each interval of follow-up.
[0456] In particular, in some embodiments, the expected benefit of lowering LDL or SBP over any time interval for the individual person under consideration may be computed as follows.
[0457] First, we access values of the predicted instantaneous hazard rates of experiencing a cardiovascular event during each interval of follow-up from the current age to age 80 years in the lifetable of risk by age constructed for that person as described herein including with reference to FIG. 2C (e.g., by using outputs from the bi-LSTM(s) and survival DNN with PEM using cumulative LDL as the increment of analysis).
[0459] These values are then multiplied by the c-DNN-ODE estimated time-averaged instantaneous log hazard ratio for a one unit lower LDL or SBP corresponding to the duration of treatment at that interval of follow-up (or age), and also adjusted for the expected absolute magnitude of the reduction in LDL or SBP in response to the recommended intervention using the Wald ratio of effect estimates method (see e.g., Example 7). Different c-DNN-ODEs would be used for LDL and SBP benefit estimates. That is, one c-DNN-ODE would be used to estimate time-averaged instantaneous log hazard ratios for a one unit lower LDL and another c-DNN-ODE would be used to estimate time-averaged instantaneous log hazard ratios for a one unit lower SBP.
[0460] Next, the estimated cumulative hazard of experiencing a cardiovascular event, and the corresponding cumulative event rate, at each age of follow-up, conditional on surviving to an interval without experiencing an event, may then be computed using standard lifetable analysis of the treatment adjusted predictions.
[0461] Finally, the expected clinical benefit of lowering LDL or SBP for the individual person under consideration is computed as follows. The predicted proportional reduction in the risk of experiencing a cardiovascular event at any duration of follow-up may be computed from the ratio of the predicted cumulative hazard at that duration of follow-up (age) adjusted for the magnitude and duration of LDL or SBP lowering, to the expected cumulative hazard without intervention to lower LDL or SBP. And the predicted absolute reduction in the risk of events may be computed as the absolute difference between the predicted absolute cumulative event rate at a specific duration of follow-up (age) adjusted for the magnitude and duration of LDL or SBP lowering, and the expected absolute cumulative event rate at the same duration of follow-up without intervention to lower LDL or SBP.
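The sequence of steps above may be sketched as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: the hazards and log hazard ratios are hypothetical numbers, the function names are illustrative, and the cumulative event rate is computed as 1 - exp(-cumulative hazard), which assumes a piecewise-constant hazard within each interval.

```python
import math

def apply_treatment(hazards, ln_hr_per_unit, unit_reduction):
    """Scale each interval hazard by exp(lnHR per unit x achieved reduction)."""
    return [h * math.exp(ln_hr * unit_reduction)
            for h, ln_hr in zip(hazards, ln_hr_per_unit)]

def cumulative_hazards(hazards):
    """Running sum of interval hazards."""
    total, out = 0.0, []
    for h in hazards:
        total += h
        out.append(total)
    return out

def cumulative_event_rates(hazards):
    """Assumption: piecewise-constant hazards within each interval."""
    return [1.0 - math.exp(-big_h) for big_h in cumulative_hazards(hazards)]

def benefit_at_end(hazards, ln_hr_per_unit, unit_reduction):
    """Return (cumulative hazard ratio, absolute risk reduction) at the
    last interval of follow-up."""
    treated = apply_treatment(hazards, ln_hr_per_unit, unit_reduction)
    ratio = cumulative_hazards(treated)[-1] / cumulative_hazards(hazards)[-1]
    absolute = (cumulative_event_rates(hazards)[-1]
                - cumulative_event_rates(treated)[-1])
    return ratio, absolute

# Hypothetical lifetable: four yearly hazards, with a time-averaged lnHR per
# unit lower LDL that grows in magnitude with duration of treatment.
untreated = [0.002, 0.003, 0.004, 0.005]
ln_hr = [-0.10, -0.15, -0.20, -0.25]
ratio, arr = benefit_at_end(untreated, ln_hr, unit_reduction=1.0)
```

The proportional reduction in risk corresponds to one minus the returned ratio, and the second returned value is the absolute reduction in the cumulative event rate.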
[0462] Figures 6A-6D illustrate these computations. For example, FIG. 6A shows the benefit of lowering LDL by 35% beginning at the subject’s current age of 40 years. To estimate the benefit of a specific LDL intervention beginning at the current age (e.g., a once-yearly intervention to lower LDL expected to achieve a 35% ‘time-averaged’ reduction in LDL) and extending until age 80 (or any other duration), the transposed lifetable of instantaneous hazards for a cardiovascular event during each year of life caused by the accumulated plaque burden and the combined effect of other causes of arterial wall injury at every point in time for the person being evaluated, is multiplied by the exponentiated form of the log hazard ratio for a 1 (one) mmol/L lower LDL, adjusted for the expected absolute magnitude of LDL reduction for a 35% proportional reduction in LDL based on the person’s baseline untreated LDL level, during each increment of treatment - where the time-averaged log hazard ratio derived from the c-DNN-ODE model increases as the duration of treatment increases. The intervention-adjusted ‘on-treatment’ instantaneous hazard rates are then combined using standard lifetable methods to calculate the predicted cumulative hazard and cumulative event rates at each age for the person being evaluated for both the treated and untreated states. The resulting cumulative event rates with and without treatment are shown in FIG. 6A.
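The adjustment from a proportional LDL reduction to a hazard multiplier may be illustrated with a short sketch. The baseline LDL of 3.5 mmol/L and the log hazard ratio of -0.25 per mmol/L are hypothetical values chosen only to make the arithmetic concrete, and the linear scaling of the log hazard ratio by the absolute reduction is an assumption of this sketch.

```python
import math

def absolute_ldl_reduction(baseline_ldl_mmol_l, proportional_reduction):
    """Absolute LDL lowering implied by a proportional reduction."""
    return baseline_ldl_mmol_l * proportional_reduction

def hazard_multiplier(ln_hr_per_mmol_l, baseline_ldl_mmol_l,
                      proportional_reduction):
    """Exponentiated log hazard ratio, scaled to the achieved reduction."""
    reduction = absolute_ldl_reduction(baseline_ldl_mmol_l,
                                       proportional_reduction)
    return math.exp(ln_hr_per_mmol_l * reduction)

# Hypothetical person: baseline LDL 3.5 mmol/L, 35% time-averaged reduction.
reduction = absolute_ldl_reduction(3.5, 0.35)   # about 1.225 mmol/L
multiplier = hazard_multiplier(-0.25, 3.5, 0.35)
```

Each instantaneous hazard in the lifetable for an on-treatment interval would then be multiplied by the multiplier appropriate to the duration of treatment at that interval.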
[0463] For comparison, the expected reduction in lifetime risk from lowering LDL using the same once-yearly treatment beginning when 10-year risk exceeds 10% (according to current guidelines) is shown in FIG. 6B.
[0464] As described herein including in the next section, such analysis may be repeated for multiple different intervention sequences - constrained by the imposed domain-specific heuristic that, once started, a therapy is continued, or increased in intensity, or another therapy is added, to enhance clinical translation. For example, FIG. 6C compares the benefit of therapeutic intervention sequences for lowering LDL that vary by the age of the subject at which the therapeutic intervention sequence commences. As another example, FIG. 6D compares the benefit of therapeutic intervention sequences that vary by the age of the subject at which they begin as well as by the magnitude of intervention.
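One simple way to enumerate candidate sequences that respect the heuristic is sketched below. The encoding of a sequence as one integer intensity level per period (with 0 meaning no therapy) is an assumption made for illustration, and the sketch covers only continuation and intensification of a single therapy, not the addition of a second therapy.

```python
from itertools import product

def valid_sequences(intensity_levels, n_periods):
    """Yield per-period intensity tuples in which, once therapy starts,
    its intensity is never reduced or stopped."""
    levels = sorted(set(intensity_levels) | {0})  # 0 = not yet started
    for candidate in product(levels, repeat=n_periods):
        never_decreases = all(later >= earlier
                              for earlier, later in zip(candidate, candidate[1:]))
        if never_decreases:
            yield candidate

# Two hypothetical intensity levels (1 and 2) over three periods.
sequences = list(valid_sequences([1, 2], 3))
```

Each enumerated sequence could then be scored with the lifetable computation described above, and the sequences compared by their predicted cumulative event rates.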
[0465] Estimating How Much Additional Cumulative Reduction in LDL, SBP, or Both is Needed to Overcome Increased Risk Caused by Amount of Inherited Lp(a)
[0466] As described herein, the causal effect of cumulative exposure to Lp(a) at all levels of cumulative exposure to LDL (accumulated plaque burden) may be incorporated into the risk assessment framework described in this disclosure. This enables obtaining a quantitatively precise estimate of how much the biological causal effect of the amount of Lp(a) inherited by each person increases the risk of cardiovascular events at every point in time.
[0467] As described above with respect to FIG. 2C, this causal effect can be obtained by including the positionally encoded (by age) predicted cumulative Lp(a) level (determined using an Lp(a) trajectory estimated as described herein) at every age in the survival model executed at every level of cumulative exposure to LDL; and then adding them together in a piecemeal manner (as with the other predicted exposure levels at each level of cumulative exposure to LDL as a representation of the accumulated plaque burden at every point in time) using a survival model (e.g., a DNN) with PEM or any other survival model described herein.
[0468] This enables determining, for a particular subject, how much their inherited Lp(a) levels increase their risk of ASCVD events at all ages, and providing the particular subject with guidance on how much additional LDL or SBP reduction over time (reductions in cumulative exposure to LDL, SBP, or both) is needed to specifically overcome the increased inherited risk caused by the amount of circulating Lp(a) that they have inherited.
[0469] This may be implemented as follows. First, the predicted cumulative event curves may be constructed with and without Lp(a). That may be done using the methods described with respect to FIG. 2C with and without Lp(a) as part of the input feature data at act 221. Next, the magnitude of additional cumulative reduction in LDL needed to overcome the increased risk of MACE caused by cumulative exposure to Lp(a) at each point in time is calculated (conditional on the age at which LDL lowering is started). In addition, the magnitude of additional cumulative reduction in SBP needed to overcome the increased risk of MACE caused by cumulative exposure to Lp(a) at each point in time is calculated (conditional on the age at which SBP lowering is started). Finally, the magnitude of additional cumulative reduction to the combination of both LDL and SBP needed to overcome the increased risk of MACE caused by cumulative exposure to Lp(a) at each point in time is calculated (conditional on the age at which LDL and SBP lowering is started).
[0471] These calculations may be performed using any of a variety of simple quantitative algorithms by summing the required reductions in cumulative exposure needed to overcome the effect of Lp(a) at each point in time, calculating the incremental reduction in cumulative exposure at each point in time relative to the previous time point, and then summing the required incremental reductions in cumulative exposure to LDL.
[0472] This is equivalent to the process of allowing a person to set a lifetime risk goal (or a level of cardiovascular health they wish to achieve - construed as the maximum cumulative lifetime risk of MACE a person is willing to tolerate), and then solving for the magnitude of cumulative reductions in LDL, SBP, or both needed to achieve this goal.
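The incremental step of this calculation may be sketched as follows. The input is assumed to be a list of required cumulative reductions in exposure, one per time point and in arbitrary illustrative units; how those required reductions are themselves obtained is not shown here.

```python
def incremental_reductions(required_cumulative):
    """Incremental reduction at each time point relative to the previous
    time point (the first point is taken relative to zero)."""
    previous = 0.0
    increments = []
    for value in required_cumulative:
        increments.append(value - previous)
        previous = value
    return increments

# Hypothetical required cumulative reductions at three successive ages.
required = [1.0, 2.5, 4.5]
steps = incremental_reductions(required)
total = sum(steps)
```

Summing the increments recovers the required cumulative reduction at the final time point, while the individual increments indicate how much additional lowering is needed in each interval.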
[0473] Here the goal is equivalent to the difference in a person’s predicted cumulative lifetime risk of MACE when including Lp(a) in the calculations to estimate remaining lifetime risk, and when excluding Lp(a) as an input when calculating the estimated remaining cumulative lifetime risk for that person as follows: the cumulative event curve plots (and lifetable estimates) are compared at each age for: a) cumulative lifetime risk of heart attack or stroke without considering Lp(a) levels; b) cumulative lifetime risk of heart attack or stroke now including cumulative Lp(a) levels; and c) cumulative lifetime risk of heart attack or stroke both considering cumulative Lp(a) levels at each increment of time and the required reductions in cumulative exposure to LDL, SBP, or both required to overcome the increased risk caused by Lp(a).
[0474] As may be appreciated from the foregoing, in one example implementation, this functionality may be achieved as follows.
[0475] First, the measured value of Lp(a) is used as an input. The measured Lp(a) level is standardized in the same way as all other inputs (e.g., min-max), noting that Lp(a) levels are very asymmetrically distributed in the population and can vary by 1000-fold between individuals. Then, the standardized Lp(a) levels are positionally encoded by age and added to the vector of inputs. That vector of inputs may be processed, in parallel, by three machine learning models (examples of which are provided herein) to estimate cumulative exposures to LDL, SBP, and Lp(a), respectively, as described herein (e.g., with reference to acts 214-216 of FIG. 2B).
[0476] Next, the vector, now including the standardized estimate of Lp(a) levels at all ages, the corresponding positionally encoded predicted Lp(a) level at all ages, and the standardized and positionally encoded predicted cumulative exposure to Lp(a) at all ages, is then passed into the remaining stack of algorithms - where they contribute their time-dependent biological cause and effect to the estimated cumulative lifetime risk of MACE at all ages; and the corresponding predicted benefit of reducing LDL, SBP, or both beginning at all ages by magnitude, duration and timing of lowering LDL, SBP, or both.
[0477] Finally, after all current outputs, the user of the platform is given the option to view estimates of how much their inherited level of Lp(a) increases their remaining lifetime risk of MACE at all ages; and how much additional reduction in LDL, SBP, or both is required to specifically overcome the increased risk caused by their inherited burden of Lp(a). This information translates Lp(a) levels into clinically useful information that can be immediately used to guide the timing and intensity of lowering LDL and SBP to further personalize the prevention of ASCVD events.
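The standardization and age encoding of an input such as Lp(a) may be sketched as follows. The disclosure does not specify the form of the positional encoding in this passage, so the sinusoidal encoding, the additive combination with the standardized value, the encoding dimension, and the population minimum and maximum used here are all assumptions made for illustration.

```python
import math

def min_max(value, lower, upper):
    """Rescale a value to [0, 1] given population bounds."""
    return (value - lower) / (upper - lower)

def age_positional_encoding(age, dim=4, max_period=10000.0):
    """Assumed sinusoidal encoding of age, alternating sine and cosine."""
    encoding = []
    for k in range(dim // 2):
        frequency = 1.0 / (max_period ** (2 * k / dim))
        encoding.append(math.sin(age * frequency))
        encoding.append(math.cos(age * frequency))
    return encoding

def encode_by_age(standardized_value, age, dim=4):
    """Assumption: the standardized value is added to each encoding entry."""
    return [standardized_value + p for p in age_positional_encoding(age, dim)]

# Hypothetical measurement: Lp(a) of 150 nmol/L with assumed bounds of 0-300.
lpa_standardized = min_max(150.0, 0.0, 300.0)
lpa_at_age_40 = encode_by_age(lpa_standardized, age=40)
```

Because Lp(a) is strongly right-skewed, an implementation might instead standardize a log-transformed value; that variant is likewise only a suggestion.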
[0478] Additional Implementation Detail
[0479] In some embodiments, the translation of Lp(a) to ASCVD may be performed as follows.
[0480] First, for Lp(a), we may estimate remaining lifetime risk under two scenarios: including and not including a person’s inherited risk due to Lp(a). ‘Exposure to Lp(a)’ and ‘inherited Lp(a)’ are referred to herein interchangeably because - unlike other causes of ASCVD such as lipoproteins like LDL, or other exposures like SBP - Lp(a) levels are not changed by diet, exercise, or other lifestyle choices. Instead, the magnitude of exposure to Lp(a) is largely determined by each person’s genetics, which explains why Lp(a) levels remain relatively constant for most of life, unlike LDL or SBP, which have characteristic trajectories over time. As a result, our exposure to Lp(a) is determined almost entirely by how much Lp(a) we inherit. This gives us two estimates of remaining lifetime risk of MACE: (1) including the causal effect of circulating Lp(a) over time (i.e., cumulative exposure to Lp(a)); and (2) not including the causal effect of circulating Lp(a).
[0481] The difference between these two estimates of cumulative remaining lifetime risk of MACE provides an estimate of the magnitude of the effect caused by how much Lp(a) a person inherits (i.e., their cumulative exposure to Lp(a)) at all time points on: (i) the instantaneous hazard of MACE at any age or time interval; and (ii) the cumulative hazard of MACE until age 80 (or other arbitrarily selected age limit).
[0482] This estimated magnitude of increased risk caused by Lp(a) can be expressed as a proportional risk (hazard ratio), or an absolute increase in risk (difference in absolute instantaneous or cumulative hazard during any time interval or up to any duration of follow-up). Note that the causal effect of Lp(a) is generally restricted to how much it increases the risk of MACE because, unlike LDL or SBP, it is not normally distributed with some people having higher or lower levels than ‘average’. Instead, Lp(a) has an extreme rightward-skewed distribution that can vary by 1000-fold among individuals. The population median (not mean, because of the extremely skewed distribution) is very low, around 15-20 nmol/L. Even for persons in the lowest 10% who have Lp(a) of 3-5 nmol/L (about 1/3 to 1/4 of the population median), the absolute difference is small - only 12-15 nmol/L. Because very large absolute differences in Lp(a) of 100-300 nmol/L are required to produce a meaningful and therefore measurable causal effect on the risk of MACE, these very small absolute differences are too small to have any material impact on risk.
[0483] Therefore, only those persons with Lp(a) levels elevated above the median have an increased risk of MACE caused by Lp(a); and the magnitude of that causal effect is proportional to the magnitude of the absolute increase in circulating Lp(a), thus explaining why only a small proportion of the population has a substantially increased lifetime risk of MACE caused by their inherited cumulative burden of circulating Lp(a).
[0484] The absolute difference in remaining cumulative lifetime risk of MACE when considering Lp(a) and not considering Lp(a) thereby provides an estimate of the increased risk of MACE that must be overcome to eliminate the increased risk specifically caused by a person’s exposure to Lp(a) over time.
[0485] With this formulation, the problem is very similar to the problem of predicting how much we need to lower LDL, SBP, or both to reduce a person’s cumulative remaining lifetime risk of MACE to achieve a specific target or goal (e.g., keeping a person’s cumulative lifetime risk of MACE less than 5% at all ages up to age 80). This formulation allows us to translate a person’s Lp(a) level into clinically actionable information by estimating how much we need to lower LDL, SBP, or both - conditional on when we start to lower LDL or SBP - to specifically overcome the increased risk caused by how much Lp(a) they inherited (their exposure to Lp(a) over time throughout life).
[0486] In some embodiments, this may be accomplished as follows. First we construct a lifetable estimating the instantaneous hazard for MACE at all ages for a person using all exposures including Lp(a). Next, we construct a second column in the lifetable that estimates the instantaneous hazards at all ages for the same person including all exposures except Lp(a).
[0487] Then, we sum the instantaneous hazards to obtain two estimates of cumulative remaining lifetime risk for this person: (i) one including the causal effect of Lp(a); and (ii) one that does not include the causal effect of Lp(a). The difference in cumulative remaining lifetime risk between these two estimates simply becomes a sub-goal, which is defined as: the remaining cumulative lifetime risk goal needed to overcome the causal effect of the amount of Lp(a) that a person has inherited.
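The two-column lifetable and the resulting sub-goal may be sketched as follows. The hazard values are hypothetical, the function names are illustrative, and the conversion from summed hazards to cumulative risk via 1 - exp(-cumulative hazard) is an assumption of this sketch.

```python
import math

def cumulative_risk(hazards):
    """Cumulative risk from summed interval hazards (assumed conversion)."""
    return 1.0 - math.exp(-sum(hazards))

def lpa_sub_goal(hazards_with_lpa, hazards_without_lpa):
    """Excess cumulative remaining lifetime risk attributable to Lp(a)."""
    return cumulative_risk(hazards_with_lpa) - cumulative_risk(hazards_without_lpa)

# Hypothetical lifetable columns for the same person (one hazard per interval).
with_lpa = [0.004, 0.005, 0.006, 0.007]
without_lpa = [0.003, 0.004, 0.005, 0.006]
sub_goal = lpa_sub_goal(with_lpa, without_lpa)
```

The returned value is the absolute reduction in cumulative risk that additional lowering of LDL, SBP, or both would need to deliver.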
[0489] In turn, we now simply iterate through the solution space, but here specifically to find the amount by which we must reduce LDL, SBP, or both to achieve this specific proportional and absolute reduction in risk. The magnitude of that reduction will depend on when we start lowering LDL, SBP, or both, because the proportional risk reduction (and therefore the corresponding absolute reduction in instantaneous and cumulative hazards of MACE) depends on the magnitude and duration of lowering LDL, SBP, or both; and because the corresponding absolute residual risk depends on how much atherosclerosis and arterial wall injury has already accumulated before we start to lower LDL and SBP (and on any legacy benefits from earlier interventions to lower LDL and SBP). Effectively, we want to calculate how much we need to lower LDL, SBP, or both so that a person’s two estimates of remaining cumulative lifetime risk of MACE are the same: (i) cumulative lifetime risk of MACE for this person when considering the causal effect of their inherited burden of Lp(a) conditional on the selected magnitude and timing of lowering LDL, SBP, or both and continuing that intervention until age 80 years (or other arbitrary age limit); and (ii) cumulative lifetime risk of MACE for this person when considering all the same other exposures EXCEPT Lp(a).
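One way to search the solution space is a bisection over the magnitude of lowering, sketched below for a single exposure. This is a deliberately simplified illustration: the function names are hypothetical, the log hazard ratio per unit of lowering is held constant rather than growing with treatment duration as the c-DNN-ODE estimates would, and the upper search bound is an arbitrary assumption.

```python
import math

def cumulative_risk(hazards):
    """Cumulative risk from summed interval hazards (assumed conversion)."""
    return 1.0 - math.exp(-sum(hazards))

def required_reduction(hazards_with_lpa, hazards_without_lpa,
                       ln_hr_per_unit, start_index,
                       upper_bound=10.0, tolerance=1e-7):
    """Smallest lowering (in exposure units), started at start_index, that
    brings the with-Lp(a) cumulative risk down to the without-Lp(a) risk.
    Returns None if the target is not reachable within upper_bound."""
    target = cumulative_risk(hazards_without_lpa)

    def treated_risk(reduction):
        scaled = [h * math.exp(ln_hr_per_unit * reduction) if i >= start_index else h
                  for i, h in enumerate(hazards_with_lpa)]
        return cumulative_risk(scaled)

    if treated_risk(0.0) <= target:
        return 0.0
    if treated_risk(upper_bound) > target:
        return None
    low, high = 0.0, upper_bound
    while high - low > tolerance:
        middle = (low + high) / 2.0
        if treated_risk(middle) > target:
            low = middle
        else:
            high = middle
    return high

# Hypothetical ten-interval lifetable; lowering starts now versus five intervals later.
with_lpa = [0.004] * 10
without_lpa = [0.003] * 10
start_now = required_reduction(with_lpa, without_lpa, -0.5, start_index=0)
start_later = required_reduction(with_lpa, without_lpa, -0.5, start_index=5)
```

In this toy example, delaying the start requires a larger reduction to reach the same target, which mirrors the dependence on timing described above.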
[0490] This information translates a person’s Lp(a) into immediately actionable clinical information even before we have effective therapies to specifically and potently lower Lp(a) by telling a person and their physician how much they need to lower LDL, SBP, or both beginning now (or any future age) to specifically overcome the increased inherited risk of MACE caused by how much Lp(a) they inherited.
[0491] Estimating How Much Additional Cumulative Reduction in LDL, SBP, or Both is Needed to Overcome Increased Risk Caused by Person’s Inherited Genetic Risk of ASCVD
[0492] As described herein, a person’s inherited risk of ASCVD may be represented, albeit crudely, by the ASCVD polygenic score (ASCVD-PGS). The ASCVD polygenic score is a fixed exposure that is immutable and so it is not meaningful to estimate the benefit of reducing a polygenic score. On the other hand, it is possible to translate a polygenic score into clinically useful information that can provide additional information to refine and further guide clinical care individualized to each person.
[0493] To this end, a PGS may be translated into clinically useful information by estimating how much additional cumulative reductions in LDL, SBP, or both are needed to overcome the excess risk caused by a person’s inherited genetic risk of ASCVD as represented by the ASCVD-PGS. The magnitude of cumulative reductions in LDL, SBP, or both needed to overcome polygenic predisposition will depend on the magnitude, timing, and duration of therapy lowering LDL, SBP, or both - conditional on when the therapy was started.
[0494] By incorporating the (non-causal) effect of inherited polygenic risk at all levels of cumulative exposure to LDL (accumulated plaque burden), we can obtain a quantitatively precise estimate of how much the polygenic predisposition to ASCVD inherited by each person increases the risk of cardiovascular events at every point in time. This non-causal effect may be obtained by simply including the ASCVD-PGS at every age in the survival model executed at every level of cumulative exposure to LDL; and then adding them together in a piecemeal manner (as with the other predicted exposure levels at each level of cumulative exposure to LDL as a representation of the accumulated plaque burden at every point in time).
[0495] Here, we positionally encode the ASCVD-PGS by age because we are agnostic to whether its biological impact varies with age. The revised predicted cumulative event curves with and without considering ASCVD-PGS are then constructed using the techniques described with reference to FIG. 2C. Next, the magnitude of additional cumulative reduction in LDL, SBP, or both needed to overcome the increased risk of MACE associated with (but not necessarily caused by) the ASCVD-PGS at each point in time is calculated (conditional on the age at which LDL lowering, SBP lowering, or both is started).
[0496] This calculation may be performed using any of a variety of simple quantitative algorithms by summing the required reductions in cumulative exposure needed to overcome the effect of inherited predisposition as represented in one axis by the ASCVD-PGS at each point in time, calculating the incremental reduction in cumulative exposure at each point in time relative to the previous time point, and then summing the required incremental reductions in cumulative exposure to LDL, SBP, or both.
[0497] The additional reduction in cumulative exposure to LDL, SBP, or both needed to overcome a person’s polygenic predisposition is calculated using the same methods as described in the foregoing section for overcoming the increased inherited risk due to Lp(a).
[0498] Finally, the cumulative event curve plots (and lifetable estimates) are compared at each age for: a) cumulative lifetime risk of heart attack or stroke without considering ASCVD-PGS;
[0499] b) cumulative lifetime risk of heart attack or stroke now considering ASCVD-PGS; and
[0500] c) cumulative lifetime risk of heart attack or stroke both considering ASCVD-PGS at each increment of time and the required reductions in cumulative exposure to LDL, SBP, or both required to overcome the increased risk due to a person’s inherited genetic predisposition as represented by their unique combination of genotypes at the variants included in the ASCVD-PGS.
[0501] As may be appreciated from the foregoing, in one example implementation, this functionality may be achieved as follows. First, the calculated or provided ASCVD-PGS, and / or other PGS(s), is used as an input. Next, the ASCVD-PGS is standardized in the same way as all other inputs (e.g., min-max); noting that ASCVD-PGS values are already standardized to have a mean of zero (0) and a SD of 1; but are now converted to the same scale as all other inputs. The ASCVD-PGS value is then positionally encoded by age and added to the vector of inputs.
[0502] After the cumulative exposures to LDL, SBP, and Lp(a) are estimated, and the trajectories of other exposures are estimated, a matrix of input vectors (one for each age), including the standardized ASCVD-PGS value and the ASCVD-PGS value positionally encoded at each age, is passed into the remaining stack of algorithms - where the ASCVD-PGS contributes to the estimated cumulative lifetime risk of MACE at all ages (agnostic to causality); and the corresponding predicted benefit of reducing LDL, SBP, or both beginning at all ages by magnitude, duration and timing of lowering LDL, SBP, or both.
[0503] Finally, the user of the platform may be given the option to view estimates of how much their inherited polygenic score (ASCVD-PGS) increases their remaining lifetime risk of MACE at all ages; and how much additional reduction in LDL, SBP, or both is required to specifically overcome the increased risk caused by their inherited predisposition to ASCVD as represented in one dimension by their polygenic score (ASCVD-PGS). This information translates a person’s inherited polygenic predisposition to ASCVD into clinically useful information that can be used to guide the timing and intensity of lowering LDL and SBP to further personalize the prevention of ASCVD events.
[0504] Additional Implementation Detail
[0505] The process for translating polygenic scores into clinically useful information is exactly the same as the process described above with respect to Lp(a) (in the section also called “Additional Implementation Detail”), but with a small caveat. A current-generation polygenic score (PGS) is a very crude construction that measures the association (odds ratio, relative risk, or HR) between a very large number of genetic variants and an outcome. The magnitude of each of these associations is measured by the beta coefficient for each variant. The beta coefficients are simply summed together to obtain a PGS for each person. All or most genetic variants, which are very highly correlated with each other because they are so close on the various genes, are included in the PGS. Therefore, current-generation PGS are not valid instrumental variables, because a valid instrument requires including only independently inherited variants associated with a biomarker or outcome to serve as a proxy ‘instrument’ to assess the level of the outcome or biomarker.
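The summation that produces a PGS may be sketched as follows. Weighting each beta coefficient by the number of effect alleles a person carries (0, 1, or 2) is the conventional construction and is an assumption here, since the passage above only states that the beta coefficients are summed; the variant values and population statistics are hypothetical.

```python
def polygenic_score(betas, allele_counts):
    """Sum of per-variant beta coefficients weighted by effect-allele count."""
    if len(betas) != len(allele_counts):
        raise ValueError("one allele count is needed per beta coefficient")
    return sum(beta * count for beta, count in zip(betas, allele_counts))

def standardize(score, population_mean, population_sd):
    """Express a raw score in SD units relative to a reference population."""
    return (score - population_mean) / population_sd

# Hypothetical two-variant score and reference population.
raw_score = polygenic_score([0.5, 0.25], [2, 1])
z_score = standardize(raw_score, population_mean=1.0, population_sd=0.5)
```

A positive standardized score corresponds to a higher than average PGS and a negative one to a lower than average PGS.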
[0506] By construction and design, simply summing the beta coefficients for each variant produces a metric with a normal distribution (which for convenience can be centered around 0 with a symmetrical SD). Because a PGS is by design normally distributed, an equal number of people will have higher or lower than average PGS.
[0507] By convention, and for mathematical validity, we assume the mean PGS is associated with no increased or decreased risk compared to the average risk in the population conditional on the same level of all other exposures. Therefore, unlike Lp(a), persons with lower than average PGS will have a lower remaining lifetime risk of MACE than ‘average’ and persons with higher than average PGS will have a higher than average risk. The practical effect of this observation is that persons with higher than average PGS will have a higher than average instantaneous risk during all time intervals, resulting in a higher corresponding cumulative remaining lifetime hazard of MACE as compared to persons with average PGS. Therefore, we want to solve for how much we have to lower LDL, SBP, or both to specifically overcome the increased risk caused (or more precisely apparently caused) by a person’s PGS, conditional on when we start to lower LDL, SBP, or both. This is the same problem as how to translate Lp(a) into clinically useful information.
[0508] On the other hand, persons with lower than average PGS will have a lower than average instantaneous risk during all time intervals resulting in a lower corresponding cumulative remaining lifetime hazard of MACE as compared to persons with average PGS. Therefore, the problem specification changes. In this case, we no longer want to solve for how much more we need to lower LDL, SBP, or both to overcome the increased risk due to a person’s inherited polygenic burden (as measured crudely by their PGS), but instead, we can calculate how much less we need to lower LDL, SBP, or both to achieve a specific goal because they appear to be less genetically vulnerable based on inherited polygenic predisposition to experience a MACE.
[0509] Computationally, we perform the same calculations as above, but return a negative number which represents how much less aggressive we need to be to lower LDL, SBP, or both because this person’s instantaneous and cumulative remaining lifetime hazard of MACE is HIGHER when we ignore their PGS. That is, their instantaneous hazard of MACE during every time interval, and their corresponding cumulative remaining lifetime hazard of MACE is higher when we ignore their PGS. Including the effect of their PGS allows us to lower their predicted cumulative remaining lifetime risk of MACE. Thus, we may be less aggressive at lowering LDL, SBP, or both than we would calculate if we didn’t include their PGS in the analysis.
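The sign convention described above (a positive number when more lowering is needed, a negative number when less is needed) can be sketched as follows. This is a simplified illustration only: it assumes log hazard ratios that add on the log scale, it ignores the dependence on the age at which lowering starts, and the function name, parameter names, and numeric values are hypothetical rather than taken from the disclosure.

```python
def ldl_adjustment_for_pgs(pgs_z, log_hr_per_sd_pgs, log_hr_per_unit_lower_ldl):
    """Extra LDL lowering (in LDL units) that offsets the PGS effect.

    Solves  pgs_z * log_hr_per_sd_pgs + d * log_hr_per_unit_lower_ldl = 0  for d.
    A positive result means lower LDL by that much more than for an
    average-PGS person; a negative result means that much less.
    """
    if log_hr_per_unit_lower_ldl >= 0:
        raise ValueError("log hazard ratio per unit lower LDL must be negative")
    return -(pgs_z * log_hr_per_sd_pgs) / log_hr_per_unit_lower_ldl


# Hypothetical inputs: log HR of 0.3 per SD of PGS, -0.5 per unit lower LDL.
more_needed = ldl_adjustment_for_pgs(1.0, 0.3, -0.5)    # PGS one SD above average
less_needed = ldl_adjustment_for_pgs(-1.0, 0.3, -0.5)   # PGS one SD below average
```

The same function handles both cases; only the sign of the standardized PGS changes, which is what produces the negative return value for persons with lower than average PGS.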
[0510] This provides an additional layer of precision, in some embodiments, when individualizing guidance on how to personalize the prevention of MACE by lowering LDL to slow the progression of atherosclerosis, and by lowering SBP to slow the progression of atherosclerosis at vulnerable branch points, reduce the propensity of the accumulated plaque to disrupt at any point in time, and reduce the accumulation of arterial wall injury to maximize the capacity of the artery to tolerate the accumulated plaque burden.
[0511] Determining Benefit of Sequences of Therapeutic Interventions
[0512] Returning now to the illustrative process shown in FIG. 2D, acts 231-233 may be performed to estimate the benefit of one or more sequences of therapeutic interventions.
[0513] A sequence of therapeutic interventions may be of any suitable type. For example, the sequence of therapeutic interventions may be designed to lower LDL, to lower SBP, or to lower both LDL and SBP. Therapeutic intervention sequences may differ from one another based on type or types of therapeutics utilized, amounts of the therapeutic or the therapeutics administered, and / or timing of administering the therapeutic or therapeutics.
[0514] A particular therapeutic intervention sequence may be defined by information specifying, for each particular therapeutic intervention in the sequence, a type of therapeutic or therapeutics that is part of the particular therapeutic intervention, an amount of the therapeutic or the therapeutics to administer as part of the particular therapeutic intervention, and timing information indicating when to administer the particular therapeutic intervention. For example, a particular therapeutic intervention may include a sequence of one or more, optionally annual or bi-annual, administrations of an intervention selected from a DNA-based therapy (e.g., gene therapy, antisense therapy, etc.), RNA-based therapy (e.g., mRNA, siRNA, miRNA, etc.), protein-based therapy (e.g., peptide therapy, antibody therapy, hormone therapy, enzyme therapy, etc.), and pharmacological therapy (e.g., small molecule drug therapy). Example therapies are described herein including in the section titled: “Therapeutic Interventionist and Additional Uses.”
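One way to hold the type, amount, and timing information described above is sketched below. The class names, field names, and example values are hypothetical and are chosen only for illustration; the disclosure does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Intervention:
    """One therapeutic intervention: what to give, how much, and when."""
    therapy_type: str        # e.g. "siRNA", "antibody", "small molecule"
    amount: float            # dose, in whatever unit the therapy uses
    start_age: float         # subject age at first administration
    interval_years: float    # spacing between administrations (1.0 = annual)
    n_administrations: int   # number of administrations

    def administration_ages(self) -> Tuple[float, ...]:
        """Ages at which the therapy is administered."""
        return tuple(
            self.start_age + i * self.interval_years
            for i in range(self.n_administrations)
        )


@dataclass(frozen=True)
class InterventionSequence:
    """An ordered collection of interventions for one subject."""
    interventions: Tuple[Intervention, ...]

    def schedule(self):
        """All (age, therapy_type, amount) events, sorted by age."""
        events = [
            (age, iv.therapy_type, iv.amount)
            for iv in self.interventions
            for age in iv.administration_ages()
        ]
        return sorted(events)


# Hypothetical sequence: three annual siRNA doses, then a small molecule added.
seq = InterventionSequence((
    Intervention("siRNA", 1.0, 45.0, 1.0, 3),
    Intervention("small molecule", 1.0, 46.0, 1.0, 2),
))
```

Making the records immutable (`frozen=True`) lets sequences be compared, hashed, and de-duplicated when many candidates are enumerated.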
[0515] To estimate the benefit of multiple sequences of therapeutic interventions, the therapeutic sequences to be evaluated are first generated, which is done at act 231. This may be done in any suitable way, for example, by enumerating a set of therapeutic intervention sequences by varying an age of the subject for when to begin an intervention therapeutic sequence, the duration of the intervention therapeutic sequence, the types of therapeutic or therapeutics used as part of the therapeutic intervention sequence, and / or the timings of therapeutic interventions in the therapeutic intervention sequence.
[0516] After the multiple sequences of therapeutic interventions are generated at act 231, one or more such sequences may be filtered out from subsequent evaluation at act 232. This may help to limit the number of potential combinations and sequences of LDL and SBP lowering therapies to be evaluated. For example, in some embodiments, only those therapeutic intervention sequences that continue the current intervention, intensify the current intervention, and / or add another intervention are considered. As another example, discontinuous intervention sequences that start and stop therapeutic interventions in random order may be filtered out from subsequent consideration. Additionally or alternatively, any other suitable clinical considerations may be used to identify which therapeutic intervention sequences are clinically plausible (and therefore should be considered for the benefit they provide) and which ones are not (and therefore should be eliminated from subsequent consideration). The set of one or more such clinical considerations, encoded into one or more rules that can be used (e.g., in a computer implemented setting) to filter out one or more therapeutic intervention sequences, may be referred to as a “domain expert clinical translation heuristic.”
[0517] It should be appreciated that, instead of generating multiple sequences of therapeutic interventions and filtering one or more sequences using the rule(s), the rule(s) may be used to generate the multiple sequences of therapeutic interventions that comply with the rule(s) ab initio, so that the filtering step may be omitted.
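The generate-then-filter flow of acts 231 and 232 can be sketched as follows. The candidate ages, durations, and regimens are hypothetical, and the two rules shown (do not start before the subject's current age; keep only sequences that continue or add to the current intervention) are simplified stand-ins for a domain expert clinical translation heuristic, not the rules of the disclosure.

```python
from itertools import product

# Hypothetical option grids for enumeration (act 231).
START_AGES = (40, 50, 60)
DURATIONS_YEARS = (10, 20)
REGIMENS = ("LDL", "SBP", "LDL+SBP")   # which exposure(s) the sequence lowers


def generate_sequences():
    """Enumerate every combination of start age, duration, and regimen."""
    return [
        {"start_age": a, "duration": d, "regimen": r}
        for a, d, r in product(START_AGES, DURATIONS_YEARS, REGIMENS)
    ]


def continues_or_adds(seq, current_regimen):
    """Rule: the proposed targets must include everything currently treated."""
    current = set(current_regimen.split("+")) if current_regimen else set()
    return current <= set(seq["regimen"].split("+"))


def filter_sequences(sequences, current_regimen, subject_age):
    """Drop sequences that violate the illustrative rules (act 232)."""
    return [
        s for s in sequences
        if s["start_age"] >= subject_age and continues_or_adds(s, current_regimen)
    ]


candidates = generate_sequences()
plausible = filter_sequences(candidates, current_regimen="LDL", subject_age=45)
```

Applying the same predicates inside the enumeration loop, rather than afterwards, would generate only rule-compliant sequences from the start, so that no separate filtering pass is needed.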
[0518] Next, the process of FIG. 2D proceeds to act 233, where the benefit of administering each of the filtered therapeutic intervention sequence(s) may be evaluated.
[0519] In some embodiments, determining the respective benefits (e.g., as reflected by reductions in risk, a value function, etc.) of administering each of the multiple therapeutic intervention sequences to the subject comprises determining, for each particular therapeutic intervention sequence under consideration, expected proportional reduction and / or absolute reduction in risk of cardiovascular events for the subject in response to the particular therapeutic intervention sequence. This produces multiple expected proportional and / or absolute reductions in risk for the considered therapeutic intervention sequences, which may be stored for subsequent use and / or used in selecting a specific therapeutic sequence to recommend administering (or actually administering) to the subject.
[0521] Therefore, in some embodiments, as part of act 233, determining the benefit of a particular therapeutic intervention sequence (under consideration from among one of the filtered therapeutic intervention sequences) may involve determining the expected proportional reduction and / or absolute reduction in risk of cardiovascular events for the subject in response to the particular therapeutic intervention sequence. This may be done by using at least one second ML model (e.g., at least one c-DNN-ODE model) as described herein.
[0522] For example, in some embodiments, determining the expected proportional reduction and / or the absolute reduction in risk of cardiovascular events for the subject in response to the particular therapeutic intervention sequence comprises:
[0523] (a) for each particular interval of follow-up for the subject from the age of the subject at which the particular therapeutic intervention sequence is to commence to an upper threshold age (e.g., 80),
[0526] (i) determining, using the multiple measures of risk (e.g., obtained at act 220), a predicted instantaneous hazard rate of the subject having a cardiovascular event at the particular interval of follow up;
[0527] (ii) determining, using the at least one second ML model (e.g., at least one c-DNN-ODE model), a time-averaged instantaneous log hazard ratio for a one unit lower LDL and / or SBP corresponding to duration of treatment (e.g., in accordance with the particular therapeutic intervention sequence) at the particular interval of follow-up for the subject;
[0528] (iii) determining an intervention-adjusted instantaneous hazard for the particular interval of follow-up by multiplying the instantaneous hazard rate of the subject determined at (a)(i) with the time-averaged instantaneous log hazard ratio determined at (a)(ii), thereby obtaining multiple intervention-adjusted instantaneous hazards for intervals of follow-up evaluated at (a);
[0529] (b) determining, using the multiple intervention-adjusted instantaneous hazards, intervention-adjusted cumulative hazard rates and cumulative event rates of cardiovascular events for intervals of follow-up for the subject from the age of the subject at which the particular therapeutic intervention sequence is to commence to the upper threshold age;
[0530] (c) determining predicted proportional reductions in the risk of experiencing a cardiovascular event as ratios of the intervention-adjusted cumulated hazard rates and the cumulative hazard rates that are not adjusted for the particular therapeutic intervention sequence; and
[0531] (d) determining predicted absolute reductions in the risk of experiencing a cardiovascular event as absolute differences between the intervention-adjusted cumulative event rates and the cumulative event rates that are not adjusted for the particular therapeutic intervention sequence.
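Steps (a) through (d) above can be sketched numerically as follows. Several choices in this sketch are assumptions rather than statements of the disclosed method: the log hazard ratio is scaled by the number of units lowered and exponentiated into a hazard ratio before multiplying, cumulative event rates are obtained from cumulative hazards as 1 - exp(-H), baseline hazards are assumed strictly positive, and all names and input values are hypothetical. In the disclosure the per-interval log hazard ratios come from the second ML model; here they are simply passed in.

```python
import numpy as np


def intervention_benefit(baseline_hazard, log_hr_per_unit, units_lowered,
                         interval_years=1.0):
    """Per-interval proportional and absolute risk reductions.

    baseline_hazard: instantaneous hazard per interval, step (a)(i).
    log_hr_per_unit: time-averaged log hazard ratio per one unit lower
        LDL or SBP for each interval, step (a)(ii).
    units_lowered: size of the sustained reduction (assumed constant).
    """
    h0 = np.asarray(baseline_hazard, dtype=float)
    log_hr = np.asarray(log_hr_per_unit, dtype=float)

    # (a)(iii) intervention-adjusted hazard; exponentiation is an assumption.
    h1 = h0 * np.exp(log_hr * units_lowered)

    # (b) cumulative hazards and cumulative event rates.
    cum_h0 = np.cumsum(h0 * interval_years)
    cum_h1 = np.cumsum(h1 * interval_years)
    event0 = 1.0 - np.exp(-cum_h0)
    event1 = 1.0 - np.exp(-cum_h1)

    # (c) ratio of adjusted to unadjusted cumulative hazards.
    cumulative_hazard_ratio = cum_h1 / cum_h0

    # (d) absolute difference in cumulative event rates.
    absolute_reduction = event0 - event1

    return {
        "cumulative_hazard_ratio": cumulative_hazard_ratio,
        "proportional_reduction": 1.0 - cumulative_hazard_ratio,
        "absolute_reduction": absolute_reduction,
    }


# Hypothetical inputs: three one-year intervals with a constant hazard.
result = intervention_benefit([0.01, 0.01, 0.01], [-0.2, -0.2, -0.2], 1.0)
no_change = intervention_benefit([0.01, 0.01, 0.01], [-0.2, -0.2, -0.2], 0.0)
```

With a constant log hazard ratio the cumulative hazard ratio is the same in every interval, while the absolute reduction grows with follow-up because the unadjusted cumulative event rate grows.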
[0532] As described herein, the at least one second trained ML model may be at least one causal DNN for Ordinary Differential Equations (c-DNN-ODE) model (e.g., at least one DNN-ODE model that is causal because it is trained on randomized data), a non-linear regression model, an adaptive basis function regression model, a neural network regression model, a deep neural network regression model, a logistic regression model, a polynomial regression model, a decision tree regression model, a random forest regression model, and / or a gradient boosted decision tree regression model.
[0533] In some embodiments, where the c-DNN-ODE model is used, in response to input indicating a duration of sustained intervention, optional...
Claims
1. A computer-implemented method, comprising: obtaining cardiometabolic health data for a subject comprising: one or more values for one or more clinical characteristics of the subject, and / or one or more values for one or more physical measurements of the subject, and / or one or more values for one or more biochemical measurements of the subject; encoding the cardiometabolic health data for the subject into a first feature vector; estimating an SBP level trajectory for the subject by processing the first feature vector using an SBP trajectory prediction ML model that has been trained to estimate an SBP level for a subject at each of multiple prior ages and each of multiple future ages using training data comprising, for each of a plurality of participants, repeated longitudinal measures of SBP levels and values for the one or more clinical characteristics and / or the one or more physical measurements and / or the one or more biochemical measurements over at least 10, at least 20, at least 30, or at least 50 years of follow-up, wherein the SBP level trajectory for the subject comprises an estimated SBP level for the subject for each of multiple prior ages of the subject and multiple future ages of the subject.
2. The method of claim 1, further comprising: encoding the cardiometabolic health data for the subject into a second feature vector; estimating an LDL level trajectory for the subject by processing the second feature vector using an LDL trajectory prediction machine learning (ML) model that has been trained to estimate an LDL level for a subject at each of multiple prior ages and each of multiple future ages using training data comprising, for each of a plurality of participants, repeated longitudinal measures of LDL levels and values for the one or more clinical characteristics and / or the one or more physical measurements and / or the one or more biochemical measurements over at least 10, at least 20, at least 30, or at least 50 years of follow-up, wherein the LDL level trajectory for the subject comprises an estimated LDL level for the subject for each of multiple prior ages of the subject and multiple future ages of the subject.
3. The method of claim 1 or 2, wherein: (i) the one or more values for one or more clinical characteristics comprise values for one or more demographic characteristics of the subject, one or more genetic characteristics of the subject, one or more family history characteristics of the subject, one or more comorbidities, and / or risk factors; and / or (ii) the one or more values for one or more physical measurements of the subject comprise one or more values for physical measurements of quantities that are risk factors for cardiometabolic disease and / or physiological measurements selected from blood pressure measurements and measurements indicative of adiposity; and / or (iii) the one or more values for one or more biochemical measurements of the subject comprise one or more values for biochemical measurements of quantities that are risk factors for cardiovascular disease and / or biochemical measurements selected from measurements of one or more biochemical markers, optionally a protein, lipid or lipoprotein, in a blood, serum, and / or plasma sample from the subject.
4. The method of any one of claims 1-3, wherein: (i) the one or more clinical characteristics of the subject are selected from the group consisting of: age, biological sex, family history of coronary heart disease (CHD), family history of hypertension (HTN), family history of type 2 diabetes (T2D), polygenic score for ASCVD, polygenic score for CHD, polygenic score for HTN, polygenic score for T2D, polygenic score for body mass index (BMI), inherited predisposition or predispositions, and history of tobacco use; and / or (ii) the one or more physical measurements of the subject are selected from the group consisting of systolic blood pressure (SBP), diastolic blood pressure (DBP), weight, waist circumference, height, body mass index (BMI), and waist-to-height ratio; and / or (iii) the one or more biochemical measurements of the subject are selected from the group consisting of low-density lipoprotein (LDL) level, high-density lipoprotein (HDL) level, total cholesterol level, triglyceride (TG) level, non-HDL cholesterol level, apolipoprotein B (apoB) level, lipoprotein (a) (Lp(a)) level, hemoglobin A1c (HbA1c) level, and C-reactive protein (CRP) level.
5. The method of claim 1, wherein encoding the cardiometabolic health data to obtain the first feature vector comprises: obtaining an initial feature vector using at least some of the cardiometabolic health data for the subject by standardizing at least some of the values in the subject characteristic and / or measurement data to obtain an initial feature vector; and positionally encoding the initial feature vector by age of the subject associated with the values in the cardiometabolic health data for the subject to obtain a feature vector representing the subject, wherein the positionally encoding comprises: generating a positional encoding of the initial feature vector using sinusoidal encoding of at least some elements of the initial feature vector, wherein the sinusoidal encoding uses age in years as position; and generating the feature vector by appending the positional encoding of the initial feature vector to the initial feature vector.
6. The method of claim 2, wherein the LDL trajectory prediction ML model has been trained using training data comprising for each of the plurality of participants, repeated longitudinal measures of LDL levels and feature vectors derived from values for the one or more clinical characteristics and / or the one or more physical measurements and / or the one or more biochemical measurements using at least positional encoding by age of the participant associated with the respective values, optionally, wherein the positional encoding is sinusoidal encoding.
7. The method of any one of claims 2-6, wherein the LDL trajectory prediction ML model is a bidirectional LSTM (bi-LSTM) model, a bi-directional GRU model, or an ODE-RNN model.
8. The method of claim 7, wherein: the LDL trajectory prediction ML model is the bi-LSTM model, and processing the first feature vector using the LDL trajectory prediction ML model comprises: estimating an LDL level for the subject at each of the multiple future ages using a forward pass of the bi-LSTM model; and estimating an LDL level for the subject at each of the multiple prior ages using a backward pass of the bi-LSTM model.
9. The method of any one of claims 2-8, further comprising: estimating, using the LDL level trajectory, a cumulative LDL exposure trajectory for the subject with respect to a set of ages, wherein the cumulative LDL exposure trajectory comprises an estimated cumulative LDL exposure level for the subject at each age in the set of ages.
10. The method of any one of claims 1-9, further comprising: estimating a weight trajectory for the subject using the cardiometabolic health data, wherein the weight trajectory for the subject comprises estimated weight of the subject for each of multiple ages of the subject, optionally wherein estimating the weight trajectory is performed assuming that the subject’s current age-and-sex adjusted weight percentile remains constant throughout life; and / or estimating a waist circumference trajectory for the subject using the cardiometabolic health data, wherein the waist circumference trajectory for the subject comprises estimated waist circumference of the subject for each of multiple ages of the subject, optionally wherein estimating the waist circumference trajectory is performed assuming that the subject’s current age-and-sex adjusted waist circumference percentile remains constant throughout life; and / or estimating an HbA1c level trajectory for the subject using the cardiometabolic health data, wherein the HbA1c level trajectory for the subject comprises estimated HbA1c levels of the subject for each of multiple ages of the subject, optionally wherein estimating the HbA1c level trajectory is performed assuming that the subject’s current age-and-sex adjusted HbA1c percentile remains constant throughout life.
11. A computer-implemented method of determining one or more measures of risk that a subject develops cardiovascular disease, the method comprising, for each of multiple time intervals: (a) estimating, using a trained cardiovascular risk prediction machine learning model and cardiometabolic health data for the subject including a cumulative LDL exposure trajectory for the subject, values indicative of log hazard ratios for risk of the subject having a cardiovascular event at respective levels of cumulative LDL exposure, wherein the cardiovascular risk prediction machine learning model has been trained using training data comprising, for each of a plurality of participants enrolled in one or more prospective studies, multiple LDL measurements along with a recorded age or date at which a first cardiovascular event occurred, optionally wherein the first cardiovascular event is a first episode of a fatal or non-fatal myocardial infarction (MI), fatal or non-fatal ischemic stroke, or coronary revascularization; and (b) estimating, using the values indicative of the log hazard ratios, (i) absolute instantaneous hazard rates of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure, and (ii) cumulative lifetime risks of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure; and (c) estimating, using the cumulative LDL exposure trajectory for the subject, the multiple measures of risk to include: (i) absolute instantaneous hazard rates of the subject having a cardiovascular event at respective ones of the multiple time intervals, and (ii) cumulative lifetime hazard and event rates of the subject having a cardiovascular event at the respective ones of the multiple time intervals.
12. The method of claim 11, wherein the cumulative LDL exposure trajectory has been obtained using the method of any of claims 2-10, and / or wherein the method comprises obtaining the cumulative LDL exposure trajectory using the method of any of claims 2-10; and / or wherein step (a) uses cardiometabolic health data further comprising a cumulative SBP exposure trajectory that has been obtained using the method of any of claims 2-10, and / or wherein step (a) uses cardiometabolic health data further comprising a cumulative SBP exposure trajectory and the method comprises obtaining the cumulative SBP exposure trajectory using the method of any of claims 2-10; and / or wherein step (a) uses cardiometabolic health data further comprising a cumulative Lp(a) exposure trajectory that has been obtained using the method of any of claims 2-10, and / or wherein step (a) uses cardiometabolic health data further comprising a cumulative Lp(a) exposure trajectory and the method comprises obtaining the cumulative SBP exposure trajectory using the method of any of claims 2-10.
13. The method of claim 11, wherein estimating, using the trained cardiovascular risk prediction machine learning model and the at least some of the cardiometabolic health data, the values indicative of the log hazard ratios for risk of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure comprises: encoding the at least some of the cardiometabolic health data to obtain input feature data; and estimating, by processing the input feature data using the cardiovascular risk prediction ML model, the values indicative of the log hazard ratios for the risk of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure, optionally, wherein processing the input feature data comprises using a trained survival deep neural network (DNN) with piecewise exponential modeling (PEM) model to calculate log hazard ratios for the risk of the subject having a cardiovascular event at the respective levels of cumulative LDL exposure.
14. A computer-implemented method of determining an expected proportional reduction and / or absolute reduction in risk of cardiovascular events for a subject in response to a particular therapeutic intervention sequence, the particular therapeutic sequence indicating magnitude, duration, and timing of one or more interventions associated with a reduction of LDL level and / or SBP level over an interval of follow up for the subject, the method comprising: determining multiple measures of risk of cardiovascular events for the subject comprising absolute instantaneous hazard rates, cumulative hazard rates, and cumulative event rates of the subject having a cardiovascular event at respective ones of multiple time intervals, wherein the cumulative hazard rates and the cumulative event rates are not adjusted for the particular therapeutic intervention sequence, and determining the expected proportional reduction and / or the absolute reduction in the risk of cardiovascular events for the subject in response to the particular therapeutic intervention sequence using a method that comprises: (a) for each particular interval of follow-up for the subject from the subject’s current age to an upper threshold age, (i) determining, using the multiple measures of risk, a predicted instantaneous hazard rate of the subject having a cardiovascular event at the particular interval of follow up; (ii) determining, using a benefit prediction machine learning model, a time-averaged instantaneous log hazard ratio for a one unit lower LDL or SBP corresponding to duration of treatment at the particular interval of follow-up for the subject, wherein the benefit prediction machine learning model has been trained using training data from randomized trials of LDL lowering therapies and / or randomized trials of SBP therapies and Mendelian randomization studies evaluating genetic variants associated with lower LDL and / or lower SBP, said training data comprising for each of a plurality of participants in said trials, at least one LDL or SBP measurement along with a recorded age or date at which a first cardiovascular event occurred; (iii) determining an intervention-adjusted instantaneous hazard for the particular interval of follow-up by multiplying the instantaneous hazard rate of the subject determined at (a)(i) with the time-averaged instantaneous log hazard ratio determined at (a)(ii), thereby obtaining multiple intervention-adjusted instantaneous hazards for intervals of follow-up evaluated at (a); (b) determining, using the multiple intervention-adjusted instantaneous hazards, intervention-adjusted cumulative hazard rates and cumulative event rates of cardiovascular events for intervals of follow-up for the subject from the subject’s current age to the upper threshold age; (c) determining predicted proportional reductions in the risk of experiencing a cardiovascular event as ratios of the intervention-adjusted cumulated hazard rates and the cumulative hazard rates that are not adjusted for the particular therapeutic intervention sequence; and (d) determining predicted absolute reductions in the risk of experiencing a cardiovascular event as absolute differences between the intervention-adjusted cumulative event rates and the cumulative event rates that are not adjusted for the particular therapeutic intervention sequence.
15. The method of claim 14, wherein the multiple measures of risk not adjusted for the particular therapeutic intervention have been obtained using the method of any of claims 11-13; and / or wherein the method comprises determining the multiple measures of risk of cardiovascular events for the subject not adjusted for the particular therapeutic intervention using the method of any of claims 11-13.
16. The method of any one of claims 14-15, wherein the benefit prediction machine learning model comprises a causal DNN for Ordinary Differential Equations (c-DNN-ODE) model.
17. A system, comprising: at least one computer hardware processor; and at least one non-transitory computer readable storage medium storing processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any one of the foregoing method claims.
18. At least one non-transitory computer readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to perform the method of any one of the foregoing method claims.