Explainable AI system for ICU diagnosis prediction
An explainable AI system using ensemble and deep learning models with ClinicalBERT enhances ICU diagnosis prediction by analyzing electronic health records, improving accuracy and transparency for timely clinical decisions.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- NORTHEASTERN UNIV (US)
- Filing Date
- 2025-12-23
- Publication Date
- 2026-07-02
AI Technical Summary
Existing clinical decision support systems fail to accurately predict missed severe diagnoses in intensive care units because they do not leverage the vast patient data available and do not provide transparent, explainable outputs, which limits clinician trust and effective decision-making.
An explainable AI system that uses ensemble models, deep learning models, and large language models to analyze electronic health records and generate interpretable explanations when predicting conditions such as acute myocardial infarction and pulmonary embolism. It combines logistic regression, random forest, XGBoost, and support vector machines with ClinicalBERT for unstructured clinical notes, and provides real-time risk assessment.
Improves prediction accuracy and clinician trust by capturing diverse clinical patterns and providing transparent reasoning, enabling timely and informed decision-making in ICU environments.
Abstract
Description
NEX-20525
EXPLAINABLE AI SYSTEM FOR ICU DIAGNOSIS PREDICTION
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/738,940, filed December 26, 2024, the entire contents of which are hereby incorporated by reference.
TECHNICAL FIELD
[0002] The present disclosure relates to artificial intelligence systems for clinical decision support, and more particularly to an explainable AI-based tool that utilizes machine learning, large language models, and generative AI for early prediction of missed severe diagnoses in intensive care unit patients using electronic health record data.
BACKGROUND
[0003] Intensive care units represent complex clinical environments where healthcare providers must rapidly identify and treat patients with severe and life-threatening conditions. Despite advances in medical technology and the widespread adoption of electronic health record systems, missed or delayed diagnoses of severe conditions remain a significant problem in critical care settings. Conditions such as acute myocardial infarction, acute pulmonary embolism, bacterial pneumonia, aortic dissection, meningitis, spinal abscess, bacterial endocarditis, and arterial thromboembolism can present with atypical or overlapping symptoms, making timely and accurate diagnosis challenging even for experienced clinicians.
[0004] Existing clinical decision support systems often fail to adequately leverage the vast amounts of patient data captured in electronic health records, including demographics, vital signs, laboratory results, and diagnostic codes. Furthermore, conventional machine learning approaches applied to clinical prediction frequently operate as "black boxes," providing predictions without meaningful explanations that clinicians can use to validate or act upon the results. This lack of interpretability limits clinical adoption and trust in automated diagnostic support tools. There remains a need for systems capable of accurately predicting missed severe diagnoses in intensive care unit patients while providing transparent, explainable outputs that enable clinicians to understand the reasoning behind predictions and make informed clinical decisions.
SUMMARY
[0005] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0006] According to an aspect of the present disclosure, an explainable artificial intelligence system for early prediction of missed severe diagnoses in intensive care unit patients is provided. The system comprises a data processing module configured to receive and process electronic health record data including patient demographics, vital signs, laboratory results, and diagnostic codes. The system further comprises a machine learning module comprising a plurality of trained models including at least one ensemble model, at least one deep learning model, and at least one large language model. The machine learning module is configured to analyze the processed electronic health record data to predict a likelihood of at least one severe diagnosis from a group consisting of acute myocardial infarction, acute pulmonary embolism, bacterial pneumonia, aortic dissection, meningitis, spinal abscess, bacterial endocarditis, and arterial thromboembolism. The system additionally comprises an explainability module configured to generate feature importance rankings and provide interpretable explanations for the predictions made by the machine learning module.
[0007] The combination of multiple model types including ensemble models, deep learning models, and large language models enables the system to capture diverse patterns in clinical data, thereby improving prediction accuracy across different severe diagnoses. The explainability module provides clinicians with transparent reasoning behind predictions, facilitating clinical trust and enabling informed decision-making in time-sensitive intensive care unit environments.
[0008] According to other aspects of the present disclosure, the ensemble model may comprise a stacking ensemble model that combines predictions from logistic regression, random forest, XGBoost, and support vector machine base models.
[0009] The stacking ensemble architecture leverages the complementary strengths of different base model types, where logistic regression captures linear relationships, random forest and XGBoost capture non-linear interactions, and support vector machines provide robust classification boundaries, resulting in improved overall predictive performance compared to individual models.
[0010] According to other aspects of the present disclosure, the stacking ensemble model may use logistic regression as a meta-learner to combine the base model predictions.
[0011] Using logistic regression as the meta-learner provides an interpretable combination of base model outputs while maintaining computational efficiency, and the learned weights indicate the relative contribution of each base model to the final prediction.
[0012] According to other aspects of the present disclosure, the deep learning model may comprise at least one of a Multi-Layer Perceptron network and a Long Short-Term Memory network.
[0013] Multi-Layer Perceptron networks enable the system to learn complex non-linear feature interactions from patient data, while Long Short-Term Memory networks capture temporal dependencies in sequential patient records, allowing the system to model disease progression patterns over time.
[0014] According to other aspects of the present disclosure, the large language model may comprise ClinicalBERT fine-tuned on clinical text data.
[0015] ClinicalBERT, having been pre-trained on clinical text corpora, captures medical terminology and contextual relationships in clinical narratives, enabling the system to extract meaningful insights from unstructured clinical notes that complement structured diagnostic codes.
[0016] According to other aspects of the present disclosure, the data processing module may be configured to process ICD-9 diagnostic codes using one-hot encoding or count vectorization techniques.
[0017] One-hot encoding preserves the categorical nature of diagnostic codes while enabling machine learning model compatibility, and count vectorization captures the frequency of diagnoses in patient history, providing additional signal about disease burden and recurrence patterns.
[0018] According to other aspects of the present disclosure, the explainability module may be configured to generate feature importance rankings using at least one of coefficient analysis for logistic regression models and feature importance scores for tree-based models.
[0019] Coefficient analysis provides direct interpretability of feature contributions in linear models, while tree-based feature importance scores quantify the predictive value of each feature, enabling clinicians to understand which patient characteristics drive the risk assessment.
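The two ranking approaches named above can be sketched as follows. This is a minimal illustration on synthetic data, assuming scikit-learn models; the feature names, the synthetic label rule, and the `rank` helper are hypothetical and do not come from the disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical binary features (names are illustrative only).
feature_names = ["bacteremia", "heart_failure", "noise_a", "noise_b"]
X = rng.integers(0, 2, size=(400, 4)).astype(float)
# Synthetic label driven mainly by the first feature, weakly by the second.
logits = 3.0 * X[:, 0] + 1.0 * X[:, 1] - 2.0
y = (rng.random(400) < 1 / (1 + np.exp(-logits))).astype(int)


def rank(names, scores):
    """Sort features by absolute score, largest first."""
    order = np.argsort(-np.abs(scores))
    return [(names[i], float(scores[i])) for i in order]


# Coefficient analysis: signed weights of a fitted linear model.
linear = LogisticRegression(max_iter=1000).fit(X, y)
coef_ranking = rank(feature_names, linear.coef_[0])

# Tree-based importance: impurity-based scores that sum to 1.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
tree_ranking = rank(feature_names, forest.feature_importances_)
```

Coefficients carry a sign (risk-raising versus risk-lowering), while impurity-based importances are non-negative, so the two rankings answer slightly different questions.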
[0020] According to other aspects of the present disclosure, the system may further comprise an integration interface configured to integrate with electronic health record systems and provide real-time risk assessment alerts to clinicians.
[0021] Real-time integration with electronic health record systems enables automated risk assessment at the point of care, reducing the cognitive burden on clinicians and facilitating timely intervention for patients at risk of missed severe diagnoses.
[0022] According to another aspect of the present disclosure, a computer-implemented method for early prediction of missed severe diagnoses in intensive care unit patients is provided. The method comprises receiving electronic health record data for a patient, the data including at least patient demographics, vital signs, laboratory results, and diagnostic codes. The method further comprises preprocessing the electronic health record data by cleaning missing values, encoding categorical variables, and extracting relevant features. The method additionally comprises applying a trained ensemble machine learning model to the preprocessed data to generate a prediction score indicating a likelihood of the patient having at least one severe diagnosis selected from acute myocardial infarction, acute pulmonary embolism, bacterial pneumonia, aortic dissection, meningitis, spinal abscess, bacterial endocarditis, and arterial thromboembolism. The method also comprises generating an explainable output comprising feature importance rankings that identify which patient characteristics most contributed to the prediction. The method further comprises outputting the prediction score and explainable output to support clinical decision-making.
[0023] The method provides a systematic approach to processing heterogeneous clinical data and generating actionable predictions with explanations, enabling clinicians to validate model outputs against their clinical judgment and prioritize diagnostic workup for high-risk patients.
[0024] According to other aspects of the present disclosure, the ensemble machine learning model may comprise a stacking ensemble that combines predictions from logistic regression, random forest, XGBoost, and support vector machine base models.
[0025] The stacking ensemble approach aggregates diverse model perspectives on patient risk, reducing prediction variance and improving robustness to different patient presentations and data quality variations.
[0026] According to other aspects of the present disclosure, the stacking ensemble may use logistic regression as a meta-learner to combine the base model predictions.
[0027] The logistic regression meta-learner learns optimal weights for combining base model predictions, providing a principled approach to model aggregation that adapts to the relative strengths of each base model on the training data.
[0028] According to other aspects of the present disclosure, preprocessing the electronic health record data may comprise encoding ICD-9 diagnostic codes using one-hot encoding or count vectorization techniques.
[0029] These encoding techniques transform categorical diagnostic codes into numerical representations suitable for machine learning algorithms while preserving the clinical information content of the original codes.
[0030] According to other aspects of the present disclosure, the method may further comprise performing 5-fold cross-validation during model training to ensure robust performance across different data subsets.
[0031] Cross-validation provides an unbiased estimate of model generalization performance and helps detect overfitting, ensuring that the trained model performs reliably on unseen patient data in clinical deployment.
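A 5-fold cross-validation run might look like the sketch below. It assumes scikit-learn, synthetic data, a logistic regression model, stratified folds, and AUC as the scoring metric; the disclosure specifies only the number of folds, so the rest are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in for preprocessed patient features and labels.
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Five folds; stratification keeps the class ratio similar in every fold (an assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="roc_auc"
)
mean_auc = auc_scores.mean()
```

Each entry of `auc_scores` comes from a model trained on four folds and scored on the held-out fifth, so the spread across entries gives a rough sense of stability.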
[0032] According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform operations is provided. The operations comprise accessing electronic health record data for intensive care unit patients, the data including patient demographics, vital signs, laboratory results, and diagnostic codes. The operations further comprise training a plurality of machine learning models including at least one ensemble model and at least one deep learning model using the electronic health record data, wherein the models are configured to predict missed severe diagnoses comprising acute myocardial infarction, acute pulmonary embolism, bacterial pneumonia, aortic dissection, meningitis, spinal abscess, bacterial endocarditis, and arterial thromboembolism. The operations additionally comprise implementing an explainability framework that generates interpretable feature importance scores for predictions made by the trained models. The operations also comprise deploying the trained models and explainability framework to provide real-time prediction and explanation capabilities for clinical use.
[0033] The non-transitory computer-readable storage medium enables deployment of the prediction system across different computing environments, facilitating integration with existing hospital information technology infrastructure and enabling scalable clinical decision support.
[0034] According to other aspects of the present disclosure, the ensemble model may comprise a stacking ensemble model that combines predictions from logistic regression, random forest, XGBoost, and support vector machine base models using a meta-learner.
[0035] The meta-learner architecture enables the system to learn how to optimally weight and combine diverse base model predictions, adapting to the specific characteristics of the clinical dataset and target diagnoses.
[0036] According to other aspects of the present disclosure, the meta-learner may comprise a logistic regression model trained to optimally combine the base model predictions.
[0037] Logistic regression as the meta-learner provides interpretable combination weights and efficient training, while maintaining strong predictive performance through principled probability calibration.
[0038] According to other aspects of the present disclosure, the deep learning model may comprise at least one of a Multi-Layer Perceptron network and a Long Short-Term Memory network configured to process sequential patient data.
[0039] These deep learning architectures enable the system to learn hierarchical feature representations and temporal patterns in patient data, capturing complex clinical relationships that may not be apparent from individual features.
[0040] According to other aspects of the present disclosure, the operations may further comprise performing 5-fold cross-validation during model training.
[0041] Cross-validation during training ensures that model performance estimates are reliable and that the final deployed model generalizes well to new patient populations.
[0042] According to other aspects of the present disclosure, the trained models may generate performance metrics including area under the curve scores exceeding 0.90 for at least one of the severe diagnoses.
[0043] High area under the curve scores indicate strong discriminative ability between patients with and without the target diagnoses, providing clinicians with confidence in the system's ability to identify high-risk patients.
[0044] According to other aspects of the present disclosure, the explainability framework may identify traumatic amputation, bacteremia, heart failure, and atrial fibrillation as top predictive features for bacterial endocarditis.
[0045] Identification of clinically relevant predictive features validates the model's learning against established medical knowledge and provides actionable insights that clinicians can use to guide diagnostic evaluation and treatment planning.
[0046] The foregoing general description of the illustrative embodiments and the following detailed description thereof are merely exemplary aspects of the teachings of this disclosure and are not restrictive.
BRIEF DESCRIPTION OF FIGURES
[0047] Non-limiting and non-exhaustive examples are described with reference to the following figures.
[0048] Fig. 1 illustrates a block diagram of a machine learning pipeline system for predicting missed severe diagnoses, according to aspects of the present disclosure.
[0049] Fig. 2 illustrates a flowchart of a method for predicting missed severe diagnoses in intensive care unit patients, according to aspects of the present disclosure.
[0050] Fig. 3 illustrates a block diagram of a multi-layer architecture for an explainable artificial intelligence system, according to aspects of the present disclosure.
[0051] Fig. 4 illustrates a block diagram of a computer system that may be used to implement the explainable artificial intelligence system, according to aspects of the present disclosure.
[0052] Fig. 5 illustrates receiver operating characteristic curves for machine learning models evaluated using cross-validation, according to aspects of the present disclosure.
[0053] Fig. 6 illustrates receiver operating characteristic curves for a machine learning model evaluated on two testing sets, according to aspects of the present disclosure.
[0054] Fig. 7A illustrates a SHAP value summary plot showing feature importance for predicting missed severe diagnoses, according to aspects of the present disclosure.
[0055] Fig. 7B illustrates a table listing ICD codes and their corresponding descriptions, according to aspects of the present disclosure.
[0056] Fig. 8 illustrates a SHAP value summary plot showing feature importance for predicting outcomes in cardiac surgery patients, according to aspects of the present disclosure.
DETAILED DESCRIPTION
[0057] The following description sets forth exemplary aspects of the present disclosure. It should be recognized, however, that such description is not intended as a limitation on the scope of the present disclosure. Rather, the description also encompasses combinations and modifications to those exemplary aspects described herein.
[0058] A detailed description of systems, devices, and methods consistent with embodiments of the present disclosure is provided below. While several embodiments are described, it should be understood that the disclosure is not limited to any one embodiment, but instead encompasses numerous alternatives, modifications, and equivalents. In addition, while numerous specific details are set forth in the following description in order to provide a thorough understanding of the embodiments disclosed herein, some embodiments can be practiced without some or all of these details. Moreover, for the purpose of clarity, certain technical material that is known in the related art has not been described in detail in order to avoid unnecessarily obscuring the disclosure.
[0059] Referring to FIG. 1, and in brief overview, a machine learning pipeline system for predicting missed severe diagnoses in intensive care unit patients is illustrated. The system may receive raw EHR data 102, which may be subsequently preprocessed 104 to generate processed data 106. Processed data 106 may be separated into diseased samples 108A, representing patients with a target diagnosis, and the non-diseased samples 108B, representing patients without the target diagnosis. From the diseased samples 108A and the non-diseased samples 108B, the system may create the first testing set 110A containing balanced positive and negative cases and the second testing set 110B containing samples with specific risk features in the non-diseased samples. The training set 112 may be generated from the processed data 106 and may be used to train the model 114, which may process input features to generate predictions. The model 114 may connect to the meta model 116, which may combine predictions from multiple base models to produce final classification results through stacking operations. The first testing set 110A and the second testing set 110B may be used for evaluation of the trained model 114 and the meta model 116. The meta model 116 may generate the outputs 120, which may include performance metrics such as area under the curve, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score, along with receiver operating characteristic curves and other evaluation results that support clinical decision-making.
[0060] Still referring to FIG. 1, and in greater detail, the system may obtain raw EHR data 102 from one or more data sources. In some cases, the raw EHR data 102 includes patient demographics such as age, gender, and ethnicity. The raw EHR data 102 may include vital signs captured during intensive care unit admissions. In some cases, the raw EHR data 102 includes laboratory results from tests performed on patients. The raw EHR data 102 may include diagnostic codes such as ICD-9 codes that identify patient conditions and procedures. In some cases, the raw EHR data 102 includes ICU admission records, lab events, and other data available at admission time.
[0061] The system may use a MIMIC-III database as a data source for the raw EHR data 102. The MIMIC-III database may contain de-identified health data from patients admitted to critical care units of Beth Israel Deaconess Medical Center from 2001 to 2012. In some cases, the MIMIC-III database provides diagnoses, procedures, admissions, and demographics data for each patient.
[0062] With continued reference to FIG. 1, data preprocessing stage 104 may receive the raw EHR data 102 and may perform operations to prepare the data for model training and evaluation. The data preprocessing stage 104 may perform cleaning operations to handle missing values and incorrectly assigned codes. In some cases, the data preprocessing stage 104 performs feature engineering to extract relevant features from the raw EHR data 102. The data preprocessing stage 104 may perform undersampling to address class imbalance between positive and negative cases. In some cases, the data preprocessing stage 104 performs shuffling to randomize the order of samples. The data preprocessing stage 104 may perform filtering to remove data points without diagnostic codes or with incorrectly assigned codes.
[0063] The data processing module may use regular expressions to identify patients with specific diagnoses based on ICD-9 code patterns. In some cases, the data processing module constructs ICD-9 code history for each patient up to the point of the patient's first diagnosis of a target condition. The ICD-9 code history may be stored as a list of codes for each individual patient.
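A minimal sketch of this step is shown below. The regular expression, the function name, and the choice to exclude codes recorded in the diagnosing admission itself are assumptions made for illustration; the disclosure does not specify them.

```python
import re

# Hypothetical pattern: ICD-9 codes beginning with 4151
# (pulmonary embolism and infarction), used here as the target condition.
TARGET_PATTERN = re.compile(r"^4151")


def history_before_first_target(admissions, pattern=TARGET_PATTERN):
    """Build a patient's ICD-9 history up to the first target diagnosis.

    admissions: chronologically ordered list of ICD-9 code lists, one per admission.
    Returns (is_case, history). For a case, history holds the codes from all
    admissions before the first admission that contains a matching code.
    For a control, history holds every code.
    """
    history = []
    for codes in admissions:
        if any(pattern.match(code) for code in codes):
            return True, history
        history.extend(codes)
    return False, history


# Usage with illustrative code lists.
case_admissions = [["4280"], ["42731", "5849"], ["41519", "4280"], ["486"]]
is_case, history = history_before_first_target(case_admissions)
```

Stopping before the diagnosing admission keeps information recorded alongside the target diagnosis out of the features; a variant that includes same-admission codes would only need to extend `history` before returning.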
[0064] The data preprocessing stage 104 may remove ethnicity as a feature when the distribution of ethnicity is similar across positive cases, negative cases, and control patients to reduce noise in a dataset. In some cases, the data preprocessing stage 104 combines specified ethnicity categories including UNABLE TO OBTAIN, OTHER, PATIENT DECLINED TO ANSWER, and UNKNOWN / NOT SPECIFIED into a single OTHER category.
[0065] The data preprocessing stage 104 may encode categorical variables using various encoding techniques. In some cases, the system encodes gender as binary values with M=1 and F=0. The system may encode admission type with EMERGENCY=1, ELECTIVE=2, and URGENT=3. In some cases, the data preprocessing stage 104 extracts patient demographics such as admission type, ethnicity, and age from the raw EHR data 102.
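The categorical encodings described in the two preceding paragraphs can be sketched as plain lookup tables. The record layout, dictionary names, and the whitespace normalization around the slash are hypothetical; only the category values and their numeric codes come from the description.

```python
GENDER_CODES = {"M": 1, "F": 0}
ADMISSION_TYPE_CODES = {"EMERGENCY": 1, "ELECTIVE": 2, "URGENT": 3}
# Ethnicity categories that are merged into a single OTHER category.
MERGED_ETHNICITIES = {
    "UNABLE TO OBTAIN",
    "OTHER",
    "PATIENT DECLINED TO ANSWER",
    "UNKNOWN/NOT SPECIFIED",
}


def encode_record(record):
    """Encode one patient record given as a dict of raw string fields.

    Raises KeyError for a gender or admission type outside the lookup tables,
    which surfaces unexpected values instead of silently guessing.
    """
    # Normalize spacing around the slash so both spellings of the category match.
    ethnicity = record["ethnicity"].replace(" / ", "/").strip()
    return {
        "gender": GENDER_CODES[record["gender"]],
        "admission_type": ADMISSION_TYPE_CODES[record["admission_type"]],
        "ethnicity": "OTHER" if ethnicity in MERGED_ETHNICITIES else ethnicity,
    }


encoded = encode_record(
    {"gender": "F", "admission_type": "URGENT", "ethnicity": "PATIENT DECLINED TO ANSWER"}
)
```

Note that the integer codes for admission type impose an ordering (1 < 2 < 3) that a linear model will interpret numerically; tree-based models are less sensitive to this.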
[0066] The data preprocessing stage 104 may generate processed data 106 that is prepared for subsequent model training and evaluation operations. In some cases, the processed data 106 includes encoded features, cleaned records, and extracted patient characteristics suitable for input to machine learning models.
[0067] The processed data 106 may be separated into diseased samples 108A and non-diseased samples 108B based on the presence or absence of target diagnoses. In some cases, diseased samples 108A represent patients with a target diagnosis such as meningitis, pulmonary embolism, or bacterial endocarditis. Non-diseased samples 108B may represent patients without the target diagnosis. The system may exclude patients diagnosed with target medical conditions from control cohorts to create cleaner negative samples within the non-diseased samples 108B.
[0068] The system may generate a first testing set 110A and a second testing set 110B from the processed data 106 for model evaluation. In some cases, the first testing set 110A contains balanced positive and negative cases for initial model assessment. The second testing set 110B may contain samples with specific risk features in the non-diseased samples to evaluate model performance under different conditions. The system may shuffle combined case and control cohort data to ensure structural differences between cohorts do not dominate or skew the dataset.
[0069] The data processing module may be configured to process ICD-9 diagnostic codes using one-hot encoding or count vectorization techniques. In some cases, the system uses MultiLabelBinarizer to one-hot encode ICD-9 codes for patients with multiple diagnoses, creating a binary feature for each unique ICD-9 code in the dataset. The data processing module may convert ICD-9 codes to feature vectors using a custom implementation of Count Vectorizer that may indicate the number of times a patient has an ICD-9 code present in the patient's medical history. Count vectorization differs from one-hot encoding in that it records how often each ICD-9 code appears rather than only whether it appears, adding weighting to each of the ICD-9 codes observed so that the frequency of various diagnoses and procedures is considered.
[0070] The data processing module may aggregate procedure codes by hospital admission ID and may create binary columns for each unique procedure code. In some cases, preprocessing the electronic health record data comprises encoding ICD-9 diagnostic codes using one-hot encoding or count vectorization techniques as part of a method for early prediction of missed severe diagnoses.
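The difference between the two encodings can be seen on a small example. The sketch below uses scikit-learn's `MultiLabelBinarizer` for the binary encoding and a `Counter`-based helper as a stand-in for the custom count vectorizer, whose actual implementation is not given in the description; the code lists and the `count_vectorize` name are illustrative.

```python
from collections import Counter

from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical ICD-9 code histories, one list per patient (illustrative values only).
histories = [
    ["4280", "42731", "4280"],  # one code recorded twice
    ["7907"],
    ["42731", "7907", "7907"],
]

# Binary encoding: one column per unique code, recording presence only.
mlb = MultiLabelBinarizer()
one_hot = mlb.fit_transform(histories)
vocabulary = [str(code) for code in mlb.classes_]  # sorted unique codes


def count_vectorize(history, vocabulary):
    """Return per-code occurrence counts in vocabulary order."""
    counts = Counter(history)
    return [counts.get(code, 0) for code in vocabulary]


# Count encoding: same columns, but repeated codes accumulate.
count_matrix = [count_vectorize(h, vocabulary) for h in histories]
```

For the first patient, the binary row marks code 4280 with a 1 while the count row holds a 2, which is the extra frequency signal the count representation is meant to carry.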
[0071] The preprocessing module may remove features that appear fewer than three times in the dataset to avoid overfitting and maintain dataset quality. In some cases, the preprocessing module removes features with correlation to the target variable below a threshold of 0.1 to eliminate irrelevant features. For example, the system may remove ICD-9 codes 03811 and 03812 associated with Methicillin from the dataset due to a known causal relationship with spinal abscess that could introduce confounding variables.
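The two filters can be sketched as below. The sketch assumes that "appearing" means a non-zero entry in a feature column and that the correlation is the absolute Pearson correlation with a binary target; neither detail is stated in the description, and the function name is hypothetical.

```python
import numpy as np


def filter_features(X, y, names, min_count=3, min_abs_corr=0.1):
    """Drop rare features and features weakly correlated with the target.

    X: 2-D array of feature values, y: 1-D target, names: column names.
    Returns the reduced matrix and the names of the kept columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    kept = []
    for j in range(X.shape[1]):
        column = X[:, j]
        if np.count_nonzero(column) < min_count:
            continue  # appears fewer than min_count times
        if column.std() == 0:
            continue  # constant column: correlation is undefined
        corr = np.corrcoef(column, y)[0, 1]
        if abs(corr) >= min_abs_corr:
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


# Illustrative data: one informative, one rare, one uncorrelated feature.
y = [1, 1, 1, 1, 0, 0, 0, 0]
X = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],  # informative
    [1, 1, 0, 0, 0, 0, 0, 0],  # present only twice
    [1, 1, 0, 0, 1, 1, 0, 0],  # balanced across classes
]).T
X_kept, kept_names = filter_features(X, y, ["strong", "rare", "uncorrelated"])
```

In practice such thresholds would be computed on training data only, so that the filter does not leak information from the testing sets.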
[0072] The system may calculate age at admission by subtracting date of birth from admission date. In some cases, the system filters out unrealistic ages greater than 120 years or less than 0 years. The preprocessing module may truncate diagnosis histories to the most recent 25 diagnoses based on distribution analysis when patients have more than the cutoff number.
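These three rules can be sketched with the standard library alone. The function names are hypothetical, ages are computed in whole years, and diagnosis histories are assumed to be ordered oldest first so that the most recent entries sit at the end of the list.

```python
from datetime import date

MAX_HISTORY = 25  # cutoff on the number of most recent diagnoses


def age_at_admission(date_of_birth, admission_date):
    """Age in whole years on the admission date."""
    years = admission_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the admission year.
    if (admission_date.month, admission_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def is_plausible_age(age):
    """Keep ages between 0 and 120 years inclusive."""
    return 0 <= age <= 120


def truncate_history(codes, limit=MAX_HISTORY):
    """Keep only the most recent `limit` diagnoses (list is oldest first)."""
    return codes[-limit:]


age = age_at_admission(date(1950, 6, 15), date(2010, 6, 14))
recent = truncate_history([f"code_{i}" for i in range(30)])
```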
[0073] The preprocessing module may impute missing values in laboratory result columns with the mean value of their respective columns. In some cases, the system drops columns with missing value percentages exceeding a 90% threshold due to limited contribution to analysis. The system may expand a positive cohort by including cases diagnosed within the same admission period to address small sample sizes.
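Column dropping and mean imputation could be sketched as follows, using plain Python lists with `None` marking a missing value. The data layout, the function name, and the lab names are illustrative assumptions; columns are assumed to be non-empty.

```python
def clean_lab_columns(columns, max_missing_fraction=0.9):
    """Drop mostly-empty lab columns, then mean-impute the rest.

    columns: dict mapping a lab name to a non-empty list of floats or None.
    Returns a new dict containing only the retained, fully populated columns.
    """
    cleaned = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if missing_fraction > max_missing_fraction:
            continue  # too sparse to contribute
        mean = sum(observed) / len(observed)
        cleaned[name] = [mean if v is None else v for v in values]
    return cleaned


labs = {
    "lab_a": [1.0, None, 3.0, None],   # 50% missing: kept and imputed
    "lab_b": [None] * 19 + [0.4],      # 95% missing: dropped
}
cleaned = clean_lab_columns(labs)
```

The column mean should be computed from training data and reused at prediction time, otherwise the imputed values depend on the records being scored.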
[0074] A training set 112 may be generated from the processed data 106 for training machine learning models. In some cases, the training set 112 contains balanced positive and negative cases derived from the diseased samples 108A and the non-diseased samples 108B. The system may limit the number of admissions considered for control cohorts, for example to the 5 most recent admissions, while keeping all admissions for positive cases due to class imbalance. In some cases, the system uses RandomUnderSampler to balance the dataset by setting negative sample size to 2 times the positive count.
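The 2:1 undersampling can be illustrated without extra dependencies. The helper below is a simplified stand-in for `RandomUnderSampler`, not its implementation; in imbalanced-learn the same ratio would likely be requested with `sampling_strategy=0.5`, which is stated here as an assumption about how the authors configured it.

```python
import random


def undersample_negatives(labels, ratio=2, seed=0):
    """Keep every positive and a random subset of negatives.

    Returns sorted row indices such that the number of negatives is at most
    `ratio` times the number of positives.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)  # fixed seed for reproducibility
    n_keep = min(len(negatives), ratio * len(positives))
    kept_negatives = rng.sample(negatives, n_keep)
    return sorted(positives + kept_negatives)


labels = [1] * 5 + [0] * 100
kept_indices = undersample_negatives(labels)
```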
[0075] The system includes a machine learning module comprising a plurality of trained models including at least one ensemble model, at least one deep learning model, and at least one large language model. The machine learning module is configured to analyze the processed electronic health record data to predict a likelihood of at least one severe diagnosis from a group consisting of acute myocardial infarction, acute pulmonary embolism, bacterial pneumonia, aortic dissection, meningitis, spinal abscess, bacterial endocarditis, and arterial thromboembolism.
[0076] A model 114 may receive the training set 112 and may process input features to generate predictions. In some cases, the model 114 includes a Random Forest Classifier that uses bagging to train incomplete decision trees and creates an ensemble model by taking a majority vote across all decision trees for final prediction output. The Random Forest Classifier may handle non-linearity, high dimensionality, and overfitting on classes with lower numbers of samples by utilizing bagging strategies to train incomplete decision trees and create an ensemble model of a large number of incomplete decision trees. When making a prediction, a vote may be taken across all decision trees and the prediction value with the highest number of votes may be predicted as the final output value.
[0077] The model 114 may use balanced class weights to penalize incorrect predictions for the minority class during training to account for class imbalance. In some cases, balanced class weights account for the relatively small number of positive cases in comparison with control cohorts.
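One common way to derive balanced class weights is the inverse-frequency heuristic that scikit-learn documents for `class_weight="balanced"`: each class receives weight n_samples / (n_classes × class_count). Whether the described system uses exactly this formula is an assumption; the sketch only shows how the weighting responds to imbalance.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# 10 positive cases against 90 controls.
weights = balanced_class_weights([1] * 10 + [0] * 90)
```

With this split, an error on a positive case is weighted nine times as heavily as an error on a control, which offsets the 1:9 class ratio.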
[0078] The ensemble model may comprise a stacking ensemble model that combines predictions from logistic regression, random forest, XGBoost, and support vector machine base models. In some cases, the ensemble model includes Naive Bayes, AdaBoost, and Extra Trees classifiers as additional base models in the stacking ensemble. The stacking ensemble architecture may combine predictions from multiple base models to produce final classification results.
[0079] The XGBoost model may handle imbalanced datasets by assigning different weights to positive and negative classes and may build models sequentially where each new model improves upon the preceding one. In some cases, the XGBoost model uses gradient boosting techniques to optimize prediction accuracy across iterations.
[0080] A meta model 116 may receive predictions from the model 114 and may combine predictions from multiple base models to produce final classification results. In some cases, the stacking ensemble model uses logistic regression as a meta-learner to combine the base model predictions. The meta model 116 may comprise a logistic regression model trained to combine the base model predictions from logistic regression, random forest, XGBoost, and support vector machine base models. The meta model 116 may perform stacking operations to optimize prediction accuracy by learning weights for combining base model outputs.
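A stacking arrangement of this kind can be sketched with scikit-learn's `StackingClassifier`. The data are synthetic, the hyperparameters are arbitrary, and `GradientBoostingClassifier` is swapped in for XGBoost purely to keep the example free of extra dependencies; none of these choices are taken from the disclosure.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for preprocessed features with a non-linear label rule.
X = rng.normal(size=(300, 6))
y = ((X[:, 0] + X[:, 1] ** 2 + 0.3 * rng.normal(size=300)) > 1).astype(int)

base_models = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("boosting", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
    ("svm", SVC()),  # contributes its decision_function output to the meta-learner
]

# Logistic regression meta-learner trained on out-of-fold base model outputs.
stack = StackingClassifier(
    estimators=base_models, final_estimator=LogisticRegression(), cv=5
)
stack.fit(X[:240], y[:240])

risk_scores = stack.predict_proba(X[240:])[:, 1]
# One learned weight per base model, in the order the models were listed.
names = [name for name, _ in base_models]
meta_weights = dict(zip(names, stack.final_estimator_.coef_[0]))
```

Inspecting `meta_weights` shows how strongly the meta-learner relies on each base model, which is the interpretability benefit attributed to a logistic regression meta-learner.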
[0081] In a method for early prediction of missed severe diagnoses in intensive care unit patients, the ensemble machine learning model may comprise a stacking ensemble that combines predictions from logistic regression, random forest, XGBoost, and support vector machine base models. In some cases, the stacking ensemble uses logistic regression as a meta-learner to combine the base model predictions.
[0082] In a non-transitory computer-readable storage medium storing instructions for predicting missed severe diagnoses, the ensemble model may comprise a stacking ensemble model that combines predictions from logistic regression, random forest, XGBoost, and support vector machine base models using a meta-learner. In some cases, the meta-learner comprises a logistic regression model trained to combine the base model predictions.
[0083] The system may generate outputs 120 from the meta model 116 that may include performance metrics for evaluating model effectiveness. In some cases, the outputs 120 include area under the curve scores, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score metrics. The outputs 120 may include receiver operating characteristic curves that visualize model performance across different classification thresholds. In some cases, the outputs 120 provide interpretable results that support clinical decision-making for early prediction of missed severe diagnoses in intensive care unit settings, including risk scores indicating probability of specific conditions such as bacterial endocarditis or pulmonary embolism, ranked lists of contributing factors such as prior diagnoses of atrial fibrillation or heart failure that elevated the patient's risk assessment, recommended diagnostic tests or imaging studies based on the predicted conditions, and prioritized alerts that enable clinicians to focus attention on patients with highest likelihood of missed severe diagnoses.
[0084] The first testing set 110A and the second testing set 110B may be used for evaluation of the trained model 114 and the meta model 116. In some cases, the system performs evaluation using the first testing set 110A to assess initial model performance and the second testing set 110B to evaluate model performance under different conditions with specific risk features in non-diseased samples.
[0085] Referring to FIG. 2, a method 200 for early prediction of missed severe diagnoses in intensive care unit patients is illustrated. The method 200 may begin at step 202 with receiving electronic health record data for a patient. Step 204 may comprise preprocessing the electronic health record data by cleaning missing values, encoding categorical variables, and extracting relevant features. Step 206 may comprise applying a trained ensemble machine learning model to the preprocessed data to generate a prediction score indicating a likelihood of the patient having at least one severe diagnosis. Step 208 may comprise generating an explainable output comprising feature importance rankings and outputting the prediction score and explainable output to support clinical decision-making.
[0086] Still referring to FIG. 2, and in greater detail, the method may begin by receiving electronic health record data for a patient (step 202). In some cases, the data includes at least patient demographics, vital signs, laboratory results, and diagnostic codes. The step 202 may receive data from the raw EHR data 102 including ICU admission records, lab events, and other data available at admission time. In some cases, the step 202 receives ICD-9 diagnostic codes that identify patient conditions and procedures performed during hospital admissions.
[0087] With continued reference to FIG. 2, a step 204 may comprise preprocessing the electronic health record data by cleaning missing values, encoding categorical variables, and extracting relevant features. In some cases, the step 204 performs operations corresponding to the data preprocessing 104 described with reference to FIG. 1. The step 204 may handle missing values by imputing mean values for laboratory result columns or removing records with incomplete data. In some cases, the step 204 encodes categorical variables including gender, admission type, and ethnicity using binary encoding or one-hot encoding techniques. The step 204 may extract relevant features from the electronic health record data including ICD-9 code histories, patient age at admission, and procedure codes.
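Two of the preprocessing operations of step 204, mean imputation and one-hot encoding, can be sketched in plain Python as follows. The column names and values are hypothetical, and a production pipeline would more likely rely on a data-frame library; this sketch only shows the logic.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def one_hot(values):
    """One-hot encode a categorical column; returns (categories, rows)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical laboratory column with one missing value.
lactate = [1.0, None, 3.0, 2.0]
# Hypothetical admission-type column.
admission_type = ["EMERGENCY", "ELECTIVE", "EMERGENCY", "URGENT"]

lactate_filled = impute_mean(lactate)            # missing entry becomes 2.0
categories, encoded = one_hot(admission_type)    # one column per category
```

To avoid information leakage, the imputation mean would normally be computed on the training split only and then reused for the testing sets.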
[0088] The step 204 may generate the processed data 106 that is prepared for input to trained machine learning models. In some cases, the step 204 separates the processed data 106 into the diseased samples 108A and the non-diseased samples 108B based on presence or absence of target diagnoses.
[0089] A step 206 may comprise applying a trained ensemble machine learning model to the preprocessed data to generate a prediction score indicating a likelihood of the patient having at least one severe diagnosis. In some cases, the severe diagnosis is selected from acute myocardial infarction, acute pulmonary embolism, bacterial pneumonia, aortic dissection, meningitis, spinal abscess, bacterial endocarditis, and arterial thromboembolism. The step 206 may apply the model 114 and the meta model 116 to generate prediction scores based on the preprocessed electronic health record data.
[0090] The method 200 may further comprise performing 5-fold cross-validation during model training to ensure robust performance across different data subsets. In some cases, the training process divides the training set 112 into five folds, using four folds for training and the remaining fold for testing in each iteration. The 5-fold cross-validation process may be repeated five times with each fold being tested once to evaluate model consistency across different data partitions.
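The fold rotation described above can be sketched as an index generator. This is a minimal illustration without shuffling or stratification, both of which would be reasonable additions for an imbalanced cohort but are not specified in the disclosure; the function name and sample count are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical training set of 23 records split into five folds.
folds = list(k_fold_indices(23, k=5))
for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"Fold {i}: {len(train_idx)} training records, {len(test_idx)} testing records")
```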
[0091] A step 208 may comprise generating an explainable output comprising feature importance rankings that identify which patient characteristics most contributed to the prediction. In some cases, the step 208 generates the outputs 120 including performance metrics and interpretable explanations. The step 208 may output the prediction score and explainable output to support clinical decision-making.
[0092] The method 200 may generate confusion matrices showing predicted versus true labels for model evaluation. In some cases, the confusion matrices display the number of true positive, true negative, false positive, and false negative predictions for each classification threshold. The system may calculate negative predictive value and positive predictive value in addition to sensitivity, specificity, and area under the curve for comprehensive model evaluation.
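The evaluation metrics named above follow directly from the four confusion-matrix counts. The sketch below uses hypothetical labels and predictions at a single classification threshold and assumes every denominator is non-zero.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = diseased)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def evaluation_metrics(tp, tn, fp, fn):
    """Derive the threshold-dependent metrics from the confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical ground truth and thresholded predictions for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
metrics = evaluation_metrics(tp, tn, fp, fn)
```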
[0093] The method 200 may calculate 95% confidence intervals for each evaluation metric to estimate variability of model performance. In some cases, the confidence intervals provide statistical bounds on the expected range of performance metrics across different patient populations or data samples. The outputs 120 may include the confidence intervals along with point estimates for sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve scores.
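The disclosure does not specify how the 95% confidence intervals are computed. For proportion-type metrics such as sensitivity or specificity, one possible choice is the Wilson score interval, sketched below with hypothetical counts. An interval for the area under the curve would need a different method, for example bootstrapping.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = successes / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Hypothetical example: 45 of 50 diseased patients correctly flagged (sensitivity 0.90).
low, high = wilson_interval(45, 50)
print(f"sensitivity 0.90, 95% CI [{low:.3f}, {high:.3f}]")
```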
[0094] Referring to FIG. 3, and in brief overview, a multi-layer architecture for an explainable artificial intelligence system for early prediction of missed severe diagnoses in intensive care unit patients is illustrated. The architecture may receive input data through a data collection layer 302, which may be subsequently processed through a feature extraction layer 304 to generate unified feature representations. The unified feature representations may be prepared by a preprocessing layer 306, which may perform missing data filtering, normalization, feature selection, and feature weighting operations. A prediction layer 308 may receive the preprocessed features and may generate disease predictions through a machine learning model. The prediction layer 308 may output classifications indicating disease absence or disease presence, and may provide interpretable results to an ontology component that interfaces with clinicians and generates recommended items or activities and emergency alerts based on the prediction results.
[0095] Still referring to FIG. 3, and in greater detail, data collection layer 302 may obtain input data regarding a patient. As shown in FIG. 3, input data may be received from two primary sources. In some cases, the data collection layer 302 receives sensor data including signals from multiple sensors that capture measurements such as electroencephalogram, electrocardiogram, electromyogram, activity, heart rate, blood sugar, blood pressure, oxygen saturation, respiration rate, and body temperature. The data collection layer 302 may receive EMR data including questions and observations, medical history, and laboratory tests. In some cases, the medical history encompasses smoking history, diabetes history, and medication history. The laboratory tests may include CT scans, echo reports, X-ray reports, MRI scans, and blood tests. The data collection layer 302 may obtain raw EHR data 102 described with reference to FIG. 1.
[0096] With continued reference to FIG. 3, a feature extraction layer 304 may perform feature fusion and extraction operations on the data obtained by the data collection layer 302. In some cases, the feature extraction layer 304 performs a feature fusion operation that combines features with corresponding signals from the sensor data. As shown in FIG. 3, the feature extraction layer 304 may perform a Framingham risk factors extraction operation that processes the EMR data to generate extracted facts with corresponding features. In some cases, the feature extraction layer 304 merges the fused sensor features with the extracted EMR facts to create a unified feature representation for subsequent processing.
[0097] The system may include a neural network model with an embedding layer that may transform categorical diagnosis sequences into dense vectors. In some cases, the dense vectors are flattened and concatenated with clinical and demographic inputs to form a combined feature representation. The embedding layer may convert ICD-9 diagnostic code sequences into continuous vector representations that capture semantic relationships between different diagnosis codes.
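The embed, flatten, and concatenate sequence described above can be sketched as follows. The embedding table, code indices, and clinical inputs are hypothetical; in a trained network the table entries would be learned parameters.

```python
def embed_and_concat(code_ids, embedding_table, clinical_features):
    """Look up one dense vector per diagnosis code, flatten, then append other inputs."""
    vectors = [embedding_table[i] for i in code_ids]
    flat = [x for vec in vectors for x in vec]
    return flat + list(clinical_features)

# Hypothetical table: 4 code indices x 2 dimensions (index 0 reserved for padding).
embedding_table = [
    [0.0, 0.0],
    [0.1, -0.3],
    [0.7, 0.2],
    [-0.5, 0.4],
]
code_ids = [2, 3, 0]    # a padded sequence of two diagnosis codes
clinical = [0.65, 1.0]  # e.g. scaled age and a binary demographic flag

combined = embed_and_concat(code_ids, embedding_table, clinical)
print(combined)
```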
[0098] A preprocessing layer 306 may perform data preparation operations on the data. In some cases, the preprocessing layer 306 performs missing data filtering to handle incomplete records in the electronic health record data. The preprocessing layer 306 may perform normalization to standardize feature values across different measurement scales. In some cases, the preprocessing layer 306 performs feature selection to identify relevant predictors from the unified feature representation. The preprocessing layer 306 may perform feature weighting to assign importance scores to different features based on their predictive value. The preprocessing layer 306 may correspond to the data preprocessing 104 described with reference to FIG. 1.
[0099] A prediction layer 308 may receive preprocessed features from the preprocessing layer 306. In some cases, the prediction layer 308 includes a machine learning model that processes the preprocessed features to generate predictions. The prediction layer 308 may output two possible classifications indicating disease absence and disease presence for the target severe diagnoses.
[0100] The deep learning model may comprise at least one of a Multi-Layer Perceptron network and a Long Short-Term Memory network. In some cases, the Multi-Layer Perceptron network processes the merged features through two fully connected layers with ReLU activations followed by a sigmoid-activated output layer for binary classification. The Long Short-Term Memory network may be configured to process sequential patient data including time-series vital signs and sequential diagnostic code histories.
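A forward pass through the Multi-Layer Perceptron structure described above (two fully connected ReLU layers followed by a sigmoid output) can be sketched as follows. The layer sizes and all parameter values are hypothetical and chosen only to keep the example small.

```python
import math

def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def mlp_forward(x, params):
    """Two ReLU hidden layers followed by a single sigmoid output unit."""
    h1 = relu(dense(x, params["W1"], params["b1"]))
    h2 = relu(dense(h1, params["W2"], params["b2"]))
    logit = dense(h2, params["W3"], params["b3"])[0]
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameters: 3 inputs -> 2 hidden units -> 2 hidden units -> 1 output.
params = {
    "W1": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], "b1": [0.0, 0.1],
    "W2": [[1.0, -1.0], [0.5, 0.5]], "b2": [0.0, 0.0],
    "W3": [[1.0, -2.0]], "b3": [0.0],
}
probability = mlp_forward([1.0, 2.0, 3.0], params)
print(f"predicted probability of disease presence: {probability:.3f}")
```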
[0101] The neural network model may be trained using the Adam optimizer and a binary cross-entropy loss function. In some cases, the Adam optimizer adjusts learning rates adaptively during training to improve convergence. The binary cross-entropy loss function may measure the difference between predicted probabilities and actual binary labels for disease presence or absence.
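The binary cross-entropy loss can be written out directly. The sketch below averages the per-sample loss and clamps probabilities away from 0 and 1 to avoid taking the logarithm of zero; the clamping constant and the example values are assumptions for illustration.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels with confident and less confident predictions.
labels = [1, 0]
confident_loss = binary_cross_entropy(labels, [0.9, 0.1])
uncertain_loss = binary_cross_entropy(labels, [0.6, 0.4])
print(confident_loss < uncertain_loss)  # confident correct predictions give lower loss
```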
[0102] In a non-transitory computer-readable storage medium storing instructions for predicting missed severe diagnoses, the deep learning model may comprise at least one of a Multi-Layer Perceptron network and a Long Short-Term Memory network configured to process sequential patient data. In some cases, the Long Short-Term Memory network captures temporal dependencies in patient medical histories to improve prediction accuracy for conditions that develop over time.
[0103] The prediction layer 308 may provide prediction results to an ontology component that interfaces with a clinician representation. In some cases, the ontology component generates outputs including recommended items or activities and emergency alerts based on the prediction results. The prediction layer 308 may provide interpretable outputs to support clinical decision-making for early identification of missed severe diagnoses in intensive care unit settings. The outputs from the prediction layer 308 may correspond to the outputs 120 described with reference to FIG. 1.
[0104] Referring to FIG. 4, a block diagram illustrates a computer system 10 that may be used to implement the explainable artificial intelligence system for early prediction of missed severe diagnoses in intensive care unit patients. The computer system 10 may include a computer system/server 12 that may contain several interconnected components for executing machine learning models and processing electronic health record data.
[0105] The computer system/server 12 may include a processing unit 16 that may execute instructions and manage computational operations for the machine learning module and explainability framework. In some cases, the processing unit 16 comprises one or more processors configured to execute program instructions for training and deploying machine learning models. The processing unit 16 may execute operations for accessing electronic health record data for intensive care unit patients, the data including patient demographics, vital signs, laboratory results, and diagnostic codes.
[0106] A system bus 18 may provide communication pathways between the various components of the computer system/server 12. In some cases, the system bus 18 facilitates data transfer between the processing unit 16 and other system components including memory and storage devices. The system bus 18 may enable high-speed communication for real-time processing of patient data during prediction operations.
[0107] With continued reference to FIG. 4, a memory 28 may connect to the system bus 18 and may provide data storage capabilities for program execution. The memory 28 may include RAM 30 for temporary data storage during program execution and model inference operations. In some cases, the RAM 30 stores intermediate results during feature extraction and prediction score calculation. The memory 28 may include a cache 32 for storing frequently accessed data to improve processing speed during iterative model training operations.
[0108] A storage system 34 may be connected to the system bus 18 and may provide persistent data storage capabilities for the computer system/server 12. The storage system 34 may include a storage medium 40 for storing data and executable code. In some cases, the storage medium 40 comprises a non-transitory computer-readable storage medium storing instructions that, when executed by the processing unit 16, cause the processing unit 16 to perform operations for predicting missed severe diagnoses.
[0109] The storage system 34 may store program modules 42 that may contain executable instructions implementing the machine learning models, data preprocessing operations, and explainability framework. In some cases, the program modules 42 include instructions for training a plurality of machine learning models including at least one ensemble model and at least one deep learning model using the electronic health record data. The program modules 42 may include instructions wherein the models are configured to predict missed severe diagnoses comprising acute myocardial infarction, acute pulmonary embolism, bacterial pneumonia, aortic dissection, meningitis, spinal abscess, bacterial endocarditis, and arterial thromboembolism. In some cases, the program modules 42 include instructions for deploying the trained models and explainability framework to provide real-time prediction and explanation capabilities for clinical use.
[0110] The computer system/server 12 may include I/O interfaces 22 connected to the system bus 18. In some cases, the I/O interfaces 22 enable communication with external devices 14 and a display 24. The external devices 14 may include input devices, additional storage, or other peripheral equipment for data entry and system configuration. The display 24 may provide visual output for presenting prediction results, feature importance rankings, and other information to clinicians.
[0111] A network adapter 20 may connect to the system bus 18 and may enable the computer system/server 12 to communicate with external networks and systems. In some cases, the network adapter 20 facilitates integration with electronic health record systems and enables real-time data exchange for patient risk assessment. The system may further comprise an integration interface configured to integrate with electronic health record systems and provide real-time risk assessment alerts to clinicians through the network adapter 20.
[0112] The system may use a missingno library to visualize missing data in the electronic health record data. In some cases, the missingno library generates heatmap representations showing correlations between missing values across different features. The missingno library may generate matrix representations displaying patterns of missing data across patient records. In some cases, the missingno library generates bar chart representations showing the percentage of missing values for each feature column to support data quality assessment during the data preprocessing 104 operations.
[0113] Referring to FIG. 5, receiver operating characteristic curves illustrate performance of three machine learning models evaluated using 5-fold cross-validation for myocardial infarction prediction. The figure may comprise three panels arranged horizontally, with each panel showing performance of a different classifier across five validation folds. The receiver operating characteristic curves may demonstrate discriminative ability of each model by plotting True Positive Rate on the vertical axis against False Positive Rate on the horizontal axis.
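A receiver operating characteristic curve and its area can be computed by sweeping the classification threshold over the predicted scores, as sketched below. The labels and scores are a small hypothetical example, and the area is obtained with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct score threshold, starting at (0, 0)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels (1 = diseased) and predicted scores for four patients.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
auc = area_under_curve(points)
print(points, auc)
```

Because each distinct score contributes one point, a curve computed from a small testing set has a visibly stepped shape.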
[0114] The left panel of FIG. 5 may display receiver operating characteristic curves for a Random Forest classifier across five folds. In some cases, the Random Forest classifier achieves area under the curve values ranging from 0.7735 for Fold 4 to 0.9163 for Fold 2. Fold 1 may achieve an area under the curve of 0.8299, Fold 3 may achieve an area under the curve of 0.8989, and Fold 5 may achieve an area under the curve of 0.9070. The curves may demonstrate model ability to discriminate between positive and negative cases, with all folds performing above a diagonal reference line representing random classification with an area under the curve of 0.5.
[0115] The system may use grid search methodology to optimize hyperparameters for Random Forest models. In some cases, the grid search explores values for parameters including n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features, bootstrap, oob_score, and criterion. The grid search methodology may select a combination of parameters that optimizes model performance across evaluation metrics. In some cases, the model training tracks out-of-bag error across increasing numbers of estimators from 10 to 200 in increments of 10 to determine model complexity. The out-of-bag error tracking may identify a number of estimators that balances model accuracy with computational efficiency.
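The estimator schedule described above (10 to 200 in increments of 10) and one possible selection rule can be sketched as follows. The selection rule, its tolerance, and the out-of-bag error values are hypothetical placeholders for illustration; they are not results or methods stated in the disclosure.

```python
def estimator_schedule(start=10, stop=200, step=10):
    """Estimator counts to evaluate: 10, 20, ..., 200."""
    return list(range(start, stop + 1, step))

def pick_n_estimators(oob_errors, tolerance=0.004):
    """Smallest estimator count whose OOB error is within `tolerance` of the best."""
    best = min(oob_errors.values())
    return min(n for n, err in oob_errors.items() if err <= best + tolerance)

schedule = estimator_schedule()
# Synthetic, monotonically decreasing error curve used only to exercise the helper.
oob_errors = {n: 0.20 + 1.0 / n for n in schedule}
chosen = pick_n_estimators(oob_errors)
print(f"{len(schedule)} candidate settings, chosen n_estimators = {chosen}")
```

Preferring the smallest count within a tolerance of the best error is one way to trade a negligible loss in accuracy for lower training and inference cost.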
[0116] With continued reference to FIG. 5, the center panel may present receiver operating characteristic curves for a LightGBM classifier across the same five folds. In some cases, the LightGBM classifier achieves area under the curve values of 0.8295 for Fold 1, 0.8480 for Fold 2, 0.8773 for Fold 3, 0.7419 for Fold 4, and 0.8441 for Fold 5. The curves may indicate consistent performance across folds with some variability, particularly in Fold 4 which may show lower discriminative ability compared to other folds.
[0117] The right panel of FIG. 5 may illustrate receiver operating characteristic curves for a Deep Neural Network classifier evaluated across five folds. In some cases, the Deep Neural Network classifier achieves area under the curve values ranging from 0.7809 for Fold 1 to 0.8831 for Fold 5. Fold 2 may achieve an area under the curve of 0.8750, Fold 3 may achieve an area under the curve of 0.8329, and Fold 4 may achieve an area under the curve of 0.7963. Each panel may include a dashed diagonal line representing performance of a random classifier with an area under the curve of 0.5.
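The per-fold area under the curve values reported for FIG. 5 can be summarized with a mean and a sample standard deviation, which makes the fold-to-fold variability easier to compare across the three classifiers. The sketch below uses only the fold values stated above; the summary statistics are derived here and are not figures from the disclosure.

```python
from statistics import mean, stdev

# Per-fold AUC values (Fold 1 through Fold 5) as reported for FIG. 5.
fold_auc = {
    "Random Forest": [0.8299, 0.9163, 0.8989, 0.7735, 0.9070],
    "LightGBM": [0.8295, 0.8480, 0.8773, 0.7419, 0.8441],
    "Deep Neural Network": [0.7809, 0.8750, 0.8329, 0.7963, 0.8831],
}

summary = {name: (mean(values), stdev(values)) for name, values in fold_auc.items()}
for name, (m, s) in summary.items():
    print(f"{name}: mean AUC {m:.4f}, sample SD {s:.4f}")
```

The mean values work out to approximately 0.8651 for Random Forest, 0.8282 for LightGBM, and 0.8336 for the Deep Neural Network.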
[0118] The system may generate receiver operating characteristic curves displaying model performance across 5-fold cross-validation with area under the curve scores for each fold. In some cases, the trained models generate performance metrics including area under the curve scores exceeding 0.90 for at least one of the severe diagnoses. The 5-fold cross-validation process may divide training data into five folds, using four folds for training and the remaining fold for testing in each iteration. In some cases, the 5-fold cross-validation process is repeated five times with each fold being tested once to evaluate model consistency across different data partitions.
[0119] In a non-transitory computer-readable storage medium storing instructions for predicting missed severe diagnoses, the operations may further comprise performing 5-fold cross-validation during model training. In some cases, the trained models stored on the non-transitory computer-readable storage medium generate performance metrics including area under the curve scores exceeding 0.90 for at least one of the severe diagnoses.
[0120] The large language model may comprise ClinicalBERT fine-tuned on clinical text data. In some cases, the ClinicalBERT model is fine-tuned with a learning rate of 2e-5 and trained for 7 epochs. The ClinicalBERT model may use a batch size of 16 to balance computational efficiency and effective learning. In some cases, the maximum sequence length is set to 128 tokens, allowing the model to process relevant medical information efficiently. The ClinicalBERT model may use gradient accumulation with 2 steps, effectively increasing the batch size without overloading memory. In some cases, mixed-precision fp16 training is applied to reduce memory consumption and speed up training. The ClinicalBERT model may implement gradient clipping with a max gradient norm of 1.0 to avoid gradient explosion during backpropagation.
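The fine-tuning settings listed above can be collected in a configuration, from which the effective batch size and the gradient clipping behavior follow. The dictionary keys and the helper function below are hypothetical names chosen for illustration; only the numeric values come from the description.

```python
import math

# Hyperparameters as stated in the description; key names are illustrative.
config = {
    "learning_rate": 2e-5,
    "epochs": 7,
    "batch_size": 16,
    "max_seq_length": 128,
    "gradient_accumulation_steps": 2,
    "fp16": True,
    "max_grad_norm": 1.0,
}

# Gradients from 2 mini-batches of 16 are summed before each optimizer step.
effective_batch_size = config["batch_size"] * config["gradient_accumulation_steps"]

def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

# A hypothetical gradient of norm 5 is scaled down to norm 1.
clipped = clip_by_global_norm([3.0, 4.0], config["max_grad_norm"])
print(effective_batch_size, clipped)
```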
[0121] The ClinicalBERT model may use AutoTokenizer to tokenize raw ICD-9 codes in patient history for input processing. In some cases, the AutoTokenizer converts ICD-9 diagnostic code sequences into token representations suitable for input to the ClinicalBERT model. The tokenized ICD-9 codes may be processed through the fine-tuned ClinicalBERT model to generate contextualized embeddings for each token in the input sequence.
[0122] The system may use L2 regularization with a penalty parameter for logistic regression models. In some cases, a C parameter is set to 0.23357214690901212 to control regularization strength and balance underfitting and overfitting. The logistic regression model may use the saga solver, which is suited for large datasets, with max_iter set to 1000 to allow model convergence. In some cases, the saga solver adjusts model weights iteratively until convergence criteria are satisfied or the maximum number of iterations is reached.
[0123] The system may use a Decision Tree Classifier as an alternative model for binary classification with feature importance extraction. In some cases, the Decision Tree Classifier provides interpretable feature importance rankings that identify which patient characteristics contribute to predictions. The Decision Tree Classifier may be trained using the same training data and evaluated using the same cross-validation methodology as other models in the machine learning module.
[0124] Referring to FIG. 6, receiver operating characteristic curves illustrate performance of the machine learning model evaluated on two separate testing sets for prediction of missed severe diagnoses in intensive care unit patients. The figure may comprise two panels, with each panel showing the relationship between True Positive Rate on the vertical axis and False Positive Rate on the horizontal axis for a respective testing set.
[0125] The left panel of FIG. 6 may display the receiver operating characteristic curve for Testing Set 1. In some cases, the curve rises steeply from the origin and reaches a True Positive Rate of approximately 0.95 at a False Positive Rate near 0.05. The curve may continue along the upper portion of the plot, demonstrating strong discriminative ability. Testing Set 1 may achieve an area under the curve of 0.9654, indicating the model's ability to distinguish between positive and negative cases with high accuracy. A dashed diagonal line may extend from the origin to the upper right corner, representing performance of a random classifier with an area under the curve of 0.5.
[0126] With continued reference to FIG. 6, the right panel may present the receiver operating characteristic curve for Testing Set 2. In some cases, the curve demonstrates a stepped pattern as the curve rises from the origin, reaching a True Positive Rate of approximately 0.85 at a False Positive Rate near 0.1. The curve may continue to approach the upper left region of the plot before reaching the maximum True Positive Rate. Testing Set 2 may achieve an area under the curve of 0.9490, demonstrating strong discriminative ability with the curve positioned substantially above the diagonal reference line. A dashed diagonal reference line may be present in the right panel, representing random classification performance with an area under the curve of 0.5.
[0127] The stepped appearance of both curves in FIG. 6 may reflect the discrete nature of classification thresholds evaluated during model assessment. In some cases, the performance metrics indicate that the model maintains high predictive accuracy across both testing sets, with Testing Set 1 showing slightly higher discriminative performance compared to Testing Set 2.
[0128] The system may perform a three-phase feature engineering approach to optimize prediction performance. In some cases, Phase 1 of the feature engineering approach excludes an emergency flag feature from the feature set during initial model training and evaluation. Phase 2 of the feature engineering approach may remove ICD9 code 444 from the feature set to assess impact on model performance. In some cases, Phase 3 of the feature engineering approach reintroduces both the emergency flag feature and ICD9 code 444 to the feature set for comparison with earlier phases. The three-phase feature engineering approach may identify combinations of features that provide improved prediction performance compared to using all features without systematic evaluation.
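The three feature sets can be constructed as sketched below. The sketch assumes that Phase 2 builds on Phase 1, so that both the emergency flag and ICD9 code 444 are absent in Phase 2, which is one reading of the statement that Phase 3 reintroduces both. The feature names are hypothetical.

```python
def phase_feature_sets(all_features, emergency_flag, icd_code):
    """Build the feature list used in each of the three phases."""
    phase1 = [f for f in all_features if f != emergency_flag]  # emergency flag excluded
    phase2 = [f for f in phase1 if f != icd_code]               # ICD9 code 444 also removed
    phase3 = list(all_features)                                 # both reintroduced
    return {"phase1": phase1, "phase2": phase2, "phase3": phase3}

# Hypothetical feature names for illustration.
all_features = ["AGE", "ADMISSION_TYPE_BINARY", "ICD9_444", "ICD9_42731"]
phases = phase_feature_sets(all_features, "ADMISSION_TYPE_BINARY", "ICD9_444")
for name, features in phases.items():
    print(name, features)
```

Training and evaluating the same model on each feature set then allows the area under the curve scores to be compared phase by phase.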
[0129] The system may include an ADMISSION TYPE BINARY feature indicating whether an admission was an emergency type. In some cases, the ADMISSION TYPE BINARY feature is derived from the admission type categorical variable by encoding emergency admissions as a binary indicator. The ADMISSION TYPE BINARY feature may contribute moderately to prediction when included in the feature set during model training. In some cases, the contribution of the ADMISSION TYPE BINARY feature is evaluated across the three phases of feature engineering to determine the feature's impact on overall model discriminative ability as reflected in the area under the curve scores for Testing Set 1 and Testing Set 2.
[0130] Referring to FIG. 7A, a SHAP value summary plot illustrates feature importance rankings for predicting missed severe diagnoses in intensive care unit patients. The SHAP value summary plot may display features ranked by importance along a vertical axis, with SHAP values indicating impact on model output along a horizontal axis. A color gradient may indicate feature values from low to high, with scattered data points for each feature extending horizontally to show the magnitude and direction of SHAP values on model predictions.
[0131] The system includes an explainability module configured to generate feature importance rankings and provide interpretable explanations for the predictions made by the machine learning module. In some cases, the explainability module generates SHAP value summary plots that visualize how different feature values influence model predictions in positive or negative directions. The explainability module may implement an explainability framework that generates interpretable feature importance scores for predictions made by the trained models.
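A SHAP summary plot is commonly ordered by the mean absolute SHAP value of each feature across samples. The following sketch shows that ordering step only, using a hypothetical matrix of already computed SHAP values; computing the SHAP values themselves is outside the scope of the sketch.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """Order features by mean |SHAP value| across samples, largest first."""
    n = len(shap_values)
    scores = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values: one row per patient, one column per feature.
feature_names = ["9904", "42731", "3893"]
shap_values = [
    [-0.10, 0.02, 0.40],
    [0.20, -0.01, -0.30],
    [0.15, 0.00, 0.50],
]
ranking = rank_by_mean_abs_shap(shap_values, feature_names)
print(ranking)
```

The sign of each individual SHAP value, which this ranking discards, indicates whether the feature pushed that patient's prediction toward disease presence or absence.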
[0132] An ICD-9 diagnostic code feature 3893 may appear at the top of the ranking in FIG. 7A, indicating the feature has the highest predictive importance for the target diagnosis. In some cases, the ICD-9 diagnostic code feature 3893 corresponds to insertion of cochlear implant as a procedure. An ICD-9 diagnostic code feature 9904 may be positioned second in the ranking and may correspond to transfusion of packed cells as a procedure. An ICD-9 diagnostic code feature 3961 may appear third in the ranking and may correspond to extracorporeal circulation auxiliary to open heart surgery as a procedure.
[0133] With continued reference to FIG. 7A, an ICD-9 diagnostic code feature 9671 and an acute respiratory failure code 51881 may be positioned in the upper portion of the plot. In some cases, the ICD-9 diagnostic code feature 9671 corresponds to continuous invasive mechanical ventilation for less than 96 consecutive hours as a procedure. The acute respiratory failure code 51881 may be identified as a highly prevalent code in pneumonia patient cohorts along with acute kidney failure code 5849 and septicemia code 0389.
[0134] An ICD-9 diagnostic code feature 8856 and an ICD-9 diagnostic code feature 3995 may appear in the middle-upper region of the ranking. In some cases, the ICD-9 diagnostic code feature 3995 corresponds to hemodialysis as a procedure. An ICD-9 diagnostic code feature 9604 and an ICD-9 diagnostic code feature 966 may follow below in the ranking. The ICD-9 diagnostic code feature 9604 may correspond to insertion of endotracheal tube as a procedure, and the ICD-9 diagnostic code feature 966 may correspond to enteral infusion of concentrated nutritional substances as a procedure.
[0135] An ICD-9 diagnostic code feature 4513 and an ICD-9 diagnostic code feature 3615 may be displayed in the central portion of the plot. In some cases, the ICD-9 diagnostic code feature 3615 corresponds to single internal mammary-coronary artery bypass as a procedure. An ICD-9 diagnostic code feature 3950 and an ICD-9 diagnostic code feature 3929 may appear below these features in the ranking.
[0136] An ICD-9 diagnostic code feature 2720 and an ICD-9 diagnostic code feature v053 may be positioned in the lower-middle region of the ranking. In some cases, the ICD-9 diagnostic code feature v053 corresponds to need for prophylactic vaccination and inoculation against viral hepatitis as a diagnosis. The feature importance analysis may identify vaccination for viral hepatitis as a top predictive feature for spinal abscess along with hypertension, adjustment disorder, diabetes mellitus, and osteomyelitis.
[0137] An ICD-9 diagnostic code feature 2859 and an atrial fibrillation code 42731 may appear in the lower portion of the plot in FIG. 7A. A diabetes mellitus code 25000 may be positioned near the bottom of the ranking, followed by an ICD-9 diagnostic code feature 2724 and an ICD-9 diagnostic code feature 2762 at the lowest positions. In some cases, the feature importance analysis identifies diabetes mellitus as a top predictive feature for spinal abscess.
[0138] Referring to FIG. 7B, a table may list ICD codes and corresponding descriptions for medical procedures and diagnoses. The table may provide interpretable explanations for the ICD-9 diagnostic code features displayed in FIG. 7A. In some cases, the table enables clinicians to understand the clinical meaning of features that contribute to model predictions.
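A lookup of the kind FIG. 7B describes can be sketched as a plain mapping from code to description. The snippet below is only an illustration, not the disclosed implementation: its entries are limited to codes whose meanings are stated in this description, and the names `ICD9_DESCRIPTIONS` and `describe_features` are hypothetical.

```python
# Hypothetical lookup table; entries are limited to codes described in the text.
ICD9_DESCRIPTIONS = {
    "9671": "Continuous invasive mechanical ventilation for less than 96 consecutive hours (procedure)",
    "3995": "Hemodialysis (procedure)",
    "9604": "Insertion of endotracheal tube (procedure)",
    "966": "Enteral infusion of concentrated nutritional substances (procedure)",
    "3615": "Single internal mammary-coronary artery bypass (procedure)",
    "V053": "Need for prophylactic vaccination and inoculation against viral hepatitis (diagnosis)",
    "51881": "Acute respiratory failure",
    "5849": "Acute kidney failure",
    "0389": "Septicemia",
    "42731": "Atrial fibrillation",
    "25000": "Diabetes mellitus",
}


def describe_features(codes):
    """Pair each code with its description, preserving the input (ranking) order."""
    rows = []
    for code in codes:
        key = str(code).upper()  # the text writes "v053"; normalize the case
        rows.append((key, ICD9_DESCRIPTIONS.get(key, "description not available")))
    return rows


for code, text in describe_features(["9671", "v053", "8856"]):
    print(f"{code}: {text}")
```

Codes whose meaning is not given in the description, such as 8856, fall through to a placeholder rather than a guessed label.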
[0139] The explainability module may be configured to generate feature importance rankings using at least one of coefficient analysis for logistic regression models and feature importance scores for tree-based models. In some cases, the coefficient analysis extracts weights assigned to each feature by logistic regression models to determine relative contribution to predictions. The feature importance scores for tree-based models may be derived from Random Forest classifiers that may calculate importance based on how frequently features are used in decision tree splits and the resulting improvement in prediction accuracy.
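The coefficient analysis described above can be illustrated with a minimal, dependency-free sketch. It is not the disclosed implementation: the gradient-descent fitting routine, the L2 penalty, the toy one-hot data, and the names `fit_logistic` and `rank_by_coefficient` are all assumptions made for illustration. In practice a library would likely be used; scikit-learn, for example, exposes fitted weights as `coef_` and tree-based importances as `feature_importances_`. Tree-based importance is not covered here.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=1000, l2=0.1):
    """Fit an L2-regularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted probability minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        # Average the gradient; the L2 penalty applies to the weights, not the bias.
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_by_coefficient(names, weights):
    """Sort features by absolute coefficient, largest first."""
    return sorted(zip(names, weights), key=lambda item: abs(item[1]), reverse=True)


# Toy one-hot matrix (assumed data): the label equals the first column,
# and the other two columns are balanced noise.
names = ["code_3995", "code_42731", "code_25000"]
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1],
     [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(X, y)
ranking = rank_by_coefficient(names, weights)
for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
```

Ranking by absolute value keeps strongly negative coefficients near the top, since a feature that lowers predicted risk is as informative as one that raises it. Coefficients are only comparable when the features share a scale, as one-hot indicators do.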
[0140] The feature importance analysis may identify fitting and adjustment of cardiac device, personal history of surgery, age, narcolepsy with cataplexy, and closed fracture of base of skull as top predictive features for pulmonary embolism. In some cases, the feature importance analysis identifies hypertension, vaccination for viral hepatitis, adjustment disorder, diabetes mellitus, and osteomyelitis as top predictive features for spinal abscess.
[0141] The system may identify hemodialysis, end stage renal disease, and hypertensive chronic kidney disease as top predictive features for bacterial endocarditis using a random forest model. In some cases, the explainability framework identifies traumatic amputation, bacteremia, heart failure, and atrial fibrillation as top predictive features for bacterial endocarditis. The system may remove a bacteremia feature and evaluate model performance change to assess feature contribution to predictions.
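The feature-removal check mentioned above, dropping one feature and measuring the change in performance, can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed method: the source names a random forest model, whereas this sketch substitutes a small Bernoulli naive Bayes scorer to stay dependency-free, uses AUC as the metric, and evaluates on the training rows for brevity (a real assessment would use held-out data or cross-validation). The toy data and all function names are hypothetical.

```python
import math


def train_naive_bayes(X, y, alpha=1.0):
    """Fit Bernoulli naive Bayes with Laplace smoothing; return a scoring function."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    present, absent = [], []  # per-feature log-odds when the code is present / absent
    for j in range(len(X[0])):
        pos = sum(row[j] for row, label in zip(X, y) if label == 1)
        neg = sum(row[j] for row, label in zip(X, y) if label == 0)
        p1 = (pos + alpha) / (n_pos + 2 * alpha)
        p0 = (neg + alpha) / (n_neg + 2 * alpha)
        present.append(math.log(p1 / p0))
        absent.append(math.log((1 - p1) / (1 - p0)))
    prior = math.log((n_pos + alpha) / (n_neg + alpha))

    def score(row):
        return prior + sum(present[j] if v else absent[j] for j, v in enumerate(row))

    return score


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ablation_delta(X, y, j):
    """Return AUC with all features minus AUC with column j removed."""
    full = train_naive_bayes(X, y)
    reduced_X = [row[:j] + row[j + 1:] for row in X]
    reduced = train_naive_bayes(reduced_X, y)
    return auc([full(r) for r in X], y) - auc([reduced(r) for r in reduced_X], y)


# Assumed toy data: columns are bacteremia, heart failure, atrial fibrillation.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1],
     [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

for j, name in enumerate(["bacteremia", "heart_failure", "atrial_fibrillation"]):
    print(f"{name}: AUC drop when removed = {ablation_delta(X, y, j):.2f}")
```

A large drop indicates that the model relied on the removed feature; a drop near zero suggests the feature was redundant or uninformative for that model.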
[0142] In a non-transitory computer-readable storage medium storing instructions for predicting missed severe diagnoses, the explainability framework may identify traumatic amputation, bacteremia, heart failure, and atrial fibrillation as top predictive features for bacterial endocarditis. In some cases, the program modules 42 stored on the storage medium 40 include instructions for implementing the explainability framework that generates interpretable feature importance scores for predictions made by the trained models.
[0143] Referring to FIG. 8, a SHAP value summary plot illustrates feature importance and impact on model output for predicting outcomes in cardiac surgery patients. The SHAP value summary plot may display clinical features ranked by importance along a vertical axis, with SHAP values indicating impact on model output along a horizontal axis ranging from approximately -2 to 5. A color gradient bar on the right side may indicate feature value from low to high, with magenta indicating high feature values and blue indicating low feature values.
[0144] FIG. 8 illustrates various clinical features used for predicting outcomes in cardiac surgery patients. The features shown include a WBC count feature indicating white blood cell count, which may have predictive importance for cardiac surgery outcomes. In some cases, the WBC count feature displays scattered data points extending horizontally to indicate the magnitude and direction of SHAP values on model predictions. High values of the WBC count feature may push predictions in a positive direction, while low values may push predictions in a negative direction.
[0145] The features shown in FIG. 8 include a creatinine level feature and a platelet count feature. In some cases, the creatinine level feature indicates kidney function status that may influence cardiac surgery outcomes. The platelet count feature may reflect coagulation status and bleeding risk in cardiac surgery patients.
[0146] With continued reference to FIG. 8, the features include an IABP insertion timing feature and a hematocrit feature. In some cases, the IABP insertion timing feature indicates timing of intra-aortic balloon pump insertion relative to cardiac surgery. The hematocrit feature may reflect oxygen-carrying capacity of blood and anemia status in patients undergoing cardiac procedures.
[0147] FIG. 8 also displays an aortic valve disease etiology feature and a BMI feature. In some cases, the aortic valve disease etiology feature indicates the underlying cause of aortic valve pathology. The BMI feature may reflect body mass index calculated from patient height and weight measurements.
[0148] Additional features shown in FIG. 8 include a patient age feature and a heart failure feature. In some cases, the patient age feature indicates age at admission and may influence surgical risk and recovery outcomes. The heart failure feature may indicate presence or absence of heart failure diagnosis in the patient's medical history.
[0149] The features in FIG. 8 include an aortic valve regurgitation feature and an ejection fraction feature. In some cases, the aortic valve regurgitation feature indicates severity of blood flow reversal through the aortic valve. The ejection fraction feature may indicate the percentage of blood pumped out of the left ventricle with each heartbeat, reflecting cardiac function status.
[0150] FIG. 8 further shows a weight feature and a chronic lung disease feature. In some cases, the weight feature indicates patient body weight in kilograms. The chronic lung disease feature may indicate presence of chronic obstructive pulmonary disease or other respiratory conditions that may affect surgical outcomes.
[0151] The features displayed in FIG. 8 also include a height feature and an emergent status feature. In some cases, the height feature indicates patient height in centimeters. The emergent status feature may indicate whether the cardiac surgery was performed on an emergency basis versus elective scheduling.
[0152] FIG. 8 includes a cardiogenic shock feature and a NYHA classification feature. In some cases, the cardiogenic shock feature indicates presence of shock due to inadequate cardiac output. The NYHA classification feature may indicate New York Heart Association functional classification of heart failure severity ranging from Class I to Class IV.
[0153] The features shown in FIG. 8 include an alcohol use feature and a cardiac arrhythmia feature. In some cases, the alcohol use feature indicates patient history of alcohol consumption. The cardiac arrhythmia feature may indicate presence of irregular heart rhythms that may complicate cardiac surgery outcomes.
[0154] FIG. 8 also displays a heart failure timing feature. In some cases, the heart failure timing feature indicates temporal relationship between heart failure diagnosis and cardiac surgery admission. Each feature in FIG. 8 may display scattered data points colored according to feature value, demonstrating how different feature values influence model predictions in positive or negative directions.
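Each point in a plot such as FIG. 8 is one feature's Shapley value for one patient. The description does not state how these values are computed, so the following is only a sketch under assumptions: it computes exact Shapley values by enumerating feature coalitions against a background dataset, which is feasible only for a handful of features (practical tools typically rely on approximations). The linear scoring function, its weights, the standardized feature values, and the function name `shapley_values` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x, averaging over background rows."""
    d = len(x)

    def value(subset):
        # Expected prediction when features in `subset` are fixed to x's values
        # and the remaining features are drawn from the background rows.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(d)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Shapley weight for coalitions of this size: |S|! (d-|S|-1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for combo in combinations(others, size):
                s = set(combo)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


# Hypothetical model over standardized WBC count, creatinine, and hematocrit.
def predict(row):
    return 0.8 * row[0] + 0.5 * row[1] - 0.3 * row[2]


background = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.5], [-1.0, 1.0, -0.5], [0.4, 0.2, 0.0]]
patient = [1.2, -0.4, 0.3]

phi = shapley_values(predict, patient, background)
for name, contribution in zip(["wbc", "creatinine", "hematocrit"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The values sum to the difference between the patient's prediction and the average background prediction, which is what lets the horizontal axis be read as impact on model output. A summary plot repeats this calculation for every patient and typically orders features by mean absolute value.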
[0155] The feature importance analysis may identify age at admission, pneumonia diagnosis, acute coronary syndrome, upper GI bleed, and Medicaid insurance as predictive features for arterial thromboembolism. In some cases, the patient age feature 70 contributes to predictions for arterial thromboembolism in addition to cardiac surgery outcomes. The feature importance rankings generated by the explainability module may vary across different target diagnoses, with the SHAP value summary plot in FIG. 8 illustrating feature contributions for cardiac surgery outcome prediction.
[0156] Machine readable storage including machine-readable instructions, when executed, to implement a method or realize an apparatus in any of the examples of the present application.
[0157] Various techniques, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, a non-transitory computer readable storage medium, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the various techniques. In the case of program code execution on programmable computers, the computing device may include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The volatile and non-volatile memory and/or storage elements may be a RAM, an EPROM, a flash drive, an optical drive, a magnetic hard drive, or another medium for storing electronic data. The eNB (or other base station) and UE (or other mobile station) may also include a transceiver component, a counter component, a processing component, and/or a clock component or timer component. One or more programs that may implement or utilize the various techniques described herein may use an application programming interface (API), reusable controls, and the like. Such programs may be implemented in a high-level procedural or an object-oriented programming language to communicate with a computer system. However, the program(s) may be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or an interpreted language, and combined with hardware implementations.
[0158] It should be understood that many of the functional units described in this specification may be implemented as one or more components, which is a term used to more particularly emphasize their implementation independence. For example, a component may be implemented as a hardware circuit comprising custom very large scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A component may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like.
[0159] Components may also be implemented in software for execution by various types of processors. An identified component of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions, which may, for instance, be organized as an object, a procedure, or a function. Nevertheless, the executables of an identified component need not be physically located together, but may comprise disparate instructions stored in different locations that, when joined logically together, comprise the component and achieve the stated purpose for the component.
[0160] Indeed, a component of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within components, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network. The components may be passive or active, including agents operable to perform desired functions.
[0161] Reference throughout this specification to "an example" means that a particular feature, structure, or characteristic described in connection with the example is included in at least one embodiment of the present invention. Thus, appearances of the phrase "in an example" in various places throughout this specification are not necessarily all referring to the same embodiment.
[0162] As used herein, a plurality of items, structural elements, compositional elements, and / or materials may be presented in a common list for convenience. However, these lists should be construed as though each member of the list is individually identified as a separate and unique member. Thus, no individual member of such list should be construed as a de facto equivalent of any other member of the same list solely based on its presentation in a common group without indications to the contrary. In addition, various embodiments and examples of the present invention may be referred to herein along with alternatives for the various components thereof. It is understood that such embodiments, examples, and alternatives are not to be construed as de facto equivalents of one another, but are to be considered as separate and autonomous representations of the present invention.
[0163] Although the foregoing has been described in some detail for purposes of clarity, it will be apparent that certain changes and modifications may be made without departing from the principles thereof. It should be noted that there are many alternative ways of implementing both the processes and apparatuses described herein. Accordingly, the present embodiments are to be considered illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
[0164] Those having skill in the art will appreciate that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. The scope of the present invention should, therefore, be determined only by the following claims.
Claims
1. An explainable artificial intelligence system for early prediction of missed severe diagnoses in intensive care unit patients, comprising:
a data processing module configured to receive and process electronic health record data;
a machine learning module comprising a plurality of trained models including at least one ensemble model, at least one deep learning model, and at least one large language model, wherein the machine learning module is configured to analyze the processed electronic health record data to predict a likelihood of at least one severe diagnosis; and
an explainability module configured to generate feature importance rankings and provide interpretable explanations for the predictions made by the machine learning module.
2. The system of claim 1, wherein the ensemble model comprises a stacking ensemble model that combines predictions from logistic regression, random forest, XGBoost, and support vector machine base models.
3. The system of claim 2, wherein the stacking ensemble model uses logistic regression as a meta-learner to combine the base model predictions.
4. The system of claim 1, wherein the deep learning model comprises at least one of a Multi-Layer Perceptron network and a Long Short-Term Memory network.
5. The system of claim 1, wherein the large language model comprises ClinicalBERT fine-tuned on clinical text data.
6. The system of claim 1, wherein the data processing module is configured to process ICD-9 diagnostic codes using one-hot encoding or count vectorization techniques.
7. The system of claim 1, wherein the explainability module is configured to generate feature importance rankings using at least one of coefficient analysis for logistic regression models and feature importance scores for tree-based models.
8. The system of claim 1, further comprising an integration interface configured to integrate with electronic health record systems and provide real-time risk assessment alerts to clinicians.
9. A computer-implemented method for early prediction of missed severe diagnoses in intensive care unit patients, comprising:
receiving electronic health record data for a patient, the data including at least patient demographics, vital signs, laboratory results, and diagnostic codes;
preprocessing the electronic health record data by cleaning missing values, encoding categorical variables, and extracting relevant features;
applying a trained ensemble machine learning model to the preprocessed data to generate a prediction score indicating a likelihood of the patient having at least one severe diagnosis selected from acute myocardial infarction, acute pulmonary embolism, bacterial pneumonia, aortic dissection, meningitis, spinal abscess, bacterial endocarditis, and arterial thromboembolism;
generating an explainable output comprising feature importance rankings that identify which patient characteristics most contributed to the prediction; and
outputting the prediction score and explainable output to support clinical decision-making.
10. The method of claim 9, wherein the ensemble machine learning model comprises a stacking ensemble that combines predictions from logistic regression, random forest, XGBoost, and support vector machine base models.
11. The method of claim 10, wherein the stacking ensemble uses logistic regression as a meta-learner to combine the base model predictions.
12. The method of claim 9, wherein preprocessing the electronic health record data comprises encoding ICD-9 diagnostic codes using one-hot encoding or count vectorization techniques.
13. The method of claim 9, further comprising performing 5-fold cross-validation during model training to ensure robust performance across different data subsets.
14. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform operations comprising:
accessing electronic health record data for intensive care unit patients, the data including patient demographics, vital signs, laboratory results, and diagnostic codes;
training a plurality of machine learning models including at least one ensemble model and at least one deep learning model using the electronic health record data, wherein the models are configured to predict missed severe diagnoses comprising acute myocardial infarction, acute pulmonary embolism, bacterial pneumonia, aortic dissection, meningitis, spinal abscess, bacterial endocarditis, and arterial thromboembolism;
implementing an explainability framework that generates interpretable feature importance scores for predictions made by the trained models; and
deploying the trained models and explainability framework to provide real-time prediction and explanation capabilities for clinical use.
15. The non-transitory computer-readable storage medium of claim 14, wherein the ensemble model comprises a stacking ensemble model that combines predictions from logistic regression, random forest, XGBoost, and support vector machine base models using a meta-learner.
16. The non-transitory computer-readable storage medium of claim 15, wherein the meta-learner comprises a logistic regression model trained to optimally combine the base model predictions.
17. The non-transitory computer-readable storage medium of claim 14, wherein the deep learning model comprises at least one of a Multi-Layer Perceptron network and a Long Short-Term Memory network configured to process sequential patient data.
18. The non-transitory computer-readable storage medium of claim 14, wherein the operations further comprise performing 5-fold cross-validation during model training.
19. The non-transitory computer-readable storage medium of claim 14, wherein the trained models generate performance metrics including area under the curve scores exceeding 0.90 for at least one of the severe diagnoses.
20. The non-transitory computer-readable storage medium of claim 14, wherein the explainability framework identifies traumatic amputation, bacteremia, heart failure, and atrial fibrillation as top predictive features for bacterial endocarditis.