Multimodal-based postoperative complication prediction method for surgical anesthesia patients
The method integrates tabular data, clinical text, and intraoperative physiological time-series data using deep learning multimodal, multi-task prediction, with a Transformer encoder and a dynamic task-priority weighted loss. It addresses the insufficient use of multimodal data and the weak cross-scenario generalization of existing technologies, achieving high-precision prediction and early warning of postoperative complications.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: WEST CHINA HOSPITAL SICHUAN UNIV
- Filing Date: 2026-03-05
- Publication Date: 2026-06-12
AI Technical Summary
Existing postoperative complication risk prediction technologies are insufficient in multimodal perioperative data fusion, dynamic time point prediction, rare outcome modeling, and cross-scenario generalization capabilities. They are unable to fully utilize the rich perioperative data, resulting in the neglect of key prediction windows and limited improvement in predictive performance for low-incidence but serious complications.
Using deep learning multimodal, multi-task prediction, the method integrates tabular data, clinical text, and intraoperative physiological time-series data through a Transformer fusion encoder. A dynamic task interaction module and an adaptive task-priority weighted loss support high-precision, strongly generalizing risk prediction for various postoperative complications.
The method achieves high-precision prediction of various postoperative complications in the critical preoperative and postoperative windows, strengthens early warning and individualized intervention decision support, generalizes well across cohorts, and gives more attention to rare and serious outcomes.
Figure CN122201806A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical information technology, and more specifically to a multimodal method for predicting postoperative complications in surgical anesthesia patients.
Background Technology
[0002] Surgical procedures are among the most common treatments in modern healthcare systems, with a massive number of surgeries performed globally each year. Clinical practice shows that postoperative complications remain a significant challenge in perioperative management, as they significantly increase the risk of short-term death and reduce long-term survival. In addition to clinical harm, postoperative complications also impose a significant economic burden, such as increased hospitalization costs, prolonged hospital stays, and increased intensive care resource requirements. Therefore, accurate and timely risk prediction of postoperative complications is a crucial prerequisite for implementing individualized perioperative management, early preventative interventions, and optimized resource allocation.
[0003] To achieve postoperative risk stratification, various rule-based or scoring-based risk assessment tools have long been used in clinical practice, such as the POSSUM scoring system, the ACS NSQIP surgical risk calculator, and its derivative models. These methods typically rely on a small number of pre-defined variables to estimate mortality or complication risk through linear or empirical rules, and are characterized by their simplicity and ease of adoption. In recent years, with the accumulation of digital clinical data such as electronic medical records (EMR), research on perioperative risk prediction based on artificial intelligence has gradually increased, especially the use of machine learning and deep learning methods to build predictive models on large-scale data. Some studies have achieved high discriminative power in specific populations and for specific outcomes, and have further expanded the prediction targets to include more types of postoperative adverse events. Related methods typically use structured data, and some studies have also attempted to introduce data from additional sources to improve model performance.
[0004] Although existing scoring systems and intelligent algorithms have received widespread attention in postoperative risk assessment, the following technical issues still exist in terms of overall performance:
[0005] Limited and static variable sets make it difficult to reflect dynamic changes during the perioperative period. Many existing tools treat surgical risk primarily as being determined by a small number of preoperative baseline characteristics, typically using only about 10 to 20 preoperative variables for modeling. This makes it difficult to fully utilize the data features continuously generated during the perioperative period, resulting in insufficient characterization of the dynamic changes in the patient's condition.
[0006] The generalization ability across populations, hospitals, and scenarios is insufficient. Rule-based scoring or partially data-driven models often exhibit significant performance fluctuations across different surgical populations, medical institutions, and clinical pathways. Insufficient external validation can lead to unstable discriminative ability, affecting clinical usability and reliability.
[0007] Predictive performance for low-incidence but clinically important outcomes remains unsatisfactory. For complications with low incidence (e.g., ≤5%) but serious consequences (such as severe respiratory complications, unplanned ICU admissions, etc.), existing models are often affected by class imbalance and sample sparsity, resulting in limited improvement in predictive performance and making it difficult to meet the needs of early identification and warning.
[0008] Insufficient utilization of perioperative multimodal data leads to the neglect of key predictive windows. Perioperative data is multi-source and heterogeneous, including not only structured tabular data but also clinical text records (surgical records, intraoperative findings, etc.) and intraoperative physiological monitoring time-series data. Many existing methods fail to adequately integrate this multimodal data, particularly neglecting the risk assessment of the crucial "immediately postoperative" time point, resulting in missed predictive windows closer to intervention.
[0009] In summary, existing postoperative complication risk prediction technologies still have shortcomings in areas such as multimodal perioperative data fusion, dynamic time point prediction, rare outcome modeling, and cross-scenario generalization. There is an urgent need for a technical solution that can make fuller use of rich perioperative data and improve prediction and application capabilities.
Summary of the Invention
[0010] The purpose of this invention is to provide a multimodal method for predicting postoperative complications in surgical anesthesia patients. This method employs a perioperative deep learning multimodal, multi-task prediction technique: it separately encodes tabular data, clinical text, and intraoperative physiological time-series data, and performs cross-modal fusion using a Transformer-based fusion encoder. A dynamic task interaction module is introduced to model the dependencies between multiple complications, and adaptive task priority / weighted loss is used to alleviate class imbalance. This achieves high-precision, strong generalization, and a greater focus on rare and severe outcomes in predicting the risk of various postoperative complications during critical preoperative and postoperative windows, thereby enhancing early warning and personalized intervention decision support capabilities.
[0011] The above-mentioned technical objective of the present invention is achieved through the following technical solution: a method for predicting postoperative complications in surgical anesthesia patients based on multimodality, comprising the following steps:
[0012] S1: Deep multimodal fusion. Modality-specific encoders transform each data type into a shared high-dimensional latent representation so that heterogeneous inputs can subsequently be processed jointly. A Transformer-based fusion encoder then integrates cross-modal information into a unified patient-level feature representation, and the final output vector corresponding to the global token is extracted;
[0013] S2: Dynamic task interaction. The final output vector of the global token is obtained, and this single feature vector is used to predict multiple different outcomes. Through dynamic task interaction, the interactions between tasks are modeled adaptively on a patient-by-patient basis, promoting information sharing among related outcomes while preserving task-specific information;
[0014] S3: Adaptive task priority. Asymmetric loss (ASL) is used for each binary classification task. An adaptive weighting strategy dynamically adjusts the model's focus during training, prioritizing outcomes that are more difficult to predict and improving prediction performance for rare complications, ultimately yielding the final predicted probability of the corresponding complication.
[0015] The invention is further configured such that, in step S1, the original multimodal inputs are converted into unified high-dimensional vectors. For categorical features, an embedding layer maps discrete variables into dense vectors; standardized continuous numerical features are used directly. All categorical embeddings and numerical features are concatenated and then projected through a linear layer to generate the final tabular representation. For text data, a pre-trained, weight-sharing BERT model extracts semantic features: for each text input, the output at the classification token ([CLS]) is extracted, which captures the global semantic information of the text; this vector is then mapped to the target dimension by a separate linear projection layer. Finally, all text feature vectors are aggregated to form a unified text representation:
[0016] .
[0017] The present invention is further configured such that, in step S1, a Transformer-based deep fusion encoder uses a self-attention mechanism to explore complex interactions between features. The encoder's input is constructed from the tabular representation and the text representation; a learnable modality embedding is added to each representation to distinguish its source, and a learnable global token is inserted into the sequence to aggregate comprehensive information for downstream tasks. The encoder's initial input sequence is structured as follows:
[0018] ;
[0019] Here, the components are concatenated along the sequence dimension, and learnable positional embeddings are added to provide the model with relative positional information about the tokens. The sequence is then processed by stacked encoder modules; each module contains a multi-head self-attention layer and a feedforward network, with residual connections and layer normalization to ensure training stability. The feedforward network contains two linear layers and a GELU activation function:
[0020] ;
[0021] ;
[0022] After information exchange through the stacked encoders, the final output vector corresponding to the global token is extracted. As a highly condensed global feature representation, this vector integrates information from all modalities and reflects the patient's overall condition.
[0023] The present invention is further configured such that, in step S2, for each prediction task, a task-specific representation is first generated through a task-specific linear layer, giving each task an initial task-specific feature representation. An Alpha generator network then dynamically generates a sample-specific task interaction matrix:
[0024] ;
[0025] The generator takes the original global feature vector as input. Its output is first flattened and reshaped into a square matrix with one row and one column per task, and row-wise Softmax normalization is then applied. Each element of the matrix represents the weight with which the updated feature of one task draws information from the original feature of another task; the weights are non-negative and each row sums to 1. Using this sample-specific sharing matrix, the updated feature representation of each task is calculated as a weighted average:
[0026] .
[0027] The invention is further configured such that, to ensure stability and promote convergence early in training, the Alpha generator is specially initialized: the weights of its last layer are initialized to zero, and the bias term is initialized to a flattened identity matrix. This ensures that, at the start of training, the task interaction matrix is approximately an identity matrix, so that each task relies primarily on its own features; as training progresses, meaningful cross-task sharing patterns are gradually learned.
[0028] The present invention is further configured such that, in step S3, the ASL decouples the adjustment factors of positive and negative samples, allowing for flexible adjustment of their contributions, specifically defined as follows:
[0029]
[0030] Here, the loss is defined in terms of the true label and the predicted probability, with separate focusing parameters for positive and negative samples. In ASL, a negative sample contributes to the loss only when its predicted probability exceeds a threshold; this allows ASL to dynamically balance the contributions of positive and negative samples in each task, mitigating bias caused by data imbalance. In the proposed multi-task learning framework, the total loss is defined as the weighted sum of the losses from each task:
[0031]
[0032] Here, the term for each task combines that task's asymmetric loss, its static class weight, and its dynamic task weight.
[0033] The present invention is further configured as follows: the static class weight is set inversely proportional to the incidence of each complication, assigning higher weights to rare events to balance importance at a macro level; the dynamic task weight is adaptively adjusted to synchronize the learning progress of different tasks and mitigate heterogeneity. The dynamic weights are updated using a weighting mechanism based on relative loss, which prioritizes tasks with higher relative losses, since a higher relative loss indicates that a task is converging more slowly.
[0034] The present invention is further configured to: periodically record the average loss of each task and calculate a standardized loss based on the distribution of losses across all tasks. The standardized loss is then mapped through the Sigmoid function to determine a base weighting factor, so that the values are constrained within a reasonable range. To prevent instability caused by short-term loss fluctuations, the dynamic weights are updated using an exponential moving average (EMA):
[0035] ;
[0036] Here, the update is governed by a momentum coefficient and indexed by the update cycle. After each update, the task weights are normalized to a mean of 1 and then clipped to a preset range. This mechanism prevents gradient explosion or vanishing and ensures training stability; it simultaneously addresses class imbalance within tasks and learning-rate balancing between tasks, jointly optimizing the predictive performance for all complications. Finally, the updated feature vector of each task is fed into its corresponding task-specific prediction head; each prediction head consists of a small MLP that outputs a single logit, which is then converted into the final predicted probability of the corresponding complication by a Sigmoid activation function.
[0037] In summary, this invention offers the following advantages: it employs a deep learning-based multimodal, multi-task prediction technique for the perioperative period. Tabular data, clinical text, and intraoperative physiological time-series data are encoded separately, and cross-modal fusion is performed by a Transformer-based fusion encoder. A dynamic task interaction module models the dependencies between multiple complications, while an adaptive task-priority weighted loss mitigates class imbalance. This achieves high-precision risk prediction, with strong generalization and a greater focus on rare and severe outcomes, for various postoperative complications during critical preoperative and postoperative windows, thereby enhancing early warning and personalized intervention decision support.
Attached Figure Description
[0038] Figures 1-2 are framework diagrams of the multimodal, multi-task prediction method in an embodiment of the present invention;
[0039] Figures 3-10 show the ROC curves of the PeriSight-Post model and the comparison methods in internal validation in an embodiment of the present invention;
[0040] Figures 11-18 show the ROC curves of the PeriSight-Post model and the comparison methods in external validation in an embodiment of the present invention;
[0041] Figures 19-26 show the ROC curves of the PeriSight-Pre model and the comparison methods in internal validation in an embodiment of the present invention;
[0042] Figures 27-34 show the ROC curves of the PeriSight-Pre model and the comparison methods in external validation in an embodiment of the present invention;
[0043] Figures 35-38 show the ROC curves of the PeriSight-Post and PeriSight-Pre models in internal and external validation in an embodiment of the present invention.
Detailed Implementation
[0044] The present invention is described in further detail below with reference to Figures 1-38.
[0045] Example: A multimodal method for predicting postoperative complications in patients undergoing surgical anesthesia, as shown in Figures 1 and 2, includes the following steps:
[0046] S1: Deep multimodal fusion. Modality-specific encoders transform each data type into a shared high-dimensional latent representation so that heterogeneous inputs can subsequently be processed jointly. A Transformer-based fusion encoder then integrates cross-modal information into a unified patient-level feature representation, and the final output vector corresponding to the global token is extracted;
[0047] S2: Dynamic task interaction. The final output vector of the global token is obtained, and this single feature vector is used to predict multiple different outcomes. Through dynamic task interaction, the interactions between tasks are modeled adaptively on a patient-by-patient basis, promoting information sharing among related outcomes while preserving task-specific information;
[0048] S3: Adaptive task priority. Asymmetric loss (ASL) is used for each binary classification task. An adaptive weighting strategy dynamically adjusts the model's focus during training, prioritizing outcomes that are more difficult to predict and improving prediction performance for rare complications, ultimately yielding the final predicted probability of the corresponding complication.
[0049] In step S1, the original multimodal inputs are converted into unified high-dimensional vectors. For categorical features, an embedding layer maps discrete variables into dense vectors; standardized continuous numerical features are used directly. All categorical embeddings and numerical features are concatenated and then projected through a linear layer to generate the final tabular representation. For text data, a pre-trained, weight-sharing BERT model extracts semantic features: for each text input, the output at the classification token ([CLS]) is extracted; this is a specialized classification token vector that captures the global semantic information of the text. The vector is then mapped to the target dimension by a separate linear projection layer. Finally, all text feature vectors are aggregated to form a unified text representation:
[0050] .
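The tabular branch described above can be illustrated with a minimal sketch. The function name `encode_tabular`, the field name `asa_class`, and all numeric parameters are hypothetical stand-ins for learned quantities; the embodiment does not specify dimensions or field names.

```python
from typing import Dict, List

Vector = List[float]


def encode_tabular(
    cat_values: Dict[str, str],
    num_values: Vector,
    embeddings: Dict[str, Dict[str, Vector]],
    proj_w: List[Vector],
    proj_b: Vector,
) -> Vector:
    """Embed categorical fields, append numeric fields, then project linearly."""
    feats: Vector = []
    for name, value in cat_values.items():
        feats.extend(embeddings[name][value])  # dense vector for this category
    feats.extend(num_values)  # standardized numerics are used directly
    # Linear projection: one output component per row of proj_w.
    return [sum(w * x for w, x in zip(row, feats)) + b for row, b in zip(proj_w, proj_b)]


# Toy parameters (assumptions); in a trained model these would be learned.
embeddings = {"asa_class": {"II": [0.1, 0.2], "III": [0.3, 0.4]}}
tab_repr = encode_tabular(
    cat_values={"asa_class": "III"},
    num_values=[0.5],
    embeddings=embeddings,
    proj_w=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    proj_b=[0.0, 0.1],
)
```

The text branch is omitted here because the aggregation rule for the per-document text vectors is not recoverable from the text above.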
[0051] A Transformer-based deep fusion encoder uses a self-attention mechanism to explore complex interactions between features. The encoder's input is constructed from the tabular representation and the text representation; a learnable modality embedding is added to each representation to distinguish its source, and a learnable global token is inserted into the sequence to aggregate comprehensive information for downstream tasks. The encoder's initial input sequence is structured as follows:
[0052] ;
[0053] Here, the components are concatenated along the sequence dimension, and learnable positional embeddings are added to provide the model with relative positional information about the tokens. The sequence is then processed by stacked encoder modules; each module contains a multi-head self-attention layer and a feedforward network, with residual connections and layer normalization to ensure training stability. The feedforward network contains two linear layers and a GELU activation function:
[0054] ;
[0055] ;
[0056] After information exchange through the stacked encoders, the final output vector corresponding to the global token is extracted. As a highly condensed global feature representation, this vector integrates information from all modalities and reflects the patient's overall condition.
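As an illustration of how the fusion encoder's input sequence might be assembled, the following sketch builds the sequence from a global token, one tabular token, and one text token. Placing the global token first and leaving it without a modality embedding are assumptions, as are the helper names and the toy values; the encoder stack itself is not shown.

```python
from typing import Dict, List

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must share the same dimension")
    return [x + y for x, y in zip(a, b)]


def build_fusion_input(
    global_token: Vector,
    tab_repr: Vector,
    text_repr: Vector,
    modality_emb: Dict[str, Vector],
    pos_emb: List[Vector],
) -> List[Vector]:
    """Assemble [global token; tabular token; text token] for the fusion encoder."""
    # Tag each representation with its modality embedding to mark its source.
    tab_tok = add(tab_repr, modality_emb["tabular"])
    text_tok = add(text_repr, modality_emb["text"])
    # Concatenate along the sequence dimension (global token first: an assumption).
    sequence = [global_token, tab_tok, text_tok]
    # Add one positional embedding per sequence position.
    return [add(tok, pos) for tok, pos in zip(sequence, pos_emb)]


seq = build_fusion_input(
    global_token=[0.0, 0.0],
    tab_repr=[1.0, 2.0],
    text_repr=[3.0, 4.0],
    modality_emb={"tabular": [0.5, 0.5], "text": [-0.5, -0.5]},
    pos_emb=[[0.0, 0.0], [0.25, 0.25], [0.5, 0.5]],
)
```

After the stacked encoder modules process such a sequence, the output at the global token's position would serve as the patient-level feature vector.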
[0057] In step S2, for each prediction task, a task-specific representation is first generated through a task-specific linear layer, giving each task an initial task-specific feature representation. An Alpha generator network (a lightweight MLP) then dynamically generates a sample-specific task interaction matrix:
[0058] ;
[0059] The generator takes the original global feature vector as input. Its output is first flattened and reshaped into a square matrix with one row and one column per task, and row-wise Softmax normalization is then applied. Each element of the matrix represents the weight with which the updated feature of one task draws information from the original feature of another task; the weights are non-negative and each row sums to 1. Using this sample-specific sharing matrix, the updated feature representation of each task is calculated as a weighted average:
[0060] .
[0061] To ensure stability and promote convergence early in training, the Alpha generator is specially initialized: the weights of its last layer are initialized to zero, and the bias term is initialized to a flattened identity matrix. This ensures that, at the start of training, the task interaction matrix is approximately an identity matrix, so that each task relies primarily on its own features; as training progresses, meaningful cross-task sharing patterns are gradually learned.
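The dynamic task interaction and its initialization can be sketched as follows. Only the last layer of the generator is modeled, and the `SCALE` factor applied to the identity bias is an assumption introduced for this illustration; all names and toy values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def alpha_matrix(global_feat: List[float], weight: Matrix,
                 bias: List[float], num_tasks: int) -> Matrix:
    """Last generator layer: logits = W g + b, reshaped to T x T, row-wise softmax."""
    logits = [
        sum(w * g for w, g in zip(w_row, global_feat)) + b
        for w_row, b in zip(weight, bias)
    ]
    rows = [logits[i * num_tasks:(i + 1) * num_tasks] for i in range(num_tasks)]
    return [softmax(r) for r in rows]


def mix_tasks(alpha: Matrix, task_feats: Matrix) -> Matrix:
    """Updated feature of task i = sum over j of alpha[i][j] * original feature of task j."""
    dim = len(task_feats[0])
    return [
        [sum(a * feat[d] for a, feat in zip(a_row, task_feats)) for d in range(dim)]
        for a_row in alpha
    ]


T, D = 3, 2
SCALE = 10.0  # assumption: scaled identity bias, so softmax rows are near one-hot
weight = [[0.0] * D for _ in range(T * T)]  # zero-initialized last-layer weights
bias = [SCALE if i // T == i % T else 0.0 for i in range(T * T)]  # flattened identity
alpha = alpha_matrix([0.3, -1.2], weight, bias, T)
task_feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mixed = mix_tasks(alpha, task_feats)
```

With zero weights the matrix does not depend on the patient at initialization. Note that after the row-wise Softmax an unscaled identity bias (`SCALE = 1.0`) gives each diagonal entry the value e / (e + T - 1), about 0.58 for three tasks, so how close the initial matrix is to the identity depends on scaling details not given in the text.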
[0062] In step S3, ASL decouples the adjustment factors for positive and negative samples, allowing for flexible adjustment of their contributions, as specifically defined below:
[0063]
[0064] Here, the loss is defined in terms of the true label and the predicted probability, with separate focusing parameters for positive and negative samples. In ASL, a negative sample contributes to the loss only when its predicted probability exceeds a threshold; this allows ASL to dynamically balance the contributions of positive and negative samples in each task, mitigating bias caused by data imbalance. In the proposed multi-task learning framework, the total loss is defined as the weighted sum of the losses from each task:
[0065]
[0066] Here, the term for each task combines that task's asymmetric loss, its static class weight, and its dynamic task weight.
[0067] The static class weight is set inversely proportional to the incidence of each complication, assigning higher weights to rare events to balance importance at a macro level; the dynamic task weight is adaptively adjusted to synchronize the learning progress of different tasks and mitigate heterogeneity. The dynamic weights are updated using a weighting mechanism based on relative loss, which prioritizes tasks with higher relative losses, since a higher relative loss indicates that a task is converging more slowly.
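The loss described above can be sketched as follows, using the commonly published form of asymmetric loss with probability shifting for negative samples. The default focusing parameters and margin are illustrative values, not values taken from this embodiment; combining the static and dynamic weights as a product, and normalizing the static weights to a mean of 1, are likewise assumptions.

```python
import math
from typing import List, Sequence


def asymmetric_loss(y: int, p: float, gamma_pos: float = 0.0,
                    gamma_neg: float = 4.0, margin: float = 0.05,
                    eps: float = 1e-8) -> float:
    """ASL for one binary prediction; negatives use a shifted probability."""
    if y == 1:
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negatives with predicted probability below the margin contribute zero loss.
    p_shift = max(p - margin, 0.0)
    return -(p_shift ** gamma_neg) * math.log(max(1.0 - p_shift, eps))


def static_class_weights(incidence: Sequence[float]) -> List[float]:
    """Weights inversely proportional to incidence, normalized to mean 1 (assumption)."""
    raw = [1.0 / rate for rate in incidence]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def total_loss(task_losses: Sequence[float], class_w: Sequence[float],
               dyn_w: Sequence[float]) -> float:
    """Weighted sum over tasks; class weight times dynamic weight is an assumption."""
    return sum(c * d * loss for c, d, loss in zip(class_w, dyn_w, task_losses))


class_w = static_class_weights([0.20, 0.05])  # the rarer outcome gets the larger weight
loss = total_loss([1.0, 2.0], class_w, [1.0, 1.0])
```

In this sketch a confidently rejected negative (for example a predicted probability of 0.03) adds nothing to the loss, which is how the many easy negatives of a rare complication are kept from dominating training.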
[0068] The average loss of each task is recorded periodically, and a standardized loss is calculated based on the distribution of losses across all tasks. The standardized loss is then mapped through the Sigmoid function to determine a base weighting factor, so that the values are constrained within a reasonable range. To prevent instability caused by short-term loss fluctuations, the dynamic weights are updated using an exponential moving average (EMA):
[0069] ;
[0070] Here, the update is governed by a momentum coefficient and indexed by the update cycle. After each update, the task weights are normalized to a mean of 1 and then clipped to a preset range. This mechanism prevents gradient explosion or vanishing and ensures training stability; it simultaneously addresses class imbalance within tasks and learning-rate balancing between tasks, jointly optimizing the predictive performance for all complications. Finally, the updated feature vector of each task is fed into its corresponding task-specific prediction head; each prediction head consists of a small MLP that outputs a single logit, which is then converted into the final predicted probability of the corresponding complication by a Sigmoid activation function. Specifically, each task head is a two-layer network with GELU activation and Dropout regularization.
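One update cycle of the dynamic task weights can be sketched as follows. The momentum value, the clipping range, and the factor of 2 that centers the Sigmoid output at 1 are assumptions made for illustration; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def update_task_weights(prev_w: Sequence[float], avg_losses: Sequence[float],
                        beta: float = 0.9, w_min: float = 0.5,
                        w_max: float = 2.0) -> List[float]:
    """Standardize losses, squash with a sigmoid, smooth with EMA, normalize, clip."""
    n = len(avg_losses)
    mean = sum(avg_losses) / n
    std = math.sqrt(sum((loss - mean) ** 2 for loss in avg_losses) / n)
    # Standardized loss: how far each task sits from the average task loss.
    z = [(loss - mean) / (std + 1e-8) for loss in avg_losses]
    # Sigmoid bounds the base factor; the factor 2 centers it at 1 (assumption).
    base = [2.0 / (1.0 + math.exp(-v)) for v in z]
    # The EMA damps short-term fluctuations in the recorded losses.
    ema = [beta * w + (1.0 - beta) * b for w, b in zip(prev_w, base)]
    ema_mean = sum(ema) / n
    normalized = [w / ema_mean for w in ema]  # mean of 1 across tasks
    return [min(max(w, w_min), w_max) for w in normalized]


weights = update_task_weights([1.0, 1.0, 1.0], [0.2, 0.5, 0.9])
```

Tasks with a higher relative loss receive a larger weight, and because the momentum coefficient is close to 1 the weights drift gradually rather than jumping with each recorded loss.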
[0071] To support realistic clinical decision-making, two predictive models were developed based on the above approach: a preoperative model that assesses complication risk using only baseline information before surgery; and a postoperative model that updates the risk assessment using all available perioperative data up to the end of surgery. This design aligns with critical clinical decision windows, maximizing the usability of the predictive tool.
[0072] As shown in Figures 3-38, the proposed PeriSight framework exhibits robust discrimination and calibration performance in both internal and external validation. As shown in Figures 3-18, the PeriSight-Post model achieved a mean AUC of 0.882 (95% CI: 0.847–0.912) in the internal validation set and 0.866 (0.815–0.911) in the external validation cohort, demonstrating strong generalization ability despite significant differences in patient characteristics between cohorts. In contrast, the seven comparative models generally had lower mean AUCs in the validation cohort, ranging from 0.741 (0.602–0.859) for logistic regression (LR) to 0.843 (0.782–0.894) for TabPFN. Although predictive performance varied across postoperative complications, most outcomes maintained high performance. For example, with the PeriSight-Post model, the AUC for specific outcomes in the validation cohort ranged from 0.753 (0.614–0.867) for SSI (surgical site infection) to 0.971 (0.964–0.977) for unplanned ICU transfer. The model maintained strong discrimination across composite endpoints, with an AUC of 0.821 (0.792–0.851) for any EPCO outcome and 0.868 (0.841–0.895) for PPCs (postoperative pulmonary complications). Among single-organ complications, pneumonia had an AUC of 0.883 (0.844–0.920) and ARDS (acute respiratory distress syndrome) had an AUC of 0.934 (0.882–0.975), but AKI (acute kidney injury) and SSI had lower AUCs of 0.782 (0.729–0.832) and 0.753 (0.614–0.867), respectively. The model also showed excellent discrimination for ICU resource utilization, with an AUC of 0.971 (0.964–0.977) for ICU stays of ≥48 hours and an AUC of 0.920 (0.851–0.969) for unplanned transfers to the ICU.
[0073] As shown in Figures 19-34, the PeriSight-Pre model, which uses only preoperative data, exhibits lower discrimination than the PeriSight-Post model; however, it still demonstrates strong predictive ability, with a mean AUC of 0.836 (0.769–0.901) and outcome-specific AUCs ranging from 0.685 (0.505–0.848) to 0.916 (0.865–0.958). Across multiple prediction tasks, the mean sensitivity and specificity were 0.880 (0.675–0.939) and 0.772 (0.758–0.785), respectively, indicating good predictive ability. The mean Brier score was as low as 0.051 (0.050–0.053), confirming good agreement between predicted probabilities and observed event rates.
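For reference, the Brier score cited above is the mean squared difference between predicted probabilities and observed binary outcomes, so lower values indicate better agreement. A minimal sketch with made-up toy predictions (not data from this embodiment) is:

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


# Toy example: four patients, two of whom experienced the complication.
score = brier_score([0.1, 0.9, 0.2, 0.7], [0, 1, 0, 1])
```

For low-incidence outcomes the Brier score is small even for uninformative predictions, so it is usually read alongside discrimination measures such as the AUC.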
[0074] Figures 35-38 present the ROC curves and corresponding AUC (area under the curve) values of PeriSight-Post (the postoperative model) and PeriSight-Pre (the preoperative model) in the internal and external validation sets. The graphs comprehensively evaluate the models' predictive ability for the composite outcome of any postoperative EPCO outcome and for seven specific clinical outcomes. The data demonstrate that the model provided by this invention not only achieves high predictive accuracy but also exhibits excellent generalization ability in cross-center / cross-dataset external validation. In internal validation, the average AUC of PeriSight-Post reached 0.882, higher than PeriSight-Pre's 0.844; in external validation, the average AUC of PeriSight-Post was 0.866, also ahead of PeriSight-Pre's 0.836.
[0075] This specific embodiment is merely an explanation of the present invention and is not intended to limit the invention. After reading this specification, those skilled in the art can make modifications to this embodiment without contributing any inventive step, but such modifications are protected by patent law as long as they are within the scope of the claims of the present invention.
Claims
1. A multimodal method for predicting postoperative complications in patients undergoing surgical anesthesia, characterized in that it includes the following steps: S1: Deep multimodal fusion; modality-specific encoders convert each data type into a shared high-dimensional latent representation, allowing subsequent joint processing of heterogeneous inputs; a Transformer-based fusion encoder then integrates cross-modal information and generates a unified patient-level feature representation, and the final output vector corresponding to the global token is extracted; S2: Dynamic task interaction; the final output vector of the global token is obtained, and this single feature vector is used to predict multiple different outcomes; through dynamic task interaction, the interactions between tasks are adaptively modeled on a patient-by-patient basis, promoting information sharing among related outcomes while preserving task-specific information; S3: Adaptive task priority; asymmetric loss (ASL) is used for each binary classification task; the model's focus is dynamically adjusted during training through an adaptive weighting strategy, prioritizing outcomes that are more difficult to predict and improving the prediction performance for rare complications, ultimately yielding the final predicted probability of the corresponding complication.
2. The multimodal method for predicting postoperative complications in patients undergoing surgical anesthesia according to claim 1, characterized in that, in step S1, the original multimodal input is converted into unified high-dimensional vectors; for categorical features, an embedding layer maps the discrete variables into dense vectors; standardized continuous numerical features are used directly; all categorical embeddings and numerical features are concatenated and then projected through a linear layer to generate the final tabular representation; for text data, a pre-trained BERT model with shared weights is used to extract semantic features; for each text input, the output of the special classification token is extracted, this token capturing the global semantic information of the text; the dimension of this vector is mapped through a single linear projection layer to obtain the text feature vector; finally, all text feature vectors are aggregated to form a unified text representation.
3. The multimodal method for predicting postoperative complications in patients undergoing surgical anesthesia according to claim 1, characterized in that, in step S1, the Transformer-based deep fusion encoder uses a self-attention mechanism to explore complex interactions between features; the encoder's input is constructed from the tabular representation and the text representation; a learnable modality embedding is added to each representation to distinguish its origin; a learnable global token is inserted into the sequence to aggregate comprehensive information for downstream tasks; the encoder's initial input sequence is formed by concatenating the global token with the tabular and text representations along the sequence dimension; learnable positional embeddings are added to provide the model with relative positional information about the tokens; the sequence is subsequently processed by stacked encoder modules; each module contains a multi-head self-attention layer and a feedforward network, with residual connections and layer normalization to ensure training stability; the feedforward network contains two linear layers and a GELU activation function; after information exchange through the stacked encoders, the final output vector corresponding to the global token is extracted; as a highly condensed global feature representation, this vector integrates information from all modalities and reflects the patient's overall condition.
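Illustrative note (not part of the claims): the construction of the fusion encoder's input sequence can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the token order (global token first, then tabular, then text), the use of one token for the tabular representation and one per text input, the dimension `d_model`, and all names are hypothetical, and the single attention step stands in for the full stacked encoder (no multi-head projections, layer normalization, or feedforward network).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8   # hypothetical shared latent dimension
n_text = 3    # hypothetical number of text inputs per patient

tab_repr = rng.normal(size=(1, d_model))        # tabular representation (one token)
text_repr = rng.normal(size=(n_text, d_model))  # projected text feature vectors

# These would be learnable parameters in a real model; random placeholders here.
global_token = rng.normal(size=(1, d_model))
modal_emb = {"tab": rng.normal(size=d_model), "text": rng.normal(size=d_model)}
pos_emb = rng.normal(size=(1 + 1 + n_text, d_model))


def build_fusion_input(tab, text):
    """Concatenate [global token; tabular; text] along the sequence axis
    (assumed order), after adding a modality embedding to each representation,
    then add positional embeddings."""
    seq = np.concatenate(
        [global_token, tab + modal_emb["tab"], text + modal_emb["text"]], axis=0
    )
    return seq + pos_emb


def self_attention(x):
    """Single-head scaled dot-product self-attention without learned projections."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


seq0 = build_fusion_input(tab_repr, text_repr)
fused = seq0 + self_attention(seq0)  # residual connection around attention
patient_vec = fused[0]               # output at the global-token position
```

Reading the output at index 0 relies on the assumed placement of the global token at the start of the sequence; any fixed position would serve the same purpose.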
4. The multimodal method for predicting postoperative complications in patients undergoing surgical anesthesia according to claim 1, characterized in that, in step S2, for each prediction task, a task-specific representation is first generated from the global feature vector through a task-specific linear layer, allowing each task to form an initial task-specific feature representation; an Alpha generator network is used to dynamically generate a sample-specific task interaction matrix; the generator takes the original global feature vector as input, and its output is reshaped into a task-by-task matrix and then normalized by a row-wise Softmax; each element of the matrix represents the weight with which one task, when updating its features, obtains information from the original features of another task, so that every weight is non-negative and the weights in each row sum to 1; using the sample-specific interaction matrix, the updated feature representation of each task is calculated as a weighted average of the initial task-specific representations.
5. The multimodal method for predicting postoperative complications in patients undergoing surgical anesthesia according to claim 4, characterized in that, to ensure stability and promote convergence early in training, the Alpha generator is specially initialized: the weights of its last layer are initialized to zero, and the bias term is initialized to a flattened identity matrix, so that at the start of training the interaction matrix is approximately an identity matrix and each task depends primarily on its own features; as training progresses, meaningful cross-task sharing patterns are gradually learned.
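Illustrative note (not part of the claims): the dynamic task interaction and the identity-biased initialization of the Alpha generator can be sketched as follows. This is a minimal sketch, assuming a single-layer generator; the class name `AlphaGenerator`, the helper `interact`, and the `identity_scale` parameter are hypothetical and do not appear in the claims.

```python
import numpy as np


def row_softmax(m):
    """Softmax applied independently to each row."""
    m = m - m.max(axis=-1, keepdims=True)
    e = np.exp(m)
    return e / e.sum(axis=-1, keepdims=True)


class AlphaGenerator:
    """Maps a global feature vector to a (n_tasks x n_tasks) interaction matrix."""

    def __init__(self, d_model, n_tasks, identity_scale=1.0):
        self.n_tasks = n_tasks
        # Last-layer weights start at zero, so the initial output ignores the input.
        self.W = np.zeros((d_model, n_tasks * n_tasks))
        # Bias starts as a flattened identity matrix (optionally scaled; assumption).
        self.b = identity_scale * np.eye(n_tasks).reshape(-1)

    def __call__(self, z):
        logits = (z @ self.W + self.b).reshape(self.n_tasks, self.n_tasks)
        return row_softmax(logits)


def interact(z, task_weights, generator):
    """Build task-specific representations, then mix them with the matrix."""
    h = np.stack([z @ W for W in task_weights])  # (n_tasks, d_task)
    alpha = generator(z)                         # (n_tasks, n_tasks)
    return alpha @ h, alpha                      # row i = weighted average for task i


rng = np.random.default_rng(0)
d_model, d_task, n_tasks = 8, 4, 3
z = rng.normal(size=d_model)
task_weights = [rng.normal(size=(d_model, d_task)) for _ in range(n_tasks)]
updated, alpha = interact(z, task_weights, AlphaGenerator(d_model, n_tasks))
```

With an unscaled identity bias and three tasks, each diagonal weight after the Softmax is e / (e + 2), about 0.58, so the matrix is diagonally dominant rather than exactly the identity. The hypothetical `identity_scale` shows one way the diagonal could be pushed closer to 1; the source does not state whether any scaling is used.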
6. The multimodal method for predicting postoperative complications in patients undergoing surgical anesthesia according to claim 1, characterized in that, in step S3, ASL decouples the modulating factors for positive and negative samples, allowing their contributions to be adjusted flexibly; the loss is defined in terms of the true label, the predicted probability, and separate focusing parameters for positive and negative samples; in ASL, a negative sample contributes to the loss only when its predicted probability exceeds a threshold; this allows ASL to dynamically balance the contributions of positive and negative samples in each task, mitigating the bias caused by data imbalance; in the proposed multi-task learning framework, the total loss is defined as the weighted sum of the losses of the individual tasks, in which the asymmetric loss of each task is weighted by a static class weight and a dynamic task weight.
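Illustrative note (not part of the claims): the per-sample asymmetric loss and the weighted total loss can be sketched as follows. This is a minimal sketch that follows the commonly published form of asymmetric loss, in which the negative-sample probability is shifted by a margin; the default values of `gamma_pos`, `gamma_neg`, and `margin` are illustrative assumptions, and combining the static and dynamic weights as a product is also an assumption, since the source formula is not legible.

```python
import math


def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Asymmetric loss for one sample of one binary task.

    p: predicted probability, y: true label (0 or 1).
    """
    if y == 1:
        # Positive branch: focal-style factor with its own focusing parameter.
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negative branch: probabilities at or below the margin contribute nothing.
    p_m = max(p - margin, 0.0)
    return -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))


def total_loss(task_losses, static_weights, dynamic_weights):
    """Weighted sum over tasks (assumed form: static * dynamic * task loss)."""
    return sum(
        s * d * loss
        for s, d, loss in zip(static_weights, dynamic_weights, task_losses)
    )
```

For example, `asymmetric_loss(0.03, 0)` is zero because the predicted probability lies below the margin, while a confidently wrong negative such as `asymmetric_loss(0.9, 0)` is penalized far more than `asymmetric_loss(0.2, 0)`.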
7. The multimodal method for predicting postoperative complications in patients undergoing surgical anesthesia according to claim 6, characterized in that the static class weight of each task is set inversely proportional to the incidence of the corresponding complication, assigning higher weights to rare events to balance importance at a macro level; the dynamic task weights are adaptively adjusted to synchronize the learning progress of different tasks and mitigate heterogeneity; the dynamic weights are updated using a weighting mechanism based on relative loss, which prioritizes tasks with higher relative losses, since a higher relative loss indicates that a task is converging more slowly.
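Illustrative note (not part of the claims): static class weights that are inversely proportional to incidence can be sketched as follows. This is a minimal sketch; rescaling the weights to a mean of 1 is an assumption made here for readability, and the incidence values are invented for illustration only.

```python
def static_class_weights(incidence):
    """Weights inversely proportional to each complication's incidence.

    incidence: list of event rates in (0, 1], one per task.
    The result is rescaled to a mean of 1 (assumption, not from the source).
    """
    raw = [1.0 / rate for rate in incidence]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


# Hypothetical incidences: a common complication (10%) and a rare one (2%).
weights = static_class_weights([0.10, 0.02])
```

The rare outcome receives five times the weight of the common one, matching the ratio of the two incidences.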
8. The multimodal method for predicting postoperative complications in patients undergoing surgical anesthesia according to claim 7, characterized in that the average loss of each task is recorded periodically, and a standardized loss is calculated based on the distribution of losses across all tasks; the standardized loss is then mapped through the Sigmoid function to determine a base weighting factor, whose value is thereby constrained within a reasonable range; to prevent instability caused by short-term loss fluctuations, the dynamic weights are updated using an exponential moving average (EMA) governed by a momentum coefficient and applied once per update cycle; after each update, the task weights are normalized to a mean of 1 and then clipped to a preset range, to prevent gradient explosion or vanishing and to ensure training stability; this mechanism simultaneously addresses class imbalance within tasks and the balancing of learning progress between tasks, jointly optimizing the predictive performance for all complications; finally, the updated feature vector of each task is fed into its corresponding task-specific prediction head; each prediction head consists of a small MLP that outputs a single logit value, which is converted by a Sigmoid activation function into the final predicted probability of the corresponding complication.
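Illustrative note (not part of the claims): one update cycle of the dynamic task weights can be sketched as follows. This is a minimal sketch, assuming z-score standardization across tasks and an EMA in which the momentum coefficient multiplies the previous weight; the function name and the default values of `momentum`, `w_min`, and `w_max` are hypothetical.

```python
import numpy as np


def update_dynamic_weights(prev_weights, avg_losses, momentum=0.9,
                           w_min=0.5, w_max=2.0, eps=1e-8):
    """One periodic update of the per-task dynamic weights."""
    losses = np.asarray(avg_losses, dtype=float)
    prev = np.asarray(prev_weights, dtype=float)
    # Standardize each task's average loss against the distribution over tasks.
    z = (losses - losses.mean()) / (losses.std() + eps)
    # Sigmoid maps the standardized loss to a bounded base weighting factor.
    base = 1.0 / (1.0 + np.exp(-z))
    # The exponential moving average smooths short-term loss fluctuations.
    new = momentum * prev + (1.0 - momentum) * base
    # Normalize to a mean of 1, then clip to the preset range.
    new = new / new.mean()
    return np.clip(new, w_min, w_max)


# Hypothetical average losses for three tasks; the third converges slowest.
w = update_dynamic_weights([1.0, 1.0, 1.0], [0.2, 0.5, 1.1])
```

Tasks with a higher relative loss end up with a larger weight. If clipping is triggered, the mean of the weights may drift slightly from 1 until the next update.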