Multi-agent multi-modal collaboration method and apparatus for intensive care unit patient state diagnosis
By employing a multi-agent, multi-modal collaborative approach with modality detection and dynamic knowledge graph construction, the invention addresses the challenge of cross-modal data processing in the intensive care unit (ICU). This enables accurate and interpretable diagnosis of ICU patient status and improves the transparency and reliability of the diagnosis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-26
AI Technical Summary
Existing AI-assisted medical systems are unable to effectively process cross-modal data in intensive care units, lack transparent and traceable reasoning paths, and rely on static external knowledge bases that cannot adapt to the dynamic and individualized conditions of patients, resulting in insufficient diagnostic accuracy and interpretability.
A multi-agent, multi-modal collaborative approach is adopted. Modality detection agents identify data types, domain expert agents extract features, a dynamic medical knowledge graph is constructed, and joint reasoning is performed through a large visual-language model to generate transparent and traceable diagnostic results.
It enables adaptive processing of multimodal data in the intensive care unit, dynamically constructs patient-specific knowledge graphs, improves the accuracy and interpretability of diagnoses, meets the audit requirements of high-risk decisions, and enhances the accuracy and reliability of diagnosing the condition of patients in the intensive care unit.
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical artificial intelligence and intelligent analysis for intensive care, and more specifically to a method and apparatus that process complex multimodal data in intensive care units and use multi-agent collaboration and graph-augmented reasoning to diagnose the condition and assess the prognosis of critically ill patients.
Background Technology
[0002] In the extremely high-pressure clinical environment of the Intensive Care Unit (ICU), patients' vital signs and pathological states change rapidly. Critical care physicians must quickly integrate multiple highly heterogeneous data modalities under intense pressure to make accurate judgments. These data include not only high-frequency vital-sign time series (such as heart rate, blood pressure, and blood oxygen saturation) and daily bedside medical imaging (such as chest X-rays), but also detailed clinical rounds notes and electronic medical records. This multi-layered decision-making process demands strong logical reasoning and must fully consider the potential cascading causal relationships among anatomical findings, physiological biomarkers, and symptoms.
[0003] Existing AI-assisted medical systems exhibit three critical limitations in intensive care unit (ICU) environments. First, most existing methods cannot capture deep physiological interactions across modalities. They often handle vital-sign time-series monitoring, imaging analysis, and text processing in isolation, or merely concatenate features at the final stage. In the ICU, for example, slight fluctuations in blood oxygen saturation (…), a low-grade fever recorded in the medical record, and early pulmonary infiltration on imaging each have very low diagnostic value when considered individually, yet are crucial for identifying septic shock or acute respiratory distress syndrome when interpreted in combination. Second, existing multimodal large-model frameworks mostly operate as "black boxes." In life-or-death decision-making scenarios such as the ICU, AI systems that lack transparent and traceable reasoning paths cannot gain the clinical trust of doctors, which makes them difficult to deploy. Finally, existing knowledge augmentation systems rely excessively on static external ontology knowledge bases (such as UMLS), which are ineffective in the ICU. The clinical status of ICU patients is highly dynamic and individualized; complex entity relationships, such as complication chains and drug resistance reactions, must be synthesized dynamically and immediately from the patient's current multimodal evidence.
[0004] Therefore, there is an urgent need for an intelligent collaborative diagnostic framework that can adaptively process heterogeneous multimodal data from intensive care units, dynamically construct patient-specific temporal knowledge graphs, and provide a transparent and traceable chain of evidence.
Summary of the Invention
[0005] The present invention aims to overcome the above-mentioned shortcomings of the prior art and provide a multi-agent multimodal collaborative method and device for diagnosing the condition of patients in the intensive care unit.
[0006] A first aspect of the present invention provides a multi-agent, multimodal collaborative method for diagnosing the condition of patients in the intensive care unit, comprising:
S1. Receive a multimodal clinical medical dataset obtained from an intensive care information system, the dataset including bedside medical images, intensive care clinical record text, and continuously monitored vital-sign time-series data;
S2. Extract multimodal clinical features: the modality detection agent identifies the file type of the dataset and performs multi-way routing, and the corresponding domain expert agents each extract the clinical features from the multimodal medical dataset;
S3. Construct a dynamic medical knowledge graph: input the clinical features into the knowledge graph agent and, under the constraints of a predefined type classification method T and a control relation pattern library S, construct a patient-specific dynamic medical knowledge graph through a two-stage generation process;
S4. Extract the global graph representation: serialize the dynamic medical knowledge graph into structured text and fuse it with the original multimodal medical dataset to generate a global graph representation;
S5. Perform joint diagnostic reasoning: input the global graph representation into the collaborative agent, and use the visual-language large model to perform graph-enhanced joint reasoning to obtain the medical diagnostic result.
[0007] Step S2, which involves extracting multimodal clinical features, includes:
S21. Each data file in the intensive care unit patient data warehouse is classified into one of the preset modality types through feature signature analysis, wherein the modality types include at least image, text, and time series;
S22. If image data is determined to exist, activate the image agent to extract visual finding features;
S23. If text data is determined to exist, activate the text agent to parse the symptoms, medical history, and diagnostic features in the clinical record;
S24. If time-series data is determined to exist, activate the time-series agent to analyze the abnormal pattern features in the time series.
[0008] Step S3, which involves constructing a dynamic medical knowledge graph, includes:
S31. Perform entity extraction: under the constraints of the type classification method T, clinically relevant entities are identified, as shown in formula (1):
$$E = \{(m_i, t_i, a_i) \mid t_i \in T\} \tag{1}$$
[0009] where $E$ represents the set of entities, $m_i$ denotes the $i$-th entity mention, $t_i$ denotes the semantic type selected from the predefined type classification method $T$, and $a_i$ denotes the type-specific attributes captured for the entity;
S32. Perform relational reasoning: under the constraints of the control relation pattern library S, the clinical relationships between entity pairs are inferred, as shown in formula (2):
$$R = \{(e_i, r_{ij}, e_j, \mu_{ij}) \mid e_i, e_j \in E,\ r_{ij} \in S\} \tag{2}$$
[0010] where $R$ represents the set of relations, $r_{ij}$ denotes the relation type selected from the predefined control relation pattern library $S$, and $\mu_{ij}$ denotes metadata containing relation strength and clinical significance.
[0011] Step S4, which involves extracting the global graph representation, includes:
S41. For the entity set in the dynamic medical knowledge graph, perform attribute extraction to generate entity description fragments that include the entity mention, semantic type, and severity attributes;
S42. Based on the relation set, transform the logical relationships between entities into structured text of the form "subject-clinical relationship-object", and attach metadata containing clinical significance;
S43. Perform topological sorting of the entity description fragments and the structured text according to the order of pathological evolution to generate the global graph representation.
[0012] Step S5, which involves performing joint diagnostic reasoning, includes: the collaborative agent invokes the visual-language large model to execute the inference procedure, as shown in formula (3):
$$\hat{y} = \mathcal{F}_{\mathrm{VLM}}(P, I, D, N, G, C) \tag{3}$$
[0013] where $\hat{y}$ is the final medical diagnosis result that is output, $\mathcal{F}_{\mathrm{VLM}}$ represents the reasoning process of the visual-language large model, $P$ represents the task instruction, $I$ represents the medical image, $D$ represents the structured electronic health record data, $N$ represents the aforementioned clinical record, $G$ represents the structured text of the serialized dynamic medical knowledge graph, and $C$ represents the additional context.
[0014] A second aspect of this application provides a multi-agent, multimodal collaborative device for diagnosing the condition of patients in the intensive care unit, comprising:
a receiving unit, used to receive multimodal critical care medical datasets acquired by the intensive care unit monitoring system;
an adaptive perception unit, used to identify data types through a modality detection agent and route them to the corresponding domain expert agents to extract multimodal clinical features;
a dynamic knowledge graph construction unit, used to construct a patient-specific dynamic medical knowledge graph based on the clinical features and a predefined pattern;
a graph representation extraction unit, used to map the dynamic medical knowledge graph into a structured prompt sequence that can be understood by a large language model;
a collaborative reasoning unit, used to serialize the dynamic medical knowledge graph and input it, together with the original data, into the visual-language large model to output the final diagnostic result.
[0015] The working principle of this invention is: The method used in this invention simulates a multidisciplinary consultation process in an intensive care unit (ICU) involving attending physicians, radiologists, and a critical care nursing team. The system utilizes a modality detection agent as a global data gateway to perform feature signature analysis on the input complex file stream, accurately identifying image, text, or vital sign data, and precisely routing it to the corresponding domain expert agents (such as image agents, text agents, and time-series agents). This mechanism can adaptively handle the common data gap problem in ICU scenarios, keeping expert agents without data input dormant. After the domain expert agents extract high-fidelity single-modal clinical features in parallel, the knowledge graph agent abandons traditional static external knowledge base retrieval and adopts a strict two-stage, pattern-controlled generation mechanism. The first stage is entity extraction constrained by type classification, ensuring that only clinically significant entities are extracted; the second stage is relational reasoning constrained by relational patterns, inferring medical connections between entities based on contextual evidence. Through this mechanism, the system dynamically integrates fragmented cross-modal features into a patient-centric knowledge graph representing the current clinical state of the patient. Building upon this foundation, the collaborative intelligent agent, acting as the central decision-making hub of the system, serializes the constructed dynamic knowledge graph into structured text and inputs it along with the original multimodal data into the visual-language large model. By traversing the structured relationship paths within the graph, the system can not only integrate multi-party evidence to generate the final state diagnosis prediction, but also output a clear and traceable chain of reasoning evidence that is cross-modal verified.
[0016] The innovations of this invention are as follows. First, this invention proposes a robust multi-agent collaborative architecture for the intensive care unit (ICU) scenario. Through an adaptive modality routing mechanism, the system tolerates missing data modalities caused by instrument malfunctions or examinations that were not performed in the ICU, ensuring the continuity of diagnostic reasoning.
[0017] Second, this invention realizes a dynamic graph-enhanced clinical reasoning mechanism for critical illness. Instead of rigidly matching standard medical knowledge, it instantiates in real time the unique network of interactions between complications and vital-sign time series for each patient during a period of rapid deterioration or fluctuation.
[0018] Third, this invention greatly enhances the medical interpretability of critical care decision-making. Through cross-modal relational path modeling, the system transforms complex large-scale model reasoning into a clear chain of evidence similar to a doctor's clinical thinking (e.g., hypertension leading to heart failure with preserved ejection fraction, which in turn manifests as severe cardiac hypertrophy and pleural effusion), meeting the auditing requirements for high-risk decisions in the intensive care unit.
[0019] Fourth, the present invention maintains a good balance among various evaluation indicators (accuracy, AUC, sensitivity, specificity, precision, F1 score), avoiding the serious imbalance problem among indicators common in existing methods, and verifying the robustness of the proposed method.
[0020] Compared with the prior art, the above-described solution of this application has at least the following beneficial effects. The method proposed in this application extracts the pathological and physiological features of multi-source clinical data separately and integrates cross-modal associations using a dynamic knowledge graph construction strategy to generate a structured, patient-centric global graph representation. This method not only captures the local features and temporal dynamics within a single data modality (such as medical images, clinical text, and vital signs), but also breaks down, through entity extraction and relational reasoning, the data barriers caused by the isolation of monitoring equipment in the intensive care unit. This allows a more comprehensive and accurate representation of the complex and changing clinical status of critically ill patients, improving the accuracy of patient status diagnosis and prognostic assessment in the intensive care unit. Furthermore, the dynamic graph representation with causal and temporal semantics generated by this invention can enhance the logical reasoning ability of large language models in complex downstream decision-making, providing a transparent and traceable path for discovering chains of medical evidence in high-risk clinical diagnostic tasks in intensive care.
Description of the Drawings
[0021] Figure 1 is a flowchart of a multi-agent, multimodal collaborative method for diagnosing the condition of patients in the intensive care unit, according to an embodiment of this application.
Figure 2 is a unit block diagram of a multi-agent, multimodal collaborative device for diagnosing the condition of patients in the intensive care unit, according to an embodiment of this application.
Figure 3 is a block diagram of an electronic device for implementing multimodal intensive care unit monitoring and diagnosis, according to an exemplary embodiment.
Detailed Implementation
[0022] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0023] The terminology used in the embodiments of this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms “a,” “said,” and “the” used in the embodiments of this application and the appended claims are also intended to include the plural forms, and “multiple” generally includes at least two unless the context clearly indicates otherwise.
[0024] It should be understood that the term "and / or" used in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Additionally, the character " / " in this article generally indicates that the preceding and following related objects have an "or" relationship.
[0025] It should be understood that although the terms first, second, third, etc., may be used in the embodiments of this application, these descriptions should not be limited to these terms. These terms are only used to distinguish the descriptions. For example, first may also be referred to as second without departing from the scope of the embodiments of this application, and similarly, second may also be referred to as first.
[0026] Depending on the context, the words “if” or “suppose” as used here can be interpreted as “when” or “in response to determination” or “in response to detection.” Similarly, depending on the context, the phrases “if determination” or “if detection (of the stated condition or event)” can be interpreted as “when determination” or “in response to determination” or “when detection (of the stated condition or event)” or “in response to detection (of the stated condition or event).”
[0027] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that an article or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such an article or device. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the article or device that includes said element.
[0028] It should be noted that any symbols and / or numbers present in the specification that are not marked in the accompanying drawings are not reference numerals.
[0029] Patient status diagnosis in the intensive care unit (ICU) is of paramount clinical significance in actual critical care and life support management, especially when facing a massive amount of multimodal monitoring equipment for real-time monitoring of vital signs and pathological status. Multimodal signals in the ICU (such as bedside medical images, clinical records, and continuous vital signs) reflect the dynamic evolution of different organ systems and physiological dimensions in critically ill patients. Their pathological characteristics not only change rapidly with the progression of the disease but are also profoundly influenced by cascading complications, treatment interventions, and the patient's individual medical history.
[0030] Current clinical diagnostic aids typically employ large visual-language models (VLMs) or independent deep learning models that extract features directly from isolated medical data sources. However, these methods often focus only on local features within a single modality or simply concatenate features at the end of the model, neglecting the deep synchronization and complementary information between different clinical modalities during pathological progression. In the complex disease evolution within the intensive care unit, there are clear physiological transmission and causal relationships between clinical indicators (e.g., pulmonary infiltration on imaging, fever in clinical records, and decreased blood oxygen saturation are highly synergistic). If these cross-modal correlations are ignored, diagnostic results are highly susceptible to interference from missing data from single devices, signal noise, or sporadic fluctuations in vital signs.
[0031] Furthermore, due to the rapidly changing conditions and extreme individual heterogeneity of patients in the intensive care unit, traditional knowledge augmentation systems often rely on static external medical ontology libraries (such as UMLS). These static graphs struggle to model patient-specific complex relationships that must be dynamically inferred from the patient's current multimodal evidence. When faced with highly dynamic physiological differences and the evolution of complications, traditional "black box" models lack transparent reasoning paths, leading to a decline in predictive performance under complex conditions and difficulty in gaining the trust of clinicians in life-or-death critical care decision-making scenarios, resulting in severely insufficient generalization capabilities.
[0032] Based on the above, this application constructs a multi-agent multimodal collaborative framework to deeply model the multidimensional structure and logical reasoning information of critical care data in the intensive care unit: (1) Within each single data modality, high-fidelity pathological and physiological features are extracted using the corresponding domain expert agent; (2) Between cross-modal data, a dynamic clinical knowledge graph exclusive to the patient is constructed through entity extraction and relational reasoning to model the spatial and temporal causal correlation between different clinical findings; (3) At the same time, considering the interpretability requirements of clinical decision-making, the original multimodal tensors and structured graph reasoning paths are unified and integrated through a collaborative large model.
[0033] By incorporating the internal features of a single modality, the deep pathological associations across modalities, and the transparent and traceable chain of evidence into a graph-enhanced collaborative reasoning model, a more stable, robust, and clinically interpretable comprehensive medical representation can be obtained. This significantly improves the accuracy, error tolerance, and clinical generalization ability of complex diagnoses of intensive care unit patients (such as in-hospital mortality prediction and long-stay prediction) in extremely high-pressure critical care environments.
[0034] The optional embodiments of this application are described in detail below with reference to the accompanying drawings.
[0035] Example 1
[0036] This embodiment relates to a multi-agent, multi-modal collaborative method for diagnosing the condition of patients in the intensive care unit (ICU), aiming to solve the problem of fusion and diagnostic reasoning over high-dimensional heterogeneous data in the ICU. As the core location for the centralized treatment of critically ill patients within a hospital, the ICU generates massive amounts of multi-source, heterogeneous medical data daily. This data covers multiple dimensions, including imaging examinations, clinical records, and physiological parameter monitoring. Traditional single-modal analysis methods struggle to fully explore the inherent correlations and synergistic value between these data. To address these challenges, this method systematically divides the clinical diagnostic process into four core stages: perceptual routing, feature extraction, graph construction, and joint reasoning. It achieves deep integration of cross-modal information and intelligent decision-making through a multi-agent collaborative mechanism. The specific execution steps are as follows:
Step S1: Receive the multimodal clinical medical dataset obtained from the intensive care information system.
[0037] This step is used to acquire the system's raw input data, which forms the data foundation and starting point for the entire diagnostic process. Specifically, the system connects to the intensive care unit (ICU) information system through a standardized data interface, establishing a stable and reliable data transmission channel to receive multimodal clinical medical data from the ICU ward in real time or in batches. The received datasets exhibit significant heterogeneity, covering three core data modalities: The first is bedside medical image data, mainly including visual diagnostic materials such as chest X-rays, CT scans, and ultrasound images. These images can intuitively reflect the morphological changes and pathological characteristics of the patient's organs. The second is ICU clinical record text data, covering unstructured text information such as doctors' ward round notes, nursing records, consultation opinions, discharge summaries, and progress notes. These texts contain the clinicians' professional judgments and treatment strategies regarding the patient's condition. The third is continuously monitored vital signs time-series data, including dynamic change curves of physiological parameters such as heart rate, blood pressure, blood oxygen saturation, respiratory rate, and body temperature. This time-series data can accurately depict the evolution trend and abnormal fluctuations of the patient's vital signs. The three types of data mentioned above complement and corroborate each other, together forming the information foundation for a comprehensive assessment of the health status of patients in the intensive care unit.
[0038] Step S2: Extract multimodal clinical features.
[0039] This step aims to address the heterogeneity issue of multi-source data by transforming the original heterogeneous data into a unified semantic feature representation through intelligent modality recognition and distributed feature extraction mechanisms. The entire feature extraction process adopts a three-layer architecture of "perception-routing-processing". First, a modality detection agent performs a comprehensive scan and type identification of the input dataset. This agent uses feature signature analysis technology to accurately classify each data file in the ICU patient data warehouse into one of the preset modality types by parsing metadata attributes such as file header information, data structure features, and encoding formats. The modality types include at least three major categories: image, text, and time series, thereby achieving automated data classification and intelligent routing.
[0040] After completing modality recognition, the system dynamically activates the corresponding domain expert agent based on the data type to perform targeted feature extraction tasks. When image data is detected, the system activates the image agent, which uses advanced computer vision models such as deep convolutional neural networks or visual Transformers to perform multi-level semantic analysis of medical images, specifically extracting visual features such as lesion location, tissue morphology characteristics, and imaging signs. These features objectively reflect the patient's organic lesions. When text data is detected, the system activates the text agent, which uses natural language processing technology and a medical knowledge base to deeply analyze key information contained in clinical records, such as symptom descriptions, past medical history, family history, medication records, and diagnostic impressions, extracting clinically significant diagnostic features and semantic entities. When time-series data is detected, the system activates the time-series agent, which uses time-series analysis algorithms and anomaly detection models to deeply mine the dynamic changes in physiological parameters, identifying and extracting abnormal patterns such as arrhythmias, sudden changes in blood pressure, and decreased blood oxygenation, while simultaneously capturing the periodic patterns and trends of vital signs. Through the aforementioned distributed parallel processing mechanism, expert agents from various fields can fully leverage their respective professional advantages to achieve efficient feature extraction from multimodal data.
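The perception and routing layers described above can be sketched as follows. This is a minimal illustration, not the implementation disclosed in this application: the signature table, the column names in `TIME_COLUMNS`, and the helper names `detect_modality` and `route` are assumptions introduced here, and the expert agents are stand-in callables. The only external facts relied on are the standard PNG and JPEG leading bytes and the "DICM" marker that follows the 128-byte preamble of a DICOM file.

```python
from typing import Callable, Dict, List

# Illustrative signature table (leading bytes -> modality); not exhaustive.
MAGIC_PREFIXES = {
    b"\x89PNG\r\n\x1a\n": "image",  # PNG
    b"\xff\xd8\xff": "image",       # JPEG
}
TIME_COLUMNS = {"time", "timestamp", "charttime"}  # assumed column names

Agent = Callable[[List[str]], List[str]]


def detect_modality(head: bytes) -> str:
    """Classify a file from its first bytes as image, text, or time_series."""
    if not head:
        return "unknown"
    # DICOM files carry the marker "DICM" after a 128-byte preamble.
    if head[128:132] == b"DICM":
        return "image"
    for prefix, modality in MAGIC_PREFIXES.items():
        if head.startswith(prefix):
            return modality
    if b"\x00" in head:
        return "unknown"  # binary content with no known signature
    first_line = head.decode("utf-8", errors="replace").splitlines()[0]
    columns = {c.strip().lower() for c in first_line.split(",")}
    # Heuristic: a comma-separated header with a time column is tabular vitals.
    if len(columns) > 1 and columns & TIME_COLUMNS:
        return "time_series"
    return "text"


def route(files: Dict[str, bytes], agents: Dict[str, Agent]) -> Dict[str, List[str]]:
    """Group files by detected modality and run only the agents that have input."""
    buckets: Dict[str, List[str]] = {}
    for name, content in files.items():
        buckets.setdefault(detect_modality(content[:256]), []).append(name)
    # Agents whose modality is absent are never called (they stay dormant).
    return {m: agents[m](names) for m, names in buckets.items() if m in agents}


# Stand-in expert agents that only label their inputs.
agents = {
    "image": lambda names: [f"visual findings from {n}" for n in names],
    "text": lambda names: [f"symptoms and history from {n}" for n in names],
    "time_series": lambda names: [f"abnormal patterns from {n}" for n in names],
}
files = {
    "cxr.png": b"\x89PNG\r\n\x1a\n" + b"\x00" * 16,
    "rounds_note.txt": b"Low-grade fever overnight.",
}
features = route(files, agents)
print(sorted(features))  # ['image', 'text']
```

Because no time-series file is present in the example, the time-series agent is never invoked, which mirrors the handling of missing modalities described in this application.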
[0041] Step S3: Construct a dynamic medical knowledge graph.
[0042] This step is crucial for achieving cross-modal evidence fusion, aiming to integrate the discrete clinical features extracted in the preceding steps into a structured knowledge representation that supports reasoning. The system aggregates the multimodal clinical features output by the domain expert agents and inputs them into a knowledge graph agent. Under the dual constraints of a predefined type classification method and a control relation pattern library, this agent constructs a patient-specific dynamic medical knowledge graph through a rigorous two-stage generation process. This graph organizes clinical evidence in a graph structure, clearly showing the semantic relationships and causal logic between medical concepts.
[0043] In the first stage, the entity extraction task is performed. Under the constraint of the type classification method T, the knowledge graph agent performs fine-grained semantic analysis on the input features and identifies and extracts clinically significant medical entities. The type classification method predefines standardized entity categories such as symptoms, diseases, drugs, examination items, anatomical locations, and physiological indicators. This constraint mechanism filters out medical noise that is irrelevant to diagnosis, ensuring the clinical relevance and standardization of the extracted entities. The entity extraction logic is shown in formula (4):
$$E = \{(m_i, t_i, a_i) \mid t_i \in T\} \tag{4}$$
[0044] where $E$ represents the set of entities, $m_i$ denotes the $i$-th entity mention, $t_i$ denotes the semantic type selected from the predefined type classification method $T$, and $a_i$ denotes the type-specific attributes captured for the entity.
In the second stage, the knowledge graph agent performs relational reasoning. Under the constraints of the control relation pattern library S, it comprehensively analyzes the semantic relationships between entities and, combined with clinical evidence and medical common sense, infers the clinical relationships between entity pairs. The control relation pattern library predefines standardized relation types such as "causes," "manifests as," "treats," "contraindicates," "co-occurs with," and "located in" to ensure the medical rationality and logical consistency of the inferred relationships. The relational reasoning process is shown in formula (5):
$$R = \{(e_i, r_{ij}, e_j, \mu_{ij}) \mid e_i, e_j \in E,\ r_{ij} \in S\} \tag{5}$$
[0045] where $R$ represents the set of relations, $r_{ij}$ denotes the relation type selected from the predefined control relation pattern library $S$, and $\mu_{ij}$ denotes metadata containing relation strength and clinical significance.
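The two-stage constraint can be illustrated with a small sketch. It assumes that candidate entities and relations have already been proposed (in this application that is the knowledge graph agent's job) and shows only the filtering against the type classification method and the relation pattern library. The class names, field names, type labels, and metadata values are hypothetical; the clinical example follows the evidence chain mentioned earlier in this description (hypertension, heart failure with preserved ejection fraction, pleural effusion).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed encodings of the type classification method and relation pattern library.
ENTITY_TYPES = {"symptom", "disease", "drug", "examination",
                "anatomical_location", "physiological_indicator"}
RELATION_TYPES = {"causes", "manifests_as", "treats",
                  "contraindicates", "co_occurs_with", "located_in"}


@dataclass(frozen=True)
class Entity:
    mention: str
    semantic_type: str
    attributes: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class Relation:
    subject: str
    relation_type: str
    obj: str
    strength: float      # illustrative metadata
    significance: str    # illustrative metadata


def build_graph(candidate_entities: List[Entity],
                candidate_relations: List[Relation]
                ) -> Tuple[List[Entity], List[Relation]]:
    # Stage 1: keep only entities whose semantic type is in the taxonomy.
    entities = [e for e in candidate_entities if e.semantic_type in ENTITY_TYPES]
    known = {e.mention for e in entities}
    # Stage 2: keep only relations whose type is in the pattern library and
    # whose two endpoints both survived stage 1.
    relations = [r for r in candidate_relations
                 if r.relation_type in RELATION_TYPES
                 and r.subject in known and r.obj in known]
    return entities, relations


candidates = [
    Entity("hypertension", "disease"),
    Entity("HFpEF", "disease", (("severity", "severe"),)),
    Entity("pleural effusion", "symptom"),
    Entity("bed 12", "bed_label"),  # not a clinical type: dropped in stage 1
]
proposed = [
    Relation("hypertension", "causes", "HFpEF", 0.9, "high"),
    Relation("HFpEF", "manifests_as", "pleural effusion", 0.8, "high"),
    Relation("hypertension", "mentioned_with", "pleural effusion", 0.2, "low"),
    Relation("HFpEF", "located_in", "bed 12", 0.1, "low"),
]
entities, relations = build_graph(candidates, proposed)
print(len(entities), len(relations))  # 3 2
```

Filtering relations by endpoint as well as by type keeps the graph closed over the accepted entity set, so an entity rejected in the first stage cannot re-enter through an edge.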
[0046] Step S4: Extract global graph representation.
[0047] To enable downstream large language models to effectively parse and utilize graph-structured data, this step transforms the dynamic medical knowledge graph into a representation suitable for neural network processing. Large language models primarily accept sequential text as input, whereas a knowledge graph is a graph-structured data organization, so the two differ in representational form. To bridge this gap, this step employs graph serialization to systematically transform the dynamic medical knowledge graph into a structured text description, and then integrates it with the original multimodal medical dataset, ultimately generating a semantically rich and logically clear global graph representation.
[0048] Specifically, the graph serialization process includes three progressive processing steps. First, the system performs attribute extraction and text transformation on the entity set of the dynamic medical knowledge graph. For each entity node, core attributes such as entity mention, semantic type, and severity are extracted, and standardized entity description fragments are generated according to predefined templates. These fragments preserve the semantic information and clinical attributes of the entity in natural language. Second, the system performs text transformation on the relation set. The edge structure of the graph is converted to text, transforming the logical relationships between entities into structured "subject-clinical relationship-object" triples. Metadata such as relationship strength, confidence level, and clinical significance are embedded in the text as additional annotations so that the relational semantics are fully conveyed. Finally, taking into account the pathological evolution and causal propagation of the disease, the system performs topological sorting on the entity description fragments and the structured relational text, reorganizing the text fragments according to chronological order and causal dependencies to generate a logically consistent and hierarchically distinct global graph representation. This representation retains the structured semantic information of the knowledge graph while conforming to the input format of large language models, providing the basis for the subsequent joint reasoning.
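The three serialization steps can be sketched as follows. This is an illustrative Python example under stated assumptions, not the serialization used in this application: the text templates, the function name `serialize_graph`, and the choice to order only by "cause" edges (ignoring timestamps) are all assumptions. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later), which raises `CycleError` if the causal edges form a cycle, and it assumes every relation endpoint appears in the entity dictionary.

```python
from graphlib import TopologicalSorter


def serialize_graph(entities, relations, causal_types=("cause",)):
    """Serialize a small knowledge graph to text.

    entities:  dict mapping mention -> {"type": ..., "severity": ...}
    relations: list of (head, relation_type, tail, metadata) tuples
    """
    # Step 3 is prepared first: build a causal ordering of the entities.
    sorter = TopologicalSorter()
    for mention in entities:
        sorter.add(mention)
    for head, rel, tail, _ in relations:
        if rel in causal_types:
            sorter.add(tail, head)  # the cause must precede its effect
    order = list(sorter.static_order())
    rank = {mention: i for i, mention in enumerate(order)}

    lines = []
    # Step 1: entity description fragments from a fixed template.
    for mention in order:
        attrs = entities[mention]
        severity = attrs.get("severity", "unspecified")
        lines.append(
            f"[Entity] {mention} (type: {attrs['type']}, severity: {severity})"
        )
    # Step 2: "subject-clinical relationship-object" triples with metadata.
    for head, rel, tail, meta in sorted(
        relations, key=lambda r: (rank[r[0]], rank[r[2]])
    ):
        notes = ", ".join(f"{k}={v}" for k, v in meta.items())
        suffix = f" ({notes})" if notes else ""
        lines.append(f"[Relation] {head} -{rel}-> {tail}{suffix}")
    return "\n".join(lines)


# Illustrative, made-up graph (not patient data).
example_entities = {
    "hypotension": {"type": "symptom", "severity": "moderate"},
    "sepsis": {"type": "disease", "severity": "severe"},
    "lactate elevation": {"type": "physiological_indicator"},
}
example_relations = [
    ("hypotension", "cause", "lactate elevation", {}),
    ("sepsis", "cause", "hypotension", {"strength": 0.9}),
]
graph_text = serialize_graph(example_entities, example_relations)
```

In this sketch the entity fragments come out with causes before their effects regardless of the order in which they were supplied. A fuller version would also need to fold chronological order into the sort, as the description above requires.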
[0049] Step S5: Perform joint diagnostic reasoning.
[0050] This step, the final stage of the diagnostic process, integrates the results of the preceding stages and uses the reasoning capabilities of the large visual-language model to output a comprehensive medical diagnostic result. The system inputs the global graph representation generated in the previous step, along with the original multimodal data, into the collaborative agent. This agent, acting as the central coordinator of the multi-agent system, coordinates and schedules the information resources and invokes the large visual-language model to perform graph-enhanced joint reasoning. With its cross-modal understanding and medical knowledge, the large visual-language model can process visual image information and textual semantic information simultaneously, fusing and jointly analyzing evidence from multiple sources.
[0051] When the collaborative agent executes the reasoning process, it constructs a multi-dimensional input context by concatenating, in order, the task instruction, medical images, structured electronic health records, clinical text records, the serialized knowledge graph, and additional contextual information to form a complete reasoning input. After receiving this input, the large visual-language model aggregates cross-modal information and weighs the clinical evidence through its internal attention mechanism and reasoning module, ultimately generating a diagnostic conclusion. The reasoning process is shown in formula (6):
[0052] In formula (6), the output is the final medical diagnosis result, produced by the reasoning process of the large visual-language model from six inputs: the task instruction, the medical images, the structured electronic health record data, the clinical record text, the structured text of the serialized dynamic medical knowledge graph, and the additional context.
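The assembly of the reasoning input can be sketched in Python as below. This is a minimal illustration and makes several assumptions: `ReasoningInput`, `build_context`, and `diagnose` are hypothetical names, the model is represented by an arbitrary callable rather than any real model API, and empty segments are simply omitted. Only the ordering of the six inputs is taken from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ReasoningInput:
    instruction: str
    images: Sequence[bytes]
    ehr: dict
    clinical_notes: str
    graph_text: str
    extra_context: str = ""


def build_context(inp):
    """Return (label, payload) segments in the order given in the
    description: instruction, images, EHR, notes, graph, extra context."""
    segments = [
        ("instruction", inp.instruction),
        ("images", list(inp.images)),
        ("ehr", "\n".join(f"{k}: {v}" for k, v in inp.ehr.items())),
        ("clinical_notes", inp.clinical_notes),
        ("knowledge_graph", inp.graph_text),
        ("context", inp.extra_context),
    ]
    # Assumption: segments with no content are dropped rather than sent empty.
    return [(label, payload) for label, payload in segments if payload]


def diagnose(inp, model: Callable):
    """Hand the assembled context to a caller-supplied model callable."""
    return model(build_context(inp))


# Stub standing in for the large visual-language model; it only reports
# which segments it received.
def stub_model(context):
    return "received: " + ", ".join(label for label, _ in context)


# Illustrative, made-up input (not patient data).
result = diagnose(
    ReasoningInput(
        instruction="Assess the patient's current condition.",
        images=[b"<image bytes>"],
        ehr={"heart_rate": 118, "lactate_mmol_per_l": 4.2},
        clinical_notes="Fever and hypotension since admission.",
        graph_text="[Relation] sepsis -cause-> hypotension",
    ),
    stub_model,
)
```

Keeping each segment labeled, rather than concatenating everything into one string, leaves the collaborative agent free to route image payloads and text payloads to the model in whatever form the chosen model interface expects.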
[0053] Through the aforementioned joint reasoning mechanism, the system can fully leverage the complementary advantages of multimodal data to output accurate, comprehensive, and interpretable medical diagnostic results, providing strong support for clinical decision-making in the intensive care unit.
[0054] Embodiment 2
[0055] This embodiment provides a multi-agent, multimodal collaborative device for diagnosing the condition of patients in the intensive care unit. The device can be implemented through a combination of hardware and software. It includes multiple functional modules that interact with each other via data to complete the auxiliary diagnostic process. This device can be deployed on a server, a hospital information system terminal, or other electronic devices with computing capabilities.
[0056] This application also provides apparatus embodiments corresponding to the above embodiments, used to implement the above method steps. The functions of each unit or module in the apparatus are the same as those in the corresponding method embodiments, and the same technical effects can be achieved, which will not be repeated here.
[0057] As shown in Figure 2, this application provides a multi-agent, multimodal collaborative device 200 for diagnosing the condition of patients in the intensive care unit, comprising: a receiving unit 201, used to receive multimodal critical care medical datasets acquired by the intensive care unit monitoring system; an adaptive perception unit 202, used to identify data types through a modality detection agent and route them to the corresponding domain expert agents to extract multimodal clinical features; a dynamic knowledge graph construction unit 203, used to construct a patient-specific dynamic medical knowledge graph based on the clinical features and a predefined pattern; a graph representation extraction unit 204, used to map the dynamic medical knowledge graph into a structured prompt sequence that can be understood by a large language model; and a collaborative reasoning unit 205, used to combine the serialized dynamic medical knowledge graph with the original data, input them into the large visual-language model, and output the final diagnostic result.
[0058] Figure 3 is a block diagram of an electronic device 300 for multi-agent, multimodal collaborative diagnosis of the condition of patients in an intensive care unit, according to an exemplary embodiment.
[0059] As shown in Figure 3, one embodiment of this application provides an electronic device 300. The electronic device 300 includes a memory 301, a processor 302, and an input/output (I/O) interface 303. The memory 301 is used to store instructions. The processor 302 is used to execute the methods of the embodiments of this application by calling the instructions stored in the memory 301. The processor 302 is connected to both the memory 301 and the I/O interface 303, for example, via a bus system and/or other forms of connection mechanisms (not shown). The memory 301 can be used to store programs and data, including the program for the multi-agent multimodal collaborative diagnostic method involved in the embodiments of this application. The processor 302 executes the functional applications and data processing of the electronic device 300 by running the program stored in the memory 301.
[0060] In this embodiment, the processor 302 needs to support high-performance inference with the large visual-language model. It can be implemented in at least one hardware form selected from a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA). The processor 302 can be a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), or another processing unit with data processing and/or instruction execution capabilities, or a combination of several of these. Considering the real-time processing requirements of multimodal data and the computational cost of large visual-language model inference, the processor 302 is preferably a high-performance GPU or a dedicated AI acceleration chip, so as to meet the strict requirements on diagnostic response time in intensive care unit scenarios.
[0061] The memory 301 in this embodiment may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and / or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), flash memory, hard disk drive (HDD), or solid-state drive (SSD). Considering the parameter scale of the large visual language model and the storage requirements of multimodal medical data, the memory 301 should have sufficient storage capacity to store model weight parameters, patient historical medical data, dynamic medical knowledge graphs, and intermediate calculation results.
[0062] In this embodiment, the I / O interface 303 is used to interface with the central monitoring station of the intensive care unit and related medical information systems. It can receive input instructions (such as numerical or character information, and key signal inputs related to user settings and function control of the electronic device 300), and can also output various information (such as diagnostic results, risk warning information, and visualized images). Specifically, the I / O interface 303 interfaces with the intensive care information system through a standardized data transmission protocol, receiving multimodal clinical medical data in real time, including bedside medical images, clinical record text, and vital sign time series. Simultaneously, it outputs decision support information generated by the system, such as diagnostic results, condition assessment reports, and treatment recommendations, to the medical workstation. In this embodiment, the I / O interface 303 may include one or more of a physical keyboard, function keys, a mouse, a touch panel, a display screen, and a network communication interface to support the interactive operation and information retrieval needs of clinical medical staff.
[0063] Embodiment 3
[0064] This embodiment provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, perform the method described in Embodiment 1.
[0065] In some embodiments, this application provides a computer program product comprising a computer program that, when executed by a processor, performs the method described in Embodiment 1.
[0066] Although the operations are described in a specific order in the accompanying drawings, this should not be construed as requiring these operations to be performed in the specific order or serial order shown, or requiring all of the operations shown to obtain the desired result. In certain environments, multitasking and parallel processing may be advantageous.
[0067] The methods and apparatus of this application can be implemented using standard programming techniques, utilizing rule-based logic or other logic to implement various method steps. It should also be noted that the terms "apparatus" and "module" as used herein and in the claims are intended to include implementations using one or more lines of software code and / or hardware implementations and / or devices for receiving input.
[0068] Any step, operation, or procedure described herein may be performed or implemented using one or more hardware or software modules, either alone or in combination with other devices. In one embodiment, the software module is implemented using a computer program product comprising a computer-readable medium containing computer program code, which is executable by a computer processor to perform any or all of the described steps, operations, or procedures.
[0069] The foregoing description of implementations of this application has been provided for illustrative and descriptive purposes. The foregoing description is not exhaustive and is not intended to limit this application to the exact forms disclosed. Various modifications and variations may exist in accordance with the foregoing teachings, or may arise from practice of this application. These embodiments were chosen and described to illustrate the principles of this application and its practical application, enabling those skilled in the art to utilize this application in various implementations and modifications to suit the specific purpose of the concept.
[0070] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the field of this application that are not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this application are indicated by the following claims.
[0071] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.
[0072] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
Claims
1. A multi-agent, multimodal collaborative method for diagnosing the condition of patients in the intensive care unit, characterized in that it comprises: S1. Receiving a multimodal clinical medical dataset obtained from an intensive care information system, the dataset including bedside medical images, intensive care clinical record text, and continuously monitored vital sign time series data; S2. Extracting multimodal clinical features: a modality detection agent identifies the file type of the dataset and performs multi-way routing, and the corresponding domain expert agents respectively extract the clinical features from the multimodal medical dataset; S3. Constructing a dynamic medical knowledge graph: the clinical features are input into a knowledge graph agent, and, under the constraints of a predefined type classification and a control relation pattern library, a patient-specific dynamic medical knowledge graph is constructed through a two-stage generation process; S4. Extracting a global graph representation: the dynamic medical knowledge graph is serialized into structured text and fused with the original multimodal medical dataset to generate a global graph representation; S5. Performing joint diagnostic reasoning: the global graph representation is input into a collaborative agent, and a large visual-language model is used to perform graph-enhanced joint reasoning to obtain medical diagnostic results.
2. The method according to claim 1, characterized in that step S2, extracting multimodal clinical features, includes: S21. Classifying each data file in the intensive care unit patient data warehouse into one of the preset modality types through feature signature analysis, wherein the modality types include at least images, text, and time series; S22. If image data is determined to exist, activating the image agent to extract visual finding features; S23. If text data is determined to exist, activating the text agent to parse the symptoms, medical history, and diagnostic features in the clinical record; S24. If time series data is determined to exist, activating the time series agent to analyze the abnormal pattern features in the time series.
3. The method according to claim 1, characterized in that step S3, constructing a dynamic medical knowledge graph, includes: S31. Performing entity extraction: under the constraints of the type classification, clinically relevant entities are identified as shown in Equation (1), in which the result is the set of entities, each indexed element comprising an entity mention, a semantic type selected from the predefined type classification, and the type-specific attributes captured; S32. Performing relational reasoning: under the constraints of the control relation pattern library, the clinical relationship between entity pairs is inferred as shown in Equation (2), in which the result is the set of relations, each element comprising a relation type selected from the predefined control relation pattern library and metadata containing relation strength and clinical significance.
4. The method according to claim 1, characterized in that step S4, extracting the global graph representation, includes: S41. Performing attribute extraction on the entity set in the dynamic medical knowledge graph to generate entity description fragments that include entity mention, semantic type, and severity attributes; S42. Based on the relation set, transforming the logical relationships between entities into structured "subject-clinical relationship-object" text, supplemented with metadata containing clinical significance; S43. Performing topological sorting of the entity description fragments and structured text according to the order of pathological evolution to generate the global graph representation.
5. The method according to claim 1, characterized in that step S5, performing joint diagnostic reasoning, includes: the collaborative agent invokes the large visual-language model to execute the reasoning process as shown in formula (3), in which the output is the final medical diagnosis result, produced by the reasoning process of the large visual-language model from the task instruction, the medical images, the structured electronic health record data, the clinical record text, the structured text of the serialized dynamic medical knowledge graph, and the additional context.
6. A multi-agent, multimodal collaborative device for diagnosing the condition of patients in the intensive care unit, characterized in that it comprises: a receiving unit, used to receive multimodal critical care medical datasets acquired by the intensive care unit monitoring system; an adaptive perception unit, used to identify data types through a modality detection agent and route them to the corresponding domain expert agents to extract multimodal clinical features; a dynamic knowledge graph construction unit, used to construct a patient-specific dynamic medical knowledge graph based on the clinical features and a predefined pattern; a graph representation extraction unit, used to map the dynamic medical knowledge graph into a structured prompt sequence that can be understood by a large language model; and a collaborative reasoning unit, used to combine the serialized dynamic medical knowledge graph with the original data, input them into the large visual-language model, and output the final diagnostic result.
7. A multi-agent, multimodal collaborative electronic device for diagnosing the condition of patients in intensive care units, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method according to any one of claims 1 to 5.
8. A computer-readable storage medium having a computer program stored thereon, the computer program being executed by a processor to implement the method of any one of claims 1 to 5.