Knowledge-guided fine-tuning and optimization method for large medical model, and related apparatus
By constructing a multimodal-multi-level connection graph and explicit reasoning rules, and combining expert knowledge, the problems of insufficient diagnosis and poor interpretability of large medical models under single-modal data are solved. This achieves more efficient multimodal data integration and explicit logical reasoning, thereby improving the diagnostic accuracy and reliability of the model.
Patent Information
- Authority / Receiving Office: WO · WO
- Patent Type: Applications
- Current Assignee / Owner: SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI
- Filing Date: 2024-12-28
- Publication Date: 2026-07-02
AI Technical Summary
Existing large medical models rely on single-modal data when diagnosing complex diseases, failing to fully integrate multimodal data. This results in incomplete and inaccurate diagnostic results, and the lack of clear logical reasoning paths leads to poor interpretability, making it difficult for doctors and patients to trust the models.
A knowledge-guided approach is adopted, which constructs a multimodal-multi-level connection graph through a multimodal-multi-level connection graph fusion module, an expert consensus-driven module, an analogy deduction module, and an explicit inference rule calculation module. Combining expert knowledge and explicit inference rules, logical chain hints are generated to enhance the interpretability and diagnostic accuracy of the model.
It achieves information complementarity from multimodal data, improves the diagnostic accuracy and interpretability of medical large models in the diagnosis of neuropsychiatric diseases, ensures the logic and credibility of model output, solves the "black box" problem, and improves the accuracy of early diagnosis.
Description
A knowledge-guided fine-tuning and optimization method for large medical models and related apparatus
Technical Field
[0001] This application pertains to methods for fine-tuning and optimizing large medical models, and specifically to a knowledge-guided method and related apparatus for fine-tuning and optimizing a large medical model.
Background Technology
[0002] With the accelerating aging of the global population, the number of patients with neuropsychiatric disorders has increased significantly, placing enormous pressure on society and the healthcare system. Diseases such as Alzheimer's disease, depression, and anxiety disorders are driving growing demand among the elderly and patients with neuropsychiatric disorders for efficient and precise medical services.
[0003] Against this backdrop, artificial intelligence technology, especially large medical models, has emerged, providing powerful tools and solutions for the medical field. By integrating multimodal data (such as MRI, PET, genomic data, and electronic medical records), large medical models can support the early diagnosis and precision treatment of complex diseases. However, most existing large medical models have three main problems: (1) They use only single-modal data and fail to fully integrate the advantages of multimodal data, resulting in incomplete diagnostic results or insufficient diagnostic accuracy. (2) Because they rely on complex deep learning architectures, they cannot provide clear logical reasoning paths, making it difficult for doctors and patients to trust their conclusions. (3) Because individual patients differ greatly, their accuracy in early diagnosis is still not ideal.
Summary of the Invention
[0004] This application addresses the technical problems of incomplete diagnostic results, low accuracy of diagnostic results, and unclear logical reasoning paths in current large-scale medical models by providing a knowledge-guided fine-tuning and optimization method and related apparatus for large-scale medical models.
[0005] To achieve the above objectives, this application adopts the following technical solution:
[0006] Firstly, this application proposes a knowledge-guided fine-tuning and optimization method for large medical models, including:
[0007] Acquire the multimodal data to be diagnosed;
[0008] Input the multimodal data to be diagnosed into the knowledge guidance model to obtain the thinking guidance information part of the logic chain prompt, which is then input into the large medical model to obtain an output standardized by the chain prompt;
[0009] The knowledge guidance model includes a multimodal-multi-level connection graph fusion module, an expert consensus-driven module, an analogy deduction module, an explicit inference rule calculation module, and a progressive thinking guidance module.
[0010] The multimodal-multi-level connection graph fusion module is used to obtain the overall embedding features as states based on the modal data graphs of each modality of the disease to be diagnosed.
[0011] The expert consensus-driven module is used to extract expert consensus and clinical experience from the expert knowledge base of the disease to be diagnosed, combine the state semantic labels corresponding to the overall embedding features to determine the state relationship labels between the overall embedding features, extract the embedding feature representation of the relationship from the state relationship labels, and then combine the overall embedding features, state semantic labels, state relationship labels and the embedding feature representation of the relationship to construct a state hierarchy graph. Finally, combined with the modal data graph of the modal data, a global multimodal-multi-level connection graph is obtained.
[0012] The analogy deduction module is used to construct an individual multimodal-multilevel connection graph based on multiple sets of triples through analogy deduction, according to the global multimodal-multilevel connection graph; the triples include two global embedding features and corresponding embedding feature representations;
[0013] The explicit inference rule calculation module is used to obtain the complete inference rules for all nodes based on each node of the individual multimodal-multi-level connection graph.
[0014] The progressive thinking guidance module is used to combine the modality types in the global multimodal-multi-level connection graph and the complete reasoning rules of all nodes to obtain the thinking guidance information part in the logic chain prompts.
[0015] Furthermore, obtaining the overall embedding features as states based on the modal data graph of each modality of the disease to be diagnosed includes:
[0016] In the feature aggregation layer of the graph neural network, the aggregation function Aggregate aggregates the neighborhood information of each node in the modal data graph of each modality of the disease to be diagnosed, forming a global representation of the node. The overall embedding features of the modal data graph of each modality are then extracted.
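The aggregation step in [0016] can be pictured with the following minimal sketch. It assumes mean aggregation for Aggregate, a self-loop on every node, and mean pooling over nodes as the graph-level readout; none of these choices, nor the names `aggregate_neighbors` and `graph_embedding`, are specified by the application.

```python
import numpy as np

def aggregate_neighbors(features, adjacency):
    """One aggregation round: each node averages itself and its neighbors."""
    adj = adjacency + np.eye(adjacency.shape[0])  # self-loop (assumption)
    degree = adj.sum(axis=1, keepdims=True)
    return adj @ features / degree

def graph_embedding(features, adjacency, rounds=2):
    """Run several aggregation rounds, then mean-pool nodes into one vector."""
    h = features
    for _ in range(rounds):
        h = aggregate_neighbors(h, adjacency)
    return h.mean(axis=0)  # overall embedding feature of this modal data graph

# Toy modal data graph: 3 nodes in a chain, 2-dimensional node features.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
embedding = graph_embedding(features, adjacency)
```

Running the same routine on the graph of each modality would give one embedding per modality, which the later modules treat as states.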
[0017] Furthermore, the construction of an individual multimodal-multilevel connection graph based on multiple sets of triples through analogy includes:
[0018] Based on the global multimodal-multilevel connection graph, obtain the example triples;
[0019] The triple to be predicted and the example triples are passed through the analogy feature embedding layer for embedding representation learning. The cross-modal adaptive interaction layer then learns the feature mapping between modalities to obtain the embedding representation corresponding to each modality.
[0020] The multimodal information fusion layer integrates the embedding representations corresponding to each modality; the inference consistency normalization layer and the prediction output layer then yield the predicted triples together with their feature representations.
[0021] Based on the predicted triples and their feature representations, an individual multimodal-multilevel connection graph is constructed.
[0022] Furthermore, the step of obtaining the complete inference rules for all nodes based on each node of the individual multimodal-multi-level connection graph includes:
[0023] Each node of the individual multimodal-multi-level connection graph is sequentially input into the policy function for inference rule calculation;
[0024] The policy function includes a state encoding layer, multiple feature extraction layers, and a policy selection layer connected in sequence. An activation layer is set before each of the state encoding layer, multiple feature extraction layers, and policy selection layer. The output of the policy selection layer also includes normalization processing.
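The layer ordering described in [0024] might look like the sketch below. It assumes ReLU as the activation, softmax as the output normalization, dense layers without biases, and random initial weights; the class name `PolicyFunction` and all dimensions are illustrative, not taken from the application.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = np.exp(x - np.max(x))  # shift for numerical stability
    return z / z.sum()

class PolicyFunction:
    """State encoding layer -> feature extraction layers -> policy selection layer."""

    def __init__(self, state_dim, hidden_dim, num_actions, num_feature_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        dims = [state_dim] + [hidden_dim] * (num_feature_layers + 1) + [num_actions]
        self.weights = [rng.normal(0.0, 0.1, size=(d_in, d_out))
                        for d_in, d_out in zip(dims[:-1], dims[1:])]

    def __call__(self, state):
        h = np.asarray(state, dtype=float)
        for w in self.weights:
            h = relu(h) @ w  # an activation precedes every layer, as in [0024]
        return softmax(h)    # normalized distribution over candidate next steps

policy = PolicyFunction(state_dim=4, hidden_dim=8, num_actions=3)
probs = policy([0.2, 0.1, 0.7, 0.4])
```

Each entry of `probs` can be read as the probability of following one outgoing relation from the current node.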
[0025] Furthermore, combining the modality types in the global multimodal-multi-level connection graph and the complete inference rules of all nodes to obtain the thinking guidance information part of the logic chain prompt includes:
[0026] Representative data from different modalities are extracted from the state hierarchy graph of the global multimodal-multi-level connection graph to construct a question-and-answer example library with progressive reasoning for different combinations of modal data. Then, based on the modality types in the complete inference rules of all nodes, the corresponding question-and-answer examples are extracted from the library as the thinking guidance information part of the logic chain prompt.
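One way to picture the lookup in [0026] is a library keyed by the combination of modalities, as in the sketch below. The key structure, the function name `select_guidance`, the modality names, and the placeholder example texts are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

ExampleLibrary = Dict[FrozenSet[str], List[str]]

def select_guidance(library: ExampleLibrary, rule_modalities: Iterable[str]) -> List[str]:
    """Return the Q&A examples stored for exactly this combination of modalities."""
    return library.get(frozenset(rule_modalities), [])

# Placeholder entries; a real library would hold progressive-reasoning Q&A pairs.
library: ExampleLibrary = {
    frozenset({"MRI", "EEG"}): ["Q: <question over MRI + EEG> A: <step-by-step answer>"],
    frozenset({"MRI"}): ["Q: <question over MRI only> A: <step-by-step answer>"],
}

guidance = select_guidance(library, ["EEG", "MRI"])
```

Using a `frozenset` key makes the lookup independent of the order in which the modality types appear in the inference rules.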
[0027] Furthermore, the loss function used during the training of the analogy deduction module is:
[0028]
[0029] where the symbols denote, in the order listed: the relaxation loss function; the total number of triples in the training set; the sine similarity function; the hidden feature representation of the example triple; the hidden feature representation of the predicted triple; the maximum function; and the hidden feature representations of the head entity and the tail entity in the predicted pair, respectively.
[0030] Furthermore, the reward function used during training of the explicit inference rule calculation module is:
[0031]
[0032]
[0033]
[0034]
[0035] where the symbols denote, in the order listed: the total reward function; the global objective reward; the path efficiency reward; the path diversity reward; the last node in the individual multimodal-multilevel connection graph; the path length; the set of paths already explored; the cosine similarity function; and three coefficients;
[0036] The Monte Carlo policy gradient method is used to update the policy function parameters during the training of the explicit inference rule calculation module.
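The reward formulas themselves are not reproduced in this text, so the sketch below only illustrates how the listed components could fit together. The functional form of each reward term, the weighted sum, and the coefficient values are assumptions modeled on common path-finding reward designs; only the component names, the use of cosine similarity, and the Monte Carlo policy gradient come from the description.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def total_reward(path, path_vec, last_node, explored_vecs, coef=(1.0, 0.5, 0.5)):
    """Weighted sum of global, efficiency, and diversity rewards (assumed form)."""
    r_global = 1.0 if path[-1] == last_node else -1.0  # did the path reach the last node?
    r_efficiency = 1.0 / len(path)                     # shorter paths score higher
    # Penalize similarity to already-explored paths.
    r_diversity = -np.mean([cosine(path_vec, v) for v in explored_vecs]) if explored_vecs else 0.0
    a, b, c = coef
    return a * r_global + b * r_efficiency + c * r_diversity

def reinforce_loss(log_probs, reward):
    """Monte Carlo policy gradient surrogate: -R * sum_t log pi(a_t | s_t)."""
    return -reward * float(np.sum(log_probs))

path = ["n0", "n1", "n2"]
reward = total_reward(path, np.array([1.0, 0.0]), "n2", [np.array([0.0, 1.0])])
loss = reinforce_loss(np.log([0.5, 0.5]), 1.0)
```

Minimizing `reinforce_loss` with respect to the policy parameters raises the probability of action sequences that earned a high total reward.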
[0037] Furthermore, the loss function used during the training of the knowledge-guided model is:
[0038]
[0039] where the symbols denote, in the order listed: the overall loss function during training of the knowledge guidance model; the path optimization loss, that is, the loss involved in updating the policy function parameters using the Monte Carlo policy gradient method; and the semantic loss between the reasoning process in the thinking guidance information part of the logic chain prompt and its label, calculated as:
[0040]
[0041] where the symbols denote, in the order listed: the reasoning process label; each embedding representation produced by passing the label through the semantic encoding model; each embedding representation produced by passing the reasoning process through the semantic encoding model; and the reasoning process of the thinking guidance information part of the logic chain prompt.
[0042] Secondly, this application proposes a knowledge-guided medical large-scale model fine-tuning and optimization system, including:
[0043] The data acquisition module is used to acquire the multimodal data to be diagnosed.
[0044] The prompt output module is used to input the multimodal data to be diagnosed into the knowledge guidance model to obtain the thinking guidance information part of the logic chain prompt, which is then input into the large medical model to obtain an output standardized by the chain prompt;
[0045] The knowledge guidance model includes a multimodal-multi-level connection graph fusion module, an expert consensus-driven module, an analogy deduction module, an explicit inference rule calculation module, and a progressive thinking guidance module.
[0046] The multimodal-multi-level connection graph fusion module is used to obtain the overall embedding features as states based on the modal data graphs of each modality of the disease to be diagnosed.
[0047] The expert consensus-driven module is used to extract expert consensus and clinical experience from the expert knowledge base of the disease to be diagnosed, combine the state semantic labels corresponding to the overall embedding features to determine the state relationship labels between the overall embedding features, extract the embedding feature representation of the relationship from the state relationship labels, and then combine the overall embedding features, state semantic labels, state relationship labels and the embedding feature representation of the relationship to construct a state hierarchy graph. Finally, combined with the modal data graph of the modal data, a global multimodal-multi-level connection graph is obtained.
[0048] The analogy deduction module is used to construct an individual multimodal-multilevel connection graph based on multiple sets of triples through analogy deduction, according to the global multimodal-multilevel connection graph; the triples include two global embedding features and corresponding embedding feature representations;
[0049] The explicit inference rule calculation module is used to obtain the complete inference rules for all nodes based on each node of the individual multimodal-multi-level connection graph.
[0050] The progressive thinking guidance module is used to combine the modality types in the global multimodal-multi-level connection graph and the complete reasoning rules of all nodes to obtain the thinking guidance information part in the logic chain prompts.
[0051] Thirdly, this application proposes a computer program product, including a computer program which, when executed by a processor, implements the aforementioned knowledge-guided fine-tuning and optimization method for large medical models.
[0052] Compared with the prior art, this application has the following beneficial effects:
[0053] This application proposes a knowledge-guided fine-tuning and optimization method for large medical models. A knowledge guidance model produces the thinking guidance information part of the logic chain prompt, which is input into the large medical model; the resulting output, standardized by the chain prompt, is then used to fine-tune and optimize the large medical model. The knowledge guidance model includes a multimodal-multi-level connection graph fusion module, an expert consensus-driven module, an analogy deduction module, an explicit inference rule calculation module, and a progressive thinking guidance module. The multimodal-multi-level connection graph fusion module transforms medical data from different modalities into graph-structured data, extracts node features from the different modalities, and achieves multimodal information complementarity. The expert consensus-driven module extracts medical consensus from the expert knowledge base to generate semantic labels for the states of different modalities and labels for the relationships between them. The analogy deduction module performs feature mapping and fusion between modalities, supports analogy deduction over multimodal data, and improves reasoning ability and personalized diagnostic accuracy. The explicit inference rule calculation module extracts explicit inference rules for diseases and supplies them as logic chain prompts, giving the large medical model explicit logical support for its reasoning and enhancing its interpretability. The progressive thinking guidance module uses chain prompting to enable few-shot (sparse-sample) learning in the large medical model and to guide it to reason step by step, ultimately improving its diagnostic accuracy.
Attached Figure Description
[0054] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0055] Figure 1 is a flowchart illustrating a knowledge-guided medical large-scale model fine-tuning optimization method according to this application;
[0056] Figure 2 is a schematic diagram of the second process of the knowledge-guided medical large-scale model fine-tuning and optimization method of this application.
[0057] Figure 3 is a schematic diagram of the principle corresponding to the second flowchart of the knowledge-guided medical large model fine-tuning and optimization method of this application.
[0058] Figure 4 is a schematic diagram of the principle of the analogy deduction module based on the multimodal-multi-level connection graph in the embodiment of this application;
[0059] Figure 5 is a schematic diagram of the explicit inference rule calculation module based on the multimodal-multi-level connection graph in an embodiment of this application;
[0060] Figure 6 is a schematic diagram of the knowledge-guided medical large model fine-tuning and optimization system of this application.
Embodiments of the present invention
[0061] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. The components of the embodiments of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0062] Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.
[0063] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.
[0064] With the accelerating aging of the global population, the number of patients with neuropsychiatric disorders has increased significantly, placing enormous pressure on society and the healthcare system. Diseases such as Alzheimer's disease, depression, and anxiety disorders are driving growing demand among the elderly and patients with neuropsychiatric disorders for efficient and precise medical services. This increase in the number of patients not only affects public health but also exacerbates the strain on social medical resources.
[0065] Against this backdrop, the rise of artificial intelligence technology, particularly large-scale medical models, has provided powerful tools and solutions for the medical field. By integrating multimodal data (such as MRI, PET, genomic data, and electronic medical records), large-scale medical models can support the early diagnosis and precision treatment of complex diseases. Compared to traditional diagnostic methods based on a single data modality, the fusion of multimodal data can more comprehensively reflect the complex mechanisms of diseases and improve diagnostic accuracy.
[0066] However, large-scale medical models face a significant challenge in application: the "black box" problem. Their decision-making processes typically rely on complex deep learning architectures and a large number of parameters, based on data-driven implicit learning patterns rather than explicit rules. While these models excel at handling complex data and predictive tasks, their decision-making processes lack credibility and cannot provide clear explanations for doctors and patients. This makes the model's reasoning process difficult to understand, limiting its clinical application.
[0067] In the medical field, interpretability is crucial for model application. Large medical models must be interpretable so that doctors can rely on the model's diagnostic results and make clinical decisions based on the information it provides. Interpretability not only helps ensure the diagnostic safety and reliability of the model but also helps medical institutions improve patient treatment outcomes through the model's output. Especially in the diagnosis and treatment of neuropsychiatric diseases, due to the complexity of the diseases themselves and the variability among patients, models lacking interpretability may lead to inaccurate diagnoses and even medical risks. While existing interpretability methods (such as LIME and SHAP) can provide some explanations, these methods are generally suitable for relatively simple models. When faced with the complex nonlinear relationships and high-dimensional data of large medical models, they can only provide local or fragmented explanations, struggling to handle the complex logic and large-scale data processing involved in large medical models. Therefore, existing interpretability techniques cannot meet the needs of large medical models.
[0068] Therefore, current large-scale medical models mainly suffer from the following problems:
[0069] Current large-scale medical models rely on single-modality data when diagnosing complex diseases. However, single-modality data often fails to fully reflect the multidimensional characteristics of diseases, limiting diagnostic accuracy. Multimodal data (such as brain imaging, EEG, genomics, and electronic medical records) can provide complementary information sources; however, most current large-scale medical models struggle to effectively integrate these diverse data sources, resulting in incomplete diagnoses. Furthermore, current large-scale medical models are highly complex in structure, with a large number of parameters, relying on nonlinear transformations and multi-layered networks to process and transmit high-dimensional data. This implicit learning mechanism makes the model's reasoning and decision-making processes difficult to understand, creating a "black box" problem that severely impacts the interpretability of large-scale medical models and limits their application in clinical medicine. While existing interpretability techniques (such as LIME and SHAP) can provide partial explanations for simpler models, they can only offer fragmented explanations when dealing with highly complex nonlinear relationships and multidimensional data, failing to cover the model's overall decision-making logic and thus unable to provide sufficient interpretive support when dealing with complex large-scale medical models. Furthermore, while existing large medical models possess a certain diagnostic capability when dealing with complex diseases, their diagnostic accuracy remains unsatisfactory due to a lack of deep understanding of the disease domain and effective reasoning mechanisms.
[0070] The relevant specific studies mainly include:
[0071] Chinese invention patent CN118802369A proposes a collaborative enhancement method for APT knowledge graphs and large language models. Specifically, the method includes the following steps: S1, constructing an APT knowledge graph focused on the cybersecurity field; S2, designing chained prompts based on user input and passing them sequentially to the large language model for step-by-step answering, with the large language model using the knowledge provided by the APT knowledge graph to query and locate the question; S3, combining the located APT knowledge nodes with the content of the user's question, and enriching the context-aware prompts with strongly relevant nodes from subgraph retrieval; S4, the large language model generating a comprehensive answer for the user based on the context-aware prompts. That patent enhances the semantic understanding, reasoning, and prediction capabilities of large language models in APT scenarios, so that the large language model can better analyze and infer attacker behavior and attack paths, improving accuracy in complex network environments and strengthening defense against and response to APT attacks. In addition, Chinese patent application CN118674056A proposes an intelligent solution method, system, device, and medium for multi-step reasoning problems. Specifically, it designs a multi-step reasoning problem-solving framework that generates explanation steps one at a time, which serve as a multi-step chain of thought to guide model reasoning. This strengthens multi-step reasoning and has practical value for achieving strong artificial intelligence. That application also establishes a knowledge fusion mechanism and a relevance evaluation mechanism within the framework, which help improve the logical soundness and factual accuracy of the generated reasoning process and thereby the accuracy of problem-solving. The external knowledge introduced by the knowledge fusion mechanism can provide additional information for many question-answering machine learning tasks, and the relevance evaluation mechanism's assessment of the problem-solving steps can help educational platforms evaluate and give feedback on the solution process, enabling more personalized online tutoring services.
[0072] However, current research has the following problems:
[0073] Most existing medical models utilize only single-modal data (such as brain imaging, EEG data, genetic data, or electronic medical records), failing to fully integrate the advantages of multimodal data. Single-modal data struggles to comprehensively capture the multidimensional characteristics of complex diseases, especially diseases like Alzheimer's, which often require multimodal data to fully reveal the disease's biomarkers and developmental mechanisms. The use of single-modal data leads to incomplete diagnostic results and insufficient diagnostic accuracy. While existing large-scale medical models perform well with complex data, their reliance on complex deep learning architectures prevents them from providing explicit logical reasoning paths, making their conclusions difficult for doctors and patients to trust. Particularly in disease diagnosis, doctors cannot clearly understand the model's reasoning process through its decision-making process, affecting the credibility and application of diagnostic results. Large-scale medical models fail to provide sufficient explicit reasoning when facing nonlinear, multidimensional, and complex data; the "black box" nature of the models remains a significant problem. Due to the complex pathogenesis of neuropsychiatric diseases (such as Alzheimer's and depression) and significant individual patient variability, the accuracy of existing models in early diagnosis remains unsatisfactory. Current methods often rely on limited pathological data and markers, failing to provide highly accurate diagnoses in the early stages of disease. Furthermore, existing models fail to fully leverage expert knowledge and medical consensus to enhance diagnostic accuracy, lacking a systematic explicit reasoning mechanism that results in diagnostic outcomes without logical explanation and support.
[0074] To address the shortcomings of the existing technologies, this application proposes a knowledge-guided fine-tuning and optimization method and related apparatus for large-scale medical models. It constructs a method for enhancing the interpretability of large-scale medical models based on "multimodal data + expert knowledge," significantly improving the diagnostic accuracy and interpretability of the model in disease domains. The following detailed description, in conjunction with embodiments and accompanying drawings, further illustrates this application.
[0075] Figure 1 shows a flowchart of a knowledge-guided medical large-scale model fine-tuning optimization method according to this application, which may include:
[0076] S101, acquire the multimodal data to be diagnosed.
[0077] S102, input the multimodal data to be diagnosed into the knowledge guidance model to obtain the thinking guidance information part of the logic chain prompt, which is then input into the large medical model to obtain an output standardized by the chain prompt.
[0078] The knowledge guidance model includes a multimodal-multi-level connection graph fusion module, an expert consensus-driven module, an analogy deduction module, an explicit inference rule calculation module, and a progressive thinking guidance module.
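Steps S101 and S102 can be summarized in the sketch below, which assembles a chain prompt from inference rules and worked examples and passes it to a model. The prompt layout, the function names `build_chain_prompt` and `diagnose`, and the stand-in `echo_model` are hypothetical; the application does not prescribe a prompt format or a model interface.

```python
from typing import Callable, List

def build_chain_prompt(case_summary: str, rules: List[str], examples: List[str]) -> str:
    """Assemble the thinking guidance part and the case into one prompt string."""
    parts = ["Worked examples:"] + examples
    parts += ["Inference rules:"] + [f"{i}. {rule}" for i, rule in enumerate(rules, 1)]
    parts += ["Case:", case_summary, "Reason step by step, citing the rules used."]
    return "\n".join(parts)

def diagnose(case_summary: str, rules: List[str], examples: List[str],
             model: Callable[[str], str]) -> str:
    """Send the chain prompt to the large medical model (any text-to-text callable)."""
    return model(build_chain_prompt(case_summary, rules, examples))

def echo_model(prompt: str) -> str:
    # Stand-in for the large medical model, used only to exercise the pipeline.
    return f"received {len(prompt.splitlines())} prompt lines"

prompt = build_chain_prompt("<multimodal case summary>", ["<rule A>", "<rule B>"], ["<Q&A example>"])
answer = diagnose("<multimodal case summary>", ["<rule A>", "<rule B>"], ["<Q&A example>"], echo_model)
```

Keeping the model behind a plain callable lets the same prompt-building code be reused during fine-tuning and at inference time.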
[0079] The multimodal-multi-level connection graph fusion module is used to obtain the overall embedding features as states based on the modal data graphs of each modality of the disease to be diagnosed. This module is the foundation of the entire optimization method and is responsible for integrating multimodal data from different medical examination methods (such as CT and MRI images, pathology reports, and physiological monitoring data). Each of these data sources is represented as a graph containing rich disease information. The fused graph retains the unique information of each modality and also captures the correlations and complementarities between modalities, providing a comprehensive description of the disease state for subsequent analysis. By integrating multimodal data sources, the module provides comprehensive disease features and overcomes the information loss and insufficient diagnostic accuracy caused by single-modal data. Furthermore, by constructing a multi-level connection graph, it reveals the complementary information between modalities and captures the multi-dimensional features of the disease, addressing the limitations of single-modal data.
[0080] The expert consensus-driven module extracts expert consensus and clinical experience from an expert knowledge base of the disease to be diagnosed. It then combines this with state semantic labels corresponding to the overall embedded features to determine state relationship labels between the overall embedded features. The module extracts the embedded feature representations of these relationships from the state relationship labels. Finally, it combines the overall embedded features, state semantic labels, state relationship labels, and the embedded feature representations of the relationships to construct a state hierarchy graph. This graph is then combined with the modal data graph of the modal data to obtain a global multimodal-multi-level connection graph. The expert consensus-driven module can rely on a large expert knowledge base containing consensus, clinical experience, and research findings from medical experts on various diseases. First, it extracts the overall embedded features from the multimodal data of the disease to be diagnosed. Then, it searches the knowledge base for relevant expert consensus and clinical experience. Through semantic matching and similarity calculation, it assigns accurate state semantic labels and state relationship labels to these features. These labels not only describe the state of the disease but also reveal the logical relationships between states. Finally, the module uses these labels and relationships to construct a state hierarchy graph, which reflects the possible paths and stages of disease development. The expert consensus-driven module introduces expert knowledge to ensure that the reasoning rules extracted during the model reasoning process are explicit.
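The semantic matching and similarity calculation mentioned in [0080] could be as simple as a nearest-label search, sketched below. The use of cosine similarity here, the function name `assign_state_label`, and the label names and vectors are assumptions for illustration.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_state_label(feature, label_embeddings):
    """Pick the knowledge-base label whose embedding is most similar to the feature."""
    return max(label_embeddings, key=lambda name: cosine(feature, label_embeddings[name]))

# Hypothetical label embeddings derived from an expert knowledge base.
label_embeddings = {
    "state_A": np.array([1.0, 0.0]),
    "state_B": np.array([0.0, 1.0]),
}
label = assign_state_label(np.array([0.9, 0.1]), label_embeddings)
```

The same search over relation embeddings would yield a state relationship label for a pair of overall embedding features.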
[0081] The analogy deduction module is used to construct an individual multimodal-multilevel connection graph based on multiple sets of triples through analogy deduction, according to the global multimodal-multilevel connection graph. Each triple includes two global embedding features and a corresponding embedding feature representation. Analogy deduction is a logical reasoning method that infers new triples based on known triples (i.e., two global embedding features and the relationship between them). Through multiple iterations and reasoning, a graph reflecting the disease state of an individual patient is gradually constructed. This graph not only contains specific information about the patient's condition but also reflects the personalized characteristics of disease development.
[0082] The explicit inference rule calculation module is used to obtain complete inference rules for all nodes based on the nodes of the individual multimodal-multi-level connection graph. The module generates complete inference rules based on the individualized multimodal-multi-level connection graph; these rules are expressed in the form of logical expressions, describing the causal relationships and evolutionary patterns between disease states. The module can extract all logical relationships by traversing the nodes and edges in the graph and transform them into executable inference rules. These rules provide clear guidance for subsequent decision-making. This design of an explicit inference rule calculation module based on a multimodal-multi-level connection graph extracts explicit inference rules from the connection graph and provides them to a large medical model, thereby enhancing the model's interpretability and solving the problem of low diagnostic reliability caused by the "black box" problem of large medical models. The analogy and deduction module and explicit inference rule calculation module based on the multimodal-multilevel connection graph mine the paths between different data modalities and diagnostic conclusions from the multimodal-multilevel connection graph, and extract the key explicit inference rules in the decision-making process, thus making up for the shortcomings of existing technologies in interpreting complex models.
[0083] The progressive thinking guidance module combines the modality types in the global multimodal-multi-level connection graph and the complete inference rules of all nodes to obtain the thinking guidance information in the logic chain prompts. This module is the final output of the optimization method, generating the thinking guidance information in the logic chain prompts by combining the global multimodal-multi-level connection graph and the complete inference rules of all nodes. This information is presented to doctors or users in a structured manner, guiding them to think and make decisions according to the logic chain. Designing this progressive thinking guidance module guides the medical model to gradually generate diagnostic results, enhancing the model's diagnostic accuracy in neuropsychiatric disease scenarios. It also ensures the model's output is logical and interpretable, significantly improving the accuracy of early diagnosis and helping to better utilize disease-related knowledge, thereby enhancing the accuracy and reliability of diagnosis.
[0084] This application proposes a logical chain hint construction method based on "multimodal data + expert knowledge". By constructing a connection graph and calculating the path of explicit reasoning rules, it further constructs hints for a large medical model, improving the diagnostic accuracy and interpretability of the diagnostic results. The multimodal-multi-level connection graph fusion module integrates graph-structured data from different modalities of medical data (such as brain imaging, EEG, genomics, electronic medical records, etc.) and aggregates and extracts node features from different modalities of neuropsychiatric disease data through a graph neural network, achieving multimodal information complementarity. The expert consensus-driven module extracts medical consensus from an expert knowledge base, generates semantic labels for different modal states and the relationships between them, and uses a natural language processing model to extract semantic embedding features of the relationships, providing semantic support for the connection graph of multimodal data. The mechanism combining expert knowledge with multimodal connection graphs and the mechanism combining modality-level graphs with state-level graphs addresses the challenge of how semantic embedding models can effectively process expert knowledge and organically integrate it with feature embeddings from multimodal data. This ensures that multidimensional features from different modalities can be integrated into a unified multimodal-multi-level connection graph. An analogy deduction module based on the multimodal-multi-level connection graph, through a cross-modal adaptive interaction layer design, achieves feature mapping and fusion between different modalities, supporting analogy deduction of multimodal data. 
This module can predict relationships between states and further construct individualized multimodal-multi-level connection graphs, improving reasoning ability and individualized diagnostic accuracy. An explicit derivation rule calculation module based on the multimodal-multi-level connection graph, using deep reinforcement learning technology, extracts explicit derivation rules for neuropsychiatric diseases from the state-level graph. It provides logical chain hints for logical reasoning rules, offering explicit logical support for reasoning in large medical models and enhancing their interpretability.
[0085] Existing large-scale medical models often rely on single-modality data, leading to incomplete understanding of diseases and insufficient diagnostic accuracy. This application proposes a multimodal-multi-level connection graph fusion module, which integrates multiple modalities and extracts and aggregates features through graph neural networks. Compared with existing single-modality or simple multimodal fusion techniques, this application can more comprehensively reflect the complex mechanisms of neuropsychiatric diseases, improving the accuracy and comprehensiveness of diagnosis. Existing large-scale medical models suffer from the "black box" problem, failing to clearly explain their decision-making processes, making it difficult for doctors to trust them in clinical practice. Existing interpretability methods have some application in simple models, but they can only provide fragmented explanations when dealing with complex nonlinear relationships and high-dimensional data. This application introduces an expert consensus-driven module, an analogy deduction module based on a multimodal-multi-level connection graph, and an explicit inference rule calculation module. By combining medical consensus from an expert knowledge base, it generates explicit reasoning rules between states and constructs explicit reasoning paths, significantly improving the interpretability of the model, enabling doctors to understand and trust the model's diagnostic results. Most existing medical models employ standard training methods and lack the ability to reason progressively for complex tasks, especially in the field of neuropsychiatric diseases. Due to the scarcity of data, models struggle to learn effectively through traditional methods. This application proposes a progressive thinking guidance module that combines the professional knowledge of clinicians to construct a question-and-answer example library with progressive reasoning. 
It then guides large medical models to learn from sparse samples through chained prompts. This technology enables the model to reason progressively with a limited amount of training data, improving its learning efficiency and diagnostic capabilities in the diagnosis of neuropsychiatric diseases.
[0086] Figure 2 shows a schematic diagram of the second flowchart of the knowledge-guided medical large-scale model fine-tuning and optimization method of this application, and Figure 3 shows a schematic diagram of the principle corresponding to the second flowchart of the knowledge-guided medical large-scale model fine-tuning and optimization method of this application. Specifically, it may include:
[0087] S201, Model Building
[0088] The model constructed in this application mainly comprises five modules: a multimodal-multi-level connection graph fusion module, an expert consensus-driven module, an analogical deduction module based on the multimodal-multi-level connection graph, an explicit derivation rule calculation module based on the multimodal-multi-level connection graph, and a progressive thinking guidance module. The specific construction methods for each module are as follows:
[0089] (1) Construct a multimodal-multi-level connection graph fusion module
[0090] Step a1: Obtain the modal data graph of a given modality and feed it, as input, into the input layer of the multimodal-multi-level connection graph fusion module.
[0091] It should be noted that in this application, "modality" specifically refers to a type of medical data; brain imaging, electroencephalogram (EEG) data, and electronic medical records are all types of modal data. Modal data can be represented as a graph structure, and the graph structure composed of nodes and edges forms the modal data graph.
[0092] Step a2: The feature aggregation layer based on the graph neural network uses an aggregation function to aggregate the neighborhood information of each node into a global representation of that node, and then extracts the overall embedding feature of the modal data graph (hereafter referred to as the "state").
[0093] Graph Neural Networks (GNNs) are a deep learning method specifically designed for processing graph data, enabling end-to-end learning and inference on graph-structured data. The core components of a GNN include graph convolutional layers, message passing mechanisms, and multilayer perceptrons. Graph convolutional layers update the features of nodes in the graph by aggregating features from neighboring nodes to the central node and applying linear transformations and nonlinear activation functions to update the central node's features. Message passing mechanisms allow information to be passed and aggregated between nodes, thereby capturing global graph structure information. Multilayer perceptrons further process and transform node features, improving the model's expressive power.
[0094] Step a3: The modal data graph from step a1 serves as the modality-level graph, and the overall embedding features obtained in step a2 serve as nodes of the state hierarchy graph. Both are output by this module and, together with the output of the expert consensus-driven module, constitute the multimodal-multi-level connection graph.
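The neighborhood aggregation and readout described in step a2 can be illustrated with a minimal sketch. It assumes mean aggregation with self-loops, a single shared weight matrix, a ReLU activation, and a mean readout over nodes; none of these choices is specified in this application, and the names `aggregate_neighbors` and `graph_state` are hypothetical.

```python
import numpy as np

def aggregate_neighbors(features, adjacency):
    """Average each node's features with those of its neighbors (self-loop included)."""
    adj = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    degree = adj.sum(axis=1, keepdims=True)        # neighborhood sizes
    return (adj @ features) / degree

def graph_state(features, adjacency, weight, rounds=2):
    """Run a few aggregation rounds, then pool node representations into one vector."""
    h = features
    for _ in range(rounds):
        # Linear transformation followed by a nonlinear activation (ReLU, assumed).
        h = np.maximum(aggregate_neighbors(h, adjacency) @ weight, 0.0)
    return h.mean(axis=0)                          # mean readout (assumed)

# Toy modal data graph: three nodes connected in a chain, four features per node.
rng = np.random.default_rng(0)
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)
features = rng.normal(size=(3, 4))
weight = rng.normal(size=(4, 4))
state = graph_state(features, adjacency, weight)   # overall embedding feature ("state")
```

The resulting vector plays the role of one state node; a trained graph neural network would learn `weight` rather than sample it.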
[0095] (2) Constructing an expert consensus-driven module
[0096] Step b1: Taking Alzheimer's disease as an example, extract expert consensus and clinical experience from the expert knowledge base for Alzheimer's disease. Then, using the state semantic labels corresponding to the overall embedding features obtained in step a2 (for example, in the structural imaging modality, the overall feature represented by an overall embedding feature could be "hippocampal atrophy," in which case the corresponding state semantic label is "hippocampal atrophy"), find the relationship labels that link the states. A relationship label can express medical logic such as "caused by pathological changes" or "symptoms are related to pathology." Expert knowledge is extracted in the form of textual descriptions.
[0097] Step b2: Use a semantic extraction model trained on a large-scale medical text dataset to extract the semantic features of the relationship labels; these semantic features serve as the embedding features of the relationships and are represented as follows:
[0098]
[0099] where the embedding feature of a relationship is hereafter referred to as the state relationship.
[0100] Step b3: Combine the overall embedding features from step a3, the state semantic labels and state relationship labels from step b1, and the state relationships obtained in step b2 to construct the state hierarchy graph. Combining this graph with the modal data graphs yields the global multimodal-multi-level connection graph.
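As an illustration of steps b1 to b3, the following sketch stores states, relationship labels, and relationship embeddings in one small structure. The `toy_text_embedding` function is only a hashed bag-of-words stand-in for the semantic extraction model, and `StateGraph` is a hypothetical container; pairing these two particular state labels with this relationship label is purely illustrative.

```python
import hashlib
from dataclasses import dataclass, field

import numpy as np

def toy_text_embedding(text, dim=8):
    """Stand-in for the semantic extraction model: hashed bag-of-words, L2-normalized."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

@dataclass
class StateGraph:
    states: dict = field(default_factory=dict)  # state semantic label -> overall embedding
    edges: list = field(default_factory=list)   # (head, relation label, relation embedding, tail)

    def add_state(self, label, embedding):
        self.states[label] = np.asarray(embedding, dtype=float)

    def add_relation(self, head, relation_label, tail):
        if head not in self.states or tail not in self.states:
            raise KeyError("both states must be added before they are related")
        self.edges.append((head, relation_label, toy_text_embedding(relation_label), tail))

    def triples(self):
        """Return (head, relation label, tail) triples without the embeddings."""
        return [(h, r, t) for h, r, _, t in self.edges]

graph = StateGraph()
graph.add_state("hippocampal atrophy", np.ones(4))               # placeholder embedding
graph.add_state("scale shows cognitive impairment", np.zeros(4)) # placeholder embedding
graph.add_relation("hippocampal atrophy",
                   "symptoms are related to pathology",
                   "scale shows cognitive impairment")
```

Keeping the text label next to its embedding makes it straightforward to turn a reasoning path back into readable text later on.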
[0101] (3) Construct an analogy deduction module based on multimodal-multilevel connection graph.
[0102] Figure 4 shows a schematic diagram of the principle of the analogy deduction module based on the multimodal-multi-level connection graph.
[0103] Step c1: When training the analogy deduction module based on the multimodal-multi-level connection graph, the input consists of a training triple, formed by a state pair and the relationship between them, and an example triple, likewise formed by a state pair and the relationship between them. In practical application (prediction), the input consists of the example triple and a triple to be predicted, formed by a state pair and the to-be-predicted relationship between them, which is masked as [MASK].
[0104] Step c2: This step is described using the prediction process as an example. The input triple to be predicted and the example triple are passed through an analogy feature embedding layer for embedding representation learning, and a cross-modal adaptive interaction layer learns the feature mappings between modalities, yielding the embedding representation corresponding to each modality.
[0105] Step c3: After the node embedding representations are generated, the multimodal data is integrated through a multimodal information fusion layer. An inference consistency normalization layer and a prediction output layer then produce the predicted triples and their feature representations.
[0106] Step c4: Based on the predicted triples obtained, combined with their feature representations, construct the individual multimodal-multi-level connection graph.
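The idea of filling in a masked relationship for a state pair can be shown with a deliberately simplified stand-in. The sketch below does not model the embedding, interaction, fusion, or normalization layers of steps c2 and c3; instead it assumes a translation-style rule in which the relationship is chosen to best match the offset between tail and head embeddings. The example triple is used only as a consistency check, and all vectors are made-up toy values.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity with a small constant to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def predict_masked_relation(head, tail, relation_embeddings):
    """Pick the relationship label whose embedding best matches the offset tail - head."""
    offset = tail - head
    return max(relation_embeddings,
               key=lambda name: cosine(offset, relation_embeddings[name]))

# Candidate relationship embeddings (toy values, not learned).
relation_embeddings = {
    "caused by pathological changes": np.array([1.0, 0.0]),
    "symptoms are related to pathology": np.array([0.0, 1.0]),
}

# Example triple: (head state, relationship label, tail state).
example = (np.array([0.0, 0.0]), "symptoms are related to pathology", np.array([0.1, 0.9]))
example_is_consistent = (
    predict_masked_relation(example[0], example[2], relation_embeddings) == example[1]
)

# Triple to be predicted: the relationship is masked and must be filled in.
query_head, query_tail = np.array([1.0, 1.0]), np.array([1.0, 2.0])
predicted = predict_masked_relation(query_head, query_tail, relation_embeddings)
predicted_triple = ("query head", predicted, "query tail")
```

Collecting such predicted triples for one patient gives the edge list of an individual connection graph, as in step c4.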
[0107] (4) Construct an explicit derivation rule calculation module based on multimodal-multi-level connection graph.
[0108] Figure 5 shows a schematic diagram of the explicit derivation rule calculation module based on multimodal-multi-level connection graph.
[0109] Step d1: Input the nodes of the individual multimodal-multi-level connection graph into the model.
[0110] Step d2: The policy function for calculating the inference rules takes the current state as input and selects the next relationship; its parameters are learnable. The policy function consists of a state encoding layer, several feature extraction layers, and a policy selection layer. An activation layer is added after each layer, and the output of the policy selection layer is normalized.
Step d3: After policy selection, the model obtains the next node on the inference rule path. Starting from the starting node, steps d1 and d2 are repeated until the node output by step d2 is the termination node, which yields the complete inference rule from the starting node to the termination node.
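Steps d1 to d3 can be sketched as a small feed-forward policy plus a path rollout. The sketch assumes ReLU activations, softmax as the normalization of the policy selection output, greedy selection among the relationships available at the current node, and randomly initialized (untrained) weights. The class `PolicyFunction`, the function `infer_rule`, and the toy graph are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

class PolicyFunction:
    """State encoding layer, feature extraction layers, and a policy selection layer."""

    def __init__(self, state_dim, hidden_dim, num_relations, num_feature_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        dims = [state_dim] + [hidden_dim] * (1 + num_feature_layers)
        # First matrix is the state encoding layer, the rest are feature extraction layers.
        self.layers = [rng.normal(scale=0.5, size=(a, b))
                       for a, b in zip(dims[:-1], dims[1:])]
        self.selection = rng.normal(scale=0.5, size=(hidden_dim, num_relations))

    def __call__(self, state):
        h = np.asarray(state, dtype=float)
        for w in self.layers:
            h = np.maximum(h @ w, 0.0)          # activation after each layer
        return softmax(h @ self.selection)      # normalized relationship probabilities

def infer_rule(start, terminal, graph, embeddings, policy, max_steps=5):
    """Follow the policy from start until the termination node (or a dead end)."""
    path, node = [start], start
    for _ in range(max_steps):
        if node == terminal:
            break
        options = graph.get(node, {})           # relationship index -> next node
        if not options:
            break
        probs = policy(embeddings[node])
        relation = max(options, key=lambda r: probs[r])
        node = options[relation]
        path += [relation, node]
    return path

graph = {"s0": {0: "s1"}, "s1": {1: "terminal"}}
rng = np.random.default_rng(1)
embeddings = {name: rng.normal(size=3) for name in ("s0", "s1", "terminal")}
policy = PolicyFunction(state_dim=3, hidden_dim=8, num_relations=2)
rule = infer_rule("s0", "terminal", graph, embeddings, policy)
```

The returned list alternates nodes and relationship indices, which is one convenient way to represent a complete inference rule from a starting node to the termination node.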
[0111] (5) Constructing a progressive thinking guidance module
[0112] Step e1: Extract representative data from different modalities from the state hierarchy graph of the global multimodal-multi-level connection graph. Under the guidance of clinicians, construct a question-answering example library with progressive reasoning for different combinations of modal data. Each example should include a simulated input part and a simulated output part. The simulated input part includes: simulated instruction requirements, simulated patient information descriptions, simulated strong logical evidence chains, and simulated weak logical evidence chains. The simulated output part includes: simulated detailed diagnostic reports and personalized intervention strategies, simulated reasoning processes, and simulated thought logic trees.
[0113] Step e2: Based on the modality types obtained in the reasoning rules in step d3, extract the corresponding question-and-answer examples from the question-and-answer example library with progressive reasoning constructed in step e1, and use them as the thinking guidance information part in the logic chain hints.
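The lookup in step e2 can be sketched as a dictionary keyed by modality combination. The fallback to the largest stored combination contained in the requested one is an assumption added for illustration, and the example entries contain placeholder text only; `select_examples` and the field names are hypothetical.

```python
def select_examples(library, rule_modalities):
    """Return the question-answering examples matching the modalities in the rules."""
    wanted = frozenset(rule_modalities)
    if wanted in library:
        return library[wanted]
    # Assumed fallback: use the largest stored combination contained in the request.
    subsets = [key for key in library if key <= wanted]
    if not subsets:
        return []
    return library[max(subsets, key=len)]

def make_example(tag):
    """Build one placeholder example with the input and output parts listed in step e1."""
    return {
        "input": {
            "instruction_requirements": f"{tag}: ...",
            "patient_information": "...",
            "strong_logical_evidence_chain": "...",
            "weak_logical_evidence_chain": "...",
        },
        "output": {
            "diagnostic_report_and_intervention_strategy": "...",
            "reasoning_process": "...",
            "thought_logic_tree": "...",
        },
    }

library = {
    frozenset({"brain imaging", "EEG"}): [make_example("imaging+EEG")],
    frozenset({"brain imaging"}): [make_example("imaging")],
}

exact = select_examples(library, ["EEG", "brain imaging"])
fallback = select_examples(library, ["brain imaging", "electronic medical records"])
missing = select_examples(library, ["genomics"])
```

Using `frozenset` keys makes the lookup independent of the order in which modalities appear in a rule.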
[0114] S202, Model Training
[0115] (1) Construct a global multimodal-multilevel connection graph
[0116] Step f1: Collect multimodal data from a number of Alzheimer's patients and divide them into training, validation, and test sets according to proportions. Then, set basic training parameters, including the number of training iterations.
[0117] Step f2: Preprocess the multimodal training data from step f1 to obtain the graph structure data corresponding to each modality of each individual in the training set. Then execute the steps of the multimodal-multi-level connection graph fusion module in step S201 to obtain the state corresponding to each modality's graph structure data.
[0118] Step f3: Execute the steps of the expert consensus-driven module in step S201 to obtain the global multimodal-multi-level connection graph composed of the training set data.
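The proportional split in step f1 can be sketched as follows. The 60/20/20 ratios, the fixed seed, and the patient identifiers are assumptions chosen for illustration; this application does not state the proportions.

```python
import random

def split_individuals(ids, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle individuals and split them into training, validation, and test sets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)       # reproducible shuffle
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]           # remainder goes to the test set
    return train, val, test

patient_ids = [f"patient_{i:02d}" for i in range(10)]
train_ids, val_ids, test_ids = split_individuals(patient_ids)
```

Splitting by individual rather than by record keeps all modalities of one patient in the same set.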
[0119] (2) Constructing an individual multimodal-multilevel connection map
[0120] Step g1: From the states corresponding to the graph structure data of each modality of the training-set individuals obtained in step f2, form several triples to serve as triples to be predicted. From the global multimodal-multi-level connection graph obtained in step f3, extract triples to serve as training triples.
[0121] Step g2: Feed the triples to be predicted and the training triples into the analogy deduction module based on the multimodal-multi-level connection graph, and execute that module's procedure from step S201 to obtain the predicted triples. The states corresponding to each modality's data for the individual in step f2 are then combined with all triples obtained through reasoning to obtain the individual multimodal-multi-level connection graph.
[0122] Step g3: Construct the relaxation loss, which is realized by minimizing the distance between the hidden feature representations of the example and of the prediction while maximizing the distance between the hidden feature representations of the predicted head and tail entities:
[0123]
[0124] (3) Mine explicit reasoning rules
[0125] Step h1: In the individual multimodal-multi-level connection graph obtained in step g2, the features of the modality that serves as the disease detection standard, such as "scale shows cognitive impairment," are used as termination nodes, and the other nodes serve as starting nodes.
[0126] Step h2: Feed these nodes into the explicit derivation rule calculation module based on the multimodal-multi-level connection graph. During training of the policy function, three reward functions are defined as follows:
[0127]
[0128]
[0129]
[0130] Among these, the path efficiency reward improves inference efficiency by limiting the length of the path produced while the reinforcement learning agent interacts with the environment, with the path length as its variable. The path diversity reward uses negative cosine similarity to improve the diversity of the paths found by the model.
[0131] Given the total reward function, the policy function parameters are updated using the Monte Carlo policy gradient, as shown below:
[0132]
[0133] The loss function involved in this step can be expressed as:
[0134]
[0135] Step h3: Repeat step d3 to obtain the complete inference rule from a starting node to the termination node.
[0136] Step h4: Repeat step h3 to obtain the reasoning paths from all starting nodes to the termination node; these serve as the strong logical evidence chains and weak logical evidence chains of the explicit reasoning rules.
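Because the reward and update formulas of step h2 are not reproduced in this text, the following is only a sketch under stated assumptions: the path efficiency reward is taken as the reciprocal of the path length, the path diversity reward as the negative mean cosine similarity to previously found paths, the remaining reward as a +1/-1 signal for reaching the termination node, and the total reward as their unweighted sum. The update is a standard Monte Carlo policy gradient (REINFORCE) step for a softmax over tabular logits, which stands in for the policy network.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def terminal_reward(reached_terminal):
    """Assumed form of the reward not described in the text: +1 on success, -1 otherwise."""
    return 1.0 if reached_terminal else -1.0

def path_efficiency_reward(path_length):
    """Assumed form: shorter paths earn a larger reward."""
    return 1.0 / path_length

def path_diversity_reward(path_embedding, found_embeddings):
    """Negative mean cosine similarity to the paths already found."""
    if not found_embeddings:
        return 0.0
    sims = [
        float(path_embedding @ other
              / (np.linalg.norm(path_embedding) * np.linalg.norm(other) + 1e-12))
        for other in found_embeddings
    ]
    return -float(np.mean(sims))

def reinforce_update(logits, action, total_reward, learning_rate=0.1):
    """One REINFORCE step: logits += lr * R * grad log pi(action)."""
    probs = softmax(logits)
    grad_log_prob = -probs
    grad_log_prob[action] += 1.0        # gradient of log-softmax w.r.t. the logits
    return logits + learning_rate * total_reward * grad_log_prob

# Toy episode: a two-step path that reached the termination node, no earlier paths.
total = (terminal_reward(True)
         + path_efficiency_reward(2)
         + path_diversity_reward(np.array([1.0, 0.0]), []))
logits = np.zeros(3)
updated = reinforce_update(logits, action=1, total_reward=total)
```

With a positive total reward, the probability of the chosen relationship rises after the update, which is the intended effect of the policy gradient.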
[0137] (4) Constructing logical chain hints
[0138] Step i1: Execute the steps of the progressive thinking guidance module in step S201 to obtain progressive thinking guidance information.
[0139] Step i2: Based on the medical records of the patient (the individual from step g2), construct a patient information description, and construct the instruction requirements. The instruction requirements should include detailed requirements for the diagnostic report and the personalized intervention strategy in clinical applications.
[0140] Step i3: Combine the instruction requirements from step i2, the patient information description, the explicit reasoning rules obtained from step h4, and the progressive thinking guidance information obtained from step i1 to form a logical chain prompt.
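Step i3 amounts to assembling several text sections into one prompt. The sketch below assumes a plain-text layout with section headings and an arrow notation for rules; the layout, the function names, and the sample strings are hypothetical, and how chains are classified as strong or weak is left to the caller.

```python
def format_rule(path):
    """Render an alternating [state, relationship, state, ...] path as readable text."""
    parts = [path[0]]
    for relation, state in zip(path[1::2], path[2::2]):
        parts.append(f"--[{relation}]--> {state}")
    return " ".join(parts)

def build_logic_chain_prompt(instruction, patient_info, strong_chains, weak_chains,
                             guidance_examples):
    """Concatenate the four ingredients of step i3 into a single prompt string."""
    sections = [
        "Instruction requirements:\n" + instruction,
        "Patient information:\n" + patient_info,
        "Strong logical evidence chains:\n" + "\n".join(format_rule(p) for p in strong_chains),
        "Weak logical evidence chains:\n" + "\n".join(format_rule(p) for p in weak_chains),
        "Thinking guidance examples:\n" + "\n\n".join(guidance_examples),
    ]
    return "\n\n".join(sections)

rule = ["hippocampal atrophy", "symptoms are related to pathology",
        "scale shows cognitive impairment"]
prompt = build_logic_chain_prompt(
    instruction="Provide a diagnostic report and a personalized intervention strategy.",
    patient_info="(patient information description goes here)",
    strong_chains=[rule],
    weak_chains=[],
    guidance_examples=["(question-answering example goes here)"],
)
```

Rendering rules with their text labels rather than embeddings keeps the prompt readable for both the large medical model and the clinician reviewing it.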
[0141] (5) Model optimization
[0142] Step j1: Select the large medical model and execute steps (1) to (4) in S202 to obtain the logical chain hints. In the above steps, the dimensions of all latent space variables are aligned with the semantic space of the large model.
[0143] Step j2: Input the logical chain hints into the large medical model to obtain the output after chain hint normalization.
[0144] Step j3: Replace each state and relationship in the inference rule part of the logical chain hints with its corresponding text label, and use the result as the label for the overall reasoning process. The semantic loss score between the reasoning process in the output of step j2 and this label is calculated as:
[0145]
[0146] where the terms denote the embedded representations produced by passing the reasoning process and its label, respectively, through a semantic encoding model.
[0147] Step j4: The overall loss function of the model is:
[0148]
[0149] The parameters of all modules and latent space are updated based on the overall loss function.
[0150] Step j5: Based on the number of training iterations in step f1, train the model using the training set, and validate it using the validation set during each iteration. After the iteration is complete, select the best model based on the validation results.
[0151] S203, Model Testing
[0152] Step k1: Based on the test set data of the individuals from step f1, execute the steps of the multimodal-multi-level connection graph fusion module in step S201 to obtain the state corresponding to each modality of the test individual.
[0153] Step k2: Execute steps (2)-(4) in step S202 to obtain the logical chain hints.
[0154] Step k3: Input the logic chain prompt into the medical model selected in step j1 to obtain an interpretable diagnostic report and personalized intervention strategy for the patient (test individual).
[0155] It should be noted that the foregoing uses Alzheimer's disease only as an example to illustrate this application. This application has broad applicability and can be extended to other complex disease fields, such as oncology and cardiovascular diseases. Specifically, when applying the model, it suffices to replace the input of the multimodal-multi-level connection graph fusion module with multimodal data of the corresponding disease (such as tumor imaging, cardiac ultrasound, blood biomarkers, etc.) in order to apply the method to diagnostic and treatment decisions in other medical fields. By inputting multimodal data of different diseases into this method, this application can construct personalized multimodal connection graphs for the specific pathological mechanisms of various complex diseases and generate the corresponding explicit inference rules and diagnostic prompts, thereby supporting the precision diagnosis and treatment of different diseases.
[0156] Figure 6 shows a schematic diagram of the knowledge-guided medical large-scale model fine-tuning and optimization system of this application, which may include:
[0157] The data acquisition module is used to acquire the multimodal data to be diagnosed.
[0158] The prompt output module is used to input the multimodal data to be diagnosed into the knowledge guidance model to obtain the thinking guidance information part in the logical chain prompt, which is then input into the large medical model to obtain the output after chain prompt normalization;
[0159] The knowledge guidance model includes a multimodal-multi-level connection graph fusion module, an expert consensus-driven module, an analogy and deduction module, an explicit deduction rule calculation module, and a progressive thinking guidance module.
[0160] The multimodal-multi-level connection graph fusion module is used to obtain the overall embedding features as states based on the modal data graphs of each modality of the disease to be diagnosed.
[0161] The expert consensus-driven module is used to extract expert consensus and clinical experience from the expert knowledge base of the disease to be diagnosed, combine the state semantic labels corresponding to the overall embedding features to determine the state relationship labels between the overall embedding features, extract the embedding feature representation of the relationship from the state relationship labels, and then combine the overall embedding features, state semantic labels, state relationship labels and the embedding feature representation of the relationship to construct a state hierarchy graph. Finally, combined with the modal data graph of the modal data, a global multimodal-multi-level connection graph is obtained.
[0162] The analogy deduction module is used to construct an individual multimodal-multilevel connection graph based on multiple sets of triples through analogy deduction, according to the global multimodal-multilevel connection graph; the triples include two global embedding features and corresponding embedding feature representations;
[0163] The explicit inference rule calculation module is used to obtain the complete inference rules for all nodes based on each node of the individual multimodal-multi-level connection graph.
[0164] The progressive thinking guidance module is used to combine the modality types in the global multimodal-multi-level connection graph and the complete reasoning rules of all nodes to obtain the thinking guidance information part in the logic chain prompts.
[0165] It should be noted that the apparatus and methods disclosed in the several embodiments provided in this application can be implemented in other ways. The system embodiments described above are merely illustrative. For instance, the division into modules is only a logical functional division, and other divisions are possible in actual implementation: multiple modules may be combined or integrated into another device, or some features may be ignored or not executed. The modules described as separate components may or may not be physically separated. The components shown as modules may be one or more physical units, that is, they may be located in one place or distributed in multiple different places. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this embodiment.
[0166] Furthermore, in the various embodiments of the present invention, the modules can be integrated into one processing unit, or each module can exist physically separately, or two or more modules can be integrated into one unit. The integrated unit described above can be implemented in hardware or as a software functional unit.
[0167] This application also proposes a computer program product comprising instructions that, when executed by a processor, implement the steps described in the above-described embodiment of the knowledge-guided medical large-scale model fine-tuning optimization method.
[0168] The computer program can be divided into one or more modules / units, which are stored in a memory and executed by the processor to complete the present invention.
[0169] The carrier for implementing the above-mentioned computer program product can be a computer device. Alternatively, the computer program product can be stored in a computer storage medium.
[0170] The computer device may be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device may include, but is not limited to, a processor and memory.
[0171] The processor may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
[0172] The memory can be used to store the computer program and / or module, and the processor implements various functions of the computer device by running or executing the computer program and / or module stored in the memory, and by calling the data stored in the memory.
[0173] If the modules / units integrated into the computer device are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include: any entity or device capable of carrying the computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc.
[0174] The above are merely preferred embodiments of this application and are not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
A knowledge guide-based medical large model fine-tuning optimization method, characterized in that, Comprise: Obtain the multi-modal data to be diagnosed; Input the multi-modal data to be diagnosed into the knowledge guide model to obtain the thinking guide information part in the logic chain prompt, which is used to input into the medical large model to obtain the output standardized by the chain prompt; Wherein, the knowledge guide model comprises a multi-modal-multi-level connection graph fusion module, an expert consensus driven module, an analogical deduction module, an explicit derivation rule calculation module and a progressive thinking guidance module; The multi-modal-multi-level connection graph fusion module is used to obtain the overall embedding feature as a state according to the modality data graph of each modality data of the disease to be diagnosed; The expert consensus driven module is used to extract expert consensus and clinical experience from the expert knowledge base of the disease to be diagnosed, combine the state semantic label corresponding to the overall embedding feature, determine the state relationship label between the overall embedding features, and extract the embedding feature representation of the relationship from the state relationship label, and then combine the overall embedding feature, the state semantic label, the state relationship label and the embedding feature representation of the relationship to construct a state level graph, and then combine the modality data graph of the modality data to obtain a global multi-modal-multi-level connection graph; The analogical deduction module is used to construct an individual multi-modal-multi-level connection graph based on a plurality of triplets through analogical deduction according to the global multi-modal-multi-level connection graph; the triplet comprises two overall embedding features and a corresponding embedding feature representation; The explicit derivation rule 
calculation module is used to obtain complete reasoning rules of all nodes according to each node of the individual multi-modal-multi-level connection graph; The progressive thinking guidance module is used to combine the modality categories in the global multi-modal-multi-level connection graph and the complete reasoning rules of all nodes to obtain the thinking guide information part in the logic chain prompt. The method for optimizing fine-tuning of a medical large model based on knowledge guidance according to claim 1 is characterized in that, According to the modality data graph of each modality data of the disease to be diagnosed, the overall embedding feature as a state is obtained, comprising: A feature aggregation layer based on a graph neural network aggregates the neighborhood information of each node in the modality data graph of each modality data of the disease to be diagnosed through an aggregation function Aggregate to form a global representation of the node, and extracts the overall embedding feature of the modality data graph of each modality data of the disease to be diagnosed. 
3. The knowledge-guided fine-tuning and optimization method for a large medical model according to claim 1, characterized in that constructing the individual multimodal-multi-level connection graph by analogical deduction over a plurality of triplets comprises:
obtaining an example triplet according to the global multimodal-multi-level connection graph;
embedding the to-be-predicted triplet and the example triplet through an analogy feature embedding layer, learning the feature mapping between modalities through a cross-modal adaptive interaction layer, and obtaining the embedding representation corresponding to each modality;
integrating the embedding representations corresponding to the modalities through a multimodal information fusion layer, and then obtaining the predicted triplet and the feature representation of the predicted triplet through a reasoning consistency normalization layer and a prediction output layer;
constructing the individual multimodal-multi-level connection graph according to the predicted triplet and the feature representation of the predicted triplet.
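The idea of completing a to-be-predicted triplet by analogy with an example triplet can be illustrated with a deliberately simplified stand-in. The sketch below does not implement the layers recited in claim 3; it substitutes a plain vector-offset analogy, in which the relationship is taken to be the difference between tail and head embeddings, and ranks candidates by cosine similarity. All names and values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_tail(example: Tuple[Vector, Vector],
                 query_head: Vector,
                 candidates: Dict[str, Vector]) -> Tuple[str, float]:
    """Complete (query_head, ?, ?) by analogy with an example (head, tail) pair.

    Assumption: the relationship is modeled as the offset tail - head, which is
    a simplification and not the learned layers described in the claim.
    """
    example_head, example_tail = example
    offset = [t - h for h, t in zip(example_head, example_tail)]
    target = [q + o for q, o in zip(query_head, offset)]
    best = max(candidates, key=lambda name: cosine(candidates[name], target))
    return best, cosine(candidates[best], target)

# Hypothetical state embeddings.
example_pair = ([1.0, 0.0], [1.0, 1.0])
candidates = {
    "state_A": [2.0, 1.0],
    "state_B": [0.0, -1.0],
    "state_C": [1.0, -1.0],
}
best_name, best_score = predict_tail(example_pair, [2.0, 0.0], candidates)
```

The predicted pair and its score could then be added as an edge of the individual graph; in the claimed method this prediction comes from the trained analogical deduction module instead.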
4. The knowledge-guided fine-tuning and optimization method for a large medical model according to claim 1, characterized in that obtaining the complete reasoning rules of all nodes from each node of the individual multimodal-multi-level connection graph comprises:
sequentially inputting the nodes of the individual multimodal-multi-level connection graph into a strategy function of the reasoning rule calculation;
wherein the strategy function comprises a state coding layer, a plurality of feature extraction layers, and a strategy selection layer connected in sequence, an activation layer is arranged before each of the state coding layer, the feature extraction layers, and the strategy selection layer, and the output of the strategy selection layer is further subjected to normalization processing.

5. The knowledge-guided fine-tuning and optimization method for a large medical model according to claim 1, characterized in that combining the modality categories in the global multimodal-multi-level connection graph with the complete reasoning rules of all nodes to obtain the thinking guide information part of the logic chain prompt comprises:
extracting representative data of different modalities from the state hierarchical graph of the global multimodal-multi-level connection graph, and constructing a question-and-answer example library with progressive reasoning for different combinations of modality data;
then extracting, according to the modality categories in the complete reasoning rules of all nodes, the corresponding question-and-answer examples from the question-and-answer example library as the thinking guide information part of the logic chain prompt.
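The layer ordering of the strategy function recited in claim 4 (a state coding layer, several feature extraction layers, and a strategy selection layer, each preceded by an activation, with a normalized output) can be sketched as a small feed-forward network. This is a sketch under stated assumptions: ReLU as the activation, softmax as the normalization, randomly initialized weights, and all dimensions are illustrative choices that the claim does not specify. The class name `PolicyFunction` is hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]

def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Dense layer: w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def softmax(x: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng: random.Random, n_out: int, n_in: int) -> Tuple[Matrix, Vector]:
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

class PolicyFunction:
    """State coding -> feature extraction layers -> strategy selection."""

    def __init__(self, state_dim: int, hidden_dim: int, n_actions: int,
                 n_feature_layers: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Layer sizes: one state coding layer, n feature layers, one selection layer.
        dims = [state_dim] + [hidden_dim] * (n_feature_layers + 1) + [n_actions]
        self.layers = [init_layer(rng, dims[i + 1], dims[i])
                       for i in range(len(dims) - 1)]

    def __call__(self, state: Vector) -> Vector:
        x = state
        for w, b in self.layers:
            # An activation (ReLU, assumed) is applied before every layer.
            x = linear(w, b, relu(x))
        # Normalization (softmax, assumed) on the strategy selection output.
        return softmax(x)

policy = PolicyFunction(state_dim=4, hidden_dim=8, n_actions=3)
probs = policy([0.2, 0.5, 0.1, 0.9])  # hypothetical node state
```

The output can be read as a probability distribution over candidate next nodes, from which a reasoning path is sampled one node at a time.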
6. The knowledge-guided fine-tuning and optimization method for a large medical model according to claim 1, characterized in that the loss function used by the analogical deduction module during training is: [formula not reproduced], wherein the symbols of the formula denote, in order: the relaxed loss function; the total number of triples in the training set; the sine similarity function; the hidden feature representation of the example triplet; the hidden feature representation of the predicted triplet; the maximization operation; and, respectively, the hidden feature representations of the head entity and the tail entity in the prediction pair.

7. The knowledge-guided fine-tuning and optimization method for a large medical model according to claim 1, characterized in that the reward function used by the explicit inference rule calculation module during training is: [formula not reproduced], wherein Jtotal = Jreward + Jcost is the global goal reward, and the remaining symbols denote, in order: the path efficiency reward; the path diversity reward; the last node of the individual multimodal-multi-level connection graph; the path length; the set of explored paths; the cosine similarity function; and three coefficients of the formula;
the parameters of the strategy function of the explicit inference rule calculation module are updated during training by the Monte Carlo policy gradient method.
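The Monte Carlo policy gradient update named in claim 7 is commonly known as REINFORCE. The sketch below shows that update for a tabular softmax policy and is not the patent's implementation: the per-node logits table `theta`, the learning rate, the discount factor, and the per-step rewards are all assumptions, and the claimed reward terms (global goal, path efficiency, path diversity) are represented only by a scalar reward per step.

```python
import math
from typing import Dict, List, Tuple

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def reinforce_update(theta: Dict[str, List[float]],
                     episode: List[Tuple[str, int, float]],
                     lr: float = 0.1, gamma: float = 0.99) -> None:
    """One REINFORCE step: theta += lr * sum_t G_t * grad log pi(a_t | s_t).

    episode is a list of (state, action, reward) tuples from one sampled path.
    Gradients are accumulated at the current theta before being applied.
    """
    returns = discounted_returns([r for _, _, r in episode], gamma)
    grads = {s: [0.0] * len(v) for s, v in theta.items()}
    for (state, action, _), g in zip(episode, returns):
        probs = softmax(theta[state])
        for a, p in enumerate(probs):
            # Gradient of log softmax w.r.t. logit a: 1[a == action] - p.
            grads[state][a] += g * ((1.0 if a == action else 0.0) - p)
    for s in theta:
        theta[s] = [w + lr * dw for w, dw in zip(theta[s], grads[s])]

# Hypothetical two-node path where only the final step is rewarded.
theta = {"n0": [0.0, 0.0], "n1": [0.0, 0.0]}
episode = [("n0", 1, 0.0), ("n1", 0, 1.0)]
reinforce_update(theta, episode, lr=0.5, gamma=1.0)
```

After the update, the actions taken along the rewarded path become more probable at both nodes, which is the behavior the path optimization relies on.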
8. The knowledge-guided fine-tuning and optimization method for a large medical model according to claim 1, characterized in that the loss function used by the knowledge guide model during training is: [formula not reproduced], wherein the symbols of the formula denote, in order: the overall loss function for training the knowledge guide model; the path optimization loss, namely the loss function involved in updating the strategy function parameters by the Monte Carlo policy gradient method; and the semantic loss function, which is calculated between the reasoning process of the thinking guide information part of the logic chain prompt and its label as: [formula not reproduced], wherein the symbols denote, in order: the label of the reasoning process; the embeddings of each produced by a semantic encoding model; and the reasoning process of the thinking guide information part of the logic chain prompt.

9. A knowledge-guided fine-tuning and optimization system for a large medical model, characterized by comprising:
a data acquisition module for acquiring multimodal data to be diagnosed;
a prompt output module for inputting the multimodal data to be diagnosed into a knowledge guide model to obtain the thinking guide information part of a logic chain prompt, and inputting the thinking guide information part into the large medical model to obtain an output standardized by the logic chain prompt;
wherein the knowledge guide model comprises a multimodal-multi-level connection graph fusion module, an expert consensus-driven module, an analogical deduction module, an explicit inference rule calculation module, and a progressive thinking guidance module;
the multimodal-multi-level connection graph fusion module is used to obtain, from the modality data graph of each modality of data of the disease to be diagnosed, an overall embedding feature that serves as a state;
the expert consensus-driven module is used to extract expert consensus and clinical experience from the expert knowledge base of the disease to be diagnosed, combine them with the state semantic labels corresponding to the overall embedding features to determine the state relationship labels between the overall embedding features, extract the embedding feature representations of the relationships from the state relationship labels, construct a state hierarchical graph from the overall embedding features, the state semantic labels, the state relationship labels, and the embedding feature representations of the relationships, and then combine the state hierarchical graph with the modality data graphs to obtain a global multimodal-multi-level connection graph;
the analogical deduction module is used to construct, according to the global multimodal-multi-level connection graph, an individual multimodal-multi-level connection graph by analogical deduction over a plurality of triplets, each triplet comprising two overall embedding features and the corresponding embedding feature representation of their relationship;
the explicit inference rule calculation module is used to obtain the complete reasoning rules of all nodes from the nodes of the individual multimodal-multi-level connection graph;
the progressive thinking guidance module is used to combine the modality categories in the global multimodal-multi-level connection graph with the complete reasoning rules of all nodes to obtain the thinking guide information part of the logic chain prompt.

10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the knowledge-guided fine-tuning and optimization method for a large medical model according to any one of claims 1 to 8.