A closed-loop knowledge enhancement method and system for traditional Chinese medicine syndrome differentiation reasoning and a storage medium

By combining tongue-and-pulse-led encoding of the four diagnostic methods with a TCM syndrome path reasoning algorithm, probe prompts, and a credible feedback mechanism, the invention addresses the insufficient use of tongue and pulse for determining the nature of a syndrome and the lack of dynamic knowledge graph updating in TCM diagnostic methods, thereby improving the accuracy and completeness of diagnostic results.

CN122290974A · Pending · Publication Date: 2026-06-26 · HUNAN NORMAL UNIVERSITY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: HUNAN NORMAL UNIVERSITY
Filing Date: 2026-04-30
Publication Date: 2026-06-26

Figure CN122290974A_ABST

Abstract

This invention discloses a closed-loop knowledge enhancement method, system, and storage medium for TCM syndrome differentiation reasoning, belonging to the fields of artificial intelligence and TCM intelligent diagnosis and treatment technology. The method includes: extracting entities and labeling attributes from patient symptom descriptions; generating main semantic vectors and attribute embedding vectors using a tongue and pulse-dominated encoding mechanism; performing path search with syndrome differentiation direction constraints in a knowledge graph based on the vectors to obtain candidate syndromes and evidence triples; constructing composite query vectors for supplementary book slice retrieval when knowledge is insufficient, based on knowledge sufficiency evaluation information output by a large language model; extracting candidate triples from supplementary answers and hit slices, writing them into a temporary knowledge graph after credibility verification, and merging them into the knowledge graph when conditions are met. This method can improve the accuracy of TCM syndrome differentiation reasoning and achieve closed-loop processing of syndrome differentiation reasoning, supplementary retrieval, and knowledge write-back.

Description

Technical Field

[0001] This invention relates to the fields of artificial intelligence, natural language processing, and intelligent diagnosis and treatment in traditional Chinese medicine, specifically to a closed-loop knowledge enhancement method, system, and storage medium for TCM syndrome differentiation and reasoning.

Background Technology

[0002] Traditional Chinese medicine (TCM) syndrome differentiation and treatment is an important method in TCM diagnosis and treatment. It involves comprehensively analyzing the patient's symptoms, tongue appearance, pulse, and other diagnostic information to determine the syndrome type. With the development of artificial intelligence technology, TCM reasoning systems based on natural language processing have gradually become a research hotspot. Their goal is to automatically understand patient descriptions and provide corresponding syndrome differentiation analysis results, thereby assisting in the TCM reasoning and syndrome differentiation decision-making process.

[0003] In existing technologies, one type of method encodes symptom text with a pre-trained language model and then directly performs classification or generative reasoning. However, this type of method usually focuses on the semantic information of the text and lacks effective modeling of attribute information, such as cold / heat and deficiency / excess, that TCM diagnosis relies on. Clinically, the same set of main symptoms may correspond to different syndrome types when combined with different tongue and pulse appearances. Because existing encoding methods ignore the role of the tongue and pulse in determining the nature of a syndrome, they tend to encode such cases as similar vectors, leading to misjudgment of the direction of diagnosis.

[0004] Another approach incorporates TCM knowledge graphs, constructing structured relationships such as "symptom-syndrome-prescription" to achieve graph-based reasoning and question answering. This type of method can improve the interpretability of reasoning to some extent; however, due to the high cost and limited coverage of knowledge graph construction, the system struggles to provide effective answers when encountering knowledge not included in the graph. Furthermore, existing methods often lack effective mechanisms to determine the sufficiency of knowledge, potentially leading to unfounded responses even when knowledge is insufficient.

[0005] In recent years, with the development of large language models, reasoning methods based on retrieval-augmented generation have gradually been applied in the medical field. These methods typically retrieve relevant text or knowledge and feed it into the model to generate an answer. However, in the context of traditional Chinese medicine (TCM) diagnosis, they still have the following shortcomings. First, existing encoding methods do not fully reflect the TCM diagnostic principle that the tongue and pulse determine the nature of a syndrome while the four diagnostic methods are considered together, making it difficult to distinguish the diagnostic direction when the same main symptom appears with different tongue and pulse combinations. Second, existing knowledge graph retrieval methods are usually based on semantic similarity matching, making it difficult to ensure that the retrieval results are consistent with the diagnostic direction. Third, when the knowledge in the knowledge graph is insufficient to support a complete diagnostic analysis, existing systems usually lack an effective mechanism to identify knowledge gaps and automatically trigger supplementary retrieval. Fourth, when the system fills knowledge gaps through external literature, the supplemented knowledge is usually not written into the knowledge graph in structured form, and credibility verification and write-back mechanisms are lacking, making it difficult to update the knowledge graph dynamically.

[0006] Therefore, how to improve the ability of symptom encoding to represent the attributes of TCM syndrome differentiation, how to improve the consistency between knowledge graph retrieval results and the syndrome differentiation direction, how to automatically trigger supplementary retrieval when knowledge is insufficient, and how to verify supplementary knowledge reliably and write it back to the knowledge graph so that the graph is updated dynamically are technical problems that urgently need to be solved in this field.

Summary of the Invention

[0007] To address the aforementioned technical problems, this invention proposes a closed-loop knowledge enhancement method for TCM diagnostic reasoning. This method solves the existing technical problems at the four levels through the following collaborative technical means:

[0008] At the semantic encoding level, a tongue-pulse-driven four-diagnostic collaborative encoding mechanism (TPD-FDCE) is proposed. Through the collaborative work of components such as four-diagnostic serialization, cold and heat attribute segment labeling, type-aware tongue-pulse anchoring pooling, and tongue-pulse anchoring vector extraction, the encoder outputs a main semantic vector and an attribute embedding vector. The diagnostic direction of the main semantic vector is dominated by tongue-pulse information, and the attribute embedding vector encodes the eight-principle diagnostic attribute direction information, so that the same set of main symptoms will produce vector representations with different diagnostic directions when paired with different tongue-pulses.

[0009] At the knowledge retrieval level, a Traditional Chinese Medicine Syndrome Path Reasoning (TCM-SPR) algorithm is proposed. In each step of the knowledge graph path search, the main semantic vector and the attribute embedding vector are used together to evaluate the semantic relevance and attribute consistency of candidate expansion nodes. Expansion continues only in directions where the comprehensive evaluation score exceeds the dynamic threshold, so that the syndrome differentiation direction constrains the entire graph search. In addition, a multi-source convergence mechanism finds candidate syndromes that are reached jointly by multiple symptom paths.

[0010] At the knowledge sufficiency assessment level, a structured probe prompt mechanism is designed. It requires the large language model, while outputting diagnostic conclusions, to cite knowledge evidence, list uncovered symptoms, and output knowledge sufficiency evaluation information. The system then determines whether the knowledge is sufficient. When knowledge is insufficient, it uses the uncovered symptom information and the candidate syndrome information obtained in the first round of retrieval to construct a composite query vector and performs supplementary retrieval of book slices.

[0011] At the knowledge update level, a Traditional Chinese Medicine Relation Extraction and Credible Feedback (TCM-RECF) algorithm is proposed. Candidate triples are extracted from the supplementary answers of the large language model and from the hit book slices. After passing a three-fold credibility verification (knowledge graph consistency, literature source tracing, and symptom relevance), they are written into the temporary knowledge graph. Once they meet the merging conditions on cumulative occurrences and average credibility, they are formally merged into the knowledge graph, realizing credible self-growth of the knowledge graph.

[0012] Specifically, this invention provides a closed-loop knowledge enhancement method for TCM diagnostic reasoning, comprising the following steps:

[0013] Step S1: Receive the patient's symptom description text and extract symptom entities to obtain a list of symptom entities, together with the four-diagnostic-method annotation and the eight-principle cold / heat attribute annotation of each entity.

[0014] Step S2: The symptom entity is encoded using a tongue and pulse-based four-diagnosis combined coding mechanism. The encoding mechanism comprises the following components working collaboratively: a four-diagnosis sequence arrangement part, which arranges the symptom entities according to the priority order of inspection, palpation, inquiry, and auscultation, placing tongue image words and pulse image words at the beginning of the input sequence, and simultaneously generating tongue image word position masks and pulse image word position masks; a cold / heat attribute segment labeling part, which assigns the segment type identifier of heat syndrome or neutral symptom words to the first value, and assigns the segment type identifier of cold syndrome symptom words to the second value; a type-aware tongue and pulse anchoring pooling part, which, after the pre-trained language model outputs the hidden state sequence, superimposes learnable type embedding vectors on tongue image words and pulse image words respectively, and generates tongue and pulse anchoring query vectors through attention pooling; a tongue and pulse anchoring vector extraction part, which uses the tongue and pulse anchoring query vector as the query, performs attention-weighted aggregation and normalization on the hidden states of all sequence words to obtain the main semantic vector; and an attribute embedding vector extraction part, which uses the tongue and pulse anchoring query vector as input, and obtains the attribute embedding vector after mapping and normalization.

[0015] Step S3 involves using a TCM syndrome path reasoning algorithm to retrieve syndrome knowledge from a knowledge graph. This includes: determining entry nodes and then performing path searches on the knowledge graph from each entry node; calculating the comprehensive expansion score of candidate neighbor nodes in each expansion step, where the comprehensive expansion score is calculated based on the semantic similarity between the candidate node and the main semantic vector, and the attribute consistency between the candidate node and the attribute embedding vector; and continuing expansion only when the comprehensive expansion score exceeds a dynamic expansion threshold; performing convergence analysis and scoring ranking on nodes reached by the search paths of two or more entry nodes to obtain candidate syndromes; collecting triples on the paths leading to candidate syndromes, constructing probe prompts, and feeding them into a large language model.

[0016] Step S4: The probe prompt constrains the large language model to reason only from the provided triples. While outputting the syndrome differentiation conclusion, the model is required to cite a triple number as the basis for each statement, list the symptoms not covered by the triples, and output knowledge sufficiency evaluation information with one of three levels: sufficient, partially insufficient, or severely insufficient.

[0017] Step S5: Analyze the response from the large language model and extract the knowledge sufficiency level and the list of uncovered symptoms; determine whether the knowledge is sufficient based on the knowledge sufficiency level and the number of uncovered symptoms; if sufficient, directly output the diagnostic result; if insufficient, proceed to step S6.

[0018] Step S6: Construct a composite query vector. This includes: encoding the uncovered symptoms into an uncovered symptom vector, encoding the top-ranked candidate syndromes of the first round into a syndrome direction vector, and combining both with the main semantic vector obtained in step S2 through weighted fusion and normalization to obtain the composite query vector; using the composite query vector to search a pre-constructed book slice vector library, re-ranking the candidate slices and selecting the target slices, and sending them into the large language model in the form of a supplementary prompt to obtain supplementary answers and output the syndrome differentiation results.

[0019] Step S7: Extract candidate triples from the supplementary answers of the large language model and from the hit book slices. Perform a three-fold credibility check on each candidate triple: a knowledge graph consistency check, which checks whether the new triple semantically contradicts existing triples in the knowledge graph and rejects it directly if it does; a literature source check, which checks whether the head and tail entities of the new triple appear in the original text of the hit book slice; and a symptom relevance check, which checks whether the new triple is related to the symptoms of the current patient. A comprehensive credibility score is calculated from the three check results and preset weights, and candidate triples whose score exceeds the adoption threshold are written into the temporary knowledge graph. When the cumulative number of occurrences of a triple in the temporary knowledge graph is not less than the preset count threshold and the average credibility score over its occurrences is not less than the preset score threshold, the triple is merged into the formal knowledge graph.
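The weighted fusion and normalization of step S6 can be sketched as follows. This is only an illustration: the function names and the fusion weights (0.4, 0.4, 0.2) are assumptions, not values given in the text, and plain Python lists stand in for the encoder's output vectors.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def composite_query(
    main_vec: Sequence[float],
    uncovered_vec: Sequence[float],
    syndrome_vec: Sequence[float],
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),  # hypothetical weights
) -> List[float]:
    """Weighted fusion of the three vectors followed by L2 normalization."""
    w_main, w_unc, w_syn = weights
    fused = [
        w_main * m + w_unc * u + w_syn * s
        for m, u, s in zip(main_vec, uncovered_vec, syndrome_vec)
    ]
    return l2_normalize(fused)
```

Because the result is unit length, it can be compared with unit-length slice vectors by a plain dot product.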

[0020] Furthermore, in the extraction of the tongue-pulse anchoring vector, the attention score is calculated as follows: the tongue-pulse anchoring query vector is passed through a trainable projection matrix, its dot product with the hidden state of each token is taken, and the result is divided by the square root of the hidden layer dimension. The attention weight of each token is obtained after padding-position masking and softmax normalization. The trainable projection matrix is one of the parameters that the encoding mechanism adds to the base pre-trained language model.

[0021] Furthermore, in the path search constrained by the syndrome differentiation direction, the comprehensive expansion score is calculated as follows: the cosine similarity between the pre-encoded main semantic vector of the candidate node and the main semantic vector of the symptoms is multiplied by a first coefficient, the cosine similarity between the attribute vector of the candidate node and the attribute embedding vector of the symptoms is multiplied by a second coefficient, and the two products are added, the sum of the first coefficient and the second coefficient being 1. The dynamic expansion threshold is calculated from the current search depth d, a base threshold, a depth attenuation coefficient, and a lower limit of the threshold.
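A minimal sketch of the comprehensive expansion score is given below. The coefficient value `alpha=0.7` and all function names are assumptions made for illustration. The threshold is deliberately left as a caller-supplied argument rather than computed here, since its exact dependence on depth is not fixed by this sketch.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def expansion_score(
    node_main: Sequence[float],
    node_attr: Sequence[float],
    query_main: Sequence[float],
    query_attr: Sequence[float],
    alpha: float = 0.7,  # hypothetical first coefficient; the second is 1 - alpha
) -> float:
    """alpha * semantic similarity + (1 - alpha) * attribute consistency."""
    return alpha * cosine(node_main, query_main) + (1.0 - alpha) * cosine(node_attr, query_attr)


def should_expand(score: float, threshold: float) -> bool:
    """Expansion continues only when the score strictly exceeds the threshold."""
    return score > threshold
```

A node that is semantically close but points in the opposite attribute direction is penalized by the second term, which is how a direction constraint can prune paths that plain similarity search would follow.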

[0022] Furthermore, the knowledge sufficiency evaluation information includes three levels: sufficient, partially insufficient, and severely insufficient. When the evaluation information is sufficient, it is determined that the knowledge is sufficient and the diagnosis result is directly output. When the evaluation information is partially insufficient and the number of uncovered symptoms is less than a preset threshold, it is determined that the knowledge is basically sufficient and the diagnosis result is directly output. When the evaluation information is partially insufficient and the number of uncovered symptoms is not less than the preset threshold, it is determined that the knowledge is insufficient and a supplementary search is triggered. When the evaluation information is severely insufficient, it is determined that the knowledge is insufficient and a supplementary search is triggered.
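The four-way routing rule above can be written as a small decision function. The level strings and the default threshold of 2 uncovered symptoms are assumptions chosen for illustration.

```python
def knowledge_is_sufficient(level: str, uncovered_count: int, max_uncovered: int = 2) -> bool:
    """Return True to output the result directly, False to trigger supplementary retrieval.

    `max_uncovered` is a hypothetical preset threshold.
    """
    if level == "sufficient":
        return True
    if level == "partially insufficient":
        # Basically sufficient only while few symptoms remain uncovered.
        return uncovered_count < max_uncovered
    if level == "severely insufficient":
        return False
    raise ValueError(f"unknown sufficiency level: {level!r}")
```

Raising on an unknown level, instead of silently defaulting, is one way to surface malformed model output early.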

[0023] Furthermore, the diagnostic-direction-aware re-ranking considers three factors: the cosine similarity between the composite query vector and the slice vector, the overlap rate between the slice entity labels and the uncovered symptoms, and the consistency between the slice content and the candidate syndrome direction; the three factors are weighted and summed according to preset weights to obtain a comprehensive ranking score.
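As a sketch of this weighted re-ranking, assume each candidate slice arrives as a dictionary with a precomputed cosine similarity and a precomputed direction-consistency value in [0, 1]. The dictionary keys and the weights (0.5, 0.3, 0.2) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def overlap_rate(slice_entities: Set[str], uncovered: Set[str]) -> float:
    """Fraction of uncovered symptoms that appear among the slice's entity labels."""
    return len(slice_entities & uncovered) / len(uncovered) if uncovered else 0.0


def rerank(
    candidates: List[Dict],
    uncovered: Set[str],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical preset weights
) -> List[Dict]:
    """Sort candidate slices by the weighted sum of the three factors, best first."""
    w_sim, w_overlap, w_dir = weights

    def score(c: Dict) -> float:
        return (
            w_sim * c["cosine"]
            + w_overlap * overlap_rate(c["entities"], uncovered)
            + w_dir * c["direction_consistency"]
        )

    return sorted(candidates, key=score, reverse=True)
```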

[0024] Preferably, the encoding mechanism is trained with contrastive learning, and the loss function includes an in-batch contrastive loss, a tongue-pulse confusion contrastive loss, and an attribute consistency auxiliary loss, in order to enhance the encoder's ability to distinguish the direction of diagnosis.

[0025] Furthermore, in the three-fold credibility verification, the weight of the literature source verification is higher than the weights of the knowledge graph consistency verification and the symptom relevance verification. The knowledge graph consistency verification uses a predefined contradiction rule table to determine whether the new triple semantically contradicts an existing triple. If there is a contradiction, the candidate triple is rejected directly, regardless of its overall credibility score.
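The scoring and merging rules can be sketched as two small functions. All numeric values here (the weights 0.2 / 0.5 / 0.3, the count threshold 3, the score threshold 0.75) are assumptions; the only constraints taken from the text are that a contradiction is a hard reject and that the literature source weight is the largest.

```python
from typing import List, Optional, Tuple


def credibility_score(
    contradicts_kg: bool,
    source_score: float,
    relevance_score: float,
    weights: Tuple[float, float, float] = (0.2, 0.5, 0.3),  # (consistency, source, relevance)
) -> Optional[float]:
    """Return None for a hard reject, otherwise the weighted credibility score."""
    if contradicts_kg:
        return None  # rejected regardless of the other checks
    w_kg, w_src, w_rel = weights
    # A triple that passes the consistency check earns the full consistency weight.
    return w_kg * 1.0 + w_src * source_score + w_rel * relevance_score


def ready_to_merge(scores: List[float], min_count: int = 3, min_avg: float = 0.75) -> bool:
    """Merge once a triple has occurred often enough with a high enough average score."""
    return len(scores) >= min_count and sum(scores) / len(scores) >= min_avg
```

Keeping the per-occurrence scores in the temporary graph, rather than a running mean alone, makes both merging conditions checkable from the same record.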

[0026] In some embodiments, when the patient's symptom description does not include tongue and pulse information, the encoding mechanism automatically falls back to a standard encoding mode without tongue-pulse dominance.

[0027] Preferably, the attribute embedding vector is extracted using the tongue-pulse anchoring query vector as input.

[0028] This invention also provides a closed-loop knowledge enhancement system for TCM diagnostic reasoning, comprising the following modules. The knowledge precoding module encodes the entities of the TCM knowledge graph and the book slices offline and stores them in the entity vector index and the slice vector database, respectively. The symptom encoding module receives patient symptom descriptions, extracts symptom entities, labels them with the four diagnostic methods and cold / heat attributes, and outputs the main semantic vector and the attribute embedding vector through the tongue-and-pulse-led encoding mechanism. The graph reasoning module performs a path search in the knowledge graph, constrained by the syndrome differentiation direction, based on the main semantic vector and the attribute embedding vector; it obtains candidate syndromes through multi-entry path convergence and comprehensive scoring, constructs the probe prompt, and sends it into the large language model. The knowledge sufficiency determination module parses the output of the large language model and determines whether the knowledge is sufficient; when the knowledge is sufficient, it outputs the syndrome differentiation result, and when the knowledge is insufficient, it triggers the literature supplementary retrieval module. The literature supplementary retrieval module constructs composite query vectors when knowledge is insufficient, performs supplementary retrieval and re-ranking in the slice vector library, and sends the selected slices into the large language model to obtain supplementary syndrome differentiation results. The knowledge feedback module extracts candidate triples from the supplementary answers and hit slices, writes them into the temporary knowledge graph after credibility verification, and merges them into the formal knowledge graph when the merging conditions are met, triggering the knowledge precoding module to update.

[0029] The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the above-mentioned closed-loop knowledge enhancement method for TCM syndrome differentiation and reasoning.

[0030] The beneficial effects of this invention include:

[0031] First, the TPD-FDCE encoding mechanism highlights the dominant role of tongue and pulse information in diagnosis, so that the same set of main symptoms forms different diagnostic directions when paired with different tongue and pulse appearances, thereby improving the ability to distinguish different syndromes that share the same symptoms.

[0032] Second, the TCM-SPR algorithm uses both the main semantic vector and the attribute embedding vector as directional constraints during the knowledge graph path search, so that the retrieval results are consistent with the syndrome differentiation direction, thereby improving the relevance of candidate syndromes and evidence triples.

[0033] Third, the probe prompt mechanism and the knowledge sufficiency judgment mechanism enable the system to identify knowledge gaps and, when knowledge is insufficient, to obtain supplementary evidence through book slice retrieval, thereby improving the completeness of the syndrome differentiation results.

[0034] Fourth, the credible feedback mechanism filters and verifies new knowledge through three-fold credibility verification and temporary knowledge graph storage, reducing the risk of erroneous knowledge being written into the formal knowledge graph and forming a closed-loop update process of "retrieval - supplementation - verification - write-back".

Description of the Drawings

[0035] Figure 1 is a schematic diagram of the module structure of the closed-loop knowledge enhancement system for TCM syndrome differentiation and reasoning described in this invention.

[0036] Figure 2 is a schematic diagram of the internal processing flow of the TPD-FDCE encoding mechanism described in this invention.

[0037] Figure 3 is a schematic diagram of the path search process of the TCM-SPR algorithm, constrained by the syndrome differentiation direction, described in this invention.

[0038] Figure 4 is a schematic diagram of the knowledge sufficiency determination logic and process routing described in this invention.

[0039] Figure 5 is a schematic diagram of the composite query vector construction and re-ranking process in the supplementary literature retrieval described in this invention.

[0040] Figure 6 is a schematic diagram of the three-fold credibility verification and temporary knowledge graph merging process of the TCM-RECF algorithm described in this invention.

[0041] Figure 7 is a schematic diagram of the complete closed-loop process of the system described in this invention.

Detailed Implementation

[0042] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. The following embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit the scope of protection of the present invention.

I. System Overall Architecture

[0043] As shown in Figure 1, the system of the present invention comprises six functional modules: the knowledge precoding module, the symptom encoding module, the graph reasoning module, the knowledge sufficiency determination module, the literature supplementary retrieval module, and the knowledge feedback module.

[0044] The knowledge precoding module runs offline, precoding knowledge graph entities and book slices into vectors, which are stored in the entity vector index and the slice vector database, respectively. The symptom encoding module receives patient symptom descriptions, extracts symptom entities, and outputs a 768-dimensional main semantic vector and a 32-dimensional attribute embedding vector using the TPD-FDCE encoding mechanism. Based on these vectors, the graph reasoning module performs a path search on the knowledge graph, constrained by the syndrome differentiation direction, to obtain candidate syndromes and triple sets, and constructs the probe prompt that is fed into the large language model. The knowledge sufficiency determination module parses the output of the large language model and determines whether the knowledge is sufficient. If it is sufficient, the result is output directly; otherwise, the literature supplementary retrieval module is triggered. The literature supplementary retrieval module constructs a composite query vector for book slice retrieval and diagnostic-direction-aware re-ranking, and sends the selected slices into the large language model to obtain supplementary answers. The knowledge feedback module extracts candidate triples from the supplementary answers and hit slices, performs the three-fold credibility check, and writes them into the temporary knowledge graph. When the merging conditions are met, they are merged into the formal knowledge graph, closing the knowledge loop.

II. Module 1: Knowledge Precoding Module

[0045] In one embodiment, the entire text of a traditional Chinese medicine book is divided into segments of 100 to 300 characters each, with adjacent segments retaining a 50-character overlap. Each segment is labeled with entity tags (symptoms, syndromes, prescriptions, drugs, viscera, pathogenesis, tongue appearance, pulse) using a large language model, and the labeling results are stored along with the segment text.
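A simple way to approximate this slicing is a fixed-size sliding window. The sketch below assumes a window of 300 characters and the 50-character overlap mentioned above; the function name is hypothetical, and the sketch does not enforce the 100-character lower bound, so the final slice of a text may be shorter.

```python
from typing import List


def slice_text(text: str, size: int = 300, overlap: int = 50) -> List[str]:
    """Cut text into windows of at most `size` characters sharing `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    slices: List[str] = []
    start = 0
    while start < len(text):
        slices.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return slices
```

A production splitter would more likely cut at sentence or clause boundaries, which is one reason the slice length is stated as a range.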

[0046] For each entity in the knowledge graph, a contextual description text can be constructed. For example, triples with the entity as the head or tail entity can be collected, and the entity name and the natural language representation of the triples can be concatenated to form the description text. In one embodiment, the number of triples collected can be capped, for example, up to 10. For example, the description text for the entity "liver and gallbladder damp-heat" could be: "Liver and gallbladder damp-heat. Bitter taste in the mouth is common in liver and gallbladder damp-heat. A red tongue with a yellow and greasy coating suggests liver and gallbladder damp-heat. Treatment for liver and gallbladder damp-heat should focus on clearing and draining damp-heat. The formula for liver and gallbladder damp-heat is Gentianae Radix et Rhizoma Decoction."
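The construction of the description text can be sketched as below. Rendering a triple as "head relation tail." is an assumption made for brevity; a real system would likely use relation-specific sentence templates. The function name is hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def entity_description(entity: str, triples: List[Triple], max_triples: int = 10) -> str:
    """Concatenate the entity name with sentences for triples that mention it."""
    # Keep only triples in which the entity is the head or the tail, up to the cap.
    related = [t for t in triples if entity in (t[0], t[2])][:max_triples]
    sentences = [f"{head} {relation} {tail}." for head, relation, tail in related]
    return " ".join([f"{entity}."] + sentences)
```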

[0047] The entity context descriptions and book slices are each encoded into a 768-dimensional main semantic vector and a 32-dimensional attribute embedding vector using the knowledge text encoding method of the TPD-FDCE encoder, and stored in the entity vector index and the slice vector database, respectively.

III. Module 2: Symptom Encoding Module

3.1 Symptom Entity Extraction

[0048] After receiving the patient's symptom description text, the system uses a large language model to output, in JSON format, the name of each symptom entity, the diagnostic method it belongs to (inspection / auscultation / inquiry / palpation), and its eight-principle cold / heat attribute (heat / cold / neutral). The model output can be validated and corrected using a pre-built attribute lookup table, which can be compiled from materials such as "Traditional Chinese Medicine Diagnostics".
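The validation step might look like the following sketch. The JSON field names (`name`, `method`, `nature`) and the two table entries are hypothetical; the only idea taken from the text is that table entries override the model's labels.

```python
import json
from typing import Dict, List

# Hypothetical excerpt of an attribute lookup table.
ATTRIBUTE_TABLE: Dict[str, Dict[str, str]] = {
    "red tongue with a yellow and greasy coating": {"method": "inspection", "nature": "heat"},
    "wiry, slippery, and rapid pulse": {"method": "palpation", "nature": "heat"},
}


def parse_and_correct(llm_json: str, table: Dict[str, Dict[str, str]] = ATTRIBUTE_TABLE) -> List[Dict[str, str]]:
    """Parse the model's JSON list and overwrite labels for entities found in the table."""
    entities = json.loads(llm_json)
    corrected = []
    for entity in entities:
        fixed = dict(entity)
        fixed.update(table.get(entity["name"], {}))  # table entries take precedence
        corrected.append(fixed)
    return corrected
```

Entities missing from the table keep the model's labels, so the table only needs to cover terms whose attributes are well established.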

[0049] Taking the input "The patient has recently experienced a bitter taste in his mouth, accompanied by chest tightness and discomfort. Examination reveals a red tongue with a yellow and greasy coating, and a wiry, slippery, and rapid pulse" as an example, the extracted results are: bitter taste in mouth (inquiry, heat), chest tightness and discomfort (inquiry, neutral), red tongue with a yellow and greasy coating (inspection, heat), and wiry, slippery, and rapid pulse (palpation, heat).

3.2 TPD-FDCE Encoding Mechanism

[0050] As shown in Figure 2, TPD-FDCE uses MC-BERT as its base model (hidden dimension 768, 12 Transformer layers). MC-BERT is a BERT-like language model pre-trained on Chinese medical corpora. Without modifying the internal structure of MC-BERT, the mechanism realizes syndrome differentiation logic through the coordinated design of input organization and output extraction. It consists of the following components.

3.2.1 Serialization of the Four Diagnostic Methods

[0051] The symptom entities are rearranged according to the priority order of inspection, palpation, inquiry, and auscultation, with tongue and pulse terms placed at the beginning of the input sequence. Entities are joined by [SEP], with [CLS] added at the beginning and [SEP] at the end. A tongue term position mask and a pulse term position mask are also generated.
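The ordering and mask generation can be sketched as follows. Several things here are assumptions for illustration: the `kind` field that marks tongue and pulse entities, the whitespace split standing in for the real subword tokenizer, and the convention that a mask value of 1 marks a tongue or pulse token.

```python
from typing import Dict, List, Tuple

PRIORITY = {"inspection": 0, "palpation": 1, "inquiry": 2, "auscultation": 3}


def sort_key(entity: Dict[str, str]) -> Tuple[int, int]:
    """Tongue first, pulse second, then the remaining entities by diagnostic priority."""
    if entity.get("kind") == "tongue":
        return (0, 0)
    if entity.get("kind") == "pulse":
        return (0, 1)
    return (1, PRIORITY[entity["method"]])


def serialize(entities: List[Dict[str, str]]) -> Tuple[List[str], List[int], List[int]]:
    """Return the token sequence plus tongue and pulse position masks."""
    ordered = sorted(entities, key=sort_key)  # stable: ties keep their input order
    tokens, tongue_mask, pulse_mask = ["[CLS]"], [0], [0]
    for entity in ordered:
        words = entity["name"].split()  # stand-in for a subword tokenizer
        tokens += words + ["[SEP]"]
        tongue_mask += [int(entity.get("kind") == "tongue")] * len(words) + [0]
        pulse_mask += [int(entity.get("kind") == "pulse")] * len(words) + [0]
    return tokens, tongue_mask, pulse_mask
```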

[0052] Taking the example of Section 3.1, the serialized sequence is: [CLS] Red tongue with yellow and greasy coating [SEP] Wiry, slippery, and rapid pulse [SEP] Bitter taste in mouth [SEP] Chest tightness and discomfort [SEP]. The tongue and pulse are located at the beginning of the sequence, which helps them exert a more significant influence on the whole sequence in the multi-layer self-attention calculation.

3.2.2 Cold and Heat Attribute Segment Labeling

[0053] The token_type_ids are used to carry the cold / heat attribute: heat or neutral symptom tokens are assigned a value of 0, cold symptom tokens are assigned a value of 1, and [CLS] and [SEP] are assigned a value of 0. After the segment embedding is added to the initial token representation, it participates in the 12 layers of self-attention calculation, so that the information flow carries the cold / heat direction signal.
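The segment assignment can be illustrated with a short sketch. As an assumption, entities are dictionaries with `name` and `nature` fields and a whitespace split stands in for the tokenizer; the function name is hypothetical.

```python
from typing import Dict, List


def cold_heat_segment_ids(entities: List[Dict[str, str]]) -> List[int]:
    """Build segment ids aligned with '[CLS] e1 [SEP] e2 [SEP] ...': cold -> 1, else 0."""
    ids = [0]  # [CLS]
    for entity in entities:
        n_tokens = len(entity["name"].split())  # stand-in for a subword tokenizer
        ids += [1 if entity["nature"] == "cold" else 0] * n_tokens
        ids.append(0)  # [SEP]
    return ids
```

Reusing the two-valued segment input means no new embedding table is needed, since BERT-style models already provide two segment embeddings.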

[0054] The cold and heat attribute segment labeling and the four-diagnostic-method serialization work together: while attending to the tongue and pulse tokens at the front of the sequence, the main symptom tokens that follow obtain both semantic information and cold / heat direction information, which enhances the ability of the encoding result to represent the direction of syndrome differentiation.

3.2.3 Type-Aware Tongue-Pulse Anchoring Pooling

[0055] The input_ids and token_type_ids are fed into MC-BERT to obtain the hidden state sequence. A learnable tongue type embedding vector and a learnable pulse type embedding vector are then introduced and added to the hidden states of the tongue tokens and pulse tokens, respectively. Using a learnable attention projection vector, attention scores are calculated for the type-enhanced tongue and pulse tokens; after softmax normalization, the tokens are aggregated by these weights to obtain the tongue-pulse anchored query vector.
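A minimal sketch of this pooling in plain Python is shown below, with lists standing in for tensors. The parameter names `e_tongue`, `e_pulse`, and `w` are hypothetical labels for the two type embeddings and the attention projection vector, and scoring each token by a dot product with `w` is one plausible reading of the description.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def anchor_query(
    hidden: List[Vector],
    tongue_mask: Sequence[int],
    pulse_mask: Sequence[int],
    e_tongue: Vector,
    e_pulse: Vector,
    w: Vector,
) -> Vector:
    """Attention-pool the type-enhanced tongue and pulse token states into one vector."""
    enhanced: List[Vector] = []
    for h, is_tongue, is_pulse in zip(hidden, tongue_mask, pulse_mask):
        if is_tongue:
            enhanced.append([x + t for x, t in zip(h, e_tongue)])
        elif is_pulse:
            enhanced.append([x + p for x, p in zip(h, e_pulse)])
    if not enhanced:
        raise ValueError("no tongue or pulse tokens to pool")
    weights = softmax([sum(wi * xi for wi, xi in zip(w, v)) for v in enhanced])
    return [sum(a * v[k] for a, v in zip(weights, enhanced)) for k in range(len(w))]
```

Tokens that are neither tongue nor pulse never enter the pool, so the query vector is determined by tongue and pulse evidence alone.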

[0056] In a preferred embodiment, the type embedding vectors can be trained to automatically learn the relative importance of tongue and pulse appearance in diagnosis.

3.2.4 Extraction of the Tongue-Pulse Anchoring Vector

[0057] With the tongue-pulse anchored query vector as the query, attention scores are calculated for the hidden states of all tokens in the entire sequence through a trainable projection matrix. Attention weights are obtained after padding-position masking and softmax normalization. The main semantic vector is obtained by weighted aggregation followed by L2 normalization.
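The extraction can be sketched in plain Python as follows. The score form used here, the query passed through a square projection matrix `W`, dotted with each hidden state, and divided by the square root of the dimension, is an interpretation of the description rather than a confirmed formula; `pad_mask` uses 1 for real tokens and 0 for padding, which is also an assumption.

```python
import math
from typing import List, Sequence

Vector = List[float]


def main_semantic_vector(
    q: Vector, W: List[Vector], hidden: List[Vector], pad_mask: Sequence[int]
) -> Vector:
    """Masked scaled attention over all tokens, then L2 normalization of the pooled state."""
    d = len(q)
    if not any(pad_mask):
        raise ValueError("pad_mask marks no real tokens")
    q_proj = [sum(q[i] * W[i][j] for i in range(d)) for j in range(d)]
    scores = [
        sum(a * b for a, b in zip(q_proj, h)) / math.sqrt(d) if keep else -math.inf
        for h, keep in zip(hidden, pad_mask)
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # padding positions contribute exp(-inf) = 0
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(wt * h[k] for wt, h in zip(weights, hidden)) for k in range(d)]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled
```

Setting masked scores to negative infinity before the softmax guarantees that padding receives exactly zero weight.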

[0058] Because the tongue-pulse anchored query vector encodes both the semantics of the tongue and pulse and the direction of syndrome differentiation, the weighted aggregation assigns higher weights to main symptoms whose direction is consistent with it and lower weights to symptoms whose direction contradicts it.

3.2.5 Attribute Embedding Vector Extraction

[0059] With the tongue-pulse anchored query vector as input, a fully connected network followed by normalization yields the attribute embedding vector. In one embodiment, a two-layer fully connected network projects the input from 768 dimensions to 32 dimensions. Using the tongue-pulse anchored query vector as input helps preserve the syndrome differentiation attribute information related to the tongue and pulse.

3.2.6 Synergistic Effect of the Components

[0060] The above components work together: the four-diagnosis ordered sequence gives the tongue and pulse terms a positional advantage; the cold/heat attribute segment marking injects the direction information of the syndrome differentiation attribute; the type-aware tongue-pulse anchored pooling generates the tongue-pulse anchored query vector; the tongue-pulse anchored vector extraction forms the main semantic vector; and the attribute embedding vector extraction forms an independent attribute representation. Together they enhance the ability of the encoding result to distinguish the direction of syndrome differentiation.

3.2.7 Knowledge Text Encoding Method

[0061] For knowledge text (book slices or entity context descriptions), the standard MC-BERT encoding is used: no four-diagnosis ordering or cold/heat segment marking is performed, the token_type_ids are all 0, mean pooling is used to obtain a 768-dimensional main semantic vector, and a 32-dimensional attribute embedding vector is obtained from the [CLS] hidden state through the attribute projection head. The symptom side and the knowledge side share the MC-BERT parameters and the attribute projection head parameters.

3.2.8 Degraded Processing When No Tongue or Pulse Information Is Available

[0062] When the patient provides no tongue or pulse information, the tongue-image and pulse position masks are all zeros. The type-aware tongue-pulse anchored pooling then yields a uniform distribution, so the anchored query vector degenerates into the mean of the full-sequence hidden states. Subsequent extraction proceeds normally, forming a standard encoding mode that contains no tongue-pulse-dominant information.

3.2.9 Added Parameters and Training

[0063] In one embodiment, the parameters added by the encoding mechanism relative to the underlying pre-trained language model include the projection matrix, the tongue-image type embedding vector, the pulse type embedding vector, the attention projection vector a, and the attribute projection head. Preferably, the total number of added parameters is kept to a small proportion of the base model's parameters to reduce the parameter update overhead and improve fine-tuning efficiency.

[0064] Preferably, the encoding mechanism can be fine-tuned using contrastive learning. Positive sample pairs can be constructed from two sources: the knowledge graph and the book slices. Positive pairs from the knowledge graph can be constructed by collecting the symptoms associated with each syndrome node, while positive pairs from book slices can be constructed by selecting slices that contain both symptom entities and syndrome entities. Negative samples can include in-batch negatives and tongue-pulse confusion negatives. A tongue-pulse confusion negative is a knowledge text that shares the same main symptoms but has a different tongue and pulse pattern and therefore corresponds to a different syndrome.

[0065] In one embodiment, the training loss function includes an in-batch contrastive loss, a tongue-pulse confusion contrastive loss, and an attribute consistency auxiliary loss. The in-batch contrastive loss increases the similarity of correct symptom-knowledge matching pairs and decreases the similarity of incorrect pairs; the tongue-pulse confusion contrastive loss strengthens the encoding mechanism's ability to distinguish cases with the same main symptoms but different tongue and pulse characteristics; the attribute consistency auxiliary loss pulls samples with the same cold/heat attribute closer together and pushes samples with different cold/heat attributes further apart in the attribute embedding space. Preferably, the training loss function is a weighted combination of these three losses, with preset weighting coefficients.
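As a rough illustration of how such a loss could be assembled, the sketch below implements a standard InfoNCE-style in-batch contrastive loss and a weighted sum of the three terms. The original formula is not reproduced in this text, so the additive form, the function names, the temperature, and the weights `lam1` and `lam2` are all assumptions made for illustration.

```python
import math


def in_batch_contrastive_loss(sim, temperature=0.05):
    """InfoNCE-style loss over a similarity matrix.

    sim[i][j] is the similarity between symptom encoding i and knowledge
    text encoding j; the diagonal entries are the positive pairs and all
    other entries in the same row act as in-batch negatives.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[i]  # negative log-probability of the positive
    return total / len(sim)


def total_loss(l_batch, l_confusion, l_attr, lam1=0.5, lam2=0.1):
    """Assumed combination: in-batch loss plus two weighted auxiliary terms."""
    return l_batch + lam1 * l_confusion + lam2 * l_attr
```

The tongue-pulse confusion loss could reuse the same InfoNCE form with the confusion negatives appended to each row, although the text does not say so explicitly.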

[0066] In a preferred embodiment, a phased training strategy can be employed. The first phase trains the newly added parameters and at least some network layers, while the second phase unfreezes all parameters for joint fine-tuning. During training, the AdamW optimizer can be used, and the best model parameters are selected based on retrieval performance metrics on the validation set. In some embodiments, the training process is completed with a preset learning rate, batch size, temperature coefficient, and early stopping strategy.

IV. Module Three: Graph Reasoning Module

[0067] As shown in Figure 3, the graph reasoning module adopts the TCM Syndrome Path Reasoning algorithm (TCM-SPR), which retrieves syndrome differentiation knowledge from the knowledge graph based on the main semantic vector and the attribute embedding vector, and may include the following steps.

4.1 Determining the Entry Nodes

[0068] For each symptom entity, exact matching is tried first: if a corresponding entity exists in the knowledge graph, it is anchored directly. If no corresponding entity exists, vector matching can be performed. In one embodiment, the entity with the highest cosine similarity, provided that it exceeds a preset threshold, is selected as the match. In addition, the main semantic vector can be used to recall several similar entity nodes from the entity vector index as a supplement; together these nodes form the entry node set. In one embodiment, the preset threshold can be 0.75 and the number of supplementary recalled nodes can be 10.

4.2 Path Search under Syndrome Differentiation Direction Constraints

[0069] A path search is initiated from each node in the entry node set. In one embodiment, a breadth-first search is used with a maximum search depth of 3 hops. In each expansion step, a comprehensive expansion score is calculated for each candidate neighbor node as a weighted combination of two terms: the semantic similarity between the pre-encoded main semantic vector of the candidate neighbor node and the main semantic vector corresponding to the current patient's symptoms, and the attribute consistency between the attribute vector of the candidate neighbor node and the attribute embedding vector corresponding to the current patient's symptoms. A neighbor node is added to the search queue only when its comprehensive expansion score exceeds the dynamic expansion threshold. In one embodiment, the dynamic expansion threshold is a function of the current search depth and decreases as the depth increases.
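The constrained search can be sketched as follows. The scoring and threshold formulas are not reproduced in this text, so the sketch makes explicit assumptions: the score is taken as `alpha * cos + (1 - alpha) * cos`, and the threshold as a linear decay clipped at a lower bound, which matches the description of a base threshold, a depth attenuation coefficient, and a threshold lower bound. All numeric defaults and names are hypothetical.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def expansion_score(node_sem, node_attr, v_s, v_a, alpha=0.7):
    """Assumed form: weighted sum of semantic and attribute similarity."""
    return alpha * cosine(node_sem, v_s) + (1 - alpha) * cosine(node_attr, v_a)


def dynamic_threshold(depth, base=0.6, decay=0.1, floor=0.3):
    """Assumed form: linear decay with depth, never below the lower bound."""
    return max(floor, base - decay * depth)


def constrained_bfs(graph, sem, attr, entry, v_s, v_a, max_depth=3):
    """Breadth-first search that expands only direction-consistent neighbors.

    graph: dict node -> list of neighbor nodes
    sem, attr: dicts node -> pre-encoded semantic / attribute vectors
    Returns the set of nodes reached from the entry node.
    """
    visited = {entry}
    queue = deque([(entry, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nb in graph.get(node, []):
            if nb in visited:
                continue
            score = expansion_score(sem[nb], attr[nb], v_s, v_a)
            if score > dynamic_threshold(depth + 1):
                visited.add(nb)
                queue.append((nb, depth + 1))
    return visited
```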

[0070] With this path search, the graph search process is better aligned with the syndrome differentiation direction corresponding to the current symptom combination. For example, when the tongue and pulse information points to a heat syndrome, the main semantic vector and the attribute embedding vector jointly guide the search to preferentially expand to candidate nodes consistent with the heat syndrome direction.

4.3 Multi-Entry Path Convergence Analysis

[0071] Record which entry nodes reached each searched node, and mark nodes that are reached by the search paths of two or more entry nodes as convergence nodes. A convergence node typically represents a candidate node that is associated with multiple current symptoms.

4.4 Scoring and Ranking of Candidate Syndromes

[0072] The convergence nodes are scored and ranked to obtain candidate syndromes. In one embodiment, the score is a weighted combination, with preset weighting coefficients, of factors such as symptom coverage, path density, entity type information, and similarity to the main semantic vector. In one embodiment, the top K nodes by score are selected as candidate syndromes; for example, K can be 3.

4.5 Triple Collection and Probe Prompt Construction

[0073] Triples on the paths leading to the candidate syndromes are collected, deduplicated, and sorted by relevance. The triple set and the candidate syndrome information are organized into a probe prompt, which may include role settings, patient symptom information, a numbered list of triples, candidate syndrome information, and output format instructions. In some embodiments, the output format instructions require the large language model to output the syndrome differentiation conclusion, the numbers of the triples cited as its basis, the uncovered symptoms, the knowledge sufficiency evaluation information, and the corresponding treatment methods or prescriptions.

V. Module Four: Knowledge Sufficiency Assessment Module

[0074] As shown in Figure 4, the syndrome differentiation analysis that the large language model outputs in response to the probe prompt is parsed into a structured form to extract the knowledge sufficiency level and the list of uncovered symptoms, and the subsequent process is determined accordingly.

[0075] In one embodiment, the following decision rules can be adopted: when the evaluation information is "sufficient," the syndrome differentiation result is output directly; when the evaluation information is "partially insufficient" and the number of uncovered symptoms is less than a preset number, the result is output directly with an additional prompt; when the evaluation information is "partially insufficient" and the number of uncovered symptoms is not less than the preset number, supplementary retrieval is triggered; when the evaluation information is "severely insufficient," supplementary retrieval is triggered. In one embodiment, the preset number can be 2. When supplementary retrieval is triggered, the list of uncovered symptoms, the name of the highest-ranked candidate syndrome, and the original main semantic vector can be passed to the supplementary retrieval module.

VI. Module Five: Supplementary Literature Search Module

[0076] As shown in Figure 5, a composite query vector is constructed by fusing the output information of the first round of retrieval. In one embodiment, the composite query vector is a weighted fusion of three components: the encoding vector of the uncovered symptoms, the encoding vector of the candidate syndrome, and the original symptom main semantic vector. The weight of the uncovered symptom vector can be higher than those of the other components to highlight the knowledge gap.
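A minimal sketch of the fusion, assuming a plain weighted sum followed by L2 normalization (the claims mention weighting, fusion, and normalization, but the formula itself is not reproduced in this text). The weight values and the function name are hypothetical; only the ordering, with the uncovered symptom vector weighted highest, follows the description.

```python
import math


def composite_query(v_uncovered, v_syndrome, v_s, weights=(0.5, 0.25, 0.25)):
    """Fuse three equal-length vectors into one L2-normalized query vector.

    v_uncovered: encoding of the uncovered symptoms
    v_syndrome: encoding of the top-ranked candidate syndrome
    v_s: original main semantic vector of the patient's symptoms
    weights: hypothetical coefficients, uncovered symptoms weighted highest
    """
    w_u, w_c, w_s = weights
    fused = [
        w_u * u + w_c * c + w_s * s
        for u, c, s in zip(v_uncovered, v_syndrome, v_s)
    ]
    norm = math.sqrt(sum(x * x for x in fused)) or 1.0
    return [x / norm for x in fused]
```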

[0077] After candidate slices are recalled from the slice vector library using the composite query vector, a direction-aware re-ranking can be performed. In one embodiment, the 20 candidate slices with the highest similarity are recalled first and then re-ranked with an exemplary score that combines the similarity between the composite query vector and the slice vector, the overlap rate between the slice entity labels and the uncovered symptoms, and the consistency between the slice content and the direction of the candidate syndrome. In one embodiment, the top 5 slices are selected as target slices.
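The re-ranking can be sketched as below, assuming a linear combination of the three factors. The weights, the dictionary keys, and the way the overlap rate is computed (share of uncovered symptoms that appear among the slice's entity labels) are illustrative assumptions; the direction consistency value is taken as a precomputed input because the text does not say how it is obtained.

```python
def overlap_rate(slice_entities, uncovered):
    """Share of uncovered symptoms that appear among the slice entity labels."""
    targets = set(uncovered)
    if not targets:
        return 0.0
    return len(set(slice_entities) & targets) / len(targets)


def rerank(candidates, weights=(0.5, 0.3, 0.2), top_n=5):
    """Re-rank recalled slices and keep the top_n.

    candidates: list of dicts with keys "id", "similarity", "overlap",
    and "direction", each score assumed to lie in [0, 1].
    """
    w_sim, w_ovl, w_dir = weights

    def score(c):
        return (w_sim * c["similarity"]
                + w_ovl * c["overlap"]
                + w_dir * c["direction"])

    return sorted(candidates, key=score, reverse=True)[:top_n]
```

With this form, a slice that is slightly less similar to the query but covers the uncovered symptoms and agrees with the candidate syndrome direction can outrank a slice that is merely similar.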

[0078] The target slices are fed into the large language model in the form of a supplementary prompt. The supplementary prompt may include the first-round syndrome differentiation conclusion, the list of uncovered symptoms, and the text of the supplementary slices. The model is required to refine the syndrome differentiation analysis using the supplementary information and to output it in a predetermined format as the final result.

VII. Module Six: Knowledge Feedback Module

[0079] As shown in Figure 6, after the supplementary answer and the hit slices are obtained, candidate triples can be extracted from them. In one embodiment, a large language model is used to extract candidate triples in a specified JSON format, and the candidate relation types may include "commonly seen in," "indicates," "belongs to," "appropriate treatment," "prescription," "contains medicine," "pathogenesis," and so on.

7.1 Triple Credibility Verification

[0080] The first layer is a Knowledge Graph (KG) consistency check. This check is used to verify whether the candidate triples contradict existing triples in the knowledge graph. In one embodiment, if a semantic contradiction is found, the candidate triple is rejected directly; if no contradiction is found, the corresponding consistency confidence score is given.

[0081] The second layer is a literature source verification. This verification checks whether the head and tail entities of the candidate triples appear in the original text of the matched book slice. In one embodiment, a high confidence level is given when both the head and tail entities appear, a medium confidence level is given when only one entity appears, and a low confidence level is given when neither appears.

[0082] The third layer is the symptom relevance check. This check examines whether the candidate triples are relevant to the current patient's symptoms. In one embodiment, different levels of relevance confidence can be assigned based on the degree of match.

[0083] The overall credibility is calculated as a weighted combination of the three verification results: the consistency confidence obtained from the knowledge graph consistency check, the source-tracing confidence obtained from the literature source verification, and the relevance confidence obtained from the symptom relevance check.

7.2 Temporary Graph Storage and Merging

[0084] When the overall credibility exceeds the adoption threshold, candidate triples can be written into the temporary knowledge graph, and their cumulative occurrence count, credibility scores, and source slice information can be recorded. When a triple in the temporary knowledge graph simultaneously meets both the preset occurrence count threshold and the preset average credibility threshold, it can be merged into the formal knowledge graph after reconfirmation of no contradictions, and the knowledge precoding module will be triggered to update the entity vector index. In one embodiment, the adoption threshold can be 0.6, the occurrence count threshold can be 3, and the average credibility threshold can be 0.7.
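The staging logic can be sketched as a small in-memory structure. The thresholds 0.6, 3, and 0.7 come from the embodiment above; the class and method names, the credibility weights, and the use of a linear weighted sum are hypothetical. The sketch only reports when a triple is eligible; the final contradiction recheck against the formal knowledge graph is left to the caller.

```python
def overall_credibility(c_kg, c_source, c_relevance, weights=(0.4, 0.3, 0.3)):
    """Assumed form: weighted sum of the three verification confidences."""
    w_kg, w_src, w_rel = weights
    return w_kg * c_kg + w_src * c_source + w_rel * c_relevance


class TemporaryGraph:
    """Holds candidate triples until they have been seen often enough."""

    def __init__(self, adopt=0.6, min_count=3, min_avg=0.7):
        self.adopt = adopt
        self.min_count = min_count
        self.min_avg = min_avg
        self.records = {}  # triple -> {"scores": [...], "sources": [...]}

    def offer(self, triple, credibility, source_slice):
        """Record one occurrence if its credibility exceeds the threshold."""
        if credibility <= self.adopt:
            return False
        record = self.records.setdefault(triple, {"scores": [], "sources": []})
        record["scores"].append(credibility)
        record["sources"].append(source_slice)
        return True

    def ready_to_merge(self, triple):
        """True when both the count and average-credibility conditions hold."""
        record = self.records.get(triple)
        if record is None:
            return False
        scores = record["scores"]
        average = sum(scores) / len(scores)
        return len(scores) >= self.min_count and average >= self.min_avg
```

Requiring repeated, independently verified occurrences before merging means that a single low-quality extraction cannot alter the formal knowledge graph on its own.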

[0085] By using temporary knowledge graph storage and multiple verification mechanisms, the risk of erroneous knowledge being written into the formal knowledge graph can be reduced, thereby improving the reliability of dynamic updates to the knowledge graph.

VIII. Examples

[0086] The patient described: "Recently, I have experienced a bitter taste in my mouth, accompanied by chest tightness and discomfort, and occasional nausea. Examination revealed a red tongue with a yellow and greasy coating, and a wiry, slippery, and rapid pulse."

[0087] Symptom entity extraction results may include: bitter taste in the mouth (inquiry, heat), chest tightness and discomfort (inquiry, neutral), nausea (inquiry, neutral), red tongue with yellow and greasy coating (inspection, heat), and wiry, slippery, and rapid pulse (palpation, heat).

[0088] During TPD-FDCE encoding, the sequence after four-diagnosis ordering can be represented as: [CLS] Red tongue with yellow and greasy coating [SEP] Wiry, slippery, and rapid pulse [SEP] Bitter taste in mouth [SEP] Chest tightness and discomfort [SEP] Nausea [SEP]. Since all symptoms in this embodiment are either heat or neutral, the corresponding token_type_ids can all be 0. After encoding, the main semantic vector v_s is biased toward heat syndromes, and the attribute embedding vector also points in the direction of heat syndromes.

[0089] During TCM-SPR graph reasoning, three entry nodes can be matched exactly: "bitter taste in the mouth," "red tongue with yellow and greasy coating," and "wiry, slippery, and rapid pulse." Taking "bitter taste in the mouth" as an example, in one embodiment the comprehensive expansion score of the first-hop neighbor "damp-heat in the liver and gallbladder" is 0.82, which is higher than the corresponding threshold, so it passes the screening; the score of "liver stagnation and spleen deficiency" is 0.28, which is below the corresponding threshold, so it is filtered out. "Damp-heat in the liver and gallbladder" can be reached by the paths from all three entry nodes and ranks first after scoring.
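The convergence analysis applied in this example can be sketched as follows. The function name and the input layout (a mapping from each entry node to the set of nodes its search reached) are hypothetical, and the reached sets below are a simplified illustration of the example rather than actual search output.

```python
from collections import defaultdict


def convergence_nodes(reached_by_entry):
    """Return nodes reached from two or more entry nodes.

    reached_by_entry: dict mapping an entry node to the set of nodes that
    the constrained search reached from it.
    The result maps each convergence node to the set of entries reaching it.
    """
    hit_by = defaultdict(set)
    for entry, nodes in reached_by_entry.items():
        for node in nodes:
            hit_by[node].add(entry)
    return {node: entries for node, entries in hit_by.items()
            if len(entries) >= 2}


reached = {
    "bitter taste in the mouth": {"damp-heat in the liver and gallbladder"},
    "red tongue with yellow and greasy coating": {
        "damp-heat in the liver and gallbladder"},
    "wiry, slippery, and rapid pulse": {
        "damp-heat in the liver and gallbladder"},
}
converged = convergence_nodes(reached)
```

Here the single convergence node is reached from all three entries, which corresponds to full coverage of the matched symptoms in the subsequent scoring.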

[0090] When knowledge sufficiency is assessed, the large language model can output "partially insufficient," with "nausea" as the uncovered symptom. Since the number of uncovered symptoms is less than the preset number, the knowledge is treated as basically sufficient, and the syndrome differentiation result is output: liver and gallbladder damp-heat syndrome; the treatment method is to clear and drain damp-heat from the liver and gallbladder, and the prescription is Longdan Xiegan Decoction.
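The routing decision used in this example follows the decision rules of Module Four and can be sketched as a small function. The function name and the returned action labels are hypothetical; the level strings and the preset number of 2 come from the embodiment.

```python
def route(level, uncovered_symptoms, preset_number=2):
    """Decide the next step from the knowledge sufficiency evaluation.

    level: "sufficient", "partially insufficient", or "severely insufficient"
    uncovered_symptoms: list of symptoms not covered by the cited triples
    """
    if level == "sufficient":
        return "output"
    if level == "partially insufficient":
        if len(uncovered_symptoms) < preset_number:
            return "output_with_prompt"
        return "supplementary_retrieval"
    if level == "severely insufficient":
        return "supplementary_retrieval"
    raise ValueError(f"unknown sufficiency level: {level!r}")


decision = route("partially insufficient", ["nausea"])
```

With one uncovered symptom and a preset number of 2, the example takes the direct-output branch with an additional prompt; two or more uncovered symptoms would trigger supplementary retrieval instead.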

[0091] In contrast, if the tongue and pulse are replaced with "pale tongue with white coating, thready and wiry pulse," the main semantic vector and attribute embedding vector output by TPD-FDCE can be biased toward cold syndromes. In TCM-SPR graph reasoning, "liver stagnation and spleen deficiency" then passes the screening while "damp-heat in the liver and gallbladder" is filtered out. The final result can become liver stagnation and spleen deficiency syndrome, with Xiaoyao San as the prescription. This comparison shows that the same set of main symptoms, when combined with different tongue and pulse patterns, can lead to different syndrome differentiation conclusions after processing by this invention.

IX. Storage Media

[0092] This invention also provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the steps of the closed-loop knowledge enhancement method for TCM syndrome differentiation reasoning described in the above embodiments. The computer-readable storage medium includes, but is not limited to, media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory, random access memory, magnetic disks, or optical disks.

X. Other Notes

[0093] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A closed-loop knowledge enhancement method for TCM syndrome differentiation reasoning, characterized in that it comprises the following steps:

Step S1: Receive the patient's symptom description text and extract symptom entities to obtain a list of symptom entities together with the four-diagnosis label and the cold/heat attribute label corresponding to each symptom entity.

Step S2: Encode the symptom entities using a tongue-and-pulse-dominant encoding mechanism, including: arranging the symptom entities in the priority order of inspection, palpation, inquiry, and auscultation, placing the tongue-image and pulse terms at the beginning of the input sequence, and generating tongue-image and pulse token position masks; assigning the segment type identifiers of heat-attribute and neutral-attribute symptom terms a first value and the segment type identifiers of cold-attribute symptom terms a second value, and inputting them into a pre-trained language model to obtain the full-sequence hidden states; adding type embedding vectors to the hidden states of the tongue-image and pulse tokens respectively, and generating a tongue-pulse anchored query vector through attention pooling; performing attention-weighted aggregation and normalization on the full-sequence hidden states based on the tongue-pulse anchored query vector to obtain the main semantic vector, and obtaining the attribute embedding vector from the tongue-pulse anchored query vector through a projection network.

Step S3: Using the main semantic vector and the attribute embedding vector as queries, determine the knowledge graph entry node set through exact matching or vector similarity matching, and perform a breadth-first search starting from each entry node; in each expansion step, calculate a first similarity between the pre-encoded main semantic vector of the candidate neighbor node and the main semantic vector, and a second similarity between the attribute vector of the candidate neighbor node and the attribute embedding vector, and use the weighted combination of the first similarity and the second similarity as the comprehensive expansion score; continue expansion only when the comprehensive expansion score exceeds a dynamic expansion threshold that decreases with search depth; mark nodes that are reached by paths from two or more entry nodes as convergence nodes, obtain candidate syndromes by ranking the convergence nodes according to a comprehensive score, and construct a probe prompt from the triples collected along the paths.

Step S4: Send the probe prompt to the large language model, and require the large language model to output, based on the triples, the syndrome differentiation conclusion, the symptoms not covered by the triples, and the knowledge sufficiency evaluation information.

Step S5: Parse the response of the large language model; when the knowledge sufficiency evaluation information indicates sufficient knowledge, directly output the syndrome differentiation conclusion; when the knowledge sufficiency evaluation information indicates insufficient knowledge, encode the uncovered symptoms as an uncovered symptom vector, encode the highest-ranked candidate syndrome of the first round as a syndrome direction vector, and fuse the uncovered symptom vector, the syndrome direction vector, and the main semantic vector by weighting and normalization to obtain a composite query vector; after supplementary retrieval and re-ranking in the book slice vector library, send the results to the large language model to obtain a supplementary syndrome differentiation conclusion.

Step S6: After the supplementary retrieval is triggered in step S5, extract candidate triples from the supplementary answer and the hit book slices, and perform a knowledge graph consistency check, a literature source check, and a symptom relevance check on the candidate triples; if the knowledge graph consistency check finds a semantic contradiction, reject the candidate triple; for candidate triples that are not rejected, calculate the comprehensive credibility based on the results of the knowledge graph consistency check, the literature source check, and the symptom relevance check; when the comprehensive credibility reaches the adoption threshold, write the candidate triple into the temporary knowledge graph; when a candidate triple in the temporary knowledge graph meets the preset merging conditions, merge it into the formal knowledge graph after the knowledge graph consistency check is performed again.

2. The method of claim 1, wherein in step S2 the projection network includes two fully connected layers, and the dimension of the attribute embedding vector is lower than the hidden layer dimension of the pre-trained language model.

3. The method of claim 1, wherein when the patient's symptom description text contains neither tongue-image information nor pulse information, the tongue-image token position mask and the pulse token position mask are empty, and the attention pooling degenerates into uniform attention pooling over the full-sequence hidden states to generate the main semantic vector and the attribute embedding vector.

4. The method of claim 1, wherein in step S3 the dynamic expansion threshold is calculated from the current search depth d, a base threshold, a depth attenuation coefficient, and a threshold lower bound.

5. The method of claim 1, wherein the knowledge sufficiency evaluation information takes one of three levels: sufficient, partially insufficient, and severely insufficient. When the level is sufficient, the syndrome differentiation conclusion is output directly. When the level is partially insufficient and the number of uncovered symptoms is lower than a preset threshold, the syndrome differentiation conclusion is output directly. When the level is partially insufficient and the number of uncovered symptoms is not lower than the preset threshold, or when the level is severely insufficient, a supplementary retrieval is triggered.

6. The method according to claim 1, characterized in that the re-ranking in step S5 takes into account the following factors: the similarity between the composite query vector and the slice vector, the overlap rate between the slice entity labels and the uncovered symptoms, and the consistency between the slice content and the direction of the candidate syndrome.

7. The method according to claim 1, characterized in that in step S6 the preset merging condition is that the cumulative number of occurrences of the candidate triple in the temporary knowledge graph is not less than a preset count threshold and its average credibility is not lower than a preset score threshold; and the literature source check is used to determine whether the head entity and the tail entity of the candidate triple appear in the hit book slice.

8. A closed-loop knowledge enhancement system for TCM syndrome differentiation reasoning, characterized in that it comprises:

a knowledge precoding module, used to encode the entities of the TCM knowledge graph and the book slices offline and store them in the entity vector index and the slice vector library, respectively;

a symptom encoding module, used to receive patient symptom descriptions, extract symptom entities and label them with the four-diagnosis and cold/heat attributes, and output the main semantic vector and the attribute embedding vector through a tongue-and-pulse-dominant encoding mechanism;

a graph reasoning module, used to perform a path search under syndrome differentiation direction constraints in the knowledge graph based on the main semantic vector and the attribute embedding vector, obtain candidate syndromes through multi-path convergence and comprehensive scoring, and construct a probe prompt that is sent to the large language model;

a knowledge sufficiency determination module, used to parse the output of the large language model and obtain the knowledge sufficiency evaluation information, output the syndrome differentiation conclusion when the knowledge is sufficient, and trigger the supplementary retrieval module when the knowledge is insufficient;

a supplementary retrieval module, used to construct a composite query vector from the output information of the first round of retrieval when the knowledge is insufficient, perform supplementary retrieval and re-ranking in the slice vector library, and send the results to the large language model to obtain a supplementary syndrome differentiation conclusion;

a knowledge feedback module, used to extract candidate triples from the supplementary answer and the hit slices, write them into the temporary knowledge graph after they pass the knowledge graph consistency check, the literature source check, and the symptom relevance check, merge them into the formal knowledge graph when the preset merging conditions are met, and trigger the knowledge precoding module to update.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, it implements the closed-loop knowledge enhancement method for TCM syndrome differentiation reasoning according to any one of claims 1 to 7.