A large language model enhanced question and answer generation method
By using the Knowledge Subgraphs Enhanced LLMs framework, which combines a knowledge retrieval module with subgraph construction techniques, the method addresses data leakage and hallucination in medical question answering with large language models and achieves efficient, accurate answer generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2024-01-26
- Publication Date
- 2026-06-12
Figure CN118013051B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of knowledge graph technology, and in particular relates to a question-answering generation method enhanced by a large language model.
Background Technology
[0002] Medical question answering (Q&A) involves using knowledge from domain-specific medical knowledge bases (such as documents, tables, and knowledge graphs) to answer patient queries. Since the introduction of knowledge graphs (KGs), they have become the primary form of medical knowledge storage and representation, prompting extensive research into Knowledge Graph Question Answering (KGQA). Most KGQA studies rely on the structural information of KGs for reasoning and answering. However, despite representing knowledge in a structured way, knowledge graphs have limitations in semantic interpretation and text generation capabilities.
[0003] Large Language Models (LLMs) have demonstrated impressive performance in Natural Language Processing (NLP) tasks such as text-based question answering and data reasoning. The emergence and advancement of LLMs have brought new breakthroughs to the development of intelligent healthcare systems. However, the inherent limitations and outdated knowledge of LLMs can lead to incorrect and misleading outputs (i.e., "hallucinations"), making their direct deployment in medical question answering problematic. To address this issue, existing research has extensively investigated fine-tuning guidelines, model performance evaluation, and the development of relevant datasets.
[0004] Medical LLMs primarily rely on fine-tuning with external medical knowledge bases so that the LLM learns medical knowledge. Med-PaLM provides a benchmark for evaluating the clinical knowledge of LLMs and improves upon Flan-PaLM through instruction prompt tuning on medical data, significantly outperforming previous models. Med-PaLM 2 combines PaLM 2 with medical-domain fine-tuning and novel prompting strategies, particularly "ensemble refinement," which improves performance by iteratively refining answers and explanations. ChatDoctor, fine-tuned from LLaMA, uses prompts to retrieve relevant knowledge from reliable sources, allowing the LLM to respond more accurately to patient inquiries. BenTsao, built on LLaMA-7B, is fine-tuned on knowledge-based data. Galactica introduces a set of tokenization techniques tailored to different input modalities, and includes prompts together with a general corpus during pre-training. DoctorGLM introduces a Prompt Designer that improves disease-specific expertise and reliability by integrating professionally generated prompts into GLM. These medical LLMs require converting the KG into a format suitable for training, and must undergo time-consuming training to acquire medical knowledge. Moreover, for some medical institutions or hospitals, Q&A data may be sensitive, and passing all the data to an LLM creates a risk of privacy breaches.
[0005] Knowledge graph question answering (KGQA) is the task of answering an input question using facts from a knowledge graph. Previous methods fall into three categories: neural semantic parsing, information retrieval, and differentiable knowledge graph-based methods. All of these methods require additional labeled datasets for model training. Recently, knowledge-enhanced LLMs (KELLMs), which combine a KG with an LLM as the underlying model for end-to-end text generation, have attracted considerable attention in KGQA. Compared to previous methods, KELLMs generate more logically coherent, readable, and comprehensive answers. However, KELLMs still face the inherent hallucination of LLMs and the incompleteness of knowledge in knowledge graphs.
[0006] Recent research has shown that the combination of KG and LLM exhibits competitive performance in question answering. KELLMs leverage the structured knowledge of KG and the semantic understanding and contextual modeling capabilities of LLM. This integration produces a synergistic enhancement effect, making it possible to build a medical question answering framework based on the "KG+LLM" paradigm.
[0007] Currently, there are two main approaches to KELLM: (1) KG-enhanced LLM pre-training and (2) KG-enhanced LLM inference. The former involves explicit or combined text-based pre-training objectives for triples in the KG, or adding additional knowledge adaptation layers; while the latter injects triples into the LLM during the inference phase to facilitate comprehensive inference enhanced by knowledge integration.
[0008] Recent KELLM work has focused on retrieving information from the KG as prompts for the LLM. These methods require only access to the LLM's API or partial information and consume relatively few computational resources, so they can be flexibly integrated with various types of KG. KAPING and MindMap retrieve question-relevant facts from the KG, pass them to the LLM, and guide its reasoning based on these facts. KnowledGPT uses program-of-thought (PoT) prompting to generate KG search queries in code form and designs multiple prompts to enhance the interaction between the LLM and the KG. ToG employs a constrained search algorithm to guide the LLM step by step along reasoning paths on the KG.
[0009] However, these works share two common drawbacks: (1) they rely on LLMs for the entire reasoning process, which may expose the entire knowledge base to leakage or unauthorized access and thus limits the use of LLMs (especially closed-source ones) on sensitive medical data; (2) when LLMs reason over KGs stored as large numbers of triples, the subtle structural relationships in the KG may be obscured, causing confusion and propagating hallucinations within the LLM.
Summary of the Invention
[0010] In view of this, this application proposes a medical question-answering framework based on Knowledge Subgraphs Enhanced LLMs (KoSEL), consisting of two modules. The Knowledge Retrieval (KR) module is confidential and operates without online data transmission. Its main tasks are entity linking and the construction and merging of knowledge subgraphs, which together yield a refined, question-relevant medical knowledge base. The Reasoning and Response (RA) module passes this knowledge to the LLM for reasoning and answer generation. Compared to existing methods, this application effectively mitigates the risk of data leakage. It also surpasses state-of-the-art methods in retrieval accuracy, answer completeness, and reduction of false facts.
[0011] To achieve the above objectives, this invention discloses a question-answering generation method enhanced by a large language model, comprising the following steps:
[0012] Extract information from the question text q, including medical entities and disease events;
[0013] A mask-based continuous pre-training strategy is adopted to capture the semantics of entities and to provide knowledge embedding for entity linking and subgraph construction tasks;
[0014] Link the extracted information to entities in the knowledge graph using matching rules that combine character-based and semantic-based matching;
[0015] A multi-hop neighborhood exploration method combined with semantic relevance evaluation is used to construct a knowledge subgraph for each linked entity;
[0016] Multiple knowledge subgraphs are merged using two merging criteria to create a question-answering evidence graph;
[0017] The process involves textualizing and grouping triples from different subgraphs; then, these grouped triples are fed into a large language model, guiding the model to focus on the relationship between two linked entities and the path associated with any shared nodes during inference.
[0018] Furthermore, the extracted information includes two types: medical entities and disease events. The set of medical entities $E = \{e_1, e_2, \ldots, e_m\}$ represents the medical terms or attributes mentioned in the question, expressed as entities; the set of disease events represents actions, behaviors, or situations related to a disease, described by verb phrases and noun phrases.
[0019] Furthermore, the mask-based continuous pre-training strategy includes:
[0020] Each triple $\{e_h, r, e_t\}$ is converted into a token sequence, as shown below:
[0021] $x = [\mathrm{CLS}]\; e_h\; [\mathrm{SEP}]\; r\; [\mathrm{SEP}]\; e_t\; [\mathrm{EOS}]$.
[0022] Here $e_h$ is the token of the head entity, $r$ is the token of the relation, and $e_t$ is the token of the tail entity; [CLS] marks the beginning of the sequence, [SEP] separates segments, and [EOS] marks the end of the sequence.
[0023] The token corresponding to the head or tail entity is replaced with [MASK] to obtain the modified input token sequences:
[0024] $x = [\mathrm{CLS}]\; [\mathrm{MASK}]\; [\mathrm{SEP}]\; r\; [\mathrm{SEP}]\; e_t\; [\mathrm{EOS}]$,
[0025] $x = [\mathrm{CLS}]\; e_h\; [\mathrm{SEP}]\; r\; [\mathrm{SEP}]\; [\mathrm{MASK}]\; [\mathrm{EOS}]$.
[0026] To train and update the model parameters, the pre-trained BERT model is tuned to predict masked entities, using cross-entropy as the loss function:
[0027] $\mathcal{L} = -\sum_{i=1}^{N} y_i^{\top} \log p_i$
[0028] where $N$ is the number of masked tokens, $y_i$ is the one-hot encoded vector of the true label, and $p_i$ is the predicted probability distribution vector for the $i$-th masked token.
[0029] Furthermore, the combination of character-based and semantic-based matching through matching rules includes:
[0030] For the information I extracted from question q, the linked entity set L is retrieved through the following five steps: exact matching, inclusion matching, partial matching, semantic matching, and knowledge denoising.
[0031] Exact matching searches for entities whose character content is identical to that of the key information I; inclusion matching finds entities whose character content contains I; partial matching identifies entities that match part of the string I; semantic matching finds entities whose semantic embeddings are similar to that of I; knowledge denoising applies when different pieces of key information are linked to a common entity, and deletes the other, non-shared entities linked to them.
[0032] Specifically, if the character content of information $I_i$ equals that of an entity $e$ in the knowledge graph, i.e. $\mathrm{char}(I_i) = \mathrm{char}(e)$, and $e$ is not yet in the linked entity set, it is an exact match. If the character content of $I_i$ is contained in that of $e$, and $e$ is not yet in the linked entity set, it is an inclusion match. If […] and $|\mathrm{word}(I_i) \cap \mathrm{word}(e)| \geq 1$, or $|\mathrm{word}(I_i) \cap \mathrm{word}(e)| \geq 2$ and […], it is a partial match. If […], semantic matching is performed; if […] and […], knowledge denoising is performed.
[0033] The remaining symbols denote the set of linked entities corresponding to the i-th key information, the set of linked entities corresponding to the j-th key information, a candidate entity $e'$, any candidate entity $e^*$ to be added, the candidate entity set, and the number of entities in the candidate entity set. char(·) denotes the character content of an entity, len(·) its character length, Cor(·,·) the semantic relevance computed with MEM-BERT, and word(·) the words within the entity.
[0034] Furthermore, when constructing the subgraph, each linked entity in L is selected as an initial node; then, a breadth-first search algorithm is used to explore the knowledge graph; for an initial node $e_0 \in L$, the number of hops k required to traverse the knowledge graph is predefined; then, entities and relations are gradually added to the subgraph using the breadth-first search algorithm.
[0035] Furthermore, to prevent irrelevant knowledge from being introduced into the subgraph, a semantic matching score is calculated before each entity is added to the subgraph. Specifically, for each candidate entity $e'$ that connects to the subgraph through a triple $t$, the semantic similarity score $\mathrm{Cor}(t, q)$ is calculated; if $\mathrm{Cor}(t, q) < \alpha$, $e'$ is not included. The semantic similarity score is calculated using MEM-BERT, and $\alpha$ is a manually defined threshold.
[0036] Furthermore, once a subgraph has been built for each linked entity, these subgraphs are merged to create one or more question-answering evidence graphs.
[0037] The subgraph merging process includes:
[0038] (1) Aggregation of linked entities: for any linked entities $e_i, e_j \in L$, if a triple $\{e_i, r_{ij}, e_j\}$ exists in the knowledge graph, then the subgraphs of $e_i$ and $e_j$ are connected via $r_{ij}$;
[0039] (2) Knowledge refinement based on shared entities: if multiple subgraphs share a "shared entity", only the triples pointing to the shared entity are retained in their respective subgraphs, i.e., the triples in which the shared entity is the tail, with the same relation, together with their head entities. For a shared entity $e_C$, the types of all edges connected to it in a subgraph are represented as […], where […] is the total number of edge types, and the corresponding connected entities are represented as $\mathcal{E}_C = \{e_1, \ldots, e_k\}$, i.e., […]. In this case, unless $e^* = e_C$, all […] are removed.
[0040] Furthermore, given a question text q and the question-answering evidence graph, the LLM is guided through a prompt template to reason over the evidence graph and generate an answer to q. The prompt template includes six basic components: task description; explanation of input and output; reasoning process; input and output format; additional requirements; and examples.
[0041] Compared with the prior art, the beneficial effects of this application are as follows:
[0042] This application introduces an entity linking method that combines character-based and semantic-based matching, making it better suited to medical knowledge graphs and giving it high time efficiency.
[0043] This application proposes a knowledge subgraph construction and merging algorithm. The algorithm extracts a refined, question-relevant knowledge subgraph from the knowledge graph by traversing multi-hop neighbors, calculating semantic similarity, merging common nodes, and performing pruning operations. To learn the embedded representations of medical entities, this application employs a knowledge-based continuous pre-training method for BERT.
[0044] This application guides LLMs in one-shot information extraction and question-related reasoning by providing them with prompt templates.
Attached Figure Description
[0045] Figure 1 is the method flowchart of this application;
[0046] Figure 2 is a flowchart of the entity linking algorithm in this application;
[0047] Figure 3 is an example of a prompt template for this application.
Detailed Implementation
[0048] The present invention will be further described below with reference to the accompanying drawings, but this is not intended to limit the present invention in any way. Any modifications or substitutions made based on the teachings of the present invention shall fall within the protection scope of the present invention.
[0049] Before introducing the embodiments of this application, some terms involved in this application are explained.
1. Knowledge Graph (KG): Knowledge is stored in a knowledge graph in the form of triples, which are also called "facts". A knowledge graph comprises a set of entities $\mathcal{E}$, a set of relations, and a set of facts (triples). In a triple $\{e_h, r, e_t\}$, $e_h, e_t \in \mathcal{E}$ denote the head entity and the tail entity, and $r$ denotes the relation between them. From a graph perspective, $\mathcal{E}$ can be viewed as the set of nodes, and each fact as an edge, labeled with its relation, that connects two nodes.
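The triple representation described above can be sketched in a few lines of Python. This is only an illustration: the facts are toy examples, the relation names are borrowed from the relation list given later in this document, and `build_adjacency` is a hypothetical helper, not part of the disclosed method.

```python
from collections import defaultdict

# Toy facts; entity names are illustrative and do not come from the patent's KGs.
TRIPLES = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "need_drug", "oseltamivir"),
    ("pneumonia", "has_symptom", "fever"),
]

def build_adjacency(triples):
    """Index every triple under both its head and its tail entity."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((head, relation, tail))
        adjacency[tail].append((head, relation, tail))
    return adjacency

adjacency = build_adjacency(TRIPLES)
entities = {e for head, _, tail in TRIPLES for e in (head, tail)}   # node set
relations = {relation for _, relation, _ in TRIPLES}                # edge labels
```

Indexing each fact under both endpoints lets a traversal follow an edge in either direction, which is convenient when neighbors of a linked entity are explored.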
[0050] 2. Prompt-based LLM reasoning: A prompt-based LLM uses a conditional generative model to generate answers. Given input text X and LLM parameters θ, the goal of the LLM is to predict the probability distribution of the output answer Y, denoted as
[0051] $P(Y \mid X; \theta)$. (1)
[0052] X is a token sequence containing the prompt and the question. The LLM uses the chain rule to compute $P(Y \mid X; \theta)$, that is:
[0053] $P(Y \mid X; \theta) = P(y_1 \mid x_1, \ldots, x_m; \theta) \cdot P(y_2 \mid x_1, \ldots, x_m, y_1; \theta)$
[0054] $\cdots P(y_n \mid x_1, \ldots, x_m, y_1, \ldots, y_{n-1}; \theta)$, (2)
[0055] where $x_1, \ldots, x_m$ are the tokens of X and $y_1, \ldots, y_n$ are the tokens of Y. The LLM uses formula (3) to generate the answer:
[0056] $\hat{Y} = \arg\max_{Y} P(Y \mid X; \theta)$. (3)
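As a small numerical illustration of the chain rule, the sketch below multiplies per-token conditional probabilities in log space and then picks the most probable of two candidate answers. The probabilities are made up, and selecting the highest-scoring candidate is an assumed decoding choice for illustration only.

```python
import math

def sequence_log_prob(step_probs):
    """Chain rule in log space: sum of log P(y_j | X, y_1..y_{j-1}) over answer tokens."""
    return sum(math.log(p) for p in step_probs)

# Hypothetical per-token conditional probabilities for two candidate answers.
candidates = {
    "answer_a": [0.9, 0.8, 0.5],
    "answer_b": [0.6, 0.9, 0.9],
}

scores = {name: sequence_log_prob(probs) for name, probs in candidates.items()}
best = max(scores, key=scores.get)  # candidate with the highest sequence probability
```

Working in log space avoids numerical underflow when answers contain many tokens.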
[0057] Referring to Figure 1, the question-answer generation method for a medical knowledge base disclosed in this application includes four steps: (1) information extraction, (2) entity linking, (3) knowledge subgraph construction and merging, and (4) knowledge reasoning. In addition, a BERT-based semantic learning method for knowledge graphs is proposed. Of these four steps, (1) and (4) belong to the RA module, and (2) and (3) belong to the KR module.
[0058] Information Extraction
[0059] To understand the semantics of the question text q and retrieve relevant medical information from the knowledge graph to answer it, the information contained in q is extracted first. This information includes two types: medical entities and disease events. The entity set $E = \{e_1, e_2, \ldots, e_m\}$ contains the medical terms or attributes mentioned in the question, represented as entities. The event set contains actions, behaviors, or situations related to a disease, usually described by verb phrases and noun phrases.
[0060] Unlike existing technologies, this application additionally extracts events to ensure that key information which might otherwise be overlooked is captured. Event extraction strengthens the association with entities in the knowledge graph, especially when some entities are better understood or represented as events. This application designs a simple prompt and provides an example to guide the LLM in end-to-end information extraction. This prompt can also be used to evaluate question-answering results. More importantly, the prompt can be generalized to information extraction scenarios in other specific domains.
[0061] Compared to tools like SpaCy and traditional Named Entity Recognition (NER) methods, the prompt-based approach in this application does not require a large number of training samples and negative examples. Compared to MindMap, which requires several examples, the prompts in this application are more explicit and scalable, and are suitable for one-shot or zero-shot scenarios. Figure 3 shows a complete prompt template.
[0062] Knowledge graph semantic learning
[0063] For domain-specific knowledge graphs, the semantic knowledge of entities cannot be obtained from general pre-trained language models (PLMs). In order to capture the semantics of entities and provide applicable knowledge embeddings for tasks such as entity linking and subgraph construction, this application proposes a knowledge graph semantic embedding learning model based on pre-trained BERT.
[0064] This application employs a mask-based continuous pre-training strategy. First, each triple $\{e_h, r, e_t\}$ is converted into a token sequence, as shown below:
[0065] $x = [\mathrm{CLS}]\; e_h\; [\mathrm{SEP}]\; r\; [\mathrm{SEP}]\; e_t\; [\mathrm{EOS}]$.
[0066] The token corresponding to the head or tail entity is replaced with [MASK] to obtain the following modified input token sequences:
[0067] $x = [\mathrm{CLS}]\; [\mathrm{MASK}]\; [\mathrm{SEP}]\; r\; [\mathrm{SEP}]\; e_t\; [\mathrm{EOS}]$,
[0068] $x = [\mathrm{CLS}]\; e_h\; [\mathrm{SEP}]\; r\; [\mathrm{SEP}]\; [\mathrm{MASK}]\; [\mathrm{EOS}]$.
[0069] To train and update the model parameters, the pre-trained BERT model is tuned to predict masked entities, using cross-entropy as the loss function:
[0070] $\mathcal{L} = -\sum_{i=1}^{N} y_i^{\top} \log p_i$
[0071] where $N$ is the number of masked tokens, $y_i$ is the one-hot encoded vector of the true label, and $p_i$ is the predicted probability distribution vector for the $i$-th masked token.
[0072] The fine-tuning process based on triples and MEM is shown in the figure. Hereinafter, the BERT model further pre-trained with this continuous MEM strategy is referred to as "MEM-BERT".
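The masking scheme and loss can be made concrete with a small sketch. It assumes, for simplicity, that each entity is a single token (a real BERT tokenizer would split names into word pieces), that the loss is an unnormalized sum over masked positions, and that the predicted distributions are toy dictionaries rather than model outputs. All function names are hypothetical.

```python
import math

def triple_to_sequence(head, relation, tail):
    """Serialize a triple as [CLS] e_h [SEP] r [SEP] e_t [EOS]."""
    return ["[CLS]", head, "[SEP]", relation, "[SEP]", tail, "[EOS]"]

def mask_entity(tokens, position):
    """Replace the token at `position` (1 = head, 5 = tail) with [MASK]; return (masked, label)."""
    masked = list(tokens)
    label = masked[position]
    masked[position] = "[MASK]"
    return masked, label

def cross_entropy(pred_dists, labels):
    """With one-hot targets, -sum_i y_i . log p_i reduces to -sum_i log p_i[label_i]."""
    return -sum(math.log(dist[label]) for dist, label in zip(pred_dists, labels))

tokens = triple_to_sequence("influenza", "has_symptom", "fever")
masked_head, head_label = mask_entity(tokens, 1)
masked_tail, tail_label = mask_entity(tokens, 5)

# Toy predicted distributions over candidate entities for the two masked positions.
toy_predictions = [
    {"influenza": 0.7, "pneumonia": 0.3},
    {"fever": 0.5, "cough": 0.5},
]
loss = cross_entropy(toy_predictions, [head_label, tail_label])
```

In practice the distributions would come from a masked language model head over the vocabulary, and the loss would be minimized with gradient descent.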
[0073] Entity Linking
[0074] Entity linking is key to effectively associating and utilizing knowledge in knowledge graphs. We identify corresponding entities in the knowledge graph by leveraging information extracted from the question. Unlike previous methods, such as character-based matching, semantic embedding-based matching, or complex deep learning-based sequence analysis models, this application introduces a simple and intuitive matching rule that combines character-based and semantic-based matching.
[0075] Specifically, for the information I extracted from question q, this application uses the five steps outlined in Algorithm 1 to retrieve the linked entity set L: exact matching, inclusion matching, partial matching, semantic matching, and knowledge denoising. Exact matching searches for entities whose character content is identical to that of the key information I. Inclusion matching finds entities whose character content contains I. Partial matching identifies entities that match portions of the string I. Semantic matching discovers entities whose semantic embeddings are similar to that of I. Knowledge denoising applies when different pieces of key information are linked to a common entity, and removes the other, non-shared entities linked to them.
[0076] Referring to Figure 2, Algorithm 1 uses `char(·)` to denote the character content of an entity, `len(·)` the character length, `Cor(·,·)` the semantic relevance computed with MEM-BERT, `word(·)` the words within an entity, and `top3(·)` the three highest semantic relevance scores. Note that the partial matching rule in Algorithm 1 is designed for English knowledge graphs. For Chinese knowledge graphs, the rule is adjusted because Chinese entities are not segmented into words: the partial matching criterion becomes "matching at least 3 characters". Specifically, if the character content of information $I_i$ equals that of an entity $e$ in the knowledge graph, i.e. $\mathrm{char}(I_i) = \mathrm{char}(e)$, and $e$ is not yet in the linked entity set, it is an exact match. If the character content of $I_i$ is contained in that of $e$, and $e$ is not yet in the linked entity set, it is an inclusion match. If […] and $|\mathrm{word}(I_i) \cap \mathrm{word}(e)| \geq 1$, or $|\mathrm{word}(I_i) \cap \mathrm{word}(e)| \geq 2$ and […], it is a partial match. If […], semantic matching is performed; if […] and […], knowledge denoising is performed.
[0077] For medical knowledge graphs, most entities have fixed representations. Therefore, this application first applies two strict stages, exact matching and inclusion matching, followed by a partial character matching stage. Semantic similarity matching is performed only on the entities in the candidate set obtained from partial character matching. Let N be the total number of entities in the knowledge graph, M the character length, and D the embedding dimension. Given that M << D, restricting semantic matching to this candidate set reduces computational complexity compared with comparing embeddings against all N entities, and mitigates the risk of linking to entities with similar literal representations but different meanings. Furthermore, knowledge denoising reduces the inclusion of irrelevant information: if two extracted elements are linked to the same entity, that entity is highly likely to be relevant to the question, while the other entities are more likely to be irrelevant.
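The five linking steps can be sketched as a cascade. Several details below are assumptions rather than the rules of Algorithm 1: the cascade stops at the first stage that yields a match, partial matching requires only one shared word, `toy_cor` is a character-bigram overlap standing in for the MEM-BERT relevance score, and `denoise` keeps only the shared entities whenever two pieces of information have any in common.

```python
def char_bigrams(text):
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def toy_cor(a, b):
    """Stand-in for Cor(.,.): Jaccard overlap of character bigrams, NOT MEM-BERT."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0

def link_one(info, kg_entities, top_k=3):
    """Cascade: exact -> inclusion -> partial (shared word) ranked by semantic score."""
    key = info.lower()
    exact = [e for e in kg_entities if e.lower() == key]
    if exact:
        return set(exact)
    included = [e for e in kg_entities if key in e.lower()]
    if included:
        return set(included)
    words = set(key.split())
    partial = [e for e in kg_entities if words & set(e.lower().split())]
    ranked = sorted(partial, key=lambda e: toy_cor(info, e), reverse=True)
    return set(ranked[:top_k])  # semantic matching only over the partial candidates

def denoise(links):
    """For each piece of information, keep only entities shared with another piece, if any."""
    cleaned = {}
    for name, ents in links.items():
        shared = set()
        for other_name, other in links.items():
            if other_name != name:
                shared |= ents & other
        cleaned[name] = shared if shared else set(ents)
    return cleaned

def link_entities(infos, kg_entities):
    links = denoise({info: link_one(info, kg_entities) for info in infos})
    return set().union(*links.values())

kg_entities = ["fever", "hay fever", "chest pain", "chronic chest pain syndrome", "cough"]
linked = link_entities(["fever", "severe chest ache"], kg_entities)
```

Running the cheap character stages first means the embedding-based comparison is only needed for information that has no literal counterpart in the knowledge graph.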
[0078] Construction and merging of knowledge subgraphs
[0079] In knowledge graphs, triples associated with linked entities form the basis for reasoning and question answering. To provide an efficient and concise knowledge repository for LLM, we devise a method for constructing and merging subgraphs. First, we employ a multi-hop neighborhood exploration approach combined with semantic relevance evaluation to construct a knowledge subgraph for each linked entity. Then, we apply two merging criteria to merge multiple knowledge subgraphs, thereby creating a question-answering evidence graph.
[0080] Construction of knowledge subgraph
[0081] When constructing the subgraph, each linked entity in L is first selected as an initial node. Then, a simple breadth-first search (BFS) algorithm is used to explore the knowledge graph. For an initial node e0 ∈ L, the number of hops k required to traverse the knowledge graph is predefined. Then, entities and relations are progressively added to the subgraph using BFS. The choice of k depends on the complexity of the knowledge graph structure and is a hyperparameter.
[0082] To prevent the introduction of noise (irrelevant knowledge) into the subgraph, which could obscure the reasoning process of the LLM, a semantic matching score is calculated before each node (entity) is added to the subgraph. Specifically, for each candidate entity $e'$ that connects to the subgraph through a triple $t$, the semantic similarity score $\mathrm{Cor}(t, q)$ is calculated. If $\mathrm{Cor}(t, q) < \alpha$, $e'$ is not included. The semantic similarity score is calculated using MEM-BERT, and $\alpha$ is a manually defined threshold. In the experiments, $\alpha$ is treated as a hyperparameter.
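A minimal sketch of the k-hop breadth-first construction with threshold pruning follows. The scoring function `toy_cor` is a word-overlap placeholder for the MEM-BERT score, the triples and question are toy data, and the sketch keeps only the first triple that reaches each new entity, which is a simplification.

```python
from collections import deque

def toy_cor(triple_text, question):
    """Placeholder relevance: share of triple words that also occur in the question."""
    a = set(triple_text.lower().replace("_", " ").split())
    b = set(question.lower().split())
    return len(a & b) / len(a) if a else 0.0

def build_subgraph(start, triples, question, k, alpha, cor):
    """Breadth-first expansion up to k hops; a triple is kept only if cor(...) >= alpha."""
    neighbors = {}
    for h, r, t in triples:
        neighbors.setdefault(h, []).append((h, r, t))
        neighbors.setdefault(t, []).append((h, r, t))
    visited = {start}
    kept = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # hop budget exhausted for this branch
        for h, r, t in neighbors.get(node, []):
            candidate = t if h == node else h
            if candidate in visited:
                continue
            if cor(f"{h} {r} {t}", question) < alpha:
                continue  # pruned as irrelevant to the question
            visited.add(candidate)
            kept.append((h, r, t))
            queue.append((candidate, depth + 1))
    return kept

toy_triples = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "need_drug", "oseltamivir"),
    ("fever", "need_food", "broth"),
]
toy_question = "what drug treats influenza with fever"
subgraph = build_subgraph("influenza", toy_triples, toy_question, k=2, alpha=0.3, cor=toy_cor)
```

With these toy inputs the two triples touching "influenza" pass the threshold, while the second-hop triple about "broth" is pruned; lowering `alpha` admits it.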
[0083] Merging of knowledge subgraphs
[0084] Once a subgraph has been constructed for each linked entity, this application merges these subgraphs to create one or more question-answering evidence graphs. Compared to a single subgraph, an evidence graph incorporates more evidence paths that could lead to the correct answer while minimizing noise. Given a simple prompt, the LLM can reason efficiently over the evidence graph, focusing on paths and shared nodes between linked entities. This can produce more accurate and comprehensive answers and explanations.
[0085] The subgraph merging process mainly includes two steps. (1) Aggregation of linked entities: for any linked entities $e_i, e_j \in L$, if a triple $\{e_i, r_{ij}, e_j\}$ exists in the knowledge graph, then the subgraphs of $e_i$ and $e_j$ are connected via $r_{ij}$. (2) Knowledge refinement based on shared entities: if multiple subgraphs share a "shared entity", only the triples with the same relation pointing to the shared entity (as tail), together with their head entities, are retained in their respective subgraphs. This step can be seen as "conditional pruning". For a shared entity $e_C$, the types of all edges connected to it in a subgraph are represented as […], and the corresponding connected entities are represented as $\mathcal{E}_C = \{e_1, \ldots, e_k\}$, i.e., […]. In this case, unless $e^* = e_C$, all […] are removed. Through this step, the knowledge is further refined and triples irrelevant to the question are eliminated.
[0086] Finally, if the subgraphs cannot all be merged after the two steps described above, all final subgraphs are output. In this application, merging is performed after each hop of the traversal to keep the graph size and computational overhead manageable. The overall process of subgraph construction and merging is shown in the figure.
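The merging logic can be sketched as follows, under stated assumptions: subgraphs are given as lists of triples keyed by their linked entity, a direct triple between two linked entities joins their subgraphs (step 1), and subgraphs that share any entity are grouped into one evidence graph. The conditional pruning of step 2 is deliberately left out of this sketch, and `merge_subgraphs` is a hypothetical helper.

```python
def merge_subgraphs(subgraphs, kg_triples):
    """Return a list of evidence graphs (sets of triples); unmergeable subgraphs stay separate."""
    groups = []
    for entity, triples in subgraphs.items():
        nodes = {entity} | {e for h, _, t in triples for e in (h, t)}
        groups.append((nodes, set(triples)))
    linked = set(subgraphs)
    # Step 1: a KG triple that directly connects two linked entities bridges their subgraphs.
    bridges = {(h, r, t) for h, r, t in kg_triples if h in linked and t in linked and h != t}
    for h, r, t in bridges:
        for nodes, triples in groups:
            if h in nodes or t in nodes:
                nodes.update((h, t))
                triples.add((h, r, t))
    # Group subgraphs that share at least one entity, repeating until nothing changes.
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if groups[i][0] & groups[j][0]:
                    nodes_j, triples_j = groups.pop(j)
                    groups[i][0].update(nodes_j)
                    groups[i][1].update(triples_j)
                    merged = True
                    break
            if merged:
                break
    return [triples for _, triples in groups]

toy_subgraphs = {
    "influenza": [("influenza", "has_symptom", "fever")],
    "pneumonia": [("pneumonia", "has_symptom", "fever")],
    "eczema": [("eczema", "has_symptom", "itching")],
}
toy_kg = [t for triples in toy_subgraphs.values() for t in triples]
evidence_graphs = merge_subgraphs(toy_subgraphs, toy_kg)
```

Here the influenza and pneumonia subgraphs merge through the shared node "fever", while the eczema subgraph cannot be merged and is output on its own, matching the fallback described above.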
[0087] LLMs cannot directly understand graph-structured knowledge. Therefore, this application textualizes and groups the triples from different subgraphs. The grouped triples are then fed into the LLM, guiding it to focus on the relationship between two linked entities and the paths associated with any shared nodes during reasoning.
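One possible way to textualize and group triples is shown below. The output layout (one block per linked entity, followed by a line listing shared nodes) is an assumption made for illustration; the source does not specify the exact text format, and `render_evidence` is a hypothetical name.

```python
def render_evidence(subgraphs):
    """Turn {linked_entity: [triples]} into grouped plain text, listing shared nodes last."""
    lines = []
    owners = {}
    for entity, triples in subgraphs.items():
        lines.append(f"Subgraph of '{entity}':")
        for h, r, t in triples:
            lines.append(f"- {h} {r.replace('_', ' ')} {t}")
            for node in (h, t):
                owners.setdefault(node, set()).add(entity)
    shared = sorted(node for node, ents in owners.items() if len(ents) > 1)
    if shared:
        lines.append("Shared nodes: " + ", ".join(shared))
    return "\n".join(lines)

evidence_text = render_evidence({
    "influenza": [("influenza", "has_symptom", "fever")],
    "pneumonia": [("pneumonia", "has_symptom", "fever")],
})
```

Calling out shared nodes explicitly gives the LLM a textual cue for the paths that connect two linked entities.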
[0088] Knowledge Graph-Based LLM Reasoning
[0089] LLMs demonstrate powerful semantic reasoning and text generation capabilities. In this application, given a question text q and the question-answering evidence graph, a well-structured prompt is designed so that the LLM can reason over the evidence graph and generate an answer to q. The prompt template consists of six basic components: (1) task description; (2) explanation of input and output; (3) reasoning process; (4) input and output format; (5) additional requirements; and (6) an example. The complete prompt template is shown in the figure (see Appendix).
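The six-part structure can be assembled programmatically. The sketch below only illustrates the composition: the section headings, the `###` layout, and `build_prompt` are assumptions, and the actual wording of the template is the one given in the figure.

```python
PROMPT_SECTIONS = [
    "Task description",
    "Input and output",
    "Reasoning process",
    "Input and output format",
    "Additional requirements",
    "Example",
]

def build_prompt(question, evidence_text, sections):
    """Concatenate the six components in a fixed order, then the question and the evidence."""
    missing = [name for name in PROMPT_SECTIONS if name not in sections]
    if missing:
        raise ValueError(f"missing prompt sections: {missing}")
    parts = [f"### {name}\n{sections[name]}" for name in PROMPT_SECTIONS]
    parts.append(f"### Question\n{question}")
    parts.append(f"### Evidence\n{evidence_text}")
    return "\n\n".join(parts)

toy_sections = {name: f"({name.lower()} goes here)" for name in PROMPT_SECTIONS}
prompt = build_prompt(
    "What drug treats influenza with fever?",
    "- influenza need drug oseltamivir",
    toy_sections,
)
```

Failing loudly when a component is missing keeps prompts comparable across questions.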
[0090] Next, we will conduct comprehensive experiments to evaluate the performance of this application in medical question answering and compare it with existing methods.
[0091] This application evaluates and demonstrates the superiority of KoSEL in three aspects: knowledge extraction, knowledge retrieval, and question answering. For knowledge extraction, the Named Entity Recognition (NER) task is selected for comparative experiments. For knowledge retrieval and question answering, experiments are conducted on three different types of question answering tasks: short dialogues, long dialogues, and multiple-choice questions. Three large language models (LLMs) are selected as the base models for the question answering tasks: GPT-3.5, GPT-4, and LLaMA-33B. The first two are closed-source LLMs, while the third is an open-source LLM.
[0092] For the knowledge extraction evaluation task, this application uses two Named Entity Recognition (NER) datasets: CoNLL2003 (English) and CMeEE (Chinese). For the knowledge retrieval and question answering evaluation tasks, this application uses three medical domain question answering datasets: GenMedGPT-5k, CMCQA, and ExplainCPE, which are open-source datasets. In GenMedGPT-5k, we use EMCKG from MindMap as an external knowledge base for retrieval-based question answering. For CMCQA and ExplainCPE, this application constructs a knowledge graph called CMCKG-2 based on QASystemOnMedicalKG. Unlike CMCKG in MindMap, only 8 relations were selected: has_symptom, accompanied_symptom, cure_way, need_drug, can_check_disease, need_food, forbid_food, and beneficial_food. Inverse relations such as "possible_disease" were not included to reduce the size of the knowledge graph and avoid confusion during LLM inference.
[0093] For the test set, 800, 500, and 500 question-answer pairs were randomly selected from the three datasets, respectively, based on the dataset size. Since KoSEL is a zero-shot question-answering framework, the training set was not used in the experiments. Of the remaining data not included in the test set, a portion was used as a validation set to select hyperparameters, while a portion of the question-answer pairs were integrated into the prompt template as examples for one-shot scenarios.
[0094] Baseline model
[0095] For the evaluation of the NER task, the extraction methods from SpaCy, vanilla BERT, and MindMap were selected as baseline models.
[0096] For knowledge retrieval and question answering evaluation, a BM25 retrieval tool, an embedding retrieval tool, and a knowledge graph retrieval tool were selected as knowledge retrieval baselines. These retrieval-augmented baselines all use GPT-4 as their base model. MindMap and a standard LLM were also selected as question answering baselines.
[0097] The BM25 retrieval tool uses BM25 retrieval scores as logits to calculate conditional generation probabilities. This application leverages it to search for facts from entity-related knowledge documents generated based on EMCKG and CMCKG-2 (EMCKG has 99 documents, and CMCKG-2 has 8808 documents), and retrieves three gold-standard document contexts for each question.
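For readers unfamiliar with the BM25 baseline, the sketch below implements the standard Okapi BM25 scoring formula over whitespace-tokenized documents and returns the top-ranked ones. The toy documents and the parameter values `k1=1.5` and `b=0.75` are common illustrative choices, not the configuration used in the experiments.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query (whitespace tokenization)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def top_k(query, docs, k=3):
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order[:k]]

toy_docs = [
    "influenza has symptom fever",
    "eczema has symptom itching",
    "influenza need drug oseltamivir",
    "fracture need check x-ray",
]
best_docs = top_k("influenza fever", toy_docs, k=3)
```

The document mentioning both query terms ranks first, and documents sharing no term with the query receive a score of zero.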
[0098] The embedding retrieval tool uses word2vec to calculate semantic similarity. This application uses it to search for and rank knowledge within the same documents as the BM25 retrieval tool. Similarly, it retrieves three gold-standard document contexts for each question.
[0099] The knowledge graph retrieval tool transforms the set of relevant triples for each entity into individual documents, where each line in the document represents a triple. This application retrieves the most similar documents for each linked entity and finally deduplicates and merges the triples from all retrieved documents. These triples serve as an enhanced external knowledge base for the LLM.
[0100] For MindMap, this application uses the parameter configurations from its open-source code for experimentation. When applying a standard LLM for question answering, simple instructions are used to guide answer generation.
[0101] For NER tasks, precision, recall, and micro-averaged F1 score are used for evaluation. For knowledge retrieval tasks, a metric called utilization rate (UR) is employed. UR quantifies the proportion of retrieved knowledge that is used in the response. For document retrieval, UR is based on the number of retrieved documents used in answering the question. For knowledge graph retrieval, UR is measured as the ratio of evidence triples to retrieved triples.
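For the knowledge graph case, UR reduces to a simple ratio. The sketch below assumes that evidence triples are counted only if they were actually retrieved and that an empty retrieval yields a UR of zero; the function name and example triples are hypothetical.

```python
def utilization_rate(evidence_triples, retrieved_triples):
    """Ratio of evidence triples to retrieved triples (UR for knowledge graph retrieval)."""
    retrieved = set(retrieved_triples)
    if not retrieved:
        return 0.0  # Assumed convention for an empty retrieval.
    # Assumption: only evidence that was actually retrieved counts as used.
    used = set(evidence_triples) & retrieved
    return len(used) / len(retrieved)


retrieved = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "has_symptom", "cough"),
    ("influenza", "treated_by", "oseltamivir"),
    ("fever", "treated_by", "paracetamol"),
]
evidence = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "treated_by", "oseltamivir"),
]
ur = utilization_rate(evidence, retrieved)
```

Here two of the four retrieved triples serve as evidence, so `ur` is 0.5.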
[0102] For question-answering tasks, four evaluation metrics are defined: Answer Completeness (AC), Fact Precision (FP), Semantic Matching (SM), and Accuracy (Acc). The first three metrics are used to evaluate conversational question answering, while Acc is used to evaluate multiple-choice questions. AC measures the proportion of key information in the output answer that is consistent with the gold standard answer (Equation 5). FP quantifies the proportion of key information in the output answer that is neither present in the gold standard answer nor supported by any evidence found in the augmented knowledge base (Equation 4). In other words, AC reflects the completeness of the LLM answer, while FP reflects the presence of hallucinated (illusory) facts in the LLM answer. SM is the BERTScore between the output answer and the gold standard answer. Acc measures the proportion of correctly answered multiple-choice questions.
[0103]
[0104]
[0105] Where Count represents the counting operator. KIO, KIG, and KIB represent keywords in the output answer, keywords in the gold standard answer, and entities / relationships in the knowledge subgraph, respectively.
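Because the two equations are not reproduced in this text, the sketch below gives only one possible reading of AC and FP based on the verbal definitions. The denominators are assumptions: AC is taken as the share of gold-standard keywords (KIG) that also appear in the output (KIO), and FP is computed exactly as worded, as the share of output keywords found neither in KIG nor in the knowledge subgraph (KIB). The function names and keyword sets are hypothetical.

```python
def answer_completeness(kio, kig):
    """Assumed form: Count(KIO ∩ KIG) / Count(KIG)."""
    kio, kig = set(kio), set(kig)
    if not kig:
        return 0.0
    return len(kio & kig) / len(kig)


def fp_as_described(kio, kig, kib):
    """Assumed form: Count(KIO not in KIG and not in KIB) / Count(KIO).

    This follows the wording of the description, i.e. the share of output
    key information that is unsupported by both gold answer and subgraph.
    """
    kio = set(kio)
    if not kio:
        return 0.0
    unsupported = kio - set(kig) - set(kib)
    return len(unsupported) / len(kio)


# Hypothetical keyword sets for one question.
kio = {"influenza", "fever", "rest", "antibiotics"}        # output answer
kig = {"influenza", "fever", "oseltamivir"}                # gold standard answer
kib = {"influenza", "fever", "oseltamivir", "rest"}        # knowledge subgraph

ac = answer_completeness(kio, kig)
fp = fp_as_described(kio, kig, kib)
```

In this example two of the three gold keywords are covered, and only "antibiotics" is unsupported, so `ac` is 2/3 and `fp` is 0.25.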
[0106] The following hardware and software configuration was used in the experiments: two NVIDIA GeForce RTX 3090Ti graphics cards, Python version 3.10, PyTorch version 2.0.0, and CUDA version 11.7. This configuration was carefully selected to ensure optimal performance and compatibility, especially for LLM fine-tuning and inference. For GPT-3.5 and GPT-4, we used the openai Python package to access their functionality.
[0107] In each baseline model, the same LLM was used for information extraction and knowledge reasoning. For knowledge semantic learning, bert-base-uncased and bert-base-chinese were used as base models, respectively, and were continuously pre-trained on EMCKG and CMCKG-2 for question answering execution. In the entity linking stage, a character overlap ratio of 0.5 was predefined, and the top three entities with the highest semantic scores were selected as linking entities. These two parameters were empirically determined based on the general length of entity characters in the question and the number of entities in the question. In the subgraph construction and merging stage, α and k are hyperparameters, determined by cross-validation using 20% of the data randomly selected from each dataset as the validation set. We selected the parameter values with the highest F1 scores obtained on the validation set (for GenMedGPT-5K and CMCQA) and Acc (for ExplainPE) as parameter values for the test set.
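One way the two entity-linking parameters could be applied is sketched below: candidates are first filtered by a character overlap ratio of at least 0.5, and the three with the highest semantic scores are kept. How the method actually combines these parameters is not specified in this paragraph, so the ordering of the two steps, the bag-of-characters definition of overlap, the function names, and the toy semantic scores (which in the method would come from the pre-trained BERT model) are all assumptions.

```python
from collections import Counter


def char_overlap_ratio(mention, name):
    """Share of the mention's characters that also occur in the entity name.

    Assumed definition (multiset overlap); the source does not define the ratio.
    """
    if not mention:
        return 0.0
    shared = Counter(mention) & Counter(name)
    return sum(shared.values()) / len(mention)


def select_linked_entities(mention, semantic_scores, min_overlap=0.5, top_n=3):
    """Filter candidates by character overlap, then keep the top_n by semantic score."""
    kept = [
        name for name in semantic_scores
        if char_overlap_ratio(mention, name) >= min_overlap
    ]
    kept.sort(key=lambda name: semantic_scores[name], reverse=True)
    return kept[:top_n]


# Toy semantic scores standing in for model-computed similarities.
candidate_scores = {
    "stomach ache": 0.91,
    "abdominal pain": 0.88,
    "ulcer": 0.70,
    "stomach cancer": 0.60,
    "chest pain": 0.55,
}
linked = select_linked_entities("stomach pain", candidate_scores)
```

In this toy example "ulcer" is dropped by the overlap filter despite its semantic score, and the three remaining candidates with the highest scores are returned.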
[0108] In the evaluation, a 12-layer bert-base-uncased and a 12-layer bert-base-chinese were used to compute the BERTScore. Computing AC and FP also requires extracting key information; for this purpose, this application used the same prompt-based LLM extraction.
[0109] In the NER task analysis, this application focuses on the quantity and accuracy of extracted entity information and therefore excludes the evaluation of classification performance. In entity label matching, only exact matches are counted as successful recognition. The prompt template was adapted for the non-medical dataset CoNLL-2003 to optimize extraction performance. Both this invention and MindMap use GPT-4 as the base model.
[0110] Table 1
[0111]
[0112] Table 2
[0113]
[0114] Table 1 shows a performance comparison of different information extraction methods on the GenMedGPT-5k test set for English question answering. Table 2 shows a performance comparison of different information extraction methods on the CMCQA long dialogue question answering set. The information extraction method of this application outperforms other methods in both datasets in terms of F1 score and recall. Compared with the English dataset, this application performs better on the Chinese dataset.
[0115] This application evaluates the retrieval and inference performance of KoSEL and baseline models on three experimental datasets. MindMap is not included in the comparison on UR because it relies on evidence reasoning based on a complete knowledge graph.
[0116] This application selects values for α and k from the candidate sets {0.4, 0.5, 0.6, 0.7} and {1, 2, 3}, respectively, and evaluates each combination by cross-validation on the validation set. The values α = 0.6 and k = 2 gave the best results and were adopted.
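The hyperparameter selection can be sketched as an exhaustive grid search over the twelve (α, k) combinations. The scoring function below is a placeholder that is merely shaped to peak at the reported optimum; it does not reproduce any experimental result. In practice it would be replaced by the validation F1 score or Acc, and all names here are hypothetical.

```python
from itertools import product


def grid_search(alphas, ks, evaluate):
    """Return the (alpha, k) pair with the highest validation score."""
    return max(product(alphas, ks), key=lambda pair: evaluate(*pair))


def toy_validation_score(alpha, k):
    """Placeholder score; a real run would compute F1 or Acc on the validation set."""
    return 1.0 - abs(alpha - 0.6) - 0.1 * abs(k - 2)


alphas = [0.4, 0.5, 0.6, 0.7]
ks = [1, 2, 3]
best_alpha, best_k = grid_search(alphas, ks, toy_validation_score)
```

With the placeholder score, the search returns α = 0.6 and k = 2, mirroring the values stated above.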
[0117] It is worth noting that MindMap exhibits a significant performance degradation on LLaMa-33B compared to GPT-3.5 / 4. In contrast, this application maintains consistent performance, likely due to its smaller knowledge base requirements. This enhances the robustness of this application, enabling it to adapt to LLMs with varying parameter sizes and performance levels.
[0118] This application proposes an enhanced question-answering generation method for the medical field, which uses knowledge retrieved from a knowledge graph as prompts for a large language model (LLM). The application consists of two modules: an online module (R&A) and an offline module (KR). The R&A module guides the LLM by providing prompt templates and retrieved knowledge graph triples, thereby achieving key information extraction, accurate reasoning, and answer generation. The KR module uses efficient entity linking algorithms together with subgraph construction and merging techniques to provide question-related knowledge for question answering. Experimental results show that this application has significant advantages over existing methods and addresses the confusion and hallucination ("illusion") propagation problems caused by an LLM reasoning directly over the knowledge base. Besides accurately utilizing knowledge and providing explanations during the question-answering process, this application also effectively mitigates the problem of data leakage into the LLM.
[0119] As used herein, the term "preferred" is meant to serve as an example, instance, or illustration. Any aspect or design described herein as "preferred" need not be construed as being more advantageous than other aspects or designs. Rather, the use of the term "preferred" is intended to present the concept in a specific manner. As used in this application, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or." That is, unless otherwise specified or clear from the context, "X uses A or B" means any of the natural inclusive permutations. That is, if X uses A; X uses B; or X uses both A and B, then "X uses A or B" is satisfied in any of the foregoing examples.
[0120] Furthermore, although this disclosure has been shown and described with respect to one or more implementations, equivalent variations and modifications will occur to those skilled in the art based on a reading and understanding of this specification and the accompanying drawings. This disclosure includes all such modifications and variations and is limited only by the scope of the appended claims. In particular, with respect to the various functions performed by the aforementioned components (e.g., elements, etc.), the terminology used to describe such components is intended to correspond to any component (unless otherwise indicated) that performs the specified function of said component (e.g., is functionally equivalent to it), even if structurally not equivalent to the disclosed structure performing the functions in the exemplary implementations of this disclosure shown herein. Moreover, although specific features of this disclosure have been disclosed with respect to only one of several implementations, such features may be combined with one or more features of other implementations that may be desirable and advantageous for a given or particular application. Furthermore, with regard to the use of the terms “comprising,” “having,” “containing,” or variations thereof in the Detailed Description or claims, such terms are intended to be included in a manner similar to the term “including.”
[0121] The functional units in this invention embodiment can be integrated into a processing module, or each unit can exist physically separately, or multiple units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. The storage medium mentioned above can be a read-only memory, a disk, or an optical disk, etc. The aforementioned devices or systems can execute the storage methods in the corresponding method embodiments.
[0122] In summary, the above embodiments are one implementation of the present invention, but the implementation of the present invention is not limited to the embodiments described above. Any changes, modifications, substitutions, combinations, or simplifications made without departing from the spirit and principle of the present invention should be considered equivalent substitutions and are included within the protection scope of the present invention.
Claims
1. A question-answering generation method enhanced by a large language model, characterized in that it comprises the following steps: extracting information from the question text, the information including medical entities and disease events; adopting a mask-based continuous pre-training strategy to capture the semantics of entities and to provide knowledge embeddings for the entity linking and subgraph construction tasks; linking entities using matching rules that combine character-based and semantic-based matching; constructing a knowledge subgraph for each linked entity using a multi-hop neighborhood exploration method combined with semantic relevance evaluation; merging multiple knowledge subgraphs using two merging criteria to create a question-answering evidence graph; textualizing and grouping the triples from different subgraphs; then inputting the grouped triples into the large language model, guiding the large language model to focus, during reasoning, on the relationship between two linked entities and on the paths associated with any shared nodes; and generating the answer to the question over a medical knowledge base; wherein the mask-based continuous pre-training strategy includes: converting each triple into a token sequence made up of the tokens of the head entity, the tokens of the relation, and the tokens of the tail entity, together with a sentence-start identifier, a separator between two sentences, and a sentence-end identifier; replacing the tokens corresponding to the head entity or the tail entity with mask tokens to obtain a modified input token sequence; and, to train and update the model parameters, tuning the pre-trained BERT model to predict the masked entities, using cross-entropy as the loss function, the loss being defined in terms of the number of mask tokens, the one-hot encoded vector of the true label, and the predicted probability distribution vector of each label; once a subgraph has been built for each linked entity, these subgraphs are merged to create the question-answering evidence graph; the subgraph merging process includes: (1) aggregation of linked entities: for any linked entities, if a triple connecting them exists in the knowledge graph, the corresponding subgraphs are connected via that triple; (2) knowledge refinement based on shared entities: if multiple subgraphs share a "shared entity", only the triples pointing to the shared entity are retained in their respective subgraphs, i.e., the triples having the shared entity at the tail, with their common relations and head entities; in this step, for a shared entity, the types of all edges connected to it in a subgraph are enumerated up to the total number of edge types, together with the corresponding connected entities, and within a subgraph the connected entities are all removed unless the retention condition is satisfied.
2. The question-answering generation method enhanced by a large language model according to claim 1, characterized in that the information includes two types, medical entities and disease events: the set of medical entities consists of medical terms or attributes mentioned in the question, represented by entities; the set of disease events consists of actions, behaviors, or situations related to a disease, described by verb phrases and noun phrases.
3. The question-answering generation method enhanced by a large language model according to claim 2, characterized in that combining character-based and semantic-based matching through the matching rules includes: for the information extracted from the question, retrieving the set of linked entities through the following five steps: exact matching, inclusion matching, partial matching, semantic matching, and knowledge-based denoising; the exact matching searches for entities whose character content exactly matches that of the key information; the inclusion matching searches for entities whose character content contains the characters of the key information; the partial matching identifies entities that correspond to part of the string of the key information; the semantic matching discovers entities whose semantic embeddings are similar to that of the key information; the knowledge-based denoising deletes the other, non-shared entities linked to different pieces of key information when those pieces share the same linked entity; specifically, if the character content of a piece of information I_i is equal to the character content of an entity e in the knowledge graph, and the entity e is not in the linked entity set, it is an exact match; if the character content of the information I_i is contained within the character content of an entity e in the knowledge graph, and the entity e is not in the linked entity set, it is an inclusion match; partial matching, semantic matching, and denoising are each performed when their respective conditions are satisfied; these conditions are expressed in terms of the sets of linked entities corresponding to the individual pieces of key information, a candidate entity, any candidate entity to be added, the candidate entity set, the number of entities contained in the candidate entity set, a function giving the character content of an entity, the length of a character string, and the semantic relevance, calculated using MEM-BERT, of the words that represent the entities.
4. The question-answering generation method enhanced by a large language model according to claim 3, characterized in that, when constructing a subgraph, each linked entity in the linked entity set is selected as an initial node; a breadth-first search algorithm is then used to explore the knowledge graph; for each initial node, given a predefined number of hops to traverse in the knowledge graph, the breadth-first search algorithm progressively adds entities and relations to the subgraph.
5. The question-answering generation method enhanced by a large language model according to claim 4, characterized in that, to prevent irrelevant knowledge from being introduced into the subgraph, a semantic matching score is calculated before each entity is added to the subgraph, specifically: for each candidate entity, if it is connected to the subgraph through a triple, the semantic similarity score Cor is calculated; if Cor does not satisfy a manually defined threshold, the candidate entity is not included in the subgraph; the semantic similarity score is calculated using MEM-BERT.
6. The question-answering generation method enhanced by a large language model according to claim 5, characterized in that, given a question text and the question-answering evidence graph, the LLM is guided through a prompt template to reason over the evidence graph and to generate the answer accordingly; the prompt template comprises six basic components: a task description; an interpretation of the inputs and outputs; the reasoning process; the input and output formats; additional requirements; and an example.