Knowledge sub-graph retrieval and multi-agent collaboration based knowledge graph question answering method and system
By employing a knowledge subgraph-based retrieval and multi-agent collaboration approach, this method addresses the issues of insufficient generalization and black-box reasoning processes in existing technologies, achieving high accuracy and interpretability in knowledge graph question answering. It is applicable to enterprise knowledge bases and vertical question answering scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI UNIVERSITY OF ELECTRIC POWER
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing knowledge graph question answering methods lack generalization ability in multi-hop reasoning scenarios, cannot dynamically adjust the size of subgraphs, easily introduce redundant contextual information, and the reasoning process is black-boxed, making it difficult to meet the requirements of accuracy and interpretability of answers.
We employ a knowledge subgraph retrieval and multi-agent collaboration approach. By training a knowledge subgraph retrieval tool through gated flow propagation, we adaptively extract initial candidate subgraphs. We then utilize a multi-agent collaboration framework for semantic parsing and evidence chain construction, thereby achieving adaptive knowledge subgraph extraction and evidence archive assembly, which enhances the interpretability and accuracy of the reasoning process.
It improves the generalization and adaptability of knowledge graph question answering, reduces irrelevant information, enhances the ability to analyze complex questions, and achieves full-link traceability and interpretability of answers, adapting to the needs of enterprise knowledge bases and vertical question answering scenarios.
Figure CN122242779A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of natural language processing technology, and in particular to a knowledge graph question answering method and system based on knowledge subgraph retrieval and multi-agent collaboration.
Background Technology
[0002] Knowledge graphs are a core technical foundation for knowledge-intensive scenarios such as enterprise knowledge base question answering and vertical domain question answering. Knowledge graph-based retrieval-augmented generation introduces structured external knowledge from the knowledge graph into large language models, which effectively alleviates problems such as factual hallucination and outdated knowledge, and it has become the mainstream technical path for question answering. Existing solutions follow several technical routes, including graph neural network retrieval and agent-based interactive reasoning, and have been deployed in various knowledge-intensive question answering scenarios.
[0003] For knowledge graph question answering, the first step is to retrieve knowledge subgraphs related to the user's natural language question from the background knowledge graph. Existing solutions mostly use graph neural networks to build the retrieval engine. Patents with publication numbers CN118656475A and CN119848190A both use this type of graph neural network retrieval engine architecture to recall knowledge subgraphs. However, in multi-hop reasoning scenarios, these methods are prone to overfitting to specific graph structures, resulting in insufficient generalization and making it difficult to transfer the model to unseen knowledge domains. Furthermore, these retrieval schemes often employ fixed-size retrieval strategies, failing to dynamically adjust the subgraph size according to question complexity. Simple queries easily introduce a large amount of redundant contextual information, while complex multi-hop queries are prone to losing key reasoning paths, thus affecting the accuracy of answers in complex multi-hop question answering scenarios.
[0004] After retrieving the knowledge subgraph, the current mainstream approach is to input it along with the user's natural language question into a large language model for further processing to generate the final answer. However, existing solutions are often inflexible in parsing the user's natural language question. For example, the technical solutions used in patents with publication numbers CN119848190A and CN121412368A primarily decompose multi-hop problems into single-hop sub-problems and solve them one by one. This type of solution often struggles to effectively combine the solutions to sub-problems when faced with complex problems involving multiple constraints and nonlinear logic. Furthermore, during graph traversal, the correct reasoning path is easily pruned prematurely due to a low score for a local fact triple, leading to insufficient recall of core facts. In addition, existing solutions often input the retrieved triples and paths into the large model as an undifferentiated set of facts, severing the explicit association between candidate answers and the evidence chains corresponding to each logical constraint. This not only increases the cognitive load on the large model but also makes the reasoning process black-boxed, failing to meet requirements such as answer interpretability.
[0005] Therefore, it is necessary to design a knowledge graph retrieval-augmented generation question answering method with strong retrieval generalization, high adaptability to complex questions, and an interpretable reasoning process, so as to provide stable and reliable technical support for knowledge-intensive scenarios such as enterprise knowledge base question answering and vertical domain question answering.
Summary of the Invention
[0006] The purpose of this invention is to overcome the shortcomings of the prior art by providing a knowledge graph question answering method and system based on knowledge subgraph retrieval and multi-agent collaboration, so as to solve or partially solve the problems of unsatisfactory accuracy and interpretability of the existing methods.
[0007] The objective of this invention can be achieved through the following technical solutions:
One aspect of the present invention provides a knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration, comprising the following steps:
Natural language question data input by the user is acquired, key entities in the question are identified, the semantic similarity between the key entities and entities in the pre-acquired knowledge graph is calculated, and the key entities are linked with entities in the knowledge graph to obtain a set of topic entities.
In the knowledge graph, entities within a preset number of hops of the topic entity set are extracted as the initial candidate subgraph.
Based on the initial candidate subgraph, the confidence score of each edge is calculated using a knowledge subgraph retriever pre-trained based on gated flow propagation. The edge set is divided by clustering to obtain signal clusters, and the edges in the signal clusters are filtered by the nearest neighbor rule to obtain the final adaptive knowledge subgraph.
Based on the natural language question data, the first agent decomposes the question into a set of mutually independent atomic logical fragments through semantic parsing, identifies the target variable of the question, constructs a constraint logic query graph, and obtains an ordered query plan including multiple constraints through main generation constraint selection and verification constraint transformation.
For each constraint in the ordered query plan, multiple reasoning paths are obtained by searching the adaptive knowledge subgraph. A second agent is assigned to each reasoning path, and the second agent executes the ordered query plan to obtain an evidence chain through a dual-signal scoring function and a dual-pool search.
Based on the evidence chain, the evidence archive is assembled and answer evaluation and reasoning are performed to achieve knowledge graph question answering.
[0008] As a preferred technical solution, the process of obtaining the topic entity set includes the following steps: Based on the natural language question data, a large language model is used to identify key entities in the question. The key entities are encoded, and their semantic similarity with entities in the knowledge graph is calculated. Based on the semantic similarity, a large language model determines the correspondence between the key entities and entities in the knowledge graph, thereby achieving entity linking and obtaining a set of topic entities.
[0009] As a preferred technical solution, the process of obtaining the adaptive knowledge subgraph includes the following steps: Based on the confidence score of each edge in the initial candidate subgraph, K-Means clustering is used to divide the edges of the initial candidate subgraph into signal clusters and noise clusters, with the goal of minimizing the sum of the distances between each edge score and the centroid of its cluster. Edges within the signal cluster are filtered according to the nearest neighbor rule to obtain the adaptive knowledge subgraph.
[0010] As a preferred technical solution, the training process of the knowledge subgraph retriever includes the following steps: The embedding vector of the question in each question-answering sample and the embedding vector of each edge in the knowledge graph are calculated. The semantic prior of each edge is calculated by cosine similarity and mapped to logarithmic space to filter out low-relevance noise edges. The initial states of the forward and backward flows are initialized in logarithmic space, where the forward flow takes the topic entity as its source and the backward flow takes the answer entity as its source. K steps of forward and backward flow propagation are performed iteratively to complete the global path semantic evaluation from the topic entity to the answer entity. The global importance score of each edge is calculated at each step; the Top-N edges are taken as positive samples and the rest as negative samples, giving the binary supervision mask for each hop. A residual intent gated recurrent unit updates the query intent vector at each inference step to simulate the shift of focus during inference. The conditional transition log probability of each edge is calculated by a gated multilayer perceptron and combined with the upstream cumulative flow to obtain the final edge log probability. A joint loss function based on dynamically weighted binary cross-entropy loss and sparsity regularization loss is constructed, and the knowledge subgraph retriever is trained.
[0011] As a preferred technical solution, the process of obtaining the constraint logic query graph includes the following steps: The first agent uses a large language model to decompose the question into a set of independent atomic logical fragments and identifies the target variable sought by the question. The parsed logical fragments are assembled into a constraint logic query graph, and the connectivity of the graph is verified. If the graph is not connected, a self-correction loop is entered to regenerate the fragments until a connected and valid constraint logic query graph is obtained. If the number of loop iterations exceeds the limit, the basic main chain from the topic entity to the target variable is constructed as the constraint logic query graph.
[0012] As a preferred technical solution, the process of obtaining the ordered query plan includes the following steps: Based on the constraint logic query graph, all candidate main chains from the topic entity to the target variable are enumerated, and the score of each main chain is calculated from its expected fan-out. The chain with the smallest score is selected as the main generation constraint. Based on the main chain, the remaining nodes of the constraint logic query graph are transformed into verification constraints through graph traversal, generating an ordered query plan.
[0013] As a preferred technical solution, the dual-pool search process includes the following steps: The expected logical path length corresponding to the constraint is obtained, the exploration pool and the completion pool are initialized, and the beam width B and the number of returned paths K are set. In each iteration, every path in the exploration pool is expanded by a single hop, the new paths are scored using the dual-signal scoring function, and the exploration pool is updated by keeping the Top-B paths by score. Paths whose length reaches or exceeds the expected length are copied to the completion pool, while the original paths remain in the exploration pool. When the ranking of the top-K paths in the completion pool stays unchanged throughout a complete iteration, the search is terminated and the top-K paths in the completion pool are returned.
[0014] As a preferred technical solution, the dual-signal scoring function is constructed from two signals: the confidence score output by the knowledge subgraph retriever, and the embedding-based semantic similarity between the reasoning path and the corresponding constraint.
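The dual-pool search and dual-signal score described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the linear blend with `weight`, the `max_iters` cap, the empty starting path, and the toy similarity function are all hypothetical choices, since the text does not give the exact scoring formula.

```python
def dual_signal_score(path, confidence, similarity, weight=0.5):
    """Blend the mean retriever confidence of a path's edges with its
    similarity to the constraint. The linear blend is an assumption."""
    mean_conf = sum(confidence[edge] for edge in path) / len(path)
    return weight * mean_conf + (1.0 - weight) * similarity(path)


def dual_pool_search(start, out_edges, score, expected_len,
                     beam_width, top_k, max_iters=10):
    """Beam search with an exploration pool and a completion pool."""
    explore = [()]      # exploration pool: paths are tuples of (h, r, t) edges
    complete = {}       # completion pool: path -> score
    prev_top = None
    for _ in range(max_iters):
        # Single-hop expansion of every path in the exploration pool.
        candidates = []
        for path in explore:
            tail = path[-1][2] if path else start
            for edge in out_edges.get(tail, []):
                if edge not in path:
                    candidates.append(path + (edge,))
        if not candidates:
            break
        # Keep the Top-B new paths by score.
        candidates.sort(key=score, reverse=True)
        explore = candidates[:beam_width]
        # Copy sufficiently long paths to the completion pool; they also
        # stay in the exploration pool.
        for path in explore:
            if len(path) >= expected_len:
                complete[path] = score(path)
        # Stop once the top-K ranking is unchanged across an iteration.
        top = sorted(complete, key=complete.get, reverse=True)[:top_k]
        if top and top == prev_top:
            break
        prev_top = top
    return sorted(complete, key=complete.get, reverse=True)[:top_k]


# Toy data (hypothetical): three edges and a two-hop constraint.
e1 = ("Zhang San", "attends", "School A")
e2 = ("School A", "located_at", "Address X")
e3 = ("Zhang San", "lives_in", "Address Z")
out_edges = {"Zhang San": [e1, e3], "School A": [e2]}
confidence = {e1: 0.9, e2: 0.8, e3: 0.4}
target = ("attends", "located_at")


def toy_similarity(path):
    # Stand-in for embedding similarity: fraction of matching relations.
    relations = tuple(edge[1] for edge in path)
    return sum(a == b for a, b in zip(relations, target)) / len(target)


result = dual_pool_search(
    "Zhang San", out_edges,
    lambda p: dual_signal_score(p, confidence, toy_similarity),
    expected_len=2, beam_width=2, top_k=1,
)
```

In a real system the similarity callback would compare path and constraint embeddings; the toy function only keeps the example self-contained.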
[0015] As a preferred technical solution, the process of assembling the evidence archive and evaluating the answer includes the following steps: The candidate answers, the evidence paths corresponding to the constraints, and the variable binding results are submitted to the evidence archive. If multiple second agents generate the same candidate answer, only the entry with the highest total score in the evidence set is retained, and finally a complete structured evidence archive is obtained. The structured evidence archive is input into a third agent, which evaluates and ranks candidate answers based on evidence completeness, evidence quality, and overall rationality, and simultaneously outputs the structured evidence chain corresponding to each answer.
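The deduplication rule for the evidence archive can be illustrated with a short sketch. The dictionary layout and the field names `answer`, `total_score`, and `evidence` are hypothetical; the text only states that the highest-scoring entry per candidate answer is kept.

```python
def assemble_archive(submissions):
    """Keep, for each candidate answer, the submission with the highest
    total score. Field names are illustrative assumptions."""
    archive = {}
    for sub in submissions:
        best = archive.get(sub["answer"])
        if best is None or sub["total_score"] > best["total_score"]:
            archive[sub["answer"]] = sub
    return archive


# Two agents propose the same answer; only the stronger entry survives.
submissions = [
    {"answer": "Address X", "total_score": 0.72, "evidence": ["path A"]},
    {"answer": "Address X", "total_score": 0.91, "evidence": ["path B"]},
    {"answer": "Address Z", "total_score": 0.40, "evidence": ["path C"]},
]
archive = assemble_archive(submissions)
```

The resulting mapping from candidate answer to evidence entry is what a downstream evaluator would rank.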
[0016] Another aspect of the present invention provides a knowledge graph question answering system based on knowledge subgraph retrieval and multi-agent collaboration, for implementing the aforementioned knowledge graph question answering method, the system comprising:
An adaptive knowledge subgraph extraction module, which acquires natural language question data input by the user, identifies key entities in the question, and links the key entities with entities in the pre-acquired knowledge graph by calculating their semantic similarity, thereby obtaining a topic entity set. In the knowledge graph, entities within a preset number of hops of the topic entity set are extracted as the initial candidate subgraph. A knowledge subgraph retriever pre-trained based on gated flow propagation calculates the confidence score of each edge in the initial candidate subgraph. The edge set is divided by clustering to obtain signal clusters, and the edges in the signal clusters are filtered by the nearest neighbor rule to obtain the final adaptive knowledge subgraph.
A multi-agent collaborative question-answering reasoning module, which deploys multiple types of agents. The first agent, based on the natural language question data, decomposes the question into a set of independent atomic logical fragments through semantic parsing, identifies the target variable of the question, constructs a constraint logic query graph, and obtains an ordered query plan including multiple constraints through main generation constraint selection and verification constraint transformation. For each constraint in the ordered query plan, multiple reasoning paths are obtained by searching the adaptive knowledge subgraph. A second agent is assigned to each reasoning path. The second agent executes the ordered query plan to obtain an evidence chain through a dual-signal scoring function and a dual-pool search. Based on the evidence chain, evidence archive assembly and answer evaluation reasoning are performed to achieve knowledge graph question answering.
[0017] Compared with the prior art, the present invention has at least one of the following beneficial effects: (1) Good generalization and adaptive retrieval scale: This invention uses a knowledge subgraph retrieval device pre-trained based on gated flow propagation to calculate the confidence score of each edge in the initial candidate subgraph, divides the edge set into signal clusters through clustering, and filters the edges in the signal clusters through the nearest neighbor rule to obtain the final adaptive knowledge subgraph as the basic environment for subsequent reasoning. While ensuring the integrity of the key reasoning path, it reduces information that is irrelevant to the user's question.
[0018] (2) Strong flexibility of semantic parsing: Based on the natural language question data, this invention decomposes the question into a set of mutually independent atomic logical fragments through semantic parsing, identifies the target variable of the question, constructs a constraint logic query graph, and obtains an ordered query plan including multiple constraints through main generation constraint selection and verification constraint transformation. A constraint-centered semantic parsing scheme is implemented by the planning agent (i.e., the first agent). Based on logical fragment decomposition and constraint logic query graph construction, it can flexibly adapt to complex professional questions containing multiple constraints and nonlinear logic.
[0019] (3) Improved recall of the core evidence chain: For each constraint in the ordered query plan, the present invention obtains multiple reasoning paths by searching the adaptive knowledge subgraph and assigns a second agent to each reasoning path. The second agent executes the ordered query plan to obtain the evidence chain through the dual-signal scoring function and dual-pool search. Through the alignment agent group (i.e., the second agents), a dual-signal scoring function that fuses retrieval signals and a dual-pool search architecture are designed, which effectively reduces premature pruning of the correct reasoning path and thereby improves the recall of the core evidence chain.
[0020] (4) Full-chain traceability of answers: This invention submits candidate answers, the evidence paths corresponding to constraints, and variable binding results to the evidence archive. If multiple second agents generate the same candidate answer, only the entry with the highest total score in the evidence set is retained, resulting in a complete structured evidence archive. The structured evidence archive is then input into a third agent, which evaluates and ranks the candidate answers based on evidence completeness, evidence quality, and overall rationality, and simultaneously outputs the structured evidence chain corresponding to each answer. The inductive reasoning from an undifferentiated set of facts is transformed into a transparent evaluation of the evidence chain corresponding to the candidate answer and its constraints. This not only improves the accuracy of large model reasoning but also achieves full-chain traceability of answers, making it suitable for knowledge-intensive scenarios such as enterprise knowledge base question answering and vertical domain question answering that have high requirements for answer credibility and interpretability.
Attached Figure Description
[0021] Figure 1 is a flowchart of the knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration in the embodiment.
Figure 2 is a flowchart of the training process for the knowledge subgraph retrieval system based on gated flow propagation in the embodiment.
Figure 3 is a schematic diagram illustrating the principle of training a knowledge subgraph retrieval system based on gated flow propagation in the embodiment.
Figure 4 is a flowchart of the adaptive knowledge subgraph extraction process for user-oriented questions in the embodiment.
Figure 5 is a schematic diagram illustrating the principle of adaptive knowledge subgraph extraction for user-oriented questions in the embodiment.
Figure 6 is a flowchart of multi-agent collaborative question-answering reasoning based on structured evidence archives in the embodiment.
Figure 7 is a schematic diagram illustrating the principle of multi-agent collaborative question-answering reasoning based on structured evidence archives in the embodiment.
Figure 8 is a schematic diagram of a knowledge graph question-answering system based on knowledge subgraph retrieval and multi-agent collaboration in the embodiment.
Detailed Implementation
[0022] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0023] Example 1
To address the problems in the aforementioned prior art, this embodiment provides a knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration. The method first trains a subgraph retriever based on gated flow propagation, then uses the retriever to extract adaptive knowledge subgraphs, and then uses a multi-agent collaborative framework to construct structured evidence archives and perform answer reasoning, ultimately achieving knowledge graph question answering with high accuracy and high interpretability.
[0024] The overall architecture of the method consists of three core stages. The first stage is the training of a knowledge subgraph retrieval device based on gated flow propagation. The second stage is the adaptive knowledge subgraph extraction oriented towards user questions, which receives user questions and outputs adaptive knowledge subgraphs related to the questions. The third stage is multi-agent collaborative reasoning based on structured evidence archives, which completes question parsing, evidence collection and answer evaluation within the subgraph, and finally outputs the answer and interpretable evidence chain.
[0025] As shown in Figure 1, the method specifically includes:
Step S1: Training a knowledge subgraph retriever based on gated flow propagation.
[0026] The terms used in this step are defined as follows:
(1) Knowledge graph.
[0027] Defined as $\mathcal{G} = (\mathcal{V}, \mathcal{R}, \mathcal{E})$, where $\mathcal{V}$ is the set of entity nodes, $\mathcal{R}$ is the set of relation types, and $\mathcal{E}$ is the set of fact triple edges. A single triple is represented as $(h, r, t)$, where h is the head entity, r is the relation, and t is the tail entity.
[0028] (2) Natural language question.
[0029] Defined as q; its corresponding set of topic entities is $\mathcal{T}_q$, and its set of answer entities is $\mathcal{A}_q$.
[0030] (3) Time state grid.
[0031] Defined as a layered directed acyclic graph that unfolds the knowledge graph along the reasoning time step t. Its grid nodes are state tuples $(u, t)$, each representing the state of entity u at reasoning step t, and its directed edges $((u, t-1), (v, t))$ represent the propagation path of inference confidence.
(4) Node state potential $\phi_t(u)$.
[0032] $\phi_t(u)$ is the cumulative log confidence of entity u after t steps of reasoning. The edge transition potential $\psi_t(u, r, v)$ represents the log probability that the inference confidence transfers along the edge $(u, r, v)$ at step t.
[0033] Specifically, as shown in Figure 2 and Figure 3, the training process for the knowledge subgraph retriever based on gated flow propagation includes the following sub-steps:
Step S101: For the question-answering training samples, complete the semantic prior injection and calculate the semantic prior score of the knowledge graph edges.
[0034] For the question-answer samples in the training set, a text embedding model is used to calculate the embedding vector of question q and the embedding vector of each edge in the knowledge graph. The semantic prior of each edge is calculated using cosine similarity and mapped to logarithmic space, which filters out low-relevance noise edges and provides basic semantic guidance for subsequent flow propagation. The core formula is:
[0035] The formula uses a truncation function together with a minimum similarity threshold.
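One plausible reading of the semantic prior is sketched below: take the cosine similarity between the question and edge embeddings, truncate it from below at the minimum similarity threshold, and take the logarithm. The exact form of the truncation is not recoverable from the text, so the lower clip and the default threshold `min_sim=1e-3` are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_prior(question_vec, edge_vec, min_sim=1e-3):
    """Log-space prior of an edge: log of the cosine similarity after
    truncating it from below at min_sim (assumed truncation)."""
    return math.log(max(cosine(question_vec, edge_vec), min_sim))


# A perfectly aligned edge gets prior 0; an orthogonal edge gets log(min_sim).
aligned = semantic_prior([1.0, 0.0], [1.0, 0.0])
orthogonal = semantic_prior([1.0, 0.0], [0.0, 1.0])
```

Truncating before the logarithm keeps the prior finite for edges with zero or negative similarity, which then carry a large negative score and are effectively suppressed.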
[0036] Step S102: Perform bidirectional flow propagation through the teacher module, calculate the global importance score of the edges, and generate a high-quality supervised mask.
[0037] The teacher module, which has access to both the topic entities and the answer entities, performs bidirectional flow propagation to generate high-quality supervision signals, specifically as follows: The initial states of the forward and backward flows are initialized in logarithmic space, with the forward flow taking the topic entities as its source and the backward flow taking the answer entities as its source:
[0038] Define a flow propagation operator in logarithmic space, perform K-step forward and backward flow iterative propagation, and complete the global path semantic evaluation from the topic entity to the answer entity:
[0039] Here, the operator aggregates over the set of incoming neighbor entities of each entity. The forward flow propagates from the topic entities toward the answer entities, and the backward flow propagates from the answer entities toward the topic entities on the reversed knowledge graph. The aggregation uses a smooth maximum function.
[0040] For an edge at step t, the core formula for calculating its global importance score is:
[0041] For each hop, the top-N edges by importance score are selected as positive samples and assigned a value of 1; the rest are treated as negative samples and assigned a value of 0. This generates a binary supervision mask for each hop over the fact triple edges of the knowledge graph, which serves as the training objective of the student module.
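The teacher-side procedure can be sketched as follows. Since the original formulas are not reproduced in this text, several choices here are assumptions: log-sum-exp stands in for the smooth maximum, the importance of an edge at step t is taken as forward potential plus edge prior plus backward potential, and paths are aligned to exactly `steps` hops.

```python
import math


def logsumexp(values):
    """Smooth maximum in log space (assumed form of the smooth max)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def propagate(edges, prior, sources, steps):
    """Return per-step node potentials flows[t][node] in log space."""
    flows = [{s: 0.0 for s in sources}]
    for _ in range(steps):
        incoming = {}
        for (u, r, v) in edges:
            if u in flows[-1]:
                incoming.setdefault(v, []).append(flows[-1][u] + prior[(u, r, v)])
        flows.append({v: logsumexp(vals) for v, vals in incoming.items()})
    return flows


def supervision_masks(edges, prior, topics, answers, steps, top_n):
    """Binary mask per hop: 1 for the top-N edges by importance, else 0."""
    forward = propagate(edges, prior, topics, steps)
    reversed_edges = [(v, r, u) for (u, r, v) in edges]
    reversed_prior = {(v, r, u): p for (u, r, v), p in prior.items()}
    backward = propagate(reversed_edges, reversed_prior, answers, steps)
    masks = []
    for t in range(1, steps + 1):
        scores = {}
        for edge in edges:
            u, _, v = edge
            # An edge is scored only if it lies on some topic-to-answer path.
            if u in forward[t - 1] and v in backward[steps - t]:
                scores[edge] = forward[t - 1][u] + prior[edge] + backward[steps - t][v]
        top = set(sorted(scores, key=scores.get, reverse=True)[:top_n])
        masks.append({edge: int(edge in top) for edge in edges})
    return masks


# Toy graph (hypothetical): only e1 -> e2 connects the topic to the answer.
e1 = ("Zhang San", "attends", "School A")
e2 = ("School A", "located_at", "Address X")
e3 = ("Zhang San", "lives_in", "Address Z")
prior = {e1: -0.1, e2: -0.2, e3: -0.3}
masks = supervision_masks([e1, e2, e3], prior, ["Zhang San"], ["Address X"],
                          steps=2, top_n=1)
```

Because the backward flow starts at the answer entity, the dead-end edge `e3` never receives a score and is labeled negative at every hop.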
[0042] Step S103: Train the student module to fit the global navigation logic of the teacher module based on the gated flow propagation mechanism, and complete the model convergence through the joint loss function.
[0043] The student module is a weight-shared recurrent network that performs scalar probability propagation on the time state grid. It learns the global navigation logic by fitting the supervision mask generated by the teacher module. Specifically: A residual intent gated recurrent unit updates the query intent vector at each inference step, simulating the shift of focus during the inference process:
[0044]
[0045] Here, the residual coefficient balances the current exploration intent against the global constraint of the original query, and GRU denotes a gated recurrent unit.
[0046] The conditional transition log probability of the edge is calculated using a gated multilayer perceptron, and the final edge log probability is obtained by combining it with the upstream cumulative flow.
[0047] In the formula, MLP denotes a multilayer perceptron. Its inputs are the embedding vectors of the head and tail entities, the embedding vector of the relation between them, and the positional encoding of the current hop count, which is obtained by mapping the hop count to an embedding vector through a learnable mapping layer. The symbol ∥ denotes concatenation.
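Read together with the description above, the two student-side computations might be written roughly as follows. All symbols are assumptions introduced here because the original notation is not reproduced in this text: $\mathbf{q}_0$ is the original query embedding, $\mathbf{q}_t$ the intent vector at step t, $\alpha$ the residual coefficient, $\mathbf{x}_t$ an unspecified GRU input, $\mathbf{h}_u, \mathbf{h}_r, \mathbf{h}_v$ the entity and relation embeddings, $\mathbf{p}_t$ the hop position encoding, and $\phi_{t-1}(u)$ the upstream cumulative flow. Which side $\alpha$ weights, and whether $\mathbf{q}_t$ enters the perceptron by concatenation or by gating, is not stated.

```latex
\begin{aligned}
\mathbf{q}_t &= \alpha\,\mathbf{q}_0 + (1-\alpha)\,\mathrm{GRU}(\mathbf{q}_{t-1}, \mathbf{x}_t) \\
\ell_t(u, r, v) &= \phi_{t-1}(u) + \mathrm{MLP}\big([\,\mathbf{h}_u \,\Vert\, \mathbf{h}_r \,\Vert\, \mathbf{h}_v \,\Vert\, \mathbf{q}_t \,\Vert\, \mathbf{p}_t\,]\big)
\end{aligned}
```

The first line keeps the intent anchored to the original query while letting it drift per hop; the second adds the edge's conditional transition log probability to the log confidence already accumulated at its head entity.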
[0048] A joint loss function is constructed, and the model is trained to obtain the knowledge subgraph retriever. The loss formula is as follows:
[0049] The first term is the dynamically weighted binary cross-entropy loss, and the second term is the sparsity regularization loss. The formula uses a sparsity intensity hyperparameter, the binary cross-entropy loss function, the binary supervision mask, and a dynamic weight given by the ratio of the number of positive samples to the number of negative samples in each training batch, that is, the ratio of the number of 1s to the number of 0s in the mask. The sigmoid activation function maps edge transition probabilities to values between 0 and 1, and the sparsity term encourages the model to generate sparser subgraphs.
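A hedged sketch of a loss with this two-term structure is given below. The symbols are assumptions: $\ell_e$ is the final log probability of edge e, $M_e \in \{0, 1\}$ its supervision mask value, $\sigma$ the sigmoid, $w$ the dynamic batch weight derived from the ratio of 1s to 0s in the mask, and $\lambda$ the sparsity intensity. Taking the sparsity term as the mean activated edge score is one common choice and is not confirmed by the text.

```latex
\mathcal{L} \;=\; \mathrm{BCE}_{w}\big(\sigma(\ell),\, M\big) \;+\; \lambda \cdot \frac{1}{|\mathcal{E}|} \sum_{e \in \mathcal{E}} \sigma(\ell_e)
```

Because positives are only the top-N edges per hop, the mask is heavily imbalanced, which is why the cross-entropy term is reweighted per batch.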
[0050] Step S2: Adaptive knowledge subgraph extraction for user-related questions.
[0051] As shown in Figure 4 and Figure 5, the specific process of adaptive knowledge subgraph extraction for user-oriented questions includes:
Step S201: For the natural language question input by the user, identify the topic entities and construct an initial candidate subgraph.
[0052] For the natural language question q input by the user, a large language model is first used to identify key entities in the question. These key entities are then encoded using a text embedding model. Next, the semantic similarity between each key entity and the entities in the knowledge graph is calculated, and the top few most similar entities are returned for each key entity. Finally, the large language model assists in determining the correspondence between key entities and entities in the knowledge graph, which completes the entity linking process. The resulting set of topic entities is denoted as $\mathcal{T}_q$.
[0053] Once the topic entity set is determined, an initial candidate subgraph for the question is constructed from the background knowledge graph, centered on the topic entities. That is, the entities within a specified number of hops of the topic entities are extracted from the knowledge graph, where the hop range is a manually configurable parameter.
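The hop-limited extraction above can be illustrated with a breadth-first search. This is a minimal sketch: treating edges as undirected during expansion and keeping every triple whose two endpoints fall within range are assumptions, and the triples are toy data.

```python
from collections import deque


def k_hop_subgraph(triples, topic_entities, max_hops):
    """Return the triples whose endpoints both lie within max_hops of a
    topic entity. Expansion ignores edge direction (an assumption)."""
    neighbors = {}
    for h, _, t in triples:
        neighbors.setdefault(h, set()).add(t)
        neighbors.setdefault(t, set()).add(h)

    # Breadth-first search from all topic entities at once.
    depth = {entity: 0 for entity in topic_entities}
    queue = deque(topic_entities)
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    return [(h, r, t) for h, r, t in triples if h in depth and t in depth]


triples = [
    ("Zhang San", "attends", "School A"),
    ("School A", "located_at", "Address X"),
    ("Address X", "in_city", "City Y"),
]
one_hop = k_hop_subgraph(triples, ["Zhang San"], 1)
two_hop = k_hop_subgraph(triples, ["Zhang San"], 2)
```

A larger hop range raises recall of long reasoning paths at the cost of more noise edges, which the later pruning step is meant to remove.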
[0054] Step S202: Using the trained retriever, output the confidence score of each edge in the initial candidate subgraph.
[0055] The trained retriever outputs the final log-confidence score of each edge in the initial candidate subgraph.
[0056] Step S203: Perform adaptive subgraph pruning through K-Means clustering and output the final adaptive knowledge subgraph.
[0057] Taking the set of edge scores as input, K-Means clustering (K = 2) divides the edges into a signal cluster and a noise cluster. The clustering optimization objective is:
[0058] Here, the two centroids are those of the signal cluster and the noise cluster, respectively. Edges within the signal cluster are filtered according to the nearest neighbor rule, ultimately generating the adaptive knowledge subgraph, which serves as the basic environment for subsequent reasoning.
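The adaptive pruning step can be sketched with a small one-dimensional two-cluster K-Means. Interpreting the nearest neighbor rule as "keep an edge if its score is closer to the signal centroid than to the noise centroid" is an assumption, as are the min/max initialization and the toy scores.

```python
def two_means_1d(scores, iters=50):
    """Split 1-D scores into two clusters and return their centroids as
    (low_centroid, high_centroid)."""
    low_c, high_c = min(scores), max(scores)
    for _ in range(iters):
        low = [s for s in scores if abs(s - low_c) <= abs(s - high_c)]
        high = [s for s in scores if abs(s - low_c) > abs(s - high_c)]
        if not high:
            break  # all scores identical: no second cluster exists
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if (new_low, new_high) == (low_c, high_c):
            break  # assignments have converged
        low_c, high_c = new_low, new_high
    return low_c, high_c


def prune_edges(edge_scores):
    """Keep edges nearer to the signal (higher) centroid than to the
    noise (lower) centroid."""
    noise_c, signal_c = two_means_1d(list(edge_scores.values()))
    if noise_c == signal_c:
        return set(edge_scores)  # nothing to separate; keep everything
    return {e for e, s in edge_scores.items()
            if abs(s - signal_c) < abs(s - noise_c)}


# Toy log-confidence scores: three strong edges and three weak ones.
edge_scores = {"e1": -0.1, "e2": -0.3, "e3": -0.2,
               "e4": -5.0, "e5": -6.0, "e6": -4.5}
kept = prune_edges(edge_scores)
```

Because the split point comes from the score distribution itself, the number of retained edges adapts to each question instead of being fixed in advance.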
[0059] Step S3: Multi-agent collaborative question-answering reasoning based on structured evidence archives.
[0060] The terms used in this step are defined as follows:
(1) Constraint logic query graph.
[0061] An undirected graph whose node set consists of topic entities and logical variables, and whose edge set consists of the atomic logical fragments decomposed from the question. Each fragment has the form (s, p, o), where the subject s and the object o are topic entities or logical variables, and the predicate p is an abstract relational phrase describing the constraint relationship between the entities and variables in the question.
[0062] (2) Query plan.
[0063] An ordered execution plan formed by the main generation constraint and the side verification constraints. Each constraint is a logical path composed of nodes and edges in the constraint logic query graph, representing the complete constraint relationship from one entity or variable to another in the question, such as "Zhang San - attends - school - located at - address".
[0064] (3) Structured evidence archives.
[0065] The set of mappings from candidate answers to their corresponding evidence sets, which can be formally represented as:
[0066] Here, each candidate answer is mapped to its corresponding evidence set, and each element of the evidence set is the evidence path that satisfies one constraint.
[0067] As shown in Figure 6 and Figure 7, the specific process of multi-agent collaborative question-answering reasoning based on structured evidence archives is as follows:
Step S301: The planning agent receives the user's question and the adaptive knowledge subgraph, completes the question parsing, and constructs the constraint logic query graph.
[0068] The planning agent (i.e., the first agent) is responsible for transforming an unstructured natural language question into an executable constraint-centered query plan. First, it invokes a semantic parsing tool, which uses a large language model to decompose the question into a set of independent atomic logical fragments and, at the same time, identifies the core information sought by the question, i.e., the target variable. For example, the natural language question "The location of the school Zhang San attends" is decomposed into (Zhang San, attends, school) and (school, located at, address), where the target variable is the address. The parsed logical fragments are assembled into a constraint logic query graph, and the connectivity of the graph is verified. If the graph is not connected, a self-correction loop is entered and the fragments are regenerated until a connected and valid constraint logic query graph is obtained. If the loop exceeds its iteration limit, a fallback strategy is executed: a basic main chain from the subject entity to the target variable is constructed to ensure that the inference remains executable.
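The connectivity check and fallback can be sketched as follows. This is an illustration under assumptions: `parse` is a hypothetical callable standing in for the LLM-backed semantic parsing tool, the retry limit of 3 is arbitrary, and the placeholder predicate `related_to` used in the fallback chain is invented for the example.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List, Set, Tuple

Fragment = Tuple[str, str, str]  # (subject, abstract predicate, object)


def build_query_graph(fragments: List[Fragment]) -> Dict[str, Set[str]]:
    """Assemble fragments into an undirected adjacency map."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for subj, _pred, obj in fragments:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return dict(adj)


def is_connected(adj: Dict[str, Set[str]]) -> bool:
    """Breadth-first search from an arbitrary node must reach every node."""
    if not adj:
        return False
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(adj)


def plan_fragments(
    parse: Callable[[], List[Fragment]],
    subject: str,
    target: str,
    max_retries: int = 3,  # assumed limit for the self-correction loop
) -> List[Fragment]:
    """Re-parse until the query graph is connected, else fall back."""
    for _ in range(max_retries):
        fragments = parse()
        if is_connected(build_query_graph(fragments)):
            return fragments
    # Fallback: a basic main chain from the subject entity to the target variable.
    return [(subject, "related_to", target)]
```

In the running example, the fragments (Zhang San, attends, ?school) and (?school, located at, ?address) share the variable ?school, so the graph is connected and no retry is needed.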
[0069] Step S302: The planning agent generates an ordered constraint-centered query plan based on the constraint logic query graph.
[0070] Based on the valid constraint logic query graph, a final query plan is generated using a query building tool. All candidate main chains from the subject entity to the target variable are enumerated, and the score of each main chain is calculated using expected fan-out. The chain with the lowest score is selected as the main generation constraint. The core formula is as follows:
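[0071]
$$\mathrm{Score}(c) = \prod_{p \in c} \mathbb{E}[\mathrm{fanout}(p)], \qquad \mathbb{E}[\mathrm{fanout}(p)] = \sum_{r} \mathrm{sim}(p, r) \cdot \mathrm{fanout}(r)$$
[0072] where $\mathrm{sim}(p, r)$ is the semantic similarity between abstract predicate $p$ and a specific relation $r$ in the adaptive subgraph, $\mathrm{fanout}(r)$ is the fan-out number of relation $r$, i.e., the number of tail entities connected by the relation, and $\mathrm{Score}(c)$ is the product of the expected fan-outs of all abstract predicates on candidate main chain $c$. Based on the main chain, the remaining nodes are transformed into verification constraints through graph traversal, ultimately generating an ordered query plan.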
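A small sketch of main-chain selection by expected fan-out follows. It assumes the expected fan-out of an abstract predicate is the similarity-weighted sum of the fan-outs of the concrete relations it may match, with the similarities used as given (whether they are normalised is not stated in the text). All names and numbers are illustrative.

```python
from math import prod
from typing import Dict, List


def expected_fanout(sim_to_relations: Dict[str, float], fanout: Dict[str, int]) -> float:
    """Similarity-weighted fan-out of one abstract predicate (assumed form)."""
    return sum(sim * fanout[rel] for rel, sim in sim_to_relations.items())


def chain_score(
    chain: List[str],
    sim: Dict[str, Dict[str, float]],
    fanout: Dict[str, int],
) -> float:
    """Product of expected fan-outs over the abstract predicates on a chain."""
    return prod(expected_fanout(sim[pred], fanout) for pred in chain)


def select_main_chain(
    chains: List[List[str]],
    sim: Dict[str, Dict[str, float]],
    fanout: Dict[str, int],
) -> List[str]:
    """The lowest-scoring chain becomes the main generation constraint."""
    return min(chains, key=lambda c: chain_score(c, sim, fanout))
```

Preferring the chain with the smallest product keeps the number of candidate bindings low at each hop, so the verification constraints have fewer candidates to check.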
[0073] Step S303: The alignment agent swarm executes the query plan and completes the evidence path search for each constraint through dual-signal scoring and dual-pool search.
[0074] The alignment agent swarm (i.e., the second agent) is a cluster of sub-agents that execute in parallel, responsible for executing the query plan and collecting evidence chains. For each constraint in the query plan, the alignment agent invokes a path search tool to perform a search in the adaptive knowledge subgraph, returning the top-K highest-scoring inference paths. An alignment sub-agent is instantiated for each inference path and is responsible for binding the entities in that path to the variables of the corresponding constraint. All sub-agents execute in parallel, completing the evidence search and variable binding for all constraints.
[0075] The path search tool employs a dual-pool beam search algorithm to reduce the risk of premature pruning of correct paths. First, the expected logical path length corresponding to the constraint is determined, i.e., the number of abstract predicates required for the first entity or variable in the constraint to reach the last entity or variable. An exploration pool for active path expansion and a completion pool for storing paths that have reached the expected length are initialized, and the beam width B and the number of returned paths K are set. In each iteration, single-hop expansion is performed on all paths in the exploration pool, the new paths are scored using a dual-signal scoring function, and the exploration pool is updated with the top-B paths by score. Paths that reach or exceed the expected length are added to the completion pool while remaining in the exploration pool for subsequent exploration. When the top-K paths in the completion pool keep a stable ranking throughout a complete iteration, the search terminates and the top-K paths in the completion pool are returned.
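The dual-pool loop described above can be sketched as follows. The sketch assumes the subgraph is an adjacency map from a node to its (relation, tail) pairs, takes the scoring function as a caller-supplied parameter `score_fn`, and adds a `max_iters` safety bound that is not part of the described method.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]          # (head, relation, tail)
Path = Tuple[Triple, ...]              # ordered hops starting at the start entity
Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(relation, tail), ...]


def dual_pool_beam_search(
    graph: Graph,
    start: str,
    expected_len: int,
    score_fn: Callable[[Path], float],
    beam_width: int = 4,
    top_k: int = 2,
    max_iters: int = 6,  # assumed safety bound
) -> List[Path]:
    explore: List[Path] = [()]          # exploration pool, seeded with the empty path
    completed: Dict[Path, float] = {}   # completion pool: path -> score

    def top(pool: Dict[Path, float]) -> List[Path]:
        return sorted(pool, key=pool.get, reverse=True)[:top_k]

    for _ in range(max_iters):
        before = top(completed)
        scored: Dict[Path, float] = {}
        # Single-hop expansion of every path in the exploration pool.
        for path in explore:
            node = path[-1][2] if path else start
            for rel, tail in graph.get(node, []):
                new_path = path + ((node, rel, tail),)
                scored[new_path] = score_fn(new_path)
        if not scored:
            break  # nothing left to expand
        # Keep the top-B paths; long-enough ones are also copied to the completion pool.
        explore = sorted(scored, key=scored.get, reverse=True)[:beam_width]
        for path in explore:
            if len(path) >= expected_len:
                completed[path] = scored[path]
        # Stop once the completion pool's top-K survived a full iteration unchanged.
        if completed and top(completed) == before:
            break
    return top(completed)
```

Keeping completed paths in the exploration pool lets a longer path still displace an early finisher, which is the stated reason for using two pools rather than stopping at the first path of the expected length.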
[0076] The core of the path search tool is a dual-signal scoring function that integrates the edge confidence scores output by the subgraph retrieval tool. This function balances the local factual plausibility of a path with its global semantic relevance. The raw scores of candidate paths are first normalized into log-rank space by a log-rank transformation, where rank is the ranking of the candidate path in the corresponding scoring dimension. Two core scoring signals are defined: the structure score, which is taken from the triple edge confidence scores output by the subgraph retrieval tool, and the semantic score, which is the embedding semantic similarity between the reasoning path and its corresponding constraint. The final total score of the path is then calculated from these two signals.
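Since the exact transformation and combination formulas are not reproduced in the text, the following is only a sketch under stated assumptions: the log-rank transform is taken to be 1 / log2(1 + rank) with rank 1 as the best path, and the total score is taken to be a weighted sum with a hypothetical weight `alpha`. The raw structure score per path is assumed to have been aggregated from edge confidences beforehand.

```python
import math
from typing import Dict


def log_rank(raw: Dict[str, float]) -> Dict[str, float]:
    """Map raw scores to 1 / log2(1 + rank); rank 1 is the highest raw score.

    The functional form is an assumption made for illustration.
    """
    ordered = sorted(raw, key=raw.get, reverse=True)
    return {pid: 1.0 / math.log2(1 + rank) for rank, pid in enumerate(ordered, start=1)}


def dual_signal_scores(
    structure: Dict[str, float],  # path id -> raw structure score
    semantic: Dict[str, float],   # path id -> raw path/constraint similarity
    alpha: float = 0.5,           # assumed mixing weight
) -> Dict[str, float]:
    """Combine the two rank-normalised signals into one total score per path."""
    s_rank = log_rank(structure)
    m_rank = log_rank(semantic)
    return {pid: alpha * s_rank[pid] + (1 - alpha) * m_rank[pid] for pid in structure}
```

Working in rank space makes the two signals comparable even when their raw scales differ, for example log-probabilities on one side and cosine similarities on the other.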
[0077] Step S304: The alignment agent swarm completes variable binding and iterative alignment, and assembles the structured evidence archive.
[0078] After each sub-agent completes all constraints, it submits the candidate answer, the evidence path corresponding to each constraint, and the variable binding result to the evidence archive. If multiple sub-agents generate the same candidate answer, only the entry with the highest total score in the evidence set is retained, and finally a complete structured evidence archive is obtained.
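The assembly rule (keep only the highest-scoring entry per candidate answer) can be sketched as below. The `Submission` record and its field names are hypothetical; they simply bundle what each sub-agent is described as submitting.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Submission:
    answer: str                               # candidate answer
    evidence: Dict[str, Tuple[Triple, ...]]   # constraint id -> evidence path
    bindings: Dict[str, str]                  # variable -> bound entity
    total_score: float                        # total score of the evidence set


def assemble_archive(submissions: List[Submission]) -> Dict[str, Submission]:
    """Keep, for each candidate answer, only the highest-scoring submission."""
    archive: Dict[str, Submission] = {}
    for sub in submissions:
        kept = archive.get(sub.answer)
        if kept is None or sub.total_score > kept.total_score:
            archive[sub.answer] = sub
    return archive
```

The resulting mapping from answer to evidence is what the answer evaluation agent consumes, so every ranked answer still carries the paths and bindings that produced it.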
[0079] Step S305: The answer evaluation agent evaluates and ranks candidate answers based on the evidence archive, and outputs the final answer and the interpretable evidence chain.
[0080] The structured evidence archive is input into the answer evaluation agent (i.e., the third agent), which evaluates and ranks the candidate answers based on the completeness, quality, and overall rationality of the evidence. At the same time, it outputs the structured evidence chain corresponding to each answer, so as to realize the full-link interpretability and traceability of the reasoning process.
[0081] Example 2 Building upon Example 1, this example provides a knowledge graph question-answering system based on knowledge subgraph retrieval and multi-agent collaboration. This system implements the knowledge graph question-answering method of Example 1. It can be deployed on a server or high-performance computer. For a user's question text (or multimodal data including text), it retrieves relevant knowledge from the knowledge graph to generate a response to the user's question. Referring to Figure 8, the system includes: (1) Adaptive knowledge subgraph extraction module.
[0082] This module acquires the natural language question data input by the user, identifies key entities in the question, calculates their semantic similarity with entities in a pre-acquired knowledge graph, and links the key entities with entities in the knowledge graph to obtain a topic entity set. Within the knowledge graph, entities within a preset number of hops, centered on the topic entity set, are extracted as the initial candidate subgraph. Based on the initial candidate subgraph, a knowledge subgraph retrieval tool pre-trained based on gated flow propagation calculates the confidence score of each edge in the initial candidate subgraph. The edge set is then divided by clustering into a signal cluster and a noise cluster, and the edges within the signal cluster are filtered using the nearest neighbor rule to obtain the final adaptive knowledge subgraph. (2) Multi-agent collaborative question-answering reasoning module. This module deploys multiple types of agents. The first agent, based on the natural language question data, decomposes the question into a set of independent atomic logical fragments through semantic parsing, identifies the target variable of the question, constructs a constraint logic query graph, and obtains an ordered query plan including multiple constraints through main generation constraint selection and verification constraint transformation. For each constraint in the ordered query plan, multiple reasoning paths are obtained by searching the adaptive knowledge subgraph, and a second agent is assigned to each reasoning path. The second agent executes the ordered query plan to obtain an evidence chain through a dual-signal path scoring function and dual-pool search. Based on the evidence chain, evidence archive assembly and answer evaluation reasoning are performed to achieve knowledge graph question answering.
[0083] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present invention, and these modifications or substitutions should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration, characterized in that it includes the following steps: Natural language question data input by the user is acquired, key entities in the question are identified, the semantic similarity between the key entities and entities in a pre-acquired knowledge graph is calculated, and the key entities are linked with entities in the knowledge graph to obtain a topic entity set. In the knowledge graph, entities within a preset number of hops, centered on the topic entity set, are extracted as the initial candidate subgraph. Based on the initial candidate subgraph, the confidence score of each edge in the initial candidate subgraph is calculated using a knowledge subgraph retrieval tool pre-trained based on gated flow propagation. The edge set is divided by clustering to obtain a signal cluster, and the edges in the signal cluster are filtered by the nearest neighbor rule to obtain the final adaptive knowledge subgraph. Based on the natural language question data, the first agent decomposes the question into a set of mutually independent atomic logical fragments through semantic parsing, identifies the target variable of the question, constructs a constraint logic query graph, and obtains an ordered query plan including multiple constraints through main generation constraint selection and verification constraint transformation. For each constraint in the ordered query plan, multiple reasoning paths are obtained by searching the adaptive knowledge subgraph. A second agent is assigned to each reasoning path. The second agent executes the ordered query plan to obtain an evidence chain through a dual-signal scoring function and a dual-pool search. Based on the evidence chain, evidence archive assembly and answer evaluation reasoning are performed to achieve knowledge graph question answering.
2. The knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration according to claim 1, characterized in that the process of obtaining the topic entity set includes the following steps: Based on the natural language question data, a large language model is used to identify key entities in the question. The key entities are encoded, and their semantic similarity with entities in the knowledge graph is calculated. Based on the semantic similarity, the correspondence between the key entities and entities in the knowledge graph is identified through a large language model, thereby achieving entity linking and obtaining the topic entity set.
3. The knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration according to claim 1, characterized in that the process of obtaining the adaptive knowledge subgraph includes the following steps: Based on the confidence score of each edge in the initial candidate subgraph, K-Means clustering is used to divide the edges of the initial candidate subgraph into a signal cluster and a noise cluster, with the goal of minimizing the sum of the distances between each edge in a cluster and the centroid of that cluster. Edges within the signal cluster are filtered according to the nearest neighbor rule to obtain the adaptive knowledge subgraph.
4. The knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration according to claim 1, characterized in that the training process of the knowledge subgraph retrieval tool includes the following steps: The embedding vector of the question in each question-answering sample and the embedding vector of each edge in the knowledge graph are calculated. The semantic prior of each edge is calculated by cosine similarity and mapped to the logarithmic space to filter out low-relevance noise edges. The initial states of the forward and backward flows are initialized in the logarithmic space, with the forward flow taking the topic entity as the source and the backward flow taking the answer entity as the source. K steps of iterative forward and backward flow propagation are performed to complete the global path semantic evaluation from the topic entity to the answer entity. The global importance score of each edge at each step is calculated, and the Top-N edges are taken as positive samples and the rest as negative samples to obtain a binary supervision mask for each hop. Using a residual intent-gated recurrent unit, the query intent vector is updated with each inference step to simulate the shift of focus during the inference process. The conditional transition log probability of each edge is calculated by a gated multilayer perceptron and combined with the upstream cumulative flow to obtain the final edge log probability. A joint loss function based on dynamically weighted binary cross-entropy loss and sparsity regularization loss is constructed, and the knowledge subgraph retrieval tool is trained.
5. The knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration according to claim 1, characterized in that the process of obtaining the constraint logic query graph includes the following steps: The first agent decomposes the question into a set of independent atomic logical fragments through a large language model and identifies the target variable sought by the question. The parsed logical fragments are assembled into a constraint logic query graph, and the connectivity of the graph is verified. If the graph is not connected, a self-correction loop is entered to regenerate the fragments until a connected and valid constraint logic query graph is obtained. If the loop exceeds its limit, a basic main chain from the subject entity to the target variable is constructed as the constraint logic query graph.
6. The knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration according to claim 1, characterized in that the process of obtaining the ordered query plan includes the following steps: Based on the constraint logic query graph, all candidate main chains from the subject entity to the target variable are enumerated, and the score of each main chain is calculated by expected fan-out. The chain with the smallest score is selected as the main generation constraint. Based on the main chain, the remaining nodes of the constraint logic query graph are transformed into verification constraints through graph traversal, generating an ordered query plan.
7. The knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration according to claim 1, characterized in that the dual-pool search process includes the following steps: The expected logical path length corresponding to the constraint is obtained, the exploration pool and the completion pool are initialized, and the beam width B and the number of returned paths K are set. In each iteration, single-hop expansion is performed on all paths in the exploration pool, the new paths are scored using the dual-signal scoring function, and the exploration pool is updated with the Top-B paths by score. Paths whose length reaches or exceeds the expected length are added to the completion pool, while the original paths in the exploration pool are retained. When the top-K paths in the completion pool keep a stable ranking throughout a complete iteration, the search is terminated and the top-K paths in the completion pool are returned.
8. The knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration according to claim 1, characterized in that the dual-signal scoring function is constructed based on the confidence score output by the knowledge subgraph retrieval tool and the embedding semantic similarity between the reasoning path and the corresponding constraint.
9. The knowledge graph question answering method based on knowledge subgraph retrieval and multi-agent collaboration according to claim 1, characterized in that the process of evidence archive assembly and answer evaluation reasoning includes the following steps: The candidate answers, the evidence paths corresponding to the constraints, and the variable binding results are submitted to the evidence archive. If multiple second agents generate the same candidate answer, only the entry with the highest total score in the evidence set is retained, and finally a complete structured evidence archive is obtained. The structured evidence archive is input into a third agent, which evaluates and ranks candidate answers based on evidence completeness, evidence quality, and overall rationality, and simultaneously outputs the structured evidence chain corresponding to each answer.
10. A knowledge graph question-answering system based on knowledge subgraph retrieval and multi-agent collaboration, characterized in that the system is used to implement the knowledge graph question answering method as described in any one of claims 1-9 and includes: An adaptive knowledge subgraph extraction module, used to acquire natural language question data input by the user, identify key entities in the question, and link the key entities with entities in a pre-acquired knowledge graph by calculating their semantic similarity, thereby obtaining a topic entity set. In the knowledge graph, entities within a preset number of hops, centered on the topic entity set, are extracted as the initial candidate subgraph. Based on the initial candidate subgraph, a knowledge subgraph retrieval tool pre-trained based on gated flow propagation is used to calculate the confidence score of each edge in the initial candidate subgraph. The edge set is divided by clustering to obtain a signal cluster, and the edges in the signal cluster are filtered by the nearest neighbor rule to obtain the final adaptive knowledge subgraph. A multi-agent collaborative question-answering reasoning module, which deploys multiple types of agents. The first agent, based on the natural language question data, decomposes the question into a set of independent atomic logical fragments through semantic parsing, identifies the target variable of the question, constructs a constraint logic query graph, and obtains an ordered query plan including multiple constraints through main generation constraint selection and verification constraint transformation. For each constraint in the ordered query plan, multiple reasoning paths are obtained by searching the adaptive knowledge subgraph. A second agent is assigned to each reasoning path. The second agent executes the ordered query plan to obtain an evidence chain through a dual-signal path scoring function and dual-pool search. Based on the evidence chain, evidence archive assembly and answer evaluation reasoning are performed to achieve knowledge graph question answering.