Knowledge Graph Retrieval Methods and Devices

By employing a progressive knowledge graph retrieval method and utilizing relationship filtering strategies across different iteration rounds, the frequency of calls to large language models and computational latency are reduced. This addresses the issue of high computational resource consumption in iterative knowledge graph retrieval, thereby achieving efficient knowledge graph retrieval.

CN122309725A · Pending · Publication Date: 2026-06-30 · CLOUDCHAIN GRP CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: CLOUDCHAIN GRP CO LTD
Filing Date: 2026-06-02
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing knowledge graph retrieval technologies lack an effective relation filtering mechanism for reducing how often the large language model is called, which leads to high computational latency and high resource consumption in the iterative knowledge graph retrieval process.

Method used

A progressive graph retrieval method is adopted, which uses relation filtering strategies with different computational latencies during the iteration process. In the initial stage, a low-latency strategy is used to quickly filter candidate relations, and in the deep exploration stage, a high-latency strategy is used for accurate filtering, thereby reducing the use of high-latency computing resources.

Benefits of technology

It reduces the end-to-end response latency and computing resource utilization of iterative knowledge graph retrieval while ensuring retrieval accuracy.

✦ Generated by Eureka AI based on patent content.

Abstract

This application provides a knowledge graph retrieval method and apparatus, relating to the field of electronic digital data processing technology. The method includes: determining a starting entity set based on a natural language question, and using this set as the first-round entities to perform multi-round progressive graph retrieval. In each round, candidate relation triples associated with the entities are obtained; an applicable strategy is determined from multiple relation filtering strategies based on the current round, with earlier rounds having lower computational latency than later rounds; the strategy is used to filter candidate relations and obtain associated document fragments; and the next round of entities is determined based on the filtering results. This application can solve the problem of high computational latency and high resource consumption caused by frequent calls to large language models in iterative knowledge graph retrieval. By adopting a low-latency filtering strategy in the early stages of retrieval and only enabling the large language model in the deep exploration phase, the number of model calls and waiting time can be significantly reduced, effectively reducing end-to-end response latency and computing device resource utilization while ensuring retrieval accuracy.
Description

Technical Field

[0001] This application relates to the field of electronic digital data processing technology, and in particular to knowledge graph retrieval methods and devices.

Background Technology

[0002] In Retrieval-Augmented Generation (RAG) systems, relying solely on vector similarity to retrieve document chunks often falls short of answering complex questions requiring multi-step reasoning. For example, the question "Who invested in the company founded by entrepreneur A?" requires first finding the companies founded by entrepreneur A, and then finding the investors who invested in those companies, involving multi-hop relational reasoning within a knowledge graph (KG). Therefore, the industry has proposed KG-enhanced RAG methods, combining the structured reasoning capabilities of knowledge graphs with the semantic matching capabilities of vector retrieval. Several methods have been proposed, such as Microsoft's community-detection-based knowledge graph retrieval method (GraphRAG), the lightweight knowledge graph retrieval method (LightRAG), the knowledge graph-guided retrieval enhancement method (KG2RAG), the tightly coupled graph-text iterative exploration method driven by a large language model (LLM) (Think-on-Graph 2, ToG-2), and the path retrieval method based on flow propagation pruning (PathRAG).

[0003] GraphRAG divides the knowledge graph into multi-level communities using the Leiden community detection algorithm and generates summary reports. During retrieval, it recalls entities through vector retrieval and combines them with community reports to form context. Its scoring algorithm is the product of vector similarity and PageRank node importance. LightRAG adopts a simplified entity relation extraction process, omitting the community detection step, and recalls entities through a combination of keyword matching and vector retrieval. KG2RAG obtains seed document fragments through preliminary retrieval, expands the graph using a fixed-hop breadth-first search (BFS) based on the triples associated with the seed fragments, removes redundant triples using a maximum spanning tree (MST), and finally performs cross-encoder reordering. ToG-2 is driven by a large language model (LLM) for iterative exploration. Each round performs three steps: relation discovery, relevance scoring and filtering of each relation by the large language model, and entity discovery. After each round, the large language model determines whether the information is sufficient. PathRAG uses a flow propagation pruning algorithm to discover critical paths between relevant nodes and sorts them by the average resource value of the paths.

[0004] However, while the aforementioned methods each have their own characteristics in algorithm design, they share common technical shortcomings in relation selection strategies. ToG-2 relies entirely on a large language model to score candidate relations one by one in each iteration, requiring calls to the large language model in each iteration, resulting in high computational latency and resource consumption. GraphRAG implicitly limits the range of relations based on community boundaries, making it unable to perform fine-grained selection of individual relations. KG2RAG uses a maximum spanning tree based on graph structure for redundancy removal, without considering the semantic relevance between relations and queries. PathRAG uses a resource propagation threshold for pruning, essentially a structured metric based on graph topology, without involving semantic understanding. LightRAG only performs planar vector recall and does not perform relation selection during graph exploration.

[0005] Therefore, existing technologies lack a relation filtering mechanism that can effectively reduce the frequency of large language model calls during iterative knowledge graph retrieval, thereby reducing computational latency and the resource consumption of computing devices.

Summary of the Invention

[0006] In view of this, embodiments of this application provide a knowledge graph retrieval method and device to eliminate or improve one or more defects existing in the prior art.

[0007] One aspect of this application provides a knowledge graph retrieval method, including:
determining a starting entity set based on the natural language question contained in the current query request; and
using the starting entity set as the topic entity set for the first iteration round, performing at least one iteration round of a progressive graph retrieval step to obtain retrieval information for the natural language question as the retrieval result for that question, where the retrieval information includes relationship information between different entities and document fragments related to the natural language question.
The progressive graph retrieval step includes:
obtaining a candidate relation set associated with the topic entity set of the current round, where the candidate relation set contains multiple candidate relation data, and each candidate relation data is a triple in a preset knowledge graph representing the relationship between two entities;
based on the current iteration round, determining one relation filtering strategy applicable to the current iteration round from at least two preset relation filtering strategies, where the computational latency of the strategy applicable to an earlier iteration round is lower than that of the strategy applicable to a later iteration round;
filtering the candidate relation set of the current round using the strategy applicable to the current iteration round to obtain the filtered relation set of the current iteration round, and obtaining the document fragments associated with the filtered relation set; and
if the current iteration round does not meet the preset iteration termination condition, determining the topic entity set for the next iteration round based on the filtered relation set of the current iteration round.

[0008] In some embodiments of this application, the relation filtering strategies include:
a first relation filtering strategy, applicable to the first iteration round, which filters based on the degree of overlap between the words contained in the candidate relation data and the words in the natural language question;
a second relation filtering strategy, applicable to the second iteration round, which filters based on the similarity between the vector representation of the candidate relation data and the vector representation of the natural language question; and
a third relation filtering strategy, applicable to the third and subsequent iteration rounds, which filters based on relevance scores that the large language model assigns to the candidate relation data with respect to the natural language question.
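The round-to-strategy mapping described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation; the names `FilterStrategy` and `select_strategy` are hypothetical, and rounds are assumed to be numbered from 1.

```python
from enum import Enum


class FilterStrategy(Enum):
    """Hypothetical labels for the three relation filtering strategies."""
    WORD_OVERLAP = 1        # lowest latency: lexical overlap with the question
    VECTOR_SIMILARITY = 2   # medium latency: embedding similarity
    LLM_SCORING = 3         # highest latency: large language model scoring


def select_strategy(round_index: int) -> FilterStrategy:
    """Pick the strategy for a 1-based iteration round (assumed numbering)."""
    if round_index < 1:
        raise ValueError("iteration rounds are numbered from 1")
    if round_index == 1:
        return FilterStrategy.WORD_OVERLAP
    if round_index == 2:
        return FilterStrategy.VECTOR_SIMILARITY
    return FilterStrategy.LLM_SCORING
```

Because the enum values increase with cost, earlier rounds always map to a cheaper strategy than later rounds, which is the ordering the method requires.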

[0009] In some embodiments of this application, if the relation filtering strategy applicable to the current iteration round is the first relation filtering strategy, then filtering the candidate relation set of the current round using that strategy to obtain the filtered relation set of the current iteration round includes:
for each candidate relation data, counting the words it shares with the natural language question, and multiplying the ratio of this overlap count to the total number of words in the natural language question by a preset word overlap weight to obtain the word overlap score of the candidate relation data;
obtaining the key entity list corresponding to the natural language question, where the key entities stored in the list are entity names identified from the natural language question that have corresponding nodes in the knowledge graph;
for each candidate relation data, when its head entity or its tail entity matches a key entity in the key entity list, adding a preset entity matching score to the word overlap score to give the current score;
obtaining the sub-question decomposition structure corresponding to the natural language question, which includes a source entity, a target entity, and relation hints;
for each candidate relation data, when its head entity and tail entity match the source entity and target entity in the sub-question decomposition structure, respectively, further adding a preset target matching score to the current score and using the sum as the new current score;
dividing the current score by a preset normalization divisor to obtain the screening score of the candidate relation data; and
based on the screening scores of the candidate relation data, selecting a portion of the candidate relation data from the candidate relation set to form the filtered relation set of the current iteration round.
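The scoring steps of the first strategy can be sketched as below. The weights, the normalization divisor, the tokenizer, the case-insensitive exact entity match, and the top-k cut-off are all assumptions chosen for illustration; the source only says they are preset. `Triple`, `SubQuestion`, `screening_score`, and `filter_relations` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class SubQuestion:
    source: str
    target: str
    relation_hint: str


def words(text: str) -> set:
    # Assumed tokenizer: lowercase alphanumeric runs, so "lives_in" -> {"lives", "in"}.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def screening_score(triple: Triple, question: str, key_entities: list,
                    sub_question: Optional[SubQuestion] = None, *,
                    overlap_weight: float = 1.0, entity_bonus: float = 0.5,
                    target_bonus: float = 0.5, normalizer: float = 2.0) -> float:
    question_words = words(question)
    triple_words = words(f"{triple.head} {triple.relation} {triple.tail}")
    overlap = len(question_words & triple_words)
    # Word overlap score: overlap ratio times the preset weight.
    score = overlap_weight * overlap / max(len(question_words), 1)
    keys = {k.lower() for k in key_entities}
    if triple.head.lower() in keys or triple.tail.lower() in keys:
        score += entity_bonus          # head or tail is a key entity
    if sub_question is not None and (
        triple.head.lower() == sub_question.source.lower()
        and triple.tail.lower() == sub_question.target.lower()
    ):
        score += target_bonus          # head/tail match source/target
    return score / normalizer


def filter_relations(triples, question, key_entities, sub_question=None, top_k=10):
    ranked = sorted(
        triples,
        key=lambda t: screening_score(t, question, key_entities, sub_question),
        reverse=True,
    )
    return ranked[:top_k]
```

No model call or embedding lookup is involved, which is why this strategy suits the first round, where the candidate set is typically largest.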

[0010] In some embodiments of this application, if the relation filtering strategy applicable to the current iteration round is the third relation filtering strategy, then filtering the candidate relation set of the current round using that strategy to obtain the filtered relation set of the current iteration round includes:
checking whether the counter of consecutive failed large language model calls has reached the preset circuit breaker threshold;
if so, abandoning the call to the large language model and instead using the second relation filtering strategy to filter the candidate relation set of the current iteration round;
if not, formatting the candidate relation data in the candidate relation set as text, and submitting the formatted text together with the natural language question to the large language model to obtain, for each candidate relation data in the set, the relevance score to the natural language question output by the model; if the scores are obtained successfully, resetting the consecutive failure counter to zero; if obtaining them fails, incrementing the consecutive failure counter by one and discarding the scores of all candidate relation data in this round; and
based on the relevance scores of the candidate relation data, selecting a portion of the candidate relation data from the candidate relation set to form the filtered relation set of the current iteration round.
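The circuit breaker logic above can be sketched as a small wrapper. `LLMRelationFilter`, `llm_score`, and `fallback_score` are hypothetical names; both scorers are assumed to be callables that take the question and a list of triples and return a score per triple. One assumption goes beyond the text: when an individual call fails, the sketch scores that round with the fallback strategy, whereas the source only states that the round's scores are discarded.

```python
from typing import Callable, Dict, Sequence, Tuple

Triple = Tuple[str, str, str]
Scorer = Callable[[str, Sequence[Triple]], Dict[Triple, float]]


class LLMRelationFilter:
    """Third-strategy scorer guarded by a consecutive-failure circuit breaker."""

    def __init__(self, llm_score: Scorer, fallback_score: Scorer,
                 breaker_threshold: int = 3):
        self.llm_score = llm_score            # hypothetical LLM-backed scorer
        self.fallback_score = fallback_score  # e.g. the vector-similarity strategy
        self.breaker_threshold = breaker_threshold
        self.consecutive_failures = 0

    def score(self, question: str, relations: Sequence[Triple]) -> Dict[Triple, float]:
        # Breaker open: skip the model call entirely and degrade.
        if self.consecutive_failures >= self.breaker_threshold:
            return self.fallback_score(question, relations)
        try:
            scores = self.llm_score(question, relations)
        except Exception:
            # Failed call: count it and discard any scores for this round.
            self.consecutive_failures += 1
            # Assumption: degrade for this round as well.
            return self.fallback_score(question, relations)
        self.consecutive_failures = 0         # success resets the counter
        return scores
```

Once the threshold is reached the model is no longer called, so a persistently failing endpoint costs at most `breaker_threshold` timeouts per counter lifetime.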

[0011] In some embodiments of this application, obtaining the candidate relation set associated with the topic entity set of the current round includes:
creating a request-level relation cache, which stores the mapping between queried entities and their candidate relation data for the lifecycle of a single query request;
for each entity in the topic entity set of the current round, checking whether the candidate relation data corresponding to the entity is already stored in the request-level relation cache; if it is, reading the candidate relation data for the entity from the cache; if it is not, recording the entity as a missing entity, querying the search engine for the candidate relation data corresponding to each missing entity, and storing the returned candidate relation data in the request-level relation cache; and
after all entities in the topic entity set of the current round have been processed, reading the candidate relation data corresponding to each of those entities from the request-level relation cache and aggregating it to form the candidate relation set associated with the topic entity set of the current round.
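A request-level relation cache of this kind can be sketched as a dictionary that lives for one query request. `RequestRelationCache` and the `fetch_batch` callable (standing in for the search engine query) are hypothetical; removing duplicate triples during aggregation is an added assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]


class RequestRelationCache:
    """Maps queried entities to their candidate triples for one query request."""

    def __init__(self, fetch_batch: Callable[[List[str]], Dict[str, List[Triple]]]):
        self._fetch_batch = fetch_batch   # hypothetical search-engine lookup
        self._store: Dict[str, List[Triple]] = {}

    def candidate_relations(self, topic_entities: Sequence[str]) -> List[Triple]:
        entities = list(dict.fromkeys(topic_entities))   # keep order, drop repeats
        missing = [e for e in entities if e not in self._store]
        if missing:
            fetched = self._fetch_batch(missing)         # one batched query
            for entity in missing:
                # Store an empty list too, so known-empty entities are not re-queried.
                self._store[entity] = fetched.get(entity, [])
        merged: List[Triple] = []
        seen = set()
        for entity in entities:
            for triple in self._store[entity]:
                if triple not in seen:                   # assumed de-duplication
                    seen.add(triple)
                    merged.append(triple)
        return merged
```

Creating one instance per incoming request and discarding it afterwards keeps the cache scoped to the request lifecycle, so no invalidation logic is needed.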

[0012] In some embodiments of this application, querying the search engine for the candidate relation data corresponding to each missing entity includes:
using all missing entities in the topic entity set of the current round as head entity conditions, batch querying the knowledge graph triples in the search engine to obtain a first query result;
traversing the first query result and marking as covered entities those missing entities that appear as tail entities in the first query result;
using the remaining missing entities, excluding the covered entities, as tail entity conditions, batch querying the knowledge graph triples in the search engine to obtain a second query result; and
combining the first query result and the second query result to form the candidate relation data.
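The two-phase batch query can be sketched as follows. `query_by_heads` and `query_by_tails` are hypothetical callables standing in for the search engine's batched triple lookups; skipping the second query when nothing remains and de-duplicating the merged result are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]
BatchQuery = Callable[[List[str]], List[Triple]]


def fetch_candidate_relations(missing: Sequence[str],
                              query_by_heads: BatchQuery,
                              query_by_tails: BatchQuery) -> List[Triple]:
    missing_list = list(missing)
    # Phase 1: all missing entities as head-entity conditions.
    first = query_by_heads(missing_list)
    missing_set = set(missing_list)
    # Missing entities already seen as a tail in phase 1 count as covered.
    covered = {tail for _, _, tail in first if tail in missing_set}
    remaining = [e for e in missing_list if e not in covered]
    # Phase 2: only the uncovered entities as tail-entity conditions.
    second = query_by_tails(remaining) if remaining else []
    return list(dict.fromkeys(first + second))   # merge, dropping duplicates
```

The covered-entity check narrows the second batch, so at most two search engine round trips are made per iteration round regardless of how many entities are missing.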

[0013] In some embodiments of this application, the progressive graph retrieval step further includes:
after obtaining the document fragments associated with the filtered relation set, formatting the document fragments obtained in the current iteration round and the filtered relation set as text, and using this text as the new content of the current iteration round;
if an evidence summary from the previous iteration round exists, submitting the new content of the current iteration round together with that evidence summary to the large language model for incremental analysis, so that the model outputs the list of answered aspects, the list of unanswered aspects, and the evidence summary of the current iteration round;
if no evidence summary from a previous iteration round exists, submitting only the new content of the current iteration round to the large language model for incremental analysis, so that the model outputs the list of answered aspects, the list of unanswered aspects, and the evidence summary of the current iteration round; and
before the large language model performs the incremental analysis, predicting the topic entity set of the next iteration round from the filtered relation set of the current iteration round and the document fragments newly added in the current iteration round, and using the prediction as prediction result data; while the large language model performs the incremental analysis, asynchronously and in parallel initiating in advance a relation query for the prediction result data against the current request-level relation cache.
The preset iteration termination conditions include: the current iteration round reaching the preset maximum number of iteration rounds, or a target condition being met. The target condition is that the list of unanswered aspects of the current iteration round is empty and the list of answered aspects of the current iteration round is not empty.
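The termination test and the overlap of incremental analysis with prefetching can be sketched with `asyncio`. `Analysis`, `should_stop`, and `analyse_and_prefetch` are hypothetical names, and the `analyse` and `prefetch` coroutines stand in for the large language model call and the cache-warming relation query.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class Analysis:
    answered: List[str]
    unanswered: List[str]
    summary: str


def should_stop(round_index: int, max_rounds: int, analysis: Analysis) -> bool:
    """Stop at the round limit, or when everything asked has been answered."""
    target_met = not analysis.unanswered and bool(analysis.answered)
    return round_index >= max_rounds or target_met


async def analyse_and_prefetch(
    analyse: Callable[[str, Optional[str]], Awaitable[Analysis]],
    prefetch: Callable[[List[str]], Awaitable[None]],
    new_content: str,
    previous_summary: Optional[str],
    predicted_entities: List[str],
) -> Analysis:
    # Run the (slow) incremental analysis and the relation prefetch concurrently,
    # so the next round finds its candidate relations already cached.
    analysis, _ = await asyncio.gather(
        analyse(new_content, previous_summary),
        prefetch(predicted_entities),
    )
    return analysis
```

If the analysis decides to stop, the prefetched relations are simply unused; the wasted work is a cheap graph lookup, while a correct prediction hides the lookup latency behind the model call.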

[0014] In some embodiments of this application, determining the topic entity set for the next iteration round based on the filtered relation set of the current iteration round includes:
extracting the head entities and tail entities from the filtered relation set of the current iteration round and excluding entities that have already been expanded, to obtain a candidate entity set, where the expanded entities are those queried in the previous iteration round and earlier rounds;
for each candidate entity in the candidate entity set, determining which of the acquired document fragments contain the candidate entity;
based on the similarity score, ranking position, and preset exponential decay factor of each document fragment containing the candidate entity, calculating the contribution score of each such document fragment to the candidate entity, and summing the contribution scores to obtain the importance score of the candidate entity; and
selecting the top 100 candidate entities with the highest importance scores as the topic entity set for the next iteration round.
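The entity selection above can be sketched as follows. The source does not give the contribution formula, so the sketch assumes `similarity * decay ** rank` with a 0-based rank; the substring test for whether a fragment contains an entity is also an assumption. `next_topic_entities` is a hypothetical name.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def next_topic_entities(filtered_relations: Iterable[Triple],
                        expanded: Set[str],
                        ranked_fragments: Sequence[Tuple[str, float]],
                        decay: float = 0.8,
                        top_k: int = 100) -> List[str]:
    """ranked_fragments: (text, similarity) pairs, already sorted best-first."""
    # Heads and tails of the surviving relations, minus already expanded entities.
    candidates = list(dict.fromkeys(
        entity
        for head, _, tail in filtered_relations
        for entity in (head, tail)
        if entity not in expanded
    ))
    importance = {}
    for entity in candidates:
        total = 0.0
        for rank, (text, similarity) in enumerate(ranked_fragments):
            if entity.lower() in text.lower():        # assumed containment test
                total += similarity * decay ** rank   # assumed contribution formula
        importance[entity] = total
    return sorted(candidates, key=lambda e: importance[e], reverse=True)[:top_k]
```

With this formula an entity mentioned in several highly ranked fragments outranks one mentioned only in a low-ranked fragment, which steers the next round toward well-supported entities.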

[0015] In some embodiments of this application, before determining the starting entity set based on the natural language question contained in the current query request, the method further includes:
invoking the large language model once to perform multiple analysis tasks simultaneously on the natural language question contained in the current query request, the tasks including query type classification, keyword extraction, sub-question decomposition, and key entity identification. Query type classification yields the query complexity type, which is one of a simple query type, a multi-hop query type, and an open reasoning query type. Keyword extraction yields a list of search keywords. Sub-question decomposition yields a sub-question decomposition structure, which includes a source entity, a target entity, and relation hints. Key entity identification yields a key entity list, whose entries are the names of entities identified from the natural language question that have corresponding nodes in the knowledge graph; and
based on the query complexity type of the natural language question, determining the maximum number of iteration rounds corresponding to that type, where the maximum number of iteration rounds is 1 for the simple query type and greater than 1 for the multi-hop query type or the open reasoning query type.
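One way to consume the single combined analysis call is to have the model reply in JSON and parse it into a typed structure. The JSON field names, the `QueryAnalysis` class, and the round limits of 3 and 4 are assumptions for illustration; the source only requires 1 round for simple queries and more than 1 otherwise.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class QueryType(Enum):
    SIMPLE = "simple"
    MULTI_HOP = "multi_hop"
    OPEN_REASONING = "open_reasoning"


# Assumed limits: only "1 for simple, more than 1 otherwise" comes from the text.
MAX_ROUNDS = {
    QueryType.SIMPLE: 1,
    QueryType.MULTI_HOP: 3,
    QueryType.OPEN_REASONING: 4,
}


@dataclass
class QueryAnalysis:
    query_type: QueryType
    keywords: List[str]
    sub_question: Dict[str, str]   # source entity, target entity, relation hint
    key_entities: List[str]

    @property
    def max_rounds(self) -> int:
        return MAX_ROUNDS[self.query_type]


def parse_analysis(llm_reply: str) -> QueryAnalysis:
    """Parse one JSON reply covering all four analysis tasks (assumed schema)."""
    data = json.loads(llm_reply)
    return QueryAnalysis(
        query_type=QueryType(data["query_type"]),
        keywords=list(data["keywords"]),
        sub_question=dict(data["sub_question"]),
        key_entities=list(data["key_entities"]),
    )
```

Batching the four tasks into one reply means preprocessing costs a single model round trip, and the resulting `max_rounds` caps how far the progressive retrieval can escalate.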

[0016] In some embodiments of this application, determining the starting entity set based on the natural language question contained in the current query request includes:
performing keyword retrieval and vector semantic retrieval based on the natural language question contained in the current query request to obtain a set of seed document fragments and the similarity score of each seed document fragment in the set;
arranging the seed document fragments in descending order of similarity score, and calculating the score difference between the first-ranked seed document fragment and the Nth-ranked seed document fragment, where N is a preset integer greater than 1; and
if the score difference is less than a preset confidence threshold, or the similarity score of the first-ranked seed document fragment is less than a preset minimum score threshold, extracting related entities from the knowledge graph triples associated with the seed document fragment set, and merging the related entities with the entities in the key entity list, with duplicates removed, to form the starting entity set.
The method further includes: if the score difference is greater than or equal to the preset confidence threshold, and the similarity score of the first-ranked seed document fragment is greater than or equal to the preset minimum score threshold, using the set of seed document fragments directly as the retrieval result for the natural language question.
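The confidence gate that decides between returning the seed fragments directly and starting graph retrieval can be sketched as below. The threshold values and the handling of fewer than N fragments (fall back to the last fragment, or proceed to graph retrieval when there are none) are assumptions; `needs_graph_retrieval` is a hypothetical name.

```python
from typing import Sequence


def needs_graph_retrieval(similarity_scores: Sequence[float],
                          n: int = 3,
                          confidence_threshold: float = 0.1,
                          min_top_score: float = 0.5) -> bool:
    """Return True when seed retrieval is not confident enough on its own."""
    ranked = sorted(similarity_scores, reverse=True)
    if not ranked:
        return True                      # assumption: no seeds means explore the graph
    top = ranked[0]
    nth = ranked[min(n, len(ranked)) - 1]   # assumption: clamp N to the list length
    score_gap = top - nth
    # A small gap (no clear winner) or a weak best hit triggers graph retrieval.
    return score_gap < confidence_threshold or top < min_top_score
```

When the function returns False, the seed fragments are returned as the result and no iteration round runs at all, which is the cheapest possible path through the method.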

[0017] In some embodiments of this application, the knowledge graph retrieval method further includes:
for each document fragment in the retrieval information, calculating a knowledge graph relevance score for the document fragment, where the knowledge graph relevance score is a weighted sum of an entity coverage score, a relational relevance score, and a proximity decay coefficient. The entity coverage score is the ratio of the number of expanded entities contained in the document fragment to the preset total number of expanded entities. The relational relevance score is obtained by logarithmically scaling the number of hits of candidate relation data associated with the document fragment. The proximity decay coefficient is determined by the iteration round from which the document fragment originates: the earlier the iteration round, the larger the coefficient;
determining a query type factor based on the query complexity type of the natural language question, calculating a coverage factor by applying a smoothing function to the entity coverage score, and multiplying the preset base knowledge graph weight, the query type factor, and the coverage factor to obtain the dynamic knowledge graph weight;
for each document fragment in the retrieval information, multiplying the similarity score of the document fragment by a target difference to obtain a first product, multiplying the knowledge graph relevance score by the dynamic knowledge graph weight to obtain a second product, and using the sum of the first product and the second product as the final score of the document fragment, where the target difference is 1 minus the dynamic knowledge graph weight;
sorting the document fragments in descending order of final score to obtain a sorted list of document fragments; and
outputting the sorted list of document fragments as the target retrieval result for the natural language question.
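The final scoring can be sketched as follows. The blend `similarity * (1 - w) + kg_score * w` follows the text; everything else is an assumption made to obtain runnable code: the component weights, the log scaling capped at 1, the geometric per-round decay, the square-root smoothing function, and the clamp of the dynamic weight to at most 1.

```python
import math


def kg_relevance(entity_hits: int, total_expanded: int, relation_hits: int,
                 round_index: int, *, weights=(0.5, 0.3, 0.2),
                 round_decay: float = 0.7, max_relation_hits: int = 10):
    """Return (knowledge graph relevance score, entity coverage score)."""
    coverage = entity_hits / total_expanded if total_expanded else 0.0
    # Logarithmic scaling of relation hits, capped at 1 (assumed form).
    relation = min(1.0, math.log1p(relation_hits) / math.log1p(max_relation_hits))
    # Earlier rounds (1-based) get a larger proximity coefficient (assumed form).
    proximity = round_decay ** (round_index - 1)
    w_cov, w_rel, w_prox = weights
    return w_cov * coverage + w_rel * relation + w_prox * proximity, coverage


def dynamic_kg_weight(base_weight: float, query_type_factor: float,
                      coverage: float) -> float:
    coverage_factor = 0.5 + 0.5 * math.sqrt(coverage)   # assumed smoothing function
    return min(1.0, base_weight * query_type_factor * coverage_factor)


def final_score(similarity: float, kg_score: float, kg_weight: float) -> float:
    # First product: similarity * (1 - w); second product: kg_score * w.
    return similarity * (1.0 - kg_weight) + kg_score * kg_weight
```

Because the two products use complementary weights, the final score stays on the same scale as its inputs, and a query type or coverage level that raises the dynamic weight shifts ranking influence from text similarity toward graph evidence.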

[0018] Another aspect of this application provides a knowledge graph retrieval system, comprising:
a preprocessing module, used to determine a starting entity set based on the natural language question contained in the current query request; and
a progressive graph retrieval module, used to take the starting entity set as the topic entity set of the first iteration round, perform at least one iteration round of a progressive graph retrieval step, and obtain retrieval information for the natural language question as the retrieval result for that question, where the retrieval information includes relationship information between different entities and/or document fragments.
The progressive graph retrieval step includes:
obtaining a candidate relation set associated with the topic entity set of the current round, where the candidate relation set contains multiple candidate relation data, and each candidate relation data is a triple in a preset knowledge graph representing the relationship between two entities;
based on the current iteration round, determining one relation filtering strategy applicable to the current iteration round from at least two preset relation filtering strategies, where the computational latency of the strategy applicable to an earlier iteration round is lower than that of the strategy applicable to a later iteration round;
filtering the candidate relation set of the current round using the strategy applicable to the current iteration round to obtain the filtered relation set of the current iteration round, and obtaining the document fragments associated with the filtered relation set; and
if the current iteration round does not meet the preset iteration termination condition, determining the topic entity set for the next iteration round based on the filtered relation set of the current iteration round.

[0019] A third aspect of this application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the knowledge graph retrieval method described above.

[0020] A fourth aspect of this application provides a computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the knowledge graph retrieval method described above.

[0021] A fifth aspect of this application provides a computer program product, including a computer program that, when executed by a processor, implements the knowledge graph retrieval method described above.

[0022] The knowledge graph retrieval method provided in this application determines a starting entity set based on the natural language question contained in the current query request; uses the starting entity set as the topic entity set for the first iteration round; and performs at least one iteration round of the progressive graph retrieval step to obtain retrieval information for the natural language question as the retrieval result for that question, where the retrieval information includes relationship information between different entities and/or document fragments. The progressive graph retrieval step includes: obtaining a candidate relation set associated with the topic entity set of the current round, where the candidate relation set contains multiple candidate relation data and each candidate relation data is a triple in a preset knowledge graph representing the relationship between two entities; based on the current iteration round, determining a relation filtering strategy applicable to the current iteration round from at least two preset relation filtering strategies, where the computational latency of the strategy applicable to an earlier iteration round is lower than that of the strategy applicable to a later iteration round; filtering the candidate relation set of the current round using the applicable strategy to obtain the filtered relation set of the current iteration round, and obtaining the document fragments associated with the filtered relation set; and, if the current iteration round does not meet the preset iteration termination condition, determining the topic entity set of the next iteration round based on the filtered relation set of the current iteration round. Because the applicable strategy is chosen from at least two preset relation filtering strategies according to the current iteration round, and the strategies used in earlier rounds have lower computational latency than those used in later rounds, a low-latency filtering method can quickly filter candidate relations in the early rounds of iterative knowledge graph retrieval, and a high-latency filtering method is used for precise filtering only in the later deep exploration rounds. This reduces the frequency of calls to high-latency computing resources during iterative retrieval, the end-to-end response latency of a single retrieval request, and the resource occupancy of the computing devices executing the retrieval process.

[0023] Additional advantages, objectives, and features of this application will be set forth in part in the description which follows, and will in part become apparent to those skilled in the art upon review of the following description, or may be learned by practice of the application. The objectives and other advantages of this application can be realized and obtained by means of the structures specifically pointed out in the specification and drawings.

[0024] Those skilled in the art will understand that the purposes and advantages that can be achieved with this application are not limited to those specifically described above, and that the above and other purposes that this application can achieve will be more clearly understood from the following detailed description.

Description of the Drawings

[0025] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, do not constitute a limitation thereof. The components in the drawings are not drawn to scale but merely illustrate the principles of this application. For ease of illustration and description of certain parts of this application, corresponding portions in the drawings may be enlarged, i.e., may appear larger relative to other components than in an exemplary device actually manufactured according to this application. In the drawings:
Figure 1 is a first schematic flowchart of a knowledge graph retrieval method in one embodiment of this application.

[0026] Figure 2 is a first schematic flowchart of the progressive graph retrieval step in one embodiment of this application.

[0027] Figure 3 is a second schematic flowchart of the progressive graph retrieval step in one embodiment of this application.

[0028] Figure 4 is a second schematic flowchart of a knowledge graph retrieval method in one embodiment of this application.

[0029] Figure 5 is a schematic diagram of the structure of a knowledge graph retrieval system according to an embodiment of this application.

[0030] Figure 6 is a flowchart illustrating the progressive knowledge graph retrieval method in an application example of this application.

[0031] Figure 7 is a diagram illustrating the working principle of the relation cache in an application example of this application.

[0032] Figure 8 is a schematic diagram illustrating the escalation of the progressive relation filtering strategy in an application example of this application.

[0033] Figure 9 is a state transition diagram of the circuit breaker and degradation mechanism in an application example of this application.

Detailed Description

[0034] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the embodiments and accompanying drawings. Here, the illustrative embodiments and their descriptions are used to explain this application, but are not intended to limit it.

[0035] It should also be noted that, in order to avoid obscuring this application with unnecessary details, only the structures and / or processing steps closely related to the solution according to this application are shown in the accompanying drawings, while other details that are not closely related to this application are omitted.

[0036] It should be emphasized that the term "including / comprises" as used herein refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.

[0037] It should also be noted that, unless otherwise specified, the term "connection" in this article can refer not only to a direct connection, but also to an indirect connection involving an intermediary.

[0038] In the following description, embodiments of the present application will be illustrated with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar parts, or the same or similar steps.

[0039] First, the main existing knowledge graph retrieval schemes are introduced.

(1) Option 1: GraphRAG method. GraphRAG is a knowledge graph retrieval method proposed by Microsoft based on community detection. Its algorithm consists of two stages: offline indexing and online retrieval. In the offline stage, entities and relationships are extracted from documents using a large language model to construct a knowledge graph. The Leiden community detection algorithm then divides the graph into multi-level communities, and a summary report is generated for each community. In the online retrieval stage, a multi-path recall strategy is employed: vector retrieval recalls entity nodes relevant to the query, matching entities are filtered by answer-type keywords, and indirectly related entities and relationships are then discovered through N-hop path expansion. Finally, the summary reports of the communities to which the entities belong constitute the retrieval context.

N-hop path expansion is a graph traversal method that starts from the initial entity and gradually expands outward along relation edges to N levels of neighboring nodes. The larger N is, the wider the coverage, but the more noise is introduced. The scoring formula is Final_Score = Sim × PageRank, that is, the vector similarity Sim multiplied by the PageRank importance of the node, which reflects the idea of Bayesian posterior probability. PageRank is a node importance scoring algorithm based on graph structure. It iteratively calculates the probability that a node is visited by a random walk on the graph; the higher the score, the more important the node.

[0040] (2) Option 2: LightRAG method. LightRAG is a lightweight knowledge graph retrieval method. Its core idea is to simplify the graph construction and retrieval process of GraphRAG: it uses lighter-weight large language model prompts for entity and relation extraction and omits the community detection step; during retrieval, it recalls directly at the entity and relation levels through a combination of keyword matching and vector retrieval, without relying on community reports. While LightRAG reduces the large language model overhead of graph construction, its retrieval algorithm is essentially still a flat recall based on vector similarity and lacks deep utilization of the graph structure.

[0041] (3) Option 3: KG2RAG method. KG2RAG is a knowledge graph-guided retrieval enhancement method. Its algorithm consists of three stages. The first stage obtains seed document fragments through hybrid retrieval combining BM25 and vector retrieval (weight ratio 0.05:0.95). The second stage starts from the triples associated with the seed fragments and expands the graph by a fixed M hops using breadth-first search (BFS) to discover more relevant entities and relationships. The third stage removes redundant triples by constructing a weighted graph and calculating the maximum spanning tree (MST), retaining the most relevant knowledge structures. Finally, a cross-encoder reranks the results based on a dual representation of triples and text. The core innovation of this method is that each triple traces back to its source document fragment, achieving a bidirectional association between the knowledge graph structure and the document fragments.

Here, BM25 is a classic sparse text retrieval algorithm based on term frequency and inverse document frequency, which is good at exact keyword matching. Cross-encoder reranking is a fine-ranking method that concatenates the query and a candidate document and inputs them into a cross-encoder model to output a relevance score; it has high accuracy but high computational cost. The maximum spanning tree (MST) is the spanning tree with the largest sum of weights in a weighted graph; it is used to remove redundant edges from a large number of triples and retain the most relevant knowledge structure.

[0042] (4) Option 4: ToG-2 (Think-on-Graph 2) method. ToG-2 is a tightly coupled iterative exploration method driven by a large language model. Its algorithm flow is as follows. In the initialization phase, named entity recognition and vector matching link the query to knowledge graph entities, and the large language model selects the best starting entity. In the iterative exploration phase, each round executes three steps: 1) relation discovery: batch querying all relations of the current entity; 2) relation pruning: the large language model scores and filters each relation based on its relevance; 3) entity discovery: extracting new candidate entities from the filtered relations. After each round, entity pruning based on document fragment relevance selects the starting point for the next round of exploration, and the large language model determines whether the current information is sufficient. The core innovation of this method is the tight feedback loop in which the knowledge graph structure guides document retrieval and the document context, in turn, guides entity pruning.

[0043] (5) Option 5: PathRAG method. PathRAG is a path retrieval method based on flow propagation pruning. Its algorithm consists of three stages. The first stage extracts query keywords using a large language model and then matches relevant entity nodes in the knowledge graph using vector retrieval. The second stage, the core of the algorithm, is the flow-based path pruning algorithm, which performs resource propagation for each pair of query-related nodes: an initial resource value of 1.0 is assigned to the starting node, and the resources are then distributed evenly to neighboring nodes according to out-degree using breadth-first search (multiplied by a decay coefficient α). Propagation stops when a node's resource value falls below the pruning threshold θ. All paths from the starting node to the target node are then backtracked, and the Top-K paths are selected based on their average resource value. The third stage converts the structured paths into text descriptions and uses a "Lost-in-Middle" optimization strategy to arrange the path order.
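The resource propagation described for the second stage can be illustrated with a simplified sketch. It is not PathRAG's actual implementation: the function name `propagate_resources`, the adjacency-list input, and the default values of `alpha` and `theta` are assumptions made for illustration, and each node forwards its resources only once, in breadth-first order.

```python
from collections import deque


def propagate_resources(adjacency, start, alpha=0.8, theta=0.05):
    """Spread a resource value of 1.0 outward from `start`.

    adjacency: dict mapping a node to the list of its out-neighbors.
    alpha:     decay coefficient applied at every hop (assumed value).
    theta:     pruning threshold; nodes below it stop propagating (assumed value).
    """
    resources = {start: 1.0}
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        neighbors = adjacency.get(node, [])
        # Stop at dead ends and at nodes whose resource fell below the threshold.
        if not neighbors or resources[node] < theta:
            continue
        # Split the decayed resource evenly across the out-edges.
        share = alpha * resources[node] / len(neighbors)
        for neighbor in neighbors:
            resources[neighbor] = resources.get(neighbor, 0.0) + share
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return resources


# Hypothetical four-node graph: a -> b -> d and a -> c -> d.
EXAMPLE_GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

With a larger `theta`, propagation is cut off earlier and distant nodes receive no resource at all, which is the pruning effect the description refers to.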

[0044] However, KG2RAG uses graph-based MST for redundancy removal, neglecting the semantic relevance between relations and queries; GraphRAG relies on implicitly defined community boundaries, making fine-grained filtering of individual relations impossible; PathRAG uses resource propagation values as the filtering criterion, which is essentially a structural metric based on graph topology, without semantic understanding; ToG-2 uses a large language model to score and filter relations throughout, achieving high accuracy but introducing latency in each iteration. LightRAG does not involve an iterative relation filtering approach. Currently, no existing method employs a filtering strategy that adaptively upgrades with exploration depth.

[0045] Based on this, in order to solve the problems of high computational latency and high resource consumption caused by frequent calls to large language models in existing iterative knowledge graph retrieval, this application provides a knowledge graph retrieval method, a knowledge graph retrieval system for executing the knowledge graph retrieval method, an electronic device, a computer-readable storage medium, and a computer program product. By adopting a low-latency screening strategy in the early stage of retrieval and activating the large language model only in the deep exploration stage, the number of model calls and waiting time can be significantly reduced. While ensuring retrieval accuracy, it can effectively reduce end-to-end response latency and computing device resource occupancy.

[0046] The following examples will provide a detailed description.

[0047] Accordingly, embodiments of this application provide a knowledge graph retrieval method that can be implemented by a knowledge graph retrieval system. This method can be applied to a retrieval-augmented generation (RAG) system to perform iterative retrieval on a preset knowledge graph (KG) to obtain retrieval information related to the natural language question input by the user. Referring to Figure 1, the knowledge graph retrieval method specifically includes the following:

Step 100: Determine the starting entity set based on the natural language question contained in the current query request.

[0048] It can be understood that a natural language question refers to a query entered by a user in natural language form, such as "Who invested in the company founded by entrepreneur A?". The starting entity set refers to a set of one or more entities used to initiate iterative exploration within the knowledge graph. These entities can be obtained from key entities identified in the natural language question, or from entities associated with seed document fragments obtained through hybrid retrieval; this embodiment does not specifically limit this.

[0049] In this context, an entity is a node in a predefined knowledge graph. A knowledge graph is a structured knowledge base organized in a graph structure, consisting of entities (nodes) and the relationships (edges) between entities. Each edge is represented as a triple (head entity, relationship description, tail entity). A triple is the smallest unit of knowledge in a knowledge graph, such as (entrepreneur A, founded, company B).

[0050] Step 200: Using the starting entity set as the topic entity set for the first iteration round, perform at least one iteration round of the progressive graph retrieval step to obtain retrieval information for the natural language question, which serves as the retrieval result for the natural language question; the retrieval information includes: relationship information between different entities and document fragments related to the natural language question.

[0051] In one or more embodiments of this application, the topic entity set refers to the set of entities that serve as the starting point for graph retrieval in the current iteration round. In the first iteration round, the topic entity set is the starting entity set determined in step 100; in subsequent iteration rounds, the topic entity set is determined by the filtering results of the previous iteration.

[0052] It should be noted that retrieval information refers to content related to the natural language question that is obtained from the knowledge graph and associated documents after one or more rounds of graph retrieval. Retrieval information may include relationship information between different entities extracted from the knowledge graph, such as lists of triples or descriptions of relationship paths; it may include document chunks retrieved from the document repository; or it may include both. A document chunk is a text fragment obtained by segmenting a long document semantically or by a fixed length, and is the basic unit of retrieval.

[0053] In step 200, the progressive graph retrieval step specifically includes the following steps:

Step 210: Obtain the candidate relation set associated with the topic entity set of the current iteration round; the candidate relation set contains multiple candidate relation data, and each candidate relation data is a triple in the preset knowledge graph used to represent the relationship between two entities.

[0054] It should be noted that the candidate relation set refers to the set of all triples retrieved from the knowledge graph, using entities in the current round's topic entity set as endpoints. Each candidate relation data is a triple.

[0055] The action of obtaining the candidate relation set may include: for each entity (also called the topic entity) in the topic entity set (or simply entity set), using the entity as the head entity and tail entity respectively, querying the knowledge graph for all triples connected to it; combining the queried triples and removing duplicates to form the candidate relation set.
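The acquisition of the candidate relation set can be sketched as follows. This is a minimal illustration that assumes the knowledge graph is available as an in-memory list of triples; the `Triple` type, the `candidate_relations` function, and the sample data are hypothetical names introduced here, and a real system would query a graph database instead of scanning a list.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the knowledge graph: (head entity, relation description, tail entity)."""
    head: str
    relation: str
    tail: str


def candidate_relations(graph, topic_entities):
    """Return every triple that has a topic entity as head or tail, without duplicates."""
    topics = set(topic_entities)
    seen = set()
    result = []
    for triple in graph:
        touches_topic = triple.head in topics or triple.tail in topics
        if touches_topic and triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


# Hypothetical sample graph; the last entry duplicates the first on purpose.
SAMPLE_GRAPH = [
    Triple("Entrepreneur A", "founded", "Company B"),
    Triple("Entrepreneur A", "born in", "City D"),
    Triple("Investor C", "invests in", "Company B"),
    Triple("Entrepreneur A", "founded", "Company B"),
]
```

Calling `candidate_relations(SAMPLE_GRAPH, ["Company B"])` keeps the triples in which "Company B" appears on either side and drops the repeated one.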

[0056] Step 220: Based on the current iteration round, determine the relation filtering strategy applicable to the current iteration round from at least two preset relation filtering strategies; across the iteration rounds, the computational latency of the relation filtering strategy applicable to an earlier iteration round is lower than that of the relation filtering strategy applicable to a later iteration round.

[0057] It can be understood that a relation filtering strategy refers to a method used to filter a candidate relation set by relevance. At least two relation filtering strategies are preset in this method, and they differ in computational latency. For example, a first relation filtering strategy and a second relation filtering strategy can be preset, with the computational latency of the first strategy being lower than that of the second strategy. Computational latency refers to the time cost required to execute a relation filtering strategy. For example, the computational latency of a rule-based relation filtering strategy is approximately 0 milliseconds, that of a vector similarity-based relation filtering strategy is approximately 50 milliseconds, and that of a large language model-based relation filtering strategy is approximately 3 seconds.

[0058] During the iteration process, the appropriate strategy can be automatically selected based on the current iteration round: in earlier iteration rounds (such as round 1), a strategy with lower computational latency is used to quickly filter a large number of candidate relations; in later iteration rounds (such as round 3 and later), a strategy with higher computational latency is used to accurately filter key relations. This hierarchical filtering mechanism reduces the frequency of calls to high-latency computing resources (such as large language models), thereby reducing end-to-end retrieval latency.

[0059] Step 230: Use the relation filtering strategy applicable to the current iteration round to filter the candidate relation set of the current iteration round, obtain the filtered relation set of the current iteration round, and obtain the document fragments associated with the filtered relation set.

[0060] The filtered relation set refers to the subset of triples retained from the candidate relation set that are highly relevant to the natural language question. The specific implementation of the filtering action depends on the relation filtering strategy adopted, such as keyword matching, vector similarity calculation, or semantic model scoring. This embodiment does not impose any specific limitations.

[0061] The action of obtaining document fragments associated with the filtered relationship set may include: extracting document fragment identifiers from the source document identifiers carried by each candidate relationship data in the filtered relationship set, or obtaining them from the document fragment identifiers associated with the head and tail entities involved in the candidate relationship data; merging and deduplicating the above document fragment identifiers, and then batch querying the document retrieval engine to obtain the corresponding document fragment text content and similarity score. A document retrieval engine is a search engine used to retrieve and return document fragment content based on document fragment identifiers.
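The collection of document fragments can be sketched as below. The sketch treats the two identifier sources as a primary source and a fallback, which is one possible reading of the paragraph above; the dictionaries `chunk_ids_by_triple`, `chunk_ids_by_entity`, and `chunk_store` are hypothetical stand-ins for the source identifiers carried by triples, the identifiers associated with entities, and the document retrieval engine.

```python
def collect_chunk_ids(filtered_triples, chunk_ids_by_triple, chunk_ids_by_entity):
    """Merge and deduplicate document fragment identifiers, keeping first-seen order."""
    ordered = {}
    for triple in filtered_triples:
        head, _relation, tail = triple
        # Prefer identifiers carried by the triple itself.
        ids = chunk_ids_by_triple.get(triple)
        if not ids:
            # Fall back to identifiers associated with the head and tail entities.
            ids = chunk_ids_by_entity.get(head, []) + chunk_ids_by_entity.get(tail, [])
        for chunk_id in ids:
            ordered.setdefault(chunk_id, None)
    return list(ordered)


def fetch_chunks(chunk_ids, chunk_store):
    """Batch lookup: return (identifier, text, similarity score) for each known identifier."""
    return [(cid, *chunk_store[cid]) for cid in chunk_ids if cid in chunk_store]
```

Deduplicating before the batch query means each fragment is fetched once even when several retained triples point to it.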

[0062] Step 240: If the current iteration round does not meet the preset iteration termination condition, determine the topic entity set for the next iteration round based on the filtered relation set of the current iteration round.

[0063] It should be noted that the preset iteration termination condition is used to determine whether to continue to the next iteration. This condition can be preset according to actual needs, such as reaching the preset maximum number of iterations, or the current information being sufficient to answer the natural language question.

[0064] If the current iteration round does not meet the iteration termination condition, the head entities and tail entities are extracted from the filtered relation set. After deduplication and exclusion of entities that have already been expanded, a portion of these entities is selected as the topic entity set for the next iteration round, and the process returns to step 210 to continue execution. If the iteration termination condition is met, the iteration is terminated, and the retrieval information is output as the retrieval result for the natural language question.

[0065] As can be seen from the above description, the knowledge graph retrieval method provided in this application determines the applicable strategy from at least two preset relationship filtering strategies according to the current iteration round. The computational latency of the strategy used in earlier iteration rounds is lower than that of the strategy used in later iteration rounds. This allows for the rapid filtering of candidate relationships using a low-latency filtering method in the early rounds of iterative knowledge graph retrieval, while enabling a high-latency filtering method for precise filtering only in later deep exploration rounds. This reduces the frequency of high-latency computing resource calls during iterative retrieval, lowers the end-to-end response latency of a single retrieval request, and reduces the resource occupancy rate of computing devices executing the retrieval process.
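Steps 210 to 240 can be tied together in a compact sketch. It makes several assumptions that are not fixed by the description: triples are plain `(head, relation, tail)` tuples held in a list, each strategy is a function that takes the candidate list and returns the retained triples, the last strategy is reused once the round number exceeds the number of strategies, iteration ends at `max_rounds` or when no new topic entities remain, and document fragment retrieval is left out for brevity.

```python
def progressive_retrieve(graph, start_entities, strategies, max_rounds=3):
    """Run progressive graph retrieval and return the retained triples in order.

    strategies: filter functions ordered from lowest to highest latency.
    """
    topics = list(start_entities)
    expanded = set()
    retrieved = []
    for round_number in range(1, max_rounds + 1):
        if not topics:
            break
        # Step 210: candidate triples touching any topic entity, deduplicated.
        topic_set = set(topics)
        candidates = []
        for triple in graph:
            head, _relation, tail = triple
            if (head in topic_set or tail in topic_set) and triple not in candidates:
                candidates.append(triple)
        # Step 220: pick the strategy for this round (later rounds reuse the last one).
        strategy = strategies[min(round_number, len(strategies)) - 1]
        # Step 230: filter the candidates with the selected strategy.
        kept = strategy(candidates)
        for triple in kept:
            if triple not in retrieved:
                retrieved.append(triple)
        # Step 240: new, not yet expanded entities become the next topic entities.
        expanded.update(topics)
        topics = []
        for head, _relation, tail in kept:
            for entity in (head, tail):
                if entity not in expanded and entity not in topics:
                    topics.append(entity)
    return retrieved
```

Because the strategy is looked up by round number, the expensive filter is reached only if the cheaper rounds leave something to explore.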

[0066] To further address the issue of how to specifically implement the hierarchical relation filtering mechanism, this application provides a knowledge graph retrieval method that clarifies the specific types of three filtering strategies and their correspondence with iteration rounds. The types of relation filtering strategies in the knowledge graph retrieval method specifically include the following:

(1) The first relation filtering strategy (which may be simply referred to as rule filtering) is applicable to the first iteration round. The first relation filtering strategy filters based on the degree of overlap between the words contained in the candidate relation data and the words in the natural language question.

[0067] The degree of overlap can be measured by the number of words in the candidate relation data that are shared with the words in the natural language question. The greater the overlap, the stronger the literal correlation between the candidate relation and the natural language question.

[0068] (2) The second relation filtering strategy (which may be simply referred to as vector filtering or vector retrieval) is applicable to the second iteration round. The second relation filtering strategy filters based on the similarity between the vector representation of the candidate relation data and the vector representation of the natural language question.

[0069] A vector representation is the numerical sequence obtained by converting text into a high-dimensional dense vector through an embedding model. The similarity between two vectors can be calculated using cosine similarity; a higher similarity indicates stronger semantic relevance.

[0070] (3) The third relation filtering strategy (which may be simply referred to as large language model filtering) is applicable to the third and subsequent iteration rounds. The third relation filtering strategy filters based on the relevance score that the large language model assigns to the candidate relation data with respect to the natural language question.

[0071] The relevance score is the score output by the large language model after quantitatively evaluating the semantic relevance between the candidate relation data and the natural language question.

[0072] The first relation filtering strategy is applicable to the first iteration round (i.e., round 1); the second relation filtering strategy is applicable to the second iteration round (i.e., round 2); and the third relation filtering strategy is applicable to the third and subsequent iteration rounds (i.e., round 3, round 4, etc.).
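The round-to-strategy correspondence can be written as a small lookup. The strategy labels `"rule"`, `"vector"`, and `"llm"` and both function names are hypothetical; the latency figures are the approximate values quoted earlier in this description, and the estimate assumes exactly one filtering call per round.

```python
# Approximate per-call latencies in milliseconds, as quoted in the description.
APPROX_LATENCY_MS = {"rule": 0, "vector": 50, "llm": 3000}


def select_strategy(round_number):
    """Map an iteration round (numbered from 1) to a relation filtering strategy."""
    if round_number < 1:
        raise ValueError("iteration rounds are numbered from 1")
    if round_number == 1:
        return "rule"
    if round_number == 2:
        return "vector"
    return "llm"


def estimated_filter_latency_ms(rounds):
    """Sum the approximate filtering latency over the given number of rounds."""
    return sum(APPROX_LATENCY_MS[select_strategy(r)] for r in range(1, rounds + 1))
```

Under these assumptions, three rounds cost about 3050 ms of filtering time, compared with about 9000 ms if the large language model were called in every round.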

[0073] Take the natural language question "Who invested in the company founded by entrepreneur A?" as an example:

(1) In the first iteration round (round 1), an entity in the starting entity set (such as "entrepreneur A") is used as the topic entity, and the set of candidate relations associated with it is obtained. This set contains multiple triples, such as (entrepreneur A, founded, company B), (entrepreneur A, founded, company W), and (entrepreneur A, born in, city D). The first relation filtering strategy is then applied: the number of overlaps between the words contained in each candidate relation data and the words in the natural language question is calculated. For (entrepreneur A, founded, company B), the words it contains overlap strongly with the words in the question, whereas (entrepreneur A, born in, city D) has a low degree of overlap. Based on the degree of overlap, candidate relations that are literally related to the question, such as the "founded" relation, are retained, while irrelevant relations such as "born in" are filtered out, yielding the filtered relation set of the first round.

[0074] (2) In the second iteration round (round 2), the new entities extracted from the filtered relation set of the previous round (such as "Company B" and "Company W") are used as topic entities, and the set of candidate relations associated with them is obtained. This set contains multiple triples, such as (Investor A, invests, Company B), (Investor B, invests, Company W), and (Company B, located in, city D). The second relation filtering strategy is then applied: the descriptive text of each candidate relation data (such as "Investor A invests in Company B") and the natural language question are encoded into vector representations through an embedding model, and the cosine similarity between the two vectors is calculated. Candidate relations related to investment (such as "Investor A invests in Company B") have a high semantic similarity to the question, while candidate relations related to location (such as "Company B is located in city D") have a low semantic similarity. Based on the vector similarity, candidate relations that are not semantically related are filtered out, yielding the filtered relation set of the second round.

[0075] (3) In the third and subsequent iteration rounds (round 3 and beyond), if further expansion is required, the third relation filtering strategy is applied: the candidate relation data is formatted as text and submitted to the large language model, the large language model outputs a relevance score (e.g., 0 to 10 points), and precise filtering is performed based on that score.
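The third strategy can be sketched with the large language model call abstracted away. The `score_fn` parameter is a hypothetical stand-in for whatever client returns a 0 to 10 relevance score; the threshold of 6.0, the clamping of out-of-range scores, and the function names are assumptions, not values taken from this description.

```python
def format_triple(triple):
    """Render a (head, relation, tail) triple as plain text for the model prompt."""
    head, relation, tail = triple
    return f"{head} {relation} {tail}"


def llm_filter(question, candidates, score_fn, threshold=6.0):
    """Keep triples whose model-assigned relevance score reaches the threshold.

    score_fn(question, relation_text) must return a number; it is expected to
    wrap a large language model call in a real system.
    """
    scored = []
    for triple in candidates:
        raw = score_fn(question, format_triple(triple))
        score = max(0.0, min(10.0, float(raw)))  # keep scores inside 0..10
        if score >= threshold:
            scored.append((triple, score))
    # Highest-scoring relations first; ties keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [triple for triple, _score in scored]
```

Injecting `score_fn` keeps the filter testable without a model and makes it easy to swap in batching or caching around the real call.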

[0076] As can be seen from the above description, the knowledge graph retrieval method provided in this application can complete fast filtering with low computational latency in the early stage of retrieval by adopting a first relation filtering strategy based on word overlap in the first round, a second relation filtering strategy based on vector similarity in the second round, and a third relation filtering strategy based on large language model scoring in the third round and thereafter. It only activates a high-precision large language model in the deep exploration stage, thereby effectively reducing the overall computational latency and resource consumption while ensuring retrieval accuracy.

[0077] To further address the issue of achieving effective relation filtering with low computational latency in the first iteration round, this application provides a knowledge graph retrieval method that clarifies a composite scoring rule based on word overlap and key entity matching. Referring to Figure 2, if the relation filtering strategy applicable to the current iteration round is the first relation filtering strategy, then the first implementation of step 230 of the knowledge graph retrieval method specifically includes the following:

Step 2311: For each candidate relation data, calculate the number of overlaps between the words it contains and the words contained in the natural language question. Multiply the ratio of the number of overlaps to the total number of words in the natural language question by a preset word overlap weight to obtain the word overlap score of the candidate relation data.

[0078] Step 2312: Obtain the list of key entities corresponding to the natural language question; the key entities stored in the list are entity names that are identified from the natural language question and that have corresponding nodes in the knowledge graph.

[0079] Step 2313: For each candidate relation data, when the head entity in the candidate relation data matches a key entity in the key entity list, or when the tail entity in the candidate relation data matches a key entity in the key entity list, a preset entity matching bonus is added to the word overlap score. Also, the sub-question decomposition structure corresponding to the natural language question is obtained. The sub-question decomposition structure includes source entity, target entity, and relation hints. For each candidate relation data, when the head entity and tail entity in the candidate relation data match the source entity and target entity in the sub-question decomposition structure, a preset target matching bonus is further added to the current score, and the increased score is used as the current score.

[0080] Step 2314: Divide the current score by a preset normalization divisor to obtain the screening score of the candidate relationship data.

[0081] Step 2315: Based on the screening scores of each candidate relationship data, select a portion of the candidate relationship data from the candidate relationship set as the filtered relationship set for the current iteration round.

[0082] It should be noted that the word overlap weight is a preset coefficient used to adjust the proportion of the word overlap score in the overall score. The word overlap score is obtained by multiplying the ratio of the number of overlapping words to the total number of words in the natural language question by this weight. The key entity list is a list of entity names identified from the natural language question that play a crucial role in answering it; each entity in the list is called a key entity. Key entity identification can be accomplished through named entity recognition, semantic analysis, or similar means, and this embodiment does not impose specific limitations. The entity matching bonus is a preset extra score: when the head or tail entity in the candidate relation data matches a key entity name in the key entity list, this score is added to the original score to reflect the higher importance of relations involving a key entity. The normalization divisor is a preset divisor used to normalize the accumulated score; dividing by it brings the screening score into a numerical range that is easy to compare. The screening score is the quantitative score calculated according to the above rules and used to sort and filter candidate relation data. The current score is an intermediate score in the calculation, namely the word overlap score plus any entity matching bonus and target matching bonus that apply.

[0083] Take the natural language question "Who invested in the company founded by entrepreneur A?" as an example. Assume that the question contains 6 valid words after word segmentation, the preset word overlap weight is 5.0, the entity matching bonus is 2.0, the target matching bonus is 3.0, and the normalization divisor is 10.0. In the first iteration round, entities in the starting entity set (such as "entrepreneur A") form the topic entity set, and a set of candidate relations associated with them is obtained, which contains multiple triples, such as:

(1) Candidate relation data 1: (Entrepreneur A, founded, Company B);
(2) Candidate relation data 2: (Entrepreneur A, born in, city D).

[0084] First, calculate the word overlap score for each candidate relation data. In candidate relation data 1, the word "founded" overlaps with "founded" in the natural language question, giving an overlap count of 1. Therefore, the word overlap score is (1 / 6) × 5.0 ≈ 0.833. In candidate relation data 2, the word "born" does not appear in the natural language question, giving an overlap count of 0. Therefore, the word overlap score is 0.

[0085] Second, obtain the list of key entities corresponding to the natural language question. Suppose that the key entity identified from the question is "Entrepreneur A", that is, "Entrepreneur A" is included in the list of key entities, and that the target entity in the sub-question decomposition structure is "Company B".

[0086] Next, each candidate relation data is checked for matches. The head entity of candidate relation data 1 is "Entrepreneur A", which matches a key entity in the key entity list, so the entity matching bonus of 2.0 is added to the word overlap score of 0.833. Its tail entity "Company B" matches the target entity in the sub-question decomposition structure, so the target matching bonus of 3.0 is also added, bringing the current score to 5.833. The head entity of candidate relation data 2 is also "Entrepreneur A"; after adding the entity matching bonus of 2.0 to its word overlap score of 0, the current score is 2.0.

[0087] Then, the current score is divided by the normalization divisor 10.0 to obtain the screening score. The screening score for candidate relation data 1 is 5.833 / 10.0 ≈ 0.583, and the screening score for candidate relation data 2 is 2.0 / 10.0 = 0.2.

[0088] Finally, the candidate relation data is filtered based on the screening scores of each candidate relation data, and the candidate relation data with higher scores (such as candidate relation data 1) is retained as the filtered relation set for the current iteration round.
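Steps 2311 to 2315 and the worked example above can be reproduced with a short sketch. Several details are assumptions made so that the numbers line up with the example: overlap is counted on the words of the relation description only, the six question words are a hypothetical segmentation result (the description does not list them), the source entity of the sub-question is taken to be "Entrepreneur A", and the names `ScoringConfig`, `screening_score`, and `filter_by_score` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringConfig:
    overlap_weight: float = 5.0   # preset word overlap weight
    entity_bonus: float = 2.0     # preset entity matching bonus
    target_bonus: float = 3.0     # preset target matching bonus
    normalizer: float = 10.0      # preset normalization divisor


def screening_score(triple, question_words, key_entities, sub_question, cfg=ScoringConfig()):
    """Composite rule score for one (head, relation, tail) triple."""
    head, relation, tail = triple
    # Step 2311: overlap ratio times the word overlap weight.
    overlap = len(set(relation.lower().split()) & set(question_words))
    score = overlap / len(question_words) * cfg.overlap_weight
    # Step 2313: bonus when the head or tail entity is a key entity.
    if head in key_entities or tail in key_entities:
        score += cfg.entity_bonus
    # Step 2313 (continued): bonus when head and tail match the sub-question.
    if head == sub_question["source"] and tail == sub_question["target"]:
        score += cfg.target_bonus
    # Step 2314: normalize.
    return score / cfg.normalizer


def filter_by_score(candidates, question_words, key_entities, sub_question, top_k=1):
    """Step 2315: keep the top_k candidates by screening score."""
    ranked = sorted(
        candidates,
        key=lambda t: screening_score(t, question_words, key_entities, sub_question),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical segmentation of the example question into 6 valid words.
QUESTION_WORDS = ["who", "invested", "company", "founded", "entrepreneur", "a"]
KEY_ENTITIES = {"Entrepreneur A"}
SUB_QUESTION = {"source": "Entrepreneur A", "target": "Company B"}
```

With these inputs, `(Entrepreneur A, founded, Company B)` scores about 0.583 and `(Entrepreneur A, born in, City D)` scores 0.2, matching the figures in the example, and no embedding model or large language model is called.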

[0089] As can be seen from the above description, the knowledge graph retrieval method provided in this application, through a composite scoring rule that combines the word overlap score with key entity and target matching bonuses, can effectively filter candidate relations in the first iteration round without relying on large language model calls, achieve fast filtering with near-zero computational latency, and further reduce the overall retrieval latency and resource consumption.

[0090] To further address the issue of achieving semantic-level relation filtering with low computational latency in the second iteration, this application provides a knowledge graph retrieval method that explicitly includes a filtering approach based on vector representation similarity; see Figure 2. If the relation filtering strategy applicable to the current iteration round is the second relation filtering strategy, then the second implementation of step 230 of the knowledge graph retrieval method specifically includes the following: Step 232: Encode the description text of each candidate relation data and the natural language question into vector representations through an embedding model. Calculate the similarity between the vector representation of each candidate relation data and the vector representation of the natural language question. Based on the similarity, filter the candidate relation data and retain the candidate relation data with higher similarity as the filtered relation set.

[0091] Similarity refers to the degree of similarity between two vector representations, which can be calculated using cosine similarity. The cosine similarity value typically ranges from -1 to 1; the closer the value is to 1, the more consistent the directions of the two vectors are, and the more similar their semantics.

[0092] Take the natural language question "Who invested in the company founded by entrepreneur A?" as an example. After the first relation filtering strategy is applied in the first iteration, the first round of filtered relations is obtained, which includes candidate relation data related to entrepreneur A's founding behavior, such as (Entrepreneur A, founded, Company B). In the second iteration, the new entities extracted from the first round of filtered relations (such as "Company B") are added to the subject entity set, and a set of candidate relations associated with them is obtained, which includes multiple triples, such as: (1) Candidate relation data 3: (Investor C, invested in, Company B); (2) Candidate relation data 4: (Company B, located in, City D).

[0093] First, the descriptive text and natural language question of each candidate relation data are encoded into vector representations using an embedding model. For example, the text "Investor C invests in Company B" of candidate relation data 3 is encoded into vector V3, the text "Company B is located in city D" of candidate relation data 4 is encoded into vector V4, and the natural language question "Who invested in the company founded by entrepreneur A" is encoded into vector Vq.

[0094] Next, the similarity between the vector representations of each candidate relation data and the vector representation of the natural language question is calculated. Assuming the calculated cosine similarity between vector V3 and vector Vq is 0.85, and the cosine similarity between vector V4 and vector Vq is 0.12, it can be seen that candidate relation data 3, related to investment relationships, has a higher semantic similarity to the question, while candidate relation data 4, related to location relationships, has a lower semantic similarity.

[0095] Then, the calculated similarity scores are used as the screening scores for each candidate relation data. The screening score for candidate relation data 3 is 0.85, and the screening score for candidate relation data 4 is 0.12.

[0096] Finally, the candidate relation data is filtered based on the screening scores of each candidate relation data. Candidate relation data with higher scores (such as candidate relation data 3) are retained, while candidate relation data with lower scores (such as candidate relation data 4) are removed. The retained candidate relation data is used as the filtered relation set for the current iteration round.
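
A minimal sketch of the similarity-based filtering described above, under stated assumptions: the embedding model is not specified in this passage, so the vectors below are hand-made toy values rather than real embeddings, and retaining the `top_k` highest-scoring candidates is one possible reading of "retain the candidate relation data with higher similarity". The function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(
    question_vec: Sequence[float],
    candidate_vecs: Dict[str, Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every candidate against the question and keep the top_k (assumed rule)."""
    scored = [
        (name, cosine_similarity(vec, question_vec))
        for name, vec in candidate_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for Vq, V3 and V4; they are NOT real embeddings.
vq = [1.0, 0.0, 0.0]
toy_candidates = {
    "candidate_3": [0.8, 0.6, 0.0],
    "candidate_4": [0.0, 0.6, 0.8],
}
retained = filter_by_similarity(vq, toy_candidates, top_k=1)
```

With these toy vectors only `candidate_3` is retained, mirroring the outcome of the example; the similarity values themselves differ from the 0.85 and 0.12 assumed in the text.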

[0097] As can be seen from the above description, the knowledge graph retrieval method provided in this application encodes candidate relation data and natural language questions into vectors and calculates similarity. It can achieve semantic-based relevance judgment in the second iteration with lower computational latency. Compared with the first round of literal overlap screening, it has higher semantic understanding ability, and compared with large language model screening, it has lower computational latency, thus achieving a balance between accuracy and efficiency.

[0098] To further address the challenges of achieving high-precision semantic filtering in the third and subsequent iterations, and of keeping the retrieval process available when large language model calls fail, this application provides a knowledge graph retrieval method; see Figure 2. If the relation filtering strategy applicable to the current iteration is the third relation filtering strategy, then the third implementation of step 230 of the knowledge graph retrieval method specifically includes the following: Step 2331: Check whether the consecutive failure counter of the large language model call has reached the preset circuit breaker threshold; if yes, proceed to step 2332; if no, proceed to step 2333.

[0099] Step 2332: Abandon calling the large language model and instead use the second relation filtering strategy to filter the candidate relation set for the current iteration round.

[0100] Step 2333: Format all candidate relation data in the candidate relation set into text, and then submit the formatted text and the natural language question to the large language model to obtain the relevance score of each candidate relation data in the candidate relation set to the natural language question output by the large language model; if the acquisition is successful, reset the consecutive failure count counter to zero; if the acquisition fails, increment the consecutive failure count counter by one and discard the scores of all candidate relation data in this round.

[0101] Step 2334: Based on the relevance score of each candidate relationship data, select a portion of the candidate relationship data from the candidate relationship set as the filtered relationship set for the current iteration round.

[0102] The `consecutive_failure_counter` is a counter variable used to record the number of consecutive failures in large language model calls. The counter increments by one each time a call fails and resets to zero upon success. The `circuit_breaker_threshold` is a preset upper limit on the number of consecutive failures. When the consecutive failure counter reaches this threshold, the circuit breaker mechanism is triggered, temporarily abandoning calls to the large language model. A circuit breaker is a fault-tolerant design pattern that automatically cuts off calls when a service experiences consecutive failures reaching a threshold, preventing continuous invalid calls from slowing down the system, and automatically resumes attempts after a period of time. Circuit breaker degradation is a fault-tolerant handling method that, when the circuit breaker mechanism is triggered, abandons calls to the large language model and instead uses the second relation filtering strategy for filtering.
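
Steps 2331 to 2334 and the counter semantics can be sketched as below. This is an illustration under assumptions: `llm_score` and `fallback_filter` are hypothetical callables supplied by the caller, keeping the `top_k` best-scored candidates is an assumed selection rule, and falling back to the second strategy for the round in which a call fails is an assumption (the text only states that the scores of that round are discarded). The timed automatic resumption mentioned for circuit breakers is omitted for brevity.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
LlmScorer = Callable[[str, Sequence[Triple]], Dict[Triple, float]]
FallbackFilter = Callable[[str, Sequence[Triple]], List[Triple]]


class RelationFilterWithCircuitBreaker:
    def __init__(
        self,
        llm_score: LlmScorer,            # hypothetical LLM scoring call
        fallback_filter: FallbackFilter,  # e.g. the second (vector) strategy
        circuit_breaker_threshold: int = 2,
        top_k: int = 1,
    ) -> None:
        self.llm_score = llm_score
        self.fallback_filter = fallback_filter
        self.circuit_breaker_threshold = circuit_breaker_threshold
        self.top_k = top_k
        self.consecutive_failure_counter = 0

    def filter(self, question: str, candidates: Sequence[Triple]) -> List[Triple]:
        # Step 2331: has the counter reached the threshold?
        if self.consecutive_failure_counter >= self.circuit_breaker_threshold:
            # Step 2332: skip the LLM and degrade to the second strategy.
            return self.fallback_filter(question, candidates)
        try:
            # Step 2333: ask the LLM for relevance scores.
            scores = self.llm_score(question, candidates)
        except Exception:
            self.consecutive_failure_counter += 1
            # Assumption: the failed round is also served by the fallback.
            return self.fallback_filter(question, candidates)
        self.consecutive_failure_counter = 0
        # Step 2334: keep the best-scored candidates (assumed top_k rule).
        ranked = sorted(candidates, key=lambda c: scores.get(c, 0.0), reverse=True)
        return ranked[: self.top_k]
```

Keeping the counter on the filter object means the failure history persists across iteration rounds, which is what allows a later round to skip the large language model entirely.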

[0103] Take the natural language question "Who invested in the company founded by entrepreneur A?" as an example. Assume that after the first two iterations of filtering and expansion, in the third iteration, the new entities extracted from the relation set after the second round of filtering (such as "Investor C") are used as entities in the subject entity set. A candidate relation set associated with this entity is obtained, containing multiple triples, such as: (1) Candidate relation data 5: (Investor C, invested in, Company E); (2) Candidate relation data 6: (Investor C, employed by, Institution F).

[0104] First, format each candidate relationship data as text. For example, format candidate relationship data 5 as "Investor C invests in company E", and format candidate relationship data 6 as "Investor C works for institution F".

[0105] Secondly, check whether the counter for consecutive failures of the large language model call has reached the preset circuit breaker threshold (e.g., the preset threshold is 2).

[0106] If the number of consecutive failures reaches the circuit breaker threshold, the large language model is abandoned, and a second relation filtering strategy is adopted to filter the candidate relation set in the current iteration. The specific implementation method of the second relation filtering strategy can refer to the aforementioned embodiment, that is, after encoding the candidate relation data and the natural language question into vector representations through an embedding model, the similarity is calculated, and the filtering is performed based on the similarity.

[0107] If the number of consecutive failures does not reach the circuit breaker threshold, the formatted text and the natural language question are submitted together to the large language model to obtain the relevance scores of each candidate relation data and the natural language question output by the large language model. For example, after submission, the large language model outputs a relevance score of 9 for candidate relation data 5 and a relevance score of 2 for candidate relation data 6.

[0108] After the submission to the large language model, update the consecutive failure counter based on the call result: if the scores are successfully obtained, the consecutive failure counter is reset to zero; if the call fails (e.g., timeout or service unavailable), the consecutive failure counter is incremented by one, and the scoring of the current candidate relation data is abandoned.

[0109] Finally, the candidate relation data is filtered based on its relevance score. Candidate relation data 5 scores 9 points, and candidate relation data 6 scores 2 points. Candidate relation data 5, with the higher score, is retained as the filtered relation set for the current iteration.

[0110] As can be seen from the above description, the knowledge graph retrieval method provided in this application can achieve semantic-level precise screening of candidate relations by enabling a large language model to perform accurate relevance scoring in deep iteration rounds; at the same time, through the circuit breaker degradation mechanism, it automatically switches to vector similarity screening when the large language model fails to be called continuously, avoiding interruption of the retrieval process due to external service failures, and improving the robustness and availability of the retrieval method.

[0111] In existing technologies, during the iterative exploration of ToG-2, relation discovery, document fragment retrieval, and entity pruning are performed independently by querying search engines to obtain relation data. Relationship queries for the same batch of entities are repeatedly executed within the same request. KG2RAG's graph expansion and MST construction also query relation information separately. While PathRAG's subgraph loading retrieves a local subgraph at once, it is limited to a fixed range (2-3 hops) and does not support incremental expansion and cache reuse during iteration. Existing methods lack request-level relation data caching and sharing mechanisms. In other words, existing technologies suffer from the problem of relation queries being repeatedly executed across multiple steps, lacking cross-step query reuse.

[0112] Based on this, in order to further address the problems of repeated execution of relation queries across multiple steps and lack of cross-step query reuse in existing methods, a knowledge graph retrieval method provided in this application embodiment specifies a reuse mechanism based on request-level relation caching; see Figure 2. Step 210 of the knowledge graph retrieval method further includes the following: Step 211: Create a request-level relationship cache; wherein the request-level relationship cache is used to store the mapping relationship between the queried entities and candidate relationship data within the lifecycle of a single query request.

[0113] Step 212: For each entity in the topic entity set of the current round, check whether the candidate relationship data corresponding to the entity has been stored in the request-level relationship cache; if it has been stored, read the candidate relationship data corresponding to the entity from the request-level relationship cache; if it has not been stored, record the entity as a missing entity, and initiate a query to the search engine to obtain the candidate relationship data corresponding to each missing entity, and store the queried candidate relationship data into the request-level relationship cache.

[0114] Step 213: After processing all entities in the topic entity set of the current round, read the candidate relationship data corresponding to each entity in the topic entity set of the current round from the request-level relationship cache, and summarize them to form a candidate relationship set associated with the topic entity set of the current round.

[0115] The request-level relation cache is a temporary storage area created within the lifecycle of a single query request. It stores the mapping between retrieved entities and candidate relation data. This cache is created at the start of a request and destroyed at the end, with caches from different requests isolated from each other. A missing entity is an entity whose corresponding candidate relation data was not found in the request-level relation cache for the current round. For these entities, a query must be initiated with the search engine to obtain their associated candidate relation data. The mapping relationship is the correspondence between an entity name and the list of candidate relation data corresponding to that entity.

[0116] Take the natural language question "Who invested in the company founded by entrepreneur A?" as an example. In the current iteration round, the subject entity set includes the following entities: entrepreneur A and company B.

[0117] First, a request-level relationship cache is created. This cache stores the mapping between retrieved entities and candidate relationship data throughout the lifecycle of this query request.

[0118] Secondly, for each entity in the subject entity set, check if the request-level relationship cache already stores candidate relationship data for that entity. Assuming the current cache is empty, then neither Entrepreneur A nor Company B will be matched.

[0119] Record entrepreneur A and company B as missing entities, and send a query to the search engine to obtain candidate relation data for each missing entity. Assume the search engine returns the following query results: the candidate relation data for entrepreneur A includes (entrepreneur A, founded, company B) and (entrepreneur A, born in, city D); the candidate relation data for company B includes (investor C, invested in, company B) and (company B, located in, city D).

[0120] The retrieved candidate relationship data is stored in the request-level relationship cache. At this point, the cache stores the mapping relationship between entrepreneur A and their candidate relationship data, as well as the mapping relationship between company B and its candidate relationship data.

[0121] After processing all entities in the current round's topic entity set, candidate relationship data corresponding to each entity in the current round's topic entity set is read from the request-level relationship cache and aggregated to form a candidate relationship set associated with the current round's topic entity set. This candidate relationship set includes: (Entrepreneur A, founded, Company B), (Entrepreneur A, born in, city D), (Investor C, invested in, Company B), and (Company B, located in, city D).

[0122] In subsequent iterations, if the subject entity set again contains entrepreneur A or company B, the candidate relationship data corresponding to that entity is read directly from the request-level relationship cache without having to query the search engine again.
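
Steps 211 to 213 can be sketched as a small per-request cache. The sketch below is illustrative only: `RequestRelationCache`, `toy_search_engine`, and `TOY_RESULTS` are hypothetical names, the search engine is replaced by an in-memory stub, and caching an empty list for entities with no results as well as removing duplicate triples while aggregating are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
SearchEngine = Callable[[Sequence[str]], Dict[str, List[Triple]]]


class RequestRelationCache:
    """Entity -> candidate relation data, living for one query request only."""

    def __init__(self, search_engine: SearchEngine) -> None:
        self._search_engine = search_engine
        self._cache: Dict[str, List[Triple]] = {}  # Step 211

    def candidate_relations(self, topic_entities: Sequence[str]) -> List[Triple]:
        # Step 212: only entities missing from the cache are queried.
        missing = [e for e in topic_entities if e not in self._cache]
        if missing:
            results = self._search_engine(missing)
            for entity in missing:
                # Assumption: empty results are cached too, to avoid re-querying.
                self._cache[entity] = results.get(entity, [])
        # Step 213: aggregate from the cache, dropping duplicate triples.
        merged: List[Triple] = []
        seen = set()
        for entity in topic_entities:
            for triple in self._cache[entity]:
                if triple not in seen:
                    seen.add(triple)
                    merged.append(triple)
        return merged


# In-memory stand-in for the search engine, using the triples of the example.
TOY_RESULTS: Dict[str, List[Triple]] = {
    "Entrepreneur A": [
        ("Entrepreneur A", "founded", "Company B"),
        ("Entrepreneur A", "born in", "City D"),
    ],
    "Company B": [
        ("Investor C", "invested in", "Company B"),
        ("Company B", "located in", "City D"),
    ],
}
queried_batches: List[List[str]] = []


def toy_search_engine(entities: Sequence[str]) -> Dict[str, List[Triple]]:
    queried_batches.append(list(entities))  # record each call for inspection
    return {e: TOY_RESULTS.get(e, []) for e in entities}
```

Creating one `RequestRelationCache` per incoming request and discarding it afterwards gives the isolation between requests described above; a second call with the same entities is served from the cache without reaching `toy_search_engine`.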

[0123] As can be seen from the above description, the knowledge graph retrieval method provided in this application creates a request-level relationship cache within the lifecycle of a single query request, stores and reuses the mapping relationship between the queried entities and candidate relationship data, and can avoid repeatedly querying the relationship data of the same entity to the search engine in the same request, thereby reducing the number of search engine calls and waiting time, and further reducing the overall retrieval computation latency and resource consumption.

[0124] To further address the issue of high search engine call counts and response latency caused by initiating separate queries for multiple missing entities in existing methods, this application provides a knowledge graph retrieval method that explicitly utilizes a dual-round batch query optimization mechanism. Step 212 of the knowledge graph retrieval method, which involves initiating a query to the search engine to obtain candidate relation data corresponding to each missing entity, specifically includes the following: (1) Using all the missing entities in the current round of topic entity set as head entity conditions, query the knowledge graph triples in batches to the search engine to obtain the first query result.

[0125] (2) Traverse the first query result and mark as covered entities those missing entities that appear as tail entities in the first query result.

[0126] (3) Take the missing entities other than the covered entities (the remaining entities) as tail entity conditions, and query the search engine in batch for knowledge graph triples to obtain the second query result.

[0127] (4) Combine the first query result and the second query result to form the candidate relationship data.

[0128] The terms used above are defined as follows. Head entity condition: a query condition that uses an entity as the head entity of a triple, for example, querying all triples whose head entity is "Entrepreneur A". Tail entity condition: a query condition that uses an entity as the tail entity of a triple, for example, querying all triples whose tail entity is "Company B". Covered entities are missing entities that appear as tail entities in the first query result. Since relation data in which these entities act as tail entities has already been indirectly obtained in the first query, there is no need to query them again with tail entity conditions in the second query. Remaining entities are the missing entities that are not covered entities. Relation data in which these entities act as tail entities was not covered by the first query and needs to be obtained in the second query.

[0129] Taking the natural language question "Who invested in the company founded by entrepreneur A?" as an example, assume that the missing entities in the current round's subject entity set are: entrepreneur A, company B, and investor C.

[0130] First, using all the missing entities (Entrepreneur A, Company B, Investor C) in the current round's topic entity set as head entity conditions, perform a batch query on the search engine for knowledge graph triples to obtain the first query result. Assume the first query result contains the following triples: (Entrepreneur A, founded, Company B), (Entrepreneur A, born in, city D), and (Company B, located in, city D).

[0131] Secondly, the first query results are traversed, and entities that appear as tail entities and belong to the missing entities are marked as covered entities. In the first query results, tail entities include "Company B" and "City D". "Company B" is a missing entity, while "City D" is not. Therefore, "Company B" is marked as a covered entity.

[0132] Then, the missing entities that are not covered are used as tail entity conditions to perform a batch query for knowledge graph triples and obtain the second query result. The missing entities are Entrepreneur A, Company B, and Investor C. Company B is already covered, so the remaining entities are Entrepreneur A and Investor C. Using Entrepreneur A and Investor C as tail entity conditions, a batch query is performed. Assume the second query result contains the following triple: (Investor E, invested in, Entrepreneur A).

[0133] Finally, the results of the first and second queries are merged to form the candidate relation data. The merged candidate relation data includes: (Entrepreneur A, founded, Company B), (Entrepreneur A, born in, city D), (Company B, located in, city D), and (Investor E, invested in, Entrepreneur A).

[0134] The above merged results are the candidate relationship data corresponding to each missing entity obtained after querying the search engine, which can be stored in the request-level relationship cache for later use.
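
The two-round batch query can be sketched as follows. The triple store is an in-memory list holding exactly the four triples of the example, and `query_by_heads` and `query_by_tails` are hypothetical stand-ins for the two batch calls to the search engine; removing duplicates when merging is an assumption.

```python
from typing import Callable, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
BatchQuery = Callable[[Sequence[str]], List[Triple]]


def two_round_batch_query(
    missing: Sequence[str],
    query_by_heads: BatchQuery,
    query_by_tails: BatchQuery,
) -> Tuple[List[Triple], Set[str], List[str]]:
    # (1) One batch query with all missing entities as head entity conditions.
    first = query_by_heads(missing)
    # (2) Missing entities that already appear as tails are marked as covered.
    missing_set = set(missing)
    covered = {tail for _head, _rel, tail in first if tail in missing_set}
    # (3) One batch query with the remaining entities as tail entity conditions.
    remaining = [e for e in missing if e not in covered]
    second = query_by_tails(remaining) if remaining else []
    # (4) Merge both results, keeping order and dropping duplicates.
    merged = list(dict.fromkeys(first + second))
    return merged, covered, remaining


# Toy triple store built from the example; not a real search engine.
TOY_TRIPLES: List[Triple] = [
    ("Entrepreneur A", "founded", "Company B"),
    ("Entrepreneur A", "born in", "City D"),
    ("Company B", "located in", "City D"),
    ("Investor E", "invested in", "Entrepreneur A"),
]


def query_by_heads(heads: Sequence[str]) -> List[Triple]:
    wanted = set(heads)
    return [t for t in TOY_TRIPLES if t[0] in wanted]


def query_by_tails(tails: Sequence[str]) -> List[Triple]:
    wanted = set(tails)
    return [t for t in TOY_TRIPLES if t[2] in wanted]


merged, covered, remaining = two_round_batch_query(
    ["Entrepreneur A", "Company B", "Investor C"], query_by_heads, query_by_tails
)
```

On the toy store, `covered` contains only "Company B", `remaining` is Entrepreneur A and Investor C, and `merged` holds the same four triples listed in the example, obtained with two calls instead of one call per entity.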

[0135] As can be seen from the above description, the knowledge graph retrieval method provided in this application embodiment can reduce the number of relation queries for multiple entities within the same request from multiple queries one by one to two batch queries by batch querying in the first round with the missing entities as the head entity condition and marking the covered entities, and in the second round only querying the remaining entities with the tail entity condition. This significantly reduces the number of search engine calls and network round-trip time, and further reduces the overall retrieval computation latency and resource consumption.

[0136] In existing technologies, ToG-2 submits all accumulated retrieved document fragments to a large language model for sufficiency assessment after each iteration. As the number of iterations increases, the accumulated document content grows linearly, requiring the large language model to repeatedly process previously analyzed content, resulting in a waste of input tokens. Furthermore, GraphRAG, KG2RAG, and PathRAG do not perform evidence evaluation during the iteration process; they only return results once all retrievals are complete, and therefore cannot judge sufficiency and terminate early. In other words, existing technologies suffer from a crude evidence evaluation method and a lack of incremental analysis during iterative exploration.

[0137] To further address the problems in existing methods of crude evidence evaluation and lack of incremental analysis during iterative exploration, which waste input tokens of the large language model, as well as the problem of redundant iterations caused by the lack of an early termination mechanism, a knowledge graph retrieval method is provided in this application embodiment; see Figure 3. The progressive graph retrieval steps in the knowledge graph retrieval method further include the following content between steps 230 and 240: Step 250: After obtaining the document fragments associated with the filtered relationship set, format the document fragments obtained in the current iteration and the filtered relationship set into text as the new content for the current iteration. Step 260: If there is an evidence summary output from the previous iteration, submit the new content of the current iteration and the evidence summary output from the previous iteration to the large language model for incremental analysis, so that the large language model outputs the list of answered aspects, the list of unanswered aspects, and the evidence summary of the current iteration. Step 270: If there is no evidence summary output from the previous iteration, submit only the new content of the current iteration to the large language model for incremental analysis, so that the large language model outputs the list of answered aspects, the list of unanswered aspects, and the evidence summary of the current iteration. Step 280: Before the incremental analysis is performed by the large language model, predict, based on the filtered relation set of the current iteration round and the document fragments added in the current iteration round, the set of topic entities likely to be involved in the next iteration round, and take it as the prediction result data. While the large language model performs the incremental analysis, the following is executed asynchronously and in parallel: a relation query for the prediction result data is initiated in advance against the current request-level relation cache. The preset iteration termination conditions include: the current iteration round reaches the preset maximum iteration round, or the target condition has been met; the target condition includes: the unanswered aspect list of the current iteration round is empty and the answered aspect list of the current iteration round is not empty.

[0138] Then proceed to step 240.

[0139] The newly added content consists of document fragments acquired in the current iteration that did not appear in previous iterations, and text formatted from filtered relation sets. This content is one of the inputs for incremental sufficiency assessment. The evidence summary is a concise summary of the currently accumulated evidence output by the large language model after incremental analysis. The evidence summary is used for incremental analysis in the next iteration, enabling the large language model to perform continuous reasoning based on historical analysis results. Incremental analysis is a sufficiency analysis performed by the large language model based solely on the newly added content of the current iteration and the evidence summary output from the previous iteration, rather than a repetitive analysis of all accumulated content. The answered aspect list is a list of sub-questions or information dimensions in a natural language question that have been explicitly answered by the currently acquired information, output by the large language model. The unanswered aspect list is a list of sub-questions or information dimensions in a natural language question that have not yet been answered by the currently acquired information, output by the large language model. Prediction result data is data obtained by predicting the topic entities that may be involved in the next iteration based on the filtered set of relationships in the current iteration round. The target condition is a criterion used to determine whether the information obtained is sufficient; specifically, the list of unanswered aspects in the current iteration round is empty, while the list of answered aspects is not empty. Asynchronous parallel execution means that two tasks are executed concurrently within the same time period, without blocking each other's execution progress.

[0140] Take the natural language question "Who invested in the company founded by entrepreneur A?" as an example. Suppose we are currently in the second iteration round, and we have completed the filtering of the relation set and the acquisition of its associated document fragments. The evidence summary output in the previous iteration round (the first round) was: "Entrepreneur A founded company B, but no investor information has been found yet."

[0141] First, format the document fragments and the filtered relation set obtained in the current iteration into text, and use this as the new content for the current iteration. Assume that the filtered relation set in this round contains (Investor C, invested in, Company B), and the obtained document fragment contains the description "Investor C invests 50 million yuan in Company B". The new content after formatting into text is: "Relationship: Investor C invests in Company B; Document fragment: Investor C invests 50 million yuan in Company B".

[0142] Secondly, since there is an evidence summary output from the previous iteration, the new content of the current iteration and the evidence summary output from the previous iteration are submitted to the large language model for incremental analysis.

[0143] After analyzing the submitted content, the large language model outputs the following for the current iteration: (1) List of answered aspects: includes "Who is the investor?" (the new content shows that the investor is Investor C); (2) List of unanswered aspects: empty (the question only requires finding the investor, who has now been found); (3) Evidence summary: "Entrepreneur A founded Company B, and Investor C invested 50 million yuan in Company B."

[0144] Meanwhile, during the execution of the above incremental analysis in the large language model, the relation prefetching step is executed asynchronously in parallel: based on the filtered relation set of the current iteration round (such as investor C investing in company B), the topic entities that may be involved in the next iteration round (such as investor C) are predicted, and relation queries for the predicted result data are initiated in advance to the current request-level relation cache so that the candidate relation set can be read directly from the cache when it is obtained in the next iteration round.
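
The overlap between incremental analysis and relation prefetching can be sketched with `asyncio`. Everything named here is an assumption made for illustration: `analyze` stands for the large language model call, `prefetch` for the relation query issued through the request-level relation cache, the input layout produced by `build_analysis_input` is invented, and both stubs simply sleep to simulate waiting.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Sequence

Analysis = Dict[str, object]


def build_analysis_input(new_content: str, previous_summary: Optional[str]) -> str:
    """Steps 260/270: include the previous summary only if one exists."""
    if previous_summary:
        return f"Previous evidence summary: {previous_summary}\nNew content: {new_content}"
    return f"New content: {new_content}"


async def analyze_with_prefetch(
    new_content: str,
    previous_summary: Optional[str],
    predicted_entities: Sequence[str],
    analyze: Callable[[str], Awaitable[Analysis]],
    prefetch: Callable[[Sequence[str]], Awaitable[None]],
) -> Analysis:
    analysis_input = build_analysis_input(new_content, previous_summary)
    # Both coroutines run concurrently; neither blocks the other.
    analysis, _ = await asyncio.gather(
        analyze(analysis_input), prefetch(predicted_entities)
    )
    return analysis


# Stubs standing in for the LLM call and the cache-backed relation query.
prefetched_entities: List[str] = []


async def stub_analyze(analysis_input: str) -> Analysis:
    await asyncio.sleep(0.01)  # simulated LLM latency
    return {"answered": [], "unanswered": [], "summary": analysis_input}


async def stub_prefetch(entities: Sequence[str]) -> None:
    await asyncio.sleep(0.01)  # simulated search engine latency
    prefetched_entities.extend(entities)
```

A call such as `asyncio.run(analyze_with_prefetch(new_content, summary, ["Investor C"], stub_analyze, stub_prefetch))` returns the analysis once both tasks have finished, so the prefetch adds no waiting time as long as it is not slower than the analysis.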

[0145] After the incremental analysis is completed, it is determined whether the current iteration round meets the preset iteration termination condition. In this round, the list of unanswered aspects is empty and the list of answered aspects is not empty, which meets the target condition. Therefore, it is determined that the preset iteration termination condition is met, the iteration is terminated, and the step of determining the set of topic entities for the next round is no longer executed.
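
The termination check applied here is small enough to state directly. The sketch below assumes rounds are counted from 1 and that both aspect lists are plain lists of strings; the function name is illustrative.

```python
from typing import Sequence


def should_terminate(
    current_round: int,
    max_rounds: int,
    answered: Sequence[str],
    unanswered: Sequence[str],
) -> bool:
    """True if the maximum round is reached or the target condition is met."""
    # Target condition: nothing left unanswered AND something was answered.
    target_met = len(unanswered) == 0 and len(answered) > 0
    return current_round >= max_rounds or target_met
```

Requiring a non-empty answered list prevents a round in which the model returns two empty lists from being mistaken for a sufficient answer.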

[0146] As described above, the knowledge graph retrieval method provided in this application, through an incremental evidence extraction mechanism, allows the large language model to process only the newly added content of the current iteration in each round and analyze it in conjunction with the evidence summary of the previous round, thus avoiding repeated processing of previously analyzed content and reducing the number of input tokens for the large language model; through an asynchronous parallel prefetching mechanism, it can pre-query the relation data that may be used in the next round while waiting for the analysis results of the large language model, reducing end-to-end latency; and through a sufficiency judgment mechanism, it can terminate the iteration early when the information is sufficient, avoiding unnecessary deep exploration and further reducing computational latency and resource consumption.

[0147] To further address the challenge of accurately selecting the starting point for the next round of exploration from a large pool of candidate entities, so as to control the exploration scope and reduce the introduction of noise, this application provides a knowledge graph retrieval method that explicitly employs an entity pruning mechanism based on exponential decay weighting; see Figure 2. Step 240 of the knowledge graph retrieval method further includes the following: Step 241: If the current iteration does not meet the preset iteration termination condition, extract the head entities and tail entities from the filtered relation set of the current iteration, exclude the expanded entities, and obtain the candidate entity set; wherein the expanded entities include entities that have been queried in the previous iteration and earlier.

[0148] Step 242: For each candidate entity in the candidate entity set, determine which of the acquired document fragments contain the candidate entity. Based on the similarity score, ranking position, and preset exponential decay factor of each document fragment containing the candidate entity, calculate the contribution score of each document fragment to the candidate entity. Sum the contribution scores to obtain the importance score of the candidate entity.

[0149] Step 243: Select the top 100 candidate entities with the highest importance scores as the set of topic entities for the next iteration.

[0150] Expanded entities are entities that have already been used as topic entities in relation queries in the previous iteration round or earlier rounds. Excluding these entities avoids duplicate exploration of the same entity. The candidate entity set is the set of entities obtained by extracting head and tail entities from the filtered relation set of the current iteration round and excluding expanded entities. Entities in this set are candidate sources for topic entities in the next round. The ranking position is the position of a document fragment within the acquired document fragment set, sorted in descending order of similarity score or relevance. The higher the rank (i.e., the smaller the ranking position value), the higher the relevance of the document fragment to the natural language question. The exponential decay factor is a preset decay coefficient used to calculate the contribution score. The contribution score decays exponentially with the ranking position of the document fragment; the lower a fragment is ranked, the smaller its contribution to the entity importance score. The contribution score is the contribution of a single document fragment containing a candidate entity to that candidate entity's importance score, calculated based on the document fragment's similarity score, its ranking position, and the exponential decay factor. The importance score is a comprehensive evaluation of a candidate entity, calculated by summing the contribution scores of all document fragments containing that candidate entity. A higher importance score indicates greater relevance and importance of the candidate entity within the current retrieval context.

[0151] Take the natural language question "Who invested in the company founded by entrepreneur A?" as an example. Assume the method is currently in the second iteration round and the filtered relation set for this round has been obtained, which includes the following candidate relation data: (Investor C, Investment, Company B) and (Company B, Location, City D).

[0152] The acquired document fragment set contains the following document fragments (sorted in descending order of similarity score): Document fragment P1 ranked 1st: content involves "Investor C invests 50 million yuan in Company B"; Document fragment P2 ranked 2nd: content involves "Company B performs well in the Asian market"; Document fragment P3 ranked 3rd: content involves "City D is a transportation hub".

[0153] First, extract the head and tail entities from the filtered relation set of the current iteration, including: "Investor C", "Company B", and "City D". Assume that "Entrepreneur A" and "Company B" have already been queried in previous iterations and are considered expanded entities; therefore, exclude them. The resulting candidate entity set after exclusion includes: "Investor C" and "City D".

[0154] Secondly, for each candidate entity in the candidate entity set, determine the document fragments that contain that candidate entity from the acquired document fragments.

[0155] For the candidate entity "Investor C", the document fragment containing this entity is P1; for the candidate entity "City D", the document fragment containing this entity is P3.

[0156] Then, based on the similarity score and ranking position of each document fragment containing the candidate entity and the preset exponential decay factor, the contribution score of each document fragment to the candidate entity is calculated. Assume the preset exponential decay factor is 0.5 and the ranking position of the document fragments starts from 0 (i.e., the ranking position of the document fragment ranked 1st is 0). The contribution score is calculated as: document similarity score × exp(-decay factor × ranking position).

[0157] The similarity score of document fragment P1 is 0.85, its ranking position is 0, and its contribution score to the candidate entity "Investor C" is: 0.85×exp(-0.5×0)=0.85. The similarity score of document fragment P3 is 0.55, its ranking position is 2, and its contribution score to the candidate entity "City D" is: 0.55×exp(-0.5×2)≈0.202.

[0158] The importance score of each candidate entity is obtained by summing its contribution scores. The importance score of candidate entity "Investor C" is 0.85, and the importance score of candidate entity "City D" is 0.202.

[0159] Finally, select the candidate entities with the highest importance scores, up to a preset number, as the topic entity set for the next iteration round. In this example, assuming for illustration that the preset number is 1, "Investor C" is selected as the topic entity set for the next iteration round.

[0160] As can be seen from the above description, the knowledge graph retrieval method provided in this application calculates the importance score of candidate entities based on the document fragment ranking position and the exponential decay factor, and selects the candidate entity with the highest score as the topic entity in the next round. This can prioritize retaining entities that are highly relevant to the current retrieval context for in-depth exploration, suppress the expansion of entities that are weakly related to the question, thereby controlling the blind growth of the exploration scope, reducing the introduction of irrelevant data, and further improving retrieval efficiency and result accuracy.
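Steps 241 to 243 can be sketched in Python as follows. This is an illustrative sketch only: the function name, the substring test used to decide whether a fragment contains an entity, and the `top_k` parameter are assumptions made here, and the similarity score of 0.70 for fragment P2 is a hypothetical value, since the example above does not state it.

```python
import math

def select_topic_entities(relations, fragments, expanded, decay=0.5, top_k=1):
    """fragments: list of (text, similarity), already sorted by descending similarity."""
    # Step 241: collect head and tail entities, skipping expanded entities.
    candidates = []
    for head, _relation, tail in relations:
        for entity in (head, tail):
            if entity not in expanded and entity not in candidates:
                candidates.append(entity)

    # Step 242: sum similarity * exp(-decay * rank) over fragments containing the entity.
    scores = {entity: 0.0 for entity in candidates}
    for rank, (text, similarity) in enumerate(fragments):  # rank starts at 0
        contribution = similarity * math.exp(-decay * rank)
        for entity in candidates:
            if entity in text:  # assumption: plain substring match
                scores[entity] += contribution

    # Step 243: keep the top_k entities by importance score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

relations = [("Investor C", "Investment", "Company B"),
             ("Company B", "Location", "City D")]
fragments = [("Investor C invests 50 million yuan in Company B", 0.85),
             ("Company B performs well in the Asian market", 0.70),  # hypothetical score
             ("City D is a transportation hub", 0.55)]
expanded = {"Entrepreneur A", "Company B"}

print(select_topic_entities(relations, fragments, expanded, top_k=2))
```

With `top_k=1`, only "Investor C" is returned, which matches the worked example.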

[0161] In existing technologies, GraphRAG and KG2RAG employ a fixed-hop graph expansion strategy (N hops or M hops), with the hop count determined by preset parameters. Regardless of whether the query is a simple fact lookup or a complex problem requiring multi-step reasoning, they perform the same depth of graph structure traversal. LightRAG, on the other hand, does not perform any graph exploration at all, only planar recall. PathRAG's flow propagation path length is limited by the subgraph loading range (2-3 hops) and does not dynamically adjust according to query complexity. Only ToG-2 dynamically controls the exploration depth by judging sufficiency using a large language model, but each round of sufficiency judgment requires a large language model call, which is costly. Existing methods lack a lightweight query complexity-aware mechanism to automatically route different exploration depths. In other words, existing technologies still suffer from fixed or missing graph exploration depths, failing to adapt to queries of varying complexity.

[0162] Based on this, in order to address the problems of fixed or missing graph exploration depth and the inability to adapt to queries of varying complexity in existing methods, this application provides a knowledge graph retrieval method that explicitly establishes a routing decision mechanism based on query complexity classification. Referring to Figure 4, the knowledge graph retrieval method further includes the following before step 100: Step 010: Through a single large language model call, multiple analysis tasks are executed simultaneously on the natural language question contained in the current query request. These analysis tasks include: query type classification, keyword extraction, sub-question decomposition, and key entity identification. Query type classification is used to obtain the query complexity type, which is one of the simple query type, the multi-hop query type, and the open reasoning query type. Keyword extraction is used to obtain a list of search keywords. Sub-question decomposition is used to obtain a sub-question decomposition structure, which includes source entities, target entities, and relation hints. Key entity identification is used to obtain a list of key entities. The key entities stored in the key entity list are entity names that are identified from the natural language question and have corresponding nodes in the knowledge graph.

[0163] Step 020: Determine the maximum number of iteration rounds corresponding to the query complexity type of the natural language question; wherein, when the query complexity type is the simple query type, the maximum number of iteration rounds is 1; when the query complexity type is the multi-hop query type or the open reasoning query type, the maximum number of iteration rounds is greater than 1.

[0164] Query type classification is the process of semantically analyzing a natural language question to determine its complexity type. The complexity type reflects the depth of knowledge graph exploration required to answer the question. The simple query type refers to questions that can be answered with a single fact lookup, such as asking for a single attribute value of an entity. The multi-hop query type refers to questions that require chain-like reasoning between two or more entities to answer. The open reasoning query type refers to questions that require comparison, induction, or cross-source reasoning over multiple information sources to answer. The search keyword list is a list of keywords extracted from the natural language question for subsequent hybrid seed retrieval. The sub-question decomposition structure is a structured representation formed by breaking down a natural language question into multiple sub-questions. Each sub-question contains a source entity (the starting point of reasoning), a target entity (the goal of reasoning), and a relation hint (the expected relation type). Relation hints are keywords or descriptions in the sub-question that indicate the expected relation type. The maximum number of iteration rounds is the maximum number of rounds allowed in the progressive graph retrieval step. When the current iteration round reaches this value, the iteration terminates even if no other termination condition is met.

[0165] Take the natural language question "Who invested in the company founded by entrepreneur A?" as an example. Before the initial entity set is determined based on this natural language question, multiple analysis tasks are performed simultaneously on the question through a single large language model call.

[0166] First, the query type is classified. After analyzing the question, the large language model determines that answering it requires first finding the companies founded by entrepreneur A and then finding the investors who invested in these companies. This is chained reasoning across multiple entities; therefore, the query complexity type is the multi-hop query type.

[0167] Secondly, keyword extraction is performed. The large language model extracts keywords from the question to obtain a list of search keywords, such as "investment," "entrepreneur A," "found," and "company."

[0168] Then, sub-question decomposition is performed. The large language model breaks down the question into a sub-question decomposition structure, which may include: First sub-question (Source entity: Entrepreneur A, Target entity: Pending, Relation hint: Found); Second sub-question (Source entity: Pending, Target entity: Pending, Relation hint: Invest).

[0169] Finally, key entity identification is performed. The large language model identifies, from the natural language question, entity names that correspond to nodes in the knowledge graph, obtaining a list of key entities, such as "Entrepreneur A".

[0170] After completing the above analysis, the maximum number of iteration rounds is determined based on the query complexity type. Since the query complexity type of this question is the multi-hop query type, the maximum number of iteration rounds is greater than 1; for example, it can be set to 3.

[0171] Suppose another natural language question is "In which city was entrepreneur A born?". After analysis using a large language model, it is determined that this question only requires a single fact lookup, and the query complexity type is simple query, with a maximum iteration round of 1.

[0172] As can be seen from the above description, the knowledge graph retrieval method provided in this application can simultaneously complete multiple analysis tasks, namely query type classification, keyword extraction, sub-question decomposition, and key entity identification, through a single large language model call. It also determines the corresponding maximum number of iteration rounds based on the query complexity type, so that simple queries return results quickly in fewer rounds, while complex queries gain sufficient opportunity for in-depth exploration. This ensures retrieval accuracy while avoiding redundant calculations for simple queries, further reducing average computational latency and resource consumption.
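Steps 010 and 020 can be illustrated with the following sketch. The JSON field names, the `QueryAnalysis` structure, and the round limit of 3 for the open reasoning type are assumptions made for illustration (the text only requires a value greater than 1). The large language model call itself is not shown, and `example_response` is a hand-written stand-in for its output.

```python
import json
from dataclasses import dataclass
from enum import Enum

class QueryType(Enum):
    SIMPLE = "simple"
    MULTI_HOP = "multi_hop"
    OPEN_REASONING = "open_reasoning"

@dataclass
class QueryAnalysis:
    query_type: QueryType
    keywords: list
    sub_questions: list   # dicts with source_entity, target_entity, relation_hint
    key_entities: list

# Step 020: simple queries get one round; the other values are illustrative.
MAX_ROUNDS = {
    QueryType.SIMPLE: 1,
    QueryType.MULTI_HOP: 3,
    QueryType.OPEN_REASONING: 3,  # assumption: any value greater than 1
}

def parse_analysis(llm_json):
    # Step 010: one response carries the results of all four analysis tasks.
    data = json.loads(llm_json)
    return QueryAnalysis(
        query_type=QueryType(data["query_type"]),
        keywords=data["keywords"],
        sub_questions=data["sub_questions"],
        key_entities=data["key_entities"],
    )

def max_iteration_rounds(analysis):
    return MAX_ROUNDS[analysis.query_type]

example_response = json.dumps({
    "query_type": "multi_hop",
    "keywords": ["investment", "entrepreneur A", "found", "company"],
    "sub_questions": [
        {"source_entity": "Entrepreneur A", "target_entity": None, "relation_hint": "found"},
        {"source_entity": None, "target_entity": None, "relation_hint": "invest"},
    ],
    "key_entities": ["Entrepreneur A"],
})
analysis = parse_analysis(example_response)
print(analysis.query_type, max_iteration_rounds(analysis))
```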

[0173] To further address the issue of how to determine the reliability of seed retrieval results before entering iterative graph exploration, this application provides a knowledge graph retrieval method that explicitly includes a fast-track mechanism based on a score-interval confidence check. Referring to Figure 4, step 100 of the knowledge graph retrieval method specifically includes the following: Step 110: Perform keyword retrieval and vector semantic retrieval based on the natural language question contained in the current query request to obtain a set of seed document fragments and the similarity score corresponding to each seed document fragment in the set.

[0174] Step 120: Sort each of the seed document fragments in descending order of their similarity scores, and calculate the score difference between the seed document fragment ranked first and the seed document fragment ranked Nth, where N is a preset integer greater than 1.

[0175] Step 130: If the score difference is less than a preset confidence threshold, or the similarity score of the seed document fragment ranked first is less than a preset minimum score threshold, then extract the associated entities from the knowledge graph triples associated with the seed document fragment set, merge the associated entities with the entities in the key entity list, and remove duplicates to form the initial entity set.

[0176] Then proceed to step 200.

[0177] Correspondingly, the method also specifically includes the following: Step 140: If the score difference is equal to or greater than a preset confidence threshold, and the similarity score of the seed document fragment ranked first is equal to or greater than a preset minimum score threshold, then the set of seed document fragments is directly used as the retrieval result for the natural language question. Step 200 is not executed.

[0178] Keyword retrieval is a sparse retrieval algorithm based on term frequency and inverse document frequency (BM25). It retrieves data based on the degree of matching between query keywords and words in documents, excelling at precise keyword matching. Vector semantic retrieval encodes query text and document fragments into high-dimensional dense vectors using embedding models, and measures semantic similarity by calculating the similarity between vectors. The seed document fragment set consists of initial document fragments obtained through keyword retrieval and vector semantic retrieval. The similarity score is a rating of the relevance of a document fragment to a natural language question. For vector semantic retrieval, the similarity score can be the cosine similarity between the query vector and the document vector. The score difference is the difference between the similarity score of the first seed document fragment and the similarity score of the Nth seed document fragment. N is a preset integer greater than 1, for example, 5. The confidence threshold is a preset lower limit for the score difference. When the score difference reaches or exceeds this threshold, it indicates that the first document fragment has a high degree of distinguishability from other document fragments. The minimum score threshold is a preset lower limit for the similarity score of the first document fragment. When the first-rank score reaches or exceeds this threshold, it indicates that the first-rank document fragment itself has sufficient relevance. Associated entities are the head and tail entities extracted from the knowledge graph triples associated with the seed document fragment set.

[0179] Take the natural language question "Which city was entrepreneur A born in?" as an example. Assume this question is classified as the simple query type. First, keyword retrieval and vector semantic retrieval are performed based on the natural language question to obtain a set of seed document fragments and the similarity score of each seed document fragment. Assume the following seed document fragments are obtained (sorted in descending order of similarity score): (1) Seed document fragment 1: The content is "Entrepreneur A was born in city D", with a similarity score of 0.92; (2) Seed document fragment 2: The content is "Entrepreneur A once studied in city E", with a similarity score of 0.68; (3) Seed document fragment 3: The content is "Entrepreneur A founded Company B", with a similarity score of 0.55; (4) Seed document fragment 4: The content is "Entrepreneur A won a certain award", with a similarity score of 0.48; (5) Seed document fragment 5: The content is "Entrepreneur A attends an event", with a similarity score of 0.42.

[0180] Next, the seed document fragments are sorted in descending order of similarity score, and the score difference between the first seed document fragment and the Nth seed document fragment is calculated. Assuming N is preset to 5, the score difference between seed document fragment 1 (score 0.92) and seed document fragment 5 (score 0.42) is calculated, resulting in 0.50. Then, it is determined whether the high confidence condition is met. Assuming the preset confidence threshold is 0.30 and the minimum score threshold is 0.85, in this example, the score difference of 0.50 is greater than the confidence threshold of 0.30, and the first-place similarity score of 0.92 is greater than the minimum score threshold of 0.85, thus satisfying the high confidence condition. Since the high confidence condition is met, the set of seed document fragments is directly output as the search result for the natural language question, and the subsequent progressive graph retrieval steps are not executed. At this point, the search results include seed document fragments 1 to 5.

[0181] Suppose that for another natural language question, "Who invested in the company founded by entrepreneur A?", hybrid seed retrieval yields a first-ranked similarity score of 0.72 (less than the minimum score threshold of 0.85), or a score difference of only 0.12 (less than the confidence threshold of 0.30). In either case, the high confidence condition is not met. Associated entities are then extracted from the knowledge graph triples associated with the seed document fragment set. For example, from the triple (entrepreneur A, founded, company B) of seed document fragment 1, the associated entities "entrepreneur A" and "company B" are extracted; from the triple (entrepreneur A, employed by, institution F) of seed document fragment 2, the associated entities "entrepreneur A" and "institution F" are extracted. The extracted associated entities are merged with the entities in the previously obtained key entity list (e.g., containing "entrepreneur A") and duplicates are removed, forming the initial entity set, for example, {entrepreneur A, company B, institution F}. Then, the subsequent progressive graph retrieval steps are performed.

[0182] As can be seen from the above description, the knowledge graph retrieval method provided in this application calculates the score difference between the first-ranked seed document fragment and the Nth-ranked seed document fragment after hybrid seed retrieval, and checks the score difference against the confidence threshold and the first-ranked similarity score against the minimum score threshold. This allows the retrieval results to be returned directly when the seed results are sufficiently reliable, skipping the subsequent iterative graph exploration steps. This avoids unnecessary deep calculations for high-confidence queries and further reduces average retrieval latency and resource consumption.
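The confidence check of steps 120 to 140 and the entity merging of step 130 can be sketched as follows. The function names are hypothetical, the default thresholds are the example values from the text, and treating a result set with fewer than N fragments as low confidence is an assumption of this sketch.

```python
def is_high_confidence(scores, n=5, confidence_threshold=0.30, min_top_score=0.85):
    # Step 120: sort descending and compare the 1st and Nth similarity scores.
    ranked = sorted(scores, reverse=True)
    if len(ranked) < n:
        return False  # assumption: too few fragments to apply the check
    score_difference = ranked[0] - ranked[n - 1]
    # Step 140: both conditions must hold to skip graph exploration.
    return score_difference >= confidence_threshold and ranked[0] >= min_top_score

def build_initial_entities(triples, key_entities):
    # Step 130: merge key entities with heads and tails of the seed triples,
    # removing duplicates while preserving first-seen order.
    merged = list(key_entities)
    for head, _relation, tail in triples:
        for entity in (head, tail):
            if entity not in merged:
                merged.append(entity)
    return merged

# Simple query from the example: the seed fragments are returned directly.
print(is_high_confidence([0.92, 0.68, 0.55, 0.48, 0.42]))

# Multi-hop query: top score 0.72 and score difference 0.12 as in the text;
# the intermediate scores are hypothetical.
print(is_high_confidence([0.72, 0.70, 0.66, 0.63, 0.60]))
print(build_initial_entities(
    [("Entrepreneur A", "founded", "Company B"),
     ("Entrepreneur A", "employed by", "Institution F")],
    ["Entrepreneur A"],
))
```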

[0183] In existing technologies, KG2RAG's scoring relies entirely on cross-encoder reordering, with knowledge graph structure information only reflected in the auxiliary input represented by triples and not explicitly involved in score calculation. GraphRAG directly multiplies vector similarity by PageRank, a static global structural metric that doesn't dynamically change with queries. ToG-2 doesn't perform explicit score fusion, with ranking implicitly handled by the large language model. PathRAG only sorts by path resource values, without fusing document vector similarity. Existing methods lack an adaptive scoring mechanism that dynamically adjusts fusion weights based on query type and knowledge graph coverage. Furthermore, in retrieval methods that require fusing knowledge graph structure information and vector similarity, calculating the correlation between document fragments and knowledge graph relationships typically uses linear counting or linear multiplication: counting the number of relational hits associated with fragments and then multiplying by an amplification factor. When the number of relational hits is high, the score quickly saturates to its upper limit, resulting in a loss of discriminability between different fragments. Existing methods do not employ saturation-resistant nonlinear mapping functions. In other words, existing technologies still suffer from problems such as the lack of an adaptive mechanism for the fusion of knowledge graph scoring and vector similarity, and numerical saturation in the calculation of relational correlation in knowledge graph scoring.

[0184] Based on this, in order to further address the problems that existing methods lack an adaptive mechanism for fusing knowledge graph scoring with vector similarity and that relational relevance calculation saturates numerically during score fusion, this application provides a knowledge graph retrieval method that explicitly employs an adaptive score fusion mechanism. Referring to Figure 4, the knowledge graph retrieval method further includes the following after step 200: Step 310: For each document fragment in the retrieved information, calculate the knowledge graph relevance score of that document fragment; wherein the knowledge graph relevance score is obtained by weighted summation of the entity coverage score, the relational relevance score, and the proximity decay coefficient; the entity coverage score is the proportion of the number of extended entities contained in the document fragment to the preset total number of extended entities; the relational relevance score is calculated by logarithmically scaling the number of hits of the candidate relation data associated with the document fragment; and the proximity decay coefficient is determined according to the iteration round from which the document fragment originates, wherein the earlier the source iteration round, the larger the proximity decay coefficient.

[0185] Step 320: Determine the query type factor based on the query complexity type corresponding to the natural language question, and calculate the coverage factor using a smoothing function based on the entity coverage score. Multiply the preset basic knowledge graph weight, the query type factor, and the coverage factor to obtain the dynamic knowledge graph weight.

[0186] Step 330: For each document fragment in the retrieved information, multiply the similarity score of the document fragment by the target difference to obtain a first product; multiply the knowledge graph relevance score by the dynamic knowledge graph weight to obtain a second product; and use the sum of the first product and the second product as the final score of the document fragment; wherein the target difference is the difference obtained by subtracting the dynamic knowledge graph weight from 1.

[0187] Step 340: Sort each document fragment in descending order according to the final score to obtain a sorted list of document fragments, and output the sorted list of document fragments as the target retrieval result for the natural language question.

[0188] The knowledge graph relevance score is calculated based on the degree of association between a document fragment and the knowledge graph structure, and measures the importance of the document fragment in the knowledge graph retrieval context. Extended entities are the knowledge graph entities related to the natural language question that are extracted from the topic entity set and the filtered relation set in each iteration round of the progressive graph retrieval process, that is, all relevant entities discovered during iterative exploration; the total number of extended entities is the count of these entities. The entity coverage score is the proportion of extended entities contained in a document fragment to the preset total number of extended entities. A higher entity coverage score indicates that the document fragment covers the knowledge graph entities more comprehensively. The relational relevance score is calculated by logarithmic scaling of the number of hits in the candidate relation data associated with the document fragment. Logarithmic scaling uses the log(1+x) function to compress values non-linearly, so that growth flattens out for large values; this avoids the score saturation caused by linear multipliers when the number of hits is high and preserves the distinguishability between different document fragments.

[0189] The proximity decay coefficient is determined based on the iteration round from which the document fragment originates. The earlier the iteration round, the larger the proximity decay coefficient, indicating that a document fragment obtained in an earlier iteration round is closer in the graph to the query starting point and should be assigned a higher weight. The query type factor is a weight adjustment factor determined based on the query complexity type corresponding to the natural language question. Different types of queries depend on knowledge graph structure information to different degrees; for example, the factor for multi-hop queries is greater than 1, while the factor for simple queries is less than 1. The coverage factor is a weight adjustment factor calculated using a smoothing function (Sigmoid function) based on the entity coverage score. Using a smoothing function avoids the abrupt weight changes that piecewise functions produce near the threshold. The Sigmoid function is an S-shaped smoothing function that maps any real number to the (0, 1) interval and is used to achieve continuous threshold judgment and smooth transition. The basic knowledge graph weight is the preset base proportion of the knowledge graph relevance score in the final score. The dynamic knowledge graph weight is the actual knowledge graph weight after adjustment by the query type factor and the coverage factor, and it adapts to different query types and retrieval coverage. The target difference is the difference obtained by subtracting the dynamic knowledge graph weight from 1, and it is the proportion of the vector similarity score in the final score. The first product is the product of the document fragment similarity score and the target difference, representing the contribution of vector similarity to the final score. The second product is the product of the knowledge graph relevance score and the dynamic knowledge graph weight, representing the contribution of knowledge graph structure information to the final score. The final score is the sum of the first and second products, representing the comprehensive score of the document fragment. The target retrieval result is a list of document fragments sorted in descending order of the final score, which serves as the final output retrieval result.

[0190] Take the natural language question "Who invested in the company founded by entrepreneur A?" as an example. Assume that, after the preceding steps through step 200, retrieval information for this natural language question has been obtained, including multiple document fragments. The following explanation uses two of these document fragments as examples: Document fragment P1: The content is "Investor C invests 50 million yuan in Company B", the source iteration round is round 2, and the similarity score is 0.85; Document fragment P2: The content is "Entrepreneur A delivers a speech at an industry summit", the source iteration round is round 1, and the similarity score is 0.60.

[0191] First, for each document fragment, calculate its knowledge graph relevance score.

[0192] Calculate the entity coverage score. Assume the total number of extended entities is 4, including Entrepreneur A, Company B, Investor C, and City D. Document fragment P1 contains entities "Investor C" and "Company B," resulting in 2 matched extended entities and an entity coverage score of 2 / 4 = 0.50. Document fragment P2 contains entity "Entrepreneur A," resulting in 1 matched extended entity and an entity coverage score of 1 / 4 = 0.25.

[0193] Calculate the relational relevance score. Assume the candidate relation data associated with document fragment P1 includes (Investor C, Investment, Company B), with a hit count of 1; and the candidate relation data associated with document fragment P2 includes (Entrepreneur A, Founded, Company B), with a hit count of 1. The logarithmic scaling function log(1 + hit count × relation multiplier) / log(1 + maximum number of relations × relation multiplier) is used for the calculation. Assuming the relation multiplier is 5.0 and the maximum number of relations is 5, the relational relevance score for both fragments is log(1 + 1 × 5) / log(1 + 5 × 5) = log 6 / log 26 ≈ 0.55.

[0194] Determine the proximity decay coefficient. Document fragment P1 originates from the second iteration, and its proximity decay coefficient is preset to 0.4; document fragment P2 originates from the first iteration, and its proximity decay coefficient is preset to 0.7.

[0195] The scores from the three dimensions above are then summed using preset weights. Assuming entity coverage has a weight of 0.4, relational relevance has a weight of 0.4, and proximity decay coefficient has a weight of 0.2, then: The knowledge graph relevance score for document fragment P1 = 0.4 × 0.50 + 0.4 × 0.55 + 0.2 × 0.4 = 0.20 + 0.22 + 0.08 = 0.50; The knowledge graph relevance score for document fragment P2 is 0.4×0.25+0.4×0.55+0.2×0.7=0.10+0.22+0.14=0.46.
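The three-part knowledge graph relevance score of step 310 can be reproduced with the short sketch below. The function names, the substring test for entity matches, and the clamping of the hit count at the maximum number of relations are assumptions of this sketch. The proximity decay coefficients (0.7 for round 1, 0.4 for round 2) and the weights are the example values given above.

```python
import math

PROXIMITY_DECAY = {1: 0.7, 2: 0.4}  # example values from the text, keyed by round

def entity_coverage(text, extended_entities):
    # Share of the extended entities that appear in the fragment text.
    matched = sum(1 for entity in extended_entities if entity in text)
    return matched / len(extended_entities)

def relational_relevance(hits, multiplier=5.0, max_relations=5):
    # Log scaling keeps the score from saturating as the hit count grows.
    hits = min(hits, max_relations)  # assumption: clamp so the score stays <= 1
    return math.log(1 + hits * multiplier) / math.log(1 + max_relations * multiplier)

def kg_relevance(coverage, relevance, proximity, weights=(0.4, 0.4, 0.2)):
    w_cov, w_rel, w_prox = weights
    return w_cov * coverage + w_rel * relevance + w_prox * proximity

extended = ["Entrepreneur A", "Company B", "Investor C", "City D"]
p1 = kg_relevance(
    entity_coverage("Investor C invests 50 million yuan in Company B", extended),
    relational_relevance(1),
    PROXIMITY_DECAY[2],
)
p2 = kg_relevance(
    entity_coverage("Entrepreneur A delivers a speech at an industry summit", extended),
    relational_relevance(1),
    PROXIMITY_DECAY[1],
)
print(round(p1, 2), round(p2, 2))
```

The unrounded relational relevance is slightly below 0.55, so the unrounded totals differ from 0.50 and 0.46 only in the fifth decimal place.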

[0196] Secondly, the query type factor is determined based on the query complexity type corresponding to the natural language question. This question is a multi-hop query, and the corresponding query type factor is preset to 1.5. The coverage factor is calculated using a smoothing function based on the entity coverage score. Assuming the current average entity coverage score is 0.375, the coverage factor after smoothing is 1.1. Multiplying the preset basic knowledge graph weight (e.g., 0.4), the query type factor 1.5, and the coverage factor 1.1 yields the dynamic knowledge graph weight: 0.4 × 1.5 × 1.1 = 0.66. The dynamic knowledge graph weight is then capped at a preset upper limit of 0.8; since 0.66 does not exceed this limit, the final dynamic knowledge graph weight remains 0.66.

[0197] Then, the final score is calculated for each document fragment. The target difference, that is, the complementary weight applied to the vector similarity score, is 1 - 0.66 = 0.34.

[0198] The final score for document fragment P1 = 0.34 × 0.85 + 0.66 × 0.50 = 0.289 + 0.33 = 0.619; The final score for document fragment P2 = 0.34 × 0.60 + 0.66 × 0.46 = 0.204 + 0.3036 = 0.5076.

[0199] Finally, the document fragments are sorted in descending order of their final scores, with document fragment P1 (0.619) preceding document fragment P2 (0.5076). The sorted list of document fragments is then output as the target search results.
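The arithmetic of the worked example above can be reproduced with a short sketch. The function names are illustrative and not part of this application; the numeric inputs (entity coverage 0.50 and 0.25, similarity 0.85 and 0.60, weights, multiplier, and upper limit) are those used in the example. Like the example, the sketch rounds the relational score to 0.55 before use and fuses both fragments with the complementary weights.

```python
import math


def relation_score(hit_count: int, max_relations: int, multiplier: float = 5.0) -> float:
    """Log-scaled relational score: log(1 + hits*m) / log(1 + max*m)."""
    return math.log(1 + hit_count * multiplier) / math.log(1 + max_relations * multiplier)


def kg_relevance(entity_cov: float, rel_score: float, proximity: float,
                 weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted sum of entity coverage, relational score, and proximity decay."""
    w_entity, w_rel, w_prox = weights
    return w_entity * entity_cov + w_rel * rel_score + w_prox * proximity


def capped_kg_weight(base: float, query_factor: float, coverage_factor: float,
                     cap: float = 0.8) -> float:
    """Dynamic knowledge graph weight, limited to the range [0, cap]."""
    return min(max(base * query_factor * coverage_factor, 0.0), cap)


def final_score(similarity: float, kg_score: float, kg_weight: float) -> float:
    """Complementary fusion of vector similarity and knowledge graph relevance."""
    return (1 - kg_weight) * similarity + kg_weight * kg_score


# Inputs taken from the worked example.
rel = round(relation_score(hit_count=1, max_relations=5), 2)
kg_p1 = kg_relevance(0.50, rel, proximity=0.4)   # fragment from the second iteration
kg_p2 = kg_relevance(0.25, rel, proximity=0.7)   # fragment from the first iteration
weight = capped_kg_weight(0.4, 1.5, 1.1)
scores = {
    "P1": final_score(0.85, kg_p1, weight),
    "P2": final_score(0.60, kg_p2, weight),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, {k: round(v, 4) for k, v in scores.items()})
```

Sorting the resulting scores in descending order places P1 ahead of P2, matching the order stated in the example.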

[0200] As can be seen from the above description, the knowledge graph retrieval method provided in this application embodiment can dynamically adjust the contribution ratio of vector similarity and graph structure information according to different query types and retrieval coverage by adaptively fusing three-dimensional knowledge graph relevance scores and dynamic knowledge graph weights. It also uses logarithmic scaling to avoid score saturation of relational relevance, thereby obtaining more accurate document fragment ranking in various query scenarios and improving the relevance and accuracy of retrieval results.

[0201] It should be noted that the above-mentioned scoring parameters can be managed through a hierarchical configuration structure, and preset strategies can be provided for users to choose from. For example, a balanced mode, a precise mode, and a recall mode can be preset to adapt to different search scenarios. Users can also override specific parameter values through dotted paths. The above configuration method does not constitute a limitation on the scope of protection of this application, but is only an example.

[0202] In summary, the method provided in the embodiments of this application designs a query-complexity-aware automatic routing mechanism that completes query classification and decomposition through a single large language model call, and automatically selects the graph exploration depth and filtering strategy based on the query type, providing fast returns for simple queries and in-depth exploration for complex queries. A progressive relation filtering strategy is proposed, which automatically upgrades the filtering method (rules, vectors, large language model) according to the iteration depth, adaptively balancing accuracy and efficiency at different stages of graph exploration. An incremental evidence extraction mechanism is introduced, in which the large language model processes only the content newly added in each round and sufficiency is judged based on rules, in coordination with the asynchronous parallel execution of evidence extraction and data prefetching for the next round. An adaptive score fusion algorithm is designed, which dynamically calculates the knowledge graph weight based on the query type and knowledge graph coverage, and introduces a logarithmically scaled relation relevance calculation and a sigmoid-smoothed coverage factor to solve the problems of numerical saturation and abrupt weight changes. A request-level relation cache is constructed to share relation data and fragment identifiers across multiple steps of a single retrieval request, eliminating duplicate queries. All scoring parameters are extracted into a configurable hierarchical structure, with preset strategies and a fine-grained parameter override mechanism to support flexible tuning for different retrieval scenarios. A large language model circuit breaker and graceful degradation mechanism is introduced, which automatically degrades to an alternative method that does not use the large language model when the large language model fails, ensuring retrieval availability.

[0203] From a software perspective, this application also provides a knowledge graph retrieval system for performing all or part of the knowledge graph retrieval method described above (see Figure 5). The knowledge graph retrieval system specifically includes the following: a preprocessing module 10, used to determine the starting entity set based on the natural language question contained in the current query request; and a progressive graph retrieval module 20, used to take the starting entity set as the topic entity set of the first iteration round, and perform at least one progressive graph retrieval step in the first iteration round to obtain retrieval information for the natural language question as the retrieval result for the natural language question; the retrieval information includes relationship information between different entities and/or document fragments.

[0204] The embodiments of the knowledge graph retrieval system provided in this application can be used to execute the processing flow of the embodiments of the knowledge graph retrieval method described above. Its functions are not repeated here; for details, refer to the detailed description of the embodiments of the knowledge graph retrieval method above.

[0205] The knowledge graph retrieval part of the knowledge graph retrieval system can be completed on either a server or a client device. The choice can be made based on the processing power of the client device and the limitations of the user's usage scenario. This application does not impose any limitations in this regard. If all operations are completed on the client device, the client device may further include a processor for the specific processing of the knowledge graph retrieval.

[0206] The aforementioned client device may have a communication module (i.e., a communication unit) that can communicate with a remote server to achieve data transmission with the server. The server may include a server on the task scheduling center side; in other implementation scenarios, it may also include a server on an intermediate platform, such as a server on a third-party server platform that has a communication link with the task scheduling center server. The server may include a single computer device, a server cluster consisting of multiple servers, or a distributed server structure.

[0207] The server and the client device can communicate using any suitable network protocol, including those not yet developed as of the date of this application. Such network protocols may include, for example, TCP / IP, UDP / IP, HTTP, HTTPS, etc. Furthermore, such network protocols may also include RPC (Remote Procedure Call Protocol) and REST (Representational State Transfer Protocol) protocols used on top of the aforementioned protocols.

[0208] As described above, the knowledge graph retrieval system provided in this application determines the initial entity set through a preprocessing module and dynamically selects an applicable strategy from at least two preset relationship filtering strategies based on the current iteration round through a progressive graph retrieval module. The computational latency of the strategy used in earlier iteration rounds is lower than that of the strategy used in later iteration rounds. This allows for rapid filtering of candidate relationships using a low-latency filtering method in the early rounds of iterative knowledge graph retrieval, while a high-latency filtering method is used only in later deep exploration rounds for precise filtering. The modular design of this system enables functional units such as query analysis, hybrid seed retrieval, relationship caching, progressive graph retrieval, and adaptive score fusion to work collaboratively. This reduces the frequency of high-latency computing resource calls during iterative retrieval, lowers the end-to-end response latency of a single retrieval request, and reduces the resource occupancy of computing devices executing the retrieval process. Furthermore, the clear division of responsibilities among the modules facilitates system maintenance, upgrades, and expansion, enabling it to adapt to knowledge graph retrieval scenarios of varying scales and complexities.

[0209] To further illustrate the above embodiments, this application also provides a specific application example of the above knowledge graph retrieval method and system (see Figure 6). The example provides a complete query-aware, progressive knowledge graph retrieval solution, involving fields such as artificial intelligence, natural language processing, and information retrieval. First, the basic principles of this application example are explained: (1) Query-aware routing: Before retrieval begins, query classification, keyword extraction, sub-question decomposition, and key entity identification are completed simultaneously through a single large language model call. Different retrieval depths and strategies are automatically selected based on the query complexity type (e.g., simple query type, multi-hop query type, or open reasoning query type; see Phase 0 of Figure 6), so as to avoid performing unnecessary deep graph exploration for simple queries.

[0210] (2) Request-level relation caching: A relation caching structure is designed that spans the entire lifecycle of a retrieval request (see Figure 7). All methods that need to query knowledge graph relations (such as relation retrieval, fragment retrieval, and entity pruning) share the same cache instance. Search engine queries are initiated only for entities that are not cached, thereby eliminating duplicate queries.

[0211] (3) Progressive relation filtering: The relation filtering methods are arranged from low to high accuracy and computational cost (a rule-based relation filtering strategy, then a vector similarity-based relation filtering strategy, then a large language model-based relation filtering strategy), and the filtering method used is automatically upgraded at different stages of the iteration process (see Figure 8). In early iterations, a zero-latency rule-based approach is used for rapid filtering, while in later iterations, a high-precision large language model approach is used for accurate filtering.

[0212] (4) Configurable scoring system: All scoring constants scattered throughout the algorithm are extracted into a hierarchical data class configuration structure. Three preset strategies ("precise", "balanced", and "recall") are provided, and fine-grained parameter overrides through dotted paths are supported, so that users can flexibly tune the retrieval behavior for different retrieval scenarios.

[0213] (5) Score fusion with smoothing and logarithmic scaling: A smoothing function is used instead of a piecewise step function to calculate the coverage factor, eliminating abrupt weight changes; a logarithmic scaling function is used instead of a linear multiplier to calculate the relation relevance, avoiding score saturation; the reordered scores of the cross-encoder are min-max normalized, and exact-match results are protected from being downgraded after reordering.

[0214] (6) Circuit breaker fault tolerance mechanism: The number of consecutive failures of large language model calls is monitored, and the circuit breaker is automatically triggered when the number of failures reaches a preset threshold. Subsequent steps automatically degrade to alternative methods that do not use the large language model (see Figure 9). This ensures that the retrieval process is not interrupted by a failure of the large language model. In the process shown in Figure 9, a timeout threshold is set for each large language model call (default 30 seconds). A timeout event is treated as a call failure and is included in the failure counter.

[0215] In this application example, the preprocessing module in the knowledge graph retrieval system may include a query analysis module and a hybrid seed retrieval module; it may also include a relation caching module, an adaptive score fusion module, a configurable scoring rule module, and an observability module that are communicatively connected to the progressive graph retrieval module.

[0216] The query analysis module is responsible for receiving the natural language question input by the user, outputting the query type classification, search keyword list, sub-question decomposition structure, and key entity list through a single large language model call, and determining the routing strategy based on the analysis results. For its execution details, see Phase 0 of Figure 6.
The hybrid seed retrieval module is responsible for obtaining the initial seed document fragments through both BM25 keyword retrieval and vector semantic retrieval, and determining through a confidence check whether to proceed to the graph structure expansion phase. For its execution details, see Phase 1 of Figure 6.
The progressive graph retrieval module is the core module of this application example, responsible for iterative relation discovery and progressive relation filtering on the knowledge graph (see Figure 8), entity discovery, document fragment acquisition, incremental evidence extraction, and sufficiency assessment (see Phase 2 of Figure 6).
The relation caching module is responsible for caching all queried entity-relation mappings and entity-fragment identifier mappings within a single request, for use by the various steps of the progressive graph retrieval module. For its execution details, see Figure 7.
The adaptive score fusion module is responsible for calculating the knowledge graph relevance score for each document fragment, dynamically adjusting the knowledge graph weight, and completing score fusion and final ranking. For its execution details, see Phase 3 of Figure 6.
The configurable scoring rule module is responsible for managing all scoring-related configuration parameters, providing preset strategy selection and parameter override capabilities.
The observability module is responsible for recording performance metrics such as the time consumed in each phase, the number of large language model calls, the number of search engine queries, and iteration details, as well as diagnostic data such as circuit breaker status and degradation information.

[0217] Based on this, the following provides a detailed explanation of the knowledge graph retrieval method in this application example:
(I) Phase 0: Query Analysis (Single Large Language Model Call)
The query analysis module performs the following four analysis tasks simultaneously through a single large language model call, using a prompt template that includes classification rules and few-shot examples: (1) Query type classification: Questions are divided into three categories: the simple query type (a single-fact query), the multi-hop query type (a query involving two or more entities or chained relations), and the open reasoning query type (a query that requires synthesis, comparison, or cross-source reasoning).

[0218] (2) Keyword extraction: Extract a predetermined number (e.g., 3 to 8) of important search keywords.

[0219] (3) Sub-question decomposition: For questions of the multi-hop query type and the open reasoning query type, the original question is decomposed into a chain of sub-questions, each of which contains a source entity, a target entity, and a relation hint.

[0220] (4) Key entity identification: Identify the names of the most important entities in the question.

[0221] The routing decision is determined based on a combination of retrieval mode and query complexity, as follows: (1) When the retrieval mode is fast mode, the maximum number of iterations is 0 regardless of the query complexity, and the progressive graph retrieval step is skipped directly.

[0222] (2) When the retrieval mode is adaptive mode (the default mode of the method): if the complexity type is simple query type, the maximum number of iterations is 1, and only the first relation filtering strategy is enabled; if the complexity type is multi-hop query type, the maximum number of iterations is the smaller of 2 and the maximum number of hops (max_hops), the first relation filtering strategy is enabled in the first round, and the second relation filtering strategy is enabled in the second round; if the complexity type is open reasoning query type, the maximum number of iterations is the maximum number of hops (max_hops), the first relation filtering strategy is enabled in the first round, the second relation filtering strategy is enabled in the second round, and the third relation filtering strategy is enabled from the third round to the maximum number of iterations.

[0223] (3) When the retrieval mode is the thorough mode: if the complexity type is simple query type, the maximum iteration round is 1, and only the first relation filtering strategy is enabled; if the complexity type is multi-hop query type, the maximum iteration round is the maximum number of hops (max_hops), the first relation filtering strategy is enabled in the first round, the second relation filtering strategy is enabled in the second round, and the third relation filtering strategy is enabled from the third round to the maximum number of hops (max_hops); if the complexity type is open reasoning query type, the maximum iteration round is the sum of the maximum number of hops (max_hops) and the maximum depth (max_depth), the first relation filtering strategy is enabled in the first round, the second relation filtering strategy is enabled in the second round, and the third relation filtering strategy is enabled from the third round to the maximum iteration round.
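The routing rules (1) to (3) above can be sketched as a small lookup function. This is a minimal illustration under stated assumptions: the mode strings ("fast", "adaptive", "thorough"), the complexity strings "simple" and "open_reasoning", and the strategy labels are hypothetical names chosen for the sketch; only max_hops, max_depth, and multi_hop appear in the text.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for the first, second, and third relation filtering strategies.
RULE, VECTOR, LLM = "rule", "vector", "llm"

MODES = {"fast", "adaptive", "thorough"}
COMPLEXITIES = {"simple", "multi_hop", "open_reasoning"}


@dataclass
class RoutePlan:
    max_iterations: int
    strategies: List[str]  # strategies[i] is the strategy used in round i + 1


def strategy_for_round(round_no: int) -> str:
    """Round 1 uses rules, round 2 uses vectors, later rounds use the LLM."""
    if round_no == 1:
        return RULE
    if round_no == 2:
        return VECTOR
    return LLM


def route(mode: str, complexity: str, max_hops: int, max_depth: int) -> RoutePlan:
    if mode not in MODES or complexity not in COMPLEXITIES:
        raise ValueError(f"unsupported mode/complexity: {mode}/{complexity}")
    if mode == "fast":
        n = 0  # skip progressive graph retrieval entirely
    elif complexity == "simple":
        n = 1  # same in adaptive and thorough modes
    elif mode == "adaptive":
        n = min(2, max_hops) if complexity == "multi_hop" else max_hops
    else:  # thorough
        n = max_hops if complexity == "multi_hop" else max_hops + max_depth
    return RoutePlan(n, [strategy_for_round(r) for r in range(1, n + 1)])


print(route("adaptive", "multi_hop", max_hops=3, max_depth=2))
```

Keeping the per-round strategy choice in one helper means the three modes differ only in how many rounds they allow.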

[0224] This design merges the two separate calls to the large language model—keyword extraction and query classification—into a single call, thereby reducing the overhead of calling the large language model.

[0225] Understandably, the query type classification in the query analysis module can be replaced with a rule-based classification method: classification is performed based on syntactic features such as the number of entities counted in the question, whether it contains comparison words ("compare", "contrast"), and whether it contains chained relation words ("of..."). This eliminates the need for a large language model. This alternative has lower accuracy but zero latency, making it suitable for scenarios where a large language model is unavailable or that are extremely latency-sensitive.

[0226] (II) Phase 1: Hybrid Seed Retrieval
The hybrid seed retrieval module uses BM25 keyword retrieval and vector semantic retrieval to obtain the initial seed document fragments. During retrieval, the keywords generated in the query analysis phase are used to enhance the original query question.

[0227] The confidence check uses the score gap to determine whether the seed results are sufficiently reliable: (1) Sort the seed document fragments in descending order of their similarity scores.

[0228] (2) Calculate the score difference between the seed document fragment ranked 1st and the seed document fragment ranked Nth (N is a configurable integer, such as the default value of 5).

[0229] (3) When the score difference is greater than or equal to the preset confidence threshold, and the similarity score of the seed document fragment ranked first is greater than or equal to the preset minimum score threshold, the current seed result is determined to be of high confidence, and the set of seed document fragments is directly returned as the retrieval result, skipping the subsequent graph expansion stage.
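The three-step confidence check above can be sketched as follows. The default N = 5 comes from the text; the two threshold defaults (0.2 and 0.7) and the handling of result lists shorter than N are assumptions made only for illustration.

```python
from typing import Sequence


def is_high_confidence(scores: Sequence[float], n: int = 5,
                       gap_threshold: float = 0.2,    # hypothetical default
                       min_top_score: float = 0.7     # hypothetical default
                       ) -> bool:
    """Return True when the seed results are reliable enough to skip graph expansion."""
    ranked = sorted(scores, reverse=True)             # step (1): descending order
    if len(ranked) < n:
        # Assumption: with fewer than N seeds the gap is undefined, so do not skip.
        return False
    gap = ranked[0] - ranked[n - 1]                   # step (2): rank 1 vs rank N
    return gap >= gap_threshold and ranked[0] >= min_top_score  # step (3)


print(is_high_confidence([0.90, 0.50, 0.45, 0.40, 0.35]))  # clear winner
print(is_high_confidence([0.90, 0.88, 0.87, 0.86, 0.85]))  # no clear gap
```

A large gap together with a strong top score indicates that one fragment clearly answers the question, which is when graph expansion adds the least value.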

[0230] Relation caching (see Figure 7) is one of the key innovations of the embodiments of this application. Its data structure includes: (1) A mapping from entity name (entity_name) to a list of relation dictionaries.

[0231] (2) A mapping from entity to a set of fragment identifiers (chunk_ids).

[0232] (3) A deduplication set of known relation pairs (head, tail).

[0233] Batch query optimization (see Figure 7) performs two rounds of search engine queries on a batch of uncached entity names: (1) First round of queries: Query with the entity names as the head entity (from_entity) condition, and record which of the entities to be queried have appeared as tail entities (to_entity) in the query results (these entities are marked as "covered entities").

[0234] (2) Second round of query: Only the entities not covered in the first round of query are queried using the tail entity (to_entity) condition, thereby reducing the amount of data that needs to be queried.

[0235] Reuse mechanism: The relation caching module is shared by three methods: relation retrieval method (get_relations), fragment identifier retrieval method (get_chunk_ids_for_entities), and entity pruning method (context_entity_prune).
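The cache structure and the two-round batch query described above can be sketched as follows. The backend callables (query_by_head, query_by_tail), the relation dictionary keys, and the backend_queries counter are hypothetical stand-ins for the search engine interface; only get_relations and get_chunk_ids_for_entities are names taken from the text.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Relation = Dict[str, str]  # assumed keys: head, relation, tail, chunk_id
Query = Callable[[List[str]], List[Relation]]


class RelationCache:
    """Request-scoped cache; one instance is created per retrieval request."""

    def __init__(self, query_by_head: Query, query_by_tail: Query) -> None:
        self._query_by_head = query_by_head
        self._query_by_tail = query_by_tail
        self.relations: Dict[str, List[Relation]] = defaultdict(list)
        self.chunk_ids: Dict[str, Set[str]] = defaultdict(set)
        self.seen_pairs: Set[Tuple[str, str]] = set()  # (head, tail) dedup set
        self._cached: Set[str] = set()
        self.backend_queries = 0

    def _fetch(self, query: Query, names: List[str]) -> List[Relation]:
        self.backend_queries += 1
        found = query(names)
        for rel in found:
            pair = (rel["head"], rel["tail"])
            if pair in self.seen_pairs:
                continue
            self.seen_pairs.add(pair)
            for name in set(pair):
                self.relations[name].append(rel)
                self.chunk_ids[name].add(rel["chunk_id"])
        return found

    def get_relations(self, entities: Iterable[str]) -> Dict[str, List[Relation]]:
        wanted = list(dict.fromkeys(entities))
        missing = [e for e in wanted if e not in self._cached]
        if missing:
            # Round 1: query with the entities as head; note which appear as tails.
            first = self._fetch(self._query_by_head, missing)
            covered = {rel["tail"] for rel in first}
            # Round 2: query as tail only for entities not covered in round 1.
            uncovered = [e for e in missing if e not in covered]
            if uncovered:
                self._fetch(self._query_by_tail, uncovered)
            self._cached.update(missing)
        return {e: list(self.relations.get(e, [])) for e in wanted}

    def get_chunk_ids_for_entities(self, entities: Iterable[str]) -> Set[str]:
        names = list(entities)
        self.get_relations(names)
        return set().union(*(self.chunk_ids.get(e, set()) for e in names))


# Tiny in-memory stand-in for the search engine.
EDGES = [
    {"head": "Entrepreneur A", "relation": "Founder", "tail": "Company B", "chunk_id": "c1"},
    {"head": "Investor C", "relation": "Investment", "tail": "Company B", "chunk_id": "c2"},
]
cache = RelationCache(
    lambda names: [e for e in EDGES if e["head"] in names],
    lambda names: [e for e in EDGES if e["tail"] in names],
)
first = cache.get_relations(["Entrepreneur A", "Company B"])
queries_after_first = cache.backend_queries
second = cache.get_relations(["Entrepreneur A"])  # served from the cache
```

In this run "Company B" is marked as covered after the first round, so the second round queries only "Entrepreneur A"; the relation from "Investor C" is therefore not fetched, which illustrates the reduced query volume the text describes.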

[0236] Understandably, relational caching can be replaced by cross-request persistent caching based on an LRU (Least Recently Used) eviction policy: maintaining a limited-capacity global cache in memory, sharing cached data across different requests, and setting a TTL (Time To Live) to ensure data freshness. This alternative can further reduce search engine queries, but it needs to address cache consistency issues.

[0237] (III) Phase 2: Progressive Graph Retrieval (Core Innovation)
The progressive graph retrieval module performs the following 8 steps in an iterative loop: (1) Step 1: Relation acquisition: The relations of the current topic entities are obtained through the relation caching module, which automatically identifies which entities are cached and which are not, and initiates search engine queries only for the uncached entities.

[0238] (2) Step 2: Progressive relation filtering (see Figure 8): The first iteration uses a rule-based relation filtering strategy. It calculates the word overlap between the relation description and the query question, and multiplies it by a preset word overlap weight (e.g., default value 5.0). If the head or tail entity in the relation matches a key entity, a preset entity matching score is added (e.g., default value 2.0). If the head or tail entity in the relation matches a source or target entity in the sub-question decomposition structure, a preset target matching score is added (e.g., default value 3.0). Finally, the total score is divided by a preset normalization divisor (e.g., default value 10.0) to obtain a normalized score.

[0239] Second iteration: Use a relationship filtering strategy based on vector similarity. Encode the query question and the descriptive text of all candidate relationships into vectors, and obtain a score for each relationship by batch calculating cosine similarity.

[0240] In the third and subsequent iterations, a relation filtering strategy based on a large language model is used. Candidate relations are formatted as text and submitted to the large language model for relevance scoring (scores range from 0 to 10, where 8 to 10 indicates direct relevance, 5 to 7 indicates useful context, 1 to 4 indicates indirect relevance, and 0 indicates no relevance). When the circuit breaker is open, the strategy automatically downgrades to the relation filtering strategy based on vector similarity.

[0241] (3) Step 3: Entity discovery: Extract the head and tail entities from the filtered relationships, exclude entities that have been expanded, and obtain a new list of candidate entities.

[0242] (4) Step 4: Document fragment acquisition: The document fragment identifiers associated with candidate entities are retrieved from the relation caching module, and then combined with the fragment identifiers carried in the filtered relations. After merging and deduplication, a fragment retrieval is initiated to the search engine. The newly retrieved fragments are marked with their source as the current iteration round.

[0243] (5) Steps 5 and 6: Evidence extraction and relation prefetching are performed in parallel. To save waiting time, the following two operations are executed simultaneously in an asynchronous parallel manner.
Step 5: Incremental evidence extraction: The large language model receives only the document fragments and relation paths newly added in the current round, and performs incremental analysis in combination with the evidence summaries output from previous iterations, outputting a list of answered aspects, a list of unanswered aspects, key facts, and a preliminary answer.

[0244] Step 6: Relationship prefetching: Submit the entity names that may be used in the next iteration to the relation caching module in advance for pre-warming, and use the analysis waiting time of the large language model to complete the search engine query.

[0245] (6) Step 7: Sufficiency check: The system determines whether the information obtained is sufficient based on preset rules: when the list of unanswered aspects is empty and the list of answered aspects is not empty, or when the large language model provides a clear answer, the system determines that the information is sufficient and terminates the iteration loop in advance.

[0246] (7) Step 8: Entity pruning: The topic entities for the next round are selected from the candidate entities using a weighted scoring method based on exponential decay. For each candidate entity, the occurrences of its associated document fragments in the current retrieval results are counted; each occurrence is weighted by a preset exponential decay factor according to the fragment's rank, and the weighted values are summed to obtain the entity score. The W entities with the highest scores (W is a preset width parameter) are selected as the topic entities for the next iteration round.
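The entity pruning of Step 8 might look like the sketch below. The exact form of the decay is not spelled out in the text, so the sketch assumes that a fragment at zero-based rank r contributes decay ** r to the score of each entity associated with it; the decay value 0.8 and the tie-breaking by name are likewise assumptions.

```python
from typing import Dict, List, Set


def prune_entities(candidates: Dict[str, Set[str]], ranked_chunk_ids: List[str],
                   width: int, decay: float = 0.8) -> List[str]:
    """Pick the top-`width` candidate entities for the next iteration round.

    candidates maps each candidate entity to the identifiers of its associated
    document fragments; ranked_chunk_ids is the current result list, best first.
    """
    rank_of = {chunk_id: rank for rank, chunk_id in enumerate(ranked_chunk_ids)}
    scores = {
        entity: sum(decay ** rank_of[c] for c in chunk_ids if c in rank_of)
        for entity, chunk_ids in candidates.items()
    }
    # Highest score first; ties are broken by name so the result is deterministic.
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return ordered[:width]


next_topics = prune_entities(
    {"Company B": {"c1", "c2"}, "Investor C": {"c2"}, "City D": {"c9"}},
    ranked_chunk_ids=["c1", "c2", "c3"],
    width=2,
)
print(next_topics)
```

Entities whose fragments sit near the top of the current results score highest, so exploration in the next round stays close to the evidence already judged relevant.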

[0247] Understandably, the three-level strategy (rules, vectors, large language model) in the progressive relation filtering can be replaced with a two-level strategy (vectors, large language model), skipping the first round of rule filtering and directly using vector filtering. This alternative has higher accuracy in the first round (vectors outperform rules), but adds a latency of one embedding model call. The three static preset strategies can also be replaced with an adaptive preset strategy based on historical retrieval results: the system automatically adjusts the scoring parameters based on user retrieval feedback (e.g., click-through rate, answer satisfaction, etc.) without requiring manual selection of preset strategies by the user. This alternative has a higher degree of automation, but requires additional user feedback data collection and learning mechanisms.
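The first-round rule-based scoring of Step 2 and the round-dependent strategy choice can be sketched as follows. The weights and divisor are the defaults quoted in the text. The definition of word overlap (the fraction of question words that also occur in the relation description), the "description" key, and the strategy labels are assumptions for illustration.

```python
import re
from typing import Dict, Set


def _words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def rule_score(relation: Dict[str, str], question: str,
               key_entities: Set[str], target_entities: Set[str],
               overlap_weight: float = 5.0, entity_bonus: float = 2.0,
               target_bonus: float = 3.0, divisor: float = 10.0) -> float:
    """Rule-based relevance score for one candidate relation (first round)."""
    q_words = _words(question)
    # Assumption: overlap is the share of question words found in the description.
    overlap = len(_words(relation["description"]) & q_words) / len(q_words) if q_words else 0.0
    score = overlap * overlap_weight
    endpoints = {relation["head"], relation["tail"]}
    if endpoints & key_entities:
        score += entity_bonus      # head or tail matches a key entity
    if endpoints & target_entities:
        score += target_bonus      # head or tail matches a sub-question entity
    return score / divisor


def select_strategy(round_no: int, breaker_open: bool = False) -> str:
    """Rules in round 1, vectors in round 2, LLM afterwards unless the breaker is open."""
    if round_no == 1:
        return "rule"
    if round_no == 2 or breaker_open:
        return "vector"
    return "llm"


rel = {"head": "Entrepreneur A", "tail": "Company B",
       "description": "Entrepreneur A founded Company B"}
example = rule_score(rel, "Who founded Company B", {"Company B"}, set())
print(round(example, 3), select_strategy(3), select_strategy(3, breaker_open=True))
```

With overlap defined as a ratio, the default weights sum to the normalization divisor (5.0 + 2.0 + 3.0 = 10.0), so the normalized score stays within [0, 1].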

[0248] (IV) Phase 3: Adaptive Score Fusion
1. Dynamic Knowledge Graph Weight Calculation
Using the basic knowledge graph weight as the initial value, the dynamic knowledge graph weight is obtained by multiplying it sequentially by the query type factor and the entity coverage factor.

[0249] The dynamic knowledge graph weight $w_{kg}$ can be calculated using the following formula:

$$w_{kg} = w_{base} \times f_{query} \times f_{cov}$$ (Formula 1)

where $w_{base}$ is the basic knowledge graph weight specified by the user; $f_{query}$ is the query type factor determined based on the query type (e.g., 1.5 for multi-hop queries, 0.7 for simple queries, and 1.0 for open-reasoning queries; all of these values are configurable parameters); and $f_{cov}$ is the coverage factor.

[0250] For the coverage factor, a smoothing function is used to map the coverage to a factor value, avoiding the abrupt weight changes caused by a piecewise function. The calculation process is as follows:

$$s = \frac{1}{1 + e^{-\text{steepness} \times (\text{coverage} - \text{center})}}$$ (Formula 2)

where $s$ is an intermediate variable, steepness is the steepness parameter, coverage is the entity coverage, and center is the center point parameter. center = (high coverage threshold (high_threshold) + low coverage threshold (low_threshold)) / 2, steepness = 10 / (high coverage threshold (high_threshold) - low coverage threshold (low_threshold)).

[0251] Then, based on the intermediate variable $s$, the coverage factor $f_{cov}$ is calculated:

$$f_{cov} = f_{low} + (f_{high} - f_{low}) \times s$$ (Formula 3)

where $f_{low}$ is the low coverage factor and $f_{high}$ is the high coverage factor.

[0252] The final calculated dynamic knowledge graph weights are limited to the range of [0, upper limit of knowledge graph weights].
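The dynamic weight calculation can be sketched as follows. The query type factors are the example values from the text; the threshold and factor defaults (0.3, 0.7, 0.8, 1.2) are hypothetical because the corresponding configuration tables are not reproduced here. The sketch also assumes a standard sigmoid for the smoothing step and a linear blend between the low and high coverage factors.

```python
import math

# Example query type factors quoted in the text (configurable).
QUERY_TYPE_FACTORS = {"multi_hop": 1.5, "simple": 0.7, "open_reasoning": 1.0}


def coverage_factor(coverage: float,
                    low_threshold: float = 0.3, high_threshold: float = 0.7,
                    low_factor: float = 0.8, high_factor: float = 1.2) -> float:
    """Map entity coverage to a factor smoothly instead of with a step function."""
    center = (high_threshold + low_threshold) / 2
    steepness = 10 / (high_threshold - low_threshold)
    s = 1 / (1 + math.exp(-steepness * (coverage - center)))  # assumed sigmoid
    return low_factor + (high_factor - low_factor) * s        # assumed blend


def dynamic_kg_weight(base_weight: float, query_type: str, coverage: float,
                      weight_cap: float = 0.8) -> float:
    """Base weight times query type factor times coverage factor, limited to [0, cap]."""
    raw = base_weight * QUERY_TYPE_FACTORS[query_type] * coverage_factor(coverage)
    return min(max(raw, 0.0), weight_cap)


for cov in (0.1, 0.5, 0.9):
    print(cov, round(coverage_factor(cov), 4), round(dynamic_kg_weight(0.4, "multi_hop", cov), 4))
```

Because the sigmoid is continuous, a small change in coverage near a threshold produces a small change in the weight, which is the behaviour the smoothing is meant to provide.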

[0253] 2. Knowledge Graph Relevance Score Calculation
The knowledge graph relevance score is obtained by weighting and summing the entity coverage score, the relation relevance score, and the proximity score.

[0254] The knowledge graph relevance score $S_{kg}$ is calculated as a three-dimensional weighted sum using the following formula:

$$S_{kg} = \alpha \times S_{entity} + \beta \times S_{rel} + \gamma \times S_{prox}$$ (Formula 4)

where $\alpha$, $\beta$, and $\gamma$ are the preset weighting coefficients for the three dimensions, $S_{entity}$ is the entity coverage score, $S_{rel}$ is the relation relevance score, and $S_{prox}$ is the proximity decay coefficient.

[0255] The calculation methods for the three dimensions are as follows: (1) Entity coverage ($S_{entity}$): This refers to the ratio of the number of extended entities hit in a document fragment to the total number of extended entities.

[0256] (2) Relation relevance ($S_{rel}$): Logarithmic scaling is used to avoid score saturation; its calculation formula is:

$$S_{rel} = \frac{\log(1 + n_{hit} \times m)}{\log(1 + N_{rel} \times m)}$$ (Formula 5)

where $n_{hit}$ is the number of times a fragment identifier or document identifier is matched in the relation records, $N_{rel}$ is the total number of relations, and $m$ is the preset relation multiplier (e.g., default value 5.0).

[0257] (3) Proximity: The proximity decays according to the iteration round from which the fragment originates. For example, the proximity of a seed fragment is set to 1.0, the proximity of a fragment found in the first iteration (progressive_iter1) is set to 0.7, the proximity of a fragment found in the second iteration (progressive_iter2) is set to 0.4, and the proximity of a fragment found in more distant iterations is set to 0.3 (all of the above decay values are configurable parameters).

[0258] Understandably, the sigmoid smoothing coverage factor can be replaced by a piecewise linear interpolation function: linear interpolation is used between the low and high coverage thresholds, and saturation is applied outside the threshold range. This alternative is simpler to calculate, but it may introduce abrupt slope changes near the threshold.

[0259] 3. Final score calculation and reordering
For highly similar, exactly matched fragments (e.g., similarity greater than or equal to a preset threshold, 0.85 by default), the similarity score is used directly as the final score, unaffected by the knowledge graph weight. For other fragments, the similarity score and the knowledge graph relevance score are weighted and summed according to the complementary relationship of the dynamic knowledge graph weight to obtain the final score, using the following formula:

$$S_{final} = (1 - w_{kg}) \times S_{sim} + w_{kg} \times S_{kg}$$ (Formula 6)

where $S_{sim}$ is the vector similarity score of the document fragment, $w_{kg}$ is the dynamic knowledge graph weight, and $S_{kg}$ is the knowledge graph relevance score.

[0260] If cross-encoder reordering is enabled, the reordered score is min-max normalized to map its value range to the [0, 1] interval. For exact match segments, the maximum of its similarity score and the reordered score is taken as the final score to protect exact match results from being downgraded by reordering.
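The exact-match protection and the min-max normalization can be sketched as follows. The 0.85 threshold is the default quoted in the text. Two points are assumptions: fragments that are not exact matches take the normalized cross-encoder score as their final score, and a constant score list normalizes to 1.0.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Map scores to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # assumption: constant scores map to 1.0
    return [(v - lo) / (hi - lo) for v in values]


def fuse(similarity: float, kg_score: float, kg_weight: float,
         exact_threshold: float = 0.85) -> float:
    """Exact matches keep their similarity; other fragments are fused."""
    if similarity >= exact_threshold:
        return similarity
    return (1 - kg_weight) * similarity + kg_weight * kg_score


def apply_rerank(similarities: Sequence[float], rerank_scores: Sequence[float],
                 exact_threshold: float = 0.85) -> List[float]:
    """Combine normalized cross-encoder scores with exact-match protection."""
    normalized = min_max_normalize(rerank_scores)
    return [
        max(sim, score) if sim >= exact_threshold else score
        for sim, score in zip(similarities, normalized)
    ]


print(apply_rerank([0.90, 0.60], [-1.0, 3.0]))
```

In the usage line, the first fragment receives the lowest cross-encoder score yet keeps 0.90 because it is an exact match, so reordering cannot push its score below its similarity.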

[0261] The top K document fragments are selected as the target retrieval results in descending order of the final score, and the retrieval results, reasoning process, performance indicators and diagnostic information are output.

[0262] (V) Configurable scoring rules and preset strategies
All scoring parameters are organized into a hierarchical data class configuration structure, with the top-level configuration class (ScoringRulesConfig) containing seven sub-configuration sections. See Tables 1 through 10 for detailed configuration instructions.

[0263] 1. Rule Filter Configuration: controls the rule-based relationship filtering behavior in the first iteration, as shown in Table 1.

[0264] Table 1 Rule Filtering Configuration

2. Confidence Check Configuration: controls the confidence check on the seed results, which determines whether to skip graph expansion, as shown in Table 2.

[0265] Table 2 Confidence Check Configuration

3. Dynamic Knowledge Graph Weight Configuration (Dynamic KG Weight Config)

(1) Query factor configuration (Query Factor Config), as shown in Table 3:

Table 3 Weight Configuration of Dynamic Knowledge Graph Based on Query Type

(2) Coverage configuration, as shown in Table 4:

Table 4 Coverage Factor in Weight Configuration of Dynamic Knowledge Graph

(3) Weight upper limit, as shown in Table 5:

Table 5 Weight Cap Configuration

4. Knowledge Graph Score Configuration (KG Score Config), as shown in Table 6:

Table 6 Knowledge Graph Scoring Configuration

Note: the weights should sum to 1.0 to ensure that the value range of the knowledge graph relevance score is within [0, 1].

[0266] The proximity decay configuration is shown in Table 7:

Table 7 Proximity Decay Configuration

5. Score Fusion Configuration, as shown in Table 8:

Table 8 Score Fusion Configuration

6. Topic Selection Configuration, as shown in Table 9:

Table 9 Topic Selection Configuration

7. Chunk Retrieval Configuration, as shown in Table 10:

Table 10 Fragment Retrieval Configuration

8. Preset Strategies: The system provides factory functions for three preset strategies, each of which creates an independent instance, as shown in Table 11.

[0267] Table 11 Description of Preset Strategies

Users can precisely override any parameter using dotted paths. For unknown paths or type mismatches, the system only logs a warning message and does not block the normal execution of the retrieval request.
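As a non-limiting illustration, dotted-path overrides on a hierarchical data class configuration may be sketched in Python as follows. The sub-section names (fusion, rule_filter), their fields, and the functions balanced_preset and apply_overrides are hypothetical and chosen only for the sketch; the actual sections and parameters are those listed in the tables of this application.

```python
import copy
import logging
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Dict, Set

logger = logging.getLogger("scoring_config")


@dataclass
class ScoreFusionConfig:  # hypothetical sub-configuration section
    exact_match_threshold: float = 0.85


@dataclass
class RuleFilterConfig:  # hypothetical sub-configuration section
    enabled: bool = True
    top_k: int = 20


@dataclass
class ScoringRulesConfig:
    fusion: ScoreFusionConfig = field(default_factory=ScoreFusionConfig)
    rule_filter: RuleFilterConfig = field(default_factory=RuleFilterConfig)


def balanced_preset() -> ScoringRulesConfig:
    """Factory function: every call creates an independent instance."""
    return ScoringRulesConfig()


def _field_names(obj: Any) -> Set[str]:
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name for f in fields(obj)}
    return set()


def _compatible(current: Any, value: Any) -> bool:
    # bool is a subclass of int, so it is checked separately.
    if isinstance(value, bool) or isinstance(current, bool):
        return type(value) is type(current)
    if isinstance(current, float):
        return isinstance(value, (int, float))
    return isinstance(value, type(current))


def apply_overrides(config: ScoringRulesConfig, overrides: Dict[str, Any]) -> ScoringRulesConfig:
    """Apply dotted-path overrides; bad entries are logged and skipped."""
    result = copy.deepcopy(config)
    for path, value in overrides.items():
        *parents, leaf = path.split(".")
        target: Any = result
        for name in parents:
            target = getattr(target, name) if name in _field_names(target) else None
        if leaf not in _field_names(target):
            logger.warning("unknown override path: %s", path)
            continue
        if not _compatible(getattr(target, leaf), value):
            logger.warning("type mismatch for override: %s", path)
            continue
        setattr(target, leaf, value)
    return result
```

For example, `apply_overrides(balanced_preset(), {"fusion.exact_match_threshold": 0.9})` changes only that one field, while an entry such as `{"no_such.section": 1}` is logged and ignored.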

[0268] (vi) Circuit Breaker and Degradation Mechanism 1. Circuit breaker state management (see Figure 9): (1) Maintain a counter that records the number of consecutive failed calls to the large language model. Reset the counter to zero after each successful call to the large language model.

[0269] (2) When the number of consecutive failures reaches the preset circuit breaker threshold (e.g., 2 by default), the circuit breaker state is switched to "open".

[0270] (3) While the circuit breaker is in the "open" state, all subsequent calls to the large language model are rejected directly and no actual call is initiated.
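As a non-limiting illustration, a circuit breaker based on a consecutive failure count may be sketched in Python as follows. The class and exception names are hypothetical. The sketch covers only the three behaviours listed above and does not model how an open circuit breaker is closed again.

```python
from typing import Any, Callable


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit breaker is open."""


class ConsecutiveFailureBreaker:
    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold  # default of 2 follows the text
        self.failures = 0           # consecutive failed calls so far

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.is_open:
            # Reject directly: the wrapped function is never invoked.
            raise CircuitOpenError("large language model circuit breaker is open")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the counter
        return result
```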

[0271] It will be understood that the circuit breaker based on consecutive failure counts can be replaced by a circuit breaker based on a sliding-window failure rate: the failure rate is calculated over the most recent N large language model calls; when the failure rate exceeds a preset threshold, the circuit breaker is triggered, and after a preset cooldown period it automatically attempts to enter a half-open state. This alternative tolerates occasional failures better, but its implementation complexity is correspondingly higher.
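As a non-limiting illustration, the sliding-window alternative may be sketched in Python as follows. The class name, default values, and injectable clock are hypothetical. Two details are assumptions made only for the sketch: the failure rate is evaluated once the window is full, and a single trial call in the half-open state decides whether the circuit breaker closes or reopens.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class SlidingWindowBreaker:
    def __init__(
        self,
        window: int = 10,
        max_failure_rate: float = 0.5,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.results: Deque[bool] = deque(maxlen=window)  # True means success
        self.max_failure_rate = max_failure_rate
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.opened_at: Optional[float] = None

    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half_open"
        return "open"

    def allow(self) -> bool:
        """Calls are allowed unless the breaker is fully open."""
        return self.state() != "open"

    def record(self, success: bool) -> None:
        if self.state() == "half_open":
            if success:
                self.opened_at = None  # trial call succeeded: close
                self.results.clear()
            else:
                self.opened_at = self.clock()  # trial call failed: reopen
            return
        self.results.append(success)
        if len(self.results) == self.results.maxlen:
            failure_rate = self.results.count(False) / len(self.results)
            if failure_rate > self.max_failure_rate:
                self.opened_at = self.clock()
```

A caller would check `allow()` before each large language model call and report the outcome through `record()`.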

[0272] 2. Degradation behavior: (1) When the query analysis module fails to call the large language model, the default multi-hop query type (multi_hop) is used as the analysis result.

[0273] (2) When the large language model is unavailable during the relationship filtering process, the system automatically degrades to the relationship filtering strategy based on vector similarity.

[0274] (3) When the large language model call fails in the evidence extraction step, the evidence extraction step is skipped and the default "insufficient information" result is used, so that the iteration process continues until the upper limit of rounds is reached.

[0275] Each large language model call is given an asynchronous timeout limit (e.g., 30 seconds by default). When a call times out, an exception is thrown immediately and the circuit breaker failure counter is incremented, which avoids long-term blocking.
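As a non-limiting illustration, the timeout limit and the fallback to a default result may be sketched in Python with asyncio as follows. The names LLMGuard and call_or_default are hypothetical. The default passed by the caller would be, for example, the multi_hop query type for query analysis or an "insufficient information" result for evidence extraction.

```python
import asyncio
from typing import Any, Awaitable, Callable, Tuple


class LLMGuard:
    """Wraps large language model calls with a timeout and a failure counter."""

    def __init__(self, timeout_s: float = 30.0, threshold: int = 2) -> None:
        self.timeout_s = timeout_s  # default of 30 seconds follows the text
        self.threshold = threshold
        self.failures = 0

    async def call(self, make_call: Callable[[], Awaitable[Any]]) -> Any:
        if self.failures >= self.threshold:
            # Open circuit breaker: reject without starting the call.
            raise RuntimeError("circuit breaker open")
        try:
            result = await asyncio.wait_for(make_call(), timeout=self.timeout_s)
        except Exception:
            # A timeout raised by wait_for is counted like any other failure.
            self.failures += 1
            raise
        self.failures = 0
        return result


async def call_or_default(
    guard: LLMGuard, make_call: Callable[[], Awaitable[Any]], default: Any
) -> Tuple[Any, bool]:
    """Return (result, degraded); on any failure fall back to the default."""
    try:
        return await guard.call(make_call), False
    except Exception:
        return default, True
```

The returned degraded flag is the kind of information that could be surfaced in the diagnostic data of the request.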

[0276] (vii) Observability: Two types of diagnostic data are generated after each retrieval request is executed: (1) Performance metrics (Retrieval Metrics): these include the time spent in each stage (such as query analysis time, seed retrieval time, graph retrieval time, and score fusion time), the number of large language model calls and cache hits, the number of search engine queries, and detailed information for each iteration (such as the number of input and output relations, the filtering method used, the number of new fragments, and the time spent in each step).

[0277] (2) Diagnostic information (Retrieval Diagnostics): this includes whether degradation has occurred, the list of stages in which it occurred, error details, and whether the circuit breaker has been tripped.

[0278] (viii) Interface layer design This system exposes its retrieval capabilities to the outside world through the Hypertext Transfer Protocol (HTTP) interface. Its basic request parameters include: question text, knowledge base identifier list, retrieval mode (e.g., fast, adaptive, or thorough mode), maximum number of hops, maximum depth, exploration width, number of returned results, similarity threshold, knowledge graph weights, etc.

[0279] The newly added scoring configuration parameters include: Scoring preset (scoring_preset): used to specify the name of the scoring preset (optional values are balanced, precision, and recall), with balanced as the default.

[0280] Scoring parameter override configuration (scoring_overrides): used to provide a dictionary of dotted-path overrides.

[0281] The response structure includes: a sorted list of document chunks (each chunk contains information such as content, source, similarity score, knowledge graph score, and final score), reasoning information, performance metrics, and diagnostic information.

[0282] Based on this, the knowledge graph retrieval method and system provided in the embodiments of this application involve the following key technical points: 1. Query-aware hierarchical routing mechanism: Query classification and multi-task analysis are completed through a single large language model call, and different iteration depths and filtering strategies are automatically selected according to query complexity.

[0283] 2. Request-level relation caching and two-round query optimization: The relation cache spans the lifecycle of a single retrieval request and is shared and reused by multiple methods; coverage analysis of the first-round query automatically narrows the scope of the second-round query.

[0284] 3. Progressive relation filtering strategy: The relation filtering method (rule, vector, large language model) is automatically upgraded according to the iteration depth to achieve the optimal balance between accuracy and efficiency.

[0285] 4. Hierarchical configurable scoring system and preset strategies: Scoring parameters are organized in a hierarchical data structure and combined with preset factory functions and a dotted-path override mechanism.

[0286] 5. Sigmoid-smoothed coverage factor and logarithmically scaled relation degree: The discrete step function and the linear multiplier are replaced with continuous functions to eliminate abrupt weight changes and score saturation.

[0287] 6. Reordering score normalization and exact match protection: The cross-encoder output is normalized, and the exact match result is protected by taking the maximum value after reordering.

[0288] 7. Large Language Model Circuit Breaker and Graceful Degradation: A circuit breaker state machine based on the number of consecutive failures, combined with a multi-level degradation strategy.

[0289] 8. Incremental evidence extraction and parallel prefetching: The large language model only processes the new content in each round, while asynchronously and in parallel performing evidence extraction and pre-warming of the next round's relation cache.

[0290] Based on this, compared with existing technologies, the knowledge graph retrieval method and system provided in the embodiments of this application have the following beneficial effects: 1. Query-Aware Adaptive Exploration Depth: Existing technologies such as GraphRAG and KG2RAG use fixed-hop expansion, performing graph traversal to the same depth regardless of the complexity of the question; LightRAG does not explore the graph at all; ToG-2 controls the depth dynamically, but its sufficiency judgment in each round requires calling a large language model, which is costly. The embodiments of this application complete query classification through a single large language model call, dividing questions into three categories (simple, multi-hop, and open reasoning), and automatically route to different iteration depths and filtering strategies accordingly. In addition, a confidence check mechanism based on score intervals is introduced, which returns directly when the seed results are of sufficiently high quality, without entering the subsequent graph exploration stages.

[0291] 2. Progressive Relationship Filtering: Achieving an Adaptive Trade-off Between Accuracy and Efficiency: This application proposes a three-level progressive relationship filtering strategy: the first round uses rule-based filtering based on word overlap and entity matching (no external calls, approximately 0 milliseconds latency); the second round uses filtering based on vector embedding cosine similarity (approximately 50 milliseconds latency); and the third and subsequent rounds use semantic scoring filtering based on a large language model (approximately 3 seconds latency). This progressive upgrade strategy achieves an adaptive binding between the relationship filtering method and the iteration depth at the algorithm level.
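As a non-limiting illustration, the binding between iteration round and relation filtering method, together with a simplified first-round rule filter, may be sketched in Python as follows. All function names, weights, and the normalization divisor are hypothetical. Words are split on whitespace and counted as unique lowercase tokens, which is a simplification made only for the sketch.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def select_strategy(round_index: int, llm_available: bool = True) -> str:
    """Pick the filtering method for a 1-based iteration round."""
    if round_index <= 1:
        return "rule"    # no external calls
    if round_index == 2 or not llm_available:
        return "vector"  # embedding cosine similarity
    return "llm"         # semantic scoring by the large language model


def rule_score(
    question: str,
    triple: Triple,
    key_entities: Set[str],
    overlap_weight: float = 1.0,  # hypothetical value
    entity_bonus: float = 0.5,    # hypothetical value
    divisor: float = 1.5,         # hypothetical normalization divisor
) -> float:
    """Score a triple by word overlap plus an entity match bonus."""
    q_words = set(question.lower().split())
    t_words = set(" ".join(triple).lower().split())
    score = overlap_weight * len(q_words & t_words) / max(len(q_words), 1)
    head, _, tail = triple
    if head in key_entities or tail in key_entities:
        score += entity_bonus
    return score / divisor


def rule_filter(
    question: str, triples: List[Triple], key_entities: Set[str], k: int
) -> List[Triple]:
    """Keep the k triples with the highest rule score."""
    ranked = sorted(
        triples,
        key=lambda t: rule_score(question, t, key_entities),
        reverse=True,
    )
    return ranked[:k]
```

Passing `llm_available=False` mirrors the degradation from large language model filtering to vector filtering when the circuit breaker is open.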

[0292] 3. Incremental Evidence Extraction and Asynchronous Parallel Processing: In each iteration, ToG-2 submits all accumulated document fragments to the large language model, causing the number of input tokens to grow linearly with each iteration. The embodiments of this application employ an incremental evidence extraction method: in each iteration the large language model receives only the newly added document fragments and relation paths, and the evidence extraction step and the relation cache prefetching step for the next round are executed simultaneously through an asynchronous parallel mechanism, so that the waiting time of the large language model overlaps with the search engine's query time.
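As a non-limiting illustration, running incremental evidence extraction and next-round relation prefetching concurrently may be sketched in Python with asyncio as follows. The callables llm_extract and prefetch_relations, and the keys of the evidence dictionary, are hypothetical stand-ins for the large language model call and the search engine query.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

Evidence = Dict[str, Any]  # keys used here: "answered", "unanswered", "summary"


async def run_round(
    llm_extract: Callable[[str, Optional[str]], Awaitable[Evidence]],
    prefetch_relations: Callable[[List[str]], Awaitable[None]],
    new_content: str,
    previous_summary: Optional[str],
    predicted_entities: List[str],
) -> Evidence:
    """Extract evidence from new content only while warming the relation cache."""
    evidence, _ = await asyncio.gather(
        # Only this round's new content plus the previous summary is sent.
        llm_extract(new_content, previous_summary),
        # Runs concurrently, so its latency overlaps the model's waiting time.
        prefetch_relations(predicted_entities),
    )
    return evidence


def is_sufficient(evidence: Evidence) -> bool:
    """Stop when nothing is unanswered and something has been answered."""
    return not evidence["unanswered"] and bool(evidence["answered"])
```

Because the previous summary is passed in place of all earlier fragments, the size of the model input in this sketch depends on the current round's new content rather than on the number of rounds completed.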

[0293] 4. Adaptive score fusion to address static weights and numerical defects: The embodiments of this application design an adaptive score fusion algorithm with the following improvements: (1) Dynamic knowledge graph weight: calculated from a query type factor and a Sigmoid-smoothed coverage factor; (2) Logarithmically scaled relation degree: a logarithmic scaling function replaces the linear multiplier to avoid score saturation; (3) Three-dimensional knowledge graph scoring: the proximity dimension decays according to the iteration round in which the document fragment was obtained.

[0294] 5. Request-level Relation Caching and Two-Round Query Optimization: The embodiments of this application construct a relation cache that spans the entire lifecycle of a single retrieval request. Three methods (relation retrieval, fragment identifier retrieval, and entity pruning) share the same cache instance. The two-round query optimization works as follows: the first round queries by head entity and records which of the entities to be queried have already appeared as tail entities in the results; the second round performs tail entity queries only on the entities that have not been covered. This reduces the number of repeated relation queries for the same batch of entities within one request from a maximum of 6 to 2.
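As a non-limiting illustration, the two-round query with a request-level cache may be sketched in Python as follows. The function fetch_missing and the two query callables are hypothetical stand-ins for batch queries against the search engine; the cache is a plain dictionary from entity to triples.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)
BatchQuery = Callable[[List[str]], List[Triple]]


def fetch_missing(
    missing: List[str],
    query_by_head: BatchQuery,
    query_by_tail: BatchQuery,
    cache: Dict[str, List[Triple]],
) -> List[Triple]:
    """Fetch triples for entities absent from the cache using two batch queries."""
    wanted = set(missing)
    # Round 1: all missing entities as head entity conditions.
    first = query_by_head(missing)
    # Missing entities already seen as tail entities count as covered.
    covered = {tail for _, _, tail in first if tail in wanted}
    # Round 2: only the uncovered entities as tail entity conditions.
    remaining = [entity for entity in missing if entity not in covered]
    second = query_by_tail(remaining) if remaining else []
    combined = list(dict.fromkeys(first + second))  # de-duplicate, keep order
    for entity in missing:
        cache[entity] = [t for t in combined if entity in (t[0], t[2])]
    return combined
```

Later rounds of the same request would consult `cache` first and call `fetch_missing` only for entities that are not yet present.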

[0295] 6. Flexible and configurable scoring system: The embodiments of this application extract more than 25 scoring constants into a hierarchical data class configuration structure, provide three preset strategies (precision, balanced, and recall), and support fine-grained parameter overrides via dotted paths. The factory function for each preset strategy creates an independent instance, and a dotted-path override modifies only the specified fields without affecting other parameters, so that preset strategies and custom overrides can be combined orthogonally.

[0296] 7. Large Language Model Circuit Breaker and Graceful Degradation: The embodiments of this application introduce a circuit breaker pattern: the number of consecutive failed large language model calls is monitored, the circuit breaker opens automatically when a preset threshold is reached, and subsequent steps degrade automatically; that is, the relation filtering step degrades to vector filtering, the evidence extraction step is skipped, and the query analysis step uses default results. When the large language model service returns to normal, the circuit breaker can automatically close.

[0297] 8. Significant performance improvement: (1) Reduce large language model calls: Query analysis is merged into a single large language model call (compared to the two calls required by ToG-2), and the entire process of a simple query only requires one large language model call.

[0298] (2) Reduce the number of input tokens: Incremental evidence extraction processes only the newly added content in each round, so the number of input tokens of the large language model does not grow linearly with the number of iterations.

[0299] (3) Reduce duplicate queries: Through relation caching and two-round query optimization, the number of duplicate queries is reduced from a maximum of 6 to 2.

[0300] (4) Reduce end-to-end latency: The asynchronous parallel execution of evidence extraction and relation prefetching steps effectively reduces the end-to-end latency of the overall retrieval.

[0301] To further verify the beneficial effects of the knowledge graph retrieval method and system provided in the embodiments of this application, this application also provides the following evaluation method: Using 80 requirement and design documents from a specific business line of a company, 500 question-answer pairs were extracted with a large model to form an evaluation dataset. The questions from the evaluation dataset were input into each retrieval method. The large model compared the document fragments returned by the retrieval method with the answers from the evaluation dataset and assigned a relevance score: 0 (completely irrelevant), 1 (marginally relevant, containing some relevant information), 2 (partially relevant, containing some key information), or 3 (highly relevant, containing a definite answer). The evaluation metrics and calculation methods are as follows: 1. Mean Reciprocal Rank (MRR): $\mathrm{MRR} = 1 / i$, where $i$ is the position of the first document fragment with a relevance score greater than or equal to 2 in the results returned by the retrieval method; if there is none, the MRR is 0.

[0302] 2. Normalized Discounted Cumulative Gain (NDCG): $\mathrm{NDCG}@k = \mathrm{DCG}_{actual}@k \, / \, \mathrm{DCG}_{ideal}@k$, where $k$ indicates that the first $k$ results returned by the retrieval method are used to calculate the metric, $S_i$ is the relevance score of the document fragment at position $i$ in the returned results and is the quantity from which the discounted cumulative gain (DCG) is computed, "actual" denotes the actual order of the document fragments in the returned results, and "ideal" denotes the ideal order, in which the document fragments in the returned results are sorted from highest to lowest relevance score.

[0303] 3. Precision: $\mathrm{Precision} = n / m$, where $n$ is the number of document fragments with a relevance score greater than or equal to 2 in the results returned by the retrieval method, and $m$ is the total number of document fragments.

[0304] 4. Hit Rate: computed from the relevance scores $S_i$ of the first $k$ results returned by the retrieval method, where $S_i$ is the relevance score of the result at position $i$.
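As a non-limiting illustration, the four metrics may be computed for a single query in Python as follows, given the list of relevance scores (0 to 3) of the returned fragments in ranked order. The function names are hypothetical. Two choices are assumptions made only for the sketch: the discounted cumulative gain uses the common form in which each score is divided by log2(i + 1), and the hit rate counts a hit when any of the first k results has a relevance score of at least 2.

```python
import math
from typing import List

RELEVANT = 2  # scores >= 2 count as relevant, as in the MRR and precision definitions


def reciprocal_rank(scores: List[int]) -> float:
    """1 / position of the first relevant fragment, or 0 if there is none."""
    for position, score in enumerate(scores, start=1):
        if score >= RELEVANT:
            return 1.0 / position
    return 0.0


def dcg(scores: List[int], k: int) -> float:
    """Discounted cumulative gain over the first k scores (assumed form)."""
    return sum(s / math.log2(i + 1) for i, s in enumerate(scores[:k], start=1))


def ndcg(scores: List[int], k: int) -> float:
    """DCG of the actual order divided by DCG of the ideal order."""
    ideal = dcg(sorted(scores, reverse=True), k)
    return dcg(scores, k) / ideal if ideal > 0 else 0.0


def precision(scores: List[int]) -> float:
    """Share of returned fragments that are relevant."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= RELEVANT) / len(scores)


def hit(scores: List[int], k: int) -> float:
    """1 if any of the first k fragments is relevant, else 0 (assumption)."""
    return 1.0 if any(s >= RELEVANT for s in scores[:k]) else 0.0


def mean(values: List[float]) -> float:
    """Average a per-query metric over the evaluation dataset."""
    return sum(values) / len(values) if values else 0.0
```

Averaging `reciprocal_rank` over all questions with `mean` gives the dataset-level MRR; the other metrics are averaged in the same way.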

[0305] The final reported value of each metric is its average over all retrievals. The comparison methods included KG2RAG, PathRAG, and a hybrid retrieval method combining sparse retrieval based on term frequency and inverse document frequency with vector retrieval (BM25+embedding). All methods used the Tongyi Qianwen multilingual text vector model (text-embedding-v4) for text embedding, the Tongyi Qianwen re-ranking model (qwen3-rerank) for re-ranking the retrieval results, and the Tongyi Qianwen series large model (qwen3.5-plus) for large model inference and knowledge graph entity relation extraction. The results are shown in Table 12.

Table 12 Performance comparison of different retrieval methods on the evaluation dataset

As shown in Table 12 above, the method provided in the embodiments of this application outperforms or matches the existing methods on four metrics: Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (NDCG), Precision, and Hit Rate.

The embodiments of this application also provide an electronic device, which may include a processor, a memory, a receiver, and a transmitter. The processor is used to execute the knowledge graph retrieval method mentioned in the above embodiments. The processor and memory can be connected via a bus or by other means; a bus connection is taken as an example. The receiver can be connected to the processor and memory by wired or wireless means.

[0306] The processor can be a central processing unit (CPU). The processor can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations of the above types of chips.

[0307] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions / modules corresponding to the knowledge graph retrieval method in the embodiments of this application. The processor executes various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory, thereby implementing the knowledge graph retrieval method in the above method embodiments.

[0308] The memory may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created by the processor, etc. Furthermore, the memory may include high-speed random access memory and non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may optionally include memory remotely located relative to the processor, which can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0309] The one or more modules are stored in the memory, and when executed by the processor, they perform the knowledge graph retrieval method in the embodiment.

[0310] In some embodiments of this application, the user equipment may include a processor, a memory, and a transceiver unit. The transceiver unit may include a receiver and a transmitter. The processor, memory, receiver, and transmitter may be connected via a bus system. The memory is used to store computer instructions, and the processor is used to execute the computer instructions stored in the memory to control the transceiver unit to send and receive signals.

[0311] As one implementation method, the functions of the receiver and transmitter in this application can be implemented by transceiver circuits or dedicated transceiver chips, and the processor can be implemented by dedicated processing chips, processing circuits or general-purpose chips.

[0312] As another implementation approach, the server provided in this application embodiment can be implemented using a general-purpose computer. That is, the program code implementing the processor, receiver, and transmitter functions is stored in memory, and the general-purpose processor implements the processor, receiver, and transmitter functions by executing the code in memory.

[0313] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the aforementioned knowledge graph retrieval method. The computer-readable storage medium can be a tangible storage medium, such as random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, floppy disks, hard disks, removable storage disks, CD-ROMs, or any other form of storage medium known in the art.

[0314] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the aforementioned knowledge graph retrieval method.

[0315] Those skilled in the art will understand that the exemplary components, systems, and methods described in conjunction with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application. When implemented in hardware, it can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. The programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave.

[0316] It should be clarified that this application is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this application is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this application.

[0317] In this application, features described and / or illustrated for one embodiment may be used in the same or similar manner in one or more other embodiments, and / or combined with or in place of features of other embodiments.

[0318] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to the embodiments of this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A knowledge graph retrieval method, characterized in that it comprises: determining a starting entity set based on the natural language question contained in the current query request; using the starting entity set as the topic entity set for the first iteration round, and performing at least one iteration round of a progressive graph retrieval step to obtain retrieval information for the natural language question as the retrieval result for the natural language question; wherein the retrieval information includes: information on the relationships between different entities, and document fragments related to the natural language question; the progressive graph retrieval step includes: obtaining a candidate relation set associated with the topic entity set of the current round; wherein the candidate relation set contains multiple pieces of candidate relation data, and each piece of candidate relation data is a triple in a preset knowledge graph that represents the relationship between two entities; based on the current iteration round, determining, from at least two preset relation filtering strategies, one relation filtering strategy applicable to the current iteration round; wherein, among the iteration rounds, the computational latency of the relation filtering strategy applicable to an earlier iteration round is lower than the computational latency of the relation filtering strategy applicable to a later iteration round; filtering the candidate relation set of the current iteration round using the relation filtering strategy applicable to the current iteration round to obtain a filtered relation set for the current iteration round, and obtaining the document fragments associated with the filtered relation set; and if the current iteration round does not meet a preset iteration termination condition, determining the topic entity set for the next iteration round based on the filtered relation set of the current iteration round.
2. The knowledge graph retrieval method of claim 1, wherein the relation filtering strategies include: a first relation filtering strategy applicable to the first iteration round, which filters based on the degree of overlap between the words contained in the candidate relation data and the words in the natural language question; a second relation filtering strategy applicable to the second iteration round, which filters based on the similarity between the vector representation of the candidate relation data and the vector representation of the natural language question; and a third relation filtering strategy applicable to the third and subsequent iteration rounds, which filters based on relevance scores given by a large language model for the candidate relation data with respect to the natural language question.
3. The knowledge graph retrieval method of claim 2, wherein, if the relation filtering strategy applicable to the current iteration round is the first relation filtering strategy, the step of filtering the candidate relation set of the current iteration round using the relation filtering strategy applicable to the current iteration round to obtain the filtered relation set for the current iteration round includes: for each piece of candidate relation data, calculating the number of overlaps between the words contained in the candidate relation data and the words contained in the natural language question, and multiplying the ratio of the number of overlaps to the total number of words in the natural language question by a preset word overlap weight to obtain the word overlap score of the candidate relation data; obtaining a list of key entities corresponding to the natural language question, wherein the key entities stored in the list are entity names identified from the natural language question that have corresponding nodes in the knowledge graph; for each piece of candidate relation data, when the head entity in the candidate relation data matches a key entity in the key entity list, or when the tail entity in the candidate relation data matches a key entity in the key entity list, adding a preset entity matching score to the word overlap score; obtaining the sub-question decomposition structure corresponding to the natural language question, wherein the sub-question decomposition structure includes a source entity, a target entity, and relation hints; for each piece of candidate relation data, when the head entity and the tail entity in the candidate relation data match the source entity and the target entity in the sub-question decomposition structure, respectively, further adding a preset target matching score to the current score and using the result as the current score; dividing the current score by a preset normalization divisor to obtain the filtering score of the candidate relation data; and, based on the filtering score of each piece of candidate relation data, selecting a portion of the candidate relation data from the candidate relation set to form the filtered relation set for the current iteration round.
4.The knowledge graph retrieval method of claim 2, wherein, If the relation filtering strategy applicable to the current iteration round is the third relation filtering strategy, then the step of filtering the candidate relation set of the current round using the relation filtering strategy applicable to the current iteration round to obtain the filtered relation set of the current iteration round includes: Check whether the counter for consecutive failures of the large language model call has reached the preset circuit breaker threshold; If so, then abandon the invocation of the large language model and instead use the second relation filtering strategy to filter the candidate relation set of the current iteration round; If not, the candidate relation data in the candidate relation set is formatted as text, and then the formatted text and the natural language question are submitted to the large language model to obtain the relevance score of each candidate relation data in the candidate relation set to the natural language question output by the large language model; if the acquisition is successful, the consecutive failure count counter is reset to zero; if the acquisition fails, the consecutive failure count counter is incremented by one, and the scores of all candidate relation data in this round are discarded. Based on the relevance scores of each candidate relation data, a portion of the candidate relation data is selected from the candidate relation set to form the filtered relation set for the current iteration round. 
5.The knowledge graph retrieval method of claim 1, wherein, The step of obtaining the candidate relation set associated with the topic entity set of the current round includes: Create a request-level relationship cache; wherein, the request-level relationship cache is used to store the mapping relationship between the queried entities and candidate relationship data within the lifecycle of a single query request; For each entity in the topic entity set of the current round, check whether the candidate relationship data corresponding to the entity has been stored in the request-level relationship cache; if it has been stored, read the candidate relationship data corresponding to the entity from the request-level relationship cache; if it has not been stored, record the entity as a missing entity, and initiate a query to the search engine to obtain the candidate relationship data corresponding to each missing entity, and store the queried candidate relationship data into the request-level relationship cache. After processing all entities in the current round of topic entity set, candidate relationship data corresponding to each entity in the current round of topic entity set is read from the request-level relationship cache and summarized to form a candidate relationship set associated with the current round of topic entity set. 
6. The knowledge graph retrieval method of claim 5, wherein the step of initiating a query to the search engine to obtain the candidate relation data corresponding to each missing entity includes: using all the missing entities in the topic entity set of the current round as head entity conditions, batch querying the knowledge graph triples in the search engine to obtain a first query result; traversing the first query result and marking, as covered entities, those missing entities that appear as tail entities in the first query result; using the remaining missing entities, excluding the covered entities, as tail entity conditions, batch querying the knowledge graph triples in the search engine to obtain a second query result; and combining the first query result and the second query result to form the candidate relation data.

7. The knowledge graph retrieval method according to claim 1, characterized in that, the progressive graph retrieval step further includes: after obtaining the document fragments associated with the filtered relation set, formatting the document fragments obtained in the current iteration round and the filtered relation set as text, which is used as the new content of the current iteration round; if an evidence summary from the previous iteration round exists, submitting the new content of the current iteration round together with the evidence summary of the previous iteration round to the large language model for incremental analysis, so that the large language model outputs the list of answered aspects, the list of unanswered aspects, and the evidence summary of the current iteration round; if no evidence summary from the previous iteration round exists, submitting only the new content of the current iteration round to the large language model for incremental analysis, so that the large language model outputs the list of answered aspects, the list of unanswered aspects, and the evidence summary of the current iteration round. Furthermore, before the large language model performs the incremental analysis, the set of topic entities predicted to be involved in the next iteration round is determined, based on the filtered relation set of the current iteration round and the newly added document fragments of the current iteration round, as prediction result data; while the large language model performs the incremental analysis, a relation query for the prediction result data is initiated in advance, asynchronously and in parallel, in the current request-level relation cache. The preset iteration termination conditions include: the current iteration round reaches the preset maximum iteration round, or the target condition has been met; the target condition includes: the list of unanswered aspects of the current iteration round is empty and the list of answered aspects of the current iteration round is not empty.
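A minimal sketch of the termination test and of running the incremental analysis in parallel with the relation prefetch is given below. The coroutines `analyze` and `prefetch` are hypothetical placeholders for the large language model call and the request-level relation cache query; neither signature is taken from the claim.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


def should_stop(
    round_no: int,
    max_rounds: int,
    answered: Sequence[str],
    unanswered: Sequence[str],
) -> bool:
    """Stop at the round limit, or when nothing is unanswered and something is answered."""
    target_met = len(unanswered) == 0 and len(answered) > 0
    return round_no >= max_rounds or target_met


async def analyze_with_prefetch(
    analyze: Callable[[str, Optional[str]], Awaitable[dict]],  # hypothetical LLM call
    prefetch: Callable[[Sequence[str]], Awaitable[None]],      # hypothetical cache warm-up
    new_content: str,
    prev_summary: Optional[str],
    predicted_entities: Sequence[str],
) -> dict:
    """Run incremental analysis while relations for predicted entities are prefetched."""
    # prev_summary is None in the first round, so only the new content is analyzed.
    result, _ = await asyncio.gather(
        analyze(new_content, prev_summary),
        prefetch(predicted_entities),
    )
    return result
```

Because the prefetch overlaps with the model call, the next round may find its relations already cached if the prediction was right; a wrong prediction costs only an unused cache entry.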

8. The knowledge graph retrieval method according to claim 1, characterized in that, the process of determining the set of topic entities for the next iteration round based on the filtered relation set of the current iteration round includes: extracting the head entities and tail entities from the filtered relation set of the current iteration round and excluding the entities that have already been expanded, to obtain a candidate entity set, wherein the expanded entities include entities that have been queried in the previous iteration round and earlier; for each candidate entity in the candidate entity set, determining, among the acquired document fragments, the document fragments that contain the candidate entity; calculating the contribution score of each such document fragment to the candidate entity based on the similarity score and ranking position of the document fragment and a preset exponential decay factor; summing the contribution scores to obtain the importance score of the candidate entity; and selecting the top 100 candidate entities with the highest importance scores as the set of topic entities for the next iteration round.
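The entity selection of claim 8 might look like the sketch below. The claim does not give the exact contribution formula, so the form `similarity * decay ** rank` (rank starting at 0) is an assumption, as are the function names, the default decay of 0.8, and the use of substring matching to decide whether a fragment contains an entity.

```python
from typing import Iterable

Triple = tuple[str, str, str]  # (head, relation, tail)


def candidate_entities(filtered_relations: Iterable[Triple], expanded: set[str]) -> list[str]:
    """Collect heads and tails of the filtered relations, skipping expanded entities."""
    result: list[str] = []
    for head, _, tail in filtered_relations:
        for entity in (head, tail):
            if entity not in expanded and entity not in result:
                result.append(entity)
    return result


def select_next_entities(
    candidates: Iterable[str],
    fragments: list[tuple[str, float]],  # (fragment text, similarity score)
    decay: float = 0.8,                  # assumed value of the preset decay factor
    top_k: int = 100,
) -> list[str]:
    """Rank candidates by the summed, rank-decayed similarity of fragments containing them."""
    ranked = sorted(fragments, key=lambda f: f[1], reverse=True)
    importance: dict[str, float] = {}
    for entity in candidates:
        importance[entity] = sum(
            sim * decay ** rank           # assumed contribution formula
            for rank, (text, sim) in enumerate(ranked)
            if entity in text             # assumed containment test
        )
    return sorted(importance, key=importance.get, reverse=True)[:top_k]
```

Under this assumed formula, an entity mentioned in several highly ranked fragments outranks one mentioned once in a low-ranked fragment.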

9. The knowledge graph retrieval method according to claim 1, characterized in that, before determining the starting entity set based on the natural language question contained in the current query request, the method further includes: invoking a large language model in a single call to perform multiple analysis tasks simultaneously on the natural language question contained in the current query request, the tasks including query type classification, keyword extraction, sub-question decomposition, and key entity identification; wherein query type classification is used to obtain a query complexity type, the query complexity types including a simple query type, a multi-hop query type, and an open reasoning query type; keyword extraction is used to obtain a list of search keywords; sub-question decomposition is used to obtain a sub-question decomposition structure, which includes source entities, target entities, and relation hints; and key entity identification is used to obtain a list of key entities, the key entities in the list being the names of entities identified from the natural language question that have corresponding nodes in the knowledge graph; and determining, based on the query complexity type of the natural language question, the maximum number of iteration rounds corresponding to that query complexity type; wherein, when the query complexity type is the simple query type, the maximum number of iteration rounds is 1; and when the query complexity type is the multi-hop query type or the open reasoning query type, the maximum number of iteration rounds is greater than 1.
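One way to hold the output of the single analysis call and derive the round limit is sketched below. The JSON field names, the class names, and the limits 3 and 4 are assumptions for illustration; the claim only fixes the limit at 1 for simple queries and at more than 1 for the other two types.

```python
import json
from dataclasses import dataclass
from enum import Enum


class QueryType(Enum):
    SIMPLE = "simple"
    MULTI_HOP = "multi_hop"
    OPEN_REASONING = "open_reasoning"


# Only "1 for simple, greater than 1 otherwise" comes from the claim;
# the values 3 and 4 are assumed.
MAX_ROUNDS = {
    QueryType.SIMPLE: 1,
    QueryType.MULTI_HOP: 3,
    QueryType.OPEN_REASONING: 4,
}


@dataclass
class QuestionAnalysis:
    query_type: QueryType
    keywords: list[str]
    sub_questions: list[dict]  # each with source entity, target entity, relation hint
    key_entities: list[str]

    @property
    def max_rounds(self) -> int:
        return MAX_ROUNDS[self.query_type]


def parse_analysis(raw: str) -> QuestionAnalysis:
    """Parse the model's JSON reply (assumed format) covering all four tasks."""
    data = json.loads(raw)
    return QuestionAnalysis(
        query_type=QueryType(data["query_type"]),
        keywords=list(data.get("keywords", [])),
        sub_questions=list(data.get("sub_questions", [])),
        key_entities=list(data.get("key_entities", [])),
    )
```

Bundling the four tasks into one reply is what saves the extra model round trips; the round limit then caps how many later calls a simple question can trigger.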

10. The knowledge graph retrieval method according to claim 9, characterized in that, the step of determining the starting entity set based on the natural language question contained in the current query request includes: performing keyword retrieval and vector semantic retrieval based on the natural language question contained in the current query request, to obtain a set of seed document fragments and the similarity score of each seed document fragment in the set; arranging the seed document fragments in descending order of similarity score, and calculating the score difference between the seed document fragment ranked first and the seed document fragment ranked Nth, where N is a preset integer greater than 1; and, if the score difference is less than a preset confidence threshold, or the similarity score of the seed document fragment ranked first is less than a preset minimum score threshold, extracting related entities from the knowledge graph triples associated with the set of seed document fragments, and merging the related entities with the entities in the key entity list and removing duplicates, to form the starting entity set. The method further includes: if the score difference is equal to or greater than the preset confidence threshold, and the similarity score of the seed document fragment ranked first is equal to or greater than the preset minimum score threshold, directly using the set of seed document fragments as the retrieval result for the natural language question.
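The confidence gate of claim 10 reduces to a small predicate, sketched below. The function name and parameter names are illustrative. The claim does not say what happens when fewer than N seed fragments are returned, so treating that case as low confidence is an assumption of this sketch.

```python
from typing import Sequence


def needs_graph_expansion(
    similarity_scores: Sequence[float],
    n: int,
    confidence_threshold: float,
    min_score_threshold: float,
) -> bool:
    """Return True when seed retrieval is not confident enough to answer directly."""
    if n <= 1:
        raise ValueError("N must be an integer greater than 1")

    ranked = sorted(similarity_scores, reverse=True)
    if len(ranked) < n:
        # Assumption: too few seed fragments counts as low confidence.
        return True

    score_gap = ranked[0] - ranked[n - 1]  # first-ranked minus Nth-ranked
    return score_gap < confidence_threshold or ranked[0] < min_score_threshold
```

When the predicate returns False, the seed fragments are returned directly and the multi-round graph retrieval is skipped entirely.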

11. The knowledge graph retrieval method according to claim 9, characterized in that, the method further includes: for each document fragment in the retrieved information, calculating a knowledge graph relevance score for that document fragment, wherein the knowledge graph relevance score is obtained by weighted summation of an entity coverage score, a relational relevance score, and a proximity decay coefficient; the entity coverage score is the ratio of the number of extended entities contained in the document fragment to the preset total number of extended entities; the relational relevance score is calculated by logarithmically scaling the number of hits of candidate relation data associated with the document fragment; and the proximity decay coefficient is determined based on the iteration round from which the document fragment originates, wherein the earlier the iteration round, the larger the proximity decay coefficient; determining a query type factor based on the query complexity type corresponding to the natural language question, and calculating a coverage factor from the entity coverage score using a smoothing function; multiplying the preset basic knowledge graph weight, the query type factor, and the coverage factor together to obtain a dynamic knowledge graph weight; for each document fragment in the retrieved information, multiplying the similarity score of the document fragment by a target difference to obtain a first product, multiplying the knowledge graph relevance score by the dynamic knowledge graph weight to obtain a second product, and using the sum of the first product and the second product as the final score of the document fragment, wherein the target difference is 1 minus the dynamic knowledge graph weight; sorting the document fragments in descending order of final score to obtain a sorted list of document fragments; and outputting the sorted list of document fragments as the target retrieval result for the natural language question.
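The scoring in claim 11 can be illustrated with the sketch below. Only the structure comes from the claim: a weighted sum for the knowledge graph relevance score, a product for the dynamic weight, and the blend `similarity * (1 - weight) + relevance * weight`. The component weights, the `log1p` scaling, the power-law decay, the smoothing function, the clamping to [0, 1], and all names are assumptions.

```python
import math


def kg_relevance(coverage: float, hits: int, round_no: int,
                 weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
                 decay: float = 0.8) -> float:
    """Weighted sum of entity coverage, log-scaled relation hits, and proximity decay."""
    relevance = math.log1p(hits)          # assumed form of logarithmic scaling
    proximity = decay ** (round_no - 1)   # earlier rounds get a larger coefficient
    return weights[0] * coverage + weights[1] * relevance + weights[2] * proximity


def dynamic_weight(base: float, type_factor: float, coverage: float,
                   smoothing: float = 0.5) -> float:
    """Base weight times query type factor times a smoothed coverage factor."""
    coverage_factor = coverage / (coverage + smoothing)  # assumed smoothing function
    weight = base * type_factor * coverage_factor
    return min(max(weight, 0.0), 1.0)     # assumed clamp so the blend stays valid


def final_score(similarity: float, kg_score: float, kg_weight: float) -> float:
    """Blend: similarity * (1 - weight) + knowledge graph relevance * weight."""
    return similarity * (1.0 - kg_weight) + kg_score * kg_weight


def rank_fragments(fragments: list[dict], kg_weight: float) -> list[dict]:
    """Sort fragments (dicts with 'similarity' and 'kg_score') by final score, descending."""
    return sorted(
        fragments,
        key=lambda f: final_score(f["similarity"], f["kg_score"], kg_weight),
        reverse=True,
    )
```

With a dynamic weight of 0 the ranking falls back to pure similarity, which is the behavior this smoothing function gives for fragments with no entity coverage.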

12. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the knowledge graph retrieval method according to any one of claims 1 to 11.

13. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the knowledge graph retrieval method according to any one of claims 1 to 11.

14. A computer program product comprising a computer program, characterized in that, when executed by a processor, the computer program implements the knowledge graph retrieval method according to any one of claims 1 to 11.