An agent-based adaptive retrieval augmentation generation method and system

The retrieval enhancement generation method, which adopts agent-based adaptive decision-making and multi-retriever collaborative scheduling, addresses the fixed retrieval strategies and coarse feedback of existing technology. It dynamically adjusts the retrieval strategy for different questions, reduces the risk of hallucination, and improves the accuracy and efficiency of answers.

CN122240751A · Pending · Publication Date: 2026-06-19 · 启元实验室 (Qiyuan Laboratory)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
启元实验室
Filing Date
2026-05-19
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing retrieval enhancement generation technologies struggle to adapt their retrieval strategies to questions of varying difficulty and type. This leads to over-retrieval, under-retrieval, coarse feedback, and insufficient collaboration among multiple retrievers, which increases the risk of hallucination and makes it difficult to balance accuracy against cost and latency.

Method used

An agent-based adaptive retrieval enhancement generation method is adopted, which dynamically adjusts the retrieval type and Top-K parameter through adaptive agent decision-making. Combined with discriminative feedback and state updates, it achieves controllable re-retrieval and collaborative scheduling of multiple retrievers. An Actor submodule and a Critic submodule are used for policy optimization, and the agent is trained with reinforcement learning.

Benefits of technology

It effectively reduces over-retrieval and under-retrieval, lowers the risk of hallucination, improves the accuracy and verifiability of answers, and achieves a better balance among accuracy, latency, and cost.

Generated by Eureka AI based on patent content.

Figure CN122240751A_ABST

Abstract

An agent-based adaptive retrieval enhancement generation method and system are disclosed. The method includes: constructing an initial state based on a user question; having an agent select a retrieval action from a preset retrieval strategy space according to the current state, wherein the retrieval action includes at least a retrieval type and a recall quantity; invoking the corresponding retriever according to the retrieval action to obtain a candidate document set; reordering the candidate document set to obtain an ordered document subset; evaluating the matching score between the ordered document subset and the user question, and determining whether to trigger re-retrieval or proceed to generation based on a comparison of the matching score with a preset threshold and on whether the current round has reached a preset round limit; if re-retrieval is triggered, updating the state based on the retrieval results of the current round and returning to the agent-selection step for the next round of retrieval; and if generation is entered, constructing a prompt from the user question and the evidence set and generating an answer with a large language model.

Description

Technical Field

[0001] This invention relates to the fields of natural language processing and information retrieval, and in particular to an adaptive retrieval enhancement generation method and system based on intelligent agents.

Background Technology

[0002] In recent years, large language models have made significant progress in natural language understanding and generation and have been widely applied in scenarios such as intelligent question answering, knowledge retrieval, and dialogue assistants. However, in tasks such as open-domain knowledge question answering, multi-hop reasoning, and time-sensitive knowledge updates, large language models still commonly suffer from hallucination: when sufficient supporting evidence is lacking, the model generates content that seems plausible but is not factual. To reduce hallucination and improve the verifiability of answers, retrieval-enhanced generation techniques have been proposed: an external knowledge base is introduced before the answer is generated, and a retrieval engine recalls relevant document fragments as context from which the generator produces the answer. This approach partly alleviates the dependence of purely generative models on parametric memory and has become an important engineering solution for knowledge-intensive tasks.

[0003] In existing implementations, typical retrieval-enhanced generation systems generally include a sparse retrieval unit (such as BM25, based on term statistics), a dense retrieval unit (such as semantic matching with a dual encoder), a re-ranking model, and a final generative model. Sparse retrieval is fast and performs well at keyword matching; dense retrieval achieves higher recall quality for complex queries involving semantic generalization or multi-hop reasoning, but at higher computational cost and latency. To balance efficiency and effectiveness, some systems adopt fixed retrieval strategies, such as using dense retrieval for all queries with a fixed recall quantity; other systems fuse sparse and dense recall or use a cascaded structure. In addition, to address insufficient initial retrieval, existing technologies have introduced secondary retrieval mechanisms, such as training a classifier to determine "whether further retrieval is needed", using hand-crafted rules based on question length, the number of reasoning clues, and similar cues to decide whether to perform multiple rounds of retrieval, or adopting a self-reflective approach in which the model judges whether the evidence is sufficient.

[0004] However, these existing technologies still have several shortcomings in real-world business environments, mainly in the following aspects. First, retrieval strategies are relatively fixed and cannot adapt to dynamic differences in question difficulty and type. Many systems fix parameters such as retrieval type and recall quantity when they go live. For simple factual questions, dense retrieval with a large recall quantity leads to over-retrieval: it introduces much redundant context, increases the generation burden, and raises latency and cost. For multi-hop questions or questions containing interfering information, sparse retrieval or a small recall quantity easily leads to under-retrieval, so the generator outputs answers without sufficient evidence and hallucinates. Existing technologies lack a unified decision-making mechanism that adapts to both the query content and the quality of the retrieved evidence.

[0005] Second, feedback and re-retrieval mechanisms are crude and prone to excessive looping or missed triggers. Some solutions use offline-trained re-retrieval discriminators to decide whether to continue retrieving, but these discriminators usually depend on a specific data distribution and tend to fail when migrated to new domains or document repositories; rule-based difficulty determination also generalizes poorly. More importantly, many solutions treat "whether to re-retrieve" as a one-time decision and do not measure or exploit changes in evidence quality between successive retrieval rounds. As a result, it is difficult to adjust the strategy promptly based on new evidence during multi-round retrieval, which can lead to repeated retrieval over invalid evidence or to retrieval terminating before key bridging entities (key information) are obtained.

[0006] Third, the lack of collaboration among multiple retrievers makes it difficult to schedule sparse and dense retrieval effectively. Although some systems run multiple retrievers in parallel or integrate them, the common practice is static fusion (e.g., concatenating results with fixed weights) or pre-assigning a retriever type to each question type. Such approaches cannot make fine-grained selections based on the current question, the quality of the current retrieval results, and the current round. For example, for some questions the first round of sparse retrieval may find the key entities, but subsequent rounds may need to switch to dense retrieval to supplement semantically associated evidence; conversely, a first round of dense retrieval may find semantic neighbors but lack keyword evidence, which a subsequent sparse retrieval must reinforce. Existing technologies generally lack this kind of dynamic collaboration.

[0007] Fourth, existing system optimization often relies on retraining or complex multi-module, multi-agent structures, resulting in high engineering and inference costs. Some improvement schemes tend to jointly optimize the retrieval unit, reorderer, and generator, or introduce a hierarchical multi-agent framework to plan retrieval and inference paths. While these may offer improvements, they typically lead to problems such as high training resource consumption, high module coupling, and increased latency due to longer inference paths, making them unsuitable for resource-constrained applications or those requiring low-latency responses.

[0008] In summary, retrieval enhancement generation technology urgently needs a technical solution that addresses the aforementioned problems of over-retrieval, under-retrieval, coarse feedback, and insufficient collaboration, thereby significantly reducing the risk of hallucination in multi-hop question answering, knowledge environments containing interference, and large-scale online services, and achieving a better balance among accuracy, cost, and latency.

Summary of the Invention

[0009] To address the shortcomings of existing technologies, the present invention aims to provide an agent-based adaptive retrieval enhancement generation method and system that adaptively adjust retrieval strategies, coordinate multiple retrievers, and achieve controllable re-retrieval.

[0010] To achieve the above objectives, the present invention provides an agent-based adaptive retrieval enhancement generation method, comprising the following steps: an initial state is constructed based on the user's question; the agent selects a retrieval action from a preset retrieval strategy space based on the current state, the retrieval action including at least a retrieval type and a recall quantity; the corresponding retriever is invoked to perform retrieval, yielding a candidate document set; the candidate document set is reordered to obtain an ordered document subset; the matching score between the ordered document subset and the user's question is evaluated, and whether to trigger re-retrieval or proceed to generation is determined from the comparison of the matching score with a preset threshold and from whether the current round has reached a preset round limit; if re-retrieval is triggered, the state is updated based on the retrieval results of the current round and the method returns to the agent-selection step for the next round of retrieval; if generation is entered, a prompt is constructed from the user's question and the evidence set, and an answer is generated by a large language model.
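The control flow of [0010] can be sketched as follows. This is a minimal Python illustration, not the claimed implementation: the retriever, reranker, discriminator, and generator are passed in as plain callables, and the names `adaptive_rag` and `State`, the threshold 0.7, and the limit of 3 rounds are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class State:
    question: str
    round: int = 0                                    # rounds completed so far
    memory: List[str] = field(default_factory=list)   # supplementary evidence carried forward

def adaptive_rag(
    question: str,
    select_action: Callable[[State], Tuple[str, int]],   # agent: state -> (retriever type, K)
    retrieve: Callable[[str, str, int], List[str]],      # (type, query, K) -> candidate docs
    rerank: Callable[[str, List[str]], List[str]],       # (query, docs) -> ordered subset
    score: Callable[[str, List[str]], float],            # discriminator: matching score
    generate: Callable[[str, List[str]], str],           # LLM: (question, evidence) -> answer
    threshold: float = 0.7,
    max_rounds: int = 3,
) -> str:
    state = State(question=question.strip())
    evidence: List[str] = []
    while True:
        state.round += 1
        retriever_type, k = select_action(state)
        candidates = retrieve(retriever_type, state.question, k)
        ordered = rerank(state.question, candidates)
        # Merge this round's ordered subset into the running evidence set without duplicates.
        for doc in ordered:
            if doc not in evidence:
                evidence.append(doc)
        m = score(state.question, ordered)
        # Generate when evidence is sufficient or the round limit is reached.
        if m >= threshold or state.round >= max_rounds:
            return generate(question, evidence)
        # Otherwise re-retrieve: carry the top fragment forward (simplified state update).
        if ordered:
            state.memory.append(ordered[0])
```

Passing the components as callables keeps the loop independent of any particular retriever or model, which mirrors how the method lets the agent swap retriever types between rounds.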

[0011] Furthermore, the retriever type includes a sparse retriever, a dense retriever, or a combination of both.

[0012] Furthermore, the initial state includes at least the normalized question, a round count, a summary of historical retrieval content, and retrieval budget constraints.

[0013] Furthermore, the intelligent agent includes an Actor submodule and a Critic submodule. The Actor submodule outputs a policy distribution based on the current state and samples a retrieval action from it to determine the retrieval strategy for the current round. The Critic submodule evaluates the value of the current state and, during the training phase, computes the advantage function from this value and the reward obtained by the retrieval action, which guides the update of the Actor submodule's policy parameters.

[0014] Furthermore, the agent is trained using reinforcement learning, and the parameters of the Actor submodule and the Critic submodule are updated by constructing a multidimensional reward function and using a proximal policy optimization algorithm.

[0015] Furthermore, the multidimensional reward function includes: a strategy reward, which encourages the use of lower-cost retrieval types and smaller recall quantities; a discrimination reward, which is equal to the matching score; and a re-retrieval advantage reward and a feedback differential reward, which measure whether the current retrieval strategy improves on the previous one.
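The paragraph names the reward components but not their formulas. The sketch below assumes one possible weighted combination; the cost table `TYPE_COST`, the weights, and the exact forms of the advantage term (the sign of the improvement) and the differential term (the size of the improvement) are all hypothetical.

```python
from typing import Optional

# Hypothetical relative costs per retriever type (an assumption, not from the source).
TYPE_COST = {"sparse": 0.2, "dense": 0.6, "joint": 1.0}

def strategy_reward(retriever_type: str, k: int, k_max: int = 8) -> float:
    """Less negative for cheaper retriever types and smaller K."""
    return -(TYPE_COST[retriever_type] + k / k_max) / 2

def total_reward(retriever_type: str, k: int, match_score: float,
                 prev_match_score: Optional[float] = None,
                 w_strategy: float = 0.2, w_disc: float = 1.0,
                 w_adv: float = 0.1, w_diff: float = 0.5) -> float:
    r = w_strategy * strategy_reward(retriever_type, k)
    r += w_disc * match_score                      # discrimination reward equals the matching score
    if prev_match_score is not None:               # only defined from the second round on
        delta = match_score - prev_match_score
        r += w_adv * (1.0 if delta > 0 else -1.0)  # re-retrieval advantage: did this round help?
        r += w_diff * delta                        # feedback differential: by how much?
    return r
```

Keeping the terms separate with explicit weights makes it easy to tune the accuracy/cost trade-off the source describes without changing the reward structure.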

[0016] Furthermore, the step of updating the state based on the retrieval results of the current round further includes: extracting high-confidence sentences, key entities, bridging entities, or summaries from the current ordered document subset to form supplementary information, and adding it to an extended question or to the state memory.

[0017] To achieve the above objectives, the present invention also provides an agent-based adaptive retrieval enhancement generation system for implementing the agent-based adaptive retrieval enhancement generation method described above. The system includes: an input processing module for receiving user questions and constructing an initial state; a policy space configuration module for constructing a retrieval policy space; an agent decision-making module for selecting a retrieval action from the policy space based on the current state; a collaborative retrieval module for calling the corresponding retriever to perform retrieval according to the retrieval action, thereby obtaining a candidate document set; a reordering module for reordering the candidate document set to obtain an ordered document subset; a discrimination and re-retrieval control module for evaluating the matching score and determining whether to trigger re-retrieval or proceed to generation; and a prompt construction and generation module for constructing a prompt from the user question and the evidence set and generating an answer through a large language model.

[0018] To achieve the above objectives, the present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor is configured to execute the computer program stored in the memory to implement the agent-based adaptive retrieval enhancement generation method as described above.

[0019] To achieve the above objectives, the present invention also provides a computer-readable storage medium storing a computer program, which is loaded and executed by a processor to implement the agent-based adaptive retrieval enhancement generation method as described above.

[0020] The agent-based adaptive retrieval enhancement generation method provided by this invention dynamically adjusts the retrieval type and Top-K parameter through adaptive agent decision-making, achieves controllable re-retrieval by combining discriminative feedback with state updates, and coordinates multiple retrievers, thereby effectively reducing over-retrieval and under-retrieval and achieving a better balance among accuracy, latency, and cost.

[0021] Other features and advantages of the invention will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the invention.

Attached Figure Description

[0022] The accompanying drawings are provided to further illustrate the invention and form part of the specification. Together with the embodiments of the invention, they serve to explain the invention and do not limit it. In the drawings:
Figure 1 is a flowchart of an agent-based adaptive retrieval enhancement generation method according to an embodiment of the present invention;
Figure 2 is a flowchart of the training process according to an embodiment of this application;
Figure 3 is a schematic structural diagram of an agent-based adaptive retrieval enhancement generation system according to an embodiment of this application;
Figure 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Implementation

[0023] The preferred embodiments of the present invention will be described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein are for illustration and explanation only and are not intended to limit the present invention.

[0024] Embodiments of the present invention will now be described in more detail with reference to the accompanying drawings. While some embodiments of the invention are shown in the drawings, it should be understood that the invention can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that the invention will be understood more thoroughly and completely. It should be understood that the accompanying drawings and embodiments are for illustrative purposes only and are not intended to limit the scope of protection of the invention.

[0025] The term "comprising" and its variations as used in this invention are open-ended, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments".

[0026] It should be noted that the concepts of "first" and "second" may be mentioned in this invention only to distinguish different devices, components or parts, and are not used to limit the order of the functions performed by these devices, components or parts or their interdependence.

[0027] It should be noted that the terms "one" and "multiple" used in this invention are illustrative rather than restrictive. Those skilled in the art should understand that, unless explicitly stated otherwise in the context, they should be understood as "one or more". "Multiple" should be understood as two or more.

[0028] In this invention: Top-K is a retrieval parameter that represents the number of most relevant document fragments returned to the user in a single retrieval operation, where K represents the number of returned entries.

[0029] An intelligent agent refers to a lightweight decision-making module used to select the optimal retrieval action (including retrieval type and Top-K parameters) from a preset retrieval strategy space based on the current state (including question characteristics, historical search results and round information).

[0030] Definitions for other terms will be provided in the following description.

[0031] Example 1. Figure 1 is a flowchart of the agent-based adaptive retrieval enhancement generation method according to an embodiment of the present invention. This embodiment is described in further detail below with reference to Figure 1.

[0032] Step 101: Obtain the question and construct the initial state. In this embodiment of the invention, an input processing module receives the user question Q and parses and processes it to obtain a normalized question and its features, from which it constructs the initial state. The initial state includes at least a round count set to its initial value, an empty historical retrieval summary, and default retrieval budget constraints (such as the maximum context length and the maximum number of retrieval rounds T).

[0033] Preferably, after receiving a user's question, the input processing module first parses and normalizes it (including word segmentation or subword encoding, removal of invalid symbols, language detection, spelling normalization, and length pruning), then extracts features such as question length, domain terms, entities, time constraints, and expected output format, and generates an initial query. If ambiguity or missing constraints are detected, a clarifying sub-query may be generated, but user interaction is not forced; multi-path retrieval is preferred to ensure coverage.
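A minimal sketch of Step 101 follows, assuming Python's standard `unicodedata` and `re` modules for normalization and a few toy heuristics for feature extraction. Real systems would use a tokenizer, an entity recognizer, and a language detector; the field names, the budget defaults, and the choice of 0 as the initial round count are assumptions.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class InitialState:
    question: str
    features: Dict[str, object]
    round: int = 0                 # assumed initial round count
    history_summary: str = ""      # empty before the first retrieval
    budget: Dict[str, int] = field(
        default_factory=lambda: {"max_context_tokens": 4096, "max_rounds": 3})  # hypothetical defaults

def normalize_question(q: str, max_chars: int = 512) -> str:
    q = unicodedata.normalize("NFKC", q)        # unify full-width and compatibility characters
    q = re.sub(r"[\x00-\x1f\x7f]", " ", q)      # replace control characters with spaces
    q = re.sub(r"\s+", " ", q).strip()          # collapse runs of whitespace
    return q[:max_chars]                        # length pruning

def extract_features(q: str) -> Dict[str, object]:
    tokens: List[str] = q.split()
    return {
        "length": len(tokens),
        "has_year": bool(re.search(r"\b(19|20)\d{2}\b", q)),            # crude time-constraint cue
        "capitalized_terms": [t for t in tokens[1:] if t[:1].isupper()],  # crude entity cue
    }

def build_initial_state(q: str) -> InitialState:
    nq = normalize_question(q)
    return InitialState(question=nq, features=extract_features(nq))
```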

[0034] Step 102: Construct or read the retrieval strategy space. In embodiments of the present invention, a retrieval strategy space (Act) is constructed for the agent to choose from. It includes at least the following parameter dimensions: 1) the retrieval type (sparse retrieval, dense retrieval, or a combination of both); 2) the Top-K recall quantity K (e.g., K∈{2,4,6,8}); 3) the round limit T (e.g., T=3 or 5). Other optional parameters may also be included, such as BM25 parameters, a vector similarity threshold, filtering rules, and whether re-ranking is enabled.

[0035] For example, define \(\mathrm{Act} = \{a_1, a_2, \dots, a_N\}\), where N is the total number of retrieval actions and each retrieval action \(a_i\) includes at least a retrieval type selection (sparse, dense, or joint) and a Top-K value K; optional fields include a reordering switch, a filtering threshold, and so on. Act is written into a configuration accessible to the agent.

[0036] It should be noted that the retrieval strategy space does not need to be built in real time for each question-answering session. In a deployed system, the strategy space configuration module can pre-build the space and persist it in a system configuration file, database, or distributed cache. When the system receives a user question, the module first checks whether a pre-built strategy space exists: if so, it reads and loads it directly, skipping repeated construction; if not, it generates the strategy space dynamically from preset default parameters (including retrieval types, the Top-K candidate set, the round limit, etc.). Reading includes parsing the storage format (such as JSON, YAML, or serialized objects), verifying the integrity of the strategy space, and converting it into an enumerated action set that the agent can execute. This mechanism reduces computational overhead during online inference and supports hot updates and reuse of the strategy space across scenarios.
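The enumeration of Act and the load-or-build behavior in [0036] might look like the sketch below, assuming a JSON file as the persistent store. `RetrievalAction`, `build_space`, `validate`, and `load_or_build` are hypothetical names; the round limit T and the other optional parameters from [0034] would be stored alongside the action list but are omitted here for brevity.

```python
import json
from dataclasses import asdict, dataclass
from itertools import product
from pathlib import Path
from typing import List

RETRIEVER_TYPES = ("sparse", "dense", "joint")
DEFAULT_K_VALUES = (2, 4, 6, 8)

@dataclass(frozen=True)            # frozen -> hashable, so duplicates can be detected with a set
class RetrievalAction:
    retriever_type: str
    k: int
    rerank: bool = True

def build_space(types=RETRIEVER_TYPES, k_values=DEFAULT_K_VALUES) -> List[RetrievalAction]:
    # Cartesian product of retriever types and Top-K values: N = 3 * 4 = 12 actions by default.
    return [RetrievalAction(t, k) for t, k in product(types, k_values)]

def validate(space: List[RetrievalAction]) -> None:
    """Integrity check before the agent is allowed to use the space."""
    if not space:
        raise ValueError("empty strategy space")
    for a in space:
        if a.retriever_type not in RETRIEVER_TYPES or a.k <= 0:
            raise ValueError(f"invalid action: {a}")
    if len(set(space)) != len(space):
        raise ValueError("duplicate actions")

def load_or_build(path: Path) -> List[RetrievalAction]:
    if path.exists():              # pre-built space: read and parse it
        space = [RetrievalAction(**d) for d in json.loads(path.read_text())]
    else:                          # otherwise build from defaults and persist for reuse
        space = build_space()
        path.write_text(json.dumps([asdict(a) for a in space]))
    validate(space)
    return space
```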

[0037] Step 103: The agent selects the retrieval strategy for this round. In embodiments of the present invention, the retrieval strategy is determined from the question features and the current state (such as current load, budget, and index freshness). Specifically, the agent inputs the current state \(s_t\) into its Actor submodule to obtain the policy distribution \(\pi_\theta(a \mid s_t)\), then outputs and executes an action \(a_t \sim \pi_\theta(\cdot \mid s_t)\), which determines the retrieval strategy for this round, where \(a_t\) includes parameters such as the retriever type and K.

[0038] Preferably, the Actor submodule includes a state encoding and feature extraction unit, a policy network unit, and an action sampling and execution unit. The input to the state encoding and feature extraction unit is the current-round state \(s_t\) (including the normalized question, the retrieval history of previous rounds, the current round count, and so on); the unit relies on a base large language model, used as a feature extractor, to map the discrete textual and numerical state into a high-dimensional, continuous, dense feature vector that serves as the input to the policy network unit. The policy network unit contains a multilayer perceptron or linear mapping layers; it receives this feature vector, computes a probability score for each candidate retrieval action in the predefined retrieval strategy space, and outputs a normalized probability distribution through a Softmax activation function. The action sampling and execution unit samples an action from the probability distribution output by the policy network unit (probability-based sampling during training to preserve exploration, while the highest-probability action is typically selected during inference) and interprets the sampled action \(a_t\) as a specific retrieval strategy. The retrieval strategy has two dimensions: the type of retrieval tool (e.g., Sparse / BM25 or Dense / DeBERTaV3) and the specific Top-K value (e.g., K=4).
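The policy network and action sampling units can be illustrated with a single linear layer and Softmax over an enumerated action space. This sketch assumes the state has already been encoded into a feature vector (the LLM-based encoder is not shown); the weights and the eight-action space are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical action space: 2 retriever types x 4 Top-K values.
ACTIONS: List[Tuple[str, int]] = [(t, k) for t in ("sparse", "dense") for k in (2, 4, 6, 8)]

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def policy_distribution(features: Sequence[float], weights: Sequence[Sequence[float]],
                        bias: Sequence[float]) -> List[float]:
    # One linear layer (logit_i = w_i . x + b_i) followed by Softmax over all actions.
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]
    return softmax(logits)

def select_action(probs: Sequence[float], training: bool, rng: random.Random) -> Tuple[str, int]:
    if training:
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]   # sample: exploration
    else:
        idx = max(range(len(probs)), key=lambda i: probs[i])           # greedy: inference
    return ACTIONS[idx]
```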

[0039] To improve the accuracy of the agent's retrieval decisions, the Actor submodule further includes a Proximal Policy Optimization (PPO) unit, which uses the PPO algorithm to update the Actor network parameters. The calculation logic is as follows: the unit receives the advantage estimate \(\hat{A}_t\) from the Critic submodule; computes the probability ratio between the new and old policies, \(r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)\); computes the Actor loss with the clipping (truncation) function, \(L^{\mathrm{CLIP}}(\theta) = -\mathbb{E}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right]\), where \(\epsilon\) is the clipping range; and updates the core parameters of the policy network unit by backpropagation.
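The PPO update in [0039] relies on the clipped surrogate loss. The sketch below computes that loss value, plus a one-step temporal-difference advantage as one possible Critic-side estimate; the source does not specify how the advantage is computed (generalized advantage estimation is a common alternative). A real implementation would compute these with an automatic differentiation framework so that the loss can be backpropagated.

```python
import math
from typing import List, Sequence

def one_step_advantages(rewards: Sequence[float], values: Sequence[float],
                        next_values: Sequence[float], dones: Sequence[float],
                        gamma: float = 0.99) -> List[float]:
    # A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    return [r + gamma * nv * (1.0 - d) - v
            for r, v, nv, d in zip(rewards, values, next_values, dones)]

def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], eps: float = 0.2) -> float:
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)    # clip(ratio, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)           # pessimistic (lower) surrogate
    return -total / len(advantages)                        # negate: minimizing loss maximizes objective
```

Taking the minimum of the clipped and unclipped terms means a large policy change is never rewarded beyond the clipping range, which is what keeps each update "proximal".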

[0040] In embodiments of the present invention, the action \(a_t\) specifically contains the following two core decision parameters. Dimension 1 is the retrieval type identifier, a discrete categorical value (usually encoded as 0 or 1 in the underlying code) that determines which retrieval model is used in the current round. Dimension 2 is the document recall quantity parameter, a positive integer (e.g., K=6) that determines how many of the most relevant documents the current retriever returns downstream.

[0041] Step 104: Perform retrieval and obtain the candidate document set. In an embodiment of the present invention, the retriever corresponding to \(a_t\) is invoked to perform retrieval and obtain the candidate document set \(C_t\), where: when sparse retrieval is selected, the Top-K documents are recalled by keyword relevance using an inverted index (e.g., BM25); when dense retrieval is selected, the question is encoded into a vector and a nearest-neighbor search over the vector index recalls the Top-K documents; and during joint retrieval, the results of the sparse and dense paths are merged, deduplicated, and truncated, optionally using weighted fusion of normalized scores, to obtain the final candidate document set.
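Weighted fusion of normalized scores for joint retrieval could be implemented as follows, assuming min-max normalization per retriever and a mixing weight `alpha`; both choices, and the function names, are illustrative assumptions rather than the patent's specified method. Taking the union of document IDs performs the deduplication, and slicing performs the truncation to K.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]   # (doc_id, raw score) as returned by one retriever

def min_max(results: Scored) -> Dict[str, float]:
    """Map one retriever's raw scores onto [0, 1] so the two paths are comparable."""
    if not results:
        return {}
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return {d: (s - lo) / span if span > 0 else 1.0 for d, s in results}

def fuse(sparse: Scored, dense: Scored, k: int, alpha: float = 0.5) -> List[str]:
    ns, nd = min_max(sparse), min_max(dense)
    fused: Dict[str, float] = {}
    for doc in set(ns) | set(nd):                 # union of both paths = deduplication
        fused[doc] = alpha * ns.get(doc, 0.0) + (1 - alpha) * nd.get(doc, 0.0)
    ranked = sorted(fused, key=lambda d: (-fused[d], d))   # tie-break by ID for determinism
    return ranked[:k]                             # truncate to Top-K
```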

[0042] Preferably, the sparse retrieval system retrieves candidate documents from the knowledge base based on term statistics (e.g., BM25); the dense retrieval system retrieves candidate documents based on encoder vector recall (e.g., nearest neighbor search after obtaining vectors through DeBERTaV3 encoding); and the joint retrieval scheduler decides to call one of the retrieval systems or proportionally fuse the results of the two retrieval systems based on the agent's output, and supports cross-retrieval deduplication and merging.
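For reference, BM25 scoring on the sparse path can be sketched in pure Python over pre-tokenized documents. This sketch uses the common defaults k1 = 1.5 and b = 0.75 and a Lucene-style IDF; the source only names BM25, so these parameter choices are assumptions.

```python
import math
from collections import Counter
from typing import List

class BM25:
    def __init__(self, docs: List[List[str]], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.N = len(docs)
        self.avgdl = sum(len(d) for d in docs) / self.N      # average document length
        self.tfs = [Counter(d) for d in docs]                # per-document term frequencies
        df = Counter(t for d in docs for t in set(d))        # document frequency per term
        # Lucene-style IDF, which stays non-negative even for very common terms.
        self.idf = {t: math.log(1 + (self.N - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    def score(self, query: List[str], i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            f = tf[t]
            # Saturating term frequency with document-length normalization.
            s += self.idf[t] * f * (self.k1 + 1) / (
                f + self.k1 * (1 - self.b + self.b * dl / self.avgdl))
        return s

    def top_k(self, query: List[str], k: int) -> List[int]:
        scored = [(self.score(query, i), i) for i in range(self.N)]
        scored.sort(key=lambda x: (-x[0], x[1]))             # highest score first, then by index
        return [i for _, i in scored[:k]]
```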

[0043] Step 105: Reorder the candidate document set. In an embodiment of the present invention, a reordering operation is performed on the candidate document set \(C_t\) to obtain the ordered document subset \(D_t\). Reordering can use a cross-encoder or another relevance assessment model to score and rank (question, candidate document fragment) pairs. Only the top K' (K' ≤ K) fragments need be retained for subsequent discrimination and evidence construction, which controls the context length and inference cost.
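The reordering and truncation to K' ≤ K can be sketched with a pluggable pair scorer. Here `overlap_score` is a hypothetical stand-in for a cross-encoder (it simply measures token overlap) so that the example runs without a model.

```python
from typing import Callable, List, Tuple

def rerank(question: str, candidates: List[str],
           score_pair: Callable[[str, str], float], k_prime: int) -> List[Tuple[str, float]]:
    scored = [(doc, score_pair(question, doc)) for doc in candidates]
    scored.sort(key=lambda x: x[1], reverse=True)   # stable sort: ties keep retrieval order
    return scored[:min(k_prime, len(scored))]       # keep only the top K' fragments

def overlap_score(question: str, doc: str) -> float:
    """Hypothetical cross-encoder stand-in: fraction of question tokens present in the doc."""
    q = set(question.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0
```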

[0044] Step 106: Determine whether the evidence is sufficient and generate a matching score. In an embodiment of the present invention, the matching score between the currently retrieved content and the question is calculated and compared with a preset threshold to determine whether re-retrieval is needed; if the current round has reached the round limit T, generation is forced.

[0045] This step specifically includes calculating a matching score \(m_t\), which measures the degree of matching, coverage, or answerability between the current ordered document subset \(D_t\) and the question \(\tilde{Q}\). It is calculated as \(m_t = f_{\mathrm{disc}}(\tilde{Q}, D_t)\), where the discriminant function \(f_{\mathrm{disc}}\) can jointly consider indicators such as relevance, coverage, information novelty, and conflict detection. The score \(m_t\) is compared with a preset threshold \(\tau\), and the decision also takes the round limit T into account: if \(m_t \ge \tau\), the evidence is deemed sufficient and the method proceeds to generation; if \(m_t < \tau\) and \(t < T\), the evidence is deemed insufficient and re-retrieval is triggered; if \(t = T\) (the limit is reached), the method enters generation even if \(m_t < \tau\), which guarantees an upper bound on latency and cost.
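The three-way decision rule translates directly into code. Because the comparison operators were lost from the source text, this sketch assumes that a score equal to the threshold counts as sufficient; the names `decide` and `Decision` are hypothetical.

```python
from enum import Enum

class Decision(Enum):
    GENERATE = "generate"
    RE_RETRIEVE = "re_retrieve"

def decide(match_score: float, round_idx: int, threshold: float, max_rounds: int) -> Decision:
    if match_score >= threshold:
        return Decision.GENERATE        # evidence judged sufficient
    if round_idx >= max_rounds:
        return Decision.GENERATE        # round limit reached: bound latency and cost
    return Decision.RE_RETRIEVE         # insufficient evidence and rounds remain
```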

[0046] In this embodiment, the normalized matching score \(m_t\) is computed by a separate lightweight discriminative model, which includes: an input concatenation layer, which concatenates the normalized question of the current round with the current reordered document subset \(D_t\) at the text level; a deep feature interaction layer, based on a lightweight Transformer architecture, which uses self-attention to perform deep word-level and semantic-level interaction between the question tokens (a token being the smallest semantic unit of text processed by the model, which may be a word, a subword, or a character) and the document tokens, and extracts a global feature vector representing how well the two match; and a normalized scoring head, which feeds the global feature vector into a multilayer perceptron or fully connected linear layer and finally maps the unbounded output into the interval [0, 1] with a Sigmoid activation function, yielding the final matching score. The training data for this model comes from the interaction trajectories and reward feedback produced while the system explores a question-answering dataset: the score \(m_t\) assigned by the discriminative model is passed to the agent as a reward. If the large language model ultimately generates the correct answer from the current documents, the overall reinforcement-learning reward increases, and gradient backpropagation optimizes the discriminative model so that it assigns higher scores the next time it encounters similarly sufficient context.

[0047] Step 107: Update the state for re-retrieval (if re-retrieval is triggered). In an embodiment of the present invention, when re-retrieval is triggered, the state is updated based on the retrieval results of the current round (such as the candidate document set or the ordered document subset) to form the state for the next round. Preferably, high-confidence sentences, key entities, bridging entities, or summaries are extracted from the ordered document subset \(D_t\) to form supplementary information, which is added to the extended question or the state memory; the round count is then incremented from t to t+1, and the method returns to step 103, where the agent reselects a retrieval action in the new state (the retrieval type and K value may change, achieving dynamic re-retrieval with a revised strategy).
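A minimal sketch of the state update follows, assuming the reordered subset arrives as (fragment, confidence) pairs and treating each high-confidence fragment as one unit of supplementary information. Sentence-level extraction, entity linking, and summarization are not shown; `RoundState`, `update_state`, `expanded_query`, and the thresholds are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class RoundState:
    question: str
    round: int
    memory: Tuple[str, ...] = ()    # supplementary information accumulated across rounds

def update_state(state: RoundState, ordered: List[Tuple[str, float]],
                 min_conf: float = 0.5, max_memory: int = 8) -> RoundState:
    # Keep only confident fragments that are not already remembered.
    new_facts = [doc for doc, conf in ordered if conf >= min_conf and doc not in state.memory]
    memory = (state.memory + tuple(new_facts))[-max_memory:]   # bound memory size (most recent kept)
    return replace(state, memory=memory, round=state.round + 1)

def expanded_query(state: RoundState) -> str:
    """Extended question for the next round: original question plus remembered evidence."""
    return " ".join((state.question,) + state.memory)
```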

[0048] Step 108: Construct the prompt and perform generation. In embodiments of the present invention, the question and the selected document fragments are organized into a generation prompt and its context, and a large language model is invoked to generate the answer. Specifically, when the evidence is deemed sufficient or the round limit is reached, the question \(\tilde{Q}\) and the evidence set D (which can be the final-round ordered subset or the evidence collection obtained by deduplicating and merging multiple rounds) are assembled into a prompt according to a preset prompt template (including evidence fragments, source identifiers, and deduplication, truncation, and formatting rules), and the prompt is input into a large language model to generate answer A. During generation, "citation and source-tracing output" can optionally be enabled to establish a correspondence between the answer and the evidence fragments.
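Prompt construction with source identifiers, deduplication, and truncation might look like the following. The template wording, the `[n]` citation convention, and the character budget are illustrative assumptions; production systems would typically budget in tokens rather than characters.

```python
from typing import List, Tuple

TEMPLATE = (
    "Answer the question using only the evidence below. "
    "Cite sources as [n].\n\n{evidence}\n\nQuestion: {question}\nAnswer:"
)

def build_prompt(question: str, evidence: List[Tuple[str, str]], max_chars: int = 2000) -> str:
    seen = set()
    lines: List[str] = []
    used = 0
    for source_id, text in evidence:
        key = text.strip()
        if not key or key in seen:
            continue                                  # drop empty and duplicate fragments
        line = f"[{len(lines) + 1}] ({source_id}) {key}"
        if used + len(line) > max_chars:
            break                                     # truncate to the context budget
        seen.add(key)
        lines.append(line)
        used += len(line)
    return TEMPLATE.format(evidence="\n".join(lines), question=question)
```

Numbering fragments as `[n]` with their source identifier gives the generator a simple handle for the optional citation and source-tracing output.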

[0049] In the embodiments of the present invention, information extraction and answer generation are achieved through context summarization and reasoning by a large language model driven by prompt engineering, without introducing a separate information extraction model. For example, the implementation proceeds as follows. Contextual prompt concatenation: when the discriminative model determines that the current re-ranked candidate document subset already provides sufficient evidence (i.e., its matching score with the question reaches the preset threshold), or the number of retrieval rounds has reached the preset limit, the system concatenates the current user question with the text of the candidate document subset to form an enhanced contextual prompt. Large language model reasoning and generation: the enhanced contextual prompt is input into the generator large language model (in this embodiment, a pre-trained model with strong reasoning ability). Implicit information extraction and logical synthesis: during generation, the generator uses the self-attention mechanism of its deep Transformer to automatically focus, within the long text, on supporting facts or bridging entities strongly relevant to the question; it then uses its natural language understanding to integrate and summarize these scattered pieces of knowledge along a logical chain, and finally outputs a fluent and accurate natural language answer A.

[0050] Step 109: Output the results and record logs. In this embodiment of the invention, the final output is the answer together with optional evidence references, and the action sequence, the K value and retriever selection of each round, the matching scores, the number of rounds, the time consumed, and the hit rate are recorded for offline evaluation, auditing, and subsequent training.

[0051] Embodiments of the present invention further include steps of agent training and multi-dimensional reward updating.

[0052] To enable the agent to learn to select an appropriate retrieval method and adaptively set K for questions of varying difficulty, this invention trains the agent by reinforcement learning, preferably with the Proximal Policy Optimization (PPO) algorithm. As shown in Figure 2, the training process includes the following steps. Step S1: Sample a training question and initialize the environment. A question is drawn from the training set, the round counter is initialized, and the initial environment state is constructed, in which the summary of historical retrieval results is empty.

[0053] Step S2: Policy execution and trajectory collection. Steps 103 to 107 described above are executed for several rounds to obtain the action sequence, the matching-score sequence, and information such as the retrieval time and K value of each round; together these form a trajectory.

[0054] Step S3: Construct multi-dimensional rewards and aggregate them into a total reward. For each step, a reward is constructed that includes at least the following. (1) Strategy reward $r^{pol}_i$: encourages lower-cost retrieval strategies and a smaller K for simpler questions. It can be implemented as a retriever-type reward $r^{type}_i$ (for example, 1 for sparse retrieval and 0 for dense retrieval) and a Top-K reward $r^{K}_i$ (designed to favor a smaller K), combined as $r^{pol}_i = \alpha\, r^{type}_i + \beta\, r^{K}_i$, where $\alpha$ and $\beta$ are combination weight coefficients.
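A sketch of the strategy reward follows, assuming a weighted linear combination of the two terms. The retriever-type values follow the example in the text (sparse 1, dense 0); the Top-K reward $1 - K/K_{max}$, the value of `k_max`, and the equal weights are hypothetical, since the concrete Top-K formula is not given.

```python
TYPE_REWARD = {"sparse": 1.0, "dense": 0.0}  # values from the example in the text


def topk_reward(k: int, k_max: int) -> float:
    """Assumed form: decreases linearly in K, reaching 0 at K = k_max."""
    if not 0 < k <= k_max:
        raise ValueError("k must be in (0, k_max]")
    return 1.0 - k / k_max


def policy_reward(retriever: str, k: int, k_max: int = 50,
                  alpha: float = 0.5, beta: float = 0.5) -> float:
    """Weighted combination of the retriever-type reward and the Top-K reward."""
    return alpha * TYPE_REWARD[retriever] + beta * topk_reward(k, k_max)
```

For example, sparse retrieval with K = 10 scores 0.5 * 1 + 0.5 * 0.8 = 0.9, while dense retrieval at the maximum K scores 0.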

[0055] (2) Discrimination reward $r^{disc}_i$: the relevance matching score $s_i$ output in step 106 is used as the immediate reward, i.e., $r^{disc}_i = s_i$. (3) Re-retrieval advantage reward (feedback differential reward) $r^{diff}_i$: measures whether the strategy of this round improves on that of the previous round, i.e., the degree of score improvement, $r^{diff}_i = s_i - s_{i-1}$. The total reward is $R_i = w_1 r^{pol}_i + w_2 r^{disc}_i + w_3 r^{diff}_i$, where $w_1$, $w_2$, and $w_3$ are weighting coefficients that can be set according to task requirements.
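The per-round total reward can be computed as in this sketch, assuming the differential term is the score difference between consecutive rounds and is zero in the first round (there is no previous score). The function name and argument layout are illustrative.

```python
from typing import List


def step_rewards(policy_rewards: List[float], scores: List[float],
                 w_pol: float, w_disc: float, w_diff: float) -> List[float]:
    """Total reward per round: w_pol*r_pol + w_disc*s_i + w_diff*(s_i - s_{i-1}).

    Assumption: the differential term of the first round is 0 (no previous round).
    """
    if len(policy_rewards) != len(scores):
        raise ValueError("need one policy reward and one matching score per round")
    totals = []
    prev = None
    for r_pol, s in zip(policy_rewards, scores):
        r_disc = s  # discrimination reward = matching score
        r_diff = 0.0 if prev is None else s - prev  # gain over the previous round
        totals.append(w_pol * r_pol + w_disc * r_disc + w_diff * r_diff)
        prev = s
    return totals
```

With weights (0.2, 0.5, 0.3), a cheap first round scoring 0.4 followed by a costlier second round scoring 0.8 yields totals of about 0.38 and 0.56: the score gain more than pays for the reduced strategy reward.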

[0056] In embodiments of the present invention, before actual training, $w_1$, $w_2$, and $w_3$ are fine-tuned on the validation set according to the requirements of the application scenario, using a hyperparameter optimization algorithm such as grid search or Bayesian optimization. To keep gradients stable during reinforcement learning training, the three weights are normalized (for example, so that $w_1 + w_2 + w_3 = 1$, or so that each is limited to a specified magnitude). Because only the relative sizes of the three weights are adjusted, the agent adapts to different business scenarios through the reward mechanism alone, without any change to the underlying network structure. For example: Scenario A: maximum response speed and low cost (significantly increasing $w_1$). When the strategy reward weight $w_1$ dominates, the agent becomes "stingy" in order to maximize reward: for the vast majority of queries it is very likely to choose the sparse retriever, whose computational cost is extremely low, and to output a very small Top-K value. This suits simple factual question-answering scenarios with high concurrency and low latency requirements (such as general customer service questions and weather queries).
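One way to realize the weight tuning and normalization is an exhaustive search over a grid on the simplex, sketched below. The `evaluate` callback stands in for a validation run of the agent and is hypothetical; Bayesian optimization would replace the grid loop with a surrogate-model search.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Weights = Tuple[float, float, float]  # (w1 strategy, w2 discrimination, w3 differential)


def normalize(w: Weights) -> Weights:
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(w)
    if total <= 0 or min(w) < 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return (w[0] / total, w[1] / total, w[2] / total)


def grid_search(evaluate: Callable[[Weights], float], steps: int = 4) -> Weights:
    """Try every (a, b, c) / steps with a + b + c = steps; keep the best validation score.

    `evaluate` is a hypothetical callback that would train or roll out the agent with
    the given weights on the validation set and return a metric (higher is better).
    """
    best: Optional[Weights] = None
    best_val = float("-inf")
    for a, b in product(range(steps + 1), repeat=2):
        c = steps - a - b
        if c < 0:
            continue
        w = (a / steps, b / steps, c / steps)  # already sums to 1
        val = evaluate(w)
        if val > best_val:
            best, best_val = w, val
    assert best is not None
    return best
```

Searching directly on the simplex means every candidate already satisfies the normalization constraint, so scenario presets such as "strategy-dominant" correspond simply to corners of the grid.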

[0057] Scenario B: maximum accuracy and quality (significantly increasing $w_2$). When the discrimination reward weight $w_2$ dominates, computational cost is effectively ignored and the focus is entirely on content relevance. In this scenario, the agent strongly prefers the dense retriever, which has deeper semantic understanding, and outputs larger Top-K values so that the downstream large model receives the most complete context. This suits specialized fields with extremely low tolerance for errors (such as medical diagnostic assistance and legal document review).

[0058] Scenario C: complex long logic chains and multi-hop reasoning (significantly increasing $w_3$). When the feedback differential reward weight $w_3$ dominates, the agent pays close attention to the quality improvement between adjacent retrieval rounds and thereby learns strong trial-and-error and self-correction behavior. For example, if the first round of sparse retrieval yields insufficient information, the second round can quickly switch to dense retrieval and increase K to fill the knowledge gap. This suits long-tail complex problem scenarios such as complex reasoning and multi-document information aggregation (e.g., in-depth question answering over academic papers or long financial reports).

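[0059] Step S4: Update the Actor and Critic submodules with PPO. Advantages are computed from the trajectory, and the parameters are updated with the PPO clipped objective. This includes computing the temporal-difference error $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ and the advantage $\hat{A}_t = \sum_{l \ge 0} (\gamma\lambda)^l \delta_{t+l}$; computing the probability ratio of the new and old policies $\rho_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{old}}(a_t \mid s_t)$; and minimizing the PPO loss $L(\theta) = -\mathbb{E}_t\left[\min\left(\rho_t(\theta)\hat{A}_t,\ \operatorname{clip}(\rho_t(\theta), 1-\epsilon, 1+\epsilon)\,\hat{A}_t\right)\right]$ together with a value loss to update the Actor and Critic. After the update is complete, training proceeds to the next batch until convergence.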
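The clipped PPO loss of step S4 can be evaluated in a few lines. This sketch only computes the loss value from plain lists of per-step log-probabilities, advantages, values, and returns; taking gradients would require an autodiff framework. The clip range 0.2 and value coefficient 0.5 are common defaults rather than values from the text, and the entropy bonus that many PPO implementations add is omitted.

```python
import math
from typing import List


def ppo_loss(logp_new: List[float], logp_old: List[float], advantages: List[float],
             values: List[float], returns: List[float],
             clip_eps: float = 0.2, value_coef: float = 0.5) -> float:
    """Clipped PPO surrogate (negated, to be minimized) plus an MSE value term.

    clip_eps and value_coef are common defaults (assumption); no entropy bonus.
    """
    n = len(advantages)
    if n == 0:
        raise ValueError("empty batch")
    policy_sum = 0.0
    value_sum = 0.0
    for lp_new, lp_old, adv, v, ret in zip(logp_new, logp_old, advantages, values, returns):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        policy_sum += min(ratio * adv, clipped * adv)  # pessimistic bound
        value_sum += (v - ret) ** 2
    return -policy_sum / n + value_coef * value_sum / n
```

When the new policy doubles an action's probability, a positive advantage is credited only up to the clipped ratio 1.2, while a negative advantage is penalized at the full ratio 2. This asymmetry is what keeps each update conservative.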

[0060] In embodiments of the present invention, the discrimination module J may employ a lightweight cross-encoder, which can either be trained jointly with the agent or kept fixed; end-to-end joint training is preferred because it avoids an additional offline training step. During training, the strategy, retrieval results, re-ranked lists, final answers, user feedback, or offline annotations of each question-answering session can all serve as samples for constructing reward signals or supervision labels. The reward can combine multiple objectives, including at least task correctness (consistency with the labeled answer or user confirmation), evidence sufficiency (the proportion of assertions supported by evidence), cost (the number of retrieval calls and the number of generated tokens), latency (end-to-end response time), and stability (the number of re-retrieval rounds). To keep the strategy from overfitting to a single domain, training samples can be drawn by stratified sampling over domain, difficulty, and document freshness, and constraints can be placed on the recall boundaries of the different retrievers so that the strategy remains robust when the index changes.
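Stratified sampling over domain, difficulty, and document freshness can be sketched as round-robin draws across strata, so that no single stratum dominates a batch. The sample dictionary keys and the round-robin scheme are assumptions; other stratification schemes (for example proportional allocation) are equally valid.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_batch(samples: List[Dict], keys: Tuple[str, ...], batch_size: int,
                     seed: int = 0) -> List[Dict]:
    """Draw a batch by cycling over strata defined by `keys` (round-robin is an assumption)."""
    rng = random.Random(seed)
    strata: Dict[tuple, List[Dict]] = defaultdict(list)
    for s in samples:
        strata[tuple(s[k] for k in keys)].append(s)  # e.g. (domain, difficulty, freshness)
    pools = [rng.sample(group, len(group)) for group in strata.values()]  # shuffled copies
    batch: List[Dict] = []
    while len(batch) < batch_size and any(pools):
        for pool in pools:
            if pool and len(batch) < batch_size:
                batch.append(pool.pop())
    return batch
```

Usage: with six legal questions and two medical ones, a batch of four drawn by `stratified_batch(samples, ("domain",), 4)` contains two of each, whereas uniform sampling would be dominated by the larger domain.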

[0061] Through the above steps, the agent-based adaptive retrieval enhancement generation method of this embodiment forms a closed loop of policy adaptation, multi-retriever cooperation, controllable re-retrieval, and evidence-constrained generation within a single RAG framework, with the following effects. A single lightweight agent makes adaptive decisions, within a unified policy space, on key parameters such as the retrieval type, the Top-K recall quantity, the round limit, and optionally the threshold, yielding a retrieval enhancement generation loop that adjusts dynamically to question characteristics and to the quality of the retrieved evidence. Compared with fixed-policy RAG schemes, it reduces over-retrieval and redundant context for simple questions, lowering end-to-end latency and invocation cost. The discrimination and re-retrieval control module quantitatively evaluates evidence sufficiency, and the state update lets subsequent rounds change the retriever and its parameters, which avoids both missed re-retrieval triggers and invalid retrieval loops; this improves effective recall and evidence coverage for complex questions and reduces the risk of hallucination caused by insufficient evidence. The multi-retriever collaboration and fused re-ranking mechanism schedules, on demand, the exact-match strength of sparse retrieval and the semantic-recall strength of dense retrieval, improving the reliability and verifiability of answers without significantly increasing the structural complexity of the system, while the logs and multi-dimensional rewards support continuous iterative optimization.

[0062] Therefore, this invention provides a Retrieval-Augmented Generation (RAG) scheme that dynamically plans the retrieval strategy, adaptively adjusts retrieval parameters, and supports multi-retriever collaboration and controllable re-retrieval without significantly increasing system complexity or training burden. The scheme can: automatically select sparse or dense retrieval for questions of varying difficulty and dynamically set key parameters such as Top-K; output an interpretable feedback signal based on the degree of matching between the retrieval results and the question, which determines whether to re-retrieve and how to adjust the strategy for the next round; and use the retrieval quality gain across rounds to avoid invalid loops, improve retrieval efficiency, and increase final generation accuracy. The invention significantly reduces the risk of hallucination in multi-hop question answering, in environments with interfering knowledge, over cross-domain knowledge bases, and in large-scale online services, while achieving a better balance between accuracy, cost, and latency.

[0063] Embodiment 2: An embodiment of the present invention further provides an agent-based adaptive retrieval enhancement generation system for implementing the steps of the agent-based adaptive retrieval enhancement generation method described in Embodiment 1.

[0064] Figure 3 is a schematic structural diagram of an agent-based adaptive retrieval enhancement generation system according to an embodiment of this application. As shown in Figure 3, the system includes: an input processing module 301, a policy space configuration module 302, an agent decision-making module 303, a collaborative retrieval module 304, a reordering module 305, a discrimination and re-retrieval control module 306, and a prompt construction and generation module 307.

[0065] The input processing module 301 receives the user's question text and normalizes it, including word segmentation or sub-word encoding, removal of invalid symbols, language detection, spelling normalization, and length pruning; it can also generate question features. Its input is the user question Q, and its outputs are the normalized question and the question features.
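A minimal sketch of the normalization step follows, assuming Unicode NFKC folding, removal of control and zero-width characters, whitespace collapsing, and character-based length pruning. Word segmentation, language detection, and spelling normalization are omitted, and the feature names are hypothetical.

```python
import re
import unicodedata

MAX_CHARS = 512  # assumed length budget for pruning


def normalize_question(text: str, max_chars: int = MAX_CHARS) -> str:
    """NFKC-fold, drop control/format (e.g. zero-width) characters, collapse spaces, prune."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width letters -> ASCII
    text = "".join(ch for ch in text
                   if unicodedata.category(ch) not in ("Cc", "Cf") or ch in "\t\n ")
    text = re.sub(r"\s+", " ", text).strip()
    return text[:max_chars]


def question_features(text: str) -> dict:
    """Hypothetical cheap features that the agent state could include."""
    tokens = text.split()
    wh_words = {"who", "what", "when", "where", "which", "why", "how"}
    return {
        "num_tokens": len(tokens),
        "has_digit": any(ch.isdigit() for ch in text),
        "is_wh_question": bool(tokens) and tokens[0].lower() in wh_words,
    }
```

Usage: `normalize_question("  What\u200b  is\tBM25?\n")` strips the zero-width space and the irregular whitespace, returning `"What is BM25?"`.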

[0066] The policy space configuration module 302 is used to construct the retrieval policy space Act from which the agent selects, and outputs the policy space Act.
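The text does not enumerate the policy space, so the sketch below assumes it is the Cartesian product of the retrieval types named in claim 2 (sparse, dense, or both combined) and a discrete Top-K grid. The grid values and the `RetrievalAction` type are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple, Sequence


class RetrievalAction(NamedTuple):
    retriever: str  # "sparse", "dense", or "hybrid" (combination of both)
    top_k: int


RETRIEVERS = ("sparse", "dense", "hybrid")  # retrieval types named in claim 2
TOP_K_CHOICES = (3, 5, 10, 20)  # assumed discrete K grid


def build_policy_space(retrievers: Sequence[str] = RETRIEVERS,
                       k_choices: Sequence[int] = TOP_K_CHOICES) -> List[RetrievalAction]:
    """Enumerate every (retriever, K) pair as one discrete action."""
    return [RetrievalAction(r, k) for r, k in product(retrievers, k_choices)]
```

A discrete, enumerable space keeps the Actor's output a simple categorical distribution, one probability per (retriever, K) pair.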

[0067] The agent decision-making module 303 is used to select the most appropriate retrieval action from the policy space based on the current state (including the question, historical retrieval results, the current round, etc.) and to output the retrieval parameters. This module further includes: an Actor submodule, which outputs a policy distribution and samples an action from it; and a Critic submodule, which evaluates the value of the current state and is used during training to calculate the advantage function and update the strategy. The module's output is the retrieval strategy $a_i$ for this round (where i denotes the current round), including parameters such as the retriever type and K.
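The Actor's output stage can be sketched as a categorical distribution over the policy space, assuming the Actor network (not shown) emits one logit per action. The log-probability is returned because the PPO update later needs the old policy's probabilities. The function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(logits: Sequence[float], rng: random.Random,
                  greedy: bool = False) -> Tuple[int, float]:
    """Return (action index, log-probability); logits assumed to come from the Actor."""
    probs = softmax(logits)
    if greedy:  # e.g. at inference time
        idx = max(range(len(probs)), key=probs.__getitem__)
    else:  # stochastic sampling during training for exploration
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, math.log(probs[idx])
```

Sampling during training lets the agent explore alternative (retriever, K) choices; at inference time the greedy branch returns the most probable action.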

[0068] In embodiments of this application, the Critic submodule further includes the following units. A state value prediction unit receives a state representation $s$ containing question features and retrieval history context and, through a deep neural network $V_\phi$ with parameters $\phi$, outputs a scalar value $V_\phi(s)$ characterizing the expected total return that the current system state can obtain in the future. An advantage function calculation unit (active only during training) combines the multi-dimensional joint rewards (including the retrieval strategy reward, the matching score reward, and the re-retrieval advantage reward) and calculates the advantage value of the current retrieval action with the generalized advantage estimation (GAE) algorithm, passing this value to the Actor submodule to guide the update of the retrieval strategy distribution. A value network optimization unit uses a loss function such as mean squared error to measure the difference between the predicted value and the actual cumulative return, and updates the network parameters of the Critic submodule by gradient backpropagation to improve the accuracy of its subsequent evaluation of retrieval states.
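The Critic's two training signals are sketched below: GAE advantages computed backwards over one episode (one question-answering session), and the mean-squared value loss against returns (advantages plus predicted values). The discount 0.99 and GAE parameter 0.95 are common defaults, not values from the text.

```python
from typing import List, Tuple


def gae(rewards: List[float], values: List[float], last_value: float = 0.0,
        gamma: float = 0.99, lam: float = 0.95) -> Tuple[List[float], List[float]]:
    """Generalized advantage estimation for one episode.

    values[t] is the Critic's V(s_t); last_value is V after the final step
    (0.0 when the episode ends with answer generation). gamma/lam are assumed defaults.
    Returns (advantages, returns), where returns are the Critic's regression targets.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else last_value
        delta = rewards[t] + gamma * next_v - values[t]  # temporal-difference error
        running = delta + gamma * lam * running
        advantages[t] = running
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def value_mse(values: List[float], returns: List[float]) -> float:
    """Mean squared error between predicted values and cumulative returns."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)
```

With gamma = lam = 1 and a Critic that predicts 0 everywhere, rewards [1, 2] give advantages [3, 2]: each step is credited with everything that follows it.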

[0069] The collaborative retrieval module 304 is used to invoke the corresponding retrieval engine to perform the retrieval according to the retrieval strategy, thereby obtaining a candidate document set. This module further includes: a sparse retrieval engine, which retrieves candidate documents from the knowledge base based on term statistics (e.g., BM25); a dense retrieval engine, which retrieves candidate documents based on encoder vector recall (e.g., nearest neighbor search after obtaining vectors through DeBERTaV3 encoding); and a joint retrieval scheduler, which determines which retrieval engine to invoke or proportionally merges the results of both based on the agent's output, and supports cross-retrieval engine deduplication and merging. This module outputs a candidate document set.
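The sparse engine and the joint scheduler might be sketched as follows: a toy Okapi BM25 over whitespace tokens, and a scheduler that takes a share of the K slots from the dense ranking, fills the rest from the sparse ranking, and deduplicates across engines. Dense scores are passed in precomputed (a real system would obtain them from encoder vectors and nearest-neighbor search); the `dense_share` parameter and the fill order are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: str, docs: Dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> Dict[str, float]:
    """Okapi BM25 over lowercase whitespace tokens (toy index, no inverted lists)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores: Dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = s
    return scores


def _ranked(scores: Dict[str, float]) -> List[str]:
    return [d for d, _ in sorted(scores.items(), key=lambda p: p[1], reverse=True)]


def schedule(sparse: Dict[str, float], dense: Dict[str, float],
             k: int, dense_share: float) -> List[str]:
    """Merge two rankings into k unique ids: about k*dense_share from dense, rest from sparse."""
    n_dense = min(k, int(k * dense_share + 0.5))
    merged: List[str] = list(_ranked(dense)[:n_dense])
    for doc_id in _ranked(sparse) + _ranked(dense):  # dense list is a fallback filler
        if len(merged) >= k:
            break
        if doc_id not in merged:  # cross-engine deduplication
            merged.append(doc_id)
    return merged
```

Setting `dense_share` to 0 or 1 reduces the scheduler to a single engine, so the agent's retriever-type choice and the proportional merge share one code path.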

[0070] The reordering module 305 is used to finely re-rank the candidate document set (e.g., by using a cross-encoder to score each question-document pair) and output an ordered document subset.

[0071] The discrimination and re-retrieval control module 306 is used to calculate the matching score between the currently retrieved content and the question, and compares the score with a threshold to determine whether re-retrieval is needed; when the round limit is reached, it forces entry into generation. Its output is a re-retrieval flag or a generation flag, together with the matching score.
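The control logic of module 306 reduces to a small decision function. This sketch assumes "sufficient" means the score is at least the threshold and that the round counter starts at 1; the default threshold and round limit are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    RERETRIEVE = "re-retrieve"
    GENERATE = "generate"


def control(score: float, round_idx: int,
            threshold: float = 0.6, max_rounds: int = 3) -> Decision:
    """Generate if evidence is sufficient or the round limit is hit; else re-retrieve.

    threshold and max_rounds are assumed defaults; round_idx counts from 1.
    """
    if score >= threshold or round_idx >= max_rounds:
        return Decision.GENERATE
    return Decision.RERETRIEVE
```

The round-limit check is what guarantees termination: even if the score never reaches the threshold, the loop ends after `max_rounds` rounds.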

[0072] The prompt construction and generation module 307 is used to organize the question and the selected document fragments into a generation prompt and context, and calls the large language model to generate the answer. Its output is answer A and, optionally, the supporting passages.

[0073] The above modules are connected in sequence. The output of the discrimination and re-retrieval control module 306 is also fed back to the agent decision module 303 (when re-retrieval is triggered) and the prompt construction and generation module 307 (when the evidence is sufficient or the round limit is reached).

[0074] The agent-based adaptive retrieval enhancement generation system provided in this embodiment realizes adaptive selection of retrieval strategies, dynamic collaboration of multiple retrievers, and a controllable re-retrieval mechanism, thereby improving retrieval efficiency and answer reliability while ensuring generation quality.

[0075] Furthermore, the system also includes a training and update module 308. This module is used to collect interaction logs and trajectory data during system operation, construct multi-dimensional reward signals, and update the policy network parameters of the agent decision-making module 303 to achieve continuous optimization of the agent. The inputs to the training and update module 308 include: question features from the input processing module 301, action and state information from the agent decision-making module 303, matching scores from the discrimination and re-retrieval control module 306, and the final answer and optional user feedback from the prompt construction and generation module 307. Its output is the updated network parameters of the agent decision-making module 303.

[0076] The training and update module 308 further includes: a trajectory collection unit, used to record, for each question-answering session, the state sequence, the action sequence (including retrieval type, Top-K value, etc.), the discrimination score sequence, the generated answer, and the final user feedback or offline annotation result, forming a complete interaction trajectory; a reward construction unit, used to calculate multi-dimensional rewards for each action, including at least a strategy reward (encouraging low-cost retrieval strategies and small K values), a discrimination reward (based on the relevance score output by the discrimination module), and a re-retrieval advantage reward (measuring how much the strategy of this round improves on that of the previous round), and to weight and sum the rewards of each dimension into a total reward; and a strategy optimization unit, which uses a reinforcement learning algorithm (preferably the proximal policy optimization (PPO) algorithm) to update the network parameters of the Actor and Critic submodules in the agent decision-making module 303 based on the collected trajectories and the calculated advantage function, so that the agent gradually learns to select the optimal retrieval strategy for different questions and retrieval states.

[0077] During system operation, the training and update module 308 can operate in offline training mode (batch update based on historical logs) or online learning mode (based on real-time interactive streaming update), thereby enabling the system to have continuous adaptive capabilities and maintain the optimality of the retrieval strategy in different fields and document library environments.

[0078] Embodiment 3: An embodiment of the present invention further provides an electronic device. Figure 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Figure 4, the electronic device of the present invention includes a processor 401 and a memory 402, wherein the memory 402 stores a computer program which, when read and executed by the processor 401, performs the steps of the above embodiments of the agent-based adaptive retrieval enhancement generation method.

[0079] Embodiment 4: An embodiment of the present invention further provides a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps of the above embodiments of the agent-based adaptive retrieval enhancement generation method.

[0080] In this embodiment, the aforementioned computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0081] It will be understood by those skilled in the art that the above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. An agent-based adaptive retrieval enhancement generation method, characterized in that it includes the following steps: constructing an initial state based on the user's question; selecting, by an agent, a retrieval action from a preset retrieval strategy space based on the current state, the retrieval action including at least a retrieval type and a recall quantity; invoking the corresponding retriever to perform retrieval according to the retrieval action, obtaining a candidate document set; reordering the candidate document set to obtain an ordered document subset; evaluating the matching score between the ordered document subset and the user's question; determining, based on the result of comparing the matching score with a preset threshold and on whether the current round has reached a preset round limit, whether to trigger re-retrieval or enter generation; if re-retrieval is triggered, updating the state based on the retrieval results of the current round and returning to the agent selection step for the next round of retrieval; and if generation is entered, constructing a prompt from the user's question and the evidence set and generating an answer through a large language model.

2. The agent-based adaptive retrieval enhancement generation method according to claim 1, characterized in that the retrieval type includes a sparse retriever, a dense retriever, or a combination of both.

3. The agent-based adaptive retrieval enhancement generation method according to claim 1, characterized in that the initial state includes at least the normalized question, a round count, a summary of historical retrieval content, and a retrieval budget constraint.

4. The agent-based adaptive retrieval enhancement generation method according to claim 1, characterized in that the agent includes an Actor submodule and a Critic submodule; the Actor submodule is used to output a strategy distribution based on the current state and sample a retrieval action to determine the retrieval strategy for this round; and the Critic submodule is used to evaluate the value of the current state and, during the training phase, to calculate the advantage function in combination with the reward obtained by the retrieval action, so as to guide the policy parameter update of the Actor submodule.

5. The agent-based adaptive retrieval enhancement generation method according to claim 4, characterized in that, The agent is trained using reinforcement learning, and the parameters of the Actor submodule and the Critic submodule are updated by constructing a multidimensional reward function and using a proximal policy optimization algorithm.

6. The agent-based adaptive retrieval enhancement generation method according to claim 5, characterized in that the multidimensional reward function includes: a strategy reward, used to encourage lower-cost retrieval types and smaller recall quantities; a discrimination reward, which is equal to the matching score; and a re-retrieval advantage reward, or feedback differential reward, used to measure whether the current retrieval strategy improves on the previous retrieval strategy.

7. The agent-based adaptive retrieval enhancement generation method according to claim 1, characterized in that the step of updating the state based on the retrieval results of the current round further includes: extracting high-confidence sentences, key entities, bridging entities, or summaries from the current ordered document subset to form supplementary information, and adding the supplementary information to an extended question or to the state memory.

8. An agent-based adaptive retrieval enhancement generation system, characterized in that the system is used to implement the agent-based adaptive retrieval enhancement generation method according to any one of claims 1 to 7, and comprises: an input processing module for receiving a user question and constructing an initial state; a policy space configuration module for constructing a retrieval policy space; an agent decision-making module for selecting a retrieval action from the policy space according to the current state; a collaborative retrieval module for invoking the corresponding retriever to perform retrieval according to the retrieval action and obtain a candidate document set; a reordering module for reordering the candidate document set to obtain an ordered document subset; a discrimination and re-retrieval control module for evaluating the matching score and determining whether to trigger re-retrieval or enter generation; and a prompt construction and generation module for constructing a prompt from the user question and the evidence set and generating an answer through a large language model.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, The processor is configured to execute the computer program stored in the memory to implement the agent-based adaptive retrieval enhancement generation method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that, The storage medium stores a computer program, which is loaded and executed by a processor to implement the agent-based adaptive retrieval enhancement generation method according to any one of claims 1 to 7.