Adaptive multi-agent cooperative task planning method based on confidence feedback adjustment

An adaptive multi-agent collaborative task planning method with confidence feedback adjustment applies targeted interventions matched to the severity of each error. This addresses the resource waste caused by blind retries in multi-agent systems and enables efficient, adaptive task solving and resource optimization.

CN122242514A · Pending · Publication Date: 2026-06-19 · BEIJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING UNIV OF POSTS & TELECOMM
Filing Date
2026-02-06
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, multi-agent systems retry blindly when a task fails to meet its target, wasting computing resources. They also lack intervention strategies graded by error severity and cannot effectively address failures caused by capability deficiencies.

Method used

An adaptive multi-agent collaborative task planning method based on confidence feedback is introduced. Using a quantified confidence assessment and a dual-threshold judgment, it implements a topology reconstruction mechanism and a prompt word optimization mechanism and matches an intervention strategy to each error level: freezing tasks, introducing expert roles and adjusting the task dependency graph for severe errors, and providing introspection prompt words for correctable ones.

Benefits of technology

It improves the robustness of the system and the efficiency of computing resource utilization, realizes intelligent, efficient and adaptive multi-agent collaborative task solving, reduces resource waste, and improves the success rate and final quality of complex tasks.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122242514A_ABST

Abstract

This invention provides an adaptive multi-agent cooperative task planning method based on confidence feedback adjustment, belonging to the field of data processing technology. The method includes: semantically parsing received natural language instructions to obtain a task dependency graph and an initial set of agents; having at least one agent in the initial agent set execute a target subtask according to the task dependency graph to generate an intermediate result; having a critic agent evaluate the intermediate result to generate a quantified confidence score; and, based on a comparison of the confidence score with at least one preset low threshold and high threshold, executing one of the following feedback control steps: triggering a topology reconstruction mechanism, triggering a prompt word optimization mechanism, or solidifying the intermediate result. This method significantly improves the robustness of the system and the efficiency of computational resource utilization while ensuring the solution rate of complex tasks, achieving intelligent, efficient, and adaptive multi-agent cooperation.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and in particular to an adaptive multi-agent cooperative task planning method based on confidence feedback adjustment.

Background Technology

[0002] With the rapid development of Large Language Models (LLMs), leveraging multiple agents to collaboratively solve complex problems in different roles (such as product managers, architects, and engineers) has become an important trend in the field of artificial intelligence. Through division of labor and cooperation, multi-agent systems have demonstrated potential that surpasses single-agent models in tasks such as code generation, complex reasoning, and creative writing.

[0003] Existing feedback control mechanisms are overly simplistic. They typically apply binary error-handling logic that only determines whether a task succeeded or failed and whether a retry is required. When a task fails to meet its objectives, the same agent is usually forced to regenerate content blindly, with no tiered intervention strategy based on the severity of the error. This indiscriminate retry mechanism not only fails to address failures caused by capability deficiencies but also wastes a significant amount of computational resources.

Summary of the Invention

[0004] This invention provides an adaptive multi-agent cooperative task planning method based on confidence feedback adjustment, which addresses a shortcoming of existing technologies: when a task fails to meet its objectives, the same agent is often made to regenerate content blindly, which neither resolves failures caused by capability deficiencies nor avoids a huge waste of computing resources. By no longer blindly having the same agent regenerate content, this invention resolves failures caused by capability deficiencies and avoids this waste of computing resources.

[0005] This invention provides an adaptive multi-agent cooperative task planning method based on confidence feedback adjustment, comprising the following steps.

[0006] The received natural language instructions are semantically parsed to obtain a task dependency graph and an initial set of agents. At least one executing agent from the initial set of agents executes a target subtask according to the task dependency graph to generate an intermediate result. A critic agent evaluates the intermediate result to generate a quantified confidence score. Based on the comparison of the confidence score with at least one preset low threshold and high threshold, one of the following feedback control steps is performed: if the confidence score is lower than the low threshold, a topology reconstruction mechanism is triggered to re-execute the target subtask; if the confidence score is higher than the low threshold and lower than the high threshold, a prompt word optimization mechanism is triggered to drive the executing agent to correct the intermediate result; if the confidence score is higher than the high threshold, the intermediate result is solidified.

[0007] According to the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention, the topology reconstruction mechanism includes the following steps: freezing the target subtask; determining a target expert role from an expert role library based on the reason for the failure of the target subtask; reconstructing the task dependency graph; and restarting the target subtask based on the target expert role and the reconstructed task dependency graph.

[0008] According to the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention, reconstructing the task dependency graph includes: modifying the execution mode of the part of the task dependency graph corresponding to the target subtask to a multi-node parallel mode or a collaborative mode.

[0009] According to the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention, the prompt word optimization mechanism includes the following steps: invoking a prompt word optimization engine to determine introspection prompt words based on the modification suggestions output by the critic agent; and re-executing the target subtask based on the introspection prompt words.

[0010] According to the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention, semantically parsing the received natural language instructions to obtain a task dependency graph and an initial set of agents includes: semantically parsing the received natural language instructions to obtain an atomized subtask sequence represented as a directed acyclic graph; and determining the task dependency graph and the initial set of agents based on the atomized subtask sequence.

[0011] According to the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention, before the target subtask is executed according to the task dependency graph to generate an intermediate result, the method further includes: vectorizing the target subtask to be executed to obtain a current task vector; calculating the semantic relevance between the current task vector and the vector of each historical memory fragment stored in a shared memory pool; selecting from the shared memory pool the historical memory fragments whose semantic relevance to the current task vector exceeds a preset relevance threshold and combining them to obtain context information; and providing the context information to the executing agent.
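The retrieval step in [0011] can be illustrated with a minimal Python sketch. It assumes that the task and the memory fragments have already been vectorized by some embedding model (none is named here), that cosine similarity serves as the semantic relevance measure, and that the relevance threshold is 0.8; the names `MemoryFragment`, `cosine_similarity` and `build_context` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryFragment:
    """A historical memory fragment in the shared memory pool (hypothetical structure)."""
    text: str
    vector: list


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_context(task_vector, memory_pool, relevance_threshold=0.8):
    """Combine, most relevant first, the fragments whose relevance exceeds the threshold."""
    scored = [(cosine_similarity(task_vector, f.vector), f) for f in memory_pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return "\n".join(f.text for score, f in scored if score > relevance_threshold)


# Usage with toy two-dimensional vectors (illustrative values only).
pool = [
    MemoryFragment("users table: id, email, password_hash", [1.0, 0.0]),
    MemoryFragment("landing page color palette", [0.0, 1.0]),
]
context = build_context([0.9, 0.1], pool)  # only the schema fragment passes the threshold
```

Sorting before joining keeps the most relevant fragments at the top of the context; whether the context should also be truncated to fit a model's context window is left open in this sketch.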

[0012] According to the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention, the method further includes: when the intermediate results corresponding to all subtasks in the task dependency graph have been solidified, or when the iteration count of any subtask has reached a preset maximum iteration count, controlling a summarizing agent to extract all solidified intermediate results from the shared memory pool, and integrating all the solidified intermediate results to generate and output the final task result.
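The termination condition in [0012] reduces to a simple predicate over per-subtask state. The sketch below assumes a hypothetical `SubtaskState` record and models the shared memory pool as a dictionary keyed by subtask ID; string concatenation merely stands in for the summarizing agent, which in the method would integrate the results, for example with a large language model.

```python
from dataclasses import dataclass


@dataclass
class SubtaskState:
    """Hypothetical per-subtask bookkeeping."""
    solidified: bool = False
    iterations: int = 0


def should_summarize(states, max_iterations):
    """True when every subtask is solidified or any subtask has used up its iterations."""
    all_solidified = all(s.solidified for s in states.values())
    any_exhausted = any(s.iterations >= max_iterations for s in states.values())
    return all_solidified or any_exhausted


def collect_solidified(states, shared_memory_pool):
    """Placeholder for the summarizing agent: gather solidified results in subtask order."""
    return "\n\n".join(
        shared_memory_pool[task_id]
        for task_id, state in states.items()
        if state.solidified and task_id in shared_memory_pool
    )


states = {
    "design_db_schema": SubtaskState(solidified=True, iterations=1),
    "write_login_api": SubtaskState(solidified=False, iterations=3),
}
pool = {"design_db_schema": "CREATE TABLE users (id INTEGER PRIMARY KEY);"}
```

With a maximum of three iterations, the run stops once `write_login_api` exhausts its budget, and only the solidified schema is passed on for integration.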

[0013] The present invention also provides an adaptive multi-agent cooperative task planning device based on confidence feedback adjustment, comprising the following modules: a parsing module, used to semantically parse the received natural language instructions to obtain a task dependency graph and an initial set of agents; a generation module, used to have at least one executing agent from the initial set of agents execute a target subtask according to the task dependency graph to generate an intermediate result; and an evaluation module, used to have a critic agent evaluate the intermediate result to generate a quantified confidence score and, based on the comparison of the confidence score with at least one preset low threshold and high threshold, perform one of the following feedback control steps: if the confidence score is lower than the low threshold, trigger a topology reconstruction mechanism to re-execute the target subtask; if the confidence score is higher than the low threshold and lower than the high threshold, trigger a prompt word optimization mechanism to drive the executing agent to correct the intermediate result; if the confidence score is higher than the high threshold, solidify the intermediate result.

[0014] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment as described above.

[0015] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment as described above.

[0016] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment as described above.

[0017] By introducing quantitative confidence assessment and dual-threshold determination, the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by this invention can accurately distinguish between execution errors and capability deficiencies. By matching different levels of intervention strategy, namely prompt word optimization and topology reconstruction, to different error levels, this method significantly improves the robustness of the system and the efficiency of computational resource utilization while ensuring the solution rate of complex tasks, achieving intelligent, efficient, and adaptive multi-agent cooperation.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0019] Figure 1 is the first flowchart of the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention.

[0020] Figure 2 is a flowchart of the topology reconstruction mechanism provided by the present invention.

[0021] Figure 3 is a flowchart of the prompt word optimization mechanism provided by the present invention.

[0022] Figure 4 is a schematic flowchart, provided by the present invention, of semantically parsing received natural language instructions to obtain a task dependency graph and an initial set of agents.

[0023] Figure 5 is the second flowchart of the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention.

[0024] Figure 6 is the third flowchart of the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention.

[0025] Figure 7 is the fourth flowchart of the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the present invention.

[0026] Figure 8 is a schematic diagram of the dual-threshold, three-level feedback control provided by the present invention.

[0027] Figure 9 is a schematic structural diagram of the adaptive multi-agent cooperative task planning device based on confidence feedback adjustment provided by the present invention.

[0028] Figure 10 is a schematic structural diagram of the electronic device provided by the present invention.

[0029] Reference numerals: 901: Parsing module; 902: Generation module; 903: Evaluation module; 1010: Processor; 1020: Communication interface; 1030: Memory; 1020: Communication bus.

Detailed Implementation

[0030] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0031] This invention provides an adaptive multi-agent cooperative task planning method based on confidence feedback adjustment. The method can be executed by a computer device, such as a server, a personal computer (PC), or a mobile terminal with sufficient computing capability. It can also be deployed in a cloud computing environment or on a distributed computing platform, where the resources of multiple computing nodes are invoked to complete the task collaboratively.

[0032] The adaptive multi-agent cooperative task planning method based on confidence feedback adjustment of this invention is described below with reference to Figures 1 to 10.

[0033] Figure 1 is the first flowchart of the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by this invention. As shown in Figure 1, the method includes the following steps. Step 101: Perform semantic parsing on the received natural language instructions to obtain the task dependency graph and the initial set of agents.

[0034] First, the system receives input from the user, which is usually a natural language instruction describing a complex task objective, such as "Please develop a website backend service for me with user registration and login functions."

[0035] To enable multiple agents to understand and collaboratively execute the task, an initial planning phase is required. This phase can be performed by a master control unit within the system or a dedicated master control agent (Meta-Agent). The master control unit performs in-depth semantic parsing of the received natural language instructions. Its core objective is to decompose the macroscopic, vague task objective into a series of smaller, more specific, and executable subtasks, and to clarify the logical relationships between these subtasks.

[0036] On the one hand, semantic parsing produces a task dependency graph. This graph defines the execution order of the subtasks and the predecessor and successor dependencies between them, ensuring that the collaborative process proceeds in an orderly manner. For example, in the task of developing a login service, the subtask of designing the database table structure must be completed before the subtask of writing the data access layer code. The task dependency graph can be a data structure containing nodes and directed edges, where nodes represent subtasks and edges represent dependencies.
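A task dependency graph of this kind maps naturally onto a directed acyclic graph. The minimal Python sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+), which raises `graphlib.CycleError` if the parsed plan contains a cycle; the subtask names for the login-service example and the helper `ready_subtasks` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each subtask maps to the set of subtasks it depends on.
dependencies = {
    "design_db_schema": set(),
    "write_data_access_layer": {"design_db_schema"},
    "write_login_api": {"write_data_access_layer"},
    "write_frontend_integration": {"write_login_api"},
}

# A valid execution order; raises graphlib.CycleError if the graph is not acyclic.
execution_order = list(TopologicalSorter(dependencies).static_order())


def ready_subtasks(deps, solidified):
    """Subtasks not yet solidified whose prerequisites have all been solidified."""
    return sorted(
        task for task, prereqs in deps.items()
        if task not in solidified and prereqs <= solidified
    )
```

`ready_subtasks` corresponds to activating downstream subtasks once the intermediate results of their predecessors have been solidified.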

[0037] Another aspect of semantic parsing is the generation of an initial set of agents. Based on the nature of each subtask, the system matches and instantiates one or more of the most suitable agents from a pre-defined expert role library. Here, an agent can be understood as an automated program encapsulated based on a large language model, endowed with specific roles, skills, and instructions. For example, for the subtask of designing database table structures, the system instantiates a database architect agent; for the subtask of writing backend interface code, it instantiates a backend engineer agent. This group of initially instantiated agents constitutes the initial set of agents.

[0038] Step 102: At least one agent from the initial agent set executes the target subtask according to the task dependency graph to generate an intermediate result.

[0039] In this context, the executing agent is the agent currently activated by the task dependency graph from the initial set of agents. The target subtask is the specific work that the executing agent needs to complete in the current step. Based on its assigned role and received task instructions, the executing agent will invoke its underlying large language model or other tools to generate work outputs.

[0040] This output is called an intermediate result. An intermediate result is the initial output of a subtask, and it can take various forms, such as a piece of source code, a document, a design diagram, a configuration file, or any other form of data. It is called an intermediate result because it has not yet undergone final verification; its quality and usability are uncertain and it needs to enter the subsequent evaluation stage.

[0041] For example, after a backend engineer agent performs the subtask of writing a login interface, the intermediate result might be a Python code file containing login logic.

[0042] Step 103: A critic agent evaluates the intermediate results to generate a quantified confidence score.

[0043] In this system, the critic agent acts as a quality assurance or test engineer, whose core responsibility is to objectively and systematically evaluate the intermediate results submitted by the execution agent. It is separate from the execution agent to ensure the impartiality of the evaluation.

[0044] The critic agent can examine intermediate results based on a pre-defined set of constraints. These constraints can vary depending on the task type. For example, for code-related intermediate results, constraints might include: whether the code runs without errors (runnability), whether the code logic meets the task requirements (logical consistency), whether the coding style follows project standards (formatting compliance), and whether there are any known security vulnerabilities. For copywriting-related intermediate results, constraints might include: whether the syntax is correct, whether the logic is coherent, and whether it aligns with the brand's tone.

[0045] After evaluation, the critic agent outputs a quantified confidence score. This is fundamentally different from the simple pass/fail binary judgment of existing technologies. The score is a numerical value that represents the quality of the intermediate result, such as a floating-point number normalized to the range 0 to 1, or an integer between 0 and 100. The higher the score, the more confident the critic agent is in the quality of the intermediate result. For example, if testing shows that the code runs correctly and contains no syntax errors, it might receive a score of 0.95; if the code runs but has some edge-case defects, it might receive 0.7; if the code has syntax errors that prevent it from running, it might receive only 0.3.
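How the critic agent arrives at its score is left open here. One possible approach, shown only as a sketch, is a weighted mean of per-constraint scores in [0, 1]; the constraint names, the weights and the helper names are assumptions, and an integer score from 0 to 100 can be mapped onto the same range by dividing by 100.

```python
def confidence_score(checks, weights):
    """Weighted mean of per-constraint scores, each clamped to [0, 1]."""
    total_weight = sum(weights[name] for name in checks)
    if total_weight <= 0:
        raise ValueError("weights for the evaluated constraints must sum to a positive value")
    weighted = sum(weights[name] * min(max(score, 0.0), 1.0) for name, score in checks.items())
    return weighted / total_weight


def from_percent(score):
    """Map an integer score in 0..100 onto the normalized 0..1 range."""
    return score / 100


# Hypothetical per-constraint results for a code intermediate result.
checks = {"runnability": 1.0, "logical_consistency": 0.6, "formatting": 1.0, "security": 1.0}
weights = {"runnability": 0.4, "logical_consistency": 0.3, "formatting": 0.1, "security": 0.2}
score = confidence_score(checks, weights)  # approximately 0.88
```

Clamping each constraint score keeps the aggregate inside the normalized range even if an individual check reports an out-of-range value.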

[0046] Step 104: Based on the comparison of the confidence score with at least one preset low threshold and high threshold, perform one of the following feedback control steps: if the confidence score is lower than the low threshold, trigger the topology reconstruction mechanism to re-execute the target subtask; if the confidence score is higher than the low threshold and lower than the high threshold, trigger the prompt word optimization mechanism to drive the executing agent to correct the intermediate result; if the confidence score is higher than the high threshold, solidify the intermediate result.

[0047] In this step, two key numerical boundaries are predefined: a low threshold (e.g., 0.6) and a high threshold (e.g., 0.9). These two thresholds divide all possible values of the confidence score into three intervals, each corresponding to a distinct processing strategy, thereby enabling graded intervention based on the severity of the error.

[0048] When the confidence score falls below the low threshold (e.g., less than 0.6), the system determines that the intermediate result has a serious flaw or that the current agent lacks the capability to complete the target subtask. This situation is treated as a failure, that is, a circuit-breaker condition. The system then triggers a high-cost but powerful intervention mechanism: topology reconstruction. The macro-level goal of this mechanism is to overcome the difficulty by changing the problem-solving approach or introducing new resources, rather than simply repeating attempts. Once triggered, the system takes corrective measures, such as freezing the target subtask, introducing an expert role and reconstructing the task dependency graph, to improve the likelihood that subsequent executions of the target subtask succeed.

[0049] When the confidence score falls between the low and high thresholds (e.g., between 0.6 and 0.9), the system determines that the intermediate result is basically usable but flawed, placing it in a doubtful state. This indicates that the agent's capability is sufficient but that a correctable error occurred during execution. The system then triggers the low-cost prompt word optimization mechanism, which gives the same agent precise, targeted feedback to help it recognize and correct its mistakes. This avoids unnecessary resource consumption and supports iterative improvement of the agent's output.

[0050] When the confidence score is higher than a high threshold (e.g., greater than 0.9), the system determines that the intermediate result is of high quality and fully meets the requirements, and the task is considered successful. At this point, the system will solidify the intermediate result. Solidification may include writing the intermediate result data to a shared, persistent storage space, such as a shared memory pool, database, or file system. The solidified result will serve as reliable input for subsequent subtasks that depend on it. Simultaneously, the system will activate the next or subsequent batch of downstream subtasks based on the task dependency graph, thereby driving the entire collaborative process forward.
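The three-interval decision in [0047] to [0050] can be sketched as a small routing function. The thresholds 0.6 and 0.9 are the example values given above; the handling of a score exactly equal to a threshold is not specified, and this sketch assigns such scores to the middle (prompt word optimization) band as an assumption. The `Action` names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    TOPOLOGY_RECONSTRUCTION = "topology_reconstruction"
    PROMPT_WORD_OPTIMIZATION = "prompt_word_optimization"
    SOLIDIFY = "solidify"


def route(score, low=0.6, high=0.9):
    """Map a normalized confidence score to one of the three feedback control steps."""
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")
    if score < low:
        return Action.TOPOLOGY_RECONSTRUCTION   # serious flaw or capability deficiency
    if score <= high:
        return Action.PROMPT_WORD_OPTIMIZATION  # usable but flawed; boundary scores land here (assumption)
    return Action.SOLIDIFY                      # high quality; persist and activate downstream subtasks
```

Applied to the example scores in [0045], 0.95, 0.7 and 0.3 route to solidification, prompt word optimization and topology reconstruction respectively.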

[0051] In this embodiment, quantified confidence assessment and dual-threshold determination make it possible to distinguish accurately between two different types of errors: execution errors and capability deficiencies. By matching a different level of intervention strategy to each error level (prompt word optimization for execution errors and topology reconstruction for capability deficiencies), this method greatly improves the robustness of the system and the utilization efficiency of computing resources while ensuring the solution rate of complex tasks, achieving intelligent, efficient, and adaptive multi-agent collaboration.

[0052] In some embodiments, as shown in Figure 2, the topology reconstruction mechanism includes the following steps. Step 201: Freeze the target subtask.

[0053] Freezing is a state-marking operation. When the system decides, based on a confidence score below the low threshold, to trigger the topology reconstruction mechanism, it first marks the node of the currently failed target subtask in the task dependency graph as frozen or paused. This prevents the system from continuing invalid attempts along the erroneous path before reconstruction is complete, and keeps the task state stable and controllable during structural adjustment. For example, if a backend engineer agent fails the subtask of writing the login interface, the system changes the state of that task node from "in execution" to "frozen", and no downstream task that depends on that node (such as writing the frontend integration code) is activated.
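The freezing operation can be sketched as a status change plus a breadth-first walk that collects every downstream subtask that must not be activated. The adjacency structure (`successors`), the status strings and the extra `deploy_service` node are hypothetical.

```python
from collections import deque


def freeze(node, status, successors):
    """Mark `node` as frozen and return every subtask downstream of it."""
    status[node] = "frozen"
    blocked = set()
    queue = deque(successors.get(node, ()))
    while queue:
        current = queue.popleft()
        if current not in blocked:
            blocked.add(current)
            queue.extend(successors.get(current, ()))
    return blocked


successors = {
    "write_login_api": ["write_frontend_integration"],
    "write_frontend_integration": ["deploy_service"],
}
status = {"write_login_api": "in_execution"}
blocked = freeze("write_login_api", status, successors)
```

Collecting the whole downstream closure, rather than only direct successors, keeps indirectly dependent subtasks inactive as well.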

[0054] Step 202: Based on the reasons for the failure of the target sub-task, determine the target expert role from the expert role library.

[0055] The reason for failure can be obtained from the evaluation report of the critic agent that triggered the reconstruction. The critic agent's output includes not only a low score but usually also structured diagnostic information indicating the specific type of failure, such as a timeout caused by excessive algorithmic complexity, use of a deprecated third-party library, or failure to handle a critical concurrency-safety issue.

[0056] The system uses this failure reason to search a predefined expert role library, which is a collection of agent configuration templates with various advanced skills. For example, it might include roles such as an algorithm optimization expert, a security audit expert, a database performance tuning expert, and an application programming interface (API) design expert.

[0057] Determining the target expert role means matching the expert best suited to solve the problem given the cause of failure. This can be a rule-based matching process: for example, if the cause of failure contains keywords related to algorithms or performance, an algorithm optimization expert is matched; if it involves Structured Query Language (SQL) injection or cross-site scripting, a security audit expert is matched. Alternatively, it can be a semantic-similarity-based retrieval process, which vectorizes the textual description of the cause of failure, compares it with the capability description vector of each role in the expert role library, and selects the role with the highest similarity as the target expert role.
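The two matching strategies in [0057] can be combined: keyword rules are tried first, and vector similarity serves as the fallback. The sketch assumes a three-role library with hand-picked keywords and toy capability vectors, and assumes the failure reason has already been embedded into a vector by some model; all names and values are hypothetical.

```python
import math

# Hypothetical expert role library: keywords for rule matching, vectors for similarity retrieval.
EXPERT_LIBRARY = {
    "algorithm_optimization_expert": {
        "keywords": ("algorithm", "complexity", "timeout", "performance"),
        "vector": (1.0, 0.0, 0.0),
    },
    "security_audit_expert": {
        "keywords": ("sql injection", "cross-site scripting", "vulnerability"),
        "vector": (0.0, 1.0, 0.0),
    },
    "database_tuning_expert": {
        "keywords": ("slow query", "missing index", "deadlock"),
        "vector": (0.0, 0.0, 1.0),
    },
}


def _cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def select_expert(failure_reason, reason_vector):
    """Return the first role whose keyword appears in the reason, else the most similar role."""
    text = failure_reason.lower()
    for role, spec in EXPERT_LIBRARY.items():
        if any(keyword in text for keyword in spec["keywords"]):
            return role
    return max(EXPERT_LIBRARY, key=lambda role: _cosine(reason_vector, EXPERT_LIBRARY[role]["vector"]))
```

Rule order matters in this sketch: when a reason matches keywords of several roles, the role listed first in the library wins.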

[0058] Step 203: Reconstruct the task dependency graph.

[0059] In this step, the reconstructed task dependency graph becomes more refined locally, enabling more effective organization of multiple agents to collaboratively process target subtasks.

[0060] Step 204: Restart the target subtask based on the target expert role and the reconstructed task dependency graph.

[0061] Once the partial reconstruction of the task dependency graph is complete, the previous freeze on the target subtask will be lifted. Tasks will then be assigned according to this new, reconstructed partial task dependency graph.

[0062] Based on the target expert role, the newly instantiated expert agent is formally incorporated into the workflow. For example, if an algorithm optimization expert is introduced, the system first assigns the task to it.

[0063] The restructured task dependency graph implies a change in the task execution flow. The system will strictly schedule agents according to the new subgraph structure. For example, if the new structure is sequential, the system will wait for the algorithm optimization expert to complete its work and use its output as input for the next backend engineer agent.

[0064] In this embodiment, by freezing the problem, diagnosing the cause, introducing experts, adjusting the process, and re-executing, a series of coherent operations are performed to dynamically and intelligently enhance the system's ability to solve specific problems at runtime. When the system faces a complex challenge that exceeds the capabilities of a single agent in the initial plan, it no longer helplessly declares failure, but can proactively and on-demand assemble an expert team to collaboratively tackle the problem. This greatly improves the problem-solving rate and the final quality of the solution for the entire multi-agent system in open and uncertain environments.

[0065] In some embodiments, reconstructing the task dependency graph includes: Modify the execution mode of the part corresponding to the target subtask in the task dependency graph to a multi-node parallel mode or a collaborative mode.

[0066] In this embodiment, the multi-node parallel mode is suitable for scenarios where a complex, large target subtask can be decomposed into several smaller subtasks that are independent or weakly dependent on each other. In this mode, the system will split the target subtask, which was originally handled by a single executing agent, into multiple subtasks that can be performed simultaneously, and assign them to multiple agents, including the original executing agent and newly introduced expert agents, to execute them separately.

[0067] By using this divide-and-conquer parallel model, what was originally a single, highly difficult task is effectively broken down, with each agent focusing on its area of expertise, thereby greatly improving the success rate of the task and the quality of the final output.

[0068] In the collaborative mode, also known as the serial collaborative mode, a new expert agent node is inserted upstream or downstream of the original executing agent to form an enhanced pipeline operation process.

[0069] This collaborative model, where experts guide and engineers implement the work sequentially, ensures that the core design and ideas come from the most specialized roles, while the implementation details are handled by the execution roles, thus effectively resolving failures caused by insufficient core capabilities.
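The two reconstruction modes can be sketched as local rewrites of a dependency map in which each subtask points to its prerequisites. Under these assumptions, `to_parallel` replaces the target with sibling subtasks that share its prerequisites and must all finish before its successors, and `insert_expert_upstream` places an expert node in front of the target; the function and subtask names are hypothetical, and assigning agents to the new nodes is omitted.

```python
def to_parallel(deps, target, parts):
    """Multi-node parallel mode: split `target` into `parts` that run side by side."""
    new = {task: set(prereqs) for task, prereqs in deps.items() if task != target}
    for part in parts:
        new[part] = set(deps[target])   # each part inherits the target's prerequisites
    for prereqs in new.values():
        if target in prereqs:           # successors now wait for every part
            prereqs.discard(target)
            prereqs.update(parts)
    return new


def insert_expert_upstream(deps, target, expert):
    """Serial collaborative mode: the expert works first and the target consumes its output."""
    new = {task: set(prereqs) for task, prereqs in deps.items()}
    new[expert] = set(deps[target])
    new[target] = set(deps[target]) | {expert}
    return new


deps = {
    "design_db_schema": set(),
    "write_login_api": {"design_db_schema"},
    "write_frontend_integration": {"write_login_api"},
}
parallel = to_parallel(deps, "write_login_api", ["login_api_core", "login_api_validation"])
serial = insert_expert_upstream(deps, "write_login_api", "algorithm_optimization_expert")
```

Both functions copy the prerequisite sets, so the original graph stays untouched and can be restored if the reconstructed subgraph also fails.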

[0070] In this embodiment, by flexibly modifying the single-node execution mode to a multi-node parallel or collaborative mode, the method of the present invention can dynamically organize the optimal temporary team and the most efficient collaborative process according to specific failure scenarios. This adaptive and refined reconfiguration capability is the key to the present invention's ability to solve highly complex and uncertain tasks. It transforms a static, predefined flowchart into a living system that can self-evolve and optimize based on real-time feedback.

[0071] In some embodiments, as shown in Figure 3, the prompt word optimization mechanism includes the following steps. Step 301: Invoke the prompt word optimization engine to determine introspection prompt words based on the modification suggestions output by the critic agent.

[0072] The prompt word optimization engine is a functional module or component in the method of this invention. It is activated when the system's decision routing unit determines, based on the confidence score, that the prompt word optimization strategy should be executed. It can be understood as a processor dedicated to translating and encapsulating feedback. The engine takes the critic agent's detailed evaluation report as input and transforms this technical report, which may contain complex data structures, into a natural language instruction that a large language model can readily understand and execute, namely, the introspection prompt words.

[0073] The introspection prompt words include the following parts. Contextual reiteration: The beginning of the prompt restates the original target subtask and the role of the currently executing agent, which refocuses the agent on its core objective.

[0074] Previous work display: The prompt includes, in full, the flawed intermediate result submitted by the agent in the previous round. Seeing its previous work is the basis on which the agent can reflect on and correct it.

[0075] Precise feedback injection: This is the core component. The prompt word optimization engine extracts key information from the critic agent's report, such as the problem description and the recommended modification, and expresses it in clear, direct, and unambiguous natural language. For example, it transforms a JSON-formatted error report into the natural language instruction: "Your function fails when processing an empty list input. Please add a boundary condition check to return a default value when the list is empty."

[0076] Clearly defined action instructions: At the end of the prompt, a clear call to action will be given, such as "Based on the feedback above, please correct your code and submit a complete new version".

[0077] Introspection prompt words determined in this way are no longer a vague "please try again" instruction, but a complete, structured mini-task package that includes the goal, the current state, the problems, the proposed fixes, and the required action.
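The four-part structure described in [0073] to [0076] can be sketched as a simple template function. This is a minimal sketch, not the engine itself: it assumes a hypothetical critic report format of the form `{"issues": [{"problem": ..., "suggestion": ...}]}`, and the function name `build_introspection_prompt` is illustrative. A real engine might also use a language model to rephrase the report rather than a fixed template.

```python
import json


def build_introspection_prompt(subtask: str, role: str, previous_output: str,
                               critic_report_json: str) -> str:
    """Assemble the four parts of an introspection prompt from a critic report.

    Assumes a hypothetical report format:
    {"issues": [{"problem": "...", "suggestion": "..."}]}
    """
    report = json.loads(critic_report_json)
    feedback = "\n".join(
        f"- {issue['problem']} Suggested fix: {issue['suggestion']}"
        for issue in report.get("issues", [])
    )
    sections = [
        # 1. Contextual reiteration: restate the goal and the agent's role.
        f"You are the {role}. Your subtask: {subtask}",
        # 2. Previous work display: include the flawed output in full.
        f"Your previous submission:\n{previous_output}",
        # 3. Precise feedback injection: the critic's findings in plain language.
        f"Reviewer feedback:\n{feedback}",
        # 4. Action instruction: an explicit call to action.
        "Based on the feedback above, please correct your code and submit a complete new version.",
    ]
    return "\n\n".join(sections)
```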

[0078] Step 302: Re-execute the target subtask based on the introspection prompt.

[0079] In this step, the newly generated, information-rich introspection prompt is sent as new input to the same executing agent (i.e., the agent that submitted the intermediate result in the previous round). Since the agent's role, core capabilities, and long-term memory are unchanged, what it receives is essentially its own previous work returned with detailed annotations.

[0080] Upon receiving an introspection prompt, the agent will treat it as the core instruction to follow for the current generation task. Because the instruction clearly identifies the problem and the direction for correction, the agent can modify the previous output very specifically, rather than blindly and randomly regenerating it. For example, upon receiving an introspection prompt about handling an empty list, the programmer agent will add logic like `if not input_list: return None` at the beginning of the function. The revised intermediate result will then be submitted to the critic agent for evaluation.
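For the empty-list example above, the before and after versions of a submission might look like the following. The function names `mean_before` and `mean_after` are hypothetical; the guard clause is the `if not input_list: return None` logic mentioned in the text, here returning `None` as the default value.

```python
from typing import List, Optional


def mean_before(input_list: List[float]) -> float:
    # Previous round: no boundary check, so an empty list raises ZeroDivisionError.
    return sum(input_list) / len(input_list)


def mean_after(input_list: List[float]) -> Optional[float]:
    # Revised round: the boundary check requested by the introspection prompt.
    if not input_list:
        return None
    return sum(input_list) / len(input_list)
```

Only the flagged boundary case changes; the rest of the function is kept, which is the targeted correction the mechanism aims for instead of a full regeneration.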

[0081] This embodiment details how the prompt word optimization mechanism calls the prompt word optimization engine, generates structured introspection prompt words from the critic's suggestions, and drives the original agent to make precise corrections. In essence, the mechanism is a guided self-iteration process that corrects moderate errors at low computational cost. Compared with the more expensive topology reconstruction, it serves as an intermediate, lightweight repair method, which improves overall system efficiency and resource utilization and makes the feedback control system more fine-grained and economical.

[0082] In some embodiments, as shown in Figure 4, performing semantic parsing on the received natural language instructions to obtain a task dependency graph and an initial set of agents includes the following steps. Step 401: Perform semantic parsing on the received natural language instructions to obtain an atomized subtask sequence represented as a directed acyclic graph.

[0083] As in the aforementioned embodiments, after receiving a user's macro-level natural language instruction, the system needs to decompose it. This embodiment provides a specific decomposition method. The decomposition can be performed by a master agent (Meta-Agent), also referred to as the controlling agent, which is specially trained, or given strong planning and decomposition capabilities through prompt engineering.

[0084] To achieve high-quality decomposition, the controlling agent can employ Chain-of-Thought (CoT) technology. That is, when faced with a complex instruction, the controlling agent does not directly output the final list of subtasks, but first internally generates a series of logical reasoning steps, simulating the problem-solving process of human experts. For example, for the instruction to create an online shopping website, the controlling agent's chain of thought might be: First, design the user system, including registration and login; second, design the product display system, including product lists and detail pages; third, design the shopping cart system; fourth, design the order and payment system... The user system is the foundation of all other systems, so it should be completed first. Product display is a prerequisite for the shopping cart...

[0085] Based on this thought process, the controlling agent will eventually decompose the macroscopic task into a series of atomized subtask sequences. Atomization means that each subtask should be small enough, specific enough, and have a single responsibility so that it can be executed by an agent with a specific role.

[0086] This sequence is not a simple linear list but is represented as a directed acyclic graph (DAG). A DAG can express general dependencies between tasks, not just a linear order. In a DAG, each node represents an atomic subtask, and each directed edge represents an execution dependency: the task at the tail of the arrow must be completed before the task at its head begins. For example, a "design database tables" node might have an edge pointing to a "write backend API" node. Because tasks with no path between them do not depend on each other, this graph structure provides the foundation for subsequent parallel execution and complex collaborative workflows.
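One common way to turn such a DAG into an execution schedule is Kahn's algorithm applied layer by layer: every node whose prerequisites are all complete forms the next layer, and nodes in the same layer can run in parallel. The sketch below is one possible approach and is not taken from the source; the node names reuse the shopping-website example and are hypothetical. It also rejects graphs that contain a cycle, which is the property that makes the structure a valid DAG.

```python
from typing import Dict, List, Set


def execution_layers(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group DAG nodes into layers (Kahn's algorithm).

    `deps` maps each node to the set of nodes it depends on. Nodes in the
    same layer have no dependency on each other and could run in parallel.
    """
    remaining = {n: set(d) for n, d in deps.items()}
    layers: List[List[str]] = []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if not d)
        if not ready:
            # Every remaining node still waits on another: a cycle (or an unknown node).
            raise ValueError("cycle detected: the task graph is not a DAG")
        layers.append(ready)
        for n in ready:
            del remaining[n]
        for d in remaining.values():
            d.difference_update(ready)
    return layers
```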

[0087] Step 402: Based on the atomized subtask sequence, determine the task dependency graph and the initial set of agents.

[0088] The DAG-form sequence of atomized subtasks generated in the previous step has essentially formed a rudimentary task dependency graph. This step can confirm and solidify this rudimentary graph, making it the blueprint for the entire collaborative task process.

[0089] After determining the task dependency graph, the system needs to assign an executor to each task node in the graph; this process is called determining the initial set of agents. The system will traverse each subtask node in the task dependency graph. For each node, the system will analyze the description text of the subtask (e.g., writing the backend logic for user authentication) and extract key information such as task nature and required skills.

[0090] Based on this information, the system will search and match from a pre-set expert role database. As mentioned before, this role database stores agent configuration templates for various roles. The matching process can be based on keywords, such as matching a front-end engineer agent if the task description contains "front-end" or "user interface" (UI); or it can be based on semantic vector similarity, where the task description is vectorized and compared with the vectors of the ability descriptions of each role in the role database, selecting the most similar role.

[0091] After the system matches and instantiates an agent for each task node in the graph, these instantiated agents together constitute the initial agent set. At this point, the system has completed a complete initial plan: it not only has a detailed construction plan with dependencies (task dependency graph), but also a person in charge of each process (initial agent set).
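The two matching strategies in [0090] can be combined: try keyword matching first and fall back to similarity. The sketch below is a simplified illustration under stated assumptions: the role database `ROLE_DB` and its contents are hypothetical, and a bag-of-words cosine similarity stands in for the semantic embedding vectors the text describes, so it only captures word overlap, not meaning.

```python
import math
import re
from collections import Counter
from typing import Dict

# Hypothetical role database: role -> trigger keywords and an ability description.
ROLE_DB: Dict[str, dict] = {
    "frontend_engineer": {"keywords": ["front-end", "user interface", "ui"],
                          "ability": "build web pages user interface components and styles"},
    "backend_engineer": {"keywords": ["backend", "api", "server"],
                         "ability": "write server side logic api endpoints and authentication"},
    "database_engineer": {"keywords": ["database", "table", "schema"],
                          "ability": "design database tables schemas and queries"},
}


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z\-]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_role(description: str) -> str:
    text = description.lower()
    tokens = set(re.findall(r"[a-z\-]+", text))
    # 1) Keyword matching: multi-word phrases by substring, single words by whole token.
    for role, spec in ROLE_DB.items():
        for kw in spec["keywords"]:
            if (" " in kw and kw in text) or kw in tokens:
                return role
    # 2) Fallback: the role whose ability description is most similar.
    query = _bag_of_words(description)
    return max(ROLE_DB, key=lambda r: _cosine(query, _bag_of_words(ROLE_DB[r]["ability"])))


def build_initial_agent_set(subtasks: Dict[str, str]) -> Dict[str, str]:
    """Map each subtask node to the role that will be instantiated to execute it."""
    return {node: match_role(desc) for node, desc in subtasks.items()}
```

Matching single keywords against whole tokens avoids false hits such as "ui" inside "build".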

[0092] This embodiment details how ambiguous user instructions are decomposed, using the chain-of-thought technique, into a structured sequence of atomic subtasks represented by a directed acyclic graph (DAG), and how the task dependency graph and the initial set of agents are then determined from it. This initial planning step is the starting point of the entire adaptive collaborative framework. A high-quality, precise initial plan lays the foundation for the execution, evaluation, and feedback of all subsequent stages, so that the multi-agent system starts out in a clear, reasonable, and executable direction.

[0093] In some embodiments, as shown in Figure 5, before executing the target subtask according to the task dependency graph to generate an intermediate result, the method further includes the following steps. Step 501: Vectorize the target subtask to be executed to obtain the current task vector.

[0094] Specifically, when a target subtask is activated in the task dependency graph and is ready to be assigned to the corresponding agent, the system does not immediately generate the final prompt. Instead, it first processes the descriptive text of the target subtask. Vectorization is a technique that converts textual information into high-dimensional numerical vectors that machines can understand; it is also known as text embedding.

[0095] For example, if the target subtask is to write a function that validates the format of a user-entered email address, the system inputs the subtask description into an embedding model and obtains a floating-point vector of, for example, 768 or 1536 dimensions. This vector represents the core intent of the task in the semantic space and is referred to as the current task vector.

[0096] Step 502: Calculate the semantic relevance between the current task vector and the vectors of each historical memory segment stored in the shared memory pool.

[0097] As in the aforementioned embodiment, successfully solidified intermediate results are stored in a shared memory pool. This memory pool may contain a large amount of historical information generated by upstream tasks, such as requirement documents, architecture designs, and completed code modules. In order to perform relevance calculations, this historical information stored in the shared memory pool (i.e., historical memory fragments) also needs to be vectorized in the same way during storage to obtain their respective memory fragment vectors.

[0098] In this step, the system takes the current task vector and calculates its semantic relevance to each memory segment vector stored in the shared memory pool. Semantic relevance measures how close two vectors are in the semantic space; it can be computed, for example, as the cosine similarity of the two vectors.

[0099] Cosine similarity ranges from -1 to 1. The closer the value is to 1, the closer the two vectors are in direction, meaning the texts they represent are semantically more related. For example, the cosine similarity between a current task vector related to user authentication and a historical memory fragment vector related to user database table design will be significantly higher than its similarity with a historical memory fragment vector related to a product recommendation algorithm. Of course, besides cosine similarity, other methods for measuring vector similarity, such as dot product or the reciprocal of Euclidean distance, can also be used.
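Written out, the cosine similarity used as the relevance measure is as follows. The symbols are introduced here for illustration only: $\mathbf{q}$ is the current task vector, $\mathbf{m}_i$ is the vector of the $i$-th historical memory fragment, and $d$ is the embedding dimension (for example 768).

$$
\operatorname{sim}(\mathbf{q}, \mathbf{m}_i) = \frac{\mathbf{q} \cdot \mathbf{m}_i}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{m}_i \rVert} = \frac{\sum_{k=1}^{d} q_k \, m_{i,k}}{\sqrt{\sum_{k=1}^{d} q_k^{2}} \, \sqrt{\sum_{k=1}^{d} m_{i,k}^{2}}} \in [-1, 1]
$$

When the embedding model returns unit-length vectors, the denominator equals 1 and the cosine similarity reduces to the dot product $\mathbf{q} \cdot \mathbf{m}_i$, which is why the dot product is a common substitute.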

[0100] Step 503: Determine, from the shared memory pool, the historical memory fragments whose semantic relevance is higher than a preset relevance threshold, and combine them to obtain context information.

[0101] After calculating all relevance scores, the system uses a preset relevance threshold (e.g., threshold = 0.8) for filtering. The system iterates through all historical memory fragments, retaining only those whose relevance scores are higher than the threshold.

[0102] This process acts like an information filter or memory pruner. It intelligently extracts the key pieces of information most relevant to the task at hand from a vast ocean of historical data. All historical memory fragments with a relevance below a threshold are temporarily ignored, effectively filtering out noise.

[0103] For example, when an agent is preparing to write a login API, this mechanism might retain highly relevant historical memory fragments such as the user table structure design document and the password encryption standard, while filtering out irrelevant information such as monthly product reports and homepage UI design drafts. The retained fragments are then combined to form the final context information.
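Steps 502 to 504 can be sketched end to end with precomputed vectors. This is a minimal sketch assuming the embeddings already exist (the toy three-dimensional vectors below stand in for real 768-dimensional ones), using the 0.8 threshold from [0101]. The names `retrieve_context` and `build_task_prompt`, and the choice to order retained fragments by descending relevance, are assumptions; the source only requires keeping fragments above the threshold.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_context(task_vec: Sequence[float],
                     memory: List[Tuple[str, Sequence[float]]],
                     threshold: float = 0.8) -> str:
    """Keep only memory fragments whose relevance is strictly above `threshold`."""
    scored = [(text, cosine(task_vec, vec)) for text, vec in memory]
    kept = [(text, s) for text, s in scored if s > threshold]
    kept.sort(key=lambda ts: ts[1], reverse=True)  # most relevant first (an assumption)
    return "\n\n".join(text for text, _ in kept)


def build_task_prompt(task_description: str, context: str) -> str:
    # Step 504: package the pruned context together with the subtask description.
    return f"Relevant context:\n{context}\n\nYour task:\n{task_description}"
```

For a login-API task vector, fragments pointing in a similar direction (user table design, password standard) pass the threshold, while orthogonal ones (product reports, homepage drafts) are dropped.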

[0104] Step 504: Provide context information to the executing agent.

[0105] In this step, the pruned and highly relevant contextual information obtained in the previous step, along with the description of the target subtask itself, is encapsulated into a final prompt and sent to the executing agent.

[0106] In this way, what is provided to the executing agent is no longer a long, unfiltered history, but a clean, focused, and highly information-dense context.

[0107] In this embodiment, the relevance-based memory pruning mechanism improves the quality of the input the agent receives when performing tasks. By vectorizing tasks and historical memories and calculating their semantic relevance, it filters the contextual information. This helps mitigate the lost-in-the-middle phenomenon that large language models are prone to when processing very long contexts, so that the agent can focus on key instructions and information, improving the accuracy and quality of its output. In addition, by reducing the number of tokens provided to the model, it directly reduces API call costs and computational latency, making the system more efficient and scalable for long-chain, complex collaborative tasks.

[0108] In some embodiments, as shown in Figure 6, the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment further includes: Step 601: When the intermediate results corresponding to all subtasks in the task dependency graph have been solidified, or when the iteration count of any subtask has reached the preset maximum iteration count, control the summarizing agent to extract all solidified intermediate results from the shared memory pool.

[0109] This embodiment provides two termination conditions joined by a logical OR: meeting either one triggers the subsequent aggregation step.

[0110] The first termination condition is: the intermediate results corresponding to all subtasks in the task dependency graph have been solidified.

[0111] This represents the ideal scenario for successful task completion. As in the aforementioned embodiment, solidification refers to the intermediate result of a subtask achieving a confidence score higher than a high threshold, being deemed successful, and being stored in the shared memory pool. When the system detects that all nodes in the task dependency graph (i.e., all atomic subtasks) have entered the solidified state, it indicates that all steps of the initial planning have been completed with high quality. The system then determines that the entire macro-task has been successfully completed.

[0112] The second termination condition is: the number of iterations of any subtask reaches the preset maximum number of iterations.

[0113] Specifically, a maximum number of iterations (e.g., 10) is set for each subtask node. This count includes both the self-corrections triggered by the prompt word optimization mechanism and the re-executions after reconstruction triggered by the topology reconstruction mechanism. If a subtask, after repeated self-corrections and expert reconstructions, still fails to meet the passing standard (i.e., its confidence score still does not exceed the high threshold) and its total number of attempts reaches this upper limit, the system determines that the subtask has failed. At this point, even if other subtasks have been completed, the system forcibly terminates the entire process to avoid further ineffective resource consumption.
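The two termination conditions can be expressed as a single check. This is a sketch with hypothetical names (`check_termination`, the `"success"`/`"failure"` labels). One interpretation is made explicit: only subtasks that are still unsolidified count toward the failure condition, since [0113] describes a subtask that "still fails to meet the passing standard" when it hits the cap.

```python
from typing import Dict, Optional, Set


def check_termination(solidified: Set[str], iterations: Dict[str, int],
                      all_nodes: Set[str], max_iterations: int = 10) -> Optional[str]:
    """Return "success", "failure", or None (keep running)."""
    # First condition: every node in the task dependency graph is solidified.
    if solidified >= all_nodes:
        return "success"
    # Second condition: an unsolidified subtask has used up its iteration budget.
    for node in all_nodes - solidified:
        if iterations.get(node, 0) >= max_iterations:
            return "failure"
    return None
```

Either non-None result hands control to the summarizing agent; None means the feedback loop continues.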

[0114] Step 602: Integrate all the solidified intermediate results to generate and output the final task result.

[0115] After extracting all valid intermediate results, the core task of the summarizing agent is to integrate these fragmented pieces of information. Integration is a broad concept that can cover various specific operations; its purpose is to combine the scattered outputs produced by different agents at different stages into a unified, coherent, and user-friendly deliverable.

[0116] Specific integration methods may include: Content aggregation: If the outputs of each subtask are different chapters of a document, the summarizing agent can piece them together into a complete report in logical order.

[0117] Code merging: If the output consists of multiple independent source code files or modules, the summarizing agent can organize them into a complete project structure and generate a README.md file that explains how to build and run it.

[0118] Summary and Refinement: The summarizing agent can read all the intermediate results and write a highly summarized summary or execution summary based on them.

[0119] Formatting and polishing: Make consistent adjustments to the format of all content and correct the language style so that it looks like a whole, rather than a simple patchwork of multiple parts.

[0120] After integration, the summarizing agent generates and outputs the final task result. This result is the culmination of the entire multi-agent collaboration and can be delivered directly to the user.
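Of the integration methods listed, content aggregation is the most mechanical and can be sketched directly: order the solidified results by the task dependency graph and join them as sections. This sketch uses the standard library `graphlib.TopologicalSorter` (Python 3.9 or later); the function name `aggregate` and the Markdown-style section headings are assumptions. Summarization and polishing would normally be delegated to the summarizing agent's language model and are not shown.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def aggregate(deps: Dict[str, Set[str]], results: Dict[str, str], title: str) -> str:
    """Content aggregation: join solidified results in dependency order."""
    # static_order() yields every node after all of its prerequisites.
    order = [n for n in TopologicalSorter(deps).static_order() if n in results]
    parts = [f"# {title}"]
    for node in order:
        parts.append(f"## {node}\n{results[node].strip()}")
    return "\n\n".join(parts)
```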

[0121] In this embodiment, a clear endpoint and delivery mechanism are provided for the entire adaptive collaborative framework. It ensures the finiteness of the process by setting explicit success and failure termination conditions. Furthermore, by introducing a dedicated summarizing agent, it intelligently aggregates the high-quality intermediate results generated during the process and validated at each level into a complete and unified final deliverable, thus completing a full closed loop from receiving ambiguous user instructions to delivering final value.

[0122] In some embodiments, as shown in Figure 7, the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment includes: Step 701: Receive user instructions.

[0123] Step 702: Task decomposition and topology construction.

[0124] Step 703: Dynamic role matching.

[0125] Step 704: The agent executes a subtask.

[0126] Step 705: Critic evaluation.

[0127] Step 706: Determine the confidence score. If the confidence score is lower than the low threshold, proceed to step 707. If the confidence score is higher than the high threshold, proceed to step 708. If the confidence score is higher than the low threshold and lower than the high threshold, proceed to step 709.

Step 707: Topology reconstruction mechanism.

[0128] Step 708: Solidify the intermediate result and advance the task milestone.

[0129] Step 709: Prompt word optimization mechanism.

[0130] Step 710: Determine whether all tasks are completed. If the result is yes, proceed to step 711. If the result is no, proceed to step 704.

[0131] Step 711: Output the aggregated results.

[0132] Specifically, as shown in Figure 8, the critic agent (i.e., the critic evaluation module) receives the intermediate result from the executing agent, evaluates it, and generates a quantified confidence score and modification suggestions.

[0133] When the confidence score is below the low threshold, the topology reconstruction mechanism is triggered: roles are selected from the expert role library and instantiated, and the task dependency graph is reconstructed. When the confidence score is above the high threshold, result solidification is triggered: the intermediate result is solidified and subsequent nodes are activated. When the confidence score is below the high threshold but above the low threshold, the prompt word optimization mechanism is triggered, and the modification suggestions are used to generate introspection prompt words.
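The flow of Figures 7 and 8 can be condensed into a small control loop. This is a simplified sketch, not the patented implementation: the threshold values 0.4 and 0.8, the names `route` and `run`, and the callback signatures are all hypothetical, and both repair branches are stubbed as feedback strings (a real topology reconstruction would rewrite the graph and instantiate an expert role). The source does not say what happens when a score equals a threshold; here ties go to prompt optimization.

```python
from typing import Callable, Dict, Optional, Set, Tuple

LOW_THRESHOLD = 0.4   # hypothetical values; the source does not fix them
HIGH_THRESHOLD = 0.8


def route(score: float, low: float = LOW_THRESHOLD, high: float = HIGH_THRESHOLD) -> str:
    """Three-level decision of step 706 (ties at a threshold: an assumption)."""
    if score < low:
        return "topology_reconstruction"
    if score > high:
        return "solidify"
    return "prompt_optimization"


def run(deps: Dict[str, Set[str]],
        execute: Callable[[str, Optional[str]], str],
        critique: Callable[[str, str], Tuple[float, str]],
        max_iterations: int = 10) -> Tuple[Dict[str, str], str]:
    memory: Dict[str, str] = {}                 # shared memory pool of solidified results
    iterations = {node: 0 for node in deps}
    feedback: Dict[str, Optional[str]] = {}
    while len(memory) < len(deps):              # step 710: are all tasks completed?
        ready = [n for n in deps if n not in memory and deps[n].issubset(memory)]
        if not ready:
            raise ValueError("no runnable subtask: cycle or missing node")
        for node in ready:
            if iterations[node] >= max_iterations:
                return memory, f"failed: {node}"
            iterations[node] += 1
            output = execute(node, feedback.get(node))     # step 704
            score, suggestion = critique(node, output)     # step 705
            action = route(score)                          # step 706
            if action == "solidify":                       # step 708
                memory[node] = output
            elif action == "prompt_optimization":          # step 709 (stubbed)
                feedback[node] = f"Revise your output. {suggestion}"
            else:                                          # step 707 (stubbed)
                feedback[node] = f"Escalated to expert role. {suggestion}"
    return memory, "completed"                             # step 711 follows
```

In both outcomes the returned memory pool is what the summarizing agent would aggregate.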

[0134] In this embodiment, to address the limitations of the pass/retry binary mechanism in traditional multi-agent systems, a three-level control strategy based on confidence intervals is proposed. The strategy distinguishes two types of errors, execution failures and capability deficiencies, and matches them with two levels of repair, self-reflection (low cost) and external intervention (high cost), thereby balancing system robustness against execution efficiency.

[0135] The topology reconstruction mechanism removes the limitation of the predefined computation graphs used in traditional frameworks. When the system detects at runtime that a task has broken down (a task circuit break), the topology reconstruction engine automatically triggers dynamic instantiation of expert roles and a local redrawing of the DAG. This on-demand evolution improves the system's resolution rate on unfamiliar and complex tasks.

[0136] To address the excessively long contexts commonly encountered in long-chain collaboration among multiple agents, a memory retrieval mechanism based on semantic relevance is proposed. By activating only the memory segments whose similarity to the current subtask vector exceeds a threshold, the mechanism mitigates the lost-in-the-middle phenomenon and improves the agent's ability to follow key instructions.

[0137] The adaptive multi-agent cooperative task planning device based on confidence feedback adjustment provided by the present invention will be described below. The adaptive multi-agent cooperative task planning device based on confidence feedback adjustment described below can be referred to in correspondence with the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment described above.

[0138] In some embodiments, an adaptive multi-agent cooperative task planning device based on confidence feedback adjustment is proposed. As shown in Figure 9, it includes: a parsing module 901, configured to perform semantic parsing on received natural language instructions to obtain a task dependency graph and an initial set of agents; a generation module 902, configured to have at least one executing agent from the initial set of agents execute a target subtask according to the task dependency graph to generate an intermediate result; and an evaluation module 903, configured to have a critic agent evaluate the intermediate result to generate a quantified confidence score. Based on the comparison of the confidence score with at least one preset low threshold and high threshold, one of the following feedback control steps is performed: if the confidence score is below the low threshold, a topology reconstruction mechanism is triggered to re-execute the target subtask; if the confidence score is above the low threshold and below the high threshold, the prompt word optimization mechanism is triggered to drive the executing agent to correct the intermediate result; and if the confidence score is above the high threshold, the intermediate result is solidified.

[0139] Figure 10 is a schematic diagram of an example physical structure of an electronic device. As shown in Figure 10, the electronic device may include: a processor 1010, a communication interface 1020, a memory 1030, and a communication bus 1040, wherein the processor 1010, the communication interface 1020, and the memory 1030 communicate with each other through the communication bus 1040. The processor 1010 can call logical instructions in the memory 1030 to execute an adaptive multi-agent cooperative task planning method based on confidence feedback adjustment. This method includes: semantically parsing received natural language instructions to obtain a task dependency graph and an initial set of agents; having at least one agent in the initial set of agents execute a target sub-task according to the task dependency graph to generate an intermediate result; having a critic agent evaluate the intermediate result to generate a quantified confidence score; and, based on a comparison of the confidence score with at least one preset low threshold and high threshold, executing one of the following feedback control steps: triggering a topology reconstruction mechanism to re-execute the target sub-task if the confidence score is below the low threshold; triggering a prompt word optimization mechanism to drive the agent to correct the intermediate result if the confidence score is above the low threshold and below the high threshold; and solidifying the intermediate result if the confidence score is above the high threshold.

[0140] Furthermore, the logical instructions in the aforementioned memory 1030 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0141] On the other hand, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the above methods. The method includes: performing semantic parsing on received natural language instructions to obtain a task dependency graph and an initial set of agents; having at least one executing agent in the initial set of agents execute a target sub-task according to the task dependency graph to generate an intermediate result; having a critic agent evaluate the intermediate result to generate a quantified confidence score; and, based on a comparison of the confidence score with at least one preset low threshold and high threshold, executing one of the following feedback control steps: triggering a topology reconstruction mechanism to re-execute the target sub-task based on a confidence score lower than the low threshold; triggering a prompt word optimization mechanism to drive the executing agent to correct the intermediate result based on a confidence score higher than the low threshold and lower than the high threshold; and solidifying the intermediate result based on a confidence score higher than the high threshold.

[0142] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment provided by the methods described above. This method includes: semantically parsing received natural language instructions to obtain a task dependency graph and an initial set of agents; having at least one agent in the initial set of agents execute a target sub-task according to the task dependency graph to generate an intermediate result; having a critic agent evaluate the intermediate result to generate a quantified confidence score; and, based on a comparison of the confidence score with at least one preset low threshold and high threshold, executing one of the following feedback control steps: triggering a topology reconstruction mechanism to re-execute the target sub-task based on a confidence score lower than the low threshold; triggering a prompt word optimization mechanism to drive the agent to correct the intermediate result based on a confidence score higher than the low threshold and lower than the high threshold; and solidifying the intermediate result based on a confidence score higher than the high threshold.

[0143] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0144] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods of various embodiments or some parts of embodiments.

[0145] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An adaptive multi-agent cooperative task planning method based on confidence feedback adjustment, characterized in that it includes: semantically parsing received natural language instructions to obtain a task dependency graph and an initial set of agents; executing, by at least one executing agent from the initial set of agents, a target subtask according to the task dependency graph to generate an intermediate result; evaluating the intermediate result by a critic agent to generate a quantified confidence score; and, based on the comparison result of the confidence score with at least one preset low threshold and high threshold, performing one of the following feedback control steps: triggering a topology reconstruction mechanism to re-execute the target subtask if the confidence score is lower than the low threshold; triggering a prompt word optimization mechanism to drive the executing agent to correct the intermediate result if the confidence score is higher than the low threshold and lower than the high threshold; and solidifying the intermediate result if the confidence score is higher than the high threshold.

2. The adaptive multi-agent cooperative task planning method based on confidence feedback adjustment according to claim 1, characterized in that the topology reconstruction mechanism comprises the following steps: freezing the target subtask; determining a target expert role from an expert role library based on the cause of failure of the target subtask; reconstructing the task dependency graph; and restarting the target subtask based on the target expert role and the reconstructed task dependency graph.
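The four steps of claim 2 can be illustrated with a short sketch. Everything concrete here is an assumption: the `Subtask` record, the contents of `EXPERT_ROLE_LIBRARY`, the failure-cause keys, and the choice of reconstruction (inserting an expert review node between the subtask and its dependents) are hypothetical, since the claim does not fix how the graph is rebuilt.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    """Hypothetical subtask record; field names are illustrative only."""
    name: str
    agents: list[str]
    status: str = "pending"  # pending | frozen | solidified


# Hypothetical expert role library: failure cause -> expert role name.
EXPERT_ROLE_LIBRARY = {
    "missing_domain_knowledge": "domain_expert",
    "code_error": "senior_programmer",
    "logic_error": "logic_reviewer",
}


def reconstruct_topology(task: Subtask, graph: dict[str, set[str]],
                         failure_cause: str,
                         default_role: str = "generalist_expert") -> dict[str, set[str]]:
    """Sketch of claim 2; `graph` maps each node to the nodes it depends on."""
    # Step 1: freeze the failing subtask so it is not blindly retried.
    task.status = "frozen"
    # Step 2: choose the target expert role according to the failure cause.
    expert = EXPERT_ROLE_LIBRARY.get(failure_cause, default_role)
    # Step 3: reconstruct the graph by inserting an expert review node after
    # the subtask and making every dependent wait for that review instead.
    review = f"{task.name}:review"
    for deps in graph.values():
        if task.name in deps:
            deps.discard(task.name)
            deps.add(review)
    graph[review] = {task.name}
    # Step 4: restart the subtask with the expert added to its agents.
    task.agents = task.agents + [expert]
    task.status = "pending"
    return graph
```

In this sketch the freeze is only held while the graph is edited; a real scheduler would keep the subtask frozen until the reconstructed graph is committed.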

3. The adaptive multi-agent cooperative task planning method based on confidence feedback adjustment according to claim 2, characterized in that reconstructing the task dependency graph comprises: changing the execution mode of the portion of the task dependency graph corresponding to the target subtask to a multi-node parallel mode or a collaborative mode.
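One way to read claim 3 is sketched below, assuming the graph is stored as a mapping from each node to its predecessors. The function `set_execution_mode`, the `modes` dictionary, the replica naming (`node#i`), and the extra merge node are all hypothetical choices; the claim names the two modes but not how they are encoded.

```python
def set_execution_mode(graph: dict[str, set[str]], modes: dict[str, str],
                       node: str, mode: str, width: int = 2) -> None:
    """Change how `node` executes, modifying `graph` and `modes` in place."""
    if node not in graph:
        raise KeyError(node)
    if mode == "collaborative":
        # Assumption: several agents work jointly on the same single node,
        # so only its recorded mode changes, not the graph shape.
        modes[node] = "collaborative"
    elif mode == "parallel":
        if width < 2:
            raise ValueError("parallel mode needs at least two nodes")
        # Replace the node with `width` replicas that share its predecessors.
        preds = graph.pop(node)
        modes.pop(node, None)
        replicas = [f"{node}#{i}" for i in range(width)]
        for replica in replicas:
            graph[replica] = set(preds)
            modes[replica] = "parallel"
        # Downstream nodes now wait for a merge node that joins the replicas.
        merge = f"{node}:merge"
        for deps in graph.values():
            if node in deps:
                deps.discard(node)
                deps.add(merge)
        graph[merge] = set(replicas)
        modes[merge] = "merge"
    else:
        raise ValueError(f"unknown mode: {mode}")
```

The merge node keeps the graph acyclic and gives downstream subtasks a single result to consume, whichever replica outputs it chooses to combine.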

4. The adaptive multi-agent cooperative task planning method based on confidence feedback adjustment according to claim 1, characterized in that the prompt word optimization mechanism comprises the following steps: invoking a prompt word optimization engine to determine an introspection prompt based on the revision suggestions output by the critic agent; and re-executing the target subtask based on the introspection prompt.
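A minimal sketch of the prompt word optimization step in claim 4 follows. The prompt template and the functions `build_introspection_prompt` and `rerun_with_introspection` are hypothetical; the agent is modelled as any callable that maps a prompt string to an output string, standing in for an actual model call.

```python
from typing import Callable


def build_introspection_prompt(task_prompt: str, previous_output: str,
                               suggestions: list[str], score: float) -> str:
    """Fold the critic's revision suggestions into a self-reflection prompt."""
    if not suggestions:
        raise ValueError("the critic returned no revision suggestions")
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(suggestions, start=1))
    return (
        f"{task_prompt}\n\n"
        f"Your previous answer scored {score:.2f} in review:\n"
        f"{previous_output}\n\n"
        "First reflect on why the answer fell short, then give a corrected "
        "answer that addresses each suggestion:\n"
        f"{numbered}"
    )


def rerun_with_introspection(agent: Callable[[str], str], task_prompt: str,
                             previous_output: str, suggestions: list[str],
                             score: float) -> str:
    """Re-execute the subtask by sending the introspection prompt to the agent."""
    prompt = build_introspection_prompt(task_prompt, previous_output,
                                        suggestions, score)
    return agent(prompt)
```

Because the same execution agent is reused, this path costs one extra call rather than the graph changes of claim 2, which fits its use for mid-range confidence scores.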

5. The adaptive multi-agent cooperative task planning method based on confidence feedback adjustment according to claim 1, characterized in that semantically parsing the received natural language instructions to obtain a task dependency graph and an initial set of agents comprises: semantically parsing the received natural language instructions to obtain a sequence of atomized subtasks represented as a directed acyclic graph; and determining the task dependency graph and the initial set of agents based on the atomized subtask sequence.
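Claim 5 requires the parsed subtasks to form a directed acyclic graph. The sketch below assumes the semantic parser has already produced a hypothetical mapping of subtask name to its dependencies and required agent role; it then validates the DAG with Python's standard `graphlib.TopologicalSorter` and derives an execution order and an initial agent set. The input format and the function `build_plan` are illustrative, not the applicant's.

```python
from graphlib import CycleError, TopologicalSorter


def build_plan(subtasks: dict[str, dict]) -> tuple[list[str], set[str]]:
    """Turn parsed subtasks into an execution order and an initial agent set.

    `subtasks` maps name -> {"deps": [...], "role": "..."} (hypothetical format).
    """
    graph = {name: set(spec["deps"]) for name, spec in subtasks.items()}
    # Every dependency must itself be a parsed subtask.
    unknown = {d for deps in graph.values() for d in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    try:
        # static_order() yields each node after all of its predecessors.
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("parsed subtasks do not form a DAG") from exc
    # One agent per distinct role is the assumed initial agent set.
    agents = {spec["role"] for spec in subtasks.values()}
    return order, agents
```

Rejecting cycles at parse time matters because the later feedback steps re-execute individual subtasks and assume each has a well-defined set of upstream results.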

6. The adaptive multi-agent cooperative task planning method based on confidence feedback adjustment according to claim 1, characterized in that, before the target subtask is executed according to the task dependency graph to generate the intermediate result, the method further comprises: vectorizing the target subtask to be executed to obtain a current task vector; calculating the semantic relevance between the current task vector and the vector of each historical memory fragment stored in a shared memory pool; determining, from the shared memory pool, the historical memory fragments whose semantic relevance to the current task vector is higher than a preset relevance threshold, and combining them to obtain context information; and providing the context information to the execution agent.
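The retrieval step of claim 6 is sketched below using cosine similarity as the measure of semantic relevance; the claim does not name the measure, so that choice, the default threshold of 0.75, the separator, and the function names are assumptions. The vectors are taken as given, standing in for the output of an unspecified embedding model.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def retrieve_context(task_vec: Sequence[float],
                     memory_pool: list[tuple[str, Sequence[float]]],
                     threshold: float = 0.75, sep: str = "\n---\n") -> str:
    """Combine memory fragments whose relevance exceeds the threshold."""
    scored = [(cosine(task_vec, vec), text) for text, vec in memory_pool]
    # Keep fragments strictly above the threshold, most relevant first.
    hits = sorted(((s, t) for s, t in scored if s > threshold),
                  key=lambda pair: pair[0], reverse=True)
    return sep.join(text for _, text in hits)
```

Filtering by a threshold rather than taking a fixed top-k means an unrelated subtask may receive no context at all, which keeps irrelevant history out of the execution agent's prompt.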

7. The adaptive multi-agent cooperative task planning method based on confidence feedback adjustment according to claim 1, characterized in that the method further comprises: when the intermediate results corresponding to all subtasks in the task dependency graph have been solidified, or when the iteration count of any subtask has reached a preset maximum iteration count, controlling a summarizing agent to extract all solidified intermediate results from the shared memory pool; and integrating all the solidified intermediate results to generate and output a final task result.
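The stopping rule and summarization of claim 7 can be sketched as two small functions. The data layout of the shared memory pool, the status strings, and the functions `should_summarize` and `summarize` are hypothetical, and the integration step is reduced to ordered concatenation, whereas the claim assigns it to a summarizing agent whose method is not specified.

```python
def should_summarize(statuses: dict[str, str], iterations: dict[str, int],
                     max_iterations: int) -> bool:
    """True when every subtask is solidified or any subtask hit the budget."""
    # Guard against an empty plan, for which all() would return True.
    all_solidified = bool(statuses) and all(
        s == "solidified" for s in statuses.values())
    budget_hit = any(n >= max_iterations for n in iterations.values())
    return all_solidified or budget_hit


def summarize(memory_pool: dict[str, dict], order: list[str]) -> str:
    """Collect solidified results in plan order (stand-in for the summarizing agent).

    `memory_pool` maps subtask -> {"result": str, "solidified": bool}.
    """
    parts = [f"[{name}] {memory_pool[name]['result']}"
             for name in order
             if name in memory_pool and memory_pool[name]["solidified"]]
    return "\n".join(parts)
```

Note that the iteration budget can end the run before every subtask is solidified; in that case only the solidified results are integrated, so unsolidified output never reaches the final answer.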

8. An adaptive multi-agent cooperative task planning device based on confidence feedback adjustment, characterized by comprising: a parsing module, configured to semantically parse received natural language instructions to obtain a task dependency graph and an initial set of agents; a generation module, configured to cause at least one execution agent in the initial set of agents to execute a target subtask according to the task dependency graph to generate an intermediate result; and an evaluation module, configured to evaluate the intermediate result by a critic agent to generate a quantified confidence score and, based on a comparison of the confidence score with at least one preset low threshold and high threshold, to perform one of the following feedback control steps: when the confidence score is lower than the low threshold, triggering a topology reconstruction mechanism to re-execute the target subtask; when the confidence score is higher than the low threshold and lower than the high threshold, triggering a prompt word optimization mechanism to drive the execution agent to correct the intermediate result; and when the confidence score is higher than the high threshold, solidifying the intermediate result.

9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment according to any one of claims 1 to 8 is implemented.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the adaptive multi-agent cooperative task planning method based on confidence feedback adjustment according to any one of claims 1 to 8 is implemented.