Multi-agent task planning method and system, and storage medium
By constructing a closed-loop processing flow in multi-agent task planning and combining real-time state perception and feedback optimization, the problems of lagging state perception and insufficient closed-loop optimization in existing technologies are solved, thereby improving the collaborative execution efficiency and reliability of multi-agent systems in dynamic environments.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: BEIJING BOTONG CHUANGXIN TECH CO LTD
- Filing Date: 2026-05-11
- Publication Date: 2026-06-19
AI Technical Summary
Existing multi-agent task planning methods suffer from lagging state perception and insufficient closed-loop optimization in dynamic environments, resulting in low collaborative efficiency, poor adaptability, and insufficient reliability. Furthermore, they do not fully consider the differences in agent capabilities and real-time states, which can easily lead to problems such as unbalanced load and waste of resources.
By acquiring global task instructions, agent state information, and task constraint information, an initial task planning scheme is generated. During execution, real-time state information is collected, and state feedback information is generated. The planning scheme is optimized based on the feedback information, forming a closed-loop processing flow of task planning, state acquisition, and feedback correction, ensuring that the planning scheme matches the agent state.
It improves the adaptability and reliability of multi-agent task planning in dynamic environments, enhances collaborative execution efficiency, reduces the impact of state perception lag, and optimizes resource utilization.
[Figure: CN122242872A_ABST, abstract drawing]
Description
Technical Field
[0001] This invention relates to the field of intelligent agent cooperative control technology, and more specifically to a multi-agent task planning method, system, and storage medium.
Background Technology
[0002] With the rapid development of artificial intelligence technology, multi-agent systems (MAS) have become the core paradigm for solving complex distributed tasks and have been widely used in fields such as industrial manufacturing, intelligent transportation, and emergency rescue. The core objective of multi-agent task planning is to decompose complex global tasks into executable sub-tasks, rationally allocate them to each agent, and plan the optimal execution path to ensure efficient and collaborative task completion.
[0003] Current multi-agent task planning methods fall into two main categories. The first comprises planning methods based on traditional algorithms (such as reinforcement learning, genetic algorithms, and particle swarm optimization). Although these methods can achieve local optimization, they suffer from rigid task decomposition, difficulty in handling fuzzy task instructions, and poor adaptability to dynamic environments. Especially in scenarios with multiple conflicting objectives and complex resource constraints, they tend to produce inflexible planning schemes with low execution efficiency. The second comprises planning methods based on large language models (LLMs). These methods use the natural language understanding, logical reasoning, and task decomposition capabilities of LLMs to decompose complex tasks intelligently. However, existing LLM-based methods generally suffer from two major drawbacks: (1) Disconnect between state perception and planning: existing LLM-driven planning methods rely heavily on preset environmental information and agent states and cannot dynamically perceive real-time environmental changes or agent execution states. When the environment changes abruptly (such as the appearance of obstacles or an agent failure) or execution deviates from the plan, the planning scheme cannot be adjusted in time, which can easily lead to task failure or a significant drop in efficiency. (2) Lack of a closed-loop optimization mechanism: the planning process and the execution process are independent of each other, and the initial planning scheme generated by the LLM does not form a closed loop with execution feedback. The scheme therefore cannot be dynamically optimized based on deviations, resource consumption, and task progress during execution, which limits its practicality and adaptability and makes it difficult to balance objectives such as task completion efficiency, resource consumption, and collaborative accuracy.
[0004] Furthermore, in existing technologies, task allocation and path planning for multi-agent systems are mostly designed separately. Task decomposition does not fully consider the differences in capabilities and real-time states of each agent, which can easily lead to problems such as unbalanced load and waste of resources. At the same time, LLM inference results carry the risk of "hallucination": the generated subtask allocation scheme may not conform to the actual execution conditions, which further reduces the reliability of planning.
[0005] To address the shortcomings of the existing technologies, there is an urgent need for a multi-agent task planning method that can integrate real-time state perception, realize a closed loop of planning, execution, feedback and optimization, and fully leverage the advantages of LLM inference, in order to solve problems such as low efficiency of multi-agent collaboration, poor planning adaptability and insufficient reliability in dynamic environments.
Summary of the Invention
[0006] The purpose of this invention is to provide a multi-agent task planning method, system, and storage medium to at least solve the problems of lagging state perception, insufficient closed-loop optimization, and poor dynamic adaptability in existing task planning.
[0007] To achieve the above objectives, a first aspect of the present invention provides a multi-agent task planning method, the method comprising: acquiring global task instructions, agent state information, and task constraint information; generating a set of subtasks based on the global task instructions and the task constraint information, and executing an initial task planning scheme corresponding to the set of subtasks based on the agent state information; collecting real-time state information during the execution of the initial task planning scheme, and generating state feedback information based on the real-time state information; optimizing the initial task planning scheme based on the state feedback information to generate a target task planning scheme; and generating task execution instructions corresponding to each agent based on the target task planning scheme.
[0008] Optionally, generating a subtask set based on the global task instruction and the task constraint information includes: performing semantic parsing on the global task instruction to extract the task objective, task object, and task execution requirements; splitting the task objective into multiple subtasks based on the task constraint information; determining the execution relationship between each subtask and extracting the execution capability requirements, resource requirements, and time nodes corresponding to each subtask; and generating a subtask set based on the execution relationship, the execution capability requirements, the resource requirements, and the time nodes.
[0009] Optionally, executing the initial task planning scheme corresponding to the set of subtasks based on the agent state information includes: extracting execution capability parameters, available resource parameters, and current position parameters corresponding to each agent based on the agent state information; matching the execution capability parameters with the execution capability requirements corresponding to each subtask to generate a capability matching result; matching the available resource parameters with the resource requirements corresponding to each subtask to generate a resource matching result; determining the path cost for each agent to execute the corresponding subtask based on the current position parameters and the task execution position corresponding to each subtask; determining the allocation relationship between each subtask and the corresponding agent based on the capability matching result, the resource matching result, and the path cost; and generating and executing the initial task planning scheme based on the allocation relationship.
[0010] Optionally, collecting real-time status information during the execution of the initial task planning scheme includes: collecting environmental status information within the task execution area, execution status information of each agent, and execution status information of each subtask during the execution of the initial task planning scheme; wherein, the environmental status information includes any one or more of obstacle positions, environmental interference information, and task area change information; the execution status information of the agents includes any one or more of agent positions, running speed, resource consumption, execution progress, and fault status; the execution status information of the subtasks includes any one or more of subtask completion degree, execution deviation, and remaining execution time; and associating the environmental status information, the execution status information of the agents, and the execution status information of the subtasks based on the collection time to generate real-time status information.
[0011] Optionally, generating state feedback information based on the real-time state information includes: performing data alignment processing on the environmental state information, the execution state information of each agent, and the execution state information of each subtask in the real-time state information to generate a state data group corresponding to the same acquisition time; determining the environmental change, agent state deviation, and subtask execution deviation based on the state data group; performing fusion calculation on the environmental change, agent state deviation, and subtask execution deviation to generate a real-time state vector; and comparing the real-time state vector with the baseline state vector corresponding to the initial task planning scheme to generate state feedback information characterizing the degree of task execution deviation.
[0012] Optionally, optimizing the initial task planning scheme based on the state feedback information to generate a target task planning scheme includes: determining the task allocation deviation, path execution deviation, and resource consumption deviation in the initial task planning scheme based on the state feedback information; constructing corresponding planning adjustment parameters based on the task allocation deviation, the path execution deviation, and the resource consumption deviation; adjusting the sub-task allocation relationship and agent execution path in the initial task planning scheme based on the planning adjustment parameters to generate candidate task planning schemes; performing task constraint verification on the candidate task planning schemes, and determining the candidate task planning scheme that satisfies the task constraint information as the target task planning scheme.
[0013] Optionally, based on the planning adjustment parameters, the subtask allocation relationship and agent execution path in the initial task planning scheme are adjusted to generate a candidate task planning scheme, including: determining the subtask to be adjusted, the agent to be adjusted, and the corresponding adjustment triggering reason based on the planning adjustment parameters; selecting candidate agents that meet the execution conditions from each agent based on the execution capability requirements, resource requirements, and time nodes corresponding to the subtask to be adjusted; determining the acceptance cost of each candidate agent to accept the subtask to be adjusted based on the current position, available resource parameters, and current task occupancy status of each candidate agent; determining the candidate agent whose acceptance cost meets the preset cost condition as the target adjustment agent, and allocating the subtask to be adjusted to the target adjustment agent; generating the adjusted execution path corresponding to the target adjustment agent based on the current position of the target adjustment agent, the task execution position of the subtask to be adjusted, and the environmental state information within the task execution area; and generating a candidate task planning scheme based on the adjusted subtask allocation relationship and the adjusted execution path.
[0014] Optionally, generating task execution instructions for each intelligent agent based on the target task planning scheme includes: parsing the target task planning scheme to determine the target sub-tasks, task execution order, and target execution path for each intelligent agent; generating task control parameters for each intelligent agent based on the target sub-tasks, the task execution order, and the target execution path; converting the task control parameters according to the instruction format of each intelligent agent to generate task execution instructions for each intelligent agent; and issuing the task execution instructions to the corresponding intelligent agents so that each intelligent agent executes the corresponding sub-tasks according to the target task planning scheme.
[0015] A second aspect of the present invention provides a multi-agent task planning system, the system comprising: a data acquisition unit for acquiring global task instructions, agent state information, and task constraint information; a scheme planning unit for generating a set of sub-tasks based on the global task instructions and the task constraint information, and executing an initial task planning scheme corresponding to the set of sub-tasks based on the agent state information; a feedback acquisition unit for acquiring real-time state information during the execution of the initial task planning scheme, and generating state feedback information based on the real-time state information; a scheme optimization unit for optimizing the initial task planning scheme based on the state feedback information to generate a target task planning scheme; and an instruction generation unit for generating task execution instructions corresponding to each agent based on the target task planning scheme.
[0016] In another aspect, the present invention provides a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the above-described multi-agent task planning method.
[0017] Through the above technical solution, the present invention generates an initial task planning scheme by using global task instructions, agent state information, and task constraint information, so that the generated subtasks match the actual state of the agent; during execution, real-time state information is collected and state feedback information is generated, so that task execution deviations can be identified in a timely manner; and the initial task planning scheme is then optimized based on the state feedback information, forming a closed-loop processing flow from task planning, state collection, feedback correction to instruction issuance, thereby reducing the impact of state perception lag on task execution and improving the adaptability, reliability, and collaborative execution efficiency of multi-agent task planning in dynamic environments.
[0018] Other features and advantages of the embodiments of the present invention will be described in detail in the following detailed description section.
Description of the Drawings
[0019] The accompanying drawings are provided to further illustrate embodiments of the present invention and form part of the specification. They are used together with the following detailed description to explain the embodiments of the present invention, but do not constitute a limitation thereof. In the drawings:
Figure 1 is a flowchart of the steps of a multi-agent task planning method provided in one embodiment of the present invention;
Figure 2 is a schematic diagram of a closed-loop process for multi-agent task planning provided by one embodiment of the present invention;
Figure 3 is an architecture diagram of a real-time state perception and information fusion module provided in one embodiment of the present invention;
Figure 4 is a schematic diagram illustrating the working principle of a closed-loop optimization module provided in one embodiment of the present invention;
Figure 5 is a system architecture diagram of a multi-agent task planning system provided in one embodiment of the present invention.
Detailed Description
[0020] The specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for illustration and explanation only and are not intended to limit the present invention.
[0021] Figure 1 is a flowchart illustrating the steps of a multi-agent task planning method provided in one embodiment of the present invention. As shown in Figure 1, an embodiment of the present invention provides a multi-agent task planning method, the method comprising: Step S10: Obtain global task instructions, agent state information, and task constraint information.
[0022] Specifically, before executing multi-agent task planning, the multi-agent system, real-time state perception module, large language model module, closed-loop optimization module, and task planning knowledge base are initialized and configured to enable subsequent task input, semantic parsing, state acquisition, task allocation, and closed-loop optimization to be executed within a unified data framework. The task planning knowledge base can store historical task samples, agent capability parameters, task constraint templates, collaborative rules, and path planning rules, and can also be configured according to specific scenarios such as UAV inspection, warehouse handling, industrial collaboration, and emergency rescue.
[0023] After initialization, the global task instruction is input. The global task instruction supports natural language format, for example "multiple drones complete a park inspection within two hours" or "multiple mobile robots handle goods in a warehouse area". The global task instruction specifies the task objective, constraints, and evaluation metrics. Constraints may include time constraints, resource constraints, and coordination constraints. Time constraints limit the task completion time or subtask execution window. Resource constraints limit power, payload, computing power, communication bandwidth, or other execution resources. Coordination constraints limit avoidance relationships, task connection relationships, coverage relationships, or execution order relationships among multiple agents. Evaluation metrics may include task completion rate, resource consumption, and execution time, and are used to subsequently evaluate the initial task planning scheme and the target task planning scheme.
[0024] In terms of parameter definition, the multi-agent set is denoted $A = \{a_1, a_2, \ldots, a_n\}$, where $n$ is the number of agents and $a_i$ denotes the $i$-th agent. The global task is denoted $T$. The set of task constraints is denoted $C = \{C_t, C_r, C_c\}$, where $C_t$ is the time constraint, $C_r$ is the resource constraint, and $C_c$ is the coordination constraint. The set of evaluation metrics is denoted $E = \{E_c, E_r, E_t\}$, where $E_c$ is the task completion rate, $E_r$ is the resource consumption, and $E_t$ is the execution time.
[0025] The above parameters are used as the basic inputs for subsequent task analysis and planning optimization. In specific implementation, $A$ is not limited to homogeneous agents, but may also include flying agents, ground mobile agents, robotic arm agents, or other devices with task execution capabilities; $T$ is not limited to a single task objective, but can also include multiple task objectives that have a sequential or collaborative relationship; and the specific content of $C$ and $E$ can be expanded according to the application scenario, but the expanded constraints and evaluation metrics are still used to constrain, evaluate, and correct the subsequently generated task planning schemes.
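As an illustration of how the inputs of step S10 might be held in one data structure, the following is a minimal Python sketch. All class names, field names, and sample values (`TaskConstraints`, `Agent`, `PlanningInput`, `battery_wh`, and so on) are hypothetical and are not taken from the specification; they merely mirror the agent set, the global task, the constraint set, and the evaluation metrics described above.

```python
from dataclasses import dataclass, field


@dataclass
class TaskConstraints:
    """Constraint set: time, resource, and coordination constraints (illustrative fields)."""
    max_duration_s: float                       # time constraint: overall completion limit
    resource_limits: dict[str, float]           # resource constraint, e.g. {"battery_wh": 80.0}
    must_precede: list[tuple[str, str]] = field(default_factory=list)  # coordination constraint


@dataclass
class Agent:
    """One agent; heterogeneous agents differ in `kind` and capability keys."""
    agent_id: str
    kind: str                                   # e.g. "uav", "ground_robot", "arm"
    capabilities: dict[str, float]
    available_resources: dict[str, float]
    position: tuple[float, float]


@dataclass
class PlanningInput:
    """Everything acquired in step S10, bundled for the later steps."""
    instruction: str                            # natural-language global task instruction
    agents: list[Agent]
    constraints: TaskConstraints
    metrics: tuple[str, ...] = ("completion_rate", "resource_consumption", "execution_time")


planning_input = PlanningInput(
    instruction="Multiple drones complete a park inspection within two hours",
    agents=[
        Agent("uav-1", "uav", {"speed": 12.0, "sensing": 0.9}, {"battery_wh": 80.0}, (0.0, 0.0)),
        Agent("ugv-1", "ground_robot", {"speed": 2.0, "payload": 30.0}, {"battery_wh": 400.0}, (5.0, 1.0)),
    ],
    constraints=TaskConstraints(max_duration_s=7200.0, resource_limits={"battery_wh": 80.0}),
)
```

Keeping the constraints and metrics next to the agent list means later steps can validate a candidate plan against the same object that was used to generate it.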
[0026] Step S20: Generate a set of subtasks based on the global task instructions and the task constraint information, and execute the initial task planning scheme corresponding to the set of subtasks based on the agent state information.
[0027] Specifically, generating a set of subtasks based on the global task instructions and the task constraint information includes: performing semantic parsing on the global task instructions to extract the task objective, task object, and task execution requirements; splitting the task objective into multiple subtasks based on the task constraint information; determining the execution relationship between the subtasks and extracting the execution capability requirements, resource requirements, and time nodes corresponding to each subtask; and generating a set of subtasks based on the execution relationship, the execution capability requirements, the resource requirements, and the time nodes.
[0028] Furthermore, the initial task planning scheme corresponding to the set of subtasks is executed based on the agent state information, including: extracting the execution capability parameters, available resource parameters, and current position parameters corresponding to each agent based on the agent state information; matching the execution capability parameters with the execution capability requirements corresponding to each subtask to generate a capability matching result; matching the available resource parameters with the resource requirements corresponding to each subtask to generate a resource matching result; determining the path cost for each agent to execute the corresponding subtask based on the current position parameters and the task execution position corresponding to each subtask; determining the allocation relationship between each subtask and the corresponding agent based on the capability matching result, the resource matching result, and the path cost; and generating and executing the initial task planning scheme based on the allocation relationship.
[0029] In this embodiment of the invention, after obtaining the global task instructions and task constraint information, the global task instructions and task constraint information are used as input for task parsing, and the large language model performs semantic parsing on the global task instructions. The semantic parsing includes at least the extraction of the task objective, task object, and task execution requirements. For example, in a drone inspection scenario, the global task instruction can be described as completing the inspection of multiple equipment areas within a limited time. In this case, the task objective is to complete the inspection, the task object is the multiple equipment areas, and the task execution requirements may include inspection accuracy, obstacle avoidance requirements, completion time limit, and remaining battery power requirements. Through this processing, the task description in natural language can be converted into structured content that can subsequently participate in task decomposition and task allocation.
[0030] After semantic parsing, the task objective is decomposed based on the task constraint information, generating multiple subtasks. This decomposition process can be performed according to task objects, task regions, execution order, or collaborative relationships, or it can be a comprehensive decomposition based on time constraints, resource constraints, and collaborative constraints. Let the global task be $T$. The global task is decomposed into $m$ subtasks, each of which may be independent or collaborative, giving the subtask set $S = \{s_1, s_2, \ldots, s_m\}$, where $s_j$ denotes the $j$-th subtask and $m$ is the number of subtasks. For each subtask $s_j$, its core requirements are further extracted, including execution capability requirements, resource requirements, and time nodes. Execution capability requirements characterize the speed, load, sensing accuracy, processing capacity, job type, or other capability conditions required to complete the subtask. Resource requirements characterize the power, computing resources, communication resources, tool resources, or load resources required to complete the subtask. Time nodes define the start time, completion time, duration, or temporal relationship between the subtask and other subtasks. The execution relationships between subtasks can include independent execution, sequential execution, parallel execution, and cooperative execution. These execution relationships, execution capability requirements, resource requirements, and time nodes together constitute a structured description of the subtask set.
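The structured description of the subtask set can be sketched as follows. This is a minimal illustration only: the `Subtask` fields and the sample subtasks are hypothetical, and sequential execution relationships are modelled simply as a list of predecessor identifiers. The standard-library `graphlib.TopologicalSorter` is used to group subtasks into waves, so that subtasks in the same wave have no ordering between them and could run in parallel.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Subtask:
    task_id: str
    capability_req: dict[str, float]   # execution capability requirements
    resource_req: dict[str, float]     # resource requirements
    deadline_s: float                  # time node: latest completion time
    after: list[str] = field(default_factory=list)  # sequential predecessors


subtasks = [
    Subtask("inspect_area_a", {"sensing": 0.8}, {"battery_wh": 20.0}, 3600.0),
    Subtask("inspect_area_b", {"sensing": 0.8}, {"battery_wh": 25.0}, 3600.0),
    Subtask("upload_report", {"comm": 0.5}, {"battery_wh": 2.0}, 7200.0,
            after=["inspect_area_a", "inspect_area_b"]),
]


def execution_waves(tasks):
    """Group subtasks into waves; a wave only starts after all earlier waves finish."""
    sorter = TopologicalSorter({t.task_id: set(t.after) for t in tasks})
    sorter.prepare()                   # raises CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(subtasks)
print(waves)
```

In this example the two inspection subtasks fall into the first wave and the report upload into the second. Cooperative execution (one subtask, several agents) is not modelled here.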
[0031] After generating the set of subtasks, the initial task planning scheme corresponding to the set of subtasks is executed based on the agent state information. The agent state information is used to extract at least the execution capability parameters, available resource parameters, and current position parameters for each agent. Continuing with the aforementioned multi-agent set, the initial capability parameters of the $i$-th agent $a_i$ can be denoted as:
$$\mathrm{Cap}_i = [c_{i1}, c_{i2}, \ldots, c_{ik}]$$
[0032] where $k$ is the capability dimension, and the components can include movement speed, load capacity, perception accuracy, etc. For subtask $s_j$, its demand vector is denoted $\mathrm{Req}_j$, and $R_j$ is the resource requirement of subtask $s_j$; the available resources of agent $a_i$ are denoted $\mathrm{Res}_i$. In the specific planning process, the execution capability parameters are matched with the execution capability requirements corresponding to each subtask to generate capability matching results; the available resource parameters are matched with the resource requirements corresponding to each subtask to generate resource matching results; and then, based on the current position parameters and the task execution positions corresponding to each subtask, the path cost for each agent to execute the corresponding subtask is determined. The path cost can be jointly determined by the distance from the current position to the task execution position, the estimated travel time, the obstacle detour cost, and the communication coverage conditions, and the specific weights can be set according to the application scenario.
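The three quantities used in this step (capability matching result, resource matching result, and path cost) can be sketched as below. This is an assumption-laden illustration: matching is treated as a simple "meets or exceeds every requirement" test, the path cost is a weighted sum of straight-line distance, travel time, and a detour term, and the communication coverage term is omitted. The function names and default weights are hypothetical.

```python
import math


def capability_match(agent_caps, required):
    """True when every required capability dimension is met or exceeded."""
    return all(agent_caps.get(name, 0.0) >= need for name, need in required.items())


def resource_match(available, required):
    """True when every required resource is available in sufficient quantity."""
    return all(available.get(name, 0.0) >= need for name, need in required.items())


def path_cost(agent_pos, task_pos, speed, detour=0.0,
              w_dist=1.0, w_time=1.0, w_detour=1.0):
    """Weighted sum of distance, estimated travel time, and obstacle detour cost.

    The weights are scenario-specific; the defaults are placeholders.
    """
    distance = math.dist(agent_pos, task_pos)   # straight-line distance
    travel_time = distance / speed
    return w_dist * distance + w_time * travel_time + w_detour * detour


uav = {"speed": 12.0, "sensing": 0.9}
print(capability_match(uav, {"sensing": 0.8}))              # capability requirement satisfied
print(resource_match({"battery_wh": 15.0}, {"battery_wh": 20.0}))  # battery is insufficient
print(path_cost((0.0, 0.0), (3.0, 4.0), speed=5.0))
```

A graded matching score (rather than a pass/fail test) would be an equally valid reading of the text; the boolean form is chosen here only for brevity.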
[0033] After capability matching, resource matching, and path cost determination, an initial subtask allocation scheme is generated. This initial subtask allocation scheme can be expressed as:
$$P_0 = \{(s_j, a_i)\}$$
[0034] where $(s_j, a_i)$ indicates that subtask $s_j$ is assigned to agent $a_i$ for execution. This expression is used to define the correspondence between subtasks and agents. It does not require each agent to execute only one subtask, nor does it exclude the possibility of a subtask being executed collaboratively by multiple agents. When collaborative execution is required, the expression can be extended on this basis to include the correspondence between a subtask and multiple agents.
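One simple way to turn the matching results and path costs into an allocation relationship is a greedy rule: give each subtask to the cheapest agent that passes both matches. The sketch below assumes this greedy rule purely for illustration; the specification does not prescribe a particular assignment algorithm, and the function and argument names are hypothetical.

```python
def allocate(subtask_ids, agent_ids, feasible, cost):
    """Assign each subtask to the cheapest feasible agent.

    feasible[(s, a)] is True when capability and resource matching both pass;
    cost[(s, a)] is the path cost. One agent may receive several subtasks.
    Subtasks with no feasible agent map to None so they can be re-planned.
    """
    allocation = {}
    for s in subtask_ids:
        candidates = [a for a in agent_ids if feasible.get((s, a), False)]
        if not candidates:
            allocation[s] = None
            continue
        allocation[s] = min(candidates, key=lambda a: cost[(s, a)])
    return allocation


feasible = {("s1", "a1"): True, ("s1", "a2"): True, ("s2", "a2"): True}
cost = {("s1", "a1"): 3.0, ("s1", "a2"): 5.0, ("s2", "a2"): 4.0}
plan = allocate(["s1", "s2", "s3"], ["a1", "a2"], feasible, cost)
print(plan)
```

This rule ignores load balancing and collaborative execution; an assignment solver or a per-agent load penalty could replace the `min` call where those matter.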
[0035] To mitigate the risk of discrepancies between the assignment relationships generated by the large language model and actual execution conditions, the initial subtask allocation scheme undergoes a rationality check. The rationality check can be calculated using the following formula: [formula missing]
[0036] where the similarity term is the similarity between the capabilities of an agent and the requirements of the subtask assigned to it, which can be calculated using cosine similarity, with a value range of [value range missing]; the remaining terms are the demand vector of the subtask, the resource requirement of the subtask, and the amount of resources available to the agent. The deviation rate between the available resources of the agent and the resource requirements of the subtask can be expressed as: [formula missing]
[0037] where the deviation penalty coefficient can be dynamically adjusted according to the importance of the task, and the resulting score characterizes the reasonableness of the initial allocation scheme. When the score meets the preset reasonableness conditions, the initial subtask allocation scheme is retained; when the score does not meet the preset reasonableness conditions, the allocation relationship between subtasks and agents is re-determined based on the capability matching results, resource matching results, and path costs.
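Because the formulas themselves are not available in this text, the following sketch shows only one plausible form of such a rationality check, under explicit assumptions: the score is taken to be the cosine similarity minus the penalty coefficient times the deviation rate, and the deviation rate is taken to be the relative resource shortfall (zero when the agent has enough resources). Both forms, the function names, and the threshold value are assumptions made for illustration, not the formulas of the specification.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def resource_deviation(required, available):
    """ASSUMED form: relative shortfall of available versus required resources."""
    if required <= 0.0:
        return 0.0
    return max(0.0, (required - available) / required)


def rationality_score(capability, demand, required, available, penalty=0.5):
    """ASSUMED form: similarity minus penalty-weighted resource deviation."""
    return cosine_similarity(capability, demand) - penalty * resource_deviation(required, available)


score = rationality_score([1.0, 0.0], [1.0, 0.0], required=10.0, available=5.0)
THRESHOLD = 0.7                       # hypothetical preset reasonableness condition
keep_allocation = score >= THRESHOLD  # otherwise the pair is re-allocated
print(score, keep_allocation)
```

With identical capability and demand directions and half of the required resources available, the assumed score is 1.0 minus 0.5 times 0.5. Raising `penalty` for important tasks makes resource shortfalls count more heavily, which corresponds to the dynamic adjustment described above.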
[0038] Therefore, the initial task planning scheme is not simply derived directly from task semantics, but rather incorporates global task instructions, task constraint information, core requirements of subtasks, and agent state information into the planning process. This scheme is applicable not only to spatial movement tasks such as UAV inspections, but also to multi-agent collaborative tasks such as warehousing and handling, industrial collaboration, and emergency search and rescue. For different scenarios, the capability dimensions, the resource requirements, the composition of the path cost, and the rationality conditions can be adjusted, but the adjusted data is still used to generate and execute the initial task planning scheme corresponding to the subtask set.
[0039] Step S30: Collect real-time status information during the execution of the initial task planning scheme, and generate status feedback information based on the real-time status information.
[0040] Specifically, collecting real-time status information during the execution of the initial task planning scheme includes: collecting environmental status information within the task execution area, execution status information of each agent, and execution status information of each subtask during the execution of the initial task planning scheme; wherein, the environmental status information includes any one or more of obstacle positions, environmental interference information, and task area change information; the execution status information of the agents includes any one or more of agent positions, running speed, resource consumption, execution progress, and fault status; the execution status information of the subtasks includes any one or more of subtask completion degree, execution deviation, and remaining execution time; and associating the environmental status information, the execution status information of the agents, and the execution status information of the subtasks based on the collection time to generate real-time status information.
[0041] Furthermore, generating state feedback information based on the real-time state information includes: performing data alignment processing on the environmental state information, the execution state information of each agent, and the execution state information of each subtask in the real-time state information to generate a state data group corresponding to the same acquisition time; determining the environmental change, agent state deviation, and subtask execution deviation based on the state data group; performing fusion calculation on the environmental change, agent state deviation, and subtask execution deviation to generate a real-time state vector; and comparing the real-time state vector with the baseline state vector corresponding to the initial task planning scheme to generate state feedback information characterizing the degree of task execution deviation.
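The last two operations (fusion into a real-time state vector and comparison with a baseline state vector) can be sketched as follows. This is a simplified illustration under stated assumptions: fusion is taken to be a weighted concatenation of the three deviation groups, and the degree of deviation is taken to be the Euclidean distance from the baseline. The weights, function names, and the returned dictionary layout are hypothetical.

```python
import math


def fuse(env_change, agent_deviation, subtask_deviation, weights=(0.3, 0.4, 0.3)):
    """Weighted fusion: scale each group by its weight and concatenate into one vector."""
    w_env, w_agent, w_task = weights
    return ([w_env * x for x in env_change]
            + [w_agent * x for x in agent_deviation]
            + [w_task * x for x in subtask_deviation])


def state_feedback(state_vector, baseline_vector):
    """Compare the real-time vector with the baseline of the initial plan."""
    diff = [s - b for s, b in zip(state_vector, baseline_vector)]
    return {
        "per_dimension": diff,                              # signed deviation per component
        "deviation": math.sqrt(sum(d * d for d in diff)),   # overall degree of deviation
    }


state = fuse([1.0], [2.0], [0.5], weights=(1.0, 1.0, 1.0))
feedback = state_feedback([3.0, 4.0], [0.0, 0.0])
print(state, feedback["deviation"])
```

The scalar `deviation` can be compared against a trigger threshold, while `per_dimension` indicates which state component caused the deviation.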
[0042] In this embodiment of the invention, after the initial task planning scheme begins execution, multi-dimensional state information within the task execution area is collected at a set collection frequency. The real-time state information includes at least environmental state information, execution state information of each agent, and execution state information of each subtask. Environmental state information characterizes external changes in the task execution area relative to the initial planning conditions and may include any one or more of obstacle locations, environmental interference information, and task area change information.
[0043] For example, in a drone inspection scenario, obstacle locations can be the spatial coordinates of temporary construction equipment, environmental interference information can be wind speed changes or communication interference, and task area change information can be adjustments to the inspection area boundaries or temporary no-fly zones. The execution status information of each agent characterizes its actual operation during execution and can include any one or more of the following: agent location, operating speed, resource consumption, execution progress, and fault status. The execution status information of each subtask characterizes its completion status relative to the planned schedule and can include any one or more of the following: subtask completion rate, execution deviation, and remaining execution time.
[0044] To enable state data from different sources to participate in the same round of feedback calculation, the environmental state information, the execution state information of each agent, and the execution state information of each subtask are correlated based on acquisition time. Specifically, data acquired at the same acquisition time or within the same acquisition time window can be grouped into a state data group; for data with different acquisition frequencies, alignment can be achieved by matching the most recent timestamp, by linear interpolation, or by holding the previous valid value. For a given timestamp, the environmental state information can include obstacle locations, environmental disturbances, and terrain changes; the execution state of each agent at that timestamp can include location, speed, resource consumption, execution progress, and fault status; and the execution state of each subtask at that timestamp can include completion rate, deviation, and remaining time.
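The three alignment options named above (most recent timestamp, linear interpolation, holding the previous valid value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the `(timestamp, value)` sample layout, and the choice of which strategy is applied to which data stream are all hypothetical.

```python
from bisect import bisect_right

# Each stream is a list of (timestamp, value) pairs sorted by timestamp (assumed layout).

def hold_last(samples, t):
    """Return the most recent value at or before time t, or None if there is none."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    return samples[i - 1][1] if i > 0 else None

def nearest(samples, t):
    """Return the value whose timestamp is closest to t."""
    return min(samples, key=lambda s: abs(s[0] - t))[1]

def interpolate(samples, t):
    """Linearly interpolate a scalar stream at time t, clamping outside the sampled range."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def align(env, agent_progress, subtask_completion, t):
    """Build one state data group for acquisition time t."""
    return {
        "t": t,
        "env": hold_last(env, t),                         # categorical: hold previous value
        "agent_progress": interpolate(agent_progress, t),  # continuous: interpolate
        "subtask_completion": nearest(subtask_completion, t),
    }
```

Holding the previous value suits categorical readings such as an obstacle flag, while interpolation only makes sense for continuous quantities; which stream gets which strategy would depend on the deployment.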
[0045] After state data groups corresponding to the same acquisition time are formed, they are denoised, completed, and fused to generate a real-time state vector. The fusion calculation can employ a weighted fusion method that incorporates the environmental state information, the execution state information of each agent, and the execution state information of each subtask into a single state representation, namely the fused real-time state vector.
[0046] In the fusion relationship, each of the three types of state information is assigned a weighting coefficient, and the weighting coefficients satisfy a preset constraint.
[0047] The weighting coefficients can be dynamically adjusted according to the task type. For example, in emergency rescue scenarios, environmental changes have a significant impact on task safety and accessibility, so the weight of the environmental state information can be increased; in warehousing and handling scenarios, congestion and resource consumption of the agents have a more direct impact on task execution, so the weight of the agent execution state information can be increased; in multi-area inspection scenarios, the completion rate of subtasks and the remaining time have a significant impact on subsequent scheduling, so the weight of the subtask execution state information can be increased. These weight adjustments do not change the basic structure of the fusion calculation; they only change the proportion of each type of state information in the real-time state vector.
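As a rough illustration of weighted fusion with scenario-dependent weights, the sketch below scales each block of state features by its weight and concatenates the blocks into one vector. Several things here are assumptions, since the original fusion formula is not legible in this text: the concatenation form, the requirement that the weights are non-negative and sum to 1, the numeric weight values, and all names.

```python
# Hypothetical weights per scenario, ordered (environment, agent, subtask).
# The numbers are illustrative only and do not come from the source.
SCENARIO_WEIGHTS = {
    "emergency_rescue": (0.5, 0.3, 0.2),       # environment emphasized
    "warehouse_handling": (0.2, 0.5, 0.3),     # agent state emphasized
    "multi_area_inspection": (0.2, 0.3, 0.5),  # subtask state emphasized
}

def fuse_state(env, agent, subtask, weights):
    """Scale each feature block by its weight and concatenate into one state vector."""
    w_env, w_agent, w_sub = weights
    # Assumed constraint: non-negative weights that sum to 1.
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return ([w_env * x for x in env]
            + [w_agent * x for x in agent]
            + [w_sub * x for x in subtask])
```

Changing the scenario key changes only the proportions of the three blocks; the structure of the fused vector stays the same, which mirrors the statement in the paragraph above.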
[0048] After the real-time state vector is generated, it is compared with the baseline state vector corresponding to the initial task planning scheme to obtain state feedback information. The baseline state vector can be composed of the planned environment state, the planned agent state, and the planned subtask state from the initial task planning scheme, and represents the expected execution state at the current timestamp. By comparing the real-time state vector with the baseline state vector, environmental changes, agent state deviations, and subtask execution deviations can be determined, and state feedback information characterizing the degree of deviation in task execution is then formed. This state feedback information can include the deviation type, the deviation magnitude, the object of deviation, and the time of deviation, and it serves as input for subsequent optimization of the initial task planning scheme.
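The comparison against the baseline can be sketched as a per-entry difference that emits one feedback record for each entry deviating beyond a tolerance. The record fields follow the four items listed above (type, magnitude, object, time); the dictionary layout keyed by `(deviation_type, object_id)`, the `tolerance` parameter, and the function name are assumptions made for this example.

```python
def state_feedback(realtime, baseline, timestamp, tolerance=0.05):
    """Compare real-time values with baseline values and list significant deviations.

    realtime, baseline: dicts mapping (deviation_type, object_id) -> numeric value.
    Entries missing from realtime are treated as on-plan (assumed behavior).
    """
    feedback = []
    for key, expected in baseline.items():
        magnitude = realtime.get(key, expected) - expected
        if abs(magnitude) > tolerance:
            deviation_type, object_id = key
            feedback.append({
                "type": deviation_type,
                "object": object_id,
                "magnitude": magnitude,
                "time": timestamp,
            })
    return feedback
```

A negative magnitude on a subtask completion entry would mean the subtask is behind plan; how each sign is interpreted is left to the downstream planner.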
[0049] To handle sudden changes, an anomaly threshold can also be set. A state change is determined to have occurred when the change between the real-time state vectors corresponding to two adjacent timestamps exceeds the state anomaly threshold.
[0050] The state anomaly threshold can be set according to the task scenario and security requirements. The timestamp interval is used to achieve real-time updates of the state information, and the update frequency can be configured according to scenario requirements; for example, one update every 100 ms can be used in industrial scenarios, and one update every 50 ms in drone scenarios. When a state change is detected, the change result is incorporated into the state feedback information, so that subsequent planning and optimization can adjust for situations such as new obstacles, agent failures, abnormal resource consumption, or severe delays in subtask progress. The types of real-time state information, acquisition devices, and update frequencies mentioned above can be extended according to specific application scenarios, but in all cases the acquired data is turned into state feedback information through acquisition-time correlation, state fusion, and deviation comparison.
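A minimal sketch of the state change check follows. It assumes the change between adjacent state vectors is measured with the Euclidean norm, which is a choice made for this example; the source does not state which norm is used. The function names are hypothetical.

```python
import math

def state_changed(previous, current, threshold):
    """Return True when the Euclidean distance between two state vectors exceeds threshold."""
    distance = math.sqrt(sum((c - p) ** 2 for p, c in zip(previous, current)))
    return distance > threshold

def detect_changes(stream, threshold):
    """Return the timestamps at which a state change is detected.

    stream: list of (timestamp, state_vector) pairs in time order.
    """
    return [t for (_, prev), (t, curr) in zip(stream, stream[1:])
            if state_changed(prev, curr, threshold)]
```

With a 50 ms update interval, `stream` would gain one entry every 50 ms, and each detected timestamp would trigger regeneration of the state feedback information.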
[0051] Step S40: Optimize the initial task planning scheme based on the status feedback information to generate a target task planning scheme.
[0052] Specifically, based on the state feedback information, the task allocation deviation, path execution deviation, and resource consumption deviation in the initial task planning scheme are determined; based on the task allocation deviation, path execution deviation, and resource consumption deviation, corresponding planning adjustment parameters are constructed; based on the planning adjustment parameters, the sub-task allocation relationship and agent execution path in the initial task planning scheme are adjusted to generate candidate task planning schemes; the candidate task planning schemes are subject to task constraint verification, and the candidate task planning schemes that satisfy the task constraint information are determined as the target task planning scheme.
[0053] Furthermore, based on the planning adjustment parameters, the subtask allocation relationship and agent execution path in the initial task planning scheme are adjusted to generate a candidate task planning scheme, including: determining the subtask to be adjusted, the agent to be adjusted, and the corresponding adjustment triggering reason based on the planning adjustment parameters; selecting candidate agents that meet the execution conditions from among the agents based on the execution capability requirements, resource requirements, and time nodes corresponding to the subtask to be adjusted; determining the acceptance cost of each candidate agent to accept the subtask to be adjusted based on the current position, available resource parameters, and current task occupancy status of each candidate agent; determining the candidate agent whose acceptance cost meets the preset cost condition as the target adjustment agent, and allocating the subtask to be adjusted to the target adjustment agent; generating the adjusted execution path corresponding to the target adjustment agent based on the current position of the target adjustment agent, the task execution position of the subtask to be adjusted, and the environmental state information within the task execution area; and generating a candidate task planning scheme based on the adjusted subtask allocation relationship and the adjusted execution path.
[0054] In this embodiment of the invention, after obtaining the status feedback information, dynamic optimization processing is performed on the initial task planning scheme. The status feedback information is used to characterize the degree of deviation between the task execution process and the initial planning state. Therefore, in this step, the task allocation deviation, path execution deviation, and resource consumption deviation in the initial task planning scheme are determined based on the status feedback information. The task allocation deviation is used to characterize the degree of mismatch between the current sub-task allocation relationship and the real-time execution capability. For example, an agent may be unable to continue executing the allocated sub-tasks due to a fault, increased load, or insufficient resources. The path execution deviation is used to characterize the deviation between the actual execution path of the agent and the planned path. For example, there may be obstacle detours, congestion area avoidance, or changes in restricted areas. The resource consumption deviation is used to characterize the difference between the resource consumption generated by the agent during actual execution and the estimated resource consumption. For example, the power consumption rate is higher than the estimated value or the communication resource usage is abnormal.
[0055] After identifying various deviations, planning adjustment parameters are constructed based on task allocation deviations, path execution deviations, and resource consumption deviations. These parameters quantify the degree and direction of adjustment required for the current planning scheme and serve as input for generating subsequent candidate task planning schemes. Planning adjustment parameters may include task migration priority, path replanning priority, resource scheduling priority, and execution time limit correction. The composition of planning adjustment parameters can be expanded for different application scenarios, but the expanded parameters are still used to correct the sub-task allocation relationships and agent execution paths in the initial task planning scheme.
[0056] In this embodiment, the fused real-time state vector is used as input for optimization and is combined with a set of evaluation metrics covering task completion rate, resource consumption, and execution time.
[0057] On this basis, a multi-objective optimization function is constructed. This function comprehensively evaluates the performance of the current task planning scheme in terms of task completion rate, resource consumption, and execution time.
[0058] Each of the three evaluation metrics is assigned a target weight coefficient, and the target weight coefficients satisfy a preset constraint.
[0059] The target weight coefficients can be dynamically adjusted based on task priority. For example, in emergency rescue scenarios, task completion rate has a higher priority, so its weight can be increased; in warehouse scheduling scenarios, resource consumption and scheduling efficiency have a significant impact on long-term operation, so the proportion of the corresponding weights can be increased.
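[0060] The task completion rate is recorded as a function of the completion rates of the subtasks.
[0061] Here, each subtask has a corresponding completion rate at the current timestamp, and this completion rate lies within a bounded range.
[0062] Resource consumption is recorded as a function of the resource consumption of the agents.
[0063] Here, each agent has a corresponding resource consumption at the current timestamp.
[0064] Execution time is recorded as a function of the current execution time and the time constraint.
[0065] Here, the two quantities involved are the current execution time and the time constraint.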
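One possible way to combine the three metrics into a single score is sketched below. The concrete forms are assumptions, because the original formulas are not legible in this text: completion is taken as the mean subtask completion rate, resource consumption as total consumption divided by a budget, execution time as elapsed time divided by the time limit, and the score rewards completion while penalizing the other two. All names are hypothetical.

```python
def objective(completions, consumptions, resource_budget, elapsed, time_limit, weights):
    """Weighted score of a planning scheme; higher is better (assumed sign convention).

    completions:  per-subtask completion rates, each assumed to lie in [0, 1]
    consumptions: per-agent resource consumption, same unit as resource_budget
    weights:      (w_completion, w_resource, w_time)
    """
    w_completion, w_resource, w_time = weights
    completion = sum(completions) / len(completions)
    resource = sum(consumptions) / resource_budget
    time_ratio = elapsed / time_limit
    return w_completion * completion - w_resource * resource - w_time * time_ratio

# Example: two subtasks, two agents, 30 of 120 minutes used.
score = objective([1.0, 0.5], [10, 20], 100, 30, 120, (0.5, 0.3, 0.2))
```

Normalizing each metric to a comparable scale before weighting keeps a single weight change from being swamped by differences in units.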
[0066] After the multi-objective optimization function is obtained, the planning adjustment parameters are solved to generate new task allocation and path planning relationships. In this embodiment, an improved particle swarm optimization (PSO) algorithm is used for the solution, yielding an optimized task allocation adjustment scheme and an adjusted path planning scheme. During particle update, real-time state changes are incorporated into the particle velocity update, allowing current state deviations to directly participate in adjusting the optimization direction. The particle velocity update relationship combines the current particle velocity scaled by an inertia weight, learning-factor terms that draw the particle toward its individual optimal position and the global optimal position, and a state feedback term.
[0067] The particle position update relationship obtains the particle position for the next iteration from the current particle position and the updated particle velocity.
[0068] Here, the particle velocity and particle position are those corresponding to a given iteration, and the inertia weight can be expressed in a linearly decreasing manner over the iterations.
[0069] The linearly decreasing inertia weight depends on the maximum number of iterations. The velocity update further uses two learning factors, two random numbers, the individual optimal position and the global optimal position of the particle, and a state feedback factor, which is used to incorporate real-time state changes into the particle update process.
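The sketch below shows a standard PSO loop with a linearly decreasing inertia weight and one extra additive term driven by a state feedback vector. The standard parts follow the textbook PSO update; the way the feedback enters (an additive term `lam * feedback[d]`), the parameter values, and all names are assumptions for illustration and are not taken from the source.

```python
import random

def inertia(k, k_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight from w_max (k = 0) to w_min (k = k_max)."""
    return w_max - (w_max - w_min) * k / k_max

def pso(cost, dim, feedback, k_max=50, n=20, c1=1.5, c2=1.5, lam=0.1, seed=0):
    """Minimize cost over dim dimensions; feedback is a length-dim state deviation vector."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    vs = [[0.0] * dim for _ in range(n)]
    pbest = [x[:] for x in xs]            # individual optimal positions
    gbest = min(pbest, key=cost)[:]       # global optimal position
    for k in range(k_max):
        w = inertia(k, k_max)
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d])
                            + lam * feedback[d])   # assumed form of the feedback term
                xs[i][d] += vs[i][d]               # position update
            if cost(xs[i]) < cost(pbest[i]):
                pbest[i] = xs[i][:]
                if cost(pbest[i]) < cost(gbest):
                    gbest = pbest[i][:]
    return gbest
```

In a planning setting, a particle position would encode a candidate allocation or path parameterization and `cost` would be derived from the multi-objective function; here a plain numeric cost stands in for both.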
[0070] After generating the planning adjustment parameters, the subtask allocation relationships and agent execution paths in the initial task planning scheme are adjusted. Specifically, based on the planning adjustment parameters, the subtasks to be adjusted, the agents to be adjusted, and the corresponding adjustment triggering reasons are determined. Adjustment triggering reasons may include insufficient resources, task timeout risk, path blocking, failure exit, or coordination conflict. For the subtasks to be adjusted, candidate agents that meet the execution conditions are selected from multiple agents based on their corresponding execution capacity requirements, resource requirements, and time nodes.
[0071] For each candidate agent, the acceptance cost for undertaking the corresponding subtask to be adjusted is determined based on its current location, available resource parameters, and current task occupancy status. The acceptance cost can be composed of the travel distance, estimated execution time, remaining resource consumption, current task load, and path detour cost. Then, candidate agents whose acceptance costs meet the preset cost conditions are identified as target adjustment agents, and the subtasks to be adjusted are reassigned to the corresponding target adjustment agents.
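Candidate filtering and acceptance cost selection can be sketched as below. The cost here uses only travel distance and current task load, which is a simplification of the components listed above; the field names, the `load_penalty` factor, the battery reserve check, and the rule "lowest cost not exceeding `max_cost`" as the preset cost condition are all assumptions for this example.

```python
import math

def candidates(agents, subtask, reserve):
    """Keep agents that have the required capability and enough battery to stay above reserve."""
    return [a for a in agents
            if subtask["capability"] in a["capabilities"]
            and a["battery"] - subtask["energy"] >= reserve]

def acceptance_cost(agent, subtask, load_penalty=2.0):
    """Travel distance to the subtask plus a penalty per task already assigned."""
    dx = agent["pos"][0] - subtask["pos"][0]
    dy = agent["pos"][1] - subtask["pos"][1]
    return math.hypot(dx, dy) + load_penalty * agent["load"]

def select_target(agents, subtask, reserve, max_cost):
    """Return the lowest-cost candidate whose cost does not exceed max_cost, else None."""
    pool = candidates(agents, subtask, reserve)
    if not pool:
        return None
    best = min(pool, key=lambda a: acceptance_cost(a, subtask))
    return best if acceptance_cost(best, subtask) <= max_cost else None
```

Returning `None` leaves the subtask unassigned, so a caller would need a fallback such as relaxing the cost condition or deferring the subtask.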
[0072] After task reallocation, the adjusted execution path is generated based on the target agent's current position, the execution position of the subtasks to be adjusted, and the environmental state information within the task execution area. Environmental state information can be used during path generation to avoid new obstacles, congested areas, dangerous areas, or communication blind spots. For different scenarios, path planning methods can be implemented using grid search, graph search, sampling search, or trajectory interpolation, but all are used to generate the adjusted execution path corresponding to the target agent.
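As one concrete instance of the grid search option mentioned above, the sketch below runs a breadth-first search on a 4-connected grid and treats newly reported obstacle cells as blocked. The grid abstraction, cell coordinates, and function name are assumptions; real deployments would more likely use a weighted search over a richer map.

```python
from collections import deque

def grid_path(start, goal, blocked, width, height):
    """Return a shortest 4-connected path from start to goal avoiding blocked cells, or None."""
    queue = deque([start])
    parent = {start: None}   # also serves as the visited set
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in blocked and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None
```

When `None` is returned, no adjusted path exists for that agent under the current obstacles, which would be a reason to pick a different target adjustment agent.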
[0073] Finally, candidate task planning schemes are generated based on the adjusted subtask allocation relationships and adjusted execution paths, and task constraint verification is performed on these schemes. Task constraint verification includes at least time constraint verification, resource constraint verification, and coordination constraint verification. When a candidate task planning scheme meets the aforementioned task constraint information, it is determined as the target task planning scheme; for candidate task planning schemes that do not meet the task constraint information, the planning adjustment parameters are updated and the path is replanned.
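The three verification categories named above can be sketched as simple checks on a candidate plan. The plan layout (`finish_time`, `battery_after`, `paths` as per-step cell lists) and the interpretation of the coordination constraint as "no two agents in the same cell at the same step" are assumptions made for this example.

```python
def verify_plan(plan, time_limit, min_battery):
    """Return the list of violated constraint categories; an empty list means the plan passes."""
    violations = []
    # Time constraint: the plan must finish within the limit.
    if plan["finish_time"] > time_limit:
        violations.append("time")
    # Resource constraint: every agent keeps at least min_battery after execution.
    if any(b < min_battery for b in plan["battery_after"].values()):
        violations.append("resource")
    # Coordination constraint: no two agents occupy the same cell at the same step.
    occupied = set()
    conflict = False
    for path in plan["paths"].values():
        for step, cell in enumerate(path):
            if (step, cell) in occupied:
                conflict = True
            occupied.add((step, cell))
    if conflict:
        violations.append("coordination")
    return violations
```

A non-empty result would feed back into updating the planning adjustment parameters and replanning, as described in the paragraph above.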
[0074] After the target task planning scheme is generated, the optimized task allocation scheme and path planning scheme are converted into task execution instructions executable by the corresponding agents, and the reference data for subsequent task planning is updated synchronously. This optimization process is applicable to dynamic task scenarios such as UAV inspection, multi-robot collaboration, warehouse scheduling, industrial collaboration, and emergency rescue. For different application environments, the evaluation metrics, particle update parameters, acceptance cost composition, and task constraints in the multi-objective optimization function can be extended and configured, but the extended parameters are still used to generate the target task planning scheme.
[0075] Step S50: Generate task execution instructions for each intelligent agent based on the target task planning scheme.
[0076] Specifically, the target task planning scheme is analyzed to determine the target sub-tasks, task execution order, and target execution path for each intelligent agent; based on the target sub-tasks, task execution order, and target execution path, task control parameters corresponding to each intelligent agent are generated; based on the instruction format of each intelligent agent, the task control parameters are format-converted to generate task execution instructions corresponding to each intelligent agent; the task execution instructions are sent to the corresponding intelligent agents so that each intelligent agent executes the corresponding sub-tasks according to the target task planning scheme.
[0077] In this embodiment of the invention, after the target task planning scheme is obtained, it is parsed to determine the target subtasks, task execution order, and target execution path corresponding to each agent. The target task planning scheme may include optimized task allocation relationships and path planning relationships, for example, the task allocation adjustment scheme and the path planning adjustment scheme obtained for a given timestamp. The task allocation adjustment scheme characterizes the allocation relationship between each subtask and its corresponding agent, and the path planning adjustment scheme characterizes the path planning result when each agent executes its corresponding subtask.
[0078] By analyzing the above content, we can clarify the target sub-tasks that each agent needs to execute, the sequential relationship between the target sub-tasks, and the target execution path from the current position to the task execution position.
[0079] After determining the target sub-tasks, task execution order, and target execution path, corresponding task control parameters are generated for each agent. These parameters may include task identifier, target location, path nodes, execution time window, speed control parameters, resource usage limits, obstacle avoidance constraints, and task completion feedback requirements. For UAV scenarios, task control parameters may include waypoint coordinates, flight altitude, flight speed, and obstacle avoidance radius; for warehouse robot scenarios, they may include cargo location number, travel path, loading status, and arrival time limit. These parameters are not limited to specific agent types, as long as they enable the corresponding agent to execute the corresponding sub-task according to the target task planning scheme.
[0080] Since different agents may use different communication protocols or control interfaces, the task control parameters need to be format-converted based on the instruction format of each agent to generate corresponding task execution instructions for each agent. Format conversion may include field mapping, unit conversion, coordinate transformation, path node encoding, and control command encapsulation. After format conversion, the task execution instructions are sent to the corresponding agents, enabling each agent to execute the corresponding subtask according to the target task planning scheme. During execution, the execution results of each agent can still serve as a source of subsequent real-time status information, continuing to participate in status feedback and task planning correction.
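Format conversion can be sketched as a lookup from agent type to a converter function. Both instruction formats below (a JSON waypoint message for UAVs and a semicolon-separated string with speed in mm/s for AGVs) are invented for illustration, as are the field names; they do not correspond to any real protocol.

```python
import json

def to_uav_instruction(params):
    """Hypothetical UAV format: JSON with 3D waypoints (field mapping plus altitude attachment)."""
    return json.dumps({
        "cmd": "GOTO_WAYPOINTS",
        "task": params["task_id"],
        "wps": [[x, y, params["altitude_m"]] for x, y in params["path"]],
        "speed": params["speed_mps"],
    })

def to_agv_instruction(params):
    """Hypothetical AGV format: 'MOVE;task;x,y>x,y;speed' with speed converted from m/s to mm/s."""
    nodes = ">".join(f"{x},{y}" for x, y in params["path"])
    return f"MOVE;{params['task_id']};{nodes};{int(params['speed_mps'] * 1000)}"

CONVERTERS = {"uav": to_uav_instruction, "agv": to_agv_instruction}

def build_instruction(agent_type, params):
    """Convert generic task control parameters into the instruction format of one agent type."""
    return CONVERTERS[agent_type](params)
```

Keeping the task control parameters generic and pushing protocol details into per-type converters means a new agent type only needs a new entry in `CONVERTERS`.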
[0081] Preferably, taking a park drone inspection scenario as an example, the task execution monitoring and closed-loop iteration process is explained. After the target task planning scheme is generated, the optimized task execution instructions are issued to the corresponding drones, and each drone executes the inspection task along its target execution path. During task execution, environmental state information, agent execution state information, and subtask execution state information are continuously collected and correlated by timestamp to form real-time state information, which covers the execution state of each drone and of each subtask at each timestamp. During execution, when a new obstacle is detected entering the originally planned flight area, the environmental state information changes, causing the real-time state vector to shift significantly relative to the state vector at the previous timestamp, so that the change exceeds the state anomaly threshold.
[0082] At this point, a state change is detected, and the environmental state information, agent execution state information, and subtask execution state information within the task execution area are re-collected. State feedback information is regenerated based on the updated real-time state information, and the current task allocation relationship and agent execution paths are adjusted based on this state feedback information. For example, a subtask originally executed by one drone is reassigned to another drone that is closer to the target area and has more remaining resources, and the corresponding adjusted execution path is regenerated at the same time. After the adjustment is completed, the new task execution instructions are reissued to the corresponding UAV so that it continues executing the task.
[0083] Throughout the execution process, state acquisition, state feedback, planning adjustment, and task execution are continuously repeated, forming a closed-loop iterative optimization. This optimization continues until the completion rate of all subtasks meets the specified requirement or the execution time meets the specified requirement. When the task is completed, the task planning process is terminated, and a task execution report is output.
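The closed loop described above can be sketched as a small driver that repeats acquisition, feedback, replanning, and dispatch until a termination test passes. The five callables and the `max_rounds` safety cap are hypothetical placeholders standing in for the corresponding modules.

```python
def closed_loop(collect, make_feedback, replan, dispatch, plan, done, max_rounds=1000):
    """Run the acquire -> feedback -> replan -> dispatch loop; return (final_plan, rounds).

    collect():            returns the current real-time state
    make_feedback(s, p):  returns state feedback (empty/falsey when the plan is on track)
    replan(p, fb):        returns an adjusted plan
    dispatch(p):          sends instructions for the adjusted plan to the agents
    done(s):              True when the termination condition is met
    """
    rounds = 0
    while rounds < max_rounds:
        state = collect()
        if done(state):
            break
        feedback = make_feedback(state, plan)
        if feedback:
            plan = replan(plan, feedback)
            dispatch(plan)
        rounds += 1
    return plan, rounds
```

Instructions are reissued only when feedback is non-empty, so an undisturbed mission runs on its current plan without extra traffic to the agents.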
[0084] In other implementations, when some agents enter a short-term weak communication connection state due to obstruction, strong interference, or cross-regional operation, they are not immediately judged as having failed and exited. Instead, a short-term state extrapolation result is constructed based on the agent's state at its most recent timestamps, the task execution instructions it has already received, and its target execution path. If the extrapolated expected location, expected resource consumption, and expected subtask completion still meet the corresponding task constraints, the agent's current subtask is retained, and a communication silence protection zone is set in the task planning of other agents to avoid duplicate allocation of the same subtask or path conflicts.
[0085] When the communication silence duration exceeds the preset silence time, or the extrapolated resource consumption and path deviation do not meet the task constraints, the corresponding subtask of the agent is marked as a subtask to be taken over. Based on the remaining execution area, completion percentage, and target time node of the subtask to be taken over, candidate takeover agents are selected from other agents. Subsequently, a takeover cost is generated by combining the current location, available resource parameters, and current task occupancy status of the candidate takeover agents. The agent whose takeover cost meets the conditions is determined as the target takeover agent, and a target task planning scheme after takeover is generated. This implementation method is applicable to scenarios with unstable communication links, such as tunnel inspection, underground storage, and mountain rescue.
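The retain-or-take-over decision for a silent agent can be sketched with a constant-rate extrapolation from its last report. Linear extrapolation, the field names, and the reduction of the task constraints to a single battery floor are assumptions for this example; a fuller version would also check the extrapolated path deviation.

```python
def extrapolate(last, now):
    """Project the last reported state forward to time now, assuming constant rates."""
    dt = now - last["t"]
    return {
        "silence": dt,
        "pos": (last["pos"][0] + last["vel"][0] * dt,
                last["pos"][1] + last["vel"][1] * dt),
        "battery": last["battery"] - last["drain_per_s"] * dt,
        "completion": min(1.0, last["completion"] + last["completion_per_s"] * dt),
    }

def decide(last, now, max_silence, min_battery):
    """Return ('retain' or 'takeover', extrapolated_state) for a silent agent."""
    estimate = extrapolate(last, now)
    if estimate["silence"] > max_silence or estimate["battery"] < min_battery:
        return "takeover", estimate
    return "retain", estimate
```

On `"takeover"`, the extrapolated completion gives a starting estimate of the remaining work when selecting a takeover agent.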
[0086] In other implementations, if the scope of the atomic task changes during task execution due to temporary closure, target location shift, movement of the work object, or changes in environmental boundaries, the boundary drift of the current subtask is calculated based on the environmental state information and the subtask execution state at consecutive timestamps. If the boundary drift exceeds the preset drift condition but does not affect all subtasks, the overall task planning scheme is not regenerated; instead, only the affected subtasks are marked as boundary drift subtasks.
[0087] For boundary-drift subtasks, the completed, incomplete, and newly added task areas are extracted, and a new set of local subtasks is regenerated based on the incomplete and newly added task areas. Then, by combining the current position, remaining resources, and current task occupancy status of each agent, candidate agents capable of undertaking the local subtasks are determined, and the local repartition cost for each candidate agent is calculated. Agents whose local repartition costs meet preset cost conditions are identified as the target executing agents, generating a local task planning scheme specifically for boundary-drift subtasks. This implementation is suitable for scenarios where task boundaries continuously change, such as moving target inspection, disaster site search and rescue, and dynamic warehouse sorting, but the overall task objective should not be frequently reset.
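Treating a subtask area as a set of grid cells gives a compact way to illustrate boundary drift and the extraction of the area that still needs work. The cell-set representation and the drift measure (size of the symmetric difference relative to the old area) are assumptions chosen for this sketch; the source does not specify how drift is computed.

```python
def boundary_drift(old_area, new_area):
    """Fraction of change between two cell sets, relative to the size of the old area."""
    return len(old_area ^ new_area) / len(old_area)

def local_subtask_cells(old_area, new_area, completed):
    """Cells that still need work: incomplete cells kept in the new area plus newly added cells."""
    incomplete = (old_area & new_area) - completed
    added = new_area - old_area
    return incomplete | added
```

Only the returned cells would be repartitioned among candidate agents, so completed work and unaffected subtasks are left untouched.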
[0088] Example 1: Taking a scenario of collaborative drone inspection in an industrial park as an example, the specific execution process of the method of this invention is explained. Six drones are configured to form the multi-agent set.
[0089] The capability parameters of each UAV include flight speed, load capacity, and perception accuracy; the parameters for each UAV are shown in Table 1.
[0090] Table 1. UAV Capability Parameters
[0091] The input natural language task command is: "Six drones must collaboratively complete the inspection of 10 equipment areas in an industrial park within 2 hours. During the inspection, obstacles must be avoided, and the inspection accuracy of each equipment area must be no less than 0.8mm. The drones must have at least 20% battery remaining." The global task and the set of subtasks are recorded accordingly. Each subtask corresponds to the inspection task of one equipment area and has a subtask requirement vector.
[0092] The components of the subtask requirement vector include the estimated execution time and the power required for the task. A task constraint set is also defined.
[0093] The task constraint set includes a constraint ensuring that the remaining battery power of each drone is not less than 20% and a constraint ensuring that the drones' flight paths do not overlap and avoid obstacles. A set of evaluation metrics is also defined.
[0094] Corresponding weighting coefficients are set for the evaluation metrics.
[0095] The state awareness update frequency is set to one update every 50 ms, and a state anomaly threshold is set. The large language model used is ChatGLM4, and the parameters of the improved PSO algorithm are preset.
[0096] A maximum number of iterations is also set. During the task decomposition phase, the global inspection task is input into ChatGLM4, where it is parsed and broken down into 10 subtasks. An initial subtask allocation scheme is then generated.
[0097] The initial subtask allocation scheme is evaluated using the rationality check function.
[0098] Since the preset rationality conditions are met, the initial task allocation scheme is retained, and a corresponding initial flight path is generated. During task execution, the environmental state, agent state, and subtask state are collected in real time. For example, while one of the drones is performing an inspection task, its real-time state includes its position and speed, a power consumption of 12%, and a sensing accuracy of 0.9mm. A weighted fusion method is used to generate the real-time state vector.
[0099] The components of the weighted fusion are taken from the collected environmental state, agent state, and subtask state.
[0100] When a new temporary construction obstacle is detected, the state change condition is satisfied.
[0101] A sudden state change is therefore detected, triggering the dynamic adjustment process. A multi-objective optimization function is constructed.
[0102] The terms of the function are evaluated from the current task completion rate, resource consumption, and execution time.
[0103] After the function value is calculated, the improved PSO algorithm is iterated 100 times to generate an optimized task allocation scheme and path planning scheme. In the optimized scheme, a subtask originally executed by one drone is reassigned to another drone, and the flight path is replanned so that the drone flies along a path that avoids the newly added obstacle area. Upon completion of the mission, the final values of the evaluation metrics are obtained.
[0104] The optimized objective function value is then obtained.
[0105] The method of the present invention is compared with the traditional PSO method and the ordinary LLM-driven method. The results are shown in Table 2.
[0106] Table 2. Comparison Results of Different Planning Methods
[0107] Example 2: In an industrial collaborative operation scenario, five industrial robots are deployed to collaboratively complete the welding, assembly, and inspection tasks of automotive parts. The input natural language task instruction is: "Five robots collaboratively produce 100 automotive parts within 4 hours, with a welding accuracy of no less than 0.1mm and an assembly pass rate of no less than 99%." After parsing this task instruction, the overall production task is broken down into welding, assembly, and inspection sub-tasks. An initial task planning scheme is generated based on the welding accuracy, assembly speed, inspection efficiency, current load, and available workstation status of each industrial robot. During execution, the operating status, fault status, load status, raw material supply status, and ambient temperature of each industrial robot are continuously collected. When a welding robot malfunctions or raw material supply is delayed, status feedback information is generated. Based on this feedback, the sub-tasks to be adjusted and candidate robots are redefined. The welding task corresponding to the faulty robot is assigned to a robot that meets the welding accuracy requirements and is idle. Simultaneously, the subsequent assembly and inspection sequence is adjusted to ensure continuous production. This implementation method can be used for multi-robot collaborative scheduling in dynamic production environments.
[0108] Example 3: In an intelligent logistics scheduling scenario, eight AGV robots are deployed to collaboratively complete the inbound, outbound, and sorting tasks of 500 items in a smart warehouse. The input natural language task instruction is: "Eight AGVs collaboratively complete the sorting and outbound of 500 items, requiring completion within 2 hours, with no path congestion and minimal energy consumption." After semantic parsing of the task instruction, the overall logistics task is broken down into inbound, outbound, and sorting sub-tasks. An initial task planning scheme is generated based on the AGV robots' load capacity, movement speed, current position, remaining battery power, and current task occupancy status. During task execution, real-time data is collected on AGV positions, item positions, aisle occupancy status, warehouse congestion status, and the completion rate of each sub-task. When local aisle congestion is detected or an AGV runs out of power, status feedback information is generated based on real-time status information, and the original task allocation and travel paths are adjusted. Specifically, AGVs near congested aisles can be reassigned to adjacent cargo areas to perform tasks, while detour paths are generated for affected AGVs, and the task load of each AGV is balanced. This implementation method can be used for dynamic path planning and task scheduling in scenarios such as peak warehousing periods or frequent changes in cargo location.
[0109] Example 4: In an emergency rescue collaborative scenario, four rescue drones and two ground rescue robots are deployed to collaboratively complete tasks such as searching for trapped personnel, delivering relief supplies, and surveying road conditions after an earthquake in a mountainous area. The input natural language task command is: "Multi-agent collaborative rescue in a mountainous earthquake zone, prioritizing the search for trapped personnel, followed by delivering relief supplies and surveying road conditions. The core rescue task must be completed within 3 hours, ensuring the safety of rescue equipment." After parsing this task command, the overall rescue task is broken down into search and rescue sub-tasks, supply delivery sub-tasks, and road survey sub-tasks. An initial task planning scheme is generated based on the search range, endurance, and delivery capabilities of the rescue drones, as well as the terrain adaptability of the ground rescue robots. During execution, aftershock information, landslide information, obstacle distribution, agent battery levels, fault status, and the location of trapped personnel are collected in real time. When a landslide occurs in a certain area, the drone search and rescue path is adjusted based on status feedback information, prioritizing coverage of trapped personnel in dangerous areas, and some ground rescue robot tasks are reassigned to surveying safe passages. This implementation method is suitable for emergency rescue scenarios where the environment is constantly changing and task priorities need to be dynamically adjusted.
[0110] Example 5: This example applies the method of the present invention to collaborative drone inspection of an industrial park. Before execution, a natural language task instruction is input, requiring multiple drones to complete the inspection of multiple equipment areas within a limited time, while meeting obstacle avoidance, remaining battery power, and inspection accuracy requirements. After the task is input, the large language model module first parses the global task and constraints, breaks the inspection task down into multiple sub-tasks, and generates an initial task allocation relationship and initial path planning based on the flight speed, remaining battery power, perception accuracy, and current position of each drone. This overall execution relationship is shown in Figure 2, which depicts the complete data flow from system initialization, task decomposition, and agent execution through state feedback to closed-loop optimization.
[0111] After the mission commences, each UAV flies to its designated equipment area according to the initial mission plan. During execution, the environmental status within the mission area, the execution status of each UAV, and the completion status of each sub-task are collected in real time. Environmental status includes obstacle locations, environmental interference, and terrain changes; UAV execution status includes position, speed, resource consumption, execution progress, and fault status; sub-task status includes completion rate, deviation, remaining time, and requirement matching degree. After denoising, outlier removal, and standardization, the above multi-source data is weighted and fused to generate a fused real-time state vector. Figure 3 shows the processing structure for real-time state acquisition, preprocessing, weighted fusion, and state change detection.
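As a non-limiting illustration, the preprocessing and weighted fusion described above may be sketched as follows. The specific choices here are assumptions rather than the method of the embodiment: outliers are rejected with a median/MAD test, denoising is a simple average over the sampling window, standardization uses externally supplied reference means and standard deviations, and fusion is a weighted concatenation of the per-source vectors. The names `clean_window` and `fuse_state` are hypothetical.

```python
import numpy as np


def clean_window(window, z_max=3.0):
    """Reduce a (n_samples, n_features) window to one cleaned feature vector."""
    window = np.asarray(window, dtype=float)
    median = np.median(window, axis=0)
    # Median absolute deviation; the small constant avoids a zero threshold.
    mad = np.median(np.abs(window - median), axis=0) + 1e-9
    # Outlier removal: keep readings within z_max robust standard deviations.
    keep = np.abs(window - median) <= z_max * 1.4826 * mad
    # Denoising: average the readings that survived in each feature column.
    return np.nanmean(np.where(keep, window, np.nan), axis=0)


def fuse_state(windows, ref_mean, ref_std, weights):
    """Standardize each source and concatenate the weighted results."""
    parts = []
    for name, window in windows.items():
        cleaned = clean_window(window)
        standardized = (cleaned - np.asarray(ref_mean[name])) / np.asarray(ref_std[name])
        parts.append(weights[name] * standardized)
    return np.concatenate(parts)


windows = {
    "env": [[1.0], [1.0], [1.0], [100.0]],   # last reading is a spike
    "agent": [[2.0, 4.0], [2.0, 4.0]],
}
fused = fuse_state(
    windows,
    ref_mean={"env": [0.0], "agent": [1.0, 4.0]},
    ref_std={"env": [2.0], "agent": [1.0, 1.0]},
    weights={"env": 0.4, "agent": 0.6},
)
```

In this sketch the spike of 100.0 in the environmental window is discarded before averaging, so it does not distort the fused vector.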
[0112] When a new temporary obstacle is detected near an inspection path, the difference between the fused real-time state vector and the state vector at the previous time exceeds a threshold, and state feedback information is generated. Closed-loop optimization processing then constructs a multi-objective optimization function based on the evaluation indicators, the initial allocation scheme, the initial path, and the task constraints, and solves for the optimized task allocation scheme and path planning scheme using an improved particle swarm optimization (PSO) algorithm. The processing procedure is shown in Figure 4. After the optimization results are generated, they are converted into executable path and task commands for each UAV and sent to the corresponding UAV to continue the inspection task. At the same time, the adjustment results are written into the task planning knowledge base as a reference for subsequent task planning. In this way, local adjustments can be made to the affected sub-tasks and flight paths without interrupting the entire task, maintaining the continuity of the multi-agent task execution process.
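As a non-limiting illustration, the threshold-based trigger for state feedback may be sketched as follows. The Euclidean norm as the difference measure, the function name `detect_state_change`, and the fields of the returned feedback record are assumptions made for illustration; the embodiment only specifies that the difference between consecutive state vectors is compared against a threshold.

```python
import numpy as np


def detect_state_change(current, previous, threshold):
    """Return a feedback record if the state changed by more than `threshold`.

    Returns None when the change stays within the threshold, meaning that
    no closed-loop optimization needs to be triggered.
    """
    delta = np.asarray(current, dtype=float) - np.asarray(previous, dtype=float)
    magnitude = float(np.linalg.norm(delta))  # assumed difference measure
    if magnitude <= threshold:
        return None
    return {
        "magnitude": magnitude,
        "delta": delta,
        # Index of the state component that changed the most.
        "dominant_index": int(np.argmax(np.abs(delta))),
    }


feedback = detect_state_change([0.2, 0.6, 0.0], [0.2, 0.1, 0.0], threshold=0.3)
quiet = detect_state_change([0.2, 0.6, 0.0], [0.2, 0.1, 0.0], threshold=0.6)
```

Here `feedback` is produced because the second component moved by about 0.5, while `quiet` is `None` because the same change stays under the larger threshold.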
[0113] Figure 5 is a system architecture diagram of a multi-agent task planning system provided in one embodiment of the present invention. As shown in Figure 5, this invention provides a multi-agent task planning system, comprising: a data acquisition unit for acquiring global task instructions, agent state information, and task constraint information; a scheme planning unit for generating a set of sub-tasks based on the global task instructions and the task constraint information, and executing an initial task planning scheme corresponding to the set of sub-tasks based on the agent state information; a feedback acquisition unit for acquiring real-time state information during the execution of the initial task planning scheme, and generating state feedback information based on the real-time state information; a scheme optimization unit for optimizing the initial task planning scheme based on the state feedback information to generate a target task planning scheme; and an instruction generation unit for generating task execution instructions corresponding to each agent based on the target task planning scheme.
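As a non-limiting illustration, the cooperation of the five units may be sketched as a single pass of the closed loop. The class `PlanningSystem`, its field names, and the rule that the initial plan is reused when no feedback is produced are assumptions made for illustration; each unit is represented only by a placeholder callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PlanningSystem:
    """Hypothetical wiring of the five units; each field stands for one unit."""
    acquire: Callable[[], tuple]                      # data acquisition unit
    plan: Callable[[Any, Any, Any], Any]              # scheme planning unit
    collect_feedback: Callable[[Any], Optional[Any]]  # feedback acquisition unit
    optimize: Callable[[Any, Any], Any]               # scheme optimization unit
    make_instructions: Callable[[Any], dict]          # instruction generation unit

    def run_once(self) -> dict:
        instruction, agent_states, constraints = self.acquire()
        initial_plan = self.plan(instruction, agent_states, constraints)
        feedback = self.collect_feedback(initial_plan)
        # Assumption: without feedback, the initial plan becomes the target plan.
        target_plan = initial_plan if feedback is None else self.optimize(initial_plan, feedback)
        return self.make_instructions(target_plan)


def instructions_per_agent(plan: dict) -> dict:
    """Group the planned areas by the agent assigned to them."""
    return {agent: sorted(a for a, g in plan.items() if g == agent) for agent in set(plan.values())}


system = PlanningSystem(
    acquire=lambda: ("inspect", {"uav1": "idle", "uav2": "idle"}, {"deadline_h": 2}),
    plan=lambda instr, states, cons: {"areaA": "uav1", "areaB": "uav2"},
    collect_feedback=lambda plan: {"blocked": "areaA"},
    optimize=lambda plan, fb: {**plan, fb["blocked"]: "uav2"},
    make_instructions=instructions_per_agent,
)
result = system.run_once()
```

In this sketch the stub feedback reports that `areaA` can no longer be served as planned, so the stub optimizer hands it to `uav2` before instructions are generated.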
[0114] The present invention also provides a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the multi-agent task planning method described above.
[0115] Those skilled in the art will understand that all or part of the steps in the methods of the above embodiments can be implemented by a program instructing related hardware. This program is stored in a storage medium and includes several instructions to cause a microcontroller, chip, or processor to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
[0116] The optional embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the embodiments of the present invention are not limited to the specific details described above. Within the scope of the technical concept of the embodiments of the present invention, various simple modifications can be made to the technical solutions of the embodiments of the present invention, and these simple modifications all fall within the protection scope of the embodiments of the present invention. It should also be noted that the various specific technical features described in the above specific embodiments can be combined in any suitable manner without contradiction. To avoid unnecessary repetition, the embodiments of the present invention will not further describe the various possible combinations.
[0117] Furthermore, the various embodiments of the present invention can be combined in any way; as long as such combinations do not depart from the spirit of the embodiments of the present invention, they should likewise be regarded as content disclosed by the embodiments of the present invention.
Claims
1. A multi-agent task planning method, characterized in that the method includes: acquiring global task instructions, agent state information, and task constraint information; generating a set of subtasks based on the global task instructions and the task constraint information, and executing an initial task planning scheme corresponding to the set of subtasks based on the agent state information; collecting real-time state information during execution of the initial task planning scheme, and generating state feedback information based on the real-time state information; optimizing the initial task planning scheme based on the state feedback information to generate a target task planning scheme; and generating task execution instructions corresponding to each agent based on the target task planning scheme.
2. The multi-agent task planning method according to claim 1, characterized in that generating a set of subtasks based on the global task instructions and the task constraint information includes: semantically parsing the global task instructions to extract the task objective, task object, and task execution requirements; splitting the task objective into multiple subtasks based on the task constraint information; determining the execution relationships between the subtasks, and extracting the execution capability requirements, resource requirements, and time nodes corresponding to each subtask; and generating the set of subtasks based on the execution relationships, the execution capability requirements, the resource requirements, and the time nodes.
3. The multi-agent task planning method according to claim 2, characterized in that executing the initial task planning scheme corresponding to the set of subtasks based on the agent state information includes: extracting, based on the agent state information, the execution capability parameters, available resource parameters, and current position parameters corresponding to each agent; matching the execution capability parameters with the execution capability requirements corresponding to each subtask to generate capability matching results; matching the available resource parameters with the resource requirements corresponding to each subtask to generate resource matching results; determining, based on the current position parameters and the task execution position corresponding to each subtask, the path cost for each agent to execute the corresponding subtask; and determining, based on the capability matching results, the resource matching results, and the path cost, the allocation relationship between each subtask and its corresponding agent, and generating and executing the initial task planning scheme based on the allocation relationship.
4. The multi-agent task planning method according to claim 1, characterized in that collecting real-time state information during execution of the initial task planning scheme includes: collecting, during execution of the initial task planning scheme, environmental state information within the task execution area, execution state information of each agent, and execution state information of each subtask, wherein the environmental state information includes any one or more of the following: obstacle location, environmental interference information, and task area change information; the execution state information of the agent includes any one or more of the following: agent position, running speed, resource consumption, execution progress, and fault status; and the execution state information of the subtask includes any one or more of the following: subtask completion degree, execution deviation, and remaining execution time; and correlating, based on acquisition time, the environmental state information, the execution state information of the agent, and the execution state information of the subtask to generate the real-time state information.
5. The multi-agent task planning method according to claim 4, characterized in that generating state feedback information based on the real-time state information includes: performing data alignment on the environmental state information, the execution state information of each agent, and the execution state information of each subtask in the real-time state information to generate a state data group corresponding to the same acquisition time; determining, based on the state data group, the environmental change, the agent state deviation, and the subtask execution deviation, respectively; fusing the environmental change, the agent state deviation, and the subtask execution deviation to generate a real-time state vector; and comparing the real-time state vector with the baseline state vector corresponding to the initial task planning scheme to generate state feedback information that characterizes the degree of deviation in task execution.
6. The multi-agent task planning method according to claim 1, characterized in that optimizing the initial task planning scheme based on the state feedback information to generate a target task planning scheme includes: determining, based on the state feedback information, the task allocation deviation, path execution deviation, and resource consumption deviation in the initial task planning scheme; constructing corresponding planning adjustment parameters based on the task allocation deviation, the path execution deviation, and the resource consumption deviation; adjusting, based on the planning adjustment parameters, the subtask allocation relationships and agent execution paths in the initial task planning scheme to generate candidate task planning schemes; and performing task constraint verification on the candidate task planning schemes, and determining a candidate task planning scheme that satisfies the task constraint information as the target task planning scheme.
7. The multi-agent task planning method according to claim 6, characterized in that adjusting, based on the planning adjustment parameters, the subtask allocation relationships and agent execution paths in the initial task planning scheme to generate candidate task planning schemes includes: determining, based on the planning adjustment parameters, the subtasks to be adjusted, the agents to be adjusted, and the corresponding adjustment triggering reasons; selecting, from the agents, candidate agents that meet the execution conditions based on the execution capability requirements, resource requirements, and time nodes corresponding to the subtasks to be adjusted; determining, based on the current position, available resource parameters, and current task occupancy status of each candidate agent, the cost for each candidate agent to undertake the subtask to be adjusted; identifying candidate agents whose undertaking cost meets a preset cost condition as target adjustment agents, and assigning the subtasks to be adjusted to the target adjustment agents; generating an adjusted execution path for each target adjustment agent based on the current position of the target adjustment agent, the task execution position of the subtask to be adjusted, and the environmental state information within the task execution area; and generating a candidate task planning scheme based on the adjusted subtask allocation relationships and the adjusted execution paths.
8. The multi-agent task planning method according to claim 1, characterized in that generating task execution instructions corresponding to each agent based on the target task planning scheme includes: analyzing the target task planning scheme to determine the target subtasks, task execution order, and target execution path for each agent; generating task control parameters corresponding to each agent based on the target subtasks, the task execution order, and the target execution path; converting the task control parameters based on the instruction format of each agent to generate the task execution instructions corresponding to each agent; and sending the task execution instructions to the corresponding agents so that each agent executes the corresponding subtask according to the target task planning scheme.
9. A multi-agent task planning system, characterized in that the system includes: a data acquisition unit for acquiring global task instructions, agent state information, and task constraint information; a scheme planning unit for generating a set of subtasks based on the global task instructions and the task constraint information, and executing an initial task planning scheme corresponding to the set of subtasks based on the agent state information; a feedback acquisition unit for acquiring real-time state information during execution of the initial task planning scheme, and generating state feedback information based on the real-time state information; a scheme optimization unit for optimizing the initial task planning scheme based on the state feedback information to generate a target task planning scheme; and an instruction generation unit for generating task execution instructions corresponding to each agent based on the target task planning scheme.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions that, when executed on a computer, cause the computer to perform the multi-agent task planning method as described in any one of claims 1-8.