A multi-unmanned aerial vehicle cooperative task planning method, system, device and medium
By integrating user task descriptions, external knowledge, and perception data, and utilizing large language models and diffusion models to generate optimal paths, the problem of insufficient high-level semantic parsing in multi-UAV mission planning is solved, enabling efficient collaborative task execution in dynamic environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2026-06-03
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies lack high-level semantic parsing in multi-UAV mission planning, making it difficult to adapt to dynamic environments. Furthermore, the complexity of multi-UAV collaborative mission planning increases exponentially with the number of UAVs, making it difficult for traditional methods to meet the needs of complex missions.
By fusing user task description text, external knowledge information, and perceptual data, the optimal subtask sequence is generated using a large language model, and security constraints are injected into the diffusion model to generate conflict-free optimal paths.
It improves the reliability of collaborative operation of multiple UAV intelligent agents in dynamic environments, enhances the ability to deeply understand complex environments, avoids redundancy and conflicts, and improves the dynamic obstacle avoidance capability of flight trajectories.
Figure CN122308410A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of unmanned aerial vehicle (UAV) technology, and in particular to a method, system, device, and medium for multi-UAV collaborative mission planning.
Background Technology
[0002] In recent years, Large Language Models (LLMs) have become a core driving component for autonomous intelligent agents. LLM-driven UAV agents can leverage the powerful general reasoning capabilities of LLMs to understand user commands, generate executable plans, and collaborate with human operators or other intelligent agents. Based on this, UAV agents can perform multiple navigation tasks in complex open environments. For example, language navigation utilizes LLMs to parse natural language commands and transform them into executable control interface commands, thereby enhancing scene perception and action planning capabilities.
[0003] Since multi-UAV mission planning is essentially a nondeterministic polynomial-hard (NP-hard) problem, its complexity increases exponentially with the number of UAVs. Existing planning methods, when facing multi-UAV collaborative missions, suffer from insufficient high-level semantic parsing and are difficult to adapt to dynamic environments.
[0004] Therefore, how to effectively plan multi-UAV missions to meet the needs of complex tasks in dynamic environments has become a technical problem that urgently needs to be solved by those skilled in the art.
Summary of the Invention
[0005] This invention provides a method, system, device, and medium for multi-UAV collaborative mission planning, which address the problem of how to use an LLM to collaboratively perceive multimodal information, generate accurate and safe paths, and improve the reliability of collaborative operation among multiple UAV intelligent agents.
[0006] To address the aforementioned technical problems, embodiments of the present invention provide a multi-UAV cooperative mission planning method, comprising: inputting the user task description text into a UAV knowledge base, which consists of scene information, a flight action library, and historical flight logs, for retrieval to obtain external knowledge information; acquiring perception data from the onboard sensors of the UAVs, and fusing the user task description text, the external knowledge information, and the perception data to generate a task description embedding vector; under the constraints of a conditional filtering network constructed from the flight action library, inputting the task description embedding vector into a preset large language model for processing, to generate an initial subtask sequence for the current task description; driving the large language model to filter the initial subtask sequence to obtain the optimal subtask for each UAV; injecting safety constraints established by a conflict-based search (CBS) algorithm into the denoising network of a diffusion model to obtain an improved diffusion model, and using the optimal subtasks to drive the improved diffusion model to output the optimal path for each UAV; and controlling each UAV to execute its corresponding task along the optimal path.
[0007] Furthermore, inputting the user task description text into the UAV knowledge base composed of scene information, the flight action library, and historical flight logs for retrieval to obtain external knowledge information includes: using the user task description text to retrieve, from the UAV knowledge base, the knowledge information whose similarity meets the first preset condition, which is then used as the external knowledge information.
[0008] Furthermore, before fusing the user task description text, the external knowledge information, and the perception data, the method includes the following steps: the perception data is input into a multimodal large model, and the weights of the modal projection function in the multimodal large model are dynamically adjusted using the embedding vector of the user task description text in order to align the modal features corresponding to the perception data. During the alignment process, a designed loss function is introduced to optimize the parameters of the multimodal large model, thereby obtaining the embedded representation of the perception data.
[0009] Furthermore, the process of generating the initial subtask sequence includes: constructing the conditional filtering network using the UAV flight action control functions, physical constraints, and rule constraints in the flight action library as indicators; inputting the task description embedding vector into the multi-head attention layer of the large language model for context awareness, thereby obtaining the global task intent; and, with reference to the historical flight logs and with the conditional filtering network as a constraint, driving the chain of thought of the large language model to reason about and decompose the global task intent, generating the initial subtask sequence.
[0010] Furthermore, driving the large language model to filter the initial subtask sequence to obtain the optimal subtask for each UAV includes: using semantic offset, trajectory distribution similarity, and task risk level as screening indicators, driving the large language model to verify each subtask in the initial subtask sequence, and determining the subtask whose verification result meets the second preset condition as the optimal subtask.
[0011] Furthermore, driving the large language model to filter the initial subtask sequence to obtain the optimal subtask for each UAV also includes: verifying the contextual semantic offset between each subtask in the initial subtask sequence and the global task intent to obtain the first verification result for each subtask; performing a similarity analysis on the current trajectory distribution and the historical trajectory distribution corresponding to each subtask to obtain the second verification result for each subtask; verifying the risk level of each subtask using the conditional filtering network to obtain the third verification result for each subtask; integrating the first, second, and third verification results to obtain the target verification result of each subtask; and determining the subtask whose target verification result satisfies the second preset condition as the optimal subtask.
[0012] Furthermore, the step of injecting the safety constraints established by the conflict-based search (CBS) algorithm into the denoising network of the diffusion model to obtain an improved diffusion model, and using the optimal subtask to drive the improved diffusion model to output the optimal path for each UAV, includes: constructing a conflict set based on the conflict-based search algorithm and encoding it as a mask vector; injecting the mask vector as a safety constraint into the denoising network of the diffusion model to obtain the improved diffusion model; encoding the optimal subtask as a task vector; and driving the improved diffusion model to iteratively sample from the pre-trained trajectory distribution using the task vector as the initial condition, while the denoising network is invoked to dynamically adjust the trajectory distribution until the optimal path is output.
[0013] Another embodiment of the present invention provides a multi-UAV cooperative mission planning system, comprising: a retrieval module, used to input the user task description text into the UAV knowledge base, which consists of scene information, a flight action library, and historical flight logs, and retrieve external knowledge information; a fusion module, used to acquire perception data from the UAVs' onboard sensors and fuse the user task description text, the external knowledge information, and the perception data to generate a task description embedding vector; a task allocation module, used to input the task description embedding vector into a preset large language model for processing under the constraints of a conditional filtering network constructed from the flight action library, thereby generating an initial subtask sequence for the current task description; a task filtering module, used to drive the large language model to filter the initial subtask sequence and obtain the optimal subtask for each UAV; a path planning module, used to inject the safety constraints established by the conflict-based search (CBS) algorithm into the denoising network of the diffusion model to obtain the improved diffusion model, and to use the optimal subtask to drive the improved diffusion model to output the optimal path for each UAV; and a task execution module, used to control each UAV to execute its corresponding task along the optimal path.
[0014] Another embodiment of the present invention provides a computer device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor executes the computer program to implement the multi-UAV cooperative mission planning method as described above.
[0015] In another embodiment of the present invention, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program, wherein when the device containing the computer-readable storage medium executes the computer program, the multi-UAV cooperative task planning method described above is implemented.
[0016] Compared with the prior art, the beneficial effects of the embodiments of the present invention include at least one of the following. First, the invention acquires external knowledge by retrieving it from the UAV knowledge base, ensuring the professionalism and contextual accuracy of task understanding and avoiding semantic bias. Second, it fuses user commands, external knowledge, and real-time multimodal perception data to generate task description embedding vectors, enhancing the ability to deeply understand complex environments. Third, under the constraints of a conditional filtering network, it uses a large language model to generate and filter subtask sequences, ensuring the rationality and feasibility of task allocation while effectively avoiding redundancy and conflicts. Finally, by injecting safety constraints into the diffusion model and driving it to generate optimal paths, it achieves physical conflict suppression and feasibility screening, improving both the dynamic obstacle avoidance capability of flight trajectories in multi-agent dynamic scenarios and the semantic analysis capability for complex tasks.
Attached Figure Description
[0017] Figure 1 is a schematic diagram of the multi-UAV collaborative task planning method in one embodiment of the present invention; Figure 2 is a schematic diagram of the collaborative process of each part in one embodiment of the present invention; Figure 3 is a schematic diagram of the scene perception part in one embodiment of the present invention; Figure 4 is a schematic diagram of the task allocation part in one embodiment of the present invention; Figure 5 is a schematic diagram of the trajectory generation part in one embodiment of the present invention; Figure 6 is a schematic diagram of the structure of a multi-UAV collaborative task planning system in one embodiment of the present invention; Figure 7 is a structural block diagram of a preferred embodiment of a computer device provided by the present invention.
Detailed Implementation
[0018] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The purpose of providing these embodiments is to make the disclosure of the present invention more thorough and comprehensive. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.
[0019] In the description of this application, the terms "first," "second," "third," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined with "first," "second," "third," etc., may explicitly or implicitly include one or more of that feature. In the description of this application, unless otherwise stated, "a plurality of" means two or more.
[0020] In the description of this application, it should be noted that, unless otherwise expressly specified and limited, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to fixed connections, detachable connections, or integral connections; they can refer to mechanical connections or electrical connections; they can refer to direct connections or indirect connections through an intermediate medium; and they can refer to the internal communication between two components. The terms "vertical," "horizontal," "left," "right," "upper," "lower," and similar expressions used herein are for illustrative purposes only and do not indicate or imply that the device or component referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the invention. The term "and / or" as used herein includes any and all combinations of one or more of the associated listed items. Those skilled in the art can understand the specific meaning of the above terms in this application based on the specific circumstances.
[0021] In the description of this application, it should be noted that, unless otherwise defined, all technical and scientific terms used in this invention have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in this specification is for the purpose of describing specific embodiments only and is not intended to limit the invention. Those skilled in the art can understand the specific meaning of the above terms in this application based on the specific circumstances.
[0022] Navigation task planning for multi-UAV systems in unknown open environments often presents the following challenges: 1. Heterogeneous perception signals and ambiguous user input make it difficult to unify user intent with scene information; 2. The complexity of multi-UAV task planning increases exponentially with the number of UAVs. Traditional methods, which establish optimization models based on task requirements and environmental constraints, are difficult to meet the needs of complex tasks in dynamic environments; 3. Language model-based task planning outputs lack physical modeling, resulting in problems such as unreachability, uncontrollability, and even violations of physical laws.
[0023] Based on this, one embodiment of the present invention provides a multi-UAV cooperative task planning method. Referring to Figure 1, which is a schematic diagram of the method according to one embodiment of the present invention, the method includes the following steps:
S1. Input the user task description text into the UAV knowledge base, which consists of scene information, a flight action library, and historical flight logs, for retrieval to obtain external knowledge information.
[0024] This embodiment takes multiple unmanned aerial vehicles (UAVs) / intelligent agents as an example and divides the task planning process into three main parts: scene perception, task allocation, and trajectory generation, as shown in Figure 2.
[0025] This step is performed in the scene perception part. Specifically, the user inputs a task description in natural language, yielding the user task description text. The user task description text is then used to retrieve, from the UAV knowledge base, the knowledge information whose similarity meets the first preset condition, and this is taken as the external knowledge information. The first preset condition is that the pieces of knowledge information in the UAV knowledge base whose vectors are most similar to the user task description text are retrieved, based on vector similarity (cosine similarity can be used). The retrieved content serves as supplementary context input to the multimodal information fusion model, i.e., as the external knowledge information.
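The similarity-based retrieval described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `cosine_similarity` and `retrieve_top_k`, the value `k=2`, and the three-dimensional toy embeddings are all assumptions introduced here, and a real system would obtain the vectors from a text encoder.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, knowledge_base, k=2):
    """Return the k knowledge entries most similar to the query (hypothetical helper)."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy knowledge base: (knowledge text, embedding). The embeddings are made up.
knowledge_base = [
    ("scene: riverbank landmark", [0.9, 0.1, 0.0]),
    ("action: hover", [0.0, 1.0, 0.0]),
    ("log: fly to the riverbank", [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]  # stand-in for the embedded user task description
external_knowledge = retrieve_top_k(query, knowledge_base, k=2)
print(external_knowledge)
```

Ranking by cosine similarity and keeping the top k entries is one common reading of "meets the first preset condition"; a fixed similarity threshold would be an equally plausible choice.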
[0026] Correspondingly, the UAV knowledge base consists of three parts: scene information, the flight action library, and historical flight logs. In some embodiments of this invention, the scene information is a set of triples, each comprising a semantic tag, a visual landmark, and a geospatial coordinate.
[0027] In each triple, the index identifies an individual scene element; the semantic tag is the set of semantic descriptions of that scene element, drawn from the set of all possible semantic tags; the visual landmark is the vector representation of the scene element in the visual feature space; and the geospatial coordinate describes the spatial location of the scene element. The visual landmark and the geospatial coordinate are each vectors of fixed dimension.
[0028] The flight action library is a set of quadruples, each comprising a control function, a physical feasibility constraint, a compliance constraint, and API function parameters.
[0029] Here, the control function is the closed-loop kinematic control function required for a common flight maneuver, that is, the control law corresponding to a maneuver such as "hovering", "straight flight", or "turning"; the physical feasibility constraint is derived from the dynamic and kinematic equations of the UAV, for example "maximum speed", "maximum acceleration", "maximum tilt angle", or "energy consumption limit"; the compliance constraint is abstracted from laws and regulations, for example "no-fly zone constraints", "maximum flight altitude", or "take-off and landing area permits"; and the final element holds the API function parameters of the corresponding entry.
[0030] The historical flight log is a set of quadruples, where the index identifies an individual mission or flight record. Each quadruple comprises: the user task description (text), such as "fly to the riverbank"; the flight log data (CSV format, including trajectory, attitude, waypoints, and time); the task allocation result; and the task execution evaluation result.
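As a rough illustration of the three-part knowledge base, the triples and quadruples above could be held in plain data classes. All class and field names below are hypothetical, and the three-component coordinate (latitude, longitude, altitude) is an assumption, since the source does not state the dimensions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SceneElement:
    """Scene information triple: semantic tags, visual landmark, geospatial coordinate."""
    semantic_tags: List[str]
    visual_landmark: List[float]                 # feature vector; dimension is an assumption
    geo_coordinate: Tuple[float, float, float]   # assumed (latitude, longitude, altitude)

@dataclass
class FlightAction:
    """Flight action quadruple: control law, physical limits, compliance rules, API parameters."""
    control_function: str                        # e.g. "hovering", "straight flight", "turning"
    physical_constraints: Dict[str, float]       # e.g. {"max_speed": 15.0}
    compliance_constraints: Dict[str, float]     # e.g. {"max_flight_altitude": 120.0}
    api_parameters: Dict[str, float]

@dataclass
class FlightLog:
    """Historical flight log quadruple."""
    task_text: str                               # e.g. "fly to the riverbank"
    log_csv_path: str                            # trajectory, attitude, waypoints, time
    allocation_result: str
    evaluation_result: str

@dataclass
class UAVKnowledgeBase:
    scenes: List[SceneElement] = field(default_factory=list)
    actions: List[FlightAction] = field(default_factory=list)
    logs: List[FlightLog] = field(default_factory=list)

kb = UAVKnowledgeBase()
kb.actions.append(FlightAction("hovering", {"max_speed": 15.0},
                               {"max_flight_altitude": 120.0}, {"duration_s": 5.0}))
kb.logs.append(FlightLog("fly to the riverbank", "logs/flight_001.csv", "U-01", "success"))
```

The numeric limits and the file path are placeholders chosen only to make the example concrete.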
[0031] S2. Obtain perception data from the UAV's onboard sensors, fuse the user task description text, external knowledge information, and perception data to generate a task description embedding vector.
[0032] It should be understood that the scene perception part combines user input, the external knowledge base, and scene information to unify task understanding with scene information, generating embedding vectors for subsequent task allocation. As shown in Figure 3, this embodiment uses a text encoder to fuse the external knowledge information obtained through retrieval with the multimodal scene information obtained through a multimodal encoder, yielding the required embedding vector.
[0033] Specifically, the free-format user task description text is converted by a text encoder into an embedding vector acceptable to the fusion operation. The system acquires perception data using a variety of heterogeneous sensors onboard the UAV, including GPS, IMU, LiDAR, an RGB camera, and a depth camera.
[0034] To semantically align these multimodal perception data, this embodiment preferably uses the cross-modal contrastive learning mechanism of a large multimodal model (CLIP) for preprocessing. Specifically, the perception data is input into the large multimodal model, and the weights of the modal projection function in the large multimodal model are dynamically adjusted using the embedding vector of the user task description text to align the modal features corresponding to the perception data. During the alignment process, a designed loss function is introduced to optimize the parameters of the large multimodal model, resulting in the embedded representation of the perception data. In this process, the raw input of each modality is passed through an independent feature extractor for that modality; a modal projection function projects each modality's features onto a unified semantic space; and the projected features of all m modalities, where m is the total number of modalities, are combined by the CLIP fusion operation in the CLIP fusion unit.
[0035] Furthermore, the loss function combines contrastive loss terms using weighting coefficients. Its components are: the contrastive loss terms; the negative sample pairs; the batch size; the weighting coefficients; a function that computes cosine similarity; a temperature parameter that adjusts the contrast; the text vector input by the user; the visual modality vector; and the state vector of the UAV.
[0036] For example, the user input "Fly over narrow passages, avoiding collisions with walls" is encoded as a text vector. The images captured by the RGB camera include visual landmarks at the entrance of the passage, while the LiDAR data and the depth camera capture spatial distance information within the passage, providing the passage depth and entrance information. These three elements together constitute the visual modality vector. At the same time, the geographic coordinate information provided by GPS and the UAV's own attitude information are encoded to yield the UAV state vector. During training, contrastive learning makes the similarity of the matching triplet (text vector, visual modality vector, UAV state vector) higher than that of other negative combinations, which strengthens the modal consistency between the "narrow passage" task instruction and the corresponding perceived scene and action execution. This yields the embedded representation of the multimodal perception data, which supports subsequent understanding of and reasoning about the intent of complex tasks.
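To make the contrastive objective concrete, the sketch below uses an InfoNCE-style loss, which is one standard form of contrastive loss built from cosine similarity and a temperature parameter. It is an assumption that the patent's loss has this form; the function names, the equal weights, and the temperature value 0.07 are illustrative choices, not values from the source.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def info_nce(anchors, positives, temperature=0.07):
    """Mean InfoNCE loss: anchors[i] should match positives[i];
    every other item in the batch acts as a negative sample."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / n

def alignment_loss(text, visual, state, w_visual=0.5, w_state=0.5, temperature=0.07):
    """Weighted sum of text-visual and text-state contrastive terms (assumed form)."""
    return (w_visual * info_nce(text, visual, temperature)
            + w_state * info_nce(text, state, temperature))

# Toy batch of two samples with 2-D embeddings.
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = alignment_loss(text, [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = alignment_loss(text, [[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

When the text, visual, and state vectors of each sample point the same way, the loss is close to zero; swapping the pairs makes it large, which is the behavior the training step relies on.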
[0037] Furthermore, after the embedded representation of the perception data is obtained, it is fused with the embedding vector of the user task description text and the external knowledge information using a multimodal information fusion model (a multimodal semantic codec) that operates on embedding vectors. In this embodiment, the multimodal information fusion model is essentially a UNITER-like multimodal Transformer codec structure. Its encoder part encodes the three types of modal input separately, and its decoder part performs multimodal interaction through a shared attention layer, finally generating the task description embedding vector. The task description embedding vector mainly comprises the main target and the detailed task description, and is used as input for the subsequent task allocation phase.
[0038] S3. Under the constraints of the conditional filtering network constructed from the flight action library, the task description embedding vector is input into the preset large language model for processing, and the initial subtask sequence of the current task description is generated.
[0039] After the fused task description embedding vector is generated, it is input into the task allocation part, which first performs the subtask generation operation shown in Figure 4. In this embodiment, this operation is preferably implemented using a large language model (LLM) and an attention architecture.
[0040] Specifically, the task description embedding vector is input into a multi-head attention layer within the large language model to perform context awareness, extract its contextual semantics, and obtain the global task intent. This operation aims to keep the agents' understanding of the task objective consistent. In this embodiment, a Transformer-based attention architecture is constructed within each agent, in which a query vector performs multi-head attention interaction with the agent's role-tendency embedding and its historical experience embedding. The resulting context-aware embedding represents that agent's individual interpretation of the task objective.
[0041] All the context-aware embeddings are aggregated through sparse graph attention to form a graph representation of the global task intent. This graph is shared among the agents and serves as a consensus representation of the task. In the aggregation: the global query vector, which can typically be generated from the task description embedding, is used to measure the relevance of the different agents; the key vector associated with each agent is used to compute its degree of matching with the global query; the scaling factor (the square root of the attention dimension) prevents the dot product from becoming too large and affecting the gradient of Sparsemax; and Sparsemax is a sparse normalization function. The resulting attention weight vector represents the importance of each agent's interpretation in forming the global task intent.
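The sparse attention weighting can be illustrated with a small Sparsemax routine. The `sparsemax` function follows the standard sorting-based projection onto the probability simplex; `aggregate_intent`, and in particular the choice to form the global intent as a weighted sum of agent embeddings, is an assumption made here for illustration, since the source formula is not available.

```python
import math

def sparsemax(scores):
    """Sparsemax: Euclidean projection of the scores onto the probability simplex.
    Unlike softmax, it can assign exactly zero weight to low-scoring entries."""
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    k_support = 0
    support_sum = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        if 1.0 + k * z > cumsum:   # entry k is still inside the support
            k_support = k
            support_sum = cumsum
    tau = (support_sum - 1.0) / k_support
    return [max(z - tau, 0.0) for z in scores]

def aggregate_intent(global_query, agent_keys, agent_embeddings):
    """Score each agent by scaled dot product with the global query,
    normalize with sparsemax, and take a weighted sum (assumed aggregation)."""
    scale = math.sqrt(len(global_query))
    scores = [sum(q * k for q, k in zip(global_query, key)) / scale
              for key in agent_keys]
    weights = sparsemax(scores)
    dim = len(agent_embeddings[0])
    intent = [sum(w * emb[j] for w, emb in zip(weights, agent_embeddings))
              for j in range(dim)]
    return weights, intent

weights, intent = aggregate_intent(
    global_query=[1.0, 0.0],
    agent_keys=[[2.0, 0.0], [1.0, 1.0], [-1.0, 0.5]],
    agent_embeddings=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
print(weights, intent)
```

Because Sparsemax zeroes out weakly matching agents, the consensus representation depends only on the interpretations that are relevant to the global query.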
[0042] Next, hierarchical reasoning is performed, and a conditional filtering network is designed to constrain the feasible boundaries of the agent's flight. In this embodiment, the conditional filtering network is constructed using the UAV flight action control functions, the physical constraints, and the rule constraints (i.e., compliance constraints) in the flight action library as indicators. The network is a multilayer perceptron filtering network used to determine whether a given action is feasible given the capabilities of the current platform and the perceived environment.
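The kind of decision the conditional filtering network makes can be illustrated with a hand-written rule check. Note that this is a deliberately simplified stand-in: the source describes a learned multilayer perceptron, whereas the sketch below hard-codes the physical and compliance limits. The function name, dictionary keys, and numeric limits are all hypothetical.

```python
def is_action_feasible(action, physical_limits, compliance_rules):
    """Rule-based stand-in for the feasibility filter (the source uses an MLP)."""
    # Physical feasibility: limits derived from the UAV's dynamics.
    if action["speed"] > physical_limits["max_speed"]:
        return False
    # Compliance: limits abstracted from regulations.
    if action["altitude"] > compliance_rules["max_flight_altitude"]:
        return False
    x, y = action["target_xy"]
    for xmin, ymin, xmax, ymax in compliance_rules["no_fly_zones"]:
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return False
    return True

physical_limits = {"max_speed": 15.0}               # m/s, illustrative value
compliance_rules = {
    "max_flight_altitude": 120.0,                   # m, illustrative value
    "no_fly_zones": [(0.0, 0.0, 10.0, 10.0)],       # axis-aligned boxes (xmin, ymin, xmax, ymax)
}

ok_action = {"speed": 8.0, "altitude": 20.0, "target_xy": (50.0, 50.0)}
too_fast = {"speed": 20.0, "altitude": 20.0, "target_xy": (50.0, 50.0)}
in_no_fly = {"speed": 8.0, "altitude": 20.0, "target_xy": (5.0, 5.0)}

print(is_action_feasible(ok_action, physical_limits, compliance_rules))
```

A learned filter would replace these explicit comparisons with a network that takes the action, platform state, and perception embedding as input, but the accept or reject outcome plays the same role in constraining subtask generation.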
[0043] With reference to the historical flight logs, the conditional filtering network is used as a constraint to drive the chain of thought of the large language model to reason about and decompose the global task intent, generating the initial subtask sequence. In this initial subtask generation phase, each agent relies on a predefined task language template. The template is encoded as a structured JSON object and sent uniformly to the evaluation and screening phase described below. The template may include the following information: "Role", such as the current platform number or division-of-labor type; "Task", such as semantic tags like inspection, identification, tracking, or navigation; "Target location coordinates"; and "Path restrictions", such as obstacle avoidance boundaries and flight angle restrictions, which can be bounded by the conditional filtering network.
[0044] Example scenario: the user inputs the command "U-03, please conduct an aerial inspection of the photovoltaic plant area, and simultaneously identify whether the photovoltaic panels are damaged and whether there are pedestrians at the exit." The global task description embedding vector contains three subtasks: "Inspecting the photovoltaic plant area", "Identifying damaged photovoltaic panels", and "Navigating to the east exit for moving target detection". The template then generates the initial subtask sequence for the current agent. The following is a sample JSON for "Navigating to the east exit for moving target detection":
{
  "Agent_ID": "U-03",
  "Role": "Anomaly_Detection",
  "Task": "Motion_Surveillance",
  "Target_Location": { "Latitude": 35.6881, "Longitude": 139.6902, "Altitude": 20 },
  "Path_Constraints": {
    "Avoid_Areas": [ {"semantic": "pedestrian_zone"}, {"semantic": "vehicle_lanes"} ],
    "Restricted_Heading_Range": [30, 150]
  },
  "Expected_Output": "Trajectory log of moving objects at Exit_East"
}
S4. Drive the large language model to filter the initial subtask sequence and obtain the optimal subtask for each UAV.
[0045] This step involves evaluating and filtering the initial subtask sequence in the task allocation section. The goal is to determine the optimal subtask to be executed by each UAV agent. Based on this, this embodiment uses semantic offset, trajectory distribution similarity, and task risk level as filtering indicators to drive a large language model to verify each subtask in the initial subtask sequence. Subtasks whose verification results meet the second preset condition are determined as the optimal subtasks.
[0046] Specifically, during verification, the semantic offset between each subtask in the initial subtask sequence and the global task intent is checked to obtain the first verification result for each subtask, which is a score of that subtask's semantic offset. For example, take the subtask "UAV U-03 performs a motion monitoring task at the east exit (Exit_East) and outputs a trajectory log of moving targets". Offset case: if the task type is changed from Motion_Surveillance to Area_Patrol, this indicates a semantic offset, and the LLM outputs a score of 0.2.
[0047] Furthermore, a similarity analysis is performed on the current trajectory distribution and the historical trajectory distribution corresponding to each subtask to obtain the second verification result for each subtask. This result is computed by a trajectory reachability verification function. Specifically, in this embodiment, the current trajectory distribution corresponding to the subtask is retrieved from the flight action library and the embedding vectors of the perception data, and is compared with the historical trajectory distribution to verify its physical reachability.
[0048] For example, the similarity can be calculated using the KL divergence between the current trajectory distribution and historical distributions. A similarity threshold of 0.6 can be set based on the distribution of historical benign sample data. Subtasks with similarities below this threshold indicate that the trajectory is physically unreachable and may pose risks such as collisions or inability to avoid obstacles.
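A minimal sketch of the KL-divergence check follows. The source states only that similarity can be computed from the KL divergence and compared with a threshold such as 0.6; mapping the divergence to a similarity in (0, 1] with `exp(-KL)` is an assumption introduced here, as are the function names and the toy histograms.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def trajectory_similarity(current, historical):
    """Map KL divergence to a similarity in (0, 1]; exp(-KL) is an assumed mapping."""
    return math.exp(-kl_divergence(current, historical))

def is_reachable(current, historical, threshold=0.6):
    """A subtask passes when its trajectory distribution is close enough to history."""
    return trajectory_similarity(current, historical) >= threshold

# Toy histograms over three spatial bins (illustrative values).
historical = [0.5, 0.3, 0.2]
close_to_history = [0.45, 0.35, 0.2]
far_from_history = [0.05, 0.05, 0.9]

print(is_reachable(close_to_history, historical))
print(is_reachable(far_from_history, historical))
```

KL divergence is asymmetric, so the order of the arguments matters; here the current distribution is measured against the historical one.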
[0049] Furthermore, the conditional filtering network is used to verify the risk level of each subtask, yielding the third verification result for each subtask. The risk can be evaluated, for example, from the perspectives of redundancy, conflict, and flight hazard. If the same UAV is assigned to "filming warehouse A" twice, this indicates a redundancy risk, and the LLM outputs a score of 0.7. If the paths of UAV-01 and UAV-02 intersect in a no-fly zone, this indicates a conflict risk between the two agents, and the LLM outputs a score of 0.1. If a UAV is flying at low altitude through a congested area, this indicates a flight hazard, and the LLM outputs a score of 0.2.
[0050] Through the above verification process, the first verification result, the second verification result, and the third verification result are integrated to obtain the target verification result of each subtask. The subtask whose target verification result satisfies the second preset condition is determined as the optimal subtask. Specifically, it is expressed as follows: The second presupposition condition is: (Note) The permutation obtained by sorting the scores from highest to lowest: Select the optimal subtask with the highest score. Used for trajectory generation of subsequent agents.
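The integration and ranking step can be illustrated with a short sketch. The text does not reproduce the integration formula, so the weighted sum with equal weights used here is an assumption, as is the convention that a higher score is better for all three verification results. `SubtaskScores`, `target_score`, and `select_optimal` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtaskScores:
    name: str
    semantic: float    # first verification result (semantic offset score)
    trajectory: float  # second verification result (trajectory similarity)
    risk: float        # third verification result (risk level score)


def target_score(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Assumed integration rule: weighted sum of the three verification results."""
    w1, w2, w3 = weights
    return w1 * scores.semantic + w2 * scores.trajectory + w3 * scores.risk


def select_optimal(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Sort candidates from highest to lowest score and return (best, ranking)."""
    if not candidates:
        raise ValueError("at least one candidate subtask is required")
    ranked = sorted(candidates, key=lambda s: target_score(s, weights), reverse=True)
    return ranked[0], ranked
```

With the offset case described earlier, a candidate whose task type drifted to Area_Patrol (semantic score 0.2) would rank below an otherwise identical Motion_Surveillance candidate.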
[0051] Steps S5 to S6: The security constraints established based on the conflict search algorithm are injected into the denoising network of the diffusion model to obtain the improved diffusion model. The optimal subtask is used to drive the improved diffusion model to output the optimal path for each UAV, and each UAV is controlled to execute its corresponding task along the optimal path.
[0052] This step is the trajectory generation part. After the optimal subtask is assigned to each UAV, conflict-free optimal path generation is performed as illustrated by the example in Figure 5. This embodiment improves upon the diffusion model. It should be understood that the diffusion model avoids manual feature engineering, can fit complex spatial distributions end-to-end, and generalizes better when the user task or scene changes. The diffusion model involves a denoising process during execution. To address the physical conflicts that can appear in paths generated by the diffusion model (such as paths passing through obstacles or physical collisions), a conflict search algorithm (Conflict-Based Search, CBS) is introduced to improve this denoising process.
[0053] Specifically, firstly, a conflict set is constructed based on the conflict search algorithm and encoded as a mask vector. The mask vector is then injected as a security constraint into the denoising network of the diffusion model to obtain an improved diffusion model.
[0054] Then, the optimal subtask is encoded into a task vector, and the improved diffusion model is driven to iteratively sample from the pre-trained trajectory distribution using the task vector as the initial condition. At the same time, the denoising network is called to dynamically adjust the trajectory distribution until the optimal path is output through convergence.
[0055] The task vector is expressed as follows: [formula missing] In the formula, the task encoder produces the task vector of agent i, which serves as the conditional distribution of the initial state for trajectory generation and effectively enables semantic control and target alignment of the diffusion model.
[0056] Specifically, during the iteration process, under the guidance of the task encoding, a number of candidate path samples are generated from the trained trajectory distribution. Each trajectory consists of a sequence of flight states, where each flight state describes the flight status (position, velocity, attitude, etc.) at a given moment. This can be represented by the following formula: [formula missing] In the formula, the decoding term represents the path decoding process of the diffusion model; the initialization samples are drawn from standard Gaussian noise and are used for diverse trajectory generation; and the remaining term represents the conditional trajectory distribution.
[0057] In some embodiments of the present invention, a large amount of historical trajectory data from historical flight logs can be used to pre-train the diffusion model. The goal of the pre-training process is to adjust the parameter θ so that the distribution estimated by the diffusion model approximates the true distribution as closely as possible. After pre-training, the estimated distribution encapsulates knowledge about flight trajectories, and the generation process consists of sampling from this distribution.
[0058] As an example, a process for injecting security constraints is given. First, a conflict set (covering, for example, spatial overlap, temporal conflict, physical collision, and obstacles) is constructed using the conflict search algorithm and is represented as follows: [formula missing] In the formula, each element indicates that a conflict occurs between agents i and j at time t.
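A conflict set of this kind can be sketched as follows. This is a simplified stand-in under stated assumptions, not the conflict search algorithm itself: it only performs a pairwise distance check at each time step, each conflict is stored as a hypothetical tuple (i, j, t), trajectories are assumed to be sampled on a common time grid, and the `safety_radius` parameter is an assumption not given in the text. Obstacle conflicts are not covered.

```python
import itertools

import numpy as np


def build_conflict_set(trajectories, safety_radius=2.0):
    """Return the set of (i, j, t) where agents i and j are closer than safety_radius.

    trajectories: dict mapping an agent id to an array of shape (T, 3) holding
    the planned position at each time step.
    """
    conflicts = set()
    for i, j in itertools.combinations(sorted(trajectories), 2):
        a = np.asarray(trajectories[i], dtype=float)
        b = np.asarray(trajectories[j], dtype=float)
        horizon = min(len(a), len(b))  # compare only the overlapping time steps
        dist = np.linalg.norm(a[:horizon] - b[:horizon], axis=1)
        for t in np.flatnonzero(dist < safety_radius):
            conflicts.add((i, j, int(t)))
    return conflicts
```

An empty result means no pair of agents violates the assumed separation at any sampled time step.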
[0059] Second, the conflict set is encoded into a mask vector, which is input to the reverse denoising process of the diffusion model. By dynamically injecting safety constraints in each iteration of the denoising process, the diffusion model can adjust the path distribution and gradually converge to the optimal trajectory, i.e., an optimal path that is executable and free of collisions and conflicts.
[0060] Based on the task vector and the security constraint mask vector, the process of adjusting and optimizing the path can be represented as follows: [formula missing] In the formula, the output is the denoised flight state at a given moment, and the noise is dynamically corrected during the gradual denoising process so that the state tends toward a conflict-free state.
[0061] Let the optimized minimum conflict set be given by: [formula missing] The above formula means that, at this point, a conflict-free optimal path is obtained. At the same time, the corresponding optimal path is output in combination with the target verification results of the subtasks. The optimal path is sent to each UAV platform to control each UAV to execute its corresponding task along the optimal path. The path scheme is also recorded in the trajectory database to provide data support and a basis for iterative optimization in subsequent tasks.
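The mask-guided denoising loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the learned denoising network is replaced by a simple pull toward a reference trajectory, the mask marks (agent, time step) pairs that appear in the current conflict set, and the correction pushes masked states away from the other agents. The function names and the `safety_radius`, `rate`, and `push` parameters are hypothetical.

```python
import numpy as np


def find_conflicts(x, safety_radius):
    """x has shape (agents, T, 3); return a list of (i, j, t) below the safety radius."""
    conflicts = []
    for i in range(x.shape[0]):
        for j in range(i + 1, x.shape[0]):
            dist = np.linalg.norm(x[i] - x[j], axis=1)
            conflicts.extend((i, j, int(t)) for t in np.flatnonzero(dist < safety_radius))
    return conflicts


def encode_mask(conflicts, n_agents, horizon):
    """Encode the conflict set as a binary mask over (agent, time step)."""
    mask = np.zeros((n_agents, horizon))
    for i, j, t in conflicts:
        mask[i, t] = 1.0
        mask[j, t] = 1.0
    return mask


def apply_safety_mask(x, mask, push):
    """Push masked states away from the mean position of the other agents."""
    others_mean = (x.sum(axis=0, keepdims=True) - x) / (x.shape[0] - 1)
    away = x - others_mean
    norm = np.linalg.norm(away, axis=-1, keepdims=True)
    away = away / np.maximum(norm, 1e-9)  # exactly coincident states get no push
    return x + push * mask[..., None] * away


def guided_sampling(reference, safety_radius=1.0, steps=40, rate=0.2, seed=0):
    """Start from Gaussian noise and alternate a denoising step with mask injection."""
    if reference.shape[0] < 2:
        raise ValueError("at least two agents are required")
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(reference.shape)
    for _ in range(steps):
        x = x + rate * (reference - x)  # stand-in for the learned denoising network
        conflicts = find_conflicts(x, safety_radius)
        mask = encode_mask(conflicts, x.shape[0], x.shape[1])
        x = apply_safety_mask(x, mask, push=0.6 * safety_radius)
    return x, find_conflicts(x, safety_radius)
```

For two agents, each masked correction increases the pairwise separation beyond the safety radius, so the loop ends with an empty conflict set. With more agents this toy correction carries no such guarantee.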
[0062] During the execution of the mission, each UAV agent observes the environment and generates messages in the form of [TimeS, Obj, Alarm, Loc_X, Loc_Y, Loc_Z], including UAV status messages, scene messages, mission execution messages, emergency messages, etc. Then, the messages are sent to the scene perception part through a dedicated communication channel to achieve multimodal scene perception.
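The message format can be sketched as a small record type. The field order follows [TimeS, Obj, Alarm, Loc_X, Loc_Y, Loc_Z] from the description; the field types (a float timestamp, a string object label, a Boolean alarm flag, and float coordinates) and the class name `AgentMessage` are assumptions, since the text does not specify them.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class AgentMessage:
    time_s: float  # TimeS: observation timestamp (assumed to be in seconds)
    obj: str       # Obj: observed object or message subject
    alarm: bool    # Alarm: whether the observation raises an alarm
    loc_x: float   # Loc_X
    loc_y: float   # Loc_Y
    loc_z: float   # Loc_Z

    def to_list(self):
        """Serialize in the order [TimeS, Obj, Alarm, Loc_X, Loc_Y, Loc_Z]."""
        return list(astuple(self))

    @classmethod
    def from_list(cls, fields):
        """Parse a six-element list in the same field order."""
        if len(fields) != 6:
            raise ValueError("expected [TimeS, Obj, Alarm, Loc_X, Loc_Y, Loc_Z]")
        time_s, obj, alarm, x, y, z = fields
        return cls(float(time_s), str(obj), bool(alarm), float(x), float(y), float(z))
```

A fixed field order keeps messages from different agents directly comparable when they reach the scene perception part.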
[0063] The following example further illustrates this. The user issues the command: "Conduct aerial inspections of the photovoltaic plant area, and simultaneously identify whether the photovoltaic panels are damaged and whether there are pedestrians at the exit."
[0064] The subtask allocation results are as follows: "U-01 and U-02 inspect the photovoltaic plant area," "U-03 and U-04 identify damaged photovoltaic panels," and "U-05 navigates to the east exit for moving target detection." In the scene perception part, obstacles such as iron towers, cars, and pedestrians are detected, and their positions are updated in real time using a multimodal information fusion model. For trajectory generation, there are five UAVs (U-01 to U-05) in the scene, and conflict-free trajectories are generated for them based on the LLM's task allocation results. Each UAV then departs from the takeoff platform. U-01 and U-02 traverse the plant map in a zigzag pattern at a height of 20 m, avoiding iron towers and other UAVs performing tasks. U-03 and U-04 inspect 32 photovoltaic panels arranged in two rows for damage at a speed of 1 m/s, carefully avoiding pedestrians and cars. U-05 navigates to the east exit and uses the camera's human motion detection module to determine whether pedestrians are present.
[0065] In summary, this embodiment of the invention collects multi-source sensor data, aligns and maps it using CLIP's contrastive learning mechanism, and fuses the mapped result with user input and external knowledge information retrieved from the UAV knowledge base to obtain a global task description embedding. Next, a large language model with an attention mechanism performs inference analysis on this embedding and assigns optimal subtasks to each UAV under multiple constraints. Based on the optimal subtasks, UAV path planning is performed with an improved diffusion model, in which a conflict search algorithm optimizes the denoising process. The result is a set of conflict-free paths for multi-agent execution, suitable for path exploration in dynamic scenarios.
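The zigzag traversal used by U-01 and U-02 can be illustrated with a simple waypoint generator. This is a sketch of a generic back-and-forth coverage pattern over a rectangular area, not the trajectory produced by the diffusion model; the rectangle bounds, the `lane_spacing` parameter, and the function name are assumptions. Only the 20 m altitude comes from the example.

```python
def zigzag_waypoints(x_min, x_max, y_min, y_max, lane_spacing, altitude=20.0):
    """Return (x, y, z) waypoints sweeping the rectangle lane by lane.

    Consecutive lanes are flown in opposite directions, producing a zigzag.
    """
    if lane_spacing <= 0:
        raise ValueError("lane_spacing must be positive")
    waypoints = []
    left_to_right = True
    y = y_min
    while y <= y_max + 1e-9:  # small tolerance for accumulated float error
        start, end = (x_min, x_max) if left_to_right else (x_max, x_min)
        waypoints.append((start, y, altitude))
        waypoints.append((end, y, altitude))
        left_to_right = not left_to_right
        y += lane_spacing
    return waypoints
```

In a two-UAV split such as the one in the example, each UAV could be given a disjoint half of the rectangle, which is one simple way to keep the two sweeps from overlapping.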
[0066] One embodiment of the present invention provides a multi-UAV collaborative task planning system. Referring to Figure 6, which illustrates the structure of a multi-UAV cooperative mission planning system according to one embodiment of the present invention, the system includes: the retrieval module M1, used to input the user's task description text into the UAV knowledge base, which consists of scene information, a flight action library, and historical flight logs, and to retrieve external knowledge information; the fusion module M2, used to acquire perception data from the UAV's onboard sensors and to fuse the user task description text, the external knowledge information, and the perception data to generate a task description embedding vector; the task allocation module M3, used to input the task description embedding vector into a preset large language model for processing under the constraints of the conditional filtering network constructed from the flight action library, and to generate an initial subtask sequence of the current task description; the task filtering module M4, used to drive the large language model to filter the initial subtask sequence and obtain the optimal subtask for each UAV; the path planning module M5, used to inject the safety constraints established according to the conflict search algorithm into the denoising network of the diffusion model to obtain the improved diffusion model, and to use the optimal subtask to drive the improved diffusion model to output the optimal path for each UAV; and the task execution module M6, used to control each UAV to execute the corresponding task along the optimal path.
[0067] As shown in Figure 7, this embodiment of the invention also provides a computer device. Figure 7 is a structural block diagram of a preferred embodiment of a computer device provided by the present invention. The computer device includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor. When the processor executes the computer program, it implements the method described above.
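The data flow between modules M1 to M6 can be sketched as a thin pipeline. This is only an illustration of how the modules hand results to one another; each module is represented by a caller-supplied function, and the class name `PlanningPipeline`, the field names, and the argument types are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PlanningPipeline:
    retrieve: Callable[[str], Any]            # M1: knowledge base retrieval
    fuse: Callable[[str, Any, Any], Any]      # M2: fusion into a task description embedding
    allocate: Callable[[Any], Any]            # M3: initial subtask sequence
    filter_subtasks: Callable[[Any], Any]     # M4: optimal subtask per UAV
    plan_paths: Callable[[Any], Any]          # M5: optimal path per UAV
    execute: Callable[[Any], None]            # M6: dispatch paths to the UAVs

    def run(self, task_text, perception_data):
        """Run the six modules in order and return the planned paths."""
        knowledge = self.retrieve(task_text)
        embedding = self.fuse(task_text, knowledge, perception_data)
        initial_subtasks = self.allocate(embedding)
        optimal_subtasks = self.filter_subtasks(initial_subtasks)
        paths = self.plan_paths(optimal_subtasks)
        self.execute(paths)
        return paths
```

Keeping each module behind a plain callable makes it possible to replace a single stage, for example the path planner, without touching the others.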
[0068] Preferably, the computer program can be divided into one or more modules / units (such as computer program 1, computer program 2, ...), and the one or more modules / units are stored in the memory and executed by the processor to complete the present invention. The one or more modules / units can be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program in the computer device.
[0069] The processor can be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor, or the processor can be any conventional processor. The processor is the control center of the terminal device, connecting various parts of the terminal device through various interfaces and lines.
[0070] The memory mainly includes a program storage area and a data storage area. The program storage area can store the operating system, applications required for at least one function, etc., while the data storage area can store related data, etc. Furthermore, the memory can be a high-speed random access memory, or a non-volatile memory, such as a plug-in hard drive, a SmartMedia Card (SMC), a Secure Digital (SD) card, and a Flash Card, or other volatile solid-state storage devices.
[0071] It should be noted that the aforementioned terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the structural block diagram in Figure 7 is merely an example of a terminal device and does not constitute a limitation on the terminal device. It may include more or fewer components than shown, or combine certain components, or use different components. Those skilled in the art will also understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0072] Accordingly, embodiments of the present invention provide a computer-readable storage medium, the computer-readable storage medium including a stored computer program, wherein, when the computer program is executed, it controls the device where the computer-readable storage medium is located to perform the steps in the method of the above embodiments, for example, steps S1 to S6 shown in Figure 1.
[0073] The technical features and effects of the multi-UAV collaborative mission planning system proposed in this embodiment of the invention are the same as those of the multi-UAV collaborative mission planning method proposed in this embodiment of the invention, and will not be repeated here.
[0074] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent should be determined by the appended claims.
Claims
1. A multi-UAV cooperative task planning method, characterized in that it comprises: inputting the user's task description text into the UAV knowledge base, which consists of scene information, a flight action library, and historical flight logs, for retrieval to obtain external knowledge information; acquiring perception data from the onboard sensors of the UAV, and fusing the user task description text, the external knowledge information, and the perception data to generate a task description embedding vector; under the constraints of the conditional filtering network constructed from the flight action library, inputting the task description embedding vector into a preset large language model for processing to generate an initial subtask sequence of the current task description; driving the large language model to filter the initial subtask sequence to obtain the optimal subtask for each UAV; injecting the security constraints established based on the conflict search algorithm into the denoising network of the diffusion model to obtain the improved diffusion model, and using the optimal subtask to drive the improved diffusion model to output the optimal path for each UAV; and controlling each UAV to execute its corresponding task along the optimal path.
2. The multi-UAV cooperative mission planning method as described in claim 1, characterized in that inputting the user task description text into the UAV knowledge base consisting of scene information, a flight action library, and historical flight logs for retrieval to obtain external knowledge information comprises: using the user task description text to retrieve, from the UAV knowledge base, knowledge information whose similarity meets the first preset condition, which is then used as the external knowledge information.
3. The multi-UAV cooperative mission planning method as described in claim 1, characterized in that, before fusing the user task description text, the external knowledge information, and the perception data, the method comprises: inputting the perception data into a multimodal large model, and dynamically adjusting the weights of the modal projection function in the multimodal large model using the embedding vector of the user task description text in order to align the modal features corresponding to the perception data; and, during the alignment process, introducing a designed loss function to optimize the parameters of the multimodal large model, thereby obtaining the embedded representation of the perception data.
4. The multi-UAV cooperative mission planning method as described in claim 1, characterized in that the process of generating the initial subtask sequence comprises: constructing the conditional filtering network using the flight action control functions, physical constraints, and rule constraints of the UAV in the flight action library as indicators; inputting the task description embedding vector into the multi-head attention layer of the large language model for context awareness, thereby obtaining the global task intent; and, with reference to the historical flight logs, driving the chain of thought of the large language model, with the conditional filtering network as a constraint, to reason about and decompose the global task intent, generating the initial subtask sequence.
5. The multi-UAV cooperative mission planning method as described in claim 1, characterized in that driving the large language model to filter the initial subtask sequence to obtain the optimal subtask for each UAV comprises: using semantic offset, trajectory distribution similarity, and task risk level as screening indicators, driving the large language model to verify each subtask in the initial subtask sequence, and determining the subtask whose verification results meet the second preset condition as the optimal subtask.
6. The multi-UAV cooperative mission planning method as described in claim 5, characterized in that driving the large language model to filter the initial subtask sequence to obtain the optimal subtask for each UAV further comprises: verifying the offset of the contextual semantics between each subtask in the initial subtask sequence and the global task intent to obtain the first verification result corresponding to each subtask; performing a similarity analysis on the current trajectory distribution and the historical trajectory distribution corresponding to each subtask to obtain the second verification result corresponding to each subtask; verifying the risk level of each subtask using the conditional filtering network to obtain the third verification result for each subtask; integrating the first verification result, the second verification result, and the third verification result to obtain the target verification result of each subtask; and determining the subtask whose target verification result satisfies the second preset condition as the optimal subtask.
7. The multi-UAV cooperative mission planning method as described in claim 1, characterized in that, The step of injecting security constraints established according to the conflict search algorithm into the denoising network of the diffusion model to obtain an improved diffusion model, and using the optimal subtask to drive the improved diffusion model to output the optimal path corresponding to each UAV, includes: A conflict set is constructed based on the conflict search algorithm and encoded as a mask vector. The mask vector is then injected as a security constraint into the denoising network of the diffusion model to obtain the improved diffusion model. The optimal subtask is encoded as a task vector, and the improved diffusion model is driven to iteratively sample from the pre-trained trajectory distribution using the task vector as the initial condition. At the same time, a denoising network is invoked to dynamically adjust the trajectory distribution until the optimal path is output.
8. A multi-UAV collaborative mission planning system, characterized in that it comprises: a retrieval module, used to input the user's task description text into the UAV knowledge base, which consists of scene information, a flight action library, and historical flight logs, and to retrieve external knowledge information; a fusion module, used to acquire perception data from the UAV's onboard sensors and to fuse the user task description text, the external knowledge information, and the perception data to generate a task description embedding vector; a task allocation module, used to input the task description embedding vector into a preset large language model for processing under the constraints of a conditional filtering network constructed from the flight action library, thereby generating an initial subtask sequence of the current task description; a task filtering module, used to drive the large language model to filter the initial subtask sequence and obtain the optimal subtask for each UAV; a path planning module, used to inject the safety constraints established according to the conflict search algorithm into the denoising network of the diffusion model to obtain the improved diffusion model, and to use the optimal subtask to drive the improved diffusion model to output the optimal path for each UAV; and a task execution module, used to control each UAV to execute the corresponding task along the optimal path.
9. A computer device, characterized in that it includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor executes the computer program to implement the multi-UAV cooperative mission planning method as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein when the device containing the computer-readable storage medium executes the computer program, it implements the multi-UAV cooperative mission planning method as described in any one of claims 1 to 7.