Multi-unmanned aerial vehicle cooperative task allocation method and device for complex task scene
Through a causal-driven end-to-end algorithm architecture, efficient, robust, and globally optimal decision-making for UAV collaborative task allocation is achieved, solving the problems of insufficient state representation and conflict resolution in complex task scenarios, and improving the task execution efficiency of large-scale heterogeneous clusters.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- AERONAUTICS RES INST OF CHINA
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-23
Figure CN122261237A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of unmanned aerial vehicle (UAV) scheduling and intelligent optimization technology, and more specifically, to a method and apparatus for multi-UAV collaborative task allocation in complex task scenarios.
Background Technology
[0002] Existing multi-UAV collaborative task allocation technologies mostly rely on traditional single-graph representation and correlation analysis, which fail to capture high-order spatiotemporal collaborative patterns between entities and cannot accurately distinguish the essential causal dependencies between tasks. This results in insufficient global state representation and an explosive increase in decision-making complexity. Furthermore, existing solutions rely excessively on centralized architectures and a one-way "offline training, online execution" model. This not only restricts the large-scale deployment of heterogeneous clusters because of communication and computing power bottlenecks, but also leads to weak generalization and severely delayed dynamic responses in extreme out-of-distribution scenarios, such as sudden emergency tasks or UAV malfunctions. In addition, traditional conflict resolution mechanisms are prone to getting trapped in local optima and triggering secondary conflicts. Moreover, the overall system lacks closed-loop iteration and feedback verification based on high-fidelity simulation. Consequently, existing algorithms generally suffer from low allocation efficiency, poor matching accuracy, and weak robustness in engineering applications under strongly coupled, multi-constrained, and extremely dynamic environments.
[0003] Therefore, the present invention provides a method and apparatus for multi-UAV collaborative task allocation in complex task scenarios to address the above-mentioned technical problems.
Summary of the Invention
[0004] This disclosure aims to address the shortcomings of existing technologies by providing a method and apparatus for multi-UAV collaborative task allocation in complex task scenarios. The present invention adopts a causal-driven end-to-end algorithm architecture, which breaks through the inherent limitations of traditional correlation-based data-driven algorithms and achieves efficient, robust, and globally optimal task allocation in extremely dynamic, strongly constrained, and large-scale heterogeneous scenarios.
[0005] To achieve the above objectives, the present disclosure proposes the following technical solutions:
[0006] In a first aspect, embodiments of this disclosure propose a multi-UAV collaborative task allocation method for complex task scenarios, comprising the following steps:
[0007] S1. Acquire multi-source heterogeneous data, which includes full-dimensional state data, full-element attribute data of the task, and full-domain dynamic environmental constraint data collected in real time by UAV onboard sensors; after standardizing the multi-source heterogeneous data, construct a spatiotemporal heterogeneous hypergraph containing UAV nodes, task nodes, and environmental constraint nodes, and use a spatiotemporal heterogeneous hypergraph convolutional network for feature encoding to output a low-dimensional dense representation vector representing the global physical state.
[0008] S2. Based on the low-dimensional dense representation vector and the historical task dataset containing physical execution time and resource consumption, calculate the causal information of physical entities to extract the macro-causal structure and construct a causal directed acyclic graph; perform constraint pruning by quantifying the causal intervention effect, and calculate the robustness evaluation index of the allocation scheme based on the counterfactual inference world model to screen the allocation benchmark framework and redundancy backup strategy.
[0009] S3. The heterogeneous UAV cluster is divided into multiple cooperative subgroups using the K-means++ clustering algorithm, and a two-layer federated learning architecture of subgroup local layer and global aggregation layer is constructed. The local matching degree representation matrix is output through the heterogeneous graph attention network in the subgroup local layer, and a global cross-domain cooperative matching representation matrix is generated in the global aggregation layer.
[0010] S4. Based on the cross-scenario causal invariance law, construct a meta-task set covering complex physical scenarios. Use a model-agnostic meta-learning framework to perform offline meta-training on a multi-agent reinforcement learning network to obtain an initial meta-policy model. In the online dynamic allocation stage, combine the real-time state vector with the global cross-domain collaborative matching representation matrix, and generate multi-UAV allocation decisions through policy network reasoning and a distributed consensus mechanism.
[0011] S5. Based on the allocation decision, the causal directed acyclic graph, and the global dynamic environmental constraint data, detect physical resource conflicts, temporal conflicts, spatial collision conflicts, and communication coordination conflicts of the UAV cluster; use a distributed Nash bargaining game mechanism and a causal constraint backtracking mechanism to resolve the conflicts, and perform pre-simulation verification and closed-loop iterative optimization on the resolved solution in a high-fidelity digital twin, and finally output and execute UAV task allocation and flight control commands.
[0012] As a preferred embodiment of the present invention, the construction of the spatiotemporal heterogeneous hypergraph and the feature encoding using the spatiotemporal heterogeneous hypergraph convolutional network specifically include:
[0013] The hyperedges in the spatiotemporal heterogeneous hypergraph connect any number of heterogeneous nodes. The weight of any hyperedge is determined by the cosine similarity of the feature vectors between the associated entities and the strength of the physical entity association. The dynamic evolution of the hyperedge weight over time is calculated by combining the time decay coefficient.
[0014] The spatiotemporal heterogeneous hypergraph convolutional network aggregates the spatiotemporal correlation features of nodes and hyperedges based on the normalized spatiotemporal heterogeneous hypergraph Laplacian matrix. The Laplacian matrix maps the actual communication topology of the UAV cluster in the current physical space, the geographical location dependence of task execution, and the spatial constraints of environmental obstacles.
[0015] One-dimensional temporal convolution kernels are used to perform convolution operations on the time series of node features to smooth sensor noise during continuous flight of UAVs and capture the time-varying evolution of UAV flight trajectories.
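To illustrate the smoothing effect of a one-dimensional temporal convolution on a node feature time series, the following minimal Python sketch applies a fixed averaging kernel along the time axis. The function name `temporal_conv`, the kernel values, and the sample data are assumptions made for illustration; in the network described above the kernel would be learned rather than fixed.

```python
def temporal_conv(series, kernel):
    """Valid-mode 1-D convolution (cross-correlation form) along the time axis."""
    k = len(kernel)
    return [
        sum(kernel[j] * series[t + j] for j in range(k))
        for t in range(len(series) - k + 1)
    ]


# Hypothetical noisy sensor reading: a single spike in an otherwise flat signal.
noisy_speed = [0.0, 0.0, 3.0, 0.0, 0.0]
# Assumed fixed averaging kernel of width 3 (a trained network learns these values).
averaging_kernel = [1.0 / 3.0] * 3
smoothed = temporal_conv(noisy_speed, averaging_kernel)
```

The spike is spread over neighboring time steps, which is the noise-smoothing behavior the paragraph describes; stacking such kernels over several feature channels gives the temporal part of the encoder.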
[0016] As a preferred technical solution of the present invention, the extraction of macroscopic causal structure, construction of causal directed acyclic graph, and constraint pruning specifically include:
[0017] The task characteristics and the actual physical execution state are concatenated to form a micro state space. The causal information of the task system at different scales is calculated, and the macro state mapping function is determined by maximizing the causal information.
[0018] The causal directed acyclic graph is constructed based on the essential causal dependencies between tasks. The weights of the causal edges are calculated based on the partial derivatives of the completion quality of subsequent tasks with respect to the completion quality of preceding tasks in historical batch data, as well as the normalized difference in actual physical execution time.
[0019] By combining front-door and back-door adjustments, the causal intervention effect of task allocation decisions on global task effectiveness is quantified, and constraints corresponding to spurious correlations with causal intervention effects less than a preset intervention effect threshold are removed to compress the physical allocation solution space dimension.
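The pruning and DAG steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the intervention effects are taken as already estimated (the front-door and back-door adjustment itself is not implemented here), and the names `prune_constraints`, `causal_order`, the constraint labels, and the threshold value are hypothetical.

```python
from graphlib import TopologicalSorter


def prune_constraints(effects, threshold):
    """Keep only constraints whose estimated intervention effect reaches the threshold."""
    return {name: effect for name, effect in effects.items() if effect >= threshold}


def causal_order(edges):
    """Return a topological order of tasks from (cause, effect) edges.

    graphlib raises CycleError if the edges do not form a DAG.
    """
    predecessors = {}
    for cause, effect in edges:
        predecessors.setdefault(cause, set())
        predecessors.setdefault(effect, set()).add(cause)
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical precomputed intervention effects for three candidate constraints.
effects = {"c1": 0.42, "c2": 0.03, "c3": 0.10}
kept = prune_constraints(effects, threshold=0.1)

# Hypothetical causal chain between three tasks.
order = causal_order([("scout", "strike"), ("strike", "assess")])
```

Removing low-effect constraints before the allocation search shrinks the space the later decision stages have to explore, while the topological order gives an execution sequence that respects the causal dependencies.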
[0020] As a preferred embodiment of the present invention, generating a global cross-domain collaborative matching representation matrix in the two-layer federated learning architecture of the subgroup local layer and the global aggregation layer specifically includes:
[0021] Independent subgroup computing nodes encrypt their local model parameters using a homomorphic encryption algorithm at the subgroup local layer, and only upload the encrypted model parameters to the global aggregation layer;
[0022] The global aggregation layer uses a weighted federated average algorithm to decrypt and aggregate the received model parameters to generate global collaborative model parameters.
[0023] A global collaborative heterogeneous graph is constructed based on the parameters of the global collaborative model, and cross-subgroup collaborative features are encoded through a global heterogeneous graph attention network to generate the global cross-domain collaborative matching representation matrix.
[0024] An 8-bit integer quantization scheme and knowledge distillation are used to make the representation model lightweight, and the lightweight model is deployed on the airborne edge computing device of each individual UAV.
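As a rough illustration of the aggregation and quantization steps, the Python sketch below averages subgroup parameter vectors with weights and then maps the result to 8-bit integers. Several points are assumptions rather than details from the disclosure: the weights are taken to be subgroup sizes, the homomorphic encryption and decryption are omitted, the quantization is a simple symmetric scheme, and the names `federated_average` and `quantize_int8` are hypothetical.

```python
def federated_average(subgroup_params, subgroup_sizes):
    """Weighted average of parameter vectors; weights are assumed to be subgroup sizes."""
    total = sum(subgroup_sizes)
    dim = len(subgroup_params[0])
    return [
        sum(params[k] * size for params, size in zip(subgroup_params, subgroup_sizes)) / total
        for k in range(dim)
    ]


def quantize_int8(values):
    """Symmetric 8-bit quantization: returns integer codes in [-127, 127] and the scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # fall back to 1.0 for all-zero input
    return [round(v / scale) for v in values], scale


# Two hypothetical subgroups with 1 and 3 UAVs and two model parameters each.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
codes, scale = quantize_int8(global_params)
restored = [c * scale for c in codes]  # dequantized approximation of global_params
```

Larger subgroups pull the global parameters toward their local values, and the integer codes plus one scale factor are what would be stored on a resource-limited onboard device.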
[0025] As a preferred embodiment of the present invention, the offline meta-training of the multi-agent reinforcement learning network using a model-agnostic meta-learning framework specifically includes:
[0026] An action space is constructed in the reinforcement learning network. The action matrix elements in the action space are subject to the task uniqueness constraint that each task is assigned to only one UAV, and the UAV load constraint that the task assigned to a single UAV does not exceed its physical maximum load.
[0027] A reward function is constructed in the reinforcement learning network. The reward function includes a global task performance reward, an execution success rate reward that satisfies the physical spatiotemporal window constraint, an energy consumption penalty for actual remaining battery life reduction, and a causal constraint violation penalty based on the causal directed acyclic graph determination.
[0028] Based on the sampled physical trajectory data, the policy loss and value loss are calculated, and inner-loop gradient updates and outer-loop meta-evaluations are performed to obtain the initial meta-policy model.
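The action constraints and the four-part reward described above can be sketched as follows. This is a minimal illustration: the binary assignment matrix layout, the function names `is_feasible` and `reward`, and the weight values are assumptions, and the disclosure does not state how the reward terms are weighted.

```python
def is_feasible(assignment, task_load, uav_capacity):
    """Check the two action constraints; assignment[i][j] == 1 means UAV i takes task j."""
    n_tasks = len(task_load)
    # Task uniqueness: every task is assigned to exactly one UAV.
    each_task_once = all(sum(row[j] for row in assignment) == 1 for j in range(n_tasks))
    # Load constraint: the load assigned to a UAV does not exceed its maximum.
    within_capacity = all(
        sum(a * load for a, load in zip(row, task_load)) <= capacity
        for row, capacity in zip(assignment, uav_capacity)
    )
    return each_task_once and within_capacity


def reward(performance, success_rate, energy_used, causal_violations,
           weights=(1.0, 1.0, 0.5, 2.0)):
    """Two reward terms minus two penalty terms; the weights are hypothetical."""
    w_perf, w_succ, w_energy, w_causal = weights
    return (w_perf * performance + w_succ * success_rate
            - w_energy * energy_used - w_causal * causal_violations)


# Two UAVs, three tasks: UAV 0 takes tasks 0 and 2, UAV 1 takes task 1.
plan = [[1, 0, 1], [0, 1, 0]]
ok = is_feasible(plan, task_load=[2.0, 3.0, 1.0], uav_capacity=[3.0, 3.0])
score = reward(performance=0.8, success_rate=0.9, energy_used=0.2, causal_violations=1)
```

Infeasible action matrices can be masked out before sampling, so the policy only ever chooses among assignments that satisfy both constraints.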
[0029] As a preferred embodiment of the present invention, the online dynamic allocation phase generates multi-UAV allocation decisions through policy network reasoning and a distributed consensus mechanism, specifically including:
[0030] In a steady-state scenario without sudden operational conditions, the allocation decision for each UAV is obtained by weighting the action probability output by the policy network with the value of the global cross-domain collaborative matching representation matrix.
[0031] In dynamic scenarios of equipment physical failure or sudden changes in the environmental wind field, local sensor trajectory data is collected to calculate the fine-tuning loss. The gradient update step size is adaptively adjusted according to the intensity of the change to perform fast fine-tuning and generate a new allocation decision that is adapted to the current physical change scenario.
[0032] A single UAV interacts with neighboring UAVs within physical communication distance via a low-latency communication link to exchange local decision information, calculates a consensus decision value for allocation decisions, and finally confirms the allocation decision when the consensus decision value is greater than a preset consensus threshold.
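The consensus confirmation step can be illustrated with a simple agreement ratio. The disclosure does not specify how the consensus decision value is computed, so the sketch below assumes it is the fraction of participants (the UAV itself plus its reachable neighbors) that propose the same decision; the function names and the threshold of 0.6 are hypothetical.

```python
def consensus_value(own_decision, neighbor_decisions):
    """Fraction of participants (self plus neighbors) that agree with own_decision."""
    votes = [own_decision] + list(neighbor_decisions)
    return votes.count(own_decision) / len(votes)


def confirm(own_decision, neighbor_decisions, threshold=0.6):
    """Confirm the decision only when the consensus value exceeds the threshold."""
    return consensus_value(own_decision, neighbor_decisions) > threshold


# A UAV proposes task_3; three neighbors in communication range report their views.
accepted = confirm("task_3", ["task_3", "task_3", "task_1"])
rejected = confirm("task_3", ["task_1", "task_1"])
```

When the value stays at or below the threshold, the UAV would keep exchanging information or fall back to conflict resolution instead of committing to the decision.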
[0033] As a preferred embodiment of the present invention, the method of using a distributed Nash bargaining game mechanism to resolve conflicts and performing pre-simulation verification in a high-fidelity digital twin specifically includes:
[0034] For the detected resource conflicts, spatial collision conflicts, and communication coordination conflicts, the marginal benefit, which includes task dynamic priority, matching degree, environmental threat risk coefficient, and real physical energy consumption cost, is used as the bargaining benefit function of the game participants. The UAVs involved in the conflict solve the bargaining solution by maximizing the benefit product through point-to-point communication.
[0035] The high-fidelity digital twin is constructed by integrating the UAV flight dynamics model, the sensor mathematical model, and a virtual-real data synchronization interface; a digital mirror is generated by synchronizing real physical scene data into the digital twin; a comprehensive risk assessment value, which includes the probability of mission physical failure, the probability of delay, and the probability of insufficient battery life, is calculated using preset risk weight coefficients; the allocation scheme is adjusted when the set risk threshold is exceeded.
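A two-UAV version of the bargaining step and the weighted risk score can be sketched as below. The sketch assumes that two UAVs split one contested resource, that each UAV's benefit is linear in its share, and that the bargaining solution is found by a grid search over the share; the composition of the marginal benefit from priority, matching degree, threat, and energy cost is not modeled. All names, weights, and numbers are hypothetical.

```python
def nash_bargain(benefit_a, benefit_b, fallback_a=0.0, fallback_b=0.0, steps=1000):
    """Share of a contested resource for UAV A that maximizes the product of gains.

    Each gain is the benefit from the received share minus a fallback (disagreement) payoff.
    Returns None if no split leaves both UAVs at or above their fallback.
    """
    best_share, best_product = None, -1.0
    for k in range(steps + 1):
        share = k / steps
        gain_a = benefit_a * share - fallback_a
        gain_b = benefit_b * (1.0 - share) - fallback_b
        if gain_a >= 0 and gain_b >= 0 and gain_a * gain_b > best_product:
            best_share, best_product = share, gain_a * gain_b
    return best_share


def risk_score(p_failure, p_delay, p_low_battery, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of the three risk probabilities; the weights are hypothetical."""
    return weights[0] * p_failure + weights[1] * p_delay + weights[2] * p_low_battery


even_split = nash_bargain(4.0, 2.0)                   # no fallback payoffs
tilted_split = nash_bargain(4.0, 2.0, fallback_a=1.0)  # A has a better alternative
needs_adjustment = risk_score(0.1, 0.3, 0.2) > 0.15    # assumed risk threshold of 0.15
```

With zero fallback payoffs the split is even regardless of the benefit scale, and raising one UAV's fallback shifts the split in its favor, which is the characteristic behavior of a Nash bargaining solution.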
[0036] In a second aspect, embodiments of this disclosure propose a multi-UAV collaborative task allocation device for complex task scenarios, the device comprising:
[0037] The global multi-source data acquisition and representation module is used to collect full-dimensional state data, task and environmental data from UAV onboard sensors, perform standardized processing, construct a spatiotemporal heterogeneous hypergraph and perform feature encoding, and output a low-dimensional dense representation vector representing the global physical state.
[0038] The causal emergence decoupling and counterfactual inference module is used to extract the macro-causal structure based on the causal emergence algorithm to construct a causal directed acyclic graph, perform constraint pruning through intervention effect quantification, and build a counterfactual inference world model to screen the allocation baseline framework and redundancy backup strategy.
[0039] The hierarchical federated heterogeneous matching degree representation module is used to construct a two-layer federated learning architecture. It completes the matching degree encoding between the UAV and the task and generates a global cross-domain collaborative matching representation matrix through a heterogeneous graph attention network.
[0040] The generalized causal meta-reinforcement learning decision module is used to construct a meta-task set and perform offline meta-training of the reinforcement learning network. In the online stage, it combines real-time state features to generate allocation decisions through fine-tuning and a distributed consensus mechanism.
[0041] The distributed conflict resolution and digital twin optimization module is used to detect physical and logical conflicts of UAVs. It resolves conflicts through a distributed Nash bargaining game mechanism and a causal constraint backtracking mechanism, and performs pre-simulation verification and closed-loop iterative optimization in a high-fidelity digital twin to output UAV task allocation and flight control commands.
[0042] In a third aspect, this disclosure proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the above-described multi-UAV collaborative task allocation method for complex task scenarios.
[0043] In a fourth aspect, embodiments of this disclosure propose a computer-readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned multi-UAV collaborative task allocation method for complex task scenarios.
[0044] In summary, the present invention has the following beneficial effects:
[0045] Firstly, this invention proposes a spatiotemporal heterogeneous hypergraph representation technology, which overcomes the limitation of traditional graph representations that can only describe simple pairwise relationships between entities. Through full data acquisition and hypergraph modeling, it accurately captures high-order collaboration, spatiotemporal dependence, and dynamic evolution patterns among three types of entities: drones, tasks, and environmental constraints. Combined with feature encoding from a spatiotemporal hypergraph convolutional network, it achieves a low-dimensional, dense representation of the entire state, effectively avoiding the loss of key state information and providing standardized input with high-dimensional and strong representational capabilities for subsequent decision-making.
[0046] Secondly, this invention introduces a causal emergence algorithm and a counterfactual inference mechanism to accurately distinguish between essential causal dependencies and spurious correlations between tasks. Through quantification of causal intervention effects and constraint pruning, it compresses the solution space of high-dimensional task allocation problems, significantly reducing decision complexity. Simultaneously, by using counterfactual inference to pre-judge risks and select optimal solutions, it improves the success rate of task execution, effectively addressing the core pain points of explosive decision complexity and insufficient global optimality of solutions in traditional algorithms.
[0047] Thirdly, this invention constructs a two-layer federated learning architecture of "subgroup local layer - global aggregation layer," combining K-means++ clustering and homomorphic encrypted parameter transmission to achieve distributed task allocation for a thousand-unit heterogeneous UAV swarm, reducing communication data volume and controlling decision latency to the millisecond level. Through lightweight optimization of model quantization and knowledge distillation, the number of representation model parameters is reduced, allowing direct deployment on airborne edge computing devices to achieve fully distributed autonomous decision-making, effectively avoiding computing power bottlenecks and single-point failure risks.
[0048] Fourth, this invention proposes an out-of-distribution generalization causal meta-reinforcement learning decision framework. Based on the cross-scenario causal invariance law, a large-scale meta-task set is constructed and trained offline, enabling the initial meta-policy model to quickly adapt to unfamiliar extreme scenarios with 1-5 step gradient updates. Faced with out-of-distribution conditions such as new emergency tasks and equipment failures, only a small amount of real-time data for 10 decision steps is needed to complete model fine-tuning, solving the defects of weak generalization and slow response of traditional algorithms in dynamic scenarios.
[0049] Fifth, this invention constructs a distributed Nash bargaining game mechanism without a central node to address resource, spatial collision, and communication coordination conflicts. It automatically achieves global optimal consensus through bargaining solutions that maximize marginal returns, improving the success rate of conflict resolution. Furthermore, it can still achieve local resolution even in scenarios with partial communication interruptions, significantly enhancing robustness. For temporal conflicts, a causal constraint backtracking mechanism is adopted to ensure that allocation satisfies essential dependencies, effectively avoiding secondary conflicts and achieving an organic balance between robustness and global optimality.
[0050] Sixth, this invention proposes a digital twin closed-loop iterative optimization system. By constructing a 1:1 high-fidelity digital twin, it achieves autonomous generation and simulation verification of extreme scenarios, improving offline training efficiency and significantly reducing the cost and risk of real flight testing. In the online execution phase, through real-time virtual-real mapping and feedback iteration, the execution results of real scenarios are fed back to the algorithm model, forming a full lifecycle optimization closed loop of "virtual training - online decision-making - virtual-real verification - feedback iteration," continuously improving the algorithm's generalization ability.
[0051] Seventh, this invention, through the deep integration of end-to-end technologies, possesses cross-scenario migration, large-scale expansion, and dynamic environment adaptation capabilities, and can be widely applied to various complex operational scenarios. Under the same scenario conditions, compared with existing technologies, the overall task efficiency is improved by more than 30%, providing core technical support for the efficient collaborative scheduling of large-scale heterogeneous UAV swarms.
Attached Figure Description
[0052] Figure 1 is a flowchart of the multi-UAV collaborative task allocation method for complex task scenarios provided in an embodiment of the invention.
Detailed Implementation
[0053] The present application will now be described in detail with reference to specific embodiments. These embodiments will help those skilled in the art to further understand the present application, but do not limit the present application in any way. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present application. These all fall within the protection scope of the present application.
[0054] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0055] Unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the scope of this application. The term "and / or" as used in this specification includes any and all combinations of one or more of the associated listed items.
[0056] Furthermore, the technical features involved in the various embodiments of this application described below can be combined with each other as long as they do not conflict with each other.
[0057] In one specific embodiment, the multi-UAV collaborative task allocation method for complex task scenarios provided by this invention can be executed by an electronic device. In practical applications, this electronic device can be an airborne edge computing device installed on a single UAV, a cloud server of a ground control station, or a distributed collaborative computing system composed of an airborne edge computing device and a ground control station / satellite communication node. This electronic device typically includes at least one processor and a memory. The processor calls and executes computer programs or instructions stored in the memory, combining physical parameters such as full-dimensional state data, full-element attribute data of the task, and full-domain dynamic environmental data collected by the UAV's airborne sensors, thereby realizing the multi-UAV collaborative task allocation method provided by this invention, and ultimately outputting and executing UAV task allocation and flight control commands. The technical solution of this invention will be described in detail below with reference to specific embodiments.
[0058] This disclosure aims to address the shortcomings of existing multi-UAV collaborative task allocation technologies in complex task scenarios, including insufficient global state representation, inefficient task coupling and decoupling, poor adaptability to large-scale clusters, weak generalization in dynamic scenarios, low robustness in conflict resolution, and lack of iterative optimization mechanisms. Therefore, this disclosure proposes a multi-UAV collaborative task allocation method and apparatus for complex task scenarios to achieve efficient, robust, and globally optimal task allocation in extremely dynamic, strongly constrained, and large-scale heterogeneous scenarios. This method employs cutting-edge technologies such as spatiotemporal heterogeneous hypergraph representation, causal emergent reasoning, hierarchical federated learning, causal meta-reinforcement learning, and digital twin closed-loop optimization to construct a causal-driven end-to-end algorithm architecture, overcoming the inherent limitations of traditional correlation-based data-driven algorithms. This achieves the goal of efficient collaborative operation of large-scale heterogeneous UAV clusters in strongly coupled tasks and extreme dynamic environments.
[0059] Example 1: Referring to Figure 1, which shows a flowchart of the multi-UAV cooperative task allocation method for complex task scenarios according to an embodiment of this disclosure, the overall process mainly includes the following five steps:
[0060] S1. Global Multi-Source Data Acquisition and Spatiotemporal Heterogeneous Hypergraph Representation: This step provides standardized global state input for the entire algorithm. The core is to complete the full acquisition, fusion, and high-dimensional feature encoding of multi-source heterogeneous data in complex scenarios, solving the deficiency of traditional graph representation in describing high-order collaborative relationships of multiple entities.
[0061] Let the heterogeneous UAV swarm be the set $U = \{u_1, u_2, \dots, u_N\}$, where $N$ is the total number of UAVs. Each UAV $u_i$ has a state feature set $X_i^{u} = \{x_{i,1}^{u}, \dots, x_{i,K_u}^{u}\}$, where $K_u$ is the total number of UAV state features. The feature set includes key features such as the real-time remaining battery life, the remaining load capacity, the flight performance attenuation coefficient (the closer the value is to 1, the smaller the performance degradation), the real-time flight speed, the three-dimensional attitude angles (roll angle, pitch angle, and yaw angle, in that order), the equipment health status (the closer the value is to 1, the better the equipment status), and the effective communication distance.
[0062] Let the set of complex tasks to be executed be $Q = \{q_1, q_2, \dots, q_M\}$, where $M$ is the total number of tasks. Each task $q_j$ has a requirement feature set $X_j^{q} = \{x_{j,1}^{q}, \dots, x_{j,K_q}^{q}\}$, where $K_q$ is the total number of task requirement features. The feature set includes key features such as the dynamic task priority (a higher value indicates a more urgent task), the spatiotemporal execution window (earliest start time and latest end time of the task, in that order), the task accuracy requirement, the scope of operations, the environmental adaptation requirement (a higher value indicates a higher required tolerance to complex environments), and the type of load required (a discrete feature, coded numerically according to job type).
[0063] Let the global dynamic environment constraint set be $C = \{c_1, c_2, \dots, c_{K_c}\}$, where $K_c$ is the total number of environmental constraint features. The feature set includes key features such as the spatial coordinates of obstacles, the real-time wind speed and direction (the wind speed, and the angle between the wind direction and due north), the electromagnetic interference intensity, the no-fly zone boundary coordinates, and the threat area level (a higher value indicates a higher level of threat).
[0064] Based on the above definitions, three types of core data are collected synchronously. The first is the full-dimensional state data of the heterogeneous UAV swarm, collected at the sampling frequency of the UAV's onboard sensors so that real-time state changes of the UAV are captured. The second is the full-element attribute data of the complex tasks: basic task data is collected once during the task release phase, while sudden tasks trigger a real-time collection mechanism, with collection latency controlled within [a certain range]. The third is the global dynamic environmental constraint data, which is collected through multi-source fusion from satellite remote sensing, ground monitoring stations, and UAV-borne environmental perception modules. Real-time updates are triggered when changes in environmental features exceed a threshold, ensuring the timeliness of the environmental data.
[0065] S1.1 Standardization preprocessing: all features are preprocessed to eliminate dimensional differences. For example, the standardization formula for UAV state features is:
[0066] $$\hat{x}_{i,k}^{u} = \frac{x_{i,k}^{u} - \mu_{k}^{u}}{\sigma_{k}^{u} + \epsilon}$$
[0067] where $\hat{x}_{i,k}^{u}$ is the standardized result of the $k$-th class state feature of UAV $u_i$, $\mu_{k}^{u}$ is the global mean of this feature over the UAV swarm, $\sigma_{k}^{u}$ is the global standard deviation of this feature over the UAV swarm, and $\epsilon$ is a minimal value that prevents the denominator from being zero, with its value taken as ... .
[0068] The standardization formula for task requirement features is:
[0069] $$\hat{x}_{j,k}^{q} = \frac{x_{j,k}^{q} - \mu_{k}^{q}}{\sigma_{k}^{q} + \epsilon}$$
[0070] where $\hat{x}_{j,k}^{q}$ is the standardized result of the $k$-th class requirement feature of task $q_j$, $\mu_{k}^{q}$ is the global mean of this feature over the task set, $\sigma_{k}^{q}$ is the global standard deviation of this feature over the task set, and $\epsilon$ takes the same value as above.
[0071] The standardization formula for environmental constraint features is:
[0072] $$\hat{x}_{k,s}^{c} = \frac{x_{k,s}^{c} - \mu_{k}^{c}}{\sigma_{k}^{c} + \epsilon}$$
[0073] where $\hat{x}_{k,s}^{c}$ is the standardized result of the $k$-th class environmental constraint feature at the $s$-th sampling point, $\mu_{k}^{c}$ is the global mean of this feature over all sampling points, $\sigma_{k}^{c}$ is the global standard deviation of this feature over all sampling points, and $\epsilon$ takes the same value as above.
[0074] The above three formulas complete the standardization of all original features, yielding the standardized UAV state feature set $\hat{X}_i^{u}$, the standardized task requirement feature set $\hat{X}_j^{q}$, and the standardized environmental constraint feature set $\hat{X}^{c}$. These provide standardized feature inputs for the construction of the spatiotemporal heterogeneous hypergraph.
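The standardization described in S1.1 is a z-score with a small constant added to the denominator. The Python sketch below applies it to each feature column; the function name `standardize`, the sample values, and the choice `eps=1e-8` are assumptions, since the value used in the disclosure is not recoverable from the text.

```python
import math


def standardize(columns, eps=1e-8):
    """Z-score each feature column as (x - mean) / (std + eps).

    eps is an assumed small constant that prevents division by zero.
    """
    result = []
    for column in columns:
        n = len(column)
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        result.append([(x - mean) / (std + eps) for x in column])
    return result


# Hypothetical features for a swarm of three UAVs.
battery = [0.9, 0.5, 0.1]  # remaining battery life
payload = [2.0, 2.0, 2.0]  # identical values: std is 0, so eps avoids a zero denominator
standardized = standardize([battery, payload])
```

The same function applies unchanged to task requirement features and environmental constraint features, because the three formulas differ only in which population the mean and standard deviation are taken over.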
[0075] S1.2 Constructing the spatiotemporal heterogeneous hypergraph: a hypergraph $G = (V, E)$ is constructed to model higher-order relationships among multiple entities. The node set $V = V_u \cup V_q \cup V_c$ includes the UAV node subset $V_u$, the task node subset $V_q$, and the environmental constraint node subset $V_c$. Each node's feature vector is the standardized feature set of the corresponding entity: the feature vector of a UAV node is its standardized state feature set, the feature vector of a task node is its standardized requirement feature set, and the feature vector of an environmental constraint node is its standardized constraint feature set. $E$ is the set of hyperedges. Unlike the edges in a traditional graph, hyperedges can connect any number of heterogeneous nodes and are used to describe higher-order collaborations, temporal dependencies, constraints, and other relationships between multiple entities. The weight of a hyperedge is determined by the feature similarity and the strength of the association between the associated entities.
[0076] The weight of a hyperedge $e$ connecting any number of heterogeneous nodes is calculated as:
[0077]
[0078] where $|e|$ is the number of nodes connected by hyperedge $e$; $\mathrm{sim}(v_a, v_b)$ is the cosine similarity of the feature vectors $h_a$ and $h_b$ of nodes $v_a$ and $v_b$, calculated as $\mathrm{sim}(v_a, v_b) = \frac{h_a \cdot h_b}{\lVert h_a \rVert \, \lVert h_b \rVert}$; and $r_{ab}$ is the entity association strength between nodes $v_a$ and $v_b$, determined by association rules that depend on the entity types. For example, the association strength between a UAV and a task is determined by the matching degree between the UAV's capabilities and the task requirements, and the association strength between a UAV and an environmental constraint is determined by the spatial distance between them: the closer the spatial distance, the stronger the association. $\mathcal{T}$ is the time-dimension set containing the discrete time sampling points, whose sampling interval is consistent with the data acquisition frequency; $\gamma$ is the time decay coefficient, with a value range of ..., which describes how hypergraph associations decay over time and ensures that the hypergraph can capture dynamic evolution patterns.
[0079] Arbitrary hyperedge in time The formula for calculating the weight evolution over time is:
[0080]
[0081] In this formula, the terms are: the weight of the hyperedge at the current time step; the weight of the hyperedge at the previous time step; and the new weight of the hyperedge at the current time step, calculated from the real-time features of the entities. The coefficient in the formula takes the value [value missing] and balances the continuity and the dynamism of hypergraph relationships. Through the above hypergraph definition and formulas, the invention models high-order relationships among three types of entities (UAVs, missions, and environmental constraints) and incorporates time-dimension features to capture the dynamic evolution of the hypergraph. This addresses the limitation that traditional graph representations can describe only simple pairwise relationships between entities.
[0082] S1.3 Spatiotemporal Heterogeneous Hypergraph Convolutional Network Encoding: A spatiotemporal heterogeneous hypergraph convolutional network is used to encode the features of the hypergraph, aggregating the spatiotemporal correlation features of nodes and hyperedges. Specifically, the node feature update of each layer is calculated as:
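The hyperedge weighting and temporal evolution described in S1.2 can be illustrated with a minimal Python sketch. The exact formulas are not reproduced in this text, so the sketch makes explicit assumptions: the weight is the mean over node pairs of cosine similarity times association strength, scaled by an exponential time decay, and the temporal evolution is a convex combination of the previous and the newly computed weight. The names `hyperedge_weight`, `evolve_weight`, `decay`, and `mix` are hypothetical.

```python
import numpy as np
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def hyperedge_weight(features, assoc, age, decay):
    """Assumed form: time-decayed mean of (cosine similarity * association strength).

    features: (k, d) array, one row per node in the hyperedge
    assoc:    (k, k) symmetric matrix of association strengths in [0, 1]
    age:      elapsed time since the association was observed
    decay:    time decay coefficient (hypothetical parameter name)
    """
    k = features.shape[0]
    if k < 2:
        raise ValueError("a hyperedge needs at least two nodes")
    pairs = list(combinations(range(k), 2))
    score = sum(cosine_similarity(features[i], features[j]) * assoc[i, j]
                for i, j in pairs)
    return float(np.exp(-decay * age) * score / len(pairs))


def evolve_weight(prev_weight, new_weight, mix):
    """Assumed evolution rule: blend the previous weight with the new one."""
    return mix * prev_weight + (1.0 - mix) * new_weight
```

A larger `mix` favors continuity with the previous weight, while a smaller one lets real-time features dominate.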
[0083]
[0084] In this formula, the terms are: the node feature matrix of the given layer, each row vector of which represents the comprehensive physical state of a single UAV at a specific time step (including remaining endurance, payload capacity, and three-dimensional attitude angles, computed in real time from airborne sensors); the ReLU activation function; the normalized spatiotemporal heterogeneous hypergraph Laplacian matrix, which encodes the actual communication topology of the UAV swarm in the current physical space, the geographical location dependence of mission execution, and the spatial constraints of environmental obstacles; and the network's learnable weight matrix and bias vector.
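As an illustration of the layer update described above, the following Python sketch applies one hypergraph convolution. The normalization of the Laplacian is not given in this text, so the sketch assumes the commonly used form built from the node-hyperedge incidence matrix, the hyperedge weights, and the node and hyperedge degrees. The function names are hypothetical.

```python
import numpy as np


def hypergraph_laplacian(H, w):
    """Assumed normalization: Dv^-1/2 H W De^-1 H^T Dv^-1/2.

    H: (n_nodes, n_edges) incidence matrix (1 if the node belongs to the hyperedge)
    w: (n_edges,) hyperedge weights
    """
    dv = H @ w            # weighted node degrees
    de = H.sum(axis=0)    # number of nodes in each hyperedge
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(dv, 1e-12)))
    de_inv = np.diag(1.0 / np.maximum(de, 1e-12))
    return dv_inv_sqrt @ H @ np.diag(w) @ de_inv @ H.T @ dv_inv_sqrt


def hypergraph_conv(X, L, W, b):
    """One layer: ReLU(L X W + b), with X of shape (n_nodes, d_in)."""
    return np.maximum(L @ X @ W + b, 0.0)


# Usage: one hyperedge joining three nodes averages their features.
H = np.ones((3, 1))
L = hypergraph_laplacian(H, np.array([1.0]))
X = np.array([[3.0], [0.0], [0.0]])
out = hypergraph_conv(X, L, np.array([[1.0]]), np.array([0.0]))
```

In this small example every node ends up with the mean feature of the hyperedge, which is the aggregation behavior the layer is meant to provide.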
[0085] To incorporate dynamic features over time, a temporal convolution kernel is introduced on top of the hypergraph convolution. The calculation formula is as follows:
[0086]
[0087] In this formula, the terms are: the node feature matrix at a given time after temporal convolution; and the one-dimensional temporal convolution operation, which smooths out sensor noise during continuous UAV flight and aggregates the correlation information of node features along the time dimension, so as to capture the inertial characteristics of UAV flight trajectories and time-varying factors such as the environmental wind field. The operation is applied to the node feature time series over a window whose length is [value missing]; the number of temporal convolution kernels is [value missing], and the kernel size is [value missing]. Aggregating the correlation information of node features along the time dimension in this way achieves spatiotemporal feature fusion encoding.
[0088] The training objective of the spatiotemporal heterogeneous hypergraph convolutional network is to minimize the reconstruction loss of node features, ensuring that the network can accurately capture the spatiotemporal correlation features of the hypergraph. The formula for calculating the reconstruction loss function is as follows:
[0089]
[0090] In this formula, the terms are: the original standardized feature vector of a node at a given time; the feature vector of the same node at the same time output by the final layer of the spatiotemporal heterogeneous hypergraph convolutional network; the total number of layers in the network, with a value of [value missing]; the total number of nodes in the hypergraph; the total length of time sampling; and the squared L2 norm. The network parameters are iteratively optimized using the AdamW optimizer, with learning rate [value missing] and weight decay coefficient [value missing]. Training iterates until the loss function converges, with a convergence threshold of 1 [remainder of value missing].
[0091] Through the encoding and training of the aforementioned spatiotemporal heterogeneous hypergraph convolutional network, a low-dimensional dense representation vector of the global state is output. The dimension of this representation vector is [value missing], far lower than that of the original features. The representation achieves feature dimensionality reduction and fusion while preserving the high-order correlation features and spatiotemporal dynamic evolution features of the three types of entities: UAVs, tasks, and environmental constraints. It provides a standardized, compact, and highly representative global state input for subsequent steps such as causal emergence decoupling of complex tasks and UAV-task matching degree representation, addressing the insufficient representation ability and loss of key correlation information in traditional feature encoding methods.
[0092] S2. Decoupling and Counterfactual Inference Optimization of Causal Emergence in Complex Tasks: Based on the above spatiotemporal heterogeneous hypergraph representation, this step extracts the macroscopic causal structure by calculating the causal information of physical entities, thereby defining a physical execution constraint framework for subsequent UAV task allocation.
[0093] First, from the low-dimensional dense representation vector of the global state output in step one, the task-related feature subset is extracted to construct a task feature matrix whose dimensions are the total number of tasks by the representation vector dimension. The complete dataset of historical task executions is also incorporated. It is organized by batch, and the dataset of each batch contains the execution status, relationships, and final performance results of all tasks in that batch. The task execution status includes the task completion quality (the closer the value is to 1, the higher the quality), the task execution time, and the resource consumption. These key indicators provide the data support for causal emergence learning.
[0094] S2.1 Causal Emergence Mining and Graph Construction: On this data foundation, a causal emergence algorithm is used to mine the macroscopic causal structure of the task system. The core of causal emergence is to identify the macroscopic descriptive level that maximizes the causal information flow, by calculating the amount of causal information in the task system at different scales. First, the microscopic state space of the task system is defined; its dimension is the micro-state dimension, and each micro-state is composed of the feature vector of an individual task and its execution state. The macroscopic state space, whose dimension is the macro-state dimension, is obtained by clustering and abstracting the microscopic states. A macroscopic state mapping function performs the dimensionality-reducing mapping from microscopic to macroscopic states. The key quantitative indicator of causal emergence is the amount of causal information, calculated as follows:
[0095]
[0096] In this formula, the terms are: the mutual information between two random variables, which measures the strength of the association between them; the set of macroscopic states at one time step; the set of macroscopic states at the following time step; and the individual macroscopic state variables at a given time step. The first term of the formula represents the overall causal information content of the set of macroscopic states, and the second term represents the sum of the causal information content of each individual macroscopic state variable. The larger the value, the more significant the causal structure at the macroscopic level. The optimal macroscopic state mapping function is therefore the one that maximizes the amount of causal information, that is:
[0097]
[0098] Based on the optimal mapping function, the causal structure of the macro-level task system is mined from micro-level single-task data, distinguishing essential causal dependencies from spurious correlations between tasks. An essential causal dependency is one in which the completion quality of a preceding task directly determines the feasibility of executing a subsequent task. It is expressed by a monotonically increasing function, meaning that an improvement in the completion quality of the preceding task directly drives an improvement in the completion quality of the subsequent task. A spurious correlation is one in which tasks co-occur only because they share the same triggering source: the completion qualities of the two tasks are correlated, but become independent once the trigger source variable is controlled for.
[0099] Based on the identified causal structure, a causal directed acyclic graph of the task system is constructed. It consists of a set of task nodes, a set of causal edges, and a set of causal edge weights. Each node corresponds to a task to be executed, and its node features are the core attributes of the task, such as dynamic priority and spatiotemporal execution window. If two tasks have an essential causal dependency, a directed edge exists between them, pointing from the preceding task to the subsequent task. The causal edge weights quantify the strength of the causal influence between tasks and are calculated as:
[0100]
[0101] In this formula, the terms are: the causal edge weight from the preceding task to the subsequent task, with value range [value missing], where a larger value indicates a stronger causal influence; the partial derivative of the subsequent task's completion quality with respect to the preceding task's completion quality in a given batch of historical data, which measures the local causal sensitivity; the execution times of the two tasks in that batch; and the maximum execution time over all historical tasks, which normalizes the time difference and avoids weight distortion caused by large differences in execution time.
[0102] The causal directed acyclic graph clarifies the causal transmission paths, the influence strengths, and the global critical execution path between tasks. The global critical execution path is defined as the path from the initial task to the final task with the largest sum of causal edge weights. The tasks on this path play a decisive role in overall task efficiency and should be given priority in resource allocation.
[0103] S2.2 Quantification and Pruning of Causal Intervention Effects: To further reduce decision-making complexity, front-door and back-door adjustments are combined to quantify the causal intervention effect of task allocation decisions on overall task effectiveness. First, the intervention variable is defined as the assignment of a given drone to a given task. Global task performance is the weighted sum of the completion quality of all tasks, with each task weighted by its dynamic priority. Front-door adjustment applies to causal paths on which a mediator variable exists; its intervention effect is calculated as:
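The global critical execution path defined above is a longest-path problem on a weighted directed acyclic graph, which can be solved in one pass over a topological order. The following Python sketch shows this under stated assumptions: task identifiers are strings, edges are given as a dictionary of `(preceding, subsequent)` pairs to weights, and the path is allowed to end at whichever task maximizes the accumulated weight. The function name `critical_path` is hypothetical.

```python
from collections import defaultdict, deque


def critical_path(edges):
    """Return (total_weight, path) of the maximum-weight path in a DAG.

    edges: dict mapping (preceding_task, subsequent_task) -> causal edge weight
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for (u, v), w in edges.items():
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    best = {n: 0.0 for n in nodes}    # best accumulated weight ending at n
    prev = {n: None for n in nodes}   # predecessor on that best path
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    visited = 0
    while queue:                      # Kahn's algorithm gives a topological order
        u = queue.popleft()
        visited += 1
        for v, w in succ[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("the task graph contains a cycle")

    end = max(nodes, key=lambda n: best[n])
    path, node = [], end
    while node is not None:           # walk predecessors back to the start
        path.append(node)
        node = prev[node]
    return best[end], path[::-1]


total, path = critical_path({("A", "B"): 0.9, ("A", "C"): 0.2,
                             ("B", "D"): 0.8, ("C", "D"): 0.9})
```

Here the path through task B accumulates more causal weight than the path through task C, so the tasks A, B, and D would be prioritized for resources.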
[0104]
[0105] In this formula, the terms are: the average intervention effect under front-door adjustment; the mediator variables (such as task execution order and resource allocation); the probability that the mediator variable takes a given value; and the expected global task performance under the intervention when the mediator variable takes that value. Back-door adjustment applies to causal paths without unobserved confounding variables; its intervention effect is calculated as:
[0106]
[0107] In this formula, the terms are: the average intervention effect under back-door adjustment; and the back-door adjustment set, which must block all back-door paths from the intervention variable to the global task performance. The final causal intervention effect is obtained as a weighted combination of the front-door and back-door results. The weighting coefficient has value range [value missing] and is set according to the structure of the causal path: if a significant mediator variable exists, the coefficient should be close to 1; otherwise it should be close to 0.
[0108] Based on the quantified causal intervention effects, the constraints of the task allocation problem are pruned, using an intervention effect threshold. If the intervention effect corresponding to a spurious correlation falls below the threshold, the constraint corresponding to that correlation is removed, leaving only the constraints that correspond to essential causal dependencies. The solution space dimension of the pruned task allocation problem is calculated as:
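A minimal Python sketch of the back-door estimate and the weighted combination follows. It assumes a binary intervention (the drone is or is not assigned to the task), a single discrete adjustment variable, and sample-based estimates; strata that lack either treated or untreated samples are skipped, which is a simplification. The names `backdoor_effect`, `combined_effect`, and `beta` are hypothetical, and the combination assumes the coefficient weights the front-door term, consistent with the description that it should be close to 1 when a significant mediator exists.

```python
import numpy as np


def backdoor_effect(a, z, y):
    """Average effect of a on y, adjusting for z:
    sum over z of P(z) * (E[y | a=1, z] - E[y | a=0, z]).

    a: binary intervention indicator per sample
    z: discrete adjustment variable per sample
    y: observed global task performance per sample
    """
    a, z, y = (np.asarray(v) for v in (a, z, y))
    effect = 0.0
    for value in np.unique(z):
        stratum = z == value
        treated = y[stratum & (a == 1)]
        control = y[stratum & (a == 0)]
        if treated.size == 0 or control.size == 0:
            continue  # no overlap in this stratum; skipped in this sketch
        effect += stratum.mean() * (treated.mean() - control.mean())
    return float(effect)


def combined_effect(front, back, beta):
    """Weighted combination; beta near 1 favors the front-door estimate."""
    return beta * front + (1.0 - beta) * back


ate_back = backdoor_effect(a=[1, 0, 1, 0], z=[0, 0, 1, 1],
                           y=[1.0, 0.5, 0.8, 0.2])
```

In the usage line, each stratum of `z` holds half the samples, with within-stratum differences of 0.5 and 0.6, so the adjusted effect is their equally weighted mean.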
[0109]
[0110] In this formula, the terms are: the solution space dimension before pruning; the sum of the absolute values of the intervention effects corresponding to the removed constraints; and the sum of the absolute values of the intervention effects corresponding to all constraints. Through this pruning mechanism, the solution space of the high-dimensional task allocation problem can be compressed by more than 60%, which greatly reduces the computational complexity of subsequent decisions while ensuring that the global optimality of the solution is not affected.
[0111] S2.3 Counterfactual Inference and Robustness Assessment: After task causal decoupling and constraint pruning, a counterfactual world model is built on the constructed causal directed acyclic graph to simulate the execution results of different allocation schemes under extreme scenarios and to predict potential risks in advance. The core of counterfactual inference is to answer the question, "How would the task execution result change if a certain allocation decision were changed?" First, a set of counterfactual scenarios is defined, covering three typical extreme scenarios: decision adjustment (such as reassigning a drone from one task to another), equipment failure (such as an unexpected malfunction of the drone executing a task), and task delay (such as a delay in the execution time of a task). For any counterfactual scenario, a counterfactual allocation decision is defined, and the corresponding counterfactual task completion quality is calculated as:
[0112]
[0113] In this formula, the terms are: the counterfactual task completion quality vector for the scenario; the task completion quality vector in the real-world (factual) scenario; the partial derivative of a task's completion quality with respect to the allocation decision; the change in the allocation decision under the scenario (0 indicates no change, 1 indicates a new allocation, and -1 indicates cancellation of an allocation); the partial derivative of a task's completion quality with respect to the causal edge weights; and the change in the causal edge weights under the scenario (for example, under equipment failure, the causal edge weights of the tasks assigned to the failed drone decrease).
[0114] Based on the counterfactual inference results, a robustness evaluation index is constructed for each allocation scheme and used to select the optimal allocation baseline framework. The index is calculated as:
[0115]
[0116] In this formula, the terms are: the robustness evaluation index, with value range [value missing], where a larger value indicates a more robust allocation scheme; the counterfactual global task performance under the scenario; the global task performance in the factual scenario; the sum of the absolute errors in completion quality between the counterfactual and factual scenarios over all tasks; and the total number of tasks, used to normalize the error. The allocation scheme with the largest robustness evaluation index is selected as the baseline framework, and a redundancy backup strategy is generated at the same time. The redundancy backup strategy is constructed as follows:
[0117]
[0118] In this formula, the terms are: the redundancy backup allocation matrix; the optimal baseline allocation matrix; the redundancy coefficient, with value range [value missing], which controls the strength of the redundancy strategy; and the allocation matrix under each counterfactual scenario. The redundancy backup strategy generated by this formula enables rapid switching in extreme scenarios, ensuring the continuity and stability of task execution and achieving pre-optimization of the allocation scheme.
[0119] S3. Accurate Characterization of Heterogeneous Matching Degree in Hierarchical Federation: This step is used to model the compatibility between the physical capabilities of drones and mission requirements for the distributed collaboration needs of heterogeneous drone swarms.
[0120] First, from the low-dimensional dense representation vector of the global state output in step one, the drone-related feature subset is extracted to construct a drone capability feature matrix (its dimensions are the total number of drones by the representation vector dimension), and the task-related feature subset is extracted to construct a task requirement feature matrix (its row count is the total number of tasks). The causal directed acyclic graph of the task system constructed in step two is also introduced. Incorporating this causal constraint information into the matching degree representation ensures that the representation results are consistent with the essential dependencies between tasks.
[0121] S3.1 Cooperative Subgroup Partitioning and Local Layer Matching: Based on three dimensions—UAV type, communication range, and mission area—the K-means++ clustering algorithm is used to divide the large-scale heterogeneous UAV swarm into multiple cooperative subgroups. The clustering objective function is to maximize the homogeneity within subgroups and the differences between subgroups. The clustering loss function is calculated as follows:
[0122]
[0123] In this formula, the terms are: the clustering loss function; the total number of cooperative subgroups, with value range [value missing] (adjusted dynamically according to cluster size); each cooperative subgroup; the standardized capability feature vector of each drone; the feature center vector of each subgroup; the squared L2 norm; and the weighting coefficient for differences between subgroups, with a value of [value missing], which balances the degree of aggregation within subgroups against the degree of separation between subgroups. After clustering, a two-layer federated learning architecture of "subgroup local layer - global aggregation layer" is built. The subgroup local layer contains one computing node per subgroup, each responsible for feature computation and model training for the UAVs within its own subgroup. Each node uploads only encrypted model parameters (using homomorphic encryption, with the encryption key managed independently by each subgroup) to the global aggregation layer and does not upload raw state data. This significantly reduces cross-subgroup communication bandwidth requirements and reduces the communication data volume by [value missing] compared with traditional centralized architectures. It also improves both anti-interference capability and data privacy protection.
[0124] At the subgroup local layer, a local heterogeneous graph is constructed from the local drone nodes, the task nodes, and the causal constraint framework. Its node set is the union of the subset of drone nodes in the subgroup and the subset of task nodes that fall within the geographical range and capabilities of the subgroup; the node features are the UAV capability feature vector and the task requirement feature vector, respectively. Its edge set includes drone-task edges (representing adaptation relationships), drone-drone edges (representing cooperation relationships), and task-task edges (representing causal dependency relationships). In its edge weight set, the initial value of each drone-task edge weight is the cosine similarity between the drone's standardized capability feature vector and the task's standardized requirement feature vector. On this local heterogeneous graph, the local matching degree representation is encoded by a heterogeneous graph attention network (GAT) with a dynamic attention mechanism. The inter-layer propagation rule is:
[0125]
[0126] In this formula, the terms are: the feature vector of a node at a given layer; the LeakyReLU activation function (negative slope [value missing]); the set of neighboring nodes of the node in the local heterogeneous graph; the attention weight that the node assigns to each neighboring node; the learnable weight matrix of the layer; and the learnable bias vector of the layer. The attention weights are obtained by normalization with the softmax function, calculated as follows:
[0127]
[0128] In this formula, the terms are the attention coefficient vector and the feature concatenation operation. This attention mechanism dynamically adjusts the contribution weights of neighboring nodes, focusing on task nodes that closely match the capabilities of the subgroup and that have a significant impact on the global critical path. It outputs a local UAV-task matching degree representation matrix for the subgroup, whose dimensions are the number of drones in the subgroup by the number of tasks adapted to the subgroup. Each matrix element characterizes the local matching degree between a drone in the subgroup and a task, with value range [value missing]; the larger the value, the stronger the adaptability.
[0129] S3.2 Global Aggregation Layer Matching: After local model training is completed, each subgroup uploads its encrypted model parameters (including the attention network weights, biases, and attention coefficients) to the global aggregation layer. The global aggregation layer aggregates the subgroup model parameters with a weighted federated averaging algorithm to generate the global collaborative model parameters. The aggregation formula is as follows:
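The attention computation described above can be sketched in Python as follows. Because the formulas are not reproduced in this text, the sketch assumes the standard graph attention form: each neighbor's score is LeakyReLU applied to the attention vector dotted with the concatenated transformed features, followed by softmax over neighbors. The negative slope of 0.2 is a hypothetical default, as are the function names.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """LeakyReLU; the slope value here is an assumed default."""
    return np.where(x > 0, x, slope * x)


def attention_weights(h_i, neighbors, W, a, slope=0.2):
    """Softmax-normalized attention of node i over its neighbors.

    h_i:       (d,) feature vector of the node
    neighbors: (k, d) feature vectors of its k neighbors
    W:         (d_out, d) learnable weight matrix
    a:         (2 * d_out,) attention coefficient vector
    """
    z_i = W @ h_i
    z_n = neighbors @ W.T
    concat = np.concatenate([np.tile(z_i, (len(z_n), 1)), z_n], axis=1)
    scores = leaky_relu(concat @ a, slope)
    scores = scores - scores.max()   # subtract the max for numerical stability
    e = np.exp(scores)
    return e / e.sum()


def gat_update(h_i, neighbors, W, a, b, slope=0.2):
    """Next-layer feature: LeakyReLU(sum_j alpha_ij * W h_j + b)."""
    alpha = attention_weights(h_i, neighbors, W, a, slope)
    return leaky_relu(alpha @ (neighbors @ W.T) + b, slope)
```

Neighbors with identical features receive identical attention weights, and the weights always sum to one, so the update is a convex combination of transformed neighbor features before the bias and activation.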
[0130]
[0131] In this formula, the terms are: the global collaborative model parameters; the number of drones in each subgroup; the total number of drones; the homomorphic decryption operation (decryption is performed only at the global aggregation layer to ensure the security of parameter transmission); and the encrypted local model parameters of each subgroup. A global collaborative heterogeneous graph is then constructed from the global collaborative model parameters. It consists of the complete set of nodes, an edge set that adds cross-subgroup edges (representing cooperative relationships between different subgroups), and a global edge weight set (integrating the local edge weights and the cross-subgroup collaborative weights). Cross-subgroup collaborative features are encoded by a global heterogeneous graph attention network to generate a global cross-domain collaborative matching representation matrix. Each matrix element characterizes the global matching degree between a drone and a task, calculated using the following formula:
[0132]
[0133] In this formula, the terms are the local matching degree weight coefficient, with value range [value missing], and the mean aggregation function. The formula combines a drone's local adaptation features with global collaboration requirements to solve the problem of collaborative task allocation across subgroups.
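The weighted federated averaging step of S3.2 can be sketched in Python as below. The sketch weights each subgroup's parameters by its share of the total drone count, as described. Homomorphic encryption and decryption are omitted: plain NumPy arrays stand in for the decrypted parameters, and the dictionary keys and the function name `federated_average` are hypothetical.

```python
import numpy as np


def federated_average(subgroup_params, subgroup_sizes):
    """Aggregate per-subgroup parameters, weighted by drone count.

    subgroup_params: list of dicts mapping parameter name -> NumPy array
                     (assumed already decrypted at the aggregation layer)
    subgroup_sizes:  number of drones in each subgroup
    """
    sizes = np.asarray(subgroup_sizes, dtype=float)
    shares = sizes / sizes.sum()
    names = subgroup_params[0].keys()
    return {name: sum(share * params[name]
                      for share, params in zip(shares, subgroup_params))
            for name in names}


global_params = federated_average(
    [{"W": np.array([0.0, 0.0])}, {"W": np.array([4.0, 4.0])}],
    subgroup_sizes=[1, 3],
)
```

In the usage example the second subgroup holds three of the four drones, so its parameters contribute three quarters of the aggregated value.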
[0134] S3.3 Model Quantization and Knowledge Distillation: To enable model deployment on airborne edge computing devices, model quantization and knowledge distillation techniques are used to lightweight the representation model. Model quantization employs an 8-bit integer (INT8) quantization scheme, converting floating-point model parameters into integer parameters. The quantization formula is as follows:
[0135]
[0136] In this formula, the terms are: the quantized integer parameters; the raw floating-point parameters; the mean of the parameters; the standard deviation of the parameters; the quantization bit width; and the rounding function. After quantization, approximate floating-point values are restored during model inference using a dequantization formula, so that the model accuracy loss is kept within [value missing]. Knowledge distillation adopts a "teacher-student" architecture, using the trained deep global model as the teacher model to construct a shallow, lightweight student model (the number of network layers is reduced from 5 to 3, and the hidden layer feature dimension from 256 to 64). The distillation loss function is:
[0137]
[0138] In this formula, the terms are: the cross-entropy loss computed on the hard labels (matching degree classification results) of the student and teacher models; the mean squared error loss computed on the soft labels (original matching scores) of the student and teacher models; and the weighting coefficient, with a value of [value missing]. Through quantization and distillation, the number of model parameters is reduced and inference speed is improved [values missing]. The resulting model can be deployed directly on the onboard edge computing device of a single drone, enabling each drone to calculate its matching degree with the tasks to be assigned autonomously, without relying on a central node, thus avoiding the single point of failure risk of a centralized architecture.
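The distillation loss described above can be sketched in Python as follows. The sketch assumes the weighting coefficient multiplies the hard-label cross-entropy term and its complement multiplies the soft-label mean squared error term; the text does not state which term the coefficient applies to. The teacher's hard labels are taken as the argmax of its class logits, and all function and argument names are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def distillation_loss(student_logits, teacher_logits,
                      student_scores, teacher_scores, alpha):
    """alpha * CE(student vs teacher hard labels)
    + (1 - alpha) * MSE(student vs teacher matching scores).

    *_logits: (n, n_classes) matching degree class logits
    *_scores: (n,) raw matching scores (soft labels)
    alpha:    weighting coefficient in [0, 1] (assumed placement)
    """
    hard_labels = teacher_logits.argmax(axis=1)
    log_probs = log_softmax(student_logits)
    ce = -log_probs[np.arange(len(hard_labels)), hard_labels].mean()
    mse = np.mean((student_scores - teacher_scores) ** 2)
    return float(alpha * ce + (1.0 - alpha) * mse)
```

A student whose logits are uniform over two classes incurs a cross-entropy of ln 2 regardless of the teacher's label, which gives a convenient sanity check for the hard-label term.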
[0139] S3.4 Closed-Loop Feedback Calibration: Construct a closed-loop feedback calibration mechanism to feed back the actual execution effect of subsequent task allocation decisions (such as task completion quality and resource utilization) to the matching degree representation model, dynamically updating the model parameters and matching degree weights. The calibration calculation formula is as follows:
[0140]
[0141]
[0142] In these formulas, the terms are: the model parameters before and after the update; the learning rate, with a value of [value missing]; the gradient operator; the feedback loss function; the matching degree matrix predicted by the model; the matching degree matrix adjusted according to actual execution results; the matching degree values between a drone and a task before and after the update; the update step size, with a value of [value missing]; the actual completion quality of the task; and the task completion quality predicted from the predicted matching degree. This closed-loop mechanism continuously improves the accuracy and dynamic adaptability of the matching degree representation.
[0143] S4. Out-of-Distribution Generalization Causal Meta-Reinforcement Learning Dynamic Allocation Decision: Based on the cross-scenario causal invariance rules extracted in step two, this step constructs a meta-task set covering complex physical scenarios and generates the UAVs' action command sequences and allocation decisions through reinforcement learning algorithms.
[0144] First, based on the cross-scenario causal invariance rules extracted in step two (that is, the essential causal dependencies and causal influence strengths in the causal directed acyclic graph of the task system), a large-scale meta-task set covering all types of complex scenarios is constructed. The number of meta-tasks has value range [value missing]. Each meta-task includes the following key scenario parameters: the heterogeneous drone swarm scale (value range [value missing]); the task coupling degree (value range [value missing], obtained by normalizing the sum of causal edge weights); the sudden operating condition type (four categories: newly added emergency missions, drone malfunctions, communication interruptions, and sudden environmental changes); and the adversarial environment intensity (value range [value missing], corresponding to threat area levels). The spatiotemporal hypergraph representation generated in step one and the global cross-domain collaborative matching representation matrix generated in step three serve as the basis for the state input, so that the meta-task set covers both in-distribution and out-of-distribution scenarios.
[0145] The algorithm is divided into two stages: offline meta-training and online dynamic allocation. The core of the offline meta-training stage is to construct a fusion framework of "causal constraints - meta-learning - multi-agent reinforcement learning," enabling the policy network to learn causal invariance rules across scenarios. First, the core elements of reinforcement learning are defined. The state space is the concatenation of the spatiotemporal hypergraph representation and the global matching degree representation; its dimension depends on the total number of drones and the total number of tasks. The action space is a discretized representation of the drone-task assignment matrix: for each action, an entry of 1 indicates that the drone is assigned to the task, and an entry of 0 indicates that it is not. The action space must satisfy the task uniqueness constraint (only one drone is assigned to each task) and the drone load constraint (each drone's assigned tasks must not exceed its maximum task load).
[0146] S4.1 Definition of the Reinforcement Learning Reward Function: The reward function is a weighted sum of global task performance, execution success rate, and resource consumption, with an added causal constraint penalty term to ensure that decisions satisfy essential causal dependencies. It is calculated as follows:
[0147]
[0148] In this formula, the terms are as follows. The left-hand side is the immediate reward for executing an action in one state and transitioning to the next state. The weighting coefficients satisfy [constraint missing] and take the values [values missing]. The global task performance reward is computed from each task's dynamic priority and the estimated completion quality of that task in the current state. The success rate reward takes the value [value missing] if all tasks satisfy the spatiotemporal window constraint, and [value missing] otherwise. The resource consumption penalty is computed from each drone's initial endurance and its remaining endurance in the current state. The causal constraint violation penalty is computed with an indicator function weighted by the causal edge weight: the indicator is 1 when a drone is assigned to a task but no drone is assigned to its preceding task, and 0 otherwise.
[0149] S4.2 Offline Meta-Training: The Multi-Agent Proximal Policy Optimization (MAPPO) network is meta-trained using the Model-Agnostic Meta-Learning (MAML) framework. The MAPPO network includes a policy network and a value network: the policy network outputs the action probability distribution, and the value network estimates the state value. The core of meta-training is to minimize the generalization loss across meta-tasks, that is, to allow the policy network to adapt quickly to new meta-tasks with a small number of gradient updates. The meta-training loss function is calculated as follows:
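The structure of the reward described in S4.1 can be sketched in Python as below. Because the formula and its coefficient values are not reproduced in this text, the sketch makes several labeled assumptions: the weights `(0.4, 0.3, 0.2, 0.1)` and the success bonus of 1.0 are placeholders, the success term is 0 when any window constraint is violated, the resource penalty is the mean consumed fraction of endurance, and the two penalty terms are subtracted. The function and argument names are hypothetical.

```python
def allocation_reward(priorities, est_quality, windows_met,
                      initial_endurance, remaining_endurance,
                      assignment, causal_edges,
                      weights=(0.4, 0.3, 0.2, 0.1), success_bonus=1.0):
    """Immediate reward for one allocation step (assumed functional form).

    priorities, est_quality: per-task dynamic priority and estimated quality
    windows_met:             per-task bool, spatiotemporal window satisfied
    initial_endurance, remaining_endurance: per-drone endurance values
    assignment:              dict task index -> drone id, or None if unassigned
    causal_edges:            dict (preceding_task, subsequent_task) -> weight
    """
    w_perf, w_succ, w_res, w_causal = weights  # placeholder values
    perf = sum(p * q for p, q in zip(priorities, est_quality))
    succ = success_bonus if all(windows_met) else 0.0
    res = sum((e0 - e) / e0 for e0, e in
              zip(initial_endurance, remaining_endurance)) / len(initial_endurance)
    # Penalize each causal edge whose subsequent task is assigned
    # while its preceding task has no drone.
    causal = sum(w for (pre, post), w in causal_edges.items()
                 if assignment.get(post) is not None
                 and assignment.get(pre) is None)
    return w_perf * perf + w_succ * succ - w_res * res - w_causal * causal


violating = allocation_reward([0.5, 0.5], [1.0, 0.8], [True, True],
                              [100.0, 100.0], [80.0, 60.0],
                              {0: None, 1: "uav1"}, {(0, 1): 0.7})
respecting = allocation_reward([0.5, 0.5], [1.0, 0.8], [True, True],
                               [100.0, 100.0], [80.0, 60.0],
                               {0: "uav0", 1: "uav1"}, {(0, 1): 0.7})
```

The two calls differ only in whether the preceding task has a drone, so the gap between them equals the causal penalty weight times the causal edge weight.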
[0150]
[0151] in, The total training loss is the original value. These are the initial parameters for the policy network and the value network, respectively. For the first The policy network parameters after one gradient update on the individual task For the updated value network parameters, The learning rate within a single element is denoted as , and its value is . ; For the first The policy loss on each meta-task is calculated as follows: ( For the first The state distribution of each meta-task The dominant function is calculated as follows: , (for action value function) The value loss is calculated as follows: ( (For the target value, time-difference target calculation is used). This is the value loss weighting coefficient, with a value of [value missing]. .
[0152] During the meta-training process, each meta-task The process is executed in three steps: "sampling - intra-meta update - extra-meta evaluation": First, in the meta-task... Sampling in the scene Trajectory data ( , (where the trajectory length is used); then, based on the trajectory data, the strategy loss and value loss are calculated, and the initial parameters are adjusted accordingly. Perform a 1-step gradient update to obtain Finally, in the meta-task Resample trajectory data based on updated parameters Calculate the external loss, i.e. and This is then incorporated into the total meta-training loss. The initial parameters are optimized using the AdamW optimizer. Iterative optimization is performed, and the optimizer's out-of-meta learning rate is... Values Weight decay coefficient Values Iterative training continues until the total loss of the meta-training converges (convergence threshold is 1). Finally, an initial meta-policy model with cross-scenario generalization capability is obtained. This model can quickly adapt to unfamiliar extreme scenarios through 1-5 step gradient updates.
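The "intra-meta update, extra-meta evaluation" structure described above can be illustrated with a deliberately small Python sketch. It is not the MAPPO training loop: each meta-task is replaced by a toy quadratic loss with its own optimum, there is no trajectory sampling, and plain gradient descent stands in for the AdamW outer optimizer. These substitutions, and all names used, are assumptions made so that the meta-gradient can be written in closed form.

```python
import numpy as np


def task_grad(theta, center):
    """Gradient of the toy task loss 0.5 * ||theta - center||^2."""
    return theta - center


def maml_step(theta, centers, inner_lr, outer_lr):
    """One meta-update over all toy meta-tasks.

    Inner loop: one gradient step per task from the shared initialization.
    Outer loop: differentiate each post-update loss with respect to the
    initialization; for this quadratic loss the Jacobian of the inner step
    is (1 - inner_lr) * I, so the chain rule is a scalar factor.
    """
    meta_grad = np.zeros_like(theta)
    for center in centers:
        adapted = theta - inner_lr * task_grad(theta, center)   # intra-meta update
        meta_grad += (1.0 - inner_lr) * task_grad(adapted, center)  # extra-meta gradient
    return theta - outer_lr * meta_grad / len(centers)


centers = [np.array([0.0, 0.0]), np.array([2.0, 4.0])]
theta = np.array([10.0, -10.0])
for _ in range(200):
    theta = maml_step(theta, centers, inner_lr=0.1, outer_lr=0.5)
```

For these toy tasks the learned initialization settles midway between the task optima, the point from which a single inner step improves every task equally, which mirrors the goal of an initialization that adapts in a few gradient updates.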
[0153] S4.3 Online Dynamic Allocation Decision and Consensus: The online dynamic allocation phase distinguishes two categories of scenario: steady-state scenarios and dynamic mutation scenarios. In a steady-state scenario (no sudden operating conditions, stable environment), the task allocation decision for each drone is computed in a distributed manner from the initial meta-policy model, combined with the real-time global state representation and the global matching degree representation at the current moment, using the following formula:
[0154]
[0155] where the left-hand side is the optimal allocation decision for a drone and a task at the current moment, and the inputs are the real-time state vector and the global matching degree value at that moment. The optimal decision is obtained by weighting the action probability output by the policy network with the matching degree value, so that the allocation scheme meets all constraints and maximizes global task efficiency.
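As a rough illustration of the weighting described above, the sketch below multiplies the policy network's action probabilities elementwise by the matching degree values and lets each drone pick its highest-scoring task. The array layout (rows are drones, columns are tasks), the elementwise product, and the function name `steady_state_decision` are assumptions; the sketch does not enforce task uniqueness, which is left to the consensus and conflict resolution steps.

```python
import numpy as np

def steady_state_decision(action_probs, matching):
    """Pick one task index per drone.

    action_probs[i, j]: policy probability that drone i takes task j.
    matching[i, j]:     global matching degree between drone i and task j.
    """
    scores = action_probs * matching  # weight probabilities by matching degree
    return scores.argmax(axis=1)      # each drone selects its best-scoring task
```

For example, a drone whose policy slightly prefers one task can still be steered to another when its matching degree for that task is much higher.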
[0156] In dynamic mutation scenarios (such as the emergence of new emergency tasks, drone malfunctions, communication interruptions, and sudden environmental changes), a rapid fine-tuning mechanism is triggered. It first collects a small amount of real-time local trajectory data from the current scenario (only 10 decision steps of data are required) and calculates the fine-tuning loss from that local data. The fine-tuning loss function is the same as the inner-loop loss function. The parameters of the initial meta-policy model are then quickly fine-tuned. The fine-tuning update is calculated as follows:
[0157]
[0158]
[0159] where the left-hand sides are the fine-tuned policy network and value network parameters, and the fine-tuning learning rate has the value [value to be filled in]. Fine-tuning completes in 1 to 5 gradient updates, with the number of update steps adjusted adaptively to the intensity of the scene mutation (the greater the mutation intensity, the more update steps, up to a maximum of 5). After fine-tuning, the optimal allocation scheme for the current scenario is generated from the new parameters. The decision formula is the same as in the steady-state scenario, with only the policy network parameters replaced by the fine-tuned ones. The entire fine-tuning and decision-making process takes only milliseconds, so no full re-optimization is needed, which addresses the lagging dynamic response of traditional algorithms.
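The adaptive step count can be sketched as follows. The text only states that more intense mutations trigger more update steps, capped at 5; the linear mapping from an intensity in [0, 1] to a step count, and the names `fine_tune_steps` and `fine_tune`, are assumptions for illustration.

```python
import math

MAX_STEPS = 5  # upper bound on fine-tuning steps stated in the text

def fine_tune_steps(intensity):
    """Map a mutation intensity (assumed to lie in [0, 1]) to 1..MAX_STEPS steps."""
    scaled = math.ceil(intensity * MAX_STEPS)
    return max(1, min(MAX_STEPS, scaled))

def fine_tune(theta, grad_fn, intensity, lr):
    """Apply the adaptive number of gradient steps to the meta-policy parameters.

    grad_fn(theta) returns the gradient of the fine-tuning loss at theta.
    """
    for _ in range(fine_tune_steps(intensity)):
        theta = theta - lr * grad_fn(theta)
    return theta
```

A mild disturbance therefore costs a single update, while a severe one uses the full budget of five.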
[0160] To ensure consistency and coordination in distributed decision-making, a distributed gradient descent synchronization mechanism is adopted. Each UAV performs policy inference based on its local onboard computing equipment, and simultaneously interacts with surrounding UAVs through low-latency communication links (such as 5G-UWB) to exchange local decision-making information, thus building a distributed decision-making consensus. The consensus function is calculated as follows:
[0161]
[0162] where the left-hand side is the consensus decision value for a drone being assigned to a task, computed from the number of communicating neighbors of that drone and the allocation decision value of each neighboring drone for the same task. When the consensus decision value is greater than the consensus threshold, the allocation is confirmed as the final decision; otherwise it is not confirmed. This consensus mechanism avoids local conflicts in distributed decision-making and ensures the consistency of the global allocation scheme.
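A minimal sketch of the neighbor consensus check follows. It assumes the consensus decision value is the average of the neighbors' decision values for the task, which matches the quantities listed above but is not confirmed by the original formula; the function names and the handling of a drone with no neighbors are also assumptions.

```python
def consensus_value(neighbor_decisions):
    """Average of the neighbors' allocation decision values for one task."""
    if not neighbor_decisions:
        return 0.0  # assumption: with no neighbors there is nothing to confirm
    return sum(neighbor_decisions) / len(neighbor_decisions)

def final_decision(neighbor_decisions, threshold):
    """Return 1 when the consensus value strictly exceeds the threshold, else 0."""
    return 1 if consensus_value(neighbor_decisions) > threshold else 0
```

With a threshold of 0.5, for instance, an allocation supported by three of four neighbors is confirmed, while an evenly split vote is not.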
[0163] S4.4 Online Closed-Loop Feedback: An online closed-loop feedback mechanism is constructed to feed the actual results of task execution (such as task completion quality, resource consumption, and causal constraint satisfaction status) back to the meta-policy model and dynamically update the initial meta-policy parameters. The feedback update is calculated as follows:
[0164]
[0165]
[0166] where the left-hand side is the updated initial meta-policy parameters, and the feedback learning rate has the value [value missing]. The feedback loss function is computed from the actual reward value, which is calculated from the actual execution results, and the reward value predicted by the model. Through continuous online feedback iteration, the generalization ability and dynamic adaptation accuracy of the initial meta-policy model are continuously improved.
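One way to write a feedback update consistent with the description above is shown below. The symbols ($\theta_0$ for the initial meta-policy parameters, $\eta_{fb}$ for the feedback learning rate, $R^{real}$ and $R^{pred}$ for the actual and predicted reward values) and the squared-error form of the loss are assumptions introduced for illustration, not the original formula.

```latex
\theta_0' = \theta_0 - \eta_{fb}\,\nabla_{\theta_0} L_{fb}(\theta_0),
\qquad
L_{fb}(\theta_0) = \bigl(R^{real} - R^{pred}(\theta_0)\bigr)^2
```

Under this form, the update is zero when the predicted reward matches the executed result and grows with the prediction error.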
[0167] S5. Distributed Conflict Resolution and Digital Twin Closed-Loop Iterative Optimization: This step completes the conflict resolution, implementation verification, and continuous iterative optimization of the allocation scheme, forming a complete algorithm execution closed loop.
[0168] First, multi-dimensional conflict detection is performed on the task allocation decision matrix output in step four (its dimensions are the total number of drones by the total number of tasks, and each entry indicates whether a drone is assigned to a task), combined with the causal directed acyclic graph of the task system constructed in step two and the dynamic environmental data output in step one. Detection covers four core conflict categories: resource conflict, temporal conflict, spatial collision conflict, and communication coordination conflict. It provides the accurate conflict type and location for conflict resolution.
[0169] S5.1 Multi-dimensional Conflict Full Detection: Conflict detection adopts a dual mechanism of "rule matching + threshold judgment". The detection logic and quantitative indicators for various types of conflicts are as follows: Resource conflict refers to the same drone being assigned multiple tasks that exceed its execution capacity at the same time. The detection formula is:
[0170]
[0171] where the left-hand side is the resource conflict detection result for a drone; the indicator function takes the value 1 if the condition within the parentheses is met (a conflict exists) and 0 otherwise; and the remaining quantities are the drone's maximum task load, the estimated energy consumption of the drone executing a task, and the drone's real-time remaining battery life. A temporal conflict refers to a task allocation order that violates causal dependencies; the detection formula is:
[0172]
[0173] where the left-hand side is the temporal conflict detection result for a task. The formula uses a relation indicating that another task is a causal prerequisite of that task, an entry indicating that a drone is assigned to the task, and a condition indicating that no drone has been assigned to the prerequisite task. When this condition is met, a temporal conflict is declared. A spatial collision conflict refers to the mission execution paths of multiple UAVs overlapping in space while their time windows have a non-empty intersection. The detection formula is:
[0174]
[0175] where the left-hand side is the spatial collision detection result for a pair of drones. The quantities involved are the planned flight route of each drone executing its task, the minimum spatial distance between the two flight routes, the safety distance threshold (range of values [value missing], adjusted dynamically according to the size of the drone), the spatiotemporal execution windows of the two tasks, and the intersection of their time windows. A communication coordination conflict refers to UAVs in a collaborative task exceeding their communication range; the detection formula is:
[0176]
[0177] where the left-hand side is the communication coordination conflict detection result for a pair of collaborative tasks. The formula uses an indicator that the two tasks are collaborative (requiring real-time communication between the drones), the real-time locations of the two drones performing the two tasks, the effective communication ranges of the two drones, and the minimum communication range.
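The four detection rules of S5.1 can be sketched together as simple threshold checks. This is an illustrative sketch under stated assumptions: the resource rule flags a drone when either its task count or its summed energy demand exceeds its limit, routes are compared at sampled waypoints only, time windows are `(start, end)` pairs, and all function and parameter names are hypothetical.

```python
import math
import numpy as np

def resource_conflicts(assign, max_load, energy_cost, remaining_energy):
    """Return 1 per drone whose task count or energy demand exceeds its limits."""
    task_count = assign.sum(axis=1)
    energy_needed = (assign * energy_cost).sum(axis=1)
    over = (task_count > max_load) | (energy_needed > remaining_energy)
    return over.astype(int)

def temporal_conflicts(assign, prereqs):
    """Return 1 per assigned task that has a prerequisite with no drone assigned.

    prereqs maps a task index to the list of its causal prerequisite task indices.
    """
    assigned = assign.sum(axis=0) > 0
    result = np.zeros(assign.shape[1], dtype=int)
    for task, parents in prereqs.items():
        if assigned[task] and any(not assigned[p] for p in parents):
            result[task] = 1
    return result

def spatial_collision(route_a, route_b, window_a, window_b, d_safe):
    """Return 1 if the routes come closer than d_safe and the time windows overlap.

    Routes are (n_waypoints, 3) arrays; the distance is measured between
    waypoints only, which approximates the true minimum route distance.
    """
    diff = route_a[:, None, :] - route_b[None, :, :]
    min_dist = np.sqrt((diff ** 2).sum(axis=2)).min()
    overlap = max(window_a[0], window_b[0]) <= min(window_a[1], window_b[1])
    return int(min_dist < d_safe and overlap)

def comm_conflict(collaborative, pos_a, pos_b, range_a, range_b):
    """Return 1 if two collaborating drones are farther apart than the smaller range."""
    if not collaborative:
        return 0
    return int(math.dist(pos_a, pos_b) > min(range_a, range_b))
```

Each function returns the 0/1 indicator that the corresponding rule describes, so the results can be collected into a conflict list that records both the type and the drones or tasks involved.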
[0178] S5.2 Distributed Nash Bargaining Conflict Resolution: For the three non-temporal conflict types (resource conflicts, spatial collision conflicts, and communication coordination conflicts), a distributed Nash bargaining game mechanism is established to resolve conflicts. The mechanism requires no central node: the drones involved in a conflict exchange bargaining information through point-to-point communication and automatically reach a globally optimal consensus result. The game participants are all drones involved in the conflict, and the bargaining payoff function for each participant is defined as the marginal revenue from performing the task, calculated as follows:
[0179]
[0180] where the left-hand side is the marginal revenue of a drone executing a task. The inputs are the dynamic priority of the task, the global matching degree value between the drone and the task, two weighting coefficients with values of [values to be filled in], and the risk coefficient of the drone executing the task (range of values [value missing], determined by the level of environmental threat and the difficulty of the mission).
[0181] The core of the Nash bargaining game is to find a bargaining solution that satisfies Nash's axioms, that is, one that maximizes the product of the participants' payoffs. The formula for calculating the bargaining solution is:
[0182]
[0183] where the left-hand side is the optimal allocation outcome of the game, the maximization is taken over the set of drone-task combinations involved in the conflict, and each drone's retained payoff (the payoff when it performs none of the conflicting tasks) is set to 0. Under this formula, the game participants select the drone-task combination with the highest marginal payoff and assign the task to the corresponding drone. The other participants withdraw from the competition for that task and re-enter the allocation of the remaining tasks until all conflicts are resolved. The entire game is completed through point-to-point communication, and the exchanged data consist only of marginal payoff values and bargaining intentions, so the data volume is very small. Even if some drones experience communication interruptions, the remaining participants can still resolve local conflicts, which gives the mechanism strong robustness.
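The selection procedure described above, in which the highest marginal payoff wins and the other participants re-bargain over the remaining tasks, can be sketched as a greedy loop. This is a simplified centralized illustration of a process the text carries out over point-to-point messages; the function name, the dictionary layout, and the assumption that a winning drone leaves the current round are hypothetical.

```python
def resolve_by_bargaining(payoffs, reserve=0.0):
    """Greedily settle conflicting drone-task pairs by marginal payoff.

    payoffs maps (drone, task) to the drone's marginal payoff for that task.
    Pairs that do not beat the retained payoff `reserve` never win a task.
    Returns a dict mapping each settled task to its winning drone.
    """
    remaining = {pair: v for pair, v in payoffs.items() if v > reserve}
    allocation = {}
    while remaining:
        # The pair with the highest marginal payoff wins its task.
        (drone, task), _ = max(remaining.items(), key=lambda kv: kv[1])
        allocation[task] = drone
        # The task is settled and (by assumption) the winner leaves the round;
        # everyone else continues bargaining over the remaining tasks.
        remaining = {
            (d, t): v
            for (d, t), v in remaining.items()
            if d != drone and t != task
        }
    return allocation
```

Only payoff values and the resulting choices are involved, which reflects the small message size the paragraph above describes.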
[0184] S5.3 Temporal Conflict Backtracking and Resolution: Temporal conflicts are resolved with a causal constraint backtracking mechanism based on the causal directed acyclic graph of the task system. Starting from the downstream task that violates causality, the mechanism traces back along the causal path to identify the set of all affected upstream dependent tasks. The allocation and timing of these upstream tasks are then readjusted; the adjustment formula is as follows:
[0185]
[0186] where the left-hand side is the adjusted allocation matrix. The optimal UAV for each upstream task is selected by global matching degree, and the estimated completion time of that UAV executing the upstream task is compared with the earliest start time of the downstream task, so that the upstream task completes no later than the earliest start time of the downstream task and the causal dependency constraints are satisfied. If secondary resource conflicts arise during the adjustment, the distributed Nash bargaining game mechanism is triggered at the same time to resolve them in real time, so that the adjusted allocation scheme satisfies both causal and resource constraints.
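The backtracking step can be sketched in two parts: collecting every upstream task of a violating task from the causal graph, and choosing for an upstream task the best-matched drone that finishes in time. The graph representation (a dict from task to its direct prerequisites) and the names `upstream_tasks` and `pick_drone` are assumptions for illustration.

```python
def upstream_tasks(prereqs, task):
    """Collect all direct and indirect prerequisites of `task` in the causal DAG."""
    seen = set()
    stack = list(prereqs.get(task, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(prereqs.get(current, []))
    return seen

def pick_drone(matching, completion_time, earliest_start):
    """Choose the best-matched drone that completes before the downstream start.

    matching[i] and completion_time[i] refer to drone i for one upstream task.
    Returns the drone index, or None when no drone can finish in time.
    """
    feasible = [i for i, t in enumerate(completion_time) if t <= earliest_start]
    if not feasible:
        return None
    return max(feasible, key=lambda i: matching[i])
```

When `pick_drone` returns `None`, no reassignment satisfies the timing constraint for that task, and the downstream schedule would have to change instead.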
[0187] S5.4 Digital Twin Pre-simulation and Iterative Optimization: After the conflicts are resolved, a 1:1 high-fidelity digital twin of the complex task scenario is built to achieve closed-loop iterative optimization of the algorithm. The twin consists of four components. The first is a set of physical entity mappings, comprising high-fidelity models of three types of entities: drones, missions, and environments. The drone model fully replicates the core parameters of the heterogeneous drones, such as dynamic characteristics, payload capacity, and energy consumption model; the mission model includes mission execution logic and performance evaluation indicators; and the environment model synchronizes dynamic data such as obstacle distribution, weather conditions, and electromagnetic interference with the real scene in real time. The second is a mathematical modeling set, containing the mathematical models and simulation logic of each module of the algorithm, to ensure that algorithm execution in the twin is completely consistent with the real scene. The third is a virtual-real data synchronization interface, which enables real-time data interaction between the virtual twin and the real drone clusters, mission systems, and environmental monitoring equipment through 5G and edge computing, with data synchronization latency controlled within [specific parameters]. The fourth is a scene generation and interaction interface, which supports the autonomous generation of extreme scenarios and simulation interaction with them.
[0188] The closed-loop iterative optimization of the digital twin is divided into an offline stage and an online stage. The offline stage generates massive amounts of extreme scenario data through the digital twin, including scenarios difficult to reproduce in real-world environments such as swarms of thousands of drones, tightly coupled task systems, sudden failures, and extreme environments. This data is used for offline training, hyperparameter optimization, and performance verification of the algorithm model. The loss function formula for offline training is:
[0189]
[0190] where the left-hand side is the normalized error loss for offline training, computed over the number of extreme scenarios generated from the theoretically optimal global task performance in each scenario and the actual global task performance achieved by the algorithm in that scenario. By minimizing this loss function, the algorithm's causal emergence model, meta-policy model, and matching degree representation model are continuously optimized, which significantly reduces the cost and safety risks of real flight tests.
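A small sketch of a normalized error loss over generated scenarios is given below. It assumes the loss is the mean relative gap between the theoretically optimal and the achieved task performance; the exact form of the original formula is not recoverable from the text, and the function name is hypothetical.

```python
import numpy as np

def normalized_error_loss(optimal, actual):
    """Mean relative gap between optimal and achieved performance per scenario.

    optimal[k] must be non-zero: it is the theoretically optimal global task
    performance in scenario k, and actual[k] is what the algorithm achieved.
    """
    optimal = np.asarray(optimal, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(optimal - actual) / optimal))
```

Dividing by the optimal value puts scenarios of very different scale on a common footing, so a large swarm scenario does not dominate the loss simply because its performance numbers are larger.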
[0191] During the online execution phase, real-time virtual-real mapping synchronizes the drone status, mission progress, and environmental data of the real-world scenario to a digital twin, generating a digital image completely identical to the real-world scenario. Within the digital twin, a preliminary simulation verification of the allocation scheme after conflict resolution is performed. The core of this simulation verification is to predict potential risks during the execution of the scheme (such as sudden energy consumption spikes or mission failure due to environmental changes). The risk assessment formula is as follows:
[0192]
[0193] where the left-hand side is the comprehensive risk assessment value of the allocation plan (range of values [value missing]); the three risk weighting coefficients have the values [values missing]; and the three probabilities are the probability of mission failure (caused by factors such as sudden environmental changes or equipment malfunctions), the probability of mission delay (caused by factors such as insufficient resources and flight path congestion), and the probability of insufficient range (caused by errors in energy consumption prediction). When the comprehensive risk assessment value exceeds the risk threshold (set to a value of [value to be filled in]), the allocation scheme is optimized and adjusted in the twin, and the adjusted scheme is synchronized to the real scenario for execution. At the same time, the task execution results in the real scenario (such as task completion quality, resource consumption, and the success rate of conflict resolution) are fed back to the twin to complete the incremental iterative optimization of the algorithm model. The optimization formula is:
[0194]
[0195] where the two parameter symbols are the algorithm model parameters before and after the update; the iterative learning rate has a value of [value]; the gradient operator is applied to the feedback loss function; and the feedback loss function compares the real-world task performance with the task performance in the twin simulation. This closed-loop iteration achieves full lifecycle optimization of "virtual training - online decision-making - virtual-real verification - feedback iteration", continuously improving the algorithm's generalization ability and execution performance.
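The pre-simulation risk check described in this step, a weighted combination of three probabilities compared against a threshold, can be sketched as follows. The weighted-sum form follows the description of the risk weighting coefficients; the function names are hypothetical, and any weights or threshold passed in are placeholders because the source values are not available.

```python
def risk_score(p_fail, p_delay, p_range, weights):
    """Weighted sum of failure, delay, and insufficient-range probabilities."""
    w_fail, w_delay, w_range = weights
    return w_fail * p_fail + w_delay * p_delay + w_range * p_range

def needs_adjustment(score, threshold):
    """True when the plan's risk exceeds the threshold and must be re-optimized."""
    return score > threshold
```

A plan that passes the check is released for execution as-is, while a plan that fails it is adjusted in the twin before being synchronized to the real scenario.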
[0196] Example 2: This embodiment of the invention also provides a multi-UAV collaborative task allocation device for complex task scenarios. Each module in this device can be used to execute the steps in the above method embodiments. Specifically, the device includes:
[0197] The global multi-source data acquisition and representation module is used to collect multi-source heterogeneous data in complex scenarios and perform standardization processing, construct a spatiotemporal heterogeneous hypergraph and use a spatiotemporal heterogeneous hypergraph convolutional network for feature encoding, and output a low-dimensional dense representation vector of the global state.
[0198] Causal Emergent Decoupling and Counterfactual Inference Module: Used to mine the macro-causal structure of the task system based on the causal emergent algorithm, perform constraint pruning through intervention effect quantification, and build a counterfactual inference world model to screen the optimal allocation benchmark framework and redundancy backup strategy.
[0199] The hierarchical federated heterogeneous matching degree representation module is used to construct a two-layer federated learning architecture of "subgroup local layer - global aggregation layer". It completes the matching degree encoding between UAV and task through heterogeneous graph attention network, and achieves lightweight deployment and closed-loop feedback calibration through model quantization and knowledge distillation.
[0200] Distributed out-of-distribution generalization causal meta-reinforcement learning decision module: used to construct a meta-task set that integrates causal constraints, to perform offline meta-training of the reinforcement learning network using a model-agnostic meta-learning (MAML) framework, and to generate the optimal allocation decision in the online stage through fine-tuning with a small amount of data and a distributed consensus mechanism.
[0201] Distributed Conflict Resolution and Digital Twin Optimization Module: Used for multi-dimensional conflict detection, conflict resolution through distributed Nash bargaining game mechanism and causal constraint backtracking mechanism, and pre-simulation verification of the solution and closed-loop incremental iterative optimization of the algorithm by combining high-fidelity digital twin.
[0202] Example 3: This embodiment of the invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor may be a multi-core central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The memory is used to store computer program instructions for executing the scheme of the present invention, including but not limited to random access memory (RAM), read-only memory (ROM), flash memory, etc. When the processor executes the computer program in the memory, it implements the various steps of the multi-UAV cooperative task allocation method for complex task scenarios described in Example 1 above.
[0203] Example 4: This embodiment of the invention also provides a computer-readable storage medium storing a computer program / instructions. When executed by a processor, the computer program / instructions implement the various steps of the multi-UAV collaborative task allocation method for complex task scenarios described in Example 1 above. The computer-readable storage medium can be any medium capable of storing program code, such as a USB flash drive, portable hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.
[0204] The above description is merely a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should also be considered within the scope of protection of the present invention.
Claims
1. A multi-UAV collaborative task allocation method for complex task scenarios, characterized in that, The method includes the following steps: S1. Acquire multi-source heterogeneous data, which includes full-dimensional state data, full-element attribute data of the task, and full-domain dynamic environmental constraint data collected in real time by UAV onboard sensors; after standardizing the multi-source heterogeneous data, construct a spatiotemporal heterogeneous hypergraph containing UAV nodes, task nodes, and environmental constraint nodes, and use a spatiotemporal heterogeneous hypergraph convolutional network for feature encoding to output a low-dimensional dense representation vector representing the global physical state. S2. Based on the low-dimensional dense representation vector and the historical task dataset containing physical execution time and resource consumption, calculate the causal information of physical entities to extract the macro-causal structure and construct a causal directed acyclic graph; perform constraint pruning by quantifying the causal intervention effect, and calculate the robustness evaluation index of the allocation scheme based on the counterfactual inference world model to screen the allocation benchmark framework and redundancy backup strategy. S3. The heterogeneous UAV cluster is divided into multiple cooperative subgroups using the K-means++ clustering algorithm, and a two-layer federated learning architecture of subgroup local layer and global aggregation layer is constructed. The local matching degree representation matrix is output through the heterogeneous graph attention network in the subgroup local layer, and a global cross-domain cooperative matching representation matrix is generated in the global aggregation layer. S4. Based on the cross-scenario causal invariance law, construct a meta-task set covering complex physical scenarios. 
Use a model-agnostic meta-learning (MAML) framework to perform offline meta-training on a multi-agent reinforcement learning network to obtain an initial meta-policy model. In the online dynamic allocation stage, combine the real-time state vector with the global cross-domain collaborative matching representation matrix, and generate multi-UAV allocation decisions through policy network reasoning and a distributed consensus mechanism. S5. Based on the allocation decision, the causal directed acyclic graph, and the global dynamic environmental constraint data, detect physical resource conflicts, temporal conflicts, spatial collision conflicts, and communication coordination conflicts of the UAV cluster; use a distributed Nash bargaining game mechanism and a causal constraint backtracking mechanism to resolve the conflicts, and perform pre-simulation verification and closed-loop iterative optimization on the resolved solution in a high-fidelity digital twin, and finally output and execute UAV task allocation and flight control commands.
2. The multi-UAV cooperative task allocation method for complex task scenarios according to claim 1, characterized in that, The construction of the spatiotemporal heterogeneous hypergraph and the use of a spatiotemporal heterogeneous hypergraph convolutional network for feature encoding specifically include: The hyperedges in the spatiotemporal heterogeneous hypergraph connect any number of heterogeneous nodes. The weight of any hyperedge is determined by the cosine similarity of the feature vectors between the associated entities and the strength of the physical entity association. The dynamic evolution of the hyperedge weight over time is calculated by combining the time decay coefficient. The spatiotemporal heterogeneous hypergraph convolutional network aggregates the spatiotemporal correlation features of nodes and hyperedges based on the normalized spatiotemporal heterogeneous hypergraph Laplacian matrix. The Laplacian matrix maps the actual communication topology of the UAV cluster in the current physical space, the geographical location dependence of task execution, and the spatial constraints of environmental obstacles. One-dimensional temporal convolution kernels are used to perform convolution operations on the time series of node features to smooth sensor noise during continuous flight of UAVs and capture the time-varying evolution of UAV flight trajectories.
3. The multi-UAV collaborative task allocation method for complex task scenarios according to claim 1, characterized in that, The extraction of the macroscopic causal structure, construction of the causal directed acyclic graph, and constraint pruning specifically include: The task characteristics and the actual physical execution state are spliced together to form a micro state space. The causal information of the task system at different scales is calculated, and the macro state mapping function is determined by maximizing the causal information. The causal directed acyclic graph is constructed based on the essential causal dependencies between tasks. The weights of the causal edges are calculated based on the partial derivatives of the completion quality of subsequent tasks with respect to the completion quality of preceding tasks in historical batch data, as well as the normalized difference in actual physical execution time. By combining front-door and back-door adjustments, the causal intervention effect of task allocation decisions on global task effectiveness is quantified, and constraints corresponding to spurious correlations with causal intervention effects less than a preset intervention effect threshold are removed to compress the physical allocation solution space dimension.
4. The multi-UAV collaborative task allocation method for complex task scenarios according to claim 1, characterized in that, In the two-layer federated learning architecture of the subgroup local layer and the global aggregation layer, a global cross-domain collaborative matching representation matrix is generated, specifically including: Independent subgroup computing nodes encrypt their local model parameters using a homomorphic encryption algorithm at the subgroup local layer, and only upload the encrypted model parameters to the global aggregation layer; The global aggregation layer uses a weighted federated average algorithm to decrypt and aggregate the received model parameters to generate global collaborative model parameters. A global collaborative heterogeneous graph is constructed based on the parameters of the global collaborative model, and cross-subgroup collaborative features are encoded through a global heterogeneous graph attention network to generate the global cross-domain collaborative matching representation matrix. An 8-bit integer quantization scheme and knowledge distillation technique are used to lightweight the representation model, and the lightweight model is deployed on the airborne edge computing device of a single UAV.
5. The multi-UAV cooperative task allocation method for complex task scenarios according to claim 1, characterized in that, The offline meta-training of the multi-agent reinforcement learning network using the model-agnostic meta-learning (MAML) framework specifically includes: An action space is constructed in the reinforcement learning network. The action matrix elements in the action space are subject to the task uniqueness constraint that each task is assigned to only one UAV, and the UAV load constraint that the tasks assigned to a single UAV do not exceed its physical maximum load. A reward function is constructed in the reinforcement learning network. The reward function includes a global task performance reward, an execution success rate reward that satisfies the physical spatiotemporal window constraint, an energy consumption penalty for actual remaining battery life reduction, and a causal constraint violation penalty based on the causal directed acyclic graph determination. Based on the sampled physical trajectory data, the policy loss and value loss are calculated, and gradient updates and outer-loop (extra-meta) evaluations are performed to obtain the initial meta-policy model.
6. The multi-UAV cooperative task allocation method for complex task scenarios according to claim 1, characterized in that, The online dynamic allocation phase generates multi-drone allocation decisions through policy network reasoning and a distributed consensus mechanism, specifically including: In a steady-state scenario without sudden operational conditions, the allocation decision for each UAV is obtained by weighting the action probability output by the policy network with the value of the global cross-domain collaborative matching representation matrix. In dynamic scenarios of equipment physical failure or sudden changes in the environmental wind field, local sensor trajectory data is collected to calculate the fine-tuning loss. The gradient update step size is adaptively adjusted according to the intensity of the change to perform fast fine-tuning and generate a new allocation decision that is adapted to the current physical change scenario. A single UAV interacts with neighboring UAVs within physical communication distance via a low-latency communication link to exchange local decision information, calculates a consensus decision value for allocation decisions, and finally confirms the allocation decision when the consensus decision value is greater than a preset consensus threshold.
7. The multi-UAV cooperative task allocation method for complex task scenarios according to claim 1, characterized in that, The process employs a distributed Nash bargaining game mechanism to resolve conflicts, and performs pre-simulation verification in a high-fidelity digital twin. Specifically, this includes: For the detected resource conflicts, spatial collision conflicts, and communication coordination conflicts, the marginal benefit, which includes task dynamic priority, matching degree, environmental threat risk coefficient, and real physical energy consumption cost, is used as the bargaining benefit function of the game participants. The UAVs involved in the conflict solve the bargaining solution by maximizing the benefit product through point-to-point communication. The high-fidelity digital twin is constructed by integrating the drone flight entity dynamics model, sensor mathematical model, and virtual-real data synchronization interface; a digital image is generated by synchronizing real physical scene data into the digital twin; a comprehensive risk assessment value including the probability of mission physical failure, the probability of delay, and the probability of insufficient battery life is calculated using preset risk weight coefficients; the allocation scheme is adjusted when the set risk threshold is exceeded.
8. A multi-UAV collaborative task allocation device for complex task scenarios, characterized in that, The apparatus is used to implement the multi-UAV cooperative task allocation method for complex task scenarios as described in any one of claims 1 to 7, and the apparatus includes: The global multi-source data acquisition and representation module is used to collect full-dimensional state data, task and environmental data from UAV onboard sensors, perform standardized processing, construct a spatiotemporal heterogeneous hypergraph and perform feature encoding, and output a low-dimensional dense representation vector representing the global physical state. The Causal Emergence Decoupling and Counterfactual Inference module is used to extract the macro-causal structure based on the causal emergence algorithm to construct a causal directed acyclic graph, perform constraint pruning through intervention effect quantification, and build a counterfactual inference world model to screen and allocate the baseline framework and redundancy backup strategy. The hierarchical federated heterogeneous matching degree representation module is used to construct a two-layer federated learning architecture. It completes the matching degree encoding between the UAV and the task and generates a global cross-domain collaborative matching representation matrix through a heterogeneous graph attention network. The generalized causal meta-reinforcement learning decision module is used to construct a meta-task set and perform offline meta-training of the reinforcement learning network. In the online stage, it combines real-time state features to generate allocation decisions through fine-tuning and a distributed consensus mechanism. The distributed conflict resolution and digital twin optimization module is used to detect physical and logical conflicts of UAVs. 
It resolves conflicts through a distributed Nash bargaining game mechanism and a causal constraint backtracking mechanism, and performs pre-simulation verification and closed-loop iterative optimization in a high-fidelity digital twin to output UAV task allocation and flight control commands.
9. An electronic device, characterized in that, The system includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the multi-UAV collaborative task allocation method for complex task scenarios as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, It stores a computer program that, when executed by a processor, implements the multi-UAV cooperative task allocation method for complex task scenarios as described in any one of claims 1 to 7.