A Cross-Domain Intelligent Target Allocation Method for Heterogeneous Aircraft Swarms Based on COMA Architecture
By constructing multi-dimensional feature vectors and a target multi-agent network model based on the counterfactual policy gradient algorithm of the COMA architecture, the method resolves credit-allocation ambiguity in cooperative transportation by multi-domain heterogeneous aircraft and achieves accurate cross-domain target allocation and efficient decision-making.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIHANG UNIV
- Filing Date
- 2026-04-20
- Publication Date
- 2026-06-30
AI Technical Summary
Traditional target allocation methods struggle to cope with the complexity of multi-domain heterogeneous aircraft collaborative transportation, fail to meet the requirements of multi-domain joint allocation, and suffer from ambiguity in credit allocation.
A counterfactual policy gradient algorithm based on the COMA architecture is adopted to realize cross-domain intelligent target allocation for multi-domain heterogeneous aircraft swarms by constructing multi-dimensional feature vectors and target multi-agent network models. Combined with local and global reward mechanisms, the problem of credit allocation ambiguity between heterogeneous platforms is solved.
It achieves precise collaboration among heterogeneous aircraft in multiple domains, with allocation schemes that are closer to real-world needs, improving allocation efficiency and decision-making quality. It also solves the problem of ambiguous credit allocation between heterogeneous platforms and possesses efficient and stable cross-domain collaboration capabilities.
Figure CN122308461A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of multi-domain cooperative transportation technology, and in particular to a cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture.
Background Technology
[0002] With the rapid development of information and intelligent technologies, modern transportation is shifting from single platforms to integrated systems and from single-domain operation to multi-domain collaboration. Collaboration among heterogeneous aircraft across multiple domains has become a key means of improving delivery success rates and overall transport benefit. In multi-domain collaborative transportation, however, rationally allocating transport units of different types and capabilities to multiple targets remains a classic challenge in control and automation.
[0003] Traditional target assignment methods include linear programming, auction algorithms, ant colony optimization, and genetic algorithms. These methods perform well in static, deterministic tasks but struggle with the complexity of heterogeneous platforms: they typically rest on single-stage, homogeneous-platform assumptions and fail to meet the needs of joint assignment across multiple domains. In practice, learning-based solvers built on a single agent have limited representational capacity and may not reflect real-world conditions accurately, while multi-agent algorithms commonly suffer from ambiguous credit allocation in heterogeneous environments. Together, these issues constrain the decision-making quality of existing target assignment algorithms in multi-domain, heterogeneous environments.
Summary of the Invention
[0004] The purpose of this invention is to provide a cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on COMA architecture, which breaks through the assumption of platform homogeneity in traditional methods, realizes precise collaboration of multi-domain heterogeneous aircraft, and solves the problem of ambiguity in credit allocation caused by capability differences between heterogeneous platforms, thereby achieving efficient decision-making for cross-domain collaboration.
[0005] In a first aspect, this invention provides a cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on a COMA architecture, comprising: acquiring attribute information of each aircraft in a multi-domain heterogeneous aircraft swarm and attribute information of each mission target in a mission target set, where the aircraft attribute information includes position, flight speed, flight heading angle, launch cost, transport payload, and delivery success rate to each mission target, and the mission target attribute information includes position, movement speed, and target support value; based on the attribute information of the aircraft and of the mission targets, constructing a transport matching probability calculation model for each aircraft with respect to each mission target, a joint transport matching probability calculation model for multiple aircraft jointly serving the same mission target, and a transport revenue calculation model; and traversing each combination of an aircraft and a mission target, taking the multi-dimensional feature vector of each combination in turn as the local observation of the agent corresponding to the domain to which the aircraft belongs, and outputting the agent's action with a target multi-agent network model based on the counterfactual multi-agent policy gradient algorithm (COMA). The multi-dimensional feature vector includes the launch cost of the aircraft, the target support value of the mission target, information reflecting the coupling between resource use and mission progress, the transport matching probability of the combination, and the joint transport matching probability and transport revenue before and after the aircraft is assigned to the mission target. The agent's action represents the allocation result between the aircraft and the mission target in the combination.
The allocation result is either "assigned" or "not assigned". The cross-domain intelligent target allocation result of the multi-domain heterogeneous aircraft swarm is determined from all actions of all agents.
[0006] In an optional implementation, constructing, based on the attribute information of the aircraft and of the mission targets, the transport matching probability calculation model for each aircraft with respect to each mission target includes: constructing an angle dominance function for a target aircraft serving a specified mission target, based on the position and flight heading angle of the target aircraft and the position of the specified mission target, where the target aircraft is any aircraft in the multi-domain heterogeneous aircraft swarm and the specified mission target is any mission target in the mission target set; constructing a velocity dominance function for the target aircraft serving the specified mission target, based on the flight speed of the target aircraft and the movement speed of the specified mission target; constructing a distance dominance function for the target aircraft serving the specified mission target, based on the position of the target aircraft, the position of the specified mission target, and the maximum effective service distance of the aircraft; constructing an overall dominance function for the target aircraft serving the specified mission target from the angle, velocity, and distance dominance functions; and constructing the transport matching probability calculation model for the target aircraft serving the specified mission target, based on the overall dominance function and the transport payload of the target aircraft.
[0007] In an optional implementation, the joint transport matching probability calculation model for multiple aircraft jointly serving the same mission target is expressed as:
$$P_j = 1 - \prod_{i=1}^{m}\left(1 - p_{ij}\right)^{x_{ij}}$$
where $P_j$ denotes the joint transport matching probability of multiple aircraft jointly serving mission target $j$; $m$ denotes the total number of aircraft in the multi-domain heterogeneous aircraft swarm; $p_{ij}$ denotes the transport matching probability of aircraft $i$ for mission target $j$; and $x_{ij} \in \{0, 1\}$ denotes the allocation result of aircraft $i$ and mission target $j$, with $x_{ij} = 1$ indicating that aircraft $i$ is assigned to mission target $j$ and $x_{ij} = 0$ indicating that it is not.
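The joint transport matching probability of [0007], which [0028] models by probabilistic complementarity, can be sketched as follows. This is a minimal Python illustration, assuming the form $P_j = 1 - \prod_i (1 - p_{ij})^{x_{ij}}$ with $p_{ij}$ the transport matching probability of aircraft $i$ for target $j$ and $x_{ij} \in \{0, 1\}$ the allocation result; the function name is hypothetical.

```python
from math import prod
from typing import Sequence


def joint_matching_probability(p_col: Sequence[float], x_col: Sequence[int]) -> float:
    """Joint transport matching probability of one mission target.

    p_col[i]: transport matching probability of aircraft i for the target.
    x_col[i]: allocation result (1 = assigned, 0 = not assigned).
    The target counts as matched unless every assigned aircraft fails.
    """
    if len(p_col) != len(x_col):
        raise ValueError("p_col and x_col must have the same length")
    # (1 - p) ** x is (1 - p) for an assigned aircraft and 1 for an unassigned one.
    miss = prod((1.0 - p) ** x for p, x in zip(p_col, x_col))
    return 1.0 - miss


# Two of three aircraft assigned to the same target: 1 - 0.4 * 0.5 = 0.8
print(joint_matching_probability([0.6, 0.5, 0.9], [1, 1, 0]))
```

The exponent form lets an unassigned aircraft drop out of the product without branching; the same helper applies unchanged to the joint delivery success probability of [0008] if delivery success rates are passed in place of matching probabilities.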
[0008] In an optional implementation, constructing, based on the attribute information of the aircraft and of the mission targets, the transport revenue calculation model for multiple aircraft jointly serving the same mission target includes: constructing a joint delivery success probability calculation model for multiple aircraft jointly serving a specified mission target, based on the delivery success rate of each aircraft to the specified mission target, the allocation result of each aircraft with respect to that target, and the total number of aircraft in the multi-domain heterogeneous aircraft swarm; updating the allocation result of each aircraft assigned to the specified mission target according to a random probability and the joint delivery success probability calculation model, to obtain an updated allocation result; and constructing the transport revenue calculation model for multiple aircraft jointly serving the specified mission target, based on the total number of aircraft in the swarm, the transport matching probability calculation model of each aircraft for the specified mission target, the updated allocation result, and the target support value of the specified mission target.
[0009] In an optional implementation, when training the target multi-agent network model based on the counterfactual multi-agent policy gradient (COMA) architecture, the reward function is a weighted sum of a local reward and a global reward. The local reward is the change in the value of a first objective function caused by a single allocation decision of the agent, where the first objective function maximizes transport revenue and minimizes cost without considering transport uncertainty. The global reward is the value of a second objective function, evaluated on the final allocation matrix, averaged over the decision steps, where the second objective function maximizes transport revenue and minimizes cost while considering transport uncertainty.
[0010] In an optional implementation, the local reward is expressed as:
$$r^{\mathrm{loc}} = F_1(X') - F_1(X), \qquad F_1(X) = \sum_{j=1}^{n} V_j\left[1 - \prod_{i=1}^{m}\left(1 - p_{ij}\right)^{x_{ij}}\right] - \lambda \sum_{i=1}^{m}\sum_{j=1}^{n} c_i x_{ij}$$
where $X$ and $X'$ denote the allocation matrices before and after the agent makes a decision, whose elements $x_{ij}$ denote the allocation result of aircraft $i$ and mission target $j$ ($x_{ij} = 1$: assigned; $x_{ij} = 0$: not assigned); $F_1$ denotes the first objective function; $n$ denotes the total number of mission targets in the mission target set; $m$ denotes the total number of aircraft in the multi-domain heterogeneous aircraft swarm; $p_{ij}$ denotes the transport matching probability of aircraft $i$ for mission target $j$; $V_j$ denotes the target support value of mission target $j$; $\lambda$ is a demand coefficient expressing the need to reduce aircraft cost; and $c_i$ denotes the launch cost of aircraft $i$.
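A sketch of the first objective and the local reward of [0010] follows. It assumes $F_1(X) = \sum_j V_j[1 - \prod_i (1 - p_{ij})^{x_{ij}}] - \lambda \sum_i \sum_j c_i x_{ij}$, i.e., expected support value minus the weighted launch cost of assigned aircraft; this form is inferred from the variables listed in [0010] and should be treated as an assumption. Function names are illustrative.

```python
from math import prod
from typing import List


def first_objective(X: List[List[int]], P: List[List[float]],
                    V: List[float], c: List[float], lam: float) -> float:
    """F1(X): expected support value minus weighted launch cost (no uncertainty)."""
    m, n = len(X), len(V)  # m aircraft (rows), n mission targets (columns)
    value = 0.0
    for j in range(n):
        # Probability that no assigned aircraft matches target j.
        miss = prod((1.0 - P[i][j]) ** X[i][j] for i in range(m))
        value += V[j] * (1.0 - miss)
    cost = sum(c[i] * X[i][j] for i in range(m) for j in range(n))
    return value - lam * cost


def local_reward(X_before: List[List[int]], X_after: List[List[int]],
                 P: List[List[float]], V: List[float], c: List[float], lam: float) -> float:
    """Change in F1 caused by a single allocation decision."""
    return first_objective(X_after, P, V, c, lam) - first_objective(X_before, P, V, c, lam)
```

With two aircraft of matching probability 0.5 on one target worth 10 and unit launch costs ($\lambda = 1$), adding the second aircraft raises the joint matching probability from 0.5 to 0.75, so the local reward is 10 × 0.25 - 1 = 1.5.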
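[0011] In an optional implementation, the global reward is the second objective function evaluated on the final allocation matrix and averaged over the decision steps:
$$r^{\mathrm{glo}} = \frac{1}{T}\, F_2\left(X^{f}\right), \qquad S_j = 1 - \prod_{i=1}^{m}\left(1 - s_{ij}\right)^{x^{f}_{ij}}$$
where $X^{f}$ denotes the final allocation matrix, whose elements $x^{f}_{ij}$ denote the final allocation result of aircraft $i$ and mission target $j$; $T$ denotes the number of decision steps; $S_j$ denotes the joint delivery success probability of multiple aircraft jointly serving mission target $j$, with $s_{ij}$ the delivery success rate of aircraft $i$ to mission target $j$ and $m$ the total number of aircraft; $\xi$ denotes a random probability; and $F_2$ denotes the second objective function.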
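The global reward of [0011] and the weighted combination of [0009] can be sketched as below. The source does not spell out how the random probability updates the allocation, so the rule here is hypothetical: for each target a uniform draw is compared with the joint delivery success probability $S_j$, and a failed draw removes that target's revenue while launch costs are still paid. The equal weights in `total_reward` and the `num_steps` argument are likewise assumptions.

```python
import random
from math import prod
from typing import List


def joint_probability(probs: List[List[float]], X: List[List[int]], j: int) -> float:
    """1 - prod_i (1 - probs[i][j]) ** X[i][j]  (probability-complement model)."""
    return 1.0 - prod((1.0 - probs[i][j]) ** X[i][j] for i in range(len(X)))


def second_objective(X, P, S, V, c, lam: float, rng: random.Random) -> float:
    """F2 on the final allocation X with a HYPOTHETICAL realization of delivery uncertainty."""
    m, n = len(X), len(V)
    value = 0.0
    for j in range(n):
        xi = rng.random()  # random probability in [0, 1)
        # Hypothetical rule: revenue for target j counts only if the draw falls below S_j.
        if xi < joint_probability(S, X, j):
            value += V[j] * joint_probability(P, X, j)
    cost = sum(c[i] * X[i][j] for i in range(m) for j in range(n))  # launch cost is paid regardless
    return value - lam * cost


def global_reward(X_final, P, S, V, c, lam: float, num_steps: int, rng: random.Random) -> float:
    """Average of F2(X_final) over the decision steps of the episode."""
    return second_objective(X_final, P, S, V, c, lam, rng) / num_steps


def total_reward(r_local: float, r_global: float,
                 w_local: float = 0.5, w_global: float = 0.5) -> float:
    """Weighted sum of local and global rewards; the weights are hypothetical."""
    return w_local * r_local + w_global * r_global
```

Passing a seeded `random.Random` keeps the stochastic reward reproducible across training runs.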
[0012] In a second aspect, this invention provides a cross-domain intelligent target allocation device for heterogeneous aircraft swarms based on a COMA architecture, comprising: an acquisition module configured to acquire attribute information of each aircraft in a multi-domain heterogeneous aircraft swarm and attribute information of each mission target in a mission target set, where the aircraft attribute information includes position, flight speed, flight heading angle, launch cost, transport payload, and delivery success rate to each mission target, and the mission target attribute information includes position, movement speed, and target support value; a construction module configured to construct, based on the aircraft attribute information and the mission target attribute information, a transport matching probability calculation model for each aircraft with respect to each mission target, a joint transport matching probability calculation model for multiple aircraft jointly serving the same mission target, and a transport revenue calculation model; and a traversal and allocation module configured to traverse each combination of an aircraft and a mission target, take the multi-dimensional feature vector of each combination in turn as the local observation of the agent corresponding to the domain to which the aircraft belongs, and output the agent's action with a target multi-agent network model based on the counterfactual multi-agent policy gradient algorithm (COMA). The multi-dimensional feature vector includes the launch cost of the aircraft, the target support value of the mission target, information reflecting the coupling between resource use and mission progress, the transport matching probability of the combination, and the joint transport matching probability and transport revenue before and after the aircraft is assigned to the mission target. The agent's action represents the allocation result between the aircraft and the mission target in the combination.
The allocation result is either "assigned" or "not assigned". A determination module is configured to determine the cross-domain intelligent target allocation result of the multi-domain heterogeneous aircraft swarm from all actions of all agents.
[0013] In a third aspect, the present invention provides an electronic device including a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor executes the computer program to implement the cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to any of the foregoing embodiments.
[0014] In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to any of the foregoing embodiments.
[0015] This invention uses a multi-dimensional feature vector as the local observation of each agent; the vector contains the aircraft launch cost, the target support value of the mission target, information reflecting the coupling between resource usage and mission progress, the transport matching probability of the aircraft-mission target combination, and the joint transport matching probability and transport revenue before and after the aircraft is allocated to the mission target. Combined with a target multi-agent network model based on the COMA architecture, this enables an efficient solution to the cross-domain target allocation problem for multi-domain heterogeneous aircraft swarms. By constructing a multi-dimensional feature vector for each aircraft-mission target combination, the method removes the platform-homogeneity assumption of traditional methods, achieves precise coordination among multi-domain heterogeneous aircraft, and brings the allocation scheme closer to real-world needs. Furthermore, this invention applies a counterfactual multi-agent policy gradient framework to multi-domain heterogeneous aircraft target allocation. In collaborative transport by multi-domain heterogeneous aircraft, the counterfactual learning mechanism effectively resolves the credit-allocation ambiguity caused by capability differences between heterogeneous platforms, enabling efficient cross-domain collaborative decision-making.
Attached Figure Description
[0016] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0017] Figure 1 is a flowchart of a cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on a COMA architecture, provided in an embodiment of the present invention;
Figure 2 is a schematic diagram of a cross-domain intelligent target allocation process for a multi-domain heterogeneous aircraft swarm, provided in an embodiment of the present invention;
Figure 3 is a schematic diagram of a target allocation algorithm based on the COMA architecture, provided in an embodiment of the present invention;
Figure 4 is a functional block diagram of a cross-domain intelligent target allocation device for heterogeneous aircraft swarms based on a COMA architecture, provided in an embodiment of the present invention;
Figure 5 is a schematic diagram of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0018] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0019] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.
[0020] The following detailed description of some embodiments of the present invention is provided in conjunction with the accompanying drawings. Unless otherwise specified, the following embodiments and features can be combined with each other.
[0021] Example 1. This invention addresses multi-domain collaborative transport allocation scenarios. Considering the heterogeneity of targets and agents, it proposes a distributed counterfactual swarm learning model based on the Counterfactual Multi-Agent Policy Gradients (COMA) algorithm to build an efficient multi-agent task allocation system. A specific application scenario involves multiple unmanned aerial vehicles (UAVs) collaboratively serving multiple moving ship targets at sea (referred to as mission targets). To simplify the problem, the number and locations of mission targets are assumed to remain constant while the allocation scheme is being formulated, so the problem is treated as a static allocation problem.
[0022] In the above application scenario, multiple agents control a multi-domain aircraft swarm system, with each agent controlling the aircraft swarm of a single domain. For example, agent 1 controls aircraft swarm 1 on a maritime platform, and agent 2 controls aircraft swarm 2 on an airspace platform. This embodiment adopts a "centralized training, distributed execution" architecture: during training, the agents interact to learn optimal policies; during execution, the agents control their respective aircraft swarms and collaborate to complete transport tasks for n targets, maximizing the overall transport benefit. Compared with single-agent control, this architecture perceives the environment more accurately and, by exploiting the characteristics of each domain platform, achieves better target allocation decisions.
[0023] Figure 1 is a flowchart of a cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on a COMA architecture, provided in this embodiment of the invention. As shown in Figure 1, the method includes the following steps. Step S102: Obtain the attribute information of each aircraft in the multi-domain heterogeneous aircraft swarm and the attribute information of each mission target in the mission target set.
[0024] The aircraft's attribute information includes: position, flight speed, flight heading angle, launch cost, transport payload, and delivery success rate to each mission target; the mission target's attribute information includes: position, movement speed, and target support value.
[0025] This step collects and structures the basic capability parameters and environmental information of all entities participating in collaborative allocation, providing the data foundation for subsequent decision-making. The core advantage of a multi-agent system is its ability to extract the characteristics of aircraft in different domains accurately, thereby improving allocation efficiency for specific targets. In multi-aircraft task allocation, each aircraft $i$ controlled by an agent has the attributes $\{\mathbf{r}_i, v_i, \psi_i, c_i, l_i, s_{ij}\}$, where $\mathbf{r}_i$ denotes the position (coordinates) of aircraft $i$; $v_i$ its flight speed; $\psi_i$ its flight heading angle; $c_i$ its launch cost (reflecting resource consumption); $l_i$ its transport payload (characterizing its capability to complete a single mission, normalized to the range [0, 1]); and $s_{ij}$ its delivery success rate for mission target $j$ (the probability that the aircraft successfully delivers cargo to that target, taking interference during transport into account).
[0026] Each mission target also has specific attributes, represented in this embodiment as $\{\mathbf{r}^{T}_j, v^{T}_j, V_j\}$, where $\mathbf{r}^{T}_j$ denotes the position of mission target $j$, $v^{T}_j$ its movement speed, and $V_j$ its target support value, i.e., the inherent value of the mission target that needs to be secured; this value is obtained once the drone successfully delivers the cargo.
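The attribute sets of [0025] and [0026] map naturally onto two record types. A minimal sketch, assuming planar coordinates and radians for the heading angle; the class and field names, and the `domain` tag used to route an aircraft to its agent, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Aircraft:
    """Attributes of aircraft i (field names are illustrative)."""
    position: Tuple[float, float]   # planar coordinates (2-D is an assumption)
    speed: float                    # flight speed
    heading: float                  # flight heading angle in radians (assumed convention)
    launch_cost: float              # resource-consumption cost of launching
    payload: float                  # normalized transport payload in [0, 1]
    delivery_success: Dict[int, float] = field(default_factory=dict)  # target index -> success rate
    domain: str = "sea"             # hypothetical tag of the owning domain / agent

    def __post_init__(self) -> None:
        if not 0.0 <= self.payload <= 1.0:
            raise ValueError("payload must be normalized to [0, 1]")
        if any(not 0.0 <= s <= 1.0 for s in self.delivery_success.values()):
            raise ValueError("delivery success rates must be probabilities")


@dataclass(frozen=True)
class MissionTarget:
    """Attributes of mission target j."""
    position: Tuple[float, float]
    speed: float            # movement speed
    support_value: float    # target support value earned on successful delivery
```

Validating the payload and success rates at construction time catches unnormalized inputs before they reach the matching-probability models.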
[0027] Step S104: Based on the attribute information of the aircraft and the attribute information of the mission target, construct a transportation matching probability calculation model for each aircraft to each mission target, a joint transportation matching probability calculation model for multiple aircraft jointly serving the same mission target, and a transportation revenue calculation model.
[0028] Specifically, in this embodiment of the invention, based on the attribute information of the aircraft and the attribute information of the mission target, a multi-dimensional advantage function is constructed for each aircraft serving each mission target, and a transportation matching probability calculation model is generated in conjunction with the transport payload of the aircraft. Considering that multiple aircraft can cooperate to go to the same target to improve the transportation success rate, the joint transportation effect is modeled using the principle of probabilistic complementarity. This yields a joint transportation matching probability calculation model for multiple aircraft jointly serving the same mission target. Based on the same principle, a joint delivery success probability calculation model for multiple aircraft jointly serving the same mission target can be constructed. Furthermore, by combining the target guarantee value of each mission target, a transportation revenue calculation model for multiple aircraft jointly serving the same mission target can be constructed.
[0029] Step S106: Traverse the combinations of each aircraft and each mission target, taking the multi-dimensional feature vector of each combination in turn as the local observation of the agent corresponding to the domain to which the aircraft belongs, and output the agent's action with the target multi-agent network model based on the COMA counterfactual policy gradient architecture.
[0030] The multi-dimensional feature vector includes: the launch cost of the aircraft, the target support value of the mission target, information reflecting the coupling between resource use and mission progress, the transport matching probability of the combination, and the joint transport matching probability and transport revenue before and after the aircraft is allocated to the mission target. The information reflecting the coupling between resource use and mission progress includes: the proportion of the total cost already allocated to aircraft, the proportion of the total target support value covered by targets already traversed, and the proportion of the total cost accounted for by the aircraft allocated to the target in the current allocation state. The agent's action represents the allocation result of the aircraft and the mission target in the combination, which is either allocated or not allocated.
[0031] Step S108: Determine the cross-domain intelligent target allocation result of the multi-domain heterogeneous aircraft swarm based on all actions of all agents.
[0032] Specifically, step S106 maps the environmental state to the input features of the network model and completes the distributed action generation through a neural network model based on counterfactual multi-agent policy gradient (i.e., the aforementioned target multi-agent network model).
[0033] Figure 2 is a schematic diagram of a cross-domain intelligent target allocation process for a multi-domain heterogeneous aircraft swarm, provided by an embodiment of the present invention. The method is applicable to two or more domain platforms; for ease of description, Figure 2 illustrates its application to two domains (aircraft swarms on a maritime platform and on an airspace platform). The set of aircraft in Figure 2 is represented as $U = \{U^{s}_{1}, \dots, U^{s}_{m_1}, U^{a}_{1}, \dots, U^{a}_{m_2}\}$, where $m_1$ denotes the total number of aircraft in the sea-based platform's swarm, $U^{s}_{k}$ the $k$-th aircraft in that swarm, $m_2$ the total number of aircraft in the airspace platform's swarm, and $U^{a}_{k}$ the $k$-th aircraft in that swarm. The set of mission targets is represented as $T = \{T_1, \dots, T_n\}$. In mission allocation, each aircraft can serve only one target, and each target must be assigned at least one aircraft.
[0034] To obtain the cross-domain intelligent target allocation result for a multi-domain heterogeneous aircraft swarm, this embodiment traverses the combinations of aircraft and mission targets. For each combination, a corresponding multi-dimensional feature vector is constructed as the local observation input of the agent corresponding to the aircraft's domain. To better reflect real-world applications, the data in each observation of a single agent is information internal to that agent. At each time step, for the combination of aircraft $i$ and mission target $j$, the observation obtained by the agent is
$$o_{ij} = \left[c_i,\ \eta^{c},\ \eta^{V},\ \eta^{c}_{j},\ p_{ij},\ P_j,\ P'_j,\ R_j,\ R'_j,\ V_j\right]$$
where $c_i$ is the launch cost of the aircraft; $\eta^{c}$ is the proportion of the total cost already allocated to aircraft; $\eta^{V}$ is the proportion of the total target support value covered by targets already traversed; $\eta^{c}_{j}$ is the proportion of the total cost accounted for by the aircraft assigned to mission target $j$ in the current allocation state; $p_{ij}$ is the transport matching probability of this combination; $P_j$ and $P'_j$ are the joint transport matching probabilities of mission target $j$ before and after aircraft $i$ is assigned to it; $R_j$ and $R'_j$ are the transport revenues of mission target $j$ before and after aircraft $i$ is assigned to it; and $V_j$ is the target support value of mission target $j$. The state space is formed by aggregating the local observations of the aircraft-mission target combinations in each agent.
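The ten-element observation of [0034] can be assembled as below. Assumptions: allocation, matching probabilities, costs, and support values are held in plain lists indexed by aircraft $i$ and target $j$; "traversed" targets are passed in as a set of indices; and transport revenue is taken as $R_j = V_j P_j$, which the source does not state explicitly. All names are illustrative.

```python
from math import prod
from typing import List, Set


def _joint(P: List[List[float]], X: List[List[int]], j: int) -> float:
    """Probability-complement joint matching probability of target j."""
    return 1.0 - prod((1.0 - P[k][j]) ** X[k][j] for k in range(len(X)))


def build_observation(i: int, j: int, X: List[List[int]], P: List[List[float]],
                      c: List[float], V: List[float], traversed: Set[int]) -> List[float]:
    """Ten-element local observation for the (aircraft i, target j) combination."""
    m = len(X)
    total_cost, total_value = sum(c), sum(V)
    allocated_cost = sum(c[k] for k in range(m) if any(X[k]))   # aircraft already assigned somewhere
    traversed_value = sum(V[t] for t in traversed)              # targets already traversed
    cost_on_target = sum(c[k] * X[k][j] for k in range(m))      # aircraft currently on target j
    before = [row[:] for row in X]
    before[i][j] = 0                                            # state without aircraft i on j
    after = [row[:] for row in X]
    after[i][j] = 1                                             # state with aircraft i on j
    p_before, p_after = _joint(P, before, j), _joint(P, after, j)
    return [
        c[i],
        allocated_cost / total_cost,
        traversed_value / total_value,
        cost_on_target / total_cost,
        P[i][j],
        p_before, p_after,
        V[j] * p_before, V[j] * p_after,  # hypothetical revenue model R_j = V_j * P_j
        V[j],
    ]
```

Computing the before and after states on copies keeps the shared allocation matrix untouched while the agent is still deciding.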
[0035] This invention adopts a centralized-training, distributed-execution framework combined with the COMA algorithm to solve the credit allocation problem in multi-agent cooperation. Each agent is equipped with an independent Actor network (policy network) that receives its local observations and outputs a probability distribution over actions. To simplify the action space, reduce computational complexity, improve computational efficiency, and adapt to problems of different scales, the action space of each agent $u$ is defined as $\mathcal{A}_u = \{0, 1\}$, where 0 means not assigned and 1 means assigned. For the two-domain case, the joint action is $\mathbf{a} = (a_{s}, a_{a})$, where $a_{s}$ and $a_{a}$ are the actions of the maritime-platform and airspace-platform agents. In the target multi-agent network model, each agent advances the allocation decision combination by combination until allocation selection has been completed for all combinations.
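COMA resolves credit allocation with a counterfactual baseline: the centralized critic's value for the action an agent actually took is compared with the policy-weighted value of its alternative actions while the other agents' actions are held fixed. A minimal sketch of that advantage for the binary action space {0, 1}, following the published COMA formulation (Foerster et al., 2018); the function names are illustrative, and the critic and actor networks themselves are not modeled here.

```python
from math import log
from typing import Sequence


def coma_advantage(q_values: Sequence[float], policy: Sequence[float], action: int) -> float:
    """Counterfactual advantage of one agent.

    q_values[k]: centralized critic estimate Q(s, (u^{-a}, k)), i.e. the joint action
                 with the other agents' actions held fixed and this agent taking k.
    policy[k]:   this agent's probability pi^a(k | tau^a).
    action:      the action actually taken (0 = not assigned, 1 = assigned).
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must cover the same action space")
    baseline = sum(p * q for p, q in zip(policy, q_values))  # counterfactual baseline
    return q_values[action] - baseline


def actor_loss(q_values: Sequence[float], policy: Sequence[float], action: int) -> float:
    """Per-sample policy-gradient loss; the advantage is treated as a constant."""
    return -log(policy[action]) * coma_advantage(q_values, policy, action)
```

Because the baseline is the policy-weighted mean of the critic's values, the advantages average to zero under the agent's own policy, so an agent is credited only for how much its chosen action beats its own alternatives.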
[0036] After processing by the target multi-agent network model, the allocation result for the two-domain platform of Figure 2 can be represented by the binary decision matrix (allocation matrix for short)
$$X = \begin{bmatrix} X^{s} \\ X^{a} \end{bmatrix} = \left[x_{ij}\right]_{(m_1 + m_2) \times n}$$
where $m_1$ and $m_2$ are the numbers of aircraft on the maritime and airspace platforms, $n$ is the number of mission targets, and $x_{ij}$ denotes the allocation result of aircraft $i$ and mission target $j$: $x_{ij} = 1$ indicates that aircraft $i$ is assigned to mission target $j$, and $x_{ij} = 0$ that it is not. The two agents are responsible for the upper block $X^{s}$ (maritime platform) and the lower block $X^{a}$ (airspace platform), respectively, and together generate the complete target allocation scheme. The model therefore scales well: as long as the allocation matrix can be partitioned in this way, it can be extended to a wider range of real-world allocation scenarios.
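The sequential, block-wise filling of the allocation matrix in [0035] and [0036] can be illustrated with placeholder policies standing in for the trained Actor networks. This sketch assumes the sea-platform agent processes its rows before the airspace-platform agent and that an aircraft stops being considered once it is assigned; the `Policy` signature and function names are hypothetical. The feasibility check encodes the constraints stated in [0033].

```python
from typing import Callable, Dict, List

# Hypothetical policy signature: (aircraft index i, target index j, current X) -> 0 or 1.
Policy = Callable[[int, int, List[List[int]]], int]


def run_allocation(m_sea: int, m_air: int, n: int, policies: Dict[str, Policy]) -> List[List[int]]:
    """Fill the allocation matrix block by block: rows 0..m_sea-1 belong to the
    sea-platform agent, rows m_sea..m_sea+m_air-1 to the airspace-platform agent."""
    m = m_sea + m_air
    X = [[0] * n for _ in range(m)]
    blocks = {"sea": range(0, m_sea), "air": range(m_sea, m)}
    for domain, rows in blocks.items():
        for i in rows:
            for j in range(n):
                if any(X[i]):  # an aircraft serves at most one target: skip remaining columns
                    break
                X[i][j] = policies[domain](i, j, X)
    return X


def is_feasible(X: List[List[int]]) -> bool:
    """Each aircraft serves at most one target; each target gets at least one aircraft."""
    at_most_one = all(sum(row) <= 1 for row in X)
    covered = all(sum(col) >= 1 for col in zip(*X))
    return at_most_one and covered
```

A trivial round-robin policy, `lambda i, j, X: int(j == i % 2)`, yields a feasible matrix for two aircraft per domain and two targets; in practice each callable would wrap the corresponding agent's Actor network.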
[0037] This invention takes as the local observation of each agent a multi-dimensional feature vector comprising the aircraft launch cost, the target support value of the mission target, information reflecting the coupling between resource usage and mission progress, the transport matching probability of the aircraft-mission target combination, and the joint transport matching probability and transport revenue before and after the aircraft is allocated to the mission target. Combined with a target multi-agent network model based on the COMA architecture, the method efficiently solves the cross-domain target allocation problem for multi-domain heterogeneous aircraft swarms. By constructing a multi-dimensional feature vector for each aircraft-mission target combination, it removes the platform-homogeneity assumption of traditional methods, achieves precise coordination among multi-domain heterogeneous aircraft, and brings the allocation scheme closer to real-world needs. Furthermore, this invention applies a counterfactual multi-agent policy gradient framework to multi-domain heterogeneous aircraft target allocation. In collaborative transport by multi-domain heterogeneous aircraft, the counterfactual learning mechanism effectively resolves the credit-allocation ambiguity caused by capability differences between heterogeneous platforms, enabling efficient cross-domain collaborative decision-making.
[0038] To evaluate how effectively an aircraft serves a mission objective, a transportation matching probability is defined to assess the allocation effect of an aircraft serving a specific mission objective, from which the overall transportation benefit is obtained. In general, the relative position of the aircraft and the objective, the aircraft's own attributes, and its mobility all affect transportation effectiveness. To quantify an aircraft's suitability for serving a mission objective, this invention builds the transportation matching probability model from three dimensions: angle advantage, speed advantage, and distance advantage. By characterizing the relative attributes and dynamic relationship between the aircraft and the mission objective, the model provides a principled evaluation of the overall transportation benefit.
[0039] In an optional implementation, constructing, in step S104 above, the transportation matching probability calculation model of each aircraft for each mission objective from the attribute information of the aircraft and of the mission objective specifically includes the following steps. Step S1041: construct the angle dominance function of the target aircraft serving the designated mission objective from the position and flight direction angle of the target aircraft and the position of the designated mission objective, where the target aircraft is any aircraft in the multi-domain heterogeneous aircraft swarm and the designated mission objective is any mission objective in the mission objective set.
[0040] Specifically, the angle dominance function characterizes the impact of the lead angle on the effectiveness of transportation service. The lead angle is the angular deviation between the direction of the aircraft's velocity and the line connecting the aircraft to the mission objective; it is therefore determined by the aircraft's position, its flight direction angle, and the position of the mission objective. The lead angle is transformed into a dominance value through a linear mapping, quantifying its impact on the accuracy of transportation service. In this embodiment, the angle dominance function is a linear mapping of the lead angle $\varphi_{ij}$ whose coefficient is directly proportional to the distance $d_{ij}$ between aircraft $i$ and mission objective $j$.
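The lead angle follows directly from the quantities named above. A minimal Python sketch, assuming 2-D positions and a heading angle measured from the x-axis in radians; the linear form `1 - phi / pi` is a hypothetical stand-in for the embodiment's mapping, and the distance-proportional coefficient is not modeled:

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def lead_angle(pos_i: Point, heading_i: float, pos_j: Point) -> float:
    """Lead angle (radians, in [0, pi]) between aircraft i's velocity direction
    (heading_i, measured from the x-axis) and the line from aircraft i to objective j."""
    los = math.atan2(pos_j[1] - pos_i[1], pos_j[0] - pos_i[0])  # line-of-sight angle
    diff = heading_i - los
    # Wrap the difference into (-pi, pi] before taking its magnitude.
    return abs(math.atan2(math.sin(diff), math.cos(diff)))


def angle_dominance(phi: float) -> float:
    """Hypothetical linear mapping: 1 when heading straight at the objective,
    0 when heading directly away from it."""
    return 1.0 - phi / math.pi


phi = lead_angle((0.0, 0.0), math.pi / 2, (10.0, 0.0))  # heading 90 degrees off the line of sight
print(angle_dominance(phi))  # about 0.5
```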
[0041] Step S1042: Based on the flight speed of the target aircraft and the movement speed of the specified mission target, construct the speed advantage function of the target aircraft serving the specified mission target.
[0042] Compared with other forms, a saturating trend better matches the physical law of diminishing marginal returns of speed gain. A saturation-type speed dominance function is therefore designed, in which a speed advantage coefficient controls how fast the function saturates: as the aircraft's speed advantage over the mission objective grows, the speed dominance approaches 1.
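A minimal Python sketch of one saturating form, assuming the hypothetical expression `1 - exp(-k * v_i / v_j)`, where `k` stands in for the speed advantage coefficient; the exact function used in the embodiment is not reproduced here:

```python
import math


def speed_dominance(v_i: float, v_j: float, k: float = 1.0) -> float:
    """Hypothetical saturating speed dominance: 1 - exp(-k * v_i / v_j).

    k plays the role of the speed advantage coefficient; the function rises
    quickly at first and then flattens toward 1 (diminishing marginal returns).
    """
    if v_j <= 0.0:
        return 1.0  # stationary objective: treated as full speed advantage (assumption)
    return 1.0 - math.exp(-k * v_i / v_j)


# Each extra 10 units of aircraft speed buys less dominance than the previous 10.
gains = [speed_dominance(v, 10.0) for v in (10.0, 20.0, 30.0)]
print(gains[1] - gains[0] > gains[2] - gains[1])  # True
```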
[0043] Step S1043: Based on the position of the target aircraft, the position of the designated mission target, and the maximum effective service distance of the aircraft, construct the distance advantage function of the target aircraft serving the designated mission target.
[0044] In this embodiment, the distance dominance function is a piecewise function with the maximum effective service distance as the threshold. Within the maximum effective service distance, the dominance decreases gradually as the distance between the target aircraft and the designated mission objective increases, describing the dynamic impact of distance on the delivery success probability. Its parameters are the distance $d_{ij}$ between aircraft $i$ and mission objective $j$, the maximum effective service distance $D_i$ of aircraft $i$, and a preset constant that adjusts the shape of the attenuation. By construction, the smaller the distance between the target aircraft and the designated mission objective, the larger the distance dominance value, and the larger the distance, the smaller the value; when the distance exceeds the aircraft's maximum effective service distance, the distance dominance value is 0.
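A minimal Python sketch of such a piecewise function, assuming the hypothetical decay `1 - (d / d_max) ** beta` inside the threshold, where `beta` stands in for the preset shape constant:

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def distance_dominance(pos_i: Point, pos_j: Point, d_max: float, beta: float = 2.0) -> float:
    """Hypothetical piecewise distance dominance.

    Inside the maximum effective service distance d_max the value decays as
    1 - (d / d_max) ** beta (beta is the preset shape constant); beyond it the value is 0.
    """
    d = math.dist(pos_i, pos_j)  # Euclidean distance between aircraft and objective
    if d > d_max:
        return 0.0
    return 1.0 - (d / d_max) ** beta


print(distance_dominance((0.0, 0.0), (50.0, 0.0), d_max=100.0))   # 0.75
print(distance_dominance((0.0, 0.0), (150.0, 0.0), d_max=100.0))  # 0.0
```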
[0045] Step S1044: Based on the angle dominance function, velocity dominance function, and distance dominance function, construct the overall dominance function of the target aircraft in serving the specified mission objective.
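[0046] Combining the three dominance functions above gives the overall dominance function of aircraft $i$ serving mission objective $j$ as a weighted combination, $S_{ij} = \omega_1 S^{a}_{ij} + \omega_2 S^{v}_{ij} + \omega_3 S^{d}_{ij}$, where $S^{a}_{ij}$, $S^{v}_{ij}$, and $S^{d}_{ij}$ are the angle, speed, and distance dominance values and $\omega_1$, $\omega_2$, and $\omega_3$ are the weighting coefficients.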
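A minimal Python sketch, assuming the three dominance values are combined linearly with weights that sum to 1; the default weights are hypothetical:

```python
from typing import Sequence


def overall_dominance(s_angle: float, s_speed: float, s_dist: float,
                      weights: Sequence[float] = (0.3, 0.3, 0.4)) -> float:
    """Weighted combination of the angle, speed and distance dominance values.

    The default weights are hypothetical; here they are required to sum to 1
    so that the result stays in [0, 1] when each input does.
    """
    w_a, w_v, w_d = weights
    if abs(w_a + w_v + w_d - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return w_a * s_angle + w_v * s_speed + w_d * s_dist


print(overall_dominance(0.5, 0.8, 0.75))  # approximately 0.69
```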
[0047] Step S1045: Based on the overall dominance function and the transport payload of the target aircraft, construct a transport matching probability calculation model for the target aircraft to the specified mission objective.
[0048] Based on the relative motion between the aircraft and the mission objective, and taking the aircraft's transport capacity into account, combining the overall dominance function with the aircraft's transport payload yields the transportation matching probability $p_{ij}$ of aircraft $i$ for mission objective $j$.
[0049] In collaborative operations, several aircraft may be assigned to one mission objective to improve the effectiveness of transportation service. In this embodiment, the joint transportation matching probability of multiple aircraft jointly serving the same mission objective is $P_j = 1 - \prod_{i=1}^{N} (1 - p_{ij})^{x_{ij}}$, where $P_j$ is the joint transportation matching probability of multiple aircraft jointly serving mission objective $j$, $N$ is the total number of aircraft in the multi-domain heterogeneous aircraft swarm, $p_{ij}$ is the transportation matching probability of aircraft $i$ for mission objective $j$, and $x_{ij}$ is the allocation result of aircraft $i$ to mission objective $j$: $x_{ij} = 1$ if aircraft $i$ is assigned to mission objective $j$, and $x_{ij} = 0$ otherwise.
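A minimal Python sketch of the joint probability for one mission objective, assuming the usual product form in which the assigned aircraft's matching events are independent, $P_j = 1 - \prod_i (1 - p_{ij})^{x_{ij}}$; the function name is hypothetical:

```python
from typing import Sequence


def joint_matching_probability(p_col: Sequence[float], x_col: Sequence[int]) -> float:
    """P_j = 1 - prod_i (1 - p_ij) ** x_ij over all N aircraft.

    p_col[i] is p_ij for a fixed objective j and x_col[i] is the binary
    allocation x_ij; unassigned aircraft (x_ij = 0) contribute a factor of 1.
    """
    miss = 1.0  # probability that no assigned aircraft achieves a match
    for p, x in zip(p_col, x_col):
        miss *= (1.0 - p) ** x
    return 1.0 - miss


print(joint_matching_probability([0.6, 0.5, 0.9], [1, 1, 0]))  # approximately 0.8
```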
[0050] In an optional implementation, constructing, in step S104 above, the transportation revenue calculation model for multiple aircraft jointly serving the same mission objective from the attribute information of the aircraft and of the mission objective specifically includes the following steps. Step S104a: construct the joint delivery success probability calculation model for multiple aircraft jointly serving the designated mission objective from the delivery success rate of each aircraft to the designated mission objective, the allocation result of each aircraft to the designated mission objective, and the total number of aircraft in the multi-domain heterogeneous aircraft swarm.
[0051] The perception model designed above assumes a static environment. In actual execution, however, the environment changes dynamically with the behavior of both parties. By analogy with the joint transportation matching probability model, the joint delivery success probability of multiple aircraft serving the same mission objective $j$ is $Q_j = 1 - \prod_{i=1}^{N} (1 - q_{ij})^{x_{ij}}$, where $q_{ij}$ is the delivery success rate of aircraft $i$ to mission objective $j$.
[0052] Step S104b: update the allocation result of each aircraft assigned to the designated mission objective based on a random probability and the joint delivery success probability calculation model of multiple aircraft jointly serving the designated mission objective, obtaining the updated allocation result.
[0053] Once the subset of aircraft assigned to the designated mission objective is known, the joint delivery success probability of that subset serving the objective can be calculated, and the allocation results are then updated by the following rule. For example, suppose aircraft $i$ and $k$ have been assigned to mission objective $j$ (that is, $x_{ij} = x_{kj} = 1$) and the joint delivery success probability for serving mission objective $j$ is $Q_j$. This step updates the allocation result of every aircraft assigned to the designated mission objective. Specifically, to determine the updated value $\tilde{x}_{ij}$, a random probability $\xi \in [0, 1]$ is first generated: if $\xi \le Q_j$, then $\tilde{x}_{ij} = 1$; if $\xi > Q_j$, then $\tilde{x}_{ij} = 0$. The updated value $\tilde{x}_{kj}$ is determined in the same way from a newly generated random probability, and likewise for every other aircraft assigned to mission objective $j$.
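A minimal Python sketch of steps S104a and S104b, assuming the same product form for $Q_j$ as the joint matching model and the rule "keep the assignment when the random draw $\xi \le Q_j$"; the function names are hypothetical:

```python
import random
from typing import List, Sequence


def joint_delivery_success(q_col: Sequence[float], x_col: Sequence[int]) -> float:
    """Q_j = 1 - prod_i (1 - q_ij) ** x_ij, by analogy with the joint matching model."""
    miss = 1.0
    for q, x in zip(q_col, x_col):
        miss *= (1.0 - q) ** x
    return 1.0 - miss


def update_allocation(q_col: Sequence[float], x_col: Sequence[int],
                      rng: random.Random) -> List[int]:
    """Draw a fresh random probability xi for every aircraft assigned to objective j:
    keep it assigned (1) if xi <= Q_j, otherwise set it to 0."""
    q_joint = joint_delivery_success(q_col, x_col)
    updated = []
    for x in x_col:
        if x == 1:
            xi = rng.random()  # uniform in [0, 1)
            updated.append(1 if xi <= q_joint else 0)
        else:
            updated.append(0)  # unassigned aircraft stay unassigned
    return updated


rng = random.Random(0)  # fixed seed so the example is reproducible
print(update_allocation([0.7, 0.4, 0.9], [1, 1, 0], rng))
```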
[0054] Step S104c: Based on the total number of aircraft in the multi-domain heterogeneous aircraft group, the transportation matching probability calculation model of each aircraft to the specified mission target, the updated allocation results, and the target guarantee value of the specified mission target, construct a transportation revenue calculation model for multiple aircraft jointly serving the specified mission target.
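[0055] In this embodiment, the transportation revenue of multiple aircraft jointly serving mission objective $j$ is $R_j = V_j \left[ 1 - \prod_{i=1}^{N} (1 - p_{ij})^{\tilde{x}_{ij}} \right]$, where $V_j$ is the target guarantee value of mission objective $j$ and $\tilde{x}_{ij}$ is the updated allocation result.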
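A minimal Python sketch of the revenue for one mission objective, assuming revenue is the target guarantee value times the joint matching probability evaluated on the updated allocation; the function name is hypothetical:

```python
from typing import Sequence


def transport_revenue(value_j: float, p_col: Sequence[float],
                      x_updated: Sequence[int]) -> float:
    """R_j = V_j * (1 - prod_i (1 - p_ij) ** x~_ij), using the updated allocation x~."""
    miss = 1.0
    for p, x in zip(p_col, x_updated):
        miss *= (1.0 - p) ** x
    return value_j * (1.0 - miss)


# Objective worth 10; aircraft 0 and 1 survive the update, aircraft 2 does not.
print(transport_revenue(10.0, [0.6, 0.5, 0.9], [1, 1, 0]))  # approximately 8.0
```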
[0056] The design block diagram of the target allocation algorithm based on the COMA architecture is shown in Figure 3. Starting from the environmental state (including the capabilities of the multi-domain heterogeneous aircraft, target information, and resource constraints), the dominance values are computed first, and the action probabilities of each agent are output by the policy networks of maritime platform 1 and airspace platform 2. The interaction trajectories are then stored in the experience replay pool. Data sampled from the replay pool pass through the core component, the counterfactual baseline calculation module, which performs credit allocation and updates the policy networks. Finally, the evaluation network of the command center integrates global information to update the policy parameters and maintains a target network for stable training. The whole process adopts a centralized-training, decentralized-execution architecture, using global information during training to achieve accurate credit allocation.
[0057] In an optional implementation, when the target multi-agent network model based on the counterfactual multi-agent policy gradient (COMA) architecture is trained, the reward function is a weighted sum of a local reward and a global reward, with an adjustment coefficient setting the weight of each reward.
[0058] The local reward is the change in the value of the first objective function resulting from a single allocation decision by the agent; the first objective function is a function that aims to maximize transportation revenue and minimize costs without considering the uncertainty of passage.
[0059] The local reward is $F_1(X') - F_1(X)$, where $X$ and $X'$ are the allocation matrices before and after the agent's decision, whose elements $x_{ij}$ are the allocation results of aircraft $i$ to mission objective $j$, and $F_1$ is the first objective function. $F_1$ is built from the total number of mission objectives $M$ in the mission objective set, the total number of aircraft $N$ in the multi-domain heterogeneous aircraft swarm, the transportation matching probability $p_{ij}$ of aircraft $i$ for mission objective $j$, the allocation results $x_{ij}$ ($x_{ij} = 1$ if aircraft $i$ is assigned to mission objective $j$, and 0 otherwise), the target guarantee value $V_j$ of mission objective $j$, a demand coefficient $\lambda$ expressing the need to reduce aircraft cost, and the launch cost $c_i$ of aircraft $i$.
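A minimal Python sketch of the local reward, assuming $F_1(X) = \sum_j V_j [1 - \prod_i (1 - p_{ij})^{x_{ij}}] - \lambda \sum_i c_i \sum_j x_{ij}$. The cost term (one launch cost per assignment) is an assumption, since the first objective function is only described in words above:

```python
from typing import List, Sequence

Matrix = List[List[int]]


def first_objective(X: Matrix, P: List[List[float]], V: Sequence[float],
                    c: Sequence[float], lam: float) -> float:
    """Hypothetical F1: expected guarantee value minus lam times launch cost,
    ignoring passage uncertainty. P[i][j] = p_ij, V[j] = guarantee value, c[i] = launch cost."""
    n, m = len(X), len(V)
    revenue = 0.0
    for j in range(m):
        miss = 1.0
        for i in range(n):
            miss *= (1.0 - P[i][j]) ** X[i][j]
        revenue += V[j] * (1.0 - miss)
    # Assumed cost term: each assignment of aircraft i incurs its launch cost c[i].
    cost = sum(c[i] * sum(X[i]) for i in range(n))
    return revenue - lam * cost


def local_reward(X_before: Matrix, X_after: Matrix, P, V, c, lam) -> float:
    """Change in F1 caused by one allocation decision."""
    return first_objective(X_after, P, V, c, lam) - first_objective(X_before, P, V, c, lam)


P = [[0.6], [0.5]]  # 2 aircraft, 1 objective
V, c, lam = [10.0], [1.0, 2.0], 0.5
print(local_reward([[0], [0]], [[1], [0]], P, V, c, lam))  # approximately 5.5
```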
[0060] The global reward is the value of the second objective function for the final allocation matrix, averaged over the decision steps; the second objective function maximizes transportation revenue and minimizes cost while taking passage uncertainty into account.
[0061] The global reward is computed from the final allocation matrix $X^{*}$, whose element $x^{*}_{ij}$ is the final allocation result of aircraft $i$ to mission objective $j$, together with the joint delivery success probability $Q_j$ of multiple aircraft jointly serving mission objective $j$, a random probability $\xi$, and the second objective function $F_2$.
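A minimal Python sketch of the global reward and of the weighted reward described in paragraph [0057]. It assumes that $F_2$ counts an assigned aircraft only when its random draw satisfies $\xi \le Q_j$, that launch costs are paid regardless of the draw, and that the weighting is `alpha * local + (1 - alpha) * global`; all three are assumptions, as are the function names:

```python
import random
from typing import List, Sequence

Matrix = List[List[int]]


def second_objective(X_final: Matrix, P: List[List[float]], Q: Sequence[float],
                     V: Sequence[float], c: Sequence[float], lam: float,
                     rng: random.Random) -> float:
    """Hypothetical F2: like F1, but an assigned aircraft only contributes if its
    random draw xi satisfies xi <= Q[j] (passage uncertainty)."""
    n, m = len(X_final), len(V)
    revenue = 0.0
    for j in range(m):
        miss = 1.0
        for i in range(n):
            if X_final[i][j] == 1 and rng.random() <= Q[j]:
                miss *= 1.0 - P[i][j]
        revenue += V[j] * (1.0 - miss)
    cost = sum(c[i] * sum(X_final[i]) for i in range(n))  # launch cost is paid regardless
    return revenue - lam * cost


def global_reward(X_final, P, Q, V, c, lam, n_steps: int, rng: random.Random) -> float:
    """F2 of the final allocation matrix, averaged over the n_steps decision steps."""
    return second_objective(X_final, P, Q, V, c, lam, rng) / n_steps


def combined_reward(r_local: float, r_global: float, alpha: float = 0.5) -> float:
    """Hypothetical weighting: alpha * local + (1 - alpha) * global."""
    return alpha * r_local + (1.0 - alpha) * r_global


rng = random.Random(0)
r_g = global_reward([[1], [0]], [[0.6], [0.5]], [1.0], [10.0], [1.0, 2.0], 0.5, 2, rng)
print(combined_reward(5.5, r_g))
```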
[0062] Based on the general model of multi-aircraft mission allocation and the fundamental goal of maximizing transportation revenue, the objective function for multi-aircraft cooperation is the expected sum of the target guarantee values. In practice, however, multi-aircraft mission allocation must balance maximizing transportation revenue against minimizing cost while remaining robust to dynamic environments. This embodiment therefore accounts for passage uncertainty, takes maximizing the expected target guarantee value as its core, and suppresses excessive consumption through cost terms, yielding the second objective function above.
[0063] In COMA, each agent has its own actor network that outputs allocation action probabilities from its local observations, and all agents share a central critic network for evaluation. The counterfactual advantage function addresses the credit allocation problem in multi-agent collaboration by quantifying the independent contribution of a single agent's action to the global reward. During training it uses global information to update each agent's actor network, enabling information exchange among agents. In the task allocation scenario, when an agent chooses to assign an aircraft to a mission objective, the counterfactual advantage function compares the global transportation revenue of this decision with that of the counterfactual alternatives (such as making no assignment), weighted by the agent's own policy, while the other aircraft's assignment decisions are held fixed; this removes their influence and evaluates the value of this aircraft's assignment independently. For agent $u$, the counterfactual advantage function is $A^{u}(s, \mathbf{a}) = Q(s, \mathbf{a}) - \sum_{a'^{u}} \pi^{u}(a'^{u} \mid \tau^{u})\, Q\big(s, (\mathbf{a}^{-u}, a'^{u})\big)$, where $Q(s, \mathbf{a})$ is the value of the joint action $\mathbf{a}$ output by the critic network, reflecting the global transportation revenue under the current policy; $\pi^{u}(a'^{u} \mid \tau^{u})$ is agent $u$'s policy distribution based on its local observation $\tau^{u}$; $\mathbf{a}^{-u}$ is the combination of the other agents' actions; and $(\mathbf{a}^{-u}, a'^{u})$ is the joint action in which only agent $u$'s action is replaced by $a'^{u}$ while the other agents' actions remain unchanged.
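The counterfactual baseline can be computed from the critic's values for each of agent u's alternative actions with the other agents' actions held fixed. A minimal Python sketch, assuming two actions per decision (0 = not allocated, 1 = allocated) and plain lists in place of network outputs; the function name is hypothetical:

```python
from typing import Sequence


def counterfactual_advantage(q_alternatives: Sequence[float],
                             policy: Sequence[float],
                             taken_action: int) -> float:
    """A^u = Q(s, a) - sum_{a'} pi^u(a' | tau^u) * Q(s, (a^{-u}, a')).

    q_alternatives[a']: critic value when only agent u's action is replaced by a',
                        the other agents' actions a^{-u} held fixed.
    policy[a']:         agent u's action probabilities from its local observation.
    """
    if abs(sum(policy) - 1.0) > 1e-9:
        raise ValueError("policy must be a probability distribution")
    baseline = sum(p * q for p, q in zip(policy, q_alternatives))  # counterfactual baseline
    return q_alternatives[taken_action] - baseline


# Critic: global revenue 4.0 if the aircraft is not allocated, 7.0 if allocated.
print(counterfactual_advantage([4.0, 7.0], [0.5, 0.5], taken_action=1))  # 1.5
```

Because the baseline does not depend on the action agent u actually took, subtracting it leaves the expected policy gradient unchanged while removing the contribution of the other agents' decisions.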
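[0064] The actor network updates its parameters by maximizing the expected advantage; its gradient is $\nabla_{\theta^{u}} J = \mathbb{E}\big[\nabla_{\theta^{u}} \log \pi^{u}(a^{u} \mid \tau^{u}; \theta^{u})\, A^{u}(s, \mathbf{a})\big]$, where $\theta^{u}$ are the actor network parameters of agent $u$, the action probability distribution $\pi^{u}$ is obtained with the softmax function, and $A^{u}(s, \mathbf{a})$ is computed from the counterfactual advantage function. During training, this objective is optimized with stochastic gradient methods (gradient ascent on the objective, or equivalently descent on its negative).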
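For a softmax policy, the score function has a closed form. A minimal Python sketch, assuming a tabular policy whose logits are the parameters (a stand-in for the actor network); the function names are hypothetical:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_step(logits: Sequence[float], action: int, advantage: float,
               lr: float = 0.1) -> List[float]:
    """One stochastic gradient-ascent step on advantage * log pi(action).

    Simplification: the logits themselves are the actor parameters, so
    d log pi(action) / d logit_k = 1[k == action] - pi_k.
    """
    pi = softmax(logits)
    grad = [advantage * ((1.0 if k == action else 0.0) - pi[k]) for k in range(len(logits))]
    return [z + lr * g for z, g in zip(logits, grad)]


logits = [0.0, 0.0]  # equal probability for "not allocated" / "allocated"
logits = actor_step(logits, action=1, advantage=1.5)
print(softmax(logits))  # probability of "allocated" rises above 0.5
```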
[0065] The critic network updates its parameters by minimizing the temporal-difference (TD) error, with loss function $L(\phi) = \mathbb{E}\big[(y - Q(s, \mathbf{a}; \phi))^{2}\big]$ and target Q value $y = r + \gamma\, Q(s', \mathbf{a}'; \phi^{-})$, where $\phi$ and $\phi^{-}$ are the parameters of the current critic network and the target critic network, respectively, $r$ is the joint reward, and $\gamma$ is the discount factor.
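A minimal Python sketch of the TD target and the critic loss, assuming scalar critic outputs and a mini-batch drawn from the replay pool; the terminal-step handling and the function names are assumptions:

```python
from typing import Sequence


def td_target(reward: float, next_q_target: float, gamma: float = 0.99,
              done: bool = False) -> float:
    """y = r + gamma * Q(s', a'; phi^-), with no bootstrap at a terminal step (assumption)."""
    return reward if done else reward + gamma * next_q_target


def critic_loss(q_values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared TD error over a sampled mini-batch."""
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


batch_q = [1.0, 3.0]                             # Q(s, a; phi) from the current critic
batch_y = [td_target(1.0, 2.0, gamma=0.5), 3.0]  # targets built with the target critic
print(critic_loss(batch_q, batch_y))             # 0.5
```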
[0066] Compared with existing technologies, the embodiments of this invention achieve significant improvements in four dimensions: allocation efficiency, convergence speed, scalability, and environmental robustness. First, unified modeling and state coding break through the assumption of platform homogeneity in traditional methods, enabling precise coordination of heterogeneous aircraft in multiple domains such as air and sea, and the allocation scheme is closer to real needs. Second, the introduction of the counterfactual credit allocation mechanism effectively eliminates the non-stationarity and credit ambiguity problems in multi-agent learning, making policy gradient updates more accurate and the training process more stable. Third, the dual-channel reward structure realizes an explicit trade-off between transportation benefits, resource consumption, and allocation risks, guiding the policy to automatically approach the Pareto optimal solution. Finally, the highly scalable training strategy supports the smooth expansion of the number of targets and maintains stable convergence under complex scenarios such as target position perturbations and platform performance fluctuations.
[0067] Overall, the embodiments of the present invention demonstrate higher decision quality, faster training convergence, stronger scalability and better anti-disturbance capability compared with traditional allocation, heuristic algorithms and existing intelligent algorithms in static and quasi-static multi-objective allocation tasks. They provide efficient and intelligent solutions for practical subsequent applications such as cross-platform group coordination, UAV formation decision-making and emergency resource scheduling.
[0068] Example 2. This invention further provides a cross-domain intelligent target allocation device for heterogeneous aircraft swarms based on the COMA architecture. The device is mainly used to execute the cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture provided in Embodiment 1 above, and is described in detail below.
[0069] Figure 4 is a functional block diagram of the cross-domain intelligent target allocation device for heterogeneous aircraft swarms based on the COMA architecture provided in an embodiment of the present invention. As shown in Figure 4, the device mainly includes an acquisition module 10, a construction module 20, a traversal and allocation module 30, and a determination module 40, wherein: the acquisition module 10 is used to acquire the attribute information of each aircraft in the multi-domain heterogeneous aircraft swarm and of each mission objective in the mission objective set. The attribute information of an aircraft includes position, flight speed, flight direction angle, launch cost, transport payload, and delivery success rate to each mission objective; the attribute information of a mission objective includes position, movement speed, and target guarantee value.
[0070] The construction module 20 is used to construct, based on the attribute information of the aircraft and of the mission objectives, a transportation matching probability calculation model for each aircraft to each mission objective, a joint transportation matching probability calculation model for multiple aircraft jointly serving the same mission objective, and a transportation revenue calculation model.
[0071] The traversal and allocation module 30 is used to traverse each combination of aircraft and mission objective, take the multi-dimensional feature vector of each combination in turn as the local observation of the agent corresponding to the domain to which the aircraft belongs, and output the agent's action using the target multi-agent network model based on the counterfactual multi-agent policy gradient (COMA) architecture. The multi-dimensional feature vector includes: the launch cost of the aircraft, the target guarantee value of the mission objective, information reflecting the coupling relationship between resource usage and mission progress, the transportation matching probability of the combination, and the joint transportation matching probability and transportation benefit before and after the aircraft is allocated to the mission objective. The agent's action represents the allocation result of the aircraft and mission objective in the combination, which is one of: allocated, not allocated.
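The traversal can be sketched in Python as follows. The feature layout, the two coupling features (share of aircraft in use and share of objectives covered), the transportation benefit taken as $V_j$ times the joint matching probability, and the greedy stand-in policy are all hypothetical placeholders for the trained COMA actors:

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[int]]
Policy = Callable[[List[float]], int]


def joint(p_col: Sequence[float], x_col: Sequence[int]) -> float:
    """1 - prod (1 - p) ** x, the joint matching probability for one objective."""
    miss = 1.0
    for p, x in zip(p_col, x_col):
        miss *= (1.0 - p) ** x
    return 1.0 - miss


def observation(i: int, j: int, X: Matrix, P, V, c) -> List[float]:
    """Hypothetical feature vector for the (aircraft i, objective j) combination."""
    n, m = len(X), len(V)
    p_col = [P[k][j] for k in range(n)]
    col = [X[k][j] for k in range(n)]
    after = col[:]
    after[i] = 1  # as if aircraft i were allocated to objective j
    p_before, p_after = joint(p_col, col), joint(p_col, after)
    used = sum(1 for row in X if any(row)) / n  # share of aircraft already in use
    covered = sum(1 for jj in range(m) if any(X[k][jj] for k in range(n))) / m  # share of objectives covered
    return [c[i], V[j], used, covered, P[i][j],
            p_before, p_after, V[j] * p_before, V[j] * p_after]


def traverse(X: Matrix, P, V, c, domain_of: Sequence[str],
             policies: Dict[str, Policy]) -> Matrix:
    """Visit every combination in turn; the agent for aircraft i's domain writes X[i][j]."""
    for i in range(len(X)):
        for j in range(len(V)):
            X[i][j] = policies[domain_of[i]](observation(i, j, X, P, V, c))
    return X


def greedy(obs: List[float]) -> int:
    """Stand-in policy: allocate if the revenue gain exceeds the launch cost."""
    return 1 if obs[8] - obs[7] > obs[0] else 0


X = traverse([[0, 0], [0, 0]], [[0.9, 0.1], [0.2, 0.8]], [10.0, 5.0], [1.0, 1.0],
             ["sea", "air"], {"sea": greedy, "air": greedy})
print(X)  # [[1, 0], [0, 1]]
```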
[0072] The determination module 40 is used to determine the cross-domain intelligent target allocation result of the multi-domain heterogeneous aircraft swarm based on all actions of all agents.
[0073] This invention uses a multi-dimensional feature vector as the local observation of each agent. The vector comprises the aircraft's launch cost, the target guarantee value of the mission objective, information reflecting the coupling between resource usage and mission progress, the transportation matching probability of the aircraft-mission objective combination, and the joint transportation matching probability and transportation benefit before and after the aircraft is allocated to the mission objective. Combined with a target multi-agent network model based on the COMA architecture, this achieves an efficient solution to the cross-domain target allocation problem for multi-domain heterogeneous aircraft swarms. Because a multi-dimensional feature vector is constructed for every aircraft-mission objective combination, the device removes the platform-homogeneity assumption of traditional methods, achieving precise coordination among multi-domain heterogeneous aircraft and making the allocation scheme closer to real-world needs. Furthermore, this invention applies the counterfactual multi-agent policy gradient framework to the multi-domain heterogeneous aircraft target allocation algorithm. In collaborative transportation by multi-domain heterogeneous aircraft, the counterfactual learning mechanism resolves the credit allocation ambiguity caused by capability differences between heterogeneous platforms, enabling efficient cross-domain collaborative decision-making.
[0074] Optionally, the construction module 20 includes a first construction unit, used to construct the angle dominance function of the target aircraft serving the specified mission objective based on the position and flight direction angle of the target aircraft and the position of the specified mission objective, wherein the target aircraft is any aircraft in the multi-domain heterogeneous aircraft swarm and the specified mission objective is any mission objective in the mission objective set.
[0075] The second construction unit is used to construct a speed advantage function of the target aircraft serving the specified mission objective based on the flight speed of the target aircraft and the movement speed of the specified mission objective.
[0076] The third construction unit is used to construct the distance advantage function of the target aircraft serving the specified mission objective based on the position of the target aircraft, the position of the specified mission objective, and the maximum effective service distance of the aircraft.
[0077] The fourth construction unit is used to construct the overall dominance function of the target aircraft serving the specified mission objective based on the angle dominance function, velocity dominance function, and distance dominance function.
[0078] The fifth construction unit is used to construct a transportation matching probability calculation model for the target aircraft to the specified mission objective based on the overall dominance function and the transport payload of the target aircraft.
[0079] Optionally, the joint transportation matching probability of multiple aircraft jointly serving the same mission objective is $P_j = 1 - \prod_{i=1}^{N} (1 - p_{ij})^{x_{ij}}$, where $P_j$ is the joint transportation matching probability of multiple aircraft jointly serving mission objective $j$, $N$ is the total number of aircraft in the multi-domain heterogeneous aircraft swarm, $p_{ij}$ is the transportation matching probability of aircraft $i$ for mission objective $j$, and $x_{ij}$ is the allocation result of aircraft $i$ to mission objective $j$: $x_{ij} = 1$ if aircraft $i$ is assigned to mission objective $j$, and $x_{ij} = 0$ otherwise.
[0080] Optionally, the construction module 20 further includes a sixth construction unit, used to construct a joint delivery success probability calculation model for multiple aircraft jointly serving a specified mission objective, based on the delivery success rate of each aircraft to the specified mission objective, the allocation result of each aircraft to the specified mission objective, and the total number of aircraft in the multi-domain heterogeneous aircraft swarm.
[0081] The update unit is used to update the allocation result of each aircraft assigned to the specified mission objective based on a random probability and the joint delivery success probability calculation model of multiple aircraft jointly serving the specified mission objective, obtaining the updated allocation result.
[0082] The seventh construction unit is used to construct a transportation revenue calculation model for multiple aircraft jointly serving a specified mission objective, based on the total number of aircraft in the multi-domain heterogeneous aircraft swarm, the transportation matching probability calculation model of each aircraft to the specified mission objective, the updated allocation results, and the target guarantee value of the specified mission objective.
[0083] Optionally, when the target multi-agent network model based on the counterfactual multi-agent policy gradient (COMA) architecture is trained, the reward function is a weighted sum of a local reward and a global reward.
[0084] Local reward is the change in the value of the first objective function resulting from a single allocation decision by the agent; the first objective function is a function that aims to maximize transportation revenue and minimize costs without considering the uncertainty of passage.
[0085] The global reward is the value of the second objective function for the final allocation matrix, averaged over the decision steps; the second objective function maximizes transportation revenue and minimizes cost while taking passage uncertainty into account.
[0086] Optionally, the local reward is $F_1(X') - F_1(X)$, where $X$ and $X'$ are the allocation matrices before and after the agent's decision, whose elements $x_{ij}$ are the allocation results of aircraft $i$ to mission objective $j$, and $F_1$ is the first objective function. $F_1$ is built from the total number of mission objectives $M$ in the mission objective set, the total number of aircraft $N$ in the multi-domain heterogeneous aircraft swarm, the transportation matching probability $p_{ij}$ of aircraft $i$ for mission objective $j$, the allocation results $x_{ij}$ ($x_{ij} = 1$ if aircraft $i$ is assigned to mission objective $j$, and 0 otherwise), the target guarantee value $V_j$ of mission objective $j$, a demand coefficient $\lambda$ expressing the need to reduce aircraft cost, and the launch cost $c_i$ of aircraft $i$.
[0087] Optionally, the global reward is computed from the final allocation matrix $X^{*}$, whose element $x^{*}_{ij}$ is the final allocation result of aircraft $i$ to mission objective $j$, together with the joint delivery success probability $Q_j$ of multiple aircraft jointly serving mission objective $j$, a random probability $\xi$, and the second objective function $F_2$.
[0088] Example 3. Referring to Figure 5, this invention provides an electronic device comprising a processor 60, a memory 61, a bus 62, and a communication interface 63; the processor 60, the communication interface 63, and the memory 61 are connected via the bus 62. The processor 60 is used to execute executable modules, such as computer programs, stored in the memory 61.
[0089] The memory 61 may include high-speed random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Communication between this system network element and at least one other network element is achieved through at least one communication interface 63 (which can be wired or wireless), such as the Internet, wide area network, local area network, metropolitan area network, etc.
[0090] Bus 62 can be an ISA bus, a PCI bus, an EISA bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is shown in Figure 5, but this does not mean that there is only one bus or one type of bus.
[0091] The memory 61 is used to store programs. After receiving an execution instruction, the processor 60 executes the program. The method executed by the apparatus defined by the process disclosed in any of the foregoing embodiments of the present invention can be applied to the processor 60 or implemented by the processor 60.
[0092] Processor 60 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of processor 60 or by instructions in software form. Processor 60 can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this invention. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this invention can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The storage medium is located in memory 61. Processor 60 reads the information in memory 61 and, in conjunction with its hardware, completes the steps of the above method.
[0093] The computer program product of the cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on COMA architecture provided in this embodiment of the invention includes a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code can be used to execute the methods described in the preceding method embodiments. For specific implementation, please refer to the method embodiments, which will not be repeated here.
[0094] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0095] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a processor-executable, non-volatile, computer-readable storage medium. Based on this understanding, the technical solution of this invention, essentially, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0096] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.
[0097] In the description of this invention, it should be noted that terms such as "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," and "outer" indicate orientations or positional relationships based on those shown in the accompanying drawings, or on those customarily adopted when the product of this invention is in use. They are used only for convenience in describing this invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore should not be construed as limiting this invention. In addition, the terms "first," "second," "third," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0098] Furthermore, terms such as "horizontal," "vertical," and "overhanging" do not require components to be absolutely horizontal, vertical, or overhanging; the components may be slightly tilted. For example, "horizontal" merely means that a direction is closer to horizontal than to vertical, not that the structure must be perfectly horizontal.
[0099] In the description of this invention, it should also be noted that, unless otherwise explicitly specified and limited, the terms "set," "install," "connect," and "link" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.
[0100] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.
Claims
1. A cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture, characterized by comprising: obtaining the attribute information of each aircraft in a multi-domain heterogeneous aircraft swarm and the attribute information of each mission target in a mission target set, wherein the attribute information of an aircraft includes position, flight speed, flight heading angle, launch cost, transport payload, and delivery success rate to each mission target, and the attribute information of a mission target includes position, movement speed, and target assurance value; constructing, based on the attribute information of the aircraft and of the mission targets, a transport matching probability calculation model for each aircraft with respect to each mission target, a joint transport matching probability calculation model for multiple aircraft jointly serving the same mission target, and a transport revenue calculation model; traversing each combination of an aircraft and a mission target, taking the multi-dimensional feature vector of each combination in turn as the local observation of the agent corresponding to the domain to which the aircraft belongs, and outputting the agent's action using a target multi-agent network model based on the counterfactual multi-agent policy gradient (COMA) architecture, wherein the multi-dimensional feature vector includes the launch cost of the aircraft, the target assurance value of the mission target, information reflecting the coupling between resource usage and mission progress, the transport matching probability of the combination, the joint transport matching probability before and after the aircraft is allocated to the mission target, and the transport revenue;
wherein the agent's action represents the allocation result between the aircraft and the mission target in the combination, the allocation result being either allocated or not allocated; and determining the cross-domain intelligent target allocation result of the multi-domain heterogeneous aircraft swarm based on all actions of all agents.
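The traversal in claim 1 can be sketched as a loop over aircraft-target pairs in which each pair's feature vector is passed to the policy of the agent that owns the aircraft's domain. This is a minimal sketch under stated assumptions, not the patented implementation: the two "coupling" features (share of aircraft already committed, share of targets already covered), the marginal-revenue feature, and the threshold rule `greedy_policy`, which only stands in for a trained COMA actor, are all hypothetical.

```python
import numpy as np


def joint_prob(p_col, x_col):
    # Assumed joint matching form for one target: 1 - prod_i (1 - p_ij)^{x_ij}.
    return 1.0 - np.prod((1.0 - p_col) ** x_col)


def build_feature(i, j, X, p, cost, value):
    """Hypothetical 8-D local observation for the pair (aircraft i, target j)."""
    used = X.any(axis=1).mean()      # assumed coupling feature: share of aircraft committed
    covered = X.any(axis=0).mean()   # assumed coupling feature: share of targets covered
    before = joint_prob(p[:, j], X[:, j])
    X_try = X.copy()
    X_try[i, j] = 1
    after = joint_prob(p[:, j], X_try[:, j])
    revenue = value[j] * (after - before)  # assumed: marginal revenue of this assignment
    return np.array([cost[i], value[j], used, covered, p[i, j], before, after, revenue])


def allocate(p, cost, value, domain, policies):
    """Visit every (aircraft, target) pair; the agent of the aircraft's domain decides."""
    n_air, n_tgt = p.shape
    X = np.zeros((n_air, n_tgt), dtype=int)
    for i in range(n_air):
        for j in range(n_tgt):
            obs = build_feature(i, j, X, p, cost, value)
            X[i, j] = policies[domain[i]](obs)  # 1 = allocated, 0 = not allocated
    return X


def greedy_policy(obs, lam=1.0):
    # Placeholder for a trained actor: allocate if marginal revenue beats weighted launch cost.
    return int(obs[7] > lam * obs[0])


p = np.array([[0.6, 0.2], [0.5, 0.7]])
X = allocate(p, cost=[0.1, 0.3], value=[1.0, 1.0],
             domain=["air", "sea"], policies={"air": greedy_policy, "sea": greedy_policy})
print(X.tolist())  # [[1, 1], [0, 1]]
```

Because each observation depends on the allocations already made, the visiting order affects the result; the claim fixes only that the combinations are visited one after another.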
2. The cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to claim 1, characterized in that constructing, based on the attribute information of the aircraft and of the mission targets, the transport matching probability calculation model for each aircraft with respect to each mission target comprises: constructing an angle advantage function for a target aircraft serving a designated mission target based on the position and flight heading angle of the target aircraft and the position of the designated mission target, wherein the target aircraft is any aircraft in the multi-domain heterogeneous aircraft swarm and the designated mission target is any mission target in the mission target set; constructing a speed advantage function for the target aircraft serving the designated mission target based on the flight speed of the target aircraft and the movement speed of the designated mission target; constructing a distance advantage function for the target aircraft serving the designated mission target based on the position of the target aircraft, the position of the designated mission target, and the maximum effective service distance of the aircraft; constructing an overall advantage function for the target aircraft serving the designated mission target based on the angle advantage function, the speed advantage function, and the distance advantage function; and constructing the transport matching probability calculation model for the target aircraft with respect to the designated mission target based on the overall advantage function and the transport payload of the target aircraft.
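Claim 2 names the inputs of each advantage function but not their functional forms, so the sketch below picks simple, bounded forms purely for illustration: angular offset between heading and line of sight, speed ratio, linear decay to the maximum effective service distance, a product as the overall advantage, and a payload factor normalized by a hypothetical reference payload `payload_ref`. None of these forms is taken from the source.

```python
import math


def angle_advantage(pos_a, heading, pos_t):
    """Assumed form: 1 when the heading points straight at the target, 0 when directly away."""
    los = math.atan2(pos_t[1] - pos_a[1], pos_t[0] - pos_a[0])      # line-of-sight angle
    off = abs((los - heading + math.pi) % (2 * math.pi) - math.pi)  # offset wrapped to [0, pi]
    return 1.0 - off / math.pi


def speed_advantage(v_a, v_t):
    """Assumed form: aircraft speed relative to the sum of aircraft and target speeds."""
    return v_a / (v_a + v_t) if v_a + v_t > 0 else 0.0


def distance_advantage(pos_a, pos_t, d_max):
    """Assumed form: decays linearly to 0 at the maximum effective service distance."""
    return max(0.0, 1.0 - math.dist(pos_a, pos_t) / d_max)


def matching_probability(pos_a, heading, v_a, payload, pos_t, v_t, d_max, payload_ref):
    """Overall advantage (assumed: product of the three) scaled by a payload factor (assumed)."""
    overall = (angle_advantage(pos_a, heading, pos_t)
               * speed_advantage(v_a, v_t)
               * distance_advantage(pos_a, pos_t, d_max))
    return overall * min(1.0, payload / payload_ref)


print(matching_probability((0, 0), 0.0, 30.0, 100.0, (5, 0), 10.0, 10.0, 100.0))  # 0.375
```

A product makes the overall advantage vanish whenever any single factor does, for example a target beyond the maximum service distance; a weighted sum would be an equally plausible choice.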
3. The cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to claim 2, characterized in that the joint transport matching probability calculation model for multiple aircraft jointly serving the same mission target is expressed as $P_j = 1 - \prod_{i=1}^{N} (1 - p_{ij})^{x_{ij}}$, where $P_j$ is the joint transport matching probability of multiple aircraft jointly serving mission target $j$; $N$ is the total number of aircraft in the multi-domain heterogeneous aircraft swarm; $p_{ij}$ is the transport matching probability of aircraft $i$ for mission target $j$; and $x_{ij}$ is the allocation result of aircraft $i$ and mission target $j$, with $x_{ij} = 1$ indicating that aircraft $i$ is assigned to mission target $j$ and $x_{ij} = 0$ indicating that it is not.
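Assuming the joint matching probability of claim 3 takes the standard independent-success form $P_j = 1 - \prod_{i=1}^{N}(1 - p_{ij})^{x_{ij}}$ (target $j$ is missed only if every assigned aircraft fails independently), it can be evaluated for all targets at once with a column-wise product. For 0/1 allocations, writing the factor as $(1 - p_{ij} x_{ij})$ gives the same values.

```python
import numpy as np


def joint_matching_probability(p, x):
    """Joint transport matching probability of every target.

    p[i, j]: matching probability of aircraft i for target j.
    x[i, j]: 1 if aircraft i is assigned to target j, else 0.
    Returns P[j] = 1 - prod_i (1 - p[i, j]) ** x[i, j].
    """
    p = np.asarray(p, dtype=float)
    x = np.asarray(x, dtype=int)
    # Unassigned aircraft contribute a factor (1 - p)^0 = 1, i.e. they are ignored.
    return 1.0 - np.prod((1.0 - p) ** x, axis=0)


P = joint_matching_probability([[0.5, 0.2], [0.5, 0.9]], [[1, 0], [1, 1]])
# target 0: 1 - 0.5 * 0.5 = 0.75; target 1: only aircraft 1 counts, 1 - 0.1 = 0.9
```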
4. The cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to claim 2, characterized in that constructing, based on the attribute information of the aircraft and of the mission targets, the transport revenue calculation model for multiple aircraft jointly serving the same mission target comprises: constructing a joint delivery success probability calculation model for multiple aircraft jointly serving a designated mission target based on the delivery success rate of each aircraft to the designated mission target, the allocation result of each aircraft to the designated mission target, and the total number of aircraft in the multi-domain heterogeneous aircraft swarm; updating, based on a random probability and the joint delivery success probability calculation model, the allocation result of each aircraft assigned to the designated mission target to obtain an updated allocation result; and constructing the transport revenue calculation model for multiple aircraft jointly serving the designated mission target based on the total number of aircraft in the multi-domain heterogeneous aircraft swarm, the transport matching probability calculation model of each aircraft for the designated mission target, the updated allocation result, and the target assurance value of the designated mission target.
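Claim 4 lists the inputs of each step but not the formulas, so the following is only a sketch under explicit assumptions: the joint delivery success probability reuses the independent-success form over delivery rates $s_{ij}$; one uniform random probability is drawn per target; if the draw exceeds that target's joint delivery success probability, every allocation to it is cleared; and revenue is the target assurance value times the joint matching probability computed on the updated allocations.

```python
import numpy as np


def joint(prob, x):
    # Assumed joint form shared by matching and delivery: 1 - prod_i (1 - prob_ij)^{x_ij}.
    return 1.0 - np.prod((1.0 - prob) ** x, axis=0)


def transport_revenue(p, s, x, value, rng):
    """Hypothetical revenue under delivery uncertainty.

    p: matching probabilities, s: delivery success rates (both N x M),
    x: 0/1 allocation matrix, value: target assurance values (length M).
    """
    p, s = np.asarray(p, dtype=float), np.asarray(s, dtype=float)
    x = np.asarray(x, dtype=int)
    ps = joint(s, x)                  # joint delivery success probability per target
    xi = rng.random(x.shape[1])       # one random probability per target (assumed)
    x_upd = x * (xi <= ps)            # failed delivery clears every assignment to that target
    revenue = float(np.sum(np.asarray(value, dtype=float) * joint(p, x_upd)))
    return revenue, x_upd


p = [[0.5, 0.2], [0.5, 0.9]]
x = [[1, 0], [1, 1]]
rev, x_upd = transport_revenue(p, np.ones((2, 2)), x, [2.0, 1.0], np.random.default_rng(0))
# With certain delivery (s = 1) nothing is cleared: 2 * 0.75 + 1 * 0.9 = 2.4
```

Because the update is stochastic, a training loop would typically evaluate this revenue many times per allocation; seeding the generator keeps individual runs reproducible.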
5. The cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to claim 1, characterized in that, when the target multi-agent network model based on the COMA architecture is trained, the reward function is a weighted sum of a local reward and a global reward; the local reward is the change in the value of a first objective function produced by a single allocation decision of an agent, the first objective function aiming to maximize transport revenue and minimize cost without considering passage uncertainty; and the global reward is the value of a second objective function for the final allocation matrix, averaged over the decision steps, the second objective function aiming to maximize transport revenue and minimize cost while considering passage uncertainty.
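Claim 5 fixes only that the training reward is a weighted sum; the weights below are hypothetical hyperparameters. The sketch also shows the counterfactual baseline that characterizes COMA in general: the centralized critic's value for the action an agent took, minus the policy-weighted value of that agent's alternative actions with all other agents' actions held fixed. With only two actions here (not allocated, allocated), the baseline needs two critic evaluations per agent; the critic values in the example are made-up numbers.

```python
import numpy as np


def mixed_reward(r_local, r_global, w_local=0.5, w_global=0.5):
    """Weighted sum of local and global rewards; the weights are hypothetical."""
    return w_local * r_local + w_global * r_global


def coma_advantage(q_alt, pi, action):
    """Counterfactual advantage of one agent, as in COMA.

    q_alt[k]: centralized critic value Q(s, (u_-a, k)) when this agent takes action k
              and every other agent keeps its chosen action.
    pi[k]:    this agent's policy probability of action k (0 = not allocate, 1 = allocate).
    """
    q_alt = np.asarray(q_alt, dtype=float)
    pi = np.asarray(pi, dtype=float)
    baseline = float(pi @ q_alt)      # expected value if only this agent re-sampled its action
    return float(q_alt[action]) - baseline


def actor_loss(pi, action, advantage):
    # Policy-gradient surrogate: minimizing it raises log pi(action) when advantage > 0.
    return -np.log(pi[action]) * advantage


r = mixed_reward(r_local=1.0, r_global=3.0)                      # 2.0
adv = coma_advantage(q_alt=[1.0, 3.0], pi=[0.5, 0.5], action=1)  # 3.0 - 2.0 = 1.0
```

The baseline depends only on the agent's own action distribution, so subtracting it isolates that agent's contribution to the shared reward, which is how COMA addresses the credit allocation ambiguity described in the summary.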
6. The cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to claim 5, characterized in that the local reward is expressed as $r^{\mathrm{local}} = f_1(X') - f_1(X)$, where $X$ and $X'$ are the allocation matrices before and after the agent makes its decision, respectively, whose elements $x_{ij}$ are the allocation results of aircraft $i$ and mission target $j$, with $x_{ij} = 1$ indicating that aircraft $i$ is assigned to mission target $j$ and $x_{ij} = 0$ indicating that it is not; and $f_1(\cdot)$ is the first objective function, defined in terms of $M$, the total number of mission targets in the mission target set; $N$, the total number of aircraft in the multi-domain heterogeneous aircraft swarm; $p_{ij}$, the transport matching probability of aircraft $i$ for mission target $j$; the allocation results $x_{ij}$; $V_j$, the target assurance value of mission target $j$; $\lambda$, the demand coefficient for reducing aircraft cost; and $c_i$, the launch cost of aircraft $i$.
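Claim 6 lists the quantities that enter the first objective function but its exact expression is not given here, so the sketch assumes a common form: expected revenue $\sum_j V_j \bigl(1 - \prod_i (1 - p_{ij})^{x_{ij}}\bigr)$ minus $\lambda \sum_{i,j} c_i x_{ij}$. The local reward is then the difference of this function across one decision, as claim 5 states.

```python
import numpy as np


def f1(X, p, value, cost, lam):
    """Assumed first objective: expected revenue minus lambda-weighted launch cost.

    Revenue: sum_j V_j * (1 - prod_i (1 - p_ij)^{x_ij}); cost: lam * sum_ij c_i x_ij.
    """
    X = np.asarray(X, dtype=int)
    p = np.asarray(p, dtype=float)
    revenue = np.sum(np.asarray(value, dtype=float) * (1.0 - np.prod((1.0 - p) ** X, axis=0)))
    launch = lam * np.sum(np.asarray(cost, dtype=float)[:, None] * X)
    return float(revenue - launch)


def local_reward(X_before, X_after, p, value, cost, lam=1.0):
    # Change in f1 caused by a single allocation decision (claim 5).
    return f1(X_after, p, value, cost, lam) - f1(X_before, p, value, cost, lam)


p = [[0.5], [0.5]]
r = local_reward([[1], [0]], [[1], [1]], p, value=[2.0], cost=[0.1, 0.2])
# f1 rises from 2 * 0.5 - 0.1 = 0.9 to 2 * 0.75 - 0.3 = 1.2, so r is about 0.3
```

Since passage uncertainty is ignored, this reward is deterministic and gives each agent an immediate signal for its own decision.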
7. The cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to claim 6, characterized in that the global reward is expressed in terms of the final allocation matrix $X^{*}$, whose element $x^{*}_{ij}$ is the final allocation result of aircraft $i$ and mission target $j$; the joint delivery success probability $P^{s}_{j}$ of multiple aircraft jointly serving mission target $j$; a random probability $\xi$; and the second objective function $f_2(\cdot)$.
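Claim 5 describes the global reward as the second objective value of the final allocation matrix "averaged over the decision steps"; one reading, used below, spreads that episode-level value evenly over the number of steps. The second objective is assumed to apply the random delivery update to the final allocation before computing revenue, while still charging launch cost for every allocated aircraft. Both the reading and the functional forms are assumptions.

```python
import numpy as np


def joint(prob, x):
    # Assumed joint form: 1 - prod_i (1 - prob_ij)^{x_ij}, per target (column).
    return 1.0 - np.prod((1.0 - prob) ** x, axis=0)


def f2(X, p, s, value, cost, lam, rng):
    """Assumed second objective: revenue after a random delivery draw, minus launch cost."""
    X = np.asarray(X, dtype=int)
    p, s = np.asarray(p, dtype=float), np.asarray(s, dtype=float)
    xi = rng.random(X.shape[1])              # random probability per target (assumed)
    X_upd = X * (xi <= joint(s, X))          # drop targets whose joint delivery failed
    revenue = np.sum(np.asarray(value, dtype=float) * joint(p, X_upd))
    launch = lam * np.sum(np.asarray(cost, dtype=float)[:, None] * X)  # launched aircraft still cost
    return float(revenue - launch)


def global_reward(X_final, n_steps, p, s, value, cost, lam=1.0, rng=None):
    # One reading of claim 5: the episode's f2 value spread evenly over its decision steps.
    rng = rng if rng is not None else np.random.default_rng()
    return f2(X_final, p, s, value, cost, lam, rng) / n_steps


g = global_reward([[1], [1]], 2, [[0.5], [0.5]], [[1.0], [1.0]], [2.0], [0.1, 0.2],
                  rng=np.random.default_rng(0))
# With certain delivery: (2 * 0.75 - 0.3) / 2 = 0.6
```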
8. A cross-domain intelligent target allocation device for heterogeneous aircraft swarms based on the COMA architecture, characterized by comprising: an acquisition module configured to acquire the attribute information of each aircraft in a multi-domain heterogeneous aircraft swarm and the attribute information of each mission target in a mission target set, wherein the attribute information of an aircraft includes position, flight speed, flight heading angle, launch cost, transport payload, and delivery success rate to each mission target, and the attribute information of a mission target includes position, movement speed, and target assurance value; a construction module configured to construct, based on the attribute information of the aircraft and of the mission targets, a transport matching probability calculation model for each aircraft with respect to each mission target, a joint transport matching probability calculation model for multiple aircraft jointly serving the same mission target, and a transport revenue calculation model; a traversal and allocation module configured to traverse each combination of an aircraft and a mission target, take the multi-dimensional feature vector of each combination in turn as the local observation of the agent corresponding to the domain to which the aircraft belongs, and output the agent's action using a target multi-agent network model based on the counterfactual multi-agent policy gradient (COMA) architecture, wherein the multi-dimensional feature vector includes the launch cost of the aircraft, the target assurance value of the mission target, information reflecting the coupling between resource usage and mission progress, the transport matching probability of the combination, the joint transport matching probability before and after the aircraft is allocated to the mission target, and the transport revenue;
wherein the agent's action represents the allocation result of the aircraft and the mission target in the combination, the allocation result being either allocated or not allocated; and a determination module configured to determine the cross-domain intelligent target allocation result of the multi-domain heterogeneous aircraft swarm based on all actions of all agents.
9. An electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that, when the processor executes the computer program, it implements the cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the cross-domain intelligent target allocation method for heterogeneous aircraft swarms based on the COMA architecture according to any one of claims 1 to 7.