Resource-aware task allocation method, device, equipment and medium for multi-target encirclement and capture by unmanned swarms

By modeling the agent-target relationship with the artificial potential field method and combining heuristic algorithms with reinforcement learning, the method dynamically optimizes resource allocation for unmanned swarm encirclement. This addresses the resource waste and insufficient dynamic adaptability of traditional methods and improves task efficiency and success rate.

CN121979618B · Active · Publication Date: 2026-06-19 · NAT UNIV OF DEFENSE TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NAT UNIV OF DEFENSE TECH
Filing Date
2026-04-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing unmanned swarm capture technologies suffer from waste and inefficiency in resource allocation, especially lacking dynamic environmental adaptability in multi-target scenarios, making it difficult to optimize global task performance.

Method used

The artificial potential field method is used to model the relationship between unmanned swarms and targets. Combined with heuristic algorithms and reinforcement learning, resources are initialized and dynamically adjusted. Distributed cooperative encirclement behavior logic is designed. Through target tracking, cooperative encirclement and state switching, dynamic optimization allocation of resources is achieved.

Benefits of technology

It improves the resource utilization efficiency and mission success rate of unmanned swarm hunting, solves the problems of blind resource allocation and dynamic adaptability in traditional methods, and realizes the scalability of the system and resources.

✦ Generated by Eureka AI based on patent content.

Figure CN121979618B_ABST

Abstract

This application relates to the field of unmanned swarm cooperative strategy technology, and discloses a resource-aware task allocation method, device, equipment, and medium for multi-target encirclement in unmanned swarms. The method includes: modeling the relationship between agents and targets in an unmanned swarm encirclement scenario; summarizing a resource experience base and completing the initial resource allocation in the encirclement initiation phase through a heuristic algorithm-based agent-target resource allocation initialization method; and capturing scene information and dynamically adjusting and optimizing resources through reward and punishment learning using a reinforcement learning-based dynamic encirclement resource adaptive allocation optimization method. This application solves key problems in encirclement systems such as poor scalability, blind resource allocation, and insufficient dynamic adaptability.

Description

Technical Field

[0001] This application relates to the field of unmanned swarm collaborative strategy technology, specifically a method, apparatus, equipment, and medium for resource-aware task allocation in multi-target hunting of unmanned swarms.

Background Technology

[0002] Current research on unmanned swarm hunting scenarios has the following problems:

[0003] Existing research on encirclement and capture missions often prioritizes mission completion over addressing the resource allocation between the unmanned swarm and the captured target. This typically leads to significant resource waste or inefficiency.

[0004] Existing resource allocation methods are significantly inadequate in adapting to dynamic environments. Especially in multi-target pursuit scenarios, they lack the ability to adjust resource allocation in real time according to changes in the dynamic environment, making it difficult to dynamically optimize the overall effectiveness of the capture mission.

[0005] In summary, there is an urgent need for a new technical solution for resource-aware task allocation in multi-target hunting of unmanned swarms.

Summary of the Invention

[0006] The purpose of this application is to provide a resource-aware task allocation method, device, equipment and medium for multi-target hunting in unmanned swarms, so as to solve the problem of resource optimization allocation in multi-target hunting under complex environments in the prior art.

[0007] To achieve the above objectives, this application provides a resource-aware task allocation method for multi-target encirclement in unmanned swarms, the method comprising:

[0008] S10: Modeling the relationship between intelligent agents and targets in unmanned swarm hunting scenarios. This modeling uses the artificial potential field method to model the unmanned swarm and targets according to their functions and dependencies, forming the task execution process;

[0009] S20: By using a heuristic algorithm-based method for initializing the resource allocation between the agent and the target, the resource experience base is summarized and the initial resource allocation for the encirclement and capture initiation phase is completed.

[0010] S30: By leveraging a reinforcement learning-based dynamic encirclement resource adaptive allocation optimization method, scene information is captured and resources are dynamically adjusted and optimized through reward and punishment learning.

[0011] Preferably, the task execution process includes: designing the encirclement behavior logic of the unmanned swarm using a distributed collaborative method, and carrying out dynamic collaborative encirclement of the unmanned swarm through target tracking, collaborative encirclement and state switching.

[0012] As a preferred option, S20 specifically involves: summarizing resource allocation based on historical experience gained from training, according to task requirements and environmental information; this training includes: analysis of encirclement task requirements and construction of constraint system, design of resource consumption cost of encirclement task and heuristic algorithm initialization of resource allocation process.

[0013] As a preferred option, S30 specifically addresses the dynamic nature and non-global information constraints of the encirclement scenario by constructing a reinforcement learning-driven online resource allocation optimization framework, with real-time adaptation, dynamic optimization, and convergence assurance as its core. Through the interaction and iteration between the agent and the environment, the resource allocation scheme is continuously optimized.

[0014] Preferably, the target tracking specifically involves: when the UAV obtains the estimated position of the target through its own detection or cluster communication sharing, it generates a tracking velocity vector pointing towards the target, and the tracking velocity direction always points towards the target;

[0015] The coordinated encirclement specifically involves: designing a short-range separation repulsion force to avoid collisions or excessive aggregation of drones within the swarm; the final control speed of the drone is the weighted sum of the tracking vector and the separation repulsion force, achieving a dynamic balance before and after the encirclement is completed;

[0016] The state switching is specifically as follows: the behavior mode is dynamically adjusted based on the number of pursuers of the same target in the communication topology. If the current drone ranking is greater than the number of pursuers, the current target is abandoned and the search continues.

[0017] As a preferred embodiment, the encirclement task requirement analysis and constraint system construction combines geometric distribution and dynamic characteristics to give a quantitative standard for judging the success of the encirclement, and gives two types of constraints based on the actual feasibility of the task, including drone allocation constraints and success rate constraints.

[0018] The resource consumption cost design for the encirclement mission aims to minimize the overall resource consumption of the mission, while avoiding the irrational approach of infinitely increasing the number of drones in pursuit of time efficiency, thus achieving efficient use of resources.

[0019] The heuristic algorithm initializes the resource allocation process, outputting an initial resource allocation scheme that satisfies constraints and has the optimal overall cost through quantitative iterative search and multi-dimensional evaluation.

[0020] As a preferred embodiment, S30 includes:

[0021] Construct a reinforcement learning environment model that fits the dynamic characteristics of the capture, clarify the quantitative definition of state and action space, and provide a foundation for online interaction;

[0022] With the optimization orientation of capture efficiency, resource economy and dynamic adaptability, a weighted collaborative reward function is designed to balance immediate benefits and long-term goals.

[0023] A dual-network structure consisting of a policy network and a value network is constructed. The policy network is a fully connected network that outputs the probability distribution of resource allocation adjustments after inputting the encirclement state. The value network is also a fully connected network that outputs the state value after inputting the encirclement state, which is used to evaluate the long-term benefits of the current situation.
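The dual-network structure can be illustrated with a minimal forward-pass sketch. It assumes a discrete set of allocation adjustments, one hidden layer per network, and random untrained weights; the class name `ActorCritic`, the layer sizes, and the helper functions are hypothetical and are not taken from the application. No training procedure is shown.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def dense(x, layer):
    """Affine map y = W x + b."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)                      # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

class ActorCritic:
    """Two independent fully connected networks sharing the same input state."""

    def __init__(self, state_dim, n_actions, hidden=16, seed=0):
        rng = random.Random(seed)
        self.pi1 = make_layer(state_dim, hidden, rng)
        self.pi2 = make_layer(hidden, n_actions, rng)
        self.v1 = make_layer(state_dim, hidden, rng)
        self.v2 = make_layer(hidden, 1, rng)

    def policy(self, state):
        """Probability distribution over the (assumed discrete) adjustments."""
        return softmax(dense(relu(dense(state, self.pi1)), self.pi2))

    def value(self, state):
        """Scalar estimate of the long-term benefit of the state."""
        return dense(relu(dense(state, self.v1)), self.v2)[0]

net = ActorCritic(state_dim=4, n_actions=3)
probs = net.policy([0.1, 0.2, 0.3, 0.4])
state_value = net.value([0.1, 0.2, 0.3, 0.4])
```

Keeping the two networks separate mirrors the description: the policy output drives the adjustment choice, while the value output only evaluates the current situation.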

[0024] To achieve the above objectives, this application also provides a resource-aware task allocation device for multi-target hunting of unmanned swarms, which applies the resource-aware task allocation method for multi-target hunting of unmanned swarms as described above. The device includes:

[0025] The encirclement scenario modeling module is used to model the relationship between intelligent agents and targets in unmanned swarm encirclement scenarios. This modeling uses the artificial potential field method to model the unmanned swarm and targets according to their functions and dependencies, forming the task execution process.

[0026] The initial resource allocation module is used to summarize the resource experience base and complete the initial resource allocation in the encirclement and capture initiation phase by using a heuristic algorithm-based intelligent agent and target resource ratio initialization method.

[0027] The resource dynamic adjustment and optimization module is used to capture scene information and achieve dynamic adjustment and optimization of resources through reward and punishment learning by leveraging a reinforcement learning-based dynamic encirclement resource adaptive allocation optimization method.

[0028] To achieve the above objectives, this application also provides a resource-aware task allocation computer device for unmanned swarm multi-target encirclement and capture, including at least one processor, at least one memory, and a data bus;

[0029] The processor and the memory communicate with each other via the data bus;

[0030] The memory stores program instructions that are executed by the processor, which invokes the program instructions to execute the resource-aware task allocation method for multi-target capture in unmanned swarms as described above.

[0031] To achieve the above objectives, this application also provides a medium on which a computer program is stored, which, when executed by a processor, implements the resource-aware task allocation method for unmanned swarm multi-target capture as described above.

[0032] Beneficial Effects: This application presents a resource-aware task allocation method, apparatus, equipment, and medium for unmanned swarm multi-target encirclement, constructing a complete technical system for modeling, evaluation, and optimization. It solves key problems in traditional encirclement systems such as poor scalability, blind resource allocation, and insufficient dynamic adaptability. Specifically, this includes: distributed multi-agent interactive modeling to improve system scalability; simulation-driven correlation evaluation to ensure the scientific nature of resource allocation; and reinforcement learning-driven online optimization to enhance dynamic adaptability. Based on the above, it not only improves the technical framework of unmanned swarm encirclement but also provides new ideas and methods for resource optimization and allocation in complex dynamic scenarios, laying a solid foundation for the engineering application of unmanned swarm encirclement technology.

Attached Figure Description

[0033] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0034] Figure 1 is a flowchart of the resource-aware task allocation method for multi-target encirclement and capture in unmanned swarms provided in this application embodiment;

[0035] Figure 2 is a structural diagram of the unmanned swarm multi-target encirclement resource-aware task allocation device provided in this application embodiment; in the diagram: 10, encirclement scene modeling module; 20, initial resource allocation module; 30, resource dynamic adjustment and optimization module.

[0036] The implementation, functional features, and advantages of this invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0037] The technical solutions in the embodiments of this application will be clearly and completely described below. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0038] In this document, the term "comprising" is intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0039] This embodiment focuses on the collaborative optimization of unmanned swarm capture. It addresses core issues in capture scenarios: inaccurate agent-target relationship modeling caused by differences in multi-target characteristics, the imbalance between real-time performance and optimality of resource allocation in dynamic environments, and insufficient adaptation of task assignment to target requirements. To this end, this embodiment proposes a multi-dimensional relationship modeling method covering multiple targets. It integrates target motion characteristics, agent heterogeneity, and environmental constraints to systematically describe the coupling logic of collaborative capture, providing theoretical support for resource allocation. Furthermore, it proposes a unified resource allocation and task assignment optimization method integrating heuristic algorithms and reinforcement learning. Heuristic algorithms quickly solve the initial resource configuration scheme for multi-target capture, while reinforcement learning dynamically adapts to unexpected situations such as target escape and agent failure during the capture process, achieving a balance between resource utilization efficiency and capture success rate. This optimization method addresses specific challenges such as minimizing the resources allocated to precise single-target capture and balancing the resources allocated to collaborative capture of swarm targets. This improves resource optimization in the field of unmanned swarm capture and promotes the engineering application of heuristic algorithms and reinforcement learning in multi-target collaborative tasks.

[0040] This embodiment first uses a modeling method for the relationship between agents and targets in unmanned swarm hunting scenarios to decouple their complex dependencies and adapt to the core characteristics of the scenario, such as target dynamism and non-global information. Next, it uses a heuristic algorithm-based method for initializing the resource allocation between agents and targets, summarizes a resource experience base, and completes the initial resource allocation in the hunting initiation phase, laying the foundation for subsequent collaborative tasks. Then, it uses a reinforcement learning-based dynamic hunting resource adaptive allocation optimization method to capture scenario information and achieve dynamic adjustment and optimization of resources through reward and punishment learning.

[0041] This embodiment will specifically address key challenges such as relationship decoupling, experience reuse, and resource allocation, ultimately supporting the resource allocation optimization system for unmanned swarm collaborative encirclement and capture to achieve efficient and accurate collaborative encirclement and capture resource allocation.

[0042] The following describes the dynamics modeling of the unmanned swarm and the targets based on the artificial potential field method. This embodiment uses the artificial potential field method to model the unmanned swarm and targets according to their functions and dependencies, forming a clear task execution process. The encirclement scenario is defined within the established artificial potential field so that the proposed resource allocation optimization framework is practically meaningful and scalable. Particular attention is paid to the cooperative encirclement behavior of the unmanned swarm and its quantity relationship with the targets, so that task efficiency improvements can be achieved, such as savings in time, hardware, and fuel costs.

[0043] The modeling method for the artificial potential field in the scenario is as follows. The core idea of the artificial potential field method is to simulate the motion of particles in a potential field in physical space. By constructing a superimposed potential field of virtual gravitational (attractive) and repulsive fields, it guides intelligent agents to approach targets and avoid obstacles in complex environments. In unmanned swarm encirclement scenarios, this method maps the unmanned swarm, the targets, and environmental obstacles to force-bearing units in the potential field, decoupling the complex swarm-to-multi-target relationship into quantifiable potential field interactions and thus achieving dynamic modeling of collaborative encirclement.

[0044] The specific modeling method is as follows: a gravitational field is constructed with each target as the gravitational source to generate a gravitational force that drives all agents to approach the target area; at the same time, a multi-layer repulsive field is constructed, including the safety repulsive force of obstacles on agents, the collision avoidance repulsive force between agents, and the safety distance repulsive force around the target (to avoid excessive aggregation); the motion state of the agents is determined by the resultant force of the total potential field, and the dynamic adjustment of the motion direction and acceleration is realized by calculating the gradient of the potential field function.

[0045] The total potential function $U(q)$ is the superposition of the gravitational potential field and the repulsive potential field, specifically defined as follows:

[0046] $$U(q) = U_{att}(q) + U_{rep}(q)$$

[0047] where $q$ is the current position vector of the agent, $U_{att}(q)$ is the gravitational potential field generated by the targets, and $U_{rep}(q)$ is the repulsive potential field.

[0048] The gravitational potential field function $U_{att}(q)$ is specifically defined as follows:

[0049] $$U_{att}(q) = \frac{1}{2} k_{att} \sum_{j=1}^{M} d_j^{2}, \qquad d_j = \lVert q - q_j \rVert$$

[0050] where $k_{att}$ is the gravitational intensity coefficient, $M$ is the number of targets to be captured, $q_j$ is the position vector of the $j$-th target, and $d_j$ is the Euclidean distance between the agent at position $q$ and the $j$-th target. The negative gradient of the gravitational field is the resultant gravitational force acting on the agent:

[0051] $$F_{att}(q) = -\nabla U_{att}(q) = -k_{att} \sum_{j=1}^{M} (q - q_j)$$

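[0052] The repulsive potential field function $U_{rep}(q)$ contains multiple types of repulsive sources, specifically defined as follows:

[0053] $$U_{rep}(q) = \sum_{k=1}^{N_o} U_{obs,k}(q) + \sum_{l=1,\, l \neq i}^{N} U_{agt,l}(q)$$

[0054] where $N_o$ is the number of obstacles, $N$ is the total number of agents, and $i$ is the index of the current agent. Both the obstacle repulsion subfield and the agent-to-agent repulsion subfield are expressed as piecewise functions:

[0055] $$U_{rep,k}(q) = \begin{cases} \frac{1}{2} k_{rep} \left( \frac{1}{d_k} - \frac{1}{d_0} \right)^{2}, & d_k \le d_0 \\ 0, & d_k > d_0 \end{cases}$$

[0056] where $k_{rep}$ is the repulsive force intensity coefficient, $d_k = \lVert q - q_k \rVert$ is the distance between the agent at position $q$ and the $k$-th repulsive source (obstacle or other agent) at position $q_k$, and $d_0$ is the effective radius of the repulsive force. The corresponding resultant repulsive force is:

[0057] $$F_{rep}(q) = -\nabla U_{rep}(q) = \sum_{k:\, d_k \le d_0} k_{rep} \left( \frac{1}{d_k} - \frac{1}{d_0} \right) \frac{1}{d_k^{2}} \cdot \frac{q - q_k}{d_k}$$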

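[0058] The resultant force acting on the agent is the sum of the resultant gravitational force $F_{att}(q)$ and the resultant repulsive force $F_{rep}(q)$:

[0059] $$F(q) = F_{att}(q) + F_{rep}(q)$$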

[0060] The agent updates its motion state according to the direction of the resultant force, achieving a dynamic balance between approaching the target, avoiding obstacles, and swarm collaboration, thereby completing the decoupled modeling of the multi-agent-multi-target relationship.
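A minimal sketch of the force computation described above, assuming a quadratic attractive potential and a piecewise repulsive potential that vanishes beyond an effective radius `d0`. These functional forms, the 2-D setting, and every function and parameter name are illustrative assumptions and are not confirmed by the application text.

```python
import math

def attractive_force(q, targets, k_att=1.0):
    """Negative gradient of 0.5 * k_att * sum_j ||q - q_j||^2 (assumed form)."""
    fx = sum(-k_att * (q[0] - t[0]) for t in targets)
    fy = sum(-k_att * (q[1] - t[1]) for t in targets)
    return (fx, fy)

def repulsive_force(q, sources, k_rep=1.0, d0=2.0):
    """Sum of repulsions from sources closer than the effective radius d0."""
    fx = fy = 0.0
    for s in sources:
        dx, dy = q[0] - s[0], q[1] - s[1]
        d = math.hypot(dx, dy)
        if d == 0.0 or d > d0:
            continue  # coincident point or outside the effective radius
        # magnitude of -dU/dd for U = 0.5 * k_rep * (1/d - 1/d0)^2
        mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
        fx += mag * dx / d
        fy += mag * dy / d
    return (fx, fy)

def resultant_force(q, targets, obstacles, neighbours,
                    k_att=1.0, k_rep=1.0, d0=2.0):
    """Attractive force plus repulsion from obstacles and other agents."""
    fa = attractive_force(q, targets, k_att)
    fr = repulsive_force(q, list(obstacles) + list(neighbours), k_rep, d0)
    return (fa[0] + fr[0], fa[1] + fr[1])

# One agent at the origin, one target at (3, 4), one obstacle at (1, 0).
force = resultant_force((0.0, 0.0), [(3.0, 4.0)], [(1.0, 0.0)], [])
```

In this example the target pulls the agent with (3, 4) and the obstacle pushes it with (-0.5, 0), so the agent is steered toward the target while veering away from the obstacle.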

[0061] The following describes the dynamics modeling of the pursued target. To maintain physical realism while ensuring simulation efficiency, a simplified form of a second-order integrator model is used for the dynamics modeling of the target, namely a kinematic model based on velocity control, with physical constraints superimposed.

[0062] The state evolution process of the target being captured is described using a second-order integrator, and the core state variables and motion update rules are clarified:

[0063] First, the state space is defined, and the core states of the target are abstracted into the vector $s = [x, y, v_x, v_y]^{T}$, where $(x, y)$ represents the real-time position coordinates in a two-dimensional plane and $(v_x, v_y)$ represents the velocity components in the $x$ and $y$ directions, fully covering the key dimensions of motion.

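[0064] The position update formula is as follows:

[0065] $$x(t + \Delta t) = x(t) + v_x(t)\,\Delta t, \qquad y(t + \Delta t) = y(t) + v_y(t)\,\Delta t$$

[0066] where $\Delta t$ is the simulation step size, which ensures the continuity and smoothness of the motion trajectory under time discretization.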

[0067] To address the active evasion characteristics of the target during encirclement, an escape behavior dynamic model dominated by a repulsive field is constructed based on the artificial potential field method to simulate the target's dynamic response logic to the drone swarm:

[0068] Repulsive field generation: when drone $i$ enters the target's perception range, the target treats it as a source of repulsion, generating a repulsive force vector that points away from the drone:

[0069] $$F_i = \frac{p_T - p_i}{\lVert p_T - p_i \rVert + \epsilon}$$

[0070] where $p_T$ and $p_i$ are the positions of the target and of drone $i$, and $\epsilon$ is a small quantity that avoids division by zero.

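[0071] Escape velocity synthesis: the target's desired escape velocity is the vector sum, over the set $\mathcal{D}$ of UAVs within its detection range, of the repulsive force vectors $F_i$ generated by those UAVs:

[0072] $$v_{esc} = \sum_{i \in \mathcal{D}} F_i$$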

[0073] If the target does not detect any drones, it will either perform a random walk or maintain its original movement pattern.
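The target model above can be sketched as a single simulation step. The sketch assumes that each detected drone contributes a roughly unit-length vector pointing away from it, that the speed is capped at `v_max` as the physical constraint, and that an undetected target keeps its previous velocity (the text also allows a random walk). All names and default values are hypothetical.

```python
import math

def escape_velocity(target_pos, drone_positions, sense_range=5.0, eps=1e-6):
    """Sum of away-pointing vectors from every drone inside sense_range."""
    vx = vy = 0.0
    for px, py in drone_positions:
        dx, dy = target_pos[0] - px, target_pos[1] - py
        d = math.hypot(dx, dy)
        if d <= sense_range:
            vx += dx / (d + eps)   # eps avoids division by zero
            vy += dy / (d + eps)
    return (vx, vy)

def step_target(state, drone_positions, dt=0.1, v_max=1.0):
    """state = (x, y, vx, vy); returns the state after one time step."""
    x, y, vx, vy = state
    ex, ey = escape_velocity((x, y), drone_positions)
    if ex != 0.0 or ey != 0.0:
        vx, vy = ex, ey            # net repulsion present: flee along it
    # otherwise keep the previous velocity (no detection or forces cancel)
    speed = math.hypot(vx, vy)
    if speed > v_max:              # assumed speed limit as physical constraint
        vx, vy = vx * v_max / speed, vy * v_max / speed
    return (x + vx * dt, y + vy * dt, vx, vy)

fleeing = step_target((0.0, 0.0, 0.5, 0.0), [(-1.0, 0.0)])
cruising = step_target((0.0, 0.0, 0.5, 0.0), [])
```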

[0074] The following describes the dynamics modeling for the control of the drone swarm in encirclement and capture. A distributed collaborative approach is employed to design the encirclement and capture behavior logic of the drone swarm. Dynamic collaborative encirclement and capture is achieved through three core mechanisms: target tracking, collaborative encirclement, and state switching.

[0075] Each drone in the swarm shares its own location with friendly units within communication range, obtains the locations of its neighbors, and receives the assigned target information. It tracks and surrounds targets within its detection range, and the number of drones needed to surround a target is determined from the range of the detection angle.

[0076] Target tracking: when drone $i$ at position $p_i$ obtains an estimated target location $\hat{p}_T$ through its own detection or swarm communication sharing, it generates a tracking velocity vector whose direction always points towards the target. The corresponding mathematical expression is:

[0077] $$v_i^{trk} = \frac{\hat{p}_T - p_i}{\lVert \hat{p}_T - p_i \rVert}$$

[0078] Cooperative encirclement: to avoid collisions or excessive aggregation of drones within the swarm, a short-range separation repulsion force $F_i^{sep}$ is designed. The final control velocity $v_i$ of the drone is the weighted sum, with weights $w_{trk}$ and $w_{sep}$, of the tracking vector $v_i^{trk}$ and the separation repulsion force, achieving a dynamic balance before and after the encirclement is completed. The corresponding mathematical expression is:

[0079] $$v_i = w_{trk}\, v_i^{trk} + w_{sep}\, F_i^{sep}$$

[0080] State switching: the behavior mode is dynamically adjusted based on the number of pursuers of the same target in the communication topology and on $N_{req}$, the number of drones required to capture a single target, which is determined from the detection angle. If the current drone's ID ranks beyond $N_{req}$ among those pursuers, it abandons the current target and continues searching, thus avoiding oversaturation of resources allocated to a single target.
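The three mechanisms (target tracking, cooperative encirclement, state switching) can be sketched for a single drone as follows. The linear falloff of the separation force, the weights, and the rule of ranking pursuers by ascending ID are assumptions made only for illustration; the function names are hypothetical.

```python
import math

def tracking_velocity(pos, target_est):
    """Unit vector from the drone towards the estimated target position."""
    dx, dy = target_est[0] - pos[0], target_est[1] - pos[1]
    d = math.hypot(dx, dy)
    return (0.0, 0.0) if d == 0.0 else (dx / d, dy / d)

def separation_repulsion(pos, neighbours, sep_radius=1.0):
    """Short-range push away from neighbours closer than sep_radius."""
    fx = fy = 0.0
    for nx, ny in neighbours:
        dx, dy = pos[0] - nx, pos[1] - ny
        d = math.hypot(dx, dy)
        if 0.0 < d < sep_radius:
            w = (sep_radius - d) / sep_radius  # stronger when closer (assumed)
            fx += w * dx / d
            fy += w * dy / d
    return (fx, fy)

def control_velocity(pos, target_est, neighbours, w_track=1.0, w_sep=1.5):
    """Weighted sum of the tracking vector and the separation repulsion."""
    t = tracking_velocity(pos, target_est)
    s = separation_repulsion(pos, neighbours)
    return (w_track * t[0] + w_sep * s[0], w_track * t[1] + w_sep * s[1])

def keep_target(my_id, pursuer_ids, n_required):
    """Keep the target only if this drone ranks within the required count."""
    rank = sorted(pursuer_ids).index(my_id) + 1
    return rank <= n_required

v = control_velocity((0.0, 0.0), (3.0, 4.0), [(0.5, 0.0)])
stays = keep_target(7, [3, 7, 9], n_required=2)
leaves = not keep_target(9, [3, 7, 9], n_required=2)
```

Because every drone applies the same ranking to the same shared pursuer list, the surplus drones can drop out without any central coordinator.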

[0081] The following describes the heuristic algorithm-based method for initializing resource allocation in encirclement tasks. Based on historical experience gained through training, this method determines appropriate resource allocations according to task requirements and environmental information. The heuristic algorithm iteratively updates the initial allocation scheme by incorporating various constraints (cost, success rate, etc.) to ensure the accuracy and rationality of the allocation between the unmanned swarm and the targets, avoiding unnecessary resource consumption.

[0082] The requirements analysis and constraint system construction for the encirclement and capture mission are as follows. The requirements of the mission rest on two major demands, target containment and resource efficiency, which guide the subsequent resource allocation.

[0083] Combining geometric distribution and dynamic characteristics, a quantitative standard for determining the success of the encirclement is given:

[0084] Distance coverage condition: at least $N_{req}$ drones enter the perimeter around the target to form a physical blockade ($N_{req}$ is determined by condition 2).

[0085] Angular distribution condition: the azimuth angles of the drones participating in the encirclement, measured relative to the target, are sorted, and the maximum interval between adjacent angles must not exceed a set threshold. This ensures that the target lies within a closed polygon with no escape sectors.
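The two success conditions can be checked together in a few lines. This is a sketch under stated assumptions: the capture radius, the required drone count, and the maximum angular gap are passed in as parameters because their values are not given in the text, and the function name `is_encircled` is hypothetical.

```python
import math

def is_encircled(target, drones, capture_radius, n_required, max_gap):
    """Check the distance coverage and angular distribution conditions."""
    # Condition 1: enough drones inside the perimeter around the target.
    inside = [d for d in drones if math.dist(d, target) <= capture_radius]
    if not inside or len(inside) < n_required:
        return False
    # Condition 2: sort azimuths and find the largest gap, including the
    # wrap-around gap between the last and the first angle.
    angles = sorted(math.atan2(d[1] - target[1], d[0] - target[0])
                    for d in inside)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2.0 * math.pi - angles[-1])
    return max(gaps) <= max_gap

ring = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
one_side = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
closed = is_encircled((0.0, 0.0), ring, 1.5, 3, 2.0 * math.pi / 3.0)
leaky = is_encircled((0.0, 0.0), one_side, 1.5, 3, 2.0 * math.pi / 3.0)
```

The ring of four drones leaves gaps of 90 degrees and passes; the three drones clustered in one quadrant leave a 270-degree escape sector and fail.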

[0086] Based on the actual feasibility of the task, the following two types of constraints are given:

[0087] Drone allocation constraint: for drone $i$ and target $j$, the matching relationship is $x_{ij} \in \{0, 1\}$ with $\sum_{j=1}^{M} x_{ij} \le 1$ for $i = 1, \dots, N$, where $N$ is the total number of drones and $M$ is the total number of targets. This constraint prevents a single drone from being assigned to multiple targets at the same time, eliminating the risk of resource conflicts.

[0088] Success rate constraint: $P_{succ} \ge P_{min}$, where the estimated success rate $P_{succ}$ depends on an attenuation coefficient and $P_{min}$ represents the minimum success rate threshold. The minimum success rate constraint is determined by fitting success rates from historical simulation data, ensuring the reliability of the initial allocation scheme.

[0089] The resource consumption cost of the encirclement and capture mission is designed as follows. To balance mission efficiency and resource investment, a multi-dimensional comprehensive resource consumption cost is defined, quantifying the economics of resource allocation. The design aims to minimize the overall resource consumption of the mission while avoiding irrational solutions that indiscriminately increase the number of drones in pursuit of time efficiency, thus achieving efficient resource utilization.

[0090] Construct the following weighted summation cost function:

[0091] $$J = \omega_1 C_E + \omega_2 C_T + \omega_3 C_H$$

[0092] where $C_E$ represents the energy consumption cost, $C_E = c_e \sum_i L_i$; $L_i$ is the encirclement path length of drone $i$, which is the sum of the Euclidean distance between the target location and the drone's initial position and the trajectory length during the encirclement process, and $c_e$ is the energy consumption coefficient per unit path. The energy cost quantifies the energy consumed during the capture process and is suitable for long-range travel scenarios. $C_T$ represents the time cost; it counts the total number of simulation steps required to complete the encirclement and capture of all targets, directly reflects the execution efficiency of the mission, and is suitable for speed-sensitive scenarios such as emergency response. $C_H$ represents the hardware cost; it counts the number of drones involved in the operation and multiplies it by the fixed cost per drone, which avoids redundant drones and ensures the actual economic efficiency of the allocation. $\omega_1$, $\omega_2$ and $\omega_3$ are weighting coefficients that can be dynamically adjusted according to the needs of different scenarios and tasks.
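The weighted cost can be computed directly from the three components described above. A minimal sketch: the parameter names and default values are hypothetical, and the energy term is assumed to be the per-unit coefficient multiplied by the summed path lengths.

```python
def total_cost(path_lengths, steps, w_energy=1.0, w_time=1.0, w_hw=1.0,
               energy_per_unit=1.0, cost_per_drone=1.0):
    """Weighted sum of energy, time, and hardware costs.

    path_lengths: one encirclement path length per participating drone
    steps:        total simulation steps to capture all targets
    """
    energy = energy_per_unit * sum(path_lengths)   # energy consumption cost
    time_cost = steps                              # time cost
    hardware = cost_per_drone * len(path_lengths)  # hardware cost
    return w_energy * energy + w_time * time_cost + w_hw * hardware

# Two drones, 50 steps: 0.5 * (0.2 * 30) + 0.1 * 50 + 2.0 * (3.0 * 2) = 20
example = total_cost([10.0, 20.0], 50, w_energy=0.5, w_time=0.1, w_hw=2.0,
                     energy_per_unit=0.2, cost_per_drone=3.0)
```

Raising `w_hw` relative to `w_time` penalizes adding drones merely to finish faster, which is the trade-off the cost design is meant to capture.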

[0093] The process of initializing resource allocation using a heuristic algorithm is as follows. This embodiment outputs an initial resource allocation scheme that satisfies the constraints and achieves the optimal overall cost through quantitative iterative search and multi-dimensional evaluation. Specifically:

[0094] First, the structured input of the task requirements is completed: the target quantity and the agent capability set are read, and the core parameters are defined as a structured input parameter set. Non-numerical parameters are encoded using 0-1 encoding and agent capability values are standardized to ensure the consistency and computability of the input data. Next, a constraint-cost association mapping is performed: the parameter set is mapped to a quantized constraint set that includes the agent assignment constraint C1 and the capture capability constraint, and the success rate constraint and the comprehensive cost function are integrated to form the constraint-cost relationship matrix. This step binds the task requirements to the solution objective. Then, resource allocation and solution space initialization are performed, generating an initial solution space based on the quantitative relationship between agents and targets. A single allocation scheme is represented by a binary matrix; schemes are generated by uniform random sampling, and invalid solutions that violate C1 are discarded to ensure the validity of the initial solutions. Heuristic iterative search then optimizes the initial solution space, retaining the solutions that satisfy C1 after each iteration. The comprehensive cost of the solutions is calculated and sorted in ascending order, generating an evaluation result set that is fed back to the iteration unit. The iteration terminates when either condition is met: the number of iterations reaches 100, or the cost change remains below the convergence threshold for 10 consecutive iterations. The feasible solution with the minimum cost is taken as the optimal initial allocation scheme. Meanwhile, the feasible solution set is stored in the historical solution pool to provide a reference when generating solutions for subsequent similar tasks, thereby improving solution efficiency.
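The search loop described above can be sketched as a small population-based heuristic. The specifics are assumptions: the cost is a stand-in (distance plus a per-drone charge plus a penalty for under-staffed targets), the neighbourhood move reassigns one drone, and the pool size, tolerance, and all names are hypothetical. Only the random sampling with rejection of C1 violations, the ascending cost sort, and the two stopping conditions follow the text.

```python
import math
import random

def satisfies_c1(x):
    """C1: each drone (row) is assigned to at most one target."""
    return all(sum(row) <= 1 for row in x)

def cost(x, drones, targets, n_required, w_energy=1.0, w_hw=0.5, penalty=100.0):
    """Stand-in comprehensive cost (assumed form, not from the application)."""
    total = 0.0
    for i, row in enumerate(x):
        for j, bit in enumerate(row):
            if bit:
                total += w_energy * math.dist(drones[i], targets[j]) + w_hw
    for j in range(len(targets)):
        short = n_required - sum(row[j] for row in x)
        if short > 0:
            total += penalty * short   # target j has too few drones
    return total

def reassign(x, rng):
    """Move one random drone to a random target or leave it idle."""
    y = [row[:] for row in x]
    i = rng.randrange(len(y))
    m = len(y[i])
    y[i] = [0] * m
    j = rng.randrange(m + 1)           # index m means "idle"
    if j < m:
        y[i][j] = 1
    return y

def heuristic_init(drones, targets, n_required, pool_size=20,
                   max_iter=100, tol=1e-6, patience=10, seed=0):
    rng = random.Random(seed)
    n, m = len(drones), len(targets)
    key = lambda x: cost(x, drones, targets, n_required)
    pool = []
    while len(pool) < pool_size:       # uniform sampling, reject C1 violations
        x = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n)]
        if satisfies_c1(x):
            pool.append(x)
    best = min(pool, key=key)
    stagnant = 0
    for _ in range(max_iter):          # stop after at most 100 iterations
        candidates = pool + [reassign(x, rng) for x in pool]
        feasible = sorted((x for x in candidates if satisfies_c1(x)), key=key)
        pool = feasible[:pool_size]    # keep the cheapest feasible schemes
        improvement = key(best) - key(pool[0])
        best = pool[0]
        stagnant = stagnant + 1 if improvement < tol else 0
        if stagnant >= patience:       # 10 consecutive stagnant iterations
            break
    return best, key(best)

drones = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
targets = [(1.0, 0.0), (9.0, 0.0)]
best, best_cost = heuristic_init(drones, targets, n_required=2)
```

The reassignment move keeps C1 satisfied by construction, so the feasibility filter inside the loop is redundant here; it is kept to mirror the retention step in the description.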

[0095] This document describes an online resource allocation optimization method based on reinforcement learning for a trapping scenario. Addressing the dynamic nature of trapping scenarios (target escape, environmental changes) and non-global information constraints, this embodiment constructs a reinforcement learning-driven online resource allocation optimization framework centered on "real-time adaptation - dynamic optimization - convergence guarantee." Through iterative interaction between the agent and the environment, the resource allocation scheme is continuously optimized.

[0096] Reinforcement learning environment modeling and state-action space definition for the encirclement scenario. A reinforcement learning environment model that fits the dynamic characteristics of encirclement is constructed, and the state space and action space are defined quantitatively, providing the foundation for online interaction.

[0097] Environment model components: the environment includes the set of agents, the set of targets, the dynamic constraint set (speed, boundary, obstacle avoidance), and the state transition rule, which maps the current state and the action to the next state.

[0098] State space: a high-dimensional state vector is defined that comprehensively represents the real-time situation and the resource allocation status of the encirclement operation.

[0099]

[0100] Wherein: the target state includes the position matrix and the velocity matrix of the targets and the target clustering degree; the agent state includes the position matrix and the velocity matrix of the agents and the remaining capability vector of the agents; the resource allocation state includes the resource allocation matrix at time t and the estimated success rate of the current encirclement; and the environmental constraints include the minimum distance between the agents and the obstacles.
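As a rough illustration of how the four groups of quantities listed above could be flattened into one state vector, consider the following sketch. The function name, the argument names, and in particular the formula used for the target clustering degree (mean distance of the targets from their centroid) are assumptions; the source does not give them.

```python
import numpy as np

def build_state(target_pos, target_vel, agent_pos, agent_vel,
                remaining_cap, allocation, success_est, obstacles):
    # Target clustering degree (assumed definition): mean distance of the
    # targets from their common centroid.
    clustering = np.linalg.norm(target_pos - target_pos.mean(axis=0), axis=1).mean()
    # Minimum distance between any agent and any obstacle.
    dists = np.linalg.norm(agent_pos[:, None, :] - obstacles[None, :, :], axis=2)
    d_min = dists.min()
    return np.concatenate([
        target_pos.ravel(), target_vel.ravel(), [clustering],   # target state
        agent_pos.ravel(), agent_vel.ravel(), remaining_cap,    # agent state
        allocation.ravel().astype(float), [success_est],        # allocation state
        [d_min],                                                # environment
    ])
```

With 2 targets and 3 agents in the plane, the resulting vector has 4 + 4 + 1 + 6 + 6 + 3 + 6 + 1 + 1 = 32 entries.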

[0101] Action space A: the action is defined as the incremental adjustment of the resource allocation, which must satisfy:

[0102]

[0103] Here, the bound is the maximum adjustment range per step; it avoids abrupt changes in the allocation and ensures the feasibility and smoothness of the action.
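A bounded incremental action of this kind can be sketched as follows. The sketch assumes a real-valued (relaxed) allocation matrix with entries in [0, 1] and a scalar bound `delta_max`; both the relaxation and the names are illustrative assumptions, since the exact constraint formula is not recoverable from the text.

```python
import numpy as np

def apply_adjustment(allocation, delta, delta_max):
    # Limit each entry of the requested adjustment to [-delta_max, delta_max].
    step = np.clip(delta, -delta_max, delta_max)
    # Apply the step and keep every entry inside the valid range [0, 1].
    return np.clip(allocation + step, 0.0, 1.0)
```

Clipping the step before applying it means no entry of the allocation can change by more than `delta_max` in a single step, which is the smoothness property the paragraph describes.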

[0104] Reward function design for unmanned swarm cooperative encirclement. With "encirclement efficiency, resource economy, and dynamic adaptability" as the optimization orientation, this embodiment designs a weighted cooperative reward function that balances immediate gains with long-term goals:

[0105] Terminal reward: triggered when the encirclement task succeeds; it is negatively correlated with the capture time, encouraging rapid completion.

[0106]

[0107] Here, the parameters are the basic reward coefficient, the initial moment at which optimization begins, and the maximum allowable time.

[0108] Instant reward: fed back at each time step, jointly considering the success rate and the resource cost:

[0109]

[0110] Here, the parameters are the comprehensive cost of the initial allocation, the two weights, and the estimate of the current success rate.

[0111] Penalty term: to ensure the feasibility of the solution, penalties are imposed for constraint violations and adaptation failures.

[0112]

[0113] Here, the indicator function selects which penalties apply; the coefficients are the penalty for violating the allocation constraint, the penalty for encirclement failure, and the penalty for collision; the collision indicator is active when, at the current time step, any agent collides or crosses the boundary.

[0114] Discounted cumulative total reward: the per-step rewards are accumulated with a discount factor up to the time at which the task ends, so that the strategy focuses on long-term benefits.
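The structure of the reward described in this section can be sketched in Python. The original formulas were lost in extraction, so every functional form below is an assumption chosen only to match the stated properties: a terminal reward that decreases with capture time, an instant reward that trades success rate against cost relative to the initial allocation, indicator-weighted penalties, and discounted accumulation. All names and default coefficients are hypothetical.

```python
def terminal_reward(success, t_end, t0, t_max, r0=10.0):
    # Paid only on success; shrinks linearly as the capture takes longer (assumed form).
    if not success:
        return 0.0
    return r0 * (1.0 - (t_end - t0) / t_max)

def instant_reward(p_success, cost_now, cost_init, w1=1.0, w2=0.5):
    # Rewards a high success-rate estimate, charges cost relative to the initial allocation.
    return w1 * p_success - w2 * cost_now / cost_init

def penalty(violates_allocation, capture_failed, collided,
            p_alloc=1.0, p_fail=5.0, p_coll=2.0):
    # Each boolean acts as an indicator function switching its penalty on or off.
    return p_alloc * violates_allocation + p_fail * capture_failed + p_coll * collided

def discounted_return(rewards, gamma=0.99):
    # Backward accumulation of sum_t gamma^t * r_t.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

A per-step reward would then be the instant reward minus the penalty, with the terminal reward added on the final step; that combination is also an assumption.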

[0115] Online iterative optimization and convergence of the resource allocation strategy. This embodiment employs the Proximal Policy Optimization (PPO) algorithm to iteratively optimize the encirclement resource allocation online, balancing exploration and exploitation while ensuring policy convergence in dynamic scenarios. First, a dual-network structure of "policy network + value network" is constructed: the policy network is a fully connected network that takes the encirclement state as input and outputs the probability distribution of the resource allocation adjustment; the value network is also a fully connected network that takes the encirclement state as input and outputs the state value, which is used to assess the long-term benefit of the current situation.

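[0116] The policy update is based on the clipped surrogate objective function of PPO, which avoids the instability caused by excessively large updates. The core formula is:

[0117] $$L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}$$

[0118] where $\pi_{\theta_{old}}$ is the old strategy; $\epsilon$ is the clip threshold; and the advantage function $\hat{A}_t$ quantifies the relative return of action $a_t$.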
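The clipped surrogate objective of PPO can be computed for a batch in a few lines of NumPy. This is the standard PPO objective rather than anything specific to this application; the function name and the default clip threshold of 0.2 are illustrative choices.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    # Probability ratio between the new and the old policy for each sampled action.
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # The element-wise minimum removes any incentive to move the ratio outside the clip range.
    return float(np.minimum(unclipped, clipped).mean())
```

With ratios 1.5 and 0.5 and advantages +1 and -1, the per-sample terms are 1.2 and -0.8, so the objective evaluates to 0.2.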

[0119] The online iteration process is as follows: at every time step, the agent samples an action from the current strategy based on the current state, adjusts the resource allocation accordingly, interacts with the environment, and receives the reward and the next state; the experience tuple (state, action, reward, next state) is stored in the experience replay pool. When the number of experience samples reaches the batch threshold, the advantage function is calculated from the sampled batch of experience, the policy network is updated by gradient ascent to maximize the clipped objective, and the value network is updated by minimizing the mean squared error.
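The data flow of this loop (store experience, trigger an update at a batch threshold, compute advantages and the value loss) can be sketched as below. The one-step temporal-difference advantage is an assumption; the source does not say which advantage estimator is used, and the class and function names are hypothetical.

```python
import numpy as np

class ExperienceBuffer:
    """Collects experience tuples until the batch threshold is reached."""
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.items = []

    def add(self, state, action, reward, next_state, done):
        self.items.append((state, action, reward, next_state, done))

    def ready(self):
        return len(self.items) >= self.batch_size

    def drain(self):
        # Hand the collected batch to the learner and start a fresh one.
        batch, self.items = self.items, []
        return batch

def td_advantages(rewards, values, next_values, dones, gamma=0.99):
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # Bootstrapped target; no bootstrap from terminal states.
    targets = rewards + gamma * next_values * (1.0 - dones)
    return targets - values, targets

def value_loss(values, targets):
    # Mean squared error minimized when updating the value network.
    return float(np.mean((np.asarray(values, dtype=float) - np.asarray(targets, dtype=float)) ** 2))
```

The advantages feed the policy update, while the targets feed the value network update.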

[0120] To balance exploration and convergence, an adaptive exploration mechanism is introduced: the exploration rate, and with it the exploration range, is gradually reduced as the iterations progress. The strategy is considered converged when, over consecutive iteration cycles, both the value network loss and the update magnitude of the strategy parameters stay below their respective thresholds. At this point, the strategy enters the fine-tuning stage, adjusting its parameters only slightly in response to changes in the environment.
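A minimal sketch of such a mechanism is given below. The exponential decay schedule, the window length, and all threshold values are assumptions, since the concrete schedule and thresholds are not recoverable from the text.

```python
def exploration_rate(iteration, eps_start=0.3, eps_end=0.02, decay=0.995):
    # Exponentially decaying exploration rate with a lower floor (assumed schedule).
    return max(eps_end, eps_start * decay ** iteration)

def has_converged(value_losses, param_deltas, loss_tol=1e-3, delta_tol=1e-4, window=10):
    # Converged when both quantities stay below their thresholds
    # over the last `window` consecutive iteration cycles.
    if len(value_losses) < window or len(param_deltas) < window:
        return False
    recent_losses = value_losses[-window:]
    recent_deltas = param_deltas[-window:]
    return all(l < loss_tol for l in recent_losses) and all(d < delta_tol for d in recent_deltas)
```

Once `has_converged` returns true, a training loop could switch to a smaller learning rate to realize the fine-tuning stage.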

[0121] This embodiment addresses the dynamic, multi-constraint, and incomplete information characteristics of unmanned swarm capture scenarios, and focuses on technological innovations in three core areas: modeling, evaluation, and optimization. It achieves at least the following technological innovations:

[0122] (1) A scalable encirclement scenario modeling method based on distributed decision-making and multi-agent interaction. Specifically, traditional encirclement modeling often employs centralized control logic, which is difficult to adapt to large-scale cluster expansion, and its fixed agent interaction rules lack flexibility in responding to dynamically escaping targets. This embodiment, based on the artificial potential field method and distributed finite state machines, constructs a scalable dynamic model of the encirclement scenario. Through decentralized decision-making and dynamic interaction mechanisms, it overcomes the scale limitations and adaptability bottlenecks of traditional centralized models. In target escape modeling, a dynamic escape logic based on a multi-source repulsive field is designed: the target perceives the threat of surrounding drones in real time within its perception range and generates an adaptive escape vector, rather than following the traditional fixed-trajectory escape mode, which better reflects the active avoidance behavior of targets in real encirclement scenarios. In drone collaborative modeling, a short-range separation repulsive force and a multi-hop self-organizing network communication mechanism are introduced between drones, which avoids cluster aggregation and collisions and achieves distributed sharing of target information, allowing the cluster size to be flexibly expanded according to task requirements (from a few to dozens of drones). Meanwhile, a state switching rule of "random search - tracking and encirclement - task abandonment" dynamically optimizes resource allocation and avoids oversaturating a single target with resources, significantly improving the cluster's adaptability to multi-target encirclement scenarios. This distributed modeling approach not only ensures the collaborative efficiency of large-scale clusters but also enhances the model's robustness to environmental changes (such as obstacles and boundary constraints), providing high-fidelity scenario support for subsequent resource allocation optimization.
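The potential-field behaviors named in this paragraph (multi-source repulsive escape, tracking plus short-range separation, and the abandonment rule) can be sketched as follows. The inverse-distance weighting, the weights `w_track` and `w_sep`, and the reading of the abandonment rule as "rank exceeds the number of pursuers the target needs" are assumptions made for illustration.

```python
import numpy as np

def unit(v):
    n = np.linalg.norm(v)
    return v / n if n > 1e-9 else np.zeros_like(v, dtype=float)

def target_escape_vector(target_pos, drone_positions, sense_range):
    # Sum of repulsive contributions from every drone inside the perception range.
    escape = np.zeros_like(target_pos, dtype=float)
    for p in drone_positions:
        diff = target_pos - p
        d = np.linalg.norm(diff)
        if 1e-9 < d <= sense_range:
            escape += diff / d**2            # closer drones push harder
    return unit(escape)

def drone_velocity(pos, target_est, neighbor_positions, sep_range,
                   w_track=1.0, w_sep=1.5):
    # Tracking vector always points at the estimated target position.
    track = unit(target_est - pos)
    # Short-range separation repulsion from nearby swarm members.
    sep = np.zeros_like(pos, dtype=float)
    for q in neighbor_positions:
        diff = pos - q
        d = np.linalg.norm(diff)
        if 1e-9 < d < sep_range:
            sep += diff / d**2
    # Final control velocity: weighted sum of tracking and separation terms.
    return w_track * track + w_sep * sep

def should_abandon(rank, required_pursuers):
    # State switching: a drone ranked beyond the needed pursuers returns to search.
    return rank > required_pursuers
```

Because every function uses only local positions and shared target estimates, adding more drones changes no code, which is the scalability argument made above.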

[0123] (2) A method for evaluating the correlation between resource allocation and task success rate in unmanned swarm encirclement based on local perception. Specifically, traditional encirclement resource allocation often relies on experience-based allocation or single-objective optimization and lacks a quantitative evaluation of the correlation between resource input and task success rate, resulting in low resource utilization or a high risk of encirclement failure. In this embodiment, a resource allocation initialization framework integrating "demand-constraint-cost" is constructed based on a heuristic algorithm. Through a simulation-driven correlation evaluation mechanism, resource allocation is matched accurately to the encirclement success rate, solving the blindness of traditional methods. The innovation is reflected in three aspects. First, a multi-dimensional constraint system for the encirclement task (allocation constraints, capability constraints, and success rate constraints) is quantified and defined, and a comprehensive cost function integrating energy consumption, time, and hardware costs is constructed, avoiding the irrational approach of "infinitely increasing the number of machines for speed." Second, the constrained multi-objective optimization problem is solved using heuristic algorithms, generating an initial resource allocation scheme that satisfies "low energy consumption, high efficiency, and high success rate" and providing a high-quality starting point for subsequent online optimization. Third, and most importantly, a quantitative correlation model between resource allocation and encirclement success rate is established: by fitting the success rate function to extensive simulation data, success rates under different allocation schemes are predicted quantitatively rather than judged qualitatively as in traditional methods. This simulation-based correlation evaluation mechanism provides a quantitative basis for resource allocation decisions and aims to maximize the encirclement success rate under limited resources, making resource allocation more scientific and rational.

[0124] (3) A dynamic encirclement resource allocation optimization method based on incomplete information. Specifically, traditional encirclement resource allocation is mostly static, which makes it difficult to cope with dynamic characteristics such as target escape and environmental changes. Moreover, it assumes that global information is known, which does not match the "incomplete information" constraint of real scenarios, resulting in poor adaptability of the optimization scheme. In this embodiment, an online dynamic optimization framework under incomplete information is constructed based on reinforcement learning (the PPO algorithm). Through real-time interaction and iteration between the agent and the environment, resource allocation is adjusted dynamically, which breaks through the limitations of traditional static allocation and the global information assumption. The innovations are reflected in three aspects. First, a state-action space tailored to the dynamic characteristics of the encirclement is designed: the state vector comprehensively represents the target state, agent state, resource allocation state, and environmental constraints, and actions are defined as incremental adjustments to the resource allocation, ensuring the accuracy and feasibility of optimization. Second, a multi-objective collaborative reward function is constructed that integrates encirclement efficiency, resource economy, and constraint satisfaction; it encourages rapid encirclement while penalizing resource waste and constraint violations, guiding the strategy towards "high efficiency and economy." Third, online iterative optimization of the strategy is achieved based on the PPO algorithm, with a clipping mechanism and an adaptive exploration strategy. Under incomplete information (the global states of all targets and agents cannot be obtained in real time), the resource allocation strategy is continuously optimized through experience replay and network updates, maintaining a high encirclement success rate even in dynamic scenarios such as target escape and sudden environmental changes. This reinforcement learning-based dynamic optimization method adapts to scene changes in real time, solving the resource allocation problem in dynamic encirclement under incomplete information and significantly improving the adaptability and robustness of the encirclement system.

[0125] This embodiment, through the three major innovations mentioned above, constructs a complete technical system of "modeling-evaluation-optimization," solving key problems in traditional swarm capture systems such as poor scalability, blind resource allocation, and insufficient dynamic adaptability. Specific innovations include: distributed multi-agent interactive modeling to improve system scalability; simulation-driven correlation evaluation to ensure the scientific nature of resource allocation; and reinforcement learning-driven online optimization to enhance dynamic adaptability. These innovations not only improve the technical framework of unmanned swarm capture but also provide new ideas and methods for resource optimization and allocation in complex dynamic scenarios, laying a solid foundation for the engineering application of unmanned swarm capture technology.

[0126] Refer to Figure 2, which is a structural block diagram of the unmanned swarm multi-target capture resource-aware task allocation device provided in an embodiment of this application.

[0127] As shown in Figure 2, this embodiment also discloses a resource-aware task allocation device for multi-target encirclement of unmanned swarms, which applies the resource-aware task allocation method for multi-target encirclement of unmanned swarms described above. The device includes:

[0128] The encirclement scenario modeling module 10 is used to model the relationship between intelligent agents and targets in unmanned swarm encirclement scenarios. This modeling uses the artificial potential field method to model the unmanned swarm and targets according to their functions and dependencies, forming a task execution process.

[0129] The initial resource allocation module 20 is used to summarize the resource experience base and complete the initial resource allocation in the encirclement and capture initiation phase by using a heuristic algorithm-based intelligent agent and target resource matching initialization method.

[0130] The resource dynamic adjustment and optimization module 30 is used to capture scene information and achieve dynamic adjustment and optimization of resources through reward and punishment learning by leveraging a reinforcement learning-based dynamic encirclement resource adaptive allocation optimization method.

[0131] This embodiment also discloses a resource-aware task allocation computer device for unmanned swarm multi-target encirclement and capture, including at least one processor, at least one memory and a data bus;

[0132] The processor and the memory communicate with each other via the data bus;

[0133] The memory stores program instructions that are executed by the processor, which invokes the program instructions to execute the resource-aware task allocation method for multi-target capture in unmanned swarms as described above.

[0134] This embodiment also discloses a medium on which a computer program is stored. When the computer program is executed by a processor, it implements the resource-aware task allocation method for unmanned swarm multi-target capture as described above.

[0135] It should be noted that the unmanned swarm multi-target capture resource-aware task allocation device, equipment, and medium of this embodiment correspond to the aforementioned unmanned swarm multi-target capture resource-aware task allocation method. Therefore, any content not specifically described in the unmanned swarm multi-target capture resource-aware task allocation device, equipment, and medium of this embodiment, including but not limited to functional definitions, working principles, and technical effects, can be referred to the description in the aforementioned unmanned swarm multi-target capture resource-aware task allocation method, and will not be repeated here.

[0136] In the embodiments provided in this application, it should be understood that the embodiments described herein can be implemented in hardware, software, firmware, middleware, code, or any suitable combination thereof. For hardware implementation, the processor may be implemented in one or more of the following: application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, other electronic units designed to implement the functions described herein, or combinations thereof. For software implementation, some or all of the processes of the embodiments may be performed by a computer program instructing the associated hardware. During implementation, the program may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer media and communication media, wherein communication media include any medium that facilitates the transmission of a computer program from one place to another. The medium may be any available medium that is accessible to a computer. Computer-readable media may include, but is not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, disk media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code having the form of instructions or data structures and accessible to a computer.

[0137] Finally, it should be noted that the above description is only a preferred embodiment of this application and is not intended to limit this application. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A method for multi-target hunting resource-aware task allocation of unmanned swarm, characterized in that the method includes:
S10: Modeling the relationship between intelligent agents and targets in unmanned swarm encirclement scenarios. This modeling uses the artificial potential field method to model the unmanned swarm and targets according to their functions and dependencies, forming a task execution process. The task execution process includes: designing the encirclement behavior logic of the unmanned swarm using a distributed collaborative method, and performing dynamic collaborative encirclement of the unmanned swarm through target tracking, collaborative encirclement, and state switching. The target tracking specifically involves: when the UAV obtains the estimated position of the target through its own detection or swarm communication sharing, generating a tracking velocity vector pointing towards the target, with the tracking velocity direction always pointing towards the target. The collaborative encirclement specifically involves: designing a short-range separation repulsion force to avoid collisions or excessive aggregation of drones within the swarm; the final control speed of the drone is the weighted sum of the tracking vector and the separation repulsion force, achieving a dynamic balance before and after the encirclement is completed. The state switching is specifically as follows: the behavior mode is dynamically adjusted based on the number of pursuers of the same target in the communication topology; if the current drone ranking is greater than the number of pursuers, the current target is abandoned and the search continues.
S20: Through a resource allocation initialization method based on heuristic algorithms for agents and targets, summarizing the resource experience base and completing the initial resource allocation in the encirclement and capture initiation phase. S20 specifically involves: summarizing resource allocation based on task requirements and environmental information, and using historical experience learned through training; this training includes: encirclement and capture task requirement analysis and constraint system construction, encirclement and capture task resource consumption cost design, and heuristic algorithm initialization of the resource allocation process.
S30: Utilizing a reinforcement learning-based dynamic encirclement resource adaptive allocation optimization method, scene information is captured, and dynamic adjustment and optimization of resources are achieved through reward and penalty learning. S30 includes: constructing a reinforcement learning environment model that fits the dynamic characteristics of the capture, clarifying the quantitative definition of the state and action spaces, and providing a foundation for online interaction; with capture efficiency, resource economy, and dynamic adaptability as the optimization orientation, designing a weighted collaborative reward function to balance immediate benefits and long-term goals; and constructing a dual-network structure consisting of a policy network and a value network, wherein the policy network is a fully connected network that outputs the probability distribution of the resource allocation adjustment after inputting the encirclement state, and the value network is also a fully connected network that outputs the state value after inputting the encirclement state, which is used to evaluate the long-term benefits of the current situation.

2. The resource-aware task allocation method for unmanned swarm multi-target encirclement and capture according to claim 1, characterized in that S30 specifically addresses the dynamic nature and non-global information constraints of the encirclement scenario by constructing a reinforcement learning-driven online resource allocation optimization framework, with real-time adaptation, dynamic optimization, and convergence guarantee as its core principles; through the interaction and iteration between the agent and the environment, the resource allocation scheme is continuously optimized.

3. The resource-aware task allocation method for unmanned swarm multi-target encirclement and capture according to claim 1, characterized in that, The encirclement mission requirements analysis and constraint system construction combines geometric distribution and dynamic characteristics to give a quantitative standard for judging the success of the encirclement, and gives two types of constraints based on the actual feasibility of the mission, including drone allocation constraints and success rate constraints. The resource consumption cost design for the encirclement mission aims to minimize the overall resource consumption of the mission, while avoiding the irrational approach of infinitely increasing the number of drones in pursuit of time efficiency, thus achieving efficient use of resources. The heuristic algorithm initializes the resource allocation process, outputting an initial resource allocation scheme that satisfies constraints and has the optimal overall cost through quantitative iterative search and multi-dimensional evaluation.

4. A resource-aware task allocation device for multi-target encirclement and capture of unmanned swarms, employing the resource-aware task allocation method for multi-target encirclement and capture of unmanned swarms as described in any one of claims 1 to 3, characterized in that, The device includes: The encirclement scenario modeling module is used to model the relationship between intelligent agents and targets in unmanned swarm encirclement scenarios. This modeling uses the artificial potential field method to model the unmanned swarm and targets according to their functions and dependencies, forming the task execution process. The initial resource allocation module is used to summarize the resource experience base and complete the initial resource allocation in the encirclement and capture initiation phase by using a heuristic algorithm-based intelligent agent and target resource ratio initialization method. The resource dynamic adjustment and optimization module is used to capture scene information and achieve dynamic adjustment and optimization of resources through reward and punishment learning by leveraging a reinforcement learning-based dynamic encirclement resource adaptive allocation optimization method.

5. A resource-aware task allocation computer device for unmanned swarm multi-target capture, characterized in that it includes at least one processor, at least one memory, and a data bus; the processor and the memory communicate with each other via the data bus; the memory stores program instructions that are executed by the processor, which invokes the program instructions to execute the resource-aware task allocation method for unmanned swarm multi-target encirclement as described in any one of claims 1 to 3.

6. A medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the resource-aware task allocation method for unmanned swarm multi-target encirclement as described in any one of claims 1 to 3.