Cooperative encirclement method for dynamic targets by unmanned aerial vehicles based on ground-based hierarchical reinforcement learning

Globally optimal capture points are generated on the ground through hierarchical reinforcement learning and a multi-agent optimization algorithm, and are combined with an advance-point prediction guidance mechanism and rolling time-domain updates. This addresses the robustness and reliability problems of multi-UAV collaborative capture of dynamic targets and achieves efficient, safe capture.

CN122308451A · Pending · Publication Date: 2026-06-30 · XIAN UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: XIAN UNIV OF POSTS & TELECOMM
Filing Date: 2026-05-08
Publication Date: 2026-06-30

Abstract

This invention discloses a multi-UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning, belonging to the field of UAV swarm cooperative control and intelligent decision-making technology. The method includes: acquiring state data of dynamic targets and UAVs; generating an initial encirclement point based on the state data; optimizing the initial encirclement point on the ground using a multi-agent proximal policy optimization (MAPPO) algorithm to obtain a globally optimal encirclement point; planning local guidance paths for the UAVs based on the globally optimal encirclement point and issuing them for execution; and cyclically adjusting the globally optimal encirclement point and local guidance paths, using a rolling time-domain dynamic update mechanism, based on changes in the state data monitored in real time. This invention decouples global cooperative decision-making from local flight control, reduces the onboard computing load, responds effectively to target maneuvering behavior, and improves the efficiency and safety of multi-UAV cooperative encirclement.

Description

Technical Field

[0001] This invention relates to the field of UAV swarm collaborative control and intelligent decision-making technology, specifically to a UAV dynamic target collaborative encirclement method based on ground-based hierarchical reinforcement learning.

Background Technology

[0002] With the deep integration of drone technology, wireless communication technology, and artificial intelligence technology, drone systems are increasingly being used in target surveillance, area containment, emergency response, and intelligent security. The core requirement of drone-based collaborative encirclement is to effectively surround a dynamic target through the spatial configuration planning and dynamic adjustment of multiple drones. However, in practical applications, the sudden maneuvering behavior of dynamic targets (such as sudden changes of direction, acceleration, and deceleration), as well as the obstacle avoidance requirements and communication latency of the drones themselves, can easily lead to the failure of the encirclement configuration and target escape, severely affecting the robustness and reliability of tracking and encirclement.

[0003] To effectively address the aforementioned problems, existing technologies typically employ methods based on geometric rules, control laws or artificial potential fields, and multi-agent cooperative strategies based on reinforcement learning. However, these methods suffer from low efficiency and reliability. Simply put, geometric rule-based methods can only adapt to uniform target motion, lagging behind in configuration adjustments when facing dynamic maneuvers; control law-based methods are prone to getting trapped in local optima, leading to gaps in the encirclement or causing drone collisions; and reinforcement learning-based methods often deploy strategies directly on the drone, which, limited by onboard computing and storage resources, not only results in unstable operation but also fails to decouple collaborative decision-making from flight control, leading to heavy communication burdens and significant engineering implementation difficulties. Furthermore, these traditional methods often lack dynamic update mechanisms in the rolling time domain, making it difficult to simultaneously meet the real-time requirements of global collaboration and local obstacle avoidance.

[0004] At present, no existing technology effectively solves all of the above problems.

[0005] The information disclosed in this background section is intended only to enhance the understanding of the overall background of the invention and should not be construed as an admission or in any way implying that the information constitutes prior art known to those skilled in the art.

Summary of the Invention

[0006] The purpose of this invention is to overcome the above-mentioned defects and provide a method for collaborative encirclement and capture of dynamic targets by UAVs based on ground-based hierarchical reinforcement learning.

[0007] To address the problems in the prior art, the technical solution of the present invention is as follows:

[0008] A method for collaborative encirclement of dynamic targets by unmanned aerial vehicles (UAVs) based on ground-based hierarchical reinforcement learning includes: acquiring state data of dynamic targets and UAVs within a target area using UAVs; generating initial encirclement points based on the acquired state data of the dynamic targets and the UAVs; optimizing the initial encirclement points on the ground using a preset multi-agent proximal policy optimization (MAPPO) algorithm to obtain a globally optimal encirclement point; planning local guidance paths for the UAVs based on the globally optimal encirclement point and issuing the local guidance paths to the corresponding UAVs for execution; and cyclically adjusting the globally optimal encirclement point and the local guidance paths using a preset dynamic update mechanism based on real-time monitoring of changes in the state data of the dynamic targets and the UAVs.

[0009] Optionally, the step of acquiring the status data of dynamic targets and the status data of drones within the target area through drones includes: the drone collecting first status information of the dynamic targets and second status information of the drone itself in real time through its onboard sensing devices; uploading the collected first status information and second status information to the ground terminal through a wireless communication link; and the ground terminal preprocessing the first status information and second status information to obtain the status data of the dynamic targets and the status data of all drones.

[0010] Optionally, the first state information includes at least the target position of the dynamic target, the speed of the dynamic target, the direction of the dynamic target, and the acceleration of the dynamic target; the second state information includes at least the current position of the UAV, the flight speed of the UAV, the remaining endurance of the UAV, and the flight attitude of the UAV.

[0011] Optionally, generating the corresponding initial encirclement point based on the acquired state data of the dynamic target and the state data of the UAV includes: using the current spatial position of the dynamic target as the geometric center, and combining the number of UAVs participating in the encirclement and the encirclement accuracy requirements, generating an initial encirclement configuration around the dynamic target according to a preset regular polygon geometric encirclement rule, and acquiring the spatial position of each node under the encirclement configuration; calculating the straight-line distance between each node, and determining whether the straight-line distance meets the preset minimum anti-collision safety limit; for nodes that do not meet the safety limit, fine-tuning their orientation and extending the distance based on a preset offset rule until all nodes meet the anti-collision constraints, thereby outputting an initial encirclement point with safety redundancy.

[0012] Optionally, the step in which the ground terminal uses a preset multi-agent proximal policy optimization (MAPPO) algorithm to optimize the initial capture points, based on the state data of the dynamic target and the state data of the UAV, to obtain the globally optimal capture points includes: using the relative position vector and relative velocity vector between the UAV and the dynamic target as inputs to the MAPPO algorithm; evaluating the current inputs using the MAPPO algorithm and outputting parameter adjustment amounts for the spatial position of each initial capture point, thereby obtaining the optimized initial capture points; scoring the optimized initial capture points based on a preset reward function; iteratively updating the network parameters of the MAPPO algorithm based on the scoring feedback; and finally outputting the capture points corresponding to the adjustment scheme with the highest comprehensive score as the globally optimal capture points.

[0013] Optionally, the parameter adjustment amounts include at least a polar radius adjustment amount and a polar angle adjustment amount, both of which are limited to within a preset safety threshold on abrupt changes of the capture point.
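The bounded adjustment described above can be illustrated with a minimal sketch. It assumes the policy emits one polar radius offset and one polar angle offset per capture point, and that the safety threshold is applied as a symmetric clamp; the function name `apply_adjustment` and the parameters `max_d_rho` and `max_d_theta` are hypothetical and do not come from the patent text.

```python
import math


def apply_adjustment(rho, theta, d_rho, d_theta, max_d_rho, max_d_theta):
    """Apply policy offsets to a capture point given in polar coordinates.

    Both offsets are clamped to symmetric safety thresholds (an assumed
    form of the "preset safety threshold") before they are applied.
    """
    d_rho = max(-max_d_rho, min(max_d_rho, d_rho))
    d_theta = max(-max_d_theta, min(max_d_theta, d_theta))
    # Wrap the angle so it stays in [0, 2*pi).
    return rho + d_rho, (theta + d_theta) % (2.0 * math.pi)


# A requested radius change of 5.0 is cut back to the 2.0 threshold.
new_rho, new_theta = apply_adjustment(10.0, 0.0, 5.0, 0.1, 2.0, 0.05)
```

Clamping before applying the offsets keeps any single planning cycle from moving a capture point farther than the UAV could reasonably follow.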

[0014] Optionally, the ground terminal plans a local guidance path for the UAV based on the globally optimal capture point, including: determining the advance guidance range based on the UAV's maximum flight speed and rolling planning cycle; generating a basic advance point set within the advance guidance range with the UAV's current position as the starting point and the corresponding globally optimal capture point as the target direction, according to a preset step size; performing collision detection on the basic advance point set in conjunction with environmental obstacle information and the local guidance paths of other UAVs, and fine-tuning the positions of advance points with potential risks, thereby generating a collision-free advance point set; and performing path search and optimization based on the collision-free advance point set as the base nodes to generate a local guidance path, the endpoint of which is a sub-target point within the advance guidance range.
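As a rough illustration of the basic advance point set, the sketch below assumes that the advance guidance range is the product of the maximum flight speed and the rolling planning cycle, and that the points are placed on the straight line toward the capture point. Neither assumption is stated in the text, and collision checking is left out. The name `lookahead_points` is hypothetical.

```python
import math


def lookahead_points(pos, goal, v_max, cycle, step):
    """Generate evenly spaced advance points from pos toward goal.

    The reachable range is assumed to be v_max * cycle; points never
    overshoot the goal itself.
    """
    dist = math.dist(pos, goal)
    if dist == 0.0:
        return []
    reach = min(v_max * cycle, dist)
    direction = [(g - p) / dist for p, g in zip(pos, goal)]
    count = int(reach // step)
    return [
        tuple(p + d * step * k for p, d in zip(pos, direction))
        for k in range(1, count + 1)
    ]


# 10 m/s for a 2 s cycle gives a 20 m range, sampled every 5 m.
points = lookahead_points((0.0, 0.0, 0.0), (100.0, 0.0, 0.0), 10.0, 2.0, 5.0)
```

The last point of the returned list plays the role of the sub-target point within the advance guidance range.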

[0015] Optionally, the step of performing path search and optimization based on the collision-free leading point set as the base nodes to generate the local guidance path includes: constructing a local path optimization objective function with the optimization objectives of shortest path length, least estimated flight time, and smoothest trajectory; using the local path optimization objective function to optimize the collision-free leading point set to obtain a preliminary local guidance path; performing constraint verification on the preliminary local guidance path to determine whether the distance between every node on the preliminary local guidance path and the obstacles is greater than a preset minimum obstacle avoidance distance, and whether its distance from the preliminary local guidance paths of the other UAVs is greater than a preset minimum safe distance between UAVs; if both constraints are satisfied, the preliminary local guidance path is confirmed as the final local guidance path; if not, the collision-free leading point set is readjusted and path planning is performed again.

[0016] Optionally, the local guidance path is sent to the corresponding UAV for execution, including: the ground terminal sends the planned local guidance path one-to-one to the UAV through a wireless communication module; after receiving the local guidance path, the flight control unit inside the UAV calculates the position deviation between the current position of the UAV and the path node based on the node information of the local guidance path; based on the position deviation, a PID controller is used to adjust the flight attitude and flight speed of the UAV so that the UAV flies along the local guidance path.
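The path-tracking step can be sketched with a textbook discrete PID loop acting on the position deviation along one axis. The gains, the time step, and the class name `PID` are illustrative assumptions; the patent does not give the controller's structure or tuning.

```python
class PID:
    """Minimal discrete PID controller for one axis of position deviation."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def update(self, error, dt):
        """Return the control command for the current deviation."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Deviation between the current position and the next path node, per cycle.
pid = PID(kp=2.0, ki=1.0, kd=0.5)
first_command = pid.update(error=1.0, dt=0.5)
second_command = pid.update(error=0.5, dt=0.5)
```

In practice one such controller would run per controlled axis on the onboard flight control unit, with the ground terminal supplying only the path nodes.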

[0017] Optionally, the step of adjusting the globally optimal capture point and the local guidance path cyclically based on the changes in the dynamic target state data and the UAV state data monitored in real time using a preset dynamic update mechanism includes: the ground terminal calculating the changes in the velocity and acceleration of the dynamic target within a preset monitoring time interval, and the flight deviation between the current position of the UAV and the planned position of the local guidance path, using data transmitted back in real time; determining whether any of the following triggering conditions are met: the change in state exceeds a preset state change threshold, the flight deviation exceeds a preset flight deviation threshold, or a preset rolling planning cycle is reached; when any of the above triggering conditions are met, a rolling time-domain update is triggered; after the update is triggered, the ground terminal re-collects the state data and sequentially repeats the operations of dynamically optimizing the global capture configuration and generating a new local guidance path, and each UAV adjusts its flight trajectory in real time according to the updated local guidance path until the capture task is completed.
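The three triggering conditions combine as a logical OR, which the following minimal sketch makes explicit. The function name `should_replan` and its parameters are hypothetical, and the thresholds are placeholders rather than values from the patent.

```python
def should_replan(state_change, flight_deviation, elapsed,
                  *, change_thr, deviation_thr, period):
    """Return True when any rolling-update trigger condition is met.

    state_change: change in target velocity or acceleration over the
        monitoring interval.
    flight_deviation: distance between the UAV and its planned position.
    elapsed: time since the last planning cycle started.
    """
    return (
        state_change > change_thr
        or flight_deviation > deviation_thr
        or elapsed >= period
    )


# Nothing exceeded and the cycle has not expired: keep the current plan.
keep_plan = not should_replan(0.1, 0.2, 0.5,
                              change_thr=1.0, deviation_thr=1.0, period=2.0)
```

Because the periodic condition is part of the same test, the plan is refreshed at least once per rolling planning cycle even when the target moves steadily.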

[0018] The beneficial effects of this application are as follows:

[0019] This invention constructs a hierarchical collaborative architecture with centralized planning on the ground and distributed execution on the UAVs. This architecture deploys complex global encirclement and capture configuration decision-making tasks on the ground, allowing each UAV to perform only low-level local path tracking control. This decoupling of decision-making and control effectively reduces dependence on UAV onboard computing power and communication bandwidth, avoiding the risk of computing overload caused by UAVs simultaneously undertaking high-dimensional computation and low-level control. This improves the stability and engineering feasibility of the multi-UAV collaborative system in practical operation.

[0020] Meanwhile, this invention dynamically optimizes the initial encirclement point by introducing a pre-trained multi-agent reinforcement learning strategy on the ground. Using the relative state between the UAV and the target as input, it comprehensively evaluates multiple objectives, including the integrity of the encirclement configuration, collision avoidance, and approach efficiency. Compared to traditional encirclement methods based on fixed geometric rules, this mechanism can flexibly output adjustment amounts for the encirclement point according to the real-time motion state of the dynamic target. This results in a globally optimal encirclement point with stronger environmental adaptability, effectively improving the reliability of the encirclement configuration when facing highly maneuverable targets.

[0021] Furthermore, this invention employs a forward-prediction guidance mechanism. By considering the distribution of environmental obstacles and inter-drone safety distance constraints along the expected direction for the drones to reach the globally optimal capture point, it plans collision-free short-term local guidance paths for each drone, replacing the traditional direct-flight target point control mode. This local path guidance mechanism enables drones to achieve segmented, smooth flight as they approach the target, effectively mitigating the collision risks caused by sudden environmental changes or trajectory intersections, and improving safety during swarm collaborative flight.

[0022] Finally, this invention constructs a dynamic update mechanism based on a rolling time domain to continuously monitor the state mutations of dynamic targets and the deviations from the UAV's flight trajectory during the encirclement and capture process. When the aforementioned monitoring parameters reach preset trigger thresholds or reach preset cycles, the system can promptly trigger the cyclical update planning of the global encirclement configuration and local guidance path. This closed-loop monitoring and conditional triggering mechanism enables the entire collaborative system to respond in real time to the target's sudden evasive behavior and quickly complete strategy reset, reducing the probability of dynamic targets escaping and ensuring the continuous and effective encirclement of targets.

Description of the Attached Figures

[0023] Figure 1 is the first flowchart of the UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning provided in an embodiment of the present invention; Figure 2 is the second flowchart of the method; Figure 3 is a diagram of the overall system structure of the method; Figure 4 is a schematic diagram of the working principle of the global encirclement configuration decision module of the method; Figure 5 shows the test results of the method; Figure 6 is the third flowchart of the method.

Detailed Implementation

[0024] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only for explaining the invention and are not intended to limit the invention; that is, the described embodiments are merely some embodiments of the invention, and not all embodiments. The components of the embodiments of the invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.

[0025] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.

[0026] It should be noted that relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0027] As mentioned earlier, with the deep integration of drone technology, wireless communication technology, and artificial intelligence technology, multi-drone systems are increasingly widely used in civilian and industrial fields such as target surveillance, area containment, emergency response, and intelligent security due to their advantages of flexible deployment and collaborative operation. Multi-drone collaborative encirclement, as one of the core tasks of swarm operations, requires the spatial configuration planning and dynamic adjustment of multiple drones to effectively surround a dynamic target, restrict its movement space, prevent its escape, and ultimately efficiently complete the encirclement mission. However, in practical applications, the sudden changes in direction, acceleration, and deceleration of dynamic targets, as well as factors such as obstacle avoidance requirements and communication latency between drones, can easily lead to the failure of the encirclement configuration and target escape, seriously affecting the robustness and reliability of tracking and encirclement. In recent years, multi-drone encirclement technology has developed rapidly, giving rise to a variety of classic methods.

[0028] However, existing technologies still have limitations: Firstly, they do not clearly define the deployment location of decision-making logic. If reinforcement learning strategies are directly deployed on the UAV using the traditional approach, the computing power and storage resources of the UAV will be limited, resulting in slow strategy convergence and unstable operation. Secondly, existing technologies do not decouple collaborative decision-making from flight control. The UAV must simultaneously undertake decision-making and control tasks, which not only increases the communication burden but also poses safety risks associated with autonomous decision-making, making engineering implementation difficult. In addition, other existing capture technologies also have their own shortcomings: methods based on geometric rules are computationally simple but cannot adapt to the dynamic maneuvering of the target; methods based on control laws or artificial potential fields are highly dependent on parameters and are prone to getting trapped in local optima, leading to gaps in the capture or UAV collisions.

[0029] To address this issue, this invention proposes a multi-UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning, which solves the above problems in the following way.

[0030] Example 1:

[0031] Referring to Figure 1, this embodiment provides a method for cooperative encirclement of dynamic targets by UAVs based on ground-based hierarchical reinforcement learning. The execution subject may be a controller, and the method mainly includes the following steps:

[0032] S100: Acquire status data of dynamic targets within the target area and status data of the drone through the drone;

[0033] S200: Based on the acquired state data of the dynamic target and the state data of the UAV, generate the corresponding initial capture point;

[0034] S300: The ground end uses a preset multi-agent proximal policy optimization (MAPPO) algorithm to optimize the initial capture point based on the state data of the dynamic target and the state data of the UAV, so as to obtain the globally optimal capture point;

[0035] S400: Based on the globally optimal capture point, the ground terminal plans a local guidance path for the UAV and sends the local guidance path to the corresponding UAV for execution; and

[0036] S500: Based on the changes in the dynamic target status data and the UAV status data monitored in real time, the globally optimal capture point and the local guidance path are cyclically adjusted using a preset dynamic update mechanism.

[0037] Based on the above steps, this invention achieves decoupling of global encirclement configuration decision-making and local flight path control by constructing a hierarchical collaborative guidance and control architecture centrally deployed on the ground, reducing the computing load on the UAV and improving system operational safety; it designs a forward prediction guidance mechanism to replace the traditional direct flight encirclement point control mode, generating short-term local guidance paths in real time on the ground to solve the problems of dynamic obstacle avoidance lag and trajectory abrupt changes; it builds a two-layer optimization structure to achieve global encirclement configuration optimization through multi-agent reinforcement learning and local dynamic obstacle avoidance optimization through path planning, while constructing a rolling time-domain dynamic update mechanism to respond to changes in target motion state in real time; ultimately achieving efficient, reliable, and safe encirclement of dynamic targets, improving the success rate of encirclement and the efficiency of collaboration.

[0038] Example 2:

[0039] Based on the above embodiment, and to explain its technical solutions more clearly and completely, the present invention also provides a second embodiment, whose execution subject may likewise be a controller. As shown in Figures 2 to 6, in this second embodiment, step S100 may further include:

[0040] S110. The UAV collects the first state information of the dynamic target and the second state information of the UAV itself in real time through its onboard sensing devices.

[0041] The first state information includes at least the target position, the speed of the dynamic target, and the direction of the dynamic target's movement. The acceleration of the dynamic target cannot be measured directly and instantaneously by conventional sensing devices. A stable acceleration estimate is therefore obtained by numerically differentiating the speed measured at consecutive time instants and smoothing the result with a Kalman filter state observer.
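One way to picture this estimate is the simplified sketch below: speed samples are differenced, and the raw derivative is smoothed with a scalar Kalman filter that models acceleration as a random walk. The filter structure, the noise parameters `q` and `r`, the weak initial prior, and the name `estimate_acceleration` are all assumptions made for illustration; the patent does not specify the observer's state model.

```python
def estimate_acceleration(speeds, dt, q=0.5, r=4.0):
    """Estimate acceleration from uniformly sampled speed measurements.

    Each finite difference of speed is treated as a noisy acceleration
    measurement and fused by a scalar Kalman filter (random-walk model).
    q: assumed process noise variance, r: assumed measurement noise variance.
    """
    a_est, p = 0.0, 1e3  # weak prior: start at zero with large uncertainty
    estimates = []
    for v_prev, v_next in zip(speeds, speeds[1:]):
        z = (v_next - v_prev) / dt   # raw numerical derivative
        p += q                       # predict: uncertainty grows
        gain = p / (p + r)           # Kalman gain
        a_est += gain * (z - a_est)  # update with the new measurement
        p *= 1.0 - gain
        estimates.append(a_est)
    return estimates


# A target accelerating at a constant 2 m/s^2, sampled every 0.1 s.
speeds = [0.2 * k for k in range(21)]
accel = estimate_acceleration(speeds, dt=0.1)
```

A larger `r` relative to `q` smooths more aggressively but reacts more slowly to a sudden maneuver, so the ratio would need tuning against the target type.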

[0042] The second status information includes at least the drone's current location, the drone's flight speed, the drone's remaining battery life, and the drone's flight attitude;

[0043] In this second embodiment, step S100 further includes:

[0044] S120. The collected first status information and second status information are uploaded to the ground terminal through a wireless communication link;

[0045] S130: The ground terminal preprocesses the first state information and the second state information to obtain the state data of the dynamic target and the state data of all UAVs.

[0046] In a preferred embodiment of this second embodiment, steps S110 to S130 may further include the following steps:

[0047] In a multi-drone swarm, once any drone detects a dynamic target using its onboard sensing devices (e.g., visual sensors, radar), it collects the target's first state information in real time, including the target's position (coordinate values in a three-dimensional coordinate system), the velocity of the dynamic target, the direction angle of motion of the dynamic target, and the acceleration of the dynamic target. Simultaneously, each of the n drones (n being the number of drones) collects its own second state information, including its current position, its flight speed, its remaining battery life, and its flight attitude angle.

[0048] Subsequently, the multi-UAV swarm uploads all the collected information to the ground via wireless communication links. The ground terminal preprocesses the information: first, it verifies the validity of the data, removes abnormal jump values, and completes missing data using linear interpolation; then, it normalizes the valid data; at the same time, it records the total latency of data transmission and algorithm calculation, predicts the target position within the latency interval based on the dynamic target's motion state, and obtains the latency-compensated predicted target state, ultimately forming dynamic target state data and UAV state data that can be used for decision-making.
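The latency compensation step can be sketched as a short extrapolation of the target state over the recorded delay. The constant-acceleration motion model used here is an assumption, since the text only says the prediction is based on the target's motion state, and the name `compensate_latency` is hypothetical.

```python
def compensate_latency(pos, vel, acc, latency):
    """Predict the target position after `latency` seconds.

    Uses a constant-acceleration model per axis:
    p + v * t + 0.5 * a * t**2.
    """
    return tuple(
        p + v * latency + 0.5 * a * latency ** 2
        for p, v, a in zip(pos, vel, acc)
    )


# Target moving at 10 m/s along x and accelerating at 2 m/s^2, 0.5 s delay.
predicted = compensate_latency((0.0, 0.0, 0.0), (10.0, 0.0, 0.0),
                               (2.0, 0.0, 0.0), 0.5)
```

The `latency` argument would be the sum of the transmission delay and the algorithm computation time that the ground terminal records.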

[0049] Among them, the position, velocity, and acceleration thresholds of dynamic targets are preset based on the physical limits of the sensing equipment and the maximum motion capability of the target: the position threshold is determined by the maximum detection range of the radar / visual sensor; the velocity and acceleration thresholds are determined by the upper limit of the typical physical motion of the target type, which are engineering-determinable prior boundaries.

[0050] To prevent sudden target maneuvers from causing parameter overruns and triggering network input anomalies, the system truncates data exceeding a preset threshold for protection before performing normalization. The normalization formula is as follows:

[0051] $$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

[0052] In the formula, $X_{\text{norm}}$ is the normalized parameter value, $X$ is the original parameter value, and $X_{\max}$ and $X_{\min}$ are the preset maximum and minimum thresholds for the parameter, ensuring that all input parameters lie in the [0,1] range and improving the algorithm's convergence efficiency.
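The truncate-then-normalize step described above can be sketched in a few lines, assuming standard min-max scaling between the preset thresholds. The function name `normalize` is hypothetical.

```python
def normalize(x, x_min, x_max):
    """Clip x to [x_min, x_max], then map it linearly onto [0, 1]."""
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    clipped = min(max(x, x_min), x_max)  # truncation protects the network input
    return (clipped - x_min) / (x_max - x_min)


# A speed of 15 with thresholds [0, 10] is truncated, then maps to 1.0.
scaled = normalize(15.0, 0.0, 10.0)
```

Clipping first guarantees the output never leaves [0, 1], even when a sudden maneuver pushes a raw value past its threshold.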

[0053] For example:

[0054] A two-layer collaborative encirclement and capture planning system is constructed, consisting of a centralized decision-making layer on the ground and a distributed execution layer on the UAV. The ground end deploys a status information acquisition and preprocessing module, a global encirclement configuration decision-making module, a local advanced path guidance module, and a rolling time-domain dynamic update module. These modules work together to optimize the encirclement configuration and plan local paths. The UAV is equipped with an airborne perception and acquisition unit and an airborne flight control unit. The perception unit includes visual sensors and radar for acquiring the status of the target and itself, while the control unit has a built-in PID controller for path tracking and flight attitude adjustment.

[0055] At the same time, the system's core parameters are calibrated: the number of drones, the maximum flight speed, the average flight speed, the minimum safe distance between drones, the minimum obstacle avoidance distance, the preset capture radius, the encirclement radius tolerance, the rolling planning cycle, the advanced guidance range, and the number of advanced points;

[0056] Subsequently, the four UAVs used their onboard sensing and acquisition units to detect dynamic targets on the ground in real time, collecting the three-dimensional position coordinates, speed, and direction angle of the targets, and estimating acceleration (first state information) by velocity differentiation and Kalman filtering. Simultaneously, they collected the three-dimensional position coordinates, flight speed, and flight attitude angle of their own UAVs (second state information). Each UAV generated a high-precision timestamp with GPS timing at the time of collection, and uploaded the state information and timestamp together to the ground terminal via a wireless communication link. The ground terminal performed time alignment and linear interpolation calibration on the asynchronously uploaded data based on the timestamps of each UAV, eliminating the problem of state asynchrony caused by wireless transmission delay and multiple UAVs being asynchronous, and ensuring the consistency of global decision input.

[0057] The ground-based status information acquisition and preprocessing module verifies the validity of the received data.

[0058] Outliers were removed using the 3σ Laida criterion, and jump data that deviated from the mean by more than 3 times the standard deviation were identified as outliers and removed.

[0059] The missing data lost during transmission is filled using linear interpolation, and the missing values ​​are obtained by fitting the effective data from adjacent time points.

[0060] After outliers are removed and missing values are filled, the data are normalized according to the normalization formula so that all parameters are mapped to the [0,1] interval. This completes data preprocessing and yields the state data of the dynamic target and of all UAVs, providing accurate input for subsequent encirclement planning.
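The three preprocessing steps above can be sketched for a single scalar channel as follows. The normalization formula itself is not reproduced in the text, so min-max scaling is used here as an assumption because it maps data to [0,1]; all function names are illustrative.

```python
from statistics import mean, pstdev

def remove_outliers_3sigma(samples):
    """Mark samples farther than 3 standard deviations from the mean as None."""
    mu, sigma = mean(samples), pstdev(samples)
    return [x if abs(x - mu) <= 3 * sigma else None for x in samples]

def fill_missing_linear(samples):
    """Fill None entries by linear interpolation between the nearest valid neighbours."""
    valid = [i for i, x in enumerate(samples) if x is not None]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    filled = list(samples)
    for i, x in enumerate(samples):
        if x is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:            # gap at the start: copy the first valid value
            filled[i] = samples[right]
        elif right is None:         # gap at the end: copy the last valid value
            filled[i] = samples[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = samples[left] + w * (samples[right] - samples[left])
    return filled

def normalize_min_max(samples):
    """Map samples to [0, 1] (assumed min-max form of the normalization)."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0 for _ in samples]
    return [(x - lo) / (hi - lo) for x in samples]
```

Note that with very short windows a single spike cannot exceed three standard deviations, so the 3σ test only becomes meaningful once the window holds enough samples.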

[0061] In this second embodiment, step S200 may further include:

[0062] S210. Using the current spatial position of the dynamic target as the geometric center, and combining the number of drones participating in the encirclement and the encirclement accuracy requirements, an initial encirclement configuration is generated around the dynamic target according to the preset regular polygon geometric encirclement rules, and the spatial position of each node under the encirclement configuration is obtained.

[0063] S220. Calculate the straight-line distance between each node and determine whether the straight-line distance meets the preset minimum anti-collision safety limit.

[0064] S230. For nodes that do not meet the safety limit, the orientation is finely adjusted and the spacing is extended based on a preset offset rule.

[0065] Specifically, the polar angle of each non-compliant capture point is finely adjusted along the tangential direction of the capture circle, with a fixed step of 2° per adjustment and a total adjustment of no more than 10°. The small angular offsets gradually increase the distance between UAVs. After each adjustment, the node spacing is recalculated until all nodes meet the anti-collision constraint, and the initial capture points with safety redundancy are output.
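The fine-tuning loop can be sketched as below. The text fixes the 2° step and the 10° cap but does not say which point is moved or in which direction, so this sketch assumes that later points are shifted counter-clockwise away from earlier ones; the function names are illustrative.

```python
import math

STEP_DEG = 2.0        # fixed adjustment step from the text
MAX_TOTAL_DEG = 10.0  # cap on the total adjustment of one point

def chord(radius, a_deg, b_deg):
    """Straight-line distance between two points on the same circle."""
    return 2.0 * radius * abs(math.sin(math.radians(a_deg - b_deg) / 2.0))

def adjust_polar_angles(angles_deg, radius, d_safe):
    """Shift violating points in 2-degree steps until spacing is safe or the cap is hit."""
    adjusted = list(angles_deg)
    for i in range(1, len(adjusted)):
        shifted = 0.0
        while shifted < MAX_TOTAL_DEG and any(
            chord(radius, adjusted[i], adjusted[j]) < d_safe for j in range(i)
        ):
            adjusted[i] += STEP_DEG
            shifted += STEP_DEG
    return adjusted

def all_spacing_ok(angles_deg, radius, d_safe):
    """True when every pair of points is at least d_safe apart."""
    return all(
        chord(radius, a, b) >= d_safe
        for k, a in enumerate(angles_deg)
        for b in angles_deg[k + 1:]
    )
```

If the cap is reached before the spacing is safe, `all_spacing_ok` still returns False, which signals that an angular offset alone cannot resolve the conflict.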

[0066] In a preferred embodiment of this second embodiment, steps S210 to S230 may further include the following steps:

[0067] The ground end takes the current position of the preprocessed dynamic target as the encirclement center and, considering the number of UAVs n and the required encirclement accuracy, generates the initial encirclement configuration according to the preset regular-polygon geometric encirclement rule. A polar coordinate system is established with the target as the pole. For each initial encirclement point, the polar radius is the preset fixed capture radius and the height is consistent with the target altitude. The initial polar angle of the k-th UAV is calculated by the following formula:

[0068]

[0069] Subsequently, the initial capture point in polar coordinates is converted to Cartesian coordinates using the following conversion formula:

[0070]

[0071] The generated initial set of capture points must satisfy the minimum safe distance constraint between UAVs, that is, the distance between any two capture points must be no less than the preset safe distance, so that the initial configuration has no risk of collision.

[0072] For example:

[0073] At the ground end, with the current position of the preprocessed target as the encirclement center, an initial encirclement configuration is generated according to the geometric encirclement rule of a regular quadrilateral. A polar coordinate system is established with the encirclement center as the pole, and the initial polar angles of the four UAVs are calculated by the formula as 0°, 90°, 180°, and 270°, respectively. The polar radius is set to the preset capture radius.

[0074] The capture points are converted to planar coordinates using the polar-to-Cartesian transformation formula. A safety distance check is then performed on the generated initial set of capture points, and the polar angles of any capture points that fail the check are slightly adjusted, so that a collision-free initial set of capture points is finally generated.
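The regular-polygon construction in this example can be sketched as follows. The polar-angle formula is not reproduced in the text; the even spacing 2πk/n (k = 0, ..., n-1) is an assumption chosen because it gives 0°, 90°, 180°, and 270° for four UAVs. The 10 m radius used in the usage note is a hypothetical value.

```python
import math

def initial_capture_points(target_xyz, n_uavs, radius):
    """Place n_uavs points evenly on a circle around the target, at the target altitude."""
    tx, ty, tz = target_xyz
    points = []
    for k in range(n_uavs):
        theta = 2.0 * math.pi * k / n_uavs      # assumed even angular spacing
        points.append((tx + radius * math.cos(theta),
                       ty + radius * math.sin(theta),
                       tz))
    return points

def min_pairwise_distance(points):
    """Smallest straight-line distance between any two capture points."""
    return min(math.dist(p, q)
               for i, p in enumerate(points)
               for q in points[i + 1:])
```

For example, `initial_capture_points((0.0, 0.0, 5.0), 4, 10.0)` places the four points on the axes around the target, and `min_pairwise_distance` can then be compared against the preset safe distance.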

[0075] In this second embodiment, step S300 may further include:

[0076] S310. The relative position vector and relative velocity vector between the UAV and the dynamic target are used as inputs to the multi-agent proximal policy optimization algorithm.

[0077] S320. The multi-agent proximal policy optimization algorithm is used to evaluate the current input quantity and output the parameter adjustment amount for the spatial position of each initial capture point, thereby obtaining the optimized initial capture point.

[0078] The parameter adjustment amounts include at least a polar radius adjustment amount and a polar angle adjustment amount, both of which are limited to a preset safety threshold that prevents abrupt changes of the capture point.

[0079] In this second embodiment, step S300 further includes:

[0080] S330. The optimized initial capture point is scored based on a preset reward function;

[0081] S340. Based on the scoring feedback, iteratively update the network parameters of the multi-agent proximal policy optimization algorithm, and finally output the capture points corresponding to the adjustment scheme with the highest comprehensive score as the globally optimal capture points.

[0082] In a preferred embodiment of this second embodiment, steps S310 to S340 may further include the following steps:

[0083] A pre-trained multi-agent proximal policy optimization (MAPPO) algorithm is deployed at the ground end. Taking the relative state between each UAV and the dynamic target as input, it outputs an adjustment to each encirclement point, thereby dynamically optimizing the initial encirclement points.

[0084] First, the state input S of the algorithm consists of the relative position vector and relative velocity vector of each UAV relative to the dynamic target, specifically expressed as:

[0085]

[0086] Here, the two components are the relative position and the relative velocity of each UAV with respect to the dynamic target.

[0087] Subsequently, the algorithm's action output A represents the parameter adjustments for each initial capture point, including the polar radius adjustment amount and the polar angle adjustment amount, namely:

[0088]

[0089] The adjusted capture point (in polar coordinates) is:

[0090]

[0091] Here, the polar radius and polar angle adjustment amounts are bounded by preset maximum adjustment amounts to avoid abrupt changes of the capture point.
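A minimal sketch of the bounded adjustment, assuming the adjusted point is obtained by adding the clamped adjustment amounts to the initial polar radius and polar angle. The function names and the wrap of the angle into [0°, 360°) are illustrative choices, not details from the text.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))

def apply_adjustment(radius, theta_deg, d_radius, d_theta_deg,
                     max_d_radius, max_d_theta_deg):
    """Apply bounded polar adjustments to one capture point."""
    d_radius = clamp(d_radius, -max_d_radius, max_d_radius)
    d_theta_deg = clamp(d_theta_deg, -max_d_theta_deg, max_d_theta_deg)
    new_radius = radius + d_radius
    new_theta = (theta_deg + d_theta_deg) % 360.0  # keep the angle in [0, 360)
    return new_radius, new_theta
```

With bounds of 1 m and 10°, a requested change of 3 m and 15° is reduced to 1 m and 10°, so a single policy output cannot move a capture point abruptly.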

[0092] Subsequently, the reward function R adopts a multi-objective weighted summation form, comprehensively considering the integrity of the encirclement configuration, the drone obstacle avoidance, and the target encirclement effect, as shown in the following formula:

[0093]

[0094] Encirclement configuration integrity reward: when the relative distances between all UAVs and the target lie within the specified interval, the reward takes one preset value; otherwise it takes another. The interval is defined by the baseline capture radius and the maximum permissible deviation, which are determined based on the UAV's maximum flight speed, the target maneuver response safety time, and the minimum safety distance. Based on the target's maximum acceleration, maximum velocity, and total system time delay, the system adjusts the interval parameters dynamically in real time according to the target's motion state, so that the method adapts to the environment and to dynamic targets with different maneuverability.

[0095] Obstacle avoidance reward: if the distances between all UAVs satisfy the minimum safe distance requirement, the reward takes one preset value; otherwise it takes another.

[0096] Distance reward: the average distance between all UAVs and their corresponding capture points is calculated, and the reward value is computed from this average relative to a preset maximum average distance. The preset maximum average distance is derived from the maximum flight speed of the UAV, the maximum escape speed of the target, and the total system delay τ. This reward encourages the UAVs to reach the capture points quickly.

[0097] The reward weights w1, w2, and w3 satisfy a preset constraint (the initial values below sum to 1) and can be adjusted according to the actual task requirements:

[0098] 1) Initial weight determination: The Analytic Hierarchy Process (AHP) is used to determine the weights based on the task requirements. Among them, w1 (encirclement configuration integrity weight) corresponds to the core objective of the encirclement mission and is initially set to 0.5; w2 (obstacle avoidance reward weight) corresponds to the UAV flight safety constraints and is initially set to 0.3; w3 (distance reward weight) corresponds to the encirclement efficiency objective and is initially set to 0.2.

[0099] 2) Dynamic adaptive adjustment: The system dynamically adjusts the weights according to the real-time task status: when the target is highly maneuvering and escaping, w1 and w3 are increased to enhance the encirclement configuration and response speed; when the distance between UAVs is close to the safety threshold, w2 is increased to prioritize flight safety.

[0100] 3) Weight constraints: During the adjustment process, w1≥0.4 and w2≥0.2 are always satisfied to ensure that the priority of the encirclement mission and the bottom line of flight safety are not breached.
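The weighted summation and the weight constraints above can be sketched as follows. The text does not state how the weights are renormalized after an adjustment, so the rule below (reserve the floors w1 ≥ 0.4 and w2 ≥ 0.2, then share the remaining mass in proportion to each weight's excess over its floor, keeping the sum at 1) is an assumption for illustration only.

```python
FLOORS = (0.4, 0.2, 0.0)  # w1 >= 0.4 and w2 >= 0.2 from the text; w3 has no floor

def combined_reward(r_formation, r_avoid, r_distance, weights):
    """Multi-objective weighted sum of the three reward terms."""
    w1, w2, w3 = weights
    return w1 * r_formation + w2 * r_avoid + w3 * r_distance

def renormalize_weights(raw):
    """Project raw (possibly boosted) weights onto sum == 1 while keeping the floors."""
    excess = [max(w - f, 0.0) for w, f in zip(raw, FLOORS)]
    free_mass = 1.0 - sum(FLOORS)
    total = sum(excess)
    if total == 0.0:
        share = [free_mass / len(raw)] * len(raw)
    else:
        share = [free_mass * e / total for e in excess]
    return tuple(f + s for f, s in zip(FLOORS, share))
```

Under this rule the initial weights (0.5, 0.3, 0.2) are returned unchanged, while boosting w2 when UAVs approach the safety threshold raises the obstacle avoidance weight without letting w1 fall below 0.4.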

[0101] Finally, the multi-agent proximal policy optimization algorithm uses a policy network to output the action distribution and a value network to estimate the state value. The core update formulas are as follows:

[0102]

[0103]

[0104]

[0105] In the formulas, the old parameters are the policy network parameters from the previous iteration. The advantage function estimate is obtained by generalized advantage estimation (GAE), calculated as follows:

[0106]

[0107] Here, the symbols denote, in order, the temporal-difference error, the reward discount factor, the GAE decay parameter, and the state value estimate output by the value network. The clipping factor (default 0.2) limits the policy update magnitude and prevents training divergence or abrupt policy changes caused by an excessively large single iteration step. The target value, i.e., the target estimate of the state value, is calculated from the GAE result and is used to update the value network parameters and minimize the value estimation error. The loss also includes a policy entropy term. The coefficients of the value loss and of the entropy regularization are tuned adaptively through simulation experiments, starting from preset initial value ranges; they balance the weights of the value estimation error and the policy entropy in the loss function. By minimizing the total loss, the algorithm converges stably and optimizes efficiently: the network parameters are updated to optimize the capture points dynamically, and the optimal set of capture points is output.
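Since the update formulas are not reproduced in the text, the following sketch uses the standard PPO clipped objective and standard GAE recursion as a stand-in. The default values of `gamma`, `lam`, `c1`, and `c2` are hypothetical; only the clipping factor 0.2 comes from the text.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation.

    values holds len(rewards) + 1 entries; the last one is the bootstrap value.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_term(ratio, advantage, eps=0.2):
    """Clipped surrogate term for one sample (ratio = new prob / old prob)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

def ppo_total_loss(ratios, advantages, values, returns, entropies,
                   c1=0.5, c2=0.01, eps=0.2):
    """Total loss: -policy objective + c1 * value loss - c2 * entropy."""
    n = len(ratios)
    policy_obj = sum(ppo_clip_term(r, a, eps)
                     for r, a in zip(ratios, advantages)) / n
    value_loss = sum((v - g) ** 2 for v, g in zip(values, returns)) / n
    entropy = sum(entropies) / n
    return -policy_obj + c1 * value_loss - c2 * entropy
```

The clipping means that once the probability ratio leaves [1 - eps, 1 + eps] in the direction that would increase the objective, the sample stops contributing gradient, which is what bounds the size of a single policy update.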

[0108] For example:

[0109] The ground end invokes the global encirclement configuration decision module and, for the multi-UAV encirclement task, constructs a centralized reinforcement learning network architecture based on the MAPPO concept, which meets the requirement for high-dimensional feature processing at the ground end:

[0110] Regarding state input, a global state vector containing the information of all UAVs and the target is constructed from the relative positions and velocities of the UAVs with respect to the dynamic target, the relative distances between the UAVs, and the target acceleration estimate (N is the number of UAVs participating in the capture). To eliminate differences in dimension and scale, relative coordinate transformation is combined with scale normalization to map each state variable to the interval [-1,1]; the variables are then uniformly encoded and input to the neural network.

[0111] In terms of network structure design, the Actor policy network adopts a hybrid structure of "Multilayer Perceptron (MLP) + Gated Recurrent Unit (GRU)". The MLP part contains two fully connected hidden layers, each with 256 neurons, used to extract spatial static features; the GRU layer has 128 neurons to model the temporal dynamic characteristics of target motion, adapting to highly maneuverable target encirclement scenarios; each hidden layer uses the ReLU activation function, and the output layer uses the tanh activation function, constraining the adjustment amount within a preset range to avoid abrupt changes at the encirclement point. The Critic network adopts a centralized value function structure, using the global state and joint actions as inputs to achieve overall value evaluation of multi-UAV cooperative encirclement configurations, which is consistent with the core architecture of MAPPO's centralized training and distributed execution, rather than simply applying a general algorithm.

[0112] Based on the customized network described above, the ground end uses the relative position and relative velocity between each UAV and the dynamic target as state inputs. The pre-trained multi-agent reinforcement learning algorithm outputs the polar radius and polar angle adjustment amounts for each capture point, the parameters of the capture point are adjusted according to the adjustment formula, and the adjustment amounts are limited to avoid abrupt changes of the capture point.

[0113] The configuration is evaluated with the multi-objective reward function, and the network parameters are iteratively updated using the policy loss and the value loss. The final output is a set of globally optimal capture points that adapts to the target's motion state, forming a global capture configuration without gaps or collisions.

[0114] In this second embodiment, step S400 may further include:

[0115] S410. The ground terminal determines the advanced guidance range based on the maximum flight speed and rolling planning cycle of the UAV;

[0116] S420. Starting from the current position of the UAV and taking the corresponding global optimal capture point as the target direction, generate a basic advanced point set within the advanced guidance range according to a preset step size.

[0117] S430. Combining environmental obstacle information with the local guidance paths of other UAVs, collision detection is performed on the basic leading point set, and the positions of leading points with risks are fine-tuned to generate a collision-free leading point set.

[0118] Step S430 further includes:

[0119] S431. Construct a local path optimization objective function with the optimization objectives of shortest path length, least estimated flight time, and smoothest trajectory.

[0120] S432. Optimize the collision-free leading point set using the local path optimization objective function to obtain a preliminary local guidance path;

[0121] S433. Perform constraint verification on the preliminary local guidance path: determine whether the distance between every node on the preliminary local guidance path and the obstacles is greater than the preset minimum obstacle avoidance distance, and whether its distance to the preliminary local guidance paths of other UAVs is greater than the preset minimum safe distance between UAVs.

[0122] S434. If the above dual constraints are satisfied, the preliminary local guidance path is confirmed as the final local guidance path; if not, the collision-free leading point set is readjusted and path planning is performed again.

[0123] In this second embodiment, step S400 further includes:

[0124] S440. Using the set of collision-free leading points as the base nodes, perform path search and optimization to generate a local guiding path. The endpoint of the local guiding path is a sub-target point within the leading guidance range.

[0125] S450. The ground terminal transmits the planned local guidance path to the UAV one-to-one via the wireless communication module;

[0126] S460. After receiving the local guidance path, the flight control unit inside the UAV calculates the positional deviation between the current position of the UAV and the path nodes based on the node information of the local guidance path.

[0127] S470. Based on the position deviation, a PID controller is used to adjust the flight attitude and flight speed of the UAV, so that the UAV flies along the local guidance path.

[0128] In a preferred embodiment of this second embodiment, steps S410 to S470 may further include the following steps:

[0129] Based on the output globally optimal capture points, the ground end constructs a local advanced path guidance module, designs an advanced point prediction guidance mechanism, and uses a rolling time-domain planning method to generate short-term local guidance paths for each UAV in real time, replacing the traditional control mode in which UAVs fly directly to the capture point. Details are as follows:

[0130] The ground end determines the advanced guidance range from the maximum flight speed of the UAV and the rolling planning cycle. In other words, the maximum length of the local guidance path is the maximum distance that the UAV can fly within one planning cycle, which ensures that the planned path conforms to the physical flight constraints of the UAV and is feasible.

[0131] Starting from the UAV's current position and taking the globally optimal capture point as the final target direction, the ground end generates a basic advance point set along the optimal movement direction from the UAV to the capture point, within the advance guidance range and with a dynamically calculated step size. The step size is calculated dynamically from the maximum flight speed of the UAV, the rolling planning cycle, and the advance guidance range, by the formula:

[0132]

[0133] Here n is the number of leading points. The step size represents the maximum safe flight distance of the UAV along the target direction in a single planning cycle, which ensures that the leading point set matches the actual motion capability of the UAV: an excessively large step size would cause abrupt path changes, and an excessively small one would add redundant computation. At the same time, collision detection is performed on the basic leading point set using environmental obstacle information (uploaded from UAV perception or obtained from global perception at the ground end) and the local guidance paths of other UAVs. If a leading point lies in an obstacle area, or its distance to the path of another UAV is less than the safe distance, its position is fine-tuned. The result is a collision-free leading point set, which provides the basic nodes for generating local guidance paths.
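The generation of the basic leading point set can be sketched in two dimensions as below. The step-size formula is not reproduced in the text; this sketch assumes the guidance range is the maximum speed times the planning cycle and the step is that range divided by the number of leading points, which matches the worked example elsewhere in the description (5 m range, 10 nodes, 0.5 m step). The function name is illustrative.

```python
import math

def lead_points(position, capture_point, v_max, cycle, n_points):
    """Generate evenly spaced leading points from the UAV toward its capture point."""
    guidance_range = v_max * cycle          # farthest distance flyable in one cycle
    step = guidance_range / n_points        # assumed step-size rule
    dx = capture_point[0] - position[0]
    dy = capture_point[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return []                           # already at the capture point
    ux, uy = dx / dist, dy / dist           # unit vector toward the capture point
    reach = min(guidance_range, dist)       # never overshoot the capture point
    count = int(reach / step + 1e-9)
    return [(position[0] + ux * step * i, position[1] + uy * step * i)
            for i in range(1, count + 1)]
```

With a maximum speed of 5 m/s and a 1 s cycle, a UAV 20 m from its capture point gets ten leading points spaced 0.5 m apart, the last of which lies 5 m ahead.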

[0134] On the ground side, using a set of collision-free leading points as nodes, and with the optimization objectives of minimizing path length, smoothest trajectory, and least flight time, a local path optimization objective function is constructed:

[0135]

[0136] In the formula, the three quantities are the length of the local guidance path, the estimated flight time (computed from the average flight speed of the UAV), and the total turning angle of the path, which characterizes the smoothness of the trajectory. The total turning angle is calculated as the sum of the absolute values of the changes in heading angle between adjacent nodes of the path, i.e.:

[0137]

[0138] Here, the heading angle of the i-th node on the path is calculated from the coordinate differences between adjacent nodes. The smaller the total turning angle, the higher the trajectory smoothness.

[0139] The optimization weights satisfy a preset constraint, and their values are determined objectively using the Analytic Hierarchy Process (AHP) combined with the requirements of UAV capture missions: path length, flight time, and trajectory smoothness serve as the criteria layer, the judgment matrix is constructed through expert scoring, and the weight coefficients are obtained after consistency verification. The system can adjust the weights dynamically and adaptively according to the real-time mission scenario to balance path performance and flight safety.
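The multi-objective path cost can be sketched as follows. The objective function itself is not reproduced in the text, so a weighted sum of path length, estimated flight time, and total turning angle is assumed here, with flight time taken as path length divided by average speed. The weight values passed in are hypothetical and would come from the AHP step.

```python
import math

def path_length(path):
    """Sum of straight-line segment lengths along the path."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

def total_turning_angle(path):
    """Sum of absolute heading changes (radians) between consecutive segments."""
    headings = [math.atan2(q[1] - p[1], q[0] - p[0])
                for p, q in zip(path, path[1:])]
    total = 0.0
    for h0, h1 in zip(headings, headings[1:]):
        diff = abs(h1 - h0)
        total += min(diff, 2.0 * math.pi - diff)  # handle wrap-around at +/- pi
    return total

def path_cost(path, v_avg, weights):
    """Assumed weighted sum of length, flight time, and smoothness terms."""
    w1, w2, w3 = weights
    length = path_length(path)
    flight_time = length / v_avg
    return w1 * length + w2 * flight_time + w3 * total_turning_angle(path)
```

A straight path has zero turning angle, so between two paths of similar length the straighter one receives the lower cost.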

[0140] The ground end uses an improved A* path planning algorithm to search and optimize a path over the collision-free leading point set. The search starts at the UAV's current position, ends at the sub-target point, uses the multi-objective optimization function as the evaluation criterion, and looks for an optimal path satisfying the constraints within the advance guidance range, generating a short-term local guidance path. The start of the path is the current position of the UAV, and its endpoint is the sub-target point in the direction of the capture point (within the advance guidance range). Because the endpoint is the sub-target point rather than the global capture point, the UAV approaches in segments under advance guidance.
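The text does not describe what the improvement to A* consists of, so the sketch below is a plain textbook A* on an 8-connected grid with a Euclidean heuristic and path length as the only cost. The grid representation and the function name are assumptions made for illustration; it is not the improved variant.

```python
import heapq
import math

def a_star(start, goal, blocked, width, height):
    """Plain grid A*; returns the list of cells from start to goal, or None."""
    open_heap = [(math.dist(start, goal), 0.0, start)]
    came_from = {start: None}
    best_g = {start: 0.0}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:        # walk parents back to the start
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if g > best_g[node]:
            continue                       # stale heap entry
        x, y = node
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nxt = (x + dx, y + dy)
                if (dx, dy) == (0, 0) or nxt in blocked:
                    continue
                if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                    continue
                new_g = g + math.hypot(dx, dy)
                if new_g < best_g.get(nxt, math.inf):
                    best_g[nxt] = new_g
                    came_from[nxt] = node
                    heapq.heappush(
                        open_heap, (new_g + math.dist(nxt, goal), new_g, nxt))
    return None
```

To follow the description more closely, the step cost `math.hypot(dx, dy)` could be replaced by an increment of the multi-objective function, at the price of needing a heuristic that stays admissible for that cost.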

[0141] The generated local guidance path must then satisfy a dual constraint: the distance between every node on the path and any obstacle must be greater than the minimum obstacle avoidance distance, and the distance to the local guidance paths of other UAVs must be greater than the minimum safe distance between UAVs. If the constraints are not met, the leading point set is readjusted and the path is replanned, ensuring the safety of the local guidance path.
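The dual constraint check can be sketched as below. For brevity, obstacles are represented by their center points and other UAVs' paths by their node lists; both representations and the function name are simplifying assumptions.

```python
import math

def path_is_safe(path, obstacles, other_paths, d_obstacle, d_uav):
    """Check both safety constraints for every node of a candidate path."""
    clear_of_obstacles = all(
        math.dist(node, obs) > d_obstacle
        for node in path for obs in obstacles
    )
    clear_of_uavs = all(
        math.dist(node, other_node) > d_uav
        for node in path
        for other in other_paths
        for other_node in other
    )
    return clear_of_obstacles and clear_of_uavs
```

A path that fails either check would be sent back for readjustment of the leading point set and replanning.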

[0142] The ground terminal sends local guidance paths one-to-one to each UAV through a wireless communication module. The UAV terminal only undertakes distributed execution tasks and does not need to perform complex decision-making and planning calculations, thereby decoupling global collaborative decision-making from local flight control and reducing the computing power and communication load of the UAV terminal.

[0143] After receiving the local guidance path, the flight control unit adjusts the flight attitude and speed using a proportional-integral-derivative (PID) controller based on the path node information to achieve precise tracking of the local guidance path. The PID control law formula is as follows:

[0144]

[0145] In the formula, the control output is the servo angle and throttle opening. The deviation between the UAV's current position and the path node is obtained as follows: taking the UAV's current position as the reference, the Euclidean distance to the nearest node on the path is calculated and combined with the heading angle deviation to construct a composite deviation that includes both position and attitude, namely:

[0146]

[0147] Here λ is the attitude deviation weighting coefficient, used to balance position tracking and attitude stability. The proportional, integral, and derivative coefficients are obtained as follows. First, initial parameters are obtained by the Ziegler-Nichols offline tuning method. Then, during flight, the parameters are adjusted dynamically by a fuzzy adaptive PID algorithm: taking the position deviation and its rate of change as inputs, the coefficients are corrected in real time through fuzzy rule inference. This adapts the controller to the highly dynamic, strongly coupled characteristics of the UAV and avoids the control instability that fixed parameters can cause under different flight conditions. Through PID control, the UAV flies smoothly and accurately along the local guidance path and gradually approaches the global capture point.
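A discrete-time sketch of the tracking law follows. Neither the PID formula nor the composite deviation formula is reproduced in the text, so the textbook PID form and a composite deviation equal to the distance to the nearest node plus λ times the absolute heading error are assumed. The gains are fixed here; the fuzzy adaptive correction of the gains is not modelled.

```python
import math

class PID:
    """Textbook discrete PID controller with fixed gains."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0               # no derivative on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def composite_error(position, path, heading_error, lam):
    """Distance to the nearest path node plus weighted heading deviation (assumed form)."""
    nearest = min(path, key=lambda node: math.dist(position, node))
    return math.dist(position, nearest) + lam * abs(heading_error)
```

In use, `composite_error` would be evaluated at each control step and fed to `PID.update`, whose output would drive the servo and throttle commands.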

[0148] For example:

[0149] The ground end activates the local advanced path guidance module and, according to the formula, determines the advance guidance range to be 5 m. Starting from the current position of each UAV and taking the globally optimal capture point as the target direction, a basic advance point set containing 10 nodes is generated with a step size of 0.5 m. Collision detection is performed on the basic advance point set using ground obstacle information. Each ground obstacle is modeled as an expanded cylindrical safety envelope whose axis passes through the obstacle center and whose radius is the sum of the obstacle radius and the UAV's safe distance. Collision detection is completed with an inequality that compares the Euclidean distance between the coordinates of the i-th leading point and the coordinates of the center of the j-th obstacle against the safety envelope radius. A leading point that lies in an obstacle area or violates the safe distance is translated outward along the normal vector of the obstacle surface in steps of 0.1 m until it meets the safe distance constraint, which yields a collision-free leading point set.
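The envelope test and the outward translation can be sketched in the horizontal plane as follows. The inequality is not reproduced in the text; this sketch assumes a point is safe when its distance to the obstacle center is at least the obstacle radius plus the safe distance. The iteration limit and the fallback direction for a point exactly at the center are illustrative choices.

```python
import math

def push_out_of_obstacles(point, obstacles, d_safe, step=0.1, max_iter=1000):
    """Move a leading point outward in 0.1 m steps until it clears every envelope.

    obstacles is a list of (center_x, center_y, radius) tuples.
    """
    x, y = point
    for _ in range(max_iter):
        moved = False
        for cx, cy, radius in obstacles:
            envelope = radius + d_safe              # expanded safety radius
            dist = math.hypot(x - cx, y - cy)
            if dist >= envelope:
                continue
            if dist == 0.0:
                nx, ny = 1.0, 0.0                   # arbitrary direction at the center
            else:
                nx, ny = (x - cx) / dist, (y - cy) / dist
            x += nx * step                          # translate along the outward normal
            y += ny * step
            moved = True
        if not moved:
            return (x, y)
    raise RuntimeError("could not clear the obstacles within max_iter steps")
```

A point that is already outside every envelope is returned unchanged, so the function can be applied to the whole basic leading point set.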

[0150] An objective function is constructed with path length, flight time, and trajectory smoothness as optimization objectives. A short-term local guidance path is generated through a path planning algorithm, with the path endpoint being a sub-target point 5m away from the capture point. The generated path is then validated a second time: the change in heading angle between adjacent nodes is calculated, and the maximum turning angle is constrained to not exceed the maximum maneuvering turning angle threshold of the UAV (taken as 30°) to avoid generating sharp-angle turn trajectories. At the same time, it is ensured that all nodes meet obstacle avoidance and safe distance constraints, thus forming an executable local guidance path.

[0151] The ground terminal sends the local guidance path of each UAV to the corresponding airborne flight control unit through the wireless communication module. The UAV only performs path tracking tasks and does not participate in any decision-making or planning.

[0152] Based on the node information of the local guidance path, the airborne flight control unit calculates the control output using the PID control law and adjusts the servo angle and throttle in real time to correct the UAV's position deviation. The UAV thus flies smoothly along the local guidance path toward the sub-target point, gradually approaching the global capture point.

[0153] In this second embodiment, step S500 may further include:

[0154] S510. The ground terminal calculates the velocity and acceleration state changes of the dynamic target within a preset monitoring time interval, as well as the flight deviation between the current position of the UAV and the planned position of the local guidance path, based on the data transmitted back in real time.

[0155] S520. Determine whether any of the following triggering conditions are met: the state change exceeds a preset state change threshold, the flight deviation exceeds a preset flight deviation threshold, or a preset rolling planning cycle is reached; when it is determined that any of the above triggering conditions are met, trigger the rolling time domain update.

[0156] S530. After the update is triggered, the ground terminal re-collects the status data and sequentially repeats the operations of dynamically optimizing the global encirclement configuration and generating new local guidance paths. Each UAV adjusts its flight trajectory in real time according to the updated local guidance path until the encirclement mission is completed.

[0157] In a preferred embodiment of this second embodiment, steps S510 to S530 may further include the following steps:

[0158] A rolling time-domain dynamic update mechanism is established on the ground end. By using the real-time data transmitted by the UAV regarding its own status, target status, and environmental obstacle information, the mechanism continuously monitors changes in the target's motion state and the UAV's flight status. This enables dynamic and cyclical updates of the encirclement configuration and local guidance path, ensuring the continuous and effective encirclement of dynamic targets. Specifically:

[0159] The ground end defines the target state change and the UAV flight state deviation, calculated as follows:

[0160]

[0161]

[0162] In the formulas, the target state change is built from the changes in velocity and acceleration of the target within the monitoring time interval, and the flight state deviation is built from the actual position of the UAV and the planned position on the local guidance path.

[0163] Ground-based preset state change thresholds and flight deviation thresholds are obtained through offline tuning based on the UAV's maximum maneuverability, target motion characteristics, and allowable error of the capture mission. Specifically, the speed change threshold is set to 10%–20% of the target's maximum cruising speed, the acceleration change threshold is set to 15%–25% of the target's maximum maneuvering acceleration, and the flight deviation threshold is the sum of the UAV's positioning error and path tracking steady-state error. The rolling planning cycle is determined based on a combination of airborne communication latency and ground-based computation latency. A rolling update is triggered when any of the following conditions are met:

[0164] Condition 1: The target state change exceeds the preset state change threshold. In this case the target's motion state is judged to have changed significantly (for example, a change of direction, acceleration, or deceleration).

[0165] Condition 2: The flight state deviation of the UAV exceeds the preset flight deviation threshold. In this case the UAV's flight path is judged to have deviated from the planned path.

[0166] Condition 3: The preset rolling planning cycle is reached. This enables periodic, routine updates.
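The threshold setting and the three trigger conditions can be sketched as follows. The formulas for the state change and the flight deviation are not reproduced in the text, so velocity change and acceleration change are checked against separate thresholds here, which is an assumption. The default fractions are hypothetical picks from the stated 10%–20% and 15%–25% ranges.

```python
def build_thresholds(v_target_max, a_target_max, positioning_err, tracking_err,
                     v_fraction=0.15, a_fraction=0.20):
    """Derive the three thresholds from the ranges given in the text."""
    if not 0.10 <= v_fraction <= 0.20:
        raise ValueError("v_fraction must lie in [0.10, 0.20]")
    if not 0.15 <= a_fraction <= 0.25:
        raise ValueError("a_fraction must lie in [0.15, 0.25]")
    return {
        "dv": v_fraction * v_target_max,        # speed change threshold
        "da": a_fraction * a_target_max,        # acceleration change threshold
        "dev": positioning_err + tracking_err,  # flight deviation threshold
    }

def trigger_reasons(dv, da, flight_dev, elapsed, thresholds, cycle):
    """Return the list of trigger conditions that currently hold."""
    reasons = []
    if dv > thresholds["dv"] or da > thresholds["da"]:
        reasons.append("target_state_change")
    if flight_dev > thresholds["dev"]:
        reasons.append("flight_deviation")
    if elapsed >= cycle:
        reasons.append("cycle_elapsed")
    return reasons
```

A rolling update would be started whenever `trigger_reasons` returns a non-empty list; returning the reasons rather than a bare boolean makes it easy to log why each replanning happened.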

[0167] When an update is triggered, the ground terminal immediately repeats steps S100 to S400 to re-collect and preprocess the status information, dynamically optimize the global encirclement configuration, generate a new local guidance path and send it to the UAV. The UAV adjusts its flight trajectory in real time according to the new local guidance path to achieve dynamic adaptation between the encirclement configuration and the local path.

[0168] Through a dynamic update mechanism in the rolling time domain, the ground end continuously optimizes both the global encirclement configuration and the local guidance path. The UAV continues to fly along the updated local guidance path, gradually approaching the global encirclement point and forming a continuous and effective encirclement of the dynamic target until the encirclement mission is completed.

[0169] For example:

[0170] The ground-based rolling time-domain dynamic update module receives the flight state and target state information transmitted back by the UAVs in real time at a monitoring interval of 0.5 seconds, and calculates the target state change and the UAV flight deviation according to the formulas.

[0171] When any update trigger condition is met (the target state change exceeds its threshold, the flight deviation exceeds its threshold, or the 1-second rolling planning cycle is reached), a rolling update is triggered immediately: steps S100 to S400 are repeated to re-collect and preprocess the information, optimize the global capture configuration, and generate and distribute new local guidance paths.
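The timing of this example (0.5 s monitoring interval, 1 s rolling planning cycle) can be sketched with integer monitoring ticks. Whether the cycle timer restarts after a threshold-triggered update is not stated in the text; this sketch assumes it does. The function name and the representation of threshold events as a set of tick indices are illustrative.

```python
def replan_times(event_ticks, horizon_ticks, tick_s=0.5, cycle_ticks=2):
    """Return the times (s) at which a rolling update fires.

    event_ticks holds the monitoring ticks at which a threshold condition fires.
    A periodic update fires once cycle_ticks have passed since the last update.
    """
    times = []
    last_update = 0
    for tick in range(1, horizon_ticks + 1):
        if tick in event_ticks or tick - last_update >= cycle_ticks:
            times.append(tick * tick_s)
            last_update = tick              # assumed: any update restarts the cycle
    return times
```

With no threshold events over 3 s the updates are purely periodic, one per second; a threshold event at 1.5 s adds an extra update and shifts the following periodic ones.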

[0172] After receiving the new path, the drones adjust their flight trajectories in real time. The ground station continuously performs two-layer optimization and path updates until the four drones effectively surround the dynamic target. After completing the encirclement mission, the ground station issues a termination command, and the drones execute subsequent actions according to preset rules.

[0173] It should be understood that, in the embodiments of the present invention, "B corresponding to A" means that B is associated with A, and B can be determined based on A. However, it should also be understood that determining B based on A does not mean that B is determined solely based on A; B can also be determined based on A and / or other information.

[0174] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0175] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0176] In the embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, or they may be electrical, mechanical, or other forms of connection.

[0177] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of the embodiments of the present invention, depending on actual needs.

[0178] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0179] From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented in hardware, software, firmware, or a combination thereof. When implemented in software, the above-described functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, wherein communication media include any medium that facilitates the transmission of a computer program from one place to another. Storage media can be any available medium accessible to a computer. By way of example and not limitation, computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible to a computer. Furthermore, any connection can properly be termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium. As used in this invention, disk and disc include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, wherein disks typically reproduce data magnetically, while discs reproduce data optically using lasers. Combinations of the above should also be included within the scope of protection for computer-readable media.

[0180] In summary, the above description is merely a preferred embodiment of the technical solution of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for cooperative encirclement of dynamic targets by unmanned aerial vehicles (UAVs) based on ground-based hierarchical reinforcement learning, characterized in that the method includes: acquiring, by means of the UAVs, state data of a dynamic target within the target area and state data of the UAVs themselves; generating a corresponding initial capture point based on the acquired state data of the dynamic target and the state data of the UAVs; optimizing, at the ground end, the initial capture point using a preset multi-agent proximal policy optimization algorithm based on the state data of the dynamic target and the state data of the UAVs, so as to obtain a globally optimal capture point; planning, at the ground end, a local guidance path for each UAV based on the globally optimal capture point and distributing the local guidance path to the corresponding UAV for execution; and cyclically adjusting the globally optimal capture point and the local guidance path using a preset dynamic update mechanism, based on changes in the state data of the dynamic target and the state data of the UAVs monitored in real time.

2. The UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning according to claim 1, characterized in that acquiring, by means of the UAVs, the state data of the dynamic target within the target area and the state data of the UAVs themselves includes: collecting, by each UAV in real time through its onboard sensing devices, first state information of the dynamic target and second state information of the UAV itself; uploading the collected first and second state information to the ground end via a wireless communication link; and preprocessing, at the ground end, the first and second state information to obtain the state data of the dynamic target and the state data of all UAVs.

3. The UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning according to claim 2, characterized in that the first state information includes at least the position, speed, direction, and acceleration of the dynamic target; and the second state information includes at least the current position, flight speed, remaining battery life, and flight attitude of the UAV.

4. The UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning according to claim 1, characterized in that generating the corresponding initial capture point based on the acquired state data of the dynamic target and the state data of the UAVs includes: taking the current spatial position of the dynamic target as the geometric center and, in combination with the number of UAVs participating in the encirclement and the encirclement accuracy requirements, generating an initial encirclement configuration around the dynamic target according to preset regular-polygon geometric encirclement rules, and obtaining the spatial position of each node of the encirclement configuration; calculating the straight-line distance between the nodes and determining whether the straight-line distance meets a preset minimum collision-avoidance safety limit; and, for nodes that do not meet the safety limit, finely adjusting their orientation and extending their distance according to preset offset rules until all nodes satisfy the collision-avoidance constraint, thereby outputting initial capture points with safety redundancy.
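The node placement in claim 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `initial_capture_points` and the parameters `radius`, `min_sep`, and `radius_step` are hypothetical, the geometry is 2-D, and the "preset offset rules" are simplified to widening the whole polygon uniformly (the claim also allows orientation adjustments, which are omitted here).

```python
import math

def initial_capture_points(target_xy, n_uavs, radius, min_sep,
                           radius_step=0.5, max_iter=100):
    """Place n_uavs nodes on a regular polygon centred on the target.

    Simplifying assumption: if any pair of nodes is closer than min_sep,
    the polygon radius is extended by radius_step and the check repeats.
    """
    tx, ty = target_xy
    r = radius
    for _ in range(max_iter):
        # Vertices of a regular n-gon with circumradius r around the target.
        pts = [(tx + r * math.cos(2 * math.pi * k / n_uavs),
                ty + r * math.sin(2 * math.pi * k / n_uavs))
               for k in range(n_uavs)]
        # Pairwise straight-line distances against the safety limit.
        safe = all(math.dist(pts[i], pts[j]) >= min_sep
                   for i in range(n_uavs) for j in range(i + 1, n_uavs))
        if safe:
            return pts
        r += radius_step  # extend the distance and try again
    raise ValueError("no safe configuration found within max_iter")
```

For four UAVs the configuration is a square, so a call such as `initial_capture_points((0.0, 0.0), 4, radius=1.0, min_sep=3.0)` keeps widening the square until its side length reaches the 3 m separation.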

5. The UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning according to claim 1, characterized in that optimizing, at the ground end, the initial capture point using the preset multi-agent proximal policy optimization algorithm based on the state data of the dynamic target and the state data of the UAVs, so as to obtain the globally optimal capture point, includes: taking the relative position vector and relative velocity vector between each UAV and the dynamic target as inputs to the multi-agent proximal policy optimization algorithm; evaluating the current inputs with the multi-agent proximal policy optimization algorithm and outputting a parameter adjustment amount for the spatial position of each initial capture point, thereby obtaining optimized capture points; scoring the optimized capture points based on a preset reward function; iteratively updating the network parameters of the multi-agent proximal policy optimization algorithm based on the scoring feedback; and finally outputting the capture points corresponding to the adjustment scheme with the highest comprehensive score as the globally optimal capture points.
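The claim does not state the loss used when the network parameters are updated. For orientation only, the clipped surrogate objective of standard proximal policy optimization can be sketched as below; whether the claimed method uses exactly this form, and the value `clip_eps=0.2`, are assumptions. The function name `ppo_clipped_objective` is hypothetical.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximised).

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates, one per sample
    clip_eps:   clipping range (assumed value, not given in the source)
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)
```

The clipping limits how far a single update can move the policy, which matches the role the claim gives to iterative, feedback-driven parameter updates.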

6. The UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning according to claim 5, characterized in that the parameter adjustment amounts include at least a polar radius adjustment amount and a polar angle adjustment amount, and both the polar radius adjustment amount and the polar angle adjustment amount are limited to within a preset safety threshold on abrupt changes of the capture point.
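One way to read the limit in claim 6 is as a clamp applied to each adjustment before the capture point is moved in polar coordinates about the target. The sketch below assumes that reading; `apply_polar_adjustment`, `max_d_radius`, and `max_d_angle` are hypothetical names, and the geometry is 2-D.

```python
import math

def apply_polar_adjustment(target_xy, point_xy, d_radius, d_angle,
                           max_d_radius, max_d_angle):
    """Shift a capture point in polar coordinates centred on the target.

    Both adjustments are clamped to symmetric limits first, so the capture
    point cannot jump abruptly between updates (assumed interpretation).
    """
    d_radius = max(-max_d_radius, min(max_d_radius, d_radius))
    d_angle = max(-max_d_angle, min(max_d_angle, d_angle))
    dx = point_xy[0] - target_xy[0]
    dy = point_xy[1] - target_xy[1]
    r = max(math.hypot(dx, dy) + d_radius, 0.0)  # radius cannot go negative
    theta = math.atan2(dy, dx) + d_angle
    return (target_xy[0] + r * math.cos(theta),
            target_xy[1] + r * math.sin(theta))
```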

7. The UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning according to claim 1, characterized in that planning, at the ground end, the local guidance path for the UAV based on the globally optimal capture point includes: determining, at the ground end, a leading guidance range based on the maximum flight speed of the UAV and the rolling planning cycle; starting from the current position of the UAV and taking the corresponding globally optimal capture point as the target direction, generating a basic leading point set within the leading guidance range at a preset step size; performing collision detection on the basic leading point set in combination with environmental obstacle information and the local guidance paths of the other UAVs, and fine-tuning the positions of leading points that pose a risk to generate a collision-free leading point set; and performing path search and optimization with the collision-free leading point set as base nodes to generate the local guidance path, wherein the endpoint of the local guidance path is a sub-target point within the leading guidance range.
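The first two steps of claim 7 can be illustrated with a short sketch. It assumes the leading guidance range is the product of the maximum flight speed and the rolling planning cycle (the claim only says the range is based on both), works in 2-D, and leaves out collision detection entirely. The name `basic_leading_points` and its parameters are hypothetical.

```python
import math

def basic_leading_points(pos, capture_point, v_max, cycle_s, step):
    """Generate leading points from pos toward capture_point.

    Assumption: leading guidance range = v_max * cycle_s. Points are spaced
    `step` apart and never placed beyond the range or the capture point.
    """
    lead_range = v_max * cycle_s
    dx = capture_point[0] - pos[0]
    dy = capture_point[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return []  # already at the capture point
    ux, uy = dx / dist, dy / dist        # unit vector toward the capture point
    reach = min(lead_range, dist)
    n = int(reach / step + 1e-9)         # small epsilon guards float round-off
    return [(pos[0] + ux * step * k, pos[1] + uy * step * k)
            for k in range(1, n + 1)]
```

With a 10 m/s maximum speed and the 1-second cycle described in the embodiment, the points cover at most 10 m ahead of the UAV, so the last point acts as the sub-target for that cycle.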

8. The UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning according to claim 7, characterized in that performing path search and optimization with the collision-free leading point set as base nodes to generate the local guidance path includes: constructing a local path optimization objective function with the objectives of the shortest path length, the shortest estimated flight time, and the smoothest trajectory; optimizing the collision-free leading point set with the local path optimization objective function to obtain a preliminary local guidance path; performing constraint verification on the preliminary local guidance path to determine whether the distance between every node on the preliminary local guidance path and the obstacles is greater than a preset minimum obstacle avoidance distance, and whether its distance from the local guidance paths of the other UAVs is greater than a preset minimum safe distance between UAVs; and, if both constraints are met, confirming the preliminary local guidance path as the final local guidance path, or, if not, readjusting the collision-free leading point set and performing path planning again.
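The objective and the dual-constraint check in claim 8 can be sketched as below. The weighted-sum form, the weights, the constant `cruise_speed`, the use of summed turning angles as the smoothness term, and point-shaped obstacles are all assumptions made for illustration; the claim fixes only the three objectives and the two distance checks. `path_cost` and `meets_constraints` are hypothetical names.

```python
import math

def path_cost(path, cruise_speed, w_len=1.0, w_time=1.0, w_smooth=1.0):
    """Weighted sum of path length, estimated flight time and total turning."""
    length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    flight_time = length / cruise_speed  # assumes constant cruise speed
    turning = 0.0
    for a, b, c in zip(path, path[1:], path[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        # Wrap the heading change into [-pi, pi) before accumulating.
        turning += abs((h2 - h1 + math.pi) % (2 * math.pi) - math.pi)
    return w_len * length + w_time * flight_time + w_smooth * turning

def meets_constraints(path, obstacles, other_paths,
                      min_obstacle_dist, min_uav_dist):
    """Check every node against obstacle points and other UAVs' path nodes."""
    clear_of_obstacles = all(math.dist(p, o) > min_obstacle_dist
                             for p in path for o in obstacles)
    clear_of_uavs = all(math.dist(p, q) > min_uav_dist
                        for other in other_paths for p in path for q in other)
    return clear_of_obstacles and clear_of_uavs
```

A candidate path would be accepted only when `meets_constraints` returns `True`; otherwise the leading point set is readjusted and the search repeats, as the claim describes.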

9. The UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning according to claim 1, characterized in that distributing the local guidance path to the corresponding UAV for execution includes: transmitting, by the ground end, each planned local guidance path to its corresponding UAV on a one-to-one basis via a wireless communication module; after the local guidance path is received, calculating, by the flight control unit inside the UAV, the positional deviation between the current position of the UAV and the path nodes based on the node information of the local guidance path; and adjusting the flight attitude and speed of the UAV with a PID controller based on the positional deviation, so that the UAV flies along the local guidance path.
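The tracking step in claim 9 relies on a PID controller acting on positional deviation. A minimal textbook PID is sketched below; the gains, the per-axis arrangement, and the choice of a velocity command as output are assumptions (the claim also covers attitude, which is not modelled here). `PID` and `velocity_command` are hypothetical names.

```python
class PID:
    """Textbook discrete PID controller on a scalar error signal."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        # No derivative term on the first sample (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def velocity_command(pid_x, pid_y, position, waypoint, dt):
    """Turn the positional deviation to a path node into a 2-D velocity command."""
    ex = waypoint[0] - position[0]
    ey = waypoint[1] - position[1]
    return pid_x.update(ex, dt), pid_y.update(ey, dt)
```

Using one controller per axis keeps the sketch simple; a real flight controller would typically cascade position, velocity, and attitude loops.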

10. The UAV dynamic target cooperative encirclement method based on ground-based hierarchical reinforcement learning according to claim 1, characterized in that cyclically adjusting the globally optimal capture point and the local guidance path using the preset dynamic update mechanism, based on changes in the state data of the dynamic target and the state data of the UAVs monitored in real time, includes: calculating, at the ground end and based on the data transmitted back in real time, the changes in the velocity and acceleration of the dynamic target within a preset monitoring time interval, as well as the flight deviation between the current position of each UAV and its planned position on the local guidance path; determining whether any of the following trigger conditions is met: the state change exceeds a preset state change threshold, the flight deviation exceeds a preset flight deviation threshold, or a preset rolling planning cycle is reached; triggering a rolling time-domain update when any of the trigger conditions is met; after the update is triggered, re-collecting the state data at the ground end and sequentially repeating the operations of dynamically optimizing the global encirclement configuration and generating new local guidance paths; and adjusting, by each UAV in real time, its flight trajectory according to the updated local guidance path until the encirclement mission is completed.
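The trigger test in claim 10 reduces to a logical OR over three conditions. The sketch below assumes separate thresholds for the velocity and acceleration changes (the claim speaks of a single "state change threshold") and uses the 1-second cycle mentioned in the embodiment as a default; `should_trigger_update` and all parameter names are hypothetical.

```python
def should_trigger_update(speed_change, accel_change, flight_deviation, elapsed_s,
                          *, speed_threshold, accel_threshold,
                          deviation_threshold, cycle_s=1.0):
    """Return True when any rolling-update trigger condition holds."""
    # Condition 1: the target's state changed more than allowed.
    state_exceeded = (abs(speed_change) > speed_threshold
                      or abs(accel_change) > accel_threshold)
    # Condition 2: a UAV drifted too far from its planned position.
    deviation_exceeded = flight_deviation > deviation_threshold
    # Condition 3: the rolling planning cycle has elapsed.
    cycle_reached = elapsed_s >= cycle_s
    return state_exceeded or deviation_exceeded or cycle_reached
```

When the function returns `True`, the ground end would re-collect state data and rerun the optimization and path planning steps.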