A method and apparatus for multi-target point tracking and obstacle avoidance of unmanned aerial vehicles (UAVs)
By employing a multi-target point tracking obstacle avoidance method, utilizing deep reinforcement learning models and LiDAR data, the problem of local optimization in UAV obstacle avoidance was solved, achieving the effect of accurately reaching the destination in complex environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SUN YAT SEN UNIV
- Filing Date
- 2023-12-26
- Publication Date
- 2026-06-30
AI Technical Summary
Existing low-computational-cost path planning algorithms based on DRL tend to overemphasize local optimization in UAV obstacle avoidance, resulting in failure to reach the target destination.
A multi-target point tracking obstacle avoidance method is adopted. By combining LiDAR data and UAV status information with a deep reinforcement learning model, action commands are predicted to guide the UAV to avoid obstacles and ensure that it reaches the end of the path.
With limited computing resources, the system enables UAVs to make high-frequency decisions in complex environments, accurately avoid obstacles, and reach the target destination, thereby improving obstacle avoidance success rate and efficiency.
Figure CN117742366B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of unmanned aerial vehicle (UAV) technology, and more specifically, to a method and apparatus for multi-target point tracking and obstacle avoidance of UAVs.
Background Technology
[0002] With the continuous development of drone technology, drones are increasingly widely used in fields such as security, agriculture, logistics, and environmental protection. Drones are generally equipped with onboard sensors such as binocular cameras and lidar to perceive their surroundings and make action decisions based on this information, avoiding obstacles to reach their destination safely. However, the low-altitude environment is complex, containing various obstacles such as kites and balloons, and drones have very limited computing resources. Obstacle avoidance technology has therefore become a significant factor restricting the development of drones. Traditional obstacle avoidance technology consists of two parts: mapping and planning. Mapping includes simultaneous localization and mapping (SLAM) and structure-from-motion (SfM) reconstruction. Localization is a prerequisite for action decision-making and is computationally expensive, which further limits the computing resources available for action decision-making. Therefore, efficient obstacle avoidance methods need to be explored for drones.
[0003] Deep reinforcement learning (DRL) maps states to actions through a policy network, enabling drones to make high-frequency decisions in complex environments without requiring mapping steps or complex computations, making it suitable for drones with limited computing resources. Existing technologies provide low-computational-cost path planning algorithms based on DRL to help drones avoid obstacles in point-to-point scenarios. However, these methods focus too much on local optima, so the drone can easily miss its goal while pursuing a locally optimal action.
Summary of the Invention
[0004] In view of this, this application provides a multi-target point tracking and obstacle avoidance method and apparatus for UAVs, which solves the shortcomings of existing low-computational-cost path planning algorithms based on DRL, which focus too much on local optimization and thus fail to reach the target destination.
[0005] To achieve the above objectives, the following solution is proposed:
[0006] A multi-target point tracking and obstacle avoidance method for unmanned aerial vehicles (UAVs) includes:
[0007] Based on the UAV's global path and current location, determine one or more path tracking points. The global path is the path planning for the UAV to fly from the path start point to the path end point.
[0008] Based on the path endpoint and each path tracking point, status input data is obtained, wherein the status input data includes lidar data, the current action status of the UAV, the relative position of the UAV to the path endpoint, and the relative position of the UAV to each path tracking point;
[0009] Obtain the trained deep reinforcement learning model;
[0010] The state input data is input into the trained deep reinforcement learning model to obtain the action command output by the trained deep reinforcement learning model. The action command is used to guide the UAV to avoid obstacles.
[0011] Optionally, determining one or more path tracking points based on the UAV's global path and current location includes:
[0012] Based on the current location of the drone, select from the global path the path point closest to the drone as the path nearest point;
[0013] Starting from the nearest point on the path, and taking the flight direction of the global path as the point selection direction, multiple path tracking points are selected from the global path.
[0014] Optionally, starting from the nearest point on the path, one or more path tracking points are selected from the global path, including:
[0015] The sampling interval is determined based on the preset guidance distance;
[0016] Starting from the nearest point on the path, and taking the flight direction of the global path as the point selection direction, path points are selected as path tracking points at intervals in the global path, until the interval between the latest selected path tracking point and the first selected path tracking point is equal to the guidance distance.
[0017] Optionally, before determining the sampling interval based on a preset guidance distance, the following steps are also included:
[0018] Determine the target distance between the nearest point on the path and the end point on the path;
[0019] When the target distance is greater than the guidance distance, proceed to the step of determining the sampling interval based on the preset guidance distance;
[0020] When the target distance is not greater than the guidance distance, the path endpoint is determined as the path tracking point.
[0021] Optionally, obtaining the trained deep reinforcement learning model includes:
[0022] Obtain the initial deep reinforcement learning model and the training paths corresponding to different random maps. Each random map contains obstacles of different sizes, and each training path contains the path planning of the training drone flying from the training start point to the training end point.
[0023] For each training path, based on the current position of the training drone in the corresponding random map, a training nearest point is selected from the training path; starting from the training nearest point, multiple consecutive training target points are selected from the training path; the radar data of the training drone, the action state of the training drone, the relative position of the training drone to the training endpoint, and the relative position of the training drone to each training target point are input into an initial deep reinforcement learning model to obtain the predicted action output by the initial deep reinforcement learning model; after the training drone's action state is updated to the predicted action and interacts with the corresponding random map, the latest position of the training drone is determined; a first distance between the latest position of the training drone and the corresponding training endpoint, and a second distance between the latest position of the training drone and each training target point are calculated; based on the first distance and the second distance, the reward value of the initial deep reinforcement learning model is calculated; based on the reward value, the parameters of the initial deep reinforcement learning model are updated; when the training drone reaches the last training target point, the process returns to the step of selecting the training nearest point from the training path based on the current position of the training drone in the corresponding random map, until the training drone reaches the training endpoint;
[0024] The initial deep reinforcement learning model obtained through each training path is used as the trained deep reinforcement learning model.
[0025] Optionally, calculating the reward value of the initial deep reinforcement learning model based on the first distance and the second distance includes:
[0026] Obtain a preset reward function, and substitute the first distance and the second distance into the reward function to calculate the reward value of the initial deep reinforcement learning model;
[0027] The reward function is as follows:
[0028] $r_{total} = r_{goal} + r_{track} + r_{crash} + r_{free} + r_{step}$
[0029]
[0030]
[0031] $r_{track} = -d_{P_{closest}}$
[0032] $r_{crash} = -\exp\left(-(d_{ro} - d_{min}) / r\right)$
[0033]
[0034] where $r_{total}$ is the reward value; $r_{goal}$ is the distance reward; $r_{track}$ is the tracking reward; $r_{crash}$ is the collision reward; $r_{free}$ is the free-space reward; $r_{step}$ is the step reward; $d_{g}$ is the distance from the current drone to the end of the path; $d_{gmin}$ is the preset distance threshold; $r_{arrival}$ is the preset reward; $d_{pi}$ is the distance from $P_{track,i}$ to the drone; $z_{pi}$ is the allocation (weighting) coefficient used to adjust the degree of contribution of $\Delta d_{pi}$ to $r_{goal}$; $\Delta d_{pi}$ is the change in the distance between $P_{track,i}$ and the drone; $P_{track,i}$ is the i-th training target point; $N_{track}$ is the number of training target points; $d_{al}$ is the allocation factor, with $d_{al} \in [0,1]$; $d_{al}^{\,i-1}$ is $d_{al}$ raised to the power $i-1$; $d_{al}^{\,N_{track}}$ is $d_{al}$ raised to the power $N_{track}$; $d_{P_{closest}}$ is the distance from the drone to $P_{closest}$; $P_{closest}$ is the point on the global path closest to the UAV; $r$ and a second symbol (missing in the source text) are hyperparameters; $d_{ro}$ is the distance between the drone and the nearest obstacle; $d_{i}$ is the i-th data point in the radar data; $d_{min}$ is the minimum value in the corresponding radar data.
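The terms whose formulas survive in this text can be sketched in a few lines of Python. This is only an illustration of the expressions for $r_{track}$, $r_{crash}$, and the sum $r_{total}$ as written above; the function names are hypothetical, and $r_{goal}$, $r_{free}$, and $r_{step}$ are passed in as precomputed numbers because their formulas are not present in this text.

```python
import math


def tracking_reward(d_p_closest: float) -> float:
    """r_track = -d_Pclosest: penalise distance from the closest global-path point."""
    return -d_p_closest


def crash_reward(d_ro: float, d_min: float, r: float) -> float:
    """r_crash = -exp(-(d_ro - d_min) / r), exactly as the expression is written."""
    return -math.exp(-(d_ro - d_min) / r)


def total_reward(r_goal: float, r_track: float, r_crash: float,
                 r_free: float, r_step: float) -> float:
    """r_total is the plain sum of the five components.

    r_goal, r_free and r_step are supplied by the caller (assumption:
    their definitions are not available in this text).
    """
    return r_goal + r_track + r_crash + r_free + r_step
```

As written, the collision term equals -1 when $d_{ro} = d_{min}$ and approaches 0 as the obstacle distance grows, so it only penalises proximity.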
[0035] Optionally, obtaining the training path corresponding to different random maps includes:
[0036] Using a random map generator, the number of obstacles of each size is randomly determined based on a preset range of obstacles of each size;
[0037] Using a random map generator, random maps are generated based on the number of obstacles of various sizes;
[0038] In the random map, a random distance is selected, and the training start point and training end point are set based on the random distance;
[0039] Based on the Rapidly-exploring Random Tree (RRT) algorithm, a training path is constructed that is adapted to the random map and includes a training start point and a training end point.
[0040] Optionally, the random map generator is represented as:
[0041]
[0042] Here, the left-hand side of the expression denotes the output of the random environment generator, and the function symbol denotes the random environment generator function; $d_{target}$ is the random distance; $n_{min,1}$ is the minimum number of obstacles of the first size; $n_{max,1}$ is the maximum number of obstacles of the first size; $n_{min,2}$ is the minimum number of obstacles of the second size; $n_{max,2}$ is the maximum number of obstacles of the second size.
[0043] Optionally, the current action state of the UAV includes the current linear velocity and the current yaw rate of the UAV;
[0044] The action commands include predicted linear velocity and predicted yaw rate.
[0045] A multi-target point tracking and obstacle avoidance device for unmanned aerial vehicles (UAVs) includes:
[0046] The determination module is used to determine one or more path tracking points based on the UAV's global path and the UAV's current position. The global path is the path planning for the UAV to fly from the path start point to the path end point.
[0047] The acquisition module is used to acquire status input data based on the path endpoint and each path tracking point. The status input data includes lidar data, the current action status of the UAV, the relative position of the UAV to the path endpoint, and the relative position of the UAV to each path tracking point.
[0048] The output module is used to acquire the trained deep reinforcement learning model; input the state input data into the trained deep reinforcement learning model to obtain the action command output by the trained deep reinforcement learning model, and the action command is used to update the action state of the UAV and guide the UAV to avoid obstacles.
[0049] A multi-target point tracking and obstacle avoidance device for unmanned aerial vehicles (UAVs), including a memory and a processor;
[0050] The memory is used to store programs;
[0051] The processor is used to execute the program to implement the various steps of the above-described multi-target point tracking and obstacle avoidance method for UAVs.
[0052] A readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the various steps of the above-described multi-target point tracking and obstacle avoidance method for unmanned aerial vehicles.
[0053] As can be seen from the above technical solution, the multi-target point tracking and obstacle avoidance method for UAVs provided in this application allows the UAV's action decisions to be determined by a trained deep reinforcement learning model. The deep reinforcement learning model can fully learn and process the input state data and predict future action states through the fitting ability of neural networks. This eliminates the need for complex mapping and planning, reduces the computational resource requirements of the UAV, and enables high-frequency decision-making in complex environments, guiding the UAV to avoid obstacles.
[0054] Furthermore, the state input data of this application consists of LiDAR data, the current action state of the UAV, the relative position of the UAV to the path endpoint, and the relative position of the UAV to each path tracking point. Each path tracking point is determined by the UAV's global path and current position. During deep learning and processing, the trained neural network model fully learns prior information from the state input data, emphasizing the path endpoint. When predicting action commands, it fully considers the distance between the path endpoint and the UAV, avoiding situations where the UAV cannot reach the path endpoint. Therefore, this application can guide the UAV to avoid obstacles while accurately reaching the path endpoint, even with limited computing resources. Experiments have proven that this application is highly effective in long-distance obstacle avoidance tasks.
Attached Figure Description
[0055] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0056] Figure 1 is a flowchart of a multi-target point tracking and obstacle avoidance method for an unmanned aerial vehicle (UAV) disclosed in an embodiment of this application;
[0057] Figure 2 is a schematic diagram of a training curve comparison provided in an embodiment of this application;
[0058] Figure 3 is a structural block diagram of a multi-target point tracking and obstacle avoidance device for a drone disclosed in an embodiment of this application.
Detailed Implementation
[0059] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0060] Next, with reference to Figure 1, the multi-target point tracking and obstacle avoidance method for UAVs of this application is described in detail. The method includes the following steps:
[0061] Step S1: Determine one or more path tracking points based on the drone's global path and current location.
[0062] Specifically, the global path can be a path plan for the UAV to fly from the path start point to the path end point, and can include multiple path points with sequential relationships.
[0063] Global paths can be constructed based on the RRT (Rapidly Exploring Random Tree) algorithm.
[0064] The method of this application can be executed repeatedly until the drone reaches the end of the path, with each execution completing one action decision.
[0065] Any path tracking point is a path point on the global path.
[0066] The drone can be guided to the end of the path by tracking points along the path.
[0067] The drone can be a rotary-wing drone.
[0068] Step S2: Based on the path endpoint and each path tracking point, obtain status input data.
[0069] Specifically, the status input data may include lidar data, the current action status of the UAV, the relative position of the UAV to the end point of the path, and the relative position of the UAV to each path tracking point.
[0070] The lidar data is $\chi_{o} = [d_{o1}, \ldots, d_{on}]$, where $d_{oi}$ is the i-th distance reading in the lidar data.
[0071] The current action state of the drone can include its current linear velocity and its current yaw rate.
[0072] The relative distance from the drone's current position to the path endpoint, and to each path tracking point, can be calculated from the drone's current position, the path endpoint, and each path tracking point.
[0073] When there are multiple path tracking points, the order of these points is determined by the direction of the global path.
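As an illustration of how the state input data of step S2 might be assembled, the following Python sketch concatenates the lidar readings, the current action state, and the relative positions into one flat vector. It assumes a planar (2D) setting and encodes each relative position as a (distance, bearing) pair in the drone's heading frame; both choices, and the names `relative_position` and `build_state`, are assumptions made for this example rather than details given in this application.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def relative_position(uav_xy: Point, uav_yaw: float, target_xy: Point) -> Tuple[float, float]:
    """Return (distance, bearing) of a target relative to the drone's heading."""
    dx = target_xy[0] - uav_xy[0]
    dy = target_xy[1] - uav_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - uav_yaw
    # Wrap the bearing into [-pi, pi).
    bearing = (bearing + math.pi) % (2.0 * math.pi) - math.pi
    return distance, bearing


def build_state(lidar: Sequence[float], linear_velocity: float, yaw_rate: float,
                uav_xy: Point, uav_yaw: float, path_end: Point,
                tracking_points: Sequence[Point]) -> List[float]:
    """Concatenate lidar data, action state, and relative positions."""
    state = list(lidar)                       # chi_o = [d_o1, ..., d_on]
    state += [linear_velocity, yaw_rate]      # current action state
    state += relative_position(uav_xy, uav_yaw, path_end)
    for point in tracking_points:             # kept in global-path order
        state += relative_position(uav_xy, uav_yaw, point)
    return state
```

Keeping the tracking points in global-path order means each slot of the vector always refers to the same position in the sequence, which matches the ordering requirement stated above.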
[0074] Step S3: Obtain the trained deep reinforcement learning model.
[0075] Specifically, a deep reinforcement learning model can be pre-trained and then installed in the drone to help the drone make action decisions.
[0076] The trained deep reinforcement learning model can include an automatic temperature adjustment mechanism, an actor-critic network, and a command filter.
[0077] The actor-critic network is built on a maximum-entropy objective function and supports a stochastic policy.
[0078] The automatic temperature adjustment mechanism can automatically adjust the temperature factor based on the output distribution of the stochastic policy. The cost function of the temperature factor α is:
[0079]
[0080] where the target entropy is a hyperparameter, which can simply be set to the negative of the action-space dimension, -dim(Action).
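The cost function itself is not present in this text. For orientation only, the temperature objective commonly used in soft actor-critic implementations has the following form; it is the standard formulation and not necessarily the exact expression of this application, and the symbol $\bar{\mathcal{H}}$ for the target entropy is an assumption.

```latex
J(\alpha) = \mathbb{E}_{a_t \sim \pi_t}\left[ -\alpha \log \pi_t(a_t \mid s_t) - \alpha \bar{\mathcal{H}} \right],
\qquad \bar{\mathcal{H}} = -\dim(\text{Action})
```

Minimising this objective raises α when the policy's entropy falls below the target and lowers it otherwise, which is how the temperature tracks the output distribution of the stochastic policy.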
[0081] Command filters can smooth the network output during the testing phase to stabilize the obstacle avoidance performance of rotary-wing UAVs and improve robustness.
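This application does not state which filter is used. As one possible illustration, the sketch below smooths the two command channels with a first-order low-pass filter (exponential smoothing); the class name `CommandFilter` and the `smoothing` parameter are hypothetical.

```python
from typing import Optional, Tuple


class CommandFilter:
    """First-order low-pass filter over (linear velocity, yaw rate) commands.

    Assumption: exponential smoothing is used here only as an example;
    the source does not specify the filter type.
    """

    def __init__(self, smoothing: float = 0.5) -> None:
        if not 0.0 < smoothing <= 1.0:
            raise ValueError("smoothing must be in (0, 1]")
        self.smoothing = smoothing
        self._last: Optional[Tuple[float, ...]] = None

    def __call__(self, linear_velocity: float, yaw_rate: float) -> Tuple[float, ...]:
        raw = (linear_velocity, yaw_rate)
        if self._last is None:
            # The first command passes through unchanged.
            self._last = raw
        else:
            a = self.smoothing
            self._last = tuple(a * new + (1.0 - a) * old
                               for new, old in zip(raw, self._last))
        return self._last
```

A smaller `smoothing` value gives steadier commands at the cost of slower reaction, so in practice it would have to be tuned against the decision frequency.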
[0082] Step S4: Input the state input data into the trained deep reinforcement learning model to obtain the action instructions output by the trained deep reinforcement learning model.
[0083] Specifically, the trained deep reinforcement learning model is trained on multiple random maps, each containing obstacles of different sizes.
[0084] The action commands can be used to update the current action status of the drone and guide the drone to avoid obstacles.
[0085] Action commands can include predicted linear velocity and predicted yaw rate.
[0086] As can be seen from the above technical solution, the multi-target point tracking and obstacle avoidance method for UAVs provided in this application allows the UAV's action decisions to be determined by a trained deep reinforcement learning model. The deep reinforcement learning model can fully learn and process the input state data and predict future action states through the fitting ability of neural networks. This eliminates the need for complex mapping and planning, reduces the computational resource requirements of the UAV, and enables high-frequency decision-making in complex environments, guiding the UAV to avoid obstacles.
[0087] Furthermore, the state input data of this application consists of LiDAR data, the current action state of the UAV, the relative position of the UAV to the path endpoint, and the relative position of the UAV to each path tracking point. Each path tracking point is determined by the UAV's global path and current position. During deep learning and processing, the trained neural network model fully learns prior information from the state input data, emphasizing the path endpoint. When predicting action commands, it fully considers the distance between the path endpoint and the UAV, avoiding situations where the UAV cannot reach the path endpoint. Therefore, this application can guide the UAV to avoid obstacles while accurately reaching the path endpoint, even with limited computing resources. Experiments have proven that this application is highly effective in long-distance obstacle avoidance tasks.
[0088] In some embodiments of this application, the process of determining one or more path tracking points based on the UAV's global path and current position is described in detail, and the steps are as follows:
[0089] S10. Based on the current location of the UAV, select from the global path the path point closest to the UAV as the path nearest point.
[0090] Specifically, based on the location of the drone, the nearest path point to the drone can be selected from the global path as the path nearest point.
[0091] After the path nearest point is determined, it can be checked whether the distance between the path nearest point and the path endpoint is greater than the guidance distance.
[0092] When the distance is greater than the guide distance, step S11 can be executed.
[0093] When the distance is not greater than the guide distance, the end point of the path can be directly used as the path tracking point.
[0094] S11. Starting from the nearest point on the path, and taking the flight direction of the global path as the point selection direction, select multiple path tracking points from the global path.
[0095] Specifically, starting from the nearest point on the path, the direction of the global path is used as the point selection direction, and multiple path tracking points are selected from the global path.
[0096] The interval between any two adjacent path tracking points is the same, and the distance between adjacent path tracking points is determined based on a preset guidance distance.
[0097] When the drone exceeds or reaches the last path tracking point but has not reached the end of the path, the drone can return to execute step S10.
[0098] As can be seen from the above technical solution, this embodiment provides an optional method for selecting path tracking points. With this method, multiple path tracking points are selected on the global path starting from the path nearest point, which helps a UAV that has veered off course return to the global path and thus reach the end of the path more reliably.
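Step S10 can be sketched as a linear search over the waypoints of the global path. The example assumes the global path is an ordered list of 2D waypoints; the function name `nearest_path_index` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def nearest_path_index(path: Sequence[Point], uav_xy: Point) -> int:
    """Return the index of the global-path waypoint closest to the drone."""
    return min(range(len(path)), key=lambda i: math.dist(path[i], uav_xy))
```

Returning the index rather than the point keeps the position along the path, which the later sampling step needs in order to move in the flight direction.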
[0099] In some embodiments of this application, the process of selecting multiple path tracking points from the global path, starting from the nearest point on the path and taking the flight direction of the global path as the point selection direction, is described in detail as follows:
[0100] S110. Determine the sampling interval based on the preset guide distance.
[0101] Specifically, a set number of path tracking points can be determined first;
[0102] The quotient of the guidance distance and the set number is then used as the sampling interval.
[0103] S111. Starting from the nearest point on the path, and taking the flight direction of the global path as the point selection direction, select path points as path tracking points at intervals in the global path until the interval between the latest selected path tracking point and the first selected path tracking point is equal to the guidance distance.
[0104] Specifically, starting from the path nearest point, the flight direction from the nearest point to the path endpoint along the global path can be used as the point selection direction. Multiple path tracking points can be selected at equal spacing, using the sampling interval, until the number of selected path tracking points reaches the set number or the interval between the latest selected path tracking point and the first selected path tracking point equals the guidance distance.
[0105] As can be seen from the above technical solution, this embodiment provides an optional method for selecting path tracking points based on the nearest path point. By using the above method, path tracking points for guiding the flight of the UAV can be selected, and the selected path tracking points are evenly distributed, which reduces the randomness of the calculation and alleviates the computational pressure.
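Steps S110 and S111 can be sketched as follows. The example assumes the global path is an ordered list of 2D waypoints spaced more finely than the sampling interval, measures the interval as arc length along the path from the path nearest point, and stops once the set number of points is reached; these are interpretation choices for illustration, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def select_tracking_points(path: Sequence[Point], nearest_idx: int,
                           guidance_distance: float, num_points: int) -> List[Point]:
    """Pick waypoints ahead of the nearest point at equal arc-length spacing."""
    interval = guidance_distance / num_points   # S110: sampling interval
    selected: List[Point] = []
    travelled = 0.0                             # arc length from the nearest point
    next_mark = interval
    for i in range(nearest_idx + 1, len(path)):
        travelled += math.dist(path[i - 1], path[i])
        if travelled >= next_mark - 1e-9:       # tolerance for float accumulation
            selected.append(path[i])
            next_mark += interval
            if len(selected) == num_points:     # S111: stop at the set number
                break
    return selected
```

With this spacing the last selected point lies roughly one guidance distance ahead of the path nearest point, so the drone is always steered toward a fixed look-ahead window on the global path.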
[0106] In some embodiments of this application, it is considered that when the path endpoint enters the drone's visual range, the selected path tracking points will be very close to the path endpoint. In this case, after interaction the drone may already have passed some of the closer path tracking points but not yet the farther ones. Because the distance between the drone and a path tracking point increases once the drone has passed it, the non-sparse distance reward function produces negative feedback from the closer path tracking points, which hinders the drone from reaching the path endpoint. For this reason, this application directly determines the path endpoint as the path tracking point when the endpoint is close to the drone. Accordingly, a judgment process is added before step S110 to improve the accuracy of action decision-making. The judgment process is described in detail below:
[0107] S112. Determine the target distance between the nearest point on the path and the end point on the path.
[0108] Specifically, the distance between the nearest point on the path and the end point on the path can be calculated, and this distance can be used as the target distance.
[0109] S113. Determine whether the target distance is greater than the guidance distance; if yes, proceed to step S110; if no, proceed to step S114.
[0110] Specifically, the target distance and the guide distance can be compared. If the target distance is greater than the guide distance, step S110 is executed; if the target distance is not greater than the guide distance, step S114 is executed.
[0111] S114. The endpoint of the path is determined as the path tracking point.
[0112] Specifically, the path endpoint can be directly set as the path tracking point.
[0113] At this point, the forward speed can be reduced to prevent the drone from overshooting the end of the path. For example, the maximum forward speed can be adjusted to 1 m/s.
[0114] As can be seen from the above technical solution, this embodiment adds a judgment process. Through the above process, it is possible to avoid the situation where the drone cannot reach the end of the path during the process of guiding the drone to avoid obstacles, thereby further improving the accuracy of this application.
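The judgment process of steps S112 to S114 can be sketched as a small wrapper around whatever sampling routine is used. The example assumes the target distance is the straight-line distance between the path nearest point and the path endpoint; the function name, the `sample_fn` callback, and the 3.0 m/s cruise cap are hypothetical, while the 1 m/s final cap follows the example given above.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def choose_targets(path: Sequence[Point], nearest_idx: int, guidance_distance: float,
                   sample_fn: Callable[[Sequence[Point], int], List[Point]],
                   cruise_speed_cap: float = 3.0,
                   final_speed_cap: float = 1.0) -> Tuple[List[Point], float]:
    """Return (path tracking points, maximum forward speed)."""
    # S112: target distance between the nearest point and the path endpoint.
    target_distance = math.dist(path[nearest_idx], path[-1])
    # S113: compare with the guidance distance.
    if target_distance > guidance_distance:
        return sample_fn(path, nearest_idx), cruise_speed_cap
    # S114: the endpoint itself becomes the only tracking point, at reduced speed.
    return [path[-1]], final_speed_cap
```

Note that the boundary case, where the target distance equals the guidance distance, falls into the endpoint branch, matching the "not greater than" wording of step S113.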
[0115] In some embodiments of this application, the process of obtaining the trained deep reinforcement learning model in step S3 is described in detail, and the steps are as follows:
[0116] S30. Obtain the initial deep reinforcement learning model and the training paths corresponding to different random maps. Each random map contains obstacles of different sizes, and each training path contains the path planning of the training drone flying from the training starting point to the training ending point.
[0117] Specifically, different training paths can correspond to different random maps, which can be randomly generated by a random map generator.
[0118] Obstacles of various sizes can be placed in the random map.
[0119] S31. For each training path, based on the current position of the training drone in the corresponding random map, select the nearest training point from the training path; starting from the nearest training point, select multiple consecutive training target points from the training path; input the radar data of the training drone, the action state of the training drone, the relative position of the training drone to the training endpoint, and the relative position of the training drone to each training target point into the initial deep reinforcement learning model to obtain the predicted action output by the initial deep reinforcement learning model; determine the latest position of the training drone after the action state of the training drone is updated to the predicted action and interacts with the corresponding random map; calculate the first distance between the latest position of the training drone and the corresponding training endpoint, and the second distance between the latest position of the training drone and each training target point; calculate the reward value of the initial deep reinforcement learning model based on the first distance and the second distance; update the parameters of the initial deep reinforcement learning model based on the reward value; when the training drone reaches the last training target point, return to the step of selecting the nearest training point from the training path based on the current position of the training drone in the corresponding random map, until the training drone reaches the training endpoint.
[0120] Specifically, the initial deep reinforcement learning model can be trained using each training path.
[0121] The distance between the latest position of the training drone and the corresponding training endpoint is taken as the first distance, and the distance between the latest position of the training drone and each training target point is taken as the second distance.
[0122] S32. The initial deep reinforcement learning model obtained through each training path is used as the trained deep reinforcement learning model.
[0123] Specifically, the initial deep reinforcement learning model trained using various training paths can be used as the trained deep reinforcement learning model.
[0124] As can be seen from the above technical solution, this embodiment provides an optional method for training a deep reinforcement learning model. Through the above method, the deep reinforcement learning model can be trained iteratively using different training target points, thereby further improving the prediction accuracy of the deep reinforcement learning model.
[0125] In some embodiments of this application, the process of obtaining training paths corresponding to different random maps in step S30 is described in detail, and the steps are as follows:
[0126] S300: Using a random map generator, the number of obstacles of each size is randomly determined based on a preset range of obstacles of each size.
[0127] Specifically, the random map generator can be configured with a range of obstacle quantities of various sizes. The random map generator can then randomly select a value from each obstacle quantity range as the number of obstacles of the corresponding size.
[0128] The number of obstacles of different sizes can be the same.
[0129] S301. Using a random map generator, generate a random map based on the number of obstacles of various sizes.
[0130] Specifically, by using a random map generator, obstacles can be randomly added to a blank map according to the number of obstacles corresponding to each size, thus forming a random map.
[0131] S302. In the random map, select a random distance, and set the training start point and training end point based on the random distance.
[0132] Specifically, a random map generator can be used to generate random distances, and the position of the training drone on the random map can be used as the training starting point. The training endpoint can be determined based on the random distance and the training starting point.
[0133] S303. Based on the Rapidly-exploring Random Tree (RRT) algorithm, construct a training path that is adapted to the random map and includes the training start point and the training end point.
[0134] Specifically, the RRT algorithm can be used to construct a training path based on the corresponding random map, the training start point, and the training end point.
[0135] By repeating the above process multiple times, multiple training paths can be obtained.
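The path construction can be sketched with a basic RRT. This is a generic textbook variant, not necessarily the one used in this application; the circular obstacles, step size, goal bias, and iteration limit are illustrative assumptions.

```python
import math
import random

def collides(p, obstacles):
    """True if point p lies inside any circular obstacle (cx, cy, radius)."""
    return any(math.dist(p, (cx, cy)) <= r for cx, cy, r in obstacles)

def segment_free(a, b, obstacles, step=0.1):
    """Check a straight segment by sampling points along it."""
    n = max(1, int(math.dist(a, b) / step))
    return not any(
        collides((a[0] + (b[0] - a[0]) * i / n, a[1] + (b[1] - a[1]) * i / n), obstacles)
        for i in range(n + 1))

def rrt(start, goal, obstacles, size, rng, step=1.0, goal_bias=0.1, max_iter=5000):
    nodes, parent = [start], {0: None}
    for _ in range(max_iter):
        # Sample the goal occasionally, otherwise a uniform random point.
        sample = goal if rng.random() < goal_bias else (
            rng.uniform(0, size), rng.uniform(0, size))
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        t = min(1.0, step / d)  # extend by at most one step toward the sample
        new = (near[0] + (sample[0] - near[0]) * t, near[1] + (sample[1] - near[1]) * t)
        if not segment_free(near, new, obstacles):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= step and segment_free(new, goal, obstacles):
            # Walk parent links back to the start to recover the path.
            path, i = ([] if new == goal else [goal]), len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None  # no path found within max_iter

start, goal = (1.0, 1.0), (9.0, 9.0)
obstacles = [(5.0, 5.0, 1.5)]  # one obstacle blocking the straight line
path = rrt(start, goal, obstacles, size=10.0, rng=random.Random(1))
```

The returned path is a list of waypoints from the training start point to the training end point, which is the form the later point-selection steps operate on.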
[0136] As can be seen from the above technical solution, this embodiment provides an optional method for obtaining training paths. The above method can use a random map containing obstacles of different sizes to construct training paths, which can simulate various obstacle avoidance situations of drones in actual use, and further improve the training effect of this application.
[0137] In some embodiments of this application, the random map generator can be represented as:
[0138]
[0139] In the formula, the output of the random environment generator is produced by the random environment generator function, where $d_{target}$ is the random distance; $n_{min,1}$ is the minimum number of obstacles of the first size; $n_{max,1}$ is the maximum number of obstacles of the first size; $n_{min,2}$ is the minimum number of obstacles of the second size; and $n_{max,2}$ is the maximum number of obstacles of the second size.
[0140] In some embodiments of this application, the process of calculating the reward value of the initial deep reinforcement learning model based on the first distance and the second distance in S31 is described in detail, and the steps are as follows:
[0141] S310. Obtain a preset reward function, and substitute the first distance and the second distance into the reward function to calculate the reward value of the initial deep reinforcement learning model.
[0142] Specifically, a non-sparse reward function can be pre-set to guide the drone to complete path tracking and obstacle avoidance training.
[0143] The reward function can be as follows:
[0144] $r_{total} = r_{goal} + r_{track} + r_{crash} + r_{free} + r_{step}$
[0145]
[0146]
[0147] $r_{track} = -d_{P_{closest}}$
[0148] $r_{crash} = -\exp\left(-(d_{ro} - d_{min}) / r\right)$
[0149]
[0150] In the formulas, $r_{total}$ is the reward value; $r_{goal}$ is the distance reward; $r_{track}$ is the tracking reward; $r_{crash}$ is the collision reward; $r_{free}$ is the free-space reward; $r_{step}$ is the step reward; $d_g$ is the distance from the drone's current position to the path endpoint; $d_{gmin}$ is the preset distance threshold; $r_{arrival}$ is the preset reward; $d_{pi}$ is the distance from $P_{track,i}$ to the drone; $z_{pi}$ is the allocation coefficient that adjusts the contribution of $\Delta d_{pi}$ to $r_{goal}$ and can be any positive floating-point number; $\Delta d_{pi}$ is the change in the distance from $P_{track,i}$ to the drone; $P_{track,i}$ is the i-th training target point; $N_{track}$ is the number of training target points; $d_{al}$ is the allocation factor, with $d_{al} \in [0,1]$; $d_{al}^{\,i-1}$ is $d_{al}$ raised to the power $i-1$; $d_{al}^{\,N_{track}}$ is $d_{al}$ raised to the power $N_{track}$; $z_{pi}$ serves as a weighting factor; $d_{P_{closest}}$ is the distance from the drone to $P_{closest}$; $P_{closest}$ is the point on the global path closest to the UAV; $r$ and […] are hyperparameters; $d_{ro}$ is the distance between the drone and the nearest obstacle; $d_i$ is the i-th data point in the radar data, specifically the i-th range reading; and $d_{min}$ is the minimum value in the corresponding radar data.
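The terms whose formulas are given explicitly above can be sketched as follows. The terms $r_{goal}$, $r_{free}$ and $r_{step}$ are passed in as precomputed numbers because their formulas are not reproduced in this text. Taking $d_{ro}$ as the smallest lidar reading and $d_{min}$ as a separately supplied minimum range is an assumption about how the definitions are meant to be read.

```python
import math

def tracking_reward(d_p_closest):
    """r_track = -d_{P_closest}: penalise distance from the nearest path point."""
    return -d_p_closest

def crash_reward(lidar_ranges, d_min, r):
    """r_crash = -exp(-(d_ro - d_min) / r).

    Assumption: d_ro is the smallest lidar range; d_min and r are supplied by the caller.
    """
    d_ro = min(lidar_ranges)
    return -math.exp(-(d_ro - d_min) / r)

def total_reward(r_goal, r_track, r_crash, r_free, r_step):
    """r_total = r_goal + r_track + r_crash + r_free + r_step."""
    return r_goal + r_track + r_crash + r_free + r_step

# Hypothetical values for illustration only.
r_track = tracking_reward(0.3)
r_crash_near = crash_reward([2.0, 0.5, 3.0], d_min=0.5, r=1.0)
r_crash_far = crash_reward([10.0, 12.0], d_min=0.5, r=1.0)
r_total = total_reward(0.2, r_track, r_crash_near, 0.0, -0.01)
```

With these inputs the collision term equals -1 when the nearest reading sits at $d_{min}$ and decays toward 0 as the nearest obstacle moves away, which gives a dense (non-sparse) penalty.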
[0151] To avoid calculation errors caused by division by zero, the convention $1/0 = 1/N_{track}$ can be preset.
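The division-by-zero guard can be illustrated with a hypothetical weight formula. The formula for $z_{pi}$ is not reproduced in this text; the normalized geometric form below is an assumption, chosen only because it uses $d_{al}^{\,i-1}$ and $d_{al}^{\,N_{track}}$ and has a zero denominator when $d_{al} = 1$.

```python
def allocation_weights(n_track, d_al):
    """Hypothetical weights z_i = (1 - d_al) * d_al**(i-1) / (1 - d_al**n_track).

    Assumed form, not taken from the application. When the denominator is
    zero (d_al == 1), every weight falls back to the preset value 1 / n_track.
    """
    denom = 1.0 - d_al ** n_track
    if denom == 0.0:
        return [1.0 / n_track] * n_track
    return [(1.0 - d_al) * d_al ** (i - 1) / denom for i in range(1, n_track + 1)]

w_mixed = allocation_weights(4, 0.6)    # decreasing weights over the four points
w_uniform = allocation_weights(4, 1.0)  # guarded case
w_first = allocation_weights(3, 0.0)    # all weight on the first point
```

Under this assumed form the weights always sum to 1, so a single parameter $d_{al}$ shifts emphasis between the nearest and the farthest tracking point without changing the overall reward scale.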
[0152] As can be seen from the above technical solution, this embodiment provides an optional method for calculating reward values. This method can guide the drone towards the path endpoint through positive and negative feedback, avoiding obstacles. The reward function unifies the reward scale during training, allowing different parameter configurations to be compared with each other. Simultaneously, this solution transforms the parameters of the training target point dimension into one-dimensional parameters, reducing the difficulty of parameter adjustment and improving the training convergence rate.
[0153] Next, specific experiments will be conducted to verify the beneficial effects of this application.
[0154] The path nearest point and the path tracking points correspond to the reward terms $r_{track}$ and $r_{goal}$, respectively. As can be seen from the reward function above, the contribution of $r_{track}$ to $r_{total}$ is determined solely by its weight in $r_{total}$. Because this parameter has only one dimension, introducing $r_{track}$ does not make parameter tuning too difficult. The variable terms in $r_{goal}$ are determined by the number of path tracking points $N_{track}$ and the allocation factor $d_{al}$. Therefore, the following compares obstacle avoidance performance under different configurations of $N_{track}$ and $d_{al}$.
[0155] The trained deep reinforcement learning model uses a fully connected neural network with two hidden layers (256 neurons) to represent the policy. ReLU is used uniformly as the non-linear activation function, and the policy network output is mapped to [0,1] via the Tanh function or by clipping. The policy network $\pi_\phi$ and the value network $Q_\theta$ share no parameters. This application uniformly uses the Adam optimizer (learning rate 0.0003) to train all networks and the temperature factor $\alpha$. Before the policy $\pi_\phi$ is trained, samples are drawn at random using Gaussian noise.
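The shape of such a policy network can be sketched as a forward pass in plain Python. This is an untrained illustration only: the state dimension of 30, the action dimension of 2, 256 neurons in each hidden layer, the weight initialization, and the affine rescaling of the Tanh output to [0,1] are all assumptions.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(x):
    return [max(0.0, v) for v in x]

def policy_forward(state, layers, squash="tanh"):
    """Two ReLU hidden layers, then an output squashed into [0, 1]."""
    h = relu(dense(state, layers[0]))
    h = relu(dense(h, layers[1]))
    out = dense(h, layers[2])
    if squash == "tanh":
        # Assumption: tanh yields (-1, 1), so rescale to [0, 1].
        return [0.5 * (math.tanh(v) + 1.0) for v in out]
    return [min(1.0, max(0.0, v)) for v in out]  # clipping alternative

rng = random.Random(0)
STATE_DIM, HIDDEN, ACTION_DIM = 30, 256, 2  # hypothetical sizes
layers = [init_layer(STATE_DIM, HIDDEN, rng),
          init_layer(HIDDEN, HIDDEN, rng),
          init_layer(HIDDEN, ACTION_DIM, rng)]
action_tanh = policy_forward([0.1] * STATE_DIM, layers, squash="tanh")
action_clip = policy_forward([0.1] * STATE_DIM, layers, squash="clip")
```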
[0156] During the experiment, the number of path tracking points and the value of the allocation factor were varied to produce multiple parameter configurations. Each parameter configuration was tested using three random seeds. Each random seed was trained to convergence after 1000 interactions. Data from different parameter configurations were sorted by their average values, with performance increasing from top to bottom. The baseline used in the experiment was a traditional point-to-point obstacle avoidance scheme, that is, $N_{track}=1$, $r_{track} \equiv 0$, and a state space that does not contain $P_{closest}$ in the body coordinate system. Because $N_{track}=1$, the value of $d_{al}$ does not affect the experimental results. Considering that drones may show no performance differentiation in overly simple obstacle environments, the environmental parameters $n_{min,1}$, $n_{max,1}$, $n_{min,2}$ and $n_{max,2}$ used in this application are relatively large, bringing the obstacle density close to 20%. In this situation, the drone may be unable to reach the destination because of the excessive density of obstacles.
[0157] The results show that obstacle avoidance performance is generally strong under the $N_{track}=4$ configuration. Markov decision processes assume that the state space contains all the elements needed for decision-making. In point-to-point obstacle avoidance schemes, the state space contains only the distance and orientation of a single target. This representation does not consider any path information, so in path-tracking obstacle avoidance tasks the problem degenerates into a partially observable Markov decision process. Path information is prior information that guides the UAV in the correct direction without blind exploration. Generally, the UAV only needs to explore slightly near the waypoints to find a feasible path. The extended state space contains the nearest path point $P_{closest}$ and multiple path tracking points $P_{track,i}$. The nearest path point $P_{closest}$ can bound the shortest distance between the drone and the path. Multiple path tracking points $P_{track,i}$ can approximate a short segment of the path. A larger number of path tracking points $N_{track}$ can, to some extent, describe the drone's environment more completely. However, more path tracking points are not necessarily better. Given $N_{track}$ path tracking points, the state vector $s_t$ grows by $2N_{track}$ dimensions. Generally, the dimension of the state space $|S|$ is much greater than $2N_{track}$, and $N_{track}$ is small. When $N_{track}$ is small, errors in the path tracking points $P_{track,i}$ do not excessively interfere with the drone's decision-making. The increase in dimensionality introduced by the method of this application therefore has no significant negative impact on the drone's decision-making, and slightly increasing $N_{track}$ to 4 helps the drone make more accurate decisions.
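The growth of the state vector by $2N_{track}$ dimensions can be made concrete with a small sketch. The ordering of the components, the 16-beam lidar, and the 2D body-frame conversion are assumptions for illustration; a separate $P_{closest}$ entry is omitted here.

```python
import math

def to_body_frame(point, position, yaw):
    """Express a world-frame point relative to the drone, rotated by -yaw."""
    dx, dy = point[0] - position[0], point[1] - position[1]
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)

def build_state(lidar, velocity, yaw_rate, position, yaw, endpoint, tracking_points):
    """Concatenate lidar ranges, action state, endpoint and tracking-point offsets."""
    state = list(lidar) + [velocity, yaw_rate]
    state.extend(to_body_frame(endpoint, position, yaw))
    for p in tracking_points:  # each tracking point adds 2 dimensions
        state.extend(to_body_frame(p, position, yaw))
    return state

lidar = [5.0] * 16  # hypothetical 16-beam scan
pts = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
s_one = build_state(lidar, 0.5, 0.0, (0.0, 0.0), 0.0, (10.0, 0.0), pts[:1])
s_four = build_state(lidar, 0.5, 0.0, (0.0, 0.0), 0.0, (10.0, 0.0), pts)
ahead = to_body_frame((0.0, 1.0), (0.0, 0.0), math.pi / 2)  # point straight ahead
```

Going from one tracking point to four adds six entries to a 22-dimensional vector, which is small relative to the lidar portion of the state.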
[0158] Next, the training curves were compared. Ten random seeds were used to conduct repeated experiments. After 1,000 tests were performed in each group of repeated experiments, the obstacle avoidance success rate was calculated.
[0159] Referring to Figure 2, Proposed denotes the training process corresponding to the parameter configuration $N_{track}=4$, $d_{al}=0.6$. Baseline denotes the point-to-point obstacle avoidance algorithm, i.e., the training process corresponding to the parameter configuration $N_{track}=1$, $r_{track} \equiv 0$.
[0160] As training progresses, both the Proposed and Baseline curves converge stably and overlap. After subtracting $r_{track,proposed}$ from the episode reward of Proposed, the resulting Proposed (corrected) curve lies mostly above Baseline. This shows that introducing the tracking reward $r_{track}$ improves path-tracking obstacle avoidance performance, and that the multi-target point tracking obstacle avoidance method of this application outperforms the point-to-point obstacle avoidance algorithm.
[0161] Experimental results show that the average obstacle avoidance success rate of this application generally remains around 0.737 and is relatively stable, with no invalid training runs (e.g., a success rate below 0.400). The traditional scheme maintains an average success rate of around 0.490, but with significant fluctuations, including runs with extremely low success rates. These runs deviate significantly from other data under the same configuration and can be considered training failures. The method of this application therefore surpasses the point-to-point obstacle avoidance method in both obstacle avoidance performance and training stability.
[0162] Experiments revealed that drones using a point-to-point obstacle avoidance scheme fly relatively slowly during obstacle avoidance and are prone to deviating from the overall path. This "confusion" in their movement results in the drone requiring a longer flight time. Therefore, this application can guide the drone to the end of the path more quickly.
[0163] Next, with reference to Figure 3, the multi-target point tracking and obstacle avoidance device for UAVs provided by this application is described in detail. The device described below and the multi-target point tracking and obstacle avoidance method described above can be referred to in correspondence with each other.
[0164] Referring to Figure 3, the multi-target point tracking and obstacle avoidance device for UAVs may include:
[0165] The determination module 10 is used to determine one or more path tracking points based on the global path of the UAV and the current position of the UAV. The global path is the path planning of the UAV from the starting point to the ending point of the path.
[0166] The acquisition module 20 is used to acquire status input data based on the path endpoint and each path tracking point, wherein the status input data includes lidar data, the current action status of the UAV, the relative position of the UAV to the path endpoint, and the relative position of the UAV to each path tracking point;
[0167] The output module 30 is used to acquire the trained deep reinforcement learning model; input the state input data into the trained deep reinforcement learning model to obtain the action command output by the trained deep reinforcement learning model, and the action command is used to update the action state of the UAV and guide the UAV to avoid obstacles.
[0168] The determination module may include:
[0169] The path nearest point selection unit is used to select the path nearest point that is closest to the drone from the global path based on the drone's current position;
[0170] The path tracking point selection unit is used to select multiple path tracking points from the global path, starting from the nearest point on the path and taking the flight direction of the global path as the point selection direction.
[0171] The path tracking point selection unit may include:
[0172] The point selection interval determination subunit is used to determine the point selection interval based on a preset guidance distance;
[0173] The interval-based point selection subunit is used to select path points as path tracking points at the point selection interval along the global path, starting from the nearest point on the path and taking the flight direction of the global path as the point selection direction, until the distance between the latest selected path tracking point and the first selected path tracking point equals the guidance distance.
[0174] The path tracking point selection unit may also include:
[0175] The target distance determination subunit is used to determine the target distance between the nearest point on the path and the end point of the path.
[0176] The guidance distance comparison subunit is used to perform the step of determining the point selection interval based on the preset guidance distance when the target distance is greater than the guidance distance, and to determine the path endpoint as the path tracking point when the target distance is not greater than the guidance distance.
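The point selection performed by these units can be sketched as follows. Several details are assumptions: the global path is a list of waypoints, distances are measured along the path, the first tracking point is the nearest point itself, and the point selection interval is the guidance distance divided by the number of gaps between tracking points.

```python
import math

def select_tracking_points(path, position, guide_distance, n_track):
    """Pick n_track waypoints ahead of the drone, spanning guide_distance."""
    # Path nearest point: waypoint closest to the drone.
    i0 = min(range(len(path)), key=lambda i: math.dist(path[i], position))
    # Cumulative along-path distance from the nearest point.
    arc = [0.0]
    for i in range(i0, len(path) - 1):
        arc.append(arc[-1] + math.dist(path[i], path[i + 1]))
    # Target distance not greater than the guidance distance: track the endpoint.
    if arc[-1] <= guide_distance:
        return [path[-1]]
    interval = guide_distance / max(1, n_track - 1)  # assumed interval rule
    points, j = [], 0
    for k in range(n_track):
        while arc[j] < k * interval - 1e-9:  # advance to the next sampled waypoint
            j += 1
        points.append(path[i0 + j])
    return points

path = [(float(x), 0.0) for x in range(21)]  # hypothetical straight global path
mid_points = select_tracking_points(path, (2.2, 0.5), guide_distance=6.0, n_track=4)
end_points = select_tracking_points(path, (17.0, 0.0), guide_distance=6.0, n_track=4)
```

In the first call the selected points span exactly the guidance distance in the flight direction; in the second the remaining path is shorter than the guidance distance, so only the path endpoint is returned.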
[0177] The output module may include:
[0178] The training path acquisition unit is used to acquire the initial deep reinforcement learning model and the training paths corresponding to different random maps. Each random map contains obstacles of different sizes, and each training path contains the path planning of the training drone flying from the training start point to the training end point.
[0179] The model training unit is used to, for each training path, select the training nearest point from the training path based on the current position of the training drone in the corresponding random map; starting from the training nearest point, select multiple consecutive training target points from the training path; input the radar data of the training drone, the action state of the training drone, the relative position of the training drone to the training endpoint, and the relative position of the training drone to each training target point into an initial deep reinforcement learning model to obtain the predicted action output by the initial deep reinforcement learning model; determine the latest position of the training drone after updating its action state to the predicted action and interacting with the corresponding random map; calculate the first distance between the latest position of the training drone and the corresponding training endpoint, and the second distance between the latest position of the training drone and each training target point; based on the first distance and the second distance, calculate the reward value of the initial deep reinforcement learning model; based on the reward value, update the parameters of the initial deep reinforcement learning model; when the training drone reaches the last training target point, return to the step of selecting the training nearest point from the training path based on the current position of the training drone in the corresponding random map, until the training drone reaches the training endpoint; and use the initial deep reinforcement learning model obtained through training on each training path as the trained deep reinforcement learning model.
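The control flow of one training path can be sketched as a skeleton. Everything model-related is a stand-in: `greedy_policy` replaces the deep reinforcement learning model, `placeholder_reward` replaces the reward function, and `update` only records rewards instead of updating parameters. The point-mass motion, reach radius, and step limit are also assumptions.

```python
import math

def greedy_policy(position, endpoint, targets, speed=0.4):
    """Stand-in for the model: step toward the last training target point."""
    tx, ty = targets[-1]
    d = math.dist(position, (tx, ty))
    if d == 0.0:
        return (0.0, 0.0)
    k = min(speed, d) / d
    return ((tx - position[0]) * k, (ty - position[1]) * k)

def placeholder_reward(d_first, d_second):
    """Stand-in reward built from the first and second distances."""
    return -d_first - sum(d_second) / len(d_second)

def run_training_path(path, start, policy, update, n_targets=3,
                      reach_radius=0.5, max_steps=500):
    position, steps, endpoint = start, 0, path[-1]
    while math.dist(position, endpoint) > reach_radius and steps < max_steps:
        # Select the training nearest point, then consecutive target points.
        i0 = min(range(len(path)), key=lambda i: math.dist(path[i], position))
        targets = path[i0:i0 + n_targets]
        while steps < max_steps:
            action = policy(position, endpoint, targets)          # predicted action
            position = (position[0] + action[0], position[1] + action[1])
            steps += 1
            d_first = math.dist(position, endpoint)               # first distance
            d_second = [math.dist(position, t) for t in targets]  # second distances
            update(placeholder_reward(d_first, d_second))         # parameter update stub
            if d_first <= reach_radius or d_second[-1] <= reach_radius:
                break  # last target reached: reselect from the new nearest point
    return position, steps

rewards = []
path = [(float(x), 0.0) for x in range(11)]  # hypothetical training path
final, steps = run_training_path(path, (0.0, 0.0), greedy_policy, rewards.append)
```

The inner loop runs until the last training target point is reached, after which the outer loop reselects target points from the drone's new position; this repeats until the training endpoint is reached.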
[0180] The model training unit may include:
[0181] The reward value calculation subunit is used to obtain a preset reward function, and substitute the first distance and the second distance into the reward function to calculate the reward value of the initial deep reinforcement learning model;
[0182] The reward function is as follows:
[0183] $r_{total} = r_{goal} + r_{track} + r_{crash} + r_{free} + r_{step}$
[0184]
[0185]
[0186] $r_{track} = -d_{P_{closest}}$
[0187] $r_{crash} = -\exp\left(-(d_{ro} - d_{min}) / r\right)$
[0188]
[0189] In the formulas, $r_{total}$ is the reward value; $r_{goal}$ is the distance reward; $r_{track}$ is the tracking reward; $r_{crash}$ is the collision reward; $r_{free}$ is the free-space reward; $r_{step}$ is the step reward; $d_g$ is the distance from the drone's current position to the path endpoint; $d_{gmin}$ is the preset distance threshold; $r_{arrival}$ is the preset reward; $d_{pi}$ is the distance from $P_{track,i}$ to the drone; $z_{pi}$ is the allocation coefficient that adjusts the contribution of $\Delta d_{pi}$ to $r_{goal}$; $\Delta d_{pi}$ is the change in the distance from $P_{track,i}$ to the drone; $P_{track,i}$ is the i-th training target point; $N_{track}$ is the number of training target points; $d_{al}$ is the allocation factor, with $d_{al} \in [0,1]$; $d_{al}^{\,i-1}$ is $d_{al}$ raised to the power $i-1$; $d_{al}^{\,N_{track}}$ is $d_{al}$ raised to the power $N_{track}$; $z_{pi}$ serves as a weighting factor; $d_{P_{closest}}$ is the distance from the drone to $P_{closest}$; $P_{closest}$ is the point on the global path closest to the UAV; $r$ and […] are hyperparameters; $d_{ro}$ is the distance between the drone and the nearest obstacle; $d_i$ is the i-th data point in the radar data; and $d_{min}$ is the minimum value in the corresponding radar data.
[0190] The training path acquisition unit may include:
[0191] The obstacle quantity determination subunit is used to randomly determine the number of obstacles of each size based on a preset range of obstacle quantities for each size using a random map generator.
[0192] The random map generation subunit is used to generate random maps based on the number of obstacles of various sizes using a random map generator.
[0193] The endpoint determination subunit is used to select a random distance in the random map and set the training start point and training endpoint based on the random distance.
[0194] The path determination subunit is used to construct, based on the Rapidly-exploring Random Tree (RRT) algorithm, a training path that is adapted to the random map and contains the training start point and the training end point.
[0195] The training path acquisition unit may also include:
[0196] The random map generator storage subunit is used to store the random map generator;
[0197] The random map generator is represented as follows:
[0198]
[0199] In the formula, the output of the random environment generator is produced by the random environment generator function, where $d_{target}$ is the random distance; $n_{min,1}$ is the minimum number of obstacles of the first size; $n_{max,1}$ is the maximum number of obstacles of the first size; $n_{min,2}$ is the minimum number of obstacles of the second size; and $n_{max,2}$ is the maximum number of obstacles of the second size.
[0200] The multi-target point tracking and obstacle avoidance device for UAVs provided in this application embodiment can be applied to a UAV that includes a memory and a processor;
[0201] The memory stores a program, which the processor can call. The program is used for:
[0202] Based on the UAV's global path and current location, determine one or more path tracking points. The global path is the path planning for the UAV to fly from the path start point to the path end point.
[0203] Based on the path endpoint and each path tracking point, status input data is obtained, wherein the status input data includes lidar data, the current action status of the UAV, the relative position of the UAV to the path endpoint, and the relative position of the UAV to each path tracking point;
[0204] Obtain the trained deep reinforcement learning model;
[0205] The state input data is input into the trained deep reinforcement learning model to obtain the action command output by the trained deep reinforcement learning model. The action command is used to guide the UAV to avoid obstacles.
[0206] Optionally, for the refined and extended functions of the program, refer to the description above.
[0207] This application embodiment also provides a readable storage medium that can store a program suitable for execution by a processor, the program being used for:
[0208] Based on the UAV's global path and current location, determine one or more path tracking points. The global path is the path planning for the UAV to fly from the path start point to the path end point.
[0209] Based on the path endpoint and each path tracking point, status input data is obtained, wherein the status input data includes lidar data, the current action status of the UAV, the relative position of the UAV to the path endpoint, and the relative position of the UAV to each path tracking point;
[0210] Obtain the trained deep reinforcement learning model;
[0211] The state input data is input into the trained deep reinforcement learning model to obtain the action command output by the trained deep reinforcement learning model. The action command is used to guide the UAV to avoid obstacles.
[0212] Optionally, for the refined and extended functions of the program, refer to the description above.
[0213] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0214] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.
[0215] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. The various embodiments of this application can be combined with each other. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A multi-target point tracking and obstacle avoidance method for a UAV, characterized in that it comprises: Based on the UAV's global path and current location, determine one or more path tracking points. The global path is the path planning for the UAV to fly from the path start point to the path end point. Based on the path endpoint and each path tracking point, status input data is obtained, wherein the status input data includes lidar data, the current action status of the UAV, the relative position of the UAV to the path endpoint, and the relative position of the UAV to each path tracking point; Obtain the initial deep reinforcement learning model and the training paths corresponding to different random maps. Each random map contains obstacles of different sizes, and each training path contains the path planning of the training drone flying from the training start point to the training end point. For each training path, based on the current position of the training drone in the corresponding random map, a training nearest point is selected from the training path; starting from the training nearest point, multiple consecutive training target points are selected from the training path; the radar data of the training drone, the action state of the training drone, the relative position of the training drone to the training endpoint, and the relative position of the training drone to each training target point are input into an initial deep reinforcement learning model to obtain the predicted action output by the initial deep reinforcement learning model; after the training drone's action state is updated to the predicted action and interacts with the corresponding random map, the latest position of the training drone is determined; a first distance between the latest position of the training drone and the corresponding training endpoint, and a second distance between the latest position of the training drone and each training target point are calculated; based on 
the first distance and the second distance, the reward value of the initial deep reinforcement learning model is calculated; based on the reward value, the parameters of the initial deep reinforcement learning model are updated; when the training drone reaches the last training target point, the process returns to the step of selecting the training nearest point from the training path based on the current position of the training drone in the corresponding random map, until the training drone reaches the training endpoint; The initial deep reinforcement learning model obtained through each training path is used as the trained deep reinforcement learning model. The state input data is input into the trained deep reinforcement learning model to obtain the action command output by the trained deep reinforcement learning model. The action command is used to guide the UAV to avoid obstacles.
2. The multi-target point tracking and obstacle avoidance method for UAVs according to claim 1, characterized in that the step of determining one or more path tracking points based on the UAV's global path and current location includes: based on the current location of the UAV, selecting from the global path the path nearest point that is closest to the UAV; and, starting from the path nearest point and taking the flight direction of the global path as the point selection direction, selecting multiple path tracking points from the global path.
3. The multi-target point tracking and obstacle avoidance method for UAVs according to claim 2, characterized in that the step of selecting multiple path tracking points from the global path, starting from the path nearest point and taking the flight direction of the global path as the point selection direction, includes: determining the point selection interval based on a preset guidance distance; and, starting from the path nearest point and taking the flight direction of the global path as the point selection direction, selecting path points as path tracking points at the point selection interval along the global path, until the distance between the latest selected path tracking point and the first selected path tracking point equals the guidance distance.
4. The multi-target point tracking and obstacle avoidance method for UAVs according to claim 3, characterized in that, before determining the point selection interval based on the preset guidance distance, the method further includes: determining the target distance between the path nearest point and the path endpoint; when the target distance is greater than the guidance distance, performing the step of determining the point selection interval based on the preset guidance distance; and when the target distance is not greater than the guidance distance, determining the path endpoint as the path tracking point.
5. The method according to claim 1, characterized in that the step of calculating the reward value of the initial deep reinforcement learning model based on the first distance and the second distance includes: obtaining a preset reward function, and substituting the first distance and the second distance into the reward function to calculate the reward value of the initial deep reinforcement learning model; The reward function is as follows: $r_{total}$ is the reward value; $r_{goal}$ is the distance reward; $r_{track}$ is the tracking reward; $r_{crash}$ is the collision reward; $r_{free}$ is the free-space reward; $r_{step}$ is the step reward; $d_g$ is the distance from the drone's current position to the path endpoint; $d_{gmin}$ is the preset distance threshold; $r_{arrival}$ is the preset reward; $d_{pi}$ is the distance from $P_{track,i}$ to the drone; $z_{pi}$ is the allocation coefficient that adjusts the contribution of $\Delta d_{pi}$ to $r_{goal}$; $\Delta d_{pi}$ is the change in the distance from $P_{track,i}$ to the drone; $P_{track,i}$ is the i-th training target point; $N_{track}$ is the number of training target points; $d_{al}$ is the allocation factor, with $d_{al} \in [0,1]$; $d_{al}^{\,i-1}$ is $d_{al}$ raised to the power $i-1$; $d_{al}^{\,N_{track}}$ is $d_{al}$ raised to the power $N_{track}$; $z_{pi}$ serves as a weighting factor; $d_{P_{closest}}$ is the distance from the drone to $P_{closest}$; $P_{closest}$ is the point on the global path closest to the UAV; $r$ and […] are hyperparameters; $d_{ro}$ is the distance between the drone and the nearest obstacle; $d_i$ is the i-th data point in the radar data; and $d_{min}$ is the minimum value in the corresponding radar data.
6. The multi-target point tracking and obstacle avoidance method for unmanned aerial vehicles according to claim 1, characterized in that obtaining training paths corresponding to different random maps includes: using a random map generator, randomly determining the number of obstacles of each size based on a preset range of obstacle quantities for each size; using the random map generator, generating a random map based on the number of obstacles of each size; in the random map, selecting a random distance, and setting the training start point and training end point based on the random distance; and, based on the Rapidly-exploring Random Tree (RRT) algorithm, constructing a training path that is adapted to the random map and includes the training start point and the training end point.
7. The multi-target point tracking and obstacle avoidance method for unmanned aerial vehicles according to claim 6, characterized in that the random map generator is represented as follows: the output of the random environment generator is produced by the random environment generator function, where $d_{target}$ is the random distance; $n_{min,1}$ is the minimum number of obstacles of the first size; $n_{max,1}$ is the maximum number of obstacles of the first size; $n_{min,2}$ is the minimum number of obstacles of the second size; and $n_{max,2}$ is the maximum number of obstacles of the second size.
8. The multi-target point tracking and obstacle avoidance method for a UAV according to any one of claims 1-7, characterized in that the current action status of the UAV includes the UAV's current linear velocity and current yaw rate; and the action command includes a predicted linear velocity and a predicted yaw rate.
9. A multi-target point tracking and obstacle avoidance device for unmanned aerial vehicles (UAVs), characterized in that it comprises: The determination module is used to determine one or more path tracking points based on the UAV's global path and the UAV's current position. The global path is the path planning for the UAV to fly from the path start point to the path end point. The acquisition module is used to acquire status input data based on the path endpoint and each path tracking point. The status input data includes lidar data, the current action status of the UAV, the relative position of the UAV to the path endpoint, and the relative position of the UAV to each path tracking point. The output module is used to acquire the initial deep reinforcement learning model and training paths corresponding to different random maps. Each random map contains obstacles of different sizes, and each training path includes a path plan for the training drone to fly from the training starting point to the training endpoint. For each training path, based on the current position of the training drone in the corresponding random map, the nearest training point is selected from the training path. Starting from the nearest training point, multiple consecutive training target points are selected from the training path. The radar data of the training drone, the action state of the training drone, the relative position of the training drone to the training endpoint, and the relative position of the training drone to each training target point are input into the initial deep reinforcement learning model to obtain the predicted action output by the initial deep reinforcement learning model. The latest position of the training drone is determined after the action state of the training drone is updated to the predicted action and interacts with the corresponding random map. 
Calculate the first distance between the latest position of the training drone and the corresponding training endpoint, and the second distance between the latest position of the training drone and each training target point; calculate the reward value of the initial deep reinforcement learning model based on the first distance and the second distance; update the parameters of the initial deep reinforcement learning model based on the reward value; when the training drone reaches the last training target point, return to the step of selecting the nearest training point from the training path based on the current position of the training drone in the corresponding random map, until the training drone reaches the training endpoint; use the initial deep reinforcement learning model obtained through training on each training path as the trained deep reinforcement learning model; input the state input data into the trained deep reinforcement learning model to obtain the action command output by the trained deep reinforcement learning model, and the action command is used to guide the drone to avoid obstacles.