Unmanned aerial vehicle control method, apparatus, and electronic device
The method acquires the UAV's visual perception information and flight state parameters, generates flight action parameters with a reinforcement learning model, and combines a multi-objective reward function with second-order smoothing constraints. This addresses the poor flight stability of UAVs in complex environments and enables high-precision, stable flight control.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SICHUAN AOSSCI TECHNOLOGY CO LTD
- Filing Date
- 2026-04-08
- Publication Date
- 2026-06-23
AI Technical Summary
In existing technologies, drones are difficult to control with high precision in complex environments, resulting in poor flight stability.
By acquiring the visual perception information and flight status parameters of the UAV, the trained reinforcement learning model is used to generate flight action parameters. The reward function is determined by combining the center alignment degree, tracking stability, safe distance constraints, observation scale constraints and flight stability, and second-order smoothing constraints are applied to generate flight control commands.
It improves the flight control precision and stability of UAVs in complex environments, enhances their resistance to interference, ensures that flight action parameters are within the limits of physical dynamics, and improves flight safety and stability.
Figure CN121995947B_ABST
Description
Technical Field
[0001] This application belongs to the field of unmanned aerial vehicle (UAV) technology, and in particular relates to a UAV control method, device, and electronic device.
Background Technology
[0002] Currently, with the development of artificial intelligence technology, the drone field has begun to introduce visual recognition technology to detect targets, and then control the drone's flight status based on the target detection results to achieve autonomous flight control. However, in complex environments, the control precision requirements for drones are high, and the flight status control in related technologies cannot guarantee the stability of drone flight, failing to meet the high-precision control needs of practical applications.
Summary of the Invention
[0003] In view of the shortcomings of the prior art, this application provides a drone control method, device and electronic device to solve the problem that drones cannot meet high-precision control in complex environmental scenarios, resulting in poor flight stability.
[0004] In a first aspect, this application provides a method for controlling an unmanned aerial vehicle (UAV), comprising:
[0005] Acquire visual perception information and flight status parameters of the drone;
[0006] Based on visual perception information and flight state parameters, the flight action parameters of the UAV at each decision moment are obtained through a trained reinforcement learning model; the reward function of the reinforcement learning model is jointly determined by center alignment degree, tracking stability, safe distance constraint, observation scale constraint and flight stability.
[0007] The flight control reliability of the target object is obtained by evaluating the flight control of the target object based on visual perception information.
[0008] If the flight control reliability meets the preset reliability, the flight action parameters are subject to second-order smoothing constraints. Based on the flight action parameters after second-order smoothing constraints, the flight control commands used to control the UAV are determined.
[0009] In one embodiment of this application, flight control evaluation of a target object is performed based on visual perception information to obtain the flight control reliability of the target object. This includes: determining the target trajectory stability index and target prediction deviation of the target object within a first preset time window based on visual perception information, and determining the mean target detection confidence score and the aspect ratio change rate of the target bounding box within the first preset time window; and performing flight control evaluation of the target object based on the target trajectory stability index, target prediction deviation, mean target detection confidence score, and aspect ratio change rate of the target bounding box to obtain the flight control reliability of the target object.
[0010] In one embodiment of this application, determining the target trajectory stability index and target prediction deviation of a target object within a first preset time window based on visual perception information includes: determining the overlap of target bounding boxes in adjacent image frames within the first preset time window based on visual perception information, and determining the target trajectory stability index based on the overlap; determining the historical trajectory of the target object within the first preset time window based on visual perception information, and determining the actual position of the target object in the current image frame; predicting the target prediction position of the target object in the current image frame based on the historical trajectory, and determining the target prediction deviation based on the target prediction position and the actual position.
[0011] In one embodiment of this application, the visual perception information includes: the relative position of the target and the target detection confidence level; the flight state parameters include: flight speed, flight attitude angle, and flight altitude; based on the visual perception information and the flight state parameters, the flight action parameters of the UAV at each decision moment are obtained through a trained reinforcement learning model, including: constructing a state space vector of reinforcement learning based on the relative position of the target, the target detection confidence level, the flight speed, the flight attitude angle, and the flight altitude; and obtaining the flight action parameters of the UAV at each decision moment through a trained reinforcement learning model based on the state space vector.
[0012] In one embodiment of this application, before obtaining the flight action parameters of the UAV at each decision moment through a trained reinforcement learning model based on visual perception information and flight state parameters, the method further includes: determining the sample object based on the sample perception information acquired by the UAV during flight; determining the center alignment degree, tracking stability, observation scale constraint, and safe distance constraint of the UAV during flight based on the sample object; and determining the flight stability of the UAV during flight. A reward function is then determined based on the center alignment degree, tracking stability, safe distance constraint, observation scale constraint, and flight stability.
[0013] In one embodiment of this application, determining the center alignment, tracking stability, observation scale constraint, and safe distance constraint of a UAV during flight based on a sample object includes: determining the offset between the sample object and the center of the sample image based on the center pixel coordinates of the sample object, the center coordinates of the sample image, and the height and width of the sample image; normalizing the offset; determining the center alignment based on the normalized offset and a preset adjustment parameter; determining the position change of the sample object in adjacent image frames within a second preset time window; determining the tracking stability based on the position change; obtaining the actual distance between the UAV and the sample object; determining the safe distance constraint using a logarithmic barrier function based on the actual distance, a preset minimum distance, and a preset maximum distance; and obtaining the actual area occupied by the sample object in the sample image; determining the observation scale constraint based on the actual area occupied and the preset area occupied.
[0014] In one embodiment of this application, determining the flight stability of a UAV during flight includes: acquiring the current flight control action and the flight control action at the previous moment of the UAV; and determining the flight stability based on the current flight control action and the flight control action at the previous moment.
[0015] In one embodiment of this application, determining flight control commands for controlling a drone based on the flight action parameters after second-order smoothing constraints includes: performing safety constraint processing on the flight action parameters after second-order smoothing constraints according to preset flight constraint conditions; and determining flight control commands for controlling the drone based on the flight action parameters after safety constraint processing.
[0016] In a second aspect, this application also provides a drone control device, comprising:
[0017] The parameter acquisition module is configured to acquire the UAV's visual perception information and flight status parameters;
[0018] The action decision module is configured to obtain the flight action parameters of the UAV at each decision moment based on visual perception information and flight state parameters through a trained reinforcement learning model; the reward function of the reinforcement learning model is jointly determined by center alignment degree, tracking stability, safe distance constraint, observation scale constraint and flight stability.
[0019] The control evaluation module is configured to perform flight control evaluation on the target object based on visual perception information to obtain the flight control reliability of the target object.
[0020] The flight control module is configured to apply second-order smoothing constraints to the flight action parameters when the flight control reliability meets the preset reliability, and to determine the flight control commands used to control the UAV based on the flight action parameters after the second-order smoothing constraints.
[0021] In a third aspect, this application also provides an electronic device, including: one or more processors and a memory, wherein a computer program is stored in the memory, and when the one or more processors execute the computer program, the electronic device performs the steps of the above-described drone control method.
[0022] The beneficial effects of this technical solution are as follows. First, the visual perception information and flight state parameters of the UAV are acquired, and a trained reinforcement learning model uses them to obtain the flight action parameters of the UAV at each decision moment, so that the model perceives the complex environment and the UAV's real-time state more comprehensively. Second, the reward function of the reinforcement learning model is jointly determined by center alignment degree, tracking stability, safe distance constraint, observation scale constraint, and flight stability, so the model optimizes multiple objectives related to flight quality and safety simultaneously during training. Through this multi-objective reward mechanism, the model learns to balance these indicators in complex environments, generating more accurate flight action parameters and improving the flight control accuracy and stability of the UAV. Finally, flight control evaluation of the target object is performed based on the visual perception information to obtain the flight control reliability. When the flight control reliability meets the preset reliability, second-order smoothing constraints are applied to the flight action parameters, and the flight control commands used to control the UAV are determined from the flight action parameters after the second-order smoothing constraints. This further enhances the UAV's resistance to interference, ensures flight stability and safety in complex environments, and keeps the second derivative of the flight action parameter sequence within the physical dynamic limits of the UAV, thereby improving control accuracy in complex environmental scenarios.
[0023] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.
Attached Figure Description
[0024] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application. It is obvious that the drawings described below are merely some embodiments of this application, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort. In the drawings:
[0025] Figure 1 is a schematic flowchart of the UAV control method according to an exemplary embodiment of this application;
[0026] Figure 2 is a schematic diagram of the structure of a drone control device according to an exemplary embodiment of this application;
[0027] Figure 3 is a schematic diagram of the structure of a computer system suitable for implementing the electronic device of this application.
Detailed Implementation
[0028] The embodiments of this application will be described below with reference to the accompanying drawings and preferred embodiments. Those skilled in the art can easily understand other advantages and effects of this application from the content disclosed in this specification. This application can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of this application. It should be understood that the preferred embodiments are only for illustrating this application and are not intended to limit the scope of protection of this application.
[0029] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of this application. Therefore, the drawings only show the components related to this application and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the shape, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.
[0030] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present application. However, it will be apparent to those skilled in the art that embodiments of the present application may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the present application.
[0031] Please refer to Figure 1, which is a flowchart illustrating an exemplary embodiment of the drone control method of this application. As shown in Figure 1, in an exemplary embodiment, the drone control method includes steps S110 to S140, each of which is described in detail below.
[0032] S110: Acquire the visual perception information and flight status parameters of the UAV.
[0033] It can be understood that visual perception information refers to the structured information that the drone obtains during flight by collecting environmental image data through onboard visual sensors and analyzing it with algorithms such as image processing, target detection, feature extraction, and depth estimation. This information characterizes the flight environment and the state of target objects, and includes target object category, position coordinates, size, motion parameters, obstacle distribution, terrain features, relative distance, and attitude-related information.
[0034] Flight status parameters are the parameters describing the flight state of the UAV, as determined by its flight control system during flight, including the UAV's position, altitude, flight speed, acceleration, pitch angle, roll angle, yaw angle, and rate of attitude change.
[0035] During drone flight, the onboard computing unit continuously receives visual perception information from the onboard visual sensors, and simultaneously acquires real-time flight status parameters returned by the flight control system. This information is used to characterize the drone's environmental perception capabilities and flight status during flight.
[0036] S120: Obtain the flight action parameters of the UAV at each decision moment through a trained reinforcement learning model based on visual perception information and flight status parameters.
[0037] The reward function of the reinforcement learning model is determined by the center alignment degree, tracking stability, safe distance constraint, observation scale constraint, and flight stability.
[0038] It can be understood that the reinforcement learning model is a pre-trained model that outputs the flight action parameters of the UAV at each decision moment based on the input visual perception information and flight state parameters.
[0039] Furthermore, since the reward function of the reinforcement learning model is jointly determined by the center alignment degree, tracking stability, safe distance constraint, observation scale constraint, and flight stability, the trained reinforcement learning model can continuously optimize the center alignment degree, tracking stability, safe distance constraint, observation scale constraint, and flight stability of the UAV during flight when outputting the flight action parameters of the UAV at each decision moment. This ensures that the various indicators are balanced in complex environments, thereby generating more accurate flight action parameters and improving the flight control accuracy and stability of the UAV in complex environments.
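As a rough illustration of how the five reward terms might be combined, the following Python sketch uses assumed functional forms. The exponential center-alignment term, the exact shape of the logarithmic barrier, the area-ratio penalty, the squared action difference, the weights, and every function and parameter name are hypothetical choices made for illustration; this passage does not specify them.

```python
import math

def center_alignment(cx, cy, img_w, img_h, k=4.0):
    """Value in (0, 1]; equals 1 when the target sits at the image center.
    k is a hypothetical adjustment parameter."""
    dx = (cx - img_w / 2.0) / img_w   # offset normalized by image width
    dy = (cy - img_h / 2.0) / img_h   # offset normalized by image height
    return math.exp(-k * math.hypot(dx, dy))

def tracking_stability(centers):
    """Negative mean displacement of the target between adjacent frames."""
    if len(centers) < 2:
        return 0.0
    steps = [math.dist(a, b) for a, b in zip(centers, centers[1:])]
    return -sum(steps) / len(steps)

def safe_distance(d, d_min, d_max):
    """Log-barrier term: finite inside (d_min, d_max), -inf outside."""
    if not d_min < d < d_max:
        return float("-inf")
    return math.log(d - d_min) + math.log(d_max - d)

def observation_scale(area_ratio, preset_ratio):
    """Penalize deviation of the occupied image area from the preset value."""
    return -abs(area_ratio - preset_ratio)

def flight_stability(action, prev_action):
    """Penalize large changes between consecutive control actions."""
    return -sum((a - b) ** 2 for a, b in zip(action, prev_action))

def reward(terms, weights):
    """Weighted sum of the individual reward terms (weights are assumed)."""
    return sum(w * t for w, t in zip(weights, terms))
```

A weighted sum is only one way to combine the terms; it is used here because it makes the trade-off between objectives explicit through the weights.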
[0040] S130: Perform flight control evaluation of the target object based on visual perception information to obtain the flight control reliability of the target object.
[0041] Here, flight control reliability characterizes how reliably the target object exists continuously and holds a stable spatial position over a continuous period of time.
[0042] After obtaining the flight action parameters of the UAV at each decision moment, the flight control reliability of the target object can be evaluated. This ensures that in complex environments, the UAV can dynamically adjust its behavior strategy based on the flight control reliability, enhancing its adaptability to complex scenarios. It should be noted that the specific implementation method for evaluating the flight control reliability of the target object based on visual perception information will be described in detail in subsequent embodiments, and will not be elaborated upon here.
[0043] S140: When the flight control reliability meets the preset reliability, apply second-order smoothing constraints to the flight action parameters, and determine the flight control commands used to control the UAV based on the flight action parameters after the second-order smoothing constraints.
[0044] The preset reliability is a pre-set reliability threshold. Its specific value can be set according to the application scenario of the drone, and this application does not impose specific limitations on it.
[0045] If the flight control reliability is greater than or equal to the preset reliability, the flight control reliability meets the preset reliability: the target object exists continuously within the continuous time window and its spatial position is relatively stable. The target object is determined to be a reliable target, which can trigger subsequent flight control decisions for the UAV.
[0046] If the flight control reliability is less than the preset reliability, the flight control reliability does not meet the preset reliability: the target object does not exist continuously within the continuous time window, or its spatial position is unstable. Non-real targets such as background, interfering objects, or noise may have been mistakenly identified as the target object. In this case, the UAV flight control decision is not triggered, and the subsequent steps are not executed.
[0047] It can be understood that second-order smoothing constraints map the flight action parameters output by the reinforcement learning model into safe, executable actions that conform to the physical and dynamic limits of the UAV. The constraints take the second derivatives of the flight action parameters (translational acceleration, attitude angular acceleration) as the core constraint object, and use the maximum dynamic parameters calibrated at the UAV's factory as hard constraint thresholds. Through smoothing operations, the second derivatives are constrained not to exceed the physical execution limits, avoiding problems such as sudden action changes, amplitude overshoot, and high-frequency oscillations caused by the reinforcement learning algorithm pursuing task gains. The constraints force the flight action parameter sequence to be second-order continuously differentiable, eliminating command jumps and attitude jitter. This preserves the task adaptability of the reinforcement learning model while avoiding safety risks such as motor overload, excessive fuselage stress, and loss of flight control, achieving a safe connection between decision-making and execution and ensuring stable flight of the UAV.
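The constraint can be illustrated with a discrete-time sketch. It assumes that the second derivative is approximated by a second finite difference over the two most recently issued commands, that the decision interval dt is fixed, and that a per-channel limit on the second derivative is known. The function name and all parameters are hypothetical, and this is not presented as the application's own implementation.

```python
def smooth_second_order(raw, prev, prev2, dt, max_second_deriv):
    """Clamp the discrete second derivative of each action channel.

    raw              -- action proposed by the policy at the current step
    prev, prev2      -- commands issued at the previous two steps
    dt               -- decision interval in seconds (assumed constant)
    max_second_deriv -- per-channel limit on the second derivative (assumed)
    """
    smoothed = []
    for a, u1, u2, limit in zip(raw, prev, prev2, max_second_deriv):
        # Second finite difference approximates the second derivative.
        second = (a - 2.0 * u1 + u2) / (dt * dt)
        # Hard clamp to the assumed physical limit.
        second = max(-limit, min(limit, second))
        # Rebuild the command from the clamped second derivative.
        smoothed.append(2.0 * u1 - u2 + second * dt * dt)
    return smoothed
```

When the proposed action already respects the limit, the function returns it unchanged; otherwise the command moves toward the proposed action only as fast as the limit allows.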
[0048] According to the technical solution provided in this application, first, the visual perception information and flight state parameters of the UAV are acquired, and a trained reinforcement learning model uses them to obtain the flight action parameters of the UAV at each decision moment, so that the model perceives the complex environment and the real-time state of the UAV itself more comprehensively. Second, the reward function of the reinforcement learning model is jointly determined by center alignment degree, tracking stability, safe distance constraint, observation scale constraint, and flight stability, so the model optimizes multiple objectives related to flight quality and safety simultaneously during training. Through the multi-objective reward mechanism, the model learns to balance these indicators in complex environments, generating more accurate flight action parameters and improving the flight control accuracy and stability of the UAV. Finally, flight control evaluation of the target object is performed based on the visual perception information to obtain the flight control reliability of the target object. When the flight control reliability meets the preset reliability, second-order smoothing constraints are applied to the flight action parameters, and the flight control commands for controlling the UAV are determined from the flight action parameters after the second-order smoothing constraints. This further enhances the UAV's resistance to interference, ensures flight stability and safety in complex environments, and keeps the second derivative of the flight action parameter sequence within the physical dynamic limits of the UAV, thereby improving control accuracy in complex environmental scenarios.
[0049] In some embodiments, flight control evaluation of a target object is performed based on visual perception information to obtain the flight control reliability of the target object, including:
[0050] Based on visual perception information, the target trajectory stability index and target prediction deviation of the target object within the first preset time window are determined, together with the mean target detection confidence and the aspect ratio change rate of the target bounding box within the first preset time window. Based on the target trajectory stability index, target prediction deviation, mean target detection confidence, and aspect ratio change rate of the target bounding box, the flight control of the target object is evaluated to obtain the flight control reliability of the target object.
[0051] Target prediction deviation characterizes the deviation between the actual position and the predicted position of a target object in the current image frame. Target detection confidence characterizes the probability that a target object is correctly detected; a higher target detection confidence indicates a more reliable target object recognition result. The target trajectory stability index characterizes the overlap of the bounding boxes of the target object within a continuous time window.
[0052] It can be understood that traditional flight control relies solely on single-frame detection scores or the overlap of target bounding boxes in consecutive frames when determining whether a target object can be used for flight control. When there are changes in lighting, background occlusion, or similar interference, the bounding box of the target object is prone to instantaneous jumps or flickering, causing the flight control system to experience severe jitter or false obstacle avoidance due to false control commands.
[0053] Therefore, this embodiment introduces flight control reliability assessment. The flight control of the target object is evaluated based on the target trajectory stability index, target prediction deviation, target detection confidence mean, and the aspect ratio change rate of the target bounding box. This eliminates instantaneous, accidental, and unstable false detection targets and only responds to real and stable targets, thereby avoiding abnormal triggering of the flight control system due to false alarms and improving flight safety and reliability.
[0054] It can be understood that, after visual perception information is acquired during the flight of a drone, the information it contains can be extracted. Specifically, target detection can be performed on the visual perception information to identify and locate target objects in the images captured by the drone.
[0055] In some examples, the drone's onboard vision sensor can acquire aerial image sequences in real time and input them into an onboard computing unit. This unit can run a pre-trained object detection neural network, first preprocessing each frame of the image, including denoising, scale normalization, and grayscale correction. Then, inference detection is performed: through forward inference of the model, combined with parallel confidence threshold filtering and non-maximum suppression, the semantic information and geometric features of the target are output. Finally, the target detection results are output, including: the target object category, the center and size of the target object's bounding box, and the target object's detection confidence score. Then, multiple target detection confidence scores of the target object within a first preset time window are obtained, and the mean target detection confidence score is calculated based on these multiple scores.
[0056] Furthermore, the aspect ratio change rate of the target object's bounding box within the first preset time window can be determined. Specifically, the aspect ratio AR = W / H of the target bounding box can be determined from the width and height of the bounding box, where W is the width and H is the height. Then, the aspect ratio change rate of the target object's bounding box within the first preset time window is determined based on the target bounding box's aspect ratio. It can be understood that the shape of a real target object changes continuously within a short period of time. Therefore, the lower the aspect ratio change rate of the target bounding box, the more reliable the target object recognition result.
[0057] Then, using the reliability calculation formula, the flight control of the target object is evaluated based on the target trajectory stability index, target prediction deviation, mean target detection confidence, and the aspect ratio change rate of the target bounding box, to obtain the flight control reliability of the target object. The reliability calculation formula is as follows:
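The two window statistics can be sketched in Python as follows. The mean confidence is a plain average. The change rate is taken here, as an assumption, to be the mean relative change of AR = W / H between adjacent frames; the text does not fix the exact definition, and the function names are hypothetical.

```python
def mean_confidence(scores):
    """Average of the detection confidence scores in the time window."""
    return sum(scores) / len(scores)

def aspect_ratio_change_rate(boxes):
    """Mean relative change of AR = W / H between adjacent frames.

    boxes -- list of (width, height) of the target bounding box per frame.
    The relative-change definition is an assumption made for this sketch.
    """
    ratios = [w / h for w, h in boxes]
    if len(ratios) < 2:
        return 0.0
    changes = [abs(b - a) / a for a, b in zip(ratios, ratios[1:])]
    return sum(changes) / len(changes)
```

A box whose shape barely changes across the window yields a rate near zero, which matches the text's reading that a lower rate indicates a more reliable recognition result.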
[0058]
[0059] Where C represents the flight control reliability. The remaining terms are the mean target detection confidence, the target trajectory stability index, the target prediction deviation, and the aspect ratio change rate of the target bounding box, together with their respective weighting coefficients.
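The formula itself is not reproduced in this text, so the following Python sketch shows only one possible weighted combination of the four indicators, followed by the threshold check from step S140. The weights, the normalization of the deviation and change-rate terms, the scale dev_scale, and the function names are all assumptions and should not be read as the application's formula.

```python
def flight_control_reliability(mean_conf, stability, pred_dev, ar_rate,
                               weights=(0.4, 0.3, 0.2, 0.1), dev_scale=50.0):
    """Hypothetical weighted score in [0, 1]; higher means more reliable.

    mean_conf -- mean detection confidence in [0, 1]
    stability -- trajectory stability index in [0, 1]
    pred_dev  -- prediction deviation in pixels (lower is better)
    ar_rate   -- aspect ratio change rate (lower is better)
    """
    w1, w2, w3, w4 = weights
    # Map "lower is better" quantities to [0, 1] scores (assumed mapping).
    dev_score = max(0.0, 1.0 - pred_dev / dev_scale)
    ar_score = max(0.0, 1.0 - ar_rate)
    return w1 * mean_conf + w2 * stability + w3 * dev_score + w4 * ar_score

def meets_preset_reliability(c, preset):
    """The target is treated as reliable when C >= the preset threshold."""
    return c >= preset
```

The comparison uses "greater than or equal to", following the description of the preset reliability in step S140.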
[0060] According to the technical solution provided in the embodiments of this application, the flight control reliability of the target object is comprehensively evaluated from four dimensions: target trajectory stability index, target prediction deviation, target detection confidence mean, and the aspect ratio change rate of the target bounding box. This can ensure the accuracy and robustness of the detection results, improve the accuracy of target detection, enhance the adaptability of the UAV in complex scenarios, and avoid false detections or false alarms that trigger flight control.
[0061] In some embodiments, determining the target trajectory stability index and target prediction deviation of the target object within a first preset time window based on visual perception information includes:
[0062] Based on visual perception information, the overlap of target bounding boxes in adjacent image frames within the first preset time window is determined, and the target trajectory stability index is determined based on the overlap. Based on visual perception information, the historical trajectory of the target object within the first preset time window and the actual position of the target object in the current image frame are determined. Based on the historical trajectory, the target predicted position of the target object in the current image frame is predicted, and the target prediction deviation is determined based on the target predicted position and the actual position.
[0063] Specifically, based on visual perception information, the overlap of the bounding boxes of the target object in adjacent image frames within the first preset time window is determined. For example, the overlap of the bounding boxes in the first and second frames is calculated, then the overlap in the second and third frames, then in the third and fourth frames, and so on, until the overlap has been calculated for every pair of adjacent image frames within the first preset time window. These overlaps are then used to characterize the positional stability of the target object, yielding a target trajectory stability index that reflects the motion pattern of the target object across consecutive frames.
[0064] In some examples, the target trajectory stability index can be calculated from the overlap of the bounding boxes of the target object in adjacent image frames using a stability index formula. The formula is shown below:
[0065]
[0066] Where N is the total number of image frames, and the remaining terms are the bounding box of the target object in frame t, the bounding box of the target object in frame t+1, and the overlap between the bounding boxes of the target object in frame t and frame t+1.
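The formula is not reproduced in this text. One reading consistent with the definitions above is the mean intersection-over-union (IoU) over the N-1 adjacent frame pairs, which the following Python sketch computes. Both the averaging and the use of IoU as the overlap measure are assumptions, as are the function names and the (x1, y1, x2, y2) box layout.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def trajectory_stability(boxes):
    """Mean IoU over adjacent frame pairs (assumed form of the index).

    boxes -- bounding box of the target in each of the N frames, in order.
    """
    if len(boxes) < 2:
        return 0.0
    overlaps = [iou(p, q) for p, q in zip(boxes, boxes[1:])]
    return sum(overlaps) / len(overlaps)
```

Under this reading the index lies in [0, 1], and a box that jumps or flickers between frames pulls it toward zero.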
[0067] Then, the historical trajectory of the target object within the first preset time window is determined, as well as the actual position of the target object in the current image frame.
[0068] The target predicted position of the target object in the current image frame is predicted based on the historical trajectory, for example by linear prediction or Kalman filtering. The target prediction deviation is then determined from the target predicted position and the actual position of the target object in the current image frame.
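A minimal sketch of the linear prediction option follows. It assumes a constant-velocity extrapolation from the last two trajectory points and a Euclidean distance as the deviation; both are assumptions for illustration, and a Kalman filter could be substituted for the predictor.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position of the target in the image


def predict_linear(history: List[Point]) -> Point:
    """Constant-velocity extrapolation from the last two history points (assumed model)."""
    if len(history) < 2:
        raise ValueError("at least two history points are needed")
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def prediction_deviation(history: List[Point], actual: Point) -> float:
    """Euclidean distance between the predicted and actual positions (assumed metric)."""
    px, py = predict_linear(history)
    return math.hypot(actual[0] - px, actual[1] - py)
```

For a target that keeps moving in a straight line at constant speed, the deviation is zero; an abrupt jump of the detection produces a large deviation.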
[0069] In some embodiments, the visual perception information includes: the target's relative position and the target detection confidence; the flight state parameters include: flight speed, flight attitude angle, and flight altitude.
[0070] Obtaining the flight action parameters of the UAV at each decision moment through a trained reinforcement learning model based on the visual perception information and the flight state parameters includes: constructing a state space vector for reinforcement learning based on the target relative position, target detection confidence, flight speed, flight attitude angle, and flight altitude; and obtaining the flight action parameters of the UAV at each decision moment through the trained reinforcement learning model based on the state space vector.
[0071] Among them, the target relative position represents the positional relationship between the target object and the center of the UAV's view; the flight status parameters include flight speed, flight attitude angle and flight altitude, which can be obtained in real time through the flight control system.
[0072] Furthermore, the parameters obtained from target detection (the target relative position and the target detection confidence) and the acquired flight state parameters (flight speed, flight attitude angle, and flight altitude) are concatenated to obtain the reinforcement learning state space vector. This state space vector enables the reinforcement learning model to perceive both the overall environmental state and the UAV's own flight state, thereby generating more accurate flight control decisions.
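The concatenation described above can be sketched as follows. The field layout is an assumption for illustration: the relative position is taken as a two-component offset and the attitude angle as a (roll, pitch, yaw) triple, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    rel_pos: Tuple[float, float]          # target relative position (assumed 2-D)
    confidence: float                     # target detection confidence
    speed: float                          # flight speed
    attitude: Tuple[float, float, float]  # assumed (roll, pitch, yaw)
    altitude: float                       # flight altitude


def build_state(obs: Observation) -> List[float]:
    """Concatenate perception and flight-state values into one state vector."""
    return [*obs.rel_pos, obs.confidence, obs.speed, *obs.attitude, obs.altitude]
```

With this layout the state vector has eight components, which would then be passed to the policy network of the trained model.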
[0073] Specifically, in the actual execution process, after determining the state space vector, it is input into the trained reinforcement learning model. The policy network inside the model will perform feature extraction, state recognition and decision reasoning on the state space vector. Combined with the guiding logic of the reward function, it will output the flight action parameters of the UAV at each decision moment.
[0074] In addition, the relative position of the target is determined based on visual perception information, including: determining the center pixel coordinates of the target object and the center coordinates of the image based on visual perception information; and determining the relative position of the target based on the center pixel coordinates of the target object, the center coordinates of the image, and the height and width of the image.
[0075] It can be understood that the center pixel coordinates of the target object represent its specific location in the image, while the image center coordinates represent the center point of the drone's field of view. The difference between the two gives the offset of the target object relative to the center of the drone's field of view, which describes the positional relationship between the target and that center.
[0076] Continuing from the previous example, the target detection result can be obtained based on the visual perception information. Then, the center pixel coordinates of the target object in the current image are determined based on the target detection result, and the image center coordinates of the current image are also determined.
[0077] Then, using the relative position calculation formula, the relative position of the target is determined based on the center pixel coordinates of the target object, the center coordinates of the image, and the height and width of the image. The relative position calculation formula is as follows:
[0078]
[0079] The formula yields the relative position of the target from the center pixel coordinates of the target object and the coordinates of the image center, where W and H are the width and height of the current image.
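One plausible form of the relative position calculation is sketched below. It assumes that the pixel offset from the image center is divided by the image width and height respectively, so that each component lies in [-0.5, 0.5]; the exact normalization used in the application is not recoverable from the text.

```python
from typing import Tuple


def relative_position(u: float, v: float, cx: float, cy: float,
                      W: float, H: float) -> Tuple[float, float]:
    """Offset of the target center (u, v) from the image center (cx, cy),
    scaled by image width W and height H (assumed normalization)."""
    if W <= 0 or H <= 0:
        raise ValueError("image width and height must be positive")
    return ((u - cx) / W, (v - cy) / H)
```

For a 640 x 480 image, a target centered at pixel (480, 120) gives a relative position of (0.25, -0.25), meaning right of and above the image center in image coordinates.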
[0080] This approach fully utilizes the pixel information and geometric characteristics of images, reducing positioning errors caused by external interference in traditional methods and enhancing the system's robustness. Furthermore, by incorporating the target's relative position, the UAV can achieve precise tracking and positioning of the target object. This improves the UAV's perception accuracy in complex environments, thereby enhancing its adaptability and stability in such environments.
[0081] According to the technical solution provided in the embodiments of this application, a state space vector for reinforcement learning is constructed based on the target's relative position, target detection confidence, flight speed, flight attitude angle, and flight altitude. Based on the state space vector, the flight action parameters of the UAV at each decision moment are obtained through the trained reinforcement learning model. This enables the reinforcement learning model to comprehensively perceive both the environmental state and the UAV's own flight state.
[0082] In some embodiments, before obtaining the flight action parameters of the UAV at each decision moment through a trained reinforcement learning model based on visual perception information and flight state parameters, the method further includes:
[0083] The sample objects are determined based on the sample perception information acquired by the UAV during flight. Based on the sample objects, the center alignment degree, tracking stability, observation scale constraints, and safe distance constraints of the UAV during flight are determined, as well as the flight stability of the UAV during flight. Based on the center alignment degree, tracking stability, safe distance constraints, observation scale constraints, and flight stability, the reward function is determined.
[0084] It can be understood that, to make the flight action parameters output by the reinforcement learning model more accurate and to ensure that the UAV can stably track the target object and maintain stable flight, a reward function can be constructed. The policy network of the reinforcement learning model to be trained is then trained and optimized using this reward function to obtain the trained reinforcement learning model.
[0085] The specific training process is as follows: based on the sample perception information and the sample flight state parameters of the UAV, the sample state space vector is determined. The sample state space vector is then input into the reinforcement learning model to be trained. During the process of the UAV tracking the sample object, the policy network of the reinforcement learning model to be trained is optimized by the constructed reward function (including center alignment degree, tracking stability, safe distance constraint, observation scale constraint, and flight stability) to obtain the trained reinforcement learning model.
[0086] Among them, center alignment measures whether the sample object is located in the center of the image: the closer the sample object is to the center of the image, the larger the reward value, and the farther it is from the center, the smaller the reward value. Tracking stability measures how steadily the UAV keeps tracking the sample object, avoiding deviation caused by external interference or target movement. The safe distance constraint ensures that the UAV maintains a reasonable distance from the sample object during flight to prevent collisions or dangerous approaches: if the UAV stays within the safe distance range, the reward value is larger, and if the distance falls outside that range, the reward value is smaller. The observation scale constraint ensures that the sample object in the image is neither too large nor too small, so that it stays within the golden resolution range of the detector.
[0087] Flight stability characterizes the drone's flight state. To reduce unnecessary shaking or drastic adjustments, the more stable the drone's flight, the higher the reward value. Flight stability can be quantified by setting specific thresholds to measure the drone's shaking amplitude and adjustment frequency. If the drone maintains a small shaking amplitude and a low adjustment frequency (not exceeding the specific threshold) during flight, the reward value is higher; conversely, if the drone makes frequent drastic adjustments or has a large shaking amplitude (exceeding the specific threshold), the reward value is reduced accordingly.
[0088] In some examples, a reward function determination formula can be used, based on factors such as center alignment, tracking stability, safe distance constraints, observation scale constraints, and flight stability, to determine the reward function. The formula for determining the reward function is shown below:
[0089]
[0090] Where R is the reward function, A is the center alignment degree, T is the tracking stability, F is the flight stability, D is the safe distance constraint, and H is the observation scale constraint. The remaining symbols in the formula are the weighting coefficients of the respective terms.
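As a non-limiting illustration, the multi-objective reward can be sketched as a weighted sum of the five terms. The additive form and the default weights of 1.0 are assumptions; the text only states that weighting coefficients are applied.

```python
from typing import Sequence


def reward(A: float, T: float, F: float, D: float, H: float,
           weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the reward terms (assumed additive form).

    A: center alignment, T: tracking stability, F: flight stability,
    D: safe distance constraint, H: observation scale constraint.
    weights: hypothetical coefficients in the order (A, T, F, D, H).
    """
    if len(weights) != 5:
        raise ValueError("five weights are required")
    w_a, w_t, w_f, w_d, w_h = weights
    return w_a * A + w_t * T + w_f * F + w_d * D + w_h * H
```

Raising the weight of one term makes the trained policy favor that objective, for example a larger weight on D for more conservative distance keeping.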
[0091] According to the technical solution provided in the embodiments of this application, by incorporating center alignment, tracking stability, safe distance constraints, observation scale constraints, and flight stability into the reward function, and training and optimizing the policy network of the reinforcement learning model to be trained, the autonomous control capability of the UAV in complex environments can be improved. Simultaneously, through continuous optimization of the policy network, the UAV can dynamically adjust its behavior patterns in constantly changing environments, further enhancing its adaptability and robustness.
[0092] In some embodiments, determining the center alignment, tracking stability, observation scale constraints, and safe distance constraints of the UAV during flight based on the sample object includes:
[0093] Based on the center pixel coordinates of the sample object, the center coordinates of the sample image, and the height and width of the sample image, the offset between the sample object and the center of the sample image is determined, and the offset is normalized. Based on the normalized offset and preset adjustment parameters, the degree of center alignment is determined. Within a second preset time window, the positional changes of the sample object in adjacent image frames are determined, and the tracking stability is determined based on the positional changes. The actual distance between the UAV and the sample object is obtained, and the safe distance constraint is determined using the logarithmic barrier function based on the actual distance, the preset minimum distance, and the preset maximum distance. The actual area occupied by the sample object in the sample image is obtained, and the observation scale constraint is determined based on the actual area occupied and the preset area occupied.
[0094] First, the offset can be calculated using the offset formula. Based on the center pixel coordinates of the sample object, the center coordinates of the sample image, and the height and width of the sample image, the offset between the sample object and the center of the sample image can be determined. The offset calculation formula is as follows:
[0095]
[0096] Where δ is the offset, which is computed from the center pixel coordinates of the sample object and the center coordinates of the sample image, and W and H are the width and height of the sample image, respectively.
[0097] Then, the offset is normalized to map it to a preset range, such as limiting the offset to between 0 and 1. This effectively reduces the impact of differences in image size or resolution.
[0098] Next, the alignment formula can be used to determine the degree of center alignment based on the normalized offset and preset adjustment parameters. The alignment formula is as follows:
[0099]
[0100] Where A represents the degree of center alignment, which is computed from the normalized offset and the preset adjustment parameters.
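The offset, normalization, and alignment steps above can be sketched together as follows. The Euclidean offset, the normalization by the largest possible offset (target at an image corner), the exponential mapping, and the parameters `alpha` and `beta` are all hypothetical choices made for illustration.

```python
import math


def center_alignment(u: float, v: float, cx: float, cy: float,
                     W: float, H: float,
                     alpha: float = 4.0, beta: float = 2.0) -> float:
    """Center alignment degree in (0, 1]; 1 means the object is at the image center.

    alpha and beta stand in for the preset adjustment parameters (hypothetical).
    """
    dx = (u - cx) / W
    dy = (v - cy) / H
    delta = math.hypot(dx, dy)                       # offset from the image center
    delta_norm = min(1.0, delta / math.sqrt(0.5))    # normalized to [0, 1]
    return math.exp(-alpha * delta_norm ** beta)     # assumed decay shape
```

With these defaults the alignment is 1 at the image center and decreases monotonically as the object moves toward the image border.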
[0101] Second, within the second preset time window, determine the positional changes of sample objects in adjacent image frames, and determine the tracking stability based on the positional changes;
[0102] Specifically, the length of the second preset time window can be set to n. In this case, the tracking stability of the UAV can be determined based on the positional changes of the sample objects in adjacent image frames. Specifically, the stability evaluation formula can be used to determine the positional changes of the sample objects in adjacent image frames within the preset time window, and the tracking stability can be determined based on the positional changes. The stability evaluation formula is as follows:
[0103]
[0104]
[0105] Where T represents the tracking stability, computed from the position of the sample object in the k-th frame and its position in the (k-1)-th frame. When the positional change between the k-th frame and the (k-1)-th frame is small, tracking stability is high; when the positional change between the two frames is large, tracking stability is low.
[0106] In addition, the limits of the summation define its range of values: n is the length of the time window and t is the time of the current frame, so the variable k starts at the beginning of the time window and runs up to t.
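A minimal sketch of a tracking stability score consistent with the description above is given below. It assumes that the frame-to-frame displacement is averaged over the last n frame pairs and mapped to (0, 1] by 1 / (1 + mean displacement); the mapping is a hypothetical choice, since the original formula is not reproduced in the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tracking_stability(positions: List[Point], n: int) -> float:
    """Stability over the last n frame-to-frame displacements (assumed mapping)."""
    window = positions[-(n + 1):]  # n displacements need n + 1 positions
    if n < 1 or len(window) < 2:
        raise ValueError("need n >= 1 and at least two positions")
    steps = [math.hypot(cur[0] - prev[0], cur[1] - prev[1])
             for prev, cur in zip(window[:-1], window[1:])]
    mean_step = sum(steps) / len(steps)
    return 1.0 / (1.0 + mean_step)
```

A stationary object gives a score of 1, and larger positional changes between adjacent frames push the score toward 0.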
[0107] Third, to ensure flight safety, it is necessary to maintain a safe distance between the UAV and the sample object during flight. The safe distance constraint can effectively prevent the UAV from getting too close to the sample object and causing a collision risk. At the same time, a logarithmic barrier function is introduced so that the reward value of the UAV drops sharply as it approaches the minimum safe distance, thus constructing a virtual repulsive potential field.
[0108] In one embodiment of this application, the safe distance constraint is determined using the logarithmic barrier function based on the actual distance, the preset minimum distance, and the preset maximum distance as follows:
[0109]
[0110] Where D represents the safe distance constraint, which is computed from the actual distance between the UAV and the sample object, the preset minimum distance, and the preset maximum distance.
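As an illustration of how a logarithmic barrier can act as a virtual repulsive field, the sketch below uses the standard two-sided barrier log(d - d_min) + log(d_max - d), shifted so that its maximum (at the midpoint of the range) is 0. The shift, the clamping value `outside_penalty`, and the function name are assumptions rather than the formula of the application.

```python
import math


def safe_distance_term(d: float, d_min: float, d_max: float,
                       outside_penalty: float = -10.0) -> float:
    """Two-sided logarithmic barrier on the distance d (assumed form).

    Returns 0 at the midpoint of (d_min, d_max) and decreases without bound
    toward either limit; values are clamped at outside_penalty (hypothetical).
    """
    if d_min >= d_max:
        raise ValueError("d_min must be smaller than d_max")
    if not d_min < d < d_max:
        return outside_penalty
    half_range = (d_max - d_min) / 2.0
    value = math.log(d - d_min) + math.log(d_max - d) - 2.0 * math.log(half_range)
    return max(outside_penalty, value)
```

The closer the UAV gets to either preset limit, the more negative the term becomes, so a policy trained with it is pushed back toward the middle of the safe range.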
[0111] Fourth, during flight, a larger sample object in the image is not always better; the sample object should instead fall within the golden resolution range of the detector.
[0112] In one embodiment of this application, the actual area of the sample object in the sample image is obtained, and the observation scale constraint is determined based on the actual area and the preset area as follows:
[0113]
[0114] Where the formula gives the observation scale constraint in terms of the preset area proportion, the actual area proportion, and the preset adjustment parameters.
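One way to express such a scale constraint is a bell-shaped score that peaks when the actual area proportion equals the preset one. The Gaussian shape and the parameters `target_ratio` and `sigma` are hypothetical stand-ins for the preset area proportion and the preset adjustment parameters.

```python
import math


def observation_scale_term(box_w: float, box_h: float,
                           img_w: float, img_h: float,
                           target_ratio: float = 0.1,
                           sigma: float = 0.05) -> float:
    """Score in (0, 1] that peaks when the object fills target_ratio of the image."""
    if img_w <= 0 or img_h <= 0 or sigma <= 0:
        raise ValueError("image size and sigma must be positive")
    ratio = (box_w * box_h) / (img_w * img_h)  # actual area proportion
    return math.exp(-((ratio - target_ratio) / sigma) ** 2)
```

Both an object that is too small and one that is too large in the image receive a lower score than an object at the preset proportion.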
[0115] In some embodiments, determining the flight stability of the drone during flight includes:
[0116] Obtain the current flight control actions and the previous flight control actions of the UAV; determine flight stability based on the current flight control actions and the previous flight control actions.
[0117] Specifically, to avoid violent maneuvers by the drone, this embodiment applies smooth constraints to the flight control actions. By calculating the difference between the current flight control actions and the flight control actions at the previous moment, the flight stability of the drone is evaluated.
[0118] In one embodiment of this application, flight stability can be determined using a stability evaluation formula based on the current flight control action and the flight control action at the previous moment, wherein the stability evaluation formula is as follows:
[0119]
[0120] Where F represents flight stability, computed from the current flight control action and the flight control action at the previous moment.
[0121] It can be understood that the above formula measures the degree of change of the current flight control action relative to the flight control action at the previous moment and maps it to a similarity or weight value between 0 and 1 that represents flight stability.
[0122] The smaller the change in flight control actions, the closer the value of F is to 1, and the greater the reward value. Conversely, the larger the change in flight control actions, the closer the value of F is to 0, and the smaller the reward value.
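A mapping with the behavior described above can be sketched as an exponential of the negative action change. The Euclidean norm and the exponential form are assumptions; they are chosen only because they yield a value of 1 for an unchanged action and a value approaching 0 for a large change.

```python
import math
from typing import Sequence


def flight_stability(current: Sequence[float], previous: Sequence[float]) -> float:
    """Map the change between consecutive control actions to (0, 1] (assumed form)."""
    if len(current) != len(previous):
        raise ValueError("actions must have the same dimension")
    change = math.sqrt(sum((c - p) ** 2 for c, p in zip(current, previous)))
    return math.exp(-change)
```

For example, repeating the previous action gives F = 1, while a change of norm 5 gives F = exp(-5), which is close to 0.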
[0123] According to the technical solution provided in this application, flight stability can be continuously monitored and evaluated by combining the current flight control actions with the previous flight control actions. This allows for dynamic adjustment of the control strategy, thereby further optimizing flight performance. It is applicable to mission scenarios requiring high-precision operations, such as inspection, surveying, or logistics delivery, and can reduce safety risks while ensuring mission efficiency.
[0124] In some embodiments, determining the flight control commands for controlling the UAV based on the flight action parameters after the second-order smoothing constraints includes:
[0125] Performing safety constraint processing on the flight action parameters after the second-order smoothing constraints according to preset flight constraint conditions; and determining the flight control commands for controlling the UAV based on the flight action parameters after safety constraint processing.
[0126] Flight constraints characterize the restrictive requirements that a drone must meet during flight. These constraints can be set according to the actual environmental scenario to adapt to different mission requirements and environmental characteristics, ensuring that the drone can safely and stably perform flight missions in complex environments.
[0127] Furthermore, to prevent the drone from making violent maneuvers during operation, the flight action parameters after the second-order smoothing constraints are first subjected to safety constraint processing according to the preset flight constraint conditions. The flight control commands used to control the drone are then determined from the flight action parameters after safety constraint processing, ensuring that the commands meet safety standards and mission requirements and improving stability and safety during flight.
[0128] In one embodiment of this application, a safety constraint formula can be used to perform safety constraint processing on the flight action parameters after the second-order smoothing constraints according to the preset flight constraint conditions, wherein the safety constraint formula is as follows:
[0129]
[0130] Where the input is the flight action parameters after the second-order smoothing constraints, a denotes the preset flight constraint conditions, argmin denotes the argument that minimizes the function, and the output is the flight action parameters after safety constraint processing.
[0131] In one embodiment of this application, the flight action parameters include: heading angle adjustment, flight speed adjustment, and flight altitude adjustment.
[0132] Flight control commands include: target heading angle command, target flight speed command, and target flight altitude command.
[0133] At this point, the preset flight constraint condition 'a' can be:
[0134]
[0135]
[0136]
[0137] Where the three constrained quantities are the heading angle adjustment, the flight speed adjustment, and the flight altitude adjustment, and the three bounds are the maximum heading change, the maximum speed change, and the maximum altitude change, respectively.
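If the safety constraint is read as finding the closest admissible action to the smoothed action under symmetric per-component bounds, the minimizer is obtained by clipping each component independently. The sketch below makes exactly that assumption (Euclidean distance, bounds of the form |adjustment| <= maximum change); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Limits:
    max_heading_change: float   # bound on the heading angle adjustment
    max_speed_change: float     # bound on the flight speed adjustment
    max_altitude_change: float  # bound on the flight altitude adjustment


def clip(value: float, bound: float) -> float:
    """Project a scalar onto the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def apply_safety_constraints(d_heading: float, d_speed: float, d_altitude: float,
                             limits: Limits) -> Tuple[float, float, float]:
    """Closest action to the smoothed action that satisfies the box constraints."""
    return (clip(d_heading, limits.max_heading_change),
            clip(d_speed, limits.max_speed_change),
            clip(d_altitude, limits.max_altitude_change))
```

For instance, with limits of 0.5, 2.0, and 1.0, the smoothed action (0.8, -3.0, 0.2) is constrained to (0.5, -2.0, 0.2); components already within their bounds are left unchanged.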
[0138] Based on the flight action parameters after safety constraint processing, the flight control commands used to control the UAV are determined, which can be expressed as:
[0139]
[0140]
[0141]
[0142] Where the three left-hand quantities are the target heading angle command, the target flight speed command, and the target flight altitude command, respectively; v is the flight speed, h is the flight altitude, and the remaining symbol denotes the heading angle.
[0143] After determining the target heading angle, target flight speed, and target flight altitude commands, the flight control commands are sent to the UAV flight control system via the flight control communication interface. Upon receiving the flight control commands, the flight control system adjusts the UAV's heading angle, flight speed, and flight altitude through attitude control algorithms and the power system to achieve flight trajectory control, enabling the UAV to stably track the target object under autonomous control.
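A simple reading of the command expressions is that each target command equals the current value plus the corresponding constrained adjustment. The sketch below assumes this additive form and, as a further assumption, wraps the heading command into [-pi, pi); the function name is hypothetical.

```python
import math
from typing import Tuple


def to_commands(heading: float, speed: float, altitude: float,
                d_heading: float, d_speed: float, d_altitude: float
                ) -> Tuple[float, float, float]:
    """Target commands as current state plus constrained adjustment (assumed form)."""
    raw_heading = heading + d_heading
    # Wrap the heading command into [-pi, pi) so it stays a valid angle (assumption).
    heading_cmd = (raw_heading + math.pi) % (2.0 * math.pi) - math.pi
    return heading_cmd, speed + d_speed, altitude + d_altitude
```

The resulting triple corresponds to the target heading angle command, target flight speed command, and target flight altitude command that would be passed to the flight control system.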
[0144] Simultaneously, real-time visual perception information and flight status parameters of the UAV are continuously fed back to the onboard computing unit for updating the state space vector. Through continuous iteration, the reinforcement learning strategy enables the UAV to achieve high-precision and stable tracking of target objects and autonomous flight control.
[0145] According to the technical solution provided in the embodiments of this application, by using safety constraint processing with preset flight constraint conditions, the flight safety of the UAV during the execution of the mission can be ensured, and the UAV can be prevented from exhibiting violent maneuvering behavior when performing actions. Under the premise of ensuring mission efficiency, accurate tracking and stable control of the target object can be achieved.
[0146] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the process of the embodiments of this application.
[0147] All of the above-mentioned optional technical solutions can be combined in any way to form the optional embodiments of this application, and will not be described in detail here.
[0148] The following are embodiments of the apparatus described in this application, which can be used to execute the embodiments of the method described in this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the method described in this application.
[0149] Figure 2 is a schematic diagram illustrating the structure of a drone control device according to an exemplary embodiment of this application. As shown in Figure 2, the exemplary drone control device includes:
[0150] The parameter acquisition module 201 is configured to acquire the visual perception information and flight status parameters of the UAV.
[0151] The action decision module 202 is configured to obtain the flight action parameters of the UAV at each decision moment based on visual perception information and flight state parameters through a trained reinforcement learning model; the reward function of the reinforcement learning model is jointly determined by center alignment degree, tracking stability, safe distance constraint, observation scale constraint and flight stability.
[0152] The control evaluation module 203 is configured to perform flight control evaluation on the target object based on visual perception information to obtain the flight control reliability of the target object.
[0153] The flight control module 204 is configured to apply second-order smoothing constraints to the flight action parameters when the flight control confidence level meets the preset confidence level, and to determine the flight control commands for controlling the UAV based on the flight action parameters after the second-order smoothing constraints.
[0154] In some embodiments, the control evaluation module 203 is further configured to determine the target trajectory stability index and target prediction deviation of the target object within a first preset time window based on visual perception information, and to determine the mean target detection confidence and the aspect ratio change rate of the target bounding box within the first preset time window; and to perform flight control evaluation on the target object based on the target trajectory stability index, target prediction deviation, mean target detection confidence and the aspect ratio change rate of the target bounding box to obtain the flight control reliability of the target object.
[0155] In some embodiments, the control evaluation module 203 is further configured to: determine the overlap of target bounding boxes in adjacent image frames within a first preset time window based on visual perception information; determine a target trajectory stability index based on the overlap; determine the historical trajectory of the target object within the first preset time window based on visual perception information; determine the actual position of the target object in the current image frame; predict the target prediction position of the target object in the current image frame based on the historical trajectory; and determine the target prediction deviation based on the target prediction position and the actual position.
[0156] In some embodiments, the visual perception information includes: target relative position and target detection confidence; the flight state parameters include: flight speed, flight attitude angle and flight altitude; the parameter acquisition module 201 is further configured to construct a reinforcement learning state space vector based on the target relative position, target detection confidence, flight speed, flight attitude angle and flight altitude; and obtain the flight action parameters of the UAV at each decision moment based on the state space vector and the trained reinforcement learning model.
[0157] In some embodiments, the action decision module 202 is further configured to determine sample objects based on sample perception information acquired by the UAV during flight, determine the center alignment degree, tracking stability, observation scale constraint, and safe distance constraint of the UAV during flight based on the sample objects, and determine the flight stability of the UAV during flight; and determine a reward function based on the center alignment degree, tracking stability, safe distance constraint, observation scale constraint, and flight stability.
[0158] In some embodiments, the action decision module 202 is further configured to: determine the offset between the sample object and the center of the sample image based on the center pixel coordinates of the sample object and the center coordinates of the sample image, as well as the height and width of the sample image; normalize the offset; determine the center alignment degree based on the normalized offset and preset adjustment parameters; determine the position change of the sample object in adjacent image frames within a second preset time window; determine the tracking stability based on the position change; obtain the actual distance between the UAV and the sample object; determine the safe distance constraint based on the actual distance, the preset minimum distance, and the preset maximum distance using the logarithmic barrier function; and obtain the actual area of the sample object in the sample image; determine the observation scale constraint based on the actual area and the preset area.
[0159] In some embodiments, the action decision module 202 is further configured to acquire the current flight control action and the previous flight control action of the UAV; and determine flight stability based on the current flight control action and the previous flight control action.
[0160] In some embodiments, the flight control module 204 is configured to perform safety constraint processing on the flight action parameters after the second-order smoothing constraints according to preset flight constraint conditions; and determine flight control commands for controlling the UAV based on the flight action parameters after safety constraint processing.
[0161] Embodiments of this application also provide an electronic device, including: one or more processors; and a storage device for storing one or more programs, which, when executed by one or more processors, cause the electronic device to implement the methods provided in the above embodiments.
[0162] Figure 3 shows a schematic diagram of a computer system suitable for implementing the embodiments of this application. It should be noted that the computer system 300 of the electronic device shown in Figure 3 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.
[0163] As shown in Figure 3, the computer system 300 includes a Central Processing Unit (CPU) 301, which can perform various appropriate actions and processes based on programs stored in Read-Only Memory (ROM) 302 or programs loaded from storage portion 308 into Random Access Memory (RAM) 303, such as executing the methods described in the above embodiments. The RAM 303 also stores various programs and data required for system operation. The CPU 301, ROM 302, and RAM 303 are interconnected via a bus 304. An Input / Output (I / O) interface 305 is also connected to the bus 304.
[0164] The following components are connected to I / O interface 305: an input section 306 including a keyboard, mouse, etc.; an output section 307 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 308 including a hard disk, etc.; and a communication section 309 including a network interface card such as a LAN (Local Area Network) card, modem, etc. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to I / O interface 305 as needed. Removable media 311, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., are installed on drive 310 as needed so that computer programs read from them can be installed into storage section 308 as needed.
[0165] Specifically, according to embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program including program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 309, and / or installed from removable medium 311. When the computer program is executed by central processing unit (CPU) 301, it performs various functions defined in the system of this application.
[0166] It should be noted that the computer-readable medium shown in the embodiments of this application can be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying a computer-readable computer program. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media can also be any computer-readable medium other than computer-readable storage media, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to wireless, wired, etc., or any suitable combination thereof.
[0167] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation that may be implemented in systems, methods, and computer program products according to various embodiments of this application. Each block in a flowchart or block diagram may represent a module, segment, or portion of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0168] The units described in the embodiments of this application can be implemented in software or hardware, and the described units can also be located in a processor. In certain cases, the names of these units do not limit the units themselves.
[0169] Another aspect of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor of a computer, causes the computer to perform the method described above. This computer-readable storage medium may be included in the electronic device described in the above embodiments, or it may exist independently without being assembled into the electronic device.
[0170] Another aspect of this application provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the methods described in the various embodiments above.
[0171] The above embodiments are merely illustrative of the principles and effects of this application and are not intended to limit this application. Any person skilled in the art can modify or alter the above embodiments without departing from the spirit and scope of this application. Therefore, all equivalent modifications or alterations made by those skilled in the art without departing from the spirit and technical concept disclosed in this application should still be covered by the claims of this application.
Claims
1. A method for controlling an unmanned aerial vehicle (UAV), characterized in that the method comprises: acquiring visual perception information and flight state parameters of the UAV; obtaining, based on the visual perception information and the flight state parameters, flight action parameters of the UAV at each decision moment through a trained reinforcement learning model, wherein the reward function of the reinforcement learning model is jointly determined by a center alignment degree, a tracking stability, a safe distance constraint, an observation scale constraint, and a flight stability; performing a flight control evaluation of a target object based on the visual perception information to obtain a flight control reliability of the target object; when the flight control reliability meets a preset reliability, applying second-order smoothing constraints to the flight action parameters, and determining flight control commands for controlling the UAV based on the flight action parameters after the second-order smoothing constraints; wherein, before the flight action parameters of the UAV at each decision moment are obtained through the trained reinforcement learning model based on the visual perception information and the flight state parameters, the method further comprises: determining a sample object based on sample perception information acquired by the UAV during flight; determining, based on the sample object, the center alignment degree, the tracking stability, the observation scale constraint, and the safe distance constraint of the UAV during flight, and determining the flight stability of the UAV during flight; and determining the reward function based on the center alignment degree, the tracking stability, the safe distance constraint, the observation scale constraint, and the flight stability;
wherein the center alignment degree is determined by: determining the offset between the sample object and the center of the sample image based on the center pixel coordinates of the sample object, the center coordinates of the sample image, and the height and width of the sample image; normalizing the offset; and determining the center alignment degree based on the normalized offset and preset adjustment parameters; and wherein the observation scale constraint is determined by: obtaining the actual area of the sample object in the sample image, and determining the observation scale constraint based on the actual area and a preset area.
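As a non-limiting illustration of the two reward terms detailed in claim 1, the following Python sketch computes a center alignment degree from a normalized offset and an observation scale constraint from the ratio of actual to preset area. The Gaussian-style and log-ratio shaping, the function names, and the parameter `k` (standing in for the "preset adjustment parameters") are assumptions made for this example; the claim does not fix these forms.

```python
import math


def center_alignment(cx, cy, width, height, k=4.0):
    """Return a value in (0, 1]; 1.0 when the object is at the image centre.

    k is a hypothetical adjustment parameter: larger k penalises offsets more.
    """
    # Normalise the pixel offset by half the image size so each axis lies in [-1, 1].
    dx = (cx - width / 2.0) / (width / 2.0)
    dy = (cy - height / 2.0) / (height / 2.0)
    offset = math.hypot(dx, dy)
    # Assumed Gaussian-style shaping of the normalised offset.
    return math.exp(-k * offset ** 2)


def observation_scale(actual_area, preset_area):
    """Return a value in [0, 1]; 1.0 when the actual area equals the preset area."""
    if actual_area <= 0 or preset_area <= 0:
        return 0.0
    # Symmetric in log space: an object twice too large scores the same
    # as one that is half the preset size.
    return math.exp(-abs(math.log(actual_area / preset_area)))
```

Both terms are bounded, which makes them straightforward to combine with the other reward components in a weighted sum.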
2. The method according to claim 1, characterized in that the step of performing a flight control evaluation of the target object based on the visual perception information to obtain the flight control reliability of the target object comprises: determining, based on the visual perception information, a target trajectory stability index and a target prediction deviation of the target object within a first preset time window, and determining a mean target detection confidence and an aspect ratio change rate of the target bounding box of the target object within the first preset time window; and performing the flight control evaluation of the target object based on the target trajectory stability index, the target prediction deviation, the mean target detection confidence, and the aspect ratio change rate of the target bounding box, to obtain the flight control reliability of the target object.
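As a non-limiting illustration of the evaluation in claim 2, the four indicators can be fused into a single reliability score and compared against a preset reliability. The weighted-sum form, the weights, the normalization bounds `max_deviation` and `max_aspect_rate`, and the threshold are all assumptions chosen for this sketch; the claim only states that the reliability is obtained from these four indicators.

```python
def _clamp01(value):
    """Clamp a value into the closed interval [0, 1]."""
    return min(max(value, 0.0), 1.0)


def flight_control_reliability(traj_stability, pred_deviation, mean_confidence,
                               aspect_change_rate, max_deviation=50.0,
                               max_aspect_rate=0.5,
                               weights=(0.3, 0.3, 0.3, 0.1)):
    """Fuse four indicators into a reliability score in [0, 1].

    Higher trajectory stability and detection confidence raise the score;
    higher prediction deviation and aspect ratio change rate lower it.
    """
    scores = (
        _clamp01(traj_stability),
        1.0 - _clamp01(pred_deviation / max_deviation),
        _clamp01(mean_confidence),
        1.0 - _clamp01(aspect_change_rate / max_aspect_rate),
    )
    return sum(w * s for w, s in zip(weights, scores))


def meets_preset_reliability(reliability, preset=0.6):
    """Gate used before smoothing: True when the score reaches the preset value."""
    return reliability >= preset
```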
3. The method according to claim 2, characterized in that the step of determining, based on the visual perception information, the target trajectory stability index and the target prediction deviation of the target object within the first preset time window comprises: determining, based on the visual perception information, the overlap of the target bounding boxes in adjacent image frames within the first preset time window, and determining the target trajectory stability index based on the overlap; determining, based on the visual perception information, the historical trajectory of the target object within the first preset time window and the actual position of the target object in the current image frame; and predicting the position of the target object in the current image frame based on the historical trajectory, and determining the target prediction deviation based on the predicted position and the actual position.
4. The method according to claim 1, characterized in that the visual perception information includes the relative position of the target and the target detection confidence; the flight state parameters include the flight speed, the flight attitude angle, and the flight altitude; and the step of obtaining, based on the visual perception information and the flight state parameters, the flight action parameters of the UAV at each decision moment through the trained reinforcement learning model comprises: constructing a state space vector for reinforcement learning based on the relative position of the target, the target detection confidence, the flight speed, the flight attitude angle, and the flight altitude; and obtaining, based on the state space vector, the flight action parameters of the UAV at each decision moment through the trained reinforcement learning model.
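As a non-limiting illustration of the state space vector in claim 4, the listed quantities can be normalized and concatenated into a fixed-length vector. The `Observation` type, its field layout, and the normalization constants `max_speed` and `max_altitude` are hypothetical names and values introduced for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    rel_x: float        # target offset from image centre, assumed in [-1, 1]
    rel_y: float
    confidence: float   # target detection confidence in [0, 1]
    vx: float           # flight speed components, m/s
    vy: float
    vz: float
    roll: float         # flight attitude angles, rad
    pitch: float
    yaw: float
    altitude: float     # flight altitude, m


def build_state(obs, max_speed=10.0, max_altitude=120.0):
    """Return a 10-element state vector with roughly unit-scaled entries."""
    return [
        obs.rel_x, obs.rel_y, obs.confidence,
        obs.vx / max_speed, obs.vy / max_speed, obs.vz / max_speed,
        obs.roll / math.pi, obs.pitch / math.pi, obs.yaw / math.pi,
        obs.altitude / max_altitude,
    ]
```

The resulting vector would then be passed to the trained policy at each decision moment; the policy itself is outside the scope of this sketch.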
5. The method according to claim 1, characterized in that the method further comprises: determining, within a second preset time window, the positional changes of the sample object in adjacent image frames, and determining the tracking stability based on the positional changes; and obtaining the actual distance between the UAV and the sample object, and determining the safe distance constraint using a logarithmic barrier function based on the actual distance, a preset minimum distance, and a preset maximum distance.
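As a non-limiting illustration of claim 5, the tracking stability can be derived from the mean frame-to-frame displacement, and the safe distance constraint from a logarithmic barrier that is zero at the midpoint of the allowed range and falls steeply toward either bound. The exponential mapping, the `scale` parameter, the normalization by half the range, and the fixed `penalty` outside the range are assumptions made for this sketch.

```python
import math


def tracking_stability(centres, scale=20.0):
    """Map the mean displacement between adjacent frames to (0, 1].

    centres: object centre positions (x, y) within the time window.
    scale: hypothetical displacement (pixels) at which the value drops to 1/e.
    """
    if len(centres) < 2:
        return 0.0
    steps = [math.hypot(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(centres, centres[1:])]
    return math.exp(-(sum(steps) / len(steps)) / scale)


def safe_distance_reward(distance, d_min, d_max, penalty=-10.0):
    """Logarithmic barrier over the open interval (d_min, d_max).

    Returns 0 at the midpoint, negative values elsewhere inside the interval,
    and a fixed penalty at or beyond the bounds (where the log is undefined).
    """
    if not d_min < distance < d_max:
        return penalty
    half = (d_max - d_min) / 2.0
    value = math.log((distance - d_min) / half) + math.log((d_max - distance) / half)
    # Floor the barrier so it never falls below the out-of-range penalty.
    return max(value, penalty)
```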
6. The method according to claim 1, characterized in that determining the flight stability of the UAV during flight comprises: obtaining the current flight control action and the flight control action at the previous moment of the UAV; and determining the flight stability based on the current flight control action and the flight control action at the previous moment.
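As a non-limiting illustration of claim 6, the flight stability term can penalize abrupt changes between consecutive control actions. The squared-difference form and the `weight` parameter are assumptions made for this sketch; the claim only requires that the term depend on the current and previous actions.

```python
def flight_stability(current_action, previous_action, weight=1.0):
    """Return a non-positive reward: 0 for identical actions, lower for jumps.

    Both actions are sequences of the same length, for example
    (vx, vy, vz, yaw_rate).
    """
    if len(current_action) != len(previous_action):
        raise ValueError("actions must have the same dimension")
    squared_change = sum((c - p) ** 2
                         for c, p in zip(current_action, previous_action))
    return -weight * squared_change
```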
7. The method according to claim 1, characterized in that the step of determining the flight control commands for controlling the UAV based on the flight action parameters after the second-order smoothing constraints comprises: applying safety constraints, based on preset flight constraints, to the flight action parameters after the second-order smoothing constraints; and determining the flight control commands for controlling the UAV based on the flight action parameters after the safety constraints.
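As a non-limiting illustration of claim 7, the sketch below first limits the change of an action parameter in a second-order sense and then clamps the result to preset flight limits. Interpreting the "second-order smoothing constraints" as bounds on both the first difference (`max_delta`) and the second difference (`max_delta2`) of the action sequence is an assumption of this example, as are the function names and the dictionary-based limits.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def smooth_second_order(raw, prev, prev_prev, max_delta, max_delta2):
    """Limit the step and the change in step of one action parameter.

    raw: action proposed by the policy at this decision moment.
    prev, prev_prev: smoothed actions at the two preceding moments.
    Both limits hold provided the previous step already respected max_delta.
    """
    prev_step = prev - prev_prev
    step = raw - prev
    # Second difference: the new step may differ from the last by at most max_delta2.
    step = clamp(step, prev_step - max_delta2, prev_step + max_delta2)
    # First difference: the new step itself may not exceed max_delta.
    step = clamp(step, -max_delta, max_delta)
    return prev + step


def apply_safety_limits(action, limits):
    """Clamp each named action parameter to its preset (low, high) limits."""
    return {name: clamp(value, *limits[name]) for name, value in action.items()}
```

For example, with `prev_prev=0.0`, `prev=1.0`, `max_delta=2.0` and `max_delta2=0.5`, a raw command of `5.0` is reduced to `2.5`, because the step may grow from 1.0 to at most 1.5.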
8. A UAV control device, characterized in that the device comprises: a parameter acquisition module configured to acquire visual perception information and flight state parameters of the UAV; an action decision module configured to obtain, based on the visual perception information and the flight state parameters, flight action parameters of the UAV at each decision moment through a trained reinforcement learning model, wherein the reward function of the reinforcement learning model is jointly determined by a center alignment degree, a tracking stability, a safe distance constraint, an observation scale constraint, and a flight stability; a control evaluation module configured to perform a flight control evaluation of a target object based on the visual perception information to obtain a flight control reliability of the target object; and a flight control module configured to apply second-order smoothing constraints to the flight action parameters when the flight control reliability meets a preset reliability, and to determine flight control commands for controlling the UAV based on the flight action parameters after the second-order smoothing constraints; wherein, before the flight action parameters of the UAV at each decision moment are obtained through the trained reinforcement learning model based on the visual perception information and the flight state parameters, the following is further performed: determining a sample object based on sample perception information acquired by the UAV during flight; determining, based on the sample object, the center alignment degree, the tracking stability, the observation scale constraint, and the safe distance constraint of the UAV during flight, and determining the flight stability of the UAV during flight; and determining the reward function based on the center alignment degree, the tracking stability, the safe distance constraint, the observation scale constraint, and the flight stability;
wherein the center alignment degree is determined by: determining the offset between the sample object and the center of the sample image based on the center pixel coordinates of the sample object, the center coordinates of the sample image, and the height and width of the sample image; normalizing the offset; and determining the center alignment degree based on the normalized offset and preset adjustment parameters; and wherein the observation scale constraint is determined by: obtaining the actual area of the sample object in the sample image, and determining the observation scale constraint based on the actual area and a preset area.
9. An electronic device, characterized in that the electronic device comprises: one or more processors and a memory, wherein the memory stores a computer program that, when executed by the one or more processors, causes the electronic device to perform the steps of the UAV control method according to any one of claims 1 to 7.