Unmanned aerial vehicle (UAV) tracking methods, devices, equipment, storage media, and software products
By using a generative artificial intelligence model to predict the occlusion duration and generating tracking control commands from that prediction, the solution addresses tracking discontinuity when the target is occluded by obstacles and enables efficient target re-acquisition by UAVs in complex environments.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: ANHUI KAIYANG TECHNOLOGY CO LTD
- Filing Date: 2026-03-18
- Publication Date: 2026-06-30
[Figure: abstract drawing of CN122308447A]
Description
Technical Field
[0001] This application relates to the field of unmanned aerial vehicle (UAV) technology, and in particular to a UAV tracking method, apparatus, device, storage medium, and program product.
Background Technology
[0002] In recent years, vision-based autonomous target tracking has become one of the important application directions in the field of unmanned aerial vehicles (UAVs). UAVs rely on their onboard visual sensors to continuously acquire environmental images and use image recognition algorithms to lock onto and follow target objects. However, in real-world complex and dynamic environments, target objects are often obscured by obstacles such as trees and buildings, affecting the continuity and robustness of visual tracking.
[0003] To address target occlusion, the drone is typically controlled, after an occlusion event is detected, to hover or circle within a small area near the point where the target disappeared, passively waiting for the target to reappear.
[0004] However, the strategy of having drones passively hover and wait is too rigid. If the waiting time is insufficient, the drone may leave prematurely and miss the target. If the waiting time is too long, it will cause tracking delays or even cause the target to move away. It is also difficult to guarantee the timeliness and effectiveness of tracking.
Summary of the Invention
[0005] This application provides a UAV tracking method, apparatus, device, storage medium, and program product that can address the poor continuity and robustness of UAV target tracking. The technical solution is as follows:
In one aspect, a UAV tracking method is provided, the method comprising:
acquiring real-time environmental images, performing target recognition on the real-time environmental images, and, when a target object is identified, controlling the UAV to track the target object based on the image information of the area where the target object is located;
when the target object is detected to be occluded by an obstacle, acquiring historical environmental images, the historical environmental images being a sequence of images in which the target object was identified within a specified historical time period, and inputting the real-time environmental images and the historical environmental images into a generative artificial intelligence model to obtain the predicted occlusion duration of the target object; and
determining a tracking control command based on the predicted occlusion duration, and controlling the flight attitude of the UAV based on the tracking control command.
[0006] In one possible implementation, inputting the real-time environmental images and the historical environmental images into the generative artificial intelligence model to obtain the predicted occlusion duration of the target object includes: obtaining a model prompt, which instructs the generative artificial intelligence model to predict the occlusion duration of the target object based on the real-time environmental images and the historical environmental images; and inputting the real-time environmental images, the historical environmental images, and the model prompt into the generative artificial intelligence model to obtain the predicted occlusion duration of the target object.
[0007] In another possible implementation, determining the tracking control command based on the predicted occlusion duration includes: in response to the predicted occlusion duration being less than or equal to a first specified duration, generating a first tracking control command, which instructs the UAV to maintain its current flight altitude; and in response to the predicted occlusion duration being greater than the first specified duration, generating a second tracking control command, which instructs the UAV to increase its flight altitude.
[0008] In another possible implementation, determining the tracking control command based on the predicted occlusion duration includes: in response to the predicted occlusion duration being less than or equal to a first specified duration, generating a first tracking control command, which instructs the UAV to maintain a hovering attitude until the target object is identified in the real-time environmental images; and in response to the predicted occlusion duration being greater than the first specified duration, generating a second tracking control command, which instructs the UAV to fly in the direction corresponding to the predicted reappearance position of the target object.
[0009] In another possible implementation, generating the second tracking control command includes: determining the motion trajectory information of the target object based on the historical environmental images; and determining the predicted reappearance position of the target object based on the motion trajectory information.
[0010] In another possible implementation, determining the predicted reappearance position of the target object based on the motion trajectory information includes: determining the motion state information of the target object based on the motion trajectory information; determining the predicted displacement of the target object based on the motion state information and the predicted occlusion duration; and determining the predicted reappearance position of the target object based on the predicted displacement.
[0011] In another aspect, a drone tracking device is provided, the device comprising:
an acquisition module configured to acquire real-time environmental images, perform target recognition on the real-time environmental images, and, when a target object is identified, control the UAV to track the target object based on the image information of the area where the target object is located;
a prediction module configured to acquire historical environmental images when the target object is detected to be occluded by an obstacle, the historical environmental images being a sequence of images in which the target object was identified within a specified historical time period, and to input the real-time environmental images and the historical environmental images into a generative artificial intelligence model to obtain the predicted occlusion duration of the target object; and
a control module configured to determine a tracking control command based on the predicted occlusion duration, and to control the flight attitude of the UAV based on the tracking control command.
[0012] In one possible implementation, the prediction module is configured to: obtain a model prompt, which instructs the generative artificial intelligence model to predict the occlusion duration of the target object based on the real-time environmental images and the historical environmental images; and input the real-time environmental images, the historical environmental images, and the model prompt into the generative artificial intelligence model to obtain the predicted occlusion duration of the target object.
[0013] In another possible implementation, the control module is configured to: in response to the predicted occlusion duration being less than or equal to a first specified duration, generate a first tracking control command, which instructs the UAV to maintain its current flight altitude; and in response to the predicted occlusion duration being greater than the first specified duration, generate a second tracking control command, which instructs the UAV to increase its flight altitude.
[0014] In another possible implementation, the control module is configured to: in response to the predicted occlusion duration being less than or equal to a first specified duration, generate a first tracking control command, which instructs the UAV to maintain a hovering attitude until the target object is identified in the real-time environmental images; and in response to the predicted occlusion duration being greater than the first specified duration, generate a second tracking control command, which instructs the UAV to fly in the direction corresponding to the predicted reappearance position of the target object.
[0015] In another possible implementation, the control module is configured to: determine the motion trajectory information of the target object based on the historical environmental images; and determine the predicted reappearance position of the target object based on the motion trajectory information.
[0016] In another possible implementation, the control module is configured to: determine the motion state information of the target object based on the motion trajectory information; determine the predicted displacement of the target object based on the motion state information and the predicted occlusion duration; and determine the predicted reappearance position of the target object based on the predicted displacement.
[0017] In another aspect, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method described in any of the above implementations.
[0018] In another aspect, a non-transitory computer-readable storage medium is provided, the non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method described in any of the above implementations.
[0019] In another aspect, a computer program product is provided, including computer program instructions that, when run on a computer, cause the computer to perform the method described in any of the above implementations.
[0020] The beneficial effects of the technical solution provided in this application are as follows: A generative artificial intelligence model interprets and reasons over the historical and real-time environmental images, analyzing the motion patterns of the target object, the physical properties of the occluder, and scene structure information to predict when the target object will reappear, so that the prediction of the future occlusion state becomes a concrete basis for control decisions. This significantly improves the speed and success rate of re-capturing target objects in complex dynamic environments and enhances the adaptability, continuity, and robustness of UAVs in tracking targets in occluded scenarios.
Attached Figure Description
[0021] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0022] Figure 1 is a schematic diagram of an implementation environment provided in an embodiment of this application; Figure 2 is a flowchart of the drone tracking method provided in the embodiments of this application; Figure 3 is a schematic diagram of the structure of the drone tracking device provided in the embodiments of this application; Figure 4 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0023] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.
[0024] This disclosure provides a UAV tracking method. The method can be implemented by a computer device, which may be a terminal device or a server installed inside the UAV. The terminal device may be an onboard computing unit or a flight control computer (FCC). The server may be an application server, a cloud service server, or a server used to perform certain computing tasks, etc. This disclosure takes a terminal device as the computer device for the detailed explanation; other cases are similar and will not be described in detail.
[0025] As shown in Figure 1, the terminal device may include a processor 110, a memory 120, a communication component 130, and the like.
[0026] The processor 110 can be a central processing unit (CPU), which can be used to execute various operation instructions.
[0027] The memory 120 can be various volatile or non-volatile memory, such as solid-state disk (SSD), dynamic random access memory (DRAM), etc. The memory can be used to store pre-stored data, intermediate data, and result data related to the processing.
[0028] The communication component 130 can be a wired network connector, a wireless fidelity (WiFi) module, a Bluetooth module, a cellular communication module, etc. The communication component can be used to transmit data with other devices.
[0029] In some embodiments, as shown in Figure 2, the drone tracking method includes:
S201. Acquire real-time environmental images, perform target recognition on the real-time environmental images, and when a target object is identified, control the UAV to track the target object based on the image information of the area where the target object is located.
[0030] In practice, the real-time environmental images can be acquired by a visual acquisition unit mounted on the UAV. The visual acquisition unit can be an imaging device with environmental image acquisition capabilities, such as a monocular camera, a binocular camera, or an RGB-D camera. The visual acquisition unit is communicatively connected to the UAV's flight control system and onboard processing unit. It can continuously acquire real-time environmental images within the UAV's preset tracking field of view according to a preset acquisition frame rate, and transmit the acquired real-time environmental images to the onboard processing unit, or to the supporting ground-based processing equipment via a wireless communication link, providing basic image data for subsequent target recognition and tracking control.
[0031] Target recognition of the acquired real-time environmental images can be achieved through a pre-trained target detection model. The target detection model can be a deep learning model suitable for real-time detection in embedded devices, such as the YOLO series or Faster R-CNN. The model is pre-trained and optimized using a large number of image samples containing preset tracking targets under different poses, lighting conditions, background environments, and occlusion levels. This enables the target detection model to have high-precision recognition and localization capabilities for target objects. During the target recognition process, the real-time environmental image is input into the trained target detection model. The target detection model outputs the detection box position of the target object in the real-time environmental image, the category confidence score, and other recognition results. When the output category confidence score is greater than a preset confidence threshold (e.g., 0.90), it is determined that the target object has been identified. At the same time, the region where the target object is located in the real-time environmental image is determined by the output detection box.
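The acceptance rule at the end of this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the `Detection` record and the `select_target` helper are hypothetical names, and the detector that produces the detections (for example a YOLO-series model) is assumed to run elsewhere.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical record layout)."""
    x: float           # left edge of the detection box, in pixels
    y: float           # top edge of the detection box, in pixels
    w: float           # box width, in pixels
    h: float           # box height, in pixels
    confidence: float  # category confidence score in [0, 1]


def select_target(detections: Sequence[Detection],
                  threshold: float = 0.90) -> Optional[Detection]:
    """Return the most confident detection whose score exceeds the threshold.

    Returns None when no detection passes, i.e. the target is not identified.
    """
    passing = [d for d in detections if d.confidence > threshold]
    return max(passing, key=lambda d: d.confidence, default=None)
```

The comparison is strict, matching the text's "greater than a preset confidence threshold", so a score of exactly 0.90 does not count as an identification.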
[0032] When a target object is identified, the UAV is controlled to track the target object based on the image information of the area where the target object is located. First, based on the image information of the area where the target object is located, position-related information such as the center coordinates and detection box size of the target object in the image coordinate system is extracted. Through the imaging model obtained by camera calibration and the coordinate transformation relationship, the image coordinates of the target object are converted into relative position information in the world coordinate system or the UAV body coordinate system. Then, combined with the flight status information currently collected by the UAV, including real-time flight speed, flight altitude, body attitude angle, position coordinates, etc., corresponding tracking flight control commands are generated. The tracking flight control commands include at least one of speed commands, position commands, and attitude adjustment commands. The tracking flight control commands are sent to the flight control system of the UAV, which controls the power unit of the UAV to execute corresponding flight actions, so that the UAV and the target object maintain a preset relative distance, relative altitude, and tracking angle, thereby achieving continuous, stable, and autonomous tracking of the target object.
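The conversion from image coordinates to a relative position can be illustrated with an ideal pinhole camera model. The sketch below is an assumption-laden example rather than the method of the application: it assumes the depth of the target is known (for instance from an RGB-D camera), ignores lens distortion, and uses hypothetical intrinsic parameters `fx`, `fy`, `cx`, `cy` that would come from camera calibration. The result is expressed in the camera frame (x right, y down, z forward); a further rotation and translation would be needed to reach the UAV body frame or the world frame.

```python
from typing import Tuple


def box_center(x: float, y: float, w: float, h: float) -> Tuple[float, float]:
    """Center of a detection box given its top-left corner and size (pixels)."""
    return (x + w / 2.0, y + h / 2.0)


def pixel_to_camera_frame(u: float, v: float, depth_m: float,
                          fx: float, fy: float,
                          cx: float, cy: float) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) at a known depth with the pinhole model.

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point.
    Returns (x, y, z) in meters in the camera frame.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

For example, a box centered on the principal point maps to a point straight ahead of the camera at the measured depth.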
[0033] S202. When it is detected that the target object is occluded by an obstacle, acquire historical environmental images. The historical environmental images are a sequence of images in which the target object was identified within a specified historical time period. Input the real-time environmental images and the historical environmental images into a generative artificial intelligence model to obtain the predicted occlusion duration of the target object.
[0034] In specific implementation, the occlusion state of the target object is identified first. The determination is based on one or a combination of the target recognition result, target motion continuity information, and semantic information of the environmental image.
The determination can be based on the target detection result. After target recognition is completed on the real-time environmental image, the target object is determined to be occluded by an obstacle if the category confidence of the target object output by the target detection model remains below the preset confidence threshold; the integrity of the target object's detection box is lower than a preset integrity threshold (e.g., 60%), or the effective pixel percentage of the target object in the image is lower than a preset percentage threshold (e.g., 30%); the duration of the above state exceeds a preset duration (e.g., 1 second); and the case where the target object has completely moved out of the field of view of the visual acquisition unit is excluded.
The integrity of the detection box equals the area of the detection box of the visible region of the target object identified in the real-time environmental image, divided by the standard detection box area of the target object in the unoccluded state, multiplied by 100%. The standard detection box area is taken as the average area of the detection boxes identified while the target object was unoccluded within a specified historical period before the occlusion event.
The effective pixel percentage equals the number of effective pixels in the visible area of the target object in the real-time environmental image, divided by the total number of pixels of the target object in the unoccluded state, multiplied by 100%. The total number of pixels in the unoccluded state is taken as the average total number of pixels in the target area identified while the target object was unoccluded within a specified historical period before the occlusion event.
To exclude the out-of-view case, the historical motion trajectory, real-time motion speed, and motion direction of the target object are first extracted from the historical environmental images within the specified historical time before the occlusion event. Combined with the pre-calibrated effective field of view parameters of the visual acquisition unit, the theoretical spatial position of the target object at the current moment is predicted through a kinematic model, and the corresponding theoretical imaging area is obtained through coordinate transformation. If the predicted theoretical imaging area lies completely outside the effective imaging boundary of the real-time environmental image, or if the historical motion trajectory of the target object shows a continuous trend of moving out of the image boundary, the inter-frame overlap of the target detection box over a preset number of consecutive frames is lower than a preset threshold, and the detection box eventually disappears completely at the image boundary, then it is determined that the target object has completely moved out of the field of view of the visual acquisition unit. If the predicted theoretical imaging area of the target object is within the effective imaging range of the real-time environmental image, and the historical motion trend of the target object does not show the characteristic of continuously moving out of the field of view, then the out-of-view case is excluded, and the target object is determined to be occluded by an obstacle.
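The detection-based criterion above can be sketched as a small predicate. This is an illustrative reading, not a definitive implementation: the function names are hypothetical, the default thresholds are the example values from the text (60%, 30%, 1 second), and the grouping of the conditions (low confidence AND (low integrity OR low pixel percentage) AND sufficient duration AND not out of view) is an assumption about how the conditions combine.

```python
def percent_ratio(visible: float, reference: float) -> float:
    """Visible quantity as a percentage of its unoccluded reference value.

    Used both for detection box integrity (areas) and for the effective
    pixel percentage (pixel counts).
    """
    if reference <= 0:
        raise ValueError("reference must be positive")
    return 100.0 * visible / reference


def occluded_by_detection(confidence_below_threshold: bool,
                          integrity_pct: float,
                          pixel_pct: float,
                          state_duration_s: float,
                          moved_out_of_view: bool,
                          integrity_threshold_pct: float = 60.0,
                          pixel_threshold_pct: float = 30.0,
                          min_duration_s: float = 1.0) -> bool:
    """Decide occlusion from detection results (assumed condition grouping)."""
    shrunk = (integrity_pct < integrity_threshold_pct
              or pixel_pct < pixel_threshold_pct)
    return (confidence_below_threshold
            and shrunk
            and state_duration_s > min_duration_s
            and not moved_out_of_view)
```

The out-of-view flag is passed in as an input here; computing it requires the kinematic prediction and field of view check described in the text.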
[0035] Occlusion can also be determined from motion continuity. The motion trajectory of the target is obtained by fitting the historical position information of the target object in the historical environmental images, and the theoretical position of the target object in the current real-time environmental image is predicted with a kinematic model. If no target object matching the appearance features is detected within a preset neighborhood of the theoretical position, and the predicted position of the target is still within the effective field of view of the visual acquisition unit, then it is determined that the target object is occluded by an obstacle.
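A minimal sketch of the motion continuity check follows. It makes simplifying assumptions that are not in the text: the kinematic model is reduced to constant-velocity extrapolation from the last two track samples (the text describes trajectory fitting), appearance matching is assumed to have been done already so that `matched_centers` holds only candidates that match the target's appearance, and the neighborhood radius of 50 pixels is a hypothetical value.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # (timestamp_s, x_px, y_px)


def predict_position(track: Sequence[Sample], t_now: float) -> Tuple[float, float]:
    """Extrapolate the image position at t_now assuming constant velocity."""
    (t0, x0, y0), (t1, x1, y1) = track[-2], track[-1]
    dt = t1 - t0
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    lead = t_now - t1
    return (x1 + vx * lead, y1 + vy * lead)


def occluded_by_continuity(track: Sequence[Sample], t_now: float,
                           matched_centers: Sequence[Tuple[float, float]],
                           image_size: Tuple[int, int],
                           radius_px: float = 50.0) -> bool:
    """True if the predicted position is in view but no match lies near it."""
    px, py = predict_position(track, t_now)
    width, height = image_size
    in_view = 0 <= px < width and 0 <= py < height
    found = any(math.hypot(cx - px, cy - py) <= radius_px
                for cx, cy in matched_centers)
    return in_view and not found
```

When the predicted position falls outside the image, the function returns False, which corresponds to the target leaving the field of view rather than being occluded.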
[0036] Occlusion can also be determined from semantic segmentation. A pre-trained scene semantic segmentation model performs pixel-level semantic segmentation on the real-time environmental image to identify the obstacle region and the predicted region of the target object in the image. The overlap ratio between the two equals the area where the obstacle region and the predicted region of the target object overlap, divided by the total area of the predicted region of the target object, multiplied by 100%. Here, the predicted region of the target object is the area of the current real-time environmental image in which the target object should be located according to the prediction from its historical motion trajectory, and the obstacle region is the pixel area, identified by the semantic segmentation model, of objects in the image that can occlude the target. When the overlap ratio exceeds a preset overlap threshold (e.g., 60%), it is determined that the target object is occluded by an obstacle.
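The overlap ratio defined above can be computed directly from two pixel regions. In this sketch the regions are represented as sets of (row, column) pixel coordinates, which is an assumption made to keep the example dependency-free; a real pipeline would more likely use boolean masks produced by the segmentation model. The function names are hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def overlap_ratio_pct(obstacle: Set[Pixel], predicted: Set[Pixel]) -> float:
    """Overlap area divided by the predicted target region area, in percent."""
    if not predicted:
        raise ValueError("predicted target region must not be empty")
    return 100.0 * len(obstacle & predicted) / len(predicted)


def occluded_by_segmentation(obstacle: Set[Pixel], predicted: Set[Pixel],
                             threshold_pct: float = 60.0) -> bool:
    """True when the overlap ratio exceeds the preset overlap threshold."""
    return overlap_ratio_pct(obstacle, predicted) > threshold_pct
```

Note that the ratio is normalized by the predicted target region only, so a large obstacle that covers a small target fully yields 100%.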
[0037] The historical environmental images are a sequence of images pre-stored in the UAV's onboard cache unit, representing a specified historical time period (e.g., 1 second) during which the target object was successfully identified. The image sequence includes continuous frame images or sampled keyframe images containing the target object collected within the specified historical time period before the occlusion event occurs. Each frame image is associated with a corresponding collection timestamp, the identification and positioning information of the target object within that frame, and the UAV's flight status data at the time of collection. The onboard cache unit adopts a real-time storage mechanism with cyclic overlay to continuously update and store the collected environmental images containing valid target objects, ensuring that historical environmental images within the corresponding range can be quickly retrieved when an occlusion event is triggered. These historical environmental images can completely restore the target object's motion state, trajectory change patterns, appearance features, and key spatiotemporal information such as the environmental structure and obstacle distribution of the scene in which the target is located before the occlusion occurs, providing basic data support for subsequent occlusion duration prediction.
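The cyclic-overwrite cache described above can be sketched with a bounded double-ended queue, where appending to a full queue silently discards the oldest entry. The record layout (`CachedFrame`), the class name `FrameCache`, and the default capacity are hypothetical; the text only specifies that each frame carries a timestamp, the target's identification and positioning information, and the UAV's flight status data.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, List, Tuple


@dataclass(frozen=True)
class CachedFrame:
    timestamp_s: float                      # acquisition timestamp
    image: Any                              # frame data (format not specified)
    box: Tuple[float, float, float, float]  # target box (x, y, w, h) in pixels
    flight_state: Dict[str, float]          # e.g. altitude, speed, attitude


class FrameCache:
    """Fixed-capacity cache of frames in which the target was identified."""

    def __init__(self, capacity: int = 30) -> None:
        self._frames: Deque[CachedFrame] = deque(maxlen=capacity)

    def push(self, frame: CachedFrame) -> None:
        """Store a frame; the oldest frame is overwritten when full."""
        self._frames.append(frame)

    def history(self, now_s: float, window_s: float = 1.0) -> List[CachedFrame]:
        """Frames acquired within the last window_s seconds, oldest first."""
        return [f for f in self._frames
                if now_s - window_s <= f.timestamp_s <= now_s]
```

When an occlusion event is triggered, `history(now_s)` returns the image sequence for the specified historical time period in chronological order.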
[0038] The generative AI model described is a pre-trained and scenario-based fine-tuned multimodal generative vision model adapted to the embedded computing power environment of UAVs, such as GPT-4V (GPT-4 Turbo with Vision), LLaVA-NeXT, and Video-LLaMA. This multimodal generative vision model possesses the ability to understand multi-frame image sequence inputs, perform cross-modal semantic reasoning, and generate outputs. Its pre-training process is based on massive publicly available multimodal image and text datasets, enabling it to possess basic visual content understanding, spatiotemporal feature extraction, and logical reasoning capabilities.
[0039] The model prompts are obtained, which are used to explicitly instruct the model to predict the duration of time from the moment the target object is occluded until it reappears within the effective field of view of the UAV's visual acquisition unit, based on the input real-time and historical environmental images. Then, the real-time environmental image acquired at the moment of occlusion, the historical environmental image (sequence), and the model prompts are input together into the optimized generative artificial intelligence model. The model usually first performs visual encoding and feature extraction on the input multi-frame images to obtain the spatiotemporal features of the target's motion trajectory in the historical environmental image, the target's appearance features, the semantic features of the occluder in the real-time environmental image, and the scene structure features. After fusing the multi-dimensional features, generative inference is used to output the quantified predicted occlusion duration of the target object.
[0040] The model prompt can be set as follows: "Based on the input image data, please accurately predict the total duration from the moment the target is currently occluded to when it reappears within the effective field of view of the drone, and output the numerical result in seconds. The inference process must extract and analyze the core attributes of the obstacles related to occlusion, including but not limited to the static or dynamic type of the obstacle, its spatial size, and the occlusion coverage area. For static fixed obstacles such as trees, buildings, and fixed guardrails, a longer occlusion duration prediction weight should be matched; for dynamic movable obstacles such as vehicles, pedestrians, and mobile work equipment, a shorter occlusion duration prediction weight should be matched, and the final duration prediction should be completed by combining the target's historical movement patterns."
[0041] The model prompt can also be set to: "Please perform a prediction task based on the input historical environmental image sequence and the current real-time environmental image. The historical environmental image sequence is a series of frames in which the target object was successfully identified within a specified historical time period before the current moment. It records the complete visual features of the target object, its trajectory, speed, and direction of motion in the image, as well as the environmental structure, obstacle distribution, and attributes of the scene in which the target object is located. The current real-time environmental image is the image acquired at the moment when the target object is determined to be occluded by an obstacle. It shows the specific scene conditions at the time of occlusion, the visual features of the occluder, and the area where the target object is occluded. Please comprehensively analyze the motion patterns of the target object extracted from the historical environmental image sequence and the occlusion attributes and scene structure information extracted from the real-time environmental image, and combine the kinematic constraints of common moving targets with the physical spatial layout of the scene to perform inference and calculation. Finally, please output a quantified time prediction value, which is the estimated time required for the target object to reappear completely within the effective field of view of the UAV visual acquisition unit from the current occlusion moment, i.e., the predicted occlusion duration of the target object. Please output the prediction result directly in the format 'Predicted occlusion duration: [numerical value] seconds', where [numerical value] is the specific predicted number of seconds, which can be retained to one decimal place."
S203. Determine tracking control commands based on the predicted occlusion duration, and control the flight attitude of the UAV based on the tracking control commands.
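Before the predicted occlusion duration can drive a control decision, the model's text reply has to be turned into a number. The sketch below parses the reply format requested in the prompt above ('Predicted occlusion duration: [numerical value] seconds'). The `model` argument is a hypothetical callable that stands in for whatever multimodal model interface is used; no real model API is assumed, and the ordering of images (history first, real-time image last) is likewise an assumption.

```python
import re
from typing import Callable, Optional, Sequence

# Matches e.g. "Predicted occlusion duration: 3.5 seconds"
_DURATION_RE = re.compile(
    r"Predicted occlusion duration:\s*(\d+(?:\.\d+)?)\s*seconds")


def parse_duration(reply: str) -> Optional[float]:
    """Extract the predicted duration in seconds, or None if absent."""
    match = _DURATION_RE.search(reply)
    return float(match.group(1)) if match else None


def predict_occlusion_duration(
        model: Callable[[Sequence[bytes], str], str],
        realtime_image: bytes,
        history_images: Sequence[bytes],
        prompt: str) -> Optional[float]:
    """Send history frames, the real-time frame, and the prompt to the model.

    `model` is a hypothetical callable taking (images, prompt) and returning
    the model's text reply.
    """
    images = list(history_images) + [realtime_image]
    return parse_duration(model(images, prompt))
```

Returning None for an unparseable reply lets the caller fall back to a default behavior, such as hovering, instead of acting on a malformed prediction.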
[0042] In specific implementation, the first specified duration is a critical time threshold for distinguishing between short-duration and long-duration occlusion. It can be preset and adaptively adjusted based on the UAV's onboard computing power, the field of view of the visual acquisition unit, the target object's normal movement speed, and the environmental complexity of the tracking scene. For example, it can be preset to a fixed value such as 2s, 3s, or 5s, or the threshold can be adjusted in real time based on the target object's historical movement speed and the semantic type of the obstacle. When the predicted occlusion duration is less than or equal to the first specified duration, the target object is determined to be in a short-duration occlusion state, and a first tracking control command is generated. This first tracking control command instructs the UAV to maintain its current flight altitude, while simultaneously maintaining its current flight position and tracking perspective. No additional adjustments to the aircraft's flight attitude are required. The UAV continuously acquires real-time environmental images through the visual acquisition unit and simultaneously performs target recognition, avoiding the impact of frequent flight maneuvers on tracking stability. Normal visual tracking logic is immediately restored once the target object is re-identified in the real-time environmental image.
[0043] The first tracking control command can also be used to instruct the UAV to maintain a hovering attitude, lock the current spatial position, flight altitude and body orientation, continuously perform target detection and recognition on the collected real-time environmental images, and wait for the target object to reappear within the effective field of view of the visual acquisition unit. During the hovering waiting process, the memory and update of the target object's historical movement trajectory can be maintained simultaneously. Once the target object is re-identified, the hovering state is immediately exited and the normal tracking mode is switched to ensure the continuity of tracking in short-term occlusion scenarios.
[0044] When the predicted occlusion duration exceeds a first specified duration, it is determined that the current target object is in a long-term occlusion state, and a second tracking control command is generated. The second tracking control command is used to instruct the UAV to increase its flight altitude. Specifically, the corresponding preset increase altitude value can be determined according to the magnitude of the predicted occlusion duration. The longer the predicted occlusion duration, the greater the corresponding increase altitude. By increasing the flight altitude, the ground field of view of the UAV's visual acquisition unit is expanded, and the probability of obstacles such as trees and buildings occluding the target object is reduced. At the same time, during the process of increasing altitude, target recognition is continuously performed on the real-time environmental image. When the target object is recognized, the flight altitude and flight attitude can be adjusted according to the real-time position of the target object to restore continuous and stable tracking of the target object.
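The altitude-based decision rule can be sketched as follows. All numeric values are hypothetical: the first specified duration defaults to 3 seconds (one of the example values given for it), and the tiered climb presets are invented for illustration, chosen only so that a longer predicted occlusion maps to a larger altitude increase as the text requires.

```python
import bisect
from typing import Dict, Union

# Hypothetical presets: durations up to 5 s climb 5 m, up to 10 s climb 10 m,
# anything longer climbs 20 m.
DURATION_BREAKS_S = [5.0, 10.0]
CLIMB_PRESETS_M = [5.0, 10.0, 20.0]


def climb_increment(predicted_s: float) -> float:
    """Preset altitude increase; non-decreasing in the predicted duration."""
    return CLIMB_PRESETS_M[bisect.bisect_left(DURATION_BREAKS_S, predicted_s)]


def tracking_command(predicted_s: float, current_altitude_m: float,
                     first_specified_s: float = 3.0) -> Dict[str, Union[str, float]]:
    """First command for short occlusion, second command for long occlusion."""
    if predicted_s <= first_specified_s:
        return {"type": "maintain_altitude",
                "target_altitude_m": current_altitude_m}
    return {"type": "increase_altitude",
            "target_altitude_m": current_altitude_m + climb_increment(predicted_s)}
```

A continuous mapping (for example a clamped linear gain) would satisfy the same monotonic requirement; the tiered table is used here because the text speaks of preset values.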
[0045] The second tracking control command can also be used to instruct the UAV to fly in the direction corresponding to the predicted reappearance position of the target object. First, the motion trajectory information of the target object is determined based on the historical environmental images. Specifically, the acquisition timestamp and the identification and positioning information of the target object are extracted from each frame of the historical environmental image sequence. Combined with the flight attitude and spatial position data of the UAV at the corresponding acquisition time, the position of the target object in the image coordinate system is converted into spatial position coordinates in the world coordinate system. The multiple sets of spatial position coordinates are fitted and smoothed in chronological order to obtain the continuous motion trajectory information of the target object before the occlusion occurs, which characterizes the motion path and the pattern of position change of the target object before occlusion. Then, the predicted reappearance position of the target object is determined based on the motion trajectory information. To this end, the motion state information of the target object is first determined based on the motion trajectory information. The motion state information includes kinematic parameters such as the target object's speed, direction of motion, acceleration, and turning angular velocity, which can be obtained by differencing the motion trajectory and refining the result with a Kalman filter. Based on the motion state information and the predicted occlusion duration, the predicted displacement of the target object during the occlusion period, including its magnitude and direction, is calculated using a preset kinematic model such as uniform motion, uniformly accelerated motion, or a motion pattern fitted to the historical trajectory. Finally, the predicted displacement is added to the target object's last known spatial position at the moment of occlusion to determine the predicted reappearance position at the end of the occlusion, that is, the spatial position where the target object is likely to re-enter the field of view. Based on the predicted reappearance position, a second tracking control command is generated, including position, speed, and heading adjustment commands, to control the UAV to fly in the direction corresponding to the predicted reappearance position. The UAV can be controlled to arrive near the predicted reappearance position in advance and wait for the target to appear, or to fly alongside the target's predicted motion trajectory. Target recognition is performed continuously during flight. Once the target object is re-identified in the real-time environmental image, the system immediately switches to normal visual tracking mode. For long-duration occlusion scenarios, this avoids the target loss and tracking lag caused by passive hovering and improves the robustness and effectiveness of UAV target tracking in complex dynamic environments.
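As a non-limiting illustration, the displacement calculation described above can be sketched for a two-dimensional world-coordinate trajectory. The function names `estimate_state` and `predict_reappearance` are hypothetical, and the sketch replaces the Kalman filter mentioned above with plain finite differences over the last three trajectory points; it covers only the uniform-motion and uniformly-accelerated models.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def estimate_state(times: Sequence[float],
                   positions: Sequence[Vec]) -> Tuple[Vec, Vec]:
    """Estimate velocity and acceleration at the last fix by finite differences."""
    if len(times) < 3 or len(times) != len(positions):
        raise ValueError("need at least three matching trajectory points")
    t0, t1, t2 = times[-3:]
    p0, p1, p2 = positions[-3:]
    # Mean velocities over the last two intervals.
    v_prev = tuple((b - a) / (t1 - t0) for a, b in zip(p0, p1))
    v_last = tuple((b - a) / (t2 - t1) for a, b in zip(p1, p2))
    # Acceleration from the velocity change between the interval midpoints.
    dt_mid = 0.5 * (t2 - t0)
    acc = tuple((vl - vp) / dt_mid for vp, vl in zip(v_prev, v_last))
    # Extrapolate the velocity from the last midpoint to the last timestamp.
    vel = tuple(vl + a * 0.5 * (t2 - t1) for vl, a in zip(v_last, acc))
    return vel, acc


def predict_reappearance(times: Sequence[float], positions: Sequence[Vec],
                         occlusion_s: float,
                         use_acceleration: bool = True) -> Vec:
    """Last known position plus the displacement predicted over the occlusion."""
    vel, acc = estimate_state(times, positions)
    if not use_acceleration:
        acc = tuple(0.0 for _ in acc)  # uniform-motion model
    last = positions[-1]
    return tuple(p + v * occlusion_s + 0.5 * a * occlusion_s ** 2
                 for p, v, a in zip(last, vel, acc))
```

For a target moving at a constant (1, 2) m/s and last seen at (2, 4), a predicted occlusion duration of 3 s places the reappearance position at (5, 10). In practice the raw differences would be noisy, which is why the description refers to smoothing and filtering of the trajectory.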
[0046] In this embodiment, a generative artificial intelligence model is used to understand and reason about the historical environmental image sequence and the current occlusion image. By integrating the target's motion patterns, the physical properties of the obstacle (such as whether it is statically fixed or dynamically movable), and scene structure information, the model produces a quantitative prediction of how long the target will remain occluded before it reappears. This turns the prediction of the future occlusion state into a concrete basis for control decisions. Based on the predicted duration, the UAV can execute tracking maintenance strategies of varying complexity and proactivity, ranging from "maintaining hover" to "actively increasing altitude or flying to the predicted reappearance position." The UAV thus avoids ineffective maneuvers, saves energy, and remains stable during short-duration occlusion, while proactively expanding its field of view or positioning itself in advance during long-duration occlusion. This improves the speed and success rate of target re-acquisition in complex dynamic environments and enhances the adaptability, continuity, and robustness of the tracking system in occlusion scenarios.
[0047] In some embodiments, the method further includes: when the target object is detected to be occluded by an obstacle, determining the pixel area percentage of the obstacle in the real-time environmental image, where pixel area percentage = (number of pixels in the obstacle area / total number of pixels in the real-time environmental image) × 100%. When the pixel area percentage is greater than or equal to a specified pixel area percentage (e.g., 50%), an adjustment coefficient for adjusting the first specified duration is determined based on the relative difference between the pixel area percentage and the specified pixel area percentage, where relative difference = (pixel area percentage - specified pixel area percentage) / specified pixel area percentage. The relative difference is negatively correlated with the adjustment coefficient, and the adjustment coefficient is less than 1. The product of the first specified duration and the adjustment coefficient is used as the adjusted first specified duration. The correspondence between the relative difference and the adjustment coefficient is shown in Table 1.
[0048] Table 1 illustrates the negative correlation between the relative difference and the adjustment coefficient; the larger the relative difference, the smaller the adjustment coefficient.
[0049] In this embodiment, when obstacles occupy a large portion of the frame, the tracking scene is likely to be complex or the occluded area large, so the target object is less likely to move out of the occluded area quickly. In this case, the adjustment coefficient decreases as the relative difference increases, which shortens the first specified duration that serves as the decision boundary. The drone therefore switches earlier from the passive strategy of holding a hover and waiting to the active strategy of increasing altitude or actively seeking the target. This avoids the slow response and excessive passive waiting, and the resulting missed opportunities to re-acquire the target, that a fixed duration threshold tuned for an ideal scenario would cause in complex, heavily occluded environments. The speed and success rate of target re-acquisition in complex and dynamic environments are improved accordingly.
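As a non-limiting illustration, the threshold adjustment described above can be sketched as follows. The two formulas follow the definitions given for the pixel area percentage and the relative difference. The linear coefficient mapping with its `base`, `slope`, and `floor` parameters is a hypothetical stand-in for the values of Table 1, which are not reproduced here, and leaving the threshold unchanged below the specified percentage is likewise an assumption of the sketch.

```python
def pixel_area_percentage(obstacle_pixels: int, total_pixels: int) -> float:
    """Share of the real-time image covered by the obstacle, in percent."""
    if total_pixels <= 0:
        raise ValueError("total_pixels must be positive")
    return obstacle_pixels / total_pixels * 100.0


def adjustment_coefficient(area_pct: float, specified_pct: float = 50.0,
                           base: float = 0.9, slope: float = 0.5,
                           floor: float = 0.3) -> float:
    """Hypothetical mapping: decreases with the relative difference, stays below 1."""
    relative_difference = (area_pct - specified_pct) / specified_pct
    return max(floor, base - slope * relative_difference)


def adjusted_first_duration(first_duration_s: float, area_pct: float,
                            specified_pct: float = 50.0) -> float:
    """Shorten the first specified duration when the obstacle fills much of the frame."""
    if area_pct < specified_pct:
        # Assumption: below the specified percentage the threshold is unchanged.
        return first_duration_s
    return first_duration_s * adjustment_coefficient(area_pct, specified_pct)
```

With these assumed parameters, an obstacle covering 75% of the frame gives a relative difference of 0.5 and a coefficient of 0.65, so a first specified duration of 4 s is shortened to 2.6 s.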
[0050] All of the above-mentioned optional technical solutions can be combined in any way to form the optional embodiments of this application, and will not be described in detail here.
[0051] Based on the same inventive concept, corresponding to the drone tracking method provided in the embodiments of this application, this application also provides a drone tracking device.
[0052] Referring to Figure 3, the drone tracking device includes: an acquisition module 301 configured to acquire real-time environmental images, perform target recognition on the real-time environmental images, and, when a target object is identified, control the UAV to track the target object based on the image information of the area where the target object is located; a prediction module 302 configured to acquire historical environmental images when the target object is detected to be occluded by an obstacle, the historical environmental images being a sequence of images in which the target object was detected within a specified historical time period, and to input the real-time environmental images and the historical environmental images into a generative artificial intelligence model to obtain the predicted occlusion duration of the target object; and a control module 303 configured to determine a tracking control command based on the predicted occlusion duration and to control the flight attitude of the UAV based on the tracking control command.
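As a non-limiting illustration, the cooperation of the three modules can be sketched as a single processing step per frame. The class `TrackingDevice`, its field names, the returned command strings, and the rule that a frame without a detection is treated as an occlusion are all assumptions of this sketch; each module is represented by a plain callable so that any detector, model, or controller could be substituted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Image = bytes                    # placeholder image type for the sketch
Box = Tuple[int, int, int, int]  # hypothetical bounding box (x, y, w, h)


@dataclass
class TrackingDevice:
    detect: Callable[[Image], Optional[Box]]                    # acquisition module 301
    predict_occlusion_s: Callable[[Image, List[Image]], float]  # prediction module 302
    decide: Callable[[float], str]                              # control module 303
    history_len: int = 30        # number of visible frames kept as history
    history: List[Image] = field(default_factory=list)

    def step(self, frame: Image) -> str:
        """Process one real-time frame and return a command label."""
        box = self.detect(frame)
        if box is not None:
            # Target visible: remember the frame and keep tracking normally.
            self.history.append(frame)
            del self.history[:-self.history_len]
            return "track"
        # Target not found: treated here as occlusion (an assumption).
        predicted_s = self.predict_occlusion_s(frame, list(self.history))
        return self.decide(predicted_s)
```

A caller would invoke `step` once per acquired frame and forward the returned label to the flight controller.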
[0053] In one possible implementation, the prediction module 302 is configured to: obtain model prompts, which are used to instruct the generative artificial intelligence model to predict the occlusion duration of the target object based on the real-time environmental image and the historical environmental image; and input the real-time environmental image, the historical environmental image, and the model prompts into the generative artificial intelligence model to obtain the predicted occlusion duration of the target object.
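As a non-limiting illustration, obtaining the model prompt and turning the model's reply into a duration can be sketched as follows. The prompt wording, the function names, and the `model` callable that takes a prompt and a list of images and returns text are all assumptions of this sketch; no particular model or vendor interface is implied.

```python
import re
from typing import Callable, Sequence

# Hypothetical prompt text; the application does not prescribe any wording.
PROMPT_TEMPLATE = (
    "You are given {n_hist} historical frames in which the target was visible, "
    "followed by the current frame in which an obstacle occludes it. "
    "Estimate how many seconds the target will stay occluded. "
    "Answer with a single number in seconds."
)


def build_prompt(n_hist: int) -> str:
    """Fill the prompt template with the number of historical frames."""
    return PROMPT_TEMPLATE.format(n_hist=n_hist)


def parse_duration_s(reply: str) -> float:
    """Extract the first non-negative number from the model's textual reply."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no duration found in reply: {reply!r}")
    return float(match.group())


def predict_occlusion_s(model: Callable[[str, Sequence[bytes]], str],
                        current: bytes, history: Sequence[bytes]) -> float:
    """Send the prompt, historical frames, and current frame to the model."""
    prompt = build_prompt(len(history))
    reply = model(prompt, [*history, current])
    return parse_duration_s(reply)
```

Raising an error on an unparseable reply leaves the fallback behavior, such as reverting to a default hover, to the caller.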
[0054] In another possible implementation, the control module 303 is configured to: in response to the predicted occlusion duration being less than or equal to a first specified duration, generate a first tracking control command, which instructs the UAV to maintain its current flight altitude; and in response to the predicted occlusion duration being greater than the first specified duration, generate a second tracking control command, which instructs the UAV to increase its flight altitude.
[0055] In another possible implementation, the control module 303 is configured to: in response to the predicted occlusion duration being less than or equal to a first specified duration, generate a first tracking control command, which instructs the UAV to maintain a hovering attitude until the target object is identified in the real-time environmental image; and in response to the predicted occlusion duration being greater than the first specified duration, generate a second tracking control command, which instructs the UAV to fly in the direction corresponding to the predicted reappearance position of the target object.
[0056] In another possible implementation, the control module 303 is configured to: determine the motion trajectory information of the target object based on the historical environmental image; and determine the predicted reappearance position of the target object based on the motion trajectory information.
[0057] In another possible implementation, the control module 303 is configured to: determine the motion state information of the target object based on the motion trajectory information; determine the predicted displacement of the target object based on the motion state information and the predicted occlusion duration; and determine the predicted reappearance position of the target object based on the predicted displacement.
[0058] It should be noted that the drone tracking device provided in the above embodiments is only an example of the division of the above functional modules. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the drone tracking device and the drone tracking method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.
[0059] Based on the same inventive concept, corresponding to the drone tracking method provided in the embodiments of this application, this application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the drone tracking method described in the above embodiments.
[0060] Figure 4 illustrates a more specific hardware structure of the electronic device provided in this embodiment. The device may include a processor 1010, a memory 1020, an input / output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, memory 1020, input / output interface 1030, and communication interface 1040 are communicatively connected to one another within the device via the bus 1050.
[0061] The processor 1010 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this specification.
[0062] The memory 1020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc. The memory 1020 can store the operating system and other applications. When the technical solutions provided in the embodiments of this specification are implemented by software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.
[0063] The input / output interface 1030 is used to connect input / output modules to realize information input and output. Input / output modules can be configured as components within the device (not shown in the figure) or externally connected to the device to provide corresponding functions. Input devices may include keyboards, mice, touchscreens, microphones, various sensors, etc., while output devices may include displays, speakers, vibrators, indicator lights, etc.
[0064] The communication interface 1040 is used to connect a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module can communicate via wired means (such as USB or an Ethernet cable) or wireless means (such as a mobile network, Wi-Fi, or Bluetooth).
[0065] Bus 1050 includes a pathway for transmitting information between various components of the device, such as processor 1010, memory 1020, input / output interface 1030, and communication interface 1040.
[0066] It should be noted that although the above-described device only shows the processor 1010, memory 1020, input / output interface 1030, communication interface 1040, and bus 1050, in specific implementations, the device may also include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the above-described device may only include the components necessary for implementing the embodiments of this specification, and not necessarily all the components shown in the figures.
[0067] The electronic devices described above are used to implement the corresponding UAV tracking methods in the foregoing embodiments and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.
[0068] In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including instructions that can be executed by a processor in a terminal to complete the drone tracking method described above. This computer-readable storage medium can be non-transitory. For example, the computer-readable storage medium can be ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, and optical data storage devices, etc.
[0069] In an exemplary embodiment, a computer program product is also provided, including computer program instructions that, when executed on a computer, cause the computer to perform the drone tracking method described above.
[0070] It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data used for analysis, data stored, data displayed, etc.) and signals (including but not limited to signals transmitted between user terminals and other devices, etc.) involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
[0071] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.
[0072] It should be understood that "multiple" as used herein refers to two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, A and B simultaneously, or B alone. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. Furthermore, the step numbers described herein are merely illustrative of one possible execution order. In some other embodiments, the steps may not be executed in numerical order, such as two steps with different numbers being executed simultaneously, or two steps with different numbers being executed in the reverse order of the illustration. This application does not limit this.
[0073] The above description is merely an optional embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. A UAV tracking method, characterized in that it comprises: acquiring real-time environmental images, performing target recognition on the real-time environmental images, and, when a target object is identified, controlling the UAV to track the target object based on the image information of the area where the target object is located; when the target object is detected to be occluded by an obstacle, acquiring historical environmental images, the historical environmental images being a sequence of images in which the target object was detected within a specified historical time period; inputting the real-time environmental images and the historical environmental images into a generative artificial intelligence model to obtain the predicted occlusion duration of the target object; and determining a tracking control command based on the predicted occlusion duration, and controlling the flight attitude of the UAV based on the tracking control command.
2. The UAV tracking method according to claim 1, characterized in that, The step of inputting the real-time environmental image and the historical environmental image into a generative artificial intelligence model to obtain the predicted occlusion duration of the target object includes: Obtain model prompts, which are used to instruct the generative artificial intelligence model to predict the duration of occlusion of the target object based on the real-time environmental image and the historical environmental image; The real-time environmental image, the historical environmental image, and the model prompts are input into the generative artificial intelligence model to obtain the predicted occlusion duration of the target object.
3. The UAV tracking method according to claim 1, characterized in that, The tracking control command determined based on the predicted occlusion duration includes: In response to the predicted occlusion duration being less than or equal to a first specified duration, a first tracking control command is generated, which instructs the UAV to maintain its current flight altitude. In response to the predicted occlusion duration being greater than a first specified duration, a second tracking control command is generated, which instructs the UAV to increase its flight altitude.
4. The UAV tracking method according to claim 1, characterized in that determining the tracking control command based on the predicted occlusion duration includes: in response to the predicted occlusion duration being less than or equal to a first specified duration, generating a first tracking control command, which instructs the UAV to maintain a hovering attitude until the target object is identified in the real-time environmental image; and in response to the predicted occlusion duration being greater than the first specified duration, generating a second tracking control command, which instructs the UAV to fly in the direction corresponding to the predicted reappearance position of the target object.
5. The UAV tracking method according to claim 4, characterized in that generating the second tracking control command includes: determining the motion trajectory information of the target object based on the historical environmental image; and determining the predicted reappearance position of the target object based on the motion trajectory information.
6. The UAV tracking method according to claim 5, characterized in that determining the predicted reappearance position of the target object based on the motion trajectory information includes: determining the motion state information of the target object based on the motion trajectory information; determining the predicted displacement of the target object based on the motion state information and the predicted occlusion duration; and determining the predicted reappearance position of the target object based on the predicted displacement.
7. A drone tracking device, characterized in that it comprises: an acquisition module configured to acquire real-time environmental images, perform target recognition on the real-time environmental images, and, when a target object is identified, control the UAV to track the target object based on the image information of the area where the target object is located; a prediction module configured to acquire historical environmental images when the target object is detected to be occluded by an obstacle, the historical environmental images being a sequence of images in which the target object was detected within a specified historical time period, and to input the real-time environmental images and the historical environmental images into a generative artificial intelligence model to obtain the predicted occlusion duration of the target object; and a control module configured to determine a tracking control command based on the predicted occlusion duration and to control the flight attitude of the UAV based on the tracking control command.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the method according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions, characterized in that, The computer instructions are used to cause the computer to perform the method described in any one of claims 1 to 6.
10. A computer program product comprising computer program instructions, characterized in that, when executed on a computer, the computer program instructions cause the computer to perform the method according to any one of claims 1 to 6.