Unmanned aerial vehicle autonomous tracking method and system based on monocular vision

By combining monocular vision and Kalman filtering, autonomous identification and tracking of drones were achieved, solving the problems of reliance on manual operation and cumbersome hardware in existing technologies, and improving the stability and cost-effectiveness of drone tracking.

CN121596906B · Active · Publication Date: 2026-06-23 · SHANDONG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANDONG UNIV
Filing Date: 2026-01-30
Publication Date: 2026-06-23

IPC: G05D1/686; G01P3/68; G05D1/249; G05D109/20

AI Tagging

Application Domain: Devices using time traversed; Vehicle position/course/altitude control
Technology Topics: Three-dimensional space; Uncrewed vehicle
Technical Efficacy Phrases: Reduce human intervention; Improve reliability

Abstract

The application belongs to the technical field of unmanned aerial vehicle (UAV) control and provides a UAV autonomous tracking method and system based on monocular vision. An image of the target UAV captured by the monocular camera of the tracking UAV is obtained, and the three-dimensional spatial coordinates and current relative velocity of the target UAV are determined from that image. An image error vector is calculated from the three-dimensional spatial coordinates. Based on the three-dimensional spatial coordinates and the current relative velocity, a Kalman filtering algorithm optimizes the current relative velocity, and a position error vector is determined from the optimized current relative velocity. A fusion control input vector is determined from the image error vector and the position error vector, and an updated relative velocity, determined from the fusion control input vector and the current relative velocity, is used by the tracking UAV to track the target. The application greatly improves the control accuracy of autonomous UAV tracking.

Description

Technical Field

[0001] This invention relates to the field of unmanned aerial vehicle (UAV) control technology, and specifically to an autonomous tracking method and system for UAVs based on monocular vision.

Background Technology

[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.

[0003] With the rapid development of drone technology, drones have been widely used in commercial and civilian fields. However, the widespread use of drones has also brought many problems, urgently requiring a solution that can effectively identify, track, and intercept illegal drones. While existing solutions have achieved some drone identification and tracking, they still have the following issues:

[0004] First, existing solutions rely excessively on operator visual inspection and remote control, which makes it difficult to maintain a stable interception success rate and significantly increases personnel training costs, consuming substantial human and material resources. Second, some drone identification and tracking solutions have obvious limitations in applicable scenarios: they work only against hovering drones or in open environments, so their scope of application is narrow and they cannot meet diverse practical needs. Third, tracking devices based on radar or electromagnetic interference principles are very expensive, and platforms carrying electromagnetic equipment are generally bulky, which severely limits mobility and hinders large-scale deployment of such devices.

Summary of the Invention

[0005] To address the shortcomings of existing technologies, this invention provides a method and system for autonomous drone tracking based on monocular vision. This overcomes the drawbacks of traditional anti-drone methods, such as reliance on manual labor, poor stability, and high cost. It can still achieve stable identification, tracking, and interception of target drones in complex dynamic environments, greatly improving the control accuracy of autonomous drone tracking.

[0006] To achieve the above objectives, the present invention adopts the following technical solution:

[0007] In a first aspect, the present invention provides an autonomous tracking method for unmanned aerial vehicles based on monocular vision.

[0008] An autonomous tracking method for unmanned aerial vehicles based on monocular vision includes the following process:

[0009] Acquire images of the target drone captured by the monocular camera of the tracking drone, and determine the target drone's three-dimensional spatial coordinates and current relative speed based on the images;

[0010] Calculate the image error vector based on three-dimensional spatial coordinates;

[0011] Based on the three-dimensional spatial coordinates and the current relative velocity, the Kalman filter algorithm is used to optimize the current relative velocity to obtain the optimized current relative velocity. The position error vector is then determined based on the optimized current relative velocity.

[0012] Based on the image error vector and the position error vector, the fusion control input vector is determined. Based on the fusion control input vector and the current relative velocity, the updated relative velocity is determined, and the tracking drone uses it to track the target drone.

[0013] In one implementation of the first aspect of the present invention, determining the three-dimensional spatial coordinates of the target UAV based on the target UAV image includes:

[0014] The position coordinates and size of the target UAV are determined using a pre-trained target detection model.

[0015] The relative distance to the target drone is determined based on its size, the intrinsic parameters of the monocular camera, and the actual physical width of the target drone.

[0016] The three-dimensional spatial coordinates of the target UAV are determined based on its position coordinates, relative distance, and intrinsic parameters of the monocular camera.

[0017] In one implementation of the first aspect of the present invention, determining the current relative speed of the target drone based on an image of the target drone includes:

[0018] Based on two adjacent frames of the target drone image, the three-dimensional spatial coordinates of the target drone in the previous frame and the current frame are determined respectively. Combined with the time difference between the two adjacent frames of the target drone image, the current relative speed of the target drone is determined.

[0019] In one implementation of the first aspect of the present invention, the image error vector is a three-dimensional vector, including: the difference between the width of the target image frame output by the target detection model and the preset width of the target image frame, the difference between the Y coordinate of the center of the target image frame output by the target detection model and the Y coordinate of the center of the optical axis of the monocular camera, and the difference between the Z coordinate of the center of the target image frame output by the target detection model and the Z coordinate of the center of the optical axis of the monocular camera.

[0020] In one implementation of the first aspect of the present invention, the position error vector is a three-dimensional vector including components of the current relative velocity in the X, Y and Z directions.

[0021] In one implementation of the first aspect of the present invention, the fusion control input vector is $\mathbf{u} = \boldsymbol{\Lambda}_I \mathbf{e}_I + \boldsymbol{\Lambda}_P \mathbf{e}_P$, where $\mathbf{e}_P$ represents the position error vector, $\mathbf{e}_I$ represents the image error vector, $\boldsymbol{\Lambda}_I$ is the weight matrix of the image error vector, $\boldsymbol{\Lambda}_P$ is the weight matrix of the position error vector, and $\mathbf{u}$ represents the fusion control input vector. The updated relative velocity is the sum of the fusion control input vector and the current relative velocity.

[0022] In a second aspect, the present invention provides an autonomous tracking system for unmanned aerial vehicles based on monocular vision.

[0023] An autonomous drone tracking system based on monocular vision includes:

[0024] The monocular image processing unit is configured to: acquire images of the target drone captured by the monocular camera of the tracking drone, and determine the three-dimensional spatial coordinates and current relative speed of the target drone based on the images;

[0025] The image error calculation unit is configured to calculate the image error vector based on three-dimensional spatial coordinates.

[0026] The position error calculation unit is configured to: optimize the current relative velocity using a Kalman filter algorithm based on the three-dimensional spatial coordinates and the current relative velocity, obtain the optimized current relative velocity, and determine the position error vector based on the optimized current relative velocity;

[0027] The speed fusion control unit is configured to: determine the fusion control input vector based on the image error vector and the position error vector, and determine, based on the fusion control input vector and the current relative speed, the updated relative speed that the tracking drone uses to track the target drone.

[0028] In a third aspect, the present invention provides a computer device, comprising: a processor and a computer-readable storage medium;

[0029] A processor, adapted to execute computer programs;

[0030] A computer-readable storage medium storing a computer program, which, when executed by a processor, implements the first aspect of the present invention: an autonomous tracking method for unmanned aerial vehicles based on monocular vision.

[0031] In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the monocular vision-based autonomous tracking method for unmanned aerial vehicles according to the first aspect of the present invention.

[0032] In a fifth aspect, the present invention provides a computer program product, which includes a computer program. When the computer program is executed by a processor, it implements the monocular vision-based autonomous tracking method for unmanned aerial vehicles according to the first aspect of the present invention.

[0033] Compared with the prior art, the beneficial effects of the present invention are:

[0034] Existing methods mostly employ long-distance remote control, requiring operators to locate target drones visually. Interception results rely on experience, leading to inconsistent success rates. This invention utilizes monocular vision to automatically identify target drones and achieves continuous and stable tracking and approach through a fusion control strategy, fundamentally reducing human intervention and improving the reliability and consistency of denial operations.

[0035] Traditional solutions require operators to have high visual inspection and manipulation skills, resulting in long training cycles and high costs. The method of this invention does not require professional operators, and the system can autonomously complete identification and tracking tasks, which not only reduces training costs and reliance on manpower, but also enables rapid deployment in emergency situations to form a denial capability.

[0036] When only image-based visual servoing (IBVS) is used, the tracking UAV is prone to oscillation because the image error becomes very small during the approach phase; when only position-based visual servoing (PBVS) is used, the scale assumption of monocular vision may introduce biases. This invention combines IBVS and PBVS: IBVS keeps the target UAV in the center of the field of view, while PBVS is responsible for the precise matching of speed and position. The two complement each other, enabling the tracking UAV to stably approach and ultimately intercept the target UAV even when the target is performing complex maneuvers.

[0037] This invention introduces relative velocity estimation and combines it with Kalman filtering for optimization, which effectively reduces the uncertainty caused by changes in illumination, wind speed interference and ranging noise. The filtered velocity and position estimations are smoother, enabling the system to maintain high tracking accuracy in dynamic environments.

[0038] Compared to solutions that rely on depth cameras, radar, or multi-sensor fusion, this invention can complete detection and tracking using only monocular vision. The hardware structure is simple and the cost is low, which not only makes it easy to promote in scientific research and engineering practice, but also provides a more economical and feasible solution for applications in civil security, airport protection and other fields.

[0039] Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Description of the Attached Figures

[0040] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.

[0041] Figure 1 is a schematic diagram illustrating the principle of an autonomous UAV tracking method based on monocular vision, provided by an exemplary embodiment of the present invention;

[0042] Figure 2 is a schematic diagram illustrating the principle of an autonomous tracking system for unmanned aerial vehicles based on monocular vision, provided by an exemplary embodiment of the present invention;

[0043] Figure 3 is a schematic diagram of a computer device provided by an exemplary embodiment of the present invention.

Detailed Implementation

[0044] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0045] It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0046] This implementation proposes an autonomous tracking method for unmanned aerial vehicles (UAVs) based on monocular vision. As shown in Figure 1, the YOLO target detection model is used to detect the target UAV, and the three-dimensional spatial coordinates and the current relative velocity are then estimated. From this information, an image error vector is generated on the one hand, and on the other hand a position error vector is generated after Kalman filtering optimizes the current relative velocity. The image error vector and the position error vector are fused, and the relative velocity is updated using the fusion result.

[0047] Specifically, this invention uses a monocular camera for real-time image acquisition, employs a YOLO target detection model to identify the target drone and obtain its bounding box information, thereby estimating the relative distance to the drone. Using the position information of the bounding box combined with camera intrinsic parameters, the position and velocity of the target drone in three-dimensional space are calculated. A Kalman filter algorithm is used, combining measurement and prediction data, to optimize the target's velocity estimation and reduce noise interference. Combining the image error vector with the relative velocity of the target drone, a combined strategy of image-based visual servoing (IBVS) and position-based visual servoing (PBVS) is employed to achieve stable target drone tracking and interception. More specifically, the process includes the following:

[0048] S101: Target UAV identification and ranging.

[0049] This implementation uses the YOLO (You Only Look Once) object detection model to identify the target drone in real time and estimates the relative distance to the target drone from the target image bounding box. YOLO is a highly efficient object detection algorithm. Its basic principle is to detect multiple targets in an image simultaneously and to predict the bounding box and category of each target directly by regression. YOLO is fast and accurate, which makes it well suited to target detection in real-time video streams. The YOLO model divides the image into a grid, and each grid cell predicts the position and size of a bounding box together with whether a target is present in it. The algorithm completes detection for the entire image in one pass, significantly improving processing speed.

[0050] In this invention, the YOLO target detection model is used to detect target drones in images in real time. After each detection, the YOLO target detection model outputs the target image box of the target drone, including the target category, position coordinates ((u, v) in the image coordinate system) and the size of the target image box (the width W and height H of the target image box).

[0051] Understandably, in other implementations, other object detection models can also be used to generate object image boxes, such as the SSD object detection model, the Faster R-CNN object detection model, or traditional vision methods (such as optical flow or feature point tracking), which will not be detailed here.

[0052] S102: Calculation of relative distance based on the target image bounding box.

[0053] The target bounding boxes provided by the YOLO target detection model provide the basis for tracking and estimating the distance to target drones. The width W and height H of the target bounding box are key parameters for estimating the relative distance to the target drone.

[0054] In a monocular vision system, the camera's intrinsic parameters (focal length $f$ and sensor size) are used together with the width $W$ of the target image frame to estimate the relative distance $D$ of the target drone. Assuming the actual physical width of the target drone is $W_r$, the relative distance to the target drone can be calculated using the following formula:

[0055] $D = \dfrac{f \cdot W_r}{W}$ (1);

[0056] where $W$ is the image width of the target image frame, $W_r$ is the actual physical width of the target drone, and $f$ is the focal length of the camera. Formula (1) gives the relative distance $D$ between the target UAV and the monocular camera.
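The ranging step (distance equals focal length times real width divided by bounding-box width) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimate_distance`, the use of a focal length expressed in pixels, and the sample numbers are hypothetical and do not come from the patent.

```python
def estimate_distance(focal_px: float, real_width_m: float, box_width_px: float) -> float:
    """Pinhole range estimate: D = f * W_real / W_box.

    focal_px     -- focal length in pixels (assumed unit, from calibration)
    real_width_m -- assumed physical width of the target drone, in meters
    box_width_px -- width of the detected bounding box, in pixels
    """
    if box_width_px <= 0:
        raise ValueError("bounding-box width must be positive")
    return focal_px * real_width_m / box_width_px


# Hypothetical values: 800 px focal length, 0.35 m wide drone, 56 px wide box.
distance_m = estimate_distance(800.0, 0.35, 56.0)
print(f"estimated distance: {distance_m:.2f} m")
```

Because the real width is an assumption, any error in it scales the distance estimate by the same ratio, which is the scale bias the description attributes to monocular ranging.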

[0057] S103: Estimation of the target UAV's three-dimensional spatial coordinates and relative velocity.

[0058] Using the acquired target image bounding box and combined with camera intrinsic parameters, the three-dimensional spatial coordinates and current relative velocity of the target UAV in three-dimensional space are estimated.

[0059] S103-1: Estimation of the three-dimensional spatial coordinates of the target UAV.

[0060] Based on the position information of the target image frame and the camera's intrinsic parameters, the three-dimensional spatial coordinates of the target UAV in three-dimensional space can be calculated. Assume the coordinates of the target image frame are $(u, v)$, the principal point coordinates of the camera are $(u_0, v_0)$, the focal lengths are $f_x$ and $f_y$, and the relative distance between the target drone and the monocular camera is $D$. Then the three-dimensional spatial coordinates $(X, Y, Z)$ of the target UAV can be calculated using the following formulas:

[0061] $X = D$ (2);

[0062] $Y = \dfrac{(u - u_0)\,D}{f_x}$ (3);

[0063] $Z = \dfrac{(v - v_0)\,D}{f_y}$ (4);
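As a hedged illustration of this step, the sketch below back-projects the bounding-box center into three-dimensional coordinates with a standard pinhole model. It assumes the axis convention implied later in the description (X along the optical axis, Y and Z across the image); the `Intrinsics` type, the function name, and the sample values are hypothetical.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    """Pinhole intrinsics in pixels (hypothetical container for this sketch)."""
    fx: float  # focal length along the image u axis
    fy: float  # focal length along the image v axis
    u0: float  # principal point, u coordinate
    v0: float  # principal point, v coordinate


def back_project(u: float, v: float, distance: float, k: Intrinsics) -> Tuple[float, float, float]:
    """Map a pixel (u, v) at a known range to (X, Y, Z).

    Assumed convention: X is the forward (depth) axis, Y follows image u,
    and Z follows image v.
    """
    x = distance
    y = (u - k.u0) * distance / k.fx
    z = (v - k.v0) * distance / k.fy
    return x, y, z


# Hypothetical 640x480 camera with an 800 px focal length.
cam = Intrinsics(fx=800.0, fy=800.0, u0=320.0, v0=240.0)
print(back_project(400.0, 200.0, 5.0, cam))
```

A target whose box center coincides with the principal point maps to zero lateral and vertical offset, which is the condition the image-error term later tries to maintain.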

[0064] S103-2: Estimation of the relative speed of the target UAV.

[0065] To obtain the relative velocity of the target drone, it is necessary to calculate the change in the target drone's position between two consecutive frames. Let the spatial location of the target drone in frame $k$ be $(X_k, Y_k, Z_k)$, and its spatial location in frame $k-1$ be $(X_{k-1}, Y_{k-1}, Z_{k-1})$. The relative speed of the target UAV can be calculated using the following formulas:

[0066] $v_X = \dfrac{X_k - X_{k-1}}{\Delta t}$ (5);

[0067] $v_Y = \dfrac{Y_k - Y_{k-1}}{\Delta t}$ (6);

[0068] $v_Z = \dfrac{Z_k - Z_{k-1}}{\Delta t}$ (7);

[0069] where $\Delta t$ is the time interval between two adjacent frames of the target UAV image. Formulas (5)-(7) give the spatial velocity $(v_X, v_Y, v_Z)$ of the target UAV, which is the relative speed of the target drone.
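The frame-to-frame velocity estimate is a finite difference of two position estimates. A minimal Python sketch follows; the function name and the sample coordinates are hypothetical.

```python
from typing import Sequence, Tuple


def relative_velocity(prev_xyz: Sequence[float],
                      curr_xyz: Sequence[float],
                      dt: float) -> Tuple[float, ...]:
    """Finite-difference velocity between two consecutive frames.

    prev_xyz, curr_xyz -- (X, Y, Z) of the target in the previous and current frame
    dt                 -- time between the two frames, in seconds
    """
    if dt <= 0:
        raise ValueError("frame interval must be positive")
    return tuple((c - p) / dt for p, c in zip(prev_xyz, curr_xyz))


# Hypothetical positions 0.25 s apart.
print(relative_velocity((5.0, 0.5, -0.25), (4.5, 0.5, 0.0), 0.25))
```

Dividing by a small frame interval amplifies ranging noise, which is why the description follows this step with Kalman filtering.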

[0070] S104: Kalman filter speed fusion.

[0071] Since the relative velocity estimation of the target UAV is affected by noise and dynamic changes, Kalman filtering is used to optimize the velocity estimation. By combining the predicted velocity and the measured velocity of the target UAV, noise interference is filtered out and the estimation accuracy is improved.

[0072] S104-1: Kalman filtering principle.

[0073] Kalman filtering is a recursive estimation algorithm widely used for predicting and estimating the state of dynamic systems. In this study, Kalman filtering is used to fuse the velocity measurement and prediction of a target UAV to optimize the estimation of the target UAV's relative velocity.

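[0074] The target UAV's state vector contains its spatial position and velocity, and is represented as:

[0075] $\mathbf{x} = [X,\ Y,\ Z,\ v_X,\ v_Y,\ v_Z]^{T}$ (8);

[0076] The state prediction of the target UAV is performed using the state transition matrix $\mathbf{A}$:

[0077] $\hat{\mathbf{x}}_{k|k-1} = \mathbf{A}\,\hat{\mathbf{x}}_{k-1}$ (9);

[0078] where $\hat{\mathbf{x}}_{k|k-1}$ represents the predicted state vector at time $k$, and $\hat{\mathbf{x}}_{k-1}$ represents the state vector at time $k-1$.

[0079] At each moment, the speed and position of the target drone are measured. Assuming the measured values are $\mathbf{z}_k$, Kalman filtering compares the measured value with the predicted value to obtain the estimated value:

[0080] $\hat{\mathbf{x}}_k = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k\left(\mathbf{z}_k - \mathbf{H}\,\hat{\mathbf{x}}_{k|k-1}\right)$ (10);

[0081] where $\mathbf{K}_k$ is the Kalman gain, and $\mathbf{H}$ is the observation matrix (distinct from the bounding-box height $H$), which represents how the state of the target UAV is mapped to the measurement space.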

[0082] S104-2: Optimization results.

[0083] Through Kalman filtering calculations, the velocity and position estimates of the target UAV become smoother, effectively reducing the impact of noise and dynamic changes on velocity estimation, thereby improving tracking accuracy.
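To make the predict-update cycle concrete, the sketch below runs a constant-velocity Kalman filter on a single axis, with state [position, velocity] and both quantities measured directly. It is an illustrative simplification, not the patent's implementation: the per-axis decoupling, the identity observation matrix, and the noise values `q`, `r_pos`, `r_vel` are all assumptions, and the class and helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    """Add two 2x2 matrices element-wise."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def transpose(a: Matrix) -> Matrix:
    """Transpose a 2x2 matrix."""
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inverse_2x2(a: Matrix) -> Matrix:
    """Invert a 2x2 matrix via its determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


class AxisKalmanFilter:
    """Constant-velocity filter for one axis; state = [position, velocity]."""

    def __init__(self, dt: float, q: float = 1e-3,
                 r_pos: float = 0.25, r_vel: float = 1.0) -> None:
        self.A: Matrix = [[1.0, dt], [0.0, 1.0]]        # state transition
        self.Q: Matrix = [[q, 0.0], [0.0, q]]           # assumed process noise
        self.R: Matrix = [[r_pos, 0.0], [0.0, r_vel]]   # assumed measurement noise
        self.x: List[float] = [0.0, 0.0]                # initial state guess
        self.P: Matrix = [[1.0, 0.0], [0.0, 1.0]]       # initial covariance

    def step(self, z_pos: float, z_vel: float) -> Tuple[float, float]:
        """Run one predict-update cycle and return (position, velocity)."""
        a = self.A
        # Predict: x_pred = A x, P_pred = A P A^T + Q.
        x_pred = [a[0][0] * self.x[0] + a[0][1] * self.x[1],
                  a[1][0] * self.x[0] + a[1][1] * self.x[1]]
        p_pred = mat_add(mat_mul(mat_mul(a, self.P), transpose(a)), self.Q)
        # Gain with observation matrix H = I: K = P_pred (P_pred + R)^-1.
        gain = mat_mul(p_pred, inverse_2x2(mat_add(p_pred, self.R)))
        # Update: x = x_pred + K (z - x_pred), P = (I - K) P_pred.
        innov = [z_pos - x_pred[0], z_vel - x_pred[1]]
        self.x = [x_pred[i] + gain[i][0] * innov[0] + gain[i][1] * innov[1]
                  for i in range(2)]
        i_minus_k = [[1.0 - gain[0][0], -gain[0][1]],
                     [-gain[1][0], 1.0 - gain[1][1]]]
        self.P = mat_mul(i_minus_k, p_pred)
        return self.x[0], self.x[1]


# Usage: one filter per axis, fed with the measured position and velocity.
kf_x = AxisKalmanFilter(dt=0.1)
print(kf_x.step(0.2, 2.5))
```

Running three such filters (one each for X, Y and Z) is equivalent to a six-state filter with a block-diagonal transition matrix; a coupled model would need the full six-dimensional matrices instead.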

[0084] Understandably, some other implementations allow for online or offline tuning of the process noise covariance or measurement noise covariance to achieve optimization; alternatively, a linear optimal recursive form equivalent to Kalman can be used, as long as the basic function of "prediction-update-output smooth state" is satisfied, which will not be elaborated here.

[0085] S105: Vision servo control.

[0086] To achieve efficient and stable tracking of the target drone, the system employs Visual Servoing (VS) technology. VS adjusts the drone's movement in real time using image data fed back from the camera, and can be divided into two types:

[0087] (1) Image error-based visual servo control (IBVS) is a method that controls the movement of the UAV by directly utilizing the error in the image space. Specifically, the IBVS control strategy adjusts the flight state of the UAV by minimizing the image error of the target UAV (such as the deviation between the center of the target image frame and the camera origin).

[0088] (2) Position-based visual servoing control (PBVS) adjusts the pose of the UAV by estimating the three-dimensional spatial position error of the target UAV. PBVS typically adjusts the trajectory of the tracking UAV by estimating the relative position of the target UAV in the world coordinate system.

If only image error-based visual servoing is used, the UAV's attitude can be adjusted by controlling the error between the center of the target image frame and the camera center, but this method is prone to control oscillation when the image error is small. Specifically, as the tracking UAV approaches the target UAV, the image error decreases and the control system gradually decelerates; after decelerating, the tracking UAV becomes slower than the target UAV and must accelerate again, resulting in repeated oscillation that makes stable tracking difficult.

Position-based visual servoing can follow the target UAV using relative speed. However, monocular vision can only estimate distance from the target image frame, and the size of the target UAV is an assumption, so the position estimate may contain errors, especially when the target UAV is moving fast. This easily leads to tracking deviations and makes it impossible to keep the target accurately centered in the field of view.

To address these issues, this invention proposes a fusion control strategy that combines IBVS and PBVS to optimize both image error and spatial position error, thereby improving the accuracy and stability of target UAV tracking.

[0089] S106: Fusion control strategy.

[0090] To achieve efficient and stable tracking of target UAVs, this study proposes a visual servoing control strategy that integrates image error and position error. This strategy combines the advantages of both image error-based visual servoing (IBVS) and position-based visual servoing (PBVS), giving full play to their strengths and solving the challenges encountered when using image servoing or position servoing alone. Thus, while ensuring fast tracking, it avoids oscillation and position estimation errors.

[0091] S106-1: Image Error Control (IBVS).

[0092] In image error-based visual servoing control, the error mainly originates from the deviation between the center position of the target image box and the camera center, including:

[0093] Errors in the Y and Z directions: that is, the deviation between the center of the target image frame and the center of the camera's optical axis;

[0094] Error in the X direction: the difference between the size of the target image frame and the preset target image frame size.

[0095] These three errors can be combined into an error vector $\mathbf{e}_I$, which is used to control the movement of the drone.

[0096] For image error control, the control input is set as follows:

[0097] $\mathbf{e}_I = [\,W - W^{*},\ u - u_0,\ v - v_0\,]^{T}$ (11);

[0098] where $W$ represents the width of the target image bounding box; $W^{*}$ represents the preset target image frame width; $u$ and $v$ are the coordinates of the target image frame in the target UAV image (generally the coordinates of the center point of the target image frame); and $u_0$ and $v_0$ represent the coordinates of the camera's optical axis center.

[0099] In this implementation, optionally, the coefficient matrix of the control input is determined based on the quotient of the maximum speed and the maximum deviation. The advantage of this design is that it can adjust the speed according to the actual situation, ensuring that the UAV will not experience unnecessary oscillations due to excessively small image errors during tracking. By reasonably setting the coefficient matrix, the control system can avoid overreaction during deceleration, thereby achieving more stable tracking.
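A short Python sketch of the image-error term and of a gain chosen as maximum speed divided by maximum deviation is given below. The function names, the per-axis (diagonal) gains, the choice of maximum deviations, and the sign of each error component are assumptions for illustration; a real controller would fix the signs according to its own axis convention.

```python
from typing import Sequence, Tuple


def image_error(box_width: float, preset_width: float,
                u: float, v: float, u0: float, v0: float) -> Tuple[float, float, float]:
    """Three-component image error: width deviation, then the two center offsets."""
    return (box_width - preset_width, u - u0, v - v0)


def gain_from_limits(max_speed: float, max_deviation: float) -> float:
    """Gain that maps the largest expected deviation to the maximum speed."""
    if max_deviation <= 0:
        raise ValueError("maximum deviation must be positive")
    return max_speed / max_deviation


def ibvs_command(error: Sequence[float], gains: Sequence[float]) -> Tuple[float, ...]:
    """Apply diagonal gains to the image error (assumed diagonal coefficient matrix)."""
    return tuple(g * e for g, e in zip(gains, error))


# Hypothetical setup: 640x480 image, 120 px preset width, 4 m/s speed limit.
gains = (gain_from_limits(4.0, 120.0),   # width channel
         gain_from_limits(4.0, 320.0),   # horizontal offset channel
         gain_from_limits(4.0, 240.0))   # vertical offset channel
err = image_error(60.0, 120.0, 480.0, 240.0, 320.0, 240.0)
print(ibvs_command(err, gains))
```

With this scaling, a target at the edge of the image commands the full speed limit while a nearly centered target commands a proportionally small correction.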

[0100] S106-2: Position Error Control (PBVS).

[0101] Position error control is adjusted based on the estimated relative position and velocity of the target UAV. Since monocular vision ranging relies on assumptions about the target UAV's size, position servoing can only provide an estimate, not a perfectly accurate target position. However, directly using the estimated relative target velocity $\mathbf{v}$ allows for more stable adjustment of the flight trajectory of the tracking drone.

[0102] In position error control, the weight matrix of the control input is usually set to 1 (the identity matrix), because the position error feedback has a direct impact on the control and does not require additional adjustment.

[0103] The specific control inputs are:

[0104] $\mathbf{e}_P = [\,v_X,\ v_Y,\ v_Z\,]^{T}$ (12);

[0105] where $v_X$, $v_Y$ and $v_Z$ are the components of the target UAV's relative velocity in the X, Y, and Z directions.

[0106] S106-3: Fusion control strategy.

[0107] The core idea of the fusion control strategy is to dynamically adjust the UAV's control input by combining feedback information from image errors and position errors. Specifically, the control input consists of the following two parts:

[0108] Image Error Control (IBVS): Used to adjust the attitude of the drone and keep the target drone stable in the field of view;

[0109] Position Error Control (PBVS): Used to adjust the speed of the drone, forcing the drone to get closer to the target drone and maintain the target drone's position in the camera's field of view.

[0110] The fusion control input vector $\mathbf{u}$ is expressed as:

[0111] $\mathbf{u} = \boldsymbol{\Lambda}_I\,\mathbf{e}_I + \boldsymbol{\Lambda}_P\,\mathbf{e}_P$ (13);

[0112] where $\boldsymbol{\Lambda}_I$ and $\boldsymbol{\Lambda}_P$ are weight matrices that control the effects of the image error $\mathbf{e}_I$ and the position error $\mathbf{e}_P$ on the control input, respectively.

[0113] Understandably, in some other implementations, $\boldsymbol{\Lambda}_I$ and $\boldsymbol{\Lambda}_P$ can be adaptively adjusted according to distance, confidence level, or error amplitude; or a segmented switching scheme (such as far-field biased IBVS, near-field enhanced PBVS) can be adopted, as long as the tracking meets the requirements of "field of view stabilization + velocity matching", which will not be elaborated here.
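One possible reading of the distance-based adaptation mentioned above is a linear blend between a far-field setting that favors the image-error weight and a near-field setting that favors the position-error weight. The sketch below is purely hypothetical: the description does not specify any schedule, and the thresholds, the `floor` value, the scalar weights, and the function names are all assumptions.

```python
from typing import Tuple


def blend_factor(distance_m: float, near_m: float, far_m: float) -> float:
    """Return 0.0 at or below near_m, 1.0 at or above far_m, linear in between."""
    if far_m <= near_m:
        raise ValueError("far_m must be greater than near_m")
    t = (distance_m - near_m) / (far_m - near_m)
    return min(1.0, max(0.0, t))


def scheduled_weights(distance_m: float,
                      near_m: float = 2.0, far_m: float = 10.0,
                      k_img_max: float = 0.0125, k_pos_max: float = 1.0,
                      floor: float = 0.2) -> Tuple[float, float]:
    """Scalar image and position weights as a function of range.

    Far field: image weight at its maximum, position weight at its floor.
    Near field: the reverse. `floor` keeps both terms active at all ranges.
    """
    a = blend_factor(distance_m, near_m, far_m)
    k_img = k_img_max * (floor + (1.0 - floor) * a)
    k_pos = k_pos_max * (floor + (1.0 - floor) * (1.0 - a))
    return k_img, k_pos


for d in (1.0, 6.0, 20.0):
    print(d, scheduled_weights(d))
```

Keeping a nonzero floor on both weights avoids a hard switch between the two control terms, which could otherwise introduce a step in the commanded velocity at the threshold distance.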

[0114] In this fusion strategy, combining image error control with position error control effectively avoids the shortcomings of each. Image error control can quickly adjust the attitude of the UAV to keep the target UAV in the field of view. Position error control makes the UAV accelerate or decelerate according to the relative speed of the target UAV, preventing the UAV from stagnating when the image error is small.

[0115] S106-4: Update the relative velocity of the target drone.

[0116] The updated relative velocity $\mathbf{v}_{\text{new}}$ is obtained by adding the fusion control input vector $\mathbf{u}$ to the current relative velocity $\mathbf{v}$:

[0117] $\mathbf{v}_{\text{new}} = \mathbf{v} + \mathbf{u}$ (14);

[0118] In this way, the relative speed of the target drone is adjusted, and the updated relative speed $\mathbf{v}_{\text{new}}$ is dynamically adjusted based on the error feedback (image error and position error) of the target drone. This update mechanism ensures that the drone can stably track the target drone while avoiding oscillations and keeping the target drone in the center of the field of view.
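The fusion and update steps described in S106-3 and S106-4 (a weighted sum of the image error and position error, then addition to the current relative velocity) can be sketched as follows. Representing the weight matrices as per-axis tuples, that is, as diagonal matrices, is an assumption of this sketch, as are the function names and sample values.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fuse(image_err: Sequence[float], position_err: Sequence[float],
         k_img: Sequence[float],
         k_pos: Sequence[float] = (1.0, 1.0, 1.0)) -> Vec3:
    """Fusion control input: weighted image error plus weighted position error.

    k_img and k_pos hold the diagonals of the two weight matrices (assumed
    diagonal here); k_pos defaults to ones, matching the description's note
    that the position weight is usually set to 1.
    """
    ux, uy, uz = (ki * ei + kp * ep
                  for ki, ei, kp, ep in zip(k_img, image_err, k_pos, position_err))
    return ux, uy, uz


def update_velocity(current: Sequence[float], control: Sequence[float]) -> Vec3:
    """Updated relative velocity: current relative velocity plus control input."""
    vx, vy, vz = (v + u for v, u in zip(current, control))
    return vx, vy, vz


# Hypothetical values for one control cycle.
e_img = (2.0, -4.0, 0.0)      # image error vector
v_rel = (1.0, 1.0, 1.0)       # filtered relative velocity, used as the position error
u = fuse(e_img, v_rel, k_img=(0.5, 0.5, 0.5))
print(u, update_velocity(v_rel, u))
```

In a full loop, the position error passed to `fuse` would be the Kalman-filtered relative velocity, and the updated velocity would be sent to the flight controller each frame.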

[0119] By integrating these two control methods, the system can simultaneously achieve the following objectives: (1) Rapid response and stable tracking. Image error control responds quickly to changes in the position of the target UAV, while position error control keeps the speed of the tracking UAV matched to that of the target UAV, so that a decreasing image error does not cause oscillation. (2) Elimination of oscillations. When the image error is small, image error control slows the tracking UAV, but not excessively, thereby avoiding the risk of losing the target. (3) Stable target maintenance. The integrated control enables the tracking UAV not only to track the target UAV but also to keep it in the center of the field of view in a dynamically changing environment, avoiding the tracking problems that arise when relying solely on image error. (4) Precise target approach. Position error control helps the tracking UAV adjust dynamically according to the relative speed of the target UAV, so that it gradually approaches the target UAV and eventually intercepts it.

[0120] Understandably, in other implementations, while maintaining the low-cost orientation of the system, near-field obstacle avoidance sensors (such as depth cameras or lidar) can be added for near-field safety or scale correction, serving as an auxiliary tracking tool; a "global path planning + local trajectory optimization" scheme can be used to achieve high-precision tracking of drones; and when interception conditions are available, different denial methods (such as capture, expulsion, or destruction) can be selected according to mission requirements.

[0121] In summary, the proposed monocular vision-based autonomous UAV tracking method has the following advantages: (1) It uses deep learning target detection (such as YOLO) and an embodied intelligent decision-making framework to achieve automatic identification, path planning, and tracking of illegal target UAVs without human intervention. (2) Through monocular vision, it improves the accuracy of target UAV positioning and environmental perception, and supports real-time tracking in complex dynamic environments. Compared with radar and binocular vision systems, monocular vision and lightweight algorithms significantly reduce costs, making the solution easier to promote in civilian and security fields. (3) It adopts a visual servoing method that combines image error (IBVS) and position error (PBVS), addressing the oscillation and instability problems of single control methods in fast maneuvering environments, and significantly improving tracking stability and success rate. Therefore, this invention not only overcomes the shortcomings of existing solutions, which rely on manual operation, are costly, and lack robustness, but also achieves breakthroughs in autonomy, real-time performance, and universality, providing a feasible and efficient technical path for the governance of illegal target UAVs and enabling autonomous operation across "identification-estimation-control-interception".

[0122] Figure 2 shows an autonomous drone tracking system based on monocular vision, comprising:

[0123] The monocular image processing unit 201 is configured to: acquire images of the target drone captured by a monocular camera of the tracking drone, and determine the three-dimensional spatial coordinates and current relative speed of the target drone based on the images of the target drone;

[0124] The image error calculation unit 202 is configured to calculate the image error vector based on three-dimensional spatial coordinates.

[0125] The position error calculation unit 203 is configured to: optimize the current relative velocity using a Kalman filter algorithm based on the three-dimensional spatial coordinates and the current relative velocity to obtain the optimized current relative velocity, and determine the position error vector based on the optimized current relative velocity;

[0126] The speed fusion control unit 204 is configured to: determine a fusion control input vector based on the image error vector and the position error vector, and determine an updated relative speed based on the fusion control input vector and the current relative speed, for use by the tracking drone in tracking.
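The work of the monocular image processing unit can be sketched as below, following the steps given in the claims: range from the apparent size and the known physical width, three-dimensional coordinates from the box center and the camera intrinsics, and relative velocity from two consecutive frames. This is a minimal sketch assuming an ideal pinhole camera without distortion, an X-forward camera frame, and range measured along the optical axis; the function names and all numeric values are hypothetical.

```python
import numpy as np

def estimate_position(box_center, box_width_px, real_width_m, fx, fy, cx, cy):
    """Estimate target coordinates [X forward, Y, Z] in the camera frame."""
    u, v = box_center
    # Pinhole model: apparent width shrinks in proportion to range.
    depth = fx * real_width_m / box_width_px
    return np.array([depth, (u - cx) * depth / fx, (v - cy) * depth / fy])

def estimate_velocity(p_prev, p_curr, dt):
    """Finite-difference relative velocity between two consecutive frames."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return (p_curr - p_prev) / dt

# Hypothetical intrinsics and detections for two frames taken 0.5 s apart.
intrinsics = dict(fx=800.0, fy=800.0, cx=320.0, cy=240.0)
p_prev = estimate_position((400.0, 200.0), 32.0, 0.4, **intrinsics)
p_curr = estimate_position((400.0, 200.0), 40.0, 0.4, **intrinsics)
v_rel = estimate_velocity(p_prev, p_curr, dt=0.5)
print(p_prev, p_curr, v_rel)
```

Because the finite difference amplifies detection jitter, the raw velocity is typically noisy, which is consistent with the source passing it through a Kalman filter before use.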

[0127] It is understood that the aforementioned units can be individually or entirely merged into one or more other units, or some of the units can be further divided into multiple functionally smaller units. This achieves the same operation without affecting the technical effects of the embodiments of the present invention. The aforementioned units are based on logical functional division. In practical applications, the function of one unit can be implemented by multiple units, or the function of multiple units can be implemented by one unit. In other embodiments of the present invention, the system may also include other units. In practical applications, these functions can also be implemented with the assistance of other units, and can be implemented collaboratively by multiple units.

[0128] According to another embodiment of the present invention, the system of this embodiment can be constructed by running a computer program (including program code) capable of performing the steps involved in the corresponding method of the present invention on a general-purpose computing device, such as a computer, which includes processing elements and storage elements such as a central processing unit (CPU), random access memory (RAM), and read-only memory (ROM). The computer program can be recorded on, for example, a computer-readable recording medium, loaded into the aforementioned computing device through the computer-readable recording medium, and run therein.

[0129] Figure 3 shows a computer device, which includes a processor 301, a communication interface 302, and a computer-readable storage medium 303. The processor 301, communication interface 302, and computer-readable storage medium 303 can be connected via a bus or other means.

[0130] The communication interface 302 is used to receive and send data. The computer-readable storage medium 303 can be stored in the memory of the electronic device. The computer-readable storage medium 303 is used to store computer programs, which include program instructions. The processor 301 is used to execute the program instructions stored in the computer-readable storage medium 303.

[0131] The processor 301 is the computing and control core of the electronic device. It is suitable for implementing one or more instructions, specifically for loading and executing one or more instructions to achieve the corresponding method flow or corresponding function.

[0132] Processor 301 is configured to perform the following procedure:

[0133] Acquire images of the target drone captured by a monocular camera of the tracking drone, and determine the target drone's three-dimensional spatial coordinates and current relative speed based on the images;

[0134] Calculate the image error vector based on three-dimensional spatial coordinates;

[0135] Based on the three-dimensional spatial coordinates and the current relative velocity, the Kalman filter algorithm is used to optimize the current relative velocity to obtain the optimized current relative velocity. The position error vector is then determined based on the optimized current relative velocity.

[0136] Based on the image error vector and the position error vector, the fusion control input vector is determined. Based on the fusion control input vector and the current relative velocity, the updated relative velocity is determined for use by the tracking drone in tracking.
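The Kalman filtering step in this procedure can be illustrated with a constant-velocity filter that takes the measured coordinates and the raw relative velocity and returns a smoothed velocity. The source does not give the state model or noise settings, so the six-dimensional state, the identity measurement matrix, the class name `RelativeMotionKF`, and the noise parameters `q`, `r_pos`, and `r_vel` below are all assumptions.

```python
import numpy as np

class RelativeMotionKF:
    """Constant-velocity Kalman filter over state [position(3), velocity(3)]."""

    def __init__(self, dt, q=1e-2, r_pos=0.25, r_vel=1.0):
        I3 = np.eye(3)
        self.F = np.block([[I3, dt * I3], [np.zeros((3, 3)), I3]])
        self.H = np.eye(6)                      # position and velocity both measured
        self.Q = q * np.eye(6)                  # assumed process noise
        self.R = np.diag([r_pos] * 3 + [r_vel] * 3)  # assumed measurement noise
        self.x = np.zeros(6)
        self.P = np.eye(6)

    def step(self, position, velocity):
        # Predict with the constant-velocity model.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the new measurement.
        z = np.concatenate([position, velocity])
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[3:]                       # filtered relative velocity

# Noise-free synthetic track, used only to show that the filter settles.
dt = 0.1
true_v = np.array([1.0, 0.0, -0.5])
p0 = np.array([10.0, 0.0, 0.0])
kf = RelativeMotionKF(dt)
for k in range(1, 201):
    v_filtered = kf.step(p0 + true_v * k * dt, true_v)
print(np.round(v_filtered, 2))
```

Under the reading of the claims used here, the filtered velocity components then serve as the position error vector.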

[0137] This invention also provides a computer-readable storage medium, which is a memory device in an electronic device for storing programs and data. It is understood that the computer-readable storage medium here may include both built-in storage media in the electronic device and extended storage media supported by the electronic device. The computer-readable storage medium provides storage space for storing the processing system of the electronic device.

[0138] Furthermore, this storage space also contains one or more instructions suitable for loading and execution by the processor. These instructions can be one or more computer programs (including program code). It should be noted that the computer-readable storage medium here can be a high-speed RAM memory; alternatively, it can also be at least one computer-readable storage medium located remotely from the aforementioned processor.

[0139] In one embodiment, the computer-readable storage medium stores one or more instructions; the processor loads and executes the one or more instructions stored in the computer-readable storage medium to perform the following process:

[0140] Acquire images of the target drone captured by a monocular camera of the tracking drone, and determine the target drone's three-dimensional spatial coordinates and current relative speed based on the images;

[0141] Calculate the image error vector based on three-dimensional spatial coordinates;

[0142] Based on the three-dimensional spatial coordinates and the current relative velocity, the Kalman filter algorithm is used to optimize the current relative velocity to obtain the optimized current relative velocity. The position error vector is then determined based on the optimized current relative velocity.

[0143] Based on the image error vector and the position error vector, the fusion control input vector is determined. Based on the fusion control input vector and the current relative velocity, the updated relative velocity is determined for use by the tracking drone in tracking.

[0144] The present invention also provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the electronic device to perform the following process:

[0145] Acquire images of the target drone captured by a monocular camera of the tracking drone, and determine the target drone's three-dimensional spatial coordinates and current relative speed based on the images;

[0146] Calculate the image error vector based on three-dimensional spatial coordinates;

[0147] Based on the three-dimensional spatial coordinates and the current relative velocity, the Kalman filter algorithm is used to optimize the current relative velocity to obtain the optimized current relative velocity. The position error vector is then determined based on the optimized current relative velocity.

[0148] Based on the image error vector and the position error vector, the fusion control input vector is determined. Based on the fusion control input vector and the current relative velocity, the updated relative velocity is determined for use by the tracking drone in tracking.

[0149] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed in this invention can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can implement the described functions using different methods for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0150] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the flow or function according to the embodiments of the present invention is generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in or transmitted through a computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic cable, digital cable) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data processing device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive), etc.

[0151] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A monocular vision-based method for autonomous tracking of a UAV, characterized in that the method comprises the following steps: acquiring a target drone image captured by a monocular camera of a tracking drone, and determining a three-dimensional space coordinate and a current relative speed of the target drone according to the target drone image; calculating an image error vector according to the three-dimensional space coordinate, wherein the image error vector is a three-dimensional vector comprising a difference between a width of a target image frame output by a target detection model and a preset target image frame width, a difference between a Y coordinate of a center of the target image frame output by the target detection model and a Y coordinate of a center of an optical axis of the monocular camera, and a difference between a Z coordinate of the center of the target image frame output by the target detection model and a Z coordinate of the center of the optical axis of the monocular camera; performing current relative speed optimization by using a Kalman filtering algorithm according to the three-dimensional space coordinate and the current relative speed, to obtain an optimized current relative speed, and determining a position error vector according to the optimized current relative speed; determining a fusion control input vector according to the image error vector and the position error vector, and determining an updated relative speed according to the fusion control input vector and the current relative speed, for use by the tracking drone in tracking; wherein the fusion control input vector is: $$\mathbf{u} = \mathbf{K}_{I}\,\mathbf{e}_{I} + \mathbf{K}_{P}\,\mathbf{e}_{P}$$ wherein $\mathbf{e}_{P}$ represents the position error vector, $\mathbf{e}_{I}$ represents the image error vector, $\mathbf{K}_{I}$ represents the weight matrix of the image error vector, $\mathbf{K}_{P}$ represents the weight matrix of the position error vector, and $\mathbf{u}$ represents the fusion control input vector; and the updated relative velocity is the sum of the fusion control input vector and the current relative velocity.

2. The method according to claim 1, wherein the three-dimensional space coordinate of the target drone is determined according to the target drone image, comprising: determining a position coordinate and a size of the target drone by using a pre-trained target detection model; determining a relative distance of the target drone according to the size, an intrinsic parameter of the monocular camera and an actual physical width of the target drone; determining the three-dimensional space coordinate of the target drone according to the position coordinate, the relative distance and the intrinsic parameter of the monocular camera.

3. The method according to claim 1, wherein the current relative speed of the target drone is determined according to the target drone image, comprising: determining a previous frame three-dimensional space coordinate and a current frame three-dimensional space coordinate of the target drone respectively according to adjacent two target drone images, and determining the current relative speed of the target drone in combination with a time difference between the adjacent two target drone images.

4. The method according to any one of claims 1-3, wherein the position error vector is a three-dimensional vector comprising components of the current relative speed in the X, Y and Z directions.

5. A monocular vision-based unmanned aerial vehicle autonomous tracking system, characterized in that the system comprises: a monocular image processing unit configured to acquire a target drone image captured by a monocular camera of a tracking drone, and determine a three-dimensional space coordinate and a current relative speed of the target drone according to the target drone image; an image error calculation unit configured to calculate an image error vector according to the three-dimensional space coordinate, wherein the image error vector is a three-dimensional vector comprising a difference between a width of a target image frame output by a target detection model and a preset target image frame width, a difference between a Y coordinate of a center of the target image frame output by the target detection model and a Y coordinate of a center of an optical axis of the monocular camera, and a difference between a Z coordinate of the center of the target image frame output by the target detection model and a Z coordinate of the center of the optical axis of the monocular camera; a position error calculation unit configured to perform current relative speed optimization by using a Kalman filtering algorithm according to the three-dimensional space coordinate and the current relative speed, to obtain an optimized current relative speed, and determine a position error vector according to the optimized current relative speed; and a speed fusion control unit configured to determine a fusion control input vector according to the image error vector and the position error vector, and determine an updated relative speed according to the fusion control input vector and the current relative speed, for use by the tracking drone in tracking; wherein the fusion control input vector is: $$\mathbf{u} = \mathbf{K}_{I}\,\mathbf{e}_{I} + \mathbf{K}_{P}\,\mathbf{e}_{P}$$ wherein $\mathbf{e}_{P}$ represents the position error vector, $\mathbf{e}_{I}$ represents the image error vector, $\mathbf{K}_{I}$ represents the weight matrix of the image error vector, $\mathbf{K}_{P}$ represents the weight matrix of the position error vector, and $\mathbf{u}$ represents the fusion control input vector; and the updated relative velocity is the sum of the fusion control input vector and the current relative velocity.

6. A computer device, comprising a processor and a computer-readable storage medium, wherein the processor is adapted to execute a computer program, the computer-readable storage medium stores the computer program, and the computer program, when executed by the processor, implements the monocular vision-based autonomous tracking method of the unmanned aerial vehicle according to any one of claims 1 to 4.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is adapted to be loaded and executed by a processor to implement the monocular vision-based autonomous tracking method of the unmanned aerial vehicle according to any one of claims 1 to 4.

8. A computer program product, characterized in that the computer program product comprises a computer program, and the computer program, when executed by a processor, implements the monocular vision-based autonomous tracking method of the unmanned aerial vehicle according to any one of claims 1 to 4.