Method and system for automatic monitoring of structural displacement by cooperation of unmanned aerial vehicle and machine vision

By combining drones with machine vision, together with sliding-window factor-graph optimization and laser ranging, the method achieves high-frequency, high-precision, full-coverage structural displacement monitoring in geotechnical engineering. It addresses the insufficient accuracy and low automation of traditional methods and enables efficient, all-weather monitoring.

CN121898261B · Active · Publication Date: 2026-06-19 · CHANGZHOU ARCHITECTUAL RES INST GRP CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHANGZHOU ARCHITECTUAL RES INST GRP CO LTD
Filing Date: 2026-03-20
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies cannot simultaneously achieve high-frequency, high-precision, and full-coverage monitoring of support structures such as foundation pits, slopes, and tunnel entrances in geotechnical engineering. Furthermore, drone monitoring accuracy is insufficient, sensor deployment costs are high, and introducing manually placed reference points lowers the level of automation.

Method used

A monitoring method combining UAVs and machine vision is adopted. Data is collected by UAVs equipped with visible light cameras and laser ranging modules. Combined with ground-based stereo vision base stations, global optimization over a sliding-window factor graph with laser ranging distance factors is used to achieve millimeter-level displacement monitoring.

Benefits of technology

It achieves all-weather, unmanned, high-precision monitoring with millimeter-level accuracy, covering an area of over 20,000 square meters with a monitoring point density of up to 50 points per 100 square meters. It reduces dependence on ambient light and improves the robustness and automation of the system.

✦ Generated by Eureka AI based on patent content.

Figure CN121898261B_ABST

Abstract

This invention discloses an automatic structural displacement monitoring method, system, and medium using UAVs and machine vision, addressing the inability of existing technologies to achieve high monitoring frequency, accuracy, and coverage simultaneously. The automatic monitoring method includes: synchronously acquiring multi-source data from a UAV and a ground base station; performing, in parallel for each frame of data, target sub-pixel localization, laser ranging-based scale constraint generation, and front-end tracking with keyframe judgment; rapidly outputting displacement estimates for non-keyframes, and triggering global optimization based on a sliding window factor graph for keyframes; constructing state variables containing a global scale factor during optimization, and tightly coupling laser ranging values with visual observations using laser ranging distance factors to correct photogrammetric scale drift in real time, thereby calculating millimeter-level displacement and velocity. The automatic monitoring system comprises three main units: perception execution, edge processing, and central calculation. This invention achieves fully automatic displacement monitoring with high precision, high efficiency, and high reliability.
Description

Technical Field

[0001] This invention relates to the field of geotechnical engineering safety monitoring technology, and in particular to an automatic structural displacement monitoring method and system that combines unmanned aerial vehicles (UAVs) and machine vision.

Background Technology

[0002] In the field of geotechnical engineering safety monitoring, real-time, accurate, and comprehensive monitoring of the displacement of support structures such as foundation pits, slopes, and tunnel entrances is a core element in disaster prevention and ensuring engineering safety. An ideal monitoring solution must simultaneously meet three rigid requirements: high frequency, high precision, and full coverage. However, existing mainstream technologies all have certain limitations, making it difficult to simultaneously achieve these goals. Specifically, this manifests in the following three major contradictions:

[0003] (1) The contradiction between high precision and full coverage: Traditional high-precision monitoring mainly relies on fixed sensor networks (such as total stations, GNSS reference stations, inclinometers, etc.). Although such methods can achieve single-point accuracy at the millimeter or even sub-millimeter level, the sensor deployment cost is high and the cycle is long, and the monitoring point density is limited, making it difficult to achieve continuous, fine-grained, and full-coverage monitoring of the entire slope or foundation pit sidewall. For areas with complex geological conditions or wide areas, this method faces severe challenges in terms of both feasibility and economy.

[0004] (2) The contradiction between flexibility and high precision: In recent years, UAV inspection technology has been used for large-scale deformation monitoring due to its advantages of mobility, flexibility, and relatively low cost. However, the close-range photogrammetry technology it relies on has fundamental flaws that are difficult to overcome when applied to high-precision displacement monitoring:

[0005] (a) Scale drift problem: Due to changes in altitude and attitude during flight and to camera lens distortion, the scale reference of monocular or binocular UAV visual measurement drifts. This has long limited the accuracy of absolute displacement measurement to the centimeter level, which cannot meet the millimeter-level monitoring requirements for key parts set by the "Specification for Monitoring of Foundation Pit Engineering" and other standards.

[0006] (b) Environmental sensitivity issues: Its accuracy is easily affected by factors such as drastic changes in ambient light and image blurring caused by wind disturbance, resulting in insufficient stability.

[0007] Therefore, while the drone solution solves the problem of full coverage, it fails to meet the high-precision requirement.

[0008] (3) The contradiction between automation and reliability: To improve the accuracy and stability of visual measurement, existing technologies attempt to introduce manually placed high-precision reference points for calibration. However, this introduces new problems: placing reference points manually is time-consuming and labor-intensive, with low automation; and under severe weather conditions such as rainstorms and fog, the effective observation rate of conventional visible light vision methods drops sharply or even fails completely, so system reliability cannot be guaranteed and truly unmanned, all-weather operation is difficult to achieve.

Summary of the Invention

[0009] The technical problem to be solved by the present invention is to provide a method and system for automatic monitoring of structural displacement in collaboration with UAVs and machine vision, in order to address the problems of high accuracy but insufficient coverage of fixed sensors, wide coverage but insufficient accuracy of UAVs, and the sacrifice of automation and reliability when introducing artificial references.

[0010] The technical solution adopted by this invention to solve its technical problem is: an automatic structural displacement monitoring method based on the collaboration of UAV and machine vision, comprising the following steps:

[0011] S1. A UAV equipped with a visible light camera and a laser ranging module collects a sequence of visible light images, a sequence of laser ranging values, and a sequence of pose data of the monitored area; at the same time, a ground-based stereo vision base station collects a reference image containing the monitored target.

[0012] S2. For each frame of data collected, target detection and localization, generation of scale constraint information based on laser ranging, and key frame judgment based on motion tracking are performed in parallel.

[0013] S3. Tightly Coupled Optimization and Displacement Inversion:

[0014] S31. If S2 determines that the current frame is a non-keyframe, then output a fast displacement estimate based on historical optimization results and current tracking information;

[0015] S32. If S2 determines that the current frame is a keyframe, then trigger global optimization based on the sliding window factor graph;

[0016] In the global optimization based on the sliding window factor graph, the state variables include at least the camera pose of the keyframe, the world coordinates of the monitored target, and a global scale factor. Its observation factors include at least the reprojection factor constructed based on the target positioning results, the laser ranging distance factor constructed based on laser ranging information, the inertial navigation factor constructed based on pose data, and the reference constraint factor constructed based on ground reference images.

[0017] The residual of the laser ranging distance factor is constructed according to the following model:

[0018] $r_d = d - \lVert P_c \rVert_2$,

[0019] where $d$ is the laser ranging value and $P_c$ is the coordinates of the monitored target in the current camera coordinate system;

[0020] Through global optimization based on a sliding window factor graph, the millimeter-level three-dimensional displacement and displacement rate of the monitored target are output.

[0021] By combining air-ground data collection from UAVs and ground base stations, the system combines the UAV's full-coverage mobility with the ground base station's stable, high-precision reference. Furthermore, it incorporates global optimization based on a sliding window factor graph together with a laser ranging distance factor. By tightly coupling the absolute distance information from laser ranging with visual observations, the photogrammetric scale factor is solved and corrected online, which improves accuracy from the centimeter level to the millimeter level. The entire process of data acquisition and processing is automated, enabling 24/7 unmanned operation.

[0022] Furthermore, the target detection and localization in S2 specifically includes:

[0023] An improved YOLO neural network model was used to detect zigzag targets in images;

[0024] Zernike moment edge detection was used for the detected target region image;

[0025] Based on the detected edge points, the sub-pixel coordinates of the target center are calculated using quadratic surface fitting. The closed-form solution of the center coordinates of the quadratic surface fitting is:

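[0026] $x_0 = \dfrac{2bd - ce}{c^2 - 4ab},\quad y_0 = \dfrac{2ae - cd}{c^2 - 4ab}$,

[0027] where $(x_0, y_0)$ are the fitted center coordinates and $a$, $b$, $c$, $d$, $e$ are the coefficients of the quadratic surface written as $a x^2 + b y^2 + c\,xy + d x + e y + f$.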

[0028] A dedicated detection and high-precision positioning method for the specially shaped targets ensures the accuracy and robustness of the front-end observation data, providing high-quality input for subsequent optimization.
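The center computation described above can be illustrated with a short Python sketch. It is not the patented implementation: it assumes the quadratic is fitted to the edge points as an implicit curve $a x^2 + b y^2 + c\,xy + d x + e y = 1$ (constant term normalized, which requires that the curve does not pass through the coordinate origin), and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate edge-point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_quadratic(edge_points: Sequence[Point]) -> List[float]:
    """Least-squares fit of a*x^2 + b*y^2 + c*x*y + d*x + e*y = 1 (assumed form)."""
    rows = [[x * x, y * y, x * y, x, y] for x, y in edge_points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] for r in rows) for i in range(5)]  # right-hand side is all ones
    return solve_linear(ata, atb)

def quadratic_center(coeffs: Sequence[float]) -> Point:
    """Closed-form center: the point where both partial derivatives vanish."""
    a, b, c, d, e = coeffs
    denom = c * c - 4.0 * a * b
    if abs(denom) < 1e-12:
        raise ValueError("quadratic has no unique center")
    return ((2.0 * b * d - c * e) / denom, (2.0 * a * e - c * d) / denom)

# Synthetic edge points on a circle of radius 2 centered at (1.3, -0.4).
edge_points = [
    (1.3 + 2.0 * math.cos(k * math.pi / 6), -0.4 + 2.0 * math.sin(k * math.pi / 6))
    for k in range(12)
]
center = quadratic_center(fit_quadratic(edge_points))
```

In practice the edge points would come from the Zernike moment detector, expressed in patch-local coordinates.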

[0029] Furthermore, an uncertainty covariance matrix $\Sigma_{uv}$ of the sub-pixel coordinates $(u, v)$ is estimated simultaneously with localization, and $\Sigma_{uv}$ is used to weight the reprojection factor. The covariance matrix $\Sigma_{uv}$ is estimated using the Monte Carlo Dropout method as follows:

[0030] During the network inference phase, the Dropout layer is kept enabled;

[0031] Perform T forward passes on the same target image patch to obtain T sets of sub-pixel coordinate predictions;

[0032] Calculate the sample covariance matrix of these T sets of predictions and take it as the final $\Sigma_{uv}$.
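The three steps above amount to computing a 2x2 sample covariance over repeated stochastic predictions. The following Python sketch shows that computation under stated assumptions: `noisy_predict` is a hypothetical stand-in for one forward pass of the network with Dropout enabled, and the unbiased (T - 1) normalization is an assumption, since the text does not say which normalization is used.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]

def sample_covariance(preds: Sequence[Point]) -> Tuple[Point, List[List[float]]]:
    """Return the mean and the unbiased 2x2 sample covariance of (u, v) predictions."""
    t = len(preds)
    if t < 2:
        raise ValueError("need at least two forward passes")
    mean_u = sum(u for u, _ in preds) / t
    mean_v = sum(v for _, v in preds) / t
    suu = sum((u - mean_u) ** 2 for u, _ in preds) / (t - 1)
    svv = sum((v - mean_v) ** 2 for _, v in preds) / (t - 1)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in preds) / (t - 1)
    return (mean_u, mean_v), [[suu, suv], [suv, svv]]

def mc_dropout_covariance(predict: Callable[[], Point], passes: int = 30):
    """Run `passes` stochastic forward passes and summarize their spread."""
    return sample_covariance([predict() for _ in range(passes)])

# Hypothetical stand-in for a dropout-enabled network evaluated on one patch.
rng = random.Random(0)

def noisy_predict() -> Point:
    return (64.25 + rng.gauss(0.0, 0.05), 63.80 + rng.gauss(0.0, 0.02))

mean_uv, cov_uv = mc_dropout_covariance(noisy_predict, passes=200)
```

A larger T gives a more stable covariance estimate at the cost of T times the inference work per target patch.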

[0033] Furthermore, the covariance matrix $\Sigma_{uv}$ is predicted directly by a heteroscedasticity estimation network head, which outputs a three-dimensional vector. A lower triangular matrix is constructed from this vector using the following formula and is then used directly to form the covariance matrix $\Sigma_{uv}$:

[0034] ,

[0035] Furthermore, during training, the heteroscedasticity estimation network head uses the negative log-likelihood of the reprojection error as the loss function.
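One common way to realize such a head is sketched below in Python. The exact construction is not recoverable from this text, so the parameterization here is an assumption: the three outputs are mapped to a Cholesky factor with exponentiated diagonal entries, the covariance is taken as $L L^{\top}$, and the loss is the Gaussian negative log-likelihood of a 2D reprojection residual. Function names are hypothetical.

```python
import math
from typing import List, Sequence

def cholesky_from_head(raw: Sequence[float]) -> List[List[float]]:
    """Map a 3-vector to a lower triangular factor (assumed: exp keeps the diagonal positive)."""
    return [[math.exp(raw[0]), 0.0], [raw[1], math.exp(raw[2])]]

def covariance_from_cholesky(l: List[List[float]]) -> List[List[float]]:
    """Sigma = L @ L^T, symmetric positive definite by construction."""
    (l11, _), (l21, l22) = l
    return [[l11 * l11, l11 * l21], [l11 * l21, l21 * l21 + l22 * l22]]

def gaussian_nll(residual: Sequence[float], l: List[List[float]]) -> float:
    """Negative log-likelihood of a 2D residual under N(0, L @ L^T)."""
    (l11, _), (l21, l22) = l
    # Forward substitution solves L y = r, so that r^T Sigma^-1 r = y^T y.
    y1 = residual[0] / l11
    y2 = (residual[1] - l21 * y1) / l22
    mahalanobis = y1 * y1 + y2 * y2
    log_det = 2.0 * (math.log(l11) + math.log(l22))
    return 0.5 * (mahalanobis + log_det) + math.log(2.0 * math.pi)

factor = cholesky_from_head([math.log(2.0), 1.0, math.log(3.0)])
sigma_uv = covariance_from_cholesky(factor)
loss = gaussian_nll([0.3, -0.2], factor)
```

The log-determinant term keeps the network from lowering the loss simply by inflating the predicted covariance.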

[0036] Furthermore, the scale constraint information generated in S2 based on laser ranging is based on the following camera imaging model:

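[0037] Let the coordinates of the target in the camera coordinate system be $P_c = (X_c, Y_c, Z_c)^{\top}$. Its projection onto the sub-pixel coordinates $(u, v)$ follows the model:

[0038] $\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K P_c$,

[0039] where $\lambda$ is the scale ambiguity factor and $K$ is the camera intrinsic parameter matrix:

[0040] $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$,

[0041] where $f_x$, $f_y$ are the focal lengths and $(c_x, c_y)$ are the principal point coordinates;

[0042] The laser ranging module measures the Euclidean distance $d$ from the camera's optical center to the target, that is:

[0043] $d = \lVert P_c \rVert_2 = \sqrt{X_c^2 + Y_c^2 + Z_c^2}$,

[0044] The normalized camera plane coordinates $(x_n, y_n)$ are calculated from the sub-pixel coordinates $(u, v)$:

[0045] $x_n = \dfrac{u - c_x}{f_x}$, $y_n = \dfrac{v - c_y}{f_y}$;

[0046] Combining the camera imaging model, the target depth $Z_c$, and the normalized coordinates $(x_n, y_n)$, the laser ranging constraint is obtained as follows:

[0047] $d = Z_c \sqrt{x_n^2 + y_n^2 + 1}$.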
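The derivation above can be checked numerically with a small Python sketch. It assumes an ideal pinhole camera without lens distortion; the function names and the intrinsic values in the usage lines are hypothetical.

```python
import math
from typing import Tuple

def normalize_pixel(u: float, v: float, fx: float, fy: float,
                    cx: float, cy: float) -> Tuple[float, float]:
    """Convert sub-pixel coordinates to normalized camera plane coordinates."""
    return ((u - cx) / fx, (v - cy) / fy)

def depth_from_range(d: float, xn: float, yn: float) -> float:
    """Invert d = Z * sqrt(xn^2 + yn^2 + 1) to recover the depth Z."""
    return d / math.sqrt(xn * xn + yn * yn + 1.0)

def backproject(d: float, u: float, v: float, fx: float, fy: float,
                cx: float, cy: float) -> Tuple[float, float, float]:
    """Recover the camera-frame point from one pixel and one laser range."""
    xn, yn = normalize_pixel(u, v, fx, fy, cx, cy)
    zc = depth_from_range(d, xn, yn)
    return (xn * zc, yn * zc, zc)

# Illustrative intrinsics (not from the patent) and a range of 10 * sqrt(1.25) m.
point_c = backproject(10.0 * math.sqrt(1.25), 1260.0, 940.0,
                      1000.0, 1000.0, 960.0, 540.0)
```

With these inputs the normalized coordinates are (0.3, 0.4), so the recovered point lies at a depth of 10 m and its Euclidean norm equals the measured range.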

[0048] By specifying two advanced methods for estimating observation uncertainty, Monte Carlo Dropout and a heteroscedastic network head, the optimization process can weigh the credibility of different observations more reasonably, significantly improving the system's anti-interference capability and overall accuracy in complex environments.

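[0049] Furthermore, based on the laser ranging constraint, for the observation of a single target in a single frame, an instantaneous observation constraint on the global scale factor $s$ is generated:

[0050] $d = s\,\hat{Z}\sqrt{x_n^2 + y_n^2 + 1}$,

[0051] where $\hat{Z}$ is the photogrammetric depth estimated from the current image and pose;

[0052] Based on the laser ranging constraint, for observations of multiple targets in a single frame, an initial estimate $\hat{s}_0$ of the global scale factor $s$ is generated by solving the following least squares problem:

[0053] $\hat{s}_0 = \arg\min_{s} \sum_{i=1}^{N} \left( d_i - s\,\hat{Z}_i \sqrt{x_{n,i}^2 + y_{n,i}^2 + 1} \right)^2$,

[0054] The closed-form solution to the least squares problem is:

[0055] $\hat{s}_0 = \dfrac{\sum_{i=1}^{N} a_i d_i}{\sum_{i=1}^{N} a_i^2}$, $a_i = \hat{Z}_i \sqrt{x_{n,i}^2 + y_{n,i}^2 + 1}$.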
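A one-parameter least squares fit of this kind has a simple closed form, sketched below in Python. The sketch assumes the constraint takes the form d = s * a with a = Z_hat * sqrt(x_n^2 + y_n^2 + 1), where Z_hat is the depth estimated by photogrammetry, and that all targets are weighted equally; both points are assumptions, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One observation: (laser range d, photogrammetric depth z_hat, x_n, y_n).
Observation = Tuple[float, float, float, float]

def ray_length(z_hat: float, xn: float, yn: float) -> float:
    """Unscaled distance along the viewing ray implied by the visual depth."""
    return z_hat * math.sqrt(xn * xn + yn * yn + 1.0)

def estimate_scale(observations: Sequence[Observation]) -> float:
    """Minimize sum_i (d_i - s * a_i)^2 over s; the minimizer is sum(a*d) / sum(a*a)."""
    num = 0.0
    den = 0.0
    for d, z_hat, xn, yn in observations:
        a = ray_length(z_hat, xn, yn)
        num += a * d
        den += a * a
    if den == 0.0:
        raise ValueError("no usable observations")
    return num / den

# Synthetic check: ranges generated with a true scale of 1.02.
visual = [(12.0, 0.10, -0.05), (15.5, -0.20, 0.12), (9.8, 0.00, 0.30)]
obs = [(1.02 * ray_length(z, x, y), z, x, y) for z, x, y in visual]
s0 = estimate_scale(obs)
```

Weighting each term by the inverse range variance would be a natural extension when targets are observed at very different distances.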

[0056] Furthermore, the coordinates $P_c$ of the monitored target in the current camera coordinate system in S32 are computed from the camera pose and target world coordinates in the state variables as follows:

[0057] $P_c = R\,P_w + t$,

[0058] where $R$ and $t$ are the rotation matrix and translation vector of the current keyframe camera pose in the state variables, and $P_w$ is the coordinates of the target in the world coordinate system;

[0059] The coordinates $P_w$ of the target in the world coordinate system are state variables to be optimized and are solved jointly with the global scale factor $s$ and the camera pose.
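To make the coupling between the state variables and the laser measurement concrete, the Python sketch below evaluates a range residual for given state values. It assumes the residual is the measured range minus the norm of the camera-frame point, and it omits the global scale factor because this text does not show where that factor enters; function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

def transform_to_camera(rotation: Sequence[Sequence[float]],
                        translation: Sequence[float],
                        p_world: Sequence[float]) -> Vec3:
    """Rigid transform of a world point into the camera frame: R @ P_w + t."""
    x, y, z = (
        sum(rotation[i][j] * p_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)

def laser_range_residual(d: float,
                         rotation: Sequence[Sequence[float]],
                         translation: Sequence[float],
                         p_world: Sequence[float]) -> float:
    """Residual between the measured range and the range predicted by the state."""
    pc = transform_to_camera(rotation, translation, p_world)
    return d - math.sqrt(sum(c * c for c in pc))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
residual = laser_range_residual(5.002, identity, (0.0, 0.0, 5.0), (3.0, 4.0, -5.0))
```

In the usage lines the predicted range is 5 m, so a measurement of 5.002 m leaves a 2 mm residual for the optimizer to explain.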

[0060] Furthermore, the keyframe determination based on motion tracking in S2 includes a front-end tracking step, specifically:

[0061] The relative motion increment from the previous keyframe to the current frame is calculated based on IMU pre-integration.

[0062] Based on the relative motion increment and the optimized pose of the previous keyframe, the initial pose estimate of the current frame is obtained.

[0063] Using the initial pose estimate of the current frame, the optimized 3D coordinates $P_w$ of each target from the previous keyframe are projected onto the current frame image plane to obtain the predicted pixel coordinates $\hat{p}_i = (\hat{u}_i, \hat{v}_i)$;

[0064] Calculate the reprojection error $e_i$ between the predicted coordinates $\hat{p}_i$ and the actually observed coordinates $p_i = (u_i, v_i)$ in the current frame;

[0065] Count the targets whose reprojection error is less than the threshold $\tau$ as the number of successfully tracked targets $N_{track}$, and calculate their average reprojection error $\bar{e}$.

[0066] A mathematical model for correcting the visual scale with laser ranging is established. The laser ranging constraint equation is derived from the camera imaging model, and the constraint is then used for single-target and multi-target scale estimation, making the laser ranging scale constraint mathematically explicit and specific.

[0067] Furthermore, the projection model for projecting the target's three-dimensional coordinates onto the current frame image plane is as follows:

[0068] , ,

[0069] where $P_c$ is the coordinates of the target in the current frame's camera coordinate system, $s$ is the global scale factor optimized at the previous keyframe, $R$ and $t$ are the rotation matrix and translation vector of the initial pose of the current frame, $\pi(\cdot)$ is the camera projection function, and $K$ is the camera intrinsic parameter matrix.

[0070] By specifying how $P_c$ in the laser ranging factor is calculated, and that the target coordinates $P_w$ are optimizable, the laser ranging constraint is deeply integrated into the SLAM optimization framework and optimized together with the pose and scale factor, achieving dynamic and optimal scale correction.

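[0071] Furthermore, the number of successfully tracked targets $N_{track}$ is calculated as $N_{track} = \sum_{i} \mathbb{1}(e_i < \tau)$, and their average reprojection error $\bar{e}$ is calculated as:

[0072] $\bar{e} = \dfrac{1}{N_{track}} \sum_{i:\, e_i < \tau} e_i$.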
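The tracking statistics can be sketched in a few lines of Python. The sketch assumes the reprojection error is the Euclidean pixel distance between predicted and observed coordinates and that the average is taken over the successfully tracked targets only; returning infinity when nothing is tracked is an illustrative choice, not something stated in the text.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def reprojection_error(predicted: Point, observed: Point) -> float:
    """Euclidean pixel distance between predicted and observed coordinates."""
    return math.hypot(predicted[0] - observed[0], predicted[1] - observed[1])

def tracking_stats(errors: Sequence[float], tau: float) -> Tuple[int, float]:
    """Count targets with error below tau and average the error over those targets."""
    tracked = [e for e in errors if e < tau]
    if not tracked:
        return 0, math.inf  # assumption: an empty track set counts as worst case
    return len(tracked), sum(tracked) / len(tracked)

errors = [
    reprojection_error((100.0, 200.0), (100.3, 200.4)),  # 0.5 px
    reprojection_error((50.0, 60.0), (53.0, 64.0)),      # 5.0 px, tracking lost
    reprojection_error((10.0, 10.0), (10.0, 10.2)),      # 0.2 px
]
n_track, mean_error = tracking_stats(errors, tau=1.0)
```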

[0073] Furthermore, in S2, a frame is determined to be a keyframe if it satisfies at least one of the following conditions:

[0074] a. The relative translation change $\Delta t$ or the rotational change $\Delta\theta$ between the current frame and the previous keyframe exceeds its preset threshold;

[0075] b. The average reprojection error $\bar{e}$ exceeds its preset threshold;

[0076] c. The number of successfully tracked targets $N_{track}$ is less than 80% of the number of visible targets in the previous keyframe;

[0077] d. The frame interval between the current frame and the previous keyframe is greater than 10 frames;

[0078] where $\Delta t = \lVert t_k - t_{kf} \rVert_2$, $\Delta\theta = \arccos\!\left(\dfrac{\operatorname{tr}(R_{kf}^{\top} R_k) - 1}{2}\right)$,

[0079] In the formula, $t_k$ is the initial translation vector of the current frame and $t_{kf}$ is the translation vector optimized at the previous keyframe; $R_k$ is the initial rotation matrix of the current frame and $R_{kf}$ is the rotation matrix optimized at the previous keyframe;

[0080] Then, based on the judgment result, one of the following branches is executed:

[0081] If the current frame is determined to be a keyframe, then the keyframe processing branch is executed;

[0082] If the current frame is determined to be a non-keyframe, then the non-keyframe fast estimation branch is executed.
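The branching logic above reduces to a disjunction of four tests, sketched below in Python. The 80% ratio and the 10-frame interval come from the text; the translation, rotation, and reprojection-error thresholds did not survive in this text, so the default values below are hypothetical placeholders, as are all names.

```python
import math
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class KeyframeThresholds:
    max_translation: float = 0.5               # metres, hypothetical
    max_rotation: float = math.radians(5.0)    # radians, hypothetical
    max_mean_error: float = 1.0                # pixels, hypothetical
    min_tracked_ratio: float = 0.8             # from the text
    max_frame_gap: int = 10                    # from the text

def translation_change(t_cur: Sequence[float], t_kf: Sequence[float]) -> float:
    """Euclidean distance between the two translation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_cur, t_kf)))

def rotation_change(r_cur: Sequence[Sequence[float]],
                    r_kf: Sequence[Sequence[float]]) -> float:
    """Angle of the relative rotation, from trace(R_kf^T @ R_cur)."""
    trace = sum(r_kf[i][j] * r_cur[i][j] for i in range(3) for j in range(3))
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))

def is_keyframe(delta_t: float, delta_theta: float, mean_error: float,
                n_tracked: int, n_visible_prev: int, frame_gap: int,
                th: KeyframeThresholds = KeyframeThresholds()) -> bool:
    """Return True if any one of the four conditions holds."""
    return (
        delta_t > th.max_translation
        or delta_theta > th.max_rotation
        or mean_error > th.max_mean_error
        or n_tracked < th.min_tracked_ratio * n_visible_prev
        or frame_gap > th.max_frame_gap
    )

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
yaw = [[math.cos(0.1), -math.sin(0.1), 0.0],
       [math.sin(0.1), math.cos(0.1), 0.0],
       [0.0, 0.0, 1.0]]
angle = rotation_change(yaw, identity)
```

Clamping the cosine before calling `acos` guards against values that fall slightly outside [-1, 1] because of rounding.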

[0083] Intelligent keyframe decision-making logic and specific thresholds balance computing resources and information freshness, ensuring that time-consuming global optimizations are triggered only when necessary.

[0084] Furthermore, the keyframe processing branch includes:

[0085] The camera pose of the current frame is added as a new state variable node to the sliding window factor graph;

[0086] Add the observation factors corresponding to the current frame to the factor graph;

[0087] Trigger incremental optimization of the sliding window factor graph;

[0088] After optimization, update the values of all state variables within the sliding window;

[0089] Marginalize the oldest keyframe and its associated state variables in the sliding window to maintain the window size.
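The text does not state how the marginalization is carried out. A standard choice in sliding-window estimators is the Schur complement on the information (inverse covariance) form, which removes a variable while keeping its effect on the remaining ones. The Python sketch below shows this for the simplest case, where the oldest state is a single scalar stored first; real systems marginalize multi-dimensional pose blocks, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

def marginalize_first(info: Sequence[Sequence[float]],
                      eta: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
    """Schur-complement the first (oldest) scalar variable out of info @ x = eta."""
    pivot = info[0][0]
    if pivot <= 0.0:
        raise ValueError("oldest variable is unconstrained")
    n = len(eta)
    reduced_info = [
        [info[i][j] - info[i][0] * info[0][j] / pivot for j in range(1, n)]
        for i in range(1, n)
    ]
    reduced_eta = [eta[i] - info[i][0] * eta[0] / pivot for i in range(1, n)]
    return reduced_info, reduced_eta

# Two coupled scalar states; the full solution is x = (0.2, 0.6).
info = [[2.0, 1.0], [1.0, 3.0]]
eta = [1.0, 2.0]
reduced_info, reduced_eta = marginalize_first(info, eta)
kept_estimate = reduced_eta[0] / reduced_info[0][0]
```

The estimate of the remaining variable is unchanged by the reduction (0.6 in both cases), which is what lets the window shrink without discarding the information the old keyframe contributed.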

[0090] The system employs a highly efficient front-end tracking process, including IMU pre-integration motion prediction, projection tracking based on optimization results, and quantitative evaluation of tracking quality, ensuring that the system can output displacement estimates stably and at high frequency for non-keyframes.

[0091] Furthermore, the non-keyframe fast estimation branch includes:

[0092] The global optimization of the sliding window factor graph is not triggered;

[0093] Based on the initial pose $(R, t)$ of the current frame, the target world coordinates $P_w$ optimized at the previous keyframe, and the scale factor $s$ optimized at the previous keyframe, calculate the instantaneous displacement estimate $\Delta P$ of the target:

[0094] ,

[0095] where $P_c^{0}$ is the coordinates of the target in the camera coordinate system at the preset reference time.

[0096] Specifying the insertion, optimization, and sliding window management of keyframes ensures the accuracy and consistency of the system in long-term operation; specifying the fast estimation formula for non-keyframes ensures the continuity and high frequency of displacement output.

[0097] Furthermore, the incremental optimization process involves solving for the maximum a posteriori probability estimate on a factor graph. Its objective function is to minimize the weighted sum of squares of all factor residuals, and a robust kernel function is used to handle outliers. Specifically, it is expressed as follows:

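[0098] $X^{*} = \arg\min_{X} \sum_{f \in F} \rho\!\left( r_f(X)^{\top} \Sigma_f^{-1}\, r_f(X) \right)$,

[0099] where $X$ is the set of state variables, $F$ is the factor set, $r_f$ is the residual vector of factor $f$, $\Sigma_f$ is the covariance matrix of the corresponding observation, and $\rho(\cdot)$ is the robust kernel function used for gross error removal.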
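As an illustration of how such an objective is evaluated, the Python sketch below sums a robust loss over covariance-weighted residuals. It uses the Huber kernel, which the embodiment in [0142] names as an example, and it assumes diagonal covariances to keep the arithmetic short; the threshold value and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One factor: (residual vector, per-component variances of a diagonal covariance).
Factor = Tuple[Sequence[float], Sequence[float]]

def squared_mahalanobis(residual: Sequence[float], variances: Sequence[float]) -> float:
    """r^T Sigma^-1 r for a diagonal Sigma."""
    return sum(r * r / v for r, v in zip(residual, variances))

def huber(sq_dist: float, k: float = 1.345) -> float:
    """Huber loss on the whitened norm: quadratic near zero, linear beyond k."""
    dist = math.sqrt(sq_dist)
    if dist <= k:
        return 0.5 * sq_dist
    return k * (dist - 0.5 * k)

def robust_cost(factors: Sequence[Factor], k: float = 1.345) -> float:
    """Total objective value for the current state estimate."""
    return sum(huber(squared_mahalanobis(r, var), k) for r, var in factors)

factors = [([0.5], [1.0]), ([3.0], [1.0])]
cost = robust_cost(factors, k=1.0)
```

The second factor contributes 2.5 instead of the 4.5 a pure quadratic would give, which is how a gross error is kept from dominating the solution.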

[0100] Furthermore, incremental smoothing and mapping algorithms are used to solve the maximum a posteriori probability estimate; gross error removal is performed during the optimization process, and the gross error removal method includes at least one of dynamic covariance scaling, switchable priors, and outlier marginalization based on chi-square test.

[0101] A well-defined objective function, an efficient solution algorithm, and a robust gross error elimination mechanism work together to ensure the high accuracy, high reliability, and computational feasibility of the global optimization results.

[0102] Furthermore, after the optimization of the keyframe processing branch is completed, the following information is output:

[0103] Optimized three-dimensional coordinates of each target in the world coordinate system and its covariance;

[0104] Optimized global scale factor and its covariance;

[0105] Three-dimensional displacement vector of each target relative to a preset reference time and its covariance;

[0106] Displacement rate calculated based on multi-period displacement data.
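The displacement and rate outputs can be sketched in Python as follows. The text does not specify how the rate is derived from multi-period data, so the least-squares slope used here is an assumption, and the function names and sample values are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

def displacement(p_now: Sequence[float], p_ref: Sequence[float]) -> Vec3:
    """Displacement of a target relative to its position at the reference time."""
    dx, dy, dz = (a - b for a, b in zip(p_now, p_ref))
    return (dx, dy, dz)

def displacement_rate(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of one displacement component against time."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two monitoring periods")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0.0:
        raise ValueError("all observations share the same timestamp")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx

# Vertical displacement in millimetres over four daily monitoring periods.
days = [0.0, 1.0, 2.0, 3.0]
settlement_mm = [0.0, 0.5, 1.0, 1.5]
rate_mm_per_day = displacement_rate(days, settlement_mm)
```

A regression slope over several periods is less sensitive to single-period noise than a difference between the last two measurements.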

[0107] The optimized output provides complete information, including displacement and velocity with uncertainties, to meet the engineering monitoring requirements for quantitative and assessable results.

[0108] A system for implementing the above-described method of automatic structural displacement monitoring through collaboration between UAV and machine vision is also provided, comprising a perception execution unit, an edge processing unit, and a central calculation unit;

[0109] The perception execution unit includes:

[0110] The mobile platform for unmanned aerial vehicles integrates a visible light camera, a laser rangefinder, an attitude measurement unit, and a flight controller.

[0111] Ground-based stereo vision base stations are fixedly deployed outside the monitoring area;

[0112] The edge processing unit is mounted on a drone mobile platform and includes:

[0113] The target localization module is used to perform target identification and sub-pixel localization on the images acquired by the visible light camera.

[0114] The scale constraint module is used to fuse the measurement values of the laser rangefinder to generate visual scale correction constraints;

[0115] The tracking decision module is used to perform motion tracking and output keyframe judgments based on the tracking results and preset conditions;

[0116] The central calculation unit is communicatively connected to the edge processing unit and the ground-based stereo vision base station, and includes:

[0117] The optimization solution module is used, when a keyframe judgment is received, to fuse multi-source observation data and solve millimeter-level displacements by incrementally optimizing a factor graph model that includes a global scale factor.

[0118] The fast estimation module is used to quickly output displacement based on the tracking results when no keyframe judgment is received.

[0119] A computer-readable storage medium is also provided, on which a computer program is stored. When the computer program is executed by a processor, it implements the automatic structural displacement monitoring method based on the collaboration of UAV and machine vision described above.

[0120] An electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the automatic structural displacement monitoring method based on the collaboration of UAV and machine vision described above.

[0121] The beneficial effects of this invention are:

[0122] This invention deeply integrates the all-domain maneuverability of UAVs with the absolute reference stability of ground-based stereo vision base stations and the scale constraints of laser ranging. Through a real-time tightly coupled optimization algorithm, it solves the scale drift problem of UAV close-range photogrammetry. Practical application shows a system displacement monitoring accuracy of RMS ≤ 0.3 mm, which fully meets the highest-level requirements of the "Specification for Monitoring of Foundation Pit Engineering". At the same time, a single operation can efficiently cover an area of more than 20,000 square meters, and the monitoring point density can exceed 50 points per 100 square meters, realizing all-domain, fine-grained, high-precision integrated monitoring that traditional fixed sensor networks cannot achieve.

[0123] By integrating a 905nm laser ranging module with a near-infrared supplementary lighting array, and combining fusion optimization and advanced uncertainty processing mechanisms, the system significantly reduces its dependence on ambient light. Under low-light conditions such as light fog, nighttime, and heavy rain, the system's effective data acquisition rate can still remain above 90%, while the acquisition rate of traditional pure vision solutions is usually less than 50%. Through intelligent keyframe scheduling and efficient processing, the entire process from data acquisition to displacement cloud map generation can be completed within 20 minutes, meeting the real-time requirements of engineering projects.

[0124] The entire monitoring process is highly integrated and automated. The "bow-shaped" flight path based on the BIM model can be generated with one click, and dynamic track correction can be performed in conjunction with RTK-GNSS / IMU. From takeoff, data acquisition, real-time parallel processing to output of results, the entire process requires less than 1 minute of manual intervention, and the overall labor cost can be reduced by more than 70%, achieving an automated operation mode with almost zero human intervention.

[0125] The system integrates three independent data sources: airborne subpixel target positioning, real-time scale constraints of laser ranging, and absolute reference of ground stereo vision. It performs real-time fusion and automatic gross error removal through a tightly coupled optimization framework. Even if one of the data sources fails temporarily, the system can still degrade and maintain its core monitoring functions. Its robustness is far superior to traditional solutions that rely on a single sensor.

[0126] The system not only outputs high-precision displacement field and velocity, but also incorporates an intelligent early warning and response mechanism. When the displacement velocity exceeds the preset risk threshold, it can automatically trigger the drone to switch to a 1-meter close-up shooting mode and call the lightweight crack segmentation model to identify and quantify the length, width and grade of the crack in real time. This enables one-stop intelligent inspection from macroscopic displacement safety monitoring to precise diagnosis of microscopic defects, greatly expanding the engineering application value of the system.

[0127] This invention eliminates the need to densely deploy expensive fixed equipment such as total stations or GNSS arrays around the perimeter of the monitoring area. The system baseline only requires inexpensive coded targets deployed in a U-shape and a single ground-based stereo vision base station. The cost of the entire hardware system can be kept within one-third of that of traditional monitoring solutions, and the drones and base stations can be reused, allowing flexible and rapid deployment.

Description of the Drawings

[0128] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0129] Figure 1 is an architecture diagram of the automatic monitoring system of the present invention.

[0130] Figure 2 is a flowchart of the target positioning module in the automatic monitoring system of the present invention.

[0131] Figure 3 is a flowchart of the scale constraint module in the automatic monitoring system of the present invention.

[0132] Figure 4 is a flowchart of the tracking decision module in the automatic monitoring system of the present invention.

[0133] Figure 5 is a flowchart of the central calculation unit in the automatic monitoring system of the present invention.

[0134] Figure 6 is a flowchart of the monitoring method of the automatic monitoring system of the present invention.

Detailed Implementation

[0135] The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.

[0136] Example 1

[0137] As shown in Figure 1, an automatic structural displacement monitoring system integrating UAV and machine vision includes a perception execution unit, an edge processing unit, and a central calculation unit. The perception execution unit is responsible for collecting raw data and includes a UAV mobile platform and a ground-based stereo vision base station. Optionally, the UAV mobile platform uses a DJI Matrice 350 RTK or a similar industrial UAV as the flight carrier. The platform integrates a visible light camera, a laser rangefinder, a pose measurement unit, and a flight controller. The visible light camera is a Sony ILCE-7RM4A full-frame camera with a resolution of 61 megapixels, used to acquire high-definition images of structural surfaces. The laser rangefinder is a Beowing TF03 Plus LiDAR module with a measurement distance of up to 180 meters and an accuracy of ±2 cm. It is rigidly mounted and calibrated against the camera's optical axis and triggered synchronously with the camera. The pose measurement unit uses the UAV's built-in RTK-GNSS module tightly coupled with a built-in high-precision IMU (Inertial Measurement Unit) to provide a high-frequency pose data stream. The flight controller is responsible for receiving mission commands and controlling the UAV to fly autonomously along a preset route.

[0138] The ground-based stereo vision base station is deployed at a stable location outside the monitoring area. The base station consists of two industrial cameras that have undergone high-precision relative calibration, with a baseline length of 2 to 5 meters. It is connected to the central calculation unit via gigabit Ethernet to provide absolute spatial reference observations that do not rely on UAV positioning.

[0139] The edge processing unit runs on an onboard computer carried by the drone and processes each frame of acquired data in real time and in parallel. It includes a target localization module, a scale constraint module, and a tracking decision module. As shown in Figure 2, the target localization module receives images from the visible light camera. First, it uses an improved, lightweight trained YOLOv11-nano model to quickly detect the pre-placed coded targets in the image and obtain their bounding boxes. Then, it performs Zernike moment edge detection on the image region (e.g., 128x128 pixels) within each bounding box to obtain edge point coordinates with sub-pixel precision. Finally, based on these edge points, a quadratic surface fitting algorithm computes the sub-pixel coordinates $(u, v)$ of the target center through a closed-form solution. To improve the robustness of subsequent optimization, the target localization module also estimates the 2x2 uncertainty covariance matrix $\Sigma_{uv}$ of the coordinates using either the Monte Carlo Dropout method or a dedicated heteroscedasticity estimation network head.

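[0140] As shown in Figure 3, the scale constraint module synchronously receives the measurement value $d$ from the laser rangefinder and the sub-pixel coordinates $(u, v)$ output by the target localization module. Using the camera intrinsic parameter matrix $K$, it converts $(u, v)$ to normalized plane coordinates $(x_n, y_n)$. Combined with the initial pose estimate provided by the RTK/IMU for the current frame, it constructs the laser ranging constraint equation $d = Z_c\sqrt{x_n^2 + y_n^2 + 1}$, where $Z_c$ is the target depth in the camera coordinate system. For multiple targets observed within a single frame, the module solves a least squares problem to calculate a robust initial estimate $\hat{s}$ of the global scale factor $s$. Its closed-form solution is $\hat{s} = \sum_i a_i d_i / \sum_i a_i^2$ with $a_i = Z_i^{\mathrm{photo}}\sqrt{x_{n,i}^2 + y_{n,i}^2 + 1}$, where $Z_i^{\mathrm{photo}}$ is the photogrammetric depth of target $i$. The initial value and the constraints are output as key information.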

[0141] As shown in Figure 4, the tracking decision module calculates the relative motion between adjacent frames by IMU pre-integration and combines it with the optimized pose of the previous keyframe to predict the initial pose of the current frame. Using this pose, the target world coordinates $P_w$ optimized at the previous keyframe are projected onto the current frame through the projection model and compared with the current actual observations to obtain the reprojection error $e_j$ of each target. The module counts the number of successfully tracked targets $N_{\mathrm{track}}$ and their average reprojection error $\bar{e}$. Keyframe determination is then performed on four conditions: (a) the pose change (translation change or rotation change) exceeds its threshold; (b) the average reprojection error $\bar{e}$ exceeds its threshold; (c) $N_{\mathrm{track}} < 0.8 \times N_{\mathrm{prev}}$, where $N_{\mathrm{prev}}$ is the number of visible targets in the previous keyframe; (d) the frame interval since the previous keyframe is greater than 10 frames. If any condition is met, the current frame is determined to be a keyframe, and the judgment result is output.

[0142] The central processing unit is deployed on a ground workstation or cloud server and communicates with the edge processing unit and the ground base station via a wireless network. As shown in Figure 5, it includes an optimization solution module and a fast estimation module. The optimization solution module is triggered when a keyframe judgment and its corresponding data packet are received. It adds the camera pose of the current frame as a new variable node to a maintained sliding window factor graph, together with the corresponding reprojection factor (weighted by the information matrix $\Sigma_{uv}^{-1}$), the laser ranging distance factor (residual $r_d = d - \|P_c\|_2$, where $P_c = s\,R\,P_w + \mathbf{t}$), the IMU pre-integration factor, and the ground base station observation factor. The state variables of the sliding window factor graph include the poses $(R_k, \mathbf{t}_k)$ of all keyframes within the window (usually N = 10 to 15), the world coordinates $P_w$ of all targets, and the global scale factor $s$. Subsequently, the iSAM2 incremental solver from the GTSAM library is invoked to efficiently solve the maximum a posteriori estimate, which amounts to minimizing an objective function with a robust kernel function (such as the Huber kernel). During the optimization process, Dynamic Covariance Scaling (DCS) is used to automatically remove outliers. After optimization, the optimized world coordinates $P_w$ of all targets, the scale factor $s$, and their covariances are output, and the millimeter-level three-dimensional displacement vector relative to the initial reference time and the displacement rate are then calculated. Finally, the oldest keyframe in the sliding window is marginalized to control computational complexity.

[0143] When a non-keyframe is received, the fast estimation module operates. It does not trigger global optimization. Instead, it directly uses the tracking results from the tracking decision module and the optimized state of the previous keyframe (the target world coordinates $P_w$ and the scale factor $s$) and, by direct formula evaluation, quickly calculates and outputs the instantaneous displacement estimate for the current frame, ensuring high-frequency continuity of the displacement data output.

[0144] The system workflow of this embodiment is as follows: After the mission begins, the UAV flies along a preset "bow-shaped" (back-and-forth) flight path while the perception execution unit continuously collects data. Each frame of data is processed in parallel by the three modules of the edge processing unit on the UAV, and keyframe judgment is performed. The judgment result and the processed data are sent to the central processing unit through the data transmission link. Depending on the judgment result, the central processing unit performs either high-precision optimization or fast estimation, generating and updating the displacement field in real time. When the displacement rate exceeds the warning threshold of 2 mm/d, the system automatically commands the UAV to switch to close-range mode (1 meter distance), photograph the risk area in high definition, and send the images back for automatic crack identification and classification.

[0145] Example 2

[0146] As shown in Figure 6, the automatic structural displacement monitoring method using a combination of UAVs and machine vision includes the following steps:

[0147] Step 1: Based on the BIM design model or existing point cloud model of the foundation pit or slope to be monitored, define the monitoring range in the task planning software; the software automatically generates a "bow-shaped" flight path covering the area, and ensures that the forward overlap and lateral overlap are both no less than 75% to meet the requirements of subsequent high-precision 3D reconstruction and continuous tracking; the flight path parameters (including waypoint coordinates, flight altitude, and speed) are uploaded to the UAV flight control system.
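The link between the required lateral overlap and the spacing of adjacent flight lines can be sketched as follows. This is a minimal illustration and not the planning software's actual method: it assumes a flat rectangular area, and the function name `bow_path` and the parameter `footprint_across_m` (ground width covered by one image) are hypothetical.

```python
import math

def bow_path(width_m, length_m, footprint_across_m, side_overlap=0.75):
    """Return (x, y) waypoints of a back-and-forth path over a rectangle.

    Adjacent flight lines are spaced so that neighbouring image strips
    overlap laterally by at least `side_overlap`.
    """
    if not 0.0 <= side_overlap < 1.0:
        raise ValueError("side_overlap must be in [0, 1)")
    spacing = footprint_across_m * (1.0 - side_overlap)
    n_lines = max(1, math.ceil(width_m / spacing) + 1)
    waypoints = []
    for i in range(n_lines):
        x = min(i * spacing, width_m)          # clamp the last line to the boundary
        # Alternate the flight direction on every other line.
        ys = (0.0, length_m) if i % 2 == 0 else (length_m, 0.0)
        waypoints.extend((x, y) for y in ys)
    return waypoints

# Hypothetical example: 100 m x 200 m area, 40 m image footprint across track.
path = bow_path(width_m=100.0, length_m=200.0, footprint_across_m=40.0)
```

With a 40 m footprint and 75% lateral overlap, the line spacing is 10 m. The forward overlap would be set analogously through the trigger interval and flight speed.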

[0148] On the structural surfaces of the monitoring area (such as support piles and slope grid beams), specially designed high-contrast targets in the shape of a square are manually deployed at a density of no less than 50 points per 100 square meters. The core of the target is a black square with a white background to form a strong contrast. Some targets can be expanded into coded targets, and unique ID identification can be achieved by adding radial barcodes or specific dot matrix patterns, which facilitates automated data processing.

[0149] In stable areas surrounding the monitoring area (such as stable bedrock or load-bearing columns of existing buildings), ground-based stereo vision base stations are erected. Each ground-based stereo vision base station consists of two industrial cameras that have undergone high-precision collinear calibration. The relative positions and attitudes of the two cameras are fixed and known, forming a stable spatial measurement benchmark. The ground-based stereo vision base station is connected to the data processing center via a wired network.

[0150] The intrinsic parameters of the visible light camera mounted on the UAV (focal lengths $f_x$, $f_y$ and principal point $(c_x, c_y)$) and its lens distortion coefficients are calibrated. The laser ranging module and the camera are jointly calibrated to accurately obtain the spatial transformation between them, ensuring that when the laser beam is aligned with the target center, the ranging value $d$ corresponds precisely to the target pixel coordinates $(u, v)$ in the camera image. Hardware trigger signals are set up for the RTK-GNSS, IMU, camera, and laser rangefinder to achieve microsecond-level time synchronization.

[0151] The UAV takes off and flies autonomously along the preset route, during which the following data are collected simultaneously:

[0152] UAV-side data stream: a visible light image sequence, captured by the camera at a fixed frequency (e.g., 10 Hz) as RGB images with a resolution of at least 3840×2160 pixels;

[0153] a laser ranging sequence, in which the laser rangefinder continuously measures the distance to the target at the center of its field of view and outputs the corresponding distance value $d$;

[0154] an RTK/IMU fused pose data sequence, in which the airborne integrated navigation system outputs the high-precision UAV position and attitude in the WGS84 coordinate system at a higher frequency (e.g., 200 Hz). This data is interpolated and aligned to the image frame timestamps to provide an initial, coarse camera pose estimate for each frame.
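Aligning the high-rate pose stream to an image timestamp can be sketched as a linear interpolation of position. This is only an illustration under stated assumptions: positions are given in a local metric frame, the function name `interpolate_position` is hypothetical, and attitude is left out because rotations would need spherical rather than linear interpolation.

```python
from bisect import bisect_right

def interpolate_position(pose_times, positions, image_time):
    """Linearly interpolate a 3D position at `image_time`.

    `pose_times` must be sorted and contain at least two samples;
    `positions` holds one (x, y, z) tuple per pose timestamp.
    """
    if not pose_times[0] <= image_time <= pose_times[-1]:
        raise ValueError("image timestamp outside the pose stream")
    hi = min(bisect_right(pose_times, image_time), len(pose_times) - 1)
    lo = hi - 1
    w = (image_time - pose_times[lo]) / (pose_times[hi] - pose_times[lo])
    return tuple((1.0 - w) * a + w * b
                 for a, b in zip(positions[lo], positions[hi]))

# Hypothetical 200 Hz pose samples (5 ms apart) and one image timestamp.
times = [0.000, 0.005, 0.010]
poses = [(0.00, 0.0, 10.0), (0.01, 0.0, 10.0), (0.02, 0.0, 10.0)]
camera_position = interpolate_position(times, poses, 0.0075)
```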

[0155] Ground-based data stream: ground reference image pairs. The ground-based stereo vision base station is triggered synchronously and acquires left and right image pairs containing the visible targets within the monitoring area.

[0156] Step 2: For each frame of data acquired in Step 1, immediately start three parallel processing threads on the onboard edge computing unit (such as NVIDIA Jetson AGX Orin):

[0157] Thread A, high-precision target detection and sub-pixel localization, specifically includes:

[0158] Coarse target detection: the current frame image is input into an improved lightweight YOLOv11-nano model. The model improves the recall of distant, small square-shaped or coded targets by inserting a prototype alignment module before sampling in the Neck and by using dual detection heads (one for normal-sized targets and one for small targets of less than 5 pixels). The target detection success rate can be raised to over 99%. The model outputs the bounding boxes of all targets in the image.

[0159] Subpixel center refinement: For each detected bounding box, expand it outward by 15% and then crop it into a fixed-size (e.g., 128×128 pixels) image patch. Perform the following operations on this image patch:

[0160] Zernike moment edge detection: the Zernike moments of the image patch are calculated to obtain the positions of edge points with sub-pixel precision;

[0161] Quadratic surface fitting: a local quadratic surface model is constructed from the extracted edge points, and its closed-form solution gives the sub-pixel coordinates $(u, v)$ of the target center within the image patch. The accuracy can reach 0.05 to 0.1 pixels.
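One way to obtain a center from edge points in closed form is to fit a general quadratic $A x^2 + B xy + C y^2 + D x + E y = 1$ by linear least squares and take its stationary point. The sketch below illustrates that idea only. The parameterization, the function name `fit_quadratic_center`, and the synthetic elliptical edge points are assumptions, and the exact surface model used in the method is not reproduced here.

```python
import numpy as np

def fit_quadratic_center(points):
    """Fit A x^2 + B xy + C y^2 + D x + E y = 1 and return its center."""
    pts = np.asarray(points, dtype=float)
    mean = pts.mean(axis=0)                  # center the data for conditioning
    x, y = (pts - mean).T
    design = np.column_stack([x * x, x * y, y * y, x, y])
    coeffs, *_ = np.linalg.lstsq(design, np.ones(len(pts)), rcond=None)
    A, B, C, D, E = coeffs
    det = 4.0 * A * C - B * B
    if abs(det) < 1e-15:
        raise ValueError("degenerate fit: no unique center")
    # Stationary point of the quadratic: solve 2Ax + By + D = 0, Bx + 2Cy + E = 0.
    x0 = (B * E - 2.0 * C * D) / det
    y0 = (B * D - 2.0 * A * E) / det
    return x0 + mean[0], y0 + mean[1]

# Synthetic sub-pixel edge points on a partial arc (illustrative data only).
angles = np.linspace(0.0, 1.5 * np.pi, 40)
edge_points = np.column_stack([64.3 + 20.0 * np.cos(angles),
                               63.7 + 12.0 * np.sin(angles)])
center_u, center_v = fit_quadratic_center(edge_points)
```

Centering the coordinates before the fit keeps the least squares problem well conditioned, which matters when the goal is a few hundredths of a pixel.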

[0162] Uncertainty estimation: to provide reliable observation weights for subsequent optimization, this step also estimates the two-dimensional uncertainty covariance matrix $\Sigma_{uv}$ of the coordinates $(u, v)$. This is achieved through one of two preferred options:

[0163] Option 1, Monte Carlo Dropout: During the inference phase of the localization network, the Dropout layers remain enabled, and the same target image patch is passed through T = 30 independent forward propagations, yielding 30 sets of predicted $(u, v)$ values. Their sample covariance matrix is taken as $\Sigma_{uv}$.

[0164] Option 2, heteroscedasticity estimation head: A lightweight sub-network head is connected in parallel at the end of the localization network. This head directly outputs a three-dimensional vector $(l_{11}, l_{21}, l_{22})$, from which the lower triangular factor $L = \begin{bmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{bmatrix}$ of the covariance matrix is constructed directly, with $\Sigma_{uv} = L L^{\top}$. During training, the negative log-likelihood of the reprojection error is used as the loss function for end-to-end optimization.
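Both options end in a 2×2 covariance matrix for the target center, which can be sketched without any deep learning framework. In this illustration `noisy_forward_pass` is a hypothetical stand-in for a network forward pass with Dropout enabled, and applying `exp` to the diagonal entries of the lower triangular factor is an assumption made here to keep the covariance positive definite.

```python
import math
import random

def sample_covariance(predictions):
    """Unbiased 2x2 sample covariance of a list of (u, v) predictions."""
    n = len(predictions)
    if n < 2:
        raise ValueError("need at least two predictions")
    mu = sum(p[0] for p in predictions) / n
    mv = sum(p[1] for p in predictions) / n
    suu = sum((p[0] - mu) ** 2 for p in predictions) / (n - 1)
    svv = sum((p[1] - mv) ** 2 for p in predictions) / (n - 1)
    suv = sum((p[0] - mu) * (p[1] - mv) for p in predictions) / (n - 1)
    return ((suu, suv), (suv, svv))

def covariance_from_head(l11, l21, l22):
    """Build Sigma = L L^T from a 3-vector (exp on the diagonal is assumed)."""
    a, b, c = math.exp(l11), l21, math.exp(l22)
    return ((a * a, a * b), (a * b, b * b + c * c))

def gaussian_nll(residual, cov):
    """Negative log-likelihood of a 2D residual under N(0, cov), up to a constant."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    rx, ry = residual
    maha = (syy * rx * rx - 2.0 * sxy * rx * ry + sxx * ry * ry) / det
    return 0.5 * (maha + math.log(det))

def noisy_forward_pass(rng):
    # Hypothetical stand-in for one stochastic forward pass of the network.
    return (64.30 + rng.gauss(0.0, 0.05), 63.70 + rng.gauss(0.0, 0.08))

rng = random.Random(0)
mc_cov = sample_covariance([noisy_forward_pass(rng) for _ in range(30)])
head_cov = covariance_from_head(-3.0, 0.0, -2.5)
```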

[0165] Output: The final output of thread A is the target ID, sub-pixel coordinates $(u, v)$, and covariance matrix $\Sigma_{uv}$ of each visible target.

[0166] Thread B: Generation of laser ranging scale constraint information, specifically including:

[0167] Data reception and coordinate transformation: Receive the target pixel coordinates $(u, v)$ output by thread A and the synchronized laser ranging value $d$. Using the camera intrinsic parameter matrix $K$, convert the pixel coordinates to normalized camera plane coordinates:

[0168] $x_n = (u - c_x)/f_x$, $y_n = (v - c_y)/f_y$;

[0169] Laser ranging constraint modeling: Based on the pinhole camera model, the target's coordinates in the current camera coordinate system, $P_c = (X_c, Y_c, Z_c)^{\top}$, satisfy $X_c = x_n Z_c$ and $Y_c = y_n Z_c$. Meanwhile, the laser ranging value $d$ is the Euclidean distance from the optical center of the camera to the target: $d = \sqrt{X_c^2 + Y_c^2 + Z_c^2}$. Combining the two gives the core laser ranging constraint equation: $d = Z_c\sqrt{x_n^2 + y_n^2 + 1}$.
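The pinhole relation and the range measurement together fix the metric position of a target in the camera frame, as the short sketch below shows. The function names and the intrinsic values are hypothetical, and lens distortion is assumed to have been removed beforehand.

```python
import math

def normalize_pixel(u, v, fx, fy, cx, cy):
    """Convert pixel coordinates to normalized camera plane coordinates."""
    return (u - cx) / fx, (v - cy) / fy

def camera_point_from_range(u, v, d, fx, fy, cx, cy):
    """Recover (X_c, Y_c, Z_c) from a pixel and a laser range d.

    Uses d = Z_c * sqrt(x_n^2 + y_n^2 + 1), i.e. the range is the
    Euclidean distance from the optical center to the target.
    """
    xn, yn = normalize_pixel(u, v, fx, fy, cx, cy)
    zc = d / math.sqrt(xn * xn + yn * yn + 1.0)
    return (xn * zc, yn * zc, zc)

# Hypothetical intrinsics and a 50 m range reading.
point = camera_point_from_range(1200.0, 900.0, 50.0,
                                fx=4000.0, fy=4000.0, cx=1000.0, cy=800.0)
```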

[0170] Initial estimation of scale factor: For N (N≥1) targets observed in a single frame, a robust initial estimate of the global scale factor s can be constructed using the above constraints. This is achieved by establishing and solving the following least squares problem:

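[0171] $\hat{s} = \arg\min_{s} \sum_{i=1}^{N} \left( d_i - s\, Z_i^{\mathrm{photo}} \sqrt{x_{n,i}^2 + y_{n,i}^2 + 1} \right)^2$, where $Z_i^{\mathrm{photo}}$ is the photogrammetric depth of target $i$ based on the current coarse pose estimate. This problem has a closed-form solution:

[0172] $\hat{s} = \dfrac{\sum_{i=1}^{N} a_i d_i}{\sum_{i=1}^{N} a_i^2}$, where $a_i = Z_i^{\mathrm{photo}} \sqrt{x_{n,i}^2 + y_{n,i}^2 + 1}$.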
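A one-parameter least squares fit of this kind reduces to a ratio of two sums. The sketch below assumes the per-target model "laser range = scale × photogrammetric range", where the photogrammetric range is the photogrammetric depth times $\sqrt{x_n^2 + y_n^2 + 1}$. The function name `initial_scale` and the observation tuple layout are hypothetical.

```python
import math

def initial_scale(observations):
    """Closed-form least squares estimate of a global scale factor.

    Each observation is (d, z_photo, xn, yn): laser range, photogrammetric
    depth, and normalized image coordinates of one target.
    """
    num = 0.0
    den = 0.0
    for d, z_photo, xn, yn in observations:
        a = z_photo * math.sqrt(xn * xn + yn * yn + 1.0)  # unscaled range
        num += a * d
        den += a * a
    if den == 0.0:
        raise ValueError("no usable observations")
    return num / den

# Synthetic check: ranges generated with a true scale of 1.02.
targets = [(48.0, 0.05, 0.02), (52.5, -0.10, 0.04), (60.0, 0.00, -0.08)]
obs = [(1.02 * z * math.sqrt(xn * xn + yn * yn + 1.0), z, xn, yn)
       for z, xn, yn in targets]
s_hat = initial_scale(obs)
```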

[0173] Output: The final output of thread B is the target ID, the laser ranging value $d$, the normalized coordinates $(x_n, y_n)$, the initial scale estimate $\hat{s}$, and the constraint residual information, which together provide direct scale constraints for subsequent optimization.

[0174] Thread C: Front-end motion tracking and keyframe decision-making, specifically including:

[0175] Motion prediction: IMU data is pre-integrated to calculate the relative motion increment from the previous keyframe k to the current frame t. Combining this increment with the optimized pose $(R_k, \mathbf{t}_k)$ of the previous keyframe gives a more accurate initial pose estimate $(\tilde{R}_t, \tilde{\mathbf{t}}_t)$ for the current frame.

[0176] Reprojection tracking: Using the initial pose $(\tilde{R}_t, \tilde{\mathbf{t}}_t)$, for each target $j$ that was optimized at the previous keyframe and should be visible in the current frame, its pixel coordinates $\hat{p}_j$ in the current frame are predicted from its world coordinates $P_{w,j}$ with the following projection model:

[0177] $P_{c,j} = s\,\tilde{R}_t P_{w,j} + \tilde{\mathbf{t}}_t$, $\hat{p}_j = \pi(K, P_{c,j})$,

[0178] where $s$ is the scale factor optimized at the previous keyframe and $\pi(\cdot)$ is the camera projection function, which takes lens distortion into account.

[0179] Tracking quality assessment: Calculate the reprojection error $e_j = \|\hat{p}_j - p_j\|_2$ between the predicted coordinates $\hat{p}_j$ and the coordinates $p_j$ actually observed by thread A. Count the targets whose error is below the threshold $\tau_e$ (usually 3 to 5 pixels) as the number of successfully tracked targets $N_{\mathrm{track}}$, and calculate the average reprojection error $\bar{e}$ of these successfully tracked targets.
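The tracking statistics amount to thresholding per-target pixel errors, as in the following sketch. The function name `tracking_quality`, the dictionary layout keyed by target ID, and the 4-pixel default (chosen inside the stated 3 to 5 pixel range) are assumptions for illustration.

```python
import math

def tracking_quality(predicted, observed, max_error_px=4.0):
    """Return (number of tracked targets, their mean reprojection error).

    `predicted` and `observed` map target ID -> (u, v) pixel coordinates.
    Targets missing from `observed` are ignored.
    """
    errors = [math.hypot(predicted[t][0] - observed[t][0],
                         predicted[t][1] - observed[t][1])
              for t in predicted if t in observed]
    good = [e for e in errors if e < max_error_px]
    mean_error = sum(good) / len(good) if good else float("inf")
    return len(good), mean_error

predicted = {1: (10.0, 10.0), 2: (50.0, 50.0), 3: (80.0, 80.0)}
observed = {1: (10.0, 11.0), 2: (50.0, 53.0), 3: (90.0, 80.0)}
n_track, mean_err = tracking_quality(predicted, observed)
```

Returning an infinite mean error when nothing is tracked is a design choice of this sketch, so that an error-based keyframe test fires in that case.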

[0180] Keyframe decision: The decision is based on the following four independent conditions. If any one of these conditions is met, the current frame is determined to be a keyframe:

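[0181] Condition 1 (large displacement change): the translation change or the rotation change relative to the previous keyframe exceeds its threshold;

[0182] Condition 2 (large tracking error): the average reprojection error $\bar{e}$ exceeds its threshold;

[0183] Condition 3 (few tracked targets): $N_{\mathrm{track}} < 0.8 \times N_{\mathrm{prev}}$, where $N_{\mathrm{prev}}$ is the number of visible targets in the previous keyframe;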

[0184] Condition 4 (Long time interval): The frame number interval between the current frame and the previous keyframe is greater than 10 frames.
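The four conditions combine with a logical OR, which the sketch below makes explicit. The motion and error thresholds are passed in by the caller because their values are not given in this text, and the numbers in the example call are purely hypothetical.

```python
def is_keyframe(d_trans, d_rot, mean_err, n_tracked, n_prev_visible,
                frames_since_keyframe, trans_thresh, rot_thresh, err_thresh):
    """Return True if any of the four keyframe conditions is met."""
    large_motion = d_trans > trans_thresh or d_rot > rot_thresh
    large_error = mean_err > err_thresh
    few_targets = n_tracked < 0.8 * n_prev_visible
    long_gap = frames_since_keyframe > 10
    return large_motion or large_error or few_targets or long_gap

# Hypothetical thresholds: 0.5 m, 5 degrees, 2 pixels.
decision = is_keyframe(d_trans=0.1, d_rot=1.0, mean_err=0.8,
                       n_tracked=30, n_prev_visible=40,
                       frames_since_keyframe=3,
                       trans_thresh=0.5, rot_thresh=5.0, err_thresh=2.0)
```

In the example only the third condition fires, since 30 tracked targets is below 80% of the 40 targets visible in the previous keyframe.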

[0185] Output: The final output of thread C is the keyframe judgment result (yes/no), the number of successfully tracked targets $N_{\mathrm{track}}$, the mean reprojection error $\bar{e}$, and the initial pose $(\tilde{R}_t, \tilde{\mathbf{t}}_t)$ of the current frame.

[0186] Step 3: Based on the keyframe judgment results output by thread C in step 2, the system enters different processing branches.

[0187] Non-keyframe fast displacement estimation branch: if a frame is determined to be a non-keyframe, the system executes an efficient, fast displacement output process:

[0188] Reuse optimized state: Without triggering global optimization, the system directly uses the optimized state from the previous keyframe, including the world coordinates $P_{w,j}$ of all targets and the global scale factor $s$.

[0189] Instantaneous displacement calculation: For each successfully tracked target j, use the initial pose $(\tilde{R}_t, \tilde{\mathbf{t}}_t)$ of the current frame to calculate the instantaneous displacement $\Delta P_j(t)$ at the current moment relative to a preset reference time (usually the first observation) with the following formula:

[0190] $\Delta P_j(t) = \left( s\,\tilde{R}_t P_{w,j} + \tilde{\mathbf{t}}_t \right) - P_{c,j}^{\mathrm{ref}}$, where $P_{c,j}^{\mathrm{ref}}$ is the coordinate of the target in the camera coordinate system at the reference time, which is a fixed known value.

[0191] Output: Immediately output the instantaneous displacement of all tracked targets in the current frame. This process involves minimal computation, ensuring high-frequency continuity of displacement output and meeting the requirements of high-frequency monitoring.

[0192] Keyframe tightly coupled optimization and displacement inversion branch: if the frame is determined to be a keyframe, a high-precision, globally consistent optimization process is triggered:

[0193] Sliding window factor graph update:

[0194] Add a state variable: the camera pose $(R_k, \mathbf{t}_k)$ of the current keyframe is added as a new variable node to the maintained sliding window factor graph. This window typically holds the most recent N = 10 to 20 keyframes.

[0195] Add observation factors: Add multiple observations corresponding to this keyframe to the graph as factor nodes:

[0196] Reprojection factor: constructed from the sub-pixel coordinates $(u, v)$ and covariance matrix $\Sigma_{uv}$ output by thread A; its weight is determined by the information matrix $\Sigma_{uv}^{-1}$.

[0197] Laser ranging distance factor: constructed from the data output by thread B. Its residual model is $r_d = d - \|P_c\|_2$, where $P_c = s\,R_k P_w + \mathbf{t}_k$. This factor tightly couples the laser ranging value $d$, the scale factor $s$, the camera pose $(R_k, \mathbf{t}_k)$, and the target coordinates $P_w$.

[0198] IMU pre-integration factor: Added between adjacent keyframes to constrain their relative motion.

[0199] Ground reference factor: Constructed based on target information observed by ground base stations, providing absolute spatial constraints independent of UAV pose.

[0200] Incremental optimization solution:

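[0201] Optimization objective: Solving for the maximum a posteriori estimate of the factor graph is equivalent to minimizing the following nonlinear least squares problem: $\hat{X} = \arg\min_{X} \sum_{i \in \mathcal{F}} \rho\left( \| r_i(X) \|_{\Sigma_i}^2 \right)$, with $\| r \|_{\Sigma}^2 = r^{\top} \Sigma^{-1} r$, where $X$ is the set of state variables (poses, target points, $s$), $\mathcal{F}$ is the set of factors, $r_i$ is the residual vector of factor $i$, $\Sigma_i$ is the covariance matrix of the corresponding observation, and $\rho(\cdot)$ is a robust kernel function (such as the Huber kernel) used to suppress gross errors.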
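The effect of a robust kernel on such an objective can be illustrated by evaluating the cost for a handful of residuals. This sketch is not the GTSAM implementation: it applies one common form of the Huber kernel to the squared Mahalanobis norm, and the tuning constant 1.345 is a conventional choice rather than a value from this text.

```python
import numpy as np

def huber(sq_norm, k=1.345):
    """Huber kernel applied to a squared, covariance-weighted residual norm."""
    if sq_norm <= k * k:
        return sq_norm                            # quadratic region (inliers)
    return 2.0 * k * np.sqrt(sq_norm) - k * k     # linear growth (outliers)

def robust_cost(factors, k=1.345):
    """Sum of Huber-robustified squared Mahalanobis norms.

    `factors` is a list of (residual vector, covariance matrix) pairs.
    """
    total = 0.0
    for residual, cov in factors:
        r = np.asarray(residual, dtype=float)
        sigma = np.asarray(cov, dtype=float)
        sq_norm = float(r @ np.linalg.solve(sigma, r))
        total += huber(sq_norm, k)
    return total

factors = [
    ([0.1, 0.0], [[0.01, 0.0], [0.0, 0.01]]),   # 1-sigma reprojection residual
    ([3.0], [[1.0]]),                           # 3-sigma range residual (outlier)
]
cost = robust_cost(factors)
```

Without the kernel the outlier would contribute 9.0 to the cost. With it the contribution grows only linearly in the residual norm, so a single gross error cannot dominate the solution.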

[0202] Solver and gross error removal: The iSAM2 algorithm from the GTSAM library is used for incremental solving, which is highly efficient. During the optimization process, mechanisms such as Dynamic Covariance Scaling (DCS) or Switchable Prior are integrated and, combined with a chi-square test, automatically identify outlier observations (gross errors) and either reduce their weight or marginalize them, ensuring strong robustness.

[0203] State update and sliding window management: After optimization, update the optimal estimates and covariances of all state variables within the sliding window, including the poses $(R_k, \mathbf{t}_k)$ of each keyframe, the world coordinates $P_w$ of all targets, and the global scale factor $s$.

[0204] Marginalization: To control computational complexity, the oldest keyframe and its associated state variables in the sliding window are marginalized from the current optimization problem, while the information they carry is transformed into prior constraints for subsequent frames, thereby keeping the window size fixed.
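For a linearized Gaussian system, turning the removed variables into a prior on the remaining ones is commonly done with a Schur complement on the information matrix. The sketch below shows that generic operation on a toy system. It illustrates the principle only and is not how iSAM2 or GTSAM handle marginalization internally, and the function name `marginalize` is hypothetical.

```python
import numpy as np

def marginalize(H, b, n_old):
    """Marginalize the first `n_old` variables of the system H x = b.

    Returns the information matrix and vector of the prior that the
    removed variables leave on the remaining ones (Schur complement).
    """
    H = np.asarray(H, dtype=float)
    b = np.asarray(b, dtype=float)
    H_mm, H_mr = H[:n_old, :n_old], H[:n_old, n_old:]
    H_rm, H_rr = H[n_old:, :n_old], H[n_old:, n_old:]
    H_mm_inv = np.linalg.inv(H_mm)
    H_prior = H_rr - H_rm @ H_mm_inv @ H_mr
    b_prior = b[n_old:] - H_rm @ H_mm_inv @ b[:n_old]
    return H_prior, b_prior

# Toy system with one old variable and one remaining variable.
H = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
H_prior, b_prior = marginalize(H, b, n_old=1)
```

Solving the reduced system gives the same estimate for the remaining variable as solving the full system, which is why the window can shrink without discarding the information carried by the oldest keyframe.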

[0205] High-precision displacement calculation and output:

[0206] Based on the latest optimized target world coordinates $P_w$, calculate the three-dimensional displacement vector $\Delta P = P_w - P_w^{0}$ relative to the coordinates $P_w^{0}$ at the initial reference time. At the same time, calculate the covariance matrix of the displacement vector and the displacement rate, the latter obtained by time-difference analysis of multiple displacement values.
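The displacement and its rate can be sketched as a coordinate difference followed by a slope estimate over time. A least squares slope over several epochs is used here as one possible form of time-difference analysis, which is an assumption, and the function names and sample values are hypothetical.

```python
def displacement_mm(p_now_m, p_ref_m):
    """3D displacement in millimeters between two positions given in meters."""
    return tuple(1000.0 * (a - b) for a, b in zip(p_now_m, p_ref_m))

def displacement_rate(times_d, values_mm):
    """Least squares slope (mm/day) of displacement values over time in days."""
    n = len(times_d)
    if n < 2 or n != len(values_mm):
        raise ValueError("need at least two matching samples")
    t_mean = sum(times_d) / n
    v_mean = sum(values_mm) / n
    den = sum((t - t_mean) ** 2 for t in times_d)
    if den == 0.0:
        raise ValueError("all timestamps are identical")
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times_d, values_mm))
    return num / den

delta = displacement_mm((10.002, 5.000, 3.000), (10.000, 5.000, 3.000))
rate = displacement_rate([0.0, 1.0, 2.0, 3.0], [0.0, 0.5, 1.0, 1.5])
```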

[0207] Output a formatted displacement contour map, including the displacement information of each monitoring point.

[0208] Step 4: Intelligent risk early warning and fine-grained inspection triggering. The system monitors the displacement rate output from the above steps in real time. Once the system detects that the displacement rate at any monitoring point continuously exceeds the preset risk threshold of 2 mm/d, it immediately triggers the intelligent response process:

[0209] Task switching: Send a command to the UAV flight control system to immediately suspend the current wide-area monitoring route.

[0210] Close-up shooting: The drone automatically flies to directly above the area of ​​abnormal displacement, descends to a close-up height of about 1 meter, and starts the camera to acquire high-definition images with sub-millimeter resolution.

[0211] Intelligent Crack Recognition: High-resolution images captured at close range are input into a lightweight DeepLabV3+ semantic segmentation neural network model deployed at the edge or in the cloud. This model automatically identifies crack pixels in the image and accurately calculates the crack's length, maximum width, and average width, automatically classifying the crack's hazard according to relevant standards.

[0212] Early warning push: Displacement anomaly alarms, along with crack quantification reports, are pushed to the BIM-based engineering safety early warning platform to achieve one-stop automated inspection from anomaly detection to cause diagnosis.
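The trigger condition for this response process, a rate that continuously exceeds 2 mm/d, can be sketched as a run-length check. The number of consecutive exceedances required (`min_consecutive`) is not specified in this text, so the default of 3 is a hypothetical choice.

```python
def should_trigger(rates_mm_per_day, threshold=2.0, min_consecutive=3):
    """Return True once the rate exceeds the threshold on enough consecutive epochs."""
    run = 0
    for rate in rates_mm_per_day:
        run = run + 1 if abs(rate) > threshold else 0
        if run >= min_consecutive:
            return True
    return False

alarm = should_trigger([1.0, 2.5, 2.6, 1.9, 2.1, 2.2, 2.3])
```

Requiring a run of exceedances rather than a single one keeps an isolated noisy estimate from interrupting the wide-area route.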

[0213] The system continuously executes the above steps in a loop until the flight and data collection of the entire preset route are completed. Finally, the system integrates the displacement field data of all time slices to generate a comprehensive result that includes a full-domain displacement evolution animation, a maximum displacement distribution map, risk area markings, and a detailed crack detection report, providing comprehensive and accurate data support for engineering safety assessment and decision-making.

[0214] Example 3

[0215] A computer-readable storage medium, such as a USB flash drive, portable hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, stores a computer program (i.e., software instructions). When this storage medium is connected to a computing device (such as the airborne computer or ground workstation in Embodiment 1), and the computer program therein is loaded and executed by one or more processors, the computing device can fully implement all the steps and functions of the automatic structural displacement monitoring method based on UAV and machine vision collaboration as described in Embodiment 2. Specifically, the program instructions control the device to perform a series of operations as defined in Embodiment 2, including data acquisition instruction distribution, parallel algorithm scheduling, optimization solution, and result output.

[0216] Example 4

[0217] An electronic device includes at least one processor (CPU / GPU), a memory (RAM / ROM), a communication interface (for connecting a drone, a ground base station, and a network), and necessary input / output devices. The memory stores a computer program (i.e., the program in Embodiment 3) and an operating system that can run on the processor. When the device starts and runs the program, the processor receives data streams from the drone flight control system, sensors, and the ground base station through the communication interface; allocates computing resources in the memory; and organizes and schedules the process described in Embodiment 2.

[0218] The specific execution includes core computational tasks such as image decoding and target detection algorithms, point cloud and image coordinate transformation calculations, construction and optimization of large-scale sparse matrices (e.g., executing GTSAM library functions), and numerical calculation of displacement and velocity.

[0219] The final displacement cloud map, risk warning information, crack detection report, and other results can be output, displayed, or uploaded to the cloud platform through the communication interface or graphical interface.

[0220] The electronic device can be a specially designed embedded edge computing box (integrated into the drone), or a ground server or a cloud virtual server. In essence, it provides a general or special computing hardware platform required to execute the monitoring method of Embodiment 2, ensuring the physical feasibility and efficiency of the method.

[0221] Based on the above-described preferred embodiments of the present invention, and through the foregoing description, those skilled in the art can make various changes and modifications without departing from the inventive concept. The technical scope of this invention is not limited to the contents of the specification, but must be determined according to the scope of the claims.

Claims

1. A method for automatic structural displacement monitoring using a combination of unmanned aerial vehicles (UAVs) and machine vision, characterized in that it includes the following steps: S1. A UAV equipped with a visible light camera and a laser ranging module collects a sequence of visible light images, a sequence of laser ranging values, and a sequence of pose data of the monitored area; at the same time, a ground-based stereo vision base station collects reference images containing the monitored targets. S2. For each frame of data collected, target detection and localization, generation of scale constraint information based on laser ranging, and keyframe judgment based on motion tracking are performed in parallel. S3. Tightly coupled optimization and displacement inversion: S31. If S2 determines that the frame is a non-keyframe, a fast displacement estimate is output based on historical optimization results and current tracking information. S32. If S2 determines that the frame is a keyframe, global optimization based on the sliding window factor graph is triggered. In the global optimization based on the sliding window factor graph, the state variables include at least the camera poses of the keyframes, the world coordinates of the monitored targets, and a global scale factor $s$. The observation factors include at least the reprojection factor constructed from the target localization results, the laser ranging distance factor constructed from laser ranging information, the inertial navigation factor constructed from pose data, and the reference constraint factor constructed from ground reference images. The residual of the laser ranging distance factor is constructed according to the following model: $r_d = d - \|P_c\|_2$, where $d$ is the laser ranging value and $P_c$ is the coordinates of the monitored target in the current camera coordinate system. Through the global optimization based on the sliding window factor graph, the millimeter-level three-dimensional displacement and displacement rate of the monitored target are output. The keyframe judgment based on motion tracking in S2 includes a front-end tracking step, specifically: the relative motion increment from the previous keyframe to the current frame is calculated based on IMU pre-integration; based on the relative motion increment and the optimized pose of the previous keyframe, the initial pose estimate of the current frame is obtained; using the initial pose estimate of the current frame, the target's three-dimensional coordinates $P_w$ optimized at the previous keyframe are projected onto the current frame image plane to obtain the predicted pixel coordinates $\hat{p}_j$; the reprojection error $e_j$ between the predicted coordinates $\hat{p}_j$ and the actually observed coordinates $p_j$ of the current frame is calculated; and the number of targets whose reprojection error is less than the threshold $\tau_e$ is counted as the number of successful trackings $N_{\mathrm{track}}$, and their average reprojection error $\bar{e}$ is calculated.

2. The automatic structural displacement monitoring method using a combination of UAV and machine vision as described in claim 1, characterized in that: the target detection and localization in S2 specifically includes: detecting the square-shaped targets in the image with an improved YOLO neural network model; applying Zernike moment edge detection to the image of each detected target region; and, based on the detected edge points, calculating the sub-pixel coordinates of the target center by quadratic surface fitting. The closed-form solution for the center coordinates of the quadratic surface fitting is: $x_0 = \dfrac{BE - 2CD}{4AC - B^2}$, $y_0 = \dfrac{BD - 2AE}{4AC - B^2}$, where $(x_0, y_0)$ are the fitted center coordinates and $A$, $B$, $C$, $D$, $E$ are the coefficients of the quadratic surface $A x^2 + B xy + C y^2 + D x + E y + F = 0$.

3. The automatic structural displacement monitoring method using a combination of UAV and machine vision as described in claim 1, characterized in that: simultaneously with localization, the uncertainty covariance matrix $\Sigma_{uv}$ of the sub-pixel coordinates $(u, v)$ is estimated, and the covariance matrix $\Sigma_{uv}$ is used to adjust the weight of the reprojection factor; the covariance matrix $\Sigma_{uv}$ is estimated by the following Monte Carlo Dropout method: during the network inference phase, the Dropout layers are kept enabled; T forward propagations are performed on the same target image patch to obtain T sets of sub-pixel coordinate predictions; and the sample covariance matrix of these T sets of predictions is calculated and taken as the final $\Sigma_{uv}$.

4. The automatic structural displacement monitoring method using a combination of UAV and machine vision as described in claim 3, characterized in that: the covariance matrix $\Sigma_{uv}$ is predicted directly by a heteroscedasticity estimation network head, which outputs a three-dimensional vector $(l_{11}, l_{21}, l_{22})$; a lower triangular matrix is constructed from it by the following formula and used directly to form the covariance matrix $\Sigma_{uv}$: $L = \begin{bmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{bmatrix}$, $\Sigma_{uv} = L L^{\top}$. Furthermore, during training, the heteroscedasticity estimation network head uses the negative log-likelihood of the reprojection error as the loss function.

5. The automatic structural displacement monitoring method using a combination of UAV and machine vision as described in claim 1, characterized in that: the scale constraint information generated in S2 based on laser ranging is based on the following camera imaging model: let the coordinates of the target in the camera coordinate system be $P_c = (X_c, Y_c, Z_c)^{\top}$; its projection onto the sub-pixel coordinates $(u, v)$ follows the model $\lambda\,[u, v, 1]^{\top} = K P_c$, where $\lambda$ is the scale ambiguity factor and $K$ is the camera intrinsic parameter matrix $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$, with $f_x$, $f_y$ the focal lengths and $(c_x, c_y)$ the principal point coordinates; the laser ranging module measures the Euclidean distance $d$ from the camera's optical center to the target, that is, $d = \|P_c\|_2 = \sqrt{X_c^2 + Y_c^2 + Z_c^2}$; the normalized camera plane coordinates $(x_n, y_n)$ are calculated from the sub-pixel coordinates $(u, v)$ as $x_n = (u - c_x)/f_x$, $y_n = (v - c_y)/f_y$; and, combining the camera imaging model, the target depth $Z_c$, and the normalized coordinates $(x_n, y_n)$, the laser ranging constraint is obtained as $d = Z_c\sqrt{x_n^2 + y_n^2 + 1}$.

6. The automatic structural displacement monitoring method using a combination of UAV and machine vision as described in claim 5, characterized in that: based on the laser ranging constraint, for the observation of a single target in a single frame, an instantaneous observation constraint on the global scale factor $s$ is generated: $d = s\, Z^{\mathrm{photo}} \sqrt{x_n^2 + y_n^2 + 1}$, where $Z^{\mathrm{photo}}$ is the photogrammetric depth estimated from the current image and pose; based on the laser ranging constraint, for observations of multiple targets in a single frame, an initial estimate $\hat{s}$ of the global scale factor $s$ is generated by solving the following least squares problem: $\hat{s} = \arg\min_{s} \sum_{i=1}^{N} \left( d_i - s\, Z_i^{\mathrm{photo}} \sqrt{x_{n,i}^2 + y_{n,i}^2 + 1} \right)^2$. The closed-form solution to the least squares problem is: $\hat{s} = \dfrac{\sum_{i=1}^{N} a_i d_i}{\sum_{i=1}^{N} a_i^2}$, $a_i = Z_i^{\mathrm{photo}} \sqrt{x_{n,i}^2 + y_{n,i}^2 + 1}$.

7. The method for automatic structural displacement monitoring using a combination of UAV and machine vision according to claim 1, characterized in that: the coordinates $P_c$ of the monitored target in the current camera coordinate system in S32 are calculated from the camera pose and target world coordinates in the state variables as follows: $P_c = s\,R\,P_w + \mathbf{t}$, where $R$ and $\mathbf{t}$ are the rotation matrix and translation vector of the current keyframe camera pose in the state variables, and $P_w$ is the coordinates of the target in the world coordinate system; the coordinates $P_w$ of the target in the world coordinate system, as state variables to be optimized, are solved together with the global scale factor $s$ and the camera pose.

8. The automatic structural displacement monitoring method using a combination of UAV and machine vision as described in claim 1, characterized in that: the projection model that projects the target's three-dimensional coordinates onto the current frame image plane is: $P_{c,j} = s\,\tilde{R}_t P_{w,j} + \tilde{\mathbf{t}}_t$, $\hat{p}_j = \pi(K, P_{c,j})$, where $P_{c,j}$ is the coordinates of the target in the current frame's camera coordinate system, $s$ is the global scale factor optimized at the previous keyframe, $\tilde{R}_t$ and $\tilde{\mathbf{t}}_t$ are the rotation matrix and translation vector of the initial pose of the current frame, $\pi(\cdot)$ is the camera projection function, and $K$ is the camera intrinsic parameter matrix.

9. The automatic structural displacement monitoring method using a combination of UAV and machine vision according to claim 1, characterized in that: the number of successful trackings $N_{\mathrm{track}}$ is calculated as $N_{\mathrm{track}} = \sum_{j} \mathbb{1}(e_j < \tau_e)$, and the average reprojection error $\bar{e}$ is calculated as $\bar{e} = \dfrac{1}{N_{\mathrm{track}}} \sum_{j:\, e_j < \tau_e} e_j$.

10. The method for automatic structural displacement monitoring using a combination of UAV and machine vision according to claim 1, characterized in that: in S2, a keyframe is determined when at least one of the following conditions is met: a. the relative translation change $\Delta_{\mathrm{trans}}$ or the rotation change $\Delta_{\mathrm{rot}}$ between the current frame and the previous keyframe exceeds its threshold; b. the average reprojection error $\bar{e}$ exceeds its threshold; c. the number of successful trackings $N_{\mathrm{track}}$ is less than 80% of the number of visible targets in the previous keyframe; d. the frame number interval between the current frame and the previous keyframe is greater than 10 frames; where $\Delta_{\mathrm{trans}} = \|\tilde{\mathbf{t}}_t - \mathbf{t}_k\|_2$ and $\Delta_{\mathrm{rot}} = \arccos\left( \dfrac{\mathrm{tr}(R_k^{\top} \tilde{R}_t) - 1}{2} \right)$, in which $\tilde{\mathbf{t}}_t$ is the initial translation vector of the current frame, $\mathbf{t}_k$ is the translation vector optimized at the previous keyframe, $\tilde{R}_t$ is the initial rotation matrix of the current frame, and $R_k$ is the rotation matrix optimized at the previous keyframe. Then, based on the judgment result, the following branches are executed: if the current frame is determined to be a keyframe, the keyframe processing branch is executed; if the current frame is determined to be a non-keyframe, the non-keyframe fast estimation branch is executed.

11. The automatic structural displacement monitoring method using a combination of UAV and machine vision according to claim 10, characterized in that: the keyframe processing branch includes: adding the camera pose of the current frame to the sliding window factor graph as a new state variable node; adding the observation factors corresponding to the current frame to the factor graph; triggering incremental optimization of the sliding window factor graph; after optimization, updating the values of all state variables within the sliding window; and marginalizing the oldest keyframe and its associated state variables in the sliding window to maintain the window size.
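The claim does not say how the oldest keyframe is marginalized. One common approach, shown here only as an illustration, is a Schur complement on the linearized normal equations `H x = b`: the oldest state is eliminated and its information is folded into the remaining states. The sketch uses scalar states to stay short; the function names and the example numbers are hypothetical, and it covers only the marginalization step, not the incremental optimization itself.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(H: Matrix, b: List[float]) -> List[float]:
    """Solve H x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(H)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def marginalize_oldest(H: Matrix, b: List[float]) -> Tuple[Matrix, List[float]]:
    """Eliminate state 0 (the oldest) with a Schur complement.

    H_new = H_rr - H_r0 * H_00^-1 * H_0r,  b_new = b_r - H_r0 * H_00^-1 * b_0
    """
    n = len(b)
    h00 = H[0][0]
    H_new = [[H[i][j] - H[i][0] * H[0][j] / h00 for j in range(1, n)]
             for i in range(1, n)]
    b_new = [b[i] - H[i][0] * b[0] / h00 for i in range(1, n)]
    return H_new, b_new


# Hypothetical 3-state window: a prior on x0 and two relative factors.
H_full = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
b_full = [1.0, 0.0, 1.0]

x_full = solve(H_full, b_full)
H_win, b_win = marginalize_oldest(H_full, b_full)
x_win = solve(H_win, b_win)
```

The reduced system gives the same estimates for the remaining states as the full system, which is the property that lets the window size stay fixed without discarding the information carried by the removed keyframe.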

12. The automatic structural displacement monitoring method using a combination of UAV and machine vision according to claim 10, characterized in that: the non-keyframe fast estimation branch includes: not triggering the global optimization of the sliding window factor graph; and calculating the instantaneous displacement estimate of the target from the initial pose of the current frame, the target world coordinates optimized at the previous keyframe, the scale factor optimized at the previous keyframe, and the coordinates of the target in the camera coordinate system at the preset reference time.
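A sketch of the fast estimate follows, under loudly stated assumptions: the target is mapped into the current camera frame as `s * (R @ p_world) + t`, and the displacement is the difference between that result and the reference camera-frame coordinates. Neither the placement of the scale factor nor the sign convention is given in the text, and `instantaneous_displacement` is a hypothetical name.

```python
from typing import List, Sequence


def instantaneous_displacement(p_world: Sequence[float],
                               R: Sequence[Sequence[float]],
                               t: Sequence[float],
                               s: float,
                               p_ref_cam: Sequence[float]) -> List[float]:
    """Displacement of a target relative to the reference time, without optimization.

    Assumed model (not confirmed by the claim):
        p_cam = s * (R @ p_world) + t
        displacement = p_cam - p_ref_cam
    """
    p_cam = [s * sum(R[i][j] * p_world[j] for j in range(3)) + t[i]
             for i in range(3)]
    return [p_cam[i] - p_ref_cam[i] for i in range(3)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical numbers: the target sits 2 mm further along the z axis than at the reference time.
d = instantaneous_displacement([1.0, 2.0, 3.0], IDENTITY, [0.0, 0.0, 0.0],
                               1.0, [1.0, 2.0, 2.998])
```

The computation is a handful of multiplications per target, which is why this branch can run on every frame while the factor graph is optimized only at keyframes.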

13. The automatic structural displacement monitoring method using a combination of UAV and machine vision according to claim 11, characterized in that: the incremental optimization solves for the maximum a posteriori estimate on the factor graph. Its objective function minimizes the weighted sum of squares of all factor residuals, with a robust kernel function applied to handle outliers. The objective is expressed in terms of the set of state variables, the set of factors, the residual vector of each factor, the covariance matrix of the corresponding observation, and the robust kernel function used for gross error removal.
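One standard form that matches this description is given below. The symbols are assumptions, since the formula itself is not reproduced in the text: $\mathcal{X}$ is the set of state variables, $\mathcal{F}$ the set of factors, $\mathbf{r}_f$ the residual vector of factor $f$, $\boldsymbol{\Sigma}_f$ the covariance matrix of the corresponding observation, and $\rho$ the robust kernel.

```latex
\hat{\mathcal{X}}
  = \arg\min_{\mathcal{X}}
    \sum_{f \in \mathcal{F}}
      \rho\!\left(
        \mathbf{r}_f(\mathcal{X})^{\top}\,
        \boldsymbol{\Sigma}_f^{-1}\,
        \mathbf{r}_f(\mathcal{X})
      \right)
```

With $\rho$ set to the identity this reduces to ordinary weighted least squares; a kernel that grows more slowly than its argument limits the influence of factors with large residuals.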

14. The automatic structural displacement monitoring method using a combination of UAV and machine vision according to claim 13, characterized in that: the maximum a posteriori estimate is solved using an incremental smoothing and mapping algorithm; gross error removal is performed during the optimization process, and the gross error removal method includes at least one of dynamic covariance scaling, switchable prior, and outlier marginalization based on a chi-square test.
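Two of the listed techniques can be sketched in a few lines. The dynamic covariance scaling factor below uses the commonly published form `s = min(1, 2*phi / (phi + chi2))`; the tuning parameter `phi`, the 95% confidence level, the restriction to diagonal covariances, and all function names are assumptions made for illustration, not values from the claim.

```python
from typing import Sequence

# 95% critical values of the chi-square distribution for 1 to 3 degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def mahalanobis_sq(residual: Sequence[float], variances: Sequence[float]) -> float:
    """Squared Mahalanobis distance for a diagonal covariance matrix."""
    return sum(r * r / v for r, v in zip(residual, variances))


def is_outlier(residual: Sequence[float], variances: Sequence[float]) -> bool:
    """Chi-square test: flag the factor if its squared distance exceeds the
    95% critical value for the residual's dimension."""
    return mahalanobis_sq(residual, variances) > CHI2_95[len(residual)]


def dcs_scale(chi2: float, phi: float = 1.0) -> float:
    """Dynamic covariance scaling factor s = min(1, 2*phi / (phi + chi2)).

    The factor's information matrix is multiplied by s**2, so factors with a
    small chi2 keep full weight and factors with a large chi2 are down-weighted.
    """
    return min(1.0, 2.0 * phi / (phi + chi2))
```

For example, a 2D reprojection residual of (3, 0) pixels with unit variances has a squared distance of 9, which exceeds 5.991 and is flagged, while its scaling factor with `phi = 1` is 0.2.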

15. The automatic structural displacement monitoring method using a combination of UAV and machine vision according to claim 13, characterized in that: after the optimization of the keyframe processing branch is completed, the following information is output: the optimized three-dimensional coordinates of each target in the world coordinate system and their covariance; the optimized global scale factor and its covariance; the three-dimensional displacement vector of each target relative to a preset reference time and its covariance; and the displacement rate calculated from multi-period displacement data.
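The claim does not define how the displacement rate is derived from multi-period data. One plausible reading, used here purely as an assumption, is the slope of a least-squares line fitted to each displacement component over time. The function name and the sample values are hypothetical.

```python
from typing import List, Sequence


def displacement_rate(times: Sequence[float],
                      displacements: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares slope of each displacement component against time.

    times: observation epochs (for example, days since the reference time)
    displacements: one (dx, dy, dz) vector per epoch, same units throughout
    """
    n = len(times)
    if n < 2 or n != len(displacements):
        raise ValueError("need at least two epochs with one displacement each")
    t_mean = sum(times) / n
    denom = sum((t - t_mean) ** 2 for t in times)
    if denom == 0.0:
        raise ValueError("all epochs are identical")
    rates = []
    for axis in range(3):
        d_mean = sum(d[axis] for d in displacements) / n
        num = sum((t - t_mean) * (d[axis] - d_mean)
                  for t, d in zip(times, displacements))
        rates.append(num / denom)
    return rates


# Hypothetical series: four daily epochs, displacements in millimetres.
epochs = [0.0, 1.0, 2.0, 3.0]
series = [(0.0, 0.0, 0.0), (0.5, 0.0, -1.0), (1.0, 0.0, -2.0), (1.5, 0.0, -3.0)]
rates_mm_per_day = displacement_rate(epochs, series)
```

A fitted slope is less sensitive to noise in any single epoch than a difference between the last two epochs, which matters when the displacements are at the millimetre level.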

16. A system for implementing the automatic structural displacement monitoring method using a combination of UAV and machine vision according to any one of claims 1 to 15, characterized in that it includes a perception execution unit, an edge processing unit, and a central solution unit;
the perception execution unit includes: a UAV mobile platform integrating a visible light camera, a laser rangefinder, an attitude measurement unit, and a flight controller; and ground-based stereo vision base stations fixedly deployed outside the monitoring area;
the edge processing unit is mounted on the UAV mobile platform and includes: a target localization module for performing target identification and sub-pixel localization on the images acquired by the visible light camera; a scale constraint module for fusing the measurements of the laser rangefinder to generate visual scale correction constraints; and a tracking decision module for performing motion tracking and outputting a keyframe judgment based on the tracking results and preset conditions;
the central solution unit is communicatively connected to the edge processing unit and the ground-based stereo vision base stations, and includes: an optimization solution module for fusing multi-source observation data and solving millimeter-level displacements by incrementally optimizing a factor graph model that includes a global scale factor when a keyframe judgment is received; and a fast estimation module for quickly outputting the displacement based on the tracking results when no keyframe judgment is received.
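The routing performed by the central solution unit can be sketched as a small dispatcher. The message layout, the class and field names, and the two callables standing in for the optimization solution module and the fast estimation module are all hypothetical; the sketch shows only the branching on the keyframe judgment, not the solvers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Displacement = Tuple[float, float, float]


@dataclass
class FrameMessage:
    """Hypothetical per-frame message sent by the edge processing unit."""
    frame_id: int
    is_keyframe: bool                              # output of the tracking decision module
    payload: dict = field(default_factory=dict)    # observations, scale constraints, pose


Solver = Callable[[FrameMessage], List[Displacement]]


class CentralSolutionUnit:
    """Routes each frame to the optimization module or the fast estimation module."""

    def __init__(self, optimize: Solver, fast_estimate: Solver) -> None:
        self._optimize = optimize
        self._fast_estimate = fast_estimate

    def handle(self, msg: FrameMessage) -> Tuple[str, List[Displacement]]:
        if msg.is_keyframe:
            # Keyframe judgment received: run the factor graph optimization.
            return "optimized", self._optimize(msg)
        # No keyframe judgment: return the fast displacement estimate.
        return "fast", self._fast_estimate(msg)


# Stand-in solvers that return fixed values, for illustration only.
unit = CentralSolutionUnit(
    optimize=lambda msg: [(0.0, 0.0, 0.001)],
    fast_estimate=lambda msg: [(0.0, 0.0, 0.002)],
)
branch_key, result_key = unit.handle(FrameMessage(frame_id=1, is_keyframe=True))
branch_fast, result_fast = unit.handle(FrameMessage(frame_id=2, is_keyframe=False))
```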

17. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the automatic structural displacement monitoring method using a combination of UAV and machine vision according to any one of claims 1 to 15.

18. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the automatic structural displacement monitoring method using a combination of UAV and machine vision according to any one of claims 1 to 15.