A Deep Learning-Based Spatiotemporal Parameter Calibration Method for Camera-LiDAR

By establishing an online joint calibration network for spatiotemporal parameters of camera-3D LiDAR using deep learning methods, the problem of environmental changes and time delays in sensor calibration was solved, and efficient and accurate sensor data fusion was achieved.

CN116740188B · Active · Publication Date: 2026-06-30 · SOUTH CHINA UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SOUTH CHINA UNIV OF TECH
Filing Date
2023-05-16
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing camera-3D LiDAR extrinsic parameter calibration methods cannot effectively correct calibration deviations caused by environmental changes and vibrations, and fail to consider the effects of time delay and ego-motion, resulting in low calibration efficiency and insufficient accuracy.

Method used

A deep learning-based approach is adopted to establish an online joint calibration network for the spatiotemporal parameters of a camera-3D LiDAR using an attention mechanism and a hybrid pooling liquid network. This network enables simultaneous calibration of spatial geometric parameters and temporal delay parameters. The network includes modules for feature extraction, attention weighting, feature matching, and parameter regression output, which correct error drift and perform time delay compensation.

Benefits of technology

It achieves fully automated online calibration, improves calibration accuracy and robustness, enhances the learning and representation capabilities of the network model, and enables efficient sensor data fusion in dynamic environments.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116740188B_ABST
Patent Text Reader

Abstract

This invention discloses a deep learning-based method for calibrating the spatiotemporal parameters of a camera-3D LiDAR. The method employs an established deep learning model for online joint calibration of the spatiotemporal parameters of a camera-3D LiDAR to calibrate the extrinsic parameters between the camera and the 3D LiDAR, and to compensate for motion delays caused by time synchronization errors, acquisition delays, and ego motion. Specifically, the deep learning model for online joint calibration of the spatiotemporal parameters of a camera-3D LiDAR includes: a feature extraction module, an attention mechanism module, a visual odometry module, a feature matching module combining a hybrid pooling pyramid and a liquid time constant network, and a spatiotemporal parameter regression module, connected sequentially. This invention requires no human intervention or control throughout the entire process, achieving fully automatic and high-precision online calibration of the spatiotemporal parameters of a camera-3D LiDAR.

Description

Technical Field

[0001] This invention relates to the field of extrinsic parameter calibration of perception sensors in autonomous driving systems, and in particular to a camera-LiDAR spatiotemporal parameter calibration method based on deep learning.

Background Technology

[0002] Autonomous driving has become a hot topic in recent years, involving complex tasks such as perception, control, and planning. Its primary task is to achieve intelligent perception based on multi-sensor fusion. This involves perceiving the external environment through sensors such as LiDAR and cameras, and then fusing the perception data from various sensors to enhance the environmental perception capabilities in autonomous driving environments.

[0003] In the perception layer of autonomous driving systems, the most commonly used sensors are RGB cameras (monocular and binocular) and multi-line LiDAR. LiDAR emits laser light into the surrounding environment and receives the reflected light, calculating the distance from the detected point to the LiDAR from the time difference between emission and reception and the speed of light. Based on this ranging principle, LiDAR can obtain high-precision depth information over 360 degrees horizontally and over a certain angle vertically. However, the number of laser beams in a LiDAR is limited, so it yields only sparse point clouds with limited resolution. RGB monocular cameras can acquire high-resolution color and texture information from the environment, but they cannot recover the scale of real 3D motion from image data and cannot obtain high-precision depth information. Therefore, RGB cameras and LiDAR are complementary at the data level, and their fusion is a hot research topic in current autonomous driving sensor fusion. To achieve high-quality sensor fusion, the extrinsic parameter calibration of cameras and LiDAR is crucial. Only by obtaining the precise coordinate system transformation between the sensors can the data obtained from each sensor be accurately matched, thereby enabling data fusion at various levels.

[0004] Currently, most extrinsic calibration methods for camera-3D LiDAR employ offline calibration. These methods cannot correct deviations caused by environmental changes or vibrations during operation. If a sensor experiences calibration deviations due to unforeseen external factors, recalibration by professionals is often required. Manual calibration is not only time-consuming and labor-intensive but also prone to significant errors. Furthermore, the dynamic operating environment of autonomous driving is affected by time synchronization errors, acquisition delays, and ego-motion. The CalibNet paper (CalibNet: Geometrically Supervised Extrinsic Calibration using 3D Spatial Transformer Networks, Ganesh Iyer, R. Karnik Ram, J. Krishna Murthy, and K. Madhava Krishna) ignores the impact of latency and does not consider the contribution of 3D point cloud and image data to the calibration task. Considering only static calibration parameters based on geometric spatial transformations cannot establish an accurate alignment between RGB image pixels and 3D LiDAR point clouds. Therefore, the calibration efficiency and accuracy of existing sensor extrinsic calibration methods need improvement.

Summary of the Invention

[0005] Based on this, embodiments of the present invention provide a camera-LiDAR spatiotemporal parameter calibration method based on deep learning. This method simultaneously performs online calibration estimation of spatial geometric parameters and the time delay parameters of each sensor, achieving fully automatic online calibration and error self-correction, thus improving calibration accuracy and algorithm robustness while ensuring calibration efficiency.

[0006] The present invention is achieved by at least one of the following technical solutions.

[0007] A deep learning-based method for calibrating the spatiotemporal parameters of a camera-LiDAR system includes the following steps:

[0008] S1. Based on the initial calibration parameters of the camera-3D LiDAR, the LiDAR point cloud is mapped onto the image to form a point cloud depth map.

[0009] S2. The VO algorithm is used to solve the running speed of the data acquisition device between two consecutive RGB images;

[0010] S3. Establish an online joint calibration network for the spatiotemporal parameters of camera-3D lidar based on the combination of attention mechanism and hybrid pooling liquid network;

[0011] S4. Input the next frame and point cloud depth map from the continuous RGB image into the online joint calibration network of camera-3D LiDAR spatiotemporal parameters, and output the camera-3D LiDAR calibration extrinsic parameters and motion delay compensation parameters.

[0012] Furthermore, in step S1, the initial calibration parameters simulate the cumulative drift of calibration parameters caused by vibration, collision, and long-term operation of the camera-LiDAR in a dynamic operating environment. This cumulative drift is corrected using a deep learning online calibration network.

[0013] Furthermore, the specific process of step S2 is as follows:

[0014] S21. Use the ORB algorithm to extract key points from two consecutive frames of images and construct image feature descriptors;

[0015] S22. After step S21, the correspondence between multiple pairs of points in two consecutive RGB images can be obtained. Then, the Nister five-point algorithm based on RANSAC is used to estimate the essential matrix E.

[0016] S23. After obtaining the essential matrix E, perform singular value decomposition (SVD) on the essential matrix and verify the solution to obtain the camera pose transformation (R, t) between consecutive frames:

[0017] $E = U \Sigma V^{T}, \quad \Sigma = \mathrm{diag}(\sigma, \sigma, 0)$

[0018]

[0019] In the formula, U and V are orthogonal matrices, Σ is the singular value matrix, σ is the singular value, and the parameters ... and ... are, respectively:

[0020]

[0021] After performing SVD on the essential matrix E, four sets of rotation and translation transformation parameters (R, t) are obtained, namely (R1, t1), (R1, t2), (R2, t1) and (R2, t2). Only one of these sets ensures that all projected points have positive depth in both cameras at the same time, so the correct set is selected by checking the matching points.

[0022] S24. Based on the sampling time of two consecutive frames of the camera, calculate the time difference between the two consecutive frames. Combined with the translation parameters obtained in step S23, the running speed of the camera between consecutive frames can be further obtained.
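Steps S23 and S24 can be illustrated with a minimal NumPy sketch. It uses the common textbook decomposition of an essential matrix (R = U W Vᵀ or U Wᵀ Vᵀ, t = ± the last column of U), which may differ in notation from the formulas of this patent. The function names, the use of normalized image coordinates, and the `scale` argument are assumptions for illustration: a monocular essential matrix fixes the translation only up to scale, so a metric speed needs a scale from another source.

```python
import numpy as np

def decompose_essential(E):
    """Return the four (R, t) candidates encoded by an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations; flipping a factor only changes the sign of E.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])  # rotation by +90 degrees about z
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]  # unit translation direction, sign ambiguous
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def triangulate(R, t, x1, x2):
    """Linear triangulation of one match given as normalized homogeneous points."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t.reshape(3, 1)])
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def select_pose(E, pts1, pts2):
    """Pick the candidate with the most points of positive depth in both views."""
    best, best_count = None, -1
    for R, t in decompose_essential(E):
        count = 0
        for x1, x2 in zip(pts1, pts2):
            X = triangulate(R, t, x1, x2)
            count += int(X[2] > 0 and (R @ X + t)[2] > 0)
        if count > best_count:
            best, best_count = (R, t), count
    return best

def camera_speed(t_unit, dt, scale=1.0):
    """Velocity between frames; `scale` (metres) is an external assumption."""
    return scale * t_unit / dt
```

In practice the matched points would come from the ORB and RANSAC stages of steps S21 and S22; here they are simply passed in as arrays of normalized homogeneous coordinates.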

[0023] Furthermore, in step S3, the online joint calibration network for the spatiotemporal parameters of the camera-3D LiDAR based on the combination of attention mechanism and hybrid pooling liquid network includes: a feature extraction module, an attention weighting module, a feature matching module, and a parameter regression output module.

[0024] Furthermore, the feature extraction module specifically includes an RGB image feature extraction submodule and a point cloud depth map feature extraction submodule.

[0025] Furthermore, both the RGB image feature extraction submodule and the point cloud depth map feature extraction submodule adopt the ResNet-18 network, and the number of convolution kernels in the point cloud depth map feature extraction submodule is half that of the RGB image feature extraction submodule.

[0026] Furthermore, the attention weighting module includes an RGB image feature weighting branch and a point cloud depth map feature weighting branch. The attention weighting module employs a channel attention mechanism.

[0027] Furthermore, the feature matching module includes a Hybrid Pooling Pyramid (HSPP) and a Liquid Time Constant Network (LTC). The Hybrid Pooling Pyramid (HSPP) module combines max pooling and average pooling spatial pyramids (SPP). This structure is used to mine features of different sizes from the weighted features and output features with a fixed length to avoid data pruning required for model transfer. The Liquid Time Constant Network is a recurrent neural network.

[0028] Furthermore, the parameter regression output module considers the effects of camera-LiDAR time synchronization errors, data acquisition delays, and the ego-motion of the vehicle carrying the acquisition equipment in dynamic operating scenarios. Using the inter-frame running speed of the data acquisition equipment solved in step S24, it applies motion delay compensation to the mapping relationship between the LiDAR and the image plane.

[0029]

[0030] Furthermore, the above formula can be expressed as:

[0031]

[0032] In the formula, $K$ is the camera intrinsic parameter matrix, $R$ and $t$ are the rotation and translation parameters, $V_k$ is the running speed of the data acquisition device between frame $(k-1)$ and frame $k$, $\Delta t$ is the data acquisition delay between the camera and the 3D LiDAR, and $p_{i,k} = (px_{i,k}, py_{i,k}, pz_{i,k})$ is any point $i$ in the 3D point cloud $P_L$. After motion compensation by $V_k \Delta t$, its coordinates are denoted $p_{i,k,\Delta t} = (px_{i,k,\Delta t}, py_{i,k,\Delta t}, pz_{i,k,\Delta t})$, and $(pu_{i,k,\Delta t}, pv_{i,k,\Delta t})$ are the coordinates of the point mapped to the image plane after motion compensation.

[0033] Furthermore, the cost loss function constructed by the parameter regression output module is:

[0034] $L_{total} = L_T + \lambda_1 L_P + \lambda_2 L_A$

[0035] where,

[0036] $L_T = L_{translation} + L_{rotation}$

[0037] $= \|r_{pre} - r_{gt}\|_2 + \|t_{pre} - t_{gt}\|$

[0038]

[0039]

[0040] Where $L_{total}$ is the total loss value, $L_T$ is the prediction transformation loss, $L_P$ is the point cloud distance loss, $L_A$ is the alignment loss, and $\lambda_1$ and $\lambda_2$ are balancing factors used when training with the cost loss function. $L_{translation}$ and $L_{rotation}$ are the translation and rotation losses, respectively. $T_{pre}$ is the predicted value of the calibration parameters output by the deep learning model; $R_{pre}$ is the predicted rotation parameter output by the model, represented by the rotation vector $r_{pre}$; and $t_{pre}$ is the predicted translation parameter output by the model. $T_{gt}$ is the reference value of the calibration parameters, $r_{gt}$ is the reference value of the rotation parameter, and $t_{gt}$ is the reference value of the translation parameter. $N$ is the number of points involved in the loss function calculation, $T_{init}$ is the initial training calibration parameters, $P_i$ is the i-th point in the LiDAR training point cloud, and $q_{j,k,\Delta t}$ is the image pixel to which the point cloud is mapped on the image plane after motion compensation.

[0041] Compared with the prior art, the beneficial effects of the present invention are:

[0042] (1) The present invention discloses a camera-lidar spatiotemporal parameter calibration method based on deep learning, which can simultaneously perform online calibration estimation of spatial geometric parameters and time delay parameters of each sensor, realize fully automatic online calibration and error self-correction, and improve calibration accuracy and algorithm robustness while ensuring calibration efficiency.

[0043] (2) The present invention discloses a camera-lidar spatiotemporal parameter calibration method based on deep learning, which adopts a channel attention mechanism to realize adaptive weighting of extracted features according to the target calibration task, effectively enhancing the learning and representation capabilities of the network model.

[0044] (3) The present invention discloses a camera-lidar spatiotemporal parameter calibration method based on deep learning. The feature matching layer is designed by combining a hybrid pooling pyramid and a liquid time constant network. It can avoid data pruning during data migration while mining the correlation of features of different spatial sizes. The network model has strong robustness in dealing with extreme data.

Attached Figure Description

[0045] Figure 1 is a flowchart of the online joint calibration network model for camera-3D LiDAR spatiotemporal parameters disclosed in an embodiment of the present invention;

[0046] Figure 2 is a flowchart illustrating the specific implementation of the CALNet module in the online joint calibration network model for camera-3D LiDAR spatiotemporal parameters disclosed in this embodiment of the invention, where Conv represents convolution and FC represents a fully connected layer;

[0047] Figure 3 is a schematic diagram of the network structure design of the feature matching module in the online joint calibration network model of camera-3D LiDAR spatiotemporal parameters disclosed in this embodiment of the invention;

[0048] Figure 4 is a graph showing the relationship between the rotational mean absolute error and the quantity;

[0049] Figure 5 is a graph showing the relationship between the translational mean absolute error and the quantity;

[0050] Figure 6 is a flowchart of a camera-lidar spatiotemporal parameter calibration method based on deep learning, as described in an embodiment of the present invention.

Detailed Implementation

[0051] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without inventive effort are within the scope of protection of this invention.

[0052] Figure 1 shows a flowchart of the camera-LiDAR spatiotemporal parameter calibration method based on deep learning disclosed in this invention, and Figure 2 is a schematic diagram illustrating the specific implementation of the CALNet module in the network model. The specific implementation steps of the method are as follows:

[0053] S1. Construct a deep learning network model (an online joint calibration network for camera-3D LiDAR spatiotemporal parameters based on a combination of attention mechanism and hybrid pooling liquid network). The network structure is shown in Figures 1 and 2; it specifically includes a feature extraction module, an attention weighting module, a feature matching module, and a parameter regression output module.

[0054] As one embodiment, the feature extraction module specifically includes an RGB image feature extraction submodule and a point cloud depth map feature extraction submodule.

[0055] As one embodiment, both the RGB image feature extraction submodule and the point cloud depth map feature extraction submodule adopt the ResNet-18 network, and the number of convolution kernels in the point cloud depth map feature extraction submodule is half that of the RGB image feature extraction submodule.

[0056] The attention weighting module includes an RGB image feature weighting branch and a point cloud depth map feature weighting branch; the attention weighting module adopts a channel attention mechanism.
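The text specifies only that a channel attention mechanism is used. As a rough illustration of what channel attention does, the sketch below follows the widely used squeeze-and-excitation pattern (global average pooling, two small fully connected layers, sigmoid gating). That specific pattern, the weight shapes, and the function name are assumptions, not details confirmed by this document.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is a
    hypothetical channel reduction ratio.
    """
    squeeze = features.mean(axis=(1, 2))             # one descriptor per channel
    hidden = np.maximum(w1 @ squeeze, 0.0)           # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate in (0, 1)
    return features * weights[:, None, None], weights
```

Each of the two branches (RGB image features and point cloud depth map features) would apply such a gate to its own feature map, so channels that matter more for the calibration task receive larger weights.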

[0057] As one embodiment, the feature matching module includes a Hybrid Pooling Pyramid (HSPP) module and a Liquid Time Constant (LTC) network; the Hybrid Pooling Pyramid module is used to mine features of different sizes from the weighted features and output features with a fixed length to avoid data pruning required for model transfer; the Liquid Time Constant network participates in the construction of the feature matching layer to improve the robustness of the entire network model.
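The fixed-length property of the hybrid pooling pyramid can be sketched as follows. This is a minimal NumPy illustration, assuming three pyramid levels with 1, 4, and 16 bins (1x1, 2x2, and 4x4 grids) and assuming that the max-pooled and average-pooled results are concatenated; the function names and the concatenation choice are hypothetical.

```python
import numpy as np

def pyramid_pool(features, grid, reduce_fn):
    """Pool a (C, H, W) map into grid x grid bins, giving a (C, grid * grid) array."""
    channels, height, width = features.shape
    rows = [(int(np.floor(i * height / grid)), int(np.ceil((i + 1) * height / grid)))
            for i in range(grid)]
    cols = [(int(np.floor(j * width / grid)), int(np.ceil((j + 1) * width / grid)))
            for j in range(grid)]
    out = np.empty((channels, grid * grid))
    for i, (r0, r1) in enumerate(rows):
        for j, (c0, c1) in enumerate(cols):
            out[:, i * grid + j] = reduce_fn(features[:, r0:r1, c0:c1], axis=(1, 2))
    return out

def hybrid_spp(features, grids=(1, 2, 4)):
    """Concatenate max-pooled and average-pooled pyramid levels into one vector."""
    parts = []
    for grid in grids:
        parts.append(pyramid_pool(features, grid, np.max).ravel())
        parts.append(pyramid_pool(features, grid, np.mean).ravel())
    return np.concatenate(parts)
```

Under these assumptions a map with C channels always yields a vector of length 2 × C × (1 + 4 + 16), regardless of its height and width, which is why inputs of different sizes need no cropping before the following layers.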

[0058] S2. Based on the initial calibration parameters of the camera-3D LiDAR, the LiDAR point cloud is mapped onto the image to form a point cloud depth map, and the data are divided into training and testing sets. Random errors are added to the ground truth to simulate initial training calibration parameters with varying magnitudes of error between the camera and LiDAR. Then, based on the generated initial training calibration parameters and the camera's intrinsic parameters, the LiDAR's 3D point cloud is projected onto the image to generate a point cloud depth map containing a series of random calibration errors. The point cloud depth map stores the depth and distance information of the LiDAR point cloud. The initial calibration parameters simulate the cumulative drift of calibration parameters caused by vibration, collision, and long-term operation of the camera-LiDAR in a dynamic operating environment. This cumulative error drift is corrected using a deep learning online calibration network. The specific implementation process is as follows:

[0059] First, random error is added to the ground truth $T_{gt} = [R_{gt} | t_{gt}]$ of the public dataset to obtain the initial training calibration parameters with error, $T_{init} = [R_{init} | t_{init}]$. The random error range is (-10°, +10°) for rotation and (-0.25 m, +0.25 m) for translation. Second, using the camera intrinsic parameter $K$, each point $P_i = [X_i, Y_i, Z_i] \in \mathbb{R}^3$ of the 3D LiDAR point cloud is projected onto the 2D image coordinate system as $p_i = [u_i, v_i] \in \mathbb{R}^2$ to construct the depth map of the 3D point cloud. The projection process is shown in the formula:

[0060]

[0061] where the formula uses the homogeneous coordinates of $P_i$ and of $p_i$; $R_{init}$ and $t_{init}$ are the rotation vector and translation vector of the initial training calibration parameters $T_{init}$; $R_{gt}$ is the reference value of the rotation parameter; $t_{gt}$ is the reference value of the translation parameter; and the depth value of the 3D LiDAR point $P_i$ is stored at the image pixel $p_i$ to which it is mapped.
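The perturbation and projection described above can be sketched in NumPy as follows. The sketch assumes a pinhole model with $p \sim K(RP + t)$, independent uniform errors per axis, and a rule that keeps the nearest depth when several points fall on one pixel; none of these details, nor the function names, are confirmed by the text.

```python
import numpy as np

def rodrigues(rvec):
    """Convert a rotation vector (axis times angle, radians) to a 3x3 matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def perturb_extrinsics(R_gt, t_gt, rng, max_rot_deg=10.0, max_trans_m=0.25):
    """Add a random error to ground-truth extrinsics to simulate drift."""
    d_rot = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg, size=3))
    d_t = rng.uniform(-max_trans_m, max_trans_m, size=3)
    return rodrigues(d_rot) @ R_gt, t_gt + d_t

def project_to_depth_map(points, K, R, t, height, width):
    """Project (N, 3) LiDAR points into a sparse depth image of shape (height, width)."""
    cam = points @ R.T + t                 # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0]               # keep points in front of the camera
    uvw = cam @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # Keep the nearest point when several land on the same pixel.
    np.minimum.at(depth, (v[inside], u[inside]), cam[inside, 2])
    depth[np.isinf(depth)] = 0.0           # 0 marks pixels with no LiDAR return
    return depth
```

Calling `perturb_extrinsics` once per training sample and then `project_to_depth_map` with the perturbed parameters gives a miscalibrated depth map paired with the known correction.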

[0062] S3. For each consecutive two adjacent frames of RGB image data, the VO algorithm is used, and the data acquisition time difference between adjacent frames is combined to solve the running speed of the data acquisition device.

[0063] The specific process of step S3 is as follows:

[0064] (1) The ORB algorithm is used to extract key points and construct image feature descriptors for two consecutive frames of images;

[0065] (2) After step (1), the correspondence between multiple pairs of points in two consecutive RGB images can be obtained, and then the essential matrix E is estimated using the Nister five-point algorithm based on RANSAC.

[0066] (3) After obtaining the essential matrix E, singular value decomposition (SVD) is performed on the essential matrix, and the solution is verified to obtain the pose transformation (R, t) of the camera between consecutive frames:

[0067] $E = U \Sigma V^{T}, \quad \Sigma = \mathrm{diag}(\sigma, \sigma, 0)$

[0068]

[0069] In the formula, U and V are orthogonal matrices, Σ is the singular value matrix, σ is the singular value, and the parameters ... and ... are, respectively:

[0070]

[0071] After performing SVD on the essential matrix E, four sets of rotation and translation transformation parameters (R, t) are obtained, namely (R1, t1), (R1, t2), (R2, t1) and (R2, t2). However, only one of these sets ensures that all projected points have positive depth in both cameras at the same time, so the correct set is selected by checking the matching points.

[0072] (4) Based on the sampling time of two consecutive frames of the camera, calculate the time difference between the two consecutive frames. Combined with the translation parameters obtained in step (3), the running speed of the camera between consecutive frames can be further obtained.

[0073] S4. Input the next frame of the two adjacent frames and the matching point cloud depth map into the online joint calibration network model of camera-3D LiDAR spatiotemporal parameters to obtain the calibration parameters (rotation parameters and translation parameters) after error drift accumulation correction, as well as the time delay compensation parameters between camera and 3D LiDAR caused by time synchronization error or error accumulation.

[0074] The specific process of step S4 is as follows:

[0075] (1) Feature extraction module

[0076] This module uses ResNet-18 as the baseline. The feature extraction module is divided into an RGB image feature extraction module and a point cloud depth map extraction module. The network structure parameters of ResNet-18 are shown in Table 1.

[0077] Table 1. Structural parameters of the ResNet-18 network

[0078]

[0079] (2) Feature matching module

[0080] The network structure of the feature matching module of the deep learning-based online joint calibration network for spatiotemporal parameters of camera-3D LiDAR proposed in this invention is shown in Figure 3. This module can be divided into two parts: a hybrid pooling pyramid module and a liquid time constant network module. The hybrid pooling pyramid combines max pooling and average pooling, with pooling layer sizes of 1, 4, and 16 for the three pyramid layers. The liquid time constant network is a recurrent neural network that incorporates the concept of Neural ODE, exhibiting superior representation learning capabilities compared to Neural ODE. The liquid time constant network proposes a more stable hidden state time derivative:

[0081]

[0082] Where N(t) = f(h(t), x(t), t, α)(S − h(t)), δ represents the time constant, N(t) represents the network determined by the neural network parameters α and S, f(h(t), x(t), t, α) represents the neural network determined by the neural network parameter α, x(t) represents the input data of f(h(t), x(t), t, α) at time t, and h(t) represents the output hidden state of f(h(t), x(t), t, α) at time t.
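A single update of a liquid time constant cell can be sketched as below. The sketch assumes the state equation commonly given for LTC networks in the literature, dh/dt = −h(t)/δ + f(·)(S − h(t)), and integrates it with a semi-implicit Euler step. The sigmoid form of f, the weight names, and the step size are illustrative assumptions rather than details taken from this document.

```python
import numpy as np

def ltc_step(h, x, W_h, W_x, b, S, delta, dt=0.1):
    """Advance the hidden state h by one semi-implicit Euler step.

    Solves h_new = h + dt * (-(1/delta + f) * h_new + f * S) for h_new,
    with f evaluated at the current state.
    """
    f = 1.0 / (1.0 + np.exp(-(W_h @ h + W_x @ x + b)))  # gate in (0, 1)
    return (h + dt * f * S) / (1.0 + dt * (1.0 / delta + f))
```

Because f both drives the state toward S and adds to the decay rate 1/δ, the effective time constant changes with the input. With a positive gate and non-negative S, this update keeps a state that starts between 0 and S inside that range, which is one way to read the stability claim.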

[0083] (3) Parameter Regression Output Module

[0084] In the design of the parameter regression output module, the effects of camera-LiDAR time synchronization error, data acquisition delay, and vehicle ego-motion in dynamic operating scenarios are considered. Using the inter-frame running speed of the data acquisition equipment solved in step S3, motion delay compensation is applied to the mapping relationship between the LiDAR and the image plane.

[0085]

[0086] Furthermore, the above formula can be expressed as:

[0087]

[0088] In the formula, $K$ is the camera intrinsic parameter matrix, $R$ and $t$ are the rotation and translation parameters corrected by the camera-3D LiDAR spatiotemporal parameter calibration network, $V_k$ is the running speed of the data acquisition device between frame $(k-1)$ and frame $k$, $\Delta t$ is the data acquisition delay between the camera and the 3D LiDAR, and $p_{i,k} = (px_{i,k}, py_{i,k}, pz_{i,k})$ is any point $i$ in the 3D point cloud $P_L$. After motion compensation by $V_k \Delta t$, its coordinates are denoted $p_{i,k,\Delta t} = (px_{i,k,\Delta t}, py_{i,k,\Delta t}, pz_{i,k,\Delta t})$, and $(pu_{i,k,\Delta t}, pv_{i,k,\Delta t})$ are the coordinates of the point mapped to the image plane after motion compensation.
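The compensation can be illustrated with a short NumPy sketch. Since the formula itself is not reproduced in the text, the sketch assumes the simplest reading: each point is shifted by the displacement $V_k \Delta t$ and then projected with $K[R | t]$. The sign of the shift and the frame in which the velocity is expressed are assumptions, as is the function name.

```python
import numpy as np

def compensate_and_project(points, velocity, delay, K, R, t):
    """Shift (N, 3) LiDAR points by velocity * delay, then project to pixels.

    Assumes the velocity is expressed in the LiDAR frame and that the
    compensated point is p + V * dt; the actual sign convention may differ.
    """
    shifted = points + velocity * delay    # motion-compensated 3D points
    cam = shifted @ R.T + t                # LiDAR frame -> camera frame
    uvw = cam @ K.T
    pixels = uvw[:, :2] / uvw[:, 2:3]      # (pu, pv) for each point
    return shifted, pixels
```

With a delay of zero the function reduces to the ordinary static projection, so the delay term can be read as a correction on top of the extrinsic calibration.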

[0089] Furthermore, the cost loss function constructed by the parameter regression output module is:

[0090] $L_{total} = L_T + \lambda_1 L_P + \lambda_2 L_A$

[0091] where,

[0092] $L_T = L_{translation} + L_{rotation}$

[0093] $= \|r_{pre} - r_{gt}\|_2 + \|t_{pre} - t_{gt}\|$

[0094]

[0095]

[0096] $L_{total}$ is the total loss value, $L_T$ is the prediction transformation loss, $L_P$ is the point cloud distance loss, $L_A$ is the alignment loss, and $\lambda_1$ and $\lambda_2$ are balancing factors used when training with the cost loss function. $L_{translation}$ and $L_{rotation}$ are the translation and rotation losses, respectively. $T_{pre}$ is the predicted value of the calibration parameters output by the deep learning model; $R_{pre}$ is the predicted rotation parameter output by the model, represented by the rotation vector $r_{pre}$; and $t_{pre}$ is the predicted translation parameter output by the model. $T_{gt}$ is the reference value of the calibration parameters, $r_{gt}$ is the reference value of the rotation parameter, and $t_{gt}$ is the reference value of the translation parameter. $N$ is the number of points involved in the loss function calculation, $T_{init}$ is the initial training calibration parameters, $P_i$ is the i-th point in the LiDAR training point cloud, and $q_{j,k,\Delta t}$ is the image pixel to which the point cloud is mapped on the image plane after motion compensation.
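The way the loss terms combine can be shown with a small sketch. Only the transformation loss $L_T$ is computed here, because the formulas for $L_P$ and $L_A$ are not reproduced in the text; those two terms are passed in as already computed numbers. The Euclidean norm for both parts of $L_T$ and the function names are assumptions.

```python
import numpy as np

def transformation_loss(r_pre, t_pre, r_gt, t_gt):
    """L_T: distance between predicted and reference rotation vectors and translations."""
    rotation = np.linalg.norm(np.asarray(r_pre) - np.asarray(r_gt))
    translation = np.linalg.norm(np.asarray(t_pre) - np.asarray(t_gt))
    return rotation + translation

def total_loss(l_t, l_p, l_a, lambda1, lambda2):
    """L_total = L_T + lambda1 * L_P + lambda2 * L_A."""
    return l_t + lambda1 * l_p + lambda2 * l_a
```

The balancing factors let training trade off direct parameter regression against the geometric terms; their values are not stated in the text.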

[0097] In step S4, the specific performance tests are as follows:

[0098] (1) The “2011_09_26” sequence from the KITTI dataset was used for training and testing. The training and validation datasets consisted of 24,000 sets, and the test dataset consisted of 6,000 sets. The initial error range was rotation (-10°, +10°) and translation (-0.10m, +0.10m).

[0099] (2) Network model parameter settings

[0100] The hardware and software parameter configurations for training and testing the deep learning network model in this invention are shown in Table 2. Furthermore, the adaptive moment estimation (Adam) optimizer with momentum parameters (0.9, 0.99) is used, and the learning rate is multiplied by a factor of 0.5 every 15 epochs using the torch lr_scheduler.MultiStepLR function. The batch size is set to 4, the number of epochs to 45, and the initial learning rate to 3e-4.
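The resulting learning rate per epoch can be worked out with a few lines of plain Python. This is a sketch that reads "every 15 epochs" as decay milestones at epochs 15 and 30, which corresponds to a MultiStepLR scheduler with milestones [15, 30] and gamma 0.5; the milestone list is an assumption, since the text does not spell it out.

```python
def learning_rate(epoch, base_lr=3e-4, gamma=0.5, step=15):
    """Learning rate used during a given zero-based epoch."""
    return base_lr * gamma ** (epoch // step)

# Schedule over the 45 training epochs stated in the text.
schedule = [learning_rate(epoch) for epoch in range(45)]
```

Under this reading, epochs 0 to 14 train at 3e-4, epochs 15 to 29 at 1.5e-4, and epochs 30 to 44 at 7.5e-5.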

[0101] Table 2 Hardware and software configuration parameters

[0102]

[0103] Figures 4 and 5 show the distribution of the rotation mean absolute error and the translation absolute error, and their quantitative relationship, after correction by the calibration network model, for an initial error range of rotation (-10°, +10°) and translation (-0.10 m, +0.10 m).

[0104] The camera-3D LiDAR spatiotemporal parameter online joint calibration network model in this embodiment is called ST-CALNet. To fully verify the excellent performance of this model, it is compared with representative works. The experimental comparison results are shown in Tables 3 and 4 (the best results are shown in bold).

[0105] Table 3 Comparison of ST-CALNet spatiotemporal calibration performance (delay, translation parameters)

[0106]

[0107] Table 4. Comparison of ST-CALNet spatiotemporal calibration performance (delay, rotation parameters)

[0108]

[0109] As shown in Tables 3 and 4, ST-CALNet possesses good spatiotemporal parameter calibration capabilities for camera-3D LiDAR. After network calibration correction, its translation parameter ATD is 2.25 cm, its rotation parameter AEAD is 0.21°, and the time delay estimation error between the camera and LiDAR is 4 ms, slightly lower than that of SSTCalib. However, the ST-CALNet proposed in this paper not only has time delay motion compensation capabilities but also exhibits superior extrinsic parameter calibration correction capabilities.

[0110] The preferred embodiments of the present invention disclosed above are merely illustrative of the invention. These preferred embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the content of this specification. This specification selects and specifically describes these embodiments to better explain the principles and practical applications of the invention, thereby enabling those skilled in the art to better understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A method for calibrating spatiotemporal parameters of a camera-3D LiDAR based on deep learning, characterized in that it includes the following steps:
S1. Map the LiDAR point cloud onto the image to form a point cloud depth map based on the initial calibration parameters of the camera-3D LiDAR; the initial calibration parameters are calibration parameters carrying the drift accumulated through vibration, collision and long-term operation of the camera-3D LiDAR in a dynamic operating environment;
S2. Use the visual odometry (VO) algorithm to solve the running speed of the data acquisition device between two consecutive RGB images;
S3. Establish an online joint calibration network for the spatiotemporal parameters of the camera-3D LiDAR based on the combination of an attention mechanism and a hybrid pooling liquid network;
S4. Input the next frame of the consecutive RGB images and the point cloud depth map into the online joint calibration network for the camera-3D LiDAR spatiotemporal parameters, and output the camera-3D LiDAR extrinsic calibration parameters and the motion delay compensation parameters.
In step S3, the online joint calibration network for the camera-3D LiDAR spatiotemporal parameters based on the attention mechanism and the hybrid pooling liquid network includes: a feature extraction module, an attention weighting module, a feature matching module, and a parameter regression output module. The feature matching module includes a hybrid pooling pyramid (HSPP) and a liquid time constant network (LTC); the hybrid pooling pyramid (HSPP) combines max pooling and average pooling within a spatial pyramid pooling (SPP) structure.
In step S4, the parameter regression output module accounts for the effects of the camera-3D LiDAR time synchronization error, the data acquisition delay, and the ego motion of the acquisition vehicle in dynamic operating scenarios, and uses the running speed of the data acquisition device between consecutive frames, solved in step S2, to perform motion delay compensation on the mapping relationship between the LiDAR and the image plane.
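As an illustration of step S1 and of the motion delay compensation described for step S4, the following is a minimal sketch, not the patented implementation. It assumes a pinhole camera with intrinsic matrix `K`, a 4x4 extrinsic matrix `T_cam_lidar`, and a first-order constant-velocity correction; the function name, the parameter names, and the sign convention of the correction are assumptions made for this example only.

```python
import numpy as np

def project_to_depth_map(points_lidar, T_cam_lidar, K, image_size,
                         velocity_cam=None, delay_s=0.0):
    """Project LiDAR points (N, 3) into a sparse depth map of shape (H, W).

    T_cam_lidar : 4x4 extrinsic matrix (LiDAR frame -> camera frame).
    K           : 3x3 pinhole intrinsic matrix.
    velocity_cam, delay_s : assumed constant camera velocity (m/s, camera
        frame) and time delay (s) for a first-order motion correction.
    """
    h, w = image_size
    n = points_lidar.shape[0]

    # Transform the points into the camera frame with the extrinsics.
    homog = np.hstack([points_lidar, np.ones((n, 1))])
    pts_cam = (T_cam_lidar @ homog.T).T[:, :3]

    if velocity_cam is not None:
        # Assumed convention: a static point appears shifted by -v * dt
        # in the frame of a camera that has moved by v * dt.
        pts_cam = pts_cam - np.asarray(velocity_cam, dtype=float) * delay_s

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 1e-6]

    # Pinhole projection to integer pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts_cam[:, 2]

    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], z[inside]

    # When several points hit the same pixel, keep the nearest one.
    depth = np.full((h, w), np.inf, dtype=np.float32)
    np.minimum.at(depth, (v, u), z.astype(np.float32))
    depth[np.isinf(depth)] = 0.0  # 0 marks pixels without a LiDAR return
    return depth
```

Pixels without a LiDAR return are set to zero, so the depth map stays sparse. With `velocity_cam` left at `None` the function reduces to a plain projection with the initial calibration parameters.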

2. The method for calibrating spatiotemporal parameters of a camera-3D LiDAR based on deep learning according to claim 1, characterized in that the specific process of step S2 is as follows:
S21. Use the ORB algorithm to extract key points from two consecutive image frames and construct image feature descriptors;
S22. From the result of step S21, obtain the correspondences between multiple pairs of points in the two consecutive RGB images, then use the RANSAC-based Nistér five-point algorithm to estimate the essential matrix E;
S23. After obtaining the essential matrix E, perform singular value decomposition (SVD) on it and verify the candidate solutions to obtain the pose transformation (R, t) of the camera between the consecutive frames;
S24. From the sampling times of the two consecutive camera frames, calculate the time difference between them, and combine it with the translation parameters obtained in step S23 to obtain the running speed of the camera between the consecutive frames.
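Steps S23 and S24 can be sketched as follows. This is a hedged illustration only: it uses the standard SVD factorization of an essential matrix into four candidate poses and leaves out the verification step (typically a check that triangulated points lie in front of both cameras). The function names are assumptions. Note also that a translation recovered from an essential matrix has unit norm, so the speed calculation below assumes that a metrically scaled translation is available from some other source.

```python
import numpy as np

def skew(t):
    """Return the 3x3 skew-symmetric matrix [t]x of a 3-vector."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the four candidate (R, t) poses encoded by E (t has unit norm)."""
    U, _, Vt = np.linalg.svd(E)
    # Flip signs so that both factors are proper rotations (det = +1);
    # E is only defined up to sign, so this does not change the candidates.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]  # left null vector of E, direction of translation
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def running_speed(t_metric, time_prev, time_curr):
    """Speed between two frames from a metrically scaled translation."""
    dt = time_curr - time_prev
    if dt <= 0:
        raise ValueError("frame timestamps must be strictly increasing")
    return float(np.linalg.norm(t_metric) / dt)
```

In practice the candidate that places the matched points in front of both cameras is retained as (R, t) before the speed is computed.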

3. The method for calibrating spatiotemporal parameters of a camera-3D LiDAR based on deep learning according to claim 1, characterized in that the feature extraction module specifically includes an RGB image feature extraction submodule and a point cloud depth map feature extraction submodule.

4. The method for calibrating spatiotemporal parameters of a camera-3D LiDAR based on deep learning according to claim 3, characterized in that both the RGB image feature extraction submodule and the point cloud depth map feature extraction submodule use the ResNet-18 network, and the number of convolution kernels in the point cloud depth map feature extraction submodule is half that of the RGB image feature extraction submodule.

5. The method for calibrating spatiotemporal parameters of a camera-3D LiDAR based on deep learning according to claim 1, characterized in that the attention weighting module includes an RGB image feature weighting branch and a point cloud depth map feature weighting branch, and the attention weighting module adopts a channel attention mechanism.
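The claim does not specify the form of the channel attention mechanism. As one common possibility, the sketch below uses a squeeze-and-excitation style weighting (global average pooling, a two-layer bottleneck, and a sigmoid gate); this choice, the function name, and the weight shapes are assumptions for illustration and are not taken from the patent. The same function could be applied separately to the RGB image features and to the point cloud depth map features.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Re-weight the channels of a feature map of shape (C, H, W).

    w1 : (C // r, C) bottleneck weights, w2 : (C, C // r) expansion weights,
    where r is an assumed channel reduction ratio.
    Returns the weighted feature map and the per-channel weights.
    """
    # Squeeze: one descriptor per channel via global average pooling.
    squeezed = feat.mean(axis=(1, 2))                    # shape (C,)
    # Excite: bottleneck with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(w1 @ squeezed, 0.0)              # shape (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))       # shape (C,)
    # Scale every channel by its attention weight.
    return feat * weights[:, None, None], weights
```

In a trained network `w1` and `w2` would be learned parameters; here they are plain arrays passed in by the caller.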

6. The method for calibrating spatiotemporal parameters of a camera-3D LiDAR based on deep learning according to claim 1, characterized in that the cost loss function constructed by the parameter regression output module is:
[formula and symbols not reproduced in this text]
where the quantities defined, in the order given in the claim, are: the loss value; the predicted transformation loss; the point cloud distance loss; the alignment loss; the balancing factors used in the cost loss function training process; the translation loss and the rotation loss, respectively; the predicted values of the calibration parameters output by the deep learning model; the predicted rotation parameters output by the deep learning model, represented as a rotation vector; the predicted translation parameters output by the deep learning model; the reference values of the calibration parameters; the reference value of the rotation parameter; the reference value of the translation parameter; the number of points involved in the loss function calculation; the initial calibration parameters used in training; an indexed point in the LiDAR training point cloud; the coordinates of that point mapped to the image plane after motion compensation; and the image pixels to which the point cloud is mapped on the image plane.