A depth completion method and device based on binocular camera and laser radar fusion

By combining the data measurement advantages of binocular cameras and lidar, depth maps are generated and repaired, solving the problems of sparse, incomplete, and insufficient depth information in existing technologies, and achieving more complete and high-precision depth information acquisition.

CN122289340A · Pending · Publication Date: 2026-06-26 · CHINA FAW CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: CHINA FAW CO LTD
Filing Date: 2026-03-30
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing methods that acquire depth information with a single sensor (LiDAR or binocular camera) produce depth information that is sparse, incomplete, or insufficiently accurate.

Method used

By combining the data measurement advantages of binocular cameras and LiDAR, scene maps and depth maps are generated through spatiotemporal alignment. Edge information is repaired and abnormal depth values are corrected, and the resulting maps are fused to generate a more complete and high-precision fused depth map.

Benefits of technology

It improves the completeness and accuracy of depth information, solves the problems of sparse, incomplete and insufficient depth information, and generates a more complete and more accurate fused depth map.


Figure CN122289340A_ABST

Abstract

This application provides a depth completion method and apparatus based on the fusion of binocular camera and lidar. The depth completion method can combine the data measurement advantages of binocular camera and lidar to effectively fuse the depth maps measured by both, resulting in a fused depth map with more complete depth information and higher data accuracy. This solves the technical problems of sparse, incomplete, and insufficient depth information in the prior art.

Description

Technical Field

[0001] This application relates to the field of environmental perception technology, and more specifically, to a depth completion method and apparatus based on the fusion of binocular camera and lidar.

Background Technology

[0002] Depth information acquisition is a crucial technical step in environmental perception and 3D reconstruction, and it is widely used in fields such as autonomous driving, robot navigation, and augmented reality. Currently, the mainstream methods for acquiring depth information typically involve using a single sensor (LiDAR or binocular camera) to collect depth maps of the target scene, thereby obtaining depth information corresponding to different points within the target scene.

[0003] However, point clouds generated by LiDAR are spatially sparse, easily missing environmental details in the target scene. Furthermore, significant depth measurement errors exist at the edge regions of the point cloud. Binocular cameras, on the other hand, are prone to disparity matching failures in weakly textured scenes, resulting in numerous holes in the generated depth maps. Moreover, depth information generated by binocular cameras is sensitive to lighting conditions, easily introducing noise and affecting the reliability of depth perception. Therefore, due to the different drawbacks of different sensor types, existing depth information acquisition methods, regardless of the sensor used, inevitably suffer from incomplete depth information and insufficient accuracy.

Summary of the Invention

[0004] In view of this, this application provides a depth completion method and device based on the fusion of binocular camera and lidar, which can combine the data measurement advantages of binocular camera and lidar to effectively fuse the depth maps measured by both, and obtain a fused depth map with more complete depth information and higher data accuracy, thereby solving the technical problems of sparse, incomplete and insufficient depth information in the prior art.

[0005] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

[0006] In a first aspect, embodiments of this application provide a depth completion method based on the fusion of a binocular camera and a lidar, the depth completion method comprising: performing spatiotemporal alignment of the binocular camera and the lidar installed in the target scene, generating a scene map and a binocular camera depth map corresponding to the target scene based on the spatiotemporally aligned binocular camera, and generating a lidar depth map corresponding to the target scene based on the spatiotemporally aligned lidar; repairing the edge information of the scene map, the binocular camera depth map, and the lidar depth map respectively, to obtain a first closed edge map corresponding to the scene map, a second closed edge map corresponding to the binocular camera depth map, and a third closed edge map corresponding to the lidar depth map; correcting the abnormal depth values in the binocular camera depth map based on the first closed edge map and the second closed edge map, to obtain a depth repair map corresponding to the binocular camera depth map; and generating a fused depth map corresponding to the target scene based on the third closed edge map and the depth repair map.

[0007] In a second aspect, embodiments of this application provide a depth completion device based on the fusion of a binocular camera and a lidar, the depth completion device comprising: a spatiotemporal alignment module, used to perform spatiotemporal alignment of the binocular camera and the lidar installed in the target scene, generate a scene map and a binocular camera depth map corresponding to the target scene based on the spatiotemporally aligned binocular camera, and generate a lidar depth map corresponding to the target scene based on the spatiotemporally aligned lidar; an edge repair module, used to repair the edge information of the scene map, the binocular camera depth map, and the lidar depth map respectively, to obtain a first closed edge map corresponding to the scene map, a second closed edge map corresponding to the binocular camera depth map, and a third closed edge map corresponding to the lidar depth map; a depth repair module, used to correct abnormal depth values in the binocular camera depth map based on the first closed edge map and the second closed edge map, to obtain a depth repair map corresponding to the binocular camera depth map; and a depth fusion module, used to generate a fused depth map corresponding to the target scene based on the third closed edge map and the depth repair map.

[0008] In a third aspect, embodiments of this application provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the depth completion method described above.

[0009] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the aforementioned depth completion method.

[0010] The technical solutions provided by the embodiments of this application may include the following beneficial effects: This application provides a depth completion method and apparatus based on the fusion of binocular camera and lidar, which can combine the respective data measurement advantages of binocular camera and lidar to effectively fuse the depth maps measured by both, resulting in a fused depth map with more complete depth information and higher data accuracy, thereby solving the technical problems of sparse, incomplete and insufficient depth information in the prior art.

Brief Description of the Drawings

[0011] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0012] Figure 1 is a flowchart of a depth completion method based on the fusion of binocular camera and lidar provided in an embodiment of this application. Figure 2 is a schematic diagram of a depth completion device based on the fusion of binocular camera and lidar provided in an embodiment of this application. Figure 3 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Description

[0013] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. It should be understood that the accompanying drawings in this application are for illustrative and descriptive purposes only and are not intended to limit the scope of protection of this application. Furthermore, it should be understood that the schematic drawings are not drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of this application. It should be understood that the operations in the flowcharts may not be implemented in sequence, and steps without logical contextual relationships may be reversed or implemented simultaneously. In addition, those skilled in the art, guided by the content of this application, may add one or more other operations to the flowcharts, or remove one or more operations from the flowcharts.

[0014] Furthermore, the described embodiments are merely some, not all, of the embodiments of this application. The components of the embodiments of this application described and illustrated herein can typically be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.

[0015] It should be noted that the term "comprising" will be used in the embodiments of this application to indicate the presence of the features declared thereafter, but does not exclude the addition of other features.

[0016] In this embodiment of the application, a depth completion method based on the fusion of a binocular camera and a lidar can be applied to a depth completion device. The depth completion device can communicate with the binocular camera and lidar installed in the target scene, receive relevant data from the binocular camera and lidar, and then process the received relevant data to generate the final depth map (i.e., the fused depth map) corresponding to the target scene.

[0017] To facilitate understanding of the embodiments of this application, a depth completion method and apparatus based on the fusion of binocular camera and lidar provided in the embodiments of this application will be described in detail below.

[0018] Referring to Figure 1, which shows a flowchart of a depth completion method based on the fusion of a binocular camera and a lidar provided in an embodiment of this application, the depth completion method includes steps S101-S104. Specifically:

S101: Perform spatiotemporal alignment of the binocular camera and the lidar installed in the target scene, generate a scene map and a binocular camera depth map corresponding to the target scene based on the spatiotemporally aligned binocular camera, and generate a lidar depth map corresponding to the target scene based on the spatiotemporally aligned lidar.

[0019] Here, the target scene refers to the spatial scene for which depth information needs to be acquired. For example, if the target scene is an indoor scene, a mobile robot can be placed in the indoor scene, and a LiDAR and a binocular camera can be mounted on the mobile robot. By controlling the mobile robot to move in the indoor scene, the LiDAR and the binocular camera can collect depth information at various spatial locations in the indoor scene and generate corresponding depth maps.

[0020] Here, because there are time differences (i.e., time deviations between the timestamps corresponding to the same data collected by the binocular camera and the lidar) and spatial differences (the coordinate systems corresponding to the same spatial location point are different in the data collected by the binocular camera and the lidar respectively), spatiotemporal alignment of the binocular camera and the lidar is the basis for the subsequent fusion of the depth information collected by the two sensors.

[0021] Specifically, as an optional embodiment, the binocular camera and the lidar can be spatiotemporally aligned according to steps a1-a5 below:

Step a1: Collect data from the calibration reference object using the binocular camera and the lidar, respectively, to obtain the target image and the laser point cloud data corresponding to the calibration reference object.

[0022] Here, in order to determine the coordinate transformation relationship (i.e., spatial position mapping relationship) between the camera coordinate system corresponding to the binocular camera and the lidar coordinate system corresponding to the lidar, a calibration reference object located in the target scene (i.e., the target scene includes the calibration reference object) can be used to obtain the position coordinates of the calibration reference object in the two coordinate systems respectively. Thus, the coordinate transformation relationship between the two coordinate systems can be solved based on the position coordinates of the same spatial position (i.e., the same position point on the calibration reference object) in the two coordinate systems respectively.

[0023] It should be noted that the above-mentioned calibration reference object can be a calibration board (such as a checkerboard calibration board) placed in front of the binocular camera and the lidar in the target scene, or it can be a spatial plane such as a wall or ground in the target scene. This application embodiment does not limit the specific type of the above-mentioned calibration reference object.

[0024] Step a2: Determine the spatial position mapping relationship between the camera coordinate system corresponding to the binocular camera and the lidar coordinate system corresponding to the lidar based on the first position coordinate of the feature point on the calibration reference in the target image and the second position coordinate of the feature point in the lidar point cloud data.

[0025] Here, when the calibration reference object is a calibration board (e.g., a checkerboard calibration board), the aforementioned feature points can be multiple corner points on the calibration board (e.g., multiple interior corner points on the checkerboard calibration board); when the calibration reference object is a spatial plane such as a wall or ground in the target scene, the aforementioned feature points can be multiple specific location points pre-marked on the aforementioned spatial plane.

[0026] Specifically, denote the first position coordinates of a feature point in the target image as $P_c$, and denote the second position coordinates of the same feature point in the laser point cloud data as $P_l$. Then $P_c = P_l \times R + T$, where R represents the rotation matrix and T represents the translation vector. Based on this, by solving the above equation with the first position coordinates and second position coordinates of multiple feature points, the specific parameter values of the rotation matrix R and the translation vector T can be obtained. The spatial position mapping relationship between the camera coordinate system and the lidar coordinate system can then be represented by the rotation matrix R and the translation vector T.
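Solving for R and T from several feature-point correspondences can be illustrated with a short sketch. The source does not say which solver is used; the code below assumes the SVD-based least-squares (Kabsch) solution, assumes both coordinate sets are already available as 3-D points, and uses the column-vector convention camera = R · lidar + T. The function name `estimate_rigid_transform` is hypothetical.

```python
import numpy as np

def estimate_rigid_transform(lidar_pts, camera_pts):
    """Least-squares R, T such that camera_pts ~ lidar_pts @ R.T + T.

    Assumption: SVD-based (Kabsch) solution; the patent text does not
    name a specific solver.
    """
    lidar_pts = np.asarray(lidar_pts, dtype=float)
    camera_pts = np.asarray(camera_pts, dtype=float)
    mu_l = lidar_pts.mean(axis=0)
    mu_c = camera_pts.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (lidar_pts - mu_l).T @ (camera_pts - mu_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = mu_c - R @ mu_l
    return R, T
```

At least three non-collinear feature points are needed for the rotation to be uniquely determined, which is one reason a checkerboard with many interior corners is a convenient calibration reference.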

[0027] Step a3: Within a preset time period, control the binocular camera to continuously acquire images of the calibration reference object while in motion, obtain a target image sequence, and determine the camera motion trajectory corresponding to the binocular camera based on the target image sequence.

[0028] Here, in order to determine the time deviation between the depth information acquired by the binocular camera and the depth information acquired by the lidar, it is necessary to acquire the motion trajectories of the binocular camera and the lidar within the same time period (i.e., the aforementioned preset time period); wherein, the specific length of the preset time period can be flexibly adjusted according to the actual detection requirements, and this embodiment of the application does not impose any limitation on it.

[0029] Here, taking the binocular camera and lidar mounted on a mobile robot as an example, by controlling the mobile robot to move around the aforementioned calibration reference within the aforementioned preset time period, the binocular camera and lidar can both be in motion within the same preset time period.

[0030] Specifically, during step a3, each time the binocular camera acquires an image of the calibration reference object, the corresponding timestamp can be recorded. If the binocular camera captures n images within the preset time period, the sequence of these n images, ordered by timestamp, can be used as the target image sequence. A target feature point can then be selected on the calibration reference object, and its position coordinates and corresponding timestamp can be extracted from each image in the target image sequence. This yields the movement trajectory of the target feature point across the target image sequence within the preset time period, which quantitatively represents the camera motion trajectory of the binocular camera.

[0031] Step a4: Within the same preset time period, control the lidar to continuously collect data from the calibration reference object while in motion to obtain a continuous point cloud frame sequence, and determine the lidar motion trajectory corresponding to the lidar based on the continuous point cloud frame sequence.

[0032] Specifically, during step a4, each time the lidar collects data from the calibration reference object, the corresponding timestamp can be recorded. If the lidar acquires n point cloud frames within the preset time period, a continuous point cloud frame sequence consisting of these n point cloud frames, ordered by timestamp, can be obtained. The same target feature point can then be selected on the calibration reference object, and its position coordinates and corresponding timestamp can be extracted from each point cloud frame in the continuous point cloud frame sequence. This yields the movement trajectory of the target feature point across the continuous point cloud frame sequence within the preset time period, which quantitatively represents the lidar motion trajectory of the lidar.

[0033] Step a5: Using the Levenberg-Marquardt optimization algorithm, perform time offset calibration on the camera motion trajectory and the lidar motion trajectory to determine the time offset between the binocular camera and the lidar.

[0034] Here, the Levenberg-Marquardt optimization algorithm can be used to nonlinearly minimize the difference between two motion trajectories.

[0035] Specifically, the camera motion trajectory and the lidar motion trajectory are input into the Levenberg-Marquardt optimization algorithm, which iteratively evaluates candidate values of the time offset t. For each candidate, the lidar motion trajectory is shifted in time by t, and the spatial distance error between the time-shifted lidar motion trajectory and the camera motion trajectory at the corresponding times is calculated. The value of t that minimizes this spatial distance error is taken as the time offset.
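The time offset search can be sketched as a one-parameter Levenberg-Marquardt fit. This is only an illustration under stated assumptions: both trajectories are given as timestamped 3-D positions of the same feature point, the lidar trajectory is linearly interpolated between frames, and the names `estimate_time_offset`, `cam_t`, `cam_xyz`, `lidar_t`, and `lidar_xyz` are hypothetical. The returned t follows the convention that a lidar sample stamped s corresponds to camera time s + t.

```python
import numpy as np

def estimate_time_offset(cam_t, cam_xyz, lidar_t, lidar_xyz, t0=0.0, iters=50):
    """Scalar Levenberg-Marquardt: find t minimising the squared distance
    between the camera trajectory and the time-shifted lidar trajectory."""
    cam_t = np.asarray(cam_t, dtype=float)
    cam_xyz = np.asarray(cam_xyz, dtype=float)
    lidar_t = np.asarray(lidar_t, dtype=float)
    lidar_xyz = np.asarray(lidar_xyz, dtype=float)

    def residuals(t):
        # Lidar position at the instants matching each camera frame.
        shifted = np.stack(
            [np.interp(cam_t - t, lidar_t, lidar_xyz[:, k])
             for k in range(lidar_xyz.shape[1])], axis=1)
        return (shifted - cam_xyz).ravel()

    t, lam, eps = float(t0), 1e-3, 1e-6
    r = residuals(t)
    cost = r @ r
    for _ in range(iters):
        # Central-difference Jacobian of the residual vector w.r.t. t.
        J = (residuals(t + eps) - residuals(t - eps)) / (2.0 * eps)
        # Damped Gauss-Newton step: (J'J + lam * J'J) * step = -J'r.
        step = -(J @ r) / ((J @ J) * (1.0 + lam) + 1e-12)
        r_new = residuals(t + step)
        cost_new = r_new @ r_new
        if cost_new < cost:
            t, r, cost, lam = t + step, r_new, cost_new, lam * 0.5
            if abs(step) < 1e-10:
                break
        else:
            lam *= 10.0  # reject the step and increase damping
    return t
```

In practice a library solver such as `scipy.optimize.least_squares` with `method="lm"` could replace the hand-written loop; the loop is spelled out here only to show the damping and accept/reject logic.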

[0036] It should be noted that, as an alternative to the Levenberg-Marquardt optimization algorithm, time synchronization technology based on an IMU (Inertial Measurement Unit) can also be used to complete the time offset calibration between the sensors (i.e., the binocular camera and the lidar).

[0037] Here, the binocular camera consists of two lenses. Therefore, by photographing the target scene with the spatiotemporally aligned binocular camera, two two-dimensional color images of the target scene can be obtained (denoted as the left-eye image and the right-eye image, respectively). One of the two images can be selected as the scene map (for example, the left-eye image). Then, the disparity of each pixel between the two images (that is, the difference in image position of the same scene point between the left-eye image and the right-eye image) is calculated using a disparity matching algorithm, and the depth information corresponding to each pixel is derived from the disparity, thereby generating the binocular camera depth map.
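The conversion from disparity to depth is not spelled out in the text. For a rectified stereo pair, the standard pinhole relation is depth = focal length × baseline / disparity. The sketch below assumes a disparity map is already available; the function and parameter names are hypothetical, and pixels with non-positive disparity (matching failures) are left as 0, i.e. holes.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity in pixels, for a rectified stereo pair.

    Pixels with disparity <= 0 are treated as matching failures and kept
    at depth 0 (holes).
    """
    disparity_px = np.asarray(disparity_px, dtype=float)
    depth = np.zeros_like(disparity_px)
    valid = disparity_px > 0
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth
```

Because depth is inversely proportional to disparity, a fixed one-pixel matching error costs far more accuracy at long range than at short range, which is part of why the lidar measurements are useful as a complement.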

[0038] Here, since lidar uses a light beam to measure the distance between the target and the sensor, lidar can directly measure the depth information corresponding to each location point in the target scene by emitting a light beam and calculating the round-trip time of the light beam, thus obtaining the lidar depth map mentioned above.
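The round-trip-time measurement corresponds to the usual time-of-flight relation, range = c × t / 2, where the factor 1/2 accounts for the beam travelling to the target and back. A minimal sketch (the function name is hypothetical):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def tof_range_m(round_trip_s):
    """Distance to the target from the measured round-trip time of the beam."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0
```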

[0039] It should be noted that when aligning the stereo camera and the LiDAR in time and space, the position information of the data generated by the LiDAR (i.e., the LiDAR depth map mentioned above) can be transformed from the LiDAR coordinate system to the camera coordinate system corresponding to the stereo camera, based on the spatial position mapping relationship calculated in step a2 above. The timestamp corresponding to the data generated by the LiDAR is then superimposed with the time offset mentioned above, thereby ensuring that the data generated by the LiDAR and the data generated by the stereo camera (i.e., the scene map and the stereo camera depth map mentioned above) are valid data with a basis for fusion (i.e., spatial position is aligned and time is also aligned).
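As an illustration of this alignment step, the sketch below transforms lidar points into the camera coordinate system with R and T, projects them into a sparse depth map, and shifts lidar timestamps by the time offset. The pinhole intrinsic matrix `K`, the image size, and the function names are assumptions introduced for the example; the source does not describe the projection model.

```python
import numpy as np

def lidar_to_depth_map(points_lidar, R, T, K, height, width):
    """Project lidar points into the camera image as a sparse depth map.

    Assumes a pinhole camera with intrinsic matrix K (hypothetical here).
    Pixels that receive no lidar point stay 0.
    """
    pts_cam = (np.asarray(points_lidar, dtype=float) @ np.asarray(R, dtype=float).T
               + np.asarray(T, dtype=float))
    depth = np.zeros((height, width))
    for x, y, z in pts_cam:
        if z <= 0:
            continue  # behind the camera
        u = int(round(K[0][0] * x / z + K[0][2]))
        v = int(round(K[1][1] * y / z + K[1][2]))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest point when several fall on one pixel.
            if depth[v, u] == 0 or z < depth[v, u]:
                depth[v, u] = z
    return depth

def align_timestamps(lidar_stamps, time_offset):
    """Add the calibrated time offset to each lidar timestamp."""
    return [s + time_offset for s in lidar_stamps]
```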

[0040] S102, the edge information of the scene map, the binocular camera depth map and the lidar depth map are repaired respectively to obtain the first closed edge map corresponding to the scene map, the second closed edge map corresponding to the binocular camera depth map and the third closed edge map corresponding to the lidar depth map.

[0041] Specifically, when executing step S102, the depth completion device can first use an edge detection algorithm (e.g., the Canny edge detection algorithm) to extract edges from the scene map, the stereo camera depth map, and the lidar depth map respectively, to obtain a first initial edge map corresponding to the scene map, a second initial edge map corresponding to the stereo camera depth map, and a third initial edge map corresponding to the lidar depth map (which is also equivalent to extracting the edge features corresponding to the scene map, the stereo camera depth map, and the lidar depth map respectively).

[0042] Then, the depth completion device can use the edge repair function to repair the sparse edges existing in the first initial edge map, the second initial edge map and the third initial edge map respectively, to obtain the first closed edge map, the second closed edge map and the third closed edge map.
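The text does not define the edge repair function. One plausible stand-in, shown here purely as an assumption, is morphological closing (dilation followed by erosion) of a binary edge map with a 3×3 window, which bridges small gaps in edges while leaving intact segments unchanged. The input is assumed to be a binary edge map that has already been extracted (for example by Canny); all function names are hypothetical.

```python
import numpy as np

def _neighbourhood_stack(img, pad_value):
    """Stack the nine 3x3-shifted copies of a 2-D boolean image."""
    padded = np.pad(img, 1, constant_values=pad_value)
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def dilate(edges):
    return _neighbourhood_stack(np.asarray(edges, dtype=bool), False).any(axis=0)

def erode(edges):
    # Padding with True keeps the image border from being eroded away.
    return _neighbourhood_stack(np.asarray(edges, dtype=bool), True).all(axis=0)

def close_edges(edges):
    """Morphological closing: fills gaps of roughly one to two pixels."""
    return erode(dilate(edges))
```

Larger gaps would need a larger window or a contour-linking method; the 3×3 choice is only for illustration.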

[0043] S103: Based on the first closed edge map and the second closed edge map, correct the abnormal depth values in the binocular camera depth map to obtain the depth repair map corresponding to the binocular camera depth map.

[0044] Here, abnormal depth values refer to the depth values located in the hole regions and noise regions of the binocular camera depth map. The correction of abnormal depth values can be divided into two parts: detecting abnormal depth values in the binocular camera depth map, and adjusting the detected abnormal depth values.

[0045] Specifically, as an optional embodiment, step S103 can be performed according to steps b1-b3 below:

Step b1: Using the first closed edge map as the detection basis, identify the depth values in the binocular camera depth map that do not conform to the detection basis as abnormal depth values.

[0046] Here, since the first closed edge map is derived from the scene map (RGB image), the first closed edge map can provide the complete outline boundary of the real objects in the target scene.

[0047] Specifically, in an ideal situation, the depth value inside the same object should change continuously, while the depth value between different objects or between an object and the background will change abruptly. Therefore, step b1 takes advantage of this characteristic and uses the first closed edge map as the detection basis to find depth values that do not conform to the above rules from the stereo camera depth map as the abnormal depth values to be detected.

[0048] For example, if a hole (depth value of 0) or a drastic change in depth value (difference from surrounding pixels exceeding a preset threshold) appears at the corresponding location in the depth map of a certain object region indicated by the first closed edge map, the depth values of these pixels can be determined as abnormal depth values.

[0049] Step b2: Based on the position of the abnormal depth value in the stereo camera depth map, generate a mask region corresponding to the abnormal depth value in the stereo camera depth map.

[0050] Specifically, by recording all the pixel positions that were determined to be abnormal depth values in step b1, the original stereo camera depth map can be converted into a binary mask image. In this binary mask image, pixels marked as "1" (or white) indicate that there are abnormal depth values at that position, which need to be repaired later; pixels marked as "0" (or black) indicate that the depth value at that position is normal and remains unchanged.

[0051] It should be noted that, at this time, in the above binary mask image, the area formed by the pixels marked as 1 is the mask area corresponding to the abnormal depth value in the stereo camera depth map (which can be used to indicate the specific location in the stereo camera depth map where depth completion is required).
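Steps b1 and b2 can be sketched as follows. The comparison rule is an assumption: a non-edge pixel is flagged when it is a hole (depth 0) or when it differs from the median of its valid, non-edge 3×3 neighbours by more than a threshold. The text only says "difference from surrounding pixels exceeding a preset threshold", so the median and the window size are illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def abnormal_depth_mask(depth, closed_edges, jump_threshold):
    """Binary mask (1 = abnormal) of holes and in-object depth outliers."""
    depth = np.asarray(depth, dtype=float)
    edges = np.asarray(closed_edges, dtype=bool)
    h, w = depth.shape
    mask = depth == 0  # holes are always abnormal
    for y in range(h):
        for x in range(w):
            if mask[y, x] or edges[y, x]:
                continue  # depth jumps are legitimate on object edges
            y0, y1 = max(0, y - 1), min(h, y + 2)
            x0, x1 = max(0, x - 1), min(w, x + 2)
            window = depth[y0:y1, x0:x1]
            keep = (window > 0) & ~edges[y0:y1, x0:x1]
            keep[y - y0, x - x0] = False  # exclude the centre pixel itself
            if keep.any() and abs(depth[y, x] - np.median(window[keep])) > jump_threshold:
                mask[y, x] = True
    return mask.astype(np.uint8)
```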

[0052] Step b3: Based on the second closed edge map, complete the depth values in the mask area to obtain the depth repair map.

[0053] Here, since the second closed edge map is derived from the edge extraction and repair of the binocular camera depth map itself, the second closed edge map can reflect the locations of depth discontinuity in the binocular camera depth map (i.e., object boundaries). Step b3 uses the second closed edge map as guiding information for the completion process, so that the depth values in the mask area can be completed.

[0054] Specifically, during the completion process, the valid depth values at the boundary of the mask area are used as a reference. Interpolation is performed inwards along directions parallel to the edges in the second closed edge map, ensuring that the completed depth values transition smoothly within the object (i.e., the real object in the target scene corresponding to the binocular camera depth map) and remain sharp at the object boundary. At the same time, where sparse lidar points fall within the mask area, their depth values are preferentially used as anchor points, which helps improve completion accuracy.
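A much simplified version of this edge-guided completion is sketched below. Instead of interpolating along edge-parallel directions, it repeatedly fills each masked pixel with the mean of its already-known 4-neighbours and never reads values from edge pixels, so depth does not leak across an object boundary. Lidar depths inside the mask, when supplied, are written first as anchors. The scheme and the function name are assumptions made for illustration, not the procedure of the source.

```python
import numpy as np

def fill_masked_depth(depth, mask, closed_edges, lidar_depth=None, max_iters=100):
    """Edge-aware diffusion fill of the masked pixels of a depth map."""
    out = np.asarray(depth, dtype=float).copy()
    todo = np.asarray(mask, dtype=bool).copy()
    edges = np.asarray(closed_edges, dtype=bool)
    out[todo] = 0.0
    if lidar_depth is not None:
        lidar = np.asarray(lidar_depth, dtype=float)
        anchors = todo & (lidar > 0)
        out[anchors] = lidar[anchors]  # lidar values take priority
        todo &= ~anchors
    h, w = out.shape
    for _ in range(max_iters):
        if not todo.any():
            break
        filled = []
        for y, x in zip(*np.nonzero(todo)):
            vals = [out[ny, nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w
                    and not todo[ny, nx] and not edges[ny, nx] and out[ny, nx] > 0]
            if vals:
                filled.append((y, x, sum(vals) / len(vals)))
        if not filled:
            break  # remaining pixels have no usable neighbours
        for y, x, v in filled:
            out[y, x] = v
            todo[y, x] = False
    return out
```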

[0055] S104, Generate a fused depth map corresponding to the target scene based on the third closed edge map and the depth repair map.

[0056] Here, when performing step S104, the depth value of the edge region in the depth repair map can be corrected according to the third closed edge map to obtain the fused depth map.

[0057] Specifically, as an optional embodiment, the depth value of the corresponding position in the lidar depth map can be extracted at the edge pixel position indicated by the third closed edge map, the depth value of the corresponding position in the depth repair map can be replaced with the depth value, and the replaced edge can be smoothed to obtain the fused depth map.

[0058] Specifically, as another optional embodiment, for the edge area covered by the third closed edge map, sparse depth points of the lidar depth map in that area can be obtained, and weighted fusion can be performed with the lidar depth value as the high weight and the original depth value of the depth repair map as the low weight to generate a fused depth map with improved edge accuracy.
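The weighted fusion variant can be sketched in a few lines. The weight value of 0.8 is an arbitrary placeholder (the text only says the lidar value gets the higher weight), and the function name is hypothetical. Setting `lidar_weight=1.0` reproduces the direct-replacement variant described just before, without its smoothing step.

```python
import numpy as np

def fuse_edge_depth(repaired, lidar_depth, closed_edges, lidar_weight=0.8):
    """Blend lidar depth into the repaired depth map at edge pixels only."""
    repaired = np.asarray(repaired, dtype=float)
    lidar_depth = np.asarray(lidar_depth, dtype=float)
    # Fuse only where an edge pixel actually has a lidar measurement.
    use = np.asarray(closed_edges, dtype=bool) & (lidar_depth > 0)
    fused = repaired.copy()
    fused[use] = lidar_weight * lidar_depth[use] + (1.0 - lidar_weight) * repaired[use]
    return fused
```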

[0059] In this embodiment of the application, regarding the depth repair method corresponding to step S103 above, as another optional embodiment, in addition to step S103 above, the depth completion device can also input the scene map, the binocular camera depth map and the lidar depth map into a pre-trained depth generation model, and correct the abnormal depth values in the binocular camera depth map through the depth generation model to output the depth repair map; wherein, the depth generation model can be a GAN (Generative Adversarial Network) model.

[0060] In this application embodiment, as another optional embodiment, a depth consistency constraint can also be constructed using a graph optimization or random field model, following steps c1-c5, to replace the edge correction strategy in step S103 above, thereby obtaining a high-precision depth repair map. Specifically:

Step c1: Construct an energy function based on the binocular camera depth map, the lidar depth map, the first closed edge map, and the second closed edge map.

[0061] Here, the energy function is a mathematical expression used to quantify the "quality" of a depth map; the smaller the value of the energy function, the more the depth map meets the user's expectations.

[0062] Specifically, in step c1, the depth repair map to be solved is treated as an unknown variable, and an energy function containing multiple constraint terms is constructed. These constraint terms correspond to the data terms, smoothing terms, and prior terms that will be described in subsequent steps. Through the specific definition of these constraint terms in subsequent steps, the energy function can comprehensively reflect the consistency between the depth repair map and the true value of the lidar, the similarity with the reliable area of the binocular depth map, and the smoothness requirements of the depth map itself.
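The text does not give the energy function in closed form. One common way to write an energy with these three kinds of terms, offered here only as an assumed quadratic form, is shown below. All symbols are introduced for illustration: $D$ is the unknown depth repair map, $L$ the lidar depth map, $B$ the binocular camera depth map, $\Omega_L$ the set of pixels with a valid lidar depth, $\Omega_B$ the set of reliable pixels of the binocular camera depth map, $\mathcal{N}$ the set of adjacent pixel pairs, $w_{pq}$ an edge-dependent weight, and $\lambda_d, \lambda_s, \lambda_p$ the relative weights of the terms.

```latex
E(D) \;=\;
\underbrace{\lambda_d \sum_{p \in \Omega_L} \bigl(D_p - L_p\bigr)^2}_{\text{data term}}
\;+\;
\underbrace{\lambda_s \sum_{(p,q) \in \mathcal{N}} w_{pq}\,\bigl(D_p - D_q\bigr)^2}_{\text{smoothing term}}
\;+\;
\underbrace{\lambda_p \sum_{p \in \Omega_B} \bigl(D_p - B_p\bigr)^2}_{\text{prior term}}
```

Under this form, a smaller $E(D)$ means the candidate depth map agrees better with the lidar measurements, varies less inside objects, and stays closer to the reliable binocular depths.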

[0063] Step c2: Determine the smoothing term constraint conditions in the energy function based on the first closed edge diagram and the second closed edge diagram.

[0064] Here, the smoothing constraint condition is used to allow the depth value to jump at the edge position indicated by the first closed edge map or the second closed edge map, and to constrain the smooth change of the depth value in the non-edge region.

[0065] Specifically, the smoothing constraint is used to constrain the difference in depth values between adjacent pixels. Its core idea is that inside an object the depth value should change smoothly (small difference between adjacent pixels), whereas at the edge of an object the depth value can change abruptly (large difference between adjacent pixels).

[0066] In step c2, the first and second closed edge maps are used to identify which locations are object edges. In the smoothing term, adjacent pixel pairs that cross an edge line are assigned a very small weight (or even zero), making them almost unconstrained by smoothing and thus allowing depth jumps. Adjacent pixel pairs in non-edge regions are assigned a larger weight, forcing their depth values to remain smooth. In this way, the smoothing term adaptively smooths inside objects while preserving discontinuities at edges.
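The weighting rule of step c2 can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `smoothness_weights`, the 0/1 list-of-lists edge map format, the 4-neighbour pairing, and the default weights `w_inside` and `w_edge` are all assumptions, and a pair is treated as crossing an edge if either of its pixels is marked in either closed edge map.

```python
def smoothness_weights(edge_map_1, edge_map_2, w_inside=1.0, w_edge=0.0):
    """Assign a smoothing weight to every horizontally or vertically adjacent pixel pair.

    edge_map_1, edge_map_2: 2D lists of 0/1 (1 = edge pixel), same shape.
    Returns a dict mapping ((y, x), (ny, nx)) to the weight of that pair.
    """
    h, w = len(edge_map_1), len(edge_map_1[0])
    # A pixel counts as an edge if either closed edge map marks it.
    is_edge = [[bool(edge_map_1[y][x] or edge_map_2[y][x]) for x in range(w)]
               for y in range(h)]
    weights = {}
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbours
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    on_edge = is_edge[y][x] or is_edge[ny][nx]
                    # Small (here zero) weight at edges permits a depth jump.
                    weights[((y, x), (ny, nx))] = w_edge if on_edge else w_inside
    return weights


weights = smoothness_weights([[0, 0, 1, 0]], [[0, 0, 0, 0]])
```

In this example the pair between columns 0 and 1 keeps the full weight, while both pairs touching the edge pixel at column 2 get zero weight.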

[0067] Step c3: Determine the data term constraint in the energy function based on the lidar depth map.

[0068] Here, the data term constraint is used to ensure that the optimized depth value is consistent with the sparse valid depth values in the lidar depth map.

[0069] Specifically, the data term constraint is used to ensure that the resulting depth repair map faithfully reflects the high-precision sparse depth values provided by the lidar. Although the lidar depth map is sparse, the depth value of each valid point is accurate and reliable, and can serve as an "anchor point" for depth repair.

[0070] In step c3, these valid lidar points are used as strong constraints: the energy function requires the optimized depth values to be as close as possible to the original lidar measurements at these pixel locations. If a pixel location has no lidar point, this constraint has no effect there.

[0071] Step c4: Determine the prior constraints in the energy function based on the depth map of the binocular camera.

[0072] Here, the prior constraint is used to ensure that the optimized depth value is close to the depth values of the reliable regions in the binocular camera depth map.

[0073] Specifically, the prior constraint is used to preserve useful information from the binocular camera depth map. Although the binocular camera depth map may contain holes and noise, its depth values still have reference value in areas with rich texture and reliable matching.

[0074] In step c4, reliable regions in the binocular camera depth map can first be identified (for example, by using a confidence map, or by checking whether the deviation from the lidar points is less than a certain threshold). The energy function then requires the optimized depth value to be as close as possible to the original value of the binocular camera depth map in these reliable regions. This preserves the texture details of the binocular camera depth map while preventing unreliable regions from interfering with the solution.
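A reliable-region test of the kind mentioned in step c4 might look like the sketch below. It combines both examples from the text (a confidence map and a deviation check against lidar points); the function name `reliable_mask`, the convention that a depth of 0 means "no measurement", and the thresholds `min_conf` and `max_dev` are assumptions made for illustration only.

```python
def reliable_mask(stereo, lidar, confidence, min_conf=0.5, max_dev=0.5):
    """Return a 2D list of booleans marking reliable binocular depth pixels.

    stereo, lidar: 2D lists of depth values, 0 meaning "no measurement".
    confidence:    2D list of matching confidences in [0, 1].
    """
    mask = []
    for s_row, l_row, c_row in zip(stereo, lidar, confidence):
        row = []
        for s, l, c in zip(s_row, l_row, c_row):
            ok = s > 0 and c >= min_conf
            # Where a lidar sample exists, also require agreement with it.
            if ok and l > 0:
                ok = abs(s - l) <= max_dev
            row.append(ok)
        mask.append(row)
    return mask


mask = reliable_mask(
    stereo=[[2.0, 0.0, 5.0, 3.0, 6.0]],
    lidar=[[2.1, 0.0, 0.0, 4.0, 0.0]],
    confidence=[[0.9, 0.9, 0.3, 0.9, 0.8]],
)
# mask == [[True, False, False, False, True]]
```

Only pixels that pass this test would contribute to the prior term, so holes, low-confidence matches, and values contradicted by lidar do not pull the solution.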

[0075] Step c5: Under the smoothing term constraint, the data term constraint, and the prior term constraint, obtain the depth repair map corresponding to the binocular camera depth map by minimizing the energy function.

[0076] Specifically, after the three constraints above are defined, the energy function is a quadratic function of the unknown depth values (each term being a squared error). The problem of solving for the depth repair map is thus transformed into finding the minimizer of this quadratic function. Since the minimizer of a quadratic function can be obtained by solving a large sparse system of linear equations, step c5 uses a numerical method (such as the conjugate gradient method or Gauss-Seidel iteration) to solve this system. The solution gives the optimal depth value at each pixel location, and arranging these values in image format yields the complete depth repair map. This depth repair map maintains high accuracy at the valid lidar points, preserves texture details in the reliable regions of the binocular camera depth map, stays smooth inside objects, and keeps clear discontinuities at edges.
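The minimization in step c5 can be illustrated with Gauss-Seidel iteration on a single scanline. This is a sketch under stated assumptions rather than the application's solver: the function name `solve_scanline`, the 1-D simplification, the per-pair weight list `pair_weight` (entry `i` couples pixels `i` and `i + 1`), and the weights `lam_data`, `lam_prior`, and `lam_smooth` are all hypothetical. Each update sets one pixel to the value that minimizes the quadratic energy with its neighbours held fixed.

```python
def solve_scanline(lidar, lidar_valid, stereo, stereo_reliable, pair_weight,
                   lam_data=100.0, lam_prior=1.0, lam_smooth=1.0, iters=500):
    """Minimize a quadratic data + prior + smoothing energy on one scanline."""
    n = len(lidar)
    # Start from the reliable binocular values, zero elsewhere.
    depth = [stereo[i] if stereo_reliable[i] else 0.0 for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            num, den = 0.0, 0.0
            if lidar_valid[i]:            # data term: stay close to lidar
                num += lam_data * lidar[i]
                den += lam_data
            if stereo_reliable[i]:        # prior term: stay close to stereo
                num += lam_prior * stereo[i]
                den += lam_prior
            if i > 0:                     # smoothing with the left neighbour
                w = lam_smooth * pair_weight[i - 1]
                num += w * depth[i - 1]
                den += w
            if i < n - 1:                 # smoothing with the right neighbour
                w = lam_smooth * pair_weight[i]
                num += w * depth[i + 1]
                den += w
            if den > 0:                   # unconstrained pixels keep their value
                depth[i] = num / den
    return depth


# Two surfaces at 2 m and 10 m, one lidar anchor on each, no reliable stereo,
# and a zero smoothing weight across the edge between pixels 2 and 3.
result = solve_scanline(
    lidar=[2.0, 0, 0, 0, 0, 10.0],
    lidar_valid=[True, False, False, False, False, True],
    stereo=[0.0] * 6,
    stereo_reliable=[False] * 6,
    pair_weight=[1, 1, 0, 1, 1],
)
```

In this toy case each lidar anchor fills its own side of the edge, and the zero weight between pixels 2 and 3 keeps the two depths from blending.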

[0077] With the depth completion method based on the fusion of binocular camera and lidar provided in the embodiments of this application, the depth maps obtained by the binocular camera and the lidar can be effectively fused by combining their respective measurement advantages, yielding a fused depth map with more complete depth information and higher data accuracy, and thereby solving the technical problems of sparse, incomplete, and insufficient depth information in the prior art.

[0078] Based on the same inventive concept, this application also provides a depth completion device corresponding to the above-mentioned depth completion method. Since the principle of the depth completion device in the embodiments of this application is similar to that of the above-mentioned depth completion method in the embodiments of this application, the implementation of the depth completion device can refer to the implementation of the above-mentioned depth completion method, and the repeated parts will not be described again.

[0079] Referring to Figure 2, Figure 2 shows a schematic structural diagram of a depth completion device based on the fusion of a binocular camera and a lidar, according to an embodiment of this application. The depth completion device includes: a spatiotemporal alignment module 201, used to perform spatiotemporal alignment of the binocular camera and the lidar installed in the target scene, to generate a scene map and a binocular camera depth map corresponding to the target scene based on the spatiotemporally aligned binocular camera, and to generate a lidar depth map corresponding to the target scene based on the spatiotemporally aligned lidar; an edge repair module 202, used to repair the edge information of the scene map, the binocular camera depth map, and the lidar depth map respectively, to obtain a first closed edge map corresponding to the scene map, a second closed edge map corresponding to the binocular camera depth map, and a third closed edge map corresponding to the lidar depth map; a depth repair module 203, used to correct the abnormal depth values in the binocular camera depth map according to the first closed edge map and the second closed edge map, to obtain the depth repair map corresponding to the binocular camera depth map; and a depth fusion module 204, used to generate a fused depth map corresponding to the target scene based on the third closed edge map and the depth repair map.

[0080] In an optional implementation, the target scene includes a calibration reference object, and when spatiotemporally aligning the binocular camera and the lidar installed in the target scene, the spatiotemporal alignment module 201 is used to: collect data from the calibration reference object using the binocular camera and the lidar respectively, to obtain a target image and lidar point cloud data corresponding to the calibration reference object; determine the spatial position mapping relationship between the camera coordinate system of the binocular camera and the lidar coordinate system of the lidar, based on the first position coordinates of a feature point on the calibration reference object in the target image and the second position coordinates of that feature point in the lidar point cloud data; within a preset time period, control the binocular camera to continuously acquire images of the calibration reference object while in motion, obtaining a target image sequence, and determine the camera motion trajectory of the binocular camera based on the target image sequence; within the same preset time period, control the lidar to continuously collect data from the calibration reference object while in motion, obtaining a continuous point cloud frame sequence, and determine the lidar motion trajectory of the lidar based on the continuous point cloud frame sequence; and perform time offset calibration on the camera motion trajectory and the lidar motion trajectory using the Levenberg-Marquardt optimization algorithm, to determine the time offset between the binocular camera and the lidar.
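The time offset calibration step can be sketched as a one-parameter Levenberg-Marquardt fit. This is an illustrative simplification, not the application's procedure: each trajectory is reduced to a single scalar coordinate sampled over time, the residual compares the camera sample at time t with the linearly interpolated lidar sample at time t + tau, the Jacobian is taken by finite differences, and the names `interp`, `residuals`, and `estimate_time_offset` as well as the damping schedule are assumptions.

```python
import math
from bisect import bisect_right


def interp(times, values, t):
    """Linear interpolation of a sampled signal, clamped to the sampled range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    k = bisect_right(times, t) - 1
    a = (t - times[k]) / (times[k + 1] - times[k])
    return (1 - a) * values[k] + a * values[k + 1]


def residuals(tau, cam_t, cam_x, lid_t, lid_x):
    """Mismatch between camera samples and lidar samples shifted by tau."""
    return [interp(lid_t, lid_x, t + tau) - x for t, x in zip(cam_t, cam_x)]


def estimate_time_offset(cam_t, cam_x, lid_t, lid_x, tau=0.0, mu=1e-3, iters=50):
    """Scalar Levenberg-Marquardt: damped Gauss-Newton steps on tau."""
    h = 1e-3  # finite-difference step for the Jacobian
    r = residuals(tau, cam_t, cam_x, lid_t, lid_x)
    cost = sum(v * v for v in r)
    for _ in range(iters):
        r_plus = residuals(tau + h, cam_t, cam_x, lid_t, lid_x)
        r_minus = residuals(tau - h, cam_t, cam_x, lid_t, lid_x)
        jac = [(p - m) / (2 * h) for p, m in zip(r_plus, r_minus)]
        jtj = sum(j * j for j in jac)
        jtr = sum(j * v for j, v in zip(jac, r))
        step = -jtr / (jtj + mu)          # damped normal equation
        new_r = residuals(tau + step, cam_t, cam_x, lid_t, lid_x)
        new_cost = sum(v * v for v in new_r)
        if new_cost < cost:               # accept the step, relax the damping
            tau, r, cost = tau + step, new_r, new_cost
            mu *= 0.5
        else:                             # reject the step, increase the damping
            mu *= 10.0
        if abs(step) < 1e-9:
            break
    return tau


# Synthetic check: the lidar clock runs 0.3 s ahead of the camera clock.
cam_t = [1.0 + 0.1 * k for k in range(81)]
cam_x = [math.sin(t) for t in cam_t]
lid_t = [0.1 * k for k in range(111)]
lid_x = [math.sin(t - 0.3) for t in lid_t]
tau_hat = estimate_time_offset(cam_t, cam_x, lid_t, lid_x)
```

On this synthetic data the estimate lands close to the injected 0.3 s offset. A real calibration would compare full 6-DoF trajectories and would likely use an established solver rather than this hand-written loop.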

[0081] In an optional implementation, when repairing the edge information in the scene map, the binocular camera depth map, and the lidar depth map, the edge repair module 202 is used to: extract edges from the scene map, the binocular camera depth map, and the lidar depth map using an edge detection algorithm, to obtain a first initial edge map corresponding to the scene map, a second initial edge map corresponding to the binocular camera depth map, and a third initial edge map corresponding to the lidar depth map; and repair the sparse edges in the first initial edge map, the second initial edge map, and the third initial edge map using the edge repair function, to obtain the first closed edge map, the second closed edge map, and the third closed edge map.

[0082] In an optional implementation, when correcting abnormal depth values in the binocular camera depth map based on the first closed edge map and the second closed edge map, the depth repair module 203 is used to: based on the first closed edge map, determine that depth values in the binocular camera depth map that do not match the detection criteria are the abnormal depth values; based on the positions of the abnormal depth values in the binocular camera depth map, generate a mask region corresponding to the abnormal depth values in the binocular camera depth map; and based on the second closed edge map, complete the depth values within the mask region to obtain the depth repair map.
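The mask-then-complete flow can be illustrated on one scanline. The completion rule is not specified in this passage, so a nearest-valid-neighbour fill that never crosses a closed edge is swapped in here as a stand-in; the function name `repair_scanline`, the boolean list `abnormal` (the mask region), and the list `boundary_after` (entry `i` is true when a closed edge separates pixels `i` and `i + 1`) are assumptions.

```python
def repair_scanline(depth, abnormal, boundary_after):
    """Fill masked pixels from the nearest valid pixel on the same side of any edge."""
    n = len(depth)

    def sweep(order, crosses_boundary):
        nearest = [None] * n
        last = None  # (value, distance) of the nearest valid pixel seen so far
        for i in order:
            if not abnormal[i]:
                last = (depth[i], 0)
            elif last is not None:
                last = (last[0], last[1] + 1)
            nearest[i] = last
            if crosses_boundary(i):
                last = None  # do not carry depth across a closed edge
        return nearest

    left = sweep(range(n), lambda i: i < n - 1 and boundary_after[i])
    right = sweep(range(n - 1, -1, -1), lambda i: i > 0 and boundary_after[i - 1])

    repaired = list(depth)
    for i in range(n):
        if abnormal[i]:
            candidates = [c for c in (left[i], right[i]) if c is not None]
            if candidates:
                # Closest valid pixel wins; the left one wins a tie.
                repaired[i] = min(candidates, key=lambda c: c[1])[0]
    return repaired


repaired = repair_scanline(
    depth=[2.0, 0.0, 2.2, 9.0, 0.0, 0.0],
    abnormal=[False, True, False, False, True, True],
    boundary_after=[False, False, True, False, False],
)
# repaired == [2.0, 2.0, 2.2, 9.0, 9.0, 9.0]
```

Because the sweep resets at the boundary between pixels 2 and 3, the holes on the far surface are filled with 9.0 and never inherit the near-surface depth.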

[0083] In an optional implementation, when generating the fused depth map corresponding to the target scene based on the third closed edge map and the depth repair map, the depth fusion module 204 is used to: based on the third closed edge map, correct the depth values of the edge regions in the depth repair map to obtain the fused depth map.

[0084] In an optional implementation, the depth completion device further includes a first alternative repair device, wherein the first alternative repair device is used to: input the scene map, the binocular camera depth map, and the lidar depth map into a pre-trained depth generation model, which corrects the abnormal depth values in the binocular camera depth map and outputs the depth repair map.

[0085] In an optional implementation, the depth completion device further includes a second alternative repair device, wherein the second alternative repair device is used to: construct an energy function based on the binocular camera depth map, the lidar depth map, the first closed edge map, and the second closed edge map; determine the smoothing term constraint in the energy function based on the first closed edge map and the second closed edge map, wherein the smoothing term constraint allows the depth value to jump at the edge positions indicated by the first closed edge map or the second closed edge map and constrains the depth value to change smoothly in non-edge regions; determine the data term constraint in the energy function based on the lidar depth map, wherein the data term constraint ensures that the optimized depth value is consistent with the sparse valid depth values in the lidar depth map; determine the prior term constraint in the energy function based on the binocular camera depth map, wherein the prior term constraint ensures that the optimized depth value is close to the depth values of the reliable regions in the binocular camera depth map; and, under the smoothing term constraint, the data term constraint, and the prior term constraint, obtain the depth repair map corresponding to the binocular camera depth map by minimizing the energy function.

[0086] As shown in Figure 3, this application embodiment also provides an electronic device 300 for executing the depth completion method in this application (the electronic device 300 is also equivalent to the aforementioned depth completion device). The electronic device includes a memory 301, a processor 302, and a computer program stored in the memory 301 and executable on the processor 302. The memory 301 and the processor 302 are connected via a bus for communication. When the processor 302 executes the computer program, it implements the steps of the aforementioned depth completion method.

[0087] Specifically, the memory 301 and processor 302 mentioned above can be a general-purpose memory and processor, without any specific limitation. When the processor 302 runs the computer program stored in the memory 301, it can execute the depth completion method mentioned above.

[0088] Corresponding to the depth completion method in this application, this application embodiment also provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the above-described depth completion method.

[0089] Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or hard disk. When the computer program on the storage medium is run, it can execute the aforementioned depth completion method.

[0090] In the embodiments provided in this application, it should be understood that the disclosed systems and methods can be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, systems, or units, and may be electrical, mechanical, or of another form.

[0091] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0092] In addition, the functional units in the embodiments provided in this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0093] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0094] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. In addition, the terms "first", "second", "third", etc. are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0095] Finally, it should be noted that the above-described embodiments are merely specific implementations of this application, used to illustrate its technical solutions and not to limit them, and the protection scope of this application is not limited thereto. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in this application, anyone familiar with the technical field can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and all of them should be covered by the protection scope of this application. Therefore, the protection scope of this application should be determined by the protection scope of the claims.

Claims

1. A depth completion method based on the fusion of binocular camera and lidar, characterized in that the depth completion method includes: spatiotemporally aligning the binocular camera and the lidar installed in the target scene, generating a scene map and a binocular camera depth map corresponding to the target scene based on the spatiotemporally aligned binocular camera, and generating a lidar depth map corresponding to the target scene based on the spatiotemporally aligned lidar; repairing the edge information of the scene map, the binocular camera depth map, and the lidar depth map respectively, to obtain a first closed edge map corresponding to the scene map, a second closed edge map corresponding to the binocular camera depth map, and a third closed edge map corresponding to the lidar depth map; correcting the abnormal depth values in the binocular camera depth map based on the first closed edge map and the second closed edge map, to obtain the depth repair map corresponding to the binocular camera depth map; and generating a fused depth map corresponding to the target scene based on the third closed edge map and the depth repair map.

2. The depth completion method according to claim 1, characterized in that the target scene includes a calibration reference object, wherein the spatiotemporal alignment of the binocular camera and the lidar installed in the target scene includes: collecting data from the calibration reference object using the binocular camera and the lidar respectively, to obtain a target image and lidar point cloud data corresponding to the calibration reference object; determining the spatial position mapping relationship between the camera coordinate system of the binocular camera and the lidar coordinate system of the lidar, based on the first position coordinates of a feature point on the calibration reference object in the target image and the second position coordinates of that feature point in the lidar point cloud data; within a preset time period, controlling the binocular camera to continuously acquire images of the calibration reference object while in motion, obtaining a target image sequence, and determining the camera motion trajectory of the binocular camera based on the target image sequence; within the same preset time period, controlling the lidar to continuously collect data from the calibration reference object while in motion, obtaining a continuous point cloud frame sequence, and determining the lidar motion trajectory of the lidar based on the continuous point cloud frame sequence; and performing time offset calibration on the camera motion trajectory and the lidar motion trajectory using the Levenberg-Marquardt optimization algorithm, to determine the time offset between the binocular camera and the lidar.

3. The depth completion method according to claim 1, characterized in that the step of repairing the edge information in the scene map, the binocular camera depth map, and the lidar depth map includes: extracting edges from the scene map, the binocular camera depth map, and the lidar depth map using an edge detection algorithm, to obtain a first initial edge map corresponding to the scene map, a second initial edge map corresponding to the binocular camera depth map, and a third initial edge map corresponding to the lidar depth map; and repairing the sparse edges in the first initial edge map, the second initial edge map, and the third initial edge map using the edge repair function, to obtain the first closed edge map, the second closed edge map, and the third closed edge map.

4. The depth completion method according to claim 1, characterized in that the step of correcting abnormal depth values in the binocular camera depth map based on the first closed edge map and the second closed edge map includes: based on the first closed edge map, determining that depth values in the binocular camera depth map that do not match the detection criteria are the abnormal depth values; based on the positions of the abnormal depth values in the binocular camera depth map, generating a mask region corresponding to the abnormal depth values in the binocular camera depth map; and based on the second closed edge map, completing the depth values within the mask region to obtain the depth repair map.

5. The depth completion method according to claim 1, characterized in that the step of generating a fused depth map corresponding to the target scene based on the third closed edge map and the depth repair map includes: based on the third closed edge map, correcting the depth values of the edge regions in the depth repair map to obtain the fused depth map.

6. The depth completion method according to claim 1, characterized in that the depth completion method also includes: inputting the scene map, the binocular camera depth map, and the lidar depth map into a pre-trained depth generation model, which corrects the abnormal depth values in the binocular camera depth map and outputs the depth repair map.

7. The depth completion method according to claim 1, characterized in that the depth completion method also includes: constructing an energy function based on the binocular camera depth map, the lidar depth map, the first closed edge map, and the second closed edge map; determining the smoothing term constraint in the energy function based on the first closed edge map and the second closed edge map, wherein the smoothing term constraint allows the depth value to jump at the edge positions indicated by the first closed edge map or the second closed edge map and constrains the depth value to change smoothly in non-edge regions; determining the data term constraint in the energy function based on the lidar depth map, wherein the data term constraint ensures that the optimized depth value is consistent with the sparse valid depth values in the lidar depth map; determining the prior term constraint in the energy function based on the binocular camera depth map, wherein the prior term constraint ensures that the optimized depth value is close to the depth values of the reliable regions in the binocular camera depth map; and, under the smoothing term constraint, the data term constraint, and the prior term constraint, obtaining the depth repair map corresponding to the binocular camera depth map by minimizing the energy function.

8. A depth completion device based on the fusion of binocular camera and lidar, characterized in that the depth completion device includes: a spatiotemporal alignment module, used to perform spatiotemporal alignment of the binocular camera and the lidar installed in the target scene, to generate a scene map and a binocular camera depth map corresponding to the target scene based on the spatiotemporally aligned binocular camera, and to generate a lidar depth map corresponding to the target scene based on the spatiotemporally aligned lidar; an edge repair module, used to repair the edge information of the scene map, the binocular camera depth map, and the lidar depth map respectively, to obtain a first closed edge map corresponding to the scene map, a second closed edge map corresponding to the binocular camera depth map, and a third closed edge map corresponding to the lidar depth map; a depth repair module, used to correct abnormal depth values in the binocular camera depth map based on the first closed edge map and the second closed edge map, to obtain the depth repair map corresponding to the binocular camera depth map; and a depth fusion module, used to generate a fused depth map corresponding to the target scene based on the third closed edge map and the depth repair map.

9. An electronic device, characterized in that it includes a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory via the bus; and when the machine-readable instructions are executed by the processor, the steps of the depth completion method as described in any one of claims 1 to 7 are performed.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the depth completion method as described in any one of claims 1 to 7.