A point cloud and image-based target fusion method and device

By using image tracking algorithms to determine and fuse the 3D bounding boxes of LiDAR point clouds during image and point cloud fusion, the problems of target omission and fragmentation are solved, and a more stable target detection effect is achieved.

CN115994877B · Active · Publication Date: 2026-06-30 · ZHEJIANG GEELY HLDG GRP CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG GEELY HLDG GRP CO LTD
Filing Date
2023-01-13
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, image and point cloud fusion methods suffer from problems such as missed target detection and target fragmentation, leading to unstable target detection results.

Method used

By acquiring the 3D bounding boxes of the LiDAR point cloud and the 2D bounding boxes of the image, the image tracking algorithm is used to determine whether multiple 3D bounding boxes are multiple targets split from the same target, and the maximum bounding rectangle is used to fuse them into a single target. The target fusion is performed by combining point cloud clustering algorithm and image detection algorithm.

Benefits of technology

It improves the stability of target detection, solves the problems of missed target detection and target fragmentation, and makes the target detection results more accurate and consistent.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

Embodiments of the present application provide a target fusion method and device based on point cloud and image, the method comprising: obtaining a first sampling time, a first 3D box and a first 2D box of a LiDAR point cloud; obtaining a second 2D box of an image and a third 2D box obtained by tracking the second 2D box; determining, according to the third 2D box, whether a plurality of first 3D boxes are a plurality of targets split from a same target; and when the plurality of first 3D boxes are a plurality of targets split from a same target, fusing the plurality of targets into one target by using a maximum circumscribed rectangle. Through the scheme of the embodiments, the problems of missed target detection and target splitting are solved, and the target detection result is more stable.

Description

Technical Field

[0001] This application relates to image processing technology in autonomous driving, and more particularly to a target fusion method and apparatus based on point clouds and images.

Background Technology

[0002] Currently, image and point cloud fusion is more of a pre-fusion of point clouds, and no longer relies heavily on the calibration matrix between LiDAR points and image pixels. This method can make full use of contextual relationships, enabling point cloud target detection to utilize image information. At the same time, the image-guided query initialization strategy can handle targets that are difficult to detect in point clouds.

Summary of the Invention

[0003] This application provides a target fusion method and apparatus based on point cloud and image, which can solve the problems of target missed detection and target fragmentation, and make the target detection results more stable.

[0004] This application provides a target fusion method based on point clouds and images, the method may include:

[0005] Acquire the first sampling time, the first 3D bounding box, and the first 2D bounding box of the LiDAR point cloud;

[0006] Acquire a second 2D bounding box of the image and a third 2D bounding box obtained by tracking the second 2D bounding box;

[0007] Based on the third 2D bounding box, determine whether multiple first 3D bounding boxes are multiple targets split from the same target;

[0008] When multiple first 3D bounding boxes are multiple targets split from the same target, the multiple targets are merged into one target using the largest bounding rectangle.

[0009] In an exemplary embodiment of this application, the second 2D bounding box for acquiring the image and the third 2D bounding box obtained by tracking the second 2D bounding box may include:

[0010] The original data of the image and the second sampling time of the original data are obtained, and the second 2D bounding box is obtained based on the original data and a preset image target detection algorithm;

[0011] The second 2D bounding box is tracked using a preset tracking algorithm to obtain the trajectory corresponding to the second 2D bounding box;

[0012] At the second sampling time, the position of the trajectory at the first sampling time is predicted, and the resulting detection box of the image at the first sampling time is used as the third 2D box.

[0013] In an exemplary embodiment of this application, determining whether multiple first 3D frames are multiple targets split from the same target based on the third 2D frame may include:

[0014] Calculate whether there is overlap between multiple third 2D boxes. When there is overlap between multiple third 2D boxes, calculate the overlapping area and mark the third 2D box with the overlapping area as an overlapping 2D box.

[0015] Calculate the overlap ratio of each of the first 2D boxes in each overlapping 2D box;

[0016] The overlap ratio is used to determine whether multiple first 3D bounding boxes are multiple targets split from the same target.

[0017] In an exemplary embodiment of this application, calculating the overlap ratio of each of the first 2D boxes in each overlapping 2D box may include:

[0018] For each overlapping 2D frame, calculate the first region of each first 2D frame that is within the overlapping 2D frame but not within the overlapping region of the overlapping 2D frame.

[0019] A preset calculation is performed on the area of the first region and the area of the first 2D frame, and the calculation result is used as the overlap ratio.

[0020] In an exemplary embodiment of this application, determining whether the plurality of first 3D bounding boxes are multiple targets split from the same target based on the magnitude of the overlap ratio may include:

[0021] When the overlap ratio is greater than a preset threshold, the first 2D frame is stored.

[0022] If multiple first 2D boxes are stored for an overlapping 2D box, then the first 3D boxes corresponding to these first 2D boxes are determined to be multiple targets split from the same target.

[0023] In an exemplary embodiment of this application, determining whether the plurality of first 3D bounding boxes are multiple targets split from the same target based on the magnitude of the overlap ratio may include:

[0024] When the overlap ratio is less than or equal to a preset threshold, the first 3D box corresponding to the first 2D box is treated as a single target.

[0025] In an exemplary embodiment of this application, the step of performing a preset calculation on the area of the first region and the area of the first 2D frame, and using the calculation result as the overlap ratio, may include: calculating the ratio of the area of the first region to the area of the first 2D frame, and using the area ratio as the overlap ratio.

[0026] In an exemplary embodiment of this application, obtaining the first 3D bounding box and the first 2D bounding box of the LiDAR point cloud may include:

[0027] Based on the lidar point cloud, the first 3D bounding box is obtained using a preset clustering algorithm;

[0028] Convert the first 3D frame into a first 2D frame.

[0029] In an exemplary embodiment of this application, converting the first 3D frame into a first 2D frame may include:

[0030] Obtain extrinsic and intrinsic parameters; the extrinsic parameters include a first matrix corresponding to the transformation relationship between the camera coordinate system and the lidar coordinate system, and the intrinsic parameters include a second matrix corresponding to the camera's internal parameters;

[0031] Each of the multiple corners corresponding to each of the first 3D frames is transformed to the camera coordinate system using the first matrix;

[0032] The second matrix converts each corner, after being transformed to the camera coordinate system, into pixel coordinate values.

[0033] The largest bounding rectangle is obtained based on the pixel coordinates of multiple corners and used as the first 2D bounding box.

This application provides a target fusion apparatus based on point clouds and images, which may include a processor and a computer-readable storage medium. The computer-readable storage medium stores instructions, which, when executed by the processor, implement the target fusion method based on point clouds and images.

[0034] Compared with related technologies, the embodiments of this application may include: acquiring a first sampling time, a first 3D bounding box, and a first 2D bounding box of the LiDAR point cloud; acquiring a second 2D bounding box of the image and a third 2D bounding box obtained by tracking the second 2D bounding box; determining whether multiple first 3D bounding boxes are multiple targets split from the same target based on the third 2D bounding box; when multiple first 3D bounding boxes are multiple targets split from the same target, merging the multiple targets into one target using the largest bounding rectangle. This embodiment solves the problems of target missed detection and target splitting, making the target detection results more stable.

[0035] Other features and advantages of this application will be set forth in the following description, and will be apparent in part from the description, or may be learned by practicing the application. Other advantages of this application can be realized and obtained by means of the solutions described in the description and the accompanying drawings.

Attached Figure Description

[0036] The accompanying drawings are used to provide an understanding of the technical solutions of this application and constitute a part of the specification. They are used together with the embodiments of this application to explain the technical solutions of this application and do not constitute a limitation on the technical solutions of this application.

[0037] Figure 1 is a flowchart of a target fusion method based on point clouds and images according to an embodiment of this application;

[0038] Figure 2 is a schematic diagram of a target fusion method based on point cloud and image according to an embodiment of this application;

[0039] Figure 3 is a schematic diagram of a point cloud processing method according to an embodiment of this application;

[0040] Figure 4 is a schematic diagram illustrating the timing synchronization method between the 3D bounding box of a point cloud and the 2D bounding box of an image according to an embodiment of this application;

[0041] Figure 5 is a schematic diagram of a method for aggregating point cloud split boxes based on image detection boxes according to an embodiment of this application;

[0042] Figure 6 is a flowchart illustrating a method for determining whether multiple first 3D frames are multiple targets split from the same target based on a third 2D frame, according to an embodiment of this application;

[0043] Figure 7 is a block diagram of a target fusion device based on point cloud and image according to an embodiment of this application.

Detailed Implementation

[0044] This application describes several embodiments, but these descriptions are exemplary and not limiting, and it will be apparent to those skilled in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are also possible. Unless specifically limited, any feature or element of any embodiment may be used in combination with or in lieu of any other feature or element in any other embodiment.

[0045] This application includes and contemplates combinations of features and elements known to those skilled in the art. The embodiments, features, and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive scheme as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive schemes to form another unique inventive scheme as defined by the claims. Therefore, it should be understood that any feature shown and / or discussed in this application may be implemented individually or in any suitable combination. Therefore, the embodiments are not limited except by the limitations imposed by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.

[0046] Furthermore, in describing representative embodiments, the specification may have presented methods and / or processes as a specific sequence of steps. However, the method or process should not be limited to the specific order of steps described herein, to the extent that it does not depend on such a specific order. As will be understood by those skilled in the art, other sequences of steps are also possible. Therefore, the specific order of steps set forth in the specification should not be construed as a limitation of the claims. Moreover, the claims concerning the method and / or process should not be limited to the steps performed in the written order, and those skilled in the art will readily understand that these orders can be varied and still remain within the spirit and scope of the embodiments of this application.

[0047] This application provides a target fusion method based on point cloud and image. As shown in Figure 1 and Figure 2, the method may include steps S101-S104:

[0048] S101, acquire the first sampling time T1 of the lidar point cloud, the first 3D bounding box, and the first 2D bounding box;

[0049] S102, acquire the second 2D bounding box of the image and the third 2D bounding box obtained by tracking the second 2D bounding box;

[0050] S103. Determine whether the multiple first 3D boxes are multiple targets split from the same target based on the third 2D box;

[0051] S104. When multiple first 3D boxes are multiple targets split from the same target, the multiple targets are merged into one target using the largest bounding rectangle.

[0052] In the exemplary embodiments of this application, the fusion is performed mainly in the image domain, combining the LiDAR 3D clustering algorithm with the camera 2D detection algorithm.

[0053] Target detection schemes based on LiDAR point clouds include traditional clustering methods and deep learning detection methods, but each of these methods has its own advantages and disadvantages.

[0054] (1) Traditional clustering algorithms can detect without missing any targets, but they are prone to target splitting (the same target will split into multiple targets), which will affect subsequent vehicle control.

[0055] (2) If a deep learning algorithm is used, the target splitting will basically not occur, but the target may be missed.

[0056] In the exemplary embodiments of this application, to ensure no missed detections, the scheme of this application still uses clustering methods for target detection, as shown in Figure 3.

[0057] In an exemplary embodiment of this application, obtaining the first 3D bounding box and the first 2D bounding box of the LiDAR point cloud may include:

[0058] Based on the lidar point cloud, the first 3D bounding box is obtained using a preset clustering algorithm;

[0059] Convert the first 3D frame into a first 2D frame.

[0060] In the exemplary embodiments of this application, the clustering algorithm can be any existing and implementable clustering algorithm, and no specific algorithm is limited.
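Because the application leaves the clustering algorithm open, the following is only a minimal sketch of one possible choice: a naive Euclidean (distance-threshold) clustering followed by an axis-aligned 3D box per cluster. The function names `euclidean_cluster` and `box3d_from_points`, the `radius` parameter, and the sample points are illustrative assumptions, not part of the application.

```python
from collections import deque


def euclidean_cluster(points, radius):
    """Group 3D points that are connected through neighbors within `radius`.

    Naive O(n^2) region growing; a real system would likely use a spatial index.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    unvisited = set(range(len(points)))
    r2 = radius * radius
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            i = queue.popleft()
            # Collect neighbors first, then remove them, so the set is not
            # modified while it is being iterated.
            near = [j for j in unvisited
                    if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= r2]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append(sorted(members))
    return sorted(clusters)


def box3d_from_points(points):
    """Axis-aligned 3D box (assumed representation): (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Hypothetical LiDAR points: two well-separated objects.
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.2, 0.1),
          (10.0, 0.0, 0.0), (10.4, 0.3, 0.0)]
clusters = euclidean_cluster(points, radius=0.8)
boxes = [box3d_from_points([points[i] for i in c]) for c in clusters]
```

With a smaller `radius`, the first object would break into several clusters, which is the splitting behavior the rest of the scheme is designed to repair.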

[0061] In an exemplary embodiment of this application, converting the first 3D frame into a first 2D frame may include:

[0062] Obtain extrinsic and intrinsic parameters; the extrinsic parameters include a first matrix corresponding to the transformation relationship between the camera coordinate system and the lidar coordinate system, and the intrinsic parameters include a second matrix corresponding to the camera's internal parameters;

[0063] Each of the multiple corners corresponding to each of the first 3D frames is transformed to the camera coordinate system using the first matrix;

[0064] The second matrix converts each corner, after being transformed to the camera coordinate system, into pixel coordinate values.

[0065] The largest bounding rectangle is obtained based on the pixel coordinates of multiple corners and used as the first 2D bounding box.

[0066] In an exemplary embodiment of this application, any one of the multiple first 3D frames can be used as an example for illustration. For instance, the first 3D frame corresponds to 8 corner points. Each corner point is transformed to the camera coordinate system through a first matrix (which can be an RT matrix, i.e., a pose transformation matrix), and then the corner point is transformed to pixel coordinate values through a second matrix (which can be a P matrix, i.e., the camera projection matrix built from the intrinsic parameters). Based on the pixel coordinate values of the 8 corner points, the maximum outer bounding box of the 8 corner points is obtained, and thus the first 2D frame corresponding to the first 3D frame is obtained.
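The corner projection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the extrinsic matrix is a 3x4 `[R|t]` mapping LiDAR coordinates to camera coordinates, the intrinsic matrix is a 3x3 pinhole matrix, corners behind the camera are simply skipped, and lens distortion is ignored. The function name and the sample matrices are hypothetical.

```python
def project_box3d_to_2d(corners_lidar, rt, k):
    """Project the corners of a 3D box into the image and return the
    bounding rectangle (u_min, v_min, u_max, v_max) in pixels.

    corners_lidar: iterable of (x, y, z) corners in the LiDAR frame.
    rt: 3x4 extrinsic matrix [R|t], LiDAR frame -> camera frame (assumed form).
    k:  3x3 intrinsic matrix (assumed pinhole model, no distortion).
    """
    us, vs = [], []
    for x, y, z in corners_lidar:
        p = (x, y, z, 1.0)
        # Step 1: LiDAR frame -> camera frame using the first (RT) matrix.
        cam = [sum(row[i] * p[i] for i in range(4)) for row in rt]
        if cam[2] <= 0:
            continue  # corner is behind the camera; skipped in this sketch
        # Step 2: camera frame -> pixel coordinates using the second matrix.
        pix = [sum(row[i] * cam[i] for i in range(3)) for row in k]
        us.append(pix[0] / pix[2])
        vs.append(pix[1] / pix[2])
    if not us:
        return None
    # Step 3: the maximum outer bounding rectangle of the projected corners.
    return (min(us), min(vs), max(us), max(vs))


# Hypothetical calibration: LiDAR axes already aligned with the camera axes.
rt = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0]]
k = [[100.0, 0.0, 50.0],
     [0.0, 100.0, 50.0],
     [0.0, 0.0, 1.0]]
corners = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (2.0, 4.0)]
first_2d_box = project_box3d_to_2d(corners, rt, k)
```

The nearer face of the box (z = 2) dominates the resulting rectangle, since nearer corners project farther from the principal point.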

[0067] In an exemplary embodiment of this application, a first 3D bounding box, a first 2D bounding box, and a first sampling time T1 of the lidar point cloud can be output.

[0068] In an exemplary embodiment of this application, point cloud clustering is used to generate the first 3D bounding box. Due to the inherent disadvantages of clustering, a single target can easily split into multiple target boxes, thus affecting subsequent autonomous driving control because the target is unstable. For example, at time t1, there may be one target box, at time t2, it may split into two target boxes, and at time t3, it may be one target box, which has a significant impact on planning and control. The following solution in the embodiments of this application can solve the problem of target box splitting.

[0069] In an exemplary embodiment of this application, to solve the problem of target fragmentation, this application proposes a novel image and point cloud fusion scheme.

[0070] In exemplary embodiments of this application, as shown in Figure 4, the high-precision point cloud 3D bounding box and the image 2D bounding box are first synchronized in time. That is, the third 2D bounding box of the image is obtained based on the first 3D bounding box of the LiDAR point cloud at the first sampling time T1. This can be achieved through the following scheme.

[0071] In an exemplary embodiment of this application, the second 2D bounding box for acquiring the image and the third 2D bounding box obtained by tracking the second 2D bounding box may include:

[0072] The original data of the image and the second sampling time T2 of the original data are obtained, and the second 2D bounding box is obtained based on the original data and a preset image target detection algorithm;

[0073] The second 2D bounding box is tracked using a preset tracking algorithm to obtain the trajectory corresponding to the second 2D bounding box;

[0074] At the second sampling time T2, the position of the trajectory at the first sampling time T1 is predicted, and the resulting detection box of the image at the first sampling time T1 is used as the third 2D box.

[0075] In an exemplary embodiment of this application, the original data of the image, the image sampling time (i.e., the second sampling time T2), and the point cloud sampling time (i.e., the first sampling time T1) can be obtained; and a second 2D bounding box can be obtained based on the original data and the image target detection algorithm.

[0076] In exemplary embodiments of this application, the image target detection algorithm may include, but is not limited to, SSD (SingleShot MultiBox Detector) and YOLO (You Only Look Once) series algorithms; the image target detection algorithm may also use any existing target detection algorithm that can be implemented, and no specific algorithm is limited.

[0077] In an exemplary embodiment of this application, a tracking algorithm is performed on the obtained second 2D bounding box to obtain the trajectory corresponding to the second 2D bounding box.

[0078] In exemplary embodiments of this application, the tracking algorithm may include, but is not limited to, DeepSORT (DeepSimple Online And Realtime Tracking) and SORT (Simple Online And Realtime Tracking) algorithms.

[0079] In an exemplary embodiment of this application, the trajectory available at time T2 can be used to predict its position at time T1, and the predicted 2D detection box of the trajectory at time T1 can be used as the third 2D box.
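The application does not fix the motion model used for this prediction. SORT-style trackers typically keep a Kalman filter with a constant-velocity state; the sketch below replaces that with plain linear extrapolation from the last two track observations, purely to illustrate how a box observed at image time T2 can be shifted to LiDAR time T1. The function name, the track layout, and the millisecond timestamps are assumptions.

```python
def predict_box(track, t_query):
    """Linearly extrapolate (or interpolate) a tracked 2D box to `t_query`.

    track: list of (timestamp, (x1, y1, x2, y2)) sorted by time, length >= 2.
    Uses only the last two observations (constant-velocity assumption).
    """
    (t_a, box_a), (t_b, box_b) = track[-2], track[-1]
    dt = t_b - t_a
    if dt == 0:
        return tuple(float(v) for v in box_b)
    scale = (t_query - t_b) / dt
    return tuple(b + (b - a) * scale for a, b in zip(box_a, box_b))


# Hypothetical track: the box moves 10 px to the right every 100 ms.
track = [(0, (100, 100, 200, 200)),
         (100, (110, 100, 210, 200))]   # last image frame, T2 = 100 ms
t1 = 150                                # LiDAR sampling time T1 = 150 ms
third_2d_box = predict_box(track, t1)
```

The same function also covers the case where T1 falls before T2, in which case `scale` is negative and the box is moved backwards along the trajectory.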

[0080] In an exemplary embodiment of this application, as shown in Figure 5, the point cloud split boxes can be aggregated based on the image detection box (third 2D box). Before aggregation, the split boxes (i.e., the split targets) among the first 3D bounding boxes can be determined first.

[0081] In exemplary embodiments of this application, as shown in Figure 6, determining whether multiple first 3D boxes are multiple targets split from the same target based on the third 2D bounding box may include steps S201-S203:

[0082] S201. Calculate whether there is overlap between the multiple third 2D boxes. When there is overlap between the multiple third 2D boxes, calculate the overlapping area and mark the third 2D box with the overlapping area as an overlapping 2D box.

[0083] In an exemplary embodiment of this application, a first 3D bounding box output by the lidar point cloud at time T1, a first 2D bounding box output by the lidar point cloud at time T1, and a third 2D bounding box output by the image at time T1 can be acquired.

[0084] In an exemplary embodiment of this application, for the third 2D bounding boxes output at time T1, it can first be calculated whether these third 2D bounding boxes overlap (since these boxes are rectangular boxes with rotation angles, the overlap area between these boxes is small). If there is an overlap area, the overlap area of these third 2D bounding boxes can be recorded and added to the storage queue, and the third 2D bounding box with the overlap area can be marked as an overlapping 2D bounding box; if there is no overlap, it can be recorded as 0.
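Step S201 can be sketched as a pairwise intersection test. The sketch assumes axis-aligned boxes in `(x1, y1, x2, y2)` pixel form for simplicity, which may differ from the box representation actually used; the function names and the dictionary used as the "storage queue" are likewise illustrative assumptions.

```python
def intersection(a, b):
    """Return the intersection rectangle of two axis-aligned boxes, or None."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def find_overlaps(third_boxes):
    """Mark third 2D boxes that overlap another third 2D box.

    Returns {box_index: [overlap rectangles]} containing only the boxes
    that overlap at least one other box (the "overlapping 2D boxes").
    """
    overlaps = {}
    for i in range(len(third_boxes)):
        for j in range(i + 1, len(third_boxes)):
            inter = intersection(third_boxes[i], third_boxes[j])
            if inter is not None:
                overlaps.setdefault(i, []).append(inter)
                overlaps.setdefault(j, []).append(inter)
    return overlaps


# Hypothetical third 2D boxes: the first two overlap, the third is isolated.
third_boxes = [(0, 0, 10, 10), (8, 0, 20, 10), (30, 0, 40, 10)]
overlaps = find_overlaps(third_boxes)
overlap_area = area(overlaps[0][0])
```

Boxes that do not appear in the returned dictionary correspond to the "recorded as 0" case in the text.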

[0085] S202. Calculate the overlap ratio of each of the first 2D boxes in each overlapping 2D box.

[0086] In an exemplary embodiment of this application, calculating the overlap ratio of each of the first 2D boxes in each overlapping 2D box may include:

[0087] For each overlapping 2D frame, calculate the first region of each first 2D frame that is within the overlapping 2D frame but not within the overlapping region of the overlapping 2D frame.

[0088] A preset calculation is performed on the area of the first region and the area of the first 2D frame, and the calculation result is used as the overlap ratio.

[0089] In an exemplary embodiment of this application, for each overlapping 2D box and its corresponding overlapping region, the region Area1 (i.e., the first region) of each point cloud 2D box (i.e., the first 2D box) that lies in the overlapping 2D box but not in the overlapping region can be calculated, and the area of the point cloud 2D box (i.e., the first 2D box) can be denoted as Area2.

[0090] In an exemplary embodiment of this application, the preset calculation may include: calculating the ratio of the area of the first region to the area of the first 2D frame.

[0091] In an exemplary embodiment of this application, the area ratio of Area1 to Area2 is calculated, and the area ratio of Area1 to Area2 can be used as the overlap ratio.
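The Area1/Area2 ratio can be illustrated with a short sketch. It assumes axis-aligned rectangles in `(x1, y1, x2, y2)` form and a single rectangular overlapping region that lies entirely inside the overlapping 2D box, so that Area1 equals the part of the first 2D box inside the overlapping 2D box minus the part inside the overlapping region. The function names are hypothetical.

```python
def inter_area(a, b):
    """Intersection area of two axis-aligned boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def overlap_ratio(first_box, overlapping_box, overlap_region):
    """Ratio Area1 / Area2 for one first 2D box and one overlapping 2D box.

    Area1: part of first_box inside overlapping_box but outside overlap_region.
    Area2: full area of first_box.
    Assumes overlap_region is a single rectangle contained in overlapping_box.
    """
    inside = inter_area(first_box, overlapping_box)
    shared = inter_area(first_box, overlap_region)
    area1 = inside - shared
    area2 = (first_box[2] - first_box[0]) * (first_box[3] - first_box[1])
    return area1 / area2 if area2 > 0 else 0.0


# Hypothetical boxes: the image box overlaps a neighbor in the strip x in [8, 10].
overlapping_box = (0, 0, 10, 10)
overlap_region = (8, 0, 10, 10)
first_box = (1, 1, 9, 9)   # point cloud box projected into the image
ratio = overlap_ratio(first_box, overlapping_box, overlap_region)
```

Here the first 2D box covers 64 px², of which 8 px² fall in the contested strip, so 56 / 64 of it belongs unambiguously to the overlapping 2D box.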

[0092] In an exemplary embodiment of this application, determining whether the plurality of first 3D bounding boxes are multiple targets split from the same target based on the magnitude of the overlap ratio may include:

[0093] When the overlap ratio is greater than a preset threshold, the first 2D frame is stored.

[0094] If multiple first 2D boxes are stored for an overlapping 2D box, then the first 3D boxes corresponding to these first 2D boxes are determined to be multiple targets split from the same target.

[0095] In an exemplary embodiment of this application, when the overlap ratio is greater than a preset threshold, the first 2D frame can be stored in a preset queue. When the length of the preset queue is greater than or equal to 2, the first 3D frames corresponding to all the first 2D frames in the queue can be determined to be multiple targets split from the same target.
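The threshold-and-queue decision can be sketched as below. The input layout (a nested dictionary of precomputed overlap ratios), the identifiers, and the threshold value 0.5 are assumptions chosen for illustration; the application only states that the threshold is preset.

```python
def find_split_groups(ratios, threshold):
    """Decide which first 2D boxes belong to one split target.

    ratios: {overlapping_box_id: {first_box_id: overlap_ratio}}
    Returns {overlapping_box_id: [first_box_ids]} for every overlapping 2D box
    whose queue holds at least two first 2D boxes above the threshold.
    """
    groups = {}
    for overlapping_id, per_first in ratios.items():
        # The "preset queue": first 2D boxes whose ratio exceeds the threshold.
        queue = [fid for fid, r in per_first.items() if r > threshold]
        if len(queue) >= 2:
            groups[overlapping_id] = queue
    return groups


# Hypothetical ratios for two overlapping 2D boxes from the image.
ratios = {
    "img0": {"pc0": 0.9, "pc1": 0.8, "pc2": 0.1},
    "img1": {"pc3": 0.95},
}
split_groups = find_split_groups(ratios, threshold=0.5)
```

In this example `pc0` and `pc1` are reported as fragments of one target, while `pc2` and `pc3` remain single targets.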

[0096] In an exemplary embodiment of this application, determining whether the plurality of first 3D bounding boxes are multiple targets split from the same target based on the magnitude of the overlap ratio may include:

[0097] When the overlap ratio is less than or equal to a preset threshold, the first 3D box corresponding to the first 2D box is treated as a single target.

[0098] In an exemplary embodiment of this application, the overlap ratio can be expressed using IOU (Intersection over Union).

[0099] S203. Determine whether the multiple first 3D boxes are multiple targets split from the same target based on the size of the overlap ratio.

[0100] In an exemplary embodiment of this application, if for each overlapping 2D box there are multiple first 2D boxes that overlap with it, the original first 3D boxes corresponding to these first 2D boxes that all overlap with a single overlapping 2D box are determined as the same target. This target eventually splits into multiple targets, and the multiple targets that were originally a single target can be output.

[0101] In an exemplary embodiment of this application, multiple targets split from a single output target can be merged into one target, and the largest bounding rectangle can be used to merge these multiple split targets into one target. Finally, multiple split targets can be deleted.
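The merge-and-delete step can be sketched as follows, assuming each 3D box is stored as an axis-aligned `(min_corner, max_corner)` pair so that the largest bounding rectangle reduces to a per-axis minimum and maximum. The box representation, the function names, and the merged identifier format are illustrative assumptions.

```python
def merge_boxes3d(boxes):
    """Smallest axis-aligned 3D box enclosing all given (min, max) boxes."""
    mins = tuple(min(b[0][i] for b in boxes) for i in range(3))
    maxs = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return mins, maxs


def fuse_targets(boxes3d, split_groups):
    """Replace each group of split targets by one merged target.

    boxes3d: {target_id: (min_corner, max_corner)}
    split_groups: list of lists of target ids judged to be one target.
    Returns a new dict; the input is left unchanged.
    """
    fused = dict(boxes3d)
    for group in split_groups:
        merged = merge_boxes3d([boxes3d[i] for i in group])
        for i in group:
            fused.pop(i, None)              # delete the split targets
        fused["+".join(group)] = merged     # add the merged target
    return fused


# Hypothetical first 3D boxes: pc0 and pc1 are fragments of one vehicle.
boxes3d = {
    "pc0": ((0, 0, 0), (2, 1, 1.5)),
    "pc1": ((2.2, 0.1, 0), (4.5, 1.1, 1.6)),
    "pc2": ((10, 0, 0), (11, 1, 1)),
}
fused = fuse_targets(boxes3d, [["pc0", "pc1"]])
```

Targets that are not part of any split group, such as `pc2` here, pass through unchanged.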

[0102] The exemplary embodiments of this application include at least the following advantages:

[0103] 1. Excellent temporal synchronization, enabling point clouds and images to be fused and matched at the same point in time. It can unify the 3D bounding boxes and 2D bounding boxes detected by point clouds into the same spatiotemporal space for matching.

[0104] 2. It can make full use of the advantages of point clouds and images to fuse them into the optimal target detection algorithm.

[0105] 3. By using an IOU (Intersection over Union) to match 3D and 2D boxes, the split 3D boxes can be reassembled into a single box to a great extent.

[0106] This application provides a target fusion device 1 based on point cloud and image. As shown in Figure 7, it may include a processor 11 and a computer-readable storage medium 12, wherein the computer-readable storage medium 12 stores instructions that, when executed by the processor 11, implement the target fusion method based on point cloud and image.

[0107] In the exemplary embodiments of this application, any of the aforementioned target fusion methods based on point clouds and images can be applied to this device embodiment, and will not be described in detail here.

[0108] It will be understood by those skilled in the art that all or some of the steps, systems, or apparatuses disclosed above, and their functional modules / units, can be implemented as software, firmware, hardware, or suitable combinations thereof. In hardware implementations, the division between functional modules / units mentioned above does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed collaboratively by several physical components. Some or all components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software may be distributed on a computer-readable medium, which may include computer storage media (or non-transitory media) and communication media (or transient media). As is known to those skilled in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer. Furthermore, it is well known to those skilled in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

Claims

1. A point cloud and image based object fusion method, characterized in that, The method includes: Acquire the first sampling time, the first 3D bounding box, and the first 2D bounding box of the LiDAR point cloud; Acquire a second 2D bounding box of the image and a third 2D bounding box obtained by tracking the second 2D bounding box; Based on the third 2D bounding box, determine whether multiple first 3D bounding boxes are multiple targets split from the same target; When multiple first 3D bounding boxes are multiple targets split from the same target, the maximum bounding rectangle is used to merge the multiple targets into one target; The second 2D bounding box for acquiring the image and the third 2D bounding box obtained by tracking the second 2D bounding box include: The original data of the image and the second sampling time of the original data are obtained, and the second 2D bounding box is obtained based on the original data and a preset image target detection algorithm; The second 2D bounding box is tracked using a preset tracking algorithm to obtain the trajectory corresponding to the second 2D bounding box; At the second sampling time, the trajectory is predicted to be at the position of the trajectory at the first sampling time, and the detection box of the image at the first sampling time is obtained as the third 2D box; The step of determining whether multiple first 3D boxes are multiple targets split from the same target based on the third 2D bounding box includes: Calculate whether there is overlap between multiple third 2D boxes. When there is overlap between multiple third 2D boxes, calculate the overlapping area and mark the third 2D boxes with the overlapping area as overlapping 2D boxes. 
Calculating the overlap ratio of each first 2D frame in each overlapping 2D frame includes: for each overlapping 2D frame, calculating the first region of each first 2D frame that is within the overlapping 2D frame but not within the overlapping area of the overlapping 2D frame; performing a preset calculation on the area of the first region and the area of the first 2D frame, and using the calculation result as the overlap ratio; When the overlap ratio is greater than a preset threshold, the first 2D frame is stored. If multiple first 2D boxes are stored for an overlapping 2D box, then the first 3D boxes corresponding to these first 2D boxes are determined to be multiple targets split from the same target.

2. The target fusion method based on point cloud and image according to claim 1, characterized in that, The step of determining whether multiple first 3D bounding boxes are multiple targets split from the same target based on the overlap ratio includes: When the overlap ratio is less than or equal to a preset threshold, the first 3D box corresponding to the first 2D box is treated as a single target.

3. The target fusion method based on point cloud and image according to claim 1 or 2, characterized in that, The step of performing a preset calculation on the areas of the first region and the first 2D frame, and using the calculation result as the overlap ratio, includes: Calculate the ratio of the area of the first region to the area of the first 2D frame, and use the ratio of the areas as the overlap ratio.

4. The target fusion method based on point cloud and image according to claim 1, characterized in that, Obtain the first 3D bounding box and the first 2D bounding box of the LiDAR point cloud, including: Based on the lidar point cloud, the first 3D bounding box is obtained using a preset clustering algorithm; Convert the first 3D frame into a first 2D frame.

5. The target fusion method based on point cloud and image according to claim 4, characterized in that, The step of converting the first 3D frame into a first 2D frame includes: Obtain extrinsic and intrinsic parameters; the extrinsic parameters include a first matrix corresponding to the transformation relationship between the camera coordinate system and the lidar coordinate system, and the intrinsic parameters include a second matrix corresponding to the camera's internal parameters; Each of the multiple corners corresponding to each of the first 3D frames is transformed to the camera coordinate system using the first matrix; The second matrix converts each corner, after being transformed to the camera coordinate system, into pixel coordinate values. The largest bounding rectangle is obtained based on the pixel coordinates of multiple corners and used as the first 2D bounding box.

6. A target fusion apparatus based on point cloud and image, comprising a processor and a computer-readable storage medium, wherein the computer-readable storage medium stores instructions, characterized in that, When the instructions are executed by the processor, the target fusion method based on point cloud and image as described in any one of claims 1-5 is implemented.