4D millimeter-wave radar and visible light camera data fusion method, system and device

CN121616474B · Active · Publication Date: 2026-06-19 · HARBIN ENGINEERING UNIVERSITY SANYA NANHAI INNOVATION & DEVELOPMENT BASE +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HARBIN ENGINEERING UNIVERSITY SANYA NANHAI INNOVATION & DEVELOPMENT BASE
Filing Date
2026-02-03
Publication Date
2026-06-19

Abstract

This invention provides a method, system, and device for data fusion between 4D millimeter-wave radar and a visible light camera, relating to the field of millimeter-wave radar technology. The method includes: acquiring point cloud data from a 4D millimeter-wave radar and image data from a visible light camera; performing spatial analysis on the point cloud data and image data respectively to obtain the image vertex coordinates in the pixel coordinate system of the image data and the radar point spatial coordinates in the radar coordinate system of the point cloud data; performing data alignment processing based on the image vertex coordinates and radar point spatial coordinates using the Pntri algorithm to obtain a transformation matrix between the 4D millimeter-wave radar and the visible light camera; performing coordinate transformation on the point cloud data according to the transformation matrix to obtain projected point cloud data; and performing time alignment on the projected point cloud data and image data using a weighted fusion algorithm to obtain fused data from the 4D millimeter-wave radar and the visible light camera. This invention improves the accuracy of data fusion.

Description

Technical Field

[0001] This invention relates to the field of millimeter-wave radar technology, and more specifically, to a method, system, and device for data fusion of a 4D millimeter-wave radar and a visible light camera.

Background Technology

[0002] In scenarios such as surface navigation systems, target detection relies on the collaborative work of multiple sensors. While visible light cameras can provide target details, their detection range is limited and they are susceptible to interference from environmental factors such as water mist and light spots. 4D millimeter-wave radar has a long detection range and strong anti-interference capabilities, but it cannot identify target categories. Therefore, it is necessary to fuse data from both types of sensors to balance environmental adaptability and detection accuracy. Since the coordinate systems of the 4D millimeter-wave radar and the visible light camera are independent, the Perspective-n-Point (PnP) algorithm is typically used for extrinsic parameter calibration to achieve coordinate transformation and thus enable sensor fusion.

[0003] In related technologies, the PnP algorithm uses known 3D spatial points and their 2D projection points in an image to infer the camera pose. However, millimeter-wave radar point clouds are sparse. If the center of the triangle of the radar calibration block is used as the 2D point corresponding to a radar point, the PnP algorithm produces a large error, which restricts the practical application of sensor data fusion.

Summary of the Invention

[0004] The problem addressed by this invention is how to improve the data fusion accuracy between millimeter-wave radar and visible light cameras.

[0005] To address the aforementioned problems, this invention provides a method, system, and device for fusing data from 4D millimeter-wave radar and visible light camera.

[0006] In a first aspect, the 4D millimeter-wave radar and visible light camera data fusion method of the present invention includes:

[0007] Acquire point cloud data from 4D millimeter-wave radar and image data from a visible light camera;

[0008] Spatial analysis is performed on the point cloud data and the image data respectively to obtain the image vertex coordinates of the image data in the pixel coordinate system and the radar point spatial coordinates of the point cloud data in the radar coordinate system.

[0009] Using the Pntri algorithm, data alignment is performed based on the image vertex coordinates and the radar point spatial coordinates to obtain the transformation matrix between the 4D millimeter-wave radar and the visible light camera;

[0010] Based on the transformation matrix, the point cloud data is subjected to coordinate transformation to obtain projected point cloud data;

[0011] By using a weighted fusion algorithm, the projected point cloud data and the image data are time-aligned to obtain the fused data of the 4D millimeter-wave radar and the visible light camera.

[0012] Optionally, the step of performing spatial analysis on the point cloud data and the image data respectively to obtain the image vertex coordinates of the image data in the pixel coordinate system and the radar point spatial coordinates of the point cloud data in the radar coordinate system includes:

[0013] Based on preset camera intrinsic parameters and preset distortion coefficients, the image data is subjected to distortion correction processing to obtain distortion-free image data;

[0014] A preset image region is extracted from the distortion-free image data to obtain multiple vertices; wherein, the preset image region is the two-dimensional region corresponding to the radar corner reflector in the image data acquired by the visible light camera, and the vertices are the contour vertices of the radar corner reflector in the distortion-free image data;

[0015] The coordinate data of the vertex in the pixel coordinate system are used as the image vertex coordinates;

[0016] Intensity filtering is applied to the point cloud data to obtain the spatial coordinates of the radar point in the radar coordinate system.

[0017] Optionally, the step of performing data alignment processing based on the image vertex coordinates and the radar point spatial coordinates using the Pntri algorithm to obtain the transformation matrix between the 4D millimeter-wave radar and the visible light camera includes:

[0018] The image vertex coordinates are converted into two-dimensional reference points in the camera coordinate system corresponding to the visible light camera.

[0019] Multiple feature points are selected from the two-dimensional reference points within the preset image area;

[0020] The transformation matrix is obtained by randomly matching the spatial coordinates of the radar point and all the feature points using the Pntri algorithm.

[0021] Optionally, the step of obtaining the transformation matrix by randomly matching the spatial coordinates of the radar point and all the feature points using the Pntri algorithm includes:

[0022] The spatial coordinates of the radar points are used as three-dimensional spatial points, and the Pntri algorithm is used to establish a matching association between the three-dimensional spatial points and the feature points to obtain a three-dimensional-two-dimensional mapping relationship.

[0023] From all the feature points, feature points corresponding to the number of spatial coordinates of the radar points are randomly selected to form a feature point combination;

[0024] For the feature point combination, based on the three-dimensional to two-dimensional mapping relationship and combined with the preset distortion coefficients in the camera, the initial transformation matrix corresponding to the feature point combination is determined;

[0025] The initial transformation matrix of the feature point combination is iteratively optimized using a weighted loss function to obtain the transformation matrix.
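The random selection step above can be sketched as follows. This is a minimal illustration only: the function name `sample_feature_combination`, the seeded generator, and the example coordinates are assumptions introduced here, and the text does not state how many combinations are drawn or how they are compared.

```python
import random

def sample_feature_combination(feature_points, num_radar_points, rng=None):
    """Randomly pick as many 2D feature points as there are radar points.

    Hypothetical helper: the patent text only says the count must match.
    """
    rng = rng or random.Random()
    if num_radar_points > len(feature_points):
        raise ValueError("need at least as many feature points as radar points")
    # sample() draws without replacement, so no feature point is used twice
    return rng.sample(feature_points, num_radar_points)

# Illustrative candidate feature points in normalized camera coordinates
candidates = [(0.10, 0.20), (0.30, 0.20), (0.20, 0.05), (0.20, 0.20), (0.15, 0.125)]
combo = sample_feature_combination(candidates, 3, random.Random(0))
```

Each sampled combination would then be paired with the radar points to compute an initial transformation matrix, which the weighted loss function refines.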

[0026] Optionally, the step of iteratively optimizing the initial transformation matrix of the feature point combination using a weighted loss function to obtain the transformation matrix includes:

[0027] The three-dimensional spatial points are projected onto the camera coordinate system using the initial transformation matrix to obtain projected two-dimensional points.

[0028] Obtain the perpendicular distance from the projected two-dimensional point to the three sides of the triangle of the radar corner reflector in the preset image area;

[0029] The total loss of the initial transformation matrix is obtained by applying a weighted loss function to the perpendicular distances;

[0030] Based on the total loss and the preset target loss, the rotation matrix and translation matrix in the initial transformation matrix are iteratively updated to obtain the transformation matrix.
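One possible reading of the perpendicular-distance loss is sketched below. The exact form of the weighted loss function is not given in this section, so the weighted sum of the three edge distances, the default unit weights, and the names `distance_to_line` and `weighted_edge_loss` are all assumptions made for illustration.

```python
import math

def distance_to_line(p, a, b):
    """Perpendicular distance from 2D point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        raise ValueError("degenerate edge: a and b coincide")
    # |cross product| / edge length gives the perpendicular distance
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length

def weighted_edge_loss(p, triangle, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of distances from p to the three sides of the triangle.

    The weights are hypothetical; the text only says the loss is weighted.
    """
    a, b, c = triangle
    edges = ((a, b), (b, c), (c, a))
    return sum(w * distance_to_line(p, u, v) for w, (u, v) in zip(weights, edges))

# A 3-4-5 right triangle; (1, 1) is its incenter, one unit from every side
triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
loss = weighted_edge_loss((1.0, 1.0), triangle)
```

In an iterative scheme, the rotation and translation would be updated until this total loss falls below the preset target loss.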

[0031] Optionally, the transformation matrix includes a rotation matrix and a translation matrix; the step of performing coordinate transformation on the point cloud data according to the transformation matrix to obtain projected point cloud data includes:

[0032] For each radar point in the point cloud data, its spatial coordinates in the radar coordinate system are attitude-corrected using the rotation matrix and position-compensated using the translation matrix, thereby determining the three-dimensional coordinates of that radar point in the camera coordinate system.

[0033] A set of three-dimensional points is generated based on the three-dimensional coordinates in the camera coordinate system corresponding to the spatial coordinates of the radar points.

[0034] The set of three-dimensional points is used as the projected point cloud data.
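The coordinate transformation above amounts to a rigid transform applied to every radar point. The sketch below uses plain Python lists; the function name `radar_to_camera` and the example rotation and translation values are illustrative assumptions, not calibration results.

```python
def radar_to_camera(points, rotation, translation):
    """Apply p_cam = R * p_radar + t to each 3D point.

    rotation is a 3x3 row-major nested list, translation a length-3 sequence.
    """
    transformed = []
    for p in points:
        transformed.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return transformed

# Illustrative values: 90 degree rotation about z plus a small offset
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = (0.1, 0.0, -0.2)
projected_cloud = radar_to_camera([(1.0, 2.0, 3.0)], R, t)
```

The returned list plays the role of the set of three-dimensional points that the text treats as the projected point cloud data.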

[0035] Optionally, the step of performing time alignment on the projected point cloud data and the image data using a weighted fusion algorithm to obtain fused data from the 4D millimeter-wave radar and the visible light camera includes:

[0036] Based on the acquisition time of the 4D millimeter-wave radar and the visible light camera, the frame rates corresponding to the 4D millimeter-wave radar and the visible light camera are determined respectively.

[0037] Based on the frame rates corresponding to the 4D millimeter-wave radar and the visible light camera, the projected point cloud data and the image data are divided into main frames and sub-frames.

[0038] Select the two sub-frames closest to the main frame, and fuse the two sub-frames using a weighted fusion algorithm to obtain the fused sub-frame;

[0039] A data pair is formed from the fused sub-frame and the main frame;

[0040] The data pairs are fused to obtain fused data from the 4D millimeter-wave radar and the visible light camera.
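The sub-frame fusion step can be illustrated as follows. The text does not specify how the weights are computed, so this sketch assumes linear temporal weights (the sub-frame closer in time to the main frame receives the larger weight) and represents each sub-frame as a flat list of numeric values; `nearest_two` and `fuse_sub_frames` are hypothetical names.

```python
def nearest_two(main_time, sub_frames):
    """Return the two (timestamp, data) sub-frames closest to main_time."""
    if len(sub_frames) < 2:
        raise ValueError("need at least two sub-frames")
    return sorted(sub_frames, key=lambda f: abs(f[0] - main_time))[:2]

def fuse_sub_frames(main_time, sub_frames):
    """Blend the two nearest sub-frames with weights based on time distance."""
    (t1, d1), (t2, d2) = nearest_two(main_time, sub_frames)
    gap1, gap2 = abs(t1 - main_time), abs(t2 - main_time)
    total = gap1 + gap2
    # Assumed weighting: a smaller time gap yields a larger weight
    w1 = 0.5 if total == 0 else gap2 / total
    w2 = 1.0 - w1
    return [w1 * a + w2 * b for a, b in zip(d1, d2)]

subs = [(0.00, [0.0, 10.0]), (0.10, [10.0, 20.0]), (0.20, [20.0, 30.0])]
fused = fuse_sub_frames(0.025, subs)
```

With the main frame at 0.025 s, the sub-frames at 0.00 s and 0.10 s are selected and blended with weights 0.75 and 0.25 under the assumed scheme.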

[0041] Optionally, the step of dividing the projected point cloud data and the image data into main frames and subframes according to the frame rates corresponding to the 4D millimeter-wave radar and the visible light camera respectively includes:

[0042] If the frame rate of the 4D millimeter-wave radar is greater than the frame rate of the visible light camera, then the projected point cloud data is determined to be the main frame and the image data is the sub-frame.

[0043] If the frame rate of the 4D millimeter-wave radar is less than the frame rate of the visible light camera, then the image data is determined to be the main frame and the projected point cloud data is the sub-frame.
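The main-frame and sub-frame assignment can be sketched as below. Estimating the frame rate from acquisition timestamps as (frame count minus one) divided by the elapsed time is an assumption, as are the function names; the text does not say what happens when the two frame rates are equal, so the sketch raises an error in that case.

```python
def estimate_fps(timestamps):
    """Estimate a frame rate from a sorted list of acquisition times in seconds."""
    if len(timestamps) < 2 or timestamps[-1] <= timestamps[0]:
        raise ValueError("need at least two increasing timestamps")
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])

def assign_roles(radar_fps, camera_fps):
    """Decide which sensor supplies main frames and which supplies sub-frames."""
    if radar_fps > camera_fps:
        return {"main": "radar", "sub": "camera"}
    if radar_fps < camera_fps:
        return {"main": "camera", "sub": "radar"}
    # Equal frame rates are not covered by the text
    raise ValueError("equal frame rates: behaviour not specified")

radar_fps = estimate_fps([0.0, 0.05, 0.10, 0.15])
camera_fps = estimate_fps([0.0, 0.1, 0.2, 0.3])
roles = assign_roles(radar_fps, camera_fps)
```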

[0044] In a second aspect, the 4D millimeter-wave radar and visible light camera data fusion system of the present invention includes:

[0045] The data acquisition module is used to acquire point cloud data from the 4D millimeter-wave radar and image data from the visible light camera.

[0046] The data alignment module is used to perform spatial analysis on the point cloud data and the image data respectively to obtain the image vertex coordinates of the image data in the pixel coordinate system and the radar point spatial coordinates of the point cloud data in the radar coordinate system; and to perform data alignment processing based on the image vertex coordinates and radar point spatial coordinates using the Pntri algorithm to obtain the transformation matrix between the 4D millimeter-wave radar and the visible light camera.

[0047] The time alignment module is used to perform coordinate transformation on the point cloud data according to the transformation matrix to obtain projected point cloud data; and to perform time alignment on the projected point cloud data and the image data through a weighted fusion algorithm to obtain fused data of the 4D millimeter-wave radar and the visible light camera.

[0048] In a third aspect, the electronic device of the present invention includes a memory and a processor;

[0049] The memory is used to store computer programs;

[0050] The processor is configured to implement the 4D millimeter-wave radar and visible light camera data fusion method as described above when executing the computer program.

[0051] The present invention discloses a 4D millimeter-wave radar and visible light camera data fusion method, system, and device. First, it acquires point cloud data from the 4D millimeter-wave radar and image data from the visible light camera. Then, it performs spatial analysis on the point cloud data and image data respectively, obtaining the image vertex coordinates in the pixel coordinate system of the image data and the radar point spatial coordinates in the radar coordinate system of the point cloud data. This ensures the accuracy and reliability of the data and lays the foundation for subsequent alignment processing. Next, the Pntri algorithm is used to perform data alignment processing based on the image vertex coordinates and radar point spatial coordinates, obtaining the transformation matrix between the 4D millimeter-wave radar and the visible light camera. The Pntri algorithm addresses the sparseness of the radar point cloud, avoiding the errors caused by using the triangle center of the radar calibration block as the two-dimensional corresponding point in traditional methods. The Pntri algorithm improves the calculation accuracy of the transformation matrix through the mapping between three-dimensional and two-dimensional data, thus providing a more accurate coordinate transformation basis for sensor data fusion. After obtaining the transformation matrix, the point cloud data is further transformed according to the transformation matrix to obtain projected point cloud data, ensuring spatial alignment of the radar data and camera data, enabling subsequent processing of data from both sensors in the same coordinate system. Finally, the projected point cloud data and image data are time-aligned using a weighted fusion algorithm to obtain fused data from the 4D millimeter-wave radar and visible light camera. 
Since the weighted fusion algorithm considers the temporal characteristics of the data, it further improves the accuracy and reliability of the fused data. In summary, this invention, through the combination of the Pntri algorithm and the weighted fusion algorithm, not only improves the data fusion accuracy between the 4D millimeter-wave radar and the visible light camera, but also enhances the robustness and reliability of sensor data fusion. This significantly improves target detection performance in complex environments such as surface navigation systems, providing important support for the technological development in related fields.

Description of the Drawings

[0052] Figure 1 is a flowchart of a method for fusing 4D millimeter-wave radar and visible light camera data in one embodiment of the present invention;

[0053] Figure 2 is a schematic diagram of data fusion processing in one embodiment of the present invention;

[0054] Figure 3 is a data acquisition flowchart of a 4D millimeter-wave radar and a visible light camera in one embodiment of the present invention;

[0055] Figure 4 is a flowchart of the data alignment process in one embodiment of the present invention;

[0056] Figure 5 is a diagram showing the radar point selection result in another embodiment of the present invention;

[0057] Figure 6 is a schematic diagram of the structure of a 4D millimeter-wave radar and visible light camera data fusion system in another embodiment of the present invention.

Detailed Implementation

[0058] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Although some embodiments of the present invention are shown in the drawings, it should be understood that the present invention can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of the present invention. It should be understood that the accompanying drawings and embodiments of the present invention are for illustrative purposes only and are not intended to limit the scope of protection of the present invention.

[0059] It should be understood that the various steps described in the method embodiments of the present invention may be performed in different orders and / or in parallel. Furthermore, the method embodiments may include additional steps and / or omit the steps shown. The scope of the present invention is not limited in this respect.

[0060] The term "comprising" and its variations as used herein are open-ended, meaning "including but not limited to"; the term "based on" means "at least partially based on"; the term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments"; and the term "optionally" means "optional embodiments". Definitions of other terms will be given in the following description. It should be noted that the concepts of "first," "second," etc., mentioned in this invention are used only to distinguish different devices, modules, or units, and are not intended to limit the order of functions performed by these devices, modules, or units or their interdependencies.

[0061] It should be noted that the terms "a" and "a plurality of" used in this invention are illustrative rather than restrictive. Those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0062] The names of the messages or information exchanged between the multiple devices in the embodiments of the present invention are for illustrative purposes only and are not intended to limit the scope of these messages or information.

[0063] As shown in Figure 1, an embodiment of the present invention provides a method for fusing data from a 4D millimeter-wave radar and a visible light camera, comprising:

[0064] Acquire point cloud data from a 4D millimeter-wave radar and image data from a visible light camera.

[0065] Specifically, in a multi-sensor fusion system, the 4D millimeter-wave radar and the visible light camera serve as two important sensors, each providing different information. The 4D millimeter-wave radar provides information on the target's range, angle (azimuth and elevation), and radial velocity, while the visible light camera provides high-resolution visual information, including the target's shape, texture, and color. By simultaneously acquiring these two types of data, temporal consistency between the radar data and image data can be ensured during subsequent processing, providing a prerequisite for subsequent spatial and temporal alignment. The coordinate systems of the 4D millimeter-wave radar and the camera are independent. To achieve sensor fusion, the transformation matrix between them needs to be determined. During data acquisition, the two sensors must share a common field of view, and N sets of data (N>6) must be acquired.

[0066] In a preferred embodiment of the present invention, as shown in Figure 3, the data acquisition process is as follows:

[0067] The camera and the 4D millimeter-wave radar are fixed so that their relative position remains unchanged throughout the subsequent process and their fields of view share a common area; a radar corner reflector is placed in the common area.

[0068] The acquisition module obtains RGB image data from the camera and point cloud data from the radar, and publishes it to RVIZ2 for display via ROS2. In RVIZ2, the point cloud points corresponding to the radar corner reflectors are identified, and these radar corner reflectors are also present in the RGB image data.

[0069] Run the data saving script to save several sets of data at the current time frame.

[0070] Move the radar corner reflector to a new position, making sure that the reflector positions across acquisitions do not lie on a straight line in the image, and repeat the above steps until N sets of data have been collected.

[0071] The process involves acquiring RGB image data from the camera and point cloud data from the radar, which are then published to RVIZ2 for display via ROS2. The point cloud points corresponding to the radar corner reflectors are identified in RVIZ2, and these radar corner reflectors are also present in the RGB image data. The 4D millimeter-wave radar node collects data returned by the radar and converts it into the required point cloud format, publishing the converted point cloud data via ROS2. Similarly, the visible light camera node collects image data returned by the camera and converts it into the required image format, publishing the converted image data via ROS2. The data saving script is a ROS2-based data storage node that uses multi-threading to receive radar point cloud data and image data published by the acquisition module, saving them in corresponding folders. Millimeter-wave radar point cloud data is saved in CSV format, and image data is saved in JPG format, with the filename being the system time at the time of acquisition.
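The CSV part of the data saving step could look like the sketch below. It covers only the file-naming and CSV-writing logic and leaves out the ROS2 subscription and threading; the function name `save_point_cloud`, the column names, and the timestamp format are assumptions, since the text states only that files are named after the system time at acquisition.

```python
import csv
import time
from pathlib import Path

def save_point_cloud(points, out_dir, stamp=None):
    """Write radar points to <out_dir>/<stamp>.csv and return the file path.

    Column names are hypothetical; the patent does not list the CSV fields.
    """
    stamp = time.time() if stamp is None else stamp
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stamp:.6f}.csv"
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["x", "y", "z", "doppler", "intensity"])
        writer.writerows(points)
    return path
```

Saving the matching image under the same timestamp in a separate folder would let each radar file be paired with its image by filename.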

[0072] Spatial analysis is performed on the point cloud data and the image data respectively to obtain the image vertex coordinates of the image data in the pixel coordinate system and the radar point spatial coordinates of the point cloud data in the radar coordinate system.

[0073] Specifically, spatial analysis converts the raw sensor data from the 4D millimeter-wave radar and the visible light camera into a format suitable for subsequent processing. Typically, corner reflectors are used as calibration targets, and the radar points corresponding to these reflectors are determined. For the visible light camera, feature points, such as the contour vertices of the radar corner reflectors, need to be extracted from the image. These vertices are represented in the pixel coordinate system, which is two-dimensional. For the 4D millimeter-wave radar, the spatial coordinates of the radar points need to be extracted from the point cloud data. These coordinates are represented in the radar coordinate system, which is three-dimensional. Spatial analysis converts the data from both sensors into a unified format, providing a foundation for subsequent data alignment.

[0074] The transformation matrix between the 4D millimeter-wave radar and the visible light camera is obtained by performing data alignment processing based on the image vertex coordinates and the radar point spatial coordinates using the Pntri algorithm.

[0075] Specifically, the Pntri algorithm is an improved PnP algorithm designed to handle sparse radar point clouds. Because the radar point cloud is sparse, a traditional PnP algorithm incurs significant error when it uses the center of the triangle of the radar calibration block as the 2D point corresponding to a radar point. The Pntri algorithm improves alignment accuracy by matching 3D spatial points with 2D regions (such as triangular regions) in the image, rather than using the point-to-point matching of existing technologies, so that the spatial coordinates of the radar points are matched to the region defined by the image vertex coordinates. The Pntri algorithm yields a transformation matrix between the 4D millimeter-wave radar and the visible light camera. This transformation matrix converts radar data from the radar coordinate system to the camera coordinate system, achieving spatial alignment. By eliminating the difference between the coordinate systems of the two sensors, the transformation matrix spatially unifies the two coordinate representations of the same physical target, such as a corner reflector, and provides a consistent coordinate reference for subsequent data fusion. As shown in Figure 2, data alignment can be performed by the data alignment module to obtain the transformation matrix, namely the rotation matrix and the translation matrix.

[0076] Based on the transformation matrix, the point cloud data is subjected to coordinate transformation to obtain projected point cloud data.

[0077] Specifically, radar point cloud data is transformed from the radar coordinate system to the camera coordinate system. Through a transformation matrix, the radar point cloud data can be projected onto the camera coordinate system to obtain projected point cloud data. This ensures spatial alignment between the radar data and the image data, allowing subsequent processing of data from both sensors within the same coordinate system. The accuracy of the coordinate transformation directly affects the quality of the fused data; therefore, it is crucial to ensure the accuracy and stability of the transformation matrix.
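Written out, the transformation described above takes the following form. The symbols are introduced here for illustration and do not appear in the text: p_r is a radar point, R and t are the rotation and translation from the transformation matrix, and p_c is the same point in the camera coordinate system. The second line is the standard pinhole projection with intrinsic matrix K; whether the method applies it at this step is an assumption.

```latex
\begin{aligned}
\mathbf{p}_c &= R\,\mathbf{p}_r + \mathbf{t},
  \qquad R \in SO(3),\ \mathbf{t} \in \mathbb{R}^3 \\
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} &= K\,\mathbf{p}_c,
  \qquad
  K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\end{aligned}
```

Here s is the depth of the point along the optical axis, so any error in R or t shifts the resulting pixel position (u, v) directly.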

[0078] By using a weighted fusion algorithm, the projected point cloud data and the image data are time-aligned to obtain the fused data of the 4D millimeter-wave radar and the visible light camera.

[0079] Specifically, time alignment aims to ensure temporal consistency between radar data and image data. Since radar and camera frame rates differ, a weighted fusion algorithm is needed to time-align the projected point cloud data and image data. This algorithm calculates temporal weights to fuse the image data, generating an image frame time-aligned with the main frame. This process ensures temporal synchronization between radar and image data, enabling the fused data to accurately reflect scene information at the same moment. Because the accuracy of time alignment directly affects the reliability and effectiveness of the fused data, a reasonable weight calculation method and fusion strategy are required. As shown in Figure 2, the time alignment module first transforms the point cloud data using the rotation matrix and the translation matrix, and then performs time alignment to obtain fused data from the 4D millimeter-wave radar and the visible light camera. Finally, the data can be stored and displayed using a storage module and a display module, respectively.

[0080] The 4D millimeter-wave radar and visible light camera data fusion method of this invention first acquires point cloud data from the 4D millimeter-wave radar and image data from the visible light camera. Then, spatial analysis is performed on the point cloud data and image data respectively to obtain the image vertex coordinates in the pixel coordinate system of the image data and the radar point spatial coordinates in the radar coordinate system of the point cloud data. This ensures the accuracy and reliability of the data and lays the foundation for subsequent alignment processing. Next, the Pntri algorithm is used to perform data alignment processing based on the image vertex coordinates and radar point spatial coordinates to obtain the transformation matrix between the 4D millimeter-wave radar and the visible light camera. The Pntri algorithm addresses the sparse radar point cloud problem, avoiding the errors caused by using the triangle center of the radar calibration block as the two-dimensional corresponding point in traditional methods. The Pntri algorithm improves the calculation accuracy of the transformation matrix through the mapping between three-dimensional and two-dimensional data, thus providing a more accurate coordinate transformation basis for sensor data fusion. After obtaining the transformation matrix, the point cloud data is further transformed according to the transformation matrix to obtain projected point cloud data, ensuring spatial alignment of the radar data and camera data, enabling subsequent processing of data from both sensors in the same coordinate system. Finally, the projected point cloud data and image data are time-aligned using a weighted fusion algorithm to obtain fused data from the 4D millimeter-wave radar and visible light camera. Since the weighted fusion algorithm considers the temporal characteristics of the data, it further improves the accuracy and reliability of the fused data. 
In summary, this invention, through the combination of the Pntri algorithm and the weighted fusion algorithm, not only improves the data fusion accuracy between the 4D millimeter-wave radar and the visible light camera, but also enhances the robustness and reliability of sensor data fusion. This significantly improves target detection performance in complex environments such as surface navigation systems, providing important support for the technological development in related fields.

[0081] Optionally, the step of performing spatial analysis on the point cloud data and the image data respectively to obtain the image vertex coordinates of the image data in the pixel coordinate system and the radar point spatial coordinates of the point cloud data in the radar coordinate system includes:

[0082] Based on preset camera intrinsic parameters and preset distortion coefficients, the image data is subjected to distortion correction processing to obtain distortion-free image data;

[0083] A preset image region is extracted from the distortion-free image data to obtain multiple vertices; wherein, the preset image region is the two-dimensional region corresponding to the radar corner reflector in the image data acquired by the visible light camera, and the vertices are the contour vertices of the radar corner reflector in the distortion-free image data;

[0084] The coordinate data of the vertex in the pixel coordinate system are used as the image vertex coordinates;

[0085] Intensity filtering is applied to the point cloud data to obtain the spatial coordinates of the radar point in the radar coordinate system.

[0086] Specifically, firstly, since visible light camera images suffer from optical distortion, distortion correction of image data based on preset camera intrinsic parameters and preset distortion coefficients is a crucial prerequisite for data preprocessing. Failure to correct distortion will lead to deviations in subsequent image vertex coordinate extraction. Therefore, by calling camera intrinsic parameters (focal length, principal point coordinates, etc.) and distortion coefficients, the image is restored to an ideal, distortion-free imaging state, providing an accurate image carrier for feature extraction. Secondly, the contour vertices of a preset image region are extracted, and the image vertex coordinates are determined. The preset image region is defined as the two-dimensional region corresponding to the radar corner reflector. In the distortion-corrected image, the radar corner reflector appears as a clearly defined triangular region, and its contour vertices are precisely located feature points. Furthermore, the vertex coordinates are recorded based on the pixel coordinate system (with the top left corner of the image as the origin), avoiding the feature blurring problem caused by random environmental points in traditional methods. Finally, since the corner reflector has a strong signal, intensity filtering is performed on the radar point cloud data. Intensity filtering can remove environmental interference points and retain the spatial coordinates of the radar points corresponding to the corner reflectors. The coordinate reference is limited to the radar coordinate system, which provides a precise feature correspondence between radar 3D points and image 2D points for the subsequent Pntri algorithm to solve the transformation matrix.
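To make the distortion correction step concrete, the sketch below inverts a two-coefficient radial distortion model on normalized image coordinates. The model (radial terms k1 and k2 only), the fixed-point iteration, and the function names are assumptions; the text does not state which distortion model or coefficients are used, and a real pipeline would normally rely on a camera calibration library.

```python
def distort(x, y, k1, k2):
    """Apply radial distortion to normalized image coordinates (x, y)."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor

def undistort(xd, yd, k1, k2, iterations=20):
    """Invert distort() by fixed-point iteration, starting at the distorted point."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        # Divide the observed point by the factor evaluated at the current guess
        x, y = xd / factor, yd / factor
    return x, y

# Round trip with illustrative coefficients
xd, yd = distort(0.3, 0.2, -0.2, 0.05)
xu, yu = undistort(xd, yd, -0.2, 0.05)
```

The iteration converges quickly for mild distortion such as the values used here; strong distortion near the image border may need more iterations or a different solver.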

[0087] In a preferred embodiment of the present invention, as shown in Figure 2, while distortion is removed from the camera data, the coordinates of the three vertices of the corner reflector in the pixel coordinate system and the spatial coordinates of the corresponding radar point in the radar point cloud are obtained.

[0088] In this embodiment of the invention, optical errors are eliminated through image distortion correction, which improves the accuracy of the extracted corner reflector contour vertex coordinates and avoids subsequent Pntri algorithm mapping errors caused by vertex deviation. The radar point spatial coordinates filtered by intensity filtering retain only the valid points corresponding to the corner reflectors, eliminating environmental noise interference and significantly improving the signal-to-noise ratio of the radar point cloud data. The combination of the two can construct a feature correspondence between high-precision image vertex coordinates and high signal-to-noise ratio radar point spatial coordinates, directly reducing the transformation matrix solution error.

[0089] Optionally, the step of performing data alignment processing based on the image vertex coordinates and the radar point spatial coordinates using the Pntri algorithm to obtain the transformation matrix between the 4D millimeter-wave radar and the visible light camera includes:

[0090] The image vertex coordinates are converted into two-dimensional reference points in the camera coordinate system corresponding to the visible light camera.

[0091] Multiple feature points are selected from the two-dimensional reference points within the preset image area;

[0092] The transformation matrix is obtained by randomly matching the spatial coordinates of the radar point and all the feature points using the Pntri algorithm.

[0093] Specifically, firstly, the image vertex coordinates are converted into two-dimensional reference points in the camera coordinate system. The image vertex coordinates are initially expressed in the pixel coordinate system, whereas the camera coordinate system has its origin at the camera's optical center. Applying the camera intrinsic parameters (such as focal length and principal point coordinates) eliminates the scale difference between the pixel and camera coordinate systems and converts the pixel coordinates into two-dimensional reference points in the camera coordinate system. This provides a unified spatial reference and avoids matching deviations caused by inconsistent coordinate system references. Secondly, multiple feature points are selected from the two-dimensional reference points within a preset image region. The feature points are drawn only from the preset image region (i.e., the two-dimensional region corresponding to the radar corner reflector). Selecting them from the two-dimensional reference points ensures that the feature points are effective (the corner reflector region has clear features and well-defined boundaries) and provides sufficient candidate samples for subsequent random matching. Typically, the selected feature points include region vertices, edge midpoints, and the centroid, which cover the key locations of the region's geometry and avoid the random matching errors that a single feature point would cause. Finally, the transformation matrix is obtained by randomly matching all feature points using the Pntri algorithm. Unlike the fixed matching logic of the traditional PnP algorithm (3D-2D point matching), the Pntri algorithm matches feature points randomly: from the multiple feature points it randomly selects combinations whose size corresponds to the number of radar points. This avoids the problem in fixed matching where the deviation of a single feature point amplifies the overall error. At the same time, the algorithm's built-in mapping logic establishes a connection between the 3D points in the radar coordinate system and the 2D reference points in the camera coordinate system. Ultimately, the transformation matrix representing the pose relationship between the two sensors, comprising a rotation matrix and a translation matrix, is solved. The entire process does not rely on an external world coordinate system; calibration can be completed solely from the data of the two sensors themselves, which simplifies the calibration process and improves adaptability. The invention thus forms a complete data alignment flow through benchmark unification, feature enhancement, and mapping solution.
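The conversion and feature point selection described above can be sketched as follows. The sketch assumes that the "two-dimensional reference points in the camera coordinate system" are points on the normalized image plane (z = 1), which is one common reading; the function names and the intrinsic values (fx = fy = 800, cx = 640, cy = 360) are hypothetical.

```python
def pixel_to_normalized(u, v, fx, fy, cx, cy):
    """Map an undistorted pixel (u, v) to the normalized image plane (z = 1)."""
    return ((u - cx) / fx, (v - cy) / fy)


def triangle_feature_points(a, b, c):
    """Return 7 candidates: 3 vertices, 3 edge midpoints, and the centroid."""
    def midpoint(p, q):
        return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

    centroid = ((a[0] + b[0] + c[0]) / 3.0, (a[1] + b[1] + c[1]) / 3.0)
    return [a, b, c, midpoint(a, b), midpoint(b, c), midpoint(c, a), centroid]


# Assumed intrinsics and corner reflector vertices in pixel coordinates.
fx, fy, cx, cy = 800.0, 800.0, 640.0, 360.0
pixel_vertices = [(640.0, 360.0), (720.0, 360.0), (640.0, 440.0)]

reference_vertices = [pixel_to_normalized(u, v, fx, fy, cx, cy) for u, v in pixel_vertices]
feature_points = triangle_feature_points(*reference_vertices)
```

The seven candidates cover the vertices, edge midpoints, and centroid named in the text; other candidate sets inside the reflector region would fit the same interface.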

[0094] In a preferred embodiment of the present invention, the camera intrinsic parameters, the coordinates of the three image vertices of the corner reflector, and the spatial region of the corner reflector are input to the calibration function, with the distortion coefficients supplied as an additional input when required. The calibration function selects the region of the corner reflector in the image, selects M two-dimensional reference points for random matching, and obtains the corresponding transformation matrix between the 4D millimeter-wave radar and the camera.

[0095] In this embodiment of the invention, converting to two-dimensional reference points and selecting feature points by region ensures that the feature points used for matching have an accurate spatial reference and clear geometric features. This reduces the matching deviation caused by invalid feature points or inconsistent references, provides high-quality input data for the Pntri algorithm, and makes the subsequent mapping relationship more reliable. At the same time, the random matching mechanism of the Pntri algorithm tries and screens multiple feature point combinations, which reduces the random error of any single matching combination.

[0096] Optionally, the step of obtaining the transformation matrix by randomly matching the spatial coordinates of the radar point and all the feature points using the Pntri algorithm includes:

[0097] The spatial coordinates of the radar points are used as three-dimensional spatial points, and the Pntri algorithm is used to establish a matching association between the three-dimensional spatial points and the feature points to obtain a three-dimensional-two-dimensional mapping relationship.

[0098] From all the feature points, feature points corresponding to the number of spatial coordinates of the radar points are randomly selected to form a feature point combination;

[0099] For the feature point combination, based on the three-dimensional to two-dimensional mapping relationship and combined with the preset distortion coefficients in the camera, the initial transformation matrix corresponding to the feature point combination is determined;

[0100] The initial transformation matrix of the feature point combination is iteratively optimized using a weighted loss function to obtain the transformation matrix.

[0101] Specifically, firstly, a 3D-to-2D mapping relationship is established between the radar point spatial coordinates and the feature points. Given the sparse point clouds of millimeter-wave radar, the Pntri algorithm is used to associate radar 3D spatial points with image feature points, which ensures a precise spatial correspondence between the two types of sensor data and provides effective input for the subsequent matrix solution. The mapping relationship is built on the feature points (which originate from the 2D region corresponding to the radar corner reflector), ensuring the validity and consistency of the associated data and avoiding interference from environmental noise. Secondly, feature points equal in number to the radar points are randomly selected from the feature points to form combinations. This accommodates the limited number of radar points (usually 3), while random selection covers different combinations of feature points, avoids the systematic errors caused by fixed combinations, and provides multiple sets of samples to support subsequent optimization. Furthermore, the initial transformation matrix is determined by combining the 3D-to-2D mapping relationship with the camera's preset distortion coefficients. Incorporating these coefficients into the calculation eliminates coordinate deviations caused by the camera's optical characteristics. The 3D points in the radar coordinate system are converted to coordinates in the camera coordinate system using perspective projection, which allows the initial rotation and translation matrices characterizing the pose relationship between the two sensors to be solved and ensures the physical meaning and computational accuracy of the initial matrix. Finally, the initial transformation matrix is iteratively optimized using a weighted loss function. The loss calculation rules are designed around the geometric characteristics of the corner reflector. The weight allocation emphasizes the coordinate accuracy of key radar points, and the matrix parameters are adjusted iteratively until the loss meets the preset requirements, so that the final transformation matrix more closely matches the actual pose relationship.
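How a candidate rotation and translation, together with distortion coefficients, map a radar point to a pixel can be sketched as follows. This is an illustration under stated assumptions: a pinhole camera with only the two radial coefficients k1 and k2 of the common radial distortion model (tangential terms and k3 are omitted), and a hypothetical function name and sample values.

```python
def project_radar_point(p_radar, R, t, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project one radar-frame 3D point to pixel coordinates."""
    # Rigid transform into the camera frame: p_cam = R * p_radar + t.
    x, y, z = (
        sum(R[i][j] * p_radar[j] for j in range(3)) + t[i] for i in range(3)
    )
    if z <= 0.0:
        raise ValueError("point is behind the camera")
    # Perspective division onto the normalized image plane.
    xn, yn = x / z, y / z
    # Radial distortion factor (assumed two-coefficient subset).
    r2 = xn * xn + yn * yn
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    # Apply the intrinsic parameters to obtain pixel coordinates.
    return (fx * xn * radial + cx, fy * yn * radial + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zero_shift = (0.0, 0.0, 0.0)

# Assumed values: a point 5 m ahead and 1 m to the side, mild barrel distortion.
u, v = project_radar_point((1.0, 0.0, 5.0), identity, zero_shift,
                           fx=800.0, fy=800.0, cx=640.0, cy=360.0, k1=-0.1)
```

Comparing such projected pixels with the corner reflector region is what allows a candidate transformation matrix to be scored and refined.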

[0102] In a preferred embodiment of the present invention, as shown in Figure 2, the transformation matrix and its error are calculated. Considering the characteristics of the corner reflector, an error loss function is designed as follows:

[0103] [Formula not reproduced in the source text.]

[0104] where the symbols of the formula denote, in order: the loss corresponding to one set of data; the three distances from the radar point projected onto the camera coordinate system to the sides of the triangle formed by the three vertices of the corner reflector in the pixel coordinate system; the distance threshold, which is usually set to 1; the coordinates of the millimeter-wave radar point projected onto the camera coordinate system; the triangular region of the corner reflector in the camera coordinate system; the weight, which is used to emphasize radar points of greater interest; the total loss under the current transformation matrix; and M, the number of millimeter-wave radar points.

[0105] The difference between the overall loss and the target loss corresponding to the projected pixel coordinates is determined, and the rotation and translation matrices are adjusted continuously until the optimal solution is reached.

[0106] In this embodiment of the invention, by establishing a three-dimensional-two-dimensional mapping relationship and selecting random feature combinations, the sparse characteristics of radar point clouds are adapted, avoiding the error bias caused by fixed matching, and enabling the solution of the initial transformation matrix to have multiple sets of samples to support it, reducing random errors. At the same time, the introduction of the camera's internal distortion coefficients eliminates the influence of camera optical distortion on coordinate transformation, further improving the calculation accuracy of the initial matrix.

[0107] Optionally, the step of iteratively optimizing the initial transformation matrix of the feature point combination using a weighted loss function to obtain the transformation matrix includes:

[0108] The three-dimensional spatial points are projected onto the camera coordinate system using the initial transformation matrix to obtain projected two-dimensional points.

[0109] Obtain the perpendicular distance from the projected two-dimensional point to the three sides of the triangle of the radar corner reflector in the preset image area;

[0110] The total loss of the initial transformation matrix is obtained by using a weighted loss function based on the perpendicular distance;

[0111] Based on the total loss and the preset target loss, the rotation matrix and translation matrix in the initial transformation matrix are iteratively updated to obtain the transformation matrix.

[0112] Specifically, firstly, the 3D spatial points are projected onto the camera coordinate system using the initial transformation matrix. The 3D spatial points in the radar coordinate system (the effective radar points corresponding to the corner reflectors) are transformed into projected 2D points in the camera coordinate system using the initial transformation matrix (comprising the rotation and translation matrices). This unifies the coordinates of the two types of sensors and provides a common spatial reference. Secondly, the perpendicular distances from the projected 2D points to the three sides of the radar corner reflector triangle are obtained. The triangular region that the corner reflector presents in the image serves as the reference; because the corner reflector is a calibration component whose triangular outline has fixed geometric properties, the perpendicular distances from the projected 2D points to the three sides can be quantified directly. Compared with traditional point-to-point distance measurement, measuring the deviation between the projected radar point position and the ideal region reflects the projection deviation more comprehensively (if the projected point is within the triangular region, the distance can be regarded as a deviation within the effective range; if it is outside the region, the deviation is significant), which meets the region matching accuracy required in calibration scenarios. Furthermore, the total loss is calculated with a weighted loss function, which strengthens the error contribution of key radar points through weight allocation. This makes the loss calculation reflect the actual differences in data accuracy more closely and avoids the evaluation distortion caused by simply averaging individual errors. The total loss integrates the multiple sets of perpendicular distances according to their weights, forming a quantifiable accuracy index. Finally, the matrix is iteratively updated based on the total loss and the preset target loss. By repeatedly adjusting the parameters of the rotation matrix and translation matrix, the total loss is reduced until the preset requirements are met, so that the projection error of the final transformation matrix converges. This addresses the insufficient accuracy of a single solution in traditional algorithms.
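One possible form of the weighted region loss is sketched below. The exact loss formula is not reproduced in this text, so the sketch is an assumption: a projected point inside its triangle contributes zero, a point outside contributes its distance to the nearest triangle edge, and the per-point losses are combined as a weighted sum. All function names and sample values are hypothetical.

```python
import math


def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_in_triangle(p, tri):
    """True if p lies inside or on the boundary of triangle tri."""
    a, b, c = tri
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def distance_to_segment(p, a, b):
    """Euclidean distance from p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    s = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    s = max(0.0, min(1.0, s))  # clamp the foot of the perpendicular to the edge
    return math.hypot(p[0] - (a[0] + s * dx), p[1] - (a[1] + s * dy))


def point_loss(p, tri):
    """Zero inside the triangle, otherwise distance to the nearest edge."""
    if point_in_triangle(p, tri):
        return 0.0
    a, b, c = tri
    return min(distance_to_segment(p, a, b),
               distance_to_segment(p, b, c),
               distance_to_segment(p, c, a))


def total_loss(points, triangles, weights):
    """Weighted sum of per-point losses (assumed combination rule)."""
    return sum(w * point_loss(p, t) for p, t, w in zip(points, triangles, weights))


triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
projected = [(1.0, 1.0), (2.0, -3.0)]  # first point inside, second outside
loss = total_loss(projected, [triangle, triangle], weights=[1.0, 2.0])
```

Here the second point lies 3 units below the bottom edge and carries weight 2, so it alone determines the total; raising a point's weight makes the optimizer prioritize that radar point.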

[0113] In a preferred embodiment of the present invention, iterative optimization can be performed through a data alignment module. Specifically, the inputs to the data alignment module are a 3D millimeter-wave radar point set (N×3), the corresponding calibration block regions (N×3×2), the camera intrinsic parameter matrix (3×3), the distortion coefficients (1×5), the maximum number of iterations, the minimum tolerance parameter, the initial rotation matrix, and the initial translation matrix. The initial rotation matrix and the initial translation matrix are optional parameters. The outputs of the data alignment module are the optimal rotation vector, the optimal translation vector, the set of inlier indices, and the total loss.

[0114] Specifically, as shown in Figures 4 and 5, step 1: for each calibration block region, select M points (as shown in Figure 5, when M is 7, the 3 vertices, the 3 midpoints of the sides, and the centroid can be selected). Randomly select 1 of the M points in each region, and match each selected point with a radar point.

[0115] Step 2: use the perspective-n-point (PnP) method to obtain the rotation and translation matrices. If the PnP method fails, return to step 1 above.

[0116] Step 3: when the PnP method succeeds, calculate the overall loss. The pixel coordinates obtained by projecting the N millimeter-wave radar points are compared with the regions of their corresponding corner reflectors. When a pixel is within the region of its corresponding corner reflector, the loss is 0. When a pixel is outside the region of its corresponding corner reflector, the loss is the Euclidean distance from the pixel to the nearest corner reflector edge. The overall loss is obtained by summing the losses of the N projected points.

[0117] Step 4: compare the overall loss with the minimum tolerance parameter. If the loss is less than the minimum tolerance parameter, return the result. If it is greater than the minimum tolerance parameter, save the result and return to step 1.

[0118] When the desired result is obtained or the number of iterations exceeds the maximum number of iterations, the current optimal solution is returned directly.
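The control flow of steps 1 to 4 can be sketched as follows. This is only an outline of the loop: `calibrate`, `solve_pose`, and `compute_loss` are hypothetical names, and the toy solver and loss below stand in for a real PnP solver and the region loss so that the example runs on its own.

```python
import random


def calibrate(radar_points, region_candidates, solve_pose, compute_loss,
              max_iterations=200, tolerance=0.5, seed=0):
    """Random-matching loop: returns the best pose found and its loss."""
    rng = random.Random(seed)
    best_pose, best_loss = None, float("inf")
    for _ in range(max_iterations):
        # Step 1: pick one candidate 2D point per region, one per radar point.
        picks = [rng.choice(candidates) for candidates in region_candidates]
        # Step 2: solve for the pose; None signals failure, so try again.
        pose = solve_pose(radar_points, picks)
        if pose is None:
            continue
        # Step 3: score the pose with the overall loss.
        loss = compute_loss(pose)
        if loss < best_loss:
            best_pose, best_loss = pose, loss
        # Step 4: stop early once the loss is within tolerance.
        if loss < tolerance:
            break
    return best_pose, best_loss


# Toy stand-ins (assumptions for illustration only).
def toy_solver(radar_points, picks):
    if all(p[0] != 0.0 for p in picks):
        return None  # simulate a failed solve
    return tuple(picks)


def toy_loss(pose):
    return sum(abs(p[0]) for p in pose)


radar_points = [(5.0, 0.0, 0.0), (5.0, 1.0, 0.0)]
region_candidates = [[(0.0, 0.0), (5.0, 0.0)], [(0.0, 1.0), (3.0, 1.0)]]
best_pose, best_loss = calibrate(radar_points, region_candidates, toy_solver, toy_loss)
```

In a real pipeline `solve_pose` might wrap a PnP routine and `compute_loss` the region loss of step 3; when the iteration budget runs out, the best pose saved so far is returned, matching the fallback in [0118].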

[0119] In this embodiment of the invention, measuring the perpendicular distances between the projected two-dimensional point and the three sides of the triangle quantifies projection deviation more accurately and avoids misjudging overall accuracy because of the deviation of a single feature point. At the same time, the weighted loss function accounts for accuracy differences among the radar point data: assigning weights highlights the contribution of high-quality radar points, which makes the overall loss assessment more objective and provides accurate guidance for optimization.

[0120] Optionally, the transformation matrix includes a rotation matrix and a translation matrix; the step of performing coordinate transformation on the point cloud data according to the transformation matrix to obtain projected point cloud data includes:

[0121] Based on the spatial coordinates of each radar point in the radar coordinate system according to the point cloud data, attitude correction is performed by combining the rotation matrix and position compensation is performed by combining the translation matrix, thereby determining the three-dimensional coordinates of the radar point spatial coordinates in the camera coordinate system.

[0122] A set of three-dimensional points is generated based on the three-dimensional coordinates in the camera coordinate system corresponding to the spatial coordinates of the radar points.

[0123] The set of three-dimensional points is used as the projected point cloud data.

[0124] Specifically, firstly, the 3D coordinates of each radar point in the camera coordinate system are determined from the rotation and translation matrices of the transformation matrix. The rotation matrix (representing the attitude difference between the two coordinate systems) and the translation matrix (representing the position difference between the two coordinate systems) serve as the basis of the calculation. For the spatial coordinates of each radar point in the point cloud data (3D data in the radar coordinate system), the coordinate transformation is completed using the rigid body transformation formula (i.e., camera coordinate system coordinates = rotation matrix × radar coordinate system coordinates + translation matrix). This resolves the mutual independence of the two sensors' coordinate systems and ensures that each radar point has a unique corresponding spatial position in the camera coordinate system. The transformation is performed point by point, avoiding the coordinate confusion that batch processing could cause and ensuring transformation accuracy. Secondly, a 3D point set is generated from the coordinates of each point by collecting all transformed 3D coordinates in the camera coordinate system into an ordered set. This preserves the spatial distribution characteristics of the original radar point cloud (such as target outlines and distance information) and achieves a unified coordinate reference, transforming the point cloud data from the radar perspective to the camera perspective so that it can be matched directly with camera image data in the same spatial dimension. Finally, the 3D point set is defined as the projected point cloud data, which clarifies the attributes of the transformation result and makes the projected point cloud data the key bridge between radar point clouds and camera images. This provides a standardized input data format for subsequent time alignment and information fusion, avoiding fusion obstacles caused by inconsistent data formats.
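The rigid body transformation formula quoted above (camera coordinates = rotation matrix × radar coordinates + translation matrix) can be sketched point by point as follows. The function names and the example rotation and translation are assumptions for illustration.

```python
def radar_to_camera(point, rotation, translation):
    """Apply p_cam = R * p_radar + t to a single 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def transform_cloud(points, rotation, translation):
    """Transform every radar point, preserving the original point order."""
    return [radar_to_camera(p, rotation, translation) for p in points]


# Assumed pose: a 90 degree rotation about the z axis plus a small offset.
rotation = [[0.0, -1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
translation = (0.1, 0.0, -0.2)

radar_cloud = [(1.0, 2.0, 3.0), (0.0, 0.0, 0.0)]
projected_cloud = transform_cloud(radar_cloud, rotation, translation)
```

Keeping the output in the same order as the input means any per-point attributes of the radar data can still be looked up by index after the transformation.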

[0125] In this embodiment of the invention, a point-by-point transformation method based on rotation and translation matrices ensures that the coordinate transformation accuracy of each radar point remains consistent with the accuracy of the transformation matrix, avoiding accumulated errors in batch transformations. This allows the projected point cloud data to accurately reflect the spatial information of the original radar point cloud and is fully compatible with the camera coordinate system. Subsequently, the projected point cloud can be directly projected onto the image pixel coordinate system using camera intrinsic parameters, achieving spatial association between the point cloud and the image. Furthermore, the projected point cloud data, based on a three-dimensional point set, preserves the original spatial structure of the radar point cloud (such as the target's three-dimensional position and motion parameters) and possesses compatibility with camera image data for collaborative processing. This provides a unified data carrier for subsequent time alignment (such as weighted fusion algorithms) and information fusion (such as combining target contours with spatial positions).

[0126] Optionally, the step of performing time alignment on the projected point cloud data and the image data using a weighted fusion algorithm to obtain fused data from the 4D millimeter-wave radar and the visible light camera includes:

[0127] Based on the acquisition time of the 4D millimeter-wave radar and the visible light camera, the frame rates corresponding to the 4D millimeter-wave radar and the visible light camera are determined respectively.

[0128] Based on the frame rates corresponding to the 4D millimeter-wave radar and the visible light camera, the projected point cloud data and the image data are divided into main frames and sub-frames.

[0129] Select the two sub-frames closest to the main frame, and fuse the two sub-frames using a weighted fusion algorithm to obtain the fused sub-frame;

[0130] The data pair is formed based on the fused subframe and the main frame;

[0131] The data pairs are fused to obtain fused data from the 4D millimeter-wave radar and the visible light camera.

[0132] Specifically, firstly, as shown in Figure 2, the image data needs to be distortion-corrected: based on the preset camera intrinsic parameters and preset distortion coefficients, distortion is removed from the image data to obtain undistorted image data. Then, the frame rate (e.g., the number of frames acquired per unit time) is calculated from the acquisition time differences, which makes the difference in data generation intervals between the 4D millimeter-wave radar and the visible light camera explicit. Because millimeter-wave radar and visible light cameras acquire data on different hardware principles, their frame rates are usually inconsistent (e.g., a radar frame rate of 10 Hz and a camera frame rate of 20 Hz). Quantifying the frame rate difference therefore provides an objective basis for the subsequent division into main frames and sub-frames and avoids alignment deviations caused by ambiguous time references. Secondly, since low frame rate data has a long generation interval and high frame rate data provides more time node references, dividing the main frames and sub-frames by frame rate ensures that the main frame serves as a stable time anchor point and gives sub-frame fusion a clear time matching target. Furthermore, assigning weights (e.g., increasing the weight of the sub-frame closer to the main frame) makes the information of the two sub-frames complementary. This solves the problem of a large time deviation between a single sub-frame and the main frame, and fusion preserves the detailed information of the high frame rate sub-frames, avoiding the data distortion caused by time interpolation. The weighted fusion algorithm ensures that the fused sub-frame matches the main frame in the time dimension while retaining the original features of the sensor data. Subsequently, the fused sub-frame and the main frame are paired to bind the two types of sensor data in time, so that each data pair has a unified timestamp, providing the time consistency needed for subsequent information fusion. Finally, the spatial location information of the projected point cloud (such as distance and orientation) is integrated with the visual feature information of the image (such as target contour and color) to form multimodal data that combines environmental adaptability and detection accuracy, completing the closed loop from time alignment to information fusion.
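The frame rate calculation, the division into main frames and sub-frames, and the choice of the two nearest sub-frames can be sketched as follows. The sketch follows the rule in [0133] that the lower frame rate stream supplies the main frames; the function names and the timestamps (a 10 Hz radar and a 20 Hz camera) are assumptions.

```python
def frame_rate(timestamps):
    """Average frames per second from sorted acquisition times in seconds."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])


def split_main_sub(radar_ts, camera_ts):
    """Return (main, sub) as (name, timestamps); the lower rate is the main."""
    if frame_rate(radar_ts) <= frame_rate(camera_ts):
        return ("radar", radar_ts), ("camera", camera_ts)
    return ("camera", camera_ts), ("radar", radar_ts)


def two_nearest(sub_ts, t_main):
    """Indices of the two sub-frames closest in time to the main frame."""
    if len(sub_ts) < 2:
        raise ValueError("need at least two sub-frames")
    order = sorted(range(len(sub_ts)), key=lambda i: abs(sub_ts[i] - t_main))
    return sorted(order[:2])


radar_ts = [0.00, 0.10, 0.20, 0.30]                 # about 10 Hz
camera_ts = [0.02, 0.07, 0.12, 0.17, 0.22, 0.27]    # about 20 Hz

main, sub = split_main_sub(radar_ts, camera_ts)
# Sub-frames to fuse for the main frame acquired at t = 0.10 s.
nearest = two_nearest(sub[1], t_main=main[1][1])
```

For the main frame at 0.10 s the camera frames at 0.07 s and 0.12 s are chosen, which bracket the main frame in time and are the natural inputs for weighted fusion.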

[0133] In a preferred embodiment of the present invention, the acquisition times of the 4D millimeter-wave radar and the camera are obtained, and the corresponding frame rates are calculated from these times; the data stream with the higher frame rate provides the sub-frames, and the data stream with the lower frame rate provides the main frames. Typically, the 4D millimeter-wave radar data is the main frame and the camera data is the sub-frame.

[0134] Subframe data fusion:

[0135] Two frames of data that are close to the main frame are selected, and a weight-based fusion method is used to fuse the two frames of data, as shown below:

[0136] $F_{\text{sub}} = \omega F_1 + (1 - \omega) F_2$;

[0137] where $F_{\text{sub}}$ is the sub-frame data obtained by fusion for pairing with the main frame, $\omega$ is the fusion weight (ranging from 0 to 1), and $F_1$ and $F_2$ are the two frames of data closest to the main frame data;

[0138] In this embodiment of the invention, the frame rate determination and the division of main and sub-frames clearly define the benchmark for time alignment, avoiding time deviations caused by irregular matching. Simultaneously, the time-aligned data stream ensures the temporal consistency of the two types of sensor data, providing a reliable prerequisite for information fusion. This allows the fused data to inherit the advantages of millimeter-wave radar in terms of anti-interference (e.g., water mist, light spots) and long detection range, while also integrating the clear target category recognition characteristics of visible light cameras. This significantly improves target detection accuracy and environmental adaptability in complex scenarios (e.g., surface navigation systems).

[0139] The following methods can be used to obtain images based on weighted fusion:

[0140] target = cv2.addWeighted(src1, alpha, src2, beta, gamma);

[0141] Here, target is the final fused data; src1 and alpha are the first image to be fused and its weight; src2 and beta are the second image to be fused and its weight; and gamma is a scalar added to the weighted sum. The values of alpha and beta can be determined from each frame's distance from the main frame time point: the closer the frame, the greater its weight.
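One way to derive alpha and beta from time proximity is sketched below. The text only states that the closer frame receives the greater weight, so the inverse-distance rule used here is an assumption, as are the function names. The `blend` helper is a pure-Python stand-in that computes src1·alpha + src2·beta + gamma per element; unlike `cv2.addWeighted` it does not saturate results to an image data type.

```python
def proximity_weights(t_main, t1, t2):
    """Weights for two sub-frames; the frame closer to t_main weighs more."""
    d1, d2 = abs(t1 - t_main), abs(t2 - t_main)
    if d1 + d2 == 0.0:
        return 0.5, 0.5  # both frames coincide with the main frame
    alpha = d2 / (d1 + d2)  # small d1 (close frame 1) gives a large alpha
    return alpha, 1.0 - alpha


def blend(src1, alpha, src2, beta, gamma=0.0):
    """Per-element src1*alpha + src2*beta + gamma on equally sized 2D lists."""
    return [
        [a * alpha + b * beta + gamma for a, b in zip(row1, row2)]
        for row1, row2 in zip(src1, src2)
    ]


# Assumed timestamps: main frame at 0.10 s, sub-frames at 0.07 s and 0.12 s.
alpha, beta = proximity_weights(t_main=0.10, t1=0.07, t2=0.12)

frame1 = [[100.0, 100.0], [100.0, 100.0]]
frame2 = [[200.0, 200.0], [200.0, 200.0]]
fused = blend(frame1, alpha, frame2, beta)
```

With these timestamps the second sub-frame is closer to the main frame and receives the larger weight, so the fused values lie nearer to those of `frame2`.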

[0142] Optionally, the step of dividing the projected point cloud data and the image data into main frames and subframes according to the frame rates corresponding to the 4D millimeter-wave radar and the visible light camera respectively includes:

[0143] If the frame rate of the 4D millimeter-wave radar is greater than the frame rate of the visible light camera, then the image data is determined to be the main frame and the projected point cloud data is the sub-frame.

[0144] If the frame rate of the 4D millimeter-wave radar is less than the frame rate of the visible light camera, then the projected point cloud data is determined to be the main frame and the image data is the sub-frame.

[0145] Specifically, low frame rate data, with its longer time intervals and lower generation frequency, is better suited as a stable benchmark for time alignment, while high frame rate data, with its shorter time intervals and higher density, can supplement temporal detail through fusion. The frame rates of 4D millimeter-wave radar and visible light cameras differ naturally because their hardware operates on different principles (radar detects through electromagnetic waves, while cameras image optically). The core requirement of time alignment is a stable time anchor point; using frame rate as the dividing criterion therefore avoids the alignment confusion caused by frequent benchmark changes. Secondly, the assignment of main frames and sub-frames is defined for both frame rate comparison scenarios. When the radar frame rate is higher than the camera frame rate, the radar data is denser and the camera data intervals are longer. In this case, the camera image data is the main frame and the radar projected point cloud data is the sub-frame, so that the long-interval stability of the camera data serves as the benchmark. When the radar frame rate is lower than the camera frame rate, the camera data is denser and the radar data intervals are longer. In this case, the radar projected point cloud data is the main frame and the camera image data is the sub-frame. This two-way decision logic covers the possible frame rate comparisons between the two types of sensors and ensures that, however the frame rate changes, the main frame is always the more stable anchor data in the time dimension and the sub-frame is always supplementary data that can be adapted to the time of the main frame through fusion. This prevents the division rules from failing when frame rates fluctuate and ensures the continuity and effectiveness of the time alignment process.

[0146] In this embodiment of the invention, by using clear rules for dividing the main and sub-frames, the problem of unclear benchmarks and lack of basis for division in the time alignment of 4D millimeter-wave radar and visible light camera is solved, providing a stable and unified time anchor point for subsequent weight fusion, and significantly improving the accuracy of time alignment and process stability.

[0147] As shown in Figure 6, another embodiment of the present invention provides a 4D millimeter-wave radar and visible light camera data fusion system, comprising:

[0148] The data acquisition module is used to acquire point cloud data from the 4D millimeter-wave radar and image data from the visible light camera.

[0149] The data alignment module is used to perform spatial analysis on the point cloud data and the image data respectively to obtain the image vertex coordinates of the image data in the pixel coordinate system and the radar point spatial coordinates of the point cloud data in the radar coordinate system; and to perform data alignment processing based on the image vertex coordinates and radar point spatial coordinates using the Pntri algorithm to obtain the transformation matrix between the 4D millimeter-wave radar and the visible light camera.

[0150] The time alignment module is used to perform coordinate transformation on the point cloud data according to the transformation matrix to obtain projected point cloud data; and to perform time alignment on the projected point cloud data and the image data through a weighted fusion algorithm to obtain fused data of the 4D millimeter-wave radar and the visible light camera.

[0151] The 4D millimeter-wave radar and visible light camera data fusion system of the present invention has the same advantages over the prior art as the aforementioned 4D millimeter-wave radar and visible light camera data fusion method, and will not be repeated here.

[0152] Another embodiment of the present invention provides an electronic device including a memory and a processor;

[0153] The memory is used to store computer programs;

[0154] The processor is configured to implement the 4D millimeter-wave radar and visible light camera data fusion method as described above when executing the computer program.

[0155] The electronic device of the present invention has the same advantages over the prior art as the aforementioned 4D millimeter-wave radar and visible light camera data fusion method, which will not be repeated here.

[0156] While the present invention has been disclosed above, its scope of protection is not limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention, and all such changes and modifications will fall within the scope of protection of the present invention.

Claims

1. A method for fusing data from a 4D millimeter-wave radar and a visible light camera, characterized in that it comprises:
acquiring point cloud data from the 4D millimeter-wave radar and image data from the visible light camera;
performing spatial analysis on the point cloud data and the image data respectively to obtain the image vertex coordinates of the image data in the pixel coordinate system and the radar point spatial coordinates of the point cloud data in the radar coordinate system;
performing, using the Pntri algorithm, data alignment processing based on the image vertex coordinates and the radar point spatial coordinates to obtain the transformation matrix between the 4D millimeter-wave radar and the visible light camera, which includes: converting the image vertex coordinates into two-dimensional reference points in the camera coordinate system corresponding to the visible light camera; selecting multiple feature points from the two-dimensional reference points within a preset image region; and randomly matching the radar point spatial coordinates with all the feature points to obtain the transformation matrix;
wherein randomly matching the radar point spatial coordinates with all the feature points to obtain the transformation matrix includes: using the radar point spatial coordinates as three-dimensional spatial points, and establishing a matching association between the three-dimensional spatial points and the feature points to obtain a three-dimensional-two-dimensional mapping relationship; randomly selecting, from all the feature points, as many feature points as there are radar point spatial coordinates to form a feature point combination; determining an initial transformation matrix corresponding to the feature point combination based on the three-dimensional-two-dimensional mapping relationship, preset camera intrinsic parameters, and a preset distortion coefficient; and iteratively optimizing the initial transformation matrix of the feature point combination using a weighted loss function to obtain the transformation matrix;
wherein iteratively optimizing the initial transformation matrix includes: projecting the three-dimensional spatial points onto the camera coordinate system using the initial transformation matrix to obtain projected two-dimensional points; obtaining the perpendicular distances from the projected two-dimensional points to the three sides of the triangle of the radar corner reflector in the preset image region; obtaining the total loss of the initial transformation matrix using the weighted loss function based on the perpendicular distances; and iteratively updating the rotation matrix and the translation matrix in the initial transformation matrix based on the total loss and a preset target loss to obtain the transformation matrix;
wherein the preset image region is the two-dimensional region corresponding to the radar corner reflector in the image data acquired by the visible light camera, and the coordinates of the contour vertices of the radar corner reflector in the image data under the pixel coordinate system are used as the image vertex coordinates;
performing coordinate transformation on the point cloud data according to the transformation matrix to obtain projected point cloud data; and
performing time alignment on the projected point cloud data and the image data using a weighted fusion algorithm to obtain fused data of the 4D millimeter-wave radar and the visible light camera, which includes: determining the frame rates of the 4D millimeter-wave radar and the visible light camera based on their acquisition times; dividing the projected point cloud data and the image data into main frames and sub-frames based on their respective frame rates; selecting the two sub-frames closest to a main frame and fusing them using the weighted fusion algorithm to obtain a fused sub-frame; forming a data pair from the fused sub-frame and the main frame; and fusing the data pairs to obtain the fused data of the 4D millimeter-wave radar and the visible light camera.
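The loss evaluation recited in the iterative optimization step can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that the projected two-dimensional points and the triangle vertices are (x, y) pairs in the same image plane, that the weighted loss is a per-side weighted sum of the perpendicular distances, and that each distance is measured to the infinite line carrying the side. The function names and the weights argument are hypothetical, since the claim does not fix the form of the weights.

```python
import math


def point_to_line_distance(p, a, b):
    """Perpendicular distance from 2D point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        raise ValueError("degenerate triangle side")
    # |cross(b - a, p - a)| / |b - a|
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length


def weighted_triangle_loss(points, triangle, weights=(1.0, 1.0, 1.0)):
    """Sum of weighted distances from each point to the three triangle sides.

    weights holds one assumed weight per side, in the order AB, BC, CA.
    """
    a, b, c = triangle
    sides = ((a, b), (b, c), (c, a))
    total = 0.0
    for p in points:
        for w, (u, v) in zip(weights, sides):
            total += w * point_to_line_distance(p, u, v)
    return total


# A 3-4-5 right triangle; (1, 1) is its incenter, at distance 1 from every side.
triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
loss = weighted_triangle_loss([(1.0, 1.0)], triangle)
```

An optimizer would re-project the three-dimensional spatial points with the updated rotation and translation after each iteration and stop once this total loss reaches the preset target loss.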

2. The 4D millimeter-wave radar and visible light camera data fusion method according to claim 1, characterized in that performing spatial analysis on the point cloud data and the image data respectively to obtain the image vertex coordinates of the image data in the pixel coordinate system and the radar point spatial coordinates of the point cloud data in the radar coordinate system includes:
performing distortion correction on the image data based on the preset camera intrinsic parameters and the preset distortion coefficients to obtain distortion-corrected image data;
extracting the preset image region in the distortion-corrected image data to obtain multiple vertices, wherein the vertices are the contour vertices of the radar corner reflector in the image data;
using the coordinates of the vertices in the pixel coordinate system as the image vertex coordinates; and
applying intensity filtering to the point cloud data to obtain the radar point spatial coordinates in the radar coordinate system.
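The intensity filtering step can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that each radar point is an (x, y, z, intensity) tuple and that points at or above a fixed threshold are kept as candidate corner reflector returns. The tuple layout, the function name filter_by_intensity, and the threshold value are hypothetical.

```python
def filter_by_intensity(points, min_intensity):
    """Keep the (x, y, z) coordinates of points whose intensity passes the threshold.

    points is an iterable of (x, y, z, intensity) tuples (assumed layout).
    """
    return [(x, y, z) for x, y, z, intensity in points if intensity >= min_intensity]


cloud = [
    (1.0, 0.5, 0.2, 12.0),   # weak return, e.g. clutter
    (5.0, 0.1, 0.3, 48.0),   # strong return, e.g. a corner reflector
    (7.5, -1.0, 0.0, 30.0),  # return exactly at the threshold
]
radar_points = filter_by_intensity(cloud, min_intensity=30.0)
```

The retained coordinates play the role of the radar point spatial coordinates in the radar coordinate system.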

3. The 4D millimeter-wave radar and visible light camera data fusion method according to claim 1, characterized in that the transformation matrix includes a rotation matrix and a translation matrix, and performing coordinate transformation on the point cloud data according to the transformation matrix to obtain projected point cloud data includes:
for the spatial coordinates of each radar point of the point cloud data in the radar coordinate system, performing attitude correction using the rotation matrix and position compensation using the translation matrix to determine the three-dimensional coordinates of the radar point in the camera coordinate system;
generating a set of three-dimensional points from the three-dimensional coordinates in the camera coordinate system corresponding to the radar point spatial coordinates; and
using the set of three-dimensional points as the projected point cloud data.
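The coordinate transformation of this claim corresponds to a rigid transform, p_camera = R * p_radar + t, and can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that the rotation matrix is given as three rows of three numbers and the translation as three numbers. The function name radar_to_camera is hypothetical.

```python
def radar_to_camera(points, rotation, translation):
    """Apply p_camera = R * p_radar + t to every radar point.

    rotation is a 3x3 matrix given as three rows; translation has three entries.
    """
    transformed = []
    for x, y, z in points:
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + offset  # rotate, then shift
            for row, offset in zip(rotation, translation)
        ))
    return transformed


# Example: a 90-degree rotation about the z axis followed by a shift of (1, 2, 3).
R = ((0.0, -1.0, 0.0),
     (1.0, 0.0, 0.0),
     (0.0, 0.0, 1.0))
t = (1.0, 2.0, 3.0)
projected_cloud = radar_to_camera([(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)], R, t)
```

The returned list corresponds to the set of three-dimensional points used as the projected point cloud data.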

4. A 4D millimeter-wave radar and visible light camera data fusion system applied to the 4D millimeter-wave radar and visible light camera data fusion method according to claim 1, characterized in that the 4D millimeter-wave radar and visible light camera data fusion system comprises:
a data acquisition module, used to acquire point cloud data from the 4D millimeter-wave radar and image data from the visible light camera;
a data alignment module, used to perform spatial analysis on the point cloud data and the image data respectively to obtain the image vertex coordinates of the image data in the pixel coordinate system and the radar point spatial coordinates of the point cloud data in the radar coordinate system, and to perform data alignment processing based on the image vertex coordinates and the radar point spatial coordinates using the Pntri algorithm to obtain the transformation matrix between the 4D millimeter-wave radar and the visible light camera; and
a time alignment module, used to perform coordinate transformation on the point cloud data according to the transformation matrix to obtain projected point cloud data, and to perform time alignment on the projected point cloud data and the image data through a weighted fusion algorithm to obtain fused data of the 4D millimeter-wave radar and the visible light camera.

5. An electronic device, characterized in that it comprises a memory and a processor;
the memory is used to store a computer program; and
the processor is configured to implement the 4D millimeter-wave radar and visible light camera data fusion method according to any one of claims 1-3 when executing the computer program.