Millimeter-wave radar inertial navigation positioning and mapping methods, equipment and media

By employing a self-supervised learning method and utilizing rotation angle difference and U-Net network to enhance the radar spectral structure, sub-pixel-level landmark matching and velocity estimation are performed. This solves the robustness problem of millimeter-wave radar inertial navigation positioning schemes in complex environments, achieving stable and reliable positioning and mapping.

CN120446947B · Active · Publication Date: 2026-06-30 · HUAZHONG UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HUAZHONG UNIV OF SCI & TECH
Filing Date: 2025-04-18
Publication Date: 2026-06-30


Abstract

This application designs a rotation-based heterogeneous information cross-fusion scheme to enhance the spatial structure information of millimeter-wave radar inter-frame spectra and alleviate the positional ambiguity caused by limited angular resolution in spatial landmark feature extraction. A spatially consistent landmark extraction and association scheme is designed to extract robust and dense landmark feature point clouds from millimeter-wave radar spectra rich in external spatial information, effectively reducing the impact of millimeter-wave point cloud sparsity and clutter. A differentiable ego-velocity estimation method is proposed to accurately extract sub-pixel-level radial velocities from Doppler spectra and precisely estimate ego-velocities. A loss function based on geometric, velocity consistency, and Doppler velocity constraints is constructed to train the landmark extraction network in a self-supervised manner, thereby achieving accurate localization and robust mapping. This application can seamlessly adapt to complex and extensive traffic and driving environments without ground truth supervision of localization, and achieve stable and reliable state estimation and landmark extraction.

Description

Technical Field

[0001] This application relates to the field of autonomous navigation and environmental perception technology for unmanned systems, and more specifically, to a millimeter-wave radar inertial navigation positioning and mapping method, device and medium.

Background Technology

[0002] With the rapid development of artificial intelligence technology, autonomous driving has become one of the most promising innovation directions in the global transportation field. The core objective of autonomous driving is to achieve autonomous vehicle operation through perception, decision-making, and control, and high-precision positioning and environmental reconstruction technologies are the key foundation for achieving this goal. In complex and dynamic traffic scenarios, vehicles need to acquire their own pose (position and attitude) in real time and construct a 3D map of the surrounding environment to ensure safe path planning and obstacle avoidance. Traditional positioning systems rely on GNSS, LiDAR, or camera sensors; however, these sensors are easily affected by signal blockage, changes in lighting, or inclement weather, leading to positioning drift or mapping failure. Therefore, achieving all-weather, highly reliable, and robust positioning and mapping technologies has become a core challenge for the implementation of autonomous driving and a prerequisite for ensuring its safety and reliability.

[0003] With the development of radar imaging technology, integrated positioning systems that combine millimeter-wave radar (mmWave radar) with an inertial navigation system (INS) have shown unique application potential. GNSS signals are easily blocked, LiDAR is expensive and penetrates poorly, and cameras are sensitive to lighting; by contrast, millimeter-wave radar, with its all-weather operation capability (penetrating rain, fog, and dust) and low-cost mass production, has become a reliable choice for positioning and perception in complex environments. As a low-cost proprioceptive sensor, the inertial navigation system is unaffected by external weather and light and provides vehicles with high-rate acceleration and angular velocity measurements. This technical approach significantly reduces the dependence of autonomous driving on high-cost hardware while improving the system's adaptability in extreme weather and complex road conditions. It provides an economical and robust solution for the large-scale commercial deployment of autonomous driving and is an important driving force for the industry's move toward L4/L5 advanced autonomous driving.

[0004] Traditional millimeter-wave radar inertial navigation positioning schemes mainly follow three directions. (1) The first uses the observations of the IMU (Inertial Measurement Unit) in the state propagation equation and solves the odometry of the radar inertial navigation system by scan matching between millimeter-wave radar frames. However, this scheme requires high spatial resolution from the millimeter-wave radar and is not suitable for low-cost single-chip radars. (2) The second uses the proprioceptive IMU to model the state propagation process of the vehicle, uses the Doppler velocity of the millimeter-wave radar as the observation constraint, and fuses the multi-source information by optimizing the model with a Kalman filter or a probabilistic factor graph. However, this scheme is limited by the quality of the millimeter-wave radar feature point cloud. Because the size of commercial radar hardware is limited, the acquired point cloud is usually very sparse and is contaminated by clutter points caused by multipath and specular reflection effects, so the scheme has low robustness and is difficult to apply to a wide range of complex driving scenarios. (3) The third uses deep neural networks for end-to-end training and learning. However, this method requires collecting a wide range of training samples and a large amount of manual annotation, and its generalization is difficult to guarantee.

[0005] Therefore, how to achieve more stable and reliable radar inertial navigation positioning and mapping in more complex and extensive environments is a technical problem that all-weather autonomous driving systems urgently need to solve.

Summary of the Invention

[0006] To address at least one deficiency or improvement requirement of the prior art, this application provides a millimeter-wave radar inertial navigation positioning and mapping method, device and medium to achieve more stable and reliable radar inertial navigation positioning and mapping in more complex and extensive environments.

[0007] To achieve the above objectives, in a first aspect, this application provides a millimeter-wave radar inertial navigation positioning and mapping method, comprising:

[0008] By synchronizing millimeter-wave radar data frames through inter-frame pre-integration of the inertial measurement unit, a dot product attention mechanism is constructed using the rotation angle difference to generate the expectation matrix between adjacent radar frames, thereby achieving cross-fusion of heterogeneous information to enhance the spatial structure information of the radar spectrum.

[0009] The U-Net backbone network is used to extract multi-scale features from the fused radar spectrogram. The multi-head decoding structure outputs the spatial location, confidence weight and feature descriptor of the landmarks at the sub-pixel level, and the landmark feature matching is completed by combining the cross-frame attention mechanism.

[0010] The radial velocity of candidate landmarks is estimated based on the sub-pixel-level Doppler velocity soft query algorithm. The instantaneous ego-velocity is then calculated by combining the geometric relationship between the radial velocity of static landmarks and the radar ego-velocity.

[0011] A joint loss function containing geometric consistency constraints, Doppler velocity constraints, and velocity consistency constraints is constructed to train the feature extraction network in a self-supervised manner.

[0012] The trained feature extraction network is used to extract spatially consistent radar landmark point clouds, and the ego-velocity estimation results are fused to achieve real-time localization and mapping.

[0013] Furthermore, the inter-frame pre-integration of the inertial measurement unit includes:

[0014] Based on the linear acceleration measurements and angular velocity measurements of the inertial measurement unit, together with the linear acceleration measurement bias $b_a$ and the angular velocity measurement bias $b_\omega$, the relative position, velocity, and attitude quaternion between adjacent radar frames are calculated. The specific formulas include:

[0015]

[0016] In these formulas, the remaining terms are the rotation matrix in the coordinate system of the inertial measurement unit, the quaternion multiplication operator, and the change in relative attitude of the inertial measurement unit from frame k to the current time t.
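
The pre-integration formulas themselves are not reproduced in this text. As a rough illustration only, the sketch below uses a common discrete (Euler) form of IMU pre-integration: bias-corrected accelerations are rotated into the frame-k IMU coordinate system and accumulated into a relative position and velocity, while bias-corrected angular rates are accumulated into a relative attitude quaternion. The function names (`quat_mul`, `quat_rotate`, `preintegrate`), the (w, x, y, z) quaternion layout, the fixed sample period `dt`, and the omission of gravity and noise terms are all assumptions made for the example, not details taken from this application.

```python
import math


def quat_mul(q, r):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def quat_rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q."""
    q_conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), q_conj)
    return (x, y, z)


def preintegrate(samples, dt, b_a, b_w):
    """Accumulate relative position, velocity and attitude between two radar frames.

    samples: list of (accel, gyro) 3-tuples measured between the two frames.
    b_a, b_w: accelerometer and gyroscope biases (3-tuples).
    """
    p = [0.0, 0.0, 0.0]
    v = [0.0, 0.0, 0.0]
    q = (1.0, 0.0, 0.0, 0.0)  # attitude relative to frame k
    for accel, gyro in samples:
        # Bias-corrected acceleration expressed in the frame-k IMU system.
        a = quat_rotate(q, tuple(accel[i] - b_a[i] for i in range(3)))
        for i in range(3):
            p[i] += v[i] * dt + 0.5 * a[i] * dt * dt
            v[i] += a[i] * dt
        # First-order quaternion update from the bias-corrected angular rate.
        w = tuple(gyro[i] - b_w[i] for i in range(3))
        q = quat_mul(q, (1.0, 0.5 * w[0] * dt, 0.5 * w[1] * dt, 0.5 * w[2] * dt))
        norm = math.sqrt(sum(c * c for c in q))
        q = tuple(c / norm for c in q)
    return p, v, q
```

Calling `preintegrate` on the IMU samples that fall between two radar timestamps yields one relative pose increment per radar frame pair, which is how the different IMU and radar sampling rates can be reconciled.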

[0017] Furthermore, the implementation methods of the dot product attention mechanism include:

[0018] For the radar spectrum of the k-th frame, the azimuth angles are represented by $\eta = (\eta_1, \ldots, \eta_n)$. The difference between these azimuth angles and the rotation angle is used as the metric to generate the expectation vector from frame k to frame (k+1):

[0019]

[0020] Where κ represents the annealing coefficient. The expectation matrix is adjusted through a scale alignment operation and then merged with the original spectrum to form an enhanced spectrum.

[0021] Furthermore, the multi-head decoding structure outputs sub-pixel level spatial location, confidence weights, and feature descriptors of landmarks, including:

[0022] Within each local pixel block of the radar spectrum, a Softmax-weighted summation is performed according to the detection score map. The specific formula is:

[0023]

[0024] where $(u_{ij}, v_{ij})$ represents pixel coordinates and $(u_k, v_k)$ represents sub-pixel coordinates;

[0025] The confidence head of the multi-head decoding structure outputs normalized confidence weights, which characterize the confidence that each pixel is selected as a candidate landmark;

[0026] The descriptor head of the multi-head decoding structure collects features from the encoder backbone blocks and uses them as descriptors to uniquely identify candidate landmark features.

[0027] Furthermore, landmark feature matching is achieved by combining cross-frame attention mechanisms, including:

[0028] Radar candidate landmark features between frames are matched using an attention mechanism:

[0029]

[0030] where $V'_k$ represents the position matrix of candidate landmarks in the target frame; $V$ represents the position matrix containing all pixel coordinates; the query matrix consists of the N candidate landmark descriptors in the (k-1)-th frame; the key matrix is composed of the pixel descriptors of the entire image in the k-th frame; κ represents the annealing coefficient that adjusts the sharpness of the attention weight distribution; and H and W represent the range height and azimuth width of the radar spectrum, respectively.

[0031] Furthermore, the radial velocity of candidate landmarks is estimated based on the sub-pixel Doppler velocity soft query algorithm, including:

[0032] Based on the sub-pixel coordinates of candidate landmarks, the radial velocity $v_r$ is extracted from the Doppler spectrum using a weighted soft query:

[0033]

[0034] In this formula, the two inputs are the distance matrix between the location coordinates of the candidate landmarks and the pixel coordinates of the spectral image, and the pixel-level Doppler velocity.

[0035] Furthermore, the instantaneous ego velocity is calculated using a differentiable optimization algorithm, including:

[0036] The geometric relationship between the radial velocity of static features and the radar ego-velocity is modeled as a linear equation GX = B and solved using the weighted least squares method:

[0037] $X = (G^{T} G)^{-1} G^{T} B$

[0038] Where X represents the instantaneous ego-velocity; G represents the direction vector matrix; and B represents the observed velocity vector.

[0039] Furthermore, the geometric consistency constraint is defined as:

[0040]

[0041] where the symbols denote, in order: the number of training batches; N, the number of landmarks extracted from a single frame of radar point cloud; the confidence weight of the i-th landmark in the k-th frame; the geometric residual $e_i$; the rotation matrix from frame k to frame (k+1); the 3D coordinates of the i-th landmark in the radar coordinate system within the k-th frame; the translation vector from frame k to frame (k+1); and the 3D coordinates, in the radar coordinate system of the (k+1)-th frame, of the landmark matched to the i-th landmark;

[0042] The Doppler velocity constraint is defined as:

[0043]

[0044] Where G represents the observation matrix of the Doppler velocity of static landmarks; I represents the identity matrix, whose dimension equals the number of rows of G;

[0045] The velocity consistency constraint is defined as:

[0046]

[0047] where ${}^{I}v_k$ represents the ego-motion velocity in the inertial measurement unit coordinate system at frame k; the next two terms are the change in velocity from frame k to frame (k+1) and the rotation matrix from frame (k+1) to frame k; and ${}^{I}v_{k+1}$ represents the velocity measured by the inertial measurement unit at frame k+1.
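
The velocity consistency formula is not reproduced in this text. One plausible reading of the definitions above, stated here purely as an assumption, is that the velocity at frame k plus the pre-integrated velocity change should equal the velocity at frame (k+1) once the latter is rotated back into the frame-k coordinate system. The sketch below computes the norm of that mismatch; the function name `velocity_consistency_residual`, the Euclidean norm, and the absence of any gravity or bias term are hypothetical choices for illustration.

```python
import math


def velocity_consistency_residual(v_k, delta_v, R_k1_to_k, v_k1):
    """Norm of v_k + delta_v - R * v_{k+1}, all expressed in the frame-k IMU system.

    v_k, v_k1: ego-velocities at frames k and k+1 (3-tuples).
    delta_v: pre-integrated velocity change from frame k to frame k+1.
    R_k1_to_k: 3x3 rotation matrix (nested lists) from frame k+1 to frame k.
    """
    # Express the frame-(k+1) velocity in the frame-k coordinate system.
    rotated = [sum(R_k1_to_k[r][c] * v_k1[c] for c in range(3)) for r in range(3)]
    diff = [v_k[r] + delta_v[r] - rotated[r] for r in range(3)]
    return math.sqrt(sum(d * d for d in diff))
```

A residual near zero indicates that the radar-derived ego-velocities at the two frames agree with the IMU-propagated velocity change; larger values would be penalized during training under this reading.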

[0048] In a second aspect, this application provides an electronic device including at least one processing unit and at least one storage unit, wherein the storage unit stores a computer program that, when executed by the processing unit, enables the processing unit to perform the steps of the millimeter-wave radar inertial navigation positioning and mapping method described in any of the preceding claims.

[0049] In a third aspect, this application provides a storage medium storing a computer program executable by an access authentication device, which, when run on the access authentication device, enables the access authentication device to perform the steps of the millimeter-wave radar inertial navigation positioning and mapping method described in any of the preceding claims.

[0050] In summary, compared with the prior art, the above-described technical solutions conceived in this application can achieve the following beneficial effects:

[0051] This application designs the first self-supervised learning-based millimeter-wave radar inertial navigation localization and mapping technology, enabling more stable and reliable radar inertial localization and mapping in more complex and extensive environments by utilizing millimeter-wave radar spectra rich in external spatial information and IMU data containing high-frequency inertial information. Specifically, firstly, this application designs a rotation-based heterogeneous information cross-fusion scheme to enhance the spatial structure information of millimeter-wave radar inter-frame spectra and alleviate the positional ambiguity caused by limited angular resolution in spatial landmark feature extraction. Secondly, this application designs a spatially consistent landmark extraction and association scheme, extracting robust and dense landmark feature point clouds from millimeter-wave radar spectra rich in external spatial information, thereby effectively reducing the impact of millimeter-wave point cloud sparsity and clutter. Thirdly, this application proposes a differentiable ego-velocity estimation method, which can accurately extract sub-pixel-level radial velocities from Doppler spectra and accurately estimate ego-velocities. Fourthly, this application constructs a loss function based on geometric consistency, velocity consistency, and Doppler velocity constraints to train the landmark extraction network in a self-supervised manner, thereby achieving accurate localization and robust mapping. Compared with existing millimeter-wave radar inertial navigation positioning and mapping technologies, the self-supervised radar inertial navigation positioning and mapping scheme proposed in this application can be seamlessly adapted to complex and extensive traffic and driving environments without ground truth supervision of positioning, and achieves stable and reliable state estimation and landmark extraction.

Attached Figure Description

[0052] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0053] Figure 1 is a core flowchart of a millimeter-wave radar inertial navigation positioning and mapping method provided in an embodiment of this application;

[0054] Figure 2 is a schematic block diagram illustrating the three key steps of a millimeter-wave radar inertial navigation positioning and mapping method provided in an embodiment of this application;

[0055] Figure 3 is a flowchart illustrating the entire process of a millimeter-wave radar inertial navigation positioning and mapping method provided in an embodiment of this application;

[0056] Figure 4 is a schematic block diagram illustrating the entire process of a millimeter-wave radar inertial navigation positioning and mapping method provided in an embodiment of this application;

[0057] Figure 5 is a block diagram of an electronic device suitable for implementing the millimeter-wave radar inertial navigation positioning and mapping method described above, provided in an embodiment of this application.

Detailed Implementation

[0058] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application. Furthermore, the technical features involved in the various embodiments described below can be combined with each other as long as they do not conflict with each other.

[0059] The terms "comprising" or "having," and any variations thereof, in the specification, claims, or drawings of this application are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the steps or units listed, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to such processes, methods, products, or apparatus.

[0060] As described in the background section of this specification, traditional millimeter-wave radar inertial navigation positioning schemes mainly follow three directions. (1) The first uses IMU observations in the state propagation equation and calculates the odometry of the radar inertial navigation system by scan matching between millimeter-wave radar frames. However, this approach requires high spatial resolution from the millimeter-wave radar and is not suitable for low-cost single-chip radars. (2) The second uses the proprioceptive IMU to model the state propagation process of the vehicle, uses the Doppler velocity of the millimeter-wave radar as the observation constraint, and fuses the multi-source information by optimizing the model with a Kalman filter or a probabilistic factor graph. However, this approach is limited by the quality of the millimeter-wave radar feature point cloud. Because the size of commercial radar hardware is limited, the acquired point cloud is usually very sparse and is contaminated by clutter points caused by multipath and specular reflection effects, so the approach has low robustness and is difficult to apply to a wide range of complex driving scenarios. (3) The third uses deep neural networks for end-to-end training and learning. However, this method requires collecting a wide range of training samples and a large amount of manual annotation, and its generalization is difficult to guarantee. Therefore, achieving more stable and reliable radar inertial navigation positioning and mapping in more complex and widespread environments is a crucial technical issue that all-weather autonomous driving systems urgently need to address. In view of this, this application proposes a millimeter-wave radar inertial navigation positioning and mapping method, device, and medium to achieve more stable and reliable radar inertial navigation positioning and mapping in more complex and widespread environments.

[0061] Referring to Figures 1-4, one embodiment of this application proposes a millimeter-wave radar inertial navigation localization and mapping method based on self-supervised learning, which may specifically include the following steps.

[0062] Step 1: Synchronize millimeter-wave radar data frames through inter-frame pre-integration of the inertial measurement unit, construct a dot product attention mechanism using rotation angle difference, generate the expectation matrix between adjacent radar frames, and realize the cross-fusion of heterogeneous information to enhance the spatial structure information of the radar spectrum.

[0063] Commercial single-chip radars typically have only a limited number of antennas, resulting in low angular resolution for millimeter-wave radars, making it difficult to extract reliable landmarks directly from radar spectra. To address this, this application designs a rotation-based heterogeneous information cross-fusion scheme to enhance the spatial structure information of millimeter-wave radar inter-frame spectra. Specifically, this application utilizes the linear mapping relationship between the spatial sensing spectra of millimeter-wave radar and IMU inertial navigation data on the rotation component, and augments the target positions in the spectra based on the angular information provided by the inertial navigation system, thereby alleviating the positional ambiguity caused by the limited angular resolution in feature extraction of spatial landmarks.

[0064] More specifically, considering the different sampling rates of the IMU and the radar, it is necessary to synchronize the millimeter-wave radar data frames through IMU inter-frame pre-integration before performing rotation-based heterogeneous information cross-fusion.

[0065]

[0066] In these formulas, the first two quantities are the linear acceleration and angular velocity measurements of the inertial measurement unit, respectively; $b_a$ and $b_\omega$ are the linear acceleration measurement bias and the angular velocity measurement bias, respectively. The remaining terms are the rotation matrix in the coordinate system of the inertial measurement unit, the quaternion multiplication operator, and the change in relative attitude of the inertial measurement unit from the k-th frame to the current time t. The three outputs are the relative position, velocity, and attitude quaternion between adjacent radar frames, respectively. Next, let $M_k$ be the millimeter-wave radar energy spectrum of the k-th frame, where H and W represent the range height and azimuth width of the radar spectrum, respectively. Unlike the range dimension, which has uniform linear resolution, the angular resolution of the radar spectrum is non-linear; accordingly, the angles along the azimuth dimension of the energy spectrum $M_k$ are represented as $\eta = (\eta_1, \ldots, \eta_n)$. When the radar rotates by some rotation angle within a relatively short time, such as 0.1 seconds, a static landmark that lies in the θ direction in the radar coordinate system of the k-th frame lies in a correspondingly rotated direction in the (k+1)-th frame. The key step in the cross-fusion of heterogeneous information is therefore to construct the expectation vector that transforms frame k to frame (k+1), by designing a dot-product attention mechanism that uses the angle difference as its metric:

[0067]

[0068] Where κ represents the annealing coefficient. Then, the expectation matrix is adjusted through a scale alignment operation and merged with the original spectrum to form an enhanced spectrum, where:

[0069]

[0070] Similarly, the formula for the expectation matrix provided by the (k+1)th frame to act on the kth frame is:

[0071] The rotation-based cross-fusion operation in step 1 above significantly enhances the spatial structure information of the inter-frame spectrogram of millimeter-wave radar, and alleviates the positional ambiguity caused by the limited angular resolution in the feature extraction of spatial landmarks.
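
The attention formula for step 1 is not reproduced in this text, so the following is only a sketch of the idea under stated assumptions. For each target azimuth bin in frame (k+1), it scores every source bin of frame k by how close the source angle, shifted by the IMU rotation angle, lands to the target angle, and turns the scores into weights with a Softmax sharpened by κ. The sign convention (a static landmark at angle η in frame k appears at η minus the rotation angle in frame k+1), the use of a negative absolute difference as the score, and the names `rotation_attention` and `expected_spectrum` are all hypothetical. The scale alignment and merge operations are not shown.

```python
import math


def softmax(scores):
    """Numerically stable Softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rotation_attention(eta, delta_theta, kappa):
    """Return a W x W weight matrix; row j weights the source bins for target bin j."""
    weights = []
    for eta_j in eta:
        # Score is highest where the rotated source angle coincides with the target angle.
        scores = [-abs((eta_i - delta_theta) - eta_j) / kappa for eta_i in eta]
        weights.append(softmax(scores))
    return weights


def expected_spectrum(spectrum, weights):
    """Apply the azimuth weights to every range row of an H x W spectrum."""
    return [
        [sum(w * row[i] for i, w in enumerate(w_j)) for w_j in weights]
        for row in spectrum
    ]
```

With a small κ the weights approach a hard shift of the spectrum along the azimuth axis; a larger κ spreads each source bin over neighbouring target bins, which is one way the non-uniform angular grid can be tolerated.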

[0072] Step 2: Use the U-Net backbone network to extract multi-scale features from the fused radar spectrogram. Output sub-pixel level spatial location, confidence weight and feature descriptor of landmarks through a multi-head decoding structure, and complete landmark feature matching by combining cross-frame attention mechanism.

[0073] Traditional radar inertial navigation positioning schemes are based on sparse point clouds processed on-board. However, due to the large proportion of clutter in these point clouds, traditional scan matching or velocity estimation schemes are difficult to operate normally. To address this, this application designs a spatially consistent landmark extraction and association mechanism. It uses a feature extraction network to extract the spatial location, confidence level, and descriptor of landmark features from the millimeter-wave radar spatial spectrum map, and aligns candidate landmark features through a differentiable feature matching mechanism. Finally, the candidate features are mapped to a Cartesian coordinate system for subsequent velocity estimation.

[0074] Specifically, after rotation-based cross-fusion, feature extraction and spatial feature point matching are performed on the enhanced, geometrically consistent radar spectrogram to generate candidate landmark features across adjacent time frames. More specifically, U-Net is used as the backbone network for feature extraction, with the enhanced radar spectrogram as the network input. A multi-head architecture is designed to decode the spatial location of feature points, their confidence as candidate landmark features, and the descriptor of each point. The location head outputs a detection score map L. To fully utilize the signal energy sidelobe information in the RAS, sub-pixel coordinates are extracted within each pixel block of the detection score map L generated from $M_k$. Within each local pixel block of the radar spectrogram, a Softmax-weighted summation is performed according to the detection score map. The specific formula is:

[0075]

[0076] where $(u_{ij}, v_{ij})$ represents pixel coordinates and $(u_k, v_k)$ represents sub-pixel coordinates.
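
The Softmax-weighted summation described above is commonly implemented as a soft-argmax over each patch. The sketch below is an illustration under assumptions rather than the formula of this application: it treats the detection scores of one local pixel block as logits, converts them to weights, and returns the weighted mean of the pixel coordinates as the sub-pixel landmark position. The function name `subpixel_coordinate`, the patch layout (rows are v, columns are u), and the offsets `u0`, `v0` are hypothetical.

```python
import math


def subpixel_coordinate(score_patch, u0=0, v0=0):
    """Softmax-weighted mean of pixel coordinates inside one patch.

    score_patch[r][c] is the detection score of pixel (u0 + c, v0 + r).
    Returns the sub-pixel coordinate (u, v).
    """
    peak = max(s for row in score_patch for s in row)
    total = u_acc = v_acc = 0.0
    for r, row in enumerate(score_patch):
        for c, score in enumerate(row):
            w = math.exp(score - peak)  # subtract the peak for numerical stability
            total += w
            u_acc += w * (u0 + c)
            v_acc += w * (v0 + r)
    return u_acc / total, v_acc / total
```

Because the output is a smooth function of the scores, gradients can flow from the landmark position back into the detection head, which a hard argmax would not allow.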

[0077] The second head (the confidence head of the multi-head decoding structure) outputs normalized confidence weights, which represent the confidence that each pixel is selected as a candidate landmark. The influence of clutter points with high reflectivity can be eliminated by using the confidence score.

[0078] The third head (the descriptor head of the multi-head decoding structure) collects features from the encoder backbone blocks and uses them as descriptors to uniquely identify candidate landmark features.

[0079] Next, the radar candidate landmark features between frames are matched using an attention mechanism:

[0080]

[0081] where $V'$ represents the position matrix of candidate landmarks in the target frame, and $V$ represents the position matrix containing all pixel coordinates. The two descriptor sets are those of the candidate landmarks in frame (k-1) and of all pixels in frame k, respectively: the query matrix consists of the N candidate landmark descriptors in the (k-1)-th frame, and the key matrix is composed of the pixel descriptors of the entire image in the k-th frame. κ represents the annealing coefficient (i.e., temperature coefficient) that adjusts the sharpness of the attention weight distribution.
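
As a hedged illustration of the matching step, the sketch below applies standard temperature-scaled dot-product attention: each candidate descriptor from frame (k-1) is compared with every pixel descriptor of frame k, the similarities are passed through a Softmax divided by κ, and the matched position is the resulting weighted average of pixel coordinates. The exact formula is not reproduced in this text, so the scaling by κ alone, the plain-list data layout, and the name `match_landmarks` are assumptions.

```python
import math


def match_landmarks(queries, keys, positions, kappa):
    """Soft matching of landmark descriptors against all pixel descriptors.

    queries: N descriptors of candidate landmarks in frame k-1.
    keys: P descriptors, one per pixel of frame k (P = H * W).
    positions: P pixel coordinates (u, v), aligned with keys.
    Returns N matched sub-pixel positions in frame k.
    """
    matched = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / kappa for k in keys]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        norm = sum(exps)
        u = sum(e * p[0] for e, p in zip(exps, positions)) / norm
        v = sum(e * p[1] for e, p in zip(exps, positions)) / norm
        matched.append((u, v))
    return matched
```

A smaller κ makes each match concentrate on the single most similar pixel, while a larger κ averages over several similar pixels; the whole operation stays differentiable either way.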

[0082] Through step 2 above, robust and dense landmark feature point clouds can be extracted from millimeter-wave radar spectra rich in external spatial information, thereby effectively reducing the impact of sparsity and clutter point clouds in millimeter-wave point clouds.

[0083] Step 3: Estimate the radial velocity of candidate landmarks based on the subpixel-level Doppler velocity soft query algorithm, and combine the geometric relationship between the radial velocity of static landmarks and radar ego velocity to solve the instantaneous ego velocity through a differentiable optimization algorithm.

[0084] To achieve accurate velocity estimation for radar inertial navigation in a self-supervised manner, this application further designs a differentiable self-velocity estimation method for estimating motion states. Specifically, for the first time, differentiable sub-pixel-level Doppler velocities are extracted from the radar's Doppler spectrum based on the positions of candidate features. Then, instantaneous self-velocity estimation is achieved based on a differentiable weighted least squares algorithm. During the inference phase, some clutter points and dynamic feature point clouds are filtered out through Doppler velocity constraints, thereby obtaining robust landmark features.

[0085] Specifically, to estimate the motion state of the millimeter-wave radar, the Doppler velocity of candidate landmarks needs to be estimated first. Since the candidate coordinates are at the sub-pixel level, they cannot be looked up directly in the original Doppler spectrum. Therefore, this application designs a sub-pixel-level Doppler extraction scheme, similar to landmark feature extraction from the spectrum. A weighted soft query is performed using the positions of candidate landmarks to extract the radial velocity $v_r$ from the Doppler spectrum:

[0086]

[0087] In this formula, the two inputs are the distance matrix between the location coordinates of the candidate landmarks and the pixel coordinates of the spectral image, and the pixel-level Doppler velocity.
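
A minimal sketch of such a soft query, assuming a Softmax over negative pixel distances, is shown below. The landmark's distance to every pixel centre is turned into a weight, and the radial velocity is the weighted average of the pixel-level Doppler values. The temperature `tau`, the Euclidean distance, and the name `soft_doppler_query` are hypothetical; the actual weighting in this application is not reproduced in the text.

```python
import math


def soft_doppler_query(landmark, doppler_map, tau=0.1):
    """Distance-weighted lookup of Doppler velocity at a sub-pixel position.

    landmark: (u, v) sub-pixel coordinate of a candidate landmark.
    doppler_map[r][c]: pixel-level Doppler velocity at pixel (u=c, v=r).
    tau: assumed temperature; smaller values approach a nearest-pixel lookup.
    """
    lu, lv = landmark
    scores, values = [], []
    for r, row in enumerate(doppler_map):
        for c, velocity in enumerate(row):
            scores.append(-math.hypot(c - lu, r - lv) / tau)
            values.append(velocity)
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)
```

For a landmark halfway between two pixels the result is the mean of their Doppler values, which is the interpolation behaviour a hard index into the spectrum cannot provide.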

[0088] After obtaining the spatial coordinates of the candidate landmarks, the method uses the property that, for a static feature, the radial velocity relative to the radar is the component of the radar ego-velocity ${}^{R}v$ along the target direction vector $[\cos\alpha_i, \sin\alpha_i]^{T}$, namely:

[0089]

[0090] The above equation can be rewritten as GX = B, and the instantaneous ego-velocity is then $X = (G^{T} G)^{-1} G^{T} B$, where X represents the instantaneous ego-velocity; G represents the direction vector matrix; and B represents the observed velocity vector.
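
The least-squares solve can be sketched for the planar case as follows. Each landmark contributes one row [cos α, sin α] of G and one entry of B, and the 2x2 normal equations are solved in closed form. Several details are assumptions for the example: the sign convention (the measured radial velocity of a static landmark is taken as the negative projection of the ego-velocity onto its direction), the optional per-landmark weights (with unit weights the result equals the unweighted formula above), and the name `estimate_ego_velocity`.

```python
import math


def estimate_ego_velocity(angles, radial_velocities, weights=None):
    """Solve G X = B for the 2D ego-velocity by (weighted) least squares.

    angles: azimuth angle alpha_i of each landmark, in radians.
    radial_velocities: measured radial velocity v_r of each landmark.
    weights: optional non-negative confidence per landmark (default: all 1).
    """
    if weights is None:
        weights = [1.0] * len(angles)
    a11 = a12 = a22 = b1 = b2 = 0.0
    for alpha, v_r, w in zip(angles, radial_velocities, weights):
        c, s = math.cos(alpha), math.sin(alpha)
        b = -v_r  # assumed sign convention for a static landmark
        # Accumulate G^T W G and G^T W B.
        a11 += w * c * c
        a12 += w * c * s
        a22 += w * s * s
        b1 += w * c * b
        b2 += w * s * b
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("landmark directions are degenerate")
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

At least two landmarks in distinct directions are needed for the system to be solvable. Because the solution is a closed-form expression of the inputs, it remains differentiable with respect to the landmark angles, radial velocities, and weights.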

[0091] Then, during the inference phase, some clutter points and dynamic feature point clouds are filtered out by Doppler velocity constraints, thereby obtaining robust landmark features.

[0092] Step 4: Construct a joint loss function that includes geometric consistency constraints, Doppler velocity constraints, and velocity consistency constraints, and train the feature extraction network in a self-supervised manner.

[0093] This application designs three constraints—geometric consistency constraint, velocity consistency constraint, and Doppler velocity constraint—for self-supervised training of the spatial consistency landmark extraction network (feature extraction network), thereby achieving accurate localization and robust mapping. Specifically, this application constructs a joint loss function based on geometric consistency constraint, velocity consistency constraint, and Doppler velocity constraint for self-supervised training of the spatial consistency landmark extraction network.

[0094] The geometric consistency constraint is defined as:

[0095]

[0096] The quantities in this definition are: the number of training batches; N, the number of landmarks extracted from a single frame of the radar point cloud; the confidence weight of the i-th landmark in the k-th frame, which lies in [0,1] and is typically predicted by the network (the weights of dynamic points, such as moving objects, or of mismatched points tend to 0, while those of static points tend to 1); the geometric residual, which measures the matching error of landmark positions between adjacent frames; the rotation matrix from frame k to frame (k+1), which is estimated by the network or obtained through IMU pre-integration; the 3D coordinates of the i-th landmark of the k-th frame in the radar coordinate system; the translation vector from frame k to frame (k+1); and the 3D coordinates in the radar coordinate system of the matching landmark in frame (k+1).
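A sketch of a loss built from these quantities is given below. The exact formula is not reproduced in this text, so the form used here is an assumption: the residual is taken as the landmark transformed by the rotation and translation minus its match in frame (k+1), and the loss as the confidence-weighted mean of the residual L2 norms. All function names and dictionary keys are hypothetical.

```python
import math

def geometric_residual(R, t, p_k, p_k1):
    """Residual between a landmark moved by (R, t) and its match in frame k+1."""
    moved = [sum(R[r][c] * p_k[c] for c in range(3)) + t[r] for r in range(3)]
    return [moved[r] - p_k1[r] for r in range(3)]

def geometric_consistency_loss(frames):
    """Confidence-weighted mean residual norm.

    frames: list of dicts with keys "R", "t", "points_k", "points_k1", "weights".
    Assumption: L2 norm, averaged over all frames and landmarks.
    """
    total, count = 0.0, 0
    for f in frames:
        for p_k, p_k1, w in zip(f["points_k"], f["points_k1"], f["weights"]):
            e = geometric_residual(f["R"], f["t"], p_k, p_k1)
            total += w * math.sqrt(sum(x * x for x in e))
            count += 1
    return total / count if count else 0.0

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame = {
    "R": identity,
    "t": [1.0, 0.0, 0.0],
    "points_k": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    "points_k1": [[1.0, 0.0, 0.0], [1.0, 3.0, 4.0]],  # second match is off by (0, 3, 4)
    "weights": [1.0, 0.5],
}
loss = geometric_consistency_loss([frame])
```

In this form a low confidence weight shrinks the contribution of a poorly matched landmark, which matches the described behavior of weights tending to 0 for dynamic or mismatched points.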

[0097] The translation component can be obtained by integrating the ego-velocity calculated by the radar. The quantities in this integration are: the rotation matrix from time t to the k-th frame (used for coordinate system alignment); $^{I}v_k$, the ego-motion velocity in the IMU coordinate system at frame k; and the IMU position offset from frame (k+1) to frame k (provided by IMU pre-integration).

[0098] The Doppler velocity constraint is defined as:

[0099]

[0100] Here, G represents the observation matrix of the Doppler velocities of static landmarks, that is, the design matrix built from the Doppler velocity observations of the static landmark point cloud, with each row corresponding to the Doppler velocity and spatial location of one landmark point. If a landmark is static, its Doppler velocity should be linearly related to the radar ego-motion velocity, that is, it should satisfy the column space constraint of G. I represents the identity matrix, with the same number of rows as G.

[0101] The projection matrix $G(G^{T}G)^{-1}G^{T}$ extracts the velocity-consistent component of the static landmarks, and its difference from the identity matrix reflects the deviation caused by dynamic points or mismatched points. Dynamic point clouds are suppressed because they do not satisfy this constraint, whereas the candidate landmark feature point clouds of all static matching results do satisfy it. Through this loss function, the network learns to ignore dynamic feature point clouds and mismatched point pairs during the feature extraction stage, thereby improving the quality of the landmarks.
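The role of the projection matrix can be illustrated numerically. The sketch below assumes that the deviation is measured by applying the difference between the identity matrix and the projection matrix to a vector B of observed radial velocities, and that G has rows [cos α_i, sin α_i]; the exact loss formula is not reproduced in this text, and the function name is hypothetical.

```python
import math

def doppler_constraint_residual(azimuths, radial_velocities):
    """Return the norm of (I - G (G^T G)^-1 G^T) B.

    Assumption: G has rows [cos(a_i), sin(a_i)] and B holds the observed
    radial velocities. The result is the part of B outside the column
    space of G, which is zero for a purely static scene.
    """
    rows = [(math.cos(a), math.sin(a)) for a in azimuths]
    a11 = sum(c * c for c, _ in rows)
    a12 = sum(c * s for c, s in rows)
    a22 = sum(s * s for _, s in rows)
    b1 = sum(c * v for (c, _), v in zip(rows, radial_velocities))
    b2 = sum(s * v for (_, s), v in zip(rows, radial_velocities))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("degenerate geometry: need at least two non-collinear directions")
    x1 = (a22 * b1 - a12 * b2) / det
    x2 = (a11 * b2 - a12 * b1) / det
    # B - G X equals (I - P) B, with P the projection onto the column space of G.
    residual = [v - (c * x1 + s * x2) for (c, s), v in zip(rows, radial_velocities)]
    return math.sqrt(sum(r * r for r in residual))

angles = [0.0, math.pi / 2, math.pi / 4]
static_scene = doppler_constraint_residual(angles, [3.0, 4.0, 7.0 / math.sqrt(2.0)])
with_mover = doppler_constraint_residual(angles, [3.0, 4.0, 9.0])
```

For the static scene the residual vanishes up to rounding, while a single moving landmark produces a clearly nonzero residual, which is the signal the constraint penalizes.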

[0102] The velocity consistency constraint is defined as:

[0103]

[0104] The quantities in this definition are: $^{I}v_k$, the ego-motion velocity in the inertial measurement unit coordinate system at frame k; the change in velocity from frame k to frame (k+1); the rotation matrix from frame (k+1) to frame k; and $^{I}v_{k+1}$, the velocity measured by the inertial measurement unit at frame (k+1).

[0105] By ensuring that the velocity variation calculated by the radar is consistent with the velocity measured by the IMU after rotational alignment, the accumulated error in motion estimation can be eliminated. Furthermore, considering the consistency between the velocity variation provided by the IMU and the ego velocity calculated by the radar, the motion velocity can be further optimized by aligning the ego velocities of the radar and IMU.
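A possible form of the velocity consistency residual is sketched below. Since the formula itself is not reproduced in this text, the expression is an assumption: the velocity at frame k plus the velocity change is compared with the velocity at frame (k+1) after rotating it into frame k, and the velocity change is assumed to be gravity-compensated already. The function name is hypothetical.

```python
import math

def velocity_consistency_residual(v_k, delta_v, R_k1_to_k, v_k1):
    """Norm of v_k + delta_v - R * v_k1 (assumed form of the constraint).

    v_k:       ego-velocity at frame k, in the IMU frame of frame k
    delta_v:   velocity change from frame k to k+1 (assumed gravity-compensated)
    R_k1_to_k: 3x3 rotation matrix from frame k+1 to frame k
    v_k1:      velocity at frame k+1, in the IMU frame of frame k+1
    """
    rotated = [sum(R_k1_to_k[r][c] * v_k1[c] for c in range(3)) for r in range(3)]
    e = [v_k[r] + delta_v[r] - rotated[r] for r in range(3)]
    return math.sqrt(sum(x * x for x in e))

# A 90-degree yaw between the frames: the velocities agree after alignment.
yaw_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
res = velocity_consistency_residual(
    [1.0, 0.0, 0.0], [0.5, 0.0, 0.0], yaw_90, [0.0, -1.5, 0.0]
)
```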

[0106] Step 5: Extract spatially consistent radar landmark point clouds with the trained feature extraction network and fuse the ego-velocity estimation results to achieve real-time localization and mapping.

[0107] Specifically, the aforementioned loss function is used to train a spatially consistent landmark extraction network under self-supervised conditions. During the inference phase, spatially consistent landmarks can be extracted, while simultaneously obtaining the vehicle's precise motion state. This achieves accurate localization and robust mapping without ground truth supervision, thereby ensuring precise, real-time, stable, reliable, and all-weather autonomous driving.

[0108] This application designs the first self-supervised learning-based millimeter-wave radar inertial navigation (INS) localization and mapping technology, enabling more stable and reliable INS localization and mapping in more complex and extensive environments by utilizing millimeter-wave radar spectra rich in external spatial information and IMU data containing high-frequency INS information. Specifically, firstly, this application designs a rotation-based heterogeneous information cross-fusion scheme to enhance the spatial structure information of millimeter-wave radar inter-frame spectra and alleviate the positional ambiguity caused by limited angular resolution in spatial landmark feature extraction. Secondly, this application designs a spatially consistent landmark extraction and association scheme, extracting robust and dense landmark feature point clouds from millimeter-wave radar spectra rich in external spatial information, thereby effectively reducing the sparsity of millimeter-wave point clouds and the influence of clutter point clouds. Thirdly, this application proposes a differentiable self-velocity estimation method, which can accurately extract sub-pixel-level radial velocities from Doppler spectra and accurately estimate self-velocities. Fourthly, this application constructs a loss function based on geometry, velocity consistency, and Doppler velocity constraints to train the landmark extraction network in a self-supervised manner, thereby achieving accurate localization and robust mapping. Compared with existing millimeter-wave radar inertial navigation positioning and mapping technologies, the self-supervised learning radar inertial navigation positioning and mapping scheme proposed in this application can be seamlessly adapted to complex and extensive traffic and driving environments without the need for ground truth supervision of positioning, and achieve stable and reliable state estimation and landmark extraction.

[0109] Figure 5 schematically illustrates a block diagram of an electronic device suitable for implementing the millimeter-wave radar inertial navigation positioning and mapping method described above, according to an embodiment of this application. The electronic device shown in Figure 5 is merely an example and should not be construed as limiting the functionality and scope of the embodiments of this application.

[0110] As shown in Figure 5, the electronic device 1000 described in this embodiment includes a processor 1001, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage portion 1008 into a random access memory (RAM) 1003. The processor 1001 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and / or an associated chipset and / or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), etc. The processor 1001 may also include onboard memory for caching purposes. The processor 1001 may include a single processing unit or multiple processing units for performing different actions of the millimeter-wave radar inertial navigation positioning and mapping method flow according to embodiments of this application.

[0111] RAM 1003 stores various programs and data required for the operation of electronic device 1000. Processor 1001, ROM 1002, and RAM 1003 are interconnected via bus 1004. Processor 1001 executes various operations of the millimeter-wave radar inertial navigation positioning and mapping method flow according to embodiments of this application by executing programs in ROM 1002 and / or RAM 1003. It should be noted that the programs may also be stored in one or more memories other than ROM 1002 and RAM 1003. Processor 1001 may also execute various operations of the millimeter-wave radar inertial navigation positioning and mapping method flow according to embodiments of this application by executing programs stored in said one or more memories.

[0112] According to embodiments of this application, the electronic device 1000 may further include an input / output (I / O) interface 1005, which is also connected to a bus 1004. The electronic device 1000 may also include one or more of the following components connected to the I / O interface 1005: an input section 1006 including a keyboard, mouse, etc.; an output section 1007 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 1008 including a hard disk, etc.; and a communication section 1009 including a network interface card such as a LAN card, modem, etc. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I / O interface 1005 as needed. A removable medium 1011, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 1010 as needed so that computer programs read from it can be installed into the storage section 1008 as needed.

[0113] The millimeter-wave radar inertial navigation positioning and mapping method flow according to embodiments of this application can be implemented as a computer software program. For example, embodiments of this application include a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for executing the millimeter-wave radar inertial navigation positioning and mapping method shown in the flowchart. In such embodiments, the computer program can be downloaded and installed from a network via communication section 1009, and / or installed from removable medium 1011. When the computer program is executed by processor 1001, it performs the functions defined in the system of embodiments of this application. According to embodiments of this application, the systems, devices, apparatuses, modules, and / or units described above can be implemented using computer program modules.

[0114] Embodiments of this application also provide a computer-readable storage medium, which may be included in the device / apparatus / system described in the above embodiments, or it may exist independently without being assembled into the device / apparatus / system. The computer-readable storage medium carries one or more programs, which, when executed, can implement the steps of the millimeter-wave radar inertial navigation positioning and mapping method according to embodiments of this application.

[0115] According to embodiments of this application, the computer-readable storage medium may be a non-volatile computer-readable storage medium, such as including but not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In embodiments of this application, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. For example, according to embodiments of this application, the computer-readable storage medium may include one or more memories other than the ROM 1002 and / or RAM 1003 described above.

[0116] It should be noted that the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.

[0117] The flowcharts and / or block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in the flowcharts and / or block diagrams may represent a module, segment, or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. Furthermore, it should be noted that each block in the block diagram or flowchart, and combinations of blocks in the block diagram or flowchart, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0118] Those skilled in the art will understand that the features described in the various embodiments and / or claims of this application can be combined and / or integrated in various ways, even if such combinations or integrations are not explicitly described in this application. In particular, without departing from the spirit and teachings of this application, the technical features described in the various embodiments and / or claims of this application can be combined and / or integrated in various ways, and all such combinations and / or integrations fall within the scope of this application.

[0119] Although this application has been shown and described with reference to specific exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made to this application without departing from the spirit and scope of the application as defined by the appended claims and their equivalents. Therefore, the scope of this application should not be limited to the above embodiments, but should be determined not only by the appended claims, but also by their equivalents.

Claims

1. A millimeter-wave radar inertial navigation positioning and mapping method, characterized in that it comprises:
synchronizing millimeter-wave radar data frames through inter-frame pre-integration of the inertial measurement unit, and constructing a dot product attention mechanism using the rotation angle difference to generate the expectation matrix between adjacent radar frames, thereby achieving cross-fusion of heterogeneous information to enhance the spatial structure information of the radar spectrum;
extracting multi-scale features from the fused radar spectrogram using a U-Net backbone network, outputting the sub-pixel-level spatial location, confidence weight, and feature descriptor of the landmarks through a multi-head decoding structure, and completing landmark feature matching in combination with a cross-frame attention mechanism;
estimating the radial velocity of candidate landmarks based on a sub-pixel-level Doppler velocity soft query algorithm, and calculating the instantaneous self-velocity by combining the geometric relationship between the radial velocity of static landmarks and the radar self-velocity;
constructing a joint loss function that includes a geometric consistency constraint, a Doppler velocity constraint, and a velocity consistency constraint, and training the feature extraction network in a self-supervised manner; and
extracting a spatially consistent radar landmark point cloud with the trained feature extraction network, and fusing the self-velocity estimation results to achieve real-time localization and mapping;
wherein the inter-frame pre-integration of the inertial measurement unit includes: calculating the relative position, velocity, and attitude quaternion between adjacent radar frames based on the linear acceleration measurement value and the angular velocity measurement value of the inertial measurement unit and on the linear acceleration measurement bias and the angular velocity measurement bias, the corresponding formulas using the rotation matrix in the inertial measurement unit coordinate system, the quaternion multiplication operator ⊗, and the change in relative attitude of the inertial measurement unit from the k-th frame to the current time t; and
wherein the dot product attention mechanism is implemented as follows: the difference between the azimuth angle representation of the radar spectrum of the k-th frame and the rotation angle ϑ is used as a metric to generate the expectation vector from frame k to frame (k+1), where κ represents the annealing coefficient; the expectation matrix is adjusted through a scale alignment operation and merged with the original spectrum to form an enhanced spectrum.

2. The millimeter-wave radar inertial navigation positioning and mapping method as described in claim 1, characterized in that outputting the sub-pixel-level spatial location, confidence weight, and feature descriptor of the landmarks through the multi-head decoding structure includes: within each local pixel block of the radar spectrum, performing a Softmax-weighted summation of the pixel coordinates according to the detection score map to obtain the sub-pixel coordinates; outputting, from the confidence head of the multi-head decoding structure, normalized confidence weights in [0,1] that characterize the confidence of each pixel being selected as a candidate landmark; and collecting, from the descriptor head of the multi-head decoding structure, the encoder backbone block features, which are used as descriptors to uniquely identify the candidate landmark features.

3. The millimeter-wave radar inertial navigation positioning and mapping method as described in claim 1, characterized in that completing landmark feature matching in combination with the cross-frame attention mechanism includes: matching radar candidate landmark features between frames using an attention mechanism, the quantities involved being: the position matrix of the candidate landmarks in the target frame; the position matrix containing all pixel coordinates; the query matrix consisting of the N candidate landmark descriptors in the (k-1)-th frame; the key matrix composed of the pixel descriptors of the entire image in the k-th frame; the annealing coefficient, which controls the sharpness of the attention weight distribution; and H and W, which represent the range height and azimuth width of the radar spectrum, respectively.

4. The millimeter-wave radar inertial navigation positioning and mapping method as described in claim 1, characterized in that estimating the radial velocity of candidate landmarks based on the sub-pixel-level Doppler velocity soft query algorithm includes: based on the sub-pixel coordinates of the candidate landmarks, extracting the radial velocity from the Doppler spectrum using a weighted soft query, the quantities involved being the distance matrix between the location coordinates of the candidate landmarks and the pixel coordinates of the spectral image, and the pixel-level Doppler velocity.

5. The millimeter-wave radar inertial navigation positioning and mapping method as described in claim 4, characterized in that the instantaneous self-velocity is calculated using a differentiable optimization algorithm, including: modeling the geometric relationship between the radial velocity of static features and the radar self-velocity as a linear equation and solving it using the weighted least squares method, where X represents the instantaneous self-velocity, G represents the direction vector matrix, and B represents the observed velocity vector.

6. The millimeter-wave radar inertial navigation positioning and mapping method as described in claim 1, characterized in that: the geometric consistency constraint is defined in terms of the number of training batches; the number of landmarks extracted from a single frame of the radar point cloud; the confidence weight of the i-th landmark in the k-th frame; the geometric residual; the rotation matrix from frame k to frame (k+1); the 3D coordinates of the i-th landmark of the k-th frame in the radar coordinate system; the translation vector from frame k to frame (k+1); and the 3D coordinates in the radar coordinate system of the matching landmark in frame (k+1); the Doppler velocity constraint is defined in terms of the observation matrix G of the Doppler velocities of static landmarks and the identity matrix, which has the same number of rows as G; and the velocity consistency constraint is defined in terms of the self-motion velocity in the inertial measurement unit coordinate system at frame k; the change in velocity from frame k to frame (k+1); the rotation matrix from frame (k+1) to frame k; and the velocity measured by the inertial measurement unit at frame (k+1).

7. An electronic device, characterized in that it includes at least one processing unit and at least one storage unit, wherein the storage unit stores a computer program that, when executed by the processing unit, enables the processing unit to perform the steps of the millimeter-wave radar inertial navigation positioning and mapping method according to any one of claims 1 to 6.

8. A storage medium, characterized in that it stores a computer program executable by an access authentication device, which, when run on the access authentication device, enables the access authentication device to perform the steps of the millimeter-wave radar inertial navigation positioning and mapping method according to any one of claims 1 to 6.