A high-fidelity lidar point cloud generation method and system
The method integrates an attention mechanism and the annealed Langevin dynamics algorithm into a U-Net model and dynamically adjusts the number of iterations to generate high-fidelity LiDAR point cloud data. This addresses the information-forgetting problem that deep learning methods exhibit in complex scenes and achieves efficient, accurate point cloud generation that meets the needs of autonomous driving simulation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- JILIN UNIVERSITY
- Filing Date
- 2026-03-05
- Publication Date
- 2026-06-19
Figure CN122244302A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of autonomous vehicle technology, and in particular to a high-fidelity lidar point cloud generation method and system.
Background Technology
[0002] LiDAR is one of the core sensors for autonomous vehicles, and the accuracy of its data is crucial in autonomous driving simulation testing. The most basic, core, and commonly used output data form of LiDAR is point cloud data. LiDAR point cloud data generally consists of three-dimensional coordinates (X, Y, Z) and reflection intensity.
[0003] Currently, LiDAR point cloud data is acquired mainly in two ways: by collecting sensor data from actual vehicles, or by generating point cloud data through simulation. Collecting data from actual vehicles yields the most realistic sensor data, but it suffers from high acquisition costs and limited scalability. In contrast, simulation can rapidly generate large amounts of LiDAR point cloud data together with relevant information such as category labels, which reduces data acquisition time and cost and overcomes the limitations of actual vehicle acquisition.
[0004] Currently, LiDAR point cloud generation technologies are mainly divided into two categories: those based on traditional geometric modeling and those based on deep learning. The core of the traditional geometric modeling-based LiDAR point cloud generation method is to simulate the physical process of LiDAR emission-reflection-reception through physical geometric principles, mathematical formula derivation, and scene geometric feature definition, thereby generating LiDAR point cloud data that conforms to the real spatial structure. This method achieves interpretability and high-precision generation of point clouds by accurately modeling the laser propagation law, target geometry, and the interaction between the target and the environment. The core of the deep learning-based LiDAR point cloud method is to learn the data distribution law of real point clouds through neural networks, achieving a mapping from noise or conditional signals to highly realistic point clouds. Compared to traditional geometric modeling, this method is better at handling the detailed generation of complex scenes (such as natural vegetation and severe weather) and supports multimodal control (text, semantics, temporal). Due to its high generation accuracy and adaptability to complex scenes, it has become the mainstream technology.
[0005] While deep learning-based LiDAR point cloud generation methods offer significant advantages in handling complex scenes and in multimodal controllability, traditional neural networks face a core challenge when applied to this task: point cloud data inherently forms long sequences, which leads to problems such as "information forgetting" and "difficulty in capturing long-range dependencies." Attention mechanisms offer an effective solution: by calculating "attention weights," different parts of the input are assigned different levels of importance. A higher weight indicates that an input element is more important and needs more focus in subsequent processing. This allows the model to prioritize high-weight key information in later stages, thereby accurately capturing spatial relationships and detailed features in the point cloud data. This approach retains the advantages of deep learning in adapting to complex scenes while further improving the accuracy and reliability of point cloud generation.
Summary of the Invention
[0006] The purpose of this invention is to provide a high-fidelity lidar point cloud generation method and system that can well meet the stringent requirements for data quality and generation efficiency in demanding scenarios such as autonomous driving simulation.
[0007] To achieve the above objectives, the present invention provides the following solution: a high-fidelity lidar point cloud generation method comprising the following steps:
S1. Acquire LiDAR point cloud data and convert the 3D point cloud into an equirectangular view.
S2. Add random noise to the equirectangular view. Use U-Net as the score network model. Input the noisy equirectangular view and the corresponding noise level into the score network model. Calculate the pixel attention weight map through the attention mechanism and, at the same time, calculate the correction direction to obtain a correction direction map with the same size as the input.
S3. Minimize the multi-scale loss function, which measures the difference between the correction direction predicted by the model and the true direction. Update the network parameters by backpropagation with the Adam optimizer, and train until the loss stabilizes and converges, yielding a trained score network model.
S4. Input the equirectangular view with added random noise into the score network model and apply the annealed Langevin dynamics algorithm for iterative denoising to generate the denoised equirectangular view. The number of iterations is adjusted dynamically according to the pixel weights, with more iterations in high-weight areas and fewer in low-weight areas.
S5. Convert the denoised equirectangular view into 3D LiDAR point cloud data, filter out invalid points, and output high-fidelity point cloud data.
[0008] Preferably, in S1, converting the 3D point cloud into an equirectangular view specifically includes converting Cartesian coordinates (x, y, z) to spherical coordinates (θ, Φ, d) as follows:
[0009]
[0010]
[0011] where θ is the tilt angle, Φ is the azimuth angle, and d is the depth. The depth d is mapped to d_norm and the reflection intensity r is normalized to r_norm as follows:
[0012]
[0013] Preferably, in S2, U-Net is used as the score network model, and the model architecture includes:
a circular convolutional layer, used to extract the basic features of the equirectangular view, with circular padding along the azimuth dimension to match its periodicity;
a downsampling module, used to compress the spatial dimensions and increase the channel dimension through strided convolution to capture global features;
an upsampling module, used to gradually restore the spatial size to the original size while fusing the features saved during downsampling with the current upsampled features;
an attention weighting module, integrated into U-Net, used to learn the weights of LiDAR features. It calculates the attention score of each pixel through the scaled dot product of multi-head self-attention, obtains the weights through Softmax normalization, and finally generates a pixel weight map.
[0014] Preferably, in S2, generating the pixel weight map with the attention weighting module specifically includes: generating the attention scores by the scaled dot product of the features of the equirectangular view; normalizing the attention score matrix with the Softmax function to obtain the attention weight of each pixel; and aggregating the multi-head attention weights to generate a pixel weight map with the same size as the equirectangular view.
[0015] Preferably, the attention score is calculated by the scaled dot product: $S_m = Q_m K_m^{\top} / \sqrt{d_k}$
[0016] where $Q_m$ is the query vector matrix of the m-th attention head, $K_m^{\top}$ is the transpose of the key vector matrix of the m-th attention head, $d_k$ is the dimension of a single attention head, and $S_m \in \mathbb{R}^{N \times N}$ indicates that the output attention score matrix is an N×N real matrix. The attention weights are obtained by Softmax normalization: $A_m[i,j] = \exp(S_m[i,j]) / \sum_{k=1}^{N} \exp(S_m[i,k])$
[0017] where $S_m$ is the attention score matrix of the m-th attention head after masking, $S_m[i,j]$ is the element in the i-th row and j-th column of the attention score matrix, and $S_m[i,k]$ is the element in the i-th row and k-th column of the attention score matrix.
[0018] Preferably, in S4, iterative denoising with the annealed Langevin dynamics algorithm specifically includes: using a predefined sequence of noise levels that switches gradually from high noise to low noise; at each noise level, calculating the step size from the initial step-size coefficient and the current noise level; and adjusting the iteration count according to the pixel weight map, using the following formula:
[0019] where round(·) denotes rounding to the nearest integer, and the formula takes the base number of iterations and the pixel weight map as inputs. The annealed Langevin dynamics update is then performed according to the following formula:
[0020] where the terms are, in order: the correction direction output by the score network model; standard Gaussian noise; the step size calculated for the current noise level; the pixel value at coordinates (h, w) and channel c of the image at the s-th iteration under noise level t; and the new value of that pixel at the (s+1)-th iteration after one Langevin dynamics update at noise level t.
[0021] Preferably, the lidar point cloud generation method further includes a point cloud densification stage, used to densify a sparse lidar point cloud into a dense point cloud, specifically including: inputting a sparse point cloud and converting it into an equirectangular view while generating a visibility mask table, in which a mask value of 1 represents a valid point and 0 represents an invalid point; adding random noise to the equirectangular view; inputting the noisy equirectangular view into the score network model to generate a pixel weight map; and applying the mask constraint to the pixel weight map, as shown in the following formula:
[0022] where the visibility mask table is applied to the pixel weight map to give the corrected pixel weight map. Based on the corrected pixel weight map, the annealed Langevin dynamics algorithm is used for denoising to ensure that the generated point cloud is consistent with the sparse input condition.
[0023] The present invention also provides a high-fidelity lidar point cloud generation system for performing any of the above-described high-fidelity lidar point cloud generation methods, comprising:
a data conversion module, used to acquire LiDAR point cloud data and convert the 3D point cloud into an equirectangular view;
a weight map correction module, used to add random noise to the equirectangular view, with U-Net as the score network model. The noisy equirectangular view and the corresponding noise level are input into the score network model, the pixel attention weight map is calculated through the attention mechanism, and the correction direction is calculated at the same time to obtain a correction direction map with the same size as the input;
a model training module, used to minimize the multi-scale loss function, which measures the difference between the correction direction predicted by the model and the true direction, to update the network parameters by backpropagation with the Adam optimizer, and to train until the loss stabilizes and converges, yielding a trained score network model;
a data processing module, used to input the equirectangular view with added random noise into the score network model and to perform iterative denoising with the annealed Langevin dynamics algorithm to generate a denoised equirectangular view;
a data output module, used to convert the denoised equirectangular view into 3D LiDAR point cloud data, filter out invalid points, and output high-fidelity point cloud data.
[0024] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a high-fidelity lidar point cloud generation method as described above.
[0025] According to specific embodiments provided by the present invention, the present invention discloses the following technical effects:
(1) This invention integrates a pixel-level attention weighting module into a U-Net-based score network and applies it throughout training, unconditional generation, and point cloud densification. First, it improves generation efficiency. The attention mechanism dynamically allocates computing resources: during annealed Langevin dynamics denoising, it increases the number of iterations for high-weight key regions and reduces it for low-weight redundant regions, thereby focusing computing power on high-value targets such as vehicles and pedestrians and addressing the dispersed computing power and low sampling efficiency of traditional methods.
(2) Secondly, the accuracy and fidelity of the generated point cloud are enhanced. The mechanism enables the model to strengthen the correction of detailed structures such as object edges and road markings during generation, while capturing long-range spatial dependencies to ensure the topological coherence of the global scene, overcoming local breaks and structural disorder. More importantly, it achieves conditional consistency in the point cloud densification task. By introducing visibility mask constraints, the attention weights of invalid regions of the sparse input are set to zero, which strictly limits the generation range and ensures that the densified point cloud closely matches the sparse input in spatial position and structural morphology, solving the core problem of "misalignment between generation results and input conditions".
Finally, this invention provides an efficient, accurate, and reliable lidar point cloud generation solution that meets the stringent requirements on data quality and generation efficiency in demanding scenarios such as autonomous driving simulation.
Description of the Drawings
[0026] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0027] Figure 1 is a flowchart of the training phase of the score network model of the present invention. Figure 2 is a flowchart of the unconditional generation stage of the high-fidelity lidar point cloud generation method of the present invention. Figure 3 is a flowchart of the point cloud densification stage of the high-fidelity lidar point cloud generation method of the present invention.
Detailed Implementation
[0028] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0029] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0030] As shown in Figures 1-3, the present invention provides a high-fidelity lidar point cloud generation method comprising the following steps:
S1. Acquire LiDAR point cloud data and convert the 3D point cloud into an equirectangular view.
S2. Add random noise to the equirectangular view. Use U-Net as the score network model. Input the noisy equirectangular view and the corresponding noise level into the score network model. Calculate the pixel attention weight map through the attention mechanism and, at the same time, calculate the correction direction to obtain a correction direction map with the same size as the input.
S3. Minimize the multi-scale loss function, which measures the difference between the correction direction predicted by the model and the true direction. Update the network parameters by backpropagation with the Adam optimizer, and train until the loss stabilizes and converges, yielding a trained score network model.
S4. Input the equirectangular view with added random noise into the score network model and apply the annealed Langevin dynamics algorithm for iterative denoising to generate the denoised equirectangular view. The number of iterations is adjusted dynamically according to the pixel weights, with more iterations in high-weight areas and fewer in low-weight areas.
S5. Convert the denoised equirectangular view into 3D LiDAR point cloud data, filter out invalid points, and output high-fidelity point cloud data.
[0031] Specifically, the method of the present invention includes the following.
Training phase process (as shown in Figure 1)
1. Input: The raw data source for the training phase is 3D point clouds from a real-world LiDAR dataset (3D LiDAR data from the KITTI-360 dataset, 32-line scan, 100,000+ point cloud frames). Each point cloud sample contains two core pieces of information: (1) 3D position information, represented by Cartesian coordinates (x, y, z), which give the position of the point in three-dimensional space; (2) reflection intensity information, represented by a scalar r, which reflects how strongly the surface of an object reflects the LiDAR beam.
[0032] 2. Initialization: Convert the unstructured 3D point cloud into an equirectangular view. (1) Convert the Cartesian coordinates (x, y, z) to spherical coordinates (θ, Φ, d), where θ is the tilt angle (angle with the vertical direction), Φ is the azimuth angle (angle with the horizontal direction), and d is the depth (the straight-line distance from the point to the sensor).
[0033]
[0034]
[0035]
[0036] (2) Quantize and rasterize the tilt and azimuth angles so that the θ and Φ of each point correspond to a unique row (H, elevation dimension) and column (W, azimuth dimension) position in the 2D map V_{h,w}; the number of channels is 2, corresponding to depth and reflection intensity, respectively.
[0037] (3) The depth d is mapped to d_norm, which reduces the numerical differences between values; the reflection intensity r is normalized to r_norm, which scales the reflection intensity to between 0 and 1 so that the model is not affected by differences in numerical scale.
[0038]
[0039]
[0040] (4) Obtain the equirectangular view.
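The conversion formulas of this initialization step are not reproduced in this text, so the following is only a minimal NumPy sketch of one common way to carry out the steps described above. Everything not stated in the source is an assumption made for illustration: the function name `to_equirectangular`, the elevation convention θ = arcsin(z/d), the vertical field of view, the logarithmic depth mapping, and the `d_max` and `r_max` constants.

```python
import numpy as np

def to_equirectangular(points, intensity, H=32, W=1024,
                       fov_up_deg=10.0, fov_down_deg=-30.0,
                       d_max=80.0, r_max=1.0):
    """Project (N, 3) points and (N,) intensities to an (H, W, 2) view.

    All default values and the log depth mapping are illustrative assumptions.
    Returns the view and a boolean (H, W) mask of occupied pixels.
    """
    points = np.asarray(points, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    d = np.linalg.norm(points, axis=1)           # depth: distance to the sensor
    keep = d > 0                                  # drop degenerate points at the origin
    x, y, z = points[keep].T
    d, r = d[keep], intensity[keep]

    theta = np.arcsin(z / d)                      # elevation (assumed convention)
    phi = np.arctan2(y, x)                        # azimuth in [-pi, pi]

    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)
    row = np.floor((fov_up - theta) / (fov_up - fov_down) * H).astype(int)
    col = np.floor((phi + np.pi) / (2 * np.pi) * W).astype(int) % W
    inside = (row >= 0) & (row < H)               # discard points outside the vertical FOV
    row, col, d, r = row[inside], col[inside], d[inside], r[inside]

    d_norm = np.log1p(np.minimum(d, d_max)) / np.log1p(d_max)   # assumed log mapping
    r_norm = np.clip(r / r_max, 0.0, 1.0)                       # intensity in [0, 1]

    order = np.argsort(-d)                        # write far points first, near points last
    view = np.zeros((H, W, 2))
    mask = np.zeros((H, W), dtype=bool)
    view[row[order], col[order], 0] = d_norm[order]
    view[row[order], col[order], 1] = r_norm[order]
    mask[row, col] = True
    return view, mask
```

Calling `to_equirectangular(points, intensity)` returns a two-channel view (depth, intensity) together with a mask of the pixels that received at least one point.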
[0041] 3. Multi-scale Gaussian noise addition: Noise that follows a normal distribution is added, with the noise intensity controlled by the standard deviation σ (the larger σ is, the stronger the noise). "Multi-scale" means covering multiple σ levels from strong noise to weak noise, so that samples are augmented at different noise intensities.
[0042] (1) Predefined noise level sequence: Construct a σ sequence from strong noise to weak noise, with an initial noise level σ0 = 50 (strong noise) and a final noise level of 0.01 (weak noise), for a total of 232 σ levels, generated at logarithmic intervals to avoid excessively dense spacing at small σ.
[0043]
[0044] (2) Adding noise to a single training sample: Take the original equirectangular view, sample a noise level from the σ sequence, and generate Gaussian noise with the same dimensions as the original equirectangular view, satisfying:
[0045] Add the noise to the view:
[0046] (3) Batch processing of the training set: Repeat step (2) for each point cloud equirectangular view to obtain a set of noisy views and their corresponding noise levels.
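As a hedged illustration of the noise schedule and the noising step described above, the sketch below builds 232 logarithmically spaced levels from 50 down to 0.01 and perturbs a view with Gaussian noise scaled by a randomly chosen level. The function names `noise_levels` and `perturb` are hypothetical, and scaling standard normal noise by σ is an assumed (standard) reading of the omitted formulas.

```python
import numpy as np

def noise_levels(sigma_max=50.0, sigma_min=0.01, num_levels=232):
    """Geometric (log-spaced) sequence from strong to weak noise."""
    return np.geomspace(sigma_max, sigma_min, num_levels)

def perturb(view, sigmas, rng):
    """Add Gaussian noise at one randomly sampled level.

    Returns the noisy view, the sampled sigma, and the unit noise used.
    """
    sigma = sigmas[rng.integers(len(sigmas))]     # sample one noise level
    eps = rng.standard_normal(view.shape)         # unit Gaussian, same shape as the view
    return view + sigma * eps, sigma, eps

# Example: perturb every view in a small batch of toy data.
rng = np.random.default_rng(0)
sigmas = noise_levels()
batch = [np.zeros((32, 1024, 2)) for _ in range(4)]
noisy_batch = [perturb(v, sigmas, rng) for v in batch]
```

A geometric sequence keeps the ratio between consecutive levels constant, which matches the stated goal of generating the levels at logarithmic intervals.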
[0047] 4. Score network forward propagation: U-Net is used as the score network. The noisy equirectangular view and the corresponding noise level are input into the network to obtain a score map with the same size as the input.
[0048] (1) Initial feature mapping: The original features of the LiDAR equirectangular view (depth and reflection intensity) are combined with the angle coordinates (θ, Φ) as additional channels to construct a multi-channel, angle-fused input:
[0049] Note: the entry at (h, w) represents all channel values of pixel (h, w).
[0050] (2) Extracting basic features by circular convolution: The core of circular convolution is circular padding along the azimuth dimension (W), which matches the periodicity of Φ, while the elevation dimension (H) uses ordinary convolution.
[0051] The circular convolution kernel is characterized by: k_h, the kernel size in the elevation direction; k_w, the kernel size in the azimuth direction; and C_mid, the number of intermediate feature channels in the convolution output.
[0052] Convolution operation formula:
[0053] where the result is the convolution output feature map.
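A minimal NumPy sketch of a convolution with circular padding along the azimuth axis is shown below, assuming odd kernel sizes, stride 1, and zero padding along elevation; the function name `circular_conv2d` and the `(k_h, k_w, C_in, C_mid)` kernel layout are illustrative choices rather than the source's definition. As in most deep learning frameworks, the operation is implemented as a cross-correlation.

```python
import numpy as np

def circular_conv2d(x, kernel):
    """x: (H, W, C_in); kernel: (k_h, k_w, C_in, C_mid) with odd k_h, k_w.

    Azimuth (W) is padded by wrapping around; elevation (H) is zero-padded.
    Returns an (H, W, C_mid) feature map.
    """
    k_h, k_w = kernel.shape[:2]
    p_h, p_w = k_h // 2, k_w // 2
    x = np.pad(x, ((0, 0), (p_w, p_w), (0, 0)), mode="wrap")       # circular in azimuth
    x = np.pad(x, ((p_h, p_h), (0, 0), (0, 0)), mode="constant")   # zeros in elevation
    H, W = x.shape[0] - 2 * p_h, x.shape[1] - 2 * p_w
    out = np.zeros((H, W, kernel.shape[3]))
    for i in range(k_h):
        for j in range(k_w):
            # (H, W, C_in) @ (C_in, C_mid) accumulates one kernel tap.
            out += x[i:i + H, j:j + W, :] @ kernel[i, j]
    return out
```

Because the padding wraps around in azimuth, rotating the input scan about the vertical axis (a roll along W) simply rolls the output by the same amount, so no artificial seam appears at the 0°/360° boundary.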
[0054] (3) Activation maps the input to high-dimensional initial features: ReLU is chosen as the activation function to enhance nonlinear expressiveness. The formula is:
[0055] where the result is the activated feature map, which is the high-dimensional feature output.
(4) The normalized noise level is converted into "scale + shift" parameters and applied to the high-dimensional features of the current layer. The scale decreases as the noise increases (weakening noisy features), and the shift increases as the noise increases (compensating for the feature amplitude).
[0056] Normalization:
[0057] Scaling:
[0058] Offset:
[0059] where k and k' are hyperparameters (k = 0.5, k' = 0.1).
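The exact normalization, scaling, and offset formulas are not reproduced in this text. The sketch below is therefore only one hypothetical instantiation that satisfies the stated behavior (scale falls and shift rises as noise grows) with the stated hyperparameters k = 0.5 and k' = 0.1. The log-scale normalization and the linear forms `1 - k * s` and `k' * s` are assumptions, as is the function name `noise_conditioning`.

```python
import numpy as np

def noise_conditioning(features, sigma, sigma_min=0.01, sigma_max=50.0,
                       k=0.5, k_prime=0.1):
    """Apply a noise-dependent scale and shift to a feature map.

    Assumed forms (not taken from the source):
      s     = normalized log-position of sigma in [0, 1]
      scale = 1 - k * s        (decreases as noise increases)
      shift = k_prime * s      (increases as noise increases)
    """
    s = (np.log(sigma) - np.log(sigma_min)) / (np.log(sigma_max) - np.log(sigma_min))
    scale = 1.0 - k * s
    shift = k_prime * s
    return scale * features + shift, scale, shift
```

Under these assumptions the weakest noise level leaves the features unchanged (scale 1, shift 0), while the strongest level halves them and adds an offset of 0.1.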
[0060] (5) Attention weighting (core of this invention): The weights of LiDAR features are learned. The attention score of each pixel is calculated by the scaled dot product of multi-head self-attention, and importance weights between 0 and 1 are then obtained by Softmax normalization (at high noise levels the constraints are relaxed; at low noise levels the focus is on details). A pixel weight map with the same size as the equirectangular view is generated, in which each pixel has a weight value that quantifies the importance of the feature at that position.
[0061] Feature preprocessing: the features are flattened into a sequence format that self-attention can process:
[0062] where N is the total number of pixels. Generate Q / K / V:
[0063]
[0064]
[0065] where the projection matrices are learnable and M is the number of attention heads. Calculate the attention score by the scaled dot product (single head): $S_m = Q_m K_m^{\top} / \sqrt{d_k}$
[0066] where $Q_m$ is the query vector matrix of the m-th attention head, $K_m^{\top}$ is the transpose of the key vector matrix of the m-th attention head, and $d_k$ is the dimension of a single attention head, i.e., the dimension of the query / key vectors in each head. The output attention score matrix $S_m \in \mathbb{R}^{N \times N}$ is an N×N real matrix, where N is the sequence length, and each element of the matrix represents the attention weight between two positions in the sequence. Softmax normalization (attention weight matrix): $A_m[i,j] = \exp(S_m[i,j]) / \sum_{k=1}^{N} \exp(S_m[i,k])$
[0067] where $S_m$ is the attention score matrix of the m-th attention head after masking, $S_m[i,j]$ is the element in the i-th row and j-th column of the attention score matrix, and $S_m[i,k]$ is the element in the i-th row and k-th column of the attention score matrix.
[0068] Pixel weight aggregation (generating the final weight map): the multi-head self-attention results are aggregated to give the final importance weight of the i-th pixel (taken from the diagonal of the attention weight matrix):
[0069] Restore the equirectangular view size:
[0070]
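The attention weighting steps above can be sketched in NumPy as follows. This is an illustration under assumptions, not the source's implementation: the aggregation formula is not reproduced in this text, so averaging the diagonal of each head's weight matrix over the heads is an assumed reading of "taken from the diagonal of the weight matrix", and the names `pixel_weight_map`, `W_q`, and `W_k` are hypothetical. The value projection V is omitted because it does not affect the weight map.

```python
import numpy as np

def softmax(s, axis=-1):
    """Numerically stable softmax along one axis."""
    s = s - s.max(axis=axis, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

def pixel_weight_map(feat, W_q, W_k, num_heads):
    """feat: (H, W, C); W_q, W_k: (C, C) projections; C divisible by num_heads.

    Returns the (H, W) pixel weight map and the (M, N, N) attention weights.
    """
    H, W, C = feat.shape
    N, d_k = H * W, C // num_heads
    X = feat.reshape(N, C)                                        # flatten pixels to a sequence
    Q = (X @ W_q).reshape(N, num_heads, d_k).transpose(1, 0, 2)   # (M, N, d_k)
    K = (X @ W_k).reshape(N, num_heads, d_k).transpose(1, 0, 2)   # (M, N, d_k)
    S = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)                   # scaled dot product, (M, N, N)
    A = softmax(S, axis=-1)                                       # each row sums to 1
    diag = np.diagonal(A, axis1=1, axis2=2)                       # (M, N): per-head diagonal
    weights = diag.mean(axis=0)                                   # assumed aggregation over heads
    return weights.reshape(H, W), A
```

Note that the score matrix has N × N entries per head, so this dense form is only practical on small or downsampled feature maps; the example is meant to show the data flow, not an efficient layout.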
[0071] (6) Correction direction calculation: Obtain the correction direction map with the same size as the input.
[0072] a. Downsampling module (spatial compression + high-level feature extraction): Gradually reduce the spatial size, increase the channel dimension, and capture large-scale global features (roads, vehicles, etc.). The input feature at level k is mapped to the level-k output as follows. Convolution with stride s = 2:
[0073] Normalization:
[0074] Activation:
[0075] Deepest feature output (downsampling endpoint):
[0076] b. Deep feature integration: For the deepest-layer features, convolutional blocks are used to integrate and enhance global features, avoiding the loss of key scene information during downsampling. The convolutional blocks adopt a cascaded "Conv+BN+ReLU" structure. The first convolutional layer keeps the number of channels unchanged and integrates global features:
[0077] The second convolutional layer halves the number of channels, simplifies the features, reduces the computational cost of subsequent upsampling, and extracts core global information.
[0078] c. Upsampling module (spatial restoration + fusion of high-level and low-level features): Gradually restores the spatial size to the original size while fusing the features saved during downsampling with the current upsampled features, so as to retain global information and supplement detailed information.
[0079] At level t, the upsampling input is mapped to the output by fusing the corresponding low-level features from downsampling. Transposed convolution with stride s = 2:
[0080] Feature concatenation:
[0081] Convolutional integration:
[0082] Upsampling output endpoint:
[0083] d. Correction direction map output: The upsampled fused features are mapped to a single-channel correction direction map whose output value is the angle correction amount (adapted to the orientation calibration requirements of lidar point clouds).
(7) Output the final correction direction: The correction direction map is the final output of the score function.
[0084]
[0085] 5. Calculate the denoising score matching loss and optimize the model: Minimize the multi-scale loss function, which measures the difference between the correction direction predicted by the model and the true direction, and use the Adam optimizer (learning rate 1e-4) to update the network parameters by backpropagation until the loss stabilizes and converges.
[0086] (1) For a single noise level, the network outputs a correction direction for the noisy input features, which is compared with the corresponding true direction. Using the L2 loss as the base loss, the formula is:
[0087] Adding L2 regularization to prevent overfitting, the total loss is:
[0088] (2) Parameter update by backpropagation with the Adam optimizer: With the goal of minimizing the loss function, the Adam optimizer iteratively updates all network parameters by backpropagation until the loss stabilizes and converges.
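The loss formulas are not reproduced in this text. As a hedged sketch, the code below uses the standard denoising score matching objective for Gaussian noise, in which the true direction for a noisy sample is −(x̃ − x)/σ² and each noise level is weighted by σ². The weighting, the averaging over elements, the `weight_decay` value, and the function names are assumptions made for illustration.

```python
import numpy as np

def dsm_loss(score_pred, x_clean, x_noisy, sigma):
    """Denoising score matching loss for one noise level.

    The target is the score of the Gaussian perturbation kernel,
    -(x_noisy - x_clean) / sigma^2; the sigma^2 factor (assumed weighting)
    keeps the loss magnitude comparable across noise levels.
    """
    target = -(x_noisy - x_clean) / sigma**2
    return 0.5 * sigma**2 * np.mean((score_pred - target) ** 2)

def total_loss(score_pred, x_clean, x_noisy, sigma, params, weight_decay=1e-4):
    """Base L2 loss plus L2 regularization over a list of parameter arrays."""
    reg = sum(np.sum(p ** 2) for p in params)
    return dsm_loss(score_pred, x_clean, x_noisy, sigma) + weight_decay * reg
```

In a training loop, `score_pred` would be the network output for `x_noisy` at level `sigma`, and an optimizer such as Adam would minimize `total_loss` averaged over sampled noise levels.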
[0089] Unconditional generation phase process (as shown in Figure 2)
1. Initialize random noise: Generate Gaussian random noise with the same size as the equirectangular view to obtain an initial noise image.
[0090] 2. Input: the initial noise; the correction direction map generated by the pre-trained score network; the noise level sequence (from largest to smallest); and the pixel weight map.
[0091] 3. Allocate pixel iteration counts according to the weights: In the pixel weight map, pixels with larger weights are given more iterations and pixels with smaller weights are given fewer iterations.
[0092] Each noise level has a base number of iterations. The iteration count is mapped from the weights by the following formula:
[0093] where the base iteration count (or baseline iteration count) is the default number of iterations at each position when the pixel weight is 1, round(·) denotes rounding to the nearest integer, and the remaining parameter is the weighting gain coefficient.
[0094] 4. Annealed Langevin core iteration: Switch noise levels in descending order. For each level, first update the step size, then perform several Langevin updates according to the per-pixel iteration counts calculated in the previous step. The output of one level is used as the input of the next level, forming an annealing chain.
[0095] Step size calculation:
[0096] Core denoising formula:
[0097] where the terms are, in order: the correction direction output by the score network model; standard Gaussian noise; the step size calculated for the current noise level; the pixel value (i.e., the current state) at coordinates (h, w) and channel c of the image at the s-th iteration under noise level t; and the new value (i.e., the next state) of that pixel at the (s+1)-th iteration after one Langevin dynamics update at noise level t. Initial value:
[0098] The standard Gaussian random noise acts as an iterative perturbation that keeps the sampling exploratory.
[0099] 5. Output the fully denoised view: After the final iteration is completed, the output is the final denoising result.
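The step size, iteration count, and update formulas are not reproduced in this text, so the sketch below falls back on the standard annealed Langevin dynamics update, x ← x + (α/2)·score + √α·z with α = ε·σ²/σ_min², and adds a per-pixel iteration budget. The mapping `round(k_base * (1 + gain * (w - 1)))`, which returns `k_base` when the weight is 1, is a hypothetical choice, as are the function names and default values.

```python
import numpy as np

def iteration_map(weights, k_base=5, gain=1.0, min_iters=1):
    """Per-pixel iteration counts; equals k_base where the weight is 1.

    The linear mapping is an assumed form, not the source's formula.
    """
    k = np.rint(k_base * (1.0 + gain * (weights - 1.0))).astype(int)
    return np.maximum(k, min_iters)

def annealed_langevin(score_fn, weights, sigmas, shape, eps=2e-5,
                      k_base=5, gain=1.0, rng=None):
    """score_fn(x, sigma) -> array like x; weights: (H, W); shape: (H, W, C)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(shape)                   # initial Gaussian noise image
    iters = iteration_map(weights, k_base, gain)     # (H, W) iteration budget
    for sigma in sigmas:                             # descending noise levels
        alpha = eps * sigma**2 / sigmas[-1]**2       # step size for this level
        for s in range(int(iters.max())):
            active = (s < iters)[..., None]          # pixels with budget left
            z = rng.standard_normal(shape)
            step = 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
            x = np.where(active, x + step, x)        # update only active pixels
    return x

# Toy usage: the exact score of a Gaussian centred at 0.5 stands in for the network.
toy_score = lambda x, sigma: -(x - 0.5) / sigma**2
sample = annealed_langevin(toy_score, np.ones((4, 8)),
                           np.geomspace(1.0, 0.01, 10), (4, 8, 2),
                           rng=np.random.default_rng(0))
```

In this sketch the score is still evaluated on the whole image at every step and inactive pixels are simply frozen, so actual compute savings would require skipping work for inactive regions, which is outside the scope of the example.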
[0100] 6. Inverse transformation from equirectangular view to 3D point cloud: First, convert the spherical coordinates (θ, Φ, d) to Cartesian coordinates (x, y, z); then denormalize the depth and reflection intensity to restore the physical imaging logic of the lidar.
[0101]
[0102]
[0103]
[0104]
[0105] Filtering invalid points: Remove invalid points such as those with depth of 0 or intensity of 0 to obtain a lidar point cloud that conforms to the true distribution.
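The inverse conversion formulas are not reproduced in this text. A minimal sketch of the inverse step, under assumed conventions, is given below: pixel centers are mapped back to angles, depth is denormalized with an assumed inverse log mapping, and near-zero depth or intensity pixels are discarded. The function name `to_point_cloud`, the field of view, `d_max`, `r_max`, and the threshold `tol` are illustrative assumptions.

```python
import numpy as np

def to_point_cloud(view, fov_up_deg=10.0, fov_down_deg=-30.0,
                   d_max=80.0, r_max=1.0, tol=1e-6):
    """view: (H, W, 2) with normalized depth and intensity channels.

    Returns (N, 3) Cartesian points and (N,) intensities for valid pixels.
    """
    H, W, _ = view.shape
    d = np.expm1(view[..., 0] * np.log1p(d_max))      # assumed inverse of a log depth mapping
    r = view[..., 1] * r_max                          # denormalized intensity

    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    theta = fov_up - (rows + 0.5) / H * (fov_up - fov_down)   # elevation at pixel center
    phi = (cols + 0.5) / W * 2 * np.pi - np.pi                # azimuth at pixel center

    x = d * np.cos(theta) * np.cos(phi)
    y = d * np.cos(theta) * np.sin(phi)
    z = d * np.sin(theta)

    valid = (d > tol) & (r > tol)                     # filter zero-depth / zero-intensity pixels
    return np.stack([x, y, z], axis=-1)[valid], r[valid]
```

Since a denoised view rarely contains exact zeros, a small tolerance (or a dataset-specific minimum range) is used here in place of a strict comparison with 0.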
[0106] Point cloud densification process (as shown in Figure 3)
1. Input: A sparse LiDAR point cloud is input and converted into an equirectangular view according to step 2 of the training phase. A visibility mask table of the same size as the equirectangular view is generated (a value of 1 in the mask table indicates a valid point).
[0107] The formula for generating the equirectangular view is the same as in step 2 of the training phase.
[0108] The original LiDAR equirectangular view has two channels (distance and reflection intensity), and its entry at (h, w) holds all raw channel values of pixel (h, w).
[0109] 2. Initialize random noise: Generate Gaussian random noise with the same size as the equirectangular view to obtain an initial noise image. The core formula is the same as in step 1 of the unconditional generation stage.
[0110] 3. Pixel weight map correction: The pixel weights of invalid points, whose value in the visibility mask table is 0, are set to 0, which disables attention there completely and avoids unconstrained over-generation. Only the attention weights of the sparse valid regions are retained, and within these regions the key sub-regions receive further focus for fine correction, ensuring that the generated result is consistent with the input sparse point cloud.
[0111]
[0112] Corrected final pixel weight map .
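The mask constraint described in this step can be illustrated with an element-wise product, which zeroes the weights wherever the mask is 0 and leaves them unchanged elsewhere. This is a sketch under assumptions: the formula itself is not reproduced in this text, and deriving the mask from non-zero depth is a hypothetical choice, as are the function names.

```python
import numpy as np

def visibility_mask(depth_channel):
    """1 where the sparse input has a return (depth > 0), else 0 (assumed rule)."""
    return (depth_channel > 0).astype(np.uint8)

def apply_visibility_mask(weights, mask):
    """Zero the pixel weights of invalid positions; keep the rest unchanged."""
    return weights * mask.astype(weights.dtype)

# Example on a 2x2 toy view: two pixels have no return and lose their weight.
weights = np.array([[0.2, 0.9], [0.5, 0.7]])
depth = np.array([[0.0, 0.3], [0.4, 0.0]])
corrected = apply_visibility_mask(weights, visibility_mask(depth))
```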
[0113] 4. Annealed Langevin dynamics denoising: The inputs are the initial noise, the correction direction map generated by the pre-trained score network, the noise level sequence (from largest to smallest), and the corrected pixel weight map. The noise levels are switched in gradually decreasing order, with several iterations performed within each level. Pixels with higher weights in the pixel weight map receive more iterations, while pixels with lower weights receive fewer. During iteration, the direction is adjusted using the pre-trained score network. The process stops when the noise level has dropped to 0.01 and the last iteration is completed, which yields a dense equirectangular view. The core formula is the same as in steps 3-4 of the unconditional generation stage.
[0114] 5. Inverse conversion from equirectangular view to 3D point cloud: First, convert the spherical coordinates (θ, Φ, d) to Cartesian coordinates (x, y, z); then denormalize the depth d and reflection intensity r to restore the physical imaging logic of the lidar. The core formula is the same as in step 6 of the unconditional generation stage.
[0115] Filtering invalid points: Remove invalid points such as those with depth of 0 or intensity of 0 to obtain a lidar point cloud that conforms to the true distribution.
[0116] The present invention also provides a high-fidelity lidar point cloud generation system for performing any of the above-described high-fidelity lidar point cloud generation methods, comprising: a data conversion module, used to acquire LiDAR point cloud data and convert the 3D point cloud into an equirectangular view; a weight map correction module, used to add random noise to the rectangular view, with U-Net serving as the score network model, wherein the noise-added rectangular view and the corresponding noise level are input into the score network model, the pixel attention weight map is calculated through the attention mechanism, and the correction direction is calculated at the same time to obtain a correction direction map with the same size as the input; a model training module, used to minimize a multi-scale loss function that measures the difference between the correction direction of the model and the correct direction, update the network parameters by backpropagation using the Adam optimizer, and train until the loss is stable and converged to obtain a trained score network model; a data processing module, used to input the rectangular view with added random noise into the score network model and perform iterative denoising using the annealing Langevin dynamics algorithm to generate a denoised rectangular view; and a data output module, used to convert the denoised rectangular view into 3D LiDAR point cloud data, filter invalid points, and output high-fidelity point cloud data.
[0117] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a high-fidelity lidar point cloud generation method as described above.
[0118] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0119] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. Furthermore, those skilled in the art will recognize that, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. A high-fidelity lidar point cloud generation method, characterized in that it includes the following steps: S1. Acquire LiDAR point cloud data and convert the 3D point cloud into an equirectangular view; S2. Add random noise to the rectangular view, use U-Net as the score network model, input the noise-added rectangular view and the corresponding noise level into the score network model, calculate the pixel attention weight map through the attention mechanism, and at the same time calculate the correction direction to obtain a correction direction map with the same size as the input; S3. Minimize a multi-scale loss function that measures the difference between the initial direction and the corrected direction of the model, update the network parameters by backpropagation using the Adam optimizer, and continue training until the loss stabilizes and converges, yielding a trained score network model; S4. Input the rectangular view with added random noise into the score network model and use the annealing Langevin dynamics algorithm for iterative denoising to generate the denoised rectangular view, wherein the number of iterations is dynamically adjusted according to the pixel weights, with more iterations in high-weight areas and fewer iterations in low-weight areas; S5. Convert the denoised rectangular view into 3D LiDAR point cloud data, filter out invalid points, and output high-fidelity point cloud data.
2. The high-fidelity lidar point cloud generation method according to claim 1, characterized in that, in step S1, converting the 3D point cloud into an equirectangular view specifically includes: converting the Cartesian coordinates (x, y, z) to spherical coordinates (θ, Φ, d), where θ is the tilt angle, Φ is the azimuth angle, and d is the depth; and mapping the depth d to $d_{\text{norm}}$ and normalizing the reflection intensity r to $r_{\text{norm}}$.
3. The high-fidelity lidar point cloud generation method according to claim 1, characterized in that, in step S2, U-Net is used as the score network model, and the model architecture includes: a circular convolutional layer, used to extract the basic features of the rectangular view, with the azimuth dimension circularly padded to account for its periodicity; a downsampling module, used to compress the spatial dimensions and increase the channel dimension through strided convolution to capture global features; an upsampling module, used to gradually restore the spatial size to the original size while fusing the high-level features saved during downsampling with the current upsampling features; and an attention weighting module, integrated into U-Net, used to learn the weights of LiDAR features, which calculates the attention score of each pixel through the scaled dot product of multi-head self-attention, obtains the weights through Softmax normalization, and finally generates a pixel weight map.
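The circular padding of the azimuth dimension can be illustrated with a minimal sketch. It assumes a (C, H, W) array layout in which the last axis is azimuth; the function name `circular_pad_azimuth` is hypothetical, and padding of the elevation axis is left out.

```python
import numpy as np

def circular_pad_azimuth(view, pad):
    """Wrap-pad the azimuth (last) axis of a (C, H, W) rectangular view.

    The left border is filled with the rightmost columns and vice versa,
    so a convolution sees the 360-degree scan as continuous.
    """
    view = np.asarray(view)
    return np.pad(view, ((0, 0), (0, 0), (pad, pad)), mode="wrap")
```

A convolution applied to the padded array with no further horizontal padding then treats the first and last azimuth columns as neighbors.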
4. The high-fidelity lidar point cloud generation method according to claim 3, characterized in that, in step S2, the attention weighting module generates a pixel weight map, specifically including: generating the attention score by the scaled dot product of the features of the rectangular view; normalizing the attention score matrix using the Softmax function to obtain the attention weight of each pixel; and aggregating the multi-head attention weights to generate a pixel weight map with the same size as the rectangular view.
5. The high-fidelity lidar point cloud generation method according to claim 4, characterized in that the formula for the attention score is as follows: $S_m = \frac{Q_m K_m^{\top}}{\sqrt{d_k}} \in \mathbb{R}^{N \times N}$, where $Q_m$ is the query vector matrix of the m-th attention head, $K_m^{\top}$ is the transpose of the key vector matrix of the m-th attention head, $d_k$ is the dimension of a single attention head, and $\mathbb{R}^{N \times N}$ indicates that the output attention score matrix is an N×N real matrix; the attention weight formula is as follows: $A_m(i,j) = \frac{\exp(\tilde{S}_m(i,j))}{\sum_{k=1}^{N} \exp(\tilde{S}_m(i,k))}$, where $\tilde{S}_m$ is the attention score matrix of the m-th attention head after masking, $\tilde{S}_m(i,j)$ is the element in the i-th row and j-th column of the attention score matrix, and $\tilde{S}_m(i,k)$ is the element in the i-th row and k-th column of the attention score matrix.
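The scaled dot-product score and Softmax weight can be sketched as below. The aggregation of multi-head weights into a pixel weight map is not specified in detail here, so the sketch assumes a simple mean over heads and over query positions; that rule, the function name `pixel_weight_map`, and the projection matrices `w_q` and `w_k` are hypothetical. Masking of the score matrix is omitted.

```python
import numpy as np

def pixel_weight_map(feat, w_q, w_k, height, width):
    """Multi-head scaled dot-product attention reduced to an (H, W) weight map.

    feat     : (N, D) features of the N = height * width pixels
    w_q, w_k : (M, D, d_k) query and key projections for M heads
    Returns the (M, N, N) attention weights and the (H, W) pixel weight map.
    """
    q = np.einsum("nd,mdk->mnk", feat, w_q)
    k = np.einsum("nd,mdk->mnk", feat, w_k)
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)     # scaled dot product
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)              # row-wise Softmax
    # Assumed aggregation: average attention each pixel receives, over heads and queries.
    weights = attn.mean(axis=(0, 1))
    return attn, weights.reshape(height, width)
```

Because every Softmax row sums to 1, the aggregated map in this sketch also sums to 1 over all pixels.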
6. The high-fidelity lidar point cloud generation method according to claim 1, characterized in that, in step S4, the annealing Langevin dynamics algorithm is used for iterative denoising, specifically including: using a predefined sequence of noise levels that switches gradually from high noise to low noise; at each noise level, calculating the step size from the initial step size coefficient and the current noise level; adjusting the number of iterations according to the pixel weight map, where the count is obtained from the base number of iterations and the pixel weight map and round(·) denotes rounding to the nearest integer; and performing the annealing Langevin dynamics update, in which the correction direction output by the score network model, standard Gaussian noise, and the step size calculated for the current noise level are used to map the pixel value at coordinates (h, w) and channel c at the s-th iteration under noise level t to the new value of that pixel at the (s+1)-th iteration.
7. The high-fidelity lidar point cloud generation method according to claim 6, characterized in that the method further includes a point cloud densification stage, used to densify a sparse lidar point cloud into a dense point cloud, specifically including: inputting a sparse point cloud and converting it into an equirectangular view while generating a visibility mask table, in which a mask value of 1 represents a valid point and 0 represents an invalid point; adding random noise to the rectangular view; inputting the rectangular view with added random noise into the score network model to generate a pixel weight map, and then applying the mask constraint of the visibility mask table to the pixel weight map to obtain the corrected pixel weight map; and, based on the corrected pixel weight map, performing denoising with the annealing Langevin dynamics algorithm to ensure that the generated point cloud is consistent with the sparse input.
8. A high-fidelity lidar point cloud generation system, used to execute the high-fidelity lidar point cloud generation method according to any one of claims 1-7, characterized in that it includes: a data conversion module, used to acquire LiDAR point cloud data and convert the 3D point cloud into an equirectangular view; a weight map correction module, used to add random noise to the rectangular view, with U-Net serving as the score network model, wherein the noise-added rectangular view and the corresponding noise level are input into the score network model, the pixel attention weight map is calculated through the attention mechanism, and the correction direction is calculated at the same time to obtain a correction direction map with the same size as the input; a model training module, used to minimize a multi-scale loss function that measures the difference between the correction direction of the model and the correct direction, update the network parameters by backpropagation using the Adam optimizer, and train until the loss is stable and converged to obtain a trained score network model; a data processing module, used to input the rectangular view with added random noise into the score network model and perform iterative denoising using the annealing Langevin dynamics algorithm to generate a denoised rectangular view; and a data output module, used to convert the denoised rectangular view into 3D LiDAR point cloud data, filter invalid points, and output high-fidelity point cloud data.
9. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the high-fidelity lidar point cloud generation method according to any one of claims 1-7.