Partition-aware LiDAR point cloud compression method and device based on non-uniform spatial quantization

A partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization uses log-polar coordinate quantization and a partition-aware occupancy probability prediction structure to address the uneven spatial distribution and scale distortion of LiDAR point clouds. This achieves efficient point cloud compression with fast encoding and decoding, improving rate-distortion performance and coding efficiency.

CN122199693A · Pending · Publication Date: 2026-06-12 · XIAMEN UNIV OF TECH +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
XIAMEN UNIV OF TECH
Filing Date
2026-05-13
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing neural point cloud compression methods ignore the spatial unevenness and scale distortion characteristics of LiDAR point clouds in the Cartesian coordinate system, which limits the point cloud compression efficiency and makes it difficult to further improve rate-distortion performance while ensuring real-time encoding and decoding speed.

Method used

A partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization is adopted. The constructed partition-aware LiDAR point cloud compression model uses log-polar coordinate quantization, a parallel occupancy code generator, and a partition-aware occupancy probability prediction structure. Combined with a soft-gated hybrid expert head, the model dynamically perceives and adapts to the complex geometric distributions of different spatial regions, achieving efficient point cloud compression.

Benefits of technology

It effectively addresses the spatial density imbalance of LiDAR point clouds, improves rate-distortion performance, reduces geometric redundancy, and speeds up encoding and decoding. It preserves both the global structural detail and the local surface reconstruction quality of point clouds at high compression ratios.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122199693A_ABST

Abstract

The application discloses a partition-aware LiDAR point cloud compression method and device based on non-uniform spatial quantization, relating to the field of point cloud processing. The method includes: acquiring the LiDAR point cloud to be compressed and inputting it into a trained partition-aware LiDAR point cloud compression model. At the encoding end, the point cloud first passes through the log-polar coordinate quantization module to obtain a quantized LiDAR point cloud; a parallel occupancy code generator and a first partition-aware occupancy probability prediction module then produce the first predicted occupancy probability at each scale, which an arithmetic encoder uses to compress the data into a bitstream. At the decoding end, an arithmetic decoder reads the lowest-scale reconstructed coordinates and predicted occupancy code information from the compressed bitstream; a parallel coordinate generator and a second partition-aware occupancy probability prediction module then produce the reconstructed LiDAR point cloud, which the log-polar coordinate inverse quantization module dequantizes to obtain the restored LiDAR point cloud. The method addresses the inherent spatial density imbalance of LiDAR point clouds.

Description

Technical Field

[0001] This invention relates to the field of point cloud processing, and more specifically to a partition-aware LiDAR point cloud compression method and apparatus based on non-uniform spatial quantization.

Background Technology

[0002] As a high-fidelity digital representation of the three-dimensional physical world, LiDAR point clouds are widely used in cutting-edge fields such as autonomous driving, intelligent robot navigation, and computer vision. With the continuous improvement of 3D scanning hardware performance, devices can acquire point cloud models with extremely high levels of detail from complex scenes, leading to an exponential increase in the scale of point cloud data. This massive amount of data brings extremely high storage and transmission costs, posing a severe challenge to the memory capacity of terminal devices and the real-time network communication bandwidth. Therefore, efficient point cloud compression has become a core technology for overcoming this application bottleneck.

[0003] In recent years, with the development of international point cloud compression standards (such as G-PCC) and the integration of advanced techniques such as deep learning, point cloud compression has made significant progress in rate-distortion performance. However, most existing neural point cloud compression methods process data directly in the conventional Cartesian coordinate system, ignoring the inherently uneven spatial distribution of LiDAR point clouds (dense near the sensor, sparse far from it) and their scale distortion. How to further exploit the underlying geometric and physical spatial information of point clouds, overcome the spatial limitations of conventional feature extraction, and push the efficiency limit of point cloud compression while maintaining real-time encoding and decoding remains a key open challenge in this field.

Summary of the Invention

[0004] The purpose of this application is to propose a partition-aware LiDAR point cloud compression method and device based on non-uniform spatial quantization that addresses the above technical problems. The method can effectively eliminate the inherent spatial density imbalance of LiDAR point clouds, improving the rate-distortion performance of point cloud compression while maintaining fast encoding and decoding.

[0005] In a first aspect, the present invention provides a partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization, comprising the following steps:

[0006] A partition-aware LiDAR point cloud compression model is constructed and trained to obtain a trained partition-aware LiDAR point cloud compression model. The partition-aware LiDAR point cloud compression model includes an encoder and a decoder. The encoder includes a log-polar coordinate quantization module, a parallel occupancy code generator, a first partition-aware occupancy probability prediction module, and an arithmetic encoder. The decoder includes an arithmetic decoder, a second partition-aware occupancy probability prediction module, a parallel coordinate generator, and a log-polar coordinate inverse quantization module.

[0007] The LiDAR point cloud to be compressed is acquired and input into the trained partition-aware LiDAR point cloud compression model. In the encoding stage, the log-polar coordinate quantization module first quantizes the LiDAR point cloud to obtain a quantized LiDAR point cloud. The quantized LiDAR point cloud is then input into the parallel occupancy code generator for layer-by-layer downsampling, and together with the first partition-aware occupancy probability prediction module this yields the first predicted occupancy probability at each scale. The first predicted occupancy probability distribution at each scale, along with the coordinates and occupancy code information at the lowest scale, is input into the arithmetic encoder for compression, yielding a compressed bitstream. In the decoding stage, the arithmetic decoder reads the reconstructed coordinates and predicted occupancy code information at the lowest scale from the compressed bitstream. Starting from these, the parallel coordinate generator performs layer-by-layer upsampling and, together with the second partition-aware occupancy probability prediction module, generates the reconstructed coordinates at each scale, forming a reconstructed LiDAR point cloud. Finally, the reconstructed coordinates of each point at the highest scale are input into the log-polar coordinate inverse quantization module for inverse quantization, yielding the restored LiDAR point cloud.

[0008] As a preferred embodiment, both the first partition-aware occupancy probability prediction module and the second partition-aware occupancy probability prediction module adopt the partition-aware occupancy probability prediction structure and have the same parameters; the partition-aware occupancy probability prediction structure includes an adaptive partition-aware network, a multi-core multi-scale feature extraction network, and a soft-gated hybrid expert head.

[0009] The calculation process for the partition-aware occupancy probability prediction structure is as follows:

[0010] The input coordinates and input occupancy code information at the (N-1)th scale are embedded and summed to obtain the initial features at the (N-1)th scale. The input coordinates and initial features at the (N-1)th scale are then input into the adaptive partition-aware network to obtain the modulated point cloud features at the (N-1)th scale, where N = 1, 2, …, n and n is the total number of scales; the nth scale is the highest scale and the 0th scale is the lowest scale.

[0011] The modulated point cloud features at the (N-1)th scale are input into the multi-kernel multi-scale feature extraction network to obtain the deep features at the (N-1)th scale.

[0012] The deep features at the (N-1)th scale are used as the initial features at the Nth scale. The input coordinates and initial features at the Nth scale are input into the adaptive partition-aware network to obtain the modulated point cloud features at the Nth scale.

[0013] The modulated point cloud features at the Nth scale are input into the multi-kernel multi-scale feature extraction network to obtain the deep features at the Nth scale.

[0014] The input coordinates and deep features at the Nth scale are input into the soft-gated hybrid expert head to obtain the predicted occupancy probability at the Nth scale.

[0015] Preferably, the quantized LiDAR point cloud is input into a parallel occupancy code generator for layer-by-layer downsampling, and combined with the first partition-aware occupancy probability prediction module to obtain the first predicted occupancy probability for each scale, specifically including:

[0016] S11, the quantized coordinates corresponding to each point in the quantized LiDAR point cloud are input into the parallel occupancy code generator as the highest-scale quantized coordinates to obtain the quantized coordinates of each scale, and the occupancy code information of each scale is generated using the quantized coordinates of each scale.

[0017] S12, the quantized coordinates and occupancy code information at the (N-1)th scale and the quantized coordinates at the Nth scale are used, respectively, as the input coordinates and input occupancy code information at the (N-1)th scale and the input coordinates at the Nth scale, and are input into the first partition-aware occupancy probability prediction module to generate the first predicted occupancy probability at the Nth scale.

[0018] S13, Repeat steps S11-S12 n times to obtain the first predicted occupancy probability for each scale;

[0019] Based on the lowest-scale reconstructed coordinates and predicted occupancy code information, the parallel coordinate generator performs layer-by-layer upsampling and, together with the second partition-aware occupancy probability prediction module, generates the reconstructed coordinates at each scale, specifically including:

[0020] S21, input the lowest-scale reconstructed coordinates and predicted occupancy code information into the parallel coordinate generator to obtain the reconstructed coordinates at each scale;

[0021] S22, the reconstructed coordinates and predicted occupancy code information at the (N-1)th scale and the reconstructed coordinates at the Nth scale are used, respectively, as the input coordinates and input occupancy code information at the (N-1)th scale and the input coordinates at the Nth scale, and are input into the second partition-aware occupancy probability prediction module to generate the second predicted occupancy probability at the Nth scale.

[0022] S23, input the second predicted occupancy probability of the Nth scale and the sub-bit stream of the Nth scale in the compressed bit stream into the arithmetic decoder to obtain the predicted occupancy code information of the Nth scale;

[0023] S24. Repeat steps S22-S23 n times to obtain the reconstructed coordinates for each scale.

[0024] As a preferred approach, the computation of the adaptive partition-aware network at the Nth scale is as follows:

[0025] ;

[0026] ;

[0027] ;

[0028] where the symbols denote, in order: the logarithmic radial coordinate of the input coordinates at the Nth scale, normalized by division by a constant; the sector orientation index obtained by rounding down the input coordinates at the Nth scale; a multilayer perceptron consisting of linear layers and the ReLU activation function; element-wise addition; the initial features at the Nth scale; the spatial context features at the Nth scale; the dynamic scaling factor at the Nth scale; the Sigmoid activation function; the modulated point cloud features at the Nth scale; and the feature embedding operation;
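The three equations of the adaptive partition-aware network did not survive extraction, so the following NumPy snippet is only a hedged sketch of one plausible reading of the symbol list above, not the patent's implementation. It assumes that the spatial context is the element-wise sum of an MLP applied to the normalized log-radial coordinate and a learned embedding of the sector index, that the dynamic scaling factor is a Sigmoid of a linear map of that context, and that the modulated features are the initial features multiplied channel-wise by that factor. The class name, hidden width, number of sectors, and random initialization are all hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class AdaptivePartitionModulation:
    """Hypothetical sketch of the adaptive partition-aware network (assumed form):
    context = MLP(rho_norm) + Embed(sector_idx)
    gamma = sigmoid(context @ W_gamma)
    modulated = feats * gamma
    """

    def __init__(self, channels=32, num_sectors=64, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # MLP (linear -> ReLU -> linear) applied to the normalized log-radial coordinate.
        self.w1 = rng.normal(0.0, 0.1, (1, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, channels))
        # Embedding table indexed by the integer sector index.
        self.sector_embed = rng.normal(0.0, 0.1, (num_sectors, channels))
        # Linear map from the spatial context to per-channel scaling logits.
        self.w_gamma = rng.normal(0.0, 0.1, (channels, channels))
        self.num_sectors = num_sectors

    def __call__(self, rho_norm, sector_idx, feats):
        # rho_norm: (P,) floats; sector_idx: (P,) ints; feats: (P, C) initial features.
        radial = relu(rho_norm[:, None] @ self.w1) @ self.w2
        context = radial + self.sector_embed[sector_idx % self.num_sectors]  # element-wise addition
        gamma = sigmoid(context @ self.w_gamma)  # dynamic scaling factor in (0, 1)
        return feats * gamma, gamma


# Usage: three points with 32-channel initial features.
mod = AdaptivePartitionModulation()
rho_norm = np.array([0.1, 0.5, 0.9])
sector_idx = np.floor(np.array([3.7, 10.2, 63.9])).astype(int)
feats = np.random.default_rng(1).normal(size=(3, 32))
modulated, gamma = mod(rho_norm, sector_idx, feats)
```

Because the Sigmoid keeps every scaling factor in (0, 1), this reading lets the network attenuate feature channels differently for each distance band and sector, a position-dependent behaviour that a translation-invariant convolution alone does not provide.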

[0029] The computation process of the multi-kernel multi-scale feature extraction network is as follows:

[0030] ;

[0031] ;

[0032] ;

[0033] ;

[0034] where the symbols denote, in order: the first, second, and third intermediate features at the Nth scale; the ReLU activation function; a 3D convolution with 32 input channels, 32 output channels, and kernel size 3; a 3D convolution with 32 input channels, 8 output channels, and kernel size 1; a 3D convolution with 32 input channels, 8 output channels, and kernel size 3; a 3D convolution with 32 input channels, 16 output channels, and kernel size 5; a 3D convolution with 32 input channels, 32 output channels, and kernel size 1; vector concatenation; and the deep features at the Nth scale;
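The convolution sizes listed above are enough to sketch the channel bookkeeping of the multi-kernel multi-scale feature extraction network. The snippet below is a hypothetical NumPy illustration: it substitutes a dense, zero-padded 3D convolution for the sparse convolution a LiDAR codec would normally use, and it assumes one ordering of the four lost equations (a kernel-3 convolution with ReLU, three parallel branches with kernels 1, 3, and 5 concatenated to 8 + 8 + 16 = 32 channels, a ReLU, then a kernel-1 fusion convolution). Whether the original adds a residual connection or places the ReLUs differently is not recoverable from the text; the function names and weight scale are assumptions.

```python
import numpy as np


def conv3d_same(x, w):
    """Dense 3D cross-correlation with zero 'same' padding (stand-in for sparse conv).

    x: (C_in, D, H, W); w: (C_out, C_in, k, k, k) with odd k.
    """
    c_out, c_in, k = w.shape[0], w.shape[1], w.shape[2]
    assert x.shape[0] == c_in
    p = k // 2
    d, h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))
    out = np.zeros((c_out, d, h, wd))
    for a in range(k):
        for b in range(k):
            for c in range(k):
                patch = xp[:, a:a + d, b:b + h, c:c + wd]  # (C_in, D, H, W) shifted window
                out += np.einsum("oi,idhw->odhw", w[:, :, a, b, c], patch)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def init_weights(rng):
    # (C_out, C_in, kernel) triples taken from the text.
    shapes = {"conv_in": (32, 32, 3), "branch_k1": (8, 32, 1), "branch_k3": (8, 32, 3),
              "branch_k5": (16, 32, 5), "conv_out": (32, 32, 1)}
    return {name: rng.normal(0.0, 0.05, (co, ci, k, k, k)) for name, (co, ci, k) in shapes.items()}


def multi_kernel_block(f, wts):
    h1 = relu(conv3d_same(f, wts["conv_in"]))  # first intermediate feature (assumed)
    h2 = np.concatenate([conv3d_same(h1, wts["branch_k1"]),
                         conv3d_same(h1, wts["branch_k3"]),
                         conv3d_same(h1, wts["branch_k5"])], axis=0)  # 8 + 8 + 16 = 32 channels
    h3 = relu(h2)  # third intermediate feature (assumed)
    return conv3d_same(h3, wts["conv_out"])  # deep feature, no residual assumed


rng = np.random.default_rng(0)
f = rng.normal(size=(32, 6, 6, 6))
deep = multi_kernel_block(f, init_weights(rng))
```

The point of mixing kernel sizes 1, 3, and 5 is that each output voxel sees several receptive-field sizes at once, while the channel split keeps the concatenated width at 32 so the block can be stacked.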

[0035] The soft-gated hybrid expert head includes a gating network and M expert networks;

[0036] The input coordinates at the Nth scale are first passed through a gating network, which outputs the gating weight vectors of each expert network at the Nth scale, as shown in the following equation:

[0037] ;

[0038] where the symbols denote, in order: the logarithmic radial, sector azimuth, and height components of the input coordinates at the Nth scale, each normalized by division by a constant; the gating multilayer perceptron; the Softmax function; the gating weight vector at the Nth scale; and the gating weight of the ith expert network at the Nth scale, where i = 1, 2, …, M and M is the total number of expert networks;
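A minimal NumPy sketch of the gating network follows, assuming the gating multilayer perceptron is a linear layer, a ReLU, and a second linear layer applied to the three normalized coordinates. The hidden width, the number of experts M = 4, and the function names are hypothetical. The sketch shows the property the formula relies on: the Softmax makes every point's gating weights positive and sum to 1.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def gating_weights(coords_norm, w1, b1, w2, b2):
    """coords_norm: (P, 3) normalized (log-radial, sector-azimuth, height) coordinates.

    Returns a (P, M) array of gating weights; each row sums to 1.
    """
    hidden = np.maximum(coords_norm @ w1 + b1, 0.0)  # gating MLP: linear -> ReLU
    return softmax(hidden @ w2 + b2, axis=-1)        # -> linear -> Softmax over M experts


# Usage: 5 points, a 16-unit hidden layer and M = 4 experts (all sizes hypothetical).
rng = np.random.default_rng(0)
coords_norm = rng.uniform(0.0, 1.0, size=(5, 3))
w1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 4)), np.zeros(4)
gates = gating_weights(coords_norm, w1, b1, w2, b2)
```

Because the gate depends only on position, points in different distance bands or sectors can route their features to different experts.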

[0039] After inputting the deep features at the Nth scale into each expert network, they are weighted and fused with the gating weights of each expert network at the Nth scale to obtain the predicted occupancy probability at the Nth scale, as shown in the following formula:

[0040] ;

[0041] where the symbols denote, in order: the nonlinear mapping of the ith expert network; element-wise multiplication; and the predicted occupancy probability at the Nth scale.
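The weighted fusion can be sketched the same way. This hypothetical NumPy example assumes that each expert network is a linear layer followed by a Softmax over the 255 possible nonzero 8-bit occupancy codes, and that the gate-weighted element-wise products are summed over the M experts. The text specifies neither the expert architecture nor the output layout, so both are assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def make_expert(rng, channels=32, num_codes=255):
    """Hypothetical expert: a linear layer followed by Softmax over the nonzero codes."""
    w = rng.normal(0.0, 0.1, (channels, num_codes))
    return lambda x: softmax(x @ w, axis=-1)


def moe_occupancy_probs(deep_feats, gates, experts):
    """deep_feats: (P, C); gates: (P, M) with rows summing to 1; experts: M callables.

    Returns the gate-weighted sum of expert outputs, shape (P, num_codes).
    """
    expert_out = np.stack([expert(deep_feats) for expert in experts], axis=1)  # (P, M, K)
    return (gates[:, :, None] * expert_out).sum(axis=1)  # element-wise product, sum over experts


rng = np.random.default_rng(0)
experts = [make_expert(rng) for _ in range(4)]
deep_feats = rng.normal(size=(5, 32))
gates = softmax(rng.normal(size=(5, 4)))
probs = moe_occupancy_probs(deep_feats, gates, experts)
```

Since each expert outputs a probability distribution and the gating weights form a convex combination, the fused output is itself a valid distribution, which is what the arithmetic coder consumes.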

[0042] Preferably, the parallel occupancy code generator includes n downsampling occupancy code generation modules. For the (n-N+1)th downsampling occupancy code generation module, the quantized coordinates at the Nth scale are input, and the quantized coordinates at the (N-1)th scale are first calculated using the following formula:

[0043] ;

[0044] where the symbols denote, in order: the quantized coordinates at the Nth scale, the quantized coordinates at the (N-1)th scale, and the floor function;
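The lost formula is most likely the usual octree-style parent computation, since the upsampling step of the parallel coordinate generator enumerates eight candidate children per parent. The sketch below assumes exactly that: a coordinate at the (N-1)th scale is the floor of the Nth-scale coordinate divided by 2 along each of the three log-polar axes. The function names are hypothetical; Python's `//` on integers is floor division.

```python
def downsample(coords):
    """coords: iterable of (rho, theta, h) integer tuples at scale N.

    Returns the sorted, deduplicated parent coordinates at scale N-1, assuming
    each parent cell covers 2 x 2 x 2 children (floor division by 2).
    """
    return sorted({(r // 2, t // 2, h // 2) for r, t, h in coords})


def build_pyramid(top_coords, n):
    """Returns the coordinate lists for scales 0 (lowest) through n (highest)."""
    scales = [sorted(set(top_coords))]
    for _ in range(n):
        scales.append(downsample(scales[-1]))
    return scales[::-1]


pyramid = build_pyramid([(4, 6, 2), (5, 7, 3), (9, 0, 1)], n=3)
# pyramid[2] == [(2, 3, 1), (4, 0, 0)]; pyramid[0] == [(0, 0, 0), (1, 0, 0)]
```

Each level depends only on the level above it and treats every point independently, so a tensor implementation can compute a whole level in one vectorized pass, which is the kind of parallelism the generator relies on.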

[0045] The occupancy code information for each scale is generated using the following formula:

[0046] ;

[0047] where the symbols denote, in order: the logarithmic radial, sector azimuth, and height components of the quantized coordinates at the Nth scale; the modulo operation; the occupancy code information at the (N-1)th scale, taking values from 1 to 255; the set of child node coordinates corresponding to a quantized coordinate at the (N-1)th scale; the feature summation operation within a local space; and the condition that a quantized coordinate at the Nth scale belongs to the child node set of the corresponding (N-1)th-scale coordinate.
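Assuming each parent covers a 2 × 2 × 2 block of children, the occupancy code of a parent can be formed by summing one-hot bits of its occupied children, where each child's bit position comes from the parities (mod 2) of its three coordinates. The bit ordering below (log-radial parity as the most significant of the three bits) is an arbitrary, hypothetical choice, and the sketch numbers bits 0 to 7 rather than 1 to 8.

```python
from collections import defaultdict


def child_bit(r, t, h):
    """Bit position (0..7) of a child inside its 2x2x2 parent, from coordinate parities."""
    return 4 * (r % 2) + 2 * (t % 2) + (h % 2)


def occupancy_codes(child_coords):
    """Maps each parent (scale N-1) coordinate to the 8-bit occupancy code of its
    occupied children at scale N. Every parent has at least one child, so codes lie in 1..255.
    """
    codes = defaultdict(int)
    for r, t, h in set(child_coords):  # deduplicate so each child contributes once
        codes[(r // 2, t // 2, h // 2)] += 1 << child_bit(r, t, h)  # local summation of one-hot bits
    return dict(codes)


codes = occupancy_codes([(4, 6, 2), (5, 7, 3), (9, 0, 1)])
# codes == {(2, 3, 1): 129, (4, 0, 0): 32}
```

Because the children of one parent have distinct bit positions, the summation is equivalent to a bitwise OR, and a code of 0 can never occur, matching the stated range of 1 to 255.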

[0048] The parallel coordinate generator includes n upsampling coordinate generation modules. The reconstructed coordinates and predicted occupancy code information at the (N-1)th scale are input into the Nth upsampling coordinate generation module, which outputs the reconstructed coordinates at the Nth scale, as shown in the following formulas:

[0049] ;

[0050] ;

[0051] ;

[0052] where the symbols denote, in order: element-wise addition; the kth candidate child node coordinate among the 8 candidate child node coordinates at the Nth scale; the reconstructed coordinates at the (N-1)th scale; the kth relative offset vector among the eight possible relative offset vectors in the local space; the kth bit of the pruning mask, which takes the value 0 or 1, with k = 1, 2, …, 8; the indicator function, which returns 1 if its argument equals 1 and 0 otherwise; the predicted occupancy code information at the (N-1)th scale; and the reconstructed coordinates at the Nth scale, i.e. the candidates with mask value 1 that remain after the invalid candidates with mask value 0 are removed.
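Upsampling inverts the occupancy code: each parent spawns eight candidate children at twice its coordinates plus a 0/1 offset per axis, and the mask bits of the decoded occupancy code prune the empty ones. This hypothetical sketch assumes bit k of the code (0-based here, 1-based in the text) corresponds to the offset whose three binary digits are those of k, with the log-radial offset as the most significant digit; any encoder that uses the same ordering will round-trip exactly.

```python
# Relative offsets (d_rho, d_theta, d_h) of the 8 candidate children; index k is 0-based
# here and matches bit k of the occupancy code (hypothetical ordering).
OFFSETS = [((k >> 2) & 1, (k >> 1) & 1, k & 1) for k in range(8)]


def upsample(parent_codes):
    """parent_codes: {(rho, theta, h) at scale N-1: occupancy code in 1..255}.

    Returns the sorted reconstructed coordinates at scale N (candidates whose mask bit is 1).
    """
    children = []
    for (r, t, h), code in parent_codes.items():
        for k, (dr, dt, dh) in enumerate(OFFSETS):
            if (code >> k) & 1:  # mask bit k; candidates with mask 0 are pruned
                children.append((2 * r + dr, 2 * t + dt, 2 * h + dh))
    return sorted(children)


children = upsample({(2, 3, 1): 129, (4, 0, 0): 32})
# children == [(4, 6, 2), (5, 7, 3), (9, 0, 1)]
```

All parents at one scale can be expanded independently, so in a tensor implementation the eight candidates for every parent are generated with a single broadcast addition and filtered with a single boolean mask.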

[0053] As a preferred embodiment, the calculation process of the logarithmic polar coordinate quantization module is as follows:

[0054] Obtain the Cartesian coordinates of each point in the LiDAR point cloud, and calculate the physical radial distance and azimuth angle of each point, where x, y, and z denote the point's coordinates in the Cartesian coordinate system;

[0055] Input the physical radial distance, azimuth angle, and z-axis coordinates corresponding to each point into the quantization function to obtain the quantized logarithmic radial coordinates, sector azimuth coordinates, and height coordinates corresponding to each point, as shown in the following formula:

[0056] ;

[0057] ;

[0058] ;

[0059] where the symbols denote, in order: the quantized logarithmic radial coordinate; the quantized sector azimuth coordinate; the quantized height coordinate; the preset logarithmic scaling factor; the preset minimum effective distance; the preset angular resolution; the preset z-axis height resolution; the preset z-axis offset that prevents negative coordinates; the rounding function; and the maximum operator;

[0060] The quantized logarithmic radial, sector azimuth, and height coordinates of each point are concatenated, and duplicate points that share the same quantized coordinates (that is, fall in the same spatial voxel) are removed, giving the quantized coordinates of each point, from which the quantized LiDAR point cloud is constructed.
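The quantization formulas themselves were lost in extraction, but the parameter list above constrains their general shape. The following Python sketch assumes a log-cylindrical mapping: the radial distance is measured in the x-y plane and clamped below by the minimum effective distance, the log-radial index is round(α·ln(max(r, r_min)/r_min)), the sector index is the azimuth in [0, 2π) divided by the angular resolution, and the height index is (z + z_shift)/Δz, followed by voxel deduplication. The normalization by r_min, the azimuth wrap, the function name, and every default parameter value are hypothetical choices for illustration, not values from the patent.

```python
import math


def logpolar_quantize(points, alpha=100.0, r_min=1.0, d_phi=2 * math.pi / 2048,
                      d_z=0.05, z_shift=3.0):
    """points: iterable of (x, y, z) in metres. Returns sorted unique (rho, theta, h) ints."""
    n_sectors = round(2 * math.pi / d_phi)
    voxels = set()
    for x, y, z in points:
        r = math.hypot(x, y)                       # physical radial distance in the x-y plane
        phi = math.atan2(y, x) % (2 * math.pi)     # azimuth mapped to [0, 2*pi)
        rho_q = round(alpha * math.log(max(r, r_min) / r_min))
        theta_q = round(phi / d_phi) % n_sectors   # wrap the last sector back to 0
        h_q = round((z + z_shift) / d_z)           # offset keeps heights non-negative
        voxels.add((rho_q, theta_q, h_q))          # points in the same voxel collapse to one
    return sorted(voxels)


voxels = logpolar_quantize([(10.0, 0.0, -1.5), (10.001, 0.0, -1.5), (20.0, 0.0, -1.5)])
# voxels == [(230, 0, 30), (300, 0, 30)]
```

Under this reading a log-radial bin spans roughly r/α metres at distance r, so voxel size grows in proportion to distance; that is how non-uniform quantization evens out the dense-near, sparse-far occupancy of LiDAR scans. In the usage above, the two points 1 mm apart at 10 m fall into the same voxel.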

[0061] The calculation process of the logarithmic polar coordinate inverse quantization module is as follows:

[0062] The reconstructed coordinates of each point in the reconstructed LiDAR point cloud at the highest scale are input into the inverse quantization function to obtain the reconstructed physical radial distance, azimuth angle, and Cartesian coordinates of each point at the highest scale, as shown in the following formulas:

[0063] ;

[0064] ;

[0065] ;

[0066] ;

[0067] ;

[0068] where the symbols denote, in order: the logarithmic radial, sector azimuth, and height coordinates of each point in the reconstructed LiDAR point cloud at the highest scale; the reconstructed physical radial distance of each point at the highest scale; the reconstructed azimuth angle of each point at the highest scale; and the reconstructed x, y, and z coordinates of each point in the Cartesian coordinate system at the highest scale. The reconstructed coordinates of all points at the highest scale constitute the restored LiDAR point cloud.
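A hedged sketch of the inverse mapping follows, assuming the forward quantizer was round(α·ln(max(r, r_min)/r_min)) for the log-radial index, the azimuth divided by the angular resolution for the sector index, and (z + z_shift)/Δz for height. It follows the five-step structure the legend describes (radius, azimuth, then x, y, z). The function name and all default parameter values are hypothetical.

```python
import math


def logpolar_dequantize(voxels, alpha=100.0, r_min=1.0, d_phi=2 * math.pi / 2048,
                        d_z=0.05, z_shift=3.0):
    """Inverse of the assumed quantizer: (rho, theta, h) ints -> (x, y, z) floats."""
    points = []
    for rho_q, theta_q, h_q in voxels:
        r = r_min * math.exp(rho_q / alpha)  # undo the log-radial mapping
        phi = theta_q * d_phi                # sector index -> azimuth angle
        points.append((r * math.cos(phi), r * math.sin(phi), h_q * d_z - z_shift))
    return points


restored = logpolar_dequantize([(230, 0, 30), (300, 0, 30)])
# restored[0] is approximately (9.974, 0.0, -1.5)
```

With α = 100, rounding the log-radial index introduces a relative radial error of at most about exp(0.5/α) − 1 ≈ 0.5%, so the absolute error grows with distance while the relative error stays bounded.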

[0069] In a second aspect, the present invention provides a partition-aware LiDAR point cloud compression device based on non-uniform spatial quantization, comprising:

[0070] The model building module is configured to build and train a partition-aware LiDAR point cloud compression model to obtain a trained partition-aware LiDAR point cloud compression model. The partition-aware LiDAR point cloud compression model includes an encoder and a decoder. The encoder includes a log-polar coordinate quantization module, a parallel occupancy code generator, a first partition-aware occupancy probability prediction module, and an arithmetic encoder. The decoder includes an arithmetic decoder, a second partition-aware occupancy probability prediction module, a parallel coordinate generator, and a log-polar coordinate inverse quantization module.

[0071] The compression module is configured to acquire the LiDAR point cloud to be compressed and input it into the trained partition-aware LiDAR point cloud compression model. In the encoding stage, the log-polar coordinate quantization module quantizes the LiDAR point cloud to obtain the quantized LiDAR point cloud, which is then input into the parallel occupancy code generator for layer-by-layer downsampling; together with the first partition-aware occupancy probability prediction module, this yields the first predicted occupancy probability at each scale. The first predicted occupancy probability distribution at each scale, along with the coordinates and occupancy code information at the lowest scale, is then input into the arithmetic encoder for compression, yielding a compressed bitstream. The compressed bitstream is input into the decoder, where the arithmetic decoder reads the lowest-scale reconstructed coordinates and predicted occupancy code information. Based on these, the parallel coordinate generator performs layer-by-layer upsampling and, together with the second partition-aware occupancy probability prediction module, generates the reconstructed coordinates at each scale, forming the reconstructed LiDAR point cloud. The reconstructed coordinates of each point at the highest scale are then input into the log-polar coordinate inverse quantization module for inverse quantization, yielding the restored LiDAR point cloud.

[0072] In a third aspect, the present invention provides an electronic device including one or more processors and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.

[0073] In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.

[0074] In a fifth aspect, the present invention provides a computer program product including a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.

[0075] Compared with the prior art, the present invention has the following beneficial effects:

[0076] (1) The partition-aware LiDAR point cloud compression model based on non-uniform spatial quantization proposed in this invention adopts a log-polar coordinate quantization strategy that effectively addresses the inherent spatial density imbalance and scale distortion of LiDAR point clouds. Converting coordinates from the Cartesian system to the log-polar system makes the voxel occupancy of objects at different distances more uniform, giving the 3D sparse convolution a more stable feature representation and thereby reducing geometric redundancy while preserving global structural detail.

[0077] (2) The partition-aware LiDAR point cloud compression model based on non-uniform spatial quantization proposed in this invention includes a partition-aware occupancy probability prediction structure. It dynamically modulates the initial features by fusing the input coordinates and input occupancy code information, and introduces a soft-gated hybrid expert head (MoE) to perform probability inference. This prior-driven processing based on physical space overcomes the translation-invariance limitation of conventional sparse convolution, enabling the model to dynamically perceive and adapt to differences in geometric distribution between spatial regions (such as nearby ground and distant buildings), thereby greatly improving the accuracy of the predicted occupancy probabilities and reducing the bitrate.

[0078] (3) The compression model of the partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization proposed in this invention employs a parallel occupancy code generator and a parallel coordinate generator. This avoids the very high computational latency of conventional octree-based models while maintaining a high compression ratio. Compared with time-consuming traversal and construction of multi-level tree structures, this parallel processing mechanism reduces computational complexity and memory access overhead, substantially improving running efficiency and enabling fast encoding and decoding.

Attached Figure Description

[0079] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0080] Figure 1 is a flowchart of the partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization according to an embodiment of this application.

[0081] Figure 2 is a schematic diagram of the partition-aware LiDAR point cloud compression model used in the partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization according to an embodiment of this application.

[0082] Figure 3 shows the D1 PSNR results of the partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization according to an embodiment of this application.

[0083] Figure 4 shows the D2 PSNR results of the partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization according to an embodiment of this application.

[0084] Figure 5 is a schematic diagram of the partition-aware occupancy probability prediction module of the partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization according to an embodiment of this application.

[0085] Figure 6 is a schematic structural diagram of the adaptive partition-aware network of the partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization according to an embodiment of this application.

[0086] Figure 7 is a schematic structural diagram of the multi-kernel multi-scale feature extraction network of the partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization according to an embodiment of this application.

[0087] Figure 8 is a schematic diagram of the soft-gated hybrid expert head of the partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization according to an embodiment of this application.

[0088] Figure 9 is a schematic diagram of the partition-aware LiDAR point cloud compression device based on non-uniform spatial quantization according to an embodiment of this application.

[0089] Figure 10 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0090] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this invention, and not all of them. Based on the embodiments of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this invention.

[0091] Referring to Figure 1, an embodiment of this application provides a partition-aware LiDAR point cloud compression method based on non-uniform spatial quantization, comprising the following steps:

[0092] S1. Construct and train a partition-aware LiDAR point cloud compression model to obtain a trained partition-aware LiDAR point cloud compression model. The partition-aware LiDAR point cloud compression model includes an encoder and a decoder. The encoder includes a log-polar coordinate quantization module, a parallel occupancy code generator, a first partition-aware occupancy probability prediction module, and an arithmetic encoder. The decoder includes an arithmetic decoder, a second partition-aware occupancy probability prediction module, a parallel coordinate generator, and a log-polar coordinate inverse quantization module.

[0093] For details, refer to Figure 2. The partition-aware LiDAR point cloud compression model proposed in this application has two parts, an encoding end and a decoding end, which respectively encode the LiDAR point cloud into a compressed bitstream and decode the compressed bitstream into a restored LiDAR point cloud. The encoding end includes a log-polar coordinate quantization module, a parallel occupancy code generator, a first partition-aware occupancy probability prediction module, and an arithmetic encoder. The decoding end includes an arithmetic decoder, a second partition-aware occupancy probability prediction module, a parallel coordinate generator, and a log-polar coordinate inverse quantization module. The log-polar coordinate quantization module and inverse quantization module perform quantization and inverse quantization, respectively; building both into the compression pipeline eliminates scale distortion. First, the input LiDAR point cloud is quantized by the log-polar coordinate quantization module to obtain a quantized point cloud. The parallel occupancy code generator then downsamples the quantized point cloud layer by layer, producing the quantized coordinates and corresponding occupancy code information at each scale. The first partition-aware occupancy probability prediction module combines physical spatial priors for feature extraction with soft-gated prediction to obtain the predicted occupancy probability distribution at all scales. The arithmetic encoder then encodes the quantized coordinates and occupancy code information, using the predicted occupancy probabilities, into a compressed bitstream.
A corresponding decoding and prediction mechanism uses the arithmetic decoder together with the parallel coordinate generator for layer-by-layer upsampling, while the second partition-aware occupancy probability prediction module infers the predicted occupancy probability at each scale. The compressed bitstream is thereby decompressed, and the reconstructed coordinates at each scale are parsed iteratively. Finally, the log-polar coordinate inverse quantization module maps the reconstructed coordinates at the highest scale back, yielding a restored LiDAR point cloud at the original resolution. This approach achieves efficient, high-fidelity compression of large-scale point cloud data.

[0094] In one embodiment, the experimental environment used for training the partition-aware LiDAR point cloud compression model proposed in this application includes a workstation equipped with an Intel(R) Xeon(R) Gold 6226R processor (2.90GHz), an NVIDIA RTX 3090 graphics card (24GB VRAM) and 128GB DDR4 memory, and the operating system is Ubuntu 20.04 LTS; the experiment uses the PyTorch deep learning framework and enables CUDA 11.8 acceleration.

[0095] The partition-aware LiDAR point cloud compression model proposed in this application is trained on the KITTI dataset. KITTI is a large-scale LiDAR point cloud dataset for autonomous driving, acquired with a Velodyne HDL-64E sensor. It contains 14,999 point clouds. The first 7,481 point clouds are used as the training set and the remaining 7,518 as the test set. All of the above data was obtained legally and in compliance with applicable rules.

[0096] The partition-aware LiDAR point cloud compression model was trained with the Adam optimizer at an initial learning rate of 1e-4. The StepLR strategy multiplied the learning rate by 0.1 every 10 epochs. The batch size was 32 and training ran for 100 epochs. All training runs used the same fixed random seed to ensure stable and reproducible results, and the cross-entropy loss function was used for training.
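The StepLR schedule in [0096] can be checked with a few lines of Python. This is a minimal sketch of StepLR semantics only: the rate starts at 1e-4 and is multiplied by gamma = 0.1 every step_size = 10 epochs. In PyTorch the same schedule would typically come from `torch.optim.Adam(params, lr=1e-4)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)`, stepped once per epoch, with `torch.nn.CrossEntropyLoss` as the loss. The function name `step_lr` is hypothetical.

```python
def step_lr(epoch: int, base_lr: float = 1e-4, step_size: int = 10, gamma: float = 0.1) -> float:
    """Learning rate in effect during a 0-indexed epoch under a StepLR-style schedule.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    # Epochs 0-9 use 1e-4, epochs 10-19 use 1e-5, and so on.
    for epoch in (0, 9, 10, 20, 99):
        print(f"epoch {epoch:3d}: lr = {step_lr(epoch):.1e}")
```

Over 100 epochs the rate is reduced nine times, so epochs 90-99 run at about 1e-13.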

[0097] Quantitative experimental results on the KITTI dataset are shown in Figure 3 and Figure 4. In Figure 3, D1 PSNR characterizes point-to-point distortion in point cloud geometry reconstruction. As the bit rate (bits per point, bpp) increases, the D1 PSNR rises steadily for both G-PCCv23 and the trained partition-aware LiDAR point cloud compression model of this application. This indicates that both methods achieve better geometric reconstruction quality at higher bit rates. The rate-distortion curve of the trained model of this application lies generally above that of G-PCCv23. At the same bit rate, therefore, the trained model achieves a higher D1 PSNR. Equivalently, it needs a lower bit rate to reach the same D1 reconstruction quality.
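A small plain-Python sketch makes the D1 PSNR of [0097] concrete. The source does not state its exact metric settings. This sketch therefore assumes the convention commonly used with the MPEG pc_error tool: a symmetric nearest-neighbour MSE, taking the larger of the two directions, and PSNR = 10·log10(3·p²/MSE) for a peak value p supplied by the caller. Brute-force search is used for clarity, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_sq_dist(p: Point, cloud: Sequence[Point]) -> float:
    """Squared Euclidean distance from p to its nearest neighbour in cloud (brute force)."""
    return min(sum((pi - qi) ** 2 for pi, qi in zip(p, q)) for q in cloud)


def d1_mse(src: Sequence[Point], ref: Sequence[Point]) -> float:
    """One-directional point-to-point MSE: mean squared distance from src points to ref."""
    return sum(_nearest_sq_dist(p, ref) for p in src) / len(src)


def d1_psnr(original: Sequence[Point], decoded: Sequence[Point], peak: float) -> float:
    """Symmetric D1 PSNR in dB, using the larger of the two directional MSEs."""
    mse = max(d1_mse(original, decoded), d1_mse(decoded, original))
    if mse == 0.0:
        return math.inf  # identical geometry
    # The factor 3 reflects that the squared error sums three coordinate components.
    return 10.0 * math.log10(3.0 * peak ** 2 / mse)


if __name__ == "__main__":
    original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    decoded = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    print(round(d1_psnr(original, decoded, peak=1.0), 3))  # 10*log10(6) -> 7.782
```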

[0098] Furthermore, the advantage of the trained partition-aware LiDAR point cloud compression model of this application is more pronounced in the medium-to-high bit rate range. For example, Figure 3 shows that at a D1 PSNR of approximately 87 dB, the trained model requires approximately 9-10 bpp, whereas G-PCCv23 typically requires a higher bit rate to reach similar quality. The trained model therefore codes more efficiently while maintaining point-level geometric accuracy, and it reduces geometric distortion more effectively.

[0099] In Figure 4, D2 PSNR characterizes point-to-plane distortion in point cloud geometry reconstruction. It better reflects how well the reconstructed point cloud preserves local surface structure. As the bit rate increases, the D2 PSNR continues to improve for both G-PCCv23 and the trained partition-aware LiDAR point cloud compression model of this application. However, the rate-distortion curve of the trained model lies above that of G-PCCv23 throughout the entire test range. This shows that the trained model also performs better at preserving surface geometry.
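D2 in [0099] differs from a point-to-point metric in one respect: it counts only the component of each nearest-neighbour error along the reference surface normal. The plain-Python sketch below assumes that unit normals are already available for both clouds; how they are estimated is outside this sketch. It also assumes the pc_error-style PSNR = 10·log10(3·p²/MSE) with a caller-supplied peak p, and it uses brute-force nearest neighbours. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def d2_mse(src: Sequence[Vec], ref: Sequence[Vec], ref_normals: Sequence[Vec]) -> float:
    """One-directional point-to-plane MSE from src to ref, using unit normals of ref."""
    total = 0.0
    for p in src:
        # Brute-force nearest neighbour of p in the reference cloud.
        j = min(range(len(ref)), key=lambda k: _dot(_sub(p, ref[k]), _sub(p, ref[k])))
        err = _sub(p, ref[j])
        # Only the error component along the reference normal is counted.
        total += _dot(err, ref_normals[j]) ** 2
    return total / len(src)


def d2_psnr(original, original_normals, decoded, decoded_normals, peak: float) -> float:
    """Symmetric D2 PSNR in dB, using the larger of the two directional MSEs."""
    mse = max(d2_mse(original, decoded, decoded_normals),
              d2_mse(decoded, original, original_normals))
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(3.0 * peak ** 2 / mse)


if __name__ == "__main__":
    up = [(0.0, 0.0, 1.0)] * 2
    original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    # A shift along the normal is penalized...
    print(d2_psnr(original, up, [(0.0, 0.0, 0.5), (1.0, 0.0, 0.0)], up, peak=1.0))
    # ...while a shift within the plane costs nothing under D2 (prints inf).
    print(d2_psnr(original, up, [(0.2, 0.0, 0.0), (1.0, 0.0, 0.0)], up, peak=1.0))
```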

[0100] In the medium-to-high bit rate range in particular, the D2 PSNR improvement of the trained partition-aware LiDAR point cloud compression model of this application is more stable and more significant. For example, near a D2 reconstruction quality of approximately 92-93 dB, the trained model reaches this quality at a lower bit rate than G-PCCv23. It therefore achieves higher surface reconstruction accuracy with less bit overhead. This result shows that the trained model improves point-to-point coordinate accuracy and also better preserves local surface structure.

[0101] The above two results show that the trained partition-aware LiDAR point cloud compression model of this application exhibits better rate-distortion performance than G-PCCv23 in both the D1 and D2 geometric distortion metrics on the KITTI dataset. This verifies that the method proposed in the embodiments of this application has high compression efficiency and reconstruction quality in terms of both point-level geometric accuracy and surface structure preservation.

[0102] After training through the above process, a trained partition-aware LiDAR point cloud compression model is obtained.

[0103] S2. The LiDAR point cloud to be compressed is acquired and input into the trained partition-aware LiDAR point cloud compression model. At the encoding end, the logarithmic polar coordinate quantization module first quantizes the LiDAR point cloud to obtain the quantized LiDAR point cloud. The parallel occupancy code generator then downsamples the quantized LiDAR point cloud layer by layer, and the first partition-aware occupancy probability prediction module produces the first predicted occupancy probability at each scale. The first predicted occupancy probability distribution at each scale, together with the coordinates and occupancy code information at the lowest scale, is input into the arithmetic encoder for compression to obtain the compressed bitstream. The compressed bitstream is then input into the decoding end, where the arithmetic decoder reads the reconstructed coordinates and predicted occupancy code information at the lowest scale. Starting from these, the parallel coordinate generator performs layer-by-layer upsampling. Combined with the second partition-aware occupancy probability prediction module, it generates the reconstructed coordinates at each scale, from which the reconstructed LiDAR point cloud is constructed. Finally, the reconstructed coordinates of each point in the reconstructed LiDAR point cloud at the highest scale are input into the logarithmic polar coordinate inverse quantization module for inverse quantization to obtain the restored LiDAR point cloud.

[0104] In a specific embodiment, the calculation process of the logarithmic polar coordinate quantization module is as follows:

[0105] Obtain the Cartesian coordinates $(x, y, z)$ of each point in the LiDAR point cloud. Then calculate the physical radial distance $r = \sqrt{x^2 + y^2}$ and the azimuth angle $\theta = \operatorname{atan2}(y, x)$ of each point, where $x$, $y$ and $z$ are the coordinates of the point in the Cartesian coordinate system;

[0106] Input the physical radial distance, azimuth angle, and z-axis coordinates corresponding to each point into the quantization function to obtain the quantized logarithmic radial coordinates, sector azimuth coordinates, and height coordinates corresponding to each point, as shown in the following formula:

[0107] ;

[0108] ;

[0109] ;

[0110] where the outputs are the quantized logarithmic radial coordinate, the quantized sector azimuth coordinate and the quantized height coordinate. The parameters are a preset logarithmic scaling factor, a preset minimum effective distance, a preset angular resolution, a preset z-axis height resolution, and a preset z-axis offset that prevents negative coordinates. The formulas also use the rounding function and the maximum operation;

[0111] The quantized logarithmic radial coordinate, sector azimuth coordinate and height coordinate of each point are concatenated. Points that fall into the same spatial voxel, that is, points that share identical quantized logarithmic radial, sector azimuth and height coordinates, are deduplicated so that each voxel is kept once. This yields the quantized coordinates of each point, from which the quantized LiDAR point cloud is constructed.
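The forward mapping of [0105]-[0111] can be sketched as follows. The exact quantization formulas are not reproduced in this text, so the forms used here are assumptions. They were chosen to match the listed ingredients: a logarithmic scaling factor α, a minimum effective distance r_min, an angular resolution Δθ, a height resolution Δz, a z-offset, rounding and a maximum. Specifically, ρ = round(α·ln(max(r, r_min)/r_min)), φ = round((θ + π)/Δθ) and h = round((z + z_offset)/Δz), with r = sqrt(x² + y²) and θ = atan2(y, x). All default parameter values and the function name are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

QCoord = Tuple[int, int, int]


def _round_half_up(v: float) -> int:
    # Python's built-in round() uses banker's rounding; use conventional rounding instead.
    return math.floor(v + 0.5)


def log_polar_quantize(
    points: Iterable[Tuple[float, float, float]],
    alpha: float = 32.0,                  # hypothetical logarithmic scaling factor
    r_min: float = 1.0,                   # hypothetical minimum effective distance (m)
    d_theta: float = 2 * math.pi / 1024,  # hypothetical angular resolution (rad)
    dz: float = 0.1,                      # hypothetical z-axis height resolution (m)
    z_offset: float = 3.0,                # hypothetical z shift that keeps heights non-negative
) -> List[QCoord]:
    """Map Cartesian points to deduplicated (log-radius, azimuth, height) voxel coordinates."""
    voxels = {}  # dict keeps first-seen order and acts as an ordered set
    for x, y, z in points:
        r = math.hypot(x, y)              # horizontal radial distance
        theta = math.atan2(y, x)          # azimuth in [-pi, pi]
        rho_q = _round_half_up(alpha * math.log(max(r, r_min) / r_min))
        phi_q = _round_half_up((theta + math.pi) / d_theta)
        h_q = _round_half_up((z + z_offset) / dz)
        voxels.setdefault((rho_q, phi_q, h_q), None)  # points in the same voxel collapse
    return list(voxels)


if __name__ == "__main__":
    # Radial bins widen with distance (roughly r / alpha metres wide):
    print(len(log_polar_quantize([(1.0, 0.0, 0.0), (1.1, 0.0, 0.0)])))    # 2 voxels
    print(len(log_polar_quantize([(50.0, 0.0, 0.0), (50.2, 0.0, 0.0)])))  # 1 voxel
```

Under this assumed form, the logarithmic radius gives roughly constant relative radial resolution (bin width about r/α). This suits LiDAR returns, which become sparser with range.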

[0112] The calculation process of the logarithmic polar coordinate inverse quantization module is as follows:

[0113] The reconstructed coordinates of each point in the reconstructed LiDAR point cloud at the highest scale are input into the inverse quantization function. This gives the reconstructed physical radial distance, azimuth angle and Cartesian coordinates of each point at the highest scale, as shown in the following formulas:

[0114] ;

[0115] ;

[0116] ;

[0117] ;

[0118] ;

[0119] where the inputs are the logarithmic radial coordinate, sector azimuth coordinate and height coordinate of each point in the reconstructed LiDAR point cloud at the highest scale. The outputs are the reconstructed physical radial distance, the reconstructed azimuth angle, and the reconstructed x, y and z coordinates of each point in the Cartesian coordinate system at the highest scale. The reconstructed coordinates of all points at the highest scale constitute the restored LiDAR point cloud.
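The inverse mapping of [0113]-[0119] can be sketched as follows. The inverse formulas are not reproduced in this text. This sketch assumes forward quantization of the form ρ = round(α·ln(max(r, r_min)/r_min)), φ = round((θ + π)/Δθ), h = round((z + z_offset)/Δz), where r = sqrt(x² + y²) and θ = atan2(y, x). It inverts that form as r = r_min·exp(ρ/α), θ = φ·Δθ - π, x = r·cos θ, y = r·sin θ and z = h·Δz - z_offset. Parameter defaults and the function name are hypothetical.

```python
import math
from typing import Tuple


def log_polar_dequantize(
    q: Tuple[int, int, int],
    alpha: float = 32.0,                  # hypothetical logarithmic scaling factor
    r_min: float = 1.0,                   # hypothetical minimum effective distance (m)
    d_theta: float = 2 * math.pi / 1024,  # hypothetical angular resolution (rad)
    dz: float = 0.1,                      # hypothetical z-axis height resolution (m)
    z_offset: float = 3.0,                # hypothetical z shift used during quantization
) -> Tuple[float, float, float]:
    """Map a (log-radius, azimuth, height) voxel coordinate back to Cartesian x, y, z."""
    rho_q, phi_q, h_q = q
    r = r_min * math.exp(rho_q / alpha)   # undo the logarithmic radial mapping
    theta = phi_q * d_theta - math.pi     # undo the azimuth shift and scaling
    return r * math.cos(theta), r * math.sin(theta), h_q * dz - z_offset


if __name__ == "__main__":
    # Under the assumed forward mapping, the point (10, 0, 0) quantizes to this voxel.
    x, y, z = log_polar_dequantize((74, 512, 30))
    print(f"{x:.3f} {y:.3f} {z:.3f}")
```

Only the voxel index survives, so the reconstruction error is bounded by half a bin. That means a relative radial error of up to about exp(0.5/α) - 1 (about 1.6% for α = 32), an azimuth error of up to Δθ/2, and a height error of up to Δz/2.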

[0120] Specifically, the embodiments of this application use the logarithmic polar coordinate quantization module to quantize the LiDAR point cloud. This gives the quantized logarithmic radial coordinate, sector azimuth coordinate and height coordinate of each point. These three coordinates are concatenated, and duplicate coordinates within the same spatial voxel are removed to obtain the quantized coordinates of each point. The quantized coordinates of all points form the quantized LiDAR point cloud.

[0121] In a specific embodiment, the first partition-aware occupancy probability prediction module and the second partition-aware occupancy probability prediction module both adopt a partition-aware occupancy probability prediction structure and share the same parameters. The partition-aware occupancy probability prediction structure includes an adaptive partition-aware network, a multi-kernel multi-scale feature extraction network, and a soft-gated hybrid expert head.

[0122] The calculation process for the partition-aware occupancy probability prediction structure is as follows:

[0123] The input coordinates and input occupancy code information at the (N-1)th scale are embedded and summed to obtain the initial features at the (N-1)th scale. The input coordinates and initial features at the (N-1)th scale are then input into the adaptive partition-aware network to obtain the modulated point cloud features at the (N-1)th scale. Here N = 1, 2, ..., n, where n is the total number of scales. The nth scale is the highest scale and the 0th scale is the lowest scale.

[0124] The modulated point cloud features at the N-1 scale are input into a multi-kernel multi-scale feature extraction network to obtain the deep features at the N-1 scale.

[0125] The deep features at the (N-1)th scale are used as the initial features at the Nth scale. The input coordinates and initial features at the Nth scale are input into the adaptive partition-aware network to obtain the modulated point cloud features at the Nth scale.

[0126] The modulated point cloud features at the Nth scale are input into a multi-kernel multi-scale feature extraction network to obtain the deep features at the Nth scale.

[0127] The input coordinates and deep features at the Nth scale are input into the soft-gated hybrid expert head to obtain the predicted occupancy probability at the Nth scale.

[0128] Referring to Figure 5, in the embodiments of this application the first and second partition-aware occupancy probability prediction modules both adopt the partition-aware occupancy probability prediction structure and share the same parameters. They differ only in location: one sits at the encoding end and the other at the decoding end. Because their structure and parameters are identical, the decoding end can generate the same predicted occupancy probability distribution as the encoding end.

[0129] In a specific embodiment, the quantized LiDAR point cloud is input into a parallel occupancy code generator for layer-by-layer downsampling, and combined with the first partition-aware occupancy probability prediction module, the first predicted occupancy probability for each scale is obtained, specifically including:

[0130] S11, the quantized coordinates corresponding to each point in the quantized LiDAR point cloud are input into the parallel occupancy code generator as the highest-scale quantized coordinates to obtain the quantized coordinates of each scale, and the occupancy code information of each scale is generated using the quantized coordinates of each scale.

[0131] S12: the quantized coordinates and occupancy code information at the (N-1)th scale serve as the input coordinates and input occupancy code information at the (N-1)th scale. The quantized coordinates at the Nth scale serve as the input coordinates at the Nth scale. These are input into the first partition-aware occupancy probability prediction module to generate the first predicted occupancy probability at the Nth scale.

[0132] S13, Repeat steps S11-S12 n times to obtain the first predicted occupancy probability for each scale;

[0133] Starting from the reconstructed coordinates and predicted occupancy code information at the lowest scale, the parallel coordinate generator performs layer-by-layer upsampling. Combined with the second partition-aware occupancy probability prediction module, it generates the reconstructed coordinates at each scale. This specifically includes:

[0134] S21, input the lowest-scale reconstructed coordinates and predicted occupancy code information into the parallel coordinate generator to obtain the reconstructed coordinates at each scale;

[0135] S22: the reconstructed coordinates and predicted occupancy code information at the (N-1)th scale serve as the input coordinates and input occupancy code information at the (N-1)th scale. The reconstructed coordinates at the Nth scale serve as the input coordinates at the Nth scale. These are input into the second partition-aware occupancy probability prediction module to generate the second predicted occupancy probability at the Nth scale.

[0136] S23, input the second predicted occupancy probability of the Nth scale and the sub-bit stream of the Nth scale in the compressed bit stream into the arithmetic decoder to obtain the predicted occupancy code information of the Nth scale;

[0137] S24. Repeat steps S22-S23 n times to obtain the reconstructed coordinates for each scale.

[0138] Specifically, the inputs of the partition-aware occupancy probability prediction module are the input coordinates and occupancy code information at the (N-1)th scale and the input coordinates at the Nth scale, and the output is the predicted occupancy probability; at the encoding end, the inputs of the first partition-aware occupancy probability prediction module are the quantized coordinates and occupancy code information at the (N-1)th scale and the quantized coordinates at the Nth scale, and the output is the first predicted occupancy probability at the Nth scale; at the decoding end, the inputs of the second partition-aware occupancy probability prediction module are the reconstructed coordinates and predicted occupancy code information at the (N-1)th scale and the reconstructed coordinates at the Nth scale, and the output is the second predicted occupancy probability at the Nth scale.

[0139] In a specific embodiment, the parallel occupancy code generator includes n downsampled occupancy code generation modules. The quantized coordinates at the Nth scale are input into the (n-N+1)th downsampled occupancy code generation module. That module first computes the quantized coordinates at the (N-1)th scale using the following formula:

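[0140] $C^{N-1} = \left\lfloor C^{N} / 2 \right\rfloor$;

[0141] where $C^{N}$ represents the quantized coordinates at the Nth scale, $C^{N-1}$ represents the quantized coordinates at the (N-1)th scale, and $\lfloor \cdot \rfloor$ denotes the floor function, applied element-wise;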

[0142] The occupancy code information for each scale is generated using the following formula:

[0143] ;

[0144] where the logarithmic radial coordinate, sector azimuth coordinate and height coordinate of each quantized coordinate at the Nth scale are each taken through the modulo operation (mod 2). This determines which of the eight bits that child sets. The result is the occupancy code information at the (N-1)th scale, whose value ranges from 1 to 255. The sum runs over the set of child-node coordinates corresponding to each quantized coordinate at the (N-1)th scale, i.e., it is a feature summation within a local space. A quantized coordinate at the Nth scale contributes only if it belongs to that child-node set.
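The downsampling step of [0139]-[0144] can be sketched in plain Python. Parent coordinates follow floor(c/2), and the bit that a child sets is derived from its coordinates modulo 2. The bit ordering used here, bit index 4·(ρ mod 2) + 2·(φ mod 2) + (h mod 2), is an assumption, and so are the function names. The stride-2 sparse convolution with weights fixed at 1 described in [0150] would compute the same sums in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[int, int, int]


def child_bit(c: Coord) -> int:
    """Bit position (0-7) of a child inside its parent's 8-bit occupancy code (assumed ordering)."""
    return 4 * (c[0] % 2) + 2 * (c[1] % 2) + (c[2] % 2)


def downsample(coords: Iterable[Coord]) -> Dict[Coord, int]:
    """One downsampling step: map each parent voxel floor(c/2) to its occupancy code."""
    codes: Dict[Coord, int] = defaultdict(int)
    for c in set(coords):  # children must be unique so that summing 2**bit never collides
        parent = (c[0] // 2, c[1] // 2, c[2] // 2)
        codes[parent] += 1 << child_bit(c)  # sum of child contributions
    return dict(codes)


def build_pyramid(top: Iterable[Coord], n: int) -> Tuple[List[List[Coord]], List[Dict[Coord, int]]]:
    """Coordinates for scales 0..n and, for scales 0..n-1, a map node -> occupancy code."""
    coords: List[List[Coord]] = [[] for _ in range(n + 1)]
    codes: List[Dict[Coord, int]] = [{} for _ in range(n)]
    coords[n] = sorted(set(top))
    for scale in range(n, 0, -1):
        codes[scale - 1] = downsample(coords[scale])
        coords[scale - 1] = sorted(codes[scale - 1])
    return coords, codes


if __name__ == "__main__":
    coords, codes = build_pyramid([(0, 0, 0), (1, 1, 1), (2, 0, 0)], n=2)
    print(coords[1], codes[1])  # scale-1 nodes and the codes of their scale-2 children
    print(coords[0], codes[0])
```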

[0145] The parallel coordinate generator includes n upsampled coordinate generation modules. The reconstructed coordinates and predicted occupancy code information at the (N-1)th scale are input into the Nth upsampled coordinate generation module. That module outputs the reconstructed coordinates at the Nth scale, as shown in the following formulas:

[0146] ;

[0147] ;

[0148] ;

[0149] where the candidates are formed by element-wise addition. The k-th of the eight candidate child-node coordinates at the Nth scale is obtained by adding the k-th of the eight relative offset vectors in the local space to the reconstructed coordinates at the (N-1)th scale. The k-th bit of the pruning mask takes the value 0 or 1, with k = 1, 2, ..., 8. It equals 1 if the k-th bit of the predicted occupancy code information at the (N-1)th scale is 1, and 0 otherwise. The reconstructed coordinates at the Nth scale are the candidates whose mask bit is 1, after removing the invalid candidates whose mask bit is 0.
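The pruning step of [0145]-[0149] can be sketched as follows. Each parent generates eight candidates from the offsets of its 2×2×2 local space, and the decoded occupancy code acts as the mask. This sketch stores each scale's coordinates as integers at that scale, consistent with floor(c/2) downsampling, so the parent is doubled before the offset is added. The bit ordering, with offset (a, b, c) mapped to bit 4a + 2b + c and bits counted from 0 rather than k = 1..8, is an assumption. `upsample` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]

# Eight relative offsets of the 2x2x2 local space; list index k corresponds to bit k = 4a + 2b + c.
OFFSETS: List[Coord] = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]


def upsample(parent_codes: Dict[Coord, int]) -> List[Coord]:
    """Generate child coordinates at the next scale, keeping only candidates whose mask bit is 1."""
    children: List[Coord] = []
    for (px, py, pz), code in parent_codes.items():
        for k, (a, b, c) in enumerate(OFFSETS):
            mask_k = (code >> k) & 1  # k-th bit of the occupancy code
            if mask_k == 1:           # candidates with mask 0 are pruned
                children.append((2 * px + a, 2 * py + b, 2 * pz + c))
    return sorted(children)


if __name__ == "__main__":
    # Code 129 sets bits 0 and 7 -> offsets (0,0,0) and (1,1,1); code 1 -> offset (0,0,0).
    print(upsample({(0, 0, 0): 129, (1, 0, 0): 1}))  # [(0, 0, 0), (1, 1, 1), (2, 0, 0)]
```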

[0150] Specifically, at the encoding end, the quantized coordinates of each point in the quantized LiDAR point cloud are used as the highest-scale quantized coordinates and input into the parallel occupancy code generator. The generator downsamples them layer by layer to generate multi-scale quantized coordinates, which are also used to generate the occupancy code information at each scale. In one embodiment, the feature summation within the local space can be implemented by a three-dimensional sparse convolution with all weights fixed at 1 and a stride of 2. That is, the quantized coordinates at the nth scale are input into the parallel occupancy code generator and downsampled layer by layer through the n downsampled occupancy code generation modules. This yields the quantized coordinates at scale n-1, then scale n-2, and so on, down to scale 0.

[0151] Then, the quantized coordinates and occupancy code information of the (N-1)th scale are used for feature embedding to obtain the initial features of the (N-1)th scale, as shown in the following formula:

[0152] = ;

[0153] where the three quantities are the initial features at the (N-1)th scale, the quantized coordinates at the (N-1)th scale, and the occupancy code information at the (N-1)th scale.

[0154] The quantized coordinates and initial features at the (N-1)th scale are then input into the adaptive partition-aware network to obtain the modulated point cloud features at the (N-1)th scale. These features are in turn input into the multi-kernel multi-scale feature extraction network to obtain the deep features at the (N-1)th scale.

[0155] The quantized coordinates at the Nth scale are embedded as features and added to the deep features at the (N-1)th scale. The result serves directly as the initial features at the Nth scale and forms the higher-scale point cloud features. These features are input into the adaptive partition-aware network to obtain the modulated point cloud features at the Nth scale. The modulated features are then input into the multi-kernel multi-scale feature extraction network to obtain the deep features at the Nth scale. The deep features at the Nth scale are input into the soft-gated hybrid expert head to obtain the first predicted occupancy probability at the Nth scale. This process is repeated until the first predicted occupancy probability distributions for all scales are obtained.

[0156] The first predicted occupancy probability distributions for all scales, together with the quantized coordinates and occupancy code information at scale 0, are input into the arithmetic encoder for compression to obtain the compressed bitstream. The compressed bitstream contains the quantized coordinates and occupancy code information at scale 0, plus one sub-bitstream for each scale, each associated with its corresponding scale.

[0157] At the decoding end, the arithmetic decoder reads the quantized coordinates and occupancy code information at scale 0 from the compressed bitstream. It uses them as the reconstructed coordinates and predicted occupancy code information at scale 0.

[0158] The upsampling coordinate generation module in the parallel coordinate generator uses the reconstructed coordinates at scale 0 and the predicted occupancy code information as conditions to obtain the reconstructed coordinates at scale 1.

[0159] The reconstructed coordinates and predicted occupancy code information at scale 0, along with the reconstructed coordinates at scale 1, are input into the second partition-aware occupancy probability prediction module to generate the second predicted occupancy probability at scale 1. This probability, together with the sub-bitstream for scale 1 in the compressed bitstream, is then input into the arithmetic decoder to obtain the predicted occupancy code information at scale 1.

[0160] In general, the upsampling coordinate generation module in the parallel coordinate generator takes the reconstructed coordinates and predicted occupancy code information at the (N-1)th scale as conditions and produces the reconstructed coordinates at the Nth scale. These (N-1)th-scale inputs, along with the reconstructed coordinates at the Nth scale, are input into the second partition-aware occupancy probability prediction module to generate the second predicted occupancy probability at the Nth scale. This probability, together with the sub-bitstream for the Nth scale in the compressed bitstream, is input into the arithmetic decoder to obtain the predicted occupancy code information at the Nth scale. The reconstructed coordinates and predicted occupancy code information at the Nth scale then serve as the (N-1)th-scale inputs of the next iteration. Iterating in this way generates the reconstructed coordinates at every scale.
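The scale-by-scale dependency in [0157]-[0160] can be sketched as a loop. The probability model and the arithmetic decoder are passed in as callables, since their internals are out of scope here. `decode_scales`, `predict` and `decode_codes` are hypothetical names. The sketch assumes that the highest scale needs no occupancy codes of its own, because it has no children. It also assumes a 2×2×2 child layout in which offset (a, b, c) maps to bit 4a + 2b + c.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int, int]
Codes = Dict[Coord, int]
OFFSETS: List[Coord] = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]


def upsample(parent_codes: Codes) -> List[Coord]:
    """Children of every parent whose occupancy bit is set (bit k <-> OFFSETS[k])."""
    return sorted(
        (2 * p[0] + a, 2 * p[1] + b, 2 * p[2] + c)
        for p, code in parent_codes.items()
        for k, (a, b, c) in enumerate(OFFSETS)
        if (code >> k) & 1
    )


def decode_scales(
    base_codes: Codes,  # scale-0 coordinates (keys) and occupancy codes read from the bitstream
    n: int,             # total number of scales
    predict: Callable[[List[Coord], Codes, List[Coord]], object],
    decode_codes: Callable[[int, object], Codes],  # stands in for the arithmetic decoder
) -> List[List[Coord]]:
    """Reconstruct coordinates at scales 0..n, one scale at a time."""
    coords: List[List[Coord]] = [sorted(base_codes)]
    codes = base_codes
    for scale in range(1, n + 1):
        coords.append(upsample(codes))  # reconstructed coordinates at this scale
        if scale == n:
            break                        # top scale: nothing left to decode
        # The predictor sees only data that is already decoded, as the encoder did.
        probs = predict(coords[scale - 1], codes, coords[scale])
        codes = decode_codes(scale, probs)  # occupancy codes of this scale's nodes
    return coords


if __name__ == "__main__":
    # Stand-ins: a real system would run the partition-aware predictor and an arithmetic decoder.
    sub_bitstreams = {1: {(0, 0, 0): 129, (1, 0, 0): 1}}
    result = decode_scales({(0, 0, 0): 17}, 2,
                           lambda prev, prev_codes, cur: None,
                           lambda scale, probs: sub_bitstreams[scale])
    print(result[2])  # [(0, 0, 0), (1, 1, 1), (2, 0, 0)]
```

The predictor's inputs are restricted to already-decoded data. This is why [0128] requires the encoder-side and decoder-side prediction modules to share parameters: the decoder can then reproduce exactly the probabilities the encoder used.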

[0161] In one example, referring to Figure 6, the local space in the upsampling coordinate generation module is a 2×2×2 neighborhood with eight relative offset vectors: (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0) and (1,1,1).

[0162] In a specific embodiment, the adaptive partition-aware network computes the following at the Nth scale:

[0163] ;

[0164] ;

[0165] ;

[0166] where the quantities are as follows. The first is the logarithmic radial coordinate of the input coordinates at the Nth scale, normalized by division by a constant. The second is the sector index, obtained by floor truncation of the sector azimuth coordinate of the input coordinates at the Nth scale. The remaining quantities are a multilayer perceptron composed of linear layers and the ReLU activation function, the element-wise addition operation, the initial features at the Nth scale, the spatial context features at the Nth scale, the dynamic scaling factor at the Nth scale, the Sigmoid activation function, the modulated point cloud features at the Nth scale, and the feature embedding operation;

[0167] The computation process of the multi-kernel multi-scale feature extraction network is as follows:

[0168] ;

[0169] ;

[0170] ;

[0171] ;

[0172] where the quantities are the first, second and third intermediate features at the Nth scale and the ReLU activation function, together with the following convolutions:

- a 3D convolution with 32 input channels, 32 output channels and kernel size 3;
- a 3D convolution with 32 input channels, 8 output channels and kernel size 1;
- a 3D convolution with 32 input channels, 8 output channels and kernel size 3;
- a 3D convolution with 32 input channels, 16 output channels and kernel size 5;
- a 3D convolution with 32 input channels, 32 output channels and kernel size 1.

The formulas also use the vector concatenation operation and produce the deep features at the Nth scale;

[0173] The soft-gated hybrid expert head includes a gating network and M expert networks;

[0174] The input coordinates at the Nth scale are first passed through a gating network, which outputs the gating weight vectors of each expert network at the Nth scale, as shown in the following equation:

[0175] ;

[0176] where the inputs are the logarithmic radial coordinate, sector azimuth coordinate and height coordinate of the input coordinates at the Nth scale, each normalized by division by a constant. The formula applies the gating multilayer perceptron and the Softmax function. The output is the gating weight vector at the Nth scale, whose i-th element is the gating weight of the i-th expert network at the Nth scale, with i = 1, 2, ..., M and M the total number of expert networks;

[0177] After inputting the deep features at the Nth scale into each expert network, they are weighted and fused with the gating weights of each expert network at the Nth scale to obtain the predicted occupancy probability at the Nth scale, as shown in the following formula:

[0178] ;

[0179] where the quantities are the nonlinear mapping operation of the i-th expert network, the element-wise multiplication operation, and the predicted occupancy probability at the Nth scale.

[0180] Specifically, the constant division normalization process used in the embodiments of this application is shown in the following formula:

[0181] ;

[0182] In one engineering implementation, the constant is set to 4.

[0183] Referring to Figure 7, take the Nth scale of the adaptive partition-aware network as an example. The initial features at the Nth scale are generated first. The multilayer perceptron encodes the logarithmic radial coordinate of the input coordinates at the Nth scale, and the feature embedding operation encodes the sector azimuth coordinate. The dynamic scaling factor at the Nth scale is then computed and combined with the initial features at the Nth scale to obtain the modulated point cloud features at the Nth scale.
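The computation in [0162]-[0166] and Figure 7 can be sketched with NumPy. The three formulas are not reproduced in this text, so the concrete forms below are assumptions:

- the log radius is divided by the constant 4 from [0182];
- the sector index is floor(φ/S) with S = 720 from [0188], wrapped into a hypothetical 16-entry embedding table;
- the context is MLP(ρ̂) + Embed(sector);
- the scaling factor is Sigmoid(context), and modulation is element-wise multiplication.

The weights are random placeholders, not trained parameters, and the class name is hypothetical.

```python
import numpy as np


def sigmoid(t: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-t))


class AdaptivePartitionModulation:
    """Hypothetical sketch of the adaptive partition-aware network for one scale."""

    def __init__(self, channels: int = 32, num_sectors: int = 16,
                 norm_const: float = 4.0, sector_width: int = 720, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.norm_const = norm_const      # constant used for division normalization
        self.sector_width = sector_width  # constant S
        self.num_sectors = num_sectors    # hypothetical size of the sector embedding table
        # Two-layer MLP (Linear -> ReLU -> Linear) on the normalized log radius.
        self.w1 = rng.normal(0.0, 0.1, (1, channels))
        self.b1 = np.zeros(channels)
        self.w2 = rng.normal(0.0, 0.1, (channels, channels))
        self.b2 = np.zeros(channels)
        # Embedding table indexed by sector.
        self.sector_embed = rng.normal(0.0, 0.1, (num_sectors, channels))

    def __call__(self, coords: np.ndarray, feats: np.ndarray) -> np.ndarray:
        """coords: (P, 3) integer (rho, phi, h); feats: (P, C) initial features."""
        rho_hat = coords[:, :1].astype(float) / self.norm_const
        # Floor truncation to a sector index; the modulo wrap is an assumption of this sketch.
        sector = (coords[:, 1] // self.sector_width) % self.num_sectors
        hidden = np.maximum(rho_hat @ self.w1 + self.b1, 0.0)
        context = hidden @ self.w2 + self.b2 + self.sector_embed[sector]  # element-wise addition
        gamma = sigmoid(context)          # dynamic scaling factor in (0, 1)
        return feats * gamma              # assumed multiplicative modulation


if __name__ == "__main__":
    m = AdaptivePartitionModulation()
    out = m(np.array([[0, 0, 0], [40, 1500, 10]]), np.ones((2, 32)))
    print(out.shape, out[:, :3].round(3))
```

Points at different ranges and in different sectors receive different per-channel scales, which is how this sketch lets the network adapt to region-dependent density.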

[0184] Referring to Figure 8, take the Nth scale of the multi-kernel multi-scale feature extraction network as an example. The processing proceeds in four stages:

- The modulated point cloud features at the Nth scale pass through a 3D convolution (32 input channels, 32 output channels, kernel size 3) and a ReLU activation, giving the first intermediate feature.
- The first intermediate feature passes through a second 3D convolution (32 in, 32 out, kernel size 3), a ReLU activation, and a third 3D convolution (32 in, 32 out, kernel size 3). The result is residually connected to the first intermediate feature and passed through another ReLU activation, giving the second intermediate feature.
- The second intermediate feature passes through three 3D convolutions, each with 32 input channels: one with 8 output channels and kernel size 1, one with 8 output channels and kernel size 3, and one with 16 output channels and kernel size 5. Their outputs are concatenated into the 32-channel third intermediate feature.
- The third intermediate feature passes through a 3D convolution (32 in, 32 out, kernel size 1), is residually connected to the second intermediate feature, and passes through a final ReLU activation, giving the deep features at the Nth scale.
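The block in [0167]-[0172] and [0184] can be sketched with NumPy using a naive submanifold-style sparse convolution: outputs exist only at occupied voxels, and empty neighbours are skipped. The source does not say whether it uses submanifold or generalized sparse convolution, or whether the convolutions have biases, so both choices here are assumptions. The three kernel-3 convolutions get separate weights, following the sequence in [0184]. Weights are random placeholders and all names are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple

import numpy as np

Coord = Tuple[int, int, int]
C = 32  # channel width used throughout the block


def sparse_conv(coords: List[Coord], feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Submanifold-style sparse 3D convolution; weight is (k, k, k, C_in, C_out), odd k, no bias."""
    k = weight.shape[0]
    r = k // 2
    index: Dict[Coord, int] = {c: i for i, c in enumerate(coords)}
    out = np.zeros((len(coords), weight.shape[-1]))
    for i, (x, y, z) in enumerate(coords):
        for dx, dy, dz in itertools.product(range(-r, r + 1), repeat=3):
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None:  # empty neighbours contribute nothing
                out[i] += feats[j] @ weight[dx + r, dy + r, dz + r]
    return out


def relu(t: np.ndarray) -> np.ndarray:
    return np.maximum(t, 0.0)


class MultiKernelBlock:
    """Hypothetical sketch of the multi-kernel multi-scale feature extraction network."""

    def __init__(self, seed: int = 0):
        rng = np.random.default_rng(seed)

        def w(k: int, c_out: int) -> np.ndarray:
            return rng.normal(0.0, 0.05, (k, k, k, C, c_out))

        self.conv_a = w(3, C)
        self.conv_b = w(3, C)
        self.conv_c = w(3, C)
        self.branches = [w(1, 8), w(3, 8), w(5, 16)]  # concatenated back to 8 + 8 + 16 = 32
        self.fuse = w(1, C)

    def __call__(self, coords: List[Coord], x: np.ndarray) -> np.ndarray:
        f1 = relu(sparse_conv(coords, x, self.conv_a))              # first intermediate
        t = sparse_conv(coords, relu(sparse_conv(coords, f1, self.conv_b)), self.conv_c)
        f2 = relu(t + f1)                                            # residual -> second
        f3 = np.concatenate([sparse_conv(coords, f2, wb) for wb in self.branches], axis=1)
        return relu(sparse_conv(coords, f3, self.fuse) + f2)        # deep features


if __name__ == "__main__":
    coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)]
    print(MultiKernelBlock()(coords, np.ones((4, C))).shape)  # (4, 32)
```

The kernel-1, kernel-3 and kernel-5 branches see neighbourhoods of different sizes around each occupied voxel. Concatenating them mixes local and wider context at no change in channel width.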

[0185] Referring to Figure 9, the soft-gated hybrid expert head includes a gating network and M expert networks. Take the Nth scale as an example. The deep features at the Nth scale are input into each expert network, and each network outputs its expert logits. In addition, the input coordinates at the Nth scale are fed into the gating network, which outputs the gating weight of each expert network at the Nth scale. Together these form the gating weight vector over all expert networks. The expert logits are weighted by their corresponding gating weights and summed, and the Softmax function normalizes the sum to obtain the predicted occupancy probability at the Nth scale.
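The head in [0173]-[0179] and [0185] can be sketched with NumPy. The sketch mixes expert logits using coordinate-driven gating weights and then normalizes with Softmax. The following are assumptions:

- the gating network and each expert are two-layer perceptrons with random placeholder weights;
- M = 4;
- all three coordinates are divided by the constant 4 from [0182];
- the output is a distribution over the 255 non-empty 8-bit occupancy codes, with code c at index c - 1.

Names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class SoftGatedMoEHead:
    """Hypothetical sketch of the soft-gated hybrid expert head for one scale."""

    def __init__(self, channels: int = 32, num_experts: int = 4, hidden: int = 16,
                 num_symbols: int = 255, norm_const: float = 4.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.norm_const = norm_const
        # Gating MLP: normalized (rho, phi, h) -> one gating logit per expert.
        self.g1 = rng.normal(0.0, 0.1, (3, hidden))
        self.g2 = rng.normal(0.0, 0.1, (hidden, num_experts))
        # Each expert: Linear -> ReLU -> Linear, producing one logit per occupancy symbol.
        self.experts = [(rng.normal(0.0, 0.1, (channels, hidden)),
                         rng.normal(0.0, 0.1, (hidden, num_symbols)))
                        for _ in range(num_experts)]

    def __call__(self, coords: np.ndarray, deep: np.ndarray):
        """coords: (P, 3); deep: (P, C). Returns probs (P, num_symbols) and gate (P, M)."""
        xyz = coords.astype(float) / self.norm_const
        gate = softmax(np.maximum(xyz @ self.g1, 0.0) @ self.g2)               # (P, M)
        logits = np.stack([np.maximum(deep @ w1, 0.0) @ w2 for w1, w2 in self.experts],
                          axis=1)                                               # (P, M, K)
        mixed = (gate[:, :, None] * logits).sum(axis=1)                         # weighted sum
        return softmax(mixed), gate


if __name__ == "__main__":
    head = SoftGatedMoEHead()
    probs, gate = head(np.array([[10, 500, 30], [2, 100, 5]]), np.ones((2, 32)))
    print(probs.shape, gate.round(3))
```

The gate sees only coordinates, so points in different spatial regions can rely on different experts. An arithmetic coder would then read probs[i, code - 1] as the probability of node i's actual occupancy code.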

[0186] Specifically, the floor truncation used in the embodiments of this application is shown in the following formula:

[0187] ;

[0188] The width of each sector is given by the constant S, which is 720 in one engineering implementation.

[0189] Referring further to Figure 10, as an implementation of the methods shown in the above figures, this application provides an embodiment of a partition-aware LiDAR point cloud compression device based on non-uniform spatial quantization. This device embodiment corresponds to the method embodiment shown in Figure 1, and the device can be applied to various electronic devices.

[0190] This application provides a partitioned sensing LiDAR point cloud compression device based on non-uniform spatial quantization, comprising:

[0191] Model building module 1 is configured to build and train a partition-aware LiDAR point cloud compression model to obtain a trained partition-aware LiDAR point cloud compression model. The partition-aware LiDAR point cloud compression model includes an encoder and a decoder. The encoder includes a log-polar coordinate quantization module, a parallel occupancy code generator, a first partition-aware occupancy probability prediction module, and an arithmetic encoder. The decoder includes an arithmetic decoder, a second partition-aware occupancy probability prediction module, a parallel coordinate generator, and a log-polar coordinate inverse quantization module.

[0192] Compression module 2 is configured to acquire the LiDAR point cloud to be compressed and input it into the trained partition-aware LiDAR point cloud compression model. In the encoding stage, the LiDAR point cloud to be compressed is first quantized by the logarithmic polar coordinate quantization module to obtain the quantized LiDAR point cloud. The quantized LiDAR point cloud is then input into the parallel occupancy code generator for layer-by-layer downsampling, and, combined with the first partition-aware occupancy probability prediction module, the first predicted occupancy probability at each scale is obtained. The first predicted occupancy probability distribution at each scale, together with the coordinates and occupancy code information of the lowest scale, is input into the arithmetic encoder for compression to obtain the compressed bitstream. The compressed bitstream is then input to the decoder, where the arithmetic decoder reads the lowest-scale reconstructed coordinates and predicted occupancy code information. Starting from these, the parallel coordinate generator performs layer-by-layer upsampling and, combined with the second partition-aware occupancy probability prediction module, generates reconstructed coordinates for each scale, thus constructing the reconstructed LiDAR point cloud. The highest-scale reconstructed coordinates of each point in the reconstructed LiDAR point cloud are then input to the logarithmic polar coordinate inverse quantization module for inverse quantization, yielding the restored LiDAR point cloud.
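The bitstream size depends directly on the predicted occupancy probabilities handed to the arithmetic coder. A quick way to see this is the ideal (Shannon) code length, minus the sum of log2 p(true code), which a well-implemented arithmetic coder approaches closely. The minimal NumPy sketch below assumes 255 occupancy-code classes (codes 1 to 255) and uses a hypothetical helper `ideal_bits`. It compares a uniform prediction with a sharper one.

```python
import numpy as np

def ideal_bits(prob, codes):
    """prob: (P, 255) predicted distribution over occupancy codes 1..255 for each node.
    codes: (P,) true occupancy codes in 1..255.
    Returns the ideal code length in bits, i.e. -sum(log2 p(true code))."""
    p_true = prob[np.arange(len(codes)), codes - 1]   # probability assigned to each true code
    return float(-np.log2(p_true).sum())

codes = np.array([1, 3, 255, 17])
uniform = np.full((4, 255), 1 / 255)                  # no prior knowledge: 8 bits per node, roughly
peaked = np.full((4, 255), 0.1 / 254)
peaked[np.arange(4), codes - 1] = 0.9                 # a predictor that is confident and correct
print(round(ideal_bits(uniform, codes), 3))           # 31.977 (= 4 * log2(255))
print(ideal_bits(peaked, codes) < ideal_bits(uniform, codes))  # True
```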

[0193] Figure 10 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present invention. As shown in Figure 10, the electronic device of this embodiment includes a processor 1001 and a memory 1002. The memory 1002 is used to store computer-executable instructions, and the processor 1001 is used to execute the computer-executable instructions stored in the memory to implement the steps performed by the electronic device in the above embodiments. For details, refer to the relevant descriptions in the foregoing method embodiments.

[0194] Optionally, the memory 1002 may be standalone or integrated with the processor 1001.

[0195] When the memory 1002 is provided independently, the electronic device further includes a bus 1003 for connecting the memory 1002 and the processor 1001.

[0196] This invention also provides a computer storage medium storing computer-executable instructions which, when executed by the processor 1001, implement the above method.

[0197] This invention also provides a computer program product, including a computer program, which, when executed by a processor 1001, implements the above-described method.

[0198] In the embodiments provided by this invention, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple modules may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or other forms.

[0199] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to implement the solution of this embodiment according to actual needs.

[0200] Furthermore, the functional modules in the various embodiments of this invention can be integrated into one processing unit, or each module can exist physically separately, or two or more modules can be integrated into one unit. The unit formed by the above modules can be implemented in hardware or in the form of hardware plus software functional units.

[0201] The integrated modules implemented as software functional modules described above can be stored in a computer-readable storage medium. These software functional modules, stored in a storage medium, include several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor 1001 to execute certain steps of the methods of the various embodiments of this application.

[0202] It should be understood that the processor 1001 described above may be a Central Processing Unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in this invention may be executed directly by a hardware processor, or by a combination of hardware and software modules within the processor 1001.

[0203] The memory 1002 may include high-speed RAM and may also include non-volatile memory (NVM), such as at least one magnetic disk storage device. It may also be a USB flash drive, a portable hard drive, a read-only memory, a magnetic disk, an optical disc, or the like.

[0204] Bus 1003 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Bus 1003 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus 1003 shown in the accompanying drawings of this application is not limited to a single bus or a single type of bus.

[0205] The aforementioned storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The storage medium can be any available medium accessible to general-purpose or special-purpose computers.

[0206] An exemplary storage medium is coupled to a processor 1001, enabling the processor 1001 to read information from and write information to the storage medium. Alternatively, the storage medium can be an integral part of the processor 1001. The processor 1001 and the storage medium can reside in an application-specific integrated circuit (ASIC). Alternatively, the processor 1001 and the storage medium can exist as discrete components in an electronic device or a main control device.

[0207] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by program instructions controlling the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.

[0208] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A partitioned sensing LiDAR point cloud compression method based on non-uniform spatial quantization, characterized in that it comprises the following steps: A partition-aware LiDAR point cloud compression model is constructed and trained to obtain a trained partition-aware LiDAR point cloud compression model. The partition-aware LiDAR point cloud compression model includes an encoder and a decoder. The encoder includes a log-polar coordinate quantization module, a parallel occupancy code generator, a first partition-aware occupancy probability prediction module, and an arithmetic encoder. The decoder includes an arithmetic decoder, a second partition-aware occupancy probability prediction module, a parallel coordinate generator, and a log-polar coordinate inverse quantization module. The LiDAR point cloud to be compressed is acquired and input into the trained partition-aware LiDAR point cloud compression model. At the encoding end, it is first quantized by the logarithmic polar coordinate quantization module to obtain the quantized LiDAR point cloud. The quantized LiDAR point cloud is then input into the parallel occupancy code generator for layer-by-layer downsampling. Combined with the first partition-aware occupancy probability prediction module, this yields the first predicted occupancy probability of each scale. The first predicted occupancy probability distribution of each scale and the coordinates and occupancy code information of the lowest scale are input into the arithmetic encoder for compression to obtain the compressed bitstream. The compressed bitstream is input to the decoding end, where the arithmetic decoder reads the lowest-scale reconstructed coordinates and predicted occupancy code information. 
Based on the lowest-scale reconstructed coordinates and predicted occupancy code information, the parallel coordinate generator performs layer-by-layer upsampling, and combines it with the second partition-aware occupancy probability prediction module to generate reconstructed coordinates for each scale, thus constructing a reconstructed LiDAR point cloud. The highest-scale reconstructed coordinates of each point in the reconstructed LiDAR point cloud are input to the logarithmic polar coordinate inverse quantization module for inverse quantization processing, resulting in the restored LiDAR point cloud.

2. The partitioned sensing LiDAR point cloud compression method based on non-uniform spatial quantization according to claim 1, characterized in that both the first partition-aware occupancy probability prediction module and the second partition-aware occupancy probability prediction module adopt a partition-aware occupancy probability prediction structure with identical parameters. The partition-aware occupancy probability prediction structure includes an adaptive partition-aware network, a multi-kernel multi-scale feature extraction network, and a soft-gated hybrid expert head. Its calculation process is as follows, where N = 1, 2, ..., n, n is the total number of scales, the nth scale is the highest scale, and the 0th scale is the lowest scale:

1. The input coordinates and input occupancy code information at the (N-1)th scale are embedded and summed to obtain the initial features at the (N-1)th scale.
2. The input coordinates and initial features at the (N-1)th scale are input into the adaptive partition-aware network to obtain the modulated point cloud features at the (N-1)th scale.
3. The modulated point cloud features at the (N-1)th scale are input into the multi-kernel multi-scale feature extraction network to obtain the deep features at the (N-1)th scale, which are used as the initial features at the Nth scale.
4. The input coordinates and initial features at the Nth scale are input into the adaptive partition-aware network to obtain the modulated point cloud features at the Nth scale.
5. The modulated point cloud features at the Nth scale are input into the multi-kernel multi-scale feature extraction network to obtain the deep features at the Nth scale.
6. The input coordinates and deep features at the Nth scale are input into the soft-gated hybrid expert head to obtain the predicted occupancy probability at the Nth scale.

3. The partitioned sensing LiDAR point cloud compression method based on non-uniform spatial quantization according to claim 2, characterized in that inputting the quantized LiDAR point cloud into the parallel occupancy code generator for layer-by-layer downsampling, and combining it with the first partition-aware occupancy probability prediction module to obtain the first predicted occupancy probability at each scale, specifically includes:

S11: inputting the quantized coordinates of each point in the quantized LiDAR point cloud into the parallel occupancy code generator as the highest-scale quantized coordinates to obtain the quantized coordinates of each scale, and generating the occupancy code information of each scale from the quantized coordinates of each scale;

S12: using the quantized coordinates and occupancy code information of the (N-1)th scale and the quantized coordinates of the Nth scale as, respectively, the input coordinates and input occupancy code information of the (N-1)th scale and the input coordinates of the Nth scale, and inputting them into the first partition-aware occupancy probability prediction module to generate the first predicted occupancy probability of the Nth scale;
S13: repeating steps S11-S12 n times to obtain the first predicted occupancy probability of each scale.

Having the parallel coordinate generator perform layer-by-layer upsampling, starting from the lowest-scale reconstructed coordinates and predicted occupancy code information, and combining it with the second partition-aware occupancy probability prediction module to generate the reconstructed coordinates at each scale, specifically includes:

S21: inputting the lowest-scale reconstructed coordinates and predicted occupancy code information into the parallel coordinate generator to obtain the reconstructed coordinates of each scale;

S22: using the reconstructed coordinates and predicted occupancy code information of the (N-1)th scale and the reconstructed coordinates of the Nth scale as, respectively, the input coordinates and input occupancy code information of the (N-1)th scale and the input coordinates of the Nth scale, and inputting them into the second partition-aware occupancy probability prediction module to generate the second predicted occupancy probability of the Nth scale;

S23: inputting the second predicted occupancy probability of the Nth scale and the sub-bitstream of the Nth scale in the compressed bitstream into the arithmetic decoder to obtain the predicted occupancy code information of the Nth scale;

S24: repeating steps S22-S23 n times to obtain the reconstructed coordinates of each scale.

4. The partitioned sensing LiDAR point cloud compression method based on non-uniform spatial quantization according to claim 3, characterized in that, for the Nth scale, the calculation process of the adaptive partition-aware network is given by three formulas whose terms denote:

- the logarithmic radial coordinate in the input coordinates at the Nth scale, normalized by division by a constant;
- the sector azimuth index obtained by rounding down the input coordinates at the Nth scale;
- a multilayer perceptron comprising linear layers and the ReLU activation function;
- the element-wise addition operation;
- the initial features at the Nth scale;
- the spatial context features at the Nth scale;
- the dynamic scaling factor at the Nth scale;
- the Sigmoid activation function;
- the modulated point cloud features at the Nth scale;
- the feature embedding operation.

The computation process of the multi-kernel multi-scale feature extraction network is as follows:

$$F_N^{(1)}=\delta\left(\mathrm{Conv}^{3}_{32\to32}\left(\tilde{F}_N\right)\right);$$

$$F_N^{(2)}=\delta\left(\mathrm{Conv}^{3}_{32\to32}\left(\delta\left(\mathrm{Conv}^{3}_{32\to32}\left(F_N^{(1)}\right)\right)\right)\oplus F_N^{(1)}\right);$$

$$F_N^{(3)}=\mathrm{Concat}\left(\mathrm{Conv}^{1}_{32\to8}\left(F_N^{(2)}\right),\ \mathrm{Conv}^{3}_{32\to8}\left(F_N^{(2)}\right),\ \mathrm{Conv}^{5}_{32\to16}\left(F_N^{(2)}\right)\right);$$

$$D_N=\delta\left(\mathrm{Conv}^{1}_{32\to32}\left(F_N^{(3)}\right)\oplus F_N^{(2)}\right);$$

where:

- $F_N^{(1)}$, $F_N^{(2)}$, and $F_N^{(3)}$ denote the first, second, and third intermediate features at the Nth scale, respectively;
- $\tilde{F}_N$ denotes the modulated point cloud features at the Nth scale;
- $\delta$ denotes the ReLU activation function;
- $\mathrm{Conv}^{k}_{a\to b}$ denotes a 3D convolution with $a$ input channels, $b$ output channels, and kernel size $k$, each occurrence having its own weights;
- $\mathrm{Concat}$ denotes vector concatenation;
- $\oplus$ denotes element-wise addition;
- $D_N$ denotes the deep features at the Nth scale.

The soft-gated hybrid expert head includes a gating network and M expert networks. The input coordinates at the Nth scale are first passed through the gating network, which outputs the gating weight vector of the expert networks at the Nth scale:

$$G_N=\mathrm{Softmax}\left(\mathrm{MLP}_{g}\left(\tilde{r}_N,\ \tilde{a}_N,\ \tilde{h}_N\right)\right)=\left[g_{N,1},\ g_{N,2},\ \dots,\ g_{N,M}\right];$$

where:

- $\tilde{r}_N$, $\tilde{a}_N$, and $\tilde{h}_N$ denote the logarithmic radial, sector azimuth, and height coordinates in the input coordinates at the Nth scale, each normalized by division by a constant;
- $\mathrm{MLP}_{g}$ denotes the gating multilayer perceptron;
- $\mathrm{Softmax}$ denotes the Softmax function;
- $G_N$ denotes the gating weight vector at the Nth scale;
- $g_{N,i}$ denotes the gating weight of the i-th expert network at the Nth scale, with $i = 1, 2, \dots, M$ and $M$ the total number of expert networks.

After the deep features at the Nth scale are input into each expert network, the expert outputs are weighted and fused with the gating weights of the expert networks at the Nth scale to obtain the predicted occupancy probability at the Nth scale:

$$P_N=\mathrm{Softmax}\left(\sum_{i=1}^{M} g_{N,i}\odot E_i\left(D_N\right)\right);$$

where $E_i$ denotes the nonlinear mapping of the i-th expert network, $\odot$ denotes element-wise multiplication, and $P_N$ denotes the predicted occupancy probability at the Nth scale.
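The three formulas of the adaptive partition-aware network are not reproduced in the text, so the sketch below is only one hypothetical reading of the listed terms, not the patented formulation. It makes three assumptions:

1. The spatial context is an MLP of the normalized log-radial coordinate, added element-wise to an embedding of the round-down sector index.
2. The dynamic scaling factor is a Sigmoid of a linear map of that context.
3. The modulation is element-wise scaling of the initial features. Element-wise multiplication is not among the listed terms, so this step is a guess.

The normalizing constant `r_norm`, the `sector_width` divisor, and all parameter shapes are invented for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_partition_modulation(coords, feat, p, r_norm, sector_width):
    """HYPOTHETICAL reading of the adaptive partition-aware network.
    coords: (P, 3) integer (log-radius, azimuth, height) coordinates; feat: (P, C)."""
    r_tilde = coords[:, :1] / r_norm                                # normalised log-radial coordinate
    sector = np.floor(coords[:, 1] / sector_width).astype(int)      # round-down sector index
    ctx = relu(r_tilde @ p["w1"] + p["b1"]) @ p["w2"] + p["emb"][sector]  # spatial context (MLP + embedding)
    gamma = sigmoid(ctx @ p["wg"] + p["bg"])                        # dynamic scaling factor in (0, 1)
    return feat * gamma                                             # assumed element-wise modulation

rng = np.random.default_rng(0)
C, H, n_sectors = 32, 16, 8
params = {"w1": rng.normal(0, 1, (1, H)), "b1": np.zeros(H),
          "w2": rng.normal(0, 0.25, (H, C)), "emb": rng.normal(0, 0.1, (n_sectors, C)),
          "wg": rng.normal(0, 0.2, (C, C)), "bg": np.zeros(C)}
coords = np.stack([rng.integers(0, 400, 10), rng.integers(0, 720, 10),
                   rng.integers(0, 60, 10)], axis=1)               # toy azimuth range 0..719
feat = rng.normal(size=(10, C))
out = adaptive_partition_modulation(coords, feat, params, r_norm=400.0, sector_width=90)
print(out.shape)  # (10, 32)
```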

5. The partitioned sensing LiDAR point cloud compression method based on non-uniform spatial quantization according to claim 1, characterized in that the parallel occupancy code generator includes n downsampling occupancy code generation modules. For the (n-N+1)th downsampling occupancy code generation module, the quantized coordinates of the Nth scale are input into the module, and the quantized coordinates of the (N-1)th scale are first calculated from the quantized coordinates of the Nth scale using the floor function. The occupancy code information of each scale is then generated by a formula whose terms denote:

- the logarithmic radial coordinate, sector azimuth coordinate, and height coordinate in the quantized coordinates of the Nth scale;
- the modulo operation;
- the occupancy code information of the (N-1)th scale, with a value range of 1 to 255;
- the set of child node coordinates corresponding to the quantized coordinates of the (N-1)th scale;
- the feature summation operation within a local space;
- a quantized coordinate of the Nth scale belonging to that set of child node coordinates.

The parallel coordinate generator includes n upsampling coordinate generation modules. For the Nth upsampling coordinate generation module, the reconstructed coordinates and predicted occupancy code information of the (N-1)th scale are input into the module, which outputs the reconstructed coordinates of the Nth scale using three formulas whose terms denote:

- the element-wise addition operation;
- the coordinates of the k-th candidate child node among the 8 candidate child node coordinates of the Nth scale;
- the reconstructed coordinates of the (N-1)th scale;
- the k-th relative offset vector among the eight possible relative offset vectors in the local space;
- the k-th bit of the pruning mask, with value 0 or 1 and k = 1, 2, ..., 8; this bit is 1 if the k-th bit of the predicted occupancy code information of the (N-1)th scale is 1, and 0 otherwise;
- the predicted occupancy code information of the (N-1)th scale;
- the reconstructed coordinates of the Nth scale, namely the candidates with mask 1 that remain after invalid candidates with mask 0 are removed.
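The down- and upsampling steps of claim 5 can be illustrated with a small NumPy sketch. The claim's actual formulas are not reproduced, so the sketch makes two assumptions:

- The factor between scales is 2, octree-style: parent = floor(child / 2) and candidate = 2 × parent + offset.
- The eight children use one particular bit order, with bit index 4·(r mod 2) + 2·(a mod 2) + (h mod 2).

The function names `downsample` and `upsample` are hypothetical. Under these assumptions, upsampling with the true occupancy codes recovers the finer scale exactly. That is why coding the occupancy codes losslessly is enough for the decoder to rebuild every scale.

```python
import numpy as np

# Assumed child ordering: bit k <-> offset (k>>2 & 1, k>>1 & 1, k & 1) in (r, a, h).
OFFSETS = np.array([[(k >> 2) & 1, (k >> 1) & 1, k & 1] for k in range(8)])

def downsample(coords):
    """coords: (P, 3) non-negative int quantized coordinates at scale N (unique rows).
    Returns unique parent coordinates at scale N-1 and their occupancy codes (1..255)."""
    parents = coords // 2                                      # floor(C_N / 2)
    bit = (coords % 2) @ np.array([4, 2, 1])                   # child position within its parent
    uniq, inv = np.unique(parents, axis=0, return_inverse=True)
    codes = np.zeros(len(uniq), dtype=np.int64)
    np.add.at(codes, inv.reshape(-1), 1 << bit)                # sum 2^bit over each parent's children
    return uniq, codes

def upsample(parents, codes):
    """Expand each parent into 8 candidates 2*C + o_k and keep those whose mask bit is 1."""
    cand = 2 * parents[:, None, :] + OFFSETS[None, :, :]       # (P, 8, 3) candidate children
    mask = (codes[:, None] >> np.arange(8)[None, :]) & 1       # (P, 8) k-th bit of each code
    return cand[mask.astype(bool)]                             # prune candidates with mask 0

rng = np.random.default_rng(0)
pts = np.unique(rng.integers(0, 16, size=(200, 3)), axis=0)    # toy finest-scale voxels
parents, codes = downsample(pts)
rec = upsample(parents, codes)
print(set(map(tuple, rec.tolist())) == set(map(tuple, pts.tolist())))  # True
```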

6. The partitioned sensing LiDAR point cloud compression method based on non-uniform spatial quantization according to claim 1, characterized in that the calculation process of the logarithmic polar coordinate quantization module is as follows. First, the Cartesian coordinates (x, y, z) of each point in the LiDAR point cloud are obtained, where x, y, and z are the point's coordinates in the Cartesian coordinate system, and the physical radial distance and azimuth angle of each point are calculated. The physical radial distance, azimuth angle, and z-axis coordinate of each point are then input into the quantization function to obtain the quantized logarithmic radial coordinate, sector azimuth coordinate, and height coordinate of each point, using three formulas whose terms denote:

- the quantized logarithmic radial coordinate;
- the quantized sector azimuth coordinate;
- the quantized height coordinate;
- the set logarithmic scaling factor;
- the set minimum effective distance;
- the set angular resolution;
- the set z-axis height resolution;
- the set z-axis translation, used to prevent negative coordinates;
- the rounding function;
- the maximum operation.

The quantized logarithmic radial, sector azimuth, and height coordinates of each point are concatenated. Duplicate points that fall into the same spatial voxel, that is, points sharing identical quantized coordinates, are removed so that each occupied voxel is retained once. This yields the quantized coordinates of each point, from which the quantized LiDAR point cloud is constructed.
The calculation process of the logarithmic polar coordinate inverse quantization module is as follows. The reconstructed coordinates of each point in the reconstructed LiDAR point cloud at the highest scale are input into the inverse quantization function to obtain the reconstructed physical radial distance, azimuth angle, and Cartesian coordinates of each point at the highest scale, using five formulas whose terms denote:

- the logarithmic radial coordinate, sector azimuth coordinate, and height coordinate of each point in the reconstructed LiDAR point cloud at the highest scale;
- the reconstructed physical radial distance of each point at the highest scale;
- the reconstructed azimuth angle of each point at the highest scale;
- the reconstructed x, y, and z coordinates of each point in the Cartesian coordinate system at the highest scale.

The reconstructed coordinates of all points at the highest scale constitute the restored LiDAR point cloud.
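The quantization and inverse quantization steps of claim 6 can be sketched as below. The exact formulas are not reproduced in the text, so the forms used here are assumptions consistent with the listed terms. Here α is the logarithmic scaling factor, ρ_min the minimum effective distance, Δφ the angular resolution, Δz the height resolution, and z_0 the z-axis translation. The forward quantizer is:

- ρ = sqrt(x² + y²)
- φ = atan2(y, x), wrapped to [0, 2π)
- r = round(α·ln(max(ρ, ρ_min)/ρ_min))
- a = round(φ/Δφ)
- h = round((z + z_0)/Δz)

The inverse functions reverse these steps. The function names and the parameter values in the usage example are hypothetical. Because the radial cell size of a logarithmic axis grows roughly as ρ/α, cells are fine near the sensor and coarse far away. This spacing is what lets the scheme follow the decline of LiDAR point density with range.

```python
import numpy as np

def log_polar_quantize(xyz, alpha, rho_min, d_phi, d_z, z_shift):
    """Assumed forward quantizer: Cartesian (x, y, z) -> unique integer (r, a, h) voxels."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    rho = np.hypot(x, y)                                         # radial distance in the x-y plane
    phi = np.mod(np.arctan2(y, x), 2 * np.pi)                    # azimuth wrapped to [0, 2*pi)
    r = np.round(alpha * np.log(np.maximum(rho, rho_min) / rho_min))  # log-radial coordinate
    a = np.round(phi / d_phi)                                    # sector azimuth coordinate
    h = np.round((z + z_shift) / d_z)                            # height coordinate, shifted to be >= 0
    q = np.stack([r, a, h], axis=1).astype(np.int64)
    return np.unique(q, axis=0)                                  # keep one entry per occupied voxel

def log_polar_dequantize(q, alpha, rho_min, d_phi, d_z, z_shift):
    """Assumed inverse: integer (r, a, h) voxels -> Cartesian (x, y, z)."""
    r, a, h = q.T.astype(float)
    rho = rho_min * np.exp(r / alpha)                            # reconstructed radial distance
    phi = a * d_phi                                              # reconstructed azimuth
    return np.stack([rho * np.cos(phi), rho * np.sin(phi), h * d_z - z_shift], axis=1)

rng = np.random.default_rng(0)
rho = rng.uniform(2.0, 50.0, 500)
phi = rng.uniform(0.0, 2 * np.pi, 500)
pts = np.stack([rho * np.cos(phi), rho * np.sin(phi), rng.uniform(-2.0, 3.0, 500)], axis=1)
cfg = dict(alpha=100.0, rho_min=1.0, d_phi=2 * np.pi / 1800, d_z=0.1, z_shift=5.0)
q = log_polar_quantize(pts, **cfg)
rec = log_polar_dequantize(q, **cfg)
```

With these settings, every original point lies within about 0.35 m of its reconstructed voxel centre. The bound is dominated by the radial term at 50 m, about 50·(e^(0.5/α) − 1).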

7. A partitioned sensing LiDAR point cloud compression device based on non-uniform spatial quantization, characterized in that it comprises: The model building module is configured to build and train a partition-aware LiDAR point cloud compression model to obtain a trained partition-aware LiDAR point cloud compression model. The partition-aware LiDAR point cloud compression model includes an encoder and a decoder. The encoder includes a log-polar coordinate quantization module, a parallel occupancy code generator, a first partition-aware occupancy probability prediction module, and an arithmetic encoder. The decoder includes an arithmetic decoder, a second partition-aware occupancy probability prediction module, a parallel coordinate generator, and a log-polar coordinate inverse quantization module. The compression module is configured to acquire the LiDAR point cloud to be compressed and input it into the trained partition-aware LiDAR point cloud compression model. At the encoding end, the LiDAR point cloud to be compressed is first quantized by the logarithmic polar coordinate quantization module to obtain the quantized LiDAR point cloud. The quantized LiDAR point cloud is then input into the parallel occupancy code generator for layer-by-layer downsampling and combined with the first partition-aware occupancy probability prediction module to obtain the first predicted occupancy probability at each scale. The first predicted occupancy probability distribution at each scale and the coordinates and occupancy code information of the lowest scale are input into the arithmetic encoder for compression to obtain the compressed bitstream. The compressed bitstream is input to the decoding end, where the arithmetic decoder reads the lowest-scale reconstructed coordinates and predicted occupancy code information. 
Based on the lowest-scale reconstructed coordinates and predicted occupancy code information, the parallel coordinate generator performs layer-by-layer upsampling, and combines it with the second partition-aware occupancy probability prediction module to generate reconstructed coordinates for each scale, thus constructing a reconstructed LiDAR point cloud. The highest-scale reconstructed coordinates of each point in the reconstructed LiDAR point cloud are input to the logarithmic polar coordinate inverse quantization module for inverse quantization processing, resulting in the restored LiDAR point cloud.

8. An electronic device, comprising: One or more processors; A storage device for storing one or more programs, characterized in that, when the one or more programs are executed by the one or more processors, the one or more processors implement the method as described in any one of claims 1-6.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the method as described in any one of claims 1-6.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the method as described in any one of claims 1-6.