A method and system for lightweight compression of three-dimensional models based on layered depth images
By employing perception-guided attribute quantization and sparse coding strategies, combined with a controllable fidelity mechanism, efficient and lightweight compression of 3D models is achieved. This solves the problem of balancing visual fidelity, storage overhead, and rendering performance in 3D models, making it suitable for efficient rendering and interaction on various devices.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2026-03-03
- Publication Date
- 2026-06-16
AI Technical Summary
Existing 3D model lightweighting techniques struggle to combine high visual fidelity, extremely low storage overhead, and stable real-time rendering performance, and they lack a systematic mechanism for a controllable trade-off between fidelity and data volume.
By employing a perception-oriented attribute quantization model, a structure-aware sparse coding strategy, and an end-to-end controllable fidelity mechanism, lightweight compression of 3D models is achieved through multi-level collaborative compression and visual perception reconstruction.
It achieves nearly 30 times single-view data compression and an overall compression ratio of 50:1 to 100:1 for the entire model under visual high fidelity (SSIM>0.99), supports high-quality visualization and key interactive operations, and is suitable for efficient rendering on ordinary PCs, web browsers and mobile devices.
Figure CN121767570B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of computer graphics and 3D model data processing, and more particularly to a lightweight compression method and system for 3D models based on layered depth images. This invention also relates to electronic devices and storage media for implementing the above method.
Background Technology
[0002] The deep integration of Model-Based Systems Engineering (MBSE) and digital twin technology has placed extreme demands on the lightweight visualization and collaboration of large-scale, high-precision 3D CAD models throughout their entire lifecycle. Currently, in fields such as industrial equipment and smart cities, the size of models often reaches several gigabytes or even tens of gigabytes, posing significant challenges to their network transmission, terminal loading, and real-time rendering.
[0003] Existing lightweight 3D model technologies mainly include three technical approaches, but each has its inherent drawbacks:
[0004] 1. Geometric simplification methods: These reduce the face count through mesh decimation, edge collapse, and similar operations. Although this directly reduces model size, the simplification is lossy. At high compression ratios it inevitably loses geometric features and blurs sharp edges, and it cannot preserve the exact boundary representation (B-Rep) semantics of the original CAD model, which contradicts the accuracy and interactivity requirements of digital twins.
[0005] 2. Deep learning-based point cloud or mesh processing methods: These include point cloud quantization, neural network autoencoder compression, and implicit feature representation. These methods can learn latent distribution patterns from data, achieving high compression ratios. However, they are still in the research and exploration stage and have inherent drawbacks such as insufficient interpretability, the need for large amounts of homogeneous data for training, and limited generalization ability to complex engineering CAD models. More importantly, this type of "black box" compression often struggles to maintain and recover the accurate assembly structure, part boundaries, and engineering semantic information of the CAD model, failing to meet the rigid requirements of model localization, querying, analysis, and collaboration in digital twins.
[0006] 3. Level of Detail (LOD) method: This method pre-generates multiple simplified versions and dynamically switches between them based on the viewpoint. However, it requires storing multiple copies of the model data, failing to effectively reduce overall storage overhead. Switching between different levels may result in visual abrupt changes, and each level still uses traditional mesh representation, thus failing to fundamentally solve the storage and transmission problems of large-scale complex models.
[0007] It is worth noting that Image-Based Rendering (IBR) and its important branch—Layered Depth Image (LDI) technology—offer new solutions to the aforementioned challenges. This method transforms 3D geometry into a series of multi-layered 2D images with depth information through pre-rendering. Its core advantage lies in changing the rendering complexity from being related to the number of model faces to being related only to the output image resolution, thus making it possible to achieve stable real-time rendering of large-scale models.
[0008] However, the inventors of this application, through in-depth research, discovered that existing LDI technology, when dealing with the high demands of digital twin scenarios, reveals a fundamental contradiction that is difficult to reconcile with existing solutions: high visual fidelity, extremely low storage / transmission overhead, and stable real-time rendering performance are difficult to achieve simultaneously. Specifically, this manifests as follows:
[0009] 1. In pursuit of visual losslessness, high-precision (such as full-precision floating point) storage attributes are required, and a large number of dense viewpoints are set, resulting in huge data volume for a single viewpoint (such as over 40MB at 512x512 resolution), and multi-view data packets can reach GB level, completely losing the core value of "lightweight".
[0010] 2. Existing compression methods are mostly designed for general images or videos. Directly applying them to LDI will destroy their inherent three-dimensional geometric coherence, interlayer depth structure and part-level semantic information. As a result, the compressed data cannot support accurate reconstruction, lighting calculation and object-level interaction, and it is difficult to meet the analysis needs of digital twins.
[0011] 3. There is a lack of systematic mechanisms that allow users to make quantitative and controllable trade-offs between fidelity and data volume based on actual application scenarios (such as mobile preview, web collaboration, and high-fidelity review). Parameter configuration relies on expert experience, and the process is unpredictable.
[0012] Therefore, a completely new solution is urgently needed to fundamentally resolve the aforementioned contradictions. This solution should achieve a balance between high-fidelity visual detail, extremely low storage overhead, and stable rendering performance, and construct a complete technical system encompassing efficient data representation, intelligent compression encoding, and real-time high-quality rendering.
Summary of the Invention
[0013] To overcome the aforementioned problems, this invention proposes a lightweight compression method and system for 3D models based on layered depth images. Its core lies in proposing a multi-level collaborative compression and visual perception reconstruction system, comprising:
[0014] 1. A perception-oriented attribute quantization model that quantizes geometric, normal, and color data differently according to their visual importance.
[0015] 2. A structure-aware sparse coding strategy that utilizes the two-dimensional spatial coherence and three-dimensional interlayer correlation of CAD model projection.
[0016] 3. An end-to-end controllable fidelity mechanism that lets users drive the entire preprocessing pipeline with a single parameter (such as the target data volume or fidelity score), automatically optimizing the number of viewpoints, quantization precision, and so on, to produce a continuous spectrum of outputs from extreme compression to visually lossless.
[0017] Therefore, the first objective of this invention is to provide a lightweight compression method for 3D models based on layered depth images, which includes the following steps:
[0018] The 3D CAD model is pre-rendered from a selected key viewpoint into layered depth image data, which includes a layered pixel index table and a layered pixel storage area.
[0019] Convert the pixel data in the layered pixel storage area into compact format pixel data;
[0020] The index data in the layered pixel index table is optimized into a sparse index list;
[0021] The compact format pixel data and the sparse index list are compressed, encoded, and encapsulated to generate a lightweight layered depth image file.
[0022] Preferably, the step of pre-rendering the 3D CAD model from a selected key viewpoint into layered depth image data includes:
[0023] Calculate the axis-aligned bounding box of the 3D CAD model to obtain the model space reference;
[0024] Based on the model space benchmark and the preset set of viewpoints, N key viewpoints are determined according to the uniform distribution selection algorithm.
[0025] Perform depth peeling rendering on each of the key viewpoints to obtain multi-layer surface point fragments under that viewpoint;
[0026] Based on all the multi-layer surface point fragments, construct layered depth image data including the layered pixel index table and the layered pixel storage area.
[0027] Preferably, the step of converting the pixel data in the layered pixel storage area into a compact format includes:
[0028] After normalizing the world coordinates of the pixel data to the axis-aligned bounding box range, it is converted into a half-precision floating-point number;
[0029] The normal vector of the pixel data is compressed into a 32-bit integer by octahedral mapping encoding;
[0030] The color values of the pixel data are compressed into 16-bit integers using RGB565 encoding;
[0031] The depth value of the pixel data is converted into a half-precision floating-point number, the texture coordinates are quantized into 16-bit integers, and the part number is converted into a 16-bit integer.
[0032] All fields that have undergone the above transformations are assembled into compact pixel data with a single pixel size of no more than 24 bytes.
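The field conversions listed above can be sketched as a fixed-size binary record. This is a minimal illustration, not the patented layout: the field order, the little-endian byte order, the unsigned 16-bit texture-coordinate quantization, and the names `PIXEL_FORMAT`, `rgb565`, `quantize_unorm16`, and `pack_pixel` are all assumptions. The normal is passed in already encoded as a 32-bit integer, and the depth is assumed to be pre-normalized so that it fits in a half-precision float.

```python
import struct

# Assumed layout: 3 x float16 position, uint32 normal, uint16 RGB565 color,
# float16 depth, 2 x uint16 texture coordinates, uint16 part ID.
PIXEL_FORMAT = "<3eIHe2HH"                   # "<" = little-endian, no padding
PIXEL_SIZE = struct.calcsize(PIXEL_FORMAT)   # 20 bytes, within the 24-byte limit


def rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B channels into one 16-bit RGB565 value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def quantize_unorm16(x: float) -> int:
    """Quantize a value in [0, 1] to an unsigned 16-bit integer."""
    x = min(max(x, 0.0), 1.0)
    return int(round(x * 65535))


def pack_pixel(position, aabb_min, aabb_max, normal_oct32, rgb, depth, uv, part_id):
    """Pack one layered pixel into the assumed compact record."""
    # Normalize world coordinates to the AABB range before the float16 cast.
    norm_pos = [
        (p - lo) / (hi - lo) if hi > lo else 0.0
        for p, lo, hi in zip(position, aabb_min, aabb_max)
    ]
    return struct.pack(
        PIXEL_FORMAT,
        *norm_pos,
        normal_oct32,
        rgb565(*rgb),
        depth,
        quantize_unorm16(uv[0]),
        quantize_unorm16(uv[1]),
        part_id,
    )


record = pack_pixel(
    (5.0, 0.0, 10.0), (0.0, 0.0, 0.0), (10.0, 10.0, 10.0),
    0x12345678, (255, 128, 0), 0.5, (0.25, 1.0), 42,
)
print(len(record))  # 20
```

Under these assumptions the record comes to 20 bytes per pixel, which stays inside the 24-byte bound stated above; a different field order or alignment would change the exact size.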
[0033] Preferably, optimizing the index data in the layered pixel index table into a sparse index list includes:
[0034] Traverse the layered pixel index table and identify non-blank entries to obtain a list of non-blank pixel positions;
[0035] Based on the non-blank pixels in the non-blank pixel position list, a sparse index structure is created for each non-blank pixel to obtain a set of sparse index entries.
[0036] Organize the set of sparse index entries into a sparse index list.
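The three steps above can be sketched as follows. The dense table is modeled as one `(count, offset)` pair per pixel, matching the two fields of the layered pixel index table; the contents of a sparse index entry (linear pixel position, layer count, storage offset) and the names `SparseIndexEntry` and `build_sparse_index` are assumptions made for illustration.

```python
from typing import NamedTuple


class SparseIndexEntry(NamedTuple):
    pixel: int    # linear pixel position, y * width + x (assumed convention)
    count: int    # number of layered pixels stored for this position
    offset: int   # index of the first layered pixel in the storage area


def build_sparse_index(index_table):
    """Keep only entries whose layer count is non-zero (non-blank pixels)."""
    return [
        SparseIndexEntry(pos, count, offset)
        for pos, (count, offset) in enumerate(index_table)
        if count > 0
    ]


# A 4 x 2 image in which only three pixel positions hold any surface points.
dense = [(0, 0), (2, 0), (0, 0), (0, 0),
         (1, 2), (0, 0), (0, 0), (3, 3)]
sparse = build_sparse_index(dense)
print(sparse)
```

In this toy case eight dense entries shrink to three. The saving grows with the share of blank background pixels in the view.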
[0037] Preferably, the step of compactly encoding and encapsulating the compact format pixel data and the sparse index list to generate a lightweight layered depth image file includes:
[0038] The compact format pixel data is compressed and encoded to obtain compressed pixel data blocks;
[0039] The sparse index list is compacted and encoded to obtain compressed index data blocks;
[0040] The compressed pixel data block, the compressed index data block, and the metadata describing the layered depth image data are packaged in a predetermined format to generate the final lightweight layered depth image file.
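A minimal sketch of the compress-and-package step, using Python's `zlib` module (a DEFLATE implementation) for the lossless stage. The container layout shown here (a magic tag, three length fields, JSON metadata, then the two compressed blocks) is hypothetical; the source only says that the blocks and metadata are packaged "in a predetermined format". The names `MAGIC`, `HEADER`, `pack_ldi`, and `unpack_ldi` are illustrative.

```python
import json
import struct
import zlib

MAGIC = b"LLDI"      # hypothetical file tag
HEADER = "<4sIII"    # magic, metadata length, pixel block length, index block length


def pack_ldi(pixel_data: bytes, index_data: bytes, metadata: dict) -> bytes:
    """Compress both data blocks and wrap them with metadata in one blob."""
    meta = json.dumps(metadata, sort_keys=True).encode("utf-8")
    pixel_block = zlib.compress(pixel_data, 9)
    index_block = zlib.compress(index_data, 9)
    header = struct.pack(HEADER, MAGIC, len(meta), len(pixel_block), len(index_block))
    return header + meta + pixel_block + index_block


def unpack_ldi(blob: bytes):
    """Inverse of pack_ldi: returns (pixel_data, index_data, metadata)."""
    magic, meta_len, pixel_len, index_len = struct.unpack_from(HEADER, blob, 0)
    if magic != MAGIC:
        raise ValueError("not a lightweight LDI blob")
    pos = struct.calcsize(HEADER)
    metadata = json.loads(blob[pos:pos + meta_len].decode("utf-8"))
    pos += meta_len
    pixel_data = zlib.decompress(blob[pos:pos + pixel_len])
    pos += pixel_len
    index_data = zlib.decompress(blob[pos:pos + index_len])
    return pixel_data, index_data, metadata


pixels = bytes(20) * 1000   # highly repetitive stand-in for compact pixel data
index = bytes(range(12)) * 50
blob = pack_ldi(pixels, index, {"width": 512, "height": 512, "viewpoint": 0})
print(len(pixels) + len(index), "->", len(blob))
```

Because this stage is lossless, `unpack_ldi(blob)` returns the exact bytes that were packed; all precision loss comes from the earlier quantization step.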
[0041] Preferably, after compressing and encapsulating the compact format pixel data and the sparse index list to generate a lightweight layered depth image file, the method further includes decompressing and visualizing the lightweight layered depth image file, generating a visualization model, and performing fidelity verification, including:
[0042] Load and decompress the lightweight layered depth image file to obtain decompressed compact format pixel data and decompressed sparse index data;
[0043] The decompressed compact format pixel data and the decompressed sparse index data are uploaded to the GPU buffer;
[0044] When the target viewpoint coincides with the key viewpoint, the GPU buffer data is decoded and rendered in the shader to generate a rendered image.
[0045] When the target viewpoint is located between key viewpoints, viewpoint interpolation is performed based on the rendering results of adjacent key viewpoints to generate a rendered image of the target viewpoint.
[0046] The original model is compared with the rendered image to generate a fidelity evaluation report.
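The final comparison step can be illustrated with a simplified fidelity check. The sketch below computes a global, single-window SSIM over two grayscale images given as flat lists of pixel values. Standard SSIM averages the same expression over local windows, so this is a coarse approximation and not necessarily the metric the inventors compute. The 0.99 threshold is taken from the stated fidelity target (SSIM>0.99); the report fields and the names `global_ssim` and `fidelity_report` are assumptions.

```python
def global_ssim(reference, rendered, dynamic_range=255.0):
    """Single-window SSIM over two equally sized grayscale pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("images must be non-empty and the same size")
    n = len(reference)
    mu_x = sum(reference) / n
    mu_y = sum(rendered) / n
    var_x = sum((x - mu_x) ** 2 for x in reference) / n
    var_y = sum((y - mu_y) ** 2 for y in rendered) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(reference, rendered)) / n
    c1 = (0.01 * dynamic_range) ** 2   # usual SSIM stabilizing constants
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def fidelity_report(reference, rendered, threshold=0.99):
    """Assumed report shape: SSIM, worst per-pixel error, pass/fail flag."""
    ssim = global_ssim(reference, rendered)
    return {
        "ssim": ssim,
        "max_abs_diff": max(abs(x - y) for x, y in zip(reference, rendered)),
        "passed": ssim > threshold,
    }


original = [10, 50, 90, 130]
print(fidelity_report(original, original))
print(fidelity_report([0, 255, 0, 255], [255, 0, 255, 0]))
```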
[0047] The second objective of this invention is to provide a lightweight compression system for 3D models based on layered depth images, implementing the steps of any of the aforementioned lightweight compression methods for 3D models based on layered depth images, including:
[0048] The pre-rendering module is used to convert a 3D CAD model into layered depth image data from multiple key perspectives. The layered depth image data includes a layered pixel index table and a layered pixel storage area.
[0049] A format conversion module is used to convert pixel data in the layered pixel storage area into a compact format;
[0050] A sparse storage module is used to optimize the index data in the layered pixel index table into a sparse index list.
[0051] The compression encoding module is used to compress and encode the compact format pixel data and the sparse index list, and package them to generate a lightweight layered depth image file.
[0052] Preferably, a lightweight compression system for 3D models based on layered depth images further includes a rendering and verification module, wherein the rendering and verification module includes:
[0053] The file loading and decompression unit is used to load and decompress the lightweight layered depth image file to recover compact format pixel data and sparse index list.
[0054] The GPU resource management unit is used to upload the decompressed data to the video memory and decode and render it through the shader program;
[0055] A viewpoint interpolation unit is used to interpolate images between key viewpoints to generate smooth viewpoint transitions.
[0056] The fidelity evaluation unit is used to calculate fidelity metrics by comparing the rendered images of the original model and the lightweight model.
[0057] A third objective of this invention is to provide an electronic device comprising:
[0058] Memory, used to store computer programs;
[0059] A processor is used to execute a program stored in memory to implement any of the steps of a lightweight compression method for 3D models based on layered depth images.
[0060] The fourth objective of this invention is to provide a computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, which, when executed by a processor, implements the steps of any of the above-mentioned lightweight compression methods for three-dimensional models based on layered depth images.
[0061] The beneficial effects of this invention are as follows:
[0062] 1. Breakthrough quality-efficiency ratio: Through a three-level collaborative strategy of "perceptual compression + sparse storage + entropy encoding", it achieves a significant effect of nearly 30 times compression of single-view data and an overall compression ratio of 50:1 to 100:1 for the whole model while achieving visual high fidelity (SSIM>0.99), providing an efficient solution for the network distribution and real-time loading of large-scale CAD models.
[0063] 2. Advanced and Controllable Trade-off Mechanism: By introducing a preprocessing pipeline based on target parameters (data volume or fidelity), users can intuitively quantify the trade-off between fidelity and file size. The system can then automatically recommend near-optimal compression configurations, such as the number of key perspectives, transforming the lightweighting process from an experience-dependent "black box" operation to a transparent and controllable "white box" configuration, significantly improving usability and predictability of results.
[0064] 3. Preservation of complete semantic and interactive capabilities: By innovatively preserving part-level identifiers (PartIDs) in a compact data format, the lightweight model can not only be used for high-quality visualization, but also support key interactive operations such as part picking, highlighting, isolated display, and attribute querying. This overcomes the fundamental defect of traditional image compression methods that lose the engineering semantic information of 3D models, truly meeting the needs of digital twin applications for in-depth model analysis and collaboration.
[0065] 4. Fundamental Improvement in Rendering Performance: Based on the rendering principle of layered depth images, the rendering complexity is fundamentally changed from being related to the number of faces in the original model (O(face count)) to being related only to the output image resolution (O(width × height)). Therefore, no matter how complex the original CAD model is, the client rendering performance remains stable and efficient, making it particularly suitable for smoothly browsing ultra-large-scale models on ordinary PCs, web browsers, and mobile devices.
[0066] 5. Forming a feedback-enabled optimization loop: By moving fidelity assessment from the final verification stage into the processing flow, a feedback-enabled closed loop of "compression-assessment-optimization" is constructed. The system can identify visually weak areas based on quantitative assessment results (such as difference heatmaps), providing a basis for targeted resampling or parameter adjustments, thus enabling the lightweight effect to have the potential for continuous optimization and self-improvement.
Description of the Drawings
[0067] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
[0068] Figure 1 is a flowchart of a lightweight compression method for 3D models based on layered depth images according to an embodiment of the present invention.
[0069] Figure 2 is a schematic diagram of the process of pre-rendering a 3D CAD model from selected key viewpoints into layered depth image data in an embodiment of the present invention.
[0070] Figure 3 is a schematic diagram of the process of converting pixel data in the layered pixel storage area into compact format pixel data in an embodiment of the present invention.
[0071] Figure 4 is a schematic diagram of the process of optimizing the index data in the layered pixel index table into a sparse index list in an embodiment of the present invention.
[0072] Figure 5 is a schematic diagram of the process of compressing and encapsulating the compact format pixel data and the sparse index list to generate a lightweight layered depth image file in an embodiment of the present invention.
[0073] Figure 6 is a schematic diagram of the process of decompressing and visualizing the lightweight layered depth image file, generating a visualization model, and verifying its fidelity in an embodiment of the present invention.
[0074] The accompanying drawings illustrate specific embodiments of the invention, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the invention in any way, but rather to illustrate the concept of the invention to those skilled in the art through reference to particular embodiments.
Detailed Description
[0075] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments of the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, rather than all of the embodiments. All other embodiments obtained by those of ordinary skill in the art based on the described embodiments of the present invention without creative efforts fall within the scope of protection of the present invention. Unless otherwise defined, the technical terms or scientific terms used herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which the present invention pertains.
[0076] Term Explanation:
[0077] Uniform Distribution Selection Algorithm: The uniform distribution selection algorithm is a mathematical method in the present invention for intelligently selecting a smaller set of key perspectives with as uniform an observation coverage as possible from a predefined set of candidate perspectives.
[0078] In the lightweight compression based on Layered Depth Images (LDI), to balance the data volume and visual fidelity, it is not necessary to store all possible perspectives. The purpose of this algorithm is to achieve the most balanced sampling coverage of the entire 3D model appearance with the minimum number of perspectives, so as to maximize the reduction of the number of LDI files required to be generated and stored while ensuring a good approximation effect when observing the model from any direction. This is the key first step to achieve overall lightweighting.
[0079] Let M be the number of preset viewpoints that together cover the model's possible viewing directions (for example, 36 viewpoints uniformly distributed around the model), and let N be the desired number of key viewpoints set by the user according to the target data volume or fidelity requirement (N < M, for example N = 7). The uniform distribution selection algorithm determines the index Index_i of the i-th key viewpoint in the preset set as follows: first, calculate the ideal position (i * M) / N of each selected viewpoint in the preset sequence; then map this position to an existing preset viewpoint index by rounding (usually with the floor function); finally, apply a modulo operation (% M) to ensure that the index falls within the valid range [0, M - 1].
[0080] The calculation formula can be expressed as: Index_i = floor((i * M) / N) % M, where i is an integer in the range [0, N - 1] denoting the i-th selected key viewpoint.
[0081] This algorithm ensures that the selected N key viewpoints are approximately equally spaced within a preset sequence of M viewpoints. For example, when M=36 and N=7, the algorithm automatically selects viewpoints with indices 0, 5, 10, 15, 20, 25, and 30, which are essentially evenly distributed within a 360-degree surround view. This uniformity minimizes blind spots and guarantees the acquisition of the most representative set of model appearance information with the least amount of data (N LDI files). This provides a quantifiable and controllable viewpoint configuration scheme for achieving a significant improvement in lightweight design with only a slight decrease in accuracy.
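The selection formula translates directly into code. The sketch below is a straightforward transcription of Index_i = floor((i * M) / N) % M; the function name `select_key_viewpoints` is illustrative, and integer division is used so that the floor is exact.

```python
def select_key_viewpoints(m: int, n: int) -> list[int]:
    """Return the indices of n key viewpoints chosen from m preset viewpoints."""
    if not 0 < n <= m:
        raise ValueError("expected 0 < n <= m")
    # i * m // n is floor((i * m) / n) computed without floating-point rounding.
    return [(i * m // n) % m for i in range(n)]


print(select_key_viewpoints(36, 7))  # [0, 5, 10, 15, 20, 25, 30]
```

With M=36 and N=7 the gaps between consecutive selected indices are all 5, except for the wrap-around gap from index 30 back to index 0, which is 6. This is why the distribution is approximately rather than exactly uniform when N does not divide M.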
[0082] Layered Pixel Index Table: The layered pixel index table is the core organization and navigation component in the layered depth image (LDI) data structure described in this invention. It is a fixed-length, indexed data structure used to efficiently manage and access irregular multi-layered pixel data stored in the "layered pixel storage area".
[0083] Its core objective is to establish, for each pixel location on the image plane, a fast index pointing to the corresponding data in the layered pixel storage area, and to record the actual number of effective data layers stored at each pixel location. It is an array whose size strictly corresponds to the image resolution, with each entry containing two fields: "number of layered pixels" and "layered pixel index". In this invention, by optimizing the table for sparsity, the indexing overhead of blank pixels can be further eliminated, significantly improving storage efficiency.
[0084] Axis-Aligned Bounding Box (AABB): A rectangular bounding box (cuboid) whose edges are all parallel to the coordinate axes (such as the X, Y, and Z axes of the world coordinate system). It is the smallest such box containing all vertices of a 3D model, defined by the model's minimum and maximum values along the three coordinate axes and commonly represented by the minimum point (min_point) and the maximum point (max_point). In this invention, it is used to establish the spatial reference of the model, perform coordinate normalization, and construct the observation sphere.
[0085] Octahedral mapping encoding: an efficient encoding method for compressing 3D unit normal vectors. It utilizes the property that the unit normal vector lies on a unit sphere, projecting it onto an octahedral surface, then unfolding and quantizing this 2D projection into integers. During decoding, the original normal vector direction can be recovered with high precision through an inverse transform. In this invention, it is used to compress normal vector data from 12 bytes to 4 bytes.
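As an illustration of the 12-byte to 4-byte reduction, the sketch below implements the commonly used octahedral mapping and stores the two projected coordinates as 16 bits each in one 32-bit integer. The 16/16 bit split, the quantization scheme, and the names `oct_encode` and `oct_decode` are assumptions; the source states only the 32-bit total.

```python
import math


def _sign(value: float) -> float:
    """Sign that treats zero as positive, as the octahedral fold requires."""
    return math.copysign(1.0, value)


def oct_encode(normal) -> int:
    """Encode a unit normal (x, y, z) as a 32-bit integer."""
    x, y, z = normal
    s = abs(x) + abs(y) + abs(z)        # project onto the octahedron |x|+|y|+|z|=1
    u, v = x / s, y / s
    if z < 0.0:                         # fold the lower hemisphere outward
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    qu = int(round((u * 0.5 + 0.5) * 65535))
    qv = int(round((v * 0.5 + 0.5) * 65535))
    return (qu << 16) | qv


def oct_decode(code: int):
    """Recover an approximate unit normal from the 32-bit code."""
    u = ((code >> 16) & 0xFFFF) / 65535 * 2.0 - 1.0
    v = (code & 0xFFFF) / 65535 * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:                         # undo the fold
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)


n = (1 / 3, 2 / 3, -2 / 3)
print(oct_decode(oct_encode(n)))
```

The decoded direction matches the input only up to quantization error, which is the sense in which the normal is recovered "with high precision".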
[0086] Compact coding: refers to a class of general algorithms that utilize data redundancy for lossless compression, such as the DEFLATE algorithm (which combines LZ77 and Huffman coding). It reduces storage size by identifying and eliminating repetitive patterns in the data. In this invention, it is used to perform final compression on optimized geometric and index data to generate lightweight files.
[0087] Example 1:
[0088] To better understand the concept behind this invention's lightweight compression method for 3D models based on layered depth images, a specific example—the lightweight compression and visualization of a Boeing 777 aircraft model—will be used to illustrate each step of the method and its technical effects, providing a detailed explanation of the entire process. The Boeing 777 aircraft model, containing over 14,000 parts and with an original CAD file size of 587MB, is the object of this processing.
[0089] This invention provides a lightweight compression method for 3D models based on layered depth images. As shown in Figure 1, it includes the following steps:
[0090] S1, pre-render the 3D CAD model from the selected key viewpoint into layered depth image data, the layered depth image data including a layered pixel index table and a layered pixel storage area.
[0091] The storage cost of a layered depth image (LDI) representation depends primarily on image resolution rather than on the number of geometric faces in the original 3D model, so computational and storage requirements no longer scale directly with model complexity.
[0092] The described layered depth image (LDI)-based storage method first inputs the original 3D CAD model. Then, using a rendering engine, it transforms the complex geometry into a graphical representation with depth information and semantic hierarchy, starting from a series of selected key viewpoints. Finally, it outputs a set of layered depth image (LDI) data corresponding to different viewpoints and structural specifications. Each LDI data set contains two core parts: a layered pixel index table for efficient retrieval, and a layered pixel storage area that centrally stores the attribute data of all surface points.
[0093] As shown in Figure 1, this step can be further broken down into the following sub-steps:
[0094] S11, Calculate the axis-aligned bounding box of the three-dimensional CAD model to obtain the model space reference.
[0095] Read the 3D CAD model and establish the spatial reference system required for subsequent processing. First, input a 3D CAD model file in an industry-standard format, such as STEP, IGES, 3DXML, or JT.
[0096] Next, its geometric and topological data are parsed, and the axis-aligned bounding box of the 3D CAD model is calculated. Finally, the model space reference is obtained, namely the minimum vertex min_point and the maximum vertex max_point of the bounding box.
[0097] In this embodiment: the input model file is a Boeing 777 aircraft model file, which contains boundary representation (B-Rep) geometric data, assembly tree structure, and part identifiers. The system parses the Boeing 777 aircraft model file and converts it into an internal triangular mesh representation.
[0098] Subsequently, all mesh vertices are traversed, and their minimum and maximum values in the X, Y, and Z dimensions in the world coordinate system are calculated to obtain the bounding box parameters: min_point=(x_min,y_min,z_min), max_point=(x_max,y_max,z_max). This axis-aligned bounding box will be used for coordinate normalization in S21.
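The bounding box computation in S11 amounts to a per-axis minimum and maximum over all mesh vertices. A minimal sketch follows, with vertices given as `(x, y, z)` tuples; the helper names `compute_aabb` and `aabb_center_and_diagonal` are illustrative. The center and diagonal length are included because the later viewpoint setup refers to them.

```python
import math


def compute_aabb(vertices):
    """Return (min_point, max_point) of the axis-aligned bounding box."""
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def aabb_center_and_diagonal(min_point, max_point):
    """Return the box center and the length of its space diagonal."""
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(min_point, max_point))
    diagonal = math.dist(min_point, max_point)
    return center, diagonal


vertices = [(0.0, 0.0, 0.0), (2.0, 4.0, 4.0), (1.0, -1.0, 3.0)]
min_point, max_point = compute_aabb(vertices)
print(min_point, max_point)
print(aabb_center_and_diagonal(min_point, max_point))
```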
[0099] S12, Based on the model space reference and the preset view set, N key view points are determined according to the uniform distribution selection algorithm.
[0100] Based on the model space reference obtained in step S11, a set of virtual camera parameters that are uniformly distributed in space and can completely cover the appearance of the model are generated.
[0101] First, input the model space reference output in step S11, which is the axis-aligned bounding box parameter of the 3D CAD model, including the minimum vertex min_point and the maximum vertex max_point, as well as the desired number of key viewpoints N.
[0102] Next, on a virtual sphere that completely surrounds the model, N key observation points are selected from a set of M observation points pre-generated using a uniform latitude and longitude division method, based on a uniform distribution selection algorithm. The calculation formula for the algorithm is as follows:
[0103] Index_i = floor((i*M) / N)%M
[0104] Where i is an integer from 0 to N-1, and floor is a floor function. Finally, the key viewpoint parameters of the camera for each key viewpoint are obtained. The key viewpoint parameters include the camera position calculated based on the observation sphere, the camera orientation pointing to the center of the model bounding box, and the set up direction vector (usually the Y-axis of the world coordinate system).
[0105] In this embodiment: a virtual observation sphere is constructed with the center of the model bounding box calculated in step S11 as the center and a radius of 1.5 times the diagonal length of the model bounding box. On the virtual observation sphere, M observation points are pre-generated using a uniform latitude and longitude division method. The number of observation points can be 36 (i.e., one observation point is set every 10 degrees of longitude). The desired number of key viewpoints N is set, which can be 7. M=36 and N=7 are substituted into the above formula for calculation:
[0106] When i=0, Index_0=floor((0*36) / 7)%36=0
[0107] When i=1, Index_1=floor((1*36) / 7)%36=floor(5.14)%36=5
[0108] When i=2, Index_2=floor((2*36) / 7)%36=floor(10.29)%36=10
[0109] When i=3, Index_3=floor((3*36) / 7)%36=floor(15.43)%36=15
[0110] When i=4, Index_4=floor((4*36) / 7)%36=floor(20.57)%36=20
[0111] When i=5, Index_5=floor((5*36) / 7)%36=floor(25.71)%36=25
[0112] When i=6, Index_6=floor((6*36) / 7)%36=floor(30.86)%36=30
[0113] Calculate sequentially to obtain the selected viewpoint index set as {0,5,10,15,20,25,30}.
[0114] For each selected index, the key viewpoint parameters are set as follows: the camera position is the corresponding point on the observation sphere, the camera orientation always points to the center of the model bounding box (the model center), and the up direction is uniformly set to the positive Y-axis of the world coordinate system.
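The camera setup for a selected index can be sketched as below. Several details are assumptions, because the text does not fix them: the M preset viewpoints are taken to lie on a single ring at latitude 0 (consistent with one viewpoint every 10 degrees of longitude), and the longitude is measured from the +Z axis toward +X. The radius follows the embodiment (1.5 times the bounding box diagonal). The name `key_camera` and the returned dictionary layout are illustrative.

```python
import math


def key_camera(index, m, center, diagonal, radius_factor=1.5):
    """Camera parameters for preset viewpoint `index` out of `m` on an assumed equatorial ring."""
    theta = 2.0 * math.pi * index / m            # longitude of this viewpoint
    radius = radius_factor * diagonal            # observation sphere radius
    position = (
        center[0] + radius * math.sin(theta),
        center[1],                               # latitude 0: same height as the center
        center[2] + radius * math.cos(theta),
    )
    # Unit vector from the camera toward the model center.
    forward = tuple((c - p) / radius for c, p in zip(center, position))
    return {"position": position, "forward": forward, "up": (0.0, 1.0, 0.0)}


for idx in (0, 5, 10):
    print(idx, key_camera(idx, 36, (0.0, 0.0, 0.0), 2.0)["position"])
```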
[0115] S13, perform depth peeling rendering at each of the key viewpoints to obtain the multi-layer surface point fragments for that viewpoint.
[0116] For each key viewpoint, this step uses depth peeling to completely capture all visible surface points of the model, forming a multi-layered collection of fragments ordered by depth. The input is the key viewpoint parameters (camera position, orientation, etc.) determined in step S12, and preset rendering configurations are required, such as viewport resolution (e.g., 512×512), vertical field of view (e.g., 60 degrees), and near and far clipping plane distances.
[0117] Based on the aforementioned camera parameters, a view-projection matrix is constructed, and the GPU is driven to perform multiple passes of depth peeling rendering. This process automatically records and passes through all surfaces without requiring the number of peeled layers to be specified manually beforehand; its termination condition is determined by the actual geometric complexity of the model at that viewpoint. Specifically: the first rendering pass uses a standard depth test to capture and output the foremost surface point, while simultaneously writing its depth value to a dedicated texture; each subsequent rendering pass dynamically adjusts the test conditions based on the depth texture generated in the previous pass, capturing only the next layer of surface points whose depth values are greater than those already recorded, thus peeling layer by layer. Each rendering pass outputs the complete attributes of the captured fragment through Multiple Render Targets (MRT), including world coordinates, normal vector, color, depth, texture coordinates, and part ID. This loop continues until no new fragments are generated, ultimately yielding, for every pixel position at that viewpoint, a set of multi-layer surface point fragments strictly ordered by depth.
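The layer-by-layer pass logic described above can be illustrated for a single pixel on the CPU. The following is a minimal sketch of the depth peeling idea, not the GPU implementation: the function name peel_layers is hypothetical, and a plain list of depths stands in for the fragments that rasterization would produce at one pixel.

```python
import math


def peel_layers(fragment_depths: list[float]) -> list[float]:
    """Simulate the multi-pass peeling loop for one pixel.

    Each pass keeps the nearest fragment lying strictly behind the depth
    recorded by the previous pass; the loop ends when a pass finds nothing.
    """
    layers = []
    last_depth = -math.inf  # nothing has been peeled yet
    while True:
        candidates = [d for d in fragment_depths if d > last_depth]
        if not candidates:
            break  # no new fragment generated: termination condition
        last_depth = min(candidates)  # nearest remaining surface wins
        layers.append(last_depth)
    return layers


print(peel_layers([4.2, 1.5, 9.0, 2.7]))  # [1.5, 2.7, 4.2, 9.0]
```

The number of passes equals the number of layers at the pixel plus one final empty pass, which matches the statement that the layer count does not need to be fixed in advance.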
[0118] In this embodiment: For the key viewpoint with index 0, its camera parameters are determined according to S12: the position is at the specified coordinates on the observation sphere, facing the center of the model, and the upward direction is (0,1,0).
[0119] The rendering configuration is set as follows: viewport resolution 512×512, vertical field of view 60 degrees, near clipping plane distance set to 0.1 times the distance from the camera to the nearest point of the model bounding box, and far clipping plane distance set to twice the distance from the camera to the farthest point of the model bounding box. After the above depth peeling process is performed, the model surface points corresponding to each pixel location are captured. Each captured fragment contains the following raw attributes:
[0120] World coordinates (world_position: vec3<f32>, 12 bytes)
[0121] Unit normal vector (normal: vec3<f32>, 12 bytes)
[0122] sRGB color (color: vec3<f32>, 12 bytes)
[0123] View space linear depth (depth: f32, 4 bytes)
[0124] Texture coordinates (texture_coord: vec2<f32>, 8 bytes)
[0125] Part ID (part_id: u32, 4 bytes)
[0126] Each fragment occupies 52 bytes in total.
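The 52-byte total can be checked by describing the record with Python's struct module. This is a sketch under assumptions: little-endian byte order, no implicit padding, and the field order listed above; RAW_FRAGMENT and pack_fragment are illustrative names.

```python
import struct

# world_position 3f, normal 3f, color 3f, depth f, texture_coord 2f, part_id I
# "<" selects little-endian with no implicit padding (an assumption).
RAW_FRAGMENT = struct.Struct("<3f3f3ff2fI")


def pack_fragment(world_position, normal, color, depth, texture_coord, part_id) -> bytes:
    """Serialize one raw fragment into its 52-byte record."""
    return RAW_FRAGMENT.pack(*world_position, *normal, *color,
                             depth, *texture_coord, part_id)


record = pack_fragment((1.0, 2.0, 3.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.5),
                       7.25, (0.25, 0.75), 42)
print(RAW_FRAGMENT.size, len(record))  # 52 52
```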
[0127] S14, construct layered depth image data including the layered pixel index table and the layered pixel storage area based on all the multi-layered surface point fragments.
[0128] The multi-layer surface point fragments output in step S13, which are grouped by rendering pass, are reorganized into layered depth image (LDI) data that supports efficient random access by screen pixel coordinates. All fragments of a key viewpoint are grouped by the screen-space pixel onto which their surface points project. Each screen-space pixel coordinate (x, y) may correspond to zero or more surface points at different layers. Each surface point carries its view-space depth value z and the raw attributes of a captured fragment as defined in step S13. The LDI data consists of two core components:
[0129] Layered pixel index table: A fixed-length array whose size equals image width × image height. Each array element corresponds to one pixel position and stores a structure containing two fields.
[0130] layer_count: The number of valid surface point layers captured at this pixel location. The data type is a 32-bit unsigned integer (u32).
[0131] start_index: The starting index of the first layer surface point data at this pixel location in the layered pixel storage area. The data type is a 32-bit unsigned integer (u32).
[0132] Each structure occupies 8 bytes.
[0133] Layered pixel storage area: A contiguous memory region used to sequentially and continuously store all layer surface point data for each pixel location. Each surface point is stored as a structure containing the following fields:
[0134] World coordinates (world_position: vec3<f32>, 12 bytes)
[0135] Unit normal vector (normal: vec3<f32>, 12 bytes)
[0136] sRGB color (color: vec3<f32>, 12 bytes)
[0137] View space linear depth (depth: f32, 4 bytes)
[0138] Texture coordinates (texture_coord: vec2<f32>, 8 bytes)
[0139] Part ID (part_id:u32, 4 bytes)
[0140] Each surface point structure occupies 52 bytes.
[0141] Next, execute the following build process:
[0142] Initialization: Create a layered pixel index table of size image width × image height. In each entry, layer_count is initialized to 0 and start_index is initialized to an invalid value, which can be 0xFFFFFFFF. At the same time, create an initially empty layered pixel storage area.
[0143] Fragment Collection and Depth Sort: Traverse all input fragments and assign them to their corresponding pixel units based on their pixel coordinates (x, y). For multiple fragments within each pixel unit, sort them in ascending order by their depth value z to ensure that the storage order is from near to far, and the first fragment accessed is the data of the surface with the shallowest depth value.
[0144] Layered pixel index table filling: Traverse each pixel unit in sequence. If the unit contains at least one fragment, then in the corresponding entry of the layered pixel index table, layer_count is set to the actual number of fragments and start_index is set to the current end position of the layered pixel storage area, that is, the index at which this pixel's first fragment will be stored. The surface points of any pixel can then be accessed like an array slice, starting at start_index and spanning layer_count entries.
[0145] Layered pixel storage area filling: For each pixel unit containing fragments, all its sorted fragment data (world coordinates, normal vector, color, depth, texture coordinates, part ID) are appended sequentially to the end of the layered pixel storage area.
[0146] The final result is a complete, structured layered depth image (LDI) dataset, which includes a layered pixel index table for fast lookup and a compact layered pixel storage area for storing all geometric and attribute data.
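The build process above (initialization, fragment collection and depth sort, index filling, storage filling) can be sketched as follows. This is an illustrative CPU version under assumptions: fragments arrive as (x, y, depth, payload) tuples, the pixel index is y * width + x, and build_ldi is a hypothetical name.

```python
from collections import defaultdict

INVALID_INDEX = 0xFFFFFFFF  # marker for pixels without any surface point


def build_ldi(width, height, fragments):
    """Return (index_table, storage) for fragments given in any order.

    index_table[y * width + x] is a (layer_count, start_index) pair;
    storage is a flat list of (depth, payload), near to far per pixel.
    """
    buckets = defaultdict(list)
    for x, y, depth, payload in fragments:
        buckets[y * width + x].append((depth, payload))

    index_table = [(0, INVALID_INDEX)] * (width * height)
    storage = []
    for pixel in range(width * height):      # sequential pixel traversal
        points = buckets.get(pixel)
        if not points:
            continue                          # blank pixel keeps (0, INVALID)
        points.sort(key=lambda p: p[0])       # ascending depth
        index_table[pixel] = (len(points), len(storage))
        storage.extend(points)
    return index_table, storage


frags = [(1, 0, 5.0, "b"), (1, 0, 2.0, "a"), (0, 1, 3.0, "c")]
table, store = build_ldi(2, 2, frags)
print(table[1], table[2])  # (2, 0) (1, 2)
```

Reading the layers of a pixel is then a slice: storage[start_index : start_index + layer_count].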
[0147] In this embodiment: for the rendering result at a resolution of 512×512, after executing the above construction process, the following data structure is obtained:
[0148] Layered pixel index table: This is a fixed-length array of 262,144 (512×512) elements. Each element corresponds to a pixel position in the image and is a structure containing:
[0149] layer_count (u32, 4 bytes): The number of valid surface point layers captured at this pixel location (0 to 3 in this embodiment).
[0150] start_index (u32, 4 bytes): The starting index of the first layer surface point data at this pixel location in the layered pixel storage area.
[0151] Each entry is 8 bytes, and the entire table occupies a fixed 262,144 × 8 bytes = 2,097,152 bytes ≈ 2.0 MB.
[0152] Layered Pixel Storage Area: This is a contiguous memory region that stores all layer surface point data for all valid pixel locations in the order described above. Assuming an average of 3 layers of data per pixel, a total of approximately 262,144 × 3 = 786,432 surface points are stored. Each surface point occupies 52 bytes (as described in S13). The total size of this storage area is approximately 786,432 × 52 bytes = 40,894,464 bytes ≈ 39.0 MB.
[0153] Therefore, the total amount of raw LDI data for a single viewpoint is approximately 41.0 MB.
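The size figures in this embodiment can be reproduced with simple arithmetic. In the sketch below, MB is interpreted as 1,048,576 bytes, which is the interpretation that yields the 2.0, 39.0, and 41.0 figures; the constant names are illustrative.

```python
WIDTH, HEIGHT = 512, 512
INDEX_ENTRY_BYTES = 8      # layer_count (u32) + start_index (u32)
SURFACE_POINT_BYTES = 52   # raw surface point record
AVG_LAYERS = 3             # average assumed in this embodiment
MIB = 1024 * 1024

index_bytes = WIDTH * HEIGHT * INDEX_ENTRY_BYTES
storage_bytes = WIDTH * HEIGHT * AVG_LAYERS * SURFACE_POINT_BYTES
total_bytes = index_bytes + storage_bytes

print(index_bytes, storage_bytes)      # 2097152 40894464
print(round(total_bytes / MIB, 1))     # 41.0
```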
[0154] S2, convert the pixel data in the layered pixel storage area into compact format pixel data.
[0155] To significantly improve storage efficiency, this step performs compact encoding on the data in the layered pixel storage area of the layered depth image (LDI). The original pixel data stores attributes such as normal vectors and color with full precision, resulting in storage redundancy. In fact, to ensure the final rendered visual effect, many attributes only need to retain a specific level of precision, and the slight loss is imperceptible to the human eye. Therefore, this invention applies targeted lossy compression and quantization strategies to fields such as normal vectors and color, compressing the data size per pixel from 52 bytes to 24 bytes at the cost of negligible loss of visual precision. This achieves a significant increase in storage efficiency while maintaining high fidelity.
[0156] As shown in Figure 3, this step can be further broken down into the following sub-steps:
[0157] S21, after normalizing the world coordinates of the pixel data to the range of the axis-aligned bounding box, convert it into a half-precision floating-point number.
[0158] Using the model space reference established in step S11, the original world coordinate values from the layered pixel storage area are mapped to the unit space.
[0159] First, input the original world coordinates coord_world (vec3<f32>) from the layered pixel storage area of step S14, together with the model space reference output in step S11, namely the minimum vertex min_point and the maximum vertex max_point of the axis-aligned bounding box.
[0160] Next, normalization calculations are performed on the world coordinates of each pixel, using the following formula:
[0161] coord_normalized=(coord_world-min_point) / (max_point-min_point)
[0162] This calculation subtracts the bounding box origin `min_point` from `coord_world` to translate it to relative space, then divides it by the bounding box size (`max_point - min_point`) to scale it to a unit cube with a side length of 1, thus linearly mapping all coordinate values to the [0,1] interval. Finally, the normalized 3D vector `coord_normalized` is converted from a 32-bit single-precision floating-point number (`f32`) to a 16-bit half-precision floating-point number (`f16`) for storage. After this step, the world coordinate field is compressed from 12 bytes to 6 bytes of normalized half-precision coordinates.
[0163] In this embodiment: For each pixel in the layered pixel storage area of a single-view LDI of the Boeing777 model, its original world coordinates coord_world are read. Using the min_point and max_point calculated by S11, coord_normalized is calculated according to the above formula, and its X, Y, and Z components are converted into f16 format for storage. This field is compressed from 12 bytes to 6 bytes.
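A minimal sketch of this normalization and half-precision conversion is shown below, using the "e" (IEEE 754 half precision) format of Python's struct module. The function names, the little-endian byte order, and the handling of a zero-thickness bounding box axis are assumptions made for the example.

```python
import struct


def normalize_to_f16(coord_world, min_point, max_point) -> bytes:
    """Map a world-space point into [0, 1]^3 and pack it as three f16 values."""
    normalized = []
    for c, lo, hi in zip(coord_world, min_point, max_point):
        extent = hi - lo
        # Assumption: a degenerate (zero-thickness) axis maps to 0.0.
        normalized.append((c - lo) / extent if extent > 0 else 0.0)
    return struct.pack("<3e", *normalized)


def denormalize_from_f16(packed: bytes, min_point, max_point):
    """Inverse mapping used at decode time."""
    values = struct.unpack("<3e", packed)
    return tuple(lo + v * (hi - lo)
                 for v, lo, hi in zip(values, min_point, max_point))


lo, hi = (-10.0, 0.0, -5.0), (30.0, 20.0, 5.0)
packed = normalize_to_f16((10.0, 5.0, 0.0), lo, hi)
print(len(packed), denormalize_from_f16(packed, lo, hi))  # 6 (10.0, 5.0, 0.0)
```

The sample point round-trips exactly because 0.5 and 0.25 are representable in half precision; arbitrary coordinates incur a small quantization error.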
[0164] S22, the normal vector of the pixel data is compressed into a 32-bit integer by octahedral mapping encoding.
[0165] Octahedral mapping encoding is applied to efficiently compress the 3D unit normal vector. First, the normalized 3D normal vector n=(n_x,n_y,n_z) from the layered pixel storage area in step S14 is input. Then, octahedral mapping encoding is performed:
[0166] 1. Calculate the L1 norm: l1 = |n_x| + |n_y| + |n_z|.
[0167] 2. Project onto the octahedral surface: oct_x = n_x / l1, oct_y = n_y / l1.
[0168] 3. Based on the sign of the z-component of the normal vector, the projected coordinates are processed as follows:
[0169] If n_z < 0, apply a mirror fold to the projected coordinates so that points on the lower half of the octahedron are mapped into the same two-dimensional square as the upper half: oct_x' = (1 - |oct_y|) * sign(oct_x), oct_y' = (1 - |oct_x|) * sign(oct_y), where both right-hand sides use the values of oct_x and oct_y from before the fold, and oct_x' and oct_y' then replace oct_x and oct_y. If n_z >= 0, keep the projected coordinates oct_x and oct_y unchanged.
[0170] 4. Map and quantize the two-dimensional projected coordinates from [-1, 1] to the integer range of [0, 65535] using the following formula:
[0171] x=round((oct_x*0.5+0.5)*65535),
[0172] y=round((oct_y*0.5+0.5)*65535).
[0173] 5. Pack the output: encoded_normal=(x<<16)|y (a 32-bit unsigned integer).
[0174] After this step, the normal vector field is compressed from 12 bytes to 4 bytes, and the output is a 32-bit compressed normal vector.
[0175] In this embodiment: the above encoding process is performed on the normal vector of each pixel in the layered pixel storage area to complete the compression of the normal vector from 12 bytes to 4 bytes.
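The five encoding steps, together with a matching inverse, can be sketched as follows. This is an illustrative version under assumptions: the helper _sign treats zero as positive (the text does not specify sign(0)), both fold results are computed from the pre-fold values, and the decoder shown is the commonly used inverse of octahedral mapping rather than a procedure quoted from the text.

```python
import math


def _sign(v: float) -> float:
    # Assumption: zero counts as positive so the fold never multiplies by 0.
    return 1.0 if v >= 0.0 else -1.0


def encode_normal(n) -> int:
    nx, ny, nz = n
    l1 = abs(nx) + abs(ny) + abs(nz)                # step 1: L1 norm
    ox, oy = nx / l1, ny / l1                       # step 2: projection
    if nz < 0.0:                                    # step 3: mirror fold
        ox, oy = (1.0 - abs(oy)) * _sign(ox), (1.0 - abs(ox)) * _sign(oy)
    x = round((ox * 0.5 + 0.5) * 65535)             # step 4: quantize
    y = round((oy * 0.5 + 0.5) * 65535)
    return (x << 16) | y                            # step 5: pack


def decode_normal(encoded: int):
    ox = (encoded >> 16) / 65535 * 2.0 - 1.0
    oy = (encoded & 0xFFFF) / 65535 * 2.0 - 1.0
    oz = 1.0 - abs(ox) - abs(oy)
    if oz < 0.0:                                    # undo the fold
        ox, oy = (1.0 - abs(oy)) * _sign(ox), (1.0 - abs(ox)) * _sign(oy)
    length = math.sqrt(ox * ox + oy * oy + oz * oz)
    return (ox / length, oy / length, oz / length)


s = 1.0 / math.sqrt(14.0)
n = (1.0 * s, 2.0 * s, -3.0 * s)
decoded = decode_normal(encode_normal(n))
print(max(abs(a - b) for a, b in zip(n, decoded)) < 1e-3)  # True
```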
[0176] S23, the color value of the pixel data is compressed into a 16-bit integer using RGB565 encoding.
[0177] This sub-step compresses the original color value, stored as three 32-bit floating-point channels, into a standard 16-bit integer in RGB565 format, significantly reducing the storage footprint of the color data. First, input the original floating-point RGB color value color=(r,g,b) (vec3<f32>) from the layered pixel storage area in step S14. Next, the standard RGB565 encoding process is executed: each floating-point channel value in the range [0.0, 1.0] is quantized into 5-bit (red), 6-bit (green), and 5-bit (blue) integers respectively, and combined into a 16-bit unsigned integer (u16) in the format (R5<<11)|(G6<<5)|B5. After this step, the color field is compressed from 12 bytes to 2 bytes, and the output is a 16-bit compressed color.
[0178] In this embodiment: the above encoding is performed on the color value of each pixel. For example, with round-to-nearest quantization the color (1.0, 0.5, 0.0) gives R5=31, G6=32, and B5=0, which packs to the 16-bit value 0xFC00. This completes the compression of the color field from 12 bytes to 2 bytes.
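The RGB565 packing can be sketched as below. The rounding rule is an assumption (round to nearest, implemented as adding 0.5 and truncating), since the text only states that each channel is quantized; encode_rgb565 and decode_rgb565 are illustrative names.

```python
def encode_rgb565(r: float, g: float, b: float) -> int:
    """Quantize [0, 1] floats to 5/6/5 bits: (R5 << 11) | (G6 << 5) | B5."""
    def quantize(value: float, levels: int) -> int:
        clamped = min(max(value, 0.0), 1.0)
        # Assumption: round to the nearest level.
        return int(clamped * levels + 0.5)
    return (quantize(r, 31) << 11) | (quantize(g, 63) << 5) | quantize(b, 31)


def decode_rgb565(packed: int):
    """Unpack a 16-bit RGB565 value back into floats in [0, 1]."""
    return ((packed >> 11) / 31, ((packed >> 5) & 0x3F) / 63, (packed & 0x1F) / 31)


print(hex(encode_rgb565(1.0, 0.0, 0.0)))  # 0xf800
print(hex(encode_rgb565(1.0, 1.0, 1.0)))  # 0xffff
```

Green receives the extra bit because the eye is most sensitive to luminance differences in that channel, which is the usual rationale for the 5/6/5 split.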
[0179] S24, convert the depth value of the pixel data into a half-precision floating-point number, quantize the texture coordinates into a 16-bit integer, and convert the part number into a 16-bit integer.
[0180] This step targets the data characteristics of depth, texture coordinates, and part number, reducing each to the minimum bit width that meets application requirements while remaining lossless (or visually lossless), to achieve efficient compression. The inputs are the original depth (f32), texture coordinates (vec2<f32>), and part number (u32) from the layered pixel storage area in step S14. The specific conversion strategy is as follows:
[0181] 1. Depth Conversion: Directly converts a 32-bit single-precision floating-point number (f32) to a 16-bit half-precision floating-point number (f16), and outputs the half-precision depth.
[0182] 2. Texture coordinate transformation: The U and V components are quantized from f32 to 16-bit unsigned integers (u16), with a mapping range of [0, 65535], and the output is the quantized texture coordinates.
[0183] 3. Part Number Conversion: Truncate the 32-bit unsigned integer (u32) into a 16-bit unsigned integer (u16), and output the 16-bit part number.
[0184] After this step, the depth field is compressed from 4 bytes to 2 bytes, the texture coordinates are compressed from 8 bytes to 4 bytes, and the part number is compressed from 4 bytes to 2 bytes.
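The three conversions can be sketched together as follows. Several points are assumptions made for the example: texture coordinates are taken to lie in [0, 1] and are clamped, quantization rounds to nearest, and the field order in the packed bytes is illustrative only (the final layout is defined in step S25). Truncating the part number keeps it intact only when it is below 65,536.

```python
import struct


def compact_depth_uv_part(depth: float, uv, part_id: int) -> bytes:
    """Pack depth as f16, (u, v) as two u16 values, part_id as u16: 8 bytes."""
    # Assumption: UVs lie in [0, 1]; values outside are clamped.
    u16 = [int(min(max(c, 0.0), 1.0) * 65535 + 0.5) for c in uv]
    part16 = part_id & 0xFFFF  # truncation to the low 16 bits
    return struct.pack("<e2HH", depth, *u16, part16)


packed = compact_depth_uv_part(7.25, (0.0, 1.0), 42)
print(len(packed), struct.unpack("<e2HH", packed))  # 8 (7.25, 0, 65535, 42)
```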
[0185] In this embodiment, the depth value, texture coordinates, and part number are converted according to the above strategy. These fields are compressed from a total of 16 bytes to 8 bytes. At this point, all pixel attribute data is integrated into a compact 20-byte structure. To meet computer memory alignment requirements (typically multiples of 8 bytes) and optimize storage and access efficiency, 4 bytes of padding are added, bringing the total length of the single compact pixel structure to 24 bytes.
[0186] S25, assemble all the fields that have undergone the above transformation into compact format pixel data with a single pixel size of no more than 24 bytes.
[0187] All compressed fields output from steps S21 to S24 are integrated into a unified compact format pixel data.
[0188] First, input all the compressed fields obtained in the previous steps. Then, arrange these fields in a predefined order (world coordinates, normal vector, texture coordinates, depth, color, part number), and add padding bytes to meet memory alignment requirements (e.g., 4-byte alignment), assembling them into a new pixel structure. Finally, output a compact pixel structure with a total size no greater than 24 bytes. The collection of all compact pixel structures constitutes the final output compact format pixel data of step S2.
[0189] In this embodiment: the above fields are assembled into a compact pixel structure with the following memory layout: [world coordinates (vec3<f16>, 6 bytes)] [compressed normal vector (u32, 4 bytes)] [texture coordinates (vec2<u16>, 4 bytes)] [depth (f16, 2 bytes)] [color (u16, 2 bytes)] [part number (u16, 2 bytes)] [padding (4 bytes)], totaling 24 bytes. Thus, the data for a single pixel is compressed from the original 52 bytes to 24 bytes, a reduction by a factor of approximately 2.17. For a single-view LDI, the size of the layered pixel storage area is reduced from approximately 39.0 MB to approximately 18.0 MB. This compact 18.0 MB pixel data is the final output of step S2.
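The 24-byte layout of this embodiment can be expressed as a struct format string to confirm the field sizes. The little-endian byte order and the name COMPACT_PIXEL are assumptions; the sample field values are arbitrary.

```python
import struct

# world xyz (3 x f16), normal (u32), uv (2 x u16), depth (f16),
# color (u16), part number (u16), then 4 padding bytes.
COMPACT_PIXEL = struct.Struct("<3eI2HeHH4x")

record = COMPACT_PIXEL.pack(0.5, 0.25, 0.75,   # normalized world coordinates
                            0x80008000,        # octahedral-encoded normal
                            0, 65535,          # quantized texture coordinates
                            7.25,              # half-precision depth
                            0xF800,            # RGB565 color
                            42)                # 16-bit part number
print(COMPACT_PIXEL.size, len(record))     # 24 24
print(round(52 / COMPACT_PIXEL.size, 2))   # 2.17
```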
[0190] S3, optimize the index data in the layered pixel index table into a sparse index list.
[0191] This invention optimizes index data using a sparse storage strategy, addressing the problem of numerous blank pixels inefficiently occupying storage space in traditional fixed-length array storage methods based on image resolution. Specifically, this invention fully leverages the highly spatially sparse nature of Layered Depth Images (LDI) on the image plane (i.e., the model surface is projected onto only a portion of the pixels). By identifying, reconstructing, and storing only pixel location indices containing valid surface data, it completely eliminates the storage overhead for blank pixel regions, thereby significantly compressing the size of the index data and ensuring that the lightweight result contains only valid information.
[0192] As shown in Figure 4, this step can be further broken down into the following sub-steps:
[0193] S31, traverse the layered pixel index table and identify non-blank entries to obtain a list of non-blank pixel positions.
[0194] Analyze the data distribution of the layered pixel index table to locate all pixel positions that contain valid surface point data.
[0195] First, input the layered pixel index table constructed in step S14. The layered pixel index table is a fixed-length array with a size of image width × image height. Each entry of the fixed-length array contains layer_count and start_index fields.
[0196] Next, each entry in the array is linearly traversed, and the value of its `layer_count` field is checked. If `layer_count > 0`, the pixel position corresponding to that entry is determined to be a non-blank pixel, and its one-dimensional position index in the entire image is recorded (usually calculated as y * image width + x). Finally, a list of non-blank pixel positions containing all non-blank pixel position indices is output, and the total number and proportion of non-blank pixels are calculated, providing an accurate basis for subsequent sparse storage.
[0197] In this embodiment: The layered pixel index table of the LDI for a single viewpoint of the Boeing 777 model (a total of 262,144 entries) is traversed. Statistics show that entries with a layer_count greater than 0 account for approximately 14.3%, totaling about 37,500; these are non-blank pixels. The system simultaneously generates a list containing these approximately 37,500 position indices (such as one-dimensional array subscripts). The remaining approximately 85.7% of entries have a layer_count of 0, corresponding to blank pixels, and no storage space needs to be allocated for them.
[0198] S32, based on the non-blank pixels in the non-blank pixel position list, create a sparse index structure for each non-blank pixel to obtain a set of sparse index entries.
[0199] For each non-blank pixel in the list of non-blank pixel positions output in step S31, create a new structure containing complete index information.
[0200] First, input the list of non-blank pixel locations output in step S31, and the corresponding layer_count and start_index values read from the original layered pixel index table. Next, create a sparse index entry for each location in the list of non-blank pixel locations. This sparse index entry contains the following three fields:
[0201] image_position: Records the position identifier of the non-blank pixel in the image grid. The data type is a 32-bit unsigned integer (u32).
[0202] layer_count: Directly inherited from the original index table, it records the number of surface point layers at this location. The data type is a 32-bit unsigned integer (u32).
[0203] pixel_index: Directly inherited from the original index table, it records the starting index of the first layer surface point data at this position in the compact pixel storage area (after S2 conversion), and the data type is a 32-bit unsigned integer (u32).
[0204] Each sparse index entry occupies 12 bytes. Finally, the output is a set of all sparse index entries.
[0205] In this embodiment: a SparseIndex structure is created for each of the approximately 37,500 non-blank pixel positions obtained in step S31. For example, for a non-blank pixel located at image coordinates (x, y), its image_position is set to y*512+x. Each sparse index entry occupies 12 bytes.
[0206] S33, organize the set of sparse index entries into a sparse index list.
[0207] Organize the scattered sparse index entries into an ordered, contiguous data structure that replaces the layered pixel index table.
[0208] First, input the set of sparse index entries output by S32.
[0209] Next, all sparse index entries in the set are sorted in ascending order by the value of their image_position field to ensure that the index is ordered and predictable. The sorted entries are then stored contiguously in a memory region to form a sequential list.
[0210] This sequential list is the sparse index list, which serves as an equivalent, compressed replacement for the layered pixel index table and is used to quickly retrieve the layer information for a given pixel position during rendering.
[0211] In this embodiment: approximately 37,500 sparse index entries are sorted by image_position and stored contiguously. The total size of this sparse index list is approximately 37,500 × 12 bytes = 450,000 bytes ≈ 0.45 MB. Compared to the original 2.0 MB layered pixel index table, the index data is compressed by approximately 78.6%. Thus, the total data volume of a single-view LDI consists of the compact format pixel data output by S2 (approximately 18.0 MB) and the sparse index list output in this step (approximately 0.45 MB), totaling approximately 18.45 MB.
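Steps S31 to S33 can be sketched in a few lines: filter the non-blank entries, attach the image position, sort, and store contiguously. The names SPARSE_ENTRY and build_sparse_index and the little-endian byte order are assumptions; the input is a list of (layer_count, start_index) pairs in row-major pixel order.

```python
import struct

SPARSE_ENTRY = struct.Struct("<3I")  # image_position, layer_count, pixel_index
INVALID = 0xFFFFFFFF


def build_sparse_index(index_table) -> bytes:
    """Keep only entries with layer_count > 0, sorted by image_position."""
    entries = [(pos, count, start)
               for pos, (count, start) in enumerate(index_table)
               if count > 0]                      # S31: non-blank pixels only
    entries.sort(key=lambda e: e[0])              # S33: ascending image_position
    return b"".join(SPARSE_ENTRY.pack(*e) for e in entries)


table = [(0, INVALID), (2, 0), (0, INVALID), (1, 2)]
blob = build_sparse_index(table)
print(len(blob) // SPARSE_ENTRY.size)   # 2
print(37_500 * SPARSE_ENTRY.size)       # 450000
```

Because the list is sorted by image_position, a renderer can locate the entry for a given pixel with a binary search instead of a full-resolution table lookup.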
[0212] S4, the compact format pixel data and the sparse index list are compressed and encapsulated to generate a lightweight layered depth image file.
[0213] Based on the compact pixel data output by S2 and the sparse index list output by S3, lossless compression algorithms (such as DEFLATE) are applied to eliminate data redundancy within the compact pixel data and the sparse index list, and then standardized file encapsulation is performed to achieve an extremely high overall compression ratio. Specifically:
[0214] 1. For compact pixel data: The redundancy within the compact pixel data mainly stems from spatial locality. Due to the continuous surface of the 3D model, the geometric attributes such as world coordinates and normal vectors of adjacent pixels change gradually, forming a large number of continuous, progressively changing byte sequences in the data stream, which are very suitable for dictionary-matching-based compression algorithms.
[0215] 2. Regarding sparse index lists: The redundancy within the sparse index list mainly stems from statistical regularity. Since the image_position and pixel_index fields in the sparse index list are typically ordered, increasing integer sequences, while the layer_count field's values are concentrated in a relatively small range, this data pattern has low entropy and is easily compressed efficiently using entropy encoding.
[0216] The final result is a complete, self-describing, lightweight layered depth image (LDI) file (usually with the .ldi extension).
[0217] As shown in Figure 5, this step can be further broken down into the following sub-steps:
[0218] S41, the compact format pixel data is compressed and encoded to obtain a compressed pixel data block.
[0219] By leveraging the high coherence of adjacent pixels in spatial coordinates, normal vectors, and colors in geometric data, efficient lossless compression is achieved.
[0220] First, the compact format pixel data output by S2 is input. This compact format pixel data is a continuous byte stream containing a compact 24-byte structure of all surface points. Next, this byte stream is used as input to call a compression encoder (such as a zlib library compression function based on the DEFLATE algorithm). The encoder uses the LZ77 algorithm to identify and eliminate repeating string patterns in the byte stream, and then uses Huffman coding to perform variable-length encoding on the symbols, ultimately generating compressed data blocks. This process significantly reduces data redundancy caused by half-precision floating-point numbers, quantized integer storage, and structured arrangement. The final output is a compressed pixel data block.
[0221] In this embodiment: the approximately 18.0MB compact format pixel data byte stream obtained in step S2 is fed into the deflate function of the zlib library for compression. Since attributes such as world coordinates and normal vectors change smoothly between adjacent pixels in image space, the data has extremely strong local similarity, and compression encoding can achieve a high compression ratio.
[0222] S42, perform compression encoding on the sparse index list to obtain a compressed index data block.
[0223] Lossless compression is performed on a sparse index list with regular structure and increasing numerical values.
[0224] First, the sparse index list output from S3 is input, which is a continuous byte stream of SparseIndex structures (12 bytes each) sorted by image_position. Since the image_position and pixel_index fields typically exhibit a monotonically increasing trend, and the layer_count field has a finite value range, this data stream has a predictable pattern. Next, this byte stream is input to the same compression encoder as in step S41 for compression. The final output is a compressed index data block.
[0225] In this embodiment: the approximately 0.45MB sparse index list byte stream obtained in step S3 is also compressed using the deflate function of the zlib library. The ordered nature of the index data enables the compression encoding to work efficiently.
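The lossless compression in steps S41 and S42 can be tried with Python's zlib module, which implements DEFLATE. The byte stream below is a synthetic stand-in for a sorted sparse index list, not data from this embodiment, so the sizes it prints say nothing about the ratios reported here.

```python
import struct
import zlib

# Synthetic sorted index entries: (image_position, layer_count, pixel_index).
entries = b"".join(struct.pack("<3I", pos, 3, 3 * i)
                   for i, pos in enumerate(range(0, 30_000, 3)))

compressed = zlib.compress(entries, level=9)   # DEFLATE: LZ77 + Huffman coding
restored = zlib.decompress(compressed)

print(len(entries), len(compressed))
print(restored == entries)  # True
```

The same two calls apply unchanged to the compact pixel byte stream of step S2.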
[0226] S43, the compressed pixel data block, the compressed index data block, and the metadata describing the layered depth image data are packaged in a predetermined format to generate the final lightweight layered depth image file.
[0227] Define a binary file format to integrate compressed data and necessary descriptive information into a single file. First, input the compressed pixel data block output in step S41, the compressed index data block output in step S42, and metadata obtained during the execution of this method. The metadata includes at least:
[0228] 1. The image resolution preset by step S13.
[0229] 2. Key view parameters (such as view index and camera orientation) output from step S12.
[0230] 3. The model axis aligned bounding box (min_point, max_point) output from step S11.
[0231] 4. The original data size (original pixel and index data size) calculated based on the data structure in step S14.
[0232] 5. Size of the compressed data blocks generated in steps S41 and S42.
[0233] Next, these components are written sequentially according to the predefined LDI file format:
[0234] 1. File header: A fixed-size header containing the magic number (e.g., 0x4C444931, representing "LDI1"), the file version number, and the offsets and sizes of each part of the file (metadata area, pixel data block, index data block).
[0235] 2. Metadata area: The above metadata is stored in a structured format (such as JSON or a custom binary structure).
[0236] 3. Compressed pixel data block: the output of step S41.
[0237] 4. Compressed index data block: the output of step S42.
[0238] The final output is a complete lightweight layered depth image file.
[0239] In this embodiment: a key viewpoint (e.g., index 0) of the Boeing 777 model is packaged into a file. The metadata includes: resolution (512x512), viewpoint index 0, bounding box coordinates, original pixel data size (approximately 18.0MB), and original index data size (approximately 0.45MB). After compression encoding, the compressed pixel data block is approximately 1.2MB, and the compressed index data block is approximately 0.05MB. These are packaged with the file header and metadata area according to the specified format, ultimately generating a .ldi file of approximately 1.41MB. Compared to the original 41.0MB LDI data for this viewpoint, a compression ratio of approximately 29 times is achieved. For a complete configuration of 7 views, the total file size is approximately 9.9MB, achieving a lightweight effect of nearly 60:1 compared to the original 587MB CAD model.
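A possible container layout following the four-part structure above is sketched below. The concrete header layout is hypothetical: eight big-endian u32 values (magic, version, then offset and size for the metadata area, pixel block, and index block), JSON metadata, and an abbreviated metadata dictionary. A reader is included only to check the round trip.

```python
import json
import struct
import zlib

MAGIC = 0x4C444931               # "LDI1"
HEADER = struct.Struct(">8I")    # hypothetical: magic, version, 3 x (offset, size)


def write_ldi(metadata: dict, pixel_bytes: bytes, index_bytes: bytes) -> bytes:
    meta = json.dumps(metadata).encode("utf-8")
    pixels = zlib.compress(pixel_bytes)      # compressed pixel data block
    index = zlib.compress(index_bytes)       # compressed index data block
    meta_off = HEADER.size
    pixel_off = meta_off + len(meta)
    index_off = pixel_off + len(pixels)
    header = HEADER.pack(MAGIC, 1, meta_off, len(meta),
                         pixel_off, len(pixels), index_off, len(index))
    return header + meta + pixels + index


def read_ldi(blob: bytes):
    magic, _version, m_off, m_len, p_off, p_len, i_off, i_len = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not an LDI file")
    metadata = json.loads(blob[m_off:m_off + m_len])
    pixels = zlib.decompress(blob[p_off:p_off + p_len])
    index = zlib.decompress(blob[i_off:i_off + i_len])
    return metadata, pixels, index


blob = write_ldi({"resolution": [512, 512], "view_index": 0},
                 b"\x00" * 240, b"\x01" * 24)
print(blob[:4])  # b'LDI1'
```

Storing offsets and sizes in the header lets a loader seek directly to either data block without parsing the metadata first.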
[0240] In one alternative implementation, after step S4, the method further includes: S5: decompressing and visualizing the lightweight layered depth image file to generate a visualization model and verify its fidelity.
[0241] In the client or rendering engine, the lightweight layered depth image file generated in step S4 is loaded and processed. The visualization of the 3D model is restored through the reverse decoding and rendering process, and the impact of the compression process on the model's visual quality is quantitatively evaluated to ensure that the lightweight effect meets application requirements. This results in a real-time rendered visualized 3D model image and a quantitative fidelity evaluation report.
[0242] As shown in Figure 6, this step can be further broken down into the following sub-steps:
[0243] S51, Load and decompress the lightweight layered depth image file to obtain decompressed compact format pixel data and decompressed sparse index data.
[0244] Reverse the file encapsulation process in step S4 to extract and decompress the data that can be processed by the rendering pipeline. First, input the lightweight layered depth image file generated in step S4. Then, execute the following sequentially:
[0245] 1. File parsing: Read the file header, verify the magic number and version number, and obtain the location and size of the metadata area, compressed pixel data block and compressed index data block in the file.
[0246] 2. Metadata extraction: Parse the metadata area to recover information such as image resolution, key viewpoint parameters, and model bounding boxes.
[0247] 3. Data decompression: For the compressed pixel data block, call the DEFLATE decompression algorithm (such as the inflate function in the zlib library) to recover the compact format pixel data byte stream output in step S2.
[0248] For the compressed index data block, the DEFLATE decompression algorithm is also called to recover the sparse index list byte stream output in step S3.
[0249] 4. Finally, the output includes decompressed compact format pixel data, sparse index data, and metadata.
[0250] In this embodiment: In a web browser environment, the 1.41MB .ldi file for one viewpoint of the Boeing 777 model is loaded via an HTTP request. Using a WebAssembly build of the zlib library, the data blocks within the file are decompressed in sequence, restoring approximately 18.0MB of compact pixel data and approximately 0.45MB of sparse index data in memory, and the metadata required for rendering is then read.
[0251] S52, upload the decompressed compact format pixel data and the decompressed sparse index data to the GPU buffer.
[0252] Prepare data storage for GPU rendering.
[0253] First, input the compact format pixel data and sparse index data output by step S51. Then, execute the following via a graphics API (such as WebGPU, Vulkan, or DirectX 12):
[0254] 1. Create buffers: Create two storage buffers in the GPU memory, one for storing pixel data (pixelBuffer) and the other for storing index data (indexBuffer).
[0255] 2. Data Upload: Asynchronously copy the compact format pixel data byte stream to the pixelBuffer, and asynchronously copy the sparse index data byte stream to the indexBuffer.
[0256] Finally, the pixel data buffer and index data buffer are created and filled on the GPU side, making them accessible to the compute shader.
[0257] In this embodiment: using the WebGPU API, a pixelBuffer (approximately 18.0 MB in size) and an indexBuffer (approximately 0.45 MB in size) are created, and the data decompressed in step S51 is transferred from host memory to GPU memory via a command encoder.
[0258] S53, when the target viewpoint coincides with the key viewpoint, the GPU buffer data is decoded and rendered in the shader to generate a rendered image.
[0259] The data decoding and coloring calculations for each pixel are performed in parallel on the GPU.
[0260] First, the pixel data buffer and index data buffer created in step S52 are bound together through the rendering pipeline, and the compute shader or render shader is started. Then, for each screen pixel to be rendered, the shader performs the following core operations:
[0261] 1. Index lookup: Based on the screen coordinates of the current pixel, perform a binary search in the sparse index list (via indexBuffer) to find the corresponding SparseIndex entry and obtain the layer_count and pixel_index at that position.
[0262] 2. Data reading and decoding: For each surface point at this pixel location (starting from pixel_index, for a total of layer_count layers), read its compact 24-byte structure from the pixelBuffer and decode each field as follows:
[0263] 1) Normal vector decoding: Apply the inverse octahedral mapping to the 32-bit encoded_normal to recover the three-dimensional unit normal vector.
[0264] 2) Color decoding: Unpack the 16-bit encoded_color in RGB565 format, normalize each component, and restore it to a floating-point RGB value in the range [0,1].
[0265] 3) Coordinates and depth: Convert the half-precision floating-point numbers to full precision for calculation.
[0266] 3. Lighting calculation: For each decoded surface point, its color is calculated using a lighting model (such as Blinn-Phong). The lighting calculation is based on the decoded normal vector, color, preset light source position and intensity, and camera view parameters.
[0267] Finally, for each pixel location, the shader outputs the color layers to be blended, sorted by depth value, together with their transparency.
[0268] In this embodiment: the above operations are performed in parallel for each pixel in the 512×512 image. A parallel directional light and ambient light are configured, and specular highlights and diffuse reflections are calculated using the Blinn-Phong model. The decoded normal vectors are used for accurate lighting calculations, thereby restoring the model's stereoscopic appearance and material texture.
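The per-point decoding and shading of step S53 can be illustrated on the CPU with the sketch below. It is not the shader of the embodiment. Several details are assumptions made for illustration: the 32-bit encoded_normal is taken to hold two unsigned 16-bit octahedral coordinates (u in the low half, v in the high half), RGB565 is taken in the common layout with red in the top five bits, and the `ambient` and `shininess` parameters of `blinn_phong` are placeholder values.

```python
import math


def decode_rgb565(c: int):
    """Unpack RGB565 (R in the top 5 bits) to floats in [0, 1]."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return (r / 31.0, g / 63.0, b / 31.0)


def decode_octahedral(packed: int):
    """Inverse octahedral mapping. Assumed layout: two unorm16 values,
    u in the low 16 bits and v in the high 16 bits."""
    u = (packed & 0xFFFF) / 65535.0 * 2.0 - 1.0
    v = ((packed >> 16) & 0xFFFF) / 65535.0 * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        # Lower hemisphere: unfold the outer triangles of the octahedron.
        u, v = ((1.0 - abs(v)) * math.copysign(1.0, u),
                (1.0 - abs(u)) * math.copysign(1.0, v))
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(a):
    length = math.sqrt(_dot(a, a))
    return tuple(x / length for x in a) if length > 0.0 else (0.0, 0.0, 0.0)


def blinn_phong(normal, base_color, light_dir, view_dir,
                ambient=0.1, shininess=32.0):
    """Shade one surface point. light_dir and view_dir are unit vectors
    pointing away from the surface; ambient and shininess are assumed values."""
    n_dot_l = max(0.0, _dot(normal, light_dir))
    half = _normalize(tuple(l + v for l, v in zip(light_dir, view_dir)))
    spec = max(0.0, _dot(normal, half)) ** shininess if n_dot_l > 0.0 else 0.0
    return tuple(min(1.0, c * (ambient + n_dot_l) + spec) for c in base_color)
```

In a real shader the same arithmetic would run once per layer of each pixel, with the half-precision coordinates and depth widened to 32-bit floats before use.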
[0269] S54, when the target viewpoint is located between key viewpoints, viewpoint interpolation is performed based on the rendering results of adjacent key viewpoints to generate a rendered image of the target viewpoint.
[0270] When the user's viewing angle is between two pre-stored key viewpoints, a continuous visual transition is generated through interpolation.
[0271] First, the parameters of the target viewpoint are input, along with the indices of two key viewpoints adjacent to the target viewpoint, determined in step S12, denoted as viewpoint A and viewpoint B. Next, the system renders two images, Image_A and Image_B, using the LDI data of viewpoint A and viewpoint B respectively (through steps S51-S53). Then, the interpolation parameter t (0≤t≤1) of the target viewpoint relative to viewpoints A and B is calculated. Finally, the surface point attributes of each corresponding pixel in Image_A and Image_B are interpolated to generate the final image Image_Out. The interpolation strategy is as follows:
[0272] 1. Position and color: Linear interpolation (LERP) is used, with the formula: P_out=(1-t)*P_A+t*P_B.
[0273] 2. Normal vector: Spherical linear interpolation (SLERP) is used to preserve the interpolation result as a unit vector. The formula is:
[0274] n_out=(sin((1-t)*Ω) / sin(Ω))*n_A+(sin(t*Ω) / sin(Ω))*n_B,
[0275] Where Ω = arccos(clamp(n_A·n_B,-1,1)).
[0276] Finally, a rendered image of the target viewpoint with a smooth transition is output.
[0277] In this embodiment: the user rotates the Boeing 777 model from viewpoint index 5 to viewpoint index 10. The system calculates the interpolation parameter t in real time and performs the above interpolation on the corresponding pixel layers rendered from these two viewpoints to synthesize intermediate frames, thereby achieving a smooth rotation animation without any visual jumps.
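The two interpolation formulas of step S54 can be written out as the following sketch. The `lerp` and `slerp` functions follow the formulas given above. The fallback to a normalized linear interpolation when sin(Ω) is close to zero is an assumption added here to avoid division by zero; the source does not describe how that case is handled.

```python
import math


def lerp(a, b, t):
    """Linear interpolation: (1 - t) * a + t * b, component-wise."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def slerp(n_a, n_b, t, eps=1e-6):
    """Spherical linear interpolation between two unit normal vectors."""
    d = max(-1.0, min(1.0, sum(x * y for x, y in zip(n_a, n_b))))
    omega = math.acos(d)
    s = math.sin(omega)
    if s < eps:
        # Assumed fallback for (anti)parallel normals, where sin(omega) ~ 0.
        v = lerp(n_a, n_b, t)
        length = math.sqrt(sum(c * c for c in v))
        return tuple(c / length for c in v) if length > 0.0 else tuple(n_a)
    w_a = math.sin((1.0 - t) * omega) / s
    w_b = math.sin(t * omega) / s
    return tuple(w_a * x + w_b * y for x, y in zip(n_a, n_b))
```

For example, interpolating halfway between the unit vectors (1, 0, 0) and (0, 1, 0) yields a unit vector at 45 degrees between them, whereas plain linear interpolation would yield a vector of length about 0.707.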
[0278] S55, compare the original model with the rendered image to generate a fidelity evaluation report.
[0279] This invention proposes a quantifiable visual fidelity evaluation and control mechanism. Since the number of key viewpoints used can be set autonomously during the pre-rendering stage, the visual fidelity of models under different configurations can be systematically quantified and tested. This allows users to precisely balance visual quality and storage overhead based on actual application needs. For example, if higher visual fidelity is required, more pre-rendered viewpoints can be used, achieving better rendering quality at the cost of appropriate storage space; conversely, if extreme compression is required, the number of viewpoints can be reduced, further lowering storage overhead within an acceptable range of visual fidelity loss. This solution achieves objective quantification and controllable management of visual information loss introduced during the lightweighting process.
[0280] First, acquire two images under identical rendering conditions (including camera parameters, lighting, resolution, and output format):
[0281] Reference image: directly rendered from the original 3D CAD model (input to S1).
[0282] Test image: the rendered image corresponding to the target viewpoint; how it is generated depends on the relationship between the target viewpoint and the key viewpoints.
[0283] 1. If the target viewpoint coincides with a key viewpoint, the test image is decoded and rendered directly in step S53;
[0284] 2. If the target viewpoint is located between two key viewpoints, the test image is generated in step S54 by interpolating the rendering results of the adjacent key viewpoints.
[0285] Next, the following objective image quality assessment metrics are calculated:
[0286] 1. Structural Similarity Index (SSIM): This index assesses image similarity based on three aspects: brightness, contrast, and structure. The calculation formula is as follows:
[0287] SSIM(x, y) = [l(x, y)]^α * [c(x, y)]^β * [s(x, y)]^γ.
[0288] Where x represents the reference image, i.e., the image generated from the original, uncompressed 3D CAD model (the input of step S1) under a specific viewpoint and rendering conditions;
[0289] y represents the test image, i.e., the image obtained from the same viewpoint after the model is lightweighted by steps S1 to S4 of this invention and then rendered and restored by steps S51 to S54;
[0290] l is the luminance comparison function, c is the contrast comparison function, and s is the structure comparison function. α, β, and γ are exponents that adjust the weights of each component; in this embodiment, they are all set to 1, meaning the weights of the three components are equal. The closer the SSIM value is to 1, the better the structure is preserved.
[0291] 2. Peak Signal-to-Noise Ratio (PSNR): Measures the degree of distortion at the pixel level of an image. The calculation formula is:
[0292] PSNR = 10 * log10(MAX² / MSE).
[0293] Where MAX is the maximum pixel value (e.g., 255), and MSE is the mean square error between the two images.
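The two global metrics can be sketched as below for flat sequences of grayscale pixel values. The PSNR function follows the formula above. The SSIM function is a simplified single-window variant with α = β = γ = 1. The stabilizing constants (k1 = 0.01, k2 = 0.03, C3 = C2/2) are commonly used defaults and are assumptions here, as the source does not state them; practical SSIM implementations usually average the index over local windows rather than the whole image.

```python
import math


def psnr(x, y, max_val=255.0):
    """PSNR = 10 * log10(MAX^2 / MSE); infinite for identical images."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def ssim_global(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM = l * c * s with all exponents set to 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    c1 = (k1 * max_val) ** 2     # assumed constants, see lead-in
    c2 = (k2 * max_val) ** 2
    c3 = c2 / 2.0
    lum = (2.0 * mx * my + c1) / (mx * mx + my * my + c1)   # luminance term l
    con = (2.0 * sx * sy + c2) / (vx + vy + c2)             # contrast term c
    struct_term = (cov + c3) / (sx * sy + c3)               # structure term s
    return lum * con * struct_term
```

Both functions take the reference image x and the test image y in the same pixel order and at the same resolution.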
[0294] Calculate the sub-indices for normal vector quality and color quality separately; combine all indicators to calculate the overall fidelity score, which is used to weigh the trade-off between the number of LDI viewpoints used and the visual fidelity of the visualization model.
[0295] Finally, by integrating all indicators, a fidelity assessment report is generated, containing specific numerical values and a comprehensive evaluation. This fidelity assessment report is a document containing the following core elements:
[0296] 1. Objective quantitative indicators, including:
[0297] 1) Global image quality metrics:
[0298] Peak Signal-to-Noise Ratio (PSNR): A specific value (unit: dB) that reflects the degree of distortion at the pixel level;
[0299] Structural Similarity Index (SSIM): A specific value (range 0-1) that evaluates the overall preservation of brightness, contrast, and structural information in an image.
[0300] 2) Quality indicators for individual attributes:
[0301] Normal vector quality: such as mean angular error or root mean square error (RMSE), to assess the degree of preservation of geometric details and lighting accuracy;
[0302] Color fidelity: such as color difference ΔE (calculated in the CIELAB color space), quantifies the accuracy of color reproduction;
[0303] Geometric position error: such as the average Euclidean distance error of the reprojected position.
[0304] 2. Analysis and Conclusion:
[0305] 1) Overall score: The above indicators are weighted according to the preset weights to obtain an overall fidelity score (e.g., 0-100 points or 0-1 points), which is used to intuitively compare the compression quality under different configurations (e.g., different number of viewpoints N).
[0306] 2) Quality rating: The lightweighting results are qualitatively rated based on industry standards or application thresholds (e.g., SSIM>0.95 is "visually non-destructive").
[0307] 3) Visual difference analysis: Indicates which regions or conditions (such as high curvature surfaces, thin-walled structures, texture edges) may show relatively obvious quality degradation, and may be visualized with a difference heatmap.
[0308] 3. Configuration and context information:
[0309] 1) Test configuration: Specify the lightweight parameters used in this evaluation, including but not limited to: LDI resolution, number of key viewpoints N, maximum depth layers, and compression algorithm version.
[0310] 2) Data size comparison: Clearly list the size of the original CAD model, the size of the original LDI data, the size of the final lightweight layered depth image file, and the calculated overall compression ratio.
[0311] 3) Test environment: Describe the hardware and software environment used for rendering and evaluation.
[0312] In this embodiment: the fidelity evaluation report generated for the 7-view lightweight configuration of the Boeing 777 model includes:
[0313] 1. Quantitative indicators: PSNR=46.88dB, SSIM=0.9999, mean angle error of normal vector=0.8 degrees.
[0314] 2. Analysis and Conclusion: Overall score 92 / 100. Based on SSIM > 0.98, it is classified as "visually non-destructive". The report indicates that at extremely sharp edges such as winglets, the local SSIM value is slightly lower (~0.96), but still better than the acceptable threshold.
[0315] 3. Configuration Information: The test configuration was 7-view, 512x512 resolution, and DEFLATE compression. Data size: Original model 587MB → Lightweight LDI 9.9MB, overall compression ratio approximately 59:1.
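The size figures in this report can be cross-checked with a few lines of arithmetic. The sketch below assumes that each of the seven per-view files is about the same size as the 1.41 MB file mentioned in step S51, which the source does not state explicitly; `compression_ratio` is a helper name introduced here.

```python
def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """Ratio of original size to compressed size."""
    if compressed_mb <= 0:
        raise ValueError("compressed size must be positive")
    return original_mb / compressed_mb


views = 7
per_view_mb = 1.41                 # per-view file size from step S51 (assumed typical)
total_mb = views * per_view_mb     # estimated size of the whole lightweight model
ratio = compression_ratio(587.0, 9.9)

print(f"{total_mb:.2f} MB total, {ratio:.1f}:1")  # prints "9.87 MB total, 59.3:1"
```

Under that assumption, seven views account for roughly the 9.9 MB reported, and 587 MB divided by 9.9 MB gives the stated ratio of approximately 59:1.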
[0316] Example 2:
[0317] This embodiment provides a lightweight compression system for 3D models based on layered depth images, used to implement the method described in Embodiment 1. The system architecture follows the method and data processing flow and includes the following five functional modules that work in sequence:
[0318] 1. A pre-rendering module for converting a 3D CAD model into layered depth image data from multiple key perspectives, wherein the layered depth image data includes a layered pixel index table and a layered pixel storage area.
[0319] The pre-rendering module is responsible for converting the original 3D CAD model into raw layered depth image (LDI) data from multiple key perspectives. Its input is the original 3D CAD model file, and its output is a set of structured raw LDI data packages, including:
[0320] 1) Spatial reference calculation unit: Loads and parses the 3D CAD model file, calculates the axis-aligned bounding box of the model, and outputs the model space reference (min_point, max_point).
[0321] 2) Key Viewpoint Determination Unit: Receives the model space reference, combines the number of target viewpoints N configured by the system, pre-generates observation points on the observation sphere, and applies a uniform distribution selection algorithm to determine N key viewpoints, outputting the key viewpoint parameters (position, orientation, and up direction) for each viewpoint.
[0322] 3) Depth peeling rendering unit: Receives key viewpoint parameters, configures the rendering pipeline, executes GPU depth peeling, captures multi-layer surface point fragments at each pixel location, and outputs a set of rendered fragments containing original attributes such as world coordinates, normal vectors, and colors.
[0323] 4) LDI data structure construction unit: Receives the set of rendered fragments, organizes them into a layered pixel index table and a layered pixel storage area, and outputs a complete raw LDI data packet.
[0324] 2. A format conversion module for converting pixel data in the layered pixel storage area into a compact format.
[0325] The format conversion module is responsible for compressing the raw pixel data into a compact format. Its input is the layered pixel storage data from the raw LDI data packet output by the pre-rendering module, and its output is compact format pixel data, including:
[0326] 1) Coordinate normalization and compression unit: Receives the original world coordinates and model space reference, executes the normalization formula coord_normalized=(coord_world-min_point) / (max_point-min_point), and converts the result into a half-precision floating-point number (f16).
[0327] 2) Normal vector encoding unit: Receives the original unit normal vector, performs the octahedral mapping encoding algorithm, and compresses it into a 32-bit integer (u32).
[0328] 3) Color encoding unit: Receives the original floating-point RGB color, executes the RGB565 encoding algorithm, and compresses it into a 16-bit integer (u16).
[0329] 4) Auxiliary field conversion unit: Receives the original depth, texture coordinates, and part number, and converts them into a half-precision floating-point number (f16), a two-component vector of 16-bit integers (vec2<u16>), and a 16-bit integer (u16), respectively.
[0330] 5) Pixel Structure Assembly Unit: This unit receives all the compressed fields described above and assembles them into a compact 24-byte pixel structure according to memory alignment rules. The collection of all structures constitutes the compact format pixel data output by this module.
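The work of the five units above can be sketched as a single packing routine. The field widths follow the text (three f16 coordinates, a u32 normal, a u16 color, an f16 depth, two u16 texture coordinates, and a u16 part number, 20 bytes in total). The field order, the four trailing padding bytes that bring the structure to 24 bytes, the bit layout of the octahedral code, and the scaling of texture coordinates by 65535 are assumptions made for illustration.

```python
import math
import struct

# Assumed field order; 20 payload bytes plus 4 padding bytes = 24 bytes.
PIXEL = struct.Struct("<3eIHe2HH4x")


def normalize_coord(world, min_point, max_point):
    """(coord_world - min_point) / (max_point - min_point), per axis."""
    out = []
    for w, lo, hi in zip(world, min_point, max_point):
        extent = hi - lo
        out.append((w - lo) / extent if extent > 0.0 else 0.0)  # flat axis: assumed 0
    return tuple(out)


def encode_octahedral(n):
    """Map a unit normal to two unorm16 values packed into a u32 (v high, u low)."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    u, v = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere onto the outer triangles of the square.
        u, v = ((1.0 - abs(v)) * math.copysign(1.0, u),
                (1.0 - abs(u)) * math.copysign(1.0, v))
    qu = round((u * 0.5 + 0.5) * 65535)
    qv = round((v * 0.5 + 0.5) * 65535)
    return (qv << 16) | qu


def _clamp01(c):
    return min(1.0, max(0.0, c))


def encode_rgb565(r, g, b):
    """Quantize floating-point RGB in [0, 1] to 5/6/5 bits."""
    return ((round(_clamp01(r) * 31) << 11)
            | (round(_clamp01(g) * 63) << 5)
            | round(_clamp01(b) * 31))


def pack_pixel(world, normal, rgb, depth, uv, part_id, min_point, max_point):
    """Assemble one 24-byte compact pixel structure."""
    px, py, pz = normalize_coord(world, min_point, max_point)
    tu, tv = (round(_clamp01(t) * 65535) for t in uv)
    return PIXEL.pack(px, py, pz, encode_octahedral(normal),
                      encode_rgb565(*rgb), depth, tu, tv, part_id)
```

Normalizing coordinates to the bounding box before the f16 conversion keeps the values in [0, 1], where half precision has its densest spacing.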
[0331] 3. A sparse storage module, used to optimize the index data in the layered pixel index table into a sparse index list.
[0332] The sparse storage module is responsible for optimizing the raw index data. Its input is the layered pixel index table in the raw LDI data packet output by the pre-rendering module, and its output is a sparse index list. It includes the following units:
[0333] 1) Sparsity analysis unit: Traverse the original layered pixel index table, identify and count all non-blank entries with layer_count>0, and generate a list of non-blank pixel positions.
[0334] 2) Sparse index building unit: Create a SparseIndex structure (12 bytes) containing three fields: image_position, layer_count, and pixel_index for each non-blank pixel position.
[0335] 3) List organization unit: Sort all SparseIndex structures by image_position and store them contiguously to form a sparse index list to replace the original dense index table.
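The three units above, together with the binary search used at render time, can be sketched as follows. The 12-byte SparseIndex entry is modeled as three unsigned 32-bit integers. Treating image_position as the row-major offset y * width + x and pixel_index as a running offset into the pixel storage area are assumptions made for this illustration.

```python
import bisect
import struct

# image_position, layer_count, pixel_index as three u32 values: 12 bytes.
ENTRY = struct.Struct("<3I")


def build_sparse_index(layer_counts):
    """Keep only pixels with layer_count > 0; the dense table is given
    as a flat row-major list of layer counts."""
    entries = []
    pixel_index = 0
    for position, count in enumerate(layer_counts):
        if count > 0:
            entries.append((position, count, pixel_index))
        pixel_index += count
    return entries  # already sorted by image_position


def serialize(entries):
    """Store the entries contiguously as a byte stream."""
    return b"".join(ENTRY.pack(*e) for e in entries)


def lookup(entries, x, y, width):
    """Binary search by image_position; returns (layer_count, pixel_index)
    or None for a blank pixel."""
    position = y * width + x
    i = bisect.bisect_left(entries, (position,))
    if i < len(entries) and entries[i][0] == position:
        return entries[i][1], entries[i][2]
    return None
```

Because the list is sorted by image_position, each lookup costs O(log M) for M non-blank pixels, while blank pixels cost no storage at all.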
[0336] 4. A compression encoding module, used to compress and encode the compact format pixel data and the sparse index list, and package them to generate a lightweight layered depth image file.
[0337] The compression encoding module is responsible for the final compression and packaging of the optimized data. Its inputs are the compact pixel data output from the format conversion module and the sparse index list output from the sparse storage module. The output is a lightweight layered depth image file (.ldi format), including:
[0338] 1) Data compression unit: Perform DEFLATE compression encoding on the compact format pixel data byte stream and the sparse index list byte stream respectively to generate compressed pixel data blocks and compressed index data blocks.
[0339] 2) File Packaging Unit: Receives compressed data blocks and metadata (such as resolution, view parameters, bounding box, data size, etc.) obtained from the pre-rendering module and this process, and generates a complete .ldi file containing a file header, metadata area and compressed data blocks according to a predefined binary format.
[0340] In one alternative implementation, a lightweight compression system for 3D models based on layered depth images further includes a rendering and verification module.
[0341] The rendering and verification module is responsible for the visualization and quality assessment of lightweight data. Its input is a lightweight layered depth image file output by the compression encoding module, and its output is a visualized rendered image and a fidelity assessment report.
[0342] 1) File loading and decompression unit, used to load and decompress the lightweight layered depth image file, and recover compact format pixel data and sparse index list.
[0343] Load the .ldi file, parse the file header and metadata, and decompress the compressed data blocks using DEFLATE to recover compact pixel data and sparse index data.
[0344] 2) GPU resource management unit, which is used to upload the decompressed data to the video memory and decode and render it through the shader program.
[0345] A GPU storage buffer is created, and the decompressed pixel data and index data are uploaded to the GPU memory. Then, the GPU shader program is executed, which queries the pixel data according to the sparse index list, performs real-time decoding of the compact pixel format (inverse octahedral mapping, RGB565 decoding, etc.), applies the lighting model and performs depth blending, and finally renders the image.
[0346] 3) Viewpoint interpolation unit, used to interpolate images between key viewpoints to generate smooth viewpoint transitions.
[0347] When the target viewpoint is located between two key viewpoints, pixel-level interpolation (linear interpolation and spherical linear interpolation) is performed on the rendering results of the adjacent key viewpoints to generate a smooth image of the target viewpoint.
[0348] 4) Fidelity evaluation unit, used to calculate fidelity index by comparing the rendered images of the original model and the lightweight model.
[0349] Under the same conditions, the rendered images of the original model and the lightweight model are compared, and objective quality indicators such as PSNR and SSIM are calculated to generate a comprehensive fidelity evaluation report.
[0350] Example 3:
[0351] This embodiment proposes an electronic device, including:
[0352] Memory, used to store computer programs;
[0353] The processor is used to execute the program stored in the memory to implement the steps of the above embodiment of a lightweight compression method for 3D models based on layered depth images.
[0354] For details on the specific implementation of each step and related explanations, please refer to the aforementioned embodiment of a lightweight compression method for 3D models based on layered depth images, which will not be repeated here.
[0355] The memory of the electronic device mentioned in this embodiment may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk storage device.
[0356] The processors mentioned above can be general-purpose processors, including central processing units (CPUs), network processors (NPs), etc.; they can also be digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0357] Example 4:
[0358] This invention also proposes a computer-readable storage medium storing a computer program. When executed by a processor, this computer program implements the steps of the above-described embodiment of a lightweight compression method for 3D models based on layered depth images. For details on the specific implementation and explanations of each step, please refer to the aforementioned embodiment of a lightweight compression method for 3D models based on layered depth images; further elaboration is not provided here.
[0359] The above description is merely an embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principle of the present invention should be included within the scope of the claims of the present invention.
Claims
1. A lightweight compression method for 3D models based on layered depth images, characterized in that the method includes: The 3D CAD model is pre-rendered from a selected key viewpoint into layered depth image data, which includes a layered pixel index table and a layered pixel storage area. The pre-rendering includes: the first rendering pass uses a standard depth test to capture and output the foremost layer of surface points, and writes their depth values to a dedicated texture. Subsequent rendering passes dynamically adjust the test conditions based on the depth texture generated in the previous pass, capturing only the next layer of surface points with depth values greater than the existing records, thus peeling away layers one by one. Each pass outputs the complete attributes of the captured fragments through multiple render targets, including world coordinates, normal vectors, color, depth, texture coordinates, and part number. This process is repeated until no new fragments are generated. The pixel data in the layered pixel storage area is converted into compact format pixel data; the conversion includes at least: compressing the normal vector of the pixel data into a 32-bit integer through octahedral mapping encoding; converting the part number into a 16-bit integer; wherein the normal vector is used for lighting calculation, the lighting calculation is based on the decoded normal vector, color, preset light source position and intensity, and camera view parameters; The index data in the layered pixel index table is optimized into a sparse index list; The compact format pixel data and the sparse index list are compressed, encoded, and encapsulated to generate a lightweight layered depth image file.
2. The method according to claim 1, characterized in that, The process of pre-rendering a 3D CAD model from a selected key viewpoint into layered depth image data includes: Calculate the axis-aligned bounding box of the 3D CAD model to obtain the model space reference; Based on the model space reference and the preset set of viewpoints, N key viewpoints are determined according to the uniform distribution selection algorithm; Perform depth peeling rendering on each of the key viewpoints to obtain multi-layer surface point fragments under that viewpoint; Based on all the multi-layer surface point fragments, construct layered depth image data including the layered pixel index table and the layered pixel storage area.
3. The method according to claim 2, characterized in that, The step of converting pixel data in the layered pixel storage area into a compact format includes: The world coordinates of the pixel data are normalized to the axis-aligned bounding box range and then converted into half-precision floating-point numbers; The color values of the pixel data are compressed into 16-bit integers using RGB565 encoding; The depth values of the pixel data are converted into half-precision floating-point numbers, and the texture coordinates are quantized into 16-bit integers; All fields that have undergone the above transformations are assembled into compact pixel data with a single pixel size of no more than 24 bytes.
4. The method according to claim 1, characterized in that, The step of optimizing the index data in the layered pixel index table into a sparse index list includes: Traverse the layered pixel index table and identify non-blank entries to obtain a list of non-blank pixel positions; Based on the non-blank pixels in the non-blank pixel position list, a sparse index structure is created for each non-blank pixel to obtain a set of sparse index entries; Organize the set of sparse index entries into a sparse index list.
5. The method according to claim 1, characterized in that, The step of compressing, encoding, and encapsulating the compact format pixel data and the sparse index list to generate a lightweight layered depth image file includes: The compact format pixel data is compressed and encoded to obtain compressed pixel data blocks; The sparse index list is compressed and encoded to obtain compressed index data blocks; The compressed pixel data block, the compressed index data block, and the metadata describing the layered depth image data are packaged in a predetermined format to generate the final lightweight layered depth image file.
6. The method according to any one of claims 1-5, characterized in that, After compressing and encapsulating the compact format pixel data and the sparse index list to generate a lightweight layered depth image file, the method further includes decompressing and visualizing the lightweight layered depth image file, generating a visualization model, and performing fidelity verification, including: Load and decompress the lightweight layered depth image file to obtain decompressed compact format pixel data and decompressed sparse index data; The decompressed compact format pixel data and the decompressed sparse index data are uploaded to the GPU buffer; When the target viewpoint coincides with the key viewpoint, the GPU buffer data is decoded and rendered in the shader to generate a rendered image. When the target viewpoint is located between key viewpoints, viewpoint interpolation is performed based on the rendering results of adjacent key viewpoints to generate a rendered image of the target viewpoint. The original model is compared with the rendered image to generate a fidelity evaluation report.
7. A lightweight compression system for 3D models based on layered depth images, used to implement the method according to any one of claims 1 to 6, characterized in that the system includes: A pre-rendering module, used to convert a 3D CAD model into layered depth image data from multiple key viewpoints, wherein the layered depth image data includes a layered pixel index table and a layered pixel storage area; A format conversion module, used to convert pixel data in the layered pixel storage area into a compact format; the conversion includes: compressing the normal vector of the pixel data into a 32-bit integer through octahedral mapping encoding; and converting the part number into a 16-bit integer; A sparse storage module, used to optimize the index data in the layered pixel index table into a sparse index list; A compression encoding module, used to compress and encode the compact format pixel data and the sparse index list, and package them to generate a lightweight layered depth image file.
8. The system according to claim 7, characterized in that, It also includes a rendering and verification module, which includes: The file loading and decompression unit is used to load and decompress the lightweight layered depth image file to recover compact format pixel data and sparse index list. The GPU resource management unit is used to upload the decompressed data to the video memory and decode and render it through the shader program; A viewpoint interpolation unit is used to interpolate images between key viewpoints to generate smooth viewpoint transitions. The fidelity evaluation unit is used to calculate fidelity metrics by comparing the rendered images of the original model and the lightweight model.
9. An electronic device, characterized in that the device includes: A memory, used to store computer programs; A processor for executing a computer program stored in the memory to implement the method as described in any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the method as described in any one of claims 1 to 6.