Method for sparse-view 3D building reconstruction based on 2D Gaussian splatting
By combining Gaussian densification guided by proximity and plane normals, optimization based on a relative depth prior, and TSDF fusion, the method addresses the surface discontinuities and overfitting that affect 3D building reconstruction under sparse views, and achieves efficient and complete 3D building reconstruction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ANHUI NORMAL UNIV
- Filing Date
- 2026-04-27
- Publication Date
- 2026-06-19
Figure CN122244328A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of digital reconstruction of buildings, and in particular to a method for three-dimensional reconstruction of buildings from sparse views based on 2D Gaussian splatting.
Background Technology
[0002] With the accelerating pace of urban digitalization, the construction of accurate, lightweight, and rapidly updatable 3D building models has become increasingly important. 3D building models have become crucial geographic information foundation data for major national strategies and industrial applications such as land spatial planning, digital protection of cultural heritage, and the construction of digital twin cities. However, existing active reconstruction methods for 3D buildings, such as laser scanning and structured light methods, often rely on specialized equipment, resulting in high data acquisition costs and long operation cycles, making them unsuitable for flexible and low-cost application scenarios. While image-based passive reconstruction methods have lower equipment requirements, they generally require a large number of multi-view, highly overlapping images as input to ensure the integrity and accuracy of the model, which is often difficult to meet in real-world scenarios where data acquisition is limited. Against this backdrop, achieving structurally complete and geometrically accurate 3D building reconstruction under conditions of limited image availability has gradually become a cutting-edge issue of common interest to researchers.
[0003] In recent years, although implicit scene representation methods, represented by Neural Radiance Field (NeRF), have made significant progress in novel view synthesis, and some studies have attempted to apply them to 3D building reconstruction, their application in practical 3D reconstruction tasks is limited by their long training time, large model size, and lack of explicit geometric structure. Subsequently, 3D Gaussian Splatting (3DGS) was proposed as a novel explicit scene representation. It uses a series of optimizable 3D Gaussian ellipsoids to represent the scene, achieving high-quality, real-time, and realistic view synthesis with impressive visual quality. However, the 3D Gaussian ellipsoid-based representation struggles to accurately align with the thin, structured characteristics of real-world surfaces, and its rasterization suffers from multi-view inconsistencies, making it a significant challenge for 3D surface reconstruction tasks. Subsequently, 2D Gaussian Splatting (2DGS) proposed using 2D Gaussian disks as the basic representation unit of a scene. Due to its flattened Gaussian representation and integral normal regularization, it exhibits better adaptability in 3D reconstruction tasks.
[0004] However, existing Gaussian splatting-based methods for 3D building reconstruction mostly rely on dense multi-view inputs for optimization. Applying Gaussian splatting to 3D building reconstruction from sparse views still presents significant challenges:
[0005] (1) When the number of input images is very small, feature-matching-based methods struggle to extract and stably match enough initial 3D points on buildings, whose regular planar regions contain large areas of repetitive texture. This results in an extremely sparse and incomplete initial point cloud. Furthermore, buildings generally have complex geometric shapes and are prone to self-occlusion, so Gaussian splatting struggles to form stable and significant projection gradient signals in these regions. Moreover, without effective gradient constraints, the gradient-based splitting and densification mechanisms of Gaussian splatting can hardly generate and grow new Gaussians, leading to persistent geometric holes and surface discontinuities in the reconstruction results, which severely affects the integrity of the building surface structure.
[0006] (2) Another challenge in 3D reconstruction of buildings based on Gaussian splatting is insufficient geometric constraints. Under sparse-view conditions, the limited parallax information can hardly provide reliable geometric constraints for buildings, which have a large spatial scale and rich depth levels, making it impossible to accurately determine the depth of Gaussian primitives in 3D space. Furthermore, when optimizing with a photometric loss from a limited number of viewpoints, the model is prone to overfitting to the limited observations and tends to reduce the overall photometric error by enlarging the Gaussian scales. This results in overly smooth reconstruction results, a lack of detail and texture variation on the building surface, and a significant reduction in the overall geometric accuracy of the model.
Summary of the Invention
[0007] The purpose of this invention is to overcome the shortcomings of the prior art and provide a 3D reconstruction method for buildings from sparse views based on 2D Gaussian splatting. This method fully leverages the efficient explicit representation of 2D Gaussian splatting, removes the dependence of existing Gaussian representations on dense multi-view data, and can perform robust and complete reconstruction in challenging building scenarios.
[0008] To achieve the above objectives, the technical solution adopted by the present invention is as follows:
[0009] A method for 3D reconstruction of buildings from sparse views based on 2D Gaussian splatting includes the following steps:
[0010] S1. Obtain a dataset of sparse viewpoint building images;
[0011] S2. Initialize the two-dimensional Gaussian disk;
[0012] S3. Optimize the two-dimensional Gaussian disk to make it suitable for buildings;
[0013] S4. Generate building mesh based on two-dimensional Gaussian distribution. For the optimized Gaussian disk, use TSDF fusion method to extract explicit and continuous surface mesh to obtain the three-dimensional model of the building.
[0014] In step S1, a multi-view building image dataset for 3D reconstruction is obtained. For the sparse-view 3D reconstruction task, a subset of images containing a small number of viewpoints is selected and constructed from the multi-view image data.
[0015] Step S2 includes: processing the acquired building image dataset to obtain sparse point cloud of the building scene and camera pose; then, initializing the acquired sparse point cloud based on 2D Gaussian Splatting to obtain an initialized 2D Gaussian distribution.
[0016] Step S2 also includes placing a two-dimensional Gaussian disk at the location of each data point. Each 2D Gaussian disk is jointly defined by a center point $p_k$, two principal tangent vectors $t_u$ and $t_v$, and a scaling factor $S = (s_u, s_v)$ that controls the variance of the two-dimensional Gaussian. The initial normal vector is obtained as the cross product of the two mutually orthogonal tangent vectors, $t_w = t_u \times t_v$. The vectors $t_u$, $t_v$, $t_w$ are stacked by columns to form a rotation matrix $R = [t_u, t_v, t_w]$. At the same time, the scaling factors are organized into a diagonal matrix $S$ whose third diagonal element is set to zero. On this basis, a two-dimensional Gaussian is defined in the local tangent plane in the world coordinate system, with the following parameterization:
[0017] $$P(u, v) = p_k + s_u t_u u + s_v t_v v = H (u, v, 1, 1)^T$$
[0018] $$H = \begin{bmatrix} s_u t_u & s_v t_v & 0 & p_k \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} R S & p_k \\ 0 & 1 \end{bmatrix}$$
[0019] Here, the matrix $H$ represents the homogeneous geometric transformation of the two-dimensional Gaussian in space. For any point $\mathbf{u} = (u, v)$ in $uv$ space, its corresponding two-dimensional Gaussian response is calculated using the standard Gaussian function:
[0020] $$G(\mathbf{u}) = \exp\left(-\frac{u^2 + v^2}{2}\right)$$
[0021] In this representation, the Gaussian center $p_k$, the scaling factors $(s_u, s_v)$, and the tangent vector directions $(t_u, t_v)$ are all treated as learnable variables for optimization. Each two-dimensional Gaussian element is also associated with an opacity parameter $\alpha$ and a set of spherical harmonic coefficients $c$ for modeling its appearance color as a function of viewing angle.
[0022] The optimization methods in step S3 include: a Gaussian densification step guided by proximity and plane normal, and a depth optimization step based on relative depth prior.
[0023] Among them, the Gaussian densification step guided by proximity and plane normal jointly considers the spatial proximity relationship and normal consistency constraint between Gaussians to achieve adaptive completion and refinement of the three-dimensional surface structure of the building;
[0024] The depth optimization step based on relative depth prior extracts relative depth ranking information from the monocular depth map and introduces it as a geometric prior into the rendering depth optimization process, thereby improving the consistency of rendering depth while avoiding the influence of scale uncertainty.
[0025] The Gaussian densification steps guided by proximity and plane normal include:
[0026] (1) Normalized proximity score
[0027] During the optimization of the Gaussians, a directed proximity graph based on Euclidean distance is constructed, and the spatial proximity relationship between each Gaussian and its K nearest neighbors is defined accordingly. The Gaussian at the head of each edge is called the "source" Gaussian, and the Gaussian at the tail is called the "target" Gaussian, which is one of the K nearest neighbors of the source. Let the center of the $i$-th Gaussian be $p_i$. For a source Gaussian $G_i$ and a target Gaussian $G_j$, the Euclidean distance between their centers is defined as:
[0028] $$d_{ij} = \lVert p_i - p_j \rVert_2$$
[0029] For each source Gaussian $G_i$, the K nearest-neighbor distances are selected from all $d_{ij}$, and the target Gaussians are determined by the following rule:
[0030] $$\mathcal{D}_i^K = \{\, d_{ij} \mid j \in \mathcal{N}_K(i) \,\}$$ where $\mathcal{N}_K(i)$ is the set of indices of the K Gaussians with $j \neq i$ that have the smallest distances $d_{ij}$.
[0031] A two-dimensional Gaussian disk is used as the scene representation, and each Gaussian disk is described by two principal-axis scales $s_u^{(i)}$ and $s_v^{(i)}$ on the local tangent plane. The effective scale of the source Gaussian $G_i$ is defined as the length of its largest in-plane principal axis:
[0032] $$s_i = \max\left(s_u^{(i)}, s_v^{(i)}\right)$$
[0033] A normalized proximity score $\rho_i$ is defined to quantify the relative isolation of a Gaussian. This score is calculated as the ratio of the average distance from the source Gaussian $G_i$ to its K nearest neighbors to its effective scale:
[0034] $$\rho_i = \frac{1}{K\, s_i} \sum_{j \in \mathcal{N}_K(i)} d_{ij}$$
[0035] The larger the value of $\rho_i$, the sparser the Gaussian disk is relative to its own size in the local space. During the optimization process, this score is used to identify candidate regions that need to be supplemented. The Gaussian densification and pruning processes update the distribution and proximity relationships of the Gaussian disks;
[0036] (2) Normal consistency constraint
[0037] For each Gaussian $G_i$, the geometric modeling of 2D Gaussian disks is adopted from 2DGS: each Gaussian disk is described by two orthogonal tangent vectors $t_u^{(i)}$ and $t_v^{(i)}$ in its local tangent plane, and its normal vector is defined as follows:
[0038] $$n_i = t_u^{(i)} \times t_v^{(i)}$$
[0039] The three vectors are arranged into a rotation matrix:
[0040] $$R_i = \left[\, t_u^{(i)},\ t_v^{(i)},\ n_i \,\right]$$
[0041] The normal vector is equivalently represented by the third column of the rotation matrix:
[0042] $$n_i = R_i[:, 3]$$
[0043] For a source Gaussian $G_i$ and a target Gaussian $G_j$ selected from its neighborhood $\mathcal{N}_K(i)$ in the proximity graph, the consistency of local geometric orientation is measured by the cosine similarity between their normal vectors:
[0044] $$\cos\theta_{ij} = \frac{n_i \cdot n_j}{\lVert n_i \rVert\, \lVert n_j \rVert}$$
[0045] If $\cos\theta_{ij} > \tau_n$, the source Gaussian and the target Gaussian are considered to be aligned in the normal direction; otherwise, they are marked as not aligned. The threshold $\tau_n$ is set based on the strong directional characteristics of the building's planar structures, so as to effectively select coplanar or approximately coplanar Gaussians;
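By way of non-limiting illustration, the normal consistency test described above may be sketched as follows. The function names and the threshold value `TAU_N = 0.9` are assumptions made for this example only; the description does not specify a numerical threshold.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors; used to derive a disk normal from its tangents."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def cosine_similarity(n_i, n_j):
    """Cosine of the angle between two normal vectors (need not be unit length)."""
    dot = sum(x * y for x, y in zip(n_i, n_j))
    norm_i = math.sqrt(sum(x * x for x in n_i))
    norm_j = math.sqrt(sum(x * x for x in n_j))
    return dot / (norm_i * norm_j)

def normals_aligned(n_i, n_j, tau_n):
    """True when source and target normals pass the consistency threshold."""
    return cosine_similarity(n_i, n_j) > tau_n

TAU_N = 0.9  # hypothetical threshold, not taken from the description

# Two disks on the same wall (tangents span the x-z plane) and one on a flat roof.
wall_a = cross((1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
wall_b = cross((2.0, 0.0, 0.0), (0.0, 0.0, 3.0))
roof = cross((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))

same_plane = normals_aligned(wall_a, wall_b, TAU_N)
across_edge = normals_aligned(wall_a, roof, TAU_N)
```

In this sketch the two wall disks pass the test, while the wall and roof pair is rejected, so no new disk would be interpolated across the wall-roof edge.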
[0046] (3) Gaussian reconstruction and survival protection strategy
[0047] After the sparsity determination and normal consistency filtering are completed, for each Gaussian pair $G_i$ and $G_j$ that meets the conditions, a Gaussian generation strategy based on midpoint interpolation is used. This process is performed dynamically in each optimization iteration: a new Gaussian disk is created at the midpoint of the line connecting the centers of $G_i$ and $G_j$. The center of the new Gaussian disk is $p_{new}$; its opacity and in-plane scale parameters are inherited from the adjacent Gaussian, and its view-dependent appearance parameter vector is initialized to zero, that is:
[0048] $$p_{new} = \frac{p_i + p_j}{2}, \quad \alpha_{new} = \alpha_{adj}, \quad S_{new} = S_{adj}, \quad c_{new} = 0$$ where the subscript adj denotes the adjacent Gaussian from which the parameters are inherited.
[0049] The rotation parameters of the newly generated Gaussian disk are initialized to the identity rotation matrix, $R_{new} = I$, thus allowing its local tangent direction to be adaptively adjusted based on the observation data during subsequent optimization;
[0050] A Gaussian survival protection strategy is introduced to protect Gaussian disks newly generated by the proximity-guided strategy in the early stages of optimization. For each Gaussian disk $G_{new}$ generated by interpolation, its survival protection condition is defined from its generation iteration $t_{birth}$ and the current iteration number $t$:
[0051] $$t - t_{birth} < T_{protect}$$
[0052] where $T_{protect}$ is a preset survival protection window length. Gaussian disks that meet the condition are temporarily excluded from pruning judgment, even if their opacity or scale does not yet meet the threshold required to survive pruning.
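By way of non-limiting illustration, the midpoint interpolation and the survival protection window may be sketched as follows. The dictionary layout, the function names, the choice to inherit opacity and scale from the source Gaussian, and the opacity-only pruning rule are assumptions made for this example; the description states only that these parameters are inherited from the adjacent Gaussian.

```python
def midpoint_gaussian(source, target, iteration, sh_size):
    """Create a new disk at the midpoint between a source and a target Gaussian."""
    center = tuple((a + b) / 2.0 for a, b in zip(source["center"], target["center"]))
    return {
        "center": center,
        "opacity": source["opacity"],   # assumption: inherited from the source
        "scale": source["scale"],       # assumption: inherited from the source
        "sh": [0.0] * sh_size,          # view-dependent appearance starts at zero
        "rotation": ((1.0, 0.0, 0.0),   # identity rotation; refined by optimization
                     (0.0, 1.0, 0.0),
                     (0.0, 0.0, 1.0)),
        "birth_iter": iteration,
    }

def is_protected(gaussian, iteration, window):
    """Survival protection: true while the disk is younger than the window."""
    return iteration - gaussian["birth_iter"] < window

def may_prune(gaussian, iteration, window, min_opacity):
    """Hypothetical pruning rule: low-opacity disks are pruned unless protected."""
    if "birth_iter" in gaussian and is_protected(gaussian, iteration, window):
        return False
    return gaussian["opacity"] < min_opacity

source = {"center": (0.0, 0.0, 0.0), "opacity": 0.01, "scale": (0.2, 0.1)}
target = {"center": (2.0, 0.0, 0.0), "opacity": 0.80, "scale": (0.3, 0.3)}
new_disk = midpoint_gaussian(source, target, iteration=100, sh_size=3)

pruned_early = may_prune(new_disk, iteration=150, window=500, min_opacity=0.05)
pruned_late = may_prune(new_disk, iteration=700, window=500, min_opacity=0.05)
```

Here the interpolated disk has a low inherited opacity, yet it is exempt from pruning at iteration 150 and becomes eligible again only after the protection window has elapsed.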
[0053] The depth optimization steps based on relative depth priors include:
[0054] (1) Depth prior generation
[0055] Using the sparse-view images as input and employing DepthAnythingV2 as the monocular depth estimation model, a monocular depth map is obtained, denoted as $D_{mono}$. Then, relative depth prior information is generated from the monocular depth map;
[0056] (2) Depth optimization based on relative depth prior
[0057] After the depth map $D_{mono}$ predicted by the model is obtained, relative depth ranking information is extracted from it and introduced as a geometric prior into the optimization of the rendered depth $D_{render}$. The image is divided into several local regions (patches), and the pixel set of the $i$-th patch is denoted as $\Omega_i$. Within each patch, pixel pairs are sampled from the pixel set to form a pixel pair set:
[0058] $$\mathcal{S}_i = \{\, (a, b) \mid a, b \in \Omega_i,\ a \neq b \,\}$$
[0059] where $a$ and $b$ represent two different pixel positions within the same local region. Constructing pixel pairs within local regions effectively suppresses the unstable ranking that monocular depth may introduce at the global scale or in distant regions, while strengthening the constraint on local geometry.
[0060] According to the prediction of the monocular depth map $D_{mono}$, the following relative depth ranking indicator function is defined:
[0061] $$r_{ab} = \begin{cases} +1, & D_{mono}(a) > D_{mono}(b) \\ -1, & \text{otherwise} \end{cases}$$
[0062] In the rendered depth map $D_{render}$, the rendered depth difference for the pixel pair $(a, b)$ is defined as:
[0063] $$\Delta_{ab} = D_{render}(a) - D_{render}(b)$$
[0064] Based on this, the loss function is defined as:
[0065] $$\mathcal{L}_{rank} = \frac{1}{\sum_i \lvert \mathcal{S}_i \rvert} \sum_i \sum_{(a, b) \in \mathcal{S}_i} \max\left(0,\ m - r_{ab}\, \Delta_{ab}\right)$$
[0066] where $m$ is a preset margin hyperparameter. When the ranking direction of the rendered depth is consistent with the monocular depth prior and the depth difference satisfies the margin constraint, no penalty is imposed on the corresponding term; otherwise, a constraint is imposed on the optimization of the Gaussian parameters.
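By way of non-limiting illustration, a margin-based ranking loss of the kind described above may be sketched as follows. The hinge form, the averaging over pairs, the exhaustive pairing within a patch, and the assumption that both depth maps use the same convention (larger value means farther) are choices made for this example, not details confirmed by the description. In particular, monocular estimators may output inverse or affine-invariant depth, in which case the sign convention would need to be aligned first.

```python
from itertools import combinations

def patch_pairs(patch_pixels):
    """All unordered pixel pairs inside one patch (a real system would subsample)."""
    return list(combinations(patch_pixels, 2))

def ranking_indicator(mono_a, mono_b):
    """+1 when the monocular prior ranks pixel a as deeper than pixel b, else -1."""
    return 1.0 if mono_a > mono_b else -1.0

def relative_depth_loss(mono, rendered, pairs, margin):
    """Mean hinge penalty over pixel pairs whose rendered order violates the prior."""
    total = 0.0
    for a, b in pairs:
        r = ranking_indicator(mono[a], mono[b])
        delta = rendered[a] - rendered[b]
        total += max(0.0, margin - r * delta)
    return total / len(pairs)

# One 1x2 patch; depth maps are stored as {pixel: depth} for brevity.
pairs = patch_pairs([(0, 0), (0, 1)])
mono = {(0, 0): 0.2, (0, 1): 0.8}              # prior: pixel (0, 1) is farther
rendered_good = {(0, 0): 1.0, (0, 1): 3.0}     # same order, gap exceeds margin
rendered_bad = {(0, 0): 3.0, (0, 1): 1.0}      # order inverted

loss_good = relative_depth_loss(mono, rendered_good, pairs, margin=0.5)
loss_bad = relative_depth_loss(mono, rendered_bad, pairs, margin=0.5)
```

Because only the ordering and a margin are compared, the unknown scale and shift of the monocular prediction do not enter the penalty.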
[0067] In step S4, the building mesh generation method based on two-dimensional Gaussian distribution includes: after completing the 2D Gaussian disk optimization of the building, a set of Gaussian elements that accurately represent the geometric structure and appearance characteristics of the building is obtained; and the building surface is reconstructed using the TSDF fusion method.
[0068] The method of using TSDF fusion to reconstruct the mesh of a building surface includes the following steps:
[0069] 1) Initialization of the 3D voxel grid
[0070] First, based on the spatial distribution of the center points of all Gaussian disks in the scene, the 3D bounding box of the building is determined, and a regular 3D voxel grid is constructed within this bounding box. The voxel resolution can be set according to the reconstruction accuracy requirements. For each voxel $\mathbf{x}$ in the voxel grid, the following information is initialized and stored: the truncated signed distance function value $T(\mathbf{x})$, whose initial value is set to a preset constant, and the accumulated weight $W(\mathbf{x})$, whose initial value is set to zero;
[0071] 2) Generation of multi-view depth maps based on 2D Gaussian disks
[0072] Using an optimized set of 2D Gaussian disks, differentiable rendering is performed under multiple known camera viewpoints to generate corresponding depth map sets. The depth map of each viewpoint represents the distance information from the camera viewpoint along the line of sight to the building surface. For each preset camera viewpoint, based on its intrinsic and extrinsic parameters, the 2D Gaussian disk is projected onto the image plane, and pixel-level depth values are calculated based on the Gaussian geometric parameters and opacity information to obtain the depth map under that viewpoint.
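By way of non-limiting illustration, one common way to obtain a per-pixel depth value from depth-sorted, semi-transparent primitives is normalized front-to-back alpha compositing. The description does not state the exact formula used, so the sketch below is an assumption; each `alpha` is taken to be the effective per-pixel opacity of a disk (for example, its opacity multiplied by its Gaussian response at that pixel).

```python
def composite_depth(samples):
    """Blend (depth, alpha) samples sorted front to back into a single depth.

    Returns None when no primitive contributes to the pixel.
    """
    transmittance = 1.0   # fraction of light not yet absorbed
    weighted_depth = 0.0
    weight_sum = 0.0
    for depth, alpha in samples:
        weight = transmittance * alpha
        weighted_depth += weight * depth
        weight_sum += weight
        transmittance *= (1.0 - alpha)
    if weight_sum <= 0.0:
        return None
    return weighted_depth / weight_sum

opaque_front = composite_depth([(2.0, 1.0), (5.0, 0.5)])
two_layers = composite_depth([(2.0, 0.5), (4.0, 0.5)])
empty_pixel = composite_depth([])
```

A fully opaque front disk hides everything behind it, while two half-transparent disks yield a depth between them that is weighted toward the nearer one.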
[0073] 3) TSDF value calculation and fusion based on depth map
[0074] For each voxel center point $\mathbf{x}$ in the voxel grid, with stored TSDF value $T(\mathbf{x})$ and accumulated weight $W(\mathbf{x})$, geometric observations are fused sequentially using the multi-view depth maps. For any given viewpoint, the voxel center point $\mathbf{x}$ is projected onto the image plane corresponding to that viewpoint, yielding its pixel coordinates $\pi(\mathbf{x})$ and its depth value $z(\mathbf{x})$ along the viewing direction. At that pixel location, the corresponding surface depth value $D(\pi(\mathbf{x}))$ is obtained from the depth map. Based on this, the signed distance from the voxel center point to the surface is calculated:
[0075] $$\mathrm{sdf}(\mathbf{x}) = D(\pi(\mathbf{x})) - z(\mathbf{x})$$
[0076] where the sign of $\mathrm{sdf}(\mathbf{x})$ represents the spatial position of the voxel relative to the surface: a positive value indicates that the voxel is located on the outer side of the surface, and a negative value indicates that the voxel is located inside the surface. To improve numerical stability and suppress the influence of noise far from the surface, the signed distance is truncated to a preset range $[-\delta, \delta]$ and normalized to obtain the TSDF observation $\mathrm{tsdf}(\mathbf{x})$. Subsequently, based on factors such as depth confidence, Gaussian opacity, or viewpoint consistency at that viewpoint, an appropriate weight $w(\mathbf{x})$ is assigned to the TSDF observation, and the TSDF value of the voxel is fused and updated by weighted accumulation:
[0077] $$T(\mathbf{x}) \leftarrow \frac{W(\mathbf{x})\, T(\mathbf{x}) + w(\mathbf{x})\, \mathrm{tsdf}(\mathbf{x})}{W(\mathbf{x}) + w(\mathbf{x})}, \qquad W(\mathbf{x}) \leftarrow W(\mathbf{x}) + w(\mathbf{x})$$
[0078] Through the above process, depth observations from multiple perspectives are gradually integrated into a unified TSDF voxel field, thereby forming a stable, continuous, and multi-view consistent implicit surface representation.
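By way of non-limiting illustration, the per-voxel TSDF observation and the weighted accumulation may be sketched as follows for a single voxel. The function names, the running-average form of the update, and the unit observation weights are assumptions made for this example; practical implementations often also skip voxels that lie far behind the observed surface.

```python
def tsdf_observation(surface_depth, voxel_depth, trunc):
    """Truncated, normalized signed distance for one voxel in one view.

    Positive values mean the voxel lies in front of (outside) the surface.
    """
    sdf = surface_depth - voxel_depth
    clipped = max(-trunc, min(trunc, sdf))
    return clipped / trunc

def fuse(tsdf_value, weight, observation, obs_weight):
    """Weighted running average of TSDF observations for one voxel."""
    new_weight = weight + obs_weight
    new_value = (weight * tsdf_value + obs_weight * observation) / new_weight
    return new_value, new_weight

# Voxel initialized with a preset constant (here 1.0) and zero accumulated weight.
value, weight = 1.0, 0.0

# View 1: surface at depth 2.0, voxel at depth 1.5 (in front of the surface).
obs_1 = tsdf_observation(2.0, 1.5, trunc=1.0)
value, weight = fuse(value, weight, obs_1, obs_weight=1.0)
after_first = (value, weight)

# View 2: surface at depth 2.0, voxel at depth 2.5 (behind the surface).
obs_2 = tsdf_observation(2.0, 2.5, trunc=1.0)
value, weight = fuse(value, weight, obs_2, obs_weight=1.0)

far_voxel = tsdf_observation(10.0, 1.0, trunc=1.0)  # clipped to the truncation bound
```

With two equally weighted observations of opposite sign, the fused value is zero, which is where the zero isosurface would be extracted.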
[0079] 4) Surface mesh extraction
[0080] After completing the TSDF fusion of all views, the obtained 3D TSDF voxel field is subjected to isosurface extraction processing. The Marching Cubes algorithm is used to extract the zero isosurface, and the output is a building mesh.
[0081] The advantages of this invention are: 1. The proposed efficient 3D reconstruction framework for buildings based on sparse views achieves structurally complete and geometrically consistent 3D reconstruction of buildings under the condition of relying only on a limited number of input images.
[0082] 2. A Gaussian densification strategy guided by proximity and plane normal is proposed. This strategy guides the reasonable growth of the Gaussian disk through steps such as constructing a proximity graph, constraining normal consistency, reconstructing the Gaussian disk, and implementing a survival protection strategy, effectively improving scene coverage and enhancing scene detail representation.
[0083] 3. A depth optimization strategy based on relative depth prior is proposed. By extracting relative depth prior information from the monocular depth map and introducing it into the rendering depth optimization process, geometric constraints are provided for the optimization process, and the accuracy of the model results is significantly improved.
Attached Figure Description
[0084] The following is a brief explanation of the contents of each of the accompanying drawings and the markings in the drawings:
[0085] Figure 1 is a flowchart of the steps of the reconstruction method of the present invention;
[0086] Figure 2 shows sparse-view building images according to the present invention;
[0087] Figure 3 shows the sparse point cloud of a building according to the present invention;
[0088] Figure 4 shows the optimized point cloud of a building according to the present invention;
[0089] Figure 5 is a schematic diagram of the final 3D model of the building generated by the present invention.
Detailed Implementation
[0090] The specific embodiments of the present invention will be further described in detail below with reference to the accompanying drawings and the description of the preferred embodiments.
[0091] To address the technical challenges outlined in the background, this embodiment proposes a framework for 3D building reconstruction based on sparse views. While fully leveraging the efficient explicit representation of 2D Gaussian splatting, this framework overcomes the dependence of existing Gaussian representations on dense multi-view data and enables robust and complete reconstruction in challenging building scenes. Specifically, by utilizing the explicit modeling capability of 2D Gaussian disks on local tangent planes and combining it with the geometric continuity and normal consistency of building surfaces, a Gaussian densification method guided by proximity and plane normals is proposed. This method guides the reasonable growth of the Gaussian disks, effectively improving the overall coverage and geometric integrity of the building scene. In addition, to address the unstable Gaussian depth distribution and the lack of geometric constraints caused by limited parallax information in sparse views, this embodiment introduces a depth optimization strategy based on relative depth priors. This stabilizes the depth distribution of the Gaussian disks during optimization, further improving the geometric consistency and accuracy of the reconstruction results. In summary, the method in this embodiment significantly improves the quality of 3D building reconstruction and provides technical support for the efficient acquisition and reliable utilization of 3D building information in various practical application scenarios.
[0092] This embodiment proposes a framework for 3D building reconstruction based on sparse views. While fully leveraging the efficient explicit representation of 2D Gaussian splatting, this framework overcomes the dependence of existing Gaussian representations on dense multi-view data and enables robust and complete reconstruction in challenging building scenes. Specifically, this embodiment utilizes the explicit modeling capability of 2D Gaussian disks on local tangent planes, combined with the geometric continuity and normal consistency of building surfaces, to propose a Gaussian densification method guided by proximity and plane normals. This method guides the reasonable growth of the Gaussian disks, effectively improving the overall coverage and geometric integrity of the building scene. In addition, to address the unstable Gaussian depth distribution and the lack of geometric constraints caused by limited parallax information in sparse views, this embodiment introduces a depth optimization strategy based on relative depth priors. This stabilizes the depth distribution of the Gaussian disks during the optimization process, further improving the geometric consistency and accuracy of the reconstruction results. In summary, the method in this embodiment significantly improves the quality of 3D building reconstruction and provides technical support for the efficient acquisition and reliable utilization of 3D building information in various practical application scenarios.
[0093] This invention proposes a novel method for 3D building reconstruction based on sparse views. While fully leveraging the efficient explicit representation of 2D Gaussian splatting, this method overcomes the dependence of existing Gaussian representations on dense multi-view data and can achieve robust and complete reconstruction in challenging building scenarios. The main contributions of this invention include:
[0094] 1. This embodiment proposes an efficient 3D building reconstruction framework for sparse views, which achieves structurally complete and geometrically consistent 3D building reconstruction with only a limited number of input images.
[0095] 2. A Gaussian densification strategy guided by proximity and plane normal is proposed. This strategy guides the reasonable growth of the Gaussian disk through steps such as constructing a proximity graph, constraining normal consistency, reconstructing the Gaussian disk, and implementing a survival protection strategy, effectively improving scene coverage and enhancing scene detail representation.
[0096] 3. A depth optimization strategy based on relative depth prior is proposed. By extracting relative depth prior information from monocular depth maps and incorporating it into the rendering depth optimization process, geometric constraints are provided for the optimization process, significantly improving the accuracy of the model results.
[0097] As shown in Figure 1, in this embodiment, a novel method for 3D reconstruction of buildings based on sparse views includes the following steps:
[0098] S1. Acquisition of sparse viewpoint building image dataset;
[0099] S2. Initialization of 2D Gaussian disks: Based on COLMAP, sparse point cloud reconstruction of the building is performed from the sparse-view building images, and then the sparse point cloud is initialized into 2D Gaussian disks through 2D Gaussian Splatting.
[0100] S3. Optimization of 2D Gaussian Disks Applicable to Buildings: First, the initialized sparse 2D Gaussian disk is densified using a Gaussian densification method guided by proximity and plane normals to increase the density of the Gaussian disk and fill blank areas. Then, a depth optimization strategy based on relative depth priors is used to add geometric constraints to the densified Gaussian disk and optimize its depth distribution, resulting in an optimized Gaussian disk.
[0101] S4. Building mesh generation method based on two-dimensional Gaussian distribution:
[0102] For the optimized Gaussian disk, the TSDF fusion method is used to extract an explicit, continuous surface mesh to obtain the 3D model of the building. The surface mesh obtained by TSDF fusion is an explicit representation of the 3D model, describing the external geometry of the building in the form of triangular meshes. Therefore, obtaining the surface mesh is considered sufficient to complete the construction of the 3D model.
[0103] The specific solutions for steps S1-S4 above are described below:
[0104] In step S1, a multi-view building image dataset for 3D reconstruction is acquired. This image dataset may include multi-view image data from public data sources and real-scene image data collected by a drone. The public data sources include the DTU dataset and the BlendedMVS dataset, while the self-collected data is the Tower dataset. For the sparse-view 3D reconstruction task, a subset of images containing a small number of viewpoints is selected and constructed from the multi-view image data, and this subset serves as the input data for the entire model framework.
[0105] In step S2, the sparse building image data obtained in step S1 is processed using COLMAP to obtain a sparse point cloud of the building scene and the camera poses. Subsequently, the obtained sparse point cloud is initialized based on 2D Gaussian Splatting to obtain initialized 2D Gaussians. The basic principle is to place a two-dimensional Gaussian disk at the location of each data point. Each 2D Gaussian disk is jointly defined by a center point $p_k$, two principal tangent vectors $t_u$ and $t_v$, and a scaling factor $S = (s_u, s_v)$ that controls the variance of the two-dimensional Gaussian. It should be noted that the initial normal vector is obtained as the cross product of the two mutually orthogonal tangent vectors, $t_w = t_u \times t_v$. Then, $t_u$, $t_v$, $t_w$ are stacked by columns to form a rotation matrix $R = [t_u, t_v, t_w]$. At the same time, the scaling factors are organized into a diagonal matrix $S$ whose third diagonal element is set to zero. Based on this, a two-dimensional Gaussian can be defined in the local tangent plane in the world coordinate system, with the following parameterized form:
[0106] $$P(u, v) = p_k + s_u t_u u + s_v t_v v = H (u, v, 1, 1)^T$$
[0107] $$H = \begin{bmatrix} s_u t_u & s_v t_v & 0 & p_k \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} R S & p_k \\ 0 & 1 \end{bmatrix}$$
[0108] Here, the matrix $H$ represents the homogeneous geometric transformation of the two-dimensional Gaussian in space. $p_k$: the position of the center point of the $k$-th two-dimensional Gaussian disk in three-dimensional space; $(u, v)$: coordinates in the local tangent-plane coordinate system of the two-dimensional Gaussian disk; $P(u, v)$: the coordinates of the point in three-dimensional space obtained by mapping the local coordinates $(u, v)$; $t_u$: the first tangent vector on the local tangent plane; $t_v$: the second tangent vector on the local tangent plane; $t_w$: the normal vector obtained by the cross product; $s_u$, $s_v$: the scaling factors; $S$: the diagonal scaling matrix; the superscript $T$ indicates matrix transpose.
[0109] For any point $\mathbf{u} = (u, v)$ in $uv$ space, its corresponding two-dimensional Gaussian response can be calculated using the standard Gaussian function:
[0110] $$G(\mathbf{u}) = \exp\left(-\frac{u^2 + v^2}{2}\right)$$
[0111] Here, exp represents the natural exponential function.
[0112] In this representation, the Gaussian center $p_k$, the scaling factors $(s_u, s_v)$, and the tangent vector directions $(t_u, t_v)$ are all treated as learnable variables for optimization. Furthermore, each two-dimensional Gaussian element is also associated with an opacity parameter $\alpha$ and a set of spherical harmonic coefficients $c$ for modeling its appearance color as a function of viewing angle. After the above process, the initialized 2D Gaussian disks are obtained.
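By way of non-limiting illustration, the disk parameterization of step S2 may be sketched as follows: a point with local tangent-plane coordinates (u, v) is mapped to 3D as the center plus the scaled tangent vectors, and the Gaussian response is evaluated in the local coordinates. The function names and the example disk are assumptions made for this example only.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors; gives the disk normal from its tangents."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def disk_point(center, t_u, t_v, s_u, s_v, u, v):
    """Map local coordinates (u, v) to 3D: center + s_u * t_u * u + s_v * t_v * v."""
    return tuple(center[i] + s_u * t_u[i] * u + s_v * t_v[i] * v for i in range(3))

def gaussian_response(u, v):
    """Standard 2D Gaussian evaluated in the disk's local coordinates."""
    return math.exp(-(u * u + v * v) / 2.0)

# Hypothetical disk lying in the plane z = 1, elongated along the x axis.
center = (0.0, 0.0, 1.0)
t_u = (1.0, 0.0, 0.0)
t_v = (0.0, 1.0, 0.0)
normal = cross(t_u, t_v)

point = disk_point(center, t_u, t_v, 2.0, 0.5, 1.0, 1.0)
center_response = gaussian_response(0.0, 0.0)
edge_response = gaussian_response(1.0, 1.0)
```

Because the third scale is zero, every mapped point stays in the tangent plane of the disk, which is what makes the primitive a flat surface element rather than an ellipsoid.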
[0113] In step S3, the optimization of the two-dimensional Gaussian disk applicable to buildings is performed, and the optimization scheme includes:
[0114] I. Proximity- and Plane Normal-Guided Gaussian Densification
[0115] Building scenes typically consist of large-area planar structures, exhibiting significant geometric continuity and normal consistency in local regions. However, under sparse-view conditions, limited observation information often fails to provide stable gradient signals for the adaptive splitting and densification of Gaussian splatting, resulting in insufficient recovery of local geometry. Although 2DGS can explicitly model local tangent planes and provide reliable normal information, how to fully utilize the structural features of the building surface and guide the reasonable growth of Gaussians in space remains a key issue in 3D building reconstruction. To address this issue, this embodiment proposes a Gaussian densification method guided by proximity and plane normals. By jointly considering the spatial proximity relationship and normal consistency constraints between Gaussians, adaptive completion and refinement of the 3D surface structure of buildings can be achieved.
[0116] (1) Normalized proximity score
[0117] To identify locally sparse but geometrically continuous regions within a building scene, this embodiment constructs a directed proximity graph based on Euclidean distance during Gaussian optimization, and uses it to define the spatial proximity relationship between each Gaussian and its K nearest neighbors. Specifically, the Gaussian at the head of each directed edge is called the "source" Gaussian, and the Gaussian at the tail is called the "target" Gaussian, which is one of the source's K nearest neighbors. The center of each Gaussian is denoted $\mathbf{p}_i$. For a source Gaussian $i$ and a target Gaussian $j$, the Euclidean distance between their centers is defined as:
[0118] $$d_{ij} = \lVert \mathbf{p}_i - \mathbf{p}_j \rVert_2$$
[0119] For each source Gaussian $i$, this embodiment selects the K smallest distances from all $d_{ij}$. The set of Euclidean distances between the source Gaussian and its K nearest neighbors is defined as follows:
[0120] $$\mathcal{D}_i = \{\, d_{ij} \mid j \in \mathcal{N}_K(i) \,\}$$
[0121] where $\mathcal{N}_K(i)$ denotes the set of the top K Gaussian indices selected, among all Gaussians satisfying $j \neq i$, after sorting by the distance $d_{ij}$ from smallest to largest.
[0122] Since this embodiment uses two-dimensional Gaussian disks as the scene representation, each Gaussian disk is described by two principal axis scales $(s_{u,i}, s_{v,i})$ on the local tangent plane. Therefore, the effective scale of the source Gaussian $i$ is defined as the length of its larger principal axis in the plane:
[0123] $$s_i = \max(s_{u,i},\, s_{v,i})$$
[0124] where max returns the larger of the two principal axis scales.
[0125] Furthermore, this embodiment defines a normalized proximity score $\rho_i$ to quantify the relative isolation of a Gaussian. The score is the ratio of the average distance from the source Gaussian $i$ to its K nearest neighbors to its effective scale:
[0126] $$\rho_i = \frac{1}{K\, s_i} \sum_{j \in \mathcal{N}_K(i)} d_{ij}$$
[0127] A larger value of $\rho_i$ indicates that the Gaussian is sparser relative to its own size in the local space. During optimization, this embodiment uses this score to identify candidate regions that need to be supplemented. Because Gaussian densification and pruning update the distribution of the Gaussian disks, the proximity graph and the proximity relationship between each source Gaussian and its target Gaussians are updated accordingly. In practice, this embodiment sets K to 3.
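By way of non-limiting illustration, the normalized proximity score can be sketched in Python as follows. The function name `proximity_scores` and the brute-force neighbor search are assumptions made for readability; a practical implementation would more likely use a spatial index or a GPU nearest-neighbor routine.

```python
import math

def proximity_scores(centers, scales, k=3):
    """Return one normalized proximity score per Gaussian.

    centers: list of (x, y, z) Gaussian centers.
    scales:  list of (s_u, s_v) principal-axis scales in the tangent plane.
    """
    scores = []
    for i, p_i in enumerate(centers):
        # Distances from source i to every other Gaussian (j != i), ascending.
        dists = sorted(math.dist(p_i, p_j)
                       for j, p_j in enumerate(centers) if j != i)
        nearest = dists[:k]                 # the K smallest distances
        effective_scale = max(scales[i])    # larger principal axis
        mean_dist = sum(nearest) / len(nearest)
        scores.append(mean_dist / effective_scale)
    return scores

# Four disks on a unit square plus one isolated disk far away.
centers = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (10, 0, 0)]
scales = [(0.5, 0.25)] * 5
scores = proximity_scores(centers, scales, k=3)
```

In this toy configuration the isolated disk receives the largest score, which is the behavior the score is meant to capture: it lies far from its neighbors relative to its own size.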
[0128] (2) Normal consistency constraint
[0129] In building scenes, many structures consist of regular planes or near-planes, whose surface normals typically exhibit strong consistency within local regions. However, proximity relationships built solely on Euclidean distance cannot distinguish Gaussians that differ in geometric orientation, and may still introduce Gaussians that violate local geometric consistency at locations where the geometry changes abruptly. Therefore, this embodiment further introduces a normal consistency constraint to ensure that newly generated Gaussian disks are reasonable in their local geometric orientation.
[0130] Specifically, for each Gaussian $i$, this embodiment follows the geometric modeling method for 2D Gaussian disks in 2DGS. Each Gaussian disk is described by two orthogonal tangent vectors $\mathbf{t}_{u,i}$ and $\mathbf{t}_{v,i}$ in its local tangent plane, and its normal vector is defined as follows:
[0131] $$\mathbf{n}_i = \mathbf{t}_{w,i} = \mathbf{t}_{u,i} \times \mathbf{t}_{v,i}$$
[0132] where $\mathbf{n}_i$ is the normal vector of Gaussian $i$;
[0133] $\mathbf{t}_{w,i}$ is another representation of the normal vector of Gaussian $i$, completely equivalent to $\mathbf{n}_i$;
[0134] $\mathbf{t}_{u,i}$ is the first orthogonal tangent vector of Gaussian $i$ within the local tangent plane;
[0135] $\mathbf{t}_{v,i}$ is the second orthogonal tangent vector of Gaussian $i$ within the local tangent plane;
[0136] $\times$ represents the cross product of vectors.
[0137] Furthermore, the three vectors can be arranged into a rotation matrix:
[0138] $$\mathbf{R}_i = [\,\mathbf{t}_{u,i},\ \mathbf{t}_{v,i},\ \mathbf{t}_{w,i}\,]$$
[0139] In the implementation, the normal vector is equivalently represented as the third column of the rotation matrix:
[0140] $$\mathbf{n}_i = \mathbf{R}_i[:, 3]$$
[0141] where $\mathbf{R}_i[:, 3]$ denotes the third column of $\mathbf{R}_i$. For a source Gaussian $i$ and a target Gaussian $j$ selected from its neighborhood in the proximity graph, this embodiment measures the consistency of local geometric orientation by the cosine similarity between their normal vectors:
[0142] $$\cos\theta_{ij} = \frac{\mathbf{n}_i \cdot \mathbf{n}_j}{\lVert \mathbf{n}_i \rVert\, \lVert \mathbf{n}_j \rVert}$$
[0143] where $\mathbf{n}_j$ is the normal vector of Gaussian $j$. If $\cos\theta_{ij} \geq \tau_n$, the source Gaussian and the target Gaussian are considered consistent in the normal direction; otherwise, they are marked as inconsistent. In practice, this embodiment fixes $\tau_n = 0.9$. This threshold is set based on the strong directional characteristics of planar building structures, and can effectively select coplanar or approximately coplanar Gaussians.
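By way of non-limiting illustration, the normal consistency check can be sketched in Python as follows. The helper names (`cross`, `disk_normal`, `normals_consistent`) are hypothetical, and the use of a non-strict comparison against the 0.9 threshold is an assumption of this sketch.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def disk_normal(t_u, t_v):
    """Normal of a disk as the cross product of its two tangent vectors."""
    return cross(t_u, t_v)

def normals_consistent(n_i, n_j, threshold=0.9):
    """True when the cosine similarity of two normals reaches the threshold."""
    dot = sum(a * b for a, b in zip(n_i, n_j))
    norm_i = math.sqrt(sum(a * a for a in n_i))
    norm_j = math.sqrt(sum(b * b for b in n_j))
    return dot / (norm_i * norm_j) >= threshold

# A flat disk, a disk tilted by 10 degrees, and a perpendicular disk.
tilt = math.radians(10.0)
n_flat = disk_normal((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
n_tilted = disk_normal((1.0, 0.0, 0.0), (0.0, math.cos(tilt), math.sin(tilt)))
n_wall = disk_normal((1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

same_plane = normals_consistent(n_flat, n_tilted)   # cos(10 deg) is about 0.985
across_edge = normals_consistent(n_flat, n_wall)    # cosine is 0
```

A threshold of 0.9 corresponds to an angle of roughly 26 degrees, so slightly tilted neighbors on the same facade pass the check while neighbors across a wall corner do not.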
[0144] (3) Gaussian reconstruction and survival protection strategy
[0145] After the sparsity determination and normal consistency filtering are completed, for each Gaussian pair $i$ and $j$ that meets the conditions, this embodiment uses a Gaussian generation strategy based on midpoint interpolation, which is executed dynamically in each optimization iteration. Specifically, a new Gaussian disk is generated at the midpoint of the line connecting the centers of $i$ and $j$. The center of the new Gaussian disk is $\mathbf{p}_{\text{new}}$. Its opacity parameter $\alpha_{\text{new}}$ and plane scale parameters are inherited from the adjacent Gaussian, denoted here by the subscript nbr, and its view-dependent appearance parameter vector is initialized to zero, that is:
[0146] $$\mathbf{p}_{\text{new}} = \frac{\mathbf{p}_i + \mathbf{p}_j}{2}, \quad \alpha_{\text{new}} = \alpha_{\text{nbr}}, \quad (s_{u,\text{new}},\, s_{v,\text{new}}) = (s_{u,\text{nbr}},\, s_{v,\text{nbr}}), \quad \mathbf{c}_{\text{new}} = \mathbf{0}$$
[0147] where:
[0148] $\mathbf{c}$ represents the vector of view-dependent appearance parameters;
[0149] $(s_{u,\text{new}}, s_{v,\text{new}})$ are the two principal axis scales of the new Gaussian disk on the local tangent plane;
[0150] $(s_{u,\text{nbr}}, s_{v,\text{nbr}})$ are the two principal axis scales of the adjacent Gaussian on the local tangent plane.
[0151] The rotation parameters of the newly generated Gaussian disk are initialized to the identity rotation matrix, thus allowing its local tangential direction to be adaptively adjusted based on observation data during subsequent optimization.
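By way of non-limiting illustration, midpoint-based Gaussian generation can be sketched in Python as follows. The dictionary layout and the function name `spawn_midpoint_gaussian` are hypothetical. The text states only that opacity and scale are inherited from the adjacent Gaussian; this sketch assumes that the target Gaussian plays that role.

```python
def spawn_midpoint_gaussian(source, target):
    """Create a new disk halfway between a source and a target Gaussian."""
    center = tuple((a + b) / 2.0
                   for a, b in zip(source["center"], target["center"]))
    return {
        "center": center,
        # Assumption: opacity and scale are copied from the target Gaussian.
        "opacity": target["opacity"],
        "scale": tuple(target["scale"]),
        # View-dependent appearance coefficients start at zero.
        "sh": [0.0] * len(target["sh"]),
        # Identity rotation; later optimization adjusts the orientation.
        "rotation": [[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]],
    }

source = {"center": (0.0, 0.0, 0.0), "opacity": 0.9,
          "scale": (0.2, 0.1), "sh": [0.5, 0.1, 0.3]}
target = {"center": (2.0, 0.0, 4.0), "opacity": 0.6,
          "scale": (0.3, 0.3), "sh": [0.2, 0.2, 0.2]}
new_disk = spawn_midpoint_gaussian(source, target)
```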
[0152] Furthermore, since newly generated Gaussian disks have not accumulated sufficient gradient information during the initialization phase, if they participate directly in subsequent pruning operations they are prone to being removed prematurely due to low opacity or unstable scale, thereby weakening the geometric completion of sparse regions. Therefore, this embodiment further introduces a Gaussian survival protection strategy to protect the Gaussian disks newly generated by the proximity-guided strategy in the early stages of optimization. Specifically, for each Gaussian disk $g$ generated by interpolation, this embodiment defines its survival protection condition based on its generation time $t_g$ and the current iteration number $t$:
[0153] $$t - t_g < T_p$$
[0154] where $T_p$ is a preset survival protection window length. This embodiment temporarily prohibits Gaussian disks that meet this condition from participating in pruning decisions, even if their opacity or scale would otherwise trigger pruning. This strategy ensures that newly generated Gaussian disks receive sufficient view supervision before being subjected to the same pruning rules as the original Gaussian disks. It alleviates the conflict between densification and pruning operations, allowing newly generated Gaussian disks to be stably integrated into the overall optimization process, thereby improving the reconstruction completeness and geometric consistency of sparse regions.
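By way of non-limiting illustration, the survival protection strategy can be sketched in Python as follows. The field name `birth_iter`, the opacity-only pruning rule, and the numeric values for the window length and opacity threshold are assumptions of this sketch and are not specified in the text.

```python
def is_protected(birth_iter, current_iter, window):
    """True while a generated disk is still inside its protection window."""
    return current_iter - birth_iter < window

def prune(gaussians, current_iter, window, min_opacity):
    """Drop faint disks, except interpolated disks that are still protected."""
    kept = []
    for g in gaussians:
        birth = g.get("birth_iter")  # None marks an original (non-generated) disk
        if birth is not None and is_protected(birth, current_iter, window):
            kept.append(g)           # exempt from pruning for now
        elif g["opacity"] >= min_opacity:
            kept.append(g)           # passes the ordinary pruning rule
    return kept

gaussians = [
    {"opacity": 0.01, "birth_iter": None},  # original disk, faint
    {"opacity": 0.01, "birth_iter": 100},   # generated at iteration 100, faint
    {"opacity": 0.80, "birth_iter": None},  # original disk, opaque
]
early = prune(gaussians, current_iter=150, window=100, min_opacity=0.05)
late = prune(gaussians, current_iter=250, window=100, min_opacity=0.05)
```

At iteration 150 the generated disk survives despite its low opacity; by iteration 250 the window has elapsed and it is judged by the same rule as the original disks.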
[0155] II. Depth Optimization Based on Relative Depth Prior
[0156] Introducing monocular depth information into the optimization process helps alleviate the overfitting problem of the model under sparse view conditions. However, unlike methods that use absolute scale alignment of monocular depth to supervise rendering depth or enhance the numerical correlation between monocular depth and rendering depth, this embodiment does not directly rely on absolute depth values. Instead, it extracts relative depth ranking information from the monocular depth map and introduces it as a geometric prior into the rendering depth optimization process, thereby improving the consistency of rendering depth while avoiding the influence of scale uncertainty.
[0157] (1) Depth prior generation
[0158] Monocular depth estimation models are typically trained on large-scale, diverse datasets, effectively capturing the local geometry and relative depth relationships of a scene from a single image. Benefiting from statistical priors such as typical scene layouts and surface orientations, these models can infer structurally consistent relative depth distributions even in the absence of multi-view constraints. However, due to the lack of multi-view geometric constraints, their predictions often exhibit scale uncertainty and may be affected by noise at the numerical level. Therefore, directly using the absolute value of monocular depth as geometric supervision is often unstable.
[0159] This embodiment uses sparse view images as input and uses DepthAnythingV2 as the monocular depth estimation model to obtain a monocular depth map, denoted as $D_{\text{mono}}$. Relative depth prior information is then generated from the monocular depth map. Through a training strategy that combines large-scale synthetic data with real images and incorporates a knowledge distillation mechanism, the model demonstrates good robustness and generalization ability across different scenarios. In building scenes, its predictions are relatively stable at the structural level, providing reliable priors for subsequent geometric constraints based on relative depth relationships.
[0160] (2) Depth optimization based on relative depth prior
[0161] After obtaining the depth map $D_{\text{mono}}$ predicted by the model, this embodiment extracts relative depth ranking information from it and introduces it as a geometric prior into the optimization of the rendering depth $D_{\text{render}}$. Specifically, the image is divided into several local regions (patches), and the i-th patch is denoted as $P_i$. Within each patch, this embodiment samples pixel pairs from the pixel set to form a pixel pair set:
[0162] $$\mathcal{S}_i = \{\, (a, b) \mid a, b \in P_i,\ a \neq b \,\}$$
[0163] where $a$ and $b$ are the positions of two different pixels within the same local region. Constructing pixel pairs within a local area effectively suppresses the unstable ordering that monocular depth may introduce at the global scale or between distant regions, while strengthening the constraint on local geometry.
[0164] According to the predictions of the monocular depth map $D_{\text{mono}}$, the relative depth ranking indicator function is defined as follows:
[0165] $$o_{ab} = \begin{cases} +1, & D_{\text{mono}}(a) > D_{\text{mono}}(b) \\ -1, & \text{otherwise} \end{cases}$$
[0166] In the rendering depth map $D_{\text{render}}$, the rendering depth difference for a pixel pair $(a, b)$ is defined as:
[0167] $$\Delta_{ab} = D_{\text{render}}(a) - D_{\text{render}}(b)$$
[0168] Based on this, the loss function is defined as:
[0169] $$\mathcal{L}_{\text{rank}} = \frac{1}{N} \sum_{i} \sum_{(a, b) \in \mathcal{S}_i} \max\bigl(0,\ m - o_{ab}\, \Delta_{ab}\bigr)$$
[0170] where $m$ is the preset margin hyperparameter, $N$ is the total number of sampled pixel pairs, and max is the function that takes the maximum value. When the ordering of the rendering depth is consistent with the monocular depth prior and the depth difference satisfies the margin constraint, no penalty is imposed on the corresponding term; otherwise, the term constrains the optimization of the Gaussian parameters.
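By way of non-limiting illustration, the relative depth ranking loss can be sketched in Python as follows. Depth maps are represented here as dictionaries from pixel position to depth; the function names, the averaging over sampled pairs, and the treatment of ties as the negative ordering are assumptions of this sketch. A training implementation would operate on tensors so that gradients reach the rendered depth.

```python
import random

def sample_pairs(patch_pixels, n_pairs, rng):
    """Sample pairs of two distinct pixel positions from one patch."""
    return [tuple(rng.sample(patch_pixels, 2)) for _ in range(n_pairs)]

def ranking_loss(mono, rendered, pairs, margin):
    """Hinge loss on depth ordering; the ordering comes from the monocular map."""
    total = 0.0
    for a, b in pairs:
        order = 1.0 if mono[a] > mono[b] else -1.0   # ranking indicator
        diff = rendered[a] - rendered[b]             # rendered depth difference
        total += max(0.0, margin - order * diff)     # zero when order and margin hold
    return total / len(pairs)

# One 2x2 patch; only the ordering of the monocular values matters.
mono = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
agrees = {px: 10.0 * d for px, d in mono.items()}     # same ordering, other scale
disagrees = {px: 5.0 - d for px, d in mono.items()}   # reversed ordering

rng = random.Random(0)
pairs = sample_pairs(list(mono), n_pairs=8, rng=rng)
loss_ok = ranking_loss(mono, agrees, pairs, margin=0.1)
loss_bad = ranking_loss(mono, disagrees, pairs, margin=0.1)
```

The rendered depth in `agrees` differs from the monocular depth by a factor of ten yet incurs no loss, which illustrates why a ranking constraint is insensitive to the scale ambiguity of monocular depth.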
[0171] In step S4, the method for generating building meshes based on a two-dimensional Gaussian distribution specifically includes:
[0172] After the 2D Gaussian disk optimization of the building is completed, a set of Gaussian elements that accurately characterizes the building's geometry and appearance is obtained. To obtain an explicit and continuous surface mesh representation, this invention employs a TSDF fusion method to reconstruct the building's surface mesh. The specific process is as follows:
[0173] 1. Initialization of 3D Voxel Grid
[0174] First, the 3D bounding box of the building is determined from the spatial distribution of the center points of all Gaussian disks in the scene. Then, a regular 3D voxel grid is constructed within this bounding box. The voxel resolution can be set according to the required reconstruction accuracy. For each voxel $v$ in the voxel grid, the following information is initialized and stored: the truncated signed distance function value $T(v)$, whose initial value is set to a preset constant, and the accumulated weight $W(v)$, whose initial value is set to zero.
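By way of non-limiting illustration, the voxel grid initialization can be sketched in Python as follows. The function name `init_voxel_grid`, the optional `padding` parameter, and the choice of 1.0 as the preset initial TSDF constant are assumptions of this sketch.

```python
import math

def init_voxel_grid(centers, voxel_size, padding=0.0, init_tsdf=1.0):
    """Build a regular voxel grid over the bounding box of the disk centers."""
    lo = [min(c[axis] for c in centers) - padding for axis in range(3)]
    hi = [max(c[axis] for c in centers) + padding for axis in range(3)]
    # Number of voxels per axis, at least one even for a flat extent.
    dims = [max(1, math.ceil((hi[axis] - lo[axis]) / voxel_size))
            for axis in range(3)]
    count = dims[0] * dims[1] * dims[2]
    return {
        "origin": lo,
        "dims": dims,
        "voxel_size": voxel_size,
        "tsdf": [init_tsdf] * count,   # preset constant (assumed 1.0 here)
        "weight": [0.0] * count,       # accumulated weight starts at zero
    }

grid = init_voxel_grid([(0.0, 0.0, 0.0), (1.0, 2.0, 0.5)], voxel_size=0.5)
```

A smaller `voxel_size` yields a finer mesh at the cost of memory that grows cubically with resolution, which is the trade-off behind setting the resolution according to the required accuracy.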
[0175] 2. Generation of multi-view depth maps based on 2D Gaussian disks
[0176] Using an optimized set of 2D Gaussian disks, differentiable rendering is performed under multiple known camera viewpoints to generate corresponding depth map sets. The depth map for each viewpoint represents the distance information from that camera viewpoint along the line of sight to the building surface. Specifically, for each preset camera viewpoint, based on its intrinsic and extrinsic parameters, the 2D Gaussian disk is projected onto the image plane, and pixel-level depth values are calculated according to the Gaussian's geometric parameters and opacity information, thus obtaining the depth map for that viewpoint. This constructs a set of multi-view depth maps covering different observation directions, providing geometric observation basis for subsequent TSDF fusion.
[0177] 3. TSDF Value Calculation and Fusion Based on Depth Map
[0178] For each voxel center point $\mathbf{x}_v$ in the voxel grid, the geometric observations are fused sequentially using the multi-view depth maps. For any given viewpoint, the voxel center point $\mathbf{x}_v$ is first projected onto the image plane corresponding to that viewpoint, yielding its pixel coordinates $\mathbf{u}_v$ and its depth value $z_v$ along the viewing direction. At that pixel location, the corresponding surface depth value $D(\mathbf{u}_v)$ is read from the depth map. Based on this, the signed distance from the voxel center to the surface is calculated:
[0179] $$\operatorname{sdf}(v) = D(\mathbf{u}_v) - z_v$$
[0180] where the sign of $\operatorname{sdf}(v)$ represents the spatial position of the voxel relative to the surface: a positive value indicates that the voxel is located outside the surface, and a negative value indicates that the voxel is located inside the surface. To improve numerical stability and suppress the influence of noise far from the surface, the signed distance is truncated to the range $[-\mu, \mu]$ given by a preset threshold $\mu$, and the corresponding TSDF observation $t(v)$ is obtained by normalization. Subsequently, considering factors such as depth confidence, Gaussian opacity, or viewpoint consistency for that view, an appropriate weight is assigned to the TSDF observation, and a weighted accumulation is used to fuse and update the TSDF value of the voxel:
[0181] $$T(v) \leftarrow \frac{W(v)\, T(v) + w(v)\, t(v)}{W(v) + w(v)}, \qquad W(v) \leftarrow W(v) + w(v)$$
[0182] In the formula:
[0183] $v$ represents each voxel in the three-dimensional voxel grid;
[0184] $T(v)$ is the fused TSDF value;
[0185] $t(v)$ is the currently observed TSDF value;
[0186] $W(v)$ is the accumulated weight;
[0187] $w(v)$ is the current observation weight.
[0188] Through the above process, depth observations from multiple perspectives are gradually integrated into a unified TSDF voxel field, thereby forming a stable, continuous implicit surface representation with multi-view consistency.
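By way of non-limiting illustration, the per-voxel TSDF observation and weighted fusion can be sketched in Python as follows. The function names are hypothetical, the observation weight is fixed at 1.0 for simplicity, and the update is the standard running weighted average, which this sketch assumes matches the weighted accumulation described above.

```python
def tsdf_observation(surface_depth, voxel_depth, trunc):
    """Signed distance along the ray, truncated and normalized to [-1, 1].

    Positive values mean the voxel lies in front of (outside) the surface.
    """
    sdf = surface_depth - voxel_depth
    return max(-1.0, min(1.0, sdf / trunc))

def fuse(tsdf, weight, obs, obs_weight):
    """Running weighted average of TSDF observations; obs_weight must be > 0."""
    new_weight = weight + obs_weight
    new_tsdf = (weight * tsdf + obs_weight * obs) / new_weight
    return new_tsdf, new_weight

# One voxel at depth 1.9 seen from two views whose depth maps disagree slightly.
tsdf, weight = 1.0, 0.0            # initial state: preset constant, zero weight
for surface_depth in (2.0, 1.8):
    obs = tsdf_observation(surface_depth, voxel_depth=1.9, trunc=0.2)
    tsdf, weight = fuse(tsdf, weight, obs, obs_weight=1.0)
```

The two views place the surface slightly behind and slightly in front of the voxel, so the fused value ends near zero and the voxel lands on the extracted isosurface. Because the initial weight is zero, the preset initial TSDF constant does not influence the fused result.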
[0189] 4. Surface Mesh Extraction (Marching Cubes)
[0190] After TSDF fusion is completed for all viewpoints, isosurface extraction is performed on the resulting 3D TSDF voxel field. Specifically, the Marching Cubes algorithm is used to extract the zero isosurface (i.e., the implicit surface where $T(v) = 0$), and a building mesh is output as the result of the 3D reconstruction of the building, preserving the geometric details and topological integrity of the building surface.
[0191] Through the TSDF fusion process described above, this embodiment can effectively transform the optimized 2D Gaussian distribution into an explicit and continuous surface representation, thereby achieving high integrity and high geometric consistency in 3D building reconstruction even under sparse view conditions.
[0192] Figure 2 shows a typical building scene from the DTU dataset used in this invention. The DTU multi-view stereo dataset, released by the Technical University of Denmark, contains high-resolution image sequences of multiple scenes under different lighting conditions and provides high-precision point clouds acquired by structured light scanners as ground truth for 3D geometry. It is widely used for evaluation and comparison in 3D reconstruction tasks. The scene shown in Figure 2 is the scan24 building scene of the DTU dataset, which contains 49 images with a resolution of 1600×1200. This invention randomly samples these 49 images to construct a subset containing 3 building images, forming a sparse view input configuration.
[0193] Figure 3 shows the sparse point cloud obtained by processing the sparse building images with COLMAP according to the present invention. As can be seen from the figure, the sparse point cloud obtained by COLMAP has obvious holes and an uneven point distribution, which reflects the typical technical difficulties of sparse view reconstruction.
[0194] Figure 4 shows the building point cloud obtained after initialization and optimization using 2D Gaussian Splatting according to the present invention. As can be seen from the figure, the 2D Gaussian point cloud optimized by the present invention has a more uniform density and a more continuous and complete surface, showing that the present invention can effectively fill sparse areas and improve geometric completeness.
[0195] Figure 5 is a schematic diagram of the final 3D building model generated by the present invention. The 3D model in the figure has a smooth surface, a regular structure, and well-retained detail, indicating that the present invention can obtain an explicit 3D building model usable in engineering from sparse input, and demonstrating its practical value for sparse view 3D building reconstruction.
[0196] Obviously, the specific implementation of this invention is not limited to the above-described methods. Any non-substantial improvements made using the inventive concept and technical solution of this invention are within the protection scope of this invention.
Claims
1. A method for 3D reconstruction of buildings based on sparse views using 2D Gaussian splatting, characterized in that it includes the following steps: S1. Obtain a dataset of sparse view building images; S2. Initialize the two-dimensional Gaussian disks; S3. Optimize the two-dimensional Gaussian disks so that they are suitable for buildings; S4. Generate a building mesh based on the two-dimensional Gaussians: for the optimized Gaussian disks, use a TSDF fusion method to extract an explicit and continuous surface mesh to obtain the three-dimensional model of the building.
2. The method for 3D reconstruction of buildings based on sparse views using 2D Gaussian splatting as described in claim 1, characterized in that: in step S1, a multi-view building image dataset for 3D reconstruction is obtained; for the sparse view 3D reconstruction task, an image subset containing a small number of views is selected and constructed from the multi-view image data.
3. The method for 3D reconstruction of buildings based on sparse views using 2D Gaussian splatting as described in claim 1, characterized in that: step S2 includes: processing the acquired building image dataset to obtain a sparse point cloud of the building scene and the camera poses; then, initializing from the acquired sparse point cloud based on 2D Gaussian Splatting to obtain initialized 2D Gaussians.
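4. The method for 3D reconstruction of sparse view buildings based on 2D Gaussian splatting as described in claim 3, characterized in that: step S2 further includes placing a two-dimensional Gaussian disk at the location of each data point, each 2D Gaussian disk being jointly defined by a center point $\mathbf{p}_k$, two principal tangent vectors $\mathbf{t}_u$ and $\mathbf{t}_v$, and scaling factors $(s_u, s_v)$ used to control the two-dimensional Gaussian variance; the initial normal vector is obtained as the cross product of the two mutually orthogonal tangent vectors, $\mathbf{t}_w = \mathbf{t}_u \times \mathbf{t}_v$; $\mathbf{t}_u$, $\mathbf{t}_v$, and $\mathbf{t}_w$ are stacked in columns to form a rotation matrix $\mathbf{R} = [\mathbf{t}_u, \mathbf{t}_v, \mathbf{t}_w]$; at the same time, the scaling factors are organized into a diagonal matrix $\mathbf{S}$ whose third diagonal element is set to zero; based on this, a two-dimensional Gaussian is defined in the local tangent plane in the world coordinate system, with the following parameterization: $P(u, v) = \mathbf{p}_k + s_u \mathbf{t}_u u + s_v \mathbf{t}_v v = \mathbf{H}\,(u, v, 1, 1)^{\top}$; $\mathbf{H} = \begin{bmatrix} s_u \mathbf{t}_u & s_v \mathbf{t}_v & \mathbf{0} & \mathbf{p}_k \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \mathbf{R}\mathbf{S} & \mathbf{p}_k \\ \mathbf{0} & 1 \end{bmatrix}$; where the matrix $\mathbf{H}$ represents the homogeneous geometric transformation of the two-dimensional Gaussian in space; for any point $\mathbf{u} = (u, v)$ in the local $uv$ space, the corresponding two-dimensional Gaussian response is calculated using the standard Gaussian function: $\mathcal{G}(\mathbf{u}) = \exp\left(-\frac{u^2 + v^2}{2}\right)$; in this representation, the Gaussian center $\mathbf{p}_k$, the scaling factors $(s_u, s_v)$, and the tangent vector directions $(\mathbf{t}_u, \mathbf{t}_v)$ are all treated as learnable variables for optimization; each two-dimensional Gaussian element is also associated with an opacity parameter $\alpha$ and a set of spherical harmonic coefficients $\mathbf{c}$ for modeling its appearance color as a function of viewing angle.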
5. The method for 3D reconstruction of sparse view buildings based on 2D Gaussian splatting as described in claim 1, characterized in that: the optimization in step S3 includes a Gaussian densification step guided by proximity and plane normal, and a depth optimization step based on a relative depth prior; the Gaussian densification step guided by proximity and plane normal jointly considers the spatial proximity relationship and the normal consistency constraint between Gaussians to achieve adaptive completion and refinement of the three-dimensional surface structure of the building; the depth optimization step based on the relative depth prior extracts relative depth ranking information from the monocular depth map and introduces it as a geometric prior into the rendering depth optimization process, thereby improving the consistency of the rendering depth while avoiding the influence of scale uncertainty.
6. The method for 3D reconstruction of sparse view buildings based on 2D Gaussian splatting as described in claim 5, characterized in that: the Gaussian densification step guided by proximity and plane normal includes: (1) Normalized proximity score: during Gaussian optimization, a directed proximity graph based on Euclidean distance is constructed, and the spatial proximity relationship between each Gaussian and its K nearest neighbors is defined accordingly; the Gaussian at the head of each directed edge is called the "source" Gaussian, and the Gaussian at the tail is called the "target" Gaussian, which is one of the K nearest neighbors of the source; the center of each Gaussian is denoted $\mathbf{p}_i$; for a source Gaussian $i$ and a target Gaussian $j$, the Euclidean distance between their centers is defined as: $d_{ij} = \lVert \mathbf{p}_i - \mathbf{p}_j \rVert_2$; for each source Gaussian $i$, the K smallest distances are selected from all $d_{ij}$, and the target Gaussians are determined using the following rule: $\mathcal{D}_i = \{\, d_{ij} \mid j \in \mathcal{N}_K(i) \,\}$, where $\mathcal{N}_K(i)$ is the set of indices of the K Gaussians nearest to $i$ with $j \neq i$; two-dimensional Gaussian disks are used as the scene representation, and each Gaussian disk is described by two principal axis scales $(s_{u,i}, s_{v,i})$ on the local tangent plane; the effective scale of the source Gaussian $i$ is defined as the length of its larger principal axis in the plane: $s_i = \max(s_{u,i},\, s_{v,i})$; a normalized proximity score $\rho_i$ is defined to quantify the relative isolation of the Gaussian, the score being the ratio of the average distance from the source Gaussian $i$ to its K nearest neighbors to its effective scale: $\rho_i = \frac{1}{K\, s_i} \sum_{j \in \mathcal{N}_K(i)} d_{ij}$; the larger the value, the sparser the Gaussian disk is relative to its own size in the local space; during optimization, this score is used to identify candidate regions that need to be supplemented; the Gaussian densification and pruning processes update the distribution and proximity relationships of the Gaussian disks; (2) Normal consistency constraint: for each Gaussian $i$, the geometric modeling method for 2D Gaussian disks in 2DGS is adopted: each Gaussian disk is described by two orthogonal tangent vectors $\mathbf{t}_{u,i}$ and $\mathbf{t}_{v,i}$ in its local tangent plane, and its normal vector is defined as follows: $\mathbf{n}_i = \mathbf{t}_{w,i} = \mathbf{t}_{u,i} \times \mathbf{t}_{v,i}$; the three vectors are arranged into a rotation matrix: $\mathbf{R}_i = [\,\mathbf{t}_{u,i},\ \mathbf{t}_{v,i},\ \mathbf{t}_{w,i}\,]$; the normal vector is equivalently represented by the third column of the rotation matrix: $\mathbf{n}_i = \mathbf{R}_i[:, 3]$; for a source Gaussian $i$ and a target Gaussian $j$ selected from its neighborhood in the proximity graph, the consistency of local geometric orientation is measured by the cosine similarity between their normal vectors: $\cos\theta_{ij} = \frac{\mathbf{n}_i \cdot \mathbf{n}_j}{\lVert \mathbf{n}_i \rVert\, \lVert \mathbf{n}_j \rVert}$; if $\cos\theta_{ij} \geq \tau_n$, the source Gaussian and the target Gaussian are considered consistent in the normal direction; otherwise, they are marked as inconsistent; the threshold $\tau_n$ is set based on the strong directional characteristics of planar building structures to effectively select coplanar or approximately coplanar Gaussians; (3) Gaussian reconstruction and survival protection strategy: after the sparsity determination and normal consistency filtering are completed, for each Gaussian pair $i$ and $j$ that meets the conditions, a Gaussian generation strategy based on midpoint interpolation is used, and this process is performed dynamically in each optimization iteration, including generating a new Gaussian disk at the midpoint of the line connecting the centers of $i$ and $j$; the center of the new Gaussian disk is $\mathbf{p}_{\text{new}}$; its opacity and plane scale parameters are inherited from the adjacent Gaussian, denoted by the subscript nbr; the view-dependent appearance parameter vector is initialized to zero, that is: $\mathbf{p}_{\text{new}} = \frac{\mathbf{p}_i + \mathbf{p}_j}{2}$, $\alpha_{\text{new}} = \alpha_{\text{nbr}}$, $(s_{u,\text{new}},\, s_{v,\text{new}}) = (s_{u,\text{nbr}},\, s_{v,\text{nbr}})$, $\mathbf{c}_{\text{new}} = \mathbf{0}$; the rotation parameters of the newly generated Gaussian disk are initialized to the identity rotation matrix, thus allowing its local tangential direction to be adaptively adjusted based on observation data during subsequent optimization; a Gaussian survival protection strategy is introduced to protect the Gaussian disks newly generated by the proximity-guided strategy in the early stages of optimization, including, for each Gaussian disk $g$ generated by interpolation, defining its survival protection condition according to its generation time $t_g$ and the current iteration number $t$: $t - t_g < T_p$; where $T_p$ is a preset survival protection window length, and Gaussian disks that meet the condition are temporarily prohibited from participating in pruning decisions, even if their opacity or scale would otherwise trigger pruning.
7. The method for 3D reconstruction of sparse view buildings based on 2D Gaussian splatting as described in claim 5, characterized in that: the depth optimization step based on the relative depth prior includes: (1) Depth prior generation: using sparse view images as input and employing DepthAnythingV2 as the monocular depth estimation model, a monocular depth map is obtained, denoted as $D_{\text{mono}}$; relative depth prior information is then generated from the monocular depth map; (2) Depth optimization based on the relative depth prior: after obtaining the depth map $D_{\text{mono}}$ predicted by the model, relative depth ranking information is extracted from it and introduced as a geometric prior into the optimization of the rendering depth $D_{\text{render}}$; the image is divided into several local regions (patches), the i-th patch being denoted as $P_i$; within each patch, pixel pairs are sampled from the pixel set to form a pixel pair set: $\mathcal{S}_i = \{\, (a, b) \mid a, b \in P_i,\ a \neq b \,\}$; where $a$ and $b$ are two different pixel positions within the same local region; constructing pixel pairs within a local region effectively suppresses the unstable ordering that monocular depth may introduce at the global scale or between distant regions, while strengthening the constraint on local geometry; according to the predictions of the monocular depth map $D_{\text{mono}}$, the relative depth ranking indicator function is defined as follows: $o_{ab} = +1$ if $D_{\text{mono}}(a) > D_{\text{mono}}(b)$, and $o_{ab} = -1$ otherwise; in the rendering depth map $D_{\text{render}}$, the rendering depth difference for a pixel pair $(a, b)$ is defined as: $\Delta_{ab} = D_{\text{render}}(a) - D_{\text{render}}(b)$; based on this, the loss function is defined as: $\mathcal{L}_{\text{rank}} = \frac{1}{N} \sum_{i} \sum_{(a, b) \in \mathcal{S}_i} \max\bigl(0,\ m - o_{ab}\, \Delta_{ab}\bigr)$; where $m$ is a preset margin hyperparameter and $N$ is the total number of sampled pixel pairs; when the ordering of the rendering depth is consistent with the monocular depth prior and the depth difference satisfies the margin constraint, no penalty is imposed on the corresponding term; otherwise, the term constrains the optimization of the Gaussian parameters.
8. The method for 3D reconstruction of buildings based on sparse views using 2D Gaussian splatting as described in claim 1, characterized in that: in step S4, the building mesh generation method based on two-dimensional Gaussians includes: after the 2D Gaussian disk optimization of the building is completed, a set of Gaussian elements that accurately represent the geometric structure and appearance characteristics of the building is obtained; and the building surface mesh is reconstructed using the TSDF fusion method.
9. The method for 3D reconstruction of sparse view buildings based on 2D Gaussian splatting as described in claim 1, characterized in that: the method of using TSDF fusion to reconstruct the mesh of a building surface includes the following steps: 1) Initialization of the 3D voxel grid: first, based on the spatial distribution of the center points of all Gaussian disks in the scene, the 3D bounding box of the building is determined, and a regular 3D voxel grid is constructed within this bounding box; the voxel resolution can be set according to the reconstruction accuracy requirements; for each voxel $v$ in the voxel grid, the following information is initialized and stored: the truncated signed distance function value $T(v)$, whose initial value is set to a preset constant, and the accumulated weight $W(v)$, whose initial value is set to zero; 2) Generation of multi-view depth maps based on 2D Gaussian disks: using the optimized set of 2D Gaussian disks, differentiable rendering is performed under multiple known camera viewpoints to generate the corresponding set of depth maps; the depth map of each viewpoint represents the distance from the camera viewpoint along the line of sight to the building surface; for each preset camera viewpoint, based on its intrinsic and extrinsic parameters, the 2D Gaussian disks are projected onto the image plane, and pixel-level depth values are calculated from the Gaussian geometric parameters and opacity information to obtain the depth map under that viewpoint; 3) TSDF value calculation and fusion based on depth maps: for each voxel center point $\mathbf{x}_v$ in the voxel grid, the geometric observations are fused sequentially using the multi-view depth maps; for any given viewpoint, the voxel center point $\mathbf{x}_v$ is projected onto the image plane corresponding to the viewpoint, yielding its pixel coordinates $\mathbf{u}_v$ and its depth value $z_v$ along the viewing direction; at that pixel location, the corresponding surface depth value $D(\mathbf{u}_v)$ is read from the depth map; based on this, the signed distance from the voxel center point to the surface is calculated: $\operatorname{sdf}(v) = D(\mathbf{u}_v) - z_v$; where the sign of $\operatorname{sdf}(v)$ represents the spatial position of the voxel relative to the surface: a positive value indicates that the voxel is located outside the surface, and a negative value indicates that the voxel is located inside the surface; to improve numerical stability and suppress the influence of noise far from the surface, the signed distance is truncated to the range $[-\mu, \mu]$ given by a preset threshold $\mu$, and the corresponding TSDF observation $t(v)$ is obtained by normalization; subsequently, based on factors such as depth confidence, Gaussian opacity, or viewpoint consistency for that view, an appropriate weight $w(v)$ is assigned to the TSDF observation, and the TSDF value of the voxel is fused and updated using weighted accumulation: $T(v) \leftarrow \frac{W(v)\, T(v) + w(v)\, t(v)}{W(v) + w(v)}$, $W(v) \leftarrow W(v) + w(v)$; through the above process, depth observations from multiple viewpoints are gradually integrated into a unified TSDF voxel field, thereby forming a stable, continuous, and multi-view consistent implicit surface representation; 4) Surface mesh extraction: after TSDF fusion is completed for all views, isosurface extraction is performed on the resulting 3D TSDF voxel field; the Marching Cubes algorithm is used to extract the zero isosurface, and the output is a building mesh.