A method for three-dimensional reconstruction of air-ground scenes based on Gaussian splatting
By using a Gaussian splatting method combined with multi-metric adaptive density control and multi-dimensional geometric constraints, the method addresses unstable density control in air-ground fusion scenes, achieves high-precision 3D reconstruction and novel view rendering, and improves the reconstruction consistency and robustness of air-ground fusion scenes.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANCHANG CAMPUS OF EAST CHINA UNIV OF TECH
- Filing Date
- 2026-05-15
- Publication Date
- 2026-06-12
AI Technical Summary
Existing 3D reconstruction methods for air-ground fusion scenes are prone to density control instability when dealing with large parallax features, leading to floating artifacts, geometric misalignment, and training fluctuations, which reduces the reconstruction consistency across viewpoint regions.
A Gaussian splatting-based approach is adopted. It uses a multi-metric adaptive density control mechanism that dynamically evaluates the contribution of Gaussian primitives by combining gradient, viewpoint importance, and geometric features. Cloning and splitting, pruning, and similarity merging operations are performed; multi-dimensional geometric constraints are introduced to ensure the consistency of physical surfaces; and the robustness of the model is improved by jointly optimizing the photometric reconstruction loss and the geometric structure constraints.
It effectively suppresses floating artifacts and structural fractures, improves the coherence and stability of geometric structures in air-ground fusion scenes, enhances the robustness of the model under large parallax and non-uniform observation conditions, and improves reconstruction accuracy and rendering efficiency.
Figure CN122199858A_ABST
Description
Technical Field
[0001] This application relates to the field of 3D reconstruction technology, specifically to a 3D reconstruction method for air-ground scenes based on Gaussian splatting.
Background Technology
[0002] With the rapid development of multi-source sensing technologies such as UAV photogrammetry and ground-based mobile data acquisition, 3D reconstruction and novel view synthesis based on air-ground fusion images have demonstrated significant application value in fields such as urban digital twins, autonomous driving, and surveying and mapping. Unlike single-source reconstruction tasks, air-ground fusion scenes typically exhibit large scale variations, heterogeneous data distributions, and complex occlusion relationships, which complicate geometric consistency modeling and texture detail restoration. Especially when processing multi-source images with large parallax, bridging the significant differences in observation scale between acquisition devices and developing robust, adaptable 3D reconstruction methods remain key research topics in computer vision and photogrammetry. In recent years, Neural Radiance Fields (NeRF) based on implicit representations have made significant progress in novel view synthesis. Although these methods achieve high-quality view synthesis through volume rendering, they typically suffer from long training times, high memory consumption, and insufficient adaptability to large-scale scenes. The subsequently proposed 3D Gaussian Splatting (3DGS) combines an explicit Gaussian representation with differentiable rasterization, achieving a good balance between efficiency and quality. However, when standard 3DGS is applied directly to air-ground fusion scenes, the algorithm is prone to density control instability because of non-uniform observation viewpoints and differences in image scale. This instability manifests as floating artifacts, geometric misalignment, and training fluctuations, reducing reconstruction consistency across viewpoint regions. Therefore, further research is needed.
Summary of the Invention
[0003] The purpose of this application is to provide a 3D reconstruction method for air-ground scenes based on Gaussian splatting. The specific technical solution is as follows:
[0004] A Gaussian splatting-based 3D reconstruction method for air-ground scenes includes: S1, acquiring multi-source air-ground scene images and camera poses, including aerial images with large pitch angles and ground-level views; S2, instantiating an initial set of 3D spatial primitives from the sparse point cloud generated by structure from motion (SfM); S3, introducing a multi-metric adaptive density control mechanism to adaptively match Gaussian density at different scales; S4, projecting the 3D spatial primitives onto the 2D view plane using the camera Jacobian matrix, sorting them by depth, extracting color and opacity along the ray direction, and outputting the predicted 2D rendered image by integrating with the alpha blending formula; S5, introducing multi-dimensional geometric constraints to ensure the consistency of the physical surface; S6, after the closed loop from S2 to S5 converges, extracting and outputting a high-precision 3D reconstructed point cloud and novel-view 2D rendered images.
[0005] The multi-metric adaptive density control mechanism in S3 dynamically evaluates the contribution of Gaussian elements based on gradient, viewpoint importance, and geometric features. Guided by this scoring mechanism, three topological operations are performed: S3.1, cloning and splitting, to increase the density of under-reconstructed regions; S3.2, pruning, to remove invalid points; and S3.3, similarity merging, to compress redundant features.
[0006] S3.1 employs a multi-criteria fusion splitting strategy to achieve refined modeling of multi-scale scenes, specifically including:
[0007] S3.11. Establish a joint judgment criterion based on location gradient threshold and spatial scale constraint, specifically as follows:
[0008]
[0009] where the terms denote, respectively: the position vector of the i-th Gaussian point; its position gradient; the spatial extent of the scene; the scale vector of the i-th Gaussian element; and the largest component of that scale vector. This criterion identifies key areas requiring refinement through a joint assessment of position gradient magnitude and scale constraints.
[0010] S3.12. Construct a viewpoint importance assessment model to strengthen key regions that have significant visual contributions from multiple viewpoints, specifically:
[0011]
[0012] where the terms denote, respectively: the direction vector from the i-th Gaussian point to the camera; the forward vector of the j-th camera; the camera position; and the total number of cameras. Visually important regions are selected by the view importance score, and a splitting operation is performed when the stated condition on this score is met, so that computing resources are concentrated in visually salient areas. The angle consistency term of the i-th Gaussian element relative to the j-th camera is calculated as the dot product of the unit direction vector from the Gaussian element to the camera and the camera's forward direction vector; it characterizes the geometric alignment between the Gaussian element and the camera's viewing direction. The distance attenuation term of the i-th Gaussian element relative to the j-th camera is calculated from the Euclidean distance between the Gaussian element position and the camera center; it characterizes the influence of observation distance on the view importance of the Gaussian. The final term denotes the view importance score of the i-th Gaussian.
[0013] S3.13. Design a local geometric complexity index, which comprehensively considers the consistency between local point cloud density and scale to characterize the geometric complexity of a region. Specifically:
[0014]
[0015] where the terms denote, respectively: the local density; the distance between a point and its neighboring points; the scale variance; the scales of the neighborhood points; the geometric complexity index of the local region corresponding to the i-th Gaussian element; the number of neighbors; a numerical stability constant; and the total number of Gaussian elements participating in the normalization of geometric complexity.
[0016] S3.14. Based on the metrics calculated in S3.11-S3.13, the traditional fixed-number splitting is abandoned in favor of an adaptive splitting decision strategy that dynamically determines the number of split copies for each Gaussian point, adaptively increasing Gaussian density in detail-rich regions. Specifically:
[0017]
[0018] where the terms denote the number of splits for the i-th point and a clamping operation that restricts the result to a given range. The strategy allocates more Gaussian points to geometrically complex regions, achieving adaptive allocation of computing resources.
[0019] The pruning strategy in S3.2 achieves the removal of redundant point clouds through a multi-dimensional joint evaluation of opacity, visibility, and scale. Specifically, it includes:
[0020] S3.21. A dynamic opacity threshold that decreases linearly with the number of iterations is adopted, retaining sufficient geometric exploration capability in the early stage of training to accommodate large-scale search in air-ground scenes. Specifically:
[0021]
[0022] where the terms denote, respectively: the dynamic threshold that changes over the course of training; the current iteration number; the total number of iterations; and the pruning mask. This strategy preserves more exploration points in the early stages of training and focuses on effectively contributing points in the later stages.
[0023] S3.22. Introduce a visibility scoring mechanism based on omnidirectional observation data, which identifies visual redundancy through joint weighting of distance and angle, specifically:
[0024]
[0025] where the terms denote, respectively: the distance attenuation; the angle consistency; and the overall visibility score. A pruning operation is performed when the stated condition on this score is met, effectively removing points that are invisible from most viewpoints. The remaining terms denote the position vector of the i-th Gaussian element in three-dimensional space, the center position of the j-th camera, and the overall visibility score of the i-th Gaussian element.
[0026] S3.23. A dual-scale pruning constraint based on physical spatial extent and screen projection size is added to prevent large, dilated Gaussian points from occluding distant details or causing occlusion artifacts during air-ground viewpoint transitions. Specifically:
[0027]
[0028] where the terms denote the projection radius of the Gaussian point in screen space and the maximum allowed screen size. This mechanism ensures that the reconstruction results remain visually consistent.
[0029] In S3.3, during similarity merging, to further compress the model volume and eliminate overlapping redundancy, especially in high-density areas of overlapping air and ground data, a composite feature similarity merging algorithm is introduced. This algorithm fuses Gaussian point pairs that are spatially close, have similar color features, and are of consistent scale. Specifically:
[0030]
[0031] where the terms denote, for the i-th and j-th Gaussian elements: their position vectors; their color features; their scale vectors; the spatial distance between them; the color feature difference between them; and the scale difference between them. Gaussian point pairs that meet these conditions are weighted and fused into a new Gaussian point. This multi-dimensional filtering mechanism effectively eliminates redundant computation in the overlapping areas of aerial and ground data, significantly reduces memory usage, and improves rendering efficiency in large-scale scenes.
[0032] The optimization process in S5 is guided by geometric constraints to address surface noise, normal disorder, and opacity ambiguity in the initial state. By comparing the unconstrained state with the optimized stable structure, four geometric regularization mechanisms are proposed to correct the Gaussian element distribution. These constraints work synergistically as follows: S5.1 Spatial smoothness, which forces the construction of a continuous geometric surface by minimizing the spatial variance in the local neighborhood; S5.2 Normal consistency, which constrains the parallelism of adjacent Gaussian normal directions, effectively eliminating non-physical surface fragmentation artifacts; S5.3 Scale constraint, which limits the excessive expansion and variation of Gaussian elements to ensure the compactness of the geometric representation; and S5.4 Opacity entropy, which promotes the convergence of the opacity distribution towards binarization, thereby suppressing translucent artifacts and enhancing the clarity of object boundaries.
[0033] S5.1 introduces a spatial smoothing loss, which forces Gaussian elements in the local neighborhood to remain continuous in the depth direction. By minimizing the variance of the spatial position between the current point and the neighborhood centroid, it effectively smooths discrete surface noise. Specifically:
[0034]
[0035] where the terms denote, respectively: the spatial geometric center of the nearest neighbors of the i-th Gaussian; the spatial smoothing loss; and the position vector of the i-th Gaussian element in three-dimensional space.
[0036] In S5.2, the explicit normal attribute is used to impose a normal consistency constraint that penalizes non-parallelism between adjacent Gaussian normal directions and forces local surface normals toward uniformity, eliminating fragmentation artifacts on the reconstructed surface. Specifically:
[0037]
[0038] where the terms denote, respectively: the normal consistency loss; the set of nearest-neighbor Gaussians of the i-th Gaussian; and the surface normal vectors of the i-th and j-th Gaussian elements.
[0039] In S5.3, to prevent excessive growth of the Gaussian scale, which leads to loss of geometric detail or needle-like artifacts, a regularization penalty is applied to the Gaussian scale. It limits the maximum norm of the scale, forcing Gaussian elements to maintain a compact geometric shape. Specifically:
[0040]
[0041] where the term denotes the scale regularization loss.
[0042] In S5.4, to suppress floating noise and obtain a clear geometric surface, an opacity entropy loss is introduced that forces the opacity to converge toward the binary values 0 or 1, reducing blurred semi-transparent regions. Specifically:
[0043]
[0044] where the term denotes the opacity entropy loss.
[0045] Through the synergistic effect of the four constraints S5.1-S5.4, the model significantly improves the coherence and stability of the geometric structure in air-ground fusion scenes while maintaining high-frequency details. The overall optimization objective of the model is to combine image fidelity and geometric stability in order to address the multiple challenges posed by air-ground data fusion. The total loss function is a weighted sum of the core photometric reconstruction loss and a series of geometric structure constraint losses, defined as:
[0046]
[0047] where the terms denote the weight coefficients of the regularization terms, used to balance geometric constraints and photometric consistency.
[0048] The beneficial effects of this application are as follows. It enhances the robustness of the model under large parallax and non-uniform observation conditions by jointly optimizing geometric constraints and adaptive density control. To suppress floating artifacts and structural fractures effectively, it introduces neighborhood-based spatial smoothing, normal consistency, opacity entropy regularization, and a scale penalty, which are crucial for high-quality air-ground geometric alignment. By fusing gradient information, viewpoint importance, and geometric complexity, it dynamically manages the splitting and pruning of Gaussian points, improving the representation accuracy of detailed regions while controlling the model size.
Attached Figure Description
[0049] Figure 1 shows the overall framework of the AGAS-GS method proposed in this application. The method takes aerial and ground multi-view images and an SfM sparse point cloud as input.
[0050] Figure 2 is a schematic diagram of the adaptive density control mechanism for air-ground fusion scenes in this application.
[0051] Figure 3 visualizes the evolution of Gaussian morphology under the geometric constraints of this application.
[0052] Figure 4 compares the reconstructions of AGAS-GS and the baseline model on the SF dataset.
[0053] Figure 5 compares the rendering results of AGAS-GS and the baseline model.
[0054] Figure 6 is a visual comparison of point cloud reconstructions of the New York City dataset under different view inputs. From left to right: aerial view only, ground view only, and fused aerial-ground view. Red and orange boxes highlight texture details at street level, while green boxes indicate the geometry of building roofs.
[0055] Figure 7 compares the quality of 3D point cloud reconstruction in the density control module ablation.
[0056] Figure 8 compares the quality of 3D point cloud reconstruction in the regularization module ablation.
Detailed Implementation
[0057] To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to specific embodiments and accompanying drawings. It should be understood that these descriptions are merely exemplary and not intended to limit the scope of this application. Furthermore, descriptions of well-known structures and technologies are omitted in the following description to avoid unnecessarily obscuring the concepts of this application.
[0058] A method for 3D reconstruction of air-ground scenes based on Gaussian splatting includes:
[0059] S1. Acquire multi-source aerial and ground scene images, including aerial images with large pitch angles and ground images at eye level, as well as camera pose.
[0060] S2. Based on the sparse point cloud of the scene generated by structure from motion (SfM), an initial set of 3D spatial primitives is instantiated. This application uses 3DGS as the basic framework for scene representation. The process comprises three basic stages: Gaussian geometric representation, covariance construction, and splatting-based rasterized rendering. 3DGS explicitly represents the 3D scene with a set of discrete Gaussians; each Gaussian acts as an isotropic kernel function whose density distribution in three-dimensional space follows a Gaussian function. Each Gaussian is jointly determined by its center position $\mu$, a scale vector $s$ and rotation quaternion $q$ that control its geometry, its opacity $\alpha$, and a view-dependent color defined by spherical harmonic coefficients. The covariance matrix $\Sigma$ of the Gaussian density is constructed by combining its rotation matrix $R$ and scale matrix $S$, specifically:
[0061] $\Sigma = R S S^{\top} R^{\top}$
[0062] The color information is encoded with third-order spherical harmonics to model changes in illumination and viewing angle effectively.
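The covariance construction from a rotation and a scale can be illustrated with a minimal NumPy sketch. It assumes the standard 3DGS form, in which the covariance is the product R S Sᵀ Rᵀ with R derived from a unit quaternion in (w, x, y, z) order and S a diagonal matrix of the scale vector; the function names are hypothetical and not taken from the application.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize so R is a proper rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance(q, s):
    """Build Sigma = R S S^T R^T (assumed standard 3DGS construction)."""
    R = quat_to_rotmat(np.asarray(q, dtype=float))
    S = np.diag(np.asarray(s, dtype=float))
    M = R @ S
    return M @ M.T  # symmetric positive semi-definite by construction

# Identity rotation: the covariance is just the squared scales on the diagonal.
cov = covariance([1.0, 0.0, 0.0, 0.0], [0.5, 1.0, 2.0])
# A non-trivial rotation still yields a symmetric matrix.
cov_rot = covariance([0.9, 0.1, 0.3, 0.2], [0.5, 1.0, 2.0])
```

Factoring the covariance this way keeps it valid under gradient updates, since any quaternion and scale vector map to a positive semi-definite matrix.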
[0063] S3. A multi-metric adaptive density control mechanism is introduced to achieve adaptive matching of Gaussian density at different scales. Specifically, the multi-metric adaptive density control mechanism dynamically evaluates the contribution of Gaussian elements based on gradient, viewpoint importance, and geometric features. Guided by this scoring mechanism, three topological operations are performed: S3.1, cloning and splitting, to increase the density of under-reconstructed regions; S3.2, pruning, to remove invalid points; and S3.3, similarity merging, to compress redundant features.
[0064] S3.1 employs a multi-criteria fusion splitting strategy to achieve refined modeling of multi-scale scenes, specifically including:
[0065] S3.11. Establish a joint judgment criterion based on location gradient threshold and spatial scale constraint, specifically as follows:
[0066]
[0067] where the terms denote, respectively: the position vector of the i-th Gaussian point; its position gradient; the spatial extent of the scene; the scale vector of the i-th Gaussian element; and the largest component of that scale vector. This criterion identifies key areas requiring refinement through a joint assessment of position gradient magnitude and scale constraints.
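A joint gradient-and-scale test of this kind might look like the following sketch. The exact inequality is not preserved in this text, so the form used here (gradient magnitude above a threshold, and largest scale component above a fraction of the scene extent) is an assumption, as are the function name and the threshold values.

```python
import numpy as np

def split_candidates(pos_grad, scales, scene_extent,
                     grad_thresh=2e-4, scale_frac=0.01):
    """Boolean mask of Gaussians selected for splitting.

    Assumed criterion: ||position gradient|| > grad_thresh AND
    max(scale) > scale_frac * scene_extent. Thresholds are hypothetical.
    """
    grad_mag = np.linalg.norm(pos_grad, axis=1)   # per-Gaussian gradient magnitude
    max_scale = scales.max(axis=1)                # largest scale component
    return (grad_mag > grad_thresh) & (max_scale > scale_frac * scene_extent)

pos_grad = np.array([[3e-4, 0.0, 0.0], [1e-5, 0.0, 0.0], [5e-4, 0.0, 0.0]])
scales = np.array([[0.5, 0.1, 0.1], [0.5, 0.5, 0.5], [0.001, 0.001, 0.001]])
mask = split_candidates(pos_grad, scales, scene_extent=10.0)
```

Only the first Gaussian passes both tests: the second has too small a gradient and the third is too small relative to the scene.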
[0068] S3.12. Construct a viewpoint importance assessment model to strengthen key regions that have significant visual contributions from multiple viewpoints, specifically:
[0069]
[0070] where the terms denote, respectively: the direction vector from the i-th Gaussian point to the camera; the forward vector of the j-th camera; the camera position; and the total number of cameras. Visually important regions are selected by the view importance score, and a splitting operation is performed when the stated condition on this score is met, so that computing resources are concentrated in visually salient areas. The angle consistency term of the i-th Gaussian element relative to the j-th camera is calculated as the dot product of the unit direction vector from the Gaussian element to the camera and the camera's forward direction vector; it characterizes the geometric alignment between the Gaussian element and the camera's viewing direction. The distance attenuation term of the i-th Gaussian element relative to the j-th camera is calculated from the Euclidean distance between the Gaussian element position and the camera center; it characterizes the influence of observation distance on the view importance of the Gaussian. The final term denotes the view importance score of the i-th Gaussian.
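The view importance score combines an angle term and a distance term per camera. The sketch below is one possible reading, under stated assumptions: the dot product is negated and clipped at zero (a camera looking at a Gaussian has a forward vector opposite to the Gaussian-to-camera direction), the attenuation is taken as 1 / (1 + distance), and the per-camera products are averaged. None of these specific forms is confirmed by the text.

```python
import numpy as np

def view_importance(points, cam_centers, cam_forwards, eps=1e-8):
    """Per-Gaussian view importance averaged over all cameras (assumed form)."""
    to_cam = cam_centers[None, :, :] - points[:, None, :]   # (N, M, 3)
    dist = np.linalg.norm(to_cam, axis=2)                   # (N, M) Euclidean distance
    unit = to_cam / np.maximum(dist, eps)[..., None]        # unit Gaussian-to-camera direction
    # Angle consistency: dot product with the camera forward vector.
    # Sign flip and clipping are assumptions of this sketch.
    angle = np.clip(-(unit * cam_forwards[None, :, :]).sum(axis=2), 0.0, 1.0)
    atten = 1.0 / (1.0 + dist)                              # assumed attenuation
    return (angle * atten).mean(axis=1)

points = np.array([[0.0, 0.0, 0.0]])
cam_centers = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, 1.0]])
cam_forwards = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])  # first looks at the point, second looks away
score = view_importance(points, cam_centers, cam_forwards)
```

Here the first camera contributes 1.0 × 0.5 and the second contributes nothing, so the averaged score is 0.25.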
[0071] S3.13. Design a local geometric complexity index, which comprehensively considers the consistency between local point cloud density and scale to characterize the geometric complexity of a region. Specifically:
[0072]
[0073] where the terms denote, respectively: the local density; the distance between a point and its neighboring points; the scale variance; the scales of the neighborhood points; the geometric complexity index of the local region corresponding to the i-th Gaussian element; the number of neighbors; a numerical stability constant; and the total number of Gaussian elements participating in the normalization of geometric complexity.
[0074] S3.14. Based on the metrics calculated in S3.11-S3.13, the traditional fixed-number splitting is abandoned in favor of an adaptive splitting decision strategy that dynamically determines the number of split copies for each Gaussian point, adaptively increasing Gaussian density in detail-rich regions. Specifically:
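A local geometric complexity index built from neighborhood density and scale variance could be sketched as follows. The combination rule is not preserved in this text; the sketch assumes density is the inverse mean distance to the k nearest neighbors, scale variance is the variance of the neighbors' largest scale component, the raw index is their product, and the result is normalized by the mean over all Gaussians. The brute-force neighbor search is only suitable for small inputs.

```python
import numpy as np

def geometric_complexity(points, scales, k=3, eps=1e-8):
    """Normalized per-Gaussian complexity index (assumed formulation)."""
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None], axis=2)  # (N, N) pairwise distances
    np.fill_diagonal(d, np.inf)                                  # exclude self-matches
    nbr = np.argsort(d, axis=1)[:, :k]                           # k nearest neighbors
    nbr_dist = np.take_along_axis(d, nbr, axis=1)
    density = 1.0 / (nbr_dist.mean(axis=1) + eps)                # local density
    scale_var = scales.max(axis=1)[nbr].var(axis=1)              # neighborhood scale variance
    raw = density * scale_var
    return raw / (raw.sum() / n + eps)                           # normalize by the mean

rng = np.random.default_rng(0)
pts = rng.random((20, 3))
scl = rng.random((20, 3)) * 0.1
complexity = geometric_complexity(pts, scl)
```

After normalization the index averages to roughly one, so values above one mark regions that are more complex than typical.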
[0075]
[0076] where the terms denote the number of splits for the i-th point and a clamping operation that restricts the result to a given range. The strategy allocates more Gaussian points to geometrically complex regions, achieving adaptive allocation of computing resources.
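The clamped split count can be illustrated with a short sketch. The mapping from complexity to a raw count is not preserved in this text; a linear mapping with a hypothetical gain is assumed here, and the range bounds are placeholder values.

```python
import numpy as np

def split_counts(complexity, n_min=2, n_max=8, gain=2.0):
    """Number of split copies per Gaussian, clamped to [n_min, n_max].

    The linear mapping n_min + gain * complexity is an assumption.
    """
    raw = np.round(n_min + gain * np.asarray(complexity)).astype(int)
    return np.clip(raw, n_min, n_max)

counts = split_counts([0.0, 1.0, 10.0])
```

Flat regions receive the minimum number of copies, while a very complex region saturates at the upper bound instead of growing without limit.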
[0077] The pruning strategy in S3.2 achieves the removal of redundant point clouds through a multi-dimensional joint evaluation of opacity, visibility, and scale. Specifically, it includes:
[0078] S3.21. A dynamic opacity threshold that decreases linearly with the number of iterations is adopted, retaining sufficient geometric exploration capability in the early stage of training to accommodate large-scale search in air-ground scenes. Specifically:
[0079]
[0080] where the terms denote, respectively: the dynamic threshold that changes over the course of training; the current iteration number; the total number of iterations; and the pruning mask. This strategy preserves more exploration points in the early stages of training and focuses on effectively contributing points in the later stages.
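A linearly decreasing opacity threshold and the resulting pruning mask might be sketched as below. The start and end values are hypothetical, and the mask is assumed to select Gaussians whose opacity falls below the current threshold.

```python
import numpy as np

def opacity_threshold(t, total, tau_start=0.05, tau_end=0.005):
    """Linear schedule from tau_start (t = 0) to tau_end (t = total)."""
    frac = min(max(t / total, 0.0), 1.0)  # clamp progress to [0, 1]
    return tau_start + (tau_end - tau_start) * frac

def prune_mask(opacity, t, total):
    """True where a Gaussian's opacity is below the current threshold."""
    return np.asarray(opacity) < opacity_threshold(t, total)

early = prune_mask([0.01, 0.1], t=0, total=100)
late = prune_mask([0.01, 0.1], t=100, total=100)
```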
[0081] S3.22. Introduce a visibility scoring mechanism based on omnidirectional observation data, which identifies visual redundancy through joint weighting of distance and angle, specifically:
[0082]
[0083] where the terms denote, respectively: the distance attenuation; the angle consistency; and the overall visibility score. A pruning operation is performed when the stated condition on this score is met, effectively removing points that are invisible from most viewpoints. The remaining terms denote the position vector of the i-th Gaussian element in three-dimensional space, the center position of the j-th camera, and the overall visibility score of the i-th Gaussian element.
[0084] S3.23. A dual-scale pruning constraint based on physical spatial extent and screen projection size is added to prevent large, dilated Gaussian points from occluding distant details or causing occlusion artifacts during air-ground viewpoint transitions. Specifically:
[0085]
[0086] where the terms denote the projection radius of the Gaussian point in screen space and the maximum allowed screen size. This mechanism ensures that the reconstruction results remain visually consistent.
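The dual-scale constraint can be sketched as a mask that flags a Gaussian when it is too large in world space or too large on screen. Combining the two tests with a logical OR, and the specific limits, are assumptions of this sketch.

```python
import numpy as np

def oversize_mask(scales, screen_radius, scene_extent,
                  world_frac=0.1, max_screen_px=20.0):
    """True where a Gaussian exceeds the world-space or screen-space limit.

    world_frac and max_screen_px are hypothetical limits.
    """
    too_big_world = scales.max(axis=1) > world_frac * scene_extent
    too_big_screen = np.asarray(screen_radius) > max_screen_px
    return too_big_world | too_big_screen

scales = np.array([[0.1, 0.1, 0.1], [2.0, 0.1, 0.1], [0.1, 0.1, 0.1]])
screen_radius = np.array([5.0, 5.0, 50.0])
prune = oversize_mask(scales, screen_radius, scene_extent=10.0)
```

The second Gaussian is flagged for its physical extent and the third for its projected radius, which covers the case of a small Gaussian sitting very close to a ground-level camera.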
[0087] In S3.3, during similarity merging, to further compress the model volume and eliminate overlapping redundancy, especially in high-density areas of overlapping air and ground data, a composite feature similarity merging algorithm is introduced. This algorithm fuses Gaussian point pairs that are spatially close, have similar color features, and are of consistent scale. Specifically:
[0088]
[0089] where the terms denote, for the i-th and j-th Gaussian elements: their position vectors; their color features; their scale vectors; the spatial distance between them; the color feature difference between them; and the scale difference between them. Gaussian point pairs that meet these conditions are weighted and fused into a new Gaussian point. This multi-dimensional filtering mechanism effectively eliminates redundant computation in the overlapping areas of aerial and ground data, significantly reduces memory usage, and improves rendering efficiency in large-scale scenes.
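Similarity merging can be illustrated with a greedy pairwise sketch. It assumes Euclidean norms for all three differences, hypothetical thresholds, opacity-weighted averaging for the fused attributes, and the larger opacity for the fused Gaussian; the text does not specify these choices. The quadratic loop is for illustration only.

```python
import numpy as np

def merge_similar(pos, color, scale, opacity, tau_p=0.05, tau_c=0.1, tau_s=0.05):
    """Greedily fuse pairs that are close in position, color, and scale."""
    n = len(pos)
    used = np.zeros(n, dtype=bool)
    merged = {"pos": [], "color": [], "scale": [], "opacity": []}
    for i in range(n):
        if used[i]:
            continue
        used[i] = True
        group = [i]
        for j in range(i + 1, n):
            if used[j]:
                continue
            if (np.linalg.norm(pos[i] - pos[j]) < tau_p
                    and np.linalg.norm(color[i] - color[j]) < tau_c
                    and np.linalg.norm(scale[i] - scale[j]) < tau_s):
                group.append(j)
                used[j] = True
                break  # fuse pairs only, as described in the text
        w = opacity[group] / opacity[group].sum()  # opacity weights (assumed)
        merged["pos"].append(w @ pos[group])
        merged["color"].append(w @ color[group])
        merged["scale"].append(w @ scale[group])
        merged["opacity"].append(opacity[group].max())
    return {key: np.array(val) for key, val in merged.items()}

pos = np.array([[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [1.0, 0.0, 0.0]])
color = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
scale = np.full((3, 3), 0.02)
opacity = np.array([0.5, 0.5, 0.9])
out = merge_similar(pos, color, scale, opacity)
```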
[0090] S4. Project the 3D spatial primitives onto the 2D view plane using the camera Jacobian matrix, sort them by depth, extract color and opacity along the ray direction, and output the predicted 2D rendered image by integrating with the alpha blending formula. This application explicitly extends the basic Gaussian primitive with two additional attributes, a surface normal and a depth uncertainty, which are updated jointly with the base parameters during optimization. During rendering, each 3D Gaussian is first projected onto the 2D image plane; the projection uses the camera Jacobian matrix $J$ to convert the 3D covariance $\Sigma$ into a 2D covariance matrix. All Gaussians within the camera's view frustum are then sorted in parallel by depth. The final pixel color $C$ is calculated with the opacity-weighted alpha blending formula:
[0091] $C = \sum_{i \in N} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$
[0092] where $N$ is the depth-sorted set of Gaussians, and $c_i$ and $\alpha_i$ are the color and opacity of the $i$-th Gaussian in that order. This rendering pipeline forms the basic computational framework of the algorithm, enabling all Gaussian parameters to be jointly optimized end to end through the photometric loss.
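Depth-sorted alpha blending for a single pixel can be sketched as follows. The sketch assumes the standard front-to-back compositing used in 3DGS, where each Gaussian's contribution is weighted by its opacity and by the transmittance left over from the Gaussians in front of it; per-pixel opacities are taken as given rather than evaluated from the projected 2D covariance.

```python
import numpy as np

def composite(colors, alphas, depths):
    """Front-to-back alpha blending of Gaussians covering one pixel."""
    order = np.argsort(depths)          # nearest Gaussian first
    pixel = np.zeros(3)
    transmittance = 1.0                 # fraction of light not yet absorbed
    for i in order:
        pixel += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return pixel, transmittance

# An opaque blue Gaussian behind a half-transparent red one (given out of order).
colors = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
alphas = np.array([1.0, 0.5])
depths = np.array([2.0, 1.0])
pixel, remaining = composite(colors, alphas, depths)
```

The red Gaussian contributes half its color and the blue one fills the remaining half, leaving no transmittance for anything further back.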
[0093] S5. Introducing multidimensional geometric constraints to ensure the consistency of the physical surface. Specifically, the optimization process is guided by geometric constraints to address surface noise, normal disorder, and opacity ambiguity in the initial state. By comparing the unconstrained state with the optimized stable structure, four geometric regularization mechanisms are proposed to correct the Gaussian element distribution. These constraints work synergistically as follows: S5.1 Spatial smoothness, which forces the construction of a continuous geometric surface by minimizing the spatial variance within the local neighborhood; S5.2 Normal consistency, which constrains the parallelism of adjacent Gaussian normal directions, effectively eliminating non-physical surface fragmentation artifacts; S5.3 Scale constraint, which limits the excessive expansion and variation of Gaussian elements to ensure the compactness of the geometric representation; S5.4 Opacity entropy, which promotes the convergence of the opacity distribution towards binarization, thereby suppressing translucent artifacts and enhancing the clarity of object boundaries.
[0094] S5.1 introduces a spatial smoothing loss, which forces Gaussian elements in the local neighborhood to remain continuous in the depth direction. By minimizing the variance of the spatial position between the current point and the neighborhood centroid, it effectively smooths discrete surface noise. Specifically:
[0095]
[0096] where the terms denote, respectively: the spatial geometric center of the nearest neighbors of the i-th Gaussian; the spatial smoothing loss; and the position vector of the i-th Gaussian element in three-dimensional space.
[0097] In S5.2, the explicit normal attribute is used to impose a normal consistency constraint that penalizes non-parallelism between adjacent Gaussian normal directions and forces local surface normals toward uniformity, eliminating fragmentation artifacts on the reconstructed surface. Specifically:
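A spatial smoothing loss of this kind can be sketched as the mean squared distance between each Gaussian center and the centroid of its k nearest neighbors. The averaging over Gaussians, the value of k, and the brute-force neighbor search are assumptions of this sketch.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (brute force)."""
    d = np.linalg.norm(points[:, None] - points[None], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def smoothness_loss(points, k=2):
    """Mean squared offset between each point and its neighborhood centroid."""
    nbr = knn_indices(points, k)
    centroid = points[nbr].mean(axis=1)              # (N, 3) neighborhood centers
    return np.mean(np.sum((points - centroid) ** 2, axis=1))

line = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
loss = smoothness_loss(line, k=2)
```

For three evenly spaced collinear points the middle one sits exactly on its neighbors' centroid, so only the two end points contribute.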
[0098]
[0099] where the terms denote, respectively: the normal consistency loss; the set of nearest-neighbor Gaussians of the i-th Gaussian; and the surface normal vectors of the i-th and j-th Gaussian elements.
[0100] In S5.3, to prevent excessive growth of the Gaussian scale, which leads to loss of geometric detail or needle-like artifacts, a regularization penalty is applied to the Gaussian scale. It limits the maximum norm of the scale, forcing Gaussian elements to maintain a compact geometric shape. Specifically:
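A normal consistency penalty over nearest neighbors might be sketched as below. The penalty form 1 − |nᵢ · nⱼ|, which treats opposite-facing normals as parallel, is an assumption; the text only states that non-parallel neighboring normals are penalized.

```python
import numpy as np

def normal_consistency_loss(points, normals, k=1):
    """Mean of 1 - |n_i . n_j| over each Gaussian's k nearest neighbors."""
    d = np.linalg.norm(points[:, None] - points[None], axis=2)
    np.fill_diagonal(d, np.inf)                       # exclude self
    nbr = np.argsort(d, axis=1)[:, :k]                # (N, k) neighbor indices
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    dots = np.einsum("nd,nkd->nk", unit, unit[nbr])   # n_i . n_j for each neighbor
    return np.mean(1.0 - np.abs(dots))

pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
nrm = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
loss = normal_consistency_loss(pts, nrm, k=1)
flat = normal_consistency_loss(pts, np.tile([0.0, 0.0, 1.0], (3, 1)), k=1)
```

Only the third Gaussian disagrees with its neighbor, giving a loss of one third, while a perfectly flat patch scores zero.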
[0101]
[0102] where the term denotes the scale regularization loss.
[0103] In S5.4, to suppress floating noise and obtain a clear geometric surface, an opacity entropy loss is introduced that forces the opacity to converge toward the binary values 0 or 1, reducing blurred semi-transparent regions. Specifically:
[0104]
[0105] where the term denotes the opacity entropy loss.
[0106] Through the synergistic effect of the four constraints S5.1-S5.4, the model significantly improves the coherence and stability of the geometric structure in air-ground fusion scenes while maintaining high-frequency details. The overall optimization objective of the model is to combine image fidelity and geometric stability in order to address the multiple challenges posed by air-ground data fusion. The total loss function is a weighted sum of the core photometric reconstruction loss and a series of geometric structure constraint losses, defined as:
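An opacity entropy loss that pushes opacities toward 0 or 1 can be sketched with the binary entropy function. Using the mean binary entropy and clipping opacities away from the endpoints are assumptions of this sketch; the exact form in the application is not preserved in this text.

```python
import numpy as np

def opacity_entropy_loss(alpha, eps=1e-7):
    """Mean binary entropy of the opacities; minimal when all are 0 or 1."""
    a = np.clip(np.asarray(alpha, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    entropy = -a * np.log(a) - (1.0 - a) * np.log(1.0 - a)
    return entropy.mean()

ambiguous = opacity_entropy_loss([0.5, 0.5])   # maximally uncertain opacities
binary = opacity_entropy_loss([0.0, 1.0])      # already binarized
```

Half-transparent Gaussians incur the largest penalty (the natural logarithm of 2 each), while fully transparent or fully opaque ones incur almost none.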
[0107]
[0108] where the terms denote the weight coefficients of the regularization terms, used to balance geometric constraints and photometric consistency.
[0109] S6. After the closed loop from S2 to S5 converges, a high-precision 3D reconstructed point cloud and novel-view 2D rendered images are extracted and output.
[0110] To make this application easier to understand, experimental examples are provided below for further explanation.
[0111] Dataset composition and experimental scenario: To fully verify the effectiveness and generalization ability of the proposed AGAS-GS method in air-ground image fusion scenarios, this study follows the experimental setup of UC-GS [23] (drone-assisted road Gaussian splatting with cross-view uncertainty). Since no single dataset provides high-quality air-ground coverage for diverse topologies, the multi-source fusion dataset provided by this benchmark was adopted:
[0112] Aerial imagery from UrbanScene3D is used to represent the NYC (New York) scene, providing high-density urban structure. Aerial imagery from the DroneDeploy dataset is used to represent the SF (San Francisco) scene, providing extensive terrain coverage.
[0113] For ground-based sources, the ground images corresponding to both scenes were retrieved and matched from the Mapillary platform.
[0114] These two scenarios represent distinctly different geometric challenges in air-ground fusion: NYC, utilizing the high-rise building features of UrbanScene3D, can be used to analyze the impact of occlusion and viewpoint differences on geometric consistency in urban environments. SF, emphasizing the large spatial scale and the difference in air-ground viewpoint resolution, poses a challenge to the robustness of algorithms under large-scale parallax conditions.
[0115] This dataset provides two key complementary perspectives: the drone aerial perspective, which provides sparse global geometric constraints and scene coverage, and the ground perspective, which provides detailed textures from the perspective of vehicles or pedestrians.
[0116] Data preprocessing: All raw images were resized to approximately 1600 × 1200 pixels. To ensure geometric consistency across multiple views, this experiment used the pre-computed camera poses provided by the benchmark dataset. These poses were estimated in advance using COLMAP with joint feature matching and bundle adjustment, thus registering the aerial and ground imagery accurately in a unified 3D coordinate system.
[0117] Evaluation Metrics: To evaluate the novel view synthesis results objectively and comprehensively, three standard rendering quality metrics are used: Peak Signal-to-Noise Ratio (PSNR), which measures absolute pixel error; Structural Similarity Index (SSIM), which evaluates structural similarity; and Learned Perceptual Image Patch Similarity (LPIPS), which reflects differences in human perception. These metrics quantify how accurately the proposed method restores texture and structural detail from different viewpoints when processing air-ground fusion data.
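For reference, PSNR follows directly from the mean squared error between a rendered image and its ground truth. The following minimal sketch assumes images flattened to lists of pixel values in [0, 1]; the function name and the data layout are illustrative only and are not taken from the evaluation code of the experiments.

```python
import math

def psnr(reference, rendered, max_value=1.0):
    """PSNR in dB between two equally sized flat lists of pixel values."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of 2 dB corresponds to a reduction of the mean squared error by a factor of about 1.58.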
[0118] To verify the effectiveness of the proposed Gaussian splatting-based 3D reconstruction method for air-ground scenes (AGAS-GS) in air-ground fusion scenes and the actual contribution of its improved modules, several representative Gaussian representation methods were selected as comparative baselines. These baselines cover different research directions, including the original framework, multi-scale optimization, feature enhancement, and structural improvement. Table 1 details the specific features and selection rationale for each comparative model.
[0119] Table 1 Comparison of Benchmark Models
[0120]
[0121] Experimental Setup and Fairness: To ensure rigorous comparability and fairness of all performance comparisons, all comparison models were reproduced using publicly available implementations or official source code. Testing was conducted under identical conditions, including the same dataset, input resolution, optimizer parameters, and completely consistent training settings.
[0122] Environment Configuration: To ensure consistency, all experiments were conducted on the same high-performance computing server. See Table 2 for detailed hardware and software configurations.
[0123] Table 2 Hardware and Software Environment Configuration
[0124]
[0125] The model was trained with the Adam optimizer for a total of 30,000 iterations. The learning rate of the position parameters follows an exponential decay schedule: the initial learning rate was set to ...; after 2,000 iterations of linear warm-up, it decays smoothly to its final value following an exponential function. In addition, a dynamic feedback mechanism based on gradient variance monitors optimization stability during training; once oscillation is detected, dynamic learning rate decay and gradient norm clipping are triggered automatically.
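As a non-limiting illustration, a schedule with linear warm-up followed by exponential decay might be sketched as below. The warm-up length (2,000) and iteration count (30,000) come from the text; the values of `lr_init` and `lr_final` are placeholders chosen for the example, since the actual learning rates are not reproduced here, and the choice to measure the decay phase from the end of warm-up is also an assumption.

```python
def position_lr(step, lr_init=1.6e-4, lr_final=1.6e-6,
                warmup_steps=2000, total_steps=30000):
    """Linear warm-up followed by exponential decay.

    lr_init and lr_final are placeholder values, not those of the source.
    """
    if step < warmup_steps:
        # Ramp linearly from lr_init / warmup_steps up to lr_init.
        return lr_init * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    t = min(max(t, 0.0), 1.0)
    # Geometric interpolation: lr_init at t = 0, lr_final at t = 1.
    return lr_init * (lr_final / lr_init) ** t
```

Geometric interpolation makes the learning rate fall by the same ratio over equal numbers of iterations, which is what an exponential decay means in practice.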
[0126] The total loss function is a weighted sum of the photometric reconstruction loss and the four geometric constraints. The weights of the five components are set to 1.0, 0.01, 0.005, 0.001, and 0.01, respectively.
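As a non-limiting illustration, the weighted sum might be assembled as below. The five weight values are those listed above; their pairing with individual loss terms is an assumption (photometric term first, then the constraints in the order S5.1 to S5.4), because the weight symbols are not reproduced in this text. The dictionary keys are illustrative names.

```python
# Assumed pairing of the five listed weights with the loss terms,
# taken in the order photometric, S5.1, S5.2, S5.3, S5.4.
WEIGHTS = {
    "photometric": 1.0,
    "smooth": 0.01,
    "normal": 0.005,
    "scale": 0.001,
    "opacity": 0.01,
}

def total_loss(terms, weights=WEIGHTS):
    """Weighted sum of loss terms; `terms` maps term name to a scalar loss."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)
```

With these magnitudes the photometric term dominates the objective, and the geometric terms act as comparatively light regularizers.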
[0127] For density control, adaptive densification is performed every 100 iterations. Splitting decisions rely on a joint assessment of the position gradient, viewpoint importance, and local geometric complexity. The pruning strategy manages the point cloud distribution in air-ground fusion scenes by evaluating a dynamic opacity threshold, a multi-view visibility score, and physical scale constraints.
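As a non-limiting illustration, the densification interval and a joint pruning test might be sketched as below. The interval of 100 iterations comes from the text. How the three pruning criteria are combined is an assumption (a Gaussian is pruned here if it fails any one of them), and all threshold arguments are hypothetical inputs rather than values from the source.

```python
def should_densify(step, interval=100):
    """True on every `interval`-th iteration, skipping step 0."""
    return step > 0 and step % interval == 0

def prune_mask(opacities, visibility, max_scales,
               opacity_thresh, vis_thresh, scale_limit):
    """Return one boolean per Gaussian; True marks it for pruning.

    A Gaussian is pruned if its opacity is below the (dynamic) threshold,
    its visibility score is too low, or its largest scale exceeds the
    limit. The OR combination is an assumption made for this sketch.
    """
    return [
        a < opacity_thresh or v < vis_thresh or s > scale_limit
        for a, v, s in zip(opacities, visibility, max_scales)
    ]
```

In a training loop of this shape, `prune_mask` would be evaluated only on iterations where `should_densify` returns true, with the opacity threshold recomputed for the current iteration.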
[0128] Table 3. Comparison of quantitative performance between AGAS-GS and SOTA methods in air-ground fusion scenarios.
[0129]
[0130] Table 3 presents the quantitative evaluation results of AGAS-GS in the air-to-ground fusion scenarios of NYC and SF. Compared with existing Gaussian methods, AGAS-GS demonstrates improved performance. In terms of PSNR, the proposed method achieves 29.37 dB and 32.30 dB in the two scenarios, respectively, exceeding the original 3DGS by more than 2 dB. These data validate the effectiveness of the multi-metric fusion splitting strategy in improving geometric consistency and local detail representation. Meanwhile, the SSIM metrics reach 0.920 and 0.910, respectively, outperforming Scaffold-GS and other baseline models, indicating that the proposed adaptive splitting and scale constraint mechanism can maintain structural stability under mixed perspective conditions.
[0131] Furthermore, AGAS-GS achieved a lower LPIPS (a perceptual quality metric for which lower is better) than 3DGS and Mip-GS. This reflects the effectiveness of the pruning mechanism in suppressing redundant Gaussians and projection artifacts, thereby enhancing the visual realism of the rendering. Notably, UC-GS and 2DGS exhibited varying degrees of performance degradation in both scenarios, particularly a significant increase in LPIPS in the SF (San Francisco) scenario, while AGAS-GS remained relatively stable. This further corroborates the geometric adaptability and cross-scale robustness of the method under the non-uniform viewpoint distribution characteristic of air-ground fusion. The overall performance of AGAS-GS on these three core metrics confirms the practical contribution of the proposed split scheduling and multi-dimensional pruning strategies to the reconstruction of large-scale air-ground fusion scenes.
[0132] Figure 4 presents a qualitative comparison of point cloud reconstruction quality for large scenes, revealing the differences between methods in geometric integrity, noise control, and detail fidelity. Specifically, 3D-GS exhibits structural discontinuities in areas such as building edges and beam-column structures, manifesting as sheet-like fractures and floating artifacts, with poorly defined boundaries. While 2D-GS and Mip-GS generate high-density point clouds, visible geometric blurring and deformation remain in thin structures such as window frames. Scaffold-GS improves the recovery of some main outlines; however, noise accumulation and local structural fractures appear in its interior areas, reflecting its limitations in modeling complex mixed air-ground scenes. The reconstruction results of UC-GS contain scattered discrete noise, resulting in a loose overall geometric structure and insufficient edge coherence. In contrast, the AGAS-GS method proposed in this application suppresses anomalous floating points and random noise, generating point clouds with sharper boundaries, more uniform distribution, and better topological coherence in detailed areas such as building facades, window structures, and road surfaces, demonstrating better geometric consistency and structural stability. This visual result indicates that the adaptive Gaussian splitting schedule and the saliency-aware pruning strategy constrain the proliferation of redundant Gaussian ellipsoids and optimize their spatial distribution, thereby improving the accuracy and robustness of geometric reconstruction. Following common practice in the field, point clouds are subsequently extracted from the optimized Gaussian centers for quantitative geometric quality analysis.
[0133] As shown in Figure 5, this experiment qualitatively compares the proposed method with representative Gaussian rendering techniques in two large-scale outdoor scenes, NYC and SF. For high-frequency texture regions in the NYC scene, all baseline methods exhibit varying degrees of detail loss: 2D-GS and 3D-GS show diffusion at character edges; Mip-GS and UC-GS retain some sharpness in local structure but still smooth out texture; the multi-layer filtering mechanism of Scaffold-GS leads to loss of font texture detail and reduced local legibility. In contrast, the proposed method better recovers text boundaries and high-contrast textures, and its reconstruction is closer to the real image in terms of local gradient changes, indicating that the proposed representation and optimization mechanism is robust in preserving high-frequency information.
[0134] In structurally complex areas such as building facades in the SF scene, significant differences in geometric consistency exist among different methods. 2D-GS, Scaffold-GS, and UC-GS exhibit problems such as blurring, misalignment, and incomplete structures in distant building outlines and facade edges, reflecting their susceptibility to parallax-induced reconstruction biases when cross-scale geometric constraints are insufficient. Mip-GS and 3D-GS suppress random noise to some extent, but still exhibit slight structural deformation and texture fluctuations. In contrast, the proposed method maintains better geometric continuity at building edges, with relatively intact high-frequency structures in window frames, and a consistent transition between shadows and materials.
[0135] The introduction of a ground-air fusion perspective provides complementary observation coverage for scene geometry: the ground perspective enhances texture constraints in lower-level and street areas, while the aerial perspective supplements global structural and scale information. This joint constraint across perspectives alleviates the geometric degradation problem that traditional methods are prone to in sparse observation areas, enabling the proposed method to exhibit good structural consistency and detail restoration capabilities in large-scale outdoor scenes.
[0136] To verify the role of viewpoint complementarity in large-scale urban reconstruction, this section systematically evaluates the reconstruction performance of ground-only, air-only, and air-ground fusion modes on the NYC and SF datasets. The quantitative results in Table 4 show that single-view input has clear performance limitations when processing complex urban scenes, while the air-ground fusion strategy achieves consistent improvement across the metrics.
[0137] Table 4. Quantitative comparison of reconstruction quality between air-to-ground fusion and single-source (air / ground) scenes on the NYC and SF datasets.
[0138]
[0139] In SF scenes, reconstruction relying solely on ground-based perspectives faces challenges due to large-scale parallax and occlusion, resulting in a PSNR of 27.72 dB. By integrating aerial data, the proposed method improves the PSNR to 32.30 dB and reduces LPIPS from 0.111 to 0.090. This result demonstrates that the global geometric constraints provided by the aerial perspective play a crucial role in enhancing reconstruction quality under complex terrain conditions.
[0140] In the NYC scenario, although the aerial mode alone achieved high structural similarity (SSIM 0.949) due to its coverage of the tops of tall buildings, its LPIPS score (0.118) was still higher than the fused mode, indicating a perceptual bias in near-ground texture recovery when relying solely on aerial data. The aerial-ground fusion mode further optimized the LPIPS to 0.060, achieving superior visual perception quality. Comparative analysis shows that the proposed method outperforms baseline methods such as 3D-GS in the fusion mode, validating the model's effectiveness in handling heterogeneous data.
[0141] The point cloud visualization results in Figure 6 further corroborate the quantitative analysis, demonstrating the differences in geometric contributions from data from different viewpoints. Reconstruction based solely on the ground view is limited by the upward perspective; while it preserves good texture details in close-up street areas, it exhibits significant geometric loss at the tops of buildings, failing to restore a complete urban skyline.
[0142] Conversely, reconstruction based solely on an aerial perspective effectively completes the geometry of the building's top and maintains topological connectivity, but the point cloud exhibits significant sparsity and texture blurring at the street level, which is insufficient to meet the needs of high-precision close-range observation.
[0143] The air-ground fusion mode combines the advantages of both perspectives, reconstructing the geometry of high-rise building rooftops while maintaining sufficient texture density at the street level. Visual comparisons show that the fusion reconstruction achieves a better balance between coverage and detail fidelity, effectively mitigating blind spots and blurring issues present in single-view modes, and enabling high-quality reconstruction of complex urban scenes.
[0144] To verify the effectiveness of the proposed method in processing multi-source air-ground data, particularly the specific contributions of the regularization and density control modules to reconstruction quality, this section presents a quantitative analysis on the NYC and SF datasets. As shown in Table 5, the baseline 3DGS method achieves a PSNR of 27.27 dB in both scenarios, indicating that the original 3DGS method alone has difficulty handling the geometric differences between aerial and ground views.
[0145] Table 5 Comparison of Module Ablation Experiment Results
[0146]
[0147] The introduction of the regularization module improved the PSNR in the SF scene to 28.67 dB and reduced the LPIPS error in the NYC scene to 0.13, demonstrating that regularization constraints can effectively promote the smooth transition of air-to-ground data at the feature level and enhance geometric continuity.
[0148] The introduction of the density control module utilizes the wide-area coverage characteristics of the aerial view to suppress artifacts in the ground blind zone, improving the PSNR of the NYC scene to 28.12 dB.
[0149] Through the synergistic integration of the above modules, the complete model achieves the most significant performance improvement: in the NYC scenario, the PSNR reaches 29.37 dB, and the LPIPS is the lowest (0.06); in the complex SF scenario, the PSNR is significantly higher, reaching 32.30 dB.
[0150] The visualization results in Figure 7 further corroborate the quantitative analysis above. Comparing the reconstruction results of the ablation variant and the complete model reveals that although individual modules improve the point cloud distribution to some extent, structural sparseness or noise interference still remains in certain areas.
[0151] In contrast, the point cloud generated by the complete model eliminates structural defects present in the ablation variant, exhibiting higher density and integrity. This indicates that only by combining the geometric smoothing effect of regularization with the blind zone suppression effect of density control can the complementary advantages of air-to-ground fusion data be fully utilized to achieve high-fidelity 3D reconstruction.
[0152] This paper proposes the AGAS-GS method to address the challenges of large parallax and geometric incompleteness in large-scale urban scene reconstruction. By establishing a unified air-ground fusion modeling framework, the method jointly optimizes geometric constraints and adaptive density control mechanisms. Experimental results show that the proposed method achieves better reconstruction quality than existing baselines in complex scenes such as NYC and SF, particularly in high-frequency detail recovery, boundary sharpness, and photometric consistency. Ablation experiments further confirm that the regularization term promotes cross-view feature smoothing, while the density control strategy suppresses artifacts in ground blind spots; together, these two factors contribute to high-fidelity air-ground geometric alignment. In conclusion, AGAS-GS provides an effective solution for heterogeneous multi-view urban 3D reconstruction.
Claims
1. A method for 3D reconstruction of an air-ground scene based on Gaussian splatting, characterized in that it comprises:
S1. Acquiring multi-source aerial and ground scene images and camera poses, including aerial images with large pitch angles and ground images at eye level;
S2. Instantiating the sparse point cloud of the scene, obtained by structure from motion, to generate the initial set of three-dimensional spatial primitives;
S3. Introducing a multi-metric adaptive density control mechanism to adaptively match the Gaussian density across spaces of different scales;
S4. Projecting the three-dimensional spatial primitives onto the two-dimensional view plane using the camera Jacobian matrix, sorting them by depth, extracting color and opacity along the ray direction, and outputting the predicted two-dimensional rendered image by accumulation with the alpha blending formula;
S5. Introducing multi-dimensional geometric constraints to ensure the consistency of the physical surface;
S6. After the closed loop of S2-S5 converges, extracting and outputting the high-precision 3D reconstructed point cloud and the novel-view 2D rendered images.
2. The method for 3D reconstruction of an air-ground scene based on Gaussian splatting according to claim 1, characterized in that the multi-metric adaptive density control mechanism in S3 dynamically evaluates the contribution of Gaussian primitives based on gradient, viewpoint importance, and geometric features, and, guided by this scoring mechanism, performs three topological operations:
S3.1 Cloning and splitting, used to increase the density of under-reconstructed regions;
S3.2 Pruning, used to remove invalid points;
S3.3 Similarity merging, used to compress redundant features.
3. The method for 3D reconstruction of an air-ground scene based on Gaussian splatting according to claim 2, characterized in that the multi-criteria fusion splitting strategy adopted in S3.1 enables refined modeling of multi-scale scenes, and specifically includes:
S3.11. Establishing a joint criterion based on a position gradient threshold and a spatial scale constraint, specifically: [formula]; where the symbols denote the position vector of the i-th Gaussian point, its position gradient, the spatial extent of the scene, the scale vector of the i-th Gaussian primitive, and the largest component of that scale vector; this criterion identifies key regions requiring refinement through a joint assessment of position gradient magnitude and the scale constraint;
S3.12. Constructing a viewpoint importance assessment model to strengthen key regions that contribute significantly to multiple viewpoints, specifically: [formula]; where the symbols denote the direction vector from the point to the camera, the forward vector of the j-th camera, the camera position, and the total number of cameras; visually important regions are selected according to the viewpoint importance, and a splitting operation is performed when the corresponding condition is met, so that computing resources are concentrated in visually salient regions; the angle consistency term of the i-th Gaussian primitive relative to the j-th camera is computed as the dot product of the unit direction vector from the Gaussian primitive to the camera and the camera forward vector, and characterizes the geometric alignment between the Gaussian primitive and the camera viewing direction; the distance attenuation term of the i-th Gaussian primitive relative to the j-th camera is computed from the Euclidean distance between the Gaussian position and the camera center, and characterizes the influence of observation distance on the viewpoint importance of the Gaussian; the final term is the viewpoint importance score of the i-th Gaussian;
S3.13. Designing a local geometric complexity index that jointly considers local point cloud density and scale consistency to characterize the geometric complexity of a region, specifically: [formula]; where the symbols denote the local density, the distance between the point and its neighboring points, the scale variance, the scale of the neighboring points, the geometric complexity index of the local region corresponding to each Gaussian primitive, the number of neighbors, the numerical stability constant, and the total number of Gaussian primitives participating in the geometric complexity normalization;
S3.14. An adaptive splitting decision strategy: based on the geometric complexity computed in S3.11-S3.13, the traditional fixed-number splitting is abandoned and the optimal number of split copies is determined dynamically for each Gaussian point, so as to adaptively increase the density of Gaussian primitives in detail-rich regions, specifically: [formula]; where the symbols denote the number of splits of the point and a clamping operation that restricts the result to the given range; the strategy allocates more Gaussian points to geometrically complex regions, achieving adaptive allocation of computing resources.
4. The method for 3D reconstruction of an air-ground scene based on Gaussian splatting according to claim 3, characterized in that the pruning strategy in S3.2 removes redundant points through a multi-dimensional joint evaluation of opacity, visibility, and scale, and specifically includes:
S3.21. Adopting a dynamic opacity threshold strategy that decreases linearly with the number of iterations, so that sufficient geometric exploration capability is retained in the early stage of training to suit the large-scale search of air-ground scenes, specifically: [formula]; where the symbols denote the time-varying dynamic threshold, the current iteration number, the total number of iterations, the pruning mask, and the opacity parameter of the i-th Gaussian primitive; this strategy preserves more exploratory points in the early stage of training and concentrates on effectively contributing points in the later stage;
S3.22. Introducing a visibility scoring mechanism based on omnidirectional observation data, which identifies visual redundancy through joint weighting of distance and angle, specifically: [formula]; where the symbols denote the distance attenuation, the angle consistency, the overall visibility score of the i-th Gaussian primitive, the position vector of the i-th Gaussian primitive in three-dimensional space, and the center position of the j-th camera; a pruning operation is performed when the corresponding condition on the visibility score is met, removing points that are invisible from most viewpoints;
S3.23. Adding a dual-scale constraint pruning mechanism based on physical spatial extent and screen projection size, to prevent large, inflated Gaussian points from occluding distant details or causing occlusion artifacts during air-ground viewpoint transitions, specifically: [formula]; where the symbols denote the projection radius of the Gaussian point in screen space and the maximum allowed screen size; this mechanism keeps the reconstruction results visually consistent.
5. The method for 3D reconstruction of an air-ground scene based on Gaussian splatting according to claim 4, characterized in that, in the similarity merging of S3.3, to further compress the model size and eliminate overlapping redundancy, especially in high-density regions where aerial and ground data overlap, a composite feature similarity merging algorithm is introduced that fuses pairs of Gaussian points that are spatially close, similar in color features, and consistent in scale, specifically: [formula]; where the symbols denote the position vectors of the i-th and j-th Gaussian primitives; the color features of the i-th and j-th Gaussian primitives; the scale vectors of the i-th and j-th Gaussian primitives; the spatial distance between the i-th and j-th Gaussian primitives; the color feature difference between the i-th and j-th Gaussian primitives; and the scale difference between the i-th and j-th Gaussian points; Gaussian point pairs that satisfy these conditions are fused by weighting into a new Gaussian point. This multi-dimensional filtering mechanism eliminates redundant computation in regions where aerial and ground data overlap, reduces memory usage, and improves rendering efficiency in large-scale scenes.
6. The method for 3D reconstruction of an air-ground scene based on Gaussian splatting according to claim 5, characterized in that the optimization process in S5 is guided by geometric constraints to address the surface noise, disordered normals, and opacity ambiguity of the initial state; the four proposed geometric regularization mechanisms correct the distribution of Gaussian primitives from the unconstrained state toward an optimized stable structure, and act jointly as follows:
S5.1 Spatial smoothness: forces the construction of continuous geometric surfaces by minimizing the spatial variance within the local neighborhood;
S5.2 Normal consistency: constrains adjacent Gaussian normal directions to be parallel, eliminating non-physical surface fragmentation artifacts;
S5.3 Scale constraint: limits excessive expansion and variation of Gaussian primitives to keep the geometric representation compact;
S5.4 Opacity entropy: drives the opacity distribution toward binarization, suppressing translucent artifacts and sharpening object boundaries.
7. The method for 3D reconstruction of an air-ground scene based on Gaussian splatting according to claim 6, characterized in that:
S5.1 introduces a spatial smoothness loss that forces Gaussian primitives in a local neighborhood to remain continuous in the depth direction; by minimizing the variance of spatial position between the current point and the centroid of its neighborhood, discrete surface noise is smoothed, specifically: [formula]; where the symbols denote the spatial geometric center of the nearest neighbors of the i-th Gaussian, the spatial smoothness loss, and the position vector of the i-th Gaussian primitive in three-dimensional space;
S5.2 uses the explicit normal attributes to impose a normal consistency constraint that penalizes non-parallel normal directions of adjacent Gaussians and drives local surface normals toward uniformity, eliminating fragmentation artifacts on the reconstructed surface, specifically: [formula]; where the symbols denote the normal consistency loss, the set of nearest-neighbor Gaussians of the i-th Gaussian, and the surface normal vectors of the i-th and j-th Gaussian primitives;
in S5.3, to prevent excessive growth of the Gaussian scale, which causes loss of geometric detail or needle-like artifacts, a regularization penalty is applied to the Gaussian scale, limiting the maximum norm of the scale and forcing Gaussian primitives to keep a compact geometric shape, specifically: [formula]; where the defined term is the scale regularization loss;
in S5.4, to suppress floater noise and obtain sharp geometric surfaces, an opacity entropy loss is introduced, forcing the opacity to converge toward the binary values 0 or 1 and reducing blurry semi-transparent regions, specifically: [formula]; where the defined term is the opacity entropy loss;
through the joint effect of the four constraints S5.1-S5.4, the model significantly improves the coherence and stability of the geometric structure in the air-ground fusion scene while preserving high-frequency details; the overall optimization objective of the model balances image fidelity and geometric stability to address the multiple challenges posed by air-ground data fusion; the total loss function is a weighted sum of the core photometric reconstruction loss and a series of geometric structuring constraint losses, defined as: [formula]; where the coefficients are the weights of the individual regularization terms, used to balance the geometric constraints against photometric consistency.