A panoramic image three-dimensional reconstruction method and device based on three-dimensional Gaussian splatting

CN122289508A · Pending · Publication Date: 2026-06-26 · SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI
Filing Date: 2024-12-26
Publication Date: 2026-06-26

Abstract

This invention relates to the field of 3D reconstruction technology, specifically to a method and apparatus for 3D reconstruction of panoramic images based on 3D Gaussian Splatting. The method includes: segmenting an acquired panoramic image into several perspective images; performing sparse reconstruction on the perspective images to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters; and, using the 3D Gaussian Splatting method, initializing Gaussian primitives from the sparse point cloud representation of the scene and refining the Gaussian distribution through iterative rendering from multiple viewpoints. The invention uses panoramic images or videos captured by a panoramic camera and a depth-map-regularized 3D Gaussian Splatting (3DGS) method to perform realistic 3D reconstruction, optimizing scene geometry and reducing stitching artifacts and floating artifacts.

Description

Technical Field

[0001] This invention relates to the field of 3D reconstruction technology, and more specifically, to a method and apparatus for 3D reconstruction of panoramic images based on 3D Gaussian Splatting.

Background Technology

[0002] 3D reconstruction is the process of recovering the geometric structure and appearance of a 3D object or scene from multiple 2D images or other sensor data. The goal is to fuse information from multiple viewpoints or data sources into an accurate 3D model. Image-based 3D reconstruction is a key technology in computer graphics and computer vision, widely used in robotics, autonomous navigation, filmmaking, medical imaging, heritage conservation, and virtual/augmented reality; it recovers the 3D structure and geometry of an object or scene from one or more 2D images. In recent years, neural networks have proven effective for 2D and 3D inference; however, most 3D estimation methods rely on supervised training and expensive annotations, and complete 3D ground truth is difficult to collect. Recent efforts have therefore focused on exploiting more readily available 2D information and weaker forms of supervision to understand 3D scenes.

[0003] In recent years, implicit continuous representations, which use functions or neural networks to encode geometry, have gained attention. These implicit methods take the coordinates of a spatial point as input and output information about the scene at that point. For example, Neural Radiance Fields (NeRF) learn an implicit function, parameterized by a multilayer perceptron (MLP), that represents complex 3D shapes: the input is the 3D point coordinates (x, y, z) together with the viewing direction, and the output is a 4D vector consisting of the emitted color (r, g, b) and the volume density σ. However, the original NeRF has very long training and inference times. Moreover, although the continuity of NeRF-style representations aids optimization, the stochastic sampling required during rendering is costly and can introduce noise.

[0004] Existing NeRF-based methods such as Mip-NeRF360 produce excellent rendering quality but still require very long training and rendering times (up to 48 hours of training), which limits their practical use.

Summary of the Invention

[0005] This invention provides a method and apparatus for three-dimensional reconstruction of panoramic images based on three-dimensional Gaussian splatting, so as to at least solve the technical problem of poor performance of existing three-dimensional reconstruction methods.

[0006] According to an embodiment of the present invention, a method for 3D reconstruction of panoramic images based on 3D Gaussian Splatting is provided, comprising the following steps:

[0007] The acquired panoramic image is segmented into several perspective images;

[0008] Sparse reconstruction is performed on the perspective images to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters;

[0009] Gaussian primitives are initialized from the sparse point cloud representation of the scene using the 3D Gaussian Splatting method, and the Gaussian distribution is refined through iterative rendering from multiple viewpoints.

[0010] Furthermore, the method also includes: regularizing the training process of the 3D Gaussian Splatting method using a pre-trained monocular depth estimation model.

[0011] Furthermore, regularizing the training process of the 3D Gaussian Splatting method using a pre-trained monocular depth estimation model includes:

[0012] A depth map is obtained by performing depth estimation on the sparsely reconstructed perspective images;

[0013] The depth map is calibrated using a depth loss function.

[0014] Furthermore, the method also includes: recording, with a panoramic camera, an indoor panoramic video of a preset resolution and duration, and using FFmpeg to extract panoramic images from the panoramic video at a preset frame rate.
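The frame-extraction step can be sketched as a thin Python wrapper around the FFmpeg command line. This is a minimal sketch, not the authors' script: the file names and the helpers `build_extract_cmd` and `extract_frames` are hypothetical, while the FFmpeg options used (`-i`, the `fps` video filter and a `%04d` numbered output pattern) are standard.

```python
import shlex
import subprocess
from pathlib import Path


def build_extract_cmd(video: str, out_dir: str, fps: float) -> list[str]:
    """Build an FFmpeg command that samples `fps` frames per second from `video`.

    Hypothetical helper: frames are written as numbered PNG files in `out_dir`.
    """
    pattern = str(Path(out_dir) / "pano_%04d.png")
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps:g}", pattern]


def extract_frames(video: str, out_dir: str, fps: float) -> None:
    """Run the extraction (requires the ffmpeg binary on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_cmd(video, out_dir, fps), check=True)


if __name__ == "__main__":
    # Show the command for a 3 fps extraction without executing it.
    print(shlex.join(build_extract_cmd("indoor_360.mp4", "panos", 3)))
```

Keeping command construction separate from execution makes the preset frame rate easy to check and log before any frames are written.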

[0015] Furthermore, segmenting the acquired panoramic image into several perspective images includes:

[0016] Meshroom is used to divide each panoramic image into several perspective images of a preset resolution.

[0017] Furthermore, performing sparse reconstruction on the perspective images to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters includes:

[0018] COLMAP is used to perform sparse reconstruction of the perspective images to obtain the sparse point cloud and the camera intrinsic and extrinsic parameters.

[0019] Furthermore, regularizing the training process of 3D Gaussian Splatting using a pre-trained monocular depth estimation model specifically includes:

[0020] The per-frame inverse depth $D_{\mathrm{sfm}}$ provided by COLMAP is used to obtain the calibrated depth map $D^{*}$; the rendered depth map $D$ and the calibrated depth map $D^{*}$ are obtained using the following formulas:

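[0021] $$D = \sum_{i} d_i \alpha_i T_i, \qquad t(D) = \operatorname{median}(D), \qquad s(D) = \frac{1}{M} \sum_{j \in \mathrm{SfM}} \left| D_j - t(D) \right|$$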

[0022] where $d_i$ is the average depth of the $i$-th Gaussian, $t(D) = \operatorname{median}(D)$ is the median, $\mathrm{SfM}$ denotes the set of indices of the structure-from-motion (SfM) points, and $M$ is the total number of SfM points in the image.
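The full calibration formula for $D^{*}$ is not reproduced in the text above. One common way to realize the described idea (a median $t(\cdot)$ and a mean-absolute-deviation scale $s(\cdot)$ evaluated at the SfM points) is to match both statistics of the monocular inverse depth to those of the SfM inverse depth, $D^{*} = s(D_{\mathrm{sfm}})\,(D_{\mathrm{mon}} - t(D_{\mathrm{mon}}))/s(D_{\mathrm{mon}}) + t(D_{\mathrm{sfm}})$. The NumPy sketch below implements that assumption; it is not necessarily the patent's exact formula, and the array layouts and function names are hypothetical.

```python
import numpy as np


def median_mad(values):
    """Return t = median(values) and s = mean absolute deviation from t."""
    values = np.asarray(values, dtype=float)
    t = np.median(values)
    s = np.mean(np.abs(values - t))
    return t, s


def align_mono_to_sfm(mono_inv_depth, sfm_inv_depth, sfm_pixels):
    """Align a monocular inverse-depth map to sparse SfM inverse depths.

    mono_inv_depth: H x W map from the monocular model (unknown scale/shift).
    sfm_inv_depth:  length-M array, inverse depth of each SfM point in the frame.
    sfm_pixels:     M x 2 integer array of (row, col) locations of those points.
    Returns D* = s_sfm * (D_mon - t_mon) / s_mon + t_sfm (assumed formula).
    """
    rows, cols = sfm_pixels[:, 0], sfm_pixels[:, 1]
    # Statistics of the monocular map, sampled only where SfM points exist.
    t_mon, s_mon = median_mad(mono_inv_depth[rows, cols])
    t_sfm, s_sfm = median_mad(sfm_inv_depth)
    normalized = (mono_inv_depth - t_mon) / max(s_mon, 1e-8)
    return normalized * s_sfm + t_sfm
```

Because the median and the mean absolute deviation are robust and both transform affinely, any positive scale and shift between the monocular and SfM inverse depths is removed exactly, while a few outlier SfM points have limited influence.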

[0023] Furthermore, a depth loss is incorporated into training as a regularizer; the gradient of the depth supervision is propagated to each Gaussian, updating its parameters and spatial position.
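A minimal sketch of the depth regularizer, assuming an L1 penalty between the rendered and calibrated depth maps added to the photometric loss with a hypothetical weight `lambda_depth`. The explicit per-pixel gradient shows where the supervision enters: if the rendered depth is alpha-blended as $D = \sum_i d_i \alpha_i T_i$, the chain rule sends each pixel's gradient to Gaussian $i$ scaled by its blending weight $\alpha_i T_i$. In a real implementation an autodiff framework performs this propagation.

```python
import numpy as np


def depth_l1_loss_and_grad(rendered, calibrated, valid=None):
    """L1 depth loss and its (sub)gradient with respect to the rendered depth.

    rendered, calibrated: H x W arrays; valid: optional boolean mask of pixels
    that have a calibrated value. Returns (loss, dloss/drendered).
    """
    if valid is None:
        valid = np.ones(rendered.shape, dtype=bool)
    n = max(int(valid.sum()), 1)
    diff = np.where(valid, rendered - calibrated, 0.0)
    loss = float(np.abs(diff).sum() / n)
    grad = np.sign(diff) / n  # subgradient of |x|, taken as 0 at x == 0
    return loss, grad


def total_loss(photometric, rendered, calibrated, lambda_depth=0.1, valid=None):
    """Photometric loss plus the weighted depth term (lambda_depth is hypothetical)."""
    depth_loss, _ = depth_l1_loss_and_grad(rendered, calibrated, valid)
    return photometric + lambda_depth * depth_loss
```

The L1 form is one plausible choice; the patent text does not state the exact penalty or weight.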

[0024] According to another embodiment of the present invention, a panoramic image 3D reconstruction device based on 3D Gaussian Splatting is provided, comprising:

[0025] The segmentation unit is used to segment the acquired panoramic image into several perspective images;

[0026] The sparse reconstruction processing unit is used to perform sparse reconstruction on the perspective images to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters;

[0027] The 3D reconstruction unit is used to initialize Gaussian primitives from the sparse point cloud representation of the scene using the 3D Gaussian Splatting method, and to refine the Gaussian distribution through iterative rendering from multiple viewpoints.

[0028] Furthermore, the device also includes:

[0029] The regularization unit is used to regularize the training process of the 3D Gaussian Splatting method using a pre-trained monocular depth estimation model.

[0030] A storage medium storing program files capable of implementing any of the above-mentioned panoramic image 3D reconstruction methods based on 3D Gaussian Splatting.

[0031] A processor for running a program, wherein the program, when run, executes any of the above-mentioned methods for 3D reconstruction of panoramic images based on 3D Gaussian Splatting.

[0032] The panoramic image 3D reconstruction method and apparatus based on 3D Gaussian Splatting in the embodiments of this invention use panoramic images or videos captured by a panoramic camera and perform realistic 3D reconstruction with a depth-map-regularized 3DGS method, thereby optimizing scene geometry and reducing stitching artifacts and floating artifacts.

Brief Description of the Drawings

[0033] The accompanying drawings, which are included to provide a further understanding of the invention and form part of this application, illustrate exemplary embodiments of the invention and, together with their description, serve to explain the invention and do not constitute an undue limitation thereof. In the drawings:

[0034] Figure 1 is the overall flowchart of 3DGS;

[0035] Figure 2 is the workflow of the 3DGS-based 3D reconstruction of panoramic images according to this invention;

[0036] Figure 3 is a visual comparison of the two methods at 7,000 iterations;

[0037] Figure 4 is a visual comparison of the two methods at 30,000 iterations.

Detailed Description of the Embodiments

[0038] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0039] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0040] Example 1

[0041] The overall process of 3DGS (3D Gaussian Splatting) is shown in Figure 1. The input to 3DGS is a set of static images whose cameras have been calibrated by structure from motion (SfM), together with the SfM sparse point cloud. From this sparse point cloud, 3DGS creates a set of anisotropic 3D Gaussians (ellipsoids) as a high-quality, unstructured scene representation that is continuous and differentiable within each Gaussian yet discrete across space. Specifically, each 3D Gaussian is defined by the following formula:

[0042] $$G(x) = \exp\left(-\frac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right)$$

[0043] where $\mu$ is the center of the Gaussian and the covariance matrix $\Sigma$ determines its shape and orientation. To keep $\Sigma$ positive semi-definite during optimization, it is decomposed into a scaling vector $s$ and a rotation quaternion $q$, $\Sigma = R S S^{\top} R^{\top}$ with $S = \operatorname{diag}(s)$ and $R$ the rotation matrix of $q$, so that $s$ and $q$ can be optimized directly by gradient descent. The color $c$ of each Gaussian is represented by spherical harmonics, as shown in the following formula:

[0044] $$c(\theta, \varphi) = \sum_{l=0}^{k} \sum_{m=-l}^{l} c_l^m \, Y_l^m(\theta, \varphi)$$

[0045] where $c_l^m$ are the coefficients, $Y_l^m$ are the spherical harmonic basis functions, $l$ is the degree, $m$ is the order, $\theta$ is the polar angle, and $\varphi$ is the azimuth angle. A higher maximum degree $k$ allows more view-dependent color to be expressed; the original 3DGS uses $k = 3$, i.e. 16 coefficients per color channel. Each Gaussian stores such a set of coefficients for each of the R, G and B channels, and additionally has an opacity coefficient $\alpha$.
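To make paragraphs [0043] to [0045] concrete, the NumPy sketch below builds $\Sigma = R S S^{\top} R^{\top}$ from a scale vector and a unit quaternion, evaluates the Gaussian $\exp(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu))$, and evaluates view-dependent color truncated to SH degree 1 (the original 3DGS uses degree 3). The constants are the standard real-SH normalization factors; the sign pattern, the +0.5 offset and the clamping follow the public 3DGS reference code as we understand it and should be treated as an assumption. Function names are hypothetical.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # 1 / (2 sqrt(pi)), degree-0 basis constant
SH_C1 = 0.4886025119029199   # sqrt(3) / (2 sqrt(pi)), degree-1 basis constant


def quat_to_rotmat(q):
    """Quaternion (w, x, y, z), normalized here, -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance(scale, q):
    """Sigma = R S S^T R^T: positive semi-definite by construction."""
    M = quat_to_rotmat(np.asarray(q, dtype=float)) @ np.diag(scale)
    return M @ M.T


def gaussian(x, mu, sigma):
    """Unnormalized 3D Gaussian value at point x."""
    d = np.asarray(x, dtype=float) - mu
    return float(np.exp(-0.5 * d @ np.linalg.solve(sigma, d)))


def sh_color_deg1(sh, view_dir):
    """RGB color from degree <= 1 SH. sh: 4 x 3 array (4 basis terms x RGB).

    view_dir: direction from the camera centre to the Gaussian (normalized here).
    """
    x, y, z = view_dir / np.linalg.norm(view_dir)
    c = SH_C0 * sh[0] - SH_C1 * y * sh[1] + SH_C1 * z * sh[2] - SH_C1 * x * sh[3]
    return np.clip(c + 0.5, 0.0, None)
```

Optimizing $s$ and $q$ instead of $\Sigma$ directly means every gradient step still yields a valid covariance, which is the reason given in [0043] for the decomposition.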

[0046] To project a 3D Gaussian ellipsoid onto a 2D image space, given the view transformation matrix W and the Jacobian matrix J of the affine approximation of the projection transformation, the 2D covariance matrix Σ′ is calculated using the following formula:

[0047] $$\Sigma' = J W \Sigma W^{\top} J^{\top}$$
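A sketch of the projection in [0046] and [0047], assuming a pinhole camera with focal lengths `fx`, `fy` in pixels, a 3×3 world-to-camera rotation `W`, and a Gaussian center already transformed into camera coordinates. `J` is the Jacobian of the perspective projection $(u, v) = (f_x x/z, f_y y/z)$ evaluated at the center; writing it as 2×3 yields the 2×2 image-space covariance directly. The function name is hypothetical.

```python
import numpy as np


def project_covariance(sigma, W, center_cam, fx, fy):
    """2D covariance Sigma' = J W Sigma W^T J^T (affine approximation).

    sigma: 3x3 world-space covariance; W: 3x3 world-to-camera rotation;
    center_cam: Gaussian centre in camera coordinates (x, y, z), z > 0.
    """
    x, y, z = center_cam
    # Jacobian of (fx * x / z, fy * y / z) with respect to (x, y, z).
    J = np.array([
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ])
    return J @ W @ sigma @ W.T @ J.T
```

For a unit isotropic Gaussian two units in front of a camera with $f_x = f_y = 100$, the projected covariance is $\operatorname{diag}(2500, 2500)$, i.e. a footprint with a standard deviation of 50 pixels.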

[0048] Based on the ellipses projected into 2D image space, a tile-based differentiable rasterizer renders the RGB image at the corresponding viewpoint, where the color $C$ of a pixel is computed using the following formula:

[0049] $$C = \sum_{i} c_i \alpha_i T_i$$

[0050] $$T_i = \prod_{j=1}^{i-1} (1 - \alpha_j)$$

[0051] In these formulas, the sum runs over the points (Gaussians) overlapping the pixel, sorted from front to back; $c_i$ is the color of point $i$ and $\alpha_i$ is its opacity (density). $\alpha_j$ is the opacity of each point $j$ in front of point $i$, and the cumulative product of $(1 - \alpha_j)$, which serves as the weight of point $i$, is the transmittance $T_i$. The more transparent the preceding points are, the larger the weight of point $i$ and the greater its contribution to the pixel color; conversely, the more opaque they are, the smaller its weight and contribution.
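Front-to-back alpha blending for a single pixel can be sketched as follows (hypothetical function name). The optional depth output reuses the same weights, $D = \sum_i d_i \alpha_i T_i$, which is how rendered depth is commonly obtained in depth-regularized 3DGS variants; treat that part as an assumption. In the actual rasterizer, $\alpha_i$ is the learned opacity multiplied by the projected 2D Gaussian evaluated at the pixel, and blending stops early once the transmittance becomes negligible.

```python
import numpy as np


def composite(colors, alphas, depths=None):
    """Front-to-back alpha blending of N >= 1 points sorted near-to-far.

    colors: N x 3, alphas: length N, depths: optional length N.
    Returns (C, D, T_final): pixel color, blended depth (None if no depths),
    and the transmittance left after the last point.
    """
    alphas = np.asarray(alphas, dtype=float)
    # T_i = prod_{j<i} (1 - alpha_j): light reaching the i-th point.
    T = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    w = alphas * T  # blending weight alpha_i * T_i
    C = (w[:, None] * np.asarray(colors, dtype=float)).sum(axis=0)
    D = None if depths is None else float((w * np.asarray(depths, dtype=float)).sum())
    return C, D, float(T[-1] * (1.0 - alphas[-1]))
```

For two half-transparent points, white in front of black, the weights are 0.5 and 0.25, so the pixel color is 0.5 and a quarter of the light passes through both.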

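[0052] Finally, the loss function $\mathcal{L} = (1 - \lambda)\mathcal{L}_1 + \lambda \mathcal{L}_{\text{D-SSIM}}$ is used to measure the difference between the rendered image and the ground-truth image and to optimize the parameters of the scene representation through backpropagation, where $\mathcal{L}_1$ is the L1 loss, $\mathcal{L}_{\text{D-SSIM}}$ is the structural similarity loss, and $\lambda$ is a weighting factor.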
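A sketch of this photometric loss, assuming the $(1-\lambda)\mathcal{L}_1 + \lambda(1 - \mathrm{SSIM})$ form of the original 3DGS paper, which uses $\lambda = 0.2$. To stay short, SSIM is computed globally over the whole image rather than with the 11×11 Gaussian window used in practice; this is a deliberate simplification, and the function names are hypothetical.

```python
import numpy as np


def global_ssim(a, b, data_range=1.0):
    """SSIM over the whole image (no sliding window): a simplification."""
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2))


def photometric_loss(rendered, target, lam=0.2):
    """(1 - lam) * L1 + lam * (1 - SSIM) between rendered and target images."""
    l1 = np.abs(rendered - target).mean()
    d_ssim = 1.0 - global_ssim(rendered, target)
    return float((1 - lam) * l1 + lam * d_ssim)
```

The L1 term drives per-pixel color accuracy, while the SSIM term rewards matching local contrast and structure.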

[0053] This invention uses panoramic images or videos captured by a panoramic camera to perform realistic 3D reconstruction with a depth-map-regularized 3DGS method. It applies 3DGS to the 3D reconstruction of panoramic images and uses depth maps from a pre-trained depth estimation model for depth regularization, optimizing scene geometry and reducing stitching artifacts and floating artifacts.

[0054] This invention proposes a panoramic image 3D reconstruction method based on 3D Gaussian Splatting (3DGS), which performs realistic 3D reconstruction of real-world scenes from panoramic images or videos captured by a panoramic camera. First, a panoramic video with a resolution of 5760×2880 is recorded with an Insta360X3. Then, multiple panoramic images are extracted from the video with FFmpeg at a given frame rate (FPS). Because 3DGS requires multi-view perspective images as input, each panoramic image is first segmented into multiple perspective images. SfM sparse reconstruction is then performed on the perspective images with COLMAP to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters. 3DGS initializes the Gaussian primitives of the scene representation from the sparse point cloud and refines their distribution through iterative rendering from multiple viewpoints. However, segmenting the panoramic images introduces stitching artifacts into the rendered images. To reduce these artifacts and floating artifacts while preserving the overall scene geometry, depth maps from a pre-trained depth estimation model are used to supervise the scene geometry.

[0055] The technical solution of the present invention is described in detail below:

[0056] Figure 2 shows the 3DGS-based 3D reconstruction workflow for panoramic images:

[0057] 1. Record a 10-second indoor panoramic video with a resolution of 5760×2880 using an Insta360X3 panoramic camera. Use FFmpeg to extract panoramic images from the video at 3 frames per second (FPS = 3).

[0058] 2. Use Meshroom to divide each panoramic image into 8 perspective images with a resolution of 1200×1200.

[0059] 3. Use COLMAP to perform SfM on the perspective images to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters.

[0060] 4. Use the pre-trained monocular depth estimation model DepthAnythingV2 to estimate depth for all perspective images, obtaining depth maps $D_{\mathrm{mon}}$.

[0061] 5. To resolve the scale inconsistency between the depth map $D$ rendered by 3DGS and the estimated depth map $D_{\mathrm{mon}}$, the per-frame inverse depth $D_{\mathrm{sfm}}$ provided by COLMAP is used to obtain the calibrated depth map $D^{*}$. The rendered depth map $D$ and the calibrated depth map $D^{*}$ are obtained using the following formulas:

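[0062] $$D = \sum_{i} d_i \alpha_i T_i, \qquad t(D) = \operatorname{median}(D), \qquad s(D) = \frac{1}{M} \sum_{j \in \mathrm{SfM}} \left| D_j - t(D) \right|$$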

[0063] where $d_i$ is the average depth of the $i$-th Gaussian, $\mathrm{SfM}$ denotes the set of indices of the structure-from-motion (SfM) points, and $M$ is the total number of SfM points in the image. $t(D) = \operatorname{median}(D)$ is the median, which effectively reduces the influence of outliers, and $s(D)$ is the scale factor, computed as the mean absolute deviation.

[0064] 6. A depth loss is incorporated into training as a regularizer. The gradients of the depth supervision are propagated to each Gaussian, updating its parameters and spatial position. This reduces stitching artifacts and floating artifacts while preserving the overall scene geometry.
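The numbers in steps 1 and 2 fix the size of the input handed to COLMAP and 3DGS. The small calculation below, using a hypothetical `CaptureConfig` class, makes the arithmetic explicit: 10 s at 3 FPS gives about 30 panoramas (FFmpeg's actual frame count may differ slightly depending on timestamps), and 8 splits per panorama give 240 perspective images of 1200×1200. Note that the 8 splits together hold fewer pixels (11.52 MP) than one 5760×2880 panorama (about 16.59 MP).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureConfig:
    """Embodiment parameters from steps 1 and 2 (class name is hypothetical)."""
    duration_s: float = 10.0   # length of the indoor panoramic video
    fps: float = 3.0           # FFmpeg sampling rate
    pano_width: int = 5760     # panorama resolution
    pano_height: int = 2880
    views_per_pano: int = 8    # perspective splits per panorama
    view_size: int = 1200      # each split is view_size x view_size

    @property
    def num_panoramas(self) -> int:
        return int(round(self.duration_s * self.fps))

    @property
    def num_views(self) -> int:
        return self.num_panoramas * self.views_per_pano

    @property
    def view_pixels_per_pano(self) -> int:
        return self.views_per_pano * self.view_size ** 2

    @property
    def pano_pixels(self) -> int:
        return self.pano_width * self.pano_height
```

`CaptureConfig()` reports 30 panoramas and 240 perspective views, the image set on which SfM and 3DGS training operate.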

[0065] The key points and areas to be protected in this invention are:

[0066] 1. Process the image or video data acquired by the panoramic camera using 3DGS technology to achieve realistic 3D scene reconstruction, including multi-view image extraction and segmentation, sparse reconstruction and point cloud initialization, and depth estimation supervision.

[0067] 2. Multi-view image extraction and segmentation: Extract multiple panoramic images from panoramic videos and segment each panoramic image into multiple perspective images to meet 3DGS input requirements.

[0068] 3. Sparse reconstruction and point cloud initialization: sparse reconstruction of the perspective images is performed with COLMAP to obtain a sparse point cloud and camera parameters, providing the basis for initializing the Gaussian primitives in 3DGS.

[0069] 4. Depth estimation and supervision: depth maps generated by a pre-trained depth estimation model are introduced to regularize the 3DGS training process, and the per-frame inverse depth of the SfM points is used to resolve the scale inconsistency between the estimated and rendered depth maps.

[0070] Compared with the prior art, the advantages of the present invention are:

[0071] 1. It provides a 3DGS-based 3D reconstruction method for panoramic images.

[0072] 2. By using a pre-trained depth estimation model to regularize the 3DGS training process, the proposed method outperforms the original 3DGS overall. Furthermore, after 7,000 iterations it matches the results that the original 3DGS reaches after 30,000 iterations, while using fewer Gaussian primitives to represent the scene. Visual evaluation of the rendered color and depth maps in Figure 3 and Figure 4 shows that the proposed method produces more accurate scene geometry and effectively mitigates the floating and stitching artifacts caused by panoramic image segmentation.

[0073] This invention has been shown to be feasible through experiments, simulations, and use. Training and experimental verification were carried out on an RTX 4090 GPU (24 GB). The training/test split follows the method recommended for Mip-NeRF360, holding out every 8th image for testing to ensure a consistent and meaningful evaluation. Performance is evaluated with the widely accepted metrics PSNR, SSIM, and LPIPS. To assess the effectiveness of the proposed method, quantitative and qualitative comparisons were performed, with the results of the original and improved 3DGS models recorded at 7K and 30K iterations. The visual quality differences between the two configurations are shown in Figure 3 and Figure 4, and Table 1 provides the numerical evaluation of the metrics.

[0074]

[0075] Table 1 compares the PSNR, SSIM, LPIPS, and number of points (Gaussian primitives) of the original 3DGS and the proposed method at different numbers of iterations.

[0076] As the numerical analysis in Table 1 shows, the model of this invention outperforms the original 3DGS overall. Furthermore, after 7,000 iterations it matches the results that the original 3DGS reaches after 30,000 iterations, while using fewer Gaussian primitives to represent the scene. Visual evaluation of the rendered color and depth maps in Figure 3 and Figure 4 shows that the method of the present invention produces more accurate scene geometry and effectively mitigates the floating and stitching artifacts caused by panoramic image segmentation.
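The evaluation protocol of [0073] can be sketched as below. The split assumes the convention used by the public 3DGS evaluation code as we understand it: images in a fixed (e.g. name-sorted) order, with indices 0, 8, 16, ... held out for testing. PSNR uses its standard definition for images scaled to `[0, data_range]`; SSIM and LPIPS would normally come from dedicated libraries and are omitted. Function names are hypothetical.

```python
import numpy as np


def mipnerf360_split(image_names, hold=8):
    """Every `hold`-th image (index 0, 8, 16, ...) goes to the test set."""
    test = [n for i, n in enumerate(image_names) if i % hold == 0]
    train = [n for i, n in enumerate(image_names) if i % hold != 0]
    return train, test


def psnr(rendered, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    diff = np.asarray(rendered, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(data_range ** 2 / mse))
```

With the 240 perspective views of the embodiment, this split gives 210 training and 30 test images; a uniform error of 0.1 on a `[0, 1]` image corresponds to 20 dB PSNR.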

[0077] Example 2

[0078] According to another embodiment of the present invention, a panoramic image 3D reconstruction device based on 3D Gaussian Splatting is provided, comprising:

[0079] The segmentation unit is used to segment the acquired panoramic image into several perspective images;

[0080] The sparse reconstruction processing unit is used to perform sparse reconstruction on the perspective images to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters;

[0081] The 3D reconstruction unit is used to initialize Gaussian primitives from the sparse point cloud representation of the scene using the 3D Gaussian Splatting method, and to refine the Gaussian distribution through iterative rendering from multiple viewpoints.

[0082] The device also includes:

[0083] The regularization unit is used to regularize the training process of the 3D Gaussian Splatting method using a pre-trained monocular depth estimation model.

[0084] This invention uses panoramic images or videos captured by a panoramic camera to perform realistic 3D reconstruction with a depth-map-regularized 3DGS method. It applies 3DGS to the 3D reconstruction of panoramic images and uses depth maps from a pre-trained depth estimation model for depth regularization, optimizing scene geometry and reducing stitching artifacts and floating artifacts.

[0085] This invention proposes a panoramic image 3D reconstruction device based on 3D Gaussian Splatting (3DGS), which performs realistic 3D reconstruction of real-world scenes from panoramic images or videos captured by a panoramic camera. First, a panoramic video with a resolution of 5760×2880 is recorded with an Insta360X3. Then, multiple panoramic images are extracted from the video with FFmpeg at a given frame rate (FPS). Because 3DGS requires multi-view perspective images as input, each panoramic image is first segmented into multiple perspective images. SfM sparse reconstruction is then performed on the perspective images with COLMAP to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters. 3DGS initializes the Gaussian primitives of the scene representation from the sparse point cloud and refines their distribution through iterative rendering from multiple viewpoints. However, segmenting the panoramic images introduces stitching artifacts into the rendered images. To reduce these artifacts and floating artifacts while preserving the overall scene geometry, depth maps from a pre-trained depth estimation model are used to supervise the scene geometry.

[0086] The technical solution of the present invention is described in detail below:

[0087] Figure 2 shows the 3DGS-based 3D reconstruction workflow for panoramic images:

[0088] 1. Record a 10-second indoor panoramic video with a resolution of 5760×2880 using an Insta360X3 panoramic camera. Use FFmpeg to extract panoramic images from the video at 3 frames per second (FPS = 3).

[0089] 2. Use Meshroom to divide each panoramic image into 8 perspective images with a resolution of 1200×1200.

[0090] 3. Use COLMAP to perform SfM on the perspective images to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters.

[0091] 4. Use the pre-trained monocular depth estimation model DepthAnythingV2 to estimate depth for all perspective images, obtaining depth maps $D_{\mathrm{mon}}$.

[0092] 5. To resolve the scale inconsistency between the depth map $D$ rendered by 3DGS and the estimated depth map $D_{\mathrm{mon}}$, the per-frame inverse depth $D_{\mathrm{sfm}}$ provided by COLMAP is used to obtain the calibrated depth map $D^{*}$. The rendered depth map $D$ and the calibrated depth map $D^{*}$ are obtained using the following formulas:

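[0093] $$D = \sum_{i} d_i \alpha_i T_i, \qquad t(D) = \operatorname{median}(D), \qquad s(D) = \frac{1}{M} \sum_{j \in \mathrm{SfM}} \left| D_j - t(D) \right|$$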

[0094] where $d_i$ is the average depth of the $i$-th Gaussian, $\mathrm{SfM}$ denotes the set of indices of the structure-from-motion (SfM) points, and $M$ is the total number of SfM points in the image. $t(D) = \operatorname{median}(D)$ is the median, which effectively reduces the influence of outliers, and $s(D)$ is the scale factor, computed as the mean absolute deviation.

[0095] 6. A depth loss is incorporated into training as a regularizer. The gradients of the depth supervision are propagated to each Gaussian, updating its parameters and spatial position. This reduces stitching artifacts and floating artifacts while preserving the overall scene geometry.

[0096] The key points and areas to be protected in this invention are:

[0097] 1. Process the image or video data acquired by the panoramic camera using 3DGS technology to achieve realistic 3D scene reconstruction, including multi-view image extraction and segmentation, sparse reconstruction and point cloud initialization, and depth estimation supervision.

[0098] 2. Multi-view image extraction and segmentation: Extract multiple panoramic images from panoramic videos and segment each panoramic image into multiple perspective images to meet 3DGS input requirements.

[0099] 3. Sparse reconstruction and point cloud initialization: sparse reconstruction of the perspective images is performed with COLMAP to obtain a sparse point cloud and camera parameters, providing the basis for initializing the Gaussian primitives in 3DGS.

[0100] 4. Depth estimation and supervision: depth maps generated by a pre-trained depth estimation model are introduced to regularize the 3DGS training process, and the per-frame inverse depth of the SfM points is used to resolve the scale inconsistency between the estimated and rendered depth maps.

[0101] Compared with the prior art, the advantages of the present invention are:

[0102] 1. It provides a 3DGS-based 3D reconstruction device for panoramic images.

[0103] 2. By using a pre-trained depth estimation model to regularize the 3DGS training process, the proposed method outperforms the original 3DGS overall. Furthermore, after 7,000 iterations it matches the results that the original 3DGS reaches after 30,000 iterations, while using fewer Gaussian primitives to represent the scene. Visual evaluation of the rendered color and depth maps in Figure 3 and Figure 4 shows that the proposed method produces more accurate scene geometry and effectively mitigates the floating and stitching artifacts caused by panoramic image segmentation.

[0104] This invention has been shown to be feasible through experiments, simulations, and use. Training and experimental verification were carried out on an RTX 4090 GPU (24 GB). The training/test split follows the method recommended for Mip-NeRF360, holding out every 8th image for testing to ensure a consistent and meaningful evaluation. Performance is evaluated with the widely accepted metrics PSNR, SSIM, and LPIPS. To assess the effectiveness of the proposed method, quantitative and qualitative comparisons were performed, with the results of the original and improved 3DGS models recorded at 7K and 30K iterations. The visual quality differences between the two configurations are shown in Figure 3 and Figure 4, and Table 1 provides the numerical evaluation of the metrics.

[0105]

[0106] Table 1 compares the PSNR, SSIM, LPIPS, and number of points (Gaussian primitives) of the original 3DGS and the proposed method at different numbers of iterations.

[0107] As the numerical analysis in Table 1 shows, the model of this invention outperforms the original 3DGS overall. Furthermore, after 7,000 iterations it matches the results that the original 3DGS reaches after 30,000 iterations, while using fewer Gaussian primitives to represent the scene. Visual evaluation of the rendered color and depth maps in Figure 3 and Figure 4 shows that the method of the present invention produces more accurate scene geometry and effectively mitigates the floating and stitching artifacts caused by panoramic image segmentation.

[0108] Example 3

[0109] A storage medium storing program files capable of implementing any of the above-mentioned panoramic image 3D reconstruction methods based on 3D Gaussian Splatting.

[0110] Example 4

[0111] A processor for running a program, wherein the program, when run, executes any of the above-mentioned methods for 3D reconstruction of panoramic images based on 3D Gaussian Splatting.

[0112] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0113] In the above embodiments of the present invention, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0114] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The system embodiments described above are merely illustrative; for example, the division of units can be a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection of units or modules may be electrical or other forms.

[0115] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0116] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0117] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0118] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A method for 3D reconstruction of panoramic images based on 3D Gaussian sputtering, characterized in that it includes the following steps: segmenting the acquired panoramic image into several perspective images; performing sparse reconstruction processing on the perspective images to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters; and, using a 3D Gaussian sputtering method, initializing Gaussian primitives based on the representation of the sparse point cloud in the scene and refining the Gaussian distribution through iterative rendering from multiple perspectives.
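Claim 1 does not say how the Gaussian primitives are initialized from the sparse point cloud. The sketch below assumes the scheme used by the reference implementation of 3D Gaussian sputtering (the technique usually called 3D Gaussian Splatting, 3DGS): one Gaussian per SfM point, an isotropic scale equal to the RMS distance to its three nearest neighbours, an identity rotation, an initial opacity of 0.1, and the point colour stored as the zeroth-order spherical-harmonic coefficient. The function name `init_gaussians` and the dictionary layout are hypothetical.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # zeroth-order spherical-harmonic constant


def init_gaussians(points: np.ndarray, colors: np.ndarray, k: int = 3) -> dict:
    """Initialize one Gaussian per sparse SfM point (assumed 3DGS-style scheme).

    points: (N, 3) xyz positions; colors: (N, 3) RGB values in [0, 1].
    """
    n = points.shape[0]
    if n < 2:
        raise ValueError("need at least two points to estimate neighbour distances")
    # Brute-force pairwise squared distances: fine for a sketch, O(N^2) memory.
    diff = points[:, None, :] - points[None, :, :]
    sq = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(sq, np.inf)  # ignore each point's distance to itself
    nearest = np.sort(sq, axis=1)[:, : min(k, n - 1)]
    mean_sq = np.maximum(nearest.mean(axis=1), 1e-7)
    # Isotropic scale = RMS distance to the k nearest neighbours, kept in log space.
    log_scale = np.repeat(np.log(np.sqrt(mean_sq))[:, None], 3, axis=1)
    rotation = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))  # identity quaternion (w, x, y, z)
    opacity_logit = np.full((n, 1), np.log(0.1 / 0.9))  # inverse sigmoid of 0.1
    sh_dc = (colors - 0.5) / SH_C0  # RGB -> zeroth-order SH coefficient
    return {
        "xyz": points.copy(),
        "log_scale": log_scale,
        "rotation": rotation,
        "opacity_logit": opacity_logit,
        "sh_dc": sh_dc,
    }
```

Storing scale and opacity in log and logit space lets an unconstrained optimizer update them during the iterative multi-view rendering while keeping the decoded values positive and within (0, 1).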

2. The panoramic image 3D reconstruction method based on 3D Gaussian sputtering according to claim 1, characterized in that the method further includes: regularizing the training process of the 3D Gaussian sputtering method using a pre-trained monocular depth estimation model.

3. The panoramic image 3D reconstruction method based on 3D Gaussian sputtering according to claim 2, characterized in that regularizing the training process of the 3D Gaussian sputtering method using a pre-trained monocular depth estimation model includes: performing depth estimation on the sparsely reconstructed perspective images to obtain depth maps; and calibrating the depth maps using a depth loss function.

4. The panoramic image 3D reconstruction method based on 3D Gaussian sputtering according to claim 3, characterized in that the method further includes: recording indoor panoramic video at a preset resolution over several time periods using a panoramic camera, and extracting panoramic images from the panoramic video at a preset frame rate using FFmpeg.
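Claim 4 names FFmpeg and a preset frame rate but gives no command line. A minimal sketch, assuming FFmpeg's standard `fps` video filter is used to resample the stream and that frames are written as numbered PNG files; the function name and output pattern are hypothetical.

```python
from pathlib import Path


def ffmpeg_extract_cmd(video: str, out_dir: str, fps: float = 2.0) -> list[str]:
    """Build an FFmpeg command that samples frames from a panoramic video.

    The `fps` filter resamples the stream to a fixed frame rate; each output
    frame is written as a numbered PNG (frame_00001.png, frame_00002.png, ...).
    """
    pattern = str(Path(out_dir) / "frame_%05d.png")
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps:g}", pattern]
```

For example, `subprocess.run(ffmpeg_extract_cmd("pano.mp4", "frames", 2), check=True)` would extract two panoramic frames per second of video; `out_dir` should be created beforehand. Building the argument list instead of a shell string avoids quoting problems with paths that contain spaces.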

5. The panoramic image 3D reconstruction method based on 3D Gaussian sputtering according to claim 4, characterized in that segmenting the acquired panoramic image into several perspective images includes: using Meshroom to divide each panoramic image into several perspective images of a preset resolution.
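Claim 5 delegates the split to Meshroom. Geometrically, each perspective image is a pinhole view that resamples the equirectangular panorama. The NumPy sketch below shows that mapping under assumed axis conventions (camera looking along +z, x right, y down) with nearest-neighbour sampling. It is not Meshroom's implementation, and the function name and parameters are hypothetical.

```python
import numpy as np


def equirect_to_perspective(pano, fov_deg, yaw_deg, pitch_deg, out_w, out_h):
    """Sample a pinhole view from an equirectangular panorama (nearest neighbour).

    pano: (H, W, C) array spanning 360 deg of longitude and 180 deg of latitude.
    Assumed convention: camera looks along +z, x right, y down; yaw rotates
    about the y axis and pitch about the x axis.
    """
    H, W = pano.shape[:2]
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    u, v = np.meshgrid(np.arange(out_w) + 0.5, np.arange(out_h) + 0.5)
    rays = np.stack([(u - out_w / 2) / f, (v - out_h / 2) / f, np.ones_like(u)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    p, y = np.radians(pitch_deg), np.radians(yaw_deg)
    rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    d = rays @ (ry @ rx).T  # rotate camera rays into the panorama frame
    lon = np.arctan2(d[..., 0], d[..., 2])  # [-pi, pi]; 0 maps to the centre column
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))  # [-pi/2, pi/2]; positive is down
    col = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    row = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
    return pano[row, col]
```

Calling this for several yaw angles (for example 0, 90, 180 and 270 degrees with a 90-degree field of view) yields a ring of perspective images sharing one optical centre and one set of intrinsics, which is the kind of input the subsequent sparse reconstruction step consumes.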

6. The panoramic image 3D reconstruction method based on 3D Gaussian sputtering according to claim 5, characterized in that performing sparse reconstruction on the perspective images to obtain the sparse point cloud and the camera intrinsic and extrinsic parameters includes: performing sparse reconstruction of the perspective images with COLMAP to obtain the sparse point cloud and the camera intrinsic and extrinsic parameters.
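Claim 6 uses COLMAP without listing the steps. A basic sparse reconstruction with the COLMAP command-line interface runs feature extraction, matching, and incremental mapping; the sketch below only builds those three commands. Sharing a single camera across all crops is an assumption that holds when every perspective image has the same preset resolution and field of view; the function name and directory layout are hypothetical.

```python
from pathlib import Path


def colmap_sparse_cmds(image_dir: str, work_dir: str) -> list[list[str]]:
    """Return the three COLMAP CLI calls for a basic sparse reconstruction."""
    db = str(Path(work_dir) / "database.db")
    sparse = str(Path(work_dir) / "sparse")  # directory must exist before `mapper` runs
    return [
        # Detect SIFT features; all crops are assumed to share one pinhole camera.
        ["colmap", "feature_extractor", "--database_path", db,
         "--image_path", image_dir, "--ImageReader.single_camera", "1"],
        # Match features between every image pair (simple but quadratic in image count).
        ["colmap", "exhaustive_matcher", "--database_path", db],
        # Incremental SfM: estimates intrinsics, per-image poses and 3D points.
        ["colmap", "mapper", "--database_path", db,
         "--image_path", image_dir, "--output_path", sparse],
    ]
```

Each command can be run with `subprocess.run(cmd, check=True)`. The mapper writes each reconstructed model to a numbered subfolder (for example `sparse/0`) containing the camera intrinsics (`cameras`), the per-image extrinsics (`images`) and the sparse point cloud (`points3D`). For long video-derived image sets, `sequential_matcher` may be a cheaper alternative to exhaustive matching.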

7. The panoramic image 3D reconstruction method based on 3D Gaussian sputtering according to claim 6, characterized in that regularizing the training process of the 3D Gaussian sputtering method using a pre-trained monocular depth estimation model specifically includes: using the per-frame inverse depth D_sfm provided by COLMAP to obtain the calibrated depth map D*; the rendered depth map D and the calibrated depth map D* are obtained using the following formulas: where d_i is the average depth of each Gaussian sphere, t(D) = median(D) is the median, SfM denotes the index set of structure-from-motion points, and M is the total number of SfM points in the image.
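The formulas of claim 7 are not reproduced in this text. The symbols that survive (a median t(D) and M, the number of SfM points in an image) match the median-based scale-and-shift normalization popularized by MiDaS, where t(D) = median(D) and s(D) = (1/M) Σ |D_i − t(D)|. The sketch below assumes that form: it computes t and s for the monocular inverse depth at the SfM pixels and for the COLMAP inverse depth D_sfm, then maps the monocular map into the SfM frame to form a calibrated map D*. This is an assumed reading, not the claimed formula, and the function names are hypothetical.

```python
import numpy as np


def median_scale(x: np.ndarray) -> tuple[float, float]:
    """Shift t = median(x) and scale s = mean |x - t| (assumed MiDaS-style form)."""
    t = float(np.median(x))
    s = float(np.mean(np.abs(x - t)))
    return t, max(s, 1e-8)


def calibrate_depth(mono_inv: np.ndarray, sfm_inv: np.ndarray, rows, cols) -> np.ndarray:
    """Map a monocular inverse-depth map into the SfM inverse-depth frame.

    mono_inv:   (H, W) inverse depth from a pre-trained monocular model.
    sfm_inv:    (M,) inverse depths D_sfm of the M SfM points seen in this image.
    rows, cols: (M,) integer pixel coordinates where those SfM points project.
    """
    t_m, s_m = median_scale(mono_inv[rows, cols])  # statistics at SfM pixels only
    t_s, s_s = median_scale(sfm_inv)
    return (mono_inv - t_m) / s_m * s_s + t_s  # calibrated map D*
```

Because the median and mean absolute deviation transform predictably under a positive affine map, a monocular prediction that differs from the SfM inverse depth only by scale and shift is recovered exactly, while the median keeps the fit robust to a few badly triangulated points.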

8. The panoramic image 3D reconstruction method based on 3D Gaussian sputtering according to claim 7, characterized in that a depth loss is added during training to perform regularization; the gradient of the depth supervision is propagated to each Gaussian sphere, changing the parameters and spatial position of each Gaussian sphere.
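Claim 8 states that the depth-supervision gradient reaches every Gaussian. A minimal NumPy sketch, assuming the rendered depth of pixel p is the alpha-blending-weighted sum D_p = Σ_i w_pi d_i of the per-Gaussian depths d_i and an L1 loss against the calibrated map D*. Under those assumptions ∂L/∂d_i = (1/P) Σ_p sign(D_p − D*_p) w_pi, so each Gaussian is pulled toward D* in proportion to how much it contributes to each pixel. The function name is hypothetical, and the blending weights are held fixed here.

```python
import numpy as np


def depth_l1_and_grad(weights: np.ndarray, d: np.ndarray, d_star: np.ndarray):
    """L1 depth loss and its gradient with respect to per-Gaussian depths.

    weights: (P, G) alpha-blending weight of Gaussian g at pixel p (held fixed).
    d:       (G,) mean depth of each Gaussian along the camera axis.
    d_star:  (P,) calibrated target depth per pixel.
    """
    rendered = weights @ d  # D_p = sum_g w_pg * d_g
    residual = rendered - d_star
    loss = float(np.mean(np.abs(residual)))
    # dL/dd_g = (1/P) * sum_p sign(D_p - D*_p) * w_pg
    grad = weights.T @ np.sign(residual) / len(d_star)
    return loss, grad
```

In a full 3DGS trainer this term would typically be added to the photometric loss with a weight λ, and automatic differentiation would also carry the gradient through the blending weights into each Gaussian's opacity, scale and position; the sketch isolates only the direct path through d_i.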

9. A panoramic image 3D reconstruction device based on 3D Gaussian sputtering, characterized in that it includes: a segmentation unit, configured to segment the acquired panoramic image into several perspective images; a sparse reconstruction processing unit, configured to perform sparse reconstruction processing on the perspective images to obtain a sparse point cloud and the camera intrinsic and extrinsic parameters; and a 3D reconstruction unit, configured to initialize Gaussian primitives based on the representation of the sparse point cloud in the scene using the 3D Gaussian sputtering method, and to refine the Gaussian distribution through iterative rendering from multiple perspectives.

10. The panoramic image 3D reconstruction device based on 3D Gaussian sputtering according to claim 9, characterized in that the device further includes: a regularization unit, configured to regularize the training process of the 3D Gaussian sputtering method using a pre-trained monocular depth estimation model.
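Claims 9 and 10 describe the device as a set of functional units. One way to express that structure in software, offered only as a sketch, is to compose the units as injected callables. The class name, field names and stub data are hypothetical; in practice each unit would wrap a real tool such as Meshroom, COLMAP or a 3DGS trainer.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PanoramaReconstructionDevice:
    """Hypothetical composition of the units named in claims 9 and 10."""

    segment: Callable[[Any], list]  # segmentation unit: panorama -> perspective views
    sparse_reconstruct: Callable[[list], dict]  # sparse unit: views -> points + cameras
    reconstruct_3d: Callable[[dict, list, Optional[Callable]], Any]  # 3DGS unit
    regularize: Optional[Callable] = None  # optional regularization unit (claim 10)

    def run(self, panoramas: list) -> Any:
        # Split every panorama and pool all perspective views for one SfM run.
        views = [v for pano in panoramas for v in self.segment(pano)]
        sfm = self.sparse_reconstruct(views)
        # The 3DGS unit initializes Gaussians from the sparse result and refines
        # them; the regularization unit, if present, supplies the depth loss.
        return self.reconstruct_3d(sfm, views, self.regularize)
```

Keeping the regularization unit optional mirrors the dependency between the claims: the device of claim 9 works without it, and claim 10 adds it without changing the other units.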