A generative three-dimensional reconstruction method and system based on global constraints of position information

By employing a generative 3D reconstruction method based on global constraints of location information, in which a feedforward neural network and GNSS information are used to optimize sparse point clouds in combination with 3D Gaussian splatting, the method addresses the complex reconstruction process and the underutilization of location information in existing technologies and achieves fast, efficient 3D reconstruction. The generated model has absolute geographic coordinates and high visual fidelity.

CN122244335A · Pending · Publication Date: 2026-06-19 · MOGANSHAN DIXIN LABORATORY


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: MOGANSHAN DIXIN LABORATORY
Filing Date: 2026-05-21
Publication Date: 2026-06-19

Application Information

Patent Timeline

21 May 2026: Application
19 Jun 2026: Publication (CN122244335A)

IPC: G06T17/00; G06T17/05; G06T15/00; G06N3/0455; G06N3/0499; G06N3/084; G06N3/09; G06N3/0985; G06T15/04; G06T15/50

AI Tagging

Application Domain

Biological models; 3D-image rendering

Technical Efficacy Phrases

Improve precision surveying and mapping capabilities; improve consistency


AI Technical Summary

Technical Problem

Existing real-scene 3D reconstruction technologies suffer from complex and inefficient processes, making it difficult to meet the needs of rapid modeling and large-scale applications. Furthermore, they are highly dependent on image acquisition conditions, and location information is not fully involved in the joint optimization during the 3D reconstruction process, resulting in uncertainties in the scale and direction of the reconstruction results.

Method used

A generative 3D reconstruction method based on global constraints of location information is adopted. Sparse point clouds are predicted from UAV imagery by a feedforward neural network, and GNSS location information is introduced as a global constraint into the point cloud reconstruction and optimization framework to map the sparse point clouds into absolute geographic coordinates. Combined with 3D Gaussian splatting, a real-scene 3D model with real geographic coordinates is generated.

Benefits of technology

It achieves rapid sparse reconstruction, significantly improving reconstruction speed and accuracy. The generated model has absolute geographic coordinates, supports efficient rendering and high-quality real-time 3D reconstruction, reduces field and office costs, and is suitable for city-level real-scene 3D applications.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention belongs to the interdisciplinary field of computer vision and 3D reconstruction, specifically relating to a generative 3D reconstruction method and system based on global constraints of location information. The method includes the following steps: S1, inputting multi-view UAV image sequences into a pre-trained feedforward neural network, which predicts a sparse point cloud and the camera pose for each image; S2, extracting GNSS location information from the image sequence and using it as a global constraint to jointly optimize the sparse point cloud and camera poses, obtaining an optimized point cloud and optimized camera poses in an absolute geographic coordinate system; S3, based on the optimized point cloud, employing a 3D Gaussian splatting method and using the optimized camera poses for differentiable rendering optimization, generating a real-scene 3D model with absolute geographic coordinates. While ensuring 3D reconstruction accuracy, this method rapidly generates a 3D model in a geographic coordinate system, providing an efficient and scalable technical solution for city-level real-scene 3D construction.


Description

Technical Field

[0001] This invention belongs to the interdisciplinary field of computer vision and 3D reconstruction, and specifically relates to a generative 3D reconstruction method and system based on global constraints of location information.

Background Technology

[0002] With their advantages of maneuverability, low deployment cost, and high imaging resolution, drones are gradually becoming the core carrier for acquiring high-precision 3D information about the Earth's surface. However, how to improve 3D reconstruction efficiency and reduce the cost of collaboration between field and office work while ensuring geometric accuracy and spatial consistency remains a key issue restricting the large-scale application of drone-based 3D reality technology.

[0003] Traditional oblique photogrammetry methods acquire target area information through multi-view, multi-heading UAV imagery, combining image feature matching, SfM (Structure-from-Motion) sparse reconstruction, multi-view image dense matching (MVS), and triangular mesh and texture mapping to construct 3D models. This method is highly mature in terms of geometric accuracy, structural integrity, and texture realism, and has established a stable application system in urban surveying, cadastral surveys, and terrain modeling. However, this type of method has stringent requirements for data acquisition conditions. On the one hand, to ensure the quality of image matching and 3D reconstruction, flight paths must be strictly designed and high image overlap must be maintained, increasing flight time and data acquisition costs. On the other hand, its modeling process is computationally complex and involves massive data volumes; the process from sparse reconstruction to dense point cloud generation is often time-consuming, making it difficult to meet the practical needs of rapid modeling and high-frequency updates. Furthermore, to achieve accurate alignment between the 3D model and the real geographic space, traditional oblique photogrammetry typically requires the deployment of manual control points and the acquisition of high-precision control point coordinates through field measurements, followed by coordinate transformation and georegistration of the model during office processing. This approach not only increases the workload and manpower costs in both field and office work, but also makes it difficult to implement in complex environments or large-scale scenarios, severely restricting the automation level and operational efficiency of UAV 3D reconstruction.

[0004] In recent years, Neural Radiance Field (NeRF) methods have made it possible to generate high-quality 3D scenes and synthesize novel views from multi-view imagery by modeling scenes continuously with implicit neural networks. These methods have significant advantages in detail representation, lighting consistency, and rendering realism, and can generate renderable 3D scenes without explicit mesh modeling. However, NeRF methods typically rely on long iterative training processes that demand substantial computational resources, making it difficult to meet the engineering requirements of large-scale scenes or rapid reconstruction. Furthermore, their training and rendering efficiency are highly sensitive to the viewpoint coverage and uniformity of the input imagery; when UAV imagery has sparse viewpoints, severe occlusion, or insufficient overlap, reconstruction accuracy and geometric consistency can decrease significantly. In addition, NeRF models typically operate in relative coordinate systems and cannot directly model absolute geographic coordinates, which makes seamless integration with GIS (Geographic Information System) and surveying results difficult and limits their use in city-level real-scene 3D applications.

[0005] 3D Gaussian Splatting (3DGS) models scenes as a set of 3D Gaussian distributions, significantly improving rendering efficiency while maintaining detail representation, which makes it suitable for rapid visualization of large-scale scenes. However, existing 3DGS and its improved variants still rely heavily on traditional SfM tools (such as COLMAP) for the initial sparse point cloud and camera pose estimation, placing high demands on image viewpoint distribution, overlap, and quality. The low computational efficiency and limited stability of the sparse reconstruction stage are key factors limiting overall modeling efficiency. Furthermore, in most cases the GPS location information recorded in UAV imagery is used only as reference information in the post-processing stage and does not participate in the joint optimization during 3D reconstruction. This leaves the scale and orientation of the reconstruction results uncertain and makes it difficult to directly obtain a 3D model in an absolute geographic coordinate system.

[0006] In summary, existing real-scene 3D reconstruction technologies generally suffer from the following problems: 1) the reconstruction process is complex and inefficient, making it difficult to meet the needs of rapid modeling and large-scale application; 2) they are highly dependent on image acquisition conditions and place high requirements on viewpoint distribution and overlap; 3) location information is not fully involved in the joint optimization process of 3D reconstruction, so absolute geographic coordinates and scale information have to be restored by manually setting up image control points.

[0007] Therefore, there is an urgent need for a new method that can make full use of the drone's own location information to achieve fast, efficient and geographically consistent real-scene 3D reconstruction while reducing or eliminating the need for ground control points.

Summary of the Invention

[0008] This invention addresses the common problems of low modeling efficiency, strong dependence on high-overlap image acquisition conditions, and insufficient utilization of GPS location information in existing real-scene 3D reconstruction methods. It proposes a generative 3D reconstruction method and system based on global constraints of location information. The method uses an end-to-end feedforward neural network to predict sparse point clouds from UAV imagery, achieving rapid sparse reconstruction. It extracts the location information from the UAV imagery and incorporates it as a global constraint into the point cloud reconstruction and optimization framework. Through global optimization, it jointly corrects the spatial position, scale, and orientation of the sparse point cloud, achieving consistent mapping into the absolute geographic coordinate system. With Gaussian splatting, the optimized sparse point cloud serves as a geometric guide; combined with the corresponding optimized camera poses and the original UAV imagery, iterative optimization through differentiable rendering generates a dense, efficiently renderable real-scene 3D model with accurate geographic coordinates.

[0009] To address the aforementioned technical problems, this invention provides a generative 3D reconstruction method based on global constraints of location information, comprising the following steps: S1. Using a pre-trained feedforward neural network, predict the sparse point cloud of the scene and the camera pose corresponding to each image from a sequence of drone images taken from multiple perspectives of the scene. S2. Extract GNSS location information from the image sequence and use it as a global constraint to jointly optimize the sparse point cloud and camera poses from step S1, so as to obtain the optimized point cloud and optimized camera poses in the absolute geographic coordinate system. S3. Based on the optimized point cloud from step S2, apply a three-dimensional Gaussian splatting method, using the optimized camera poses for differentiable rendering optimization, to generate a real-scene three-dimensional model with absolute geographic coordinates.

[0010] Preferably, the training process of the feedforward neural network in step S1 includes: S11. Obtain a training dataset containing multi-view image sequences and their corresponding real 3D scene sets; S12. Construct a VGGT feedforward neural network model based on the Transformer architecture; S13. With the goal of minimizing the error between the sparse point cloud and camera pose predicted by the VGGT feedforward neural network model and the real data, end-to-end supervised training is performed on the VGGT feedforward neural network model.

[0011] Preferably, the joint optimization in step S2 includes: S21. The extracted GNSS location information is processed by coordinate system unification, and its accuracy is optimized by network adjustment method; S22. Based on the optimized GNSS location information, abnormal observations with errors exceeding a preset threshold are removed to obtain a set of high-precision GNSS control point coordinates. S23. Using the GNSS control point coordinates as a global fixed constraint, construct and solve a joint optimization model that integrates visual reprojection error and GNSS position constraints. Perform overall similarity transformation correction on the sparse point cloud and camera pose to accurately map them to the absolute geographic coordinate system, generating optimized sparse point cloud and camera pose. The absolute geographic coordinate system refers to a coordinate system based on geodetic benchmarks that can be converted into plane rectangular coordinates through map projection. Its coordinate system has a real physical scale and a clear geographical location meaning.

[0012] Preferably, step S3 specifically includes: S31. Initialization: Based on the optimized sparse point cloud, camera pose and original image information, each sparse point is initialized as a three-dimensional Gaussian distribution, and initial parameters are assigned to it. Prior constraints are applied to the initial parameters of the three-dimensional Gaussian distribution. The prior constraints include at least the limitations on the spatial scale and location of the Gaussian distribution. S32. Differentiable rendering and comparison: Based on the optimized camera pose, perform differentiable rendering on the three-dimensional Gaussian distribution set to generate a synthetic image, and compare the synthetic image with the corresponding original UAV image at the pixel level. S33. Parameter Adaptive Optimization: Based on the comparison results, the parameters of each three-dimensional Gaussian distribution are adaptively optimized using the gradient descent algorithm. The parameters include: geometric parameters: at least spatial location and covariance matrix; appearance parameters: at least color and transparency represented by spherical harmonic functions. S34. Model Generation: Repeat steps S32 to S33 until convergence, generating a real-world 3D Gaussian model with absolute geographic coordinates.
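
As a purely illustrative sketch of the initialization in step S31, the following Python fragment turns each sparse point into an isotropic Gaussian and clamps its spatial scale as a simple prior constraint. The nearest-neighbour sizing rule, the clamp limits, the initial opacity, and all names (`Gaussian3D`, `init_gaussians`) are assumptions made for illustration; the source does not specify them.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    position: tuple  # (x, y, z) in the absolute geographic frame, metres
    scale: float     # isotropic standard deviation in metres (assumed isotropic start)
    opacity: float   # initial opacity in (0, 1)
    color: tuple     # RGB in [0, 1]; stands in for the lowest-order spherical-harmonic term


def init_gaussians(points, colors, min_scale=0.05, max_scale=5.0, opacity=0.1):
    """Create one isotropic Gaussian per sparse point (hypothetical helper)."""
    gaussians = []
    for i, p in enumerate(points):
        # Distance to the nearest other point sets the initial size.
        nearest = min(
            (math.dist(p, q) for j, q in enumerate(points) if j != i),
            default=max_scale,
        )
        # Prior constraint on spatial scale: clamp to an assumed plausible range.
        scale = min(max(nearest, min_scale), max_scale)
        gaussians.append(Gaussian3D(tuple(p), scale, opacity, tuple(colors[i])))
    return gaussians


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
cols = [(0.5, 0.5, 0.5)] * 3
gs = init_gaussians(pts, cols)
```

In this toy input the isolated third point would receive a 99 m radius from the nearest-neighbour rule, so the clamp is what keeps its initial Gaussian within the assumed upper bound.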

[0013] Preferably, no ground control points need to be set up in steps S1 to S3, and the geometric error of the real-scene 3D Gaussian model is 8-12 cm.

[0014] Another aspect of this invention discloses a generative 3D reconstruction system based on global constraints of location information, including a sparse reconstruction and pose estimation module, a global georegistration module, and a real-scene 3D model generation module, wherein: the sparse reconstruction and pose estimation module is used to predict sparse point clouds and camera poses from UAV image sequences using a feedforward neural network; the global georegistration module uses GNSS location information from UAV imagery to constrain and optimize the sparse point clouds and camera poses, outputting sparse point clouds and camera poses in absolute geographic coordinates; and the real-scene 3D model generation module is used to generate real-scene 3D models with true scale and coordinate information from the sparse point clouds and camera poses in the absolute geographic coordinate system, using 3D Gaussian splatting.

[0015] Preferably, the global georegistration module includes a GNSS data processing unit and a joint optimization unit, wherein the GNSS data processing unit is used to perform datum transformation and accuracy optimization on the raw location information, and the joint optimization unit is used to perform bundle adjustment that incorporates global GNSS constraints.

[0016] Preferably, the real-scene 3D model generation module includes a differentiable renderer and a parameter optimization unit, wherein the differentiable renderer is used to render the Gaussian model from the optimized camera poses, and the parameter optimization unit is used to optimize the parameters of all 3D Gaussians through backpropagation based on the differences between the rendered result and the real image.

[0017] Preferably, the system also includes an input / output interface for receiving UAV image sequence data streams and outputting the generated 3D model with absolute geographic coordinates to a geographic information system or a 3D visualization platform.

[0018] Preferably, the sparse reconstruction and pose estimation module employs a pre-trained VGGT feedforward Transformer network.

[0019] The beneficial effects of this invention are as follows: 1. A breakthrough improvement in technical performance. (1) A feedforward VGGT network is adopted for end-to-end direct prediction, turning sparse reconstruction from an iterative calculation into a single inference pass. This increases speed by 1-2 orders of magnitude, meets the requirements of near real-time reconstruction, and avoids the time-consuming feature matching and iterative optimization of existing SfM tools such as COLMAP. (2) By integrating GNSS location information and visual geometry in a joint optimization step, the problems of scale ambiguity and missing absolute positioning in generative 3D reconstruction are fundamentally solved. This step not only accurately maps the model to the absolute geographic coordinate system, enabling high-precision mapping without image control points, but also significantly improves the internal consistency and geometric accuracy of the sparse point cloud and camera poses. It thereby provides accurate initial conditions for the subsequent high-quality dense reconstruction, ensuring that the final real-scene 3D model has both high visual fidelity and high spatial measurement value. (3) 3D Gaussian splatting is adopted to generate a model that combines high-quality visual detail with real-time rendering efficiency, and the model itself carries absolute geographic coordinates.

[0020] 2. Significantly optimized project implementation efficiency. It eliminates cumbersome traditional fieldwork such as control point layout, measurement, and marking. From raw images to a measurable real-scene 3D model, the entire process is automated and requires no manual intervention such as manual feature matching or registration. For scenarios that require rapid acquisition of 3D information, it provides decision support on a timescale of minutes to hours.

[0021] 3. Deep enablement of downstream industry applications. It generates real-scene 3D models with accurate geographic coordinates, which can be seamlessly integrated with existing GIS data and used directly as the spatiotemporal data foundation for urban information models and digital twin cities. It turns the originally complex, specialist surveying-grade 3D reconstruction into a more convenient and economical standardized spatial data generation service that can be widely used in downstream industry applications.

Description of the Drawings

[0022] Figure 1 is a schematic diagram of the architecture and process of the invention. Figure 2 shows the reconstruction result obtained with this technical solution.

Detailed Implementation

[0023] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0024] As shown in Figure 1, a generative 3D reconstruction method and system based on global constraints of location information mainly includes three core modules: a sparse reconstruction and pose estimation module based on a feedforward neural network, a global georegistration module, and a real-scene 3D model generation module. The sparse reconstruction and pose estimation module predicts sparse point clouds and camera poses from UAV image sequences using a feedforward neural network. The global georegistration module uses GNSS location information from UAV imagery to constrain and optimize the sparse point clouds and camera poses, outputting the sparse point clouds and camera poses in an absolute geographic coordinate system. The real-scene 3D model generation module generates a real-scene 3D model with true scale and coordinate information from the sparse point clouds and camera poses in the absolute geographic coordinate system, using 3D Gaussian splatting.

[0025] The overall reconstruction method includes: 1. Data Preparation and Input Steps: Using a GNSS-equipped drone, perform multi-view surround flight photography of the target scene to acquire a highly overlapping image sequence {I_1, I_2, ..., I_n}. The metadata of each image I_i automatically records the raw GNSS observation values (latitude, longitude, and elevation) at the time of shooting, as well as the corresponding attitude angle information.

[0026] 2. Fast Sparse Reconstruction and Camera Pose Prediction Steps: The purpose of this step is to replace the traditional time-consuming SfM process and quickly obtain the initial geometry and shooting parameters of the scene. Specifically, it includes: 2.1 Neural Network Model Construction and Training: The core model is a VGGT feedforward neural network based on the Transformer architecture. Its training process is as follows: (1) Constructing a training dataset: Collect a large number of data pairs containing multi-view image sequences and their corresponding real 3D scene sets.

[0027] (2) Model design: The network consists of three parts: a Patch Embedding layer, which divides each input image into fixed-size image patches and linearly projects them into feature vectors; stacked Transformer Blocks, which rely on multi-layer self-attention mechanisms to model global feature dependencies within and between sequences; and parallel decoding heads, comprising a 3D coordinate regression head that outputs the sparse 3D point cloud of the scene and a camera parameter regression head that outputs the camera pose (position and rotation) for each image.

[0028] (3) End-to-end supervised training: With the goal of minimizing the joint loss between the predicted sparse 3D point cloud and predicted camera poses on the one hand and the ground-truth labels on the other, the gradient descent algorithm is used to optimize the network weights until the model converges. After training, the network can directly infer 3D geometry and camera parameters from multi-view images.

[0029] The joint loss is a weighted sum of several sub-losses and is used to simultaneously supervise the differences between the network's predicted outputs and the ground-truth labels. Let $L$ denote the joint loss, $L_{camera}$ the camera loss, $L_{depth}$ the depth loss, $L_{pmap}$ the point map loss, and $L_{track}$ the point tracking loss; then

$$L = L_{camera} + L_{depth} + L_{pmap} + \lambda L_{track} \quad (1)$$

where $\lambda$ is the proportionality coefficient.

[0030] Network weights are a collective term for all trainable parameters in the neural network, including those of the Patch Embedding layer, the Transformer Blocks, and the decoding heads. After initialization, they are continuously updated during training using backpropagation and gradient descent. Once training is complete, they are fixed and used to predict on new input images during the inference phase. The mathematical expression for the weight update is:

$$W_{new} = W_{old} - \eta \nabla_{W} L \quad (2)$$

where $W_{new}$ denotes the network weights after the iteration, $W_{old}$ the network weights before the iteration, $\eta$ the learning rate, and $\nabla_{W} L$ the gradient of the joint loss with respect to the network weights.
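
The weight update described in this paragraph (new weights equal old weights minus the learning rate times the gradient of the loss) can be illustrated with a minimal Python sketch. The toy quadratic loss, the learning rate of 0.1, and the function names are assumptions chosen only to make the example runnable; they stand in for the joint loss and the actual network weights.

```python
def gradient_step(weights, grads, lr):
    """One gradient descent update: w_new = w_old - lr * grad."""
    return [w - lr * g for w, g in zip(weights, grads)]


def toy_loss_grad(weights):
    # Gradient of the toy loss sum((w - 1)^2), i.e. 2 * (w - 1) per weight.
    return [2.0 * (w - 1.0) for w in weights]


weights = [0.0, 3.0]
for _ in range(100):
    weights = gradient_step(weights, toy_loss_grad(weights), lr=0.1)
# Each step shrinks the distance to the minimiser (w = 1) by a factor of 0.8.
```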

[0031] Model convergence refers to the point during training where updating network weights no longer significantly reduces the joint loss, and the model reaches a stable optimal state. In this application, the convergence criterion includes the decrease in the joint loss value over N consecutive iterations being less than a threshold ε, where N can be 20-50 and ε is 0.5%, which can be adjusted according to the application scenario requirements.
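
One possible reading of this convergence criterion is sketched below in Python: training is considered converged when the relative decrease of the loss stays below ε in each of the last N iterations. Interpreting the "decrease" as a per-iteration relative decrease is an assumption, as are the function name and the default values (N = 20, ε = 0.5%, taken from the ranges stated above).

```python
def has_converged(losses, n=20, eps=0.005):
    """True if each of the last n iterations reduced the loss by less than eps (relative)."""
    if len(losses) < n + 1:
        return False  # not enough history to judge n consecutive iterations
    recent = losses[-(n + 1):]
    for prev, curr in zip(recent, recent[1:]):
        rel_decrease = (prev - curr) / max(abs(prev), 1e-12)
        if rel_decrease >= eps:
            return False  # the loss is still dropping noticeably
    return True


# Loss falls by 1% per iteration for 10 iterations, then plateaus.
history = [0.99 ** i for i in range(10)]
history += [history[-1]] * 21
```

Note that under this reading an increasing loss also counts as "no significant decrease"; a validation-based stopping rule would be needed to distinguish a plateau from divergence.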

[0032] Example 1: Training a VGGT network on a campus building scene: (1) Training data preparation: Use a drone to fly around a teaching building and collect 200 images, each with a resolution of […], and obtain the corresponding real three-dimensional scene set data.

[0033] (2) The network structure parameters are set as follows: Patch Embedding: image patch embedding dimension 768; Transformer Blocks: 12 layers, 12 attention heads per layer; point cloud regression head: three fully connected layers, output dimension K × 3 (K = 5000 points); camera pose regression head: three fully connected layers, output dimension 7 (3D position + 4D quaternion).

[0034] (3) Joint loss setting: Set the proportionality coefficient in formula (1) to 0.6.

[0035] (4) The training process is shown in the table below:

(5) Convergence determination: Because the decrease in the joint loss stays below the 0.5% threshold for 20 consecutive epochs starting from epoch 81, the convergence condition is met and training stops at epoch 100.

[0036] Obviously, convergence criteria for the model can also include conditions such as the loss value no longer decreasing or starting to increase on the validation set, or reaching the preset maximum number of training rounds.

[0037] 2.2 Inference and Prediction: The acquired UAV image sequence {I_1, I_2, ..., I_n} is fed into the pre-trained VGGT network. After a single forward pass, the network directly outputs the predicted sparse 3D point cloud and the predicted set of camera poses. The predicted sparse 3D point cloud contains thousands to tens of thousands of key 3D points representing the basic geometric skeleton of the scene, but it lies in an arbitrary relative coordinate system of undetermined scale; the predicted camera poses correspond one-to-one with the input image sequence.

[0038] 3. GNSS Global Geographic Constraint Registration Steps: This step aims to use the GNSS information carried by the UAV to accurately correct the predicted relative model to the absolute geographic coordinate system. It involves a GNSS data processing unit and a joint optimization unit. The GNSS data processing unit performs datum transformation and accuracy optimization on the raw location information. The joint optimization unit performs bundle adjustment that incorporates GNSS global constraints.

[0039] The detailed process is as follows: 3.1 GNSS Data Refinement Processing: (1) Coordinate system unification: The raw GNSS observations are extracted from the metadata of each image I_i and transformed, through a rigorous model, from the original geodetic coordinate system (such as WGS-84) into three-dimensional rectangular coordinates in the target absolute geographic coordinate system (such as the Chinese CGCS2000 coordinate system), forming the initial coordinate set {G_gnss,i}.

[0040] (2) Network adjustment optimization: Treating the initial coordinate set {G_gnss,i} as an observation network, the least squares network adjustment method is used to reduce random errors at each point and improve the relative accuracy between points in this coordinate system, yielding the adjusted coordinate set {G_adjusted,i}.

[0041] (3) Gross error removal: Calculate the residual of each point after adjustment, and identify and remove as gross errors the observations whose residuals exceed a preset threshold (such as 3 times the root mean square error), thus obtaining the final high-precision GNSS control point coordinate set {G_ctrl,i} used for constraints.
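
The gross error removal in item (3) can be illustrated with a short Python sketch that keeps only observations whose absolute residual does not exceed k times the root mean square of all residuals. The function name, the use of scalar residuals, and the sample values are illustrative assumptions; the factor k = 3 follows the example threshold given above.

```python
import math


def remove_gross_errors(residuals, k=3.0):
    """Return indices of observations whose |residual| <= k * RMS of all residuals."""
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return [i for i, r in enumerate(residuals) if abs(r) <= k * rms]


# 19 well-behaved residuals of 1 cm and one 1 m blunder (values in metres, made up).
residuals_m = [0.01] * 19 + [1.0]
kept = remove_gross_errors(residuals_m)
```

Because the blunder itself inflates the RMS, a single pass like this only works when outliers are a small fraction of the observations; in practice the test is often repeated after each removal.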

[0042] 3.2 Joint Optimization and Geographic Correction Steps: This step transforms the entire 3D model predicted by the neural network (including the sparse point cloud and camera poses), which lies in an arbitrary internal coordinate system, into the previously defined target absolute geographic coordinate system (such as the Chinese CGCS2000 coordinate system) in a single operation, and precisely aligns it with the actual shooting locations recorded by GNSS.

[0043] In implementation, the similarity transformation adjustment scheme is first defined, and then a joint optimization problem is constructed to find the optimal adjustment scheme. This joint optimization problem needs to consider both visual and positional consistency. A well-known nonlinear optimization algorithm (such as the Levenberg-Marquardt method) is used to automatically iteratively solve the problem, ultimately outputting an optimal set of similarity transformation parameters. This optimal set of similarity transformation parameters is applied to the sparse point cloud and camera pose predicted by the neural network, thereby achieving a precise mapping from a relative coordinate system to an absolute geographic coordinate system. Thus, the model obtains the true physical scale, correct geographic orientation, and accurate absolute position.
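
A nonlinear solver of this kind needs a starting value for the similarity transformation. As a hedged illustration only, the following Python sketch estimates an initial scale factor as the ratio between the spread of the GNSS camera positions and the spread of the predicted camera centers, a quantity that does not depend on rotation or translation. This initialization is an assumption and is not described in the source; the function names and sample coordinates are hypothetical.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def rms_spread(points):
    """Root mean square distance of the points from their centroid."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))


def initial_scale(pred_centers, gnss_centers):
    """Ratio of GNSS spread to predicted spread (rotation/translation invariant)."""
    return rms_spread(gnss_centers) / rms_spread(pred_centers)


pred = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 20.0, 0.0), (0.0, 0.0, 5.0)]
# Synthetic "GNSS" positions: the same layout scaled by 0.5 and shifted.
gnss = [(0.5 * x + 100.0, 0.5 * y + 200.0, 0.5 * z + 300.0) for x, y, z in pred]
s0 = initial_scale(pred, gnss)
```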

[0044] Example 2: Taking a power transmission line corridor in a mountainous area (approximately 5 kilometers long and 500 meters wide, with significant terrain undulations and an elevation difference of approximately 300 meters) as the target area, this example illustrates the joint optimization steps based on GNSS global constraints.

[0045] Scenario setting: Data collection was carried out using a DJI Matrice 350 RTK drone equipped with a Zenmuse P1 camera (45 megapixels). Flight parameters were set as follows: flight altitude 150 meters (relative to takeoff point), 5 zigzag flight paths, 450 images were collected, with a forward overlap rate of 80% and a lateral overlap rate of 70%.

[0046] Input data: Approximately 12,000 3D points in a sparse point cloud predicted by the VGGT network, located in a relative coordinate system with coordinate ranges of X:0-1000, Y:0-500, and Z:0-300 (arbitrary units), scale and orientation undetermined; and 450 camera poses (position plus attitude quaternions), also located in a relative coordinate system. Simultaneously, raw GNSS observations (RTK fixed solution, accuracy ±2cm) are extracted from the metadata of each image, recording the longitude, latitude, and elevation in the WGS84 coordinate system corresponding to each image.

[0047] Coordinate system unification: The latitude and longitude coordinates in the WGS84 coordinate system are converted to three-dimensional rectangular coordinates in the Chinese CGCS2000 coordinate system using a rigorous model. For example, the converted coordinates of image 0001 are (40512345.67 m, 3123456.78 m, 156.32 m), those of image 0002 are (40512348.92 m, 3123459.01 m, 156.45 m), and so on up to image 0450 at (40512987.65 m, 3122987.65 m, 162.18 m). Subsequently, the least squares network adjustment method treats all 450 GNSS observation points as an observation network and constructs an error equation describing the deviation between the measured value at each observation point and its adjusted coordinates. The residuals are weighted according to the accuracy level of each observation point, and the weighted sum of squared residuals over all points is minimized, thereby reducing random errors at each point and improving the relative accuracy between points. Gross error removal is then performed: the residual of each point after adjustment is calculated, and observations with residuals greater than three times the standard error are identified as gross errors and removed. After gross error removal, a set of 438 valid high-precision GNSS control point coordinates is obtained.
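
For orientation, the ellipsoidal part of such a coordinate conversion can be sketched in Python using the standard closed-form mapping from geodetic latitude, longitude, and height to Earth-centered rectangular coordinates on the WGS-84 ellipsoid. This is only an illustration under stated assumptions: the function name is hypothetical, and any datum shift between WGS-84 and CGCS2000 as well as the map projection to plane coordinates are outside this sketch.

```python
import math

WGS84_A = 6378137.0            # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563  # flattening


def geodetic_to_ecef(lat_deg, lon_deg, h, a=WGS84_A, f=WGS84_F):
    """Convert geodetic latitude/longitude (degrees) and height (m) to ECEF X, Y, Z (m)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    e2 = f * (2.0 - f)                                  # first eccentricity squared
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)    # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + h) * math.sin(lat)
    return x, y, z


equator = geodetic_to_ecef(0.0, 0.0, 0.0)
pole = geodetic_to_ecef(90.0, 0.0, 0.0)
```

A point on the equator at zero longitude lies one semi-major axis from the Earth's center along X, and a point at the pole lies one semi-minor axis along Z, which gives a quick sanity check of the formula.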

[0048] The sparse point cloud and the camera poses predicted by the VGGT network (located in a relative coordinate system) are treated as a whole and mapped to the absolute geographic coordinate system (CGCS2000) through a similarity transformation Sim(3). The similarity transformation parameters include the rotation matrix R (3 degrees of freedom), the translation vector t (3 degrees of freedom), and the scaling factor s (1 degree of freedom). For a predicted point cloud point $p_{pred}$ and a predicted camera center $c_{pred}$, the transformed coordinates are:

$$p_{geo} = sRp_{pred} + t \tag{3}$$

$$c_{geo} = sRc_{pred} + t \tag{4}$$
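As a worked illustration of equations (3) and (4), the sketch below applies a similarity transformation to two points and checks that distances scale by s. It reuses the scale factor 0.4732 and the 12.28° heading reported later in this embodiment. Treating that heading as a pure rotation about the vertical axis, the translation vector, and the two sample points are all assumptions made only for the example.

```python
import math

def apply_sim3(s, R, t, p):
    """Map a point p from the relative frame to the geographic frame: s * R * p + t."""
    return tuple(
        s * sum(R[i][j] * p[j] for j in range(3)) + t[i]
        for i in range(3)
    )

def yaw_rotation(deg):
    """Rotation about the vertical (Z) axis; a general Sim(3) allows any 3D rotation."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a),  math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

s = 0.4732                           # scale factor quoted in the embodiment
R = yaw_rotation(12.28)              # heading quoted in the embodiment, used here as a pure yaw
t = (40512000.0, 3122000.0, 100.0)   # hypothetical translation

p1 = (100.0, 50.0, 10.0)             # hypothetical points, 85.3 relative units apart
p2 = (185.3, 50.0, 10.0)
g1, g2 = apply_sim3(s, R, t, p1), apply_sim3(s, R, t, p2)
print(round(math.dist(g1, g2), 2))   # 40.36
```

Because rotation and translation preserve distances, the 85.3-unit separation becomes 85.3 × 0.4732 ≈ 40.36 m regardless of the particular R and t chosen.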

[0049] A joint optimization model integrating the visual reprojection error and the GNSS position constraints is constructed. The model minimizes the sum of two errors: the first is the visual reprojection error, which is the deviation between the pixel position of a 3D point projected onto the image and the actually observed feature point position; the second is the GNSS position constraint error, which is the spatial distance between the transformed camera center and the GNSS measurement. The two errors are minimized simultaneously by adjusting the similarity transformation parameters (scaling factor s, rotation matrix R, and translation vector t). During optimization, the Huber kernel function suppresses the influence of outlier matching points, and a balance coefficient adjusts the relative weights of the GNSS and visual constraints. Finally, the optimal similarity transformation parameters are obtained by iterative solution with the Levenberg-Marquardt algorithm, achieving a precise mapping from the relative coordinate system to the absolute geographic coordinate system. Convergence is declared when the total loss decreases by less than 0.1% for five consecutive iterations.
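The structure of the joint objective can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented solver: it only evaluates the cost for given residuals and does not perform the Levenberg-Marquardt iterations. The names `huber` and `joint_cost`, the Huber threshold `delta`, and the balance coefficient value are hypothetical, and applying the Huber kernel to the visual term only is one reading of the text.

```python
def huber(r, delta=1.0):
    """Huber kernel: quadratic for small residuals, linear for large ones."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def joint_cost(reproj_residuals_px, gnss_distances_m, balance, delta=1.0):
    """Robust visual reprojection term plus weighted GNSS position term.

    reproj_residuals_px: pixel deviations between projected and observed points.
    gnss_distances_m:    distances between transformed camera centers and GNSS fixes.
    balance:             coefficient weighting the GNSS term against the visual term.
    """
    visual = sum(huber(r, delta) for r in reproj_residuals_px)
    gnss = sum(0.5 * d * d for d in gnss_distances_m)
    return visual + balance * gnss

# An outlier match of 3 px costs 2.5 under Huber instead of 4.5 under a pure squared loss.
cost = joint_cost([0.5, 3.0], [0.1], balance=10.0)
print(cost)
```

The linear tail of the kernel is what limits the pull of mismatched feature points on the estimated s, R, and t.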

[0050] The optimized similarity transformation parameters were applied to the sparse point cloud and camera poses predicted by the VGGT network. Before the transformation, the sparse point cloud was located in the relative coordinate system, and the predicted distance between the two poles was 85.3 (unit unknown, so it could not be used directly for measurement). After the transformation, the sparse point cloud was accurately mapped to the CGCS2000 absolute geographic coordinate system, and the point cloud range became X: 40512000-40513000, Y: 3122000-3123500, Z: 100-400 (unit: meters). The actual distance between the two poles was 85.3 × 0.4732 = 40.36 meters, which recovers the true physical scale. The model orientation was corrected to an angle of 12.28° with geographic north, and the error between each camera center and its GNSS measurement was approximately ±9 cm. Verification showed that the absolute positioning accuracy of this embodiment is ±8 cm in the planar direction and ±12 cm in the elevation direction, with a scale accuracy of 0.9998 (relative to the true value), and no ground control points need to be set up throughout the process.

The 3D Gaussian splatting real-scene model generation step is as follows: This step generates a high-quality, real-time renderable real-scene 3D model based on the georegistered sparse point cloud and camera poses. The real-scene 3D model generation module includes a differentiable renderer and a parameter optimization unit. The differentiable renderer renders the Gaussian model from the optimized camera poses; the parameter optimization unit optimizes the parameters of all 3D Gaussians through backpropagation based on the differences between the rendered result and the real image.

[0051] 4. This step specifically includes: 4.1 Gaussian model initialization: Each 3D point in the georegistered sparse point cloud is initialized as a three-dimensional Gaussian distribution, defined by the following parameters: Mean position: the center coordinates of the three-dimensional Gaussian distribution; Covariance matrix: initialized as a small isotropic ellipsoid, controlling the spatial shape and orientation of the Gaussian distribution; Opacity: initial value set to 0.5; Spherical harmonic coefficients: initialized with the color sampled from the original image $I_i$ most relevant to the three-dimensional point, and used to represent view-dependent color and lighting.
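The initialization described in 4.1 might look like the sketch below. The class `Gaussian3D`, the function `init_gaussian`, and the initial radius `sigma` are hypothetical. Converting a sampled RGB color into a zeroth-order spherical harmonic coefficient as `(c - 0.5) / SH_C0` follows a convention common in open-source 3D Gaussian splatting code and is an assumption here, since the source does not give the formula.

```python
from dataclasses import dataclass

SH_C0 = 0.28209479177387814  # zeroth-order real spherical harmonic constant, 1 / (2 * sqrt(pi))

@dataclass
class Gaussian3D:
    mean: tuple        # center in the absolute geographic coordinate system (metres)
    covariance: list   # 3x3 matrix, isotropic at initialization
    opacity: float
    sh_dc: tuple       # zeroth-order SH coefficients for R, G, B

def init_gaussian(point_xyz, rgb, sigma=0.05):
    """Create one Gaussian from a georegistered sparse point and its sampled color.

    sigma (metres) is a hypothetical initial radius; rgb values are in [0, 1].
    """
    var = sigma * sigma
    cov = [[var if i == j else 0.0 for j in range(3)] for i in range(3)]
    sh_dc = tuple((c - 0.5) / SH_C0 for c in rgb)
    return Gaussian3D(tuple(point_xyz), cov, 0.5, sh_dc)

g = init_gaussian((40512345.67, 3123456.78, 156.32), (0.5, 0.5, 0.5))
print(g.opacity, g.sh_dc)
```

Higher-order spherical harmonic coefficients, which carry the view-dependent part of the color, would typically start at zero and be learned during optimization.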

[0052] 4.2 Differentiable rendering and adaptive optimization: This is an iterative optimization process that fine-tunes the parameters of each Gaussian distribution so that the result rendered from any known viewpoint is consistent with the real image. The specific steps include: (1) Differentiable rendering: for each known real camera pose, a differentiable Gaussian splatting rasterizer renders the set of three-dimensional Gaussian distributions into a synthetic image $I_{i\_synth}$; (2) Loss calculation: the color loss and the structural loss between the synthetic image $I_{i\_synth}$ and the corresponding original UAV image $I_i$ are calculated; (3) Gradient backpropagation and parameter update: the loss function is differentiable with respect to the Gaussian parameters, so the gradient is calculated by the backpropagation algorithm and the Gaussian parameters are updated adaptively. During optimization, the system adaptively splits Gaussians (where detail is insufficient), prunes Gaussians (those that contribute little to rendering), or merges Gaussians according to the gradient information, to improve representation efficiency and reconstruction quality.

[0053] (4) Iteration loop: Repeat steps (1)-(3) to iterate through all camera poses for multiple rounds until the rendering quality converges.

[0054] Example 3: The color loss in the loss calculation step refers to the per-pixel color difference between the synthesized image and the original image, and is the most fundamental supervision signal in 3D reconstruction. Specifically,

$$L_{color} = \alpha_1 L_1 + \alpha_2 L_2 \tag{5}$$

where $L_{color}$ is the color loss; $L_1$ is the sum of absolute errors of the pixel values, which is robust to outliers; $L_2$ is the sum of squared errors of the pixel values, which is sensitive to the overall error; and $\alpha_1$ and $\alpha_2$ are the balance coefficients between the two.

[0055] The structural loss measures the similarity between the synthesized image and the original image in high-level features such as local structure, texture, and edges, compensating for the shortcomings of pixel-level loss in perceptual quality assessment. In 3D Gaussian splatting it is calculated as follows:

$$L_{D\text{-}SSIM} = 1 - \frac{1}{M}\sum_{j=1}^{M}\mathrm{SSIM}(x_j, y_j) \tag{6}$$

where M is the number of image patches extracted by the sliding window, SSIM is the structural similarity index, $x_j$ is the j-th synthesized image patch, and $y_j$ is the j-th real image patch. The structural similarity index is calculated as follows:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{7}$$

where x and y are the two image patches to be compared; $\mu_x$ and $\mu_y$ are the average brightness of the image patches; $\sigma_x^2$ and $\sigma_y^2$ are the contrast variances of the image patches; $\sigma_{xy}$ is the structural covariance; and $C_1$ and $C_2$ are small constants that prevent the denominator from being zero.

[0056] The total rendering loss is

$$L_{render} = t_1 L_{color} + t_2 L_{D\text{-}SSIM} \tag{8}$$

where $t_1$ and $t_2$ are both proportionality coefficients.
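The loss terms of equations (5) to (8) can be put together in a short sketch. It assumes that the color loss and the total loss are weighted sums of their components, as the coefficient descriptions suggest, that pixel values lie in [0, 1], and that image patches have already been extracted (the sliding window itself is not implemented). The coefficient defaults and the constants `c1` and `c2` are hypothetical choices; the latter follow the values commonly used for SSIM.

```python
def l1_loss(a, b):
    """Sum of absolute pixel errors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_loss(a, b):
    """Sum of squared pixel errors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Structural similarity of two patches given as flat lists of pixel values."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def d_ssim_loss(synth_patches, real_patches):
    """One minus the mean SSIM over M patch pairs."""
    m = len(synth_patches)
    return 1.0 - sum(ssim(x, y) for x, y in zip(synth_patches, real_patches)) / m

def render_loss(synth_patches, real_patches, a1=0.8, a2=0.2, t1=0.8, t2=0.2):
    """Total rendering loss: t1 * L_color + t2 * L_D-SSIM."""
    synth = [v for p in synth_patches for v in p]
    real = [v for p in real_patches for v in p]
    color = a1 * l1_loss(synth, real) + a2 * l2_loss(synth, real)
    return t1 * color + t2 * d_ssim_loss(synth_patches, real_patches)

# Hypothetical 2x2 patches flattened to four pixels each.
real = [[0.2, 0.4, 0.6, 0.8], [0.1, 0.1, 0.9, 0.9]]
noisy = [[0.25, 0.35, 0.65, 0.75], [0.1, 0.2, 0.8, 0.9]]
print(render_loss(real, real), render_loss(noisy, real))
```

A perfect rendering gives a loss of essentially zero, and any deviation in pixel values or local structure raises it.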

[0057] Taking the 3D Gaussian splatting optimization of a campus building scene as an example: (1) Scenario and data preparation: The scene is a university library building, approximately [size missing]. The input data include approximately 8000 3D points in the optimized sparse point cloud, the optimized camera poses together with the camera intrinsic and extrinsic parameters corresponding to 150 UAV images, and the 150 original UAV images with varying resolutions.

[0058] (2) Initialization: Each point in the sparse point cloud is initialized as a three-dimensional Gaussian distribution, and its parameter values and descriptions are shown in the table below: After initialization, a total of 8000 three-dimensional Gaussian distributions were obtained, covering the overall outline of the library.

[0059] (3) Differentiable rendering and loss calculation: Differentiable rendering settings: the rendering resolution is consistent with the original image, the rasterizer is the 3DGS differentiable rasterizer, and 1-4 images are rendered per iteration.

[0060] Loss function configuration: for $L_{D\text{-}SSIM}$, the window size is set to [value missing] and the step size is 1.

[0061] Calculate all image losses for each iteration.

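[0062] (4) Gradient backpropagation and parameter update: The gradient of the loss function $L_{render}$ is

$$\nabla_\theta L_{render} = \frac{\partial L_{render}}{\partial \theta} \tag{9}$$

where $\theta$ represents a Gaussian parameter, $L_{render}$ is the total rendering loss, and $\partial L_{render} / \partial \theta$ is the partial derivative of the loss function with respect to the parameter $\theta$.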

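[0063] The parameter update formula is as follows:

$$\theta_{new} = \theta_{old} - \eta \, \nabla_\theta L_{render} \tag{10}$$

where $\theta_{new}$ is the parameter after the update, $\theta_{old}$ is the parameter before the update, and $\eta$ is a coefficient.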
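The update rule of equation (10) can be illustrated on a single scalar parameter. The sketch assumes the plain gradient descent form, new parameter = old parameter minus coefficient times gradient. The quadratic toy loss stands in for the rendering loss, and the coefficient value 0.1 is hypothetical; only the initial opacity of 0.5 comes from the embodiment.

```python
def gradient_step(theta, grad, eta):
    """One gradient descent update: theta_new = theta_old - eta * grad."""
    return theta - eta * grad

def toy_grad(theta):
    """Gradient of the toy loss (theta - 0.8)^2, a stand-in for the rendering loss."""
    return 2.0 * (theta - 0.8)

theta = 0.5   # initial opacity from the embodiment
eta = 0.1     # hypothetical coefficient
first = gradient_step(theta, toy_grad(theta), eta)
for _ in range(50):
    theta = gradient_step(theta, toy_grad(theta), eta)
print(first, theta)
```

The first step moves the parameter from 0.5 to about 0.56, and repeated steps approach the minimizer 0.8 of the toy loss.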

[0064] During the optimization process, the system will adaptively perform operations such as splitting, pruning, or merging Gaussians. For example, when the position gradient of a Gaussian is greater than 0.0005 and the covariance scale is greater than 0.05m, the original Gaussian is split into two, and the position of the new Gaussian is offset from the original position by ±0.01m, which is the splitting operation; when the opacity is less than 0.01, the Gaussian is directly deleted, which is the pruning operation; when the center distance between two Gaussians is less than 0.02m and the covariance similarity is greater than 0.9, they are merged into one Gaussian, which is the merging operation.
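The thresholds in this paragraph translate directly into decision rules. The sketch below is a minimal illustration: the function names are hypothetical, checking the pruning rule before the splitting rule is an assumption about ordering, the covariance similarity is taken as a precomputed number because the source does not define the metric, and offsetting the two split Gaussians along a single axis is a simplification.

```python
import math

def classify_gaussian(pos_grad, scale_m, opacity):
    """Return 'prune', 'split', or 'keep' for one Gaussian."""
    if opacity < 0.01:
        return "prune"
    if pos_grad > 0.0005 and scale_m > 0.05:
        return "split"
    return "keep"

def should_merge(center_a, center_b, cov_similarity):
    """Merge two Gaussians that are closer than 0.02 m with similarity above 0.9."""
    return math.dist(center_a, center_b) < 0.02 and cov_similarity > 0.9

def split_positions(center, offset=0.01, axis=0):
    """Centers of the two Gaussians produced by a split, offset by -/+ 0.01 m."""
    lo, hi = list(center), list(center)
    lo[axis] -= offset
    hi[axis] += offset
    return tuple(lo), tuple(hi)

print(classify_gaussian(0.001, 0.10, 0.5))           # split
print(classify_gaussian(0.001, 0.10, 0.005))         # prune
print(should_merge((0, 0, 0), (0.01, 0, 0), 0.95))   # True
```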

[0065] (5) Convergence criterion: When the decrease in $L_{render}$ is less than 0.5% for 100 consecutive iterations, or when the maximum number of iterations (30,000) is reached, the model is considered converged and optimization is stopped.

4.3 Model output and application step: After optimization, the final real-scene 3D model of the building (a 3D Gaussian splatting model) is obtained, shown from various perspectives in Figure 2. This model has real geographic coordinates and supports efficient, high-quality real-time rendering, enabling direct 3D measurement. The geometric error of the obtained real-scene 3D Gaussian model is in the range of 8-12 cm, depending on the model and precision of the drone.
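The convergence criterion of step (5) can be sketched as a stopping test. This reads "less than 0.5% for 100 consecutive iterations" as: each of the last 100 iterations reduced the loss by less than 0.5% of its previous value. That reading, the function name `should_stop`, and the sample loss histories are assumptions.

```python
def should_stop(loss_history, iteration, rel_tol=0.005, patience=100, max_iters=30000):
    """Stop at the iteration cap, or once the loss has stalled for `patience` steps."""
    if iteration >= max_iters:
        return True
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    return all(prev - cur < rel_tol * prev for prev, cur in zip(recent, recent[1:]))

stalled = [1.0] * 101                           # no improvement over 100 steps
improving = [0.99 ** k for k in range(200)]     # steady 1% drop per step
print(should_stop(stalled, 101), should_stop(improving, 200))  # True False
```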

[0066] The system also includes input / output interfaces for receiving UAV image sequence data streams and outputting the generated 3D model with absolute geographic coordinates to a geographic information system or 3D visualization platform.

[0067] In this invention, GNSS is an abbreviation for Global Navigation Satellite System.

[0068] In this invention, the relevant terms are defined as follows:
1. A real-scene 3D model, in the preferred embodiment, is a renderable 3D model with absolute geographic coordinates generated by the 3D Gaussian splatting method.
2. An absolute geographic coordinate system is a coordinate system based on the geoid that has a real physical scale and a clear geographic meaning. It is the basic framework for surveying, geographic information systems, remote sensing, navigation, and other fields. In this embodiment, it refers to the Chinese CGCS2000 coordinate system.
3. The opposite of the absolute geographic coordinate system is the relative coordinate system, which is the coordinate system of the 3D point cloud and camera poses directly predicted from UAV image sequences by a feedforward neural network (VGGT). It has no real geographic reference: the origin and direction are arbitrary, usually referenced to the first image or the center of the scene. Its coordinate values carry only relative proportions and lack a real physical scale, so it is suitable only for visual geometric calculations and cannot be interfaced directly with GIS or the real world. Only after global constraint correction is it mapped to the absolute geographic coordinate system.
4. 3D Gaussian Splatting is an explicit representation method for efficient 3D scene reconstruction and real-time rendering. It uses a large number of learnable 3D Gaussian distributions as the basic representation units of the scene, and achieves high-quality, high-frame-rate novel view synthesis and 3D reconstruction through differentiable rendering and gradient optimization.

[0069] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the technical principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A generative 3D reconstruction method based on global constraints of location information, characterized in that it includes the following steps: S1. Input the multi-view UAV image sequence into a pre-trained feedforward neural network, which predicts the sparse point cloud of the scene and the camera pose corresponding to each image. S2. Extract GNSS location information from the image sequence and use it as a global constraint to jointly optimize the sparse point cloud and camera pose described in step S1. This joint optimization first optimizes the accuracy of the GNSS location information, then removes abnormal observations to obtain the coordinates of the GNSS control points. Using the GNSS control point coordinates as a global fixed constraint, a similarity transformation correction is performed on the sparse point cloud and camera pose to obtain the optimized point cloud and optimized camera pose located in the absolute geographic coordinate system. S3. Based on the optimized point cloud in step S2, a priori constraints are applied to the initial parameters of the three-dimensional Gaussian distribution using the three-dimensional Gaussian splatting method; the optimized camera pose is used for differentiable rendering optimization, and the resulting synthetic image is compared with the corresponding original UAV image at the pixel level. Based on the comparison results, the parameters are adaptively optimized until the model converges, generating a real-scene three-dimensional model with absolute geographic coordinates.

2. The generative 3D reconstruction method based on global constraints of location information according to claim 1, characterized in that: The training process for the feedforward neural network in step S1 includes: S11. Obtain a training dataset containing multi-view image sequences and their corresponding real 3D scene sets; S12. Construct a VGGT feedforward neural network model based on the Transformer architecture; S13. With the goal of minimizing the error between the sparse point cloud and camera pose predicted by the VGGT feedforward neural network model and the real data, end-to-end supervised training is performed on the VGGT feedforward neural network model.

3. The generative 3D reconstruction method based on global constraints of location information according to claim 1, characterized in that: The joint optimization in step S2 includes: S21. The extracted GNSS location information is processed by coordinate system unification, and its accuracy is optimized by network adjustment method; S22. Based on the optimized GNSS location information, abnormal observations with errors exceeding a preset threshold are removed to obtain a set of high-precision GNSS control point coordinates. S23. Using the coordinates of the GNSS control points as global fixed constraints, construct and solve a joint optimization model that integrates visual reprojection error and GNSS position constraints, perform overall similarity transformation correction on the sparse point cloud and camera pose, so that they are accurately mapped to the absolute geographic coordinate system, and generate the optimized sparse point cloud and camera pose. The absolute geographic coordinate system refers to a coordinate system based on geodetic benchmarks that can be converted into plane rectangular coordinates through map projection. This coordinate system has a real physical scale and a clear geographical location meaning.

4. The generative 3D reconstruction method based on global constraints of location information according to claim 1, characterized in that: Step S3 specifically includes: S31. Initialization: Based on the optimized sparse point cloud, camera pose and original image information, each sparse point is initialized as a three-dimensional Gaussian distribution, and initial parameters are assigned to it. Prior constraints are applied to the initial parameters of the three-dimensional Gaussian distribution. The prior constraints include at least limits on the spatial scale and location of the Gaussian distribution. S32. Differentiable rendering and comparison: Based on the optimized camera pose, perform differentiable rendering on the three-dimensional Gaussian distribution set to generate a synthetic image, and compare the synthetic image with the corresponding original UAV image at the pixel level. S33. Adaptive parameter optimization: Based on the comparison results, the parameters of each three-dimensional Gaussian distribution are adaptively optimized using a gradient descent algorithm. The parameters include: geometric parameters, including at least the spatial location and covariance matrix; and appearance parameters, including at least the color expressed by spherical harmonic functions and the transparency. S34. Model generation: Repeat steps S32 to S33 until convergence, generating a real-scene 3D Gaussian model with absolute geographic coordinates.

5. The generative 3D reconstruction method based on global constraints of location information according to claim 1, characterized in that: No ground control points need to be set up in steps S1 to S3, and the geometric error range of the real-scene 3D Gaussian model is 8-12 cm.

6. A generative 3D reconstruction system based on global constraints of location information, utilizing the generative 3D reconstruction method based on global constraints of location information as described in any one of claims 1-5, characterized in that: It includes a sparse reconstruction and pose estimation module, a global georegistration module, and a real-scene 3D model generation module. The sparse reconstruction and pose estimation module is used to predict sparse point clouds and camera poses from UAV image sequences using a feedforward neural network. The global georegistration module uses GNSS location information from UAV imagery to constrain and optimize sparse point clouds and camera poses, outputting sparse point clouds and camera poses in absolute geographic coordinates. The real-scene 3D model generation module is used to generate real-scene 3D models with true scale and coordinate information based on sparse point clouds in the absolute geographic coordinate system and camera pose, using 3D Gaussian splatting technology.

7. A generative 3D reconstruction system based on global constraints of location information according to claim 6, characterized in that: The global georegistration module includes a GNSS data processing unit and a joint optimization unit, wherein the GNSS data processing unit is used to perform reference transformation and accuracy optimization on the raw location information, and the joint optimization unit is used to perform bundle adjustment that incorporates global GNSS constraints.

8. A generative 3D reconstruction system based on global constraints of location information according to claim 6, characterized in that: The real-scene 3D model generation module includes a differentiable renderer and a parameter optimization unit, wherein the differentiable renderer is used to render the Gaussian model from the optimized camera poses, and the parameter optimization unit is used to optimize the parameters of all 3D Gaussians through backpropagation based on the differences between the rendered result and the real image.

9. A generative 3D reconstruction system based on global constraints of location information according to claim 6, characterized in that: It also includes input / output interfaces for receiving UAV image sequence data streams and outputting the generated 3D model with absolute geographic coordinates to a geographic information system or 3D visualization platform.

10. A generative 3D reconstruction system based on global constraints of location information according to claim 6, characterized in that: The sparse reconstruction and pose estimation module uses a pre-trained VGGT feedforward Transformer network.