A surface reconstruction and novel view synthesis method for remote sensing scenes
By using a surface reconstruction and new view synthesis method for remote sensing scenes, and by optimizing SDF and color values using MLP networks and multi-view geometric constraints, the problem of color and weight deviation in remote sensing scene reconstruction is solved, and high-accuracy image synthesis is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HARBIN ENG UNIV
- Filing Date
- 2023-03-06
- Publication Date
- 2026-06-12
AI Technical Summary
Traditional remote sensing scene reconstruction techniques suffer from color and weight deviations in view synthesis, leading to inaccurate synthesized images.
A surface reconstruction and new view synthesis method for remote sensing scenes is adopted. The point cloud is reconstructed with a structure-from-motion algorithm and a multi-view stereo matching algorithm. MLP networks are used to estimate SDF values and color values. The MLP networks are optimized under multi-view geometric constraints and photometric consistency constraints to reduce color and weight bias.
It achieves accurate reconstruction of remote sensing scenes, improves the accuracy of synthetic images, and adapts to the characteristics of sparse remote sensing scene views and complex ground features, providing realistic scene reproduction capabilities.
Figure CN116310228B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of remote sensing image processing technology, specifically to a method for surface reconstruction and new view synthesis of remote sensing scenes.
Background Technology
[0002] 3D spatial modeling of natural scenes, and scene view synthesis based on spatial modeling priors, have been important directions for information technology efforts related to human interactive experience. For decades, researchers have been trying to transform real-world natural scenes into effective digital assets, but this is no easy task, especially for remote sensing images.
[0003] The core of traditional 3D reconstruction methods is stereo matching technology, which reconstructs depth information by matching feature points across views. Its reconstruction pipeline mainly consists of several stages: feature detection and matching, sparse point cloud reconstruction, dense point cloud reconstruction, and Poisson surface reconstruction. Each stage can introduce errors, leading to error accumulation. Furthermore, simply rendering the reconstructed mesh model cannot provide a realistic scene reconstruction.
[0004] Compared to traditional methods based on explicit representations (such as voxels, point clouds, and meshes), recent research in neural rendering has focused on representing scenes with implicit functions and reconstructing 3D scene representations from images by minimizing the difference between pixel colors and camera ray colors (i.e., differentiable rendering). This approach has achieved excellent performance in view synthesis tasks. These studies fall into two types according to their rendering method: surface rendering and volume rendering. Surface rendering defines the color of a ray as the color at the intersection of the camera ray and the object surface. In other words, surface rendering directly optimizes the surface color, which eliminates the bias between color and weight estimation. However, during backpropagation the gradient exists only at a single point on the ray, so the spatial receptive field is overly localized, which makes it difficult to accurately reconstruct the surface representation of the scene. Volume rendering, on the other hand, uses the color integral along the camera ray as the ray color. Compared to surface rendering, volume rendering expands the spatial receptive field through color integration. However, because of color and weight biases, the color loss it relies on cannot provide accurate geometric constraints.
[0005] Color bias refers to the difference between the result of volume rendering color integration and the surface color. For differentiable rendering techniques that rely on color to optimize geometry, this color bias inevitably affects the reconstruction of accurate surfaces. Current volume rendering methods assume that the color weight of a ray reaches its maximum value at the first intersection of the ray and the surface. However, in actual optimization, the maximum weight does not necessarily occur at the intersection of the ray and the surface; this situation is called weight bias. Weight bias not only hinders volume rendering from perceiving accurate surfaces but also induces differences between surface color and color integration.
[0006] In summary, traditional scene reconstruction techniques perform poorly in view synthesis; surface-rendering-based scene reconstruction techniques have a small spatial receptive field and are prone to getting stuck in local minima; and volume-rendering-based scene reconstruction techniques cannot provide accurate geometric constraints because of color and weight deviations, resulting in inaccurate synthesized images.
Summary of the Invention
[0007] The purpose of this application is to address the problem of inaccurate synthesized images caused by color and weight deviations in existing technologies, and to propose a surface reconstruction and new view synthesis method for remote sensing scenes.
[0008] The technical solution adopted in this application to solve the above-mentioned technical problems is as follows:
[0009] A method for surface reconstruction and new view synthesis in remote sensing scenes includes the following steps:
[0010] Step 1: A structure-from-motion algorithm is used to recover the camera intrinsic and extrinsic parameters of each image from the remote sensing images. Based on these parameters, the point cloud $P_{mvs}$ is reconstructed using a multi-view stereo matching algorithm;
[0011] Step 2: Randomly select one remote sensing image as the reference view $I_r$, and select the $L$ remote sensing images adjacent to $I_r$ as the source views $I_s=\{I_{is}\mid i=1,\dots,L\}$. Obtain the grayscale images $I'_r$ and $I'_s=\{I'_{is}\mid i=1,\dots,L\}$ corresponding to $I_r$ and $I_s$. Then randomly select a pixel $P_{ixel}$ from the reference view $I_r$ and, based on the camera intrinsic and extrinsic parameters, construct a ray $p(t,v)$ that passes through pixel $P_{ixel}$. Finally, sample $N$ points on the ray $p(t,v)$ to obtain the sampling point set $P=\{p(t_i,v)\mid i=0,1,2,\dots,N-1\}$ corresponding to pixel $P_{ixel}$, where the ray is a three-dimensional vector function $p(t,v)=o+tv$, $o$ is the spatial coordinates of the camera, $v$ is the unit direction vector of the ray (the ray direction), and $t$ is the depth along the ray;
[0012] Step 3: Feed the sampling point set $P$ into the MLP network $F_\theta$ to obtain the corresponding SDF values, then linearly interpolate over $P$ based on these SDF values to obtain the spatial point $P^*$ at which the SDF value output by $F_\theta$ is zero, together with its ray depth $t^*$, i.e. $P^*=o+t^*v$;
[0013] Step 4: Take the union of the sampling point set $P$ and $P^*$ to obtain the point set $P'$;
[0014] Step 5: Estimate the weights of the point set $P'$ to obtain the color weights $\{w(t_i)\mid i=0,1,2,\dots,N\}$ corresponding to $P'$;
[0015] Step 6: Feed the point set $P'$ and the unit direction vector $v$ of the ray into the color MLP network to obtain the color values $\{c(t_i,v)\mid i=0,1,2,\dots,N\}$ of $P'$ along the ray direction $v$. The color values are then summed, weighted by their color weights, and the sum is taken as the reconstructed color $c_{\text{volume\_rendering}}$ of pixel $P_{ixel}$;
[0016] Step 7: Feed the spatial point $P^*$ and the unit direction vector $v$ of the ray into the color MLP network to obtain the color value $c(t^*,v)$ of $P^*$ along the ray direction $v$, and take it as the reconstructed color $c_{\text{surface\_rendering}}$ of pixel $P_{ixel}$;
[0017] Step 8: Using the camera intrinsic and extrinsic parameters, compute the image blocks $I'_r(q)$ and $I'_s(q)=\{I'_{is}(q_{is})\mid i=1,\dots,L\}$ that correspond to the spatial point $P^*$ in the grayscale images $I'_r$ and $I'_s=\{I'_{is}\mid i=1,\dots,L\}$ of the reference view and the source views;
[0018] Step 9: Construct the loss function $\mathrm{Loss}_{\text{volume\_rendering}}$ from the reconstructed color $c_{\text{volume\_rendering}}$, the loss function $\mathrm{Loss}_{\text{surface\_rendering}}$ from the reconstructed color $c_{\text{surface\_rendering}}$, the loss function $\mathrm{Loss}_{\text{sdf}}$ from the point cloud $P_{mvs}$, and the loss function $\mathrm{Loss}_{\text{photo}}$ from the image blocks $I'_r(q)$ and $I'_s(q)$. Then, based on these loss functions, optimize the MLP network $F_\theta$ and the color MLP network by backpropagation;
[0019] Step 10: Repeat steps 2 through 9 $K$ times to optimize the MLP network $F_\theta$ and the color MLP network, where $K$ is a predefined hyperparameter;
[0020] Step 11: Use the optimized MLP network $F_\theta$ and color MLP network to complete the surface reconstruction and new view synthesis task.
[0021] Furthermore, the reconstructed color $c_{\text{volume\_rendering}}$ in step 6 is expressed as:
[0022] $$c_{\text{volume\_rendering}}=\sum_{i=0}^{N} w(t_i)\,c(t_i,v)$$
[0023] where $c(t_i,v)$ and $w(t_i)$ are, respectively, the color value and the color weight of the spatial point $o+t_iv$ along the ray direction $v$.
[0024] Furthermore, the reconstructed color $c_{\text{surface\_rendering}}$ in step 7 is expressed as:
[0025] $$c_{\text{surface\_rendering}}=c(t^*,v)$$
[0026] where $c(t^*,v)$ is the color value of the spatial point $P^*$ along the ray direction $v$.
[0027] Furthermore, the linear interpolation is expressed as:
[0028] $$t^*=\frac{f(p(t_i,v))\,t_{i+1}-f(p(t_{i+1},v))\,t_i}{f(p(t_i,v))-f(p(t_{i+1},v))}$$
[0029] where $f$ is the SDF function represented by the MLP network $F_\theta$; $p(t_i,v)$ and $p(t_{i+1},v)$ are two adjacent sampling points on the ray $p(t,v)$ that satisfy $f(p(t_i,v))\cdot f(p(t_{i+1},v))<0$; and $t_i$ and $t_{i+1}$ are the depths of $p(t_i,v)$ and $p(t_{i+1},v)$ along the ray direction $v$.
[0030] Furthermore, the loss function $\mathrm{Loss}_{\text{volume\_rendering}}$ is expressed as:
[0031] $$\mathrm{Loss}_{\text{volume\_rendering}}=\left|c_{\text{volume\_rendering}}-c_{pixel}\right|$$
[0032] where $c_{pixel}$ is the real color of pixel $P_{ixel}$.
[0033] Furthermore, the loss function $\mathrm{Loss}_{\text{surface\_rendering}}$ is expressed as:
[0034] $$\mathrm{Loss}_{\text{surface\_rendering}}=\left|c_{\text{surface\_rendering}}-c_{pixel}\right|$$
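[0035] Furthermore, the loss function $\mathrm{Loss}_{\text{sdf}}$ is expressed as:
[0036] $$\mathrm{Loss}_{\text{sdf}}=\frac{1}{|P_{mvs}|}\sum_{p_i\in P_{mvs}}\left|f(p_i)\right|$$
[0037] where $|P_{mvs}|$ is the number of spatial points in $P_{mvs}$ and $p_i$ is any spatial point from $P_{mvs}$.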
[0038] Furthermore, the loss function $\mathrm{Loss}_{\text{weight}}$ is expressed as:
[0039]
[0040] where $t_i$ is the depth of the sampling point $p(t_i,v)$ along the ray direction $v$, $t^*$ is the depth of the spatial point $P^*$ along the ray direction $v$, and $w(t_i)$ is the color weight of the sampling point $p(t_i,v)$.
[0041] Furthermore, the loss function $\mathrm{Loss}_{\text{photo}}$ is expressed as:
[0042]
[0043] where $\mathrm{NCC}(I'_r(q_i),I'_s(q_{is}))$ is the normalized cross-correlation between the image blocks $I'_r(q_i)$ and $I'_s(q_{is})$.
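[0044] Furthermore, $\mathrm{NCC}(I'_r(q_i),I'_s(q_{is}))$ is expressed as:
[0045] $$\mathrm{NCC}(I'_r(q_i),I'_s(q_{is}))=\frac{\mathrm{Cov}(I'_r(q_i),I'_s(q_{is}))}{\sqrt{\mathrm{Var}(I'_r(q_i))\,\mathrm{Var}(I'_s(q_{is}))}}$$
[0046] where Cov denotes covariance and Var denotes variance.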
[0047] The beneficial effects of this application are:
[0048] This application not only retains the ability of neural rendering to reproduce scenes realistically, but also combines the unbiasedness of surface rendering with the receptive field of volume rendering to accurately reconstruct the surface representation of the scene, thereby improving the accuracy of the synthesized image. Furthermore, this application is tailored to remote sensing scenes and adapts to their sparse views and complex terrain features.
Attached Figure Description
[0049] Figure 1 shows the result after eliminating color and weight biases;
[0050] Figure 2 is a flowchart of optimizing the implicit representation of the 3D scene;
[0051] Figure 3 is a flowchart of deriving a mesh model from the implicit representation;
[0052] Figure 4 is a flowchart of rendering a new view in two different ways after optimization in this application;
[0053] Figure 5 shows the surface reconstruction capability of this application;
[0054] Figure 6 shows the surface reconstruction and view synthesis capabilities of this application.
Detailed Implementation
[0055] It should be noted that, where there is no conflict, the various embodiments disclosed in this application can be combined with each other.
[0056] Specific Implementation Method One: Referring to Figure 1, this embodiment describes a method for surface reconstruction and new view synthesis in remote sensing scenes, comprising the following steps:
[0057] Step 1: A structure-from-motion algorithm is used to recover the camera intrinsic and extrinsic parameters of each image from the remote sensing images. Based on these parameters, the point cloud $P_{mvs}$ is reconstructed using a multi-view stereo matching algorithm;
[0058] Step 2: Randomly select one remote sensing image as the reference view $I_r$, and select the $L$ remote sensing images adjacent to $I_r$ as the source views $I_s=\{I_{is}\mid i=1,\dots,L\}$. Obtain the grayscale images $I'_r$ and $I'_s=\{I'_{is}\mid i=1,\dots,L\}$ corresponding to $I_r$ and $I_s$. Then randomly select a pixel $P_{ixel}$ from the reference view $I_r$ and, based on the camera intrinsic and extrinsic parameters, construct a ray $p(t,v)$ that passes through pixel $P_{ixel}$. Finally, sample $N$ points on the ray $p(t,v)$ to obtain the sampling point set $P=\{p(t_i,v)\mid i=0,1,2,\dots,N-1\}$ corresponding to pixel $P_{ixel}$, where the ray is a three-dimensional vector function $p(t,v)=o+tv$, $o$ is the spatial coordinates of the camera, $v$ is the unit direction vector of the ray (the ray direction), and $t$ is the depth along the ray. The value of $L$ is set according to the density of the views, generally between 5 and 10, and the value of $N$ is set according to the GPU memory size, generally 32, 64, or 128;
[0059] Step 3: Feed the sampling point set $P$ into the MLP network $F_\theta$ to obtain the corresponding SDF values, then linearly interpolate over $P$ based on these SDF values to obtain the spatial point $P^*$ at which the SDF value output by $F_\theta$ is zero, together with its ray depth $t^*$, i.e. $P^*=o+t^*v$;
[0060] Step 4: Take the union of the sampling point set $P$ and $P^*$ to obtain the point set $P'$;
[0061] Step 5: Estimate the weights of the point set $P'$ to obtain the color weights $\{w(t_i)\mid i=0,1,2,\dots,N\}$ corresponding to $P'$;
[0062] Step 6: Feed the point set $P'$ and the unit direction vector $v$ of the ray into the color MLP network to obtain the color values $\{c(t_i,v)\mid i=0,1,2,\dots,N\}$ of $P'$ along the ray direction $v$. The color values are then summed, weighted by their color weights, and the sum is taken as the reconstructed color $c_{\text{volume\_rendering}}$ of pixel $P_{ixel}$;
[0063] Step 7: Feed the spatial point $P^*$ and the unit direction vector $v$ of the ray into the color MLP network to obtain the color value $c(t^*,v)$ of $P^*$ along the ray direction $v$, and take it as the reconstructed color $c_{\text{surface\_rendering}}$ of pixel $P_{ixel}$;
[0064] Step 8: Using the camera intrinsic and extrinsic parameters, compute the image blocks $I'_r(q)$ and $I'_s(q)=\{I'_{is}(q_{is})\mid i=1,\dots,L\}$ that correspond to the spatial point $P^*$ in the grayscale images $I'_r$ and $I'_s=\{I'_{is}\mid i=1,\dots,L\}$ of the reference view and the source views;
[0065] Step 9: Construct the loss function $\mathrm{Loss}_{\text{volume\_rendering}}$ from the reconstructed color $c_{\text{volume\_rendering}}$, the loss function $\mathrm{Loss}_{\text{surface\_rendering}}$ from the reconstructed color $c_{\text{surface\_rendering}}$, the loss function $\mathrm{Loss}_{\text{sdf}}$ from the point cloud $P_{mvs}$, and the loss function $\mathrm{Loss}_{\text{photo}}$ from the image blocks $I'_r(q)$ and $I'_s(q)$. Then, based on these loss functions, optimize the MLP network $F_\theta$ and the color MLP network by backpropagation;
[0066] Step 10: Repeat steps 2 through 9 $K$ times to optimize the MLP network $F_\theta$ and the color MLP network, where $K$ is a predefined hyperparameter;
[0067] Step 11: Use the optimized MLP network $F_\theta$ and color MLP network to complete the surface reconstruction and new view synthesis task.
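The ray construction and sampling of step 2 can be sketched as follows. This is a minimal illustration, not the implementation of this application: it assumes a pinhole camera with intrinsic matrix `K`, a world-to-camera rotation `R`, and a camera center `cam_center`, and it samples depths uniformly between hypothetical bounds `t_near` and `t_far`. The function names are illustrative only.

```python
import numpy as np

def pixel_ray(K, R, cam_center, u, v_pix):
    """Return origin o and unit direction v of the ray through pixel (u, v_pix).

    Assumed convention: x_cam = R @ (X_world - cam_center), pinhole model.
    """
    pix = np.array([u, v_pix, 1.0])
    d_cam = np.linalg.inv(K) @ pix          # direction in camera coordinates
    d_world = R.T @ d_cam                   # rotate into world coordinates
    return cam_center, d_world / np.linalg.norm(d_world)

def sample_ray(o, v, t_near, t_far, n_samples):
    """Sample N points p(t_i, v) = o + t_i * v at uniformly spaced depths."""
    t = np.linspace(t_near, t_far, n_samples)
    points = o[None, :] + t[:, None] * v[None, :]
    return t, points

K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
o, v = pixel_ray(K, np.eye(3), np.zeros(3), 50, 50)
t, P = sample_ray(o, v, 1.0, 5.0, 64)      # N = 64 sampling points
```

Uniform depth spacing is the simplest choice; the text does not state how the N depths are distributed along the ray.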
[0068] Specific Implementation Method Two: This implementation method further explains Specific Implementation Method One. It differs from Specific Implementation Method One in that the reconstructed color $c_{\text{volume\_rendering}}$ in step 6 is expressed as:
[0069] $$c_{\text{volume\_rendering}}=\sum_{i=0}^{N} w(t_i)\,c(t_i,v)$$
[0070] where $c(t_i,v)$ and $w(t_i)$ are, respectively, the color value and the color weight of the spatial point $o+t_iv$ along the ray direction $v$.
[0071] Specific Implementation Method Three: This implementation method further explains Specific Implementation Method Two. It differs from Specific Implementation Method Two in that the reconstructed color $c_{\text{surface\_rendering}}$ in step 7 is expressed as:
[0072] $$c_{\text{surface\_rendering}}=c(t^*,v)$$
[0073] where $c(t^*,v)$ is the color value of the spatial point $P^*$ along the ray direction $v$.
[0074] Specific Implementation Method Four: This implementation method further explains Specific Implementation Method Three. It differs from Specific Implementation Method Three in that the linear interpolation is expressed as:
[0075] $$t^*=\frac{f(p(t_i,v))\,t_{i+1}-f(p(t_{i+1},v))\,t_i}{f(p(t_i,v))-f(p(t_{i+1},v))}$$
[0076] where $f$ is the SDF function represented by the MLP network $F_\theta$; $p(t_i,v)$ and $p(t_{i+1},v)$ are two adjacent sampling points on the ray $p(t,v)$ that satisfy $f(p(t_i,v))\cdot f(p(t_{i+1},v))<0$; and $t_i$ and $t_{i+1}$ are the depths of $p(t_i,v)$ and $p(t_{i+1},v)$ along the ray direction $v$.
[0077] Specific Implementation Method Five: This implementation method further explains Specific Implementation Method Four. It differs from Specific Implementation Method Four in that the loss function $\mathrm{Loss}_{\text{volume\_rendering}}$ is expressed as:
[0078] $$\mathrm{Loss}_{\text{volume\_rendering}}=\left|c_{\text{volume\_rendering}}-c_{pixel}\right|$$
[0079] where $c_{pixel}$ is the real color of pixel $P_{ixel}$.
[0080] Specific Implementation Method Six: This implementation method further explains Specific Implementation Method Five. It differs from Specific Implementation Method Five in that the loss function $\mathrm{Loss}_{\text{surface\_rendering}}$ is expressed as:
[0081] $$\mathrm{Loss}_{\text{surface\_rendering}}=\left|c_{\text{surface\_rendering}}-c_{pixel}\right|$$
[0082] Specific Implementation Method Seven: This implementation method further explains Specific Implementation Method Six. It differs from Specific Implementation Method Six in that the loss function $\mathrm{Loss}_{\text{sdf}}$ is expressed as:
[0083] $$\mathrm{Loss}_{\text{sdf}}=\frac{1}{|P_{mvs}|}\sum_{p_i\in P_{mvs}}\left|f(p_i)\right|$$
[0084] where $|P_{mvs}|$ is the number of spatial points in $P_{mvs}$ and $p_i$ is any spatial point from $P_{mvs}$.
[0085] Specific Implementation Method Eight: This implementation method further explains Specific Implementation Method Seven. It differs from Specific Implementation Method Seven in that the loss function $\mathrm{Loss}_{\text{weight}}$ is expressed as:
[0086]
[0087] where $t_i$ is the depth of the sampling point $p(t_i,v)$ along the ray direction $v$, $t^*$ is the depth of the spatial point $P^*$ along the ray direction $v$, and $w(t_i)$ is the color weight of the sampling point $p(t_i,v)$.
[0088] Specific Implementation Method Nine: This implementation method further explains Specific Implementation Method Eight. It differs from Specific Implementation Method Eight in that the loss function $\mathrm{Loss}_{\text{photo}}$ is expressed as:
[0089]
[0090] where $\mathrm{NCC}(I'_r(q_i),I'_s(q_{is}))$ is the normalized cross-correlation between the image blocks $I'_r(q_i)$ and $I'_s(q_{is})$.
[0091] Specific Implementation Method Ten: This implementation method further explains Specific Implementation Method Nine. It differs from Specific Implementation Method Nine in that $\mathrm{NCC}(I'_r(q_i),I'_s(q_{is}))$ is expressed as:
[0092] $$\mathrm{NCC}(I'_r(q_i),I'_s(q_{is}))=\frac{\mathrm{Cov}(I'_r(q_i),I'_s(q_{is}))}{\sqrt{\mathrm{Var}(I'_r(q_i))\,\mathrm{Var}(I'_s(q_{is}))}}$$
[0093] where Cov denotes covariance and Var denotes variance.
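A normalized cross-correlation built from covariance and variance, as described above, can be sketched as follows. The sketch assumes the usual definition, covariance divided by the square root of the product of the two variances; the small `eps` term that guards against flat patches is an addition for numerical safety and is not part of the source.

```python
import numpy as np

def ncc(patch_ref, patch_src, eps=1e-8):
    """Normalized cross-correlation between two equally sized grayscale patches."""
    a = np.asarray(patch_ref, dtype=np.float64).ravel()
    b = np.asarray(patch_src, dtype=np.float64).ravel()
    cov = np.mean((a - a.mean()) * (b - b.mean()))   # Cov(a, b)
    return cov / np.sqrt(a.var() * b.var() + eps)    # divide by sqrt(Var(a) * Var(b))

patch = np.arange(25.0).reshape(5, 5)
same_up_to_gain = ncc(patch, 2.0 * patch + 3.0)   # brightness/contrast change only
inverted = ncc(patch, -patch)                     # contrast-inverted patch
```

Because the score is invariant to affine brightness changes, it stays close to 1 for patches that differ only in exposure, which is a common reason for preferring NCC over raw intensity differences when comparing views.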
[0094] This application uses the COLMAP algorithm to recover the camera pose and reconstruct the point cloud.
[0095] The implicit representation of the 3D scene is then optimized. The entire optimization process is shown in Figure 2, and the specific implementation is described in three parts:
[0096] 1. Scene Representation Methods
[0097] This application uses two functions $f$ and $c$ to represent the reconstruction target, where $f:\mathbb{R}^3\to\mathbb{R}$ maps a spatial location $x\in\mathbb{R}^3$ to its SDF (signed distance function) value, i.e. the signed distance from the point to the reconstructed object, and $c:\mathbb{R}^3\times S^2\to\mathbb{R}^3$ maps a spatial location $x\in\mathbb{R}^3$ together with a unit direction vector $v\in\mathbb{R}^3$ to an RGB value. The surface $S$ of the target is then represented by the zero-level set of the SDF, i.e.:
[0098] $$S=\{p\in\mathbb{R}^3\mid f(p)=0\} \tag{1}$$
[0099] This application uses two MLP (multi-layer perceptron) networks, the SDF network $F_\theta$ and a color network, to estimate the functions $f$ and $c$ respectively.
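To make the two mappings concrete, the sketch below builds two small, untrained MLPs with the input and output shapes described above: one from a 3D point to a scalar SDF value, and one from a point plus a unit direction to an RGB value. The layer widths, the ReLU activation, and the sigmoid on the color output are assumptions made for illustration; the source does not specify the network architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a fully connected network with the given layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

sdf_params = init_mlp([3, 64, 64, 1])      # f: R^3 -> R
color_params = init_mlp([6, 64, 64, 3])    # c: R^3 x S^2 -> R^3

def f(points):
    """SDF estimate for an (M, 3) array of points, shape (M,)."""
    return mlp(sdf_params, points)[:, 0]

def c(points, dirs):
    """RGB estimate in [0, 1] for (M, 3) points and (M, 3) unit directions."""
    raw = mlp(color_params, np.concatenate([points, dirs], axis=-1))
    return 1.0 / (1.0 + np.exp(-raw))

pts = rng.normal(size=(8, 3))
dirs = np.tile(np.array([0.0, 0.0, 1.0]), (8, 1))
sdf_values, rgb = f(pts), c(pts, dirs)
```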
[0100] Given a pixel $P_{ixel}$, suppose that the ray emitted by the camera and passing through this pixel is $\{p(t,v)=o+tv\mid t\ge 0\}$, where $o$ is the camera center and $v$ is the unit direction vector of the ray, representing the ray direction. The color integral result $c(o,v)$ of volume rendering along the ray direction $v$ can then be expressed as:
[0101] $$c(o,v)=\int_{0}^{+\infty} w(t)\,c(p(t,v),v)\,dt \tag{2}$$
[0102] Here, $w(t)$ is the color weight at the spatial point $p(t,v)$, and the integration variable $t$ is the depth.
[0103]
[0104] Here, $\rho(t)$ is the density at the spatial point $p(t,v)$. Since $w(t)$ must be derived from the density $\rho(t)$, this application models the density $\rho(t)$ as a function of the SDF function $f$:
[0105]
[0106]
[0107] where $s$ is a trainable parameter; as the network converges, $s^{-1}$ tends to 0.
[0108] 2. A unified method for volume rendering and surface rendering
[0109] For ease of analysis, assume that the ray intersects only a single plane, at the intersection point $p(t^*,v)$. Since $p(t^*,v)$ is a surface point, the color at $p(t^*,v)$ is the surface color. Near the plane, the SDF value is linear along the ray $p(t,v)$, i.e. $f(p(t,v))=-|\cos\theta|\,(t-t^*)$ with $f(p(t^*,v))=0$, where $\theta$ is the locally constant angle between the ray direction and the plane normal. From equations (3) to (5), the relationship between the parameter $s$ and $w(t)$ can be obtained:
[0110] $$w(t)=|\cos\theta|\,\phi_s(f(p(t,v))) \tag{6}$$
[0111]
[0112] Equations (6) and (7) show that the smaller $s^{-1}$ is, the closer the weight $w(t)$ comes to an impulse function $\delta(t-t^*)$ and the closer the density $\rho(t)$ comes to a step function. When $w(t)$ becomes an exact impulse function, the color integral result $c(o,v)$ of volume rendering along the ray direction $v$ is consistent with the color that the color MLP network outputs at the spatial point $p(t^*,v)$ along the ray direction $v$, i.e. the surface color $c(t^*,v)$, that is:
[0113] $$c(o,v)=c(t^*,v) \tag{8}$$
[0114] Therefore, the weight bias and the color bias also tend to zero. The core of this application thus lies in letting the network adaptively optimize the parameter $s$ so that $s^{-1}$ gradually approaches 0. This retains the spatial perception capability of volume rendering while gradually achieving the unbiasedness of surface rendering, thereby unifying volume rendering and surface rendering.
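The claim that the weight collapses toward an impulse at $t^*$ as $s^{-1}$ shrinks can be checked numerically. The sketch below evaluates equation (6) on a dense depth grid under the single-plane assumption $f(p(t,v))=-|\cos\theta|(t-t^*)$. It assumes that $\phi_s$ is the logistic density (the derivative of a sigmoid with sharpness $s$), which is not defined in the surviving text; the values of `t_star` and `cos_theta` are arbitrary.

```python
import numpy as np

def logistic_density(x, s):
    """phi_s(x): derivative of the sigmoid 1 / (1 + exp(-s x)) (assumed form)."""
    sig = 0.5 * (1.0 + np.tanh(0.5 * s * x))   # numerically stable sigmoid
    return s * sig * (1.0 - sig)

def weight_statistics(s, t_star=2.0, cos_theta=0.8):
    """Total mass, mean depth and spread of w(t) = |cos| * phi_s(f) along one ray."""
    t = np.linspace(0.0, 4.0, 40001)
    f = -abs(cos_theta) * (t - t_star)          # linear SDF near a single plane
    w = abs(cos_theta) * logistic_density(f, s)
    dt = t[1] - t[0]
    mass = np.sum(w) * dt
    mean_t = np.sum(w * t) * dt / mass
    std_t = np.sqrt(np.sum(w * (t - mean_t) ** 2) * dt / mass)
    return mass, mean_t, std_t

stats = {s: weight_statistics(s) for s in (10.0, 100.0, 1000.0)}
```

Under these assumptions the weight always integrates to about 1 and is centered on $t^*$, while its spread shrinks in proportion to $s^{-1}$, which is the behavior the text relies on.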
[0115] Specifically, this application uses three parts—sampling point interpolation, color constraints, and weight constraints—to enable the network to complete this optimization.
[0116] Sampling point interpolation: Assume that a ray $p(t,v)$ has been constructed and that a set $P=\{p(t_i,v)\mid i=0,1,\dots,N-1\}$ of $N$ sampling points has been obtained. First, interpolation is used to find the ray depth $t^*$ of the spatial point $P^*$ at which the output of the SDF function $f$ is zero. Specifically, if the ray $p(t,v)$ intersects the object surface, there are two adjacent sampling points $p(t_i,v)$ and $p(t_{i+1},v)$ on the ray that satisfy:
[0117] $$f(p(t_i,v))\cdot f(p(t_{i+1},v))<0 \tag{9}$$
[0118] where $t_i$ and $t_{i+1}$ are the depths of the sampling points $p(t_i,v)$ and $p(t_{i+1},v)$ along the ray direction $v$. The depth $t^*$ of each intersection between the ray and the surface can then be obtained by interpolation:
[0119] $$t^*=\frac{f(p(t_i,v))\,t_{i+1}-f(p(t_{i+1},v))\,t_i}{f(p(t_i,v))-f(p(t_{i+1},v))} \tag{10}$$
[0120] The intersection point $P^*=p(t^*,v)$ between the ray and the surface is thus obtained, and the surface point $P^*$ is added to the original sampling point set $P$:
[0121] $$P' := P\cup P^* \tag{11}$$
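The sign-change test, the linear interpolation of the zero crossing, and the union with the original samples can be sketched as follows. For simplicity the sketch returns only the first crossing along the ray and keeps depths rather than 3D points; both choices, and the function names, are assumptions for illustration.

```python
import numpy as np

def surface_depth(t, sdf):
    """Depth t* of the first SDF zero crossing along a ray, or None if there is none."""
    t = np.asarray(t, dtype=np.float64)
    sdf = np.asarray(sdf, dtype=np.float64)
    crossing = np.nonzero(sdf[:-1] * sdf[1:] < 0)[0]   # indices i with f_i * f_{i+1} < 0
    if crossing.size == 0:
        return None
    i = crossing[0]
    # Linear interpolation between (t_i, f_i) and (t_{i+1}, f_{i+1}).
    return (sdf[i] * t[i + 1] - sdf[i + 1] * t[i]) / (sdf[i] - sdf[i + 1])

def add_surface_point(t, t_star):
    """Depths of P' = P ∪ {P*}, kept sorted along the ray."""
    return np.sort(np.append(t, t_star))

t = np.linspace(0.0, 4.0, 9)          # N = 9 sample depths, spacing 0.5
sdf = 1.3 - t                         # toy SDF that is linear along the ray
t_star = surface_depth(t, sdf)
t_prime = add_surface_point(t, t_star)
```

Because the toy SDF is exactly linear, the interpolated depth coincides with the true crossing at 1.3; for a learned SDF the result is only as accurate as the local linearity between the two bracketing samples.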
[0122] Finally, rendering is performed on the point set $P'$, and the color $C_{pixel}$ of the pixel through which the ray passes is used to supervise the color integral result of the volume rendering:
[0123] $$\mathrm{loss}_{\text{volume\_rendering}}=\left|c_{\text{volume\_rendering}}-C_{pixel}\right| \tag{12}$$
[0124] where $c_{\text{volume\_rendering}}=\sum_{i=0}^{N} w(t_i)\,c(t_i,v)$ is the color integral result $c(o,v)$ of the volume rendering along the ray direction, i.e. the discretized form of equation (2).
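The discretized color integral and its supervision by the pixel color can be sketched as a weighted sum followed by an L1 difference. Summing the absolute difference over the three color channels is an assumption; the source only writes the loss with an absolute-value sign.

```python
import numpy as np

def volume_render(weights, colors):
    """Discretized color integral: sum_i w(t_i) * c(t_i, v)."""
    weights = np.asarray(weights, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    return (weights[:, None] * colors).sum(axis=0)

def l1_color_loss(c_pred, c_pixel):
    """L1 distance between a rendered color and the observed pixel color."""
    return np.abs(np.asarray(c_pred, dtype=np.float64)
                  - np.asarray(c_pixel, dtype=np.float64)).sum()

weights = [0.2, 0.5, 0.3]                       # color weights along one ray
colors = [[1.0, 0.0, 0.0],                      # c(t_0, v)
          [0.0, 1.0, 0.0],                      # c(t_1, v)
          [0.0, 0.0, 1.0]]                      # c(t_2, v)
c_volume = volume_render(weights, colors)
loss = l1_color_loss(c_volume, [0.2, 0.5, 0.0])
```

The same `l1_color_loss` applies unchanged to the surface rendering color, with the single color at the interpolated surface point in place of the weighted sum.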
[0125] Color Constraint: To improve the accuracy of the network's surface color estimation, this application supervises the surface rendering color with a surface rendering loss function, namely:
[0126] $$\mathrm{loss}_{\text{surface\_rendering}}=\left|c(t^*,v)-C_{pixel}\right| \tag{13}$$
[0127] where $c(t^*,v)$ is the color value of the spatial point $o+t^*v$ along the ray direction $v$. This constraint is designed to reduce the deviation between the volume rendering color integral and the surface color.
[0128] Weight Constraint: Assuming that in one optimization iteration the camera emits only one ray $p(t,v)$ with $N$ sampling points, this application proposes a weight regularization loss:
[0129]
[0130] where $t_i$ is the depth of the sampling point $p(t_i,v)$ along the ray direction, $t^*$ is the depth of the spatial point $P^*=p(t^*,v)$ along the ray direction, and $w(t_i)$ is the color weight of the sampling point $p(t_i,v)$.
[0131] If the camera emits $M$ rays $\{p(t,v_j)\mid j=0,1,2,\dots,M-1\}$ in a single optimization iteration, with $N$ sampling points on each ray, the weight regularization loss is:
[0132]
[0133] where $t_{i,j}$ is the depth of the $i$-th sampling point $p(t_{i,j},v_j)$ on the $j$-th ray, $t^*_j$ is the depth of the intersection point of the $j$-th ray with the surface, and $w_j(t_{i,j})$ is the color weight of the point $p(t_{i,j},v_j)$.
[0134] 3. Multi-view geometric constraint method
[0135] In the optimization process of volume rendering, the latent geometry is constrained by the color of only one image in each iteration. This does not guarantee that the optimization direction of the latent geometry is consistent for each view, i.e., ambiguity exists in the optimization process. The sparser the input views, the larger the space of ambiguity, and the smaller the possibility of recovering the correct latent geometry. Undoubtedly, this will seriously affect the application of volume rendering in remote sensing scenes. To alleviate this ambiguity, this application introduces traditional multi-view geometric constraints in the volume rendering process. Specifically, the traditional multi-view geometric constraints introduced in this application include two aspects: point cloud constraints and image patch consistency constraints.
[0136] Point Cloud Constraints: In the classic 3D reconstruction pipeline, the SFM algorithm first calculates the camera pose and calibration parameters for each input image, along with a set of sparse keypoints; the MVS algorithm then estimates the per-pixel depth and normals for each calibrated image; finally, backprojecting the estimated depth maps yields a dense point cloud. Considering that dense point clouds provide a relatively accurate representation of the scene, this application uses the dense point cloud to explicitly guide the optimization of the signed distance function $f$. Since the SDF value $f(p)$ of any point $p$ in the dense point cloud $P_{mvs}$ should be approximately zero, this application proposes an SDF loss function based on the dense point cloud:
[0137] $$\mathrm{Loss}_{\text{sdf}}=\frac{1}{|P_{mvs}|}\sum_{p_i\in P_{mvs}}\left|f(p_i)\right| \tag{16}$$
[0138] where $|P_{mvs}|$ is the number of spatial points in $P_{mvs}$, $p_i$ is any spatial point from $P_{mvs}$, and $|\cdot|$ denotes the L1 distance. This loss function forces the function $f$ to fit the dense point cloud $P_{mvs}$, thereby guiding the optimization of $f$.
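The point cloud constraint can be sketched as the mean absolute SDF value over the MVS points, which is what the description above suggests (points on the true surface should have an SDF value near zero). The analytic sphere SDF stands in for the learned network and is purely hypothetical, as are the synthetic point clouds.

```python
import numpy as np

def sdf_loss(f, points_mvs):
    """Mean absolute SDF value over the dense point cloud (assumed form of the loss)."""
    return np.mean(np.abs(f(points_mvs)))

def sphere_sdf(points, radius=1.0):
    """Hypothetical stand-in for the SDF network: signed distance to a sphere."""
    return np.linalg.norm(points, axis=-1) - radius

rng = np.random.default_rng(0)
directions = rng.normal(size=(1000, 3))
directions /= np.linalg.norm(directions, axis=-1, keepdims=True)

on_surface = directions            # point cloud lying exactly on the unit sphere
off_surface = 1.5 * directions     # point cloud offset 0.5 outside the surface

loss_good = sdf_loss(sphere_sdf, on_surface)
loss_bad = sdf_loss(sphere_sdf, off_surface)
```

The loss is zero only when the zero-level set of `f` passes through the point cloud, so its gradient pulls the implicit surface toward the MVS points.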
[0139] Photometric consistency constraint: Assuming the signed distance function f captures the correct implicit surface, its geometry should remain consistent across different views. Based on this viewpoint, this application uses a photometric consistency constraint to constrain the implicit surface captured by the function f.
[0140] For a small area $s$ on the implicit surface, suppose the projection of $s$ in the grayscale image $I'_r$ of the reference view is the image block $I'_r(q)$, and its projections in the grayscale images $I'_s=\{I'_{is}\mid i=1,\dots,L\}$ of the source views $I_s=\{I_{is}\mid i=1,\dots,L\}$ are the image blocks $I'_s(q)=\{I'_{is}(q_{is})\mid i=1,\dots,L\}$, where $L$ is the number of source views, a predefined parameter. Then $I'_r(q)$ and each $I'_{is}(q_{is})$ should be photometrically consistent. This application therefore uses the normalized cross-correlation (NCC) between $I'_r(q)$ and $I'_{is}(q_{is})$ to measure photometric consistency:
[0141] $$\mathrm{NCC}(I'_r(q),I'_{is}(q_{is}))=\frac{\mathrm{Cov}(I'_r(q),I'_{is}(q_{is}))}{\sqrt{\mathrm{Var}(I'_r(q))\,\mathrm{Var}(I'_{is}(q_{is}))}} \tag{17}$$
[0142] where Cov denotes covariance and Var denotes variance. In each optimization iteration, this application calculates the NCC score for each source image and uses these scores to construct the loss function:
[0143]
[0144] This loss function aims to ensure the geometric consistency of implicit surfaces across multiple views through photometric consistency.
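Locating the image block that a surface point projects to in a given view can be sketched as a pinhole projection followed by a window crop. The convention `x_cam = R @ (X - cam_center)`, the square window with half-size `half`, and the choice to skip points too close to the image border are all assumptions for illustration; the source does not state the patch size or how borders are handled.

```python
import numpy as np

def project(K, R, cam_center, X):
    """Project world point X into pixel coordinates (u, v) of a pinhole camera."""
    x_cam = R @ (X - cam_center)
    uvw = K @ x_cam
    return uvw[:2] / uvw[2]

def extract_patch(gray, q, half=2):
    """(2*half+1)^2 grayscale block centered on pixel q, or None near the border."""
    row, col = int(round(q[1])), int(round(q[0]))
    if (row - half < 0 or col - half < 0
            or row + half >= gray.shape[0] or col + half >= gray.shape[1]):
        return None
    return gray[row - half:row + half + 1, col - half:col + half + 1]

K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
gray = np.arange(100 * 100, dtype=np.float64).reshape(100, 100)

q = project(K, np.eye(3), np.zeros(3), np.array([0.0, 0.0, 5.0]))
patch = extract_patch(gray, q)
```

Running the same two functions with the reference camera and with each source camera yields the pairs of blocks whose NCC scores are compared.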
[0145] After the above optimization, the SDF network $F_\theta$ and the color network have captured the geometry and appearance of the target. To export the mesh model of the scene, points are first sampled uniformly in a cubic volume; the SDF network then predicts the SDF value at each sampling point's coordinates; finally, the Marching Cubes algorithm reconstructs the mesh model from the sampling point coordinates and their SDF values. The entire process is shown in Figure 3.
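The uniform sampling stage of mesh export can be sketched as evaluating the SDF on a regular grid. The analytic sphere SDF is a hypothetical stand-in for the trained SDF network, and the bounds and resolution are arbitrary. The resulting volume is the kind of input a Marching Cubes implementation expects; scikit-image's `skimage.measure.marching_cubes(volume, level=0.0)` is one existing option, though the source does not say which implementation is used.

```python
import numpy as np

def sample_sdf_grid(f, lo, hi, resolution):
    """Evaluate an SDF on a uniform resolution^3 grid over the cube [lo, hi]^3."""
    axis = np.linspace(lo, hi, resolution)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    points = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
    volume = f(points).reshape(resolution, resolution, resolution)
    return volume, axis

def sphere_sdf(points, radius=1.0):
    """Hypothetical stand-in for the trained SDF network."""
    return np.linalg.norm(points, axis=-1) - radius

volume, axis = sample_sdf_grid(sphere_sdf, -1.5, 1.5, 33)
# The zero-level set of `volume` is the surface to be meshed.
```

For a real network the grid would normally be evaluated in batches, since a resolution of a few hundred per axis produces tens of millions of query points.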
[0146] To generate a new view, simply provide the new camera position $o'$ and the corresponding set of ray directions $\{v_{x,y}\mid x=1,\dots,\text{img\_width},\ y=1,\dots,\text{img\_height}\}$, then run forward inference through the SDF network and the color network. The result of the forward inference is the view synthesized from the new viewpoint, where img_width and img_height are the width and height of the view, respectively. The entire process is shown in Figure 4.
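The set of ray directions for a new view can be sketched as one unit vector per pixel, derived from the intrinsics and rotation of the new camera. The pinhole model and the world-to-camera rotation `R` are assumptions, and the sketch uses 0-based pixel indices rather than the 1-based indices in the text.

```python
import numpy as np

def view_ray_directions(K, R, width, height):
    """Unit ray directions in world coordinates for every pixel, shape (H, W, 3)."""
    xs, ys = np.meshgrid(np.arange(width), np.arange(height), indexing="xy")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    d_cam = pix @ np.linalg.inv(K).T     # K^{-1} applied to each homogeneous pixel
    d_world = d_cam @ R                  # R^T applied to each row vector
    return d_world / np.linalg.norm(d_world, axis=-1, keepdims=True)

K = np.array([[10.0, 0.0, 2.0], [0.0, 10.0, 2.0], [0.0, 0.0, 1.0]])
dirs = view_ray_directions(K, np.eye(3), width=5, height=5)
```

Each direction, paired with the camera position, defines one ray; rendering the view amounts to running the per-ray procedure once per pixel and writing the resulting colors into an image of the same height and width.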
[0147] To generate a walkthrough video of a scene, simply input a sequence of view directions and then repeat the process of generating new views.
[0148] It should be noted that the specific embodiments are merely explanations and illustrations of the technical solutions of this application and should not be used to limit the scope of protection. Any modifications made in accordance with the claims and description of this application that are only partial should still fall within the scope of protection of this application.
Claims
1. A method for surface reconstruction and new view synthesis of remote sensing scenes, characterized in that it includes the following steps:
Step 1: Use a structure-from-motion algorithm to recover, from the remote sensing images, the camera intrinsic and extrinsic parameters corresponding to each image. Based on the camera intrinsic and extrinsic parameters, reconstruct the point cloud $P_{mvs}$ using a multi-view stereo matching algorithm;
Step 2: Randomly select a remote sensing image as the reference view $I_r$, and select $L$ remote sensing images adjacent to $I_r$ as the source views $I_s = \{I_{is} \mid i = 1, \ldots, L\}$; obtain the grayscale images $I'_r$ and $I'_s = \{I'_{is} \mid i = 1, \ldots, L\}$ corresponding to $I_r$ and $I_s$; then randomly select a pixel $P_{ixel}$ from the reference view $I_r$ and, based on the camera intrinsic and extrinsic parameters, construct a ray $p(t, v)$ that passes through pixel $P_{ixel}$; finally, sample $N$ points on the ray $p(t, v)$ to obtain the sampling point set $P = \{p(t_i, v) \mid i = 0, 1, 2, \ldots, N-1\}$ corresponding to pixel $P_{ixel}$, where the ray $p(t, v)$ is a three-dimensional vector function, i.e. $p(t, v) = o + tv$, in which $o$ represents the spatial coordinates of the camera, $v$ represents the unit direction vector of the ray, i.e. the ray direction, and $t$ represents the depth along the ray;
Step 3: Feed the sampling point set $P$ into the MLP network $F_\theta$ to obtain the corresponding SDF values, and then perform linear interpolation on the sampling point set $P$ based on the SDF values to obtain the spatial point $P^*$ at which the SDF value output by the MLP network $F_\theta$ is zero, together with the ray depth $t^*$ corresponding to $P^*$, i.e. $P^* = o + t^* v$;
Step 4: Take the union of the sampling point set $P$ and $P^*$ to obtain the point set $P'$;
Step 5: Estimate the weights of the point set $P'$ to obtain the color weights $\{w(t_i) \mid i = 0, 1, 2, \ldots, N\}$ corresponding to the point set $P'$;
Step 6: Input the point set $P'$ and the unit direction vector $v$ of the ray into the MLP color network to obtain the color values $\{c(t_i, v) \mid i = 0, 1, 2, \ldots, N\}$ corresponding to the point set $P'$ along the ray direction $v$; then compute the sum of the color values weighted by their color weights, and take this sum as the reconstructed color $c_{volume\_rendering}$ corresponding to pixel $P_{ixel}$;
Step 7: Feed the spatial point $P^*$ and the unit direction vector $v$ of the ray into the MLP color network to obtain the color value $c(t^*, v)$ of $P^*$ along the ray direction $v$, and take it as the reconstructed color $c_{surface\_rendering}$ corresponding to pixel $P_{ixel}$;
Step 8: From the camera intrinsic and extrinsic parameters, calculate the image blocks $I'_r(q)$ and $I'_s(q) = \{I'_{is}(q_{is}) \mid i = 1, \ldots, L\}$ corresponding to the spatial point $P^*$ in the grayscale images $I'_r$ and $I'_s = \{I'_{is} \mid i = 1, \ldots, L\}$ of the reference view and the source views;
Step 9: Construct the loss function $Loss_{volume\_rendering}$ from the reconstructed color $c_{volume\_rendering}$, the loss function $Loss_{surface\_rendering}$ from the reconstructed color $c_{surface\_rendering}$, the loss function $Loss_{sdf}$ from the point cloud $P_{mvs}$, and the loss function $Loss_{photo}$ from the image blocks $I'_r(q)$ and $I'_s(q)$; then, based on the above loss functions, optimize the MLP network $F_\theta$ and the MLP color network by backpropagation;
Step 10: Repeat Steps 2 through 9 $K$ times to optimize the MLP network $F_\theta$ and the MLP color network, where $K$ is a pre-defined hyperparameter;
Step 11: Use the optimized MLP network $F_\theta$ and MLP color network to complete the surface reconstruction and new view synthesis task.
2. The method for surface reconstruction and new view synthesis of remote sensing scenes according to claim 1, characterized in that the reconstructed color $c_{volume\_rendering}$ in Step 6 is represented as: $$c_{volume\_rendering} = \sum_{i=0}^{N} w(t_i)\, c(t_i, v)$$ where $c(t_i, v)$ and $w(t_i)$ represent, respectively, the color value and the color weight of the spatial point $o + t_i v$ along the ray direction $v$.
3. The method for surface reconstruction and new view synthesis of remote sensing scenes according to claim 2, characterized in that the reconstructed color $c_{surface\_rendering}$ in Step 7 is represented as: $$c_{surface\_rendering} = c(t^*, v)$$ where $c(t^*, v)$ represents the color value of the spatial point $P^*$ along the ray direction $v$.
4. The method for surface reconstruction and new view synthesis of remote sensing scenes according to claim 3, characterized in that the linear interpolation is expressed as: $$t^* = \frac{f(p(t_i, v))\, t_{i+1} - f(p(t_{i+1}, v))\, t_i}{f(p(t_i, v)) - f(p(t_{i+1}, v))}$$ where $f$ represents the SDF function represented by the MLP network $F_\theta$; $p(t_i, v)$ and $p(t_{i+1}, v)$ represent two adjacent sampling points on the ray $p(t, v)$ that satisfy $f(p(t_i, v)) \cdot f(p(t_{i+1}, v)) < 0$; and $t_i$ and $t_{i+1}$ are the depths of $p(t_i, v)$ and $p(t_{i+1}, v)$ along the ray direction $v$.
5. The method for surface reconstruction and new view synthesis of remote sensing scenes according to claim 4, characterized in that the loss function $Loss_{volume\_rendering}$ is represented as: $$Loss_{volume\_rendering} = \left| c_{volume\_rendering} - c_{pixel} \right|$$ where $c_{pixel}$ represents the real color corresponding to pixel $P_{ixel}$.
6. The method for surface reconstruction and new view synthesis of remote sensing scenes according to claim 5, characterized in that the loss function $Loss_{surface\_rendering}$ is represented as: $$Loss_{surface\_rendering} = \left| c_{surface\_rendering} - c_{pixel} \right|$$
7. The method for surface reconstruction and new view synthesis of remote sensing scenes according to claim 6, characterized in that the loss function $Loss_{sdf}$ is represented as: where $p_i$ represents any spatial point from $P_{mvs}$, and the expression also involves the number of spatial points in $P_{mvs}$.
8. The method for surface reconstruction and new view synthesis of remote sensing scenes according to claim 7, characterized in that the weight loss function is represented as: where $t_i$ represents the depth of the sampling point $p(t_i, v)$ along the ray direction $v$, $t^*$ represents the depth of the spatial point $P^*$ along the ray direction $v$, and $w(t_i)$ represents the color weight corresponding to the sampling point $p(t_i, v)$.
9. The method for surface reconstruction and new view synthesis of remote sensing scenes according to claim 8, characterized in that the loss function $Loss_{photo}$ is represented as: where $\mathrm{NCC}(I'_r(q_i), I'_s(q_{is}))$ represents the normalized cross-correlation between the image blocks $I'_r(q_i)$ and $I'_s(q_{is})$.
10. The method for surface reconstruction and new view synthesis of remote sensing scenes according to claim 9, characterized in that $\mathrm{NCC}(I'_r(q_i), I'_s(q_{is}))$ is represented as: $$\mathrm{NCC}\big(I'_r(q_i),\, I'_s(q_{is})\big) = \frac{\mathrm{Cov}\big(I'_r(q_i),\, I'_s(q_{is})\big)}{\sqrt{\mathrm{Var}\big(I'_r(q_i)\big)\,\mathrm{Var}\big(I'_s(q_{is})\big)}}$$ where Cov represents covariance and Var represents variance.
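As an illustration of the per-ray operations recited in the claims, the sketch below locates the zero crossing of the SDF along a ray by linear interpolation and composites colors as a weighted sum. The function names are hypothetical, the choice of the first sign change along the ray is an assumption, and the color weights are taken as given inputs because their estimation is not detailed in the claims.

```python
import numpy as np

def surface_depth(t, sdf):
    """Depth t* at which the linearly interpolated SDF along the ray is zero.

    Uses the first pair of adjacent samples whose SDF values have opposite
    signs (an assumed choice); returns None when there is no sign change.
    """
    t = np.asarray(t, dtype=np.float64)
    sdf = np.asarray(sdf, dtype=np.float64)
    crossings = np.nonzero(sdf[:-1] * sdf[1:] < 0)[0]
    if crossings.size == 0:
        return None
    i = crossings[0]
    return (sdf[i] * t[i + 1] - sdf[i + 1] * t[i]) / (sdf[i] - sdf[i + 1])

def composite_color(weights, colors):
    """Weighted sum of per-sample colors; colors has shape (num_samples, 3)."""
    weights = np.asarray(weights, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    return (weights[:, None] * colors).sum(axis=0)
```

For a planar surface the SDF is linear along the ray, so the interpolated depth is exact; for curved surfaces it is an approximation whose error shrinks as the sampling along the ray becomes denser.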