Digital terrain model reconstruction method and device, storage medium and computer equipment

By using a multi-stage progressive reconstruction network for upsampling and residual refinement, the problem of insufficient terrain detail recovery in low-resolution digital surface models is solved, and high-quality generation and stable reconstruction of high-resolution digital terrain models are achieved.

CN122289591A · Pending · Publication Date: 2026-06-26 · TIANFU JIANGXI LAB

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TIANFU JIANGXI LAB
Filing Date
2026-05-28
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies struggle to effectively recover the detailed structure and local elevation changes of high-resolution terrain from low-resolution digital surface models, and traditional methods are ineffective at suppressing elevation deviations and reconstruction instability caused by surface cover.

Method used

A multi-stage progressive reconstruction network is constructed, which upsamples and refines residuals step by step through cascaded reconstruction stages, and integrates high-resolution optical images with low-resolution digital surface models to achieve digital terrain model reconstruction from low resolution to high resolution.

Benefits of technology

It improves the reconstruction accuracy and fidelity of terrain details in digital terrain models, and achieves stable enhancement and high-quality generation of terrain structures at multiple resolutions.

✦ Generated by Eureka AI based on patent content.


Abstract

This disclosure relates to the field of data processing technology, and provides a method, apparatus, storage medium, and computer device for digital terrain model reconstruction. The method includes: acquiring a training dataset; constructing a multi-stage progressive reconstruction network and training the network using the training dataset to obtain a trained multi-stage progressive reconstruction network; the multi-stage progressive reconstruction network includes multiple cascaded reconstruction stages; acquiring high-resolution optical images and low-resolution digital surface models, and inputting these images and models into the trained multi-stage progressive reconstruction network; and performing progressive upsampling and residual refinement processing on the high-resolution optical images and low-resolution digital surface models of the area to be reconstructed through multiple cascaded reconstruction stages to obtain a digital terrain model of the target resolution. This embodiment achieves stable enhancement of terrain structure at multiple resolutions and high-quality generation of high-resolution digital terrain models.

Description

Technical Field

[0001] This disclosure relates to the field of data processing technology, and more specifically, to a method, apparatus, storage medium, and computer equipment for digital terrain model reconstruction.

Background Technology

[0002] Digital terrain models (DTMs) are fundamental geographic data describing the elevation undulations of bare land surfaces, widely used in terrain analysis, hydrological simulation, urban planning, disaster assessment, and 3D scene construction. High-resolution DTMs can finely represent surface slope variations, terrain boundaries, and local undulation features, significantly improving the accuracy of related applications. However, acquiring large-scale, high-precision DTMs typically relies on expensive surveying methods such as airborne LiDAR and photogrammetry, which are difficult and costly in practice. In contrast, globally available low-resolution digital surface models (DSMs) offer wide coverage and low acquisition costs, but they contain information on surface cover such as buildings and vegetation, and have limited spatial resolution, making them difficult to directly use for reconstructing the detailed structure and local elevation changes of bare land terrain.

[0003] In related technologies, terrain is mostly reconstructed with local-neighborhood interpolation techniques such as bilinear interpolation and cubic interpolation. Although simple to implement, these techniques rely on fixed mathematical models and cannot recover the high-frequency terrain information missing from low-resolution data. This results in blurred boundaries and insufficient detail, and makes it difficult to eliminate elevation deviations caused by land cover. Deep learning methods have emerged in recent years and can learn the mapping from low-resolution inputs to high-resolution outputs, but they still have significant drawbacks: first, when high-resolution terrain is recovered directly from extremely low-resolution elevation data in a single stage, structural blurring and reconstruction instability are prone to occur; second, elevation data alone is insufficient to fully recover the boundaries and local details of complex surfaces.

Summary of the Invention

[0004] This disclosure provides at least one digital terrain model reconstruction method, apparatus, storage medium, and computer equipment, which improves the reconstruction accuracy and fidelity of terrain details of digital terrain models, and achieves stable enhancement of terrain structure under multi-level resolution and high-quality generation of high-resolution digital terrain models.

[0005] This disclosure provides a digital terrain model reconstruction method, including:

obtaining a training dataset, wherein the training dataset includes multiple training data subsets, each of the training data subsets including high-resolution optical training images, low-resolution digital surface training models, and high-resolution digital terrain model samples corresponding to the high-resolution optical training images and the low-resolution digital surface training models;

constructing a multi-stage progressive reconstruction network, and training the multi-stage progressive reconstruction network using the training dataset to obtain a trained multi-stage progressive reconstruction network, wherein the multi-stage progressive reconstruction network includes multiple cascaded reconstruction stages;

acquiring a high-resolution optical image and a low-resolution digital surface model of the target area to be reconstructed, and inputting the high-resolution optical image and the low-resolution digital surface model into the trained multi-stage progressive reconstruction network; and

performing step-by-step upsampling and residual refinement on the high-resolution optical image and the low-resolution digital surface model through the multiple cascaded reconstruction stages to obtain a target resolution digital terrain model corresponding to the target area to be reconstructed.

[0006] This disclosure provides a digital terrain model reconstruction apparatus, comprising:

a data acquisition module, used to acquire a training dataset, wherein the training dataset includes multiple training data subsets, each of the training data subsets including a high-resolution optical training image, a low-resolution digital surface training model, and a high-resolution digital terrain model sample corresponding to the high-resolution optical training image and the low-resolution digital surface training model;

a network training module, used to construct a multi-stage progressive reconstruction network and train the multi-stage progressive reconstruction network using the training dataset to obtain a trained multi-stage progressive reconstruction network, wherein the multi-stage progressive reconstruction network includes multiple cascaded reconstruction stages; and

a network processing module, used to acquire a high-resolution optical image and a low-resolution digital surface model of the target area to be reconstructed, input the high-resolution optical image and the low-resolution digital surface model into the trained multi-stage progressive reconstruction network, and, through the multiple cascaded reconstruction stages, upsample them and refine the residuals step by step to obtain a target resolution digital terrain model corresponding to the target area to be reconstructed.

[0007] This disclosure provides a computer device including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the computer device is running, the processor communicates with the memory via the bus. When the machine-readable instructions are executed by the processor, they perform a digital terrain model reconstruction method as described in any of the above possible embodiments.

[0008] This disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the digital terrain model reconstruction method as described in any of the possible embodiments above.

[0009] The digital terrain model reconstruction method, apparatus, storage medium, and computer equipment provided in this disclosure, by constructing a multi-stage progressive reconstruction network and utilizing high-resolution optical images and low-resolution digital surface models for progressive upsampling and residual refinement, can effectively fuse optical texture information and surface model elevation information. This preserves local terrain details while suppressing the cumulative propagation of reconstruction errors. Thus, it improves the reconstruction accuracy and fidelity of terrain details, achieving stable enhancement of terrain structure at multiple resolutions and high-quality generation of high-resolution digital terrain models.

[0010] To make the above-mentioned objects, features and advantages of this disclosure more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

Attached Figure Description

[0011] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the accompanying drawings referenced in the embodiments will be briefly described below. These drawings are incorporated in and constitute a part of this specification. They illustrate embodiments conforming to this disclosure and, together with the specification, serve to explain the technical solutions of this disclosure. It should be understood that the following drawings only show some embodiments of this disclosure and should not be considered as limiting the scope. Those skilled in the art can obtain other related drawings based on these drawings without creative effort.

[0012] Figure 1 shows a flowchart of a digital terrain model reconstruction method provided by an embodiment of this disclosure;
Figure 2 shows a flowchart of a first reconstruction stage processing method provided by an embodiment of this disclosure;
Figure 3 shows a flowchart of a subset loss calculation method provided by an embodiment of this disclosure;
Figure 4 shows a schematic diagram of the structure of a digital terrain model reconstruction apparatus provided by an embodiment of this disclosure;
Figure 5 shows a schematic diagram of the structure of a computer device provided by an embodiment of this disclosure.

Detailed Implementation

[0013] To make the objectives, technical solutions, and advantages of the embodiments of this disclosure clearer, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this disclosure, and not all of them. The components of the embodiments of this disclosure described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of this disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of this disclosure without inventive effort are within the scope of protection of this disclosure.

[0014] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.

[0015] In this document, the term "and / or" merely describes an association between items and covers three cases. For example, A and / or B can mean A alone, both A and B, or B alone. Furthermore, the term "at least one" in this document means any one of multiple elements or any combination of at least two of them. For example, including at least one of A, B, and C can mean including any one or more elements selected from the set consisting of A, B, and C.

[0016] Digital Terrain Models (DTMs) are crucial foundational data for describing the elevation undulations of bare land surfaces, widely used in terrain analysis, hydrological simulation, urban planning, disaster assessment, and 3D scene construction. High-resolution DTMs can more accurately represent surface slope variations, terrain boundaries, and local undulation features, significantly improving the accuracy and reliability of related applications.

[0017] However, acquiring large-scale, high-precision digital terrain models (DTMs) typically relies on costly surveying methods or high-quality elevation data sources, making practical acquisition quite difficult. In contrast, publicly available low-resolution digital elevation models (DEMs) offer wider coverage and lower acquisition costs, but these data often include surface cover such as buildings and vegetation, which makes them closer to digital surface models (DSMs) and less suitable for directly representing detailed bare terrain. Furthermore, low-resolution DSMs themselves have limited spatial resolution, making it difficult to recover the detailed structures and local elevation changes contained in high-resolution terrain.

[0018] Research has revealed that terrain reconstruction or elevation interpolation methods in related technologies mainly include bilinear interpolation, cubic interpolation, and other interpolation methods based on local neighborhoods. These methods are simple to implement and computationally efficient, and can convert elevation data between scales to a certain extent. However, because they inherently rely on fixed mathematical models, they struggle to recover the high-frequency terrain information missing from the original low-resolution data, often producing blurred boundaries and insufficient detail in the output, and they cannot effectively suppress elevation deviations caused by land cover.
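The limitation described above can be illustrated with a minimal NumPy sketch of bilinear upsampling. The function name `bilinear_upsample`, the pixel-centre alignment, and the sample elevations are assumptions made for illustration; they are not taken from the disclosure. Because every output value is a weighted average of neighbouring inputs, the result can never leave the input elevation range, so no new peaks, ridges, or cover-free ground values appear.

```python
import numpy as np

def bilinear_upsample(grid, factor):
    """Separable bilinear upsampling of a 2-D elevation grid (illustrative only)."""
    h, w = grid.shape
    # Output pixel centres expressed in input pixel coordinates (assumed alignment).
    ys = (np.arange(h * factor) + 0.5) / factor - 0.5
    xs = (np.arange(w * factor) + 0.5) / factor - 0.5
    # Interpolate along x for every row; np.interp clamps at the borders.
    rows = np.stack([np.interp(xs, np.arange(w), r) for r in grid])
    # Then interpolate along y for every column of the intermediate result.
    return np.stack([np.interp(ys, np.arange(h), c) for c in rows.T], axis=1)

# Hypothetical 4x4 low-resolution elevation tile (metres).
dsm = np.array([[10.0, 12.0, 15.0, 11.0],
                [9.0, 14.0, 18.0, 12.0],
                [8.0, 13.0, 16.0, 10.0],
                [7.0, 9.0, 11.0, 9.0]])
up = bilinear_upsample(dsm, 5)  # 4x4 -> 20x20
```

The upsampled grid is smoother than the input but stays inside the interval spanned by the original 16 values, which is why interpolation alone cannot restore missing high-frequency terrain detail.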

[0019] In recent years, deep learning methods, represented by convolutional neural networks, have made significant progress in image super-resolution reconstruction and remote sensing data reconstruction. These methods can learn the mapping relationship between low-resolution inputs and high-resolution outputs in a data-driven manner, exhibiting stronger detail recovery capabilities compared to traditional interpolation methods. However, existing methods still have several shortcomings for elevation data reconstruction tasks: on the one hand, directly recovering high-resolution terrain output from extremely low-resolution elevation input in a single stage is prone to structural blurring and reconstruction instability; on the other hand, relying solely on elevation data often fails to fully recover the boundaries and local details in complex terrain surfaces.

[0020] Based on the above research, this disclosure provides a digital terrain model reconstruction method, apparatus, storage medium, and computer device. Specifically, firstly, a training dataset is acquired; secondly, a multi-stage progressive reconstruction network is constructed and trained using the training dataset to obtain a trained multi-stage progressive reconstruction network; wherein, the multi-stage progressive reconstruction network includes multiple cascaded reconstruction stages; high-resolution optical images and low-resolution digital surface models are acquired and input into the trained multi-stage progressive reconstruction network; through multiple cascaded reconstruction stages, the high-resolution optical images and low-resolution digital surface models are progressively upsampled and residual refined to obtain a target resolution digital terrain model.

[0021] In this embodiment, by constructing a multi-stage progressive reconstruction network and utilizing high-resolution optical images and low-resolution digital surface models for progressive upsampling and residual refinement, optical texture information and surface model elevation information can be effectively fused. This preserves local terrain details while suppressing the cumulative propagation of reconstruction errors. Consequently, the reconstruction accuracy and fidelity of terrain details of the digital terrain model are improved, achieving stable enhancement of terrain structure at multiple resolutions and high-quality generation of high-resolution digital terrain models.

[0022] To facilitate understanding of this embodiment, the executing entity of the digital terrain model reconstruction method provided in this disclosure will first be described in detail. The executing entity of the digital terrain model reconstruction method provided in this disclosure is a computer device. This computer device can be a terminal device or a server. The terminal device can also be a mobile device, user terminal, terminal, handheld device, computing device, vehicle-mounted device, wearable device, etc. The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud storage, big data, and artificial intelligence platforms. Optionally, this method can also be applied to an implementation environment composed of computer devices and servers.

[0023] The digital terrain model reconstruction method provided in this embodiment will be described in detail below with reference to the accompanying drawings. Figure 1 is a flowchart of a digital terrain model reconstruction method provided in this embodiment of the present disclosure. The method includes the following steps S101 to S103:

S101, obtain the training dataset.

[0024] It is understood that the training dataset refers to the set of samples used to train the proposed multi-stage progressive reconstruction network. This set of samples contains multiple sets of input data and corresponding supervision labels, which can be used to train the multi-stage progressive reconstruction network to establish the mapping relationship from low-resolution digital surface models and high-resolution optical imagery to high-resolution digital terrain models. Here, the training dataset includes multiple training data subsets, each consisting of three corresponding data sets: high-resolution optical training imagery, low-resolution digital surface training model, and high-resolution digital terrain model samples corresponding to the high-resolution optical training imagery and low-resolution digital surface training model.

[0025] Specifically, high-resolution optical training imagery refers to multi-channel remote sensing images with a spatial resolution at the sub-meter level or finer, such as visible light or infrared images with a ground sampling interval of 0.25 meters and a size of 400×400 pixels. This imagery records the spectral reflectance information of land cover, including the texture and edge features of different types of land features such as building roofs, vegetation canopies, bare soil, and roads. High-resolution optical training imagery can provide rich spatial structural clues for terrain reconstruction, such as the direction of ridgelines, shadow variations at steep cliffs, and the uniform texture of flat farmland. This information is used in the network reconstruction process to assist in determining terrain boundaries and local undulations. Low-resolution digital surface training models are represented as single-channel elevation raster data, which are resampled or obtained from publicly available global datasets with lower spatial resolution, such as 25-meter resolution elevation data from products like SRTM or ASTER GDEM. This data records the elevation values of the tops of land cover, including building tops, tree canopies, and bare ground surfaces, where the elevations of buildings and vegetated areas are significantly higher than the true bare ground elevations. The low-resolution digital surface training model, as the main input to the network, provides a low-resolution initial terrain profile, but it cannot distinguish between surface cover and bare ground and lacks high-frequency details.

[0026] Specifically, the high-resolution digital terrain model samples corresponding to the high-resolution optical training images and low-resolution digital surface training models refer to bare ground elevation data with a spatial resolution at the meter level or finer, obtained after removing the elevations of non-ground objects such as buildings and vegetation. For example, a 1-meter resolution digital terrain model generated by filtering point clouds from an airborne LiDAR system has a size of 100×100 pixels and covers exactly the same geographical area as the optical images and low-resolution digital surface models. During training, the high-resolution digital terrain model samples serve as supervisory labels, guiding the network to learn the nonlinear mapping from a low-resolution digital surface model that contains surface cover to a clean, high-resolution digital terrain model.

[0027] Here, the high-resolution optical training image in this disclosure is preferably a multi-channel image of 400×400 pixels, the low-resolution digital surface training model is preferably a single-channel elevation data of 4×4 pixels, and the high-resolution digital terrain model sample is preferably a single-channel terrain elevation data of 100×100 pixels.
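As a quick consistency check on these preferred sizes, the sketch below multiplies each tile's pixel count by a ground sampling distance to obtain its ground footprint. Pairing the preferred pixel sizes with the example resolutions quoted earlier (0.25 m, 25 m, and 1 m) is an assumption made for illustration, as are the names `Tile`, `gsd_m`, and `footprint_m`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tile:
    name: str
    pixels: int    # the tile is pixels x pixels
    gsd_m: float   # ground sampling distance in metres (assumed example values)

    @property
    def footprint_m(self) -> float:
        """Side length of the ground area covered by the tile, in metres."""
        return self.pixels * self.gsd_m

tiles = [
    Tile("high-resolution optical image", 400, 0.25),
    Tile("low-resolution DSM", 4, 25.0),
    Tile("high-resolution DTM sample", 100, 1.0),
]
footprints = {t.name: t.footprint_m for t in tiles}

# Linear magnification the network must achieve from DSM to DTM.
dsm_to_dtm_factor = tiles[2].pixels // tiles[0 + 1].pixels
```

Under these assumptions all three rasters cover the same 100 m × 100 m area, and the DSM-to-DTM magnification is 25× per side.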

[0028] In some other embodiments, the size of each training data subset can also be scaled proportionally according to the resolution of the actual data source or the needs of the target application. For example, the input image can be reduced to 200×200, the DSM can be adjusted to 2×2, and the DTM can be adjusted to 50×50. No specific limitation is made here.
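The proportional scaling can be sketched as follows. The helper `scale_sizes` and the rule that a scale factor is rejected when any size would become fractional are assumptions for illustration; the disclosure only states that the sizes are scaled proportionally.

```python
from fractions import Fraction

# Preferred tile sizes from the text, in pixels per side.
BASE_SIZES = {"optical": 400, "dsm": 4, "dtm": 100}

def scale_sizes(scale: Fraction) -> dict:
    """Scale all three tile sizes by the same factor, keeping their ratios."""
    sizes = {}
    for name, pixels in BASE_SIZES.items():
        scaled = pixels * scale
        if scaled.denominator != 1:
            # Assumed policy: refuse factors that do not give whole pixels.
            raise ValueError(f"{name}: {pixels} x {scale} is not a whole number of pixels")
        sizes[name] = int(scaled)
    return sizes

half = scale_sizes(Fraction(1, 2))
```

With a factor of 1/2 this reproduces the 200×200, 2×2, and 50×50 example, and the 25× ratio between DTM and DSM sizes is unchanged.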

[0029] In some possible implementations, to eliminate training instability caused by differences in units and numerical ranges between different samples, dynamic normalization can be applied to the high-resolution optical training images, low-resolution digital surface training models, and high-resolution digital terrain model samples when constructing the training dataset. Here, dynamic normalization refers to independently calculating the minimum and maximum values for each data sample, then subtracting the minimum value from each value in the sample and dividing by the difference between the maximum and minimum values, supplemented by a very small constant to prevent division-by-zero errors, thereby mapping all values of the sample to a closed interval between 0 and 1. In this way, through dynamic normalization, data samples from different regions, different time phases, and different elevation amplitudes can be unified to the same numerical scale, avoiding the unbalanced impact of elevation differences between mountainous and plain areas on network gradient updates, thus improving the convergence speed and stability of the training process. Furthermore, the dynamically normalized data satisfies the sensitivity range of the neural network activation function to the input amplitude, which helps to keep the activation gradient within the effective interval.
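A minimal sketch of this per-sample normalization is given here. Adding the small constant to the denominator and its value of `1e-8` are assumptions; the text only says that a very small constant prevents division by zero. The sample elevations are invented for illustration.

```python
import numpy as np

def dynamic_normalize(sample, eps=1e-8):
    """Min-max normalize one sample using its own minimum and maximum."""
    sample = np.asarray(sample, dtype=np.float64)
    lo = sample.min()
    hi = sample.max()
    # eps keeps the division defined for perfectly flat tiles (hi == lo).
    return (sample - lo) / (hi - lo + eps)

mountain = np.array([[1200.0, 1850.0], [1500.0, 2400.0]])  # large relief
plain = np.array([[12.0, 14.0], [13.0, 15.0]])             # small relief
flat = np.full((4, 4), 35.0)                               # no relief at all

norm_mountain = dynamic_normalize(mountain)
norm_plain = dynamic_normalize(plain)
norm_flat = dynamic_normalize(flat)
```

Both the mountainous and the plain tile end up spanning essentially the same range, while the flat tile maps to all zeros instead of producing NaN values.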

[0030] S102, construct a multi-stage progressive reconstruction network, and train the multi-stage progressive reconstruction network using the training dataset to obtain the trained multi-stage progressive reconstruction network.

[0031] Specifically, the multi-stage progressive reconstruction network is a cascaded neural network structure based on deep learning proposed in this disclosure. This network can take a low-resolution digital surface model as the initial elevation input, combine it with texture and edge features provided by high-resolution optical imagery, and progressively improve the spatial resolution through multiple sequentially connected reconstruction stages, ultimately outputting a high-resolution digital terrain model. The network can include multiple cascaded reconstruction stages, each responsible for upsampling the terrain data output from the previous stage from the current resolution to a higher target resolution, and using optical imagery of the corresponding scale for residual refinement, achieving a gradual restoration from the overall outline to local details.

[0032] Furthermore, after the multi-stage progressive reconstruction network is constructed, it can be trained in a supervised manner on the training dataset, with the high-resolution optical training images and low-resolution digital surface training models as inputs and the high-resolution digital terrain model samples as supervision labels. Training yields network parameters capable of accurately reconstructing high-resolution digital terrain models from the input high-resolution optical images and low-resolution digital surface models. During training, the parameters can be updated with mini-batch stochastic gradient descent or an adaptive moment estimation (Adam) optimizer, and the loss function between the predicted results and the true labels is minimized iteratively until the network converges.
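For readers unfamiliar with adaptive moment estimation, the following is a sketch of one standard Adam update step in NumPy. The hyperparameter defaults are the commonly used ones and are an assumption here; the disclosure does not specify learning rates or decay coefficients, and `adam_step` is a hypothetical helper name.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Two parameters with very different gradient magnitudes.
w = np.array([0.0, 0.0])
grad = np.array([6.0, -0.02])
w1, m1, v1 = adam_step(w, grad, np.zeros(2), np.zeros(2), t=1, lr=0.1)
```

On the first step both parameters move by roughly the learning rate, regardless of gradient magnitude, which is the per-parameter scaling that distinguishes this optimizer from plain stochastic gradient descent.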

[0033] For example, since directly upsampling from a low-resolution digital surface model with lower spatial resolution to a high-resolution digital terrain model with higher spatial resolution in one step can lead to blurred and structurally unstable reconstruction results, the multi-stage progressive reconstruction network proposed in this disclosure includes three cascaded reconstruction stages: a first reconstruction stage, a second reconstruction stage, and a third reconstruction stage. Each reconstruction stage focuses on a fixed upsampling factor, which can enlarge the input terrain data by a preset factor in spatial resolution, while using optical imagery of the corresponding scale to compensate for high-frequency details. Specifically, the first reconstruction stage can upsample the input low-resolution digital surface model from 4×4 pixels to 20×20 pixels, achieving coarse-scale reconstruction from low resolution to medium resolution; the second reconstruction stage can further upsample the 20×20 pixel intermediate result output by the first reconstruction stage to 100×100 pixels, achieving fine-scale reconstruction from medium to high resolution; the third reconstruction stage performs secondary residual refinement on the preliminary result of 100×100 pixels without changing the spatial resolution, repairing local errors. Thus, by setting a progressive reconstruction structure with two upsampling stages and one refinement stage, the problems of reconstruction blurring and structural instability caused by directly mapping from extremely low resolution to high resolution in one step can be avoided.
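The data flow through the three stages can be sketched as below. This is a shape-level illustration only: nearest-neighbour upsampling and the all-zero `zero_residual` function are placeholders assumed for illustration, standing in for the learned upsampling and optical-guided refinement modules whose internals are not given in this passage.

```python
import numpy as np

def upsample_nearest(grid, factor):
    """Placeholder upsampler: repeat each cell factor x factor times."""
    return np.kron(grid, np.ones((factor, factor)))

def zero_residual(terrain, optical):
    """Placeholder for a learned residual branch guided by optical imagery."""
    return np.zeros_like(terrain)

def reconstruct(dsm, optical_20, optical_100, residual_fns=(zero_residual,) * 3):
    r1, r2, r3 = residual_fns
    # Stage 1: 4x4 -> 20x20 (x5), refined with the 20x20 optical image.
    coarse = upsample_nearest(dsm, 5)
    mid = coarse + r1(coarse, optical_20)
    # Stage 2: 20x20 -> 100x100 (x5), refined with the 100x100 optical image.
    fine = upsample_nearest(mid, 5)
    initial = fine + r2(fine, optical_100)
    # Stage 3: resolution unchanged, second residual refinement only.
    final = initial + r3(initial, optical_100)
    return mid, initial, final

rng = np.random.default_rng(0)
dsm = rng.uniform(0.0, 1.0, size=(4, 4))
mid, initial, final = reconstruct(dsm, np.zeros((20, 20, 3)), np.zeros((100, 100, 3)))
```

Each upsampling stage applies a factor of 5 per side, so the overall 25× magnification is split into two smaller steps followed by a refinement pass at the target resolution.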

[0034] Here, because high-resolution optical training images possess rich texture, edge, and spatial structure information, they can provide reference cues for terrain boundaries and local undulations during network training. Therefore, before training the network, the high-resolution optical training images in each training data subset can be downsampled to construct intermediate-scale training images matching the spatial scale of the first reconstruction stage, and target-scale training images matching the spatial scale of the second reconstruction stage. The intermediate-scale training images represent the low-resolution version obtained by downsampling the original high-resolution optical training images, providing texture information at the same scale as the intermediate-resolution terrain data in the first reconstruction stage, used to assist in recovering coarse-scale terrain features. The target-scale training images represent the medium-resolution version obtained by downsampling the original high-resolution optical training images, providing fine texture information matching the output resolution for the second and third reconstruction stages, used to guide boundary localization and local detail reconstruction.

[0035] For example, the spatial resolution of the intermediate-scale training image can preferably be 20×20 pixels, and the spatial resolution of the target-scale training image can preferably be 100×100 pixels; or, it can be scaled proportionally according to the original resolution of the actual input data or the desired output resolution, without specific limitations.
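One simple way to build the two optical scales is block averaging, sketched below. The choice of block averaging is an assumption, since the disclosure does not name the downsampling method, and `block_average` is a hypothetical helper.

```python
import numpy as np

def block_average(image, out_size):
    """Downsample an (H, W, C) image to (out_size, out_size, C) by block means."""
    h, w, c = image.shape
    if h % out_size or w % out_size:
        raise ValueError("image size must be a multiple of the output size")
    fh, fw = h // out_size, w // out_size
    # Split rows and columns into blocks, then average within each block.
    return image.reshape(out_size, fh, out_size, fw, c).mean(axis=(1, 3))

rng = np.random.default_rng(0)
optical = rng.uniform(0.0, 1.0, size=(400, 400, 3))   # synthetic 3-channel tile
optical_mid = block_average(optical, 20)     # matches stage 1 (20x20)
optical_target = block_average(optical, 100) # matches stages 2 and 3 (100x100)
```

The 400×400 image is reduced by 20× per side for the intermediate scale and by 4× per side for the target scale, so each optical input aligns pixel for pixel with the terrain grid of the stage it guides.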

[0036] Furthermore, after the multi-scale optical images have been obtained by downsampling the high-resolution optical training images, the following training process can be executed repeatedly until the preset training termination condition is reached:

(1) Select a preset number of training data subsets from the training dataset, and use each selected training data subset as a training sample. All selected training data subsets together constitute the current training batch.

(2) For each training data subset in the current training batch, perform the following operations:
  - input the low-resolution digital surface training model and the intermediate-scale training image corresponding to the high-resolution optical training image in the training data subset into the first reconstruction stage to obtain intermediate-resolution terrain training data;
  - input the intermediate-resolution terrain training data and the target-scale training image corresponding to the high-resolution optical training image into the second reconstruction stage to obtain the initial target resolution digital terrain training model;
  - input the initial target resolution digital terrain training model and the target-scale training image into the third reconstruction stage to obtain the predicted digital terrain training model corresponding to the training data subset.

(3) Calculate the subset loss between the predicted digital terrain training model corresponding to each training data subset in the current training batch and the high-resolution digital terrain model sample in that training data subset, and determine the total batch loss of the current training batch based on the subset losses of all training data subsets in the batch.

(4) Adjust the parameters of the multi-stage progressive reconstruction network according to the total batch loss, and return to step (1) to select the next training batch and continue training.
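Steps (1) to (4) can be sketched as a toy loop. Everything concrete here is an assumption for illustration: `ToyNetwork` replaces the real stages with one learnable elevation offset per stage, the subset loss is taken to be mean squared error, the batch loss is the mean of the subset losses, the termination condition is a fixed step count, and the dataset is synthetic with a constant 3.0 cover height.

```python
import random
import numpy as np

def upsample(grid, factor):
    return np.kron(grid, np.ones((factor, factor)))

class ToyNetwork:
    """Stand-in for the three stages: one learnable offset per stage."""
    def __init__(self):
        self.offsets = np.zeros(3)

    def forward(self, dsm, optical_mid, optical_target):
        # The optical inputs mirror the data flow but are unused by this toy.
        mid = upsample(dsm, 5) + self.offsets[0]       # stage 1: 4 -> 20
        initial = upsample(mid, 5) + self.offsets[1]   # stage 2: 20 -> 100
        return initial + self.offsets[2]               # stage 3: refinement

def subset_loss(pred, target):
    return float(np.mean((pred - target) ** 2))        # assumed MSE loss

def train(network, dataset, batch_size=4, steps=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(steps):                             # assumed stop rule
        batch = rng.sample(dataset, batch_size)        # step (1)
        losses, grad = [], np.zeros(3)
        for dsm, opt_mid, opt_tgt, dtm in batch:       # step (2)
            pred = network.forward(dsm, opt_mid, opt_tgt)
            losses.append(subset_loss(pred, dtm))      # step (3)
            grad += 2.0 * np.mean(pred - dtm)          # dMSE/d(offset), same for all 3
        history.append(sum(losses) / len(losses))      # total batch loss (assumed mean)
        network.offsets -= lr * grad / len(batch)      # step (4)
    return history

# Synthetic data: the "true" terrain is the upsampled surface minus 3.0.
data_rng = np.random.default_rng(0)
dataset = []
for _ in range(16):
    dsm = data_rng.uniform(100.0, 200.0, size=(4, 4))
    dataset.append((dsm, np.zeros((20, 20, 3)), np.zeros((100, 100, 3)),
                    upsample(dsm, 25) - 3.0))

net = ToyNetwork()
history = train(net, dataset)
```

In this toy setting the three offsets jointly learn to subtract the synthetic cover height, so the batch loss falls toward zero; a real network would replace the offsets with learned, optical-guided residual modules and an automatic differentiation framework.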

[0037] Here, at the initial stage of each training round, a subset of training data can be randomly selected from the training dataset to serve as the current training batch. The specific selection method can be uniform random sampling or random sampling without replacement, for example, using a method of random shuffling followed by sequential sampling or stratified sampling based on class balance.
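The "random shuffling followed by sequential sampling" option can be sketched with the standard library alone. The helper name `epoch_batches` and the decision to keep a smaller final batch are assumptions for illustration.

```python
import random

def epoch_batches(num_subsets, batch_size, seed=None):
    """Shuffle subset indices once, then cut them into consecutive batches."""
    order = list(range(num_subsets))
    random.Random(seed).shuffle(order)
    # Sequential slicing after the shuffle gives sampling without replacement.
    return [order[i:i + batch_size] for i in range(0, num_subsets, batch_size)]

batches = epoch_batches(num_subsets=10, batch_size=4, seed=0)
```

Every training data subset appears in exactly one batch per pass over the dataset, and the last batch is smaller when the dataset size is not a multiple of the batch size.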

[0038] In some possible embodiments, during the training phase, to improve the generalization ability of the multi-stage progressive reconstruction network to different terrain orientations and local structural changes, consistent geometric augmentation can be performed on the high-resolution optical training images, downsampled intermediate-scale training images, target-scale training images, low-resolution digital surface training models, and corresponding high-resolution digital terrain model samples participating in the training. The geometric augmentation operations can include transformations such as horizontal flipping, vertical flipping, and 90-degree rotation, each applied independently to the current training sample with a preset probability. During geometric augmentation, the data of all modalities are transformed synchronously; that is, all training images, digital surface models, and digital terrain model samples undergo exactly the same flipping or rotation operation, so that the spatial correspondence between the high-resolution optical training images, multi-scale optical images, low-resolution digital surface training models, and supervision labels is not disrupted. Through this consistent geometric augmentation, the network is exposed to the same terrain in different orientations during training, which reduces its sensitivity to the orientation of the input data and improves its reconstruction stability when facing different terrain orientations in practical applications.
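The synchronized geometric transformation described above (a form of data augmentation) can be sketched as below. This is a minimal sketch assuming square arrays stored as NumPy arrays with height and width as the first two axes; the function name `augment_consistently` and the probability `p=0.5` are hypothetical, since the disclosure only speaks of "a preset probability".

```python
import numpy as np

def augment_consistently(arrays, rng, p=0.5):
    """Apply the same random flips and 90-degree rotation to every array.

    arrays: list of arrays shaped (H, W) or (H, W, C); sizes may differ
            between modalities (e.g. 4x4 DSM, 100x100 DTM label).
    The random decisions are drawn once and shared by all modalities,
    so their spatial correspondence is preserved.
    """
    do_hflip = rng.random() < p
    do_vflip = rng.random() < p
    do_rot90 = rng.random() < p
    out = []
    for a in arrays:
        if do_hflip:
            a = a[:, ::-1]                      # horizontal flip
        if do_vflip:
            a = a[::-1, :]                      # vertical flip
        if do_rot90:
            a = np.rot90(a, k=1, axes=(0, 1))   # 90-degree rotation
        out.append(np.ascontiguousarray(a))
    return out
```

Drawing the three decisions before the loop is the point of the design: calling a random transform separately per modality would break the pixel-to-pixel alignment between inputs and labels.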

[0039] Specifically, after multiple training data subsets constitute a current training batch, for each training data subset in the current training batch, the low-resolution digital surface training model and the intermediate-scale training image corresponding to the high-resolution optical training image can be input into the first reconstruction stage. After upsampling and residual refinement operations, intermediate-resolution terrain training data is obtained. As shown in Figure 2, the process of inputting data to the first reconstruction stage may include the following steps S201 to S205: S201, input the low-resolution digital surface training model into the first sampling module of the first reconstruction stage.

[0040] Here, the first reconstruction stage is the initial reconstruction stage of the entire network, used to upsample the low-resolution input to an intermediate resolution. The first reconstruction stage may include a first sampling module and a first residual module. Specifically, the first sampling module can be used to enlarge the spatial size of the low-resolution input to a target size preset by the first reconstruction stage (e.g., from 4×4 pixels to 20×20 pixels), and can output preliminary upsampled terrain data; the first residual module can predict terrain residuals from features fused with optical imagery to achieve detail compensation of the upsampled results.

[0041] The first sampling module may include a bilinear interpolation path and a transposed convolution residual compensation path. The bilinear interpolation path is used to provide a smooth and stable basic terrain trend and extract low-frequency terrain contours. The transposed convolution residual compensation path can recover high-frequency details through learnable convolution kernels and compensate for the edge information lost by bilinear interpolation.

[0042] S202, using the bilinear interpolation path to perform bilinear interpolation upsampling on the low-resolution digital surface training model, and outputting a first intermediate resolution upsampling result; and using the transposed convolution residual compensation path to perform a transposed convolution operation on the low-resolution digital surface training model, and outputting a second intermediate resolution upsampling result.

[0043] Furthermore, after the low-resolution digital surface training model is input simultaneously into the bilinear interpolation path and the transposed convolution residual compensation path, the bilinear interpolation path performs bilinear interpolation upsampling on it: each target pixel is computed as a distance-weighted average of the elevations of its four nearest neighboring pixels, and a first intermediate resolution upsampling result is output. This result has a smooth terrain surface free of jagged artifacts, but its edge details are relatively blurred. The transposed convolution residual compensation path performs a transposed convolution on the low-resolution digital surface training model: a learnable convolution kernel enlarges and fills the input feature map to output a second intermediate resolution upsampling result. This result can recover some high-frequency terrain features, but may contain checkerboard artifacts.

[0044] Among them, bilinear interpolation upsampling is a parameter-free deterministic method that can maintain the overall continuity of the terrain and is more conducive to training stability; transposed convolution operation is a learnable upsampling method that can adaptively learn high-frequency compensation patterns from the data.

[0045] S203, add the first intermediate resolution upsampling result to the second intermediate resolution upsampling result to obtain the preliminary upsampled intermediate resolution terrain data.

[0046] Here, after obtaining the first intermediate resolution upsampling result and the second intermediate resolution upsampling result, the two can be added pixel by pixel to fuse the smooth body of bilinear interpolation and the high-frequency compensation of transposed convolution, and determine the preliminary upsampled intermediate resolution terrain data. This data contains both stable terrain trends and enhanced edge details, and can provide a better initial value for subsequent residual refinement.

[0047] For example, the first sampling module can be represented as:
$$U_1 = \mathrm{Bilinear}(X_{LR}) + \mathrm{TConv}_1(X_{LR})$$
where $X_{LR}$ denotes the input low-resolution digital surface training model; $U_1$ denotes the preliminary upsampled intermediate resolution terrain data; $\mathrm{Bilinear}(\cdot)$ denotes the bilinear interpolation path; and $\mathrm{TConv}_1(\cdot)$ denotes the transposed convolution residual compensation path.
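The dual-path sampling module (bilinear interpolation plus a transposed convolution, added pixel by pixel) can be sketched in NumPy as below. This is a sketch under stated assumptions, not the disclosed implementation: the half-pixel sampling convention for the bilinear path and a transposed-convolution kernel whose size equals the stride (5×5 for the 4×4 to 20×20 case) are choices made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def bilinear_upsample(z, scale):
    """Parameter-free bilinear upsampling of a 2-D elevation grid."""
    h, w = z.shape
    # Output pixel centres expressed in input coordinates, clamped at the border.
    ys = np.clip((np.arange(h * scale) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * scale) + 0.5) / scale - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = z[y0][:, x0] * (1 - wx) + z[y0][:, x1] * wx
    bottom = z[y1][:, x0] * (1 - wx) + z[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def transposed_conv(z, kernel, stride):
    """Single-channel transposed convolution: each input pixel stamps a scaled kernel."""
    h, w = z.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += z[i, j] * kernel
    return out

def first_sampling_module(dsm_lr, kernel, scale=5):
    """Sum of the smooth bilinear path and the learnable compensation path."""
    return bilinear_upsample(dsm_lr, scale) + transposed_conv(dsm_lr, kernel, scale)
```

With an all-zero kernel the module reduces to plain bilinear interpolation, which is one way to see why the learnable path acts as a compensation on top of a stable baseline.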

[0048] S204, concatenate the intermediate-scale training image with the preliminary upsampled intermediate resolution terrain data to obtain a first fusion feature; and input the first fusion feature into the first residual module of the first reconstruction stage to obtain a first terrain residual.

[0049] Understandably, the intermediate-scale training image is a 20×20 pixel low-resolution optical image obtained through downsampling, which can provide texture and edge structure information that matches the current terrain scale. The preliminary upsampled intermediate resolution terrain data contains only elevation information and cannot by itself distinguish surface cover. Concatenating the intermediate-scale training image and the preliminary upsampled intermediate resolution terrain data along the channel dimension yields the first fusion feature. This feature integrates multi-source information, enabling the subsequent first residual module to use both elevation trends and optical features to better determine which elevation changes originate from real terrain and which originate from buildings or vegetation.

[0050] Furthermore, after the first fusion feature is obtained, it can be input into the first residual module of the first reconstruction stage. The stacked convolutional layers and activation functions in the first residual module extract deep features from it to obtain the first terrain residual. The first terrain residual is an estimate of the error between the current upsampling result and the real terrain, and usually takes the form of a compensation for high-frequency details.

[0051] For example, the first residual module may use a three-layer convolutional structure: the first layer can be a 9×9 convolution with a ReLU activation function to extract local structural features; the second layer can be a 1×1 convolution with a ReLU activation function for nonlinear mapping and channel compression; and the third layer can be a 5×5 convolution that outputs a single-channel residual. This structure maps the fused features to a residual map of the same size as the upsampled terrain, namely the first terrain residual, which compensates for the missing high-frequency components.

[0052] Here, the expression for the first residual module can be represented as:
$$F_1^{(1)} = \mathrm{ReLU}\big(\mathrm{Conv}_{9\times 9}(\mathrm{Concat}(I_{mid}, U_1))\big)$$
$$F_2^{(1)} = \mathrm{ReLU}\big(\mathrm{Conv}_{1\times 1}(F_1^{(1)})\big)$$
$$R_1 = \mathrm{Conv}_{5\times 5}(F_2^{(1)})$$
$$D_1 = U_1 + R_1$$
where $I_{mid}$ denotes the intermediate-scale training image; $U_1$ denotes the preliminary upsampled intermediate resolution terrain data; $\mathrm{Conv}_{9\times 9}$, $\mathrm{Conv}_{1\times 1}$, and $\mathrm{Conv}_{5\times 5}$ denote convolution operations using 9×9, 1×1, and 5×5 convolution kernels, respectively; $F_1^{(1)}$ denotes the feature map output by the first convolutional layer in the first residual module; $F_2^{(1)}$ denotes the feature map output by the second convolutional layer in the first residual module; $R_1$ denotes the first terrain residual; and $D_1$ denotes the intermediate resolution terrain training data.
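The three-layer residual module (9×9 convolution with ReLU, 1×1 convolution with ReLU, 5×5 convolution producing a single-channel residual that is added back to the upsampled terrain) can be sketched in NumPy as follows. The channel counts, the zero "same" padding, and the absence of bias terms are assumptions made for this sketch; the disclosure does not specify them, and the function names are hypothetical.

```python
import numpy as np

def conv2d_same(x, weight):
    """Cross-correlation with zero padding that preserves the spatial size.

    x: (C_in, H, W); weight: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = weight.shape
    pad = k // 2
    h, w = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, w))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + w]                     # (C_in, H, W)
            out += np.tensordot(weight[:, :, dy, dx], patch, axes=([1], [0]))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_module(optical, terrain_up, w1, w2, w3):
    """Fuse optical channels with upsampled terrain, predict a residual, add it back.

    optical: (C, H, W) optical image at the current scale.
    terrain_up: (H, W) preliminary upsampled terrain.
    w1, w2, w3: kernels of the 9x9, 1x1 and 5x5 layers.
    """
    fused = np.concatenate([optical, terrain_up[None]], axis=0)    # channel concatenation
    f1 = relu(conv2d_same(fused, w1))                               # local structure
    f2 = relu(conv2d_same(f1, w2))                                  # nonlinear mapping
    residual = conv2d_same(f2, w3)[0]                               # single-channel residual
    return terrain_up + residual, residual
```

Because the module only has to predict a correction to an already reasonable upsampling result, an untrained last layer with zero weights leaves the terrain unchanged.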

[0053] S205, Based on the first terrain residual and the initially upsampled intermediate resolution terrain data, determine the intermediate resolution terrain training data.

[0054] Here, since the first terrain residual represents the difference between the current terrain and the real terrain, adding the preliminary upsampled intermediate resolution terrain data and the first terrain residual pixel by pixel corrects the error in the upsampling result and yields the intermediate resolution terrain training data. Compared with the preliminary upsampling result, this data has more accurate terrain boundaries and local undulation details, which can provide more accurate initial values for the second reconstruction stage.

[0055] Understandably, after the intermediate-resolution terrain training data is obtained, it can be input together with the target-scale training image corresponding to the high-resolution optical training image into the second reconstruction stage. Similar to the first reconstruction stage, the second reconstruction stage also has a sampling module and a residual module, namely, a second sampling module and a second residual module. After the intermediate-resolution terrain training data is input into the second reconstruction stage, the following steps (a) to (d) can be performed:
(a) Perform bilinear interpolation upsampling on the intermediate resolution terrain training data using the bilinear interpolation path to output a first target resolution upsampling result; and perform a transposed convolution operation on the intermediate resolution terrain training data using the transposed convolution residual compensation path to output a second target resolution upsampling result.
(b) Add the first target resolution upsampling result to the second target resolution upsampling result to obtain the preliminary upsampled target resolution terrain data.
(c) Concatenate the target-scale training image with the preliminary upsampled target resolution terrain data to obtain a second fusion feature; and input the second fusion feature into the second residual module of the second reconstruction stage to obtain a second terrain residual.
(d) Based on the second terrain residual and the preliminary upsampled target resolution terrain data, determine the initial target resolution digital terrain training model.

[0056] Here, steps (a) to (d) of the second reconstruction stage follow the same operating principle as steps S202 to S205 of the first reconstruction stage; only the input data size and the target resolution differ. Specifically, the first stage upsamples from 4×4 to 20×20, and the second stage upsamples from 20×20 to 100×100. For details, please refer to the description of steps S202 to S205 above, which will not be repeated here. Compared with the intermediate resolution terrain training data, the preliminary upsampled target resolution terrain data has a higher spatial density and smoother area filling, and its macroscopic terrain outline is closer to the final output. The initial target resolution digital terrain training model is the high-resolution terrain obtained after the second-stage residual refinement; it can reflect the bare ground elevation at a resolution of 1 meter and can provide relatively accurate local slope and boundary information.

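[0057] For example, the second sampling module can be represented as:
$$U_2 = \mathrm{Bilinear}(D_1) + \mathrm{TConv}_2(D_1)$$
where $U_2$ denotes the preliminary upsampled target resolution terrain data output by the second sampling module; $D_1$ denotes the intermediate resolution terrain training data; $\mathrm{Bilinear}(\cdot)$ denotes the bilinear interpolation path; and $\mathrm{TConv}_2(\cdot)$ denotes the transposed convolution residual compensation path of the second sampling module.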

[0058] For example, the second residual module may have the same three-layer convolutional structure as the first residual module: its first layer can be a 9×9 convolution with a ReLU activation function, used to extract local structural features from the features fused with the target-scale training image; the second layer can be a 1×1 convolution with a ReLU activation function, used for nonlinear mapping and channel compression; and the third layer can be a 5×5 convolution that outputs a single-channel residual. This structure maps the fused features to a residual map of the same size as the upsampled terrain, namely the second terrain residual, which compensates for the missing high-frequency components.

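[0059] Here, the expression for the second residual module can be represented as:
$$F_1^{(2)} = \mathrm{ReLU}\big(\mathrm{Conv}_{9\times 9}(\mathrm{Concat}(I_{tgt}, U_2))\big)$$
$$F_2^{(2)} = \mathrm{ReLU}\big(\mathrm{Conv}_{1\times 1}(F_1^{(2)})\big)$$
$$R_2 = \mathrm{Conv}_{5\times 5}(F_2^{(2)})$$
$$D_2 = U_2 + R_2$$
where $I_{tgt}$ denotes the target-scale training image; $U_2$ denotes the preliminary upsampled target resolution terrain data; $F_1^{(2)}$ denotes the feature map output by the first convolutional layer in the second residual module; $F_2^{(2)}$ denotes the feature map output by the second convolutional layer in the second residual module; $R_2$ denotes the second terrain residual; and $D_2$ denotes the initial target resolution digital terrain training model.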

[0060] In some possible embodiments, the first sampling module of the first reconstruction stage and the second sampling module of the second reconstruction stage, as well as the first residual module of the first reconstruction stage and the second residual module of the second reconstruction stage, can adopt the same network structure settings. During training, the convolution parameters can be automatically adapted according to the scale of the input data. Different convolution kernel sizes or number of channels can also be used according to different upsampling factors. No specific limitation is made here.

[0061] Understandably, after the initial target resolution digital terrain training model is obtained through the second reconstruction stage, it can be input together with the target-scale training image into the third reconstruction stage. The third reconstruction stage includes only a residual module and does not repeat the upsampling operation. Its input is the third fusion feature obtained by concatenating the initial target resolution digital terrain training model and the target-scale training image, and its output is the predicted digital terrain training model. This predicted digital terrain training model is the final prediction result after refinement by the third residual module, which further corrects the remaining local errors.

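[0062] For example, the third residual module can be represented as:
$$F_1^{(3)} = \mathrm{ReLU}\big(\mathrm{Conv}_{9\times 9}(\mathrm{Concat}(I_{tgt}, D_2))\big)$$
$$F_2^{(3)} = \mathrm{ReLU}\big(\mathrm{Conv}_{1\times 1}(F_1^{(3)})\big)$$
$$R_3 = \mathrm{Conv}_{5\times 5}(F_2^{(3)})$$
$$\hat{Y} = D_2 + R_3$$
where $I_{tgt}$ denotes the target-scale training image; $D_2$ denotes the initial target resolution digital terrain training model; $F_1^{(3)}$ denotes the feature map output by the first convolutional layer in the third residual module; $F_2^{(3)}$ denotes the feature map output by the second convolutional layer in the third residual module; $R_3$ denotes the third terrain residual; and $\hat{Y}$ denotes the predicted digital terrain training model.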

[0063] Furthermore, after the forward propagation of a batch is completed, the network parameters can be updated by backpropagation based on the calculated total batch loss. The process then returns to step (1) with the updated network and is repeated until the preset training termination condition is met, for example when the validation set loss no longer decreases or the preset maximum number of rounds is reached, thus obtaining the trained multi-stage progressive reconstruction network.
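The two example termination conditions (validation loss no longer decreasing, or a maximum number of rounds reached) can be checked with a small helper such as the one below. This is a sketch only; the function name `should_stop` and the values `max_epochs=200` and `patience=10` are hypothetical and are not taken from the disclosure.

```python
def should_stop(val_losses, epoch, max_epochs=200, patience=10):
    """Return True when training should terminate.

    val_losses: validation losses recorded so far, one per completed round.
    epoch: zero-based index of the round that just finished.
    """
    if epoch + 1 >= max_epochs:            # preset maximum number of rounds reached
        return True
    if len(val_losses) <= patience:        # not enough history to judge a plateau
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    # Stop when the last `patience` rounds brought no improvement.
    return recent_best >= best_before
```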

[0064] In this disclosure, a multi-stage progressive reconstruction network is constructed to upsample the low-resolution digital surface model step by step and fuse multi-scale optical image features, avoiding the structural blurring caused by single-stage direct mapping. Dual-path upsampling, which combines the stability of bilinear interpolation with the detail recovery capability of transposed convolution, enhances edge preservation. At the same time, the residual modules reduce the learning difficulty and improve the accuracy of detail recovery, thereby achieving high-quality reconstruction of high-resolution digital terrain models.

[0065] Here, the total batch loss is obtained by summing or averaging the subset losses of all training data subsets in the current training batch. To balance the contributions of different loss terms to network optimization and prevent any one term from dominating the training process, this disclosure proposes constructing a joint loss function that includes a pixel-level hybrid loss, a slope consistency loss, and a total variation regularization term. An adjustable weight coefficient is assigned to each of the three terms to better adapt to reconstruction tasks with different terrain features and resolution requirements. As shown in Figure 3, calculating the subset loss of a single training data subset may include the following steps S301 to S304: S301, calculate the L1 loss between the predicted digital terrain training model and the high-resolution digital terrain model sample, and the Smooth L1 loss between the predicted digital terrain training model and the high-resolution digital terrain model sample, and add the L1 loss and the Smooth L1 loss according to a preset weighting coefficient to obtain the pixel-level hybrid loss.

[0066] Here, L1 loss refers to the average or sum of the per-pixel absolute errors between the predicted digital terrain training model and the high-resolution digital terrain model sample. It is a robust loss function that is relatively insensitive to outliers. Smooth L1 loss is a smooth loss function that behaves like L2 loss when the difference between the predicted and true values is small and like L1 loss when the difference is large, so it combines the robustness of L1 with the smoothness of L2. Calculating the L1 loss and the Smooth L1 loss and combining them with a preset weighting coefficient yields the pixel-level hybrid loss. This enhances fine-grained optimization for small errors while maintaining robustness to large errors. The preset weighting coefficient can be set according to the error distribution of the specific terrain data; for example, the coefficient α can be set to 0.7.

[0067] The pixel-level hybrid loss can be expressed as:
$$L_{pix} = \alpha \,\lVert \hat{Y} - Y \rVert_1 + (1 - \alpha)\,\mathrm{SmoothL1}(\hat{Y}, Y)$$
where $L_{pix}$ denotes the pixel-level hybrid loss; $\alpha$ denotes the preset weighting coefficient; $\hat{Y}$ denotes the predicted digital terrain training model; $Y$ denotes the high-resolution digital terrain model sample; $\lVert \hat{Y} - Y \rVert_1$ denotes the L1 loss between them; and $\mathrm{SmoothL1}(\hat{Y}, Y)$ denotes the Smooth L1 loss between the predicted digital terrain training model and the high-resolution digital terrain model sample.
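A minimal NumPy sketch of the pixel-level hybrid loss follows. It assumes that both terms are averaged over pixels, that the two terms are weighted by α and 1 − α, and that the Smooth L1 transition point is `beta=1.0`; none of these details is fixed by the text, and the function names are hypothetical.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute elevation error."""
    return np.mean(np.abs(pred - target))

def smooth_l1_loss(pred, target, beta=1.0):
    """Quadratic for errors below beta, linear above it."""
    d = np.abs(pred - target)
    return np.mean(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta))

def pixel_hybrid_loss(pred, target, alpha=0.7):
    """Weighted combination of L1 and Smooth L1 (alpha = 0.7 as in the example above)."""
    return alpha * l1_loss(pred, target) + (1 - alpha) * smooth_l1_loss(pred, target)
```

For a prediction with errors of 0.5 m and 3.0 m, the small error is penalized quadratically (0.125) and the large one linearly (2.5) in the Smooth L1 term, which is the behavior the paragraph above describes.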

[0068] S302, calculate the first absolute error between the gradient of the predicted digital terrain training model in the horizontal direction and the gradient of the high-resolution digital terrain model sample in the horizontal direction, calculate the second absolute error between the gradient of the predicted digital terrain training model in the vertical direction and the gradient of the high-resolution digital terrain model sample in the vertical direction, and add the first absolute error and the second absolute error to obtain the slope consistency loss.

[0069] Specifically, slope consistency loss refers to the gradient difference between the predicted digital terrain training model and the high-resolution digital terrain model samples in the horizontal and vertical directions. It can be used to represent the degree to which the predicted terrain matches the real terrain in terms of slope variation and edge localization. By calculating the absolute error between the gradient of the predicted digital terrain training model and the gradient of the high-resolution digital terrain model samples, the network can be forced to learn the local slope characteristics of the real terrain to determine more accurate terrain structure boundaries.

[0070] Here, the slope consistency loss can be expressed as:
$$L_{slope} = \lVert \nabla_x \hat{Y} - \nabla_x Y \rVert_1 + \lVert \nabla_y \hat{Y} - \nabla_y Y \rVert_1$$
The gradient in the horizontal direction is defined as:
$$\nabla_x Z(i, j) = Z(i, j+1) - Z(i, j)$$
The gradient in the vertical direction is defined as:
$$\nabla_y Z(i, j) = Z(i+1, j) - Z(i, j)$$
For boundary pixels, the definition is:
$$\nabla_x Z(i, W) = 0, \qquad \nabla_y Z(H, j) = 0$$
where $L_{slope}$ denotes the slope consistency loss; $\nabla_x \hat{Y}$ denotes the gradient of the predicted digital terrain training model in the horizontal direction; $\nabla_x Y$ denotes the gradient of the high-resolution digital terrain model sample in the horizontal direction; $\nabla_y \hat{Y}$ denotes the gradient of the predicted digital terrain training model in the vertical direction; $\nabla_y Y$ denotes the gradient of the high-resolution digital terrain model sample in the vertical direction; $\nabla_x Z(i, j)$ denotes the horizontal gradient value of the digital terrain model $Z$ at location $(i, j)$, where $Z$ represents either the predicted digital terrain training model or the high-resolution digital terrain model sample; $\nabla_y Z(i, j)$ denotes the vertical gradient value of the digital terrain model $Z$ at location $(i, j)$; $W$ denotes the pixel width of the digital terrain model $Z$ in the horizontal direction; and $H$ denotes the pixel height of the digital terrain model $Z$ in the vertical direction.
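The slope consistency loss can be sketched in NumPy as follows. The sketch assumes forward differences with the gradient set to zero in the last column and last row, and a mean (rather than a sum) over pixels; these are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def gradients(z):
    """Forward-difference gradients of an (H, W) elevation grid.

    The last column of gx and the last row of gy stay zero (boundary pixels).
    """
    gx = np.zeros(z.shape, dtype=float)
    gy = np.zeros(z.shape, dtype=float)
    gx[:, :-1] = z[:, 1:] - z[:, :-1]    # horizontal direction
    gy[:-1, :] = z[1:, :] - z[:-1, :]    # vertical direction
    return gx, gy

def slope_consistency_loss(pred, target):
    """Mean absolute difference of horizontal gradients plus that of vertical gradients."""
    pgx, pgy = gradients(pred)
    tgx, tgy = gradients(target)
    return np.mean(np.abs(pgx - tgx)) + np.mean(np.abs(pgy - tgy))
```

A constant elevation offset leaves this loss at zero, which shows that it constrains terrain shape rather than absolute height; absolute height is handled by the pixel-level term.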

[0071] S303, in the predicted digital terrain training model, calculate the average absolute value of the elevation difference between horizontally adjacent pixels and the average absolute value of the elevation difference between vertically adjacent pixels, and add the two averages to obtain the total variation regularization term.

[0072] Here, the total variation regularization term refers to the sum of the average absolute values of the differences between adjacent pixels in the horizontal and vertical directions in the predicted digital terrain training model. It is a regularization index that measures the smoothness of the terrain surface. Calculating and summing these averages penalizes local oscillations in the predicted terrain, thereby suppressing unnecessary noise and checkerboard artifacts and making the reconstruction results more natural and continuous.

[0073] The total variation regularization term can be expressed as:
$$L_{TV} = \mathrm{mean}\big(\lvert \hat{Y}(x+1, y) - \hat{Y}(x, y) \rvert\big) + \mathrm{mean}\big(\lvert \hat{Y}(x, y+1) - \hat{Y}(x, y) \rvert\big)$$
where $L_{TV}$ denotes the total variation regularization term; $\hat{Y}(x+1, y)$ denotes the pixel value at the horizontally adjacent position $(x+1, y)$ of position $(x, y)$ in the predicted digital terrain training model; $\hat{Y}(x, y)$ denotes the pixel value at position $(x, y)$ in the predicted digital terrain training model; the first $\mathrm{mean}(\cdot)$ term denotes the average of the absolute values of the elevation differences between all horizontally adjacent pixel pairs in the predicted digital terrain training model; $\hat{Y}(x, y+1)$ denotes the pixel value at the vertically adjacent position $(x, y+1)$ of position $(x, y)$; and the second $\mathrm{mean}(\cdot)$ term denotes the average of the absolute values of the elevation differences between all vertically adjacent pixel pairs in the predicted digital terrain training model.
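The total variation regularization term translates almost directly into NumPy. The sketch below assumes the predicted terrain is stored as an array indexed `z[y, x]`, so horizontal neighbors lie along the second axis; the function name is hypothetical.

```python
import numpy as np

def total_variation(z):
    """Mean absolute difference of horizontal neighbours plus that of vertical neighbours."""
    horizontal = np.mean(np.abs(z[:, 1:] - z[:, :-1]))
    vertical = np.mean(np.abs(z[1:, :] - z[:-1, :]))
    return horizontal + vertical
```

A flat surface gives 0, while a 0/1 checkerboard pattern gives 2 (every neighboring pair differs by 1 in both directions), which is why this term discourages checkerboard artifacts.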

[0074] S304, the pixel-level hybrid loss, the slope consistency loss, and the total variation regularization term are weighted and summed according to preset weights to obtain the subset loss.

[0075] Furthermore, after the pixel-level hybrid loss, slope consistency loss, and total variation regularization term are calculated, the three terms can be weighted according to preset weights, and the weighted results can be summed to obtain the subset loss of a single training data subset. The preset weights can be set to values such as 1, 0.1, and 0.01, and the specific weight coefficients can be adjusted according to application requirements such as terrain complexity, noise level, or output resolution; no specific limitations are imposed here.

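[0076] Here, the total loss function, which is a weighted sum of all losses, can be expressed as:
$$L_{total} = L_{pix} + \lambda_1 L_{slope} + \lambda_2 L_{TV}$$
where $L_{pix}$ denotes the pixel-level hybrid loss; $L_{slope}$ denotes the slope consistency loss; $L_{TV}$ denotes the total variation regularization term; $\lambda_1$ denotes the weight of the slope consistency loss; and $\lambda_2$ denotes the weight of the total variation regularization term.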
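The weighted sum and the aggregation into a batch loss can be sketched as below. The default weights 0.1 and 0.01 follow the example values mentioned above, on the assumption that they apply to the slope and total variation terms respectively while the pixel-level term has weight 1; the function names are hypothetical.

```python
def subset_loss(pixel, slope, tv, w_slope=0.1, w_tv=0.01):
    """Joint loss of one training data subset from its three precomputed terms."""
    return pixel + w_slope * slope + w_tv * tv

def batch_loss(subset_losses, reduction="mean"):
    """Total batch loss: mean or sum of the subset losses in the current batch."""
    total = sum(subset_losses)
    if reduction == "mean":
        return total / len(subset_losses)
    return total

# Example: pixel = 1.0, slope = 2.0, tv = 3.0 gives 1.0 + 0.2 + 0.03.
example = subset_loss(1.0, 2.0, 3.0)
```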

[0077] In this embodiment, a joint loss function comprising the pixel-level hybrid loss, the slope consistency loss, and the total variation regularization term is constructed. This function constrains the learning process of the network along three dimensions: pixel accuracy, terrain slope structure, and surface smoothness. This achieves high-quality reconstruction of high-resolution digital terrain models and improves the reconstruction results in terms of terrain edge preservation, natural slope transition, and noise suppression.

[0078] In some other embodiments, during the evaluation of the multi-stage progressive reconstruction network, indicators such as mean absolute error, root mean square error, bias, coefficient of determination, and correlation coefficient can be calculated based on the difference between the predicted digital terrain training model and the high-resolution digital terrain model samples, so as to objectively evaluate the reconstruction performance of the multi-stage progressive reconstruction network.
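The listed evaluation indicators can be computed as in the following NumPy sketch. It uses the standard definitions of each metric, with bias taken as the mean signed error (prediction minus reference); the function name `evaluate` and the dictionary keys are hypothetical.

```python
import numpy as np

def evaluate(pred, ref):
    """Standard accuracy metrics between a predicted DTM and a reference DTM."""
    p = np.asarray(pred, dtype=float).ravel()
    r = np.asarray(ref, dtype=float).ravel()
    err = p - r
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((r - r.mean()) ** 2)
    return {
        "mae": np.mean(np.abs(err)),            # mean absolute error
        "rmse": np.sqrt(np.mean(err ** 2)),     # root mean square error
        "bias": np.mean(err),                   # mean signed error
        "r2": 1.0 - ss_res / ss_tot,            # coefficient of determination
        "pearson_r": np.corrcoef(p, r)[0, 1],   # correlation coefficient
    }

# Example: a prediction that is exactly 1 m too high everywhere.
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0])
```

In this example the correlation coefficient is 1 even though every elevation is off by 1 m, which illustrates why bias and error magnitudes are reported alongside correlation.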

[0079] In some possible embodiments, in order to adapt to the application requirements of different input resolutions or output accuracies, the number of reconstruction stages of the multi-stage progressive reconstruction network can be increased or decreased according to the actual upsampling ratio. For example, a cascaded structure of four or more stages can be adopted. As long as the internal working mechanism of each reconstruction stage follows the principles of progressive upsampling, optical image fusion, and residual refinement proposed in this disclosure, such variants fall within the protection scope of this disclosure.

[0080] S103, acquire high-resolution optical images and low-resolution digital surface models of the target area to be reconstructed, and input the high-resolution optical images and low-resolution digital surface models into the trained multi-stage progressive reconstruction network. Through the multiple cascaded reconstruction stages, the high-resolution optical images and low-resolution digital surface models are upsampled and the residuals are refined step by step to obtain a target resolution digital terrain model corresponding to the target area to be reconstructed.

[0081] Understandably, in the practical application of a trained multi-stage progressive reconstruction network, a high-resolution optical image of the target area to be reconstructed and a corresponding low-resolution digital surface model can be acquired first. Further, the high-resolution optical image and the low-resolution digital surface model are input into the trained multi-stage progressive reconstruction network. Internally, the network first downsamples the high-resolution optical image to generate an intermediate-scale image (20×20 pixels) matching the spatial scale of the first reconstruction stage, and a target-scale image (100×100 pixels) matching the spatial scale of the second and third reconstruction stages. Then, according to the parameters determined during training, the network performs dual-path upsampling and residual refinement on the low-resolution digital surface model in the first reconstruction stage, outputting intermediate-resolution terrain data (20×20 pixels); the second reconstruction stage performs the same dual-path upsampling and residual refinement on the intermediate-resolution terrain data, outputting an initial target-resolution digital terrain model (100×100 pixels); the third reconstruction stage fuses the initial target-resolution digital terrain model with the target-scale image and performs residual refinement again, outputting the final target-resolution digital terrain model (100×100 pixels).
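The inference flow (downsample the optical image to the two working scales, then pass the terrain through the three cascaded stages) can be sketched as a thin orchestration function. The sketch assumes block averaging as the downsampling method and treats the three trained stages as opaque callables `stage1`, `stage2`, and `stage3`; the function names and the 200×200 optical image used in the usage note are illustrative assumptions.

```python
import numpy as np

def block_mean_downsample(img, out_size):
    """Downsample a (C, H, W) image to (C, out_size, out_size) by block averaging.

    H and W must be integer multiples of out_size.
    """
    c, h, w = img.shape
    fh, fw = h // out_size, w // out_size
    return img.reshape(c, out_size, fh, out_size, fw).mean(axis=(2, 4))

def reconstruct(optical_hr, dsm_lr, stage1, stage2, stage3):
    """Run the three cascaded stages on one area to be reconstructed.

    Each stage is a callable (terrain, optical_image) -> terrain.
    """
    img_mid = block_mean_downsample(optical_hr, 20)    # intermediate-scale image
    img_tgt = block_mean_downsample(optical_hr, 100)   # target-scale image
    dtm_mid = stage1(dsm_lr, img_mid)                  # 4x4 -> 20x20
    dtm_init = stage2(dtm_mid, img_tgt)                # 20x20 -> 100x100
    return stage3(dtm_init, img_tgt)                   # 100x100 residual refinement
```

For a quick shape check without trained weights, placeholder stages such as `lambda z, img: np.kron(z, np.ones((5, 5)))` for the two upsampling stages and `lambda z, img: z` for the third can be passed in together with a `(3, 200, 200)` optical array and a `(4, 4)` surface model; the result then has shape `(100, 100)`.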

[0082] In this way, through the above-mentioned step-by-step upsampling and residual refinement processing, accurate reconstruction from low-resolution digital surface model to high-resolution digital terrain model is achieved. The resulting digital terrain model has high spatial resolution and eliminates elevation interference from non-ground objects such as buildings and vegetation, and can be directly used for applications such as terrain analysis, hydrological simulation or urban planning.

[0083] The digital terrain model reconstruction method, apparatus, storage medium, and computer equipment provided in this disclosure, by constructing a multi-stage progressive reconstruction network and utilizing high-resolution optical images and low-resolution digital surface models for progressive upsampling and residual refinement, can effectively fuse optical texture information and surface model elevation information. This preserves local terrain details while suppressing the cumulative propagation of reconstruction errors. Thus, it improves the reconstruction accuracy and fidelity of terrain details, achieving stable enhancement of terrain structure at multiple resolutions and high-quality generation of high-resolution digital terrain models.

[0084] Those skilled in the art will understand that, in the method of the specific embodiments described above, the order in which the steps are written does not imply a strict execution order and does not constitute any limitation on the implementation process. The specific execution order of the steps should be determined by their function and possible internal logic.

[0085] Based on the same inventive concept, this disclosure also provides a digital terrain model reconstruction device corresponding to the digital terrain model reconstruction method. Since the principle of the device in this disclosure for solving the problem is similar to the digital terrain model reconstruction method described above in this disclosure, the implementation of the device can refer to the implementation of the method, and the repeated parts will not be described again.

[0086] Referring to Figure 4, which is a schematic diagram of a digital terrain model reconstruction device 400 provided in an embodiment of this disclosure, the device includes: a data acquisition module 401, used to acquire a training dataset, wherein the training dataset includes multiple training data subsets, and each of the training data subsets includes a high-resolution optical training image, a low-resolution digital surface training model, and a high-resolution digital terrain model sample corresponding to the high-resolution optical training image and the low-resolution digital surface training model; a network training module 402, used to construct a multi-stage progressive reconstruction network and train the multi-stage progressive reconstruction network using the training dataset to obtain a trained multi-stage progressive reconstruction network, wherein the multi-stage progressive reconstruction network includes multiple cascaded reconstruction stages; and a network processing module 403, used to acquire high-resolution optical images and low-resolution digital surface models of the target area to be reconstructed, and input the high-resolution optical images and low-resolution digital surface models into the trained multi-stage progressive reconstruction network, so that, through the multiple cascaded reconstruction stages, the high-resolution optical images and low-resolution digital surface models are upsampled and the residuals are refined step by step to obtain a target resolution digital terrain model corresponding to the target area to be reconstructed.

[0087] In some possible embodiments, the network training module 402 is specifically used to: construct the multi-stage progressive reconstruction network to be trained, wherein the multi-stage progressive reconstruction network includes a cascaded first reconstruction stage, second reconstruction stage, and third reconstruction stage; the first reconstruction stage is used to upsample the input information to an intermediate resolution and perform residual refinement, the second reconstruction stage is used to upsample the output of the first reconstruction stage to the target resolution and perform residual refinement, and the third reconstruction stage is used to perform residual refinement on the output of the second reconstruction stage to correct local errors; for each training data subset, construct, based on the high-resolution optical training image, an intermediate-scale training image matching the spatial scale of the first reconstruction stage and a target-scale training image matching the spatial scale of the second reconstruction stage; and repeat the following training process until a preset training termination condition is met:
Step 1: Select a preset number of training data subsets from the training dataset, each selected training data subset serving as one training sample; all selected training data subsets together constitute the current training batch.
Step 2: For each training data subset in the current training batch, perform the following operations: input the low-resolution digital surface training model in the training data subset and the intermediate-scale training image corresponding to the high-resolution optical training image into the first reconstruction stage to obtain intermediate-resolution terrain training data; input the intermediate-resolution terrain training data and the target-scale training image corresponding to the high-resolution optical training image into the second reconstruction stage to obtain an initial target-resolution digital terrain training model; and input the initial target-resolution digital terrain training model and the target-scale training image into the third reconstruction stage to obtain the predicted digital terrain training model corresponding to the training data subset.
Step 3: Calculate the subset loss between the predicted digital terrain training model corresponding to each training data subset in the current training batch and the high-resolution digital terrain model sample in that training data subset, and determine the total batch loss of the current training batch based on the subset losses of all training data subsets in the current training batch.
Step 4: Adjust the parameters of the multi-stage progressive reconstruction network according to the total batch loss, and return to Step 1 to select the next training batch and continue training.
When the preset training termination condition is met, training stops and the trained multi-stage progressive reconstruction network is obtained.
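The data flow of Steps 1 to 3 can be illustrated with a minimal, dependency-free sketch. It is not the disclosed network: the three stage functions are hypothetical stand-ins (nearest-neighbour upsampling and an identity refinement), the factor of 2 per upsampling stage, the average pooling used to build the intermediate-scale image, the mean absolute error used as the subset loss, and the averaging of subset losses into the batch loss are all assumptions made for illustration. Step 4 (the parameter update) is omitted because it requires an automatic differentiation framework.

```python
import random

def avg_pool(img, k):
    """Downsample a 2-D grid by averaging non-overlapping k x k blocks."""
    h, w = len(img) // k, len(img[0]) // k
    return [[sum(img[i * k + a][j * k + b] for a in range(k) for b in range(k)) / (k * k)
             for j in range(w)] for i in range(h)]

def nearest_up(grid, k):
    """Placeholder upsampler: repeat every cell k times in both directions."""
    return [[v for v in row for _ in range(k)] for row in grid for _ in range(k)]

# Hypothetical stand-ins for the three cascaded reconstruction stages.
def stage1(dsm_lr, img_mid):      # low resolution -> intermediate resolution
    return nearest_up(dsm_lr, 2)

def stage2(dtm_mid, img_tgt):     # intermediate resolution -> target resolution
    return nearest_up(dtm_mid, 2)

def stage3(dtm_init, img_tgt):    # refinement at target resolution (identity here)
    return dtm_init

def forward(sample):
    """Step 2: run one training data subset through the cascaded stages."""
    img_tgt = sample["optical_hr"]            # target-scale training image
    img_mid = avg_pool(img_tgt, 2)            # intermediate-scale training image (assumed pooling)
    dtm_mid = stage1(sample["dsm_lr"], img_mid)
    dtm_init = stage2(dtm_mid, img_tgt)
    return stage3(dtm_init, img_tgt)

def subset_loss(pred, target):
    """Placeholder subset loss: mean absolute elevation error."""
    n = len(pred) * len(pred[0])
    return sum(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n

def batch_loss(dataset, batch_size, rng):
    batch = rng.sample(dataset, batch_size)                          # Step 1
    losses = [subset_loss(forward(s), s["dtm_hr"]) for s in batch]   # Steps 2 and 3
    return sum(losses) / len(losses)                                 # assumed: mean of subset losses

dsm = [[1.0, 2.0], [3.0, 4.0]]
sample = {"dsm_lr": dsm,
          "optical_hr": [[0.0] * 8 for _ in range(8)],
          "dtm_hr": nearest_up(dsm, 4)}
dataset = [sample] * 4
print(batch_loss(dataset, 2, random.Random(0)))
```

In this toy setup the reference terrain is constructed to equal the stand-in prediction, so the batch loss is zero; with learned stages the loss would drive the parameter update of Step 4.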

[0088] In some possible embodiments, the network training module 402 is specifically used to: input the low-resolution digital surface training model into the first sampling module of the first reconstruction stage, wherein the first sampling module includes a bilinear interpolation path and a transposed convolution residual compensation path; perform bilinear interpolation upsampling on the low-resolution digital surface training model using the bilinear interpolation path to output a first intermediate-resolution upsampling result, and perform a transposed convolution operation on the low-resolution digital surface training model using the transposed convolution residual compensation path to output a second intermediate-resolution upsampling result; add the first intermediate-resolution upsampling result to the second intermediate-resolution upsampling result to obtain preliminarily upsampled intermediate-resolution terrain data; concatenate the intermediate-scale training image with the preliminarily upsampled intermediate-resolution terrain data to obtain a first fusion feature, and input the first fusion feature into the first residual module of the first reconstruction stage to obtain a first terrain residual; and determine the intermediate-resolution terrain training data based on the first terrain residual and the preliminarily upsampled intermediate-resolution terrain data.
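The sampling module described above, with its two parallel paths, can be sketched in plain Python as follows. This is an illustrative sketch only: the upsampling factor of 2, the half-pixel sampling convention of the bilinear path, the 2 x 2 kernel with stride 2 in the transposed convolution path, and the kernel values are assumptions, and all function names are hypothetical. A real implementation would use a tensor library with learned kernel weights.

```python
def bilinear_up(grid, scale=2):
    """Bilinear interpolation upsampling of a 2-D grid (half-pixel centres, clamped edges)."""
    h, w = len(grid), len(grid[0])
    out = []
    for oi in range(h * scale):
        y = min(max((oi + 0.5) / scale - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for oj in range(w * scale):
            x = min(max((oj + 0.5) / scale - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out

def transposed_conv(grid, kernel, stride=2):
    """Single-channel transposed convolution: each input cell scatters a scaled kernel copy."""
    h, w, k = len(grid), len(grid[0]), len(kernel)
    oh, ow = (h - 1) * stride + k, (w - 1) * stride + k
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(h):
        for j in range(w):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += grid[i][j] * kernel[a][b]
    return out

def dual_path_upsample(grid, kernel, scale=2):
    """Sum of the bilinear path and the transposed convolution residual compensation path."""
    assert len(kernel) == scale, "kernel size must equal the scale so both paths match in size"
    smooth = bilinear_up(grid, scale)
    detail = transposed_conv(grid, kernel, stride=scale)
    return [[s + d for s, d in zip(sr, dr)] for sr, dr in zip(smooth, detail)]

dsm = [[1.0, 2.0], [3.0, 4.0]]
kernel = [[0.1, 0.0], [0.0, 0.0]]   # hypothetical stand-in for learned weights
coarse = dual_path_upsample(dsm, kernel)
print(len(coarse), len(coarse[0]))
```

The bilinear path supplies a smooth baseline, while the transposed convolution path can learn a correction on top of it; with an all-zero kernel the module reduces to plain bilinear interpolation.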

[0089] In some possible embodiments, the network training module 402 is specifically used to: input the intermediate-resolution terrain training data into the second sampling module of the second reconstruction stage, wherein the second sampling module includes a bilinear interpolation path and a transposed convolution residual compensation path; perform bilinear interpolation upsampling on the intermediate-resolution terrain training data using the bilinear interpolation path to output a first target-resolution upsampling result, and perform a transposed convolution operation on the intermediate-resolution terrain training data using the transposed convolution residual compensation path to output a second target-resolution upsampling result; add the first target-resolution upsampling result to the second target-resolution upsampling result to obtain preliminarily upsampled target-resolution terrain data; concatenate the target-scale training image with the preliminarily upsampled target-resolution terrain data to obtain a second fusion feature, and input the second fusion feature into the second residual module of the second reconstruction stage to obtain a second terrain residual; and determine the initial target-resolution digital terrain training model based on the second terrain residual and the preliminarily upsampled target-resolution terrain data.

[0090] In some possible embodiments, the network training module 402 is specifically used to: concatenate the initial target-resolution digital terrain training model with the target-scale training image to obtain a third fusion feature; input the third fusion feature into the third residual module of the third reconstruction stage to output a third terrain residual; and obtain the predicted digital terrain training model based on the third terrain residual and the initial target-resolution digital terrain training model.
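The fusion and residual refinement pattern shared by the reconstruction stages can be sketched as follows. This is a simplified illustration under stated assumptions: the residual module is replaced by a hypothetical 1 x 1 convolution (one weight per channel), the optical image is treated as a single channel, and the refined terrain is assumed to be the element-wise sum of the input terrain and the predicted residual. The disclosure does not specify the internal structure of the residual modules.

```python
def concat_channels(*grids):
    """Stack equally sized 2-D grids into a channel-first list of shape (C, H, W)."""
    h, w = len(grids[0]), len(grids[0][0])
    for g in grids:
        assert len(g) == h and all(len(r) == w for r in g), "all channels must share H x W"
    return list(grids)

def conv1x1(fused, weights, bias=0.0):
    """Hypothetical residual module: a per-pixel weighted sum over channels."""
    h, w = len(fused[0]), len(fused[0][0])
    return [[bias + sum(wc * ch[i][j] for wc, ch in zip(weights, fused))
             for j in range(w)] for i in range(h)]

def refine(dtm_init, image, weights, bias=0.0):
    """Concatenate terrain and image, predict a residual, and add it to the terrain."""
    fused = concat_channels(dtm_init, image)       # fusion feature
    residual = conv1x1(fused, weights, bias)       # terrain residual
    return [[d + r for d, r in zip(drow, rrow)]    # assumed: refined = terrain + residual
            for drow, rrow in zip(dtm_init, residual)]

dtm_init = [[10.0, 12.0], [14.0, 16.0]]
image = [[1.0, 0.0], [0.0, 1.0]]                   # e.g. a mask-like optical response
refined = refine(dtm_init, image, weights=[0.0, -0.5])
print(refined)
```

With these illustrative weights the elevation is lowered by 0.5 only where the image channel responds, which mirrors the intent of correcting local errors while leaving the rest of the terrain unchanged.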

[0091] In some possible embodiments, the network training module 402 is specifically used to: calculate the L1 loss and the Smooth L1 loss between the predicted digital terrain training model and the high-resolution digital terrain model sample, and add the L1 loss and the Smooth L1 loss according to a preset weighting coefficient to obtain the pixel-level mixing loss; calculate a first absolute error between the horizontal gradient of the predicted digital terrain training model and the horizontal gradient of the high-resolution digital terrain model sample, calculate a second absolute error between the vertical gradient of the predicted digital terrain training model and the vertical gradient of the high-resolution digital terrain model sample, and add the first absolute error and the second absolute error to obtain the slope consistency loss; calculate, in the predicted digital terrain training model, the average absolute value of the elevation difference between horizontally adjacent pixels and the average absolute value of the elevation difference between vertically adjacent pixels, and add the two averages to obtain the total variation regularization term; and obtain the subset loss as a weighted sum of the pixel-level mixing loss, the slope consistency loss, and the total variation regularization term according to preset weights.
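The composition of the subset loss described above can be sketched in plain Python. The following is an illustration, not the disclosed implementation: the convex combination `alpha * L1 + (1 - alpha) * SmoothL1`, the Smooth L1 threshold `beta`, the forward-difference gradients with a zero value at the last column and last row, the averaging over pixels, and all default weights are assumptions chosen for the example.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def l1_loss(pred, target):
    """Mean absolute difference between two equally sized grids."""
    return _mean(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr))

def smooth_l1_loss(pred, target, beta=1.0):
    """Smooth L1: quadratic below beta, linear above it (beta is an assumed threshold)."""
    def f(d):
        d = abs(d)
        return 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return _mean(f(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr))

def grad_x(z):
    """Horizontal forward difference; the last column is set to 0 (assumed boundary rule)."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] + [0.0] for row in z]

def grad_y(z):
    """Vertical forward difference; the last row is set to 0 (assumed boundary rule)."""
    w = len(z[0])
    return [[z[i + 1][j] - z[i][j] for j in range(w)] for i in range(len(z) - 1)] + [[0.0] * w]

def slope_loss(pred, target):
    """Sum of the horizontal and vertical gradient absolute errors."""
    return l1_loss(grad_x(pred), grad_x(target)) + l1_loss(grad_y(pred), grad_y(target))

def tv_term(pred):
    """Mean |difference| over horizontal neighbour pairs plus the same over vertical pairs."""
    horiz = _mean(abs(row[j + 1] - row[j]) for row in pred for j in range(len(row) - 1))
    vert = _mean(abs(pred[i + 1][j] - pred[i][j])
                 for i in range(len(pred) - 1) for j in range(len(pred[0])))
    return horiz + vert

def subset_loss(pred, target, alpha=0.5, w_pix=1.0, w_slope=0.5, w_tv=0.01):
    """Weighted sum of the three terms; all weights here are hypothetical."""
    pix = alpha * l1_loss(pred, target) + (1.0 - alpha) * smooth_l1_loss(pred, target)
    return w_pix * pix + w_slope * slope_loss(pred, target) + w_tv * tv_term(pred)

pred = [[0.0, 1.0], [0.0, 1.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
print(subset_loss(pred, target))
```

The pixel-level term penalizes elevation error directly, the slope term penalizes mismatched terrain gradients, and the total variation term discourages spurious high-frequency noise in the prediction, so the three weights trade accuracy against smoothness.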

[0092] In some possible embodiments, the pixel-level mixing loss is expressed as:
$$\mathcal{L}_{pix} = \alpha \,\lVert \hat{Y} - Y \rVert_1 + (1 - \alpha)\, \mathcal{L}_{SmoothL1}(\hat{Y}, Y)$$
where $\mathcal{L}_{pix}$ denotes the pixel-level mixing loss; $\alpha$ denotes the preset weighting coefficient; $\hat{Y}$ denotes the predicted digital terrain training model; $Y$ denotes the high-resolution digital terrain model sample; and $\mathcal{L}_{SmoothL1}(\hat{Y}, Y)$ denotes the Smooth L1 loss between the predicted digital terrain training model and the high-resolution digital terrain model sample.
The slope consistency loss is expressed as:
$$\mathcal{L}_{slope} = \lVert G_x(\hat{Y}) - G_x(Y) \rVert_1 + \lVert G_y(\hat{Y}) - G_y(Y) \rVert_1$$
The gradient in the horizontal direction is defined as:
$$G_x(Z)(i, j) = Z(i+1, j) - Z(i, j)$$
The gradient in the vertical direction is defined as:
$$G_y(Z)(i, j) = Z(i, j+1) - Z(i, j)$$
For boundary pixels, the definition is:
$$G_x(Z)(W, j) = 0, \qquad G_y(Z)(i, H) = 0$$
where $\mathcal{L}_{slope}$ denotes the slope consistency loss; $G_x(\hat{Y})$ denotes the horizontal gradient of the predicted digital terrain training model; $G_x(Y)$ denotes the horizontal gradient of the high-resolution digital terrain model sample; $G_y(\hat{Y})$ denotes the vertical gradient of the predicted digital terrain training model; $G_y(Y)$ denotes the vertical gradient of the high-resolution digital terrain model sample; $G_x(Z)(i, j)$ denotes the horizontal gradient value of the digital terrain model $Z$ at position $(i, j)$, where $Z$ stands for either the predicted digital terrain training model or the high-resolution digital terrain model sample; $G_y(Z)(i, j)$ denotes the vertical gradient value of the digital terrain model $Z$ at position $(i, j)$; $W$ denotes the pixel width of the digital terrain model $Z$ in the horizontal direction; and $H$ denotes the pixel height of the digital terrain model $Z$ in the vertical direction.
The total variation regularization term is expressed as:
$$\mathcal{L}_{TV} = \operatorname{mean}\big(\lvert \hat{Y}(x+1, y) - \hat{Y}(x, y) \rvert\big) + \operatorname{mean}\big(\lvert \hat{Y}(x, y+1) - \hat{Y}(x, y) \rvert\big)$$
where $\mathcal{L}_{TV}$ denotes the total variation regularization term; $\hat{Y}(x+1, y)$ denotes the pixel value at the position $(x+1, y)$ horizontally adjacent to position $(x, y)$ in the predicted digital terrain training model; $\hat{Y}(x, y)$ denotes the pixel value at position $(x, y)$ in the predicted digital terrain training model; $\operatorname{mean}(\lvert \hat{Y}(x+1, y) - \hat{Y}(x, y) \rvert)$ denotes the average of the absolute elevation differences over all horizontally adjacent pixel pairs in the predicted digital terrain training model; $\hat{Y}(x, y+1)$ denotes the pixel value at the position $(x, y+1)$ vertically adjacent to position $(x, y)$ in the predicted digital terrain training model; and $\operatorname{mean}(\lvert \hat{Y}(x, y+1) - \hat{Y}(x, y) \rvert)$ denotes the average of the absolute elevation differences over all vertically adjacent pixel pairs in the predicted digital terrain training model.

[0093] Based on the same technical concept, this disclosure also provides a computer device. Referring to Figure 5, which is a schematic structural diagram of a computer device 500 provided in an embodiment of the present disclosure, the computer device includes a processor 501, a memory 502, and a bus 503. The memory 502 is used to store execution instructions and includes a main memory 5021 and an external memory 5022. The main memory 5021, also called internal memory, temporarily stores computational data of the processor 501 as well as data exchanged with the external memory 5022, such as a hard disk. The processor 501 exchanges data with the external memory 5022 through the main memory 5021.

[0094] In this embodiment, the memory 502 is specifically used to store application code that executes the solution of this application, and its execution is controlled by the processor 501. That is, when the computer device 500 is running, the processor 501 communicates with the memory 502 through the bus 503, so that the processor 501 executes the application code stored in the memory 502, and then executes the method described in any of the foregoing embodiments.

[0095] The memory 502 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.

[0096] Processor 501 may be an integrated circuit chip with signal processing capabilities. The aforementioned processor can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this invention. The general-purpose processor can be a microprocessor or any conventional processor.

[0097] It is understood that the structures illustrated in the embodiments of this application do not constitute a specific limitation on the computer device 500. In other embodiments of this application, the computer device 500 may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

[0098] This disclosure also provides a computer-readable storage medium storing a computer program that, when executed by a processor, performs the steps of the digital terrain model reconstruction method described in the above-described method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.

[0099] This disclosure also provides a computer program product carrying program code. The program code includes instructions that can be used to execute the steps of the digital terrain model reconstruction method described in the above method embodiments. For details, please refer to the above method embodiments, which will not be repeated here.

[0100] The aforementioned computer program product can be implemented through hardware, software, or a combination thereof. In one optional embodiment, the computer program product is specifically embodied in a computer storage medium; in another optional embodiment, the computer program product is specifically embodied in a software product, such as a software development kit (SDK), etc.

[0101] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the systems and devices described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here. In the several embodiments provided in this disclosure, it should be understood that the disclosed systems and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division; in actual implementation, there may be other division methods. Furthermore, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through communication interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or of other forms.

[0102] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0103] In addition, the functional units in the various embodiments of this disclosure can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0104] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a processor-executable, non-volatile, computer-readable storage medium. Based on this understanding, the technical solution of this disclosure, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this disclosure. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0105] Finally, it should be noted that the above-described embodiments are merely specific implementations of this disclosure, used to illustrate its technical solutions rather than to limit them, and the protection scope of this disclosure is not limited thereto. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, they may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of the technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this disclosure, and should all be covered within the protection scope of this disclosure. Therefore, the protection scope of this disclosure should be determined by the protection scope of the claims.

Claims

1. A method for reconstructing a digital terrain model, characterized in that the method comprises: obtaining a training dataset, wherein the training dataset includes multiple training data subsets, and each training data subset includes a high-resolution optical training image, a low-resolution digital surface training model, and a high-resolution digital terrain model sample corresponding to the high-resolution optical training image and the low-resolution digital surface training model; constructing a multi-stage progressive reconstruction network and training the multi-stage progressive reconstruction network using the training dataset to obtain a trained multi-stage progressive reconstruction network, wherein the multi-stage progressive reconstruction network includes multiple cascaded reconstruction stages; acquiring a high-resolution optical image and a low-resolution digital surface model of the target area to be reconstructed, and inputting the high-resolution optical image and the low-resolution digital surface model into the trained multi-stage progressive reconstruction network; and performing progressive upsampling and residual refinement on the high-resolution optical image and the low-resolution digital surface model through the multiple cascaded reconstruction stages to obtain a target-resolution digital terrain model corresponding to the target area to be reconstructed.

2. The method according to claim 1, characterized in that constructing the multi-stage progressive reconstruction network and training the multi-stage progressive reconstruction network using the training dataset comprises: constructing the multi-stage progressive reconstruction network to be trained, wherein the multi-stage progressive reconstruction network includes a cascaded first reconstruction stage, second reconstruction stage, and third reconstruction stage; the first reconstruction stage is used to upsample the input information to an intermediate resolution and perform residual refinement, the second reconstruction stage is used to upsample the output of the first reconstruction stage to the target resolution and perform residual refinement, and the third reconstruction stage is used to perform residual refinement on the output of the second reconstruction stage to correct local errors; for each training data subset, constructing, based on the high-resolution optical training image, an intermediate-scale training image matching the spatial scale of the first reconstruction stage and a target-scale training image matching the spatial scale of the second reconstruction stage; and repeating the following training process until a preset training termination condition is met:
Step 1: selecting a preset number of training data subsets from the training dataset, each selected training data subset serving as one training sample, all selected training data subsets together constituting the current training batch;
Step 2: for each training data subset in the current training batch, performing the following operations: inputting the low-resolution digital surface training model in the training data subset and the intermediate-scale training image corresponding to the high-resolution optical training image into the first reconstruction stage to obtain intermediate-resolution terrain training data; inputting the intermediate-resolution terrain training data and the target-scale training image corresponding to the high-resolution optical training image into the second reconstruction stage to obtain an initial target-resolution digital terrain training model; and inputting the initial target-resolution digital terrain training model and the target-scale training image into the third reconstruction stage to obtain the predicted digital terrain training model corresponding to the training data subset;
Step 3: calculating the subset loss between the predicted digital terrain training model corresponding to each training data subset in the current training batch and the high-resolution digital terrain model sample in that training data subset, and determining the total batch loss of the current training batch based on the subset losses of all training data subsets in the current training batch;
Step 4: adjusting the parameters of the multi-stage progressive reconstruction network according to the total batch loss, and returning to Step 1 to select the next training batch and continue training;
wherein, when the preset training termination condition is met, training stops and the trained multi-stage progressive reconstruction network is obtained.

3. The method according to claim 2, characterized in that inputting the low-resolution digital surface training model in the training data subset and the intermediate-scale training image corresponding to the high-resolution optical training image into the first reconstruction stage comprises: inputting the low-resolution digital surface training model into the first sampling module of the first reconstruction stage, wherein the first sampling module includes a bilinear interpolation path and a transposed convolution residual compensation path; performing bilinear interpolation upsampling on the low-resolution digital surface training model using the bilinear interpolation path to output a first intermediate-resolution upsampling result, and performing a transposed convolution operation on the low-resolution digital surface training model using the transposed convolution residual compensation path to output a second intermediate-resolution upsampling result; adding the first intermediate-resolution upsampling result to the second intermediate-resolution upsampling result to obtain preliminarily upsampled intermediate-resolution terrain data; concatenating the intermediate-scale training image with the preliminarily upsampled intermediate-resolution terrain data to obtain a first fusion feature, and inputting the first fusion feature into the first residual module of the first reconstruction stage to obtain a first terrain residual; and determining the intermediate-resolution terrain training data based on the first terrain residual and the preliminarily upsampled intermediate-resolution terrain data.

4. The method according to claim 2, characterized in that inputting the intermediate-resolution terrain training data and the target-scale training image corresponding to the high-resolution optical training image into the second reconstruction stage comprises: inputting the intermediate-resolution terrain training data into the second sampling module of the second reconstruction stage, wherein the second sampling module includes a bilinear interpolation path and a transposed convolution residual compensation path; performing bilinear interpolation upsampling on the intermediate-resolution terrain training data using the bilinear interpolation path to output a first target-resolution upsampling result, and performing a transposed convolution operation on the intermediate-resolution terrain training data using the transposed convolution residual compensation path to output a second target-resolution upsampling result; adding the first target-resolution upsampling result to the second target-resolution upsampling result to obtain preliminarily upsampled target-resolution terrain data; concatenating the target-scale training image with the preliminarily upsampled target-resolution terrain data to obtain a second fusion feature, and inputting the second fusion feature into the second residual module of the second reconstruction stage to obtain a second terrain residual; and determining the initial target-resolution digital terrain training model based on the second terrain residual and the preliminarily upsampled target-resolution terrain data.

5. The method according to claim 2, characterized in that inputting the initial target-resolution digital terrain training model and the target-scale training image into the third reconstruction stage comprises: concatenating the initial target-resolution digital terrain training model with the target-scale training image to obtain a third fusion feature; inputting the third fusion feature into the third residual module of the third reconstruction stage to output a third terrain residual; and obtaining the predicted digital terrain training model based on the third terrain residual and the initial target-resolution digital terrain training model.

6. The method according to claim 2, characterized in that calculating the subset loss between the predicted digital terrain training model corresponding to each training data subset in the current training batch and the high-resolution digital terrain model sample in that training data subset comprises: calculating the L1 loss and the Smooth L1 loss between the predicted digital terrain training model and the high-resolution digital terrain model sample, and adding the L1 loss and the Smooth L1 loss according to a preset weighting coefficient to obtain the pixel-level mixing loss; calculating a first absolute error between the horizontal gradient of the predicted digital terrain training model and the horizontal gradient of the high-resolution digital terrain model sample, calculating a second absolute error between the vertical gradient of the predicted digital terrain training model and the vertical gradient of the high-resolution digital terrain model sample, and adding the first absolute error and the second absolute error to obtain the slope consistency loss; calculating, in the predicted digital terrain training model, the average absolute value of the elevation difference between horizontally adjacent pixels and the average absolute value of the elevation difference between vertically adjacent pixels, and adding the two averages to obtain the total variation regularization term; and obtaining the subset loss as a weighted sum of the pixel-level mixing loss, the slope consistency loss, and the total variation regularization term according to preset weights.

7. The method according to claim 6, characterized in that the pixel-level mixing loss is expressed as:
$$\mathcal{L}_{pix} = \alpha \,\lVert \hat{Y} - Y \rVert_1 + (1 - \alpha)\, \mathcal{L}_{SmoothL1}(\hat{Y}, Y)$$
where $\mathcal{L}_{pix}$ denotes the pixel-level mixing loss; $\alpha$ denotes the preset weighting coefficient; $\hat{Y}$ denotes the predicted digital terrain training model; $Y$ denotes the high-resolution digital terrain model sample; and $\mathcal{L}_{SmoothL1}(\hat{Y}, Y)$ denotes the Smooth L1 loss between the predicted digital terrain training model and the high-resolution digital terrain model sample;
the slope consistency loss is expressed as:
$$\mathcal{L}_{slope} = \lVert G_x(\hat{Y}) - G_x(Y) \rVert_1 + \lVert G_y(\hat{Y}) - G_y(Y) \rVert_1$$
the gradient in the horizontal direction is defined as:
$$G_x(Z)(i, j) = Z(i+1, j) - Z(i, j)$$
the gradient in the vertical direction is defined as:
$$G_y(Z)(i, j) = Z(i, j+1) - Z(i, j)$$
for boundary pixels, the definition is:
$$G_x(Z)(W, j) = 0, \qquad G_y(Z)(i, H) = 0$$
where $\mathcal{L}_{slope}$ denotes the slope consistency loss; $G_x(\hat{Y})$ denotes the horizontal gradient of the predicted digital terrain training model; $G_x(Y)$ denotes the horizontal gradient of the high-resolution digital terrain model sample; $G_y(\hat{Y})$ denotes the vertical gradient of the predicted digital terrain training model; $G_y(Y)$ denotes the vertical gradient of the high-resolution digital terrain model sample; $G_x(Z)(i, j)$ denotes the horizontal gradient value of the digital terrain model $Z$ at position $(i, j)$, where $Z$ stands for either the predicted digital terrain training model or the high-resolution digital terrain model sample; $G_y(Z)(i, j)$ denotes the vertical gradient value of the digital terrain model $Z$ at position $(i, j)$; $W$ denotes the pixel width of the digital terrain model $Z$ in the horizontal direction; and $H$ denotes the pixel height of the digital terrain model $Z$ in the vertical direction;
the total variation regularization term is expressed as:
$$\mathcal{L}_{TV} = \operatorname{mean}\big(\lvert \hat{Y}(x+1, y) - \hat{Y}(x, y) \rvert\big) + \operatorname{mean}\big(\lvert \hat{Y}(x, y+1) - \hat{Y}(x, y) \rvert\big)$$
where $\mathcal{L}_{TV}$ denotes the total variation regularization term; $\hat{Y}(x+1, y)$ denotes the pixel value at the position $(x+1, y)$ horizontally adjacent to position $(x, y)$ in the predicted digital terrain training model; $\hat{Y}(x, y)$ denotes the pixel value at position $(x, y)$ in the predicted digital terrain training model; $\operatorname{mean}(\lvert \hat{Y}(x+1, y) - \hat{Y}(x, y) \rvert)$ denotes the average of the absolute elevation differences over all horizontally adjacent pixel pairs in the predicted digital terrain training model; $\hat{Y}(x, y+1)$ denotes the pixel value at the position $(x, y+1)$ vertically adjacent to position $(x, y)$ in the predicted digital terrain training model; and $\operatorname{mean}(\lvert \hat{Y}(x, y+1) - \hat{Y}(x, y) \rvert)$ denotes the average of the absolute elevation differences over all vertically adjacent pixel pairs in the predicted digital terrain training model.

8. A digital terrain model reconstruction device, characterized in that the device comprises: a data acquisition module, used to acquire a training dataset, wherein the training dataset includes multiple training data subsets, and each training data subset includes a high-resolution optical training image, a low-resolution digital surface training model, and a high-resolution digital terrain model sample corresponding to the high-resolution optical training image and the low-resolution digital surface training model; a network training module, used to construct a multi-stage progressive reconstruction network and train the multi-stage progressive reconstruction network using the training dataset to obtain a trained multi-stage progressive reconstruction network, wherein the multi-stage progressive reconstruction network includes multiple cascaded reconstruction stages; and a network processing module, used to acquire a high-resolution optical image and a low-resolution digital surface model of the target area to be reconstructed, input the high-resolution optical image and the low-resolution digital surface model into the trained multi-stage progressive reconstruction network, and perform progressive upsampling and residual refinement on the high-resolution optical image and the low-resolution digital surface model through the multiple cascaded reconstruction stages to obtain a target-resolution digital terrain model corresponding to the target area to be reconstructed.

9. A storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method of any one of claims 1 to 7.

10. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method of any one of claims 1 to 7.