A physically-guided spatial deep learning forest canopy height inversion method
By employing a physics-guided spatial deep learning approach, combined with RVOG three-stage inversion and spatial convolutional networks, the instability and discontinuity issues in PolInSAR forest height inversion were resolved, resulting in higher accuracy and more stable forest canopy height mapping.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTHWEST FORESTRY UNIVERSITY
- Filing Date
- 2026-04-07
- Publication Date
- 2026-06-16
AI Technical Summary
Existing PolInSAR forest height inversion methods are unstable under complex scattering and inhomogeneous coherence quality, and are prone to producing strip/block artifacts. Deep learning methods lack physical constraints, resulting in insufficient generalization and discontinuities across the whole map.
A physics-guided spatial deep learning approach is adopted, using the initial canopy height obtained through RVOG three-stage inversion as a physical anchor point. This is combined with a spatial convolutional neural network for end-to-end prediction, and confidence maps and joint loss functions are constructed for training. Seamless inference is achieved by fusing EMA parameters with Hann weighted sliding windows.
It improves the accuracy and continuity of forest canopy height inversion, reduces strip/block artifacts and boundary splicing effects, and enhances the robustness of the model in low coherence/low geometry sensitivity regions.
Abstract
Description
Technical Field
[0001] This invention relates to the field of forest remote sensing inversion technology, and more specifically to a physics-guided spatial deep learning method for inverting forest canopy height.
Background Technology
[0002] Forest canopy height is a crucial parameter reflecting the vertical structure of forests and is closely related to forest biomass estimation, carbon storage assessment, forest health monitoring, and management. Compared to optical remote sensing, which is susceptible to cloud cover and saturation, synthetic aperture radar (SAR) offers all-weather, all-day imaging capabilities. Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR), in particular, combines polarimetric and interferometric information, characterizing both differences in scattering mechanisms and the height information of scatterers. Therefore, it holds significant potential for application in regional-scale forest height retrieval.
[0003] In existing PolInSAR forest height inversion methods, the Random Volume over Ground (RVoG) model, which combines random volume scattering with surface scattering, and its three-stage inversion method are widely used. These methods fit the distribution of the complex coherences, separate the ground phase from the volume coherence, and then solve for forest height using lookup tables or similar techniques. However, in practical applications, forest scattering conditions are complex and spatially heterogeneous. The observed complex coherence is affected by multiple factors: coherence quality varies with land cover, water content, and baseline configuration, and in some areas decorrelation increases or the assumption of volume scattering dominance does not hold. As a result, purely physical inversion shows systematic biases in certain scenarios, accompanied by banded or granular artifacts. Furthermore, when the vertical wavenumber kz is small or the geometric sensitivity is insufficient, the model's response to changes in height is weakened, and the inversion results are unstable or prone to height saturation and error amplification.
[0004] On the other hand, in recent years, supervised regression methods based on deep learning have been able to learn the statistical relationship between features and height using sample data, which can improve the local errors of traditional models under certain conditions. However, these methods are often highly dependent on the distribution of training data: when scene type, imaging geometry, or coherence quality changes, the model is prone to a decline in generalization; furthermore, when direct pixel-by-pixel regression is performed or physical constraints are lacking, the prediction results may produce problems such as blocky jumps, texture discontinuities, or high-value compression in low-sensitivity, low-coherence regions, affecting the overall map quality and usability. Especially in applications that require the output of continuous canopy height products, stitching inference, noise propagation, and boundary effects may further amplify spatial inconsistencies.
[0005] Therefore, there is an urgent need for a forest height inversion method that combines physical interpretability with data-driven advantages. On the one hand, it should use the inversion results of the physical model to provide reliable prior anchoring and constrain the physical plausibility of the network output. On the other hand, it should introduce spatial modeling and confidence modulation mechanisms to suppress artifacts and improve spatial continuity where coherence quality is uneven, so as to obtain forest canopy height maps with higher accuracy and greater stability across the whole map.
Summary of the Invention
[0006] The purpose of this invention is to provide a physics-guided spatial deep learning method for inverting forest canopy height, in order to at least solve the technical problems of existing PolInSAR / RVoG physical models, such as instability in inversion under complex scattering and coherence quality inhomogeneity, easy generation of strip / block artifacts, and insufficient generalization and discontinuity of the whole image due to the lack of physical constraints in deep learning methods.
[0007] To achieve the above objectives, this invention provides a physically guided spatial deep learning method for retrieving forest canopy height, the method comprising:
[0008] Acquire multi-baseline fully polarimetric interferometric synthetic aperture radar (PolInSAR) data, estimate complex coherence observations and calculate vertical wavenumber based on the PolInSAR data;
[0009] Based on the complex coherence observations and the vertical wavenumber, a three-stage inversion based on the random volume scattering-surface scattering hybrid model RVOG is performed to separate the ground phase and invert the initial canopy height, while obtaining the volume scattering-dominated complex coherence.
[0010] Based on the volume scattering-dominated complex coherence, the ground phase, the vertical wavenumber, and the initial canopy height, a multi-channel feature map containing PolInSAR feature channels and physical anchor point channels is constructed, and the multi-channel feature map is patched and cropped to obtain feature patches.
[0011] Using the reference height label formed by LiDAR-RH100 as supervision information, the feature patch is input into the spatial convolutional neural network for end-to-end prediction. A confidence map is constructed based on coherent information, and the spatial convolutional neural network is trained by using a joint loss function consisting of confidence modulation supervision loss, low confidence total variation smoothing regularization and low frequency consistency constraint to obtain the canopy height inversion model.
[0012] The multi-channel feature map corresponding to the region to be tested is input into the trained canopy height inversion model. The Sigmoid-Scaling mapping is performed on the model output, and the network with the exponential moving average (EMA) parameter version is fused with the Hann weighted sliding window for seamless inference, outputting the whole map forest canopy height product.
[0013] Optionally, estimating complex coherence observations based on the PolInSAR data includes:
[0014] Registration and interferometric processing are performed on the master and slave images in the PolInSAR data;
[0015] Complex coherence estimation is performed on the master and slave images within a preset window to obtain the complex coherence observation γ;
[0016] The complex coherence observation γ is decomposed to output the real part real(γ), imaginary part imag(γ), and coherence amplitude |γ| of the complex coherence;
[0017] The real part real(γ), imaginary part imag(γ), and coherence amplitude |γ| of the complex coherence are used as input data for subsequent volume scattering-dominated complex coherence screening, confidence map construction, and multi-channel feature map construction.
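The windowed estimation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the standard sample coherence estimator (windowed cross-product divided by the geometric mean of the windowed powers), a square boxcar window with odd size, and two co-registered single-look complex arrays. The names `boxcar_mean`, `estimate_coherence`, `win`, and `eps` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def boxcar_mean(x, win):
    """Mean over a win x win window (win odd); edges are padded so the shape is kept."""
    pad = win // 2
    xp = np.pad(x, pad, mode="edge")
    return sliding_window_view(xp, (win, win)).mean(axis=(-2, -1))


def estimate_coherence(s1, s2, win=5, eps=1e-12):
    """Sample complex coherence of two co-registered complex images.

    Returns real(gamma), imag(gamma) and |gamma| as separate arrays,
    matching the three outputs listed in the text.
    """
    cross = boxcar_mean(s1 * np.conj(s2), win)   # windowed <s1 s2*>
    p1 = boxcar_mean(np.abs(s1) ** 2, win)       # windowed <|s1|^2>
    p2 = boxcar_mean(np.abs(s2) ** 2, win)       # windowed <|s2|^2>
    gamma = cross / np.sqrt(p1 * p2 + eps)
    return gamma.real, gamma.imag, np.abs(gamma)
```

A larger window reduces the estimation bias and variance of |γ| at the cost of spatial resolution, so the preset window size is a trade-off rather than a fixed constant.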
[0018] Optionally, calculate the vertical wavenumber, including:
[0019] Read the incident angle, slant range, wavelength, vertical baseline, and data acquisition mode information corresponding to the PolInSAR data;
[0020] Calculate the vertical wavenumber kz based on the geometric relationship of interferometric imaging;
[0021] The amplitude |kz| and its sign term of the vertical wavenumber kz are extracted to characterize the geometric sensitivity of canopy height inversion;
[0022] The vertical wavenumber kz, amplitude |kz|, and sign term are used as data inputs for subsequent multi-channel feature map construction and confidence map construction.
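As an illustration of the geometric relationship, the sketch below uses a commonly cited form of the vertical wavenumber, kz = m · 2π · B⊥ / (λ · R · sin θ), with m = 2 for monostatic (repeat-pass) and m = 1 for bistatic (single-pass) acquisition. The patent text does not state its formula, so this expression, the function name `vertical_wavenumber`, and the `monostatic` flag are assumptions.

```python
import numpy as np


def vertical_wavenumber(b_perp, wavelength, slant_range, inc_angle_rad, monostatic=True):
    """Vertical wavenumber kz (rad/m), its magnitude and its sign.

    Assumed form: kz = m * 2*pi * B_perp / (lambda * R * sin(theta)),
    where m depends on the data acquisition mode.
    """
    m = 2.0 if monostatic else 1.0
    kz = m * 2.0 * np.pi * b_perp / (wavelength * slant_range * np.sin(inc_angle_rad))
    return kz, np.abs(kz), np.sign(kz)


# Example with made-up geometry: 10 m baseline, 0.2 m wavelength, 10 km range, 30 deg incidence.
kz, kz_abs, kz_sign = vertical_wavenumber(10.0, 0.2, 1.0e4, np.deg2rad(30.0))
height_of_ambiguity = 2.0 * np.pi / kz_abs   # height change producing a full 2*pi phase cycle
```

A small |kz| gives a large height of ambiguity, meaning a given height change produces only a small phase change; this is the low-sensitivity case the text refers to.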
[0023] Optionally, perform a three-stage inversion based on the random volume scattering-surface scattering hybrid model RVOG, including:
[0024] A coherence line is fitted to the observed complex coherences, and the intersections of the fitted line with the unit circle are taken as the candidate ground phases;
[0025] The candidate ground phases are discriminated to determine the true ground phase φ0, and the volume scattering-dominated complex coherence γ is determined accordingly;
[0026] Based on the volume scattering-dominated complex coherence γ, the initial canopy height is obtained by search and matching in a two-dimensional lookup table (LUT);
[0027] The ground phase φ0, the volume scattering-dominated complex coherence γ, and the initial canopy height are used as input data for subsequent multi-channel feature map construction, with the initial canopy height serving as a physical anchor point that constrains the subsequent spatial deep learning inversion process.
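The first stage (line fit and unit-circle intersection) can be sketched for a single pixel as below. This is an illustrative sketch under stated assumptions: the line is fitted by total least squares (principal direction of the centred points), and the rule for choosing the true ground phase among the two candidates is not implemented because the text does not specify it. The function name `candidate_ground_phases` is hypothetical.

```python
import numpy as np


def candidate_ground_phases(gammas):
    """Fit a line to complex coherences and intersect it with the unit circle.

    gammas: 1-D complex array of coherences observed at one pixel
            (for example, for different polarization channels).
    Returns the phases (rad) of the two intersection points.
    """
    pts = np.column_stack([gammas.real, gammas.imag])
    c = pts.mean(axis=0)                              # a point on the fitted line
    _, _, vt = np.linalg.svd(pts - c, full_matrices=False)
    d = vt[0]                                         # unit direction of the line
    # Solve |c + t d|^2 = 1  ->  t^2 + 2 (c.d) t + (|c|^2 - 1) = 0
    b = c @ d
    disc = b * b - (c @ c - 1.0)
    if disc < 0:
        raise ValueError("fitted line does not intersect the unit circle")
    t = np.array([-b - np.sqrt(disc), -b + np.sqrt(disc)])
    hits = c + t[:, None] * d
    return np.arctan2(hits[:, 1], hits[:, 0])
```

Because coherences satisfy |γ| ≤ 1, the centroid normally lies inside the unit circle and the line yields two intersections, which is why a separate discrimination step is needed to select φ0.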
[0028] Optionally, a multi-channel feature map containing PolInSAR feature channels and physical anchor point channels is constructed, including:
[0029] The real part, imaginary part, and coherence amplitude of the volume scattering-dominated complex coherence γ, the ground phase φ0, the vertical wavenumber kz, and the normalized physical anchor point channel are concatenated along the channel dimension to form a multi-channel feature map;
[0030] The multi-channel feature map is cropped with a sliding window according to the preset patch size and sliding step size to obtain the input patches for training and inference;
[0031] The reference height labels formed by LiDAR-RH100, which are spatially aligned with the input patches, are cropped in the same way to form the label patches required for supervised training;
[0032] The input patch and label patch are fed together into the subsequent training process of the spatial convolutional neural network.
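The channel stacking and sliding-window cropping can be sketched as follows. The min-max normalization of the anchor channel, the height limits, the channel order, and the patch and stride values are assumptions chosen for illustration; `build_feature_map` and `sliding_patches` are hypothetical names.

```python
import numpy as np


def build_feature_map(gamma_v, phi0, kz, h0, h_min=0.0, h_max=50.0):
    """Stack PolInSAR channels and the normalized physical anchor channel.

    gamma_v: complex (H, W) volume scattering-dominated coherence
    phi0, kz, h0: real (H, W) ground phase, vertical wavenumber, initial canopy height
    Returns a float32 array of shape (6, H, W).
    """
    anchor = np.clip((h0 - h_min) / (h_max - h_min), 0.0, 1.0)   # assumed normalization
    channels = [gamma_v.real, gamma_v.imag, np.abs(gamma_v), phi0, kz, anchor]
    return np.stack(channels, axis=0).astype(np.float32)


def sliding_patches(feat, label, patch=64, stride=32):
    """Yield ((top, left), feature patch, label patch) with identical spatial crops."""
    _, height, width = feat.shape
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            yield ((top, left),
                   feat[:, top:top + patch, left:left + patch],
                   label[top:top + patch, left:left + patch])
```

Cropping features and labels with the same indices keeps each label patch aligned with its input patch, which is the requirement stated for supervised training.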
[0033] Optionally, the spatial convolutional neural network is a spatial U-Net with an encoder-decoder structure; inputting the feature patch into the spatial convolutional neural network for end-to-end prediction includes:
[0034] Multi-scale spatial feature extraction is performed on the feature patch using a downsampling encoder;
[0035] The spatial resolution of the multi-scale spatial features is restored using an upsampling decoder.
[0036] By using skip connections, shallow detailed information is fused with deep semantic information;
[0037] The output is the canopy height prediction logits at the same scale as the input patch, which serves as the input for subsequent physical extent mapping and loss function calculation.
[0038] Optionally, a Sigmoid-Scaling mapping is performed on the model output, with the following expression:
[0039] $$\hat{h} = h_{\min} + (h_{\max} - h_{\min})\,\sigma(z)$$
[0040] In the formula, $\hat{h}$ represents the predicted canopy height after Sigmoid-Scaling mapping; $h_{\min}$ indicates the preset lower limit of canopy height; $h_{\max}$ indicates the preset upper limit of canopy height; $\sigma(\cdot)$ represents the Sigmoid function; z represents the logits output of the spatial convolutional neural network.
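A minimal sketch of this mapping, which squashes the logits z with a sigmoid and rescales them to lie between the preset lower and upper height limits. The limits of 0 m and 50 m are placeholder assumptions, and `sigmoid_scaling` is a hypothetical name.

```python
import numpy as np


def sigmoid_scaling(z, h_min=0.0, h_max=50.0):
    """Map unbounded logits z to heights in [h_min, h_max]."""
    z = np.asarray(z, dtype=np.float64)
    sig = 0.5 * (1.0 + np.tanh(0.5 * z))   # equals 1 / (1 + exp(-z)), without overflow
    return h_min + (h_max - h_min) * sig
```

Whatever value the network produces, the mapped height cannot leave the preset range, which is how the mapping prevents abnormally high predictions.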
[0041] Optionally, a confidence map is constructed based on coherence information, including:
[0042] Confidence maps conf are generated based on coherence amplitude |γ| and geometric sensitivity |kz|, so that regions with higher coherence quality and larger |kz| have higher confidence.
[0043] Based on the confidence map conf, the supervision term and regularization term in the training loss are weighted and modulated.
[0044] The confidence map conf is used as the weight input when calculating the joint loss function, so as to achieve the training objective of preserving details in high-confidence regions and enhancing stability in low-confidence regions.
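The text states only that confidence should increase with |γ| and with |kz|; it does not give a formula. The sketch below is one possible construction under that constraint, assuming a simple product of a clipped coherence term and a clipped, normalized sensitivity term. The reference value `kz_ref` and the function name `confidence_map` are hypothetical.

```python
import numpy as np


def confidence_map(coh_amp, kz_abs, kz_ref=0.15):
    """Confidence in [0, 1] that grows with coherence amplitude and with |kz|.

    kz_ref (rad/m) is an assumed value of |kz| at or above which
    geometric sensitivity is treated as fully adequate.
    """
    coh_term = np.clip(coh_amp, 0.0, 1.0)
    geo_term = np.clip(kz_abs / kz_ref, 0.0, 1.0)
    return coh_term * geo_term
```

With a product form, a pixel needs both good coherence and adequate geometric sensitivity to receive high confidence; other monotonic combinations would satisfy the same description.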
[0045] Optionally, the joint loss function is a weighted sum of three terms: a confidence modulation supervision loss term, a low-confidence region total variation smoothing regularization term, and a low-frequency consistency constraint term. Specifically:
[0046] The confidence modulation supervision loss term is expressed as follows:
[0047] $$L_{\mathrm{sup}} = \frac{1}{|\Omega_p|} \sum_{q \in \Omega_p} W(q)\,\mathrm{Huber}\big(\hat{h}(q) - h_{\mathrm{ref}}(q)\big)$$
[0048] where $L_{\mathrm{sup}}$ represents the confidence modulation supervision loss term; $|\Omega_p|$ represents the number of pixels in the set of valid supervised pixels within patch p; $\Omega_p$ is the set of pixels within patch p that have valid labels and valid SAR data; q represents a pixel; $W(q)$ represents the confidence weight obtained from conf at pixel q; Huber(·) represents the Huber loss function; $\hat{h}(q)$ represents the network prediction height at pixel q; $h_{\mathrm{ref}}(q)$ indicates the reference height at pixel q;
[0049] The total variation smoothing regularization term for the low-confidence region is expressed as follows:
[0050] $$L_{\mathrm{TV}} = \sum_{q} \big(1 - W(q)\big)\Big(\big|\nabla_x \hat{h}(q)\big| + \big|\nabla_y \hat{h}(q)\big|\Big)$$
[0051] where $L_{\mathrm{TV}}$ represents the total variation smoothing regularization term for low-confidence regions; q represents a pixel; W(q) represents the confidence weight at pixel q; $\hat{h}(q)$ represents the network prediction height at pixel q; $\nabla_x$ represents the gradient operator in the x-direction; $\nabla_y$ represents the gradient operator in the y-direction;
[0052] The low-frequency consistency constraint term is expressed as follows:
[0053] $$L_{\mathrm{LF}} = \big\| \mathrm{LPF}\big(\hat{h}^{(1)}\big) - \mathrm{LPF}\big(\hat{h}^{(2)}\big) \big\|_1$$
[0054] where $L_{\mathrm{LF}}$ represents the low-frequency consistency constraint term; LPF(·) represents the low-pass operator; $\hat{h}^{(1)}$ represents the prediction result after the first random augmentation of the same input patch; $\hat{h}^{(2)}$ represents the prediction result after the second random augmentation of the same input patch; $\|\cdot\|_1$ represents the L1 norm;
[0055] The joint loss function is expressed as follows:
[0056] $$L = L_{\mathrm{sup}} + \lambda_{\mathrm{TV}}\,L_{\mathrm{TV}} + \lambda_{\mathrm{LF}}\,L_{\mathrm{LF}}$$
[0057] Where L represents the joint loss function; $L_{\mathrm{sup}}$ represents the confidence modulation supervision loss term; $L_{\mathrm{TV}}$ represents the total variation smoothing regularization term for the low-confidence region; $L_{\mathrm{LF}}$ represents the low-frequency consistency constraint term; $\lambda_{\mathrm{TV}}$ represents the weighting coefficient of the total variation smoothing regularization term for the low-confidence region; $\lambda_{\mathrm{LF}}$ represents the weighting coefficient of the low-frequency consistency constraint term.
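To make the three terms concrete, the following NumPy sketch evaluates a joint loss of this kind on a single patch: a confidence-weighted Huber supervision term averaged over valid pixels, a total variation term weighted by (1 − W), and an L1 difference between low-passed predictions from two augmentations. Several details are assumptions not given in the text: forward differences for the gradients, a 5 × 5 box filter as the low-pass operator, the Huber threshold `delta`, and the weights `lam_tv` and `lam_lf`. The two augmented predictions are assumed to have been mapped back to a common geometry before comparison. A real training loop would use an automatic-differentiation framework rather than NumPy.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def huber(r, delta=1.0):
    """Elementwise Huber loss of residual r."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))


def box_lowpass(x, k=5):
    """k x k box filter (k odd) used here as a stand-in low-pass operator."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return sliding_window_view(xp, (k, k)).mean(axis=(-2, -1))


def joint_loss(h_pred, h_ref, W, valid, h_aug1, h_aug2, lam_tv=0.1, lam_lf=0.1):
    """Return (total loss, (supervision, TV, low-frequency)) for one patch.

    valid: boolean mask of pixels with valid labels and valid SAR data.
    """
    n_valid = max(int(valid.sum()), 1)
    l_sup = np.sum(W[valid] * huber(h_pred[valid] - h_ref[valid])) / n_valid

    gx = np.abs(np.diff(h_pred, axis=1))      # forward difference in x
    gy = np.abs(np.diff(h_pred, axis=0))      # forward difference in y
    low_conf = 1.0 - W                        # large where confidence is low
    l_tv = np.sum(low_conf[:, 1:] * gx) + np.sum(low_conf[1:, :] * gy)

    l_lf = np.sum(np.abs(box_lowpass(h_aug1) - box_lowpass(h_aug2)))
    return l_sup + lam_tv * l_tv + lam_lf * l_lf, (l_sup, l_tv, l_lf)
```

The opposing weights are the key design point: W scales the data term so that reliable pixels drive the fit, while (1 − W) scales the smoothness term so that unreliable pixels are regularized toward their neighbours.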
[0058] Optionally, a network using the exponential moving average (EMA) parameter version is fused with a Hann-weighted sliding window for seamless inference, specifically including:
[0059] During training, an exponential moving average (EMA) version of the spatial convolutional neural network parameters is maintained, and during the inference phase, the network with the EMA parameter version is called to predict overlapping sliding window patches.
[0060] Apply a Hann weighted window to the prediction results of each sliding window patch, and perform weighted accumulation and normalization fusion on the overlapping regions;
[0061] The fused whole image prediction result is obtained using the following expression:
[0062] $$\hat{h}(r) = \frac{\sum_{i} w_i(r)\,\hat{h}_i(r)}{\sum_{i} w_i(r)}$$
[0063] where $\hat{h}(r)$ represents the fused predicted height of the whole image at position r; r represents the spatial position in the entire image; i represents the sliding window index; $\hat{h}_i(r)$ represents the predicted value of the i-th sliding window at position r; $w_i(r)$ represents the Hann weight corresponding to the i-th sliding window at position r;
[0064] Output forest canopy height products for continuous whole images to suppress window boundary stitching artifacts and improve spatial consistency.
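The weighted accumulation and normalization can be sketched as follows. This is an illustration under assumptions: `predict` stands for any callable that maps a (C, patch, patch) feature patch to a (patch, patch) height patch (for instance, the EMA network followed by Sigmoid-Scaling), the image is at least one patch in size, and a final window is added at each border so every pixel is covered. The zero endpoints of the Hann window are dropped so that no pixel ends up with zero total weight. All names are hypothetical.

```python
import numpy as np


def window_starts(size, patch, stride):
    """Start indices covering [0, size); requires size >= patch."""
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)           # extra window flush with the border
    return starts


def hann_fuse(predict, feat, patch=64, stride=32, eps=1e-12):
    """Fuse overlapping window predictions with 2-D Hann weights."""
    _, height, width = feat.shape
    w1 = np.hanning(patch + 2)[1:-1]          # strictly positive 1-D Hann taper
    w2 = np.outer(w1, w1)                     # separable 2-D window
    acc = np.zeros((height, width))
    wsum = np.zeros((height, width))
    for top in window_starts(height, patch, stride):
        for left in window_starts(width, patch, stride):
            pred = predict(feat[:, top:top + patch, left:left + patch])
            acc[top:top + patch, left:left + patch] += w2 * pred
            wsum[top:top + patch, left:left + patch] += w2
    return acc / np.maximum(wsum, eps)        # normalize by the accumulated weight
```

Because each window's weight tapers toward its edges, predictions near a window border contribute little where a neighbouring window is near its centre, which is what suppresses visible seams.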
[0065] Beneficial effects: Through the above technical solution, the present invention proposes a physics-guided spatial deep learning method for forest canopy height inversion. The initial canopy height obtained from the RVOG three-stage inversion (…) is used as a physical anchor point to impose physical constraints on the spatial deep learning inversion process, thereby enhancing the physical interpretability of the results and reducing the risk of height drift in low-sensitivity areas. Simultaneously, the invention introduces a spatial convolutional network that performs pixel-level regression using neighborhood context information, which helps suppress spatial artifacts such as stripes and blocks and improves the spatial consistency and continuity of canopy height results at stand boundaries and in transition areas. Furthermore, during the training phase, this invention constructs a confidence map based on coherence information and employs a joint loss function consisting of a confidence modulation supervision loss, total variation smoothing regularization for low-confidence regions, and a low-frequency consistency constraint, enabling the model to maintain good robustness and generalization ability in low-coherence and low-geometric-sensitivity regions. Further, during the inference phase, the exponential moving average (EMA) parameter version of the network is used for prediction, and seamless stitching is achieved through Hann weighted sliding window fusion, effectively reducing window boundary stitching effects and outputting a continuous whole-map forest canopy height product. Therefore, this invention can significantly improve the accuracy and continuity of forest canopy height inversion, and has good engineering application value.
[0066] Other features and advantages of the embodiments of the present invention will be described in detail in the following detailed description section.
Description of the Drawings
[0067] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0068] Figure 1 is a schematic diagram illustrating the process of implementing the method of the present invention.
[0069] Figure 2 is a schematic diagram of physical anchor point generation by the RVOG three-stage inversion used in this invention (Figure 2(a) is a schematic diagram of obtaining two candidate ground phases by fitting the coherence line; Figure 2(b) is a schematic diagram of determining the true ground phase).
[0070] Figure 3 is a schematic diagram of the CPASR-Net (physical-prior-augmented spatial U-Net) network structure.
[0071] Figure 4 is a schematic diagram of the confidence modulation joint loss.
[0072] Figure 5 is a schematic diagram illustrating the fusion inference of EMA parameters and Hann weighted sliding window.
[0073] Figure 6 is a diagram illustrating the local comparison effect.
Detailed Implementation
[0074] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0075] The specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for illustration and explanation only and are not intended to limit the present invention.
[0076] As shown in Figure 1, this invention provides a physics-guided spatial deep learning method for inverting forest canopy height, the method comprising:
[0077] S1: Acquire multi-baseline fully polarimetric interferometric synthetic aperture radar (PolInSAR) data, estimate the complex coherence observation γ based on the PolInSAR data, and calculate the vertical wavenumber kz;
[0078] S2: Perform a three-stage inversion based on the RVOG model (random volume scattering combined with surface scattering) to separate the ground phase and obtain the initial canopy height, while also obtaining the volume scattering-dominated complex coherence γ;
[0079] S3: Construct a multi-channel feature map containing PolInSAR feature channels and a physical anchor point channel, and crop the multi-channel feature map into feature patches;
[0080] S4: With LiDAR-RH100 as the supervision target, input the feature patch into a spatial convolutional neural network for end-to-end prediction to obtain the predicted canopy height logits within the patch;
[0081] S5: Perform Sigmoid-Scaling mapping on the logits to limit the output height to a preset physical range;
[0082] S6: During the training phase, construct a confidence map based on coherence information, and train the spatial convolutional neural network using a joint loss function consisting of a confidence modulation supervision loss, low-confidence total variation smoothing regularization, and a low-frequency consistency constraint;
[0083] S7: In the inference stage, the exponential moving average (EMA) parameter and the Hann weighted sliding window are used to perform seamless inference on the entire image and output the forest canopy height product for the whole map.
[0084] In a preferred embodiment, calculating the complex coherence observation γ in step S1 specifically includes: registering and interferometrically processing the master and slave images (or multi-baseline interferometric image pairs) in the PolInSAR data, estimating the complex coherence γ within a preset window, and outputting the real part real(γ), imaginary part imag(γ), and coherence amplitude |γ| of the complex coherence.
[0085] In a preferred embodiment, calculating the vertical wavenumber in step S1 specifically includes: using the incident angle, slant range, wavelength, vertical baseline, and data acquisition mode information of the PolInSAR data, calculating the vertical wavenumber kz based on the interferometric imaging geometry, and obtaining |kz| and the sign term of kz to characterize the geometric sensitivity of canopy height inversion.
[0086] In a preferred embodiment, the three-stage RVOG inversion process in step S2 specifically includes: fitting a coherence line to the observed complex coherences and taking the intersections of the fitted line with the unit circle as the candidate ground phases; identifying the true ground phase φ0 and determining the volume scattering-dominated complex coherence γ; and, based on this, inverting the initial canopy height using a two-dimensional lookup table (LUT) to serve as a physical anchor point.
[0087] In a preferred embodiment, the construction of a multi-channel feature map in step S3 specifically includes: concatenating the real part, imaginary part, and coherence amplitude of the volume scattering-dominated complex coherence γ, the ground phase φ0, the vertical wavenumber kz, and the normalized physical anchor point channel to form a multi-channel feature map; the multi-channel feature map is then cropped with a sliding window according to a preset patch size and step size to obtain input patches for training and inference.
[0088] In a preferred embodiment, the spatial convolutional network in step S4 is a spatial U-Net with an encoder-decoder structure (see Figure 3). Specifically, this includes: extracting multi-scale spatial features through a downsampling encoder, restoring spatial resolution through an upsampling decoder, and fusing shallow detail information and deep semantic information using skip connections to output canopy height prediction logits at the same scale as the input patch.
[0089] In a preferred embodiment, the expression for the Sigmoid-Scaling mapping in step S5 is:
[0090] $$\hat{h} = h_{\min} + (h_{\max} - h_{\min})\,\sigma(z)$$
[0091] Where z is the logits output by the spatial convolutional network, $\sigma(\cdot)$ is the Sigmoid function, and $h_{\min}$ and $h_{\max}$ represent the preset lower and upper limits for canopy height, respectively.
[0092] In a preferred embodiment, constructing the confidence map conf in step S6 specifically includes: generating the confidence map conf based on the coherence amplitude |γ| and the geometric sensitivity |kz|, so that regions with higher coherence quality and larger |kz| have higher confidence; the confidence map is used to modulate the weights of the supervision and regularization terms in the training loss.
[0093] In a preferred embodiment, the joint loss function in step S6 specifically includes: a confidence modulation supervision loss term, a low-confidence region total variation smoothing regularization term, and a low-frequency consistency constraint term. The low-confidence region total variation smoothing regularization term is weighted by (1 − W), where W is the confidence weight, to enhance the smoothness of the low-confidence region. The low-frequency consistency constraint term applies a consistency constraint to the low-pass components of the two prediction results obtained from the same input patch under two random augmentations. The joint loss is obtained as a weighted sum of the above terms.
[0094] In a preferred embodiment, the implementation of the low-frequency consistency constraint in step S6 specifically includes: performing two random augmentations on the same input patch to obtain two prediction results, and calculating the difference between the two prediction results after low-pass processing as a consistency constraint, so as to stabilize the predicted low-frequency structure and reduce block drift.
[0095] In a preferred embodiment, seamless inference in S7 specifically includes: maintaining an exponential moving average (EMA) version of the network parameters during training, using the EMA parameter version for sliding window prediction during the inference phase; multiplying each sliding window prediction result by a Hann weight window, and performing weighted accumulation and normalization fusion on overlapping regions to suppress window boundary stitching artifacts and output a continuous whole image height product.
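Maintaining the EMA version of the parameters amounts to a running average updated after each optimizer step. The sketch below assumes the parameters are held in a dictionary of NumPy arrays and uses a decay of 0.999; both the storage format and the decay value are illustrative assumptions, and `ema_update` is a hypothetical name.

```python
import numpy as np


def ema_update(ema_params, params, decay=0.999):
    """Update the EMA copy in place: ema <- decay * ema + (1 - decay) * current."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Typical use: initialise the EMA copy from the current weights, update it after
# every training step, and load the EMA copy into the network for inference.
params = {"conv1.weight": np.ones((3, 3))}
ema_params = {name: value.copy() for name, value in params.items()}
ema_params = ema_update(ema_params, params)
```

Averaging over many training steps smooths out step-to-step fluctuations in the weights, which is why the EMA copy tends to give steadier predictions across neighbouring windows.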
[0096] In this embodiment, a physics-guided spatial deep learning method for forest canopy height inversion is proposed. Based on multi-baseline fully polarimetric interferometric synthetic aperture radar (PolInSAR) data, the initial canopy height is first obtained through a three-stage inversion using the RVOG model and taken as the physical anchor point, and geometric sensitivity parameters such as the vertical wavenumber kz are calculated. A multi-channel feature map containing PolInSAR feature channels and the physical anchor point channel is then constructed and cropped into patches. With LiDAR-RH100 as the supervision target, a spatial convolutional network performs end-to-end prediction on the patched input, and the output height is limited to a preset physical range by Sigmoid-Scaling. During the training phase, a confidence map is constructed based on coherence information and a confidence modulation joint loss is introduced to enhance robustness in low-coherence / low-sensitivity regions. During the inference phase, EMA parameters and Hann weighted sliding window fusion are used to achieve seamless mapping of the whole map, which can effectively reduce strip / block artifacts and boundary stitching effects, and improve the accuracy of forest canopy height inversion and the continuity of the whole map.
[0097] To more clearly explain this application, a specific example of a physically guided spatial deep learning method for retrieving forest canopy height is provided below, which includes the following steps:
[0098] Step 1: Calculation of the complex coherence γ and the vertical wavenumber kz (corresponding to S1);
[0099] Acquire multi-baseline PolInSAR data, perform registration and interferometric processing on the master and slave images, and estimate the complex coherence observation γ within a preset window. The complex coherence is represented in complex form, and its real part real(γ), imaginary part imag(γ), and amplitude |γ| can be used directly as subsequent feature channels, among which |γ| is used to characterize coherence quality and usability.
[0100] Simultaneously, the vertical wavenumber kz is calculated from the imaging geometry parameters (incident angle, slant range, wavelength, vertical baseline, data acquisition mode, etc.) and is used to characterize the geometric sensitivity of canopy height inversion. In practical applications, when |kz| is small, both model inversion and learned regression are more prone to instability, so |kz| is used as a key input for both the confidence map and the features.
[0101] Step 2: RVOG three-stage inversion and physical anchor point generation (corresponding to S2, see Figure 2);
[0102] A three-stage inversion of the observed complex coherence is performed based on the RVOG coherent scattering model. First, a coherence line is fitted and the candidate ground phases are obtained. Then, the true ground phase φ0 is determined and the volume scattering-dominated complex coherence γ is selected. Finally, the initial canopy height is obtained by search and inversion using the model and a two-dimensional lookup table (LUT).
[0103] This initial canopy height serves as a "physical anchor": it provides an initial height reference derived from the physical model, so that subsequent deep learning predictions do not deviate from the physically reasonable range. In low-coherence or low-sensitivity regions in particular, it can significantly reduce height drift and outliers.
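The lookup-table stage can be illustrated with the widely used random-volume coherence expression γv(h, σ) = (p / p1) · (exp(p1·h) − 1) / (exp(p·h) − 1), with p = 2σ / cos θ and p1 = p + i·kz, where h is canopy height, σ is mean extinction and θ is the incidence angle. The patent does not state which model expression or grid it uses, so the formula, the grid ranges, and the names `rvog_volume_coherence` and `lut_invert` are assumptions made for this single-pixel sketch.

```python
import numpy as np


def rvog_volume_coherence(h, sigma, kz, theta):
    """Volume-only coherence of a random volume of height h (ground phase removed).

    sigma: mean extinction (Np/m, must be > 0); theta: incidence angle (rad).
    """
    p = 2.0 * sigma / np.cos(theta)
    p1 = p + 1j * kz
    return (p / p1) * (np.exp(p1 * h) - 1.0) / (np.exp(p * h) - 1.0)


def lut_invert(gamma_v_obs, phi0, kz, theta, heights, sigmas):
    """Return the (height, extinction) grid point whose modelled coherence is closest."""
    hh, ss = np.meshgrid(heights, sigmas, indexing="ij")   # 2-D LUT axes
    lut = rvog_volume_coherence(hh, ss, kz, theta)
    target = gamma_v_obs * np.exp(-1j * phi0)              # remove the ground phase
    i, j = np.unravel_index(np.argmin(np.abs(lut - target)), lut.shape)
    return heights[i], sigmas[j]


# Hypothetical search grid: heights from 1 m to 40.5 m, extinction from 0.01 to 0.2 Np/m.
heights = np.arange(1.0, 41.0, 0.5)
sigmas = np.linspace(0.01, 0.2, 20)
```

The height returned by such a search is what the text calls the initial canopy height; its accuracy is limited by the grid spacing and by how well the observed coherence satisfies the volume-dominance assumption.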
[0104] Step 3: Multi-channel feature construction and patching of input (corresponding to S3);
[0105] PolInSAR observations and physical quantities are used to construct a multi-channel feature map. Preferably, the multi-channel feature map includes at least the real part real(γ) and imaginary part imag(γ) of the volume scattering-dominated complex coherence γ, the ground phase φ0, the vertical wavenumber kz, and the normalized physical anchor point channel.
[0106] The multi-channel feature map is then sliced using a sliding method according to a preset patch size and stride to obtain the input patch. The advantage of patching is that the network can use neighborhood context to suppress strip / block artifacts and provide a consistent data organization for subsequent seamless sliding window inference.
[0107] Step 4: Spatial convolutional network pixel-level regression (corresponding to S4, such as...) Figure 3 );
[0108] The input patch is fed into a spatial convolutional network for pixel-level regression, and the output is the predicted canopy height logits within the patch. Preferably, the network adopts an encoder-decoder structure (spatial U-Net), which extracts multi-scale features through downsampling, restores spatial resolution through upsampling, and fuses shallow details and deep semantic information through skip connections, thereby simultaneously taking into account boundary details and regional continuity.
[0109] Step 5: Sigmoid-Scaling physical range mapping (corresponding to S5, as shown in Figure 3);
[0110] To ensure the prediction results meet physical feasibility requirements, a sigmoid-scaling mapping is applied to the logits z to limit the output height $\hat{h}$ to a preset range $[h_{\min}, h_{\max}]$:
[0111] $\hat{h} = h_{\min} + (h_{\max} - h_{\min})\,\sigma(z)$;
[0112] This mapping avoids abnormally high values caused by unbounded network output and plays a stabilizing role in both the training and inference phases.
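A minimal sketch of the Sigmoid-Scaling mapping is given below. The bounds of 0 m and 60 m are placeholder assumptions; the specification only states that the range is preset.

```python
import numpy as np

def sigmoid_scale(z, h_min=0.0, h_max=60.0):
    """Map unbounded logits z to heights in [h_min, h_max] (bounds are assumed)."""
    z = np.asarray(z, dtype=np.float64)
    # Numerically stable sigmoid: sigma(z) = 0.5 * (1 + tanh(z / 2)).
    sig = 0.5 * (1.0 + np.tanh(0.5 * z))
    return h_min + (h_max - h_min) * sig
```

Because the sigmoid saturates, arbitrarily large logits can only approach the upper bound, which is what prevents abnormally high predictions.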
[0113] Step 6: Training with the confidence-modulated joint loss (corresponding to S6, as shown in Figure 4);
[0114] During the training phase, a confidence map conf (also denoted as W) is first constructed, preferably by combining the coherence amplitude |γ| with the geometric sensitivity |kz|, so that regions with higher coherence quality and larger |kz| have higher confidence. The confidence map is used to weight the different loss terms to achieve the training objective of "preserving details in high confidence regions and stabilizing the overall structure in low confidence regions".
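One possible way to build such a confidence map is sketched below. The specification does not state how the two quantities are combined, so the product form, the normalization by a reference |kz|, and the parameter names are all assumptions.

```python
import numpy as np

def confidence_map(coh_amp, kz, kz_ref=None, eps=1e-6):
    """Combine coherence amplitude |gamma| and |kz| into a map W in [0, 1].

    The product of the two normalized factors is an assumed combination rule.
    """
    coh = np.clip(coh_amp, 0.0, 1.0)
    kz_abs = np.abs(kz)
    if kz_ref is None:
        # Assumed normalization: scale |kz| by its 95th percentile.
        kz_ref = np.percentile(kz_abs, 95)
    sens = np.clip(kz_abs / (kz_ref + eps), 0.0, 1.0)
    return coh * sens
```

With this choice, W increases monotonically with both coherence and |kz|, which matches the stated requirement that regions with higher coherence quality and larger |kz| receive higher confidence.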
[0115] Based on this, a confidence modulation joint loss function is constructed to train the network. The joint loss includes at least three parts:
[0116] Confidence modulation supervision loss term: supervision is applied only on the set $\Omega_p$ of pixels within patch p that are SAR-valid and have valid labels, and is modulated by the confidence weights, with the Huber loss being preferred:
[0117] $L_{\mathrm{sup}} = \dfrac{1}{|\Omega_p|} \sum_{q \in \Omega_p} W(q)\,\mathrm{Huber}\big(\hat{h}(q) - h_{\mathrm{ref}}(q)\big)$;
[0118] where q represents a pixel, $\hat{h}(q)$ is the network height prediction, $h_{\mathrm{ref}}(q)$ is the reference height (e.g., CHM / LiDAR derived height), and the confidence weight $W(q)$ is obtained from conf.
[0119] TV smoothing regularization term for low-confidence regions: because stripes and noise are more likely to appear in low-sensitivity / low-confidence regions, the total variation term is weighted by (1−W) to enhance smoothness in low-confidence regions and suppress block jumps and strip artifacts:
[0120] $L_{\mathrm{TV}} = \sum_{q} \big(1 - W(q)\big)\big(|\nabla_x \hat{h}(q)| + |\nabla_y \hat{h}(q)|\big)$;
[0121] Low-frequency consistency constraint: two predictions $\hat{h}^{(1)}$ and $\hat{h}^{(2)}$ are obtained by applying two random augmentations to the same input patch, and a consistency constraint is imposed on their low-frequency components to stabilize the overall low-frequency structure and reduce inter-patch offset and overall drift:
[0122] $L_{\mathrm{LF}} = \big\| \mathrm{LPF}(\hat{h}^{(1)}) - \mathrm{LPF}(\hat{h}^{(2)}) \big\|_1$;
[0123] where LPF(·) is a low-pass operator, preferably, but not limited to, a Gaussian low-pass, a mean low-pass, downsampling followed by upsampling, or frequency-domain low-frequency truncation.
[0124] The overall objective function is a weighted sum of the three terms:
[0125] $L = L_{\mathrm{sup}} + \lambda_{\mathrm{TV}} L_{\mathrm{TV}} + \lambda_{\mathrm{LF}} L_{\mathrm{LF}}$;
[0126] where $\lambda_{\mathrm{TV}}$ and $\lambda_{\mathrm{LF}}$ are the weighting coefficients of the total variation term and the low-frequency consistency term. Their values can be set according to training stability and the artifact suppression effect, and are not limited to fixed values.
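The three loss terms and their weighted sum can be sketched in NumPy as follows. This is an illustration of the arithmetic only, not a training implementation: the Huber threshold, the block-mean low-pass operator, the coefficient values, and the choice to apply the supervision and TV terms to the first prediction are all assumptions.

```python
import numpy as np

def huber(r, delta=1.0):
    """Element-wise Huber loss of residual r (delta is an assumed threshold)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def box_lowpass(x, k=2):
    """Mean low-pass: k x k block averaging followed by nearest upsampling."""
    H, W = x.shape
    Hc, Wc = H - H % k, W - W % k
    coarse = x[:Hc, :Wc].reshape(Hc // k, k, Wc // k, k).mean(axis=(1, 3))
    return np.repeat(np.repeat(coarse, k, axis=0), k, axis=1)

def joint_loss(pred, pred2, ref, W, valid, lam_tv=0.1, lam_lf=0.1):
    """pred, pred2: predictions from two augmentations, already mapped back
    to a common frame; ref: reference height; W: confidence map in [0, 1];
    valid: boolean mask of pixels with valid SAR data and valid labels."""
    # Supervision: confidence-weighted Huber, averaged over valid pixels.
    n = max(int(valid.sum()), 1)
    l_sup = np.sum(W[valid] * huber(pred[valid] - ref[valid])) / n
    # Total variation weighted by (1 - W), using forward differences.
    dx = np.abs(np.diff(pred, axis=1))
    dy = np.abs(np.diff(pred, axis=0))
    l_tv = np.sum((1 - W)[:, 1:] * dx) + np.sum((1 - W)[1:, :] * dy)
    # Low-frequency consistency: L1 distance between low-passed predictions.
    l_lf = np.sum(np.abs(box_lowpass(pred) - box_lowpass(pred2)))
    total = l_sup + lam_tv * l_tv + lam_lf * l_lf
    return total, (l_sup, l_tv, l_lf)
```

In an actual training loop the same expressions would be written in an automatic-differentiation framework so that gradients flow back to the network parameters.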
[0127] Step 7: Seamless inference using EMA + Hann weighted sliding window fusion (corresponding to S7, as shown in Figure 5);
[0128] To improve inference stability and reduce fluctuations caused by training noise, an exponential moving average (EMA) version of the network parameters, $\theta_{\mathrm{EMA}}$, is maintained during training. The inference phase uses $\theta_{\mathrm{EMA}}$ for forward computation. Prediction is made on the entire image using overlapping sliding windows, and weighted fusion is performed in the overlapping areas using a Hann window to achieve seamless continuous mapping.
[0129] Specifically, let the prediction of the i-th sliding window at position r be $\hat{h}_i(r)$ and the corresponding Hann weight be $w_i(r)$. The predicted image after fusion is:
[0130] $\hat{h}(r) = \dfrac{\sum_i w_i(r)\,\hat{h}_i(r)}{\sum_i w_i(r)}$;
[0131] By using the above-mentioned Hann weighted normalization fusion, window boundary and block stitching artifacts can be significantly reduced, and continuous forest canopy height products can be output, which are suitable for large-scale mapping tasks.
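The EMA update and the Hann weighted normalization fusion can be sketched as follows. The decay value, the dictionary representation of parameters, square windows, and the small constant added to the Hann window (so that image-border pixels covered by a single window keep a nonzero weight) are assumptions for illustration.

```python
import numpy as np

def ema_update(ema_params, params, decay=0.999):
    """One EMA step per parameter array:
    theta_ema <- decay * theta_ema + (1 - decay) * theta."""
    return {k: decay * ema_params[k] + (1.0 - decay) * params[k]
            for k in params}

def hann2d(size, eps=1e-3):
    """Separable 2-D Hann window; eps keeps edge weights strictly positive."""
    w = np.hanning(size)
    return np.outer(w, w) + eps

def fuse_windows(preds, offsets, out_shape):
    """Hann-weighted accumulation and normalization of overlapping
    square window predictions placed at top-left (row, col) offsets."""
    acc = np.zeros(out_shape)
    wsum = np.zeros(out_shape)
    for p, (r, c) in zip(preds, offsets):
        s = p.shape[0]
        w = hann2d(s)
        acc[r:r + s, c:c + s] += w * p   # numerator: sum_i w_i * h_i
        wsum[r:r + s, c:c + s] += w      # denominator: sum_i w_i
    return acc / np.maximum(wsum, 1e-12)
```

Because each window's contribution tapers toward its edges, the fused value changes smoothly across window boundaries instead of jumping where two tiles meet.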
[0132] In summary, the physically guided spatial deep learning method for forest canopy height inversion proposed in this specific example obtains physical anchor points through a three-stage RVOG inversion. By combining the context modeling of the spatial convolutional network with confidence modulation joint loss training, and by fusing EMA parameters with Hann weighted sliding windows, seamless inference output of the whole-image product is achieved, as shown in Figure 6, where the first row shows the reference true value from LiDAR-CHM, the second row shows the height inverted by RVOG, and the third row shows the height inverted by the method of this invention. This method can effectively reduce strip / block artifacts and splicing boundary effects, and improve the accuracy of forest canopy height inversion and the continuity of the entire map.
[0133] Those skilled in the art will understand that all or part of the steps in the methods of the above embodiments can be implemented by a program instructing related hardware. This program is stored in a storage medium and includes several instructions to cause a microcontroller, chip, or processor to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
[0134] The optional embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the embodiments of the present invention are not limited to the specific details described above. Within the scope of the technical concept of the embodiments of the present invention, various simple modifications can be made to the technical solutions of the embodiments of the present invention, and these simple modifications all fall within the protection scope of the embodiments of the present invention. It should also be noted that the various specific technical features described in the above specific embodiments can be combined in any suitable manner without contradiction. To avoid unnecessary repetition, the embodiments of the present invention will not further describe the various possible combinations.
[0135] Finally, it should be noted that the above embodiments are only used to illustrate and not limit the technical solutions of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the present invention without departing from the spirit and scope of the present invention. Any modifications or partial substitutions should be covered within the scope of the claims of the present invention.
Claims
1. A physically guided spatial deep learning method for inverting forest canopy height, characterized in that the method includes: acquiring multi-baseline fully polarimetric interferometric synthetic aperture radar (PolInSAR) data, estimating complex coherence observations, and calculating the vertical wavenumber based on the PolInSAR data; based on the complex coherence observations and the vertical wavenumber, performing a three-stage inversion based on the random volume scattering-surface scattering hybrid model RVOG to separate the ground phase and invert the initial canopy height, while obtaining the volume scattering-dominated complex coherence; based on the volume scattering-dominated complex coherence, the ground phase, the vertical wavenumber, and the initial canopy height, constructing a multi-channel feature map containing PolInSAR feature channels and physical anchor point channels, and cropping the multi-channel feature map into patches to obtain feature patches; using the reference height label formed by LiDAR-RH100 as supervision information, inputting the feature patches into the spatial convolutional neural network for end-to-end prediction, constructing a confidence map based on coherence information, and training the spatial convolutional neural network with a joint loss function consisting of a confidence modulation supervision loss, a low-confidence total variation smoothing regularization, and a low-frequency consistency constraint to obtain the canopy height inversion model; and inputting the multi-channel feature map corresponding to the region to be tested into the trained canopy height inversion model, performing the Sigmoid-Scaling mapping on the model output, and performing seamless inference by combining the network with the exponential moving average (EMA) parameter version and Hann weighted sliding window fusion, to output the whole-map forest canopy height product.
2. The physically guided spatial deep learning method for forest canopy height inversion according to claim 1, characterized in that estimating complex coherence observations based on the PolInSAR data includes: performing registration and interferometric processing on the primary and secondary images in the PolInSAR data; performing complex coherence estimation on the primary image and the secondary image within a preset window to obtain the complex coherence observation γ; decomposing the complex coherence observation γ to output the real part real(γ), imaginary part imag(γ), and coherence amplitude |γ| of the complex coherence; and using the real part real(γ), imaginary part imag(γ), and coherence amplitude |γ| as input data for subsequent volume scattering-dominated complex coherence screening, confidence map construction, and multi-channel feature map construction.
3. The physically guided spatial deep learning method for forest canopy height inversion according to claim 1, characterized in that, Calculating the vertical wavenumber includes: Read the incident angle, slant range, wavelength, vertical baseline, and data acquisition mode information corresponding to the PolInSAR data; Calculate the vertical wavenumber kz based on the geometric relationship of interferometric imaging; The amplitude |kz| and its sign term of the vertical wavenumber kz are extracted to characterize the geometric sensitivity of canopy height inversion; The vertical wavenumber kz, amplitude |kz|, and sign term are used as data inputs for subsequent multi-channel feature map construction and confidence map construction.
4. The physically guided spatial deep learning method for forest canopy height inversion according to claim 1, characterized in that performing the three-stage inversion based on the random volume scattering-surface scattering hybrid model RVOG includes: fitting a coherence line to the observed complex coherences, and taking the intersections of the fitted line with the unit circle as the candidate ground phases; discriminating among the candidate ground phases to determine the true ground phase φ0, and on this basis determining the volume scattering-dominated complex coherence γ_h; based on the volume scattering-dominated complex coherence γ_h, performing search and matching with a two-dimensional lookup table (LUT) to obtain the initial canopy height by inversion; and using the ground phase φ0, the volume scattering-dominated complex coherence γ_h, and the initial canopy height as input data for subsequent multi-channel feature map construction, wherein the initial canopy height serves as a physical anchor point to constrain the subsequent spatial deep learning inversion process.
5. The physically guided spatial deep learning method for forest canopy height inversion according to claim 1, characterized in that constructing a multi-channel feature map containing PolInSAR feature channels and physical anchor point channels includes: concatenating the real part, imaginary part, and coherence amplitude of the volume scattering-dominated complex coherence γ_h, the ground phase φ0, the vertical wavenumber kz, and the normalized physical anchor point channel to form the multi-channel feature map; cropping the multi-channel feature map with a sliding window according to the preset patch size and sliding stride to obtain the input patches for training and inference; simultaneously cropping the reference height labels formed by LiDAR-RH100, spatially aligned with the input patches, to form the label patches required for supervised training; and feeding the input patches and label patches together into the subsequent training process of the spatial convolutional neural network.
6. The physically guided spatial deep learning method for forest canopy height inversion according to claim 1, characterized in that the spatial convolutional neural network is a spatial U-Net with an encoder-decoder structure, and inputting the feature patch into the spatial convolutional neural network for end-to-end prediction includes: performing multi-scale spatial feature extraction on the feature patch using a downsampling encoder; restoring the spatial resolution of the multi-scale spatial features using an upsampling decoder; fusing shallow detail information with deep semantic information through skip connections; and outputting canopy height prediction logits at the same scale as the input patch, which serve as the input for the subsequent physical range mapping and loss function calculation.
7. The physically guided spatial deep learning method for forest canopy height inversion according to claim 1, characterized in that the Sigmoid-Scaling mapping performed on the model output is specifically expressed as: $\hat{h} = h_{\min} + (h_{\max} - h_{\min})\,\sigma(z)$; in the formula, $\hat{h}$ represents the predicted canopy height after Sigmoid-Scaling mapping; $h_{\min}$ represents the preset lower limit of canopy height; $h_{\max}$ represents the preset upper limit of canopy height; $\sigma(\cdot)$ represents the Sigmoid function; z represents the logits output of the spatial convolutional neural network.
8. The physically guided spatial deep learning method for forest canopy height inversion according to claim 1, characterized in that constructing a confidence map based on coherence information includes: generating a confidence map conf based on the coherence amplitude |γ| and the magnitude |kz|, so that regions with higher coherence quality and larger |kz| have higher confidence; weighting and modulating the supervision term and the regularization term in the training loss based on the confidence map conf; and using the confidence map conf as the weight input when calculating the joint loss function, so as to achieve the training objective of preserving details in high-confidence regions and enhancing stability in low-confidence regions.
9. The physically guided spatial deep learning method for forest canopy height inversion according to claim 1, characterized in that the joint loss function includes a confidence modulation supervision loss term, a low-confidence region total variation smoothing regularization term, and a low-frequency consistency constraint term, and is the weighted sum of the three, specifically: the confidence modulation supervision loss term is expressed as follows: $L_{\mathrm{sup}} = \dfrac{1}{|\Omega_p|} \sum_{q \in \Omega_p} W(q)\,\mathrm{Huber}\big(\hat{h}(q) - h_{\mathrm{ref}}(q)\big)$; where $L_{\mathrm{sup}}$ represents the confidence modulation supervision loss term; $|\Omega_p|$ represents the number of pixels in the set of valid supervised pixels within patch p; $\Omega_p$ represents the set of pixels within patch p that have valid labels and valid SAR; q represents a pixel; $W(q)$ represents the confidence weight obtained from conf at pixel q; Huber(·) represents the Huber loss function; $\hat{h}(q)$ represents the network prediction height at pixel q; $h_{\mathrm{ref}}(q)$ represents the reference height at pixel q; the total variation smoothing regularization term for low-confidence regions is expressed as follows: $L_{\mathrm{TV}} = \sum_{q} \big(1 - W(q)\big)\big(|\nabla_x \hat{h}(q)| + |\nabla_y \hat{h}(q)|\big)$; where $L_{\mathrm{TV}}$ represents the total variation smoothing regularization term for low-confidence regions; q represents a pixel; $W(q)$ represents the confidence weight at pixel q; $\hat{h}(q)$ represents the network prediction height at pixel q; $\nabla_x$ represents the gradient operator in the x-direction; $\nabla_y$ represents the gradient operator in the y-direction; the low-frequency consistency constraint term is expressed as follows: $L_{\mathrm{LF}} = \big\| \mathrm{LPF}(\hat{h}^{(1)}) - \mathrm{LPF}(\hat{h}^{(2)}) \big\|_1$; where $L_{\mathrm{LF}}$ represents the low-frequency consistency constraint term; LPF(·) represents the low-pass operator; $\hat{h}^{(1)}$ represents the prediction result after the first random augmentation of the same input patch; $\hat{h}^{(2)}$ represents the prediction result after the second random augmentation of the same input patch; $\|\cdot\|_1$ represents the L1 norm; the joint loss function is expressed as follows: $L = L_{\mathrm{sup}} + \lambda_{\mathrm{TV}} L_{\mathrm{TV}} + \lambda_{\mathrm{LF}} L_{\mathrm{LF}}$; where L represents the joint loss function; $L_{\mathrm{sup}}$ represents the confidence modulation supervision loss term; $L_{\mathrm{TV}}$ represents the total variation smoothing regularization term for low-confidence regions; $L_{\mathrm{LF}}$ represents the low-frequency consistency constraint term; $\lambda_{\mathrm{TV}}$ represents the weighting coefficient corresponding to the total variation smoothing regularization term for low-confidence regions; $\lambda_{\mathrm{LF}}$ represents the weighting coefficient corresponding to the low-frequency consistency constraint term.
10. The physically guided spatial deep learning method for forest canopy height inversion according to claim 1, characterized in that performing seamless inference by combining the network with the exponential moving average (EMA) parameter version and Hann weighted sliding window fusion specifically includes: maintaining an exponential moving average (EMA) version of the spatial convolutional neural network parameters during training, and, during the inference phase, calling the network with the EMA parameter version to predict overlapping sliding window patches; applying a Hann weighted window to the prediction result of each sliding window patch, and performing weighted accumulation and normalization fusion on the overlapping regions; obtaining the fused whole-image prediction result using the following expression: $\hat{h}(r) = \dfrac{\sum_i w_i(r)\,\hat{h}_i(r)}{\sum_i w_i(r)}$; where $\hat{h}(r)$ represents the fused predicted height of the whole image at position r; r represents the spatial position in the whole image; i represents the sliding window index; $\hat{h}_i(r)$ represents the predicted value of the i-th sliding window at position r; $w_i(r)$ represents the Hann weight corresponding to the i-th sliding window at position r; and outputting a continuous whole-image forest canopy height product, so as to suppress window boundary stitching artifacts and improve spatial consistency.