A remote sensing image processing method and device based on dynamic degradation modulation
By acquiring degradation parameters from remote sensing images for structured coding and deep feature extraction, and utilizing the SwinTransformer residual module and attention mechanism to reconstruct high-resolution images, the problem of insufficient resolution in remote sensing images is solved, and efficient image restoration in complex degradation scenarios is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2026-04-02
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies adapt poorly to complex degradation scenarios in remote sensing image processing, especially composite blur-noise degradation. As a result, remote sensing images have insufficient resolution and cannot meet the needs of high-precision applications.
By acquiring the degradation parameters of the blur kernel width and noise intensity of the remote sensing image, a pre-defined kernel prior embedding module is used for structured encoding. Deep feature extraction and modulation are then performed by combining the SwinTransformer residual module and the attention mechanism. Finally, a high-resolution image is reconstructed through an upsampling module.
It significantly improves the resolution of remote sensing images, enhances the accuracy of image detail restoration and scene robustness, adapts to complex degradation scenes, and improves the user experience.
Figure CN121961849B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image processing technology, and in particular to a remote sensing image processing method and apparatus based on dynamic degradation modulation.
Background Technology
[0002] Currently, high-resolution remote sensing images are the core data support for civilian applications. Their spatial resolution directly determines the accuracy of ground feature detail identification, the completeness of target feature extraction, and the reliability of subsequent decision-making. However, due to factors such as imaging system hardware constraints (payload cost, size, power consumption), atmospheric interference, platform vibration, and data transmission bandwidth limitations, the remote sensing images actually acquired generally suffer from insufficient resolution and blurred details, making it difficult to meet the needs of high-precision applications.
[0003] Image super-resolution technology reconstructs high-resolution images from low-resolution images through information processing while restoring image detail. It offers low cost, strong adaptability, and ease of engineering deployment, providing key technical support for overcoming the resolution bottleneck of remote sensing images. Existing super-resolution technologies fall into two stages: traditional methods and deep learning methods. Traditional methods include interpolation, reconstruction, and example-based learning, which suffer from drawbacks such as blurred edges, high computational complexity, and weak generalization ability. Deep learning methods have significantly driven technological progress. Early methods relied on the idealized degradation assumption of bicubic downsampling, leading to a sharp drop in performance in real-world scenarios. The subsequent introduction of attention mechanisms and Transformers improved feature modeling capabilities, but the problem of adapting to complex degradation remains unsolved. To overcome this limitation, researchers have begun to focus on more realistic complex degradation problems and, based on whether the degradation information is known, have further divided super-resolution technologies into non-blind super-resolution and blind super-resolution. Non-blind super-resolution requires a known degradation model and often directly concatenates degradation information with image features, which makes it difficult to capture the spatial variability of remote sensing degradation and causes semantic conflicts due to domain differences. Blind super-resolution must estimate the degradation model and reconstruct the image at the same time, and degradation estimation errors leave the recovery of key details insufficiently accurate.
[0004] In remote sensing systems equipped with collaborative cameras, precise degradation parameters such as blur kernel width and noise level can be directly obtained through hardware collaboration, providing important support for overcoming existing technical bottlenecks. However, existing methods still do not fully utilize this type of degradation information and are difficult to adapt to "blur-noise" composite degradation scenarios. The generalization ability and detail recovery performance need to be improved.
[0005] As can be seen from the above, how to improve the resolution of remote sensing images in the process of remote sensing image processing based on dynamic degradation modulation is an urgent problem to be solved.
Summary of the Invention
[0006] In view of this, the purpose of this invention is to provide a remote sensing image processing method and apparatus based on dynamic degradation modulation, which can improve the resolution of remote sensing images during the processing of remote sensing images based on dynamic degradation modulation. The specific solution is as follows:
[0007] In a first aspect, this application provides a remote sensing image processing method based on dynamic degradation modulation, comprising:
[0008] The remote sensing image to be processed and the corresponding degradation parameters, including the blur kernel width and noise intensity, are obtained. The blur kernel width is then structured and encoded using the fully connected layer and the preset activation function in the preset kernel prior embedding module to obtain the encoding result.
[0009] Based on the encoding result and the noise intensity, a structured degradation vector is generated, and the first preset convolutional layer is used to extract shallow features from the remote sensing image to be processed to obtain initial shallow features.
[0010] The first data to be processed, including the structured degradation vector and the initial shallow features, is input into each degradation modulation layer in the preset residual module, and the first data to be processed is modulated using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed.
[0011] The second data to be processed is processed using a second preset convolutional layer to obtain data to be fused, and the data to be fused is fused with the initial shallow features to obtain a fusion result;
[0012] The fusion result is processed using a preset upsampling module to obtain a target remote sensing image; the resolution of the target remote sensing image is greater than the resolution of the remote sensing image to be processed.
[0013] Optionally, acquiring the remote sensing image to be processed and the corresponding degradation parameters, including the blur kernel width and noise intensity, includes:
[0014] A remote sensing image to be processed is acquired, and degradation parameters corresponding to the remote sensing image to be processed are obtained based on an anisotropic Gaussian model, a first random feature value, a second random feature value, a random rotation angle, and a preset Gaussian kernel size; wherein, the degradation parameters include the blur kernel width and noise intensity corresponding to the remote sensing image to be processed.
[0015] Optionally, the step of using the fully connected layer and preset activation function in the preset kernel prior embedding module to perform structured encoding on the blur kernel width to obtain the encoding result includes:
[0016] Using several fully connected layers, a preset slope, and a preset activation function in the preset kernel prior embedding module, deep correlation feature extraction is performed on the blur kernel width in a preset multilayer perceptron to obtain the correlation feature extraction result;
[0017] The correlation feature extraction result is subjected to feature dimensionality reduction to obtain the result to be encoded, and then the result to be encoded is subjected to structured encoding to obtain the encoding result; wherein, each of the fully connected layers is connected sequentially.
[0018] Optionally, the step of generating a structured degradation vector based on the encoding result and the noise intensity, and using a first preset convolutional layer to perform shallow feature extraction on the remote sensing image to be processed to obtain initial shallow features, includes:
[0019] The encoding result and the noise intensity are concatenated to obtain a structured degradation vector corresponding to the remote sensing image to be processed; the structured degradation vector includes structured prior information corresponding to the degradation parameters;
[0020] Determine the image size corresponding to the remote sensing image to be processed, and extract shallow features from the remote sensing image to be processed based on the image size and using a first preset convolutional layer to obtain initial shallow features with the same size as the image size.
[0021] Optionally, the first data to be processed, including the structured degradation vector and the initial shallow features, is input into each degradation modulation layer in a preset residual module, and the first data to be processed is modulated using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed, including:
[0022] The first data to be processed is constructed based on the structured degradation vector and the initial shallow features, and the first data to be processed is input into a preset residual module comprising several cascaded degradation modulation layers.
[0023] The first data to be processed is normalized by the normalization layer in each of the degradation modulation layers in the preset residual module to obtain normalized features. The normalized features are then combined with the structured degradation vector to obtain a combination result. Each preset projection matrix is then applied to the combination result to obtain the corresponding query tensor, together with the key tensor and value tensor fused with the structured degradation vector.
[0024] An attention matrix is computed from the query tensor, the key tensor, and the value tensor using learnable relative position bias terms and an attention mechanism. The attention matrix is then added to the first data to be processed through a residual connection to obtain intermediate features.
[0025] The intermediate features are normalized to obtain the features to be transformed. Then, the features to be transformed are transformed using a multilayer perceptron including a nonlinear activation function. The transformation result is superimposed with the intermediate features to obtain the superimposed result. Feature extraction and fusion are then performed on the superimposed result using the normalization layer and attention mechanism in each of the degradation modulation layers to obtain the second data to be processed.
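The attention step in paragraphs [0023] and [0024] can be illustrated with a minimal single-head sketch. It rests on several assumptions that the text does not confirm: the "combination" is taken to be a concatenation of the degradation vector onto every normalized token, the query is projected from the normalized tokens alone, and the relative position bias is supplied as a ready-made N×N matrix. The function name `degradation_attention` and all weight arguments are hypothetical, and the multilayer perceptron of paragraph [0025] is omitted.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one token (a list of floats) to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def degradation_attention(tokens, d_vec, w_q, w_k, w_v, rel_bias):
    """tokens: N x C, d_vec: length D, w_q: C x C,
    w_k and w_v: (C + D) x C, rel_bias: N x N (assumed layout)."""
    normed = [layer_norm(t) for t in tokens]
    # Assumption: "combining" means appending the degradation vector to each token.
    fused = [t + list(d_vec) for t in normed]
    q = matmul(normed, w_q)   # query from image features only
    k = matmul(fused, w_k)    # key fused with the degradation vector
    v = matmul(fused, w_v)    # value fused with the degradation vector
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    attn = [softmax([s * scale + b for s, b in zip(s_row, b_row)])
            for s_row, b_row in zip(scores, rel_bias)]
    out = matmul(attn, v)
    # Residual connection with the un-normalized input tokens.
    return [[x + o for x, o in zip(t, o_row)] for t, o_row in zip(tokens, out)]
```

In this sketch the degradation vector reaches the output only through the key and value projections, which is one plausible reading of "fused with the structured degradation vector".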
[0026] Optionally, the step of processing the second data to be processed using a second preset convolutional layer to obtain data to be fused, and fusing the data to be fused with the initial shallow features to obtain a fusion result, includes:
[0027] The second preset convolutional layer is used to adjust the channel dimension of the second data to be processed to obtain the data to be fused; the second preset convolutional layer has the same number of convolutional kernels as the first preset convolutional layer.
[0028] The data to be fused is added element-wise to the initial shallow features at the corresponding spatial locations and channels to obtain a fusion result that includes deep semantic information and low-level detail information; the fusion result includes global structural features and local detail features corresponding to the remote sensing image to be processed.
[0029] Optionally, after processing the fusion result using a preset upsampling module to obtain the target remote sensing image, the method further includes:
[0030] The remote sensing image to be processed is processed using several super-resolution algorithms to obtain corresponding processing results; the super-resolution algorithms include DPSR algorithm, USRNet algorithm, DASR algorithm and KDSR algorithm;
[0031] Determine the peak signal-to-noise ratio corresponding to each processing result in several complex scenarios, and determine the visual fidelity corresponding to each processing result under different noise intensities. Then, perform generalization verification on each processing result in a real scenario to obtain the corresponding verification results. Determine the performance corresponding to each processing result to obtain the corresponding performance results.
[0032] The target remote sensing image is processed based on the peak signal-to-noise ratio, the visual fidelity, the verification result, and the performance result to obtain a new target remote sensing image.
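The peak signal-to-noise ratio used in the comparison above follows the standard definition, PSNR = 10·log10(MAX² / MSE). The sketch below is a minimal illustration of that formula for single-channel images stored as nested lists; the function name `psnr` and the default peak value of 255 (8-bit images) are assumptions, since the text does not state the bit depth or the evaluation protocol.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, res_row in zip(reference, restored)
        for a, b in zip(ref_row, res_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value indicates that the restored image is closer to the reference, so the same function can score each comparison algorithm's output against the ground-truth high-resolution image.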
[0033] In a second aspect, this application provides a remote sensing image processing apparatus based on dynamic degradation modulation, comprising:
[0034] The encoding result generation module is used to acquire the remote sensing image to be processed and the corresponding degradation parameters including the blur kernel width and noise intensity, and to perform structured encoding on the blur kernel width using the fully connected layer and the preset activation function in the preset kernel prior embedding module to obtain the encoding result;
[0035] The feature extraction module is used to generate a structured degradation vector based on the encoding result and the noise intensity, and to perform shallow feature extraction on the remote sensing image to be processed using a first preset convolutional layer to obtain initial shallow features.
[0036] The data to be processed determination module is used to input the first data to be processed, including the structured degradation vector and the initial shallow features, into each degradation modulation layer in the preset residual module, and to modulate the first data to be processed using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed.
[0037] The fusion result generation module is used to process the second data to be processed using a second preset convolutional layer to obtain data to be fused, and to fuse the data to be fused with the initial shallow features to obtain a fusion result;
[0038] The remote sensing image determination module is used to process the fusion result using a preset upsampling module to obtain a target remote sensing image; the resolution of the target remote sensing image is greater than the resolution of the remote sensing image to be processed.
[0039] Optionally, the data to be processed determination module includes:
[0040] The data input unit is used to construct the first data to be processed based on the structured degradation vector and the initial shallow features, and input the first data to be processed into a preset residual module including a plurality of serially connected degradation modulation layers;
[0041] The normalization feature determination unit is used to normalize the first data to be processed by using the normalization layer in each of the degradation modulation layers in the preset residual module to obtain normalized features, and to combine the normalized features with the structured degradation vector to obtain a combination result. Each preset projection matrix is then applied to the combination result to obtain the corresponding query tensor, together with the key tensor and value tensor fused with the structured degradation vector.
[0042] The attention matrix generation unit is used to compute an attention matrix from the query tensor, the key tensor and the value tensor by using a learnable relative position bias term and an attention mechanism, and then to add the attention matrix to the first data to be processed through a residual connection to obtain intermediate features.
[0043] The feature transformation unit is used to normalize the intermediate features to obtain the features to be transformed. Then, the features to be transformed are transformed using a multilayer perceptron including a nonlinear activation function. The transformation result is superimposed with the intermediate features to obtain a superimposed result. Feature extraction and fusion are then performed on the superimposed result using the normalization layer and attention mechanism in each of the degradation modulation layers to obtain the second data to be processed.
[0044] Optionally, the fusion result generation module includes:
[0045] The data to be fused determination unit is used to perform channel dimension adjustment processing on the second data to be processed using a second preset convolutional layer to obtain the data to be fused; the second preset convolutional layer has the same number of convolutional kernels as the first preset convolutional layer;
[0046] The fusion result generation subunit is used to add the data to be fused and the initial shallow features element-wise at the corresponding spatial positions and channels to obtain a fusion result including deep semantic information and low-level detail information; the fusion result includes global structural features and local detail features corresponding to the remote sensing image to be processed.
[0047] In a third aspect, this application provides an electronic device, comprising:
[0048] A memory, used to store a computer program;
[0049] A processor, used to execute the computer program to implement the aforementioned remote sensing image processing method based on dynamic degradation modulation.
[0050] In a fourth aspect, this application provides a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned remote sensing image processing method based on dynamic degradation modulation.
[0051] As can be seen from the above, before performing remote sensing image processing based on dynamic degradation modulation, this application needs to obtain the remote sensing image to be processed and the corresponding degradation parameters including the blur kernel width and noise intensity. The blur kernel width is then structured and encoded using a fully connected layer and a preset activation function in a preset kernel prior embedding module to obtain the encoding result. A structured degradation vector is generated based on the encoding result and noise intensity, and a first preset convolutional layer is used to extract shallow features from the remote sensing image to be processed, obtaining initial shallow features. The first data to be processed, including the structured degradation vector and the initial shallow features, is input into each degradation modulation layer in a preset residual module. The first data to be processed is modulated using a normalization layer and an attention mechanism in each degradation modulation layer to obtain second data to be processed. The second data to be processed is processed using a second preset convolutional layer to obtain data to be fused, and the data to be fused is fused with the initial shallow features to obtain a fusion result. Finally, the fusion result is processed using a preset upsampling module to obtain the target remote sensing image.
[0052] Therefore, this application first acquires the remote sensing image to be processed and the corresponding degradation parameters, including the blur kernel width and noise intensity. Then, it uses a fully connected layer and a preset activation function in the preset kernel prior embedding module to perform structured encoding of the blur kernel width, obtaining the encoding result. Next, a structured degradation vector is generated based on the encoding result and noise intensity, and a first preset convolutional layer is used to extract shallow features from the remote sensing image to be processed, obtaining initial shallow features. Then, the first data to be processed, including the structured degradation vector and the initial shallow features, is input into each degradation modulation layer in the preset residual module, and the normalization layer and attention mechanism in each degradation modulation layer are used to modulate the first data to be processed, obtaining the second data to be processed. Finally, the second data to be processed is processed using a second preset convolutional layer to obtain the data to be fused, and the data to be fused is fused with the initial shallow features to obtain the fusion result. The fusion result is then processed using a preset upsampling module to obtain the target remote sensing image. In this way, the resolution of remote sensing images is improved in the process of remote sensing image processing based on dynamic degradation modulation, thereby enhancing the user experience.
Attached Figure Description
[0053] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0054] Figure 1 is a flowchart of a remote sensing image processing method based on dynamic degradation modulation disclosed in this application;
[0055] Figure 2 is a schematic diagram illustrating the principle of a specific remote sensing image processing method based on dynamic degradation modulation disclosed in this application, wherein Figure 2(a) is a schematic diagram of the overall framework for remote sensing image processing based on dynamic degradation modulation, Figure 2(b) is a schematic diagram of the kernel prior embedding module, Figure 2(c) is a schematic diagram of the SwinTransformer residual module, and Figure 2(d) is a schematic diagram of the SwinTransformer degradation modulation layer;
[0056] Figure 3 is a schematic diagram showing the quantitative results of four specific comparison algorithms disclosed in this application and of the present invention under super-resolution tasks, different noise intensities, and five anisotropic blur kernels;
[0057] Figure 4 is a schematic diagram showing the PSNR comparison of various methods disclosed in this application in low, medium, and high complexity scenarios constructed from the AID dataset;
[0058] Figure 5 is a schematic diagram illustrating the visualization results of different super-resolution methods disclosed in this application under different degradation conditions, wherein Figure 5(a) is a schematic diagram of a degraded image containing anisotropic blur kernel and noise degradation, and its visualization results under different super-resolution methods; the image within the red box in Figure 5(a) is the corresponding anisotropic blur kernel, with a noise intensity of 0; Figure 5(b) is a schematic diagram of another degraded image containing anisotropic blur kernel and noise degradation, and its visualization results under different super-resolution methods; the image within the red box in Figure 5(b) is the corresponding anisotropic blur kernel, with a noise intensity of 10;
[0059] Figure 6 is a performance comparison diagram of specific combinations of different modules disclosed in this application;
[0060] Figure 7 is a schematic diagram of the structure of a remote sensing image processing apparatus based on dynamic degradation modulation disclosed in this application.
Detailed Implementation
[0061] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0062] Currently, high-resolution remote sensing images are core data support in the civilian field, and their spatial resolution directly determines the accuracy of ground feature detail identification, the completeness of target feature extraction, and the reliability of subsequent decision-making. However, due to factors such as imaging system hardware constraints, atmospheric interference, platform vibration, and data transmission bandwidth limitations, the acquired remote sensing images generally suffer from insufficient resolution and blurred details, making it difficult to meet the requirements of high-precision applications. To address this, this application provides a remote sensing image processing method based on dynamic degradation modulation, which can improve the resolution of remote sensing images during the dynamic degradation modulation-based remote sensing image processing process.
[0063] As shown in Figure 1, this embodiment of the invention discloses a remote sensing image processing method based on dynamic degradation modulation, comprising:
[0064] Step S11: Obtain the remote sensing image to be processed and the corresponding degradation parameters including the blur kernel width and noise intensity, and use the fully connected layer and preset activation function in the preset kernel prior embedding module to perform structured encoding on the blur kernel width to obtain the encoding result.
[0065] In this embodiment, the overall network framework is shown in Figure 2, where Figure 2(a) is a schematic diagram of the overall framework for remote sensing image processing based on dynamic degradation modulation, Figure 2(b) is a schematic diagram of the kernel prior embedding module, Figure 2(c) is a schematic diagram of the SwinTransformer residual module, and Figure 2(d) is a schematic diagram of the SwinTransformer degradation modulation layer, which is contained in the SwinTransformer residual module. First, the input low-resolution image is processed through a 3×3 convolutional layer to complete the initial extraction of shallow features. Then, these features are fed into six SwinTransformer residual modules for deep feature processing. Simultaneously, a kernel prior embedding module explicitly models the image degradation information and generates a structured image degradation vector, which is input into the first SwinTransformer degradation modulation layer of each SwinTransformer residual module. Through normalization processing and a multi-head self-attention mechanism, the image degradation vector is effectively encoded into the super-resolution network. After deep feature extraction, the output features are processed through a 3×3 convolutional layer and then fused across layers with the initially extracted shallow features to achieve a balance between detailed information and global structure optimization. Finally, the fused features are passed through an upsampling module to obtain a high-resolution reconstructed image.
[0066] In this embodiment, the structured degradation modeling with kernel prior embedding and the dynamic feature modulation mechanism of SwinTransformer are combined, and the super-resolution performance of remote sensing images in complex scenes is improved through full-process degradation guidance, taking into account both detail recovery accuracy and scene robustness.
[0067] Furthermore, to verify the effectiveness and engineering applicability of this application, the embodiments of this application use simulated remote sensing datasets and real remote sensing datasets to conduct dual-dimensional experimental verification. In the experimental training phase, simulated satellite remote sensing degradation scenarios are constructed based on the DIV2K and Flickr2K datasets. In the verification phase, the BSD100 dataset is used to quantify the performance. The real-scene test uses the AID (Aerial Image Dataset) large-scale aerial image dataset. The AID dataset covers 30 typical land cover scenarios such as airports, farmland, and residential areas. It contains complex degradation parameters such as platform motion blur, atmospheric scattering, and sensor noise in the real imaging process, without artificially synthesized degradation annotations, thereby effectively verifying the generalization and transfer capabilities of the algorithm.
[0068] In this embodiment, for the remote sensing image super-resolution task, the main process of the method based on dynamic degradation modulation includes the following steps. First, the low-resolution remote sensing image and the corresponding degradation parameters are acquired, wherein the degradation parameters include the blur kernel width and the noise level. The anisotropic Gaussian blur kernel is described by a Gaussian probability density function with zero mean and a variable covariance matrix. The covariance matrix is determined by two random eigenvalues, each uniformly distributed in the interval [0.2, 4.0], and by a uniformly distributed random rotation angle. The Gaussian kernel size is fixed at 21×21, and the noise level range is set to [0, 25].
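The degradation sampling described in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the patented procedure: the two random values are treated as eigenvalues of the covariance matrix, the rotation angle is drawn from [0, π) because the source does not give a legible interval, and the function names `anisotropic_gaussian_kernel` and `sample_degradation` are hypothetical.

```python
import math
import random

def anisotropic_gaussian_kernel(lambda1, lambda2, theta, size=21):
    """Return a size x size zero-mean Gaussian kernel (list of rows) summing to 1."""
    c, s = math.cos(theta), math.sin(theta)
    # Covariance = R * diag(lambda1, lambda2) * R^T for rotation angle theta.
    a = c * c * lambda1 + s * s * lambda2
    b = c * s * (lambda1 - lambda2)
    d = s * s * lambda1 + c * c * lambda2
    det = a * d - b * b
    inv_a, inv_b, inv_d = d / det, -b / det, a / det  # inverse covariance entries
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            q = inv_a * x * x + 2.0 * inv_b * x * y + inv_d * y * y
            row.append(math.exp(-0.5 * q))
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def sample_degradation(rng):
    """Draw one random blur kernel and noise level from the stated ranges."""
    lambda1 = rng.uniform(0.2, 4.0)
    lambda2 = rng.uniform(0.2, 4.0)
    theta = rng.uniform(0.0, math.pi)  # assumed interval, not given in the source
    noise_level = rng.uniform(0.0, 25.0)
    kernel = anisotropic_gaussian_kernel(lambda1, lambda2, theta)
    return kernel, noise_level
```

Calling `sample_degradation(random.Random(0))` yields one 21×21 kernel and one noise level that could be used to synthesize a degraded training image.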
[0069] Specifically, acquiring the remote sensing image to be processed and the corresponding degradation parameters, including the blur kernel width and noise intensity, may include: acquiring the remote sensing image to be processed, and acquiring the degradation parameters corresponding to the remote sensing image to be processed based on an anisotropic Gaussian model, a first random eigenvalue, a second random eigenvalue, a random rotation angle, and a preset Gaussian kernel size; wherein the degradation parameters include the blur kernel width and noise intensity corresponding to the remote sensing image to be processed.
[0070] Subsequently, in this embodiment of the application, the degradation parameters need to be input into the kernel prior embedding module for structured encoding to generate a high-dimensional degradation vector. The specific process is as follows: First, the reconstructed 21×21 Gaussian blur kernel is stretched into a one-dimensional tensor and input into a multilayer perceptron (MLP) for feature learning and dimensionality reduction. The MLP consists of five fully connected layers and four Leaky ReLU activation functions, with the output dimensions of each fully connected layer being 256, 128, 64, 32, and 16, respectively. Then, deep correlation information of the degraded features is extracted through nonlinear transformation, and the dimensionality-reduced blur kernel vector is output. Subsequently, the 16-dimensional blur kernel vector is concatenated with a one-dimensional noise level parameter (value range [0,25]) to generate a 17-dimensional structured degradation vector. The aforementioned vectors are compatible with the feature dimensions of subsequent super-resolution networks, providing a structured prior for degraded modulation.
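The dimension flow of the kernel prior embedding module (441 → 256 → 128 → 64 → 32 → 16, then 17 after the noise level is appended) can be checked with a small sketch. The weights here are random placeholders rather than trained parameters, the Leaky ReLU slope of 0.2 is an assumed value because the text only mentions a "preset slope", and the names `init_mlp`, `embed_kernel`, and `degradation_vector` are hypothetical.

```python
import random

# Input is the flattened 21x21 kernel; outputs follow the five stated layer sizes.
LAYER_SIZES = [21 * 21, 256, 128, 64, 32, 16]

def leaky_relu(values, slope=0.2):
    """Leaky ReLU with an assumed negative slope."""
    return [v if v > 0 else slope * v for v in values]

def init_mlp(rng):
    """Create five fully connected layers with random placeholder weights."""
    layers = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        bound = (1.0 / n_in) ** 0.5
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def embed_kernel(kernel, layers):
    """Flatten the kernel and pass it through the MLP to get a 16-dim vector."""
    v = [x for row in kernel for x in row]
    for index, (weights, biases) in enumerate(layers):
        v = [sum(w * x for w, x in zip(w_row, v)) + b
             for w_row, b in zip(weights, biases)]
        if index < len(layers) - 1:  # four activations between five layers
            v = leaky_relu(v)
    return v

def degradation_vector(kernel, noise_level, layers):
    """Concatenate the 16-dim kernel code with the scalar noise level (17 dims)."""
    return embed_kernel(kernel, layers) + [float(noise_level)]
```

Whether the noise level is rescaled before concatenation is not stated in the text, so the sketch appends the raw value.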
[0071] Specifically, the fuzzy kernel width is structured and encoded using fully connected layers and a preset activation function in the preset kernel prior embedding module to obtain the encoding result. This can include: using several fully connected layers, a preset slope, and a preset activation function in the preset kernel prior embedding module to perform deep correlation feature extraction on the fuzzy kernel width in a preset multilayer perceptron to obtain the correlation feature extraction result; performing feature dimensionality reduction on the correlation feature extraction result to obtain the result to be encoded; and then performing structured encoding on the result to obtain the encoding result; wherein each fully connected layer is connected sequentially.
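A minimal sketch of the kernel prior embedding described above is given below for illustration. The layer widths (256, 128, 64, 32, 16) and the 17-dimensional output follow this embodiment; the Leaky ReLU slope of 0.2, the placement of the four activations between the five layers, the random initialization, and the function names are assumptions of the sketch and are not taken from the specification.

```python
import numpy as np

# Input is the flattened 21x21 kernel; the remaining widths follow the embodiment.
LAYER_DIMS = [21 * 21, 256, 128, 64, 32, 16]

def leaky_relu(x, slope=0.2):
    # ASSUMPTION: the preset slope is not stated; 0.2 is used only as an example.
    return np.where(x > 0, x, slope * x)

def init_kernel_mlp(rng, dims=LAYER_DIMS):
    """Random (weight, bias) pairs for the five fully connected layers."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]

def embed_degradation(kernel, noise_level, params, slope=0.2):
    """Encode a blur kernel to 16 dims and append the noise level (17 dims)."""
    h = kernel.reshape(-1)                       # flatten 21x21 -> 441
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:                  # ASSUMPTION: activation between layers
            h = leaky_relu(h, slope)
    return np.concatenate([h, [noise_level]])    # structured degradation vector
```

In a trained network the weights would be learned jointly with the super-resolution backbone; the random initialization here only makes the shapes concrete.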
[0072] Step S12: Generate a structured degradation vector based on the encoding result and the noise intensity, and use the first preset convolutional layer to extract shallow features from the remote sensing image to be processed to obtain initial shallow features.
[0073] In this embodiment, shallow feature extraction is performed on the low-resolution remote sensing image: the input image is mapped through a 3×3 convolutional layer, which outputs initial shallow features with the same size as the input, thereby preserving the basic structural information of the image.
[0074] Specifically, generating a structured degradation vector based on the encoding result and the noise intensity, and using a first preset convolutional layer to perform shallow feature extraction on the remote sensing image to be processed to obtain initial shallow features, may include: performing channel concatenation on the encoding result and the noise intensity to obtain a structured degradation vector corresponding to the remote sensing image to be processed, the structured degradation vector including structured prior information corresponding to the degradation parameters; and determining the image size corresponding to the remote sensing image to be processed, and performing shallow feature extraction on the remote sensing image to be processed based on the image size and using the first preset convolutional layer to obtain initial shallow features with the same size as the image size.
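For illustration, a size-preserving 3×3 convolution of the kind used for shallow feature extraction can be sketched as follows. Zero padding, the channel-last layout, and the function name are assumptions of this sketch; the specification only states that the output has the same size as the input.

```python
import numpy as np

def conv3x3_same(image, weight, bias):
    """3x3 convolution (cross-correlation) that keeps the H x W size.

    image:  (H, W, C_in)
    weight: (3, 3, C_in, C_out)
    bias:   (C_out,)
    """
    h, w, _ = image.shape
    # ASSUMPTION: one pixel of zero padding on each border keeps the spatial size.
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, weight.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            patch = padded[dy:dy + h, dx:dx + w, :]   # (H, W, C_in)
            out += patch @ weight[dy, dx]             # (H, W, C_out)
    return out + bias
```

With a 3-channel input and, say, 16 output channels, the result has shape (H, W, 16), so every spatial position of the input keeps a corresponding feature vector.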
[0075] Step S13: Input the first data to be processed, including the structured degradation vector and the initial shallow features, into each degradation modulation layer in the preset residual module, and use the normalization layer and attention mechanism in each degradation modulation layer to modulate the first data to be processed to obtain the second data to be processed.
[0076] In this embodiment, the initial shallow features and the structured degradation vector are input into a full-process degradation modulation framework for dynamic optimization of deep features. The full-process degradation modulation framework consists of six cascaded SwinTransformer residual modules, each of which contains six SwinTransformer degradation modulation layers and one convolutional residual block. Within a degradation modulation layer, the input features $X$ (initially the shallow features) are subjected to layer normalization and then passed through the projection matrices $W_Q$, $W_K$, and $W_V$ to generate a query vector $Q$, a key vector $K$, and a value vector $V$, wherein the key vector $K$ and the value vector $V$ are respectively superimposed with a spatial mapping term of the degradation vector, $d_K$ and $d_V$. The corresponding calculation formula is as follows:
[0077] $$Q = \mathrm{LN}(X)\,W_Q,\qquad K = \mathrm{LN}(X)\,W_K + \mathbf{1}\otimes d_K,\qquad V = \mathrm{LN}(X)\,W_V + \mathbf{1}\otimes d_V;$$
[0078] where $W_Q$, $W_K$, and $W_V$ are the projection matrices; $d_K$ and $d_V$ are the mappings of the image degradation vector $d$ to the key space and the value space, which enable the attention computation to respond dynamically to the degradation characteristics of satellite imagery; $\otimes$ denotes a broadcast operation; and $\mathbf{1}$ is an H×W matrix of all ones, which extends the one-dimensional degradation embedding vectors $d_K$ and $d_V$ to the H×W×C dimension, thereby achieving spatial size matching.
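The key and value modulation described above can be sketched as follows, purely as an illustration. The linear maps `m_k` and `m_v` that take the degradation vector into the key and value spaces, the omission of learnable scale and shift in the layer normalization, and all names are assumptions of the sketch.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize over the channel axis (learnable scale/shift omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def modulated_qkv(x, d, w_q, w_k, w_v, m_k, m_v):
    """Project features to Q, K, V and add degradation terms to K and V.

    x:   (H, W, C) input features
    d:   (D,) structured degradation vector, e.g. D = 17
    w_*: (C, C) projection matrices
    m_*: (D, C) ASSUMED linear maps of d into the key / value space
    """
    xn = layer_norm(x)
    d_k = d @ m_k            # (C,) degradation embedding in key space
    d_v = d @ m_v            # (C,) degradation embedding in value space
    q = xn @ w_q
    # NumPy broadcasting adds the same (C,) vector at every one of the H x W
    # positions, which is what multiplying by an all-ones H x W matrix does.
    k = xn @ w_k + d_k
    v = xn @ w_v + d_v
    return q, k, v
```

Because only K and V receive the degradation term, the query still depends solely on image content, while the attention weights and the aggregated values shift with the degradation.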
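[0079] In this embodiment, a learnable relative position encoding $B$ is introduced to construct a multi-head self-attention (MSA) module, and the calculation formula for the attention matrix is as follows:
[0080] $$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d_{\mathrm{head}}}} + B\right)V;$$
[0081] where $B$ is the learnable relative position encoding and $d_{\mathrm{head}}$ is the dimension of the query and key vectors in each head.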
[0082] In one specific implementation, the attention function is computed in parallel 6 times (h=6 heads) and the results are then concatenated to enhance the ability to model the spatial relationships of ground features in remote sensing images.
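As an illustrative sketch, multi-head attention with an additive position bias and h=6 heads can be written as below. The scaled dot-product form, the dense per-head bias array (in practice the bias would be gathered from a learned table indexed by relative position), and the function names are assumptions of the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, bias, num_heads=6):
    """Attention over N tokens with a per-head additive position bias.

    q, k, v: (N, C) with C divisible by num_heads
    bias:    (num_heads, N, N), ASSUMED to be supplied as a dense array
    """
    n, c = q.shape
    d_head = c // num_heads

    def split(t):
        # (N, C) -> (num_heads, N, d_head)
        return t.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    qh, kh, vh = split(q), split(k), split(v)
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head) + bias   # (h, N, N)
    out = softmax(scores) @ vh                                     # (h, N, d_head)
    # Concatenate the heads back along the channel axis.
    return out.transpose(1, 0, 2).reshape(n, c)
```

Each head attends over the same tokens with its own slice of channels and its own bias, and the six head outputs are concatenated to restore the original channel count.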
[0083] In this embodiment, a multilayer perceptron (MLP) is used for the nonlinear transformation of features. The MLP consists of two fully connected layers with a GELU activation function embedded between them, and the output dimensions of the two fully connected layers are specified in terms of the number of feature channels. It is worth noting that in this embodiment, a normalization layer is embedded before both the MSA module and the MLP module, and a residual connection structure is used to improve feature transfer efficiency. The calculation process is as follows:
[0084] $$X' = \mathrm{MSA}(\mathrm{LN}(X)) + X,\qquad X'' = \mathrm{MLP}(\mathrm{LN}(X')) + X';$$
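The pre-normalization and residual structure of a single modulation layer can be sketched as follows. The attention step is passed in as a callable so that the sketch stays self-contained; the tanh approximation of GELU, the hidden width of the MLP, and all names are assumptions and are not taken from the specification.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize over the channel axis (learnable scale/shift omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # Widely used tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def modulation_layer(x, attn_fn, w1, b1, w2, b2):
    """One layer: normalize -> attention -> residual, normalize -> MLP -> residual.

    x:       (N, C) token features
    attn_fn: callable mapping normalized (N, C) features to (N, C)
    w1, b1:  first fully connected layer, (C, C_hidden) and (C_hidden,)
    w2, b2:  second fully connected layer, (C_hidden, C) and (C,)
    """
    x_mid = attn_fn(layer_norm(x)) + x                 # attention branch + skip
    hidden = gelu(layer_norm(x_mid) @ w1 + b1)         # first FC layer + GELU
    return (hidden @ w2 + b2) + x_mid                  # second FC layer + skip
```

Because both branches are added back to their inputs, a layer whose attention and MLP outputs are zero reduces to the identity, which is what lets features pass through a deep stack of such layers.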
[0085] Subsequently, the output features of the first SwinTransformer residual module are fed sequentially through the remaining five SwinTransformer residual modules. The same structured degradation vector is input to the first SwinTransformer degradation modulation layer of each module, thereby achieving full-process degradation guidance.
[0086] Specifically, inputting the first data to be processed, including the structured degradation vector and the initial shallow features, into each degradation modulation layer in the preset residual module, and modulating the first data to be processed using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed, may include: constructing the first data to be processed based on the structured degradation vector and the initial shallow features, and inputting the first data to be processed into a preset residual module comprising several cascaded degradation modulation layers; normalizing the first data to be processed using the normalization layer in each degradation modulation layer of the preset residual module to obtain normalized features; combining the normalized features with the structured degradation vector to obtain a combination result; performing tensor generation on the combination result using each preset projection matrix to obtain the corresponding query tensor, and the key tensor and value tensor fused with the structured degradation vector; performing matrix generation on the query tensor, key tensor, and value tensor using a learnable relative position bias term and the attention mechanism to obtain an attention matrix; residually superimposing the attention matrix with the first data to be processed to obtain intermediate features; normalizing the intermediate features to obtain the features to be transformed; transforming the features to be transformed using a multilayer perceptron including a nonlinear activation function, and residually superimposing the transformation result with the intermediate features to obtain a superimposed result; and performing feature extraction and fusion on the superimposed result using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed.
[0087] Step S14: Process the second data to be processed using the second preset convolutional layer to obtain the data to be fused, and fuse the data to be fused with the initial shallow features to obtain the fusion result.
[0088] In this embodiment, the output features of the sixth SwinTransformer residual module are channel-adjusted through a 3×3 convolutional layer, and the channel-adjusted result is fused across layers with the extracted initial shallow features by element-wise addition, balancing detailed information and global structure to obtain the fused features.
[0089] Specifically, the second preset convolutional layer is used to process the second data to be processed to obtain the data to be fused, and the data to be fused is fused with the initial shallow features to obtain the fusion result. This may include: using the second preset convolutional layer to adjust the channel dimension of the second data to be processed to obtain the data to be fused; the second preset convolutional layer and the first preset convolutional layer have the same number of convolutional kernels; the data to be fused and the initial shallow features are added element-wise at the corresponding spatial positions and channels to obtain the fusion result including deep semantic information and low-level detail information; the fusion result includes global structural features and local detail features corresponding to the remote sensing image to be processed.
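The overall flow from shallow features to the fusion result can be sketched structurally as follows. Each residual module and the second convolutional layer are passed in as callables, so the sketch shows only the cascade and the long skip connection; the function name and the callable interfaces are hypothetical.

```python
import numpy as np

def run_backbone(shallow, degradation_vec, residual_modules, fuse_conv):
    """Cascade the residual modules, then fuse with the shallow features.

    shallow:          (H, W, C) initial shallow features
    degradation_vec:  structured degradation vector shared by all modules
    residual_modules: sequence of callables f(features, degradation_vec)
    fuse_conv:        callable standing in for the second convolutional layer
    """
    feat = shallow
    for module in residual_modules:
        # The same degradation vector is supplied to every module.
        feat = module(feat, degradation_vec)
    deep = fuse_conv(feat)
    if deep.shape != shallow.shape:
        raise ValueError("fusion requires matching spatial size and channels")
    # Element-wise addition at matching positions and channels.
    return deep + shallow
```

The shape check reflects the requirement that the two convolutional layers have the same number of kernels, since element-wise addition is only defined when the channel counts match.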
[0090] Step S15: Process the fusion result using a preset upsampling module to obtain a target remote sensing image; the resolution of the target remote sensing image is greater than the resolution of the remote sensing image to be processed.
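The specification does not state how the preset upsampling module is built. As one possible illustration only, a sub-pixel (depth-to-space) rearrangement, which is a common choice in super-resolution networks, can be sketched as follows; the choice of technique, the channel ordering, and the function name are all assumptions.

```python
import numpy as np

def pixel_shuffle(features, scale):
    """Rearrange (H, W, C * scale^2) features into (H * scale, W * scale, C).

    ASSUMPTION: sub-pixel upsampling is used here only as an example of an
    upsampling module; it is not stated to be the module of the embodiment.
    """
    h, w, c_in = features.shape
    if c_in % (scale * scale) != 0:
        raise ValueError("channel count must be divisible by scale squared")
    c_out = c_in // (scale * scale)
    # Split channels into a (scale, scale, C) block per low-resolution pixel.
    x = features.reshape(h, w, scale, scale, c_out)
    # Interleave the block rows/columns with the spatial rows/columns.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h * scale, w * scale, c_out)
```

In such a design, a convolution would first expand the fused features to C·scale² channels, and the rearrangement then trades those channels for spatial resolution.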
[0091] In this embodiment, to verify the effectiveness of the proposed invention, this example is compared with four mainstream super-resolution algorithms: the non-blind super-resolution methods DPSR (Deep Plug-and-Play Super-Resolution) and USRNet (Unfolding Super-Resolution Network), and the blind super-resolution methods DASR (Degradation-Aware Super-Resolution) and KDSR (Kernel-Decoupled Super-Resolution). Peak Signal-to-Noise Ratio (PSNR) is used as the core metric. Figure 3 shows the quantitative results of the four comparison algorithms and this invention on the super-resolution task under different noise intensities (0 and 10) and five anisotropic blur kernels. As can be seen from Figure 3, the present invention achieves the best PSNR value in all degradation scenarios, verifying its adaptability to both "pure blur" and "blur-noise composite" degradation scenarios.
[0092] Furthermore, Figure 4 shows the PSNR comparison of the various methods in low-, medium-, and high-complexity scenes constructed on the AID dataset. As can be seen from Figure 4, the PSNR of this invention reaches 25.28 dB in high-complexity scenes (AGM>30), which is significantly better than the other methods, demonstrating its ability to recover high-frequency details when complex degradation is coupled with complex texture.
[0093] In this embodiment, the visualization results of different super-resolution methods under different degradation conditions are shown in Figure 5. Figure 5(a) is a schematic diagram of a degraded image containing anisotropic blur kernel and noise degradation, together with its visualization results under different super-resolution methods; the image within the red box in Figure 5(a) is the corresponding anisotropic blur kernel, and the noise intensity is 0. Figure 5(b) is a schematic diagram of another degraded image containing anisotropic blur kernel and noise degradation, together with its visualization results under different super-resolution methods; the image within the red box in Figure 5(b) is the corresponding anisotropic blur kernel, and the noise intensity is 10.
[0094] It is worth mentioning that in degradation scenarios with noise levels of 0 and 10, the DASR and KDSR methods produce poorly distinguishable textures and blurred edge contours, while DPSR and USRNet lack accuracy in detail recovery. In contrast, this invention clearly restores high-frequency textures, enhances key features such as building edges, and achieves superior visual fidelity in the reconstructed image. In airport scenes from the AID dataset, this invention effectively adapts to real-world remote sensing degradation, clearly restoring parking lot markings and aircraft outlines, and exhibits better background suppression than the compared methods.
[0095] To further verify the effectiveness of the core modules, an ablation experiment was conducted in the embodiments of this application. Figure 6 shows the performance comparison of different module combinations. The K1 model, with the kernel prior embedding module removed, has a PSNR of 25.51 dB; the K2 model, with the degradation modulation module removed, has a PSNR of 25.24 dB; the E1 and E2 models, in which the degradation vector is injected into only a single module, have PSNRs of 26.62 dB and 26.53 dB respectively; while the present invention (baseline model), which includes the complete modules and full-process modulation, achieves a PSNR of 26.81 dB. This demonstrates that the collaborative design of kernel prior embedding and full-process degradation modulation is key to the performance improvement.
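For reference, the PSNR metric used in these comparisons can be computed as in the following sketch. The peak value of 255 assumes 8-bit imagery; the evaluation protocol of the embodiment (colour space, border cropping) is not stated, so none is applied here.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")           # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Higher values indicate a reconstruction closer to the reference; because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.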
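The size of each ablation gap follows directly from the PSNR values reported above; the short calculation below makes the differences explicit. The dictionary keys are labels chosen for this sketch.

```python
# PSNR values (dB) reported for the ablation variants in this embodiment.
ABLATION_PSNR_DB = {"K1": 25.51, "K2": 25.24, "E1": 26.62, "E2": 26.53}
FULL_MODEL_DB = 26.81

# Drop in PSNR of each variant relative to the complete model.
gaps = {name: round(FULL_MODEL_DB - value, 2) for name, value in ABLATION_PSNR_DB.items()}
largest_drop = max(gaps, key=gaps.get)

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} dB below the full model")
```

Removing a module entirely (K1, K2) costs 1.30 dB and 1.57 dB, whereas restricting injection to a single module (E1, E2) costs 0.19 dB and 0.28 dB, so most of the gain comes from having both modules present and a smaller part from applying the modulation throughout the network.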
[0096] Specifically, after processing the fusion result using a preset upsampling module to obtain the target remote sensing image, the process may further include: processing the remote sensing image to be processed using several super-resolution algorithms to obtain corresponding processing results; the super-resolution algorithms include DPSR, USRNet, DASR, and KDSR algorithms; determining the peak signal-to-noise ratio (PSNR) corresponding to each processing result under several complex scenarios, and determining the visual fidelity corresponding to each processing result under different noise intensities; then performing generalization verification on each processing result in a real scene to obtain corresponding verification results; determining the performance corresponding to each processing result to obtain corresponding performance results; and processing the target remote sensing image based on the PSNR, visual fidelity, verification results, and performance results to obtain a new target remote sensing image.
[0097] As can be seen from the above, the embodiments of this application first need to obtain the remote sensing image to be processed and the corresponding degradation parameters including the blur kernel width and noise intensity. The blur kernel width is then structured and encoded using a fully connected layer and a preset activation function in a preset kernel prior embedding module to obtain the encoding result. Next, a structured degradation vector is generated based on the encoding result and noise intensity, and a first preset convolutional layer is used to extract shallow features from the remote sensing image to be processed, obtaining initial shallow features. Then, the first data to be processed, including the structured degradation vector and the initial shallow features, is input into each degradation modulation layer in a preset residual module. The first data to be processed is modulated using a normalization layer and an attention mechanism in each degradation modulation layer to obtain second data to be processed. Finally, the second data to be processed is processed using a second preset convolutional layer to obtain data to be fused, and the data to be fused is fused with the initial shallow features to obtain a fusion result. The fusion result is then processed using a preset upsampling module to obtain the target remote sensing image. In this way, the efficiency of remote sensing image resolution is improved in the process of remote sensing image processing based on dynamic degradation modulation, thereby enhancing the user experience.
[0098] Accordingly, as shown in Figure 7, this application also provides a remote sensing image processing apparatus based on dynamic degradation modulation, comprising:
[0099] The encoding result generation module 11 is used to acquire the remote sensing image to be processed and the corresponding degradation parameters including the blur kernel width and noise intensity, and to perform structured encoding on the blur kernel width using the fully connected layer and the preset activation function in the preset kernel prior embedding module to obtain the encoding result;
[0100] The feature extraction module 12 is used to generate a structured degradation vector based on the encoding result and the noise intensity, and to perform shallow feature extraction on the remote sensing image to be processed using a first preset convolutional layer to obtain initial shallow features.
[0101] The data to be processed determination module 13 is used to input the first data to be processed, including the structured degradation vector and the initial shallow features, into each degradation modulation layer in the preset residual module, and to modulate the first data to be processed using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed.
[0102] The fusion result generation module 14 is used to process the second data to be processed using a second preset convolutional layer to obtain data to be fused, and to fuse the data to be fused with the initial shallow features to obtain a fusion result;
[0103] The remote sensing image determination module 15 is used to process the fusion result using a preset upsampling module to obtain a target remote sensing image; the resolution of the target remote sensing image is greater than the resolution of the remote sensing image to be processed.
[0104] In some specific embodiments, the encoding result generation module 11 may specifically include:
[0105] The remote sensing image acquisition unit is used to acquire a remote sensing image to be processed, and to obtain degradation parameters corresponding to the remote sensing image to be processed based on an anisotropic Gaussian model, a first random feature value, a second random feature value, a random rotation angle and a preset Gaussian kernel size; wherein, the degradation parameters include the blur kernel width and noise intensity corresponding to the remote sensing image to be processed.
[0106] In some specific embodiments, the encoding result generation module 11 may specifically include:
[0107] The feature extraction unit is used to perform deep correlation feature extraction on the blur kernel width in a preset multilayer perceptron by utilizing several fully connected layers, a preset slope, and a preset activation function in the preset kernel prior embedding module, and to obtain the correlation feature extraction result.
[0108] The encoding result generation unit is used to perform feature dimensionality reduction on the associated feature extraction result to obtain the result to be encoded, and then perform structured encoding on the result to be encoded to obtain the encoding result; wherein, each of the fully connected layers is connected sequentially.
[0109] In some specific embodiments, the feature extraction module 12 may specifically include:
[0110] A degradation vector generation unit is used to perform channel concatenation processing on the encoding result and the noise intensity to obtain a structured degradation vector corresponding to the remote sensing image to be processed; the structured degradation vector includes structured prior information corresponding to the degradation parameters;
[0111] An image size determination unit is used to determine the image size corresponding to the remote sensing image to be processed, so as to perform shallow feature extraction on the remote sensing image to be processed based on the image size and using a first preset convolutional layer to obtain initial shallow features with the same size as the image size.
[0112] In some specific embodiments, the data to be processed determination module 13 may specifically include:
[0113] The data input unit is used to construct the first data to be processed based on the structured degradation vector and the initial shallow features, and input the first data to be processed into a preset residual module including a plurality of serially connected degradation modulation layers;
[0114] The normalization feature determination unit is used to normalize the first data to be processed by using the normalization layer in each of the degradation modulation layers in the preset residual module to obtain normalized features, and to combine the normalized features with the structured degradation vector to obtain a combination result. Tensor generation is then performed on the combination result using each preset projection matrix to obtain the corresponding query tensor, and the key tensor and value tensor fused with the structured degradation vector.
[0115] The attention matrix generation unit is used to generate an attention matrix by using a learnable relative position bias term and an attention mechanism on the query tensor, the key tensor and the value tensor, and then superimpose the attention matrix with the first data to be processed to obtain intermediate features.
[0116] The feature transformation unit is used to normalize the intermediate features to obtain the features to be transformed. The features to be transformed are then transformed using a multilayer perceptron including a nonlinear activation function, and the transformation result is residually superimposed with the intermediate features to obtain a superimposed result. Feature extraction and fusion are then performed on the superimposed result using the normalization layer and attention mechanism in each of the degradation modulation layers to obtain the second data to be processed.
[0117] In some specific embodiments, the fusion result generation module 14 may specifically include:
[0118] The data to be fused determination unit is used to perform channel dimension adjustment processing on the second data to be processed using a second preset convolutional layer to obtain the data to be fused; the second preset convolutional layer has the same number of convolutional kernels as the first preset convolutional layer;
[0119] The fusion result generation subunit is used to add the data to be fused and the initial shallow features element-wise at the corresponding spatial positions and channels to obtain a fusion result including deep semantic information and low-level detail information; the fusion result includes global structural features and local detail features corresponding to the remote sensing image to be processed.
[0120] In some specific embodiments, the remote sensing image processing apparatus based on dynamic degradation modulation may further include:
[0121] The processing result generation unit is used to process the remote sensing image to be processed using several super-resolution algorithms to obtain corresponding processing results; the super-resolution algorithms include DPSR algorithm, USRNet algorithm, DASR algorithm and KDSR algorithm;
[0122] The peak signal-to-noise ratio determination unit is used to determine the peak signal-to-noise ratio corresponding to each of the processing results in several complex scenarios, and to determine the visual fidelity corresponding to each of the processing results under different noise intensities. Then, it performs generalization verification on each of the processing results in a real scenario to obtain the corresponding verification results, and determines the performance corresponding to each of the processing results to obtain the corresponding performance results.
[0123] An image processing unit is used to process the target remote sensing image based on the peak signal-to-noise ratio, the visual fidelity, the verification result, and the performance result to obtain a new target remote sensing image.
[0124] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.
[0125] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0126] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
[0127] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0128] The technical solutions provided in this application have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the methods and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. A remote sensing image processing method based on dynamic degradation modulation, characterized in that it comprises: acquiring the remote sensing image to be processed and the corresponding degradation parameters, including the blur kernel width and noise intensity, and performing structured encoding on the blur kernel width using the fully connected layer and the preset activation function in the preset kernel prior embedding module to obtain the encoding result; generating a structured degradation vector based on the encoding result and the noise intensity, and using the first preset convolutional layer to extract shallow features from the remote sensing image to be processed to obtain initial shallow features; inputting the first data to be processed, including the structured degradation vector and the initial shallow features, into each degradation modulation layer in the preset residual module, and modulating the first data to be processed using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed; processing the second data to be processed using a second preset convolutional layer to obtain data to be fused, and fusing the data to be fused with the initial shallow features to obtain a fusion result; and processing the fusion result using a preset upsampling module to obtain a target remote sensing image, wherein the resolution of the target remote sensing image is greater than the resolution of the remote sensing image to be processed.
The process of inputting the first data to be processed, including the structured degradation vector and the initial shallow features, into each degradation modulation layer in the preset residual module, and modulating the first data to be processed using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed, includes: constructing the first data to be processed based on the structured degradation vector and the initial shallow features, and inputting the first data to be processed into a preset residual module comprising several cascaded degradation modulation layers; normalizing the first data to be processed using the normalization layer in each degradation modulation layer of the preset residual module to obtain normalized features, combining the normalized features with the structured degradation vector to obtain a combination result, and then performing tensor generation on the combination result using each preset projection matrix to obtain the corresponding query tensor, and the key tensor and value tensor fused with the structured degradation vector; performing matrix generation on the query tensor, key tensor, and value tensor using a learnable relative position bias term and the attention mechanism to obtain an attention matrix, and then residually superimposing the attention matrix with the first data to be processed to obtain intermediate features; normalizing the intermediate features to obtain features to be transformed, transforming the features to be transformed using a multilayer perceptron including a nonlinear activation function, and residually superimposing the transformation result with the intermediate features to obtain a superimposed result; and performing feature extraction and fusion on the superimposed result using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed.
2. The remote sensing image processing method based on dynamic degradation modulation according to claim 1, characterized in that, The process of acquiring the remote sensing image to be processed and the corresponding degradation parameters, including the blur kernel width and noise intensity, includes: A remote sensing image to be processed is acquired, and degradation parameters corresponding to the remote sensing image to be processed are obtained based on an anisotropic Gaussian model, a first random feature value, a second random feature value, a random rotation angle, and a preset Gaussian kernel size; wherein, the degradation parameters include the blur kernel width and noise intensity corresponding to the remote sensing image to be processed.
3. The remote sensing image processing method based on dynamic degradation modulation according to claim 1, characterized in that the step of using a fully connected layer and a preset activation function in a preset kernel prior embedding module to perform structured encoding on the blur kernel width to obtain the encoding result includes: using several fully connected layers, a preset slope, and a preset activation function in the preset kernel prior embedding module to perform deep correlation feature extraction on the blur kernel width in a preset multilayer perceptron to obtain the correlation feature extraction result; and performing feature dimensionality reduction on the correlation feature extraction result to obtain the result to be encoded, and then performing structured encoding on the result to be encoded to obtain the encoding result; wherein each of the fully connected layers is connected sequentially.
4. The remote sensing image processing method based on dynamic degradation modulation according to claim 1, characterized in that, The process involves generating a structured degradation vector based on the encoding result and the noise intensity, and then using a first preset convolutional layer to extract shallow features from the remote sensing image to be processed, resulting in initial shallow features, including: The encoding result and the noise intensity are concatenated to obtain a structured degradation vector corresponding to the remote sensing image to be processed; the structured degradation vector includes structured prior information corresponding to the degradation parameters; Determine the image size corresponding to the remote sensing image to be processed, and extract shallow features from the remote sensing image to be processed based on the image size and using a first preset convolutional layer to obtain initial shallow features with the same size as the image size.
5. The remote sensing image processing method based on dynamic degradation modulation according to claim 1, characterized in that processing the second data to be processed using a second preset convolutional layer to obtain data to be fused, and fusing the data to be fused with the initial shallow features to obtain a fusion result, comprises: adjusting the channel dimension of the second data to be processed using the second preset convolutional layer to obtain the data to be fused, wherein the second preset convolutional layer has the same number of convolution kernels as the first preset convolutional layer; adding the data to be fused to the initial shallow features element-wise at corresponding spatial locations and channels to obtain a fusion result that includes deep semantic information and low-level detail information, wherein the fusion result includes global structural features and local detail features corresponding to the remote sensing image to be processed.
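The fusion in claim 5 amounts to a channel adjustment followed by a global residual addition. The sketch below assumes, purely for brevity, that the second preset convolutional layer is a 1x1 (pointwise) convolution; the claim only requires that it has as many kernels as the first layer so that the shapes match. `conv1x1` and `fuse` are hypothetical names.

```python
def conv1x1(features, weights):
    """Pointwise convolution: mixes channels at each pixel independently.

    features: [C_in][H][W]; weights: [C_out][C_in]. Returns [C_out][H][W].
    """
    height, width = len(features[0]), len(features[0][0])
    return [[[sum(w * features[c][y][x] for c, w in enumerate(row))
              for x in range(width)]
             for y in range(height)]
            for row in weights]


def shape(t):
    """Return (channels, height, width) of a nested-list feature map."""
    return len(t), len(t[0]), len(t[0][0])


def fuse(deep, shallow):
    """Element-wise addition at matching channel and spatial positions."""
    if shape(deep) != shape(shallow):
        raise ValueError("deep and shallow features must have the same shape")
    return [[[d + s for d, s in zip(d_row, s_row)]
             for d_row, s_row in zip(d_plane, s_plane)]
            for d_plane, s_plane in zip(deep, shallow)]
```

Because the addition is element-wise, a shape mismatch is treated as an error rather than silently truncated, which is why the channel adjustment has to come first.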
6. The remote sensing image processing method based on dynamic degradation modulation according to any one of claims 1 to 5, characterized in that, after processing the fusion result using a preset upsampling module to obtain the target remote sensing image, the method further comprises: processing the remote sensing image to be processed using several super-resolution algorithms to obtain corresponding processing results, wherein the super-resolution algorithms include the DPSR algorithm, the USRNet algorithm, the DASR algorithm, and the KDSR algorithm; determining the peak signal-to-noise ratio corresponding to each processing result in several complex scenarios, and determining the visual fidelity corresponding to each processing result under different noise intensities; performing generalization verification on each processing result in a real scenario to obtain corresponding verification results; determining the performance corresponding to each processing result to obtain corresponding performance results; and processing the target remote sensing image based on the peak signal-to-noise ratio, the visual fidelity, the verification results, and the performance results to obtain a new target remote sensing image.
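Of the quantities named in claim 6, the peak signal-to-noise ratio has a standard definition, PSNR = 10 log10(MAX^2 / MSE), and can be sketched directly. The example assumes a ground-truth reference image is available for each scenario, which the claim does not state, and the helper name `psnr` is hypothetical. The compared algorithms themselves are not reproduced here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images may be flat or nested lists of numbers; nesting is flattened.
    """
    def flatten(img):
        for item in img:
            if isinstance(item, (list, tuple)):
                yield from flatten(item)
            else:
                yield float(item)

    ref, est = list(flatten(reference)), list(flatten(estimate))
    if len(ref) != len(est) or not ref:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels and channels.
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Given a dictionary `results` that maps a method name to its output image, `max(results, key=lambda name: psnr(reference, results[name]))` would pick the method with the highest PSNR for one scenario.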
7. A remote sensing image processing device based on dynamic degradation modulation, characterized by comprising: an encoding result generation module, configured to acquire the remote sensing image to be processed and the corresponding degradation parameters including the blur kernel width and noise intensity, and to perform structured encoding on the blur kernel width using the fully connected layer and the preset activation function in the preset kernel prior embedding module to obtain the encoding result; a feature extraction module, configured to generate a structured degradation vector based on the encoding result and the noise intensity, and to perform shallow feature extraction on the remote sensing image to be processed using a first preset convolutional layer to obtain initial shallow features; a data to be processed determination module, configured to input the first data to be processed, including the structured degradation vector and the initial shallow features, into each degradation modulation layer in the preset residual module, and to modulate the first data to be processed using the normalization layer and attention mechanism in each degradation modulation layer to obtain the second data to be processed; a fusion result generation module, configured to process the second data to be processed using a second preset convolutional layer to obtain data to be fused, and to fuse the data to be fused with the initial shallow features to obtain a fusion result; and a remote sensing image determination module, configured to process the fusion result using a preset upsampling module to obtain a target remote sensing image, wherein the resolution of the target remote sensing image is greater than the resolution of the remote sensing image to be processed.
Specifically, the data to be processed determination module is configured to: construct the first data to be processed based on the structured degradation vector and the initial shallow features, and input the first data to be processed into a preset residual module comprising several cascaded degradation modulation layers; normalize the first data to be processed using the normalization layer in each degradation modulation layer of the preset residual module to obtain normalized features, combine the normalized features with the structured degradation vector to obtain a combination result, and apply preset projection matrices to the combination result to obtain the corresponding query tensor and key tensor fused with the structured degradation vector; generate an attention matrix from the query tensor, the key tensor, and a value tensor using a learnable relative position bias term and the attention mechanism, and residually superimpose the attention matrix on the first data to be processed to obtain intermediate features; normalize the intermediate features to obtain features to be transformed, transform the features to be transformed using a multilayer perceptron that includes a nonlinear activation function, and residually superimpose the transformation result on the intermediate features to obtain a superimposed result; and perform feature extraction and fusion on the superimposed result using the normalization layer and attention mechanism in each of the degradation modulation layers to obtain the second data to be processed.
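The degradation modulation layer described for claim 7 follows the familiar pre-norm Transformer pattern with one twist: the degradation vector enters the query and key projections. The sketch below is a simplified, single-head illustration under several assumptions that are not in the claims: "combining" is implemented as appending the degradation vector to every normalized token, the value tensor is projected from the normalized features alone (the wording leaves open which tensors see the degradation vector), the relative position bias is indexed over a 1-D token sequence rather than a 2-D window, the residual adds the attention-weighted values, and the nonlinear activation is GELU. All names and the random weights are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def layer_norm(tokens, eps=1e-5):
    """Normalize each token to zero mean and unit variance (no affine terms)."""
    out = []
    for t in tokens:
        mean = sum(t) / len(t)
        var = sum((v - mean) ** 2 for v in t) / len(t)
        out.append([(v - mean) / math.sqrt(var + eps) for v in t])
    return out


def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gelu(v):
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def rand_matrix(rows, cols, rng, scale=0.2):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class DegradationModulationLayer:
    """Single-head sketch over N tokens of C channels; weights are random."""

    def __init__(self, n_tokens, channels, deg_dim, seed=0):
        rng = random.Random(seed)
        self.channels = channels
        self.wq = rand_matrix(channels + deg_dim, channels, rng)
        self.wk = rand_matrix(channels + deg_dim, channels, rng)
        self.wv = rand_matrix(channels, channels, rng)  # assumption: V ignores d
        # One learnable bias per relative offset i - j (1-D simplification).
        self.rel_bias = [rng.uniform(-0.1, 0.1) for _ in range(2 * n_tokens - 1)]
        self.w1 = rand_matrix(channels, 2 * channels, rng)
        self.w2 = rand_matrix(2 * channels, channels, rng)

    def attention(self, q, k):
        """Return the N x N matrix softmax(Q K^T / sqrt(C) + B)."""
        n = len(q)
        scale = 1.0 / math.sqrt(self.channels)
        scores = matmul(q, [list(col) for col in zip(*k)])
        return [softmax([scores[i][j] * scale + self.rel_bias[i - j + n - 1]
                         for j in range(n)]) for i in range(n)]

    def __call__(self, x, deg):
        normed = layer_norm(x)
        combined = [tok + list(deg) for tok in normed]  # append d to every token
        q, k = matmul(combined, self.wq), matmul(combined, self.wk)
        v = matmul(normed, self.wv)
        mid = add(x, matmul(self.attention(q, k), v))  # first residual
        hidden = [[gelu(h) for h in row] for row in matmul(layer_norm(mid), self.w1)]
        return add(mid, matmul(hidden, self.w2))  # second residual
```

Several such layers can be cascaded by feeding the output of one layer, together with the same degradation vector, into the next, which mirrors the "several cascaded degradation modulation layers" of the residual module.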
8. The remote sensing image processing device based on dynamic degradation modulation according to claim 7, characterized in that the data to be processed determination module comprises: a data input unit, configured to construct the first data to be processed based on the structured degradation vector and the initial shallow features, and to input the first data to be processed into a preset residual module including a plurality of serially connected degradation modulation layers; a normalization feature determination unit, configured to normalize the first data to be processed using the normalization layer in each of the degradation modulation layers in the preset residual module to obtain normalized features, to combine the normalized features with the structured degradation vector to obtain a combination result, and to apply each preset projection matrix to the combination result to obtain the corresponding query tensor, and the key tensor and value tensor fused with the structured degradation vector; an attention matrix generation unit, configured to generate an attention matrix from the query tensor, the key tensor, and the value tensor using a learnable relative position bias term and the attention mechanism, and to superimpose the attention matrix on the first data to be processed to obtain intermediate features; and a feature transformation unit, configured to normalize the intermediate features to obtain features to be transformed, to transform the features to be transformed using a multilayer perceptron including a nonlinear activation function, to superimpose the transformation result on the intermediate features to obtain a superimposed result, and to perform feature extraction and fusion on the superimposed result using the normalization layer and attention mechanism in each of the degradation modulation layers to obtain the second data to be processed.
9. The remote sensing image processing device based on dynamic degradation modulation according to claim 7, characterized in that the fusion result generation module comprises: a data to be fused determination unit, configured to adjust the channel dimension of the second data to be processed using a second preset convolutional layer to obtain the data to be fused, wherein the second preset convolutional layer has the same number of convolution kernels as the first preset convolutional layer; and a fusion result generation subunit, configured to add the data to be fused to the initial shallow features element-wise at corresponding spatial positions and channels to obtain a fusion result including deep semantic information and low-level detail information, wherein the fusion result includes global structural features and local detail features corresponding to the remote sensing image to be processed.