A panchromatic sharpening method based on a multi-sequence recurrent convolutional neural network
A panchromatic sharpening (pansharpening) model based on a multi-sequence recurrent convolutional neural network addresses the tendency of existing techniques to ignore the dependencies and correlations between bands, which improves the accuracy of the pansharpening results and yields high-quality remote sensing images.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
- Filing Date
- 2022-06-15
- Publication Date
- 2026-06-12
AI Technical Summary
Most existing deep learning-based panchromatic sharpening techniques ignore the local dependencies and global correlations between different bands in panchromatic and multispectral images, resulting in low accuracy of panchromatic sharpening results.
A panchromatic sharpening model based on a multi-sequence recurrent convolutional neural network is adopted. By constructing feature extraction, feature fusion and feature recovery sub-networks, the local dependencies and global correlations between panchromatic images and multispectral images are captured, thereby improving the accuracy of sharpening results.
It improves the accuracy of panchromatic sharpening results and promotes wider application, especially for generating multispectral images with both high spatial and high spectral resolution in remote sensing image processing.
Figure CN116402737B_ABST
Description
Technical Field
[0001] This invention relates to the field of remote sensing image processing, and specifically to a panchromatic sharpening technique that fuses panchromatic and multispectral images.
Background Technology
[0002] High-resolution multispectral images have broad application prospects in engineering assessment, land management, and urban planning, and producing them is an important preprocessing step in many remote sensing tasks such as scene recognition, target detection, and change detection. However, because of the technical and economic limitations of remote sensing satellite sensors, directly acquired multispectral images usually cannot offer high spatial and high spectral resolution at the same time. Most multispectral remote sensing platforms also carry a panchromatic imaging device, so fusing the panchromatic (PAN) image with the multispectral (MS) image can generate an MS image with both high spectral and high spatial resolution. This image processing technique is called panchromatic sharpening (pansharpening) and is an active topic in remote sensing image processing.
[0003] Over the past few decades, panchromatic sharpening methods have developed into four main families.
1. Component substitution (CS) methods first transform the multispectral image into another space, for example intensity-hue-saturation (IHS), then replace the spatial component with the panchromatic image, and finally apply the inverse transform to return to a multispectral image. Gram-Schmidt (GS) fusion, Gram-Schmidt adaptive (GSA) fusion, and Band-Dependent Spatial-Detail (BDSD) fusion all belong to this family. Component substitution enhances the spatial information of the fusion result but often produces severe spectral distortion.
2. Multiresolution analysis (MRA) methods inject the spatial detail extracted from the high-spatial-resolution panchromatic image into the multispectral image to improve spatial resolution while reducing spectral distortion. They mainly include the wavelet transform (WT), the Laplacian pyramid (LP), additive wavelet luminance proportional (AWLP) fusion, and the à trous wavelet transform. These methods preserve spectral information well, but they have high algorithmic complexity and are more sensitive to image registration errors, which leads to spatial distortion in the pansharpened result.
3. Variational optimization (VO) methods construct an energy function from assumptions or priors and then minimize it to obtain a better fusion result. Common VO-based methods include nonlocal optimization based on k-means clustering, Bayesian posterior probability, and adaptive regularization based on a normalized Gaussian distribution total variation operator. However, VO-based methods are sometimes inefficient, which severely hinders their widespread application.
4. Deep learning (DL) methods are mainly inspired by super-resolution based on convolutional neural networks (CNNs). Masi et al. proposed using a convolutional neural network for pansharpening (PNN) to learn the mapping from low-resolution multispectral images to high-resolution multispectral images, with the aim of reconstructing high-resolution pansharpened results. This was a successful application of deep learning to pansharpening. Several methods followed, including Boosting the Accuracy of Multispectral Image Pansharpening by Learning a Deep Residual Network (DRPNN), A Multiscale and Multidepth Convolutional Neural Network for Remote Sensing Imagery Pan-Sharpening (MSDCNN), Remote Sensing Image Fusion Based on Two-stream Fusion Network (TFNet), and Deep Gradient Projection Networks for Pan-sharpening (GPPNN).
However, most existing deep learning-based pansharpening methods treat the panchromatic and multispectral images as a whole and ignore the local dependencies and global correlations between the bands of the multispectral image. To address this issue, a pansharpening model based on a multi-sequence convolutional recurrent neural network, the Multi-Sequence Convolutional Recurrent Network for Pansharpening (MCRNN), is proposed; it can improve the accuracy of the sharpening results and promote wider application.
Summary of the Invention
[0004] Panchromatic sharpening fuses a panchromatic image of high spatial resolution with a multispectral image of high spectral resolution but low spatial resolution to generate a multispectral image with both high spatial and high spectral resolution. However, most existing deep learning-based panchromatic sharpening techniques treat the panchromatic and multispectral images as a whole and neglect the local dependencies and global correlations between the bands of the panchromatic and multispectral images, which degrades the sharpening results. To address this issue, this invention proposes a panchromatic sharpening model based on a multi-sequence recurrent convolutional neural network. This model can improve the accuracy of the sharpening results and promote wider application. The proposed method includes the following steps:
[0005] (1) First, the original multispectral image is upsampled to the size of the panchromatic image using bicubic interpolation according to the scaling factor S. Then, the multispectral image and the panchromatic image are stacked to form multi-sequence data $x_i$.
[0006] (2) Construct a feature extraction subnetwork and feed the multi-sequence data $x_i$ into it to extract the shallow features of each band.
[0007] (3) Construct a feature fusion subnetwork that models the intra-band and inter-band relationships of the sequence features output by the feature extraction subnetwork and captures the local dependencies and global correlations between different bands, thereby improving the performance of panchromatic sharpening.
[0008] (4) Construct a feature restoration subnetwork and restore the fusion result output by the feature fusion subnetwork into a multispectral image with high spatial resolution and high spectral resolution.
[0009] In step (1), the panchromatic image and the multispectral image are stacked to form the multi-sequence data as follows. Let $g_{PAN}$ (H×W×1) be a panchromatic image containing one spectral band. The original multispectral image, which contains M spectral bands, is upsampled by bicubic interpolation with the scaling factor S to obtain a multispectral image that has the size of the panchromatic image and contains M spectral bands. The panchromatic image and the upsampled multispectral image are then stacked to form the multi-sequence data $x_i$ ($i = 1, 2, ..., M+1$). This process can be expressed by the following formulas.
[0010]
[0011]
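Step (1) can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function names (`cubic_kernel`, `upsample_axis`, `build_sequence`), the Keys kernel parameter a = -0.5, the edge clamping, and the ordering of the stack (multispectral bands first, panchromatic band last) are all assumptions, since the source does not specify them.

```python
import numpy as np


def cubic_kernel(t, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed, common choice."""
    t = np.abs(t)
    near = (a + 2) * t**3 - (a + 3) * t**2 + 1          # used where |t| <= 1
    far = a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a   # used where 1 < |t| < 2
    return np.where(t <= 1, near, np.where(t < 2, far, 0.0))


def upsample_axis(img, scale, axis):
    """Cubic interpolation along one axis by an integer factor (edges clamped)."""
    n = img.shape[axis]
    # Centre of each output pixel expressed in input pixel coordinates.
    pos = (np.arange(n * scale) + 0.5) / scale - 0.5
    base = np.floor(pos).astype(int)
    bshape = [1] * img.ndim
    bshape[axis] = -1
    out = np.zeros(img.shape[:axis] + (n * scale,) + img.shape[axis + 1:])
    for k in range(-1, 3):                               # four neighbouring taps
        idx = np.clip(base + k, 0, n - 1)
        w = cubic_kernel(pos - (base + k)).reshape(bshape)
        out += np.take(img, idx, axis=axis) * w
    return out


def build_sequence(ms, pan, scale):
    """ms: (H/S, W/S, M), pan: (H, W, 1) -> multi-sequence data of shape (M+1, H, W)."""
    ms_up = upsample_axis(upsample_axis(ms, scale, 0), scale, 1)   # separable bicubic
    if ms_up.shape[:2] != pan.shape[:2]:
        raise ValueError("upsampled MS and PAN sizes differ")
    stacked = np.concatenate([ms_up, pan], axis=-1)      # (H, W, M+1)
    return np.moveaxis(stacked, -1, 0)                   # one 2-D map per sequence element


rng = np.random.default_rng(0)
x = build_sequence(rng.random((16, 16, 4)), rng.random((64, 64, 1)), scale=4)
print(x.shape)  # (5, 64, 64)
```

Each slice `x[i]` then plays the role of one sequence element $x_i$ passed to the later subnetworks.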
[0012] In step (2), the main structure of the feature extraction subnetwork is two consecutive convolutional layers, with two residual blocks embedded between them. Each residual block mainly consists of two convolutional layers and a skip connection, with an activation layer embedded after each convolutional layer. The working principle of the residual blocks can be described by the following formula.
[0013] $y_i = h(x_i) + R(x_i, B_i)$ (3)
[0014] $x_{i+1} = F(y_i)$ (4)
[0015] where $x_i$ and $x_{i+1}$ are the input and output of the residual block, $R(\cdot)$ is the residual function, $F(y_i)$ is the activation function, and $h(x_i)$ is the identity mapping function. In the model, the parameters of the residual blocks are shared to balance fusion accuracy against the number of parameters. The multi-sequence data $x_i$ are therefore fed sequentially into the feature extraction network $\varphi$ to obtain the feature map $f_i$ of each band ($i = 1, 2, ..., M+1$). The entire process can be represented as:
[0016] $f_i = \varphi(x_i),\ i = 1, 2, ..., M+1$ (5)
[0017] Note that the kernel size of the convolutional layers mentioned above is 3×3.
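The feature extraction subnetwork of step (2) might look like the NumPy sketch below. It follows equations (3)-(5) with several assumptions that the source does not state: ReLU as the activation $F$, no bias terms, no activation after the first and last convolutions, a hypothetical channel width of 8, and one reading of "shared parameters" in which both residual blocks reuse the same weights and the same network is applied to every band. All function and key names are illustrative.

```python
import numpy as np


def conv3x3(x, w):
    """Zero-padded 3x3 convolution (cross-correlation, as in most DL libraries).
    x: (C_in, H, W), w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], H, W))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + H, dx:dx + W])
    return out


def relu(a):
    return np.maximum(a, 0.0)


def residual_block(x, w1, w2):
    residual = conv3x3(relu(conv3x3(x, w1)), w2)   # R(x_i, B_i): conv, activation, conv
    y = x + residual                               # eq. (3) with h as the identity mapping
    return relu(y)                                 # eq. (4) with F assumed to be ReLU


def init_params(channels=8, seed=0):
    rng = np.random.default_rng(seed)

    def make(c_out, c_in):
        return rng.normal(scale=0.1, size=(c_out, c_in, 3, 3))

    return {"w_in": make(channels, 1), "w_r1": make(channels, channels),
            "w_r2": make(channels, channels), "w_out": make(channels, channels)}


def extract_features(band, p):
    """phi in eq. (5): band (H, W) -> feature map f_i of shape (C, H, W)."""
    x = conv3x3(band[None], p["w_in"])             # first convolutional layer
    for _ in range(2):                             # two residual blocks, shared weights
        x = residual_block(x, p["w_r1"], p["w_r2"])
    return conv3x3(x, p["w_out"])                  # second convolutional layer


def extract_all(sequence, p):
    """Apply phi to each x_i in turn; sequence: (M+1, H, W) -> (M+1, C, H, W)."""
    return np.stack([extract_features(band, p) for band in sequence])
```

Calling `extract_all` on an array of shape (M+1, H, W) returns one feature map per band, which is the input expected by the fusion stage.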
[0018] In step (3), the main structure of the constructed feature fusion subnetwork is a convGRU unit, which works as follows. Let $f_i$ be the feature map, and let $z_i$ and $r_i$ be the update gate and the reset gate, respectively. The update gate $z_i$ is the logic gate that controls when the activation $h_i$ is updated; $z_i$ is expressed as:
[0019] $z_i = \sigma(W_z * f_i + U_z * h_{i-1})$ (6)
[0020] where $\sigma$ is the activation function. In the update gate, $W_z$ and $U_z$ represent the weight matrices from the current input layer to the hidden layer and from the previous hidden layer to the current hidden layer, respectively; $h_{i-1}$ is the output activation of the previous hidden layer; and $*$ denotes the convolution operator. The reset gate $r_i$ decides, when the candidate activation is computed, whether the previous activation $h_{i-1}$ should be discarded. $r_i$ is defined as:
[0021] $r_i = \sigma(W_r * f_i + U_r * h_{i-1})$ (7)
[0022] In the reset gate, $W_r$ and $U_r$ represent the weight matrices from the current input layer to the hidden layer and from the previous hidden layer to the current hidden layer, respectively. The candidate activation takes $[f_i, h_{i-1}]$ as input and is given by the following formula:
[0023]
[0024] where $W_h$ and $U_h$ likewise represent the weight matrices from the current input layer to the hidden layer and from the previous hidden layer to the current hidden layer, respectively. Finally, the hidden state $h_i$ of the unit, which is its output activation, takes the candidate activation as input and is obtained from equation (9):
[0025]
[0026] where $\odot$ denotes the Hadamard (element-wise) product. In summary, the feature maps $f_i$ of all bands are input into the deep feature fusion subnetwork, and the feature fusion result is obtained according to formulas (6)-(9). Global average pooling (GAP) is then applied along the multi-sequence dimension so that the output feature fusion result is fixed at the target resolution, as shown below:
[0027]
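The fusion step can be sketched as a convGRU loop over the band sequence. Equations (6) and (7) are taken from the text; the forms used here for the candidate activation (8), the hidden-state update (9), and the pooling (10) are not recoverable from the source, so the sketch assumes the widely used GRU formulation (tanh candidate, reset gate applied to the previous state, convex combination with the update gate) and reads GAP as a plain average of the hidden states over the sequence dimension. The 3×3 kernel size, zero initial state, and all names are also assumptions.

```python
import numpy as np


def conv3x3(x, w):
    """Zero-padded 3x3 convolution; x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], H, W))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + H, dx:dx + W])
    return out


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def convgru_step(f, h_prev, p):
    z = sigmoid(conv3x3(f, p["Wz"]) + conv3x3(h_prev, p["Uz"]))   # update gate, eq. (6)
    r = sigmoid(conv3x3(f, p["Wr"]) + conv3x3(h_prev, p["Ur"]))   # reset gate, eq. (7)
    # Assumed form of eq. (8): the reset gate scales the previous state element-wise.
    h_cand = np.tanh(conv3x3(f, p["Wh"]) + conv3x3(r * h_prev, p["Uh"]))
    # Assumed form of eq. (9): blend of previous state and candidate activation.
    return (1.0 - z) * h_prev + z * h_cand


def fuse_bands(features, p):
    """features: (M+1, C, H, W) -> fused feature map of shape (C, H, W)."""
    h = np.zeros_like(features[0])            # assumed zero initial hidden state
    states = []
    for f in features:                        # bands are visited in sequence order
        h = convgru_step(f, h, p)
        states.append(h)
    # Assumed reading of eq. (10): average the hidden states over the sequence axis.
    return np.mean(states, axis=0)


def init_gru_params(channels, seed=0):
    rng = np.random.default_rng(seed)
    names = ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]
    return {n: rng.normal(scale=0.1, size=(channels, channels, 3, 3)) for n in names}
```

Because the same gates are applied at every position of the sequence, the hidden state carries information from earlier bands into later ones, which is how a recurrent unit of this kind can relate one band to another.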
[0028] In step (4), a feature restoration subnetwork is constructed. Its main structure is a single convolutional layer with a 1×1 kernel, which restores the fusion result of the feature fusion subnetwork to a multispectral image at the target resolution.
Brief Description of the Drawings
[0029] Figure 1 is a flowchart of the principle of the panchromatic sharpening model based on the multi-sequence recurrent convolutional neural network.
[0030] Figure 2(a) shows the original MS image from the QuickBird satellite dataset.
[0031] Figure 2(b) shows the original PAN image from the QuickBird satellite dataset.
[0032] Figure 2(c) shows the MS image after upsampling from the QuickBird satellite dataset.
[0033] Figure 2(d) shows the original MS image from the QuickBird satellite dataset.
[0034] Figure 2(e) shows the original PAN image from the QuickBird satellite dataset.
[0035] Figure 2(f) shows the MS image after upsampling from the QuickBird satellite dataset.
[0036] Figure 3(a) shows the down-resolution results of the QuickBird satellite dataset based on the PNN method.
[0037] Figure 3(b) shows the down-resolution results of the QuickBird satellite dataset based on the PanNet method.
[0038] Figure 3(c) shows the down-resolution results of the QuickBird satellite dataset based on the MSDCNN method.
[0039] Figure 3(d) shows the down-resolution results of the QuickBird satellite dataset based on the TFNet method.
[0040] Figure 3(e) shows the down-resolution results of the QuickBird satellite dataset based on the GPPNN method.
[0041] Figure 3(f) shows the down-resolution results of the QuickBird satellite dataset based on the MCRNN method.
[0042] Figure 4(a) shows the full-resolution results of the QuickBird satellite dataset based on the PNN method.
[0043] Figure 4(b) shows the full-resolution results of the QuickBird satellite dataset based on the PanNet method.
[0044] Figure 4(c) shows the full-resolution results of the QuickBird satellite dataset based on the MSDCNN method.
[0045] Figure 4(d) shows the full-resolution results of the QuickBird satellite dataset based on the TFNet method.
[0046] Figure 4(e) shows the full-resolution results of the QuickBird satellite dataset based on the GPPNN method.
[0047] Figure 4(f) shows the full-resolution results of the QuickBird satellite dataset based on the MCRNN method.
Detailed Implementation
[0048] The panchromatic sharpening method based on a multi-sequence convolutional recurrent neural network includes the following steps:
[0049] (1) First, the original multispectral image is upsampled to the size of the panchromatic image using bicubic interpolation according to the scaling factor S. Then, the multispectral image and the panchromatic image are stacked to form multi-sequence data $x_i$. The stacking proceeds as follows. Let $g_{PAN}$ (H×W×1) be a panchromatic image containing one spectral band. The original multispectral image, which contains M spectral bands, is upsampled by bicubic interpolation with the scaling factor S to obtain a multispectral image that has the size of the panchromatic image and contains M spectral bands. The panchromatic image and the upsampled multispectral image are then stacked to form the multi-sequence data $x_i$ ($i = 1, 2, ..., M+1$). This process can be expressed by the following formulas.
[0050]
[0051]
[0052] (2) Construct a feature extraction subnetwork and feed the multi-sequence data $x_i$ into it to extract the shallow features of each band. The main structure of the feature extraction subnetwork consists of two consecutive convolutional layers with two residual blocks embedded between them. Each residual block mainly comprises two convolutional layers and a skip connection, with an activation layer after each convolutional layer. The working principle of the residual block can be described by the following formulas.
[0053] $y_i = h(x_i) + R(x_i, B_i)$ (3)
[0054] $x_{i+1} = F(y_i)$ (4)
[0055] where $x_i$ and $x_{i+1}$ are the input and output of the residual block, $R(\cdot)$ is the residual function, $F(y_i)$ is the activation function, and $h(x_i)$ is the identity mapping function. In the model, the parameters of the residual blocks are shared to balance fusion accuracy against the number of parameters. The multi-sequence data $x_i$ are therefore fed sequentially into the feature extraction network $\varphi$ to obtain the feature map $f_i$ of each band ($i = 1, 2, ..., M+1$). The entire process can be represented as:
[0056] $f_i = \varphi(x_i),\ i = 1, 2, ..., M+1$ (5)
[0057] Note that the kernel size of the convolutional layers mentioned above is 3×3.
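The trade-off between accuracy and parameter count mentioned for the shared residual blocks can be made concrete with a small count. The sketch below assumes a hypothetical channel width of 32, bias terms in every layer, and that "shared" means the two residual blocks reuse one set of weights; none of these details are given in the source.

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Number of weights (and optional biases) in one k x k convolutional layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def residual_block_params(channels):
    """Two 3x3 convolutions per residual block; the skip connection has no parameters."""
    return 2 * conv_params(channels, channels)


def extractor_params(channels, n_blocks=2, shared=True):
    head = conv_params(1, channels)           # first conv: one band in, `channels` out
    tail = conv_params(channels, channels)    # last conv (output width is an assumption)
    blocks = residual_block_params(channels) * (1 if shared else n_blocks)
    return head + tail + blocks


print(extractor_params(32, shared=True))    # 28064
print(extractor_params(32, shared=False))   # 46560
```

Under these assumptions, sharing removes one residual block's worth of parameters (18,496 of 46,560) while keeping the same depth of computation.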
[0058] (3) Construct a feature fusion subnetwork that models the intra-band and inter-band relationships of the sequence features output by the feature extraction subnetwork and captures the local dependencies and global correlations between different bands, thereby improving the performance of panchromatic sharpening.
[0059] The main structure of the constructed feature fusion subnetwork is a convGRU unit, which works as follows. Let $f_i$ be the feature map, and let $z_i$ and $r_i$ be the update gate and the reset gate, respectively. The update gate $z_i$ is the logic gate that controls when the activation $h_i$ is updated; $z_i$ is expressed as:
[0060] $z_i = \sigma(W_z * f_i + U_z * h_{i-1})$ (6)
[0061] where $\sigma$ is the activation function. In the update gate, $W_z$ and $U_z$ represent the weight matrices from the current input layer to the hidden layer and from the previous hidden layer to the current hidden layer, respectively; $h_{i-1}$ is the output activation of the previous hidden layer; and $*$ denotes the convolution operator. The reset gate $r_i$ decides, when the candidate activation is computed, whether the previous activation $h_{i-1}$ should be discarded. $r_i$ is defined as:
[0062] $r_i = \sigma(W_r * f_i + U_r * h_{i-1})$ (7)
[0063] In the reset gate, $W_r$ and $U_r$ represent the weight matrices from the current input layer to the hidden layer and from the previous hidden layer to the current hidden layer, respectively. The candidate activation takes $[f_i, h_{i-1}]$ as input and is given by the following formula:
[0064]
[0065] where $W_h$ and $U_h$ likewise represent the weight matrices from the current input layer to the hidden layer and from the previous hidden layer to the current hidden layer, respectively. Finally, the hidden state $h_i$ of the unit, which is its output activation, takes the candidate activation as input and is obtained from equation (9):
[0066]
[0067] where $\odot$ denotes the Hadamard (element-wise) product. In summary, the feature maps $f_i$ of all bands are input into the deep feature fusion subnetwork, and the feature fusion result is obtained according to formulas (6)-(9). Global average pooling (GAP) is then applied along the multi-sequence dimension so that the output feature fusion result is fixed at the target resolution, as shown below:
[0068]
[0069] (4) Construct a feature restoration subnetwork and restore the fusion result output by the feature fusion subnetwork into a multispectral image with high spatial resolution and high spectral resolution.
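The restoration in step (4) uses a single 1×1 convolution, which is a per-pixel linear map from the fused feature channels to the M output bands. A minimal NumPy sketch follows; the optional bias, the channel-first input layout, and the function names are assumptions rather than details from the source.

```python
import numpy as np


def conv1x1(x, w, b=None):
    """1x1 convolution: x (C, H, W), w (M, C), optional b (M,) -> (M, H, W)."""
    out = np.einsum("mc,chw->mhw", w, x)      # same linear map applied at every pixel
    if b is not None:
        out = out + b[:, None, None]
    return out


def restore(fused, w, b=None):
    """Map fused features (C, H, W) to a multispectral image of shape (H, W, M)."""
    return np.moveaxis(conv1x1(fused, w, b), 0, -1)


rng = np.random.default_rng(0)
fused = rng.random((8, 64, 64))               # hypothetical 8-channel fusion result
weights = rng.normal(scale=0.1, size=(4, 8))  # 4 output bands, e.g. QuickBird MS
print(restore(fused, weights).shape)          # (64, 64, 4)
```

Since a 1×1 kernel does not mix neighbouring pixels, the spatial size of the output equals that of the fused feature map, so the target resolution must already be reached before this layer.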
[0070] Figure 1 is a flowchart illustrating the principle of the panchromatic sharpening model based on a multi-sequence recurrent convolutional neural network proposed in this invention.
[0071] Figure 2(a) shows the original MS image of the QuickBird satellite dataset; Figure 2(b) shows the original PAN image of the QuickBird satellite dataset; Figure 2(c) shows the down-resolution MS image of the QuickBird satellite dataset; Figure 2(d) shows the down-resolution PAN image of the QuickBird satellite dataset.
[0072] Figure 3(a) shows the resolution reduction results of the QuickBird satellite dataset based on the PNN method; Figure 3(b) shows the resolution reduction results of the QuickBird satellite dataset based on the PanNet method; Figure 3(c) shows the resolution reduction results of the QuickBird satellite dataset based on the MSDCNN method; Figure 3(d) shows the resolution reduction results of the QuickBird satellite dataset based on the TFNet method; Figure 3(e) shows the resolution reduction results of the QuickBird satellite dataset based on the GPPNN method; Figure 3(f) shows the resolution reduction results of the QuickBird satellite dataset based on the MCRNN method.
[0073] Figure 4(a) shows the full-resolution results of the QuickBird satellite dataset based on the PNN method; Figure 4(b) shows the full-resolution results of the QuickBird satellite dataset based on the PanNet method; Figure 4(c) shows the full-resolution results of the QuickBird satellite dataset based on the MSDCNN method; Figure 4(d) shows the full-resolution results of the QuickBird satellite dataset based on the TFNet method; Figure 4(e) shows the full-resolution results of the QuickBird satellite dataset based on the GPPNN method; Figure 4(f) shows the full-resolution results of the QuickBird satellite dataset based on the MCRNN method.
[0074] Table 1 lists the down-resolution evaluation metrics of the panchromatic sharpening results, namely SAM, ERGAS, Q4, and CC. Table 2 lists the full-resolution evaluation metrics, namely $D_\lambda$, $D_S$, and QNR. Tables 1 and 2 report the performance of the six deep learning-based panchromatic sharpening methods in Experiments 1 and 2 at down-resolution and at full resolution, respectively. The proposed Multi-Sequence Convolutional Recurrent Network for Pansharpening (MCRNN) achieves the lowest SAM and ERGAS and the highest Q4 and CC among all methods at down-resolution, and the lowest $D_\lambda$ and $D_S$ and the highest QNR at full resolution. Overall, MCRNN demonstrates good performance and stability in both the down-resolution and the full-resolution evaluations.
[0075] Table 1. Results of the down-resolution experiment
[0076]
[0077] Table 2. Results of the full-resolution experiment
[0078]
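Several of the metrics named in paragraph [0074] have standard definitions that can be sketched in a few lines. The code below shows SAM (in degrees), ERGAS, CC, and the final QNR combination; it assumes images in (H, W, bands) layout, unit exponents in QNR, and that $D_\lambda$ and $D_S$ have already been computed elsewhere. Q4 and the distortion indices themselves are omitted. The function names are illustrative and are not taken from the source.

```python
import numpy as np


def sam_degrees(ref, fused, eps=1e-12):
    """Spectral Angle Mapper: mean angle between per-pixel spectral vectors."""
    dot = np.sum(ref * fused, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(fused, axis=-1)
    angles = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return float(np.degrees(angles.mean()))


def ergas(ref, fused, scale):
    """ERGAS with `scale` = ratio of MS to PAN pixel size (e.g. 4); lower is better."""
    rmse = np.sqrt(np.mean((ref - fused) ** 2, axis=(0, 1)))   # per-band RMSE
    mean = np.mean(ref, axis=(0, 1))                           # per-band reference mean
    return float(100.0 / scale * np.sqrt(np.mean((rmse / mean) ** 2)))


def cc(ref, fused):
    """Pearson correlation coefficient averaged over bands; higher is better."""
    vals = [np.corrcoef(ref[..., b].ravel(), fused[..., b].ravel())[0, 1]
            for b in range(ref.shape[-1])]
    return float(np.mean(vals))


def qnr(d_lambda, d_s, alpha=1.0, beta=1.0):
    """Quality with No Reference from spectral (D_lambda) and spatial (D_S) distortion."""
    return (1.0 - d_lambda) ** alpha * (1.0 - d_s) ** beta
```

SAM, ERGAS, and CC need a reference image, which is why they are reported for the down-resolution experiment, whereas QNR is computed without a reference and is therefore used at full resolution.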
Claims
1. A panchromatic sharpening method based on a multi-sequence recurrent convolutional neural network, comprising the following steps:
(1) First, the original multispectral image is upsampled to the size of the panchromatic image using bicubic interpolation according to the scaling factor S. Then, the multispectral image and the panchromatic image are stacked to form multi-sequence data $x_i$;
(2) Construct a feature extraction subnetwork and feed the multi-sequence data $x_i$ into it to extract the shallow features of each band;
(3) Construct a feature fusion subnetwork that models the intra-band and inter-band relationships of the sequence features output by the feature extraction subnetwork and captures the local dependencies and global correlations between different bands, thereby improving the performance of panchromatic sharpening; the constructed feature fusion subnetwork includes a convGRU unit; let $f_i$ be the feature map, and let $z_i$ and $r_i$ be the update gate and the reset gate, respectively; the update gate $z_i$ is the logic gate that controls when the activation $h_i$ is updated, and is expressed as: $z_i = \sigma(W_z * f_i + U_z * h_{i-1})$ (6), where $\sigma$ is the activation function, $W_z$ and $U_z$ in the update gate represent the weight matrices from the current input layer to the hidden layer and from the previous hidden layer to the current hidden layer, respectively, $h_{i-1}$ is the output activation of the previous hidden layer, and $*$ represents the convolution operator; the reset gate $r_i$ decides, when the candidate activation is computed, whether the previous activation $h_{i-1}$ should be discarded, and is defined as: $r_i = \sigma(W_r * f_i + U_r * h_{i-1})$ (7), where $W_r$ and $U_r$ in the reset gate represent the weight matrices from the current input layer to the hidden layer and from the previous hidden layer to the current hidden layer, respectively; the candidate activation takes $[f_i, h_{i-1}]$ as input and is given by formula (8), in which $W_h$ and $U_h$ likewise represent the weight matrices from the current input layer to the hidden layer and from the previous hidden layer to the current hidden layer, respectively; finally, the hidden state $h_i$ of the unit, which is its output activation, takes the candidate activation as input and is obtained from formula (9), in which $\odot$ represents the Hadamard operator; in summary, the feature maps $f_i$ of all bands are input into the feature fusion subnetwork and the feature fusion result is obtained according to formulas (6)-(9); global average pooling is then applied along the multi-sequence dimension so that the output feature fusion result is fixed at the target resolution, according to formula (10);
(4) Construct a feature restoration subnetwork and restore the fusion result output by the feature fusion subnetwork into a multispectral image with high spatial resolution and high spectral resolution.
2. The method as described in claim 1, characterized in that, in step (1), $g_{PAN}$ is a panchromatic image containing one spectral band; the original multispectral image contains M spectral bands and, after bicubic interpolation according to the scaling factor S, yields a multispectral image that has the size of the panchromatic image and contains M spectral bands; the panchromatic image and the multispectral image are then stacked to form the multi-sequence data $x_i$; this process is represented by formulas (1) and (2).
3. The method as described in claim 2, characterized in that, in step (2), the structure of the feature extraction subnetwork is: two consecutive convolutional layers, with two residual blocks embedded between these two convolutional layers; each residual block consists of two convolutional layers and a skip connection, and an activation layer is embedded after each convolutional layer; the residual blocks satisfy the following conditions: $y_i = h(x_i) + R(x_i, B_i)$ (3), $x_{i+1} = F(y_i)$ (4), where $x_i$ and $x_{i+1}$ are the input and output of the residual block, $R(\cdot)$ is the residual function, $F(y_i)$ is the activation function, and $h(x_i)$ is the identity mapping function; in the model, the parameters of the residual blocks are shared to maintain a relative balance between fusion accuracy and the number of parameters; therefore, the multi-sequence data $x_i$ are input sequentially into the feature extraction network $\varphi$ to obtain the feature map $f_i$ of each band, and the entire process can be represented as: $f_i = \varphi(x_i),\ i = 1, 2, ..., M+1$ (5); the kernel size of the convolutional layers is 3×3.
4. The method as described in claim 3, characterized in that, in step (4), a feature restoration subnetwork is constructed, which includes a convolutional layer and uses a convolutional kernel of size 1×1 to restore the fusion result of the feature fusion subnetwork to a multispectral image of the target resolution.