Frequency-space joint image fusion method, device and electronic equipment
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUIJING FUTURE (XIAN) TECHNOLOGY CO LTD
- Filing Date
- 2025-10-22
- Publication Date
- 2026-06-26
AI Technical Summary
In existing technologies, convolutional neural networks struggle to model global semantics and Transformer encoders incur high computational overhead, so existing image fusion methods lack detailed texture information and produce poor-quality fused images.
A frequency-space joint image fusion method is adopted. Optical images and synthetic aperture radar (SAR) images are respectively input into the shallow and deep feature extraction modules of the image fusion model. Feature information is extracted using semi-instance normalized residual units and frequency domain information enhancement units. Complementary fusion of frequency domain and spatial domain features is achieved through a dual-domain feature fusion module and a fused image reconstruction module.
It achieves efficient joint modeling of frequency domain and spatial domain features, strengthens the expression of high-frequency texture and structural features, and improves the quality and global consistency of fused images.
Figure CN121353094B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image processing technology, and in particular to a frequency-space joint image fusion method, apparatus and electronic device.
Background Technology
[0002] Image fusion enhances the performance of perception and vision tasks by integrating complementary information from multiple image sources. As a typical task, optical-synthetic aperture radar (SAR) fusion combines the texture and color advantages of optics with the all-weather imaging and structural information of SAR, making it valuable in military reconnaissance, target detection, and environmental monitoring.
[0003] While deep neural networks have driven the development of multimodal fusion, existing methods still have the following shortcomings: convolutional neural networks (CNNs) struggle to model global semantics, and Transformer encoders have excessively high computational overhead. Moreover, CNNs and Transformers both operate only in the spatial domain and therefore lack detailed texture information, resulting in poor-quality fused images.
Summary of the Invention
[0004] This invention provides a frequency-space joint image fusion method, apparatus, and electronic device to solve the problem of poor quality of fused images due to lack of detailed texture information in the prior art.
[0005] This invention provides a frequency-space joint image fusion method, comprising:
[0006] Optical images and synthetic aperture radar (SAR) images are respectively input into the shallow feature extraction module in the image fusion model to obtain the first shallow feature information of the optical image and the second shallow feature information of the SAR image output by the shallow feature extraction module.
[0007] The first shallow feature information and the second shallow feature information are respectively input into the deep feature extraction module in the image fusion model to obtain the first deep feature information of the optical image and the second deep feature information of the SAR image output by the deep feature extraction module;
[0008] The first deep feature information and the second deep feature information are respectively input into the dual-domain feature fusion module in the image fusion model to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module.
[0009] The first shallow fusion feature information and the second shallow fusion feature information are input into the fused image reconstruction module in the image fusion model to obtain the fused image output by the fused image reconstruction module.
[0010] The image fusion model is trained based on sample optical images and sample SAR images.
[0011] According to the present invention, a frequency-space joint image fusion method is provided, wherein the shallow feature extraction module includes a semi-instance normalized residual unit and a frequency domain information enhancement unit;
[0012] An optical image is input into a shallow feature extraction module in an image fusion model to obtain the first shallow feature information of the optical image output by the shallow feature extraction module, including:
[0013] The optical image is input into the semi-instance normalized residual unit to obtain the residual feature information output by the semi-instance normalized residual unit;
[0014] The residual feature information is input into the frequency domain information enhancement unit to obtain the first shallow feature information of the optical image output by the frequency domain information enhancement unit.
[0015] According to the frequency-space joint image fusion method provided by the present invention, the semi-instance normalized residual unit includes at least one convolutional layer, at least one activation function, and at least one half-channel normalization subunit;
[0016] The step of inputting the optical image into the semi-instance normalized residual unit to obtain the residual feature information output by the semi-instance normalized residual unit includes:
[0017] The optical image is input into the first convolutional layer to obtain the first convolutional feature information output by the first convolutional layer;
[0018] The first convolutional feature information is input into the first half-channel normalization subunit to obtain the first channel normalized feature information output by the first half-channel normalization subunit;
[0019] The first channel normalized feature information is input into the second half-channel normalization subunit to obtain the second channel normalized feature information output by the second half-channel normalization subunit;
[0020] The second channel normalized feature information is input into the third half-channel normalization subunit to obtain the third channel normalized feature information output by the third half-channel normalization subunit;
[0021] The third channel normalized feature information is input into the second convolutional layer to obtain the second convolutional feature information output by the second convolutional layer;
[0022] The second convolutional feature information is input into the third convolutional layer to obtain the residual feature information output by the third convolutional layer.
[0023] According to the frequency-space joint image fusion method provided by the present invention, inputting the residual feature information into the frequency domain information enhancement unit to obtain the first shallow feature information of the optical image output by the frequency domain information enhancement unit comprises:
[0024] The residual feature information is subjected to FFT transformation to obtain the first frequency domain feature information;
[0025] Based on the first frequency domain feature information, the first amplitude and the first phase are extracted;
[0026] The first amplitude and the first phase are concatenated along the channel dimension to obtain the first frequency domain feature tensor.
[0027] The first frequency domain feature tensor is input into the multilayer perceptron to obtain the perceptron features output by the multilayer perceptron;
[0028] Based on the perceptron features, the second amplitude and the second phase are determined;
[0029] Based on the second amplitude and the second phase, determine the second frequency domain feature tensor;
[0030] Perform an inverse Fourier transform on the second frequency domain feature tensor to obtain the spatial domain features;
[0031] Based on the spatial domain features and the residual feature information, the first shallow layer feature information of the optical image is determined.
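The steps of the frequency domain information enhancement unit listed above can be written compactly as follows. This is an illustrative restatement, not the patent's own formula: the symbols F, A, P, Z and the primed quantities are notation introduced here, splitting the perceptron output into amplitude and phase is an assumption, and the final additive combination is also an assumption, since the text only states that the output is determined based on the spatial domain features and the residual feature information.

```latex
\begin{aligned}
F &= \mathcal{F}(X_{HFE}) && \text{FFT of the residual feature information} \\
A &= |F|, \quad P = \angle F && \text{first amplitude and first phase} \\
Z &= \mathrm{Concat}(A, P) && \text{first frequency domain feature tensor} \\
(A', P') &= \mathrm{Split}\big(\mathrm{MLP}(Z)\big) && \text{second amplitude and second phase (assumed split)} \\
F' &= A' \, e^{\,jP'} && \text{second frequency domain feature tensor} \\
X_{s} &= \mathcal{F}^{-1}(F') && \text{spatial domain features} \\
X_{shallow} &= X_{HFE} + X_{s} && \text{first shallow feature information (assumed sum)}
\end{aligned}
```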
[0032] According to the present invention, a frequency-space joint image fusion method is provided, wherein the deep feature extraction module includes a sequence embedding unit and at least one global feature capture unit;
[0033] The first shallow feature information is input into the deep feature extraction module in the image fusion model to obtain the first deep feature information of the optical image output by the deep feature extraction module, including:
[0034] The first shallow feature information is input into the sequence embedding unit to obtain the feature sequence output by the sequence embedding unit;
[0035] The feature sequence is input to the at least one global feature capture unit to obtain the first deep feature information of the optical image output by the at least one global feature capture unit.
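The text does not define how the sequence embedding unit turns a feature map into a feature sequence. The following is a minimal sketch under the assumption of a plain row-major flatten of the spatial grid, so that each spatial position becomes one token; the function names `to_sequence` and `from_sequence` are hypothetical, and a real embedding unit may also apply a learned projection.

```python
import numpy as np


def to_sequence(feat):
    """Flatten a (batch, channels, height, width) feature map into a
    (batch, height * width, channels) token sequence in row-major order.

    Assumption: no learned projection, only a reshape.
    """
    b, c, h, w = feat.shape
    return feat.reshape(b, c, h * w).transpose(0, 2, 1)


def from_sequence(seq, height, width):
    """Inverse of to_sequence: restore the (batch, channels, H, W) layout."""
    b, n, c = seq.shape
    if n != height * width:
        raise ValueError("sequence length does not match height * width")
    return seq.transpose(0, 2, 1).reshape(b, c, height, width)
```

With this layout, the token at index `row * width + col` holds the channel vector of spatial position `(row, col)`, so a sequence model can scan the image and the result can be reshaped back for the later fusion stages.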
[0036] According to a frequency-space joint image fusion method provided by the present invention, the dual-domain feature fusion module includes: a channel conversion unit, a frequency domain fusion unit, and two shallow fusion units;
[0037] The first deep feature information and the second deep feature information are respectively input into the dual-domain feature fusion module in the image fusion model to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module, including:
[0038] The first deep feature information and the second deep feature information are respectively input to the channel conversion unit to obtain the first exchange information and the second exchange information output by the channel conversion unit;
[0039] The first deep feature information and the second deep feature information are respectively input into the frequency domain fusion unit to obtain the second frequency domain feature information output by the frequency domain fusion unit;
[0040] The second frequency domain feature information and the first exchange information are input into the first shallow fusion unit to obtain the first shallow fusion feature information output by the first shallow fusion unit;
[0041] The second frequency domain feature information and the second exchange information are input into the second shallow fusion unit to obtain the second shallow fusion feature information output by the second shallow fusion unit.
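The text names a channel conversion unit that produces first and second exchange information but does not state the exchange rule. The sketch below assumes the simplest possible rule, swapping a fixed leading fraction of channels between the two modalities; the function name `exchange_channels` and the `ratio` parameter are hypothetical and are not taken from the source.

```python
import numpy as np


def exchange_channels(feat_a, feat_b, ratio=0.5):
    """Swap the leading `ratio` fraction of channels between two feature maps.

    feat_a, feat_b: arrays of shape (batch, channels, height, width).
    Assumption: a fixed, non-learned swap; the source does not specify the rule.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    k = int(feat_a.shape[1] * ratio)
    out_a, out_b = feat_a.copy(), feat_b.copy()
    out_a[:, :k] = feat_b[:, :k]  # first exchange information
    out_b[:, :k] = feat_a[:, :k]  # second exchange information
    return out_a, out_b
```

Each output keeps part of its own modality and receives part of the other, which is one way the two shallow fusion units could each see cross-modal information alongside the shared frequency domain features.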
[0042] According to the frequency-space joint image fusion method provided by the present invention, inputting the first deep feature information and the second deep feature information into the frequency domain fusion unit to obtain the second frequency domain feature information output by the frequency domain fusion unit comprises:
[0043] Convolutions are performed on the first deep feature information and the second deep feature information respectively to obtain the third convolution feature information corresponding to the first deep feature information and the fourth convolution feature information corresponding to the second deep feature information;
[0044] FFT transformations are performed on the third convolutional feature information and the fourth convolutional feature information respectively to obtain the third frequency domain feature information corresponding to the third convolutional feature information and the fourth frequency domain feature information corresponding to the fourth convolutional feature information;
[0045] Based on the third frequency domain feature information and the fourth frequency domain feature information, the third amplitude and third phase corresponding to the third frequency domain feature information, and the fourth amplitude and fourth phase corresponding to the fourth frequency domain feature information are determined.
[0046] The third amplitude and the fourth amplitude are fused to obtain the fused amplitude;
[0047] The third phase and the fourth phase are fused to obtain a fused phase;
[0048] The second frequency domain feature information is determined based on the fusion amplitude and the fusion phase.
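The amplitude and phase fusion steps above can be sketched as follows. The text does not say how the amplitudes or phases are fused, so this example assumes a weighted average of amplitudes and a weighted average of unit phasors for the phases (which avoids wrap-around problems near ±π). The function name, the `alpha` weight, and the omission of the preceding convolutions are all assumptions made for illustration.

```python
import numpy as np


def fuse_in_frequency_domain(feat_opt, feat_sar, alpha=0.5):
    """Fuse two (batch, channels, height, width) feature maps in the frequency domain.

    Returns the fused complex spectrum. The convolutions that precede the FFT
    in the described unit are omitted here.
    """
    # 2-D FFT over the spatial axes of each modality.
    f_opt = np.fft.fft2(feat_opt, axes=(-2, -1))
    f_sar = np.fft.fft2(feat_sar, axes=(-2, -1))

    # Third/fourth amplitude and phase.
    amp_opt, pha_opt = np.abs(f_opt), np.angle(f_opt)
    amp_sar, pha_sar = np.abs(f_sar), np.angle(f_sar)

    # Assumed amplitude fusion: convex combination.
    amp_fused = alpha * amp_opt + (1.0 - alpha) * amp_sar

    # Assumed phase fusion: angle of the weighted sum of unit phasors.
    phasor = alpha * np.exp(1j * pha_opt) + (1.0 - alpha) * np.exp(1j * pha_sar)
    pha_fused = np.angle(phasor)

    # Recombine fused amplitude and fused phase into a complex spectrum.
    return amp_fused * np.exp(1j * pha_fused)
```

If a spatial-domain tensor is needed downstream, `np.fft.ifft2` over the same axes can be applied to the returned spectrum; whether the described unit does so is not stated in the text.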
[0049] According to the frequency-space joint image fusion method provided by the present invention, inputting the first shallow fusion feature information and the second shallow fusion feature information into the fused image reconstruction module in the image fusion model to obtain the fused image output by the fused image reconstruction module comprises:
[0050] Based on the first shallow fusion feature information and the second shallow fusion feature information, the third shallow fusion feature information is determined;
[0051] Based on the third shallow-layer fusion feature information, the restored image is determined;
[0052] The restored image is convolved to obtain the fused image.
[0053] The present invention also provides a frequency-space joint image fusion apparatus, comprising:
[0054] The first feature extraction module is used to input the optical image and the synthetic aperture radar (SAR) image into the shallow feature extraction module in the image fusion model, respectively, to obtain the first shallow feature information of the optical image and the second shallow feature information of the SAR image output by the shallow feature extraction module.
[0055] The second feature extraction module is used to input the first shallow feature information and the second shallow feature information into the deep feature extraction module in the image fusion model, respectively, to obtain the first deep feature information of the optical image and the second deep feature information of the SAR image output by the deep feature extraction module;
[0056] The feature fusion module is used to input the first deep feature information and the second deep feature information into the dual-domain feature fusion module in the image fusion model, respectively, to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module.
[0057] An image reconstruction module is used to input the first shallow fusion feature information and the second shallow fusion feature information into the fused image reconstruction module in the image fusion model to obtain the fused image output by the fused image reconstruction module.
[0058] The image fusion model is trained based on sample optical images and sample SAR images.
[0059] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the frequency-space joint image fusion method as described above.
[0060] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the frequency-space joint image fusion method as described above.
[0061] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the frequency-space joint image fusion method as described above.
[0062] In the frequency-space joint image fusion method, apparatus, and electronic device provided by this invention, optical images and SAR images are respectively input into a shallow feature extraction module in an image fusion model to obtain first shallow feature information of the optical image and second shallow feature information of the SAR image output by the shallow feature extraction module; the first shallow feature information and the second shallow feature information are respectively input into a deep feature extraction module in the image fusion model to obtain first deep feature information of the optical image and second deep feature information of the SAR image output by the deep feature extraction module; the first deep feature information and the second deep feature information are respectively input into a dual-domain feature fusion module in the image fusion model to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module; and the first and second shallow fusion feature information are input into the fused image reconstruction module in the image fusion model to obtain the fused image output by the fused image reconstruction module. Because the image fusion model is trained on sample optical images and sample SAR images, its shallow feature extraction module, deep feature extraction module, dual-domain feature fusion module, and fused image reconstruction module together achieve efficient joint modeling of frequency domain and spatial domain features and strengthen the expression of high-frequency texture and structural features, thereby achieving complementary fusion of frequency domain and spatial domain features. Global consistency is ensured while fusion performance and the quality of the fused image are further improved.
Attached Figure Description
[0063] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0064] Figure 1 is a schematic flowchart of the frequency-space joint image fusion method provided by the present invention;
[0065] Figure 2 is a schematic diagram of the structure of the semi-instance normalized residual unit provided by the present invention;
[0066] Figure 3 is a schematic diagram of the frequency domain information enhancement unit provided by the present invention;
[0067] Figure 4 is a schematic diagram of the frequency domain fusion unit provided by the present invention;
[0068] Figure 5 is a schematic diagram of the image fusion model provided by the present invention;
[0069] Figure 6 is a schematic diagram showing the comparison results of different fusion methods provided by the present invention;
[0070] Figure 7 is a schematic diagram of the frequency-space joint image fusion apparatus provided by the present invention;
[0071] Figure 8 is a schematic diagram of the structure of the electronic device provided by the present invention.
Detailed Implementation
[0072] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0073] The frequency-space joint image fusion method of the present invention is described below with reference to Figures 1-6.
[0074] Figure 1 is a schematic flowchart of the frequency-space joint image fusion method provided by the present invention. As shown in Figure 1, the method includes steps 101-104.
[0075] Step 101: Input the optical image and SAR image into the shallow feature extraction module in the image fusion model respectively to obtain the first shallow feature information of the optical image and the second shallow feature information of the SAR image output by the shallow feature extraction module.
[0076] Specifically, this invention proposes an image fusion model (FMambaFuse), which fuses optical images and SAR images to obtain a fused image through a shallow feature extraction module, a deep feature extraction module, a dual-domain feature fusion module, and a fused image reconstruction module. The shallow feature extraction module extracts spatial domain and frequency domain feature information from the optical and SAR images respectively, and then fuses these spatial and frequency domain feature information to obtain shallow feature information.
[0077] By inputting the optical image and the SAR image into the shallow feature extraction module of the image fusion model, the first shallow feature information of the optical image and the second shallow feature information of the SAR image can be obtained from the output of the shallow feature extraction module. The first shallow feature information and the second shallow feature information are preliminary spatial detail information.
[0078] Step 102: Input the first shallow feature information and the second shallow feature information into the deep feature extraction module in the image fusion model, respectively, to obtain the first deep feature information of the optical image and the second deep feature information of the SAR image output by the deep feature extraction module.
[0079] Specifically, the deep feature extraction module is used to extract high-level features from optical images and SAR images respectively, and to capture two-dimensional global features to obtain deep feature information.
[0080] By inputting the first shallow feature information and the second shallow feature information into the deep feature extraction module in the image fusion model, the first deep feature information of the optical image and the second deep feature information of the SAR image can be obtained from the output of the deep feature extraction module. The first deep feature information refers to the deep optical image information of the optical image and the second deep feature information refers to the deep SAR image information of the SAR image.
[0081] Step 103: Input the first deep feature information and the second deep feature information into the dual-domain feature fusion module in the image fusion model to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module.
[0082] Specifically, the dual-domain feature fusion module is used to fuse the first deep feature information and the second deep feature information to obtain shallow fused features.
[0083] By inputting the first deep feature information and the second deep feature information into the dual-domain feature fusion module in the image fusion model, the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module can be obtained.
[0084] Step 104: Input the first shallow fusion feature information and the second shallow fusion feature information into the fused image reconstruction module in the image fusion model to obtain the fused image output by the fused image reconstruction module; wherein the image fusion model is trained based on sample optical images and sample SAR images.
[0085] Specifically, the fused image reconstruction module is used to reconstruct the image based on the first shallow fusion feature information and the second shallow fusion feature information to obtain the fused image. The image fusion model is trained based on sample optical images and sample SAR images.
[0086] By inputting the first and second shallow fusion feature information into the fused image reconstruction module in the image fusion model, the fused image output by the fused image reconstruction module can be obtained.
[0087] In the frequency-space joint image fusion method provided by this invention, optical images and SAR images are respectively input into a shallow feature extraction module in an image fusion model to obtain first shallow feature information of the optical image and second shallow feature information of the SAR image output by the shallow feature extraction module; the first and second shallow feature information are respectively input into a deep feature extraction module in the image fusion model to obtain first deep feature information of the optical image and second deep feature information of the SAR image output by the deep feature extraction module; the first and second deep feature information are respectively input into a dual-domain feature fusion module in the image fusion model to obtain the first and second shallow fusion feature information output by the dual-domain feature fusion module; and the first and second shallow fusion feature information are input into the fused image reconstruction module in the image fusion model to obtain the fused image output by the fused image reconstruction module. Because the image fusion model is trained on sample optical images and sample SAR images, its shallow feature extraction module, deep feature extraction module, dual-domain feature fusion module, and fused image reconstruction module together realize efficient joint modeling of frequency domain and spatial domain features and strengthen the expression of high-frequency texture and structural features, thereby realizing complementary fusion of frequency domain and spatial domain features, ensuring global consistency while further improving fusion performance and the quality of the fused image.
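The data flow of steps 101-104 can be sketched as a simple composition of four stages. This is only a wiring illustration: the function name `fuse_images` and the four callables are hypothetical placeholders, and applying the same callable to both modalities is an assumption, since the text does not state whether the optical and SAR branches share parameters.

```python
import numpy as np


def fuse_images(optical, sar, shallow, deep, dual_domain_fusion, reconstruct):
    """Wire the four stages of the described fusion pipeline together.

    shallow, deep: callables mapping one feature array to another.
    dual_domain_fusion: callable taking two arrays and returning two arrays.
    reconstruct: callable taking two arrays and returning the fused image.
    """
    # Step 101: shallow features of each modality.
    shallow_opt, shallow_sar = shallow(optical), shallow(sar)
    # Step 102: deep features of each modality.
    deep_opt, deep_sar = deep(shallow_opt), deep(shallow_sar)
    # Step 103: dual-domain fusion yields two shallow fusion features.
    fused_1, fused_2 = dual_domain_fusion(deep_opt, deep_sar)
    # Step 104: reconstruct the fused image.
    return reconstruct(fused_1, fused_2)
```

Passing the stages in as callables keeps the sketch independent of any particular implementation of the individual modules.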
[0088] Optionally, the shallow feature extraction module includes a semi-instance normalized residual unit (HFE) and a frequency domain information enhancement unit (FEU); the specific implementation of step 101 above includes:
[0089] The optical image is input to the semi-instance normalized residual unit to obtain the residual feature information output by the semi-instance normalized residual unit; the residual feature information is input to the frequency domain information enhancement unit to obtain the first shallow feature information of the optical image output by the frequency domain information enhancement unit.
[0090] Specifically, by inputting the optical image into the semi-instance normalized residual unit, the residual feature information output by the semi-instance normalized residual unit can be obtained; by inputting the residual feature information into the frequency domain information enhancement unit, the first shallow feature information of the optical image output by the frequency domain information enhancement unit can be obtained.
[0091] Optionally, the semi-instance normalized residual unit includes at least one convolutional layer, at least one activation function, and at least one half-channel normalization sub-unit; the step of inputting the optical image into the semi-instance normalized residual unit to obtain the residual feature information output by the semi-instance normalized residual unit includes:
[0092] (1) Input the optical image into the first convolutional layer to obtain the first convolutional feature information output by the first convolutional layer.
[0093] Specifically, the convolutional layers are CNN layers, such as 3x3 convolutional layers; the activation function is Leaky ReLU. The number of convolutional layers and activation functions can be set according to the actual situation and there is no limit to this.
[0094] By inputting the optical image into the first convolutional layer, the first convolutional feature information output by the first convolutional layer can be obtained. For example, the first convolutional feature information $X_{conv}$ has a shape of 4×30×128×128.
[0095] (2) Input the first convolutional feature information into the first half-channel normalization subunit to obtain the first channel normalized feature information output by the first half-channel normalization subunit.
[0096] Specifically, the half-channel normalization subunit includes two convolutional layers, two activation functions, a half-channel normalization module, an instance normalization module, and a concatenation module. The first convolutional feature information is input into the first convolutional layer, the convolutional information output by the first convolutional layer is input into the first activation function, and the result $X_{act}$ output by the first activation function is input into the half-channel normalization module. The half-channel normalization module splits $X_{act}$ into two halves along the channel dimension. The instance normalization module applies instance normalization to one half, i.e. $\mathrm{InstanceNorm}(x) = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^{2} + \varepsilon}} + \beta$, while the other half remains unchanged. The concatenation module then concatenates the two halves along the channel dimension. Here InstanceNorm denotes instance normalization, $\mu$ and $\sigma^{2}$ are the mean and variance, respectively, $\gamma$ and $\beta$ are learnable parameters, and $\varepsilon$ is a constant that prevents division by zero (e.g., $10^{-5}$). The concatenated result is then passed through the second convolutional layer and the second activation function to obtain the first channel normalized feature information.
[0097] (3) Input the first channel normalized feature information into the second half-channel normalization subunit to obtain the second channel normalized feature information output by the second half-channel normalization subunit.
[0098] Specifically, in the same way as described in step (2) above, the first channel normalized feature information is input into the second half-channel normalization subunit, and the second channel normalized feature information output by the second half-channel normalization subunit is obtained.
[0099] (4) Input the second channel normalized feature information into the third half-channel normalization subunit to obtain the third channel normalized feature information output by the third half-channel normalization subunit.
[0100] Specifically, in the same way as described in step (2) above, the second channel normalized feature information is input into the third half-channel normalization subunit, and the third channel normalized feature information output by the third half-channel normalization subunit is obtained.
[0101] (5) Input the third channel normalized feature information into the second convolutional layer to obtain the second convolutional feature information output by the second convolutional layer.
[0102] Specifically, by inputting the third channel normalized feature information into the second convolutional layer, the second convolutional feature information output by the second convolutional layer can be obtained.
[0103] (6) Input the second convolutional feature information into the third convolutional layer to obtain the residual feature information output by the third convolutional layer.
[0104] Specifically, by inputting the second convolutional feature information into the third convolutional layer, the residual feature information output by the third convolutional layer can be obtained.
[0105] In this invention, the HFE module performs normalization on half of the features to stabilize the distribution, reducing the difference in feature value ranges between different samples and different training stages, while retaining the other half of the unnormalized features to preserve key information and context. Finally, two convolutional layers are used for further detail extraction.
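The half-channel normalization at the core of the HFE unit can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `half_instance_norm` is hypothetical, normalizing the first half of the channels (rather than the second) is an assumption, and `gamma` and `beta` are plain arrays standing in for the learnable parameters. The surrounding convolutions and Leaky ReLU activations are omitted.

```python
import numpy as np


def half_instance_norm(x, gamma=None, beta=None, eps=1e-5):
    """Instance-normalize half of the channels and pass the other half through.

    x: array of shape (batch, channels, height, width) with an even channel count.
    gamma, beta: per-channel scale and shift for the normalized half
                 (stand-ins for learnable parameters).
    """
    b, c, h, w = x.shape
    if c % 2 != 0:
        raise ValueError("channel count must be even")
    half = c // 2
    # Assumption: the first half is normalized, the second half is kept.
    x_norm, x_keep = x[:, :half], x[:, half:]

    if gamma is None:
        gamma = np.ones(half, dtype=x.dtype)
    if beta is None:
        beta = np.zeros(half, dtype=x.dtype)

    # Instance normalization: statistics per sample and per channel,
    # computed over the spatial axes only.
    mu = x_norm.mean(axis=(2, 3), keepdims=True)
    var = x_norm.var(axis=(2, 3), keepdims=True)
    x_hat = (x_norm - mu) / np.sqrt(var + eps)
    x_hat = gamma.reshape(1, half, 1, 1) * x_hat + beta.reshape(1, half, 1, 1)

    # Concatenate the normalized and untouched halves along the channel axis.
    return np.concatenate([x_hat, x_keep], axis=1)
```

For an input of shape 4×30×128×128, 15 channels would be normalized and 15 would retain their original scale and intensity, which matches the stated goal of stabilizing the distribution without discarding that information.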
[0106] Figure 2 is a schematic diagram of the structure of the semi-instance normalized residual unit provided by the present invention. As shown in Figure 2, the semi-instance normalization residual unit (HFE) includes 3 CNN layers, 2 Leaky ReLU activation functions, and 3 semi-channel normalization subunits. Each semi-channel normalization subunit includes 2 CNN layers, 2 Leaky ReLU activation functions, a semi-channel normalization module (Norm), an instance normalization module, and a concatenation module. Specifically, the optical image $X$ is input to the first CNN layer to obtain the first convolutional feature information $X_{conv}$ output by the first CNN layer. The first convolutional feature information $X_{conv}$ is input into the first CNN layer of the first half-channel normalization subunit, and the output of that CNN layer is input into the first Leaky ReLU activation function to obtain $X_{act}$. $X_{act}$ is then input into the half-channel normalization module, which performs half-channel normalization on $X_{act}$: the channel dimension of $X_{act}$ is split into two halves, Norm applies instance normalization to one half of the channels, and the other half of the channels remains unchanged. The concatenation module then concatenates the two halves along the channel dimension to obtain the half-normalized $X_{act}$. Here InstanceNorm denotes instance normalization, $\mathrm{InstanceNorm}(x) = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta$, where $\mu$ and $\sigma^2$ are the mean and variance, respectively, $\gamma$ and $\beta$ are learnable parameters, and $\varepsilon$ is a constant that prevents division by zero (e.g., $10^{-5}$). The half-normalized $X_{act}$ then passes through the second CNN layer and the second Leaky ReLU of the subunit to obtain the first channel normalized feature information. The first channel normalized feature information is input into the second half-channel normalization subunit to obtain the second channel normalized feature information output by the second half-channel normalization subunit. The second channel normalized feature information is then input into the third half-channel normalization subunit to obtain the third channel normalized feature information output by the third half-channel normalization subunit. The third channel normalized feature information is input into the second CNN layer of the semi-instance normalization residual unit to obtain the convolution information output by the second CNN layer. This convolution information is input into the second Leaky ReLU activation function, the output of the second Leaky ReLU activation function is input into the third CNN layer, and the output of the third CNN layer is input into the third Leaky ReLU activation function to obtain the residual feature information $X_{HFE}$. In this way, the HFE module normalizes half of the channel features to stabilize the distribution, while retaining the unnormalized channel features in the other half to preserve key information and context. This design allows the HFE module to benefit from the stability and feature generalization of instance normalization without completely losing scale and intensity information, thus maintaining image details and content even at a shallow level.
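As a non-limiting illustration, the half-channel normalization step described above can be sketched in NumPy. The function name `half_instance_norm`, the choice of normalizing the first half of the channels (the text does not say which half), and the optional `gamma` and `beta` arguments are assumptions made for this sketch; the convolution and Leaky ReLU layers of the subunit are omitted.

```python
import numpy as np

def half_instance_norm(x, gamma=None, beta=None, eps=1e-5):
    """Instance-normalize half of the channels of x (B, C, H, W); keep the rest.

    Assumption: the first C // 2 channels are the normalized half.
    """
    half = x.shape[1] // 2
    x_norm, x_keep = x[:, :half], x[:, half:]
    # Instance normalization: statistics per sample and per channel over H, W.
    mu = x_norm.mean(axis=(2, 3), keepdims=True)
    var = x_norm.var(axis=(2, 3), keepdims=True)
    x_norm = (x_norm - mu) / np.sqrt(var + eps)
    if gamma is not None:  # optional learnable scale, shape (half,)
        x_norm = x_norm * gamma.reshape(1, half, 1, 1)
    if beta is not None:   # optional learnable shift, shape (half,)
        x_norm = x_norm + beta.reshape(1, half, 1, 1)
    # Concatenate normalized and untouched halves along the channel dimension.
    return np.concatenate([x_norm, x_keep], axis=1)
```

The untouched half carries the original scale and intensity of the features, which is the property the paragraph above attributes to the HFE design.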
[0107] Optionally, the step of inputting the residual feature information into the frequency domain information enhancement unit to obtain the first shallow feature information of the optical image output by the frequency domain information enhancement unit includes:
[0108] The residual feature information is subjected to FFT transformation to obtain first frequency domain feature information; based on the first frequency domain feature information, a first amplitude and a first phase are extracted; the first amplitude and the first phase are concatenated along the channel dimension to obtain a first frequency domain feature tensor; the first frequency domain feature tensor is input into a multilayer perceptron to obtain the perceptron features output by the multilayer perceptron; based on the perceptron features, a second amplitude and a second phase are determined; based on the second amplitude and the second phase, a second frequency domain feature tensor is determined; the second frequency domain feature tensor is subjected to inverse Fourier transform to obtain spatial domain features; based on the spatial domain features and the residual feature information, the first shallow feature information of the optical image is determined.
[0109] Figure 3 is a schematic diagram of the structure of the frequency domain information enhancement unit provided by the present invention. As shown in Figure 3, a Fast Fourier Transform (FFT) is performed on the residual feature information $X_{HFE}$ to obtain the first frequency domain feature information $F$, whose frequency-domain width is $W/2 + 1$. Based on the first frequency domain feature information $F$, the first amplitude $A$ and the first phase $P$ are extracted from the complex-valued spectrum $F$. The first amplitude $A$ and the first phase $P$ are concatenated along the channel dimension to obtain the first frequency domain feature tensor $F_{cat}$. The first frequency domain feature tensor $F_{cat}$ is input into a multilayer perceptron (MLP) to obtain the perceptron features output by the MLP, which still have 2C channels. The perceptron features are then split into two along the channel dimension to obtain the second amplitude and the second phase, that is, a new amplitude and phase. Based on the second amplitude and the second phase, the frequency domain representation is reconstructed to determine the second frequency domain feature tensor. An inverse Fast Fourier Transform (iFFT) is performed on the second frequency domain feature tensor to obtain the spatial domain features $X_f$. The spatial domain features $X_f$ and the residual feature information $X_{HFE}$ are added together to obtain the first sum $X_{concat1}$. The first sum $X_{concat1}$ is input into the Cross Attention module to obtain the attention features $X_{attn}$ output by the Cross Attention module. The attention features $X_{attn}$ and the residual feature information $X_{HFE}$ are added together to obtain the second sum $X_{concat2}$. The second sum $X_{concat2}$ is input into the Fusion Convolution module to obtain the first shallow feature information $X_h$ of the optical image output by the Fusion Convolution module. By selectively processing image features in the frequency domain (enhancing the overall structural information of the image in the low-frequency region and highlighting edges and details in the high-frequency region) while suppressing redundancy and noise, cross-modal feature complementarity and alignment are achieved. Compared to pure spatial domain fusion methods, frequency domain processing can more directly separate and utilize structural and detail information, resulting in a final fusion result that possesses both clear contours and rich textural details.
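As a non-limiting illustration, the frequency branch of the enhancement unit (FFT, amplitude and phase extraction, MLP, reconstruction, inverse FFT, and the first residual sum) can be sketched in NumPy. The function name `frequency_enhance` is hypothetical, the `mlp` argument is a caller-supplied stand-in for the trained multilayer perceptron, the polar form `amp * exp(1j * pha)` is an assumed reconstruction rule, and the Cross Attention and Fusion Convolution stages are omitted.

```python
import numpy as np

def frequency_enhance(x_hfe, mlp):
    """Frequency branch sketch for x_hfe of shape (B, C, H, W).

    mlp: callable mapping a (B, 2C, H, W // 2 + 1) array to the same shape
    (a placeholder for the trained multilayer perceptron).
    """
    h, w = x_hfe.shape[-2:]
    # Real 2D FFT: the last axis shrinks to W // 2 + 1 frequency bins.
    f = np.fft.rfft2(x_hfe, axes=(-2, -1))
    amp, pha = np.abs(f), np.angle(f)
    # Concatenate amplitude and phase along the channel dimension (2C channels).
    f_cat = np.concatenate([amp, pha], axis=1)
    out = mlp(f_cat)
    # Split the perceptron features back into a new amplitude and phase.
    amp2, pha2 = np.split(out, 2, axis=1)
    # Rebuild the complex spectrum and return to the spatial domain.
    f2 = amp2 * np.exp(1j * pha2)
    x_f = np.fft.irfft2(f2, s=(h, w), axes=(-2, -1))
    # First residual sum of spatial-domain features and the input features.
    return x_f + x_hfe
```

With an identity `mlp` the spectrum is unchanged, so the function returns twice its input; this is a convenient sanity check for the transform round trip.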
[0110] It should be noted that the Cross Attention module facilitates information exchange between the frequency domain and the spatial domain. Cross Attention helps the network extract and weight important information from features of different modalities, enabling the frequency-enhanced feature map to be effectively fused with the spatial domain features.
[0111] Optionally, the deep feature extraction module includes a sequence embedding unit and at least one global feature capture unit; the first shallow feature information is input into the deep feature extraction module in the image fusion model to obtain the first deep feature information of the optical image output by the deep feature extraction module, including:
[0112] The first shallow feature information is input to the sequence embedding unit to obtain the feature sequence output by the sequence embedding unit; the feature sequence is input to the at least one global feature capture unit to obtain the first deep feature information of the optical image output by the at least one global feature capture unit.
[0113] Specifically, the deep feature extraction module includes a sequence embedding unit (Patch embedding) and at least one global feature capture unit. The global feature capture unit is a 2D Selective Scan (SS2D) strategy module, i.e., a Mamba module, which captures the two-dimensional global features of the image. The number of global feature capture units can be set according to the actual situation; for example, the number of global feature capture units is 8.
[0114] The first shallow feature information is input into a sequence embedding unit, which splits the first shallow feature information into a sequence to obtain a feature sequence. The feature sequence is then input into at least one global feature capture unit to obtain the first deep feature information $X_{SS2D}$ of the optical image output by the at least one global feature capture unit; that is, $X_{SS2D}$ is obtained by applying the 2D selective scan operation Scan2D and the state-space model SSM to $X_h$, where $X_h$ represents the first shallow feature information.
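As a non-limiting illustration, the sequence embedding step and the traversal orders used by a 2D scan can be sketched in NumPy. The patch size of 4 and the four traversal directions (row-major, column-major, and their reverses) are assumptions borrowed from common 2D selective scan designs and are not stated in the text; the state-space model applied to each sequence is omitted, and the names `patch_embed` and `scan_orders` are hypothetical.

```python
import numpy as np

def patch_embed(x, patch=4):
    """Split x (B, C, H, W) into a row-major sequence of flattened patches.

    Returns an array of shape (B, N, C * patch * patch), where
    N = (H // patch) * (W // patch). H and W must be multiples of patch.
    """
    b, c, h, w = x.shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    x = x.reshape(b, c, gh, patch, gw, patch)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (B, gh, gw, C, patch, patch)
    return x.reshape(b, gh * gw, c * patch * patch)

def scan_orders(tokens, gh, gw):
    """Return four traversals of a (B, gh * gw, D) token sequence."""
    b, n, d = tokens.shape
    grid = tokens.reshape(b, gh, gw, d)
    row = grid.reshape(b, n, d)                        # left-to-right, top-down
    col = grid.transpose(0, 2, 1, 3).reshape(b, n, d)  # top-down, left-to-right
    return [row, col, row[:, ::-1], col[:, ::-1]]      # plus both reversals
```

Scanning the same grid in several directions gives every position context from all sides, which is one way a sequence model can capture two-dimensional global features.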
[0115] High-level features can be extracted by stacking multiple global feature capture units. As more Mamba blocks are stacked, the receptive field expands and the semantic level of the representation rises. Mamba blocks have powerful long-range dependency modeling capabilities and can establish complex connections between the various regions of an image.
[0116] Optionally, the dual-domain feature fusion module includes: a channel conversion unit, a frequency domain fusion unit, and two shallow fusion units; the first deep feature information and the second deep feature information are respectively input into the dual-domain feature fusion module in the image fusion model to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module, including:
[0117] (a) Input the first deep feature information and the second deep feature information into the channel conversion unit respectively to obtain the first exchange information and the second exchange information output by the channel conversion unit.
[0118] Specifically, the dual-domain feature fusion module includes: a channel exchange unit, a frequency-domain fusion unit (FFE), and two shallow fusion units.
[0119] The first and second deep feature information are respectively input to the channel conversion unit. The channel conversion unit exchanges part of the feature information of the first and second deep feature information along the channel dimension to obtain the first exchange information and the second exchange information output by the channel conversion unit. The first exchange information carries part of the feature information of the second deep feature information, and the second exchange information carries part of the feature information of the first deep feature information.
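As a non-limiting illustration, one possible reading of the channel exchange is sketched below in NumPy: each output keeps its own channels except for a leading block that is taken from the other modality. The function name `channel_exchange`, the `ratio` parameter, and the choice of swapping the leading channels are assumptions made for this sketch; the text only states that part of the feature information is exchanged along the channel dimension.

```python
import numpy as np

def channel_exchange(f_opt, f_sar, ratio=0.5):
    """Swap the first int(C * ratio) channels of two (B, C, H, W) arrays."""
    k = int(f_opt.shape[1] * ratio)
    out_opt, out_sar = f_opt.copy(), f_sar.copy()
    out_opt[:, :k] = f_sar[:, :k]  # optical branch receives SAR channels
    out_sar[:, :k] = f_opt[:, :k]  # SAR branch receives optical channels
    return out_opt, out_sar
```

The operation has no learnable parameters, so it mixes the two modalities at negligible computational cost.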
[0120] (b) Input the first deep feature information and the second deep feature information into the frequency domain fusion unit respectively to obtain the second frequency domain feature information output by the frequency domain fusion unit.
[0121] Specifically, by inputting the first deep feature information and the second deep feature information into the frequency domain fusion unit, the second frequency domain feature information output by the frequency domain fusion unit can be obtained.
[0122] (c) Input the second frequency domain feature information and the first exchange information into the first shallow fusion unit to obtain the first shallow fusion feature information output by the first shallow fusion unit.
[0123] Specifically, by inputting the second frequency domain feature information and the first exchange information into the first shallow fusion unit, the first shallow fusion feature information output by the first shallow fusion unit can be obtained.
[0124] (d) Input the second frequency domain feature information and the second exchange information into the second shallow fusion unit to obtain the second shallow fusion feature information output by the second shallow fusion unit.
[0125] Specifically, by inputting the second frequency domain feature information and the second exchange information into the second shallow fusion unit, the second shallow fusion feature information output by the second shallow fusion unit can be obtained.
[0126] Optionally, the step of inputting the first deep feature information and the second deep feature information into the frequency domain fusion unit respectively to obtain the second frequency domain feature information output by the frequency domain fusion unit includes:
[0127] Convolutional operations are performed on the first deep feature information and the second deep feature information to obtain the third convolutional feature information corresponding to the first deep feature information and the fourth convolutional feature information corresponding to the second deep feature information. FFT transformations are then performed on the third convolutional feature information and the fourth convolutional feature information to obtain the third frequency domain feature information corresponding to the third convolutional feature information and the fourth frequency domain feature information corresponding to the fourth convolutional feature information. Based on the third frequency domain feature information and the fourth frequency domain feature information, the third amplitude and third phase corresponding to the third frequency domain feature information, and the fourth amplitude and fourth phase corresponding to the fourth frequency domain feature information are determined. The third amplitude and the fourth amplitude are fused to obtain a fused amplitude. The third phase and the fourth phase are fused to obtain a fused phase. Based on the fused amplitude and the fused phase, the second frequency domain feature information is determined.
[0128] Figure 4 is a schematic diagram of the frequency domain fusion unit provided by the present invention. As shown in Figure 4, convolution is performed on the first and second deep feature information respectively; that is, each is input into a convolutional layer (Conv), which may be a 1×1 convolution, to obtain the third convolutional feature information corresponding to the first deep feature information and the fourth convolutional feature information corresponding to the second deep feature information. An FFT, such as a two-dimensional real FFT (rFFT2), is then performed on the third and fourth convolutional feature information respectively to obtain the third frequency domain feature information $F_1$ corresponding to the third convolutional feature information and the fourth frequency domain feature information $F_2$ corresponding to the fourth convolutional feature information. The amplitude and phase of $F_1$ and $F_2$ are then extracted to obtain the third amplitude $A_1$ and the third phase $P_1$ corresponding to $F_1$, and the fourth amplitude $A_2$ and the fourth phase $P_2$ corresponding to $F_2$.
[0129] The third amplitude $A_1$ and the fourth amplitude $A_2$ are fused to obtain the fused amplitude $A_{fuse}$. It should be noted that a grouped selective attention mechanism may be introduced for fusing $A_1$ and $A_2$: $A_1$ and $A_2$ are concatenated to obtain the concatenated amplitude $A_{cat}$; channel attention weights are then obtained through the attention module and normalized using Softmax; the normalized result and the concatenated amplitude $A_{cat}$ are input into a weighted feature fusion module, which multiplies the normalized result by $A_{cat}$ and then performs convolution and activation to obtain the fused amplitude $A_{fuse}$. The third phase $P_1$ and the fourth phase $P_2$ are fused to obtain the fused phase $P_{fuse}$: $P_1$ and $P_2$ are concatenated to obtain the concatenated phase, and the fused phase $P_{fuse}$ is obtained through the phase fusion network $Conv_{pha}$. The fused amplitude $A_{fuse}$ and the fused phase $P_{fuse}$ are input into the complex spectrum reconstruction module (Reconstruction) to obtain the reconstruction information output by the module. An inverse real Fourier transform (irFFT2) is then performed on the reconstruction information to restore it to a spatial-domain energy map. To further optimize spatial details, a 2D selective scan (SS2D) module is introduced for residual refinement, and the second frequency domain feature information (the output) is then obtained through a convolutional layer (Conv). By performing weighted complementary fusion of the low-frequency structural information and high-frequency detail features of the optical and SAR images in the frequency domain, this frequency domain fusion unit effectively improves the integrity and contrast of the overall contour and enhances the clarity of texture and edges.
This provides a more discriminative feature representation for subsequent spatial domain reconstruction and cross-modal information integration.
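As a non-limiting illustration, the amplitude and phase fusion path can be sketched in NumPy with parameter-free stand-ins for the learned parts. The following are assumptions made for this sketch and are not the disclosed networks: the attention module is replaced by global average pooling of each amplitude, the convolution and activation after weighting are replaced by a plain sum, and the phase fusion network Conv_pha is replaced by a circular mean of the two phases. The input convolutions, the SS2D residual refinement, and the output convolution are omitted; the names `softmax` and `frequency_fuse` are hypothetical.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def frequency_fuse(x1, x2):
    """Fuse two (B, C, H, W) feature maps through their amplitude and phase."""
    h, w = x1.shape[-2:]
    f1 = np.fft.rfft2(x1, axes=(-2, -1))
    f2 = np.fft.rfft2(x2, axes=(-2, -1))
    a1, p1 = np.abs(f1), np.angle(f1)
    a2, p2 = np.abs(f2), np.angle(f2)
    # Stack amplitudes; pooled amplitudes act as stand-in attention logits.
    a_cat = np.stack([a1, a2], axis=0)
    logits = a_cat.mean(axis=(-2, -1), keepdims=True)
    weights = softmax(logits, axis=0)         # per-channel weights over sources
    a_fuse = (weights * a_cat).sum(axis=0)    # weighted amplitude fusion
    # Stand-in phase fusion: circular mean of the two phase maps.
    p_fuse = np.angle(np.exp(1j * p1) + np.exp(1j * p2))
    # Complex spectrum reconstruction, then inverse real FFT.
    f_fuse = a_fuse * np.exp(1j * p_fuse)
    return np.fft.irfft2(f_fuse, s=(h, w), axes=(-2, -1))
```

When both inputs are identical, the weights are equal and the phases agree, so the sketch returns the input unchanged.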
[0130] Optionally, the step of inputting the first shallow fusion feature information and the second shallow fusion feature information into the fusion image reconstruction module in the image fusion model to obtain the fused image output by the fusion image reconstruction includes:
[0131] Based on the first shallow fusion feature information and the second shallow fusion feature information, a third shallow fusion feature information is determined; based on the third shallow fusion feature information, a restored image is determined; the restored image is convolved to obtain the fused image.
[0132] Specifically, the first and second shallow fusion feature information are added together and divided by 2 to obtain the third shallow fusion feature information. This third shallow fusion feature information is then input into at least one Mamba fuse module (e.g., five Mamba fuse modules), which performs deep fusion on the third shallow fusion feature information. The output of the Mamba fuse modules is then input into at least one Mamba module (e.g., eight Mamba modules), which performs feature enhancement on that output to obtain the restored image. The restored image is then convolved, i.e., input into at least one CNN layer (e.g., three CNN layers), to obtain the fused image. In this way, the image is deeply restored through multiple Mamba layers, and the subsequent convolutional layers extract finer image details, ultimately yielding a high-quality fused image.
[0133] Figure 5 is a schematic diagram of the image fusion model provided by the present invention. As shown in Figure 5, the image fusion model includes a shallow feature extraction module, a deep feature extraction module, a dual-domain feature fusion module, and a fused image reconstruction module. The shallow feature extraction module includes a semi-instance normalized residual unit (HFE) and a frequency domain information enhancement unit (FEU). The deep feature extraction module includes a sequence embedding unit (patch embedding) and eight global feature capture units (SS2D). The dual-domain feature fusion module includes a channel exchange unit, a frequency domain fusion unit (FFE), and two shallow fusion units. The fused image reconstruction module includes five Mamba fuse units, eight Mamba units, and three convolutional layers (CNN layers).
[0134] Specifically, the optical image (OPT) and the SAR image are input into the semi-instance normalized residual unit (HFE) to obtain the residual feature information of the optical image and of the SAR image, respectively. The residual feature information of the optical image and of the SAR image is then input into the frequency domain information enhancement unit (FEU) to obtain the first shallow feature information of the optical image and the second shallow feature information of the SAR image output by the frequency domain information enhancement unit. The first and second shallow feature information are input into the sequence embedding unit to obtain the feature sequences corresponding to the first and second shallow feature information output by the sequence embedding unit. Finally, these feature sequences are input into eight global feature capture units (SS2D) to obtain the first deep feature information of the optical image and the second deep feature information of the SAR image.
[0135] The first deep feature information and the second deep feature information are respectively input to the channel exchange unit to obtain the first exchange information and the second exchange information output by the channel exchange unit. The first deep feature information and the second deep feature information are also respectively input to the frequency domain fusion unit (FFE) to obtain the second frequency domain feature information output by the frequency domain fusion unit. The second frequency domain feature information and the first exchange information are input to the first shallow fusion unit (the upper Shallow fusion in Figure 5) to obtain the first shallow fusion feature information output by the first shallow fusion unit; the second frequency domain feature information and the second exchange information are input to the second shallow fusion unit (the lower Shallow fusion in Figure 5) to obtain the second shallow fusion feature information output by the second shallow fusion unit. The first shallow fusion feature information and the second shallow fusion feature information are added together and divided by 2 to obtain the third shallow fusion feature information. The third shallow fusion feature information is then input into 5 Mamba fuse modules, and the output of the 5 Mamba fuse modules is input into 8 Mamba modules to obtain the restored image; the restored image is then input into 3 CNN layers to obtain the fused image.
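As a non-limiting illustration, the data flow of the complete model can be written as a wiring sketch in Python. Every module is passed in as a callable; the dictionary keys (`hfe`, `feu`, `deep`, `exchange`, `ffe`, `shallow_fusion_1`, `shallow_fusion_2`, `mamba_fuse`, `mamba`, `cnn`) are hypothetical names chosen for this sketch, and sharing the same `hfe`, `feu`, and `deep` callables between the optical and SAR branches is an assumption.

```python
def fuse_images(opt, sar, m):
    """Wire the modules together; m maps module names to callables."""
    # Shallow feature extraction: HFE followed by FEU, per modality.
    s1 = m["feu"](m["hfe"](opt))
    s2 = m["feu"](m["hfe"](sar))
    # Deep feature extraction: sequence embedding plus stacked SS2D units.
    d1, d2 = m["deep"](s1), m["deep"](s2)
    # Dual-domain feature fusion.
    e1, e2 = m["exchange"](d1, d2)            # channel exchange
    f_freq = m["ffe"](d1, d2)                 # frequency domain fusion
    sf1 = m["shallow_fusion_1"](f_freq, e1)   # first shallow fusion unit
    sf2 = m["shallow_fusion_2"](f_freq, e2)   # second shallow fusion unit
    # Fused image reconstruction.
    sf3 = (sf1 + sf2) / 2                     # third shallow fusion features
    restored = m["mamba"](m["mamba_fuse"](sf3))
    return m["cnn"](restored)
```

Passing trivial callables (for example identity functions) makes it possible to check the wiring before any trained module is plugged in.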
[0136] In training the image fusion model, this invention randomly selects 1500 pairs of optical-SAR image data (i.e., optical and SAR image pairs) as training samples from the training dataset, and another 500 pairs of optical-SAR image data as the test set for evaluation. The batch size is 4, and each fusion task is trained for 40 iterations. During training, images are randomly cropped to 128×128 pixels and normalized to [0,1]. The Adam optimizer (learning rate 2×10⁻⁶) is used. -5 ).
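As a non-limiting illustration, the cropping and normalization described above can be sketched in NumPy. The sketch assumes 8-bit input images (so that dividing by 255 maps them to [0, 1]) and assumes that the same crop window is applied to both images of a registered optical-SAR pair; neither detail is stated in the text, and the function name `random_paired_crop` is hypothetical.

```python
import numpy as np

def random_paired_crop(opt, sar, size=128, rng=None):
    """Crop the same random size x size window from both images, scale to [0, 1].

    opt, sar: uint8 arrays whose first two axes are height and width.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = opt.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))

    def crop(img):
        window = img[top:top + size, left:left + size]
        return window.astype(np.float32) / 255.0

    return crop(opt), crop(sar)
```

Using one window for both modalities keeps the pair pixel-aligned, which matters because the fusion operates on corresponding locations.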
[0137] Figure 6 is a schematic diagram comparing the results of different fusion methods provided by the present invention. As shown in Figure 6, different fusion methods (CDDfusion, Densefuse, Nestfuse, PSFuse, RFN-Nest, SwinFusion, U2Fusion, FusionMamba, MambaDfuse, and the method of this invention) were compared on the training dataset, with the region of interest highlighted by a white box. During the fusion process, the method of this invention demonstrated a significant advantage in preserving detail. In particular, when zoomed in to the region of interest (highlighted by a red box), the details appeared sharper, without noticeable blurring or artifacts. This was especially evident in textured and edge regions, where the method of this invention produced more natural and accurate results.
[0138] Table 1 summarizes the quantitative results of all methods on six widely used objective metrics, with the best values highlighted in bold. It can be seen that the method of this invention scores highest on the three core metrics of Information Entropy (EN), Standard Deviation (SD), and Spatial Frequency (SF), with EN=7.03, SD=42.60, and SF=24.67. The results indicate that the fused image not only contains richer overall information but also has higher contrast and finer texture details. For the remaining metrics (Mutual Information (MI), Visual Information Fidelity (VIF), and the structural similarity-based index), although there is a slight performance gap compared to SwinFusion (the best-performing method on these metrics), the difference remains within 0.60. This demonstrates that the method of this invention maintains competitive performance across all evaluation criteria.
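As a non-limiting illustration, the three core metrics can be computed for an 8-bit grayscale image with the commonly used definitions sketched below in NumPy. The exact formulas used for Table 1 are not given in the text, so these definitions (256-bin histogram entropy in bits, population standard deviation, and spatial frequency from first differences averaged over the available pixel pairs) are assumptions; other normalizations of SF exist.

```python
import numpy as np

def entropy(img):
    """Information entropy (EN) in bits of a uint8 image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]  # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())

def standard_deviation(img):
    """Standard deviation (SD) of the pixel intensities."""
    return float(img.astype(np.float64).std())

def spatial_frequency(img):
    """Spatial frequency (SF): sqrt(RF^2 + CF^2) from first differences."""
    x = img.astype(np.float64)
    rf = np.sqrt(np.mean((x[:, 1:] - x[:, :-1]) ** 2))  # row frequency
    cf = np.sqrt(np.mean((x[1:, :] - x[:-1, :]) ** 2))  # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))
```

Higher EN indicates more information content, higher SD indicates stronger contrast, and higher SF indicates richer texture and edge activity, which matches the interpretation given in the paragraph above.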
[0139] Table 1. Comparison of the image fusion model of this invention with other image fusion networks
[0140]
[0141] To further evaluate the contributions of each component in the model, a series of ablation experiments were conducted. In each experiment, a key module was removed or replaced, and the model was retrained under the same conditions to observe changes in fusion performance. The MambaDfuse network was used as the baseline to provide a standard starting point for comparison with models containing additional modules.
[0142] Table 2. Impact of Ablation of Each Module on Fusion Performance (Key Indicators)
[0143]
[0144] SFEM enhances frequency domain information, particularly by preserving low-frequency components, improving detail and global information. FFE improves spatial frequency (SF) and detail by compensating for the limitations of the Fourier transform in frequency fusion, while also providing sufficient information entropy (EN). The SS2D module significantly enhances global correlation and detail extraction capabilities. Compared to the previous two variants, it achieves superior performance in mutual information (MI) while preserving more fine-grained details from the optical image. The final complete model integrates the advantages of all modules and exhibits the best fusion results.
[0145] This invention addresses the problem of insufficient utilization of frequency domain information in optical-synthetic aperture radar (SAR) image fusion by proposing an image fusion network (FMambaFuse). This method effectively mines amplitude and phase features using a frequency domain information enhancement unit (FEU), significantly improving detail representation capabilities. By incorporating a semi-instance normalized residual block (HFE) and a Mamba architecture with two-dimensional selective scanning (SS2D), the model further achieves in-depth exploration and efficient representation of spatial features. Furthermore, this invention designs a frequency domain fusion unit (FFE) for selectively integrating frequency domain features. Results show that FMambaFuse outperforms existing mainstream methods in both subjective visual quality and objective evaluation metrics, generating high-quality fused images and providing effective support for complex downstream tasks. In future work, this method can be further extended to multimodal perception, target detection, and other related tasks, thereby promoting the development of intelligent image fusion technology.
[0146] The frequency-space combined image fusion apparatus provided by the present invention is described below. The apparatus described below and the frequency-space combined image fusion method described above may be read with reference to each other.
[0147] Figure 7 is a schematic diagram of the structure of the frequency-space joint image fusion device provided by the present invention. As shown in Figure 7, the frequency-space joint image fusion device 700 includes: a first feature extraction module 701, a second feature extraction module 702, a feature fusion module 703, and an image reconstruction module 704; wherein,
[0148] The first feature extraction module 701 is used to input the optical image and the synthetic aperture radar (SAR) image into the shallow feature extraction module in the image fusion model, respectively, to obtain the first shallow feature information of the optical image and the second shallow feature information of the SAR image output by the shallow feature extraction module.
[0149] The second feature extraction module 702 is used to input the first shallow feature information and the second shallow feature information into the deep feature extraction module in the image fusion model, respectively, to obtain the first deep feature information of the optical image and the second deep feature information of the SAR image output by the deep feature extraction module;
[0150] The feature fusion module 703 is used to input the first deep feature information and the second deep feature information into the dual-domain feature fusion module in the image fusion model, respectively, to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module;
[0151] Image reconstruction module 704 is used to input the first shallow fusion feature information and the second shallow fusion feature information into the fusion image reconstruction module in the image fusion model to obtain the fusion image output by the fusion image reconstruction.
[0152] The image fusion model is trained based on sample optical images and sample SAR images.
[0153] The frequency-space joint image fusion device provided by this invention inputs optical images and SAR images into a shallow feature extraction module in an image fusion model to obtain first shallow feature information of the optical image and second shallow feature information of the SAR image output by the shallow feature extraction module; inputs the first shallow feature information and the second shallow feature information into a deep feature extraction module in the image fusion model to obtain first deep feature information of the optical image and second deep feature information of the SAR image output by the deep feature extraction module; inputs the first deep feature information and the second deep feature information into a dual-domain feature fusion module in the image fusion model to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module; and inputs the first and second shallow fusion feature information into the fusion image reconstruction module in the image fusion model to obtain the fused image output by the fusion image reconstruction module. Since the image fusion model is trained based on sample optical images and sample SAR images, the shallow feature extraction module, deep feature extraction module, dual-domain feature fusion module and fusion image reconstruction module in the image fusion model realize efficient joint modeling of frequency domain and spatial domain features and strengthen the expression of high-frequency texture and structural features, thereby achieving complementary fusion of frequency domain and spatial domain features, ensuring global consistency while further improving fusion performance and the quality of the fused image.
[0154] Optionally, the shallow feature extraction module includes a semi-instance normalized residual unit and a frequency domain information enhancement unit; the first feature extraction module 701 is specifically used for:
[0155] The optical image is input into the semi-instance normalized residual unit to obtain the residual feature information output by the semi-instance normalized residual unit;
[0156] The residual feature information is input into the frequency domain information enhancement unit to obtain the first shallow feature information of the optical image output by the frequency domain information enhancement unit.
[0157] Optionally, the semi-instance normalization residual unit includes at least one convolutional layer, at least one activation function, and at least one semi-channel normalization subunit; the first feature extraction module 701 is further configured to:
[0158] The optical image is input into the first convolutional layer to obtain the first convolutional feature information output by the first convolutional layer.
[0159] The first convolutional feature information is input into the first half-channel normalized subunit to obtain the first channel normalized feature information output by the first half-channel normalized subunit;
[0160] The first channel normalized feature information is input into the second half-channel normalization subunit to obtain the second channel normalized feature information output by the second half-channel normalization subunit;
[0161] The normalized feature information of the second channel is input into the third half-channel normalization subunit to obtain the normalized feature information of the third channel output by the third half-channel normalization subunit.
[0162] The normalized feature information of the third channel is input into the second convolutional layer to obtain the second convolutional feature information output by the second convolutional layer.
[0163] The second convolutional feature information is input into the third convolutional layer to obtain the residual feature information output by the third convolutional layer.
[0164] Optionally, the first feature extraction module 701 is further configured to:
[0165] The residual feature information is subjected to FFT transformation to obtain the first frequency domain feature information;
[0166] Based on the first frequency domain feature information, the first amplitude and the first phase are extracted;
[0167] The first amplitude and the first phase are concatenated along the channel dimension to obtain the first frequency domain feature tensor;
[0168] The first frequency domain feature tensor is input into the multilayer perceptron to obtain the perceptron features output by the multilayer perceptron.
[0169] Based on the perceptron features, the second amplitude and the second phase are determined;
[0170] Based on the second amplitude and the second phase, the second frequency domain feature tensor is determined;
[0171] An inverse Fourier transform is performed on the second frequency domain feature tensor to obtain the spatial domain features;
[0172] Based on the spatial domain features and the residual feature information, the first shallow feature information of the optical image is determined.
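By way of example only, the frequency domain enhancement steps above can be sketched in NumPy. The sketch assumes a two-layer perceptron applied independently at each frequency bin, a split of its output into equal amplitude and phase halves, retention of only the real part after the inverse transform, and an additive combination with the residual feature information; none of these choices is fixed by this passage. The names `mlp` and `frequency_enhance` and the `params` tuple are hypothetical.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron over the channel axis, applied per frequency bin."""
    hidden = np.maximum(0.0, np.einsum("oc,chw->ohw", w1, x) + b1[:, None, None])
    return np.einsum("oc,chw->ohw", w2, hidden) + b2[:, None, None]

def frequency_enhance(residual, params):
    """residual: (C, H, W) real map; params: (w1, b1, w2, b2) for the perceptron."""
    c = residual.shape[0]
    spectrum = np.fft.fft2(residual, axes=(-2, -1))   # first frequency domain feature information
    amp, phase = np.abs(spectrum), np.angle(spectrum) # first amplitude, first phase
    stacked = np.concatenate([amp, phase], axis=0)    # first frequency domain feature tensor, (2C, H, W)
    out = mlp(stacked, *params)                       # perceptron features
    amp2, phase2 = out[:c], out[c:]                   # second amplitude, second phase
    spectrum2 = amp2 * np.exp(1j * phase2)            # second frequency domain feature tensor
    # The perceptron output need not be conjugate-symmetric, so the inverse
    # transform may be complex; keeping the real part is an assumption.
    spatial = np.fft.ifft2(spectrum2, axes=(-2, -1)).real
    return spatial + residual                         # first shallow feature information
```

If the perceptron happens to act as the identity, the spectrum is reconstructed unchanged and the output equals twice the input, which is a convenient sanity check for the amplitude and phase bookkeeping.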
[0173] Optionally, the deep feature extraction module includes a sequence embedding unit and at least one global feature capture unit; the second feature extraction module 702 is specifically used for:
[0174] The first shallow feature information is input into the sequence embedding unit to obtain the feature sequence output by the sequence embedding unit;
[0175] The feature sequence is input to the at least one global feature capture unit to obtain the first deep feature information of the optical image output by the at least one global feature capture unit.
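For illustration only, the two stages above can be sketched in NumPy. The passage does not say how the sequence embedding unit forms tokens or what mechanism the global feature capture unit uses, so this sketch assumes non-overlapping patch flattening for the embedding and uses single-head self-attention with a residual connection purely as a stand-in for the global feature capture unit. The names `sequence_embed`, `global_capture`, and `deep_features` are hypothetical.

```python
import numpy as np

def sequence_embed(features, patch=2):
    """Split a (C, H, W) map into non-overlapping patches; one token per patch."""
    c, h, w = features.shape
    gh, gw = h // patch, w // patch
    x = features[:, :gh * patch, :gw * patch]
    x = x.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)            # (gh, gw, C, patch, patch)
    return x.reshape(gh * gw, c * patch * patch)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_capture(tokens, wq, wk, wv):
    """Stand-in global mixing: every token attends to every other token."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return tokens + attn @ v

def deep_features(shallow, layers, patch=2):
    """layers: list of (wq, wk, wv), one triple per global feature capture unit."""
    seq = sequence_embed(shallow, patch)      # feature sequence
    for wq, wk, wv in layers:
        seq = global_capture(seq, wq, wk, wv)
    return seq                                # deep feature information
```

The same function would be applied to the second shallow feature information to obtain the deep feature information of the SAR image.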
[0176] Optionally, the dual-domain feature fusion module includes: a channel conversion unit, a frequency domain fusion unit, and two shallow fusion units; the feature fusion module 703 is specifically used for:
[0177] The first deep feature information and the second deep feature information are respectively input to the channel conversion unit to obtain the first exchange information and the second exchange information output by the channel conversion unit;
[0178] The first deep feature information and the second deep feature information are respectively input into the frequency domain fusion unit to obtain the second frequency domain feature information output by the frequency domain fusion unit;
[0179] The second frequency domain feature information and the first exchange information are input into the first shallow fusion unit to obtain the first shallow fusion feature information output by the first shallow fusion unit;
[0180] The second frequency domain feature information and the second exchange information are input into the second shallow fusion unit to obtain the second shallow fusion feature information output by the second shallow fusion unit.
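The data flow through the dual-domain feature fusion module can be sketched as follows, as a non-limiting example. The passage does not define the exchange rule or the internals of the shallow fusion units, so the sketch assumes that the channel conversion unit swaps the second half of the channels between the two inputs and that each shallow fusion unit concatenates its two inputs and mixes them with a 1x1 convolution. The deep features are treated as (C, H, W) maps, and the frequency domain fusion unit is passed in as a callable. All function names are hypothetical.

```python
import numpy as np

def channel_exchange(a, b):
    """Swap the second half of the channels between two (C, H, W) maps (assumed rule)."""
    c = a.shape[0] // 2
    ex_a = np.concatenate([a[:c], b[c:]], axis=0)   # first exchange information
    ex_b = np.concatenate([b[:c], a[c:]], axis=0)   # second exchange information
    return ex_a, ex_b

def shallow_fuse(freq_feat, exchanged, weight):
    """Concatenate along channels, then apply a 1x1 convolution (assumed form)."""
    stacked = np.concatenate([freq_feat, exchanged], axis=0)
    return np.einsum("oc,chw->ohw", weight, stacked)

def dual_domain_fusion(deep_opt, deep_sar, frequency_fuse, w_a, w_b):
    """frequency_fuse: callable taking both deep features and returning one map."""
    ex_a, ex_b = channel_exchange(deep_opt, deep_sar)
    freq = frequency_fuse(deep_opt, deep_sar)       # second frequency domain feature information
    fused_a = shallow_fuse(freq, ex_a, w_a)         # first shallow fusion feature information
    fused_b = shallow_fuse(freq, ex_b, w_b)         # second shallow fusion feature information
    return fused_a, fused_b
```

Both shallow fusion units receive the same frequency domain result but different exchange information, which is the only structural property this sketch takes directly from the text.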
[0181] Optionally, the feature fusion module 703 is further configured to:
[0182] Convolutions are performed on the first deep feature information and the second deep feature information respectively to obtain the third convolution feature information corresponding to the first deep feature information and the fourth convolution feature information corresponding to the second deep feature information;
[0183] FFT transformations are performed on the third convolutional feature information and the fourth convolutional feature information respectively to obtain the third frequency domain feature information corresponding to the third convolutional feature information and the fourth frequency domain feature information corresponding to the fourth convolutional feature information;
[0184] Based on the third frequency domain feature information and the fourth frequency domain feature information, the third amplitude and third phase corresponding to the third frequency domain feature information, and the fourth amplitude and fourth phase corresponding to the fourth frequency domain feature information are determined.
[0185] The third amplitude and the fourth amplitude are fused to obtain the fused amplitude;
[0186] The third phase and the fourth phase are fused to obtain a fused phase;
[0187] The second frequency domain feature information is determined based on the fusion amplitude and the fusion phase.
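As an illustrative sketch of the frequency domain fusion steps above, the following NumPy code uses assumed fusion rules, since the passage does not state them: amplitudes are averaged, and phases are fused by summing unit phasors and taking the angle of the sum, which avoids wrap-around errors near ±π. The 1x1 convolutions and the final inverse transform back to the spatial domain are also assumptions. The name `frequency_fuse` is hypothetical.

```python
import numpy as np

def frequency_fuse(deep_opt, deep_sar, w_opt, w_sar):
    """Fuse two (C, H, W) maps via their amplitude and phase spectra."""
    conv_opt = np.einsum("oc,chw->ohw", w_opt, deep_opt)   # third convolution feature information
    conv_sar = np.einsum("oc,chw->ohw", w_sar, deep_sar)   # fourth convolution feature information
    spec_opt = np.fft.fft2(conv_opt, axes=(-2, -1))        # third frequency domain feature information
    spec_sar = np.fft.fft2(conv_sar, axes=(-2, -1))        # fourth frequency domain feature information
    amp_opt, pha_opt = np.abs(spec_opt), np.angle(spec_opt)
    amp_sar, pha_sar = np.abs(spec_sar), np.angle(spec_sar)
    fused_amp = 0.5 * (amp_opt + amp_sar)                  # assumed rule: mean amplitude
    # Assumed rule: circular mean of the two phases.
    fused_pha = np.angle(np.exp(1j * pha_opt) + np.exp(1j * pha_sar))
    fused_spec = fused_amp * np.exp(1j * fused_pha)
    # Returning a spatial map (real part of the inverse FFT) is an assumption,
    # made so the result can be combined with spatial exchange information.
    return np.fft.ifft2(fused_spec, axes=(-2, -1)).real
```

When both inputs and both weight matrices are identical, the fused spectrum equals the input spectrum and the function returns the convolved input unchanged.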
[0188] Optionally, the image reconstruction module 704 is specifically used for:
[0189] Based on the first shallow fusion feature information and the second shallow fusion feature information, the third shallow fusion feature information is determined;
[0190] Based on the third shallow fusion feature information, the restored image is determined;
[0191] The restored image is convolved to obtain the fused image.
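A minimal sketch of the reconstruction steps, under stated assumptions: the third shallow fusion feature information is taken as the element-wise sum of the two inputs, the restored image is obtained by a 1x1 projection to the image channels, and the final convolution is a same-padded 3x3 convolution. The passage specifies none of these operators. The names `conv3x3` and `reconstruct` are hypothetical.

```python
import numpy as np

def conv3x3(x, kernel):
    """Same-padded 3x3 convolution (cross-correlation, as in deep learning).

    x has shape (C_in, H, W); kernel has shape (C_out, C_in, 3, 3).
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros((kernel.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", kernel[:, :, dy, dx], window)
    return out

def reconstruct(fused_a, fused_b, w_restore, kernel):
    third = fused_a + fused_b                                # third shallow fusion feature information (assumed sum)
    restored = np.einsum("oc,chw->ohw", w_restore, third)    # restored image (assumed 1x1 projection)
    return conv3x3(restored, kernel)                         # fused image
```

With a kernel whose only non-zero entry is a 1 at the center, the final convolution is the identity, so the fused image equals the restored image.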
[0192] Figure 8 is a schematic diagram of the physical structure of an electronic device provided by the present invention. As shown in Figure 8, the electronic device 800 may include: a processor 810, a communication interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communication interface 820, and the memory 830 communicate with each other through the communication bus 840. The processor 810 can call logic instructions in the memory 830 to execute a frequency-space joint image fusion method. This method includes: inputting an optical image and a synthetic aperture radar (SAR) image into a shallow feature extraction module in an image fusion model, respectively, to obtain first shallow feature information of the optical image and second shallow feature information of the SAR image output by the shallow feature extraction module; inputting the first shallow feature information and the second shallow feature information into a deep feature extraction module in the image fusion model, respectively, to obtain first deep feature information of the optical image and second deep feature information of the SAR image output by the deep feature extraction module; inputting the first deep feature information and the second deep feature information into a dual-domain feature fusion module in the image fusion model, respectively, to obtain first shallow fusion feature information and second shallow fusion feature information output by the dual-domain feature fusion module; and inputting the first shallow fusion feature information and the second shallow fusion feature information into a fused image reconstruction module in the image fusion model to obtain a fused image output by the fused image reconstruction module; wherein the image fusion model is trained based on sample optical images and sample SAR images.
[0193] Furthermore, the logical instructions in the aforementioned memory 830 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0194] In another aspect, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the frequency-space joint image fusion method provided by the above methods. The method includes: inputting an optical image and a synthetic aperture radar (SAR) image into a shallow feature extraction module in an image fusion model, respectively, to obtain first shallow feature information of the optical image and second shallow feature information of the SAR image output by the shallow feature extraction module; inputting the first shallow feature information and the second shallow feature information into a deep feature extraction module in the image fusion model, respectively, to obtain first deep feature information of the optical image and second deep feature information of the SAR image output by the deep feature extraction module; inputting the first deep feature information and the second deep feature information into a dual-domain feature fusion module in the image fusion model, respectively, to obtain first shallow fusion feature information and second shallow fusion feature information output by the dual-domain feature fusion module; and inputting the first shallow fusion feature information and the second shallow fusion feature information into a fused image reconstruction module in the image fusion model to obtain a fused image output by the fused image reconstruction module; wherein the image fusion model is trained based on sample optical images and sample SAR images.
[0195] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the frequency-space joint image fusion method provided by the above methods. The method includes: inputting an optical image and a synthetic aperture radar (SAR) image into a shallow feature extraction module in an image fusion model, respectively, to obtain first shallow feature information of the optical image and second shallow feature information of the SAR image output by the shallow feature extraction module; inputting the first shallow feature information and the second shallow feature information into a deep feature extraction module in the image fusion model, respectively, to obtain first deep feature information of the optical image and second deep feature information of the SAR image output by the deep feature extraction module; inputting the first deep feature information and the second deep feature information into a dual-domain feature fusion module in the image fusion model, respectively, to obtain first shallow fusion feature information and second shallow fusion feature information output by the dual-domain feature fusion module; and inputting the first shallow fusion feature information and the second shallow fusion feature information into a fused image reconstruction module in the image fusion model to obtain a fused image output by the fused image reconstruction module; wherein the image fusion model is trained based on sample optical images and sample SAR images.
[0196] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0197] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0198] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A frequency-space joint image fusion method, characterized in that it comprises: An optical image and a synthetic aperture radar (SAR) image are respectively input into the shallow feature extraction module in the image fusion model to obtain the first shallow feature information of the optical image and the second shallow feature information of the SAR image output by the shallow feature extraction module; The first shallow feature information and the second shallow feature information are respectively input into the deep feature extraction module in the image fusion model to obtain the first deep feature information of the optical image and the second deep feature information of the SAR image output by the deep feature extraction module; The first deep feature information and the second deep feature information are respectively input into the dual-domain feature fusion module in the image fusion model to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module; The first shallow fusion feature information and the second shallow fusion feature information are input into the fused image reconstruction module in the image fusion model to obtain the fused image output by the fused image reconstruction module; The image fusion model is trained based on sample optical images and sample SAR images; The shallow feature extraction module includes a semi-instance normalized residual unit and a frequency domain information enhancement unit;
The step of inputting the optical image into the shallow feature extraction module in the image fusion model to obtain the first shallow feature information of the optical image output by the shallow feature extraction module includes: The optical image is input into the semi-instance normalized residual unit to obtain the residual feature information output by the semi-instance normalized residual unit; The residual feature information is input into the frequency domain information enhancement unit to obtain the first shallow feature information of the optical image output by the frequency domain information enhancement unit.
2. The frequency-space joint image fusion method according to claim 1, characterized in that, The semi-instance normalized residual unit includes at least one convolutional layer, at least one activation function, and at least one half-channel normalization subunit; The step of inputting the optical image into the semi-instance normalized residual unit to obtain the residual feature information output by the semi-instance normalized residual unit includes: The optical image is input into the first convolutional layer to obtain the first convolutional feature information output by the first convolutional layer; The first convolutional feature information is input into the first half-channel normalization subunit to obtain the first channel normalized feature information output by the first half-channel normalization subunit; The first channel normalized feature information is input into the second half-channel normalization subunit to obtain the second channel normalized feature information output by the second half-channel normalization subunit; The second channel normalized feature information is input into the third half-channel normalization subunit to obtain the third channel normalized feature information output by the third half-channel normalization subunit; The third channel normalized feature information is input into the second convolutional layer to obtain the second convolutional feature information output by the second convolutional layer; The second convolutional feature information is input into the third convolutional layer to obtain the residual feature information output by the third convolutional layer.
3. The frequency-space joint image fusion method according to claim 1, characterized in that, The step of inputting the residual feature information into the frequency domain information enhancement unit to obtain the first shallow feature information of the optical image output by the frequency domain information enhancement unit includes: The residual feature information is subjected to FFT transformation to obtain the first frequency domain feature information; Based on the first frequency domain feature information, the first amplitude and the first phase are extracted; The first amplitude and the first phase are concatenated along the channel dimension to obtain the first frequency domain feature tensor; The first frequency domain feature tensor is input into the multilayer perceptron to obtain the perceptron features output by the multilayer perceptron; Based on the perceptron features, the second amplitude and the second phase are determined; Based on the second amplitude and the second phase, the second frequency domain feature tensor is determined; An inverse Fourier transform is performed on the second frequency domain feature tensor to obtain the spatial domain features; Based on the spatial domain features and the residual feature information, the first shallow feature information of the optical image is determined.
4. The frequency-space joint image fusion method according to claim 1, characterized in that, The deep feature extraction module includes a sequence embedding unit and at least one global feature capture unit; The first shallow feature information is input into the deep feature extraction module in the image fusion model to obtain the first deep feature information of the optical image output by the deep feature extraction module, including: The first shallow feature information is input into the sequence embedding unit to obtain the feature sequence output by the sequence embedding unit; The feature sequence is input to the at least one global feature capture unit to obtain the first deep feature information of the optical image output by the at least one global feature capture unit.
5. The frequency-space joint image fusion method according to claim 1, characterized in that, The dual-domain feature fusion module includes: a channel conversion unit, a frequency domain fusion unit, and two shallow fusion units; The first deep feature information and the second deep feature information are respectively input into the dual-domain feature fusion module in the image fusion model to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module, including: The first deep feature information and the second deep feature information are respectively input to the channel conversion unit to obtain the first exchange information and the second exchange information output by the channel conversion unit; The first deep feature information and the second deep feature information are respectively input into the frequency domain fusion unit to obtain the second frequency domain feature information output by the frequency domain fusion unit; The second frequency domain feature information and the first exchange information are input into the first shallow fusion unit to obtain the first shallow fusion feature information output by the first shallow fusion unit; The second frequency domain feature information and the second exchange information are input into the second shallow fusion unit to obtain the second shallow fusion feature information output by the second shallow fusion unit.
6. The frequency-space joint image fusion method according to claim 5, characterized in that, The step of inputting the first deep feature information and the second deep feature information into the frequency domain fusion unit respectively to obtain the second frequency domain feature information output by the frequency domain fusion unit includes: Convolutions are performed on the first deep feature information and the second deep feature information respectively to obtain the third convolution feature information corresponding to the first deep feature information and the fourth convolution feature information corresponding to the second deep feature information; FFT transformations are performed on the third convolutional feature information and the fourth convolutional feature information respectively to obtain the third frequency domain feature information corresponding to the third convolutional feature information and the fourth frequency domain feature information corresponding to the fourth convolutional feature information; Based on the third frequency domain feature information and the fourth frequency domain feature information, the third amplitude and third phase corresponding to the third frequency domain feature information, and the fourth amplitude and fourth phase corresponding to the fourth frequency domain feature information are determined. The third amplitude and the fourth amplitude are fused to obtain the fused amplitude; The third phase and the fourth phase are fused to obtain a fused phase; The second frequency domain feature information is determined based on the fusion amplitude and the fusion phase.
7. The frequency-space joint image fusion method according to claim 1, characterized in that, The step of inputting the first shallow fusion feature information and the second shallow fusion feature information into the fused image reconstruction module in the image fusion model to obtain the fused image output by the fused image reconstruction module includes: Based on the first shallow fusion feature information and the second shallow fusion feature information, the third shallow fusion feature information is determined; Based on the third shallow fusion feature information, the restored image is determined; The restored image is convolved to obtain the fused image.
8. A frequency-space joint image fusion device, characterized in that it comprises: A first feature extraction module, used to input an optical image and a synthetic aperture radar (SAR) image into the shallow feature extraction module in the image fusion model, respectively, to obtain the first shallow feature information of the optical image and the second shallow feature information of the SAR image output by the shallow feature extraction module; A second feature extraction module, used to input the first shallow feature information and the second shallow feature information into the deep feature extraction module in the image fusion model, respectively, to obtain the first deep feature information of the optical image and the second deep feature information of the SAR image output by the deep feature extraction module; A feature fusion module, used to input the first deep feature information and the second deep feature information into the dual-domain feature fusion module in the image fusion model, respectively, to obtain the first shallow fusion feature information and the second shallow fusion feature information output by the dual-domain feature fusion module; An image reconstruction module, used to input the first shallow fusion feature information and the second shallow fusion feature information into the fused image reconstruction module in the image fusion model to obtain the fused image output by the fused image reconstruction module; The image fusion model is trained based on sample optical images and sample SAR images; The shallow feature extraction module includes a semi-instance normalized residual unit and a frequency domain information enhancement unit;
The first feature extraction module is specifically used for: The optical image is input into the semi-instance normalized residual unit to obtain the residual feature information output by the semi-instance normalized residual unit; The residual feature information is input into the frequency domain information enhancement unit to obtain the first shallow feature information of the optical image output by the frequency domain information enhancement unit.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the program, it implements the frequency-space joint image fusion method as described in any one of claims 1 to 7.