A frequency-aware guided feature enhancement polarimetric image fusion method and system

By using a frequency-aware guided dual-branch fusion model, the problem of high-frequency texture weakening in polarization image fusion is solved, achieving high-quality fusion of intensity and linear polarization images, and improving the texture details and discriminative feature consistency of the fused image.

CN122243767A · Pending · Publication Date: 2026-06-19 · HUAQIAO UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUAQIAO UNIVERSITY
Filing Date
2026-05-14
Publication Date
2026-06-19


Abstract

This invention discloses a frequency-aware guided feature enhancement method and system for polarization image fusion, belonging to the field of image processing technology. The method includes: acquiring polarization images at different angles; constructing a dataset of intensity images and linear polarization degree images based on the Stokes vector synthesis method; building a fusion model with frequency-aware guided spatial feature enhancement, which includes two feature extraction branches, one for intensity and one for linear polarization degree; performing multi-scale frequency decomposition and reconstruction through a high-low frequency reconstruction module; enhancing the local texture response using a local information fusion module; achieving discriminative feature fusion through a residual heterogeneous collaborative attention mechanism; and reconstructing and outputting the fused image with a decoder. By combining these modules in a dual-branch fusion model, the invention performs multi-scale frequency decomposition and collaborative enhancement of discriminative features for intensity and linear polarization degree images.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and specifically to a frequency-aware guided feature enhancement method and system for polarization image fusion.

Background Technology

[0002] Polarization describes the orientation of the electric-field vibration of a light wave relative to its direction of propagation. In ordinary natural light, the vibration direction of the electric field is random, with components in all directions. Polarized light, by contrast, is light whose electric field vibrates with a specific regularity, for example along one fixed direction. Polarization imaging captures the characteristics of light waves at different polarization angles. Unlike conventional visible light images, polarized images contain information about the polarization state of light and are typically captured at several polarization angles. They reveal the surface reflection characteristics, texture, and material details of objects, providing an additional dimension of information for target detection in complex scenes. This additional polarization information improves the separability of targets from backgrounds, giving polarization imaging better perception capabilities than traditional intensity imaging in complex scenes. Polarization imaging is therefore widely used in underwater imaging, defogging, target detection, and material classification.

[0003] Polarization image fusion aims to fully exploit the complementary characteristics of intensity and polarization information to generate a fused image that is more discriminative in texture representation, edge detail, and contrast. The intensity image S0 represents the overall light intensity distribution of a scene and responds well to changes in illumination and shadow structure; the degree of linear polarization (DoLP) image, on the other hand, is more sensitive to surface texture and edge contours. Effective fusion of S0 and DoLP can enhance local details and material differences while maintaining brightness consistency. The fused result inherits the light intensity characteristics of the S0 image in its overall brightness structure while incorporating the texture and edge information contained in the DoLP image, and thus outperforms either single-modal input in both visual quality and detail fidelity. Polarization image fusion is therefore a crucial link between polarization imaging technology and practical application systems. Although existing polarization image fusion methods have made some progress, most rely solely on spatial-domain or frequency-domain features and tend to smooth or weaken high-frequency textures during multi-layer feature fusion. As a result, they preserve global information but represent local structure and detail insufficiently.

Summary of the Invention

[0004] To address the aforementioned issues, this invention proposes a frequency-aware guided feature enhancement polarization image fusion method and system. By constructing a frequency-aware guided dual-branch fusion model that uses a high-low frequency reconstruction module (HLCR), a local information fusion module (LIF), and a residual heterogeneous collaborative attention mechanism (RHCA), the invention addresses the insufficient use of multi-scale frequency information and the inadequate collaborative fusion of local texture and long-range dependent features in existing polarization image fusion methods. It achieves high-quality fusion of intensity images and linear polarization degree images, improving the texture detail representation and the discriminative feature consistency of the fused image.

[0005] In one aspect, a frequency-aware guided feature enhancement polarization image fusion method includes:

[0006] S1. Acquire polarization images from four different angles and construct a dataset including intensity images and linear polarization images based on the Stokes vector synthesis method.

[0007] S2. Build a polarization image fusion model with frequency-aware guided spatial feature enhancement, and train the fusion model using the constructed dataset.

[0008] The polarization image fusion model includes a feature extraction branch and a decoder; the feature extraction branch includes an intensity image feature extraction branch and a linear polarization degree image feature extraction branch.

[0009] The intensity image feature extraction branch first performs multi-scale frequency information decomposition and reconstruction on the intensity images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the intensity images; then the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on these features to obtain the local fusion features of the intensity images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion over the channel and spatial dimensions of the local fusion features to obtain the discriminative features of the intensity images.

[0010] The linear polarization degree image feature extraction branch first performs multi-scale frequency information decomposition and reconstruction on the linear polarization degree images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the linear polarization degree images; then the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on these features to obtain the local fusion features of the linear polarization degree images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion over the channel and spatial dimensions of the local fusion features to obtain the discriminative features of the linear polarization degree images.

[0011] The discriminative features of the intensity image and the discriminative features of the linear polarization degree image are input into the decoder, concatenated along the channel dimension, and reconstructed through a convolutional layer to obtain the fused polarization image. After a specified number of training iterations, a fully trained polarization image fusion model is obtained.

[0012] S3. Input the polarization images to be fused into the fully trained polarization image fusion model, complete the polarization image fusion task through forward inference, and output the fused polarization image.

[0013] Furthermore, in S1, a dataset including intensity images and linear polarization images is constructed based on the Stokes vector synthesis method, and the calculation formula is as follows:

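[0014] $S_0 = \tfrac{1}{2}\left(I_{0} + I_{45} + I_{90} + I_{135}\right)$;

[0015] $S_1 = I_{0} - I_{90}$;

[0016] $S_2 = I_{45} - I_{135}$;

[0017] $\mathrm{DoLP} = \dfrac{\sqrt{S_1^{2} + S_2^{2}}}{S_0}$;

[0018] where $S_0$ represents the intensity image (for ideal measurements it also equals $I_{0} + I_{90}$); $S_1$ represents the polarization difference between the 0° and 90° directions; $S_2$ represents the polarization difference between the 45° and 135° directions; $\mathrm{DoLP}$ represents the linear polarization degree image; and $I_{0}$, $I_{45}$, $I_{90}$, and $I_{135}$ represent the polarization images at 0°, 45°, 90°, and 135°, respectively.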

[0019] Furthermore, the intensity image feature extraction branch also includes a color space conversion module RGB2YCrCb. This module converts the intensity image from the three-channel RGB space to the three-channel YCrCb space and separates the luminance channel Y of the intensity image from its chrominance channels Cr and Cb. The separated luminance channel image is then input into several high-low frequency reconstruction modules.

[0020] The linear polarization image feature extraction branch also includes a grayscale conversion module RGB2GRAY. The RGB2GRAY module is used to convert the linear polarization image from a three-channel image into a grayscale image, and input the grayscale linear polarization image into several high and low frequency reconstruction modules.

[0021] Furthermore, the calculation formula for the high- and low-frequency fusion characteristics is as follows:

[0022] ;

[0023] ;

[0024] ;

[0025] where the symbols denote, in order: concatenation along the channel dimension; the four sub-bands obtained by Haar wavelet decomposition; the outputs of the first residual channel shuffling module RCS and of the wavelet frequency fusion module WFF, respectively; the high-low frequency fusion features output by the first high-low frequency reconstruction module HLCR; the Hadamard (element-wise) product; processing by a convolutional layer with a residual connection; and the output of the first cascaded orientation-sensing filter block.

[0026] Furthermore, the expression for the Local Information Fusion (LIF) module is as follows:

[0027] ;

[0028] ;

[0029] where the symbols denote, in order: the output obtained by averaging the multi-kernel convolutions; the input to the local information fusion module LIF; a depthwise separable convolution with kernel size S×S, S ∈ {3, 5, 7}; the averaging operation; a learnable temperature parameter that dynamically adjusts the attention distribution of each head during training; the query Q; the key K; the softmax activation function; the transpose of the key K; and the attention map.

[0030] Furthermore, the calculation formula for the residual heterogeneous collaborative attention mechanism is as follows:

[0031] ;

[0032] ;

[0033] ;

[0034] ;

[0035] ;

[0036] ;

[0037] where the symbols denote, in order: the input to the heterogeneous collaborative attention (HCA) module; a 1×1 convolution; the channel attention branch CA1, the coordinate attention branch CA2, and the contextual attention branch CA3, respectively; the output of the HCA module; the sigmoid activation function; the sequence of a 1×1 convolution, a LeakyReLU activation function, and a 1×1 convolution; spatial max pooling and spatial average pooling; a 1×1 convolution followed by a sigmoid activation; a 3×3 convolution; average pooling performed simultaneously in the horizontal and vertical directions; average pooling in the horizontal direction and in the vertical direction, respectively; the split operation; batch normalization; the activation function; upsampling; the features after concatenation; and the features after splitting.

[0038] Furthermore, an unsupervised training method is used during the training of the fusion model, and the loss function includes fine structure loss, polarization intensity loss and gradient loss.

[0039] Fine-grained structural loss is used to quantify the degree of structural matching between the fused image and the source image at multiple scales;

[0040] The polarization intensity loss is weighted according to different brightness regions using a gated switching mechanism to solve the problem of texture loss in dark areas of polarized images;

[0041] Gradient loss extracts the maximum value of the gradient magnitude of the source image as a supervision signal to preserve edge and texture details in the scene;

[0042] The polarization intensity loss is calculated as follows:

[0043] ;

[0044] ;

[0045] where the symbols denote, in order: the first gate switch, which equals 0 where the source image pixel value is less than 0.5 and 1 where it is greater than 0.5; the second gate switch, which equals 0 where the source image pixel value is less than 0 and 1 where it is greater than 0; bilinear interpolation; average pooling; the norm; element-wise multiplication (*); the intermediate variables of the operation; the grayscale linear polarization degree image; the fused image; and the luminance channel of the intensity image.

[0046] In another aspect, a frequency-aware guided feature enhancement polarization image fusion system includes:

[0047] The dataset construction module is used to acquire polarization images from four different angles and construct a dataset including intensity images and linear polarization images based on the Stokes vector synthesis method.

[0048] The training module is used to build a polarization image fusion model with frequency-aware guided spatial feature enhancement, and to train the fusion model using the constructed dataset.

[0049] The polarization image fusion model includes a feature extraction branch and a decoder; the feature extraction branch includes an intensity image feature extraction branch and a linear polarization degree image feature extraction branch.

[0050] The intensity image feature extraction branch first performs multi-scale frequency information decomposition and reconstruction on the intensity images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the intensity images; then the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on these features to obtain the local fusion features of the intensity images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion over the channel and spatial dimensions of the local fusion features to obtain the discriminative features of the intensity images.

[0051] The linear polarization degree image feature extraction branch first performs multi-scale frequency information decomposition and reconstruction on the linear polarization degree images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the linear polarization degree images; then the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on these features to obtain the local fusion features of the linear polarization degree images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion over the channel and spatial dimensions of the local fusion features to obtain the discriminative features of the linear polarization degree images.

[0052] The discriminative features of the intensity image and the discriminative features of the linear polarization degree image are input into the decoder, concatenated along the channel dimension, and reconstructed through a convolutional layer to obtain the fused polarization image. After a specified number of training iterations, a fully trained polarization image fusion model is obtained.

[0053] The fusion module is used to input the polarization image to be fused into a fully trained polarization image fusion model, complete the polarization image fusion task through forward inference, and output the fused polarization image.

[0054] The present invention adopts the above technical solution and has the following beneficial effects:

[0055] (1) This invention solves the problems of insufficient utilization of frequency information and easy confusion between high-frequency details and low-frequency structure in traditional fusion methods by introducing a high-low frequency recombination module. It performs multi-scale frequency information decomposition and recombination on intensity image and linear polarization degree image, realizes effective decoupling and collaborative processing of high-low frequency features, and thus preserves the structural integrity of the image.

[0056] (2) This invention overcomes the limitations of single-scale convolution in capturing complex textures by using a local information fusion module. Through multi-kernel depthwise separable convolution and a self-attention mechanism, it both enhances the multi-scale local texture response and establishes long-range dependencies, which improves the detail clarity and visual quality of the fused image.

[0057] (3) This invention addresses the redundancy and conflict that arise when fusing features of different modalities by designing a residual heterogeneous collaborative attention mechanism. It fuses features collaboratively over the channel and spatial dimensions, filters and enhances discriminative features, and suppresses noise and irrelevant information, so that the output image has stronger anti-interference ability while retaining color fidelity.

Description of the Drawings

[0058] Figure 1 is a flowchart of the frequency-aware guided feature enhancement polarization image fusion method according to an embodiment of the present invention;

[0059] Figure 2 is a schematic diagram of the structure of the polarization image fusion model according to an embodiment of the present invention;

[0060] Figure 3 is a schematic diagram of the high-low frequency reconstruction module according to an embodiment of the present invention;

[0061] Figure 4 is a schematic diagram of the local information fusion module according to an embodiment of the present invention;

[0062] Figure 5 is a schematic diagram of the residual heterogeneous collaborative attention mechanism according to an embodiment of the present invention;

[0063] Figure 6 is a schematic diagram of the qualitative results of the original images and the fused image obtained from the polarization images according to an embodiment of the present invention;

[0064] Figure 7 is a schematic diagram of a comparative experiment on polarization image fusion tasks according to an embodiment of the present invention;

[0065] Figure 8 is a diagram of the frequency-aware guided feature enhancement polarization image fusion system according to an embodiment of the present invention.

Detailed Implementation

[0066] The present invention will be further described in detail below with reference to the embodiments and accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0067] As shown in Figure 1, the present invention provides a frequency-aware guided feature enhancement polarization image fusion method, comprising:

[0068] S1. Acquire polarization images from four different angles and construct a dataset including intensity images and linear polarization images based on the Stokes vector synthesis method.

[0069] Specifically, a dataset including intensity images and linear polarization images is constructed based on the Stokes vector synthesis method, and the calculation formula is as follows:

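[0070] $S_0 = \tfrac{1}{2}\left(I_{0} + I_{45} + I_{90} + I_{135}\right)$;

[0071] $S_1 = I_{0} - I_{90}$;

[0072] $S_2 = I_{45} - I_{135}$;

[0073] $\mathrm{DoLP} = \dfrac{\sqrt{S_1^{2} + S_2^{2}}}{S_0}$;

[0074] where $S_0$ represents the intensity image (for ideal measurements it also equals $I_{0} + I_{90}$); $S_1$ represents the polarization difference between the 0° and 90° directions; $S_2$ represents the polarization difference between the 45° and 135° directions; $\mathrm{DoLP}$ represents the linear polarization degree image; and $I_{0}$, $I_{45}$, $I_{90}$, and $I_{135}$ represent the polarization images at 0°, 45°, 90°, and 135°, respectively.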
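As a non-limiting illustration of step S1, the following Python sketch computes the Stokes components and the DoLP image from the four angle images. It assumes the standard Stokes relations; the function name `stokes_from_angles`, the small constant `eps`, and the clipping of DoLP to [0, 1] are assumptions introduced here for numerical safety and are not taken from the text.

```python
import numpy as np

def stokes_from_angles(i0, i45, i90, i135, eps=1e-8):
    """Return (S0, S1, S2, DoLP) from images taken at 0, 45, 90 and 135 degrees."""
    i0, i45, i90, i135 = (np.asarray(a, dtype=np.float64)
                          for a in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # 0 deg vs 90 deg difference
    s2 = i45 - i135                      # 45 deg vs 135 deg difference
    # eps (assumed) avoids division by zero in completely dark pixels
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
    return s0, s1, s2, np.clip(dolp, 0.0, 1.0)
```

For fully horizontally polarized light of unit intensity (I0 = 1, I45 = I135 = 0.5, I90 = 0), the sketch yields S0 = 1 and a DoLP of approximately 1; for unpolarized light, where all four images are equal, the DoLP is 0.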

[0075] S2. Build a polarization image fusion model with frequency-aware guided spatial feature enhancement, and train the fusion model using the constructed dataset.

[0076] The polarization image fusion model includes a feature extraction branch and a decoder; the feature extraction branch includes an intensity image feature extraction branch and a linear polarization degree image feature extraction branch.

[0077] The intensity image feature extraction branch first performs multi-scale frequency information decomposition and reconstruction on the intensity images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the intensity images; then the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on these features to obtain the local fusion features of the intensity images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion over the channel and spatial dimensions of the local fusion features to obtain the discriminative features of the intensity images.

[0078] The linear polarization degree image feature extraction branch first performs multi-scale frequency information decomposition and reconstruction on the linear polarization degree images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the linear polarization degree images; then the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on these features to obtain the local fusion features of the linear polarization degree images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion over the channel and spatial dimensions of the local fusion features to obtain the discriminative features of the linear polarization degree images.

[0079] The discriminative features of the intensity image and the discriminative features of the linear polarization degree image are input into the decoder, concatenated along the channel dimension, and reconstructed through a convolutional layer to obtain the fused polarization image. After a specified number of training iterations, a fully trained polarization image fusion model is obtained.

[0080] Specifically, the intensity image feature extraction branch also includes a color space conversion module RGB2YCrCb. This module converts the intensity image from the three-channel RGB space to the three-channel YCrCb space and separates the luminance channel Y of the intensity image from its chrominance channels Cr and Cb. The separated luminance channel image is then input into several high-low frequency reconstruction modules.

[0081] The linear polarization image feature extraction branch also includes a grayscale conversion module RGB2GRAY. The RGB2GRAY module is used to convert the linear polarization image from a three-channel image into a grayscale image, and input the grayscale linear polarization image into several high and low frequency reconstruction modules.
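As a non-limiting illustration of the two conversion modules, the following sketch separates the luminance channel of an RGB intensity image and converts a three-channel image to grayscale. The text does not state which conversion coefficients are used; the BT.601 luma weights and the 0.713 / 0.564 chroma scale factors below are an assumption, and the function names are hypothetical.

```python
import numpy as np

# Assumed BT.601 luma weights; the source does not specify coefficients.
_LUMA = np.array([0.299, 0.587, 0.114])

def rgb_to_ycrcb(rgb):
    """rgb: float array (H, W, 3) in [0, 1]. Returns the Y, Cr, Cb planes."""
    rgb = np.asarray(rgb, dtype=np.float64)
    y = rgb @ _LUMA                          # luminance channel Y
    cr = (rgb[..., 0] - y) * 0.713 + 0.5     # red-difference chroma
    cb = (rgb[..., 2] - y) * 0.564 + 0.5     # blue-difference chroma
    return y, cr, cb

def rgb_to_gray(rgb):
    """Grayscale conversion, here taken as the same weighted sum as Y."""
    return np.asarray(rgb, dtype=np.float64) @ _LUMA
```

Under these assumptions only the Y plane of the intensity image and the grayscale DoLP image are passed on to the frequency modules, so both branches operate on single-channel inputs.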

[0082] Specifically, the calculation formula for the high- and low-frequency fusion features is as follows:

[0083] ;

[0084] ;

[0085] ;

[0086] where the symbols denote, in order: concatenation along the channel dimension; the four sub-bands obtained by Haar wavelet decomposition; the outputs of the first residual channel shuffling module RCS and of the wavelet frequency fusion module WFF, respectively; the high-low frequency fusion features output by the first high-low frequency reconstruction module HLCR; the Hadamard (element-wise) product; processing by a convolutional layer with a residual connection; and the output of the first cascaded orientation-sensing filter block.
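The Haar wavelet decomposition into four sub-bands referred to above can be illustrated with the following minimal sketch. It shows only a single-level orthonormal 2-D Haar transform and its inverse; the RCS and WFF modules that process the sub-bands are not reproduced, and the function names and sub-band variable names are assumptions.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar transform of an (H, W) array with even H and W."""
    x = np.asarray(x, dtype=np.float64)
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0            # low-frequency approximation
    lh = (a - b + c - d) / 2.0            # difference along the width
    hl = (a + b - c - d) / 2.0            # difference along the height
    hh = (a - b - c + d) / 2.0            # diagonal difference
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2; reconstructs the (2H, 2W) array exactly."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x
```

Because the transform is invertible, the high-frequency sub-bands can be modified and recombined with the low-frequency sub-band without losing the information carried by the untouched bands.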

[0087] Specifically, the expression for the Local Information Fusion (LIF) module is as follows:

[0088] ;

[0089] ;

[0090] where the symbols denote, in order: the output obtained by averaging the multi-kernel convolutions; the input to the local information fusion module LIF; a depthwise separable convolution with kernel size S×S, S ∈ {3, 5, 7}; the averaging operation; a learnable temperature parameter that dynamically adjusts the attention distribution of each head during training; the query Q; the key K; the softmax activation function; the transpose of the key K; and the attention map.
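The two ingredients named above, averaging depthwise convolutions with kernel sizes 3, 5 and 7 and a temperature-scaled softmax attention map, can be sketched as follows. This is an illustration under stated assumptions, not the formula of the embodiment: the attention is assumed to be computed between channels, the temperature is assumed to multiply the logits, the pointwise part of the separable convolution is omitted, and all function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def depthwise_conv(x, kernels):
    """x: (C, H, W); kernels: (C, k, k). Zero-padded, one kernel per channel."""
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    win = sliding_window_view(xp, (k, k), axis=(1, 2))    # (C, H, W, k, k)
    return np.einsum("chwij,cij->chw", win, kernels)

def multi_kernel_average(x, kernel_bank):
    """Average the depthwise responses for kernel sizes S in {3, 5, 7}."""
    return np.mean([depthwise_conv(x, kernel_bank[s]) for s in (3, 5, 7)], axis=0)

def temperature_attention(q, k, tau):
    """q, k: (C, N). Returns a (C, C) map softmax(tau * q k^T) (assumed form)."""
    logits = tau * (q @ k.T)
    logits = logits - logits.max(axis=-1, keepdims=True)  # stable softmax
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)
```

A larger `tau` sharpens each row of the attention map and a smaller one flattens it, which is the sense in which a learnable temperature adjusts the attention distribution.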

[0091] Specifically, the calculation formula for the residual heterogeneous collaborative attention mechanism is as follows:

[0092] ;

[0093] ;

[0094] ;

[0095] ;

[0096] ;

[0097] ;

[0098] where the symbols denote, in order: the input to the heterogeneous collaborative attention (HCA) module; a 1×1 convolution; the channel attention branch CA1, the coordinate attention branch CA2, and the contextual attention branch CA3, respectively; the output of the HCA module; the sigmoid activation function; the sequence of a 1×1 convolution, a LeakyReLU activation function, and a 1×1 convolution; spatial max pooling and spatial average pooling; a 1×1 convolution followed by a sigmoid activation; a 3×3 convolution; average pooling performed simultaneously in the horizontal and vertical directions; average pooling in the horizontal direction and in the vertical direction, respectively; the split operation; batch normalization; the activation function; upsampling; the features after concatenation; and the features after splitting.
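A minimal sketch of one of the three branches may help fix ideas. It implements a channel attention of the kind the definitions describe (spatial max pooling and average pooling, a shared 1×1 convolution, LeakyReLU, 1×1 convolution stack, then a sigmoid) together with the directional average pooling used by a coordinate-style branch. How the three branches are combined is not recoverable from the text, so the residual wrapper below is an assumption, as are the function names and the LeakyReLU slope.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (Cr, C); w2: (C, Cr).
    On pooled (C,) vectors a 1x1 convolution reduces to a matrix product."""
    def mlp(v):
        return w2 @ leaky_relu(w1 @ v)
    pooled_max = x.max(axis=(1, 2))       # spatial max pooling
    pooled_avg = x.mean(axis=(1, 2))      # spatial average pooling
    weights = sigmoid(mlp(pooled_max) + mlp(pooled_avg))
    return x * weights[:, None, None]

def directional_pools(x):
    """Average pooling along the width and along the height."""
    return x.mean(axis=2, keepdims=True), x.mean(axis=1, keepdims=True)

def residual_channel_attention(x, w1, w2):
    """Assumed residual form: input plus the attention-weighted features."""
    return x + channel_attention(x, w1, w2)
```

The residual path keeps the original features available even where the attention weights are small, which is one common reason for wrapping an attention block this way.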

[0099] Specifically, the fusion model is trained using an unsupervised method, and its loss function includes fine structure loss, polarization intensity loss, and gradient loss.

[0100] Fine-grained structural loss is used to quantify the degree of structural matching between the fused image and the source image at multiple scales;

[0101] The polarization intensity loss is weighted according to different brightness regions using a gated switching mechanism to solve the problem of texture loss in dark areas of polarized images;

[0102] Gradient loss extracts the maximum value of the gradient magnitude of the source image as a supervision signal to preserve edge and texture details in the scene;

[0103] The polarization intensity loss is calculated as follows:

[0104] ;

[0105] ;

[0106] where the symbols denote, in order: the first gate switch, which equals 0 where the source image pixel value is less than 0.5 and 1 where it is greater than 0.5; the second gate switch, which equals 0 where the source image pixel value is less than 0 and 1 where it is greater than 0; bilinear interpolation; average pooling; the norm; element-wise multiplication (*); the intermediate variables of the operation; the grayscale linear polarization degree image; the fused image; and the luminance channel of the intensity image.
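The two gate switches can be illustrated as binary masks. The sketch below follows the thresholds stated in the text (0.5 and 0); the behavior exactly at a threshold is not specified there and is assumed here to map to 0 for the first gate. The `gated_l1` helper is purely hypothetical: it only shows how such a mask can weight a pixel-wise difference and is not the loss formula of the embodiment.

```python
import numpy as np

def gate_masks(src):
    """Return the two binary gates computed from a source image."""
    src = np.asarray(src, dtype=np.float64)
    g1 = (src > 0.5).astype(np.float64)   # 1 in bright regions, else 0
    g2 = (src > 0.0).astype(np.float64)   # 1 wherever the pixel is positive
    return g1, g2

def gated_l1(fused, src, gate):
    """Hypothetical gated L1 term: mean absolute error over gated pixels."""
    err = gate * np.abs(np.asarray(fused) - np.asarray(src))
    return float(err.sum() / max(gate.sum(), 1.0))
```

Restricting an error term to gated pixels lets bright and dark regions be weighted differently, which is the stated purpose of the gated switching mechanism.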

[0107] Specifically, in this embodiment, the multi-scale weighted structural loss is expressed as:

[0108] ;

[0109] ;

[0110] ;

[0111] ;

[0112] ;

[0113] Here k ∈ {3, 5, 7, 9, 11} are the sizes of the local sliding windows. The fusion consistency term at each scale quantifies the degree of structural matching between the fused image and the source images at that scale. The inputs are the grayscale linear polarization degree image, which characterizes the polarization structure information of the scene; the intensity image, which characterizes its brightness and detail information; and the fused image generated by the network. The single-scale structural similarity metric function computes the structural similarity of its input images within a k×k local window. The adaptive weighting coefficient and its complement (one minus the coefficient) are allocated adaptively from the local statistics of the source images to balance polarization information and intensity information dynamically. In the structural similarity function, x is a source image and y is the fused image; their local means within the k×k window characterize brightness consistency, their local standard deviations characterize contrast consistency, and their covariance characterizes structural correlation. The preset stability constants, with a value of [value missing], prevent numerical singularities caused by a zero denominator and keep the optimization numerically stable. The squared local standard deviations represent the local variance of the intensity image and of the linear polarization degree image within the corresponding k×k window. ψ(·) is a threshold stabilization function that constrains the variance to a minimum value, which avoids anomalies in the weight calculation when the variance is zero and further improves the robustness of the algorithm.
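One way to read the description above is sketched below: a standard single-scale SSIM computed in k×k windows, combined across the five window sizes with weights derived from floored local variances. The exact formulas of the embodiment are not reproduced in the text, so the variance-ratio weight, the floor `var_floor`, the constants `c1` and `c2` (the usual 0.01² and 0.03² for images in [0, 1]), and all function names are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(img, k):
    return sliding_window_view(np.asarray(img, dtype=np.float64), (k, k))

def ssim_map(x, y, k, c1=1e-4, c2=9e-4):
    """Standard SSIM between x and y over every valid k x k window."""
    wx, wy = _windows(x, k), _windows(y, k)
    mx, my = wx.mean(axis=(-2, -1)), wy.mean(axis=(-2, -1))
    vx, vy = wx.var(axis=(-2, -1)), wy.var(axis=(-2, -1))
    cxy = (wx * wy).mean(axis=(-2, -1)) - mx * my         # local covariance
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def multiscale_weighted_ssim_loss(s0, dolp, fused,
                                  scales=(3, 5, 7, 9, 11), var_floor=1e-4):
    """Assumed form: 1 - weighted SSIM, averaged over windows and scales."""
    losses = []
    for k in scales:
        # Threshold stabilization: floor the local variances before weighting.
        v_s0 = np.maximum(_windows(s0, k).var(axis=(-2, -1)), var_floor)
        v_dp = np.maximum(_windows(dolp, k).var(axis=(-2, -1)), var_floor)
        w = v_s0 / (v_s0 + v_dp)                          # adaptive weight
        score = w * ssim_map(s0, fused, k) + (1 - w) * ssim_map(dolp, fused, k)
        losses.append(1.0 - score.mean())
    return float(np.mean(losses))
```

With this weighting, the source image that has more local contrast in a window contributes more to the structural target there, and the variance floor keeps the weight defined in flat regions where both variances are zero.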

[0114] Specifically, the salient gradient loss is expressed as:

[0115] ;

[0116] ;

[0117] ;

[0118] ;

[0119] The parameters and functions in the above formulas are as follows. The gradient operator is applied to obtain the gradient magnitudes of the intensity image, the linear polarization degree (DoLP) image, and the fused image. The gradient maximum function extracts, pixel by pixel, the larger of the two source-image gradient magnitudes, preserving the salient edge and texture information of the scene. The global mean operation over the image domain computes the global statistic of the loss. A preset stability constant avoids numerical singularities caused by a zero denominator and keeps the optimization stable. The two weight adjustment functions filter out the non-salient regions of the intensity image and of the linear polarization degree image, respectively; they are built from the element-wise multiplication operator and the indicator function, which takes the value 1 when the condition in parentheses is true and 0 otherwise. By setting to zero the non-salient gradient regions (that is, regions where the gradient magnitude is smaller than that of the other source image), the weight adjustment functions achieve adaptive weighting: only the salient texture features of each source image are retained, interference from non-salient regions is avoided, and the structural fidelity of the fused image is improved.
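The mechanism described above (gradient magnitudes, a pixel-wise maximum as the supervision signal, and indicator-based masks that zero out the weaker source) can be sketched as follows. The Sobel operator, the normalization by the mean target gradient, the tie-breaking rule in the indicator, and the function names are assumptions; the formulas of the embodiment are not reproduced in the text.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude of an (H, W) image using 3x3 Sobel filters."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.sqrt(gx ** 2 + gy ** 2)

def salient_gradient_loss(s0, dolp, fused, eps=1e-8):
    """Assumed form: masked L1 gradient error, normalized by the target mean."""
    g_s0, g_dp, g_f = (sobel_magnitude(a) for a in (s0, dolp, fused))
    target = np.maximum(g_s0, g_dp)               # pixel-wise gradient maximum
    m_s0 = (g_s0 >= g_dp).astype(np.float64)      # indicator: S0 is salient here
    m_dp = 1.0 - m_s0                             # indicator: DoLP is salient here
    diff = m_s0 * np.abs(g_f - g_s0) + m_dp * np.abs(g_f - g_dp)
    return float(diff.mean() / (target.mean() + eps))
```

Since exactly one mask is active at each pixel, the masked sum equals the absolute difference between the fused gradient and the pixel-wise maximum, so the fused image is only asked to match the stronger source edge at each location.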

[0120] Specifically, as shown in Figure 2, the polarization image fusion model includes a feature extraction branch and a decoder module. The feature extraction branch includes an intensity image feature extraction branch and a linear polarization degree image feature extraction branch, which receive and process the intensity image and the linear polarization degree image, respectively. Each of the two branches includes five sequentially connected high-low frequency reconstruction modules (HLCR), a local information fusion module (LIF), and a residual heterogeneous collaborative attention mechanism (RHCA). Each HLCR uses Haar wavelet time-frequency feature decomposition and channel reconstruction to fuse multi-scale frequency information, using the high- and low-frequency information as a perceptual modulation signal to guide spatial feature enhancement, so as to enhance high-frequency texture details while maintaining the consistency of the image results. Skip connections are set between the HLCRs to effectively guide the high- and low-frequency information. The features output by the last HLCR are used as the input features of the LIF, and the output features of the LIF are used as the input features of the RHCA. The RHCA generates discriminative features by combining the channel and spatial dimensions of the input features to highlight contextually important information. The features output by the RHCA of the intensity image feature extraction branch and the features output by the RHCA of the linear polarization degree image feature extraction branch are input into the decoder module, merged along the channel dimension, and reconstructed through three convolutional layers to obtain the fused polarization image.

[0121] Specifically, in this embodiment, the intensity image feature extraction branch includes an RGB2YCrCb module. The RGB2YCrCb module converts the intensity image from the RGB three-channel space to the YCrCb three-channel space and separates the luminance channel Y of the intensity image from the chrominance channels. The luminance channel image is then input into the plurality of high-low frequency reconstruction modules. The linear polarization image feature extraction branch includes an RGB2GRAY module. The RGB2GRAY module converts the linear polarization image from a three-channel image to a grayscale image, and the grayscale linear polarization image is then input into the plurality of high-low frequency reconstruction modules. The features output by the residual heterogeneous collaborative attention mechanism (RHCA) of the intensity image feature extraction branch and by the RHCA of the linear polarization image feature extraction branch are merged along the channel dimension and reconstructed through three convolutional layers. Finally, the result is converted from YCrCb space to RGB space by the YCrCb2RGB module to obtain the fused polarized color image.
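The color-space handling around the network can be illustrated with a short NumPy sketch. The conversion coefficients below are the common BT.601-style values and are an assumption, since the text does not state which variant the RGB2YCrCb, RGB2GRAY, and YCrCb2RGB modules use. The helper `recolor`, which reuses the chrominance channels of the intensity image for the fused luminance, is likewise a hypothetical illustration of a common practice rather than a step confirmed by the text.

```python
import numpy as np

DELTA = 0.5  # chroma offset for images scaled to [0, 1] (assumed convention)


def rgb_to_ycrcb(rgb):
    """Convert an (H, W, 3) RGB image in [0, 1] to Y, Cr, Cb channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + DELTA
    cb = (b - y) * 0.564 + DELTA
    return np.stack([y, cr, cb], axis=-1)


def ycrcb_to_rgb(ycrcb):
    """Exact algebraic inverse of rgb_to_ycrcb."""
    y, cr, cb = ycrcb[..., 0], ycrcb[..., 1], ycrcb[..., 2]
    r = y + (cr - DELTA) / 0.713
    b = y + (cb - DELTA) / 0.564
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return np.stack([r, g, b], axis=-1)


def rgb_to_gray(rgb):
    """Grayscale conversion using the same luminance weights."""
    return rgb_to_ycrcb(rgb)[..., 0]


def recolor(fused_y, intensity_rgb):
    """Attach the intensity image's chrominance to a fused luminance map."""
    ycrcb = rgb_to_ycrcb(intensity_rgb)
    ycrcb[..., 0] = fused_y
    return ycrcb_to_rgb(ycrcb)
```

In this sketch only the luminance channel would pass through the fusion network, while the chrominance channels bypass it and are reattached before the final conversion to RGB.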

[0122] Specifically, the high-low frequency reconstruction module (HLCR) in this embodiment includes a residual channel shuffling module (RCS), a cascaded orientation sensing filter block (COAF), and a wavelet frequency fusion module (WFF).

[0123] The residual channel shuffling module RCS includes a 1×1 convolution, a 3×3 depthwise convolution, a 1×1 convolution and a channel shuffling structure connected in sequence; the output of the first 1×1 convolution and the output of the third 1×1 convolution are merged on the channel to form a residual connection with the original output as the output of the preprocessing layer.
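The channel shuffling step at the end of the RCS can be sketched with the standard reshape-and-transpose trick. The number of groups is not stated in the text and is a free parameter of this illustration; the function name is hypothetical, and the convolutions and residual connection of the RCS are not modeled.

```python
import numpy as np


def channel_shuffle(x, groups):
    """Interleave the channels of a (C, H, W) feature map across groups."""
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    # (C, H, W) -> (groups, C // groups, H, W) -> swap the two group axes -> (C, H, W)
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))
```

With six channels and two groups, the channel order 0, 1, 2, 3, 4, 5 becomes 0, 3, 1, 4, 2, 5, so that each subsequent grouped operation sees channels from every original group.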

[0124] After receiving the output of the residual channel shuffling module, the cascaded orientation sensing filter block COAF processes it sequentially through a 1×5 convolution, batch normalization (BN), an LReLU activation function, a 5×1 convolution, batch normalization (BN), and a sigmoid activation function; the result is then multiplied element-wise with the output of the residual channel shuffling module RCS to obtain the enhanced features.
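The gating pattern of the COAF can be sketched for a single-channel feature map as below. This is a simplified illustration, not the block itself: batch normalization is omitted, the kernels are passed in rather than learned, the LReLU slope is an assumed value, and `np.convolve` performs true convolution (kernel flipped) instead of the cross-correlation used by most deep learning frameworks. All names are hypothetical.

```python
import numpy as np


def conv1d_along(x, kernel, axis):
    """Apply a 1-D 'same' convolution along one axis of a 2-D array."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, x)


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coaf_gate(feat, k_row, k_col):
    """1x5 filtering, LReLU, 5x1 filtering, sigmoid, then gate the input."""
    h = conv1d_along(feat, k_row, axis=1)   # 1x5: filter along the width
    h = leaky_relu(h)
    h = conv1d_along(h, k_col, axis=0)      # 5x1: filter along the height
    gate = sigmoid(h)                       # values in (0, 1)
    return feat * gate, gate
```

Because the gate lies strictly between 0 and 1, the output never exceeds the input in magnitude; the block reweights existing responses according to horizontal and vertical context.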

[0125] After receiving the output of the residual channel shuffling module, the wavelet frequency fusion module (WFF) sequentially performs a Haar wavelet transform (HDWT), channel merging, a 3×3 convolution, and an LReLU activation function. The original output is then subjected to another 3×3 convolution and added to the LReLU output features, followed by upsampling. The upsampled result is then multiplied element-wise with the output of the residual channel shuffling module to obtain enhanced features. Finally, the two enhanced features are added to the output of the residual channel shuffling module (RCS) to obtain the output of the first high-low frequency reconstruction module (HLCR). The high-low frequency reconstruction module introduces skip connections to achieve information flow and compensate for potential texture loss. Both the intensity image feature extraction branch and the linear polarization degree image feature extraction branch include five high-low frequency reconstruction modules. Skip connections are established between the output of the residual channel shuffling module of the first high-low frequency reconstruction module and the output of the fifth high-low frequency reconstruction module, between the outputs of the first and second high-low frequency reconstruction modules, and between the outputs of the second and third high-low frequency reconstruction modules.
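The Haar decomposition and channel merging at the start of the WFF can be sketched in NumPy as follows. The sketch covers a single-level orthonormal Haar transform on one channel and a nearest-neighbor upsampling step; the sub-band naming, the interpolation type, and the function names are assumptions, and the 3×3 convolutions of the module are not modeled.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2-D Haar transform of an (H, W) map with even H and W.
    Returns the four sub-bands stacked on a new channel axis: (4, H/2, W/2)."""
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a - b + c - d) / 2.0   # differences along the width
    hl = (a + b - c - d) / 2.0   # differences along the height
    hh = (a - b - c + d) / 2.0   # diagonal differences
    return np.stack([ll, lh, hl, hh], axis=0)


def upsample2(x):
    """Nearest-neighbor 2x upsampling of the last two axes (assumed method)."""
    return np.repeat(np.repeat(x, 2, axis=-2), 2, axis=-1)
```

The transform halves the spatial resolution, which is why an upsampling step is needed before the frequency features can modulate the full-resolution RCS output. The orthonormal scaling preserves signal energy across the four sub-bands.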

[0126] Specifically, the local information fusion module first uses multi-kernel depthwise convolution to enhance the multi-scale local texture response and then averages the obtained features to avoid the dominance of a single receptive field on the polarization feature response, thus obtaining IA. This makes the features input to the grouped self-attention (GSA) more prominent in terms of polarization high-frequency texture. To enhance the long-range dependency modeling capability, and unlike traditional self-attention, which performs unified modeling in the global feature space, the grouped self-attention mechanism (GSA) selectively aggregates complementary texture responses of polarization images within channel groups. That is, the channels of the input features are first divided into groups, and the self-attention operation is performed within each channel group. This effectively reduces the dimensionality of the self-attention calculation and enables different channel groups to focus on complementary structural and texture information, thereby improving the global expressive power.
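The grouping idea can be sketched in NumPy as below. The sketch makes several assumptions that the text does not confirm: attention is computed between channels inside each group (so each attention map is small), identity projections stand in for the learned query, key, and value projections, and the temperature is a plain scalar. The multi-kernel depthwise convolution and averaging step is not modeled, and the function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_self_attention(x, groups, temperature=1.0):
    """Self-attention computed independently inside each channel group.
    x: (C, H, W). Returns the output (C, H, W) and the attention maps."""
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    cg = c // groups
    tokens = x.reshape(groups, cg, h * w)

    # Identity projections are placeholders for learned Q, K, V layers.
    q, k, v = tokens, tokens, tokens

    # One (cg x cg) attention map per group instead of one (C x C) map.
    attn = softmax(temperature * (q @ k.transpose(0, 2, 1)), axis=-1)
    out = attn @ v
    return out.reshape(c, h, w), attn
```

With C channels split into G groups, the attention maps hold G·(C/G)² entries instead of C², which is one way to read the stated reduction in the dimensionality of the self-attention calculation.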

[0127] S3. Input the polarization images to be fused into the fully trained polarization image fusion model, complete the polarization image fusion task through forward inference, and output the fused polarization image.

[0128] Figure 3 shows the model of the high-low frequency reconstruction module. The model first enhances channel interaction through the residual channel shuffling module; the wavelet frequency fusion module then captures details and guides spatial feature enhancement, and the cascaded orientation sensing filter block strengthens attention in four directions through multi-kernel convolution.

[0129] Figure 4 shows the model of the local information fusion module, which captures long-range dependencies while preserving spatial information: multi-kernel convolution is applied to enhance the local texture response, the obtained features are averaged to avoid the dominance of a single receptive field, and the result is then used to compute the query, key, and value (QKV) of the attention.

[0130] Figure 5 shows the model of the heterogeneous collaborative attention mechanism. To fully utilize the complementary roles of channel features and spatial features in highlighting contextual information, a heterogeneous collaborative attention (HCA) mechanism is designed. HCA consists of three branches: a channel attention branch (CA1), a coordinate attention branch (CA2), and a contextual attention branch (CA3). The outputs of CA1 and CA2 are concatenated along the channel dimension and then fused through a 1×1 convolution to achieve cross-attention interaction and adaptive feature integration. Finally, element-wise multiplication is used to apply spatial-contextual constraints to the fused features, thereby enhancing feature representation while maintaining structural consistency and highlighting discriminative regions.
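The three-branch fusion pattern can be sketched in NumPy as follows. Only the wiring (concatenate CA1 and CA2, fuse with a 1×1 convolution, multiply by CA3) follows the description; the internals of each branch are simplified stand-ins chosen for illustration, the weight matrices are random placeholders for learned parameters, and all names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """CA1 stand-in: pooled descriptors -> shared two-layer 1x1 mapping -> gate."""
    def mlp(v):
        h = w1 @ v
        h = np.where(h > 0, h, 0.2 * h)  # LeakyReLU with an assumed slope
        return w2 @ h
    gate = sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))
    return x * gate[:, None, None]


def coordinate_attention(x):
    """CA2 stand-in: gates from average pooling along each spatial direction."""
    gate_h = sigmoid(x.mean(axis=2, keepdims=True))  # (C, H, 1)
    gate_w = sigmoid(x.mean(axis=1, keepdims=True))  # (C, 1, W)
    return x * gate_h * gate_w


def context_attention(x):
    """CA3 stand-in: a single spatial map shared by all channels."""
    return sigmoid(x.mean(axis=0, keepdims=True))    # (1, H, W)


def hca(x, w1, w2, w_fuse):
    """Concatenate CA1 and CA2, fuse with a 1x1 convolution, gate with CA3."""
    merged = np.concatenate(
        [channel_attention(x, w1, w2), coordinate_attention(x)], axis=0)
    fused = np.einsum("oc,chw->ohw", w_fuse, merged)  # 1x1 convolution
    return fused * context_attention(x)
```

A 1×1 convolution on a (2C, H, W) tensor is a per-pixel matrix multiplication, which is why it is written here as an `einsum` with a (C, 2C) weight matrix.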

[0131] Specifically, Figure 6 and Figure 7 show an example of applying this invention to polarization images, including the source images and the fused images. As can be seen from the figures, this invention achieves high-quality polarization image fusion results with enhanced texture. The figures also give a quantitative comparison of "Ours" (the proposed method) with seven existing mainstream methods, including UNFusion and PFNet, on image fusion tasks. The table lists seven evaluation metrics, where higher values of SF, AG, CC, and PSNR are better, while lower values of NIQE and MSE are better. SF and AG measure the richness of image detail and texture information; higher values indicate clearer edges and a clearer structural representation. CC evaluates the global structural similarity between the fused image and the source image; higher values indicate better structure preservation. NIQE is a no-reference image quality assessment metric based on natural scene statistics; lower values indicate better perceptual quality. Mean squared error (MSE) and peak signal-to-noise ratio (PSNR) are classic reconstruction-based metrics; a lower MSE indicates less pixel-wise distortion, while a higher PSNR indicates better visual quality and less degradation. The data show that the proposed method achieves the best results on almost all metrics, for example an SF of 11.899, a PSNR of 63.723, and an MSE as low as 0.032. These results indicate that the proposed method outperforms the compared algorithms in terms of image sharpness, structure preservation, and error control, and has better overall performance.
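For reference, the reconstruction- and texture-based metrics named above can be computed as in the following NumPy sketch. These are the commonly used definitions and are assumed here; the text does not state the exact variants, value ranges, or averaging over source images used to produce the reported numbers, so the sketch is not expected to reproduce them. NIQE is omitted because it requires a pretrained natural-scene model.

```python
import numpy as np


def spatial_frequency(img):
    """SF: root of the summed squared row and column frequencies."""
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))


def average_gradient(img):
    """AG: mean magnitude of forward differences, scaled by 1/sqrt(2)."""
    gx = img[:-1, 1:] - img[:-1, :-1]
    gy = img[1:, :-1] - img[:-1, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def psnr(a, b, peak=1.0):
    """PSNR in dB; peak is the maximum possible pixel value (assumed 1.0)."""
    err = mse(a, b)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))


def correlation_coefficient(a, b):
    """CC: Pearson correlation between two images."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
```

In fusion benchmarks, MSE, PSNR, and CC are typically computed between the fused image and each source image and then averaged, whereas SF and AG are computed on the fused image alone.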

[0132] As shown in Figure 8, this embodiment also discloses a frequency-aware guided feature-enhanced polarization image fusion system, comprising:

[0133] The dataset construction module 81 is used to acquire polarization images from four different angles and construct a dataset including intensity images and linear polarization images based on the Stokes vector synthesis method.

[0134] The training module 82 is used to build a polarization image fusion model with frequency-aware guided spatial feature enhancement, and to train the fusion model using the constructed dataset;

[0135] The polarization image fusion model includes a feature extraction branch and a decoder; the feature extraction branch includes an intensity image feature extraction branch and a linear polarization degree image feature extraction branch.

[0136] The intensity image feature extraction branch sequentially performs multi-scale frequency information decomposition and reconstruction on the intensity images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the intensity images; then, the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on the high-low frequency fusion features of the intensity images to obtain the local fusion features of the intensity images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion of the channel and spatial dimensions of the local fusion features of the intensity images to obtain the discriminative features of the intensity images.

[0137] The linear polarization degree image feature extraction branch sequentially performs multi-scale frequency information decomposition and reconstruction on the linear polarization degree images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the linear polarization degree images; then, the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on the high-low frequency fusion features of the linear polarization degree images to obtain the local fusion features of the linear polarization degree images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion of the channel and spatial dimensions of the local fusion features of the linear polarization degree images to obtain the discriminative features of the linear polarization degree images.

[0138] The discriminative features of the intensity image and the discriminative features of the linear polarization image are input into the decoder, merged in the channel, and reconstructed through a convolutional layer to obtain the fused polarization image. After a specified number of iterations of training, a fully trained polarization image fusion model is obtained.

[0139] The fusion module 83 is used to input the polarization image to be fused into the fully trained polarization image fusion model, complete the polarization image fusion task through forward inference, and output the fused polarization image.
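The Stokes-based synthesis performed by the dataset construction module can be sketched in NumPy as below. The difference terms follow the definitions given in the claims (0° versus 90°, and 45° versus 135°). The exact expression for the intensity image is not reproduced in this text, so the half-sum of the four angle images used here is an assumption (another common convention is the sum of the 0° and 90° images), as are the small constant `eps` in the denominator and the function name.

```python
import numpy as np


def stokes_from_angles(i0, i45, i90, i135, eps=1e-8):
    """Compute S0, S1, S2 and the degree of linear polarization (DoLP)
    from polarization images captured at 0, 45, 90 and 135 degrees."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity (assumed convention)
    s1 = i0 - i90                        # 0 deg versus 90 deg difference
    s2 = i45 - i135                      # 45 deg versus 135 deg difference
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
    return s0, s1, s2, np.clip(dolp, 0.0, 1.0)
```

For light fully polarized at 0°, the four measurements are 1, 0.5, 0, and 0.5 (in relative units), giving a DoLP close to 1; for unpolarized light all four measurements are equal and the DoLP is 0.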

[0140] The specific implementation of the frequency-aware guided feature-enhanced polarization image fusion system in this embodiment is the same as that of the frequency-aware guided feature-enhanced polarization image fusion method.

[0141] Although the invention has been specifically shown and described in conjunction with preferred embodiments, those skilled in the art should understand that various changes in form and detail may be made to the invention without departing from the spirit and scope of the invention as defined in the appended claims, all of which shall be within the scope of protection of the invention.

Claims

1. A frequency-aware guided feature enhancement polarization image fusion method, characterized in that it includes the following steps:
S1. Acquire polarization images from four different angles and construct a dataset including intensity images and linear polarization images based on the Stokes vector synthesis method.
S2. Build a polarization image fusion model with frequency-aware guided spatial feature enhancement, and train the fusion model using the constructed dataset. The polarization image fusion model includes a feature extraction branch and a decoder; the feature extraction branch includes an intensity image feature extraction branch and a linear polarization degree image feature extraction branch. The intensity image feature extraction branch sequentially performs multi-scale frequency information decomposition and reconstruction on the intensity images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the intensity images; then, the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on the high-low frequency fusion features of the intensity images to obtain the local fusion features of the intensity images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion of the channel and spatial dimensions of the local fusion features of the intensity images to obtain the discriminative features of the intensity images. The linear polarization degree image feature extraction branch sequentially performs multi-scale frequency information decomposition and reconstruction on the linear polarization degree images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the linear polarization degree images; then, the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on the high-low frequency fusion features of the linear polarization degree images to obtain the local fusion features of the linear polarization degree images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion of the channel and spatial dimensions of the local fusion features of the linear polarization degree images to obtain the discriminative features of the linear polarization degree images. The discriminative features of the intensity image and the discriminative features of the linear polarization image are input into the decoder, merged along the channel dimension, and reconstructed through a convolutional layer to obtain the fused polarization image. After a specified number of training iterations, a fully trained polarization image fusion model is obtained.
S3. Input the polarization images to be fused into the fully trained polarization image fusion model, complete the polarization image fusion task through forward inference, and output the fused polarization image.

2. The frequency-aware guided feature enhancement polarization image fusion method according to claim 1, characterized in that, in S1, the dataset including intensity images and linear polarization images is constructed based on the Stokes vector synthesis method, with the calculation formulas as follows: ; ; ; ; where the symbols denote, in order: the intensity image; the polarization difference between the 0° and 90° directions; the polarization difference between the 45° and 135° directions; the linear polarization degree image; and the polarization images at 0°, 45°, 90°, and 135°, respectively.

3. The frequency-aware guided feature-enhanced polarization image fusion method according to claim 1, characterized in that, in S2, the intensity image feature extraction branch also includes a color space conversion module RGB2YCrCb. The color space conversion module RGB2YCrCb is used to convert the intensity image from the RGB three-channel space to the YCrCb three-channel space, and to separate the luminance channel Y of the intensity image from the chrominance channels. The separated luminance channel image is then input into several high-low frequency reconstruction modules. The linear polarization image feature extraction branch also includes a grayscale conversion module RGB2GRAY. The RGB2GRAY module is used to convert the linear polarization image from a three-channel image into a grayscale image, and to input the grayscale linear polarization image into several high-low frequency reconstruction modules.

4. The frequency-aware guided feature-enhanced polarization image fusion method according to claim 1, characterized in that, in S2, the calculation formulas for the high-low frequency fusion features are as follows: ; ; ; where the symbols denote, respectively: merging along the channel dimension; the four sub-bands obtained after Haar wavelet decomposition; the outputs of the first residual channel shuffling module RCS and of the wavelet frequency fusion module WFF; the high-low frequency fusion features output by the first high-low frequency reconstruction module (HLCR); the Hadamard (element-wise) product; processing by a convolutional layer with a residual connection; and the output of the first cascaded orientation sensing filter block.

5. The frequency-aware guided feature-enhanced polarization image fusion method according to claim 1, characterized in that, in S2, the expressions for the local information fusion (LIF) module are as follows: ; ; where the symbols denote, respectively: the output generated by averaging multiple convolutions; the input to the local information fusion (LIF) module; a depthwise separable convolution with a kernel size of S×S, where S ∈ {3, 5, 7}; the averaging operation; a learnable temperature parameter used to dynamically adjust the attention distribution of each head during training; the query Q; the key K; the softmax activation function; the transpose of the key K; and the attention map.

6. The frequency-aware guided feature-enhanced polarization image fusion method according to claim 1, characterized in that, in S2, the calculation formulas for the residual heterogeneous collaborative attention mechanism are as follows: ; ; ; ; ; ; where the symbols denote, respectively: the input to the heterogeneous collaborative attention (HCA) module; a 1×1 convolution; the channel attention branch CA1, the coordinate attention branch CA2, and the contextual attention branch CA3; the output of the HCA module; the sigmoid activation function; the sequence of a 1×1 convolution, a LeakyReLU activation function, and a 1×1 convolution; max pooling and average pooling over the spatial dimensions; a 1×1 convolution followed by a sigmoid activation; a 3×3 convolution; average pooling performed simultaneously in the horizontal and vertical directions; average pooling in the horizontal direction and in the vertical direction, respectively; the separation operation; batch normalization; the activation function; upsampling; the features after merging; and the features after separation.

7. The frequency-aware guided feature-enhanced polarization image fusion method according to claim 1, characterized in that an unsupervised training method is used during the training of the fusion model, with loss functions including a fine-grained structural loss, a polarization intensity loss, and a gradient loss. The fine-grained structural loss is used to quantify the degree of structural matching between the fused image and the source images at multiple scales. The polarization intensity loss is weighted according to different brightness regions using a gated switching mechanism to solve the problem of texture loss in dark areas of polarized images. The gradient loss extracts the maximum of the gradient magnitudes of the source images as a supervision signal to preserve edge and texture details in the scene. The calculation formulas of the polarization intensity loss are as follows: ; ; where the symbols denote, respectively: the first gate switch, which is equal to 0 when the source image pixel value is less than 0.5 and equal to 1 when it is greater than 0.5; the second gate switch, which sets pixels in the source image with a value less than 0 to 0 and pixels with a value greater than 0 to 1; bilinear interpolation; average pooling; the norm; element-wise multiplication (*); the intermediate variables in the operation; the grayscale linear polarization image; the fused image; and the luminance channel of the intensity image.

8. A frequency-aware guided feature-enhanced polarization image fusion system, characterized in that it includes:
a dataset construction module, used to acquire polarization images from four different angles and construct a dataset including intensity images and linear polarization images based on the Stokes vector synthesis method;
a training module, used to build a polarization image fusion model with frequency-aware guided spatial feature enhancement, and to train the fusion model using the constructed dataset. The polarization image fusion model includes a feature extraction branch and a decoder; the feature extraction branch includes an intensity image feature extraction branch and a linear polarization degree image feature extraction branch. The intensity image feature extraction branch sequentially performs multi-scale frequency information decomposition and reconstruction on the intensity images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the intensity images; then, the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on the high-low frequency fusion features of the intensity images to obtain the local fusion features of the intensity images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion of the channel and spatial dimensions of the local fusion features of the intensity images to obtain the discriminative features of the intensity images. The linear polarization degree image feature extraction branch sequentially performs multi-scale frequency information decomposition and reconstruction on the linear polarization degree images in the dataset through the high-low frequency reconstruction module HLCR to obtain the high-low frequency fusion features of the linear polarization degree images; then, the local information fusion module LIF performs multi-scale local texture response enhancement and long-range dependency modeling on the high-low frequency fusion features of the linear polarization degree images to obtain the local fusion features of the linear polarization degree images; finally, the residual heterogeneous collaborative attention mechanism RHCA performs discriminative feature fusion of the channel and spatial dimensions of the local fusion features of the linear polarization degree images to obtain the discriminative features of the linear polarization degree images. The discriminative features of the intensity image and the discriminative features of the linear polarization image are input into the decoder, merged along the channel dimension, and reconstructed through a convolutional layer to obtain the fused polarization image. After a specified number of training iterations, a fully trained polarization image fusion model is obtained;
a fusion module, used to input the polarization image to be fused into the fully trained polarization image fusion model, complete the polarization image fusion task through forward inference, and output the fused polarization image.