An image dehazing method based on frequency-aware polarization filtering and a state-space model
By combining frequency-aware polarization filtering with a state-space model and utilizing the decoupling characteristics of polarization frequencies based on dielectric constant differences, a multi-domain loss function is designed. This solves the problem of poor dehazing effect in existing technologies, achieving efficient and real-time image dehazing while preserving high-frequency edges and texture details.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG UNIV OF TECH
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing image dehazing techniques fail to effectively utilize the polarization frequency decoupling characteristics brought about by the difference in dielectric constant, do not combine frequency domain characteristics with polarization priors to achieve targeted filtering, and the network architecture struggles to balance global dependencies and local details. Furthermore, the loss function lacks multi-dimensional constraints, resulting in poor dehazing performance in non-uniform high-concentration fog scenes.
A method based on frequency-aware polarization filtering and state-space model is adopted, which combines Mamba's efficient global feature modeling capability with frequency-aware polarization filtering based on the physical prior of dielectric constant. A multi-domain loss function that integrates spatial domain, frequency domain, and polarization physical consistency is designed, and image dehazing is achieved through Mamba encoder and decoder.
It achieves excellent dehazing results in various scenarios such as uniform fog, non-uniform fog, and high-concentration fog. The dehazed image has no fog residue, no artifacts, and natural colors. The edge details and structural integrity are significantly better than existing methods. It has low computational complexity and meets real-time requirements.
Figure CN122243780A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer vision and digital image processing technology, specifically relating to an image dehazing method based on frequency-aware polarization filtering and a state-space model.
Background Technology
[0002] Fields such as autonomous driving, drone inspection, and outdoor security have extremely high requirements for image quality. However, fog can reduce image contrast, distort colors, and cause loss of detail, directly affecting the environmental perception capabilities of intelligent sensing systems and even leading to safety accidents. Therefore, image dehazing has become a key research direction in the field of computer vision.
[0003] The total light intensity recorded in a foggy image is a linear superposition of target radiation and atmospheric light. The polarization states of the two components are governed by different dielectric constants, and the components are naturally decoupled in spatial frequency: the low-frequency component is dominated by atmospheric light, whose degree of polarization is determined by the dielectric constant of the aerosol particles; the high-frequency component is dominated by light reflected from the target surface, whose degree of polarization is determined by the object's dielectric constant and geometry. This physical characteristic provides a core basis for accurately separating fog from scene content, but current technologies have not yet explored it in depth.
[0004] Image dehazing technology has gone through two stages: traditional methods and deep learning. Traditional methods are mainly divided into image enhancement, image restoration, and fusion methods. Image enhancement methods are computationally efficient, but they have poor adaptability in complex foggy scenes and are prone to over-enhancing or losing details. Image restoration methods are based on atmospheric scattering models and achieve dehazing by estimating atmospheric light and transmittance, but they are highly dependent on ideal physical assumptions and have large errors in estimating actual parameters. Fusion methods can improve robustness, but they are prone to introducing artifacts and are inefficient.
[0005] In recent years, deep learning has become mainstream. Early methods incorporated atmospheric scattering models into the network to learn parameters such as transmittance, but they used physical priors only as training guides without integrating them deeply into the architecture, resulting in a disconnect between the physical mechanism and feature learning. Subsequent end-to-end methods introduced attention mechanisms, GANs (Generative Adversarial Networks), and Transformers, achieving some success in non-uniform fog scenes, but they are entirely data-driven and have poor interpretability. In addition, existing methods generally ignore the frequency domain nature of fog: fog mainly interferes with low-frequency components, while high-frequency components correspond to detailed textures. Methods that ignore this distinction struggle to balance dehazing intensity against detail preservation in complex scenes, which easily leads to blurred edges or fog residue.
[0006] Polarization dehazing technology has become an important direction due to its utilization of polarization differences, but it still has shortcomings: traditional methods rely on specialized hardware, limiting their practical application; existing deep learning methods only perform shallow fusion of polarization priors, failing to combine dielectric constant differences to achieve decoupling of polarization frequency features. Furthermore, the network architecture struggles to effectively capture global dependencies, and the loss function only focuses on pixel errors, lacking multi-dimensional constraints, leading to texture distortion in high-concentration fog scenes.
[0007] Furthermore, existing networks mostly employ CNNs (Convolutional Neural Networks) or Transformers. CNNs have limited global dependency capture capabilities, and while Transformers can model long-range dependencies, their high computational complexity makes them difficult to meet real-time requirements. The Mamba network achieves efficient global dependency capture through a two-dimensional state-space (SS2D) model, but it has not yet been organically integrated with polarization physics priors and frequency domain filtering in image dehazing.
[0008] In summary, existing technologies suffer from the following core problems: they fail to exploit the polarization frequency decoupling characteristics caused by differences in dielectric constants; they do not combine frequency domain characteristics with polarization priors to achieve targeted filtering; the network architecture struggles to balance global dependencies and local details; the loss function lacks multi-dimensional constraints; and the defogging effect is poor in non-uniform high-concentration fog scenarios.
Summary of the Invention
[0009] The purpose of this invention is to solve the problems existing in the prior art and to propose an image dehazing method based on frequency-aware polarization filtering and state-space model. It combines Mamba's efficient global feature modeling capability with frequency-aware polarization filtering based on the physical prior of dielectric constant, and designs a multi-domain loss function that integrates spatial domain, frequency domain, and polarization physical consistency to achieve image dehazing with physical credibility and high visual quality.
[0010] To achieve the above objectives, the technical solution provided by this invention is as follows:
[0011] An image dehazing method based on frequency-aware polarization filtering and a state-space model includes:
[0012] Obtain the foggy image and its corresponding transmission map and semantic segmentation map;
[0013] The foggy image is input into a pre-built Mamba encoder, and high-level encoded features are obtained through three-level downsampling and two-dimensional state space operations.
[0014] Based on the foggy image, the transmission map, and the semantic segmentation map, polarization images at three angles are obtained. The polarization images are then decoupled into low-frequency polarization features and high-frequency polarization features.
[0015] The high-level encoded features are mapped from the spatial domain to the frequency domain to obtain complex frequency domain features;
[0016] Low-frequency guiding weights and high-frequency guiding weights are generated based on low-frequency polarization features and high-frequency polarization features, respectively. Frequency domain modulation weights are obtained by combining them with a real-time constructed binary frequency mask. The frequency domain modulation weights are then applied to the complex frequency domain features to perform polarization-guided modulation in the frequency domain, resulting in modulated complex frequency domain features.
[0017] The modulated complex frequency domain features are mapped from the frequency domain to the spatial domain to obtain the filtered spatial domain features.
[0018] Based on the filtered spatial domain features and high-level coding features, the third-level upsampled features are obtained through three-level upsampling.
[0019] The third-level upsampling features are input into the pre-built image reconstruction module, and the dehazed image is output.
[0020] Furthermore, the foggy image is input into a pre-constructed Mamba encoder, and high-level coding features are obtained through three-level downsampling and two-dimensional state-space operations, including:
[0021] The Mamba encoder adopts a three-level downsampling structure. Each downsampling unit consists of a convolutional local feature extraction layer, a two-dimensional state space global feature modeling layer, a residual connection layer, and a stride convolutional downsampling layer. The convolutional local feature extraction layer includes a convolutional layer, a GELU activation function, and a convolutional layer in sequence.
[0022] In each residual connection layer, the global features produced by the two-dimensional state space global feature modeling layer of the current level are residually connected with the input features of the current-level downsampling unit. Each downsampling unit outputs the downsampled features of its level.
[0023] The downsampled features output by the third-level downsampling unit are sequentially passed through a convolutional local feature extraction layer, a two-dimensional state space global feature modeling layer, and a residual connection layer to obtain high-level encoded features.
[0024] Furthermore, obtaining polarization images at three angles based on the foggy image, the transmission map, and the semantic segmentation map includes:
[0025] The atmospheric light polarization degree is obtained from the foggy image;
[0026] The transmitted light intensity and the atmospheric light intensity are obtained based on the transmission map;
[0027] The transmitted light polarization degree is obtained based on the semantic segmentation map;
[0028] Based on the atmospheric light polarization degree, transmitted light intensity, atmospheric light intensity, and transmitted light polarization degree, polarization images at three angles (0 degrees, 45 degrees, and 90 degrees) are calculated using the atmospheric scattering model formulas.
[0029] Furthermore, the low-frequency guiding weight and high-frequency guiding weight are generated based on the low-frequency polarization features and high-frequency polarization features, respectively. These are then combined with a real-time constructed binary frequency mask to obtain frequency domain modulation weights. These frequency domain modulation weights are applied to the complex frequency domain features to perform polarization-guided modulation in the frequency domain, resulting in modulated complex frequency domain features, including:
[0030] The dimensions of the low-frequency polarization features and the high-frequency polarization features are matched with the complex frequency domain features. The matched low-frequency polarization features and the high-frequency polarization features are then subjected to nonlinear transformations to generate low-frequency and high-frequency guiding weights for frequency domain modulation.
[0031] A binary frequency mask is constructed based on the coordinates of the complex frequency domain features, expressed by the formula:
[0032]
[0033] where the formula uses an indicator function applied to the current coordinates of the complex frequency domain features, consisting of a horizontal frequency coordinate and a vertical frequency coordinate, together with the height and width of the complex frequency domain features. M(·) denotes the binary frequency mask value at a given coordinate in the two-dimensional frequency domain space, and the binary frequency mask values at all coordinates of the complex frequency domain features constitute the binary frequency mask;
[0034] The binary frequency mask is multiplied and fused element-wise with the low-frequency and high-frequency guiding weights to obtain the fused frequency domain modulation weights.
[0035] The frequency domain modulation weights are applied to the complex frequency domain features to perform polarization-guided modulation in the frequency domain, resulting in the modulated complex frequency domain features.
[0036] Furthermore, the third-level upsampled features, obtained by three levels of upsampling based on the filtered spatial domain features and high-level coding features, include:
[0037] The filtered spatial domain features and the high-level encoded features are residually connected to obtain the bottleneck features.
[0038] The bottleneck features are input into the pre-built Mamba decoder. The Mamba decoder adopts a three-level upsampling structure. Each upsampling unit includes a stride convolutional upsampling layer, a convolutional local feature extraction layer, a two-dimensional state space global feature modeling layer, and a residual connection layer. The convolutional local feature extraction layer includes a convolutional layer, a GELU activation function, and another convolutional layer in sequence.
[0039] The third-level downsampling feature is fused with the output of the first-level stride convolutional upsampling layer. The fused feature is then subjected to convolutional local feature extraction and global feature modeling in sequence. The result is then residually connected with the fused feature to obtain the first-level upsampling feature.
[0040] The first-level upsampled features are input into the second-level stride convolutional upsampled layer, and the second-level upsampled features are obtained based on the second-level downsampled features and the output of the second-level stride convolutional upsampled layer.
[0041] The second-level upsampled features are input into the third-level stride convolutional upsampled layer, and the third-level upsampled features are obtained based on the first-level downsampled features and the output of the third-level stride convolutional upsampled layer.
[0042] Furthermore, the pre-built image reconstruction module sequentially includes a convolutional layer, a GELU activation function, and another convolutional layer.
[0043] Furthermore, the image dehazing method based on frequency-aware polarization filtering and state-space model is trained and optimized using a multi-domain loss function, which includes spatial domain loss, frequency domain decomposition loss, and polarization consistency loss.
[0044] Furthermore, the frequency domain decomposition loss includes high-frequency loss and low-frequency loss, with the high-frequency loss expressed by the formula:
[0045]
[0046] where the terms denote, in order: the high-frequency loss; the two-dimensional fast Fourier transform; the modulus of a complex frequency domain feature; the natural logarithm operation; the total number of pixels in a single image; the value of each feature point of the frequency domain features obtained by applying the fast Fourier transform to the dehazed image; and the value of the corresponding feature point of the frequency domain features of the clear reference image;
[0047] The low-frequency loss can be expressed by the formula:
[0048]
[0049] where the terms denote, in order: the low-frequency loss; the phase of a complex frequency domain feature; the low-frequency region at the center of the frequency domain; the number of pixels in the low-frequency region; the value of each feature point of the dehazed image within the low-frequency region; and the value of the corresponding feature point of the clear reference image within the low-frequency region.
[0050] Furthermore, the polarization consistency loss is expressed by the formula:
[0051]
[0052] where the terms denote, in order: the polarization consistency loss; the reconstructed polarization degree of each pixel of the dehazed image; and the observed polarization degree of the corresponding pixel of the clear reference image.
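The three loss families described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the formulas of the invention, which are not reproduced in the text: the use of mean absolute differences, the stabilizing constant `eps`, the size of the central low-frequency window (`radius`), the term weights, and all function names are hypothetical choices.

```python
import numpy as np

def high_frequency_loss(dehazed, reference, eps=1e-8):
    """Mean absolute difference between log-amplitude spectra (L1 form assumed)."""
    amp_d = np.abs(np.fft.fft2(dehazed))
    amp_r = np.abs(np.fft.fft2(reference))
    return float(np.mean(np.abs(np.log(amp_d + eps) - np.log(amp_r + eps))))

def low_frequency_loss(dehazed, reference, radius=2):
    """Mean absolute phase difference inside a central low-frequency window."""
    spec_d = np.fft.fftshift(np.fft.fft2(dehazed))
    spec_r = np.fft.fftshift(np.fft.fft2(reference))
    cy, cx = dehazed.shape[0] // 2, dehazed.shape[1] // 2
    win = (slice(cy - radius, cy + radius + 1),
           slice(cx - radius, cx + radius + 1))
    # angle(a * conj(b)) is the phase difference wrapped into (-pi, pi].
    phase_diff = np.angle(spec_d[win] * np.conj(spec_r[win]))
    return float(np.mean(np.abs(phase_diff)))

def polarization_consistency_loss(dop_reconstructed, dop_observed):
    """Mean absolute difference between reconstructed and observed polarization degree."""
    return float(np.mean(np.abs(dop_reconstructed - dop_observed)))

def multi_domain_loss(dehazed, reference, dop_rec, dop_obs,
                      weights=(1.0, 0.1, 0.1, 0.1)):
    """Weighted sum of spatial, high-frequency, low-frequency and polarization terms."""
    terms = (
        float(np.mean(np.abs(dehazed - reference))),   # spatial domain term
        high_frequency_loss(dehazed, reference),
        low_frequency_loss(dehazed, reference),
        polarization_consistency_loss(dop_rec, dop_obs),
    )
    return sum(w * t for w, t in zip(weights, terms))
```

The inputs are assumed to be single-channel 2-D arrays. Every term vanishes when the dehazed image equals the reference, which is the minimal property any concrete choice of these losses should share.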
[0053] Compared with existing technologies, this invention has the following significant advantages:
First, it explores and utilizes the natural polarization frequency decoupling caused by the difference in dielectric constant between atmospheric aerosols and the target object, and deeply integrates the polarization physics prior based on the dielectric constant into frequency domain feature processing. Through the multi-domain loss function, it also transforms the polarization imaging equation into a differentiable optimization constraint. The dehazing result therefore conforms to the physical laws of polarization and the dielectric constant, and the reconstructed polarization degree remains consistent with the observed polarization degree, addressing the poor interpretability and insufficient physical reliability of existing data-driven methods.
Second, the SS2D-Mamba encoding and decoding architecture balances global long-range dependencies and local details. The frequency-aware polarization filtering module protects the high-frequency edges and texture details determined by the target dielectric constant while removing fog. The multi-domain loss function further constrains frequency domain amplitude and phase, ensuring the consistency of image structure and texture. After dehazing, the image is free of fog residue and artifacts and has natural colors, and its edge details and structural integrity are significantly better than those of existing methods.
Third, the polarization frequency decoupling characteristic based on the dielectric constant has universal physical significance, and the multi-dimensional constraints of the multi-domain loss function allow the network to adapt to different fog distribution characteristics. The method therefore achieves excellent dehazing effects in various scenarios such as uniform fog, non-uniform fog, and high-concentration fog. It performs particularly well in complex foggy scenarios in practical applications such as autonomous driving and drone monitoring, solving the problem of poor performance of existing methods in non-uniform high-concentration fog.
Fourth, the computational complexity and memory usage of the network are lower than those of the Transformer architecture, and the frequency domain processing is a linear operation without complex multi-stage optimization. The method meets the real-time requirements of practical scenarios while ensuring the dehazing effect.
Attached Figure Description
[0054] Figure 1 is a flowchart illustrating the overall architecture of the image dehazing method based on a state-space model and frequency-aware polarization filtering according to the present invention;
[0055] Figure 2 is a detailed structural diagram of the SS2D-Mamba encoding and decoding architecture of the present invention;
[0056] Figure 3 is a schematic diagram illustrating the specific processing flow of the frequency-aware polarization filtering module of the present invention.
Detailed Implementation
[0057] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0058] As shown in Figures 1-3, this invention provides an image dehazing method based on frequency-aware polarization filtering and a state-space model. The specific steps are as follows:
[0059] Step 1: The SS2D-Mamba encoding / decoding architecture consists of an SS2D-based Mamba encoder, a bottleneck layer, and a Mamba decoder. The Mamba encoder employs a three-level downsampling structure. Each downsampling unit comprises a Mamba module and a stride convolutional downsampling layer. The Mamba module consists of a convolutional local feature extraction layer, an SS2D global feature modeling layer, and a residual connection layer. Simultaneously, a skip connection feature storage module stores the downsampling features output from each downsampling unit for feature fusion in the Mamba decoder.
[0060] The convolutional local feature extraction layer applies a convolutional layer to the input feature map to extract local features (for the first-level downsampling unit, the input feature map is the foggy image to be processed). A GELU activation function follows the convolutional layer to apply a non-linear transformation to the features. The calculation formula is as follows:
[0061]
[0062] where the terms denote, in order: the input features of the current level; the convolution operation; the convolution bias; the Gaussian error linear unit (GELU) activation function; and the local features extracted at the current level.
[0063] The core of the SS2D global feature modeling layer is the state space model (SSM). This embodiment uses a two-dimensional state space model (SS2D) to perform two-dimensional global feature modeling of the image, addressing the inability of traditional one-dimensional state space models to adapt to the two-dimensional structure of images. First, the local features extracted by convolution are dimensionally permuted from the standard layout to the layout required by SS2D (the dimensions being batch size, number of channels, height, and width). Then, the SS2D operation selectively scans the height and width dimensions of the features to capture global long-range dependencies, calculated as follows:
[0064]
[0065] where the terms denote, in order: the local features after dimension permutation; the state-space parameters of SS2D, which are learned by the network; and the global features after modeling (that is, the global features of the current downsampling level).
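The idea of scanning a two-dimensional feature map with a one-dimensional state-space recurrence can be illustrated with a deliberately simplified NumPy sketch. It is not the SS2D layer of the invention: the recurrence uses fixed scalar parameters `a`, `b`, `c` (a real selective scan uses learned, input-dependent parameters), and the four scan orders and their plain averaging are assumptions made for illustration.

```python
import numpy as np

def scan_1d(x, a, b, c):
    """Linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t along the last axis."""
    h = np.zeros(x.shape[:-1])
    y = np.empty_like(x)
    for t in range(x.shape[-1]):
        h = a * h + b * x[..., t]
        y[..., t] = c * h
    return y

def ss2d_sketch(feat, a=0.9, b=1.0, c=1.0):
    """Average four scans (row-major and column-major, forward and backward).

    feat has shape (channels, height, width); the output has the same shape.
    """
    feat = np.asarray(feat, dtype=float)
    ch, h, w = feat.shape
    rows = feat.reshape(ch, h * w)                      # row-major pixel sequence
    cols = feat.transpose(0, 2, 1).reshape(ch, h * w)   # column-major pixel sequence
    outputs = []
    for seq, is_col in ((rows, False), (cols, True)):
        for reverse in (False, True):
            s = seq[:, ::-1] if reverse else seq
            y = scan_1d(s, a, b, c)
            y = y[:, ::-1] if reverse else y
            if is_col:
                y = y.reshape(ch, w, h).transpose(0, 2, 1)
            else:
                y = y.reshape(ch, h, w)
            outputs.append(y)
    return sum(outputs) / 4.0
```

Because each scan carries a hidden state across the whole flattened sequence, a single non-zero input pixel influences every output position, which is the global-dependency behavior the paragraph describes, at a cost linear in the number of pixels.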
[0066] In the residual connection layer and stride convolutional downsampling layer, the global features modeled by SS2D are residually connected with the input features of the current level to alleviate the vanishing gradient problem during network training. A stride-2 convolution then performs downsampling, halving the feature resolution and doubling the number of channels. The calculation formula is as follows:
[0067] $$F_{down} = \mathrm{Conv}_{s=2}\left(F_{global} + F_{in}\right)$$
[0068] where $F_{in}$ denotes the input features of the current level, $F_{global}$ the global features modeled by SS2D, $\mathrm{Conv}_{s=2}$ the stride-2 convolution operation, and $F_{down}$ the downsampled features output by the current stride convolutional downsampling layer. $F_{down}$ is simultaneously stored in the skip connection feature storage module as the skip connection features.
[0069] The downsampled features, after three levels of downsampling, are then processed by a Mamba module to obtain high-level coded features.
[0070] In this embodiment, the encoder input is a foggy image. The feature resolution is halved at each level: the features have 64 channels after the first level of downsampling, 128 channels after the second level, and 256 channels after the third level. The final high-level encoded features are passed to the frequency-aware polarization filtering module in the bottleneck layer.
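The shape arithmetic of the three-level encoder can be checked with a few lines of Python. The channel counts (64, 128, 256) come from the embodiment; the assumption that each level exactly halves height and width follows from the stride-2 downsampling, and the function name and the 256×256 example input are hypothetical.

```python
def encoder_shapes(height, width, first_channels=64, levels=3):
    """Return (level, channels, height, width) after each downsampling level."""
    shapes = []
    h, w, c = height, width, first_channels
    for level in range(1, levels + 1):
        h, w = h // 2, w // 2          # stride-2 convolution halves the resolution
        shapes.append((level, c, h, w))
        c *= 2                         # channel count doubles for the next level
    return shapes

for level, c, h, w in encoder_shapes(256, 256):
    print(f"level {level}: {c} channels, {h}x{w}")
```

For a 256×256 input this gives 64 channels at 128×128, 128 channels at 64×64, and 256 channels at 32×32, so the frequency-domain filtering in the bottleneck operates on a feature map with 64 times fewer spatial positions than the input image.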
[0071] Step 2: Polarization images are synthesized algorithmically, without requiring specialized hardware. First, the transmission map and semantic segmentation map corresponding to the foggy image are obtained (both are predicted from the foggy image). Based on statistical regularities and physical characteristics, an atmospheric light polarization degree that conforms to the distribution of real environments is randomly generated. The transmitted light intensity and the atmospheric light intensity are obtained based on the transmission map. Next, using the semantic segmentation map as guidance, regions of different materials in the image are assigned corresponding transmitted light polarization degrees (a specific transmitted light polarization value is assigned to each semantic category, typically sampled within the range [0.025, 0.2]). Finally, combining the traditional atmospheric scattering model with Malus's law in optics, these four physical parameters are fused into a total polarization degree, and polarization images at three different angles (0 degrees, 45 degrees, and 90 degrees) are calculated based on the atmospheric scattering model formula.
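A minimal NumPy sketch of this synthesis step is given below. It is an illustration under explicit assumptions rather than the formulas of the invention: the global airlight value, the fixed atmospheric polarization degree (the text samples it randomly), the intensity-weighted fusion of the two polarization degrees, and a shared polarization angle of 0 degrees are all hypothetical choices, as are the function and parameter names.

```python
import numpy as np

def synthesize_polarization_images(hazy, transmission, labels, class_dop,
                                   airlight=1.0, airlight_dop=0.3):
    """Return ([I_0, I_45, I_90], total_dop) for a single-channel hazy image."""
    atmos = airlight * (1.0 - transmission)        # atmospheric light intensity
    direct = np.clip(hazy - atmos, 0.0, None)      # transmitted light intensity
    direct_dop = class_dop[labels]                 # per-pixel value from semantic class
    total = direct + atmos
    # Assumed fusion: intensity-weighted mean of the two polarization degrees.
    total_dop = (direct_dop * direct + airlight_dop * atmos) / np.maximum(total, 1e-8)
    images = []
    for degrees in (0.0, 45.0, 90.0):
        theta = np.deg2rad(degrees)
        # Malus's law for partially polarized light with polarization angle 0.
        images.append(0.5 * total * (1.0 + total_dop * np.cos(2.0 * theta)))
    return images, total_dop
```

A useful sanity check is that the three images reproduce the fused polarization degree through the Stokes parameters: with S0 = I_0 + I_90, S1 = I_0 - I_90 and S2 = 2·I_45 - S0, the ratio sqrt(S1² + S2²) / S0 equals `total_dop`.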
[0072] The polarization frequency decomposition module performs channel merging and dimensionality-reducing compression on the three polarization images generated at different angles to extract a unified basic representation of polarization features. The compressed features are then high-pass and low-pass filtered to extract high-frequency and low-frequency features, respectively. Adaptive weights generated by a gating network dynamically modulate the high-frequency and low-frequency features, and the module finally outputs the separated high-frequency polarization features (corresponding to image edges and texture) and low-frequency polarization features (corresponding to image structure and uniform regions). Specifically, the gating network consists of a 3×3 convolutional layer, a GELU activation function, and a Sigmoid activation function. The convolutional layer extracts structural information from the polarization features, GELU enhances nonlinear expressive power, and the final Sigmoid function normalizes the output value. The generated adaptive weights represent the "retention rate" of the corresponding frequency region. For example, in a heavily foggy region the low-frequency noise component is large, so the low-frequency weights generated by the gating network for this region will be very small. When the low-frequency weights are multiplied by the original features, the low-frequency component is adaptively reduced, guiding the model to filter out low-frequency interference more effectively.
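The decomposition and gating described in this paragraph can be sketched in NumPy as follows. The sketch makes several simplifying assumptions that are not taken from the source: channel merging and compression are replaced by a plain mean over the three angle images, the low-pass filter is a 3×3 box blur with the high-pass part defined as the residual, GELU uses its tanh approximation, and the gate kernels are passed in by the caller instead of being learned.

```python
import numpy as np

def conv3x3(x, kernel):
    """Same-size 3x3 correlation of a 2-D array, using reflect padding."""
    padded = np.pad(x, 1, mode="reflect")
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decompose_polarization(polar_stack, gate_kernel_high, gate_kernel_low):
    """polar_stack: (3, H, W) images at three angles -> (gated high, gated low)."""
    base = polar_stack.mean(axis=0)              # stand-in for merge + compression
    low = conv3x3(base, np.full((3, 3), 1.0 / 9.0))   # low-pass (box blur)
    high = base - low                                 # high-pass residual
    gate_high = sigmoid(gelu(conv3x3(base, gate_kernel_high)))  # retention rates
    gate_low = sigmoid(gelu(conv3x3(base, gate_kernel_low)))
    return gate_high * high, gate_low * low
```

Since the Sigmoid output lies strictly between 0 and 1, each gate can only attenuate its frequency band, which matches the "retention rate" interpretation in the text.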
[0073] All operations are designed to be differentiable, ensuring that gradients can propagate back through the entire generation process. Random sampling during parameter generation is made differentiable through reparameterization techniques, allowing modules to be seamlessly integrated into the training framework. By transforming the complex polarization physics into a learnable parameterized model, physical rationality is maintained, effectively overcoming the dependence of traditional polarization dehazing methods on dedicated hardware, and providing a practical solution for image polarization dehazing.
[0074] Step 3: The frequency-aware polarization filtering module is set in the bottleneck layer of the network to process the high-level coding features output by the bottleneck layer Mamba module. While ensuring computational efficiency, it uses polarization physical priors and frequency domain characteristics to achieve accurate separation of fog and scene content.
[0075] The inputs to the frequency-aware polarization filtering module are the high-level encoded features output by the bottleneck layer Mamba module and the polarization features extracted by the polarization frequency decomposition module, namely the high-frequency polarization features (corresponding to image edges and texture) and the low-frequency polarization features (corresponding to image structure and uniform regions).
[0076] The high-level encoded features output by the bottleneck layer Mamba module, denoted $F_{enc}$, are mapped to the frequency domain by a two-dimensional fast Fourier transform (FFT) to obtain the complex frequency domain features $F_{freq}$. The calculation formula is:
[0077] $$F_{freq} = \mathcal{F}\left(F_{enc}\right) = F_{re} + j\,F_{im}$$
[0078] where $\mathcal{F}(\cdot)$ is the two-dimensional fast Fourier transform operation, $F_{re}$ is the real part of the complex frequency domain features, $F_{im}$ is the imaginary part of the complex frequency domain features, and $j$ is the imaginary unit.
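As a concrete illustration of this mapping, the following NumPy sketch transforms a feature tensor over its two spatial axes and exposes the real and imaginary parts. One detail is an assumption rather than something stated in the text: `np.fft.fft2` places the zero-frequency term at index `[0, 0]`, so the sketch applies `np.fft.fftshift` to move it to the center, which is the layout implied when the central region is treated as low frequency. The function name is hypothetical.

```python
import numpy as np

def to_frequency_domain(features):
    """2-D FFT over the last two axes, with the zero frequency shifted to the center."""
    axes = (-2, -1)
    spectrum = np.fft.fft2(features, axes=axes)
    return np.fft.fftshift(spectrum, axes=axes)

features = np.ones((2, 4, 6))              # (channels, height, width)
spectrum = to_frequency_domain(features)
real_part, imag_part = spectrum.real, spectrum.imag
```

For the constant input used here, all of the energy sits in the zero-frequency bin, which after the shift is located at row `height // 2` and column `width // 2` of each channel.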
[0079] Channel adaptation and spatial alignment are performed on the high-frequency polarization features $P_h$ and the low-frequency polarization features $P_l$; that is, an adaptation layer matches the dimensions of the polarization features exactly to those of the complex frequency domain features. The adaptation layer consists of a convolution and batch normalization (BN), and the calculation formula is as follows:
[0080] $$\hat{P}_h = \mathrm{Adapt}_h\left(P_h\right), \qquad \hat{P}_l = \mathrm{Adapt}_l\left(P_l\right)$$
[0081] where $\mathrm{Adapt}_h$ and $\mathrm{Adapt}_l$ are the adaptation layers for the high-frequency and low-frequency polarization features, respectively, and $\hat{P}_h$ and $\hat{P}_l$ are the adapted high-frequency and low-frequency polarization features, both with the same dimensions as the complex frequency domain features.
[0082] The adapted high-frequency and low-frequency polarization features are nonlinearly transformed by guiding networks to generate the high-frequency and low-frequency guiding weights for frequency domain modulation. Each guiding network is composed of a convolution, a GELU activation function, and a Sigmoid activation function, which normalizes the weight values to the unit interval so that the complex frequency domain features can be modulated adaptively. The calculation formula is as follows:
[0083] $$W_h = G_h\left(\hat{P}_h\right), \qquad W_l = G_l\left(\hat{P}_l\right)$$
[0084] where $\hat{P}_h$ and $\hat{P}_l$ are the adapted high-frequency and low-frequency polarization features, $G_h$ is the high-frequency guiding network, $G_l$ is the low-frequency guiding network, $W_h$ is the high-frequency guiding weight, and $W_l$ is the low-frequency guiding weight. Both weights have the same dimensions as the complex frequency domain features, and their values lie in the unit interval.
[0085] A binary frequency mask $M$ is constructed to distinguish the high-frequency and low-frequency components of the complex frequency domain features, enabling adaptive fusion of the high-frequency and low-frequency guiding weights. The mask partitions the spatial frequencies of the complex frequency domain features, defining the central region as the low-frequency component and the edge region as the high-frequency component. The calculation formula is as follows:
[0086]
[0087] where the current coordinates of the complex frequency domain features consist of a horizontal frequency coordinate and a vertical frequency coordinate, and $M(\cdot)$ denotes the binary frequency mask value at that coordinate in the two-dimensional frequency domain space. The mask values at all coordinates of the complex frequency domain features constitute the binary frequency mask, a matrix containing only the values 0 and 1 that automatically separates high-frequency from low-frequency components and provides the spatial frequency partition on which the fusion of the guiding weights is based. The formula uses an indicator function: when the condition inside the parentheses is true, the mask value is 1 (high-frequency region); otherwise it is 0 (low-frequency region). The binary frequency mask has the same spatial dimensions as the complex frequency domain features.
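A binary mask of this kind can be built directly from coordinate grids, as in the NumPy sketch below. The exact threshold condition of the invention is not reproduced in the text, so the rule used here (a coordinate is high frequency when its distance from the center exceeds a fraction `low_ratio` of the height or of the width) is a hypothetical stand-in, as are the function and parameter names.

```python
import numpy as np

def binary_frequency_mask(height, width, low_ratio=0.25):
    """Return a (height, width) mask: 1 = high frequency (edges), 0 = low (center)."""
    row_dist = np.abs(np.arange(height) - height // 2)[:, None]
    col_dist = np.abs(np.arange(width) - width // 2)[None, :]
    # Indicator function: true where the coordinate lies outside the central block.
    is_high = (row_dist > low_ratio * height) | (col_dist > low_ratio * width)
    return is_high.astype(np.float64)

mask = binary_frequency_mask(8, 8)
```

The mask assumes a spectrum whose zero frequency has been shifted to the center. Because it depends only on the feature size, it can be rebuilt on the fly for any input resolution, in line with the "real-time constructed" wording used for the mask in the method summary.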
[0088] The binary frequency mask is multiplied element-wise with the high-frequency guiding weights, its inverse mask is multiplied element-wise with the low-frequency guiding weights, and the two products are fused to obtain the frequency-domain modulation weights. These weights are then applied to the complex frequency-domain features by complex multiplication, achieving polarization-guided modulation in the frequency domain. The high-frequency region is guided by the high-frequency guiding weights, which protect the edge and texture details of the scene; the low-frequency region is guided by the low-frequency guiding weights, which remove the low-frequency interference of fog. The formulas are as follows:
[0089]
[0090]
[0091] where the symbols denote, respectively, element-wise multiplication, the inverse mask (the complement of the binary frequency mask), and the modulated complex frequency-domain features.
[0092] The modulated complex frequency-domain features are mapped back to the spatial domain by a two-dimensional inverse fast Fourier transform (IFFT), yielding the filtered spatial-domain features. The calculation formula is:
[0093]
[0094] where the operator denotes the two-dimensional inverse fast Fourier transform.
[0095] The filtered spatial-domain features are residually connected with the high-level encoded features output by the bottleneck-layer Mamba module. This enhances the features and mitigates the loss of feature information caused by the frequency-domain transform, ultimately yielding the output features of the bottleneck layer (the bottleneck features). The calculation formula is:
[0096]
[0097] The bottleneck features serve as the input to the Mamba decoder and enter the subsequent feature upsampling and image reconstruction process.
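The modulation, inverse transform, and residual connection described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the embodiment itself: the function name, the use of `fftshift` to center the spectrum, and taking the real part after the inverse transform are choices made here; in the method the guiding weights come from the guiding networks, whereas constant arrays stand in for them below.

```python
import numpy as np

def frequency_polarization_filter(feat, w_high, w_low, mask):
    """feat: real (C, H, W) features; mask: (H, W) with 1 = high frequency."""
    # Spatial domain -> centered complex frequency-domain features.
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    # Fuse guiding weights: mask selects w_high, inverse mask selects w_low.
    weights = mask * w_high + (1.0 - mask) * w_low
    spec_mod = spec * weights  # real weights scale each complex coefficient
    # Frequency domain -> spatial domain (real part kept: an assumption).
    filtered = np.fft.ifft2(
        np.fft.ifftshift(spec_mod, axes=(-2, -1)), axes=(-2, -1)
    ).real
    return feat + filtered  # residual connection -> bottleneck features

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
mask = np.ones((8, 8))
mask[2:7, 2:7] = 0.0                    # central box = low frequency
ones = np.ones((4, 8, 8))
bottleneck = frequency_polarization_filter(feat, ones, ones, mask)
high_only = frequency_polarization_filter(feat, ones, np.zeros_like(ones), mask)
```

With unit weights the filter is an identity and the residual sum doubles the input; with zero low-frequency weights the filtered branch loses its mean (DC) component, which is the kind of low-frequency suppression the text attributes to fog removal.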
[0098] Step 4: The Mamba decoder adopts a three-level upsampling structure, which is symmetrical to the three-level downsampling structure of the encoder. Each upsampling unit consists of a stride convolutional upsampling layer, a skip feature fusion layer, a convolutional local feature extraction layer, an SS2D global feature modeling layer, and a residual connection layer. The core is to gradually restore the feature resolution and fuse the skip connection features of the encoder to preserve local details.
[0099] Upsampling is performed by applying a transposed convolution to the input feature map, which doubles the feature resolution and halves the number of channels. In this embodiment, a stride of 2 is used. The formula for the transposed convolution is:
[0100]
[0101] where the symbols denote, respectively: the input features of the current level (for the first level, the features output by the bottleneck layer, that is, by the Mamba module and the frequency-aware polarization filtering module); the transposed convolution operation with stride 2; and the output features of the current level's stride convolutional upsampling layer.
[0102] The downsampled features of the corresponding encoder level are retrieved from the skip-connection feature storage module. A convolution matches their number of channels to that of the output features of the current stride convolutional upsampling layer, and the two are then fused by element-wise addition. The calculation formula is as follows:
[0103]
[0104] where the symbols denote, respectively, the skip-connection features from the encoder, the channel-matching convolution operation, and the fused features after skip fusion at the current level.
[0105] The fused features then pass sequentially through convolutional local feature extraction and SS2D global feature modeling. The modeled global features are residually connected with the fused features to achieve feature enhancement. The calculation formula is as follows:
[0106]
[0107] where the result denotes the upsampled features of the current level.
[0108] Specifically, the third-level downsampling features are fused with the output of the first-level stride convolutional upsampling layer in a skip fusion manner. The fused features are then subjected to convolutional local feature extraction and global feature modeling in sequence. The results are then residually connected with the fused features to obtain the first-level upsampling features.
[0109] The first-level upsampled features are input into the second-level stride convolutional upsampling layer. The second-level downsampled features are then fused with the output of the second-level stride convolutional upsampling layer. The fused features are subjected to convolutional local feature extraction and global feature modeling in sequence, and the results are residually connected with the fused features to obtain the second-level upsampled features.
[0110] The second-level upsampled features are input into the third-level stride convolutional upsampling layer. The first-level downsampled features are then fused with the output of the third-level stride convolutional upsampling layer. The fused features are subjected to convolutional local feature extraction and global feature modeling in sequence, and the results are residually connected with the fused features to obtain the third-level upsampled features.
[0111] The decoder's input is the high-level features processed by the frequency-aware polarization filtering module. The feature resolution doubles at each level: after the first-level upsampling the features have 128 channels, after the second level 64 channels, and after the third level 32 channels. The final output, the third-level upsampled features, is then input into the image reconstruction module.
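The channel and resolution progression through the three upsampling levels can be traced with a short sketch. The bottleneck size of 256 channels at 32 x 32 is a hypothetical example chosen so that the channel counts match the 128, 64, and 32 stated above; the source does not give the bottleneck resolution.

```python
def decoder_shapes(channels, height, width, levels=3):
    """Trace (channels, height, width) after each upsampling level.

    Each level halves the channels and doubles the resolution, as a
    stride-2 transposed convolution does in the described decoder.
    """
    shapes = []
    for _ in range(levels):
        channels, height, width = channels // 2, height * 2, width * 2
        shapes.append((channels, height, width))
    return shapes

# Hypothetical bottleneck: 256 channels at 32 x 32.
shapes = decoder_shapes(256, 32, 32)
```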
[0112] Step 5: The image reconstruction module uses two convolutional layers. The first convolution leaves the number of channels unchanged and is followed by a GELU function, which applies a nonlinear transformation to the features. The second convolution maps the 32-channel features output by the decoder to 3-channel RGB features, outputting a dehazed image with the same resolution as the input hazy image. The calculation formula is as follows:
[0113]
[0114] where the symbols denote, respectively, the third-level upsampled features of the decoder, the convolution operation, the Gaussian error linear unit (GELU) activation function, and the dehazed image.
[0115] Step 6: To achieve unified optimization of physical constraints and visual quality, this invention designs a multi-domain loss function that integrates spatial-domain, frequency-domain, and polarization physical consistency. This multi-dimensional constraint guides network learning, ensuring pixel accuracy, structural integrity, and physical reliability of the dehazed image. The multi-domain loss function consists of three parts: the spatial-domain loss, the frequency-domain decomposition loss (comprising a high-frequency loss and a low-frequency loss), and the polarization consistency loss. The total loss is a weighted fusion of these parts. All losses are computed over the image pixels within a batch of a given batch size.
[0116] The spatial-domain loss imposes a pixel-level fidelity constraint: an L1 loss is computed between the dehazed image and the corresponding clear reference image in the training set. The L1 loss reduces pixel-level errors, is more robust to outliers, and helps produce sharp image edges. The calculation formula is:
[0117]
[0118] where the symbols denote, respectively, the total number of pixels in a single image, the value of a given pixel of the dehazed image, the value of the corresponding pixel of the clear reference image, and the absolute-value operation.
[0119] The frequency-domain decomposition loss, tailored to the frequency-domain characteristics of fog, decouples the image restoration problem into two sub-tasks: high-frequency detail reconstruction and low-frequency structure preservation. Differentiated constraints are applied in the frequency domain using the fast Fourier transform.
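A minimal NumPy sketch of the spatial-domain loss follows. It assumes the loss is the mean absolute difference over every pixel of every image in the batch; the function name and the array layout are illustrative choices, not taken from the source.

```python
import numpy as np

def spatial_l1_loss(dehazed, reference):
    """Mean absolute error over all pixels of all images in the batch."""
    return float(np.mean(np.abs(dehazed - reference)))

# Example batch of two 3-channel 4 x 4 images (assumed layout: B, C, H, W).
dehazed = np.zeros((2, 3, 4, 4))
reference = np.full((2, 3, 4, 4), 0.25)
loss = spatial_l1_loss(dehazed, reference)
```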
[0120] High-frequency loss focuses on restoring high-frequency details such as image edges and textures. It employs logarithmic amplitude spectrum constraints to enhance the sensitivity of the loss function to high-frequency amplitude changes, aligning with the human visual system's perception of relative changes. The calculation formula is as follows:
[0121]
[0122] where the symbols denote, respectively: the two-dimensional fast Fourier transform; the modulus (amplitude spectrum) of the complex frequency-domain features; the natural logarithm operation; the value at a given point of the frequency-domain features obtained by applying the fast Fourier transform to the dehazed image; and the value at the corresponding point of the frequency-domain features of the clear reference image.
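The log-amplitude constraint can be sketched as below. Two details are assumptions made for this illustration because the formula itself is not recoverable from the text: the distance is taken as an absolute (L1) difference, and `log1p` (the logarithm of one plus the amplitude) is used so that zero amplitudes do not produce an infinite value.

```python
import numpy as np

def high_frequency_loss(dehazed, reference):
    """Mean absolute difference of log-amplitude spectra (assumed L1 form)."""
    amp_d = np.abs(np.fft.fft2(dehazed, axes=(-2, -1)))
    amp_r = np.abs(np.fft.fft2(reference, axes=(-2, -1)))
    # log1p keeps the logarithm finite where the amplitude is zero.
    return float(np.mean(np.abs(np.log1p(amp_d) - np.log1p(amp_r))))

rng = np.random.default_rng(0)
sharp = rng.random((1, 16, 16))
blurred = 0.5 * (sharp + np.roll(sharp, 1, axis=-1))  # crude smoothing
hf_same = high_frequency_loss(sharp, sharp)
hf_diff = high_frequency_loss(blurred, sharp)
```

Because smoothing attenuates the amplitude of high-frequency coefficients, the blurred image yields a strictly positive loss against the sharp one.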
[0123] Low-frequency loss focuses on preserving the global structure and uniform regions of the image. Phase constraints are used, limiting only the low-frequency region at the center of the frequency domain to avoid structural distortion. The calculation formula is as follows:
[0124]
[0125] where the symbols denote, respectively: the phase of the complex frequency-domain features; the low-frequency region at the center of the frequency domain (1/4 of the image size in this embodiment); the number of pixels in the low-frequency region; the value at each point of the low-frequency region for the dehazed image; and the value at each point of the low-frequency region for the clear reference image.
[0126] The polarization consistency loss converts the dielectric-constant-based polarization imaging equation into a differentiable MSE objective. The core logic is: if the predicted dehazed image conforms to physical laws, then substituting it back into the polarization imaging equation must yield a reconstructed degree of polarization that matches the observed degree of polarization. The core polarization imaging equation is:
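A hedged NumPy sketch of the phase constraint follows. The interpretation of "1/4 of the image size" as a central window whose height and width are each one quarter of the spectrum, the absolute-difference form, and the `fraction` parameter are assumptions made here. The sketch also compares raw phase angles and does not handle wrap-around at plus or minus pi.

```python
import numpy as np

def low_frequency_phase_loss(dehazed, reference, fraction=0.25):
    """Mean absolute phase difference inside a central low-frequency window."""
    h, w = dehazed.shape[-2:]
    hh, ww = max(1, int(h * fraction)), max(1, int(w * fraction))
    top, left = h // 2 - hh // 2, w // 2 - ww // 2

    def center_phase(img):
        # Center the spectrum so the low frequencies sit in the middle.
        spec = np.fft.fftshift(np.fft.fft2(img, axes=(-2, -1)), axes=(-2, -1))
        return np.angle(spec[..., top:top + hh, left:left + ww])

    return float(np.mean(np.abs(center_phase(dehazed) - center_phase(reference))))

rng = np.random.default_rng(0)
img = rng.random((1, 16, 16))
lf_same = low_frequency_phase_loss(img, img)
lf_scaled = low_frequency_phase_loss(2.0 * img, img)           # brightness change only
lf_shifted = low_frequency_phase_loss(np.roll(img, 1, -1), img)  # structure moved
```

A pure brightness scaling leaves the phase untouched, whereas shifting the image content changes it, which illustrates why a phase constraint targets structure rather than intensity.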
[0127]
[0128] where the quantities are, respectively: the degree of polarization; the transmitted light component, obtained by element-wise multiplication of the dehazed image and the transmission map; the atmospheric light component, obtained by element-wise multiplication of the global atmospheric light intensity and a second term; the total intensity of the observed foggy image; the target polarization degree map (generated based on semantic features and determined by the target dielectric constant); the atmospheric degree of polarization (a global parameter determined by the atmospheric dielectric constant); and a numerical-stability constant that prevents the denominator from being 0.
[0129] Based on the dehazed image, the degree of polarization is reconstructed according to the polarization imaging equation. The calculation formula is:
[0130]
[0131] where the two global parameters must first be expanded to per-pixel dimensions before the calculation is performed.
[0132] An MSE loss constrains the consistency between the reconstructed degree of polarization and the observed degree of polarization. The calculation formula is:
[0133]
[0134] where the symbols denote, respectively, the reconstructed degree of polarization at a given pixel of the dehazed image and the observed degree of polarization at the corresponding pixel of the clear reference image.
[0135] The physical meaning of this loss is that if the predicted dehazed image is correct, its polarization characteristics must obey the physical law of polarization imaging dominated by the dielectric constant, and the reconstructed degree of polarization must match the actually observed degree of polarization.
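The polarization consistency check can be illustrated with the NumPy sketch below. The exact equation is not recoverable from the text, so the form used here is an assumption based on the listed quantities: the degree of polarization is taken as the polarization-weighted sum of the transmitted and atmospheric components divided by the observed total intensity plus a stability constant, and the atmospheric component is taken as the global atmospheric light times one minus the transmission. All function and parameter names are hypothetical.

```python
import numpy as np

def reconstruct_dop(dehazed, trans, a_inf, p_target, p_atm, hazy, eps=1e-6):
    """Reconstructed degree of polarization (assumed form of the equation)."""
    transmitted = dehazed * trans          # transmitted light component
    airlight = a_inf * (1.0 - trans)       # atmospheric light (assumed form)
    # Scalars a_inf and p_atm broadcast to per-pixel size automatically.
    return (p_target * transmitted + p_atm * airlight) / (hazy + eps)

def polarization_consistency_loss(p_rec, p_obs):
    """MSE between reconstructed and observed degree of polarization."""
    return float(np.mean((p_rec - p_obs) ** 2))

rng = np.random.default_rng(0)
clear = rng.uniform(0.1, 1.0, (8, 8))
trans = rng.uniform(0.2, 0.9, (8, 8))
p_target = rng.uniform(0.0, 0.5, (8, 8))
a_inf, p_atm = 0.8, 0.3
hazy = clear * trans + a_inf * (1.0 - trans)
p_obs = reconstruct_dop(clear, trans, a_inf, p_target, p_atm, hazy)

good = polarization_consistency_loss(
    reconstruct_dop(clear, trans, a_inf, p_target, p_atm, hazy), p_obs)
bad = polarization_consistency_loss(
    reconstruct_dop(clear + 0.1, trans, a_inf, p_target, p_atm, hazy), p_obs)
```

In this synthetic setup a correct dehazed image reproduces the observed degree of polarization exactly, while a biased prediction is penalized.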
[0136] The total loss function is a weighted fusion of the losses from each component. The weight of each loss is set through empirical tuning to balance the constraint strength across the different dimensions. In this embodiment, separate weights are set for the high-frequency loss, the low-frequency loss, and the polarization consistency loss. The calculation formula is:
[0137]
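As a sketch of the weighted fusion, the total loss can be written as the spatial-domain loss plus weighted high-frequency, low-frequency, and polarization consistency terms. The weight values of the embodiment are not recoverable from the text, so the defaults below are placeholders, and giving the spatial-domain loss an implicit weight of 1 is an assumption.

```python
def total_loss(l_spatial, l_high, l_low, l_pol,
               w_high=0.1, w_low=0.1, w_pol=0.5):
    """Weighted sum of the loss terms; the default weights are placeholders."""
    return l_spatial + w_high * l_high + w_low * l_low + w_pol * l_pol

example = total_loss(1.0, 2.0, 3.0, 4.0)
```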
[0138] The calculation of the multi-domain loss function is completed by the multi-domain loss function optimization module. The loss value is passed to all learnable parameters of the Mamba encoder, frequency-aware polarization filter module, and Mamba decoder through backpropagation. The Adam optimizer is used to update the parameters, thus completing the training and optimization of the model.
[0139] Table 1 Comparative Experimental Data
[0140] To verify the effectiveness and advancement of the method proposed in this invention, a quantitative comparative experiment was conducted in this embodiment on the "Foggy City Landscape Dataset". Mainstream and representative dehazing algorithms were selected as comparison methods, including: LTDWP: C. Zhou, M. Teng, Y. Han, C. Xu, and B. Shi. Learning to dehaze with polarization. in NeurIPS, 2021, pp. 11487–11500.
[0141] D4: Y. Yang, C. Wang, R. Liu, L. Zhang, X. Guo, and D. Tao. Self-augmented unpaired image dehazing via density and depth decomposition. in CVPR, 2022, pp. 2027–2036.
[0142] SCANet: Y. Guo, Y. Gao, R. W. Liu, Y. Lu, J. Qu, S. He, and W. Ren. SCANet: Self-paced semi-curricular attention network for non-homogeneous image dehazing. in CVPRW, 2023, pp. 1885–1894.
[0143] C2PNet: Y. Zheng, J. Zhan, S. He, J. Dong, and Y. Du. Curricular contrastive regularization for physics-aware single image dehazing. in CVPR, 2023, pp. 5785–5794.
[0144] DiffLI2D: Z. Yang, H. Yu, B. Li, J. Zhang, J. Huang, and F. Zhao. Unleashing the potential of the semantic latent space in diffusion models for image dehazing. in ECCV, 2024, pp. 371–389.
[0145] LHTD: R. Wang, Y. Zheng, Z. Zhang, C. Li, S. Liu, G. Zhai, and X. Liu. Learning hazing to dehazing: Towards realistic haze generation for real-world image dehazing. in CVPR, 2025, pp. 23091–23100.
[0146] The evaluation metrics were Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), Q-Align (a large-model-based, no-reference image quality metric), LIQE (a vision-language-based image quality metric), and the average processing time per image (in seconds). For PSNR, SSIM, Q-Align, and LIQE, higher values indicate better image quality, while a lower average processing time indicates higher algorithm efficiency. The quantitative comparative results are given in Table 1.
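For reference, PSNR is computed from the mean squared error between the dehazed image and the clear reference image. The sketch below uses the standard definition and assumes pixel values normalized to the range 0 to 1; the `max_value` parameter and function name are illustrative.

```python
import numpy as np

def psnr(image, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    mse = np.mean((image - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

reference = np.zeros((3, 8, 8))
image = np.full((3, 8, 8), 0.1)   # uniform error of 0.1
value = psnr(image, reference)
```

A uniform error of 0.1 on a unit range gives a mean squared error of 0.01 and therefore a PSNR of about 20 dB.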
[0147] As can be seen from the test data in Table 1, the method proposed in this invention achieves the best performance across all image quality evaluation metrics. In terms of pixel-level fidelity and structural restoration, the PSNR of the proposed method reaches 29.27 dB, an improvement of 2.40 dB over the second-best method, LHTD (26.87 dB). The SSIM of this invention reaches 0.9769, exceeding all other comparison methods. These results support the effectiveness of combining the frequency-aware polarization filtering module with the encoder-decoder architecture. The mechanism separates and filters out low-frequency fog interference in the frequency domain while protecting high-frequency edges and texture details, avoiding the detail loss and over-smoothing common in existing deep learning dehazing methods and preserving structural integrity.
[0148] In terms of visual perception quality (Q-Align and LIQE), this invention achieved the highest scores of 3.3109 and 1.792 for Q-Align and LIQE, respectively, which focus on human visual perception quality. This indicates that after optimization using a multi-domain loss function, the dehazing result not only minimizes mathematical error but also exhibits more natural color reproduction, artifact suppression, and overall visual appeal, closely matching the physical laws of real-world scenes.
[0149] Regarding the balance between computational efficiency and performance, the average inference time for processing a single image is 2.68 seconds. While longer than that of the lightweight SCANet network (0.16 seconds), the processing speed of this invention is far higher than that of generative or more complex architectures (nearly 50 times that of DiffLI2D and nearly 35 times that of LHTD) while also leading in dehazing quality. This indicates that the two-dimensional state-space model (SS2D) used in this invention avoids the quadratic growth in computational complexity of self-attention-based architectures such as Transformers, maintaining competitive operating efficiency while ensuring high dehazing accuracy, which is valuable for engineering implementation and practical deployment.
[0150] Based on the comparison of the above indicators, the image dehazing method proposed in this invention, which is based on frequency-aware polarization filtering and a state-space model, addresses the technical problem of balancing dehazing intensity and detail preservation in complex foggy scenes. It achieves a favorable balance between dehazing accuracy, visual quality, and computational cost, and its overall performance is better than that of the existing mainstream methods compared.
[0151] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this invention patent should be determined by the appended claims.
Claims
1. An image dehazing method based on frequency-aware polarization filtering and a state-space model, characterized in that, The image dehazing method based on frequency-aware polarization filtering and state-space model includes: Obtain the foggy image and its corresponding transmission map and semantic segmentation map; The foggy image is input into a pre-built Mamba encoder, and high-level encoded features are obtained through three-level downsampling and two-dimensional state space operations. Based on the foggy image, the transmission image, and the semantic segmentation image, three polarization images are obtained from different angles. The polarization images are then decoupled into low-frequency polarization features and high-frequency polarization features. High-level coding features are mapped from the spatial domain to the frequency domain to obtain complex frequency domain features; Low-frequency guiding weights and high-frequency guiding weights are generated based on low-frequency polarization features and high-frequency polarization features, respectively. Frequency domain modulation weights are obtained by combining them with a real-time constructed binary frequency mask. The frequency domain modulation weights are then applied to the complex frequency domain features to perform polarization-guided modulation in the frequency domain, resulting in modulated complex frequency domain features. The modulated complex frequency domain features are mapped from the frequency domain to the spatial domain to obtain the filtered spatial domain features. Based on the filtered spatial domain features and high-level coding features, the third-level upsampled features are obtained through three-level upsampling. The third-level upsampling features are input into the pre-built image reconstruction module, and the dehazed image is output.
2. The image dehazing method based on frequency-aware polarization filtering and state-space model according to claim 1, characterized in that, The process involves inputting a foggy image into a pre-constructed Mamba encoder, and obtaining high-level coded features through three-level downsampling and two-dimensional state-space operations, including: The Mamba encoder adopts a three-level downsampling structure. Each downsampling unit consists of a convolutional local feature extraction layer, a two-dimensional state space global feature modeling layer, a residual connection layer, and a stride convolutional downsampling layer. The convolutional local feature extraction layer includes a convolutional layer, a GELU activation function, and a convolutional layer in sequence. In each residual connection layer, the global features of the current level downsampled after being modeled by the two-dimensional state space global feature modeling layer are residually connected with the input features of the current level downsampled unit. Each level downsampled unit outputs the downsampled features of the current level. The downsampled features output by the third-level downsampling unit are sequentially passed through a convolutional local feature extraction layer, a two-dimensional state space global feature modeling layer, and a residual connection layer to obtain high-level encoded features.
3. The image dehazing method based on frequency-aware polarization filtering and state-space model according to claim 1, characterized in that, The method of obtaining polarization images from three angles based on foggy images, transmission maps, and semantic segmentation maps includes: Atmospheric light polarization degree is obtained from foggy images; The intensity of transmitted light and atmospheric light are obtained based on the transmission diagram; The polarization degree of transmitted light is obtained based on the semantic segmentation map; Based on atmospheric light polarization degree, transmitted light intensity, atmospheric light intensity, and transmitted light polarization degree, three angular polarization images at 0 degrees, 45 degrees, and 90 degrees are obtained by calculating using atmospheric scattering model formulas.
4. The image dehazing method based on frequency-aware polarization filtering and state-space model according to claim 1, characterized in that, generating low-frequency guiding weights and high-frequency guiding weights based on the low-frequency polarization features and high-frequency polarization features, respectively, combining them with a real-time constructed binary frequency mask to obtain frequency domain modulation weights, and applying the frequency domain modulation weights to the complex frequency domain features to perform polarization-guided modulation in the frequency domain, resulting in modulated complex frequency domain features, includes: The dimensions of the low-frequency polarization features and the high-frequency polarization features are matched with the complex frequency domain features. The matched low-frequency polarization features and high-frequency polarization features are then subjected to nonlinear transformations to generate the low-frequency and high-frequency guiding weights for frequency domain modulation. A binary frequency mask is constructed based on the coordinates of the complex frequency domain features, expressed by a formula whose symbols denote, respectively: the indicator function; the current coordinates of the complex frequency domain features; the horizontal frequency coordinate; the vertical frequency coordinate; the height of the complex frequency domain features; the width of the complex frequency domain features; and the binary frequency mask value at the given coordinates in the two-dimensional frequency domain space, the binary frequency mask values of all coordinates of the complex frequency domain features constituting the binary frequency mask. The binary frequency mask is multiplied and fused element-wise with the low-frequency and high-frequency guiding weights to obtain the fused frequency domain modulation weights. The frequency domain modulation weights are applied to the complex frequency domain features to perform polarization-guided modulation in the frequency domain, resulting in the modulated complex frequency domain features.
5. The image dehazing method based on frequency-aware polarization filtering and state-space model according to claim 1, characterized in that, The filtered spatial domain features and high-level coding features are subjected to three levels of upsampling to obtain the third-level upsampled features, including: The filtered spatial domain features and high-level coded features are residually concatenated to obtain the bottleneck features. The bottleneck features are input into the pre-built Mamba decoder. The Mamba decoder adopts a three-level upsampling structure. Each upsampling unit includes a stride convolutional upsampling layer, a convolutional local feature extraction layer, a two-dimensional state space global feature modeling layer, and a residual connection layer. The convolutional local feature extraction layer includes a convolutional layer, a GELU activation function, and another convolutional layer in sequence. The third-level downsampling feature is fused with the output of the first-level stride convolutional upsampling layer. The fused feature is then subjected to convolutional local feature extraction and global feature modeling in sequence. The result is then residually connected with the fused feature to obtain the first-level upsampling feature. The first-level upsampled features are input into the second-level stride convolutional upsampled layer, and the second-level upsampled features are obtained based on the second-level downsampled features and the output of the second-level stride convolutional upsampled layer. The second-level upsampled features are input into the third-level stride convolutional upsampled layer, and the third-level upsampled features are obtained based on the first-level downsampled features and the output of the third-level stride convolutional upsampled layer.
6. The image dehazing method based on frequency-aware polarization filtering and state-space model according to claim 1, characterized in that, The pre-built image reconstruction module includes, in sequence, a convolutional layer, a GELU activation function, and another convolutional layer.
7. The image dehazing method based on frequency-aware polarization filtering and state-space model according to claim 1, characterized in that, The image dehazing method based on frequency-aware polarization filtering and state-space model is trained and optimized using a multi-domain loss function, which includes spatial domain loss, frequency domain decomposition loss, and polarization consistency loss.
8. The image dehazing method based on frequency-aware polarization filtering and state-space model according to claim 7, characterized in that, The frequency domain decomposition loss includes a high-frequency loss and a low-frequency loss. The high-frequency loss is expressed by a formula whose symbols denote, respectively: the high-frequency loss; the two-dimensional fast Fourier transform; the modulus of the complex frequency domain features; the natural logarithm operation; the total number of pixels in a single image; the value at each point of the frequency domain features obtained by applying the fast Fourier transform to the dehazed image; and the value at the corresponding point of the frequency domain features of the clear reference image. The low-frequency loss is expressed by a formula whose symbols denote, respectively: the low-frequency loss; the phase of the complex frequency domain features; the low-frequency region at the center of the frequency domain; the number of pixels in the low-frequency region; the value at each point of the low-frequency region for the dehazed image; and the value at each point of the low-frequency region for the clear reference image.
9. The image dehazing method based on frequency-aware polarization filtering and state-space model according to claim 7, characterized in that, The polarization consistency loss is expressed by a formula whose symbols denote, respectively: the polarization consistency loss; the reconstructed degree of polarization at each pixel of the dehazed image; and the observed degree of polarization at each pixel of the clear reference image.