Lightweight multi-scale image defogging method based on attention mechanism
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- SOUTHEAST UNIV
- Filing Date
- 2023-04-27
- Publication Date
- 2026-06-12

Figure CN116523782B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the dehazing problem in image processing, and in particular to a dehazing method based on convolutional neural networks.
Background Technology
[0002] With the continuous advancement of deep learning, the computational and representational power of convolutional neural networks (CNNs) has made them prominent in computer vision, where they have gradually replaced traditional learning methods. CNNs can use convolutional kernels to extract shallow image features such as background and contours, and, by stacking multiple convolutional layers, they can keep extracting features until they capture the high-frequency details of an image.
[0003] Image dehazing is the process of restoring a hazy image to a haze-free one using specific techniques. It has significant research and application value in both industry and daily life. For example, in license plate recognition, dehazing can improve recognition accuracy and thereby contribute to traffic safety. In terrain surveying, dehazing can improve the quality of aerial images and better serve subsequent work. In entertainment, higher-quality images can increase public satisfaction. Image dehazing also has important applications in forestry early warning and transportation.
[0004] Image dehazing has become a research hotspot in computer vision, and many methods have been proposed to restore foggy images to fog-free ones. Image enhancement-based dehazing methods do not model the causes of image degradation in foggy weather; they only improve local or overall quality by enhancing certain features, for example by reducing brightness and increasing saturation. However, these methods cannot estimate the fog concentration and can only remove fog to a limited extent, so the result may be too weak or too strong. Deep learning-based methods rely on the computational and feature-representation power of convolutional neural networks to extract deeper image features, focus on the internal factors that degrade image quality, and construct a corresponding degradation model of the foggy image to restore the target scene. Their design involves two aspects: (1) the overall network framework and (2) the specific network structure. Since the development of deep convolutional networks, many classic and efficient structural designs have emerged, such as residual learning, recursive learning, and dense connections. Different designs lead to different trade-offs between performance and parameter count, so a suitable structure has to be selected through repeated experiments. Designing a dehazing method that applies to various scenarios, has low complexity, and produces stable results therefore remains a challenging task.
Summary of the Invention
[0005] 1. Technical Problem: The technical problem to be solved by this invention is to propose a lightweight, multi-scale image dehazing method based on an attention mechanism.
[0006] The method uses a novel multi-scale depthwise separable convolutional attention network (MDSCA-NET), based on convolutional neural networks, to address the fact that previous image dehazing networks treat information from different channels and different pixel positions equally.
[0007] 2. Technical Solution: To solve the above problems, the present invention adopts the following technical solution:
[0008] A lightweight multi-scale image dehazing method based on an attention mechanism is characterized by the following steps:
[0009] Feature extraction for foggy images;
[0010] The obtained feature map is input into the MDSCA-NET network to reconstruct the dehazed image. The MDSCA-NET network includes several image dehazing models and a model fusion module. The image dehazing model includes a feature attention module and a dehazed image generation module. The feature attention module is used to extract pixel features from the feature map. The dehazed image generation module is used to generate an initial dehazed image from the pixel feature map. The model fusion module is used to fuse several image dehazing models using an alternating direction multiplier optimization algorithm to output the final dehazed image.
[0011] Preferably, the feature attention module includes a channel attention module and a pixel attention module. The channel attention module first extracts and compresses features from the image using max pooling, then passes the features through two convolutional layers with ReLU and sigmoid activation functions to obtain the weight of each channel. The original input feature map is then multiplied pixel by pixel by the channel weights to obtain the output of the channel attention module. The pixel attention module feeds the output of the channel attention module into two convolutional layers with ReLU and sigmoid activation functions to output pixel weights. The output of the channel attention module is then multiplied pixel by pixel by the pixel weights to obtain the output of the feature attention module.
[0012] Preferably, there are two image dehazing models, f1(x) and f2(x); the step of model fusion using the alternating direction multiplier optimization algorithm includes:
[0013] Construct the objective function:
[0014]
[0015] where λ is a Lagrange multiplier, $L(y_i, f_1(x_i))$ and $L(y_i, f_2(x_i))$ are the loss functions, $x$ represents the input data, $y$ represents the ground-truth label, $n$ represents the number of input samples, and $f(x)$ represents the fused model;
[0016] f1, f2, and f are each updated by gradient descent. The update of f is obtained by solving the following system of linear equations, where k is the iteration index:
[0017]
[0018]
[0019] $$f(x_i)^{k+1} = f(x_i)^{k} + \lambda\left(f_1(x_i)^{k+1} - f_2(x_i)^{k+1} - f(x_i)\right)$$
[0020] These updates are repeated alternately until convergence; in each iteration, f1, f2, f, and the Lagrange multiplier are all updated.
[0021] Preferably, the dehazed image generation module reconstructs the dehazed image according to the simplified atmospheric scattering formula:
[0022] J(x)=K(x)I(x)-K(x)+b
[0023] $$K(x) = \frac{\frac{1}{t(x)}\left(I(x) - A\right) + (A - b)}{I(x) - 1}$$
[0024] where b is a constant, t(x) represents the transmittance map, A represents the atmospheric light value, I(x) represents the foggy image, and J(x) represents the dehazed image.
[0025] Preferably, the loss function of the MDSCA-NET network is:
[0026] $$L_{MIX} = \alpha L_S + (1 - \alpha) L_1$$
[0027] where
[0028] $$L_1 = \left| J_i - f(x_i) \right|$$
[0029] $$L_S = 1 - \mathrm{SSIM}\left(J_i, f(x_i)\right)$$
[0030] $J_i$ represents a standard fog-free image, $x_i$ represents a foggy image, $f(x_i)$ represents the image obtained by dehazing the foggy image with the MDSCA-NET network, and $\mathrm{SSIM}(J_i, f(x_i))$ represents the SSIM value between $f(x_i)$ and the corresponding standard haze-free image $J_i$. $L_1$ represents the L1 loss function, $L_S$ represents the image structural similarity loss function, and α and (1 - α) represent the proportions of the structural similarity loss and the L1 loss in the total loss function.
[0031] Preferably, the convolutional structure for feature extraction of foggy images includes a 1×1 convolutional layer and three 3×3 convolutional layers; the outputs of these four layers are concatenated along the channel dimension and combined by a final 3×3 convolutional layer.
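The following Python sketch only does bookkeeping: it tracks the receptive field and channel count of each branch of this feature-extraction structure. It rests on assumptions that the text does not state: the three 3×3 layers are stacked in sequence with each intermediate output tapped for concatenation, every layer has stride 1, and every layer has 64 output channels (the width given in [0095]). The function names are hypothetical.

```python
# Hypothetical bookkeeping for the multi-scale feature-extraction stem.
# Assumptions (not stated in the source): stride 1, 64 channels per layer,
# and the three 3x3 layers are stacked sequentially with every output tapped.

def stacked_receptive_field(kernel_sizes):
    """Receptive field of a chain of stride-1 convolutions: 1 + sum(k - 1)."""
    return 1 + sum(k - 1 for k in kernel_sizes)

def stem_branches(channels=64):
    # Each branch is described by the chain of kernels its output has passed through.
    chains = {
        "1x1": [1],
        "3x3 (1st)": [3],
        "3x3 (2nd)": [3, 3],
        "3x3 (3rd)": [3, 3, 3],
    }
    branches = {name: stacked_receptive_field(ks) for name, ks in chains.items()}
    # Concatenating the four branch outputs along the channel axis.
    concat_channels = channels * len(chains)
    # A final 3x3 fusion convolution would map concat_channels back to `channels`.
    return branches, concat_channels

branches, concat_channels = stem_branches()
for name, rf in branches.items():
    print(f"{name}: receptive field {rf}x{rf}")
print("channels after concatenation:", concat_channels)
```

Under this reading the stem sees the input at four scales (1×1, 3×3, 5×5, 7×7) before fusion. If the three 3×3 layers were instead parallel, they would all share a 3×3 receptive field.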
[0032] Beneficial effects: 1. Using the feature attention module (FAM) as the basic building block of the network not only extracts image features at different scales of detail through convolutions of different sizes, but also focuses the network's attention on heavily hazy pixels and more important channel information through the channel attention and pixel attention mechanisms.
[0033] 2. The network structure design using ADMMU-Net allows for the cascading of multiple FAM blocks, which can extract image feature representations more fully while also improving the model's generalization ability.
[0034] 3. Two types of convolutional blocks, 1×1 and 3×3, are used. Two consecutive 3×3 convolutional kernels replace a 5×5 kernel, and 3×3 convolutional layers are cascaded to extract image features at different levels of detail.
[0035] 4. A skip connection network design is introduced into the network structure. Skip connections allow the current FAM to make full use of the output features of the previous FAM and cascade them together, improving the flow of feature information at all levels and enhancing the feature representation capability of the network.
[0036] 5. The alternating direction multiplier optimization algorithm is used for network fusion, which improves the generalization ability of the final network and its dehazing performance on real-world hazy images.
Attached Figure Description
[0037] Figure 1 is a diagram of the framework of the method (Alternating Direction Method of Multipliers U-Net, ADMMU-Net).
[0038] Figure 2 is a schematic diagram of the Feature Attention Module (FAM) in this method.
[0039] Figure 3 is a schematic diagram of the channel attention module and the pixel attention module in this method.
[0040] Figure 4 shows a quantitative comparison of the mean PSNR/SSIM of this method and other methods on the RESIDE test set.
[0041] Figure 5 shows a qualitative comparison of this method and other methods on a single image from the RESIDE test set.
[0042] Figure 6 shows a qualitative comparison of this method and other methods on a single image from the RESIDE test set.
[0043] Figure 7 compares the runtime of this method and other methods on images of different sizes from the RESIDE test set.
Detailed Implementation
[0044] The method described in this invention is a lightweight multi-scale image dehazing method based on an attention mechanism. The method first uses a convolutional structure to extract features from the hazy image; the obtained feature map is then input into the MDSCA-NET network to reconstruct the haze-free image.
[0045] As shown in Figure 1, the MDSCA-NET network includes a feature attention module, a model fusion module, and a fog-free image generation module.
[0046] Feature Attention Module: The feature attention module mainly consists of two attention modules: channel attention module and pixel attention module.
[0047] The channel attention module addresses the fact that features in different channels carry different weights. In the channel attention module, features are first extracted and compressed using max pooling, as shown in the formula:
[0048] $$g_c = H_p(F_c) = \max_{1 \le i \le H,\ 1 \le j \le W} X_c(i, j)$$
[0049] where $g_c$ represents the output for the c-th channel, $F_c$ (equivalently $X_c$) represents the feature map of the c-th channel of the input, $X_c(i, j)$ represents the value at pixel position (i, j) of the c-th channel, $H_p$ is the max pooling function, and H and W are the height and width of the foggy image, respectively. Here, a pooling kernel of the same size as the image is used for max pooling, which changes the shape of the feature map from C×H×W to C×1×1 so that the network focuses only on the channel information of the feature map.
[0050] To obtain the weights of the different channels, the pooled features are passed through two convolutional layers with ReLU and sigmoid activation functions, as shown in the following equation:
[0051] $$CA_c = \sigma\left(\mathrm{Conv}\left(\delta\left(\mathrm{Conv}(g_c)\right)\right)\right)$$
[0052] where Conv is the convolutional layer function, σ is the sigmoid function, δ is the ReLU function, and $CA_c$ is the output weight of the c-th channel.
[0053] Finally, the output of the channel attention module is obtained by multiplying the original input feature map $F_c$ by the channel weight $CA_c$ pixel by pixel, as shown in the formula:
[0054]
[0055] Considering that the distribution of haze is uneven across different image pixels, a pixel attention module is proposed to enable the network to focus more on informative features, such as thickly hazy pixels and high-frequency image regions.
[0056] Similar to the channel attention module, to obtain the weight of each pixel, the input (the output of the channel attention module) is fed directly into two convolutional layers with ReLU and sigmoid activation functions, as shown in the following equation:
[0057]
[0058] where Conv is the convolutional layer function, σ is the sigmoid function, δ is the ReLU function, and PA is the output pixel weight. If the feature map output by the channel attention submodule has size C×H×W, the pixel weight map PA produced by the pixel attention submodule has size 1×H×W.
[0059] Finally, the output of the feature attention module is obtained by multiplying the input (the output of the channel attention module) by PA pixel by pixel.
[0060]
[0061] The resulting feature map is restored to the input size C×H×W. It also treats the pixel features of the hazy image differently according to their weights, instead of treating them all equally.
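The data flow of the feature attention module in [0046]-[0061] can be illustrated with the pure-Python sketch below, which runs on a tiny C×H×W feature map stored as nested lists. It is a minimal illustration, not the patented implementation. The text does not give the kernel sizes of the attention convolutions, so every convolution here is assumed to be 1×1 (a per-pixel linear map over channels). The hidden width, the random weights, and all function names are hypothetical.

```python
import math
import random

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def conv1x1(vec, weight, bias):
    # A 1x1 convolution applied to one pixel is a matrix-vector product over channels.
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]

def channel_attention(fmap, w1, b1, w2, b2):
    # Global max pooling: C x H x W -> C x 1 x 1 (one value g_c per channel).
    g = [max(max(row) for row in channel) for channel in fmap]
    # CA_c = sigmoid(Conv(ReLU(Conv(g_c)))).
    hidden = [relu(v) for v in conv1x1(g, w1, b1)]
    ca = [sigmoid(v) for v in conv1x1(hidden, w2, b2)]
    # Scale every pixel of channel c by its weight CA_c.
    return [[[ca[c] * v for v in row] for row in fmap[c]] for c in range(len(fmap))]

def pixel_attention(fmap, w1, b1, w2, b2):
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    # PA holds one weight per pixel, i.e. shape 1 x H x W.
    pa = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            pixel = [fmap[c][i][j] for c in range(C)]
            hidden = [relu(v) for v in conv1x1(pixel, w1, b1)]
            pa[i][j] = sigmoid(conv1x1(hidden, w2, b2)[0])
    # Multiply every channel by the same pixel weight map; the output is C x H x W again.
    return [[[fmap[c][i][j] * pa[i][j] for j in range(W)] for i in range(H)] for c in range(C)]

def feature_attention(fmap, ca_params, pa_params):
    # FAM: channel attention first, then pixel attention on its output.
    return pixel_attention(channel_attention(fmap, *ca_params), *pa_params)

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
C, H, W, hidden = 4, 3, 3, 2  # hypothetical sizes
fmap = [[[rng.uniform(-1.0, 1.0) for _ in range(W)] for _ in range(H)] for _ in range(C)]
ca_params = (rand_matrix(hidden, C, rng), [0.0] * hidden, rand_matrix(C, hidden, rng), [0.0] * C)
pa_params = (rand_matrix(hidden, C, rng), [0.0] * hidden, rand_matrix(1, hidden, rng), [0.0])
out = feature_attention(fmap, ca_params, pa_params)
print(len(out), len(out[0]), len(out[0][0]))  # 4 3 3
```

With all-zero weights, both sigmoids output 0.5, so the module simply scales the input by 0.25. Trained weights would instead emphasize particular channels and heavily hazy pixels.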
[0062] Model Fusion Module: To improve the generalization of dehazing networks on real hazy images and to counter overfitting during neural network training, MDSCA-NET uses the alternating direction method of multipliers (ADMM) to fuse and optimize networks, which yields more accurate and robust image dehazing results. Specifically, given two models f1(x) and f2(x), they are fused so that the fused model f(x) has better generalization performance. Here, x represents the input data, f(x) represents the output result, and the loss function is L(y, f(x)), where y represents the true label.
[0063] The process of model fusion using the alternating direction multiplier optimization algorithm can be summarized in the following steps:
[0064] 1. Transform the original problem into ADMM form: Decompose the original loss function into two parts, each fitted by a separate model, and introduce a Lagrange multiplier to construct the ADMM form. That is:
[0065]
[0066] s.t. $f_1(x) = f_2(x) = f(x)$
[0067]
[0068] Here, λ is a Lagrange multiplier that controls the smoothness of the fusion.
[0069] 2. Optimization using ADMM: In the ADMM form of this problem, f1, f2, and f are updated separately. Specifically, for each $f_i$, f and the other model $f_j$ are first held fixed while $f_i$ is updated by gradient descent or a similar method; f is then updated using the newly obtained f1 and f2. The updates of f1, f2, and f can be obtained by solving a system of linear equations, where k is the iteration index:
[0070]
[0071]
[0072] $$f(x_i)^{k+1} = f(x_i)^{k} + \lambda\left(f_1(x_i)^{k+1} - f_2(x_i)^{k+1} - f(x_i)\right)$$
[0073] 3. Continuously update alternately until convergence: Repeat step 2 until convergence. In each iteration step, f1, f2, f and the Lagrange multipliers need to be updated.
[0074] The final result f(x) is the model fusion result. By introducing Lagrange multipliers, the ADMM algorithm can effectively balance the contributions of the two models and play a role in model fusion.
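The alternating updates can be illustrated with a toy Python sketch of standard scaled-form consensus ADMM. It is not the patented update rule. The two "models" are represented only by fixed per-sample predictions p1 and p2, and each loss is assumed to be a weighted squared error (c_j/2)·||u_j - p_j||². The constraint u1 = u2 = z mirrors f1(x) = f2(x) = f(x), with z standing in for the fused f, and closed-form u-updates replace the gradient-descent training of the networks described in [0069]. The penalty parameter rho, the scaled duals w1 and w2, and the function name admm_fuse are all hypothetical.

```python
def admm_fuse(p1, p2, c1=1.0, c2=1.0, rho=1.0, iters=200, tol=1e-10):
    """Consensus ADMM for min c1/2*||u1-p1||^2 + c2/2*||u2-p2||^2  s.t. u1 = u2 = z."""
    n = len(p1)
    z = [0.0] * n
    w1 = [0.0] * n  # scaled dual variable for u1 = z
    w2 = [0.0] * n  # scaled dual variable for u2 = z
    for _ in range(iters):
        # u-updates: each "model" is refit while z and its dual are held fixed.
        u1 = [(c1 * p + rho * (zi - wi)) / (c1 + rho) for p, zi, wi in zip(p1, z, w1)]
        u2 = [(c2 * p + rho * (zi - wi)) / (c2 + rho) for p, zi, wi in zip(p2, z, w2)]
        # z-update: average of the two models plus their dual corrections.
        z_new = [((a + wa) + (b + wb)) / 2.0 for a, wa, b, wb in zip(u1, w1, u2, w2)]
        # Dual updates accumulate the remaining disagreement with the consensus.
        w1 = [wi + a - zi for wi, a, zi in zip(w1, u1, z_new)]
        w2 = [wi + b - zi for wi, b, zi in zip(w2, u2, z_new)]
        change = max(abs(a - b) for a, b in zip(z_new, z))
        z = z_new
        if change < tol:
            break
    return z

fused = admm_fuse([0.0, 4.0], [4.0, 0.0], c1=1.0, c2=3.0)
print([round(v, 6) for v in fused])  # [3.0, 1.0]
```

For these quadratic losses the fused output converges to the loss-weighted average (c1·p1 + c2·p2)/(c1 + c2), a small instance of the multiplier terms balancing the two models' contributions as described in [0074].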
[0075] Haze-free image generation module: In the existing atmospheric scattering models, the relationship between clear and hazy images can be expressed as:
[0076] I(x)=J(x)t(x)+A(1-t(x))
[0077] Where t(x) represents the transmittance map, A represents the atmospheric light value, I(x) represents the foggy image, and J(x) represents the clear, fog-free image that is ultimately required.
[0078] MDSCA-NET is an image dehazing neural network based on the atmospheric scattering model. Following how other networks have treated this model, and to keep the dehazing model lightweight, the number of layers and parameters should be minimized. MDSCA-NET is therefore based on the transformed, simplified atmospheric scattering formula:
[0079] J(x)=K(x)I(x)-K(x)+b
[0080] where b is a constant. The unknown K(x) is given by:
[0081] $$K(x) = \frac{\frac{1}{t(x)}\left(I(x) - A\right) + (A - b)}{I(x) - 1}$$
[0082] The two unknowns in the atmospheric scattering model, the transmittance map t(x) and the atmospheric light value A, are transformed and merged into a single unknown K(x) through a mathematical formula.
[0083] In the fog-free image generation module, MDSCA-NET uses the K(x) value obtained by the network to restore the final clear image after defogging based on the simplified atmospheric scattering formula.
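The reformulation in [0079]-[0082] can be checked numerically with the Python sketch below. It synthesizes a hazy pixel with the scattering model I = J·t + A·(1 - t), then computes K(x) = ((I - A)/t + (A - b))/(I - 1), which follows from substituting J = (I - A)/t + A into J = K·I - K + b. Finally it confirms that J(x) = K(x)I(x) - K(x) + b recovers the clear value. The pixel values and b = 1 are hypothetical. In MDSCA-NET, K(x) is estimated by the network rather than computed from a known t and A.

```python
def hazy_from_clear(j, t, a):
    # Atmospheric scattering model: I(x) = J(x) t(x) + A (1 - t(x)).
    return j * t + a * (1.0 - t)

def k_from_physics(i, t, a, b):
    # K(x) merges the two unknowns t(x) and A; undefined when I(x) == 1.
    return ((i - a) / t + (a - b)) / (i - 1.0)

def dehaze(i, k, b):
    # Simplified formula used by the haze-free image generation module.
    return k * i - k + b

b = 1.0  # hypothetical constant bias
samples = [(0.2, 0.6, 0.9), (0.5, 0.3, 0.8), (0.7, 0.9, 0.95)]  # (J, t, A), made up
for j, t, a in samples:
    i = hazy_from_clear(j, t, a)
    k = k_from_physics(i, t, a, b)
    print(f"J={j:.2f}  I={i:.4f}  K={k:.4f}  recovered J={dehaze(i, k, b):.4f}")
```

Because t(x) and A are folded into the single map K(x), the network only has to predict one quantity per pixel, which is the lightweight motivation given in [0078].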
[0084] Loss Function: MDSCA-NET employs a composite loss function. Most of the algorithms compared in the experiments were trained using the L1 loss function, which can be expressed as:
[0085] $$L_1 = \left| J_i - f(x_i) \right|$$
[0086] where $J_i$ represents a clear, fog-free image, $x_i$ represents the foggy image corresponding to that clear image, and $f(x_i)$ represents the image obtained by dehazing the foggy image with the network of this invention.
[0087] The purpose of the image structural similarity loss function is to make the visual effect of the image more consistent with human subjective perception. Structural similarity (SSIM) reflects the structural similarity between the generated image and the standard clear image more directly than peak signal-to-noise ratio (PSNR). The structural similarity loss function $L_S$ can therefore be expressed as:
[0088] $$L_S = 1 - \mathrm{SSIM}\left(J_i, f(x_i)\right)$$
[0089] where $\mathrm{SSIM}(J_i, f(x_i))$ represents the SSIM value between the dehazed image $f(x_i)$ of the synthetic hazy image $x_i$ and the corresponding standard haze-free image $J_i$.
[0090] The general formula for the composite loss function is shown in the following equation:
[0091] $$L_{MIX} = \alpha L_S + (1 - \alpha) L_1$$
[0092] Compared with a single L1 loss function, the proposed composite loss function $L_{MIX}$ more effectively corrects the differences in contrast, brightness, and texture between the generated image and the sharp image.
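The composite loss for one image pair can be sketched in Python with images stored as flat lists of intensities in [0, 1]. Two simplifications are assumptions, not details from the text. First, L1 is averaged over pixels (mean absolute error). Second, SSIM is computed once over the whole image, using the commonly used constants C1 = (0.01·L)² and C2 = (0.03·L)² with dynamic range L = 1, rather than with the usual sliding Gaussian window. The text does not give α, so alpha=0.5 below is a placeholder.

```python
def l1_loss(clear, restored):
    # Mean absolute error between J_i and f(x_i).
    return sum(abs(a - b) for a, b in zip(clear, restored)) / len(clear)

def global_ssim(x, y, data_range=1.0):
    # Single-window SSIM over the whole image (simplified; no Gaussian window).
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def mix_loss(clear, restored, alpha=0.5):
    # L_MIX = alpha * L_S + (1 - alpha) * L1, with L_S = 1 - SSIM.
    l_s = 1.0 - global_ssim(clear, restored)
    return alpha * l_s + (1.0 - alpha) * l1_loss(clear, restored)

clear = [0.1, 0.4, 0.8, 0.6]
print(mix_loss(clear, clear))                 # perfect restoration: loss is (numerically) zero
print(mix_loss(clear, [0.2, 0.5, 0.7, 0.5]))  # imperfect restoration: positive loss
```

The SSIM term penalizes changes in mean, contrast, and structure jointly, while the L1 term penalizes per-pixel intensity error. This matches the motivation in [0092] for combining them.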
[0093] This invention is evaluated on the widely used RESIDE dataset, which contains 17,495 synthetic hazy training images and 500 test images. The proposed MDSCA-NET model is trained on the 17,495 pairs of training images from the outdoor training set OTS in RESIDE. The training images are augmented by random horizontal flipping and random 90° rotation.
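The two augmentations can be sketched in Python for a single-channel image stored as a list of rows. Because the training data are hazy/clear pairs, the sketch assumes that the same random flip and rotation must be applied to both images of a pair so they stay pixel-aligned. The function names and the 50% flip probability are hypothetical.

```python
import random

def hflip(img):
    # Mirror each row left-to-right.
    return [row[::-1] for row in img]

def rot90(img):
    # Rotate 90 degrees counter-clockwise: the last column becomes the first row.
    return [list(col) for col in zip(*img)][::-1]

def augment_pair(hazy, clear, rng):
    # Draw the random choices once and apply them to both images of the pair.
    flip = rng.random() < 0.5
    turns = rng.randrange(4)  # 0, 90, 180 or 270 degrees
    out = []
    for img in (hazy, clear):
        if flip:
            img = hflip(img)
        for _ in range(turns):
            img = rot90(img)
        out.append(img)
    return out[0], out[1]

rng = random.Random(42)
hazy = [[1, 2, 3], [4, 5, 6]]
clear = [[10, 20, 30], [40, 50, 60]]
h_aug, c_aug = augment_pair(hazy, clear, rng)
print(h_aug)
print(c_aug)
```

Sampling the flip and rotation once per pair, rather than once per image, keeps the ground-truth image J_i aligned with its hazy input x_i.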
[0094] This invention uses Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) as model evaluation metrics. Higher PSNR and SSIM values indicate that the dehazed images processed by the model are of better quality and closer to real, haze-free images.
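For reference, PSNR for images with intensities in [0, 1] is 10·log10(MAX²/MSE) with MAX = 1, measured in dB. The short Python sketch below computes it for two flat lists of pixel values. The example values are made up for illustration, and the function name is hypothetical.

```python
import math

def psnr(reference, estimate, max_val=1.0):
    # Peak signal-to-noise ratio in dB; infinite for identical images.
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)

clear = [0.2, 0.4, 0.6, 0.8]
dehazed = [0.3, 0.3, 0.7, 0.7]  # every pixel off by 0.1, so MSE = 0.01
print(round(psnr(clear, dehazed), 2))  # 20.0
```

Each tenfold reduction in MSE adds 10 dB, so the PSNR differences reported between methods correspond to multiplicative differences in mean squared error.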
[0095] Training images are drawn sequentially from the RESIDE training set OTS with a mini-batch size of 8 (i.e., 8 images per training iteration). The model is trained with the Adam optimizer ($\beta_1 = 0.9$, $\beta_2 = 0.99$, $\varepsilon = 10^{-8}$), and the initial learning rate is set to $10^{-4}$. The network model is trained and tested with the PyTorch framework on an NVIDIA RTX 3070 GPU. In the MDSCA-NET network, all convolutional layers have 64 feature channels, and both 1×1 and 3×3 convolutional blocks are used. Specifically, two consecutive 3×3 convolutional kernels replace a 5×5 convolutional kernel. From the perspective of how convolution works, two consecutive 3×3 convolutional layers have the same receptive field as a single 5×5 convolutional layer. Moreover, a 5×5 convolutional kernel has 25 parameters, whereas two 3×3 kernels have only 18. This substitution reduces the number of network parameters, shortening training and debugging time, while also increasing the network's capacity and improving performance. The 3×3 convolutional layers are then cascaded.
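The receptive-field and parameter comparison in [0095] can be checked with a few lines of Python. The per-kernel counts (25 vs 18) come from the text. Extending them to layers with 64 input and 64 output channels, ignoring biases, is an illustrative assumption based on the stated feature width. The function names are hypothetical.

```python
def receptive_field(kernel_sizes):
    # Stride-1 convolutions stacked in sequence: each k x k layer adds k - 1.
    return 1 + sum(k - 1 for k in kernel_sizes)

def conv_weights(kernel_sizes, c_in=64, c_out=64):
    # Weight count of a stack of k x k convolutions, each c_in -> c_out channels (no bias).
    return sum(k * k * c_in * c_out for k in kernel_sizes)

print(receptive_field([5]), receptive_field([3, 3]))        # 5 5
print(conv_weights([5], 1, 1), conv_weights([3, 3], 1, 1))  # 25 18
print(conv_weights([5]), conv_weights([3, 3]))              # 102400 73728
print(f"{1 - conv_weights([3, 3]) / conv_weights([5]):.0%} fewer weights")  # 28% fewer weights
```

The saving is the same 28% whether counted per single-channel kernel (18 vs 25) or per 64-channel layer, because the channel factor multiplies both sides equally.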
[0096] Figure 4 summarizes the quantitative results of the MDSCA-NET network of this invention and the other classic dehazing networks used in the comparison experiments on the 500 test images of the RESIDE test set SOTS. As shown in the table, MDSCA-NET achieved the highest peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) on the test set, reaching an SSIM of 0.9336 and a PSNR of 24.3173 dB. Compared with the classic lightweight dehazing network models DehazeNet, MSCNN, and AOD-Net, MDSCA-NET improves both SSIM and PSNR by 6.3% to 8.8%. This indicates that MDSCA-NET has an advantage in dehazing performance and demonstrates the effectiveness of this attention-based lightweight multi-scale image dehazing algorithm. As Figures 5-7 show, the present invention also compares favorably with other classic image dehazing models in both dehazing quality and processing time.
Claims
1. A lightweight multi-scale image dehazing method based on an attention mechanism, characterized in that it includes the following steps: extracting features from a foggy image; inputting the obtained feature map into the MDSCA-NET network to reconstruct the dehazed image, wherein the MDSCA-NET network includes several image dehazing models and a model fusion module; each image dehazing model includes a feature attention module and a dehazed image generation module; the feature attention module is used to extract pixel features from the feature map; the dehazed image generation module is used to generate an initial dehazed image from the pixel feature map; and the model fusion module is used to fuse the several image dehazing models using an alternating direction multiplier optimization algorithm to output the final dehazed image; there are two image dehazing models, namely $f_1(x)$ and $f_2(x)$; the steps of model fusion using the alternating direction multiplier optimization algorithm include: constructing an objective function in which λ is a Lagrange multiplier, $L(y_i, f_1(x_i))$ and $L(y_i, f_2(x_i))$ are loss functions, $x$ denotes the input data, $y$ denotes the ground-truth labels, $n$ denotes the number of input samples, and $f(x)$ denotes the fused model; updating $f_1$, $f_2$, and $f$ separately by gradient descent, where the update is obtained by solving a system of linear equations, k being the iteration index, the last of which is $f(x_i)^{k+1} = f(x_i)^{k} + \lambda\left(f_1(x_i)^{k+1} - f_2(x_i)^{k+1} - f(x_i)\right)$; and alternating these updates until convergence, with $f_1$, $f_2$, $f$, and the Lagrange multiplier all updated in each iteration.
2. The lightweight multi-scale image dehazing method based on an attention mechanism according to claim 1, characterized in that the feature attention module includes a channel attention module and a pixel attention module. The channel attention module first extracts and compresses features from the image using max pooling, then passes the features through two convolutional layers with ReLU and sigmoid activation functions to obtain the weight of each channel; the original input feature map is then multiplied pixel by pixel by the channel weights to obtain the output of the channel attention module. The pixel attention module feeds the output of the channel attention module into two convolutional layers with ReLU and sigmoid activation functions to output pixel weights; the output of the channel attention module is then multiplied pixel by pixel by the pixel weights to obtain the output of the feature attention module.
3. The lightweight multi-scale image dehazing method based on an attention mechanism according to claim 1, characterized in that the haze-free image generation module reconstructs the haze-free image based on a simplified atmospheric scattering formula: $$J(x) = K(x)I(x) - K(x) + b$$ $$K(x) = \frac{\frac{1}{t(x)}\left(I(x) - A\right) + (A - b)}{I(x) - 1}$$ where b is a constant, t(x) represents the transmittance map, A represents the atmospheric light value, I(x) represents the foggy image, and J(x) represents the dehazed image.
4. The lightweight multi-scale image dehazing method based on an attention mechanism according to claim 1, characterized in that the loss function of the MDSCA-NET network is: $$L_{MIX} = \alpha L_S + (1 - \alpha) L_1$$ where $$L_1 = \left| J_i - f(x_i) \right|$$ $$L_S = 1 - \mathrm{SSIM}\left(J_i, f(x_i)\right)$$ in which $J_i$ represents a standard fog-free image, $x_i$ represents a foggy image, $f(x_i)$ represents the image obtained by dehazing the foggy image with the MDSCA-NET network, $\mathrm{SSIM}(J_i, f(x_i))$ represents the SSIM value between $f(x_i)$ and the corresponding standard haze-free image $J_i$, $L_1$ represents the L1 loss function, $L_S$ represents the image structural similarity loss function, and α and (1 - α) represent the proportions of the structural similarity loss and the L1 loss in the total loss function, respectively.
5. The lightweight multi-scale image dehazing method based on an attention mechanism according to claim 1, characterized in that the convolutional structure for feature extraction of hazy images consists of a 1×1 convolutional layer and three 3×3 convolutional layers, whose outputs are concatenated along the channel dimension and combined by a final 3×3 convolutional layer.