A multi-color space multi-scale collaborative image enhancement method based on guided filtering
This underwater image enhancement method, which combines guided filtering decomposition and multi-scale collaborative Transformer, solves the dual problems of underwater image blurring and color cast, achieving efficient and robust image enhancement results, and is suitable for visual tasks in complex underwater environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HENAN UNIVERSITY
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-23
AI Technical Summary
Existing underwater image enhancement methods struggle to collaboratively repair blur caused by scattering and color shift caused by selective absorption in real-world complex scenes. Furthermore, they are limited by the domain offset between synthetic data and real degradation characteristics, resulting in low-quality enhancement results.
A multi-color space, multi-scale collaborative image enhancement method based on guided filtering is adopted. By decomposing the underwater degraded image into a low-frequency base layer and a high-frequency detail layer, and utilizing a dual-path attention fusion mechanism and an adaptive feature weighted fusion mechanism, combined with guided filtering and the Transformer model, multi-scale collaborative restoration of underwater images is achieved.
It effectively enhances underwater images of different types and qualities, improves image quality, adapts to underwater scenes with varying degrees of degradation, reduces computational overhead, is compatible with underwater equipment with limited resources or requiring efficient processing, and provides high-quality image input to support tasks such as target detection and biometrics.
Figure CN122265065A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image enhancement technology, and specifically to a multi-color space, multi-scale collaborative image enhancement method based on guided filtering.
Background Technology
[0002] Underwater imagery serves as a core sensing medium for marine exploration, resource development, and ecological monitoring, and is widely used in visual tasks such as target recognition and scene understanding. However, due to water scattering and selective absorption, underwater images generally suffer from severe blurring (caused by scattering from suspended particles) and color distortion (caused by preferential attenuation of longer-wavelength red light, which leaves a blue-green cast), which significantly restricts the performance of downstream tasks.
[0003] Existing underwater image enhancement methods are mainly divided into two categories: (1) Traditional methods: These include physical models (such as dark channel priors and Jaffe-McGlamery models) and non-physical models (such as histogram equalization and Retinex transform). The former relies on imaging priors and can achieve physically interpretable restoration under ideal conditions, but it is sensitive to environmental parameters, has weak generalization ability, and high computational complexity, making it difficult to meet real-time processing requirements. The latter is computationally efficient, but it lacks modeling of degradation mechanisms, which can easily exacerbate blurring while repairing color shift, or introduce artifacts while suppressing blurring, making it difficult to coordinate the optimization of multi-dimensional degradation factors.
[0004] (2) Deep learning methods: represented by Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN). CNN can effectively model local degradation patterns through end-to-end learning and performs well on specific datasets, but its performance is highly dependent on the distribution of training data. It is prone to overfitting when faced with varying lighting, turbidity and camera parameters in real scenes. Although GAN can generate visually realistic enhancement results, its training stability is poor and it is limited by the scarcity of high-quality real underwater image samples. Existing methods mostly rely on synthetic data (such as those generated through physical model simulation), which leads to a systematic deviation between the degradation priors learned by the model and the real degradation characteristics, thus causing artifact problems such as color distortion, texture duplication or structural distortion.
[0005] In summary, existing technologies have not yet effectively addressed the dual challenges of "multi-degradation coupling" and "real-to-synthetic domain offset," and in particular, they lack a robust enhancement mechanism that can both adaptively model local degradation differences and fuse multi-scale features to collaboratively repair blur and color shift, which restricts the reliable deployment of underwater vision systems in complex marine environments.
Summary of the Invention
[0006] To address the technical problems of existing underwater image enhancement methods, namely their inability to collaboratively repair blur caused by scattering and color shift caused by selective absorption in real-world complex scenes, and their low enhancement quality due to the domain offset between synthetic data and real degradation characteristics, this invention aims to provide a multi-color space, multi-scale collaborative image enhancement method based on guided filtering. The specific technical solution adopted is as follows:
One embodiment of the present invention provides a multi-color space, multi-scale collaborative image enhancement method based on guided filtering, the method comprising the following steps:
Acquire the underwater degraded image to be enhanced;
Decompose the underwater degraded image into a low-frequency base layer and a high-frequency detail layer, where the low-frequency base layer characterizes at least the main structural information of the image and the high-frequency detail layer characterizes at least the edge and texture information;
Input the low-frequency base layer, the high-frequency detail layer, and the underwater degraded image into the trained underwater image enhancement model, and output the enhanced underwater image.
The underwater image enhancement model includes a dual-path attention fusion mechanism and an adaptive feature weighted fusion mechanism.
[0007] Furthermore, the step of decomposing the underwater degraded image into a low-frequency base layer and a high-frequency detail layer in the Lab color space using guided filtering includes: converting the underwater degraded image to the Lab color space; and decomposing the L channel in the Lab space using guided filtering to obtain the low-frequency base layer and the high-frequency detail layer.
[0008] Furthermore, before inputting the low-frequency base layer, the high-frequency detail layer, and the underwater degraded image into the trained underwater image enhancement model, the following steps are also included: for the low-frequency base layer, contrast-limited adaptive histogram equalization is performed as a preprocessing step; for the high-frequency detail layer, non-local means denoising is used for preprocessing. The preprocessed low-frequency base layer, the preprocessed high-frequency detail layer, and the underwater degraded image are used as module feature inputs. The three are mapped to a unified feature dimension through an embedding layer to obtain three sets of initial feature maps. The residual fusion module is used to process the initial feature maps corresponding to the low-frequency base layer and the high-frequency detail layer in three paths to obtain the first feature maps corresponding to the low-frequency base layer and the high-frequency detail layer.
[0009] Furthermore, the dual-path attention fusion mechanism is used to construct a low-frequency-degradation path and a high-frequency-degradation path. In the low-frequency-degradation path, global average pooling is applied to the first feature map of the low-frequency base layer and to the feature map of the underwater degraded image, and attention weights are then generated sequentially through a 1×1 convolution, LeakyReLU activation, a 3×3 convolution, and ReLU activation:
$$A_L = \mathrm{ReLU}\big(\mathrm{Conv}_{3\times 3}\big(\mathrm{LeakyReLU}\big(\mathrm{Conv}_{1\times 1}\big(\mathrm{GAP}(F_L^{(1)})\big)\big)\big)\big)$$
$$A_D = \mathrm{ReLU}\big(\mathrm{Conv}_{3\times 3}\big(\mathrm{LeakyReLU}\big(\mathrm{Conv}_{1\times 1}\big(\mathrm{GAP}(F_D)\big)\big)\big)\big)$$
The second feature map $F_{LD}^{(2)}$ in the low-frequency-degradation path is then generated from $F_L^{(1)}$ and $F_D$ under the attention weights $A_L$ and $A_D$. In the formulas, $F_{LD}^{(2)}$ represents the second feature map in the low-frequency-degradation path; $F_L^{(1)}$ represents the first feature map; $A_L$ represents the attention weights of the first feature map; $A_D$ represents the attention weights for the underwater degraded image; $F_D$ represents the feature map obtained after the original underwater degraded image has undergone 3×3 convolution preprocessing. The high-frequency-degradation path adopts the same structure as the low-frequency path: the blurred regions in the underwater degraded image are located by the attention weights, and the high-frequency details of the high-frequency detail layer are guided to supplement the texture information of the blurred regions.
[0010] Furthermore, the adaptive feature weighted fusion mechanism is used to dynamically allocate fusion weights according to the degree of degradation in different regions of the underwater degraded image, including: for the near-field clear area, increasing the feature weights of the underwater degraded image to retain real details; for the far-field blurred area, increasing the feature weights of the low-frequency base layer and the high-frequency detail layer to achieve restoration. The adaptive feature weighted fusion mechanism first extracts a context-aware feature map through a convolution-normalization-activation link:
$$F_c = \mathrm{ReLU}\big(\mathrm{BN}\big(\mathrm{Conv}_{3\times 3}\big(\mathrm{Concat}(F_{DAF}, F_D)\big)\big)\big)$$
An adaptive weight map $W$ is then extracted separately and normalized by the Sigmoid function $\sigma$, and the third feature map $F^{(3)}$ is obtained by fusing the features under $W$ and the learnable, globally adaptive weight coefficients. In the formulas, $F_c$ represents the context-aware feature map extracted from the concatenated features; $\mathrm{ReLU}$ represents the rectified linear unit activation function; $\mathrm{BN}$ represents batch normalization; $\mathrm{Conv}_{3\times 3}$ represents a 3×3 convolution; $\mathrm{Concat}$ represents a channel-level concatenation operation; $F_{DAF}$ represents the second feature fusion map output by the dual-path attention fusion mechanism; $F_D$ represents the feature map obtained after the original underwater degraded image has undergone 3×3 convolution preprocessing; $W$ represents the separately extracted adaptive weight map; $\sigma$ represents the Sigmoid function; $F^{(3)}$ represents the third feature map output by the adaptive feature weighted fusion mechanism.
[0011] Furthermore, after obtaining the third feature map output by the adaptive feature weighting fusion mechanism, the following is also included: The third feature map is processed through multiple GFD-FCT modules to obtain an enhanced underwater image.
[0012] Furthermore, the underwater image enhancement model uses the Adam optimizer to optimize network training. The mathematical model of the Adam algorithm is expressed as follows:
$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$$
$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$
$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$
$$\theta_{t+1} = \theta_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
In the formulas, $m_t$ represents the first moment estimate of the gradient in the t-th iteration; $\beta_1$ represents the first-order moment decay rate used to control the moving average; $m_{t-1}$ represents the first moment estimate of the gradient in the (t-1)th iteration; $g_t$ represents the gradient of the loss function with respect to the parameter θ at the t-th iteration; $v_t$ represents the second moment estimate of the squared gradient in the t-th iteration; $\beta_2$ represents the second-order moment decay rate used to control the moving average; $v_{t-1}$ represents the second moment estimate of the squared gradient in the (t-1)th iteration; $\hat{m}_t$ represents the bias-corrected value of the first moment estimate $m_t$; $\beta_1^t$ represents the first-order decay rate raised to the power of t, used to eliminate estimation bias in the initial stage; $\hat{v}_t$ represents the bias-corrected value of the second moment estimate $v_t$; $\beta_2^t$ represents the second-order decay rate raised to the power of t; $\theta_{t+1}$ represents the model parameters in the (t+1)th iteration; $\theta_t$ represents the model parameters in the t-th iteration; $\eta$ represents the learning rate; $\epsilon$ represents a constant used to prevent division by zero.
[0013] The present invention has the following beneficial effects: This invention provides a multi-color space, multi-scale collaborative image enhancement method based on guided filtering. While comprehensively considering the two core problems commonly found in underwater images—color distortion and detail blurring—it also solves the problem of traditional methods struggling to balance color and detail. In particular, the underwater image enhancement model, which combines guided filtering decomposition with a multi-scale collaborative Transformer, exhibits strong generalization capabilities and can effectively enhance underwater images of different image qualities. Specifically, firstly, it addresses the core issues of color cast and blur in underwater images. Its ability to coordinate color and detail optimization provides high-quality image input for downstream visual tasks such as underwater target detection and biometrics, offering practical prospects for visual exploration in complex underwater scenes. Secondly, relying on the efficient hierarchical characteristics of guided filtering and the precise feature fusion capabilities of Transformer, it achieves both color cast correction and blur restoration while avoiding the high computational overhead of complex models. Compared to pure deep learning models, it is more adaptable to scenarios with limited resources or requiring efficient processing, such as real-time inspection of underwater robots and monitoring of portable underwater equipment, lowering the engineering threshold for underwater vision technology. Thirdly, it combines the advantages of strong physical interpretability of traditional guided filtering with the advantages of multi-feature adaptive fusion of Transformer. 
This not only compensates for the shortcomings of traditional methods, such as poor generalization and difficulty in addressing both color cast and blur, but also avoids the dependence of single deep learning models on massive real datasets, enhancing the robustness of underwater image enhancement. It can adapt to underwater scenes with different degrees of degradation and effectively improve image quality in various scenarios.
Attached Figure Description
[0014] To more clearly illustrate the technical solutions and advantages in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0015] Figure 1 is a flowchart of a multi-color space, multi-scale collaborative image enhancement method based on guided filtering according to the present invention; Figure 2 is a structural block diagram of the GFD-MCT model in an embodiment of the present invention; Figure 3 is a model architecture diagram of GFD-FCT, DAF, and AFWF in an embodiment of the present invention.
Detailed Implementation
[0016] To further illustrate the technical means and effects adopted by the present invention to achieve its intended purpose, the specific implementation methods, structures, features, and effects of the technical solution proposed according to the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.
[0017] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0018] One embodiment of the present invention provides a multi-color space, multi-scale collaborative image enhancement method based on guided filtering. As shown in Figure 1, it includes the following steps: S1, acquire the underwater degraded image to be enhanced.
[0019] In this embodiment, the underwater degraded image to be enhanced refers to the raw image data acquired by underwater imaging equipment (including but not limited to underwater cameras and optical systems mounted on ROVs / AUVs) in a real ocean or simulated pool environment.
[0020] Due to water scattering and selective absorption, the underwater degraded images to be enhanced exhibit varying degrees of blurring, low contrast, and blue-green shift, among other degradation characteristics. These images can be single-frame static images or keyframes in video sequences, and their formats include, but are not limited to, RGB three-channel BMP, JPEG, PNG, or RAW formats. Before being input into the enhancement model, preprocessing operations can be performed, such as demosaicing (applicable to Bayer array sensors), coarse white balance correction, or size normalization, without altering their original degradation characteristics.
[0021] It is worth noting that the underwater degraded images involved in this embodiment specifically refer to the original acquired images that have not undergone any depth enhancement processing, in order to ensure the objectivity and reproducibility of the subsequent enhancement process.
[0022] S2, decompose the underwater degraded image into a low-frequency base layer and a high-frequency detail layer.
[0023] As an exemplary implementation, underwater degraded images are decomposed into a low-frequency base layer and a high-frequency detail layer, including: The first step is to convert the underwater degraded image to the Lab color space.
[0024] First, acquire the raw underwater degradation image, which is usually in RGB three-channel format; Secondly, the standard CIE XYZ color space is used as an intermediate transition, and the color space conversion is completed through the following two-stage transformation: First, the RGB values are normalized to the 0-1 range, and then an inverse gamma correction transform is performed according to the sRGB standard to obtain linear RGB values. Next, the XYZ color matching function matrix under the D65 standard light source is used to convert the linear RGB values into CIE XYZ tristimulus values. Finally, according to the CIELAB color space definition, the XYZ values are converted into three components: L* (luminance), a* (red-green axis), and b* (yellow-blue axis). The L* channel represents the light and dark structure information of the image, while the a* and b* channels jointly represent the chromaticity information.
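The two-stage conversion described above can be sketched for a single pixel as follows. This is a minimal illustration rather than the embodiment's implementation: the function name `srgb_to_lab` is hypothetical, and the matrix coefficients and D65 white point are the commonly published sRGB values, which the source does not list explicitly.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB (assumed D65 white point)."""

    def to_linear(c):
        # Normalize to 0-1, then undo the sRGB gamma encoding.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear RGB -> CIE XYZ using the standard sRGB/D65 matrix.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        # CIELAB companding function with its linear segment near zero.
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    L = 116.0 * fy - 16.0      # lightness
    a = 500.0 * (fx - fy)      # red-green axis
    b_star = 200.0 * (fy - fz) # yellow-blue axis
    return L, a, b_star
```

In practice an array-based library routine would replace this per-pixel loop; the scalar form is used here only to make each stage visible.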
[0025] The embodiments of this application pay particular attention to the L* channel, as it is most sensitive to blurring and contrast degradation in underwater images, while the a* and b* channels retain the original color cast characteristics for subsequent color consistency constraints.
[0026] It should be noted that the above-mentioned transformation of underwater degraded images effectively ensures the decoupling of brightness and chromaticity information, providing a physically meaningful processing basis for subsequent low-frequency to high-frequency decomposition and multi-scale enhancement based on guided filtering.
[0027] The second step involves using guided filtering to decompose the L channel in the Lab space, resulting in a low-frequency base layer and a high-frequency detail layer.
[0028] In this embodiment, the original L channel is used as the guide image. The filter window radius r and the regularization parameter ϵ are set, and the smooth low-frequency component is estimated by the local linear model. This low-frequency component constitutes the low-frequency base layer, which characterizes the overall illumination distribution and main structure of the image. The low-frequency base layer is subtracted from the original L channel to obtain the high-frequency detail layer, which carries edge contours, texture details, and local contrast information. Finally, the low-frequency base layer and the high-frequency detail layer are combined with the a and b chromaticity channels in the Lab space to form a complete two-layer representation for subsequent enhancement module processing.
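A minimal NumPy sketch of this self-guided decomposition is given below, where the detail layer is the L channel minus the base layer. The function names `box_mean` and `guided_decompose` are hypothetical, and the default values of `r` and `eps` are assumptions, since the embodiment does not state them.

```python
import numpy as np


def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window, replicating edges at the borders."""
    padded = np.pad(img, r, mode="edge")
    # Integral image with a leading zero row and column.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    k = 2 * r + 1
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)


def guided_decompose(L, r=8, eps=1e-2):
    """Split the L channel into base and detail layers with a self-guided filter.

    r and eps are assumed values; the guide image is L itself.
    """
    L = L.astype(np.float64)
    mean_L = box_mean(L, r)
    var_L = box_mean(L * L, r) - mean_L ** 2
    # Local linear model q = a * L + b, fitted in each window.
    a = var_L / (var_L + eps)
    b = (1.0 - a) * mean_L
    base = box_mean(a, r) * L + box_mean(b, r)  # low-frequency base layer
    detail = L - base                           # high-frequency detail layer
    return base, detail
```

A larger `eps` smooths more strongly, which moves more of the edge energy into the detail layer; by construction, `base + detail` reproduces the input L channel.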
[0029] Furthermore, to improve the robustness and enhancement effect of subsequent feature extraction, this invention performs targeted preprocessing on the low-frequency base layer and high-frequency detail layer obtained from the decomposition: For the low-frequency base layer, contrast-limited adaptive histogram equalization (CLAHE) is performed to enhance the sense of hierarchy in its overall illumination distribution, while suppressing overly smoothed areas caused by low underwater light; the clipping limit parameter of CLAHE is set to 2.0, and the grid block size is set to 8×8 pixels. For the high-frequency detail layer, non-local means denoising is used to suppress residual noise introduced by the guided filter and preserve the real edge and texture structure; the denoising intensity parameter is set to 10, the search window size is 21×21 pixels, and the similarity window size is 7×7 pixels.
[0030] It should be noted that, while preserving the physical semantics of each layer, the preprocessing operation optimizes the global visibility of the low-frequency layer and the local fidelity of the high-frequency layer, providing high-quality input features for the subsequent dual-path attention fusion (DAF) and adaptive feature weighted fusion (AFWF) modules.
[0031] S3, input the low-frequency base layer, the high-frequency detail layer, and the underwater degraded image into the trained underwater image enhancement model, and output the enhanced underwater image.
[0032] Here, the underwater image enhancement model is essentially a multi-scale cooperative Transformer network (GFD-MCT) based on guided filtering, as shown in Figure 2.
[0033] As an exemplary implementation, the steps for acquiring enhanced underwater images may include: The first step involves using the preprocessed low-frequency base layer, high-frequency detail layer, and underwater degradation image as module feature inputs. The three are then mapped to a unified feature dimension through an embedding layer, resulting in three sets of initial feature maps.
[0034] In this embodiment, the preprocessed low-frequency base layer, the preprocessed high-frequency detail layer, and the underwater degraded image converted to the Lab color space are used together as multi-branch inputs. To achieve cross-modal feature alignment, a lightweight embedding layer is defined, which consists of three parallel 1×1 convolutions acting on the three inputs respectively. For the low-frequency base layer and the high-frequency detail layer (single-channel brightness maps), a 1×1 convolution with C output channels lifts them into a C-dimensional feature space. For the original underwater degraded image (a three-channel Lab image), either the L, a, and b components are first separated along the channel dimension and only the L channel is aligned with the low-frequency base layer and the high-frequency detail layer, or the image as a whole is mapped through a 1×1 convolution with C output channels, so that its feature dimension is consistent with the former two. Here C is a preset unified feature dimension, used to balance the expressive power of the model with its computational cost.
[0035] After processing by the embedding layer, three sets of initial feature maps with consistent dimensions are output, which respectively preserve the main structure, local texture and global degradation context information, providing semantically aligned multi-scale input for the subsequent dual-path attention fusion (DAF) mechanism.
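As a rough illustration of the embedding layer, a 1×1 convolution is a per-pixel linear map over channels, so the three parallel branches can be sketched in NumPy as below. The names `conv1x1` and `embed_inputs`, the value `C=32`, and the random stand-in weights are all assumptions; in the model these weights would be learned, and this sketch takes the whole-image variant for the Lab input.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution on a (C_in, H, W) array: a per-pixel linear map."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def embed_inputs(base, detail, image_lab, C=32, seed=0):
    """Map the three inputs to a shared C-channel feature space.

    base, detail: (H, W) single-channel maps; image_lab: (3, H, W).
    C=32 is a hypothetical choice; weights are random placeholders.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for x in (base[None], detail[None], image_lab):
        c_in = x.shape[0]
        weight = rng.standard_normal((C, c_in)) * 0.1  # placeholder for learned weights
        bias = np.zeros(C)
        outputs.append(conv1x1(x, weight, bias))
    return outputs  # three (C, H, W) initial feature maps
```

Each branch has its own weights, so the three inputs end up with the same shape without being forced to share a representation.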
[0036] The second step involves using the residual fusion module to process the initial feature maps corresponding to the low-frequency base layer and the high-frequency detail layer in three paths to obtain the first feature maps corresponding to the low-frequency base layer and the high-frequency detail layer.
[0037] The model architecture diagrams for a single GFD-FCT (Guided Filter-Driven Feature Co-operation Module), DAF (Dual Path Attention Fusion Mechanism), and AFWF (Adaptive Feature Weighted Fusion Mechanism) are shown in Figure 3.
[0038] As an exemplary implementation, the expression for the first feature map corresponding to the low-frequency base layer is defined over the following terms: $F_L^{(1)}$ represents the first feature map corresponding to the low-frequency base layer obtained after processing by the residual fusion module; $\mathrm{Conv}$ represents a convolution operation; $\mathrm{Concat}$ represents a channel-level concatenation operation; $\mathrm{ReLU}$ represents the rectified linear unit activation function; $F_L^{(0)}$ represents the initial feature map corresponding to the low-frequency base layer; $\mathrm{GF}_1$ represents a guided filter with a filter window radius of 3, and $F_{L,1}$ represents the feature map of the first channel dimension corresponding to the low-frequency base layer; $\mathrm{GF}_2$ represents a guided filter with a filter window radius of 3, and $F_{L,2}$ represents the feature map of the second channel dimension corresponding to the low-frequency base layer; $\mathrm{GF}_3$ represents a guided filter with a filter window radius of 21, and $F_{L,3}$ represents the feature map of the third channel dimension corresponding to the low-frequency base layer.
[0039] As an exemplary implementation, the expression for the first feature map corresponding to the high-frequency detail layer is defined over the following terms: $F_H^{(1)}$ represents the first feature map corresponding to the high-frequency detail layer obtained after processing by the residual fusion module; the two $\mathrm{Conv}$ terms represent convolution operations; $\mathrm{ReLU}$ represents the rectified linear unit activation function; $F_H^{(0)}$ represents the initial feature map corresponding to the high-frequency detail layer.
[0040] The third step involves using the dual-path attention fusion mechanism to construct two attention paths, a low-frequency-degradation path and a high-frequency-degradation path, which fuse the first feature maps corresponding to the low-frequency base layer and the high-frequency detail layer with the underwater degraded image to determine the second feature fusion map.
[0041] In the low-frequency-degradation path, attention weights are used to associate the main structure with local textures, ensuring spatial alignment between texture details and structural contours, thus obtaining a newly generated second feature map.
[0042] In this embodiment, global average pooling is applied to the first feature map of the low-frequency base layer and to the feature map of the underwater degraded image, and attention weights are then generated sequentially through a 1×1 convolution, LeakyReLU activation, a 3×3 convolution, and ReLU activation:
$$A_L = \mathrm{ReLU}\big(\mathrm{Conv}_{3\times 3}\big(\mathrm{LeakyReLU}\big(\mathrm{Conv}_{1\times 1}\big(\mathrm{GAP}(F_L^{(1)})\big)\big)\big)\big)$$
$$A_D = \mathrm{ReLU}\big(\mathrm{Conv}_{3\times 3}\big(\mathrm{LeakyReLU}\big(\mathrm{Conv}_{1\times 1}\big(\mathrm{GAP}(F_D)\big)\big)\big)\big)$$
The second feature map $F_{LD}^{(2)}$ in the low-frequency-degradation path is then generated from $F_L^{(1)}$ and $F_D$ under the attention weights $A_L$ and $A_D$. In the formulas, $A_L$ represents the attention weights of the first feature map; $\mathrm{Conv}_{3\times 3}$ and $\mathrm{Conv}_{1\times 1}$ represent convolution operations; $\mathrm{ReLU}$ represents the rectified linear unit activation function; $\mathrm{LeakyReLU}$ represents the leaky rectified linear unit; $\mathrm{GAP}$ represents global average pooling; $F_L^{(1)}$ represents the first feature map corresponding to the low-frequency base layer obtained after processing by the residual fusion module; $A_D$ represents the attention weights for the underwater degraded image; $F_D$ represents the feature map obtained after the original underwater degraded image has undergone 3×3 convolution preprocessing; $F_{LD}^{(2)}$ represents the second feature map in the low-frequency-degradation path.
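The attention chain in this paragraph (global average pooling, 1×1 convolution, LeakyReLU, 3×3 convolution, ReLU) can be sketched in NumPy as below. Two points are assumptions rather than statements from the source: the final combination is taken to be an attention-weighted sum of the two feature maps, and the function names and parameter layout are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.01):
    """LeakyReLU; the slope value is an assumed default."""
    return np.where(x > 0, x, slope * x)


def attention_weights(feat, w1, w3):
    """Per-channel attention weights for a (C, H, W) feature map.

    w1: (C, C) kernel of the 1x1 convolution.
    w3: (C, C, 3, 3) kernel of the 3x3 convolution.
    """
    pooled = feat.mean(axis=(1, 2))   # global average pooling -> (C,)
    hidden = leaky_relu(w1 @ pooled)  # 1x1 convolution on a 1x1 map
    # A zero-padded 3x3 convolution on a 1x1 map only sees its center tap.
    out = w3[:, :, 1, 1] @ hidden
    return np.maximum(out, 0.0)       # ReLU -> (C,)


def daf_low_path(f_low, f_img, params):
    """Second feature map of the low-frequency-degradation path.

    ASSUMPTION: the two maps are combined as an attention-weighted sum;
    the source does not preserve the exact combination formula.
    """
    a_low = attention_weights(f_low, *params["low"])
    a_img = attention_weights(f_img, *params["img"])
    return a_low[:, None, None] * f_low + a_img[:, None, None] * f_img
```

Because the pooling collapses the spatial dimensions, these weights act per channel; any spatially varying weighting would have to come from a later stage.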
[0043] In the high-frequency degradation map path, the same structure as the low-frequency path is adopted. The blurred regions in the underwater degradation image are located by attention weights, and the high-frequency details of the high-frequency detail layer are guided to supplement the texture information of the blurred regions.
[0044] In this embodiment, attention weights are used to locate blurred regions in underwater degraded images, and high-frequency details are guided to supplement texture information to the blurred regions. The feature extraction method in this path is the same as that in the low-frequency-degraded image path, and the second feature map in the high-frequency-degraded image path can be obtained.
[0045] After obtaining the second feature maps from the low-frequency-degradation path and the high-frequency-degradation path, the two are fused by element-wise addition, that is, by summing the values at the same spatial location and channel index of two feature maps with the same dimensions. A 3×3 convolution is then applied to the fused feature map, and the result is used as the second feature fusion map.
[0046] The fourth step involves using the second feature fusion map output by the dual-path attention fusion mechanism, and then using an adaptive feature weighting fusion mechanism to dynamically allocate feature fusion weights according to the degree of degradation in different regions of the underwater degraded image, to obtain the third feature map.
[0047] When determining the fusion weights for different regions, for the near-field cleaned areas, the feature weights of the underwater degraded image are increased to preserve real details; for the far-field blurred areas, the feature weights of the low-frequency base layer and the high-frequency detail layer are increased to achieve restoration.
[0048] As an example, the adaptive feature weighted fusion mechanism first extracts a context-aware feature map through a convolution-normalization-activation link:
$$F_c = \mathrm{ReLU}\big(\mathrm{BN}\big(\mathrm{Conv}_{3\times 3}\big(\mathrm{Concat}(F_{DAF}, F_D)\big)\big)\big)$$
An adaptive weight map $W$ is then extracted separately and normalized by the Sigmoid function $\sigma$, and the third feature map $F^{(3)}$ is obtained by fusing the features under $W$ and the learnable, globally adaptive weight coefficients. In the formulas, $F_c$ represents the context-aware feature map extracted from the concatenated features; $\mathrm{ReLU}$ represents the rectified linear unit activation function; $\mathrm{BN}$ represents batch normalization; $\mathrm{Conv}_{3\times 3}$ represents a 3×3 convolution; $\mathrm{Concat}$ represents a channel-level concatenation operation; $F_{DAF}$ represents the second feature fusion map output by the dual-path attention fusion mechanism; $F_D$ represents the feature map obtained after the original underwater degraded image has undergone 3×3 convolution preprocessing; $W$ represents the separately extracted adaptive weight map; $\sigma$ represents the Sigmoid function; $F^{(3)}$ represents the third feature map output by the adaptive feature weighted fusion mechanism.
[0049] In the fifth step, the third feature map is processed through multiple GFD-FCT modules to obtain the enhanced underwater image.
[0050] In this embodiment, the number of GFD-FCT modules can be set to 9. A convolution operation is performed on the third feature map after it has passed through the GFD-FCT modules, and the output of this final convolution is used as the enhanced underwater image, as shown in Figure 2.
[0051] For the underwater image enhancement model, the loss function employs mean squared error and perceptual loss. This loss function avoids over-amplifying the influence of outliers, making the model more robust and effective. Simultaneously, the network is trained using a terrestrial white balance dataset, and the Adam optimizer is used to accelerate network training. The detailed training process of the underwater image enhancement model is existing technology and is not within the scope of this invention; therefore, it will not be elaborated here.
[0052] The mathematical model of the Adam algorithm can be expressed as:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2} \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{t}} \\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^{t}} \\
\theta_{t+1} &= \theta_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

In the formula, $m_t$ represents the first moment estimate of the gradient at the t-th iteration; $\beta_1$ represents the first-moment decay rate used to control the moving average; $m_{t-1}$ represents the first moment estimate of the gradient at the (t-1)th iteration; $g_t$ represents the gradient of the loss function with respect to the parameter θ at the t-th iteration; $v_t$ represents the second moment estimate of the squared gradient at the t-th iteration; $\beta_2$ represents the second-moment decay rate used to control the moving average; $v_{t-1}$ represents the second moment estimate of the squared gradient at the (t-1)th iteration; $\hat{m}_t$ represents the bias-corrected value of the first moment estimate $m_t$; $\beta_1^{t}$ represents the first-moment decay rate raised to the power of t, used to eliminate estimation bias in the initial stage; $\hat{v}_t$ represents the bias-corrected value of the second moment estimate $v_t$; $\beta_2^{t}$ represents the second-moment decay rate raised to the power of t; $\theta_{t+1}$ represents the model parameters at the (t+1)th iteration; $\theta_t$ represents the model parameters at the t-th iteration; $\eta$ represents the learning rate; and $\epsilon$ represents a constant used to prevent division by zero.
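The update described above can be sketched in plain Python as follows. This is a minimal illustration, not the training code of the embodiment: the function name `adam_step` and the default values for the learning rate, decay rates, and stabilizing constant are assumptions (they are commonly used defaults, but the text does not state the values used).

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on lists of parameters; t is the 1-based iteration index."""
    # Moving averages of the gradient and of the squared gradient.
    m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
    # Parameter update; eps prevents division by zero.
    theta = [th - lr * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = [1.0], [0.0], [0.0]
for t in range(1, 2001):
    grad = [2.0 * th for th in theta]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
```

On the first iteration the bias-corrected ratio has magnitude close to 1, so the parameter moves by roughly one learning rate regardless of the gradient scale, which is the effect of the bias correction in the initial stage.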
[0053] Thus, this invention presents a multi-color space, multi-scale collaborative image enhancement method based on guided filtering. This method comprehensively considers the two core problems commonly found in underwater images: color distortion and detail blurring, while also solving the problem of traditional methods struggling to balance color and detail. In particular, the underwater image enhancement model, which combines guided filtering decomposition with a multi-scale collaborative Transformer, exhibits strong generalization capabilities and can effectively enhance underwater images of different image qualities.
[0054] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims
1. A multi-color space, multi-scale collaborative image enhancement method based on guided filtering, characterized in that it includes the following steps: acquiring an underwater degraded image to be enhanced; decomposing the underwater degraded image into a low-frequency base layer and a high-frequency detail layer, wherein the low-frequency base layer is used at least to characterize the main structural information of the image, and the high-frequency detail layer is used at least to characterize edge and texture information; and inputting the low-frequency base layer, the high-frequency detail layer, and the underwater degraded image into a trained underwater image enhancement model, which outputs the enhanced underwater image, wherein the underwater image enhancement model includes a dual-path attention fusion mechanism and an adaptive feature weighted fusion mechanism.
2. The multi-color space, multi-scale collaborative image enhancement method based on guided filtering according to claim 1, characterized in that the underwater degraded image is decomposed into the low-frequency base layer and the high-frequency detail layer in the Lab color space using guided filtering, including: converting the underwater degraded image to the Lab color space; and decomposing the L channel of the Lab color space using guided filtering to obtain the low-frequency base layer and the high-frequency detail layer.
3. The multi-color space, multi-scale collaborative image enhancement method based on guided filtering according to claim 1, characterized in that, before the low-frequency base layer, the high-frequency detail layer, and the underwater degraded image are input into the trained underwater image enhancement model, the method further includes: preprocessing the low-frequency base layer with contrast-limited adaptive histogram equalization; preprocessing the high-frequency detail layer with non-local means denoising; using the preprocessed low-frequency base layer, high-frequency detail layer, and underwater degraded image as module feature inputs, and mapping the three to a unified feature dimension through an embedding layer to obtain three sets of initial feature maps; and processing, in three paths with a residual fusion module, the initial feature maps corresponding to the low-frequency base layer and the high-frequency detail layer to obtain the first feature maps corresponding to the low-frequency base layer and the high-frequency detail layer.
4. The multi-color space, multi-scale collaborative image enhancement method based on guided filtering according to claim 3, characterized in that the dual-path attention fusion mechanism is used to form a low-frequency degradation path and a high-frequency degradation path; in the low-frequency degradation path, global average pooling is performed on the first feature map of the low-frequency base layer and on the underwater degraded image, attention weights are then generated sequentially through 1×1 convolution, LeakyReLU activation, 3×3 convolution, and ReLU activation, and the second feature map is generated according to a formula in which the terms denote, in order: the second feature map in the low-frequency degradation path; the first feature map; the attention weights of the first feature map; the attention weights of the underwater degraded image; and the feature map obtained by applying 3×3 convolution preprocessing to the original underwater degraded image; in the high-frequency degradation path, the same structure as the low-frequency degradation path is adopted, the blurred regions in the underwater degraded image are located by the attention weights, and the high-frequency details of the high-frequency detail layer are guided to supplement the texture information of the blurred regions.
5. The multi-color space, multi-scale collaborative image enhancement method based on guided filtering according to claim 4, characterized in that the adaptive feature weighted fusion mechanism is used to dynamically allocate fusion weights according to the degree of degradation in different regions of the underwater degraded image, including: for clear near-field areas, increasing the feature weights of the underwater degraded image to preserve real details; and for blurred far-field areas, increasing the feature weights of the low-frequency base layer and the high-frequency detail layer to achieve restoration; the adaptive feature weighted fusion mechanism is implemented using a formula in which the terms denote, in order: a context-aware feature map extracted from the concatenated features through a convolution, normalization, and activation chain; the rectified linear unit (ReLU) activation function; batch normalization; a 3×3 convolution; a channel-wise concatenation operation; the second feature fusion map output by the dual-path attention fusion mechanism; the feature map obtained by applying 3×3 convolution preprocessing to the original underwater degraded image; the separately extracted adaptive weight map; the Sigmoid function; the third feature map output by the adaptive feature weighted fusion mechanism; and the learnable, globally adaptive weight coefficient.
6. The multi-color space, multi-scale collaborative image enhancement method based on guided filtering according to claim 5, characterized in that, after the third feature map output by the adaptive feature weighted fusion mechanism is obtained, the method further includes: processing the third feature map through multiple GFD-FCT modules to obtain the enhanced underwater image.
7. The multi-color space, multi-scale collaborative image enhancement method based on guided filtering according to claim 1, characterized in that the underwater image enhancement model uses the Adam optimizer to optimize network training, and the mathematical model of the Adam algorithm is expressed as follows:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2} \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{t}} \\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^{t}} \\
\theta_{t+1} &= \theta_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

In the formula, $m_t$ represents the first moment estimate of the gradient at the t-th iteration; $\beta_1$ represents the first-moment decay rate used to control the moving average; $m_{t-1}$ represents the first moment estimate of the gradient at the (t-1)th iteration; $g_t$ represents the gradient of the loss function with respect to the parameter θ at the t-th iteration; $v_t$ represents the second moment estimate of the squared gradient at the t-th iteration; $\beta_2$ represents the second-moment decay rate used to control the moving average; $v_{t-1}$ represents the second moment estimate of the squared gradient at the (t-1)th iteration; $\hat{m}_t$ represents the bias-corrected value of the first moment estimate $m_t$; $\beta_1^{t}$ represents the first-moment decay rate raised to the power of t, used to eliminate estimation bias in the initial stage; $\hat{v}_t$ represents the bias-corrected value of the second moment estimate $v_t$; $\beta_2^{t}$ represents the second-moment decay rate raised to the power of t; $\theta_{t+1}$ represents the model parameters at the (t+1)th iteration; $\theta_t$ represents the model parameters at the t-th iteration; $\eta$ represents the learning rate; and $\epsilon$ represents a constant used to prevent division by zero.