A low-light image enhancement method based on a divide-and-conquer strategy optimized Retinex network
By optimizing the Retinex network using a divide-and-conquer strategy, and utilizing edge gradient consistency loss, multi-scale independent decoupled attention, and adaptive weighted frequency domain perception loss, the problems of insufficient decoupling between illumination and reflection components and weak noise and artifact suppression capabilities in low-light image enhancement are solved, achieving higher quality image enhancement results.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: TIANJIN UNIVERSITY OF TECHNOLOGY
- Filing Date: 2026-04-01
- Publication Date: 2026-06-26
AI Technical Summary
Existing Retinex networks suffer from insufficient decoupling between illumination and reflection components, weak noise and artifact suppression, and low detail and color fidelity in low-light image enhancement, making it difficult to achieve effective enhancement in complex low-light scenes.
A divide-and-conquer strategy is adopted to optimize the Retinex network. By designing an edge gradient consistency loss function, a multi-scale independent decoupled attention module, and an adaptive weighted frequency domain perception loss function, a collaborative optimization enhancement framework is constructed, including image decomposition, reflectivity restoration, and illumination correction networks, which are used to process details and noise respectively.
It significantly improves the structural fidelity, detail clarity, and visual naturalness of low-light images, enabling more refined lighting adjustments and detail preservation, and enhancing the robustness and generalization ability of the enhancement effect.
Figure CN122289094A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of computer vision and digital image processing technology, specifically to a low-light image enhancement algorithm based on a divide-and-conquer strategy to optimize Retinex networks.
Background Technology
[0002] Images captured under low-light conditions often suffer from low visibility, poor contrast, significant noise, and color distortion, severely impacting their subsequent analysis and application in critical fields such as autonomous driving, security monitoring, medical imaging, and mobile photography. Therefore, low-light image enhancement technology has always been an important research direction in computer vision and image processing. Traditional enhancement methods (such as histogram equalization and gamma correction) typically adjust based on the overall statistical characteristics of the image, making it difficult to effectively suppress noise and maintain local details and natural colors while increasing brightness, often leading to overexposure or underexposure. Methods based on Retinex theory simulate human eye perception by decomposing the image into illumination and reflection components for separate processing, theoretically offering better image restoration. However, traditional implementations often rely on manually designed priors, have limited generalization ability in complex scenes, and are computationally complex.
[0003] In recent years, deep learning techniques, especially Retinex decomposition models based on convolutional neural networks, have significantly improved enhancement effects by learning from large amounts of data, enabling more robust estimation of illumination and reflection components. However, existing deep learning-based Retinex network models still face challenges: on the one hand, networks typically employ an end-to-end approach to jointly optimize the estimation of illumination and reflection components, which can easily lead to insufficient decoupling of the two components, resulting in insufficient enhancement in extremely dark areas or amplification of artifacts in noisy regions; on the other hand, most existing models adopt a uniform processing strategy, making it difficult to adaptively handle regions with uneven illumination distribution and varying levels of detail in images, and while enhancing overall brightness, they are prone to losing texture details or causing local color distortion. Although some research has attempted to improve the situation through multi-scale networks and attention mechanisms, there are still significant shortcomings in achieving more refined and physically-aware synergistic optimization of illumination adjustment and detail preservation. This limits the performance ceiling and practical application effectiveness of existing algorithms in complex low-light scenes.
Summary of the Invention
[0004] To overcome the shortcomings of existing technologies, this invention provides a low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks, which addresses the insufficient decoupling of illumination and reflection components, the weak noise and artifact suppression, and the low detail and color fidelity of existing Retinex networks for low-light image enhancement. Guided by the divide-and-conquer principle, the method constructs a collaborative optimization framework. Specifically, it designs a novel edge gradient consistency loss function to constrain the structural consistency among the components after image decomposition; proposes a multi-scale independent decoupled attention module to achieve differentiated focusing and fusion of features at different scales; and introduces a novel adaptive weighted frequency domain perceptual loss function, which uses adaptive weights to constrain high-frequency and low-frequency components differently, thereby balancing noise suppression and structure preservation in the frequency domain. By establishing multi-level, clearly defined optimization objectives in the spatial and frequency domains, the method systematically improves the structural fidelity, detail clarity, and visual naturalness of the enhancement results.
[0005] The technical problem addressed by this invention is solved by the following technical solution: a low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks, comprising the following steps:
S1: Image decomposition. Construct an image decomposition network that decomposes the input image into an illumination component and a reflectance component, and constrain the decomposition process with an improved mutual consistency loss function.
S2: Reflectance restoration. Construct a reflectance restoration network that takes the reflectance component obtained from image decomposition as input and performs denoising and detail reconstruction. The network is based on U-Net with an embedded multi-scale independent decoupled attention module; by decoupling and adaptively fusing multi-scale features, it produces a reflectance component with clear details and significantly suppressed noise.
S3: Illumination correction. Construct an illumination correction network to correct the illumination component. The network takes the channel-wise concatenation of the reflectance and illumination components as input and adopts a multi-scale perception architecture, in which parallel convolutional branches with different dilation rates extract detail-level, regional, and global illumination features. Operating on this fused input, the architecture jointly optimizes illumination and scene content to obtain the corrected illumination component.
S4: Model training. Adopt a phased joint training strategy: first train the image decomposition network, then optimize the reflectance restoration and illumination correction networks in sequence. Training uses the publicly available supervised dataset LOL, with the normally exposed image paired with each low-light image as the training target, and is supervised by the multiple loss functions so that the modules work together and finally converge to a stable enhancement model.
S5: Image reconstruction. Multiply the restored reflectance component and the corrected illumination component element by element to reconstruct the final enhanced image.
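To make the S1 to S5 data flow concrete, the following NumPy sketch chains simple stand-ins for the three networks. The function names (`decompose`, `restore_reflectance`, `correct_illumination`, `enhance`) are hypothetical, and the max-channel illumination estimate and gamma curve are assumptions substituted for the trained models. Only the wiring (decompose, restore R, correct L from the concatenation of R and L, multiply) follows the text.

```python
import numpy as np

# Hypothetical stand-ins for ID-Net, RR-Net and IC-Net; they reproduce only the
# S1-S5 data flow, not the patented networks.

def decompose(img):
    """S1 stand-in: split an RGB image (H, W, 3) into reflectance R and illumination L.
    Assumption: L is the per-pixel channel maximum, a classic Retinex baseline."""
    illum = img.max(axis=2, keepdims=True)            # (H, W, 1)
    refl = img / np.maximum(illum, 1e-4)              # (H, W, 3), values in [0, 1]
    return refl, illum

def restore_reflectance(refl):
    """S2 stand-in: a trained RR-Net would denoise; here we only clip to [0, 1]."""
    return np.clip(refl, 0.0, 1.0)

def correct_illumination(refl, illum):
    """S3 stand-in: IC-Net receives concat(R, L) along channels; this placeholder
    applies an assumed gamma curve to the L channel and ignores R."""
    x = np.concatenate([refl, illum], axis=2)         # (H, W, 4): the IC-Net input layout
    return np.power(x[..., 3:], 0.45)

def enhance(img):
    refl, illum = decompose(img)                      # S1
    refl_hat = restore_reflectance(refl)              # S2
    illum_hat = correct_illumination(refl_hat, illum) # S3
    return refl_hat * illum_hat                       # S5: element-wise product (L broadcasts over RGB)

rng = np.random.default_rng(0)
low = rng.uniform(0.0, 0.2, size=(8, 8, 3))
print(enhance(low).shape)
```

Because the reconstruction is a pure element-wise product, any illumination map with values above the input's brightens the result while the reflectance carries color and texture unchanged.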
[0006] Furthermore, in S1, an image decomposition network is constructed. Its core structure consists of multiple convolutional layers and residual modules: the backbone is composed of six stacked convolutional layers, each using a 3×3 kernel with a fixed output channel count of 64, and all configured with the same padding and stride. A multi-path residual module (MRM) is introduced into the image decomposition network so that the network can process feature information in parallel along multiple paths. A comprehensive illumination decomposition loss function is constructed from four loss terms: the reconstruction error loss, the reflectance consistency loss, the illumination smoothing loss, and the edge gradient mutual consistency loss. The reconstruction error loss is expressed in terms of the low-light image, the reflectance and illumination components decomposed from the low-light image, the normal-light image, and the reflectance and illumination components decomposed from the normal-light image.
[0007] The reflectance consistency loss is expressed in terms of the reflectance component decomposed from the low-light image and the reflectance component decomposed from the normal-light image.
[0008] The illumination smoothing loss is expressed in terms of the spatial gradients of the illumination components decomposed from the low-light image and the normal-light image, together with the spatial gradients of the corresponding reflectance components, and uses element-wise exponentiation.
[0009] The edge gradient consistency loss is governed by two function shape parameters: one adjusts the width of the penalty peak and the other controls the position of the peak. It operates on the sum of the absolute values of the spatial gradients of the illumination components decomposed from the low-light image and the normal-light image.
[0010] Based on these four terms, the total loss of the image decomposition network is constructed.
Furthermore, in S2, the reflectance recovery network is built on the classic U-Net symmetric encoder-decoder structure, using skip connections to fuse multi-scale features and preserve texture and edges. All max-pooling downsampling operations in the encoder are replaced with strided convolutions, which downsample through a sliding window. A multi-scale independent decoupled attention module is placed between each skip connection and the bottleneck layer, with the following functions: (1) Multi-scale feature extraction: a feature pyramid of the image is constructed with three parallel branches using 1×1, 3×3, and 5×5 convolution kernels respectively, forming receptive fields from fine to coarse that jointly capture complementary local pixel information, regional structure information, and whole-image context information. (2) Feature decorrelation: difference operations are computed between the global features and the medium-scale features, and between the global features and the point features, to suppress the medium-scale and local detail information they contain and obtain purer global context information; a difference operation between the medium-scale features and the point features removes local details and strengthens the representation of medium-scale structural features. (3) Attention-weighted fusion: each decorrelated multi-scale feature branch is fed into a channel attention module that learns channel weights through global average pooling, dynamically strengthening the important features of that branch. All weighted branch features are then concatenated along the channel dimension with the original input features provided by the skip connection to achieve optimized multi-scale feature fusion.
Based on the constructed reflectance recovery network, a comprehensive loss function is constructed to optimize it jointly. This loss function includes three core parts: (1) Reflectance consistency loss: this loss directly constrains the recovered reflectance component to stay consistent with the reference reflectance component in both pixel values and gradient space, so that the recovered result closely matches the real reference in numerical values and edge details. It is expressed in terms of the recovered reflectance component and its spatial gradient, and the reflectance component decomposed from the normal-light image and its spatial gradient. (2) Structural similarity loss: to improve the perceptual quality of the output, a loss term based on the structural similarity index (SSIM) is introduced, computed between the recovered reflectance component and the reference reflectance component defined above. (3) Adaptive weighted frequency domain perceptual loss: a new frequency domain perceptual loss combines three sub-losses to impose multi-dimensional constraints on the reflectance component. The phase loss constrains the frequency-domain phase information, ensuring the accuracy of the recovered reflectance map at edges and details; it is averaged over all pixels of the image and compares, through the Fourier transform operator, the phase angles of the complex frequency-domain coefficients of the recovered and reference reflectance components. The gradient loss measures the difference between the recovered and reference reflectance components in the gradient domain, averaged over all pixels of the image, with the symbols defined as above. The amplitude loss applies differentiated weights to amplitude information at different frequencies to coordinate the recovery of global information and local details. Two frequency-adaptive weighting functions, both constructed from the Euclidean distance between a frequency point and the center of the spectrum, are introduced into the amplitude loss: the first mainly suppresses high-frequency noise, while the second attenuates the low-frequency components twice so that the main structure of the image is well preserved. The amplitude, phase, and gradient losses together constitute the adaptive weighted frequency domain perceptual loss, and the three parts above determine the total loss of the reflectance recovery network.
[0011] Furthermore, in S3, the illumination correction network IC-Net is constructed. It takes as input the channel-wise concatenation of the reflectance and illumination components obtained in the previous steps. At its core, parallel dilated-convolution branches with different dilation rates form a multi-scale feature pyramid: the detail branch (dilation rate 1) captures local texture and edge illumination; the region branch (dilation rate 2) extracts the illumination distribution over a medium range; and the global branch (dilation rate 4) captures the overall brightness distribution of the image through an enlarged receptive field. The multi-scale features are concatenated and fused, and subsequent convolutional layers reconstruct the corrected illumination component. A corresponding loss function is constructed with the following three parts: (1) Reconstruction loss: this term constrains the corrected illumination component and the recovered reflectance component to reconstruct the normal-light image accurately, which is the basis for the visual realism of the enhancement result. (2) Illumination smoothing loss: a constraint on the gradient of the illumination component, expressed in terms of the spatial gradients of the corrected illumination component and the restored reflectance component. (3) Frequency domain perceptual loss: the amplitude and phase information of the illumination map are jointly optimized from the frequency-domain perspective. The amplitude and phase terms are averaged over all pixels of the image and compare, through the Fourier transform operator, the corrected illumination component with the illumination component decomposed from the normal-light image. These three parts constitute the total loss of the illumination correction network.
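The three-branch dilated pyramid of IC-Net can be sketched in NumPy as below. This is a hedged illustration, not the patented network. It assumes 3×3 branch kernels, ReLU activations, a single 1×1 fusion layer and a sigmoid output. The helper names (`conv2d_same`, `dilate_kernel`, `init_ic_params`, `ic_net`) and the random weights are hypothetical. A dilated convolution is implemented by spreading the kernel taps apart with zeros, which is equivalent to the dilation rates 1, 2 and 4 named in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x, w):
    """'Same'-padded 2-D cross-correlation. x: (C_in, H, W); w: (C_out, C_in, k, k), k odd."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    win = sliding_window_view(xp, (k, k), axis=(1, 2))   # (C_in, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", win, w)

def dilate_kernel(w, d):
    """Insert d-1 zeros between kernel taps: a 3x3 kernel with d=4 spans 9x9 pixels."""
    o, c, k, _ = w.shape
    size = d * (k - 1) + 1
    wd = np.zeros((o, c, size, size))
    wd[:, :, ::d, ::d] = w
    return wd

def init_ic_params(c_in=4, c_mid=8, seed=0):
    """Hypothetical random weights: one 3x3 kernel bank per branch plus a 1x1 fusion head."""
    rng = np.random.default_rng(seed)
    p = {f"b{d}": rng.normal(0, 0.1, (c_mid, c_in, 3, 3)) for d in (1, 2, 4)}
    p["head"] = rng.normal(0, 0.1, (1, 3 * c_mid, 1, 1))
    return p

def ic_net(refl, illum, params):
    """Hypothetical IC-Net forward pass. refl: (3, H, W); illum: (1, H, W)."""
    x = np.concatenate([refl, illum], axis=0)            # channel-wise concat -> (4, H, W)
    feats = [np.maximum(conv2d_same(x, dilate_kernel(params[f"b{d}"], d)), 0.0)
             for d in (1, 2, 4)]                         # detail / region / global branches + ReLU
    fused = np.concatenate(feats, axis=0)                # multi-scale pyramid, (3*c_mid, H, W)
    out = conv2d_same(fused, params["head"])             # 1x1 fusion -> 1 illumination channel
    return 1.0 / (1.0 + np.exp(-out))                    # sigmoid keeps L in (0, 1) (assumption)
```

With dilation 4, each 3×3 branch already sees a 9×9 neighbourhood at no extra parameter cost. That matches the text's motivation for using dilation rather than larger kernels in the global branch.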
[0012] Furthermore, in S4, network training uses a phased strategy that sequentially trains the image decomposition network, the reflectance restoration network, and the illumination correction network. The Adam optimizer is used throughout training, with hyperparameters β₁ = 0.9 and β₂ = 0.999. The image decomposition network and the illumination correction network share the same training parameters: a batch size of 10, an input image patch size of 48×48 pixels, a learning rate fixed at 1×10⁻⁴, and 2400 training epochs. The reflectance restoration network is configured differently: a batch size of 4, an input image patch size of 128×128 pixels, 2400 training epochs, and a phased learning-rate decay from an initial value of 1×10⁻⁴ down to 1×10⁻⁵. These settings ensure the stability of training and the effective convergence of the model.
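The reported training settings can be collected into a small configuration sketch. The numbers come from the paragraph above. The `StageConfig` field names are hypothetical, and so is the four-phase geometric decay in `lr_at`, since the text only says the rate decays in phases from 1×10⁻⁴ to 1×10⁻⁵ without giving the phase boundaries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageConfig:
    name: str
    batch_size: int
    patch_size: int      # square input patches, in pixels
    epochs: int
    lr_start: float
    lr_end: float

ADAM_BETAS = (0.9, 0.999)   # beta1, beta2, shared by all stages

# Stages are trained in this order: decomposition, reflectance restoration, illumination correction.
STAGES = [
    StageConfig("ID-Net", 10, 48, 2400, 1e-4, 1e-4),
    StageConfig("RR-Net", 4, 128, 2400, 1e-4, 1e-5),
    StageConfig("IC-Net", 10, 48, 2400, 1e-4, 1e-4),
]

def lr_at(cfg, epoch, n_phases=4):
    """Hypothetical phased decay: equal-length phases, geometric steps from lr_start to lr_end."""
    if cfg.lr_start == cfg.lr_end:
        return cfg.lr_start                      # fixed learning rate (ID-Net, IC-Net)
    phase = min(epoch * n_phases // cfg.epochs, n_phases - 1)
    ratio = cfg.lr_end / cfg.lr_start
    return cfg.lr_start * ratio ** (phase / (n_phases - 1))

for e in (0, 600, 1200, 1800, 2399):
    print(e, lr_at(STAGES[1], e))
```

A geometric schedule is one plausible way to move between the two endpoints. A step-down at fixed epochs or a cosine schedule would match the same description equally well.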
[0013] Furthermore, in S5, image reconstruction: under the staged joint training strategy, the image decomposition, reflectance restoration, and illumination correction networks each generate high-quality components that conform to physical priors and visual perception. Following the Retinex imaging model, the reflectance and illumination components output by the fully trained reflectance restoration and illumination correction networks are multiplied element-wise to reconstruct the enhanced image: $\hat{I} = \hat{R} \odot \hat{L}$, where $\hat{R}$ and $\hat{L}$ are the restored reflectance component and the corrected illumination component, respectively, and $\odot$ denotes element-wise multiplication.
[0014] The advantages and positive effects of this invention are: (1) This invention innovatively constructs a low-light image enhancement framework based on the "divide and conquer" strategy. This core strategy runs through the design of network modules and loss functions. By designing a targeted edge gradient consistency loss function, a multi-scale independent decoupled attention module, and an adaptive weighted frequency domain perception loss function, it effectively addresses the multiple challenges in low-light image enhancement.
[0015] (2) In the image decomposition stage, the present invention constructs a new edge gradient consistency loss function. By treating edges of different strengths differently, it retains strong gradient edges while suppressing interference from weak edges, yielding a clearer and more accurate image structure.
[0016] (3) In the process of reflectivity recovery, the present invention innovatively constructs a multi-scale independent decoupled attention module, and uses differential operation to strip the overlapping parts of features at different scales, ensuring that the features extracted by each branch are more independent and complementary. This not only reduces feature redundancy, but also enables each branch to focus on feature information at different scales, resulting in better image effects.
[0017] (4) The present invention innovatively designs an adaptive weighted frequency domain perceptual loss. This loss function can adaptively distinguish and process different frequency components of the image, effectively suppressing high-frequency noise while preserving the main structure of the low frequency, thereby achieving fine control of image details and noise at the frequency domain level and significantly improving the visual fidelity of the enhancement results.
Attached Figure Description
[0018] Figure 1 is the overall algorithm flowchart of an embodiment of the present invention;
Figure 2 is the structure diagram of the image decomposition network in an embodiment of the present invention;
Figure 3 compares the processing results before and after the edge gradient consistency loss improvement in an embodiment of the present invention;
Figure 4 is the structure diagram of the multi-scale independent decoupled attention module in the reflectance recovery network in an embodiment of the present invention;
Figure 5 is the structure diagram of the illumination correction network in an embodiment of the present invention;
Figure 6 compares the enhancement results of this embodiment with those of similar algorithms;
Figure 7 shows the enhancement results for different images in an embodiment of the present invention.
Detailed Implementation
[0019] To make the objectives, technical solutions, and advantages of this application clearer and easier to understand, the present invention will be further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit the scope of this application.
[0020] This embodiment provides a low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks. The overall algorithm flow is shown in Figure 1, and the process is as follows. Step 1: Construct the image decomposition network ID-Net. Its core structure consists of multiple convolutional layers and residual modules. The backbone is composed of six stacked convolutional layers, each using a 3×3 kernel with a fixed output channel count of 64. All convolutional layers use the same padding and stride to maintain the spatial dimensions of the feature maps. Since their introduction, residual networks have been widely used in deep learning because their skip connections let gradients flow more effectively during backpropagation, alleviating the vanishing-gradient problem in deep networks. This invention therefore introduces a multi-path residual module (MRM) into the image decomposition network, which lets the network process feature information in parallel along multiple paths. The module not only alleviates vanishing gradients through skip connections but also fuses cross-layer features effectively through a multi-path feature reuse mechanism. Its specific structure is shown in Figure 2.
[0021] To guide the image decomposition network to perform accurate and physically consistent decomposition, a comprehensive set of illumination decomposition loss functions was constructed to stabilize network training and improve decomposition quality. These functions consist of four loss terms: reconstruction error loss, reflectivity consistency loss, illumination smoothing loss, and edge gradient consistency loss.
[0022] (1) Reconstruction error loss: ensures that the network can accurately reconstruct a high-quality image from the decomposed components. It is expressed in terms of the low-light image, the reflectance and illumination components decomposed from the low-light image, the normal-light image, and the reflectance and illumination components decomposed from the normal-light image.
[0023] (2) Reflectance consistency loss: according to Retinex theory, the same scene should have consistent reflectance components under different exposure conditions. A reflectance consistency loss between the reflectance component decomposed from the low-light image and the reflectance component decomposed from the normal-light image is therefore used as a constraint.
[0024] (3) Illumination smoothing loss: because illumination in natural environments is continuous and smooth, an illumination smoothing loss is introduced to guide the network to generate more realistic and spatially continuous illumination components. It is expressed in terms of the spatial gradients of the illumination components decomposed from the low-light image and the normal-light image, together with the spatial gradients of the corresponding reflectance components, and uses element-wise exponentiation.
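Because the extracted text omits the formulas for the first three decomposition losses, the sketch below uses the commonly cited Retinex-Net/KinD-style forms as an assumption, not as the patent's exact definitions. It uses L1 reconstruction of each image from its own components, L1 reflectance consistency, and an illumination smoothness term weighted by exp(-c·|∇R|). The value c = 10 and the grayscale 2-D maps are also assumptions.

```python
import numpy as np

def grad_xy(x):
    """Forward-difference spatial gradients of a 2-D map, zero-padded to keep the shape."""
    gx = np.zeros_like(x)
    gy = np.zeros_like(x)
    gx[:, :-1] = x[:, 1:] - x[:, :-1]
    gy[:-1, :] = x[1:, :] - x[:-1, :]
    return gx, gy

def reconstruction_loss(i_low, r_low, l_low, i_high, r_high, l_high):
    # Assumed L1 form: each image should be reproduced by the product of its own components.
    return np.abs(r_low * l_low - i_low).mean() + np.abs(r_high * l_high - i_high).mean()

def reflectance_consistency_loss(r_low, r_high):
    # Same scene => same reflectance under both exposures (assumed L1 distance).
    return np.abs(r_low - r_high).mean()

def illumination_smoothness_loss(l, r, c=10.0):
    # Assumed KinD-style form: illumination gradients are penalised, except where the
    # reflectance has strong edges, via the element-wise exponential exp(-c * |grad R|).
    loss = 0.0
    for gl, gr in zip(grad_xy(l), grad_xy(r)):
        loss += np.abs(gl * np.exp(-c * np.abs(gr))).mean()
    return loss
```

In this form the smoothness term would be applied once to the low-light pair and once to the normal-light pair. The exponential weight is what lets illumination edges survive where the reflectance itself has an edge.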
[0025] (4) Edge gradient consistency loss: inspired by the edge gradient consistency concept in the KinD algorithm, this invention makes targeted improvements to preserve the key structures of the image more effectively. The loss is governed by two function shape parameters, one adjusting the width of the penalty peak and the other controlling its position, and operates on the sum of the absolute values of the spatial gradients of the illumination components decomposed from the low-light image and the normal-light image.
[0026] Its core principle is to constrain gradient (edge) regions of different intensities adaptively and differentially, using two adjustable penalty curves whose shape is jointly determined by the two shape parameters: one controls the peak position of the curve and the other adjusts the peak width. In practice, the function applies weaker constraints where the gradient magnitude M is near zero (flat regions, which carry the main image content and global information), preventing structural distortion and artifacts, and pays more attention to regions with larger M (prominent edges, which carry critical details and textures), ensuring their sharpness and accuracy. The two shape parameters were set experimentally to 0.08 and 0.1; minimizing this loss encourages the network to accurately preserve important strong edge details during enhancement. Figure 3 shows the results before and after this improvement.
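The exact penalty curve is not recoverable from the extracted text, so the following is only a hypothetical reading. It takes the penalty as a Gaussian-shaped bump f(M) = M·exp(-(M - μ)²/(2σ²)), with μ as the peak position and σ as the peak width. The curve is near zero on flat regions, largest around weak gradients, and vanishes again for strong edges, which leaves strong edges untouched when the loss is minimized. Which of the reported values (0.08, 0.1) belongs to which parameter cannot be determined, so the defaults below are a guess.

```python
import numpy as np

def grad_xy(x):
    """Forward-difference spatial gradients of a 2-D map, zero-padded to keep the shape."""
    gx = np.zeros_like(x)
    gy = np.zeros_like(x)
    gx[:, :-1] = x[:, 1:] - x[:, :-1]
    gy[:-1, :] = x[1:, :] - x[:-1, :]
    return gx, gy

def edge_penalty(m, mu=0.1, sigma=0.08):
    """Hypothetical penalty curve: ~0 at M = 0, peak near M = mu (weak edges),
    decaying towards 0 for strong edges so they are left intact."""
    return m * np.exp(-((m - mu) ** 2) / (2.0 * sigma ** 2))

def edge_gradient_consistency_loss(l_low, l_high, mu=0.1, sigma=0.08):
    loss = 0.0
    for g_low, g_high in zip(grad_xy(l_low), grad_xy(l_high)):
        m = np.abs(g_low) + np.abs(g_high)      # M: sum of absolute gradients of both maps
        loss += edge_penalty(m, mu, sigma).mean()
    return loss

weak = np.zeros((4, 4)); weak[:, 2:] = 0.05     # faint step edge
strong = np.zeros((4, 4)); strong[:, 2:] = 1.0  # strong step edge
print(edge_gradient_consistency_loss(weak, weak), edge_gradient_consistency_loss(strong, strong))
```

Under this reading, the weak edge is penalized while the strong edge costs almost nothing. That matches the stated goal of retaining strong gradient edges and suppressing weak ones.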
[0027] Based on this, the total loss of the image decomposition network is constructed from the reconstruction error loss, the reflectance consistency loss, the illumination smoothing loss, and the edge gradient consistency loss of the image decomposition network.
[0028] Step 2: Construct the reflectance restoration network RR-Net. To remove the noise introduced into the reflectance component during decomposition while preserving as much detail as possible, this invention builds a reflectance restoration network on an improved U-Net architecture, implemented as follows. Starting from the classic U-Net symmetric encoder-decoder structure, the network uses skip connections to fuse multi-scale features and preserve texture and edges. On this basis, all max-pooling downsampling operations in the encoder are replaced with strided convolutions, which downsample through a sliding window, reducing the loss of spatial information and capturing local features effectively, thus preserving details. Second, to strengthen the network's ability to understand and use contextual information at multiple scales, this invention designs a multi-scale independent decoupled attention module (MIDA) between each skip connection and the bottleneck layer; its detailed structure is shown in Figure 4. The core design of this module is as follows: (1) Multi-scale feature extraction: to construct a feature pyramid of the image, three parallel branches use 1×1, 3×3, and 5×5 convolution kernels respectively, forming receptive fields from fine to coarse that jointly capture local pixel information, regional structure information, and overall image context information.
[0029] (2) Feature decorrelation: to suppress feature redundancy between branches of different scales and force each branch to focus on its own information, this module applies feature-decoupling difference operations. Specifically, the global features are differenced with the medium-scale features and with the point features to suppress the medium-scale and local detail information they contain, yielding purer global context information; the medium-scale features are differenced with the point features to remove local details, strengthening the representation of medium-scale structural features. This achieves explicit decoupling at the feature level, ensuring that the output features of the branches are independent and complementary.
[0030] (3) Attention-weighted fusion: Each decorrelated multi-scale feature branch is input into the channel attention module. Each branch learns channel weights through global average pooling, dynamically strengthening the important features of each branch. Subsequently, all weighted branch features are concatenated with the original input features provided by skip connections along the channels to achieve optimized multi-scale feature fusion.
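The three MIDA steps can be traced in a small NumPy forward pass. This is a sketch under stated assumptions. The decorrelation is read as plain subtraction (global minus medium minus point, and medium minus point); computing the two global differences separately would be an equally valid reading. The channel attention uses a squeeze-and-excitation style MLP with reduction ratio 2. The helper names (`conv2d_same`, `channel_attention`, `init_params`, `mida`) and the random weights are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x, w):
    """'Same'-padded 2-D cross-correlation. x: (C_in, H, W); w: (C_out, C_in, k, k), k odd."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    win = sliding_window_view(xp, (k, k), axis=(1, 2))   # (C_in, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", win, w)

def channel_attention(f, w1, w2):
    """SE-style gate: global average pooling -> 2-layer MLP -> sigmoid channel weights."""
    z = f.mean(axis=(1, 2))                               # (C,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))
    return f * s[:, None, None]

def init_params(c, r=2, seed=0):
    """Hypothetical random weights for the three branches and three attention gates."""
    rng = np.random.default_rng(seed)
    p = {f"k{k}": rng.normal(0, 0.1, (c, c, k, k)) for k in (1, 3, 5)}
    for i in range(3):
        p[f"se{i}"] = (rng.normal(0, 0.1, (c // r, c)), rng.normal(0, 0.1, (c, c // r)))
    return p

def mida(x, params):
    """Hypothetical MIDA forward pass on one skip-connection feature map x: (C, H, W)."""
    point = conv2d_same(x, params["k1"])     # 1x1: local pixel information
    medium = conv2d_same(x, params["k3"])    # 3x3: regional structure
    glob = conv2d_same(x, params["k5"])      # 5x5: wider context
    glob_d = glob - medium - point           # strip medium-scale and local content (assumed)
    medium_d = medium - point                # strip local detail (assumed)
    branches = [channel_attention(b, *params[f"se{i}"])
                for i, b in enumerate((point, medium_d, glob_d))]
    return np.concatenate(branches + [x], axis=0)   # concat with skip features along channels

x = np.random.default_rng(1).normal(size=(4, 8, 8))
print(mida(x, init_params(4)).shape)
```

The output has four times the input channels (three attended branches plus the untouched skip features). A real implementation would typically follow this with a 1×1 convolution to restore the channel count expected by the decoder.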
[0031] Based on the reflectance restoration network above, a comprehensive loss function is constructed to optimize it jointly, guiding the model to generate noise-free, naturally colored reflectance components. This loss function consists of three core parts: (1) Reflectance consistency loss: this loss directly constrains the recovered reflectance component to stay consistent with the reference reflectance component in both pixel values and gradient space, so that the recovered result closely matches the real reference in numerical values and edge details. It is expressed in terms of the recovered reflectance component and its spatial gradient, and the reflectance component decomposed from the normal-light image and its spatial gradient.
[0032] (2) Structural similarity loss: to improve the perceptual quality of the output, a loss term based on the structural similarity index (SSIM) is introduced, computed between the recovered reflectance component and the reflectance component decomposed from the normal-light image.
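The two spatial-domain terms of the RR-Net loss can be sketched as follows. The sketch assumes an L1 distance on pixels plus an L1 distance on forward-difference gradients for the consistency loss, and 1 - SSIM for the structural term. As a simplification, SSIM is computed once over the whole image rather than with the usual sliding Gaussian window, using the standard constants for data in [0, 1]. The function names are illustrative.

```python
import numpy as np

def grad_xy(x):
    """Forward-difference spatial gradients of a 2-D map, zero-padded to keep the shape."""
    gx = np.zeros_like(x)
    gy = np.zeros_like(x)
    gx[:, :-1] = x[:, 1:] - x[:, :-1]
    gy[:-1, :] = x[1:, :] - x[:-1, :]
    return gx, gy

def rr_consistency_loss(r_hat, r_ref):
    """Assumed form: L1 on pixel values plus L1 on spatial gradients."""
    loss = np.abs(r_hat - r_ref).mean()
    for g_hat, g_ref in zip(grad_xy(r_hat), grad_xy(r_ref)):
        loss += np.abs(g_hat - g_ref).mean()
    return loss

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole image (simplification of windowed SSIM)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def ssim_loss(r_hat, r_ref):
    return 1.0 - global_ssim(r_hat, r_ref)
```

The pixel term anchors absolute values, the gradient term targets edges, and the SSIM term rewards matching local contrast and structure. This is why the text pairs them rather than relying on a single distance.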
[0033] (3) Adaptive weighted frequency-domain perception loss: This loss aims to improve the overall quality of the recovered reflectance map and to account for frequency-dependent deviations in its components. It is a new frequency-domain perception loss that combines three sub-losses to impose multi-dimensional constraints on the reflectance component.
[0034] The phase loss constrains frequency-domain phase information, which keeps the recovered reflectance map accurate at edges and details. In its formula, the sum runs over the pixel coordinate indices and is normalized by the total number of pixels in the image. The Fourier transform operator maps two components into the frequency domain: the recovered reflectance component and the reflectance component obtained by decomposing the normal-light image. The phase operator then takes the phase angle of each complex frequency-domain value.
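The phase loss can be sketched with `numpy.fft.fft2` and `numpy.angle`. The sketch makes two assumptions, since the exact formula is not reproduced here: the loss is a mean absolute difference of phase angles, and the differences are wrapped into (−π, π] so that angles of +π and −π count as equal.

```python
import numpy as np

def phase_loss(r_hat, r_ref):
    """Mean absolute (wrapped) difference of 2-D Fourier phase angles over the last two axes."""
    phase_hat = np.angle(np.fft.fft2(r_hat))
    phase_ref = np.angle(np.fft.fft2(r_ref))
    diff = np.angle(np.exp(1j * (phase_hat - phase_ref)))  # wrap into (-pi, pi]
    return np.mean(np.abs(diff))

rng = np.random.default_rng(0)
x = rng.random((16, 16))
print(phase_loss(x, 3.0 * x))                 # brightness scaling: phase unchanged, ~0
print(phase_loss(x, np.roll(x, 2, axis=1)))   # spatial shift: phase changes, > 0
```

The usage lines illustrate why phase is associated with structure. Scaling intensity leaves the phase intact, whereas moving content, for example misplaced edges, changes it.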
[0035] The gradient loss measures the difference between the recovered reflectance component and the reference reflectance component in the gradient domain. In its formula, the sum runs over the pixel coordinate indices and is normalized by the total number of pixels in the image. The Fourier transform operator is applied to two spatial gradients: that of the recovered reflectance component, and that of the reflectance component obtained by decomposing the normal-light image.
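One possible reading of this gradient loss is sketched below. The spatial gradients are transformed with the 2-D FFT and compared as complex spectra. The forward-difference gradients and the mean-absolute comparison of complex values are assumptions about how the Fourier operator enters the formula.

```python
import numpy as np

def spatial_gradient(img):
    """Forward-difference gradients along the last two axes (zero at the last row/column)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[..., :, :-1] = img[..., :, 1:] - img[..., :, :-1]
    gy[..., :-1, :] = img[..., 1:, :] - img[..., :-1, :]
    return gx, gy

def frequency_gradient_loss(r_hat, r_ref):
    """Mean |FFT(grad r_hat) - FFT(grad r_ref)| summed over the x and y gradients."""
    total = 0.0
    for g_hat, g_ref in zip(spatial_gradient(r_hat), spatial_gradient(r_ref)):
        total += np.mean(np.abs(np.fft.fft2(g_hat) - np.fft.fft2(g_ref)))
    return total

rng = np.random.default_rng(0)
x = rng.random((3, 16, 16))
print(frequency_gradient_loss(x + 0.2, x))       # offsets vanish in the gradient: ~0
print(frequency_gradient_loss(x, x[..., ::-1]))  # mirrored content: > 0
```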
[0036] The amplitude loss weights amplitude information differently at different frequencies in order to balance the recovery of global information and local details. For fine-grained control over the frequency components, two frequency-adaptive weighting functions are introduced into the amplitude loss. Both are constructed from the Euclidean distance between each frequency point and the center of the spectrum. The first mainly suppresses high-frequency noise. The second applies a secondary attenuation to the low-frequency components so that the main structure of the image is preserved. This design lets the network adaptively distinguish and process information at different frequencies, preserving as much detail as possible while suppressing noise. As in the other two sub-losses, the amplitude loss is normalized by the total number of pixels in the image, is indexed by pixel coordinates, and uses the Fourier transform operator. The amplitude loss, phase loss, and gradient loss together constitute the total adaptive weighted frequency-domain perception loss.
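The sketch below illustrates distance-based spectral weighting and the combination of the three sub-losses. The text states only that both weights depend on the distance from each frequency bin to the spectrum center. The Gaussian shapes, the `sigma_high`/`sigma_low` values, the product combination, and the unit coefficients in `frequency_perception_loss` are therefore all hypothetical.

```python
import numpy as np

def center_distance(h, w):
    """Euclidean distance of every bin to the centre of an fftshift-ed h x w spectrum."""
    v = np.arange(h) - h // 2
    u = np.arange(w) - w // 2
    return np.sqrt(v[:, None] ** 2 + u[None, :] ** 2)

def frequency_weights(h, w, sigma_high=0.25, sigma_low=0.05):
    """Hypothetical weights on the normalised distance d in [0, 1].

    w_high decays with d, down-weighting high-frequency (noise-dominated) bins;
    w_low = 1 - exp(...) attenuates a small band around the DC term.
    """
    d = center_distance(h, w)
    d = d / d.max()
    w_high = np.exp(-(d ** 2) / (2 * sigma_high ** 2))
    w_low = 1.0 - np.exp(-(d ** 2) / (2 * sigma_low ** 2))
    return w_high, w_low

def amplitude_loss(r_hat, r_ref):
    """Weighted mean absolute difference of centred amplitude spectra."""
    h, w = r_hat.shape[-2:]
    amp_hat = np.abs(np.fft.fftshift(np.fft.fft2(r_hat), axes=(-2, -1)))
    amp_ref = np.abs(np.fft.fftshift(np.fft.fft2(r_ref), axes=(-2, -1)))
    w_high, w_low = frequency_weights(h, w)
    weight = w_high * w_low  # hypothetical way of combining the two weights
    return np.mean(weight * np.abs(amp_hat - amp_ref))

def frequency_perception_loss(l_amp, l_phase, l_grad, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of amplitude, phase and gradient sub-losses (placeholder coefficients)."""
    a, p, g = weights
    return a * l_amp + p * l_phase + g * l_grad

x = np.random.default_rng(0).random((16, 16))
print(amplitude_loss(x, 0.5 * x))
```

Multiplying the two weights gives a band-emphasis profile. It is zero at DC, rises over the low-frequency band, and decays again toward the high-frequency edge of the spectrum. This is one way to read "suppress high-frequency noise while attenuating low frequencies", not the confirmed design.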
[0037] On this basis, the total loss of the reflectance restoration network is formed from the reflectance consistency loss, the structural similarity loss, and the adaptive weighted frequency-domain perception loss.
[0038] Step 3: Constructing the illumination correction network (IC-Net): The network takes as input the channel-wise concatenation of the reflectance component and the illumination component obtained in the previous steps, so that it perceives scene content and illumination information simultaneously. Its core is a multi-scale feature pyramid built from parallel dilated-convolution branches with different dilation rates:
- the detail branch (dilation rate 1) captures local texture and edge illumination;
- the region branch (dilation rate 2) extracts the illumination distribution over a medium range;
- the global branch (dilation rate 4) enlarges the receptive field to capture the overall brightness structure of the image.

The multi-scale features are concatenated and fused, and subsequent convolutional layers reconstruct the corrected illumination component. The specific structure is shown in Figure 5.
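The effect of the three dilation rates on spatial coverage can be checked with the standard receptive-field arithmetic for stride-1 convolutions. The 3×3 kernel size and the three layers per branch are assumptions made for illustration. The text gives only the dilation rates 1, 2 and 4.

```python
def effective_kernel(k, dilation):
    """Span of a k x k kernel with the given dilation: k + (k - 1) * (dilation - 1)."""
    return k + (k - 1) * (dilation - 1)

def stacked_receptive_field(k, dilation, layers):
    """Receptive field of `layers` stacked stride-1 convolutions sharing one kernel size and dilation."""
    return 1 + layers * (effective_kernel(k, dilation) - 1)

for name, d in [("detail", 1), ("region", 2), ("global", 4)]:
    print(name, effective_kernel(3, d), stacked_receptive_field(3, d, layers=3))
# detail 3 7
# region 5 13
# global 9 25
```

Dilation widens coverage without adding parameters, since every branch still uses nine weights per kernel. This is why parallel dilated branches are a cheap way to build the detail/region/global pyramid.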
[0039] To supervise the optimization of the illumination correction network and ensure that its output meets expectations, this invention constructs corresponding loss constraints. The loss function consists of three parts that jointly guide the network toward the corrected result: (1) Reconstruction loss: This term requires the corrected illumination component and the recovered reflectance component to reconstruct a high-quality enhanced image together, which underpins the visual realism of the enhancement result. Its formula compares this reconstruction with the normal-light image. The reconstruction is built from the recovered reflectance component and the corrected illumination component.
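A minimal sketch of the reconstruction loss follows. It assumes an L1 distance and a single-channel illumination map that broadcasts over the three reflectance channels. Neither detail is stated in the text.

```python
import numpy as np

def reconstruction_loss(r_hat, l_hat, i_ref):
    """Mean absolute error between r_hat * l_hat (element-wise) and the normal-light image.

    r_hat: (3, H, W) recovered reflectance; l_hat: (1, H, W) corrected illumination,
    broadcast across colour channels (assumed layout); i_ref: (3, H, W) reference image.
    """
    return np.mean(np.abs(r_hat * l_hat - i_ref))

rng = np.random.default_rng(0)
r = rng.random((3, 8, 8))
l = rng.random((1, 8, 8))
print(reconstruction_loss(r, l, r * l))  # perfect reconstruction -> 0.0
```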
[0040] (2) Illumination smoothing loss: This loss promotes spatial continuity of the generated illumination map and avoids unnatural abrupt changes in brightness. It constrains the gradient of the illumination component to ensure smooth transitions. Its formula involves the spatial gradient of the corrected illumination component and the spatial gradient of the restored reflectance component.
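Because the formula pairs illumination gradients with reflectance gradients, a structure-aware smoothness term is a plausible reading. In such a term, illumination changes are penalized less where the reflectance itself has edges. The sketch assumes the RetinexNet-style weighting |∇L|·exp(−λ|∇R|), a grayscale average of the reflectance, and a placeholder `lam`. None of these are confirmed by the text.

```python
import numpy as np

def spatial_gradient(img):
    """Forward-difference gradients along the last two axes (zero at the last row/column)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[..., :, :-1] = img[..., :, 1:] - img[..., :, :-1]
    gy[..., :-1, :] = img[..., 1:, :] - img[..., :-1, :]
    return gx, gy

def illumination_smoothness_loss(l_hat, r_hat, lam=10.0):
    """Mean |grad L| * exp(-lam * |grad R|), summed over x and y (assumed form)."""
    r_gray = r_hat.mean(axis=0, keepdims=True)  # (1, H, W)
    total = 0.0
    for gl, gr in zip(spatial_gradient(l_hat), spatial_gradient(r_gray)):
        total += np.mean(np.abs(gl) * np.exp(-lam * np.abs(gr)))
    return total

l_step = np.zeros((1, 16, 16))
l_step[..., 8:] = 1.0                      # illumination edge at column 8
r_edge = np.repeat(l_step, 3, axis=0)      # reflectance with the same edge
r_flat = np.zeros((3, 16, 16))             # reflectance without edges
print(illumination_smoothness_loss(l_step, r_edge) < illumination_smoothness_loss(l_step, r_flat))  # True
```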
[0041] (3) Frequency-domain perception loss: This term constrains the correction quality of the illumination component in the frequency domain by jointly optimizing the amplitude and phase information of the illumination map. It consists of an amplitude loss and a phase loss. In their formulas, the sums run over the pixel coordinate indices and are normalized by the total number of pixels in the image. The Fourier transform operator is applied to the corrected illumination component and to the illumination component obtained by decomposing the normal-light image.
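The illumination-branch frequency loss can be sketched as an amplitude term plus a phase term on the 2-D spectra. Both terms are assumed to be mean absolute differences, and `amp_weight`/`phase_weight` are placeholders. The phase difference is taken as the angle of F(L̂)·conj(F(L_ref)), which wraps it into (−π, π] automatically.

```python
import numpy as np

def illumination_frequency_loss(l_hat, l_ref, amp_weight=1.0, phase_weight=1.0):
    """Amplitude L1 plus wrapped-phase L1 between the 2-D spectra of two illumination maps."""
    f_hat = np.fft.fft2(l_hat)
    f_ref = np.fft.fft2(l_ref)
    amp = np.mean(np.abs(np.abs(f_hat) - np.abs(f_ref)))
    phase = np.mean(np.abs(np.angle(f_hat * np.conj(f_ref))))
    return amp_weight * amp + phase_weight * phase

rng = np.random.default_rng(0)
l = rng.random((16, 16))
shifted = np.roll(l, 3, axis=1)
print(illumination_frequency_loss(l, shifted, 1.0, 0.0))  # amplitude part: ~0 for a circular shift
print(illumination_frequency_loss(l, shifted, 0.0, 1.0))  # phase part: clearly > 0
```

The two usage lines show why both terms are needed. A circular shift keeps the amplitude spectrum intact and is caught only by the phase term.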
[0042] On this basis, the total loss of the illumination correction network is formed from the reconstruction loss, the illumination smoothing loss, and the frequency-domain perception loss.
[0043] Step 4: Network training: In the training phase, this invention uses a staged training strategy that trains the image decomposition network, the reflectance restoration network, and the illumination correction network in sequence. All stages use the Adam optimizer with β1 = 0.9 and β2 = 0.999. The image decomposition network and the illumination correction network share the same training parameters: a batch size of 10, an input patch size of 48×48 pixels, a fixed learning rate of 1×10⁻⁴, and 2400 training epochs. The reflectance restoration network is configured differently: a batch size of 4, an input patch size of 128×128 pixels, and 2400 training epochs. Its learning rate follows a phased decay schedule, decreasing gradually from an initial value of 1×10⁻⁴ to 1×10⁻⁵, which keeps training stable and ensures effective convergence. Batch size, input size, and learning-rate scheduling are each adapted to the network module being trained, and this yields stable and efficient optimization of every sub-network.
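The stated training settings can be collected into a small configuration sketch. The batch sizes, patch sizes, epochs, Adam betas, and learning-rate endpoints come from the text. The number of decay phases (four) and their geometric spacing are hypothetical, because the text only says that the rate decays gradually in phases. `StageConfig` and `stepped_lr` are illustrative names.

```python
from dataclasses import dataclass

ADAM_BETAS = (0.9, 0.999)  # beta1, beta2 as stated

@dataclass(frozen=True)
class StageConfig:
    name: str
    batch_size: int
    patch_size: int  # square patches, pixels per side
    epochs: int
    lr_start: float
    lr_end: float    # equal to lr_start when the rate is fixed

# Stages in the order they are trained.
STAGES = [
    StageConfig("decomposition", 10, 48, 2400, 1e-4, 1e-4),
    StageConfig("reflectance_restoration", 4, 128, 2400, 1e-4, 1e-5),
    StageConfig("illumination_correction", 10, 48, 2400, 1e-4, 1e-4),
]

def stepped_lr(cfg, epoch, num_phases=4):
    """Hypothetical phased decay: geometric steps from lr_start (phase 0) to lr_end (last phase)."""
    if cfg.lr_start == cfg.lr_end:
        return cfg.lr_start
    phase = min(epoch * num_phases // cfg.epochs, num_phases - 1)
    ratio = (cfg.lr_end / cfg.lr_start) ** (phase / (num_phases - 1))
    return cfg.lr_start * ratio

refl = STAGES[1]
for epoch in (0, 600, 1200, 1800, 2399):
    print(epoch, stepped_lr(refl, epoch))
```

In a PyTorch training loop, these values would typically be passed to `torch.optim.Adam(params, lr=..., betas=ADAM_BETAS)`, with the learning rate updated at the start of each epoch.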
[0044] Step 5: Image reconstruction: Under the staged joint training strategy, the image decomposition, reflectance restoration, and illumination correction networks each produce high-quality components that conform to physical priors and visual perception. Based on the Retinex imaging model, this step multiplies the reflectance and illumination components output by the trained reflectance restoration and illumination correction networks element-wise to reconstruct the enhanced image: $I_{\mathrm{en}} = \hat{R} \odot \hat{L}$, where $\hat{R}$ and $\hat{L}$ are the restored reflectance component and the corrected illumination component, respectively, and $\odot$ denotes element-wise multiplication.
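The final recomposition is a single element-wise product. The sketch below makes two assumptions: a (1, H, W) illumination map broadcast over a (3, H, W) reflectance, and clipping of the result to [0, 1]. Neither is specified in the text.

```python
import numpy as np

def reconstruct(r_hat, l_hat):
    """Retinex recomposition: element-wise product of restored reflectance and corrected illumination."""
    return np.clip(r_hat * l_hat, 0.0, 1.0)

r = np.full((3, 4, 4), 0.5)
l = np.full((1, 4, 4), 0.8)
enhanced = reconstruct(r, l)
print(enhanced.shape)  # (3, 4, 4)
```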
[0045] Finally, to verify the performance of this invention, it was evaluated on the publicly available low-light image dataset LOL. The analysis focused on two aspects. First, on a paired dataset with real reference images, the enhancement results of the proposed method were compared with nine representative prior-art methods: Zero-DCE, RUAS, SCI, PairLIE, RetinexNet, KinD, R2RNet, Cycle-Retinex, and URetinex-Net. The quantitative analysis used four objective metrics: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), learned perceptual image patch similarity (LPIPS), and mean absolute error (MAE). Together they measure pixel accuracy, structure preservation, and error control of the enhanced images.
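Two of the four metrics have simple closed forms and are sketched below for images scaled to [0, 1]. PSNR is 10·log10(MAX²/MSE) and MAE is the mean absolute pixel error. The `data_range` argument is an assumption about image scaling. SSIM and LPIPS require windowed statistics and a pretrained network, respectively, and are not shown.

```python
import numpy as np

def psnr(pred, ref, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((pred - ref) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def mae(pred, ref):
    """Mean absolute error over all pixels and channels."""
    return float(np.mean(np.abs(pred - ref)))

ref = np.zeros((3, 4, 4))
pred = np.full((3, 4, 4), 0.1)
print(psnr(pred, ref), mae(pred, ref))  # about 20 dB and 0.1
```

Higher PSNR and SSIM are better, while lower LPIPS and MAE are better. This matters when reading a comparison table that mixes the four.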
[0046] Furthermore, to verify the generalization ability of the proposed method in real-world scenarios without references, it was evaluated on five publicly available unpaired low-light datasets: DICM, LIME, MEF, NPE, and VV. All compared models were trained on the LOL dataset and tested on these five datasets. Because no real reference images are available, the no-reference Natural Image Quality Evaluator (NIQE) was used to measure the naturalness and visual quality of the enhancement results.
[0047] Table 1 gives a quantitative comparison on the LOL dataset in terms of PSNR, SSIM, LPIPS, and MAE. Table 2 gives a quantitative comparison on the DICM, LIME, MEF, NPE, and VV datasets using the NIQE metric. As shown in Table 1, this invention outperforms all compared methods on three of the four quantitative metrics (PSNR, SSIM, LPIPS, and MAE) and ranks second on the remaining one. As shown in Table 2, the proposed method achieves the best performance on the DICM and LIME datasets and ranks second on the NPE dataset. Its overall average score ranks first among all methods, which demonstrates the strong generalization performance of this invention across different real low-light scenarios. Figures 6 and 7 show a visual comparison between this invention and similar algorithms, as well as the enhancement effect of the method in different low-light scenarios.
[0048] Based on the above experimental results, it can be seen that the low-light image enhancement method based on the divide-and-conquer strategy proposed in this invention has achieved a leading level among similar methods in both quantitative evaluation and visual quality. Furthermore, it exhibits stable and leading enhancement effects in various complex real-world scenarios, fully verifying its excellent generalization ability and robustness. In summary, this invention provides an effective, reliable, and highly generalizable solution for low-light image enhancement tasks.
[0049] Although embodiments of the invention have been disclosed for illustrative purposes, those skilled in the art will understand that various substitutions, variations, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the scope of the invention is not limited to the contents disclosed in the embodiments.
Claims
1. A low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks, characterized in that it comprises the following steps:
S1: Image decomposition: Construct an image decomposition network that decomposes the input image into an illumination component and a reflectance component, and constrain the decomposition process with an improved mutual consistency loss function;
S2: Reflectance restoration: Construct a reflectance restoration network that takes the reflectance component obtained from image decomposition as input for denoising and detail reconstruction. The network is based on U-Net and embeds a multi-scale independent decoupled attention module. By decoupling and adaptively fusing multi-scale features, it obtains a reflectance component with clear details and significantly suppressed noise;
S3: Illumination correction: Construct an illumination correction network to correct the illumination component. The network adopts a multi-scale perception architecture that uses parallel convolutional branches with different dilation rates to extract detail, regional, and global illumination features. It takes the concatenation of the reflectance and illumination components as input and uses the multi-scale perception architecture to jointly optimize illumination and scene content, obtaining the corrected illumination component;
S4: Model training: Adopt a staged joint training strategy. Train the image decomposition network first, and then optimize the reflectance restoration and illumination correction networks in sequence. Training uses the publicly available supervised dataset LOL, with the normally exposed images paired with the low-light images as training targets. The loss functions supervise training so that the modules work together and finally converge to a stable enhancement model;
S5: Image reconstruction: Multiply the restored reflectance component and the corrected illumination component element-wise to reconstruct the final enhanced image.
2. The low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks according to claim 1, characterized in that: in S1, an image decomposition network is constructed whose core structure consists of multiple convolutional layers and residual modules. The backbone consists of six stacked convolutional layers. Each layer uses a 3×3 kernel with 64 output channels, and all layers share the same padding and stride. A multi-path residual module (MRM) is introduced into the image decomposition network, allowing the network to process feature information in parallel along multiple paths. A composite decomposition loss function is constructed from four loss terms: the reconstruction error loss, the reflectance consistency loss, the illumination smoothing loss, and the edge gradient mutual consistency loss. The total loss of the image decomposition network is formed from these terms.
3. The low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks according to claim 1, characterized in that: in S2, the reflectance restoration network is built on the classic symmetric U-Net encoder-decoder structure and uses skip connections to fuse multi-scale features and preserve texture and edges. All max-pooling downsampling operations in the encoder are replaced with strided convolutions, which downsample through a sliding window. A multi-scale independent decoupled attention module is placed between each skip connection and the bottleneck layer and performs the following functions: (1) Multi-scale feature extraction: a feature pyramid of the image is constructed with three parallel branches.
The branches use 1×1, 3×3, and 5×5 convolution kernels, respectively, forming receptive fields from fine to coarse that jointly capture complementary local pixel information, regional structure information, and whole-image context information. (2) Feature decorrelation: the global features are differenced against the medium-scale features and the point features, respectively, which suppresses the medium-scale and local detail information they contain and yields purer global context information. The medium-scale features are differenced against the point features to remove local details and strengthen the representation of medium-scale structural features. (3) Attention-weighted fusion: each decorrelated multi-scale feature branch is fed into a channel attention module, in which the branch learns channel weights through global average pooling, dynamically strengthening its important features. All weighted branch features are then concatenated along the channel dimension with the original input features provided by the skip connection to achieve optimized multi-scale feature fusion. A composite loss function is constructed for the reflectance restoration network and optimized jointly. It comprises the following three core parts: (1) Reflectance consistency loss: this loss directly constrains the recovered reflectance component to agree with the reference reflectance component in pixel values and in gradient space, so that the recovered result matches the real reference in both numerical values and edge details. The reflectance consistency loss of the reflectance restoration network is expressed in terms of the recovered reflectance component and its spatial gradient.
The formula further uses the reflectance component obtained by decomposing the normal-light image and its spatial gradient. (2) Structural similarity loss: to improve the perceptual quality of the output, a loss term based on the structural similarity index is introduced. The index is computed between the recovered reflectance component and the reflectance component obtained by decomposing the normal-light image, with the symbols having the same meanings as above. (3) Adaptive weighted frequency-domain perception loss: a new frequency-domain perception loss is introduced that combines three sub-losses to impose multi-dimensional constraints on the reflectance component. The phase loss constrains frequency-domain phase information, ensuring the accuracy of the recovered reflectance map at edges and details. Its formula is normalized by the total number of pixels in the image and uses the Fourier transform operator and the phase angle of the complex frequency-domain values, with the remaining symbols having the same meanings as above. The gradient loss measures the difference between the recovered reflectance component and the reference reflectance component in the gradient domain. Its formula is normalized by the total number of pixels in the image, with the remaining symbols having the same meanings as above. The amplitude loss weights amplitude information differently at different frequencies to balance the recovery of global information and local details. Two frequency-adaptive weighting functions, constructed from the Euclidean distance between each frequency point and the center of the spectrum, are introduced into the amplitude loss. The first mainly suppresses high-frequency noise, while the second applies a secondary attenuation to the low-frequency components so that the main structure of the image is preserved. In the amplitude loss, the symbols have the same meanings as above.
The amplitude loss, phase loss, and gradient loss together constitute the adaptive weighted frequency-domain perception loss. On this basis, the total loss of the reflectance restoration network is formed from the reflectance consistency loss, the structural similarity loss, and the adaptive weighted frequency-domain perception loss.
4. The low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks according to claim 1, characterized in that: in S3, the illumination correction network IC-Net is constructed. The network takes as input the channel-wise concatenation of the reflectance component and the illumination component obtained in the previous steps. Its core is a multi-scale feature pyramid built from parallel dilated-convolution branches with different dilation rates. The detail branch (dilation rate 1) captures local texture and edge illumination; the region branch (dilation rate 2) extracts the illumination distribution over a medium range; and the global branch (dilation rate 4) captures the overall brightness structure of the image by enlarging the receptive field. The multi-scale features are concatenated and fused, and subsequent convolutional layers reconstruct the corrected illumination component. A corresponding loss constraint is constructed, consisting of the following three parts: (1) Reconstruction loss: this term requires the corrected illumination component and the recovered reflectance component to reconstruct a high-quality enhanced image together, which underpins the visual realism of the enhancement result. In its formula, the normal-light image serves as the reference.
The other quantities in the formula are the recovered reflectance component and the corrected illumination component. (2) Illumination smoothing loss: a constraint is imposed on the gradient of the illumination component. Its formula involves the spatial gradients of the corrected illumination component and of the restored reflectance component. (3) Frequency-domain perception loss: the amplitude and phase information of the illumination map are jointly optimized from the frequency-domain perspective through an amplitude loss and a phase loss. Their formulas are normalized by the total number of pixels in the image and apply the Fourier transform operator to the corrected illumination component and to the illumination component obtained by decomposing the normal-light image. On this basis, the total loss of the illumination correction network is formed from the reconstruction loss, the illumination smoothing loss, and the frequency-domain perception loss.
5. The low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks according to claim 1, characterized in that: in S4, network training uses a staged training strategy that trains the image decomposition network, the reflectance restoration network, and the illumination correction network in sequence. The Adam optimizer is used throughout, with β1 = 0.9 and β2 = 0.999. The image decomposition network and the illumination correction network share the same training parameters: a batch size of 10, an input patch size of 48×48 pixels, a fixed learning rate of 1×10⁻⁴, and 2400 training epochs. The reflectance restoration network is configured differently: a batch size of 4, an input patch size of 128×128 pixels, and 2400 training epochs. Its learning rate follows a phased decay schedule that decreases it gradually from an initial value of 1×10⁻⁴ to 1×10⁻⁵, ensuring stable training and effective convergence of the model.
6. The low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks according to claim 1, characterized in that: in S5, image reconstruction is performed as follows. Under the staged joint training strategy, the image decomposition, reflectance restoration, and illumination correction networks each produce high-quality components that conform to physical priors and visual perception. Based on the Retinex imaging model, the reflectance and illumination components output by the fully trained reflectance restoration and illumination correction networks are multiplied element-wise to reconstruct the enhanced image: $I_{\mathrm{en}} = \hat{R} \odot \hat{L}$, where $\hat{R}$ and $\hat{L}$ are the restored reflectance component and the corrected illumination component, respectively, and $\odot$ denotes element-wise multiplication.
7. The low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks according to claim 2, characterized in that: the reconstruction error loss is expressed by a formula involving two sets of quantities. The first set is the low-light image together with the reflectance and illumination components obtained by decomposing it. The second set is the normal-light image together with the reflectance and illumination components obtained by decomposing it.
8. The low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks according to claim 2, characterized in that: the reflectance consistency loss is expressed by a formula relating the reflectance component obtained by decomposing the low-light image to the reflectance component obtained by decomposing the normal-light image.
9. The low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks according to claim 2, characterized in that: the illumination smoothing loss is expressed by a formula involving two kinds of spatial gradients. The first are the gradients of the illumination components obtained by decomposing the low-light image and the normal-light image, respectively. The second are the gradients of the reflectance components obtained by decomposing the low-light image and the normal-light image, respectively. The formula uses element-wise exponentiation.
10. The low-light image enhancement method based on a divide-and-conquer strategy to optimize Retinex networks according to claim 2, characterized in that: the edge gradient consistency loss is expressed by a formula with two function shape parameters, one adjusting the width of the peak and the other controlling its position. The formula involves the spatial gradients of the illumination components obtained by decomposing the low-light image and the normal-light image, respectively, together with a quantity equal to the sum of the absolute values of these two gradients.
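Claims 1 and 10 describe an improved mutual consistency loss. Its penalty, with a peak-width parameter and a peak-position parameter, acts on M, the sum of the absolute illumination gradients from the low-light and normal-light decompositions. The exact function is not reproduced in this text. A related published form is the KinD mutual consistency term M·exp(−c·M). The sketch below instead assumes a hypothetical Gaussian-shaped penalty M·exp(−((M − center)/width)²), so that `width` sets the peak width and `center` sets its position. The names and default values are illustrative only.

```python
import numpy as np

def spatial_gradient(img):
    """Forward-difference gradients along the last two axes (zero at the last row/column)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[..., :, :-1] = img[..., :, 1:] - img[..., :, :-1]
    gy[..., :-1, :] = img[..., 1:, :] - img[..., :-1, :]
    return gx, gy

def grad_magnitude(img):
    """|grad_x| + |grad_y| per pixel."""
    gx, gy = spatial_gradient(img)
    return np.abs(gx) + np.abs(gy)

def shaped_penalty(m, width, center):
    """Hypothetical peaked penalty: largest near m = center, falling off with scale `width`."""
    return m * np.exp(-(((m - center) / width) ** 2))

def edge_gradient_consistency(l_low, l_high, width=0.1, center=0.2):
    """Mean penalty on M = |grad L_low| + |grad L_high| (assumed form, not the claimed formula)."""
    m = grad_magnitude(l_low) + grad_magnitude(l_high)
    return np.mean(shaped_penalty(m, width, center))

flat = np.ones((1, 16, 16))
print(edge_gradient_consistency(flat, flat))  # no gradients -> 0.0
```

A peaked penalty of this kind discourages weak, ambiguous gradients near `center` while leaving both flat regions and strong, shared edges almost unpenalized. This is the intuition behind the mutual consistency idea.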