A dual-branch feature fusion image exposure repair method based on Retinex theory

By employing a dual-branch feature fusion image exposure restoration method, which combines dynamic convolution and Retinex theory, the problems of underexposure and overexposure in images are solved, achieving efficient image restoration under complex lighting conditions and improving image quality and visual experience.

CN122243836A · Pending · Publication Date: 2026-06-19 · BEIHANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIHANG UNIV
Filing Date
2026-03-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to effectively handle simultaneous underexposure and overexposure, leading to issues such as noise, artifacts, and color shifts in images. This is especially true under complex lighting conditions, where the limitations of traditional methods relying on manual parameter correction and the problem of deep learning models introducing noise in overexposed areas remain unresolved.

Method used

A dual-branch feature fusion image exposure restoration method is adopted. By combining dynamic convolution and Retinex theory, feature information of underexposed and overexposed areas is extracted separately. The features are fused by dynamic convolution guided by the grayscale image. Combined with a channel attention mechanism and a combined loss function, a UNet network model is designed to deal with noise, artifacts, and color shift in the image.

Benefits of technology

Under complex exposure conditions, it significantly improves the effect of image exposure restoration, reduces noise and color shift, enhances image quality, and improves visual experience. The model has strong generalization ability and is suitable for image processing under various exposure conditions.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122243836A_ABST
Patent Text Reader

Abstract

This method is a dual-branch feature fusion image exposure restoration approach based on Retinex theory. Current research on image exposure restoration mainly focuses on low-light enhancement; this method can restore images affected by complex exposures while also handling low-light images. First, a dual-branch network is designed to extract high-dimensional features from both the original image and its inverted image as guidance for the subsequent attention mechanism. Simultaneously, a dynamic convolution kernel guided by the grayscale image is used to fuse the high-dimensional information. A combined loss function with three loss components is also designed to learn the network weights, effectively enhancing the correction of images with exposure problems. Experiments show that the method achieves a PSNR of 24.2935 and an SSIM of 0.8726 on the mixed-exposure dataset LCDP, outperforming most current methods. Furthermore, on the low-light datasets LOLv1, LOLv2-real, LOLv2-synthetic, and SMID, the method achieves PSNR values of 24.4130, 23.0872, 26.9456, and 29.2406 and SSIM values of 0.8499, 0.8774, 0.9469, and 0.8058, respectively, also outperforming most current methods.

Description

Technical Field

[0001] This method relates to the fields of computer vision and image processing, specifically to an image exposure restoration method based on Retinex theory and dual-branch feature fusion. As shown in Figure 1, the main approach uses a dual-branch feature extraction network that operates on the original image and its inverted image to obtain features of different exposure areas. The different feature information is then fused through dynamic convolution guided by the grayscale image. High-dimensional feature information is used as guidance for the attention mechanism to remove noise, artifacts, color shifts, and other issues in the image. Ultimately, while ensuring proper handling of low-light images, the method can also handle multiple exposure problems simultaneously. The model exhibits stronger generalization ability and more stable image exposure processing results, and is more suitable for practical application scenarios.

Background Technology

[0002] Image exposure restoration is a technique that uses computer vision to repair issues such as uneven lighting distribution, noise, and color shifts in images. It is a sub-topic of image processing. Image exposure restoration has been increasingly widely used in many fields such as medical image processing, face recognition, and object detection. However, due to physical limitations such as insufficient lighting, limited exposure time, and unsuitable shooting angles, images often suffer from various quality degradations, including but not limited to poor visibility, low contrast, backlighting, shadows, and nighttime shooting.

[0003] Image enhancement is one of the main tasks of image processing. Its purpose is to add information or transform data in the original image to make it conform to visual response characteristics and selectively highlight features of interest. The main objectives of image enhancement are to amplify the differences between features of different objects in an image, suppress irrelevant features, improve image quality, enrich information content, enhance the interpretation and recognition of the image, and meet the needs of certain special analyses.

[0004] From a technical perspective, image processing can be divided into two main categories: traditional image enhancement algorithms and machine learning-based algorithms. Traditional image enhancement algorithms can be further divided into three categories: (1) grayscale transformation methods; (2) histogram equalization methods; and (3) methods based on Retinex theory. Machine learning-based algorithms can be divided into four categories: (1) end-to-end learning methods; (2) decomposition-based learning methods; (3) fusion-based learning methods; and (4) unpaired learning methods. The relevant research backgrounds are introduced below.

[0005] 1. Methods based on traditional image enhancement algorithms

[0006] The gray values in the dark areas of an image are usually small, so traditional image enhancement algorithms design mathematical formulas or filtering methods to adjust the gray values of the image. Based on different enhancement approaches, traditional image enhancement algorithms can be divided into three categories: (1) gray-level transformation methods; (2) histogram equalization methods; and (3) Retinex-based methods.

[0007] (1) Gray-scale transformation method

[0008] When an image is underexposed or overexposed, the grayscale values of the image may be limited to a relatively small range, which can lead to problems such as image blurring and grayscale loss. Grayscale transformation is an important method in image enhancement, used to improve the display effect of an image, and belongs to the spatial domain processing methods (Reference 1: Wang, David C.C., Anthony H. Vagnucci, and C.C. Li. 1983. “Digital Image Enhancement: A Survey.” Computer Vision, Graphics, and Image Processing, December, 363–81. doi:10.1016/0734-189x(83)90061-0.). This method can increase the dynamic range of an image and enhance its contrast, thereby making the image clearer. The essence of grayscale transformation is to modify the grayscale value of each pixel in the image according to certain rules, thereby changing the grayscale range of the image. Typical grayscale transformation methods can be divided into linear transformation and nonlinear transformation.

[0009] Linear grayscale transformation uses a linear equation to map grayscale values. Let the grayscale value of the original image be f(x, y) and the grayscale value after linear transformation be g(x, y). The formula for linear grayscale transformation is then given by equation (1).

[0010] $$g(x, y) = k \cdot f(x, y) \tag{1}$$

[0011] Here, (x, y) represents pixel coordinates and k is the slope of the transformation function. There is also piecewise linear transformation, which divides the grayscale range into segments and applies a different linear function to each interval; it is not discussed in detail here.

[0012] There are two main methods for nonlinear transformation: logarithmic transformation and gamma correction.

[0013] The logarithmic transformation of an image replaces each pixel value in the image with a logarithmic function of that value. The formula for the logarithmic transformation is shown in equation (2).

[0014] $$g(x, y) = \lambda \cdot \log_{v+1}\bigl(1 + v \cdot f(x, y)\bigr) \tag{2}$$

[0015] Here, f(x, y) and g(x, y) are the grayscale values before and after the transformation; λ is an adjustment constant used to scale the grayscale values so that the transformed image meets the actual requirements; and v+1 is the base of the logarithm. Logarithmic transformation is used for image enhancement because it amplifies dark pixels and compresses bright pixels in the image. Image brightness increases as the parameter v increases, and dark areas are enhanced more quickly.

[0016] The formula for gamma correction is shown in equation (3).

[0017] $$g(x, y) = \lambda \cdot \bigl(f(x, y) + \varepsilon\bigr)^{\gamma} \tag{3}$$

[0018] Here, f(x, y) and g(x, y) are the grayscale values before and after the transformation, and λ and γ are constants. ε is a small offset that avoids a base value of 0. When γ < 1, the grayscale values of the image are mapped toward high-brightness values. Conversely, when γ > 1, the grayscale values of the image are mapped toward low-brightness values.
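The three grayscale mappings described above can be illustrated with a small sketch. It assumes grayscale values normalized to [0, 1] and the forms g = k·f, g = λ·log_{v+1}(1 + v·f), and g = λ·(f + ε)^γ; the function names and default parameter values are illustrative assumptions, not values given in the patent text.

```python
import math

def linear_transform(f, k=1.5):
    """Linear mapping g = k * f, clipped to the valid range [0, 1]."""
    return min(1.0, max(0.0, k * f))

def log_transform(f, lam=1.0, v=10.0):
    """Logarithmic mapping g = lam * log_{v+1}(1 + v * f)."""
    return lam * math.log(1.0 + v * f) / math.log(v + 1.0)

def gamma_transform(f, lam=1.0, gamma=0.5, eps=1e-8):
    """Gamma correction g = lam * (f + eps) ** gamma."""
    return lam * (f + eps) ** gamma

if __name__ == "__main__":
    # Compare how each mapping treats dark, mid, and bright inputs.
    for f in (0.05, 0.25, 0.5, 0.9):
        print(f, round(linear_transform(f), 3),
              round(log_transform(f), 3), round(gamma_transform(f), 3))
```

With these assumed forms, a larger v brightens dark inputs more strongly, and γ < 1 brightens while γ > 1 darkens, which matches the behavior described in the text.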

[0019] (2) Histogram equalization method

[0020] Suppose an image's grayscale histogram covers almost all grayscale values, and the distribution of grayscale values is approximately uniform except for a few significant values. In this case, the image has a wide grayscale dynamic range and high contrast, with relatively rich image details. Histogram equalization algorithms apply the cumulative distribution function (CDF) to adjust the output grayscale levels, making the image's grayscale distribution more uniform. Common histogram equalization algorithms include global histogram equalization, adaptive histogram equalization, and contrast-limited histogram equalization.
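As a minimal sketch of the CDF-based remapping, the following shows global histogram equalization on a flat list of integer gray levels. The function name and the normalization by the first non-zero CDF value are assumptions (one widely used textbook form); adaptive and contrast-limited variants apply the same idea per sub-block and are not shown.

```python
def equalize_histogram(pixels, levels=256):
    """Global histogram equalization on a flat list of integer gray levels."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution function (unnormalized running sum).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is nothing to spread out.
        return list(pixels)

    # Lookup table that stretches the CDF over the full output range.
    lut = [
        round((c - cdf_min) / (total - cdf_min) * (levels - 1)) if c > 0 else 0
        for c in cdf
    ]
    return [lut[p] for p in pixels]

if __name__ == "__main__":
    narrow = [50, 50, 51, 52, 52, 53]  # values squeezed into a small range
    print(equalize_histogram(narrow))
```

The input occupies only four adjacent gray levels; after remapping, the same pixels span the full 0 to 255 range.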

[0021] Global histogram equalization (reference 2: Kaur, Manpreet, Jasdeep Kaur, and Jappreet Kaur. 2013. “Survey of Contrast Enhancement Techniques Based on Histogram Equalization.” International Journal of Advanced Computer Science and Applications, November. doi:10.14569/ijacsa.2011.020721.) is a basic histogram equalization algorithm that processes the entire image, while local histogram equalization equalizes only a portion of the histogram to enhance more image details. However, global histogram equalization sometimes fails to meet practical needs because it may cause the loss of details in areas where enhancement is not desired.

[0022] The basic idea of adaptive histogram equalization (reference 3: Vijayalakshmi, D., Malaya Kumar Nath, and Om Prakash Acharya. 2020. “A Comprehensive Survey on Image Contrast Enhancement Techniques in Spatial Domain.” Sensing and Imaging, December. doi:10.1007/s11220-020-00305-3.) is to divide the image into several sub-blocks and perform histogram equalization on each sub-block separately. Because the histogram density is high in nearly constant regions of the image, adaptive histogram equalization often over-amplifies the contrast of these regions. Therefore, noise may be amplified in these nearly constant regions.

[0023] Contrast-limited adaptive histogram equalization (reference 4: Zuiderveld, Karel. 1994. "Contrast Limited Adaptive Histogram Equalization." In Graphics Gems, 474–85. doi:10.1016/b978-0-12-336156-1.50061-6.) is an adaptive histogram equalization method that reduces noise amplification by limiting contrast amplification. The three methods each have their own focus when processing images, and their variations are shown in Figure (2).

[0024] (3) Retinex-based methods

[0025] Retinex (Reference 4: Land, Edwin H., and John J. McCann. 1971. “Lightness and Retinex Theory.” Journal of the Optical Society of America, January 1. doi:10.1364/josa.61.000001.) was proposed by Edwin H. Land and is a commonly used image enhancement method based on scientific experiments and analysis. The term “Retinex” is a combination of “retina” and “cortex.” The Retinex model is based on three assumptions:

[0026] 1. The real world is colorless; color is the result of the interaction between light and objects. For example, water appears colorless to the human eye, but a film of soapy water appears colored due to the interference of light on its surface.

[0027] 2. Each color region is composed of the three primary colors of red, green and blue at specific wavelengths.

[0028] 3. These three primary colors determine the color of each unit area.

[0029] The calculation formula for Retinex theory is shown in equation (4).

[0030] $$I(x, y) = R(x, y) \cdot L(x, y) \tag{4}$$

[0031] Here, I represents the image observed by the naked eye, R represents the original physical (reflectance) properties of the object, L represents the illumination information, and the product is taken pixel by pixel. The Retinex theory is illustrated in Figure (3).

[0032] There are three main types of Retinex-based methods: the single-scale Retinex algorithm, the multi-scale Retinex algorithm, and the multi-scale Retinex algorithm with color restoration.

[0033] For the single-scale Retinex algorithm (reference 5: Jobson, D.J., Z. Rahman, and G.A. Woodell. 1997. “Properties and Performance of a Center/Surround Retinex.” IEEE Transactions on Image Processing, March, 451–62. doi:10.1109/83.557356.), solving R is mainly viewed as a process of finding singular solutions. The convolution operation in the single-scale Retinex algorithm can be seen as the calculation of image illumination intensity. Its physical meaning can be considered as discounting the image illumination by calculating the weighted average of each pixel and its surrounding region.

[0034] The multi-scale Retinex algorithm (reference 5: Rahman, Z., D.J. Jobson, and G.A. Woodell. 2002. “Multi-Scale Retinex for Color Image Enhancement.” In Proceedings of 3rd IEEE International Conference on Image Processing. doi:10.1109/icip.1996.560995.) builds on the single-scale Retinex algorithm by combining multiple convolution kernels of different scales and averaging their results.
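A rough sketch of the center/surround idea follows, written for a 1-D row of pixel intensities to keep it short. It assumes the common log-domain form, in which the illumination is estimated by a Gaussian-weighted average and the output is log(I) minus log(estimated illumination); the multi-scale variant averages the outputs over several Gaussian widths. The function names, the sigma values, and the edge-replication padding are illustrative assumptions.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering about three sigmas."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Gaussian-weighted local average with edge replication."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(n - 1, max(0, i + j - radius))
            acc += w * signal[idx]
        out.append(acc)
    return out

def single_scale_retinex(signal, sigma, eps=1e-6):
    """log(I) - log(L), with L estimated as the blurred signal."""
    illum = blur(signal, sigma)
    return [math.log(s + eps) - math.log(l + eps)
            for s, l in zip(signal, illum)]

def multi_scale_retinex(signal, sigmas=(1.0, 3.0, 9.0)):
    """Average of single-scale outputs over several Gaussian widths."""
    outputs = [single_scale_retinex(signal, s) for s in sigmas]
    return [sum(vals) / len(sigmas) for vals in zip(*outputs)]

if __name__ == "__main__":
    row = [0.1, 0.1, 0.1, 0.8, 0.8, 0.8]
    print([round(v, 3) for v in multi_scale_retinex(row)])
```

Because the output is a log ratio, halving the brightness of the whole row leaves the result almost unchanged, which is the sense in which the method discounts illumination.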

[0035] However, since both of the above methods ignore the color factor, they often produce color shifts and distortions. The multi-scale Retinex algorithm with color restoration (reference 6: Jobson, D.J., Z. Rahman, and G.A. Woodell. 1997. “A Multiscale Retinex for Bridging the Gap between Color Images and the Human Observation of Scenes.” IEEE Transactions on Image Processing, July, 965–76. doi:10.1109/83.597272.) is designed to solve these problems.

[0036] The above are the main traditional image enhancement methods, which have made significant contributions to early image enhancement and restoration. However, with the development of technology, image restoration has increasingly higher requirements for handling factors such as color, illumination, noise, and artifacts. At the same time, due to the limitations of traditional methods that rely on manual parameter adjustment, they are currently mainly combined with deep learning to handle image restoration tasks.

[0037] 2. Machine learning-based methods

[0038] In recent years, with the successful application of deep learning methods in many computer vision tasks such as face recognition and object detection, many scholars have also widely applied them to the field of image enhancement. Deep learning-based image enhancement methods are data-driven approaches that allow models to automatically learn the features of images under normal lighting conditions, thereby reducing the impact of low lighting on images.

[0039] (1) End-to-end machine learning methods

[0040] End-to-end learning is a deep learning process in which all parameters are trained jointly, rather than stepwise. The most common architecture for deep learning-based image enhancement algorithms is the encoder-decoder architecture. LLNet (Reference 7: Lore, Kin Gwn, Adedotun Akintayo, and Soumik Sarkar. 2017. “LLNet: A Deep Autoencoder Approach to Natural Low-Light Image Enhancement.” Pattern Recognition, January, 650–62. doi:10.1016/j.patcog.2016.06.008.) was the first deep learning-based low-light image enhancement algorithm and achieved remarkable results. Inspired by multi-scale Retinex theory and CNNs (Convolutional Neural Networks), Shen et al. (Reference 8: Shen, Liang, Zihan Yue, Fan Feng, Quan Chen, Shihao Liu, and Jie Ma. 2017. “MSR-Net: Low-Light Image Enhancement Using Deep Convolutional Network.” arXiv, November.) argued that MSR (Multi-Scale Retinex) is equivalent to a feedforward convolutional neural network with different Gaussian convolution kernels, and proposed MSR-Net, which directly learns the end-to-end mapping between dark and bright images.

[0041] (2) Decomposition-based machine learning methods

[0042] Inspired by the success of models based on Retinex theory, many image enhancement studies have combined the idea of image decomposition with deep learning algorithms (such as convolutional neural networks, CNNs). Wei et al.'s Retinex-Net decomposes the input image into reflectance and illumination components using Decom-Net and adjusts the illumination using Enhance-Net, achieving end-to-end low-light enhancement. Zhang et al.'s KinD network (reference 9: Zhang, Yonghua, Jiawan Zhang, and Xiaojie Guo. 2019. “Kindling the Darkness: A Practical Low-Light Image Enhancer.” arXiv: Computer Vision and Pattern Recognition, May.) consists of a layer decomposition network, a reflectance recovery network, and an illumination adjustment network, achieving joint optimization of reflectance recovery and illumination adjustment. Building upon this, they proposed KinD++, which further suppresses artifacts through a multi-scale illumination attention module. Wang et al. proposed GLADNet, which guides illumination enhancement through global illumination estimation while preserving image details by fusing features with the original input. Zhu et al. proposed RRDNet, which decomposes an image into three components: illumination, reflectance, and noise. It achieves denoising and enhancement through joint optimization, thereby further improving the visual quality of low-light images.

[0043] (3) Fusion-based machine learning methods

[0044] Image fusion is a technique that combines multiple images into a single image while preserving the relevant features of each image. Image fusion-based methods typically use images under different exposure conditions as input, or acquire multi-scale features through different feature extraction methods. Cai et al. compared and analyzed the advantages and disadvantages of SICE (Single-Image Contrast Enhancement) and MEF (Multi-Exposure Fusion) methods, and proposed a SICE enhancer based on a convolutional neural network to automatically enhance the contrast of images under different exposure conditions. Lu et al. proposed the dual-branch exposure fusion network TBEFN, which generates two different enhancement results and fuses them to obtain the final image. Zhu et al. proposed EEMEFN, which combines a multi-exposure fusion module with an edge enhancement module to enhance image details and structural information. In addition, Lv et al. proposed MBLLEN (reference 10: Lv, Feifan, F. Lu, Jian Wu, and Chongsoon Lim. 2018. “MBLLEN: Low-Light Image/Video Enhancement Using CNNs.” British Machine Vision Conference, January.), which achieves multi-branch feature fusion through the combined effect of feature extraction, enhancement, and fusion modules, and has achieved good results in suppressing noise and artifacts in low-light regions.

[0045] (4) Unpaired machine learning methods

[0046] Because obtaining paired images of the same scene under both low-light and normal-light conditions is difficult, and models trained on paired data are prone to overfitting, some studies have begun to explore image enhancement methods that do not require paired data. Zhang et al. combined information entropy theory with the Retinex model to propose a self-supervised enhancement method that can be trained using only low-light images. Jiang et al.'s EnlightenGAN model is based on a GAN (Generative Adversarial Network), uses an attention-guided U-Net as the generator, and utilizes dual discriminators to constrain global and local information, thereby achieving unsupervised low-light image enhancement. Furthermore, Guo et al. proposed the Zero-DCE method, which models low-light enhancement as an image-specific curve estimation problem and achieves zero-reference training by designing various no-reference loss functions. They also proposed a lightweight version, Zero-DCE++ (Reference 11: Li, Chongyi, Chunle Guo, and Chen Change Loy. 2021. “Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation.” IEEE Transactions on Pattern Analysis and Machine Intelligence, January, 1–1. doi:10.1109/tpami.2021.3063604.). Zhang et al. further proposed ExCNet, which achieves image brightness restoration by estimating an “S-curve” suitable for backlit images.

[0047] In summary, most current image exposure restoration techniques primarily focus on low-light enhancement to address common low-light problems, while research on simultaneous overexposure and underexposure is relatively limited. This method addresses this issue, not only handling common low-light enhancement needs but also demonstrating excellent performance under complex lighting conditions.

Summary of the Invention

[0048] This method aims to overcome difficulties such as overcorrection and noise introduction in images with complex exposures. It improves the image exposure correction effect by proposing a dynamic convolution and dual-branch network model guided by the image grayscale, thus providing a research foundation for image inpainting and other computer vision tasks.

[0049] This method primarily studies image exposure correction from the perspective of image inversion. As shown in Figure 4, when both the original image and its inverse are considered, a major factor limiting the ability of Retinex theory to handle simultaneous underexposure and overexposure is that the underexposed and overexposed regions of an image largely do not overlap. Using only a single convolution kernel to estimate the image's illumination makes it difficult to extract feature information from both underexposed and overexposed areas at the same time. Furthermore, the problems associated with underexposed and overexposed regions are not entirely the same. For underexposed regions, the focus is more on addressing noise and artifacts introduced during illumination recovery; for overexposed regions, factors such as color shift need to be considered.
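The role of inversion can be illustrated with a toy example: inverting a normalized image turns its overexposed pixels into dark ones, so a branch that looks at the inverse sees the overexposed regions in the same way the other branch sees underexposed regions. The thresholds `low` and `high` and the function names are hypothetical, chosen only for this illustration.

```python
def invert(image):
    """Inverse of a grayscale image with values in [0, 1]."""
    return [[1.0 - p for p in row] for row in image]

def exposure_masks(image, low=0.2, high=0.8):
    """Boolean masks of underexposed (< low) and overexposed (> high) pixels."""
    under = [[p < low for p in row] for row in image]
    over = [[p > high for p in row] for row in image]
    return under, over

if __name__ == "__main__":
    img = [[0.05, 0.50, 0.95],
           [0.10, 0.90, 0.30]]
    under, over = exposure_masks(img)
    inv_under, inv_over = exposure_masks(invert(img))
    print(over == inv_under)  # overexposed pixels are dark in the inverse
    print(under == inv_over)  # underexposed pixels are bright in the inverse
```

The two masks of a single image never select the same pixel, which reflects the observation that underexposed and overexposed regions largely do not overlap.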

[0050] Based on the above considerations, this method proposes a simple yet effective dual-branch feature extraction approach to extract feature information from the original and inverted images in high-dimensional space. Building upon this, a dynamic convolution guided by the grayscale image is proposed for feature information fusion to generate an initial illumination map. An initial restored image is then obtained by applying Retinex theory. For issues such as noise, artifacts, and color shifts in the image, the high-dimensional information extracted from the original and inverted images is used as a guide. A UNet-based network model with a channel attention mechanism is employed to obtain perturbation terms. Finally, the perturbation terms and the initial restored image are combined to obtain the final repaired image.

[0051] As shown in Figure 5, our method is compared with Retinexformer, a model that performs exceptionally well on multiple low-light enhancement datasets. The comparison reveals that both our method and Retinexformer perform reasonably well in repairing underexposed areas. However, when repairing overexposed areas, Retinexformer introduces significant noise in the sky region, severely impacting the visual experience, while our method avoids this issue.

[0052] Next, the main contents of the present invention will be described in detail, specifically including the following steps:

[0053] Step 1: Design of Region-Distributed Dynamic Convolution

[0054] The high-dimensional feature information extracted from the original and reverse images includes illumination, color, and texture information contained in underexposed and overexposed regions. This information plays a crucial guiding role in identifying perturbation terms and obtaining preliminary illumination maps during image exposure restoration. Therefore, this method first proposes a dynamic region convolution feature fusion method guided by exposure probability to achieve adaptive convolution processing for different exposure regions during the dual-branch image feature fusion process.

[0055] To effectively fuse the high-dimensional features extracted from the two branches, this method designs a dynamic region convolution module. The core idea of this module is to adaptively select different convolution kernels for feature calculation based on the exposure state probability of each pixel in the image, thereby achieving differentiated processing for different exposure regions.

[0056] Specifically, this method first uses the grayscale information of the input image to generate an illumination probability guide map. This probability map contains three channels, representing the probability distribution of each pixel belonging to three states: underexposure, normal exposure, and overexposure. Each pixel value in the probability map represents the confidence level of that pixel belonging to a certain exposure state, thus providing a pixel-level region division basis for subsequent dynamic convolution.
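The text does not state how the three-channel probability map is computed from the grayscale image. Purely as an illustration, the sketch below uses a closed-form stand-in: each pixel's gray value is scored against three hypothetical anchor levels (dark, mid, bright) and the scores are normalized with a softmax. The anchor levels, the `sharpness` parameter, and the luma weights are assumptions; in the described method this map may well come from a learned layer instead.

```python
import math

def to_gray(rgb):
    """Grayscale value from an (r, g, b) triple using BT.601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def exposure_probabilities(gray, centers=(0.0, 0.5, 1.0), sharpness=10.0):
    """Return [p_under, p_normal, p_over] for one gray value in [0, 1]."""
    # Higher score when the gray value is closer to a state's anchor level.
    scores = [-sharpness * (gray - c) ** 2 for c in centers]
    # Numerically stable softmax so the three probabilities sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    for rgb in ((0.05, 0.04, 0.06), (0.5, 0.5, 0.5), (0.97, 0.95, 0.9)):
        probs = exposure_probabilities(to_gray(rgb))
        print([round(p, 3) for p in probs])
```

Each pixel thus receives a confidence for every exposure state, which is the form of input the region selection step expects.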

[0057] In the dynamic convolution module, a kernel generation network first generates multiple candidate convolution kernels from the input features. Specifically, the input features first undergo adaptive average pooling to unify their spatial dimensions to the size of the convolution kernels. Then, two layers of pointwise convolutional networks generate multiple convolution kernel parameters. These kernels are organized into convolution operators corresponding to multiple regions, with each region corresponding to a specific set of convolution kernels, thus forming a region-specific convolution kernel set. In this invention, the dynamic convolution module can generate multiple different convolution kernels, each responsible for processing a potential exposure region feature.
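A compact sketch of such a kernel generation step is given below, using nested lists instead of a tensor library. It pools each input channel to k×k, applies two pointwise (1×1) layers, and regroups the result into one set of k×k filters per region. The sigmoid between the two layers, the hidden width, the random weights standing in for learned parameters, and all names are assumptions made for illustration.

```python
import math
import random

def adaptive_avg_pool(channel, k):
    """Average-pool one H x W channel down to k x k using floor/ceil bins."""
    h, w = len(channel), len(channel[0])
    pooled = []
    for i in range(k):
        r0, r1 = (i * h) // k, -((-(i + 1) * h) // k)
        row = []
        for j in range(k):
            c0, c1 = (j * w) // k, -((-(j + 1) * w) // k)
            vals = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        pooled.append(row)
    return pooled

def pointwise(features, weights):
    """1x1 convolution: mixes channels independently at each position."""
    k = len(features[0])
    return [[[sum(w * features[c][i][j] for c, w in enumerate(w_row))
              for j in range(k)] for i in range(k)] for w_row in weights]

def generate_kernels(features, num_regions, c_out, hidden=4, k=3, seed=0):
    """Return kernels[region][out_channel][in_channel] as k x k filters."""
    rng = random.Random(seed)  # random weights stand in for learned ones
    c_in = len(features)
    pooled = [adaptive_avg_pool(ch, k) for ch in features]
    w1 = [[rng.uniform(-1, 1) for _ in range(c_in)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)]
          for _ in range(num_regions * c_out * c_in)]
    h1 = pointwise(pooled, w1)
    h1 = [[[1.0 / (1.0 + math.exp(-v)) for v in row] for row in ch] for ch in h1]
    flat = pointwise(h1, w2)  # (num_regions * c_out * c_in) maps of size k x k
    kernels, idx = [], 0
    for _ in range(num_regions):
        region = []
        for _ in range(c_out):
            region.append(flat[idx:idx + c_in])
            idx += c_in
        kernels.append(region)
    return kernels

if __name__ == "__main__":
    feats = [[[float(r * 6 + c) for c in range(6)] for r in range(6)] for _ in range(2)]
    ks = generate_kernels(feats, num_regions=3, c_out=2)
    print(len(ks), len(ks[0]), len(ks[0][0]), len(ks[0][0][0]))
```

Because the kernels are computed from the input features, every image gets its own filters, with one filter set per exposure region.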

[0058] Next, convolution calculations are performed between the input features and all candidate convolution kernels through relevant operations. This method employs an efficient implementation based on grouped convolution, transforming the convolution operation of batch samples into a single parallel convolution calculation, thereby significantly improving computational efficiency. Through this step, output feature maps corresponding to multiple candidate convolution kernels can be obtained.

[0059] Subsequently, this method designs a region selection mechanism that uses the exposure probability map to determine which convolution kernel output is used at each pixel location. Specifically, the index of the maximum value of the exposure probability map along the channel dimension is taken first, which gives the most likely exposure category for each pixel. Then, a one-hot region mask is constructed using a scatter operation. This mask corresponds one-to-one with the candidate convolution outputs in spatial location. The convolution output of the corresponding region is preserved by element-wise multiplication, and a summation is performed along the region dimension to obtain the final dynamic convolution output.
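The selection step can be sketched in a few lines for a single output channel. The layout `candidates[r][y][x]` (output of region r's kernel at pixel (y, x)) and `probs[r][y][x]` (exposure probability of state r) is an assumed convention, as are the function and variable names.

```python
def select_regions(candidates, probs):
    """Pick, per pixel, the candidate output of the most probable region."""
    num_regions = len(probs)
    h, w = len(probs[0]), len(probs[0][0])

    # Argmax over the region (channel) dimension for every pixel.
    labels = [[max(range(num_regions), key=lambda r: probs[r][y][x])
               for x in range(w)] for y in range(h)]

    # Scatter the labels into a one-hot mask of shape (regions, H, W).
    mask = [[[0.0] * w for _ in range(h)] for _ in range(num_regions)]
    for y in range(h):
        for x in range(w):
            mask[labels[y][x]][y][x] = 1.0

    # Element-wise multiply and sum over the region dimension.
    output = [[sum(mask[r][y][x] * candidates[r][y][x]
                   for r in range(num_regions))
               for x in range(w)] for y in range(h)]
    return output, mask

if __name__ == "__main__":
    candidates = [[[10, 10, 10]], [[20, 20, 20]], [[30, 30, 30]]]
    probs = [[[0.7, 0.2, 0.1]], [[0.2, 0.6, 0.1]], [[0.1, 0.2, 0.8]]]
    out, mask = select_regions(candidates, probs)
    print(out)
```

In the toy input, the three pixels are most likely underexposed, normally exposed, and overexposed respectively, so each one takes its value from a different candidate output.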

[0060] To ensure end-to-end training of the dynamic convolution module, this method further designs a corresponding backpropagation gradient calculation method. During forward propagation, the output features are obtained by weighted summation of the candidate convolution outputs and the region mask. Therefore, during backpropagation, the gradient of the convolution kernel parameters can be directly propagated through the region mask; that is, the gradient is equal to the element-wise product of the output gradient and the region mask. This approach ensures that different region convolution kernels only receive gradient information from pixels in their corresponding regions, thereby achieving region-specific parameter updates.

[0061] For exposure probability guiding maps, since discrete region selection is generated through the argmax operation during the forward propagation, its gradient cannot be directly propagated through this operation. To address this issue, this method employs a softmax approximation gradient propagation strategy. Specifically, during backpropagation, the gradient contribution of region selection to the guiding features is first calculated based on the convolutional kernel output. Then, the gradient is redistributed using the softmax function to satisfy probability distribution constraints. This method can effectively update the parameters of the guiding feature network while maintaining the discreteness of region selection, thereby ensuring the trainability of the entire dynamic convolution module.
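The forward and backward rules described in the two paragraphs above can be written out as follows. This is one plausible formalization, not a formula given in the patent: the symbols $X$ (input features), $W_j$ (kernel of region $j$), $F$ (guiding features), $M$ (one-hot mask), $\hat{M}$ (softmax surrogate), and $\mathcal{L}$ (training loss) are all assumed notation, and the last line is simply the standard softmax Jacobian applied to the surrogate.

```latex
% Forward pass: hard region selection at pixel (u, v) over m regions
Y_{u,v} = \sum_{j=1}^{m} M_{j,u,v}\,\bigl(X * W_j\bigr)_{u,v},
\qquad
M_{j,u,v} =
\begin{cases}
1, & j = \arg\max_{i} F_{i,u,v} \\
0, & \text{otherwise}
\end{cases}

% Backward pass to the kernels: gradients flow only through each region's own pixels
\frac{\partial \mathcal{L}}{\partial \bigl(X * W_j\bigr)_{u,v}}
  = M_{j,u,v}\,\frac{\partial \mathcal{L}}{\partial Y_{u,v}}

% Backward pass to the guide: replace the one-hot mask by a softmax surrogate
\hat{M}_{j,u,v} = \frac{\exp F_{j,u,v}}{\sum_{i=1}^{m} \exp F_{i,u,v}},
\qquad
\frac{\partial \mathcal{L}}{\partial \hat{M}_{j,u,v}}
  = \Bigl\langle \frac{\partial \mathcal{L}}{\partial Y_{u,v}},\ \bigl(X * W_j\bigr)_{u,v} \Bigr\rangle

\frac{\partial \mathcal{L}}{\partial F_{j,u,v}}
  = \hat{M}_{j,u,v}\Bigl(
      \frac{\partial \mathcal{L}}{\partial \hat{M}_{j,u,v}}
      - \sum_{i=1}^{m} \hat{M}_{i,u,v}\,\frac{\partial \mathcal{L}}{\partial \hat{M}_{i,u,v}}
    \Bigr)
```

Under this reading, the forward pass stays discrete, while the guiding features still receive a gradient that is consistent with a probability distribution over regions.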

[0062] Step 2: Design of Combined Loss Function Strategy

[0063] In image enhancement and restoration tasks involving complex exposures, a single loss function often struggles to simultaneously achieve pixel-level accuracy, structural consistency, and preservation of high-level semantic information. Traditional pixel reconstruction losses alone, such as the L1 or L2 loss, ensure a similar overall brightness distribution but often fall short in texture detail restoration and structural preservation. Conversely, using only perceptual losses can lead to overall image brightness shifts or color distortion. Therefore, to simultaneously guarantee pixel accuracy, structural consistency, and preservation of high-level semantic information during image restoration, this method designs a combined loss function. By combining multiple loss functions with weights, it imposes multidimensional constraints on the network training process.

[0064] Specifically, this method employs three loss functions for joint optimization: the L1 reconstruction loss (L1), the cosine similarity loss (Lcos), and the VGG perceptual loss (Lvgg). The L1 loss primarily constrains the pixel-level differences between the reconstructed image and the true reference image. Compared to the L2 loss, the L1 loss is less sensitive to outliers, effectively reducing over-smoothing during image reconstruction and thus better preserving image detail. The L1 loss is shown in equation (5).

[0065] $$L_{1} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{I}_{i} - I_{i} \right| \tag{5}$$

[0066] Here, $\hat{I}_{i}$ represents the value of the i-th pixel in the recovered image, $I_{i}$ represents the value of the i-th pixel in the real image, and N represents the total number of pixels in the image.

[0067] During training, by calculating the absolute error per pixel between the recovered image and the real image, it can be ensured that the network output is consistent with the target image in terms of overall brightness and color distribution.
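A minimal sketch of the per-pixel L1 loss on flattened images follows, with an L2 (mean squared error) counterpart included only for comparison. The function names are illustrative.

```python
def l1_loss(restored, target):
    """Mean absolute error between two equally sized flat pixel lists."""
    if len(restored) != len(target):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(r - t) for r, t in zip(restored, target)) / len(restored)

def l2_loss(restored, target):
    """Mean squared error, shown only to contrast with the L1 loss."""
    if len(restored) != len(target):
        raise ValueError("images must have the same number of pixels")
    return sum((r - t) ** 2 for r, t in zip(restored, target)) / len(restored)

if __name__ == "__main__":
    restored = [0.2, 0.5, 0.9]
    target = [0.25, 0.5, 0.8]
    print(l1_loss(restored, target), l2_loss(restored, target))
```

Squaring makes a single large error dominate the L2 value, whereas the L1 value grows only linearly with it, which is the outlier behavior mentioned in the text.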

[0068] Secondly, this method introduces the cosine similarity loss Lcos to constrain the consistency between the restored image and the real image in the feature direction. Cosine similarity mainly measures the directional similarity between two vectors, without being affected by the vector magnitude. The formula for calculating the cosine loss function is shown in equation (6).

[0069] $$L_{cos} = 1 - \frac{F_{rec} \cdot F_{gt}}{\lVert F_{rec} \rVert \, \lVert F_{gt} \rVert} \tag{6}$$

[0070] Here, $F_{rec}$ represents the feature vector of the recovered image, $F_{gt}$ represents the feature vector of the real image, and $\lVert \cdot \rVert$ denotes the L2 norm.

[0071] In image restoration tasks, mapping image features to high-dimensional vectors and calculating cosine similarity can effectively constrain the consistency of the restored image in terms of structural and texture representation. Compared to simple pixel error constraints, this loss focuses more on image structural information, enabling the network to better preserve object edges and texture details during the restoration process.
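The cosine term can be sketched as one minus the cosine similarity of two feature vectors. The "one minus" form, the small `eps` guard against zero-length vectors, and the function name are assumptions; how the feature vectors are obtained from the images is not specified here.

```python
import math

def cosine_loss(a, b, eps=1e-8):
    """1 - cosine similarity between two equally long feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / max(norm_a * norm_b, eps)

if __name__ == "__main__":
    v = [1.0, 2.0, 3.0]
    print(cosine_loss(v, [2.0, 4.0, 6.0]))      # same direction, different magnitude
    print(cosine_loss([1.0, 0.0], [0.0, 1.0]))  # orthogonal directions
```

Scaling a vector leaves the loss essentially unchanged, so this term responds to the direction of the features rather than to overall brightness.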

[0072] Furthermore, to further improve the visual quality of the images, this method introduces a deep feature-based perceptual loss, Lvgg. This loss extracts feature representations of the image at different levels through a pre-trained VGG network and calculates the difference between the reconstructed image and the real image in the feature space. Since deep convolutional networks can extract more abstract and semantic image features, the perceptual loss can effectively constrain the consistency of the reconstructed image in high-level semantic structure, thereby improving the overall visual realism and naturalness of the image. It should be noted that during training, the parameters of the VGG network remain fixed and are used only as a feature extractor, without participating in network parameter updates.

[0073] Combining the three loss functions mentioned above, this method constructs a combined loss function as shown in equation (7):

[0074] $$L_{total} = \lambda_{1} L_{1} + \lambda_{2} L_{cos} + \lambda_{3} L_{vgg} \tag{7}$$

[0075] where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ represent the weight coefficients of the different loss terms, used to adjust the contribution of each loss to the overall optimization objective. By setting the weight parameters appropriately, a good balance can be achieved between pixel accuracy, structural consistency, and perceptual quality.
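The weighted combination of the three loss terms can be illustrated with a minimal NumPy sketch. It is not the implementation of this method: the cosine term is computed here on flattened arrays rather than on learned feature vectors, `perceptual_fn` is a hypothetical stand-in for the frozen VGG feature distance, and the weights `w_l1`, `w_cos`, and `w_vgg` are left as arguments because their assignment to individual terms is an assumption.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error over all pixels."""
    return float(np.mean(np.abs(pred - target)))


def cosine_loss(pred_feat, target_feat, eps=1e-8):
    """One minus the cosine similarity of the flattened inputs."""
    a = pred_feat.ravel()
    b = target_feat.ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return float(1.0 - cos)


def combined_loss(pred, target, perceptual_fn, w_l1, w_cos, w_vgg):
    """Weighted sum of pixel, direction, and perceptual terms.

    perceptual_fn is a hypothetical callable standing in for a distance
    computed on features of a frozen, pre-trained VGG network.
    """
    return (w_l1 * l1_loss(pred, target)
            + w_cos * cosine_loss(pred, target)
            + w_vgg * perceptual_fn(pred, target))
```

Because the cosine term ignores magnitude, two images that differ only by a global brightness scale give a cosine loss near zero while the L1 term stays non-zero, which is why the terms are used together.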

[0076] Compared to traditional single loss functions, the combined loss function proposed in this method constrains the image restoration process from three different levels: pixel level, structural level, and semantic level. This enables the network to not only accurately restore the brightness and color information of the image but also maintain the structural details and visual realism of the image. Experimental results show that in complex exposure image restoration tasks, this combined loss function can effectively improve the reconstruction quality and visual effect of the model, thereby further improving the overall performance of the image enhancement system.

[0077] Step 3: Design of Image Exposure Restoration Network Based on Retinex Theory and Attention Mechanism

[0078] Building upon the first two steps of dynamic convolution design and combined loss function design, the third step involves designing a specific image exposure restoration network model. This consists of the following three aspects:

[0079] 3.1 Network Structure Design

[0080] The overall network structure of this method is shown in Figure 1. The main purpose of the network is to extract content and information from the different regions of an image that contains both underexposed and overexposed areas. The network has a two-branch structure that extracts high-dimensional feature information from the different exposure areas of the image. The first branch focuses on the feature information of underexposed areas, while the second branch focuses on the feature information of overexposed areas.

[0081] First, the network selects a batch of input images. Then, it uses a simple inversion operation to obtain the inverse image corresponding to each original image, and inputs the original image and the inverse image into the two branches respectively. High-dimensional information of the image is extracted through two layers of convolution. The high-dimensional information has two main functions: (1) through concatenation, it serves as guidance information for the subsequent channel attention mechanism, which estimates the perturbation terms in the image, such as noise, artifacts, and color shifts; (2) it is fused through dynamic convolution to obtain the initial illumination map, from which the initial restored image is obtained based on Retinex theory.

[0082] Next, the high-dimensional information is guided by the three-channel exposure probability map obtained from the grayscale image and input into the dynamic convolution. The dynamic convolution selects different convolution kernels according to the exposure probability information of each pixel to fuse the high-dimensional information, and obtains the initial illumination map through downsampling. Based on Retinex theory, the initial restored image is obtained.
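The per-pixel kernel selection described above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the network of this method: the candidate kernels are 1×1 and passed in directly (the kernel generation network and the downsampling step are omitted), and the exposure probability map is taken as a given input. The names `region_dynamic_fusion`, `features`, `kernels`, and `prob_map` are hypothetical.

```python
import numpy as np


def region_dynamic_fusion(features, kernels, prob_map):
    """Fuse features with a different kernel per exposure region.

    features: (C_in, H, W) concatenated dual-branch features
    kernels:  (R, C_out, C_in) one 1x1 kernel per region (simplification)
    prob_map: (R, H, W) per-pixel probability of each exposure state
    """
    # Candidate outputs: every kernel applied at every pixel -> (R, C_out, H, W)
    candidates = np.einsum("roc,chw->rohw", kernels, features)

    # Exposure category of each pixel: channel with the maximum probability
    region = np.argmax(prob_map, axis=0)  # (H, W)

    # One-hot region mask, broadcastable over output channels -> (R, 1, H, W)
    num_regions = kernels.shape[0]
    mask = np.arange(num_regions)[:, None, None] == region[None]
    mask = mask.astype(features.dtype)[:, None, :, :]

    # Keep only the selected candidate at each pixel, sum over the region axis
    return np.sum(candidates * mask, axis=0)
```

Computing all candidates and masking afterwards mirrors the description (all candidate convolutions, then region selection), at the cost of extra computation compared with evaluating only the selected kernel per pixel.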

[0083] Then, the initial restored image is input into a UNet-based network, which primarily uses channel attention, employing the input information as the Query and Key, and combining the input information with high-dimensional feature information to form the Value, which is then used for positional encoding. The network's final output yields the perturbation term in the restored image.

[0084] Finally, the perturbation term is subtracted from the initial restored image to obtain the final restored image.
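The data flow from inversion to the final restored image can be sketched as below. The exact Retinex formulation used by this method is not given in this section, so the sketch assumes the classic decomposition in which an image is the element-wise product of reflectance and illumination; the function names, the `eps` guard, and the clipping to [0, 1] are illustrative assumptions.

```python
import numpy as np


def inverse_image(img):
    """Inverse of an image normalized to [0, 1]."""
    return 1.0 - img


def retinex_restore(img, illumination, eps=1e-4):
    """Assumed classic Retinex: img = reflectance * illumination.

    Dividing out the illumination map gives the initial restored image.
    eps guards against division by very small illumination values.
    """
    return np.clip(img / np.maximum(illumination, eps), 0.0, 1.0)


def final_restore(initial, perturbation):
    """Subtract the estimated perturbation (noise, artifacts, color shift)."""
    return np.clip(initial - perturbation, 0.0, 1.0)
```

In the full method the illumination map and the perturbation term are predicted by the network; here they are plain arrays so the arithmetic can be followed in isolation.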

[0085] 3.2 Network Training Strategy

[0086] This experiment primarily utilizes paired supervised learning data. To improve the model's generalization ability and training efficiency, the original images are randomly cropped during training, with local patches used as input samples. Geometric data augmentation strategies, including random flipping and rotation, are also employed to increase data diversity. Furthermore, a Mixup data augmentation strategy is introduced during training, generating new training samples through linear mixing of different samples. This effectively alleviates overfitting and enhances the model's robustness.
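A minimal NumPy sketch of the paired augmentation steps follows. It assumes that the input image and its reference are cropped and flipped identically and that Mixup blends both with the same coefficient drawn from a Beta distribution; rotation is omitted for brevity. All function names are hypothetical.

```python
import numpy as np


def paired_random_crop(inp, gt, size, rng):
    """Cut the same random size x size patch from input and reference."""
    h, w = inp.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    window = (slice(top, top + size), slice(left, left + size))
    return inp[window], gt[window]


def paired_random_flip(inp, gt, rng):
    """Flip both images horizontally with probability 0.5."""
    if rng.random() < 0.5:
        inp, gt = inp[:, ::-1], gt[:, ::-1]
    return inp, gt


def mixup(inp_a, gt_a, inp_b, gt_b, beta, rng):
    """Blend two training pairs with one coefficient lam ~ Beta(beta, beta)."""
    lam = rng.beta(beta, beta)
    mixed_inp = lam * inp_a + (1.0 - lam) * inp_b
    mixed_gt = lam * gt_a + (1.0 - lam) * gt_b
    return mixed_inp, mixed_gt, lam
```

A generator such as `rng = np.random.default_rng(0)` can be passed to all three helpers so that a training run is reproducible.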

[0087] In terms of optimization strategies, the model uses the Adam optimizer for parameter updates and enables gradient clipping during training to prevent gradient explosion and improve training stability. The learning rate scheduling strategy employs Cosine Annealing Restart Cyclic Learning Rate, dividing the training process into two phases: the first phase maintains a high learning rate to accelerate model convergence, and in the remaining iterations the learning rate is gradually decayed from the high value using cosine annealing to obtain more stable and refined model parameters.

[0088] Regarding the loss function, the model uses Combined Loss as a supervision signal, constraining the network output by integrating pixel-level errors and structural information errors, thus achieving better visual quality while preserving detailed information. Throughout the training process, the PSNR metric is calculated at intervals of a certain number of training steps to evaluate model performance. Simultaneously, tools such as TensorBoard are used to record changes in loss and performance metrics during training to observe the model's convergence and training stability.

[0089] In summary, the main contribution of this method is the design of an image exposure correction neural network model based on Retinex theory, a classic topic in the field of computer vision. Starting with the analysis of images with exposure problems, this method comprehensively considers the distribution characteristics of underexposed and overexposed areas, as well as noise and color shifts easily introduced during the restoration process. It treats underexposed and overexposed areas separately, employing inverted images and dual-branch extraction methods. This approach overcomes, to some extent, the influence of different focuses in restoring underexposed and overexposed areas, demonstrating a degree of innovation.

[0090] Experimental results demonstrate that the proposed image exposure correction method outperforms most existing methods in terms of peak signal-to-noise ratio and structural similarity, and is more applicable in practical scenarios.

Attached Figure Description

[0091] Figure 1 is the overall network structure diagram proposed by this method, which is described in detail in Step 3.

[0092] Figure 2 is a diagram showing the effect of histogram equalization.

[0093] Figure 3 is a schematic diagram of the Retinex theory.

[0094] Figure 4 is a diagram illustrating the inversion of an image.

[0095] Figure 5 is a comparison of the exposure correction results of the network model of this method with those of state-of-the-art models.

Detailed Implementation

[0096] The technical solution, experimental method, and test results of this method are described in further detail below with reference to the accompanying drawings and specific experimental implementations.

[0097] (1) Image dataset and evaluation metrics

[0098] The following section introduces the test datasets used in the experiments. This study experimentally validated the proposed method on several publicly available low-light image enhancement datasets, including LCDP, MSEC, LOLv1, LOLv2, and SMID datasets. The LCDP and MSEC datasets were primarily used for experiments on complex exposure recovery tasks, while the LOLv1, LOLv2, and SMID datasets were used to further verify the model's generalization ability under different low-light environments.

[0099] The LCDP dataset is a paired dataset designed for complex exposure image restoration tasks. It contains low-quality images acquired under real-world complex lighting conditions and their corresponding reference images. The images in the dataset typically exhibit underexposure, overexposure, and uneven local illumination, thus effectively reflecting the actual image degradation under complex lighting conditions. The LCDP dataset includes 1415 images in the training set, 100 images in the validation set, and 218 images in the test set.

[0100] The MSEC dataset is a multi-scene exposure correction dataset containing images with exposure problems from various shooting environments, such as low-light indoor scenes, nighttime scenes, and localized bright light. This dataset exhibits good scene diversity, and images often show issues such as overall insufficient brightness or localized overbrightness, making it frequently used to evaluate the correction capabilities of algorithms in complex exposure environments.

[0101] The LOLv1 dataset is one of the classic datasets in the field of low-light image enhancement. The LOLv2 dataset expands upon LOLv1, including both real-world and synthetic sub-datasets, and features a wider variety of scene types and lighting variations. The SMID dataset is primarily used for image enhancement research in extremely low-light environments. It uses short-exposure and long-exposure images to create paired datasets to evaluate the algorithm's ability to recover details in extremely dark environments.

[0102] Experiments on the aforementioned datasets allow for a comprehensive evaluation of the proposed method's image enhancement performance under complex exposure and low-light conditions.

[0103] In the experiments, this invention primarily selected PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure) as evaluation metrics. PSNR is an objective evaluation metric based on pixel error, used to measure the pixel-level difference between the reconstructed image and the original image. A higher PSNR value indicates a smaller error between the reconstructed image and the reference image, and thus higher image quality. This metric can intuitively reflect the overall reconstruction accuracy of the image, and is therefore widely used in image restoration and image enhancement tasks.
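As an illustration of the pixel-error metric, PSNR can be computed from the mean squared error as in the following NumPy sketch. It assumes images normalized to [0, 1], so the peak value `max_val` defaults to 1.0; the function name is hypothetical.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

For example, a uniform error of 0.1 on a [0, 1] image gives a mean squared error of 0.01 and therefore a PSNR of 20 dB.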

[0104] SSIM is a structural information-based image quality assessment metric that evaluates image quality by comparing the similarity of two images in terms of brightness, contrast, and structure. SSIM values typically range from 0 to 1, with values closer to 1 indicating greater structural similarity. Compared to PSNR, SSIM better aligns with human visual perception of image quality, thus better reflecting the preservation of structure and overall visual quality of an image.

[0105] (2) Experimental details and main parameter configuration

[0106] Taking the LCDP dataset as an example, the dataset uses paired images, where the input is a complex exposure-degraded image and the target is the corresponding high-quality reference image. The training, validation, and test sets are divided from the dataset. During training, ground truth (gt) and input are used as supervision data pairs, while valid-gt / valid-input and test-gt / test-input are used for model evaluation during validation and testing, respectively. To improve the model's generalization ability, geometric data augmentation, including random flipping, is performed on the images during training. Simultaneously, random cropping is used to generate training samples, cropping the input image into 384×384 image patches for training, thereby improving the model's ability to learn local details while ensuring training efficiency.

[0107] In terms of model structure, this experiment uses 3 input and 3 output channels, 40 feature dimensions, 1 network stage, and [1, 2, 2] attention modules. For optimization, the model uses the Adam optimizer for parameter updates, with an initial learning rate of 2×10⁻⁴ and momentum parameters β1=0.9 and β2=0.999. During training, the Mixup data augmentation strategy (mixup_beta=1.2) was introduced, and gradient clipping was enabled to improve training stability.

[0108] In terms of learning rate scheduling, the Cosine Annealing Restart CyclicLR learning rate decay strategy is adopted, dividing the training process into two stages: a high learning rate is maintained for the first 38,000 iterations to accelerate model convergence, and then the learning rate is gradually reduced through a cosine annealing strategy in the subsequent 82,000 iterations, with the minimum learning rate set to 1×10⁻⁶. The entire model training process consisted of 120,000 iterations with a batch size of 4. During the validation phase, the model's performance was evaluated every 1,000 iterations, and the PSNR metric was used to quantitatively assess the enhancement results. Simultaneously, TensorBoard was used to record the loss changes and model performance during training to observe the model's training status and convergence.
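The two-stage schedule can be sketched as a plain function of the iteration index. This is an illustrative reading of the description, not the scheduler class used in the experiments: it assumes an initial learning rate of 2×10⁻⁴, holds it for 38,000 iterations, and then follows a single cosine curve down to 1×10⁻⁶ over the remaining 82,000 iterations. The function and parameter names are hypothetical.

```python
import math


def learning_rate(step, base_lr=2e-4, min_lr=1e-6,
                  hold_steps=38_000, decay_steps=82_000):
    """Learning rate at a given iteration (0-based)."""
    if step < hold_steps:
        # Stage 1: keep the high learning rate to speed up convergence
        return base_lr
    # Stage 2: cosine annealing from base_lr down to min_lr
    progress = min((step - hold_steps) / decay_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Halfway through the decay stage (iteration 79,000) this gives roughly the midpoint between the two rates, about 1.005×10⁻⁴.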

[0109] (3) Experimental results of the exposure correction network

[0110] Based on the evaluation metrics and experimental details described above, this method was tested on five datasets, and the corresponding experimental results were obtained. As shown in Table 2, this experiment compared this method with other state-of-the-art network models. It can be seen that the network model proposed in this method exhibits superior performance.

[0111] Table 2 presents the quantitative comparison results of various low-light image enhancement methods on the LCDP dataset, with PSNR and SSIM as the evaluation metrics.

[0112] Table 2. Comparison results of our method with other methods on the LCDP dataset.

[0113]

[0114] As the data shows, our proposed method, with a PSNR of 24.29 and an SSIM of 0.8726, comprehensively outperforms all other comparative algorithms, significantly surpassing the Exposure-slot model published at CVPR 2025, demonstrating its superior ability to recover image details and maintain structural consistency. Traditional methods such as Zero-DCE performed the weakest, while deep learning-based methods were generally superior. While MECR-LLE and LCPPNet showed some competitiveness, they did not surpass our proposed method. The advantage of our method stems from its unique network architecture and loss function design, which more effectively balances brightness enhancement and noise suppression. These results fully validate the advancement of our method on the LCDP benchmark, providing solid data support for correcting image exposure under real-world complex lighting conditions.

[0115] Table 3 Comparison results of our method with other methods on multiple low-light datasets.

[0116] To further demonstrate the effectiveness of the proposed network structure and training strategy, ablation experiments were designed based on the LCDP dataset. The specific ablation experiment results are shown in Table 4.

[0117] Table 4 Comparison of ablation test results

[0118]

[0119] Table 4 (continued) Comparison of ablation test results

[0120]

[0121] In the ablation experiments, the training, validation, and test sets were each approximately one-third subsets drawn from the full dataset. These ablation experiments demonstrate that the proposed dual-branch network structure, dynamic convolution, and combined loss function are effective in handling the task of correcting complex exposure images.

[0122] In summary, this invention proposes a dual-branch feature fusion image exposure restoration method based on Retinex theory. By focusing on different exposure regions through a dual-branch feature extraction network model, more effective information is extracted in the high-dimensional space. The proposed network achieves a PSNR of 24.29 and an SSIM of 0.8726 on the LCDP dataset for complex exposure images, outperforming state-of-the-art methods. In addition, this method demonstrates excellent performance on low-light datasets such as LOLv1, LOLv2, and SMID, comparable to leading low-light enhancement models.

Claims

1. A dual-branch feature fusion image exposure restoration method based on Retinex theory, characterized in that: a two-branch feature extraction network model is used to extract high-dimensional features from underexposed and overexposed areas respectively; based on the image's exposure probability distribution, different convolution kernels are generated to fuse and downsample the high-dimensional information to obtain an illumination map; an initial restoration map containing a perturbation term is obtained based on Retinex theory; then, based on a channel attention mechanism, the perturbation term is separated from the initial restoration map by combining the high-dimensional information extracted through the two branches; finally, the perturbation term is subtracted from the initial restoration map to obtain the final restoration map. The implementation steps are as follows:

S1. Regionally Distributed Dynamic Convolution Design

The high-dimensional features extracted from the original image and the inverse image contain illuminance, color, and texture information in the underexposed and overexposed areas, which play an important role in estimating the perturbation term and generating the preliminary illuminance map during the exposure restoration process; the inverse image is calculated as shown in equation (1):

$$I_{inv} = 1 - I \tag{1}$$

where $I$ is the normalized input image and $I_{inv}$ is its inverse image.

Based on this, a dynamic region convolution feature fusion method guided by exposure probability is proposed to achieve adaptive processing of different exposure regions during the dual-branch feature fusion process. First, an exposure probability guide map is generated based on the grayscale information of the input image. This probability map contains three channels: underexposure, normal exposure, and overexposure. Each pixel value represents the probability that the pixel belongs to the corresponding exposure state, thus providing a pixel-level basis for region division in the subsequent dynamic convolution.
In the dynamic region convolution module, a convolution kernel generation network generates multiple candidate convolution kernels from the input features. Specifically, the input features are first brought to a uniform size through adaptive average pooling, and then the parameters of multiple convolution kernels are generated through two layers of pointwise convolution. Different convolution kernels correspond to different potential exposure regions. Subsequently, the convolution results between the input features and all candidate convolution kernels are calculated using group convolution, thereby obtaining multiple candidate feature maps. To determine the appropriate convolution result for each pixel, this method employs a region selection mechanism: by selecting the channel with the maximum value in the exposure probability map, the exposure category of each pixel is determined, and a corresponding region mask is constructed. This mask is then multiplied element-wise with the candidate convolution outputs and summed along the region dimension to obtain the final dynamic convolution output.

S2. Combined Loss Function Strategy Design

This method employs three loss functions for joint optimization: L1 reconstruction loss (L1 Loss), cosine similarity loss (Lcos), and perceptual loss (VGG Perceptual Loss, Lvgg). The L1 loss primarily constrains the pixel-level differences between the restored image and the real reference image. Compared to L2 loss, L1 loss is less sensitive to outliers, effectively reducing over-smoothing during image restoration and thus better preserving image details. The L1 loss is shown in equation (2):

$$L_{1} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{I}_{i} - I_{i}\right| \tag{2}$$

where $\hat{I}_{i}$ represents the value of the i-th pixel in the recovered image, $I_{i}$ represents the value of the i-th pixel in the real image, and $N$ represents the total number of pixels in the image; during training, by calculating the absolute error per pixel between the recovered image and the real image, it can be ensured that the network output is consistent with the target image in terms of overall brightness and color distribution.

Secondly, this method introduces the cosine similarity loss Lcos to constrain the consistency between the restored image and the real image in the feature direction; cosine similarity mainly measures the directional similarity between two vectors and is not affected by the vector magnitude. The cosine loss function is shown in equation (3):

$$L_{cos} = 1 - \frac{\hat{F} \cdot F}{\|\hat{F}\|\,\|F\|} \tag{3}$$

where $\hat{F}$ represents the feature vector of the recovered image, $F$ represents the feature vector of the real image, and $\|\cdot\|$ denotes the norm; in image restoration tasks, by mapping image features to high-dimensional vectors and calculating cosine similarity, the consistency of the restored image in terms of structure and texture representation can be effectively constrained. Compared with simple pixel error constraints, this loss pays more attention to image structural information, enabling the network to better preserve object edges and texture details during the restoration process.

Furthermore, to improve the visual quality of the images, this method also introduces a deep feature-based perceptual loss, Lvgg. This loss extracts feature representations of the image at different levels through a pre-trained VGG network and calculates the difference between the reconstructed image and the real image in the feature space. Since deep convolutional networks can extract more abstract and semantic image features, the perceptual loss can effectively constrain the consistency of the reconstructed image in high-level semantic structure, thereby improving the overall visual realism and naturalness of the image.
It should be noted that during training, the parameters of the VGG network remain fixed and are only used as a feature extractor, without participating in network parameter updates. Combining the above three loss functions, this method constructs a combined loss function as shown in equation (4):

$$L_{total} = \lambda_{1} L_{1} + \lambda_{2} L_{cos} + \lambda_{3} L_{vgg} \tag{4}$$

where $\lambda_{1} = 0.1$, $\lambda_{2} = 0.5$, and $\lambda_{3} = 0.05$; these values strike a good balance between pixel accuracy, structural consistency, and perceptual quality, and the model performs best on all datasets with this setting.

S3. Design of Image Exposure Restoration Network Based on Retinex Theory and Attention Mechanism

The original and inverse image feature extraction network is a two-branch network structure that simultaneously maps the feature information of the original image and the inverse image to a high-dimensional space through two layers of convolutional upsampling. Each feature space corresponds to a network branch. This network design makes the first branch focus on underexposed areas, while the second branch focuses more on overexposed areas. The combination of the two branches makes the network model highly adaptable to various exposure conditions.

First, the network selects a batch of N images with exposure problems, each with a size of H × W × C. Then, it normalizes the images and performs a simple subtraction operation to obtain the inverse image. Through the transformation of equation (1), the dark areas in the original image where details are difficult to extract become more obvious in the inverse image, while the overly bright or even nearly saturated areas in the original image become darker in the inverse image, thus restoring some structural information.

Secondly, in images that are both underexposed and overexposed, a single brightness space often cannot recover both types of regions at the same time. For example, details in underexposed areas are compressed into the low brightness range and are difficult to recover directly; in overexposed areas, pixels are close to saturation and information is easily lost. By introducing an inverse image, the network can re-observe image features under another brightness distribution, making structures that were originally invisible or difficult to distinguish easier to learn in the inverse image, thereby helping the model to better recover detailed information.

The original image and the inverse image are used as inputs to the dual-branch network. Each of the two branches uses two convolutional layers and has 24 output channels. The inverse image is used mainly to distinguish different exposure areas in the same image. After the 48 feature channels from the two branches are obtained, the dynamic convolution module adaptively fuses the feature information using the exposure probability distribution of each pixel obtained from the grayscale image of the original image.