Night anti-glare image enhancement method based on progressive pixel difference convolution
By constructing an encoder and decoder using a progressive pixel-differential convolution method, and utilizing local differential convolution modes and a global attention layer, the problem of distinguishing lens flare and texture in nighttime image enhancement is solved, achieving high-quality restoration of nighttime images.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING UNIV OF SCI & TECH
- Filing Date
- 2026-03-04
- Publication Date
- 2026-06-23
Figure CN122265061A_ABST (abstract drawing; image not reproduced)
Description
Technical Field
[0001] This invention relates to the field of computer vision and image processing technology, specifically to a nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution.
Background Technology
[0002] In nighttime vision applications such as autonomous driving, security monitoring, and drone inspection, lens flare and glare caused by strong artificial light sources (such as vehicle headlights, streetlights, and billboards) are key bottlenecks affecting image quality. These optical artifacts not only severely degrade the visual appeal of images but also obscure the texture details of critical targets—such as license plate characters and the outlines of road signs. Therefore, developing effective nighttime anti-glare image enhancement technologies is of great practical significance for improving the reliability and safety of nighttime vision systems.
[0003] Current image enhancement methods for dealing with strong light interference at night mainly follow two major technical approaches: traditional enhancement methods based on convolutional neural networks (CNNs) and global restoration methods based on Transformers. However, both of these methods have significant limitations when dealing with lens flare at night.
[0004] Early research often employed CNN architectures for nighttime image enhancement, such as dehazing networks and low-light enhancement networks. These methods typically improve overall image brightness and contrast to some extent by learning an end-to-end mapping from degraded to clear images. However, CNN methods face three core problems when dealing with strong light interference. First, standard convolution computes a weighted sum of pixel intensities in a local region, making it difficult to distinguish high-frequency object edges from low-frequency lens flare halos. When strong light sources occupy a large area of the image, the convolution kernel tends to smooth them, resulting in noticeable texture blurring and loss of detail in the deglared image. Second, lens flares in nighttime scenes often have a wide-ranging radial distribution, and the local inductive bias of CNNs cannot fully capture this global optical degradation pattern. Third, traditional CNNs initialize their weights randomly, without explicitly encoding the image's gradient prior. The key difference between lens flares and real object edges lies in their gradient distribution patterns: lens flares typically exhibit smooth gradients, while object edges have sharp gradient changes. CNNs lack the ability to explicitly model this difference.
[0005] In recent years, the Vision Transformer (ViT) and its variants have shown great potential in image restoration, particularly in global context modeling. However, the Transformer architecture also faces significant challenges in the task of nighttime flare removal. First, the standard multi-head self-attention mechanism (MSA) has O(n²) time complexity in the number of tokens n. For high-resolution nighttime images this computational cost is prohibitive, limiting the practical deployment of the model. Second, self-attention is essentially a content-dependent weighted averaging, which tends to over-smooth high-frequency details, so the images lack sharp edges and clear textures after lens flare removal. Third, Transformer models learn all representations from data without explicitly incorporating prior knowledge of optical imaging, such as diffraction and reflection properties. A large amount of training data is therefore needed to learn the complex physical patterns of lens flares, leading to low training efficiency. Notably, pixel difference convolution (PDC) has demonstrated unique advantages in edge detection and other fields in recent years. Unlike ordinary convolution, PDC calculates the differences between neighboring pixels and then performs a weighted sum of these differences. This operation is naturally sensitive to image gradients and effectively enhances edge and texture information. However, existing pixel difference methods are mainly applied to tasks such as edge detection and face liveness detection, and have not yet been systematically applied to the more challenging area of nighttime image flare removal.
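To make the quadratic attention cost concrete, the following back-of-the-envelope sketch counts query-key pairs for global self-attention and for attention restricted to non-overlapping windows. The image size, patch size, and window size are illustrative assumptions, not values taken from this document.

```python
def attention_pairs(height, width, patch=4, window=None):
    """Count query-key pairs for one self-attention layer.

    patch and window are hypothetical settings chosen for illustration.
    """
    tokens_h, tokens_w = height // patch, width // patch
    n = tokens_h * tokens_w
    if window is None:
        # Global attention: every token attends to every token, so n * n pairs.
        return n * n
    # Windowed attention: tokens only attend within their own window.
    windows = (tokens_h // window) * (tokens_w // window)
    return windows * (window * window) ** 2


global_pairs = attention_pairs(512, 512)
window_pairs = attention_pairs(512, 512, window=8)
print(global_pairs, window_pairs, global_pairs // window_pairs)
```

Under these assumed settings the windowed variant needs 256 times fewer pairs, which is the motivation for the window-based attention layers used later in the description.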
[0006] Therefore, a nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution is needed to solve the above problems.
Summary of the Invention
[0007] The purpose of this invention is to provide a nighttime image enhancement method based on progressive pixel difference convolution, which effectively solves the problems of lens flare occlusion and texture loss caused by strong light sources in nighttime images. By configuring specific pixel difference convolution modes at different stages, it effectively distinguishes lens flare artifacts from real object textures.
[0008] The technical solution for implementing this invention is: a nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution, comprising the following steps:
[0009] Step 1: Select the BDD100K dataset as the clean nighttime background image and the Flare7KPP dataset, which contains lens flare images and their corresponding light source information images, as the strong light interference source. Through randomization, the lens flare images and their corresponding light source information images in the Flare7KPP dataset are superimposed onto the clean nighttime background image of BDD100K to form training image pairs, thus constructing a training dataset for nighttime strong light interference images. Proceed to Step 2.
[0010] Step 2: Construct the encoder, which consists of four layers of pixel difference attention blocks configured with local difference convolution modes, together with downsampling modules. Its function is to extract image features from the training dataset of nighttime strong light interference images from shallow to deep. Depthwise separable convolution retains the original spatial features completely, while the center difference, angular difference, and radial difference modes respectively capture fine textures, extract geometric edges, and obtain a large receptive field for capturing radial lens flares. The encoder outputs deep feature maps. Proceed to Step 3.
[0011] Step 3: Construct the neck layer, which consists of a single channel pixel difference attention block located between the encoder and decoder. Its function is to receive the deep feature map output by the encoder, aggregate global semantic information using the global attention layer, and output the deepest and most abstract feature map to ensure the overall color restoration of the image and the consistency of the background after lens flare removal. Proceed to Step 4.
[0012] Step 4: Construct the decoder. The decoder consists of four layers of pixel difference attention blocks configured with local difference convolution modes, combined with an upsampling module. After upsampling the deepest and most abstract feature maps output from the neck layer, they are concatenated along the channel dimension with the deep abstract feature map output from the fourth layer pixel difference attention block of the encoder to obtain the first concatenated feature map, which is used as the input to the fourth layer pixel difference attention block in the decoder. The feature maps input to the pixel difference attention blocks in the decoder are obtained by concatenating the upsampled feature maps with the feature maps of the corresponding encoder stages along the channel dimension. Through feature extraction and fusion from deep to shallow, the preservation of effective information is ensured. Finally, a clean nighttime restored image containing only light sources is reconstructed, which suppresses lens flare and enhances texture details. Proceed to Step 5.
[0013] Step 5: Optimize the progressive pixel difference Transformer network using a hybrid loss function. Update the network parameters using feature maps such as deep feature maps, the deepest and most abstract feature maps, and concatenated feature maps to finally obtain the trained progressive pixel difference Transformer network model, and then proceed to step 6.
[0014] Step 6: Through randomization, add lens flares to the clean night images and perform Gaussian blur processing to form night images with lens flare damage as input, construct a test dataset of night images with lens flare damage, and proceed to step 7.
[0015] Step 7: Input the test dataset into the trained progressive pixel difference Transformer network model, and output the clean nighttime image prediction result containing only the light source, which suppresses lens flare and enhances texture details for each sample in the test dataset.
[0016] Compared with the prior art, the significant advantages of this invention are:
[0017] (1) Current nighttime image enhancement methods often struggle to distinguish between high-frequency object textures and low-frequency lens flare halos when dealing with strong light interference, leading to excessive smoothing of background textures or loss of detail when removing lens flares. This invention proposes a progressive pixel difference Transformer network. By designing a progressive pixel difference feedforward network, it progressively configures depthwise separable, center difference, angular difference, and radial difference convolution modes at different stages, constructing a closed processing loop of "light preservation - texture enhancement - structure correction - shadow suppression". This distinguishes lens flare artifacts from real object textures and addresses the problem of high-frequency texture loss under strong light interference.
[0018] (2) Existing synthetic data struggles to simulate the degradation of real nighttime imaging, which leads to poor model generalization. To address this, this invention provides a training dataset of nighttime strong light interference images together with its generation strategy.
[0019] (3) Current methods often result in local color differences or background inconsistencies after removing large-area lens flares. This invention designs a neck layer between the encoder and decoder. By introducing channel pixel difference attention blocks and adding a global attention layer within them, global semantic information can be aggregated while maintaining the original weights without differential transformation. This design effectively recalibrates the feature channels, ensuring the correct restoration of overall color and hue, as well as background consistency, after removing lens flares.
[0020] (4) This invention constructs a symmetrical encoder-decoder architecture. Unlike simple upsampling restoration, this invention concatenates the upsampled features with the deep features of the corresponding encoder stage along the channel dimension during the decoding stage, and then uses progressive pixel difference attention blocks to perform feature fusion and reconstruction from deep to shallow. This design utilizes the gradient sensitivity of pixel differences to ensure the complete preservation of effective spatial information, thereby more accurately identifying and removing glare layers with smooth characteristics, and finally outputting a clean nighttime restored image containing only the light source, with lens flare suppressed and texture details enhanced.
Attached Figure Description
[0021] Figure 1 is a flowchart of the nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution.
[0022] Figure 2 is a flowchart illustrating the construction and training process of the Progressive Pixel Difference Transformer network used by the method.
[0023] Figure 3 is a framework diagram of the Progressive Pixel Difference Transformer network, which includes pixel difference attention blocks, channel pixel difference attention blocks, upsampling, and downsampling.
[0024] Figure 4 is a network architecture diagram of the method, consisting of pixel difference attention blocks, channel pixel difference attention blocks, window self-attention layers, shifted-window self-attention layers, and progressive pixel difference feedforward networks.
[0025] Figure 5 is a network architecture diagram of the local difference convolution modes of the method, which includes model diagrams of the central difference, angular difference, and radial difference convolution modes.
[0026] Figure 6 shows the experimental results of the method, where (a) is a nighttime image with lens flare damage and (b) is a clean nighttime restored image containing only the light source.
Detailed Implementation
[0027] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention will be described in further detail below.
[0028] With reference to Figures 1-5, this invention discloses a nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution, comprising the following steps:
[0029] Step 1: Construct a training dataset for nighttime strong light interference images. To train the network model, a training set containing paired data is first needed. The BDD100K dataset is selected as the source of clean nighttime background images, and the Flare7KPP dataset, containing lens flare images and their corresponding light source information images, is selected as the strong light interference source. Through randomization, the lens flare images and their corresponding light source information images from Flare7KPP are superimposed onto the clean nighttime background images from BDD100K. To simulate realistic degradation, the clean nighttime background image with the added lens flare image is Gaussian blurred to form a nighttime image with lens flare damage, which serves as the input sample. In parallel, only the corresponding light source information image is composited onto the same clean nighttime background image, without Gaussian blurring, to construct an ideal clean nighttime scene image containing only the light source, which serves as the ground truth label. Each nighttime image with lens flare damage and its corresponding clean nighttime scene image containing only the light source form one training image pair, and these pairs make up the nighttime strong light interference image training dataset. Proceed to Step 2.
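The pair construction described above can be sketched as follows. This is a minimal illustration on tiny grayscale images stored as nested lists; the plain clipped addition used for superimposing, the 3x3 blur kernel, and all function names are assumptions made for the example, since the description does not specify the blending rule or the blur parameters.

```python
import random


def add_clip(a, b):
    """Pixel-wise addition of two grayscale images, clipped to [0, 1]."""
    return [[min(1.0, x + y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gaussian_blur3(img):
    """3x3 Gaussian blur (1-2-1 outer product / 16) with edge replication."""
    k = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += k[di + 1][dj + 1] * img[ii][jj]
            out[i][j] = acc / 16.0
    return out


def make_pair(background, flare_bank, rng):
    """Return (degraded input, ground-truth label) for one training sample."""
    # Randomly pick a flare image together with its light-source image.
    flare, light_source = rng.choice(flare_bank)
    degraded = gaussian_blur3(add_clip(background, flare))  # flare added, then blurred
    label = add_clip(background, light_source)              # light source only, no blur
    return degraded, label


background = [[0.1] * 3 for _ in range(3)]
flare = [[0.0, 0.0, 0.0], [0.0, 0.8, 0.0], [0.0, 0.0, 0.0]]
light = [[0.0, 0.0, 0.0], [0.0, 0.4, 0.0], [0.0, 0.0, 0.0]]
degraded, label = make_pair(background, [(flare, light)], random.Random(0))
```

Only the input is blurred; the label keeps the light source sharp, so a network trained on such pairs is asked both to remove the flare and to undo the blur.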
[0030] Step 2: Construct the encoder. The encoder consists of four layers of pixel difference attention blocks (PDABs), each containing a progressive pixel difference feedforward network (P-PDFN), and four downsampling modules. Each pixel difference attention block further includes a window multi-head self-attention layer (W-MSA) and a shifted-window multi-head self-attention layer (SW-MSA), supplemented by layer normalization and residual connections. The progressive pixel difference feedforward network is configured stage by stage with different Local Difference Convolution Patterns (LDCPs). The encoder's function is to extract image features from the training dataset of nighttime strong light interference images from shallow to deep layers. It fully preserves the original spatial features through depthwise separable convolution (DWC), and uses the C-LDCP (central difference), A-LDCP (angular difference), and R-LDCP (radial difference) modes to capture fine textures, extract geometric edges, and obtain a large receptive field for capturing radial lens flares, respectively, and it outputs deep feature maps. The details are as follows:
[0031] S2.1: First, an input projection layer converts the nighttime image containing lens flare damage into a sequential feature map, which is input into the first-layer pixel difference attention block. The progressive pixel difference feedforward network in this first-layer block is configured with depthwise separable convolution. At the highest-resolution stage, the weights are applied to the sequential feature map without any difference transformation, so that the original light intensity, color information, and basic spatial features of the input image are fully retained in the shallow feature layer. Specifically:
[0032] $$\hat{Y}_{i,j,c} = \sum_{m=0}^{k-1}\sum_{n=0}^{k-1} X_{i+m,\,j+n,\,c}\; W^{d}_{m,n,c},$$
[0033] $$Y_{i,j,o} = \sum_{c=1}^{C} \hat{Y}_{i,j,c}\; W^{p}_{c,o},$$
[0034] In the formulas, $\hat{Y}_{i,j,c}$ is the intermediate depthwise convolution feature at location $(i,j)$ of the $c$-th channel; $k$ is the spatial size of the convolution kernel; $m$ and $n$ are the sliding indices of the depthwise convolution kernel in the horizontal and vertical directions, respectively; $X$ is the input feature map; $W^{d}$ is the depthwise convolution kernel; $o$ is the output channel index; $Y$ is the output feature map; $C$ is the total number of channels in the input feature map; and $W^{p}$ is the pointwise convolution kernel.
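The two steps of a depthwise separable convolution (a per-channel spatial filter followed by a 1x1 channel mix) can be sketched in plain Python as below. The nested-list layout, the 'valid' padding, and the function name are assumptions for illustration; a real implementation would use a tensor library.

```python
def depthwise_separable_conv(x, w_dw, w_pw):
    """x: [C][H][W]; w_dw: [C][k][k]; w_pw: [C_out][C]. 'Valid' padding, stride 1."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(w_dw[0])
    oh, ow = h - k + 1, w - k + 1
    # Depthwise step: each channel is filtered by its own k x k kernel.
    mid = [[[sum(x[c][i + m][j + n] * w_dw[c][m][n]
                 for m in range(k) for n in range(k))
             for j in range(ow)] for i in range(oh)] for c in range(c_in)]
    # Pointwise step: a 1x1 convolution mixes the channels at each location.
    return [[[sum(mid[c][i][j] * w_pw[o][c] for c in range(c_in))
              for j in range(ow)] for i in range(oh)] for o in range(len(w_pw))]


x = [[[1] * 3 for _ in range(3)], [[2] * 3 for _ in range(3)]]  # two channels
w_dw = [[[1] * 3 for _ in range(3)] for _ in range(2)]          # all-ones kernels
w_pw = [[1, 1], [1, -1]]                                        # sum and difference
y = depthwise_separable_conv(x, w_dw, w_pw)
```

No pixel differences are taken here: the filter acts on raw intensities, which is why this mode is the one that keeps brightness and color information intact.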
[0035] Subsequently, feature transformation is performed through the first-layer downsampling module to obtain the first-layer feature map. While reducing the spatial resolution of the feature map, the channel dimension is expanded, providing a feature benchmark for subsequent multi-scale feature extraction.
[0036] S2.2: The progressive pixel difference feedforward network in the second-layer pixel difference attention block is configured with the center difference convolution mode. By calculating the differences between the center pixel and its eight surrounding neighbors, it captures very fine textures in the first-layer feature map, preventing details from being lost during subsequent downsampling. Specifically:
[0037] $$Y_{C} = \sum_{i=1}^{8} w_i\,(x_i - x_c),$$
[0038] In the formula, $Y_{C}$ is the feature map after the center difference convolution mode, $w_i$ is the convolution weight at the corresponding feature map position $i$, $x_i$ is the value of the $i$-th neighboring pixel, and $x_c$ is the value of the center pixel.
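A minimal single-channel sketch of a central difference convolution follows. It assumes the common "neighbor minus center" sign convention and 'valid' padding; the function name and data layout are illustrative and not taken from the source.

```python
def central_difference_conv(x, w):
    """x: [H][W] feature map; w: 3x3 weights. Output uses 'valid' padding."""
    out = []
    for i in range(1, len(x) - 1):
        row = []
        for j in range(1, len(x[0]) - 1):
            center = x[i][j]
            # Weighted sum of (neighbor - center) over the eight neighbors.
            row.append(sum(w[di + 1][dj + 1] * (x[i + di][j + dj] - center)
                           for di in (-1, 0, 1) for dj in (-1, 0, 1)
                           if (di, dj) != (0, 0)))
        out.append(row)
    return out


ones = [[1] * 3 for _ in range(3)]
ramp = [[0, 1, 2] for _ in range(3)]   # smooth linear gradient
edge = [[0, 0, 1] for _ in range(3)]   # abrupt step
ramp_response = central_difference_conv(ramp, ones)
edge_response = central_difference_conv(edge, ones)
```

With uniform weights the smooth ramp cancels to zero while the step edge gives a non-zero response, which illustrates the gradient sensitivity that separates smooth flare halos from sharp textures.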
[0039] Subsequently, feature transformation is performed through the second-layer downsampling module to obtain the second-layer feature map, which further compresses spatial redundancy and enhances the network's ability to perceive local gradients.
[0040] S2.3: The progressive pixel difference feedforward network in the third-layer pixel difference attention block is configured with the angular difference convolution mode. By calculating pixel differences along specified angular directions, it extracts the geometric edges of objects in the second-layer feature map, thereby distinguishing directional object edges from diffuse light edges. Specifically:
[0041] $$Y_{A} = \sum_{i=1}^{8} w_i\,(x_i - x_{i+1}),$$
[0042] In the formula, $Y_{A}$ is the feature map after the angular difference convolution mode, $w_i$ is the convolution weight at the corresponding feature map position $i$, and $x_i$ denotes the pixels of the 3×3 neighborhood arranged clockwise, with $x_{i+1}$ the next pixel in that clockwise order and the index sequence defined as:
[0043] .
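An angular difference convolution of this kind can be sketched as below. The clockwise ring order starting at the top-left corner and the "pixel minus next clockwise pixel" convention are assumptions for illustration, because the index sequence itself is not recoverable from the text above.

```python
# Clockwise ring of the eight neighbors, starting at the top-left corner (assumed order).
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def angular_difference_conv(x, w):
    """x: [H][W]; w: eight weights, one per ring position. 'Valid' padding."""
    out = []
    for i in range(1, len(x) - 1):
        row = []
        for j in range(1, len(x[0]) - 1):
            ring = [x[i + di][j + dj] for di, dj in RING]
            # Each neighbor is compared with the next one clockwise (cyclic).
            row.append(sum(w[t] * (ring[t] - ring[(t + 1) % 8]) for t in range(8)))
        out.append(row)
    return out


patch = [[5, 2, 0], [0, 0, 0], [0, 0, 0]]
single = angular_difference_conv(patch, [1, 0, 0, 0, 0, 0, 0, 0])
uniform = angular_difference_conv([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [1] * 8)
```

With uniform weights the differences telescope to zero around the ring, so any response comes from weights that favor particular directions, which is what makes this mode selective for oriented edges.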
[0044] Subsequently, feature transformation is performed through the third-layer downsampling module to obtain the third-layer feature map, which initially separates the low-frequency glare edges from the high-frequency object structure edges.
[0045] S2.4: The progressive pixel difference feedforward network in the fourth-layer pixel difference attention block is configured with the radial difference convolution mode. By mapping the 3×3 local blocks in the third-layer feature map to 5×5 dilated regions and calculating the differences, it obtains a large receptive field on the low-resolution feature map, which is used to capture and suppress large-scale radial halos. Specifically:
[0046] $$Y_{R} = \sum_{i=1}^{8} w_i\,(x'_i - x_i),$$
[0047] In the formula, $Y_{R}$ is the feature map after radial difference convolution, $x'_i$ is the pixel of the 5×5 dilated region paired with the 3×3 neighbor $x_i$, $w_i$ is the corresponding convolution weight, and $i$ is the pixel index.
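One way to realize a radial difference over a 5×5 region is to pair each inner-ring neighbor with the outer-ring pixel lying in the same direction. That pairing, the "outer minus inner" sign, and the names below are assumptions for this sketch rather than details confirmed by the description.

```python
# Eight directions; the inner pixel sits one step away, the outer pixel two steps away.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def radial_difference_conv(x, w):
    """x: [H][W]; w: eight weights. Output covers positions with a full 5x5 region."""
    out = []
    for i in range(2, len(x) - 2):
        row = []
        for j in range(2, len(x[0]) - 2):
            row.append(sum(w[t] * (x[i + 2 * di][j + 2 * dj] - x[i + di][j + dj])
                           for t, (di, dj) in enumerate(OFFSETS)))
        out.append(row)
    return out


# Intensity grows with distance from the center, like a ring-shaped halo.
halo = [[max(abs(i - 2), abs(j - 2)) for j in range(5)] for i in range(5)]
halo_response = radial_difference_conv(halo, [1] * 8)
flat_response = radial_difference_conv([[3] * 5 for _ in range(5)], [1] * 8)
```

Because every difference spans two rings, the operator sees a 5×5 neighborhood with only eight weights, which is how a large receptive field is obtained cheaply on the low-resolution map.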
[0048] Finally, the feature transformation is performed through the fourth-layer downsampling module to obtain a deep feature map, which is then output to the neck layer, achieving effective capture and suppression of large-scale radial halos.
[0049] The downsampling modules in S2.1 to S2.4 above reshape the input sequence feature map into a two-dimensional spatial feature map, and then compress it using a convolutional layer with a stride of 2.
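The reshape-then-stride-2 step can be sketched for a single channel as follows. The 2×2 averaging kernel and the function name are illustrative assumptions; the description does not state the kernel size, and the channel expansion that accompanies downsampling is omitted here.

```python
def downsample(tokens, height, width, kernel):
    """Reshape a flat token sequence to [H][W], then apply a stride-2 convolution."""
    grid = [tokens[r * width:(r + 1) * width] for r in range(height)]
    k = len(kernel)
    out = []
    for i in range(0, height - k + 1, 2):      # stride 2 vertically
        row = []
        for j in range(0, width - k + 1, 2):   # stride 2 horizontally
            row.append(sum(kernel[m][n] * grid[i + m][j + n]
                           for m in range(k) for n in range(k)))
        out.append(row)
    return out


tokens = list(range(16))                       # a 4x4 map stored as a sequence
pooled = downsample(tokens, 4, 4, [[0.25, 0.25], [0.25, 0.25]])
```

Each application halves the height and width, so four such modules reduce the spatial resolution by a factor of 16 in each dimension.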
[0050] Proceed to step 3.
[0051] Step 3: Construct the neck layer. The neck layer is a channel pixel difference attention block (SE-PDAB) located between the encoder and decoder. The SE-PDAB consists of a window self-attention layer, a shifted-window self-attention layer, and a global attention layer (SE-Layer), combined with a progressive pixel difference feedforward network and supplemented by normalization layers and residual connections. Its function is to receive the deep feature maps output by the encoder, aggregate global semantic information using the global attention layer, and output the deepest and most abstract feature map. This ensures overall image color restoration and background consistency after lens flare removal. Specifically:
[0052] $$z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i,j),$$
[0053] $$s = \sigma\left(W_2\,\delta(W_1 z)\right),$$
[0054] $$\tilde{X} = F_{scale}(X, s),$$
[0055] $$F_{scale}(X, s) = s \otimes X,$$
[0056] In the formulas, $z_c$ is the global average value of the $c$-th channel; $u_c$ is the feature map of the $c$-th channel; $F_{sq}$ is the squeeze function, which compresses the global spatial information of each channel into a single channel descriptor; $H$ is the length of the feature map and $W$ is its width; $s$ is the channel weight vector; $W_1$ and $W_2$ are weight matrices; $\delta$ and $\sigma$ are activation functions; $\tilde{X}$ is the output feature map, namely the deepest and most abstract feature map; $X$ is the input feature map; $F_{scale}$ is the recalibration function, which applies the learned channel weights to the input feature map; and $\otimes$ denotes element-wise multiplication along the channel dimension.
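The squeeze, excitation, and recalibration steps of such a channel attention layer can be sketched in plain Python. The choice of ReLU and sigmoid as the two activations follows the usual squeeze-and-excitation design and is an assumption here, as are the function name and the toy weights.

```python
import math


def se_recalibrate(x, w1, w2):
    """x: [C][H][W]; w1: [C_r][C]; w2: [C][C_r]. Returns (reweighted x, channel weights)."""
    # Squeeze: global average of every channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck with ReLU, then sigmoid (assumed activations).
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, hidden)))) for row in w2]
    # Scale: multiply each channel by its learned weight.
    out = [[[s[c] * v for v in r] for r in ch] for c, ch in enumerate(x)]
    return out, s


x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
out, s = se_recalibrate(x, w1=[[0.0, 0.0]], w2=[[1.0], [1.0]])
```

Since the weights depend only on per-channel global averages, the rescaling is the same at every pixel of a channel, which is consistent with the stated goal of keeping color and background uniform across the image.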
[0057] Proceed to step 4.
[0058] Step 4: Construct the decoder. The decoder consists of four layers of pixel difference attention blocks, each containing a progressively configured pixel difference feedforward network, and four upsampling modules. The deepest and most abstract feature map output from the neck layer is upsampled and then concatenated along the channel dimension with the deep abstract feature map output from the fourth-layer pixel difference attention block of the encoder, giving the first concatenated feature map. This map is the input to the decoder block that mirrors the encoder's fourth layer (the first decoder layer in S4.1). In general, the feature map input to each pixel difference attention block in the decoder is formed by concatenating the upsampled feature map with the feature map from the corresponding encoder stage along the channel dimension. Each pixel difference attention block also includes a window self-attention layer and a shifted-window self-attention layer, supplemented by normalization layers and residual connections. Through feature extraction and fusion from deep to shallow, effective information is preserved, and a clean nighttime restored image containing only the light source is finally reconstructed, with lens flare suppressed and texture details enhanced. The details are as follows:
[0059] S4.1: The progressive pixel difference feedforward network in the first-layer pixel difference attention block is configured with the radial difference convolution mode. First, the deepest and most abstract feature map output from the neck layer is upsampled and then concatenated along the channel dimension with the deep abstract feature map output from the fourth-layer pixel difference attention block in the encoder, giving the first concatenated feature map. This map is the input to the decoder's first-layer block, which mirrors the encoder's fourth layer. The block uses radial difference convolution to obtain a large receptive field on the low-resolution feature map in order to capture and suppress large-scale radial halos, as shown below:
[0060] ,
[0061] ,
[0062] In the formulas, the quantities are the feature map after the first-layer upsampling and the feature concatenation operation.
[0063] The output feature map of the decoder's first-layer pixel difference attention block is then passed to the next layer through the upsampling module.
[0064] S4.2: The progressive pixel difference feedforward network in the second-layer pixel difference attention block is configured with the angular difference convolution mode. The feature map output from the decoder's first-layer pixel difference attention block is passed through the upsampling module to obtain the second-layer upsampled feature map. This is concatenated along the channel dimension with the output feature map of the encoder's third-layer pixel difference attention block to obtain the second concatenated feature map. The geometric edges of objects are then extracted by angular difference convolution, thereby distinguishing directional edges from diffuse light edges, as shown below:
[0065] ,
[0066] ,
[0067] The output feature map of the decoder's second-layer pixel difference attention block is then passed to the next layer through the upsampling module.
[0068] S4.3: The progressive pixel difference feedforward network in the third-layer pixel difference attention block is configured with the central difference convolution mode. The feature map output from the decoder's second-layer pixel difference attention block is passed through the upsampling module to obtain the third-layer upsampled feature map. This is concatenated along the channel dimension with the output feature map of the encoder's second-layer pixel difference attention block to obtain the third concatenated feature map. Very fine textures are then captured using central difference convolution, as shown below:
[0069] ,
[0070] ,
[0071] The output feature map of the decoder's third-layer pixel difference attention block is then passed to the next layer through the upsampling module.
[0072] S4.4: The progressive pixel difference feedforward network in the fourth-layer pixel difference attention block is configured with depthwise separable convolution. The feature map output from the decoder's third-layer pixel difference attention block is passed through an upsampling module to obtain the fourth-layer upsampled feature map. This is concatenated along the channel dimension with the output feature map of the encoder's first-layer pixel difference attention block to obtain the fourth concatenated feature map. Depthwise separable convolution is then used to fully restore the original light intensity, color, and spatial features of the image, as shown below:
[0073] ,
[0074] ,
[0075] ,
[0076] Finally, an output projection layer, using ... convolutional layers, maps the feature channels back to RGB space, reconstructing a clean nighttime restored image containing only the light source, with lens flare suppressed and texture details enhanced.
[0077] The upsampling modules in S4.1 to S4.4 above first reshape the input feature map into a two-dimensional spatial feature map, and then use transposed convolution to expand the spatial dimension and transform the channels, which reflects the spatial scale transformation logic opposite to downsampling, and is used to restore the resolution of the image and reconstruct the detailed information step by step.
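The mirrored layout described in S4.1 to S4.4 can be summarized as a small schedule that pairs each decoder stage with its convolution mode and the encoder layer supplying its skip connection. The function and key names are hypothetical; only the stage-to-mode mapping comes from the description.

```python
# Encoder modes in order of S2.1 to S2.4.
ENCODER_MODES = ["DWC", "C-LDCP", "A-LDCP", "R-LDCP"]


def decoder_plan(encoder_modes):
    """Pair each decoder stage with its mode and the encoder layer it concatenates with."""
    plan = []
    deepest = len(encoder_modes)
    for stage, enc_layer in enumerate(range(deepest, 0, -1), start=1):
        plan.append({
            "decoder_stage": stage,
            "mode": encoder_modes[enc_layer - 1],   # same mode as the mirrored encoder layer
            "skip_from_encoder_layer": enc_layer,
        })
    return plan


plan = decoder_plan(ENCODER_MODES)
for entry in plan:
    print(entry)
```

Reading the plan top to bottom gives radial, angular, central, and depthwise separable modes, that is, the encoder order reversed, with skips taken from encoder layers 4, 3, 2, and 1.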
[0078] Proceed to step 5.
[0079] Step 5: Optimize the Progressive Pixel Difference Transformer (P-PDT) network using a hybrid loss function. This involves updating the network parameters using deep feature maps, the deepest and most abstract feature maps, and concatenated feature maps, ultimately obtaining the trained P-PDT model. The hybrid loss function is specifically expressed as follows:
[0080] ,
[0081] ,
[0082] ,
[0083] In the formulas, the total loss function combines three terms: the background image loss, which consists of a mean absolute error loss and a perceptual loss; the flare image loss, which likewise consists of two loss terms; and the reconstruction loss. The weights of the loss function are set to 0.5, 0.5, and 1, respectively. The remaining quantities are: the clean nighttime restored image containing only the light source, as predicted by the progressive pixel difference Transformer network; the clean night scene image containing only light sources, used as the ground truth label; the light-source-free flare image predicted by the network; the ground-truth flare image without a light source; the light source information image; the clean nighttime background image; and the lens flare image.
[0084] The reconstruction loss is expressed by the following formula:
[0085] ,
[0086] In the formula, the reference is the nighttime image containing lens flare damage; the addition is performed in a linearized gamma-decoded domain, where the gamma decoding domain is ...; and the result is clipped to the range ....
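The structure of the hybrid loss can be sketched as below, under several loudly stated assumptions: the three weights are applied to the background, flare, and reconstruction terms in that order; the perceptual terms are omitted because they need a pretrained feature network; the gamma value 2.2 and the clipping range [0, 1] are placeholders, since the source values are not legible; and all names are hypothetical.

```python
def l1(a, b):
    """Mean absolute error between two flat lists of pixel values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def gamma_add(a, b, gamma):
    """Add two images in the linearized domain, clip to [0, 1], then re-encode."""
    out = []
    for x, y in zip(a, b):
        linear = min(1.0, max(0.0, x ** gamma + y ** gamma))
        out.append(linear ** (1.0 / gamma))
    return out


def hybrid_loss(pred_img, gt_img, pred_flare, gt_flare, degraded,
                gamma=2.2, weights=(0.5, 0.5, 1.0)):
    background = l1(pred_img, gt_img)      # perceptual term omitted in this sketch
    flare = l1(pred_flare, gt_flare)       # perceptual term omitted in this sketch
    # Reconstruction: predicted image plus predicted flare should give back the input.
    recon = l1(gamma_add(pred_img, pred_flare, gamma), degraded)
    return weights[0] * background + weights[1] * flare + weights[2] * recon


perfect = hybrid_loss([0.0, 0.5], [0.0, 0.5], [0.0, 0.0], [0.0, 0.0], [0.0, 0.5])
wrong_background = hybrid_loss([1.0], [0.0], [0.0], [0.0], [1.0])
```

The reconstruction term ties the two predictions together: even if each output looks plausible on its own, the loss stays positive unless their recombination reproduces the degraded input.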
[0087] Proceed to step 6.
[0088] Step 6: Through randomization, add lens flares to the clean night images and perform Gaussian blur processing to form night images with lens flare damage as input, construct a test dataset of night images with lens flare damage, and proceed to step 7.
[0089] Step 7: Input the test dataset into the trained progressive pixel difference Transformer network model, and output the clean nighttime image prediction result containing only the light source, which suppresses lens flare and enhances texture details for each sample in the test dataset.
[0090] On the test dataset, the proposed nighttime strong light-resistant image enhancement model based on progressive pixel difference convolution achieved good experimental results, as shown in Figure 6, which demonstrates the effectiveness of the designed algorithm. In summary, this invention can effectively suppress lens flare in nighttime images with lens flare damage, preserving only the light source information and improving the restoration of details and edges in dark areas.
Claims
1. A nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution, characterized in that it includes the following steps:
Step 1: Select the BDD100K dataset as the clean nighttime background image and the Flare7KPP dataset, which contains lens flare images and their corresponding light source information images, as the strong light interference source. Through randomization, the lens flare images and their corresponding light source information images in the Flare7KPP dataset are superimposed onto the clean nighttime background image of BDD100K to form training image pairs, thus constructing a nighttime strong light interference image training dataset. Proceed to Step 2.
Step 2: Construct an encoder, which consists of four layers of pixel difference attention blocks configured with local difference convolution modes and a downsampling module. Its function is to extract image features from the training dataset of nighttime strong light interference images from shallow to deep, preserve the original spatial features completely through depthwise separable convolution, use the center difference, angular difference, and radial difference modes respectively to capture fine textures, extract geometric edges, and obtain a large receptive field to capture radial lens flares, and output deep feature maps. Proceed to Step 3.
Step 3: Construct the neck layer, which consists of a single channel pixel difference attention block located between the encoder and the decoder. Its function is to receive the deep feature map output by the encoder, aggregate global semantic information using the global attention layer, and output the deepest and most abstract feature map to ensure the overall color restoration of the image and the consistency of the background after lens flare removal. Proceed to Step 4.
Step 4: Construct the decoder. The decoder consists of four layers of pixel difference attention blocks configured with local difference convolution modes, combined with an upsampling module. After upsampling the deepest and most abstract feature maps output from the neck layer, they are concatenated along the channel dimension with the deep abstract feature map output from the fourth layer pixel difference attention block of the encoder to obtain the first concatenated feature map, which is used as the input to the fourth layer pixel difference attention block in the decoder. The feature maps input to the pixel difference attention blocks in the decoder are obtained by concatenating the upsampled feature map with the feature map of the corresponding encoder stage along the channel dimension. Through feature extraction and fusion from deep to shallow, the preservation of effective information is ensured. Finally, a clean nighttime restored image containing only light sources is reconstructed, which suppresses lens flare and enhances texture details. Proceed to Step 5.
Step 5: Optimize the progressive pixel difference Transformer network using a hybrid loss function. Update the network parameters using feature maps such as deep feature maps, the deepest and most abstract feature maps, and concatenated feature maps to finally obtain the trained progressive pixel difference Transformer network model. Proceed to Step 6.
Step 6: Through randomization, add lens flares to the clean night images and perform Gaussian blur processing to form night images with lens flare damage as input, and construct a test dataset of night images with lens flare damage. Proceed to Step 7.
Step 7: Input the test dataset into the trained progressive pixel difference Transformer network model, and output, for each sample in the test dataset, the clean nighttime image prediction result containing only the light source, which suppresses lens flare and enhances texture details.
2. The nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution according to claim 1, characterized in that, in Step 1, the training image pairs are formed as follows. During synthesis, a clean nighttime background image with an added lens flare image is Gaussian blurred to construct a nighttime image with lens flare damage and realistic degradation, which serves as the input sample. At the same time, only the corresponding light source information image is composited onto the same clean nighttime background image, without Gaussian blurring, to construct an ideal clean nighttime scene image containing only the light source, which serves as the ground truth label. Each nighttime image with lens flare damage and its corresponding clean nighttime scene image containing only the light source form a training image pair, and these pairs make up the training dataset of nighttime strong light interference images.
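The pair synthesis described in claim 2 can be illustrated with a minimal sketch. It assumes single-channel images stored as nested lists with values in [0, 1], simple additive compositing with clipping, and a 3×3 binomial kernel as the Gaussian blur; the claim specifies none of these details, and the function names (`add_images`, `gaussian_blur3`, `make_training_pair`) are hypothetical.

```python
import random

def clip01(v):
    return max(0.0, min(1.0, v))

def add_images(a, b):
    """Pixel-wise additive compositing, clipped to [0, 1] (an assumed blend rule)."""
    return [[clip01(pa + pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def gaussian_blur3(img):
    """3x3 binomial (approximately Gaussian) blur with replicated borders."""
    k = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out

def make_training_pair(background, flares, rng):
    """flares: list of (flare_image, light_source_image); one pair is drawn at random."""
    flare, light = rng.choice(flares)
    degraded = gaussian_blur3(add_images(background, flare))  # blurred network input
    target = add_images(background, light)                    # unblurred ground truth
    return degraded, target

# Toy data: a dark background, a point light source, and a flare that
# contains the same light source plus a uniform halo.
background = [[0.1] * 3 for _ in range(3)]
light = [[0.0, 0.0, 0.0], [0.0, 0.8, 0.0], [0.0, 0.0, 0.0]]
flare = [[0.2, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 0.2]]
degraded, target = make_training_pair(background, [(flare, light)], random.Random(0))
```

The ground truth keeps the light source but not the halo, so a network trained on such pairs is asked to remove the flare while leaving the light source in place.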
3. The nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution according to claim 2, characterized in that, in Step 2, the encoder includes four layers of pixel difference attention blocks, each with a progressively configured pixel difference feedforward network, and four downsampling modules. Each pixel difference attention block further includes a window self-attention layer and a sliding window self-attention layer, supplemented by normalization layers and residual connections, to establish long-distance dependencies in the image, as detailed below:
S2.1: First, an input projection layer converts the nighttime image containing lens flare damage into a sequence feature map, which is input to the first-layer pixel difference attention block. The progressive pixel difference feedforward network in this first-layer block is configured with depthwise separable convolution. At this highest-resolution stage, the sequence feature map is weighted without any difference transformation, so the original intensity, color information, and basic spatial features are fully retained in the shallow feature layers. The operation is expressed by formulas whose symbols denote, respectively: the intermediate feature of the depthwise convolution; the spatial position and the channel index; the spatial size of the convolution kernel; the sliding indices of the depthwise convolution kernel in the horizontal and vertical directions; the input feature map; the depthwise convolution kernel; the output channel index; the output feature map; the total number of channels in the input feature map; and the pointwise convolution kernel. Subsequently, feature transformation is performed by the first-layer downsampling module to obtain the first-layer feature map.
This reduces the spatial resolution of the feature map while expanding the channel dimension, providing a feature baseline for subsequent multi-scale feature extraction.
S2.2: The progressive pixel difference feedforward network in the second-layer pixel difference attention block is configured with the center difference convolution mode. By calculating the differences between the center pixel and its eight surrounding neighboring pixels, it captures very fine textures in the first-layer feature map and prevents details from being lost in the subsequent downsampling. The operation is expressed by a formula whose symbols denote, respectively: the feature map after the center difference convolution mode; the convolution weight at the corresponding feature map position; and the values of the neighboring pixels. Subsequently, feature transformation is performed by the second-layer downsampling module to obtain the second-layer feature map, which further compresses spatial redundancy and enhances the network's ability to perceive local gradients.
S2.3: The progressive pixel difference feedforward network in the third-layer pixel difference attention block is configured with the angle difference convolution mode. By calculating pixel differences along specified angular directions, it extracts the geometric edges of objects in the second-layer feature map, thereby distinguishing directional object edges from diffuse light edges. The operation is expressed by a formula whose symbols denote, respectively: the feature map after the angle difference convolution mode; the convolution weight at the corresponding feature map position; and the pixels of the 3×3 neighborhood arranged clockwise according to a defined index sequence. Subsequently, feature transformation is performed by the third-layer downsampling module to obtain the third-layer feature map, which initially separates the low-frequency glare edges from the high-frequency object structure edges.
S2.4: The progressive pixel difference feedforward network in the fourth-layer pixel difference attention block is configured with the radial difference convolution mode. By mapping the 3×3 local blocks in the third-layer feature map to 5×5 dilated regions and calculating the differences, it obtains a large receptive field on the low-resolution feature map, which is used to capture and suppress large-scale radial halos. The operation is expressed by a formula whose symbols denote, respectively: the feature map after the radial difference convolution mode; the horizontal pixel order of the 5×5 dilated region; and the pixel index. Finally, feature transformation is performed by the fourth-layer downsampling module to obtain the deep feature map, which is output to the neck layer, achieving effective capture and suppression of large-scale radial halos.
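The three difference modes of S2.2 to S2.4 can be sketched as follows. This is an illustration under assumptions, not the claimed formulas: it uses formulations common in the pixel difference convolution literature (center: neighbor minus center; angle: each ring pixel minus its clockwise successor; radial: the 5×5 outer ring minus the 3×3 inner ring), and the sign conventions, clockwise starting point, and border handling are all assumed. The names `RING`, `px`, `center_diff`, `angle_diff`, `radial_diff`, and `apply_mode` are hypothetical.

```python
# Clockwise 8-neighbour offsets (dy, dx), starting at the top-left corner (assumed order).
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def px(img, y, x):
    """Read a pixel with replicated borders."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def center_diff(img, y, x, w):
    """Weighted sum of (neighbour - centre) over the 8 neighbours."""
    c = px(img, y, x)
    return sum(wi * (px(img, y + dy, x + dx) - c) for wi, (dy, dx) in zip(w, RING))

def angle_diff(img, y, x, w):
    """Weighted sum of differences between clockwise-adjacent ring pixels."""
    vals = [px(img, y + dy, x + dx) for dy, dx in RING]
    return sum(w[i] * (vals[i] - vals[(i + 1) % 8]) for i in range(8))

def radial_diff(img, y, x, w):
    """Weighted sum of (outer 5x5 ring pixel - inner 3x3 ring pixel) along each ray."""
    return sum(
        wi * (px(img, y + 2 * dy, x + 2 * dx) - px(img, y + dy, x + dx))
        for wi, (dy, dx) in zip(w, RING)
    )

def apply_mode(img, mode, w):
    """Apply one difference mode at every pixel of a single-channel image."""
    return [[mode(img, y, x, w) for x in range(len(img[0]))] for y in range(len(img))]

# Toy inputs: constant intensity, a smooth horizontal ramp, and a sharp vertical edge.
flat = [[0.5] * 5 for _ in range(5)]
ramp = [[0.1 * x for x in range(7)] for _ in range(7)]
step = [[1.0 if x >= 3 else 0.0 for x in range(6)] for _ in range(5)]
uniform = [1.0] * 8
```

With uniform weights, the center difference response is zero on the flat image and (up to rounding) on the interior of the ramp, but not next to the step edge, which is the gradient contrast between smooth halos and sharp object edges that the method relies on. At a pixel two columns from the edge, the center mode sees nothing while the radial mode already responds, reflecting its larger receptive field.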
4. The nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution according to claim 3, characterized in that the downsampling modules in S2.1 to S2.4 first reshape the input sequence feature map into a two-dimensional spatial feature map and then compress it using a convolutional layer with a stride of 2.
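The reshape-then-stride-2 convolution of claim 4 can be sketched as below. The claim fixes only the stride; the 2×2 kernel size, the absence of padding, and the token layout (row-major, one channel list per token) are assumptions, and `tokens_to_grid` and `conv_stride2` are hypothetical names.

```python
def tokens_to_grid(tokens, h, w):
    """Reshape a row-major sequence of h*w tokens into an [h][w][C] grid."""
    assert len(tokens) == h * w
    return [tokens[r * w:(r + 1) * w] for r in range(h)]

def conv_stride2(grid, weights):
    """Stride-2 convolution with 2x2 kernels and no padding.

    grid:    [H][W][C_in]
    weights: [C_out][C_in][2][2]
    returns: [H//2][W//2][C_out]
    """
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(0, h - 1, 2):
        row = []
        for x in range(0, w - 1, 2):
            pixel = []
            for w_o in weights:
                acc = 0.0
                for c, w_c in enumerate(w_o):
                    for ky in range(2):
                        for kx in range(2):
                            acc += w_c[ky][kx] * grid[y + ky][x + kx][c]
                pixel.append(acc)
            row.append(pixel)
        out.append(row)
    return out

# 16 single-channel tokens forming a 4x4 map with values 0..15.
tokens = [[float(i)] for i in range(16)]
grid = tokens_to_grid(tokens, 4, 4)
# Two output channels: a 2x2 mean and a 2x2 sum.
weights = [
    [[[0.25, 0.25], [0.25, 0.25]]],
    [[[1.0, 1.0], [1.0, 1.0]]],
]
down = conv_stride2(grid, weights)
```

In this toy case the spatial size halves from 4×4 to 2×2 while the channel count grows from 1 to 2, matching the resolution reduction and channel expansion described for the downsampling modules.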
5. The nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution according to claim 3, characterized in that, in Step 3, the neck layer consists of a channel pixel difference attention block, which is composed of a window self-attention layer and a sliding window self-attention layer connected to a global attention layer, combined with a progressive pixel difference feedforward network and supplemented by normalization layers and residual connections. The channel pixel difference attention block is configured with the global attention layer, and the progressive pixel difference feedforward network is configured with depthwise separable convolution, so that the deep feature map is weighted without any difference transformation, ensuring correct restoration of the overall image color and consistency of the background after lens flare removal. The operation is expressed by formulas whose symbols denote, respectively: the global average value of a given channel; the feature map of that channel; the compression function, which compresses the global spatial information of each channel into a single channel descriptor; the length of the feature map; the width of the feature map; the channel weight vector; the weight matrix; the activation function; the output feature map, namely the deepest and most abstract feature map; the input feature map; the recalibration function, which applies the learned channel weights to the input feature map; and element-wise multiplication along the channel dimension.
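The compression, channel weighting, and recalibration steps of claim 5 resemble a squeeze-and-excitation style channel attention, which can be sketched as below. The two-layer form with a ReLU followed by a sigmoid is an assumption borrowed from that common design; the claim mentions only a weight matrix and an activation function. The names `squeeze`, `excite`, and `recalibrate` are hypothetical.

```python
import math

def squeeze(fmap):
    """Global average per channel: [C][H][W] -> [C]."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def excite(z, w1, w2):
    """Channel weights from descriptors: sigmoid(w2 @ relu(w1 @ z)) (assumed form)."""
    hidden = [max(0.0, t) for t in matvec(w1, z)]
    return [1.0 / (1.0 + math.exp(-t)) for t in matvec(w2, hidden)]

def recalibrate(fmap, s):
    """Scale every pixel of channel c by its learned weight s[c]."""
    return [[[s_c * v for v in row] for row in ch] for ch, s_c in zip(fmap, s)]

# Two channels of size 2x2; the weight matrices are arbitrary toy values.
fmap = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 2.0], [4.0, 6.0]],
]
w1 = [[1.0, 0.0]]        # 2 channels -> 1 hidden unit
w2 = [[0.0], [100.0]]    # 1 hidden unit -> 2 channel logits
z = squeeze(fmap)
s = excite(z, w1, w2)
out = recalibrate(fmap, s)
```

Here the first channel is halved and the second is passed through almost unchanged, showing how a single descriptor per channel can reweight whole feature maps without any spatial difference operation.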
6. The nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution according to claim 5, characterized in that, in Step 4, the decoder includes four layers of pixel difference attention blocks, each with a progressively configured pixel difference feedforward network, and four upsampling modules. The deepest and most abstract feature map output by the neck layer is upsampled and then concatenated along the channel dimension with the deep abstract feature map output by the fourth-layer pixel difference attention block of the encoder to obtain the first concatenated feature map, which serves as the input to the fourth pixel difference attention block in the decoder. Each pixel difference attention block in the decoder takes as input the concatenation, along the channel dimension, of the upsampled feature map and the feature map of the corresponding encoder stage. Each pixel difference attention block also includes a window self-attention layer and a sliding window self-attention layer, supplemented by normalization layers and residual connections, thereby realizing image reconstruction, as detailed below:
S4.1: The progressive pixel difference feedforward network in the first-layer pixel difference attention block is configured with the radial difference convolution mode. First, the deepest and most abstract feature map output by the neck layer is upsampled and then concatenated along the channel dimension with the deep abstract feature map output by the fourth-layer pixel difference attention block of the encoder to obtain the first concatenated feature map, which serves as the input to the fourth pixel difference attention block in the decoder.
The first pixel difference attention block in the decoder uses radial difference convolution to obtain a large receptive field on the low-resolution feature map in order to capture and suppress large-scale radial halos. The operation is expressed by formulas whose symbols denote, respectively: the feature map after the first-layer upsampling; and the feature concatenation operation. The output feature map of the first-layer pixel difference attention block of the decoder is then passed to the next layer through the upsampling module.
S4.2: The progressive pixel difference feedforward network in the second-layer pixel difference attention block is configured with the angle difference convolution mode. The feature map output by the first-layer pixel difference attention block of the decoder is passed through the upsampling module to obtain the second-layer upsampled feature map, which is concatenated along the channel dimension with the output feature map of the third-layer pixel difference attention block of the encoder to obtain the second concatenated feature map. The geometric edges of objects are then extracted by angle difference convolution, thereby distinguishing directional edges from diffuse light edges, as expressed by the corresponding formulas. The output feature map of the second-layer pixel difference attention block of the decoder is then passed to the next layer through the upsampling module.
S4.3: The progressive pixel difference feedforward network in the third-layer pixel difference attention block is configured with the center difference convolution mode. The feature map output by the second-layer pixel difference attention block of the decoder is passed through the upsampling module to obtain the third-layer upsampled feature map, which is concatenated along the channel dimension with the output feature map of the second-layer pixel difference attention block of the encoder to obtain the third concatenated feature map.
Then, very fine textures are captured using center difference convolution, as expressed by the corresponding formulas. The output feature map of the third-layer pixel difference attention block of the decoder is then passed to the next layer through the upsampling module.
S4.4: The progressive pixel difference feedforward network in the fourth-layer pixel difference attention block is configured with depthwise separable convolution. The feature map output by the third-layer pixel difference attention block of the decoder is passed through the upsampling module to obtain the fourth-layer upsampled feature map, which is concatenated along the channel dimension with the output feature map of the first-layer pixel difference attention block of the encoder to obtain the fourth concatenated feature map. Depthwise separable convolution is then used to fully restore the original intensity, color, and spatial features of the image, as expressed by the corresponding formulas. Finally, an output projection layer, using... convolutional layers, maps the feature channels back to RGB space, reconstructing a clean nighttime image containing only the light source, with lens flare suppressed and texture details enhanced.
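The pairing of decoder stages with encoder stages in S4.1 to S4.4 can be summarized in a short sketch. It records only what the claims state (the mode at each stage and which encoder output is concatenated); the `[C][H][W]` nested-list layout and the names `ENCODER_MODES`, `DECODER_MODES`, `skip_schedule`, and `concat_channels` are hypothetical.

```python
# Convolution mode of the feedforward network at each stage, in processing order.
ENCODER_MODES = ["depthwise_separable", "center", "angle", "radial"]
DECODER_MODES = ["radial", "angle", "center", "depthwise_separable"]

def skip_schedule():
    """(decoder stage, decoder mode, encoder stage whose output is concatenated)."""
    return [(k + 1, DECODER_MODES[k], 4 - k) for k in range(4)]

def concat_channels(upsampled, skip):
    """Concatenate two [C][H][W] maps along the channel dimension."""
    assert len(upsampled[0]) == len(skip[0])        # same height
    assert len(upsampled[0][0]) == len(skip[0][0])  # same width
    return upsampled + skip

schedule = skip_schedule()
# Toy maps: 2 upsampled channels and 1 encoder channel, each 2x2.
up = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
enc = [[[9.0, 9.0], [9.0, 9.0]]]
merged = concat_channels(up, enc)
```

Each decoder stage uses the same mode as the encoder stage it is paired with, so the decoder mirrors the encoder: large-scale radial structure is handled at low resolution, and untransformed intensity and color are restored at full resolution.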
7. The nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution according to claim 6, characterized in that the upsampling modules in S4.1 to S4.4 first reshape the input feature map into a two-dimensional spatial feature map and then use transposed convolution to expand the spatial dimensions and transform the channels. This is the inverse of the spatial scale transformation performed by downsampling and is used to restore the image resolution and reconstruct detail step by step.
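The transposed convolution used for upsampling can be sketched as follows. The claim does not give the kernel size or stride; this example assumes a 2×2 kernel with stride 2, so that each input pixel writes a scaled copy of the kernel into its own 2×2 output block. The name `transposed_conv_stride2` is hypothetical.

```python
def transposed_conv_stride2(grid, weights):
    """Stride-2 transposed convolution with 2x2 kernels.

    grid:    [H][W][C_in]
    weights: [C_in][C_out][2][2]
    returns: [2H][2W][C_out]
    """
    h, w = len(grid), len(grid[0])
    c_out = len(weights[0])
    out = [[[0.0] * c_out for _ in range(2 * w)] for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            for c, w_c in enumerate(weights):
                for o, w_co in enumerate(w_c):
                    for ky in range(2):
                        for kx in range(2):
                            out[2 * y + ky][2 * x + kx][o] += grid[y][x][c] * w_co[ky][kx]
    return out

# A 2x2 map with 2 channels, upsampled to 4x4 with 1 channel.
grid = [
    [[1.0, 10.0], [2.0, 20.0]],
    [[3.0, 30.0], [4.0, 40.0]],
]
weights = [
    [[[1.0, 1.0], [1.0, 1.0]]],  # channel 0 is replicated into each 2x2 block
    [[[0.0, 0.0], [0.0, 0.0]]],  # channel 1 is ignored in this toy setup
]
up = transposed_conv_stride2(grid, weights)
```

With these toy weights the result is a nearest-neighbor style enlargement of channel 0; learned weights would instead let the module reconstruct detail while changing the channel count.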
8. The nighttime strong light-resistant image enhancement method based on progressive pixel difference convolution according to claim 6, characterized in that the hybrid loss function in Step 5 is expressed by formulas whose symbols denote, respectively: the total loss function; the background image loss, which consists of a mean absolute error loss and a perceptual loss; the flare image loss, which consists of its two corresponding component losses; and the reconstruction loss. The weights of the three loss terms are set to 0.5, 0.5, and 1, respectively. The remaining symbols denote: the clean nighttime restored image containing only the light source, as predicted by the progressive pixel difference Transformer network; the clean nighttime scene image containing only light sources, used as the ground truth label; the flare image without light sources, as predicted by the progressive pixel difference Transformer network; the ground truth flare image without light sources; the light source information image; the clean nighttime background image; and the lens flare image. The reconstruction loss is expressed by a further formula whose symbols denote: the nighttime image containing lens flare damage; addition performed in a linearized gamma decoding domain, where the gamma decoding domain is... ; and clipping of the result to the range... .
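The structure of the hybrid loss can be sketched as below, under several loudly stated assumptions. The weights 0.5, 0.5, and 1 are taken to apply to the background, flare, and reconstruction terms in that order. The flare loss is assumed to mirror the background loss. The perceptual loss, which would normally compare features of a pretrained network, is replaced by a caller-supplied function. The gamma value and clipping range are not recoverable from the claim text, so they are parameters; the values used in the example (gamma 2.0, range 0 to 1) are purely illustrative. All function names are hypothetical.

```python
def l1(a, b):
    """Mean absolute error between two equally sized flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def gamma_add(a, b, gamma, lo, hi):
    """Add two non-negative images in a linearised domain, re-encode, then clip.

    The order of re-encoding and clipping is an assumption.
    """
    out = []
    for x, y in zip(a, b):
        encoded = (x ** gamma + y ** gamma) ** (1.0 / gamma)
        out.append(max(lo, min(hi, encoded)))
    return out

def hybrid_loss(pred_clean, gt_clean, pred_flare, gt_flare, degraded,
                perceptual, gamma, lo, hi, weights=(0.5, 0.5, 1.0)):
    l_bg = l1(pred_clean, gt_clean) + perceptual(pred_clean, gt_clean)
    l_flare = l1(pred_flare, gt_flare) + perceptual(pred_flare, gt_flare)
    # Recombining the two predictions should reproduce the degraded input.
    l_rec = l1(degraded, gamma_add(pred_clean, pred_flare, gamma, lo, hi))
    return weights[0] * l_bg + weights[1] * l_flare + weights[2] * l_rec

def no_perceptual(a, b):
    """Placeholder for a pretrained-feature distance; always returns 0."""
    return 0.0

# Illustrative values only: two-pixel images, gamma 2.0, clipping range [0, 1].
gt_clean, gt_flare = [0.0, 0.6], [0.0, 0.8]
degraded = gamma_add(gt_clean, gt_flare, 2.0, 0.0, 1.0)
perfect = hybrid_loss(gt_clean, gt_clean, gt_flare, gt_flare, degraded,
                      no_perceptual, 2.0, 0.0, 1.0)
imperfect = hybrid_loss([0.2, 0.6], gt_clean, gt_flare, gt_flare, degraded,
                        no_perceptual, 2.0, 0.0, 1.0)
```

The reconstruction term ties the two predictions together: an error in the predicted clean image is penalized once directly and once more because the recombined image no longer matches the degraded input.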