Image enhancement method and system based on semantic segmentation and color prior guidance
By constructing a cross-domain semantically aligned object-level pairing dataset and a color encoder, and combining semantic segmentation with color prior guidance, brightness and chromaticity are decoupled in the HSV/HVI color space, solving the problems of insufficient brightness and color cast in low-light images, and achieving high-quality image enhancement effects.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: SHANDONG UNIV
- Filing Date: 2026-03-19
- Publication Date: 2026-06-19
Figure CN122244411A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer vision and digital image processing technology, and in particular relates to an image enhancement method and system based on semantic segmentation and color prior guidance.
Background Technology
[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.
[0003] With the rapid development of computer vision technology and digital image sensing hardware, image acquisition equipment has been widely used in key areas such as smart city monitoring, autonomous driving environmental perception, smartphone photography, and nighttime disaster relief. However, in low-light environments such as at night, in tunnels, and on rainy days, the effective signal received by the image sensor is weak, resulting in problems such as insufficient brightness, low contrast, and loss of detail in the generated images. Furthermore, the noise introduced by high gain further reduces the signal-to-noise ratio of the image.
[0004] Low-quality low-light images not only affect the visual experience but also severely hinder the development of subsequent advanced computer vision tasks. For example, in autonomous driving scenarios, poor nighttime image quality may threaten driving safety; in security monitoring, dim lighting reduces the accuracy of facial recognition or abnormal behavior detection. Therefore, low-light image enhancement technology has significant research value and practical implications.
[0005] Early enhancement methods were mainly based on histogram equalization and Retinex theory. Histogram equalization improves contrast by stretching the grayscale distribution, but it is prone to local overexposure or underexposure and amplifies noise. Retinex theory methods recover the reflection component by estimating the illumination component, but often produce "halo artifacts" and are prone to color distortion in extremely low-light areas.
[0006] In recent years, end-to-end enhancement methods based on deep learning have gradually become mainstream, and networks such as LLNet, RetinexNet, and KinD have made significant progress in brightening and denoising. However, existing methods still face the following key challenges. First, color restoration is unstable. In the RGB color space, brightness and chroma are highly coupled, so when the network enhances brightness it is difficult to maintain the accuracy of hue and saturation, resulting in an overall color cast or flat colors in the enhanced image.
[0007] Second, semantic information is missing. Existing methods often adopt a "one-size-fits-all" global enhancement strategy, ignoring the differences in optical characteristics among different object categories. For example, the night sky should be kept low-noise and dark, while foreground objects need to have their texture and color restored. Global enhancement often leads to local distortion.
[0008] Third, there is a lack of object-level color priors. Existing networks lack a memory mechanism for the inherent colors of objects, and cannot physically recover color information in extremely low-light areas. External prior knowledge needs to be introduced to "associate" the correct color, but currently there is a lack of effective injection mechanisms.
[0009] Fourth, color space limitations. Existing methods mostly operate in the RGB space, and amplifying channel values can easily disrupt color proportions. Decoupled spaces such as HSV / HVI are theoretically more suitable for low-light enhancement, but standard conversion formulas have singularities in low-light conditions, and direct conversion will amplify noise.
Summary of the Invention
[0010] To overcome the shortcomings of the prior art, this invention provides an image enhancement method and system based on semantic segmentation and color prior guidance, which solves the problems of difficulty in extracting semantic information, severe interference of brightness in RGB space color restoration, and distortion of local areas caused by global enhancement strategies in existing low-light image enhancement methods.
[0011] To achieve the above objectives, one or more embodiments of the present invention provide the following technical solutions. A first aspect of this invention provides an image enhancement method based on semantic segmentation and color prior guidance. The method includes: constructing an object-level pairing dataset based on cross-domain semantic alignment; training a color encoder on the object-level pairing dataset to learn the mapping from high-noise, low-light chromaticity features to clear, normal-light chromaticity features and to recover hue information contaminated by noise; performing semantic segmentation, based on a pre-brightening strategy, on the low-light image to be detected to obtain the semantic segmentation mask of the image; using the semantic segmentation mask to separate the region of each object on the original low-light image and inputting the regions into the trained color encoder to extract object color information, thereby obtaining a color prior feature map; inputting semantic edge information and the color prior feature map into a semantic-color adaptive normalization module to obtain fused conditional features; and inputting the original low-light image and the fused conditional features into an enhancement network, which outputs the enhanced image.
[0012] As a further technical solution, the construction of the object-level pairing dataset based on cross-domain semantic alignment includes: Select paired low-light and normal-light training datasets; A pre-trained high-precision semantic segmentation model is used to infer the normal lighting images in the training dataset, generating high-confidence pixel-level semantic segmentation labels. Based on the spatial alignment characteristics of pairwise data, the semantic segmentation labels are mapped to the corresponding low-light images; Based on the category index in the semantic segmentation label, calculate the bounding box of the connected region of each object class; use the bounding box to crop out pairs of object image blocks from the low-light image and the corresponding normal-light image respectively; The cropped image patches are uniformly adjusted to a fixed size to construct an object-level pairing dataset containing pairs of low-light objects and normal-light objects, along with their category labels.
[0013] As a further technical solution, training the color encoder on the object-level pairing dataset to learn the mapping from high-noise, low-light chromaticity features to clear, normal-light chromaticity features and to recover the noise-contaminated hue information includes: converting the image patches in the object-level dataset from the RGB color space to a target color space selected from the HSV or HVI space, using the decoupling of the luminance and chrominance components in the target color space to reduce low-light noise interference; constructing a lightweight color encoder network that takes the chromaticity-related components of the low-light object image patch as input, uses the chromaticity-related components of the corresponding normal-light object image patch as the supervision target, and outputs the predicted normal-light chromaticity components; and optimizing the network with a dual-domain joint supervision strategy, in which color bias is corrected by calculating the chromaticity regression loss between the predicted features and the ground-truth values in the target color space, and the predicted features are then inversely transformed back to the RGB space to calculate the pixel-level reconstruction loss.
[0014] As a further technical solution, the step of performing semantic segmentation on the low-light image to be detected based on a pre-brightening strategy to obtain a semantic segmentation mask for the image includes: The input low-light image to be detected is preprocessed to generate an intermediate auxiliary image; The intermediate auxiliary image is input into the semantic segmentation model to obtain a high-precision semantic segmentation mask.
[0015] As a further technical solution, the semantic segmentation mask is used to separate object regions on the original low-light image, and the results are input into a trained color encoder to extract object color information, resulting in a color prior feature map, including: Using the obtained semantic segmentation mask, the regions of each object are located on the original low-light image. The data of each region are converted to the HVI domain and input into the trained color encoder to extract the normal light chromaticity prediction values of each region. Based on the coordinates of each region on the original low-light image, the predicted chromaticity features are spatially mosaicked to generate a full-image color prior feature map with the same size as the original low-light image.
[0016] As a further technical solution, inputting the semantic edge information and the color prior feature map into the semantic-color adaptive normalization module to obtain fused conditional features includes: using the semantic-color adaptive normalization module to concatenate the semantic edge information and the color prior feature map along the channel dimension and extracting fused features through convolutional layers; feeding the fused features into an offset branch and a scale branch to obtain an offset parameter map and a scale parameter map; and performing a pixel-level affine transformation on the feature maps of the low-light enhancement network based on the offset parameter map and the scale parameter map to obtain the fused conditional features.
[0017] As a further technical solution, the image enhancement network adopts the U-Net architecture, which includes an encoder, a bottleneck layer, and a decoder; the semantic-color adaptive normalization module is embedded after each convolutional operation of the decoder. Before being input into semantic-color adaptive normalization modules at different levels, the semantic segmentation mask and the full-image color prior feature map are downsampled and adjusted according to the resolution of the current layer feature map.
[0018] A second aspect of the present invention provides an image enhancement system based on semantic segmentation and color prior guidance.
[0019] The image enhancement system based on semantic segmentation and color prior guidance includes: a dataset building module, configured to build an object-level pairing dataset based on cross-domain semantic alignment; a color encoder training module, configured to train a color encoder on the object-level pairing dataset, learn the mapping from high-noise, low-light chromaticity features to clear, normal-light chromaticity features, and recover hue information contaminated by noise; a semantic segmentation module, configured to perform semantic segmentation on the low-light image to be detected based on a pre-brightening strategy to obtain the semantic segmentation mask of the image; a prior feature map acquisition module, configured to use the semantic segmentation mask to separate each object region on the original low-light image and input the regions into the trained color encoder to extract object color information, obtaining the color prior feature map; a feature fusion module, configured to input semantic edge information and the color prior feature map into the semantic-color adaptive normalization module to obtain fused conditional features; and an image enhancement module, configured to input the original low-light image and the fused conditional features into the enhancement network and output an enhanced image.
[0020] A third aspect of the present invention provides a computer-readable storage medium having a program stored thereon, which, when executed by a processor, implements the steps of an image enhancement method based on semantic segmentation and color prior guidance as described in the first aspect of the present invention.
[0021] A fourth aspect of the present invention provides an electronic device, including a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of an image enhancement method based on semantic segmentation and color prior guidance as described in the first aspect of the present invention.
[0022] The above one or more technical solutions have the following beneficial effects: (1) This invention overcomes the bottleneck of semantic segmentation failure under low light conditions and significantly improves the robustness of the algorithm. By introducing a strategy of "pre-brightening-segmentation-backtracking guidance during testing", lightweight preprocessing methods are used to first restore the structural features of the image to obtain a high-precision semantic mask, and then the mask is backtracked and applied to the enhancement process of the original low-light image. This ensures that accurate semantic and region priors can still be obtained under extreme lighting conditions.
[0023] (2) This invention achieves decoupled restoration of brightness and color, effectively solving the problems of color shift and color distortion in low-light enhancement. This invention utilizes the natural decoupling characteristics of the HSV / HVI color space to isolate the color restoration task. By training a dedicated object-level color encoder, the network can focus on mining potential color information from the chroma and saturation components without being disturbed by the luminance component. This mechanism enables the model to, like human visual memory, associate and restore the correct color based on the object texture, rather than simply calculating the pixel mean.
[0024] (3) The dual-domain joint loss constraint mechanism adopted in this invention ensures the accuracy and visual naturalness of color restoration. In view of the periodicity of hue and the overall requirements of RGB reconstruction, the dual-domain joint loss function based on HSV / HVI domain chromaticity regression + RGB domain pixel reconstruction, on the one hand, uses cosine similarity loss to accurately constrain the direction of hue vector in HSV / HVI domain, effectively overcoming phase jitter caused by low light noise; on the other hand, the prediction result is mapped back to the RGB domain to calculate the reconstruction loss through differentiable transformation, ensuring that the restored color is consistent with the real scene in the final visual presentation, avoiding local artifacts caused by single color space optimization.
[0025] (4) By constructing a semantic-color adaptive normalization module, this invention can dynamically adjust the enhancement strategy according to the semantic attributes of different regions in the image. For example, it can suppress high-frequency noise and maintain deep tones in the sky region, and enhance texture details and inject vibrant colors in the foreground object region. This effectively avoids the "sky noise explosion" or "streetlight overexposure" phenomena commonly found in traditional global enhancement algorithms, resulting in enhanced images with distinct layers, rich details, and vivid colors.
[0026] (5) This invention utilizes semantic alignment technology to construct a “low-light-object” level dataset, decomposing the complex full-image enhancement problem into a relatively simple object color restoration problem. This training method based on object patches not only expands the number of effective training samples, but also enables the color encoder to focus more on learning the color distribution patterns of specific categories, thereby improving the convergence speed and generalization ability of the model.
[0027] Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Description of the Drawings
[0028] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
[0029] Figure 1 is a flowchart of the method in the first embodiment.
[0030] Figure 2 is a schematic diagram of the object-level dataset construction process and the color encoder training logic in the first embodiment.
[0031] Figure 3 is a schematic diagram of the enhancement network structure with the semantic-color adaptive normalization (SCAN) module in the first embodiment.
[0032] Figure 4 is a schematic diagram of the color space decoupling transformation and the dual-domain joint loss function calculation logic in the first embodiment.
[0033] Figure 5 is a system structure diagram of the second embodiment.
Detailed Description
[0034] It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0035] It should be noted that the terminology used herein is for the purpose of describing particular implementations only and is not intended to limit the exemplary implementations of the present invention.
[0036] Where there is no conflict, the embodiments and features in the embodiments of the present invention can be combined with each other.
[0037] Example 1. This embodiment discloses an image enhancement method based on semantic segmentation and color prior guidance. An intermediate brightening stage ensures the accuracy of semantic segmentation. The object-level color encoder is trained using the decoupling characteristics of the color space. The semantic and color priors are injected into the enhancement network through an adaptive feature modulation mechanism, thereby achieving high-quality low-light image reconstruction.
[0038] Specifically, as shown in Figure 1, the image enhancement method based on semantic segmentation and color prior guidance includes the following steps. Step S1: Construct an object-level pairing dataset based on cross-domain semantic alignment.
[0039] To address the difficulty in extracting semantic information from low-light images, this embodiment employs a "reverse mapping" strategy.
[0040] Specifically, a training dataset (such as the LOL dataset or the SID dataset) containing pairs of low-light and normal-light images is selected. A pre-trained high-precision semantic segmentation model (such as Mask2Former) is used to infer the normal-light images in the dataset, generating high-confidence pixel-level semantic segmentation labels. Mask2Former is a state-of-the-art general-purpose image segmentation model based on the Transformer architecture. This model breaks away from the traditional pixel-level classification paradigm based on fully convolutional networks, instead employing the idea of mask classification, using a mask attention mechanism to extract and refine the regional features of different objects in an image.
[0041] Utilizing the pixel-level spatial alignment of the paired data, these semantic segmentation labels are mapped to the corresponding low-light images. For each category index in the semantic segmentation labels (e.g., sky, vegetation, buildings), the bounding box of the connected region for that category is calculated, and a pair of object image patches is cropped from the low-light image and the normal-light image. The cropped image patches are resized to a uniform fixed size to construct an object-level dataset containing "low-light object - normal-light object" pairs and their category labels. This provides a data foundation for subsequent color recovery training targeting specific objects.
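The cropping procedure of step S1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names, the patch size of 64, and the `min_pixels` threshold are hypothetical, and for brevity it takes one bounding box per class instead of one per connected region.

```python
import numpy as np

def resize_nearest(patch, size):
    """Nearest-neighbour resize of an (H, W, C) patch to (size, size, C)."""
    h, w = patch.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return patch[rows][:, cols]

def build_object_pairs(low, normal, labels, size=64, min_pixels=16):
    """Crop paired object patches using labels predicted on the normal-light image.

    low, normal: pixel-aligned (H, W, 3) float arrays.
    labels:      (H, W) integer class map (assumed to come from a segmenter).
    Simplification: one bounding box per class, not per connected component.
    """
    pairs = []
    for cls in np.unique(labels):
        ys, xs = np.nonzero(labels == cls)
        if ys.size < min_pixels:  # skip tiny regions (hypothetical threshold)
            continue
        y0, y1 = ys.min(), ys.max() + 1
        x0, x1 = xs.min(), xs.max() + 1
        pairs.append({
            "label": int(cls),
            "low": resize_nearest(low[y0:y1, x0:x1], size),
            "normal": resize_nearest(normal[y0:y1, x0:x1], size),
        })
    return pairs
```

Because the same box is applied to both images, every low-light patch stays pixel-aligned with its normal-light counterpart, which is what the later per-pixel chromaticity supervision relies on.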
[0042] Step S2: Train a color encoder based on the object-level paired dataset, learn the mapping relationship from high-noise, low-light chromaticity features to clear, normal-light chromaticity features, and recover the hue information contaminated by noise.
[0043] Referring to Figure 2, this step aims to train a lightweight neural network that leverages the decoupling properties of the HVI / HSV color spaces to recover clear, normal-light chromaticity features from noisy, low-light chromaticity features. HSV and HVI are two color models used in image processing to decouple color information (chromaticity) from luminance information. In the HSV space, H represents hue, S represents saturation, and V represents value (lightness). Unlike the RGB color space, in which luminance and color are highly coupled, HSV theoretically allows the luminance component V to be adjusted independently without altering the chromaticity components H and S. However, when processing noisy, low-light images, the HSV space introduces severe "red discontinuity" and "black plane" noise artifacts. HVI builds on the decoupling concept of HSV and improves on it in two ways. First, a polarization transformation maps the one-dimensional circular hue H onto a two-dimensional continuous HV chromaticity plane, eliminating the red discontinuity. Second, a learnable intensity collapse function adaptively suppresses unstable chromaticity information in extremely dark regions, addressing the black plane noise. Specifically:
Step S21: To overcome the interference caused by the high coupling between luminance and chrominance in the RGB color space, the cropped RGB image blocks are converted to the HVI color space. To make the luminance channel smoother and less sensitive to single-point noise, the intensity component I is calculated as the arithmetic mean of the three channels:
$$I = \frac{R + G + B}{3}$$
[0044] The HVI color space provides a noise-robust and color-continuous representation through polarization and intensity collapse mechanisms. Subsequently, only the HV chroma channel information is extracted from it as input to the color encoder.
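A rough NumPy sketch of such an RGB-to-HVI conversion is given below. Several details are assumptions rather than statements of the source: hue and saturation follow the standard HSV formulas, the collapse term takes the form sin(πI/2)^(1/k), and the fixed `k` stands in for the learnable parameter. Only the mean-based intensity and the mapping of hue onto a two-dimensional plane come from the text.

```python
import numpy as np

def rgb_to_hvi(rgb, k=1.0, eps=1e-8):
    """Map an (..., 3) RGB array in [0, 1] to (H, V, I) channels.

    Assumptions (not fixed by the text): hue/saturation follow the standard
    HSV formulas, and the collapse term is sin(pi * I / 2) ** (1 / k) with a
    fixed k standing in for the learnable parameter.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = mx - mn

    # Standard HSV hue in [0, 1); achromatic pixels get hue 0.
    safe = np.where(delta > eps, delta, 1.0)
    hue = np.select(
        [delta <= eps, mx == r, mx == g],
        [0.0, ((g - b) / safe) % 6.0, (b - r) / safe + 2.0],
        default=(r - g) / safe + 4.0,
    ) / 6.0
    sat = np.where(mx > eps, delta / np.maximum(mx, eps), 0.0)

    intensity = rgb.mean(axis=-1)  # arithmetic mean of R, G, B
    collapse = np.sin(np.pi * intensity / 2.0) ** (1.0 / k)

    # Polar-to-Cartesian mapping: hue becomes an angle, so hue 0 and 1 coincide.
    h = collapse * sat * np.cos(2.0 * np.pi * hue)
    v = collapse * sat * np.sin(2.0 * np.pi * hue)
    return np.stack([h, v, intensity], axis=-1)
```

In this sketch two reds on either side of the hue wrap-around land next to each other in the HV plane, and the collapse term pulls the chroma of near-black pixels toward zero, where hue estimates are least reliable.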
[0045] The input to the color encoder is strictly limited to the pure chroma information of the low-light image, obtained by extracting its HV channel features. This forces the network to ignore luminance noise and learn the mapping only from chrominance data. The network's learning target is the clean chrominance features of the paired object under normal lighting.
[0046] In this embodiment, the low-light image is converted to the HVI domain, separating the luminance channel which is severely affected by noise, and the color encoder is trained using only the hue and saturation / value channels. This allows the model to focus on learning the inherent color attributes of objects, effectively avoiding the color shift problem caused by increased brightness in traditional RGB methods.
[0047] Step S22: Construct a lightweight color encoder network. The chromaticity components (hue H and saturation S) of low-light object image patches are used as input, and the corresponding chromaticity components of normally lit object image patches are used as the supervision target. The output is the predicted normally lit chromaticity components. To ensure the model is lightweight and focuses on the spatial mapping of color information, the color encoder is designed as a fully convolutional network without downsampling. This architecture consists of three convolutional layers: The first layer uses a 3×3 convolution kernel to map the input 2-channel HV features to a 64-dimensional intermediate feature space, and introduces non-linearity through the ReLU activation function.
[0048] The second layer uses 1×1 convolutions to perform deep information interaction and transformation between feature channels.
[0049] The third layer again uses a 1×1 convolution to remap the 64-dimensional features back to 2 channels, obtaining the predicted HV chromaticity features H and V.
[0050] This non-downsampling design ensures pixel-level spatial correspondence, which is crucial for accurately restoring color distribution.
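The three-layer structure described in step S22 can be sketched as a plain NumPy forward pass. The weights here are random and untrained, the ReLU after the second layer is an assumption (the text only specifies ReLU after the first layer), and the `conv2d` helper and `ColorEncoder` class are illustrative names.

```python
import numpy as np

def conv2d(x, weight, bias):
    """Stride-1 'same' convolution. x: (C_in, H, W); weight: (C_out, C_in, k, k)."""
    c_out, _, k, _ = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for dy in range(k):
        for dx in range(k):
            # Accumulate the contribution of one kernel tap over all pixels.
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + w])
    return out + bias[:, None, None]

class ColorEncoder:
    """Fully convolutional encoder: 3x3 (2->64), 1x1 (64->64), 1x1 (64->2)."""

    def __init__(self, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        shapes = [(hidden, 2, 3, 3), (hidden, hidden, 1, 1), (2, hidden, 1, 1)]
        self.weights = [rng.normal(0.0, 0.05, s) for s in shapes]
        self.biases = [np.zeros(s[0]) for s in shapes]

    def __call__(self, hv):
        x = np.maximum(conv2d(hv, self.weights[0], self.biases[0]), 0.0)  # ReLU
        # ReLU after the second layer is an assumption; the text does not say.
        x = np.maximum(conv2d(x, self.weights[1], self.biases[1]), 0.0)
        return conv2d(x, self.weights[2], self.biases[2])
```

With stride 1 and "same" padding throughout, the output has exactly the spatial size of the input, so each predicted chroma value corresponds to one input pixel.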
[0051] Step S23: As shown in Figure 2, to ensure that the predicted chromaticity information is not only numerically accurate in the HVI domain but also visually correct after reconstruction into an RGB image, a loss function combining the HVI and RGB domains is used as the constraint.
[0052] The direct loss in the HVI domain, $\mathcal{L}_{HVI}$, directly compares the chromaticity features predicted by the network, $\hat{C}_{HV}$, with the ground-truth chromaticity features, $C_{HV}^{gt}$. Since the HV plane is a Cartesian coordinate system and has no periodicity problem, an L1 loss directly penalizes the deviation of the chromaticity coordinates:
$$\mathcal{L}_{HVI} = \left\| \hat{C}_{HV} - C_{HV}^{gt} \right\|_1$$
[0053] To introduce visual-level supervision, a "virtual reconstruction" step combines the chromaticity components predicted by the network, $\hat{C}_{HV}$, with the ground-truth luminance component, $I^{gt}$, and passes them through a differentiable HVI-to-RGB inverse transformation module $\mathcal{T}^{-1}$ to reconstruct a predicted color image under ideal brightness. The L1 loss between the reconstructed image and the real normal-light image $X^{gt}$ is then calculated as follows:
[0054] $$\mathcal{L}_{RGB} = \left\| \mathcal{T}^{-1}\left(\hat{C}_{HV}, I^{gt}\right) - X^{gt} \right\|_1$$
[0055] Here, $\mathcal{L}_{RGB}$ is the RGB domain reconstruction loss. Using the ground-truth luminance $I^{gt}$ completely eliminates the error introduced by brightness, so that the gradient of $\mathcal{L}_{RGB}$ is used specifically to optimize the accuracy of color prediction.
[0056] Finally, the weighted sum of the losses from the two domains constitutes the final optimization objective:
$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{HVI} + \lambda_2 \mathcal{L}_{RGB}$$
[0057] Here, $\mathcal{L}_{total}$ is the total loss, and $\lambda_1$ and $\lambda_2$ are weighting coefficients used to balance the importance of the HVI domain loss and the RGB domain loss.
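The dual-domain objective can be sketched numerically as follows. This is an illustration under stated assumptions: the inverse transform is passed in as a callable, `toy_inverse` is a made-up linear placeholder and not the real HVI-to-RGB transform, and the default weights are hypothetical. A real training setup would use an autodiff framework so that the reconstruction term is differentiable.

```python
import numpy as np

def dual_domain_loss(pred_hv, gt_hv, gt_intensity, gt_rgb, hvi_to_rgb,
                     w_hvi=1.0, w_rgb=1.0):
    """Weighted sum of an HV-plane L1 loss and an RGB reconstruction L1 loss.

    pred_hv, gt_hv: (H, W, 2); gt_intensity: (H, W); gt_rgb: (H, W, 3).
    hvi_to_rgb: callable (hv, intensity) -> rgb, supplied by the caller.
    The weights are hypothetical defaults, not values from the source.
    """
    loss_hvi = np.mean(np.abs(pred_hv - gt_hv))
    # "Virtual reconstruction": predicted chroma + ground-truth luminance.
    recon = hvi_to_rgb(pred_hv, gt_intensity)
    loss_rgb = np.mean(np.abs(recon - gt_rgb))
    return w_hvi * loss_hvi + w_rgb * loss_rgb, loss_hvi, loss_rgb

def toy_inverse(hv, intensity):
    """Placeholder linear decoder used only to exercise the loss (NOT real HVI)."""
    h, v = hv[..., 0], hv[..., 1]
    return np.stack([intensity + h, intensity + v, intensity - h - v], axis=-1)
```

Because the reconstruction always uses the ground-truth intensity, any difference between the reconstructed and reference RGB images in this sketch can only come from the chroma prediction.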
[0058] Furthermore, during the training of the color encoder, a cosine similarity loss is introduced for the hue components with periodic features. This not only addresses the issue that hue angle values (such as 1° and 359°) are numerically discontinuous but perceptually extremely similar, but also enables the network to ignore modulus fluctuations caused by low-light noise and focus on recovering the correct color orientation, thereby significantly improving the accuracy of color recovery.
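A small, standard-library-only sketch shows why a cosine similarity loss suits periodic hue. The function names are illustrative, and the loss is written as one minus the cosine similarity, which is a common convention rather than a form stated in the source.

```python
import math

def hue_to_vector(hue_deg):
    """Represent a hue angle in degrees as a point on the unit circle."""
    rad = math.radians(hue_deg)
    return (math.cos(rad), math.sin(rad))

def cosine_hue_loss(pred_vec, target_vec, eps=1e-8):
    """1 - cosine similarity between two 2-D chroma vectors (0 when aligned)."""
    dot = pred_vec[0] * target_vec[0] + pred_vec[1] * target_vec[1]
    norm = math.hypot(*pred_vec) * math.hypot(*target_vec)
    return 1.0 - dot / max(norm, eps)

# Hues of 1 and 359 degrees differ by 358 numerically but only 2 degrees on
# the circle, so the loss is tiny; the vector length (modulus) has no effect.
wrap_loss = cosine_hue_loss(hue_to_vector(1.0), hue_to_vector(359.0))
scale_loss = cosine_hue_loss((2.0, 0.0), (0.1, 0.0))
```

Here `wrap_loss` equals 1 − cos(2°), and `scale_loss` is zero because the two vectors point in the same direction despite their different lengths.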
[0059] By learning the mapping relationship from high-noise, low-light chromaticity features to clear, normal-light chromaticity features through the dual-domain loss joint constraint network, the network is forced to accurately recover the hue information contaminated by noise while maintaining the color consistency of the object's texture structure, thus obtaining a robust color prior extractor.
[0060] Step S3: Perform semantic segmentation on the low-light image to be detected based on a pre-brightening strategy to obtain the semantic segmentation mask of the image.
[0061] In the online enhancement phase, the input low-light image to be detected is first preprocessed with global gamma correction (e.g., γ=0.5) or histogram equalization to generate an intermediate auxiliary image. The auxiliary image significantly enhances object outlines and contrast.
[0062] The auxiliary image is then input into the semantic segmentation model to obtain a high-precision semantic segmentation mask for the entire image. This mask serves only as a structural prior, addressing the missing or incorrect masks that result from directly segmenting the original low-light image. It is not involved in subsequent pixel value calculations, thus avoiding the introduction of noise amplified during preprocessing.
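The pre-brightening strategy can be sketched in a few lines of NumPy. The `segmenter` argument is a placeholder for whatever segmentation model is used; only the gamma value of 0.5 comes from the text.

```python
import numpy as np

def pre_brighten(image, gamma=0.5):
    """Global gamma correction for an image with values in [0, 1]."""
    image = np.clip(np.asarray(image, dtype=np.float64), 0.0, 1.0)
    return image ** gamma

def segment_low_light(image, segmenter, gamma=0.5):
    """Run a segmenter on the brightened copy; the original image is untouched."""
    aux = pre_brighten(image, gamma)
    return segmenter(aux)  # the mask is later applied to the ORIGINAL image
```

With γ=0.5 a pixel value of 0.04 maps to 0.2 and 0.25 maps to 0.5, so dark structures are lifted strongly while values near 1 barely change.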
[0063] Step S4: Using the semantic segmentation mask, separate the object regions on the original low-light image and input them into the trained color encoder to extract the object color information, thereby obtaining the color prior feature map.
[0064] Using the obtained semantic segmentation mask, each object region is located in the original low-light image. The data for each region is converted to the HVI domain and input into the color encoder trained in step S2 to extract the predicted normal-light chromaticity values for each region.
[0065] Finally, based on the coordinates of each region in the original image, the predicted chromaticity features are spatially mosaicked to generate a full-image color prior feature map with the same size as the original image.
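The region-wise prediction and mosaicking of step S4 might look like the sketch below. It assumes the HV chroma of the whole low-light image is already available as a `(2, H, W)` array, treats each class as one region with a single bounding box, and uses `encoder` as a stand-in for the trained color encoder.

```python
import numpy as np

def build_color_prior(low_hv, mask, encoder):
    """Assemble a full-image prior from per-region encoder predictions.

    low_hv:  (2, H, W) HV chroma of the original low-light image.
    mask:    (H, W) integer semantic mask.
    encoder: callable mapping a (2, h, w) crop to a (2, h, w) prediction
             (stands in for the trained color encoder).
    """
    prior = np.zeros_like(low_hv)
    for cls in np.unique(mask):
        region = mask == cls
        ys, xs = np.nonzero(region)
        y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
        pred = encoder(low_hv[:, y0:y1, x0:x1])
        # Paste only the pixels that belong to this class inside the box.
        box = region[y0:y1, x0:x1]
        prior[:, y0:y1, x0:x1][:, box] = pred[:, box]
    return prior
```

Pasting through the boolean region mask, and not the whole bounding box, keeps overlapping boxes of different classes from overwriting each other.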
[0066] Step S5: Input the semantic edge information and the color prior feature map into the semantic-color adaptive normalization module to obtain the fused conditional features.
[0067] Referring to Figure 3, this embodiment designs a semantic-color adaptive normalization (SCAN) module to fuse semantic edge information with color prior information and to modulate the main network features. The main network feature map is the intermediate feature map of the low-light enhancement network described below. Specifically:
Step S51: The SCAN module receives two external conditions: the semantic mask $M$ and the color prior feature map $P$. They are first concatenated along the channel dimension to obtain the combined condition $C$:
[0068] $$C = \mathrm{Concat}(M, P)$$
[0069] Subsequently, fused features are extracted through a shared convolutional layer with a 3×3 convolutional kernel, 128 output channels, and a ReLU activation function.
[0070] Step S52: The fused features are fed into two parallel branches, an offset branch and a scale branch. The offset branch generates a pixel-level offset parameter map $\beta$ through a convolutional layer, and the scale branch generates a pixel-level scale parameter map $\gamma$ through a convolutional layer. Each pixel in these maps corresponds to one scaling factor and one offset factor, and the parameters of all pixels together form the parameter maps.
[0071] Step S53: The main path features $F$ of the current layer of the low-light enhancement network are first normalized by a normalization layer, and a pixel-level affine transformation is then performed:
$$F_{out} = \gamma \odot \mathrm{Norm}(F) + \beta$$
[0072] Here, $F_{out}$ is the feature output after fusing semantic and color information, and $\mathrm{Norm}(F)$ denotes the normalization applied to the main path features $F$ of the current layer of the low-light enhancement network.
[0073] Through this operation, the SCAN module injects the semantic structure (dominated by $M$) and the color tendency (dominated by $P$) into the feature map to obtain the fused conditional features.
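The modulation performed by the SCAN module can be sketched as a NumPy forward pass. Several choices are assumptions: the semantic mask is supplied in one-hot form, both branches use 3×3 convolutions, the normalization is a per-channel (instance-style) standardization, and the affine step is written as scale times normalized features plus offset. Only the concatenation, the shared 3×3 convolution with 128 channels and ReLU, and the two-branch layout come from the text.

```python
import numpy as np

def conv2d(x, weight, bias):
    """Stride-1 'same' convolution. x: (C_in, H, W); weight: (C_out, C_in, k, k)."""
    c_out, _, k, _ = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for dy in range(k):
        for dx in range(k):
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + w])
    return out + bias[:, None, None]

def init_scan_params(cond_channels, feat_channels, hidden=128, seed=0):
    """Random (untrained) weights for the shared layer and the two branches."""
    rng = np.random.default_rng(seed)

    def layer(c_out, c_in):
        return rng.normal(0.0, 0.05, (c_out, c_in, 3, 3)), np.zeros(c_out)

    return {"shared": layer(hidden, cond_channels),
            "scale": layer(feat_channels, hidden),
            "offset": layer(feat_channels, hidden)}

def scan_modulate(features, mask_onehot, color_prior, params, eps=1e-5):
    """Normalize features, then apply pixel-wise scale and offset maps."""
    cond = np.concatenate([mask_onehot, color_prior], axis=0)  # channel concat
    shared = np.maximum(conv2d(cond, *params["shared"]), 0.0)  # 3x3 + ReLU
    scale = conv2d(shared, *params["scale"])                   # scale branch
    offset = conv2d(shared, *params["offset"])                 # offset branch
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normed = (features - mean) / (std + eps)  # assumed per-channel normalization
    return scale * normed + offset
```

Since the scale and offset maps have one value per pixel and channel, the sky and a foreground object in the same feature map can receive different gains and biases.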
[0074] Step S6: Input the original low-light image and the fused conditional features into the enhancement network, and output the enhanced image.
[0075] The original low-light image is input into the enhancement network, and the fused conditional features are also fed into the network to guide the enhancement process. As shown in Figure 4, the enhancement network uses U-Net as its backbone and exploits its multi-scale feature fusion to inject semantic and color information through the feature modulation module in the decoder while preserving the texture details of the original low-light image, thereby achieving high-quality image reconstruction. The main network includes an encoder, a bottleneck layer, and a decoder.
[0076] Specifically, in this embodiment, the encoder consists of four downsampling modules that progressively extract multi-scale features from the image, with the number of channels increasing sequentially (64, 128, 256, 512). The decoder consists of four upsampling modules. Crucially, a SCAN module is embedded after each convolutional operation in the decoder. Furthermore, the full-image color prior feature map and semantic mask generated in step S4 are downsampled according to the resolution of the current layer's feature map before being input into the SCAN modules at different levels, achieving accurate multi-scale guidance.
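The multi-scale guidance described above can be sketched as follows. The text only says that the mask and the color prior map are downsampled to the resolution of the current layer. This sketch assumes that each downsampling module halves the resolution, so the four decoder levels operate at 1/8, 1/4, 1/2, and full resolution. It also assumes nearest-neighbour sampling for the mask, which keeps class indices intact, and average pooling for the color prior. The function names and the two-channel prior are hypothetical.

```python
import numpy as np

def downsample_mask(mask, factor):
    """Nearest-neighbour downsampling of an (H, W) label mask."""
    return mask[::factor, ::factor]

def downsample_prior(prior, factor):
    """Average-pool a (C, H, W) color prior map by an integer factor.

    H and W must be divisible by factor.
    """
    c, h, w = prior.shape
    blocks = prior.reshape(c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))

def build_guidance_pyramid(mask, prior, factors=(8, 4, 2, 1)):
    """Return one (mask, prior) pair per decoder level, coarse to fine."""
    return [(downsample_mask(mask, f), downsample_prior(prior, f)) for f in factors]

rng = np.random.default_rng(0)
mask = rng.integers(0, 5, size=(32, 32))   # 5 hypothetical object classes
prior = rng.random(size=(2, 32, 32))       # assumed 2 chromaticity channels
pyramid = build_guidance_pyramid(mask, prior)
for level_mask, level_prior in pyramid:
    print(level_mask.shape, level_prior.shape)
```

Averaging a label mask would produce fractional values that correspond to no class, which is why the two inputs are resampled differently in this sketch.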
[0077] Ultimately, the enhancement network outputs a high-quality enhanced image with appropriate brightness, true colors, and no artifacts.
[0078] Embodiment 2: This embodiment discloses an image enhancement system based on semantic segmentation and color prior guidance. As shown in Figure 5, the system includes:
a dataset building module, configured to build an object-level pairing dataset based on cross-domain semantic alignment;
a color encoder training module, configured to train a color encoder based on the object-level pairing dataset, learn the mapping relationship from high-noise, low-light chromaticity features to clear, normal-light chromaticity features, and recover hue information contaminated by noise;
a semantic segmentation module, configured to perform semantic segmentation on the low-light image to be detected based on a pre-brightening strategy to obtain the semantic segmentation mask of the image;
a prior feature map acquisition module, configured to use the semantic segmentation mask to separate each object region on the original low-light image and input the regions into the trained color encoder to extract the object color information, thereby obtaining the color prior feature map;
a feature fusion module, configured to input the semantic edge information and the color prior feature map into the semantic-color adaptive normalization module to obtain the fused conditional features; and
an image enhancement module, configured to input the original low-light image and the fused conditional features into the enhancement network and output an enhanced image.
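The way the four inference-time modules hand data to one another can be sketched as a small Python skeleton. It is illustrative only: the class name `EnhancementPipeline`, the callable signatures, and the stub helper are hypothetical, and the dataset building and color encoder training modules are omitted because they run before inference. The semantic mask stands in for the semantic edge information that the feature fusion module receives.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class EnhancementPipeline:
    segment: Callable[[Any], Any]           # semantic segmentation module
    color_prior: Callable[[Any, Any], Any]  # prior feature map acquisition module
    fuse: Callable[[Any, Any], Any]         # feature fusion module
    enhance: Callable[[Any, Any], Any]      # image enhancement module

    def __call__(self, low_light: Any) -> Any:
        mask = self.segment(low_light)
        prior = self.color_prior(low_light, mask)
        cond = self.fuse(mask, prior)
        # The original low-light image is passed again, together with the
        # fused conditional features.
        return self.enhance(low_light, cond)

def stub(name, log):
    """Build a placeholder module that records its call and echoes its inputs."""
    def run(*args):
        log.append(name)
        return f"{name}({', '.join(map(str, args))})"
    return run

log = []
pipeline = EnhancementPipeline(
    segment=stub("segment", log),
    color_prior=stub("color_prior", log),
    fuse=stub("fuse", log),
    enhance=stub("enhance", log),
)
result = pipeline("img")
print(log)
print(result)
```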
[0079] Embodiment 3: The purpose of this embodiment is to provide a computer-readable storage medium.
[0080] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of an image enhancement method based on semantic segmentation and color prior guidance as described in Embodiment 1.
[0081] Embodiment 4: The purpose of this embodiment is to provide an electronic device.
[0082] An electronic device includes a memory, a processor, and a program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps in an image enhancement method based on semantic segmentation and color prior guidance as described in Embodiment 1.
[0083] The steps and methods involved in the apparatuses of Embodiments 2, 3, and 4 above correspond to those in Embodiment 1. For specific implementation details, please refer to the relevant description section of Embodiment 1. The term "computer-readable storage medium" should be understood as a single medium or multiple media including one or more instruction sets; it should also be understood as including any medium capable of storing, encoding, or carrying an instruction set for execution by a processor and enabling the processor to perform any of the methods in this invention.
[0084] Those skilled in the art will understand that the modules or steps of the present invention described above can be implemented using general-purpose computer devices. Optionally, they can be implemented using computer-executable program code, thereby allowing them to be stored in a storage device for execution by a computer device, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. The present invention is not limited to any particular combination of hardware and software.
[0085] Although specific embodiments of the present invention have been described above in conjunction with the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that any modification or variation that can be made without creative effort on the basis of the technical solutions of the present invention remains within its scope of protection.
Claims
1. An image enhancement method based on semantic segmentation and color prior guidance, characterized in that it comprises:
constructing an object-level pairing dataset based on cross-domain semantic alignment;
training a color encoder based on the object-level pairing dataset to learn the mapping relationship from high-noise, low-light chromaticity features to clear, normal-light chromaticity features, and to recover hue information contaminated by noise;
performing semantic segmentation based on a pre-brightening strategy on the low-light image to be detected to obtain a semantic segmentation mask of the image;
using the semantic segmentation mask to separate the region of each object on the original low-light image, and inputting the regions into the trained color encoder to extract object color information, thereby obtaining a color prior feature map;
inputting semantic edge information and the color prior feature map into a semantic-color adaptive normalization module to obtain fused conditional features; and
inputting the original low-light image and the fused conditional features into an enhancement network, and outputting an enhanced image.
2. The image enhancement method based on semantic segmentation and color prior guidance as described in claim 1, characterized in that constructing the object-level pairing dataset based on cross-domain semantic alignment includes:
selecting a training dataset of paired low-light and normal-light images;
using a pre-trained high-precision semantic segmentation model to run inference on the normal-light images in the training dataset, generating high-confidence pixel-level semantic segmentation labels;
mapping the semantic segmentation labels to the corresponding low-light images based on the spatial alignment of the paired data;
calculating, based on the category index in the semantic segmentation labels, the bounding box of the connected region of each object class;
using the bounding box to crop paired object image patches from the low-light image and the corresponding normal-light image respectively; and
resizing the cropped image patches to a fixed size to construct an object-level pairing dataset containing pairs of low-light objects and normal-light objects together with their category labels.
3. The image enhancement method based on semantic segmentation and color prior guidance as described in claim 1, characterized in that training the color encoder based on the object-level pairing dataset to learn the mapping relationship from high-noise, low-light chromaticity features to clear, normal-light chromaticity features, and to recover noise-contaminated hue information, includes:
converting the image patches in the object-level pairing dataset from the RGB color space to a target color space, and reducing low-light noise interference by utilizing the decoupling of the luminance component and the chrominance component in the target color space;
constructing a lightweight color encoder network that takes the chromaticity-related component of the low-light object image patch as input, takes the chromaticity-related component of the corresponding normal-light object image patch as the supervision target, and outputs the predicted normal-light chromaticity component; and
optimizing the network with a dual-domain joint supervision strategy, in which color bias is corrected by calculating the chromaticity regression loss between the predicted features and the ground-truth values in the target color space, and the predicted features are then inversely transformed back to the RGB space to calculate the pixel-level reconstruction loss.
4. The image enhancement method based on semantic segmentation and color prior guidance as described in claim 1, characterized in that performing semantic segmentation based on a pre-brightening strategy on the low-light image to be detected to obtain the semantic segmentation mask of the image includes:
preprocessing the input low-light image to be detected to generate an intermediate auxiliary image; and
inputting the intermediate auxiliary image into the semantic segmentation model to obtain a high-precision semantic segmentation mask.
5. The image enhancement method based on semantic segmentation and color prior guidance as described in claim 1, characterized in that using the semantic segmentation mask to separate the region of each object on the original low-light image and inputting the regions into the trained color encoder to extract object color information, thereby obtaining the color prior feature map, includes:
locating the region of each object on the original low-light image using the obtained semantic segmentation mask;
converting the data of each region to the HVI domain and inputting it into the trained color encoder to extract the normal-light chromaticity prediction of each region; and
spatially mosaicking the predicted chromaticity features according to the coordinates of each region on the original low-light image to generate a full-image color prior feature map with the same size as the original low-light image.
6. The image enhancement method based on semantic segmentation and color prior guidance as described in claim 1, characterized in that inputting the semantic edge information and the color prior feature map into the semantic-color adaptive normalization module to obtain the fused conditional features includes:
concatenating, in the semantic-color adaptive normalization module, the semantic edge information and the color prior feature map along the channel dimension, and extracting fused features through a convolutional layer;
inputting the fused features into the offset branch and the scale branch to obtain the offset parameter map and the scale parameter map; and
performing a pixel-level affine transformation on the feature maps of the low-light enhancement network based on the offset parameter map and the scale parameter map to obtain the fused conditional features.
7. The image enhancement method based on semantic segmentation and color prior guidance as described in claim 1, characterized in that:
the enhancement network adopts the U-Net architecture, which includes an encoder, a bottleneck layer, and a decoder;
the semantic-color adaptive normalization module is embedded after each convolutional operation of the decoder; and
before being input into the semantic-color adaptive normalization modules at different levels, the semantic segmentation mask and the full-image color prior feature map are downsampled according to the resolution of the current layer's feature map.
8. An image enhancement system based on semantic segmentation and color prior guidance, characterized in that it comprises:
a dataset building module, configured to build an object-level pairing dataset based on cross-domain semantic alignment;
a color encoder training module, configured to train a color encoder based on the object-level pairing dataset, learn the mapping relationship from high-noise, low-light chromaticity features to clear, normal-light chromaticity features, and recover hue information contaminated by noise;
a semantic segmentation module, configured to perform semantic segmentation on the low-light image to be detected based on a pre-brightening strategy to obtain the semantic segmentation mask of the image;
a prior feature map acquisition module, configured to use the semantic segmentation mask to separate each object region on the original low-light image and input the regions into the trained color encoder to extract the object color information, thereby obtaining the color prior feature map;
a feature fusion module, configured to input the semantic edge information and the color prior feature map into the semantic-color adaptive normalization module to obtain the fused conditional features; and
an image enhancement module, configured to input the original low-light image and the fused conditional features into the enhancement network and output an enhanced image.
9. A computer-readable storage medium having a program stored thereon, characterized in that, when executed by a processor, the program implements the steps of the image enhancement method based on semantic segmentation and color prior guidance as described in any one of claims 1-7.
10. An electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the image enhancement method based on semantic segmentation and color prior guidance as described in any one of claims 1-7.