Weakly supervised image enhancement method based on terrestrial codebook prior and contrast constraint
By adopting a weakly supervised image enhancement method based on land codebook priors and contrast constraints, the problem of poor adaptability in underwater image enhancement is solved, and high-quality underwater image generation is achieved, which is suitable for a variety of underwater application scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- QINGDAO UNIV OF TECH
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies have poor adaptability in underwater image enhancement, making it difficult to adapt to diverse degradation scenarios and easily causing problems such as color distortion, uneven contrast, or local over-enhancement.
The invention employs a weakly supervised image enhancement method based on land codebook priors and contrast constraints. A neural network containing pre-trained, learnable, and supervision modules is constructed and trained with a two-stage progressive approach; a confidence predictor is combined with a distance-guided iterative inference mechanism to optimize latent features and generate high-quality underwater images.
It significantly improves the color reproduction and texture details of underwater images, enhances the ability to model depth changes and regional attenuation differences, and generates images of higher quality than existing methods. It is suitable for scenarios such as underwater robot vision, marine life monitoring, underwater archaeology, and underwater facility inspection.
Figure CN122243839A_ABST
Description
Technical Field
[0001] This invention relates to the field of image processing technology, and more specifically to a weakly supervised image enhancement method based on terrestrial codebook priors and contrast constraints.
Background Technology
[0002] With the development of deep learning technology, supervised learning methods based on convolutional neural networks (CNNs) have become mainstream. These methods typically utilize paired datasets of degraded underwater images and reference images (sharp images) for end-to-end training, learning a mapping function from degraded to sharp images. Furthermore, generative adversarial networks (GANs) have been introduced to generate more realistic texture details, while diffusion models achieve high-quality image generation through an iterative denoising process. These data-driven methods can achieve remarkable results under ideal conditions, but their performance is inherently dependent on the availability of large-scale, high-quality paired training data.
[0003] Traditional methods rely on model assumptions and empirical parameters, resulting in poor adaptability. Physical model-based methods (such as dark channel priors and underwater imaging models) depend on idealized assumptions about water attenuation and scattering characteristics, while model-free methods (such as white balance and histogram equalization) are highly dependent on manually set parameters. Due to the complex variations in lighting, water quality, and depth in real underwater environments, these methods struggle to adapt to diverse degradation scenarios, easily leading to problems such as color distortion, uneven contrast, or local over-enhancement.
Summary of the Invention
[0004] This invention provides a weakly supervised image enhancement method based on land codebook priors and contrast constraints to address the problem of poor adaptability of traditional methods.
[0005] This invention adopts the following technical solution: a weakly supervised image enhancement method based on land codebook priors and contrast constraints, comprising the following steps:
S1. Construct a weakly supervised underwater image enhancement neural network based on land codebook priors, wherein the neural network includes a pre-training module, a learnable module, and a supervision module. The pre-training module includes a pre-trained VQ encoder, a pre-trained VQ decoder, and a discrete codebook; the discrete codebook is derived from a pre-trained VQ-GAN model. The learnable module includes a learnable encoder, a learnable decoder, a codebook predictor, and a confidence predictor, and is used for feature extraction, quantization prediction, and iterative optimization. The supervision module includes a contrastive loss calculation unit, which constructs contrastive constraints and provides supervision signals for network training.
S2. Train the network using a two-stage progressive training method. In training stage I, the parameters of the discrete codebook and the pre-trained VQ encoder are frozen; the pre-trained VQ encoder still participates in forward inference and provides the positive and negative sample supervision signals for the contrastive constraint, and the parameters of the learnable encoder, codebook predictor, and learnable decoder are jointly optimized. In training stage II, the parameters of all other modules are frozen and only the confidence predictor is trained.
S3. Acquire the underwater degraded image to be enhanced and preprocess it to obtain image data that meets the network input requirements.
S4. Input the preprocessed image data into the trained network; the learnable encoder extracts initial latent features, the codebook predictor predicts a codeword sequence, and the discrete codebook is queried to obtain quantized features.
S5. Construct a mask based on the confidence scores output by the confidence predictor, optimize the latent features through iterative updates, and input the optimized latent features into the learnable decoder, whose reconstructed output is the enhanced underwater image.
[0006] Furthermore, S1 includes S1.1, determining the land codebook prior. The land codebook prior originates from a pre-trained VQ-GAN model and contains a fixed discrete codebook $C = \{c_k \in \mathbb{R}^{d}\}_{k=1}^{N}$, where $C$ is the discrete codebook, $c_k$ is a single embedding vector in the discrete codebook, $d$ is the embedding dimension, and $N$ is the size of the discrete codebook. The discrete codebook serves as a frozen, query-based prior knowledge base: it does not participate in network parameter updates and is only used for subsequent feature quantization retrieval.
[0007] Furthermore, S1 includes S1.2, the network forward process. After the underwater image to be enhanced is input into the neural network, the learnable encoder extracts initial latent features from the image. The codebook predictor then estimates a probability distribution over the codewords of the discrete codebook. The output probability distribution has (h × w) × N entries, where h × w is the spatial dimension of the latent feature map, h is its height, w is its width, and N is the size of the discrete codebook. The discrete codeword sequence is obtained by taking the codeword index corresponding to the maximum value of the output probability distribution. Based on the discrete codeword sequence, the corresponding embedding vectors are retrieved from the discrete codebook to form the quantized latent features. The learnable decoder reconstructs the quantized latent features into a preliminary enhanced image, which is used for contrastive-constraint supervision and iterative optimization. A confidence predictor is introduced to output a confidence score at each position of the discrete codeword sequence.
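The codeword lookup in S1.2 can be illustrated with a minimal NumPy sketch. It is not the patented network: the array names, the toy sizes, and the use of a softmax over raw scores are assumptions made for illustration, and the random arrays merely stand in for the codebook predictor output and the frozen codebook.

```python
import numpy as np

def quantize_by_prediction(logits, codebook):
    """Turn per-position codeword scores into quantized latent features.

    logits:   (h*w, N) scores, a stand-in for the codebook predictor output
    codebook: (N, d) frozen embedding vectors
    """
    # Softmax over the N codewords gives a probability distribution per position.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    # The discrete codeword sequence is the index of the maximum probability.
    indices = probs.argmax(axis=1)
    # Retrieval: each index selects one embedding vector from the codebook.
    quantized = codebook[indices]
    return probs, indices, quantized

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # hypothetical N=8 codewords, d=4
logits = rng.normal(size=(6, 8))     # hypothetical h*w=6 positions
probs, indices, quantized = quantize_by_prediction(logits, codebook)
```

Because the codebook is only indexed and never updated, the lookup carries no gradient to the prior, which matches the frozen role described for the discrete codebook.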
[0008] Furthermore, S2 includes S2.1, codebook predictor training phase I. An underwater degraded image and a reference image are input; both are fed simultaneously into the learnable encoder and the pre-trained VQ encoder. The multi-scale features from the learnable encoder are fused with the features of the pre-trained VQ encoder via deformable convolution; the fused features are obtained using channel concatenation, ordinary convolution, and a deformable convolution operation.
This yields the latent features of the underwater degraded image and the quantized features of the reference image. A mask ratio is determined from a random number, and a binary mask with the same shape as the latent features is randomly generated according to that mask ratio. Based on the binary mask, hybrid features are constructed from the latent features of the underwater degraded image and the quantized features of the reference image by element-wise multiplication with the mask.
Taking the reference-image codeword sequence as the target, the negative log-likelihood is minimized. The codebook prediction loss is built from the conditional probability that the network, given the hybrid features and its parameters, predicts the i-th codeword of the reference-image codeword sequence.
With the latent features of the underwater degraded image as the anchor, the quantized features of the degraded image as negative samples (degradation features), and the quantized features of the reference image as positive samples (clear features), a contrastive loss is constructed from mean squared error (MSE) terms and a small constant for numerical stability.
The total loss for training phase I combines the contrastive loss, the codebook prediction loss, the reconstruction loss, the GAN adversarial loss, and the perceptual loss. The reconstruction loss is computed with respect to the reference image. The GAN adversarial loss is a mathematical expectation of the discriminator score obtained by inputting the preliminary enhanced image into the discriminator, which judges its authenticity. The perceptual loss applies the L1 norm operator to the feature maps obtained by inputting the high-definition reference image and the preliminary enhanced image into the pre-trained VGG feature extraction network.
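The exact formulas for the hybrid features and the contrastive loss are not recoverable from the text, so the following NumPy sketch shows only one plausible reading. Its assumptions are that the mask is drawn per spatial position and broadcast over channels, that masked positions take the reference quantized feature, and that the contrastive loss is the ratio of the anchor-to-positive MSE to the anchor-to-negative MSE plus a small constant. The function names are hypothetical.

```python
import numpy as np

def mix_features(z_deg, zq_ref, mask_ratio, rng):
    """Blend degraded latent features with reference quantized features."""
    # Assumption: one mask value per spatial position, broadcast over channels.
    mask = (rng.random(z_deg.shape[0]) < mask_ratio).astype(z_deg.dtype)[:, None]
    # Assumption: mask == 1 selects the reference quantized feature.
    return mask * zq_ref + (1.0 - mask) * z_deg, mask

def contrastive_ratio_loss(anchor, positive, negative, eps=1e-8):
    """Assumed form: pull the anchor toward the positive, away from the negative."""
    pos = float(np.mean((anchor - positive) ** 2))
    neg = float(np.mean((anchor - negative) ** 2))
    return pos / (neg + eps)

rng = np.random.default_rng(0)
z_deg = rng.normal(size=(6, 4))    # latent features of the degraded image (anchor)
zq_deg = rng.normal(size=(6, 4))   # quantized degraded features (negative)
zq_ref = rng.normal(size=(6, 4))   # quantized reference features (positive)

mixed, mask = mix_features(z_deg, zq_ref, mask_ratio=0.5, rng=rng)
loss = contrastive_ratio_loss(z_deg, zq_ref, zq_deg)
```

Under this reading the loss falls as the anchor approaches the clear features and as it moves away from the degradation features; the small constant only guards against division by zero.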
[0009] Furthermore, S2 includes S2.2, confidence predictor training phase II. The pre-trained VQ encoder, learnable encoder, discrete codebook, codebook predictor, learnable decoder, and pre-trained VQ decoder are frozen, and only the confidence predictor is trained.
The predicted codeword sequence is compared position by position with the reference codeword sequence to construct a binary target. Based on the binary target, a binary cross-entropy loss for binary matching supervision is defined over the total number of codewords in the codeword sequence, using the i-th element of the binary target and the confidence score that the confidence predictor outputs for the i-th codeword of the discrete codeword sequence.
The degree of separation between the quantized features of the predicted image and the quantized features of the degraded image is calculated and normalized by min-max normalization. The quantized features of the predicted image are those obtained by retrieving embedding vectors from the discrete codebook with the discrete codeword sequence output by the codebook predictor. Based on the normalized result, a cross-entropy loss for difference-guided soft supervision is constructed, using the i-th element of the normalized result.
The total loss of training phase II combines the binary matching supervision loss and the difference-guided soft supervision loss with a weighting coefficient of 0.1. Training of the confidence predictor is completed by minimizing this total loss.
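The two supervision signals of phase II can be sketched as follows. Only the position-by-position binary target, the min-max normalization, and the weighting coefficient of 0.1 come from the text. The L2 distance as the separation measure, the use of the normalized separation directly as a soft target, and the placement of the 0.1 weight on the soft term are assumptions, and all function names are hypothetical.

```python
import numpy as np

def binary_targets(pred_codes, ref_codes):
    """1 where the predicted codeword matches the reference codeword, else 0."""
    return (pred_codes == ref_codes).astype(np.float64)

def cross_entropy(conf, target, eps=1e-7):
    """Mean binary cross-entropy; also accepts soft targets in [0, 1]."""
    conf = np.clip(conf, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(conf) + (1.0 - target) * np.log(1.0 - conf)))

def min_max(x, eps=1e-8):
    """Min-max normalization to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min() + eps)

def stage2_loss(conf, pred_codes, ref_codes, zq_pred, zq_deg, weight=0.1):
    hard = binary_targets(pred_codes, ref_codes)
    # Assumption: separation is the per-position L2 distance between features.
    soft = min_max(np.linalg.norm(zq_pred - zq_deg, axis=1))
    # Assumption: the 0.1 weight scales the soft-supervision term.
    return cross_entropy(conf, hard) + weight * cross_entropy(conf, soft)

rng = np.random.default_rng(0)
pred_codes = np.array([1, 2, 3, 4])
ref_codes = np.array([1, 0, 3, 7])
conf = np.array([0.9, 0.2, 0.8, 0.3])   # stand-in confidence predictor outputs
zq_pred = rng.normal(size=(4, 4))
zq_deg = rng.normal(size=(4, 4))
total = stage2_loss(conf, pred_codes, ref_codes, zq_pred, zq_deg)
```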
[0010] Furthermore, S4 specifically includes: the preprocessed underwater degraded image from S3 is input into the trained network, and the maximum number of iterations is set. The learnable encoder extracts the latent features of the degraded image. A quantization operation is applied to these latent features to obtain the quantized features of the degraded image. The current features are initialized, and in the t-th iteration the codebook predictor processes the current latent features to generate a probability distribution. A discrete codeword sequence is obtained through sampling, and the discrete codebook is queried with this sequence to obtain the quantized features of the predicted image, that is, the embedding vector in the discrete codebook corresponding to each codeword of the sequence.
[0011] Furthermore, S5 specifically includes: the confidence predictor evaluates the discrete codeword sequence and outputs an initial confidence. A distance-aware confidence is then calculated:
[0012] the distance-aware confidence is computed from a cosine similarity and the sigmoid function. The initial confidence and the distance-aware confidence are fused with a weighting coefficient of 0.4 to obtain the confidence of the t-th iteration. From this confidence, the binary mask of the t-th iteration is constructed; the mask marks low-confidence regions in the latent features, and the number of tokens to update is determined by a cosine annealing schedule. If the current iteration number is less than the maximum number of iterations, the input features for the next iteration are updated from the latent features of the underwater degraded image and the quantized features of the predicted image by element-wise multiplication with the binary mask of the t-th iteration, giving the latent features to be used in the next iteration.
[0013] The iteration count is then increased by 1, and the process enters the next loop and returns to the iteration process of S4. When the current iteration number reaches the maximum number of iterations, the iteration stops and the optimization of the latent features is complete. After the final iteration, the quantized features of the predicted image are input into the learnable decoder, and the decoded output is the final enhanced underwater image.
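The inference loop of S4 and S5 can be sketched in NumPy under several stated assumptions, since the original formulas are not available. The codebook predictor and the confidence predictor are replaced by simple stand-ins (a nearest-codeword softmax and its maximum probability). The cosine similarity is taken between the predicted and the degraded quantized features and passed through a sigmoid with a negative sign. The fusion is assumed to be `(1 - 0.4) * initial + 0.4 * distance`. Masked positions are assumed to fall back to the original latent features. Argmax replaces sampling so the sketch is repeatable.

```python
import math
import numpy as np

def nearest_code_probs(features, codebook):
    """Stand-in codebook predictor: softmax over negative squared distances."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def masked_token_count(step, max_steps, num_tokens):
    """Cosine schedule: many positions are revised early, none at the last step."""
    ratio = math.cos(math.pi / 2 * (step + 1) / max_steps)
    return int(math.floor(num_tokens * ratio))

def iterative_refine(z, codebook, max_steps=4, alpha=0.4):
    # Quantized features of the degraded image (nearest codeword per position).
    zq_deg = codebook[nearest_code_probs(z, codebook).argmax(axis=1)]
    current = z.copy()
    zq_pred = zq_deg
    for t in range(max_steps):
        probs = nearest_code_probs(current, codebook)
        codes = probs.argmax(axis=1)       # the source samples; argmax used here
        zq_pred = codebook[codes]
        conf_init = probs.max(axis=1)      # stand-in for the confidence predictor
        cos = (zq_pred * zq_deg).sum(axis=1) / (
            np.linalg.norm(zq_pred, axis=1) * np.linalg.norm(zq_deg, axis=1) + 1e-8
        )
        conf_dist = 1.0 / (1.0 + np.exp(cos))               # sigmoid(-cos), assumed
        conf = (1.0 - alpha) * conf_init + alpha * conf_dist  # assumed fusion
        k = masked_token_count(t, max_steps, len(codes))
        mask = np.zeros(len(codes))
        mask[np.argsort(conf)[:k]] = 1.0   # mark the k lowest-confidence positions
        if t < max_steps - 1:
            m = mask[:, None]
            current = m * z + (1.0 - m) * zq_pred            # assumed update rule
    return zq_pred  # would be passed to the learnable decoder

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
z = rng.normal(size=(16, 4))
refined = iterative_refine(z, codebook)
```

With four iterations and sixteen positions, this schedule marks 14, 11, 6, and 0 positions in turn, so later iterations leave more of the prediction untouched.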
[0014] Compared with the prior art, the present invention has the following technical effects: This invention utilizes a VQ-GAN codebook pre-trained from land images as a high-quality visual prior, replacing the strong reliance on ideal underwater reference images in traditional methods. Even when the underwater reference image has color deviations or missing details, the method of this invention can still generate enhancement results that are significantly better than those of the reference image.
[0015] This invention proposes a two-stage contrastive learning framework that combines global degradation suppression with local adaptive refinement, significantly improving the modeling ability for non-uniform degradation caused by depth variations and regional attenuation differences in underwater images (such as color and contrast differences between shallow and deep water areas). By introducing a confidence predictor and a distance-guided iterative inference mechanism, the model can progressively correct low-confidence regions while preserving high-quality areas. The generated images outperform existing methods in color reproduction, texture detail, and structural consistency. This invention can be widely applied to scenarios requiring clear underwater images, such as underwater robot vision, marine life monitoring, underwater archaeology, and underwater facility inspection.
Attached Figure Description
[0016] Figure 1 is a flowchart of the weakly supervised underwater image enhancement method based on terrestrial codebook priors and contrast constraints of the present invention.
Detailed Implementation
[0017] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0018] Example 1. As shown in Figure 1, this embodiment provides a weakly supervised underwater image enhancement method based on land codebook priors and contrast constraints.
[0019] Quantitative comparison experiment on the T200 dataset. This embodiment conducts a quantitative comparative analysis on the T200 dataset, using two full-reference evaluation metrics (peak signal-to-noise ratio, PSNR, and structural similarity, SSIM) and four no-reference quality evaluation metrics (UIQM, CCF, AG, and Uranker).
[0020] Comparison algorithms: traditional algorithms (GDCP, HFM) and deep learning algorithms (CWR, HCLR, UDA, NU2Net).
[0021] Table 1. Comparison of UIQM, CCF, AG, Uranker, PSNR, and SSIM metrics for each method.
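For reference, PSNR has a standard closed form, 10·log10(MAX² / MSE), which the short NumPy sketch below evaluates. The 8-bit peak value of 255 and the toy images are assumptions; the evaluation code actually used in the experiments is not given in the text.

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = reference.astype(np.float64)
    out = test.astype(np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

reference = np.zeros((4, 4), dtype=np.uint8)
off_by_one = np.ones((4, 4), dtype=np.uint8)   # every pixel differs by 1
value = psnr(reference, off_by_one)
```

PSNR and SSIM both require a reference image, which is why the no-reference datasets in the later examples report only UIQM, CCF, AG, and Uranker.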
[0022] As shown in Table 1, the proposed method (OURS) achieves a leading advantage in several key evaluation metrics. Specifically, it achieves the highest values (1.016, 17.285, and 2.332) for the no-reference metrics UIQM, AG, and Uranker, respectively, demonstrating significant improvements over existing traditional algorithms and deep learning algorithms. It also maintains strong competitiveness in PSNR, SSIM, and CCF metrics, consistently ranking among the top performers overall. The experimental data fully validates the superiority and robustness of the proposed method in underwater image enhancement tasks.
[0023] Example 2. Quantitative comparison experiment on no-reference datasets. Quantitative comparisons on no-reference underwater image datasets do not require a clear reference image, which makes them more relevant to real-world applications and validates the generalization ability of this invention.
[0024] Test datasets: Includes four types of no-reference datasets: C60, USOD, EUVP, and RUIE, covering different underwater scenarios, which can comprehensively evaluate the model's performance enhancement in no-reference scenarios.
[0025] Evaluation metrics: four no-reference quality evaluation metrics were selected: UIQM, CCF, AG, and Uranker. Comparison algorithms: as in Example 1, GDCP, HFM, CWR, HCLR, UDA, and NU2Net were used.
[0026] Table 2. Comparison of UIQM and CCF metrics for different methods.
[0027] Table 3. Comparison of AG and Uranker metrics for different methods.
[0028] As shown in Tables 2 and 3, the proposed method (Ours) demonstrates significant comprehensive advantages across the four evaluation metrics: UIQM, CCF, AG, and Uranker. In the UIQM metric, our method achieves the highest values on the C60, EUVP, and RUIE datasets, reaching 1.011 on RUIE, far surpassing other comparative algorithms. In the AG metric, our method ranks first on all four datasets, achieving scores of 12.241, 12.084, 21.611, and 12.846 respectively, showcasing excellent edge and detail preservation capabilities. In the CCF and Uranker metrics, our method also maintains consistently high scores, achieving optimal results on some datasets. Overall, our method exhibits superior restoration performance and robust generalization ability in various underwater scenarios, validating its effectiveness.
[0029] Example 3. Comparison experiment between the method of this invention and the CodeUNet model. CodeUNet, a mainstream underwater image enhancement model in the same field, was selected for comparison, and a quantitative comparison of multiple metrics was conducted on the UIEB and C60 datasets.
[0030] Evaluation metrics: six metrics were selected: UIQM, AG, CCF, Uranker, PSNR, and SSIM. The C60 dataset has no reference images, so PSNR and SSIM cannot be computed for it and are marked as "-".
[0031] Table 4. Quantitative comparison results of the method of this invention and the CodeUNet model on the UIEB and C60 datasets.
[0032] As shown in Table 4, on the UIEB and C60 test datasets, the TPE-Net of this invention outperforms the CodeUNet model in most metrics, including AG, CCF, Uranker, PSNR, and SSIM, with significant improvements, particularly in AG and CCF. Specifically, on the UIEB dataset, the CCF metric improved from 28.954 to 39.13, and the AG metric from 21.771 to 22.719; on the C60 dataset, the AG metric improved from 12.364 to 14.085, and the CCF metric from 20.037 to 28.127. Considering all metrics, the method of this invention offers more comprehensive performance enhancements and better robustness.
[0033] Example 4. Network structure ablation experiment. To verify the effectiveness of each module, network structure ablation experiments were conducted by combining CP (contextual parallel module), CFP (feature pyramid module), CPCL (contrastive learning module for enhanced discriminative ability), and CFPCL (contrastive learning module for enhanced feature discriminative ability). The method of this invention (Ours) served as the control group, and UIQM and AG were used as evaluation metrics.
[0034] Evaluation metrics: UIQM and AG are two core, non-referenced evaluation metrics.
[0035] Experimental groups: CP+CFP, CPCL+CFP, CP+CFPCL, CPCL+CFPCL, a total of 4 groups; Control group: the method of this invention (Ours).
[0036] Table 5. Ablation results for different module combinations.
[0037] As shown in Table 5, the method of the present invention (Ours) achieves a score of 1.016 on UIQM and 17.285 on AG, both significantly outperforming other module combinations, indicating a good synergistic enhancement effect among the modules. Experimental results demonstrate the rationality of the network structure design of the method of the present invention, and that each functional unit makes a positive contribution to the final image enhancement performance.
[0038] Example 5. Feature point matching comparison experiment on the RUIE dataset. To verify the superiority of the method of this invention (Ours) in restoring local texture and structural information of underwater images, a feature point matching comparison experiment was conducted on the RUIE dataset, with the unprocessed input (Input), the CWR algorithm, and the NU2Net algorithm as comparison objects.
[0039] Evaluation metrics: the number of inliers (N_inliers) and the inlier ratio (R_inliers) are used as evaluation standards. The higher the values, the better the image feature point matching and the more complete the recovery of texture and structural information.
[0040] Table 6. Feature point matching results of the method of this invention and the comparison methods on the RUIE dataset.
[0041] As can be seen from Table 6, the method of this invention (Ours) achieved the best results in both the number of inliers and the inlier ratio in the feature point matching test on the RUIE dataset: N_inliers reached 231.80 and R_inliers reached 0.9276. This improves on both the CWR algorithm (N_inliers = 182.95, R_inliers = 0.9221) and the NU2Net algorithm (N_inliers = 221.75, R_inliers = 0.9158), verifying the superiority of the method of this invention in restoring local texture and structural information of underwater images and providing a high-quality image foundation for downstream tasks such as underwater image registration and target tracking.
[0042] Example 6. Saliency detection comparison experiment on the USOD dataset. To verify the adaptability of the method of this invention (Ours) to downstream visual tasks, this embodiment conducts comparative experiments on the USOD underwater saliency detection dataset to evaluate how much TPE-Net-enhanced images improve salient target detection. The unprocessed input (Input), the CWR algorithm, and the NU2Net algorithm were selected as comparison objects.
[0043] Evaluation metrics: two metrics specific to saliency detection are used. The higher the values, the better the accuracy and completeness of salient target detection, and the stronger the adaptability of the image to downstream tasks.
[0044] Table 7. Saliency detection results of the method of this invention and the comparison methods on the USOD dataset.
[0045] As shown in Table 7, the method of this invention (Ours) achieved the best results on both metrics in the saliency detection test on the USOD dataset, reaching 0.781 and 0.796 respectively. Compared to the CWR algorithm, the two metrics improved by 0.029 and 0.032 respectively; compared to the NU2Net algorithm, they improved by 0.005 and 0.003 respectively. These consistent improvements indicate that the method of this invention provides a more favorable image representation for saliency prediction, particularly in foreground integrity and structure preservation. It adapts well to downstream tasks and can support practical applications such as underwater target recognition.
[0046] Of course, the above description is not intended to limit the present invention, and the present invention is not limited to the examples given above. Any changes, modifications, additions or substitutions made by those skilled in the art within the scope of the present invention should also fall within the protection scope of the present invention.
Claims
1. A weakly supervised image enhancement method based on land codebook prior and contrast constraints, characterized in that it comprises:
S1. Construct a weakly supervised underwater image enhancement neural network based on land codebook priors, wherein the neural network includes a pre-training module, a learnable module, and a supervision module; the pre-training module includes a pre-trained VQ encoder, a pre-trained VQ decoder, and a discrete codebook, the discrete codebook being derived from a pre-trained VQ-GAN model; the learnable module includes a learnable encoder, a learnable decoder, a codebook predictor, and a confidence predictor, and is used for feature extraction, quantization prediction, and iterative optimization; the supervision module includes a contrastive loss calculation unit, which constructs contrastive constraints and provides supervision signals for network training;
S2. Train the network using a two-stage progressive training method: in training stage I, the parameters of the discrete codebook and the pre-trained VQ encoder are frozen, the pre-trained VQ encoder still participates in forward inference and provides the positive and negative sample supervision signals for the contrastive constraint, and the parameters of the learnable encoder, codebook predictor, and learnable decoder are jointly optimized; in training stage II, the parameters of all other modules are frozen and only the confidence predictor is trained;
S3. Acquire the underwater degraded image to be enhanced and preprocess it to obtain image data that meets the network input requirements;
S4. Input the preprocessed image data into the trained network; the learnable encoder extracts initial latent features, the codebook predictor predicts a codeword sequence, and the discrete codebook is queried to obtain quantized features;
S5. Construct a mask based on the confidence scores output by the confidence predictor, optimize the latent features through iterative updates, and input the optimized latent features into the learnable decoder, whose reconstructed output is the enhanced underwater image.
2. The weakly supervised image enhancement method based on land codebook prior and contrast constraints according to claim 1, characterized in that S1 includes S1.1, determining the land codebook prior; the land codebook prior originates from a pre-trained VQ-GAN model and contains a fixed discrete codebook $C = \{c_k \in \mathbb{R}^{d}\}_{k=1}^{N}$, where $C$ is the discrete codebook, $c_k$ is a single embedding vector in the discrete codebook, $d$ is the embedding dimension, and $N$ is the size of the discrete codebook; the discrete codebook serves as a frozen, query-based prior knowledge base, does not participate in network parameter updates, and is only used for subsequent feature quantization retrieval.
3. The weakly supervised image enhancement method based on land codebook prior and contrast constraints according to claim 1, characterized in that S1 includes S1.2, the network forward process: after the underwater image to be enhanced is input into the neural network, the learnable encoder extracts initial latent features from the image; the codebook predictor then estimates a probability distribution over the codewords of the discrete codebook, the output probability distribution having (h × w) × N entries, where h × w is the spatial dimension of the latent feature map, h is its height, w is its width, and N is the size of the discrete codebook; the discrete codeword sequence is obtained by taking the codeword index corresponding to the maximum value of the output probability distribution; based on the discrete codeword sequence, the corresponding embedding vectors are retrieved from the discrete codebook to form the quantized latent features; the learnable decoder reconstructs the quantized latent features into a preliminary enhanced image, which is used for contrastive-constraint supervision and iterative optimization; a confidence predictor is introduced to output a confidence score at each position of the discrete codeword sequence.
4. The weakly supervised image enhancement method based on land codebook prior and contrast constraints according to claim 1, characterized in that S2 includes the S2.1 codebook predictor training phase I: an underwater degraded image $x$ and a reference image $y$ are input, and both are simultaneously fed into the learnable encoder $E_L$ and the pre-trained VQ encoder $E_{VQ}$; the multi-scale features $F_L$ from the learnable encoder $E_L$ are fused with the features $F_{VQ}$ of the pre-trained VQ encoder $E_{VQ}$ via deformable convolution to give the fused features $F_{fuse}$, the fusion being built from channel concatenation, an ordinary convolution, and a deformable convolution operation; the latent features of the underwater degraded image are $z_d$ and the quantized features of the reference image are $z_q^{r}$; a mask ratio $r$ is determined according to a random number; according to the mask ratio $r$, a binary mask $M$ with the same shape as the latent features $z_d$ is randomly generated; based on the binary mask $M$, the hybrid features $z_{mix}$ are constructed from the latent features $z_d$ of the underwater degraded image and the quantized features $z_q^{r}$ of the reference image, the mask being applied by element-wise multiplication $\odot$; with the reference image codeword sequence $s^{r}$ as the target, the negative log-likelihood is minimized: $\mathcal{L}_{code} = -\sum_{i} \log p_{\theta}(s^{r}_{i} \mid z_{mix})$; in the formula, $\mathcal{L}_{code}$ is the codebook prediction loss, $p_{\theta}(s^{r}_{i} \mid z_{mix})$ is the conditional probability, predicted by the network with parameters $\theta$ given the hybrid features $z_{mix}$, of the $i$-th codeword of the reference image codeword sequence, and $s^{r}_{i}$ is the $i$-th codeword of the reference image codeword sequence $s^{r}$; with the latent features $z_d$ of the underwater degraded image as the anchor, the quantized features $z_q^{d}$ of the degraded image as negative samples (degraded features), and the quantized features $z_q^{r}$ of the reference image as positive samples (clear features), a contrastive loss $\mathcal{L}_{con}$ is constructed, in which $\mathrm{MSE}(\cdot,\cdot)$ is the mean squared error and $\epsilon$ is a small constant for numerical stability; the contrastive loss $\mathcal{L}_{con}$, the codebook prediction loss $\mathcal{L}_{code}$, the reconstruction loss $\mathcal{L}_{rec}$, the GAN adversarial loss $\mathcal{L}_{adv}$, and the perceptual loss $\mathcal{L}_{per}$ are combined to obtain the total loss $\mathcal{L}_{I}$ of training phase I; the reconstruction loss $\mathcal{L}_{rec}$ is computed between the preliminary enhanced image $\hat{y}$ and the reference image $y$; the GAN adversarial loss $\mathcal{L}_{adv}$ is defined through the mathematical expectation $\mathbb{E}$ of the discriminator score $D(\hat{y})$, where $D(\hat{y})$ is the score the discriminator assigns to the authenticity of the preliminary enhanced image $\hat{y}$ when that image is input to the discriminator; the perceptual loss is: $\mathcal{L}_{per} = \lVert \phi(y) - \phi(\hat{y}) \rVert_{1}$; in the formula, $\phi$ is the pre-trained VGG feature extraction network, $\phi(y)$ is the feature map obtained by inputting the high-definition standard reference image $y$ into the pre-trained VGG feature extraction network, $\phi(\hat{y})$ is the feature map obtained by inputting the preliminary enhanced image $\hat{y}$ into the pre-trained VGG feature extraction network, and $\lVert \cdot \rVert_{1}$ is the L1 norm operator.
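The masked feature mixing, the codebook negative log-likelihood, and the contrastive term of claim 4 can be illustrated with a small standard-library sketch. Several details are assumptions made only for this example, because the claim text does not fix them: a mask value of 1 is taken to keep the degraded feature and 0 to take the reference feature, the number of ones is `int(ratio * n)`, and the contrastive term is written as the ratio of the anchor-positive MSE to the anchor-negative MSE plus a small constant. All function names are hypothetical.

```python
import math
import random


def random_binary_mask(n, ratio, rng):
    """Binary mask of length n with int(ratio * n) ones at random positions."""
    k = int(ratio * n)
    chosen = set(rng.sample(range(n), k))
    return [1 if i in chosen else 0 for i in range(n)]


def mix_features(z_deg, z_ref_q, mask):
    """Assumed rule: mask 1 keeps the degraded feature, mask 0 takes the reference one."""
    return [d if m == 1 else r for d, r, m in zip(z_deg, z_ref_q, mask)]


def codebook_nll(probs, target):
    """Negative log-likelihood of the reference codeword at every position."""
    return -sum(math.log(row[t]) for row, t in zip(probs, target))


def mse(a, b):
    """Mean squared error over all components of two lists of feature vectors."""
    sq = [(x - y) ** 2 for va, vb in zip(a, b) for x, y in zip(va, vb)]
    return sum(sq) / len(sq)


def contrastive_ratio(anchor, positive, negative, eps=1e-7):
    """Assumed form: pull the anchor toward the positive, push it from the negative."""
    return mse(anchor, positive) / (mse(anchor, negative) + eps)


rng = random.Random(0)
mask = random_binary_mask(4, 0.5, rng)
z_deg = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
z_ref_q = [[9.0, 9.0], [8.0, 8.0], [7.0, 7.0], [6.0, 6.0]]
z_mix = mix_features(z_deg, z_ref_q, mask)
print(mask, z_mix)
```

Minimizing the ratio shrinks the distance to the clear (positive) features while growing the distance to the degraded (negative) features; other contrastive forms would fit the claim wording equally well.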
5. The weakly supervised image enhancement method based on land codebook prior and contrast constraints according to claim 1, characterized in that S2 includes the S2.2 confidence predictor training phase II: the pre-trained VQ encoder $E_{VQ}$, the learnable encoder $E_L$, the discrete codebook $\mathcal{C}$, the codebook predictor $P$, the learnable decoder $D_L$, and the pre-trained VQ decoder $D_{VQ}$ are frozen, and only the confidence predictor $Q$ is trained; a binary target $b$ is constructed by position-by-position comparison of the predicted sequence $s$ with the reference sequence $s^{r}$; based on the binary target $b$, the binary cross-entropy loss is defined: $\mathcal{L}_{bce} = -\frac{1}{n} \sum_{i=1}^{n} \left[ b_i \log c_i + (1 - b_i) \log (1 - c_i) \right]$; in the formula, $\mathcal{L}_{bce}$ is the binary cross-entropy loss corresponding to binary matching supervision, $n$ is the total number of codewords in the codeword sequence, $i$ is the index into the codeword sequence, $b_i$ is the $i$-th element of the binary target $b$, and $c_i$ is the confidence score output by the confidence predictor $Q$ for the $i$-th codeword $s_i$ of the discrete codeword sequence $s$; the degree of separation between the quantized features $\hat{z}_q$ of the predicted image and the quantized features $z_q^{d}$ of the degraded image is calculated and normalized by min-max normalization: $\tilde{d} = \mathrm{Norm}\left(d(z_q^{d}, \hat{z}_q)\right)$; in the formula, $\tilde{d}$ is the normalized degree of feature separation, $\mathrm{Norm}(\cdot)$ is the min-max normalization operation, $d(\cdot,\cdot)$ is the measure of the degree of feature separation, $z_q^{d}$ is the quantized features of the degraded image, and $\hat{z}_q$ is the quantized features of the predicted image, obtained by using the discrete codeword sequence $s$ output by the codebook predictor $P$ to retrieve the embedding vectors from the discrete codebook $\mathcal{C}$; based on the normalized result $\tilde{d}$, the cross-entropy loss $\mathcal{L}_{soft}$ corresponding to difference-guided soft supervision is constructed, in which $\tilde{d}_i$ is the $i$-th element of the normalized result; the binary matching supervision loss $\mathcal{L}_{bce}$ and the cross-entropy loss $\mathcal{L}_{soft}$ corresponding to difference-guided soft supervision are combined to obtain the total loss $\mathcal{L}_{II}$ of training phase II, using a weighting coefficient $\lambda$ with a value of 0.1; the training of the confidence predictor $Q$ is completed by minimizing the total loss $\mathcal{L}_{II}$.
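The phase II supervision signals of claim 5 can be sketched as follows. The sketch rests on assumptions that the claim does not spell out: the separation measure is taken to be the per-position squared Euclidean distance, the soft loss reuses the cross-entropy form with the normalized separation as a soft target, and the weight 0.1 is applied to the soft term. Function names and toy values are hypothetical.

```python
import math


def binary_targets(pred_seq, ref_seq):
    """1.0 where the predicted codeword matches the reference codeword, else 0.0."""
    return [1.0 if p == r else 0.0 for p, r in zip(pred_seq, ref_seq)]


def cross_entropy(targets, conf, eps=1e-12):
    """Mean binary cross-entropy between (hard or soft) targets and confidences."""
    total = 0.0
    for t, c in zip(targets, conf):
        c = min(max(c, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(c) + (1.0 - t) * math.log(1.0 - c)
    return total / len(targets)


def separation(z_deg_q, z_pred_q):
    """Assumed separation measure: squared Euclidean distance at each position."""
    return [sum((a - b) ** 2 for a, b in zip(u, v)) for u, v in zip(z_deg_q, z_pred_q)]


def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Toy example (assumed values) with three codeword positions.
conf = [0.9, 0.2, 0.6]
b = binary_targets([3, 1, 2], [3, 0, 2])
d_norm = min_max_normalize(separation([[0.0], [0.0], [0.0]], [[1.0], [2.0], [3.0]]))
loss_bce = cross_entropy(b, conf)
loss_soft = cross_entropy(d_norm, conf)
loss_total = loss_bce + 0.1 * loss_soft  # 0.1 is the weighting coefficient from the claim
print(b, d_norm, loss_total)
```

Only the confidence predictor would receive gradients from `loss_total`; every other module is frozen in this phase according to the claim.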
6. The weakly supervised image enhancement method based on land codebook prior and contrast constraints according to claim 1, characterized in that S4 specifically includes: the underwater degraded image $x$ preprocessed in S3 is input into the trained network, and the maximum number of iterations $T$ is set; the learnable encoder $E_L$ extracts the latent features $z_d$ of the degraded image $x$; the latent features $z_d$ are quantized to obtain the quantized features of the degraded image: $z_q^{d} = \mathrm{Quant}(z_d)$; in the formula, $\mathrm{Quant}(\cdot)$ is the quantization operation; the current features $z^{(0)}$ are initialized; in the $t$-th iteration, the codebook predictor $P$ processes the latent features $z^{(t)}$ to generate a probability distribution: $p^{(t)} = P(z^{(t)})$; the discrete codeword sequence $s^{(t)}$ is obtained through sampling; based on the discrete codeword sequence $s^{(t)}$, the discrete codebook $\mathcal{C}$ is retrieved to obtain the quantized features of the predicted image: $\hat{z}_q^{(t)} = \mathcal{C}[s^{(t)}]$; in the formula, $\mathcal{C}[s^{(t)}]$ denotes the embedding vectors obtained by retrieving the discrete codebook $\mathcal{C}$ for each codeword in the discrete codeword sequence $s^{(t)}$.
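The quantization and sampling steps of claim 6 can be illustrated with the standard library alone. Two assumptions are made for the example: quantization is taken to mean the usual vector-quantization rule of replacing each latent vector by its nearest codebook entry, and sampling draws one codeword per position from the predicted distribution. The names `quantize` and `sample_codewords` are hypothetical.

```python
import random


def nearest_codeword(vec, codebook):
    """Index of the codebook entry closest to vec in squared Euclidean distance."""
    def sq_dist(entry):
        return sum((v - e) ** 2 for v, e in zip(vec, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def quantize(latents, codebook):
    """Assumed quantization: replace every latent vector by its nearest codeword."""
    return [codebook[nearest_codeword(v, codebook)] for v in latents]


def sample_codewords(probs, rng):
    """Draw one codeword index per position from the predicted distribution."""
    n = len(probs[0])
    return [rng.choices(range(n), weights=row, k=1)[0] for row in probs]


codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
z_d = [[0.1, -0.1], [3.5, 3.9], [0.9, 1.2]]
z_q_d = quantize(z_d, codebook)  # quantized features of the degraded image

rng = random.Random(0)
probs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.2, 0.3, 0.5]]
s_t = sample_codewords(probs, rng)           # codeword sequence of iteration t
z_q_pred = [codebook[i] for i in s_t]        # predicted-image quantized features
print(z_q_d, s_t)
```

Sampling, unlike the argmax used in the training-time forward pass, lets later iterations revise an early choice when the distribution is not sharply peaked.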
7. The weakly supervised image enhancement method based on land codebook prior and contrast constraints according to claim 1, characterized in that S5 specifically includes: the confidence predictor $Q$ performs a confidence assessment on the discrete codeword sequence $s^{(t)}$ and outputs the initial confidence: $c_{init}^{(t)} = Q(s^{(t)})$; a distance-aware confidence $c_{dist}^{(t)}$ is calculated using the cosine similarity $\cos(\cdot,\cdot)$ and the sigmoid function $\sigma(\cdot)$; the initial confidence and the distance-aware confidence are fused to obtain the confidence $c^{(t)}$ of the $t$-th iteration, using a weighting coefficient $\alpha$ with a value of 0.4; according to the confidence $c^{(t)}$ of the current $t$-th iteration, the binary mask $M^{(t)}$ corresponding to the $t$-th iteration is constructed, which marks the low-confidence regions in the latent features, the number of tokens to be updated being determined by cosine annealing scheduling; if the current iteration number $t$ is less than the maximum number of iterations $T$, the input features $z^{(t+1)}$ for the next iteration are updated from the latent features $z_d$ of the underwater degraded image and the quantized features $\hat{z}_q^{(t)}$ of the predicted image by element-wise multiplication $\odot$ with the binary mask $M^{(t)}$ of the $t$-th iteration; the iteration count is then updated to $t + 1$, the next iteration loop is entered, and the process returns to the iteration of S4; if the current iteration number $t$ reaches the maximum number of iterations $T$, the iteration stops and the optimization of the latent features is complete; after the final iteration, the enhanced underwater image is decoded and output: $y_{out} = D_L(\hat{z}_q^{(T)})$; in the formula, $y_{out}$ is the final output enhanced underwater image, obtained by inputting the quantized features $\hat{z}_q^{(T)}$ of the predicted image into the learnable decoder $D_L$.
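The confidence fusion, cosine-annealed token count, and masked feature update of claim 7 can be sketched as below. The exact formulas are not recoverable from the claim text, so the sketch makes labeled assumptions: the fused confidence is the convex combination `(1 - alpha) * c_init + alpha * c_dist` with `alpha = 0.4`, the number of tokens to re-predict follows `floor(n * cos(pi/2 * t / T))`, and mask value 1 marks a low-confidence position that falls back to the degraded latent feature. The distance-aware confidence is treated as a given input. All function names are hypothetical.

```python
import math


def fuse_confidence(c_init, c_dist, alpha=0.4):
    """Assumed fusion: convex combination of initial and distance-aware confidence."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(c_init, c_dist)]


def tokens_to_update(n, t, max_iters):
    """Assumed cosine annealing: many tokens early, none at the last iteration."""
    return math.floor(n * math.cos(math.pi / 2.0 * t / max_iters))


def low_confidence_mask(conf, k):
    """Binary mask with 1 at the k positions of lowest confidence."""
    lowest = set(sorted(range(len(conf)), key=conf.__getitem__)[:k])
    return [1 if i in lowest else 0 for i in range(len(conf))]


def update_features(z_deg, z_pred_q, mask):
    """Assumed update: masked positions revert to degraded latents, others keep predictions."""
    return [d if m == 1 else q for d, q, m in zip(z_deg, z_pred_q, mask)]


# Toy example (assumed values): 4 token positions, iteration t = 2 of T = 4.
c = fuse_confidence([0.9, 0.1, 0.5, 0.3], [0.9, 0.1, 0.5, 0.3])
k = tokens_to_update(4, 2, 4)
mask = low_confidence_mask(c, k)
z_next = update_features([[0.0], [0.0], [0.0], [0.0]], [[1.0], [2.0], [3.0], [4.0]], mask)
print(k, mask, z_next)
```

With this schedule the loop refines most positions in early iterations and leaves all positions fixed by iteration `T`, at which point the quantized features are passed to the decoder.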