Image enhancement method and system for underground construction environment of hydropower project, electronic device and storage medium
The TaylorFormer-HydFogNet image enhancement network addresses image degradation in underground construction environments of hydropower projects, improving image visibility and structural consistency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA THREE GORGES UNIV
- Filing Date
- 2026-04-22
- Publication Date
- 2026-06-19
AI Technical Summary
Monitoring images from underground construction environments in hydropower projects are prone to reduced contrast, increased fogging, blurred edges, local overexposure or underexposure, and loss of texture details, affecting the accuracy of personnel positioning, equipment detection, and identification of hazardous areas.
The TaylorFormer-HydFogNet image enhancement network is adopted. It enhances images through a multi-scale patch embedding module, a TaylorFormer backbone module, and a multi-scale attention refinement module, and is trained with the L1Loss loss function and the AdamW optimizer.
It improves the overall visibility of monitoring images, alleviates low contrast, blurred edges, and lack of detail, and models both local structures and global dependencies, making it suitable for edge deployment at engineering sites.
Figure CN122243838A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of image enhancement technology, specifically relating to an image enhancement method, system, electronic device, and storage medium for underground construction environments in hydropower projects.
Background Technology
[0002] The underground construction environment of hydropower projects is complex. Operating spaces such as tunnels and underground powerhouses typically exhibit high dust levels, high humidity, limited air circulation, and uneven distribution of artificial lighting. Under the combined effects of drilling and blasting excavation, dust suppression spraying, and mechanical operations, monitoring images at construction sites are prone to degradation such as decreased contrast, increased fogging, blurred edges, localized overexposure or underexposure, and loss of texture details. This reduces the recognizability and usability of the monitoring images and hinders the accurate execution of visual tasks such as personnel positioning, equipment inspection, hazardous area identification, and safety management in underground construction environments. Therefore, there is an urgent need for a monitoring image enhancement method that is applicable to the complex degradation scenarios of underground hydropower construction and that balances structural preservation, detail restoration, and computational efficiency.
Summary of the Invention
[0003] The technical problem to be solved by the present invention is to provide an image enhancement method, system, electronic device and storage medium for underground construction environment of hydropower projects, so as to solve the problems of unstable enhancement effect, insufficient global structural consistency, insufficient edge texture restoration and limited deployment efficiency of existing methods in complex underground construction scenarios.
[0004] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is: an image enhancement method for the underground construction environment of hydropower projects, comprising the following steps:
S1. Data Construction: Collect monitoring images from the construction environment of the hydropower station tunnel to form a dataset. The monitoring images include underground construction operation scenes.
S2. Data Preprocessing: Normalize, uniformly crop, and augment the dataset. The data augmentation operations include random horizontal flipping, random cropping, and color perturbation. The preprocessed dataset is used for model training and testing.
S3. Model Construction: The TaylorFormer-HydFogNet image enhancement network is constructed. The network consists of a multi-scale patch embedding module, a TaylorFormer backbone module, and a multi-scale attention refinement module.
S4. Model Training: Dynamic training is completed using L1Loss as the main loss function.
S5. Model Inference and Enhanced Output: Input the tunnel construction image to be enhanced into the model trained in step S4, and output the processed enhanced image.
[0005] In a preferred embodiment, step S1 specifically includes the following steps:
S101. Collect construction videos in the construction environment of the hydropower station tunnel. The construction videos include clear working condition videos and videos of working conditions with high dust, high humidity, and non-uniform lighting. Clear construction images are extracted from the clear working condition videos to construct a paired training dataset; real degraded construction images are extracted from the videos of working conditions with high dust, high humidity, and non-uniform lighting for model testing.
S102. Segment the collected construction videos into single-frame images.
[0006] In a preferred embodiment, step S2 includes the following steps:
S201. Use a monocular depth estimation algorithm to generate corresponding depth maps for clear construction images, and combine them with a physical atmospheric scattering model to synthesize fog images with different degradation intensities. Combine clear images with synthesized fog images to form a paired dataset.
S202. Normalize the paired dataset, resize it to a uniform size, and augment it.
[0007] In a preferred embodiment, the processing procedure of the TaylorFormer-HydFogNet image enhancement network in step S3 is as follows:
S301. Input the tunnel construction image to be enhanced into the multi-scale patch embedding module, and use overlapping deformable convolution to extract multi-scale features that characterize local structural information from the input image.
S302. Input the multi-scale features into the TaylorFormer backbone module, perform hierarchical encoding on the multi-scale features through multiple TaylorFormer blocks, generate query Q, key K, and value V through linear projection, and use a linear attention mechanism based on Taylor series expansion to model global dependencies and obtain deep semantic features.
S303. Input the deep semantic features into the multi-scale attention refinement module, align and fuse the features at different scales, and refine the features by combining relative position encoding, a self-attention mechanism, residual connections, and layer normalization to obtain refined features.
S304. Input the refined features to the decoder for layer-by-layer upsampling and reconstruction, fuse the corresponding encoder-layer features through skip connections, and output an enhanced image.
[0008] In the preferred embodiment, in step S3, the multi-scale patch embedding module adopts Depth-Separable Deformable Convolution (DSDCN), which splits the convolution operation of standard DCN into depth-deformable convolution and pointwise convolution. Depth-deformable convolution is responsible for spatial sampling and local response modeling, while pointwise convolution is responsible for cross-channel information fusion.
[0009] In a preferred embodiment, step S4 specifically includes the following steps: S401. Introduce L1Loss as the primary loss function, which calculates the average of the absolute errors between the predicted value ŷ and the true value y; S402. Train the network using the AdamW optimizer.
[0010] In a preferred embodiment, in step S5, the enhanced image is evaluated using mean squared error (MSE) and / or peak signal-to-noise ratio (PSNR) and / or structural similarity (SSIM) and / or multi-scale structural similarity (MS-SSIM) and / or learned perceptual image patch similarity (LPIPS).
[0011] The present invention also provides an image enhancement system for underground construction environments in hydropower projects, used to perform the above-described method, comprising: a data acquisition unit configured to acquire monitoring image data of underground construction scenes in hydropower projects; a preprocessing unit configured to normalize, crop, and augment the acquired image data to construct a paired dataset containing clear images and synthetic degraded images; and a model unit configured to store and run the TaylorFormer-HydFogNet image enhancement model, receive underground construction images of hydropower projects to be enhanced, and output clear enhanced construction images.
[0012] The present invention also provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and when the instructions are executed by the at least one processor, the electronic device performs the image enhancement method for the underground construction environment of hydropower projects described above.
[0013] The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the above-mentioned image enhancement method for the underground construction environment of hydropower projects.
[0014] The present invention provides an image enhancement method, system, electronic device, and storage medium for underground construction environments in hydropower projects, which has the following beneficial effects: 1. In response to the problems of dust-humidity coupling degradation and non-uniform lighting in underground construction scenarios of hydropower projects, this invention can effectively improve the overall visibility of monitoring images and improve the phenomena of low contrast, blurred edges and lack of detail.
[0015] 2. By combining multi-scale deformable convolution with Taylor linear attention, this invention balances local structure modeling and global dependency modeling capabilities, improving image quality while reducing computational complexity, making it suitable for edge deployment in engineering projects.
[0016] 3. This invention adopts a paired sample construction method based on monocular depth estimation and an atmospheric scattering model, combined with depth map quality control, which can form a sample set suitable for supervised training when strictly paired real clear images are lacking, thereby improving the model's adaptability to real underground construction scenarios.
Attached Figure Description
[0017] The accompanying drawings are provided to further illustrate the invention and constitute a part of this application; they do not constitute an undue limitation of the invention. In the drawings:
Figure 1: Image enhancement framework for the underground construction environment of hydropower projects;
Figure 2: Depth maps generated from clear images;
Figure 3: Structure of the image enhancement model for the underground construction environment of hydropower projects;
Figure 4: Trends of L1 loss and PSNR during training;
Figure 5: Comparison of results on synthesized images using different methods;
Figure 6: Comparison of experimental results for ordinary tunnel construction scenarios using different methods;
Figure 7: Comparison of experimental results for large-scale underground construction scenes using different methods.
Detailed Implementation
[0018] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the invention.
[0019] Embodiment 1: As shown in Figure 1, an image enhancement method for the underground construction environment of a hydropower project includes the following steps: S1. Data Construction: Collect monitoring images from the construction environment of the hydropower station tunnel to form a dataset. The monitoring images include underground construction operation scenes.
[0020] Specifically, the following steps are included: S101. Collect construction videos in the construction environment of the hydropower station tunnel. The construction videos include clear working condition videos and videos of working conditions with high dust, high humidity, and non-uniform lighting. Among them, clear construction images are extracted from the clear working condition videos to construct a paired training dataset; real degraded construction images are extracted from the videos of working conditions with high dust, high humidity, and non-uniform lighting for model testing.
[0021] S102. Segment the collected construction video into single-frame images.
[0022] S2. Data preprocessing: Normalize, uniformly crop, and augment the dataset. The data augmentation operations include random horizontal flipping, random cropping, and color perturbation. The preprocessed dataset is used for model training and testing.
[0023] Specifically, the following steps are included: S201. Use a monocular depth estimation algorithm to generate corresponding depth maps for clear construction images, and combine them with a physical atmospheric scattering model to synthesize fog images with different degradation intensities. Combine the clear images with the synthesized fog images to form a paired dataset.
[0024] With reference to Figure 2, depth maps are generated from the clear construction images using a monocular depth estimation algorithm and are then combined with a physical atmospheric scattering model to synthesize fog images with different degradation intensities. Each clear image and its synthesized fog images form the paired samples used for model training.
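The synthesis in S201 can be sketched with the commonly used form of the atmospheric scattering model, I(x) = J(x)·t(x) + A·(1 − t(x)) with transmission t(x) = exp(−β·d(x)). The description names the model but does not give its parameters, so the function name `synthesize_fog`, the scattering coefficient `beta`, the constant airlight `airlight`, and the nested-list image layout below are all assumptions made for illustration.

```python
import math

def synthesize_fog(clear, depth, beta=1.0, airlight=0.9):
    """Apply I = J*t + A*(1 - t) with t = exp(-beta * depth) per pixel.

    clear:    H x W x 3 nested lists with values in [0, 1] (clear image J)
    depth:    H x W nested lists of non-negative depth values d
    beta:     scattering coefficient (assumed; larger means denser fog)
    airlight: global atmospheric light A (assumed constant over the image)
    """
    foggy = []
    for row_px, row_d in zip(clear, depth):
        out_row = []
        for pixel, d in zip(row_px, row_d):
            t = math.exp(-beta * d)  # transmission falls off with depth
            out_row.append([c * t + airlight * (1.0 - t) for c in pixel])
        foggy.append(out_row)
    return foggy

# One clear image paired with several degradation intensities (beta values are illustrative).
clear = [[[0.2, 0.4, 0.6], [0.8, 0.1, 0.3]]]
depth = [[0.0, 2.0]]
pairs = [(clear, synthesize_fog(clear, depth, beta=b)) for b in (0.5, 1.0, 2.0)]
```

Pixels at zero depth are left unchanged, while distant pixels are pulled toward the airlight value, which is why the quality of the estimated depth map directly affects how realistic the synthetic fog looks.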
[0025] S202. Normalize, uniformly crop, and augment the paired dataset. The data augmentation operations include random horizontal flipping, random cropping, and color perturbation.
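For paired training data, geometric augmentations have to be applied identically to the clear image and its fog counterpart so that the two stay pixel-aligned. A minimal sketch of that idea follows; the function name `paired_augment`, the 0.5 flip probability, and the nested-list image layout are assumptions, and color perturbation is left out because the description does not say how it is applied to the pair.

```python
import random

def paired_augment(clear, foggy, crop_h, crop_w, rng=random):
    """Apply the same random crop and horizontal flip to both images of a pair."""
    h, w = len(clear), len(clear[0])
    top = rng.randint(0, h - crop_h)    # inclusive bounds
    left = rng.randint(0, w - crop_w)
    flip = rng.random() < 0.5           # assumed flip probability

    def transform(img):
        patch = [row[left:left + crop_w] for row in img[top:top + crop_h]]
        return [row[::-1] for row in patch] if flip else patch

    return transform(clear), transform(foggy)
```

Drawing the crop window and flip decision once and reusing them for both images is the design choice that keeps the supervision signal valid.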
[0026] S3. Model Construction: The TaylorFormer-HydFogNet image enhancement network is constructed. The network consists of a multi-scale patch embedding module, a TaylorFormer backbone module, and a multi-scale attention refinement module.
[0027] The processing procedure of the TaylorFormer-HydFogNet image enhancement network is as follows: S301. Input the tunnel construction image to be enhanced into the multi-scale patch embedding module, and use overlapping deformable convolution to extract multi-scale features that represent local structural information from the input image.
[0028] The multi-scale patch embedding module adopts Depth-Separable Deformable Convolution (DSDCN), which splits the convolution operation of standard DCN into depth-deformable convolution and pointwise convolution. Depth-deformable convolution is responsible for spatial sampling and local response modeling, while pointwise convolution is responsible for cross-channel information fusion.
[0029] As shown in Figure 3, to overcome the limitations of traditional fixed-window partitioning in representing heterogeneous image regions, the invention provides a multi-scale patch embedding module based on overlapping deformable convolution for construction images of hydropower station tunnels. To reduce the computational cost of deformable convolution, Depth-Separable Deformable Convolution (DSDCN) is employed: the convolution operation of standard DCN is decomposed into a depth-deformable convolution and a pointwise convolution, where the former handles spatial sampling and local response modeling, and the latter handles cross-channel information fusion. The computational costs of standard DCN and DSDCN for h×w images are as follows: ; ; In the formula: M and N are the numbers of input and output channels, respectively; K is the kernel size of the convolution.
[0030] The number of parameters for DCN and DSDCN are as follows: ; ; Compared to DCN, DSDCN significantly reduces computational complexity and the number of parameters. Furthermore, through spatial adaptive adjustment of the kernel shape, it achieves highly sensitive extraction of complex edges, blurred textures, and mechanical structures in underground tunnel construction scenarios.
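As a rough illustration of why the depthwise-plus-pointwise split is cheaper, the sketch below counts multiply-accumulates and weights for a plain K×K convolution and for its depthwise-separable counterpart. It deliberately ignores the offset-prediction branch and the bilinear sampling that deformable convolution adds, so these counts are a simplified assumption and not the DCN and DSDCN expressions of this description. The function names are hypothetical; M, N, K, h, and w follow the symbols defined in the text.

```python
def standard_conv_cost(M, N, K, h, w):
    """Multiply-accumulates and weights of a K x K convolution, M -> N channels."""
    return {"macs": M * N * K * K * h * w, "params": M * N * K * K}

def depthwise_separable_cost(M, N, K, h, w):
    """Depthwise K x K convolution followed by a 1 x 1 pointwise convolution."""
    depthwise_macs = M * K * K * h * w   # one K x K filter per input channel
    pointwise_macs = M * N * h * w       # 1 x 1 mixing across channels
    return {"macs": depthwise_macs + pointwise_macs, "params": M * K * K + M * N}

# Illustrative sizes only.
std = standard_conv_cost(M=64, N=64, K=3, h=256, w=256)
sep = depthwise_separable_cost(M=64, N=64, K=3, h=256, w=256)
ratio = sep["macs"] / std["macs"]  # algebraically 1/N + 1/K**2
```

Under these simplifications the separable form needs a fraction 1/N + 1/K² of the multiply-accumulates of the standard form, which is the basic reason the split reduces both computation and parameter count.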
[0031] The output is a patch embedding representation with rich spatial structure awareness, providing structural prior support for subsequent attention mechanisms. The expression is: $y(p_0)=\sum_{k=1}^{S} w_k \cdot x(p_0+p_k+\Delta p_k)$; In the formula: $y(p_0)$ is the value at position $p_0$ in the output features; $w_k$ is the weight of the k-th sampling point of the convolution kernel; $p_k$ is the standard sampling location of the convolution kernel; $\Delta p_k$ is the learnable offset; $S$ is the total number of sampling points in the convolution kernel; $p_0$ is the output spatial location index in the feature map.
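The sampling rule of deformable convolution, a weighted sum of input values read at regular grid positions shifted by offsets, can be sketched for a single channel and a 3×3 kernel as below. In the network the offsets are predicted by a learned layer; here they are simply passed in, and bilinear interpolation with zero padding is assumed for fractional positions. The function names are hypothetical.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at fractional (y, x); zero outside."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value

def deformable_response(feature, p0, weights, offsets):
    """Weighted sum of samples at p0 + p_k + dp_k for a 3 x 3 kernel on one channel."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # regular positions p_k
    total = 0.0
    for w_k, (gy, gx), (oy, ox) in zip(weights, grid, offsets):
        total += w_k * bilinear(feature, p0[0] + gy + oy, p0[1] + gx + ox)
    return total
```

With all offsets set to zero the function reduces to an ordinary 3×3 convolution response, which makes the role of the offsets as a learned deformation of the sampling grid easy to see.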
[0032] The multi-scale patch embedding module employs overlapping deformable convolution, which adaptively adjusts the convolution sampling position to achieve sensitive modeling of complex edges, blurred textures, and mechanical structures in the tunnel environment.
[0033] S302. Input the multi-scale features into the TaylorFormer backbone module, perform hierarchical encoding processing on the multi-scale features through multiple TaylorFormerBlocks, generate query Q, key K, and value V through linear projection, and use a linear attention mechanism based on Taylor series expansion to model global dependency relationships to obtain deep semantic features.
[0034] The TaylorFormer backbone module performs hierarchical processing on multi-scale features, generates query Q, key K, and value V through linear projection, and uses a Taylor series to perform a second-order approximation of the Softmax function to complete global dependency modeling.
[0035] First, a linear projection is applied to the input features $X$ to obtain the query (Q), key (K), and value (V), expressed as: $Q = XW_Q,\; K = XW_K,\; V = XW_V$; The Softmax function is approximated by a second-order Taylor series, and the expression is as follows: $\mathrm{Attention}(Q,K,V) \approx \mathrm{Norm}\left(\mathbf{1} + \frac{QK^{\top}}{\sqrt{d}} + \frac{1}{2}\cdot\frac{(QK^{\top})^{2}}{d}\right)V$; In the formula: $Q$ represents the query matrix; $K$ represents the key matrix; $V$ represents the value matrix; $QK^{\top}$ represents the similarity (relevance) matrix between the query and the key; $\sqrt{d}$ is the square root of the feature channel dimension; $(QK^{\top})^{2}$ represents the element-wise square of the matrix, used for the second-order Taylor approximation of Softmax; $\mathrm{Norm}(\cdot)$ normalizes each row to sum to one; $\mathrm{Attention}(Q,K,V)$ represents the output after attention weighting; $d$ represents the feature dimension.
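The following sketch illustrates how a second-order Taylor expansion of the exponential, exp(a) ≈ 1 + a + a²/2, turns softmax attention into a form whose cost grows linearly with the number of tokens. The first function evaluates the approximate weights explicitly (quadratic cost); the second obtains the same result from sums over keys and values that are computed once. The row normalisation and both function names are assumptions for illustration, not necessarily the exact formulation used in TaylorFormer-HydFogNet.

```python
import math

def taylor_weights_attention(Q, K, V):
    """Quadratic-cost reference: explicit weights 1 + a + a^2/2, row-normalised."""
    d = len(Q[0])
    out = []
    for q in Q:
        a = [sum(x * y for x, y in zip(q, k)) / math.sqrt(d) for k in K]
        w = [1.0 + s + 0.5 * s * s for s in a]  # equals ((s + 1)^2 + 1) / 2, always > 0
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z for c in range(len(V[0]))])
    return out

def taylor_linear_attention(Q, K, V):
    """Same output, with key/value statistics summed once over all tokens."""
    n, d, dv = len(K), len(Q[0]), len(V[0])
    s = 1.0 / math.sqrt(d)
    sum_v = [sum(v[c] for v in V) for c in range(dv)]
    sum_k = [sum(k[a] for k in K) for a in range(d)]
    kv = [[sum(k[a] * v[c] for k, v in zip(K, V)) for c in range(dv)] for a in range(d)]
    kk = [[sum(k[a] * k[b] for k in K) for b in range(d)] for a in range(d)]
    kkv = [[[sum(k[a] * k[b] * v[c] for k, v in zip(K, V)) for c in range(dv)]
            for b in range(d)] for a in range(d)]
    out = []
    for q in Q:
        qq_kk = sum(q[a] * q[b] * kk[a][b] for a in range(d) for b in range(d))
        den = n + s * sum(q[a] * sum_k[a] for a in range(d)) + 0.5 * s * s * qq_kk
        row = []
        for c in range(dv):
            first = sum(q[a] * kv[a][c] for a in range(d))
            second = sum(q[a] * q[b] * kkv[a][b][c] for a in range(d) for b in range(d))
            row.append((sum_v[c] + s * first + 0.5 * s * s * second) / den)
        out.append(row)
    return out
```

The per-query work in the second function depends only on the feature dimensions, not on the number of tokens, at the price of storing second-order key statistics of size d × d × d_v.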
[0036] While maintaining overall reconstruction quality, TaylorFormer can efficiently handle dehazing tasks for high-resolution tunnel construction images and has good engineering deployment potential.
[0037] S303. Input the deep semantic features into the multi-scale attention refinement module, align and fuse the features at different scales, and refine the features by combining relative position encoding, self-attention mechanism, residual connection and layer normalization to obtain refined features.
[0038] The multi-scale attention refinement module receives multi-scale features output from the TaylorFormer backbone module. Combining the self-attention mechanism of relative position encoding, residual connections, and layer normalization, it aligns and refines features at different scales, enhances local geometric structure and edge continuity, and corrects feature deviations caused by fogging and noise.
[0039] Multi-scale alignment integrates semantic and edge information at different scales to enhance the model's ability to perceive fog thickness and depth gradients. The relative position attention mechanism addresses the boundary weakening caused by the coupling of strong non-uniform lighting, scattering, and reflection in tunnel monitoring: it explicitly injects the relative displacement between tokens into the attention weights, thereby strengthening the modeling of local geometric consistency and boundary continuity and mitigating the spatial ambiguity and boundary drift introduced by tokenized representation. Residual connections and layer normalization (Residual and LayerNorm) improve the stability of gradient flow and feature continuity and speed up training convergence.
[0040] S304. The refined features are input to the decoding end for layer-by-layer upsampling and reconstruction, and the corresponding layer features of the encoding end are fused through skip connections to output an enhanced image.
[0041] S4. Model Training: Dynamic training is performed using L1Loss as the main loss function.
[0042] Specifically, the following steps are included: S401. Introduce L1Loss as the primary loss function, which calculates the average of the absolute errors between the predicted value ŷ and the true value y.
[0043] To balance improving image sharpness and maintaining structural consistency in tunnel construction image reconstruction, L1Loss is introduced as the primary loss function, which calculates the average of the absolute errors between the predicted value ŷ and the true value y: $L_1 = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$; where $L_1$ represents the L1 loss value; $N$ represents the total number of pixels in the image; $y_i$ represents the true value of the i-th pixel; $\hat{y}_i$ represents the predicted value of the i-th pixel.
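A minimal sketch of this loss on flattened pixel lists is given below; the function name `l1_loss` and the flat-list layout are assumptions for illustration. In PyTorch, `torch.nn.L1Loss` with its default mean reduction computes the same quantity.

```python
def l1_loss(pred, target):
    """Mean absolute error over all pixel values of two equally sized flat lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of elements")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

# Three pixels with absolute errors 0, 0.5 and 1.0 give a mean of 0.5.
example = l1_loss([0.0, 0.5, 1.0], [0.0, 1.0, 0.0])
```

Because every error contributes linearly, a few badly reconstructed pixels are penalised less heavily than under a squared-error loss.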
[0044] S402. The network is trained using the AdamW optimizer. The initial learning rate is set to 2e-4, the weight decay coefficient is 1e-4, and the momentum parameters are β1=0.9 and β2=0.999. The learning rate is scheduled using cosine annealing and is dynamically adjusted during training.
[0045] S5. Model Inference and Enhanced Output: Input the tunnel construction image to be enhanced into the model trained in step S4, and output the processed enhanced image.
[0046] The enhanced images are evaluated using mean squared error (MSE) and / or peak signal-to-noise ratio (PSNR) and / or structural similarity (SSIM) and / or multi-scale structural similarity (MS-SSIM) and / or learned perceptual image patch similarity (LPIPS).
[0047] To achieve a more balanced quantitative evaluation of image restoration performance in complex degradation scenarios such as engineering tunnels, a set of complementary full-reference and perceptual metrics is calculated between the restored image $\hat{I}$ and the corresponding clear reference image $I$.
[0048] MSE is used to measure pixel-level distortion: $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{I}_i - I_i\right)^{2}$; In the formula: $N$ represents the number of pixels.
[0049] PSNR is derived from MSE and is used to characterize the overall reconstruction fidelity on a logarithmic scale: $\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}$; In the formula: $\mathrm{MAX}$ is the maximum possible pixel value of the image. SSIM comprehensively measures local structural consistency from three aspects: brightness, contrast, and structure: $\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^{2} + \mu_y^{2} + C_1)(\sigma_x^{2} + \sigma_y^{2} + C_2)}$; In the formula: $\mu_x$ and $\mu_y$ represent the average brightness of the two image blocks, respectively; $\sigma_x$ and $\sigma_y$ represent the contrast of the two image blocks, respectively; $\sigma_{xy}$ is the covariance of the two image blocks, which measures their structural similarity; $C_1$ and $C_2$ are constants used to avoid a zero denominator.
[0050] To further characterize cross-scale structural consistency, MS-SSIM extends SSIM to multiple resolution levels: $\mathrm{MS\text{-}SSIM}(x,y) = \left[l_M(x,y)\right]^{\alpha_M}\prod_{j=1}^{M}\left[c_j(x,y)\right]^{\beta_j}\left[s_j(x,y)\right]^{\gamma_j}$; In the formula: $l_j$, $c_j$, and $s_j$ are the brightness, contrast, and structural components at the j-th scale; $M$ is the number of scales; $\alpha_M$, $\beta_j$, and $\gamma_j$ are the weighting exponents.
[0051] LPIPS measures perceptual similarity in deep feature space: $\mathrm{LPIPS}(\hat{I}, I) = \sum_{l}\frac{1}{H_l W_l}\sum_{h,w}\left\| w_l \odot \left(\phi_l(\hat{I})_{hw} - \phi_l(I)_{hw}\right)\right\|_2^{2}$; In the formula: $\phi_l$ is the feature response of the l-th layer, with spatial size $H_l \times W_l$; $w_l$ is the learned channel weight; $\odot$ is element-wise multiplication.
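The pixel-level metrics can be sketched on flattened pixel lists as follows. The SSIM variant here is computed once over the whole signal rather than averaged over local windows, which is a simplification; the constants `k1=0.01` and `k2=0.03` are the conventional defaults, and the function names are assumptions for illustration.

```python
import math

def mse(x, y):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(x, y)
    return float("inf") if err == 0 else 10.0 * math.log10(max_value ** 2 / err)

def global_ssim(x, y, max_value=1.0, k1=0.01, k2=0.03):
    """SSIM computed once over the whole signal (no sliding window)."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * max_value) ** 2, (k2 * max_value) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

When PSNR is averaged per image over a test set, it generally does not equal the PSNR computed from the set-averaged MSE, so the two reported numbers need not match that conversion exactly.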
[0052] Embodiment 2: Embodiment 1 was validated using specific data. In this embodiment, the TaylorFormer-HydFogNet image enhancement network was trained using the AdamW optimizer, with an initial learning rate of 2e-4, a weight decay coefficient of 1e-4, and momentum parameters of β1=0.9 and β2=0.999. The learning rate was scheduled with cosine annealing, with linear warm-up performed in the first 5 epochs. The total number of training epochs was 500, and the batch size was fixed at 8.
[0053] With reference to Figure 4, the L1 loss curve shows a clear two-stage convergence characteristic. In the initial training phase (epochs 0–50), the loss value rapidly decreases from approximately 1.0 to below 0.15, indicating that the model quickly learns the dehazing mapping on the synthetic tunnel images. This rapid convergence is attributed to the model's structure-aware mechanism, which enables it to quickly extract features from images under complex underground conditions in hydropower engineering (such as uneven dust and fog distribution and significant lighting differences). The peak signal-to-noise ratio (PSNR) curve corresponds to the loss curve and shows a clear upward trend. In epochs 0–100, the PSNR rapidly increases from near 0 dB to over 20 dB, indicating that the model possesses strong overall dehazing capabilities, particularly in brightness restoration and edge sharpness enhancement. This stage is crucial for restoring the main structures in the image (such as lining contours, equipment markings, and work areas). In epochs 100–500, the PSNR continues to increase steadily, eventually reaching approximately 29.18 dB.
[0054] To evaluate the restoration performance of the proposed method on synthetic tunnel images, a comprehensive analysis was conducted combining quantitative indicators and visualization results. Table 1 compares the proposed method with other models on synthesized images. As can be seen from Table 1, TaylorFormer-HydFogNet achieves PSNR, SSIM, and MS-SSIM scores of 29.18 dB, 0.8843, and 0.9687, respectively, and MSE and LPIPS of 0.001317 and 0.1220, respectively, all of which are superior to the comparative methods. This indicates that the proposed method has better overall performance in terms of pixel-level reconstruction accuracy, structure preservation capability, and perceptual quality. In particular, compared with MB-TaylorFormer v2, which has the closest overall performance, this method improves PSNR from 28.94 dB to 29.18 dB, SSIM from 0.8690 to 0.8843, MSE from 0.001419 to 0.001317, and LPIPS from 0.1266 to 0.1220, indicating that it further improves the quality of detail recovery while maintaining global structural consistency.
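The learning-rate schedule described here, linear warm-up for 5 epochs followed by cosine annealing over the remaining epochs, can be sketched as a per-epoch function. The final learning rate `min_lr=0.0`, the warm-up starting value, and per-epoch (rather than per-iteration) stepping are assumptions, since the description does not state them.

```python
import math

def learning_rate(epoch, base_lr=2e-4, warmup_epochs=5, total_epochs=500, min_lr=0.0):
    """Linear warm-up followed by cosine annealing, for a 0-based epoch index."""
    if epoch < warmup_epochs:
        # Ramp linearly so that the last warm-up epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

schedule = [learning_rate(e) for e in range(500)]
```

The warm-up avoids large updates while the optimizer statistics are still unreliable, and the cosine decay then lowers the step size smoothly toward the end of training.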
[0055]
[0056] With reference to Figure 5, visual differences exist overall among the synthesized-image results of different methods, but the differences between some strong baselines are quite subtle and are difficult to quantify from the overall image alone. Therefore, Figure 5 is primarily used to qualitatively demonstrate the overall trends of the various methods in terms of visibility restoration, edge continuity, and local residual fog suppression. The main criterion for evaluating the model's performance remains the full-reference metrics in Table 2. Overall, TaylorFormer-HydFogNet improves overall clarity while more stably maintaining structural boundaries and texture continuity, and its qualitative results are consistent with the quantitative evaluation conclusions.
[0057] Figure 6 compares the experimental results of different methods in a typical tunnel construction scenario. While the general-purpose restoration model Restormer can brighten the tunnel, it tends to produce softer structures and uneven regional gain in low-texture rock wall / lining areas. DehazeFormer can improve visibility, but it is more prone to glare amplification, overexposure, and boundary haze near strong light sources. The Taylor approximate attention baselines MB-TaylorFormer and MB-TaylorFormer v2 are more conducive to global consistency, but local brightness instability or detail compression may still occur in rescattering and reflective areas. DedustGAN and IPC-Dehaze may introduce unnatural halos / textures or overall grayness and insufficient detail. In contrast, TaylorFormer-HydFogNet achieves a more stable balance between dark area enhancement and glare suppression, maintains clearer boundaries and contour continuity in areas difficult to annotate, and reduces artifacts such as light and dark discontinuities.
[0058] With reference to Figure 7, in the experiments on large-scale underground construction scene images, long-distance visibility decay and multi-source brightness gradients make Restormer and DehazeFormer more prone to results that are clear in the near field but blurred in the far field, where texture is weak, and to contrast discontinuities. Although MB-TaylorFormer and MB-TaylorFormer v2 are more favorable for near-far consistency, they may still exhibit detail compression when weak far-field signals coexist with strong near-field light. DedustGAN and IPC-Dehaze may also introduce halos or overall graying. Under large-scale scene conditions, TaylorFormer-HydFogNet is more robust to strong light-haze coupling, suppressing fog artifacts around the light source and maintaining the continuity of structural boundaries. At the same time, it significantly improves the recognizability of the contours of distant structures, making the edges of equipment and components deep in the tunnel clearer. Its global brightness transition is smoother, reducing the loss of readability caused by overexposure and dark shadows, thereby improving the overall interpretability of large-scale scenes.
[0059] Embodiment 3: This embodiment provides an image enhancement system for the underground construction environment of a hydropower project, used to execute the method described in Embodiment 1, including: a data acquisition unit configured to acquire monitoring image data of underground construction scenes in hydropower projects; a preprocessing unit configured to normalize, crop, and augment the acquired image data to construct a paired dataset containing clear images and synthetic degraded images; and a model unit configured to store and run the TaylorFormer-HydFogNet image enhancement model, receive underground construction images of hydropower projects to be enhanced, and output clear enhanced construction images.
[0060] Embodiment 4: This embodiment provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and when the instructions are executed by the at least one processor, the electronic device performs the image enhancement method for the underground construction environment of hydropower projects as described in Embodiment 1.
[0061] Embodiment 5: This embodiment provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the image enhancement method for the underground construction environment of hydropower projects described in Embodiment 1.
[0062] Those skilled in the art will readily understand that the above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. An image enhancement method for the underground construction environment of a hydropower project, characterized in that, Includes the following steps: S1. Data Construction: Collect monitoring images from the construction environment of the hydropower station tunnel to form a dataset. The monitoring images include underground construction operation scenes. S2. Data preprocessing: Normalize, uniformly crop, and augment the dataset. The data augmentation operations include random horizontal flipping, random cropping, and color perturbation. The preprocessed dataset is used for model training and testing. S3. Model Construction: The TaylorFormer-HydFogNet image enhancement network is constructed. The network consists of a multi-scale patch embedding module, a TaylorFormer backbone module, and a multi-scale attention refinement module. S4. Model Training: Dynamic training is completed using L1Loss as the main loss function. S5. Model Inference and Enhanced Output: Input the tunnel construction image to be enhanced into the model trained in step S4, and output the processed enhanced image.
2. The image enhancement method for the underground construction environment of a hydropower project according to claim 1, characterized in that step S1 specifically includes the following steps: S101. Collect construction videos in the construction environment of the hydropower station tunnel, the construction videos including videos of clear working conditions and videos of working conditions with high dust, high humidity, and non-uniform lighting; clear construction images are extracted from the clear working condition videos to construct a paired training dataset, and real degraded construction images are extracted from the videos of working conditions with high dust, high humidity, and non-uniform lighting for model testing. S102. Segment the collected construction videos into single-frame images.
3. The image enhancement method for the underground construction environment of a hydropower project according to claim 1, characterized in that step S2 includes the following steps: S201. Use a monocular depth estimation algorithm to generate a corresponding depth map for each clear construction image, and combine the depth maps with a physical atmospheric scattering model to synthesize fog images with different degradation intensities; pair the clear images with the synthesized fog images to form a paired dataset. S202. Normalize the paired dataset, resize it to a uniform size, and augment it.
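The fog synthesis in step S201 can be sketched with the commonly used atmospheric scattering model, I(x) = J(x) t(x) + A (1 - t(x)) with transmission t(x) = exp(-β d(x)), where J is the clear image, d the depth map, A the atmospheric light, and β the scattering coefficient. The claim does not give these parameters; the function name `synthesize_fog`, the scalar atmospheric light, and the default values below are assumptions for illustration.

```python
import numpy as np

def synthesize_fog(clear, depth, beta=1.0, airlight=0.8):
    """Synthesize a foggy image from a clear image and its depth map.

    clear:    float array in [0, 1], shape (H, W, 3)
    depth:    non-negative float array, shape (H, W)
    beta:     scattering coefficient; larger values give denser fog (assumed)
    airlight: global atmospheric light A, taken as a scalar here (assumed)
    """
    # Transmission decays exponentially with scene depth.
    t = np.exp(-beta * depth)[..., None]  # shape (H, W, 1)
    # Attenuated scene radiance plus scattered atmospheric light.
    return clear * t + airlight * (1.0 - t)
```

Sweeping `beta` over several values for the same clear image and depth map would yield the "different degradation intensities" the claim refers to.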
4. The image enhancement method for the underground construction environment of a hydropower project according to claim 1, characterized in that, in step S3, the processing procedure of the TaylorFormer-HydFogNet image enhancement network is as follows: S301. Input the tunnel construction image to be enhanced into the multi-scale patch embedding module, and use overlapping deformable convolution to extract features from the input image, obtaining multi-scale features that characterize local structural information. S302. Input the multi-scale features into the TaylorFormer backbone module, perform hierarchical encoding on the multi-scale features through multiple TaylorFormer Blocks, generate the query Q, key K, and value V through linear projection, and use a linear attention mechanism based on Taylor series expansion to model global dependencies and obtain deep semantic features. S303. Input the deep semantic features into the multi-scale attention refinement module, align and fuse the features at different scales, and refine them by combining relative position encoding, a self-attention mechanism, residual connections, and layer normalization to obtain refined features. S304. Input the refined features into the decoder for layer-by-layer upsampling and reconstruction, fuse the features of the corresponding encoder layers through skip connections, and output an enhanced image.
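The linear attention in step S302 can be illustrated as follows. The claim does not state the order of the Taylor expansion or how Q and K are scaled, so this single-head NumPy sketch assumes a first-order expansion, exp(q·k) ≈ 1 + q·k, with L2-normalized queries and keys so that every weight 1 + q·k is non-negative. The function name and the projection matrices `wq`, `wk`, `wv` are hypothetical.

```python
import numpy as np

def taylor_linear_attention(x, wq, wk, wv, eps=1e-6):
    """First-order Taylor approximation of softmax attention in O(N) time.

    x: (N, c) token features; wq, wk, wv: (c, d) projection matrices.
    """
    # Linear projections to query, key, and value.
    q, k, v = x @ wq, x @ wk, x @ wv
    # L2-normalize so that q.k lies in [-1, 1] and 1 + q.k >= 0 (assumption).
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    n = x.shape[0]
    # sum_j (1 + q_i.k_j) v_j = sum_j v_j + q_i (K^T V): no N x N matrix needed.
    kv = k.T @ v                                    # (d, d)
    numer = v.sum(axis=0, keepdims=True) + q @ kv   # (N, d)
    # sum_j (1 + q_i.k_j) = N + q_i . sum_j k_j
    denom = n + q @ k.sum(axis=0)                   # (N,)
    return numer / (denom[:, None] + eps)
```

Because K^T V and the key sum are computed once and reused for every query, the cost grows linearly with the number of tokens N rather than quadratically, which is the property that makes this family of attention attractive for high-resolution images.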
5. The image enhancement method for the underground construction environment of a hydropower project according to claim 1, characterized in that, in step S3, the multi-scale patch embedding module adopts Depth-Separable Deformable Convolution (DSDCN), which splits the convolution operation of a standard deformable convolution (DCN) into a depthwise deformable convolution and a pointwise convolution; the depthwise deformable convolution is responsible for spatial sampling and local response modeling, while the pointwise convolution is responsible for cross-channel information fusion.
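The saving from this depthwise/pointwise split can be made concrete with a small weight count. The sketch below covers only the main convolution weights; it ignores biases and the offset-prediction branch of the deformable convolution, and the channel counts used in the usage note are illustrative, since the claim does not state them.

```python
def conv_weight_counts(c_in, c_out, k=3):
    """Return (standard, separable) weight counts for a k x k convolution."""
    # Standard convolution: every output channel sees every input channel.
    standard = k * k * c_in * c_out
    # Depthwise part: one k x k filter per input channel (spatial sampling).
    depthwise = k * k * c_in
    # Pointwise part: 1 x 1 convolution mixing channels (cross-channel fusion).
    pointwise = c_in * c_out
    return standard, depthwise + pointwise
```

For example, `conv_weight_counts(64, 64, 3)` gives 36864 weights for the standard form against 4672 for the separable form, roughly an eightfold reduction under these assumed channel counts.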
6. The image enhancement method for the underground construction environment of a hydropower project according to claim 1, characterized in that step S4 specifically includes the following steps: S401. Introduce L1Loss as the primary loss function, which calculates the average of the absolute errors between the predicted value ŷ and the true value y; S402. Train the network using the AdamW optimizer.
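Steps S401 and S402 can be sketched in NumPy as a mean-absolute-error loss and a single AdamW update with decoupled weight decay. The function names and all hyperparameter values are assumptions chosen for illustration; the claim does not specify a learning rate, decay rates, or weight decay.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean of the absolute errors between prediction and ground truth."""
    return np.mean(np.abs(pred - target))

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update at step t (t starts at 1). Returns (theta, m, v)."""
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: shrink the weights directly, not via the gradient.
    theta = theta * (1 - lr * weight_decay)
    # Adam step scaled by the bias-corrected second-moment estimate.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In a training loop, `grad` would be the gradient of `l1_loss` with respect to the network parameters, and `m` and `v` would start as zero arrays of the same shape as `theta`.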
7. The image enhancement method for the underground construction environment of a hydropower project according to claim 1, characterized in that, in step S5, the enhanced image is evaluated using one or more of mean square error (MSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), multi-scale structural similarity (MS-SSIM), and learned perceptual image patch similarity (LPIPS).
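Two of these metrics have simple closed forms and can be sketched directly: MSE is the mean squared pixel difference, and PSNR = 10 log10(MAX² / MSE), where MAX is the largest possible pixel value. The function names and the default `max_val=1.0` (images normalized to [0, 1]) are assumptions. SSIM, MS-SSIM, and LPIPS require windowed statistics or a pretrained network and are not shown.

```python
import numpy as np

def mse(enhanced, reference):
    """Mean squared error between two images of the same shape."""
    diff = enhanced.astype(np.float64) - reference.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(enhanced, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(enhanced, reference)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / err))
```

Lower MSE and higher PSNR indicate that the enhanced image is closer to the clear reference image.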
8. An image enhancement system for the underground construction environment of a hydropower project, characterized in that the system is used to perform the method according to any one of claims 1 to 7 and comprises: a data acquisition unit configured to acquire monitoring image data of underground construction scenes in hydropower projects; a preprocessing unit configured to normalize, crop, and augment the acquired image data to construct a paired dataset containing clear images and synthetic degraded images; and a model unit configured to store and run the TaylorFormer-HydFogNet image enhancement model, receive underground construction images of hydropower projects to be enhanced, and output clear enhanced construction images.
9. An electronic device, characterized in that it comprises: at least one processor; and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and when the instructions are executed by the at least one processor, the electronic device performs the image enhancement method for the underground construction environment of a hydropower project as described in any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the image enhancement method for the underground construction environment of a hydropower project as described in any one of claims 1 to 7.