An unmanned aerial vehicle aerial image defogging method and system combining physical joint inversion and infrared two-stage compensation
By combining physical joint inversion with infrared two-stage compensation, the problem of insufficient structure recovery and unstable transmittance estimation in existing image dehazing methods in dense fog areas is solved, achieving a highly efficient image dehazing effect that is suitable for complex weather scenarios such as drone aerial photography.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGDONG UNIV OF TECH
- Filing Date
- 2026-05-11
- Publication Date
- 2026-06-23
AI Technical Summary
Existing image dehazing methods suffer from problems such as insufficient structure recovery in dense fog regions, unstable transmittance estimation, weakened physical model restoration ability in high fog density regions, and unnatural results in local areas. Furthermore, they lack effective utilization of stable structural information in infrared images.
By combining physical joint inversion with infrared two-stage compensation, visible light and infrared images of the same scene are acquired, features are extracted and frequency correction is performed, transmittance maps and atmospheric light parameters are predicted, physical model inversion is performed, and infrared compensation fusion is performed in dense fog areas to construct physical masks for partitioned processing.
It improves image clarity and detail fidelity in foggy scenes, reduces color shift and contrast imbalance, enhances the structure and brightness recovery capabilities in dense fog areas, and the output defogging results are suitable for high-level vision tasks.
Figure CN122265095A_ABST
Description
Technical Field
[0001] This invention belongs to the field of image processing and multimodal visual perception technology, and in particular relates to a method and system for dehazing UAV aerial images by combining physical joint inversion and infrared two-stage compensation.
Background Technology
[0002] Image dehazing plays a crucial role in scenarios such as intelligent surveillance, visual perception in unmanned systems, target observation under complex weather conditions, and multimodal image processing. By restoring fogged images, image clarity, edge detection capabilities, and scene readability can be improved, thereby providing more stable input for subsequent target detection, image fusion, scene analysis, and recognition.
[0003] Currently, image dehazing methods fall into three main categories: image enhancement-based methods, prior model-based methods, and deep learning-based methods. Image enhancement-based methods improve visual quality by adjusting contrast and saturation, but they are prone to color distortion and over-enhancement of details. Prior model-based methods use physical assumptions such as the dark channel prior and the color attenuation prior to recover clear images. They perform well in scenes with uniform fog distribution, but their adaptability to complex fog conditions and dense fog areas is limited. Deep learning-based methods train neural networks to learn the mapping between fogged and clear images, and have achieved significant progress in various foggy scenarios.
[0004] However, the existing methods described above still have the following technical drawbacks: First, most methods rely on a single visible light image for processing. When there is strong fog, smoke, or local occlusion, the edge, texture, and target contour information in the visible light image will be significantly attenuated, leading to problems such as incomplete dehazing, insufficient detail recovery, and structural distortion.
[0005] Second, although some methods incorporate physical scattering models, their estimates of transmittance and atmospheric light are not stable enough. In dense fog or areas with localized high-brightness obstruction, color shifts, contrast imbalances, and unnatural results are prone to occur.
[0006] Third, existing methods generally lack effective utilization of the stable structural information in infrared images. Infrared images can still maintain a strong target contour response in foggy, smoky, and low-visibility scenes. If this type of cross-modal structural information can be introduced into the defogging process, it is expected to further improve the restoration effect in foggy scenes.
[0007] Therefore, there is an urgent need to propose a method for dehazing UAV aerial images that combines physical joint inversion with infrared two-stage compensation.
Summary of the Invention
[0008] To address the problems in existing image dehazing methods, such as insufficient structure restoration in dense fog regions, unstable transmittance estimation, weakened restoration ability of physical models in high fog density regions, and unnatural results in local areas, this invention provides a UAV aerial image dehazing method that combines physical joint inversion and infrared two-stage compensation, comprising the following steps:
Visible light and infrared images of the same scene are acquired under fog conditions;
Feature extraction is performed on the fogged visible light image to obtain basic visible light features, and structural feature extraction is performed on the infrared image to obtain infrared structural features;
Based on the infrared structural features, the basic visible light features are frequency-corrected to obtain the corrected features;
Based on the corrected features, the transmittance map and the atmospheric light parameters are predicted;
Based on the fogged visible light image, the transmittance map, and the atmospheric light parameters, a physical model inversion is performed to obtain the first-stage defogging result;
The dense fog area is determined based on the transmittance map, and a physical mask is constructed;
In the dense fog area, infrared compensation fusion is performed based on the infrared image and the first-stage defogging result to obtain the final defogging result.
[0009] Optionally, feature extraction is performed on the fogged visible light image to obtain basic visible light features, and structural feature extraction is performed on the infrared image to obtain infrared structural features. The specific process includes: The fogged visible light image is encoded using shallow convolutional methods to obtain the basic features of the visible light. The structural features of the infrared image are extracted using a multi-branch parallel convolutional structure. The multi-branch parallel convolutional structure extracts horizontal edge information, vertical edge information, and local neighborhood texture information respectively, and the output features of each branch are fused to obtain the infrared structural features.
[0010] Optionally, based on the infrared structural features, the basic visible light features are frequency-corrected to obtain the corrected features. The specific process includes: The basic visible light features and the infrared structural features are each smoothed and decomposed to obtain their respective background components and high-frequency detail components; Guided by the high-frequency detail components of the infrared structural features, scaling parameters and bias parameters are generated for modulating the high-frequency detail components of the basic visible light features; Based on the scaling parameters and the bias parameters, the high-frequency detail components of the basic visible light features are modulated to obtain candidate detail enhancement results; The high-frequency detail components of the basic visible light features are concatenated with those of the infrared structural features, and an adaptive gated weight map is predicted; Based on the gated weight map, the candidate detail enhancement results and the high-frequency detail components of the basic visible light features are fused position by position to obtain the output high-frequency features; The corrected features are obtained by recombining the output high-frequency features with the linearly mapped background component of the basic visible light features.
[0011] Optionally, based on the corrected features, the transmittance map and the atmospheric light parameters are predicted, specifically including: The corrected features are input into a shared coding network to obtain shared representation features; The transmittance map and the atmospheric light parameter map are predicted based on the shared representation features; The atmospheric light parameter map is spatially averaged to obtain the global atmospheric light parameters.
[0012] Optionally, a physical model inversion is performed based on the fogged visible light image, the transmittance map, and the atmospheric light parameters to obtain the first-stage defogging result. The specific process includes: Calculate the difference between the fogged visible light image and the global atmospheric light parameters; The larger value between the transmittance map and a preset lower limit constant is selected; Divide the difference by the larger value to obtain the intermediate ratio; The intermediate ratio is added to the global atmospheric light parameters to obtain the first-stage defogging result.
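The four inversion steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the patented implementation: the function name `invert_scattering_model`, the default lower limit `t_min=0.1`, and the assumption that images are floating-point arrays in [0, 1] are all hypothetical choices made for the example.

```python
import numpy as np


def invert_scattering_model(hazy, transmission, airlight, t_min=0.1):
    """First-stage inversion: J = (I - A) / max(t, t_min) + A.

    hazy:         (H, W, 3) fogged visible light image, assumed in [0, 1]
    transmission: (H, W) transmittance map
    airlight:     scalar or length-3 global atmospheric light
    t_min:        preset lower limit constant (0.1 is an assumed value)
    """
    hazy = np.asarray(hazy, dtype=np.float64)
    a = np.asarray(airlight, dtype=np.float64).reshape(1, 1, -1)
    # Step 1: difference between the fogged image and the atmospheric light.
    diff = hazy - a
    # Step 2: larger value between the transmittance map and the lower limit.
    t = np.maximum(np.asarray(transmission, dtype=np.float64), t_min)
    # Step 3: intermediate ratio; step 4: add the atmospheric light back.
    return diff / t[..., None] + a
```

The lower limit keeps the division stable where the predicted transmittance approaches zero, which is exactly where dense fog makes the inversion least reliable.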
[0013] Optionally, the dense fog region is determined based on the transmittance map, and a physical mask is constructed. The specific process includes: The first-stage dehazing result, the infrared image, and the transmittance map are concatenated and input into the second-stage compensation fusion network to obtain the network prediction mask; A physical forced mask is constructed based on the transmittance map; At each position, the larger of the corresponding values in the network prediction mask and the physical forced mask is taken as the final mask.
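A minimal sketch of the mask combination is given below. The text does not state how the physical forced mask is derived from the transmittance map, so the hard threshold used here (`threshold=0.3`) is purely an assumption for illustration, as are the function names; the network prediction mask is taken as a given array.

```python
import numpy as np


def physical_forced_mask(transmission, threshold=0.3):
    """Assumed rule: mark pixels whose transmittance is below a threshold."""
    return (np.asarray(transmission) < threshold).astype(np.float64)


def final_mask(network_mask, transmission, threshold=0.3):
    """Element-wise maximum of the network mask and the physical forced mask."""
    forced = physical_forced_mask(transmission, threshold)
    return np.maximum(np.asarray(network_mask, dtype=np.float64), forced)
```

Taking the maximum means the physical mask can only add dense fog pixels to the network's prediction; it never removes any.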
[0014] Optionally, in the dense fog area, infrared compensation fusion is performed based on the infrared image and the first-stage defogging result to obtain the final defogging result. The specific process includes: The first-stage dehazing result is converted from the RGB color space to the YUV color space to obtain its luminance and chrominance components; The difference between 1 and the final mask is calculated and used as the weight of the luminance component of the first-stage dehazing result; The product of the infrared image and the final mask is calculated as the infrared contribution component; The luminance component multiplied by its weight is added to the infrared contribution component to obtain the compensated luminance component; The compensated luminance component and the chrominance components of the first-stage dehazing result are recombined and converted back to the RGB color space to obtain the second-stage dehazing result; The second-stage dehazing result is truncated to obtain the final dehazing result.
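The luminance-only fusion can be sketched as below. Several details are assumptions made for the example and are not stated in the text: the BT.601 RGB-to-YUV coefficients, the [0, 1] value range used for truncation, a single-channel infrared image already registered and normalised to [0, 1], and the name `infrared_compensate`.

```python
import numpy as np

# BT.601 analog YUV coefficients (an assumed choice of YUV definition).
RGB2YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])
YUV2RGB = np.linalg.inv(RGB2YUV)


def infrared_compensate(stage1_rgb, infrared, mask):
    """Blend infrared into the luminance channel inside the masked region.

    stage1_rgb: (H, W, 3) first-stage dehazing result
    infrared:   (H, W) infrared image
    mask:       (H, W) final mask in [0, 1]; 1 means dense fog
    """
    yuv = np.asarray(stage1_rgb, dtype=np.float64) @ RGB2YUV.T
    # Compensated luminance: (1 - mask) * Y + mask * infrared.
    y = (1.0 - mask) * yuv[..., 0] + mask * infrared
    fused = np.concatenate([y[..., None], yuv[..., 1:]], axis=-1)
    rgb = fused @ YUV2RGB.T
    # Truncate to the assumed valid range.
    return np.clip(rgb, 0.0, 1.0)
```

Because only the luminance channel is modified, the chrominance of the first-stage result is carried through unchanged, which is how the fusion avoids tinting the image with the single-channel infrared signal.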
[0015] Optionally, the method also includes training the dehazing network, specifically including: Using a clear visible light image as the supervision target, the first-stage loss is calculated from the first-stage dehazing result, the transmittance map, and the atmospheric light parameters output by the dehazing network. The first-stage loss includes a transmittance supervision loss, an atmospheric light supervision loss, a total variation constraint on the transmittance map, and a dark channel constraint. The second-stage loss is calculated from the second-stage dehazing result output by the dehazing network, and includes a background brightness preservation constraint, an infrared brightness alignment constraint in the dense fog region, and a gradient recovery constraint. The Adam optimizer is used to perform end-to-end joint optimization of the dehazing network with a preset learning rate, batch size, training patch size, and number of training epochs.
[0016] In another aspect, this embodiment also provides a UAV aerial image dehazing system combining physical joint inversion and infrared two-stage compensation, used to implement the method, including:
A data input module, used to acquire visible light and infrared images of the same scene under fog conditions;
A dual-modal feature extraction module, used to extract features from the fogged visible light image to obtain basic visible light features, and to extract structural features from the infrared image to obtain infrared structural features;
An infrared-guided frequency correction module, used to perform frequency correction on the basic visible light features based on the infrared structural features to obtain the corrected features;
A shared encoding and physical parameter prediction module, used to predict the transmittance map and the atmospheric light parameters based on the corrected features;
A physical model inversion module, used to perform physical model inversion based on the fogged visible light image, the transmittance map, and the atmospheric light parameters to obtain the first-stage defogging result;
A dense fog region determination and physical mask construction module, used to determine the dense fog region based on the transmittance map and construct a physical mask;
An infrared compensation fusion module, used to perform infrared compensation fusion based on the infrared image and the first-stage defogging result in the dense fog area to obtain the final defogging result.
[0017] In a further aspect, this embodiment also provides a computer device, including a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the method.
[0018] Compared with the prior art, this embodiment has the following advantages and technical effects: This invention first acquires visible light and infrared images of the same scene under fog conditions, with the infrared image maintaining a relatively stable structural response under foggy conditions. Based on this, the extracted infrared structural features are used to perform targeted frequency correction on the basic visible light features, particularly enhancing high-frequency detail information that is easily attenuated by fog. This method allows subsequent processing to simultaneously preserve the color content of the visible light and the structural priors of the infrared image, effectively overcoming the problems of blurred edges and texture loss in dense fog areas caused by a single visible light image, thus improving the contour clarity and detail fidelity of the defogging result.
[0019] This invention, after obtaining the corrected features after infrared guidance, predicts the transmittance map and atmospheric light parameters based on these features, and performs a first-stage physical inversion of the fogged image according to the atmospheric scattering model. Because the corrected features have incorporated cross-modal structure-guided information, the estimation of transmittance and atmospheric light is more accurate and stable, reducing color shifts and contrast imbalances that occur in parameter prediction using traditional methods. This first-stage result achieves good defogging effects in areas with light fog or relatively uniform fog distribution, while maintaining the physical integrity of the image.
[0020] This invention further identifies dense fog regions based on the predicted transmittance map and constructs a corresponding physical mask. This mask can accurately distinguish between clear regions with high transmittance and dense fog regions with low transmittance, thus providing spatially adaptive weight guidance for subsequent fusion. In clear regions, the results of the first-stage physical inversion are fully preserved, avoiding texture distortion caused by unnecessary infrared introduction; in dense fog regions, an infrared compensation fusion mechanism is activated. This partitioned processing method takes into account the restoration needs under different fog concentrations and improves the robustness of the overall results.
[0021] This invention performs compensatory fusion of infrared images and the results of the first-stage dehazing within a dense fog area indicated by a physical mask. Since the infrared image maintains a strong subject response even under heavy fog conditions, this fusion process can supplement stable structural and brightness components in areas where visible light information is severely attenuated, thus solving the problem of physical inversion failure or reduced effectiveness in dense fog areas. The final dehazed image output shows significant improvements in target contours, edge textures, and scene details in the foggy area, with natural overall brightness and smooth transitions.
[0022] The final dehazing result obtained by this invention maintains the same size and color space format as the input image, and can be used as input for high-level vision tasks such as object detection, image fusion, and scene analysis without additional preprocessing. This method balances dehazing effect and computational efficiency, and is suitable for complex weather scenarios with high real-time requirements, such as drone aerial photography.
Attached Figure Description
[0023] The accompanying drawings, which form part of this application, are used to provide a further understanding of this application. The illustrative embodiments and their descriptions are used to explain this application and do not constitute an undue limitation of it. In the drawings:
Figure 1 is a schematic diagram of the composition of the drone defogging network device according to an embodiment of the present invention;
Figure 2 is a schematic diagram of the overall structure of the defogging network according to an embodiment of the present invention;
Figure 3 is a schematic diagram of the network architecture of the dual-modal feature extraction module according to an embodiment of the present invention;
Figure 4 is a network schematic diagram of the infrared-guided frequency correction module according to an embodiment of the present invention;
Figure 5 is a schematic diagram of the network architecture of the physical parameter prediction and model inversion module according to an embodiment of the present invention;
Figure 6 is a network diagram of the dense fog area determination and infrared compensation fusion module according to an embodiment of the present invention.
Detailed Implementation
[0024] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.
[0025] It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.
[0026] Example 1
This embodiment provides a method for dehazing UAV aerial images by combining physical joint inversion and infrared two-stage compensation. The method first uses the relatively stable structural response of infrared images under foggy conditions to perform frequency correction on the features of the fogged visible light image. Then, a shared coding network is used to predict the transmittance map and the atmospheric light parameters to complete the first-stage physical model inversion. Subsequently, infrared compensation fusion is introduced for dense fog areas to supplement brightness and structure where the physical inversion is weakened. Through dual-modal feature extraction, physical parameter prediction, physical model inversion, and partitioned compensation of dense fog areas, the method handles both light fog and dense fog scenarios, improving the clarity, structure preservation, and physical consistency of the resulting image. This embodiment can be applied to scenarios such as intelligent monitoring, unmanned system visual perception, complex weather observation, and multimodal image processing.
[0027] Specifically, the following steps are included:
Visible light and infrared images of the same scene are acquired under fog conditions;
Feature extraction is performed on the fogged visible light image to obtain basic visible light features, and structural feature extraction is performed on the infrared image to obtain infrared structural features;
Based on the infrared structural features, the basic visible light features are frequency-corrected to obtain the corrected features;
Based on the corrected features, the transmittance map and the atmospheric light parameters are predicted;
Based on the fogged visible light image, the transmittance map, and the atmospheric light parameters, a physical model inversion is performed to obtain the first-stage defogging result;
The dense fog area is determined based on the transmittance map, and a physical mask is constructed;
In the dense fog area, infrared compensation fusion is performed based on the infrared image and the first-stage defogging result to obtain the final defogging result.
[0028] To realize the above-mentioned method for dehazing UAV aerial images combining physical joint inversion and infrared two-stage compensation, this embodiment also provides a UAV aerial image dehazing device combining physical joint inversion and infrared two-stage compensation, including a computer, a UAV, a visible light camera, an infrared camera, and pedestrians in a dense fog scene; the visible light camera and the infrared camera acquire fogged visible light images and infrared images of the same scene, the computer performs image registration and executes image dehazing algorithms, performs physical parameter inversion and dense fog area compensation fusion on the fogged visible light image, and outputs the dehazed visible light image.
[0029] Based on the same inventive concept, this embodiment also provides a UAV aerial image dehazing system that combines physical joint inversion and infrared two-stage compensation, applied to the above-mentioned device, and includes the following modules:
A data input module, used to acquire visible light and infrared images of the same scene under fog conditions;
A dual-modal feature extraction module, used to extract features from the fogged visible light image to obtain basic visible light features, and to extract structural features from the infrared image to obtain infrared structural features;
An infrared-guided frequency correction module, used to perform frequency correction on the basic visible light features based on the infrared structural features to obtain the corrected features;
A shared encoding and physical parameter prediction module, used to predict the transmittance map and the atmospheric light parameters based on the corrected features;
A physical model inversion module, used to perform physical model inversion based on the fogged visible light image, the transmittance map, and the atmospheric light parameters to obtain the first-stage defogging result;
A dense fog region determination and physical mask construction module, used to determine the dense fog region based on the transmittance map and construct a physical mask;
An infrared compensation fusion module, used to perform infrared compensation fusion based on the infrared image and the first-stage defogging result in the dense fog area to obtain the final defogging result.
[0030] As a specific implementation, the steps of the defogging method are as follows:
Step 1: As shown in Figure 1, images in both the visible and infrared modalities are captured by a drone, primarily targeting pedestrians, to construct an RGB-IR image dataset $I_d = [I_{RGB1}, I_{IR1}, I_{RGB2}, I_{IR2}, \ldots, I_{RGBK}, I_{IRK}]$. The dataset $I_d$ contains 2K images in total, that is, K pairs of RGB and IR images obtained by registering the raw images captured by the drone. The image size is $C \times H \times W$, where $C$ is the number of image channels, $H$ is the image height, and $W$ is the image width. During the training phase, a clear visible light image is used as the supervision target, and a foggy input image is constructed online based on the atmospheric scattering model. Let the clear visible light image be $J$, the fogged input image be $I$, the transmittance map be $t$, and the atmospheric light parameter be $A$. Then the fogged image can be represented as:
$$I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr)$$
In the above formula, $J$ represents the clear scene image, $I$ represents the fogged image, $t$ represents the transmittance map, $A$ represents the atmospheric light parameter, and $x$ is the pixel position. During training, a vertical depth distribution can be constructed along the vertical direction of the image, and a spatial variation term can then be constructed from the image grayscale information to generate a full-image transmittance map. The transmittance map is then truncated to keep its values within a preset range. At the same time, noise and soft smoke occlusion are added to local areas to improve the adaptability of the training samples to real complex scenes, and the transmittance supervision map and atmospheric light supervision values are saved alongside each sample.
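As an illustration of the online fog synthesis in Step 1, the sketch below builds a transmittance map from a vertical gradient modulated by image grayscale and applies the atmospheric scattering model. The specific functional forms and every default value (`airlight`, `t_top`, `t_bottom`, `gray_weight`, `t_range`) are assumptions chosen for the example; the text only states that a vertical depth distribution and a grayscale-based spatial variation term are combined and then truncated. The local noise and soft smoke occlusion are omitted.

```python
import numpy as np


def synthesize_haze(clear, airlight=0.9, t_top=0.2, t_bottom=0.9,
                    gray_weight=0.2, t_range=(0.05, 1.0)):
    """Return (hazy image, transmittance supervision map) for a clear image.

    clear: (H, W, 3) clear visible light image, assumed in [0, 1]
    """
    clear = np.asarray(clear, dtype=np.float64)
    h, w, _ = clear.shape
    # Assumed vertical depth prior: rows near the top are farther away,
    # so they receive a lower transmittance.
    vertical = np.linspace(t_top, t_bottom, h)[:, None] * np.ones((1, w))
    # Assumed spatial variation term derived from image grayscale.
    gray = clear.mean(axis=-1)
    t = vertical * (1.0 - gray_weight * gray)
    # Truncate the transmittance map to a preset range.
    t = np.clip(t, t_range[0], t_range[1])
    # Atmospheric scattering model: I = J * t + A * (1 - t).
    hazy = clear * t[..., None] + airlight * (1.0 - t[..., None])
    return hazy, t
```

Returning the transmittance map together with the fogged image makes it straightforward to save the supervision signals that the training losses later need.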
[0031] Step 2.1: Overall network architecture. As shown in Figure 2, the fogged visible light image is first input into the visible light feature extraction branch to obtain the basic visible light features; the infrared image is input into the infrared structural feature extraction branch to obtain infrared structural features. The basic visible light features are then frequency-corrected using the infrared structural features to obtain corrected features. Next, the corrected features are input into a shared coding network to obtain shared representation features for the defogging task, and the transmittance map and atmospheric light parameters are predicted based on these shared representation features. In the first stage, a physical model inversion is performed on the fogged image based on the predicted transmittance map and atmospheric light parameters to obtain the preliminary defogging result. In the second stage, the dense fog region is determined based on the transmittance map predicted in the first stage. Where the transmittance is high, the defogging result from the first stage is used directly; where the transmittance is low and the physical inversion is weakened, infrared modal information is introduced for compensation fusion to obtain the final defogging result. After the result is generated, truncation is performed to keep the pixel values of the output image within a preset range.
[0032] Step 2.2: Dual-modal feature extraction module. As shown in Figure 3, the input fogged visible light image and infrared image are fed into their respective feature encoding branches to form the basic representations required for subsequent physical parameter estimation and cross-modal correction. Shallow convolutional encoding is performed on the fogged visible light image to obtain the basic visible light features, which can be represented as:
$$F_{vis} = f_{vis}(I)$$
In the above formula, $f_{vis}$ represents the visible light feature extraction function, $I$ represents the input fogged visible light image, and $F_{vis}$ represents the obtained basic visible light features. The visible light branch uses a convolutional layer with a kernel of preset size to initially encode the input image, and the activation function is a leaky rectified linear unit (Leaky ReLU), which preserves the edge and texture responses in foggy scenes.
[0033] Structural features are extracted from the infrared image to obtain the infrared structural features, which can be represented as:
$$F_{ir} = f_{ir}(I_{ir})$$
In the above formula, $f_{ir}$ represents the infrared structure extraction function, $I_{ir}$ represents the input infrared image, and $F_{ir}$ represents the extracted infrared structural features. In a specific implementation, the infrared branch employs a multi-branch parallel convolutional structure to extract horizontal edge, vertical edge, and local neighborhood texture information, respectively. The process can be represented as follows:
$$F_h = \mathrm{Conv}_{1\times3}(I_{ir})$$
$$F_v = \mathrm{Conv}_{3\times1}(I_{ir})$$
$$F_n = \mathrm{Conv}_{3\times3}(I_{ir})$$
$$F_{ir} = \phi(F_h + F_v + F_n)$$
In the above formulas, $\mathrm{Conv}_{1\times3}$ represents a convolution operation with a kernel size of 1×3, $\mathrm{Conv}_{3\times1}$ represents a convolution operation with a kernel size of 3×1, $\mathrm{Conv}_{3\times3}$ represents a convolution operation with a kernel size of 3×3, $F_h$, $F_v$, and $F_n$ represent the output features of the three parallel branches, $\phi$ represents a non-linear activation function, and $F_{ir}$ represents the fused infrared structural features. With the dual-branch encoding described above, the visible light branch is responsible for preserving color and scene content information, while the infrared branch provides more stable structural contour information under foggy conditions, offering cross-modal priors for subsequent frequency correction.
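The three-branch infrared encoder can be illustrated with a single-channel NumPy sketch. It is only a toy stand-in for the learned network: the kernels are passed in by the caller rather than trained, the branches are fused by summation followed by a Leaky ReLU with slope 0.1, and the helper names are hypothetical. The fusion rule, the activation, and its slope are all assumptions, since the text says only that the branch outputs are fused and passed through a non-linear activation.

```python
import numpy as np


def conv2d_same(image, kernel):
    """Zero-padded 2-D cross-correlation that keeps the input size."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)


def infrared_structure(ir, k_1x3, k_3x1, k_3x3):
    """Fuse 1x3, 3x1 and 3x3 branch responses of a (H, W) infrared image."""
    f_h = conv2d_same(ir, k_1x3)  # responds to variation along each row
    f_v = conv2d_same(ir, k_3x1)  # responds to variation along each column
    f_n = conv2d_same(ir, k_3x3)  # local neighbourhood texture
    return leaky_relu(f_h + f_v + f_n)
```

For example, with `k_1x3 = np.array([[-1.0, 0.0, 1.0]])` and the other two kernels set to zero, the output is non-zero only near intensity changes along a row, which shows how an asymmetric kernel shape biases a branch toward one edge orientation.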
[0034] Step 2.3: Infrared-guided frequency correction module. As shown in Figure 4, the basic visible light features $F_{vis}$ and the infrared structural features $F_{ir}$ obtained in step 2.2 are input into the frequency correction module to selectively enhance the visible light information at the level of frequency components. First, the visible light and infrared features are each decomposed by smoothing to obtain their respective background and detail components. This process can be represented as follows:
$$B_{vis} = \mathrm{AvgPool}(F_{vis})$$
$$H_{vis} = F_{vis} - B_{vis}$$
$$B_{ir} = \mathrm{AvgPool}(F_{ir})$$
$$H_{ir} = F_{ir} - B_{ir}$$
In the above formulas, $\mathrm{AvgPool}$ represents the average pooling operation, $B_{vis}$ represents the visible light background component, $H_{vis}$ represents the visible light high-frequency detail component, $B_{ir}$ represents the infrared background component, and $H_{ir}$ represents the infrared high-frequency detail component. In one specific implementation, the average pooling operation uses 5×5 average pooling to extract more stable low-frequency information.
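The smoothing decomposition can be sketched as a size-preserving 5×5 average pool followed by a subtraction. The stride of 1 and the edge-replicating padding are assumptions made so that the background has the same shape as the input; the text specifies only the 5×5 window. The function names are hypothetical.

```python
import numpy as np


def avg_pool_same(features, k=5):
    """k x k average pooling with stride 1 over an (H, W, C) feature map.

    Edge-replicating padding is an assumed choice that keeps the output
    the same size as the input.
    """
    features = np.asarray(features, dtype=np.float64)
    pad = k // 2
    padded = np.pad(features, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = features.shape[:2]
    out = np.zeros_like(features)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def decompose(features, k=5):
    """Split features into a low-frequency background and a detail residual."""
    features = np.asarray(features, dtype=np.float64)
    background = avg_pool_same(features, k)
    detail = features - background
    return background, detail
```

By construction the two components sum back to the input, so whatever is not modified in the detail branch is preserved exactly when the parts are recombined.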
[0035] Subsequently, using the infrared high-frequency detail component $H_{ir}$ as a guide, scaling and bias parameters for the visible light high-frequency detail component $H_{vis}$ are generated. This process can be represented as follows:
$$\gamma = \mathrm{Conv}_{\gamma}(H_{ir})$$
$$\beta = \mathrm{Conv}_{\beta}(H_{ir})$$
In the above formulas, $\mathrm{Conv}_{\gamma}$ represents a 1×1 convolution mapping used to generate the scaling parameter, $\mathrm{Conv}_{\beta}$ represents a 1×1 convolution mapping used to generate the bias parameter, $\gamma$ represents the scaling parameter, and $\beta$ represents the bias parameter. Based on these parameters, the visible light high-frequency detail component is modulated to obtain the candidate detail enhancement result, which can be expressed as:
$$\tilde{H}_{vis} = \gamma \odot H_{vis} + \beta$$
In the above formula, $\tilde{H}_{vis}$ represents the candidate detail enhancement result and $\odot$ denotes element-wise multiplication.
[0036] To prevent excessive infrared information from causing texture distortion, the visible light high-frequency detail component $H_{vis}$ is concatenated with the infrared high-frequency detail component $H_{ir}$, and an adaptive gated weight map is predicted. This process can be represented as follows:
$$G = \sigma\bigl(\mathrm{Conv}_{3\times3}(\mathrm{Concat}(H_{vis}, H_{ir}))\bigr)$$
In the above formula, $\mathrm{Concat}$ represents the feature concatenation operation, $\mathrm{Conv}_{3\times3}$ represents a convolution operation with a kernel size of 3×3, $\sigma$ represents the Sigmoid activation function, and $G$ represents the gated weight map. Based on the gated weight map, position-by-position fusion is performed between the candidate detail enhancement result $\tilde{H}_{vis}$ and the original visible light high-frequency detail component to obtain the output high-frequency features, which can be represented as:
$$H_{out} = G \odot \tilde{H}_{vis} + (1 - G) \odot H_{vis}$$
In the above formula, $H_{out}$ represents the output high-frequency features.
[0037] Finally, the output high-frequency features $H_{out}$ are recombined with the linearly mapped visible light background component $B_v$ to obtain the frequency-corrected features, which can be expressed as: $F_c = H_{out} + \phi_b(B_v)$. In the above formula, $\phi_b$ denotes the 1×1 convolution mapping applied to the background component, and $F_c$ denotes the correction features. In this way, the infrared modality mainly guides and compensates the high-frequency structures in the foggy image that are easily weakened by fog, while the visible light modality retains the original colors and subject information of the scene. This provides more stable and structurally consistent input features for the subsequent prediction of the transmittance map and atmospheric light parameters.
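The modulation, gating, and recombination of step 2.3 can be sketched as below. This is an illustrative reading under stated assumptions: the affine form `gamma * h_vis + beta` is one common interpretation of scale-and-bias modulation, and the parameter dictionary, channel counts, zero padding, and random weights are all hypothetical.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in), b (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def conv3x3(x, w, b):
    """3x3 convolution with zero padding; w has shape (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + wd])
    return out + b[:, None, None]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def frequency_correct(b_vis, h_vis, h_ir, p):
    """Infrared-guided correction of visible-light detail features (sketch)."""
    gamma = conv1x1(h_ir, p["w_gamma"], p["b_gamma"])  # scaling parameter
    beta = conv1x1(h_ir, p["w_beta"], p["b_beta"])     # bias parameter
    h_cand = gamma * h_vis + beta                      # assumed affine modulation
    gate = sigmoid(conv3x3(np.concatenate([h_vis, h_ir], axis=0),
                           p["w_gate"], p["b_gate"]))
    h_out = gate * h_cand + (1.0 - gate) * h_vis       # position-wise fusion
    f_corr = h_out + conv1x1(b_vis, p["w_bg"], p["b_bg"])
    return f_corr, gate

# Hypothetical weights and inputs, for illustration only.
rng = np.random.default_rng(0)
c, h, w = 4, 8, 8
params = {
    "w_gamma": rng.normal(size=(c, c)) * 0.1, "b_gamma": np.ones(c),
    "w_beta": rng.normal(size=(c, c)) * 0.1, "b_beta": np.zeros(c),
    "w_gate": rng.normal(size=(c, 2 * c, 3, 3)) * 0.1, "b_gate": np.zeros(c),
    "w_bg": np.eye(c), "b_bg": np.zeros(c),
}
b_vis, h_vis, h_ir = (rng.normal(size=(c, h, w)) for _ in range(3))
f_corr, gate = frequency_correct(b_vis, h_vis, h_ir, params)
```

When the gate is close to zero the module falls back to the unmodified visible light detail, which is the behaviour the text relies on to limit texture distortion from the infrared branch.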
[0038] Step 2.4: Shared encoding and physical parameter prediction module. As shown in Figure 5, the correction features $F_c$ are input into a shared encoding network to obtain the shared representation features $F_s$, which can be represented as: $F_s = E(F_c)$. In the above formula, $E(\cdot)$ denotes the shared encoding function, $F_c$ denotes the correction features, and $F_s$ denotes the shared representation features. In a specific implementation, the shared encoding network consists of multiple 3×3 convolutions, and the task condition map is repeatedly concatenated between layers, so that task control information continuously participates in feature updates during encoding. In addition, a cross-layer aggregation structure within the shared encoding network adds and fuses intermediate features from different levels, enhancing the expression of multi-scale texture information and task-related information. The prediction of the transmittance map and the atmospheric light parameters from the shared representation features can be represented as: $t = \phi_t(F_s)$; $A_{map} = \phi_A(F_s)$. Here, $\phi_t$ denotes the convolutional mapping used to predict the transmittance map, $\phi_A$ denotes the convolutional mapping used to predict the atmospheric light parameter map, $t$ denotes the predicted transmittance map, and $A_{map}$ denotes the predicted atmospheric light parameter map. In a specific implementation, $t$ is a single-channel map and $A_{map}$ is a three-channel map. The atmospheric light parameter map is spatially averaged to obtain the global atmospheric light parameters, which can be expressed as: $A = \mathrm{Mean}(A_{map})$. In the above formula, $\mathrm{Mean}(\cdot)$ denotes the spatial averaging operation, and $A$ denotes the global atmospheric light parameters.
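The two prediction heads and the spatial averaging can be illustrated with a short NumPy sketch. The 1×1 head weights, the Sigmoid output activations, and the function name are assumptions for illustration; the text states only that convolutional mappings produce a single-channel transmittance map and a three-channel atmospheric light map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_physical_params(f_shared, w_t, w_a):
    """f_shared: (C, H, W); w_t: (1, C); w_a: (3, C).

    Returns the transmittance map (1, H, W), the atmospheric light map
    (3, H, W), and the global atmospheric light vector (3,).
    """
    # Hypothetical 1x1 heads; Sigmoid keeps both outputs in [0, 1] (assumed).
    t_map = sigmoid(np.einsum("oc,chw->ohw", w_t, f_shared))
    a_map = sigmoid(np.einsum("oc,chw->ohw", w_a, f_shared))
    # Spatial averaging collapses the per-pixel map to one value per channel.
    a_global = a_map.mean(axis=(1, 2))
    return t_map, a_map, a_global

rng = np.random.default_rng(0)
f_shared = rng.normal(size=(8, 6, 6))
t_map, a_map, a_global = predict_physical_params(
    f_shared, rng.normal(size=(1, 8)), rng.normal(size=(3, 8)))
```

Averaging a dense map instead of regressing a single vector lets every spatial position contribute to the atmospheric light estimate.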
[0039] Step 2.5: Physical model inversion module. As shown in Figure 5, based on the predicted transmittance map and the global atmospheric light parameters, a physical model inversion is performed on the input fogged image to obtain the first-stage defogging result image, which can be represented as: $J_1 = \dfrac{I - A}{\max(t, t_0)} + A$. In the above formula, $J_1$ denotes the first-stage defogging result image, $I$ denotes the input fogged image, $A$ denotes the global atmospheric light parameters, $t$ denotes the predicted transmittance map, and $t_0$ denotes a lower limit constant for the transmittance, used to prevent numerical instability caused by excessively low transmittance. In a specific implementation, $t_0$ is set to 0.1. The first-stage defogging result image is then truncated, which can be represented as: $\tilde{J}_1 = \mathrm{clip}(J_1)$. In the above formula, $\mathrm{clip}(\cdot)$ denotes the truncation function, and $\tilde{J}_1$ denotes the truncated first-stage defogging result. In light fog regions, the visible light image still retains a significant amount of effective structural and color information, so physical model inversion based on the transmittance map and atmospheric light parameters achieves a good defogging effect. The first-stage result can therefore be used directly as the effective restoration result for light fog regions.
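The first-stage inversion can be checked numerically with a small sketch. It assumes images normalized to [0, 1], a truncation range of [0, 1], and a synthetic fogged image generated with the standard atmospheric scattering model $I = J t + A(1 - t)$; the function name and test data are hypothetical.

```python
import numpy as np

def invert_haze(hazy, t_map, a_global, t0=0.1):
    """First-stage physical inversion.

    hazy: (3, H, W) fogged image in [0, 1]; t_map: (1, H, W) transmittance;
    a_global: (3,) global atmospheric light; t0: transmittance lower limit.
    """
    a = a_global[:, None, None]
    t = np.maximum(t_map, t0)          # floor the transmittance for stability
    restored = (hazy - a) / t + a
    return np.clip(restored, 0.0, 1.0)  # truncation to the assumed [0, 1] range

# Synthesize a fogged image from a known clear image, then invert it.
rng = np.random.default_rng(0)
clear = rng.random((3, 8, 8))
t_true = rng.uniform(0.2, 0.9, size=(1, 8, 8))
a_true = np.array([0.80, 0.85, 0.90])
hazy = clear * t_true + a_true[:, None, None] * (1.0 - t_true)
restored = invert_haze(hazy, t_true, a_true)
```

With the true transmittance and atmospheric light, the inversion recovers the clear image up to floating-point error. When the transmittance falls below `t0`, the floor bounds the amplification of the residual `hazy - a`, which is why dense fog regions need the second-stage compensation.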
[0040] Step 2.6: Dense fog region determination and physical mask construction module. As shown in Figure 6, in dense fog regions the effective visible light information in the fogged image is significantly weakened, so it is difficult to recover sufficient structural detail by relying only on the first-stage physical model inversion. The transmittance map is therefore further used to determine the dense fog region and construct a physical mask. The first-stage defogging result image, the infrared image, and the transmittance map are concatenated and input into the second-stage compensation fusion network. After multi-layer convolutional encoding, intermediate fusion features are obtained, from which the network prediction mask $M_n$ is predicted. The physically forced mask constructed from the transmittance map can be expressed as: $M_p = \mathrm{clip}(k\,(\tau - t))$. In the above formula, $M_p$ denotes the physically forced mask, $k$ denotes the mask magnification factor, $\tau$ denotes the threshold for determining dense fog, $t$ denotes the transmittance map, and $\mathrm{clip}(\cdot)$ denotes the truncation function. In a specific implementation, $\tau$ may be set to 0.42 and $k$ may be set to 5.0. The final mask can be represented as: $M = \max(M_n, M_p)$. In the above formula, $M$ denotes the final mask, $M_n$ denotes the network prediction mask, and $M_p$ denotes the physically forced mask. This increases the contribution of infrared information in dense fog regions with low transmittance, while retaining more of the first-stage defogging result in regions with high transmittance, thus forming a partitioned processing mechanism that combines protection of clear regions with compensation of dense fog regions.
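A minimal sketch of the mask construction follows. The clipped linear ramp `clip(k * (tau - t), 0, 1)`, including its [0, 1] range, is an assumed form consistent with the named quantities (magnification factor, dense fog threshold, truncation function); the element-wise maximum follows the claimed combination of the two masks. The function names and sample values are hypothetical.

```python
import numpy as np

def physical_mask(t_map, tau=0.42, k=5.0):
    """Physically forced mask: 0 where t >= tau, rising to 1 as t drops.

    tau is the dense fog threshold and k the magnification factor
    (values from the specific implementation in the text).
    """
    return np.clip(k * (tau - t_map), 0.0, 1.0)

def final_mask(m_net, t_map, tau=0.42, k=5.0):
    """Element-wise maximum of the network mask and the physical mask."""
    return np.maximum(m_net, physical_mask(t_map, tau, k))

t_samples = np.array([0.90, 0.42, 0.32, 0.22, 0.00])
m_phys = physical_mask(t_samples)          # approximately [0, 0, 0.5, 1, 1]
m_net = np.full_like(t_samples, 0.3)       # hypothetical network prediction
m_final = final_mask(m_net, t_samples)     # approximately [0.3, 0.3, 0.5, 1, 1]
```

Taking the maximum means the network can raise the infrared contribution anywhere, but cannot lower it below the physical mask where the transmittance indicates dense fog.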
[0041] Step 2.7: Infrared compensation fusion module. As shown in Figure 6, the first-stage defogging result image, the infrared image, and the transmittance map are input into the second-stage compensation fusion module to supplement infrared information in the dense fog region. The compensation fusion module maintains the brightness distribution of the first-stage defogging result in clear regions, increases the proportion of infrared brightness and structural information in dense fog regions, and creates a smooth transition between the two types of regions. To preserve the color information of the visible light image, the first-stage defogging result image is first converted to the YUV space, and its luminance and chrominance components are extracted, which can be represented as: $(Y_1, U_1, V_1) = \mathcal{T}(\tilde{J}_1)$. In the above formula, $\mathcal{T}$ denotes the color space conversion function, $Y_1$ denotes the luminance component of the first-stage defogging result, and $U_1$, $V_1$ denote its chrominance components. Infrared compensation of the luminance component based on the final mask can be expressed as: $Y_2 = (1 - M) \odot Y_1 + M \odot I_{ir}$. In the above formula, $Y_2$ denotes the luminance component after infrared compensation, $I_{ir}$ denotes the infrared image, and $M$ denotes the final mask. The compensated luminance component is then recombined with the original chrominance components and converted back to the RGB space to obtain the second-stage defogging result image, which can be represented as: $J_2 = \mathcal{T}^{-1}(Y_2, U_1, V_1)$. In the above formula, $\mathcal{T}^{-1}$ denotes the image reconstruction function that converts the YUV color space back to the RGB color space, and $J_2$ denotes the second-stage defogging result image. The final output image is obtained by truncating the second-stage defogging result image, which can be represented as: $J = \mathrm{clip}(J_2)$. In the above formula, $J$ denotes the final defogging result image.
Using the above method, the first-stage physical inversion results are retained in light fog areas, while infrared modes are used to supplement brightness and structural information in dense fog areas, thus achieving stable two-stage restoration of complex foggy scenes.
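The luminance-only compensation can be sketched as follows. The BT.601 analog YUV matrix is an assumed choice, since the text says only "YUV space"; the function names, the hard left/right mask, and the test data are hypothetical.

```python
import numpy as np

# BT.601 analog YUV matrix (assumed; the source does not name the variant).
RGB2YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])
YUV2RGB = np.linalg.inv(RGB2YUV)  # exact inverse, so the round trip is lossless

def rgb_to_yuv(img):
    """img: (3, H, W) RGB -> (3, H, W) YUV."""
    return np.einsum("ij,jhw->ihw", RGB2YUV, img)

def yuv_to_rgb(img):
    """img: (3, H, W) YUV -> (3, H, W) RGB."""
    return np.einsum("ij,jhw->ihw", YUV2RGB, img)

def infrared_compensate(j1, ir, mask):
    """Blend infrared into the luminance channel only.

    j1: (3, H, W) first-stage RGB result in [0, 1];
    ir, mask: (1, H, W) infrared image and final mask in [0, 1].
    """
    y1, u1, v1 = rgb_to_yuv(j1)
    y2 = (1.0 - mask[0]) * y1 + mask[0] * ir[0]  # mask-weighted luminance
    j2 = yuv_to_rgb(np.stack([y2, u1, v1]))      # chrominance is unchanged
    return np.clip(j2, 0.0, 1.0)                 # truncation to [0, 1]

rng = np.random.default_rng(0)
j1 = rng.uniform(0.4, 0.6, size=(3, 8, 8))
ir = rng.uniform(0.3, 0.7, size=(1, 8, 8))
mask = np.zeros((1, 8, 8))
mask[:, :, 4:] = 1.0  # right half treated as dense fog
out = infrared_compensate(j1, ir, mask)
```

Where the mask is zero the first-stage result passes through unchanged, and where it is one the output luminance equals the infrared intensity while the chrominance still comes from the visible light image.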
[0042] This module can automatically adjust the fusion weights of RGB and infrared according to the characteristics of the scene. In daytime scenes, the weight of RGB is larger, while in nighttime scenes, the weight of infrared is larger, thus achieving scene-adaptive feature fusion.
[0043] Step 3: During the training phase, a clear visible light image is used as the supervision target image. The first-stage output of the dehazing network, the transmittance map, and the atmospheric light parameters are used together to calculate the loss function, and the second-stage fusion result is used to guide the compensation and recovery of dense fog regions. In the first-stage training, the network is jointly optimized using a transmittance supervision loss, an atmospheric light supervision loss, a total variation constraint on the transmittance map, and a dark channel constraint. The transmittance supervision loss constrains the consistency between the predicted transmittance map and the supervision transmittance map; the atmospheric light supervision loss constrains the consistency between the predicted atmospheric light parameters and the supervision values; the total variation constraint enhances the smoothness of the transmittance map; and the dark channel constraint improves the physical plausibility of the first-stage defogging result. In the second-stage training, a compensation region is constructed from the dense fog region mask and the infrared salient region, constraining the brightness fidelity and gradient recovery capability of the fusion result. Specifically, the background brightness preservation constraint keeps the brightness of clear regions consistent with the first-stage defogging result; the infrared brightness alignment constraint for dense fog regions aligns the brightness of those regions with the infrared image; and the gradient recovery constraint ensures that, at texture edges, the final result retains the strong gradient information of both the first-stage defogging result and the infrared image.
[0044] The dehazing network is trained end-to-end, using Adam as the optimizer with a preset learning rate, batch size, and number of training epochs, and a training patch size of 48×48. The weight parameters of the loss terms are set as follows: the transmittance supervision loss weight is set to 80.0, the atmospheric light supervision loss weight is set to 50.0, the total variation constraint weight for the transmittance map is set to 0.5, and the dark channel constraint weight is set to 0.1.
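The weighted first-stage objective can be sketched as below. Only the four weights (80.0, 50.0, 0.5, 0.1) come from the text; the L1 form of the supervision losses, the anisotropic total variation, the 3×3 dark channel patch, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def tv_loss(t_map):
    """Anisotropic total variation of a (1, H, W) map (assumed form)."""
    dy = np.abs(np.diff(t_map, axis=1)).mean()
    dx = np.abs(np.diff(t_map, axis=2)).mean()
    return dx + dy

def dark_channel(img, k=3):
    """Per-pixel minimum over channels, then a k x k minimum filter."""
    m = img.min(axis=0)
    pad = k // 2
    mp = np.pad(m, pad, mode="edge")
    h, w = m.shape
    out = np.full(m.shape, np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.minimum(out, mp[dy:dy + h, dx:dx + w])
    return out

def stage1_loss(t_pred, t_gt, a_pred, a_gt, j1,
                w_t=80.0, w_a=50.0, w_tv=0.5, w_dc=0.1):
    """Weighted sum of the four first-stage terms named in the text."""
    l_t = np.abs(t_pred - t_gt).mean()   # transmittance supervision (L1, assumed)
    l_a = np.abs(a_pred - a_gt).mean()   # atmospheric light supervision (L1, assumed)
    l_tv = tv_loss(t_pred)               # smoothness of the transmittance map
    l_dc = dark_channel(j1).mean()       # dark channel of the first-stage result
    return w_t * l_t + w_a * l_a + w_tv * l_tv + w_dc * l_dc

t_gt = np.full((1, 8, 8), 0.5)
a_gt = np.array([0.8, 0.8, 0.8])
j1 = np.ones((3, 8, 8))
j1[2] = 0.0  # one channel at zero makes the dark channel zero everywhere
loss_perfect = stage1_loss(t_gt, t_gt, a_gt, a_gt, j1)
loss_offset = stage1_loss(t_gt + 0.1, t_gt, a_gt, a_gt, j1)
```

In this toy case a uniform transmittance error of 0.1 contributes 80.0 × 0.1 = 8.0 to the loss, which shows how strongly the stated weights prioritize transmittance accuracy over the two regularizing constraints.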
[0045] Step 4: The trained dehazing network is used to perform inference on the fogged image to be tested. The fogged visible light image $I$ and the corresponding infrared image $I_{ir}$ are input into the network, which first yields the transmittance map $t$, the atmospheric light parameters $A$, and the first-stage defogging result image $J_1$. Then, based on the transmittance map $t$, a mask for the dense fog region is constructed and, combined with the infrared image $I_{ir}$, the first-stage defogging result is compensated and fused in the second stage to obtain the final defogged image. The first-stage defogging result can be expressed as: $J_1 = \dfrac{I - A}{\max(t, t_0)} + A$. The final defogging result after the second-stage compensation can be expressed as: $J = \mathrm{clip}\big(\mathcal{T}^{-1}((1 - M) \odot Y_1 + M \odot I_{ir},\, U_1,\, V_1)\big)$. In the above formulas, $I$ denotes the fogged visible light image, $I_{ir}$ denotes the corresponding infrared image, $t$ denotes the transmittance map, $A$ denotes the global atmospheric light parameters, $t_0$ denotes the lower limit constant of the transmittance, $J_1$ denotes the first-stage defogging result image, $M$ denotes the final mask corresponding to the dense fog region, $Y_1$, $U_1$, $V_1$ denote the luminance and chrominance components of the first-stage defogging result image in the YUV space, $\mathcal{T}^{-1}$ denotes the image reconstruction function that converts the YUV color space back to the RGB color space, $\mathrm{clip}(\cdot)$ denotes the truncation function, and $J$ denotes the final defogged image output. $J$ is a three-channel image with the same size as the input image. The transmittance map $t$ can be output as an intermediate physical quantity and saved as a single-channel or pseudo-color image as needed. The image dehazing process is complete when the target outlines, edge textures, and scene details in the foggy regions of the output image are restored.
[0046] Compared with existing technologies, this embodiment introduces infrared structural features before defogging and compensates for high-frequency information in visible light features through frequency correction, which helps improve the structural recovery capability in foggy scenes. This embodiment jointly predicts transmittance maps and atmospheric light parameters through a shared coding network and performs the first-stage inversion in conjunction with a physical model, which helps improve the physical consistency and restoration stability in light fog and moderate fog scenes. This embodiment further introduces a physical mask based on transmittance maps and an infrared compensation fusion module for dense fog areas, so that even if the first-stage physical inversion fails or the effect is weakened, the system can still recover the target contour and brightness information by means of infrared modes, thereby improving the restoration effect in dense fog areas. The defogging result image output by this embodiment can be directly used as input for subsequent image fusion, target detection and scene analysis tasks, and has good engineering applicability.
[0047] The above are merely preferred embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for dehazing UAV aerial images combining physical joint inversion and infrared two-stage compensation, characterized in that it comprises the following steps: acquiring a fogged visible light image and an infrared image of the same scene under fog conditions; performing feature extraction on the fogged visible light image to obtain visible light basic features, and performing structural feature extraction on the infrared image to obtain infrared structural features; performing frequency correction on the visible light basic features based on the infrared structural features to obtain correction features; predicting a transmittance map and atmospheric light parameters based on the correction features; performing physical model inversion based on the fogged visible light image, the transmittance map, and the atmospheric light parameters to obtain a first-stage defogging result; and determining a dense fog region based on the transmittance map, constructing a physical mask, and performing, in the dense fog region, infrared compensation fusion based on the infrared image and the first-stage defogging result to obtain a final defogging result.
2. The method according to claim 1, characterized in that, Feature extraction is performed on the fogged visible light image to obtain basic visible light features, and structural feature extraction is performed on the infrared image to obtain infrared structural features. The specific process includes: The fogged visible light image is encoded using shallow convolutional methods to obtain the basic features of the visible light. The structural features of the infrared image are extracted using a multi-branch parallel convolutional structure. The multi-branch parallel convolutional structure extracts horizontal edge information, vertical edge information, and local neighborhood texture information respectively, and the output features of each branch are fused to obtain the infrared structural features.
3. The method according to claim 1, characterized in that performing frequency correction on the visible light basic features based on the infrared structural features to obtain the correction features specifically comprises: smoothly decomposing the visible light basic features and the infrared structural features, respectively, to obtain their respective background components and high-frequency detail components; generating, guided by the high-frequency detail components of the infrared structural features, scaling parameters and bias parameters for modulating the high-frequency detail components of the visible light basic features; modulating the high-frequency detail components of the visible light basic features based on the scaling parameters and the bias parameters to obtain a candidate detail enhancement result; concatenating the high-frequency detail components of the visible light basic features with the high-frequency detail components of the infrared structural features, and predicting an adaptive gating weight map; fusing, position by position and based on the gating weight map, the candidate detail enhancement result and the high-frequency detail components of the visible light basic features to obtain output high-frequency features; and recombining the output high-frequency features with the linearly mapped background component of the visible light basic features to obtain the correction features.
4. The method according to claim 1, characterized in that predicting the transmittance map and the atmospheric light parameters based on the correction features specifically comprises: inputting the correction features into a shared encoding network to obtain shared representation features; predicting the transmittance map and an atmospheric light parameter map based on the shared representation features; and spatially averaging the atmospheric light parameter map to obtain global atmospheric light parameters.
5. The method according to claim 4, characterized in that, Based on the fogged visible light image, the transmittance map, and the atmospheric light parameters, a physical model inversion is performed to obtain the first-stage defogging result. The specific process includes: Calculate the difference between the fogged visible light image and the global atmospheric light parameters; The larger value between the transmittance map and a preset lower limit constant is selected; Divide the difference by the larger value to obtain the intermediate ratio; The intermediate ratio is added to the global atmospheric light parameters to obtain the first-stage defogging result.
6. The method according to claim 1, characterized in that determining the dense fog region based on the transmittance map and constructing the physical mask specifically comprises: concatenating the first-stage defogging result, the infrared image, and the transmittance map and inputting them into a second-stage compensation fusion network to obtain a network prediction mask; constructing a physically forced mask based on the transmittance map; and taking, at each position, the maximum of the network prediction mask and the physically forced mask as the final mask.
7. The method according to claim 6, characterized in that performing, in the dense fog region, infrared compensation fusion based on the infrared image and the first-stage defogging result to obtain the final defogging result specifically comprises: converting the first-stage defogging result from the RGB color space to the YUV color space to obtain its luminance component and chrominance components; calculating the difference between 1 and the final mask as the weight of the luminance component of the first-stage defogging result; calculating the product of the infrared image and the final mask as the infrared contribution component; adding the product of the luminance component and its weight to the infrared contribution component to obtain the compensated luminance component; recombining the compensated luminance component with the chrominance components of the first-stage defogging result and converting the result back to the RGB color space to obtain the second-stage defogging result; and truncating the second-stage defogging result to obtain the final defogging result.
8. The method according to claim 7, characterized in that, It also includes the process of training the defogging network, specifically including: Using a clear visible light image as the supervision target, the first-stage loss is calculated using the first-stage dehazing result, transmittance map, and atmospheric light parameters output by the dehazing network. The first-stage loss includes transmittance supervision loss, atmospheric light supervision loss, total variational constraint of transmittance map, and dark channel constraint. The second-stage dehazing result output by the dehazing network is used to calculate the loss of the second stage, which includes background brightness preservation constraint, infrared brightness alignment constraint in dense fog region and gradient recovery constraint. The Adam optimizer is used to perform end-to-end joint optimization of the dehazing network with preset learning rate, batch size, training block size and training period.
9. A dehazing system for UAV aerial images combining physical joint inversion and infrared two-stage compensation, characterized in that it is used for implementing the method according to any one of claims 1-8 and comprises: a data input module, used to acquire a fogged visible light image and an infrared image of the same scene under fog conditions; a dual-modal feature extraction module, used to perform feature extraction on the fogged visible light image to obtain visible light basic features, and to perform structural feature extraction on the infrared image to obtain infrared structural features; an infrared-guided frequency correction module, used to perform frequency correction on the visible light basic features based on the infrared structural features to obtain correction features; a shared encoding and physical parameter prediction module, used to predict a transmittance map and atmospheric light parameters based on the correction features; a physical model inversion module, used to perform physical model inversion based on the fogged visible light image, the transmittance map, and the atmospheric light parameters to obtain a first-stage defogging result; a dense fog region determination and physical mask construction module, used to determine a dense fog region based on the transmittance map and construct a physical mask; and an infrared compensation fusion module, used to perform, in the dense fog region, infrared compensation fusion based on the infrared image and the first-stage defogging result to obtain a final defogging result.
10. A computer device comprising a memory, a processor, and a computer program stored in the memory, characterized in that, The processor executes the computer program to implement the steps of the method according to any one of claims 1-8.