An unmanned aerial vehicle image enhancement method based on a selective state space model
By employing a multimodal image enhancement method based on a selective state-space model, combined with three-way collaborative state coding of visible light and infrared images and flight degradation perception correction, the problem of poor image quality of UAV images in complex environments is solved, achieving clearer target recognition and scene perception.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- UNIV OF JINAN
- Filing Date
- 2026-05-27
- Publication Date
- 2026-06-23
Figure CN122265067A_ABST
Description
Technical Field
[0001] This invention belongs to the field of image enhancement, and specifically relates to a method for UAV image enhancement based on a selective state-space model.
Background Technology
[0002] The rapid development of applications such as UAV remote sensing, low-altitude inspection, and target perception in complex environments has placed higher demands on the target identifiability and environmental adaptability of UAV images. High-quality UAV images not only help in the accurate identification of ground targets and abnormal heat sources, but also provide key support for subsequent target detection and intelligent decision-making. Traditional UAV image enhancement methods usually use a single visible light image as the processing object, and their enhancement effect is easily affected by factors such as aircraft shaking, changes in lighting, and smoke obstruction. In actual flight environments, environmental factors are more complex, so the acquired images often have problems such as motion blur, insufficient brightness, weakened edge texture, and insufficient target saliency, which seriously affect the usability and stability of UAV images in complex scenes, especially at night, in foggy weather, in smoke and dust obstruction, or in low-contrast backgrounds, thus limiting the application effect of UAV vision systems.
[0003] In recent years, multimodal image enhancement and state-space models have developed rapidly, providing new approaches to UAV image enhancement in complex environments. Existing UAV image enhancement methods typically take a single visible light image as input and improve image quality through brightness restoration, denoising, and detail reconstruction. However, under low-light, smoke-obscured, and long-distance aerial photography conditions, single-modal enhancement methods struggle to stably recover target contours and scene details from severely degraded visible light images. Furthermore, existing multimodal image enhancement methods cannot fully integrate the texture details of visible light images with the thermal radiation information of infrared images. In addition, existing methods lack explicit modeling of UAV flight imaging degradation factors, which makes it difficult to adapt the enhancement intensity to motion blur, low light, and sensor noise. To address these issues, this invention proposes a multimodal UAV image enhancement method based on a selective state-space model. By performing three-way collaborative state encoding of multimodal images, flight degradation perception state correction, and main-auxiliary feature aggregation decoding, the method improves the structural clarity, target saliency, and enhancement stability of UAV images in complex flight environments.
Summary of the Invention
[0004] This invention provides a UAV image enhancement method based on a selective state-space model. The method uses visible light images as the primary mode to be enhanced and infrared images as the secondary mode. Through three-way cooperative state coding, flight degradation perception state correction, and primary and secondary feature aggregation decoding, it achieves adaptive enhancement of texture details, edge structures, and target responses in UAV images, including the following steps.
[0005] S1. Visible light and infrared images are simultaneously acquired by the visible light sensor and infrared sensor carried by the UAV for the same ground scene. The visible light image is set as the main mode to be enhanced and the infrared image is the auxiliary mode. Synchronous registration, scale unification and intensity normalization are performed to obtain a dual-mode UAV image pair.
[0006] S2. Construct a bimodal shared state enhancement unit (BSSEU), comprising Mamba selective module construction, adaptive state readout, shared state update, and shared state injection.
[0007] S3. Construct a three-way collaborative state encoder (TCSE).
[0008] S4. Construct the Flight Degradation Sensing State Correction Module (FDSC), which includes flight degradation factor calculation, modal confidence estimation, degradation condition-selective state correction, and hierarchical feature compensation.
[0009] S5. Construct the main-auxiliary feature aggregation decoding module (PAAD), which comprises main-auxiliary feature aggregation and decoding reconstruction.
[0010] S6. Construct a multi-objective joint optimization function, including a main modality structure fidelity term, an auxiliary edge enhancement term, a degradation perception quality constraint term, and a main modality content preservation term.
[0011] Preferably, in step S2, the bimodal shared state enhancement unit (BSSEU) is constructed, characterized in that:
S21. The BSSEU comprises Mamba selective module construction, adaptive state readout, shared state update, and shared state injection. The Mamba selective module performs preliminary feature extraction; its inputs are the visible light features, the infrared features, and the shared features. First, the visible light features and the infrared features are each processed by layer normalization and depthwise convolution and unfolded into one-dimensional sequences along each direction in a preset set of scan directions, giving one scan sequence per modality and direction [formula omitted in source]. Next, Mamba's selective scan operator performs selective state recursion on each one-dimensional scan sequence, giving a scan output and a state vector for each modality and direction [formula omitted in source]. The scan outputs of all directions are then rearranged back into two dimensions and aggregated, using output mapping parameters, into the modal enhancement features of the visible light and infrared branches [formula omitted in source]. Finally, the state vectors of all directions are aggregated into one aggregate state vector per modality [formula omitted in source].
S22. Adaptive state readout performs a weighted aggregation of each aggregate state vector along the state dimension. Using learnable readout weights, the components of the visible light and infrared aggregate state vectors in each state dimension are combined, over the full length of the state dimension, into a visible light state description and an infrared state description [formula omitted in source].
S23. Shared state update updates the shared features from the visible light and infrared state descriptions. First, the two state descriptions are concatenated along the channel dimension and mapped by a multilayer perceptron to give the shared state description [formula omitted in source]. Next, an adaptive update step size is computed by applying the Softplus activation to a linear projection with learnable weights; the step size controls how strongly the current bimodal state description is written into the shared features [formula omitted in source]. Finally, the shared features are updated [formula omitted in source]. The update involves a shared state decay parameter, which controls how much of the historical shared features is retained, a learnable write direction, element-wise multiplication, an expansion operation along the state dimension, and the exponential function.
S24. Shared state injection modulates the dual-modal features in the reverse direction using the shared features. Because the injection process is symmetric for the visible light and infrared features, only the infrared case is described in detail. First, a modulation coefficient and a compensation bias are generated by a projection mapping [formula omitted in source]. The infrared features are then updated using the modulation coefficient, the compensation bias, a scaling factor, and the hyperbolic tangent activation function [formula omitted in source]. The visible light features undergo the same injection process, giving the supplemented and updated visible light features.
[0012] Preferably, in step S2, a dual-modal shared state enhancement unit is proposed to establish the state interaction relationship between the visible light main mode and the infrared auxiliary mode during the encoding stage. Visible light features, infrared features, and state vectors are extracted by the Mamba selective state building module, and the dual-modal state description is obtained by using an adaptive state readout mechanism. Based on this, shared state features are formed through a shared state update mechanism, and the shared state features are injected into the visible light main branch and the infrared auxiliary branch respectively to achieve modulation compensation and complementary enhancement of dual-modal features. This unit can introduce information from the infrared auxiliary mode while maintaining the structural texture information of the visible light main mode, which can effectively alleviate the problems of insufficient target response and unstable structural expression in visible light images in low-light, smoke-covered, or weak-texture scenes, and provide a stable encoding foundation for subsequent three-way collaborative state encoding.
[0013] Preferably, in step S3, the three-way collaborative state encoder (TCSE) is constructed, characterized in that: the input to the TCSE is the dual-modal UAV image pair, consisting of the visible light main-modality image and the infrared auxiliary-modality image. The TCSE is a multi-level encoding structure with a preset number of levels. Each level consists of several BSSEU modules connected in series, and each level has its own serial depth [symbols omitted in source]. First, the input features of each level are defined, namely its visible light features, infrared features, and shared features at a BSSEU serial depth of 0. They are produced by initial feature mappings and by downsampling the visible light and infrared features that the preceding level outputs at its full serial depth [formula omitted in source]. Second, within each level, the BSSEU is applied recursively to the visible light, infrared, and shared features, once for each recursion index up to the serial depth of that level [formula omitted in source]. Third, several representative BSSEU output features are selected in each level to form the hierarchical feature set. The set of BSSEU indices retained in each level is defined using a round-down operation and a segmentation threshold parameter [formula omitted in source], and each retained output is processed by a convolution mapping, batch normalization, and the LeakyReLU activation function to give a hierarchical feature [formula omitted in source]. Finally, after all levels have been encoded, the encoder outputs the visible light end features, the infrared end features, and the hierarchical feature set.
[0014] Preferably, in step S3, a three-way collaborative state encoder is proposed, which adopts a three-way structure of visible light main branch, infrared auxiliary branch and shared state branch. Multiple bimodal shared state enhancement units are connected in series in each level, so that visible light texture structure information and infrared target saliency information continuously interact under the guidance of shared state. After each level is completed, the visible light main branch and infrared auxiliary branch are downsampled to obtain feature representations at different scales step by step, and representative intermediate features are selected from each level to form a hierarchical feature set. This encoder can preserve edge, texture and detail information in shallow layers and enhance target response and global context expression in deep layers, so that the subsequent image reconstruction process can obtain clearer edge structure, more stable target saliency and more complete scene hierarchy, thereby enhancing the clarity, recognizability and adaptability of subsequent UAV images in complex environments.
[0015] Preferably, in step S4, the flight degradation sensing state correction module (FDSC) is constructed, characterized in that:
S41. The FDSC comprises flight degradation factor calculation, modal confidence estimation, degradation-condition-selective state correction, and hierarchical feature compensation. Flight degradation factor calculation builds a flight degradation factor vector that characterizes the degradation and noise arising during UAV flight imaging. First, the brightness map of the visible light main-modality image is obtained and normalized to a fixed range, and the intensity map of the infrared auxiliary-modality image is obtained and normalized to a fixed range [range omitted in source]. Second, the edge sharpness response of the visible light main-modality image is computed using a variance calculation and the Laplacian operator, which extracts the second-order edge response of the image [formula omitted in source]. Third, the blur degradation factor of the visible light main-modality image is computed; it characterizes the degree of image blur caused by UAV flight jitter or attitude changes, and its formula uses a clipping operation that restricts the result to a fixed range, a reference edge response scale for the sharp state, and a stability constant [formula omitted in source]. Fourth, the low-light degradation factor is computed; it characterizes the insufficient brightness of the visible light main modality in low-light scenes, and its formula uses a preset reference brightness threshold, a mean calculation, and a stability constant [formula omitted in source]. Fifth, the low-contrast degradation factor is computed; it characterizes the compression of the grayscale distribution caused by smoke and similar conditions, and its formula uses a standard deviation calculation, a reference contrast scale for visible light images, and a stability constant [formula omitted in source]. Sixth, flat-region weight maps are constructed for the visible light and infrared modalities, using a gradient operator, a magnitude calculation, an edge response reference threshold for each modality, and a stability constant [formula omitted in source]. The noise degradation factor of the visible light main modality and the noise degradation factor of the infrared auxiliary modality are then computed, so that the different effects of low-light visible noise and infrared thermal noise on the enhancement process can be distinguished; their formulas use the median absolute deviation, a Gaussian filtering operation, a reference noise intensity scale for each modality, and a stability constant [formula omitted in source]. Finally, all degradation factors are combined into the flight degradation characterization vector [formula omitted in source].
S42. Modal confidence estimation determines the reliability of the visible light main modality and the infrared auxiliary modality under the current degradation conditions. First, the thermal saliency response of the infrared auxiliary modality is computed, using a reference scale for a salient infrared thermal response and a stability constant [formula omitted in source]. Next, the reliability of the visible light main modality and the reliability of the infrared auxiliary modality are computed, using six non-negative weighting parameters [formula omitted in source]. Finally, the two reliabilities are normalized, with a stability constant, to give the visible light modal confidence and the infrared modal confidence [formula omitted in source].
S43. Degradation-condition-selective state correction corrects the latent features at the ends of the two modal branches. First, a degradation correction state is constructed from the modal confidences [formula omitted in source]. Next, gating intensities for the visible light and infrared degradation corrections are generated with the Sigmoid activation function, using gate mapping weights and gate mapping biases for each modality [formula omitted in source]. Finally, selective interpolation between the end features of each modality and the degradation correction state gives the corrected visible light end features and the corrected infrared end features [formula omitted in source].
S44. Hierarchical feature compensation propagates the degradation correction state to the features at different levels. First, for any feature in the hierarchical feature set, a compensation weight is computed using learnable compensation weight mapping parameters [formula omitted in source]. Next, the hierarchical feature is compensated using a scale-matching mapping between the degradation correction state and that feature, giving the compensated hierarchical feature [formula omitted in source]. Finally, after all hierarchical features have been compensated, the compensated hierarchical feature set is obtained.
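The blur, low-light, and low-contrast factors in S41 can be illustrated with a minimal NumPy sketch. The patent's formulas are not reproduced in this text, so the ratio-and-clip forms below, the 4-neighbour Laplacian, and the reference values `sharp_ref`, `bright_ref`, and `contrast_ref` are all assumptions chosen only to match the described ingredients (variance of the Laplacian response, mean brightness against a threshold, standard deviation against a contrast scale, a stability constant, and clipping). The noise factors are left out.

```python
import numpy as np

def laplacian(img):
    """4-neighbour Laplacian on the image interior (no padding)."""
    return (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
            - 4.0 * img[1:-1, 1:-1])

def degradation_factors(luma, sharp_ref=0.01, bright_ref=0.4,
                        contrast_ref=0.2, eps=1e-6):
    """Return (blur, low_light, low_contrast), each clipped to [0, 1].

    luma is a 2-D brightness map already normalized to [0, 1].
    All reference scales are hypothetical placeholder values.
    """
    sharpness = laplacian(luma).var()          # edge sharpness response
    blur = np.clip(1.0 - sharpness / (sharp_ref + eps), 0.0, 1.0)
    low_light = np.clip((bright_ref - luma.mean()) / (bright_ref + eps), 0.0, 1.0)
    low_contrast = np.clip(1.0 - luma.std() / (contrast_ref + eps), 0.0, 1.0)
    return float(blur), float(low_light), float(low_contrast)
```

Under these assumptions a flat, dark frame scores high on all three factors, while a sharp, bright, high-contrast frame scores zero.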
[0016] Preferably, in step S4, a flight degradation perception state correction module is proposed to adaptively correct the dual-modal features based on the blurring, low illumination, low contrast, and noise degradation during the UAV flight imaging process. This module first calculates the flight degradation representation vector and estimates the confidence of the visible light primary mode and infrared auxiliary mode based on the degree of degradation. Subsequently, a degradation correction state is constructed based on the modal confidence, and selective state correction is performed on the visible light end features and infrared end features. At the same time, the degradation correction state is compensated to different level features. This module can enhance reliable features, suppress degradation response, and supplement hierarchical detail information when the image is affected by flight jitter, low light, smoke, or noise, so that the UAV image reconstructed subsequently has clearer edges, more stable target salience, and fewer noise artifacts.
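The confidence normalization and gated correction described above can be sketched as follows. This is an illustration under assumptions, not the patent's formulas, which are not shown in the text: the correction state is assumed to be a confidence-weighted sum of the two end features, and the gate is assumed to be a scalar logit standing in for the learned gate mapping.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modal_confidence(rel_vis, rel_ir, eps=1e-6):
    """Normalize two non-negative reliabilities; eps is the stability constant."""
    total = rel_vis + rel_ir + eps
    return rel_vis / total, rel_ir / total

def correction_state(f_vis, f_ir, c_vis, c_ir):
    """Assumed form: confidence-weighted combination of the end features."""
    return c_vis * f_vis + c_ir * f_ir

def gated_correction(end_feature, state, gate_logit):
    """Interpolate between an end feature and the correction state.

    gate_logit is a hypothetical stand-in for the gate mapping output.
    """
    g = sigmoid(gate_logit)
    return (1.0 - g) * end_feature + g * state
```

With a gate near zero the end feature passes through almost unchanged; with a gate near one it is replaced by the correction state, which is what makes the correction selective.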
[0017] Preferably, in step S5, the main-auxiliary feature aggregation decoding module (PAAD) is constructed, characterized in that it aggregates features from the visible light main modality and the infrared auxiliary modality and decodes them to generate the enhanced UAV image. First, the compensated hierarchical features at different scales are unified to the end-feature scale and aggregated into the hierarchical enhancement auxiliary features: every compensated hierarchical feature in the set is scale-aligned, and the aligned features are concatenated along the channel dimension [formula omitted in source]. Next, the visible light end features, the infrared end features, and the hierarchical enhancement auxiliary features are aggregated into the enhancement features used for image reconstruction [formula omitted in source]. Finally, the enhanced image is obtained by decoding with a U-Net-style decoder that upsamples step by step [formula omitted in source].
[0018] Preferably, in step S5, a master-auxiliary feature aggregation decoding module is proposed to aggregate and decode the corrected visible light master mode features, infrared auxiliary mode features, and compensated hierarchical features. This module first performs scale alignment and convolution aggregation on the compensated hierarchical features at different scales to form hierarchical enhancement auxiliary features containing edge texture, target response, and contextual information. Subsequently, taking the corrected visible light master mode end features as the main body, the corrected infrared end features are introduced according to the confidence of the infrared auxiliary mode, and enhanced features are generated by combining the hierarchical enhancement auxiliary features. Finally, the image spatial resolution is restored step by step through a U-Net-style decoder to output the enhanced UAV image. This module fully supplements the infrared target salient information and multi-level detail information while maintaining the visible light scene structure and texture details, so that the reconstructed image has a clearer target outline, a more complete texture hierarchy, and a more natural visual effect.
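A minimal sketch of the aggregation step, assuming nearest-neighbour scale alignment, a scalar infrared confidence `c_ir`, and plain channel concatenation in place of the convolution aggregation mentioned in the text. The function names are hypothetical and the decoder is not modeled.

```python
import numpy as np

def align_to(feature, size):
    """Nearest-neighbour resize of a (C, H, W) feature to (C, size[0], size[1])."""
    _, h, w = feature.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return feature[:, rows][:, :, cols]

def aggregate(f_vis, f_ir, c_ir, hierarchy):
    """Combine end features with scale-aligned hierarchical features.

    The visible end feature is the main body; the infrared end feature
    is added in proportion to its confidence (an assumed form).
    """
    size = f_vis.shape[1:]
    aux = np.concatenate([align_to(f, size) for f in hierarchy], axis=0)
    main = f_vis + c_ir * f_ir
    return np.concatenate([main, aux], axis=0)
```

The output would then be passed to the U-Net-style decoder for step-by-step upsampling.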
[0019] Preferably, in step S6, the multi-objective joint optimization function is constructed, characterized in that it is used for end-to-end joint training of the TCSE, FDSC, and PAAD. First, the main-modality structure fidelity term is defined using a structural similarity measure applied to brightness maps [formula omitted in source]. Second, the auxiliary edge enhancement term is defined using a norm, a per-pixel maximum, and an infrared edge auxiliary coefficient [formula omitted in source]. Third, the degradation perception quality constraint term is computed using a non-negative truncation function [formula omitted in source]. Fourth, the main-modality content preservation term is constructed as a total variation regularization term, and its formula involves a low-pass filtering operation [formula omitted in source]. Finally, the loss terms are weighted and summed to give the multi-objective joint optimization function, with non-negative weight coefficients for the main-modality structure fidelity term, the auxiliary edge enhancement term, the degradation perception quality constraint term, and the main-modality content preservation term, respectively [formula omitted in source].
[0020] Preferably, in step S6, a multi-objective joint optimization function is proposed for end-to-end joint training and optimization of TCSE, FDSC, and PAAD. This function includes a main modality structure fidelity term, an auxiliary edge enhancement term, a degradation perception quality constraint term, and a main modality content preservation term. The main modality structure fidelity term is used to maintain the consistency between the enhanced image and the visible light main modality in terms of scene structure. The auxiliary edge enhancement term is used to supplement the target edge response based on the confidence level of the infrared auxiliary modality. The degradation perception quality constraint term is used to adaptively adjust the enhancement intensity based on the degree of blurring, low illumination, low contrast, and noise degradation. The main modality content preservation term is used to constrain the enhanced image from deviating from the main content of the visible light main modality. Through the above joint constraints, the enhanced UAV image achieves a balance between structure preservation, detail enhancement, target saliency, and visual naturalness.
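The weighted combination of the four terms, together with one possible reading of the auxiliary edge enhancement term, can be sketched as below. The patent's formulas are not shown in the text, so the forward-difference gradient, the L1 distance, and the names `edge_term`, `joint_loss`, and `ir_coeff` are assumptions; the other three terms are supplied as precomputed numbers.

```python
import numpy as np

def grad_mag(img):
    """Forward-difference gradient magnitude, cropped to (H-1, W-1)."""
    gx = img[:-1, 1:] - img[:-1, :-1]
    gy = img[1:, :-1] - img[:-1, :-1]
    return np.hypot(gx, gy)

def edge_term(enhanced, visible, infrared, ir_coeff):
    """Mean L1 gap between enhanced edges and an edge target.

    The target is the per-pixel maximum of the visible edges and the
    infrared edges scaled by the infrared edge auxiliary coefficient.
    """
    target = np.maximum(grad_mag(visible), ir_coeff * grad_mag(infrared))
    return float(np.abs(grad_mag(enhanced) - target).mean())

def joint_loss(terms, weights):
    """Weighted sum of named loss terms; weights must be non-negative."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    return sum(weights[name] * terms[name] for name in terms)
```

For example, `joint_loss({"structure": 1.0, "edge": 2.0, "quality": 0.0, "content": 4.0}, {"structure": 1.0, "edge": 0.5, "quality": 1.0, "content": 0.25})` adds 1.0, 1.0, 0.0, and 1.0.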
[0021] Compared with existing technologies, the beneficial effects of this invention are as follows. It combines the strength of selective state-space models in long-range context modeling with the ability of the infrared auxiliary modality to represent target thermal response. To address low illumination, low contrast, motion blur, noise interference, and insufficient target saliency in UAV visible light images captured in complex flight environments, it achieves state-level interaction of multimodal information through bimodal shared state enhancement and the three-way collaborative state encoding mechanism. It also introduces a flight degradation perception state correction module that performs confidence estimation, selective correction, and hierarchical compensation of the dual-modal latent features under different degradation conditions. Combined with main-auxiliary feature aggregation decoding and multi-objective joint optimization, it achieves stable enhancement and clear reconstruction of UAV images. The method offers strong long-range dependency modeling, good adaptability to degradation, and natural, stable enhancement results, improving the usability and accuracy of UAV images for target recognition, scene perception, and subsequent visual tasks in complex flight scenarios.
Description of the Drawings
[0022] Figure 1 is a flowchart of the UAV image enhancement method based on a selective state-space model provided by the present invention.
[0023] Figure 2 is a structural diagram of the bimodal shared state enhancement unit (BSSEU) provided by the present invention.
[0024] Figure 3 is a structural diagram of the three-way collaborative state encoder (TCSE) provided by the present invention.
[0025] Figure 4 is a structural diagram of the flight degradation sensing state correction module (FDSC) provided by the present invention.
[0026] Figure 5 is a structural diagram of the main-auxiliary feature aggregation decoding module (PAAD) provided by the present invention.
[0027] Figure 6 is a structural diagram of the multi-objective joint optimization function provided by the present invention.
Detailed Implementation
[0028] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0029] Referring to Figures 1 to 6, this invention provides a UAV image enhancement method based on a selective state-space model. It introduces multimodal information and achieves UAV image enhancement in complex environments through bimodal shared state enhancement, three-way collaborative state encoding, flight degradation perception correction, and main-auxiliary feature aggregation decoding.
[0030] S1. Visible light and infrared images are simultaneously acquired by the visible light sensor and infrared sensor carried by the UAV for the same ground scene. The visible light image is set as the main mode to be enhanced and the infrared image is the auxiliary mode. Synchronous registration, scale unification and intensity normalization are performed to obtain a dual-mode UAV image pair.
[0031] Furthermore, in S1, the visible light sensor and the infrared sensor are jointly mounted on the UAV platform, with their fields of view overlapping. During flight data acquisition, visible light and infrared images of the same ground scene are simultaneously acquired using an onboard timestamp alignment method. Before acquisition, the visible light and infrared sensors are calibrated using a preset calibration scene, and the extrinsic parameters are calibrated by combining the dual-sensor mounting pose relationship to obtain the spatial mapping relationship between the visible light and infrared sensors. In the data preprocessing stage, distortion correction and spatial registration are first performed on the visible light and infrared images to achieve pixel correspondence between the two modal images in the same scene area. Then, the two modal images are unified to the same spatial scale through interpolation resampling, and the brightness information of the visible light image and the intensity information of the infrared image are normalized to finally obtain a dual-modal UAV image pair for subsequent image enhancement.
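The scale unification and intensity normalization at the end of S1 can be sketched as follows. This is a minimal illustration that assumes the two images are already undistorted and spatially registered; the nearest-neighbour resampling, min-max normalization, and function names are assumptions, since the text specifies only "interpolation resampling" and "normalized".

```python
import numpy as np

def normalize_intensity(img):
    """Min-max normalize an image to [0, 1]; a constant image maps to zeros."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

def resample_nearest(img, out_shape):
    """Nearest-neighbour resampling of a 2-D image to out_shape = (H, W)."""
    img = np.asarray(img)
    h, w = img.shape
    rows = np.arange(out_shape[0]) * h // out_shape[0]
    cols = np.arange(out_shape[1]) * w // out_shape[1]
    return img[np.ix_(rows, cols)]

def make_dual_modal_pair(visible_luma, infrared, out_shape):
    """Bring both modalities to one spatial scale and one intensity range."""
    vis = normalize_intensity(resample_nearest(visible_luma, out_shape))
    ir = normalize_intensity(resample_nearest(infrared, out_shape))
    return vis, ir
```

A production pipeline would replace the nearest-neighbour step with a higher-order interpolation and apply the calibrated extrinsic mapping before resampling.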
[0032] S2. Construct a bimodal shared state enhancement unit (BSSEU), comprising Mamba selective module construction, adaptive state readout, shared state update, and shared state injection.
[0033] Furthermore, in S2, the bimodal shared state enhancement unit (BSSEU) is constructed through the following specific process.
[0034] S21. The BSSEU comprises Mamba selective module construction, adaptive state readout, shared state update, and shared state injection. The Mamba selective module performs preliminary feature extraction; its inputs are the visible light features, the infrared features, and the shared features. First, the visible light features and the infrared features are each processed by layer normalization and depthwise convolution and unfolded into one-dimensional sequences along each direction in a preset set of scan directions, giving one scan sequence per modality and direction [formula omitted in source]. In this embodiment, the set comprises four directions: horizontal forward, horizontal reverse, vertical forward, and vertical reverse. Next, Mamba's selective scan operator performs selective state recursion on each one-dimensional scan sequence, giving a scan output and a state vector for each modality and direction [formula omitted in source]. The scan outputs of all directions are then rearranged back into two dimensions and aggregated, using output mapping parameters, into the modal enhancement features of the visible light and infrared branches [formula omitted in source]. Finally, the state vectors of all directions are aggregated into one aggregate state vector per modality [formula omitted in source].
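The four scan directions named in this embodiment, and the two-dimensional rearrangement that undoes each of them, can be sketched with NumPy. The direction labels and function names are hypothetical; the selective scan itself is not modeled here.

```python
import numpy as np

DIRECTIONS = ("h_fwd", "h_rev", "v_fwd", "v_rev")

def unfold(feature, direction):
    """Flatten a (C, H, W) feature map into a (C, H*W) scan sequence."""
    c = feature.shape[0]
    if direction.startswith("v"):
        feature = feature.transpose(0, 2, 1)   # traverse column by column
    seq = feature.reshape(c, -1)
    return seq[:, ::-1] if direction.endswith("rev") else seq

def fold(seq, direction, shape):
    """Inverse of unfold: rearrange a (C, H*W) sequence back to (C, H, W)."""
    c, h, w = shape
    if direction.endswith("rev"):
        seq = seq[:, ::-1]
    if direction.startswith("v"):
        return seq.reshape(c, w, h).transpose(0, 2, 1)
    return seq.reshape(c, h, w)
```

Because `fold` exactly inverts `unfold`, the per-direction scan outputs land back on the same pixel grid and can be summed or otherwise aggregated position by position.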
[0035] S22. Adaptive state readout performs a weighted aggregation of each aggregate state vector along the state dimension. Using learnable readout weights, the components of the visible light and infrared aggregate state vectors in each state dimension are combined, over the full length of the state dimension, into a visible light state description and an infrared state description [formula omitted in source]. In this embodiment, the length of the state dimension is fixed to a specific value [value omitted in source].
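A minimal sketch of a weighted readout along the state dimension. The softmax normalization of the readout weights is an assumption; the text says only that the weights are learnable.

```python
import numpy as np

def adaptive_state_readout(state, readout_logits):
    """Collapse a (C, N) aggregate state vector to a (C,) state description.

    readout_logits is a hypothetical (N,) learnable parameter; softmax
    turns it into weights that sum to one over the state dimension.
    """
    logits = readout_logits - readout_logits.max()   # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()
    return state @ weights
```

With all logits equal the readout reduces to a plain mean over the state dimension, which is a convenient starting point before training.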
[0036] S23. Shared state update updates the shared features from the visible light and infrared state descriptions. First, the two state descriptions are concatenated along the channel dimension and mapped by a multilayer perceptron to give the shared state description [formula omitted in source]. Next, an adaptive update step size is computed by applying the Softplus activation to a linear projection with learnable weights; the step size controls how strongly the current bimodal state description is written into the shared features [formula omitted in source]. Finally, the shared features are updated [formula omitted in source]. The update involves a shared state decay parameter, which controls how much of the historical shared features is retained, a learnable write direction, element-wise multiplication, an expansion operation along the state dimension, and the exponential function.
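One way the listed ingredients could fit together is a decay-and-write update in the style of a discretized state-space step. The sketch below is an assumption, not the patent's formula, which is not shown in the text: the two-layer ReLU perceptron, the `exp(-step * decay)` retention factor, and every key in `params` are hypothetical.

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)

def update_shared_state(shared, z_vis, z_ir, params):
    """shared: (C, N) shared features; z_vis, z_ir: (C,) state descriptions."""
    z_cat = np.concatenate([z_vis, z_ir])                      # channel concat
    hidden = np.maximum(params["W1"] @ z_cat + params["b1"], 0.0)
    z_shared = params["W2"] @ hidden + params["b2"]            # (C,) description
    step = softplus(params["w_step"] @ z_shared)               # positive scalar
    retain = np.exp(-step * params["decay"])                   # (N,), in (0, 1]
    # Expand the description along the state dimension and write it in.
    write = step * params["write_dir"][None, :] * z_shared[:, None]
    return retain[None, :] * shared + write
```

A larger step size both forgets more of the old shared features and writes the new description more strongly, which matches the stated role of the step size.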
[0037] S24. Shared state injection modulates the dual-modal features in reverse using the shared features. Because the visible light and infrared features are treated symmetrically in this process, only the injection for the infrared features is described in detail. First, the modulation coefficients and the compensation bias are generated through a projection mapping. The infrared features are then updated using the modulation coefficients and the compensation bias; the update formula uses a scaling factor, whose value is fixed in this embodiment, and the hyperbolic tangent activation function. The visible light features undergo the same injection process to obtain the supplemented and updated visible light features.
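The injection in S24 can be sketched as a bounded multiplicative modulation plus an additive bias. The specific form `f * (1 + scale * tanh(gamma)) + beta` and the default `scale=0.1` are assumptions chosen for illustration; the source only states that a scaling factor and the hyperbolic tangent are used.

```python
import math


def inject(feature, gamma, beta, scale=0.1):
    """Modulate a modal feature vector with shared-state-derived coefficients.

    feature, gamma, beta: equal-length lists. gamma (modulation coefficients)
    and beta (compensation bias) are assumed to come from a projection of the
    shared features, which is not shown here. scale is a hypothetical value.
    """
    return [f * (1.0 + scale * math.tanh(g)) + b
            for f, g, b in zip(feature, gamma, beta)]
```

Because tanh is bounded, the multiplicative factor stays within `[1 - scale, 1 + scale]`, so the shared features can adjust a modality without overwhelming it.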
[0038] S3. Construct a three-way collaborative state encoder (TCSE).
[0039] Furthermore, in S3, the three-way collaborative state encoder (TCSE) is constructed as follows. The input of the TCSE is a dual-modal UAV image pair consisting of the visible light main-modality image and the infrared auxiliary-modality image. The TCSE is a multi-level encoding structure in which each level consists of several BSSEU modules connected in series; the number of levels and the serial depth of the BSSEU modules at each level take fixed values in this embodiment. First, the input features of each level are defined, namely the visible light features, infrared features, and shared features of that level at a BSSEU serial depth of 0. These definitions use initial feature mappings, a downsampling operation, and the visible light and infrared features at a given level and BSSEU serial depth. Second, at each level, the visible light, infrared, and shared features are passed recursively through the BSSEU modules of that level, indexed by a recursive index. Third, multiple representative BSSEU output features are selected from each level to form a hierarchical feature set. The set of BSSEU indexes retained at each level is determined by a formula that uses a round-down operation and a segmented threshold parameter, whose value is fixed in this embodiment. Each retained output is processed to yield a hierarchical feature; this processing uses the LeakyReLU activation function, batch normalization, and a convolution mapping. Finally, after encoding at all levels, the visible light end features, the infrared end features, and the hierarchical feature set are output.
[0040] S4. Construct the Flight Degradation Sensing State Correction Module (FDSC), which includes flight degradation factor calculation, modal confidence estimation, degradation condition-selective state correction, and hierarchical feature compensation.
[0041] Furthermore, in S4, the Flight Degradation Sensing State Correction Module (FDSC) is constructed, and the specific process includes...
[0042] S41. The FDSC comprises flight degradation factor calculation, modal confidence estimation, degradation condition-selective state correction, and hierarchical feature compensation. Flight degradation factor calculation constructs a flight degradation factor vector based on noise during UAV flight imaging. The specific process is as follows. First, the brightness map of the visible light main-modality image is obtained and normalized to a fixed range, and the intensity map of the infrared auxiliary-modality image is obtained and normalized to a fixed range. Second, the edge sharpness response of the visible light main-modality image is calculated; its formula uses a variance calculation and the Laplacian operator, which extracts the second-order edge response of the image and is implemented in this embodiment with a Laplacian convolution kernel. Third, the blur degradation factor of the visible light main-modality image is calculated; it characterizes the degree of image blurring caused by UAV flight jitter or attitude changes. Its formula uses an operation that restricts the result to a bounded range, a reference edge response scale for the sharp state, and a stability constant, the latter two taking fixed values in this embodiment. Fourth, the low-light degradation factor is calculated; it characterizes the insufficient brightness of the visible light main modality in low-light scenes. Its formula uses a preset reference brightness threshold, a mean calculation, and a stability constant, with fixed values in this embodiment. Fifth, the low-contrast degradation factor is calculated; it characterizes the compression of the grayscale distribution caused by smoke and similar conditions. Its formula uses a standard deviation calculation, a reference contrast scale for visible light images, and a stability constant, with fixed values in this embodiment. Sixth, flat-region weight maps are constructed for the visible light and infrared modalities respectively. Their formulas use a gradient operator, an amplitude calculation, edge response reference thresholds for the visible light and infrared modalities, and stability constants, with fixed values in this embodiment. Subsequently, the noise degradation factor of the visible light main modality and the noise degradation factor of the infrared auxiliary modality are calculated, which distinguishes the different effects of low-light visible light noise and infrared thermal noise on the enhancement process. Their formulas use the median absolute deviation, a Gaussian filtering operation, reference noise intensity scales for the visible light and infrared modalities, and a stability constant, with fixed values in this embodiment. Finally, all degradation factors are combined into a flight degradation characterization vector.
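The scalar degradation factors of S41 can be sketched with standard-library statistics. The concrete formulas below (one minus a ratio, clipped to [0, 1]) and all reference scales are assumptions for illustration; only the ingredients (Laplacian variance, mean, standard deviation, median absolute deviation, reference scales, stability constant) come from the text. The flat-region weight maps and Gaussian filtering are omitted.

```python
from statistics import mean, median, pstdev, pvariance


def clip01(x):
    """Restrict a value to the range [0, 1]."""
    return min(1.0, max(0.0, x))


def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian response over interior pixels."""
    resp = [img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]
            - 4 * img[r][c]
            for r in range(1, len(img) - 1)
            for c in range(1, len(img[0]) - 1)]
    return pvariance(resp)


def degradation_factors(img, ref_edge=0.01, ref_luma=0.4,
                        ref_contrast=0.2, eps=1e-6):
    """img: 2D list of brightness values in [0, 1].

    ref_edge, ref_luma, ref_contrast and eps are hypothetical values.
    """
    pixels = [p for row in img for p in row]
    blur = clip01(1.0 - laplacian_variance(img) / (ref_edge + eps))
    low_light = clip01((ref_luma - mean(pixels)) / (ref_luma + eps))
    low_contrast = clip01(1.0 - pstdev(pixels) / (ref_contrast + eps))
    return {"blur": blur, "low_light": low_light, "low_contrast": low_contrast}


def mad(values):
    """Median absolute deviation, a robust estimate of noise scale."""
    m = median(values)
    return median(abs(v - m) for v in values)
```

A uniformly dark, featureless patch drives the blur and low-contrast factors to 1, while a high-contrast checkerboard drives them to 0, which is the qualitative behaviour the text describes.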
[0043] S42. Modal confidence estimation determines the reliability of the visible light main modality and the infrared auxiliary modality under the current degradation conditions. The specific process is as follows. First, the thermal saliency response of the infrared auxiliary modality is calculated; its formula uses a reference scale for a significant infrared thermal response and a stability constant, with fixed values in this embodiment. Subsequently, the reliability of the visible light main modality and the reliability of the infrared auxiliary modality are calculated using non-negative weighting parameters, whose values are fixed in this embodiment. Finally, the reliabilities are normalized to obtain the visible light modal confidence and the infrared modal confidence; the normalization uses a stability constant, whose value is fixed in this embodiment.
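The final normalization step of S42 is commonly written as each reliability divided by the sum of both plus a stability constant. The sketch below assumes that form; the function name and the value of `eps` are hypothetical, and the reliability formulas themselves are not reproduced.

```python
def normalize_confidence(r_vis, r_ir, eps=1e-6):
    """Turn two non-negative reliabilities into confidences summing to about 1.

    eps is a stability constant that avoids division by zero when both
    reliabilities vanish; its value here is an assumption.
    """
    total = r_vis + r_ir + eps
    return r_vis / total, r_ir / total
```

For example, reliabilities of 3.0 and 1.0 give confidences close to 0.75 and 0.25.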
[0044] S43. Degradation condition-selective state correction corrects the state of the bimodal end latent features. The specific process is as follows. First, a degradation correction state is constructed based on the modal confidences. Subsequently, the gated strengths for visible light and infrared modal degradation correction are generated respectively; their formulas use the Sigmoid activation function together with gate mapping weights and gate mapping biases for the visible light and infrared modalities. Finally, selective interpolation is performed between the end features of each modality and the degradation correction state to obtain the corrected visible light end features and the corrected infrared end features.
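A gated interpolation of the kind S43 describes can be sketched as follows. The confidence-weighted sum used for the correction state, the gate computed from the flight degradation vector, and the direction of the interpolation are all assumptions; the names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def correction_state(f_vis, f_ir, c_vis, c_ir):
    """Confidence-weighted blend of the two end features (assumed form)."""
    return [c_vis * v + c_ir * i for v, i in zip(f_vis, f_ir)]


def gated_correction(feature, state, degradation, gate_w, gate_b):
    """Interpolate one modality's end feature toward the correction state.

    degradation: flight degradation vector
    gate_w, gate_b: gate mapping weights and bias (hypothetical values)
    """
    gate = sigmoid(sum(w * d for w, d in zip(gate_w, degradation)) + gate_b)
    return [(1.0 - gate) * f + gate * s for f, s in zip(feature, state)]
```

When the gate is near 0 the end feature passes through unchanged, and when it is near 1 the feature is replaced by the correction state; each modality would use its own gate weights and bias.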
[0045] S44. Hierarchical feature compensation propagates the degradation correction state to features at different levels. The specific process is as follows. First, for any feature in the hierarchical feature set, a compensation weight is calculated using learnable compensation weight mapping parameters. The hierarchical feature is then compensated; the compensation formula uses a scale matching mapping that brings the degradation correction state to the scale of the corresponding hierarchical feature. Finally, after all hierarchical features have been compensated, the compensated hierarchical feature set is obtained.
[0046] S5. Construct the main and auxiliary feature aggregation and decoding module PAAD, which includes main and auxiliary feature aggregation and decoding reconstruction.
[0047] Furthermore, in S5, the main and auxiliary feature aggregation and decoding module (PAAD) is constructed. PAAD aggregates features from the visible light main modality and the infrared auxiliary modality and decodes them to generate the enhanced UAV image. The specific process is as follows. First, the compensated hierarchical features at different scales are unified to the end feature scale and aggregated to obtain the hierarchical enhancement auxiliary features; the aggregation traverses all compensated hierarchical features in the set, applies scale alignment, and concatenates them along the channel dimension. Next, the visible light end features, the infrared end features, and the hierarchical enhancement auxiliary features are aggregated to obtain the enhancement features for image reconstruction. Finally, the enhanced image is obtained by decoding with a U-Net-style decoder that upsamples step by step.
[0048] S6. Construct a multi-objective joint optimization function, including a main modality structure fidelity term, an auxiliary edge enhancement term, a degradation perception quality constraint term, and a main modality content preservation term.
[0049] Furthermore, in S6, a multi-objective joint optimization function is constructed and used for end-to-end joint training of the TCSE, FDSC, and PAAD. The specific process is as follows. First, the main modality structure fidelity term is defined; its formula uses a structural similarity measurement and a brightness map. Second, the auxiliary edge enhancement term is defined; its formula uses a norm, a per-pixel maximum, and an infrared edge auxiliary coefficient, whose value is fixed in this embodiment. Third, the degradation perception quality constraint term is calculated; its formula uses a non-negative truncation function. Fourth, the main modality content preservation term is constructed as a total variation regularization term; its formula uses a low-pass filtering operation. Finally, the loss terms are weighted and summed to obtain the multi-objective joint optimization function, in which the main modality structure fidelity term, the auxiliary edge enhancement term, the degradation perception quality constraint term, and the main modality content preservation term each have a non-negative weight coefficient whose value is fixed in this embodiment.
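The weighted summation of the four loss terms in S6, together with the non-negative truncation function, can be sketched as below. The term names used as dictionary keys and the example weights are hypothetical; the individual loss formulas are not reproduced.

```python
def relu(x):
    """Non-negative truncation: returns max(x, 0)."""
    return max(x, 0.0)


def joint_loss(terms, weights):
    """Weighted sum of loss terms with non-negative weight coefficients.

    terms, weights: dicts keyed by term name; the keys are hypothetical.
    """
    if set(terms) != set(weights):
        raise ValueError("terms and weights must share the same keys")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weight coefficients must be non-negative")
    return sum(weights[name] * value for name, value in terms.items())


# Example with made-up values for the four terms and their weights.
terms = {"structure": 0.2, "edge": 0.1, "quality": relu(-0.3), "content": 0.05}
weights = {"structure": 1.0, "edge": 0.5, "quality": 0.2, "content": 0.1}
total = joint_loss(terms, weights)
```

Rejecting negative weights mirrors the requirement in the text that all four weight coefficients are non-negative.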
[0050] Furthermore, the UAV image enhancement method based on a selective state-space model is implemented in Python with the PyTorch framework, using PyCharm as the development environment. The model input is a dual-modal infrared and visible light image pair with a resolution of 512×512.
[0051] The above are merely preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the inventive concept of the present invention, and these modifications and improvements all fall within the protection scope of the present invention.
Claims
1. A UAV image enhancement method based on a selective state-space model, characterized in that it includes the following steps: S1. Visible light and infrared images of the same ground scene are acquired simultaneously by the visible light sensor and the infrared sensor carried by the UAV; the visible light image is set as the main modality to be enhanced and the infrared image as the auxiliary modality; synchronous registration, scale unification, and intensity normalization are performed to obtain a dual-modal UAV image pair; S2. Construct a bimodal shared state enhancement unit (BSSEU), including Mamba selective module construction, adaptive state readout, shared state update, and shared state injection; S3. Construct a three-way collaborative state encoder (TCSE); S4. Construct the Flight Degradation Sensing State Correction Module (FDSC), including flight degradation factor calculation, modal confidence estimation, degradation condition-selective state correction, and hierarchical feature compensation; S5. Construct the main and auxiliary feature aggregation and decoding module (PAAD), including main and auxiliary feature aggregation and decoding reconstruction; S6. Construct a multi-objective joint optimization function, including a main modality structure fidelity term, an auxiliary edge enhancement term, a degradation perception quality constraint term, and a main modality content preservation term.
2. The UAV image enhancement method based on a selective state-space model according to claim 1, wherein in step S2 a bimodal shared state enhancement unit (BSSEU) is constructed, characterized in that: S21. The BSSEU comprises Mamba selective module construction, adaptive state readout, shared state update, and shared state feature injection. The Mamba selective module performs preliminary feature extraction; its inputs are the visible light features, the infrared features, and the shared features. The specific process is as follows: first, the visible light features and the infrared features are each unfolded into one-dimensional sequences along multiple preset scanning directions, giving one scan sequence per modality for each scanning direction, wherein the formulas use layer normalization, depthwise convolution, the set of preset scanning directions, and a sequence expansion operation along the scanning direction; next, selective state recursion is performed on each one-dimensional scan sequence using Mamba's selective scan operators to obtain the scan output and the state vector of each modality for the corresponding direction; the scan outputs of all directions are then aggregated to obtain the modal enhancement features of the visible light and infrared modalities, wherein the aggregation uses a two-dimensional rearrangement operation for each direction and output mapping parameters; finally, the state vectors of all directions are aggregated separately for each modality to obtain the aggregate state vectors; S22. Adaptive state readout performs weighted aggregation of the state vectors along the state dimension. The specific process is as follows: from the aggregate state vectors of the visible light and infrared modalities, the visible light state description and the infrared state description are calculated separately, wherein the formulas use the length of the state dimension, learnable readout weights for the two modalities, and the components of each aggregate state vector in each state dimension; S23. Shared state update updates the shared features using the visible light and infrared state descriptions. The specific process is as follows: first, the visible light and infrared state descriptions are concatenated along the channel dimension and passed through a multilayer perceptron mapping, which summarizes the bimodal state descriptions into the shared state description; next, the adaptive update step size is calculated, which controls how strongly the current bimodal state description is written into the shared features, wherein the formula uses the Softplus activation function and a linear projection with learnable weights; finally, the shared features are updated, wherein the update formula uses a shared state decay parameter that controls the degree to which historical shared features are retained, a learnable write direction, element-wise multiplication, an extension operation along the state dimension, and the exponential function; S24. Shared state injection modulates the dual-modal features in reverse using the shared features. Because the visible light and infrared features are treated symmetrically in this process, only the injection for the infrared features is described in detail: first, the modulation coefficients and the compensation bias are generated through a projection mapping; the infrared features are then updated using the modulation coefficients and the compensation bias, wherein the update formula uses a scaling factor and the hyperbolic tangent activation function; the visible light features undergo the same injection process to obtain the supplemented and updated visible light features.
3. The UAV image enhancement method based on a selective state-space model according to claim 1, wherein in step S3 a three-way collaborative state encoder (TCSE) is constructed, characterized in that: the input of the TCSE is a dual-modal UAV image pair consisting of the visible light main-modality image and the infrared auxiliary-modality image. The specific process is as follows: the TCSE is a multi-level encoding structure with a given number of levels, in which each level consists of several BSSEU modules connected in series with a given serial depth per level; first, the input features of each level are defined, namely the visible light features, infrared features, and shared features of that level at a BSSEU serial depth of 0, wherein the definitions use initial feature mappings, a downsampling operation, and the visible light and infrared features at a given level and BSSEU serial depth; second, at each level, the visible light, infrared, and shared features are passed recursively through the BSSEU modules of that level, indexed by a recursive index; third, multiple representative BSSEU output features are selected from each level to form a hierarchical feature set, wherein the set of BSSEU indexes retained at each level is determined using a round-down operation and a segmented threshold parameter, and each retained output is processed to yield a hierarchical feature using the LeakyReLU activation function, batch normalization, and a convolution mapping; finally, after encoding at all levels, the visible light end features, the infrared end features, and the hierarchical feature set are output.
4. The UAV image enhancement method based on a selective state-space model according to claim 1, wherein in step S4 a Flight Degradation Sensing State Correction Module (FDSC) is constructed, characterized in that: S41. The FDSC comprises flight degradation factor calculation, modal confidence estimation, degradation condition-selective state correction, and hierarchical feature compensation. Flight degradation factor calculation constructs a flight degradation factor vector based on noise during UAV flight imaging. The specific process is as follows: first, the brightness map of the visible light main-modality image is obtained and normalized to a fixed range, and the intensity map of the infrared auxiliary-modality image is obtained and normalized to a fixed range; second, the edge sharpness response of the visible light main-modality image is calculated, wherein the formula uses a variance calculation and the Laplacian operator, which extracts the second-order edge response of the image; third, the blur degradation factor of the visible light main-modality image is calculated, which characterizes the degree of image blurring caused by UAV flight jitter or attitude changes, wherein the formula uses an operation that restricts the result to a bounded range, a reference edge response scale for the sharp state, and a stability constant; fourth, the low-light degradation factor is calculated, which characterizes the insufficient brightness of the visible light main modality in low-light scenes, wherein the formula uses a preset reference brightness threshold, a mean calculation, and a stability constant; fifth, the low-contrast degradation factor is calculated, which characterizes the compression of the grayscale distribution caused by smoke and similar conditions, wherein the formula uses a standard deviation calculation, a reference contrast scale for visible light images, and a stability constant; sixth, flat-region weight maps are constructed for the visible light and infrared modalities respectively, wherein the formulas use a gradient operator, an amplitude calculation, edge response reference thresholds for the visible light and infrared modalities, and stability constants; subsequently, the noise degradation factor of the visible light main modality and the noise degradation factor of the infrared auxiliary modality are calculated, which distinguishes the different effects of low-light visible light noise and infrared thermal noise on the enhancement process, wherein the formulas use the median absolute deviation, a Gaussian filtering operation, reference noise intensity scales for the visible light and infrared modalities, and a stability constant; finally, all degradation factors are combined into a flight degradation characterization vector; S42. Modal confidence estimation determines the reliability of the visible light main modality and the infrared auxiliary modality under the current degradation conditions. The specific process is as follows: first, the thermal saliency response of the infrared auxiliary modality is calculated, wherein the formula uses a reference scale for a significant infrared thermal response and a stability constant; subsequently, the reliability of the visible light main modality and the reliability of the infrared auxiliary modality are calculated using non-negative weighting parameters; finally, the reliabilities are normalized to obtain the visible light modal confidence and the infrared modal confidence, wherein the normalization uses a stability constant; S43. Degradation condition-selective state correction corrects the state of the bimodal end features. The specific process is as follows: first, a degradation correction state is constructed based on the modal confidences; subsequently, the gated strengths for visible light and infrared modal degradation correction are generated respectively, wherein the formulas use the Sigmoid activation function together with gate mapping weights and gate mapping biases for the visible light and infrared modalities; finally, selective interpolation is performed between the end features of each modality and the degradation correction state to obtain the corrected visible light end features and the corrected infrared end features; S44. Hierarchical feature compensation propagates the degradation correction state to features at different levels. The specific process is as follows: first, for any feature in the hierarchical feature set, a compensation weight is calculated using learnable compensation weight mapping parameters; the hierarchical feature is then compensated, wherein the compensation formula uses a scale matching mapping that brings the degradation correction state to the scale of the corresponding hierarchical feature; finally, after all hierarchical features have been compensated, the compensated hierarchical feature set is obtained.
5. The UAV image enhancement method based on a selective state-space model according to claim 1, wherein in step S5 a main and auxiliary feature aggregation and decoding module (PAAD) is constructed, characterized in that: PAAD aggregates features from the visible light main modality and the infrared auxiliary modality and decodes them to generate the enhanced UAV image. The specific process is as follows: first, the compensated hierarchical features at different scales are unified to the end feature scale and aggregated to obtain the hierarchical enhancement auxiliary features, wherein the aggregation traverses all compensated hierarchical features in the set, applies scale alignment, and concatenates them along the channel dimension; next, the visible light end features, the infrared end features, and the hierarchical enhancement auxiliary features are aggregated to obtain the enhancement features for image reconstruction; finally, the enhanced image is obtained by decoding with a U-Net-style decoder that upsamples step by step.
6. The UAV image enhancement method based on a selective state-space model according to claim 1, wherein in step S6 a multi-objective joint optimization function is constructed, characterized in that: the multi-objective joint optimization function is used for end-to-end joint training of the TCSE, FDSC, and PAAD. The specific process is as follows: first, the main modality structure fidelity term is defined, wherein the formula uses a structural similarity measurement and a brightness map; second, the auxiliary edge enhancement term is defined, wherein the formula uses a norm, a per-pixel maximum, and an infrared edge auxiliary coefficient; third, the degradation perception quality constraint term is calculated, wherein the formula uses a non-negative truncation function; fourth, the main modality content preservation term is constructed as a total variation regularization term, wherein the formula uses a low-pass filtering operation; finally, the loss terms are weighted and summed to obtain the multi-objective joint optimization function, wherein the main modality structure fidelity term, the auxiliary edge enhancement term, the degradation perception quality constraint term, and the main modality content preservation term each have a non-negative weight coefficient.