An infrared small target detection method based on deep unfolding network and learnable sparse transform
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANDONG UNIV OF SCI & TECH
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
Figure CN122244640A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of image processing and computer vision, and in particular to an infrared small target detection method based on a deep unfolding network and a learnable sparse transform.
Background Technology
[0002] Infrared Small Target Detection (IRSTD) technology boasts advantages such as passive imaging, all-weather operation, and excellent resistance to electromagnetic interference. However, in practical applications, limitations imposed by atmospheric attenuation in long-distance imaging and the physical constraints of infrared sensors mean that real-world scenarios often feature high-intensity background clutter (such as undulating clouds, sea surface reflections, or urban edges) and non-uniform sensor noise. This makes target signals easily obscured, resulting in extremely low thermal contrast and signal-to-noise ratio, posing significant challenges to detection algorithms, particularly regarding false alarms and missed detections.
[0003] To effectively separate small targets from complex infrared backgrounds, existing infrared small target detection techniques include traditional model-driven methods, data-driven deep learning methods, and methods based on deep unfolding networks (DUNs). Existing techniques, especially existing deep unfolding network models such as RPCANet, balance to some extent the interpretability of traditional optimization with the learning capability of deep learning, but they still suffer from the following major technical problems when dealing with extremely complex real-world infrared scenes:
1. Background estimation errors accumulate in cascade, so background suppression is incomplete. In existing deep unfolding architectures, the information transmission mechanism between unfolding stages is usually too simple. Early background estimation biases (such as residual high-frequency clutter or edge interference) are therefore not corrected in subsequent stages and are easily amplified stage by stage (error diffusion) as the iteration depth increases, which seriously degrades the purity of the final background extraction.
[0004] 2. Fixed, hand-crafted sparse priors lack adaptability, limiting the ability to extract weak targets. When constructing sparse constraints for targets, existing methods mostly retain the traditional image-domain L1 norm and fixed soft-threshold operators. These rigid, non-learnable operators struggle to capture the complex sparse structure of small infrared targets in the transform domain. For targets with varied shapes and weak features, or in environments with extremely low signal-to-noise ratios, this fixed sparse modeling greatly limits the model's ability to separate targets and can easily cause "over-shrinkage" or even complete loss of real weak target signals during iterative decomposition.
[0005] 3. Limited image reconstruction fidelity and insufficiently refined feature fusion: In theoretical physical models, the original infrared image is usually assumed to be a simple linear superposition of the background and the target. However, in deep network implementations, directly applying simple linear addition operations is often insufficient to handle the complex feature artifacts generated during the decomposition process, making it difficult to achieve high-level cross-domain feature fusion, thus resulting in suboptimal feature fidelity of the reconstructed image.
[0006] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.
Summary of the Invention
[0007] This invention proposes an infrared small target detection method based on a deep unfolding network and a learnable sparse transform. The method maps the traditional robust principal component analysis (RPCA) optimization process, solved by the alternating direction method of multipliers (ADMM), onto the cascaded stages of a deep neural network. Through adaptive learning and feature enhancement, it helps achieve high-precision target detection with a low false alarm rate under extremely low signal-to-noise ratio conditions.
[0008] To achieve the above objectives, the present invention adopts the following technical solution: an infrared small target detection method based on a deep unfolding network and a learnable sparse transform, comprising the following steps:
Step 1. Input an infrared image as the original input image. Set the initial sparse target component and the initial Lagrange multipliers to zero matrices, and set the initial deep background memory features at the same time.
Step 2. Construct a deep network composed of multiple cascaded stages to process the input infrared image iteratively. For each cascaded stage, from the first to the last, repeat steps 3 through 6.
Step 3. Using the attention residual background estimation module, estimate the low-rank background component of the current stage from the reconstructed image, the sparse target component, and the Lagrange multipliers of the previous stage, together with the deep background memory features of the previous stage, and output the updated deep background memory features.
Step 4. Using the learnable sparse transform target extraction module, compute the sparse target component of the current stage from the outputs of the preceding steps, combined with the adaptively learned sparse transform basis and step-size parameters.
Step 5. Using the attention-guided image reconstruction module, fuse the low-rank background component and the sparse target component of the current stage to obtain the reconstructed image of the current stage, and use it as input for the next stage.
Step 6. Using the multiplier update module, compute the Lagrange multipliers of the current stage from the previous multipliers and the low-rank background component and sparse target component of the current stage, and use them as input for the next stage.
Step 7. After the iteration is complete, take the sparse target component of the last stage as the final target detection result image.
[0009] Furthermore, based on the aforementioned infrared small target detection method based on deep unfolded networks and learnable sparse transformations, this invention also proposes a computer device, which includes a memory and one or more processors.
[0010] The executable code is stored in memory. When the processor executes the executable code, it implements the steps of the infrared small target detection method based on deep unfolded networks and learnable sparse transforms described above.
[0011] Furthermore, based on the aforementioned infrared small target detection method based on deep unfolded networks and learnable sparse transforms, this invention also proposes a computer-readable storage medium storing a program that, when executed by a processor, implements the steps of the aforementioned infrared small target detection method based on deep unfolded networks and learnable sparse transforms.
[0012] The present invention has the following advantages: As described above, this invention presents an infrared small target detection method based on a deep unfolding network and a learnable sparse transform, which addresses the gradual accumulation and amplification of background errors in conventional deep unfolding networks. The invention unfolds the iterative process of robust principal component analysis (RPCA) solved by the alternating direction method of multipliers (ADMM) into a feed-forward neural network, giving the model strong physical interpretability. Guided by this structured prior, the invention achieves detection performance surpassing existing large data-driven networks (which often require tens of millions of parameters) with only about 1.10M parameters. This lightweight design saves memory and power, making the method well suited to deployment on edge hardware with limited computing resources. Furthermore, the invention uses a multi-channel attention-supervised transmission enhancement module (MCASTM) that filters out interference channels carrying high-frequency clutter through a hard-threshold truncation mechanism, achieving cross-stage feature purification. According to the experiments reported for this invention, under complex backgrounds such as thick clouds and strong sea-surface glare, the method significantly improves the detection rate (Pd) while reducing the false alarm rate (Fa) by orders of magnitude. The invention also addresses the poor adaptability of traditional model-driven methods, which often rely on manually set fixed sparse prior operators and empirical thresholds.
By proposing a learnable sparse transform target extraction module, the invention turns the model into a fully differentiable neural network that adaptively learns the optimal sparse representation basis and dynamically generates soft-threshold parameters from the input scene. This adaptive mechanism greatly improves generalization to targets of varying shapes and sizes, so the system can be used in a variety of complex infrared scenes without manual parameter tuning and adapts well to different environments. In addition, the invention addresses target contour dilation and feature artifacts caused by simple linear feature addition by introducing the Dynamic Attention-Guided Feature Enhancement Module (DAGFEM). The DAGFEM module uses a dual-branch parallel architecture that combines channel attention, which filters semantic noise, with dynamic spatial convolution, which performs adaptive modulation. This yields context-aware, refined residual correction, allowing the network to locate the target center accurately while faithfully reconstructing the boundary contours of small, single-pixel-level targets, which significantly improves image reconstruction quality and target fidelity. In summary, the method maps the traditional ADMM-based RPCA optimization process onto the cascaded stages of a deep neural network and, through adaptive learning and feature enhancement, achieves high-precision, low-false-alarm-rate target detection under extremely low signal-to-noise ratios.
Brief Description of the Drawings
[0013] Figure 1 is the overall framework diagram of the infrared small target detection method based on a deep unfolding network and a learnable sparse transform, namely the LST-RPCANet method, in the embodiments of the present invention.
Figure 2 is the detailed network architecture of a single stage of LST-RPCANet in this embodiment, which includes an attention residual background estimation module, a learnable sparse transform target extraction module, an attention-guided image reconstruction module, and a multiplier update module.
Figure 3 is a schematic diagram of the multi-channel attention-supervised transmission enhancement module (MCASTM) in an embodiment of the present invention.
Figure 4 is a schematic diagram of the Dynamic Attention-Guided Residual Group (DAGRG) in an embodiment of the present invention.
Figure 5 is a schematic diagram of the Dynamic Attention-Guided Feature Enhancement Module (DAGFEM) in an embodiment of the present invention.
Figure 6 is a schematic diagram comparing the ROC curves of the LST-RPCANet method and the comparison methods in this embodiment; in Figure 6, (a), (b), and (c) correspond to the SIRST V1, NUDT-SIRST, and IRSTD-1K datasets, respectively.
Figure 7 is a qualitative comparison on challenging infrared scenes; in Figure 7, (a) is the original image, (b)-(i) are the results of the IPI, PSTNN, ACM, DNANet, UIUNet, L2SKNet, RPCANet, and DRPCANet methods, respectively, (j) is the method of the present invention (Ours), and (k) is the label.
Detailed Implementation
[0014] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments:
Example 1
To address the shortcomings of existing infrared small target detection methods (especially existing deep unfolding networks), such as cascading accumulation of background estimation errors, poor adaptability of hand-crafted fixed sparse priors, and insufficiently refined feature fusion in image reconstruction, this invention proposes an infrared small target detection method, LST-RPCANet, based on a deep unfolding network and a learnable sparse transform. Its core idea is as follows: the robust principal component analysis (RPCA) iterative optimization process based on the alternating direction method of multipliers (ADMM) is unfolded into a deep neural network architecture consisting of multiple cascaded stages (six in this embodiment).
[0015] In infrared small target detection tasks, the original observed infrared image matrix can be modeled as a linear superposition of the background matrix and the target matrix. Since the background usually exhibits large areas of continuity and correlation, it has low-rank characteristics.
[0016] Small targets are extremely sparse in spatial distribution. Therefore, the original image can be decomposed into the sum of two components.
[0017] The first component is the low-rank background matrix, and the second is the sparse target matrix.
[0018] Traditional RPCA methods typically use the nuclear norm and the L1 norm to constrain the low rank of the background and the sparsity of the target, respectively. However, fixed sparse priors are often difficult to adapt to complex and variable infrared scenes and to weak targets with extremely low signal-to-noise ratios.
[0019] To separate targets more flexibly and accurately, this invention introduces a sparse transform operator and constructs a target decomposition and optimization model based on transform-domain sparse constraints. The model minimizes, over the background and target matrices, the sum of a regularization term that constrains the background characteristics and a weighted L1 norm of the transformed target, subject to the constraint that the original image equals the sum of the background and the target. The L1 norm characterizes the sparsity of the features, and a weight parameter balances the background and target terms. The model therefore requires the target to be strongly sparse after the sparse transform is applied.
[0020] To solve the above constrained optimization problem, the augmented Lagrange multiplier method is introduced and an augmented Lagrangian function is constructed. It extends the objective above with two terms: the matrix inner product of a Lagrange multiplier matrix with the constraint residual, and the squared Frobenius norm of that residual weighted by a penalty parameter.
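As an illustration, the decomposition, the transform-domain model, and its augmented Lagrangian can be written in the standard RPCA/ADMM form below. The symbols are notation assumed for this sketch and may differ from the notation used in the original filing: $D$ is the observed image, $B$ the low-rank background, $T$ the sparse target, $\mathcal{W}$ the sparse transform operator, $\mathcal{R}$ the background regularization term, $\lambda$ the balancing weight, $Y$ the Lagrange multiplier matrix, and $\mu$ the penalty parameter.

```latex
\begin{aligned}
D &= B + T,\\
\min_{B,\,T}\ & \mathcal{R}(B) + \lambda \lVert \mathcal{W}(T) \rVert_1
  \quad \text{s.t.}\quad D = B + T,\\
\mathcal{L}_\mu(B, T, Y) &= \mathcal{R}(B) + \lambda \lVert \mathcal{W}(T) \rVert_1
  + \langle Y,\ D - B - T \rangle
  + \frac{\mu}{2}\,\lVert D - B - T \rVert_F^2 .
\end{aligned}
```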
[0021] This invention employs the alternating direction method of multipliers (ADMM) to iteratively minimize the aforementioned augmented Lagrangian function.
[0022] In each iteration, the variables are updated alternately as follows: a) Background update step: with the target and the multiplier fixed, minimize over the background. By completing the square, the background update becomes a subproblem that combines the background regularization term with a quadratic data-fidelity term.
[0023] In traditional numerical algorithms, this problem usually requires Singular Value Decomposition (SVD), which has extremely high computational complexity.
[0024] In this invention, step a) can be mapped and expanded into an attention residual background estimation module, which implicitly learns the background update process using a deep residual network and a multi-channel attention supervision mechanism.
[0025] b) Target update step: with the background and the multiplier fixed, minimize over the target. The target update becomes a subproblem that combines the transform-domain sparsity term with a quadratic data-fidelity term.
[0026] Traditional algorithms typically use a soft threshold function combined with a fixed transformation operator to solve this problem.
[0027] In this invention, step b) is unfolded into a learnable sparse transform target extraction module. By introducing learnable step-size parameters, a multi-layer convolutional sparse transform network, and its inverse network, the sparse extraction of targets is completed adaptively in the transform domain.
[0028] c) Image reconstruction step: within the theoretical framework, the observed image is related to the background and the target through the linear superposition constraint. In complex real-world scenarios, to eliminate reconstruction errors and artifacts caused by simple linear addition, this invention additionally constructs an attention-guided image reconstruction module, which uses dynamic spatial attention and channel attention to fuse the background and the target with high fidelity and outputs the reconstructed image of the current stage.
[0029] d) Lagrange multiplier update step: following the standard gradient ascent rule, the multiplier is updated by adding the constraint residual scaled by the penalty parameter. In this invention, this step is mapped to a multiplier update module (MUM) that performs parameter-free numerical computation.
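The update steps a), b), and d) can be sketched in standard ADMM form as follows. This is a hedged reconstruction, not the filing's own formulas: it assumes the augmented Lagrangian $\mathcal{R}(B) + \lambda\lVert\mathcal{W}(T)\rVert_1 + \langle Y, D - B - T\rangle + \tfrac{\mu}{2}\lVert D - B - T\rVert_F^2$, with $D$ the observed image, $B$ the background, $T$ the target, $Y$ the multiplier, $\mu$ the penalty parameter, and $k$ the iteration index (all assumed notation). The quadratic terms follow from completing the square, using $\langle Y, X\rangle + \tfrac{\mu}{2}\lVert X\rVert_F^2 = \tfrac{\mu}{2}\lVert X + Y/\mu\rVert_F^2 - \tfrac{1}{2\mu}\lVert Y\rVert_F^2$.

```latex
\begin{aligned}
B^{k} &= \arg\min_{B}\ \mathcal{R}(B)
  + \frac{\mu}{2}\,\Bigl\lVert D - B - T^{k-1} + \frac{Y^{k-1}}{\mu} \Bigr\rVert_F^2,\\
T^{k} &= \arg\min_{T}\ \lambda \lVert \mathcal{W}(T) \rVert_1
  + \frac{\mu}{2}\,\Bigl\lVert D - B^{k} - T + \frac{Y^{k-1}}{\mu} \Bigr\rVert_F^2,\\
Y^{k} &= Y^{k-1} + \mu\,\bigl(D - B^{k} - T^{k}\bigr).
\end{aligned}
```

In the unfolded network described here, the first two updates are replaced by learned modules, while the third remains a parameter-free computation.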
[0030] As shown in Figure 1, the infrared small target detection method in this embodiment includes the following steps: Step 1. Input an infrared image as the original input image. Set the initial sparse target component and the initial Lagrange multipliers to zero matrices, and set the initial deep background memory features at the same time.
[0031] Step 2. Construct a deep network composed of multiple cascaded stages to process the input infrared image iteratively. For each cascaded stage, from the first to the last, repeat steps 3 through 6.
[0032] In this embodiment, the number of cascaded stages of LST-RPCANet is set to 6. The number of stages is not limited to 6; it can be set to other integers such as 4, 5, or 7, depending on the computing power of the specific device or the required detection accuracy.
[0033] As shown in Figure 2, each stage includes four modules: an attention residual background estimation module, a learnable sparse transform target extraction module, an attention-guided image reconstruction module, and a multiplier update module.
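The stage structure described above can be sketched as a simple loop. This is a minimal illustration only: the function name `run_unfolded_stages` and the four callables are hypothetical stand-ins for the learned modules, and the exact inputs each module receives are an assumption based on the step descriptions.

```python
import numpy as np

def run_unfolded_stages(image, estimate_background, extract_target,
                        reconstruct, update_multiplier, num_stages=6):
    """Run the cascaded stages; the four callables are hypothetical stand-ins."""
    recon = image.copy()               # reconstructed image fed to the first stage
    target = np.zeros_like(image)      # initial sparse target component (zeros)
    multiplier = np.zeros_like(image)  # initial Lagrange multiplier (zeros)
    memory = None                      # initial deep background memory feature
    for _ in range(num_stages):
        # Step 3: background estimate plus updated cross-stage memory
        background, memory = estimate_background(recon, target, multiplier, memory)
        # Step 4: sparse target component of the current stage
        target = extract_target(recon, background, target, multiplier)
        # Step 5: fused reconstruction, used as input to the next stage
        next_recon = reconstruct(background, target)
        # Step 6: multiplier update (assumed here to use the stage's input image)
        multiplier = update_multiplier(multiplier, recon, background, target)
        recon = next_recon
    # Step 7: the last-stage target component is the detection result
    return target
```

Keeping the modules as injected callables makes the number of stages a plain loop bound, which matches the statement that the stage count can be changed to 4, 5, or 7.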
[0034] Step 3. Background feature estimation based on the attention residual background estimation module.
[0035] This invention uses an attention residual background estimation module to estimate the low-rank background component of the current stage from the reconstructed image, the sparse target component, and the Lagrange multipliers of the previous stage, together with the deep background memory features of the previous stage. The module also outputs the updated deep background memory features.
[0036] In the traditional ADMM optimization framework, after the target variable and the multiplier variable are fixed, the update of the background variable can be expressed as a minimization subproblem with a low-rank regularization term.
[0037] To improve computational efficiency and adapt to complex backgrounds, this invention designs an attention residual background estimation module that implicitly learns the proximal mapping operator for background updates using a deep convolutional neural network.
[0038] The attention residual background estimation module contains a nested multi-channel attention-supervised transmission enhancement module (MCASTM), which is used to estimate the low-rank background component of the current stage and to update the deep background memory features.
[0039] As shown in Figure 2, in this attention residual background estimation module, the background residual input is first calculated from the variables of the previous stage.
[0040] Subsequently, the background residual is processed by a shallow feature extraction unit to obtain the shallow background features of the current stage.
[0041] The shallow feature extraction unit consists of a convolution operation and a residual block operation.
[0042] To overcome the cumulative amplification of background errors across the cascaded stages, the extracted shallow background features and the deep background memory features passed from the previous stage are jointly fed into the multi-channel attention-supervised transmission enhancement module (MCASTM).
[0043] The MCASTM module not only performs feature fusion but, more importantly, achieves cross-stage feature purification through a channel-level screening mechanism. After feature enhancement and channel purification, it outputs the updated deep background memory features.
[0044] Finally, the purified and enhanced features are mapped back to the image domain and combined through a residual connection to output the final low-rank background component of the current stage. This mapping consists of a convolution operation and a residual block operation.
[0045] This invention proposes a multi-channel attention-supervised transmission enhancement module (MCASTM). This mechanism innovatively introduces a multi-channel feature fusion strategy to drive attention selection between the cascaded stages of a deep unfolded network. By truncating low-response channels and retaining high-response channels, it actively blocks the cross-stage diffusion of errors, achieving purification and correction of cross-stage background features.
[0046] Figure 3 shows the cross-stage feature purification process based on the multi-channel attention-supervised transmission enhancement module (MCASTM) in this embodiment. The processing flow of the MCASTM module is as follows: first, the shallow background features of the current stage and the deep background memory features passed from the previous stage are concatenated along the channel dimension and then fused by a convolutional layer to generate preliminary features.
[0047] The preliminary features are fed into the channel attention supervisor to calculate the channel response weights. The channel attention supervisor generates the weight distribution over the channels; its structure is shown in Figure 3.
[0048] The channel attention supervisor consists of a global average pooling layer, a fully connected layer 1, a ReLU activation function, a fully connected layer 2, and a Sigmoid activation function connected in sequence.
[0049] In this channel attention supervisor, global average pooling, dimensionality reduction by the first fully connected layer, ReLU activation, dimensionality expansion by the second fully connected layer, and Sigmoid activation are performed in sequence to generate the channel weight matrix.
[0050] channel weights = Sigmoid(FC2(ReLU(FC1(GAP(preliminary features))))).
[0051] In the formula, GAP denotes the global average pooling operation; FC1 and FC2 denote the fully connected layers for dimensionality reduction and dimensionality expansion, respectively; ReLU denotes the ReLU activation function; and Sigmoid denotes the Sigmoid activation function.
[0052] The Sigmoid activation function is used to strictly constrain the output values of the channel weight matrix within a preset range of (0,1).
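The channel attention supervisor described above (pooling, reduction, ReLU, expansion, Sigmoid) can be sketched in NumPy as follows. The function name, the `(C, H, W)` layout, the absence of bias terms, and the weight shapes are assumptions made for illustration; in the actual network the two weight matrices would be learned.

```python
import numpy as np

def channel_attention_weights(features, w_down, w_up):
    """Compute per-channel weights in (0, 1).

    features: array of shape (C, H, W)
    w_down:   reduction weights of shape (C // r, C)   (hypothetical, bias-free)
    w_up:     expansion weights of shape (C, C // r)   (hypothetical, bias-free)
    """
    pooled = features.mean(axis=(1, 2))        # global average pooling -> (C,)
    hidden = np.maximum(w_down @ pooled, 0.0)  # first FC layer + ReLU
    logits = w_up @ hidden                     # second FC layer -> (C,)
    return 1.0 / (1.0 + np.exp(-logits))       # Sigmoid squashes into (0, 1)
```

Because the output is one scalar per channel, it can be used either to rescale channels or, as in the MCASTM, to rank them.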
[0053] After obtaining the weight matrix, the MCASTM module departs from the conventional feature multiplication mechanism and instead applies a channel sorting and truncation unit.
[0054] Based on the channel weight matrix, the channel sorting and truncation unit sorts the channels in descending order and truncates the low-response channels to generate the updated deep background memory features.
[0055] The relationship combines three operations: element-wise multiplication; a sorting operation that arranges all channels of the feature in descending order of their attention weight values; and a truncation operation that actively discards the lower-ranked channels with extremely low response values.
[0056] Through this hard-threshold truncation mechanism, MCASTM effectively filters out interference channels carrying high-frequency errors and noise. The resulting refined and updated deep background memory features are then passed on to the next unfolding stage.
[0057] In a preferred embodiment, the channel cutoff ratio is fixed at 50%, and half of the low-response channels are directly discarded through a hard cutoff mechanism to filter out redundant clutter. Of course, this hard cutoff ratio can be replaced with other fixed values such as 25% or 75%.
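The channel sorting and truncation step can be illustrated with the NumPy sketch below. It is one possible reading of the description, not the filing's definitive formula: here the channels are reweighted, ranked by attention weight, and only the top fraction is kept, so the output has fewer channels. Zeroing the discarded channels instead of dropping them would be an equally plausible variant. The function name and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def sort_and_truncate_channels(features, weights, keep_ratio=0.5):
    """Keep the highest-weight channels of a (C, H, W) feature tensor.

    Returns the reweighted, truncated features and the indices that were kept,
    ordered from highest to lowest attention weight.
    """
    num_channels = features.shape[0]
    num_keep = max(1, int(round(num_channels * keep_ratio)))
    order = np.argsort(-weights, kind="stable")   # descending by weight
    kept = order[:num_keep]                       # hard truncation
    weighted = features * weights[:, None, None]  # element-wise reweighting
    return weighted[kept], kept
```

With `keep_ratio=0.5` this discards half of the channels, matching the preferred embodiment; `0.25` or `0.75` correspond to the alternative ratios mentioned in the text, depending on whether the ratio is read as the kept or the discarded fraction.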
[0058] The updated deep background memory features are processed by a convolution and then added to the shallow background features through a residual path to obtain the output features of the MCASTM module.
[0059] After further extraction by a residual block, the output features yield the low-rank background component of the current stage, as described above.
[0060] This mapping again consists of a convolution operation and a residual block operation.
[0061] This invention innovatively designs a multi-channel attention-supervised transmission enhancement module (MCASTM) to effectively block cross-stage error propagation and achieve background feature purification. Through channel sorting and truncation mechanisms, the MCASTM module can actively identify and discard low-response channels containing high-frequency clutter, breaking through the technical bottleneck of background error accumulation and amplification in existing deep unfolded networks. This achieves effective purification of cross-stage memory features and significantly reduces the false alarm rate in complex infrared scenarios.
[0062] Step 4. Target feature extraction based on the learnable sparse transformation target extraction module.
[0063] This invention uses a learnable sparse transform target extraction module to compute the sparse target component of the current stage from the outputs of the preceding steps, combined with the adaptively learned sparse transform basis and step-size parameters.
[0064] Within the ADMM optimization framework, with the background and multiplier variables fixed, the update of the target variable can be expressed as a minimization subproblem containing a sparse regularization term.
[0065] Traditional model-based methods typically employ fixed sparse transform operators (such as wavelet transform or discrete cosine transform) combined with soft thresholding functions for solving the problem. However, infrared targets in real-world scenarios vary greatly in size and shape, and fixed operators can easily lead to missed detections or feature loss. To address this, this invention designs a learnable sparse transform target extraction module, completely mapping the aforementioned solution steps to a differentiable neural network.
[0066] This learnable sparse transform target extraction module overcomes the rigid limitations of traditional hand-crafted sparse prior operators (such as the fixed L1 norm and soft thresholds). It maps the target extraction task from the image domain to the transform domain. By letting the network adaptively learn the optimal sparse representation basis and threshold parameters, it achieves accurate target separation under extremely low signal-to-noise ratio conditions.
[0067] As shown in Figure 2, the learnable sparse transform target extraction module includes a learnable sparse transform network and its symmetric inverse transform network, and computes the sparse target component of the current stage. The specific steps are: Step 4.1. As shown in Figure 2, a gradient descent step is first performed using the low-rank background component updated in the current stage, the sparse target component of the previous stage, and the multiplier, generating an auxiliary feature map.
[0068] A learnable step-size parameter is introduced into the computation of the auxiliary feature map to adaptively adjust the update magnitude of the current stage.
[0069] Step 4.2. The auxiliary feature map is fed into a learnable sparse transform network composed of multiple convolutional layers, which maps it to a high-dimensional sparse transform domain and yields the sparse-transform-domain features.
[0070] That is, the sparse-transform-domain features are the output of the learnable sparse transform network applied to the auxiliary feature map.
[0071] In this embodiment, the multi-layer convolutional neural network consists of convolutional layers and ReLU activation functions.
[0072] Step 4.3. A learnable threshold parameter θ is introduced in the transform domain to replace the traditional hand-set constant threshold, and a parameterized soft-thresholding operation is applied to the sparse-transform-domain features: soft(x, θ) = sign(x) · max(|x| − θ, 0).
[0073] In the formula, |x| − θ is the difference between the absolute value of the transform-domain feature variable x and the shrinkage threshold parameter θ; sign(·) is the sign function, which extracts and retains the sign of each element of the original feature variable x.
[0074] max(·, 0) denotes the maximum operation. Its role is to set to 0 the part of the difference between the absolute value and the threshold that is less than 0, while retaining the part that is greater than 0.
[0075] This operation can adaptively truncate background noise and accurately preserve the sparse response of weak targets.
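The parameterized soft-thresholding operation described above can be sketched in a few lines of NumPy. Here `theta` is passed as a plain number for illustration; in the network it would be a learnable parameter.

```python
import numpy as np

def soft_threshold(x, theta):
    """Shrink each element of x toward zero by theta, keeping its sign."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)
```

Elements whose magnitude is below `theta` are set to zero, which is what suppresses low-amplitude background noise, while larger responses are kept with their magnitude reduced by `theta`.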
[0076] Step 4.4. The soft-thresholded features are fed into the learnable inverse transform network and mapped back to the image domain, yielding the sparse target component of the current stage. In other words, the sparse target component is obtained by applying the inverse transform network to the soft-thresholded sparse-transform-domain features. The inverse transform network is a learnable inverse sparse transform basis that is structurally symmetric to the forward sparse transform network.
[0077] This step breaks the limitations of traditional fixed operators in the image domain and enables adaptive learning of the optimal sparse representation in the transform domain.
[0078] In a preferred embodiment of the present invention, the transform network used for feature mapping and the inverse transform network each use three convolutional layers, which balances feature representation capability and training stability.
[0079] Of course, in this embodiment, the number of layers in the convolutional network can be replaced with 2, 4, or 5 layers, etc.
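Steps 4.1 to 4.4 can be illustrated with a simplified, linear stand-in for the learned networks. This sketch makes several assumptions that are not stated in the text: the gradient step follows the usual ISTA form for a quadratic data term, the forward transform is a single orthonormal matrix `basis` (so its inverse is the transpose) rather than a stack of convolutional layers, and `step`, `theta`, and `mu` are fixed numbers rather than learned parameters. All names are hypothetical.

```python
import numpy as np

def extract_target(image, background, prev_target, multiplier, basis,
                   step=0.5, theta=0.1, mu=1.0):
    """One transform-domain target update with a linear stand-in transform."""
    # Step 4.1: gradient step on the quadratic term, scaled by the step size
    residual = image - background - prev_target + multiplier / mu
    aux = prev_target + step * residual
    # Step 4.2: forward sparse transform (stand-in for the conv network)
    coeffs = basis @ aux
    # Step 4.3: parameterized soft thresholding in the transform domain
    shrunk = np.sign(coeffs) * np.maximum(np.abs(coeffs) - theta, 0.0)
    # Step 4.4: inverse transform back to the image domain
    return basis.T @ shrunk
```

For example, with an identity `basis`, a zero background, and `step=1.0`, a faint uniform background value below `theta` is removed while a bright single-pixel target survives with its amplitude reduced by `theta`.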
[0080] This invention constructs a learnable sparse transformation target extraction module, which maps the traditional ISTA solution steps to a fully differentiable neural network, breaking the limitations of manual priors and achieving accurate target extraction under extremely low signal-to-noise ratios.
[0081] By adaptively learning the optimal sparse transformation basis and parameterized soft threshold, this invention can dynamically adjust prior assumptions based on data, greatly improving the model's accuracy and robustness in extracting small targets with varying shapes and sizes.
[0082] Step 5. High-fidelity image reconstruction based on the attention-guided image reconstruction module.
[0083] This invention utilizes an attention-guided image reconstruction module to fuse the low-rank background component and the sparse target component of the current stage, obtaining the reconstructed image of the current stage, which is used as input for the next stage.
[0084] In the theoretical ADMM framework, the background and the target should satisfy a strict linear constraint: the image equals the sum of the low-rank background component and the sparse target component.
[0085] However, in the complex feature space of deep unfolded networks, simple linear addition (i.e., directly summing the background and target components) often leaves severe feature artifacts in the reconstructed image. To alleviate this problem, this invention designs an attention-guided image reconstruction module.
[0086] Let the initial fused feature of the current stage be the sum of the low-rank background component and the sparse target component of the current stage.
[0087] As shown in Figure 2, the initial fused feature is processed by a 1×1 convolution, batch normalization (BN), and an LReLU activation function, and the result is input directly into the Dynamic Attention Guided Residual Group (DAGRG).
[0088] As shown in Figure 4, the Dynamic Attention Guided Residual Group (DAGRG) mainly consists of dynamic attention-guided residual blocks, which in turn are mainly composed of Dynamic Attention-Guided Feature Enhancement Modules (DAGFEM).
[0089] This invention proposes a Dynamic Attention-Guided Feature Enhancement Module (DAGFEM) within the attention-guided image reconstruction module. The refined feature fusion based on the DAGFEM module is shown in Figure 5.
[0090] This module employs a dual-branch parallel architecture to perform fine-grained correction of the fused features in both the channel-semantic and spatial-dynamic dimensions. After context-aware residual correction and dynamic modulation by the DAGFEM module, the high-fidelity reconstructed image of the current stage is output directly and passed to the next stage.
[0091] The DAGFEM module abandons pure linear additive reconstruction and adopts a dual-branch parallel structure that combines dynamic spatial attention with channel attention. This not only filters semantic noise in the channel dimension, but also uses dynamically generated convolutional kernels to achieve precise spatial modulation of small targets, effectively capturing spatially changing background patterns and significantly improving the fidelity of image reconstruction.
[0092] As shown in Figure 5, the reconstructed image of the current stage is obtained using the DAGFEM module. The steps include: Step 5.1. Add and fuse the low-rank background component and the sparse target component of the current stage to obtain the initial fused feature, apply a convolution operation to it, and then input the resulting feature into the Dynamic Attention Guided Residual Group (DAGRG).
[0093] The main structure of the dynamic attention-guided residual block contained in the DAGRG is the Dynamic Attention-Guided Feature Enhancement Module (DAGFEM).
[0094] Within the DAGRG, the input feature is processed sequentially by convolution, batch normalization, an LReLU function, convolution, and batch normalization to obtain the intermediate feature G. This intermediate feature G is used as the input feature of the DAGFEM.
[0095] DAGFEM employs a dual-branch structure: a channel attention branch and a dynamic spatial attention branch. The input feature of DAGFEM is the intermediate feature G, which is processed by a 3×3 convolution and a ReLU activation function to obtain the feature fed to both branches. The two branches are as follows: a. In the channel attention branch, this feature is processed by max pooling and by average pooling, and each pooled result is fed into a multilayer perceptron. The two outputs are summed and then processed by a Sigmoid activation function to generate the channel attention map.
[0096] b. In the dynamic spatial attention branch, the input feature is fed into two parallel processing paths: one path averages the feature over the channels; the other path applies global average pooling to the feature and feeds the pooled feature into the kernel generator, which dynamically predicts the specific convolution kernel required.
[0097] Subsequently, using the specific convolution kernel predicted by the generator, a dynamic convolution is performed on the channel-averaged feature of the first path, followed by a Sigmoid activation function, to generate the adaptive dynamic spatial attention map.
[0098] Step 5.2. In the channel attention branch, the feature is subjected to max pooling and average pooling, and a multilayer perceptron and a Sigmoid activation function are used to generate the channel attention map, which filters out high-level semantic noise.
[0099] In this branch, max pooling and average pooling extract the global context, and each pooled result is passed through the multilayer perceptron. The two results are summed and passed through the Sigmoid function to generate the channel attention map that filters high-level semantic noise. The formula is as follows: [formula missing].
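The channel attention branch described above (max pooling and average pooling, a multilayer perceptron, summation, Sigmoid) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the two pooled descriptors are assumed to share one two-layer perceptron with a ReLU in between, and the names `channel_attention`, `shared_mlp`, `w1`, and `w2` are hypothetical; bias terms are omitted for brevity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a nested-list matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def shared_mlp(vector, w1, w2):
    """Two-layer perceptron with ReLU (assumed shared by both pooled inputs)."""
    hidden = [max(h, 0.0) for h in matvec(w1, vector)]
    return matvec(w2, hidden)


def channel_attention(feature, w1, w2):
    """feature: list of C channels, each an H x W nested list of floats."""
    # Global max pooling and global average pooling, one value per channel.
    max_desc = [max(max(row) for row in ch) for ch in feature]
    avg_desc = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    # Sum the two perceptron outputs, then squash to (0, 1).
    summed = [a + b for a, b in zip(shared_mlp(max_desc, w1, w2),
                                    shared_mlp(avg_desc, w1, w2))]
    return [sigmoid(s) for s in summed]


feature = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0
    [[0.0, 0.0], [0.0, 2.0]],   # channel 1
]
w1 = [[1.0, -1.0]]              # C=2 -> hidden=1
w2 = [[1.0], [-1.0]]            # hidden=1 -> C=2
attention = channel_attention(feature, w1, w2)
```

The result holds one weight in (0, 1) per channel; multiplying each channel by its weight attenuates channels that the perceptron scores as uninformative.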
[0100] Step 5.3. In the dynamic spatial attention branch, after the feature is averaged over the channels, a specific convolution kernel is dynamically predicted by a kernel generator, and dynamic convolution and Sigmoid activation are performed to generate an adaptive dynamic spatial attention map.
[0101] In the dynamic spatial attention branch, the network averages the feature across the channel dimension, and a convolution kernel generator composed of alternately connected convolutional layers and ReLU activation functions predicts the kernel.
[0102] As shown in Figure 5, the kernel generator contains a network architecture for dynamically predicting adaptive kernels.
[0103] In the dynamic spatial attention branch, the input is first processed by global average pooling to extract global contextual statistics. The pooled features are then passed sequentially through a convolutional layer that keeps the number of channels unchanged (represented as K1 CC in the figure), a ReLU activation function, and a second convolutional layer that changes the number of channels (represented as K1 C-K² in the figure).
[0104] The core function of the second convolutional layer is to map the features to a specific parameter dimension (i.e., K²); the values in this dimension are the weight parameters required by the dynamic spatial convolution kernel.
[0105] Through the above process, the convolution kernel generator adaptively predicts the dynamic convolution kernel weight matrix specific to the current scene and applies it to the spatial feature of the other path, which has been averaged over the channels. The formula for generating the kernel is as follows: [formula missing].
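The kernel generator and dynamic convolution described above can be illustrated with the following plain-Python sketch. It is a simplified stand-in under stated assumptions: the two convolutional layers of the generator act on a 1×1 pooled descriptor, so they are modeled as matrix-vector products; zero padding keeps the map size unchanged; and the names `dynamic_spatial_attention`, `w_keep`, and `w_expand` are hypothetical. As in common deep learning frameworks, the "convolution" is computed as a cross-correlation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def dynamic_spatial_attention(feature, w_keep, w_expand, k):
    """feature: C x H x W nested lists; w_keep: C x C; w_expand: (k*k) x C."""
    channels, height, width = len(feature), len(feature[0]), len(feature[0][0])
    # Path 2: global average pooling -> C-to-C layer -> ReLU -> C-to-k*k layer.
    pooled = [sum(map(sum, ch)) / (height * width) for ch in feature]
    hidden = [max(h, 0.0) for h in matvec(w_keep, pooled)]
    flat_kernel = matvec(w_expand, hidden)
    kernel = [flat_kernel[i * k:(i + 1) * k] for i in range(k)]
    # Path 1: average over the channel dimension to get one H x W map.
    mean_map = [[sum(feature[c][y][x] for c in range(channels)) / channels
                 for x in range(width)] for y in range(height)]
    # Apply the predicted kernel with zero padding, then Sigmoid.
    pad = k // 2
    attention = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[i][j] * mean_map[yy][xx]
            row.append(sigmoid(acc))
        attention.append(row)
    return attention


feature = [[[2.0] * 3 for _ in range(3)], [[0.0] * 3 for _ in range(3)]]
w_keep = [[1.0, 0.0], [0.0, 1.0]]
w_expand = [[0.0, 0.0] for _ in range(9)]
w_expand[4] = [1.0, 0.0]        # only the kernel centre depends on the input
spatial_attention = dynamic_spatial_attention(feature, w_keep, w_expand, k=3)
```

Because the kernel weights are computed from the pooled statistics of the input itself, two different scenes yield two different spatial filters, which is the property the text refers to as scene-specific modulation.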
[0106] Step 5.4. Multiply the channel attention map by the dynamic spatial attention map, apply the result to the feature map as feature weights to obtain the weighted feature, and combine the weighted feature with a residual connection to generate the output feature F of the current stage.
[0107] The dynamic spatial attention map is generated by performing a dynamic convolution on the input feature with the adaptive convolution kernel: [formula missing].
[0108] The attention maps generated by the channel branch and the spatial branch are multiplied element-wise and applied jointly to the feature to obtain the weighted feature: [formula missing].
[0109] The weighted feature is passed through the smoothing convolution Conv and combined with a residual connection to obtain the output F: [formula missing].
[0110] Wherein, smooth convolution Conv is a 1×1 convolution, and BN is batch normalization.
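The weighting and residual combination of Step 5.4 can be sketched as follows. This is a hedged illustration: the smoothing 1×1 convolution and batch normalization are replaced by a placeholder callable `smooth` (identity by default), the residual input is assumed to be the module's original input feature, and the name `fuse_attention` is hypothetical.

```python
def fuse_attention(feature, channel_attn, spatial_attn, residual,
                   smooth=lambda x: x):
    """Weight a C x H x W feature by both attention maps, then add a residual.

    channel_attn: one weight per channel; spatial_attn: H x W weights.
    `smooth` is a placeholder for the 1x1 convolution plus batch normalization.
    """
    # Element-wise product: each value is scaled by its channel weight
    # and by the spatial weight at its position.
    weighted = [[[v * ca * sa for v, sa in zip(row, sa_row)]
                 for row, sa_row in zip(ch, spatial_attn)]
                for ch, ca in zip(feature, channel_attn)]
    smoothed = smooth(weighted)
    # Residual connection with the original input feature.
    return [[[r + s for r, s in zip(r_row, s_row)]
             for r_row, s_row in zip(r_ch, s_ch)]
            for r_ch, s_ch in zip(residual, smoothed)]


feature = [[[2.0, 4.0]]]                 # C=1, H=1, W=2
fused = fuse_attention(feature,
                       channel_attn=[0.5],
                       spatial_attn=[[1.0, 0.5]],
                       residual=[[[1.0, 1.0]]])
```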
[0111] Specifically, in Figure 5, the original input feature G is combined with the weighted feature after the latter has passed through the 1×1 convolutional layer and batch normalization, generating the feature F of the attention-guided feature enhancement module for the current stage.
[0112] Step 5.5. The feature F output by the attention-guided feature enhancement module in the current stage is combined with the input feature of the DAGRG through the residual connection within the dynamic attention-guided residual block, i.e., F is added to the DAGRG input feature. Repeating the dynamic attention-guided residual block operation yields the feature used in Step 5.6.
[0113] Step 5.6. Within the DAGRG, the feature from Step 5.5 undergoes a 3×3 convolution and is then combined with the DAGRG input feature through a residual connection to obtain the output feature of the DAGRG; a 1×1 convolution is then applied to obtain the high-fidelity reconstructed image of the current stage. That is, the reconstructed image equals the 1×1 convolution of the sum of the 3×3-convolved feature and the DAGRG input feature.
[0114] To address the feature artifacts that simple linear addition causes in image reconstruction, this invention employs the Dynamic Attention-Guided Feature Enhancement Module (DAGFEM) to eliminate reconstruction artifacts and improve the fidelity of target-background fusion.
[0115] The Dynamic Attention-Guided Feature Enhancement (DAGFEM) module utilizes a dual-branch parallel architecture to combine channel attention that filters high-level semantic noise with dynamic convolution that performs adaptive spatial modulation. This enables context-aware, refined residual correction, significantly improving image reconstruction fidelity and accelerating network convergence.
[0116] Step 6. Parameter iteration based on the multiplier update module.
[0117] This invention utilizes a multiplier update module which, based on the original input image, the low-rank background component of the current stage, and the sparse target component of the current stage, calculates the Lagrange multiplier of the current stage and uses it as input for the next stage.
[0118] After updating the background, target, and image reconstructions, the network updates the Lagrange multipliers following the standard gradient ascent rule in the augmented Lagrange multiplier method. This step does not require the introduction of complex network layers; it only performs pure mathematical numerical calculations.
[0119] The multiplier update module performs a parameter-free gradient ascent update to calculate the Lagrange multiplier of the current stage. The specific steps include: Step 6.1. Obtain the original input image, the low-rank background component of the current stage, and the sparse target component of the current stage, and calculate the image reconstruction residual from them; Step 6.2. Combine the fixed penalty parameter with the Lagrange multiplier of the previous stage to calculate the Lagrange multiplier of the current stage, which is determined by the following relationship: [formula missing].
[0120] In the formula, the penalty parameter is fixed (corresponding to the step size parameter in the theoretical derivation).
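The multiplier update can be illustrated with the standard gradient ascent rule of the augmented Lagrange multiplier method. The sketch below assumes the residual is the original image minus the background and target components, and that the new multiplier is the previous multiplier plus the penalty parameter times that residual; the sign convention and the names `update_multiplier` and `mu` are assumptions, since the original formula is not reproduced in this text.

```python
def update_multiplier(multiplier, image, background, target, mu):
    """Parameter-free gradient ascent step on the Lagrange multiplier.

    All image-like arguments are H x W nested lists; `mu` is the fixed
    penalty parameter (assumed scalar).
    """
    return [[lam + mu * (d - b - t)          # residual: D - B - T
             for lam, d, b, t in zip(l_row, d_row, b_row, t_row)]
            for l_row, d_row, b_row, t_row
            in zip(multiplier, image, background, target)]


image = [[1.0, 2.0]]
background = [[0.5, 1.0]]
target = [[0.0, 1.0]]
multiplier = update_multiplier([[0.0, 0.0]], image, background, target, mu=0.5)
```

Where background plus target already reproduces the image (second pixel), the multiplier is unchanged; where a residual remains (first pixel), the multiplier grows in proportion to it, which penalizes the mismatch in the next stage.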
[0121] The retention of this multiplier update module ensures that the entire LST-RPCANet strictly adheres to the mathematical constraints of the ADMM algorithm, guaranteeing the convergence and physical interpretability of the deep unfolded network during the cascaded iteration process.
[0122] The updated Lagrange multiplier is passed as a known variable to the next cascade stage, where it continues to participate in the iteration.
[0123] Step 7. After the iteration is complete, the sparse target component from the last stage is used as the final target detection result image.
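The overall control flow of Steps 1 to 7 can be summarized by the following skeleton. It is a sketch of the stage loop only: the four modules are passed in as placeholder callables, the argument lists of each module and the initial value of the background memory (`None`) are assumptions, and the toy modules in the usage example are deliberately trivial stand-ins rather than the networks of the invention.

```python
def run_unfolding(image, stages, estimate_background, extract_target,
                  reconstruct, update_multiplier):
    """Run `stages` cascaded stages and return the last sparse target."""
    zeros = [[0.0] * len(image[0]) for _ in image]
    # Step 1: initial reconstruction is the input image; the target and
    # the multiplier start as zero matrices (memory init is an assumption).
    recon, target, multiplier, memory = image, zeros, zeros, None
    for _ in range(stages):                                   # Step 2
        background, memory = estimate_background(
            recon, target, multiplier, memory)                # Step 3
        target = extract_target(image, background, target, multiplier)  # Step 4
        recon = reconstruct(background, target)               # Step 5
        multiplier = update_multiplier(
            multiplier, image, background, target)            # Step 6
    return target                                             # Step 7


# Toy stand-ins: background is the image clipped at 1.0, target is the rest.
result = run_unfolding(
    image=[[1.0, 1.0, 5.0]],
    stages=3,
    estimate_background=lambda x, t, lam, mem: (
        [[min(v, 1.0) for v in row] for row in x], mem),
    extract_target=lambda d, b, t, lam: [
        [max(dv - bv, 0.0) for dv, bv in zip(dr, br)] for dr, br in zip(d, b)],
    reconstruct=lambda b, t: [
        [bv + tv for bv, tv in zip(br, tr)] for br, tr in zip(b, t)],
    update_multiplier=lambda lam, d, b, t: [
        [lv + 0.5 * (dv - bv - tv) for lv, dv, bv, tv in zip(lr, dr, br, tr)]
        for lr, dr, br, tr in zip(lam, d, b, t)],
)
```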
[0124] The LST-RPCANet deep unfolded network is trained end-to-end with a hybrid loss function. The optimizer is Adam, with an initial learning rate of [missing value]. A polynomial decay strategy is adopted, with a decay factor of 0.9, a batch size of 8, and 400 training epochs.
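A polynomial decay schedule of the kind mentioned above can be sketched as follows. Two assumptions are made: the decay factor 0.9 is interpreted as the exponent of the polynomial, and the schedule is stepped once per epoch. The initial learning rate is not given in this text, so `base_lr` is a hypothetical parameter and the value 1.0 below is only a placeholder for illustration.

```python
def poly_lr(base_lr, epoch, total_epochs, power=0.9):
    """Polynomial decay: base_lr * (1 - epoch / total_epochs) ** power."""
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    return base_lr * (1.0 - epoch / total_epochs) ** power


# Placeholder base_lr of 1.0, so the values are relative scaling factors.
schedule = [poly_lr(1.0, epoch, total_epochs=400) for epoch in (0, 100, 200, 400)]
```

The factor starts at 1, decreases monotonically, and reaches 0 at the final epoch.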
[0125] This invention employs a hybrid loss function that incorporates segmentation, reconstruction, and constraint components for end-to-end training of LST-RPCANet. The expanded form of this hybrid loss function is as follows: [formula missing]. The symbols in the formula denote, respectively: the batch size; the total number of pixels in a single image; TP, FP, and FN, the numbers of true-positive, false-positive, and false-negative pixels; the reconstructed image and the original input image; the sparse transform basis and its inverse transform basis; the extracted target feature matrix; and the penalty weight parameter.
[0126] The expansion formula consists of three parts: the first term is the SoftIoU segmentation loss used to supervise target segmentation extraction (TP, FP, and FN are the number of pixels of true positives, false positives, and false negatives, respectively).
[0127] The second term is the mean squared error (MSE) reconstruction loss, which measures the fidelity of image reconstruction as the mean squared error between the final output reconstructed image and the original input image.
[0128] The third term is an orthogonal constraint loss, which ensures that the sparse transformation basis and its inverse transformation basis in the learnable sparse transformation target extraction module satisfy the orthogonality constraint, so that the sparse transformation learned by the network is mathematically valid.
[0129] This invention transforms the RPCA and ADMM optimization algorithms into forward-propagation neural network stages, which not only gives each layer of features a clear physical meaning but also achieves detection performance that surpasses existing large-scale data-driven networks with an extremely low number of parameters. It is therefore well suited to efficient, lightweight deployment on resource-constrained edge devices.
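The three-term structure of the hybrid loss can be sketched as follows. Since the original formula is not reproduced in this text, every concrete form here is an assumption: the segmentation term uses a common soft-IoU form, the constraint term is modeled as the round-trip error of applying the transform and then its inverse to the target features, and the weights `w_rec` and `w_con` as well as all function names are hypothetical. Inputs are flat lists of floats for simplicity.

```python
def soft_iou_loss(pred, mask, eps=1e-6):
    """1 - soft IoU, with intersection and union computed from probabilities."""
    inter = sum(p * m for p, m in zip(pred, mask))
    union = sum(pred) + sum(mask) - inter
    return 1.0 - (inter + eps) / (union + eps)


def mse_loss(recon, image):
    """Mean squared error between reconstruction and input image."""
    return sum((r - d) ** 2 for r, d in zip(recon, image)) / len(image)


def round_trip_loss(transform, inverse, features):
    """Penalize inverse(transform(x)) drifting away from x (assumed form)."""
    restored = inverse(transform(features))
    return sum((r - f) ** 2 for r, f in zip(restored, features)) / len(features)


def hybrid_loss(pred, mask, recon, image, transform, inverse, features,
                w_rec=1.0, w_con=1.0):
    return (soft_iou_loss(pred, mask)
            + w_rec * mse_loss(recon, image)
            + w_con * round_trip_loss(transform, inverse, features))


# A perfect prediction, a perfect reconstruction, and an exactly invertible
# toy transform should give zero loss.
perfect = hybrid_loss(
    pred=[1.0, 0.0], mask=[1.0, 0.0],
    recon=[0.2, 0.8], image=[0.2, 0.8],
    transform=lambda xs: [2.0 * x for x in xs],
    inverse=lambda xs: [x / 2.0 for x in xs],
    features=[0.5, 3.0],
)
worst_iou = soft_iou_loss([0.0, 1.0], [1.0, 0.0])
```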
[0130] Furthermore, to verify the effectiveness of the method proposed in this invention, the following experimental procedures are presented. All experiments were performed on a computing server equipped with a single NVIDIA RTX 4000 Ada GPU. The software framework is implemented based on PyTorch.
[0131] This invention conducts experiments on three publicly available infrared small target detection benchmark datasets: SIRST V1, NUDT-SIRST, and IRSTD-1K datasets. During visualization, this invention selects XDU28, XDU50, XDU442, and XDU673 from the IRSTD-1K dataset, and 000492, 000523, 000908, and 001303 from the NUDT-SIRST dataset.
[0132] These datasets cover a diverse range of real-world and synthetic infrared imaging scenarios. During the model optimization phase, the Adam optimizer was used, with an initial learning rate set to [value missing]. A polynomial decay strategy was introduced to dynamically adjust the learning rate, ensuring stable training and convergence to the optimal solution. The training batch size was set to 8, and the total training epochs were set to 400. Considering computational efficiency and fair comparison, the spatial resolution of the input images was uniformly adjusted to 256×256 in the experiments.
[0133] The experiment selected four quantitative indicators widely used in this field for comprehensive evaluation.
[0134] For semantic segmentation accuracy, the mean Intersection over Union (mIoU) and the F1-score are used, mainly to evaluate the overlap quality between the predicted mask and the ground-truth label. For target detection accuracy, the probability of detection (Pd) and the false alarm rate (Fa) are used to quantify the model's ability to suppress background false alarms while successfully capturing the target. In addition, a receiver operating characteristic (ROC) curve is introduced to show the trade-off between the detection rate and the false alarm rate.
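The pixel-level metrics can be sketched from the TP, FP, and FN counts as follows. The definitions below follow conventions commonly used in infrared small target detection and are assumptions with respect to this document: IoU is TP / (TP + FP + FN), F1 is 2TP / (2TP + FP + FN), and Fa is the number of false-alarm pixels divided by the total number of pixels. Pd is a target-level quantity (detected targets divided by total targets) that requires matching predicted regions to labeled targets, so it appears here only as a ratio of counts supplied by the caller. The function names are hypothetical.

```python
def pixel_metrics(pred, truth):
    """pred, truth: flat binary masks (0 or 1) of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    denom = tp + fp + fn
    iou = tp / denom if denom else 1.0           # both masks empty: perfect
    f1 = 2 * tp / (2 * tp + fp + fn) if denom else 1.0
    fa = fp / len(pred)                          # false-alarm pixels per pixel
    return {"iou": iou, "f1": f1, "fa": fa}


def detection_probability(detected_targets, total_targets):
    """Target-level Pd from counts produced by a separate matching step."""
    return detected_targets / total_targets if total_targets else 1.0


metrics = pixel_metrics(pred=[1, 1, 0, 0], truth=[1, 0, 1, 0])
pd = detection_probability(detected_targets=3, total_targets=4)
```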
[0135] To fully verify the superiority of the method of this invention, representative infrared small target detection algorithms from three major categories (traditional model-driven methods, data-driven deep learning methods, and deep unfolding methods) were selected as comparison methods in the experiment. Traditional model-driven methods include: Top-Hat filtering based on morphological filtering principles; the Multiscale Patch-based Contrast Measure (MPCM) and the High-Boost-based Multiscale Local Contrast Measure (HBMLCM), based on the characteristics of the human visual system; and the Infrared Patch-Image (IPI) model and the Partial Sum of the Tensor Nuclear Norm (PSTNN) model, based on low-rank sparse optimization theory and relying on hand-designed physical priors.
[0136] Data-driven deep learning methods include: the Asymmetric Contextual Modulation (ACM) network, which enhances local features using attention mechanisms; the Receptive-Field and Direction-Induced Attention Network (RDIAN), which introduces multi-directional guided attention; the Selective Kernel Network (L2SKNet), which adaptively adjusts the receptive field using selective convolutional kernels; the Spatial-Channel Cross Transformer Network (SCTransNet), which incorporates the Transformer architecture; the Dense Nested Attention Network (DNANet), designed to maintain high-resolution features; the Infrared Small Target Detection U-Net (ISTDU-Net); and U-Net in U-Net (UIUNet).
[0137] Deep unfolding methods include RPCANet, the basic model that maps the traditional robust principal component analysis algorithm to a deep network, and the dynamic robust principal component analysis unfolding network (DRPCANet), which introduces a dynamic supernetwork to generate adaptive weights.
[0138] Table 1. Performance comparison of different methods on SIRST V1, NUDT-SIRST, and IRSTD-1K datasets.
[0139] Experimental results show that the proposed method LST-RPCANet(Ours) demonstrates significant detection advantages on the SIRST V1, NUDT-SIRST, and IRSTD-1K datasets. As shown in Table 1, the proposed method ranks first in both mIoU and F1 scores on the SIRST V1, NUDT-SIRST, and IRSTD-1K datasets.
[0140] This consistent superiority demonstrates the capability of the deep unfolding mechanism and feature purification strategy of this invention. Furthermore, LST-RPCANet achieves this performance with only approximately 1.10M parameters, showing high computational efficiency.
[0141] As shown in Figure 6, in the comparison of ROC curves, the ROC curve of LST-RPCANet rises significantly more steeply towards the upper left corner (i.e., the direction of high detection rate and low false alarm rate) than those of the other methods evaluated. This indicates that it achieves a higher detection probability Pd at a given false alarm rate Fa, highlighting the model's enhanced robustness to background clutter and its high sensitivity.
[0142] As shown in Figure 7, compared with traditional methods and contemporary deep learning methods such as ACM, ALCNet, DNANet, UIUNet, and RDIAN, the LST-RPCANet proposed in this invention significantly improves the robustness and discriminative ability of target detection under complex background interference, such as thick clouds and strong sea surface clutter, and in low signal-to-noise ratio scenarios.
[0143] In the visualization comparison, LST-RPCANet can more accurately separate weak targets and completely suppress background noise, verifying the effectiveness of the multi-channel attention supervision mechanism MCASTM for background purification and the learnable sparse transformation LSTEM for target extraction.
[0144] To address the shortcomings of existing infrared small target detection technologies (especially existing deep unfolded networks), such as the cascading accumulation of background estimation errors, the poor adaptability of manually fixed sparse priors, and the insufficient refinement of feature fusion in image reconstruction, this invention constructs a deep network inspired by robust principal component analysis (RPCA) and the iterations of the alternating direction method of multipliers (ADMM). This breaks through the limitations of traditional fixed operators, effectively prevents the cross-stage diffusion of background errors, and achieves a good balance between detection accuracy, efficiency, and physical interpretability. Compared to traditional methods, this invention has significant advantages in the following aspects.
[0145] 1. This invention introduces a multi-channel attention-supervised transmission enhancement module to address the problem of background error accumulation.
[0146] To address the issue of background clutter remnants being amplified at each stage in existing unfolded networks, this invention introduces a novel MCASTM module in the background estimation stage. This module receives deep background memory features from the previous stage and concatenates them with current shallow features, then calculates the response weights of each channel using a channel attention supervisor. Unlike conventional feature addition, this module strictly executes weight-based channel sorting and truncation, actively discarding low-response interfering channels and retaining only the dominant background structure. This mechanism effectively prevents the cross-stage propagation of high-frequency errors, achieving the purification of background features.
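The weight-based channel sorting and truncation performed by the MCASTM module can be sketched as follows. This illustration assumes that the channel weights have already been produced by the channel attention supervisor, that a fixed number of channels is kept, and that the kept channels are passed on without rescaling; the name `sort_and_truncate` and the parameter `keep` are hypothetical.

```python
def sort_and_truncate(channels, weights, keep):
    """Keep the `keep` channels with the largest attention weights.

    Returns the kept channels in descending weight order, together with
    their original indices.
    """
    if not 0 < keep <= len(channels):
        raise ValueError("keep must be between 1 and the number of channels")
    # Sort channel indices by weight, largest first (ties keep input order).
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = order[:keep]                       # discard low-response channels
    return [channels[i] for i in kept], kept


channels = ["ch0", "ch1", "ch2", "ch3"]       # stand-ins for feature maps
weights = [0.2, 0.9, 0.5, 0.1]
memory, kept_indices = sort_and_truncate(channels, weights, keep=2)
```

Unlike feature addition, which mixes every channel into the next stage, this selection drops the low-weight channels entirely, so whatever clutter they carry is not propagated further.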
[0147] 2. This invention constructs a learnable sparse transformation target extraction module, which addresses the problem of poor adaptability with fixed priors.
[0148] To address the failure of traditional prior methods caused by the variable morphology of small infrared targets, this invention designs a learnable sparse transform target extraction module. This module improves upon the soft-thresholding step of the classic Iterative Shrinkage-Thresholding Algorithm (ISTA). To overcome the traditional model's reliance on fixed, hand-crafted operators in the image domain (such as sparse transform norms based on wavelet or DCT transforms), this design uses multi-layer convolutional networks with a symmetrical structure to replace the fixed sparse transform basis and inverse transform basis, and introduces a parameterized soft-thresholding function. The design maps the target extraction process from the image domain to the feature transform domain and adaptively learns the optimal sparse representation basis, thereby achieving accurate separation of weak targets under extremely low signal-to-noise ratios.
[0149] 3. This invention applies a dynamic attention-guided feature enhancement and multiplier update mechanism to address the problem of low reconstruction fidelity.
[0150] To address the issue that simple linear addition cannot handle complex feature artifacts, this invention constructs a Dynamic Attention-Guided Feature Enhancement Module (DAGFEM) and a multiplier update module in the image reconstruction stage. DAGFEM abandons the traditional serial attention structure and employs a dual-branch parallel architecture: one branch uses channel attention to filter high-level semantic noise, while the other branch dynamically generates convolutional kernels from the input features to perform dynamic spatial attention. The combination of the two effectively captures complex, spatially varying background patterns. Subsequently, the multiplier update module performs standard gradient ascent to update the Lagrange multipliers. Together, these significantly improve the fidelity of the target-background fusion reconstruction and accelerate overall network convergence.
[0151] Embodiment 2. This embodiment describes a computer device including a memory and one or more processors. Executable code is stored in the memory. When the processor executes the executable code, it implements the steps of the infrared small target detection method based on deep unfolded networks and learnable sparse transforms described in Embodiment 1 above.
[0152] In this embodiment, the computer device can be any device or apparatus with data processing capabilities, and will not be described in detail here.
[0153] Embodiment 3. This embodiment describes a computer-readable storage medium storing a program that, when executed by a processor, implements the steps of the infrared small target detection method based on deep unfolded networks and learnable sparse transforms described in Embodiment 1 above.
[0154] The computer-readable storage medium can be an internal storage unit of any device or apparatus with data processing capabilities, such as a hard disk or memory, or an external storage device of any device with data processing capabilities, such as a plug-in hard disk, smart media card (SMC), SD card, flash card, etc.
[0155] Of course, the above description is only a preferred embodiment of the present invention. The present invention is not limited to the above-described embodiments. It should be noted that any equivalent substitutions or obvious modifications made by those skilled in the art under the guidance of this specification fall within the scope of this specification and should be protected by the present invention.
Claims
1. A method for detecting small infrared targets based on deep unfolded networks and learnable sparse transforms, characterized in that it includes the following steps: Step 1. Input an infrared image as the original input image, set the initial sparse target component and the initial Lagrange multiplier, both of which are zero matrices, and simultaneously set the initial deep background memory feature; Step 2. Construct a deep network consisting of cascaded stages to iteratively process the input infrared image, wherein steps 3 through 6 are repeated for each cascaded stage from the first to the last; Step 3. Using the attention residual background estimation module, based on the reconstructed image, the sparse target component, the Lagrange multiplier, and the deep background memory feature of the previous stage, estimate the low-rank background component of the current stage and output the updated deep background memory feature; Step 4. Using the learnable sparse transform target extraction module, based on the low-rank background component of the current stage together with the variables passed from the previous stage, and combining the adaptively learned sparse transformation basis and step size parameter, calculate the sparse target component of the current stage; Step 5. Using the attention-guided image reconstruction module, fuse the low-rank background component and the sparse target component of the current stage to obtain the reconstructed image of the current stage, and use it as input for the next stage; Step 6. Using the multiplier update module, according to the original input image, the low-rank background component of the current stage, and the sparse target component of the current stage, calculate the Lagrange multiplier of the current stage and use it as input for the next stage; Step 7. After the iteration is complete, use the sparse target component from the last stage as the final target detection result image.
2. The infrared small target detection method based on deep unfolded networks and learnable sparse transforms according to claim 1, characterized in that, in step 3, the processing flow of the attention residual background estimation module is as follows: first, the background residual input is calculated from the variables of the previous stage according to the following formula: [formula missing]; subsequently, a shallow feature extraction unit processes the background residual input according to the following formula: [formula missing], in which the result is the shallow background feature extracted at the current stage and the operators denote a convolution operation and a residual block operation, respectively; to overcome the cumulative amplification of errors across the cascaded stages, the shallow background feature and the deep background memory feature transmitted from the previous stage are both fed into the Multichannel Attention Supervision Transmission Enhancement Module (MCASTM); after feature enhancement and channel purification by the MCASTM module, the updated deep background memory feature is output; the updated deep background memory feature, after convolution processing, is added to the shallow background feature through a residual path to obtain the output feature of the MCASTM module; finally, the feature purified and enhanced by the MCASTM module is mapped back to the image domain through a residual block and a convolution and combined with a residual connection to output the final low-rank background component of the current stage according to the following formula: [formula missing], in which the operators again denote a convolution operation and a residual block operation.
3. The infrared small target detection method based on deep unfolded networks and learnable sparse transforms according to claim 2, characterized in that the processing flow of the MCASTM module is as follows: first, the shallow background feature of the current stage and the deep background memory feature transmitted from the previous stage are concatenated along the channel dimension and then initially fused by a convolutional layer to generate a preliminary feature; the preliminary feature is fed into the channel attention supervisor, which consists of a global average pooling layer, a first fully connected layer, a ReLU activation function, a second fully connected layer, and a Sigmoid activation function connected in sequence, the Sigmoid activation function being used to strictly constrain the output values of the channel weight matrix to the preset range (0, 1); in the channel attention supervisor, global average pooling, dimensionality reduction at the first fully connected layer, ReLU activation, dimensionality increase at the second fully connected layer, and Sigmoid activation are performed sequentially to generate the channel weight matrix; a channel sorting and truncation unit then sorts the channels in descending order according to the channel weight matrix and truncates the low-response channels to generate the updated deep background memory feature, which is determined by the following relationship: [formula missing], in which the operators denote channel sorting and channel truncation, respectively.
4. The infrared small target detection method based on deep unfolded network and learnable sparse transform according to claim 1, characterized in that, in step 4, the processing flow of the learnable sparse transform target extraction module is as follows:
Step 4.1. Using the low-rank background component updated in the current stage, the sparse target component of the previous stage, and the Lagrange multiplier, perform a gradient descent step to generate the auxiliary feature map; a learnable step size parameter is introduced into the calculation of the auxiliary feature map so that the update magnitude is adjusted adaptively for the current stage, the auxiliary feature map being determined by the gradient descent relationship over these quantities;
Step 4.2. Input the auxiliary feature map into a learnable sparse transform network composed of multiple convolutional neural networks, which maps the auxiliary feature map to the sparse transform domain to yield the features in the sparse transform domain; that is, the features in the sparse transform domain equal the output of the learnable sparse transform network applied to the auxiliary feature map;
Step 4.3. Introduce a learnable shrinkage threshold parameter and perform parameterized soft thresholding on the features in the sparse transform domain; the parameterized soft thresholding operation multiplies, element by element, the sign of the transform-domain feature variable by the maximum of 0 and the difference between the absolute value of that feature variable and the shrinkage threshold parameter; the sign function extracts and retains the positive or negative sign of each element of the original feature variable; the maximum operation sets to 0 the part of the difference between the absolute value of the transform-domain feature variable and the shrinkage threshold that is less than 0, while retaining the part greater than 0;
Step 4.4. Input the features after soft thresholding into the learnable inverse transform network, which maps them back to the image domain to yield the sparse target component for the current stage; that is, the sparse target component equals the output of the learnable inverse transform network applied to the soft-thresholded feature variables in the sparse transform domain; the learnable inverse transform network is a learnable inverse sparse transform basis that is structurally symmetric to the sparse transform network.
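As a non-limiting sketch of steps 4.2 to 4.4, the transform, parameterized soft thresholding, and inverse transform might look as follows. A small matrix `basis` stands in for the learnable sparse transform network and its transpose stands in for the inverse transform network; both stand-ins, the function names, and the symbol `theta` for the shrinkage threshold are assumptions for illustration, since the claim describes convolutional networks rather than a matrix.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def soft_threshold(z, theta):
    """Element-wise sign(z) * max(|z| - theta, 0)."""
    return [sign(v) * max(abs(v) - theta, 0.0) for v in z]


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def extract_sparse_target(aux, basis, theta):
    """Map aux to the transform domain, shrink, and map back.

    The transpose is used as the inverse, which is exact only for an
    orthogonal basis (assumption for this sketch).
    """
    z = matvec(basis, aux)                  # step 4.2: sparse transform domain
    z_shrunk = soft_threshold(z, theta)     # step 4.3: parameterized soft thresholding
    return matvec(transpose(basis), z_shrunk)  # step 4.4: back to the image domain
```

Entries whose magnitude is below `theta` are set to zero and the rest are pulled toward zero by `theta`, which is what drives the extracted component toward sparsity.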
5. The infrared small target detection method based on deep unfolded network and learnable sparse transform according to claim 1, characterized in that, in step 5, the processing flow of the attention-guided image reconstruction module is as follows:
Step 5.1. Add the low-rank background component of the current stage to the sparse target component to obtain the initial fused feature, and apply a convolution to the initial fused feature to obtain a feature that is then input into the Dynamic Attention Guided Residual Group (DAGRG); the DAGRG includes the Dynamic Attention-Guided Feature Enhancement Module (DAGFEM); within the DAGRG, the input feature is processed sequentially by convolution, batch normalization, the LReLU function, convolution, and batch normalization to obtain the intermediate feature G, which serves as the input feature of the DAGFEM; the DAGFEM employs a dual-branch structure comprising a channel attention branch and a dynamic spatial attention branch; the DAGFEM takes G as its input feature and passes it through a 3×3 convolution and a ReLU activation function to obtain an activated feature;
Step 5.2. In the channel attention branch, the feature is processed by max pooling and by average pooling respectively, and each pooled result is fed into a multilayer perceptron; the two outputs are summed and then processed by a Sigmoid activation function to generate the channel attention map;
Step 5.3. In the dynamic spatial attention branch, the input feature is fed into two parallel processing paths; one path performs channel averaging on the feature; the other path performs global average pooling on the feature and feeds the pooled feature into the kernel generator to dynamically predict the required convolution kernel; subsequently, using the convolution kernel predicted by the kernel generator, a dynamic convolution operation is performed on the channel-averaged feature of the first path, followed by processing with the Sigmoid activation function, to generate the adaptive dynamic spatial attention map;
Step 5.4. Multiply the channel attention map by the dynamic spatial attention map, weight the feature with the product, and combine the weighted feature through a residual connection to generate the feature F of the attention-guided feature enhancement module at the current stage;
Step 5.5. Within the dynamic attention-guided residual block, add the feature F of the attention-guided feature enhancement module to the DAGRG input feature through a residual connection to obtain the output feature of the block; the dynamic attention-guided residual block operation is repeated to obtain the output feature of the stacked blocks;
Step 5.6. Apply a 3×3 convolution to the output feature of the stacked blocks, add the result to the DAGRG input feature through a residual connection to obtain the output feature of the DAGRG, and pass that output feature through a 1×1 convolution to obtain the high-fidelity reconstructed image of the current stage; that is, the reconstructed image equals the 1×1 convolution applied to the sum of the 3×3-convolved feature and the DAGRG input feature.
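For illustration, the channel attention branch of step 5.2 and the attention weighting of step 5.4 might be sketched as below. The sketch assumes that a single multilayer perceptron (weights `w1`, `w2`) is shared by the max-pooled and average-pooled inputs, and that the residual connection of step 5.4 simply adds the weighted feature back to the unweighted feature; neither detail is fixed by the claim, and all names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlp(vec, w1, w2):
    """Two-layer perceptron with a ReLU in between."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """feat: C channels of H x W nested lists; returns C weights in (0, 1)."""
    flat = [[v for row in ch for v in row] for ch in feat]
    max_pooled = [max(ch) for ch in flat]            # global max pooling
    avg_pooled = [sum(ch) / len(ch) for ch in flat]  # global average pooling
    from_max = mlp(max_pooled, w1, w2)               # shared MLP (assumption)
    from_avg = mlp(avg_pooled, w1, w2)
    return [sigmoid(a + b) for a, b in zip(from_max, from_avg)]


def apply_attention(feat, chan_map, spatial_map):
    """Weight feat by chan_map[c] * spatial_map[i][j], then add feat back."""
    return [
        [
            [v + v * chan_map[c] * spatial_map[i][j] for j, v in enumerate(row)]
            for i, row in enumerate(ch)
        ]
        for c, ch in enumerate(feat)
    ]
```

Here `spatial_map` would be the output of the dynamic spatial attention branch of step 5.3; it is passed in as a plain H x W list so that the two branches can be examined separately.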
6. The infrared small target detection method based on deep unfolded network and learnable sparse transform according to claim 5, characterized in that the kernel generator includes a network architecture for dynamically predicting adaptive convolution kernels; the network architecture receives the features processed by global average pooling and includes a 1×1 convolutional layer, a ReLU activation function, and another 1×1 convolutional layer connected in sequence to output features of a preset dimension.
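A minimal sketch of the kernel generator of claim 6, together with the dynamic convolution of step 5.3 that consumes its output, is given below. Because the generator acts on globally pooled features, each 1×1 convolution reduces to a matrix product over the channel vector. The kernel size `k`, the zero padding at the borders, the reshaping of the preset output dimension into a `k` x `k` kernel, and all names are assumptions made for this example.

```python
import math


def generate_kernel(pooled, w1, w2, k):
    """1x1 conv -> ReLU -> 1x1 conv on a pooled C-vector, reshaped to k x k."""
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    flat = [sum(w * h for w, h in zip(row, hidden)) for row in w2]  # k*k values
    return [flat[i * k:(i + 1) * k] for i in range(k)]


def dynamic_spatial_attention(feat, w1, w2, k=3):
    """feat: C channels of H x W nested lists; returns an H x W map in (0, 1)."""
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    # Path 1: average over the channel dimension.
    mean_map = [
        [sum(feat[c][i][j] for c in range(channels)) / channels for j in range(width)]
        for i in range(height)
    ]
    # Path 2: global average pooling, then the kernel generator.
    pooled = [sum(sum(row) for row in ch) / (height * width) for ch in feat]
    kernel = generate_kernel(pooled, w1, w2, k)
    pad = k // 2
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < height and 0 <= x < width:  # zero padding (assumption)
                        acc += kernel[di][dj] * mean_map[y][x]
            row.append(1.0 / (1.0 + math.exp(-acc)))  # Sigmoid
        out.append(row)
    return out
```

The kernel is recomputed from each input, so the spatial filter adapts to the content of the feature rather than being a fixed learned weight.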
7. The infrared small target detection method based on deep unfolded network and learnable sparse transform according to claim 1, characterized in that, in step 6, the processing procedure of the multiplier update module is as follows:
Step 6.1. Obtain the original input image, the low-rank background component of the current stage, and the sparse target component, and calculate the image reconstruction residual from them;
Step 6.2. Combine the image reconstruction residual with the fixed penalty parameter and the Lagrange multiplier of the previous stage to calculate the Lagrange multiplier for the current stage; that is, the Lagrange multiplier for the current stage is determined from the Lagrange multiplier of the previous stage, the fixed penalty parameter, and the image reconstruction residual.
8. The infrared small target detection method based on deep unfolded network and learnable sparse transform according to claim 1, characterized in that the deep unfolded network LST-RPCANet is trained end-to-end using a hybrid loss function; the hybrid loss function is the sum of three terms: the first term is the SoftIoU segmentation loss used to supervise target segmentation extraction, the second term is the mean square error (MSE) reconstruction loss used to measure the fidelity of image reconstruction, and the third term is the orthogonal constraint loss used to ensure that the sparse transform basis and its inverse transform basis in the learnable sparse transform target extraction module satisfy strict mathematical validity; the quantities appearing in the hybrid loss function are the batch size; the total number of pixels in a single image; TP, FP, and FN, which represent the numbers of true positive, false positive, and false negative pixels, respectively; the reconstructed image and the original input image; the sparse transform basis and its inverse transform basis; the extracted target feature matrix; and the penalty weight parameters.
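The multiplier update of claim 7 and the three-term hybrid loss of claim 8 might be sketched as follows, purely as an example. The additive update with residual "image minus background minus target" follows the usual augmented-Lagrangian convention and is an assumption, as are the soft (probability-weighted) definitions of TP, FP, and FN, the round-trip form of the orthogonal constraint, the weights `w_mse` and `w_orth`, and the use of flattened per-image lists without batch averaging.

```python
def update_multiplier(lam_prev, image, background, target, mu):
    """lam_prev + mu * (image - background - target), element-wise (assumed sign)."""
    return [
        l + mu * (d - b - t)
        for l, d, b, t in zip(lam_prev, image, background, target)
    ]


def soft_iou_loss(pred, mask, eps=1e-6):
    """1 - TP / (TP + FP + FN) with soft counts from probabilities in [0, 1]."""
    tp = sum(p * g for p, g in zip(pred, mask))
    fp = sum(p * (1.0 - g) for p, g in zip(pred, mask))
    fn = sum((1.0 - p) * g for p, g in zip(pred, mask))
    return 1.0 - tp / (tp + fp + fn + eps)


def mse_loss(recon, image):
    """Mean squared error between reconstructed and original pixels."""
    return sum((r - d) ** 2 for r, d in zip(recon, image)) / len(image)


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def round_trip_loss(forward, inverse, feat):
    """Penalize inverse(forward(feat)) differing from feat (assumed form)."""
    back = matvec(inverse, matvec(forward, feat))
    return sum((b - f) ** 2 for b, f in zip(back, feat)) / len(feat)


def hybrid_loss(pred, mask, recon, image, forward, inverse, feat, w_mse, w_orth):
    """SoftIoU term plus weighted MSE term plus weighted orthogonality term."""
    return (
        soft_iou_loss(pred, mask)
        + w_mse * mse_loss(recon, image)
        + w_orth * round_trip_loss(forward, inverse, feat)
    )
```

When the decomposition is exact, the residual is zero and the multiplier is left unchanged, so the multiplier only accumulates where the background and target components fail to explain the input image.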
9. A computer device, comprising a memory and one or more processors, characterized in that the memory stores executable code which, when executed by the one or more processors, implements the infrared small target detection method based on deep unfolded network and learnable sparse transform as described in any one of claims 1 to 8.
10. A computer-readable storage medium having a program stored thereon, characterized in that the program, when executed by a processor, implements the infrared small target detection method based on deep unfolded network and learnable sparse transform as described in any one of claims 1 to 8.