An infrared small target cluster sub-pixel positioning method based on physical structure prior and deep unfolding network

By introducing the physical structure prior of the infrared small target imaging mechanism and the exponential decay sparse mapping constraint into the deep unfolded network, the problem of inaccurate localization of infrared small target clusters in complex backgrounds in the prior art is solved, and more stable and accurate sub-pixel level localization is achieved.

CN122243934A · Pending · Publication Date: 2026-06-19 · PLA PEOPLES LIBERATION ARMY OF CHINA STRATEGIC SUPPORT FORCE AEROSPACE ENG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
PLA PEOPLES LIBERATION ARMY OF CHINA STRATEGIC SUPPORT FORCE AEROSPACE ENG UNIV
Filing Date
2026-03-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to achieve sub-pixel-level precise positioning when dealing with clusters of small infrared targets, especially in complex backgrounds. Furthermore, existing deep unfolding network methods do not fully consider the physical structural characteristics of the real infrared imaging process, resulting in insufficient stability and robustness in complex backgrounds.

Method used

By introducing a physical structure prior that conforms to the imaging mechanism of small infrared targets, an exponential decay sparse mapping constraint is constructed in the deep unfolded network. Combined with the data consistency iteration process, a multi-module neural network structure is constructed to gradually restore the high-resolution distribution of small targets in spatial proximity.

Benefits of technology

It improves the stability and robustness of the network in complex backgrounds, achieves more accurate sub-pixel localization, reduces the impact of noise, and enhances the network's generalization ability and localization accuracy under different training data scales.

Abstract

This invention discloses a sub-pixel localization method for infrared small target clusters based on physical structure priors and a deep unfolded network. It belongs to the field of sub-pixel localization and addresses the problem of improving the sub-pixel localization accuracy of infrared small targets. The method includes: fusing a low-resolution target image, obtained by applying a point spread function to targets against a clean background, with a real infrared background image to form a feature; mapping the real positions of the infrared small targets to a sub-pixel grid coordinate system to form a label, and constructing a dataset from the resulting sample pairs; solving the linear mapping of the sample pairs by the least squares method to construct a linear mapping model and generate a high-resolution initial estimated image; mapping the reconstruction process of the low-resolution observation image to a multi-module neural network structure based on the iterative unfolding idea to construct a deep unfolded network; performing forward propagation on the initial estimated image and applying a loss function to obtain the target network; and inputting the test set into the target network to evaluate localization performance. The invention improves the sub-pixel localization accuracy of infrared small targets.

Description

Technical Field

[0001] This invention belongs to the field of infrared small target cluster sub-pixel localization technology, and relates to an infrared small target cluster sub-pixel localization method based on physical structure prior and deep unfolded network.

Background Technology

[0002] Infrared imaging technology, with its passive sensing capability of target thermal radiation and stable all-weather, all-time operation, has shown broad application prospects in fields such as security monitoring and environmental monitoring. This technology can achieve high-sensitivity detection without relying on external light sources, providing crucial support for early warning and situational analysis in complex scenarios. However, under long-range imaging conditions, targets typically occupy only a very small number of pixels on the detector's focal plane, appearing as weak point structures, known as small infrared targets, or simply targets. When multiple targets are densely or adjacently distributed in space, their radiation signals are affected by the point spread function (PSF) of the optical system during imaging, resulting in spatial diffusion and superposition. This ultimately forms one or more indistinguishable composite bright spots on the image plane, making it difficult to accurately resolve key information such as the number of targets, sub-pixel-level positions, and radiation intensity.
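
The superposition described above can be written compactly. The following is only a sketch under assumptions that the source does not state: an isotropic Gaussian PSF of width \sigma, N targets with amplitudes a_i at continuous positions (x_i, y_i), a background term b, and a noise term n.

```latex
y(u, v) = \sum_{i=1}^{N} a_i \, h(u - x_i,\, v - y_i) + b(u, v) + n(u, v),
\qquad
h(u, v) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{u^2 + v^2}{2\sigma^2}\right)
```

Under this Gaussian assumption, two equal-amplitude targets separated by no more than 2\sigma sum to a single peak, which is why the composite spot cannot be resolved by simply locating maxima.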

[0003] For the detection and localization of small infrared targets, existing technologies mainly focus on scenarios involving single targets or sparsely distributed multiple targets. The relevant methods can be categorized into the following three types:

[0004] The first is the traditional approach based on background modeling and filtering, such as Top-Hat filtering and high-pass filtering. These methods highlight the target response by suppressing the slowly varying components of the background. They have a simple structure and low computational cost, but they are prone to high false alarm rates in real-world scenes with background interference such as complex clouds, ground features, or sea clutter, and their ability to separate spatially adjacent targets is limited.

[0005] The second is the optimization-based low-rank and sparse decomposition approach. These methods typically model an infrared image as the sum of a low-rank background component and a sparse target component, and solve the resulting problem through numerical optimization. While they have certain advantages in background modeling, they generally assume that targets appear as independent sparse point sources on the pixel grid, making it difficult to accurately describe the aliasing of multiple targets at the sub-pixel scale. Furthermore, the solution process has high computational complexity, making it difficult to meet the real-time requirements of practical engineering applications.
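
For orientation, a commonly used form of such a decomposition is sketched below. The source does not give a formula, so the symbols are assumptions: D is the observed image (or a matrix built from its patches), B the low-rank background, T the sparse target component, \|\cdot\|_{*} the nuclear norm, and \lambda a weighting parameter.

```latex
\min_{B,\,T} \; \|B\|_{*} + \lambda \|T\|_{1}
\quad \text{s.t.} \quad D = B + T
```

The \ell_1 term acts entry by entry on the pixel grid, which matches the limitation noted above: the model has no notion of several targets sharing one pixel.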

[0006] Thirdly, there are end-to-end detection methods based on deep learning. These methods typically utilize convolutional neural networks to directly output binary detection results or heatmaps of targets, demonstrating strong feature representation capabilities in various scenarios. However, due to the prevalent downsampling operation in the network structure, their output resolution is usually limited by the pixel scale of the input image, making it difficult to achieve sub-pixel-level accurate localization. This is especially problematic in spatially dense target scenes, where localization errors or target fusion phenomena are prone to occur.

[0007] It should be noted that the above methods are mainly designed for single-target or sparsely distributed multi-target detection tasks. When facing clusters of small infrared targets that are densely distributed in space, such as drone formations or unmanned vehicle formations, the strong aliasing of signals between targets poses significant challenges to the above methods in terms of target separation, accurate positioning, and intensity estimation.

[0008] To address the issues of unmixing and super-resolution localization of small infrared target clusters, a class of research methods based on deep unfolded networks has emerged in recent years, with the Dynamic Iterative Soft-Thresholding Network (DISTA-Net) being a typical example.

[0009] For example, application number 202510644028.7, publication number CN120543378A, invention title: A super-resolution method for infrared long-range spatial proximity targets. This type of method is based on unfolding the iterative soft thresholding algorithm (ISTA): it maps the traditional iterative optimization process into a neural network structure consisting of multiple cascaded stages, and optimizes the iteration parameters through learning.
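
To make the unfolding idea concrete, the sketch below runs plain ISTA on a toy sparse recovery problem. It is an illustration only and is not taken from the cited application: the observation matrix `A`, the problem sizes, and the weight `lam` are hypothetical. In an unfolded network, each pass through the loop body would become one stage whose step size and threshold are learned instead of fixed.

```python
import numpy as np

def soft_threshold(v, theta):
    """Element-wise soft-thresholding: shrink magnitudes by theta, zero the rest."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(A, y, lam, num_iters=200):
    """Minimise 0.5*||A x - y||^2 + lam*||x||_1 with plain ISTA."""
    # Step size 1/L, where L is the largest eigenvalue of A^T A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - y)              # gradient of the data term
        x = soft_threshold(x - step * grad, lam * step)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))             # hypothetical observation matrix
x_true = np.zeros(60)
x_true[[5, 40]] = [1.0, -0.8]                 # two-sparse ground truth
y = A @ x_true
x_hat = ista(A, y, lam=0.05, num_iters=500)
```

Note that the threshold passed to `soft_threshold` is `lam * step`, so in classical ISTA the shrinkage strength is already tied to the step size of the data term.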

[0010] Specifically, DISTA-Net introduces a dynamic transformation module in each unfolding stage to generate transformation weights based on the intermediate results of the previous stage, thereby enhancing feature representation capabilities. Simultaneously, it introduces a dynamic soft thresholding module to adaptively generate shrinkage threshold parameters based on the spatial context of the input features, thus overcoming the problem of a fixed threshold in traditional ISTA-Net. This approach demonstrates its feasibility in target unmixing and localization tasks on the simulation dataset CSIST-100K.

[0011] However, the aforementioned deep unfolding network methods are still mainly based on sparse shrinkage operators in the form of soft thresholding. Their shrinkage behavior is essentially an empirical design and does not explicitly incorporate the physical structural characteristics of small infrared targets during imaging. Furthermore, their performance evaluation relies primarily on simulation datasets with clean backgrounds and has not fully considered the impact of complex real-world infrared backgrounds on network stability and generalization ability. Therefore, under complex backgrounds and limited training data, there is still room to improve the localization accuracy of these methods under strict sub-pixel criteria.

[0012] Although dynamic deep unfolding networks, represented by DISTA-Net, have made some progress in the unmixing and localization of infrared small target clusters, their methodology still has the following shortcomings from the perspective of engineering applications and consistency with imaging mechanisms:

[0013] 1. Existing methods primarily rely on simulation datasets with clean backgrounds for training and validation during network development and performance evaluation, failing to adequately consider the complex and structured background interference commonly present in real infrared scenes. Performance improvements obtained under idealized background conditions are difficult to directly transfer to practical applications, and the stability and robustness of the network model in complex backgrounds remain uncertain.

[0014] 2. Although existing dynamic deep unfolding networks introduce a parameter dynamics mechanism, the generation of relevant parameters still mainly relies on data-driven learning and lacks an explicit correspondence with the physical structure of infrared small target imaging. Taking the dynamic soft thresholding module as an example, its threshold parameters are usually adaptively predicted by the neural network based on the feature map. However, this prediction process does not explicitly incorporate physical priors such as the point spread function (PSF) or point source energy distribution, making the contraction behavior still somewhat empirical in a physical sense. Under complex noise conditions, it is easy to introduce unexplainable response artifacts.

[0015] 3. Existing methods generally employ soft-thresholding shrinkage operators to impose sparsity constraints on intermediate features. These shrinkage functions have a simple structure: they only truncate feature amplitudes with a linear threshold, making it difficult to characterize the energy distribution of small infrared targets, which gradually attenuates from the center outwards during imaging. When the target signal is near the threshold, soft-thresholding operators tend to suppress real weak targets and noise components equally, which degrades performance under strict positioning accuracy requirements.

[0016] 4. Existing dynamic threshold or dynamic parameter generation mechanisms typically lack structural coupling with the data consistency update process in deep unfolded networks. Changes in threshold parameters do not explicitly consider the impact of iteration stages or optimization step sizes, resulting in a lack of a unified control mechanism for sparse constraint strength across different iteration stages, which to some extent limits the stability of the iteration process.

[0017] 5. Due to the lack of clear physical structure constraints, existing methods are quite sensitive to data scale during training and are prone to overfitting. The generalization ability and engineering applicability of the network model still need to be improved.

Summary of the Invention

[0018] To address the shortcomings of existing technologies in terms of adaptability to realistic and complex backgrounds, sparse contraction representation capabilities, and training stability, this invention provides a sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks. This method introduces physical structure priors consistent with the imaging mechanism of infrared small targets into the deep unfolded network framework and combines this with a data consistency iterative process for collaborative design, aiming to achieve the following objectives:

[0019] (1) Constructing a training and evaluation benchmark that closely approximates real imaging conditions:

[0020] Multiple small infrared targets against a clean background are aliased using a point spread function to obtain a low-resolution target image in a low-resolution pixel coordinate system. Following the principle of energy superposition, the low-resolution target image is fused with the real infrared background image to obtain a low-resolution observation image as a feature. The real position of the low-resolution target image is mapped to the sub-pixel position in the high-resolution image to obtain a high-resolution target image as a label. Sample pairs composed of features and corresponding labels are used to construct an infrared small target cluster dataset containing complex background interference. This allows the network model to face challenges that are more in line with actual application conditions during the research and development stage, thereby improving its engineering practicality.

[0021] (2) Introduce an exponential decay sparse mapping constraint that conforms to the physical imaging characteristics:

[0022] During the iteration of the deep unfolded network, the traditional empirical soft threshold contraction is extended to a sparse mapping form with exponential decay, so that the constraint process can better characterize the energy distribution characteristics of infrared small targets that gradually decay from the center to the periphery, thereby more effectively preserving weak real targets while suppressing noise.

[0023] (3) Construct a structural coupling mechanism between the constraint parameters of the exponentially decaying sparse mapping and the iterative state:

[0024] By associating key parameters in the exponentially decaying sparse map with the iterative state of the data consistency update process in the deep unfolded network, the sparsity constraint strength can be adaptively adjusted with each iteration stage, thereby enhancing the stability and consistency of the overall optimization process.

[0025] (4) Improve the generalization ability and stability of the network model under different data scale conditions:

[0026] By introducing physical structure priors independent of data distribution, deterministic constraints are imposed on the network optimization process, effectively alleviating the overfitting problem of the network model under small sample or complex background conditions, and enabling the network model to maintain relatively stable localization performance under different training data scales.
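
The contrast drawn in objective (2) between soft-threshold shrinkage and an exponentially decaying sparse mapping can be illustrated numerically. The text here does not give the invention's mapping function, so `exp_decay_shrink` below is a hypothetical stand-in chosen only because its relative attenuation, exp(-|x|/tau), decays exponentially with amplitude. The parameter values are likewise assumptions.

```python
import math

def soft_threshold(x, theta):
    """Classical soft-thresholding for a scalar response."""
    return math.copysign(max(abs(x) - theta, 0.0), x)

def exp_decay_shrink(x, tau):
    """Hypothetical smooth shrinkage: relative attenuation exp(-|x|/tau)."""
    return x * (1.0 - math.exp(-abs(x) / tau))

# Compare both mappings on weak, medium and strong responses.
for x in (0.2, 0.5, 1.0, 3.0):
    soft = soft_threshold(x, 0.5)
    smooth = exp_decay_shrink(x, 0.5)
    print(f"x={x:.1f}  soft={soft:.3f}  exp_decay={smooth:.3f}")
```

With these assumed settings, soft-thresholding maps every response at or below 0.5 to exactly zero, whether it is noise or a weak target, and subtracts the same 0.5 from all larger ones. The smooth variant attenuates small responses strongly without erasing them and leaves strong responses almost unchanged.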

[0027] The objective of this invention is specifically achieved through the following technical solutions:

[0028] This invention discloses a sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks, comprising:

[0029] Step 1: Following the principle of energy superposition, the real infrared background image and the low-resolution target image obtained by aliasing multiple small infrared targets against a clean background using a point spread function are fused in the low-resolution pixel coordinate system to obtain a low-resolution observation image as a feature; the real position of the low-resolution target image is mapped to the sub-pixel grid coordinate system to obtain a high-resolution target image as a label; the dataset is constructed by forming sample pairs with features and corresponding labels, and divided into training set and test set;

[0030] Step 2: Solve the linear mapping from the low-resolution observation image to the high-resolution target image for each sample pair using the least squares method, and construct a linear mapping model; perform a transformation on the features of the sample pair using the linear mapping model to generate a high-resolution initial estimated image;

[0031] Step 3: Based on the idea of iterative unfolding, the low-resolution observation image reconstruction process is mapped into a multi-module neural network structure. A deep unfolding network is constructed by sequentially cascading a data consistency update module, a feature transformation and dynamic mapping module, a constraint module obtained by introducing physical structure priors to construct an exponentially decaying sparse mapping function, and an inverse transformation and stage output module.

[0032] Step 4: Input the initial estimated images of the training set into the deep unfolded network, and sequentially call each module to perform the forward propagation process, performing data consistency update, feature transformation and dynamic mapping, exponential decay sparse mapping constraint, and inverse transformation in sequence. After the termination condition is met, output the high-resolution reconstructed image; use the loss function to constrain the network during parameter optimization and obtain the trained target network.

[0033] Step 5: After inputting the initial estimated images of the test set into the target network, the output high-resolution reconstructed image positions are mapped back to the low-resolution pixel coordinate system, and the localization performance is evaluated.
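
The coordinate mapping in Step 5 can be sketched as follows. This is a minimal illustration under stated assumptions: low-resolution pixel centres sit at integer coordinates, each pixel is split into `scale` sub-pixel cells per axis, and detections are taken as cells above a fixed `threshold`. The `hit_rate` function is a simplified stand-in for a localization score, not the evaluation metric of the invention.

```python
import numpy as np

def hr_to_lr(index, scale):
    """Centre of a sub-pixel cell, expressed in low-resolution pixel coordinates."""
    return (index + 0.5) / scale - 0.5

def localize(recon, scale, threshold):
    """Return low-resolution (row, col) coordinates of cells above threshold."""
    rows, cols = np.nonzero(recon > threshold)
    return [(hr_to_lr(r, scale), hr_to_lr(c, scale)) for r, c in zip(rows, cols)]

def hit_rate(pred, truth, tol):
    """Fraction of true targets with a prediction within tol low-res pixels."""
    hits = sum(
        any(np.hypot(pr - tr, pc - tc) <= tol for pr, pc in pred)
        for tr, tc in truth
    )
    return hits / len(truth)

# Toy reconstruction on a 33 x 33 sub-pixel grid (11 x 11 pixels, scale 3).
recon = np.zeros((33, 33))
recon[17, 16] = 0.9
recon[18, 15] = 0.7
pred = localize(recon, scale=3, threshold=0.5)
truth = [(5.2, 5.1), (5.6, 4.7)]              # hypothetical true positions
score = hit_rate(pred, truth, tol=0.25)
```

Tightening `tol` makes the criterion stricter. With `scale=3` the cell centres are a third of a pixel apart, so a target can lie up to about 0.24 pixel from the nearest centre, and tolerances below that can reject a correctly assigned cell.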

[0034] The beneficial effects of this invention are:

[0035] 1. Constructing a training and evaluation system closely resembling real infrared imaging conditions: Multiple small infrared targets against a clean background are aliased using a point spread function to obtain a low-resolution target image in a low-resolution pixel coordinate system. Following the principle of energy superposition, the low-resolution target image is fused with the real infrared background image to obtain a low-resolution observation image as a feature. The real position of the low-resolution target image is mapped to a sub-pixel grid coordinate system to obtain a high-resolution target image as a label. Sample pairs composed of features and corresponding labels are used to construct a small target cluster dataset containing complex background interference. This dataset retains accurate target annotation information while introducing structural interference factors from the real background for network training and testing. As a result, the deep unfolded network is designed and trained under imaging conditions closer to real-world application scenarios, which improves the applicability and reliability of the method in engineering applications.

[0036] 2. The linear mapping model constructed in this invention is determined by minimizing the reconstruction error on the training samples. It is used to characterize the overall mapping relationship between low-resolution observations and high-resolution target distributions. Without introducing additional computational complexity, it enables the network iteration process to start from an initial state that matches the training data distribution, which helps to improve the stability and consistency of the iterative update process.

[0037] 3. During the inference phase, for any input low-resolution observation image, a linear mapping model is used to transform it once to generate a high-resolution initial estimated image. This initial estimated image serves as the input to the first stage of the deep unfolded network, providing reasonable initial conditions for data consistency updates and sparsity constraints in subsequent stages, and providing a stable and consistent starting state for the subsequent iteration process.

[0038] 4. Construct a structural correlation mechanism between sparse constraint parameters and iterative states: Based on the idea of iterative unfolding, the low-resolution observation image reconstruction process is mapped into a multi-module neural network structure. A deep unfolded network is constructed by sequentially cascading a data consistency update module, a feature transformation and dynamic mapping module, a constraint module obtained by introducing physical structure priors to construct an exponentially decaying sparse mapping function, and an inverse transformation and stage output module. Each module corresponds to one iterative update process, and the gradual reconstruction of low-resolution observation images is achieved through stage-by-stage feature updates.

[0039] 5. During the network training phase, an end-to-end approach is used to jointly optimize the network parameters. Initial estimated images from a training set containing small target clusters with realistic and complex backgrounds are input into the deep unfolded network. The forward propagation process is executed sequentially, iteratively calling each module. This process includes data consistency updates, feature transformation and dynamic mapping, exponentially decaying sparse mapping constraints, and inverse transformation operations. By associating the exponentially decaying sparse mapping parameters with the iterative states during the data consistency update process, the constraint strength can be adaptively adjusted with each iteration stage. This constrains the difference between the network output and the corresponding high-resolution target distribution, guiding the network to gradually learn iterative mapping relationships suitable for the task and improving the stability and consistency of the multi-stage iterative mapping process.

[0040] 6. Introducing interpretable physical structure prior constraints into the deep unfolded network: In each stage of the iterative unfolded network, a constraint module is introduced by constructing an exponentially decaying sparse mapping function using a physical structure prior. This module incorporates an exponentially decaying sparse mapping constraint consistent with the imaging characteristics of infrared point targets. This structural constraint on intermediate features establishes a clear correspondence between the network's sparse adjustment behavior and the point spread function and the target energy decay pattern, thereby enhancing the network's physical interpretability. By introducing constraint modules into the deep unfolded network and embedding the target's physical structure prior into the network structure, the network's ability to distinguish between weak target responses and noise responses is improved while maintaining the interpretability of traditional iterative solutions, resulting in more stable and accurate target reconstruction.

[0041] 7. Improve network stability without significantly increasing computational complexity: While maintaining the end-to-end inference efficiency of the deep unfolded network, improve the network's performance under complex backgrounds and different training data scales through structural prior constraints, making the network output results smoother, more stable and reproducible.

[0042] In this invention, a structural constraint consistent with the infrared point target imaging mechanism is introduced into the iterative mapping process through a constraint module, exhibiting good stability and consistency under complex background conditions. Its main effects are reflected in the following aspects:

[0043] 1) Improve the consistency of results under strict localization conditions: By introducing a constraint module, the network has a smoother and more continuous adjustment capability when processing low to medium response amplitude characteristics. This reduces localization fluctuations caused by noise or unstable responses under strict localization threshold conditions, and makes the network output results show more consistent localization behavior in multiple experiments.

[0044] 2) Enhancing the network's robustness under complex backgrounds and small sample conditions: Physical structure priors, as a form of constraint independent of data distribution, effectively suppress the network's overfitting to random noise patterns in the training data. Under different training sample sizes and complex background conditions, the network performance changes more smoothly, demonstrating better generalization stability.

[0045] 3) Enhancing the interpretability and engineering usability of the network inference process: By explicitly introducing a sparse mapping structure corresponding to the imaging characteristics of infrared point targets into the deep unfolded network, the key regulatory behaviors in the network have clear physical meanings, thus enhancing the interpretability of the network inference process. While maintaining end-to-end inference efficiency, it provides a structurally clear, stable, and reliable implementation scheme for sub-pixel localization of infrared small target clusters.

[0046] 8. This invention employs a loss function to constrain the network during training, minimizing the total loss through backpropagation to progressively update learnable parameters in the network, including data consistency update step size, feature transformation parameters, and related parameters such as exponentially decaying sparse mapping. During training, the loss function constructed in this invention exhibits a small oscillation amplitude in its convergence curve, demonstrating good training stability. Evaluation results on independent test sets show that this method has stronger generalization ability and maintains high reconstruction and detection accuracy for unseen data samples.

[0047] 9. After completing network training, the trained target network model is used to infer the test data, and the localization information of the infrared small target cluster is extracted from the network output to evaluate the localization performance. During the inference phase, the initial estimated images of the test set are input into the trained target network, which outputs high-resolution reconstructed images. The reconstructed images reflect the spatial response intensity distribution of the target image in the sub-pixel grid coordinate system. Target localization information is obtained from the high-resolution output by combining the sub-pixel coordinate mapping function, and the localization performance is accurately evaluated using the average accuracy.

Attached Figure Description

[0048] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments.

[0049] Figure 1 is a comparison chart of Epoch-Loss performance between the present invention and the baseline method with 80k samples, provided in this embodiment of the invention.

[0050] Figure 2 is a comparison chart of Epoch-Val performance between the present invention and the baseline method with 80k samples, provided in this embodiment of the invention.

[0051] Figure 3 is a comparison chart of Epoch-Loss performance between the present invention and the baseline method with 40k samples, provided in this embodiment of the invention.

[0052] Figure 4 is a comparison chart of Epoch-Val performance between the present invention and the baseline method with 40k samples, provided in this embodiment of the invention.

[0053] Figure 5 is a comparison chart of Epoch-Loss performance between the present invention and the baseline method with 8k samples, provided in this embodiment of the invention.

[0054] Figure 6 is a comparison chart of Epoch-Val performance between the present invention and the baseline method with 8k samples, provided in this embodiment of the invention.

Detailed Implementation

[0055] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0056] This invention provides a sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolding networks. Under the framework of deep unfolding networks, this method unifies the linear observation relationship of infrared imaging, data consistency updates, and physical structure prior constraints, and gradually recovers the high-resolution distribution of spatially adjacent small targets through multi-stage iterative mapping.

[0057] Compared to existing methods, this invention does not achieve performance improvements by simply increasing network size or training data volume. Instead, it introduces a constraint consistent with the imaging characteristics of infrared point targets in each iteration stage, directionally adjusting the amplitude and structure of intermediate features. This improves the stability, consistency, and physical interpretability of the network inference process under complex backgrounds and limited sample conditions. Specifically, it includes the following steps:

[0058] Step 1: Following the principle of energy superposition, the real infrared background image and the low-resolution target image obtained by aliasing multiple small infrared targets against a clean background using a point spread function are fused in the low-resolution pixel coordinate system to obtain a low-resolution observation image as a feature; the real position of the low-resolution target image is mapped to the sub-pixel grid coordinate system to obtain a high-resolution target image as a label; the dataset is constructed by forming sample pairs with features and corresponding labels, and divided into training set and test set;

[0059] This step constructs a dataset that integrates spatially proximate small target clusters against a complex infrared background, enabling the deep unfolded network to adapt to real-world application scenarios during model training. The low-resolution pixel coordinate system is the image coordinate system directly output from the simulated infrared focal plane array, with each coordinate point corresponding to the center position of a physical photosensitive element. This coordinate system serves as the model's input space. Since distant targets may be smaller than a pixel, multiple targets in this coordinate system will overlap into a blurry bright spot, making them indistinguishable. To overcome the limitations of physical resolution, each pixel in the low-resolution image is further divided into finer grids according to a super-resolution factor. The center positions of these fine grids constitute a high-resolution sub-pixel grid coordinate system.
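
A single sample pair of the kind described in Step 1 might be assembled as in the sketch below. It is not the invention's data pipeline. The Gaussian PSF, its width `sigma`, the image size, the super-resolution factor `scale`, the target positions, and the random array standing in for a real infrared background patch are all assumptions made for illustration.

```python
import numpy as np

def render_targets(positions, amplitudes, size, sigma):
    """Sum isotropic Gaussian PSF responses sampled at pixel centres."""
    rows, cols = np.mgrid[0:size, 0:size]
    img = np.zeros((size, size))
    for (r, c), a in zip(positions, amplitudes):
        img += a * np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
    return img

def make_label(positions, amplitudes, size, scale):
    """Place each target amplitude in the sub-pixel grid cell that contains it."""
    label = np.zeros((size * scale, size * scale))
    for (r, c), a in zip(positions, amplitudes):
        # Pixel k covers [k - 0.5, k + 0.5) and is split into `scale` cells.
        hr_r = int(np.floor((r + 0.5) * scale))
        hr_c = int(np.floor((c + 0.5) * scale))
        label[hr_r, hr_c] += a
    return label

size, scale, sigma = 11, 3, 0.5               # assumed values
positions = [(5.2, 5.1), (5.6, 4.7)]          # continuous (row, col), low-res pixels
amplitudes = [1.0, 0.8]
rng = np.random.default_rng(0)
background = 0.2 + 0.05 * rng.random((size, size))   # stand-in for a real IR patch

target_lr = render_targets(positions, amplitudes, size, sigma)
observation = target_lr + background          # energy superposition: the feature
label = make_label(positions, amplitudes, size, scale)   # sub-pixel grid: the label
```

The two targets here are less than half a pixel apart, so they blend into one spot in `observation`, yet they occupy distinct cells of `label`. That is the gap the network is trained to close.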

[0060] Step 2: Solve the linear mapping from the low-resolution observation image to the high-resolution target image for each sample pair using the least squares method, and construct a linear mapping model; perform a transformation on the features of the sample pair using the linear mapping model to generate a high-resolution initial estimated image;

[0061] The constructed linear mapping model, determined by minimizing the reconstruction error on the training samples, is used to characterize the overall mapping relationship between low-resolution observations and high-resolution target distributions. It can enable the network iteration process to start from an initial state that matches the training data distribution without introducing additional computational complexity, which helps to improve the stability and consistency of the iterative update process.

[0062] During the inference phase, for any input low-resolution observation image, a linear mapping model is used to transform it once, generating a high-resolution initial estimated image. This initial estimated image serves as the input to the first stage of the deep unfolded network, providing reasonable initial conditions for data consistency updates and sparsity constraints in subsequent stages, and providing a stable and consistent starting state for the subsequent iteration process.

[0063] Step 3: Based on the idea of iterative unfolding, the low-resolution observation image reconstruction process is mapped into a multi-module neural network structure. A deep unfolding network is constructed by sequentially cascading a data consistency update module, a feature transformation and dynamic mapping module, a constraint module obtained by introducing physical structure priors to construct an exponentially decaying sparse mapping function, and an inverse transformation and stage output module.

[0064] This step introduces constraint modules into the deep unfolded network, embedding the prior physical structure of the target into the network structure. This improves the network's ability to distinguish between weak target responses and noise responses while maintaining the interpretability of traditional iterative solutions, thus achieving more stable and accurate target reconstruction.

[0065] Step 4: Input the initial estimated images of the training set into the deep unfolding network and call each module sequentially to perform the forward propagation process, executing in order the data consistency update, the feature transformation and dynamic mapping, the exponential decay sparse mapping constraint, and the inverse transformation. After the termination condition is met, output the high-resolution reconstructed image; use the loss function to optimize the parameters and constrain the network, obtaining the trained target network.

[0066] During network training, an end-to-end approach is used to jointly optimize network parameters. During training, a dataset of small target clusters containing realistic and complex backgrounds is used to constrain the difference between the network output and the corresponding high-resolution target distribution, guiding the network to gradually learn an iterative mapping relationship suitable for the task. This ensures the stability and consistency of the network iteration process.

[0067] This invention uses a loss function to constrain the network during training and minimizes the total loss through the backpropagation algorithm, gradually updating the learnable parameters in the network, including the step size parameter of the data consistency update, the feature transformation parameters, and the parameters of the exponentially decaying sparse mapping.

[0068] Step 5: After inputting the initial estimated images of the test set into the target network, the output high-resolution reconstructed image positions are mapped back to the low-resolution pixel coordinate system, and the localization performance is evaluated.

[0069] After completing network training, the trained target network is used to infer the test set data, and the localization information of infrared small target clusters is extracted from the network output to evaluate the localization performance of the method. During the inference phase, the initial estimated images of the test set are input into the trained target network, and a multi-stage forward propagation process is executed sequentially to output a high-resolution reconstructed image. The reconstructed image reflects the spatial response intensity distribution of the target image in the sub-pixel grid coordinate system. Target localization information is obtained from the high-resolution output by combining the sub-pixel coordinate mapping function, and the localization performance is evaluated.

[0070] In step one, the methods for constructing the dataset include:

[0071] S11: In the low-resolution pixel coordinate system, read a low-resolution target image, which is obtained by aliasing multiple infrared small targets against a clean background through a point spread function.

[0072] For example, a low-resolution target image is read from an image library of spatially proximate small target clusters generated by simulation (the CSIST-100K dataset). This low-resolution target image simulates the imaging result of multiple infrared small targets after their radiated energy is aliased by the detector's point spread function (PSF); its size is [missing information].

[0073] S12: Randomly select a real infrared background image from the infrared background dataset containing real complex scenes in the low-resolution pixel coordinate system. Based on the size of the low-resolution target image, randomly crop within the size range of the real infrared background image to obtain a background block with the same size as the low-resolution target image.

[0074] For example, a real infrared background image is randomly selected from the SIRST-V2 infrared background dataset, which contains realistic and complex scenes. To enhance data diversity and ensure a random combination of the target region and the background, the real infrared background image is randomly cropped. The coordinates of the top-left corner of the cropped region are generated randomly within the size range of the real infrared background image, subject to the cropped region lying entirely inside the image, and the crop height and width equal the size of the low-resolution target image.

[0075] S13, following the principle of energy superposition, physically fuses and normalizes the low-resolution target image and background block to obtain a low-resolution observation image of the infrared small target in the low-resolution pixel coordinate system, and uses the low-resolution observation image as a feature; at the same time, the real position of the low-resolution target image is mapped to the sub-pixel grid coordinate system to obtain a high-resolution target image, and uses the high-resolution target image as a label; sample pairs are formed by features and corresponding labels.

[0076] The resulting low-resolution observation image is used to simulate the mixed signal received by a real infrared sensor, which includes small infrared targets and complex backgrounds.

[0077] S14. Repeat steps S11 to S13 until the termination condition is met. The dataset is then divided into training and test sets according to a preset ratio.

[0078] The above fusion operation is performed on each pair of randomly combined low-resolution target images and real infrared background images to generate low-resolution observation images of infrared small targets. By repeating this process, a dataset containing infrared small target clusters with complex real backgrounds is constructed, and the dataset is divided into training set and test set according to a preset ratio.

[0079] The calculation method for low-resolution observation images is as follows:

[0080] ;

[0081] In the formula This is a low-resolution observation image, where i represents the sequence number. middle, M represents the maximum number of sample pairs in the dataset;

[0082] The i-th low-resolution target image is represented by the following expansion: ; middle, N represents the maximum number of low-resolution target images; where, Represents the true location coordinates of a low-resolution target image, and the set of center locations of all low-resolution target images. for , This represents the center position of the i-th low-resolution target image; This represents the intensity of the i-th radiation peak in the low-resolution target image. The set of peak radiation intensities for , Let be the standard deviation of the Gaussian distribution corresponding to the point spread function (PSF) of the optical system; These are fusion coefficients used to simulate different signal-to-noise ratio conditions. The background block is the same size as the low-resolution target image.

[0083] The numerator of the formula is a linear superposition of the target radiation and the background radiation, while the denominator is normalized to the overall energy.
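The cropping and fusion described in S12 and S13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the formula of the invention: the function names, the coefficient names `alpha` and `beta`, and in particular the choice of the peak of the mixed image as the normalizer are assumptions, since the source only states that the numerator is a linear superposition and that the denominator normalizes the overall energy.

```python
import random

def random_crop(background, crop_h, crop_w, rng):
    """Crop a crop_h x crop_w block at a random top-left corner inside the background."""
    bg_h, bg_w = len(background), len(background[0])
    top = rng.randint(0, bg_h - crop_h)    # inclusive bounds keep the crop inside the image
    left = rng.randint(0, bg_w - crop_w)
    return [row[left:left + crop_w] for row in background[top:top + crop_h]]

def fuse(target, block, alpha=1.0, beta=1.0):
    """Linear energy superposition of target and background, then normalization.

    ASSUMPTION: the normalizer is the peak value of the mixed image.
    """
    mixed = [[alpha * t + beta * b for t, b in zip(t_row, b_row)]
             for t_row, b_row in zip(target, block)]
    peak = max(max(row) for row in mixed)
    if peak <= 0:
        return mixed
    return [[v / peak for v in row] for row in mixed]

rng = random.Random(0)
# Hypothetical 6 x 8 background and 3 x 3 target image, for illustration only.
background = [[0.1 * ((r + c) % 5) for c in range(8)] for r in range(6)]
target = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
block = random_crop(background, 3, 3, rng)
observation = fuse(target, block, alpha=0.8, beta=0.5)
```

Varying `alpha` against `beta` changes the target-to-background contrast, which is how different signal-to-noise ratio conditions could be simulated in such a sketch.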

[0084] The method for mapping the true location of the low-resolution target image to a sub-pixel grid coordinate system to obtain the high-resolution target image as the label is as follows:

[0085] The true position coordinates of the low-resolution target image are magnified by the super-resolution factor c and added to an offset, which maps them to the sub-pixel grid coordinate system and yields the high-resolution sub-pixel position coordinates of the high-resolution target image;

[0086] The pixel grayscale value at the high-resolution sub-pixel position coordinates is set to the peak radiance of the corresponding target, and the resulting high-resolution target image is used as the label.
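The label construction can be illustrated with a short sketch. The offset convention used here is an assumption, because the exact offset is not recoverable from the text: low-resolution pixel k is taken to cover the interval [k - 0.5, k + 0.5), so a coordinate is shifted by 0.5, multiplied by c, and floored to a sub-pixel index. The function names are hypothetical.

```python
import math

def lr_to_subpixel_index(coord, c):
    """Map a low-resolution coordinate to a sub-pixel grid index.

    ASSUMPTION: pixel k covers [k - 0.5, k + 0.5), so the offset is 0.5 * c.
    """
    return math.floor((coord + 0.5) * c)

def subpixel_index_to_lr(index, c):
    """Inverse mapping: center of a sub-pixel cell in low-resolution coordinates."""
    return (index + 0.5) / c - 0.5

def make_label(targets, height, width, c):
    """Build a (height*c) x (width*c) label with each target's peak at its sub-pixel cell."""
    label = [[0.0] * (width * c) for _ in range(height * c)]
    for x, y, peak in targets:
        col = lr_to_subpixel_index(x, c)
        row = lr_to_subpixel_index(y, c)
        if 0 <= row < height * c and 0 <= col < width * c:
            label[row][col] = peak
    return label

# One hypothetical target at (x=1.2, y=0.7) with peak 0.9, super-resolution factor 3.
label = make_label([(1.2, 0.7, 0.9)], height=3, width=3, c=3)
```

Under this convention the round trip through the grid loses at most half a sub-pixel cell, that is 1/(2c) of a low-resolution pixel, which is the quantization limit of the label.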

[0087] In step two, the method for generating the high-resolution initial estimation image is as follows:

[0088] $x^{(0)} = Q_{\text{init}}\, y$; where,

[0089] $Q_{\text{init}} = \arg\min_{Q} \lVert Q Y - X \rVert_F^2 = X Y^{T} (Y Y^{T})^{-1}$;

[0090] In the formula, $x^{(0)}$ is the high-resolution initial estimated image of a low-resolution observation image $y$; $Q_{\text{init}}$ is the initialization mapping matrix output by the linear mapping model; $Y = [y_1, \ldots, y_M]$ is the observation image matrix consisting of M low-resolution observation images; $Q$ is the variable matrix to be optimized, representing the linear mapping from the low-resolution observation image space to the high-resolution target image space; $\lVert \cdot \rVert_F$ denotes the Frobenius norm of a matrix; $X = [x_1, \ldots, x_M]$ is the high-resolution target image matrix corresponding to $Y$, where $x_i$ is a high-resolution target image; and T denotes the matrix transpose.
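For reference, the closed-form least-squares solution can be derived in three lines. The symbols are assumptions chosen for illustration: $Y$ stacks the M low-resolution observation images as columns, $X$ stacks the corresponding high-resolution target images, and $Y Y^{T}$ is assumed to be invertible.

```latex
\begin{aligned}
J(Q) &= \lVert QY - X \rVert_F^2, \\
\frac{\partial J}{\partial Q} &= 2\,(QY - X)\,Y^{T} = 0
  \;\Longrightarrow\; Q\,YY^{T} = XY^{T}, \\
Q_{\text{init}} &= XY^{T}\,(YY^{T})^{-1}, \qquad x^{(0)} = Q_{\text{init}}\,y .
\end{aligned}
```

When $Y Y^{T}$ is ill-conditioned, a pseudo-inverse or a small ridge term is a common substitute; the source does not state which variant is used.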

[0091] In step three, within the constructed deep unfolded network, each module in the neural network structure completes one feature iteration update process, and through the phased updates of all modules, progressive reconstruction of the low-resolution observed image is achieved; among which,

[0092] The data consistency update module is used to perform gradient descent update operation to update the input image by adjusting the update magnitude through the learning rate and the guidance matrix, so as to constrain the consistency between the network output and the input data, and output data consistency update data; the input image includes the high-resolution initial estimation image input for the first time or the high-resolution estimation image output by the inverse transform and the stage output module in the previous loop iteration.

[0093] The feature transformation and dynamic mapping module is used to rearrange the data consistency update data into a two-dimensional feature map form through a learnable non-linear transformation function and map it to a high-dimensional feature space through convolution operation, outputting a dynamic mapping branch; the dynamic mapping branch output by each stage of convolution operation is introduced into the next stage of convolution operation, outputting feature transformation and dynamic mapping features.

[0094] This module introduces a dynamic mapping branch based on the output of the previous stage, which is used to adaptively adjust some features to enhance the ability to express spatially adjacent target structures.

[0095] The constraint module is used to introduce prior physical structure to construct an exponentially decaying sparse mapping function for nonlinear shrinkage mapping, apply adaptive sparse constraints to the dynamic mapping features of feature transformation, and output exponentially decaying sparse mapping constraint features.

[0096] Under the control of an adaptive shrinkage scale generated by an attention mechanism, this module constructs an amplitude-dependent nonlinear contraction mapping through the exponentially decaying sparse mapping function. Adaptive sparse constraints are applied to the feature transformation and dynamic mapping features, so that the contraction intensity changes dynamically with the feature amplitude. This enables effective differentiation between noise responses and weak target responses, and the module outputs the exponentially decaying sparse mapping constraint features, which embody the physical structure prior.

[0097] To effectively distinguish between noise responses and weak target responses in intermediate features, this invention introduces a constraint module in each iteration stage k to apply structured sparse constraints to the feature amplitudes. Unlike traditional soft thresholding operators, which rely only on fixed or linear thresholds for pruning, the constraint module constructed in this invention has the following characteristics. When the feature amplitude is small, the exponential decay term takes a larger value, so low-amplitude features decay rapidly to zero after nonlinear contraction, which effectively suppresses low-response noise. As the feature amplitude increases, the exponential decay term decreases rapidly, so larger-amplitude features undergo only a weaker, gradual contraction. This suppresses background noise while preserving the target response characteristics. The shrinkage scale parameter is adaptively generated by an attention mechanism and is used to dynamically adjust the contraction intensity at different feature locations.

[0098] The inverse transform and stage output module is used to map the exponentially decaying sparse mapping constraint features back to the original space through an inverse transform mapping function that is symmetrical with the forward transform structure, generating a high-resolution estimated image of the current stage output, which serves as the input to the next stage iterative data consistency update module, until the termination condition of the loop iteration is met, and outputting a high-resolution reconstructed image.

[0099] In step four, the initial estimated images from the training set are input into the deep unfolding network, and the forward propagation process is executed by calling each module sequentially in a loop:

[0100] S41: Call the data consistency update module. A gradient descent update, whose magnitude is adjusted through the learning rate and the guidance matrix, is applied to the input image so as to constrain the consistency between the network output and the input data, and the data consistency update data is output; represented as:

[0101] $r^{(k)} = x^{(k-1)} - \rho^{(k)} \Phi^{T} \left( \Phi x^{(k-1)} - y \right)$;

[0102] In the formula, $r^{(k)}$ is the data consistency update data, where k is the iteration index; $x^{(k-1)}$ is the input image, a sparse signal vector, which is either the high-resolution initial estimated image at the first iteration or the high-resolution estimated image output by the inverse transformation and stage output module in the previous loop iteration; $y$ is the low-resolution observation image; $\rho^{(k)}$ is the learning rate; and $\Phi$ is the guidance matrix;
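A data consistency update of this kind can be sketched as a single gradient step on the squared residual between the re-projected estimate and the observation. This is a minimal sketch under assumptions: the names `phi` (guidance matrix), `rho` (learning rate), `x` (current estimate), and `y` (observation) are illustrative, and the toy matrix below simply sums pairs of sub-pixels.

```python
def matvec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def data_consistency_step(x, y, phi, rho):
    """One gradient step: x - rho * phi^T (phi x - y)."""
    residual = [p - q for p, q in zip(matvec(phi, x), y)]
    grad = matvec(transpose(phi), residual)
    return [x_i - rho * g_i for x_i, g_i in zip(x, grad)]

# Toy guidance matrix: each observed pixel is the sum of two sub-pixels.
phi = [[1.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 1.0]]
x_prev = [0.0, 0.0, 0.0, 0.0]
y = [2.0, 4.0]
r = data_consistency_step(x_prev, y, phi, rho=0.5)
```

In this toy case one step with `rho=0.5` already reproduces the observation exactly, but it spreads the energy evenly over the sub-pixels; separating the targets is the job of the sparse constraint applied afterwards.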

[0103] S42 calls the feature transformation and dynamic mapping module, which uses a learnable nonlinear transformation function to rearrange the data consistency update data into a two-dimensional feature map and then maps it to a high-dimensional feature space through convolution operations, outputting a dynamic mapping branch. The dynamic mapping branch output from each stage of the convolution operation is introduced into the next stage of the convolution operation, outputting the feature transformation and dynamic mapping features; represented as:

[0104] ;

[0105] In the formula, the output is the feature transformation and dynamic mapping features, and the learnable nonlinear transformation function is used to extract high-level features and promote sparsity;

[0106] S43, calls the constraint module, introduces prior physical structure to construct an exponentially decaying sparse mapping function for nonlinear shrinkage mapping, applies adaptive sparse constraints to the dynamic mapping features of the feature transformation, and outputs exponentially decaying sparse mapping constraint features; represented as:

[0107] ;

[0108] ;

[0109] In the formula, the output is the exponentially decaying sparse mapping constraint features, produced by the exponentially decaying sparse mapping function; the scale parameter in the physical structure prior adjusts the rate of change of the exponential decay term; and the sparse suppression strength parameter in the physical structure prior dynamically adjusts the shrinkage strength of each feature;

[0110] Through the aforementioned exponential decay contraction mechanism, the degree of contraction can be adaptively adjusted according to the characteristic amplitude, thereby effectively distinguishing noise response from weak target response in the feature space, suppressing low-amplitude noise, and preserving target features with structural information.
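The qualitative behavior described above can be illustrated by comparing ordinary soft thresholding with a threshold that decays exponentially in the feature amplitude. The functional form below is an assumption chosen only to reproduce that behavior; it is not the mapping function defined by the invention, whose exact expression is not recoverable from this text. The names `lam` (suppression strength) and `sigma` (scale) are likewise illustrative.

```python
import math

def soft_threshold(z, lam):
    """Classical soft thresholding with a fixed threshold lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def exp_decay_shrink(z, lam, sigma):
    """Amplitude-dependent shrinkage (ASSUMED illustrative form).

    The effective threshold lam * exp(-|z| / sigma) is close to lam for small
    amplitudes and decays toward zero for large amplitudes.
    """
    threshold = lam * math.exp(-abs(z) / sigma)
    return math.copysign(max(abs(z) - threshold, 0.0), z)

weak_noise = exp_decay_shrink(0.05, lam=0.5, sigma=1.0)     # suppressed to zero
strong_target = exp_decay_shrink(3.0, lam=0.5, sigma=1.0)   # almost unchanged
soft_target = soft_threshold(3.0, lam=0.5)                  # loses the full threshold
```

With these values the low-amplitude input is set to zero, while the high-amplitude input is reduced by about 0.025 instead of the 0.5 removed by soft thresholding, which is the amplitude bias that an exponentially decaying threshold is meant to avoid.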

[0111] In this invention, the parameters of the exponentially decaying sparse mapping function are not fixed constants but are generated in a learnable and constrained manner, which achieves adaptive adjustment of the strength of the feature sparsity constraint. The scale parameter is generated from the dynamic mapping features of the current stage by an adaptive scaling mechanism (e.g., learned adaptively from the current features through a lightweight attention module), so that features at different spatial locations obtain different shrinkage scales. To ensure numerical stability, a nonlinear constraint is applied to the generated scale parameter to keep it positive, and a numerical range constraint is applied to avoid unstable updates. The sparse suppression strength parameter is learnable and is constrained by a nonlinear activation function (e.g., ReLU) to keep it positive; its range is also limited, which achieves stable sparsity constraints at different iteration stages.

[0112] The derivation principle of the exponentially decaying sparse mapping function is as follows. The image library of spatially proximate small target clusters contains N low-resolution target images, each of which can be characterized by a pure Gaussian diffusion model. These images are used as training input to obtain a trained super-resolution model for long-range, spatially proximate infrared targets. The model then performs inference on an input degraded image and outputs the set of estimated center locations of the predicted targets together with the corresponding set of estimated peak radiation intensities. By optimizing the difference between the estimated peak radiation intensities and the actual peak radiation intensities, joint high-precision estimation of the number of targets, their sub-pixel positions, and their radiation intensities is achieved.

[0113] In infrared imaging systems, due to the point spread function (PSF) of the optical system, target radiation diffuses between pixels on the focal plane. The response value of each pixel to a target is the integral of the PSF over that pixel's photosensitive area. For a target at a given location, the pixel response is:

[0114] ;

[0115] where the pixel is identified by its center coordinates and D is the pixel size. For a focal plane composed of U×V pixels that contains K1 targets, expanding the focal plane response matrix column-wise into a UV×1 vector z, the following observation image matrix can be established:

[0116] $z = G s + n$;

[0117] In the formula, $G$ is the UV×K guidance matrix, whose j-th column is the contribution vector of the j-th target to all pixels; $s$ is the vector formed by the target peak intensities; and $n$ is additive white Gaussian noise, with the noise of each pixel independent of the others.
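The pixel-integrated response and the resulting guidance matrix can be sketched with the error function, since the integral of a separable Gaussian over a square pixel factorizes into two one-dimensional integrals. The sketch makes several assumptions that are not taken from the source: the PSF is an isotropic Gaussian normalized to unit energy, pixel centers lie at integer multiples of the pixel size `d`, U counts rows and V counts columns, and all function names are hypothetical.

```python
import math

def gaussian_mass(lo, hi, mu, sigma):
    """Integral of a 1-D unit-area Gaussian (mean mu, std sigma) over [lo, hi]."""
    s = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((hi - mu) / s) - math.erf((lo - mu) / s))

def pixel_response(px, py, tx, ty, sigma, d=1.0):
    """Response of the pixel centered at (px, py) to a unit-energy target at (tx, ty)."""
    return (gaussian_mass(px - d / 2, px + d / 2, tx, sigma)
            * gaussian_mass(py - d / 2, py + d / 2, ty, sigma))

def guidance_matrix(u, v, targets, sigma, d=1.0):
    """UV x K matrix; column j holds the response of every pixel to target j.

    Rows follow a column-wise expansion of the U x V focal plane.
    """
    rows = []
    for col in range(v):
        for row in range(u):
            rows.append([pixel_response(col * d, row * d, tx, ty, sigma, d)
                         for tx, ty in targets])
    return rows

# One hypothetical target at the center of a 9 x 9 focal plane.
g = guidance_matrix(9, 9, [(4.0, 4.0)], sigma=0.5)
s = [2.0]                                            # target intensity vector
z = [sum(g_pj * s_j for g_pj, s_j in zip(row, s)) for row in g]  # noise-free z = G s
```

Because the PSF is normalized here, each column of the matrix sums to approximately one when the target is far from the border, so the entries of `s` act as total energies rather than peak values; a peak-normalized PSF would only rescale the columns.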

[0118] For the super-resolution task of spatially proximate infrared targets, the observation image matrix is transformed into a sparse signal recovery problem. At the sub-pixel level, each pixel is uniformly divided into n1×n1 sub-pixel grids, forming an overcomplete set of all possible target locations. Furthermore, each sub-pixel grid contains at most one target, and the deviation of its actual position from the center of the grid does not exceed [a certain value]. Based on this, the observation image matrix can be reformulated as:

[0119] ;

[0120] where the matrix is the overcomplete guidance matrix, the unknown is a sparse signal vector whose non-zero positions indicate the presence of a target and whose amplitudes correspond to the target intensity, and the remaining term is the noise. The demixing problem for infrared small target clusters is therefore transformed into the following regularized sparse recovery problem:

[0121] ;

[0122] where the regularization coefficient weights the sparsity term. This model achieves joint estimation of the number, location, and intensity of targets by promoting the sparsity of solutions.

[0123] To achieve the aforementioned sparse reconstruction, a numerical solution framework based on iterative shrinkage thresholding is adopted. Furthermore, considering the characteristics of infrared imaging of small targets, a novel differentiable shrinkage function is designed to replace the traditional soft thresholding operator. This function is defined as a PiE-type shrinkage mapping:

[0124] ;

[0125] ;

[0126] where the input is the feature to be shrunk; the scaling parameter controls the exponential decay rate of the function and its curvature near the origin; and the shrinkage strength parameter is the sparsity suppression strength parameter. When the input amplitude is large, the mapping applies only a weak contraction; when the input amplitude is small, it approximates a steep threshold characteristic. This enhances the ability to retain weak targets while maintaining sparsity.

[0127] The above PiE-type contraction mapping is rearranged to obtain an exponential decay sparse mapping function that is introduced into the physical structure prior:

[0128] ;

[0129] Embed it into the constraint module.

[0130] S44, calling the inverse transform and stage output module, maps the exponentially decaying sparse mapping constraint features back to the original space through an inverse transform mapping function symmetrical to the forward transform structure, generating a high-resolution estimated image for the current stage output; expressed as:

[0131] ;

[0132] In the formula, the output is the high-resolution estimated image, and the inverse transformation mapping function is structurally symmetric to the forward transform, that is, to the learnable nonlinear transformation function; a symmetry constraint is applied to the pair to ensure consistency during reconstruction.

[0133] In step four, the loss function is constructed from the reconstruction consistency loss function and the transformation symmetry constraint loss function, and is expressed as follows:

[0134] ;in,

[0135] The consistency loss function for reconstruction is expressed as:

[0136] ;

[0137] The loss function for transformation symmetry constraints is expressed as:

[0138] ;

[0139] In the formula, the total loss is the loss value output by the loss function, formed from the two terms below with weighting coefficients. The reconstruction consistency loss value, output by the reconstruction consistency loss function, constrains the overall reconstruction capability of the network by measuring the difference between the final network output and the true high-resolution target distribution, which ensures the consistency of the output in spatial location and amplitude distribution. The transformation symmetry constraint loss value, output by the transformation symmetry constraint loss function, constrains the consistency between the forward feature transformation and the inverse transformation, which suppresses unstable or biased mapping behavior during iteration and enhances the stability of the overall network structure.

[0140] A represents the total number of sample pairs in the training set, and B represents the total number of pixels in the image. The remaining symbols denote the number of unfolded modules in the network, the high-resolution target image, and the high-resolution reconstructed image output by the deep unfolding network;
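A two-term loss of this shape can be sketched for a single flattened sample as follows. The sketch is an assumption about the general structure only: mean squared error is used for both terms, the symmetry term compares each stage input with its forward-then-inverse round trip and is averaged over stages, and the weight `gamma` and all argument names are hypothetical rather than taken from the source.

```python
def mse(a, b):
    """Mean squared error between two equally long flat vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def total_loss(recon, label, stage_inputs, stage_roundtrips, gamma=0.01):
    """Reconstruction consistency loss plus weighted transformation symmetry loss.

    stage_roundtrips[k] is assumed to be inverse(forward(stage_inputs[k])).
    """
    rec = mse(recon, label)
    sym = sum(mse(rt, r) for rt, r in zip(stage_roundtrips, stage_inputs))
    sym /= len(stage_inputs)
    return rec + gamma * sym, rec, sym

# A perfectly symmetric transform pair contributes no symmetry penalty.
loss_sym_ok, rec_ok, sym_ok = total_loss(
    recon=[1.0, 0.0], label=[1.0, 1.0],
    stage_inputs=[[1.0, 2.0]], stage_roundtrips=[[1.0, 2.0]])

# A round trip that distorts the features is penalized.
loss_sym_bad, rec_bad, sym_bad = total_loss(
    recon=[1.0, 0.0], label=[1.0, 1.0],
    stage_inputs=[[1.0, 2.0]], stage_roundtrips=[[1.0, 4.0]])
```

Keeping the symmetry weight small relative to the reconstruction term lets the round-trip constraint act as a regularizer without dominating the localization objective.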

[0141] In step five, the method for mapping the output high-resolution reconstructed image location back to the low-resolution pixel coordinate system and evaluating the localization performance includes:

[0142] S51: Filter the high-resolution reconstructed images output for the test set by response amplitude, suppress positions whose pixel values are lower than the preset pixel threshold to zero, and output the low-response suppression result; wherein the response amplitude characterizes the relative intensity of the infrared small targets;

[0143] Positions with response amplitudes below a preset threshold are suppressed to zero to reduce the impact of background noise and unstable responses on subsequent positioning results. The threshold can be set according to actual application requirements.

[0144] S52, in the low response suppression results, search for local maxima points in the defined neighborhood as candidate target locations;

[0145] Each candidate target location corresponds to a potential infrared small target, and the spatial position of the infrared small target is represented by the sub-pixel grid coordinate system where the peak value is located;

[0146] S53 uses a sub-pixel coordinate mapping function to map the candidate target position back to the low-resolution pixel coordinate system, and uses the candidate target position in the low-resolution pixel coordinate system as the prediction point.

[0147] By using the subpixel coordinate mapping function, the fine positioning result of the target within the original pixel range can be obtained, thereby achieving subpixel-level position estimation.

[0148] S54: Set a group of distance thresholds. Under each distance threshold, determine the matching relationship based on the spatial distance between the predicted points and the real points, and count the true positives, the false positives, and the unmatched false negatives in the prediction results. Calculate the corresponding precision and recall, plot the precision-recall curve, and obtain the average precision under that distance threshold as the area under the curve. Average the average precision over all distance thresholds to obtain the mean average precision, which is used as the localization performance evaluation index of the target network.
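Steps S51 and S52 can be sketched as a threshold followed by a strict local-maximum search. The neighborhood is assumed to be a square window of the given radius, and the function names and the example image are hypothetical; the source leaves both the threshold and the neighborhood definition to the application.

```python
def suppress_low_response(image, threshold):
    """Set every response below the threshold to zero (S51)."""
    return [[v if v >= threshold else 0.0 for v in row] for row in image]

def local_maxima(image, radius=1):
    """Return (row, col, value) for each strict local maximum (S52).

    ASSUMPTION: the neighborhood is a (2*radius+1)^2 window clipped at borders.
    """
    h, w = len(image), len(image[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = image[r][c]
            if v <= 0.0:
                continue
            neighbours = [image[rr][cc]
                          for rr in range(max(0, r - radius), min(h, r + radius + 1))
                          for cc in range(max(0, c - radius), min(w, c + radius + 1))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((r, c, v))
    return peaks

# Hypothetical 4 x 4 reconstruction in the sub-pixel grid.
recon = [[0.0, 0.1, 0.0, 0.0],
         [0.1, 0.9, 0.2, 0.0],
         [0.0, 0.2, 0.0, 0.6],
         [0.0, 0.0, 0.0, 0.05]]
candidates = local_maxima(suppress_low_response(recon, threshold=0.15))
```

The strict comparison drops plateaus of equal values; if ties can occur in practice, a tie-breaking rule would be needed.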

[0149] In S53, the sub-pixel coordinate mapping function is:

[0150] ;

[0151] In the formula, the result is the position of the candidate target location in the low-resolution pixel coordinate system, i.e., the predicted point; the argument is the candidate target position in the sub-pixel grid coordinate system; and c is the preset super-resolution factor;

[0152] In S54, the precision is calculated as follows:

[0153] $P = \dfrac{TP}{TP + FP}$;

[0154] The recall is calculated as follows:

[0155] $R = \dfrac{TP}{TP + FN}$;

[0156] In the formula, $P$ is the precision and $R$ is the recall; TP is the number of predicted points recorded as true positives, representing the number of correctly detected targets; FP is the number of predicted points recorded as false positives, representing the number of falsely detected targets; and FN is the number of real targets recorded as false negatives, representing the number of missed targets.

[0157] The mean average precision is calculated as follows:

[0158] $\mathrm{mAP} = \dfrac{1}{m} \sum_{n=1}^{m} AP_{\delta_n}$;

[0159] In the formula, $\mathrm{mAP}$ is the mean average precision, m is the total number of distance thresholds in the distance threshold group, n is the distance threshold index, $AP_{\delta_n}$ is the average precision calculated under the n-th distance threshold, and $\delta_n$ is the n-th distance threshold.

[0160] To evaluate the localization performance of the target network, the predicted target locations are matched with the actual target locations in the corresponding test samples. To determine whether a prediction is correct, a distance threshold is set. For each real target point, the unmatched predicted point with the highest confidence within its distance-threshold neighborhood is found and paired with it. If the pairing succeeds, the predicted point is recorded as a true positive and counted in TP; a predicted point that fails to pair with any real target is recorded as a false positive and counted in FP; a real target that is not matched with any predicted point is recorded as a false negative and counted in FN.
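The matching rule above can be sketched as a greedy loop over the real targets. The tuple layouts and function names are assumptions for illustration, and Euclidean distance is assumed as the spatial distance.

```python
import math

def match_predictions(predictions, truths, dist_threshold):
    """Greedy matching; returns (TP, FP, FN).

    predictions: list of (x, y, confidence); truths: list of (x, y).
    """
    used = set()
    tp = 0
    for tx, ty in truths:
        best, best_conf = None, -math.inf
        for i, (px, py, conf) in enumerate(predictions):
            if i in used:
                continue
            if math.hypot(px - tx, py - ty) <= dist_threshold and conf > best_conf:
                best, best_conf = i, conf
        if best is not None:
            used.add(best)          # each prediction may be paired only once
            tp += 1
    fp = len(predictions) - tp      # predictions left without a real target
    fn = len(truths) - tp           # real targets left without a prediction
    return tp, fp, fn

def precision_recall(tp, fp, fn):
    """Precision TP/(TP+FP) and recall TP/(TP+FN), with empty cases mapped to 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

predictions = [(1.0, 1.0, 0.9), (1.1, 1.0, 0.5), (5.0, 5.0, 0.8)]
truths = [(1.05, 1.0), (3.0, 3.0)]
tp, fp, fn = match_predictions(predictions, truths, dist_threshold=0.25)
precision, recall = precision_recall(tp, fp, fn)
```

In the example, two predictions fall within the neighborhood of the first real target, and only the one with the higher confidence is paired; the other counts as a false positive.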

[0161] A group of distance thresholds (in pixels) is set to evaluate performance under different positioning accuracy requirements. At each threshold, a set of precision and recall values is calculated, a PR curve is plotted, and the area under the curve is calculated to obtain the average precision. The CSO-mAP is then calculated, which comprehensively reflects the network's overall performance in target detection, sub-pixel localization, and intensity ranking.
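The average precision per threshold and its mean over the threshold group can be sketched as below. The sketch follows the common detection-style convention, which is an assumption and differs slightly from the per-target matching described above: predictions are ranked by confidence, each is matched to the nearest unmatched real target within the threshold, and the area under the resulting precision-recall points is accumulated. The threshold values and names are hypothetical.

```python
import math

def average_precision(predictions, truths, dist_threshold):
    """Area under the precision-recall curve at one distance threshold."""
    matched = set()
    tp_cum, ap, prev_recall = 0, 0.0, 0.0
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    for k, (px, py, _conf) in enumerate(ranked, start=1):
        candidates = [(math.hypot(px - tx, py - ty), j)
                      for j, (tx, ty) in enumerate(truths) if j not in matched]
        candidates = [cand for cand in candidates if cand[0] <= dist_threshold]
        if candidates:
            matched.add(min(candidates)[1])    # nearest unmatched real target
            tp_cum += 1
            precision = tp_cum / k
            recall = tp_cum / len(truths)
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

def mean_average_precision(predictions, truths, thresholds):
    """Mean of the per-threshold average precision values."""
    return sum(average_precision(predictions, truths, t)
               for t in thresholds) / len(thresholds)

predictions = [(1.0, 1.0, 0.9), (5.0, 5.0, 0.8), (3.2, 3.0, 0.7)]
truths = [(1.05, 1.0), (3.0, 3.0)]
ap_tight = average_precision(predictions, truths, 0.1)
ap_loose = average_precision(predictions, truths, 0.25)
m_ap = mean_average_precision(predictions, truths, [0.1, 0.25])
```

The third prediction lies 0.2 pixels from its target, so it counts only under the looser threshold; this is how the threshold group exposes localization accuracy rather than detection alone.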

[0162] Verification experiment:

[0163] To verify the effectiveness of the technical solution of the present invention, a verification experiment is provided for illustration:

[0164] 1. Construct a simulation training dataset with a realistic infrared background:

[0165] To enable the network to be trained and evaluated under conditions closely resembling real-world imaging, a simulated dataset containing complex background and target signals must first be constructed. This dataset ensures that the mutual interference between the target and the background matches the actual imaging process by fusing real infrared background with simulated target signals.

[0166] (1) Target image reading:

[0167] A low-resolution target image is read from an image library containing simulated clusters of spatially proximate small targets (e.g., CSIST-100K). This low-resolution target image simulates the imaging result of multiple sub-pixel targets' radiant energy after aliasing via the detector's point spread function (PSF) against a clean background. Its size is [missing information]. This image is used to generate target signals, and its corresponding annotations include information such as the number, location, and intensity of targets.

[0168] (2) Random selection and cropping of background image:

[0169] Background images are randomly selected from real infrared background datasets (e.g., SIRST-V2). To enhance data diversity and ensure a random combination of the target region and background, the background images are randomly cropped. The coordinates of the top-left corner of the cropped region are generated randomly within the background image size range, subject to the cropped region lying entirely inside the image, and the crop height and width equal the size of the low-resolution target image.

[0170] (3) Image fusion and normalization:

[0171] The target signal is fused with the real background to simulate the mixed signal containing the target and complex background received by a real infrared sensor. The fusion process follows the physical principle of energy superposition, and the specific formula is as follows:

[0172] .
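The fusion formula itself is not reproduced in this text, so the following is only a sketch of one plausible reading: a weighted sum of target and background (energy superposition) followed by min-max normalization. The coefficients `alpha` and `beta` and the choice of min-max normalization are assumptions made for illustration.

```python
def fuse_and_normalize(target, background, alpha=1.0, beta=1.0):
    """Weighted superposition of target and background, scaled to [0, 1].

    `target` and `background` are 2-D lists of equal shape. `alpha` and
    `beta` are hypothetical fusion coefficients.
    """
    same_shape = len(target) == len(background) and all(
        len(t_row) == len(b_row) for t_row, b_row in zip(target, background)
    )
    if not same_shape:
        raise ValueError("target and background must have the same shape")
    fused = [
        [alpha * t + beta * b for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(target, background)
    ]
    lo = min(min(row) for row in fused)
    hi = max(max(row) for row in fused)
    span = hi - lo
    if span == 0:
        # A constant image carries no contrast; map it to all zeros.
        return [[0.0 for _ in row] for row in fused]
    return [[(v - lo) / span for v in row] for row in fused]
```

Varying the ratio of `alpha` to `beta` changes how strongly the targets stand out from the background, which is one way to emulate different signal-to-noise conditions.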

[0173] (4) Dataset construction and annotation inheritance:

[0174] The above fusion operation is performed on each randomly combined target image and real infrared background image to generate a low-resolution observation image of the small infrared target. The corresponding annotations (number of targets, sub-pixel coordinates of each target, and peak radiant intensity) are directly inherited from the annotations of the original target image to ensure the accuracy of the labels. By repeating this process, a dataset containing complex real-world backgrounds is constructed for network training, validation, and testing.

[0175] 2. Initial estimation generation based on linear mapping:

[0176] Like traditional iterative optimization algorithms such as ISTA, the deep unfolding network of this invention requires the high-resolution target image to be initialized by a linear mapping. Specifically, assume a dataset containing M sample pairs.

[0177] The generation of the high-resolution target image depends on the prior information of the original point target image and on the super-resolution factor c. The core principle is as follows: on the image grid after c-fold super-resolution, the pixel gray value at the mapped coordinates of each target is set to the peak radiant intensity of that target. This yields a dataset of images containing sharp point targets.
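A minimal sketch of this label construction follows. It assumes each target is given as (row, column, peak intensity) in low-resolution sub-pixel coordinates, and that the low-resolution to high-resolution mapping is "multiply by c, add an offset, round to the nearest grid cell". The default offset of (c - 1) / 2 corresponds to a pixel-center convention and is an assumption; the patent text does not state the offset value.

```python
def make_hr_label(targets, lr_h, lr_w, c, offset=None):
    """Build a (lr_h * c) x (lr_w * c) label image of isolated point targets.

    `targets` is an iterable of (row, col, peak) tuples in low-resolution
    coordinates. `offset` is a hypothetical parameter; when omitted, the
    pixel-center convention (c - 1) / 2 is assumed.
    """
    if offset is None:
        offset = (c - 1) / 2.0
    hr_h, hr_w = lr_h * c, lr_w * c
    label = [[0.0] * hr_w for _ in range(hr_h)]
    for row, col, peak in targets:
        r = int(round(c * row + offset))
        q = int(round(c * col + offset))
        if 0 <= r < hr_h and 0 <= q < hr_w:
            label[r][q] = peak  # the gray value at the target position is its peak
    return label
```

For example, with c = 3 a target at low-resolution position (5.0, 5.0) lands on high-resolution cell (16, 16) under the assumed offset.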

[0178] Based on this dataset, the invention constructs a linear mapping model by solving a least-squares problem to obtain the initialization mapping matrix. Let the observation image matrix be composed of the M low-resolution observation images, and let the corresponding high-resolution target image matrix be composed of their labels. The linear mapping model is obtained from the following optimization problem:

[0179] .

[0180] In summary, for any new low-resolution observation image of a small infrared target, the initial estimated image of the high-resolution target image of the network can be obtained through a single linear transformation. This initialization method provides a high-performance starting point that matches the data distribution for subsequent iterative optimization of the unfolded network.
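The least-squares initialization can be written out as a short worked form. The notation below is an assumption, since the original formula is not reproduced here: each low-resolution observation image is vectorized into one column of Y, each high-resolution target image into the matching column of X, Q is the mapping matrix being optimized, and y is a new observation.

```latex
\begin{aligned}
\mathbf{Q}_{\mathrm{init}}
  &= \arg\min_{\mathbf{Q}} \left\lVert \mathbf{Q}\mathbf{Y} - \mathbf{X} \right\rVert_F^2 \\
  &= \mathbf{X}\mathbf{Y}^{\mathsf{T}}\left(\mathbf{Y}\mathbf{Y}^{\mathsf{T}}\right)^{-1}
     \quad \text{(when } \mathbf{Y}\mathbf{Y}^{\mathsf{T}} \text{ is invertible)} \\
\mathbf{x}^{(0)} &= \mathbf{Q}_{\mathrm{init}}\,\mathbf{y}
\end{aligned}
```

Under this column convention the closed form follows from setting the gradient of the objective with respect to Q to zero. If the matrix product on the right is singular, a pseudo-inverse would be the usual substitute.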

[0181] 3. Implementation of deep unfolded network structure and exponentially decaying sparse mapping:

[0182] The network performs the following iterative steps at each stage:

[0183] Call the data consistency update module and perform gradient descent update operation: .

[0184] The feature transformation and dynamic mapping module is invoked to apply an adaptive feature transformation to the data consistency update data and to output the feature transformation and dynamic mapping features.

[0185] The constraint module is invoked, and the physical structure prior is introduced to construct an exponentially decaying sparse mapping function for nonlinear shrinkage. Adaptive sparse constraints are applied to the feature transformation and dynamic mapping features, and the exponentially decaying sparse mapping constraint features are output. The scale parameter is predicted dynamically from the current features or from the output of the previous stage, which enables content-adaptive adjustment of the shrinkage shape. The intensity parameter is either a trainable variable or is generated by an auxiliary network during training. The exponentially decaying sparse mapping constraint is then applied:

[0186] ;

[0187] ;

[0188] The inverse transform and stage output module is invoked to apply the inverse transform to the exponentially decaying sparse mapping constraint features, and a constraint is applied to ensure consistency during reconstruction.

[0189] By cascading several of the above modules, the network can dynamically adjust the shape and intensity of the shrinkage function during iterative optimization, and thereby recover the number, sub-pixel positions, and radiant intensities of dense, weak targets more accurately, which improves super-resolution performance and robustness.
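The stage structure described above can be sketched in a few lines. The sketch makes strong simplifying assumptions: the formulas are not reproduced in this text, so the shrinkage function below (a soft threshold whose threshold fades as `lam * exp(-|z| / sigma)`) is an illustrative stand-in and not the patented function. The learned forward and inverse transforms are also replaced by the identity. All names (`a`, `rho`, `lam`, `sigma`) are hypothetical.

```python
import math


def matvec(a, v):
    """Multiply matrix `a` (list of rows) by vector `v`."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def data_consistency_step(x, y, a, rho):
    """One gradient step on 0.5 * ||a x - y||^2 with step size rho."""
    residual = [ax_i - y_i for ax_i, y_i in zip(matvec(a, x), y)]
    grad = matvec(transpose(a), residual)
    return [x_i - rho * g_i for x_i, g_i in zip(x, grad)]


def exp_decay_shrink(z, lam, sigma):
    """Assumed shrinkage: small |z| is suppressed, large |z| passes almost unchanged."""
    mag = abs(z)
    shrunk = max(mag - lam * math.exp(-mag / sigma), 0.0)
    return math.copysign(shrunk, z)


def unfolded_stage(x, y, a, rho, lam, sigma):
    """Data consistency update followed by shrinkage (identity transforms assumed)."""
    r = data_consistency_step(x, y, a, rho)
    return [exp_decay_shrink(v, lam, sigma) for v in r]
```

In this assumed form, `sigma` controls how quickly the threshold decays with amplitude (the shape of the shrinkage) and `lam` controls how strongly weak responses are suppressed, mirroring the roles the text assigns to the scale and intensity parameters.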

[0190] 4. Network forward propagation and iterative optimization:

[0191] The invention constructs an end-to-end deep unfolding network consisting of K cascaded stages with identical structure. The input image is fed into this network, and through multi-stage iteration a high-resolution target distribution image is gradually recovered from the low-resolution aliased observation.

[0192] To ensure network performance, the following composite loss is used for end-to-end supervised training:

[0193] Reconstruction consistency loss function: this loss constrains the mean squared error between the final output of the network and the ground-truth high-resolution target signal.

[0194] Transformation symmetry constraint loss function: this loss imposes the constraint between the forward transform and the inverse transform, ensuring consistency during reconstruction.

[0195] The total loss function is the weighted sum of the above two terms: ;
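A minimal sketch of such a composite loss follows. The exact formulas and weights are not reproduced in this text, so the symmetry term here is an assumed form: the mean squared error between each stage's input and its round trip through the forward and inverse transforms, averaged over stages. The weights `w_rec` and `w_sym` are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def composite_loss(x_out, x_true, round_trips, stage_inputs, w_rec=1.0, w_sym=0.01):
    """Weighted sum of a reconstruction term and a transform symmetry term.

    `round_trips[k]` is assumed to hold inverse(forward(stage_inputs[k]))
    for stage k. Returns (total, reconstruction, symmetry).
    """
    rec = mse(x_out, x_true)
    sym = sum(
        mse(rt, inp) for rt, inp in zip(round_trips, stage_inputs)
    ) / len(stage_inputs)
    return w_rec * rec + w_sym * sym, rec, sym
```

Returning the two terms alongside the total makes it easy to log them separately and check that the symmetry term stays small during training.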

[0196] 5. Model evaluation and location information extraction:

[0197] To accurately measure the super-resolution performance of the model on closely spaced targets, this invention defines and adopts "Closely-Spaced Objects Mean Average Precision" (CSO-mAP) as the core evaluation metric, which is calculated in the following steps:

[0198] In the high-resolution reconstructed image output by the target network, pixels with values below a preset threshold (e.g., 50) are set to zero to filter out noise. Local maxima are then extracted as candidate predicted targets, and their image coordinates and original intensity values are recorded.
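The thresholding and local-maximum search can be sketched as follows. The 8-connected neighborhood and the strict comparison are assumptions, since the text does not specify the neighborhood; with a strict comparison, flat plateaus of equal values produce no peak.

```python
def extract_peaks(image, threshold=50.0):
    """Zero out weak pixels, then return strict local maxima as (row, col, value).

    `image` is a 2-D list of gray values. A pixel counts as a peak when it
    is strictly larger than all of its (up to 8) neighbors.
    """
    h, w = len(image), len(image[0])
    clean = [[v if v >= threshold else 0.0 for v in row] for row in image]
    peaks = []
    for r in range(h):
        for c in range(w):
            v = clean[r][c]
            if v <= 0:
                continue
            neighbors = [
                clean[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbors):
                peaks.append((r, c, v))
    return peaks
```

The recorded value can serve as the confidence score of each candidate when predictions are later ranked and matched.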

[0199] For a super-resolution factor c, coordinates in the sub-pixel grid coordinate system are mapped back to low-resolution pixel coordinates on the original 11×11 grid. The mapping formula is:

[0200] ;

[0201] To determine whether a prediction is correct, a distance threshold is set. For each real target point, the unmatched predicted point with the highest confidence within the neighborhood defined by that threshold is found and paired with it. A successfully paired predicted point is counted as a true positive (TP), a predicted point left unmatched is counted as a false positive (FP), and a real target point left unmatched is counted as a false negative (FN).
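The matching rule can be sketched as a greedy loop. Euclidean distance and the processing order of the real targets are assumptions; the text only specifies that each real target takes the highest-confidence unmatched prediction within its neighborhood.

```python
import math


def match_predictions(truths, preds, delta):
    """Greedy matching of predictions to real targets within distance delta.

    `truths` is a list of (row, col); `preds` is a list of (row, col, score).
    Returns the counts (tp, fp, fn).
    """
    used = set()
    tp = 0
    for t_row, t_col in truths:
        best, best_score = None, -math.inf
        for i, (p_row, p_col, score) in enumerate(preds):
            if i in used:
                continue
            close = math.hypot(p_row - t_row, p_col - t_col) <= delta
            if close and score > best_score:
                best, best_score = i, score
        if best is not None:
            used.add(best)
            tp += 1
    fp = len(preds) - tp   # predictions that matched no real target
    fn = len(truths) - tp  # real targets that received no prediction
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because each prediction can be used only once, two detections near the same real target yield one true positive and one false positive.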

[0202] A group of distance thresholds (in pixels) is set to evaluate performance under different localization accuracy requirements. At each threshold, a set of precision and recall values is calculated, the precision-recall (PR) curve is plotted, and the area under the curve is computed to obtain the average precision (AP) at that threshold.

[0203] CSO-mAP is the average AP across all thresholds:

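[0204] $\mathrm{CSO\text{-}mAP} = \frac{1}{m}\sum_{n=1}^{m} \mathrm{AP}_{\delta_n}$, where m is the number of distance thresholds and $\mathrm{AP}_{\delta_n}$ is the average precision at the n-th distance threshold $\delta_n$;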

[0205] This metric comprehensively reflects the overall performance of the target network in target detection, sub-pixel localization, and intensity ranking.
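The averaging behind CSO-mAP can be sketched as follows. The sketch assumes the area under the PR curve is accumulated with a simple rectangle rule over predictions ranked by descending confidence, without interpolation; the patent text does not state which integration rule is used, and the function names are hypothetical.

```python
def average_precision(is_tp_sorted, n_truth):
    """Area under the PR curve for one distance threshold.

    `is_tp_sorted` holds one boolean per prediction, ordered by descending
    confidence, marking whether that prediction matched a real target.
    """
    if n_truth == 0:
        return 0.0
    tp = 0
    ap = 0.0
    for rank, hit in enumerate(is_tp_sorted, start=1):
        if hit:
            tp += 1
            # precision at this rank, weighted by the recall step 1 / n_truth
            ap += (tp / rank) / n_truth
    return ap


def cso_map(ap_per_threshold):
    """Mean of the per-threshold average precision values."""
    return sum(ap_per_threshold) / len(ap_per_threshold)
```

For instance, three ranked predictions flagged (hit, miss, hit) against two real targets give an AP of 1/2 + (2/3)/2, which is about 0.833.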

[0206] Model training configuration: The model is trained on hardware equipped with a GPU. The Adaptive Moment Estimation (Adam) optimizer is used, with an initial learning rate set to... The training consists of 250 epochs, with 64 images processed per batch. The data loader reads pairs of low-resolution simulated images (PNG format) and label files containing ground truth locations and intensities from a specified path, converting them to the tensor format required for network training. All training hyperparameters and path configurations are recorded in the configuration file to ensure experimental reproducibility. The training, validation, and testing processes are completed within an open-source deep learning framework, and the optimal model weights obtained during training are saved for subsequent inference.

[0207] 6. Experimental Results:

[0208] The effectiveness of the method of this invention was verified through a series of comparative experiments, and the experimental results are as follows:

[0209] 6.1 Superior Performance Across Different Dataset Sizes: Tests were conducted on datasets of three different sizes: 80k, 40k, and 8k samples. Table 1 compares the localization accuracy of the method of this invention under these numbers of training samples.

[0210] Table 1

[0211]

[0212] Comparisons of the localization methods in the examples of this invention are shown in Figures 1 and 2 for 80k samples, in Figures 3 and 4 for 40k samples, and in Figures 5 and 6 for 8k samples. As can be seen from Table 1 and Figures 1 to 6, the performance metrics of the method of this invention (such as CSO-mAP) are superior to those of the original DISTA network baseline model. The improvement is most pronounced in the evaluation range with medium localization accuracy requirements (e.g., a distance threshold δ of 0.10 to 0.15 pixels).

[0213] 6.2 Training Stability and Generalization Ability: During training, the loss function constructed in this invention exhibits small oscillation amplitude in its convergence curve, demonstrating good training stability. Evaluation results on the independent test set show that this method has stronger generalization ability and maintains high reconstruction and detection accuracy for unseen data samples.

[0214] 6.3 Real-time inference efficiency: On a standard GPU hardware platform, the average inference latency of the method of this invention is about 0.04 seconds for a single frame input image, which meets the application requirements of real-time processing.

[0215] 6.4 Convergence Characteristics of the Core Function: The proposed exponentially decaying sparse mapping function imposes stronger convergence constraints on intermediate states during the optimization process. This constraint helps the network generate a more consistent and smoother optimization trajectory during iteration, resulting in more stable and accurate target sub-pixel localization results at the output level.

[0216] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks, characterized in that it comprises:
Step 1: following the principle of energy superposition, fusing, in the low-resolution pixel coordinate system, a real infrared background image with a low-resolution target image obtained by aliasing multiple infrared small targets against a clean background through a point spread function, to obtain a low-resolution observation image as the feature; mapping the true positions of the low-resolution target image to the sub-pixel grid coordinate system to obtain a high-resolution target image as the label; constructing a dataset by pairing features with their corresponding labels, and dividing it into a training set and a test set;
Step 2: solving, by the least squares method, the linear mapping from the low-resolution observation image to the high-resolution target image of each sample pair, and constructing a linear mapping model; generating a high-resolution initial estimation image by transforming the feature of each sample pair with the linear mapping model;
Step 3: based on the idea of iterative unfolding, mapping the reconstruction process of the low-resolution observation image into a multi-module neural network structure, and constructing a deep unfolding network by sequentially cascading a data consistency update module, a feature transformation and dynamic mapping module, a constraint module obtained by introducing physical structure priors to construct an exponentially decaying sparse mapping function, and an inverse transform and stage output module;
Step 4: inputting the initial estimation images of the training set into the deep unfolding network and calling each module in turn to perform forward propagation, carrying out in sequence the data consistency update, the feature transformation and dynamic mapping, the exponentially decaying sparse mapping constraint, and the inverse transform; after the termination condition is met, outputting the high-resolution reconstructed image; and optimizing the parameters with the loss function to constrain the network and obtain the trained target network;
Step 5: inputting the initial estimation images of the test set into the target network, mapping the positions in the output high-resolution reconstructed images back to the low-resolution pixel coordinate system, and evaluating the localization performance.

2. The sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks as described in claim 1, characterized in that, in step one, constructing the dataset comprises:
S11: in the low-resolution pixel coordinate system, aliasing multiple infrared small targets against a clean background with a point spread function to obtain a low-resolution target image, and reading the low-resolution target image;
S12: randomly selecting, in the low-resolution pixel coordinate system, a real infrared background image from an infrared background dataset containing real complex scenes, and, based on the size of the low-resolution target image, randomly cropping within the size range of the real infrared background image to obtain a background block of the same size as the low-resolution target image;
S13: following the principle of energy superposition, physically fusing and normalizing the low-resolution target image and the background block to obtain a low-resolution observation image of the infrared small targets in the low-resolution pixel coordinate system, and using the low-resolution observation image as the feature; meanwhile, mapping the true positions of the low-resolution target image to the sub-pixel grid coordinate system to obtain a high-resolution target image, and using the high-resolution target image as the label; forming a sample pair from each feature and its corresponding label;
S14: repeating steps S11 to S13 until the termination condition is met, and then dividing the dataset into a training set and a test set according to a preset ratio.

3. The sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks as described in claim 2, characterized in that the low-resolution observation image is calculated as follows: [missing information]; where the formula gives the low-resolution observation image, i is the index of the sample pair, and M is the maximum number of sample pairs in the dataset. The i-th low-resolution target image is given by the following expansion: [missing information]; where N is the maximum number of low-resolution target images. The remaining quantities in the expansion are: the true position coordinates of the low-resolution target image; the set of center positions of all low-resolution target images, each element of which is the center position of the i-th low-resolution target image; the set of peak radiation intensities, each element of which is the i-th peak radiation intensity in the low-resolution target image; the standard deviation of the Gaussian distribution corresponding to the point spread function (PSF) of the optical system; and the fusion coefficients, which are used to simulate different signal-to-noise ratio conditions. The background block has the same size as the low-resolution target image.

4. The sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks as described in claim 3, characterized in that the true positions of the low-resolution target image are mapped to the sub-pixel grid coordinate system to obtain the high-resolution target image as the label as follows: the true position coordinates of the low-resolution target image are magnified by the super-resolution factor c and summed with an offset, which maps them to the sub-pixel grid coordinate system and yields the high-resolution sub-pixel position coordinates of the high-resolution target image; the pixel gray value at the high-resolution sub-pixel position coordinates is set to the peak radiation intensity of the high-resolution target image, and the high-resolution target image is thereby obtained as the label.

5. The sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks as described in claim 4, characterized in that, in step two, the high-resolution initial estimation image is generated as follows: [missing information]; where [missing information]. The quantities in these formulas are: the high-resolution initial estimation image; the initialization mapping matrix output by the linear mapping model; the observation image matrix composed of the M low-resolution observation images; the variable matrix to be optimized, which represents the linear mapping of each sample pair from the low-resolution observation image space to the high-resolution target image space; the Frobenius norm of a matrix; the high-resolution target image matrix corresponding to the observation image matrix; the individual high-resolution target images; and T, which denotes the matrix transpose.

6. The sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks as described in claim 5, characterized in that, in step three, within the constructed deep unfolding network, each module of the neural network structure completes one iterative feature update, and the staged updates of all modules achieve progressive reconstruction of the low-resolution observation image; wherein:
the data consistency update module performs a gradient descent update on the input image, with the update magnitude adjusted by the learning rate and the guidance matrix, so as to constrain the consistency between the network output and the input data, and outputs the data consistency update data; the input image is either the high-resolution initial estimation image supplied at the first iteration or the high-resolution estimation image output by the inverse transform and stage output module in the previous iteration;
the feature transformation and dynamic mapping module rearranges the data consistency update data into a two-dimensional feature map through a learnable nonlinear transformation function and maps it into a high-dimensional feature space through convolution operations, outputting a dynamic mapping branch; the dynamic mapping branch output by each convolution stage is fed into the next convolution stage, and the feature transformation and dynamic mapping features are output;
the constraint module introduces the physical structure prior to construct an exponentially decaying sparse mapping function for nonlinear shrinkage, applies adaptive sparse constraints to the feature transformation and dynamic mapping features, and outputs the exponentially decaying sparse mapping constraint features;
the inverse transform and stage output module maps the exponentially decaying sparse mapping constraint features back to the original space through an inverse transform mapping function whose structure is symmetric to that of the forward transform, generating the high-resolution estimation image output by the current stage, which serves as the input to the data consistency update module of the next stage; once the termination condition of the loop iteration is met, the high-resolution reconstructed image is output.

7. The sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks as described in claim 6, characterized in that, in step four, inputting the initial estimation images of the training set into the deep unfolding network and calling each module in turn to perform forward propagation comprises:
S41: calling the data consistency update module, which performs a gradient descent update on the input image, with the update magnitude adjusted by the learning rate and the guidance matrix, so as to constrain the consistency between the network output and the input data, and outputs the data consistency update data, expressed as: [missing information]; where the quantities are: the data consistency update data, with k the iteration index; the input image, a sparse signal vector that is either the high-resolution initial estimation image supplied at the first iteration or the high-resolution estimation image output by the inverse transform and stage output module in the previous iteration; the learning rate; and the guidance matrix;
S42: calling the feature transformation and dynamic mapping module, which rearranges the data consistency update data into a two-dimensional feature map through a learnable nonlinear transformation function and maps it into a high-dimensional feature space through convolution operations, outputting a dynamic mapping branch; the dynamic mapping branch output by each convolution stage is fed into the next convolution stage, and the feature transformation and dynamic mapping features are output, expressed as: [missing information]; where the quantities are the feature transformation and dynamic mapping features and the learnable nonlinear transformation function, which is used to extract high-level features and promote sparsity;
S43: calling the constraint module, which introduces the physical structure prior to construct an exponentially decaying sparse mapping function for nonlinear shrinkage, applies adaptive sparse constraints to the feature transformation and dynamic mapping features, and outputs the exponentially decaying sparse mapping constraint features, expressed as: [missing information]; [missing information]; where the quantities are: the exponentially decaying sparse mapping constraint features; the exponentially decaying sparse mapping function; the scale parameter of the physical structure prior, used to adjust the rate of change of the exponential decay term; and the sparse suppression strength parameter of the physical structure prior, used to dynamically adjust the shrinkage strength of each feature;
S44: calling the inverse transform and stage output module, which maps the exponentially decaying sparse mapping constraint features back to the original space through an inverse transform mapping function whose structure is symmetric to that of the forward transform, generating the high-resolution estimation image output by the current stage, expressed as: [missing information]; where the quantities are the high-resolution estimation image and the inverse transform mapping function that is structurally symmetric to the learnable nonlinear transformation function of the forward transform; a constraint is applied to ensure consistency during reconstruction.

8. The sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks as described in claim 7, characterized in that, in step four, the loss function is composed of the reconstruction consistency loss function and the transformation symmetry constraint loss function, and is expressed as: [missing information]; where the reconstruction consistency loss function is expressed as: [missing information]; and the transformation symmetry constraint loss function is expressed as: [missing information]. The quantities in these formulas are: the loss value output by the loss function; the reconstruction consistency loss value output by the reconstruction consistency loss function; the weighting coefficients; the transformation symmetry constraint loss value output by the transformation symmetry constraint loss function; A, the total number of sample pairs in the training set; B, the total number of pixels in the image; the number of modules in the deep unfolding network; the high-resolution target image; and the high-resolution reconstructed image output by the deep unfolding network.

9. The sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks as described in claim 8, characterized in that, in step five, mapping the positions in the output high-resolution reconstructed images back to the low-resolution pixel coordinate system and evaluating the localization performance comprises:
S51: filtering by response amplitude the high-resolution reconstructed images output for the test set, suppressing to zero the positions whose pixel values are lower than a preset pixel threshold, and outputting the low-response suppression result;
S52: in the low-response suppression result, searching for local maxima within a defined neighborhood as candidate target positions;
S53: mapping the candidate target positions back to the low-resolution pixel coordinate system with a sub-pixel coordinate mapping function, and using the candidate target positions in the low-resolution pixel coordinate system as prediction points;
S54: setting a group of distance thresholds; under each distance threshold, determining the matching relationship from the spatial distance between prediction points and real points; counting the true positives, the false positives, and the unmatched false negatives in the prediction results; calculating the corresponding precision and recall and plotting the precision-recall curve; obtaining the average precision under that distance threshold as the area under the curve; and averaging the average precision over all distance thresholds to obtain the mean average precision, which is used as the localization performance evaluation index of the target network.

10. The sub-pixel localization method for infrared small target clusters based on physical structure priors and deep unfolded networks as described in claim 9, characterized in that, in S53, the sub-pixel coordinate mapping function is: [missing information]; where the quantities are the position of the candidate target in low-resolution pixel coordinates, i.e., the prediction point, and the candidate target position in the sub-pixel grid coordinate system, and c is the preset super-resolution factor;
in S54, the precision is calculated as $\mathrm{Precision} = \frac{TP}{TP + FP}$ and the recall as $\mathrm{Recall} = \frac{TP}{TP + FN}$; where TP is the number of prediction points recorded as true positives, i.e., the number of correctly detected targets; FP is the number of prediction points recorded as false positives, i.e., the number of falsely detected targets; and FN is the number of real targets recorded as false negatives, i.e., the number of missed targets;
the mean average precision is calculated as $\mathrm{mAP} = \frac{1}{m}\sum_{n=1}^{m} \mathrm{AP}_n$; where mAP is the mean average precision, m is the total number of distance thresholds in the distance threshold group, n is the distance threshold index, and $\mathrm{AP}_n$ is the average precision calculated for the n-th distance threshold.