Radar image aircraft target detection method based on diffusion model
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: NAT UNIV OF DEFENSE TECH
- Filing Date: 2024-01-30
- Publication Date: 2026-06-30

Figure CN117788956B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of aircraft target detection technology, and in particular to a radar image aircraft target detection method based on a diffusion model.
Background
[0002] Synthetic Aperture Radar (SAR) is an active microwave remote sensing imaging radar that can acquire images in all weather conditions, day and night, and it has been widely used in topographic mapping, disaster risk monitoring, and ecological monitoring. SAR target detection is a crucial foundation of SAR image interpretation. In particular, detecting aircraft targets in SAR images can effectively support dynamic surveillance of key areas, situational analysis, and emergency rescue, and is of significant value for aviation safety and national defense. Traditional SAR detection algorithms are mostly based on target scattering characteristics and texture features. A commonly used algorithm is Constant False Alarm Rate (CFAR) detection, which exploits the difference in electromagnetic backscattering between the target and the clutter background: it models the statistical distribution of the local background clutter and then sets a detection threshold corresponding to a chosen false alarm rate. In addition, some researchers have proposed aircraft target detection algorithms based on gradient texture saliency. However, these algorithms are computationally cumbersome and struggle to handle complex scenes containing aircraft. In particular, when the radar backscattering intensity of man-made structures such as jet bridges and aprons is higher than that of the aircraft, the texture and structure of the aircraft become less prominent, which leads to false detections.
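As a background illustration of the CFAR principle described above (not part of the method of this application), the following minimal sketch implements cell-averaging CFAR (CA-CFAR), one common CFAR variant, on a one-dimensional power profile. It assumes square-law detected samples in exponentially distributed clutter, for which the threshold multiplier is $\alpha = N(P_{fa}^{-1/N} - 1)$ with N reference cells; the function name and default window sizes are illustrative assumptions.

```python
def ca_cfar(power, train_per_side=4, guard_per_side=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D list of power samples.

    Assumes exponentially distributed clutter power; returns one boolean
    per cell (cells too close to the edges are never declared targets).
    """
    n_ref = 2 * train_per_side  # total number of reference (training) cells
    # Threshold multiplier that yields the requested false alarm rate
    alpha = n_ref * (pfa ** (-1.0 / n_ref) - 1.0)
    margin = train_per_side + guard_per_side
    detections = [False] * len(power)
    for i in range(margin, len(power) - margin):
        # Reference cells on each side, skipping the guard cells around i
        lead = power[i - margin : i - guard_per_side]
        lag = power[i + guard_per_side + 1 : i + margin + 1]
        clutter = (sum(lead) + sum(lag)) / n_ref  # local clutter estimate
        detections[i] = power[i] > alpha * clutter
    return detections


# Flat clutter with one strong scatterer at index 15
profile = [1.0] * 15 + [50.0] + [1.0] * 14
print([i for i, hit in enumerate(ca_cfar(profile)) if hit])  # [15]
```

The guard cells keep energy from the target itself out of the clutter estimate; when a strong scatterer falls inside the reference window of a neighbouring cell, the raised threshold also illustrates why bright man-made structures near aircraft complicate CFAR detection.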
[0003] Deep learning-based methods, by contrast, have made significant progress in aircraft target detection and recognition in SAR images. For example, rapid aircraft target detection in large, complex scenes has been achieved using airport localization and masking techniques. To better handle the multi-scale nature of aircraft targets, a SAR image aircraft target detection method based on a salient location regression network has been designed that can accurately fit the size differences among aircraft targets. Other methods focus on designing efficient multi-scale feature fusion techniques that promote cross-scale information interaction, so that high-level semantic information and low-level location information complement each other. There are also target detection techniques based on scattering features and structural information that enhance the scattering topology of aircraft targets. To address the low signal-to-noise ratio in SAR image aircraft target detection, Swin-Transformer has been combined with a CNN to enhance features. However, most of these methods require high image quality and detailed prior information about aircraft components, and they fail to fully exploit the implicit spatial structure and semantic relationships contained in the target scattering information. Furthermore, most algorithms are limited to detecting the broad category of aircraft and do not perform further fine-grained recognition.
[0004] Compared with other targets such as vehicles and ships, aircraft have a complex structure and scattering mechanism that pose significant challenges to accurate detection. First, aircraft scattering features are discrete and discontinuous, the structure appears incomplete, and the correlation between components is weak, which makes complete detection difficult. Second, aircraft targets vary considerably in scale, and fixed bounding boxes and receptive fields reduce detection accuracy. Third, in real airport scenes, buildings, vehicles, and metal structures around the aircraft readily produce strong scattering similar to that of aircraft, making accurate target localization and identification difficult. Fourth, because SAR imaging is sensitive to azimuth, even aircraft of the same model exhibit different geometric contours and scattering centers in SAR images; the resulting large intra-class variation makes aircraft identification more difficult.
Summary of the Invention
[0005] Therefore, to address the above technical problems, it is necessary to provide a radar image aircraft target detection method based on a diffusion model that improves the accuracy of aircraft identification.
[0006] A radar image aircraft target detection method based on a diffusion model, the method comprising:
[0007] Acquire SAR images to be detected; construct an aircraft target detection model; the aircraft target detection model includes a backbone network, a scattering feature enhancement module, a diffusion model, and a detection decoder;
[0008] The backbone network is used to extract features from the SAR image to be detected, resulting in a feature map; the scattering feature enhancement module is used to enhance the semantic features of the feature map, resulting in an enhanced feature map.
[0009] Detection is performed separately on the feature map and the enhanced feature map using the diffusion model. In the forward diffusion process, bounding boxes are set for the feature map and the enhanced feature map, and in each diffusion step Gaussian noise controlled by a variance schedule is added to the bounding boxes to obtain noise boxes. In the backward learning process, the noise boxes are trained using a preset noise loss function to obtain trained noise boxes. Target features are cropped from the feature map and the enhanced feature map according to the trained noise boxes, and the target features are input into the detection decoder to obtain noise-free predicted boxes and aircraft target categories.
[0010] In one embodiment, using the scattering feature enhancement module to perform semantic feature enhancement on the feature map to obtain the enhanced feature map includes:
[0011] The scattering feature enhancement module performs semantic enhancement on the feature map: after the outermost layer of the feature matrix corresponding to the feature map is padded with zeros, multiple pixel pairs are created along the angular direction of the local feature map, and the enhanced feature map is obtained by multiplying the pairs element-wise with the preset convolution kernel weights and summing the results.
[0012] Clutter suppression is applied to the enhanced feature map to obtain a clutter-suppressed feature map, and the enhanced feature map and the clutter-suppressed feature map are then fused to obtain the final enhanced feature map.
[0013] In one embodiment, padding the outermost layer of the feature matrix corresponding to the feature map with zeros, creating multiple pixel pairs along the angular direction of the local feature map, and multiplying the pairs element-wise with preset convolution kernel weights and summing to obtain the enhanced feature map includes:
[0014] After the outermost layer of the feature matrix corresponding to the feature map is padded with zeros, multiple pixel pairs are created along the angular direction of the local feature map, and the pairs are multiplied element-wise with the preset convolution kernel weights and summed to obtain the enhanced feature map as follows:
[0015]
[0016] where $x_i$ and $x_i'$ are input pixels, $w_i$ are the weights of a k×k convolution kernel, the pixel-pair set consists of the m pairs selected from the current region, and m ≤ k×k.
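The formula in [0015] is not reproduced in this text. The following minimal sketch assumes the pixel-difference form $y = \sum_{i=1}^{m} w_i (x_i - x_i')$ used by angular pixel difference convolution, in which each of the eight neighbours of a 3×3 window is paired with the next neighbour in the clockwise direction (so m = 8 ≤ 3×3). The difference form, the pairing order, and the function name are assumptions made for illustration, and the sketch does not model the interpolation step mentioned in [0042].

```python
# Offsets of the 8 neighbours of a 3x3 window, listed clockwise from top-left
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def angular_pixel_pair_conv(img, weights):
    """3x3 angular pixel-pair convolution with stride 1 and zero padding.

    Each neighbour x_i is paired with the next neighbour x_i' clockwise and
    the output is y = sum_i w_i * (x_i - x_i'); here m = 8 <= 3 * 3.
    """
    k = 3
    pad = (k - 1) // 2  # one ring of zeros keeps the output size unchanged
    h, w = len(img), len(img[0])
    padded = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for r in range(h):
        for c in range(w):
            padded[r + pad][c + pad] = img[r][c]
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            pr, pc = r + pad, c + pad  # window centre in padded coordinates
            acc = 0.0
            for i, (dr, dc) in enumerate(RING):
                nr, nc = RING[(i + 1) % len(RING)]  # clockwise partner
                acc += weights[i] * (padded[pr + dr][pc + dc] - padded[pr + nr][pc + nc])
            out[r][c] = acc
    return out


bright = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
print(angular_pixel_pair_conv(bright, [1, 0, 0, 0, 0, 0, 0, 0]))
# [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, -5.0, 5.0]]
```

Because the pair differences around a closed ring telescope, equal weights always produce a zero response; under this assumed form the operator responds to local scattering structure only through differences between the learned weights.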
[0017] In one embodiment, fusing the enhanced feature map and the clutter-suppressed feature map to obtain the enhanced feature map includes:
[0018] The enhanced feature map and the clutter-suppressed feature map are fused as follows:
[0019] y′=y+FE(y)
[0020] where y is the feature map after feature enhancement, FE(·) denotes the clutter suppression operation, and y′ is the fused feature map.
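The fusion rule y′ = y + FE(y) is a residual (skip) connection. The text does not specify FE(·), so the sketch below substitutes a hypothetical soft-threshold that keeps only responses above a level tau; both this stand-in and its parameter are assumptions for illustration.

```python
def soft_threshold(y, tau=1.0):
    """Hypothetical stand-in for FE(): suppress weak (clutter-like) responses."""
    return [[max(v - tau, 0.0) for v in row] for row in y]


def residual_fuse(y, fe=soft_threshold):
    """Element-wise fusion y' = y + FE(y) for a 2-D feature map."""
    fe_y = fe(y)
    return [[a + b for a, b in zip(row_y, row_fe)] for row_y, row_fe in zip(y, fe_y)]


print(residual_fuse([[0.5, 3.0], [1.0, 2.0]]))  # [[0.5, 5.0], [1.0, 3.0]]
```

Because FE(y) is added rather than substituted, weak responses pass through unchanged while strong responses are reinforced, which is consistent with the stated goal of raising target scattering intensity relative to clutter.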
[0021] In one embodiment, setting the bounding boxes of the feature map and the enhanced feature map during the forward diffusion process includes:
[0022] Set the bounding boxes of the feature map and the enhanced feature map during the forward diffusion process as follows:
[0023]
[0024] where i is the index of the bounding box, the box is described by its center coordinates and by its width and height $(w_i, h_i)$, and x is the input feature map.
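The box parameterization described above (a center point plus width and height) can be sketched as a small data structure. Because the formula in [0023] is not reproduced, the field names cx, cy, w, and h and the corner conversion are assumptions for illustration.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box in centre/size form (cx, cy, w, h)."""
    cx: float
    cy: float
    w: float
    h: float

    def to_corners(self) -> Tuple[float, float, float, float]:
        """Return (x1, y1, x2, y2): top-left and bottom-right corners."""
        return (self.cx - self.w / 2, self.cy - self.h / 2,
                self.cx + self.w / 2, self.cy + self.h / 2)

    @classmethod
    def from_corners(cls, x1: float, y1: float, x2: float, y2: float) -> "Box":
        """Build a centre/size box from corner coordinates."""
        return cls((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


box = Box(cx=10.0, cy=20.0, w=4.0, h=6.0)
print(box.to_corners())  # (8.0, 17.0, 12.0, 23.0)
```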
[0025] In one embodiment, adding Gaussian noise controlled by a variance schedule to the bounding box in each diffusion step to obtain a noise box includes:
[0026] In each diffusion step, Gaussian noise controlled by the variance schedule is added to the bounding box, giving the noise box as follows:
[0027]
[0028] where q denotes the diffusion process, $z_0 = b$ is the bounding box, t is the index of the diffusion step, $z_t$ is the noise box, and the variance-schedule term denotes the Gaussian noise controlled by the predefined variance schedule.
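The formula in [0027] is not reproduced in this text. Assuming it is the standard Gaussian forward process of denoising diffusion models, $q(z_t \mid z_0) = \mathcal{N}(z_t; \sqrt{\bar\alpha_t}\, z_0, (1 - \bar\alpha_t) I)$ with $\bar\alpha_t = \prod_{s \le t} (1 - \beta_s)$, a minimal sketch follows. It samples $z_t$ directly from $z_0$ with this closed form, which is equivalent in distribution to adding noise step by step; the linear β schedule, its endpoints, and the box normalization are assumptions, and t counts from 0 in the code.

```python
import math
import random


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def noise_box(z0, t, alpha_bar, rng=random):
    """Sample z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) I) for one box."""
    a = alpha_bar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in z0]


schedule = linear_alpha_bar()
rng = random.Random(0)
gt_box = [0.5, 0.5, 0.2, 0.3]  # (cx, cy, w, h), normalized to [0, 1]
print(noise_box(gt_box, 10, schedule, rng))   # close to gt_box
print(noise_box(gt_box, 999, schedule, rng))  # nearly pure Gaussian noise
```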
[0029] In one embodiment, the pre-set noise loss function is:
[0030]
[0031] where $z_0 = b$ is the bounding box, t is the index of the diffusion step, and $z_t$ is the noise box.
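The formula in [0030] is not reproduced in this text; [0068] describes the loss as the sum of squared differences between the ground-truth box $z_0$ and the network estimate $f_\theta(z_t, t)$. A minimal sketch of that objective follows, with the network output passed in as plain numbers. The function name is hypothetical, and practical implementations often add a constant factor such as 1/2 or average over boxes, which does not change the minimizer.

```python
from typing import Sequence


def box_denoising_loss(pred: Sequence[Sequence[float]],
                       target: Sequence[Sequence[float]]) -> float:
    """Sum of squared differences between predicted boxes f_theta(z_t, t)
    and ground-truth boxes z_0, each given as (cx, cy, w, h)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of boxes")
    return sum((p - q) ** 2
               for p_box, t_box in zip(pred, target)
               for p, q in zip(p_box, t_box))


print(box_denoising_loss([(1.0, 2.0, 3.0, 4.0)], [(0.0, 0.0, 3.0, 4.0)]))  # 5.0
```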
[0032] In the above radar image aircraft target detection method based on the diffusion model, an aircraft target detection model is constructed; a backbone network extracts features from the SAR image to be detected to obtain a feature map; and a scattering feature enhancement module enhances the semantic features of the feature map, reducing background scattering intensity to suppress clutter and increasing target scattering intensity, which improves detection accuracy. Detection is then performed on the feature map and the enhanced feature map using the diffusion model. In the forward diffusion process, bounding boxes are set for the feature map and the enhanced feature map, and in each diffusion step Gaussian noise controlled by a variance schedule is added to the bounding boxes to obtain noise boxes. In the backward learning process, the noise boxes are trained using a preset noise loss function to obtain trained noise boxes. Target features are cropped from the feature map and the enhanced feature map according to the trained noise boxes and input into the detection decoder to obtain noise-free predicted boxes and aircraft target categories. Compared with traditional fixed boxes, the random noise boxes in this application are not constrained by preset box sizes or aspect ratios, so they adapt better to the large scale variation of aircraft and improve the adaptability and robustness of the algorithm. The method requires neither prior candidate boxes nor specific detection box aspect ratios, and it alleviates the box offset and missed detections caused by large differences in aircraft scale and by deviations in manually set parameters.
Brief Description of the Drawings
[0033] Figure 1 is a flowchart of a radar image aircraft target detection method based on a diffusion model in one embodiment;
[0034] Figure 2 is a framework diagram of an aircraft target detection model in one embodiment;
[0035] Figure 3 is a schematic diagram of scattering feature enhancement in one embodiment.
Detailed Description
[0036] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0037] In one embodiment, as shown in Figure 1, a radar image aircraft target detection method based on a diffusion model is provided, which includes the following steps:
[0038] Step 102: Obtain the SAR image to be detected; construct an aircraft target detection model; the aircraft target detection model includes a backbone network, a scattering feature enhancement module, a diffusion model, and a detection decoder.
[0039] The framework of the constructed aircraft target detection model is shown in Figure 2.
[0040] Step 104: The backbone network extracts features from the SAR image to be detected to obtain a feature map, and the scattering feature enhancement module enhances the semantic features of the feature map to obtain an enhanced feature map. The acquired SAR images contain severe background interference that obscures the targets. To suppress background clutter effectively, this application uses information from the target neighborhood and designs a scattering feature enhancement module that enhances the semantic features of the feature map, reducing background scattering intensity to suppress clutter and increasing target scattering intensity. Specifically, to capture rich gradient information, the convolution kernel size is 3×3 with a stride of 1. To reduce the excessive sensitivity of the convolutional layer to position, a pooling function is used, which replaces the network output at a given position with summary statistics of the neighboring outputs; the pooling layer is set to 1 to increase the translation invariance of the features. One layer of zeros is added as padding around the outermost border of the feature matrix corresponding to the feature map so that the convolution kernel slides exactly up to the edge.
[0041] padding = (k-1) / 2
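The effect of this padding rule can be checked with the standard output-size relation for a convolution along one axis, $o = \lfloor (n + 2p - k)/s \rfloor + 1$: with stride s = 1 and p = (k − 1)/2, the output size o equals the input size n for any odd kernel size k. The helper names below are illustrative.

```python
def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution along one spatial axis."""
    return (n + 2 * padding - k) // stride + 1


def same_padding(k: int) -> int:
    """Padding (k - 1) / 2, which preserves size for odd k and stride 1."""
    if k % 2 == 0:
        raise ValueError("(k - 1) / 2 is not an integer for even k")
    return (k - 1) // 2


for k in (3, 5, 7):
    p = same_padding(k)
    print(k, p, conv_output_size(64, k, 1, p))
# 3 1 64
# 5 2 64
# 7 3 64
```

For the 3×3 kernel used here, this gives exactly the single layer of zeros described above.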
[0042] This keeps the spatial size of the output equal to that of the input. As shown in Figure 3, multiple pixel pairs are then created along the angular direction of the local feature map; the pairs are multiplied element-wise with the kernel weights and summed, and the resulting interpolations are convolved with the kernel to generate the values of the output feature map. Because, during network training, the convolution kernel encodes closely related pixels more efficiently, it produces higher activation responses for them. As a result, the target scattering characteristics are enhanced in the output features while the background clutter intensity is suppressed. To further enhance target saliency, a feature fusion module is designed to fuse the enhanced feature map with the clutter-suppressed feature map and thereby increase target scattering intensity. The scattering feature enhancement module thus better highlights the target and reduces background interference, which improves detection accuracy.
[0043] Step 106: Perform detection on the feature map and the enhanced feature map using the diffusion model. Set the bounding boxes of the feature map and the enhanced feature map in the forward diffusion process, and add Gaussian noise controlled by a variance schedule to the bounding boxes in each diffusion step to obtain noise boxes. In the backward learning process, train the noise boxes using a preset noise loss function to obtain trained noise boxes. Crop the target features from the feature map and the enhanced feature map according to the trained noise boxes, and input the target features into the detection decoder to obtain noise-free predicted boxes and aircraft target categories.
[0044] Detection is performed on both the feature map and the enhanced feature map using a diffusion model. Aircraft target detection is formulated as the task of generating, in image space, bounding boxes described by their center coordinates and their sizes (width and height). The learning objective is defined on input-target pairs (x, b, c), where x is the input image and b and c are, respectively, the set of bounding boxes and the class labels of the objects in image x. Bounding boxes are defined for the feature map and the enhanced feature map during the forward diffusion process, and the i-th box is represented by its center coordinates together with its width and height $(w_i, h_i)$. In each diffusion step, Gaussian noise controlled by a variance schedule is added to the bounding boxes to obtain noise boxes. In the backward learning process, the noise boxes are trained using a preset noise loss function to obtain trained noise boxes. Compared with traditional fixed boxes, the random noise boxes in this application are not constrained by preset box sizes or aspect ratios, so they adapt better to the scale variation of aircraft and improve the adaptability and robustness of the algorithm. No prior candidate boxes or specific detection box aspect ratios are required, which alleviates the box offset and missed detections caused by large differences in aircraft scale and by deviations in manually set parameters. The cropped target features are input into the detection decoder, which is trained to predict the noise-free ground-truth boxes, yielding noise-free predicted boxes and the aircraft target category.
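The cropping step can be sketched for a single-channel feature map and a box in (cx, cy, w, h) form measured in feature-map pixels. Detectors of this kind typically use RoI Align with bilinear sampling; the integer crop below, rounded outward and clamped to the map, is a simplification, and the function name is hypothetical.

```python
import math


def crop_feature(feature, box):
    """Crop the region covered by a (cx, cy, w, h) box from a 2-D feature map."""
    cx, cy, w, h = box
    height, width = len(feature), len(feature[0])
    # Round outward to whole pixels, then clamp to the feature-map bounds
    x1 = max(0, math.floor(cx - w / 2))
    y1 = max(0, math.floor(cy - h / 2))
    x2 = min(width, math.ceil(cx + w / 2))
    y2 = min(height, math.ceil(cy + h / 2))
    return [row[x1:x2] for row in feature[y1:y2]]


fmap = [[10 * r + c for c in range(5)] for r in range(5)]
print(crop_feature(fmap, (2.0, 2.0, 2.0, 2.0)))  # [[11, 12], [21, 22]]
print(crop_feature(fmap, (0.0, 0.0, 2.0, 2.0)))  # [[0]]
```

Clamping matters here because a randomly initialized noise box can extend past the border of the feature map.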
[0045] In the above radar image aircraft target detection method based on the diffusion model, an aircraft target detection model is constructed; a backbone network extracts features from the SAR image to be detected to obtain a feature map; and a scattering feature enhancement module enhances the semantic features of the feature map, reducing background scattering intensity to suppress clutter and increasing target scattering intensity, which improves detection accuracy. Detection is then performed on the feature map and the enhanced feature map using the diffusion model. In the forward diffusion process, bounding boxes are set for the feature map and the enhanced feature map, and in each diffusion step Gaussian noise controlled by a variance schedule is added to the bounding boxes to obtain noise boxes. In the backward learning process, the noise boxes are trained using a preset noise loss function to obtain trained noise boxes. Target features are cropped from the feature map and the enhanced feature map according to the trained noise boxes and input into the detection decoder to obtain noise-free predicted boxes and aircraft target categories. Compared with traditional fixed boxes, the random noise boxes in this application are not constrained by preset box sizes or aspect ratios, so they adapt better to the large scale variation of aircraft and improve the adaptability and robustness of the algorithm. The method requires neither prior candidate boxes nor specific detection box aspect ratios, and it alleviates the box offset and missed detections caused by large differences in aircraft scale and by deviations in manually set parameters.
[0046] In one embodiment, using the scattering feature enhancement module to perform semantic feature enhancement on the feature map to obtain the enhanced feature map includes:
[0047] The scattering feature enhancement module performs semantic enhancement on the feature map: after the outermost layer of the feature matrix corresponding to the feature map is padded with zeros, multiple pixel pairs are created along the angular direction of the local feature map, and the enhanced feature map is obtained by multiplying the pairs element-wise with the preset convolution kernel weights and summing the results.
[0048] Clutter suppression is applied to the enhanced feature map to obtain a clutter-suppressed feature map, and the enhanced feature map and the clutter-suppressed feature map are then fused to obtain the final enhanced feature map.
[0049] In one embodiment, padding the outermost layer of the feature matrix corresponding to the feature map with zeros, creating multiple pixel pairs along the angular direction of the local feature map, and multiplying the pairs element-wise with preset convolution kernel weights and summing to obtain the enhanced feature map includes:
[0050] After the outermost layer of the feature matrix corresponding to the feature map is padded with zeros, multiple pixel pairs are created along the angular direction of the local feature map, and the pairs are multiplied element-wise with the preset convolution kernel weights and summed to obtain the enhanced feature map as follows:
[0051]
[0052] where $x_i$ and $x_i'$ are input pixels, $w_i$ are the weights of a k×k convolution kernel, the pixel-pair set consists of the m pairs selected from the current region, and m ≤ k×k.
[0053] In one embodiment, fusing the enhanced feature map and the clutter-suppressed feature map to obtain the enhanced feature map includes:
[0054] The enhanced feature map and the clutter-suppressed feature map are fused as follows:
[0055] y′=y+FE(y)
[0056] where y is the feature map after feature enhancement, FE(·) denotes the clutter suppression operation, and y′ is the fused feature map.
[0057] In one embodiment, setting the bounding boxes of the feature map and the enhanced feature map during the forward diffusion process includes:
[0058] Set the bounding boxes of the feature map and the enhanced feature map during the forward diffusion process as follows:
[0059]
[0060] where i is the index of the bounding box, the box is described by its center coordinates and by its width and height $(w_i, h_i)$, and x is the input feature map.
[0061] In one embodiment, adding Gaussian noise controlled by a variance schedule to the bounding box in each diffusion step to obtain a noise box includes:
[0062] In each diffusion step, Gaussian noise controlled by the variance schedule is added to the bounding box, giving the noise box as follows:
[0063]
[0064] where q denotes the diffusion process, $z_0 = b$ is the bounding box, t is the index of the diffusion step, $z_t$ is the noise box, and the variance-schedule term denotes the Gaussian noise controlled by the predefined variance schedule.
[0065] In one embodiment, the pre-set noise loss function is:
[0066]
[0067] where $z_0 = b$ is the bounding box, t is the index of the diffusion step, and $z_t$ is the noise box.
[0068] In practice, the loss minimizes the sum of squared differences between the target bounding box $z_0$ and the network estimate $f_\theta(z_t, t)$. Squaring the error considerably amplifies errors greater than 1, so large errors are penalized much more heavily than with the L1 norm. The model therefore becomes more sensitive to such samples, which, according to this application, makes it more robust and improves the accuracy with which the target features are cropped.
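The statement about squaring can be illustrated by comparing absolute and squared penalties on a few residuals: residuals smaller than 1 shrink when squared, while residuals larger than 1 grow. The helper below is illustrative only.

```python
def penalties(residuals):
    """Return (residual, L1 penalty, squared penalty) for each residual."""
    return [(r, abs(r), r * r) for r in residuals]


for r, l1, l2 in penalties([0.5, -1.0, 3.0]):
    print(f"residual={r:+.1f}  |r|={l1:.2f}  r^2={l2:.2f}")
# residual=+0.5  |r|=0.50  r^2=0.25
# residual=-1.0  |r|=1.00  r^2=1.00
# residual=+3.0  |r|=3.00  r^2=9.00
```

The gradient of the squared penalty, 2r, also grows with the residual, whereas the L1 gradient has constant magnitude, so boxes with large errors dominate the parameter update.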
[0069] It should be understood that, although the steps in the flowchart of Figure 1 are shown in the sequence indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be performed in other orders. Moreover, at least some of the steps in Figure 1 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times, and their execution order is not necessarily sequential: they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
[0070] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0071] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.
Claims
1. A radar image aircraft target detection method based on a diffusion model, characterized in that, The method includes: Acquire SAR images to be detected; construct an aircraft target detection model; the aircraft target detection model includes a backbone network, a scattering feature enhancement module, a diffusion model, and a detection decoder; The backbone network is used to extract features from the SAR image to be detected to obtain a feature map; the scattering feature enhancement module is used to enhance the semantic features of the feature map to obtain an enhanced feature map. The feature map and the enhanced feature map are detected according to the diffusion model. Bounding boxes of the feature map and the enhanced feature map are set in the forward diffusion process. Gaussian noise with variance scheduling control is added to the bounding boxes in each diffusion step to obtain noise boxes. In the backward learning process, the noise boxes are trained using a pre-set noise loss function to obtain trained noise boxes. Target features are cropped from the feature map and the enhanced feature map based on the trained noise boxes. The target features are input into the detection decoder to obtain noise-free prediction boxes and aircraft target categories. The feature map is semantically enhanced using a scattering feature enhancement module to obtain an enhanced feature map, including: The feature map is semantically enhanced using a scattering feature enhancement module. After padding the outermost layer of the feature matrix corresponding to the feature map with zeros, multiple pixel pairs are created in the angular direction of the local feature map. The enhanced feature map is obtained by multiplying and summing the pairs element-wise with the pre-set convolution kernel weights. 
Clutter suppression is applied to the enhanced feature map to obtain a clutter-suppressed feature map; the enhanced feature map and the clutter-suppressed feature map are then fused to obtain an enhanced feature map.
2. The method according to claim 1, characterized in that padding the outermost layer of the feature matrix corresponding to the feature map with zeros, creating multiple pixel pairs in the angular direction of the local feature map, and multiplying the pairs element-wise with preset convolution kernel weights and summing to obtain the enhanced feature map includes: after the outermost layer of the feature matrix corresponding to the feature map is padded with zeros, creating multiple pixel pairs in the angular direction of the local feature map, and multiplying the pairs element-wise with the preset convolution kernel weights and summing to obtain the enhanced feature map, where $x_i$ and $x_i'$ are input pixels, $w_i$ are the weights of a k×k convolution kernel, the pixel-pair set consists of the m pairs selected from the current region, and m ≤ k×k.
3. The method according to claim 1, characterized in that fusing the enhanced feature map and the clutter-suppressed feature map to obtain the enhanced feature map includes: fusing the enhanced feature map and the clutter-suppressed feature map as y′ = y + FE(y), where y is the feature map after feature enhancement and FE(·) denotes the clutter suppression operation.
4. The method according to claim 1, characterized in that setting the bounding boxes of the feature map and the enhanced feature map during the forward diffusion process includes: setting the bounding boxes of the feature map and the enhanced feature map during the forward diffusion process, where i denotes the index of the bounding box, the box is described by its center coordinates and by its width and height, and x denotes the input feature map.
5. The method according to claim 1, characterized in that adding Gaussian noise controlled by a variance schedule to the bounding box in each diffusion step to obtain a noise box includes: in each diffusion step, adding Gaussian noise controlled by the variance schedule to the bounding box to obtain the noise box, where q denotes the diffusion process, $z_0 = b$ is the bounding box, t is the index of the diffusion step, $z_t$ is the noise box, and the variance-schedule term denotes the Gaussian noise controlled by the predefined variance schedule.
6. The method according to claim 1, characterized in that, in the preset noise loss function, $z_0 = b$ is the bounding box, t is the index of the diffusion step, and $z_t$ is the noise box.