A SAR image target recognition method based on saliency map guidance
By generating target saliency maps and fusing deep and shallow features, combined with multi-layer dilated convolution and fully connected layers, the problems of insufficient accuracy and noise interference in SAR image target recognition are solved, achieving efficient and robust target recognition results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING INST OF TECH
- Filing Date
- 2024-04-30
- Publication Date
- 2026-06-26
AI Technical Summary
Existing SAR image target recognition methods suffer from insufficient recognition accuracy when faced with the problem of large differences within target classes and small differences between target classes. Furthermore, they are subject to severe background noise interference, making it difficult to meet practical needs.
By generating a target saliency map, deep and shallow features are extracted and fused. Multi-layer dilated convolution is used to refine the features, and fully connected layers are combined for type recognition to avoid background noise interference and improve feature consistency and recognition accuracy.
It effectively improves the accuracy and efficiency of SAR image target recognition, enhances the intra-class consistency of target features, reduces background noise interference, and achieves high-precision target recognition.
Figure CN118397473B_ABST
Description
Technical Field
[0001] This invention relates to the field of synthetic aperture radar target detection technology, and in particular to a SAR image target recognition method based on saliency map guidance.
Background Technology
[0002] Synthetic Aperture Radar (SAR) can conduct Earth observation around the clock and in all weather conditions, playing an important role in urban planning, disaster assessment, and military operations.
[0003] SAR image target recognition involves classifying target slices in SAR images according to their categories. With the continuous development of deep learning methods, convolutional neural networks have been widely applied in image processing. Due to the unique imaging mechanism of SAR, in addition to imaging specific targets, speckle noise interference exists in the background, posing a significant challenge to SAR image interpretation. Furthermore, SAR image target recognition suffers from the problem of large intra-class differences and small inter-class differences. As the demand for SAR image target recognition continues to increase, current recognition accuracy is insufficient to meet practical needs.
[0004] Therefore, there is an urgent need for a SAR image target recognition method that can improve the intra-class consistency of target features and achieve robust high-precision SAR image target recognition.
Summary of the Invention
[0005] Therefore, it is necessary to provide a SAR image target recognition method based on saliency map guidance to address the above-mentioned technical problems.
[0006] A SAR image target recognition method guided by saliency maps includes the following steps: generating a corresponding target saliency map based on the original SAR image, wherein the target saliency map contains key target information; extracting deep and shallow features from the original SAR image based on the target saliency map, and guiding the fusion of the deep and shallow features; refining the fused image features using multi-layer dilated convolution to obtain target features; and performing type recognition on the target features through a fully connected layer to obtain the target recognition result.
[0007] In one embodiment, generating a corresponding target saliency map based on the original SAR image includes: obtaining regions containing target information in the original SAR image based on semantic segmentation, marking the target region and the background region; using morphological closure operation to find the maximum contour of the target region as the target contour, creating a mask based on the target contour, and extracting the target saliency map from the original SAR image through the mask.
[0008] In one embodiment, obtaining regions containing target information in the original SAR image based on semantic segmentation and marking target regions and background regions includes: calculating an intensity histogram of the original SAR image using the following formula:
[0009] $H(i) = n_i$
[0010] In the formula, $i$ represents the pixel intensity, and $n_i$ represents the number of pixels with a pixel value of $i$.
[0011] Normalizing the intensity histogram yields the cumulative intensity distribution function of the image, as shown in the formula:
[0012] $P(i) = \frac{1}{MN} \sum_{j=0}^{i} n_j$
[0013] In the formula, $MN$ represents the total number of pixels in the image; the segmentation threshold $T$ is determined based on the cumulative intensity distribution function, using the following formula:
[0014] $T = \arg\max_{t} \sigma_B^2(t)$
[0015] In the formula, $\sigma_B^2(t)$ is the inter-class variance at the threshold $t$:
[0016] $\sigma_B^2(t) = \omega_1(t)\,\omega_2(t)\,[\mu_1(t) - \mu_2(t)]^2$
[0017] In the formula, $\omega_1(t)$ and $\omega_2(t)$ are the normalized weights of the segmented target region and background region, and $\mu_1(t)$ and $\mu_2(t)$ are the average gray values of the segmented target region and background region. The original SAR image is converted into a binary image according to the segmentation threshold, wherein the target region is marked as the foreground and the background region is marked as the background.
[0018] In one embodiment, the step of using morphological closure operation to find the maximum contour of the target region as the target contour, and creating a mask based on the target contour, and extracting the target saliency map from the original SAR image using the mask, includes: performing a morphological closure operation on the binary image using a combination of dilation and erosion operations, with the following formula:
[0019] $B_{closed} = (B \cup B_{kernel}) \cap B_{kernel}$
[0020] In the formula, $B$ is the binary image, $\cup$ represents the dilation operation, $\cap$ represents the erosion operation, and $B_{kernel}$ is the kernel of the morphological operations; contours are found in the binary image using image processing techniques, and the largest contour, which corresponds to the image target, is taken as the target contour; a mask is created from the target contour and used to extract the corresponding target region from the original SAR image as the target saliency map.
[0021] In one embodiment, the step of extracting deep and shallow features from the original SAR image based on the target saliency map and guiding the fusion of the deep and shallow features includes: extracting deep and shallow features from the original SAR image based on the target saliency map; let $S$ denote the target saliency map and $F_{shallow}$ the shallow features. The target saliency map is downsampled to the same size as the shallow features to obtain the adjusted saliency map $S'$, i.e.:
[0022] $S' = \mathrm{down}(S)$
[0023] The adjusted saliency map is multiplied pixel by pixel with the shallow features to obtain the corrected feature map $F'_{shallow}$, i.e.:
[0024] $F'_{shallow} = F_{shallow} \cdot S'$
[0025] The shallow features are added pixel by pixel to the corrected feature map to obtain the final shallow features, i.e.:
[0026] $F_{shallow\text{-}final} = F_{shallow} + F'_{shallow}$
[0027] The final deep features are obtained using the same method, i.e.:
[0028] $F_{deep\text{-}final} = F_{deep} + F_{deep} \cdot \mathrm{down}(S)$
[0029] The final shallow features and the final deep features are then fused to obtain the fused image features, i.e.:
[0030] $F_{fusion} = \mathrm{concat}(F_{deep\text{-}final}, \mathrm{down}(F_{shallow\text{-}final}))$
[0031] In the formula, concat(·) denotes concatenation: the final shallow features are first downsampled and then concatenated with the final deep features.
[0032] In one embodiment, the step of refining the fused image features using multi-layer dilated convolution to obtain target features, and then performing type recognition on the target features through a fully connected layer to obtain a target recognition result, includes: refining the fused image features using dilated convolutions, and then further refining the result using a 1×1 convolution to obtain the target features, represented as:
[0033]-[0035] $F_{final} = \mathrm{conv}_{1\times1}[\mathrm{concat}(\mathrm{conv\text{-}atrous}_{1\times1}(F_{fusion}), \mathrm{conv\text{-}atrous}_{2\times2}(F_{fusion}), \mathrm{conv\text{-}atrous}_{3\times3}(F_{fusion}))]$
[0036] The target features are passed through a fully connected layer to obtain the target recognition result, i.e.:
[0037] $C = FC(F_{final})$
[0038] In the formula, C represents the identified target category, and FC represents the fully connected layer.
[0039] Compared with existing technologies, the advantages and beneficial effects of this invention are as follows: A corresponding target saliency map is generated from the original SAR image, containing key target information. Deep and shallow features of the original SAR image are extracted based on the target saliency map, and the deep and shallow features are fused. Shallow features can help improve the contextual information of the target slice, and the target saliency map can guide the network to learn key features through the inherent structural features of the target, improving the intra-class consistency of target features. Multi-layer dilated convolution is used to refine the fused image features to obtain target features. Fully connected layers are used to perform type recognition on the target features to obtain the target recognition result. Convolutions with different dilation rates are used to obtain features with different receptive fields, effectively modeling size changes related to target texture, avoiding noise interference in the background image, and efficiently and robustly mining the basic features of the target, improving the accuracy and efficiency of SAR image target recognition to meet the accuracy requirements of SAR image recognition.
Attached Figure Description
[0040] Figure 1 is a flowchart of a saliency map-guided SAR image target recognition method in one embodiment;
[0041] Figure 2 is a flowchart of a saliency map-guided SAR image target recognition method in one embodiment;
[0042] Figure 3 is a schematic diagram of deep and shallow image feature fusion guided by the target saliency map in one embodiment.
Detailed Implementation
[0043] Before describing the specific embodiments of the present invention, the overall concept of the present invention will be explained as follows:
[0044] This invention mainly concerns improving the SAR image target recognition process, because the current recognition accuracy for SAR images cannot meet practical needs.
[0045] Therefore, this invention proposes a SAR image target recognition method based on saliency map guidance. A corresponding target saliency map is generated from the original SAR image, containing key target information. Deep and shallow features of the original SAR image are extracted based on the target saliency map, and the deep and shallow features are fused. Shallow features help to improve the contextual information of the target slice, and the target saliency map guides the network to learn key features through the inherent structural features of the target, improving the intra-class consistency of target features. Multi-layer dilated convolution is used to refine the fused image features to obtain target features. Fully connected layers are used to perform type recognition on the target features to obtain the target recognition result. Convolutions with different dilation rates are used to obtain features with different receptive fields, effectively modeling size changes related to target texture, avoiding noise interference in the background image, and efficiently and robustly mining the basic features of the target, improving the accuracy and efficiency of SAR image target recognition to meet the accuracy requirements of SAR image recognition.
[0046] Having introduced the overall concept of the present invention, to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
[0047] In one embodiment, as shown in Figure 1 and Figure 2, a saliency map-guided SAR image target recognition method is provided, including the following steps:
[0048] Step S101: Generate a corresponding target saliency map based on the original SAR image. The target saliency map contains key target information.
[0049] Specifically, the original SAR image contains target slices. Because the background of the original SAR image contains speckle noise, which interferes with target recognition, a target saliency map of the target area is extracted from the original SAR image. The target saliency map contains key target information, which can guide the network to learn key features through the inherent structural features of the target, improve the intra-class consistency of target features, and further expand the inter-class distinguishability.
[0050] Step S101 includes: obtaining regions containing target information in the original SAR image based on semantic segmentation, and marking the target region and background region; using morphological closure operation to find the maximum contour of the target region as the target contour, and creating a mask based on the target contour, and extracting the target saliency map from the original SAR image through the mask.
[0051] Specifically, to address speckle noise in the original SAR image, semantic segmentation can be employed to divide key regions containing target information into target and background regions. This allows the network to efficiently extract image features based on the target region, avoiding the influence of noise. By employing morphological closure operations to find the contour of the obtained target region, the maximum contour is obtained as the target contour. A mask is then created based on the target contour, and the target saliency map can be extracted from the original SAR image using this mask. This allows for feature fusion based on the target saliency map, improving the intra-class consistency of extracted features.
[0052] The specific steps for marking the target and background regions are as follows: Calculate the intensity histogram of the original SAR image using the following formula:
[0053] $H(i) = n_i$
[0054] In the formula, $i$ represents the pixel intensity, and $n_i$ represents the number of pixels with a pixel value of $i$. The intensity histogram is normalized to obtain the cumulative intensity distribution function of the image, as shown in the formula:
[0055] $P(i) = \frac{1}{MN} \sum_{j=0}^{i} n_j$
[0056] In the formula, $MN$ represents the total number of pixels in the image; the segmentation threshold $T$ is determined based on the cumulative intensity distribution function, and the formula is as follows:
[0057] $T = \arg\max_{t} \sigma_B^2(t)$
[0058] In the formula, $\sigma_B^2(t)$ is the inter-class variance at threshold $t$:
[0059] $\sigma_B^2(t) = \omega_1(t)\,\omega_2(t)\,[\mu_1(t) - \mu_2(t)]^2$
[0060] In the formula, $\omega_1(t)$ and $\omega_2(t)$ are the normalized weights of the segmented target region and background region, and $\mu_1(t)$ and $\mu_2(t)$ are the average gray values of the segmented target region and background region. The original SAR image is converted into a binary image according to the segmentation threshold, where the target region is marked as the foreground and the background region is marked as the background.
[0061] Specifically, by calculating the intensity histogram of the SAR image, the overall pixel intensity distribution of the image is understood. The cumulative intensity distribution function of the image is obtained by normalization, and a segmentation threshold is determined based on the cumulative intensity distribution function. The original SAR image is converted into a binary image by the segmentation threshold, in which the target area is marked as the foreground (e.g., white) and the background area is marked as the background (e.g., black), thereby realizing the segmentation of the target and the background and avoiding the influence of noise in the background area on target recognition.
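The thresholding step described above can be illustrated with a minimal NumPy sketch. It assumes an 8-bit grayscale image, the standard Otsu form of the inter-class variance, and that the bright class (pixels above the threshold) is the target, which is typical for SAR but is an assumption here. The function names `otsu_threshold` and `binarize` are hypothetical.

```python
import numpy as np


def otsu_threshold(image: np.ndarray, levels: int = 256) -> int:
    """Return the threshold t that maximises the inter-class variance."""
    # Intensity histogram: hist[i] = n_i, the number of pixels with value i.
    hist = np.bincount(image.ravel(), minlength=levels).astype(np.float64)
    prob = hist / hist.sum()                  # n_i / (M * N)
    omega1 = np.cumsum(prob)                  # weight of the class with values <= t
    omega2 = 1.0 - omega1                     # weight of the class with values > t
    cum_mean = np.cumsum(prob * np.arange(levels))
    total_mean = cum_mean[-1]
    # Class means; empty classes produce NaN/inf, which are zeroed below.
    with np.errstate(divide="ignore", invalid="ignore"):
        mu1 = cum_mean / omega1
        mu2 = (total_mean - cum_mean) / omega2
        sigma_b = omega1 * omega2 * (mu1 - mu2) ** 2
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))


def binarize(image: np.ndarray, threshold: int) -> np.ndarray:
    """Mark pixels above the threshold as foreground (1), the rest as background (0)."""
    # Assumption: the target is the brighter class.
    return (image > threshold).astype(np.uint8)
```

For a toy image with a bright square on a dark background, `binarize(image, otsu_threshold(image))` marks exactly the square as foreground. On real SAR slices, speckle usually leaves small isolated foreground blobs, which is why the contour selection in the next step is needed.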
[0062] The specific steps for extracting the target saliency map are as follows: A morphological closure operation is performed on the binary image using a combination of dilation and erosion operations, with the following formula:
[0063] $B_{closed} = (B \cup B_{kernel}) \cap B_{kernel}$
[0064] In the formula, $B$ is the binary image, $\cup$ represents the dilation operation, $\cap$ represents the erosion operation, and $B_{kernel}$ is the kernel of the morphological operations; contours are found in the binary image using image processing techniques, and the largest contour, which corresponds to the image target, is taken as the target contour; a mask is created from the target contour and used to extract the corresponding target region from the original SAR image as the target saliency map.
[0065] Specifically, morphological closure operations are used to process the binary image, filling in small holes and breaks within the target to make the target more complete. The morphological closure operation is a combination of dilation and erosion operations. Then, image processing techniques, such as connected component analysis, are used to find the target contour in the binary image. The contour is usually the boundary of the target and can be used to describe the shape and size of the target. Among all detected contours, the largest contour is taken as the target contour, which corresponds to the target in the image. A mask is created using the target contour, which can be used to extract the target region from the original SAR image to obtain the target saliency map.
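The closure and mask steps can be sketched as follows, under stated assumptions: a square k×k structuring element, and the largest 4-connected foreground component used as a stand-in for "the largest contour" (a filled contour would also cover interior holes that survive the closure, which this simplification does not). All function names are hypothetical.

```python
from collections import deque

import numpy as np


def _window_filter(binary, k, reducer, pad_value):
    """Apply a k x k max (dilation) or min (erosion) filter; k is assumed odd."""
    pad = k // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=pad_value)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return reducer(windows, axis=(-2, -1))


def closing(binary, k=3):
    """Morphological closure: dilation followed by erosion with the same kernel."""
    dilated = _window_filter(binary, k, np.max, 0)
    return _window_filter(dilated, k, np.min, 1)


def largest_component(binary):
    """Return a 0/1 mask of the largest 4-connected foreground region."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    best_label, best_size, current = 0, 0, 0
    for r0, c0 in zip(*np.nonzero(binary)):
        if labels[r0, c0]:
            continue                          # already visited
        current += 1
        labels[r0, c0] = current
        queue, size = deque([(r0, c0)]), 0
        while queue:                          # breadth-first flood fill
            r, c = queue.popleft()
            size += 1
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < h and 0 <= nc < w and binary[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = current
                    queue.append((nr, nc))
        if size > best_size:
            best_label, best_size = current, size
    if best_size == 0:
        return np.zeros_like(binary)
    return (labels == best_label).astype(binary.dtype)


def saliency_map(image, binary, k=3):
    """Keep only the pixels of the original image that lie inside the target mask."""
    mask = largest_component(closing(binary, k))
    return image * mask
```

In this sketch a one-pixel hole inside the target is filled by the closure, while an isolated speckle blob elsewhere survives the closure but is discarded when only the largest region is kept.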
[0066] Step S102: Extract deep and shallow features from the original SAR image based on the target saliency map, and guide the fusion of deep and shallow features.
[0067] Specifically, deep and shallow features are extracted from the original SAR image based on the target saliency map. Deep features are those with target structural characteristics, while shallow features are those reflecting the contour. Guided by the target saliency map, the deep and shallow features are fused, introducing shallow features with contour information. This helps to improve the contextual information of the target slice and guides the network to learn key features through the inherent structural features of the target, thereby improving the intra-class consistency of target features.
[0068] Step S102 includes: extracting deep features and shallow features from the original SAR image based on the target saliency map; let $S$ denote the target saliency map and $F_{shallow}$ the shallow features. By downsampling the target saliency map, an adjusted saliency map $S'$ of the same size as the shallow features is obtained, i.e.:
[0069] $S' = \mathrm{down}(S)$
[0070] The adjusted saliency map is multiplied pixel by pixel with the shallow features to obtain the corrected feature map $F'_{shallow}$, i.e.:
[0071] $F'_{shallow} = F_{shallow} \cdot S'$
[0072] The shallow features are added pixel by pixel to the corrected feature map to obtain the final shallow features, i.e.:
[0073] $F_{shallow\text{-}final} = F_{shallow} + F'_{shallow}$
[0074] The final deep features are obtained using the same method, i.e.:
[0075] $F_{deep\text{-}final} = F_{deep} + F_{deep} \cdot \mathrm{down}(S)$
[0076] The final shallow features and the final deep features are then fused to obtain the fused image features, i.e.:
[0077] $F_{fusion} = \mathrm{concat}(F_{deep\text{-}final}, \mathrm{down}(F_{shallow\text{-}final}))$
[0078] In the formula, concat(·) denotes concatenation: the final shallow features are first downsampled and then concatenated with the final deep features.
[0079] Specifically, after deep and shallow features are extracted from the original SAR image based on the target saliency map, deep and shallow feature fusion is performed as shown in Figure 3. The input image is denoted as X, and the target saliency map is the same size as the input image. The target saliency map is downsampled to the same size as the shallow features to obtain an adjusted saliency map. This adjusted saliency map is then multiplied pixel by pixel with the shallow features to obtain a corrected feature map. This corrected feature map is then added pixel by pixel to the shallow features to obtain the final shallow features. The same method is used to process the deep features to obtain the final deep features. The final shallow features and the final deep features are then fused to obtain the fused image features. Guided by the target saliency map, the representational ability of deep and shallow features is improved, thereby achieving efficient fusion of deep and shallow features by combining contextual information.
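The fusion arithmetic of step S102 can be sketched in NumPy as follows. The downsampling operator down(·) is not specified in this description, so average pooling with integer factors is assumed. Feature maps are assumed to be channel-first arrays of shape (C, h, w), and concatenation is taken along the channel axis. The names `down`, `guide` and `fuse` are hypothetical.

```python
import numpy as np


def down(x, size):
    """Average-pool the last two axes of x down to size = (h, w).

    Assumes the input height and width are integer multiples of the target size.
    """
    h, w = size
    fh, fw = x.shape[-2] // h, x.shape[-1] // w
    lead = x.shape[:-2]
    # Split each spatial axis into (output index, position inside the pooling cell).
    return x.reshape(*lead, h, fh, w, fw).mean(axis=(-3, -1))


def guide(features, saliency):
    """Compute F + F * down(S); S is broadcast over the channel axis."""
    s = down(saliency, features.shape[-2:])
    return features + features * s


def fuse(f_shallow, f_deep, saliency):
    """Guide both feature maps, downsample the shallow one, then concatenate."""
    shallow_final = guide(f_shallow, saliency)
    deep_final = guide(f_deep, saliency)
    shallow_small = down(shallow_final, deep_final.shape[-2:])
    return np.concatenate([deep_final, shallow_small], axis=0)
```

With a saliency map scaled to [0, 1], responses inside the target region are at most doubled while background responses are left unchanged, so the residual form F + F·S' emphasises the target without discarding context.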
[0080] Step S103: The fused image features are refined using multi-layer dilated convolution to obtain target features. The target features are then used for type recognition through a fully connected layer to obtain the target recognition result.
[0081] Specifically, multi-layer dilated convolution is used to refine the fused features to obtain target features, which expands the receptive field, enriches the contextual information of the target, and enhances the representation ability of key target features. Finally, type recognition is performed on the target features through a fully connected layer to obtain the target recognition result, thereby achieving accurate identification of target slices.
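The receptive-field expansion can be made concrete with the standard relation between kernel size $k$, dilation rate $d$ and effective kernel extent. The kernel size is not stated in this description, so $k = 3$ is an assumption; the rates 1, 2 and 3 are those given for this embodiment in [0089].

```latex
k_{\mathrm{eff}} = k + (k - 1)(d - 1)
\qquad\Rightarrow\qquad
k = 3:\quad d = 1 \mapsto 3,\quad d = 2 \mapsto 5,\quad d = 3 \mapsto 7
```

Under that assumption the three branches cover 3×3, 5×5 and 7×7 neighbourhoods with the same number of weights per branch.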
[0082] Step S103 includes: refining the fused image features using dilated convolutions, and then further refining the result using a 1×1 convolution to obtain the target features, represented as:
[0083]-[0085] $F_{final} = \mathrm{conv}_{1\times1}[\mathrm{concat}(\mathrm{conv\text{-}atrous}_{1\times1}(F_{fusion}), \mathrm{conv\text{-}atrous}_{2\times2}(F_{fusion}), \mathrm{conv\text{-}atrous}_{3\times3}(F_{fusion}))]$
[0086] The target features are passed through a fully connected layer to obtain the target recognition result, i.e.:
[0087] $C = FC(F_{final})$
[0088] In the formula, C represents the identified target category, and FC represents the fully connected layer.
[0089] Specifically, dilated convolution is used to refine the fused image features. The dilation rate can be set to 1, 2, or 3. Then, 1×1 convolution is used to further refine the fused features to obtain target features, making the target features more stable. By using convolution with different dilation rates, features with different receptive fields are obtained, which can effectively model the size changes related to the target texture. Finally, a fully connected layer is used to output the target recognition result, obtain the target classification, and achieve accurate target recognition.
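The forward pass of step S103 can be sketched in plain NumPy. This is only an illustration under assumptions: each dilated branch uses a 3×3 kernel with zero padding that preserves the spatial size, the branches use dilation rates 1, 2 and 3, no activation or normalisation layers are modelled, and the class is taken as the arg-max of the fully connected output. The weights are placeholders that a trained network would supply, and all function names are hypothetical.

```python
import numpy as np


def atrous_conv(x, weight, rate):
    """3x3 dilated convolution with zero padding that preserves H and W.

    x: (C_in, H, W); weight: (C_out, C_in, 3, 3); rate: dilation rate.
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (rate, rate), (rate, rate)))
    out = np.zeros((weight.shape[0], h, w))
    for i in range(3):
        for j in range(3):
            # Kernel tap (i, j) reads the input shifted by ((i - 1) * rate, (j - 1) * rate).
            patch = padded[:, i * rate:i * rate + h, j * rate:j * rate + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], patch)
    return out


def refine(f_fusion, branch_weights, w_1x1, rates=(1, 2, 3)):
    """Run the dilated branches, concatenate along channels, then mix with a 1x1 conv."""
    branches = [atrous_conv(f_fusion, wt, r) for wt, r in zip(branch_weights, rates)]
    stacked = np.concatenate(branches, axis=0)
    return np.einsum("oc,chw->ohw", w_1x1, stacked)   # 1x1 convolution


def classify(f_final, w_fc, b_fc):
    """Fully connected layer over the flattened features; returns the class index."""
    logits = w_fc @ f_final.ravel() + b_fc
    return int(np.argmax(logits))
```

Here `branch_weights` is a list of three arrays of shape (C_out, C_in, 3, 3), `w_1x1` has shape (C_final, 3 · C_out), and `w_fc` has shape (num_classes, C_final · H · W).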
[0090] In this embodiment, a target saliency map is generated from the original SAR image, containing key target information. Deep and shallow features of the original SAR image are extracted based on the target saliency map, and these features are fused. Shallow features help refine the contextual information of the target slice, while the target saliency map guides the network to learn key features from the target's inherent structural features, improving intra-class consistency of target features. Multi-layer dilated convolutions are used to refine the fused image features, yielding target features. Fully connected layers are then used to perform type recognition on these target features, resulting in target recognition. Convolutions with different dilation rates acquire features with different receptive fields, effectively modeling size variations related to target texture, avoiding noise interference from the background image, and enabling efficient mining of basic target features, thus improving the accuracy and efficiency of SAR image target recognition.
[0091] It is obvious to those skilled in the art that the modules or steps of the present invention described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented using computer-executable program code, thereby storing them in a computer storage medium (ROM / RAM, magnetic disk, optical disk) for execution by the computing device. In some cases, the steps shown or described can be performed in a different order than those described herein, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Therefore, the present invention is not limited to any particular hardware and software combination.
[0092] The above description, in conjunction with specific embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such deductions or substitutions should be considered within the scope of protection of the present invention.
Claims
1. A SAR image target recognition method based on saliency map guidance, characterized in that it includes the following steps: generating a corresponding target saliency map based on the original SAR image, wherein the target saliency map contains key target information, including: obtaining regions containing target information in the original SAR image based on semantic segmentation, and marking target regions and background regions; using morphological closure operation to find the maximum contour of the target region as the target contour, and creating a mask based on the target contour; and extracting the target saliency map from the original SAR image through the mask; wherein the step of marking the target region and the background region includes: calculating the intensity histogram of the original SAR image, using the formula: $H(i) = n_i$, where $i$ represents the pixel intensity and $n_i$ represents the number of pixels with a pixel value of $i$; normalizing the intensity histogram to obtain the cumulative intensity distribution function of the image, as shown in the formula: $P(i) = \frac{1}{MN} \sum_{j=0}^{i} n_j$, where $MN$ is the total number of pixels in the image; determining the segmentation threshold $T$ based on the cumulative intensity distribution function, using the following formula: $T = \arg\max_{t} \sigma_B^2(t)$, where $\sigma_B^2(t)$, the inter-class variance at the threshold $t$, is: $\sigma_B^2(t) = \omega_1(t)\,\omega_2(t)\,[\mu_1(t) - \mu_2(t)]^2$, where $\omega_1(t)$ and $\omega_2(t)$ are the normalized weights of the segmented target region and background region, and $\mu_1(t)$ and $\mu_2(t)$ are the average gray values of the segmented target region and background region; and converting the original SAR image into a binary image according to the segmentation threshold, wherein the target region is marked as the foreground and the background region is marked as the background; extracting deep and shallow features from the original SAR image based on the target saliency map, and guiding the fusion of the deep and shallow features; refining the fused image features using multi-layer dilated convolution to obtain target features; and performing type recognition on the target features through a fully connected layer to obtain the target recognition result.
2. The SAR image target recognition method based on saliency map guidance according to claim 1, characterized in that the step of using morphological closure operation to find the maximum contour of the target region as the target contour, creating a mask based on the target contour, and extracting the target saliency map from the original SAR image through the mask includes: performing a morphological closure operation on the binary image using a combination of dilation and erosion operations, as shown in the formula: $B_{closed} = (B \cup B_{kernel}) \cap B_{kernel}$, where $B$ is the binary image, $\cup$ represents the dilation operation, $\cap$ represents the erosion operation, and $B_{kernel}$ is the kernel of the morphological operations; finding contours in the binary image using image processing techniques, and taking the largest contour, which corresponds to the image target, as the target contour; and creating a mask using the target contour, the mask being used to extract the corresponding target region from the original SAR image as the target saliency map.
3. The SAR image target recognition method based on saliency map guidance according to claim 1, characterized in that the step of extracting deep and shallow features from the original SAR image based on the target saliency map, and guiding the fusion of the deep and shallow features, includes: extracting deep and shallow features from the original SAR image based on the target saliency map; letting the target saliency map be $S$ and the shallow features be $F_{shallow}$, downsampling the target saliency map to the same size as the shallow features to obtain the adjusted saliency map $S'$, i.e.: $S' = \mathrm{down}(S)$; multiplying the adjusted saliency map pixel by pixel with the shallow features to obtain the corrected feature map $F'_{shallow}$, i.e.: $F'_{shallow} = F_{shallow} \cdot S'$; adding the shallow features pixel by pixel to the corrected feature map to obtain the final shallow features, i.e.: $F_{shallow\text{-}final} = F_{shallow} + F'_{shallow}$; obtaining the final deep features using the same method, i.e.: $F_{deep\text{-}final} = F_{deep} + F_{deep} \cdot \mathrm{down}(S)$; and fusing the final shallow features and the final deep features to obtain the fused image features, i.e.: $F_{fusion} = \mathrm{concat}(F_{deep\text{-}final}, \mathrm{down}(F_{shallow\text{-}final}))$, where concat(·) denotes concatenation, in which the final shallow features are first downsampled and then concatenated with the final deep features.
4. The SAR image target recognition method based on saliency map guidance according to claim 3, characterized in that the step of refining the fused image features using multi-layer dilated convolution to obtain target features, and then performing type recognition on the target features through a fully connected layer to obtain the target recognition result, includes: refining the fused image features using dilated convolutions, and then further refining the result using a 1×1 convolution to obtain the target features, represented as: $F_{final} = \mathrm{conv}_{1\times1}[\mathrm{concat}(\mathrm{conv\text{-}atrous}_{1\times1}(F_{fusion}), \mathrm{conv\text{-}atrous}_{2\times2}(F_{fusion}), \mathrm{conv\text{-}atrous}_{3\times3}(F_{fusion}))]$; and passing the target features through a fully connected layer to obtain the target recognition result, i.e.: $C = FC(F_{final})$, where $C$ represents the identified target category, and $FC$ represents the fully connected layer.