A SAR ship detection method, device, program and storage medium based on spatially constrained self-supervised learning and adaptive slice inference

By employing a spatially constrained self-supervised learning and adaptive slice inference approach, the problems of scarce labeled data, domain differences, and inference efficiency in SAR image ship target detection are solved, achieving efficient and robust SAR ship detection and improving detection performance and resource utilization.

CN122244701A · Pending · Publication Date: 2026-06-19 · HARBIN ENG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
HARBIN ENG UNIV
Filing Date
2026-03-05
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Current methods for ship target detection in SAR images face challenges such as scarce labeled data, domain differences, poor cross-domain transfer performance, difficulty in detecting small targets, and bottlenecks in inference efficiency. Traditional methods consume huge computational resources and cannot meet real-time requirements.

Method used

We employ a spatially constrained self-supervised learning and adaptive slice inference approach. A detection network is constructed and pre-trained with self-supervised contrastive learning on large-scale unlabeled SAR data. At inference time, a pixel-level uncertainty heatmap and adaptive slicing are combined to focus on the regions that need attention, re-detect them, and fuse the results.

Benefits of technology

It improves detection performance and efficiency, reduces computing resource requirements, achieves efficient and robust SAR ship detection, is suitable for ordinary workstations, and improves detection recall and system throughput.

Generated by Eureka AI based on patent content.


Abstract

This invention discloses a SAR ship detection method, apparatus, program, and storage medium based on spatially constrained self-supervised learning and adaptive slice inference, belonging to the fields of remote sensing image processing and computer vision. It addresses the scarcity of labeled data, the high false negative rate for small targets, and the large inference redundancy in SAR ship detection. The method comprises a spatially constrained self-supervised contrastive learning (SC2L) pre-training method and a heatmap-driven adaptive slice-assisted inference (HASI) mechanism. In SC2L, positive and negative sample pairs are constructed around anchor image patches, and unsupervised feature learning is performed with an improved multi-positive-sample InfoNCE loss, which reduces dependence on labeled data and keeps training lightweight and efficient. In the inference stage, the HASI mechanism constructs a heatmap from the initial full-image detection results and dynamically selects high-uncertainty regions for re-detection, improving the recall of small targets while significantly reducing computational redundancy. The invention provides an efficient solution for lightweight, high-precision SAR ship detection with good deployment flexibility.

Description

Technical Field

[0001] This invention relates to the fields of remote sensing image processing and computer vision technology, and in particular to a SAR ship detection method, apparatus, program and storage medium based on spatially constrained self-supervised learning and adaptive slice inference.

Background Technology

[0002] Synthetic aperture radar (SAR) is of irreplaceable value in marine monitoring because of its all-weather, day-and-night imaging capability. However, ship detection in SAR images faces several technical bottlenecks. The first is the scarcity of labeled data: existing public labeled datasets such as SSDD are small (only 1,160 images), so fully supervised models are prone to overfitting and generalize poorly. The second is the large domain gap, which limits transfer performance. A common approach is to initialize from models pre-trained on optical images, but the speckle noise, weak-contrast targets, and distinctive backscattering mechanism of SAR images make such cross-domain transfer perform poorly. The third is small-target detection: ships in SAR images often appear as tiny, sparse, low-contrast pixel regions, so conventional detectors that run a single inference pass over the full image tend to miss targets in complex sea conditions or crowded port areas. The fourth is inference efficiency. To improve recall, existing slice-assisted inference methods (such as SAHI) use a uniform overlapping slicing strategy. Although this magnifies local detail, it repeatedly processes large numbers of empty regions or simple backgrounds, which introduces significant redundant overhead and makes it difficult to meet the real-time requirements of large-scale scenes.

Although self-supervised learning (SSL) offers a new way to reduce label dependence, mainstream methods still have limitations: masked autoencoder (MAE) methods tend to restore the dominant background structure during reconstruction, which may "flatten" weak ship signals, while contrastive learning methods (such as SimCLR and MoCo) usually rely on very large batches or complex momentum queue mechanisms, which consume large amounts of computing resources and hinder lightweight deployment. Therefore, a comprehensive solution is urgently needed that can efficiently use massive unlabeled SAR data for feature pre-training and, at inference time, allocate computation intelligently and focus accurately on difficult regions.

Summary of the Invention

[0003] The purpose of this invention is to propose a SAR ship detection method, device, program, and storage medium based on spatially constrained self-supervised learning and adaptive slice inference, so as to overcome the shortcomings of the prior art and provide an efficient, robust, and easy-to-use SAR ship detection method.

[0004] This invention proposes a SAR ship detection method, device, program, and storage medium based on spatially constrained self-supervised learning and adaptive slice inference, the core of which includes the following technical solutions:

[0005] A SAR ship detection method based on spatially constrained self-supervised learning and adaptive slice inference includes the following steps:

[0006] Step 1: Obtain a large-scale unlabeled SAR image dataset and the SAR image to be detected; construct a detection network and train it using spatially constrained self-supervised contrastive learning to obtain a trained detection network with strong spatial continuity and discriminability.

[0007] Step 2: Input the SAR image to be detected into the trained detection network to perform preliminary full-image detection and obtain a preliminary detection result set.

[0008] Step 3: Based on the preliminary detection result set, construct a pixel-level uncertainty heatmap with the same size as the SAR image to be detected, and initialize it as an all-zero matrix. For each detection box, update the heatmap values within its spatial support window to obtain the final uncertainty heatmap.

[0009] Step 4: Perform peak selection and slice extraction on the final uncertainty heatmap to obtain the coordinates of the selected peak points and the image slice set.

[0010] Step 5: Input the image slice set into the trained detection network, and perform independent re-detection on each slice in the image slice set to obtain a re-detection result set.

[0011] Step 6: Merge the preliminary detection result set and the re-detection result set, and perform overlapping-box deduplication and confidence score adjustment to obtain the final detection result set.

[0012] Furthermore, the spatially constrained self-supervised contrastive learning training method described in step 1 specifically includes:

[0013] Step 1.1: Select an image from the large-scale unlabeled SAR image dataset, divide it into multiple non-overlapping image patches of the same size, and combine them into an image patch grid.

[0014] Step 1.2: Based on the image patch grid, randomly select an image patch as the anchor image patch to construct a set of contrastive learning sample pairs containing a set of positive samples and a set of negative samples.

[0015] Step 1.3: Perform feature embedding extraction on the contrastive learning sample pairs and the anchor image patch to obtain the feature embedding vector of the anchor image patch, the set of positive sample feature embedding vectors, and the set of negative sample feature embedding vectors.

[0016] Step 1.4: Using the feature embedding vector of the anchor image patch together with the positive and negative sample feature embedding vector sets, introduce a loss function and calculate the loss value:

[0017] [Equation: multi-positive-sample InfoNCE loss; formula not reproduced in this text]

[0018] where the formula is defined in terms of the following quantities: the cosine similarity function; the exponential function; the feature embedding vector of the anchor image patch; the set of positive sample feature embedding vectors and the feature embedding vector of each individual positive sample in it; the set of negative sample feature embedding vectors and the feature embedding vector of each individual negative sample in it; the number of positive samples and the number of negative samples; and the temperature coefficient.

[0019] Step 1.5: Determine whether the model has converged based on the loss value. If it has, complete the training; otherwise, update the model parameters and return to step 1.1.

[0020] Further, the positive sample set in step 1.2 consists of the image patches in the four-connected neighborhood of the anchor image patch on the image patch grid; the negative sample set consists of several image patches randomly sampled from outside the local neighborhood of the anchor image patch.

[0021] Furthermore, the method for updating the heatmap values in step 3 specifically includes:

[0022] [Equation: heatmap update; formula not reproduced in this text]

[0023] where the formula is defined in terms of the following quantities: the cumulative uncertainty value of the heatmap at a given pixel position; the confidence score of each detection box; the Gaussian spatial neighborhood weight function; the center point coordinates of each detection box; the squared Euclidean distance from the pixel position to the center point of each detection box; the side length parameter of the re-detection slice; the scale weighting factor; and the area of each detection box.

[0024] Furthermore, the method for deduplication of overlapping boxes and adjustment of confidence scores in step 6 specifically includes:

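[0025] [Equation: confidence score adjustment; formula not reproduced in this text]

[0026] where the formula is defined in terms of the following quantities: the intersection over union (IoU) of two detection boxes; the attenuation coefficient; and the confidence score of each detection box.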

[0027] A computer device includes a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the method described above.

[0028] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described method.

[0029] A computer program product includes computer instructions that, when executed by a processor, implement the steps of the method described above.

[0030] The beneficial effects of this invention are as follows. In terms of pre-training efficiency, the SC2L strategy removes the reliance on large batches or momentum queues, significantly reducing GPU memory requirements and training time and making efficient pre-training possible on ordinary workstations. In terms of detection performance, on the challenging SSDD dataset the invention improves on conventionally supervised models and on mainstream self-supervised baselines such as MoCo and BYOL, particularly on the small-object metric (APs). In terms of inference, the HASI mechanism guides re-detection toward the regions identified by the uncertainty heatmap, avoiding the large redundant computation of a uniform slicing strategy; with only a small increase in latency, its recall surpasses that of traditional uniform slicing schemes such as SAHI. In terms of deployment flexibility, HASI is a plug-and-play inference wrapper that is compatible with various existing detector architectures and requires no changes to the model structure and no retraining. In terms of resource utilization, experimental data show that only 5 selected adaptive slices achieve the same recall as the traditional 9-slice uniform strategy, saving computing resources and improving system throughput.

Attached Figure Description

[0031] Figure 1 is a flowchart of the present invention.

[0032] Figure 2 is a flowchart of the inference and slicing process of the present invention.

Detailed Implementation

[0033] Referring to Figure 1, a SAR ship detection method based on spatially constrained self-supervised learning and adaptive slice inference includes the following steps:

[0034] Step 1: Obtain a large-scale unlabeled SAR image dataset and the SAR image to be detected; construct a detection network and train it using spatially constrained self-supervised contrastive learning to obtain a trained detection network with strong spatial continuity and discriminability.

[0035] The spatially constrained self-supervised contrastive learning training method specifically includes:

[0036] Step 1.1: Select an image from the large-scale unlabeled SAR image dataset, divide it into multiple non-overlapping image patches of the same size, and combine them into an image patch grid.

[0037] Step 1.2: Based on the image patch grid, randomly select an image patch as the anchor image patch to construct a set of contrastive learning sample pairs containing a set of positive samples and a set of negative samples.

[0038] The positive sample set consists of the image patches in the four-connected neighborhood of the anchor image patch on the image patch grid; the negative sample set consists of several image patches randomly sampled from outside the local neighborhood of the anchor image patch.
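Steps 1.1 and 1.2 can be illustrated with a minimal sketch. The function names, the 2048×2048 scene size, and the reading of "outside the local neighborhood" as a Chebyshev distance greater than `exclusion_radius` are assumptions made for illustration only; the patent text does not specify them.

```python
import random

def build_patch_grid(height, width, patch):
    """Return grid[r][c] = (x0, y0, x1, y1) for non-overlapping square patches."""
    rows, cols = height // patch, width // patch
    return [[(c * patch, r * patch, (c + 1) * patch, (r + 1) * patch)
             for c in range(cols)] for r in range(rows)]

def sample_contrastive_indices(rows, cols, num_negatives, rng, exclusion_radius=1):
    """Pick an anchor cell, its 4-connected positives, and negatives far from it."""
    ar, ac = rng.randrange(rows), rng.randrange(cols)
    # Positives: up/down/left/right neighbours that fall inside the grid.
    positives = [(ar + dr, ac + dc)
                 for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                 if 0 <= ar + dr < rows and 0 <= ac + dc < cols]
    # Assumption: "outside the local neighborhood" means Chebyshev distance
    # from the anchor greater than exclusion_radius.
    candidates = [(r, c) for r in range(rows) for c in range(cols)
                  if max(abs(r - ar), abs(c - ac)) > exclusion_radius]
    negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
    return (ar, ac), positives, negatives

# Hypothetical 2048x2048 scene cut into 256x256 patches gives an 8x8 grid.
grid = build_patch_grid(2048, 2048, 256)
anchor, positives, negatives = sample_contrastive_indices(8, 8, 16, random.Random(0))
```

Because positives come from the same image as the anchor, no second augmented view or momentum queue is needed to form pairs, which is consistent with the lightweight training the document describes.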

[0039] Step 1.3: Perform feature embedding extraction on the contrastive learning sample pairs and the anchor image patch to obtain the feature embedding vector of the anchor image patch, the set of positive sample feature embedding vectors, and the set of negative sample feature embedding vectors.

[0040] Step 1.4: Using the feature embedding vector of the anchor image patch together with the positive and negative sample feature embedding vector sets, introduce a loss function and calculate the loss value:

[0041] [Equation: multi-positive-sample InfoNCE loss; formula not reproduced in this text]

[0042] where the formula is defined in terms of the following quantities: the cosine similarity function; the exponential function; the feature embedding vector of the anchor image patch; the set of positive sample feature embedding vectors and the feature embedding vector of each individual positive sample in it; the set of negative sample feature embedding vectors and the feature embedding vector of each individual negative sample in it; the number of positive samples and the number of negative samples; and the temperature coefficient.
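Since the loss formula itself is not available in this text, the following is only one common multi-positive InfoNCE form built from the listed ingredients (cosine similarity, exponential, temperature, positive and negative sets). It is a sketch under that assumption and not necessarily the patent's exact expression: for each positive, the negative log-ratio of its exponentiated similarity to that value plus the sum over negatives is computed, and the results are averaged over the positives. The function names and the default temperature are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def multi_positive_info_nce(anchor, positives, negatives, temperature=0.1):
    """Assumed form: mean over positives of -log(e_pos / (e_pos + sum of e_neg))."""
    neg_sum = sum(math.exp(cosine_similarity(anchor, n) / temperature)
                  for n in negatives)
    total = 0.0
    for p in positives:
        pos_term = math.exp(cosine_similarity(anchor, p) / temperature)
        total += -math.log(pos_term / (pos_term + neg_sum))
    return total / len(positives)

# Toy 2-D embeddings: positives aligned with the anchor give a small loss.
anchor = [1.0, 0.0]
loss_aligned = multi_positive_info_nce(
    anchor, [[1.0, 0.1], [0.9, 0.0]], [[-1.0, 0.0], [0.0, 1.0]])
loss_misaligned = multi_positive_info_nce(
    anchor, [[-1.0, 0.1], [0.0, 1.0]], [[1.0, 0.0], [0.9, 0.1]])
```

A lower temperature sharpens the contrast between similar and dissimilar patches; the value 0.1 here is a placeholder, not a setting taken from the document.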

[0043] Step 1.5: Determine whether the model has converged based on the loss value. If it has, complete the training; otherwise, update the model parameters and return to step 1.1.

[0044] Step 2: Input the SAR image to be detected into the trained detection network to perform preliminary full-image detection and obtain a preliminary detection result set.

[0045] Step 3: Based on the preliminary detection result set, construct a pixel-level uncertainty heatmap with the same size as the SAR image to be detected, and initialize it as an all-zero matrix. For each detection box, update the heatmap values within its spatial support window to obtain the final uncertainty heatmap.

[0046] The method for updating heatmap values specifically includes:

[0047] [Equation: heatmap update; formula not reproduced in this text]

[0048] where the formula is defined in terms of the following quantities: the cumulative uncertainty value of the heatmap at a given pixel position; the confidence score of each detection box; the Gaussian spatial neighborhood weight function; the center point coordinates of each detection box; the squared Euclidean distance from the pixel position to the center point of each detection box; the side length parameter of the re-detection slice; the scale weighting factor; and the area of each detection box.
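The heatmap update can be sketched as follows. The exact formula is not available in this text, so everything about how the listed quantities are combined is an assumption: uncertainty is taken as one minus the confidence score, the Gaussian width is tied to a quarter of the slice side length, the scale weighting boosts small boxes through the inverse square root of their area, and the support window is a square of the slice side length centered on the box. All names are hypothetical.

```python
import math

def build_uncertainty_heatmap(height, width, detections, slice_side, scale_factor=1.0):
    """detections: list of (x0, y0, x1, y1, score). Returns a height x width grid."""
    heat = [[0.0] * width for _ in range(height)]  # all-zero initialization
    sigma = slice_side / 4.0   # assumption: Gaussian width tied to slice side
    half = slice_side // 2
    for x0, y0, x1, y1, score in detections:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        area = max((x1 - x0) * (y1 - y0), 1.0)
        # Assumption: low confidence and small area both raise uncertainty.
        weight = (1.0 - score) * (1.0 + scale_factor / math.sqrt(area))
        # Spatial support window: square of side ~slice_side around the center.
        for y in range(max(0, int(cy) - half), min(height, int(cy) + half + 1)):
            for x in range(max(0, int(cx) - half), min(width, int(cx) + half + 1)):
                d2 = (x - cx) ** 2 + (y - cy) ** 2  # squared Euclidean distance
                heat[y][x] += weight * math.exp(-d2 / (2.0 * sigma ** 2))
    return heat

# A small, low-confidence box should dominate a large, confident one.
heat = build_uncertainty_heatmap(
    64, 64, [(10, 10, 14, 14, 0.3), (40, 40, 56, 56, 0.95)], slice_side=16)
```

Restricting the update to a support window keeps the cost proportional to the number of detections rather than to the full image area for every box.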

[0049] Step 4: Perform peak selection and slice extraction on the final uncertainty heatmap to obtain the coordinates of the selected peak points and the image slice set.
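One way to realize Step 4 is a greedy loop: take the current heatmap maximum, cut a slice around it, zero out the covered area, and repeat. The patent text does not say how peaks are chosen, so the greedy suppression, the `min_value` stopping rule, and the clamping of slices to the image border are assumptions; the function name is hypothetical.

```python
def select_peaks_and_slices(heat, slice_side, max_peaks, min_value=1e-6):
    """Greedy peak picking on a 2-D list; returns (peaks, slices).

    peaks  : list of (x, y) peak coordinates
    slices : list of (x0, y0, x1, y1) boxes clamped to the image
    """
    height, width = len(heat), len(heat[0])
    work = [row[:] for row in heat]  # copy so the caller's heatmap is untouched
    peaks, slices = [], []
    for _ in range(max_peaks):
        value, px, py = max((work[y][x], x, y)
                            for y in range(height) for x in range(width))
        if value < min_value:
            break  # nothing uncertain enough is left
        # Center the slice on the peak, then shift it back inside the image.
        x0 = min(max(px - slice_side // 2, 0), max(width - slice_side, 0))
        y0 = min(max(py - slice_side // 2, 0), max(height - slice_side, 0))
        x1, y1 = min(x0 + slice_side, width), min(y0 + slice_side, height)
        peaks.append((px, py))
        slices.append((x0, y0, x1, y1))
        # Suppress the covered region so the next peak lands elsewhere.
        for y in range(y0, y1):
            for x in range(x0, x1):
                work[y][x] = 0.0
    return peaks, slices

# Toy heatmap: two separated hot spots plus one value next to the first.
toy = [[0.0] * 32 for _ in range(32)]
toy[5][6], toy[6][7], toy[20][25] = 0.9, 0.8, 0.5
peaks, slices = select_peaks_and_slices(toy, slice_side=8, max_peaks=5)
```

In this toy case the value next to the first hot spot is absorbed into the first slice, so only two slices are produced even though up to five were allowed.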

[0050] Step 5: Input the image slice set into the trained detection network, and perform independent re-detection on each slice in the image slice set to obtain a re-detection result set.
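Re-detection results are expressed in slice coordinates, so they have to be shifted back into the full-image frame before they can be merged with the preliminary results. The sketch below shows only that bookkeeping; `detector` is a hypothetical callable standing in for the trained network, and the image is modeled as a plain list of rows.

```python
def redetect_on_slices(image, slices, detector):
    """Run `detector` on each slice and map boxes back to full-image coordinates.

    image    : 2-D list of pixel rows
    slices   : list of (x0, y0, x1, y1)
    detector : hypothetical callable, crop -> list of (bx0, by0, bx1, by1, score)
    """
    results = []
    for x0, y0, x1, y1 in slices:
        crop = [row[x0:x1] for row in image[y0:y1]]
        for bx0, by0, bx1, by1, score in detector(crop):
            # Offset by the slice origin to return to the full-image frame.
            results.append((bx0 + x0, by0 + y0, bx1 + x0, by1 + y0, score))
    return results

def stub_detector(crop):
    """Stand-in detector that always reports one fixed box per crop."""
    return [(1, 1, 3, 3, 0.8)]

image = [[0] * 10 for _ in range(10)]
redetected = redetect_on_slices(image, [(2, 4, 6, 8)], stub_detector)
```

Because the wrapper only needs a callable that returns boxes, it does not depend on the detector's architecture, which matches the plug-and-play property the document attributes to HASI.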

[0051] Step 6: Merge the preliminary detection result set and the re-detection result set, and perform overlapping-box deduplication and confidence score adjustment to obtain the final detection result set.

[0052] The methods for deduplication of overlapping boxes and adjustment of confidence scores specifically include:

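[0053] [Equation: confidence score adjustment; formula not reproduced in this text]

[0054] where the formula is defined in terms of the following quantities: the intersection over union (IoU) of two detection boxes; the attenuation coefficient; and the confidence score of each detection box.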
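The fusion in Step 6 can be sketched as a Soft-NMS style loop over the merged results. The decay law is an assumption, since the formula is not available in this text: boxes that overlap the current best box by at least `iou_threshold` have their score multiplied by `exp(-decay * IoU^2)`, and boxes whose score falls below `score_floor` are dropped. The parameter names and the values of `decay` and `score_floor` are hypothetical.

```python
import math

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1, ...) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def fuse_detections(preliminary, redetected, iou_threshold=0.5,
                    decay=3.0, score_floor=0.05):
    """Merge both result sets, then decay the scores of overlapping boxes."""
    pool = [list(d) for d in preliminary + redetected]
    kept = []
    while pool:
        best = max(pool, key=lambda d: d[4])
        pool.remove(best)
        kept.append(tuple(best))
        for d in pool:
            overlap = iou(best, d)
            if overlap >= iou_threshold:
                # Assumed attenuation: exponential in the squared IoU.
                d[4] *= math.exp(-decay * overlap * overlap)
        pool = [d for d in pool if d[4] >= score_floor]
    return kept

preliminary = [(10, 10, 20, 20, 0.9)]
redetected = [(10, 10, 20, 20, 0.6), (50, 50, 60, 60, 0.7), (12, 10, 22, 20, 0.8)]
final = fuse_detections(preliminary, redetected)
```

In this example the exact duplicate is removed, the distant box is untouched, and the partially overlapping box survives with a reduced score, which is the behavior that distinguishes score decay from hard suppression.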

[0055] Example

[0056] Referring to Figure 1 and Figure 2, the input SAR images were uniformly resized to 512×512 pixels. In the pre-training phase, a large-scale unlabeled SAR dataset acquired by the Chinese Gaofen-3 (GF-3) satellite was used; the images were divided into non-overlapping 256×256 blocks, and the batch size was 64. The backbone network was preferably CSPDarknet53, and the downstream detector was YOLOv8s with a PAFPN feature pyramid network to strengthen multi-scale feature fusion. In the fine-tuning phase, only 20% of the labeled samples of the SSDD dataset were used for training, with the AdamW optimizer and an initial learning rate of [missing information], for a total of 200 training epochs to reach convergence. In the inference phase, the HASI re-detection slice size was set to a = 256 pixels, and a maximum number of selectable peaks was set. The final detections were fused using the Soft-NMS algorithm with an overlap threshold (NMS threshold) of 0.5. Performance was evaluated on the public benchmarks SSDD and LS-SSDD-V1.0. The experimental data show that this invention outperforms the original YOLOv8s, CSS-YOLO (which is optimized for small targets), and the self-supervised pre-training methods MoCo v3, BYOL, and SoftCon on key metrics such as mAP@0.5 and APs, demonstrating its advantages in making efficient use of limited labeled data and in improving detection robustness.
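For comparison with the uniform slicing baseline, the sketch below counts the slices a uniform overlapping grid produces on a 512×512 image with 256-pixel slices. The 50% overlap ratio is an assumption chosen because it yields a 3×3 grid of 9 slices; the document does not state which overlap the 9-slice baseline used. The function name is hypothetical, and `overlap_ratio` must be below 1.

```python
def uniform_slice_boxes(image_side, slice_side, overlap_ratio):
    """Boxes of a uniform overlapping grid over a square image."""
    stride = int(slice_side * (1.0 - overlap_ratio))
    origins = list(range(0, image_side - slice_side + 1, stride))
    # Make sure the last slice reaches the image border.
    if origins[-1] != image_side - slice_side:
        origins.append(image_side - slice_side)
    return [(x, y, x + slice_side, y + slice_side)
            for y in origins for x in origins]

uniform = uniform_slice_boxes(512, 256, overlap_ratio=0.5)
uniform_pixels = len(uniform) * 256 * 256   # pixels re-processed by the grid
adaptive_pixels = 5 * 256 * 256             # five adaptive slices of the same size
saving = 1.0 - adaptive_pixels / uniform_pixels
```

Under these assumptions, five adaptive slices re-process 5/9 of the pixels that the uniform grid does, a reduction of roughly 44% in re-detection work, before counting the single full-image pass that both approaches share.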

[0057] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A SAR ship detection method based on spatially constrained self-supervised learning and adaptive slice inference, characterized in that it includes the following steps: Step 1: Obtain a large-scale unlabeled SAR image dataset and the SAR image to be detected; construct a detection network and train it using spatially constrained self-supervised contrastive learning to obtain a trained detection network with strong spatial continuity and discriminability; Step 2: Input the SAR image to be detected into the trained detection network to perform preliminary full-image detection and obtain a preliminary detection result set; Step 3: Based on the preliminary detection result set, construct a pixel-level uncertainty heatmap with the same size as the SAR image to be detected, and initialize it as an all-zero matrix; for each detection box, update the heatmap values within its spatial support window to obtain the final uncertainty heatmap; Step 4: Perform peak selection and slice extraction on the final uncertainty heatmap to obtain the coordinates of the selected peak points and the image slice set; Step 5: Input the image slice set into the trained detection network, and perform independent re-detection on each slice in the image slice set to obtain a re-detection result set; Step 6: Merge the preliminary detection result set and the re-detection result set, and perform overlapping-box deduplication and confidence score adjustment to obtain the final detection result set.

2. The SAR ship detection method based on spatially constrained self-supervised learning and adaptive slice inference according to claim 1, characterized in that the spatially constrained self-supervised contrastive learning training method described in step 1 specifically includes: Step 1.1: Select an image from the large-scale unlabeled SAR image dataset, divide it into multiple non-overlapping image patches of the same size, and combine them into an image patch grid; Step 1.2: Based on the image patch grid, randomly select an image patch as the anchor image patch to construct a set of contrastive learning sample pairs containing a set of positive samples and a set of negative samples; Step 1.3: Perform feature embedding extraction on the contrastive learning sample pairs and the anchor image patch to obtain the feature embedding vector of the anchor image patch, the set of positive sample feature embedding vectors, and the set of negative sample feature embedding vectors; Step 1.4: Using the feature embedding vector of the anchor image patch together with the positive and negative sample feature embedding vector sets, introduce a loss function and calculate the loss value [Equation: multi-positive-sample InfoNCE loss; formula not reproduced in this text], where the formula is defined in terms of the following quantities: the cosine similarity function; the exponential function; the feature embedding vector of the anchor image patch; the set of positive sample feature embedding vectors and the feature embedding vector of each individual positive sample in it; the set of negative sample feature embedding vectors and the feature embedding vector of each individual negative sample in it; the number of positive samples and the number of negative samples; and the temperature coefficient; Step 1.5: Determine whether the model has converged based on the loss value. If it has, complete the training; otherwise, update the model parameters and return to step 1.1.

3. The SAR ship detection method based on spatially constrained self-supervised learning and adaptive slice inference according to claim 2, characterized in that, in step 1.2, the positive sample set consists of the image patches in the four-connected neighborhood of the anchor image patch on the image patch grid, and the negative sample set consists of several image patches randomly sampled from outside the local neighborhood of the anchor image patch.

4. The SAR ship detection method based on spatially constrained self-supervised learning and adaptive slice inference according to claim 1, characterized in that the method for updating heatmap values in step 3 specifically includes: [Equation: heatmap update; formula not reproduced in this text], where the formula is defined in terms of the following quantities: the cumulative uncertainty value of the heatmap at a given pixel position; the confidence score of each detection box; the Gaussian spatial neighborhood weight function; the center point coordinates of each detection box; the squared Euclidean distance from the pixel position to the center point of each detection box; the side length parameter of the re-detection slice; the scale weighting factor; and the area of each detection box.

5. The SAR ship detection method based on spatially constrained self-supervised learning and adaptive slice inference according to claim 1, characterized in that the method for deduplication of overlapping boxes and adjustment of confidence scores in step 6 specifically includes: [Equation: confidence score adjustment; formula not reproduced in this text], where the formula is defined in terms of the following quantities: the intersection over union (IoU) of two detection boxes; the attenuation coefficient; and the confidence score of each detection box.

6. A computer device comprising a memory, a processor, and a computer program stored in the memory, characterized in that the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 5.

7. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the steps of the method according to any one of claims 1 to 5.

8. A computer program product comprising computer instructions, characterized in that, when executed by a processor, the computer instructions implement the steps of the method according to any one of claims 1 to 5.