Remote sensing image feature interference ship detection method based on multi-scale dilated convolution attention
By employing a multi-scale dilated convolutional attention mechanism and feature fusion technology, the problem of accurate identification and efficient processing of ships in remote sensing image feature interference scenarios was solved, achieving high-precision and high-efficiency ship detection results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- DALIAN MARITIME UNIVERSITY
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-19
AI Technical Summary
Existing ship detection technologies in remote sensing image feature interference scenarios lack accurate identification and efficient processing capabilities. In particular, under problems such as the coexistence of multi-scale targets, changing viewpoints, target occlusion, and image blurring, the geometric shape of ships is severely distorted, and water interference leads to image degradation, resulting in serious confusion between targets and background.
A detection framework based on multi-scale dilated convolutional attention is constructed by adopting a multi-scale dilated convolutional attention mechanism, combining Haar wavelet transform, dual-branch pooling convolutional fusion module and spatial pyramid pooling enhancement layer aggregation network. By preprocessing and feature fusion of remote sensing images, the detection accuracy and stability are improved.
The method effectively removes high-frequency noise from images and mitigates low contrast, improving the accuracy and stability of ship detection. It enables efficient, high-precision ship detection while meeting the computing constraints of embedded devices in real-world environments.
Figure CN122244633A_ABST
Description
Technical Field
[0001] This invention relates to the field of computer vision technology, and more particularly to a method, based on multi-scale dilated convolutional attention, for detecting ships in remote sensing images subject to feature interference.
Background Technology
[0002] Ships are the core carriers of maritime activities such as ocean transport and maritime search and rescue, so accurate ship detection is of great significance. Although intelligent, modern ship detection technologies have achieved certain results, detection in scenarios with feature interference in remote sensing images still needs improvement. There is an urgent need for ship detection methods that identify ships accurately and process images efficiently under such interference, to assist maritime departments in real-time ship detection and improve work efficiency.
[0003] Ship detection using remote sensing image features is one of the most cutting-edge research directions in computer vision, attracting considerable attention from scholars in recent years. With the rise of deep learning technology, many mainstream detection models have been applied to ship detection tasks using remote sensing image features. Although mainstream detection models such as YOLOv6 and SSD have demonstrated powerful target detection capabilities, breakthroughs are still needed for ship detection in remote sensing scenarios.
[0004] In real-world remote sensing scenarios, ship targets vary widely in size, with multiple scales coexisting and a high proportion of small targets. Coupled with the variable viewing angles, target occlusion, and image blurring inherent in remote sensing imaging, this easily leads to substantial distortion of ship geometry. Furthermore, the selective absorption of light by water and atmospheric scattering cause image degradation such as color cast, reduced contrast, and blurred details. Additionally, some ships have a grayscale similar to that of the land background, which, combined with water turbidity, further exacerbates the confusion between target and background and weakens the effective representation of features.
Summary of the Invention
[0005] To address the aforementioned technical problems, the present invention proposes a method for detecting ships with feature interference in remote sensing images based on multi-scale dilated convolutional attention. First, a publicly available dataset is processed to establish a dataset of remote sensing images with feature interference problems. This dataset is then preprocessed and partitioned. A detection framework based on multi-scale dilated convolutional attention is constructed and trained on the preprocessed and partitioned dataset to obtain a remote sensing image feature interference ship detection model. The trained model is then deployed, its hyperparameters are adjusted, and it is used for the task of detecting ships with feature interference in remote sensing images.
[0006] A remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention includes:
[0007] S1. Establish a dataset of ship remote sensing images with feature interference problems;
S2. Preprocess and partition the ship remote sensing image dataset with feature interference problems;
S3. Construct a remote sensing image feature interference ship detection framework based on multi-scale dilated convolutional attention;
S4. Train the detection framework on the preprocessed and partitioned dataset to obtain a remote sensing image feature interference ship detection model;
S5. Deploy the remote sensing image feature interference ship detection model and adjust hyperparameters for the remote sensing image feature interference ship detection task.
[0008] Furthermore, in S1, the step of establishing a ship remote sensing image dataset with feature interference problems includes: using a locality-sensitive hashing algorithm to evaluate image duplication in the public dataset MAritime SATellite Imagery dataset version 2 (MASATI v2), and removing duplicate and invalidly labeled images.
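The duplicate screening described above can be sketched as follows. This is only an illustration under stated assumptions: the text names locality-sensitive hashing but does not say which image fingerprint is hashed, so an average hash over small grayscale thumbnails is assumed here, and the names `average_hash`, `find_near_duplicates`, `bands`, and `max_distance` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def average_hash(thumb):
    """Fingerprint a small grayscale thumbnail (2D list) as a tuple of bits."""
    pixels = [p for row in thumb for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return sum(x != y for x, y in zip(a, b))


def find_near_duplicates(thumbs, bands=4, max_distance=2):
    """Return sorted pairs of image names whose fingerprints nearly match.

    LSH banding: each fingerprint is cut into `bands` slices, and two images
    become a candidate pair only if at least one slice is identical. Candidates
    are then confirmed with the Hamming distance.
    """
    hashes = {name: average_hash(t) for name, t in thumbs.items()}
    buckets = defaultdict(set)
    for name, bits in hashes.items():
        band_len = len(bits) // bands
        for b in range(bands):
            key = (b, bits[b * band_len:(b + 1) * band_len])
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        for a, b in combinations(sorted(names), 2):
            if hamming(hashes[a], hashes[b]) <= max_distance:
                pairs.add((a, b))
    return sorted(pairs)


# Hypothetical 4x4 thumbnails: "b" is a slightly re-exposed copy of "a".
thumbs = {
    "a": [[0, 0, 255, 255]] * 4,
    "b": [[10, 5, 250, 240]] * 4,
    "c": [[255, 0, 255, 0]] * 4,
}
duplicates = find_near_duplicates(thumbs)
```

Banding avoids comparing every image with every other image, which matters once the dataset holds thousands of images; one image from each confirmed pair would then be dropped.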
[0009] Furthermore, in S2, the preprocessing and partitioning steps for the ship remote sensing image dataset with feature interference problems include: reading in the images from the dataset and using the Haar wavelet transform to enhance image details, generating image set A; then reading in the images from image set A, scaling them to a uniform size to generate the image set B to be trained, and randomly dividing image set B into a training set and a test set at a 7:3 ratio.
[0010] Furthermore, in S3, the step of constructing the remote sensing image feature interference ship detection framework based on multi-scale dilated convolutional attention includes: building an input layer to receive input remote sensing images; constructing a backbone network that uses a dual-branch pooling convolutional fusion module and a spatial pyramid pooling enhancement layer aggregation network to extract input image features; and constructing a neck network that uses a multi-scale dilated convolutional attention mechanism to fuse multi-scale features, with the fused features then fed into the detection head to generate detection results. Combining the networks constructed in the above steps yields the remote sensing image feature interference ship detection framework based on multi-scale dilated convolutional attention.
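A minimal sketch of a single-level 2D Haar transform may help make the preprocessing step concrete. The text does not specify how the wavelet coefficients are modified, so the coefficient handling below (zeroing small detail coefficients before reconstruction) is an assumption, as are the function names and the `threshold` parameter. Image height and width are assumed to be even.

```python
def haar_1d(v):
    """One Haar level: pairwise averages followed by pairwise half-differences."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    diff = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return avg + diff


def inverse_haar_1d(c):
    """Invert haar_1d: each (average, difference) pair restores two samples."""
    half = len(c) // 2
    out = []
    for a, d in zip(c[:half], c[half:]):
        out += [a + d, a - d]
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def haar_2d(img):
    """Transform rows, then columns; the top-left quadrant is the approximation."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(c) for c in transpose(rows)]
    return transpose(cols)


def inverse_haar_2d(coeffs):
    """Undo haar_2d: invert columns first, then rows."""
    cols = [inverse_haar_1d(c) for c in transpose(coeffs)]
    return [inverse_haar_1d(r) for r in transpose(cols)]


def suppress_small_details(img, threshold):
    """Assumed enhancement rule: drop detail coefficients below `threshold`."""
    coeffs = haar_2d(img)
    h, w = len(coeffs) // 2, len(coeffs[0]) // 2
    for y, row in enumerate(coeffs):
        for x, c in enumerate(row):
            in_approximation = y < h and x < w
            if not in_approximation and abs(c) < threshold:
                row[x] = 0.0
    return inverse_haar_2d(coeffs)


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
restored = inverse_haar_2d(haar_2d(image))
smoothed = suppress_small_details([[11, 10], [10, 10]], threshold=1)
```

With no coefficients modified the transform is exactly invertible, so any change to the image comes only from the rule applied to the detail quadrants.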
[0011] Furthermore, in S3, the dual-branch pooling convolution fusion module processes remote sensing images in the following manner: 2×2 average pooling is performed on the feature map output by the upper-layer network; the input remote sensing image is divided into two branches along the channel dimension, each branch receiving half of the input channels; branch 1 performs a 3×3 convolution to extract features; branch 2 first performs 2×2 max pooling and then a 1×1 convolution to adjust the number of channels; finally, the outputs of the two branches are concatenated along the channel dimension to output a new feature map.
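The text gives kernel sizes for this module but not strides or padding. The shape trace below is a sketch under assumptions: stride 2 for the average pooling and for both branches, padding 1 for the 3×3 convolution, and each branch producing half of the output channels. These values were chosen only so that both branches yield the same spatial size, which concatenation requires; the function names are hypothetical.

```python
def out_size(size, kernel, stride, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def dual_branch_shape(channels, height, width, out_channels,
                      avg_stride=2, branch_stride=2):
    """Trace (channels, height, width) through the dual-branch module."""
    if channels % 2 or out_channels % 2:
        raise ValueError("channel counts must be even to split into two halves")
    # Step 1: 2x2 average pooling on the incoming feature map.
    h = out_size(height, 2, avg_stride)
    w = out_size(width, 2, avg_stride)
    # Step 2: split along the channel dimension; each branch sees channels // 2.
    half_out = out_channels // 2  # assumption: branches share the output evenly
    # Branch 1: 3x3 convolution (padding 1 assumed).
    b1 = (half_out,
          out_size(h, 3, branch_stride, 1),
          out_size(w, 3, branch_stride, 1))
    # Branch 2: 2x2 max pooling; the following 1x1 convolution only changes
    # the channel count, not the spatial size.
    b2 = (half_out,
          out_size(h, 2, branch_stride),
          out_size(w, 2, branch_stride))
    if b1[1:] != b2[1:]:
        raise ValueError(f"branch sizes differ: {b1[1:]} vs {b2[1:]}")
    # Step 3: concatenate along the channel dimension.
    return (b1[0] + b2[0], b1[1], b1[2])


shape = dual_branch_shape(64, 64, 64, 128)
```

Under these assumptions a 64-channel 64×64 input becomes a 128-channel 16×16 output. Calling the function with `avg_stride=1` raises `ValueError` for the same input, because the two branches then produce 32×32 and 31×31 maps; this illustrates why the unstated strides and padding have to be chosen together.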
[0012] Furthermore, in S3, the spatial pyramid pooling enhancement layer aggregation network processes remote sensing images in the following manner: the input feature map is subjected to a 1×1 convolution to adjust the number of channels, and the resulting feature map is input into four branches in parallel. Branch 1 acts as a residual connection, directly passing the transformed features to the concatenation operation; branches 2, 3, and 4 are three parallel branches, each performing a 5×5 max pooling operation to obtain a new feature map. The output feature maps of branches 2, 3, and 4 are concatenated with that of branch 1. Finally, the concatenated feature map is subjected to a 1×1 convolution to adjust the number of channels, resulting in a new feature map.
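The following reference sketch traces the data flow just described on plain nested lists. It follows the wording literally, so branches 2 to 4 each pool the same transformed map. Stride 1 with border handling that ignores out-of-range positions is an assumption made to keep the spatial size unchanged for concatenation, and the weight matrices `w_in` and `w_out` are hypothetical stand-ins for learned 1×1 convolution kernels.

```python
def conv1x1(fmap, weights):
    """1x1 convolution: a per-pixel linear mix of channels.

    fmap is a list of channels (each a 2D list); weights[o][i] maps input
    channel i to output channel o.
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[sum(wt * fmap[i][y][x] for i, wt in enumerate(row))
              for x in range(w)] for y in range(h)] for row in weights]


def max_pool_same(channel, k=5):
    """k x k max pooling, stride 1, ignoring positions outside the map."""
    r = k // 2
    h, w = len(channel), len(channel[0])
    return [[max(channel[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def spp_block(fmap, w_in, w_out):
    """1x1 conv, four parallel branches, channel concatenation, 1x1 conv."""
    x = conv1x1(fmap, w_in)
    branches = [x]                      # branch 1: residual (identity) path
    for _ in range(3):                  # branches 2-4: 5x5 max pooling
        branches.append([max_pool_same(c) for c in x])
    concat = [c for branch in branches for c in branch]
    return conv1x1(concat, w_out)


# One-channel 3x3 map with a single bright pixel in the centre.
fmap = [[[0, 0, 0], [0, 9, 0], [0, 0, 0]]]
out = spp_block(fmap, w_in=[[1]], w_out=[[1, 1, 1, 1]])
```

Read literally, the three pooling branches produce identical maps. Comparable pyramid pooling blocks in other detectors chain the pooling operations so that each one widens the receptive field; whether that is intended here cannot be determined from the text.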
[0013] Furthermore, in S3, multi-scale feature fusion based on the multi-scale dilated convolutional attention mechanism proceeds as follows: the input feature map is split into different heads along the channel dimension, and its dimension is adjusted by a linear transformation; the result is input into multi-head sliding-window dilated attention, in which different dilation rates are used to obtain multi-scale output features; the multi-scale output features are then concatenated, and the concatenated feature map is adjusted in dimension and its information fused to obtain the output feature map.
[0014] Furthermore, in S5, deploying the remote sensing image feature interference ship detection model and performing detection specifically include: deploying the model onto the detection equipment; reading the parameter configuration file and loading the pre-trained model weights; adaptively adjusting hyperparameters to configure the model processing speed; reading the real-time input image and preprocessing it; sending the processed image to the remote sensing image feature interference ship detection model to perform target detection; and visualizing the location and category information in the detection results.
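To illustrate the sliding-window dilated attention idea, here is a simplified pure-Python sketch. It makes several assumptions not stated in the text: the query, key, and value projections and the final fusion layer are omitted (identity), the window is 3×3, positions falling outside the map are skipped, and each head is given one dilation rate. All function and parameter names are hypothetical.

```python
import math


def dilated_window_attention(head, dilation, kernel=3):
    """Each pixel attends to a kernel x kernel neighbourhood sampled with gaps.

    head is a list of d channels (each a 2D list); returns the same shape.
    """
    d, h, w = len(head), len(head[0]), len(head[0][0])
    r = kernel // 2
    out = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    for y in range(h):
        for x in range(w):
            query = [head[c][y][x] for c in range(d)]
            neighbours = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy * dilation, x + dx * dilation
                    if 0 <= yy < h and 0 <= xx < w:
                        neighbours.append([head[c][yy][xx] for c in range(d)])
            # Scaled dot-product scores followed by a numerically stable softmax.
            scores = [sum(a * b for a, b in zip(query, k)) / math.sqrt(d)
                      for k in neighbours]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            for c in range(d):
                out[c][y][x] = sum(wt * k[c]
                                   for wt, k in zip(weights, neighbours)) / total
    return out


def multi_scale_dilated_attention(fmap, dilations=(1, 2)):
    """Split channels into heads, use one dilation rate per head, concatenate."""
    per_head = len(fmap) // len(dilations)
    out = []
    for i, rate in enumerate(dilations):
        head = fmap[i * per_head:(i + 1) * per_head]
        out.extend(dilated_window_attention(head, rate))
    return out


# Four constant channels on a 5x5 grid, split into two heads of two channels.
fmap = [[[2.0] * 5 for _ in range(5)] for _ in range(4)]
fused = multi_scale_dilated_attention(fmap)
```

Because a larger dilation rate spreads the same nine sample points over a wider area, heads with different rates see context at different scales for the same cost per pixel. The channel count is assumed to be divisible by the number of dilation rates.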
[0015] Compared with the prior art, the present invention has the following advantages: This invention discloses a remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention. By introducing Haar wavelet transform to preprocess the image, it effectively removes high-frequency noise in the image and improves the low contrast problem caused by uneven illumination and atmospheric scattering interference in remote sensing images.
[0016] In addition, the traditional YOLO series algorithms have been improved by introducing a dual-branch pooling convolutional fusion module, a spatial pyramid pooling enhancement layer aggregation network, and a multi-scale dilated convolutional attention mechanism to reconstruct the feature fusion link, enhance the model's adaptability, and improve the accuracy and stability of the algorithm for detecting ship remote sensing images with feature interference problems.
[0017] In addition, a single-stage anchor-free detection model is proposed, which can meet the computational complexity requirements of embedded devices in real-world environments while maintaining high detection performance, and achieve high-efficiency and high-precision detection of remote sensing images of ships with feature interference.
Attached Figure Description
[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0019] Figure 1 is a flowchart of the remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention provided by the present invention.
[0020] Figure 2 is a schematic diagram of the network structure of the single-stage anchor-free detection algorithm provided by the present invention.
[0021] Figure 3 is a flowchart of the dual-branch pooling convolution fusion module.
[0022] Figure 4 is a flowchart of the spatial pyramid pooling enhancement layer aggregation network.
[0023] Figure 5 is a flowchart of the multi-scale dilated convolutional attention mechanism.
[0024] Figure 6 is a visualization of the detection results.
[0025] Figure 7 shows the experimental comparison between the present detection method and other mainstream detection methods.
Detailed Implementation
[0026] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0027] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0029] As shown in Figure 1, this invention provides a remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention, including: S1. Establish a dataset of ship remote sensing images with feature interference problems. In a specific implementation, as a preferred embodiment of the present invention, the public dataset MAritime SATellite Imagery dataset version 2 (MASATI v2) is processed by using a hash algorithm to evaluate image duplication in the dataset and remove duplicate and invalidly labeled images.
[0030] S2. Preprocess and partition the ship remote sensing image dataset with feature interference problems. In a specific implementation, as a preferred embodiment of the present invention, images from the dataset are read in, and the Haar wavelet transform is used to enhance image details to generate image set A. Images from image set A are then read in and scaled to a uniform size to generate the image set B to be trained, which is randomly divided into a training set and a test set at a 7:3 ratio.
[0031] S3. Construct a remote sensing image feature interference ship detection framework based on multi-scale dilated convolutional attention. In a preferred embodiment of this invention, constructing the framework involves: constructing an input layer to receive the input remote sensing image; constructing a backbone network that uses a dual-branch pooling convolutional fusion module and a spatial pyramid pooling enhancement layer aggregation network to extract input image features; constructing a neck network that uses a multi-scale dilated convolutional attention mechanism to fuse multi-scale features and feeding the fused features into a detection head to generate detection results; and combining the networks constructed in the above steps to obtain the remote sensing image feature interference ship detection framework based on multi-scale dilated convolutional attention.
[0032] The dual-branch pooling convolution fusion module performs 2×2 average pooling on the output feature map of the upper network; divides the input remote sensing image into two branches along the channel dimension, with each branch having half the number of channels as the input; performs 3×3 convolution to extract features on branch one; performs 2×2 max pooling on branch two, followed by 1×1 convolution to adjust the number of channels; and finally concatenates the outputs of the two branches along the channel dimension to output a new feature map.
[0033] The computation process of the spatial pyramid pooling enhancement layer aggregation network involves performing a 1×1 convolution on the input feature map to adjust the number of channels, and then inputting the resulting feature map in parallel into four branches. Branch 1 acts as a residual connection, directly passing the transformed features to the concatenation operation. Branches 2, 3, and 4 act as three parallel branches, each performing a 5×5 max pooling operation to obtain a new feature map. The output feature maps of branches 2, 3, and 4 are concatenated with that of branch 1. Finally, a 1×1 convolution is performed on the concatenated feature map to adjust the number of channels, resulting in a new feature map.
[0034] The computation process of the multi-scale dilated attention mechanism involves splitting the input feature map into different heads along the channel dimension, adjusting the dimension through a linear transformation, inputting the result into multi-head sliding-window dilated attention, using different dilation rates to obtain multi-scale output features, and concatenating the obtained multi-scale output features; the concatenated feature map is then adjusted in dimension and its information fused to obtain the output feature map.
[0035] S4. Train the detection framework on the preprocessed and partitioned dataset to obtain a remote sensing image feature interference ship detection model. In a preferred embodiment of the present invention, the training process employs an adaptive learning rate adjustment strategy, dynamically adjusting the learning rate according to the loss changes during training to ensure that the model can converge stably during training.
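The text does not name the adaptive learning rate rule. One common strategy that matches the description (adjusting the rate according to how the loss changes) is to reduce the rate when the loss stops improving. The sketch below assumes that strategy; the class name `PlateauLR` and the `factor`, `patience`, and `min_lr` values are hypothetical.

```python
class PlateauLR:
    """Multiply the learning rate by `factor` when the loss stops improving."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss and return the rate for the next epoch."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            # Only react after more than `patience` epochs without improvement.
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


scheduler = PlateauLR(lr=0.01)
losses = [1.0, 0.8, 0.8, 0.8, 0.8, 0.7]
history = [scheduler.step(loss) for loss in losses]
```

In this run the loss stalls at 0.8 for three epochs after its last improvement, so the rate is halved once, from 0.01 to 0.005, and stays there when the loss improves again.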
[0036] S5. Deploy the remote sensing image feature interference ship detection model and adjust the hyperparameters for the remote sensing image feature interference ship detection task. In a specific implementation, as a preferred embodiment of the present invention, the process of deploying the trained model for this task includes: deploying the remote sensing image feature interference ship detection model on the detection equipment; reading the parameter configuration file and loading the pre-trained model weights; adaptively adjusting the hyperparameters and configuring the model processing rate; reading the real-time input image and preprocessing it; sending the processed image to the remote sensing image feature interference ship detection model to perform target detection; and visualizing the location and category information in the detection results.
[0037] Example: As shown in Figure 1, this invention provides a remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention. The dataset was obtained by processing the MASATI v2 maritime satellite imagery dataset. A hash algorithm was used to evaluate image redundancy, removing severely blurred, duplicate, and invalidly labeled images. Ultimately, 4522 images were retained, representing 61.2% of the original dataset. This dataset includes two core sample classes: positive samples with ship bounding boxes (the image contains one or more ship targets), and negative samples with empty labels (the image contains only the sea surface, coastline, etc., without any object labels). These two classes were randomly divided into training and test sets at a 7:3 ratio.
[0038] Training Process: All experiments were conducted on an Ubuntu 22.04 workstation equipped with an Intel Core i7 processor, three NVIDIA RTX A6000 GPUs, and 32GB of RAM. Experiments were implemented using PyTorch 2.0.0, CUDA 11.8, and cuDNN 8.7.0, with training run for 200 epochs. Experimental results show that the remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention improves the accuracy, AP, and AP50 metrics by 1.49%, 5.75%, and 1.35%, respectively, compared to the baseline model.
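As a quick check of the figures in this paragraph, the sketch below derives the approximate size of the original dataset from the stated percentage and performs a seeded random 7:3 split. The per-class counts are not given in the text, so the split is applied to all 4522 images at once; the function name, file names, and seed are hypothetical.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle reproducibly, then cut into training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


retained = 4522
# 61.2% is a rounded figure, so this is only an approximate original size.
implied_original = round(retained / 0.612)

names = [f"img_{i:04d}" for i in range(retained)]
train, test = split_dataset(names)
print(implied_original, len(train), len(test))
```

This prints `7389 3165 1357`. To split the positive and negative classes separately, as the paragraph describes, the same function would be called once per class and the results merged.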
[0039] As shown in Figure 2, this embodiment provides a single-stage anchor-free detection algorithm.
[0040] As shown in Figure 3, this embodiment provides a dual-branch pooling convolution fusion module, which performs 2×2 average pooling on the feature map output by the upper-layer network; divides the input remote sensing image into two branches along the channel dimension, with each branch receiving half of the input channels; performs a 3×3 convolution to extract features on branch one; performs 2×2 max pooling on branch two, followed by a 1×1 convolution to adjust the number of channels; and finally concatenates the outputs of the two branches along the channel dimension to output a new feature map.
[0041] As shown in Figure 4, this embodiment provides a spatial pyramid pooling enhancement layer aggregation network. The input feature map is subjected to a 1×1 convolution to adjust the number of channels, and the resulting feature map is input into four branches in parallel. Branch 1 serves as a residual connection, directly passing the transformed features to the concatenation operation. Branches 2, 3, and 4 serve as three parallel branches, each performing a 5×5 max pooling operation to obtain a new feature map. The output feature maps of branches 2, 3, and 4 are concatenated with that of branch 1. The concatenated feature map is then subjected to a 1×1 convolution to adjust the number of channels, resulting in a new feature map.
[0042] As shown in Figure 5, this embodiment provides a multi-scale dilated attention mechanism. The input feature map is split into different heads along the channel dimension, and the dimension is adjusted by a linear transformation. The result is input into multi-head sliding-window dilated attention, in which different dilation rates are used to obtain multi-scale output features. The obtained multi-scale output features are concatenated, and the concatenated feature map is adjusted in dimension and its information fused to obtain the output feature map.
[0043] As shown in Figure 6, this embodiment provides a visualization of the detection results of the remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention. The position of each detection box represents the spatial position of the target, and the number above the detection box represents the confidence level of the detection result.
[0044] As shown in Figure 7, this embodiment provides comparative experimental results of the remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention and other mainstream detection methods in remote sensing image ship detection tasks.
[0045] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.
Claims
1. A remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention, characterized in that it includes: S1: Establish a dataset of ship remote sensing images with feature interference problems; S2: Preprocess and partition the ship remote sensing image dataset with feature interference problems; S3: Construct a remote sensing image feature interference ship detection framework based on multi-scale dilated convolutional attention; S4: Train the detection framework on the preprocessed and partitioned dataset to obtain a remote sensing image feature interference ship detection model; S5: Deploy the remote sensing image feature interference ship detection model and adjust hyperparameters for the remote sensing image feature interference ship detection task.
2. The method according to claim 1, wherein the method is characterized in that: The Locality Sensitive Hash (LSH) algorithm is used to evaluate the image duplication of public datasets, and images with duplicate and invalid annotations are removed to establish a ship remote sensing image dataset.
3. The method for detecting ships with interference from remote sensing image features based on multi-scale dilated convolutional attention as described in claim 1, characterized in that: When preprocessing and partitioning the ship remote sensing image dataset: read the images in the ship remote sensing image dataset with feature interference problems, use the Haar wavelet transform method to enhance image details, and generate image set A; analyze the images in image set A, scale the image size to a uniform size, generate image set B to be trained, and randomly divide the training set and test set according to a 7:3 ratio.
4. The remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention according to claim 1, characterized in that: the following method is used when constructing the remote sensing image feature interference ship detection framework: an input layer is built to receive input remote sensing images; a backbone network using a dual-branch pooling convolutional fusion module and a spatial pyramid pooling enhancement layer aggregation network is constructed to extract input image features; a neck network using a multi-scale dilated convolutional attention mechanism is constructed to fuse multi-scale features, and the fused features are then fed into the detection head to generate detection results; combining the networks constructed in the above steps yields the remote sensing image feature interference ship detection framework based on multi-scale dilated convolutional attention.
5. The remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention according to claim 4, characterized in that: The dual-branch pooling convolutional fusion module processes remote sensing images in the following manner: average pooling is performed on the feature map output by the upper-layer network; the input remote sensing image is divided into two branches along the channel dimension, with the number of channels in each branch being half of the input channels; convolution is performed on branch one to extract features; branch two first performs max pooling, and then convolution to adjust the number of channels; finally, the output results of the two branches are concatenated along the channel dimension to output a new feature map.
6. The remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention according to claim 4, characterized in that: The spatial pyramid pooling enhancement layer aggregation network processes remote sensing images in the following manner: the input feature map is subjected to a 1×1 convolution to adjust the number of channels, and then the new output feature map is input into four branches in parallel; branch one serves as a residual connection, directly passing the transformed features to the concatenation operation; branches two, three, and four serve as three parallel branches, each performing a 5×5 max pooling operation to obtain a new feature map; the output feature maps of branches two, three, and four are concatenated with branch one to output a feature map; the input feature map is then subjected to a 1×1 convolution to adjust the number of channels, resulting in a new feature map.
7. The remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention according to claim 4, characterized in that: when performing multi-scale feature fusion based on the multi-scale dilated convolutional attention mechanism: the input feature map is divided into different heads along the channel dimension, the dimension is adjusted by a linear transformation, the result is input into multi-head sliding-window dilated attention, different dilation rates are used to obtain multi-scale output features, and the obtained multi-scale output features are concatenated; the concatenated feature map is adjusted in dimension and its information fused to obtain the output feature map.
8. The remote sensing image feature interference ship detection method based on multi-scale dilated convolutional attention according to claim 1, characterized in that: when deploying the remote sensing image feature interference ship detection model and performing detection: the model is deployed on the detection equipment; the parameter configuration file is read and the pre-trained model weights are loaded; hyperparameters are adaptively adjusted to configure the model processing speed; real-time input images are read and preprocessed; the processed image is fed into the remote sensing image feature interference ship detection model to perform target detection; and the location and category information in the detection results are visualized.