Method for multi-scale infrared ship detection based on dynamic adaptive convolution

Dynamic adaptive convolution and multi-scale aggregation modulation layers enhance the feature representation capability of the infrared ship detection network. This addresses the low accuracy of existing multi-scale infrared ship detection methods in coastal defense scenarios and makes infrared ship detection more efficient.

CN122244632A · Pending · Publication Date: 2026-06-19 · DALIAN MARITIME UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
DALIAN MARITIME UNIVERSITY
Filing Date
2026-03-10
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing multi-scale infrared ship detection methods have low accuracy in coastal defense scenarios and cannot effectively cope with image degradation caused by water vapor absorption, aerosol scattering, and occlusion by near-shore buildings.

Method used

Dynamic adaptive convolution is used to reshape cross-channel residual blocks into dynamic adaptive residual blocks, and multi-branch fusion attention is used to reshape cross-channel residual blocks into multi-branch fusion residual blocks. Together with a multi-scale aggregation modulation layer and a dynamic fusion detection head, these blocks form a multi-scale infrared ship detection network based on dynamic adaptive convolution, with enhanced feature representation and robustness.

Benefits of technology

It improves the accuracy and robustness of infrared ship detection, effectively handles target overlap and ambiguity issues in complex infrared detection scenarios, reduces redundant information interference, and enhances the model's adaptability and detection capabilities.

Generated by Eureka AI based on patent content.


Abstract

This invention discloses a multi-scale infrared ship detection method based on dynamic adaptive convolution, comprising: establishing a multi-scale ship image dataset based on infrared detection technology, and preprocessing and partitioning the dataset; designing a multi-scale aggregation modulation layer, using dynamic adaptive convolution to reshape cross-channel residual blocks into dynamic adaptive residual blocks, and constructing a backbone network with dynamic adaptive residual blocks and a multi-scale aggregation modulation layer; defining the multi-branch fusion attention-reshaped cross-channel residual blocks as multi-branch fusion residual blocks, and constructing a neck network applying multi-branch fusion residual blocks; designing a dynamic fusion detection head as the output network; combining the backbone network, neck network, and output network to construct a multi-scale infrared ship detection network based on dynamic adaptive convolution; training the multi-scale infrared ship detection network on the dataset to obtain a multi-scale infrared ship detection model, deploying the multi-scale infrared ship detection model and adjusting hyperparameters for multi-scale infrared ship detection tasks.

Description

Technical Field

[0001] This invention relates to the field of computer vision technology, and more particularly to a multi-scale infrared ship detection method based on dynamic adaptive convolution.

Background Technology

[0002] Infrared detection technology is now widely used in coastal defense systems, and its application in detecting ships is of great significance. However, existing methods for detecting multi-scale infrared ship images have low accuracy and cannot meet the needs of coastal defense scenarios. There is an urgent need to develop accurate and efficient multi-scale infrared ship detection methods to detect infrared images and assist coastal defense systems in accurately identifying targets.

[0003] Infrared ship detection is one of the most cutting-edge research areas in computer vision, attracting considerable attention from scholars in recent years. With the rise of deep learning technology, many mainstream detection models have been applied to multi-scale infrared ship detection tasks. Although mainstream detection models such as YOLO and DETR have demonstrated powerful target detection capabilities, breakthroughs are still needed for ship detection in infrared scenes.

[0004] In real-world infrared ship detection environments, the absorption of infrared wavelengths by water vapor and scattering by marine aerosols degrade the images, which show blurred details, low contrast, and blurred edges. Furthermore, near-shore structures and floating objects occlude ships, impairing target integrity and further weakening feature representation.

Summary of the Invention

[0005] To address the aforementioned technical problems, this invention discloses a multi-scale infrared ship detection method based on dynamic adaptive convolution. The method first establishes a multi-scale ship image dataset detected using infrared detection technology, and preprocesses and partitions the dataset. A multi-scale aggregation modulation layer is designed, and dynamic adaptive convolution is used to reshape cross-channel residual blocks into dynamic adaptive residual blocks. A backbone network applying the dynamic adaptive residual blocks and the multi-scale aggregation modulation layer is constructed. Multi-branch fusion attention is used to reshape cross-channel residual blocks into multi-branch fusion residual blocks, and a neck network applying the multi-branch fusion residual blocks is constructed. A dynamic fusion detection head is designed as the output network. The backbone network, neck network, and output network are combined to construct a multi-scale infrared ship detection network based on dynamic adaptive convolution. The multi-scale infrared ship detection network is trained on the dataset to obtain a multi-scale infrared ship detection model. The multi-scale infrared ship detection model is deployed and its hyperparameters are adjusted for use in multi-scale infrared ship detection tasks.

[0006] The technical means employed in this invention include the following steps:

[0007]
S1: Establish a multi-scale ship image dataset based on infrared detection technology, and preprocess and partition the dataset;
S2: Design a multi-scale aggregation modulation layer, use dynamic adaptive convolution to reshape cross-channel residual blocks into dynamic adaptive residual blocks, and build a backbone network that applies the dynamic adaptive residual blocks and the multi-scale aggregation modulation layer;
S3: Use multi-branch fusion attention to reshape cross-channel residual blocks into multi-branch fusion residual blocks, and build a neck network that applies the multi-branch fusion residual blocks;
S4: Design a dynamic fusion detection head and use it as the output network;
S5: Combine the backbone network, neck network, and output network to build a multi-scale infrared ship detection network based on dynamic adaptive convolution;
S6: Train the multi-scale infrared ship detection network on the dataset to obtain a multi-scale infrared ship detection model, then deploy the model and adjust its hyperparameters for multi-scale infrared ship detection tasks.

[0008] Furthermore, in S1, the steps of preprocessing and partitioning the multi-scale ship image dataset acquired by infrared detection technology include: using a locality-sensitive hashing algorithm to evaluate image redundancy in the infrared maritime ship dataset and removing duplicate images and images with invalid annotations; reading in the images of the dataset and applying the Haar wavelet transform to enhance image details, generating image set A; and reading in the images of image set A, scaling them to a uniform size to generate image set B for training, and randomly dividing image set B into a training set and a test set at a 7:3 ratio.
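The Haar wavelet step can be illustrated with a minimal sketch. The source does not state how image details are enhanced, so the version below assumes a single-level 2D Haar decomposition whose three detail sub-bands are scaled by a hypothetical `gain` before reconstruction; a gain below 1 would attenuate high-frequency content instead. The function names are illustrative and the image is a plain list of rows with even height and width.

```python
def haar2d(img):
    """Single-level 2D Haar transform of a grayscale image (even height/width)."""
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2.0  # low-pass approximation
            lh[r][s] = (a - b + c - d) / 2.0  # left-minus-right detail
            hl[r][s] = (a + b - c - d) / 2.0  # top-minus-bottom detail
            hh[r][s] = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def inverse_haar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h, w = len(ll) * 2, len(ll[0]) * 2
    img = [[0.0] * w for _ in range(h)]
    for r in range(h // 2):
        for s in range(w // 2):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            img[2 * r][2 * s] = (p + q + u + v) / 2.0
            img[2 * r][2 * s + 1] = (p - q + u - v) / 2.0
            img[2 * r + 1][2 * s] = (p + q - u - v) / 2.0
            img[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2.0
    return img


def enhance_details(img, gain=1.5):
    """Scale the detail sub-bands by `gain` (an assumed parameter) and reconstruct."""
    ll, lh, hl, hh = haar2d(img)
    scale = lambda band: [[gain * t for t in row] for row in band]
    return inverse_haar2d(ll, scale(lh), scale(hl), scale(hh))
```

With `gain=1.0` the image is reconstructed unchanged, which is a convenient sanity check before tuning the gain on real infrared frames.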

[0009] Further, in S2, the step of designing the multi-scale aggregation modulation layer includes: building an aggregation module to enhance the model's ability to capture details of targets at different scales and shapes; building an interaction module to enhance the model's ability to distinguish target features at different levels; and combining these modules and connecting them to the input and output to obtain the multi-scale aggregation modulation layer.

[0010] Further, in S2, the step of using dynamic adaptive convolution to reshape the cross-channel residual block into a dynamic adaptive residual block includes: replacing the convolutional modules in the cross-channel residual block with dynamic adaptive convolutions; and integrating the modules of the cross-channel residual block to obtain the dynamic adaptive residual block.

[0011] Further, in S2, the step of constructing the dynamic adaptive convolution includes: building a pre-global context module to enhance the convolution's ability to extract information; building a switchable convolution module to enhance the convolution's adaptability to multi-scale features; building a post-global context module to enhance the convolution's ability to integrate global information; and combining these modules to obtain the dynamic adaptive convolution.

[0012] Further, in S3, the step of using multi-branch fusion attention to reshape the cross-channel residual block into a multi-branch fusion residual block includes: adding multi-branch fusion attention after the output of the convolutional module in the cross-channel residual block; and integrating the modules of the cross-channel residual block to obtain the multi-branch fusion residual block.

[0013] Furthermore, in S4, the step of designing the dynamic fusion detection head as the output network includes: building an input module that passes multi-scale features to the dynamic fusion module; building a dynamic fusion module that adaptively fuses the input multi-scale features and passes the result to the prediction module; building a prediction module that outputs the core parameters of the detected target; and combining these modules to obtain the dynamic fusion detection head, which serves as the output network, receiving input feature maps and outputting prediction results.

[0014] Furthermore, in S2, the aggregation module processes the ship images as follows: the input feature map is split into a first feature map, a second feature map, and a third feature map along the functional dimension; the first feature map is connected to the interaction module using linear aggregation, the second feature map is connected to the aggregation module using linear aggregation, and the third feature map is connected to the aggregation module using lightweight linear aggregation; the second feature map is processed using multi-scale separable convolution; the outputs of the first, second, and third layers of the second feature map undergo activation enhancement to obtain the intermediate separation quantities; the fourth-layer output of the second feature map undergoes activation enhancement and global average pooling to obtain the output separation quantity; the third feature map undergoes dynamic gated aggregation to obtain the multi-layer gated output quantities; and the intermediate separation quantities and the output separation quantity are multiplied element-wise with the gated output quantities, and the features are fused to output a new feature map.

[0015] Furthermore, in S2, the interaction module processes the ship images as follows: the input feature map is split into a first feature map, a second feature map, and a third feature map along the functional dimension; the first feature map is connected to the interaction module using linear aggregation, the second feature map is connected to the aggregation module using linear aggregation, and the third feature map is connected to the aggregation module using lightweight linear aggregation; the first feature map is queried to obtain the query quantity q; the feature map output by the aggregation module is modulated to obtain the modulation quantity; and the query quantity q is multiplied element-wise by the modulation quantity to output a new feature map.
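The interaction module's final step can be written compactly. The notation below is an assumption introduced for illustration only: X is the input feature map, Agg denotes the aggregation module, and W_q and W_m are hypothetical linear maps standing in for the "query" and "modulation" operations, which the source does not define further.

```latex
(X_1, X_2, X_3) = \operatorname{split}(X), \qquad
q = W_q X_1, \qquad
m = W_m \, \operatorname{Agg}(X_2, X_3), \qquad
Y = q \odot m
```

Here \odot is element-wise multiplication, so q and m must have the same shape for the output feature map Y to be defined.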

[0016] Furthermore, in S2, the calculation process of the pre-global context module includes: feeding the input feature map into two parallel branches, each with the same number of channels as the original input; performing global average pooling on branch one; performing convolution on branch one; and fusing branch two, which acts as a residual connection, with the output features of branch one to output a new feature map.
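The pre-global context module (and the structurally identical post-global context module) can be sketched in a few lines. This is a minimal pure-Python illustration, not the patented implementation: it assumes the convolution on branch one is the 1×1 convolution named in the detailed description, and that "fusing" means broadcast addition of the pooled context onto branch two, which the source does not specify. Feature maps are nested lists shaped `[C][H][W]`.

```python
def global_avg_pool(x):
    """Mean over the spatial dimensions of each channel: [C][H][W] -> [C]."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]


def pointwise(vec, weight):
    """1x1 convolution on a pooled C-vector, i.e. a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def global_context(x, weight):
    # Branch one: global average pooling followed by a 1x1 convolution.
    ctx = pointwise(global_avg_pool(x), weight)
    # Branch two: the untouched input, used as a residual connection.
    # Fusion by broadcast addition is an assumption of this sketch.
    return [[[v + ctx[c] for v in row] for row in ch] for c, ch in enumerate(x)]
```

With an identity `weight`, every pixel is shifted by its channel's spatial mean, which makes the effect of the context branch easy to verify by hand.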

[0017] Furthermore, in S2, the calculation process of the switchable convolution module includes: feeding the input feature map into three parallel branches, each with the same number of channels as the original input; performing a 3×3 convolution on branch one; performing 5×5 average pooling on branch two; performing a 1×1 convolution on branch two; performing a 3×3 convolution on branch three; and multiplying the output of branch two element-wise with the outputs of branch one and branch three respectively, then fusing the features to output a new feature map.
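The three-branch computation of the switchable convolution module can be sketched as follows. Stride 1, zero padding that preserves spatial size, and fusion of the two gated products by summation are all assumptions of this sketch; the source names only the kernel sizes and the element-wise multiplications. Tensors are nested lists shaped `[C][H][W]`, and weights are `[C_out][C_in][k][k]`.

```python
def conv2d(x, weight, pad):
    """Stride-1 convolution with zero padding.
    x: [C_in][H][W], weight: [C_out][C_in][k][k]."""
    h, w, k = len(x[0]), len(x[0][0]), len(weight[0][0])
    out = []
    for kernel in weight:  # one kernel per output channel
        plane = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for c, chan in enumerate(x):
                    for di in range(k):
                        for dj in range(k):
                            ii, jj = i + di - pad, j + dj - pad
                            if 0 <= ii < h and 0 <= jj < w:
                                acc += kernel[c][di][dj] * chan[ii][jj]
                plane[i][j] = acc
        out.append(plane)
    return out


def avg_pool(x, k):
    """k x k average pooling, stride 1, zero padding (padded zeros are counted)."""
    box = [[[[1.0 / (k * k)] * k for _ in range(k)]]]  # shape [1][1][k][k]
    return [conv2d([chan], box, k // 2)[0] for chan in x]


def switchable_conv(x, w1, w2, w3):
    b1 = conv2d(x, w1, pad=1)               # branch one: 3x3 convolution
    b2 = conv2d(avg_pool(x, 5), w2, pad=0)  # branch two: 5x5 avg pool, 1x1 conv
    b3 = conv2d(x, w3, pad=1)               # branch three: 3x3 convolution
    # Branch two gates branches one and three; summing the two gated
    # products (g*p + g*q) is the assumed fusion.
    return [[[g * (p + q) for g, p, q in zip(gr, pr, qr)]
             for gr, pr, qr in zip(gc, pc, qc)]
            for gc, pc, qc in zip(b2, b1, b3)]
```

Because branch two is built from a 5×5 average, it carries a smoothed, larger-context signal that rescales the two 3×3 responses at every position.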

[0018] Furthermore, in S2, the calculation process of the post-global context module includes: feeding the input feature map into two parallel branches, each with the same number of channels as the original input; performing global average pooling on branch one; performing a 1×1 convolution on branch one; and fusing branch two, which acts as a residual connection, with the output features of branch one to output a new feature map.

[0019] Furthermore, in S3, the multi-branch fusion attention calculation process includes: performing a 1×1 convolution on the input feature map; dividing the output feature map into two branches along the channel dimension, each with half the number of channels of the original input; performing global average pooling on branch one; feeding branch one into four parallel fully connected layers, applying ReLU activation, and concatenating the results along the channel dimension; using branch two as a residual connection and fusing it with the concatenated output of branch one; and performing a 1×1 convolution on the fused feature map to output a new feature map.
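The multi-branch fusion attention steps can be traced with a small sketch. Several details are assumptions not stated in the source: each of the four fully connected layers is taken to output one quarter of branch two's channels so that the concatenation matches branch two, and the fusion is taken to be broadcast addition. Function and parameter names are illustrative; tensors are nested lists shaped `[C][H][W]`.

```python
def global_avg_pool(x):
    """Mean over the spatial dimensions of each channel: [C][H][W] -> [C]."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]


def conv1x1(x, weight):
    """1x1 convolution. x: [C_in][H][W], weight: [C_out][C_in]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wt * ch[i][j] for wt, ch in zip(row, x)) for j in range(w)]
             for i in range(h)] for row in weight]


def linear_relu(vec, weight):
    """Fully connected layer followed by ReLU."""
    return [max(0.0, sum(wt * v for wt, v in zip(row, vec))) for row in weight]


def multi_branch_fusion_attention(x, w_in, fc_weights, w_out):
    y = conv1x1(x, w_in)                 # initial 1x1 convolution
    half = len(y) // 2
    b1, b2 = y[:half], y[half:]          # split along the channel dimension
    pooled = global_avg_pool(b1)         # branch one: global average pooling
    # Four parallel fully connected layers with ReLU, concatenated by channel.
    attn = [a for fc in fc_weights for a in linear_relu(pooled, fc)]
    # Branch two as residual; broadcast addition is an assumed fusion.
    fused = [[[v + attn[c] for v in row] for row in ch]
             for c, ch in enumerate(b2)]
    return conv1x1(fused, w_out)         # final 1x1 convolution
```

The number of output channels is set entirely by the shape of `w_out`, since the source does not say whether the final convolution restores the original channel count.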

[0020] Furthermore, in S4, the calculation process of the dynamic fusion module includes: upsampling and downsampling the multi-scale input feature maps; performing a 1×1 convolution on the multi-scale input feature maps; concatenating the multi-scale input feature maps; performing a 1×1 convolution on the multi-scale input feature maps; applying Softmax activation to the multi-scale input feature maps; performing feature fusion on the multi-scale input feature maps; and performing a 3×3 convolution on the multi-scale input feature maps to output a new feature map.

Furthermore, in S5, combining the backbone network, neck network, and output network to build a multi-scale infrared ship detection network based on dynamic adaptive convolution includes: constructing an input layer to receive multi-scale infrared ship images; constructing a backbone network that applies dynamic adaptive residual blocks and multi-scale aggregation modulation layers to extract features from the input image; constructing a neck network that applies multi-branch fusion residual blocks to fuse multi-scale features; feeding the fused features into the dynamic fusion detection head to generate detection results; and combining the networks constructed in the above steps to obtain the multi-scale infrared ship detection network based on dynamic adaptive convolution.
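The Softmax-weighted fusion at the core of the dynamic fusion module described in [0020] can be sketched as below. The sketch assumes the multi-scale inputs have already been resampled to a common size and channel count, that the 1×1 convolution on the concatenated maps produces one logit per scale at each pixel, and that fusion is the resulting weighted sum; the resampling and the final 3×3 convolution are omitted. All names are illustrative.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list of logits."""
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def dynamic_fuse(feats, w_logit):
    """feats: S feature maps, each [C][H][W] with identical shapes.
    w_logit: [S][S*C] weights of a 1x1 convolution over the concatenated
    maps, producing one logit per scale at every pixel (assumed design)."""
    s, c = len(feats), len(feats[0])
    h, w = len(feats[0][0]), len(feats[0][0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            # Channel-wise concatenation of all scales at this pixel.
            cat = [f[k][i][j] for f in feats for k in range(c)]
            logits = [sum(a * b for a, b in zip(row, cat)) for row in w_logit]
            alpha = softmax(logits)  # per-pixel weight of each scale
            for k in range(c):
                out[k][i][j] = sum(alpha[n] * feats[n][k][i][j] for n in range(s))
    return out
```

Because the weights sum to one at every pixel, the fused map is a convex combination of the scales, which is one way to limit conflicts between layers that disagree about the same location.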

[0021] Furthermore, in S6, the process of deploying the trained model and adjusting hyperparameters for multi-scale infrared ship detection includes: deploying the multi-scale infrared ship detection model on the detection equipment; reading the parameter configuration file and loading the pre-trained model weights; adaptively adjusting hyperparameters and configuring the model processing speed; reading and preprocessing the input image; feeding the preprocessed image into the multi-scale infrared ship detection model to perform target prediction; and visualizing the location and category information in the detection results.

[0022] This invention discloses a multi-scale infrared ship detection method based on dynamic adaptive convolution. This method introduces Haar wavelet transform to preprocess the image, effectively removing high-frequency noise and improving the low contrast problem of infrared ship images caused by water vapor absorption and aerosol scattering. Furthermore, it proposes a multi-scale aggregation modulation layer that can dynamically allocate weights according to the features of the input image, enhancing the output quality and generalization ability of the backbone network, improving the robustness of the model, and optimizing the local details of infrared ship images.

[0023] Furthermore, this method improves traditional cross-channel residual blocks by replacing ordinary convolution with dynamic adaptive convolution, which can adaptively switch receptive fields for targets of different sizes. It also employs a contextual mechanism to enhance feature correlation and improve the model's feature representation of infrared images. Multi-branch fusion attention is used to reshape the cross-channel residual blocks, strengthening the ability to represent key features, reducing redundant information interference, and improving the model's adaptability to complex infrared detection scenarios such as target overlap.

[0024] Furthermore, this method proposes a dynamic fusion detection head as the output network. The head selectively fuses features of multi-scale images, effectively suppresses conflicts between feature information from different layers, and significantly improves the expressive power of the detection head without extra computational overhead, thereby improving the expressive power and task adaptability of the infrared ship detection model.

Description of the Drawings

[0025] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0026] Figure 1 is a flowchart of the multi-scale infrared ship detection method based on dynamic adaptive convolution provided by the present invention.

[0027] Figure 2 is a schematic diagram of the network structure of the single-stage anchor-free detection algorithm provided by the present invention.

[0028] Figure 3 is a flowchart of the multi-scale aggregation modulation layer.

[0029] Figure 4 is a flowchart of dynamic adaptive convolution.

[0030] Figure 5 is a flowchart of multi-branch fusion attention.

[0031] Figure 6 is a flowchart of the dynamic fusion detection head.

[0032] Figure 7 shows the experimental results comparing the detection method of the present invention with other mainstream detection methods.

[0033] Figure 8 is a visualization of the detection results of the method on multi-scale infrared ship images.

Detailed Implementation

[0034] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0035] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0036] As shown in Figure 1, this invention provides a multi-scale infrared ship detection method based on dynamic adaptive convolution, comprising the following steps. S1. Establish a multi-scale ship image dataset based on infrared detection technology, and preprocess and partition the dataset. In a specific implementation, as a preferred embodiment of the present invention, a locality-sensitive hashing algorithm is used to evaluate image redundancy in the infrared maritime ship dataset, and duplicate images and images with invalid annotations are removed; the images of the multi-scale ship image dataset acquired by infrared detection technology are read in, and the Haar wavelet transform is used to enhance image details, generating image set A; the images of image set A are read in and scaled to a uniform size to generate image set B for training, and image set B is randomly divided into a training set and a test set at a 7:3 ratio.
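The redundancy check in S1 can be illustrated with a small locality-sensitive hashing sketch. The source does not name the hash function or thresholds, so this version substitutes a simple average hash with banded bucketing as a stand-in; `bands` and `max_hamming` are hypothetical parameters, and images are small grayscale grids given as lists of rows.

```python
from collections import defaultdict


def average_hash(img):
    """One bit per pixel: 1 where the pixel is brighter than the image mean."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return tuple(int(p > mean) for p in flat)


def candidate_pairs(hashes, bands=4):
    """LSH banding: two images become candidates if any band of bits matches."""
    buckets = defaultdict(list)
    for idx, bits in enumerate(hashes):
        size = len(bits) // bands
        for b in range(bands):
            buckets[(b, bits[b * size:(b + 1) * size])].append(idx)
    pairs = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs


def deduplicate(images, max_hamming=1, bands=4):
    """Return indices of images kept after dropping near-duplicates."""
    hashes = [average_hash(im) for im in images]
    drop = set()
    for i, j in sorted(candidate_pairs(hashes, bands)):
        if i in drop or j in drop:
            continue
        # Verify candidates with the exact Hamming distance between hashes.
        if sum(a != b for a, b in zip(hashes[i], hashes[j])) <= max_hamming:
            drop.add(j)
    return [k for k in range(len(images)) if k not in drop]
```

Banding keeps the comparison cost low on a large dataset because only images that land in a shared bucket are compared bit by bit.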

[0037] S2. Design a multi-scale aggregation modulation layer, use dynamic adaptive convolution to reshape cross-channel residual blocks into dynamic adaptive residual blocks, and build a backbone network that applies the dynamic adaptive residual blocks and the multi-scale aggregation modulation layer. In a preferred embodiment of the present invention, the multi-scale aggregation modulation layer is designed as follows. An aggregation module is built to enhance the model's ability to capture details of targets at different scales. The calculation process includes: splitting the input feature map into a first feature map, a second feature map, and a third feature map along the functional dimension; connecting the first feature map to the interaction module using linear aggregation; connecting the second feature map to the aggregation module using linear aggregation; connecting the third feature map to the aggregation module using lightweight linear aggregation; performing context processing on the second feature map using multi-scale separable convolution; performing activation enhancement on the outputs of the first, second, and third layers of the second feature map to obtain the intermediate separation quantities; performing activation enhancement and global average pooling on the fourth-layer output of the second feature map to obtain the output separation quantity; performing dynamic gated aggregation on the third feature map to obtain the multi-layer gated output quantities; and multiplying the intermediate separation quantities and the output separation quantity element-wise with the gated output quantities and performing feature fusion to output a new feature map.

[0038] An interaction module is built to enhance the model's ability to distinguish target features at different levels. The calculation process includes: splitting the input feature map into a first feature map, a second feature map, and a third feature map along the functional dimension; connecting the first feature map to the interaction module using linear aggregation; connecting the second feature map to the aggregation module using linear aggregation; connecting the third feature map to the aggregation module using lightweight linear aggregation; querying the first feature map to obtain the query quantity q; modulating the feature map output by the aggregation module to obtain the modulation quantity; and multiplying the query quantity q and the modulation quantity element-wise to output a new feature map.

[0039] The modules constructed in the above steps are combined and connected to the input and output to obtain a multi-scale aggregation modulation layer.

[0040] For the dynamic adaptive convolution, a pre-global context module is built to enhance the convolution's ability to extract information. The calculation process includes: feeding the input feature map into two parallel branches, each with the same number of channels as the original input; performing global average pooling on branch one; performing a 1×1 convolution on branch one; and fusing branch two, as a residual connection, with the output features of branch one to output a new feature map.

[0041] A switchable convolution module is constructed to enhance the multi-scale feature adaptation capability of convolution. The calculation process includes: inputting the input feature map into three branches in parallel, with the number of channels in each branch being the same as the number of channels in the original input; performing a 3×3 convolution on branch one; performing 5×5 average pooling on branch two; performing a 1×1 convolution on branch two; performing a 3×3 convolution on branch three; multiplying the output of branch two element-wise with the outputs of branch one and branch three respectively, and fusing the features to output a new feature map.

[0042] A post-global context module is built to enhance the convolution's ability to integrate global information. The calculation process includes: feeding the input feature map into two parallel branches, each with the same number of channels as the original input; performing global average pooling on branch one; performing a 1×1 convolution on branch one; and fusing branch two, as a residual connection, with the output features of branch one to output a new feature map.

[0043] Combining the modules constructed in the above steps yields a dynamic adaptive convolution. The convolutional modules in the cross-channel residual block are replaced with dynamic adaptive convolutions; the modules in the cross-channel residual block are then integrated to obtain the dynamic adaptive residual block.

[0044] Finally, a backbone network is built by organically combining dynamic adaptive residual blocks, downsampling convolutional blocks, and multi-scale aggregation modulation layers for the image feature extraction stage.

[0045] S3. Use multi-branch fusion attention to reshape cross-channel residual blocks into multi-branch fusion residual blocks, and build a neck network that applies the multi-branch fusion residual blocks. In a specific implementation, as a preferred embodiment of the present invention, the multi-branch fusion attention calculation process includes: performing a 1×1 convolution on the input feature map; dividing the output feature map into two branches along the channel dimension, each with half the number of channels of the original input; performing global average pooling on branch one; feeding branch one into four parallel fully connected layers, applying ReLU activation, and concatenating the results along the channel dimension; using branch two as a residual connection and fusing it with the concatenated output of branch one; and performing a 1×1 convolution on the fused feature map to output a new feature map.

[0046] Multi-branch fusion attention is added after the output of the convolutional modules in the cross-channel residual block; the modules of the cross-channel residual block are then integrated to obtain the multi-branch fusion residual block.

[0047] Finally, the cross-channel residual block, multi-branch fusion residual block, upsampling layer and downsampling convolution block are organically combined to build a neck network for the image feature fusion stage.

[0048] S4. Design a dynamic fusion detection head as the output network. In a preferred embodiment of the present invention, an input module is built to pass multi-scale features to the dynamic fusion module; a dynamic fusion module is built to adaptively fuse the input multi-scale features and pass the result to the prediction module. The calculation process includes: upsampling and downsampling the multi-scale input feature maps; performing a 1×1 convolution on the multi-scale input feature maps; concatenating the multi-scale input feature maps; performing a 1×1 convolution on the multi-scale input feature maps; applying Softmax activation to the multi-scale input feature maps; performing feature fusion on the multi-scale input feature maps; and performing a 3×3 convolution on the multi-scale input feature maps to output a new feature map. A prediction module is then built to output the core parameters of the detected target.

[0049] The modules constructed in the above steps are combined to obtain a dynamic fusion detection head, which serves as the output network to receive input feature maps and output prediction results.

[0050] S5. Combine the backbone network, neck network and output network to build a multi-scale infrared ship detection network based on dynamic adaptive convolution; In a specific implementation, as a preferred embodiment of the present invention, the following steps are taken: First, an input layer is constructed to receive input multi-scale infrared ship images; a backbone network using dynamic adaptive residual blocks and multi-scale aggregation modulation layers is constructed to extract input image features; a neck network using multi-branch fusion residual blocks is constructed to fuse multi-scale features; the fused features are then connected to a dynamic fusion detection head to generate detection results; and the networks constructed in the above steps are combined to obtain a multi-scale infrared ship detection network based on dynamic adaptive convolution.

[0051] S6. Train the multi-scale infrared ship detection network on the dataset to obtain a multi-scale infrared ship detection model. Deploy the multi-scale infrared ship detection model and adjust the hyperparameters for use in multi-scale infrared ship detection tasks. In a specific implementation, as a preferred embodiment of the present invention, the following steps are taken: deploying a multi-scale infrared ship detection model on the detection device; reading the parameter configuration file and loading the pre-trained model weights; adaptively adjusting the hyperparameters and configuring the model processing rate; reading the input image and preprocessing the input image; sending the preprocessed image into the multi-scale infrared ship detection model to perform target prediction; and visualizing the location and category information in the detection results.

[0052] Example. As shown in Figure 1, this invention provides a multi-scale infrared ship detection method based on dynamic adaptive convolution.

[0053] The locality-sensitive hashing algorithm was used to evaluate the image redundancy of the infrared maritime vessel dataset, removing duplicate and invalid labeled images. A total of 6793 images were ultimately retained, representing 72.3% of the original dataset. The dataset includes seven types of objects: sailboats, container ships, warships, kayaks, bulk carriers, cruise ships, and fishing boats, and is randomly divided into training and test sets in a 7:3 ratio.
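The 7:3 random split of the 6793 retained images can be reproduced with a short sketch. The seed and the rounding rule (integer floor of the training share) are assumptions, since the source gives only the ratio; integer indices stand in for image files.

```python
import random


def split_dataset(items, train_parts=7, test_parts=3, seed=0):
    """Shuffle and split items into training and test sets at the given ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed (assumed) for repeatability
    cut = len(items) * train_parts // (train_parts + test_parts)
    return items[:cut], items[cut:]


train, test = split_dataset(range(6793))
print(len(train), len(test))  # 4755 2038
```

Under this rounding rule the split yields 4755 training images and 2038 test images; a different rounding convention would shift one image between the two sets.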

[0054] Training Process: All experiments were conducted on an Ubuntu 22.04 workstation equipped with an Intel Core i7 processor, three NVIDIA RTX A6000 GPUs, and 32GB of memory. Experiments were implemented using PyTorch 2.0.0, CUDA 11.8, and cuDNN 8.7.0, with 200 training epochs. Experimental results show that the multi-scale infrared ship detection method based on dynamic adaptive convolution improves accuracy, AP, and AP50 metrics by 0.41%, 0.41%, and 0.23%, respectively, compared to the baseline model.

[0055] As shown in Figure 2, this embodiment provides a single-stage anchor-free detection algorithm.

[0056] As shown in Figure 3, this embodiment provides a multi-scale aggregation modulation layer, which is used to build the backbone network. The aggregation module is constructed as follows: the input feature map is split along the functional dimension into a first feature map, a second feature map, and a third feature map; the first feature map is connected to the interaction module by linear aggregation, the second feature map is connected to the aggregation module by linear aggregation, and the third feature map is connected to the aggregation module by lightweight linear aggregation. The second feature map undergoes context processing using multi-scale separable convolution; the outputs of the first, second, and third layers of the second feature map are activated and enhanced to obtain the intermediate separation values; the output of the fourth layer of the second feature map is activated and enhanced and then subjected to global average pooling to obtain the output separation value; the third feature map undergoes dynamic gated aggregation to obtain the multi-layer gated output values; the intermediate separation values and the output separation value are multiplied element-wise with the gated output values, and feature fusion is performed to output a new feature map. The interaction module is constructed as follows: the input feature map is split and connected in the same way as described for the aggregation module; the first feature map is queried to obtain the query quantity q; the feature map output by the aggregation module is modulated to obtain the modulation quantity; and the query quantity q and the modulation quantity are multiplied element-wise to output a new feature map.

[0057] The modules constructed in the above steps are combined and connected to the input and output to obtain a multi-scale aggregation modulation layer.
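One possible reading of the multi-scale aggregation modulation layer is sketched below in PyTorch. Several details are assumptions that the text does not specify: the three "linear aggregation" projections are modeled as 1×1 convolutions, the multi-scale separable convolution as four stacked depthwise convolutions with kernel sizes 3, 5, 7, and 9, "activated and enhanced" as GELU, and feature fusion as a gated sum. The class name `MultiScaleAggregationModulation` is hypothetical.

```python
import torch
import torch.nn as nn


class MultiScaleAggregationModulation(nn.Module):
    """Sketch: aggregation module (gated multi-scale context) plus interaction module (q * modulation)."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        self.levels = len(kernel_sizes)
        # Three projections of the input: query (first map), context (second map), gates (third map).
        self.to_query = nn.Conv2d(channels, channels, 1)
        self.to_context = nn.Conv2d(channels, channels, 1)
        self.to_gates = nn.Conv2d(channels, self.levels, 1)  # lightweight: one gate map per level
        # Stacked depthwise convolutions with growing kernels (assumed form of the separable conv).
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels),
                nn.GELU(),
            )
            for k in kernel_sizes
        ])
        self.modulate = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        q = self.to_query(x)
        ctx = self.to_context(x)
        gates = self.to_gates(x)  # shape (B, levels, H, W)
        fused = torch.zeros_like(ctx)
        for i, layer in enumerate(self.layers):
            ctx = layer(ctx)
            level = ctx
            if i == self.levels - 1:
                # Last layer: global average pooling gives the output separation value.
                level = ctx.mean(dim=(2, 3), keepdim=True)
            # Gate each level element-wise and accumulate (assumed fusion rule).
            fused = fused + level * gates[:, i:i + 1]
        modulation = self.modulate(fused)
        # Interaction module: query multiplied element-wise by the modulation quantity.
        return q * modulation
```

With these choices the layer preserves the input shape, so it could be dropped between backbone stages without changing channel counts.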

[0058] As shown in Figure 4, this embodiment uses dynamic adaptive convolution to reshape cross-channel residual blocks into dynamic adaptive residual blocks. The pre-global context module is constructed as follows: the input feature map is fed in parallel into two branches, each with the same number of channels as the original input; global average pooling is performed on branch one; convolution is performed on branch one; branch two serves as a residual connection and is fused with the output features of branch one to output a new feature map. The switchable convolution module is constructed as follows: the input feature map is fed in parallel into three branches, each with the same number of channels as the original input; a 3×3 convolution is performed on branch one; 5×5 average pooling followed by a 1×1 convolution is performed on branch two; a 3×3 convolution is performed on branch three; the output of branch two is multiplied element-wise with the outputs of branch one and branch three respectively, and the features are fused to output a new feature map. The post-global context module is constructed as follows: the input feature map is fed in parallel into two branches, each with the same number of channels as the original input; global average pooling is performed on branch one; a 1×1 convolution is performed on branch one; branch two serves as a residual connection and is fused with the output features of branch one to output a new feature map.

[0059] Combining the modules constructed in the above steps yields a dynamic adaptive convolution. The convolutional modules in the cross-channel residual block are replaced with dynamic adaptive convolutions; the modules in the cross-channel residual block are then integrated to obtain the dynamic adaptive residual block.
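The three modules that make up the dynamic adaptive convolution might be arranged as in the following PyTorch sketch. The text only says that the switch branch is multiplied element-wise with the two 3×3 branches and that the results are fused; the sigmoid on the switch, the complementary weighting `s` and `1 - s`, the dilation of 3 on the third branch, and the 1×1 kernel in the pre-global context module are assumptions borrowed from the general idea of switchable atrous convolution. All class names are hypothetical.

```python
import torch
import torch.nn as nn


class GlobalContext(nn.Module):
    """Branch one: global average pooling + 1x1 conv; branch two: residual connection."""

    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        # The pooled (B, C, 1, 1) context is broadcast-added onto the residual branch.
        return x + self.conv(self.pool(x))


class SwitchableConv(nn.Module):
    """Two 3x3 branches blended by a switch computed from 5x5 average pooling + 1x1 conv."""

    def __init__(self, channels, dilation=3):
        super().__init__()
        self.branch_one = nn.Conv2d(channels, channels, 3, padding=1)
        self.switch = nn.Sequential(
            nn.AvgPool2d(5, stride=1, padding=2),
            nn.Conv2d(channels, channels, 1),
        )
        # Assumption: the third branch uses a dilated 3x3 kernel for a larger receptive field.
        self.branch_three = nn.Conv2d(
            channels, channels, 3, padding=dilation, dilation=dilation
        )

    def forward(self, x):
        s = torch.sigmoid(self.switch(x))  # assumed squashing of the switch to (0, 1)
        return s * self.branch_one(x) + (1 - s) * self.branch_three(x)


class DynamicAdaptiveConv(nn.Module):
    """Pre-global context -> switchable convolution -> post-global context."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            GlobalContext(channels),
            SwitchableConv(channels),
            GlobalContext(channels),
        )

    def forward(self, x):
        return self.body(x)
```

Since every stage keeps the channel count and spatial size, a module like this could stand in for an ordinary 3×3 convolution inside a residual block, which matches the replacement step described above.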

[0060] As shown in Figure 5, this embodiment uses multi-branch fusion attention to reshape cross-channel residual blocks, and the resulting blocks are defined as multi-branch fusion residual blocks. A 1×1 convolution is performed on the input feature map; the output feature map is divided into two branches along the channel dimension, each with half the number of channels of the original input; global average pooling is performed on branch one; branch one is then fed into four fully connected layers in parallel, ReLU activation is applied, and the results are concatenated along the channel dimension; branch two serves as a residual connection and is fused with the concatenated output of branch one; the fused feature map is then passed through a 1×1 convolution to output a new feature map.

[0061] Multi-branch fusion attention is added after the output of the convolutional modules in the cross-channel residual block; the modules in the cross-channel residual block are then integrated to obtain the multi-branch fusion residual block.
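The multi-branch fusion attention steps can be sketched in PyTorch as follows. The output width of each fully connected layer (one quarter of the half-channel count, so that the concatenation matches branch two) and the use of broadcast addition for the residual fusion are assumptions; the text does not give either. The class name `MultiBranchFusionAttention` is hypothetical.

```python
import torch
import torch.nn as nn


class MultiBranchFusionAttention(nn.Module):
    """1x1 conv -> channel split -> (GAP + 4 parallel FC + ReLU + concat) fused with residual -> 1x1 conv."""

    def __init__(self, channels, branches=4):
        super().__init__()
        half = channels // 2
        if channels % 2 != 0 or half % branches != 0:
            raise ValueError("channels must be divisible by 2 * branches")
        self.pre = nn.Conv2d(channels, channels, 1)
        # Four parallel fully connected layers; their outputs concatenate back to `half` channels.
        self.fcs = nn.ModuleList(
            [nn.Linear(half, half // branches) for _ in range(branches)]
        )
        self.post = nn.Conv2d(half, channels, 1)

    def forward(self, x):
        branch_one, branch_two = self.pre(x).chunk(2, dim=1)
        pooled = branch_one.mean(dim=(2, 3))  # global average pooling -> (B, half)
        attention = torch.cat([torch.relu(fc(pooled)) for fc in self.fcs], dim=1)
        # Assumed fusion: broadcast the channel descriptor onto the residual branch by addition.
        fused = branch_two + attention[:, :, None, None]
        return self.post(fused)
```

The final 1×1 convolution restores the original channel count, so the block output can be added back to the residual path of the cross-channel residual block.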

[0062] As shown in Figure 6, this embodiment provides a dynamic fusion detection head as the output network. An input module is built to pass the multi-scale features to the dynamic fusion module. The dynamic fusion module is constructed as follows: the multi-scale input feature maps are upsampled or downsampled; a 1×1 convolution is performed on each of them; the resulting feature maps are concatenated; a 1×1 convolution is performed on the concatenated feature map; Softmax activation is applied; the multi-scale feature maps are fused; and a 3×3 convolution is performed on the fused feature map to output a new feature map. A prediction module is also built to output the core parameters of the detected target.

[0063] The modules constructed in the above steps are combined to obtain a dynamic fusion detection head, which serves as the output network to receive input feature maps and output prediction results.
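A minimal PyTorch sketch of the dynamic fusion module is given below, under several assumptions: nearest-neighbor interpolation stands in for the unspecified upsampling and downsampling, the Softmax output is interpreted as one spatial weight map per input scale, and fusion is a weighted sum. The class name `DynamicFusion` and the `target` argument (the index of the scale whose resolution the others are resized to) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicFusion(nn.Module):
    """Resize -> 1x1 conv per scale -> concat -> 1x1 conv -> softmax weights -> weighted sum -> 3x3 conv."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        n_scales = len(in_channels)
        self.align = nn.ModuleList(
            [nn.Conv2d(c, out_channels, 1) for c in in_channels]
        )
        self.weights = nn.Conv2d(out_channels * n_scales, n_scales, 1)
        self.out = nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, feats, target=1):
        size = feats[target].shape[-2:]
        # Bring every scale to the target resolution, then to a common channel count.
        resized = [
            conv(F.interpolate(f, size=size, mode="nearest"))
            for conv, f in zip(self.align, feats)
        ]
        # One weight map per scale; softmax makes the weights sum to 1 at each pixel.
        w = torch.softmax(self.weights(torch.cat(resized, dim=1)), dim=1)
        fused = sum(w[:, i:i + 1] * r for i, r in enumerate(resized))
        return self.out(fused)
```

For example, three neck outputs with 16, 32, and 64 channels at 32×32, 16×16, and 8×8 resolution would be fused into a single map at the middle resolution, which the prediction module could then consume.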

[0064] As shown in Figure 7, this embodiment provides experimental results comparing the multi-scale infrared ship detection method based on dynamic adaptive convolution with other mainstream detection methods on the multi-scale infrared ship detection task.

[0065] As shown in Figure 8, this embodiment provides a visualization of the detection results of the multi-scale infrared ship detection method based on dynamic adaptive convolution. The position of each detection box represents the spatial position of the target, and the number above the detection box represents the confidence level of the detection result.

[0066] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A multi-scale infrared ship detection method based on dynamic adaptive convolution, characterized in that the method comprises: establishing a multi-scale ship image dataset based on infrared detection technology, and preprocessing and partitioning the dataset; designing a multi-scale aggregation modulation layer, using dynamic adaptive convolution to reshape cross-channel residual blocks into dynamic adaptive residual blocks, and building a backbone network with the dynamic adaptive residual blocks and the multi-scale aggregation modulation layer; using multi-branch fusion attention to reshape cross-channel residual blocks into multi-branch fusion residual blocks, and building a neck network that applies the multi-branch fusion residual blocks; designing a dynamic fusion detection head and using it as the output network; combining the backbone network, neck network, and output network to construct a multi-scale infrared ship detection network based on dynamic adaptive convolution; and training the multi-scale infrared ship detection network on the dataset to obtain a multi-scale infrared ship detection model, deploying the multi-scale infrared ship detection model, and adjusting its hyperparameters for use in multi-scale infrared ship detection tasks.

2. The multi-scale infrared ship detection method based on dynamic adaptive convolution according to claim 1, characterized in that, when preprocessing the multi-scale ship image dataset: a locality-sensitive hashing algorithm is used to evaluate the image redundancy of the multi-scale ship image dataset, and duplicate and invalid labeled images are removed; the Haar wavelet transform is used to enhance image details and generate image set A; the images in image set A are read and scaled to a uniform size to generate the to-be-trained image set B; and image set B is randomly divided into a training set and a test set in a 7:3 ratio.

3. The multi-scale infrared ship detection method based on dynamic adaptive convolution according to claim 1, characterized in that: When designing a multi-scale aggregation modulation layer: build an aggregation module to enhance the model's ability to capture details of targets at different scales; build an interaction module to enhance the model's ability to distinguish features of targets at different levels; combine the aggregation module and the interaction module and connect the input and output parts to obtain the multi-scale aggregation modulation layer.

4. The multi-scale infrared ship detection method based on dynamic adaptive convolution according to claim 1, characterized in that, when using dynamic adaptive convolution to reshape cross-channel residual blocks into dynamic adaptive residual blocks: a pre-global context module is built to enhance the convolution's ability to extract information; a switchable convolution module is built to enhance the convolution's ability to adapt to multi-scale features; a post-global context module is built to enhance the convolution's ability to integrate global information; and the modules constructed in the above steps are combined to obtain a dynamic adaptive residual block.

5. The multi-scale infrared ship detection method based on dynamic adaptive convolution according to claim 1, characterized in that the output network is built as follows: an input module is built to pass the multi-scale features to the dynamic fusion module; a dynamic fusion module is built to adaptively and dynamically fuse the input multi-scale features and pass the result to the prediction module; a prediction module is built to output the core parameters of the detected target; and the modules constructed in the above steps are combined to obtain a dynamic fusion detection head, which serves as the output network to receive input feature maps and output prediction results.

6. The multi-scale infrared ship detection method based on dynamic adaptive convolution according to claim 3, characterized in that the aggregation module processes the ship images as follows: the input feature map is split along the functional dimension into a first feature map, a second feature map, and a third feature map; the first feature map is connected to the interaction module by linear aggregation, the second feature map is connected to the aggregation module by linear aggregation, and the third feature map is connected to the aggregation module by lightweight linear aggregation; the second feature map undergoes context processing using multi-scale separable convolution; the outputs of the first, second, and third layers of the second feature map are activated and enhanced to obtain the intermediate separation values; the output of the fourth layer of the second feature map is activated and enhanced and then subjected to global average pooling to obtain the output separation value; dynamic gated aggregation is performed on the third feature map to obtain the multi-layer gated output values; and the intermediate separation values and the output separation value are multiplied element-wise with the gated output values, and the features are fused to output a new feature map.

7. The multi-scale infrared ship detection method based on dynamic adaptive convolution according to claim 3, characterized in that the interaction module processes the ship images as follows: the input feature map is split along the functional dimension into a first feature map, a second feature map, and a third feature map; the first feature map is connected to the interaction module by linear aggregation, the second feature map is connected to the aggregation module by linear aggregation, and the third feature map is connected to the aggregation module by lightweight linear aggregation; the first feature map is queried to obtain the query quantity q; the feature map output by the aggregation module is modulated to obtain the modulation quantity; and the query quantity q and the modulation quantity are multiplied element-wise to output a new feature map.

8. The multi-scale infrared ship detection method based on dynamic adaptive convolution according to claim 4, characterized in that the pre-global context module processes the ship image as follows: the input feature map is fed in parallel into two branches, each with the same number of channels as the original input; global average pooling is performed on branch one; convolution is performed on branch one; and branch two, acting as a residual connection, is fused with the output features of branch one to output a new feature map.

9. The multi-scale infrared ship detection method based on dynamic adaptive convolution according to claim 4, characterized in that, when building the multi-scale infrared ship detection network: an input layer is constructed to receive multi-scale infrared ship images; a backbone network with dynamic adaptive residual blocks and multi-scale aggregation modulation layers is constructed to extract features from the input image; a neck network that applies multi-branch fusion residual blocks is constructed to fuse the multi-scale features; the fused features are fed into the dynamic fusion detection head to generate detection results; and the input layer, backbone network, neck network, and dynamic fusion detection head are combined to obtain a multi-scale infrared ship detection network based on dynamic adaptive convolution.

10. The multi-scale infrared ship detection method based on dynamic adaptive convolution according to claim 4, characterized in that, when deploying the multi-scale infrared ship detection model: the multi-scale infrared ship detection model is deployed on the detection equipment; the parameter configuration file is read and the pre-trained model weights are loaded; the hyperparameters are adaptively adjusted and the model processing rate is configured; the input image is read and preprocessed; the preprocessed image is fed into the multi-scale infrared ship detection model to perform target prediction; and the location and category information in the detection results is visualized.