A two-stage small sample target detection method based on an optimized CBAM attention mechanism

By optimizing the CBAM attention mechanism in small-sample object detection, inserting multi-scale convolution and pooling ratio parameters, and combining residual connections, the problems of insufficient perception and overfitting of the model for features at different scales are solved, thereby improving detection accuracy and robustness.

CN118570603B · Active · Publication Date: 2026-06-23 · TONGJI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TONGJI UNIV
Filing Date
2024-04-29
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing technologies suffer from overfitting in small sample object detection, and the models lack the ability to perceive features at different scales, failing to effectively mitigate the adverse effects of sparse target scale distribution in small sample datasets.

Method used

A two-stage few-shot object detection method based on an optimized CBAM attention mechanism is adopted. By inserting an improved CBAM attention module into the feature extraction backbone network and adding multi-scale convolution, pooling ratio parameters and residual connections, the channel and spatial attention parts are optimized to improve feature extraction capability and robustness.

Benefits of technology

It improves the model's ability to identify and generalize targets at different scales, reduces the impact of scale correlation, alleviates overfitting problems, and enhances detection accuracy and robustness.



Abstract

The application relates to the field of small sample target detection, and in particular to a two-stage small sample target detection method based on an optimized CBAM attention mechanism. The method comprises the following steps: training a two-stage target detection network Faster-RCNN using a base class data set to obtain a base class detection model; freezing the parameters of the feature extraction backbone network in the base class detection model; optimizing a CBAM attention mechanism module; placing the optimized CBAM attention module in the feature extraction backbone network to construct a detection network, then inputting a new class small sample data set with a small amount of labeled information to fine-tune the parameters of the detection head of the detection network; and inputting a data set to be detected into the detection network to obtain a detection result. Compared with the prior art, the application suppresses the influence of unimportant spatial information, increases the attention paid to important spatial information, enhances sensitivity to features at different scales, and offers strong generalization ability and robustness.

Description

Technical Field

[0001] This invention relates to the field of small sample target detection, and in particular to a two-stage small sample target detection method based on an optimized CBAM attention mechanism.

Background Technology

[0002] Object detection, which involves selecting any number of objects of interest in an image and identifying their categories, is a key area of computer vision. However, in real-world environments, objects exhibit a long-tailed distribution, making it difficult to obtain samples for many categories. Furthermore, labeling samples and training models consumes significant resources. Therefore, object detection based on a limited number of samples is of great practical significance. To address this, few-shot object detection involves training a base class detection model using a base class dataset with sufficient annotation information, and then using a new class dataset with limited annotation information and the prior knowledge of the base class model to predict new class objects.

[0003] Due to the significant imbalance between the base class and new class data, small-sample models are prone to overfitting. To address this challenge, most algorithms directly introduce existing attention mechanisms into the backbone network. This strategy fails to effectively improve the network's ability to perceive features at different scales and does not alleviate the adverse effects of the sparse target scale distribution in small-sample datasets. In the prior art, Chinese patent CN 116977716 A discloses a method for target detection in small-sample remote sensing images. The cited invention directly cascades a CBAM attention mechanism module into the feature extraction backbone network, passing the feature map through the channel attention and spatial attention mechanisms sequentially to alleviate the difficulty of representing small targets in remote sensing images. However, the cited invention does not adjust the model to suit the small target scale characteristics of remote sensing tasks, resulting in a less flexible model and significant room for performance improvement. Therefore, to improve the network's sensitivity to features at different scales and its attention to important spatial information, this invention provides a two-stage small-sample target detection method based on an optimized CBAM attention mechanism.

Summary of the Invention

[0004] The purpose of this invention is to overcome the shortcomings of the existing technology and provide a two-stage small sample target detection method based on an optimized CBAM attention mechanism.

[0005] The objective of this invention can be achieved through the following technical solutions:

[0006] This invention provides a two-stage few-shot target detection method based on an optimized CBAM attention mechanism, comprising the following steps:

[0007] Step S1: Train the two-stage object detection network Faster-RCNN using the base class dataset to obtain the base class detection model;

[0008] Step S2: Freeze the parameters of the feature extraction backbone network in the base class detection model;

[0009] Step S3: Optimize the CBAM attention mechanism module;

[0010] Step S4: Place the optimized CBAM attention module in the feature extraction backbone network to build the detection network. Then, use the new class small sample dataset with a small amount of labeled information as input to fine-tune the parameters of the detection head part of the detection network.

[0011] Step S5: Input the dataset to be detected into the detection network to obtain the detection results.

[0012] Step S1 specifically includes:

[0013] Step S11: Construct the base class dataset;

[0014] Step S12, Model Selection: Select the two-stage object detection network Faster-RCNN as the base model, and load the pre-trained VGG16 network as the feature extraction backbone network;

[0015] Step S13, Model Building: Build the Faster-RCNN model, including the region proposal network, object classifier, and bounding box regressor, and set the model's hyperparameters;

[0016] Step S14, Define the Loss Function: Select the classification cross-entropy loss as the classification loss, and select the smooth L1 loss as the bounding box regression loss;

[0017] Step S15, Model Training: Input the base class dataset into the model, update the model parameters through the backpropagation algorithm to minimize the loss function, and dynamically adjust the model's hyperparameters.

[0018] Furthermore, placing the optimized CBAM attention module in the feature extraction backbone network includes placing the optimized CBAM attention mechanism module between two convolutional layers in the feature extraction backbone network where the number of channels changes.

[0019] Furthermore, CBAM modules with a set number of channels are inserted between layers 2 and 3, layers 5 and 6, and layers 9 and 10 of the VGG16 network structure, respectively. The resulting structure of the feature extraction network is as follows: conv3-64, conv3-64, A-64, conv3-128, conv3-128, A-128, conv3-256, conv3-256, conv3-256, A-256, conv3-512, conv3-512, conv3-512, conv3-512, conv3-512, conv3-512, FC-4096, FC-4096, FC-1000.

[0020] Freezing the parameters of the feature extraction backbone network includes setting their "requires_grad" attribute to "False".

[0021] The optimized CBAM attention mechanism module includes: optimizing the channel attention part and spatial attention part of the CBAM attention mechanism module, and adding residual connections;

[0022] Step S31: In the channel attention part of the optimized CBAM attention mechanism module, grouped convolution is used to divide the input feature map X into two groups in the channel dimension. Convolution operation is performed on each group using convolution kernels of different sizes. The two groups of input feature maps are then subjected to global max pooling and global average pooling based on width and height, respectively. They are then fed into a two-layer neural network. The output features are fused using element-wise summation, and the two groups of feature maps are concatenated to obtain the channel attention part weights. The channel attention part weights and the input feature map X are multiplied element-wise to generate the input features required for the spatial attention part.

[0023] Step S32: In the optimized spatial attention part, max pooling and average pooling are first performed on the input feature map X along the channel dimension; a pooling weight parameter is introduced to adjust the ratio of max-pooled features to average-pooled features; the two pooled feature maps are concatenated along the channel dimension to obtain a concatenated feature map, and the channel dimension is reduced to 1 through a convolution operation to obtain the weights of the spatial attention part; the dimensionality-reduced feature map is multiplied element-wise with the input features generated in step S31 to obtain the fused final feature $X_s$;

[0024] Step S33: Introduce a residual connection into the optimized CBAM module by adding the fused final feature $X_s$ to the input feature map X.

[0025] Furthermore, the channel attention component weights are calculated using the following formula:

[0026] $A_{c1} = \mathrm{sigmoid}(\mathrm{MLP}(\mathrm{AvgPool}(X_1)) + \mathrm{MLP}(\mathrm{MaxPool}(X_1)))$

[0027] $A_{c2} = \mathrm{sigmoid}(\mathrm{MLP}(\mathrm{AvgPool}(X_2)) + \mathrm{MLP}(\mathrm{MaxPool}(X_2)))$

[0028] $A_c = \mathrm{concat}([A_{c1}, A_{c2}], \mathrm{dim}=1)$

[0029] where $A_{c1}$ and $A_{c2}$ are the weights of the two groups, respectively, and $A_c$ is the channel attention weight.

[0030] Furthermore, the weights of the spatial attention component are calculated using the following formula:

[0031] $A_s = \mathrm{sigmoid}(\mathrm{conv}_{7 \times 7}(\mathrm{concat}[\omega\,\mathrm{AvgPool}(X), (1-\omega)\,\mathrm{MaxPool}(X)]))$

[0032] Where X is the input feature map, AvgPool(X) is average pooling, MaxPool(X) is max pooling, and ω is the pooling weight parameter.

[0033] The number of samples in each category of the new class small sample dataset with limited annotation information is specified by the task.

[0034] Furthermore, the number of samples included in each category is set to 1, 2, 3, 5, and 10, respectively.

[0035] Compared with the prior art, the present invention has the following beneficial effects:

[0036] 1) This invention adds multi-scale convolution to the channel attention part of the original CBAM attention module, which helps the model acquire multi-level and multi-scale feature information, improves the object detection model's ability to identify and generalize targets of different scales, enhances feature expression, reduces the impact of scale correlation, and improves the sensitivity to features of different scales, thereby alleviating the problem of scale sparsity in small sample object detection.

[0037] 2) This invention adds a pooling ratio parameter to the spatial attention part of the original CBAM attention module, flexibly and dynamically adjusting the proportion of the two pooling methods. This helps the model obtain richer information during feature extraction, increases the attention to important spatial information, and comprehensively considers the importance of local saliency and global context based on the features of small sample targets, thereby improving the robustness and generalization ability of the model.

[0038] 3) This invention introduces residual connections into the optimized CBAM attention module, which effectively strengthens information transmission and gradient flow, alleviates optimization difficulty, maintains the balance between model depth and performance, promotes model training convergence, and improves the model's expressive ability. It can effectively alleviate the overfitting problem that is common in small sample target detection and improve detection accuracy.

[0039] 4) This invention places the improved CBAM attention mechanism module between two convolutional layers in the feature extraction backbone network where the number of channels changes. This fully utilizes the information changes in the channel dimension, which helps to establish an effective correlation between the low-level features and high-level semantic features of the model. This enables the module to effectively learn the correlation between features and improve the performance of small sample object detection.

Attached Figure Description

[0040] Figure 1 is a flowchart illustrating the implementation of the present invention;

[0041] Figure 2 is an overall framework diagram of the present invention;

[0042] Figure 3 is a diagram of the improved feature extraction backbone network structure of the present invention;

[0043] Figure 4 is a structural diagram of the improved CBAM module of the present invention;

[0044] Figure 5 is a structural diagram of the channel portion of the improved CBAM module of the present invention;

[0045] Figure 6 is a structural diagram of the spatial portion of the improved CBAM module of the present invention.

Detailed Implementation

[0046] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments. These embodiments are based on the technical solution of the present invention and provide detailed implementation methods and specific operating procedures. However, the scope of protection of the present invention is not limited to the following embodiments.

[0047] Example

[0048] This embodiment provides a two-stage few-shot target detection method based on an optimized CBAM attention mechanism. As shown in Figure 1 and Figure 2, it includes the following steps:

[0049] Step 1: Train the two-stage object detection network Faster-RCNN using a base class dataset with a large amount of labeled information to obtain the base class detection model.

[0050] Specifically, the base class detection model in step 1 is obtained through the following calculation method:

[0051] Step 11) Construct the base class dataset. This example uses the PASCAL VOC 2012 dataset for experiments. The PASCAL VOC 2012 dataset contains 20 categories, from which 15 categories are selected as the base classes for this experiment. Specifically, the 15 selected base classes are: aeroplane, bicycle, boat, bottle, car, cat, chair, dining table, dog, horse, person, potted plant, sheep, train, and tvmonitor. For each base class, the number of samples is greater than 300.

[0052] Step 12) Model selection. The two-stage object detection network Faster-RCNN was selected as the base model, and a pre-trained VGG16 network was loaded as the feature extraction backbone network.

[0053] Step 13) Model Construction. Construct the Faster-RCNN model, which includes a Region Proposal Network (RPN), an object classifier, and a bounding box regressor. Set the model's hyperparameters, including: initial learning rate of 0.001 and batch size of 16.

[0054] Step 14) Define the loss function. Choose the classification cross-entropy loss as the classification loss and the smooth L1 loss as the bounding box regression loss. Specifically:

[0055]

[0056]

[0057] Here, the symbols denote, in order: the model's predicted probability for the target's true class, the bounding box location predicted by the model, and the ground-truth bounding box location.
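As a minimal sketch of the two losses named in step 14, the following assumes their standard definitions: cross-entropy as the negative log of the probability predicted for the true class, and smooth L1 applied to each box coordinate difference. The function names, the transition point `beta = 1.0`, and the four-value box encoding are assumptions made for illustration and are not stated in the text.

```python
import math

def cross_entropy(p_true_class: float) -> float:
    """Classification loss: negative log of the probability predicted for the true class."""
    return -math.log(p_true_class)

def smooth_l1(diff: float, beta: float = 1.0) -> float:
    """Smooth L1 on one coordinate difference (beta is an assumed transition point)."""
    a = abs(diff)
    # Quadratic near zero, linear for large errors.
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta

def box_regression_loss(pred_box, true_box) -> float:
    """Sum of smooth L1 terms over the box coordinates (assumed 4-value encoding)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred_box, true_box))

# One small and one large coordinate error: 0.125 + 1.5
loss = box_regression_loss([0.0, 0.0, 1.0, 1.0], [0.5, 0.0, 1.0, 3.0])
```

The quadratic region keeps gradients small for nearly correct boxes, while the linear region limits the influence of badly misplaced boxes.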

[0058] Step 15) Model Training. Input the base class dataset into the model, update the model parameters through the backpropagation algorithm to minimize the loss function, and dynamically adjust the model's hyperparameters.

[0059] Step 2: Freeze the parameters of the feature extraction backbone network, RPN, and RoI parts in the base class detection model. Specifically, set the "requires_grad" attribute of these parameters to "False".

[0060] Step 3: Optimize the CBAM attention mechanism module. As shown in Figure 4, this includes optimizing the channel attention and spatial attention parts of the CBAM attention mechanism module, as well as adding residual connections.

[0061] Specifically, the CBAM module in step 3 is optimized using the following calculation method:

[0062] Step 3.1) In the channel attention part, as shown in Figure 5, the input feature map X is split evenly into two sub-features along the channel dimension. Each sub-feature is then passed through its own convolutional layer, and the two layers use different kernel sizes (5 and 7 in this embodiment) without changing the channel dimension. Specifically, if the input feature map X has dimensions C×H×W, each of the two sub-feature maps has dimensions C/2×H×W.

[0063] Let the input feature map be $X \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels and H and W are the height and width of the feature map; the two groups of features are then $X_1, X_2 \in \mathbb{R}^{C/2 \times H \times W}$.

[0064] The formula for calculating the convolution process is:

[0065] $X_1 = \mathrm{conv}_1(X_1)$

[0066] $X_2 = \mathrm{conv}_2(X_2)$

[0067] Here, $\mathrm{conv}_1$ and $\mathrm{conv}_2$ represent convolution operations with different kernel sizes.

[0068] Step 3.2) Perform global max pooling and global average pooling based on width and height, respectively, on the two sub-feature maps. Specifically, after this step, four output features are obtained, each with a dimension of C / 2×1×1.

[0069] Step 3.3) Feed the pooled features into a two-layer neural network (MLP), each MLP containing two fully connected layers that learn a non-linear mapping of the features. After this step, the dimensions of the four output features remain unchanged.

[0070] Step 3.4) Feature fusion. Within each group, the features output by the MLP are summed element-wise; the two groups of features are then concatenated to obtain the weight $A_c$. Specifically, the dimension of $A_c$ is C×1×1.

[0071] Step 3.5) Multiply $A_c$ element-wise with the input feature map X to generate the input features needed by the spatial attention module. In summary, the formula for calculating the channel attention weights is:

[0072] $A_{c1} = \mathrm{sigmoid}(\mathrm{MLP}(\mathrm{AvgPool}(X_1)) + \mathrm{MLP}(\mathrm{MaxPool}(X_1)))$

[0073] $A_{c2} = \mathrm{sigmoid}(\mathrm{MLP}(\mathrm{AvgPool}(X_2)) + \mathrm{MLP}(\mathrm{MaxPool}(X_2)))$

[0074] $A_c = \mathrm{concat}([A_{c1}, A_{c2}], \mathrm{dim}=1)$

[0075] where $A_{c1}$ and $A_{c2}$ are the weights of the two groups, respectively, and $A_c$ is the channel attention weight.
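The channel attention formulas can be traced on a toy input. The sketch below is pure Python and starts from the two groups after their convolutions (the convolutions themselves are omitted). It assumes a ReLU between the two fully connected layers, no biases, and the same MLP weights for both groups; none of these details is stated in the text, and all function names and weight values are hypothetical.

```python
import math

def global_pools(group):
    """group: nested list shaped [C][H][W] -> (avg, max) vectors of length C."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in group]
    mx = [max(map(max, ch)) for ch in group]
    return avg, mx

def mlp(vec, w1, w2):
    """Two fully connected layers with an assumed ReLU in between (no biases)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def group_weights(group, w1, w2):
    """sigmoid(MLP(AvgPool(Xi)) + MLP(MaxPool(Xi))) for one channel group."""
    avg, mx = global_pools(group)
    summed = [a + m for a, m in zip(mlp(avg, w1, w2), mlp(mx, w1, w2))]
    return [1.0 / (1.0 + math.exp(-s)) for s in summed]

def channel_attention(x1, x2, w1, w2):
    """Concatenate the two group weights along the channel axis (length C)."""
    return group_weights(x1, w1, w2) + group_weights(x2, w1, w2)

# Toy input: C = 2 (one channel per group), H = 1, W = 2; identity-like weights.
x1 = [[[1.0, 3.0]]]
x2 = [[[2.0, 2.0]]]
a_c = channel_attention(x1, x2, w1=[[1.0]], w2=[[1.0]])
```

For this input the first group yields sigmoid(2 + 3) and the second yields sigmoid(2 + 2), so `a_c` holds one weight per channel, matching the C×1×1 shape given in step 3.4.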

[0076] Step 3.6) In the spatial attention part, as shown in Figure 6, max pooling and average pooling are first performed on the input feature map X along the channel dimension, resulting in two pooled feature maps. Specifically, if the input feature map X has dimensions C×H×W, each of the two pooled feature maps has dimensions 1×H×W.

[0077] Step 3.7) Introduce a pooling weight parameter, with a value between 0 and 1, to adjust the ratio between the max-pooled and average-pooled features and thereby control the importance of each. After this step, the dimensions of the two features remain unchanged.
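Steps 3.6 and 3.7 can be illustrated with a small pure-Python sketch. It pools a C×H×W nested list along the channel axis and scales the average map by ω and the max map by (1 − ω), following the spatial attention formula. The function name is hypothetical, and ω is treated here as a plain number; how ω is set or learned is not specified in the text.

```python
def weighted_channel_pooling(x, omega):
    """x: nested list shaped [C][H][W]; omega: pooling weight parameter in [0, 1].

    Returns two [H][W] maps: omega * (mean over channels) and
    (1 - omega) * (max over channels).
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    channels = len(x)
    height, width = len(x[0]), len(x[0][0])
    avg_map = [[omega * sum(x[c][i][j] for c in range(channels)) / channels
                for j in range(width)] for i in range(height)]
    max_map = [[(1.0 - omega) * max(x[c][i][j] for c in range(channels))
                for j in range(width)] for i in range(height)]
    return avg_map, max_map

# Toy input with C = 2, H = 1, W = 2.
x = [[[1.0, 2.0]], [[3.0, 6.0]]]
avg_map, max_map = weighted_channel_pooling(x, omega=0.5)
```

With ω = 1 only the average-pooled map contributes and with ω = 0 only the max-pooled map does, which is the sense in which the parameter balances global context against local saliency.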

[0078] Step 3.8) Concatenate the two pooled feature maps along the channel dimension to obtain a concatenated feature map with dimensions 2×H×W. Then reduce the channel dimension to 1 through a convolution operation to obtain the weights $A_s$, whose dimensions are 1×H×W.

[0079] Step 3.9) Multiply the dimensionality-reduced feature map element-wise with the input features of the module to obtain the fused features, which are also the final output feature $X_s$ of the improved CBAM module. In summary, the formula for calculating the spatial attention weights is:

[0080] $A_s = \mathrm{sigmoid}(\mathrm{conv}_{7 \times 7}(\mathrm{concat}[\omega\,\mathrm{AvgPool}(X), (1-\omega)\,\mathrm{MaxPool}(X)]))$

[0081] Where X is the input feature map, AvgPool(X) is average pooling, MaxPool(X) is max pooling, and ω is the pooling weight parameter.

[0082] Step 3.10) Add the feature map X input to the CBAM module to the final output feature map $X_s$ to construct a residual connection.
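Steps 3.1 to 3.10 can be collected into a single forward pass with the tensor shape of each intermediate result. This is a sketch under one stated reading: the spatial part is taken to operate on the channel-refined feature, written $X'$ here, because step 3.5 describes that product as the input needed by the spatial attention module, whereas the spatial formula in the text writes its input simply as X. The symbols $X'$, $Y$, and $\otimes$ (element-wise multiplication with broadcasting) are introduced for illustration only. In the channel part, AvgPool and MaxPool are global over height and width; in the spatial part, they act along the channel axis.

```latex
\begin{aligned}
[X_1, X_2] &= \mathrm{split}(X), & X_1, X_2 &\in \mathbb{R}^{C/2 \times H \times W} \\
X_i &\leftarrow \mathrm{conv}_i(X_i), & i &\in \{1, 2\} \\
A_{ci} &= \mathrm{sigmoid}\big(\mathrm{MLP}(\mathrm{AvgPool}(X_i)) + \mathrm{MLP}(\mathrm{MaxPool}(X_i))\big), & A_{ci} &\in \mathbb{R}^{C/2 \times 1 \times 1} \\
A_c &= \mathrm{concat}([A_{c1}, A_{c2}]), & A_c &\in \mathbb{R}^{C \times 1 \times 1} \\
X' &= A_c \otimes X, & X' &\in \mathbb{R}^{C \times H \times W} \\
A_s &= \mathrm{sigmoid}\big(\mathrm{conv}_{7 \times 7}(\mathrm{concat}[\omega\,\mathrm{AvgPool}(X'),\ (1-\omega)\,\mathrm{MaxPool}(X')])\big), & A_s &\in \mathbb{R}^{1 \times H \times W} \\
X_s &= A_s \otimes X', & X_s &\in \mathbb{R}^{C \times H \times W} \\
Y &= X_s + X, & Y &\in \mathbb{R}^{C \times H \times W}
\end{aligned}
```

Because $Y$ has the same shape as X, the module can be placed between two existing convolutional layers without changing their channel counts.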

[0083] Step 4: Place the optimized CBAM attention module obtained in step S3 into the feature extraction backbone network to construct the detection network. Then, use the new class small sample dataset with a small amount of labeled information as input to fine-tune the parameters of the detection head part of the detection network.

[0084] Specifically, the position of the optimized CBAM attention module within the feature extraction backbone network is shown in Figure 3. An improved CBAM module with 64 input channels is inserted between layers 2 and 3 of the original VGG16 network structure; an improved CBAM module with 128 input channels is inserted between layers 5 and 6 of the original VGG16 network structure; and an improved CBAM module with 256 input channels is inserted between layers 9 and 10 of the original VGG16 network structure. After these additions, the structure of the feature extraction network is as follows: conv3-64, conv3-64, A-64, conv3-128, conv3-128, A-128, conv3-256, conv3-256, conv3-256, A-256, conv3-512, conv3-512, conv3-512, conv3-512, conv3-512, conv3-512, FC-4096, FC-4096, FC-1000, for a total of 19 layers. Here, conv3-xxx refers to a convolutional layer with a kernel size of 3×3 and xxx channels, FC-xxx refers to a fully connected layer with xxx nodes, and A-xxx refers to an improved CBAM attention mechanism module with xxx input channels.
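The 19-layer structure can be reproduced with a short sketch, which also shows how the layer positions need to be counted. The listed structure is obtained when each position (2, 5, 9) is counted in the network as it stands after the previous insertion; counting all three positions in the unmodified VGG16 would instead place the second and third modules inside the 256-channel and 512-channel blocks. The helper name `insert_attention` is hypothetical.

```python
# The 16 weight layers of VGG16 in the notation used above.
VGG16_LAYERS = (
    ["conv3-64"] * 2 + ["conv3-128"] * 2 + ["conv3-256"] * 3
    + ["conv3-512"] * 6 + ["FC-4096", "FC-4096", "FC-1000"]
)

def insert_attention(layers, insertions):
    """Insert attention modules one at a time.

    insertions: (n, channels) pairs meaning "place A-<channels> after the
    first n layers of the network as it stands at that moment".
    """
    result = list(layers)
    for n, channels in insertions:
        result.insert(n, f"A-{channels}")
    return result

backbone = insert_attention(VGG16_LAYERS, [(2, 64), (5, 128), (9, 256)])
print(", ".join(backbone))
```

Each module lands directly before a convolutional layer whose channel count differs from the previous one (64 to 128, 128 to 256, 256 to 512), which matches the placement rule of inserting the module where the number of channels changes.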

[0085] Specifically, the new class small sample dataset consists of 5 other classes in the PASCAL VOC 2012 dataset besides the 15 classes of the base class, namely: cow, bus, motorbike, sofa, and bird.

[0086] Step 5: Input the detection dataset containing the new class samples into the detection network to obtain the detection results. Specifically, the test dataset contains both base class data and new class data, but when calculating the detection accuracy, the detection accuracy of the base class and the new class will be counted separately to intuitively understand the network's detection performance for small sample classes.

[0087] The number of samples in each class of a new class small sample dataset with limited annotation information is specified by the task. In the k-shot task, each class of the small sample dataset will contain k samples, where k is usually equal to 5, 10, 30, etc., which is much smaller than the size of the base class dataset.
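A k-shot subset can be drawn as in the following minimal sketch. The function name, the sample identifiers, and the fixed random seed are hypothetical; the text does not describe how the k samples per class are chosen.

```python
import random

def sample_k_shot(samples_by_class, k, seed=0):
    """Pick k samples per new class.

    samples_by_class maps a class name to a list of sample identifiers.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = {}
    for name, samples in samples_by_class.items():
        if len(samples) < k:
            raise ValueError(f"class {name!r} has fewer than {k} samples")
        subset[name] = rng.sample(samples, k)  # sampling without replacement
    return subset

# Hypothetical sample ids for two of the five new classes.
pool = {
    "cow": [f"cow_{i}" for i in range(20)],
    "bus": [f"bus_{i}" for i in range(20)],
}
few_shot = sample_k_shot(pool, k=5)
```

Fixing the seed keeps the before/after comparison on identical few-shot subsets, so differences in accuracy are not caused by different sample draws.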

[0088] In this embodiment, the PASCAL VOC 2012 dataset was used as the test data. Following the conventional settings for small sample object detection tasks, the number of samples for each small sample category was set to 1, 2, 3, 5, and 10 in turn. The detection accuracy results are shown in Tables 1 to 5. The "before improvement" baseline inserted the CBAM module into the feature extraction backbone network without the improvements described above; all other experimental conditions and operations were exactly the same as after the improvement.

[0089] Table 1. Accuracy comparison before and after improvement under 1-shot settings

[0090]
Class                       | Cow   | Motorbike | Bus   | Sofa  | Bird
AP (before improvement) (%) | 49.12 | 38.89     | 37.98 | 8.01  | 3.75
AP (after improvement) (%)  | 49.25 | 37.48     | 41.53 | 8.55  | 2.69

[0091] Table 2. Accuracy comparison before and after improvement under 2-shot settings

[0092]
Class                       | Cow   | Motorbike | Bus   | Sofa  | Bird
AP (before improvement) (%) | 61.73 | 50.11     | 37.28 | 22.54 | 6.49
AP (after improvement) (%)  | 62.36 | 50.02     | 37.77 | 20.48 | 9.57

[0093] Table 3. Accuracy comparison before and after improvement under 3-shot settings

[0094]
Class                       | Cow   | Motorbike | Bus   | Sofa  | Bird
AP (before improvement) (%) | 60.18 | 55.24     | 49.52 | 24.13 | 15.68
AP (after improvement) (%)  | 60.84 | 56.86     | 47.48 | 28.91 | 13.36

[0095] Table 4. Accuracy comparison before and after improvement under 5-shot settings

[0096]
Class                       | Cow   | Motorbike | Bus   | Sofa  | Bird
AP (before improvement) (%) | 60.52 | 54.24     | 51.18 | 35.91 | 16.00
AP (after improvement) (%)  | 57.97 | 55.83     | 53.78 | 37.82 | 17.80

[0097] Table 5. Accuracy comparison before and after improvement under 10-shot settings

[0098]

[0099]

[0100] This embodiment compares the detection accuracy on the new classes before and after improvement under five conditions: 1, 2, 3, 5, and 10 new class samples. The mAP after improvement is higher than the mAP before improvement in every setting: it increases by 0.35, 0.41, 0.54, 1.07, and 1.73 percentage points when the number of new class samples is 1, 2, 3, 5, and 10, respectively. The per-class AP does not improve for every class in every setting (for example, Motorbike and Bird in the 1-shot setting), but the gain in mAP grows with the number of samples. This indicates that the method proposed in this embodiment can improve the network's ability to detect targets and increase detection accuracy when the number of samples is small.
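The reported mAP gains for the 1-, 2-, 3-, and 5-shot settings can be checked directly from the per-class AP values in Tables 1 to 4, taking mAP as the unweighted mean over the five new classes (an assumption, though it reproduces the stated figures). The 10-shot setting is left out because the values of Table 5 are not present in the text.

```python
# Per-class AP (%) from Tables 1 to 4, in the order Cow, Motorbike, Bus, Sofa, Bird.
# Each entry is (before improvement, after improvement), keyed by the number of shots.
AP = {
    1: ([49.12, 38.89, 37.98, 8.01, 3.75], [49.25, 37.48, 41.53, 8.55, 2.69]),
    2: ([61.73, 50.11, 37.28, 22.54, 6.49], [62.36, 50.02, 37.77, 20.48, 9.57]),
    3: ([60.18, 55.24, 49.52, 24.13, 15.68], [60.84, 56.86, 47.48, 28.91, 13.36]),
    5: ([60.52, 54.24, 51.18, 35.91, 16.00], [57.97, 55.83, 53.78, 37.82, 17.80]),
}

def mean(values):
    return sum(values) / len(values)

# Gain in mAP (percentage points), rounded to two decimals as in the text.
gains = {k: round(mean(after) - mean(before), 2) for k, (before, after) in AP.items()}
print(gains)  # {1: 0.35, 2: 0.41, 3: 0.54, 5: 1.07}
```

The computed gains agree with the first four figures quoted in paragraph [0100].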

[0101] The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, all technical solutions that can be obtained by those skilled in the art based on the concept of the present invention through logical analysis, reasoning, or limited experimentation on the basis of existing technology should be within the scope of protection defined by the claims.

Claims

1. A two-stage few-shot target detection method based on an optimized CBAM attention mechanism, characterized in that it includes the following steps: Step S1: Train the two-stage object detection network Faster-RCNN using the base class dataset to obtain the base class detection model; Step S2: Freeze the parameters of the feature extraction backbone network in the base class detection model; Step S3: Optimize the CBAM attention mechanism module, including optimizing the channel attention and spatial attention parts of the CBAM attention mechanism module and adding residual connections, specifically: Step S31: In the channel attention part of the optimized CBAM attention mechanism module, grouped convolution is used to divide the input feature map X into two groups along the channel dimension, and convolution operations are performed on each group using kernels of different sizes; the two groups of input feature maps are then subjected to global max pooling and global average pooling based on width and height, respectively, and fed into a two-layer neural network; the output features are fused using element-wise summation, and the two groups of feature maps are concatenated to obtain the channel attention weights; the channel attention weights are multiplied element-wise with the input feature map X to generate the input features needed for the spatial attention part; Step S32: In the optimized spatial attention part, max pooling and average pooling operations are first performed on the input feature map X based on the feature dimension; a pooling weight parameter is introduced to adjust the ratio of max-pooled features to average-pooled features; the two pooled feature maps are concatenated along the channel dimension to obtain a concatenated feature map, and a convolution operation is used to reduce the channel dimension to 1, yielding the spatial attention weights; the dimensionality-reduced feature map is multiplied element-wise with the input features generated in step S31 to obtain the fused final feature $X_s$; Step S33: Introduce a residual connection into the optimized CBAM module by adding the fused final feature $X_s$ to the input feature map X; Step S4: Place the optimized CBAM attention module in the feature extraction backbone network to build the detection network, then use the new class small sample dataset with a small amount of labeled information as input to fine-tune the parameters of the detection head part of the detection network; Step S5: Input the dataset to be detected into the detection network to obtain the detection results.

2. The two-stage small-sample target detection method based on an optimized CBAM attention mechanism according to claim 1, characterized in that step S1 specifically includes: Step S11: Construct the base class dataset; Step S12, Model Selection: Select the two-stage object detection network Faster-RCNN as the base model, and load the pre-trained VGG16 network as the feature extraction backbone network; Step S13, Model Building: Build the Faster-RCNN model, including the region proposal network, object classifier, and bounding box regressor, and set the model's hyperparameters; Step S14, Define the Loss Function: Select the classification cross-entropy loss as the classification loss, and select the smooth L1 loss as the bounding box regression loss; Step S15, Model Training: Input the base class dataset into the model, update the model parameters through the backpropagation algorithm to minimize the loss function, and dynamically adjust the model's hyperparameters.

3. The two-stage small-sample target detection method based on an optimized CBAM attention mechanism according to claim 2, characterized in that, The step of placing the optimized CBAM attention module in the feature extraction backbone network includes: placing the optimized CBAM attention mechanism module between two convolutional layers in the feature extraction backbone network where the number of channels changes.

4. The two-stage small-sample target detection method based on an optimized CBAM attention mechanism according to claim 3, characterized in that, Specifically, the two convolutional layers include: inserting CBAM modules with a set number of channels between layers 2 and 3, layers 5 and 6, and layers 9 and 10 of the VGG16 network structure. The structure of the feature extraction network is as follows: conv3-64, conv3-64, A-64, conv3-128, conv3-128, A-128, conv3-256, conv3-256, conv3-256, A-256, conv3-512, conv3-512, conv3-512, conv3-512, conv3-512, conv3-512, FC-4096, FC-4096, FC-1000.

5. The two-stage small-sample target detection method based on an optimized CBAM attention mechanism according to claim 1, characterized in that freezing the parameters of the feature extraction backbone network includes setting their "requires_grad" attribute to "False".

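6. The two-stage small-sample target detection method based on an optimized CBAM attention mechanism according to claim 1, characterized in that the channel attention weights are calculated using the following formulas: $A_{c1} = \mathrm{sigmoid}(\mathrm{MLP}(\mathrm{AvgPool}(X_1)) + \mathrm{MLP}(\mathrm{MaxPool}(X_1)))$, $A_{c2} = \mathrm{sigmoid}(\mathrm{MLP}(\mathrm{AvgPool}(X_2)) + \mathrm{MLP}(\mathrm{MaxPool}(X_2)))$, $A_c = \mathrm{concat}([A_{c1}, A_{c2}], \mathrm{dim}=1)$, where $A_{c1}$ and $A_{c2}$ are the weights of the two groups, respectively, and $A_c$ is the channel attention weight.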

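7. The two-stage small-sample target detection method based on an optimized CBAM attention mechanism according to claim 1, characterized in that the weights of the spatial attention component are calculated using the following formula: $A_s = \mathrm{sigmoid}(\mathrm{conv}_{7 \times 7}(\mathrm{concat}[\omega\,\mathrm{AvgPool}(X), (1-\omega)\,\mathrm{MaxPool}(X)]))$, where $X$ is the input feature map, $\mathrm{AvgPool}(X)$ is average pooling, $\mathrm{MaxPool}(X)$ is max pooling, and $\omega$ is the pooling weight parameter.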

8. The two-stage small-sample target detection method based on an optimized CBAM attention mechanism according to claim 1, characterized in that, The number of samples in each category of the new class small sample dataset with limited annotation information is specified by the task.

9. A two-stage small-sample target detection method based on an optimized CBAM attention mechanism according to claim 8, characterized in that, The number of samples in each category is set to 1, 2, 3, 5, and 10, respectively.