Forestry pest detection method and system based on multi-scale feature enhancement and fusion
By introducing a forestry pest detection method that combines multi-scale feature enhancement and fusion, and by optimizing ResNet50 using an attention mechanism and a bottom-up feature fusion network, the problem of low accuracy in small target detection in existing technologies is solved, and efficient and accurate forestry pest detection is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTH CHINA AGRICULTURAL UNIVERSITY
- Filing Date
- 2023-11-23
- Publication Date
- 2026-06-23
AI Technical Summary
Existing forestry pest detection technologies fail to effectively consider the characteristics of pest detection scenarios when deploying lightweight models, resulting in low detection accuracy for small targets. Furthermore, existing methods fail to effectively combine pest feature capture with model structure improvement to enhance detection performance.
A forestry pest detection method based on multi-scale feature enhancement and fusion is adopted. By introducing a multi-scale feature extraction network with attention mechanism, a bottom-up feature fusion network and a detection head, and combining ResNet50 network pruning and SimAM attention mechanism, the loss function is optimized to improve the detection accuracy of small targets.
The resulting forest pest detection model is lightweight, performs well on small targets, and has high overall detection accuracy, improving both feature extraction capability and detection accuracy.
Figure CN117671655B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of computer vision and target detection technology, and more specifically, to a method and system for detecting forest pests based on multi-scale feature enhancement and fusion.
Background Technology
[0002] With the rapid development of ecological construction, the area of planted forests across the country has been increasing year by year. However, trees are often damaged by harmful organisms during their growth, and the limited ability of humans to identify pests makes it difficult to determine their types. With the rapid development of computer information technology, automatic pest image recognition using computers has become a reality. Compared with manual methods, this technology is faster, more accurate, and more objective. In recent years, many researchers have applied deep learning to pest image recognition. Deep learning can automatically extract features from images, avoiding the subjectivity of manual feature extraction, and enabling accurate target detection and classification of forestry pests.
[0003] Current intelligent pest identification technologies primarily employ deep target detection models to detect pest image data. By detecting the number of different class instances within the pest image, they obtain the pest population distribution, thereby achieving pest identification. These solutions mainly consider intelligent pest identification for pest monitoring. Most solutions focus on establishing a pest detection system, with core work in data processing, lightweighting and simplifying the model, and then applying the model to intelligent forestry pest control.
[0004] Existing patent literature discloses a YOLOv5s-based algorithm for identifying 15 forestry pests. This algorithm constructs a dataset containing 15 forestry pests, uses the original YOLOv5s model for detection on this dataset, and filters the detection results using a weighted non-maximum suppression (weighted NMS) algorithm to obtain the final pest detection results, achieving the detection of multiple forestry pests. Additionally, Tang et al. (paper title: Pest-YOLO: Deep Image Mining and Multi-Feature Fusion for Real-Time Agriculture Pest Detection; conference name: IEEE International Conference on Data Mining) designed an improved model, Pest-YOLO, based on the YOLOv4 model. The model extracts multi-scale features of pests using a ResNet50 feature extraction network with SE attention enhancement. A cross-stage multi-scale feature fusion network then further fuses these features. The fused features are processed by a YOLOv4 detection head to obtain classification and localization information, which is filtered by non-maximum suppression (NMS) to obtain the final detection result of Pest-YOLO. These methods focus on applying target detection technology to forestry pests and diseases, primarily emphasizing model lightweighting or simple improvements in feature extraction capability. They rarely take the pest detection scenario and the characteristics of the model structure into account while simultaneously meeting the requirements of lightweighting and high accuracy. To achieve efficient and accurate pest identification, improvements should consider the small size of pests and the concentration of their features in shallow layers. Simple networks are insufficient to represent the information contained in pest images. Only by pruning the layers that contain fewer pest features can the parameter count be reduced while high detection accuracy and the feature extraction capability for small target pests are maintained. Currently, there is relatively little research combining these characteristics; most studies address only some of them.
[0005] Existing research often focuses on lightweighting models to address the issue of deploying pest detection models. However, the proposed methods do not adequately consider the differences between pest detection and general detection problems (such as vehicle detection and human detection). These methods do not solve the problem of difficulty in capturing pest features; instead, the use of lightweight networks results in suboptimal model detection performance. In other words, they fail to combine the feature extraction network structure with the characteristics of forestry pest detection to improve feature extraction capabilities. Furthermore, some detection methods suffer from low accuracy in detecting small targets, which limits the overall recognition accuracy of most algorithms.
Summary of the Invention
[0006] To overcome the shortcomings of existing technologies, such as difficulty in capturing pest features, lack of consideration for pest detection scenarios due to model lightweighting, and poor accuracy in detecting small and medium-sized targets while favoring large targets, this invention provides a forest pest detection method and system based on multi-scale feature enhancement and fusion. It also provides an efficient forest pest detection model, EFPDet (Efficient Forest Pest Detection), which features high lightweighting, better performance for small target detection, and high overall detection accuracy.
[0007] To solve the above-mentioned technical problems, the technical solution of the present invention is as follows:
[0008] A method for detecting forest pests based on multi-scale feature enhancement and fusion includes the following steps:
[0009] S1: Obtain and preprocess the forestry pest dataset;
[0010] The forestry pest dataset includes several images of forestry pests labeled with pest information.
[0011] S2: Establish a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion. The forest pest detection model EFPDet includes: a multi-scale feature extraction network with an attention mechanism, a feature fusion network that performs feature fusion from the bottom up, and a detection head for classifying and locating forest pests.
[0012] The multi-scale feature extraction network, the feature fusion network, and the detection head are connected in sequence;
[0013] S3: Set the total loss function for the forestry pest detection model EFPDet, which includes a classification loss function and a localization loss function;
[0014] The preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training to obtain the trained forest pest detection model EFPDet.
[0015] S4: Obtain the images of forest pests to be detected and input them into the trained forest pest detection model EFPDet to obtain the classification and localization results of all pests in the images of forest pests to be detected.
[0016] The final detection result is obtained by performing non-maximum suppression (NMS) on the classification and localization results of all forest pests in the images to be detected, thus completing the forest pest detection.
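The NMS step in [0016] can be illustrated with a minimal greedy, per-class sketch in plain Python. The box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, and the threshold of 0.5 are assumptions made for illustration; the text does not state which NMS variant or threshold EFPDet uses.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy per-class NMS over (box, score, class_id) tuples.

    iou_threshold is an assumed value, not one taken from the patent text.
    """
    kept = []
    # Visit detections from the highest score to the lowest.
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, cls = det
        # Keep a box unless it overlaps a kept box of the same class too much.
        if all(cls != k[2] or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


detections = [
    ((10, 10, 50, 50), 0.90, 0),
    ((12, 12, 52, 52), 0.75, 0),    # heavily overlaps the first box, same class
    ((200, 80, 240, 130), 0.60, 1),
]
final = nms(detections)
```

Here the second detection is suppressed because it overlaps a higher-scoring box of the same class, while the detection of the other class is kept.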
[0017] Preferably, the specific method for preprocessing in step S1 is data augmentation processing;
[0018] The data augmentation processes include: horizontal flipping, vertical flipping, random rotation, random cropping, deformation scaling, and adding random noise.
[0019] Preferably, in step S2, the structure of the multi-scale feature extraction network that introduces the attention mechanism is as follows:
[0020] The multi-scale feature extraction network is obtained by pruning a ResNet50 neural network: only the first three layers of the ResNet50 neural network are retained, and an attention mechanism is introduced into each of these layers.
[0021] The structure of the multi-scale feature extraction network includes, in sequence: a 7×7 convolutional layer, a first batch normalization layer, a first ReLU activation function layer, a max pooling layer, a first feature extraction module, a second feature extraction module, and a third feature extraction module;
[0022] The first feature extraction module, the second feature extraction module, and the third feature extraction module are used to extract feature information at three different scales from the forestry pest image, which are respectively denoted as feature information 1, feature information 2, and feature information 3; the feature information at the three different scales is saved together as the output of the multi-scale feature extraction network;
[0023] The first feature extraction module includes two neck sub-modules and one neck sub-module with an attention mechanism connected in sequence; the second feature extraction module includes three neck sub-modules and one neck sub-module with an attention mechanism connected in sequence; the third feature extraction module includes five neck sub-modules and one neck sub-module with an attention mechanism connected in sequence.
[0024] Each of the neck sub-modules has the same structure, comprising the following sequentially connected layers: a 1×1 convolutional layer, a second batch normalization layer, a 3×3 convolutional layer, a third batch normalization layer, a 1×1 convolutional layer, a fourth batch normalization layer, and a second ReLU activation function layer; the input of the neck sub-module is also connected to the output of the fourth batch normalization layer to form a residual summation connection.
[0025] Each of the neck sub-modules that introduces the attention mechanism has the same structure, including the following sequentially connected layers: a 1×1 convolutional layer, a fifth batch normalization layer, a 3×3 convolutional layer, a sixth batch normalization layer, an attention layer, a 1×1 convolutional layer, a seventh batch normalization layer, and a third ReLU activation function layer; the input of the neck sub-module that introduces the attention mechanism is also connected to the output of the seventh batch normalization layer to form a residual summation connection.
[0026] Preferably, the attention mechanism in the multi-scale feature extraction network is a SimAM parameterless attention mechanism; in each neck sub-module that introduces the attention mechanism, the attention layer is a SimAM parameterless attention layer; the feature extraction network of the present invention adopts the ResNet50 network. Considering that forestry pests are small in size and the feature scale of the fourth layer of ResNet is small and cannot well preserve the feature information of pests, and that it has a large number of channels and parameters, the present invention prunes it, using only the first three layers and introducing parameterless attention SimAM, which greatly reduces the number of parameters while enhancing the feature extraction capability of the network.
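The claim that the fourth layer carries a large share of the parameters can be checked with a rough count. The sketch below counts convolution weights only (batch normalization parameters are ignored, and SimAM adds none), and assumes the standard ResNet50 configuration: 3, 4, 6, and 3 bottleneck blocks per layer, with a 1×1 projection shortcut in the first block of each layer. The helper names are hypothetical.

```python
def bottleneck_conv_params(c_in, c_mid, c_out, project):
    """Conv weights of one bottleneck: 1x1 -> 3x3 -> 1x1 (+ optional 1x1 shortcut)."""
    params = c_in * c_mid + 9 * c_mid * c_mid + c_mid * c_out
    if project:
        params += c_in * c_out  # projection on the residual branch
    return params


def stage_conv_params(c_in, c_mid, num_blocks):
    """Conv weights of one ResNet50 layer (a stack of bottleneck blocks)."""
    c_out = 4 * c_mid
    total = bottleneck_conv_params(c_in, c_mid, c_out, project=True)
    total += (num_blocks - 1) * bottleneck_conv_params(c_out, c_mid, c_out, project=False)
    return total


# (input channels, bottleneck width, number of blocks) for layers 1 to 4.
STAGES = [(64, 64, 3), (256, 128, 4), (512, 256, 6), (1024, 512, 3)]
counts = [stage_conv_params(*stage) for stage in STAGES]
kept, pruned = sum(counts[:3]), counts[3]
print(f"layers 1-3: {kept:,} conv weights; layer 4: {pruned:,} conv weights")
```

Under these assumptions, the fourth layer alone holds more convolution weights than the first three layers combined, which is consistent with the motivation for pruning it.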
[0027] Preferably, in step S2, the structure of the feature fusion network that performs bottom-up feature fusion is as follows:
[0028] The feature fusion network includes, in sequence, a first MSFF multi-scale feature fusion module and a second MSFF multi-scale feature fusion module;
[0029] The first MSFF multi-scale feature fusion module is used to initially fuse the feature information 1 extracted by the first feature extraction module and the feature information 2 extracted by the second feature extraction module;
[0030] The second MSFF multi-scale feature fusion module is used to perform a secondary fusion of the feature information 3 extracted by the third feature extraction module with the pre-fused feature information 1 and feature information 2;
[0031] The feature information 3 after secondary fusion is subjected to 2x convolution downsampling to obtain feature information 4;
[0032] Feature information 1 after secondary fusion is discarded, and feature information 4 is saved together with feature information 2 and feature information 3 after secondary fusion as the output of the feature fusion network. The present invention adopts a more efficient bottom-up feature fusion network, which strengthens feature extraction for small target pests by making maximum use of shallow features.
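To make the scales in [0028] to [0032] concrete, the following sketch traces the spatial size of each feature map. It assumes the standard ResNet50 strides of 4, 8, and 16 for the first three layers, a hypothetical 640×640 input, and that each MSFF module preserves the spatial size of the scale it outputs; the internals of MSFF are not modeled.

```python
def trace_scales(input_size=640):
    """Return feature-map sizes before and after the fusion network.

    input_size is an assumed example value, not one given in the text.
    """
    strides = {"feature 1": 4, "feature 2": 8, "feature 3": 16}
    backbone = {name: input_size // s for name, s in strides.items()}
    fused = {
        # Feature 1 takes part in fusion but is dropped from the output.
        "feature 2": backbone["feature 2"],
        "feature 3": backbone["feature 3"],
        # 2x convolutional downsampling of fused feature 3 gives feature 4.
        "feature 4": backbone["feature 3"] // 2,
    }
    return backbone, fused


backbone, fused = trace_scales()
print(backbone)
print(fused)
```

With these assumptions the detection head receives three scales (strides 8, 16, and 32), even though the backbone itself stops at stride 16.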
[0033] Preferably, in step S2, the structure of the detection head used for classifying and locating forest pests is as follows:
[0034] The detection head includes: a feature convolution concatenation module, a global average pooling layer, a classification task decoupling module, and a localization task decoupling module;
[0035] The output of the feature convolution concatenation module is connected to the input of the global average pooling layer; the classification task decoupling module and the localization task decoupling module are set up in parallel, and the output of the global average pooling layer is connected to the input of the classification task decoupling module and the localization task decoupling module, respectively; the output of the feature convolution concatenation module is also connected to the input of the classification task decoupling module and the localization task decoupling module, respectively.
[0036] The feature convolution concatenation module includes several convolution blocks with the same structure and connected in sequence. The structure of each convolution block includes the following connected in sequence: a 3×3 convolutional layer, an eighth batch normalization layer, and a fourth ReLU activation function layer.
[0037] The classification result is obtained by averaging the probabilities of the concatenated features output by the feature convolution concatenation module and the probabilities of the classification decoupled features output by the classification task decoupling module.
[0038] The localization decoupling features output by the localization task decoupling module are subjected to a 3×3 convolution operation. The feature offset output by the feature convolution concatenation module is used as the offset parameter of the deformable convolution. The localization decoupling features after 3×3 convolution are used to predict the localization result. In the detection head part, the present invention adopts a task alignment structure to make the model maintain feature consistency as much as possible in the two tasks of classification and localization.
[0039] Preferably, in step S3, the total loss function of the forest pest detection model EFPDet is specifically as follows:
[0040] The classification loss function adopted is the softmaxFL function, specifically:
[0041]
[0042]
[0043] Where $\mathrm{Loss}_{cls}$ is the classification loss function value; $nc$ is the number of categories plus 1; $p$ is the predicted probability of the category; $y \in \{0,1\}^{nc}$ is the classification label; $\alpha$ and $\beta$ are the first and second hyperparameters, respectively; $N$ is the total number of samples; $N_{fg}$ is the number of foreground samples; $N_{bg}$ is the number of background samples; $p_{fg}$ is the predicted probability for the foreground; and $p_{bg}$ is the predicted probability for the background;
[0044] The localization loss function adopts the CIoU function, specifically:
[0045]
[0046]
[0047]
[0048] Where $\mathrm{Loss}_{ciou}$ is the localization loss function value; $b$ represents the predicted bounding box parameters; $b_{gt}$ represents the labeled bounding box parameters; IoU is the intersection over union of the predicted bounding box and the labeled bounding box; $\rho^{2}(b, b_{gt})$ is the square of the distance between the center of the predicted box and the center of the labeled box; $d$ is the diagonal length of the smallest box enclosing both the predicted box and the labeled box; $w$ and $h$ are the width and height of the predicted box; $w_{gt}$ and $h_{gt}$ are the width and height of the labeled box; and $u$ and $\upsilon$ are the first and second intermediate variables, respectively.
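For reference, the widely published CIoU formulation can be written with the symbols defined above. This is a sketch that assumes $u$ and $\upsilon$ correspond to the usual trade-off weight and aspect-ratio consistency term of CIoU; it is not copied from the formulas of this patent, which are not reproduced in the text.

```latex
\begin{aligned}
\mathrm{Loss}_{ciou}(b, b_{gt}) &= 1 - \mathrm{IoU} + \frac{\rho^{2}(b, b_{gt})}{d^{2}} + u\,\upsilon, \\
u &= \frac{\upsilon}{(1 - \mathrm{IoU}) + \upsilon}, \\
\upsilon &= \frac{4}{\pi^{2}} \left( \arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w}{h} \right)^{2}.
\end{aligned}
```

The second term penalizes center offset relative to the enclosing box, and the third term penalizes a mismatch in aspect ratio.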
[0049] The total loss function is specifically as follows:
[0050] $\mathrm{Loss} = 5\,\mathrm{Loss}_{cls} + \mathrm{Loss}_{ciou}(b, b_{gt})$
[0051] Wherein, Loss is the total loss function value; this invention also improves the loss function and adjusts the model loss weights to ensure that different classification models can improve detection accuracy in a balanced way.
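The weighting in [0050] can be sketched in plain Python as follows. The `ciou_loss` function implements the standard published CIoU definition for boxes given as `(x1, y1, x2, y2)` with positive width and height, which is assumed to match the localization loss described here; the classification loss value is passed in as a number because its exact formula is not reproduced in the text. Function names are hypothetical.

```python
import math


def ciou_loss(box, box_gt):
    """Standard CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = box_gt
    w, h = x2 - x1, y2 - y1
    w_gt, h_gt = gx2 - gx1, gy2 - gy1

    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + w_gt * h_gt - inter)

    # Squared distance between the two box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    d2 = (max(x2, gx2) - min(x1, gx1)) ** 2 + (max(y2, gy2) - min(y1, gy1)) ** 2

    # Aspect-ratio term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    u = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / d2 + u * v


def total_loss(loss_cls, loss_ciou, cls_weight=5.0):
    """Total loss: classification loss weighted by 5, plus the CIoU loss."""
    return cls_weight * loss_cls + loss_ciou


perfect = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
disjoint = ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))
```

A perfectly matching box gives a localization loss of zero, while disjoint boxes give a loss above 1 because the center-distance term is added to `1 - IoU`.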
[0052] Preferably, in step S3, when the preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training, the mean average precision (mAP) is used as the evaluation index of detection accuracy. The mAP is calculated as follows:
[0053] $\mathrm{mAP} = \frac{1}{c} \sum_{j=1}^{c} AP_j$
[0054] $AP_j = \int_{0}^{1} p(r)\,dr$
[0055] Where $c$ is the number of categories, $AP_j$ is the average precision for the j-th category, and $p(r)$ is the precision-recall (PR) curve function, which is determined by the preset intersection over union (IoU) threshold between the predicted bounding boxes and the labeled bounding boxes.
[0056] Preferably, the preset Intersection over Union (IoU) threshold for the predicted bounding box and the labeled bounding box is 0.5.
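As an illustration of the metric, the sketch below computes AP for one category as the area under the precision-recall curve and then averages over categories. It assumes each detection has already been marked as a true or false positive at the IoU threshold of 0.5, and it uses the common all-point interpolation of the PR curve; the text does not say which numerical integration scheme is used. Function names are hypothetical.

```python
def average_precision(is_tp, num_gt):
    """Area under the PR curve for one category.

    is_tp: true/false-positive flags of detections, sorted by descending score.
    num_gt: number of labeled boxes of this category.
    """
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for flag in is_tp:
        if flag:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing in recall (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate p(r) dr as a sum of rectangles.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)


ap_a = average_precision([True, False, True], num_gt=2)
ap_b = average_precision([True, True], num_gt=2)
m_ap = mean_average_precision([ap_a, ap_b])
```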
[0057] This invention also provides a forestry pest detection system based on multi-scale feature enhancement and fusion, which applies the above-mentioned forestry pest detection method based on multi-scale feature enhancement and fusion and includes:
[0058] Data acquisition unit: used to acquire forest pest datasets and perform preprocessing;
[0059] The forestry pest dataset includes several images of forestry pests labeled with pest information.
[0060] Model building unit: used to build a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion. The forest pest detection model EFPDet includes: a multi-scale feature extraction network with an attention mechanism, a feature fusion network that performs feature fusion from the bottom up, and a detection head for classifying and locating forest pests.
[0061] The multi-scale feature extraction network, the feature fusion network, and the detection head are connected in sequence;
[0062] Model training unit: used to set the total loss function of the forest pest detection model EFPDet, which includes a classification loss function and a localization loss function;
[0063] The preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training to obtain the trained forest pest detection model EFPDet.
[0064] Forest pest detection unit: used to acquire images of forest pests to be detected and input them into the trained forest pest detection model EFPDet to obtain the classification and localization results of all pests in the images of forest pests to be detected;
[0065] The final detection result is obtained by performing non-maximum suppression (NMS) on the classification and localization results of all forest pests in the images to be detected, thus completing the forest pest detection.
[0066] Compared with the prior art, the beneficial effects of the technical solution of the present invention are:
[0067] This invention provides a method and system for detecting forest pests based on multi-scale feature enhancement and fusion. First, a forest pest dataset is acquired and preprocessed. Then, a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion is established. The EFPDet model includes: a multi-scale feature extraction network incorporating an attention mechanism, a feature fusion network performing bottom-up feature fusion, and a detection head for classifying and locating forest pests. The multi-scale feature extraction network, feature fusion network, and detection head are sequentially connected. Next, the model is trained by setting a total loss function for the EFPDet model, which includes a classification loss function and a localization loss function. The preprocessed forest pest dataset is input into the EFPDet model for iterative training to obtain a trained EFPDet model. Finally, images of forest pests to be detected are acquired and input into the trained EFPDet model to obtain the classification and localization results of all pests in the images, and non-maximum suppression (NMS) is performed on these results to obtain the final detection results, thus completing the forest pest detection process.
[0068] Compared with the prior art, the present invention has the following advantages:
[0069] 1) The ResNet50 network was pruned based on the characteristics of pests and diseases, and the parameterless attention SimAM was introduced. By pruning the fourth layer of the network, which has the most parameters but the fewest pest features, a new feature extraction network ResNet50-SimAM was proposed, which enhanced the network's feature extraction ability. The model still has strong feature extraction ability while greatly reducing the number of parameters.
[0070] 2) A bottom-up feature fusion network is adopted. By prioritizing the fusion of shallow features and making maximum use of them, the feature extraction capability for small target pests is strengthened.
[0071] 3) An additional task alignment structure is introduced in the detection head. By aligning the features of the classification and localization tasks, the model can maintain feature consistency as much as possible in the two tasks, thereby improving the detection accuracy. In addition, considering that there are many background areas and far more negative samples than positive samples in pest detection, this invention also improves the loss according to the principle of sample balance, which, together with the improvement of the detection head, improves the pest detection accuracy.
Attached Figure Description
[0072] Figure 1 is a flowchart of the forest pest detection method based on multi-scale feature enhancement and fusion provided in Example 1.
[0073] Figure 2 is a structural diagram of the multi-scale feature extraction network provided in Example 2.
[0074] Figure 3 is a structural diagram of the feature fusion network provided in Example 2.
[0075] Figure 4 is a structural diagram of the detection head provided in Example 2.
[0076] Figure 5 is a structural diagram of the forestry pest detection model EFPDet provided in Example 2.
[0077] Figure 6 is a structural diagram of the forest pest detection system based on multi-scale feature enhancement and fusion provided in Example 3.
Detailed Implementation
[0078] The accompanying drawings are for illustrative purposes only and should not be construed as limiting the scope of this patent.
[0079] To better illustrate this embodiment, some parts in the accompanying drawings may be omitted, enlarged, or reduced, and do not represent the actual product dimensions.
[0080] It will be understood by those skilled in the art that certain well-known structures and their descriptions may be omitted in the accompanying drawings.
[0081] The technical solution of the present invention will be further described below with reference to the accompanying drawings and embodiments.
[0082] Example 1
[0083] As shown in Figure 1, this embodiment provides a forestry pest detection method based on multi-scale feature enhancement and fusion, including the following steps:
[0084] S1: Obtain and preprocess the forestry pest dataset;
[0085] The forestry pest dataset includes several images of forestry pests labeled with pest information.
[0086] S2: Establish a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion. The forest pest detection model EFPDet includes: a multi-scale feature extraction network with an attention mechanism, a feature fusion network that performs feature fusion from the bottom up, and a detection head for classifying and locating forest pests.
[0087] The multi-scale feature extraction network, the feature fusion network, and the detection head are connected in sequence;
[0088] S3: Set the total loss function for the forestry pest detection model EFPDet, which includes a classification loss function and a localization loss function;
[0089] The preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training to obtain the trained forest pest detection model EFPDet.
[0090] S4: Obtain the images of forest pests to be detected and input them into the trained forest pest detection model EFPDet to obtain the classification and localization results of all pests in the images of forest pests to be detected.
[0091] The final detection result is obtained by performing non-maximum suppression (NMS) on the classification and localization results of all forest pests in the images to be detected, thus completing the forest pest detection.
[0092] In the specific implementation process, a forest pest dataset is first acquired and preprocessed. Then, a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion is established. The EFPDet model includes: a multi-scale feature extraction network incorporating an attention mechanism, a feature fusion network performing bottom-up feature fusion, and a detection head for classifying and locating forest pests; the multi-scale feature extraction network, feature fusion network, and detection head are sequentially connected. Next, model training is performed: the total loss function of the EFPDet model is set, which includes a classification loss function and a localization loss function, and the preprocessed forest pest dataset is input into the EFPDet model for iterative training to obtain a trained EFPDet model. Finally, images of forest pests to be detected are acquired and input into the trained EFPDet model to obtain the classification and localization results of all pests in the images, and non-maximum suppression (NMS) is performed on these results to obtain the final detection results, thus completing the forest pest detection.
[0093] This method provides an efficient forest pest detection model, EFPDet (Efficient Forest Pest Detection), which is lightweight, more friendly to small target detection, and has high overall detection accuracy.
[0094] Example 2
[0095] This embodiment provides a forest pest detection method based on multi-scale feature enhancement and fusion, including the following steps:
[0096] S1: Obtain and preprocess the forestry pest dataset;
[0097] The forestry pest dataset includes several images of forestry pests labeled with pest information.
[0098] S2: Establish a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion. The forest pest detection model EFPDet includes: a multi-scale feature extraction network with an attention mechanism, a feature fusion network that performs feature fusion from the bottom up, and a detection head for classifying and locating forest pests.
[0099] The multi-scale feature extraction network, the feature fusion network, and the detection head are connected in sequence;
[0100] S3: Set the total loss function for the forestry pest detection model EFPDet, which includes a classification loss function and a localization loss function;
[0101] The preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training to obtain the trained forest pest detection model EFPDet.
[0102] S4: Obtain the images of forest pests to be detected and input them into the trained forest pest detection model EFPDet to obtain the classification and localization results of all pests in the images of forest pests to be detected.
[0103] The final detection result is obtained by performing non-maximum suppression (NMS) on the classification and localization results of all forest pests in the images to be detected, thus completing the forest pest detection.
[0104] The specific method for preprocessing in step S1 is data augmentation processing;
[0105] The data augmentation processes include: horizontal flipping, vertical flipping, random rotation, random cropping, deformation scaling, and adding random noise;
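For detection data, geometric augmentations must transform the labeled boxes together with the image. The sketch below shows this for the two flips listed above, using a plain list-of-rows image and boxes given as `(x1, y1, x2, y2)` in pixel-edge coordinates; the function names, the box convention, and the flip probability `p=0.5` are assumptions made for illustration.

```python
import random


def hflip(image, boxes):
    """Flip an image (list of pixel rows) left-right and mirror its boxes."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    new_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes


def vflip(image, boxes):
    """Flip an image top-bottom and mirror its boxes."""
    height = len(image)
    flipped = image[::-1]
    new_boxes = [(x1, height - y2, x2, height - y1) for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes


def augment(image, boxes, p=0.5, rng=random):
    """Apply each flip independently with probability p (assumed value)."""
    if rng.random() < p:
        image, boxes = hflip(image, boxes)
    if rng.random() < p:
        image, boxes = vflip(image, boxes)
    return image, boxes


image = [[1, 2, 3],
         [4, 5, 6]]
boxes = [(0, 0, 1, 2)]  # a box covering the left column
h_img, h_boxes = hflip(image, boxes)
v_img, v_boxes = vflip(image, boxes)
```

Rotation, cropping, and scaling follow the same pattern: each needs a matching coordinate transform for the boxes, and cropping additionally needs a rule for boxes that fall partly outside the crop.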
[0106] In step S2, the structure of the multi-scale feature extraction network that introduces the attention mechanism is as follows:
[0107] The multi-scale feature extraction network is obtained by pruning a ResNet50 neural network: only the first three layers of the ResNet50 neural network are retained, and an attention mechanism is introduced into each of these layers.
[0108] The structure of the multi-scale feature extraction network includes, in sequence: a 7×7 convolutional layer, a first batch normalization layer, a first ReLU activation function layer, a max pooling layer, a first feature extraction module, a second feature extraction module, and a third feature extraction module;
[0109] The first feature extraction module, the second feature extraction module, and the third feature extraction module are used to extract feature information at three different scales from the forestry pest image, which are respectively denoted as feature information 1, feature information 2, and feature information 3; the feature information at the three different scales is saved together as the output of the multi-scale feature extraction network;
[0110] The first feature extraction module includes two neck sub-modules and one neck sub-module with an attention mechanism connected in sequence; the second feature extraction module includes three neck sub-modules and one neck sub-module with an attention mechanism connected in sequence; the third feature extraction module includes five neck sub-modules and one neck sub-module with an attention mechanism connected in sequence.
[0111] Each of the neck sub-modules has the same structure, comprising the following sequentially connected layers: a 1×1 convolutional layer, a second batch normalization layer, a 3×3 convolutional layer, a third batch normalization layer, a 1×1 convolutional layer, a fourth batch normalization layer, and a second ReLU activation function layer; the input of the neck sub-module is also connected to the output of the fourth batch normalization layer to form a residual summation connection.
[0112] Each of the neck sub-modules that introduces the attention mechanism has the same structure, and each includes the following sequentially connected layers: a 1×1 convolutional layer, a fifth batch normalization layer, a 3×3 convolutional layer, a sixth batch normalization layer, an attention layer, a 1×1 convolutional layer, a seventh batch normalization layer, and a third ReLU activation function layer; the input of the neck sub-module that introduces the attention mechanism is also connected to the output of the seventh batch normalization layer to form a residual summation connection;
[0113] The attention mechanism in the multi-scale feature extraction network is specifically the SimAM parameterless attention mechanism; in each neck sub-module that introduces the attention mechanism, the attention layer is a SimAM parameterless attention layer; the feature extraction network in this embodiment adopts the ResNet50 network. Forestry pests are small, and the fourth layer of ResNet produces small-scale feature maps that cannot preserve pest feature information well while carrying a large number of channels and parameters. This invention therefore prunes that layer, uses only the first three layers, and introduces the parameterless SimAM attention, which greatly reduces the number of parameters while enhancing the feature extraction capability of the network;
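The layout of the pruned backbone described above can be summarized in a short sketch. The block counts per module come from the text; the cumulative strides (4, 8, 16) and channel widths (256, 512, 1024) are assumptions taken from the standard ResNet50 design, and the names `Stage`, `STAGES`, and `output_shapes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    plain_blocks: int      # neck sub-modules without attention (from the text)
    attention_blocks: int  # neck sub-modules with a SimAM layer (from the text)
    stride: int            # cumulative downsampling vs. the input (assumed)
    channels: int          # output channels (assumed, standard ResNet50)


# Only the first three ResNet50 layers are kept; the fourth is pruned.
STAGES = [
    Stage("feature information 1", 2, 1, 4, 256),
    Stage("feature information 2", 3, 1, 8, 512),
    Stage("feature information 3", 5, 1, 16, 1024),
]


def output_shapes(height, width):
    """Return (channels, H, W) per stage; assumes sizes divisible by 16."""
    return [(s.channels, height // s.stride, width // s.stride) for s in STAGES]


if __name__ == "__main__":
    for stage, shape in zip(STAGES, output_shapes(512, 512)):
        total = stage.plain_blocks + stage.attention_blocks
        print(f"{stage.name}: {total} neck sub-modules -> {shape}")
```

Under these assumptions each module ends with exactly one attention-equipped neck sub-module, and the three outputs halve in height and width from one level to the next.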
[0114] In step S2, the structure of the feature fusion network that performs bottom-up feature fusion is as follows:
[0115] The feature fusion network includes, in sequence, a first MSFF multi-scale feature fusion module and a second MSFF multi-scale feature fusion module;
[0116] The first MSFF multi-scale feature fusion module is used to initially fuse the feature information 1 extracted by the first feature extraction module and the feature information 2 extracted by the second feature extraction module;
[0117] The second MSFF multi-scale feature fusion module is used to perform a secondary fusion of the feature information 3 extracted by the third feature extraction module with the pre-fused feature information 1 and feature information 2;
[0118] The feature information 3 after secondary fusion is subjected to 2x convolution downsampling to obtain feature information 4;
[0119] Feature information 1 after secondary fusion is discarded, and feature information 4 is saved together with feature information 2 and feature information 3 after secondary fusion as the output of the feature fusion network. This embodiment adopts a more efficient bottom-up feature fusion network, which strengthens feature extraction for small target pests by making maximum use of shallow features from the bottom up.
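The data flow of the fusion network can be traced with a shape-only sketch. The internals of the MSFF module are not specified at this point, so it is treated here as an opaque step that keeps each level's spatial size. The common output width of 256 channels, the example input shapes, and the function name `fuse_bottom_up` are assumptions for illustration only.

```python
def fuse_bottom_up(shapes, out_channels=256):
    """Trace tensor shapes through the bottom-up fusion network.

    shapes: [(C, H, W)] for feature information 1, 2, 3 (largest map first).
    Returns the shapes of the three outputs passed to the detection head.
    """
    f1, f2, f3 = shapes
    # First MSFF (levels 1 and 2) and second MSFF (adds level 3): modeled as
    # opaque steps that keep spatial size and emit out_channels (assumption).
    fused1, fused2, fused3 = [(out_channels, h, w) for (_, h, w) in (f1, f2, f3)]
    # Feature information 1 is removed after the secondary fusion.
    del fused1
    # 2x convolution downsampling of fused level 3 gives feature information 4
    # (even height and width assumed).
    f4 = (out_channels, fused3[1] // 2, fused3[2] // 2)
    return [fused2, fused3, f4]


if __name__ == "__main__":
    print(fuse_bottom_up([(256, 128, 128), (512, 64, 64), (1024, 32, 32)]))
```

The sketch only shows which levels survive: the largest map contributes shallow detail during fusion but is not itself passed on.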
[0120] In step S2, the structure of the detection head used to classify and locate forest pests is as follows:
[0121] The detection head includes: a feature convolution concatenation module, a global average pooling layer, a classification task decoupling module, and a localization task decoupling module;
[0122] The output of the feature convolution concatenation module is connected to the input of the global average pooling layer; the classification task decoupling module and the localization task decoupling module are set up in parallel, and the output of the global average pooling layer is connected to the input of the classification task decoupling module and the localization task decoupling module, respectively; the output of the feature convolution concatenation module is also connected to the input of the classification task decoupling module and the localization task decoupling module, respectively.
[0123] The feature convolution concatenation module includes several convolution blocks with the same structure and connected in sequence. The structure of each convolution block includes the following connected in sequence: a 3×3 convolutional layer, an eighth batch normalization layer, and a fourth ReLU activation function layer.
[0124] The classification result is obtained by averaging the classification probability generated from the concatenated features output by the feature convolution concatenation module and the classification probability generated from the classification decoupled features output by the classification task decoupling module.
[0125] The localization decoupling features output by the localization task decoupling module are subjected to a 3×3 convolution operation. The feature offset output by the feature convolution concatenation module is used as the offset parameter of the deformable convolution. Localization prediction is then performed on the localization decoupling features after the 3×3 convolution to obtain the localization result. In this embodiment, a task alignment structure is adopted in the detection head part to keep the features as consistent as possible in the two tasks of classification and localization.
[0126] In step S3, the total loss function of the forest pest detection model EFPDet is specifically as follows:
[0127] The classification loss function adopted is the softmaxFL function, specifically:
[0128]
[0129]
[0130] Where $Loss_{cls}$ is the classification loss function value; $nc$ is the number of categories plus 1; $p$ is the predicted probability of the category; $y \in \{0,1\}^{nc}$ is the classification label; $\alpha$ and $\beta$ are the first and second hyperparameters, respectively; $N$ is the total number of samples; $N_{fg}$ is the number of foreground samples; $N_{bg}$ is the number of background samples; $p_{fg}$ is the predicted probability of the foreground; and $p_{bg}$ is the predicted probability of the background;
[0131] The localization loss function adopts the CIoU function, specifically:
[0132]
[0133]
[0134]
[0135] Where $Loss_{ciou}$ is the localization loss function value; $b$ represents the parameters of the predicted bounding box and $b^{gt}$ the parameters of the annotated bounding box; IoU is the intersection-over-union of the predicted bounding box and the annotated bounding box; $\rho^{2}(b, b^{gt})$ is the squared distance between the center of the predicted box and the center of the annotated box; $d$ is the diagonal length of the smallest box enclosing both the predicted box and the annotated box; $w$ and $h$ are the width and height of the predicted box; $w^{gt}$ and $h^{gt}$ are the width and height of the annotated box; and $u$ and $v$ are the first and second intermediate variables, respectively.
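For orientation, the commonly published form of the CIoU loss can be written with the symbols defined above. This is a sketch under an assumption: that $u$ plays the role of the trade-off weight and $v$ the aspect-ratio consistency term of the standard formulation. It is not a reproduction of the patent's own equations.

```latex
\begin{aligned}
Loss_{ciou}(b, b^{gt}) &= 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{d^{2}} + u\,v, \\
u &= \frac{v}{(1 - IoU) + v}, \\
v &= \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}.
\end{aligned}
```

In this form the second term penalizes center distance relative to the enclosing box, and the third term penalizes aspect-ratio mismatch only when the boxes already overlap substantially.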
[0136] The total loss function is specifically as follows:
[0137] $Loss = 5\,Loss_{cls} + Loss_{ciou}(b, b^{gt})$
[0138] Where Loss is the total loss function value. This embodiment also improves the loss function and adjusts the loss weights of the model so that detection accuracy improves in a balanced way across the different categories;
[0139] In step S3, when the preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training, the mean average precision (mAP) is used as the evaluation index of detection accuracy. The mAP is calculated as follows:
[0140] $mAP = \frac{1}{c}\sum_{j=1}^{c} AP_j$
[0141] $AP_j = \int_0^1 p(r)\,dr$
[0142] Where $c$ is the number of categories, $AP_j$ is the average precision of the j-th category, and $p(r)$ is the precision-recall (PR) curve function, which is determined by the preset intersection-over-union (IoU) threshold between the predicted bounding box and the annotated bounding box. In this embodiment, the threshold is set to 0.5.
[0143] In the specific implementation process, firstly, a forestry pest dataset is constructed and data augmentation preprocessing is performed. The dataset contains pest image data and corresponding pest annotations in the images. The annotation format is consistent with Pascal VOC. Then, based on this, the corresponding training dataset, validation dataset and test dataset are divided, as shown in Table 1. The average relative size refers to the average percentage of the area of all instance boxes in the target class to the area of the corresponding image.
[0144] Table 1 shows the statistical data from the forestry pest dataset used.
[0145]
[0146] Subsequently, a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion was established, including: a multi-scale feature extraction network with a parameter-free attention mechanism, a feature fusion network that performs feature fusion from the bottom up, and a detection head for classifying and locating forest pests; the multi-scale feature extraction network, the feature fusion network, and the detection head are connected in sequence.
[0147] As shown in Figure 2, the improved lightweight feature extraction network ResNet50-SimAM is based on ResNet50: it uses only the first three layers of the network and inserts a parameterless SimAM attention layer in the last module of each layer. The SimAM attention can be expressed by the following formula:
[0148]
[0149] Where $w_t$, $b_t$, $y$, and $x_i$ are the target neuron weight, the target neuron bias, the output, and the inputs of the other neurons, respectively, and $M$ is the number of energy functions. Minimizing this energy function finally yields:
[0150]
[0151] Here, the two statistics in the expression are the variance and the mean of the input features over the horizontal and vertical dimensions, respectively. The reciprocal of the minimized energy corresponds to the importance of the neuron and is used as the weight of the input features, so the enhanced feature output is generated by weighting the input with the importance of the corresponding neurons. After attention is added, the overall structure of ResNet50-SimAM is as shown in Figure 2: each layer of the network is composed of stacked neck modules, and the SimAM attention is added after the normalization that follows the 3×3 convolution in the middle of the neck module. Other attention mechanisms could also be inserted at these locations; this embodiment uses SimAM as the final choice, and the ResNet50-SimAM feature extraction network formed according to this structure is denoted as...
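The weighting step described above can be sketched in a few lines of NumPy. This is a minimal sketch that assumes the closed-form solution of the published SimAM formulation, with the per-channel mean and variance taken over the spatial dimensions and a sigmoid applied to the inverse energy. The regularization value `e_lambda=1e-4` and the function name `simam` are assumptions, and NumPy is used instead of a deep learning framework only to keep the example small.

```python
import numpy as np


def simam(x, e_lambda=1e-4):
    """Apply SimAM-style parameter-free attention to a (B, C, H, W) array."""
    n = x.shape[2] * x.shape[3] - 1
    # Squared deviation of every neuron from its channel's spatial mean.
    d = (x - x.mean(axis=(2, 3), keepdims=True)) ** 2
    # Per-channel variance estimate over the spatial dimensions.
    var = d.sum(axis=(2, 3), keepdims=True) / n
    # Inverse of the minimal energy: larger means a more distinctive neuron.
    importance = d / (4.0 * (var + e_lambda)) + 0.5
    # A sigmoid maps the importance into (0, 1) before reweighting the input.
    return x * (1.0 / (1.0 + np.exp(-importance)))


if __name__ == "__main__":
    features = np.random.default_rng(0).normal(size=(2, 8, 16, 16))
    print(simam(features).shape)
```

Because the weights are computed from the feature statistics alone, the layer adds no learnable parameters, which is consistent with the lightweight goal stated for the backbone.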
[0152] The structure of the bottom-up feature fusion network is shown in Figure 3. The input image is processed by... to yield multi-scale feature outputs. Starting from feature layer 1, the height and width of each successive feature level are halved, so feature layers with a larger scale contain more shallow features. Feature fusion proceeds from bottom to top. First, feature layer 1 and feature layer 2 are fused by the feature fusion module MSFF. Then, on this basis, the MSFF module is used again to fuse the three feature layers, producing new feature layers 1, 2, and 3 after multi-scale feature fusion. Feature layer 1 is discarded, while feature layer 3 undergoes convolutional downsampling to obtain feature layer 4, whose height and width are halved. Finally, feature layers 2, 3, and 4 are passed to the detection head as the output of the feature fusion network. The bottom-up feature fusion network is denoted as $f_\theta$;
[0153] The structure of the detection head, denoted as $f_\omega$, is shown in Figure 4. This part uses the multi-level features of the network to classify and locate pests. The multi-layer features output by the feature fusion network $f_\theta$ are passed through multiple CPR convolutions and concatenated to obtain concatenated features. After global pooling, the concatenated features pass through two task decoupling blocks to produce the decoupling features for classification and localization. The detection head treats each pixel in each feature layer as a sample point for the classification and regression tasks, which are executed as follows: 1) For the classification task, both the concatenated features and the decoupling features generate classification probabilities after convolution, and the predicted classification probability of the sample point is the average of the two. 2) For the localization task, the concatenated features generate feature offsets after convolution, which are used as the offset parameters of a deformable convolution. The deformable convolution predicts the four offsets (top, right, bottom, left) from the sample point to the four edges of the prediction box, i.e., (t, r, d, l) in Figure 4. By predicting all sample points, the detection head finally outputs the classification and localization results of all sample points.
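The two per-sample-point outputs described above can be illustrated with a small sketch. It assumes image coordinates with y increasing downward and offsets expressed in the same units as the sample point's coordinates; the function names `decode_box` and `average_probs` are hypothetical.

```python
def decode_box(x, y, t, r, d, l):
    """Convert edge distances (top, right, bottom, left) measured from a
    sample point (x, y) into box corners (x1, y1, x2, y2)."""
    return (x - l, y - t, x + r, y + d)


def average_probs(p_concat, p_decoupled):
    """Average the class probabilities predicted from the concatenated
    features and from the classification decoupling features."""
    return [(a + b) / 2.0 for a, b in zip(p_concat, p_decoupled)]


if __name__ == "__main__":
    print(decode_box(100, 80, t=10, r=20, d=30, l=40))
    print(average_probs([0.25, 0.5], [0.75, 0.25]))
```

Predicting edge distances rather than anchor adjustments lets every feature-map pixel act as its own sample point.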
[0154] In the detection head, the task decoupling module accepts two inputs: one is the input of the branch itself, $f_{feat}$ (i.e., the concatenated features), and the other is the branch after average pooling, $f_{gap}$. After the fully connected layer, the $f_{gap}$ branch is reshaped into a tensor of shape $(b, 1, c_{stack}, 1)$, where $b$ is the batch size, and its Hadamard product with learnable parameters of shape $(1, c_{out}, c_{stack}, c_{out})$ gives Weight, where $c_{in} = c_{out} \times c_{stack}$. The $f_{feat}$ branch is directly reshaped into a tensor of shape $(b, c_{in}, h, w)$ and multiplied with Weight to obtain the target output features. After reshaping, the final output is a feature of size $(b, c_{out}, h, w)$, i.e., the decoupling feature;
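The shape bookkeeping of the task decoupling module can be checked with a NumPy sketch. It rests on assumptions: the pooled branch is reshaped to (b, 1, c_stack, 1), the learnable parameters have shape (1, c_out, c_stack, c_out), the resulting Weight is flattened to (b, c_out, c_in), and the spatial dimensions of f_feat are flattened before the multiplication. The function name `task_decouple` and the argument names are hypothetical, and the fully connected layers that produce `attn` are omitted.

```python
import numpy as np


def task_decouple(f_feat, attn, weight_param):
    """Sketch of the task decoupling step.

    f_feat:       (b, c_in, h, w) concatenated features, c_in = c_out * c_stack
    attn:         (b, c_stack) output of the fully connected layer on f_gap
    weight_param: (c_out, c_stack, c_out) learnable parameters
    """
    b, c_in, h, w = f_feat.shape
    c_out, c_stack, _ = weight_param.shape
    assert c_in == c_out * c_stack
    # Hadamard product with broadcasting:
    # (b, 1, c_stack, 1) * (1, c_out, c_stack, c_out) -> (b, c_out, c_stack, c_out)
    weight = attn.reshape(b, 1, c_stack, 1) * weight_param.reshape(1, c_out, c_stack, c_out)
    weight = weight.reshape(b, c_out, c_in)
    # Batched matrix product over flattened spatial positions.
    out = weight @ f_feat.reshape(b, c_in, h * w)
    return out.reshape(b, c_out, h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(2, 12, 5, 6))
    out = task_decouple(feat, rng.random((2, 3)), rng.normal(size=(4, 3, 4)))
    print(out.shape)
```

With `attn` fixed to all ones the operation reduces to an ordinary 1×1 convolution, which shows that the pooled branch only rescales the contribution of each stacked convolution per sample.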
[0155] Finally, the forestry pest detection model EFPDet established in this embodiment is shown in Figure 5. The main network structure consists of three parts: the feature extraction network, $f_\theta$, and $f_\omega$. During the training phase, hyperparameters such as the image input batch size N, the learning rate, and the optimizer are set for the network, along with the network's loss function. During the testing phase, the main network directly uses the parameters obtained during training; these parameters are frozen and no longer trained.
[0156] Next, model training is performed: 1) N pest images and the corresponding annotation information are randomly selected from the training dataset, and the images are used as input for data augmentation processing; 2) The images are sequentially processed by the three parts ..., $f_\theta$, and $f_\omega$ to yield preliminary prediction results; 3) The prediction results are compared with the annotation information to calculate the total loss function Loss, and the model parameters are updated according to the backpropagated gradient of the loss; 4) The above process is repeated until the Loss no longer decreases, which completes the training phase.
[0157] During the model testing phase: 1) N pest images and annotation information are sequentially extracted from the test set; 2) The images are processed by the main network to obtain preliminary prediction results; 3) The prediction results are processed by non-maximum suppression (NMS) to generate the final prediction; 4) The above process is repeated to collect prediction results and annotation information, and finally the model evaluation index mAP (mean average precision) is calculated from the prediction and annotation information.
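The NMS step of the testing phase can be sketched in plain Python. This is the common greedy form of NMS, shown as an assumption about how the step is carried out: the 0.5 suppression threshold, the (x1, y1, x2, y2) box format, and the function names are not taken from the text, and per-class handling is left out for brevity.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping a kept
    box by more than iou_threshold, and repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(nms(boxes, [0.9, 0.8, 0.7]))
```

Since every feature-map pixel is a sample point, many near-duplicate boxes are predicted around each pest, and this step reduces them to one box per target.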
[0158] In model training, the loss consists of two parts: classification loss and localization loss. The classification loss uses softmaxFL, which is balanced according to the number of foreground and background samples in the classification. Its calculation formula is as follows:
[0159]
[0160]
[0161] Where $Loss_{cls}$ is the classification loss function value; $nc$ is the number of classes plus 1 (including the background class); $p$ is the predicted classification probability; $y \in \{0,1\}^{nc}$ is the classification label; $\alpha$ and $\beta$ are the first and second hyperparameters, respectively, with default settings of 0.5 and 3.0; $N$ is the total number of samples; $N_{fg}$ is the number of foreground samples; $N_{bg}$ is the number of background samples; $p_{fg}$ is the predicted probability of the foreground; and $p_{bg}$ is the predicted probability of the background;
[0162] The localization loss function uses the CIoU function, specifically:
[0163]
[0164]
[0165]
[0166] Where $Loss_{ciou}$ is the localization loss function value; $b$ represents the parameters of the predicted bounding box and $b^{gt}$ the parameters of the annotated bounding box; IoU is the intersection-over-union of the predicted bounding box and the annotated bounding box; $\rho^{2}(b, b^{gt})$ is the squared distance between the center of the predicted box and the center of the annotated box; $d$ is the diagonal length of the smallest box enclosing both the predicted box and the annotated box; $w$ and $h$ are the width and height of the predicted box; $w^{gt}$ and $h^{gt}$ are the width and height of the annotated box; and $u$ and $v$ are the first and second intermediate variables, respectively.
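A plain-Python sketch of the localization loss may help connect the symbols above. It assumes the standard published CIoU formulation, with `u` as the trade-off weight and `v` as the aspect-ratio term, boxes given as (x1, y1, x2, y2) with positive width and height, and a small `eps` added for numerical safety. The function names are hypothetical; the weight of 5 on the classification loss follows the total loss stated in this document.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss between a predicted box and an annotated box (x1, y1, x2, y2)."""
    w, h = box[2] - box[0], box[3] - box[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    iou = inter / (w * h + w_gt * h_gt - inter + eps)
    # rho^2: squared distance between the two box centers.
    rho2 = ((box[0] + box[2] - gt[0] - gt[2]) ** 2
            + (box[1] + box[3] - gt[1] - gt[3]) ** 2) / 4.0
    # d^2: squared diagonal of the smallest box enclosing both boxes.
    d2 = ((max(box[2], gt[2]) - min(box[0], gt[0])) ** 2
          + (max(box[3], gt[3]) - min(box[1], gt[1])) ** 2)
    # v: aspect-ratio consistency; u: its trade-off weight (assumed roles).
    v = (4.0 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    u = v / ((1.0 - iou) + v + eps)
    return 1.0 - iou + rho2 / (d2 + eps) + u * v


def total_loss(loss_cls, loss_ciou):
    """Total loss: 5 * Loss_cls + Loss_ciou."""
    return 5.0 * loss_cls + loss_ciou


if __name__ == "__main__":
    print(ciou_loss((0, 0, 10, 10), (2, 2, 12, 12)))
```

Unlike plain IoU loss, the center-distance term still provides a gradient when the predicted and annotated boxes do not overlap, which matters for small targets.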
[0167] The total loss function is as follows:
[0168] $Loss = 5\,Loss_{cls} + Loss_{ciou}(b, b^{gt})$
[0169] Where Loss is the total loss function value. This embodiment also improves the loss function and adjusts the loss weights of the model so that detection accuracy improves in a balanced way across the different categories;
[0170] When testing network performance, mAP (mean average precision) is used as the evaluation metric to measure the model's average precision in multi-class object detection. A higher mAP value indicates higher overall accuracy of the algorithm. Specifically, mAP is defined as follows:
[0171] $mAP = \frac{1}{c}\sum_{j=1}^{c} AP_j$
[0172] Where $c$ represents the number of categories and $AP_j$ represents the average precision of the j-th category, which can be obtained by calculating the area enclosed by that category's precision-recall (PR) curve and the coordinate axes. Precision and recall are defined as $Precision = \frac{TP}{TP + FP}$ and $Recall = \frac{TP}{TP + FN}$, where TP represents the number of samples where both the prediction and the ground truth label are positive, FP represents the number of samples where the prediction is positive but the ground truth label is negative, and FN represents the number of samples where the prediction is negative but the ground truth label is positive. Based on the precision-recall curve, $AP_j$ can be defined as follows:
[0173] $AP_j = \int_0^1 p(r)\,dr$
[0174] Where $AP_j$ is the average precision of the j-th category and $p(r)$ is the precision-recall (PR) curve function, which is determined by a preset intersection-over-union (IoU) threshold between the predicted bounding boxes and the annotated bounding boxes, with $IoU = \frac{|A \cap B|}{|A \cup B|}$, where A and B are the areas covered by the true bounding box and the predicted bounding box, respectively. Different IoU settings give different PR curves, from which the mAP under different IoU thresholds can be calculated. Because the core task of pest detection is to determine the category and number of pests rather than their exact location, the mAP at an IoU threshold of 0.5 is used as the final evaluation index.
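The AP integral and the mAP average can be approximated numerically from sampled precision-recall points. This is a minimal sketch that assumes all-point interpolation with a monotone precision envelope, as commonly used for Pascal VOC-style evaluation; the text itself only states the integral, and the function names are hypothetical.

```python
def average_precision(recalls, precisions):
    """Approximate AP_j as the area under the PR curve.

    recalls must be sorted in increasing order; precisions are the matching
    precision values at a fixed IoU threshold (0.5 in this embodiment).
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision non-increasing in recall (monotone envelope).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)


if __name__ == "__main__":
    ap = average_precision([0.5, 1.0], [1.0, 0.5])
    print(ap, mean_average_precision([ap, 0.25]))
```

In practice the recall and precision lists come from sorting a category's detections by confidence and counting TP and FP cumulatively at the chosen IoU threshold.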
[0175] In addition, this embodiment compares the detection results of this method with those of other mainstream detection models on the forestry pest dataset, as shown in Table 2. The parameter count is the total number of parameters of the model, and the computational cost is the number of floating-point operations the model performs per inference. In the average precision and mAP columns, bold indicates the best result in a column and underline indicates the second best:
[0176] Table 2 Comparison of Detection Results of Different Detection Models
[0177]
[0178] This method proposes a lightweight, high-precision forest pest detection model, EFPDet. According to the comparative experiments in Table 2, EFPDet is more lightweight than existing technologies, better suited to small target detection, and has high overall detection accuracy. Compared with the other detection models, EFPDet achieves the highest detection accuracy in multiple pest categories while keeping the parameter count and computational cost low, with an average precision of over 90% for all categories.
[0179] To verify the effectiveness of different improvement strategies, this embodiment also provides another experiment, the results of which are shown in Table 3. In Table 3, A represents the use of data augmentation, B represents the use of ResNet50-SimAM feature extraction network, C represents the use of improved feature fusion network, and D represents the use of task-aligned detection head and balanced loss function.
[0180] Table 3. Experimental Results Comparing the Effectiveness of Different Improvement Strategies
[0181]
[0182] To achieve lightweight design, this method improves the ResNet50 network and proposes a lighter and more powerful feature extraction network, ResNet50-SimAM. This network structure takes into account the characteristics of pests in pest detection, such as small size and concentrated features in shallow layers of the network. By pruning the number of network layers, the number of network parameters is significantly reduced. At the same time, the SimAM parameterless attention mechanism is introduced to further improve the pest feature extraction capability.
[0183] To more effectively detect small-target pests, this method employs an innovative bottom-up feature fusion network. In understanding neural networks, shallow layers, with their larger feature map scale and fewer convolutions, better preserve local features. Conversely, deep feature layers, with their smaller feature map scale and more convolutions, focus more on large-scale characteristics. Considering the tiny size of small-target pests, it's difficult for deep feature layers to retain sufficient features for accurate detection. Therefore, this method uses a bottom-up feature fusion network, which aligns with the characteristic of many small targets among forest pests. Experimental results, as shown in Table 3, confirm that this strategy effectively improves the detection accuracy of small-target pests.
[0184] Building upon the first two points, this method addresses the potential inaccuracy caused by task misalignment (i.e., inconsistent features across different tasks) resulting from independent training of classification and localization tasks. To mitigate this, a task feature alignment module is introduced. This module reduces inconsistency by allowing the two tasks to share features. Furthermore, considering the anchorless design used in this method, each pixel of the feature map is treated as a sample during final prediction. However, insect images typically have few foreground regions and many background regions, leading to significant differences between positive and negative samples and impacting sample learning. Therefore, this method further balances the loss across different categories by adjusting the number of foreground and background points based on FocalLoss, thus improving the loss function. Combining these two improvements, the network maintains accuracy in detecting small-target insects while simultaneously enhancing overall accuracy, surpassing all other methods. Experimental results are shown in Tables 2 and 3.
[0185] Example 3
[0186] As shown in Figure 6, this embodiment provides a forestry pest detection system based on multi-scale feature enhancement and fusion, which applies the forestry pest detection method based on multi-scale feature enhancement and fusion described in Embodiment 1 or 2 and includes:
[0187] Data acquisition unit 301: used to acquire forest pest datasets and perform preprocessing;
[0188] The forestry pest dataset includes several images of forestry pests labeled with pest information.
[0189] Model building unit 302: used to establish a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion. The forest pest detection model EFPDet includes: a multi-scale feature extraction network with an attention mechanism, a feature fusion network that performs feature fusion from the bottom up, and a detection head for classifying and locating forest pests.
[0190] The multi-scale feature extraction network, the feature fusion network, and the detection head are connected in sequence;
[0191] Model training unit 303: used to set the total loss function of the forest pest detection model EFPDet, the total loss function including a classification loss function and a localization loss function;
[0192] The preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training to obtain the trained forest pest detection model EFPDet.
[0193] Forest pest detection unit 304: used to acquire images of forest pests to be detected and input them into the trained forest pest detection model EFPDet to obtain the classification and localization results of all pests in the images of forest pests to be detected;
[0194] The final detection result is obtained by performing non-maximum suppression (NMS) on the classification and localization results of all forest pests in the images to be detected, thus completing the forest pest detection.
[0195] In the specific implementation process, the data acquisition unit 301 first acquires the forestry pest dataset and performs preprocessing. The model building unit 302 then establishes the forestry pest detection model EFPDet based on multi-scale feature enhancement and fusion. The forestry pest detection model EFPDet includes: a multi-scale feature extraction network with an attention mechanism, a feature fusion network that performs bottom-up feature fusion, and a detection head for classifying and locating forestry pests; the multi-scale feature extraction network, the feature fusion network, and the detection head are connected sequentially. Model training is then performed: the model training unit 303 sets the total loss function of the forestry pest detection model EFPDet, which includes a classification loss function and a localization loss function, and the preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training to obtain the trained forest pest detection model EFPDet. Finally, the forest pest detection unit 304 acquires the forest pest images to be detected and inputs them into the trained forest pest detection model EFPDet to obtain the classification and localization results of all pests in those images; non-maximum suppression (NMS) is then performed to obtain the final detection result, thus completing the forest pest detection.
[0196] This system provides an efficient forest pest detection model, EFPDet (Efficient Forest Pest Detection), which is lightweight, more friendly to small target detection, and has high overall detection accuracy.
[0197] The same or similar labels correspond to the same or similar parts;
[0198] The terms used to describe positional relationships in the accompanying drawings are for illustrative purposes only and should not be construed as limiting this patent.
[0199] Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the implementation of the present invention. Those skilled in the art can make other variations or modifications based on the above description. It is neither necessary nor possible to exhaustively describe all embodiments here. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the claims of the present invention.
Claims
1. A method for detecting forest pests based on multi-scale feature enhancement and fusion, characterized in that, Includes the following steps: S1: Obtain and preprocess the forestry pest dataset; The forestry pest dataset includes several images of forestry pests labeled with pest information. S2: Establish a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion. The forest pest detection model EFPDet includes: a multi-scale feature extraction network with an attention mechanism, a feature fusion network that performs feature fusion from the bottom up, and a detection head for classifying and locating forest pests. The multi-scale feature extraction network, the feature fusion network, and the detection head are connected in sequence; The structure of the multi-scale feature extraction network that incorporates an attention mechanism is as follows: The multi-scale feature extraction network is pruned using a ResNet50 neural network. Based on the first three layers of the ResNet50 neural network, an attention mechanism is introduced into each layer. 
The structure of the multi-scale feature extraction network includes, in sequence: a 7×7 convolutional layer, a first batch normalization layer, a first ReLU activation function layer, a max pooling layer, a first feature extraction module, a second feature extraction module, and a third feature extraction module; The first feature extraction module, the second feature extraction module, and the third feature extraction module are used to extract feature information at three different scales from the forestry pest image, which are respectively denoted as feature information 1, feature information 2, and feature information 3; the feature information at the three different scales is saved together as the output of the multi-scale feature extraction network; The first feature extraction module includes two neck sub-modules and one neck sub-module with an attention mechanism connected in sequence; the second feature extraction module includes three neck sub-modules and one neck sub-module with an attention mechanism connected in sequence; the third feature extraction module includes five neck sub-modules and one neck sub-module with an attention mechanism connected in sequence. Each of the neck sub-modules has the same structure, comprising the following sequentially connected layers: a 1×1 convolutional layer, a second batch normalization layer, a 3×3 convolutional layer, a third batch normalization layer, a 1×1 convolutional layer, a fourth batch normalization layer, and a second ReLU activation function layer; the input of the neck sub-module is also connected to the output of the fourth batch normalization layer to form a residual summation connection. 
Each of the neck sub-modules that introduces the attention mechanism has the same structure, and each includes the following sequentially connected layers: a 1×1 convolutional layer, a fifth batch normalization layer, a 3×3 convolutional layer, a sixth batch normalization layer, an attention layer, a 1×1 convolutional layer, a seventh batch normalization layer, and a third ReLU activation function layer; the input of the neck sub-module that introduces the attention mechanism is also connected to the output of the seventh batch normalization layer to form a residual summation connection; S3: Set the total loss function for the forestry pest detection model EFPDet, which includes a classification loss function and a localization loss function; The preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training to obtain the trained forest pest detection model EFPDet. S4: Obtain the images of forest pests to be detected and input them into the trained forest pest detection model EFPDet to obtain the classification and localization results of all pests in the images of forest pests to be detected. The final detection result is obtained by performing non-maximum suppression (NMS) on the classification and localization results of all forest pests in the images to be detected, thus completing the forest pest detection.
2. The forest pest detection method based on multi-scale feature enhancement and fusion according to claim 1, characterized in that, The specific method for preprocessing in step S1 is data augmentation processing; The data augmentation processes include: horizontal flipping, vertical flipping, random rotation, random cropping, deformation scaling, and adding random noise.
3. The forest pest detection method based on multi-scale feature enhancement and fusion according to claim 1, characterized in that, The attention mechanism in the multi-scale feature extraction network is specifically the SimAM parameterless attention mechanism; in each neck submodule that introduces the attention mechanism, the attention layer is a SimAM parameterless attention layer.
4. The forest pest detection method based on multi-scale feature enhancement and fusion according to claim 1, characterized in that, In step S2, the structure of the feature fusion network that performs bottom-up feature fusion is as follows: The feature fusion network includes, in sequence, a first MSFF multi-scale feature fusion module and a second MSFF multi-scale feature fusion module; The first MSFF multi-scale feature fusion module is used to initially fuse the feature information 1 extracted by the first feature extraction module and the feature information 2 extracted by the second feature extraction module; The second MSFF multi-scale feature fusion module is used to perform a secondary fusion of the feature information 3 extracted by the third feature extraction module with the pre-fused feature information 1 and feature information 2; The feature information 3 after secondary fusion is subjected to 2x convolution downsampling to obtain feature information 4; Remove feature information 1 after secondary fusion, and save feature information 4 together with feature information 2 and feature information 3 after secondary fusion as the output of the feature fusion network.
5. The forest pest detection method based on multi-scale feature enhancement and fusion according to claim 1, characterized in that, in step S2, the structure of the detection head used for classifying and locating forest pests is specifically as follows: The detection head includes: a feature convolution concatenation module, a global average pooling layer, a classification task decoupling module, and a localization task decoupling module; the output of the feature convolution concatenation module is connected to the input of the global average pooling layer; the classification task decoupling module and the localization task decoupling module are arranged in parallel, and the output of the global average pooling layer is connected to the input of each of them; the output of the feature convolution concatenation module is also connected to the input of the classification task decoupling module and of the localization task decoupling module. The feature convolution concatenation module includes several identically structured convolution blocks connected in sequence, each comprising, in order: a 3×3 convolutional layer, an eighth batch normalization layer, and a fourth ReLU activation function layer. The classification result is obtained by averaging the probabilities of the concatenated features output by the feature convolution concatenation module and the probabilities of the classification decoupled features output by the classification task decoupling module. The localization decoupled features output by the localization task decoupling module are subjected to a 3×3 convolution operation, in which the feature offset output by the feature convolution concatenation module is used as the offset parameter of the deformable convolution; the localization decoupled features after the 3×3 convolution are then used for localization prediction to obtain the localization result.
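The averaging step that produces the classification result can be sketched as follows. The claim only states that two sets of probabilities are averaged; converting scores to probabilities with a softmax, and the function and argument names, are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def averaged_class_probs(concat_logits, decoupled_logits):
    """Average the class probabilities from the two branches.

    concat_logits: scores derived from the concatenated features
    (hypothetical input); decoupled_logits: scores derived from the
    classification decoupled features (hypothetical input).
    """
    p_concat = softmax(concat_logits)
    p_decoupled = softmax(decoupled_logits)
    return [(a + b) / 2.0 for a, b in zip(p_concat, p_decoupled)]
```

Since each branch yields a distribution that sums to 1, their average is again a valid distribution over the pest categories.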
6. The forest pest detection method based on multi-scale feature enhancement and fusion according to claim 1, characterized in that, in step S3, the total loss function of the forest pest detection model EFPDet is specifically as follows:
The classification loss function adopted is the softmaxFL function, whose terms are: the classification loss function value; the number of categories plus 1; p, the predicted probability of each category; the category labels; the first and second hyperparameters; the total number of samples; the number of foreground samples; the number of background samples; the predicted foreground probability; and the predicted background probability.
The localization loss function adopted is the CIoU function, whose terms are: the localization loss function value; the parameters of the predicted bounding box; the parameters of the labeled bounding box; IoU, the intersection over union of the predicted bounding box and the labeled bounding box; d, the square of the distance from the center of the predicted bounding box to the center of the labeled bounding box; w and h, the width and height of the predicted bounding box; the width and height of the labeled bounding box; and the first and second intermediate variables.
The total loss function is formed from the classification loss function and the localization loss function, and its result is the total loss function value.
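For orientation, the sketch below implements the CIoU loss in its commonly published form, which may differ in detail from the expression in this claim. In that form the squared center distance is divided by the squared diagonal of the smallest box enclosing both boxes, and the two intermediate variables are an aspect-ratio term `v` and its weight `alpha`; these symbol names, the box format (cx, cy, w, h), and the small constant `eps` are assumptions.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss (commonly published form) for boxes given as (cx, cy, w, h)."""
    pcx, pcy, pw, ph = pred
    tcx, tcy, tw, th = target
    # Corner coordinates of both boxes.
    px1, py1, px2, py2 = pcx - pw / 2, pcy - ph / 2, pcx + pw / 2, pcy + ph / 2
    tx1, ty1, tx2, ty2 = tcx - tw / 2, tcy - th / 2, tcx + tw / 2, tcy + th / 2
    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)
    # Squared center distance, normalized by the enclosing-box diagonal.
    d = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its weight.
    v = (4.0 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / ((1.0 - iou) + v + eps)
    return 1.0 - iou + d / c2 + alpha * v
```

The loss approaches 0 for a perfect match and, unlike a plain IoU loss, still provides a distance signal when the predicted and labeled boxes do not overlap at all.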
7. The forest pest detection method based on multi-scale feature enhancement and fusion according to claim 1 or 6, characterized in that, in step S3, when the preprocessed forest pest dataset is input into the forest pest detection model EFPDet for iterative training, the mean average precision (mAP) is used as the evaluation index of detection accuracy. The mAP is calculated as $\mathrm{mAP} = \frac{1}{C}\sum_{j=1}^{C} \mathrm{AP}_j$, with $\mathrm{AP}_j = \int_0^1 P_j(R)\,\mathrm{d}R$, where $C$ is the number of categories, $\mathrm{AP}_j$ is the average precision of the j-th category, and $P_j(R)$ is the precision-recall (PR) curve function of the j-th category, which is controlled by the preset Intersection over Union (IoU) threshold between the predicted bounding boxes and the labeled bounding boxes.
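A minimal sketch of this metric follows, assuming each detection has already been marked as a true or false positive by matching it to a labeled box at the preset IoU threshold. The all-point interpolation of the precision-recall curve and the function names are assumptions; the claim does not state which interpolation is used.

```python
def average_precision(scored_hits, num_gt):
    """Area under the precision-recall curve for one category.

    scored_hits: list of (score, is_true_positive) for every detection.
    num_gt: number of labeled boxes of this category.
    """
    if num_gt == 0:
        return 0.0
    ordered = sorted(scored_hits, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ordered:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the maximum precision at any higher recall,
    # which makes the curve monotonically non-increasing.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: list of (scored_hits, num_gt) pairs, one per category."""
    aps = [average_precision(hits, n) for hits, n in per_class]
    return sum(aps) / len(aps)
```

Raising the IoU threshold turns more detections into false positives, which lowers the precision-recall curve and therefore the resulting mAP.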
8. The forest pest detection method based on multi-scale feature enhancement and fusion according to claim 7, characterized in that the preset Intersection over Union (IoU) threshold between the predicted bounding box and the labeled bounding box is specifically 0.5.
9. A forestry pest detection system based on multi-scale feature enhancement and fusion, characterized in that it comprises: Data acquisition unit: used to acquire a forestry pest dataset and perform preprocessing; the forestry pest dataset includes several images of forestry pests labeled with pest information. Model building unit: used to build a forest pest detection model EFPDet based on multi-scale feature enhancement and fusion. The forest pest detection model EFPDet includes: a multi-scale feature extraction network with an attention mechanism, a feature fusion network that performs bottom-up feature fusion, and a detection head for classifying and locating forest pests. The multi-scale feature extraction network, the feature fusion network, and the detection head are connected in sequence; the structure of the multi-scale feature extraction network that incorporates an attention mechanism is as follows: The multi-scale feature extraction network is a pruned ResNet50 neural network: it is based on the first three layers of the ResNet50 neural network, with an attention mechanism introduced into each layer.
The structure of the multi-scale feature extraction network includes, in sequence: a 7×7 convolutional layer, a first batch normalization layer, a first ReLU activation function layer, a max pooling layer, a first feature extraction module, a second feature extraction module, and a third feature extraction module; the first feature extraction module, the second feature extraction module, and the third feature extraction module are used to extract feature information at three different scales from the forestry pest image, denoted respectively as feature information 1, feature information 2, and feature information 3; the feature information at the three different scales is saved together as the output of the multi-scale feature extraction network; the first feature extraction module includes two neck sub-modules and one neck sub-module with an attention mechanism connected in sequence; the second feature extraction module includes three neck sub-modules and one neck sub-module with an attention mechanism connected in sequence; the third feature extraction module includes five neck sub-modules and one neck sub-module with an attention mechanism connected in sequence. Each of the neck sub-modules has the same structure, comprising the following sequentially connected layers: a 1×1 convolutional layer, a second batch normalization layer, a 3×3 convolutional layer, a third batch normalization layer, a 1×1 convolutional layer, a fourth batch normalization layer, and a second ReLU activation function layer; the input of the neck sub-module is also connected to the output of the fourth batch normalization layer to form a residual summation connection.
Each of the neck sub-modules that introduces the attention mechanism has the same structure and includes the following sequentially connected layers: a 1×1 convolutional layer, a fifth batch normalization layer, a 3×3 convolutional layer, a sixth batch normalization layer, an attention layer, a 1×1 convolutional layer, a seventh batch normalization layer, and a third ReLU activation function layer; the input of the neck sub-module that introduces the attention mechanism is also connected to the output of the seventh batch normalization layer to form a residual summation connection; Model training unit: used to set the total loss function of the forest pest detection model EFPDet, which includes a classification loss function and a localization loss function, and to input the preprocessed forest pest dataset into the forest pest detection model EFPDet for iterative training to obtain the trained forest pest detection model EFPDet. Forest pest detection unit: used to acquire the forest pest images to be detected and input them into the trained forest pest detection model EFPDet to obtain the classification and localization results of all pests in those images, and to perform non-maximum suppression (NMS) on these classification and localization results to obtain the final detection result, thus completing the forest pest detection.