Termite detection method based on attention and contrastive learning

By constructing a dense small target detection model based on attention and contrastive learning, the method addresses insufficient feature representation of small targets and feature confusion in complex backgrounds in termite detection, achieving high-precision and highly robust detection results.

CN122265629A · Pending · Publication Date: 2026-06-23 · NORTH CHINA UNIV OF WATER RESOURCES & ELECTRIC POWER +2

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NORTH CHINA UNIV OF WATER RESOURCES & ELECTRIC POWER
Filing Date
2026-03-26
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing termite detection methods suffer from insufficient representation of small target features, poor individual differentiation in dense cluster scenarios, and easy feature confusion in complex backgrounds, resulting in low detection accuracy and poor robustness.

Method used

A dense small target detection model based on attention and contrastive learning is constructed, including a feature extraction module, a dense small target adaptive attention module, a hierarchical decoding module, and a model optimization module. Through adaptive perception, explicit metric learning, and multi-module collaboration, the feature extraction and decoding processes are optimized.

Benefits of technology

It significantly improves the accuracy and robustness of termite detection, especially in complex backgrounds and dense scenes, effectively solving the problems of missed detection and duplicate detection, and achieving high-precision and high-stability target detection.


Figure CN122265629A_ABST (abstract figure)

Abstract

The application discloses a termite detection method based on attention and contrastive learning, belonging to the technical field of target detection. The method comprises the following steps: a dense small target detection model based on attention and contrastive feature learning is constructed and trained; a termite image to be detected is acquired and processed; the processed termite image is input into the trained dense small target detection model, which generates the predicted bounding box and category of each termite target to obtain the detection result. The method effectively improves the detection precision for dense small targets and the robustness under complex backgrounds.

Description

Technical Field

[0001] This invention relates to the field of target detection technology, and in particular to a termite detection method based on attention and contrastive learning.

Background Technology

[0002] Automated visual detection of termites and other tiny pests is a key technology for intelligent pest control and structural health monitoring, and its detection accuracy directly affects subsequent risk assessment and precise intervention. Existing approaches rely mainly on manual inspection or on vision-based automated detection. Manual inspection is inefficient, and vision-based methods mostly use general-purpose target detection models that are not optimized for the distinctive characteristics of termites, such as their small scale, high-density clusters, and low-contrast camouflage. These methods face three major technical bottlenecks in practical applications: 1. Insufficient feature representation of tiny targets. General-purpose detection models have network structures designed to extract high-level semantic features, a process that easily dilutes or completely loses the effective feature information of tiny targets such as termites.

[0003] 2. Poor individual differentiation in dense cluster scenarios, leading to severe overlap of target boxes. Existing detection models, with their fixed queries, preset anchor boxes, and global attention mechanisms, lack adaptive perception of local target density, which makes it difficult to separate adjacent individuals and leads to duplicate and false detections.

[0004] 3. Feature confusion in scenarios with low contrast and high similarity between termites and backgrounds such as wood and soil. Furthermore, robustness is poor and generalization ability is limited in real-world environments with uneven lighting and complex backgrounds.

Summary of the Invention

[0005] The purpose of this invention is to provide a termite detection method based on attention and contrastive learning to solve the problems mentioned in the background art.

[0006] To achieve the above objective, this invention provides a termite detection method based on attention and contrastive learning, comprising the following steps:
S1. Construct a dense small target detection model based on attention and contrastive feature learning, and train the model;
S2. Acquire the termite images to be detected and process them;
S3. Input the processed termite image into the trained dense small target detection model to generate the predicted bounding box and category of each termite target and obtain the detection result.

[0007] Preferably, the dense small target detection model is an improvement on the RT-DETR model and includes: a feature extraction module, which extracts multi-scale feature maps based on the backbone network of the RT-DETR model; a dense small target adaptive attention module, which receives the multi-scale feature maps extracted by the feature extraction module and outputs multi-scale feature maps enhanced by deformable convolution; a hierarchical decoding module, which receives the feature maps enhanced by the dense small target adaptive attention module and generates the target detection results and categories; and a model optimization module, which calculates a composite loss function to guide model parameter updates during training of the dense small target detection model. The composite loss function consists of the backbone detection loss and the contrastive learning loss.

[0008] Preferably, the dense small target adaptive attention module consists of a density estimation branch and an adaptive receptive field branch. The density estimation branch includes at least one convolutional layer and an activation function and generates a pixel-level density map from the multi-scale feature maps. The adaptive receptive field branch includes convolutional layers and deformable convolutional layers; it generates sampling offsets, extracts features with the receptive field adjusted by those offsets, and weights the extracted features with the density map.

[0009] Preferably, the hierarchical decoding module includes a hierarchical query generation unit and a Transformer decoder. The hierarchical query generation unit includes multiple learnable embedding layers, each associated with a specific feature map scale, which generate dedicated object queries for feature maps of different scales and form an initial query set based on the enhanced feature maps. The Transformer decoder lets the initial query set interact with the enhanced feature maps to decode the bounding box and category of each target.

[0010] Preferably, the model optimization module includes a contrastive feature learning unit and a composite loss function calculation unit. The contrastive feature learning unit calculates the contrastive loss based on the true class labels of the samples. The composite loss function calculation unit obtains the contrastive loss and the backbone loss and, based on them, calculates the composite loss according to the composite loss function to guide the model update.

[0011] Preferably, the contrastive feature learning unit includes a projection head composed of a multilayer perceptron, which maps the features extracted by the Transformer decoder into the metric learning space to obtain the feature vector representation $z$. The contrastive loss $L_{con}$ is calculated using a supervised contrastive loss function:

$$L_{con} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}$$

where $I$ is the index set of all samples within the batch, $P(i)$ is the set of positive samples of the same class as sample $i$, $A(i)$ is the set of all other samples excluding sample $i$, $\tau$ is the temperature hyperparameter, $z_p$ is the feature vector of positive sample $p$ in the metric learning space after mapping by the projection head (likewise $z_i$ and $z_a$ for samples $i$ and $a$), $p$ is a positive sample index in $P(i)$, and $a$ is a sample index in $A(i)$.

[0012] Preferably, the backbone loss $L_{det}$ is calculated as:

$$L_{det} = \lambda_{cls} L_{cls} + \lambda_{box} L_{box} + \lambda_{dfl} L_{dfl}$$

where $L_{cls}$ is the classification loss, $L_{box}$ is the bounding box regression loss, $L_{dfl}$ is the distribution focal loss, and $\lambda_{cls}$, $\lambda_{box}$, $\lambda_{dfl}$ are the balancing weights of the respective losses.

[0013] Preferably, the composite loss function is calculated as:

$$L_{total} = \alpha L_{det} + \beta L_{con}$$

where $\alpha$ and $\beta$ are balancing hyperparameters, $L_{det}$ is the backbone detection loss, and $L_{con}$ is the auxiliary contrastive loss.

[0014] Preferably, the training process of the dense small target detection model includes:
S11. Obtain a termite image dataset containing multi-scene annotation information and preprocess it;
S12. Input the preprocessed termite image dataset into the feature extraction module, which extracts the multi-scale feature maps $F$;
S13. The dense small target adaptive attention module receives the multi-scale feature maps $F$ extracted by the feature extraction module and enhances them. Specifically, the density estimation branch processes $F$ to generate a pixel-level density map $D$; the adaptive receptive field branch generates a sampling offset $\Delta p$ from $F$, and the features extracted after adjusting the receptive field with the sampling offset are weighted by the density map to obtain the enhanced features $F'$:

$$F' = D \odot \mathrm{DCN}(F, \Delta p; W)$$

where $F$ is the input feature map, $\Delta p$ is the sampling offset, $\mathrm{DCN}(\cdot)$ is the deformable convolution operation, $\odot$ denotes element-wise multiplication, and $W$ denotes the learnable weight parameters;
S14. The hierarchical decoding module receives the enhanced multi-scale feature maps $F'$. The hierarchical query generation unit generates dedicated learnable query embeddings $Q_s$, $Q_m$, and $Q_l$ for the small-, medium-, and large-scale feature maps and merges them into an initial query set $Q$ containing scale priors. The Transformer decoder performs cross-attention between $Q$ and $F'$ to decode the bounding boxes and categories of potential targets;
S15. The model optimization module calculates the composite loss function during training to guide model parameter updates.

[0015] Therefore, the termite detection method based on attention and contrastive learning adopted in this invention has the following beneficial effects: (1) It integrates the dense small target adaptive attention (DSOA) module, the contrastive feature learning (CFL) unit, and the hierarchical query generation (HQG) unit to build a complete optimization chain from feature perception through discrimination to decoding. The DSOA module enhances the ability to capture the original features of small targets by adaptively adjusting the receptive field; the CFL unit reshapes the feature space and enhances the separability of features through explicit metric learning; the HQG unit ensures that these high-quality features can be used by the decoder in a fine-grained manner. Through the collaboration of these modules, the overall detection performance of the model in complex scenes is improved.

[0016] (2) By introducing an auxiliary contrast loss term during training, the model is forced to learn a feature space that is compact within a class and separate between classes. This makes termite targets of different shapes and postures approach each other in the feature space, while the features of background camouflage (such as wood chips and mud stains) are significantly pushed away. This solves the problem of insufficient feature discrimination caused by traditional detection models relying solely on classification and regression loss, and significantly enhances the robustness of the model in challenging scenarios such as low signal-to-noise ratio and high similarity between the target and the background.

[0017] (3) The DSOA module dynamically senses dense target regions in the image through its density estimation branch and guides deformable convolution to focus on separating overlapping individuals. The HQG unit sets up a dedicated decoding channel for targets of different scales to ensure that the small target information on the high-resolution feature map can be fully decoded. By introducing the DSOA module and HQG unit, the problems of missed detection and repeated detection in dense scenes are effectively solved, and the accuracy of target bounding box localization is significantly improved.

[0018] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0019] Figure 1 is a flowchart of the method according to an embodiment of the present invention; Figure 2 is an architecture diagram of the dense small target detection model according to an embodiment of the present invention; Figure 3 is a performance comparison chart of different models in embodiments of the present invention; Figure 4 is a comparison of ablation experiment results in embodiments of the present invention, wherein (a) shows absolute performance and (b) shows incremental contribution; Figure 5 shows the PR curves of different models in embodiments of the present invention; Figure 6 is a confidence analysis chart for an embodiment of the present invention.

Detailed Description

[0020] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can be arranged and designed in various different configurations, and therefore should not be construed as limiting the present invention.

[0021] Example: As shown in Figures 1 and 2, this invention provides a termite detection method based on attention and contrastive learning, including the following steps: S1. Construct a dense small target detection model based on attention and contrastive feature learning, and train the model.

[0022] The dense small target detection model is an improvement on the RT-DETR model and includes a feature extraction module, a dense small target adaptive attention (DSOA) module, a hierarchical decoding module, and a model optimization module. The functions of each module are as follows: The feature extraction module extracts multi-scale feature maps based on the backbone network of the RT-DETR model.

[0023] The dense small target adaptive attention module receives the multi-scale feature maps extracted by the feature extraction module and outputs multi-scale feature maps enhanced by deformable convolution. The module consists of a density estimation branch and an adaptive receptive field branch. The density estimation branch includes at least one convolutional layer and an activation function and generates a pixel-level density map from the multi-scale feature maps. The adaptive receptive field branch includes convolutional layers and deformable convolutional layers; it generates sampling offsets, extracts features with the receptive field adjusted by those offsets, and weights the extracted features with the density map.

[0024] The hierarchical decoding module receives the enhanced feature maps from the dense small-object adaptive attention module and generates object detection results and categories. The hierarchical decoding module includes a hierarchical query generation (HQG) unit and a Transformer decoder. The HQG unit contains multiple learnable embedding layers associated with specific feature map scales, used to generate specific object queries for feature maps of different scales based on the enhanced feature maps and to generate an initial query set. The Transformer decoder interacts with the enhanced feature maps using the initial query set to decode the bounding boxes and categories of the targets.

[0025] The model optimization module calculates a composite loss function to guide model parameter updates during training of the dense small target detection model. The composite loss function consists of the backbone detection loss and the contrastive learning loss.

[0026] The model optimization module includes a contrastive feature learning unit and a composite loss function calculation unit.

[0027] The contrastive feature learning (CFL) unit computes the contrastive loss based on the ground-truth class labels of the samples. This unit operates on the object-level features output by the Transformer decoder. In the final layer of the decoder, each object query evolves into a feature vector representing a potential object instance. During training, these object query features are matched against the ground-truth labels using bipartite graph matching (such as the Hungarian algorithm), which assigns a category label (e.g., termite) to the feature vector of each matched object query.
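The matching step described above can be illustrated with a minimal sketch. It is not the patent's implementation: a brute-force search over permutations stands in for the Hungarian algorithm (it finds the same minimum-cost assignment but is only practical for tiny inputs), the cost matrix is made up, and labeling unmatched queries as "background" is an assumption borrowed from common DETR-style practice. It also assumes there are at least as many queries as ground-truth objects.

```python
from itertools import permutations

def match_queries(cost):
    """Return the minimum-cost one-to-one (query, ground_truth) pairs.

    cost[q][g] is the matching cost between query q and ground-truth object g.
    Brute force over permutations; a stand-in for the Hungarian algorithm.
    """
    num_queries, num_gt = len(cost), len(cost[0])
    best_total, best_pairs = float("inf"), []
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best_pairs = [(q, g) for g, q in enumerate(queries)]
    return sorted(best_pairs)

def assign_labels(cost, gt_labels, background="background"):
    """Give each matched query its ground-truth label; the rest get `background`."""
    labels = [background] * len(cost)
    for q, g in match_queries(cost):
        labels[q] = gt_labels[g]
    return labels

# Hypothetical costs for 3 queries and 2 annotated termites.
cost = [[0.9, 0.2],
        [0.1, 0.8],
        [0.7, 0.6]]
pairs = match_queries(cost)
labels = assign_labels(cost, ["termite", "termite"])
```

Here queries 0 and 1 are matched to the two annotated termites and query 2 is left unmatched, so only the first two query features would carry a class label into the contrastive loss.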

[0028] The contrastive feature learning unit includes a projection head composed of a multilayer perceptron, which maps the features extracted by the Transformer decoder into the metric learning space to obtain the feature vector representation $z$. The contrastive loss $L_{con}$ is calculated using a supervised contrastive loss function:

$$L_{con} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}$$

where $I$ is the index set of all samples within the batch, $P(i)$ is the set of positive samples of the same class as sample $i$, $A(i)$ is the set of all other samples excluding sample $i$, $\tau$ is the temperature hyperparameter, $z_p$ is the feature vector of positive sample $p$ in the metric learning space after mapping by the projection head (likewise $z_i$ and $z_a$ for samples $i$ and $a$), $p$ is a positive sample index in $P(i)$, and $a$ is a sample index in $A(i)$. Through explicit metric learning, this loss function forces the model to pull the features of samples of the same class closer together and push the features of samples of different classes further apart, thereby constructing a highly discriminative feature space that is compact within classes and separated between classes.
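A supervised contrastive loss of the kind described above can be sketched in plain Python as follows. This is an illustrative sketch only: the L2 normalization of the projected features, the default temperature of 0.1, and the toy feature vectors are assumptions, not values taken from the patent, and the projection head itself is omitted (the inputs are treated as already projected).

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (an assumed, commonly used step)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Sum over anchors of the mean negative log-probability of their positives."""
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total = 0.0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        others = [a for a in range(n) if a != i]
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in others}
        # log-sum-exp over all other samples, stabilized by subtracting the max
        m = max(sims.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
    return total

features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
loss_clustered = supervised_contrastive_loss(features, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(features, [0, 1, 0, 1])
```

With the first labeling, same-class features already sit close together, so the loss is small; with the second labeling, each anchor's positive is far away and the loss is much larger, which is the gradient signal that pulls same-class features together during training.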

[0029] The composite loss function calculation unit obtains the contrastive loss and the backbone loss and, based on them, calculates the composite loss according to the composite loss function to guide model updates. The backbone loss $L_{det}$ is calculated as:

$$L_{det} = \lambda_{cls} L_{cls} + \lambda_{box} L_{box} + \lambda_{dfl} L_{dfl}$$

where $L_{cls}$ is the classification loss, $L_{box}$ is the bounding box regression loss, $L_{dfl}$ is the distribution focal loss, and $\lambda_{cls}$, $\lambda_{box}$, $\lambda_{dfl}$ are the balancing weights of the respective losses.

[0030] The classification loss is calculated using Focal Loss to address the imbalance between positive and negative samples:

$$L_{cls} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$

where $p_t$ is the model's predicted probability for the correct category, $\alpha_t$ is the balancing weight, and $\gamma$ is the focusing parameter.
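As a small illustration of how Focal Loss down-weights easy samples, the following sketch evaluates the standard focal loss for a single prediction. The default balancing weight of 0.25 and focusing parameter of 2.0 are commonly used values and are assumptions here; the patent does not state which values it uses.

```python
import math

def focal_loss(p_t, alpha_t=0.25, gamma=2.0):
    """Focal loss for one sample, given the predicted probability of its true class."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9)   # confidently correct prediction
hard = focal_loss(0.1)   # badly wrong prediction
```

The well-classified sample contributes a loss several orders of magnitude smaller than the misclassified one, so training is dominated by hard samples. With `alpha_t=1` and `gamma=0` the function reduces to ordinary cross-entropy.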

[0031] The bounding box regression loss consists of the L1 loss and the GIoU loss, which optimize the coordinate difference and the geometric similarity between the predicted and ground-truth boxes, respectively:

$$L_{box} = \lambda_{L1} \lVert b - \hat{b} \rVert_1 + \lambda_{GIoU} L_{GIoU}(b, \hat{b})$$

where $\lVert b - \hat{b} \rVert_1$ is the L1-norm distance between the predicted box $b$ and the ground-truth box $\hat{b}$. The generalized intersection-over-union loss is calculated as:

$$L_{GIoU} = 1 - \mathrm{IoU}(b, \hat{b}) + \frac{|C \setminus (b \cup \hat{b})|}{|C|}$$

where $C$ is the smallest convex set enclosing both the predicted box $b$ and the ground-truth box $\hat{b}$. This loss focuses on the overlapping area while still providing an effective gradient for alignment when the two boxes do not overlap. $\lambda_{L1}$ and $\lambda_{GIoU}$ are the balancing hyperparameters of the L1 loss and the GIoU loss, which adjust their contributions to the total regression loss and thereby achieve the best bounding box localization.
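The combination of L1 and GIoU terms can be sketched for axis-aligned boxes as follows. The box format `(x1, y1, x2, y2)` and the weights 5.0 and 2.0 are illustrative assumptions (the weights are values often used in DETR-style detectors, not ones given in the patent), and the boxes are assumed to have positive area.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; zero if the box is empty."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def giou(pred, gt):
    """Generalized IoU in [-1, 1] for two boxes with positive area."""
    inter = box_area((max(pred[0], gt[0]), max(pred[1], gt[1]),
                      min(pred[2], gt[2]), min(pred[3], gt[3])))
    union = box_area(pred) + box_area(gt) - inter
    # smallest axis-aligned box enclosing both inputs
    enclosing = box_area((min(pred[0], gt[0]), min(pred[1], gt[1]),
                          max(pred[2], gt[2]), max(pred[3], gt[3])))
    return inter / union - (enclosing - union) / enclosing

def box_loss(pred, gt, lambda_l1=5.0, lambda_giou=2.0):
    """Weighted sum of the L1 coordinate distance and the GIoU loss."""
    l1 = sum(abs(p - g) for p, g in zip(pred, gt))
    return lambda_l1 * l1 + lambda_giou * (1.0 - giou(pred, gt))

perfect = box_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
disjoint_giou = giou((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0))
```

For the two disjoint boxes the plain IoU is zero regardless of how far apart they are, whereas the GIoU is negative and changes with the gap between them, which is why it still provides a useful gradient in the non-overlapping case.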

[0032] The distribution focal loss (DFL) optimizes the prediction of continuous bounding box coordinate values by learning a discrete probability distribution over the coordinate values. It can be expressed as the sum of the cross-entropy losses at the two discrete points surrounding the target coordinate:

$$L_{dfl} = -\big((y_{i+1} - y) \log S_i + (y - y_i) \log S_{i+1}\big)$$

where $y$ is the actual coordinate value, $y_i$ and $y_{i+1}$ are its adjacent discrete grid points, and $S_i$ and $S_{i+1}$ are the probabilities of the corresponding grid points predicted by the dense small target detection model.
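The idea of supervising the two grid points that bracket the true coordinate can be sketched as follows. The sketch assumes the discrete grid points sit at the integers 0, 1, ..., n-1 and that the model outputs one logit per grid point; both are illustrative assumptions rather than details given in the patent.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distribution_focal_loss(logits, y):
    """Cross-entropy on the two integer grid points surrounding y.

    Each neighbour is weighted by its closeness to y (linear interpolation).
    """
    probs = softmax(logits)
    left = min(int(math.floor(y)), len(logits) - 2)
    right = left + 1
    w_left, w_right = right - y, y - left
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))

def expected_coordinate(logits):
    """Continuous coordinate recovered as the expectation of the distribution."""
    return sum(i * p for i, p in enumerate(softmax(logits)))

target = 2.5
peaked_near_target = [0.0, 0.0, 5.0, 5.0, 0.0]
peaked_elsewhere = [5.0, 0.0, 0.0, 0.0, 0.0]
good = distribution_focal_loss(peaked_near_target, target)
bad = distribution_focal_loss(peaked_elsewhere, target)
```

A distribution concentrated on grid points 2 and 3 gives a low loss for a target of 2.5 and an expected coordinate close to 2.5, while a distribution concentrated on grid point 0 is penalized heavily.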

[0033] The composite loss function is calculated as:

$$L_{total} = \alpha L_{det} + \beta L_{con}$$

where $\alpha$ and $\beta$ are balancing hyperparameters, $L_{det}$ is the backbone detection loss, and $L_{con}$ is the auxiliary contrastive loss.
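Putting the pieces together, the detection loss is a weighted sum of the classification, box regression, and distribution focal terms, and the overall training loss adds the auxiliary contrastive term with its own weight. The sketch below only shows this weighting; all weight values and the example loss values are placeholders chosen for illustration, not figures from the patent.

```python
def detection_loss(l_cls, l_box, l_dfl, w_cls=1.0, w_box=1.0, w_dfl=1.0):
    """Weighted sum of the three detection loss terms."""
    return w_cls * l_cls + w_box * l_box + w_dfl * l_dfl

def combined_loss(l_det, l_con, alpha=1.0, beta=0.5):
    """Detection loss plus the auxiliary contrastive loss, each with its own weight."""
    return alpha * l_det + beta * l_con

l_det = detection_loss(l_cls=0.5, l_box=1.0, l_dfl=0.25)
l_total = combined_loss(l_det, l_con=2.0)
```

Setting `beta` to zero recovers training with the detection loss alone, which is one way the contribution of the contrastive term could be isolated in an ablation.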

[0034] The training process for the dense small target detection model includes: S11. Obtain a termite image dataset containing multi-scene annotation information and preprocess it. The dataset includes a training set and a validation set. The preprocessing comprises normalization and data augmentation operations, such as size normalization, geometric and color space transformation, and multi-image blending (e.g., mosaic augmentation).

[0035] Normalization processes the raw termite images: the pixel values are scaled from the range [0, 255] to [0, 1] and may be further standardized with a preset mean $\mu$ and standard deviation $\sigma$. The standardization can be expressed as:

$$x' = \frac{x - \mu}{\sigma}$$

where $x$ is the scaled pixel value and $x'$ is the standardized value.
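The two-stage normalization can be sketched per pixel as follows. The default per-channel mean and standard deviation are the widely used ImageNet statistics and are an assumption here; the patent only says that preset values are used. Images are represented as nested lists of RGB triples to keep the sketch dependency-free.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # assumed preset mean per RGB channel
IMAGENET_STD = (0.229, 0.224, 0.225)   # assumed preset std per RGB channel

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [channel / 255.0 for channel in rgb]
    return [(s - m) / d for s, m, d in zip(scaled, mean, std)]

def normalize_image(image, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply normalize_pixel to every pixel of an image given as rows of RGB triples."""
    return [[normalize_pixel(px, mean, std) for px in row] for row in image]

image = [[(0, 0, 0), (255, 255, 255)]]
half = (0.5, 0.5, 0.5)
normalized = normalize_image(image, mean=half, std=half)
```

With a mean and standard deviation of 0.5, black maps to -1 and white maps to 1 in every channel, which makes the effect of the formula easy to check by hand.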

[0036] Image augmentation includes mosaic augmentation, blending (mixup) augmentation, random affine transformation, and color perturbation. Mosaic augmentation enriches the background and object composition of the training samples by randomly scaling, cropping, and stitching four training images into one larger canvas. Blending augmentation takes two random samples $(x_i, y_i)$ and $(x_j, y_j)$ and generates a new sample by linear interpolation with a mixing coefficient $\lambda$ sampled from a Beta distribution: $\tilde{x} = \lambda x_i + (1 - \lambda) x_j$ and $\tilde{y} = \lambda y_i + (1 - \lambda) y_j$. Here $\lambda$ represents the contribution weights of the two original samples to the newly generated sample, which effectively enhances the model's linear expressive power. To simulate diverse shooting angles and distances, a random affine transformation is applied: an affine matrix $M$ performs a combination of rotation, scaling, translation, and shearing on the image coordinates, mapping the original coordinates $(x, y)$ to new coordinates $(x', y')$:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = M \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Color space perturbation converts an image from RGB space to HSV (hue, saturation, value) space and applies random perturbations to the H, S, and V channels. For example, for an HSV pixel $(h, s, v)$, the augmentation can be expressed as $h' = h + \Delta h$, $s' = \alpha_s s$, and $v' = \alpha_v v$, where $\Delta h$ is an additive shift and $\alpha_s$ and $\alpha_v$ are multiplicative factors. All augmentation operations are generated dynamically for each training batch, ensuring that the model learns effectively on diverse data distributions.
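Two of the augmentations described above, the Beta-weighted blending of two samples and the HSV perturbation, can be sketched with the Python standard library. The Beta parameter of 0.2, the wrapping of hue into [0, 1), the clamping of saturation and value, and the flat-list representation of samples are all assumptions made for the sketch; mosaic and affine augmentation are omitted.

```python
import colorsys
import random

def mixup(x_i, y_i, x_j, y_j, alpha=0.2, rng=random):
    """Blend two samples and their label vectors with a Beta-distributed weight."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam

def hsv_jitter(rgb, hue_shift, sat_gain, val_gain):
    """Shift hue additively and scale saturation/value for one RGB pixel in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue_shift) % 1.0                # hue is cyclic
    s = min(1.0, max(0.0, s * sat_gain))     # keep saturation in range
    v = min(1.0, max(0.0, v * val_gain))     # keep value in range
    return colorsys.hsv_to_rgb(h, s, v)

rng = random.Random(0)  # seeded only to make the example repeatable
x, y, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], rng=rng)
darker_red = hsv_jitter((1.0, 0.0, 0.0), hue_shift=0.0, sat_gain=1.0, val_gain=0.5)
```

In a training pipeline the hue shift and the two gains would themselves be drawn at random for each image; they are passed explicitly here so that the mapping is easy to follow.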

[0037] S12. Input the preprocessed termite image dataset into the feature extraction module, which extracts the multi-scale feature maps $F$.

[0038] S13. The dense small target adaptive attention module receives the multi-scale feature maps $F$ extracted by the feature extraction module and enhances them. Specifically, the density estimation branch processes $F$ to generate a pixel-level density map $D$; the adaptive receptive field branch generates a sampling offset $\Delta p$ from $F$, and the features extracted after adjusting the receptive field with the sampling offset are weighted by the density map to obtain the enhanced features $F'$:

$$F' = D \odot \mathrm{DCN}(F, \Delta p; W)$$

where $F$ is the input feature map, $\Delta p$ is the sampling offset, $\mathrm{DCN}(\cdot)$ is the deformable convolution operation, $\odot$ denotes element-wise multiplication, and $W$ denotes the learnable weight parameters.
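The density-weighting part of this step can be sketched as follows. The sketch deliberately leaves out the deformable convolution: the function `weight_by_density` takes the already-adapted features as input, and in the example the raw features are passed in their place. Using a 1x1 convolution followed by a sigmoid for the density branch is an assumption; the patent only specifies at least one convolutional layer and an activation function.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def density_map(features, weights, bias=0.0):
    """Pixel-level density in (0, 1) from a 1x1 convolution plus sigmoid.

    features is indexed [channel][row][col]; weights has one entry per channel.
    """
    height, width = len(features[0]), len(features[0][0])
    return [
        [sigmoid(bias + sum(w * ch[r][c] for w, ch in zip(weights, features)))
         for c in range(width)]
        for r in range(height)
    ]

def weight_by_density(adapted, density):
    """Element-wise product of every channel with the density map."""
    return [
        [[value * density[r][c] for c, value in enumerate(row)]
         for r, row in enumerate(channel)]
        for channel in adapted
    ]

# Two channels, one row, two pixels: an empty pixel and a strongly activated one.
features = [[[0.0, 4.0]], [[0.0, 4.0]]]
density = density_map(features, weights=[1.0, 1.0])
enhanced = weight_by_density(features, density)
```

Pixels with a high estimated density keep almost all of their feature magnitude, while low-density pixels are attenuated, which is how the density map steers the enhanced features toward crowded regions.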

[0039] S14. The hierarchical decoding module receives the enhanced multi-scale feature maps $F'$. The hierarchical query generation unit generates scale-aware object queries (e.g., for the high-resolution P3 layer and the low-resolution P5 layer): it produces dedicated learnable query embeddings $Q_s$, $Q_m$, and $Q_l$ for the small-, medium-, and large-scale feature maps and merges them into an initial query set $Q$ containing scale priors. The Transformer decoder performs cross-attention between $Q$ and $F'$ to decode the bounding box and category of each potential target.
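The hierarchical query idea can be sketched as separate embedding tables per scale that are concatenated into one query set and then attended against the feature tokens. This is a simplified sketch: the query counts per scale, the random initialization in place of learned embeddings, and the single-head attention without learned projections are all assumptions, and a real decoder layer would add self-attention, feed-forward layers, and prediction heads.

```python
import math
import random

def make_queries(num_per_scale, dim, seed=0):
    """One embedding table per scale (randomly initialized stand-ins for learned ones)."""
    rng = random.Random(seed)
    return {scale: [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]
            for scale, count in num_per_scale.items()}

def merge_queries(per_scale, order=("small", "medium", "large")):
    """Concatenate the per-scale queries into one initial query set."""
    merged = []
    for scale in order:
        merged.extend(per_scale[scale])
    return merged

def cross_attention(queries, memory):
    """Single-head scaled dot-product attention of queries over feature tokens."""
    dim = len(memory[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in memory]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        outputs.append([sum(w / total * k[d] for w, k in zip(weights, memory))
                        for d in range(dim)])
    return outputs

# Hypothetical allocation: more queries for the small-object scale.
per_scale = make_queries({"small": 4, "medium": 2, "large": 1}, dim=3)
queries = merge_queries(per_scale)
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two flattened feature tokens
decoded = cross_attention(queries, memory)
```

Each output row is a convex combination of the feature tokens, weighted by how well the query matches them; giving the small scale its own, larger table is one way to ensure small targets are not crowded out of the query budget.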

[0040] S15. The model optimization module calculates the composite loss function during training to guide model parameter updates.

[0041] S2. Acquire the termite images to be detected and process them, including a series of preprocessing and data augmentation operations such as size normalization, geometric and color space transformation, and multi-image blending (e.g., mosaic augmentation).

[0042] S3. Input the processed termite image into the trained dense small target detection model to generate the predicted bounding box and category of the termite target and obtain the detection result.

[0043] This invention also provides a termite detection system based on attention and contrastive learning for performing the aforementioned termite detection method, comprising: an image acquisition module, which acquires images of the termites to be detected using a camera; a model building module, which builds and trains the dense small target detection model; an image processing module, which processes the images of the termites to be detected, including adjusting image size and standardization; and a target detection module, which detects termites in the images processed by the image processing module using the trained dense small target detection model.

[0044] To verify the effectiveness of the method of this invention, ablation experiments were conducted. Using the RT-DETR model as the baseline model, the improvement effect of each module on the model was demonstrated by gradually adding DSOA modules, CFL units, and HQG units.

[0045] As shown in Figures 3 to 6, the results demonstrate that the present invention achieves a mean average precision (mAP@0.5) of 88.31% in termite detection, a 4.91 percentage point improvement over the baseline model. On the more stringent mAP@0.5:0.95 metric, it achieves 45.46%, significantly outperforming existing target detection models. In challenging scenarios with dense targets and complex backgrounds, the precision is 80.67% and the recall is 83.90%, effectively addressing the problems of high false-negative and false-positive rates and accurately separating tightly clustered individuals. This invention achieves high-precision and robust detection of dense, small targets, improves detection stability in complex environments, and maintains a real-time inference speed of approximately 29 FPS, making it suitable for automated inspections in fields such as pest control and industrial quality inspection.

[0046] Therefore, the present invention adopts the above-mentioned termite detection method based on attention and contrastive learning. Through multi-module collaboration, it can effectively improve the detection accuracy of dense small targets and the robustness in complex backgrounds.

[0047] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and that such modifications or equivalent substitutions do not cause the modified technical solutions to depart from the spirit and scope of the technical solutions of the present invention.

Claims

1. A termite detection method based on attention and contrastive learning, characterized in that it includes the following steps: S1. Construct a dense small target detection model based on attention and contrastive feature learning, and train the model; S2. Acquire the termite images to be detected and process them; S3. Input the processed termite image into the trained dense small target detection model to generate the predicted bounding box and category of each termite target and obtain the detection result.

2. The termite detection method based on attention and contrastive learning according to claim 1, characterized in that the dense small target detection model is an improvement on the RT-DETR model and includes: a feature extraction module, which extracts multi-scale feature maps based on the backbone network of the RT-DETR model; a dense small target adaptive attention module, which receives the multi-scale feature maps extracted by the feature extraction module and outputs multi-scale feature maps enhanced by deformable convolution; a hierarchical decoding module, which receives the feature maps enhanced by the dense small target adaptive attention module and generates the target detection results and categories; and a model optimization module, which calculates a composite loss function to guide model parameter updates during training of the dense small target detection model, the composite loss function consisting of the backbone detection loss and the contrastive learning loss.

3. The termite detection method based on attention and contrastive learning according to claim 2, characterized in that the dense small target adaptive attention module consists of a density estimation branch and an adaptive receptive field branch; the density estimation branch includes at least one convolutional layer and an activation function and generates a pixel-level density map from the multi-scale feature maps; the adaptive receptive field branch includes convolutional layers and deformable convolutional layers, generates sampling offsets, extracts features with the receptive field adjusted by those offsets, and weights the extracted features with the density map.

4. The termite detection method based on attention and contrastive learning according to claim 3, characterized in that the hierarchical decoding module includes a hierarchical query generation unit and a Transformer decoder; the hierarchical query generation unit includes multiple learnable embedding layers, each associated with a specific feature map scale, which generate dedicated object queries for feature maps of different scales and form an initial query set based on the enhanced feature maps; the Transformer decoder lets the initial query set interact with the enhanced feature maps to decode the bounding box and category of each target.

5. The termite detection method based on attention and contrastive learning according to claim 4, characterized in that the model optimization module includes a contrastive feature learning unit and a composite loss function calculation unit; the contrastive feature learning unit calculates the contrastive loss based on the true class labels of the samples; the composite loss function calculation unit obtains the contrastive loss and the backbone loss and, based on them, calculates the composite loss according to the composite loss function to guide the model update.

6. The termite detection method based on attention and contrastive learning according to claim 5, characterized in that the contrastive feature learning unit includes a projection head composed of a multilayer perceptron, which maps the features extracted by the Transformer decoder into the metric learning space to obtain the feature vector representation $z$; the contrastive loss $L_{con}$ is calculated using a supervised contrastive loss function:

$$L_{con} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}$$

where $I$ is the index set of all samples within the batch, $P(i)$ is the set of positive samples of the same class as sample $i$, $A(i)$ is the set of all other samples excluding sample $i$, $\tau$ is the temperature hyperparameter, $z_p$ is the feature vector of positive sample $p$ in the metric learning space after mapping by the projection head (likewise $z_i$ and $z_a$ for samples $i$ and $a$), $p$ is a positive sample index in $P(i)$, and $a$ is a sample index in $A(i)$.

7. The termite detection method based on attention and contrastive learning according to claim 5, characterized in that the backbone loss $L_{det}$ is calculated as:

$$L_{det} = \lambda_{cls} L_{cls} + \lambda_{box} L_{box} + \lambda_{dfl} L_{dfl}$$

where $L_{cls}$ is the classification loss, $L_{box}$ is the bounding box regression loss, $L_{dfl}$ is the distribution focal loss, and $\lambda_{cls}$, $\lambda_{box}$, $\lambda_{dfl}$ are the balancing weights of the respective losses.

8. The termite detection method based on attention and contrastive learning according to claim 5, characterized in that the composite loss function is calculated as:

$$L_{total} = \alpha L_{det} + \beta L_{con}$$

where $\alpha$ and $\beta$ are balancing hyperparameters, $L_{det}$ is the backbone detection loss, and $L_{con}$ is the auxiliary contrastive loss.

9. The termite detection method based on attention and contrastive learning according to claim 5, characterized in that the training process of the dense small target detection model includes: S11. Obtain a termite image dataset containing multi-scene annotation information and preprocess it; S12. Input the preprocessed termite image dataset into the feature extraction module, which extracts the multi-scale feature maps $F$; S13. The dense small target adaptive attention module receives the multi-scale feature maps $F$ extracted by the feature extraction module and enhances them, specifically: the density estimation branch processes $F$ to generate a pixel-level density map $D$; the adaptive receptive field branch generates a sampling offset $\Delta p$ from $F$, and the features extracted after adjusting the receptive field with the sampling offset are weighted by the density map to obtain the enhanced features $F'$:

$$F' = D \odot \mathrm{DCN}(F, \Delta p; W)$$

where $F$ is the input feature map, $\Delta p$ is the sampling offset, $\mathrm{DCN}(\cdot)$ is the deformable convolution operation, $\odot$ denotes element-wise multiplication, and $W$ denotes the learnable weight parameters; S14. The hierarchical decoding module receives the enhanced multi-scale feature maps $F'$; the hierarchical query generation unit generates dedicated learnable query embeddings $Q_s$, $Q_m$, and $Q_l$ for the small-, medium-, and large-scale feature maps and merges them into an initial query set $Q$ containing scale priors; the Transformer decoder performs cross-attention between $Q$ and $F'$ to decode the bounding boxes and categories of potential targets; S15. The model optimization module calculates the composite loss function during training to guide model parameter updates.