A lightweight wildlife target detection method

By making lightweight improvements to the YOLOv8 detection framework and introducing GCPCA and SimAM modules, the problem of high model complexity and difficulty in achieving detection accuracy in wildlife target detection has been solved, enabling efficient and accurate target detection in complex environments.

CN122244907A · Pending · Publication Date: 2026-06-19 · QINGHAI UNIV FOR NATITIES


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: QINGHAI UNIV FOR NATITIES
Filing Date: 2026-03-31
Publication Date: 2026-06-19

Application Information


IPC: G06V40/10; G06V10/25; G06V10/44; G06V10/77; G06V10/80; G06V10/764; G06V10/766; G06N3/045; G06N3/0464; G06N3/084


AI Technical Summary

Technical Problem

Existing deep learning detection models have a large number of parameters and high computational complexity in wildlife target detection, making it difficult to balance detection accuracy and real-time performance in complex natural environments. In particular, they suffer from missed detections and false detections in field research and portable devices.

Method used

A lightweight target detection model is constructed by improving the YOLOv8 detection framework with a lightweight structure, introducing the grouped convolutional channel feature extraction module GCPCA and the parameterless attention module SimAM. Through a lightweight and efficient attention mechanism, feature expression is enhanced and background interference is suppressed.

Benefits of technology

It significantly reduces the number of model parameters and computational complexity, while improving detection accuracy and robustness. It is suitable for resource-constrained edge computing devices and field monitoring scenarios, and achieves rapid target detection with high accuracy and low computing power.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a lightweight wildlife target detection method. Based on the YOLOv8 target detection network, a lightweight YOLOv8-GS target detection model is designed. To address missed detections of wildlife targets under complex backgrounds, low contrast, and multi-scale distributions, the network structure is made lightweight while real-time performance is maintained. The Grouped Convolutional Channel Feature Extraction (GCPCA) module is introduced in the feature extraction stage to enhance the expressive power of key semantic features. After the spatial pyramid pooling module (SPPF), a parameter-free attention module, SimAM, is introduced to adaptively redistribute responses at different spatial locations in the feature map, highlighting wildlife target regions and suppressing background interference. The method improves the detection accuracy of small targets and of wildlife targets in complex scenes while reducing the number of parameters and the computational overhead, making it suitable for applications such as intelligent monitoring of nature reserves and long-term automated observation of wildlife.


Description

Technical Field

[0001] This invention belongs to the field of computer vision and relates in particular to a lightweight method for detecting wild animal targets, applicable to intelligent wildlife monitoring, biodiversity conservation, anti-poaching patrols, and monitoring of plateau ecological environments.

Background Technology

[0002] With the rapid development of computer vision and deep learning technologies, intelligent monitoring systems are increasingly widely used in plateau ecological protection and wildlife monitoring, especially in the automatic identification and detection of rare and endangered species such as snow leopards, lynxes, blue sheep, and Pallas's cats. Wildlife target detection, as a crucial foundational technology for various wildlife population surveys, activity trajectory analysis, and habitat assessment, directly impacts the accuracy of subsequent population statistics and ecological behavior studies. High-precision, low-cost automatic wildlife detection technologies can effectively improve the monitoring efficiency of nature reserves and provide reliable data support for wildlife conservation and ecological research.

[0003] In actual field monitoring scenarios, wild animals often live in high-altitude mountainous areas and complex terrain, where imaging is degraded by low light, strong light reflection, rain and snow, and cluttered backgrounds. Many species have also evolved camouflage, so their texture and color closely resemble the natural background (such as gravel, withered grass, and forest shadows). This makes them hard to distinguish from their surroundings and poses a significant challenge to target detection. At the same time, wild animals frequently change posture and apparent scale and are often occluded by rocks, vegetation, and similar objects, leaving target outlines incomplete. In field research and in long-term deployment on low-computing-power monitoring equipment, traditional detection methods are therefore prone to missed detections and false detections.

[0004] Existing target detection methods mainly include traditional algorithms based on hand-crafted features and detection models based on deep learning. With the development of convolutional neural networks, target detection frameworks such as the YOLO series and Faster R-CNN have achieved good detection results in general scenarios. However, most existing deep learning detection models have a large number of parameters and high computational complexity, and are highly dependent on computing resources. Wildlife target detection usually runs on infrared cameras or solar-powered edge computing nodes, which commonly have small memory, weak computing power, and strict battery-life requirements. Especially in field research, on portable devices, and in long-term unattended monitoring scenarios, it is difficult to achieve detection accuracy and real-time performance at the same time. There is an urgent need for a snow leopard target detection method that combines high detection performance with lightweight characteristics.

Summary of the Invention

[0005] The purpose of this invention is to overcome the shortcomings of existing technologies in wildlife target detection tasks, such as large model parameters, high computational complexity, and difficulty in balancing detection accuracy and real-time performance in complex natural environments. By improving the YOLOv8 detection framework with a lightweight structure and introducing a lightweight and efficient attention mechanism, this invention studies and provides a lightweight target detection method that can maintain stable detection performance for wildlife targets while significantly reducing network model size and computational overhead, in order to meet the practical needs for fast and accurate target detection in field research and long-term monitoring scenarios.

[0006] The objective of this invention is achieved through the following technical solution. A lightweight wildlife target detection method includes:
Collecting images of a given wild animal and constructing an image dataset, where the images cover several scenes and each image is annotated with bounding boxes whose labels include the location of the corresponding animal;
Constructing an object detection model based on the YOLOv8 detection framework, whose backbone network is rebuilt with a lightweight structure by replacing the original feature extraction module C2f with the grouped convolutional channel feature extraction module GCPCA, which enhances the input feature map to produce an enhanced discriminative feature map;
Introducing a parameter-free attention module (SimAM) after the spatial pyramid pooling module (SPPF) to highlight key features of the wildlife target area and suppress background interference, producing a spatially enhanced feature map;
Inputting the constructed image dataset into the object detection model for training, and using the trained model to detect the corresponding wild animals.

[0007] Furthermore, the grouped convolutional channel feature extraction module GCPCA uses a channel-group-based feature modeling approach to represent the input feature map. First, the input feature map is grouped along the channel dimension with a fixed stride, and each group forms a feature branch. Within each feature branch, a dual-path structure is constructed, consisting of a depthwise separable convolution path and a direct convolution path. The depthwise separable path extracts local spatial detail through channel-wise 3×3 depthwise convolution and combines it with 1×1 pointwise convolution to achieve cross-channel information fusion. The direct path maps the input feature map of the branch through a 1×1 convolution. The outputs of the two paths are added element by element to obtain the fused feature map of the branch. A lightweight convolution operation then expands and supplements the fused feature map. Finally, the output feature maps of all feature branches are concatenated along the channel dimension and fed into the lightweight channel attention mechanism ECA, which adaptively adjusts the importance of different channels to obtain the enhanced discriminative feature map.

[0008] Furthermore, the parameter-free attention module SimAM performs parameter-free self-attention modeling on the feature map based on an energy function, and adaptively adjusts the response intensity at each spatial location in the enhanced discriminative feature map. First, the mean of each channel over the spatial dimensions of the input enhanced discriminative feature map is calculated. Then, the squared deviation between the feature value at each spatial location and its channel mean is calculated. Based on this, an attention map is constructed from the energy function to normalize and constrain the feature response. Finally, the energy function output is mapped to the [0,1] interval through the Sigmoid activation function and multiplied element-wise with the input enhanced discriminative feature map to obtain the weighted output feature map.

[0009] Furthermore, during the training phase of the object detection model, the multi-task joint loss function built into YOLOv8 is used to jointly optimize the detection task. The joint loss is a weighted sum of a classification loss, a bounding box regression loss, and a distribution regression loss. The classification loss uses binary cross-entropy (BCE) to measure the difference between the predicted class probability and the true label; the bounding box regression loss uses the Complete Intersection over Union (CIoU) loss, which adds center point distance and aspect ratio consistency constraints; and the distribution regression loss uses the Distribution Focal Loss (DFL) to model the discrete distribution of the bounding box position.

[0010] Furthermore, during the training phase of the target detection model, a stochastic gradient descent optimizer is used to update the parameters of the target detection model; the initial learning rate is set to 0.01, the momentum coefficient is set to 0.937, and the weight decay coefficient is set to 0.0005.
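The hyperparameters above can be illustrated with a single update step. The following is a minimal sketch of SGD with classical momentum and L2 weight decay folded into the gradient. The function name `sgd_step` and the scalar toy values are hypothetical, and the text does not state whether Nesterov momentum or a learning-rate schedule is used, so neither is modeled here.

```python
def sgd_step(weight, grad, velocity, lr=0.01, momentum=0.937, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay (illustrative sketch)."""
    # Weight decay adds a pull toward zero proportional to the weight itself.
    g = grad + weight_decay * weight
    # The velocity accumulates an exponentially weighted sum of past gradients.
    velocity = momentum * velocity + g
    # The weight moves against the accumulated velocity.
    return weight - lr * velocity, velocity


# Toy scalar example: one parameter, one step, starting from zero velocity.
w, v = 1.0, 0.0
w, v = sgd_step(w, grad=0.5, velocity=v)
print(w, v)
```

With a momentum of 0.937, a gradient keeps contributing to later steps with geometrically decaying weight, which smooths the update direction across mini-batches.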

[0011] Preferably, the present invention also provides a lightweight wildlife target detection device based on the above-described lightweight wildlife target detection method, comprising:
A data acquisition unit, used to acquire images of a given wild animal and construct an image dataset, where the images cover several scenes and each image is annotated with bounding boxes whose labels include the location of the corresponding animal;
A target detection model construction unit, used to rebuild the backbone network of the YOLOv8-based detection framework in a lightweight manner, replacing the original feature extraction module C2f with the grouped convolutional channel feature extraction module GCPCA to enhance the input feature map and obtain an enhanced discriminative feature map;
An attention unit, used to introduce the parameter-free attention module SimAM after the spatial pyramid pooling module SPPF, which highlights key features of wildlife target areas and suppresses background interference to obtain a spatially enhanced feature map;
A training unit, used to input the constructed image dataset into the object detection model for training and obtain the trained object detection model for detecting the corresponding wild animals.

[0012] Preferably, the present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the lightweight wildlife target detection method.

[0013] Preferably, the present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the lightweight wildlife target detection method.

[0014] Compared with the prior art, the beneficial effects of the technical solution of the present invention are as follows: 1. The C2f module is replaced by the grouped convolutional channel feature extraction module GCPCA to address the issues of high similarity in color and texture between wildlife targets and complex natural backgrounds (rocks, snow, shrubs, etc.) and the high computational cost of traditional models. Through channel grouping and grouped convolution strategies, structured modeling is performed, which enhances the discriminative feature representation of wildlife targets without increasing network depth, highlighting key visual features such as contours and textures. At the same time, the module adopts a lightweight design, which significantly reduces the number of model parameters and computational overhead, enabling the network to have good real-time performance.

[0015] 2. A parameterless attention module, SimAM, is introduced after the SPPF module to address the problems of low contrast, small scale, large pose variation, and severe background interference in the wild environment. Based on the neuron energy function, the feature map is spatially and channel-adaptively weighted, which can highlight the high response features of the wild animal target region without introducing additional learnable parameters. It effectively suppresses the interference of background texture and redundant information in complex environments, and shows good stability and generalization ability in wild animal target detection scenarios with low contrast, small scale, and large pose variation.

[0016] 3. The synergistic effect of the GCPCA module and the SimAM attention mechanism addresses the missed detections, false detections, and overall performance bottlenecks of existing models under complex scenarios and resource-constrained deployment conditions. This invention focuses precisely on wildlife target areas while maintaining the basic YOLOv8n network structure, effectively reducing missed detections and false detections. Experimental results show that, with the original YOLOv8n detection head and feature pyramid structure unchanged, the method of this invention improves detection accuracy while significantly reducing the number of model parameters and the computational cost compared to the original YOLOv8 model. Specifically, the number of model parameters decreases from 3.01M to 2.00M, while the mean average precision mAP@0.5 reaches 94.1%, an improvement of 2.3% compared to the benchmark model. The method balances high detection performance with lightweight characteristics, making it more suitable for deployment on resource-constrained edge computing devices and automatic field monitoring terminals.
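As a quick check of the reported figures, the relative parameter reduction follows directly from the two parameter counts. The variable names below are illustrative only; the two counts are the ones stated in the text.

```python
# Parameter counts reported in the text, in millions.
baseline_params_m = 3.01   # original YOLOv8n
improved_params_m = 2.00   # YOLOv8-GS

saved_m = baseline_params_m - improved_params_m
reduction_pct = 100.0 * saved_m / baseline_params_m
print(f"saved {saved_m:.2f}M parameters ({reduction_pct:.1f}% fewer)")
```

The difference of 1.01M parameters corresponds to roughly one third of the baseline model.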

[0017] In summary, this invention significantly reduces model complexity while effectively improving the detection accuracy and robustness for wildlife targets in complex natural environments. It performs stably in multi-scale target detection, including on distant, small wildlife targets, and can meet the practical needs of long-term unattended monitoring in nature reserves. The method can be widely applied to intelligent monitoring of rare and endangered wildlife, biodiversity surveys, field research, and ecological conservation studies, all of which impose strict real-time and computing-power constraints. It provides more accurate and reliable technical support for wildlife conservation and scientific decision-making, and has significant application value and good prospects for wider adoption.

Attached Figure Description

[0018] Figure 1 is a flowchart illustrating the target detection method in an embodiment of the present invention;
Figure 2 is a schematic diagram of the network structure of the YOLOv8-GS object detection model;
Figure 3 is a schematic diagram of the Grouped Convolutional Channel Feature Extraction (GCPCA) module;
Figure 4 is a schematic diagram of the parameter-free attention module SimAM;
Figure 5 is a comparison chart of the detection performance indicators of the method of the present invention and of other methods, including Precision, Recall, mAP, and number of parameters.

Detailed Implementation

[0019] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only for explaining the present invention and are not intended to limit the present invention.

[0020] Example 1
This embodiment provides a lightweight target detection method based on YOLOv8-GS, using the snow leopard as the target to be detected. Referring to Figure 1, the method includes the following steps:
S1. Prepare the snow leopard image dataset, including snow leopard image data and annotation files. This embodiment constructs the image dataset by collecting snow leopard images in various complex environments and annotating each image with bounding boxes to generate label files. The image sources include snow leopard images captured by infrared cameras in nature reserves, as well as images obtained through web crawling, video frame capture, and other methods, covering complex environments such as snowfields, scree slopes, and grasslands.
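The text does not specify the annotation file format. As a minimal sketch, assuming the plain-text label layout commonly used by YOLO-family training tools (one line per object: class index, then box center x, center y, width, and height, all normalized by the image size), a pixel-coordinate box could be converted as follows. The helper name `to_yolo_label` and the sample coordinates are hypothetical.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) into a normalized YOLO-style label line."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w   # normalized box center, x
    cy = (y1 + y2) / 2 / img_h   # normalized box center, y
    w = (x2 - x1) / img_w        # normalized box width
    h = (y2 - y1) / img_h        # normalized box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: one snow leopard (class 0) in a 640x480 image.
line = to_yolo_label(0, (100, 50, 300, 250), 640, 480)
print(line)
```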

[0021] S2. Construct the lightweight snow leopard target detection model YOLOv8-GS (see Figure 2). Its improvements consist of two parts, a grouped convolutional channel feature extraction module (GCPCA) and a parameter-free attention module (SimAM), described in turn below.

201) Grouped Convolutional Channel Feature Extraction Module (GCPCA), see Figure 3. This module makes the model lightweight: it reduces the overall number of parameters and the computational cost while maintaining feature extraction efficiency, by combining a grouping strategy, depthwise separable convolution, and a lightweight channel attention mechanism (ECA). The module includes: a stride grouping strategy (selecting specific channels with a fixed stride); depthwise separable convolution (reducing convolution cost by combining channel-wise and pointwise convolution); and ECA, which uses one-dimensional convolution to capture local dependencies between channels and dynamically adjusts the response intensity of each channel without explicit dimensionality reduction, thereby enhancing the expressive power of key semantic channels. After replacing the original feature extraction module C2f with GCPCA, the number of parameters decreased by 1.01M compared to the original YOLOv8 algorithm.

Specifically: based on the YOLOv8 detection framework, the backbone network is restructured with a lightweight design, replacing the original feature extraction module C2f with GCPCA. GCPCA employs a channel-group-based feature modeling approach to efficiently represent the input feature map. Let the input feature map be:

$$X \in \mathbb{R}^{B \times C \times H \times W}$$

where $B$ indicates the batch size, $C$ indicates the number of channels, and $H$ and $W$ represent the spatial dimensions of the input feature map.

To reduce computational complexity and enhance feature diversity across channels, the GCPCA module first groups the input feature map $X$ along the channel dimension with a fixed stride. Each group forms a feature branch, which is used to model the feature responses of a different channel subspace.
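The stride-based grouping can be sketched with array slicing. This is only one reading of "grouping with a fixed stride", assumed here to mean that branch g takes channels g, g+G, g+2G, and so on for G branches. The function name `stride_group`, the use of NumPy, and the tensor sizes are all illustrative assumptions.

```python
import numpy as np


def stride_group(x, num_groups):
    """Split a (B, C, H, W) array into num_groups interleaved channel branches."""
    if x.shape[1] % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    # Branch g selects every num_groups-th channel, starting at channel g.
    return [x[:, g::num_groups] for g in range(num_groups)]


# Toy input: batch of 2, 8 channels, 4x4 spatial size.
x = np.arange(2 * 8 * 4 * 4, dtype=np.float32).reshape(2, 8, 4, 4)
branches = stride_group(x, num_groups=4)
print([b.shape for b in branches])
```

Compared with splitting into contiguous channel blocks, interleaved selection gives every branch a sample of channels from across the whole feature map.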

[0022] Within each feature branch, the GCPCA module constructs a dual-path structure consisting of a depthwise separable convolution path and a direct convolution path. The depthwise separable path extracts local spatial details through channel-wise 3×3 depthwise convolution and combines it with 1×1 pointwise convolution to achieve cross-channel information fusion. The calculation can be represented as:

$$Y_i^{dw} = \mathrm{PW}_{1 \times 1}\big(\mathrm{DW}_{3 \times 3}(X_i)\big)$$

where $X_i$ indicates the input feature map of the $i$-th feature branch, $Y_i^{dw}$ indicates the output of the $i$-th branch after processing, $\mathrm{DW}_{3 \times 3}$ represents a 3×3 depthwise convolution operation, and $\mathrm{PW}_{1 \times 1}$ represents a 1×1 pointwise convolution operation. This path is mainly used to enhance the local texture, contour, and structural information of snow leopard targets. At the same time, depthwise separable convolution significantly reduces the number of model parameters and floating-point operations.
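The parameter saving from depthwise separable convolution can be checked with simple counting. The sketch below compares a standard k×k convolution with a depthwise k×k convolution followed by a 1×1 pointwise convolution, ignoring biases. The channel sizes are hypothetical and not taken from the patent.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def dw_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution plus a 1x1 pointwise convolution (no bias)."""
    depthwise = c_in * k * k      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1x1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 64 output channels, 3x3 kernel.
standard = conv_params(64, 64, 3)
separable = dw_separable_params(64, 64, 3)
print(standard, separable, round(separable / standard, 3))
```

For this layer size the separable form needs about 13% of the weights of the standard convolution.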

[0023] Meanwhile, the direct convolution path maps the input features of the feature branch through a 1×1 convolution. Its calculation form is:

$$Y_i^{pw} = \mathrm{PW}_{1 \times 1}(X_i)$$

where $X_i$ indicates the input feature map of the $i$-th feature branch, $Y_i^{pw}$ represents the result after the 1×1 pointwise convolution, and $\mathrm{PW}_{1 \times 1}$ represents a 1×1 pointwise convolution operation.

[0024] This path is used to preserve the original feature information, avoid the loss of key information due to excessive feature transformation, and play a role in stabilizing gradients and compensating for information during the feature fusion stage.

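[0025] Subsequently, the outputs of the two paths are summed element by element to obtain the fused feature $F_i$ of the feature branch:

$$F_i = Y_i^{dw} + Y_i^{pw}$$

where $Y_i^{dw}$ is the output of the depthwise separable path and $Y_i^{pw}$ is the output of the direct 1×1 path. This fusion operation combines local detail modeling with global channel mapping while maintaining feature integrity.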
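The dual-path computation for one branch (3×3 depthwise then 1×1 pointwise on one path, a direct 1×1 convolution on the other, followed by element-wise addition) can be sketched in NumPy. This is a minimal illustration rather than the patent's implementation: normalization and activation layers are omitted, zero padding of 1 is assumed so the spatial size is preserved, and all function names, weights, and shapes are hypothetical.

```python
import numpy as np


def depthwise_conv3x3(x, w):
    """x: (B, C, H, W); w: (C, 3, 3). One 3x3 filter per channel, zero padding of 1."""
    _, _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Shifted view of the padded input times the per-channel weight at (i, j).
            out += xp[:, :, i:i + h, j:j + wd] * w[None, :, i, j, None, None]
    return out


def pointwise_conv(x, w):
    """x: (B, C_in, H, W); w: (C_out, C_in). A 1x1 convolution mixes channels per pixel."""
    return np.einsum("oc,bchw->bohw", w, x)


def dual_path_branch(x, w_dw, w_pw, w_direct):
    """Depthwise separable path plus direct 1x1 path, fused by element-wise addition."""
    detail = pointwise_conv(depthwise_conv3x3(x, w_dw), w_pw)
    direct = pointwise_conv(x, w_direct)
    return detail + direct


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 8))           # one branch with 4 channels
fused = dual_path_branch(
    x,
    w_dw=rng.normal(size=(4, 3, 3)),
    w_pw=rng.normal(size=(6, 4)),
    w_direct=rng.normal(size=(6, 4)),
)
print(fused.shape)
```

Both paths must produce the same number of output channels, otherwise the element-wise addition is not defined.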

[0026] Building upon this, the GCPCA module further introduces a lightweight convolution operation that expands and supplements the fused features, enriching the feature representation without significantly increasing the computational burden. Its output $Z_i$ is represented as:

$$Z_i = \mathrm{Conv}_{3 \times 3}(F_i)$$

where $F_i$ is the result of the element-wise addition of the two paths described above, $\mathrm{Conv}_{3 \times 3}$ represents a 3×3 convolution operation, and $Z_i$ indicates the output of the $i$-th feature branch.

[0027] This step can improve the discriminative ability of snow leopard targets in complex natural backgrounds while maintaining the lightweight characteristics of the model.

[0028] Finally, the output features of all feature branches are concatenated along the channel dimension and input into the ECA lightweight channel attention mechanism to adaptively adjust the importance of different channels, thereby obtaining an enhanced discriminative feature map. This enhances the response strength of snow leopard target-related features and suppresses background interference. Through the above design, the GCPCA module effectively improves the model's feature modeling ability and detection robustness under complex natural environments and resource-constrained deployment conditions while significantly reducing the number of parameters and computational complexity.
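The ECA step applied to the concatenated branch outputs can be sketched as follows. ECA, as commonly described, pools each channel to a single value, runs a small one-dimensional convolution across neighboring channels, and uses the Sigmoid of the result as a per-channel scale. The kernel values and tensor sizes below are hypothetical, and the kernel would be learned in a trained model.

```python
import numpy as np


def eca(x, kernel):
    """x: (B, C, H, W); kernel: (k,) with odd k, shared across all channels."""
    k = kernel.shape[0]
    c = x.shape[1]
    pooled = x.mean(axis=(2, 3))                           # global average pool -> (B, C)
    padded = np.pad(pooled, ((0, 0), (k // 2, k // 2)))    # zero-pad along the channel axis
    # 1D convolution across neighboring channels, with no dimensionality reduction.
    mixed = sum(kernel[i] * padded[:, i:i + c] for i in range(k))
    weights = 1.0 / (1.0 + np.exp(-mixed))                 # Sigmoid -> per-channel weight
    return x * weights[:, :, None, None]


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 4, 4))          # concatenated branch outputs (toy size)
y = eca(x, kernel=np.array([0.25, 0.5, 0.25]))
print(y.shape)
```

Because the same small kernel slides over all channels, the attention step adds only k weights regardless of the channel count.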

[0029] 202) SimAM, a parameter-free attention module, see Figure 4. This module statistically models the input enhanced discriminative feature map in the spatial dimension, constructing an energy function from the deviation between each feature response and its spatial mean to characterize the importance of features at different spatial locations. It adaptively redistributes the response intensity of each pixel in the feature map without introducing additional trainable parameters. After globally replacing C2f with GCPCA, introducing SimAM amplifies important pixel information, improving mAP@0.5 by 1.5% and precision by 1.7%, and thus enhances the model's ability to locate snow leopards.

Specifically: the enhanced feature map is input into the spatial pyramid pooling module (SPPF) for multi-scale contextual information aggregation, which expands the receptive field and fuses spatial information at different scales. Following SPPF, SimAM performs parameter-free self-attention modeling based on an energy function on the enhanced discriminative feature map. This adaptively adjusts the response intensity at different spatial locations to enhance the salience of the snow leopard target region and suppress interference from complex natural backgrounds, resulting in the spatially enhanced feature map. Let the enhanced discriminative feature map output by the SPPF be:

$$X \in \mathbb{R}^{B \times C \times H \times W}$$

where $B$ indicates the batch size, $C$ indicates the number of channels, and $H$ and $W$ represent the spatial dimensions of the feature map. The SimAM attention mechanism adaptively adjusts the response intensity at each spatial location in this feature map without introducing any learnable parameters.

[0030] First, the mean value of each channel of the input enhanced discriminative feature map is calculated over the spatial dimensions to characterize the global response level of that channel. The calculation is:

$$\mu_{b,c} = \frac{1}{H \cdot W} \sum_{i=1}^{H \cdot W} x_{b,c,i}$$

where $\mu_{b,c}$ indicates the mean of the $c$-th channel of the $b$-th sample over the spatial dimensions, $H$ is the height of the channel, $W$ is the width of the channel, and $x_{b,c,i}$ is the pixel value at the $i$-th spatial location of the $c$-th channel of the $b$-th sample. The mean reflects the overall energy distribution of the feature map.

[0031] Subsequently, the squared deviation between the feature value at each spatial location and its channel mean is calculated and denoted as the feature response, which measures how far that location departs from the overall feature distribution. Its calculation form is:

$$d_{b,c,i} = \left(x_{b,c,i} - \mu_{b,c}\right)^2$$

where $d_{b,c,i}$ is the feature response between the $i$-th pixel of the $c$-th channel of the $b$-th sample and its channel mean, $x_{b,c,i}$ is the pixel value at the $i$-th spatial location of the $c$-th channel of the $b$-th sample, and $\mu_{b,c}$ indicates the mean of the $c$-th channel of the $b$-th sample over the spatial dimensions.

[0032] This reflects the significant differences in feature representation across different spatial locations.

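[0033] Building upon this, the parameter-free attention module SimAM normalizes and constrains the feature responses by constructing an attention map based on an energy function. It can be represented as:

$$e_{b,c,i} = \frac{d_{b,c,i}}{4\,(v_{b,c} + \lambda)} + 0.5, \qquad v_{b,c} = \frac{1}{H \cdot W - 1} \sum_{j=1}^{H \cdot W} d_{b,c,j}$$

where $d_{b,c,i}$ is the squared deviation of the $i$-th spatial location from its channel mean, $v_{b,c}$ is the spatial variance of the $c$-th channel of the $b$-th sample, and $\lambda$ is a smoothing coefficient used to enhance numerical stability, preventing numerical instability caused by an excessively small denominator.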

[0034] Finally, the output of the above energy function is mapped to the [0,1] interval using the Sigmoid activation function and then multiplied element-wise with the enhanced discriminative feature map to obtain the weighted spatially enhanced feature map. Its calculation form is:

$$\tilde{X} = \sigma(E) \odot X$$

where $E$ is the attention map produced by the energy function, $X$ is the enhanced discriminative feature map, $\sigma(\cdot)$ represents the Sigmoid activation function, and $\odot$ indicates element-wise multiplication.
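The four SimAM steps described above (channel mean, squared deviation, energy-based attention map, Sigmoid weighting) can be sketched in NumPy. The energy expression used here is the commonly published SimAM form, with the variance computed over H·W − 1 locations. Treating this as the patent's exact expression is an assumption, as is the default value of the smoothing coefficient `lam`.

```python
import numpy as np


def simam(x, lam=1e-4):
    """Parameter-free attention on a (B, C, H, W) array (illustrative sketch)."""
    _, _, h, w = x.shape
    n = h * w - 1                                    # normalizer for the spatial variance
    mu = x.mean(axis=(2, 3), keepdims=True)          # per-channel spatial mean
    d = (x - mu) ** 2                                # squared deviation at each location
    var = d.sum(axis=(2, 3), keepdims=True) / n      # per-channel spatial variance
    energy = d / (4.0 * (var + lam)) + 0.5           # larger for more distinctive locations
    attention = 1.0 / (1.0 + np.exp(-energy))        # Sigmoid maps the energy into (0, 1)
    return x * attention                             # element-wise reweighting


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 16, 20, 20))   # toy feature map standing in for the SPPF output
y = simam(x)
print(y.shape)
```

No weights are learned in this step: the attention map is computed entirely from statistics of the input feature map, which is why the module adds no parameters.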

[0035] The spatially enhanced feature map is then input into the YOLOv8 feature fusion network and detection head, where the classification branch and the bounding box regression branch output the snow leopard category prediction, the target confidence, and the corresponding bounding box position, giving the final snow leopard detection result. The target detection model in this embodiment is built on the YOLOv8 network structure. Its backbone network adopts a lightweight convolution and feature fusion design, which reduces the model parameter scale and computational complexity while preserving the ability to express multi-scale features, thereby meeting the requirements of real-time snow leopard detection in field research and on low-computing-power monitoring equipment.

[0036] Through the self-attention modeling process based on the energy function described above, SimAM can adaptively enhance the feature response of the snow leopard target region without introducing additional parameters and computational overhead. At the same time, it can effectively suppress the interference of background texture, noise and redundant information in complex natural environments, thereby improving the accuracy and robustness of snow leopard target detection while maintaining the model's lightweight and real-time performance.

[0037] S3. Loss function design: The loss function adopts the multi-task joint loss already validated in the original YOLOv8 framework, jointly optimizing target localization, category discrimination, and target confidence. This loss has shown good stability and convergence in a large number of general and complex-scene target detection tasks, balancing detection accuracy and robustness while preserving training efficiency. The core improvements are made at the network-structure level: a lightweight attention mechanism and a lightweight feature extraction module strengthen the feature representation of low-contrast snow leopard targets without changing the output format or prediction logic of the detection head. Retaining the original YOLOv8 loss function therefore keeps the training process stable and reproducible and avoids the additional hyperparameter tuning cost and training uncertainty that redesigning the loss terms would introduce. Experimental results show that, with the loss function unchanged, the proposed structural improvements take full effect and improve snow leopard detection performance in complex backgrounds and low-discrimination scenes, confirming that the loss function is well matched to the proposed method and has practical engineering value. Specifically, the detection task is jointly optimized using the multi-task joint loss function built into YOLOv8, which consists of a classification loss, a bounding box regression loss, and a distribution regression loss.

The classification loss uses binary cross-entropy (Binary Cross Entropy, BCE) loss to measure the difference between the predicted class probability and the true label. Its mathematical expression is:

$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(p_i) + (1-y_i)\log(1-p_i)\right]$$

where $N$ is the total number of categories; $y_i$ is the true label of the $i$-th category in each prediction box; $y_i\log(p_i)$ is the positive-class term, which measures the model's ability to predict positive samples; $(1-y_i)\log(1-p_i)$ is the negative-class term, which measures the model's ability to predict negative samples; and $p_i$ is the predicted probability of each prediction box for the $i$-th category.

The bounding box regression loss uses the Complete Intersection over Union (CIoU) loss, which considers the overlap between the predicted bounding box and the ground truth bounding box while introducing center point distance and aspect ratio consistency constraints. Its mathematical expression is:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

where $b$ is the center point of the predicted bounding box; $b^{gt}$ is the center point of the ground truth bounding box; $IoU$ is the intersection over union of the ground truth and predicted bounding boxes; $\rho(b, b^{gt})$ is the Euclidean distance between the center points of the predicted and ground truth bounding boxes; $c$ is the diagonal length of the smallest box enclosing both the ground truth and the predicted bounding box; $v$ measures the consistency of the aspect ratios of the two boxes; and $\alpha$ is the balance coefficient.

The distribution regression loss uses the Distribution Focal Loss (DFL) to model the discrete distribution of bounding box locations. Its mathematical expression is:

$$L_{DFL} = -\sum_{j=1}^{n} Y_j\log(P_j)$$

where $n$ is the number of discrete regression intervals, $P_j$ is the predicted probability distribution, and $Y_j$ are the discrete labels corresponding to the actual locations.

The three loss functions above are weighted and summed to form the overall multi-task joint loss function, achieving joint optimization of category discrimination accuracy and bounding box localization accuracy while keeping the original YOLOv8 training strategy unchanged.
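The three loss terms named above can be illustrated with a small plain-Python sketch. It follows the commonly published forms of BCE, CIoU, and DFL rather than the exact expressions of this embodiment. The function names, the `(x1, y1, x2, y2)` box layout, the `eps` stabilizer, the two-neighbouring-bin form of DFL, and the weights passed to `joint_loss` are all assumptions made for illustration; the source does not state the weighting coefficients.

```python
import math


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over per-class probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


def ciou_loss(box, gt, eps=1e-7):
    """CIoU loss for two boxes given as (x1, y1, x2, y2) (assumed layout)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + gw * gh - inter + eps)
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(x2, gx2) - min(x1, gx1)) ** 2 + (max(y2, gy2) - min(y1, gy1)) ** 2 + eps
    # Squared distance between the two box centres.
    rho2 = ((x1 + x2 - gx1 - gx2) ** 2 + (y1 + y2 - gy1 - gy2) ** 2) / 4.0
    # Aspect-ratio consistency term and its balance coefficient.
    v = (4.0 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - iou + rho2 / c2 + alpha * v


def dfl_loss(probs, target, eps=1e-7):
    """DFL for one box side: probs is a distribution over bins 0..n-1 and
    target is a continuous distance in [0, n-1]."""
    left = min(int(math.floor(target)), len(probs) - 2)
    right = left + 1
    w_left, w_right = right - target, target - left
    return -(w_left * math.log(probs[left] + eps) + w_right * math.log(probs[right] + eps))


def joint_loss(l_cls, l_box, l_dfl, w_cls, w_box, w_dfl):
    """Weighted sum of the three terms; the weights are caller-supplied placeholders."""
    return w_cls * l_cls + w_box * l_box + w_dfl * l_dfl


l_cls = bce_loss([0.9, 0.2], [1.0, 0.0])
l_box = ciou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
l_dfl = dfl_loss([0.1, 0.2, 0.6, 0.1], 2.3)
total = joint_loss(l_cls, l_box, l_dfl, w_cls=1.0, w_box=1.0, w_dfl=1.0)
```

The CIoU term stays informative even for non-overlapping boxes, because the centre-distance ratio still changes as the boxes move closer together.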

[0038] S4. Training and optimization: The snow leopard target detection model is trained and optimized by minimizing the YOLOv8 multi-task joint loss function, jointly optimizing target classification, target localization, and target confidence, with the network parameters updated by the backpropagation algorithm. Training is considered complete, and the model parameters are saved, when the number of training epochs reaches a preset threshold or the overall loss converges to within a set range. The mean average precision over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95) is selected as the detection accuracy metric, and the number of parameters, the floating-point computation (GFLOPs), and the model size are selected as the model lightweighting metrics. Specifically: to ensure the stability and convergence efficiency of training, the initial learning rate is set to 0.01, and a dynamic learning rate adjustment strategy regulates the training process so that the model remains in a good optimization state at each training stage. The optimizer is stochastic gradient descent (SGD) with a momentum mechanism to accelerate convergence, together with a weight decay term to constrain model complexity and reduce the risk of overfitting. Because snow leopard targets in complex natural environments are subject to strong background interference and low target-background differentiation, the target detection model gradually strengthens its ability to discriminate key target features over multiple rounds of iterative training.
During training, model performance is periodically evaluated on the validation set, and targeted training and fine-tuning gradually adapt the network parameters to the feature distribution of snow leopard targets in complex natural environments, which is characterized by high similarity between target and background, severe occlusion, and significant scale changes. Precision, Recall, mAP@0.5, and mAP@0.5:0.95 are used as performance indicators, and the model parameters with the best detection performance are saved, ensuring that the final model has good detection accuracy and robustness in complex wild scenes.
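The parameter update used in training can be sketched as a single SGD step with momentum and weight decay. This is a minimal plain-Python illustration, not the training code of this embodiment: the function name `sgd_momentum_step` is hypothetical, the update convention (weight decay folded into the gradient, velocity accumulated as `momentum * v + g`) is one common choice and is assumed here, and the momentum and weight decay defaults are the values recited in claim 5. The dynamic learning rate schedule is not specified in the source and is therefore omitted.

```python
def sgd_momentum_step(params, grads, velocity,
                      lr=0.01, momentum=0.937, weight_decay=0.0005):
    """Return updated (params, velocity) after one SGD step.

    params, grads, and velocity are flat lists of floats of equal length.
    """
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        g = g + weight_decay * p   # weight decay acts as an L2 penalty gradient
        v = momentum * v + g       # accumulate the momentum buffer
        new_params.append(p - lr * v)
        new_velocity.append(v)
    return new_params, new_velocity


# One step on a single parameter with gradient 0.5 and an empty momentum buffer.
params, velocity = sgd_momentum_step([1.0], [0.5], [0.0])
```

With these values the effective gradient is 0.5 + 0.0005 × 1.0 = 0.5005, so the parameter moves from 1.0 to 1.0 − 0.01 × 0.5005 = 0.994995.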

[0039] As shown in Figure 5, this embodiment achieves 95.7% precision, 87.4% recall, 94.1% mAP@0.5, and 65.4% mAP@0.5:0.95 on the test set, improvements of 5.1%, 3.5%, 2.3%, and 0.9%, respectively, over the baseline model. At the same time, the model parameter count is reduced from 3.01M to 2.00M, decreasing the model size and computational overhead. The improved method proposed in this invention thus balances detection accuracy, model lightweighting, and practical deployment feasibility, making it suitable for efficient and stable detection of snow leopard targets in complex field environments and meeting practical application requirements.
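The size of the parameter reduction can be checked with a short calculation. The snippet below uses only the two parameter counts reported above; the variable names are illustrative.

```python
baseline_params_m = 3.01   # baseline parameter count, in millions
improved_params_m = 2.00   # parameter count of this embodiment, in millions

# Relative reduction with respect to the baseline model.
reduction = (baseline_params_m - improved_params_m) / baseline_params_m
print(f"Relative parameter reduction: {reduction:.1%}")  # prints 33.6%
```

In other words, the improved model keeps roughly two thirds of the baseline parameters.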

[0040] Although the lightweight target detection method based on YOLOv8-GS provided in this embodiment is verified using the snow leopard as an example, its core GCPCA feature extraction logic and SimAM attention enhancement mechanism are designed around two common technical requirements: handling low-contrast camouflaged targets and operating in computationally limited environments. Therefore, those skilled in the art will understand that this method is also applicable to wild animals with similar visual characteristics, including but not limited to lynx, blue sheep, and Pallas's cat.

[0041] In summary, this method addresses the challenges of detecting wild animals in complex natural environments, such as low target-background differentiation, diverse posture changes, significant scale variations, and frequent occlusion. Designed to meet the practical needs of lightweight models, low computational consumption, and real-time response in field research and long-term monitoring scenarios, it significantly reduces the number of model parameters and computational complexity while maintaining stable detection capabilities for wild animal targets. This enables rapid and accurate identification of wild animal targets in complex backgrounds and at multiple scales, providing efficient and reliable intelligent technical support for rapid field patrols, portable monitoring equipment, and plateau ecological surveys.

[0042] Example 2: Based on the same inventive concept, this application also provides a lightweight wildlife target detection device that can be used to implement the method described in the above embodiments. The device specifically includes the following units. The data acquisition unit is used to acquire images of a given wild animal and construct an image dataset: the wild animal images cover several scenes, and each wild animal image is annotated with a bounding box whose annotation includes the location information of the corresponding wild animal. The target detection model construction unit is used to rebuild the backbone network of the YOLOv8 detection framework in a lightweight manner, replacing the original feature extraction module C2f with the grouped convolutional channel feature extraction module GCPCA to enhance the input feature map and obtain an enhanced discriminative feature map. The attention unit is used to introduce the parameterless attention module SimAM after the spatial pyramid pooling module SPPF, highlighting key features of the wildlife target area and suppressing background interference information to obtain a spatially enhanced feature map. The training unit is used to input the constructed image dataset into the target detection model for training and obtain a trained target detection model for detecting the corresponding wild animal.

[0043] Preferably, embodiments of this application also provide a specific implementation of an electronic device capable of implementing all steps of the lightweight wildlife target detection method described above. The electronic device specifically includes a processor, a memory, a communication interface, and a bus. The processor, the memory, and the communication interface communicate with each other via the bus; the communication interface is used for information transmission between server-side devices, metering devices, and user-side devices.

[0044] The processor is used to call a computer program in memory, and when the processor executes the computer program, it implements all the steps in the lightweight wildlife target detection method in the above embodiments.

[0045] Embodiments of this application also provide a computer-readable storage medium capable of implementing all steps of the lightweight wildlife target detection method in the above embodiments. The computer-readable storage medium stores a computer program that, when executed by a processor, implements all steps of the lightweight wildlife target detection method in the above embodiments.

[0046] The various embodiments in this specification are described in a progressive manner. Identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the hardware-plus-program embodiments are described relatively briefly because they are fundamentally similar to the method embodiments; for relevant parts, refer to the descriptions of the method embodiments.

[0047] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0048] While this application provides method operation steps as shown in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps listed in the embodiments is merely one possible execution order among many and does not represent the only execution order. In actual device or client product execution, the method can be executed in the order shown in the embodiments or drawings or in parallel (e.g., in a parallel processor or multi-threaded processing environment).

[0049] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0050] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0051] These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0052] This invention is not limited to the embodiments described above. The above description of specific embodiments is intended to illustrate and explain the technical solutions of this invention. The specific embodiments described above are merely illustrative and not restrictive. Without departing from the spirit and scope of the claims, those skilled in the art can make many specific modifications based on the teachings of this invention, and these modifications all fall within the scope of protection of this invention.

Claims

1. A lightweight wildlife target detection method, characterized in that it comprises: collecting images of a given wild animal and constructing an image dataset, wherein the wild animal images cover several scenes and each wild animal image is annotated with a bounding box whose annotation includes the location information of the corresponding wild animal; constructing a target detection model based on the YOLOv8 detection framework, wherein the backbone network is rebuilt with a lightweight structure and the original feature extraction module C2f is replaced with the grouped convolutional channel feature extraction module GCPCA to enhance the input feature map and obtain an enhanced discriminative feature map; introducing a parameterless attention module SimAM after the spatial pyramid pooling module SPPF to highlight key features of the wildlife target area and suppress background interference information, thereby obtaining a spatially enhanced feature map; and inputting the constructed image dataset into the target detection model for training, the trained target detection model being used to detect the corresponding wild animal.

2. The lightweight wildlife target detection method according to claim 1, characterized in that the grouped convolutional channel feature extraction module GCPCA uses a channel-group-based feature modeling approach to represent the input feature map. First, the input feature map is grouped along the channel dimension with a fixed stride, and each group forms a feature branch. Within each feature branch, a dual-path structure is constructed, consisting of a depthwise separable convolutional path and a directly connected convolutional path: the depthwise separable convolutional path extracts local spatial detail information through channel-wise 3×3 depthwise convolution and combines it with 1×1 pointwise convolution to achieve cross-channel information fusion, while the directly connected convolutional path maps the input feature map of the feature branch directly through a 1×1 convolution. The outputs of the depthwise separable convolutional path and the directly connected convolutional path are then added element-wise to obtain the fused feature map of the feature branch, and a lightweight convolution operation is introduced to expand and supplement the fused feature map. Finally, the output feature maps of all feature branches are concatenated along the channel dimension and input into the lightweight channel attention mechanism ECA, which adaptively adjusts the importance of different channels to obtain the enhanced discriminative feature map.

3. The lightweight wildlife target detection method according to claim 1, characterized in that the parameterless attention module SimAM performs parameterless self-attention modeling of the feature map based on an energy function and adaptively adjusts the response intensity at each spatial location in the enhanced discriminative feature map. First, the mean of each channel over the spatial dimension is calculated for the input enhanced discriminative feature map. Then, the squared deviation between the feature value at each spatial location and its channel mean is calculated. On this basis, an attention map based on the energy function is constructed to normalize and constrain the feature response. Finally, the energy function output is mapped to the [0,1] interval through the Sigmoid activation function and multiplied element-wise with the input enhanced discriminative feature map to obtain the weighted final output feature map.

4. The lightweight wildlife target detection method according to claim 1, characterized in that, during the training phase of the target detection model, the multi-task joint loss function built into YOLOv8 is used to jointly optimize the detection task. The multi-task joint loss function is a weighted sum of a classification loss, a bounding box regression loss, and a distribution regression loss, wherein the classification loss adopts binary cross-entropy (BCE) loss to measure the difference between the predicted class probability and the true label; the bounding box regression loss adopts Complete Intersection over Union (CIoU) loss and introduces center point distance and aspect ratio consistency constraints; and the distribution regression loss adopts Distribution Focal Loss (DFL) to model the discrete distribution of the bounding box position.

5. The lightweight wildlife target detection method according to claim 1, characterized in that, during the training phase of the target detection model, a stochastic gradient descent optimizer is used to update the model parameters; the initial learning rate is set to 0.01, the momentum coefficient to 0.937, and the weight decay coefficient to 0.0005.

6. A lightweight wildlife target detection device, based on the lightweight wildlife target detection method according to any one of claims 1 to 5, characterized in that it comprises: a data acquisition unit, used to acquire images of a given wild animal and construct an image dataset, wherein the wild animal images cover several scenes and each wild animal image is annotated with a bounding box whose annotation includes the location information of the corresponding wild animal; a target detection model construction unit, used to rebuild the backbone network of the YOLOv8 detection framework in a lightweight manner, replacing the original feature extraction module C2f with the grouped convolutional channel feature extraction module GCPCA to enhance the input feature map and obtain an enhanced discriminative feature map; an attention unit, used to introduce a parameterless attention module SimAM after the spatial pyramid pooling module SPPF, highlighting key features of the wildlife target area and suppressing background interference information to obtain a spatially enhanced feature map; and a training unit, used to input the constructed image dataset into the target detection model for training and obtain a trained target detection model for detecting the corresponding wild animal.

7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the lightweight wildlife target detection method according to any one of claims 1 to 5.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the steps of the lightweight wildlife target detection method according to any one of claims 1 to 5.
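As an illustration of why the dual-path branch recited in claim 2 is lightweight, the following plain-Python sketch counts convolution weights for one possible configuration. It is a rough estimate under stated assumptions, not the layer configuration of this application: the channel count (64) and group count (4) are hypothetical, each branch is assumed to keep its channel count unchanged, and bias terms, normalization layers, the additional lightweight convolution, and the ECA module are left out. All function names are illustrative.

```python
def dense_conv_params(c_in, c_out, k=3):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dual_path_branch_params(c_group):
    """Weights of one feature branch with c_group input and output channels."""
    depthwise = 3 * 3 * c_group      # one 3x3 filter per channel
    pointwise = c_group * c_group    # 1x1 convolution fusing channels
    direct = c_group * c_group       # 1x1 convolution on the direct path
    return depthwise + pointwise + direct


def grouped_dual_path_params(channels, groups):
    """Total weights when the channels are split evenly into `groups` branches."""
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    return groups * dual_path_branch_params(channels // groups)


grouped = grouped_dual_path_params(channels=64, groups=4)
dense = dense_conv_params(64, 64)
print(grouped, dense)  # prints 2624 36864
```

Under these assumptions the grouped dual-path branches use about one fourteenth of the weights of a single dense 3×3 convolution over the same 64 channels.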