A lightweight target detection method for strawberry ripeness detection
By combining a GhostHGNetV2 backbone network, a SlimNeck neck network, LAMP pruning, and an MSD multi-scale feature knowledge distillation strategy, a lightweight target detection network is designed collaboratively. This addresses the large parameter count and high computational complexity of strawberry maturity detection models, enabling real-time, high-precision detection on resource-constrained terminal devices.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XUZHOU NORMAL UNIVERSITY
- Filing Date
- 2026-05-05
- Publication Date
- 2026-06-12
AI Technical Summary
Existing methods for detecting strawberry maturity suffer from a large number of model parameters, high computational complexity, slow inference speed, and difficulty of deployment on resource-constrained terminal devices. Furthermore, detection accuracy decreases after lightweighting.
We employ the GhostHGNetV2 backbone network, SlimNeck neck network, LAMP pruning strategy, and MSD multi-scale feature knowledge distillation strategy to collaboratively design a lightweight object detection network, reducing the number of model parameters and computational cost while maintaining detection accuracy.
Strawberry ripeness detection can be deployed in real time on resource-constrained terminal devices with improved detection accuracy and inference speed, making it suitable for smart harvesting applications in natural agricultural scenarios.
Figure CN122200366A_ABST
Description
Technical Field
[0001] This invention belongs to the field of computer vision and smart agriculture technology, specifically relating to a lightweight target detection method for strawberry maturity detection.
Background Technology
[0002] With the rapid development of facility agriculture and large-scale planting, strawberry harvesting operations have placed higher demands on the accuracy, real-time performance, and automation of maturity identification. Traditional methods relying on manual visual judgment of strawberry maturity suffer from low efficiency, strong subjectivity, and high labor costs. Furthermore, strawberry maturity is closely related to visual characteristics such as fruit color, texture, luster, and shape. Improper harvesting timing can easily lead to a decline in fruit quality and increased transportation losses. Therefore, there is an urgent need for a precise, rapid, and non-destructive strawberry maturity detection technology.
[0003] In recent years, with the development of smart agriculture and computer vision technology, deep learning-based object detection methods have been gradually applied to strawberry ripeness detection tasks. These methods can extract target features under agricultural conditions such as natural lighting, complex backgrounds, and fruit occlusion, providing technical support for automated harvesting and grading operations. Among them, the YOLO series models are widely used in agricultural object detection due to their fast detection speed and high accuracy, especially the YOLOv8 model, which has a good foundation for application in object detection tasks.
[0004] However, existing strawberry maturity detection methods still have the following shortcomings: On the one hand, improved methods for robustness in complex environments usually enhance feature representation by designing multi-branch structures, attention mechanisms, or multi-module stacking. While these methods can improve detection performance to some extent, they also lead to problems such as increased parameter count, increased computational complexity, and decreased inference speed, making it difficult to meet the real-time deployment requirements of resource-constrained platforms such as agricultural robots and edge computing modules. On the other hand, some lightweight methods mainly rely on backbone network replacement, module pruning, or structural compression to reduce model size. Although this improves deployment efficiency, it can easily lead to a decrease in feature representation ability, thereby affecting the accuracy of fine-grained recognition tasks such as strawberry maturity detection.
[0005] Furthermore, existing technologies mostly focus on single-level improvements, lacking coordinated consideration of multiple aspects such as lightweight backbone network design, neck feature fusion optimization, network pruning and compression, and distillation compensation recovery. A systematic technical solution that balances detection accuracy, computational efficiency, and deployment feasibility has not yet been formed. In particular, research on stable deployment in natural agricultural scenarios, especially on platforms with more limited computing and storage resources such as Raspberry Pi, remains relatively insufficient, making it difficult to meet the comprehensive requirements of practical intelligent harvesting equipment for low power consumption, high frame rate, and high accuracy.
[0006] Therefore, it is necessary to propose a lightweight target detection method for strawberry maturity detection, so as to effectively reduce the number of model parameters and computational load while ensuring detection accuracy, and improve the real-time deployment capability of the model on resource-constrained terminal devices, thereby meeting the actual needs of automatic strawberry maturity detection and intelligent harvesting applications.
Summary of the Invention
[0007] To address the problems of existing strawberry maturity detection models, such as large parameter count, high computational complexity, slow inference speed, difficulty in deployment on resource-constrained devices, and decreased detection accuracy after model lightweighting, this invention provides a lightweight target detection method for strawberry maturity detection. This method collaboratively designs the backbone network, neck feature fusion structure, model pruning strategy, and knowledge distillation strategy of the target detection network. While maintaining the accuracy of strawberry maturity detection, it effectively reduces the number of model parameters and computational load, improves the model's real-time detection capability and edge deployment capability, and is therefore suitable for strawberry maturity recognition and intelligent harvesting applications in natural agricultural scenarios.
[0008] To achieve the above-mentioned objectives, the present invention adopts the following technical solution:
[0009] A lightweight target detection method for strawberry maturity detection includes the following steps:
[0010] Step 1: Obtain strawberry images and preprocess them to obtain model input images; construct a strawberry maturity detection dataset from the preprocessed strawberry images for model training, validation, and testing.
[0011] Step 2: Construct an improved target detection network based on the YOLOv8n network architecture. The target detection network includes a GhostHGNetV2 backbone network, a SlimNeck neck network, and a detection head. The GhostHGNetV2 backbone network is used to extract multi-scale features of the strawberry target, and the SlimNeck neck network is used to perform lightweight fusion of the multi-scale features.
[0012] Step 3: Input the model input image into the GhostHGNetV2 backbone network to extract multi-scale features corresponding to strawberry flowers, immature fruits and ripe fruits. The GhostHGNetV2 backbone network enhances feature reuse capability and reduces parameter count and computational complexity through a lightweight convolutional structure.
[0013] Step 4: Input the multi-scale features output by the GhostHGNetV2 backbone network into the SlimNeck network for feature fusion to obtain fused features; then input the fused features into the detection head to output the category and location information of the strawberry target. The SlimNeck network improves the efficiency of multi-scale feature interaction and reduces the computational overhead of the feature fusion stage through a lightweight feature fusion structure.
[0014] Step 5: The LAMP pruning strategy is used to prune the target detection network constructed in Step 2. Redundant channels are pruned according to the importance of parameters in each layer to obtain the pruned network, so as to further reduce the number of parameters and computation.
[0015] Step 6: Using the unpruned target detection network as the teacher model and the pruned network obtained in Step 5 as the student model, the MSD multi-scale feature knowledge distillation strategy is used to distill and train the student model to compensate for the loss of feature representation ability caused by pruning, and a lightweight target detection model after distillation is obtained.
[0016] Step 7: Use the distilled lightweight target detection model to detect the strawberry image to be tested, and output the detection results of strawberry flowers, immature fruits and ripe fruits, thereby realizing the strawberry maturity detection.
[0017] Furthermore, in step 1, the strawberry image is a three-channel RGB image; the preprocessing includes unifying the image size and normalizing the pixels to adapt to the input requirements of the target detection network; preferably, the image is scaled to 640×640 pixels.
[0018] Furthermore, in step 2, the GhostHGNetV2 backbone network is built on HGNetv2 as the base network, including HGStem and multiple Ghost_HGBlocks; the Ghost_HGBlocks retain the hierarchical feature aggregation structure of HGBlocks and adopt GhostConv convolution operation to reduce network redundant computation and enhance feature reuse capability.
[0019] Furthermore, the GhostConv process includes the following steps: First, the input feature map is convolved using standard convolution to obtain intrinsic features; then, the intrinsic features are mapped using a cheap linear transformation to generate ghost features; finally, the intrinsic features and the ghost features are concatenated to obtain output features; wherein the cheap linear transformation includes at least a depthwise convolution.
[0020] Furthermore, in step 4, the SlimNeck network consists of VoVGSCSPns and GSConvns. Specifically, multi-scale features output from the GhostHGNetV2 backbone network are subjected to channel compression and mapping to give features of different scales a unified channel dimension. VoVGSCSPns is used to perform cross-stage splitting and fusion of the features with the unified channel dimension. GSConvns is used to perform lightweight downsampling of the feature map and further fuse it with high-level features, thereby improving feature fusion efficiency and reducing the computational overhead of the fusion stage.
[0021] Furthermore, VoVGSCSPns performs split processing on the input features, with one branch performing deep feature transformation and the other branch directly passing through cross-stage bypass. The outputs of each branch are concatenated in the channel dimension and then fused through convolution. GSConvns uses a combination of group convolution and pointwise convolution to achieve downsampling.
[0022] Furthermore, in step 5, the LAMP pruning strategy includes the following process: statistically analyzing the channel-level weight magnitudes of the prunable convolutional layers in the object detection network; calculating a normalized importance score within each layer based on the relative contribution of the weights in that layer; pruning the channels of each layer based on the normalized importance score, deleting channels with lower importance; and fine-tuning the pruned network to restore its detection performance.
[0023] Furthermore, in step 6, the MSD multi-scale feature knowledge distillation strategy includes the following process: extracting multiple scale feature maps of the teacher model and the student model in the neck network output stage; aligning and mapping the feature maps of the student model so that they are in the same feature space as the feature maps of the teacher model at the corresponding scales; constructing a multi-scale feature distillation loss based on the differences between the teacher model and the student model in the feature maps at each corresponding scale, and using the multi-scale feature distillation loss to train the student model.
[0024] Furthermore, the multi-scale feature distillation loss is constructed using the mean squared error loss function, and the total loss function of the student model is obtained by weighted summation of the target detection loss and the multi-scale feature distillation loss.
[0025] Compared with the prior art, the present invention has the following beneficial effects:
[0026] This invention designs the GhostHGNetV2 backbone network in the YOLOv8n object detection network and utilizes GhostConv convolution operations to effectively reduce the number of network parameters and computational cost while ensuring feature extraction capabilities, thereby improving the lightweight nature of the model.
[0027] This invention utilizes a SlimNeck network composed of VoVGSCSPns and GSConvns to perform lightweight fusion of multi-scale features. This reduces computational overhead in the feature fusion stage while enhancing information interaction between features at different scales, thereby improving the detection accuracy of strawberry flowers, immature fruits, and ripe fruits.
[0028] This invention employs the LAMP pruning strategy to perform channel pruning on the target detection network. This strategy can adaptively prune redundant channels while maintaining the effectiveness of the model structure, further reducing model complexity and improving the feasibility of deploying the model on resource-constrained terminal devices.
[0029] This invention employs the MSD multi-scale feature knowledge distillation strategy to distill and train the pruned network, which can compensate for the decrease in feature representation ability caused by pruning, improve the detection performance of the lightweight model, and maintain high detection accuracy while compressing the model size.
[0030] This invention makes synergistic improvements in multiple aspects, including lightweight backbone network design, neck feature fusion optimization, model pruning and compression, and knowledge distillation compensation and recovery, forming a systematic technical solution that balances detection accuracy, inference speed, and deployment efficiency. It is suitable for real-time strawberry ripeness detection and intelligent picking scenarios on resource-constrained terminal devices such as Raspberry Pi, and has good practical application value.
Attached Figure Description
[0031] Figure 1 is a flowchart of a lightweight target detection method for strawberry maturity detection according to the present invention;
[0032] Figure 2 is a schematic diagram of the lightweight target detection model structure based on YOLOv8n improvement of the present invention;
[0033] Figure 3 is a schematic diagram of the GhostHGNetV2 backbone network of the present invention;
[0034] Figure 4 is a schematic diagram of the structure of the SlimNeck network of the present invention;
[0035] Figure 5 is a schematic diagram of the LAMP pruning training process of the present invention;
[0036] Figure 6 is a schematic diagram of the MSD multi-scale feature knowledge distillation training process of the present invention.
Detailed Implementation
[0037] To make the technical problems, technical solutions, and beneficial effects of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of protection of this invention. The processes, conditions, experimental methods, etc., for implementing this invention, except as specifically mentioned below, are all common knowledge and general knowledge in the field, and this invention does not have any particular limitations.
[0038] This invention provides a lightweight target detection method for strawberry maturity detection, mainly targeting the detection tasks of strawberry flowers, immature fruits and mature fruits in natural agricultural scenarios. By synergistically improving the backbone network, neck feature fusion structure, model pruning strategy and knowledge distillation strategy of the target detection network, the method reduces the number of model parameters and computational load while ensuring detection accuracy, and improves the real-time deployment capability of the model on resource-constrained terminal devices.
[0039] As shown in Figure 1, the lightweight target detection method for strawberry maturity detection in this embodiment includes the following steps:
[0040] Step 1: Obtain strawberry images and construct a strawberry maturity detection dataset.
[0041] Specifically, strawberry images are collected in a natural agricultural setting. These images include strawberry flowers, unripe fruit, and ripe fruit. The collection scenario may include different lighting conditions, different shooting angles, leaf occlusion, fruit overlap, and background interference. The collected strawberry images are labeled to obtain target bounding boxes and category labels. The labeled image data is then divided into training, validation, and test sets according to a preset ratio for subsequent model training, validation, and testing.
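The division of labeled images by a preset ratio can be sketched as follows. This is a minimal illustration only: the 8:1:1 ratio, the fixed random seed, the file names, and the function name `split_dataset` are assumptions, since the text does not specify them.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle labeled samples and split them into train/val/test subsets.

    The 8:1:1 default ratio and the seed are illustrative assumptions.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical file names standing in for labeled strawberry images.
samples = [f"img_{i:04d}.jpg" for i in range(100)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three subsets reproducible across runs, which matters when the pruned and distilled models are later compared on the same test set.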
[0042] In this embodiment, the strawberry image is preferably a three-channel RGB image. To meet the input requirements of the object detection network, the image is preprocessed, including image scaling and pixel normalization. Preferably, the input image is uniformly scaled to 640×640 pixels to facilitate subsequent network training and inference. To improve the model's adaptability to complex agricultural scenes, online data augmentation can also be performed on the image during the training phase. The online data augmentation includes at least one of random horizontal flipping, random rotation, random scaling, brightness perturbation, saturation perturbation, contrast perturbation, and Mosaic data augmentation.
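The scaling and pixel normalization step might look like the sketch below. It is pure Python for readability; nearest-neighbour sampling and division by 255 are assumptions, as the text only states that images are scaled to 640×640 and normalized. The name `preprocess` is hypothetical.

```python
def preprocess(image, size=640):
    """Resize an RGB image to size x size and scale pixels to [0, 1].

    `image` is a nested list of rows, each row a list of (R, G, B) tuples
    with 8-bit values. Nearest-neighbour sampling and division by 255 are
    illustrative assumptions, not details taken from the source.
    """
    src_h, src_w = len(image), len(image[0])
    out = []
    for y in range(size):
        src_y = min(src_h - 1, y * src_h // size)
        row = []
        for x in range(size):
            src_x = min(src_w - 1, x * src_w // size)
            r, g, b = image[src_y][src_x]
            row.append((r / 255.0, g / 255.0, b / 255.0))
        out.append(row)
    return out


# A tiny 2x2 dummy image: red, green / blue, white.
dummy = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
tensor = preprocess(dummy, size=640)
print(len(tensor), len(tensor[0]), tensor[0][0])
```

In practice an image library would handle the resize; the point here is only that every input reaches the network with the same spatial size and value range.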
[0043] Step 2: Construct a lightweight target detection network based on YOLOv8n.
[0044] As shown in Figure 2, this invention uses the YOLOv8n object detection network as the base network and performs lightweight improvements to obtain an object detection network suitable for strawberry maturity detection tasks. The object detection network includes a GhostHGNetV2 backbone network, a SlimNeck network, and a detection head. The GhostHGNetV2 backbone network is used to extract multi-scale features from the input image, the SlimNeck network is used to achieve lightweight fusion between multi-scale features, and the detection head is used to output the category and location information of the strawberry target.
[0045] Step 3: Extract multi-scale features of the strawberry target using the GhostHGNetV2 backbone network.
[0046] As shown in Figure 3, the GhostHGNetV2 backbone network is built on the HGNetv2 base network and includes HGStem and multiple Ghost_HGBlocks. It outputs feature maps at three or more scales to represent target information at different resolutions. Compared with traditional backbone networks, this invention reduces the number of network parameters and computational complexity, and improves feature reuse capability, by employing GhostConv convolution operations while retaining the hierarchical feature extraction structure of HGNetv2.
[0047] The GhostConv module first extracts a set of intrinsic features through a small number of standard convolutions. These features contain the main semantic information. Then, it uses low-cost operations such as depthwise convolution to linearly map the intrinsic features, generating more ghost features to supplement and expand the feature representation. Finally, the two sets of features are concatenated to form a complete output.
[0048] Let the input feature map be $X$. Traditional convolution directly generates the output feature map $Y$, which can be represented as:
[0049] $$Y = X * W$$
[0050] where $*$ represents the convolution operation and $W$ denotes the convolution kernel parameters.
[0051] GhostConv does not generate all output features directly but proceeds in two steps. First, it extracts intrinsic features through a small number of standard convolutions:
[0052] $$Y' = \mathrm{Conv}(X)$$
[0053] where $\mathrm{Conv}(\cdot)$ represents the standard convolution operation and $Y'$ is the intrinsic feature map.
[0054] Then, a cheap linear transformation is applied to each intrinsic feature map to generate ghost features:
[0055] $$y_{ij} = \Phi_j(y'_i), \quad j = 1, \dots, s-1$$
[0056] where $y'_i$ is the $i$-th intrinsic feature map in $Y'$, $\Phi_j$ represents the $j$-th cheap transformation, which is generally implemented through low-cost operations such as depthwise convolution, and $s$ is the Ghost feature expansion ratio.
[0057] The final output is:
[0058] $$Y_{\mathrm{ghost}} = \mathrm{Concat}\big(Y', \{y_{ij}\}\big)$$
[0059] Specifically, GhostConv first performs convolution operations on the input feature map using standard convolution to obtain intrinsic features; then, it maps the intrinsic features using a cheap linear transformation to generate ghost features; finally, it concatenates the intrinsic features and the ghost features along the channel dimension to obtain the output features of GhostConv. The cheap linear transformation preferably includes depthwise convolution. This method enables the generation of richer feature representations with lower computational cost, thereby improving the lightweight performance in the strawberry object detection task.
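To make the saving concrete, the following sketch compares the weight count of a standard convolution with that of a GhostConv-style layer built from a smaller standard convolution plus depthwise cheap transformations. The expansion ratio of 2, the 5×5 depthwise kernel, the channel sizes, and both function names are assumptions chosen for illustration; biases and normalization layers are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, s=2, d=5):
    """Weights of a GhostConv-style layer (bias ignored).

    s: Ghost feature expansion ratio.
    d: kernel size of the depthwise "cheap" transformation.
    s=2 and d=5 are illustrative assumptions, not values from the source.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by s")
    m = c_out // s                      # number of intrinsic feature maps
    primary = c_in * m * k * k          # standard convolution -> intrinsic features
    cheap = (s - 1) * m * d * d         # depthwise convolutions -> ghost features
    return primary + cheap


std = standard_conv_params(64, 128, 3)
ghost = ghost_conv_params(64, 128, 3)
print(std, ghost, round(std / ghost, 2))
```

With 64 input channels, 128 output channels, and a 3×3 kernel, this gives 73,728 weights for the standard convolution against 38,464 for the Ghost variant, a reduction close to the expansion ratio.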
[0060] Step 4: Use the SlimNeck network to fuse multi-scale features.
[0061] As shown in Figure 4, the multi-scale features output by the GhostHGNetV2 backbone network are input into the SlimNeck network for feature fusion. The SlimNeck network consists of VoVGSCSPns and GSConvns.
[0062] Specifically, channel compression and mapping are first performed on feature maps at different scales to give features of all scales a unified channel dimension for subsequent fusion. Then, VoVGSCSPns is used to perform cross-stage split fusion on the unified channel dimension features: one branch performs deep feature transformation, while another branch is passed directly through a cross-stage bypass, and the outputs of the branches are concatenated along the channel dimension and fused using convolution to enhance the information interaction capability of multi-scale features. Finally, GSConvns is used to perform lightweight downsampling of the feature maps and further fuse them with high-level semantic features to maintain good feature representation while reducing computational overhead.
[0063] Let the multi-scale features output by the GhostHGNetV2 backbone network be $F_1$, $F_2$, and $F_3$, where $F_i \in \mathbb{R}^{C_i \times H_i \times W_i}$. To facilitate subsequent cross-scale feature interaction, this embodiment first applies a $1 \times 1$ convolution to perform channel compression and mapping, obtaining a feature representation with a unified channel dimension:
[0064] $$\tilde{F}_i = \mathrm{Conv}_{1 \times 1}(F_i), \quad i \in \{1, 2, 3\}$$
[0065] where $\tilde{F}_i \in \mathbb{R}^{C \times H_i \times W_i}$ and $C$ represents the unified number of channels. Through this process, while preserving key semantic information and spatial details as much as possible, features at different scales have a consistent channel dimension, providing conditions for subsequent feature splicing and fusion operations, while effectively controlling the parameter scale and computational cost of the neck network.
[0066] As the core feature fusion unit of SlimNeck, VoVGSCSPns is a lightweight variant that combines the feature aggregation idea of VoVNet with the CSP mechanism. By simplifying the network structure and optimizing the feature propagation path, it reduces computational redundancy while maintaining good gradient flow and feature diversity. Let the input feature be $X$. VoVGSCSPns first splits the input feature, which can be expressed as:
[0067] $$[X_1, X_2] = \mathrm{Split}(X)$$
[0068] Subsequently, the branch $X_1$ undergoes deep feature transformation, while the remaining branch $X_2$ is passed directly through the cross-stage bypass, i.e.:
[0069] $$Y_1 = \mathcal{F}(X_1), \quad Y_2 = X_2$$
[0070] Finally, the two feature sets are concatenated along the channel dimension and then fused and output using convolution:
[0071] $$Y = \mathrm{Conv}\big(\mathrm{Concat}(Y_1, Y_2)\big)$$
[0072] where $\mathcal{F}(\cdot)$ represents a feature extraction mapping consisting of several lightweight convolutional units. Compared to the standard CSP module, VoVGSCSPns reduces the repeated stacking of bottleneck structures and the number of 1×1 convolutional projections, and adopts a split-feature processing strategy.
[0073] Building upon this, this embodiment adopts an enhanced version of the VoVGSCSPns structure. Without significantly increasing the overall computational complexity, it introduces depthwise separable convolutions to construct cascaded feature refinement units, further enhancing the representation capability for fine-grained targets. Its enhanced branch can be represented as:
[0074] $$Y_1' = X_1 + \mathcal{G}_2\big(\mathcal{G}_1(X_1)\big)$$
[0075] The final output is:
[0076] $$Y = \mathrm{Conv}\big(\mathrm{Concat}(Y_1', Y_2)\big)$$
[0077] where $\mathcal{G}_1$ and $\mathcal{G}_2$ represent cascaded feature refinement operations that include depthwise separable convolutions.
[0078] This structure has a deeper residual connection form, which can maintain good feature representation accuracy without significantly increasing the number of parameters and computation, and alleviate the feature information loss problem that may be caused by channel pruning to a certain extent, thereby ensuring the integrity and stability of feature fusion.
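The split, transform, bypass, and fuse pattern of VoVGSCSPns can be illustrated on a single pixel's channel vector, where a 1×1 convolution reduces to a matrix-vector product. This is a toy sketch under stated assumptions: the even channel split, the stand-in transformation `relu_double`, and the identity fusion weights are all hypothetical and are not the network's actual units.

```python
def vov_gscsp_split_fuse(x, transform, fuse_weights):
    """Cross-stage split and fusion on one pixel's channel vector.

    x: list of channel values.
    transform: callable applied to the first half (the "deep" branch).
    fuse_weights: matrix (list of rows) acting as a 1x1 fusion convolution.
    """
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]          # split along the channel dimension
    y1 = transform(x1)                   # deep feature transformation branch
    y2 = x2                              # cross-stage bypass branch
    cat = y1 + y2                        # channel-wise concatenation
    # A 1x1 convolution at a single pixel is a matrix-vector product.
    return [sum(w * v for w, v in zip(row, cat)) for row in fuse_weights]


def relu_double(channels):
    """Hypothetical stand-in for the lightweight convolutional units."""
    return [max(0.0, 2.0 * c) for c in channels]


x = [1.0, -1.0, 0.5, 2.0]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
fused = vov_gscsp_split_fuse(x, relu_double, identity)
print(fused)
```

Only half of the channels pass through the transformation, which is where the computational saving of the cross-stage design comes from; the bypassed half reaches the fusion step unchanged.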
[0079] GSConvns, another core component of SlimNeck, primarily reduces the computational redundancy of standard convolutions through a lightweight convolutional design and works efficiently in conjunction with VoVGSCSPns. This combined application achieves synergistic optimization of structural design and computational efficiency, supporting a balance between lightweight network and high performance. Furthermore, SlimNeck further compresses information transmission paths in its structural design, making the multi-scale feature fusion process more efficient by reducing redundant feature transmission and ineffective computation.
[0080] In the SlimNeck structure of this embodiment, GSConvns is mainly used in the downsampling path to replace the traditional standard convolution downsampling operation. Its basic transformation process can be expressed as:
[0081] $$Y = \sigma\big(\mathrm{PWConv}(\mathrm{GConv}(X))\big)$$
[0082] where $\mathrm{GConv}(\cdot)$ represents group convolution, $\mathrm{PWConv}(\cdot)$ represents pointwise convolution, and $\sigma(\cdot)$ indicates an activation or channel rearrangement operation.
[0083] Specifically, GSConvns combines group convolutions and pointwise convolutions to significantly reduce computational cost while maintaining good feature representation capabilities, allowing the network to balance feature map resolution changes against feature extraction effectiveness and computational efficiency. In the bottom-up feature enhancement path, SlimNeck utilizes GSConvns to achieve lightweight downsampling and further fuses the result with high-level features. This process can be represented as follows:
[0084] $$N_{i+1} = \Psi\big(\mathcal{D}_{\mathrm{GS}}(N_i),\ H_{i+1}\big)$$
[0085] where $N_i$ is the feature at the $i$-th scale of the bottom-up path, $H_{i+1}$ is the high-level feature at the next scale, $\mathcal{D}_{\mathrm{GS}}(\cdot)$ indicates a downsampling operation based on GSConvns, and $\Psi(\cdot)$ represents a further feature fusion mapping function.
[0086] The SlimNeck structure design described above enhances the fusion effect between features at different scales while keeping the number of parameters and computational costs low, thereby improving the detection accuracy of strawberry flowers, immature fruits, and ripe fruits in complex agricultural scenarios.
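The cost advantage of replacing a standard downsampling convolution with a group convolution followed by a pointwise convolution can be checked with simple arithmetic. The sketch below is illustrative only: the channel counts, the group number of 4, and the stride-2 3×3 kernel with padding 1 are assumptions, and the function names are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weights of a k x k convolution with the given number of groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def group_pointwise_params(c_in, c_out, k, groups):
    """Group convolution followed by a 1x1 pointwise convolution."""
    grouped = conv_params(c_in, c_in, k, groups=groups)   # spatial filtering
    pointwise = conv_params(c_in, c_out, 1)               # channel mixing
    return grouped + pointwise


def downsampled_size(size, k=3, stride=2, padding=1):
    """Output side length of a strided convolution."""
    return (size + 2 * padding - k) // stride + 1


standard = conv_params(128, 128, 3)
light = group_pointwise_params(128, 128, 3, groups=4)
print(standard, light, downsampled_size(80))
```

For 128 channels this gives 147,456 weights for the standard convolution against 53,248 for the grouped and pointwise pair, while an 80×80 feature map is still halved to 40×40.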
[0087] Step 5: Use the detection head to output the detection results of the strawberry target.
[0088] The features fused by the SlimNeck network are input into the detection head, which outputs the category and location information of the target to be detected. The category information is used to distinguish between strawberry flowers, unripe fruits, and ripe fruits, and the location information is used to determine the location range of the corresponding target in the image. Through this step, the strawberry ripeness detection result can be obtained.
[0089] Step 6: Perform channel pruning on the improved target detection network using the LAMP pruning strategy.
[0090] As shown in Figure 5, after the initial training of the improved network is completed, the LAMP pruning strategy is used to prune the channels of the object detection network to further compress the network size and reduce computational complexity. Specifically, the channel-level weight magnitudes of the prunable convolutional layers in the object detection network are first statistically analyzed; then, within each layer, a normalized importance score is calculated based on the relative contribution of the weights in that layer; next, channels in each layer are pruned based on the normalized importance score, deleting channels with lower importance, thus obtaining the pruned network; finally, the pruned network is fine-tuned to recover some of the performance loss caused by pruning.
[0091] The LAMP pruning strategy redefines the pruning criterion based on the relative importance of weights within a layer. Unlike traditional methods, LAMP does not directly compare the absolute magnitudes of weights across layers. Instead, it measures the relative contribution of each weight within its own layer, eliminating the bias caused by inconsistent weight scales between layers, so that pruning decisions reflect how much each weight matters to the layer it belongs to.
[0092] Specifically, for the $L$-th layer in the network, LAMP first sorts the weights of that layer in ascending order of magnitude. Then, it defines a normalized importance score for the $i$-th weight of that layer:
[0093] $$S_i^{(L)} = \frac{\big(w_i^{(L)}\big)^2}{\sum_{j=i}^{N_L} \big(w_j^{(L)}\big)^2}$$
[0094] where $S_i^{(L)}$ represents the relative importance of the $i$-th weight in the $L$-th layer, $w_i^{(L)}$ is the $i$-th weight after sorting, and $N_L$ refers to the number of weights in that layer. Through this method, LAMP maps the contribution of each layer's weights to a unified scale, avoiding the unreasonable pruning decisions caused by differences in weight scales between layers and making the pruning process more reliable.
[0095] By adopting the LAMP pruning strategy, redundant channels can be adaptively pruned while taking into account the differences between layers, thereby effectively reducing the number of network parameters and floating-point operations, making it more suitable for deployment scenarios with limited resources.
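A minimal sketch of layer-normalized scoring followed by global selection is given below. It follows the published LAMP definition, in which each score is the squared magnitude divided by the sum of squared magnitudes of all entries in the same layer that are at least as large. Applying it to one magnitude per channel, the layer names, the example values, and the 50% pruning ratio are assumptions made for illustration.

```python
def lamp_scores(magnitudes):
    """LAMP-style scores for one layer, returned in the input order.

    Each score is the squared magnitude divided by the sum of squared
    magnitudes of all entries that are at least as large (in sorted order).
    """
    order = sorted(range(len(magnitudes)), key=lambda i: abs(magnitudes[i]))
    scores = [0.0] * len(magnitudes)
    running = 0.0
    for i in reversed(order):            # walk from largest to smallest
        sq = magnitudes[i] ** 2
        running += sq                    # suffix sum of squared magnitudes
        scores[i] = sq / running if running > 0 else 0.0
    return scores


def select_pruned(layers, prune_ratio):
    """Return {layer_name: indices to prune} using one global threshold."""
    ranked = []
    for name, mags in layers.items():
        for idx, score in enumerate(lamp_scores(mags)):
            ranked.append((score, name, idx))
    ranked.sort()
    n_prune = int(len(ranked) * prune_ratio)
    pruned = {name: [] for name in layers}
    for _, name, idx in ranked[:n_prune]:
        pruned[name].append(idx)
    return pruned


# Hypothetical per-channel magnitudes; the two layers differ in scale by 100x.
layers = {"conv_a": [0.1, 0.2, 0.4], "conv_b": [10.0, 20.0, 40.0]}
pruned = select_pruned(layers, prune_ratio=0.5)
print(pruned)
```

Because the largest entry in every layer always scores 1, a single global threshold cannot remove an entire layer until nearly everything else is pruned, and the small-scale layer `conv_a` is not penalized simply for having smaller raw values than `conv_b`.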
[0096] Step 7: Use the MSD multi-scale feature knowledge distillation strategy to distill and train the pruned network.
[0097] As shown in Figure 6, an unpruned target detection network is used as the teacher model, and the pruned network obtained in step 6 is used as the student model to construct the MSD multi-scale feature knowledge distillation mechanism. Specifically, multiple scale feature maps of the teacher and student models are extracted at the neck network output stage, and the feature maps of the student model at corresponding scales are aligned and mapped so that they are in the same feature space as the feature maps of the teacher model at corresponding scales. Then, a multi-scale feature distillation loss is constructed based on the differences between the teacher and student models at each corresponding scale feature map. Finally, the multi-scale feature distillation loss and the target detection loss are weighted and summed to serve as the total loss function for training the student model, thereby performing distillation training on the student model.
[0098] Suppose that the teacher model and the student model generate K scale feature maps respectively in the Neck output stage, then the multi-scale features corresponding to the teacher model and the student model can be represented as follows:
[0099]
[0100] in, This represents the feature mapping of the teacher model at the k-th scale. K represents the feature mapping of the student model at the corresponding scale, and K represents the number of feature layers in the distillation.
[0101] Because the teacher and student models differ in network structure, channel size, and feature distribution, their features typically cannot be constrained element-wise directly. Therefore, this embodiment applies a lightweight alignment mapping to the student model features during distillation so that their spatial and channel dimensions match those of the teacher model. The process can be represented as follows:
[0102] $$\hat{F}_{k}^{S} = \phi_{k}\left(F_{k}^{S}\right)$$
[0103] where $F_{k}^{S}$ is the feature map of the student model at the k-th scale, $\hat{F}_{k}^{S}$ is the aligned student feature map, and $\phi_{k}$ is the feature alignment mapping function at the k-th scale. The mapping is typically implemented with a convolution or a lightweight linear mapping and projects the student model features into a feature space consistent with the teacher model.
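The alignment mapping can be sketched as a per-pixel linear projection over channels, which is what a 1×1 convolution without bias computes. The kernel size, the shapes (48 student channels, 64 teacher channels, 20×20 spatial size), the random projection matrix, and the name `align_channels` are all assumptions chosen for illustration; in training the projection would be a learned parameter, and the sketch assumes that channel pruning leaves the spatial size unchanged.

```python
import numpy as np


def align_channels(student_feat, proj):
    """Project a (C_s, H, W) student feature map to (C_t, H, W).

    proj has shape (C_t, C_s) and is applied independently at every
    spatial position, i.e. a bias-free 1x1 convolution.
    """
    return np.einsum("tc,chw->thw", proj, student_feat)


rng = np.random.default_rng(0)
student = rng.standard_normal((48, 20, 20))    # pruned student: 48 channels
proj = rng.standard_normal((64, 48)) * 0.1     # stand-in for learned weights
aligned = align_channels(student, proj)
print(aligned.shape)  # (64, 20, 20), matching a 64-channel teacher map
```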
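[0104] After feature alignment, this embodiment employs a multi-scale feature distillation loss based on mean squared error to constrain the feature differences between the teacher and student models at corresponding scales, defined as follows:
[0105] $$L_{MSD} = \sum_{k=1}^{K} \frac{1}{C_{k} H_{k} W_{k}} \left\| \hat{F}_{k}^{S} - F_{k}^{T} \right\|_{2}^{2}$$
[0106] where $C_{k}$, $H_{k}$, and $W_{k}$ represent the number of channels, height, and width of the k-th scale feature map, respectively; $\hat{F}_{k}^{S}$ and $F_{k}^{T}$ are the aligned student feature map and the teacher feature map at that scale; and $\|\cdot\|_{2}^{2}$ represents the squared Euclidean norm.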
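As a minimal numerical sketch of the mean-squared-error feature distillation loss described above, the function below sums, over all scales, the squared difference between aligned student and teacher feature maps divided by the number of elements at that scale. The per-scale normalization by channels × height × width follows the quantities named in the text; the absence of any further averaging over scales or batch is an assumption, as is the name `msd_loss`.

```python
import numpy as np


def msd_loss(teacher_feats, aligned_student_feats):
    """Sum of per-scale mean squared errors between feature maps.

    Both arguments are lists of arrays with shape (C_k, H_k, W_k);
    student maps must already be aligned to the teacher shapes.
    """
    total = 0.0
    for f_t, f_s in zip(teacher_feats, aligned_student_feats):
        if f_t.shape != f_s.shape:
            raise ValueError("student features must be aligned to the teacher")
        c, h, w = f_t.shape
        total += float(np.sum((f_s - f_t) ** 2)) / (c * h * w)
    return total


# Two scales with known differences: 1.0 everywhere, then 2.0 everywhere.
teacher = [np.ones((2, 2, 2)), np.zeros((1, 4, 4))]
student = [np.zeros((2, 2, 2)), np.full((1, 4, 4), 2.0)]
loss = msd_loss(teacher, student)  # 1.0 + 4.0
```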
[0107] This loss function can provide fine-grained supervision to the student model at multiple scales, guiding it to gradually approximate the feature response distribution of the teacher model at different levels. This effectively compensates for the loss of feature information caused by pruning and improves the student model's ability to represent and distinguish strawberry targets at different scales.
[0108] During training, the total loss function of the student model consists of the original target detection loss and the multi-scale feature distillation loss, which can be expressed as:
[0109] $$L_{total} = L_{det} + \lambda L_{MSD}$$
[0110] where $L_{det}$ represents the standard target detection loss, used to ensure that the student model can accurately complete the strawberry ripeness detection task; $L_{MSD}$ represents the multi-scale feature distillation loss, used to pass intermediate feature knowledge from the teacher model to the student model; and $\lambda$ is the distillation loss weighting coefficient, used to balance detection task learning against distillation knowledge transfer. By adjusting the value of $\lambda$, the stability of the student model's task learning can be maintained while the role of knowledge distillation in promoting performance recovery is fully exploited.
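The weighted combination of the two loss terms can be made concrete with a short sketch. The numeric values and the name `total_loss` are hypothetical; the source does not state a value for the weighting coefficient.

```python
def total_loss(det_loss, distill_loss, lam):
    """Detection loss plus the distillation loss scaled by lam (lam >= 0)."""
    if lam < 0:
        raise ValueError("the distillation weight must be non-negative")
    return det_loss + lam * distill_loss


# With lam = 0 the student is trained on the detection task alone;
# larger values give the teacher's features more influence.
baseline = total_loss(1.5, 0.25, lam=0.0)
weighted = total_loss(1.5, 0.25, lam=2.0)
```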
[0111] In this embodiment, the multi-scale feature distillation loss is preferably constructed using the mean squared error loss function. Through knowledge distillation, the strong feature representation ability of the teacher model can be transferred to the pruned student model, thereby compensating for the decrease in detection performance caused by model pruning. This allows the lightweight model to maintain high detection accuracy while keeping its complexity low.
[0112] Step 8: Deploy the distilled lightweight target detection model and use it to detect strawberry maturity.
[0113] After distillation training, a final lightweight target detection model is obtained. Inputting a strawberry image into this model will output the detection results for strawberry flowers, immature fruits, and ripe fruits. These detection results can be used to assist in determining strawberry maturity, thereby providing technical support for intelligent harvesting, ripe fruit screening, and autonomous operation of agricultural robots.
[0114] The lightweight target detection model obtained by this invention can be deployed on resource-constrained terminal devices, such as embedded edge devices, mobile terminals, or agricultural robot control platforms, to meet the application requirements for real-time detection of strawberry maturity in natural agricultural scenarios.
[0115] In summary, the lightweight target detection method for strawberry maturity detection provided by this invention organically combines the GhostHGNetV2 backbone network, the SlimNeck neck network, the LAMP pruning strategy, and the MSD multi-scale feature knowledge distillation strategy to construct a systematic technical solution that balances detection accuracy, model complexity, and real-time deployment capability. This method effectively solves the problems of existing strawberry maturity detection models, such as large parameter count, high computational complexity, performance degradation after lightweighting, and difficulty in deployment on resource-constrained terminal devices, and has good practical application value.
[0116] The above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. For those skilled in the art, any equivalent substitutions, simple modifications, or improvements made to the technical solutions of the present invention without departing from the concept and essence of the present invention should fall within the scope of protection of the present invention.
Claims
1. A lightweight target detection method for strawberry maturity detection, characterized in that it includes the following steps:
Step 1: Obtain a strawberry image and preprocess the strawberry image to obtain the model input image;
Step 2: Construct a target detection network based on the improved YOLOv8n network architecture, which includes a GhostHGNetV2 backbone network, a SlimNeck neck network, and a detection head;
Step 3: Input the model input image into the GhostHGNetV2 backbone network to extract multi-scale features of the strawberry target;
Step 4: Input the multi-scale features into the SlimNeck network for feature fusion to obtain fused features, and input the fused features into the detection head to output the category information and location information of the strawberry target;
Step 5: Apply the LAMP pruning strategy to perform channel pruning on the target detection network constructed in Step 2 to obtain the pruned network;
Step 6: Using the unpruned target detection network as the teacher model and the pruned network obtained in Step 5 as the student model, train the student model by distillation using the MSD multi-scale feature knowledge distillation strategy to obtain the distilled lightweight target detection model;
Step 7: Use the distilled lightweight target detection model to detect the strawberry image to be tested, and output the detection results of strawberry flowers, unripe fruits, and ripe fruits.
2. The lightweight target detection method for strawberry maturity detection according to claim 1, characterized in that, in step 1, the strawberry image is a three-channel RGB image; the preprocessing includes scaling the strawberry image to a preset size and normalizing the pixels; the preset size is 640×640 pixels.
3. The lightweight target detection method for strawberry maturity detection according to claim 1, characterized in that, in step 2, the GhostHGNetV2 backbone network is built on the HGNetv2 base network and includes HGStem and multiple Ghost_HGBlocks to output feature maps at at least three scales.
4. The lightweight target detection method for strawberry maturity detection according to claim 3, characterized in that the Ghost_HGBlock is based on the HGBlock hierarchical feature aggregation structure but uses the GhostConv convolution operation to reduce the number of parameters and the computational complexity.
5. The lightweight target detection method for strawberry maturity detection according to claim 1, characterized in that, in step 4, the SlimNeck network consists of VoVGSCSPns and GSConvns, which are used to perform channel unification, splitting and fusion, and downsampling fusion on the multi-scale features output by the GhostHGNetV2 backbone network; VoVGSCSPns achieves feature fusion through branching and concatenating convolution, and GSConvns achieves lightweight downsampling by combining group convolution and pointwise convolution.
6. The lightweight target detection method for strawberry maturity detection according to claim 1, characterized in that, in step 5, the LAMP pruning strategy includes the following process: statistically analyzing the channel-level weight magnitudes of the prunable convolutional layers in the target detection network; calculating a normalized importance score within each layer based on the relative contribution of the weights in that layer; and pruning the channels of each layer based on the normalized importance score, deleting the channels with lower importance.
7. The lightweight target detection method for strawberry maturity detection according to claim 1, characterized in that, in step 6, the MSD multi-scale feature knowledge distillation strategy includes the following process: extracting feature maps at multiple scales from the teacher model and the student model in the neck network output stage; aligning and mapping the feature maps of the student model so that they are in the same feature space as the feature maps of the teacher model at the corresponding scales; constructing a multi-scale feature distillation loss based on the differences between the teacher model and the student model at each corresponding scale; and training the student model using the multi-scale feature distillation loss.
8. The lightweight target detection method for strawberry maturity detection according to claim 1, characterized in that, in step 7, the distilled lightweight target detection model can be deployed on resource-constrained terminal devices for real-time detection of strawberry maturity in natural agricultural scenarios.