A liquor bottle cap defect detection method based on improved YOLOv10

By improving the YOLOv10 network structure and introducing the Slide Loss function together with the DCN, LSKA, and CBAM modules, the method addresses the insufficient detection accuracy and large parameter count of existing approaches to liquor bottle cap defect detection, and enables more efficient automated detection.

CN122199357A · Pending · Publication Date: 2026-06-12 · SICHUAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SICHUAN UNIV
Filing Date
2024-12-10
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Traditional liquor bottle cap defect detection relies on manual operation, and existing deep learning algorithms are insufficient in terms of detection accuracy and the number of network parameters, making it difficult to meet the needs of automation.

Method used

An improved YOLOv10 network was adopted, and the network structure was optimized by introducing the Slide Loss loss function, DCN module, LSKA attention mechanism, CBAM module and CARAFE upsampling operator to improve detection accuracy and reduce the number of parameters.

Benefits of technology

It improves the accuracy and inference speed of liquor bottle cap defect detection, reduces the number of network parameters, and promotes the automation process of liquor bottle cap defect detection.


Abstract

The application discloses a liquor bottle cap defect detection algorithm based on an improved YOLOv10. First, the Slide Loss function is introduced to address the imbalance between positive and negative samples in the dataset. Second, deformable convolution is introduced in the backbone network, which alleviates the influence of image geometric transformations during convolution and improves the ability to identify defects of different sizes. In addition, the LSKA module is introduced in the backbone network to improve the model's computational efficiency and memory usage. Then, the PSA attention mechanism at the end of the backbone network is replaced with the CBAM attention mechanism, which improves the model's capture of local features. Finally, the CARAFE upsampling module is introduced in the neck, which significantly improves accuracy at a small cost in parameters and computation. Compared with the original network, the method improves detection accuracy while reducing the number of floating-point operations, and it has broad application prospects in promoting the automation of defect detection in liquor production.

Description

Technical Field

[0001] This invention relates to the detection of defects in liquor bottle caps in the field of computer vision, and in particular to a method for detecting defects in liquor bottle caps based on an improved YOLOv10.

Background Technology

[0002] Baijiu is a distilled spirit unique to China. The "2024 China Baijiu Market Mid-Term Research Report," officially released on June 18th, indicates that there are currently 20 listed companies in the Chinese baijiu industry, with a total market capitalization of 3.51 trillion yuan. As a traditional industry, the baijiu sector has broad prospects for future development. However, traditional baijiu production still relies heavily on manual labor, making it a typical labor-intensive industry. The production process involves multiple manual operations, all of which demand a large workforce. Subsequent processing steps such as bottling and packaging similarly require significant manual labor. Technological innovation is therefore one of the key driving forces for the development of the baijiu industry.

[0003] Bottling is a crucial step in the production of baijiu (Chinese liquor). However, due to factors such as the production environment, equipment, and manufacturing processes, defects such as broken caps, cap breaks, damaged edges, twisted caps, deformed caps, air bubbles in labels, skewed labels, wrinkled labels, and abnormal coding can easily occur. Single-stage deep learning algorithms offer a promising solution to these problems. Following an end-to-end, regression-based approach, single-stage algorithms generate bounding boxes and class probabilities directly and simultaneously from the input image. Examples include the YOLO (You Only Look Once) series and the SSD (Single Shot MultiBox Detector) algorithm. YOLO treats the entire detection process as a regression problem and quickly generates target bounding boxes and classes; its versions range from YOLOv1 in 2016 to the latest YOLOv10. SSD, like the YOLO series, performs detection on feature maps at different scales, which enables it to handle multi-scale targets.

Summary of the Invention

[0004] This invention proposes a method for detecting defects in liquor bottle caps based on an improved YOLOv10 network. The aim is to improve the accuracy of liquor bottle cap defect detection while reducing the number of network parameters, thereby promoting the automation of liquor bottle cap defect detection. Compared to the YOLOv10 network, the method of this invention improves detection accuracy, reduces the number of network parameters, and increases inference speed.

[0005] The present invention specifically adopts the following technical solution:

[0006] (1) The YOLOv10s network is used as the base network, and the loss function of the original model is replaced by the Slide Loss function;

[0007] (2) Introduce deformable convolution DCN into the C2f mechanism and replace the C2fCIB structure in the Backbone of YOLOv10;

[0008] (3) Introduce the LSKA attention mechanism in the Backbone part of YOLOv10 to enhance the SPPF structure of the original version;

[0009] (4) The CBAM attention mechanism is used to replace the PSA attention mechanism in the Backbone part of YOLOv10;

[0010] (5) Introduce the CARAFE upsampling operator in the Head part of YOLOv10 to replace the nn.Upsample structure.

Attached Figure Description

[0011] Figure 1 shows the adaptive positive and negative sample threshold function for Slide Loss.

[0012] Figure 2 is a structural diagram of the DCN module.

[0013] Figure 3 is a structural diagram of the LSKA module.

[0014] Figure 4 is a structural diagram of the CBAM module.

[0015] Figure 5 is a structural diagram of the CARAFE module.

[0016] Figure 6 is the architecture diagram of the improved YOLOv10 network.

Detailed Implementation

[0017] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be noted that the following embodiments are only used to further illustrate the present invention and should not be construed as limiting the scope of protection of the present invention. Those skilled in the art can make some non-essential improvements and adjustments to the present invention based on the above-described invention, and these improvements and adjustments should still fall within the scope of protection of the present invention.

[0018] A method for detecting defects in liquor bottle caps based on an improved YOLOv10 is described below:

[0019] (1) Slide Loss Function

[0020] The bottle cap defect dataset exhibits a significant imbalance between positive and negative samples, with substantial differences in how many of each it contains. In most cases, the number of positive samples is large, while negative samples are relatively sparse, which impacts algorithm accuracy. Therefore, this invention introduces the Slide Loss function to address this imbalance between positive and negative samples.

[0021] The Slide Loss function distinguishes between positive and negative samples based on the IoU between the predicted and ground truth bounding boxes, and assigns higher weights to negative samples through a Slide weighting function. The design of the Slide function allows the model to better optimize samples at the boundaries and make fuller use of these samples to train the network, thereby increasing the model's attention to negative samples.
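The weighting described above can be sketched as follows. The patent text does not give the formula, so this sketch assumes the piecewise form published with Slide Loss, in which the threshold mu is the mean IoU over all boxes; the function name `slide_weight` and the sample IoU values are purely illustrative.

```python
import math


def slide_weight(iou: float, mu: float) -> float:
    """Slide weighting (assumed published form, not stated in the patent text).

    mu is the adaptive threshold, taken here as the mean IoU over all boxes.
    """
    if iou <= mu - 0.1:
        return 1.0                 # well below the threshold: weight unchanged
    if iou < mu:
        return math.exp(1.0 - mu)  # just below the threshold: largest weight
    return math.exp(1.0 - iou)     # at or above the threshold: decays as IoU grows


# Illustrative IoU values between predictions and ground-truth boxes.
ious = [0.20, 0.45, 0.55, 0.90]
mu = sum(ious) / len(ious)         # adaptive threshold for this batch
weights = [slide_weight(v, mu) for v in ious]
```

Under this form, samples close to the threshold receive the largest weight, which is how the function emphasizes samples at the boundary between the two groups.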

[0022] (2) Add DCN module

[0023] Defects generated during the bottling process of baijiu (Chinese liquor) vary in size and form. A traditional convolution operation divides the feature map into squares of the same size as the convolution kernel before performing the convolution, so the sampling position of each part of the feature map is fixed. This is a limitation for the baijiu bottle cap defect detection task. Therefore, to mitigate the impact of image geometric transformations during convolution, this invention introduces DCNv2 into the Backbone of YOLOv10 to enhance the C2f module.

[0024] Since the geometry of the modules used to construct convolutional neural networks is fixed, their ability to model geometric transformations is inherently limited. DCNv1 introduced two new modules to improve the modeling ability of convolutional neural networks for transformations: deformable convolution and deformable region of interest (ROI) pooling. However, adding offsets might introduce irrelevant information, affecting the final result. Therefore, DCNv2 proposed a modulated deformable convolution and feature simulation scheme to guide network training, improving upon DCNv1 by reducing interference from irrelevant information and increasing model accuracy.
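To make the difference from a fixed-grid convolution concrete, the following is a minimal sketch of one output value of a 3×3 modulated deformable convolution on a single-channel image. It assumes the offsets and modulation scalars are supplied directly; in DCNv2 they are predicted by an additional convolution layer, which is omitted here. The helper names `bilinear` and `modulated_deform_conv_at` are hypothetical.

```python
import math


def bilinear(img, y, x):
    """Sample a 2D list at fractional (y, x); taps outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total


def modulated_deform_conv_at(img, weights, offsets, masks, cy, cx):
    """One output value of a 3x3 modulated deformable convolution at (cy, cx).

    weights: 9 kernel weights; offsets: 9 (dy, dx) pairs; masks: 9 scalars in [0, 1].
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for k, (dy, dx) in enumerate(grid):
        oy, ox = offsets[k]
        # Each tap is shifted by its own offset and scaled by its own modulation.
        out += weights[k] * masks[k] * bilinear(img, cy + dy + oy, cx + dx + ox)
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ones = [1.0] * 9
regular = modulated_deform_conv_at(img, ones, [(0.0, 0.0)] * 9, ones, 1, 1)
shifted = modulated_deform_conv_at(img, ones, [(0.0, 0.5)] * 9, ones, 1, 1)
damped = modulated_deform_conv_at(img, ones, [(0.0, 0.5)] * 9, [0.5] * 9, 1, 1)
```

With zero offsets and all modulation scalars equal to 1, the operation reduces to an ordinary 3×3 convolution; the modulation scalars are what allow the network to suppress taps that land on irrelevant regions.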

[0025] (3) Add LSKA module

[0026] In tasks such as detecting defects in liquor bottles and caps, where real-time performance is crucial, efficiency is as important a metric as accuracy. The SPPF in YOLOv10 is the spatial pyramid pooling layer inherited from YOLOv8 and is used to aggregate features across multiple scales. To improve computational efficiency and memory usage, the LSKA (Large-Separable-Kernel-Attention) mechanism is introduced to enhance SPPF's ability to aggregate features at multiple scales, achieving comparable performance while reducing computational complexity and memory consumption.

[0027] By replacing the k×k convolutional kernel with cascaded 1×k and k×1 convolutional kernels, LSKA effectively curbs the quadratic growth in the number of parameters that LKA-trivial and the LKA in VAN exhibit as the kernel size increases. This resolves the computational efficiency problem of the depthwise convolutional kernels in LKA-trivial and the LKA in VAN at large kernel sizes, without performance degradation. Compared to the small kernel in the original LKA, LSKA can benefit from a large kernel while maintaining the same inference time cost. Compared to ViTs and previous large-kernel CNNs, the features learned by the large kernel in the LSKA-based VAN encode more shape information and less texture. Furthermore, there is a high correlation between the amount of shape information encoded in the feature representation and robustness to different image distortions.
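The parameter saving from the kernel separation described above can be checked with a short count. This sketch considers only the depthwise weights of a single large kernel (no bias, no dilation, no 1×1 convolution), and the channel count of 64 and the kernel sizes are illustrative assumptions.

```python
def depthwise_2d_params(channels: int, k: int) -> int:
    """Weights of one depthwise k x k kernel per channel (bias ignored)."""
    return channels * k * k


def depthwise_separable_1d_params(channels: int, k: int) -> int:
    """Weights of cascaded depthwise 1 x k and k x 1 kernels per channel (bias ignored)."""
    return channels * (k + k)


# Compare growth as the kernel size increases for a 64-channel feature map.
for k in (7, 11, 23, 35):
    full = depthwise_2d_params(64, k)
    separated = depthwise_separable_1d_params(64, k)
    print(f"k={k}: k x k = {full}, 1 x k + k x 1 = {separated}")
```

The first count grows with k², the second with 2k, so the gap widens as the kernel gets larger.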

[0028] (4) Add CBAM module

[0029] The original PSA module in YOLOv10 is computationally expensive, which sharply increases the demand for computational resources and reduces the efficiency of the model. At the same time, because it performs global self-attention over the entire image patch, it may capture local features insufficiently, especially in tasks such as bottle cap defect detection that require close analysis of details. The PSA module is therefore replaced with the CBAM (Convolutional Block Attention Module) [18] attention mechanism. CBAM is lightweight enough that its overhead is negligible, it can be integrated seamlessly into any CNN architecture, and it can be trained end-to-end with the base CNN.
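As a rough illustration of why CBAM is lightweight, the following sketches only its channel-attention step: average-pooled and max-pooled channel descriptors pass through a shared two-layer MLP, are summed, and are squashed by a sigmoid to give one scale per channel. The spatial-attention step (a convolution over channel-wise average and max maps) is omitted, and the tiny feature map and MLP weights are illustrative assumptions, not values from the patent.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(vec, w1, w2):
    """Two-layer MLP with ReLU; w1 has shape (C/r, C) and w2 has shape (C, C/r)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """fmap is a list of C channels, each an H x W nested list."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(map(max, ch)) for ch in fmap]
    # The same MLP processes both descriptors; the results are summed before the sigmoid.
    scale = [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2),
                                            shared_mlp(mx, w1, w2))]
    refined = [[[s * v for v in row] for row in ch] for s, ch in zip(scale, fmap)]
    return refined, scale


fmap = [[[1.0, 2.0], [3.0, 4.0]],      # channel 0
        [[0.0, 0.0], [0.0, 8.0]]]      # channel 1
w1 = [[1.0, 0.0]]                      # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [0.0]]                    # expands back to 2 channels
refined, scale = channel_attention(fmap, w1, w2)
```

The only learned parameters in this step are the two small MLP matrices, which is consistent with the low overhead claimed above.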

[0030] To demonstrate the effectiveness of the algorithm, experimental verification was conducted. The network was trained with the SGD optimizer for 600 epochs, with an initial learning rate of 0.01, a final learning rate of 0.0001, and a weight decay of 0.0005. The momentum was 0.8 during the first three warm-up epochs and then increased to 0.937. The input image size was 640×640 pixels. The Tianchi liquor bottle defect detection dataset, a publicly available enterprise liquor production dataset from Tianchi, was used to verify the algorithm's effectiveness. Samples labeled as background were removed, and the rest of the dataset, covering 10 categories, was retained without further processing. The dataset contains 3370 images of bottle cap defects and was divided into training and testing sets in an 8:2 ratio. mAP50 was used as the evaluation metric for overall detection performance.
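The settings above can be collected into a small configuration sketch. The key names, the step-wise momentum function, and the seeded random split are assumptions made for illustration; the patent states only the values and the 8:2 ratio, not how the split was drawn or which training framework was used.

```python
import random

# Values from the experiment description; key names are illustrative.
TRAIN_SETTINGS = {
    "optimizer": "SGD",
    "initial_lr": 0.01,
    "final_lr": 0.0001,
    "weight_decay": 0.0005,
    "epochs": 600,
    "warmup_epochs": 3,
    "warmup_momentum": 0.8,
    "momentum": 0.937,
    "image_size": 640,
}


def momentum_at(epoch: int, cfg=TRAIN_SETTINGS) -> float:
    """Momentum for a zero-based epoch, read as a simple step after warm-up."""
    return cfg["warmup_momentum"] if epoch < cfg["warmup_epochs"] else cfg["momentum"]


def split_dataset(items, train_parts=8, total_parts=10, seed=0):
    """Shuffle and split; the seed and the shuffling itself are assumptions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * train_parts // total_parts
    return items[:cut], items[cut:]


# Image indices stand in for the 3370 dataset images.
train_ids, test_ids = split_dataset(range(3370))
print(len(train_ids), len(test_ids))
```

With 3370 images, an 8:2 split yields 2696 training images and 674 testing images.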

[0031] The experimental results are shown in Table 1. The present invention achieves an mAP50 of 76.8% on the Tianchi liquor bottle defect detection dataset, which is higher than that of the other algorithms compared, thus verifying the effectiveness of the present invention.
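For reference, mAP50 is the mean over classes of the average precision computed when a prediction counts as correct only if it has the right class and overlaps a ground-truth box with an IoU of at least 0.5. A minimal sketch of that overlap test follows; the box format (x1, y1, x2, y2) and the function names are assumptions, and the full precision-recall computation is not shown.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred_box, gt_box) -> bool:
    """True when the overlap meets the 0.5 IoU threshold used by mAP50."""
    return iou(pred_box, gt_box) >= 0.5


loose = matches_at_50((0, 0, 2, 2), (1, 1, 3, 3))      # small overlap
tight = matches_at_50((0, 0, 10, 10), (0, 0, 10, 8))   # large overlap
```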

[0032] Table 1 Performance Comparison of Various Network Models

[0033]

Claims

1. A method for detecting defects in liquor bottle caps based on an improved YOLOv10, characterized in that it includes the following steps: (1) the YOLOv10s network is used as the base network, and the loss function of the original model is replaced by the Slide Loss function; (2) deformable convolution (DCN) is introduced into the C2f mechanism to replace the C2fCIB structure in the Backbone of YOLOv10; (3) the LSKA attention mechanism is introduced in the Backbone part of YOLOv10 to enhance the SPPF structure of the original version; (4) the CBAM attention mechanism is used to replace the PSA attention mechanism in the Backbone part of YOLOv10; (5) the CARAFE upsampling operator is introduced in the Head part of YOLOv10 to replace the nn.Upsample structure.

2. The method for detecting defects in liquor bottle caps based on an improved YOLOv10 as described in claim 1, characterized in that, in step (1), the Slide Loss function is used to replace the loss function of the original model. The bottle body and bottle cap dataset exhibits an obvious imbalance between positive and negative samples, with large differences in how many of each it contains. In most cases, the number of positive samples is large, while negative samples are relatively sparse, which affects the accuracy of the algorithm. Therefore, the Slide Loss function is introduced to solve the imbalance between positive and negative samples.

3. The method for detecting defects in liquor bottle caps based on the improved YOLOv10 as described in claim 1, characterized in that, in step (2), deformable convolution (DCN) is introduced into the Backbone and integrated into the C2f mechanism to replace the C2fCIB structure. Defects generated during the bottling process of liquor vary in size and form. A traditional convolution operation divides the feature map into squares of the same size as the convolution kernel and then performs the convolution, so the sampling position of each part of the feature map is fixed, which is a limitation for the task of detecting defects in liquor bottle bodies and caps. Therefore, in order to alleviate the influence of image geometric transformations during convolution, DCNv2 is introduced into the Backbone of YOLOv10 to enhance the C2f module.

4. The method for detecting defects in liquor bottle caps based on an improved YOLOv10 as described in claim 1, characterized in that, in step (3), the LSKA attention mechanism is introduced in the Backbone part to enhance the SPPF structure of the original version. In the task of detecting defects in liquor bottle caps, which calls for real-time detection, efficiency is as important an indicator as accuracy. The SPPF in YOLOv10 is the spatial pyramid pooling layer of YOLOv8, which is used to aggregate features at multiple scales. In order to improve computational efficiency and memory usage, the SPPF is improved by introducing the Large-Separable-Kernel-Attention (LSKA) mechanism, which enhances the SPPF module's ability to aggregate features at multiple scales. While achieving comparable performance, it reduces computational complexity and memory usage.

5. The method for detecting defects in liquor bottle caps based on an improved YOLOv10 as described in claim 1, characterized in that, in step (4), the CBAM attention mechanism is used to replace the PSA attention mechanism in the Backbone part. The original PSA module in YOLOv10 is computationally expensive, which sharply increases the demand for computing resources and reduces the efficiency of the model. At the same time, because it performs global self-attention over the entire image patch, it may capture local features insufficiently, especially in tasks such as bottle cap defect detection that require close analysis of details. The PSA module is therefore replaced with the Convolutional Block Attention Module (CBAM) attention mechanism. CBAM is lightweight enough that its overhead is negligible, it can be integrated seamlessly into any CNN architecture, and it can be trained end-to-end with the base CNN.

6. The method for detecting defects in liquor bottle caps based on the improved YOLOv10 as described in claim 1, characterized in that, in step (5), the CARAFE upsampling operator is introduced into the Head part of YOLOv10 to replace the nn.Upsample structure. The original nn.Upsample upsampling module in YOLOv10 performs a dot product between the upsampling kernel at each position and the corresponding neighboring pixels in the input feature map. It has disadvantages such as limited interpolation methods, limited ability to handle boundary problems, high space complexity, and a lack of adaptivity. In order to further improve the accuracy of the liquor bottle cap defect detection task, the lightweight, general-purpose upsampling operator CARAFE (Content-Aware ReAssembly of FEatures) is introduced to replace the nn.Upsample module. It obtains a larger receptive field during upsampling and reassembly. Compared with the nearest-neighbor and bilinear upsampling operators, it achieves a significant improvement in accuracy at a very small cost in parameters and computation.