A glass sheet defect detection method based on improved YOLOv5

By constructing a polarization cross-modal enhancement network and utilizing polarization imaging technology and attention fusion strategy, the glass slide defect detection performance of YOLOv5 was improved, solving the problems of easy missed detection of small defects and interference from reflective background, and achieving efficient and accurate defect identification.

CN122243960A · Pending · Publication Date: 2026-06-19 · ZHEJIANG SCI-TECH UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG SCI-TECH UNIV
Filing Date
2026-03-23
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing glass slide defect detection methods based on YOLOv5 have insufficient feature extraction capabilities when detecting minute defects, making them prone to missed detections. Furthermore, surface reflection and background noise interference factors affect the distinction between defects and the background, resulting in poor detection performance.

Method used

A polarization cross-modal enhancement network is constructed to extract polarization features from the glass surface using polarization imaging technology. A polarization attention fusion strategy is then used in conjunction with YOLOv5 for defect detection, thus constructing a defect recognition network with a polarization attention-guided mechanism to improve the effectiveness of feature fusion and the accuracy of defect recognition.

Benefits of technology

It improves the accuracy and robustness of glass slide defect detection in complex scenarios, effectively identifies minute defects, reduces missed detections, and operates quickly, making it suitable for a wider range of industrial scenarios.

✦ Generated by Eureka AI based on patent content.

Figure CN122243960A_ABST

Abstract

This invention provides a glass slide defect detection method based on an improved YOLOv5, comprising the following: (1) constructing a defect recognition dataset based on polarization images; (2) building a defect recognition network structure with a polarization attention guidance mechanism; (3) inputting data for network training; and (4) an active polarization imaging system for defect recognition. Compared with traditional target detection models applicable to light intensity images, this invention designs a novel network structure based on an optical model to extract the physical features of polarization images, utilizes the differences in polarization characteristics of different targets, and employs a polarization attention guidance method to enhance the accuracy of the recognition effect. It is not affected by the appearance shape of the target and has the ability to recognize multiple targets in complex environments.

Description

Technical Field

[0001] This invention relates to the field of polarization imaging technology, and more specifically to a method for detecting glass slide defects based on an improved YOLOv5.

Background Technology

[0002] As a crucial basic material in electronic equipment, optical instruments, and architectural decoration, the surface quality of glass sheets directly determines the performance and lifespan of the products. During the production and processing of glass sheets, factors such as the purity of raw materials, forming processes, the precision of processing equipment, and environmental dust can easily cause defects on the surface, such as scratches, bubbles, cracks, and impurities. These defects not only affect the appearance of the product but may also lead to a decrease in the mechanical properties of the glass sheet, a reduction in optical transmittance, and even safety hazards. Therefore, defect detection of glass sheets before they leave the factory is a critical step in ensuring product quality.

[0003] Traditional glass defect detection mainly relies on manual visual inspection, which suffers from low efficiency, high labor intensity, subjective inspection standards, and easy omission of minute defects, making it unable to meet the demands of modern industrial large-scale, high-precision production. With the development of deep learning technology, automated inspection methods based on object detection algorithms are gradually replacing manual inspection. Among them, YOLOv5 is widely used in industrial defect detection scenarios due to its advantages such as fast detection speed, lightweight model, and low deployment cost.

[0004] However, existing YOLOv5-based methods for detecting glass slide defects still have several shortcomings. First, the size of defects on the glass slide surface varies greatly, especially small defects such as micro-scratches and tiny bubbles, whose feature information is weak, making the traditional YOLOv5 feature extraction capability insufficient and prone to missed detections. Second, the original YOLOv5 feature fusion network (FPN+CSPNet) does not consider the differences in the contribution of features at different levels to defect detection when fusing them, resulting in insufficient effectiveness of feature fusion. Third, interference factors such as glass slide surface reflection and background noise can easily mask defect features, reducing the distinction between defects and the background. Therefore, it is urgent to make targeted improvements to the YOLOv5 network to enhance its performance in detecting glass slide defects.

Summary of the Invention

[0005] To overcome the shortcomings of the existing technology, this invention provides a glass slide defect detection method based on an improved YOLOv5, which constructs a polarization cross-modal enhancement network. The encoder extracts the polarization features of the polarization image, then performs feature stitching, and finally obtains a prediction map for salient defect detection through a cross-modal attention fusion strategy.

[0006] To achieve the above objectives, the technical solution adopted by the present invention is as follows: a glass slide defect detection method based on improved YOLOv5, comprising the following steps:

[0007] Step 1: Construct a defect dataset of glass slide polarization images: First, preprocess and group the dataset, then label the dataset to mark the location and category of the target objects in the images, and construct raw data suitable for defect identification;

[0008] Step 2: Constructing a network structure for polarization attention-guided defect identification: The distribution law of polarization information of reflected light on the target surface is derived by combining the physical model of polarization bidirectional distribution function and Fresnel formula, and the mapping relationship between the distribution of physical properties of the target surface and polarization state information of reflected light is obtained. According to the refractive index and surface features of different defects, they are extracted into the network by polarization imaging technology, and the polarization defect identification network is constructed by combining YOLOv5.

[0009] Step 3: Input data for network training: The polarization component images of 0°, 45°, and 90° linear polarization are used as the input to the model. In the polarization state calculation layer (PSCL), matrix operations are performed on the polarization component images based on the Stokes vector to calculate the Stokes parameters, thereby obtaining the AOP and DOLP images. After entering the backbone network, the attention-based defect perception module (MLCA) extracts polarization attention information that is beneficial to defect recognition. Finally, the next layer, the detection head module, predicts the output.

[0010] Step 4: Multi-target defect identification in complex scenes: Train the network using the defect identification dataset described in Step 1 to obtain the weight file of the defect identification model. Use this training weight file to identify defects in objects in complex scenes, and use the Mean Average Precision (mAP) evaluation metric to measure the model performance.

[0011] Furthermore, the defect identification dataset mentioned in step one is constructed by placing objects with different defects into the environment as target objects, capturing images with a division-of-focal-plane (DoFP) polarization camera under active illumination, obtaining images in four polarization directions, and extracting from them multi-dimensional polarization information such as the polarization angle and degree of polarization. Data corrupted by noise is removed at the same time. After this filtering, data augmentation techniques such as cropping, flipping, translation, and mirroring are used to expand the dataset and improve the generalization ability of the model.
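The flipping and mirroring augmentation mentioned above has to keep the labels aligned with the image. The following is a minimal sketch of a horizontal mirror, assuming labels are stored in the normalized `[class, x_center, y_center, width, height]` layout commonly used with YOLOv5; the function name `hflip_with_boxes` and the label layout are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Mirror an image left-right and update normalized YOLO-style boxes.

    image: array of shape (H, W) or (H, W, C).
    boxes: array-like of shape (N, 5) with rows [class, xc, yc, w, h],
           all coordinates normalized to [0, 1] (assumed label layout).
    """
    flipped = image[:, ::-1].copy()           # reverse the column order
    out = np.array(boxes, dtype=float, copy=True)
    out[:, 1] = 1.0 - out[:, 1]               # only the x-center changes
    return flipped, out

# Usage: one box centred at x = 0.25 moves to x = 0.75 after mirroring.
img = np.arange(6).reshape(2, 3)
new_img, new_boxes = hflip_with_boxes(img, [[0, 0.25, 0.5, 0.1, 0.2]])
```

Box width, height, and the vertical center are unchanged by a horizontal mirror, which is why only one column is rewritten.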

[0012] Furthermore, the defect perception module in step three includes a pooling module and an attention module; the pooling module helps to learn polarization extraction from image defects; and the attention module helps to extract important defect information from the image.

[0013] Furthermore, the attention module includes a channel attention mechanism module and a spatial attention mechanism module; the channel attention mechanism module is used to learn polarization extraction in the channel, and the spatial attention mechanism module is used to extract important defect information at different locations in space.

[0014] Furthermore, the spatial attention mechanism module adopts a coordinate attention mechanism, which encodes the positional information of the feature map in both horizontal and vertical directions, enabling the network to better capture long-distance dependencies.

[0015] Furthermore, in step four, the weight file of the defect recognition model is generated by performing max pooling and average pooling along the channel dimension on the input polarization feature map in space to generate features at different context scales. This feature map is then processed by a convolutional layer to finally generate attention weights.

[0016] Furthermore, the attention weights are output and restricted to between 0 and 1 using a Sigmoid function. The resulting weights are then applied to the original feature map, weighting each location in the space to highlight key regions, and finally input into the detection module of the first layer.

[0017] Compared with the prior art, the present invention has the following beneficial effects:

[0018] In complex scenarios, especially when shape information interferes, the polarization attention-guided defect recognition algorithm can effectively focus on the part of the defect that reflects the change in refractive index. In contrast, mainstream target detection frameworks attend only to the pixel information of the RGB layers and do not utilize polarization information. By comparison, the present method has high recognition accuracy, fast running speed, and better robustness and generalization under the guidance of physical prior information, making it applicable to more general industrial scenarios.

Attached Figure Description

[0019] Figure 1 is a flowchart illustrating the overall implementation of the present invention;

[0020] Figure 2 is a diagram of the experimental apparatus of the present invention;

[0021] Figure 3 is a schematic diagram of the defect identification network based on the attention guidance mechanism;

[0022] Figure 4 is a diagram of the MLCA network of the defect-aware attention module;

[0023] Figure 5 is a diagram showing the detection effect of the present invention;

[0024] Figure 6 is a table showing the comparison results of the detection effects of this invention.

Detailed Implementation

[0025] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0026] This invention discloses a method for detecting defects in glass slides based on an improved YOLOv5. As shown in Figure 1, it includes the following steps:

[0027] Step 1: Construct a target detection dataset of polarized images and preprocess the dataset by grouping the images according to the different object shape categories. Then, label the images in the target detection dataset to mark the location and category of the target objects in the images, thus constructing the original data suitable for training the target detection model.

[0028] Step 2: Combine the polarization bidirectional reflection distribution function (pBRDF) model and the Fresnel reflection model to obtain the distribution law of polarization information of reflected light from the target surface, thereby obtaining the relationship expression between the physical properties of the target surface and the polarization information of reflected light from the target surface.

[0029] Different defective target surfaces have different physical properties. Polarization imaging technology can store this difference in polarization images. Based on the construction of convolutional neural networks, a network layer for extracting polarization defect information is built according to the expression.

[0030] By combining with the mainstream target detection network framework YOLOv5, a defect recognition network based on the polarization attention guidance mechanism is constructed. The target detection network model consists of a polarization state computation layer (PSCL), a defect perception module (MLCA), and a target detection network (YOLOv5).

[0031] Step 3: Using the polarization component images (0°, 45°, 90° linear polarization) as input to the model, the polarization state calculation layer performs matrix operations on the polarization component images based on the Stokes vector to calculate the Stokes parameters, thereby obtaining the AOP and DOLP images. Then, the AOP and DOLP images are input into the YOLOv5 network. After entering the backbone network, they pass through the attention-based defect perception module (MLCA) to extract polarization attention information that is beneficial to defect recognition, and output a polarization saliency feature map. Finally, prediction is performed in the detection head module. The detection head module has three detection heads of different sizes, which are used to identify objects of different sizes.

[0032] Step 4: Train the network using the target detection dataset described in Step 1 to obtain the weight file of the polarization attention defect recognition model. Use this trained weight file to identify defects in objects in complex scenes. Use the Mean Average Precision (mAP) evaluation metric to measure the model performance and inference speed. The output model prediction results include the target object location, confidence level, and category information.
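The mAP metric named in Step 4 can be illustrated with a short sketch. It assumes that each detection has already been matched to ground truth by an IoU threshold, so that every detection carries a true-positive flag; the function names, the input layout, and the all-point interpolation rule are illustrative choices, and the patent does not state which mAP variant it uses.

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """Area under the precision-recall envelope for one class.

    scores: confidence of each detection.
    is_tp:  1 if the detection matched a ground-truth box, else 0 (assumed given).
    n_gt:   number of ground-truth boxes of this class (must be > 0).
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / n_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Pad the curve, then make precision monotonically non-increasing.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Sum rectangle areas where recall changes.
    idx = np.nonzero(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def mean_average_precision(per_class):
    """per_class maps a class name to a (scores, is_tp, n_gt) tuple."""
    return float(np.mean([average_precision(*v) for v in per_class.values()]))

# Usage with two hypothetical defect classes.
results = {
    "scratch": ([0.9, 0.8, 0.7], [1, 0, 1], 2),
    "bubble": ([0.9, 0.8], [1, 1], 2),
}
print(mean_average_precision(results))
```

Note that mAP summarizes detection accuracy only; inference speed would have to be timed separately.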

[0033] In the above steps, the polarization component images (0°, 45°, 90° linear polarization) are used as input to the model, and the Stokes vector parameters are calculated from them; the polarization angle (AOP) and the degree of linear polarization (DOLP) are then calculated from the Stokes parameters; the AOP and DOLP images are then input into the YOLOv5 network. After entering the backbone network, the polarization attention information that is beneficial to defect recognition is extracted by the attention-based defect perception module (MLCA), and the output is a polarization saliency feature map.
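The formulas of this paragraph are not reproduced in the text, so the following sketch uses the standard three-angle relations for the linear Stokes parameters, S0 = I0 + I90, S1 = I0 - I90, S2 = 2·I45 - I0 - I90, with DOLP = sqrt(S1² + S2²) / S0 and AOP = ½·atan2(S2, S1). Whether the polarization state calculation layer uses exactly these expressions is an assumption, and the function names are illustrative.

```python
import numpy as np

def stokes_from_three(i0, i45, i90):
    """Linear Stokes parameters from 0°, 45° and 90° intensity images."""
    i0, i45, i90 = (np.asarray(a, dtype=np.float64) for a in (i0, i45, i90))
    s0 = i0 + i90                 # total intensity
    s1 = i0 - i90                 # horizontal vs. vertical preference
    s2 = 2.0 * i45 - s0           # +45° vs. -45° preference
    return s0, s1, s2

def dolp_aop(s0, s1, s2, eps=1e-8):
    """Degree of linear polarization and angle of polarization (radians)."""
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, eps)  # eps avoids 0/0
    aop = 0.5 * np.arctan2(s2, s1)
    return dolp, aop

# Usage: three pixels (horizontally polarized, 45° polarized, unpolarized).
s0, s1, s2 = stokes_from_three([1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.0, 0.5, 0.5])
dolp, aop = dolp_aop(s0, s1, s2)
```

Using `arctan2` rather than a plain arctangent of S2/S1 keeps the angle well defined where S1 is zero.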

[0034] According to the theory of micro-surface elements, the complex surface of a target object is composed of a series of tiny smooth surfaces, all of which satisfy Fresnel's law of reflection. This means that at every angle of entry into the camera... Since the refractive index of the medium can be approximated as a known constant value, it is only related to the degree of polarization of the reflected light. The degree of polarization (DOPt) of the reflected light can be expressed by the following formula:

[0035]
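The patent's own expression is not reproduced in this text. As a hedged illustration of how the degree of polarization depends on the refractive index and the angle of incidence, the sketch below evaluates the textbook Fresnel reflectances for unpolarized light reflected specularly from a smooth dielectric in air; it may differ from the DOPt formula the patent intends, and the function name is hypothetical.

```python
import numpy as np

def fresnel_reflection_dop(theta_i, n):
    """Degree of polarization of specularly reflected, initially unpolarized light.

    theta_i: angle of incidence in radians (scalar or array).
    n:       refractive index of the dielectric (incident medium assumed to be air).
    """
    theta_i = np.asarray(theta_i, dtype=float)
    sin_t = np.sin(theta_i) / n               # Snell's law
    cos_i = np.cos(theta_i)
    cos_t = np.sqrt(1.0 - sin_t ** 2)
    r_s = (cos_i - n * cos_t) / (cos_i + n * cos_t)   # s-polarized amplitude
    r_p = (n * cos_i - cos_t) / (n * cos_i + cos_t)   # p-polarized amplitude
    R_s, R_p = r_s ** 2, r_p ** 2
    return (R_s - R_p) / (R_s + R_p)

# Usage: for n = 1.5 the reflected light is unpolarized at normal incidence
# and fully polarized at the Brewster angle arctan(n).
print(fresnel_reflection_dop(0.0, 1.5), fresnel_reflection_dop(np.arctan(1.5), 1.5))
```

With the refractive index treated as a known constant, the function maps each incidence angle to one degree of polarization, which is the kind of relationship the micro-facet argument above relies on.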

[0036] Based on this principle, physical modeling and analysis of the input polarization images help identify defects in complex objects; polarization angle images, in particular, can effectively describe different surface orientations. According to the physical properties of different defects, such as refractive index and surface features, polarization imaging technology is used to extract them into the network. Combined with YOLOv5, a polarization defect identification network is constructed.

[0037] Further, the method for constructing the defect identification dataset in step one is as follows: the laser emitted by the light source 2 illuminates the plane of the target object 3; the polarization camera 1 captures polarization component images of the target object 3 at different angles and positions, yielding the defect identification dataset of original polarization images; the required multi-dimensional polarization information, such as the polarization angle and degree of polarization, is obtained from the images; noisy images are filtered out; data augmentation operations such as flipping, mirroring, and cropping are applied; and the captured dataset is saved to the PC 4.
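If the polarization camera is of the division-of-focal-plane type, each 2×2 block of sensor pixels sits behind four differently oriented micro-polarizers, and the polarization component images are obtained by sub-sampling the raw frame. The sketch below shows that step under an assumed pixel layout; the actual arrangement of angles within the 2×2 block is sensor-specific, so `ASSUMED_LAYOUT` and the function name are hypothetical.

```python
import numpy as np

# Hypothetical layout: (row offset, column offset) -> polarizer angle in degrees.
# Check the camera documentation; the real arrangement depends on the sensor.
ASSUMED_LAYOUT = {(0, 0): 90, (0, 1): 45, (1, 0): 135, (1, 1): 0}

def split_dofp_mosaic(raw, layout=ASSUMED_LAYOUT):
    """Split a raw mosaic frame into one half-resolution image per angle."""
    raw = np.asarray(raw)
    if raw.ndim != 2 or raw.shape[0] % 2 or raw.shape[1] % 2:
        raise ValueError("expected a 2D frame with even height and width")
    return {angle: raw[r::2, c::2] for (r, c), angle in layout.items()}

# Usage on a tiny 4x4 frame: each channel is 2x2.
channels = split_dofp_mosaic(np.arange(16).reshape(4, 4))
i0, i45, i90 = channels[0], channels[45], channels[90]
```

The three images at 0°, 45°, and 90° are the ones the method feeds to the network; the 135° image is available as a fourth measurement.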

[0038] Furthermore, the defect perception module in step three includes a pooling module and an attention module; the pooling module helps to learn polarization extraction from image defects; and the attention module helps to extract important defect information from the image.

[0039] Furthermore, the attention module includes a channel attention mechanism module and a spatial attention mechanism module. The attention module calculates attention weights to determine which parts are important. The channel attention mechanism module helps learn the parts in the channel that are beneficial for polarization extraction, while the spatial attention mechanism module helps extract important defect information at different locations in space.

[0040] In the spatial attention mechanism module, this invention employs a coordinate attention mechanism. By encoding positional information in both the horizontal and vertical directions of the feature map, the coordinate attention mechanism enables the network to better capture long-range dependencies. Specifically, for the input feature map $X \in \mathbb{R}^{C \times H \times W}$ with channel-$c$ entries $x_c(i, j)$, global average pooling is performed along its width (w) and height (h) directions respectively to generate a pair of orientation-aware feature maps:

[0041] At height h, the width direction is encoded as follows:

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le j < W} x_c(h, j)$$

[0042] At width w, the encoding in the height direction is:

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le i < H} x_c(i, w)$$

[0043] Then the two feature maps are concatenated and fused with dimensionality reduction by a shared 1×1 convolution transformation function $F_1$, followed by processing with a non-linear activation function δ (Sigmoid):

[0044] $$f = \delta\left(F_1\left([z^h, z^w]\right)\right)$$

[0045] Next, the fused feature map f is split into two independent tensors along the spatial direction, $f^h$ and $f^w$. They are transformed by the 1×1 convolutions $F_h$ and $F_w$ respectively, and finally the Sigmoid activation function σ is applied to generate the attention weights in the height and width directions, $g^h$ and $g^w$:

[0046] $$g^h = \sigma\left(F_h(f^h)\right)$$

[0047] $$g^w = \sigma\left(F_w(f^w)\right)$$

[0048] Finally, the generated attention weights are applied to the original input feature map. For each position (i, j), the weighted output feature map is obtained:

[0049] $$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

[0050] Through the above process, the coordinate attention mechanism can effectively highlight the polarization characteristics of the defect region and suppress irrelevant background information, thereby improving the network's accuracy in identifying defects.
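The coordinate attention steps of paragraphs [0040] to [0049] can be sketched as a forward pass in NumPy. This is a minimal illustration, assuming each 1×1 convolution is a plain channel-mixing matrix without bias or normalization; the weight names `w_shared`, `w_h`, and `w_w` and the reduced channel count are hypothetical, and in a trained network these weights would be learned.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def coordinate_attention(x, w_shared, w_h, w_w):
    """Coordinate attention on one feature map.

    x:        (C, H, W) input feature map.
    w_shared: (Cm, C) shared 1x1 convolution (channel reduction).
    w_h, w_w: (C, Cm) 1x1 convolutions for the height and width branches.
    """
    C, H, W = x.shape
    z_h = x.mean(axis=2)                       # (C, H): pooled along the width
    z_w = x.mean(axis=1)                       # (C, W): pooled along the height
    f = sigmoid(w_shared @ np.concatenate([z_h, z_w], axis=1))  # (Cm, H + W)
    f_h, f_w = f[:, :H], f[:, H:]              # split back into two tensors
    g_h = sigmoid(w_h @ f_h)                   # (C, H) height-wise weights
    g_w = sigmoid(w_w @ f_w)                   # (C, W) width-wise weights
    return x * g_h[:, :, None] * g_w[:, None, :]

# Usage with random weights: the output keeps the input shape.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 6))
y = coordinate_attention(x, rng.normal(size=(2, 4)),
                         rng.normal(size=(4, 2)), rng.normal(size=(4, 2)))
```

Because both gates lie strictly between 0 and 1, every output value is a damped copy of the corresponding input value, with the least damping where both the row and the column are judged important.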

[0051] Furthermore, in step four, the weight file of the defect recognition model is generated by performing max pooling and average pooling along the channel dimension on the input polarization feature map in space to generate features at different context scales. This feature map is then processed by a convolutional layer to finally generate attention weights.

[0052] Furthermore, the attention weights are constrained to the range of 0-1 using a sigmoid function. These weights are then applied to the original feature map, weighting each location in the space to highlight key regions, before being input into a single detection layer. This module adaptively learns channel and spatial attention weights to improve the feature representation capability of the convolutional neural network. Subsequently, a fusion layer fuses the attention weights and feature map to generate the output image. In this process, the attention weights are used to weight the feature map, highlighting important polarization components in the defect recognition task, thereby improving the performance of the defect recognition task.
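The pooling, convolution, and Sigmoid weighting described in paragraphs [0051] and [0052] resemble a spatial attention block. The sketch below is one possible reading, assuming a single k×k convolution over the stacked max-pooled and average-pooled maps with zero padding and no bias; the kernel would be learned in practice, and the function name is illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def spatial_attention(x, kernel):
    """Weight each spatial location of a feature map.

    x:      (C, H, W) polarization feature map.
    kernel: (2, k, k) convolution kernel with odd k (assumed, normally learned).
    Returns the weighted feature map and the (H, W) attention weights.
    """
    # Max and average pooling along the channel dimension -> (2, H, W).
    pooled = np.stack([x.max(axis=0), x.mean(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    # Sliding windows of shape (2, H, W, k, k), then a same-size convolution.
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))
    logits = np.einsum("chwij,cij->hw", windows, kernel)
    weights = 1.0 / (1.0 + np.exp(-logits))    # Sigmoid keeps weights in (0, 1)
    return x * weights[None, :, :], weights

# Usage: an all-zero kernel gives a uniform weight of 0.5 everywhere.
x = np.random.default_rng(1).normal(size=(3, 4, 5))
out, w = spatial_attention(x, np.zeros((2, 3, 3)))
```

The same weight map is broadcast over all channels, so the block changes where the network looks rather than which channel it prefers.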

[0053] Figure 3 illustrates the defect recognition network based on an attention-guided mechanism. The input linearly polarized images at 0°, 45°, and 90° are processed by the polarization state calculation layer to obtain the polarization angle and degree of polarization images. These are then passed to the backbone network for feature extraction. C2f, the main feature extraction and fusion module, combines convolutional layers, Bottleneck blocks (conv), and Concat blocks to achieve efficient feature extraction, enhancement, and fusion of the input image, providing rich feature representations for subsequent detection and classification tasks. MLCA, shown in Figure 4, is the designed defect perception module; it processes the input feature map, extracts polarization information that is beneficial for defect identification, and outputs a weighted polarization salient feature map. The Detect head is the last part of the network, responsible for predicting bounding box coordinates and class confidence based on the extracted features. The three Detect heads correspond to large, medium, and small detection heads, respectively.

[0054] Figure 4 shows the MLCA network of the defect perception module. The MLCA (Mixed Local Channel Attention) process begins by simultaneously performing Local Average Pooling (LAP) and Global Average Pooling (GAP) on the input feature map to extract local spatial information and global contextual information, respectively. The results of the two pooling branches are then transformed by one-dimensional convolution (Conv1D). Subsequently, the spatial resolution of the local feature branch is restored through unpooling, and the global feature branch is upsampled to match it. The processed local and global features are then weighted and fused, and a hybrid attention map is generated by applying the sigmoid activation function. Finally, the attention map is multiplied element-wise with the original input feature map to obtain the weighted output features.
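The MLCA data flow described above can be traced with a simplified NumPy sketch. It makes several assumptions that the text does not specify: the local pooling grid divides the feature map evenly, the Conv1D runs along the channel axis with zero padding, unpooling is nearest-neighbour repetition, and the fusion is a fixed weighted sum. The names `mlca`, `k_local`, `k_global`, and `local_weight` are hypothetical.

```python
import numpy as np

def conv1d_channels(v, kernel):
    """Zero-padded 1D cross-correlation along axis 0 (the channel axis)."""
    k = len(kernel)
    pad = k // 2
    padded = np.pad(v, [(pad, pad)] + [(0, 0)] * (v.ndim - 1))
    return sum(kernel[i] * padded[i:i + v.shape[0]] for i in range(k))

def mlca(x, k_local, k_global, grid=2, local_weight=0.5):
    """Simplified mixed local channel attention on a (C, H, W) feature map."""
    C, H, W = x.shape
    assert H % grid == 0 and W % grid == 0, "sketch assumes an even split"
    bh, bw = H // grid, W // grid
    # Local average pooling: mean of each bh x bw block -> (C, grid, grid).
    local = x.reshape(C, grid, bh, grid, bw).mean(axis=(2, 4))
    # Global average pooling -> (C, 1, 1).
    glob = x.mean(axis=(1, 2), keepdims=True)
    # Conv1D across channels on both branches (odd-length kernels).
    local = conv1d_channels(local, k_local)
    glob = conv1d_channels(glob, k_global)
    # Unpool the local branch and broadcast the global branch to (C, H, W).
    local_up = np.repeat(np.repeat(local, bh, axis=1), bw, axis=2)
    glob_up = np.broadcast_to(glob, x.shape)
    fused = local_weight * local_up + (1.0 - local_weight) * glob_up
    attention = 1.0 / (1.0 + np.exp(-fused))   # hybrid attention map
    return x * attention

# Usage: identity kernels on a constant map give a uniform gate of sigmoid(1).
y = mlca(np.ones((4, 4, 4)), [0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
```

The local branch lets the gate vary between image regions, while the global branch contributes one value per channel, which is the mixing the module's name refers to.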

[0055] Figure 5 shows an example prediction for a complex scene. The image contains a small, pit-like defect. Using the previously trained weights for defect identification, the model outputs the defect type, confidence level, and location. It can be seen that this algorithm, utilizing a polarization-attention physical model, can effectively detect defects without being affected by shape information, providing a new solution for the field of defect identification.

[0056] Figure 6 presents a quantitative evaluation of the detection results. The Mean Average Precision (mAP) metric is used to measure the model's performance and inference speed. The experimental results show that the model has higher detection accuracy than the traditional object detection models YOLOv5 and YOLOv3, as well as the two-stage detection algorithm Fast R-CNN.

[0057] Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for detecting defects in glass slides based on an improved YOLOv5, characterized in that it includes the following steps:

Step 1: Construct a defect dataset of glass slide polarization images: First, preprocess and group the dataset, then label the dataset to mark the location and category of the target objects in the images, and construct raw data suitable for defect identification;

Step 2: Constructing a network structure for polarization attention-guided defect identification: The distribution law of polarization information of reflected light on the target surface is derived by combining the physical model of polarization bidirectional distribution function and Fresnel formula, and the mapping relationship between the distribution of physical properties of the target surface and polarization state information of reflected light is obtained. According to the refractive index and surface features of different defects, they are extracted into the network by polarization imaging technology, and the polarization defect identification network is constructed by combining YOLOv5;

Step 3: Input data for network training: The polarization component images of 0°, 45°, and 90° linear polarization are used as the input to the model. In the polarization state calculation layer (PSCL), matrix operations are performed on the polarization component images based on the Stokes vector to calculate the Stokes parameters, thereby obtaining the AOP and DOLP images. After entering the backbone network, the attention-based defect perception module (MLCA) extracts polarization attention information that is beneficial to defect recognition. Finally, the next layer, the detection head module, predicts the output;

Step 4: Multi-target defect identification in complex scenes: Train the network using the defect identification dataset described in Step 1 to obtain the weight file of the defect identification model. Use this training weight file to identify defects in objects in complex scenes, and use the Mean Average Precision (mAP) evaluation metric to measure the model performance.

2. The glass slide defect detection method based on improved YOLOv5 according to claim 1, characterized in that: the defect identification dataset mentioned in step one is constructed as follows: objects with different defects are placed in the environment as target objects, and images are captured by a division-of-focal-plane (DoFP) camera under active light conditions to obtain images in four polarization directions. The polarization information is extracted from the images to obtain multi-dimensional polarization information such as polarization angle and polarization degree. At the same time, data interfered with by noise is removed. Based on the data screening, data augmentation techniques such as data cropping, flipping, moving, and mirroring are used to expand the dataset and improve the generalization ability of the model.

3. The glass slide defect detection method based on improved YOLOv5 according to claim 2, characterized in that: in step three, the defect perception module includes a pooling module and an attention module; the pooling module helps to learn polarization extraction from image defects; the attention module helps to extract important defect information from the image.

4. The glass slide defect detection method based on improved YOLOv5 according to claim 3, characterized in that: The attention module includes a channel attention mechanism module and a spatial attention mechanism module; the channel attention mechanism module is used to learn polarization extraction in the channel, and the spatial attention mechanism module is used to extract important defect information at different locations in space.

5. The glass slide defect detection method based on improved YOLOv5 according to claim 4, characterized in that: The spatial attention mechanism module adopts a coordinate attention mechanism, which encodes the positional information of the feature map in both horizontal and vertical directions, enabling the network to better capture long-distance dependencies.

6. The glass slide defect detection method based on improved YOLOv5 according to claim 1, characterized in that: In step four, the weight file of the defect recognition model is generated by performing max pooling and average pooling along the channel dimension on the input polarization feature map in space to generate features at different context scales. This feature map is then processed by a convolutional layer to finally generate attention weights.

7. The glass slide defect detection method based on improved YOLOv5 according to claim 6, characterized in that: The attention weights are output through a Sigmoid function and limited to between 0 and 1. The resulting weights are then applied to the original feature map, weighting each location in the space to highlight key regions. Finally, the weights are input into the detection module of the first layer.