Method for training defect detection model, defect detection method and device

By using a dual-path defect detection model to extract features and adjust parameters from glass images, the problem of low efficiency in glass defect detection is solved, and efficient and accurate defect identification is achieved.

CN116309272B · Active · Publication Date: 2026-06-19 · SHENZHEN XUMI YUNTU SPACE TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHENZHEN XUMI YUNTU SPACE TECH CO LTD
Filing Date: 2022-12-08
Publication Date: 2026-06-19


IPC: G06T7/00; G06V10/82; G06V10/764; G06V10/52; G06V10/80; G06N3/0464; G06N3/048; G06N3/08

AI Tagging

Application Domain

Image analysis; Character and pattern recognition


AI Technical Summary

Technical Problem

Existing technologies have low efficiency in glass defect detection, and manual inspection is prone to missed or incorrect detections.

Method used

A dual-path defect detection model is adopted. The first network is used to quickly locate the glass area, and the second network is used for fine classification. By extracting features from low-resolution and high-resolution images and adjusting parameters by combining loss values, a defect detection model that meets the preset conditions is obtained.

Benefits of technology

It enables rapid and accurate glass defect detection, improving detection efficiency and accuracy.



Abstract

The present disclosure relates to the technical field of computers, and provides a training method of a defect detection model, a defect detection method and device. The training method comprises: obtaining a first image and a second image which are the same as the image content of a glass image; inputting the first image into a first network to obtain a first feature map of multiple scales and a first probability map; the first network comprises a first number of basic extraction units; inputting the second image into a second network and inputting the first feature map of different target scales into different first feature extraction networks of the second network to obtain a second feature map; the second network comprises a second number of basic extraction units, and the second number is greater than the first number; determining a defect detection result based on the last obtained first feature map, first probability map and second feature map; and performing parameter adjustment based on a loss value determined by the defect detection result and defect annotation data to obtain a defect detection model meeting a preset condition, and the performance of the defect detection model is higher.


Description

Technical Field

[0001] This disclosure relates to the field of computer technology, and in particular to training methods for defect detection models, defect detection methods, and apparatus.

Background Technology

[0002] Glass may have various defects during the actual production process, such as breakage, bubbles, and impurities. Therefore, in order to ensure the quality of glass used in subsequent applications, defect detection is necessary. Currently, defect detection of glass is often done manually. However, manual inspection is not only labor-intensive but also prone to missed or incorrect detections, resulting in low efficiency in existing glass defect detection methods.

Summary of the Invention

[0003] In view of this, the present disclosure provides a training method for a defect detection model, a defect detection method, and an apparatus to solve the technical problem of low efficiency in glass defect detection in the prior art.

[0004] A first aspect of this disclosure provides a method for training a defect detection model, comprising:

[0005] For each glass image, obtain a first image and a second image with the same image content as the glass image; the resolution of the first image is lower than the resolution of the second image;

[0006] The first image is input into the first network to obtain a multi-scale first feature map and a first probability map; the first network includes a first number of basic extraction units.

[0007] The second image is input into the second network, and the first feature maps of different target scales are input into different first feature extraction networks of the second network to obtain the second feature map output by the last first feature extraction network; the multiple first feature extraction networks of the second network are connected sequentially, each first feature extraction network includes at least one basic extraction unit, and the second network includes a second number of basic extraction units, the second number being greater than the first number;

[0008] Based on the first feature map, the first probability map, and the second feature map obtained at the end, the defect detection result is determined; and based on the loss value determined by the defect detection result and the defect annotation data, the parameters of the first network and the second network are adjusted to obtain a defect detection model including the first network and the second network that meets the preset conditions.

[0009] A second aspect of this disclosure provides a defect detection method, comprising:

[0010] Acquire a first detection image and a second detection image that have the same image content as the image of the glass to be detected; the resolution of the first detection image is lower than the resolution of the second detection image;

[0011] The first detected image is input into the first network of the defect detection model, and the second detected image is input into the second network of the defect detection model to obtain the defect detection result.

[0012] The defect detection model was trained using the method described above.

[0013] A third aspect of this disclosure provides a training apparatus for a defect detection model, comprising:

[0014] The image acquisition module is configured to acquire a first image and a second image with the same image content as each glass image; the resolution of the first image is lower than the resolution of the second image.

[0015] The first processing module is configured to input the first image into the first network to obtain a multi-scale first feature map and a first probability map; the first network includes a first number of basic extraction units.

[0016] The second processing module is configured to input the second image into the second network and input first feature maps of different target scales into different first feature extraction networks of the second network to obtain the second feature map output by the last first feature extraction network; the multiple first feature extraction networks of the second network are connected sequentially, each first feature extraction network includes at least one basic extraction unit, and the second network includes a second number of basic extraction units, the second number being greater than the first number;

[0017] The parameter adjustment module is configured to determine the defect detection result based on the last acquired first feature map, first probability map and second feature map; and adjust the parameters of the first network and the second network based on the loss value determined by the defect detection result and defect annotation data to obtain a defect detection model including the first network and the second network that meets the preset conditions.

[0018] A fourth aspect of this disclosure provides a defect detection apparatus, comprising:

[0019] The data acquisition module is configured to acquire a first detection image and a second detection image that have the same image content as the image of the glass to be detected; the resolution of the first detection image is lower than the resolution of the second detection image;

[0020] The data input module is configured to input the first detected image into the first network of the defect detection model and the second detected image into the second network of the defect detection model to obtain the defect detection result.

[0021] The defect detection model was trained using the method described above.

[0022] A fifth aspect of this disclosure provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method described above.

[0023] A sixth aspect of this disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above-described method.

[0024] The beneficial effects of the embodiments disclosed herein compared to the prior art are as follows:

[0025] By processing the glass image, a first image and a second image corresponding to the glass image are determined. The first image and the second image have the same image content as the corresponding glass image, and the resolution of the first image is lower than that of the second image. The low-resolution first image is input into a first network to obtain a multi-scale first feature map and a first probability map. Each probability value in the first probability map represents the probability that the corresponding pixel is a glass region. The first network includes a first number of basic extraction units, which are used for feature extraction. The high-resolution second image is input into a second network, and the first feature maps of different target scales are input into different first feature extraction networks of the second network to obtain a second feature map output by the last first feature extraction network. The multiple first feature extraction networks of the second network are sequentially connected, and each first feature extraction network includes at least one basic extraction unit. The number of basic extraction units in the second network is a second number, which is greater than the first number. Further, based on the finally obtained first feature map, first probability map, and second feature map, the defect detection result is determined. Based on the loss value determined by the defect detection result and defect annotation data, the parameters of the first network and the second network are adjusted to obtain a defect detection model including the first network and the second network that meets the preset conditions. The obtained defect detection model consists of two networks: a first network and a second network. The first network has fewer basic extraction units than the second network, indicating that the first network is a fast branch network and the second network is a slow branch network. The first network is mainly used to locate glass regions in the image, while the second network is used for fine classification. The first feature map obtained from the first network is introduced into the second network, which allows the second network to obtain a richer receptive field and more feature information, which is beneficial for obtaining a high-performance defect detection model. Using this defect detection model, defects in glass images can be identified quickly and accurately, with high defect detection efficiency and accuracy.

Attached Figure Description

[0026] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0027] Figure 1 is a flowchart illustrating a training method for a defect detection model provided in an embodiment of this disclosure;

[0028] Figure 2 is a schematic diagram of the structure of a defect detection model provided in an embodiment of this disclosure;

[0029] Figure 3 is a schematic flowchart of a defect detection method provided in an embodiment of this disclosure;

[0030] Figure 4 is a schematic diagram of the structure of a training device for a defect detection model provided in an embodiment of this disclosure;

[0031] Figure 5 is a schematic diagram of the structure of a defect detection device provided in an embodiment of this disclosure;

[0032] Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of this disclosure.

Detailed Implementation

[0033] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, so as to provide a thorough understanding of the embodiments of this disclosure. However, those skilled in the art will understand that this disclosure may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this disclosure with unnecessary detail.

[0034] Figure 1 is a flowchart illustrating a training method for a defect detection model provided in an embodiment of this disclosure.

[0035] The method shown in Figure 1 can be executed by a server, and the method includes:

[0036] S101, for each glass image, obtain a first image and a second image with the same image content as the glass image; the resolution of the first image is less than the resolution of the second image.

[0037] S102, the first image is input into the first network to obtain a multi-scale first feature map and a first probability map; the first network includes a first number of basic extraction units.

[0038] S103, the second image is input into the second network, and the first feature maps of different target scales are input into different first feature extraction networks of the second network to obtain the second feature map output by the last first feature extraction network; the multiple first feature extraction networks of the second network are connected sequentially, each first feature extraction network includes at least one basic extraction unit, and the second network includes a second number of basic extraction units, the second number being greater than the first number.

[0039] S104. Based on the first feature map, the first probability map and the second feature map obtained last, the defect detection result is determined; and the parameters of the first network and the second network are adjusted based on the loss value determined by the defect detection result and the defect annotation data to obtain a defect detection model including the first network and the second network that meets the preset conditions.

[0040] Specifically, images of the glass are acquired to obtain glass images. Each acquired glass image is processed to obtain a first image and a second image corresponding to that glass image. The first image, the second image, and the glass image have the same image content, and the resolution of the first image is lower than that of the second image. For example, given a glass image A, the resolution of glass image A is reduced to obtain a first image and a second image corresponding to glass image A. The resolution of the first image is (224, 224), and the resolution of the second image is (448, 448).
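As an illustration of step S101, the following is a minimal sketch of deriving the two images from one glass image. The disclosure does not name a resampling method, so nearest-neighbour sampling is an assumption here, as are the helper names `resize_nearest` and `make_image_pair` and the 896×896 source size.

```python
import numpy as np


def resize_nearest(image, size):
    """Resize an (H, W) or (H, W, C) array to `size` by nearest-neighbour sampling."""
    out_h, out_w = size
    in_h, in_w = image.shape[:2]
    # Map every output row/column back to a source row/column.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]


def make_image_pair(glass_image, low=(224, 224), high=(448, 448)):
    """Return (first_image, second_image): same content, different resolution."""
    return resize_nearest(glass_image, low), resize_nearest(glass_image, high)


# Illustrative stand-in for an acquired glass image (assumed size).
glass_image = np.random.default_rng(0).integers(0, 256, size=(896, 896, 3), dtype=np.uint8)
first_image, second_image = make_image_pair(glass_image)
print(first_image.shape, second_image.shape)  # (224, 224, 3) (448, 448, 3)
```

Both outputs are sampled from the same source array, which is what keeps their image content identical while the resolutions differ.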

[0041] It should be noted that the first image and the second image correspond to the glass images. For each glass image in the sample dataset, there exists a corresponding first image and second image. The image content of the first image and the second image from different glass images are unrelated. For example, there is a glass image A, and corresponding first image A and second image A; there is a glass image B, and corresponding first image B and second image B. The image content of glass image A, first image A, and second image A are the same; the image content of glass image B, first image B, and second image B are the same; the image content of first image A and second image B are unrelated; and the image content of first image B and second image A are unrelated.

[0042] Furthermore, the first image is input into the first network, and the first network performs multi-scale feature extraction on the first image to obtain a multi-scale first feature map and a first probability map. The probability values in the first probability map are used to indicate the probability that the corresponding pixel is a glass region. Therefore, the first probability map can be used to locate the glass region in the image. The first network includes a first number of basic extraction units, which are used for feature extraction.

[0043] Specifically, the first network includes multiple second feature extraction networks, each of which includes at least one basic extraction unit; the network structures of different second feature extraction networks can be the same or different, the multiple second feature extraction networks are connected sequentially, and each second feature extraction network outputs a first feature map.

[0044] Furthermore, the second image is input into the second network, which performs feature extraction on the second image. The second network comprises multiple sequentially connected first feature extraction networks, each including at least one basic extraction unit. The second network includes a second number of basic extraction units, which is greater than the first number. Because each basic extraction unit can be used for feature extraction, the more basic extraction units there are, the more complex the network structure becomes, the more refined the feature extraction, and the longer the feature extraction time. Therefore, the first network is a fast-branch network, used for rapid localization of glass regions in lower-resolution images; the second network is a slow-branch network, primarily used for refined classification.

[0045] Furthermore, the first feature maps of different target scales are input into different first feature extraction networks of the second network, thereby introducing the first feature maps into the second network. This allows the second network to obtain a richer receptive field and more feature information, making the second feature map output by the last first feature extraction network more expressive and providing a prerequisite for obtaining a defect detection model with better performance.

[0046] The multi-scale first feature map includes the target-scale first feature map. In one possible implementation, the target-scale first feature map is the same as the multi-scale first feature map, and each first feature map obtained by the first network is introduced into the second network. In another possible implementation, the target-scale first feature map is a subset of the multi-scale first feature maps. In this case, the target scale is determined beforehand, and the target-scale first feature map is then input into the second network. Typically, because the first feature maps extracted by the shallow layers of the first network have less significance for the second network during feature extraction, they may not be input into the second network.

[0047] For example, the first network includes four feature extraction stages, with each feature extraction stage corresponding to a second feature extraction network. That is, there are second feature extraction networks A, B, C, and D. After the first image is input into the first network, the first feature map A output by the second feature extraction network A, the first feature map B output by the second feature extraction network B, the first feature map C output by the second feature extraction network C, and the first feature map D output by the second feature extraction network D are obtained. The first feature map A is not input into the second network, while the first feature maps B, C, and D are used as first feature maps at the target scale and input into different first feature extraction networks of the second network.
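The selection of target-scale feature maps in this example can be sketched as a small routing table. This is only an illustration: the function name `route_target_scales`, the `skip_shallow` parameter, and the rule of aligning the selected maps with the trailing stages of the second network are assumptions, and the stage names E to H follow the Figure 2 example given later in the description.

```python
def route_target_scales(first_feature_maps, second_network_stages, skip_shallow=1):
    """Pair the deeper first feature maps with the trailing second-network stages.

    first_feature_maps: dict mapping a map name to its data, in stage order.
    second_network_stages: list of stage names of the second network, in order.
    skip_shallow: how many of the shallowest first feature maps are not passed on.
    """
    targets = list(first_feature_maps.items())[skip_shallow:]
    # Align the selected maps with the last len(targets) stages.
    receivers = second_network_stages[-len(targets):]
    return {stage: name for stage, (name, _) in zip(receivers, targets)}


# Placeholder strings stand in for the actual feature map arrays.
first_feature_maps = {"A": "map_A", "B": "map_B", "C": "map_C", "D": "map_D"}
routing = route_target_scales(first_feature_maps, ["E", "F", "G", "H"])
print(routing)  # {'F': 'B', 'G': 'C', 'H': 'D'}
```

Setting `skip_shallow=0` corresponds to the other implementation mentioned above, in which every first feature map is introduced into the second network.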

[0048] Furthermore, based on the finally acquired first feature map, first probability map, and second feature map, the defect detection result is jointly determined. This defect detection result considers both the data obtained from the first network and the data obtained from the second network, resulting in high accuracy. Defect-labeled data is pre-acquired, and a loss value is determined based on the defect detection result and the defect-labeled data. Based on this loss value, the parameters of the first network and the second network are adjusted to obtain a defect detection model that meets preset conditions. This defect detection model includes a first network and a second network, where the preset conditions are pre-set training termination conditions, such as achieving a preset accuracy or reaching a set number of training iterations.
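The preset training termination conditions can be sketched as a simple loop. The disclosure does not specify the loss function or the optimizer, so the sketch leaves both to caller-supplied callbacks; the names `train_until_preset_condition`, `train_step`, and `evaluate`, and the threshold values, are hypothetical.

```python
def train_until_preset_condition(train_step, evaluate, target_accuracy=0.95, max_iterations=1000):
    """Repeat training steps until the accuracy target or the iteration cap is reached.

    train_step: callable performing one parameter update of both networks.
    evaluate: callable returning the current accuracy in [0, 1].
    """
    accuracy = 0.0
    for iteration in range(1, max_iterations + 1):
        train_step()
        accuracy = evaluate()
        if accuracy >= target_accuracy:
            return iteration, accuracy, "accuracy reached"
    return max_iterations, accuracy, "iteration limit reached"


# Toy stand-ins: accuracy rises by 0.1 with every step.
state = {"steps": 0}


def toy_train_step():
    state["steps"] += 1


def toy_evaluate():
    return min(1.0, state["steps"] / 10)


result = train_until_preset_condition(toy_train_step, toy_evaluate, target_accuracy=0.8, max_iterations=50)
print(result)  # (8, 0.8, 'accuracy reached')
```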

[0049] According to the technical solution provided in this disclosure, a first image and a second image corresponding to the glass image are determined by processing the glass image. The first image and the second image have the same image content as the corresponding glass image, and the resolution of the first image is lower than that of the second image. The low-resolution first image is input into a first network to obtain a multi-scale first feature map and a first probability map. Each probability value in the first probability map is used to characterize the probability that the corresponding pixel is a glass region. The first network includes a first number of basic extraction units, which are used for feature extraction. The high-resolution second image is input into a second network, and the first feature maps of different target scales are input into different first feature extraction networks of the second network to obtain a second feature map output by the last first feature extraction network. The multiple first feature extraction networks of the second network are sequentially connected, and each first feature extraction network includes at least one basic extraction unit. The number of basic extraction units included in the second network is a second number, which is greater than the first number. Further, based on the last obtained first feature map, first probability map, and second feature map, a defect detection result is determined, and the parameters of the first network and the second network are adjusted based on the loss value determined by the defect detection result and defect annotation data to obtain a defect detection model including the first network and the second network that meets preset conditions. The obtained defect detection model consists of two networks: a first network and a second network. The first network has fewer basic extraction units than the second network, indicating that the first network is a fast branch network and the second network is a slow branch network. The first network is mainly used to locate glass regions in the image, while the second network is used for fine classification. The first feature map obtained from the first network is introduced into the second network, which allows the second network to obtain a richer receptive field and more feature information, which is beneficial for obtaining a high-performance defect detection model. Using this defect detection model, defects in glass images can be identified quickly and accurately, with high defect detection efficiency and accuracy.

[0050] In some embodiments, S102 inputs the first image into the first network to obtain a multi-scale first feature map and a first probability map, including:

[0051] S1021, the first image is input into the first network, and features are extracted using multiple second feature extraction networks of the first network to obtain a multi-scale first feature map; the first feature map output by any second feature extraction network is used as the input data of the next second feature extraction network; each second feature extraction network includes at least one basic extraction unit.

[0052] S1022, Based on the first feature map obtained last, obtain the first probability map.

[0053] Specifically, the first network includes multiple second feature extraction networks, each of which includes at least one basic extraction unit. Feature extraction is performed using the multiple second feature extraction networks to obtain a multi-scale first feature map. The multiple second feature extraction networks are sequentially connected, and the first feature map output by any second feature extraction network is used as the input data for the next second feature extraction network.

[0054] As shown in Figure 2, the first network includes four second feature extraction networks, and each second feature extraction network includes two basic extraction units (the 2-block structure in Figure 2). The four second feature extraction networks are sequentially connected. The first feature map A output by second feature extraction network A is input to second feature extraction network B, the first feature map B output by second feature extraction network B is input to second feature extraction network C, and the first feature map C output by second feature extraction network C is input to second feature extraction network D, resulting in the first feature map D output by second feature extraction network D (M1 in Figure 2).
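The sequential connection of the four second feature extraction networks can be sketched as follows. The internals of a basic extraction unit are not given in this passage, so the sketch substitutes a ReLU for each unit and assumes that each stage halves the spatial size by 2×2 average pooling; all function names are hypothetical.

```python
import numpy as np


def basic_extraction_unit(x):
    """Stand-in for one basic extraction unit: a ReLU with no learned weights (assumption)."""
    return np.maximum(x, 0.0)


def second_feature_extraction_network(x, num_units=2):
    """One stage: `num_units` basic units, then 2x2 average pooling (assumed downsampling)."""
    for _ in range(num_units):
        x = basic_extraction_unit(x)
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def first_network_forward(first_image, num_stages=4):
    """Chain the stages: each output feeds the next stage and is also kept."""
    feature_maps = []
    x = first_image
    for _ in range(num_stages):
        x = second_feature_extraction_network(x)
        feature_maps.append(x)
    return feature_maps


first_image = np.random.default_rng(0).standard_normal((224, 224))
maps = first_network_forward(first_image)
print([m.shape for m in maps])  # [(112, 112), (56, 56), (28, 28), (14, 14)]
```

The last entry of `maps` plays the role of M1, and the full list is the multi-scale first feature map.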

[0055] Furthermore, based on the first feature map obtained at the end, i.e., the first feature map output by the last second feature extraction network, a first probability map is obtained. For example, as shown in Figure 2, the first probability map is obtained by applying a sigmoid function to M1.
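A minimal sketch of this step, assuming M1 is a single-channel map of real-valued scores (the channel layout is not stated here) and using a hypothetical function name:

```python
import numpy as np


def first_probability_map(m1):
    """Apply an element-wise sigmoid so that every value lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-m1))


# Illustrative scores; a score of 0 maps to a probability of exactly 0.5.
m1 = np.array([[-2.0, 0.0], [0.5, 3.0]])
p1 = first_probability_map(m1)
print(p1.round(3))
```

Each output value can then be read as the probability that the corresponding pixel belongs to a glass region.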

[0056] According to the technical solution provided in the embodiments of this disclosure, feature extraction is performed sequentially using a plurality of second feature extraction networks, each including at least one basic extraction unit, to ensure the accuracy and richness of the acquired features, which is beneficial for achieving rapid localization of the glass region of an image.

[0057] In some embodiments, S103 inputs the second image into the second network and inputs first feature maps of different target scales into different first feature extraction networks of the second network to obtain the second feature map output by the last first feature extraction network, including:

[0058] S1031, the second image is input into the second network, and the first feature maps of different target scales are input into different first feature extraction networks of the second network.

[0059] S1032, for each target feature extraction network, the target feature extraction network is any first feature extraction network that has been input with the first feature map; the input first feature map is upsampled, and the upsampled first feature map is fused with the extracted data of the target feature extraction network to obtain fused data; when there is a next first feature extraction network, the fused data is input into the next first feature extraction network to obtain the second feature map output by the last first feature extraction network.

[0060] Specifically, the second image is input into the second network, and first feature maps of different target scales are input into different first feature extraction networks of the second network to introduce first feature maps of the target scale into the second network. Each first feature extraction network that receives a first feature map is regarded as a target feature extraction network. For each target feature extraction network, the input first feature map is upsampled, and the upsampled first feature map is fused with the extracted data of the target feature extraction network to obtain fused data. When there is a next first feature extraction network, the fused data is input into the next first feature extraction network until the second feature map output by the last first feature extraction network is obtained.

[0061] For example, as shown in Figure 2, the first network includes a second feature extraction network A, a second feature extraction network B, a second feature extraction network C, and a second feature extraction network D; the second network includes a first feature extraction network E, a first feature extraction network F, a first feature extraction network G, and a first feature extraction network H.

[0062] The first feature map B, the first feature map C, and the first feature map D are input into different first feature extraction networks of the second network, that is, the first feature map B is input into the first feature extraction network F, the first feature map C is input into the first feature extraction network G, and the first feature map D is input into the first feature extraction network H.

[0063] When the first feature map B is input into the first feature extraction network F, the first feature map B is upsampled to obtain the upsampled first feature map B. The upsampled first feature map B is then fused with the extracted data output by the first feature extraction network F to obtain the first fused data. The first fused data is then input into the first feature extraction network G.

[0064] When the first feature map C is input into the first feature extraction network G, the first feature map C is upsampled to obtain the upsampled first feature map C. The upsampled first feature map C is then fused with the extracted data output by the first feature extraction network G to obtain the second fused data. The second fused data is then input into the first feature extraction network H.

[0065] When the first feature map D is input into the first feature extraction network H, the first feature map D is upsampled to obtain the upsampled first feature map D. The upsampled first feature map D is then fused with the extracted data output by the first feature extraction network H to obtain the third fused data. This third fused data is the second feature map output by the last first feature extraction network.
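The upsample-and-fuse flow through networks E to H can be sketched as below. Several details are assumptions rather than part of the disclosure: each stage is replaced by 2×2 average pooling, upsampling is nearest-neighbour, fusion is element-wise addition, and the feature map sizes are illustrative. The function names are hypothetical.

```python
import numpy as np


def upsample_to(feature_map, target_shape):
    """Nearest-neighbour upsampling of an (H, W) map to `target_shape`."""
    h, w = feature_map.shape
    th, tw = target_shape
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return feature_map[rows[:, None], cols]


def stage(x):
    """Stand-in first feature extraction network: 2x2 average pooling (assumption)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def second_network_forward(second_image, injected):
    """Run stages E..H and fuse an upsampled first feature map after each target stage."""
    x = second_image
    for name in ("E", "F", "G", "H"):
        x = stage(x)  # extracted data of this stage
        if name in injected:  # this stage is a target feature extraction network
            # Fusion is assumed to be element-wise addition.
            x = x + upsample_to(injected[name], x.shape)
    return x  # second feature map output by the last stage


rng = np.random.default_rng(0)
second_image = rng.standard_normal((448, 448))
injected = {
    "F": rng.standard_normal((56, 56)),  # first feature map B
    "G": rng.standard_normal((28, 28)),  # first feature map C
    "H": rng.standard_normal((14, 14)),  # first feature map D
}
second_feature_map = second_network_forward(second_image, injected)
print(second_feature_map.shape)  # (28, 28)
```

Under these assumed sizes each injected map is upsampled by a factor of two before fusion, and the fused data of one stage becomes the input of the next.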

[0066] According to the technical solution provided in the embodiments of this disclosure, when the first feature map of the target scale is input into the second network, it is fused with the second feature map, which is beneficial to extracting richer semantic information and to obtaining a defect detection model with higher performance.

[0067] In some embodiments, S104 determines the defect detection result based on the last acquired first feature map, first probability map, and second feature map, including:

[0068] S1041, set the probability values in the first probability map that are less than the probability threshold to a preset value to obtain the second probability map.

[0069] S1042, after upsampling the second probability map, multiply it with the second feature map, and then perform a self-attention calculation to obtain the third feature map.

[0070] S1043, based on the last acquired first feature map and the third feature map, determine the defect detection result.

[0071] Specifically, a probability threshold (e.g., 0.5) and a preset value (e.g., 0) are pre-set. After obtaining the first probability map, each probability value in the first probability map is compared with the probability threshold. When the probability value is less than the probability threshold, the pixel is considered to correspond to a non-glass area; when the probability value is greater than the probability threshold, the pixel is considered to correspond to a glass area. By setting the probability values less than the probability threshold to the preset value, pixels in non-glass areas are occluded. Therefore, the obtained second probability map can be used to accurately locate glass areas in the image. The second probability map is upsampled to obtain a third probability map of the same size as the second feature map. The third probability map is multiplied by the second feature map, and then self-attention learning is performed to obtain the third feature map. The third feature map contains rich information about the glass area and has strong expressive power. Based on the obtained first and third feature maps, an accurate defect detection result is determined.
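The thresholding, upsampling, and multiplication steps can be sketched as follows. The sketch assumes a single channel, uses nearest-neighbour upsampling for brevity, and omits the subsequent self-attention step; the function names and the toy values are illustrative only.

```python
def threshold_mask(prob_map, threshold=0.5, preset=0.0):
    """Replace probabilities below the threshold with the preset value."""
    return [[p if p >= threshold else preset for p in row] for row in prob_map]


def upsample2x_nearest(grid):
    """Repeat every value twice along both axes (an illustrative choice)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def multiply(a, b):
    """Element-wise product of two equally sized 2-D grids."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical 2x2 first probability map and 4x4 second feature map.
first_prob_map = [[0.9, 0.2], [0.4, 0.7]]
second_feature_map = [[1.0] * 4 for _ in range(4)]

second_prob_map = threshold_mask(first_prob_map)       # non-glass pixels -> 0
third_prob_map = upsample2x_nearest(second_prob_map)   # same size as feature map
masked = multiply(third_prob_map, second_feature_map)  # input to self-attention
```

With a preset value of 0, the multiplication zeroes every feature located in a region judged to be non-glass, so only glass-region features reach the self-attention step.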

[0072] According to the technical solution provided in this disclosure, probability values in the first probability map that are less than the probability threshold are replaced with the preset value, occluding the corresponding pixels, to obtain a second probability map that can accurately locate the glass region. Then, the upsampled second probability map and the second feature map are multiplied together, and self-attention learning is performed to obtain a third feature map that can also locate the glass region. When determining the defect detection result, the last acquired first feature map and the third feature map are both considered in order to accurately classify defects in the glass region, resulting in an accurate defect detection result.

[0073] In some embodiments, S1043, determining the defect detection result based on the last acquired first feature map and the third feature map, includes:

[0074] S10431, Perform global max pooling on the last obtained first feature map to obtain the pooled first feature map.

[0075] S10432, perform global average pooling on the third feature map to obtain the pooled third feature map.

[0076] S10433, stack the pooled first feature map and the pooled third feature map, and then feed them into the fully connected layer and the classification layer to obtain the defect detection result.

[0077] Specifically, global max pooling is performed on the first feature map obtained last to obtain the pooled first feature map; global average pooling is performed on the third feature map to obtain the pooled third feature map; the pooled first feature map and the pooled third feature map are stacked (i.e., a concat operation is performed) and connected to a fully connected layer, where the number of fully connected layers can be more than one, such as two layers; the data output from the last fully connected layer is input into the classification layer so that the classification layer outputs a more accurate defect detection result.

[0078] According to the technical solution provided in the embodiments of this disclosure, when determining the defect detection result, not only the third feature map is considered, but the first feature map is also introduced, which is beneficial to obtaining a more accurate defect detection result.
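The pooling and stacking steps can be sketched with plain nested lists. The layout (a list of channels, each a 2-D grid) and the toy values are assumptions for illustration.

```python
def global_max_pool(fmap):
    """Return one maximum per channel; fmap is a list of 2-D grids."""
    return [max(max(row) for row in channel) for channel in fmap]


def global_avg_pool(fmap):
    """Return one mean per channel; fmap is a list of 2-D grids."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in fmap
    ]


# Hypothetical two-channel feature maps of size 2x2.
last_first_feature_map = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, -1.0], [5.0, 2.0]]]
third_feature_map = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]

pooled_first = global_max_pool(last_first_feature_map)
pooled_third = global_avg_pool(third_feature_map)

# "Stacking" is a concat along the channel axis.
stacked = pooled_first + pooled_third
```

The stacked vector has one entry per channel of each input map and is what the fully connected layers would consume.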

[0079] In some embodiments, each basic extraction unit processes the fourth feature map input to that basic extraction unit according to the following steps:

[0080] The fourth feature map is subjected to the first convolution calculation, the first activation operation, the second convolution calculation, the second activation operation, and the third convolution calculation in sequence to obtain the fifth feature map;

[0081] The fourth and fifth feature maps are fused to obtain the sixth feature map output by the basic extraction unit. The size and channel number information of the fourth feature map are the same as those of the sixth feature map.

[0082] Specifically, the first network includes multiple second feature extraction networks, each including at least one basic extraction unit; the second network includes multiple first feature extraction networks, each including at least one basic extraction unit. The basic extraction unit is the basic structure of both the second and the first feature extraction networks. Different second feature extraction networks may include the same or different numbers of basic extraction units, and so may different first feature extraction networks. The basic extraction units of the second and first feature extraction networks have the same structure, and each basic extraction unit performs feature extraction on the data input to it. The data input to the basic extraction unit is denoted as the fourth feature map, and the data output by the basic extraction unit is denoted as the sixth feature map.

[0083] Each basic extraction unit performs the first convolution calculation, the first activation operation, the second convolution calculation, the second activation operation, and the third convolution calculation on the input fourth feature map in sequence to obtain the fifth feature map. Then, the fourth and fifth feature maps are fused to obtain the sixth feature map.

[0084] For example, the Bottleneck Block of the classic ResNet50 network is used as the basic extraction unit. The input fourth feature map $f_{in}$ has dimensions (B, C, H, W), where B represents the batch size, C the number of channels, H the height, and W the width. First, a convolution operation (Conv) with a 1x1 kernel and C/4 channels is performed (the first convolution operation); then, a ReLU activation operation is performed (the first activation operation); next, a convolution operation (Conv) with a 3x3 kernel and C/4 channels is performed (the second convolution operation); then, a ReLU activation operation is performed (the second activation operation); finally, a convolution operation (Conv) with a 1x1 kernel and C channels is performed (the third convolution operation), resulting in the fifth feature map $f_{conv}$. Adding $f_{conv}$ to $f_{in}$ fuses the fourth and fifth feature maps and gives the output sixth feature map $f_{out}$, which also has dimensions (B, C, H, W). $f_{out}$ is expressed as follows:

[0085] $f_{out} = f_{in} + \mathrm{Conv}(\mathrm{ReLU}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{Conv}(f_{in}, 1\times1)), 3\times3)), 1\times1)$.

[0086] According to the technical solution provided in the embodiments of this disclosure, performing multiple convolution calculations on the fourth feature map using each basic extraction unit is beneficial for extracting feature maps with stronger expressive power, providing a prerequisite for obtaining a high-performance defect detection model.
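To make the channel bookkeeping of the bottleneck-style unit concrete, the sketch below tracks the tensor shape through the three convolutions and counts the convolution weights. It assumes bias-free convolutions, and stride 1 with padding 1 for the 3x3 convolution so that H and W are preserved; these details and the helper names are illustrative assumptions.

```python
def conv_weights(c_in, c_out, kernel):
    """Number of weights in a kernel x kernel convolution without bias."""
    return c_in * c_out * kernel * kernel


def bottleneck_shapes(shape):
    """Shapes after each convolution of the unit for an input (B, C, H, W)."""
    b, c, h, w = shape
    mid = c // 4
    return [
        (b, mid, h, w),  # first convolution: 1x1, C -> C/4
        (b, mid, h, w),  # second convolution: 3x3, C/4 -> C/4 (size kept)
        (b, c, h, w),    # third convolution: 1x1, C/4 -> C
    ]


def bottleneck_weights(c):
    """Total convolution weights of one unit with C input channels."""
    mid = c // 4
    return (
        conv_weights(c, mid, 1)
        + conv_weights(mid, mid, 3)
        + conv_weights(mid, c, 1)
    )


input_shape = (1, 256, 14, 14)
shapes = bottleneck_shapes(input_shape)
residual_add_is_valid = shapes[-1] == input_shape  # f_in and f_conv can be added
unit_weights = bottleneck_weights(256)
plain_3x3_weights = conv_weights(256, 256, 3)      # single full-width 3x3 conv
```

Because the output of the third convolution has the same shape as the input, the element-wise addition in the formula is well defined, and the narrow C/4 middle layer keeps the unit far cheaper than one full-width 3x3 convolution.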

[0087] Furthermore, the number of second feature extraction networks included in the first network is the same as the number of first feature extraction networks included in the second network, and, at the same extraction stage, the number of basic extraction units included in the first feature extraction network is greater than the number of basic extraction units included in the second feature extraction network. For example, as shown in Figure 2, the first network includes second feature extraction networks A, B, C, and D, and the second network includes first feature extraction networks E, F, G, and H. Second feature extraction network A and first feature extraction network E are at the same extraction stage, as are B and F, C and G, and D and H. Second feature extraction network A includes 2 basic extraction units and first feature extraction network E includes 3; second feature extraction network B includes 2 and first feature extraction network F includes 4; second feature extraction network C includes 2 and first feature extraction network G includes 9; second feature extraction network D includes 2 and first feature extraction network H includes 4.
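The unit counts of this example can be checked with a few lines of code. The dictionaries simply restate the numbers given for Figure 2; the variable names are illustrative.

```python
# Basic extraction units per second feature extraction network (first network).
second_feature_nets = {"A": 2, "B": 2, "C": 2, "D": 2}
# Basic extraction units per first feature extraction network (second network).
first_feature_nets = {"E": 3, "F": 4, "G": 9, "H": 4}

# Networks at the same extraction stage, paired in order.
stage_pairs = list(zip("ABCD", "EFGH"))

# At every stage the second network uses more units than the first network.
per_stage_ok = all(
    first_feature_nets[slow] > second_feature_nets[fast]
    for fast, slow in stage_pairs
)

first_number = sum(second_feature_nets.values())   # units in the first network
second_number = sum(first_feature_nets.values())   # units in the second network
```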

[0088] Furthermore, the first feature extraction network and the second feature extraction network also include a downsampling unit, which is located before the basic extraction unit. The downsampling unit is used to downsample the input data and then input the downsampled data as the fourth feature map into the basic extraction unit.

[0089] Furthermore, the first network also includes a downsampling network, which is located before the second feature extraction network. The downsampling network is used to downsample the input first image, and then the downsampled data is input into multiple second feature extraction networks that are connected in sequence.

[0090] Furthermore, the second network also includes a downsampling network, which is located before the first feature extraction network. The downsampling network is used to downsample the input second image, and then the downsampled data is input into multiple first feature extraction networks that are connected in sequence.

[0091] For example, when performing glass breakage detection, a defect detection model is trained, which includes a first network and a second network.

[0092] The first network includes four feature extraction stages, and each feature extraction stage corresponds to a second feature extraction network. That is, the first network includes second feature extraction network A, second feature extraction network B, second feature extraction network C, and second feature extraction network D. Each second feature extraction network includes one downsampling unit and two basic extraction units.

[0093] A first image with a resolution of (1, 3, 224, 224) is input into a first network. The first network processes the first image through the following steps:

[0094] Step 1-1: Use a downsampling network for downsampling processing: that is, perform calculations through a 3x3 convolutional layer with 64 channels, a stride of 2, and padding of 1, and a BN layer to output data with a resolution of (1, 64, 112, 112).

[0095] [Entering the first feature extraction stage of the first network]

[0096] Step 1-2: Downsampling is performed using the downsampling unit in the second feature extraction network A: that is, after calculation by a 3x3 convolutional layer with 64 channels, a stride of 2, and padding of 1, the output data has a resolution of (1, 64, 56, 56).

[0097] Step 1-3: Feature extraction is performed using the two basic extraction units in the second feature extraction network A, that is, the data passes through two blocks, and a first feature map with a resolution of (1, 64, 56, 56) is output.

[0098] [Entering the second feature extraction stage of the first network]

[0099] Step 1-4: Downsampling is performed using the downsampling unit in the second feature extraction network B: that is, after calculation by a 3x3 convolutional layer with 128 channels, a stride of 2, and padding of 1, the output data has a resolution of (1, 128, 28, 28).

[0100] Step 1-5: Feature extraction is performed using the two basic extraction units in the second feature extraction network B, that is, the data passes through two blocks, and a first feature map with a resolution of (1, 128, 28, 28) is output.

[0101] [Entering the third feature extraction stage of the first network]

[0102] Step 1-6: Downsampling is performed using the downsampling unit in the second feature extraction network C: that is, after calculation by a 3x3 convolutional layer with 256 channels, a stride of 2, and padding of 1, the output data has a resolution of (1, 256, 14, 14).

[0103] Step 1-7: Feature extraction is performed using the two basic extraction units in the second feature extraction network C, that is, the data passes through two blocks, and a first feature map with a resolution of (1, 256, 14, 14) is output.

[0104] [Entering the fourth feature extraction stage of the first network]

[0105] Step 1-8: Downsampling is performed using the downsampling unit in the second feature extraction network D: that is, after calculation by a 3x3 convolutional layer with 512 channels, a stride of 2, and padding of 1, the output data has a resolution of (1, 512, 7, 7).

[0106] Step 1-9: Feature extraction is performed using the two basic extraction units in the second feature extraction network D, that is, the data passes through two blocks, and a first feature map with a resolution of (1, 512, 7, 7) is output. The first feature map output by the fourth stage of the first network is denoted as M1. Applying a sigmoid function to M1 yields the first probability map T1. Based on T1, the probability that each point is glass can be obtained.
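The resolutions listed for the first network follow from the standard convolution output-size formula, floor((size + 2 * padding - kernel) / stride) + 1. The sketch below traces them for a 224x224 input and also shows the sigmoid that turns a value of M1 into a probability; the function names are illustrative.

```python
import math


def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_first_network(size=224):
    """Shapes after the downsampling network and after each of the four stages."""
    shapes = []
    size = conv_out(size)  # downsampling network, 64 channels
    shapes.append((1, 64, size, size))
    for channels in (64, 128, 256, 512):
        size = conv_out(size)  # downsampling unit of the stage
        shapes.append((1, channels, size, size))  # blocks keep the shape
    return shapes


def sigmoid(x):
    """Map a real-valued activation to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


first_network_shapes = trace_first_network()
probability_at_zero = sigmoid(0.0)
```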

[0107] The second network comprises four feature extraction stages, with each stage corresponding to a first feature extraction network. Specifically, the second network includes first feature extraction network E, first feature extraction network F, first feature extraction network G, and first feature extraction network H. First feature extraction network E includes one downsampling unit and three basic extraction units; first feature extraction network F includes one downsampling unit and four basic extraction units; first feature extraction network G includes one downsampling unit and nine basic extraction units; and first feature extraction network H includes one downsampling unit and four basic extraction units.

[0108] A second image with a resolution of (1, 3, 448, 448) is input into the second network. The second network processes the second image through the following steps:

[0109] Step 2-1: Use a downsampling network for downsampling processing: that is, perform calculations through a 3x3 convolutional layer with 64 channels, a stride of 2, and padding of 1, and a BN layer to output data with a resolution of (1, 64, 224, 224).

[0110] [Entering the first feature extraction stage of the second network]

[0111] Step 2-2: Downsampling is performed using the downsampling unit in the first feature extraction network E: that is, after calculation by a 3x3 convolutional layer with 64 channels, a stride of 2, and padding of 1, the output data has a resolution of (1, 64, 112, 112).

[0112] Step 2-3: Feature extraction is performed using the three basic extraction units in the first feature extraction network E, that is, the data passes through three blocks, and a feature map with a resolution of (1, 64, 112, 112) is output.

[0113] [Entering the second feature extraction stage of the second network]

[0114] Step 2-4: Downsampling is performed using the downsampling unit in the first feature extraction network F: that is, after a 3x3 convolutional layer with 128 channels, a stride of 2, and padding of 1, and a BN layer, the output data has a resolution of (1, 128, 56, 56).

[0115] Step 2-5: Feature extraction is performed using the four basic extraction units in the first feature extraction network F, that is, the data passes through four blocks, and a feature map with a resolution of (1, 128, 56, 56) is output.

[0116] Step 2-6: The first feature map output by the second stage of the first network is upsampled by a factor of 2 using bilinear interpolation and then added to the feature map output in Step 2-5 to obtain f2.

[0117] [Entering the third feature extraction stage of the second network]

[0118] Step 2-7: Downsampling is performed using the downsampling unit in the first feature extraction network G: that is, after a 3x3 convolutional layer with 256 channels, a stride of 2, and padding of 1, and a BN layer, the output data has a resolution of (1, 256, 28, 28).

[0119] Step 2-8: Feature extraction is performed using the nine basic extraction units in the first feature extraction network G, that is, the data passes through nine blocks, and a feature map with a resolution of (1, 256, 28, 28) is output.

[0120] Step 2-9: The first feature map output by the third stage of the first network is upsampled by a factor of 2 using bilinear interpolation and then added to the feature map output in Step 2-8 to obtain f3.

[0121] [Entering the fourth feature extraction stage of the second network]

[0122] Step 2-10: Downsampling is performed using the downsampling unit in the first feature extraction network H: that is, after a 3x3 convolutional layer with 512 channels, a stride of 2, and padding of 1, and a BN layer, the output data has a resolution of (1, 512, 14, 14).

[0123] Step 2-11: Feature extraction is performed using the four basic extraction units in the first feature extraction network H, that is, the data passes through four blocks, and a feature map with a resolution of (1, 512, 14, 14) is output.

[0124] Step 2-12: The first feature map output by the fourth stage of the first network is upsampled by a factor of 2 using bilinear interpolation and then added to the feature map output in Step 2-11 to obtain f4. The feature map f4 output by the fourth stage of the second network is denoted as M2, which is the second feature map.

[0125] In this way, the feature maps of the second, third, and fourth stages of the first network are upsampled by a factor of 2 through bilinear interpolation and then fused with the feature maps of the second, third, and fourth stages of the second network, enabling the second network to obtain more information and a larger receptive field from the first network.
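A quick shape check shows why a 2x upsampling is exactly what is needed for the fusion: with inputs of 224 and 448 pixels, each stage of the first network is half the spatial size of the same stage of the second network and has the same channel count. The helper names below are illustrative.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_shapes(input_size):
    """(channels, height, width) at the output of each of the four stages."""
    size = conv_out(input_size)  # downsampling network
    shapes = []
    for channels in (64, 128, 256, 512):
        size = conv_out(size)    # downsampling unit of the stage
        shapes.append((channels, size, size))
    return shapes


def upsampled(shape, factor=2):
    """Shape after spatial upsampling by the given factor."""
    channels, height, width = shape
    return (channels, height * factor, width * factor)


first_network = stage_shapes(224)   # low-resolution path
second_network = stage_shapes(448)  # high-resolution path

# Stages 2, 3, and 4 are the ones fused in the example.
fusable = [
    upsampled(fast) == slow
    for fast, slow in zip(first_network[1:], second_network[1:])
]
```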

[0126] Furthermore, T1 is a probability map. Pixels with a probability greater than 0.5 are considered to belong to the glass region, while those with a probability less than 0.5 are considered to belong to the non-glass region. Pixels with a probability less than 0.5 are set to 0 to obtain the mask. After linearly interpolating and upsampling the mask by a factor of 2, M2 is multiplied by the mask, and then self-attention learning is performed to obtain M3, in order to extract more accurate and longer-range dependent semantic details of the glass.
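The disclosure does not specify the form of the self-attention computation. As one possible illustration only, the sketch below applies plain scaled dot-product self-attention with no learned projections (queries, keys, and values are all the input tokens) to a masked feature map whose spatial positions are flattened into tokens. All names and values are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def to_tokens(fmap):
    """Flatten a (C, H, W) nested list into H*W tokens of length C."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [fmap[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens."""
    dim = len(tokens[0])
    output = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return output


# Hypothetical masked feature map with 2 channels and 2x2 positions;
# the off-diagonal positions were zeroed by the mask.
masked_map = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
tokens = to_tokens(masked_map)
attended = self_attention(tokens)
```

Every output token is a weighted average over all positions, which is how attention lets a position draw on information from distant parts of the glass region.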

[0127] M1 is subjected to global max pooling to obtain M'1, with dimensions (1, 512, 1, 1). M3 is subjected to global average pooling to obtain M'3. M'1 and M'3 are concatenated and then connected to a fully connected (FC) layer with dimensions (1024, 512), followed by another FC layer with dimensions (512, 128), and finally a classification layer with dimensions (128, 2). This outputs the probabilities of the glass being intact and broken, respectively. The cross-entropy loss function is used for classification loss, and a defect detection model that meets the preset conditions is trained.
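The layer sizes of the classification head and the cross-entropy loss can be checked with a short sketch. It assumes that M'3 has 512 channels like M'1 (which the 1024-wide input of the first FC layer implies), counts weights only, and treats class index 0 as "intact" and 1 as "broken"; the class order and all names are assumptions.

```python
import math

# (input features, output features) of the two FC layers and the classifier.
LAYER_DIMS = [(1024, 512), (512, 128), (128, 2)]


def dims_chain_ok(dims, input_features):
    """Check that each layer's input matches the previous layer's output."""
    current = input_features
    for fan_in, fan_out in dims:
        if fan_in != current:
            return False
        current = fan_out
    return True


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss of one sample given raw logits and the true class."""
    return -math.log(softmax(logits)[label])


concat_features = 512 + 512  # pooled M'1 and pooled M'3
head_is_consistent = dims_chain_ok(LAYER_DIMS, concat_features)
head_weights = sum(fan_in * fan_out for fan_in, fan_out in LAYER_DIMS)

loss_uniform = cross_entropy([0.0, 0.0], 0)     # undecided prediction
loss_confident = cross_entropy([4.0, -4.0], 0)  # confident and correct
```

The loss shrinks as the predicted probability of the annotated class grows, which is the signal used to adjust the parameters of both networks.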

[0128] This dual-path fast localization and fine classification architecture uses a fast localization network (first network) with low-resolution input to locate the glass region in the image, and a fine classification neural network (second network) with high-resolution input to fuse feature maps from multiple stages in the first network to obtain more information and a richer receptive field. The network achieves accurate identification of whether the glass is broken through the fast and slow dual-path architecture.

[0129] All of the above-mentioned optional technical solutions can be combined in any way to form the optional embodiments of this application, and will not be described in detail here.

[0130] Figure 3 is a schematic flowchart of a defect detection method provided in an embodiment of this disclosure. The defect detection method of Figure 3 can be executed by a server, and the method includes:

[0131] S301, acquire a first detection image and a second detection image with the same image content as the image of the glass to be detected; the resolution of the first detection image is lower than the resolution of the second detection image;

[0132] S302, input the first detection image into the first network of the defect detection model, input the second detection image into the second network of the defect detection model, and obtain the defect detection result;

[0133] The defect detection model was trained using the method described above.
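Steps S301 and S302 can be sketched as below. The disclosure does not state how the two detection images are derived from the glass image; this sketch assumes, purely for illustration, that the glass image itself serves as the second detection image and that the first detection image is produced by 2x2 average pooling. `placeholder_model` stands in for a trained defect detection model and is hypothetical.

```python
def downsample2x(image):
    """Halve the resolution of a 2-D grayscale image by 2x2 average pooling."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def detect(glass_image, model):
    """S301: build both detection images; S302: feed them to the two networks."""
    second_detection_image = glass_image
    first_detection_image = downsample2x(glass_image)
    return model(first_detection_image, second_detection_image)


def placeholder_model(first_image, second_image):
    """Hypothetical stand-in that only reports the sizes it received."""
    return {
        "first_size": (len(first_image), len(first_image[0])),
        "second_size": (len(second_image), len(second_image[0])),
    }


glass_image = [[float(x + y) for x in range(4)] for y in range(4)]
result = detect(glass_image, placeholder_model)
```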

[0134] According to the technical solution provided in this disclosure, an image of the glass to be detected is acquired, and a first detection image and a second detection image with the same image content as the image of the glass to be detected are determined. The resolution of the first detection image is lower than that of the second detection image. The defect detection model trained using the above-described training method is a dual-path model, which includes a first network and a second network. The first network is used for rapid localization of the glass region in the image, and the second network is used for refined classification. This model has good performance. Therefore, the first detection image is input into the first network of the defect detection model, and the second detection image is input into the second network of the defect detection model to obtain accurate defect detection results. The process of determining the defect detection results has a high degree of automation and high detection efficiency.

[0135] The following are embodiments of the apparatus disclosed herein, which can be used to execute embodiments of the method disclosed herein. For details not disclosed in the apparatus embodiments of this disclosure, please refer to the embodiments of the method disclosed herein.

[0136] Figure 4 is a schematic diagram of a training device for a defect detection model provided in an embodiment of this disclosure. As shown in Figure 4, the training device for the defect detection model includes:

[0137] The image acquisition module 41 is configured to acquire a first image and a second image with the same image content as each glass image; the resolution of the first image is lower than the resolution of the second image.

[0138] The first processing module 42 is configured to input the first image into the first network to obtain a multi-scale first feature map and a first probability map; the first network includes a first number of basic extraction units.

[0139] The second processing module 43 is configured to input the second image into the second network and input first feature maps of different target scales into different first feature extraction networks of the second network to obtain the second feature map output by the last first feature extraction network; the multiple first feature extraction networks of the second network are connected sequentially, each first feature extraction network includes at least one basic extraction unit, and the second network includes a second number of basic extraction units, the second number being greater than the first number;

[0140] The parameter adjustment module 44 is configured to determine the defect detection result based on the last acquired first feature map, first probability map and second feature map; and adjust the parameters of the first network and the second network based on the loss value determined by the defect detection result and defect annotation data to obtain a defect detection model including the first network and the second network that meets the preset conditions.

[0141] According to the technical solution provided in this disclosure, a first image and a second image corresponding to the glass image are determined by processing the glass image. The first image and the second image have the same image content as the corresponding glass image, and the resolution of the first image is lower than that of the second image. The low-resolution first image is input into a first network to obtain a multi-scale first feature map and a first probability map. Each probability value in the first probability map is used to characterize the probability that the corresponding pixel is a glass region. The first network includes a first number of basic extraction units, which are used for feature extraction. The high-resolution second image is input into a second network, and the first feature maps of different target scales are input into different first feature extraction networks of the second network to obtain a second feature map output by the last first feature extraction network. The multiple first feature extraction networks of the second network are sequentially connected, and each first feature extraction network includes at least one basic extraction unit. The number of basic extraction units included in the second network is a second number, which is greater than the first number. Further, based on the last obtained first feature map, the first probability map, and the second feature map, a defect detection result is determined, and the parameters of the first network and the second network are adjusted based on the loss value determined by the defect detection result and defect annotation data to obtain a defect detection model including the first network and the second network that meets preset conditions. The obtained defect detection model consists of two networks: a first network and a second network. The first network has fewer basic extraction units than the second network, indicating that the first network is a fast branch network and the second network is a slow branch network. The first network is mainly used to locate glass regions in the image, while the second network is used for fine classification. The first feature map obtained from the first network is introduced into the second network, which allows the second network to obtain a richer receptive field and more feature information, which is beneficial for obtaining a high-performance defect detection model. Using this defect detection model, defects in glass images can be identified quickly and accurately, with high defect detection efficiency and accuracy.

[0142] In some embodiments, the first processing module includes:

[0143] The feature extraction unit is configured to input a first image into a first network, perform feature extraction using multiple second feature extraction networks of the first network, and obtain a multi-scale first feature map; the first feature map output by any second feature extraction network is used as input data for the next second feature extraction network; each second feature extraction network includes at least one basic extraction unit;

[0144] The probability map acquisition unit is configured to acquire a first probability map based on the last acquired first feature map.

[0145] In some embodiments, the second processing module is further configured to input the second image into the second network and input first feature maps of different target scales into different first feature extraction networks of the second network;

[0146] For each target feature extraction network, which is any first feature extraction network that has been input with a first feature map, the input first feature map is upsampled, and the upsampled first feature map is fused with the extracted data of the target feature extraction network to obtain fused data. If there is a next first feature extraction network, the fused data is input into the next first feature extraction network to obtain the second feature map output by the last first feature extraction network.
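The control flow of this module can be sketched as a loop over the sequentially connected first feature extraction networks. The stages, the upsampling, and the fusion are passed in as callables, and the toy scalar functions used in the example are hypothetical stand-ins for real network operations.

```python
def run_second_network(data, stages, first_feature_maps, upsample, fuse):
    """Run the stages in order, fusing a first feature map where one is supplied.

    stages: list of callables, one per first feature extraction network.
    first_feature_maps: dict mapping a stage index to the first feature map
        that is input into that (target) feature extraction network.
    """
    for index, stage in enumerate(stages):
        data = stage(data)  # extracted data of this network
        if index in first_feature_maps:
            # Target network: upsample its first feature map, then fuse.
            data = fuse(upsample(first_feature_maps[index]), data)
        # The (possibly fused) data feeds the next network, if there is one.
    return data  # second feature map output by the last network


# Toy stand-ins on scalars so the flow is easy to follow.
stages = [lambda v: v + 1] * 4                 # networks E, F, G, H
first_feature_maps = {1: 10, 2: 20, 3: 30}     # maps B, C, D
second_feature_map = run_second_network(
    0,
    stages,
    first_feature_maps,
    upsample=lambda v: v * 2,
    fuse=lambda a, b: a + b,
)
```

The first stage receives no first feature map, so its output passes to the next stage unchanged, matching the arrangement in which only some networks are target feature extraction networks.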

[0147] In some embodiments, the parameter adjustment module includes:

[0148] The probability adjustment unit is configured to set the probability values in the first probability map that are less than the probability threshold to a preset value to obtain a second probability map.

[0149] The feature fusion unit is configured to upsample the second probability map, multiply it with the second feature map, and then perform self-attention learning calculation to obtain the third feature map;

[0150] The result determination unit is configured to determine the defect detection result based on the last acquired first and third feature maps.

[0151] In some embodiments, the result determination unit includes:

[0152] The first processing subunit is configured to perform global max pooling on the last acquired first feature map to obtain the pooled first feature map.

[0153] The second processing subunit is configured to perform global average pooling on the third feature map to obtain the pooled third feature map.

[0154] The third processing subunit is configured to stack the pooled first feature map and the pooled third feature map, and then connect them to the fully connected layer and the classification layer to obtain the defect detection result.

[0155] In some embodiments, each basic extraction unit processes the fourth feature map input to that basic extraction unit according to the following steps:

[0156] The fourth feature map is subjected to the first convolution calculation, the first activation operation, the second convolution calculation, the second activation operation, and the third convolution calculation in sequence to obtain the fifth feature map;

[0157] The fourth and fifth feature maps are fused to obtain the sixth feature map output by the basic extraction unit. The size and channel number information of the fourth feature map are the same as those of the sixth feature map.

[0158] Figure 5 is a schematic diagram of the defect detection device provided in an embodiment of this disclosure. As shown in Figure 5, the defect detection device includes:

[0159] The data acquisition module 51 is configured to acquire a first detection image and a second detection image that have the same image content as the image of the glass to be detected; the resolution of the first detection image is lower than the resolution of the second detection image;

[0160] The data input module 52 is configured to input the first detection image into the first network of the defect detection model and the second detection image into the second network of the defect detection model to obtain the defect detection result;

[0161] The defect detection model was trained using the method described above.

[0162] According to the technical solution provided in this disclosure, an image of the glass to be detected is acquired, and a first detection image and a second detection image with the same image content as the image of the glass to be detected are determined. The resolution of the first detection image is lower than that of the second detection image. The defect detection model trained using the above-described training method is a dual-path model, which includes a first network and a second network. The first network is used for rapid localization of the glass region in the image, and the second network is used for refined classification. This model has good performance. Therefore, the first detection image is input into the first network of the defect detection model, and the second detection image is input into the second network of the defect detection model to obtain accurate defect detection results. The process of determining the defect detection results has a high degree of automation and high detection efficiency.

[0163] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this disclosure.

[0164] Figure 6 This is a schematic diagram of the electronic device 6 provided in an embodiment of this disclosure. Figure 6 As shown, the electronic device 6 of this embodiment includes a processor 601, a memory 602, and a computer program 603 stored in the memory 602 and executable on the processor 601. When the processor 601 executes the computer program 603, it implements the steps in the various method embodiments described above. Alternatively, when the processor 601 executes the computer program 603, it implements the functions of each module / unit in the various device embodiments described above.

[0165] The electronic device 6 can be a desktop computer, laptop, handheld computer, cloud server, or other electronic device. The electronic device 6 may include, but is not limited to, the processor 601 and the memory 602. Those skilled in the art will understand that Figure 6 is merely an example of the electronic device 6 and does not constitute a limitation on the electronic device 6, which may include more or fewer components than shown, or different components.

[0166] The processor 601 may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.

[0167] The memory 602 can be an internal storage unit of the electronic device 6, such as a hard disk or RAM of the electronic device 6. The memory 602 can also be an external storage device of the electronic device 6, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, etc., equipped on the electronic device 6. The memory 602 can also include both internal and external storage units of the electronic device 6. The memory 602 is used to store computer programs and other programs and data required by the electronic device.

[0168] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0169] If an integrated module / unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program may include computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. A computer-readable medium may include: any entity or device capable of carrying computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in a computer-readable medium may be appropriately added to or subtracted according to the requirements of legislation and patent practice in a jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

[0170] The above embodiments are only used to illustrate the technical solutions of this disclosure, and are not intended to limit it. Although this disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this disclosure, and should all be included within the protection scope of this disclosure.

Claims

1. A training method for a defect detection model, characterized in that it comprises:
for each glass image, obtaining a first image and a second image with the same image content as the glass image, the resolution of the first image being lower than the resolution of the second image;
inputting the first image into a first network to obtain a multi-scale first feature map and a first probability map, wherein the first network includes a first number of basic extraction units, the probability value in the first probability map indicates the probability that the corresponding pixel belongs to a glass region, and the first probability map is used to locate the glass region of the image;
inputting the second image into a second network, and inputting the first feature maps at different target scales into different first feature extraction networks of the second network, to obtain a second feature map output by the last first feature extraction network, wherein the multiple first feature extraction networks of the second network are connected sequentially, each first feature extraction network includes at least one basic extraction unit, and the second network includes a second number of basic extraction units, the second number being greater than the first number, such that the first network constitutes a fast branch network for locating glass regions in the image and the second network constitutes a slow branch network for fine classification;
setting the probability values in the first probability map that are less than a probability threshold to a preset value to obtain a second probability map;
upsampling the second probability map, multiplying it with the second feature map, and then performing self-attention learning to obtain a third feature map;
determining a defect detection result based on the last obtained first feature map and the third feature map; and
adjusting the parameters of the first network and the second network based on a loss value determined from the defect detection result and defect annotation data, to obtain a defect detection model that includes the first network and the second network and meets preset conditions.
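
As a non-limiting illustration of the probability-map gating recited in claim 1, the sketch below sets sub-threshold probabilities to a preset value, upsamples the result, and multiplies it element-wise with a feature map. The threshold of 0.5, the preset value of 0.0, the nearest-neighbour upsampling, the single-channel feature map, and all function names are assumptions; the self-attention step is omitted.

```python
def threshold_map(prob_map, threshold=0.5, preset=0.0):
    """Second probability map: values below the threshold become `preset`."""
    return [[p if p >= threshold else preset for p in row] for row in prob_map]


def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2D grid by an integer factor
    (an assumed interpolation mode; the claim does not fix one)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def gate_features(feature_map, prob_map, threshold=0.5, preset=0.0):
    """Multiply a single-channel second feature map by the upsampled
    second probability map. Self-attention would follow; not shown here."""
    factor = len(feature_map) // len(prob_map)
    mask = upsample_nearest(threshold_map(prob_map, threshold, preset), factor)
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feature_map, mask)]


prob = [[0.75, 0.25]]                      # low-resolution glass probabilities
feat = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0]]              # higher-resolution feature map
print(gate_features(feat, prob))
# [[0.75, 1.5, 0.0, 0.0], [3.75, 4.5, 0.0, 0.0]]
```

With a preset value of zero, features outside the located glass region are suppressed before classification, which is one plausible reading of the gating step.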

2. The method of claim 1, wherein the step of inputting the first image into the first network to obtain a multi-scale first feature map and a first probability map includes: inputting the first image into the first network, and extracting features using multiple second feature extraction networks of the first network to obtain the multi-scale first feature map, wherein the first feature map output by any second feature extraction network is used as the input data for the next second feature extraction network, and each second feature extraction network includes at least one basic extraction unit; and obtaining the first probability map based on the last obtained first feature map.

3. The method of claim 1, wherein the step of inputting the second image into the second network and inputting the first feature maps at different target scales into different first feature extraction networks of the second network to obtain the second feature map output by the last first feature extraction network includes: inputting the second image into the second network, and inputting the first feature maps at different target scales into different first feature extraction networks of the second network; for each target feature extraction network, which is any first feature extraction network that has received a first feature map as input, upsampling the input first feature map and fusing the upsampled first feature map with the data extracted by the target feature extraction network to obtain fused data; and, when there is a next first feature extraction network, inputting the fused data into the next first feature extraction network, so as to obtain the second feature map output by the last first feature extraction network.

4. The method of claim 1, wherein the step of determining the defect detection result based on the last obtained first feature map and the third feature map includes: performing global max pooling on the last obtained first feature map to obtain a pooled first feature map; performing global average pooling on the third feature map to obtain a pooled third feature map; and stacking the pooled first feature map and the pooled third feature map and then passing the result through a fully connected layer and a classification layer to obtain the defect detection result.
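
As a non-limiting illustration of the classification head recited in claim 4, the sketch below applies global max pooling to one feature map and global average pooling to another, concatenates the pooled vectors, and passes them through a single fully connected layer. Using softmax as the classification layer, the weight and bias values, and all function names are assumptions made for illustration.

```python
import math


def global_max_pool(fmap):
    """One maximum per channel; fmap is a list of 2D channels."""
    return [max(max(row) for row in channel) for channel in fmap]


def global_avg_pool(fmap):
    """One mean per channel; fmap is a list of 2D channels."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in fmap]


def softmax(logits):
    """Numerically stable softmax (assumed classification layer)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(first_fmap, third_fmap, weights, bias):
    """Stack the pooled vectors, apply a fully connected layer, then softmax."""
    features = global_max_pool(first_fmap) + global_avg_pool(third_fmap)
    logits = [sum(w * x for w, x in zip(w_row, features)) + b
              for w_row, b in zip(weights, bias)]
    return softmax(logits)


first = [[[1, 5], [2, 3]]]      # one channel, max is 5
third = [[[2, 4], [6, 8]]]      # one channel, mean is 5.0
probs = classify(first, third, weights=[[1, 0], [0, 1]], bias=[0, 0])
print(probs)  # two equal class probabilities for this toy input
```

The weights here are placeholders; in the claimed method they would be learned together with the parameters of the first and second networks.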

5. The method of claim 1, wherein each basic extraction unit processes a fourth feature map input to the basic extraction unit based on the following steps: subjecting the fourth feature map to a first convolution calculation, a first activation operation, a second convolution calculation, a second activation operation, and a third convolution calculation in sequence to obtain a fifth feature map; and fusing the fourth feature map and the fifth feature map to obtain a sixth feature map output by the basic extraction unit, wherein the size information and channel number information of the fourth feature map are the same as those of the sixth feature map.
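
As a non-limiting illustration of the basic extraction unit recited in claim 5, the sketch below chains convolution, activation, convolution, activation, and convolution, then fuses the result with the input. The single-channel input, zero padding, ReLU activation, element-wise addition as the fusion, and all function and parameter names are assumptions; the claim itself only requires that input and output share size and channel count.

```python
def conv2d_same(fmap, kernel):
    """Zero-padded 2D convolution (cross-correlation) that keeps the size."""
    height, width = len(fmap), len(fmap[0])
    k = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        acc += fmap[rr][cc] * kernel[dr + k][dc + k]
            out[r][c] = acc
    return out


def relu(fmap):
    """Assumed activation; the claim does not name a specific function."""
    return [[max(0.0, v) for v in row] for row in fmap]


def basic_extraction_unit(fourth, k1, k2, k3):
    """conv -> act -> conv -> act -> conv, then fuse with the input."""
    x = relu(conv2d_same(fourth, k1))
    x = relu(conv2d_same(x, k2))
    fifth = conv2d_same(x, k3)
    # Fusion by element-wise addition (assumption), giving the sixth map.
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(fourth, fifth)]


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
fourth = [[1.0, -2.0], [3.0, 4.0]]
sixth = basic_extraction_unit(fourth, identity, identity, identity)
print(sixth)  # [[2.0, -2.0], [6.0, 8.0]]
```

Because every convolution preserves the spatial size and the fusion is element-wise, the sixth feature map has the same shape as the fourth, matching the constraint stated in the claim.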

6. A defect detection method, characterized in that it comprises: acquiring a first detection image and a second detection image that have the same image content as the image of the glass to be detected, the resolution of the first detection image being lower than the resolution of the second detection image; and inputting the first detection image into the first network of a defect detection model and the second detection image into the second network of the defect detection model to obtain a defect detection result, wherein the defect detection model is trained using the method described in any one of claims 1-5.

7. A device for training a defect detection model, characterized in that it comprises:
an image acquisition module configured to acquire, for each glass image, a first image and a second image with the same image content as the glass image, the resolution of the first image being lower than the resolution of the second image;
a first processing module configured to input the first image into a first network to obtain a multi-scale first feature map and a first probability map, wherein the first network includes a first number of basic extraction units, the probability value in the first probability map indicates the probability that the corresponding pixel belongs to a glass region, and the first probability map is used to locate the glass region of the image;
a second processing module configured to input the second image into a second network and input the first feature maps at different target scales into different first feature extraction networks of the second network to obtain a second feature map output by the last first feature extraction network, wherein the multiple first feature extraction networks of the second network are connected sequentially, each first feature extraction network includes at least one basic extraction unit, and the second network includes a second number of basic extraction units, the second number being greater than the first number, such that the first network constitutes a fast branch network for locating glass regions in the image and the second network constitutes a slow branch network for fine classification; and
a parameter adjustment module configured to: set the probability values in the first probability map that are less than a probability threshold to a preset value to obtain a second probability map; upsample the second probability map, multiply it with the second feature map, and perform self-attention learning to obtain a third feature map; determine a defect detection result based on the last obtained first feature map and the third feature map; and adjust the parameters of the first network and the second network based on a loss value determined from the defect detection result and defect annotation data, to obtain a defect detection model that includes the first network and the second network and meets preset conditions.

8. A defect detection apparatus, characterized in that it comprises: a data acquisition module configured to acquire a first detection image and a second detection image that have the same image content as the image of the glass to be detected, the resolution of the first detection image being lower than the resolution of the second detection image; and a data input module configured to input the first detection image into the first network of a defect detection model and the second detection image into the second network of the defect detection model to obtain a defect detection result, wherein the defect detection model is trained using the method described in any one of claims 1-5.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the method as described in any one of claims 1 to 6.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1 to 6.