A surface damage and maintenance error detection method and system based on knowledge distillation and active learning

By using a YOLO model optimized through knowledge distillation and active learning, and a MobileNet v3 network structure, the accuracy and efficiency issues of surface damage and maintenance error detection in complex backgrounds are solved, enabling efficient and real-time detection on mobile devices.

CN122244634A · Pending · Publication Date: 2026-06-19 · BEIHANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIHANG UNIV
Filing Date
2026-03-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to detect surface damage and repair errors in complex backgrounds during manual visual inspection, resulting in low detection efficiency and insufficient accuracy. In particular, they are unable to identify minute defects when lighting is insufficient.

Method used

We employ a knowledge distillation and active learning approach, combining a multi-scale feature knowledge distillation strategy with a dual-standard active learning strategy that addresses uncertainty and complexity, to optimize the object detection model. We then utilize an improved YOLO model and the MobileNet v3 network structure for image detection.

Benefits of technology

It improves the accuracy of surface damage and repair error detection in complex backgrounds, enables real-time and efficient detection on mobile devices, and reduces computing resource requirements and sample labeling costs.



Abstract

This application discloses a surface damage and repair error detection method and system based on knowledge distillation and active learning, relating to the field of surface damage and repair error detection technology. The method includes: acquiring an image to be detected; the image to be detected includes a surface damage image and / or a repaired image of the object to be detected; applying a real-time-robust integrated optimized target detection model to detect surface damage and / or repair errors in the image to be detected, and obtaining the detection result; in this application, a multi-scale feature knowledge distillation strategy and a dual-standard active learning strategy based on uncertainty and complexity are used to train the target detection model, which can improve the accuracy of the detection result when facing demanding and complex background scenarios.

Description

Technical Field

[0001] This application relates to the field of surface damage and maintenance error detection technology, and in particular to a surface damage and maintenance error detection method and system based on knowledge distillation and active learning.

Background Technology

[0002] Visual inspection of surface damage and identification of repair errors / improper repairs are essential parts of the daily maintenance of large and complex equipment. These defects are often very small and difficult to detect, and can lead to safety issues when they occur. During manual visual inspections, eye fatigue from prolonged work and insufficient maintenance experience can reduce the quality and efficiency of surface damage inspection, leading to maintenance oversights and repair errors, especially when lighting is insufficient and the damage is very small. Furthermore, manual visual inspection often requires identifying defects one by one, resulting in low inspection efficiency. The manufacturing industry therefore has a significant need for surface inspection technology based on mobile devices.

[0003] Automated surface detection technology based on computer vision has become a research hotspot in recent years thanks to advantages such as high precision and non-contact detection. As the computing power of edge devices has increased, deep learning-based object detection methods have become a major research frontier in this field because of their fast detection speed, strong robustness, and ease of deployment. Deep learning-based object detection methods currently have two main branches: two-stage object detection methods based on region extraction and one-stage object detection methods based on location regression. While two-stage object detection algorithms achieve satisfactory detection accuracy, they are generally slow and have limited room for improvement, making them unsuitable for real-time interactive systems such as mobile detection systems and augmented reality systems. To address the low detection speed of two-stage algorithms, one-stage object detection algorithms take a simplified approach that transforms object detection into a regression problem. One-stage algorithms no longer extract candidate regions; instead, they perform feature extraction, object classification, and location regression within a single convolutional network. These algorithms maintain good detection accuracy while improving detection speed; currently popular examples include YOLO (You Only Look Once) and SSD (Single Shot Detector).

[0004] In the field of surface damage and defect detection, numerous variants of the YOLO series algorithms have emerged to address various application scenarios and detection objects. However, when faced with demanding and complex scenarios, the accuracy of existing detection algorithms remains insufficient.

Summary of the Invention

[0005] The purpose of this application is to provide a surface damage and maintenance error detection method and system based on knowledge distillation and active learning, which can improve the accuracy of surface damage and maintenance error detection in demanding and complex scenarios.

[0006] To achieve the above objectives, this application provides the following solution: Firstly, this application provides a surface damage and repair error detection method based on knowledge distillation and active learning, including: Acquire the image to be inspected; the image to be inspected includes images of surface damage to the object being inspected and / or images after repair. A pre-trained machine learning-based target detection model is applied to detect surface damage and / or repair errors in the image to be detected, and the detection results are obtained. During the training process of the target detection model, a multi-scale feature knowledge distillation strategy and a dual-standard active learning strategy based on uncertainty and complexity are adopted.

[0007] Secondly, this application provides a surface damage and repair error detection system based on knowledge distillation and active learning, comprising: a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to implement the above-described surface damage and repair error detection method based on knowledge distillation and active learning.

[0008] According to the specific embodiments provided in this application, this application has the following technical effects: This application provides a method and system for detecting surface damage and repair errors based on knowledge distillation and active learning. The method includes: acquiring an image to be detected; the image to be detected includes a surface damage image and / or a repaired image of the object to be detected; applying a real-time-robust integrated optimized target detection model to detect surface damage and / or repair errors in the image to be detected, and obtaining the detection result; in this application, a multi-scale feature knowledge distillation strategy and a dual-standard active learning strategy based on uncertainty and complexity are combined to train the target detection model, and high-quality unambiguous samples are identified by jointly using the dual standards of uncertainty and complexity, so that the target detection model can improve the accuracy of the detection result when facing demanding and complex background scenarios.

Attached Figure Description

[0009] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0010] Figure 1 is an application environment diagram of a surface damage and repair error detection method based on knowledge distillation and active learning in one embodiment of this application;
Figure 2 is a schematic flowchart of a surface damage and repair error detection method based on knowledge distillation and active learning provided in an embodiment of this application;
Figure 3 is a schematic diagram of the structure of an improved YOLO v11 model provided in an embodiment of this application;
Figure 4 is a schematic diagram of the MobileNet v3 network structure provided in an embodiment of this application;
Figure 5 is a schematic diagram of the structure of a depthwise separable convolution module provided in an embodiment of this application;
Figure 6 is a schematic diagram of a lightweight technique for object detection models based on an improved MobileNet v3, provided in an embodiment of this application;
Figure 7 is a schematic diagram of the image processing effect of histogram equalization and the corresponding grayscale histogram provided in an embodiment of this application;
Figure 8 is a schematic diagram of a target detection model training technique based on multi-scale knowledge distillation provided in an embodiment of this application;
Figure 9 is a schematic diagram of a feature pyramid structure provided in an embodiment of this application;
Figure 10 is a schematic diagram of the optimization technique for a target detection model based on an active learning strategy for uncertainty and complexity, provided in an embodiment of this application;
Figure 11 is a schematic diagram of the functional module structure of a surface damage and repair error detection system based on knowledge distillation and active learning provided in an embodiment of this application;
Figure 12 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.

Detailed Implementation

[0011] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0012] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0013] The surface damage and repair error detection method based on knowledge distillation and active learning provided in this application can be applied, for example, in the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process. The data storage system can be set up independently, integrated into server 104, or placed in the cloud or on other servers. Terminal 102 can send the image to be detected (including images of surface damage and / or repaired surfaces of the object) to server 104. After receiving the image, server 104 applies a pre-trained machine learning-based object detection model to detect surface damage and / or repair errors in the image, obtaining the detection result. During the training of the object detection model, a multi-scale feature knowledge distillation strategy and a dual-standard active learning strategy based on uncertainty and complexity are employed. Server 104 can then feed the obtained detection result back to terminal 102. In addition, in some embodiments, the method can also be implemented by server 104 or terminal 102 alone. For example, terminal 102 can directly perform surface damage and repair error detection on the image to be detected, or server 104 can obtain the image to be detected from the data storage system and perform the detection itself.

[0014] Terminal 102 can be, but is not limited to, a desktop computer, laptop, smartphone, tablet, IoT device, or portable wearable device. Server 104 can be implemented as an independent server, a server cluster composed of multiple servers, or a cloud server.

[0015] In one exemplary embodiment, as shown in Figure 2, a surface damage and repair error detection method based on knowledge distillation and active learning is provided. This method is executed by a computer device, specifically by a terminal or a server alone, or by a terminal and a server together. In this embodiment, the method is described as applied to server 104 in Figure 1 and includes the following steps 201 to 202.

[0016] Step 201: Obtain the image to be inspected; the image to be inspected includes the surface damage image of the object to be inspected and / or the image after repair.

[0017] Step 202: Apply the pre-trained machine learning-based target detection model to detect surface damage and / or repair errors in the image to be detected, and obtain the detection results; during the training process of the target detection model, a multi-scale feature knowledge distillation strategy and a dual-standard active learning strategy based on uncertainty and complexity are adopted.

[0018] In implementing steps 201 to 202 above, this application employs a combination of a multi-scale feature knowledge distillation strategy and a dual-standard active learning strategy based on uncertainty and complexity to train the target detection model. Furthermore, by jointly identifying high-quality unambiguous samples using the dual standards of uncertainty and complexity, the target detection model can improve the accuracy of detection results when facing demanding and complex scenarios.

[0019] In another exemplary embodiment of this application, the object detection model employs an improved YOLO model; in the improved YOLO model, the feature extraction network adopts the MobileNet v3 network structure. The feature extraction network (backbone network) of the YOLO model often concentrates the main parameters of the model. In order to adapt to the computing resource limitations of edge computing devices, the basic structure of the YOLO model must be further lightweighted. The more lightweight MobileNet v3 network structure can be used to replace the feature extraction network in the YOLO model architecture.

[0020] As an example, because the YOLO v11 model has advantages over the YOLO v12 model in terms of computational speed and training cost, it is more suitable for deployment on mobile devices and for real-time assisted maintenance and inspection applications. This embodiment can therefore build the basic network structure of the object detection model by improving the YOLO v11 model from the YOLO series. Alternatively, other types of YOLO models can be selected according to actual needs. As an example, the basic architecture of the improved YOLO v11 model (using the MobileNet v3 network structure for feature extraction) is shown in Figure 3 and includes the following parts:
Backbone: Responsible for extracting image features, using the MobileNet v3 network structure for feature extraction.

[0021] Neck (Feature Fusion Layer): This layer fuses features from different scales to help the model better identify targets at various scales. It uses the CBS module, C3K2 module, Upsample module, and Concat module.

[0022] Figure 3 clarifies the correspondence between each feature level (stride) and the specific layer structure, and specifies the layers from which features are extracted: in the improved YOLO v11 architecture the backbone is not replaced blindly; instead, the connection points of MobileNet v3 are specified precisely according to the downsampling ratio (stride) of the feature map.

[0023] Shallow features (C3): Block 6 in MobileNet V3 after the third downsampling is selected as the output point. Its feature map size is 1 / 8 of the original image (Stride 8), and it is connected to the feature fusion node 15 of the Neck network.

[0024] Mid-layer features (C4): Block 12 in MobileNet V3 after the 4th downsampling is selected as the output point. Its feature map size is 1 / 16 of the original image (Stride 16), and it is connected to the feature fusion node 12 of the neck network.

[0025] Deep features (C5): Block 15 at the end of the MobileNet V3 backbone after the 5th downsampling is selected as the output point. Its feature map size is 1 / 32 of the original image (Stride 32), and it is connected to the starting input node 11 of the Neck network.
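The three connection points in paragraphs [0023] to [0025] can be summarized as a small lookup table. The sketch below is illustrative only: the block indices, strides, and neck node numbers come from the text, while the 640×640 input size and the names `BACKBONE_TAPS` and `feature_map_size` are assumptions introduced for the example.

```python
# Backbone tap points as described in paragraphs [0023]-[0025].
# The dictionary layout and names are hypothetical, not from the application.
BACKBONE_TAPS = {
    "C3": {"block": 6, "stride": 8, "neck_node": 15},
    "C4": {"block": 12, "stride": 16, "neck_node": 12},
    "C5": {"block": 15, "stride": 32, "neck_node": 11},
}


def feature_map_size(input_size, stride):
    """Spatial size of a feature map for a given input size and stride."""
    height, width = input_size
    return height // stride, width // stride


# Assumed input resolution; the application does not state one.
INPUT_SIZE = (640, 640)

for name, tap in BACKBONE_TAPS.items():
    size = feature_map_size(INPUT_SIZE, tap["stride"])
    print(f"{name}: block {tap['block']} -> neck node {tap['neck_node']}, map {size}")
```

Under the assumed 640×640 input, the three levels produce 80×80, 40×40, and 20×20 feature maps.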

[0026] MobileNet v3 is a lightweight CNN designed for mobile or embedded devices. Compared to traditional convolutional neural networks, it significantly reduces model parameters and computational cost with only a slight decrease in accuracy. The MobileNet v3 network structure is shown in Figure 4: the input image first undergoes a 3×3 convolution, then sequentially passes through multiple stacked MobileNet v3 blocks, and finally the result is output via a 1×1 convolution. Within each MobileNet v3 block, the input first passes through a 1×1 convolution to increase the number of channels; then, a depthwise separable convolution (DSC) module is applied in the high-dimensional space; next, the feature map is refined using the Squeeze-and-Excitation (SE) attention mechanism; finally, a 1×1 convolution decreases the number of channels (using a linear activation function). When the stride is 1 and the dimensions of the input and output feature maps are the same, a residual connection is used between the input and output; when the stride is 2 (downsampling stage), the dimensionality-reduced feature map is output directly. The MobileNet v3 network structure contains a large number of depthwise separable convolution modules. Depthwise separable convolution essentially decomposes the original convolution kernel, thereby reducing the number of parameters. Since splitting the convolution kernel also increases the number of layers in the network, i.e., the network depth, it helps the network extract deeper features. The structure of a standard depthwise separable convolution module is shown in Figure 5.
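To make the parameter saving concrete, the following sketch compares the weight count of a standard convolution with that of a depthwise separable convolution (a k×k depthwise step followed by a 1×1 pointwise step). The channel counts are arbitrary example values, biases are ignored, and the function names are introduced only for this illustration.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise (kernel x kernel per input channel)
    plus pointwise (1 x 1) convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Example values only; they are not taken from the application.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
dsc = depthwise_separable_params(k, c_in, c_out)
ratio = dsc / std  # equals 1 / c_out + 1 / k**2

print(std, dsc, round(ratio, 4))
```

For a 3×3 kernel the ratio approaches 1/9 as the number of output channels grows, which is where most of the saving comes from.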

[0027] In another exemplary embodiment of this application, the choice of activation function in the neural network is crucial. Typically, activation functions are added after convolutional layers to increase the network's nonlinearity, thereby improving its resistance to overfitting. The depthwise separable convolutional module has two ReLU activation functions, with the following expression:

$$f(x) = \max(0, x) \tag{1}$$

However, the ReLU activation function suffers from neuron death: when the input is negative, the gradient becomes zero, potentially causing some neurons to never activate. It is also highly sensitive to anomalous inputs. This application optimizes the ReLU activation function in depthwise separable convolutional modules using Exponential Linear Units (ELUs). By optimizing the negative region of the ReLU activation function, the ELU addresses the issue of neurons not updating parameters in the negative region. When the input is negative, the neurons remain operational, making the network easier to train and more robust, especially in addressing gradient vanishing and data shift problems. The expression for the improved ReLU activation function based on ELU is as follows:

$$f(x) = \begin{cases} x, & x > 0 \\ \alpha \left(e^{x} - 1\right), & x \le 0 \end{cases} \tag{2}$$

where the value of $\alpha$ is usually 1.673.
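The difference between the two activations can be checked numerically. The sketch below assumes the standard ELU form for the negative region with the coefficient 1.673 mentioned above; the function names are chosen for this example and are not from the application.

```python
import math

ALPHA = 1.673  # coefficient quoted in the text for the negative region


def relu(x):
    """Plain ReLU: zero output (and zero gradient) for negative inputs."""
    return max(0.0, x)


def elu_relu(x, alpha=ALPHA):
    """ReLU whose negative region is replaced by the ELU branch."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_relu_grad(x, alpha=ALPHA):
    """Derivative of elu_relu; it stays positive for negative inputs."""
    return 1.0 if x > 0 else alpha * math.exp(x)


for value in (-2.0, -0.5, 0.0, 1.5):
    print(value, relu(value), round(elu_relu(value), 4), round(elu_relu_grad(value), 4))
```

The gradient for negative inputs is small but non-zero, so the corresponding weights can still be updated; for very negative inputs the output saturates near -1.673.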

[0028] Finally, the improved depthwise separable convolutional module is used to replace the original depthwise separable convolutional module in MobileNet v3. The improved MobileNet v3 network structure is called Robust MobileNet. Robust MobileNet is then used to replace the original feature extraction network in the YOLO model (such as the YOLO v11 model), completing a lightweight deep learning object detection model framework called RMN YOLO (Robust MobileNet YOLO). In this embodiment, the object detection model uses the improved YOLO model; in the improved YOLO model, the feature extraction network uses the improved MobileNet v3 network structure; in the improved MobileNet v3 network structure, the depthwise separable convolutional module of each MobileNet v3 block uses an improved ReLU activation function. In the improved ReLU activation function, the negative region of the traditional ReLU activation function is replaced by the ELU activation function.

[0029] In this embodiment, the ReLU activation function in the depthwise separable convolutional module is first optimized based on the ELU activation function to improve the robustness of the method. Then, the original depthwise separable convolutional module in the MobileNet v3 network structure is optimized based on the improved depthwise separable convolutional module. Finally, the feature extraction network of the YOLO model (such as the YOLO v11 model) is replaced with the improved MobileNet v3 network, resulting in the basic framework of the object detection model RMN YOLO. This achieves model lightweighting, thereby improving object detection speed and adaptability to edge devices. The technical route is shown in Figure 6.

[0030] In another exemplary embodiment of this application, the training process of the object detection model is as follows: (a1) Construct a training sample set; the training sample set includes surface damage image samples and / or repaired image samples and corresponding true surface damage results and repair error results.

[0031] (a2) Based on the training sample set, the initial target detection model is trained using a multi-scale feature knowledge distillation strategy to obtain an intermediate target detection model.

[0032] (a3) Based on the training sample set, the intermediate target detection model is trained using a dual-standard active learning strategy based on uncertainty and complexity to obtain the final trained target detection model.

[0033] In another exemplary embodiment of this application, step (a1), constructing a training sample set, specifically includes: (a1-1) The original training samples are augmented to obtain the augmented training samples.

[0034] To address the problem of insufficient sample size caused by inadequate acquisition of original surface damage images and original maintenance error images, rotation, mirroring, and scaling techniques are used to augment the original surface damage images and original maintenance error images, thereby increasing the sample size, enriching the sample morphology, and avoiding overfitting.

[0035] (a1-2) The original training samples and the augmented training samples constitute the augmented training set. Image enhancement is performed on the augmented training set using histogram equalization to obtain enhanced training samples.

[0036] Small-scale surface damage and repair errors inherently lack sufficient feature points, and in-the-wild datasets are severely affected by lighting and shooting angles, making feature extraction difficult. Histogram equalization is used for image enhancement to improve image quality and increase the distinction between the detected target and the background.

[0037] The effects before and after image processing and the corresponding grayscale histograms are shown in Figure 7. The grayscale values of the original scratch image are highly concentrated, and the contrast between the scratch and the background is low. After histogram equalization, the grayscale levels are distributed evenly across the entire grayscale range, and the scratch in the middle becomes clearly visible. The above training sample construction process can solve the problems of small sample size and data imbalance.

[0038] (a1-3) Construct a training sample set based on the enhanced training samples.
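Steps (a1-1) to (a1-3) can be sketched as follows. This is a minimal illustration on grayscale images stored as nested lists, not the application's implementation: rotation is limited to 90° steps, scaling to integer factors with nearest-neighbour repetition, and all function names are assumptions made for the example.

```python
def rotate90(img):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def mirror(img):
    """Mirror a 2-D image left to right."""
    return [row[::-1] for row in img]


def scale_up(img, factor):
    """Enlarge an image by an integer factor (nearest-neighbour repetition)."""
    return [[p for p in row for _ in range(factor)] for row in img for _ in range(factor)]


def equalize_histogram(img, levels=256):
    """Histogram equalization of an integer grayscale image."""
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant image: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) / denom * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in img]


original = [[100, 101], [102, 103]]  # low-contrast toy image
augmented = [original, rotate90(original), mirror(original), scale_up(original, 2)]
enhanced = [equalize_histogram(sample) for sample in augmented]
print(enhanced[0])
```

On the toy image, four neighbouring gray levels are spread across the full 0 to 255 range, which mirrors the contrast stretching described for Figure 7.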

[0039] In another exemplary embodiment of this application, based on the aforementioned improved YOLO v11 architecture, the target detection model is trained using a multi-scale feature knowledge distillation method to obtain a stable and efficient target detection model for surface damage and repair errors. The standard YOLO v11m is used as the teacher model, and the improved YOLO v11 model is used as the student model. Because the feature extraction network structures of the two models differ significantly, with differences in the number and position of intermediate layers and inconsistencies in the feature channels of intermediate layers, the training effect is affected. Therefore, this application uses a multi-scale feature knowledge distillation method for model training to complete the construction of the model for detecting surface damage and repair errors. The technical route is shown in Figure 8. The essence of the multi-scale feature knowledge distillation method is to introduce a feature pyramid network (FPN) on top of the feature extraction networks of the teacher and student models to generate multi-scale features. This addresses the structural inconsistency between the teacher and student models. At the same time, performing knowledge distillation on features at different scales solves the problem of determining the location and quantity of intermediate layer features, thereby improving training performance. The network structure of the feature pyramid is shown in Figure 9. The feature extraction network of the feature pyramid adopts a ResNet network structure, with one bottom-up path and one top-down path, connected laterally. The feature pyramid first upsamples (by bilinear interpolation) the deep feature maps (high semantic level but low resolution) level by level to match the size of the shallow feature maps. Then, lateral connections are added between each upsampled feature map and its corresponding shallow feature map, the number of channels is adjusted using 1×1 convolutions, and the two are summed to combine deep semantic information with shallow spatial detail information. Finally, a prediction head (i.e., a predictor, marked "predict" in Figure 9) is set on each fused feature map, so that targets of different sizes can be detected at each level, improving the accuracy of small-scale target detection.
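The top-down pathway with lateral connections can be sketched on tiny feature maps stored as nested lists of shape [channels][height][width]. This is a simplified illustration under stated assumptions: nearest-neighbour upsampling stands in for the bilinear interpolation named in the text, the 1×1 convolution weights are hand-picked, and the function names are hypothetical.

```python
def conv1x1(fmap, weights):
    """1x1 convolution: mix channels at every pixel. weights is [C_out][C_in]."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w[c] * fmap[c][y][x] for c in range(len(fmap))) for x in range(width)]
         for y in range(height)]
        for w in weights
    ]


def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling (a stand-in for bilinear interpolation)."""
    return [[[v for v in row for _ in (0, 1)] for row in ch for _ in (0, 1)] for ch in fmap]


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [[[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def top_down(c3, c4, c5, lateral):
    """Fuse shallow maps with upsampled deep maps, deepest level first."""
    p5 = conv1x1(c5, lateral["c5"])
    p4 = add_maps(conv1x1(c4, lateral["c4"]), upsample2x(p5))
    p3 = add_maps(conv1x1(c3, lateral["c3"]), upsample2x(p4))
    return p3, p4, p5


# Toy inputs: C5 is 1x1, C4 is 2x2 with two channels, C3 is 4x4.
c5 = [[[1.0]]]
c4 = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
c3 = [[[0.0] * 4 for _ in range(4)]]
lateral = {"c5": [[2.0]], "c4": [[1.0, 1.0]], "c3": [[1.0]]}

p3, p4, p5 = top_down(c3, c4, c5, lateral)
print(len(p3[0]), len(p4[0]), len(p5[0]))
```

Each output level keeps the spatial size of its shallow input while carrying information propagated from the deeper levels; a prediction head would then be attached to each of the three outputs.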

[0040] As shown in Figure 8, in each training iteration, the feature map outputs of the teacher model and the student model at three different scales of the feature pyramid (P3, P4, and P5) are retrieved simultaneously. The feature layers generated by the two models, which have exactly corresponding resolutions, are aligned laterally at the pixel level. A mean squared error loss is then introduced to calculate the deviation between the student model and the teacher model in feature representation (i.e., the feature distillation loss). Next, the classification loss and localization loss are calculated based on the detection results of the two models' detection heads. To balance the learning objective, a guiding weight of 0.25 is assigned to the feature distillation loss, which is incorporated into the total loss function (the total loss function includes feature distillation loss + classification loss + localization loss). This ensures that the student model can mimic the feature response intensity of the teacher model at every receptive field scale when facing different targets, from large patches to fine scratches.
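The loss combination described above can be sketched as follows. The 0.25 weight and the mean squared error on P3, P4, and P5 come from the text; summing the per-scale errors, the toy feature values, and the function names are assumptions made for this example.

```python
def flatten(values):
    """Flatten an arbitrarily nested list of numbers."""
    for item in values:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def mse(student_map, teacher_map):
    """Mean squared error between two feature maps of identical shape."""
    s, t = list(flatten(student_map)), list(flatten(teacher_map))
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def total_loss(student_feats, teacher_feats, cls_loss, loc_loss, distill_weight=0.25):
    """Classification + localization + weighted feature distillation loss.

    The per-scale errors are summed here; the text does not specify
    whether they are summed or averaged."""
    feat_loss = sum(mse(student_feats[k], teacher_feats[k]) for k in ("P3", "P4", "P5"))
    return cls_loss + loc_loss + distill_weight * feat_loss


# Toy feature maps, already aligned in resolution and channel count.
student = {"P3": [[1.0, 2.0]], "P4": [[0.5]], "P5": [[0.0]]}
teacher = {"P3": [[1.0, 4.0]], "P4": [[0.5]], "P5": [[1.0]]}

loss = total_loss(student, teacher, cls_loss=0.5, loc_loss=0.25)
print(loss)
```

With these toy values the feature distillation term is 3.0 before weighting, so it contributes 0.75 to the total.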

[0041] Based on the above, in step (a2), the initial target detection model is trained using a multi-scale feature knowledge distillation strategy based on the training sample set to obtain an intermediate target detection model, specifically including: (a2-1) The initial object detection model is used as the student model; the traditional YOLO model is used as the teacher model.

[0042] (a2-2) Input the training samples from the current training sample set into the current student model and the current teacher model respectively.

[0043] (a2-3) Input the output of the current student model’s feature extraction network into the first feature pyramid network to obtain multiple first feature maps at different scales.

[0044] (a2-4) Input the output of the feature extraction network of the current teacher model into the second feature pyramid network to obtain multiple second feature maps of different scales.

[0045] (a2-5) Calculate the deviations in feature representation between the first feature map at multiple different scales and the second feature map at multiple different scales, and obtain the feature distillation loss.

[0046] (a2-6) Calculate the classification loss and localization loss based on the detection results of the current student model and the current teacher model.

[0047] (a2-7) Calculate the total loss error based on the feature distillation loss, classification loss and localization loss; and set the current trained student model as the current student model and the current trained teacher model as the current teacher model.

[0048] (a2-8) Return to step (a2-2), "Input the training samples from the current training sample set into the current student model and the current teacher model respectively", until the total loss error converges. The student model after the final training is recorded as the intermediate target detection model.

[0049] In another exemplary embodiment of this application, surface damage and maintenance errors on equipment are often difficult to detect, resulting in high manual annotation costs for in-the-wild datasets and frequent missed or incorrect annotations. A training set composed of such samples is of low quality, which negatively impacts training quality and model performance. Therefore, to address the performance degradation caused by low-quality and ambiguous in-the-wild images in the training set, this application proposes an active learning strategy based on uncertainty and complexity. Through sample selection, disambiguation, and model retraining, the target detection model is further optimized, thereby improving target detection accuracy. The technical approach is shown in Figure 10.

[0050] Active learning is a machine learning strategy that proactively selects the most valuable data and queries its labels, thereby reducing labeling costs and improving classifier accuracy. The key to active learning is the sample selection criterion. However, most active learning algorithms deploy only a single selection criterion, which can limit their performance. This application proposes a dual-criteria active learning method based on uncertainty and complexity. Uncertainty describes how confusable a sample is, while complexity measures the difference between local and global samples. By measuring the difficulty of sample identification using both uncertainty and complexity, this approach addresses the problem of confused / ambiguous samples often found in in-the-wild datasets, improving the detection accuracy of surface damage and repair errors.

[0051] Specifically, in step (a3), training the intermediate target detection model on the training sample set with the dual-standard active learning strategy based on uncertainty and complexity, to obtain the final trained target detection model, includes: (a3-1) Divide the training sample set into a first dataset and a second dataset. The training samples in the first dataset are high-quality, unambiguous training samples, with an initial size of n. The second dataset is the complement of the first dataset, so its initial size is the total number of training samples minus n.

[0052] (a3-2) Use the current first dataset to train the current initial intermediate object detection model and obtain the current trained intermediate object detection model.

[0053] (a3-3) Calculate the class probability vector of each training sample in the current second dataset using the intermediate target detection model after the current training.

[0054] The training samples of the second dataset are defined as $U = \{x_1, x_2, \dots, x_N\}$, where $x_i$ is the i-th training sample in the second dataset and $P(x_i)$ is the vector of class probabilities of training sample $x_i$. Assuming that m categories are defined for surface damage and maintenance errors, $P(x_i)$ is expressed as follows:

$$P(x_i) = [p_{i1}, p_{i2}, \dots, p_{im}] \tag{3}$$

The class probability vector $P(x_i)$ of each sample $x_i$ is obtained by calculation using the current trained intermediate object detection model.

[0055] (a3-4) Calculate the sample uncertainty of each training sample in the current second dataset based on the class probability vector of each training sample in the current second dataset.

[0056] Training samples with higher uncertainty often contain more information and are more difficult to classify. Information entropy is therefore introduced to measure the uncertainty of a sample. The uncertainty is calculated as follows:

$$H(x_i) = -\sum_{j=1}^{m} p_{ij} \log p_{ij} \tag{4}$$

where $p_{ij}$ denotes the probability that training sample $x_i$ belongs to category j, and m is the number of categories.

[0057] Uncertainty is used to help active learning algorithms find some samples that are difficult to classify.

[0058] (a3-5) Select the h training samples with the greatest uncertainty in the current second dataset to form the third dataset.

[0059] (a3-6) Calculate the sample complexity for each training sample in the current third dataset.

[0060] Wild datasets often contain images with multiple surface defects, and a complexity screening criterion can handle such situations effectively. For multi-label classification, the label probabilities show that the correct labels tend to cluster in the top two categories. This clustering of the top two labels is the cause of complexity and makes classification difficult. Therefore, this application introduces complexity and defines it through the distance between the highest class probability and the second-highest class probability of a sample, calculated as follows:

$$C(x_i) = p_i^{(1)} - p_i^{(2)} \tag{5}$$

where $p_i^{(1)}$ and $p_i^{(2)}$ are, respectively, the largest and the second-largest class probabilities of training sample $x_i$. The higher the value of $C(x_i)$, the lower the complexity. Complexity is used to select the most difficult samples to classify in this round of learning.

[0061] (a3-7) Select the training sample with the lowest sample complexity from the current third dataset, have it labeled manually, and add it to the first dataset to obtain the updated first dataset; at the same time, delete this training sample from the second dataset.

[0062] (a3-8) Determine whether the data capacity of the first dataset after the current update meets the preset capacity requirement.

[0063] (a3-9) If not, let the current updated first dataset be the current first dataset, let the current trained intermediate object detection model be the current initial intermediate object detection model, and return to the step "Train the current initial intermediate object detection model using the current first dataset to obtain the current trained intermediate object detection model".

[0064] (a3-10) If so, use the first dataset after the final update to train the current trained intermediate target detection model, and obtain the final trained target detection model.
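One selection round of steps (a3-3) to (a3-7) can be sketched as follows. This is an illustrative sketch, not the application's implementation. It assumes natural-log entropy for the uncertainty, takes the complexity score as the gap between the largest and second-largest class probabilities, and reads "lowest sample complexity" in step (a3-7) as the smallest value of that gap, which is an interpretation. Class probability vectors are passed in directly as plain lists in place of real model outputs, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def uncertainty(probs: Sequence[float]) -> float:
    """Information entropy of a class probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def complexity_score(probs: Sequence[float]) -> float:
    """Gap between the largest and second-largest class probabilities.

    Assumes at least two classes. A small gap means the top two classes
    are hard to tell apart.
    """
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_sample(pool: Dict[str, Sequence[float]], h: int) -> str:
    """Pick the h most uncertain samples, then the one with the smallest gap."""
    ranked = sorted(pool, key=lambda name: uncertainty(pool[name]), reverse=True)
    third_dataset = ranked[:h]
    return min(third_dataset, key=lambda name: complexity_score(pool[name]))


def active_learning_round(first: List[str],
                          second: Dict[str, Sequence[float]],
                          h: int) -> str:
    """Move one selected sample from the second dataset to the first."""
    chosen = select_sample(second, h)
    first.append(chosen)      # the sample would be labeled manually at this point
    del second[chosen]
    return chosen


if __name__ == "__main__":
    first_dataset = ["x0"]
    second_dataset = {
        "a": [0.90, 0.05, 0.05],
        "b": [0.50, 0.45, 0.05],
        "c": [0.40, 0.30, 0.30],
        "d": [0.34, 0.33, 0.33],
    }
    print(active_learning_round(first_dataset, second_dataset, h=2))
    print(active_learning_round(first_dataset, second_dataset, h=2))
```

In the second round the two criteria disagree: sample `c` has the higher entropy, but sample `b` has the smaller gap between its top two classes and is therefore the one selected. In a full loop, the model would be retrained on the first dataset and the probabilities recomputed before each round, until the first dataset reaches the preset capacity.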

[0065] This application addresses the limitations of computing power and camera pixel count in mobile inspection systems, the real-time detection requirements for surface damage and maintenance errors during maintenance, and the higher demands for robustness and real-time performance of detection algorithms in complex maintenance scenarios. It proposes a surface damage and maintenance error detection method based on knowledge distillation and active learning, which possesses the following technical advantages: (1) The surface damage and repair error detection method of this application has high detection efficiency, can realize real-time detection on edge devices, and can solve the contradiction between the real-time processing requirements of high-resolution images and the limitation of computing resources.

[0066] The existing feature extraction network in the YOLO model is replaced with the more lightweight MobileNet v3 network architecture to achieve overall model lightweighting. MobileNet v3 is a lightweight CNN network specifically designed for mobile or embedded devices. Compared to traditional convolutional neural networks, it significantly reduces model parameters and computational load with only a slight decrease in accuracy. The lightweight object detection model has a significantly reduced number of parameters, lower computational burden, and faster detection speed, enabling real-time detection on mobile devices.

[0067] (2) The surface damage and repair error detection method of this application has high detection accuracy.

[0068] 1) Optimization of the ReLU activation function in depthwise separable convolution. The ReLU activation function in the depthwise separable convolution is modified in form using the Exponential Linear Unit (ELU). By replacing the negative region of the ReLU activation function, the ELU solves the problem that neurons with negative inputs do not update their parameters. When the input falls in the negative region, the neuron still produces a non-zero output and gradient, which makes the network easier to train and more robust. This effectively solves the problem of neuron death, and the detection results are therefore more stable.

[0069] 2) Training of the target detection model based on multi-scale feature knowledge distillation. This application employs knowledge distillation to train the object detection model, using the standard YOLO v11m as the teacher model and an improved YOLO model as the student model. The feature extraction networks of the two models differ in the number and location of their intermediate layers, and the feature channels of these intermediate layers are inconsistent, which can affect training performance and reduce detection accuracy. This application therefore uses a multi-scale feature knowledge distillation method for model training. A Feature Pyramid Network (FPN) is introduced on top of the feature extraction networks of both the teacher and student models to generate multi-scale features, which resolves the structural inconsistency between the teacher and student models. Furthermore, performing knowledge distillation on features at different scales removes the difficulty of determining the location and number of intermediate layer features, thereby effectively improving detection accuracy.

[0070] 3) Optimization of the target detection model based on an active learning strategy using uncertainty and complexity. Training samples for surface damage and repair errors typically come from wild datasets, in which missed or incorrect labels are common, leading to low-quality training sets. To address the performance degradation caused by low-quality and ambiguous images in the wild image data of the training set, an active learning strategy based on uncertainty and complexity is proposed. This strategy uses a dual criterion of uncertainty and complexity to automatically filter and disambiguate low-quality samples and then retrain the model. The detection accuracy of the model is further improved after retraining with high-quality samples.

[0071] (3) The surface damage and repair error detection method of this application can save training costs and sample labeling costs.

[0072] 1) Active learning strategy based on uncertainty and complexity. The proposed active learning strategy based on uncertainty and complexity can automatically select the most valuable samples, thereby achieving clear classification boundaries with low sample labeling costs. Unlike common active learning strategies that rely on a single sample selection criterion, this strategy utilizes both sample uncertainty and complexity as criteria to continuously search for the most valuable samples as training data. It then uses the continuously updated training set to update and optimize the detection model, ultimately constructing a high-accuracy detection model with low labeling and training costs.
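The parameter saving behind advantage (1) can be made concrete with a small calculation. MobileNet-style networks replace standard convolutions with depthwise separable convolutions, that is, a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution. The sketch below compares the weight counts of a single layer. The layer sizes are illustrative assumptions and are not taken from the application; biases and normalization parameters are ignored, and the function names are hypothetical.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64    # illustrative layer sizes
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, depthwise separable: {sep}, ratio: {std / sep:.2f}")
```

For this example layer the separable form needs roughly one eighth of the weights; the actual saving in a full detection model depends on its layer configuration.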

[0073] This application also provides an application scenario in which the above-described surface damage and repair error detection method based on knowledge distillation and active learning is applied. Specifically, the surface damage and repair error detection method based on knowledge distillation and active learning provided in this embodiment can be applied in a content distribution scenario. This scenario includes an image acquisition stage and a detection stage; the image acquisition stage is used to acquire images of large and complex equipment to be inspected, including images of surface damage and / or images after repair; the detection stage is used to apply a pre-trained target detection model based on a machine learning model to detect surface damage and / or repair errors in the images to be inspected, and obtain the detection results. The surface damage and repair error detection method based on knowledge distillation and active learning provided in this embodiment belongs to the detection stage.

[0074] Based on the same inventive concept, this application also provides a surface damage and repair error detection system based on knowledge distillation and active learning for implementing the aforementioned method. The solution provided by this system is similar to the solution described in the above method. Therefore, for the specific limitations of one or more embodiments of the surface damage and repair error detection system based on knowledge distillation and active learning provided below, reference may be made to the limitations of the surface damage and repair error detection method based on knowledge distillation and active learning described above, which are not repeated here.

[0075] In one exemplary embodiment, as shown in Figure 11, a surface damage and repair error detection system based on knowledge distillation and active learning is provided, comprising: a data acquisition module M1, used to acquire the image to be inspected; the image to be inspected includes a surface damage image and / or a repaired image of the object to be inspected.

[0076] The detection module M2 is used to apply a pre-trained machine learning-based target detection model to detect surface damage and / or repair errors in the image to be detected, and to obtain the detection results. During the training of the target detection model, a multi-scale feature knowledge distillation strategy and a dual-standard active learning strategy based on uncertainty and complexity are adopted.

[0077] As an optional implementation, the object detection model employs a modified YOLO model; in the modified YOLO model, the feature extraction network adopts a MobileNet v3 network structure or a modified MobileNet v3 network structure.

[0078] In the improved MobileNet v3 network architecture, the depthwise separable convolutional modules of each MobileNet v3 block layer employ an improved ReLU activation function.

[0079] In the improved ReLU activation function, the negative region portion of the traditional ReLU activation function is replaced by the ELU activation function.
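The improved activation described above can be sketched as a scalar function: positive inputs pass through unchanged, as in ReLU, while non-positive inputs follow the ELU branch. This is a minimal sketch. The scale factor `alpha` and its default value of 1.0 are assumptions, since the text does not state a value, and the function names are hypothetical.

```python
import math


def improved_relu(x: float, alpha: float = 1.0) -> float:
    """Identity for x > 0; ELU branch alpha * (exp(x) - 1) for x <= 0."""
    return x if x > 0.0 else alpha * math.expm1(x)


def improved_relu_grad(x: float, alpha: float = 1.0) -> float:
    """Derivative: 1 for x > 0; alpha * exp(x) for x <= 0, which stays positive."""
    return 1.0 if x > 0.0 else alpha * math.exp(x)


if __name__ == "__main__":
    for value in (-3.0, -1.0, 0.0, 2.0):
        print(value, improved_relu(value), improved_relu_grad(value))
```

Plain ReLU has a zero gradient for every negative input, so a neuron stuck in that region stops updating. With the ELU branch the gradient remains positive there, which is the mechanism the text relies on to avoid neuron death.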

[0080] In one exemplary embodiment, a computer device, also known as a computer system, is provided. This computer device may be a server or a terminal, and its internal structure may be as shown in Figure 12. The computer device includes a processor, memory, input / output (I / O) interfaces, and a communication interface. The processor, memory, and I / O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I / O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides the environment for the operating system and computer programs in the non-volatile storage media to run. The database stores images to be detected, training samples, and pre-trained target detection models. The I / O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network. When executed by the processor, the computer program implements a surface damage and repair error detection method based on knowledge distillation and active learning.

[0081] Those skilled in the art will understand that the structure shown in Figure 12 is merely a block diagram of part of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement. In an exemplary embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the steps in the above-described method embodiments.

[0082] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties. Moreover, the collection, use and processing of the relevant data are carried out in compliance with the relevant data protection laws and policies of the country or region concerned, and with the authorization granted by the owner of the corresponding device.

[0083] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).

[0084] The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0085] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0086] This document uses specific examples to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the methods and core ideas of this application. Furthermore, those skilled in the art will recognize that, based on the ideas of this application, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A method for detecting surface damage and repair errors based on knowledge distillation and active learning, characterized in that the method includes: acquiring the image to be inspected, the image to be inspected including a surface damage image and / or a repaired image of the object being inspected; and applying a pre-trained machine learning-based target detection model to detect surface damage and / or repair errors in the image to be inspected, and obtaining the detection results; wherein, during the training process of the target detection model, a multi-scale feature knowledge distillation strategy and a dual-standard active learning strategy based on uncertainty and complexity are adopted.

2. The surface damage and repair error detection method based on knowledge distillation and active learning according to claim 1, characterized in that, The object detection model adopts the improved YOLO model; in the improved YOLO model, the feature extraction network adopts the MobileNet v3 network structure.

3. The surface damage and repair error detection method based on knowledge distillation and active learning according to claim 1, characterized in that, The object detection model adopts an improved YOLO model; in the improved YOLO model, the feature extraction network adopts an improved MobileNet v3 network structure. In the improved MobileNet v3 network architecture, the depthwise separable convolutional modules of each MobileNet v3 block layer employ an improved ReLU activation function; In the improved ReLU activation function, the negative region portion of the traditional ReLU activation function is replaced by the ELU activation function.

4. The surface damage and repair error detection method based on knowledge distillation and active learning according to claim 2 or 3, characterized in that, The training process for the object detection model is as follows: Construct a training sample set; The training sample set includes surface damage image samples and / or repaired image samples, as well as the corresponding true surface damage results and repair error results; Based on the training sample set, the initial target detection model is trained using a multi-scale feature knowledge distillation strategy to obtain an intermediate target detection model. Based on the training sample set, the intermediate target detection model is trained using a dual-standard active learning strategy based on uncertainty and complexity, resulting in the final trained target detection model.

5. The surface damage and repair error detection method based on knowledge distillation and active learning according to claim 4, characterized in that constructing the training sample set specifically includes: augmenting the original training samples to obtain augmented training samples, the original training samples and the augmented training samples together constituting the augmented training set; performing image enhancement on the augmented training set using histogram equalization to obtain enhanced training samples; and constructing the training sample set based on the enhanced training samples.

6. The surface damage and repair error detection method based on knowledge distillation and active learning according to claim 4, characterized in that, based on the training sample set, the initial target detection model is trained using a multi-scale feature knowledge distillation strategy to obtain an intermediate target detection model, specifically including: The initial object detection model is used as the student model; the traditional YOLO model is used as the teacher model; Input the training samples from the current training sample set into the current student model and the current teacher model respectively; The output of the current student model's feature extraction network is input into the first feature pyramid network to obtain multiple first feature maps at different scales; The output of the current teacher model's feature extraction network is input into the second feature pyramid network to obtain multiple second feature maps at different scales; The deviations in feature representation between the multiple first feature maps at different scales and the multiple second feature maps at different scales are calculated to obtain the feature distillation loss; Calculate the classification loss and the localization loss based on the detection results of the current student model and the current teacher model; The total loss error is calculated based on the feature distillation loss, classification loss, and localization loss; the current trained student model is set as the current student model, and the current trained teacher model is set as the current teacher model; Return to the step "Input the training samples from the current training sample set into the current student model and the current teacher model respectively" until the total loss error converges; The student model after the final training is recorded as the intermediate target detection model.

7. The surface damage and repair error detection method based on knowledge distillation and active learning according to claim 4, characterized in that, based on the training sample set, the intermediate object detection model is trained using a dual-standard active learning strategy based on uncertainty and complexity to obtain the final trained object detection model, which includes: The training sample set is divided into a first dataset and a second dataset; the training samples in the first dataset are unambiguous training samples; Train the current initial intermediate object detection model using the current first dataset to obtain the current trained intermediate object detection model; Calculate the class probability vector for each training sample in the current second dataset using the current trained intermediate object detection model; Calculate the sample uncertainty of each training sample in the current second dataset based on the class probability vector of each training sample in the current second dataset; Select the h training samples with the greatest uncertainty in the current second dataset to form the third dataset; Calculate the sample complexity for each training sample in the current third dataset; Select the training sample with the lowest sample complexity from the current third dataset and add it to the first dataset to obtain the updated first dataset; at the same time, delete this training sample from the second dataset; Determine whether the data size of the first dataset after the current update meets the preset capacity requirement; If not, set the current updated first dataset as the current first dataset, set the current trained intermediate object detection model as the current initial intermediate object detection model, and return to the step "Train the current initial intermediate object detection model using the current first dataset to obtain the current trained intermediate object detection model"; If so, use the first dataset after the final update to train the current trained intermediate object detection model to obtain the final trained object detection model.

8. A surface damage and repair error detection system based on knowledge distillation and active learning, characterized in that the system includes: a data acquisition module, used to acquire the image to be inspected, the image to be inspected including a surface damage image and / or a repaired image of the object being inspected; and a detection module, used to apply a pre-trained machine learning-based target detection model to detect surface damage and / or repair errors in the image to be inspected, and to obtain the detection results; wherein, during the training of the target detection model, a multi-scale feature knowledge distillation strategy and a dual-standard active learning strategy based on uncertainty and complexity are adopted.

9. The surface damage and repair error detection system based on knowledge distillation and active learning according to claim 8, characterized in that, The object detection model uses an improved YOLO model; in the improved YOLO model, the feature extraction network uses the MobileNet v3 network structure or an improved MobileNet v3 network structure. In the improved MobileNet v3 network architecture, the depthwise separable convolutional modules of each MobileNet v3 block layer employ an improved ReLU activation function; In the improved ReLU activation function, the negative region portion of the traditional ReLU activation function is replaced by the ELU activation function.

10. A computer system, characterized in that it includes: a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to implement the surface damage and repair error detection method based on knowledge distillation and active learning as described in any one of claims 1-7.