Online defect detection system for hardware tools based on AI image recognition
By generating rare defect samples using image generative adversarial networks and combining them with the YOLOv8 detection framework and CBAM attention module, the problem of low efficiency in rare defect detection of hardware tools is solved. This enables efficient and accurate identification and real-time detection of rare defects, improving the detection accuracy and efficiency of the production line.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 金华高格软件有限公司
- Filing Date
- 2025-11-04
- Publication Date
- 2026-06-19
AI Technical Summary
Existing hardware tool defect detection systems cannot effectively identify rare defects. Traditional machine vision algorithms cannot learn enough defect feature patterns, resulting in low detection efficiency and low accuracy. Manual inspection also struggles to meet the needs of modern production lines.
We employ an image generative adversarial network (GAN) to generate rare defect samples. By combining the YOLOv8 detection framework and the CBAM attention module, we enhance the feature learning ability for rare defects through dynamic database updates and incremental training of the GAN, thereby achieving continuous optimization and adaptation of the detection model.
It improves the recall rate of rare defects detection, reduces the false negative rate, and achieves accurate identification and real-time detection of rare defects, meeting the real-time and reliability requirements of the production line and reducing the cost and false positive rate of manual inspection.
Figure CN121415147B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image generative adversarial network technology, and in particular to an online defect detection system for hardware tools based on AI image recognition.
Background Technology
[0002] In the production of hardware tools, defect detection is crucial. Initially, inspection relied primarily on manual labor, with inspectors using only their eyes and simple measuring tools to conduct random checks. However, manual inspection has many drawbacks. Firstly, it is inefficient and cannot meet the demands of large-scale, high-speed modern production lines. For example, on a production line producing dozens of hardware tools per minute, manual inspection simply cannot keep up with the pace of production. Secondly, the accuracy of inspection is greatly affected by human factors. Inspectors are prone to fatigue from long hours of work, leading to a significant increase in the rate of missed detections of minor and complex defects. Furthermore, the subjective judgment standards of different inspectors vary, making it difficult to guarantee consistent inspection results.
[0003] With the development of industrial automation, some inspection systems based on traditional machine vision have emerged. These systems use simple image processing algorithms, such as edge detection and grayscale thresholding, to identify defects in hardware tools. Traditional machine vision can learn from large sample sizes to identify obvious external defects of hardware tools. However, some defects are rare and do not occur frequently. Examples include external microcracks (fine surface cracks with a depth ≤ 0.1 mm, width ≤ 0.02 mm, and length ≤ 5 mm, which often occur after stress release during heat treatment or local stress concentration during precision machining, for example on screwdriver heads and pliers blades); local pinholes in the coating; local peeling of the coating; color difference spots in the coating; thread tooth distortion; small curling of the cutting edge; hole misalignment; surface foreign object embedding defects such as embedded metal chips and abrasive particle depressions; near-surface material defects such as local oxide spots and exposure of non-metallic inclusions; special process residual defects such as micro-notches from laser processing and EDM residues; internal cracks; and internal plastic deformation caused by insufficient internal quenching of hardware tools. These defects are difficult to detect and their samples are scarce. Traditional algorithms cannot learn enough defect feature patterns, making it almost impossible to detect these defects.
Summary of the Invention
[0004] The purpose of this invention is to provide an online defect detection system for hardware tools utilizing image generative adversarial networks (GANs). By training a YOLOv8 detection framework model and specifically adding a CBAM attention module to YOLOv8-S, the system enhances feature extraction of rare defect regions in hardware tools. Simultaneously, it integrates the feature distribution of rare defect samples generated by GANs at the network bottleneck layer, improving the YOLOv8-S detection model's feature learning ability for rare defects and enhancing its detection performance. This system solves the performance bottleneck problem caused by the scarcity of rare defect samples. By dynamically updating the database and incrementally training the GAN to adapt to new defect patterns, combined with incremental model training, parameter-oriented fine-tuning, and modular design, on-demand optimization is achieved. This allows the system to adapt to production process improvements and new defect types at low cost without full retraining, enabling continuous evolution of detection capabilities.
[0005] The specific contents of this invention are as follows:
[0006] The online defect detection system for hardware tools based on AI image recognition includes a data acquisition module, a database construction and dynamic detection module, a detection model training and optimization module, and an online real-time detection module.
[0007] The data acquisition module acquires images of hardware tools in real time.
[0008] The database construction and dynamic detection module stores and labels normal, common, and rare defect samples, performs standardization processing, enhances common defect samples, extracts features of rare defect samples and clusters them to calculate feature centers, generates and filters simulated rare defect image samples through GAN, expands the rare defect sample library, and updates samples monthly to trigger incremental GAN training.
[0009] The detection model training and optimization module mixes the above samples to train the target detection model, uses it as a teacher model, trains a lightweight target detection model through knowledge distillation, and trains an improved target detection enhancement model at the same time.
[0010] The online real-time detection module uses a lightweight target detection model to detect and sort common defect samples, determines highly suspected rare defects through feature comparison, and further uses an improved target detection enhancement model to identify and classify rare defect samples.
[0011] Furthermore, the data acquisition module includes a visible light industrial camera submodule, which controls the visible light industrial camera to acquire external images of the hardware tool in real time; an X-ray industrial camera submodule, which controls the X-ray industrial camera to acquire internal images of the hardware tool in real time; and an infrared thermal imaging industrial camera submodule, which controls the infrared thermal imaging industrial camera to acquire internal infrared thermal imaging images of the hardware tool in real time.
[0012] Furthermore, the database construction and dynamic detection module includes an initial sample storage submodule, a sample preprocessing submodule 1, and a dynamic sample expansion submodule connected in sequence.
[0013] The initial sample storage submodule stores images of normal hardware tools without defects, samples of common defects that occur during the production process, and samples of rare defects. The Labelme tool is used to annotate the common defect samples at the pixel level. Each type of rare defect has at least 50 sample images, which together form an independent rare defect sub-library.
[0014] The sample preprocessing submodule 1 normalizes the pixel values of all sample images to the [0, 1] interval, uses bicubic interpolation to unify the spatial resolution size of all sample images, performs conventional data augmentation on common defect samples, including random flipping and brightness adjustment; extracts 256-dimensional feature vectors of rare defect samples through the ResNet-50 network, uses the DBSCAN algorithm to cluster the 256-dimensional feature vectors, and calculates the feature center of each type of defect.
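The normalization and feature-center steps described in [0014] can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes 8-bit input pixels, assumes cluster labels have already been produced by a DBSCAN run (with the common convention that label -1 marks noise points), and uses toy 2-dimensional vectors in place of the 256-dimensional ResNet-50 features. The function names are hypothetical.

```python
from collections import defaultdict


def normalize_pixels(pixels, max_value=255):
    """Scale raw pixel intensities to the [0, 1] interval (8-bit input assumed)."""
    return [p / max_value for p in pixels]


def feature_centers(vectors, labels):
    """Average the feature vectors of each cluster.

    Label -1 is treated as DBSCAN noise and skipped (an assumed convention).
    """
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        if label != -1:
            groups[label].append(vec)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


# Toy 2-D vectors stand in for the 256-dimensional features.
vectors = [[0.0, 1.0], [0.2, 0.8], [1.0, 0.0], [5.0, 5.0]]
labels = [0, 0, 1, -1]  # hypothetical clustering output
centers = feature_centers(vectors, labels)
```

Here the feature center is taken as the arithmetic mean of the cluster members; the source does not state how the center is computed, so other definitions (for example a medoid) would be equally compatible with the text.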
[0015] The dynamic sample expansion submodule includes a GAN model training unit, a sample generation and screening unit, and a database update unit connected in sequence. The GAN model training unit inputs at least 50 rare defect samples into the generator G, combines the feature center of each type of defect with a 128-dimensional random noise vector z, and generates a simulated rare defect image. The random noise vector z follows an N(0,1) distribution.
[0016] The sample generation and screening unit calculates the structural similarity index (SSIM) between the simulated rare defect image and the rare defect sample, retains the samples with SSIM≥0.85, and expands the rare defect sample library at a ratio of 1:5 between the simulated rare defect image and the rare defect sample.
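The SSIM-based screening can be illustrated with a simplified sketch. It assumes grayscale images normalized to [0, 1] and flattened to lists, and it computes a single global SSIM value rather than the windowed average that production SSIM implementations typically use. The constants 0.01 and 0.03 are the conventional SSIM stabilizers, and comparing every generated image against one reference image is an assumption made for brevity.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized grayscale images (flat lists)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # conventional stabilizing constants
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def screen_generated(generated, reference, threshold=0.85):
    """Keep generated images whose SSIM against the reference meets the threshold."""
    return [g for g in generated if global_ssim(g, reference) >= threshold]


reference = [0.1, 0.9, 0.1, 0.9]
inverted = [0.9, 0.1, 0.9, 0.1]
kept = screen_generated([list(reference), inverted], reference)
```

An image identical to the reference scores 1 and is retained, while the inverted pattern has negative covariance with the reference and is rejected.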
[0017] The database update unit adds newly discovered defect samples to the database every month, triggering incremental training of the GAN model for at least 50 iterations. The defect samples include common defect samples and rare defect samples.
[0018] Furthermore, the GAN model generates simulated images of rare defects by including the following steps:
[0019] 1) Data preparation: Retrieve at least 50 real rare defect samples from the initial sample storage submodule, preprocess them into a uniform format, and use them as reference data for the generator. Retrieve the feature center of each type of defect calculated by the sample preprocessing submodule 1, and use it as the core reference for the generator.
[0020] 2) Noise generation: Generate a 128-dimensional random noise vector z that follows an N(0,1) distribution to provide a source of randomness for image generation;
[0021] 3) Feature fusion and image generation: The generator G receives feature information of real rare defect samples, feature centers of each type of defect and random noise z, and performs feature mapping and reconstruction through a multi-layer neural network. It fuses the common features of the feature centers with the randomness of the noise, transforms them into high-dimensional image data, and generates a simulated image containing rare defect features.
[0022] 4) Adversarial training optimization: The discriminator D distinguishes the common features of the generated image, the real sample, and the feature center representation of each type of defect. The generator adjusts its parameters based on the discriminator feedback. Through multiple rounds of adversarial iteration, the generated image retains the core features of the feature center while also possessing diversity and realism.
[0023] 5) Output simulated images: After sufficient training, the generator outputs simulated images that fit the common features of the feature center and have diversity, which are used for subsequent sample selection and library expansion.
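The noise-generation and feature-fusion steps above can be sketched for the generator input only. The text says the feature center is combined with a 128-dimensional N(0,1) noise vector but does not say how; simple concatenation is assumed here, and the 256-dimensional center of zeros is a placeholder. The generator and discriminator networks themselves are not shown.

```python
import random

NOISE_DIM = 128  # dimensionality of z given in the description


def sample_noise(rng, dim=NOISE_DIM):
    """Draw a noise vector z whose components follow N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def generator_input(feature_center, rng):
    """Build one conditioned input by concatenating the feature center with fresh noise.

    Concatenation is an assumption; the source only says the two are combined.
    """
    return list(feature_center) + sample_noise(rng)


rng = random.Random(0)   # seeded for reproducibility
center = [0.0] * 256     # placeholder for a real 256-dimensional feature center
latent = generator_input(center, rng)
```

Because fresh noise is drawn on every call, repeated calls with the same feature center give different inputs, which is what lets the generator produce varied images around one defect type.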
[0024] Furthermore, the detection model training and optimization module includes a hybrid training set construction submodule, a YOLOv8 detection model training submodule, a common defect lightweight model training submodule, and a rare defect enhancement model training submodule connected in sequence.
[0025] The mixed training set construction submodule mixes normal samples and defective samples in a 7:2:1 ratio, dividing them into a 70% training set, a 20% validation set, and a 10% test set. The defective samples include real common defect samples, real rare defect samples, and simulated rare defect images.
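A 70% / 20% / 10% split of this kind might look like the following sketch. Shuffling with a fixed seed and using integer arithmetic for the cut points are choices made here for reproducibility, not details taken from the source.

```python
import random


def split_dataset(samples, seed=0):
    """Shuffle and split samples into 70% training, 20% validation, 10% test."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10
    n_val = len(items) * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    held_out = items[n_train + n_val:]  # remaining samples form the test set
    return train, val, held_out


train_set, val_set, test_set = split_dataset(range(100))
```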
[0026] The YOLOv8 detection model training submodule loads the YOLOv8 pre-trained model weights as initial parameters, sets the initial learning rate to 0.01, and uses a cosine annealing learning rate decay mechanism to train the YOLOv8 detection model. The total number of training rounds is set to 100 rounds. In each round, the mAP@0.5 index is calculated on the validation set, and the model weight with the highest mAP in the validation set is retained.
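The learning rate schedule and checkpoint selection in [0026] can be sketched as below. The cosine formula is the standard annealing schedule; a minimum learning rate of 0 is an assumption, since the source gives only the initial value of 0.01 and the 100-round budget.

```python
import math


def cosine_lr(epoch, total_epochs=100, lr_max=0.01, lr_min=0.0):
    """Cosine-annealed learning rate for a given epoch (lr_min=0 is assumed)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))


def best_epoch(map_per_epoch):
    """Index of the epoch with the highest validation mAP@0.5."""
    return max(range(len(map_per_epoch)), key=lambda i: map_per_epoch[i])


schedule = [cosine_lr(e) for e in range(101)]
```

The rate starts at 0.01, passes through 0.005 at the midpoint, and decays to the minimum by round 100; the weights saved at `best_epoch(...)` are the ones retained.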
[0027] The common defect lightweight model training submodule uses the YOLOv8 detection model as the teacher model and trains the YOLOv8-nano lightweight model through knowledge distillation.
[0028] The rare defect enhancement model training submodule transmits the overall features and basic detection logic of hardware tools learned during the training of the YOLOv8 detection model to the YOLOv8-S model through knowledge distillation and other methods. The CBAM attention module is added to the YOLOv8-S detection model to enhance the feature extraction of rare defect regions. At the same time, the feature distribution of rare defect image samples generated by GAN is fused into the bottleneck layer of the YOLOv8-S detection model network to improve the feature learning ability of the YOLOv8 detection model for rare defects and enhance the detection effect of rare defects. The improved YOLOv8-S enhancement model is obtained through training.
[0029] Furthermore, the online real-time detection module includes a data acquisition and startup submodule, a sample preprocessing submodule 2, a preliminary defect detection submodule, and a rare defect depth detection submodule connected in sequence.
[0030] The data acquisition and startup submodule receives signals from the photoelectric sensor, determines that the hardware tool has entered the detection area on the conveyor belt, and sends a command to the data acquisition module to acquire images of the hardware tool in real time.
[0031] The sample preprocessing submodule 2 performs real-time preprocessing on the acquired images;
[0032] The preliminary defect detection submodule performs common defect detection on the preprocessed image using the YOLOv8-nano lightweight model and outputs the results. The confidence level is used to achieve efficient preliminary judgment, which is used to divert the subsequent in-depth detection of rare defects.
[0033] The rare defect deep detection submodule performs fine detection through feature comparison and an improved YOLOv8-S enhancement model, achieving accurate identification and confidence determination of rare defects.
[0034] Furthermore, the preliminary defect detection submodule includes a model loading and initialization unit, a feature extraction and defect prediction unit, a post-processing optimization unit, and a detection result screening and sorting unit connected in sequence.
[0035] The model loading and initialization unit loads the YOLOv8-nano lightweight model optimized by knowledge distillation, inputs the preprocessed image into the lightweight model, the model input layer is adapted to the preprocessed 640×640 pixel image, and the output layer contains the prediction results of common defects.
[0036] The feature extraction and defect prediction unit performs multi-scale feature extraction on the input image, generating 8×8, 16×16, and 32×32 feature maps; it uses a feature pyramid network (FPN) to fuse features at different scales, and outputs the defect type, location coordinates, and confidence score of each candidate box through the detection head;
[0037] The post-processing optimization unit uses a non-maximum suppression algorithm to remove redundant candidate boxes, sets the NMS threshold to 0.4, and retains the highest confidence box in the same defect area; based on the pixel-to-millimeter conversion coefficient of the camera calibration, the pixel coordinates are converted into actual physical coordinates, and the actual position of the defect on the tool is output.
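The post-processing in [0037] corresponds to greedy non-maximum suppression followed by a unit conversion. The sketch below assumes boxes in (x1, y1, x2, y2) pixel format and applies suppression without regard to class; the 0.4 threshold comes from the text, while the 0.1 mm-per-pixel factor in the usage lines is purely illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.4):
    """Greedy NMS over (box, confidence) pairs; keeps the highest-confidence box per region."""
    kept = []
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, conf))
    return kept


def to_millimetres(box, mm_per_pixel):
    """Convert pixel coordinates to physical coordinates using the calibration factor."""
    return tuple(v * mm_per_pixel for v in box)


detections = [
    ((10, 10, 50, 50), 0.90),
    ((12, 12, 52, 52), 0.80),      # overlaps the first box heavily
    ((100, 100, 140, 140), 0.75),
]
kept = nms(detections)
physical = [to_millimetres(box, 0.1) for box, _ in kept]  # 0.1 mm/pixel is illustrative
```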
[0038] The detection result screening and triage unit sets the confidence threshold for common defects to 0.7, and judges each defect result output by the model: if the confidence is ≥0.7, it is judged as a valid common defect, the defect type, actual location and confidence are recorded, and the tool is marked as "common defect to be processed"; if the confidence of all defects is <0.7, it is judged as "suspected to have no common defects", and the image and related information of the tool are sent to the rare defect depth detection submodule. The related information includes the tool ID and acquisition time.
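The routing rule in [0038] reduces to a single threshold check. In this sketch each detection is a hypothetical (defect_type, confidence) pair, and an image with no detections at all is treated the same as one whose detections all fall below 0.7, which is an interpretation of the text rather than something it states explicitly.

```python
COMMON_CONF_THRESHOLD = 0.7  # confidence threshold for common defects from the description


def triage(detections):
    """Route one tool image based on its common-defect detections.

    Returns the routing label and the detections that passed the threshold.
    """
    valid = [d for d in detections if d[1] >= COMMON_CONF_THRESHOLD]
    if valid:
        return "common defect to be processed", valid
    # Nothing reached the threshold: forward to the rare defect depth detection submodule.
    return "suspected to have no common defects", []


label_a, hits_a = triage([("scratch", 0.91), ("dent", 0.42)])
label_b, hits_b = triage([("scratch", 0.55)])
```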
[0039] Furthermore, the rare defect deep detection submodule includes a feature extraction unit, a rare defect feature library construction unit, and a comparison and judgment unit connected in sequence;
[0040] The feature extraction unit uses a pre-trained ResNet-50 network as a feature extractor to encode features of suspected samples after preliminary defect detection and output a 256-dimensional high-dimensional feature vector.
[0041] The rare defect feature library construction unit retrieves 256-dimensional feature vectors of historical rare defect samples from the sample preprocessing submodule to construct the rare defect feature library, which is stored according to defect type, with each category containing at least 50 sample features; a KD tree index structure is used to optimize retrieval efficiency, supporting ≥100 feature comparisons per second.
[0042] The comparison and judgment unit calculates the Euclidean distance between the feature vector of the suspected sample and the same type of feature in the feature library. If the minimum distance is less than 0.6, it is judged as "highly suspected rare defect" and triggers subsequent enhanced model detection; otherwise, it is directly marked as "low suspected sample" and enters the manual review stage.
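The comparison step can be sketched with a brute-force nearest-neighbour search. The description calls for a KD tree index to speed up retrieval; the linear scan below is a deliberately simpler substitute that returns the same minimum distance. The feature library layout (a dictionary keyed by defect type) and the toy 2-dimensional vectors are assumptions for illustration.

```python
import math

DISTANCE_THRESHOLD = 0.6  # Euclidean distance threshold from the description


def nearest_rare_feature(query, feature_library):
    """Return (defect_type, distance) of the closest stored feature vector."""
    best_type, best_dist = None, math.inf
    for defect_type, vectors in feature_library.items():
        for vec in vectors:
            d = math.dist(query, vec)
            if d < best_dist:
                best_type, best_dist = defect_type, d
    return best_type, best_dist


def judge(query, feature_library):
    """Label a suspected sample by its minimum distance to the rare defect library."""
    defect_type, distance = nearest_rare_feature(query, feature_library)
    if distance < DISTANCE_THRESHOLD:
        return "highly suspected rare defect", defect_type, distance
    return "low suspected sample", None, distance


# Toy 2-D vectors stand in for the 256-dimensional features.
library = {"microcrack": [[0.0, 0.0], [0.1, 0.0]], "pinhole": [[1.0, 1.0]]}
near = judge([0.1, 0.1], library)
far = judge([3.0, 3.0], library)
```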
[0043] Furthermore, the rare defect depth detection submodule also includes a model architecture configuration unit, a detection parameter setting unit, and a multi-dimensional result output unit connected in sequence;
[0044] The model architecture configuration unit adopts the improved YOLOv8-S enhanced model, adding two cross-scale attention modules in the neck to enhance feature capture of small and rare defects. The backbone network of the improved YOLOv8-S enhanced model adopts CSPDarknet-53, outputting feature maps at three scales (80×80, 40×40, 20×20) to adapt to rare defects of different sizes.
[0045] The detection parameter setting unit applies Mosaic data augmentation to the input images of the aforementioned "highly suspected rare defects" and enables random rotation and contrast adjustment to improve the robustness of the improved YOLOv8-S enhanced model under complex conditions. The AdamW optimizer, with a learning rate of 5e-5 and a weight decay of 1e-4, is used to train the improved YOLOv8-S enhanced model with attention modules; only the parameters of the newly added attention modules are updated, and the batch size is set to 8 during training.
[0046] The multidimensional result output unit includes defect type, confidence level, and risk level. The risk level is determined based on defect size and location. Risk level = 0.6 × defect area percentage + 0.4 × key area weight. Based on the calculation results, it is set to include Level I (high risk), Level II (medium risk), and Level III (low risk).
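The risk formula in [0046] can be written out directly. The sketch assumes both inputs are expressed as fractions in [0, 1]. The source does not give the score cutoffs that separate Levels I, II, and III, so the values 0.7 and 0.4 below are hypothetical placeholders only.

```python
def risk_score(area_fraction, key_area_weight):
    """Risk score = 0.6 x defect area percentage + 0.4 x key area weight."""
    return 0.6 * area_fraction + 0.4 * key_area_weight


def risk_level(score, high_cutoff=0.7, medium_cutoff=0.4):
    """Map a score to Level I/II/III. Both cutoffs are hypothetical, not from the source."""
    if score >= high_cutoff:
        return "I"    # high risk
    if score >= medium_cutoff:
        return "II"   # medium risk
    return "III"      # low risk


# A defect covering 50% of the region, located in a fully critical area (weight 1.0).
score = risk_score(0.5, 1.0)
```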
[0047] Furthermore, the rare defect deep detection submodule also includes a result determination and classification unit. The result determination and classification unit uses 0.65 as the confidence threshold for rare defect determination. If the model outputs a confidence level ≥ 0.65 and the risk level is Level I / II, it is marked as "confirmed rare defect" and the defect type, location coordinates, confidence level and risk level are recorded and sent to the system database.
[0048] If the confidence level is ≥0.65 but the risk level is III, or the confidence level is ∈ [0.5, 0.65), it is marked as "rare defect to be reviewed" and the manual review process is triggered.
[0049] If the confidence level is <0.5 and the feature comparison Euclidean distance is ≥0.6, it is judged as a "normal sample" and a "no defect" inspection report is automatically generated. The report includes the sample ID, inspection time and key parameters, including the feature vector hash value.
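The three rules in [0047] to [0049] can be collected into one decision function, sketched below. The text does not say what happens when the confidence is below 0.5 but the Euclidean distance is under 0.6; sending that case to manual review is an assumption made here so that the function always returns a label.

```python
def classify_rare_defect(confidence, risk_level, distance):
    """Apply the confidence, risk level, and distance rules to one sample."""
    if confidence >= 0.65:
        if risk_level in ("I", "II"):
            return "confirmed rare defect"
        return "rare defect to be reviewed"   # confident but Level III
    if confidence >= 0.5:
        return "rare defect to be reviewed"   # confidence in [0.5, 0.65)
    if distance >= 0.6:
        return "normal sample"
    # Case not covered by the source: treated as manual review (assumption).
    return "rare defect to be reviewed"


confirmed = classify_rare_defect(0.80, "I", 0.3)
review = classify_rare_defect(0.80, "III", 0.3)
normal = classify_rare_defect(0.20, "III", 0.9)
```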
[0050] All judgment results are stored in XML format, including image path, defect parameters, and judgment labels. They are synchronized to the production line MES system via API interface, supporting real-time display and historical traceability. For samples with "confirmed rare defects", an audible and visual alarm signal is triggered.
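A judgment record of the kind described in [0050] could be serialized with Python's standard XML library as sketched here. The element names, the example image path, and the field formatting are all hypothetical; the source specifies only that the record is XML and contains the image path, defect parameters, and judgment label.

```python
import xml.etree.ElementTree as ET


def result_to_xml(image_path, defect_type, confidence, risk_level, label):
    """Serialize one judgment result; all element names are illustrative."""
    root = ET.Element("detection_result")
    ET.SubElement(root, "image_path").text = image_path
    params = ET.SubElement(root, "defect_parameters")
    ET.SubElement(params, "type").text = defect_type
    ET.SubElement(params, "confidence").text = f"{confidence:.2f}"
    ET.SubElement(params, "risk_level").text = risk_level
    ET.SubElement(root, "judgment_label").text = label
    return ET.tostring(root, encoding="unicode")


xml_text = result_to_xml(
    "images/tool_0001.png",  # hypothetical path
    "microcrack", 0.82, "I", "confirmed rare defect",
)
parsed = ET.fromstring(xml_text)
```

Parsing the string back, as in the last line, is a quick way to confirm the record is well-formed before it is sent on to the MES system.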
[0051] Compared with the prior art, the present invention has at least one of the following technical effects:
[0052] 1. The database construction and dynamic detection module of this invention stores a large number of images of normal and defective hardware tools, and performs feature vector extraction and clustering operations on rare defect samples of hardware tools to provide high-quality training data for the detection model. Simultaneously, it generates simulated rare defect images through a GAN model, proportionally expanding the rare defect sample library. Newly discovered defect samples are added to the database monthly, triggering incremental training of the GAN model to continuously improve the model's defect recognition ability. This system solves the performance bottleneck problem of the model caused by the scarcity of rare defect samples in hardware tools.
[0053] 2. The online real-time detection module, through image acquisition and preprocessing, defect layer detection, and result fusion decision-making, can determine the defect situation in real time during the hardware tool production process. For common defects, a lightweight model achieves efficient initial judgment with an inference speed of ≥50fps; for rare defects, feature comparison and enhanced model refinement detection achieve accurate identification and confidence level determination, and graded response is performed according to the defect risk level, meeting the real-time and reliability requirements of online detection on the production line.
[0054] 3. By training with a mixed sample set to build basic detection capabilities, and combining model fusion and attention mechanism optimization, accurate identification of both common and rare defects can be achieved. In the detection of common defects, such as surface scratches and dimensional deviations, the detection accuracy can reach over 95%. For rare defect detection, YOLOv8-S, by incorporating the CBAM attention module and fusing GAN to generate simulated rare defect image sample features, improves the recall rate of rare defects by over 30% compared to traditional methods, effectively reducing the false negative rate.
[0055] 4. By dynamically updating the database and incrementally training the GAN to adapt to new defect patterns, and combining incremental model training, parameter fine-tuning and modular design to achieve on-demand optimization, the system can adapt to production process improvements and new defect types at low cost without full retraining, and achieve continuous evolution of detection capabilities.
[0056] 5. The anomaly handling and system maintenance module ensures the continuous and stable operation of the detection system through hardware anomaly self-checks, software fault switching, and regular calibration. Simultaneously, monthly reviews of the detection data are conducted. When the false detection rate for a certain type of defect exceeds 5%, incremental model training is initiated, incorporating new samples for 30 iterations, and updating the feature library of the GAN-generated model. This allows the system to continuously optimize based on actual production conditions, maintaining detection accuracy and continuity in industrial scenarios.
[0057] 6. Reducing manual inspection steps lowers labor costs and the costs associated with missed or false inspections due to human error. An efficient inspection system can promptly identify defective products, preventing them from entering subsequent production stages, reducing raw material waste and rework costs, and ultimately improving the company's overall production efficiency and economic benefits.
Attached Figure Description
[0058] To more clearly illustrate the technical solutions in the embodiments of this application, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0059] Figure 1 is a module architecture diagram of the online defect detection system for hardware tools based on AI image recognition, as described in this invention.
[0060] Figure 2 is a module architecture diagram of the data acquisition module of the present invention;
[0061] Figure 3 is a module architecture diagram of the database construction and dynamic detection module of the present invention;
[0062] Figure 4 is a module architecture diagram of the detection model training and optimization module of the present invention;
[0063] Figure 5 is a module architecture diagram of the online real-time detection module of the present invention;
[0064] Figure 6 is a module architecture diagram of the preliminary defect detection submodule of the present invention;
[0065] Figure 7 is a module architecture diagram of the rare defect depth detection submodule of the present invention.
Detailed Implementation
[0066] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of this application. However, those skilled in the art will understand that this application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this application with unnecessary detail.
[0067] It should be understood that, when used in this application specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and / or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and / or a collection thereof.
[0068] It should also be understood that the term “and / or” as used in this application specification and the appended claims means any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.
[0069] As used in this application specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when," "once," "in response to determination," or "in response to detection." Similarly, the phrase "if determined" or "if detected [the described condition or event]" may be interpreted, depending on the context, as meaning "once determined," "in response to determination," "once detected [the described condition or event]," or "in response to detection [the described condition or event]."
[0070] Furthermore, in the description of this application and the appended claims, the terms "first," "second," "third," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0071] References to "one embodiment" or "some embodiments" as described in this specification mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless otherwise specifically emphasized.
[0072] The AI image recognition-based online defect detection system for hardware tools includes a data acquisition module, a database construction and dynamic detection module, a detection model training and optimization module, an online real-time detection module, a quality traceability and model iteration module, and an anomaly handling and system maintenance module, which are connected in sequence.
[0073] The data acquisition module is used to acquire images of hardware tools in real time. The data acquisition module includes a visible light industrial camera submodule, which controls the visible light industrial camera to acquire external images of hardware tools in real time; an X-ray industrial camera submodule, which controls the X-ray industrial camera to acquire internal images of hardware tools in real time. Due to the difference in X-ray penetration, the crack area inside the hardware tool presents a "continuous / discontinuous dark line". When X-rays penetrate metal, the density at the defect is low and the X-ray attenuation is small, so it appears as a dark area in the image; and an infrared thermal imaging industrial camera submodule, which controls the infrared thermal imaging industrial camera to acquire infrared thermal imaging images of the inside of the hardware tool in real time. Areas with insufficient quenching have slow heat dissipation and appear as "local high-temperature bright areas" in the image. Differences in the internal structure of the metal lead to different thermal conductivity, and thermal imaging can capture temperature differences. Specifically, the designated inspection area is a 6m long × 1.5m wide × 2.5m high enclosed space designed to prevent light interference. A conveyor belt runs through the center of the area, and its surface is covered with a matte black anti-slip mat to reduce glare. The conveyor belt supports stepless adjustment, with a preset speed of 10-20m / min. Positioning baffles made of anodized aluminum with a thickness of 5mm are installed on both sides of the conveyor belt. The baffle spacing is set according to the maximum width of the tool + 20mm (e.g., for wrenches, the maximum width is 150mm, so the baffle spacing is set to 170mm) to ensure that the center line of the tool coincides with the center line of the camera's field of view during tool transmission. 
A visible light industrial camera inspection area, an X-ray industrial camera inspection area, and an infrared thermal imaging industrial camera inspection area are arranged in sequence along the length of the conveyor belt. In each inspection area, for example, two industrial cameras can be symmetrically arranged along the conveyor belt axis (800mm apart). The optical axis of the camera is at a 45° angle to the plane of the conveyor belt, with a vertical distance of 1.2m (calibrated by a laser rangefinder). The camera lens is a Computar M1214-MP2 (focal length 12mm, aperture F1.4). The focal length is adjusted so that the tool occupies 60%-80% of the field of view (e.g., a 150mm long wrench occupies 1400-1800 pixels after imaging). Camera calibration is performed as follows: using a checkerboard calibration board for Zhang Zhengyou's method (12×9 grids, grid spacing 20mm), 20 sets of calibration images from different angles are acquired. The intrinsic parameter matrix (focal length, principal point coordinates) and distortion coefficients (radial distortion k1-k3, tangential distortion p1-p2) are calculated using OpenCV to generate a distortion correction mapping table. Each camera is equipped with one LED ring light source (model CCSRL-100SW, diameter 100mm, 120 LED beads). The light source is coaxially mounted with the camera lens (fixed by an adapter ring). The light source brightness is adjustable in 10 levels (default setting is level 7). For light source color temperature calibration, a color temperature meter (accuracy ±100K) is used to measure the color temperature, which is adjusted to 5500K±500K through the light source controller to ensure consistent image color across different batches of tools (ΔE≤3).
Two strip lights (length 1m, color temperature 5500K) are installed on both sides of the detection area at a 30° angle to the plane of the conveyor belt to eliminate shadows on the tool surface (such as the groove of the wrench). The edge computing unit uses an NVIDIA Jetson AGX Orin (64GB VRAM version) and connects to the camera via Gigabit Ethernet (with a fixed IP address: 192.168.1.100). It connects to a photoelectric sensor (model Omron E3Z-LS63) via a USB 3.0 interface. The photoelectric sensor is installed 300mm in front of the camera (along the conveyor belt direction), with a detection distance of 50-300mm. The trigger signal is transmitted to the computing unit via a GPIO interface, with a response time ≤1ms. A three-color warning light (red / yellow / green) and a buzzer (adjustable volume, maximum 90dB) are installed at the exit of the detection area and communicate with the computing unit via an RS485 bus. After the photoelectric sensor detects a metal product entering the detection area, it sends a signal to the corresponding industrial camera submodule, which then controls the industrial camera to take a picture. It should be noted that the scene arrangement and the data acquisition and processing methods described above for industrial camera image acquisition should not be considered the sole limitation of this invention. Any arrangement that facilitates the acquisition of clear and comprehensive images is within the scope of protection of this invention.
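The layout figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming a constant belt speed and assuming that the 300mm sensor-to-camera offset is covered at that speed; the function names are hypothetical and are not part of the described system.

```python
def baffle_spacing_mm(max_tool_width_mm, clearance_mm=20.0):
    """Baffle spacing = maximum tool width + clearance (20 mm in the text)."""
    return max_tool_width_mm + clearance_mm


def travel_time_s(distance_mm, belt_speed_m_per_min):
    """Seconds a tool needs to cover distance_mm at a constant belt speed."""
    if belt_speed_m_per_min <= 0:
        raise ValueError("belt speed must be positive")
    speed_mm_per_s = belt_speed_m_per_min * 1000.0 / 60.0
    return distance_mm / speed_mm_per_s


# Wrench example from the text: 150 mm maximum width.
print(baffle_spacing_mm(150.0))

# Time between the sensor trigger and the tool reaching the camera
# (assumption: 300 mm offset, preset speed range of 10-20 m/min).
for speed in (10.0, 20.0):
    print(speed, round(travel_time_s(300.0, speed), 2))
```

Under these assumptions the tool reaches the camera roughly 0.9 to 1.8 seconds after the trigger, far longer than the stated trigger response time of ≤1ms.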
[0074] The database construction and dynamic detection module stores and labels normal samples, common defect samples, and rare defect samples. It standardizes the samples, augments the common defect samples, extracts features from the rare defect samples, and calculates feature centers by clustering. It generates and filters simulated rare defect image samples through a GAN to expand the rare defect sample library. The samples are updated monthly, which triggers incremental GAN training. Specifically, the database construction and dynamic detection module includes an initial sample storage submodule, a sample preprocessing submodule, and a dynamic sample expansion submodule connected in sequence.
[0075] The initial sample storage submodule stores ≥80,000 frames of normal hardware tool images without defects; it stores samples of common defects that occur during the production process, with ≥5,000 frames for each type of defect. The Labelme tool is used to annotate the common defect samples at the pixel level. Common defects include surface scratches, dimensional deviations, and missing parts; it stores samples of rare defects, with ≥50 frames for each type of defect. An independent rare defect sub-library is established. Rare defect samples are obtained through X-ray detection, infrared thermal imaging, and other means, including internal micro-cracks and areas with insufficient quenching. The above-mentioned normal hardware tool images and common defect sample images are historically acquired images that can be captured by the aforementioned industrial camera.
[0076] The sample preprocessing submodule performs standardization processing on all samples, including but not limited to: normalizing image pixel values to the [0, 1] interval, and using bicubic interpolation to unify the spatial resolution size of all sample images; performing conventional data augmentation on common defect samples, including random flipping ±15° and brightness adjustment ±20%; and extracting feature vectors for rare defect samples: extracting 256-dimensional features of rare defect samples through a ResNet-50 network, using the DBSCAN algorithm to cluster the 256-dimensional feature vectors, and calculating the feature center of each type of defect. The core challenge of rare defects is the extremely small sample size; for example, a certain type of defect may only have a few dozen or even just a few samples. When directly used to train a model, the model struggles to learn the complete feature distribution of that type of defect, leading to overfitting or insensitivity to variant samples. Using ResNet-50 to extract 256-dimensional feature vectors can transform high-dimensional image information into low-dimensional, structured feature representations, such as quantitative descriptions of key features like texture, shape, and edges. DBSCAN clustering, a density-based clustering algorithm, is suitable for handling non-convex distributions and does not require a pre-defined number of classes. It automatically groups samples with similar features into one class, uncovering the "common features" of rare defects of the same type, and calculating the feature center of each defect class—that is, the mean or median of the feature vectors of each sample after clustering. This is equivalent to creating a "digital fingerprint" for that type of rare defect. The calculation process is as follows:
[0077] 1) After DBSCAN clustering, multiple "clusters" are obtained, each cluster corresponding to a type of defect. The 256-dimensional feature vectors of all samples in each cluster are extracted to form the set of feature vectors for that type of defect. For example, if there are 10 samples of a certain type of defect, the set consists of ten 256-dimensional vectors.
[0078] 2) Calculate the average value for each dimension (256 dimensions in total) over all vectors in the set. Assume the feature values of the i-th dimension are $x_{1i}, x_{2i}, \ldots, x_{ni}$ (where n is the number of samples in this class); then the mean of the i-th dimension is $\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ji}$;
[0079] 3) Combine the 256-dimensional mean values in order to form a new 256-dimensional vector, which is the feature center of this type of defect and represents the "average position" of this type of defect in the feature space.
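The three steps above can be sketched in a few lines. This is only an illustration: it assumes the cluster labels have already been produced by a clustering step, and it assumes the common convention that the label -1 marks noise points that belong to no cluster. The function names are hypothetical.

```python
from statistics import fmean


def feature_center(vectors):
    """Per-dimension mean of equally sized feature vectors (steps 2 and 3)."""
    if not vectors:
        raise ValueError("need at least one feature vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all feature vectors must have the same length")
    return [fmean([v[i] for v in vectors]) for i in range(dim)]


def feature_centers(features, labels, noise_label=-1):
    """Group vectors by cluster label (step 1), then average each cluster."""
    clusters = {}
    for vec, label in zip(features, labels):
        if label == noise_label:  # assumption: -1 marks unclustered samples
            continue
        clusters.setdefault(label, []).append(vec)
    return {label: feature_center(vs) for label, vs in clusters.items()}


# Toy 2-dimensional example; the text uses 256-dimensional vectors.
print(feature_centers([[0.0, 0.0], [2.0, 2.0], [9.0, 9.0]], [0, 0, -1]))
```

Using the median instead of the mean, which the text also allows, would only change the aggregation call inside `feature_center`.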
[0080] The dynamic sample expansion submodule includes a GAN model training unit, a sample generation and screening unit, and a database update unit connected in sequence. The GAN model training unit inputs at least 50 frames of rare defect samples into the generator G, combines the feature centers of each type of defect with a 128-dimensional random noise vector z, and generates simulated rare defect images through a neural network. The random noise vector z follows an N(0,1) distribution. The main technical steps of the GAN model in generating simulated rare defect images are as follows:
[0081] 1) Data preparation: Collect at least 50 real rare defect samples and preprocess them into a uniform format (such as size and resolution standardization); retrieve the feature centers of each type of defect calculated by the sample preprocessing submodule as the core reference features of the generator;
[0082] 2) Noise generation: A 128-dimensional random noise vector z following an N(0,1) distribution is generated based on a random number generation algorithm combined with a normal distribution transformation method, providing a source of randomness for image generation;
[0083] 3) Feature fusion and image generation: The generator G receives feature information of real rare defect samples, feature centers of each type of defect and random noise z. It performs feature mapping and reconstruction through multi-layer neural networks (such as convolution and deconvolution layers), fuses the common features of feature centers with the randomness of noise, transforms them into high-dimensional image data, and generates simulated images containing the core features of rare defects.
[0084] 4) Adversarial training optimization: The discriminator D distinguishes the common features of the generated image, the real sample, and the feature center representation of each type of defect. The generator adjusts its parameters based on the discriminator feedback. Through multiple rounds of adversarial iteration, the generated image retains the core features of the feature center while also possessing diversity and realism, so that the generated image gradually approaches the visual features of the real defect.
[0085] 5) Output simulated images: After sufficient training, the generator can stably output simulated images that fit the common features (such as shape and texture) of the feature center and have diversity, which can be used for subsequent sample selection and library expansion.
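Steps 2) and 3) can be illustrated by showing how a generator input might be assembled. This is a minimal sketch, not the generator itself: it assumes the feature center and the noise vector are simply concatenated (the text does not state how they are combined), and it leaves out the real-sample feature information that the generator also receives. The function names are hypothetical.

```python
import random

NOISE_DIM = 128  # dimension of z stated in the text


def sample_noise(dim=NOISE_DIM, rng=None):
    """Draw a noise vector z whose entries are i.i.d. N(0, 1)."""
    rng = rng or random.Random()
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def generator_input(center, rng=None):
    """Concatenate a defect feature center with fresh noise.

    Assumption: concatenation is one common conditioning scheme and is
    used here only for illustration.
    """
    return list(center) + sample_noise(rng=rng)


rng = random.Random(0)
center = [0.0] * 256  # placeholder for a 256-dimensional feature center
print(len(generator_input(center, rng=rng)))
```

Each call draws a new z, so the same feature center yields a different input vector every time, which is the source of the diversity among the generated images.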
[0086] When generating simulated rare defect images using GANs, a 128-dimensional random noise vector z following an N(0,1) distribution is introduced. The core purpose is to inject controllable randomness into the generation process, achieving "diversified expansion" of rare defect samples, as detailed below:
[0087] First, it breaks through the limitations of real samples in terms of form: there are very few real rare defect samples, for example, only about 50 frames, and the form is uniform. For example, a type of defect may only present a few fixed angles or textures. Different values of random noise z will guide the generator to produce subtle differences, such as defect position shift, edge blurring change, local texture variation, etc., so that the generated simulated image covers a wider form space and avoids "overfitting to a limited number of real samples" when the model learns later.
[0088] Secondly, it ensures the "controllable variation" of the generated samples: the noise vector of the N(0,1) distribution has continuity and randomness, which can not only allow the generated image to retain the core attributes of the defect on the basis of the real defect features and undergo "moderate variation", but also prevent the generated result from deviating from the real defect features due to excessive noise, thus constraining the generator's feature learning of the real samples.
[0089] The sample generation and screening unit calculates the structural similarity index (SSIM) between each simulated rare defect image and the rare defect samples, retains samples with SSIM ≥ 0.85, and expands the rare defect sample library at a ratio of 1:5 between real rare defect samples and simulated rare defect images. The structural similarity index is an indicator that measures the degree of structural similarity between two images. It outputs a score between 0 and 1 (1 represents complete similarity) by comparing three dimensions: brightness (pixel mean), contrast (pixel variance), and structural correlation (pixel covariance). The detailed process of expanding the rare defect sample library is as follows:
[0090] Step 1) First, calculate the SSIM (Structural Similarity Index) between a single simulated rare defect image and a real sample. For each simulated rare defect image generated by the GAN, calculate the structural similarity index with the original input real rare defect samples (at least 50 frames). SSIM compares the two images in three dimensions: brightness (pixel mean), contrast (pixel variance), and structural correlation (pixel covariance), outputting a score between 0 and 1 (1 represents complete similarity). The calculation formula is as follows:
[0091] $\mathrm{SSIM}(x,y)=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1}\cdot\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2}\cdot\frac{\sigma_{xy}}{\sigma_x\sigma_y}$,
[0092] Where x is the real sample, y is the generated sample, $\mu_x$ and $\mu_y$ are the pixel means, $\sigma_x^2$ and $\sigma_y^2$ are the pixel variances, and $\sigma_{xy}$ is the pixel covariance, which reflects whether the changing trends of pixel values at corresponding positions in the two images are consistent. $C_1$ and $C_2$ are stability constants. The first factor indicates brightness similarity, reflecting whether the brightness information of the two images is consistent. The second factor indicates contrast similarity, reflecting whether the contrast information of the two images is consistent. The third factor is the structural similarity, which measures whether the structural information of the two images is consistent; its denominator $\sigma_x\sigma_y$ is the "normalization factor" for the covariance;
[0093] Step 2) Filter the generated samples that meet the threshold, retain the generated images with SSIM≥0.85, and remove the distorted samples with low similarity. Retaining the generated images with SSIM≥0.85 means that the generated samples are highly similar to the real samples in terms of structure, brightness and contrast, thus ensuring the authenticity of the defect features.
[0094] Step 3) Expand the sample library proportionally. Assuming there are N frames of original rare defect samples (N≥50), select 5×N frames from the filtered valid generated samples according to the ratio of "real sample: generated sample = 1:5" and merge them with the original samples to form the expanded rare defect sample library (total sample size is 6×N).
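Steps 1) to 3) can be sketched as follows. Several simplifications are assumed: images are flat lists of pixel values in [0, 1]; SSIM is computed once over the whole image rather than over sliding windows; the widely used two-constant form of SSIM is used (it avoids division by zero on flat images); and a generated image is scored by its best match among the real samples, since the text does not say how the per-image scores are aggregated. The function names are hypothetical.

```python
from statistics import fmean

# Customary stability constants for a dynamic range of 1.0.
C1 = (0.01 * 1.0) ** 2
C2 = (0.03 * 1.0) ** 2


def global_ssim(x, y):
    """Single-window SSIM between two equally sized flat images."""
    if not x or len(x) != len(y):
        raise ValueError("images must be non-empty and equally sized")
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx * mx + my * my + C1) * (vx + vy + C2)
    )


def expand_library(real, generated, threshold=0.85, ratio=5):
    """Keep generated images with SSIM >= threshold, then add up to
    ratio * len(real) of them to the real samples (real:generated = 1:ratio)."""
    kept = [
        g for g in generated
        if max(global_ssim(r, g) for r in real) >= threshold
    ]
    return list(real) + kept[: ratio * len(real)]


real = [[0.1, 0.5, 0.9, 0.3]]
generated = [[0.1, 0.5, 0.9, 0.3]] * 7 + [[0.9, 0.5, 0.1, 0.7]]
print(len(expand_library(real, generated)))
```

If fewer than 5×N generated images pass the threshold, this sketch simply adds all of them; a production system would more likely request further samples from the generator.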
[0095] The database update unit adds newly discovered defect samples to the database monthly, triggering incremental training of the GAN model for at least 50 iterations. The defect samples include both common and rare defect samples. By constructing a closed loop of "data update - model iteration - capability improvement," and dynamically optimizing the database and model, the GAN maintains high adaptability in defect identification and generation tasks. Regularly updating the defect sample database and driving incremental training of the GAN model are crucial for continuously improving the model's ability to identify, generate, and generalize defects. Especially for rare defects, this addresses the performance bottleneck caused by the scarcity of rare defect samples in existing technologies, providing more reliable technical support for subsequent defect detection, analysis, and other applications.
[0096] The detection model training and optimization module mixes the above samples to train the YOLOv8 model (corresponding to the object detection model). This model is used as the teacher model, employing a specific learning rate mechanism and retaining the optimal weights. The YOLOv8-nano lightweight model (corresponding to the lightweight object detection model) is trained through knowledge distillation to achieve efficient detection of common defects. At the same time, an improved YOLOv8-S enhancement model (corresponding to the improved object detection enhancement model) is trained based on the YOLOv8 model to enhance the detection effect of rare defects. This invention uses the YOLOv8 model as an example for illustration. Other object detection models with the same function, such as Faster R-CNN, SSD, RetinaNet, EfficientDet, DETR, etc., and their corresponding lightweight models such as Faster R-CNN-Lite, SSD-MobileNet, RetinaNet-MobileNet, EfficientDet-Lite, DETR-Lite, and their corresponding improved enhancement models such as Faster R-CNN + attention mechanism, SSD + feature pyramid optimization, RetinaNet + hard example mining, EfficientDet + larger backbone, DETR + hybrid attention, are also acceptable as long as they can achieve the functions of this invention. They are not intended to be the sole limitation of the technical solution of this invention.
[0097] The detection model training and optimization module constructs a hybrid training set and trains a YOLOv8 model to identify common and rare defects. It optimizes metrics to ensure the model possesses initial detection accuracy and generalization performance. This module includes a hybrid training set construction submodule and a YOLOv8 detection model training submodule. The hybrid training set construction submodule mixes normal and defect samples in a 7:2:1 ratio, dividing them into a 70% training set, a 20% validation set, and a 10% test set. The defect samples include common defect samples, rare defect samples, and simulated rare defect images. The YOLOv8 detection model training submodule uses an initial learning rate of 0.01 and employs a cosine annealing learning rate decay mechanism to train the YOLOv8 detection model. The total training rounds are set to 100 rounds, with mAP@0.5 calculated in each round. The model weights with the highest mAP in the validation set are retained. The detailed steps for training the YOLOv8 detection model are as follows:
[0098] 1) Model initialization and parameter settings: Load the weights of the pre-trained YOLOv8 model as initial parameters (or randomly initialize network parameters), set the initial learning rate to 0.01 to initiate model parameter updates, and configure the cosine annealing learning rate decay mechanism: the learning rate gradually decays with each training epoch according to a cosine function, as given by the following formula:
[0099] $\eta_t = \eta_{min} + \frac{1}{2}\left(\eta_{max} - \eta_{min}\right)\left(1 + \cos\left(\frac{t\pi}{T}\right)\right)$,
[0100] Among them, $\eta_t$ is the learning rate in round t. $\eta_{max}$ is the initial maximum learning rate, used during the early stages of training to ensure rapid parameter updates and accelerate convergence; for example, in YOLOv8 it is set to 0.01. $\eta_{min}$ is the minimum learning rate (usually close to 0), the lower bound for learning rate decay, which prevents the model from stopping optimization due to an excessively small learning rate later on. t is the current training epoch, ranging from 0 to T, and is the time variable controlling the learning rate change. T=100 is the total number of training epochs, which determines the length of the learning rate decay period and ensures slow and precise parameter optimization later on. The cosine function term is the core control factor for learning rate decay. When t=0, cos(0)=1, and the learning rate is $\eta_{max}$. When t=T, cos(π)=−1, and the learning rate drops to $\eta_{min}$. In between, the learning rate decreases smoothly along a cosine curve, so that parameter updates are "fast at first and slow later", which takes into account both rapid convergence in the early stage and fine optimization in the later stage.
[0101] 2) Training Data Preparation and Input: Prepare training, validation, and test sets containing defect samples (including real common defect samples, real rare defect samples, and simulated rare defect image samples), and label the defect categories and bounding boxes. Format the data according to YOLOv8 requirements, such as generating .txt annotation files and dividing the training / validation sets. During each training round, the model randomly reads batches of images from the training set, performs data augmentation such as scaling, flipping, and color gamut transformation, and then inputs them into the network to calculate the loss between the predicted boxes and the ground truth boxes, including classification loss, localization loss, and confidence loss.
[0102] 3) Training process and metric monitoring: The total number of training rounds is set to 100. After each round of training, the mAP@0.5 metric is calculated on the validation set, that is, the mean average precision at an IoU threshold of 0.5. This measures the detection accuracy of the model across the different defect categories. After each round, the current model weights are saved and compared against the best validation set mAP@0.5 so far. Only the weight file with the highest mAP up to the current round is retained, to avoid overfitting and ensure that the model has the best generalization ability.
[0103] 4) Model Convergence and Output: As the training epochs increase, the learning rate is dynamically adjusted using the cosine annealing mechanism: it starts high for fast convergence and decays to allow slow fine-tuning in later stages, gradually optimizing the model parameters. After 100 training epochs, the model weights with the highest mAP@0.5 on the validation set are output as the final usable YOLOv8 detection model. An initial learning rate of 0.01 ensures rapid parameter updates in the early stages of training. The cosine annealing mechanism prevents excessive learning rates in later stages from causing oscillations, allowing the model to converge stably to a better solution. mAP@0.5 is calculated in each epoch and the optimal weights are retained to ensure the final model's detection accuracy on the validation set, especially in defect localization and classification, resulting in stronger generalization ability. The 100 training epochs ensure the model fully learns defect features while avoiding overfitting due to overtraining, balancing training efficiency and detection performance.
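The learning rate schedule and the "keep the best weights" rule from the steps above can be sketched as follows. The sketch assumes a minimum learning rate of 0 (the text only says it is usually close to 0) and represents each epoch's validation result as a plain number; the function names are hypothetical.

```python
import math


def cosine_lr(t, total=100, lr_max=0.01, lr_min=0.0):
    """Cosine annealing: lr_max at t = 0, lr_min at t = total."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / total))


def best_epoch(val_map_per_epoch):
    """Index and value of the highest validation mAP@0.5 (first one on ties)."""
    best_index, best_value = -1, float("-inf")
    for index, value in enumerate(val_map_per_epoch):
        if value > best_value:
            best_index, best_value = index, value
    return best_index, best_value


# Learning rate at the start, the midpoint, and the end of training.
for epoch in (0, 50, 100):
    print(epoch, round(cosine_lr(epoch), 6))

# Hypothetical validation scores; only the best epoch's weights are kept.
print(best_epoch([0.50, 0.70, 0.65, 0.70]))
```

In a real training loop the second function would be replaced by saving the weight file whenever the validation mAP@0.5 exceeds the best value seen so far.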
[0104] The detection model training and optimization module achieves lightweight and efficient detection of common defects through knowledge distillation, and enhances the extraction of rare defect features by combining an attention mechanism, thereby improving the model's adaptability and detection performance for different types of defects. The module also includes a sub-module for training a lightweight common defect model and a sub-module for training a rare defect enhancement model, which are connected sequentially. The lightweight common defect model training sub-module uses the YOLOv8 detection model as the teacher model and trains the YOLOv8-nano lightweight model through knowledge distillation. This ensures an inference speed ≥50fps while inheriting the teacher model's ability to detect common defects, achieving efficient detection of common defects. The detailed steps for training the YOLOv8-nano lightweight model through knowledge distillation are as follows:
[0105] 1) Preparation of teacher and student models: The mature YOLOv8 detection model is selected as the teacher model. This model has a high mAP index on ordinary defect detection tasks. The YOLOv8-nano model is initialized as the student model. Its network structure is more streamlined and the number of parameters is about 1 / 10 of that of the teacher model. All parameters of the teacher model are frozen and used only to provide knowledge output. It does not participate in training and updates.
[0106] 2) Distillation Loss Function Construction: Design a two-branch loss function, including hard label loss: the loss between the student model's prediction results and the ground truth labels (including classification loss and bounding box regression loss), and soft label loss: calculating the KL divergence loss of the class probability distributions output by the student model and the teacher model. Set the loss weight ratio, typically soft label loss accounts for 30%-50%, and hard label loss accounts for 50%-70%. The distillation loss function is as follows:
[0107] $L_{total} = \alpha \cdot L_{hard} + (1-\alpha) \cdot L_{soft}$
[0108] Where α is the weighting coefficient (usually taken as 0.5-0.7), which balances the contributions of the hard label loss and the soft label loss. $L_{hard} = L_{cls} + \lambda_{box} \cdot L_{box} + \lambda_{obj} \cdot L_{obj}$ is the formula for the hard label loss, where $L_{cls}$ is the classification loss (using cross-entropy loss), which measures the difference between the student model's prediction of the defect category and the true category; $L_{box}$ is the bounding box loss (using CIoU loss), which measures the difference in position and size between the predicted box and the ground truth box; $L_{obj}$ is the target confidence loss (using cross-entropy loss), which measures the accuracy of the student model's judgment on "whether it is a defect"; and $\lambda_{box}$, $\lambda_{obj}$ are the weighting coefficients, whose default values for YOLOv8 are 7.5 and 1.0, respectively. $L_{soft} = T^2 \cdot KL(P_t \,\|\, P_s) + \lambda_{box\text{-}soft} \cdot MSE(B_t, B_s)$ is the formula for the soft label loss. The KL divergence measures the difference in output distribution between the student model and the teacher model: $KL(P_t \,\|\, P_s) = \sum_{c=1}^{C} P_{teacher}(c) \log \frac{P_{teacher}(c)}{P_{student}(c)}$, where $P_{teacher}(c)$ is the class probability distribution output by the teacher model, softened by a temperature coefficient T (typically T=2~5), and $P_{student}(c)$ is the class probability distribution output by the student model, which has also been softened by the temperature coefficient T. Here, C is the total number of defect categories and T is the temperature coefficient. The factor $T^2$ balances the magnitude of the loss when T>1, preventing the differences in the probability distribution after softening from being diluted, which could lead to an excessively small loss value that would negatively impact training performance. $\lambda_{box\text{-}soft} \cdot MSE(B_t, B_s)$ is the bounding box prediction loss (MSE loss), where $B_t$ is the bounding box predicted by the teacher model and $B_s$ is the bounding box predicted by the student model; MSE measures the difference in coordinates between the two, and $\lambda_{box\text{-}soft}$ is a weighting factor (usually 5~10) that emphasizes the importance of bounding box mimicry;
[0109] 3) Distillation training process: First, the same common defect image is input into the teacher model and the student model. Second, the teacher model outputs a high-confidence soft label, which includes the class probability distribution and bounding box prediction. Then, the student model learns the ground truth label (hard label) and the teacher model output (soft label) at the same time. Then, the total loss is calculated using the distillation loss function, and the student model parameters are updated through backpropagation. Finally, a cosine annealing learning rate scheduling strategy is adopted, with the initial learning rate set to 0.001, and training is carried out for 50-80 rounds.
[0110] 4) Speed and accuracy balance optimization: Test the inference speed of the student model after every 10 rounds of training to ensure ≥50fps on the target hardware. If the speed does not meet the standard, use model quantization (INT8) or channel pruning for further compression. Simultaneously monitor the validation set mAP@0.5 index to ensure that the accuracy loss is controlled within 5%.
[0111] 5) Model Evaluation and Selection: After training, the model performance is comprehensively evaluated on the test set. Among the models with an inference speed ≥ 50fps, the one with the highest mAP@0.5 is selected as the final version. The optimal model weights and configuration files are saved for subsequent common defect detection tasks.
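The distillation loss from step 2) can be written out for a single prediction as a minimal sketch. It assumes the total loss takes the form α·L_hard + (1−α)·L_soft, which matches the stated weight split of 50%-70% hard and 30%-50% soft; the hard-loss components are passed in as plain numbers instead of being computed from detections, and boxes are plain coordinate lists. The function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened class probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared difference between two coordinate lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def soft_loss(t_logits, s_logits, t_box, s_box, temperature=3.0, lambda_box_soft=5.0):
    """T^2 * KL(P_t || P_s) + lambda_box_soft * MSE(B_t, B_s)."""
    p_t = softmax(t_logits, temperature)
    p_s = softmax(s_logits, temperature)
    return temperature ** 2 * kl_divergence(p_t, p_s) + lambda_box_soft * mse(t_box, s_box)


def hard_loss(l_cls, l_box, l_obj, lambda_box=7.5, lambda_obj=1.0):
    """L_cls + lambda_box * L_box + lambda_obj * L_obj."""
    return l_cls + lambda_box * l_box + lambda_obj * l_obj


def distillation_loss(l_hard, l_soft, alpha=0.6):
    """Assumed total: alpha * L_hard + (1 - alpha) * L_soft."""
    return alpha * l_hard + (1.0 - alpha) * l_soft


# Hypothetical teacher/student outputs for one defect candidate.
l_soft = soft_loss([2.0, 0.5, 0.1], [1.5, 0.7, 0.2],
                   [0.50, 0.50, 0.20, 0.10], [0.48, 0.52, 0.22, 0.10])
print(distillation_loss(hard_loss(0.3, 0.05, 0.1), l_soft))
```

When the student reproduces the teacher's logits and boxes exactly, the soft loss is zero and only the hard label term drives the update.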
[0112] The rare defect enhancement model training submodule adds a CBAM attention module to the YOLOv8 detection model to enhance feature extraction of rare defect regions. Simultaneously, it integrates the feature distribution of rare defect samples generated by GAN at the network bottleneck layer, improving the YOLOv8 detection model's feature learning ability for rare defects and enhancing its detection performance. This training yields the improved YOLOv8-S enhancement model. The training of the rare defect enhancement model (improved YOLOv8-S) specifically includes:
[0113] 1) Basic Model and Module Preparation: Based on the YOLOv8-S model, retain its original backbone, neck, and head structures as the basic architecture for improvement; integrate the CBAM (Convolutional Block Attention Module): insert the CBAM module into the output layers of each stage of the YOLOv8-S backbone network (such as after the C3 module). This module includes a channel attention branch (focusing on key feature channels through squeeze-excitation operations) and a spatial attention branch (locating salient regions through convolution operations), enhancing the model's attention to features of rare defective regions (usually small in proportion and with indistinct features);
[0114] 2) Feature distribution fusion design for simulated rare defect image samples generated by GAN: The simulated rare defect image samples generated by GAN are preprocessed (unified in size and standardized together with the real samples), and their high-level features (denoted as $F_{gan}$) are extracted through the backbone network of YOLOv8-S. These high-level features contain semantic, texture, and shape information about rare defects in the image. The mean, variance, and other distribution parameters of these features are statistically analyzed to construct a "rare defect feature distribution library." A feature distribution fusion mechanism is designed in the bottleneck layer of the YOLOv8-S network (such as the feature fusion node of PANet) and acts on the features output from the bottleneck layer for real training samples (denoted as $F_{real}$). $F_{real}$ contains general features of hardware tools (such as contours and surface textures) and known defect features, but the proportion of rare defect features is extremely low (due to the scarcity of real rare samples); when learned alone, they are easily "overwhelmed" by conventional features. $F_{real}$ is therefore dynamically fused with the feature distribution of the simulated rare defect image samples generated by GAN: through parameterized transformations (such as feature translation and scaling), the distribution of $F_{real}$ is aligned with that of $F_{gan}$. The formula can be expressed as:
[0115] $F_{fused} = \gamma \cdot F_{real} + \beta \cdot \mathcal{N}(\mu_{gan}, \sigma_{gan})$
[0116] Where $\mathcal{N}(\mu_{gan}, \sigma_{gan})$ is a random variable that follows the feature distribution of simulated rare defect image samples generated by GAN, enhancing the representation of rare defect patterns in real sample features. The simulated rare defect samples generated by GAN are processed by the backbone network to extract the features $F_{gan}$, from which $\mu_{gan}$ (feature mean) and $\sigma_{gan}$ (feature variance) are calculated; these describe the "typical distribution pattern" of rare defects in the feature space (e.g., the statistical regularity that the characteristics of microcracks are usually concentrated in certain channels or spatial locations). Adding the Gaussian random variable $\mathcal{N}(\mu_{gan}, \sigma_{gan})$ can be understood as "noise injection simulating rare defect features": it allows the rare defect patterns "hidden" in the features of real samples to approach the "typical distribution" learned by GAN. γ and β are learnable parameters. γ balances the "basic feature contribution" of real samples, avoiding overfitting to the distribution generated by GAN. β controls the strength of the "rare defect distribution guidance," allowing the model to gradually learn during training how to extract patterns (such as the weak edges of microcracks or the grayscale changes of shallow indentations) from the regular features of real samples that match the rare defect feature distribution simulated by GAN. For the hardware tool defect detection scenario, the significance of this formula is to allow the model, while operating on real production line data ($F_{real}$, including differences in production line lighting, background, and tool batches), to "graft" on the typical rare defect features $\mathcal{N}(\mu_{gan}, \sigma_{gan})$ simulated by GAN. This addresses the pain point of "few real rare defect samples, making it difficult for models to learn their feature distributions." 
Through mathematical distribution alignment and enhancement, YOLOv8-S can more accurately identify rare defects that are easily missed, such as micro-cracks and shallow dents.
[0117] 3) Training dataset construction and configuration: The dataset consists of three parts: real common defect samples, real rare defect samples (a small number), and simulated rare defect image samples generated by GAN. The real rare samples and simulated samples are mixed in a ratio of 1:3 to 1:5 to expand the amount of rare defect data. The dataset is annotated and enhanced, such as fine-tuning the bounding boxes and screening difficult examples. The focus is on preserving the subtle features of rare defects, such as small cracks and shallow depressions. The training set, validation set and test set are divided in a ratio of, for example, 7:2:1.
[0118] 4) Model Training Process: The initial learning rate is set to 0.01, and a cosine annealing learning rate decay mechanism is adopted. The total number of training rounds is 100-150, and the batch size is adjusted according to the hardware configuration (e.g., 16-32). CIoU loss is used as the bounding box loss, and cross-entropy loss is used as the classification loss. During training, the CBAM module automatically learns weights through backpropagation to dynamically enhance the feature response of rare defect regions. The feature distribution fusion mechanism of the bottleneck layer updates the γ and β parameters synchronously, so that the model gradually adapts to the feature distribution of rare defects. After every 10 rounds of training, the missed / false detection samples (difficult examples) of rare defects on the validation set are selected, and their sampling weight is increased in the next round of training to strengthen the model's learning of difficult examples. After the model completes 100-150 rounds of training, it is evaluated on the test set, which does not participate in any parameter tuning process. Through samples covering real production scenarios, the test set objectively quantifies the model's generalization performance in detecting rare defects (e.g., recall, mAP@0.5), exposes edge-case weaknesses not covered by the training / validation sets, verifies the actual effectiveness of CBAM enhancement, feature distribution fusion, and other strategies, and provides a basis for industrial deployment.
[0119] 5) Model Evaluation and Optimization: After each training round, calculate specific metrics for rare defects on the validation set, such as mAP@0.5 for rare defect categories (emphasis) and recall (to avoid missed detections), while monitoring the detection accuracy of common defects (ensuring no significant decrease); if the detection accuracy of rare defects does not improve significantly, adjust the insertion position of the CBAM module, such as deepening the attention layer, or increasing the fusion weight of GAN-generated simulated rare defect image samples; if the accuracy of common defects decreases, reduce the intensity of feature distribution fusion, such as decreasing the initial value of β; after training, save the model weight with the highest mAP for rare defects on the validation set as the final improved YOLOv8-S augmented model.
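The sample mixing, the 7:2:1 split, and the cosine annealing schedule described in steps 3) and 4) can be illustrated with a minimal sketch. The function names, the fixed random seed, and the minimum learning rate of 0 are assumptions made for illustration only; the description does not specify them.

```python
import math
import random


def expand_rare(real_rare, simulated, ratio=3, seed=0):
    """Mix real and simulated rare samples at 1:ratio (ratio between 3 and 5)."""
    if not 3 <= ratio <= 5:
        raise ValueError("ratio must be between 3 and 5 (1:3 to 1:5 mixing)")
    rng = random.Random(seed)
    # Take at most ratio * len(real_rare) simulated samples.
    wanted = min(len(simulated), ratio * len(real_rare))
    return list(real_rare) + rng.sample(list(simulated), wanted)


def split_7_2_1(samples, seed=0):
    """Shuffle and split into training, validation and test sets (7:2:1)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10
    n_val = len(items) * 2 // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def cosine_lr(epoch, total_epochs, lr0=0.01, lr_min=0.0):
    """Cosine annealing: lr0 at epoch 0, lr_min at epoch total_epochs."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr0 - lr_min) * cos_term


rare_pool = expand_rare([f"real_{i}" for i in range(10)],
                        [f"sim_{i}" for i in range(100)], ratio=3)
train, val, test = split_7_2_1(range(100))
print(len(rare_pool), len(train), len(val), len(test), cosine_lr(50, 100))
```

Integer arithmetic is used for the split sizes so that rounding never moves a sample between subsets.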
[0120] YOLOv8-S is built upon the YOLOv8 model, retaining the original backbone network, neck network, and detection head structure. The previously trained YOLOv8 detection model (which can be considered a base version) provides a benchmark for general defect detection capabilities for subsequent improvements to YOLOv8-S. The overall features of hardware tools (such as contours and common backgrounds) and basic detection logic learned during the training of the YOLOv8 detection model can be transferred to the YOLOv8-S model through weight transfer or knowledge distillation, reducing redundant computation in the latter's basic feature learning and allowing the improvement process to focus more on specific enhancements for rare defects. Training the improved YOLOv8-S enhancement model involves deep optimization for the specific task of "rare defect detection" on the same data. Its training logic relies on the feature distribution of rare defects (including simulated rare defect image samples) in the mixed samples. The construction and processing of these samples (such as GAN generation and feature clustering) are uniformly completed through the "database construction and dynamic detection module," providing a consistent data input foundation for both.
[0121] The CBAM attention module, through a dual channel and spatial attention mechanism, can automatically focus on rare defect regions (such as small or blurry defects), suppress irrelevant background interference, enhance the capture of weak features, and reduce missed detections. It integrates simulated rare defect features generated by GAN to compensate for the lack of real rare samples, enabling the model to learn more comprehensive rare defect patterns and improve its generalization ability to novel or mutated rare defects. While enhancing the rare defect detection effect (such as recall rate and mAP@0.5 improvement), it maintains the original high-efficiency inference characteristics of YOLOv8-S and does not affect the detection accuracy of common defects, making it suitable for practical industrial inspection scenarios.
[0122] The online real-time detection module achieves real-time, accurate judgment of and graded response to defects in hardware tools through image acquisition and preprocessing, layered defect detection, and result fusion decision-making, meeting the real-time and reliability requirements of online detection on the production line. The online real-time detection module includes, in sequence, an acquisition start-up submodule, a sample preprocessing submodule 2, a preliminary defect detection submodule, a rare defect deep detection submodule, and a defect tracing and model iteration submodule.
[0123] The acquisition start-up submodule receives signals from the photoelectric sensor, determines that the hardware tool has entered the detection area with the transmission belt, and sends a command to the data acquisition module to acquire images of the hardware tool in real time.
[0124] The sample preprocessing submodule 2 performs real-time preprocessing on the acquired images, including: grayscale correction to eliminate uneven illumination; Gaussian filtering to remove noise, σ=1.0; and Sobel operator edge enhancement to highlight the contour features of the hardware tools.
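The Gaussian filtering (σ=1.0) and Sobel edge enhancement steps can be sketched in plain Python on a small grayscale image stored as a list of rows. This is an illustrative sketch only: the kernel radius of 3, the clamped border handling, and the function names are assumptions, and the grayscale correction step is omitted because the description does not specify its method.

```python
import math


def gaussian_kernel_1d(sigma=1.0, radius=3):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _clamp(i, n):
    # Replicate border pixels (assumed border policy).
    return max(0, min(n - 1, i))


def gaussian_blur(img, sigma=1.0, radius=3):
    """Separable Gaussian filter: horizontal pass, then vertical pass."""
    k = gaussian_kernel_1d(sigma, radius)
    h, w = len(img), len(img[0])
    offsets = range(-radius, radius + 1)
    tmp = [[sum(k[j + radius] * img[y][_clamp(x + j, w)] for j in offsets)
            for x in range(w)] for y in range(h)]
    return [[sum(k[j + radius] * tmp[_clamp(y + j, h)][x] for j in offsets)
             for x in range(w)] for y in range(h)]


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude from the horizontal and vertical Sobel responses."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = img[_clamp(y + dy, h)][_clamp(x + dx, w)]
                    gx += SOBEL_X[dy + 1][dx + 1] * p
                    gy += SOBEL_Y[dy + 1][dx + 1] * p
            out[y][x] = math.hypot(gx, gy)
    return out


flat = [[0.5] * 6 for _ in range(6)]
step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
blurred_flat = gaussian_blur(flat)
step_edges = sobel_magnitude(step)
```

A production system would more likely use an optimized image library; the pure-Python form is used here only to make each operation explicit.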
[0125] The preliminary defect detection submodule uses the YOLOv8-nano lightweight model to perform common defect detection on the preprocessed image and outputs the results. Confidence screening of these results gives an efficient preliminary judgment that routes samples to the subsequent rare defect deep detection and improves the system's detection efficiency. The preliminary defect detection submodule includes a model loading and initialization unit, a feature extraction and defect prediction unit, a post-processing optimization unit, and a detection result screening and triage unit connected in sequence.
[0126] The model loading and initialization unit loads the YOLOv8-nano lightweight model optimized by knowledge distillation, inputs the preprocessed image into the lightweight model, the model input layer is adapted to the preprocessed 640×640 pixel image, and the output layer contains the prediction results of common defects, including surface scratches, deformation, burrs, cracks, solder joint detachment, etc.
[0127] The feature extraction and defect prediction unit performs multi-scale feature extraction on the input image, generating three feature maps at different scales: 8×8 for large scale (corresponding to small target defects), 16×16 for medium scale, and 32×32 for small scale (corresponding to large target defects). Each feature map contains semantic information at a different level; shallow features focus on edges and textures, while deep features focus on the overall outline of the defect. A feature pyramid network (FPN) is used to fuse features at different scales along two paths. Top-down path: the deep 32×32 feature map is upsampled (e.g., by bilinear interpolation) and fused with the 16×16 feature map, and the fused result is upsampled again and fused with the 8×8 feature map to supplement the semantic information of the small-scale features. Bottom-up path (PANet structure): the shallow 8×8 feature map is downsampled and fused with the 16×16 feature map, and the result is downsampled again and fused with the 32×32 feature map to supplement the detailed information of the large-scale features. Finally, three fused feature maps (8×8, 16×16, and 32×32) are obtained, preserving the detailed features of defects at each scale while containing sufficient semantic information for classification. The detection head outputs the defect type, location coordinates, and confidence score for each candidate box. Specifically, each fused feature map is fed into the YOLOv8-nano detection head, which consists of convolutional layers. Three anchor boxes of different sizes are preset for each grid point on the feature map, covering defects of different sizes. Three types of information are predicted for each anchor box. Defect category: the probability distribution over the defect categories (e.g., surface scratches, deformation) is output through the classification branch. Location coordinates: the offset of the anchor box (center x, y coordinate offset and width / height scaling factor) is output through the regression branch and converted into actual bounding box coordinates (x1, y1, x2, y2). Confidence score: the probability that the anchor box contains a defect is output through the target branch (combining classification confidence and localization accuracy). The prediction results of all feature maps are integrated to generate a list containing all candidate boxes. Each candidate box corresponds to a set of information: defect type (such as "burrs"), location coordinates (pixel-level bounding box), and confidence score (a value between 0 and 1, with higher values indicating a more reliable prediction).
[0128] The post-processing optimization unit employs a non-maximum suppression (NMS) algorithm to remove redundant candidate boxes. The NMS threshold is set to 0.4, retaining the highest-confidence box for the same defect region: when the overlap (intersection over union, IoU) of multiple candidate boxes reaches this threshold, they are considered to point to the same defect, only the box with the highest confidence is retained, and the redundant boxes are removed to avoid repeated detection of the same defect. With a threshold of 0.45, for example, two candidate boxes whose IoU is ≥0.45 are treated as the same defect and only the one with the higher confidence is retained. Based on the pixel-to-millimeter conversion coefficient from camera calibration, pixel coordinates are then converted to actual physical coordinates, outputting the actual location of the defect on the tool.
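The suppression rule and the pixel-to-millimeter conversion can be sketched as follows. This is a minimal class-agnostic sketch, not the system's actual implementation; the `(x1, y1, x2, y2)` box layout follows the description, while the function names and the conversion coefficient of 0.05 mm per pixel are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.4):
    """Keep the highest-confidence box among boxes whose IoU >= threshold.

    detections: list of (box, confidence) pairs.
    """
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in ordered:
        # A box survives only if it overlaps no already-kept box too much.
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


def to_millimeters(box, mm_per_pixel):
    """Convert pixel coordinates to physical coordinates (0.1 mm precision)."""
    return tuple(round(v * mm_per_pixel, 1) for v in box)


candidates = [((0, 0, 10, 10), 0.9),
              ((1, 1, 11, 11), 0.8),    # IoU with the first box is about 0.68
              ((50, 50, 60, 60), 0.7)]
kept = nms(candidates, iou_threshold=0.4)
physical = [to_millimeters(box, mm_per_pixel=0.05) for box, _ in kept]
```

Here the second candidate is suppressed because it overlaps the higher-confidence first box beyond the threshold, while the distant third box is kept.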
[0129] The detection result screening and triage unit sets the confidence threshold for common defects to 0.7, and judges each defect result output by the model: if the confidence is ≥0.7, it is judged as a valid common defect, the defect type, actual location and confidence are recorded, and the tool is marked as "common defect to be processed"; if the confidence of all defects is <0.7, it is judged as "suspected to have no common defects", and the image and related information of the tool are sent to the rare defect depth detection submodule. The related information includes the tool ID and acquisition time.
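The screening and triage rule can be expressed as a short sketch. The dictionary layout, field names, and the `route` value are hypothetical; only the 0.7 threshold, the two status labels, and the forwarded tool ID and acquisition time come from the description.

```python
COMMON_DEFECT_CONFIDENCE = 0.7  # threshold stated in the description


def triage(tool_id, acquired_at, detections):
    """Split one tool's detections into 'common defect' or 'send onward'.

    detections: list of dicts with 'type', 'location' and 'confidence'.
    """
    valid = [d for d in detections
             if d["confidence"] >= COMMON_DEFECT_CONFIDENCE]
    if valid:
        return {"tool_id": tool_id,
                "status": "common defect to be processed",
                "defects": valid}
    # No detection reached the threshold: forward for rare defect detection.
    return {"tool_id": tool_id,
            "acquired_at": acquired_at,
            "status": "suspected to have no common defects",
            "route": "rare_defect_deep_detection"}


hit = triage("T-001", "2025-01-01T08:00:00",
             [{"type": "burr", "location": (1.2, 3.4), "confidence": 0.82},
              {"type": "scratch", "location": (5.0, 6.1), "confidence": 0.41}])
miss = triage("T-002", "2025-01-01T08:00:01",
              [{"type": "scratch", "location": (2.0, 2.0), "confidence": 0.30}])
```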
[0130] The rare defect deep detection submodule achieves accurate identification and confidence level determination of rare defects through feature comparison and enhanced model fine detection, completes the classification from suspected samples to clear defects or normal results, and improves the system's ability to detect low-incidence serious defects.
[0131] The rare defect deep detection submodule includes a feature extraction unit, a rare defect feature library construction unit, and a comparison and judgment unit connected in sequence.
[0132] The feature extraction unit uses a pre-trained ResNet-50 network as a feature extractor to encode features of suspected samples after preliminary defect detection and output a 256-dimensional high-dimensional feature vector.
[0133] The rare defect feature library construction unit builds a rare defect feature library. This library retrieves 256-dimensional feature vectors of historical rare defect samples from the sample preprocessing submodule, classifies and stores them according to defect type, with each class containing at least 50 sample features. A KD-tree index structure is used to optimize retrieval efficiency, supporting ≥100 feature comparisons per second. The classification and storage includes the following technical steps:
[0134] 1) Defect type system definition: Predefine the classification criteria for rare defects, such as by defect morphology: micro cracks, shallow dents, edge burrs, etc.; by cause: material defects, processing errors, etc., to form a structured defect type label system, such as "crack-micro" "dent-shallow", etc.; assign a unique identifier (ID) to each defect type for subsequent associated storage of feature vectors;
[0135] 2) Feature extraction of historically rare defect samples: Obtain the 256-dimensional feature vector output after feature encoding of each historical sample from sample preprocessing submodule 1;
[0136] 3) Feature vector and defect type association: Add metadata to each 256-dimensional feature vector, including defect type label, sample ID, collection time, etc., and establish a "feature vector-type label" mapping relationship; validate the feature vectors and remove low-quality samples (such as blurry or incorrectly labeled samples) to ensure the reliability of the feature library;
[0137] 4) Categorized storage architecture design: A hierarchical storage structure is adopted: the root directory is divided according to the major defect categories (such as "cracks" and "dents"), and the subdirectories are further divided according to the sub-types (such as "micro cracks" and "deep cracks"); the feature vectors are stored in the form of binary files or database entries (such as using Redis, MongoDB, etc.), and each subdirectory only stores the feature vectors of the corresponding type, supporting fast indexing and batch reading;
[0138] 5) Feature library dynamic update mechanism: new rare defect samples are added periodically (such as novel rare defects confirmed by manual review), and steps 2-4 are repeated to update the feature vector set of the corresponding type; existing feature vectors are recoded periodically (if the feature extraction network is updated) to ensure the timeliness of the feature space.
[0139] The comparison and judgment unit calculates the Euclidean distance between the feature vector of a suspected sample and the feature vectors of the same category in the feature library. If the minimum distance is less than 0.6, the sample is judged a "highly suspected rare defect," triggering subsequent enhanced model detection; otherwise, it is directly marked as a "low-suspicion sample" and enters the manual review stage. The Euclidean distance calculation includes the following steps:
[0140] 1) Obtaining the feature vector: Obtain the 256-dimensional feature vector of the suspected sample to be detected from the feature extraction unit, denoted as $V_{query}$.
[0141] 2) Retrieve all feature vectors from the rare defect feature library that match the pre-classified category of the suspected sample, denoted as $V_1, V_2, \ldots, V_n$, where $n$ is the number of samples in that category. 3) Calculate the Euclidean distance between $V_{query}$ and each similar feature vector $V_i$ ($i = 1, 2, 3, \ldots, n$).
[0142] The formula for calculating Euclidean distance is:
[0143] $$d_i = \sqrt{\sum_{k=1}^{256} \left( V_{query,k} - V_{i,k} \right)^2}$$
[0144] where $V_{query,k}$ is the k-th dimension component of the feature vector of the suspected sample, and $V_{i,k}$ is the k-th dimension component of the feature vector of the i-th sample in the feature library; 4) From all the calculated $d_i$, select the minimum value $d_{min}$.
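The distance comparison above can be sketched with the standard library. The function names are illustrative, and treating an empty category as low suspicion is an assumption not stated in the description; the toy vectors are 3-dimensional for readability, whereas the description uses 256 dimensions.

```python
import math

SUSPICION_DISTANCE = 0.6  # threshold stated in the description


def min_distance(v_query, library_vectors):
    """Smallest Euclidean distance from v_query to any library vector."""
    if not library_vectors:
        return math.inf  # assumption: an empty category matches nothing
    return min(math.dist(v_query, v) for v in library_vectors)


def compare(v_query, library_vectors):
    """Return the judgment label and the minimum distance d_min."""
    d_min = min_distance(v_query, library_vectors)
    if d_min < SUSPICION_DISTANCE:
        return "highly suspected rare defect", d_min
    return "low-suspicion sample", d_min


library = [(0.3, 0.4, 0.0), (1.0, 1.0, 1.0)]
label_near, d_near = compare((0.0, 0.0, 0.0), library)
label_far, d_far = compare((3.0, 3.0, 3.0), library)
```

A linear scan is shown for clarity; the KD-tree index mentioned for the feature library would replace the `min(...)` loop in a larger library.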
[0145] The rare defect depth detection submodule also includes a model architecture configuration unit, a detection parameter setting unit, and a multi-dimensional result output unit connected in sequence;
[0146] The model architecture configuration unit adopts an improved YOLOv8-S enhanced model, adding two cross-scale attention modules in the neck area to enhance feature capture of small and rare defects. The specific steps are as follows:
[0147] 1) Cross-scale attention module structure design: each module contains a feature alignment layer, a cross-scale interaction layer, and an attention weight generation layer. The feature alignment layer adjusts feature maps of different scales to the same size (e.g., uniformly 40×40) through upsampling / downsampling; the cross-scale interaction layer uses convolutional operations to fuse multi-scale features (e.g., 80×80 detail features and 20×20 semantic features); the attention weight generation layer generates channel attention weights through a squeeze-excitation mechanism, then generates spatial attention weights through spatial convolution, and finally weights and enhances the feature response of small defect regions.
[0148] 2) Module insertion location determination: Cross-scale attention modules are inserted at two key fusion nodes of the neck network (PANet structure) of the improved YOLOv8-S enhanced model: the first module is inserted into the fusion path between high-level features (20×20) and mid-level features (40×40) to enhance feature fusion for rare defects of small and medium size; the second module is inserted into the fusion path between mid-level features (40×40) and low-level features (80×80) to focus on enhancing the feature signals of small defects (which depend on the 80×80 high-resolution features);
[0149] 3) Module feature interaction configuration: For input feature maps of different scales, the number of channels is first compressed to the same dimension (e.g., 256 dimensions) through 1×1 convolution; the maps are then resized by the feature alignment layer and concatenated. Cross-scale related features are extracted through 3×3 convolution. The attention weight generation layer dynamically calculates weights based on the fused features, and assigns higher weights to areas with small defects (e.g., areas with a low pixel ratio) to suppress background interference.
[0150] 4) Co-training of modules and neck network: The parameters of the cross-scale attention module are incorporated into the training process of the improved YOLOv8-S enhanced model and optimized in synergy with the original CBAM module and feature distribution fusion mechanism: in backpropagation, the module weights and other network parameters are updated simultaneously through the loss function (such as CIoU loss + cross-entropy loss), so that the module learns to focus on the feature patterns of small and rare defects, such as edge discontinuities and areas with weak gray-scale changes.
[0151] 5) During training, the focusing effect of the module on small defect regions is verified by visualizing the attention heatmap. If the detection rate of small defects does not improve significantly, the module insertion position is adjusted (e.g., moving closer to lower-level features) or the channel / spatial resolution of the attention weights is increased until the recall rate of the model for small rare defects with pixel size <32×32 is improved by more than 10%. The backbone network of the improved YOLOv8-S enhancement model adopts CSPDarknet-53, outputting feature maps at three scales (80×80, 40×40, 20×20) to adapt to rare defects of different sizes.
[0152] The detection parameter setting unit applies Mosaic data augmentation to the input images of the aforementioned "highly suspected rare defects," and enables random rotation and contrast adjustment to improve the robustness of the improved YOLOv8-S enhanced model under complex conditions. The batch size is set to 8, the learning rate to 5e-5, and the weight decay to 1e-4, using the AdamW optimizer. When parameter optimization training is performed on the improved YOLOv8-S enhanced model incorporating the attention modules, only the parameters of the newly added attention modules are updated. The specific steps are as follows:
[0153] Model parameter hierarchical locking and trainability configuration: Load the complete weights of the improved YOLOv8-S enhanced model (including the backbone network, the original neck network, the detection head, and the newly added cross-scale attention module); iterate through all model parameters, and set requires_grad=False for parameters other than the newly added attention module (such as the CSPDarknet-53 backbone network, the original PANet neck, the convolutional layers and BN layers of the detection head, etc.) to freeze their gradient calculation; only set requires_grad=True for all parameters of the two newly added cross-scale attention modules (including the convolutional kernels and bias terms of the feature alignment layer, the cross-scale interaction layer, and the attention weight generation layer) to retain their trainability;
[0154] Optimizer parameter filtering and focus update: When initializing the AdamW optimizer, only the trainable parameters of the newly added attention modules (i.e., parameters with requires_grad=True) are passed in through the parameter filtering mechanism to ensure that the optimizer only tracks the gradients of these parameters; configure the optimizer hyperparameters: learning rate 5e-5 (fine-tuning for a small number of trainable parameters), weight decay 1e-4 (only applied to the parameters of the newly added attention modules to prevent overfitting), β1=0.9, β2=0.999 (keeping the default momentum parameters of AdamW);
[0155] Gradient isolation and backpropagation restrictions during training: The input is an augmented image of a "highly suspected rare defect". The model calculates the prediction result and loss (CIoU loss + classification loss) through forward propagation. During backpropagation, because the parameters outside the newly added attention modules are frozen, parameter gradients are accumulated only for the newly added attention modules; the parameters of all other network layers receive no gradient and are not updated. The optimizer iteratively updates only the parameters of the newly added attention modules (such as fine-tuning the convolution kernel weights and adjusting the bias terms) based on their gradient information, while the other parameters keep their initial weights unchanged.
[0156] After each training round, the difference in model parameters (new module parameters vs. frozen parameters) is compared to confirm that only the attention module parameters have changed, while other parameters remain stable. Combined with validation set metrics (such as the recall rate of minor and rare defects), the effectiveness of updating the new module parameters is determined: if the metrics improve, it indicates that the parameter adjustment is correct; if they decrease, it is checked whether the gradient flow is normal (such as whether the connection between the module and the network is broken).
[0157] By following the steps above, parameter optimization can be performed only on the newly added attention module. While retaining the original detection capabilities of the model, its ability to capture features of small and rare defects can be enhanced in a targeted manner, while avoiding overfitting or performance fluctuations caused by updating all parameters.
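The freezing, optimizer filtering, and parameter-difference check described above can be sketched in PyTorch. `TinyDetector` is a hypothetical three-layer stand-in for the improved YOLOv8-S enhanced model, and the squared-output loss is a placeholder for the CIoU and classification losses; only the `requires_grad` handling and the AdamW hyperparameters follow the description.

```python
import torch
from torch import nn


class TinyDetector(nn.Module):
    """Hypothetical stand-in: backbone, new attention module, head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.cross_scale_attn = nn.Conv2d(8, 8, kernel_size=1)
        self.head = nn.Conv2d(8, 4, kernel_size=1)

    def forward(self, x):
        return self.head(self.cross_scale_attn(self.backbone(x)))


def freeze_all_but(model, trainable_prefix):
    """Set requires_grad=True only for parameters under trainable_prefix."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefix)
    return [p for p in model.parameters() if p.requires_grad]


torch.manual_seed(0)
model = TinyDetector()
trainable = freeze_all_but(model, "cross_scale_attn")

# Only the attention parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(trainable, lr=5e-5, weight_decay=1e-4,
                              betas=(0.9, 0.999))

before = {n: p.detach().clone() for n, p in model.named_parameters()}
loss = model(torch.randn(2, 3, 16, 16)).pow(2).mean()  # placeholder loss
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Parameter-difference check: which tensors changed after one step?
changed = sorted(n for n, p in model.named_parameters()
                 if not torch.equal(before[n], p))
frozen_grads = [p.grad for n, p in model.named_parameters()
                if not n.startswith("cross_scale_attn")]
```

After one step, `changed` lists only the attention module's weight and bias, which mirrors the per-round check that frozen parameters remain stable.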
[0158] The multidimensional result output unit outputs the defect type, confidence level, and risk level. The risk level is determined based on defect size and location: risk level = 0.6 × defect area percentage + 0.4 × critical area weight. Based on the calculated value, the defect is assigned to Level I (high risk), Level II (medium risk), or Level III (low risk).
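The weighted risk formula can be illustrated with a short sketch. The description gives the weights 0.6 and 0.4 but not the score cut-offs between levels or the input ranges, so the cut-offs 0.6 and 0.3 and the assumption that both inputs are normalized to [0, 1] are hypothetical.

```python
LEVEL_I_CUTOFF = 0.6   # hypothetical cut-off, not given in the description
LEVEL_II_CUTOFF = 0.3  # hypothetical cut-off, not given in the description


def risk_score(defect_area_fraction, critical_area_weight):
    """Risk = 0.6 x defect area percentage + 0.4 x critical area weight."""
    return 0.6 * defect_area_fraction + 0.4 * critical_area_weight


def risk_level(score):
    """Map a score to Level I / II / III using the assumed cut-offs."""
    if score >= LEVEL_I_CUTOFF:
        return "I"
    if score >= LEVEL_II_CUTOFF:
        return "II"
    return "III"


high = risk_score(0.5, 1.0)   # large defect in a critical area
low = risk_score(0.05, 0.0)   # small defect in a non-critical area
```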
[0159] The rare defect deep detection submodule also includes a result judgment and classification unit. This unit uses 0.65 as the confidence threshold for rare defect judgment. If the model output confidence is ≥0.65 and the risk level is Level I / II, it is marked as "Confirmed Rare Defect," and the defect type, location coordinates (accurate to 0.1mm), confidence, and risk level are recorded in the system database. If the confidence is ≥0.65 but the risk level is Level III, or the confidence ∈ [0.5, 0.65), it is marked as "Rare Defect Pending Review," triggering a manual review process. If the confidence is <0.5 and the feature comparison Euclidean distance is ≥0.6, it is judged as a "Normal Sample," and a "Defect-Free" detection report is automatically generated. The report includes the sample ID, detection time, and key parameters, including the feature vector hash value. All judgment results are stored in XML format, including image path, defect parameters, and judgment labels, and are synchronized to the production line MES system via API interface, supporting real-time display and historical traceability. For "Confirmed Rare Defect" samples, an audible and visual alarm signal is triggered.
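The judgment rules in this paragraph can be written as a small decision function. The function name and the `None` return value are illustrative: the description does not state an outcome when the confidence is below 0.5 and the feature distance is below 0.6, so that case is returned as unspecified rather than guessed.

```python
def classify_rare_defect(confidence, risk_level, feature_distance):
    """Apply the 0.65 / 0.5 confidence thresholds and the 0.6 distance rule.

    risk_level is "I", "II" or "III"; feature_distance is the Euclidean
    distance from the feature comparison step.
    """
    if confidence >= 0.65 and risk_level in ("I", "II"):
        return "Confirmed Rare Defect"
    if (confidence >= 0.65 and risk_level == "III") or 0.5 <= confidence < 0.65:
        return "Rare Defect Pending Review"
    if confidence < 0.5 and feature_distance >= 0.6:
        return "Normal Sample"
    return None  # combination not covered by the description


results = [classify_rare_defect(0.90, "I", 0.3),
           classify_rare_defect(0.90, "III", 0.3),
           classify_rare_defect(0.55, "I", 0.3),
           classify_rare_defect(0.30, "II", 0.7),
           classify_rare_defect(0.30, "II", 0.3)]
```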
[0160] The quality traceability and model iteration module stores the detection images, detection results and processing measures with defect markers in a local MySQL database for a retention period of ≥3 years. It supports querying the entire lifecycle detection records through the product's unique QR code. The module reviews the detection data monthly. When the false detection rate of a certain type of defect is >5%, it starts incremental model training, adds new samples for 30 iterations, and updates the feature library of the GAN-generated model. It works in conjunction with the database construction and dynamic detection module and the detection model training and optimization module to form an iterative closed loop.
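The monthly review trigger can be sketched as follows. The data layout and function name are hypothetical, and defining the false detection rate as false detections divided by total detections for a defect type is an assumption; the 5% threshold and the 30 incremental iterations come from the description.

```python
FALSE_DETECTION_LIMIT = 0.05  # ">5%" threshold from the description
INCREMENTAL_ITERATIONS = 30   # iterations per incremental training run


def types_needing_retraining(monthly_stats):
    """Return defect types whose false detection rate exceeds 5%.

    monthly_stats: {defect_type: (false_detections, total_detections)}
    """
    flagged = []
    for defect_type, (false_count, total) in monthly_stats.items():
        if total > 0 and false_count / total > FALSE_DETECTION_LIMIT:
            flagged.append(defect_type)
    return sorted(flagged)


stats = {"crack-micro": (6, 100),   # 6% exceeds the limit
         "dent-shallow": (5, 100),  # exactly 5% does not exceed it
         "burr": (0, 0)}            # no detections this month
flagged = types_needing_retraining(stats)
```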
[0161] The quality traceability and model iteration module records and traces product data throughout the entire process, and drives incremental model training through periodic data review, thereby enabling traceability of defects and continuous optimization of the detection model's performance, and improving the system's adaptability to production changes.
[0162] The anomaly handling and system maintenance module ensures the continuous and stable operation of the detection system through hardware anomaly self-checking, software fault switching, and regular calibration, reducing detection errors caused by equipment or software anomalies and maintaining detection accuracy and continuity in industrial scenarios.
[0163] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0164] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
Claims
1. An online defect detection system for hardware tools based on AI image recognition, characterized in that, It includes a data acquisition module, a database construction and dynamic detection module, a detection model training and optimization module, and an online real-time detection module; The data acquisition module acquires images of hardware tools in real time. The database construction and dynamic detection module stores and labels normal, common, and rare defect samples, performs standardization processing, enhances common defect samples, extracts features of rare defect samples and clusters them to calculate feature centers, generates and filters simulated rare defect image samples through GAN, expands the rare defect sample library, and updates samples monthly to trigger incremental GAN training. The detection model training and optimization module mixes the above samples to train the target detection model, uses it as a teacher model, trains a lightweight target detection model through knowledge distillation, and trains an improved target detection enhancement model at the same time. The online real-time detection module uses a lightweight target detection model to detect and sort common defect samples, determines highly suspected rare defects through feature comparison, and further uses an improved target detection enhancement model to identify and classify rare defect samples. The detection model training and optimization module includes a hybrid training set construction submodule, a YOLOv8 detection model training submodule, a common defect lightweight model training submodule, and a rare defect enhancement model training submodule, which are connected in sequence. The mixed training set construction submodule mixes normal samples and defective samples in a 7:2:1 ratio, dividing them into a 70% training set, a 20% validation set, and a 10% test set. 
The defective samples include real common defect samples, real rare defect samples, and simulated rare defect images. The YOLOv8 detection model training submodule loads the YOLOv8 pre-trained model weights as initial parameters, sets the initial learning rate to 0.01, and uses a cosine annealing learning rate decay mechanism to train the YOLOv8 detection model. The total number of training rounds is set to 100 rounds. In each round, the mAP@0.5 index is calculated on the validation set, and the model weight with the highest mAP in the validation set is retained. The common defect lightweight model training submodule uses the YOLOv8 detection model as the teacher model and trains the YOLOv8-nano lightweight model through knowledge distillation. The rare defect enhancement model training submodule transmits the overall features and basic detection logic of hardware tools learned during the training of the YOLOv8 detection model to the YOLOv8-S model through knowledge distillation. A CBAM attention module is added to the YOLOv8-S detection model to enhance feature extraction of rare defect regions. At the same time, the feature distribution of rare defect image samples generated by GAN is fused into the bottleneck layer of the YOLOv8-S detection model network to improve the feature learning ability of the YOLOv8 detection model for rare defects, thereby enhancing the detection effect of rare defects and training an improved YOLOv8-S enhancement model.
2. The online defect detection system for hardware tools based on AI image recognition as described in claim 1, characterized in that, The data acquisition module includes a visible light industrial camera sub-module, which controls the visible light industrial camera to acquire real-time images of the external appearance of the hardware tools. The X-ray industrial camera submodule controls the X-ray industrial camera to acquire real-time images of the inside of hardware tools; the infrared thermal imaging industrial camera submodule controls the infrared thermal imaging industrial camera to acquire real-time infrared thermal images of the inside of hardware tools.
3. The online defect detection system for hardware tools based on AI image recognition as described in claim 2, characterized in that, The database construction and dynamic detection module includes an initial sample storage submodule, a sample preprocessing submodule 1, and a dynamic sample expansion submodule connected in sequence. The initial sample storage submodule stores images of normal hardware tools without defects, samples of common defects that occur during the production process, and samples of rare defects. The Labelme tool is used to annotate the common defect samples at the pixel level, and the number of samples of each type of rare defect sample is ≥50 frames, thus establishing an independent rare defect sub-library. The sample preprocessing submodule 1 normalizes the pixel values of all sample images to the [0, 1] interval, uses bicubic interpolation to unify the spatial resolution size of all sample images, performs conventional data augmentation on common defect samples, including random flipping and brightness adjustment; extracts 256-dimensional feature vectors of rare defect samples through the ResNet-50 network, uses the DBSCAN algorithm to cluster the 256-dimensional feature vectors, and calculates the feature center of each type of defect. The dynamic sample expansion submodule includes a GAN model training unit, a sample generation and screening unit, and a database update unit connected in sequence. The GAN model training unit inputs at least 50 rare defect samples into the generator G, combines the feature center of each type of defect with a 128-dimensional random noise vector z, and generates a simulated rare defect image. The random noise vector z follows an N(0,1) distribution. 
The generated sample screening unit calculates the structural similarity index between the simulated rare defect image and the rare defect sample, retains the sample with SSIM≥0.85, and expands the rare defect sample library at a ratio of 1:5 between the simulated rare defect image and the rare defect sample. The database update unit adds newly discovered defect samples to the database every month, triggering incremental training of the GAN model for at least 50 iterations. The defect samples include common defect samples and rare defect samples.
4. The online defect detection system for hardware tools based on AI image recognition as described in claim 3, characterized in that, The steps involved in generating simulated rare defect images using a GAN model are as follows: 1) Data preparation: Retrieve at least 50 real rare defect samples from the initial sample storage submodule, preprocess them into a uniform format, and use them as reference data for the generator. Retrieve the feature center of each type of defect calculated by the sample preprocessing submodule 1, and use it as the core reference for the generator. 2) Noise generation: Generate a 128-dimensional random noise vector z that follows an N(0,1) distribution to provide a source of randomness for image generation; 3) Feature fusion and image generation: The generator G receives feature information of real rare defect samples, feature centers of each type of defect and random noise z, and performs feature mapping and reconstruction through a multi-layer neural network. It fuses the common features of the feature centers with the randomness of the noise, transforms them into high-dimensional image data, and generates a simulated image containing rare defect features. 4) Adversarial training optimization: The discriminator D distinguishes the common features of the generated image, the real sample, and the feature center representation of each type of defect. The generator adjusts its parameters based on the discriminator feedback. Through multiple rounds of adversarial iteration, the generated image retains the core features of the feature center while also possessing diversity and realism. 5) Output simulated images: After sufficient training, the generator outputs simulated images that fit the common features of the feature center and have diversity, which are used for subsequent sample selection and library expansion.
5. The online defect detection system for hardware tools based on AI image recognition as described in claim 1, characterized in that the online real-time detection module includes a data acquisition and startup submodule, a sample preprocessing submodule II, a preliminary defect detection submodule, and a rare defect deep detection submodule connected in sequence. The data acquisition and startup submodule receives signals from the photoelectric sensor, determines that the hardware tool has entered the detection area on the transmission belt, and sends instructions to the data acquisition module to acquire images of the hardware tool in real time. The sample preprocessing submodule II performs real-time preprocessing on the acquired images. The preliminary defect detection submodule performs common defect detection on the preprocessed image using the YOLOv8-nano lightweight model and outputs the results with confidence levels, providing an efficient preliminary judgment that is used to route samples to the subsequent in-depth detection of rare defects. The rare defect deep detection submodule performs fine detection through feature comparison and an improved YOLOv8-S enhanced model, achieving accurate identification and confidence determination of rare defects.
6. The online defect detection system for hardware tools based on AI image recognition as described in claim 5, characterized in that the preliminary defect detection submodule includes a model loading and initialization unit, a feature extraction and defect prediction unit, a post-processing optimization unit, and a detection result screening and triage unit connected in sequence. The model loading and initialization unit loads the YOLOv8-nano lightweight model optimized by knowledge distillation and inputs the preprocessed image into the lightweight model; the model input layer is adapted to the preprocessed 640×640 pixel image, and the output layer contains the prediction results of common defects. The feature extraction and defect prediction unit performs multi-scale feature extraction on the input image to generate 8×8, 16×16, and 32×32 feature maps; a feature pyramid network (FPN) fuses the features at different scales, and the detection head outputs the defect type, location coordinates, and confidence score of each candidate box. The post-processing optimization unit uses a non-maximum suppression (NMS) algorithm with the NMS threshold set to 0.4 to remove redundant candidate boxes, retaining the highest-confidence box in each defect area; based on the pixel-to-millimeter conversion coefficient obtained from camera calibration, the pixel coordinates are converted into actual physical coordinates, and the actual position of the defect on the tool is output.
The detection result screening and triage unit sets the confidence threshold for common defects to 0.7 and judges each defect result output by the model: if the confidence is ≥ 0.7, the result is judged to be a valid common defect, the defect type, actual location, and confidence are recorded, and the tool is marked as "common defect to be processed"; if the confidence of all defects is < 0.7, the tool is judged as "suspected to have no common defects", and the image and related information of the tool are sent to the rare defect deep detection submodule. The related information includes the tool ID and acquisition time.
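The post-processing and triage logic of this claim can be sketched as below. The sketch assumes that the NMS threshold of 0.4 is an intersection-over-union (IoU) threshold, that suppression is class-agnostic (one box per defect area), and that boxes are given as `(x1, y1, x2, y2)` in pixels. The detection dictionaries and the function names are hypothetical; the model inference itself is not shown.

```python
NMS_IOU_THRESHOLD = 0.4      # NMS threshold stated in the claim (assumed to be IoU)
COMMON_CONF_THRESHOLD = 0.7  # confidence threshold for common defects


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=NMS_IOU_THRESHOLD):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(iou(det["box"], k["box"]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


def to_millimetres(box, mm_per_pixel):
    """Convert pixel coordinates using the calibration coefficient."""
    return tuple(v * mm_per_pixel for v in box)


def triage(detections, mm_per_pixel):
    """Return (status, valid common defects with physical coordinates)."""
    valid = [d for d in nms(detections) if d["score"] >= COMMON_CONF_THRESHOLD]
    if valid:
        results = [{**d, "box_mm": to_millimetres(d["box"], mm_per_pixel)}
                   for d in valid]
        return "common defect to be processed", results
    # No detection reached 0.7: forward to the rare defect stage.
    return "suspected to have no common defects", []
```

A tool whose detections all score below 0.7 receives an empty result list, which signals that its image, tool ID, and acquisition time should be passed on to the rare defect stage.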
7. The online defect detection system for hardware tools based on AI image recognition as described in claim 6, characterized in that the rare defect deep detection submodule includes a feature extraction unit, a rare defect feature library construction unit, and a comparison and judgment unit connected in sequence. The feature extraction unit uses a pre-trained ResNet-50 network as a feature extractor to encode the features of suspected samples after preliminary defect detection and outputs a 256-dimensional feature vector. The rare defect feature library construction unit retrieves the 256-dimensional feature vectors of historical rare defect samples from the sample preprocessing submodule to construct the rare defect feature library, which is stored by defect type, with each category containing at least 50 sample features; a KD-tree index structure is used to optimize retrieval efficiency, supporting ≥ 100 feature comparisons per second. The comparison and judgment unit calculates the Euclidean distance between the feature vector of the suspected sample and the features of the same type in the feature library. If the minimum distance is less than 0.6, the sample is judged as "highly suspected rare defect" and subsequent enhanced model detection is triggered; otherwise, it is directly marked as "low suspected sample" and enters the manual review stage.
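The comparison and judgment step can be illustrated with a short sketch. It uses a brute-force linear scan in place of the KD-tree index named in the claim, and it assumes the query is compared against every category in the library, returning the nearest one; the claim leaves the category selection open. The function name and the library layout (a dictionary from defect type to feature vectors) are hypothetical.

```python
import math

DISTANCE_THRESHOLD = 0.6  # Euclidean distance threshold stated in the claim


def compare_with_library(query, feature_library,
                         threshold=DISTANCE_THRESHOLD):
    """Return (label, nearest defect type, minimum distance).

    feature_library maps defect type -> list of feature vectors.
    Brute force is used here; a KD-tree would replace this scan
    in a real-time setting.
    """
    best_type, best_dist = None, math.inf
    for defect_type, vectors in feature_library.items():
        for vec in vectors:
            dist = math.dist(query, vec)  # Euclidean distance
            if dist < best_dist:
                best_type, best_dist = defect_type, dist
    if best_dist < threshold:
        return "highly suspected rare defect", best_type, best_dist
    return "low suspected sample", best_type, best_dist
```

The function works for vectors of any length, so the 256-dimensional features of the claim need no special handling.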
8. The online defect detection system for hardware tools based on AI image recognition as described in claim 7, characterized in that the rare defect deep detection submodule also includes a model architecture configuration unit, a detection parameter setting unit, and a multi-dimensional result output unit connected in sequence. The model architecture configuration unit adopts the improved YOLOv8-S enhanced model, adding two cross-scale attention modules in the neck to enhance feature capture of small and rare defects. The backbone network of the improved YOLOv8-S enhanced model adopts CSPDarknet-53, which outputs feature maps at three scales, 80×80, 40×40, and 20×20, to adapt to rare defects of different sizes. The detection parameter setting unit applies Mosaic data augmentation to the input images of the aforementioned "highly suspected rare defects" and enables random rotation and contrast adjustment to improve the robustness of the improved YOLOv8-S enhanced model under complex conditions; the AdamW optimizer with a learning rate of 5e-5 and a weight decay of 1e-4 is used to train the improved YOLOv8-S enhanced model with attention modules, updating only the parameters of the newly added attention modules, with the batch size set to 8 during training. The multi-dimensional result output unit outputs the defect type, confidence level, and risk level. The risk level is determined based on defect size and location: risk level = 0.6 × defect area percentage + 0.4 × key area weight. Based on the calculated value, the risk level is assigned as Level I (high risk), Level II (medium risk), or Level III (low risk).
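The risk formula in this claim can be worked through in a few lines. The sketch assumes both inputs are normalised to the range [0, 1]. The claim does not give the cut-off values that separate Levels I, II, and III, so the defaults below (0.6 and 0.3) are purely hypothetical placeholders.

```python
def risk_score(area_fraction, key_area_weight):
    """Risk value = 0.6 x defect area percentage + 0.4 x key area weight.

    Assumption: both inputs are expressed as fractions in [0, 1].
    """
    return 0.6 * area_fraction + 0.4 * key_area_weight


def risk_level(score, high_cutoff=0.6, medium_cutoff=0.3):
    """Map a risk value to a level.

    The cut-offs are hypothetical; the claim only names the three levels.
    """
    if score >= high_cutoff:
        return "I"    # high risk
    if score >= medium_cutoff:
        return "II"   # medium risk
    return "III"      # low risk
```

For instance, a defect covering half of the inspected area (0.5) in a fully critical region (weight 1.0) gives 0.6 × 0.5 + 0.4 × 1.0 = 0.7.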
9. The online defect detection system for hardware tools based on AI image recognition as described in claim 8, characterized in that the rare defect deep detection submodule also includes a result determination and classification unit. The result determination and classification unit uses 0.65 as the confidence threshold for rare defect determination. If the model outputs a confidence level ≥ 0.65 and the risk level is Level I or II, the sample is marked as "confirmed rare defect", and the defect type, location coordinates, confidence level, and risk level are recorded and sent to the system database. If the confidence level is ≥ 0.65 but the risk level is Level III, or the confidence level is in the range [0.5, 0.65), the sample is marked as "rare defect to be reviewed" and the manual review process is triggered. If the confidence level is < 0.5 and the feature comparison Euclidean distance is ≥ 0.6, the sample is judged as a "normal sample" and a "no defect" inspection report is automatically generated. The report includes the sample ID, inspection time, and key parameters, including the feature vector hash value. All judgment results are stored in XML format, including image path, defect parameters, and judgment labels, and are synchronized to the production line MES system via an API interface, supporting real-time display and historical traceability. For samples marked as "confirmed rare defect", an audible and visual alarm signal is triggered.
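The decision rules of this claim can be expressed as a small function, shown here as a sketch. The claim does not say what happens when the confidence is below 0.5 while the Euclidean distance is below 0.6, so that case is returned as "unspecified" rather than guessed. The XML element and attribute names are hypothetical; the claim only states which kinds of information are stored.

```python
import xml.etree.ElementTree as ET

RARE_CONF_THRESHOLD = 0.65
REVIEW_LOWER_BOUND = 0.5
DISTANCE_THRESHOLD = 0.6


def classify(confidence, risk_level, distance):
    """Apply the claim's rules; risk_level is "I", "II" or "III"."""
    if confidence >= RARE_CONF_THRESHOLD:
        if risk_level in ("I", "II"):
            return "confirmed rare defect"   # also triggers the alarm
        return "rare defect to be reviewed"  # Level III goes to review
    if confidence >= REVIEW_LOWER_BOUND:
        return "rare defect to be reviewed"  # confidence in [0.5, 0.65)
    if distance >= DISTANCE_THRESHOLD:
        return "normal sample"               # "no defect" report
    return "unspecified"                     # case not covered by the claim


def to_xml(sample_id, image_path, label, confidence, risk_level):
    """Serialise one judgment result (element names are hypothetical)."""
    root = ET.Element("judgment", {"sample_id": sample_id})
    ET.SubElement(root, "image_path").text = image_path
    ET.SubElement(root, "label").text = label
    ET.SubElement(root, "confidence").text = f"{confidence:.2f}"
    ET.SubElement(root, "risk_level").text = risk_level
    return ET.tostring(root, encoding="unicode")
```

A record produced by `to_xml` can be parsed back with `ET.fromstring`, which is one way an MES-side consumer might read the judgment label.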