A micro-defect nondestructive testing method based on light-weight network and data enhancement

By using an improved lightweight network and data augmentation method, combined with an efficient local attention mechanism and a feature recombination upsampling operator, the problems of background noise interference and feature loss in the detection of minute defects in metal components are solved, achieving high-precision, multi-scale detection results.

CN122265634A · Pending · Publication Date: 2026-06-23 · CHINA UNIV OF PETROLEUM (EAST CHINA)

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)
Filing Date: 2026-03-30
Publication Date: 2026-06-23

Application Information

Patent Timeline

30 Mar 2026: Application
23 Jun 2026: Publication (CN122265634A)

IPC: G06V10/25; G06V10/44; G06V10/52; G06V10/54; G06V10/77; G06V10/80; G06V10/46; G06V10/75; G06V10/82; G06V20/60; G06N3/045; G06N3/0464; G06N3/084; G06N3/0985; G06N3/048; G06V10/26; G06V20/70

AI Tagging

Application Domain

Character and pattern recognition; Biological models

Technology Topics

Pattern recognition; Stochastic gradient descent

AI Technical Summary

Technical Problem

Existing technologies face challenges in detecting minute defects in metal components under complex industrial environments, including severe background noise interference, easy loss of subtle features by lightweight networks, and insufficient model robustness, resulting in low detection accuracy and easy missed detections.

Method used

The method combines a lightweight network with data augmentation: an improved target detection network that incorporates an efficient local attention mechanism and a feature recombination upsampling operator is trained with data augmentation strategies, multi-scale feature extraction, and a spatial intersection-over-union loss function to improve detection accuracy and robustness.

Benefits of technology

It effectively suppresses background noise, preserves subtle features, improves the positioning accuracy of minute defects and the robustness of multi-scale detection, and solves the detection problem in complex industrial environments.

Abstract

The application discloses a nondestructive testing method for micro-defects based on a lightweight network and data augmentation, and relates to the technical field of metal component defect detection. The method first prepares micro-defect samples, collects and preprocesses metal defect images containing uneven reflection and oil-stain background noise, and constructs a micro-defect dataset. Next, it builds an improved target detection network consisting of a MobileNet v3 backbone network, a feature fusion network that integrates the ELA attention mechanism and the CARAFE operator, and a detection head. Data augmentation operations (color perturbation, geometric transformation, image flipping, image blending, and segmentation copy-paste) are applied to the dataset, and the network is trained with the SIoU localization loss function and a stochastic gradient descent optimizer. Finally, an image of the metal component under test is input into the trained network, which outputs the defect position and bounding box.

Description

Technical Field

[0001] This invention relates to the field of metal component defect detection technology, and in particular to a non-destructive testing method for minute defects based on lightweight networks and data augmentation.

Background Technology

[0002] In the field of metal component defect detection technology, metal defects, as the core detection target, are characterized by their tiny size and elongated shape compared to conventional objects to be inspected. Although existing machine vision and deep learning technologies have made some progress in defect detection, some challenges still exist in practical industrial applications targeting tiny defects: First, in complex industrial environments, background noise interference is severe and feature visibility is weak. Industrial sites are often accompanied by reflective surfaces, oil stains, or uneven backgrounds, resulting in a very low pixel ratio for narrow metal defects and edge features being submerged by noise. This leads to low detection accuracy of existing models and a high risk of serious missed detections.

[0003] Secondly, lightweight networks are prone to losing subtle features. To meet the needs of real-time industrial inspection, lightweight models are widely used. However, when conventional lightweight networks are downsampled multiple times, they filter out the edge and contour information of tiny defects, resulting in insufficient pixel-level representation ability of deep features and difficulty in accurate localization.

[0004] Finally, the diverse morphologies and scales of defects, coupled with the scarcity of real damage data, result in insufficient model robustness. The orientation and size of metal defects exhibit strong randomness, and the cost and time required for collecting real samples are high, making existing models prone to overfitting and exhibiting poor generalization ability.

[0005] Therefore, there is an urgent need to design a detection method that is highly efficient and accurate, can overcome background noise, compensate for the loss of lightweight features, and is adaptable to small sample multi-scale scenarios, so as to meet the needs of industrial application of defect detection in metal components.

Summary of the Invention

[0006] The purpose of this invention is to provide a non-destructive testing method for minute defects based on lightweight networks and data augmentation, which solves the problems of strong background noise interference in the detection of minute defects on metal surfaces in complex industrial environments, easy loss of subtle features by lightweight networks, and insufficient multi-scale robustness of models.

[0007] To achieve the above objectives, this invention provides a method for nondestructive testing of minute defects based on lightweight networks and data augmentation, comprising the following steps:

S1. Prepare a sample with minute defects, collect image data of metal defects containing uneven reflection or oil stain background noise and preprocess them to construct a dataset of minute defects. The minute defects include local damage with small feature size or slender shape.

S2. Construct an improved target detection network, comprising a backbone network, a feature fusion network, and a detection head; wherein, the backbone network employs a lightweight neural network, which, while filtering out background noise, extracts multi-scale feature layers of the minute defects; the feature fusion network integrates a local attention mechanism and a feature reconstruction upsampling operator; wherein, the local attention mechanism is an efficient local attention mechanism, which performs one-dimensional average pooling along the horizontal and vertical directions to reduce the feature map size, and captures local features through one-dimensional convolution to generate directional attention weights and fuses them with the original features in a weighted manner; the feature reconstruction upsampling operator includes a kernel prediction branch and a feature reconstruction branch, which dynamically generates a reconstruction kernel that matches the local defect contour to fuse and reconstruct the split local feature blocks.

S3. Perform data augmentation on the dataset and train the target detection network using the augmented dataset, using spatial intersection-union ratio as the localization loss function.

S4. Use the trained object detection network to detect defects in the image of the metal component under test, and output the location and bounding box of the defect.

[0008] Preferably, in step S1, the preprocessing of the acquired defective image data includes: normalizing the image size to a uniform resolution using bilinear interpolation, and removing invalid samples to ensure the consistency of the dataset.

[0009] Preferably, in step S2, the lightweight convolutional neural network in the backbone network adopts MobileNet v3; the backbone network performs feature expansion and compression through bottleneck blocks, and extracts three feature layers with different resolutions through downsampling.

[0010] Preferably, in step S2, the efficient local attention mechanism performs one-dimensional average pooling along the horizontal and vertical directions to reduce the feature map size, generates horizontal and vertical attention weights through one-dimensional convolution, and then fuses the weights with the original features.

[0011] Preferably, in step S2, the kernel prediction branch of the feature recombination upsampling operator generates a recombination kernel, the feature recombination branch splits local feature blocks, and the recombination kernel and the local feature blocks are fused by dot product.

[0012] Preferably, the data augmentation includes: random perturbation of the color space of the image, random rotation, random translation, random scaling, image flipping, image blending, and segmentation copy-paste augmentation.

[0013] Preferably, in step S3, the model training uses a stochastic gradient descent optimizer combined with a cosine annealing learning rate scheduling strategy; the spatial intersection-union ratio loss function comprehensively considers the intersection-union ratio between the predicted box and the ground truth box, the cost of the center spatial offset distance, and the cost of the difference in aspect ratio.
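The cosine annealing schedule named above can be illustrated with a minimal sketch. The 100 epochs and initial learning rate of 0.01 are the values given in the embodiment; the final learning rate `lr_min = 0.0` and the function name are assumptions made for illustration, since the text does not state them.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=100, lr0=0.01, lr_min=0.0):
    """Learning rate at `epoch` under cosine annealing.

    lr_min is an assumed floor; the patent text does not specify it.
    """
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr0 - lr_min) * cos_term


# The rate starts at lr0, passes lr0/2 at the midpoint, and decays to lr_min.
schedule = {e: cosine_annealing_lr(e) for e in (0, 25, 50, 75, 100)}
```

The rate falls slowly at first, fastest around the midpoint, and slowly again near the end, which is the usual motivation for pairing this schedule with SGD.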

[0014] Preferably, the detection head performs feature decoding and bounding box prediction based on the multi-level feature map output by the feature fusion network, and outputs detection results including defect location bounding boxes and probabilities.

[0015] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described above.

[0016] According to specific embodiments provided by the present invention, the present invention discloses the following technical effects:

1. Background noise suppression and subtle feature enhancement

This invention introduces an efficient local attention mechanism into the feature fusion network, performing one-dimensional average pooling along both the horizontal and vertical directions to compress non-directional background noise while preserving the coordinate distribution information of slender defects. One-dimensional convolution captures local features to generate directional attention weights, achieving feature enhancement in both the horizontal and vertical directions. This mechanism effectively suppresses complex background noise such as metal surface reflections and oil stains, enhances the feature response of slender defects, and solves the problem that the features of tiny defects are easily submerged by noise in complex industrial environments.

[0017] 2. Balancing lightweight networks and feature accuracy

This invention employs MobileNet v3 as the backbone network, using bottleneck blocks for feature expansion and compression to filter out background noise while preserving high-frequency features of defect edges. Combined with the CARAFE feature reconstruction upsampling operator, through the collaboration of the kernel prediction branch and the feature reconstruction branch, a reconstruction kernel matching the local defect contour is dynamically generated, achieving pixel-level reconstruction of minute defect edges. This combination effectively compensates for the loss of detail caused by multiple downsampling in lightweight networks, improving the localization accuracy of minute defects while maintaining computational efficiency.

[0018] 3. Improved robustness of multi-scale detection

This invention extracts three feature layers with different resolutions through a backbone network, adapting them to the multi-scale feature distributions of micro-point defects, medium-scale surface scratches, and penetrating thin cracks. It employs various data augmentation strategies, including color space perturbation, geometric transformation, image blending, and segmentation copy-paste, to simulate sample diversity under different lighting and imaging conditions in industrial settings. Furthermore, it uses a spatial intersection over union (SIoU) loss function that jointly considers the IoU between the predicted and ground truth bounding boxes, the center point distance, and the aspect ratio difference. These techniques collectively improve the model's robustness across multiple scales and mitigate overfitting in small-sample scenarios.

[0019] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Attached Figure Description

[0020] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 is a flowchart illustrating the minute defect detection method according to an embodiment of the present invention;
Figure 2 is a schematic diagram of the improved YOLO v11 network structure according to an embodiment of the present invention;
Figure 3 is a schematic diagram of the MobileNet v3 backbone network according to an embodiment of the present invention;
Figure 4 is a schematic diagram illustrating the principle of the ELA attention mechanism according to an embodiment of the present invention;
Figure 5 is a schematic diagram of the CARAFE operator according to an embodiment of the present invention;
Figure 6 is an example of data-augmented images according to an embodiment of the present invention;
Figure 7 shows the experimental results comparing the model of this invention with the original model according to an embodiment of the present invention.

Detailed Implementation

[0022] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.

[0023] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

Example

[0024] As shown in Figure 1, a non-destructive testing method for minute defects based on lightweight networks and data augmentation comprises the following steps:

S1. Dataset Preparation

Metal samples containing minute point defects, slender cracks, and surface scratches were prepared according to industrial metal component testing standards. Defect images containing uneven reflection and oily backgrounds were acquired using an industrial camera and normalized to 640×640 resolution using bilinear interpolation, and invalid samples were removed. Defects were boxed and annotated using the LabelImg tool, generating annotation files in txt format, and the dataset was divided into training and validation sets at a ratio of 4:1.
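The two preprocessing operations in S1, bilinear resizing and the 4:1 split, can be sketched in plain Python as follows. This is an illustration only: the half-pixel sampling convention, the fixed shuffle seed, and the function names `bilinear_resize` and `split_dataset` are assumptions, and a real pipeline would normally rely on an image library rather than nested lists.

```python
import random


def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) by bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output pixel centre into input coordinates (assumed convention).
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle and split into training and validation sets (4:1 by default)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_ratio)
    return items[:cut], items[cut:]


resized = bilinear_resize([[0.0, 1.0], [1.0, 0.0]], 4, 4)
train, val = split_dataset(range(100))
```

Shuffling before the cut keeps both subsets representative when images were collected in batches of similar defects.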

[0025] S2. Improved Target Detection Network Construction

This embodiment improves the object detection network based on the YOLO v11 architecture. As shown in Figure 2, the improved network mainly consists of three parts: a lightweight backbone network, a feature fusion network, and a detection head. C3k2 in the figure is a dynamic convolution module (which can dynamically select between 3×3 and 5×5 convolutions to increase the receptive field and capture broader contextual information). The specific connection relationships and data flow are as follows:

(1) Backbone Network: The original YOLO v11 feature extraction backbone network is replaced by sequentially connected MobileNet v3 modules. The input image is downsampled and its features are extracted sequentially by five MobileNet v3 modules. The backbone network takes feature maps of three different scales (shallow, medium, and deep) from the outputs of the third, fourth, and fifth MobileNet v3 modules, respectively, and inputs them into the feature fusion network.

[0026] (2) Feature Fusion Network: This part combines the Efficient Local Attention (ELA) mechanism with the Content-Aware ReAssembly of FEatures (CARAFE) operator. It includes an upsampling fusion path from deep to shallow and a downsampling fusion path from shallow to deep, achieving full interaction of multi-scale features.

Upsampling Fusion Path: The deepest feature map output from the backbone network is split into two paths. One path is retained directly for the bottom feature fusion node; the other path is upsampled by the CARAFE operator, concatenated with the mid-level feature map of the backbone network, and then fed into the C3k2 module for processing. Subsequently, the output of the C3k2 module is upsampled again by the CARAFE operator and concatenated with the shallow feature map of the backbone network. The concatenated features pass sequentially through the C3k2 module and the ELA attention module, ultimately forming the first output branch, used for detecting small targets.

[0027] Downsampling Fusion Path: This path starts with the output of the ELA module in the first branch described above. This feature is first downsampled via convolution, and then concatenated with the output of the first C3k2 module. The concatenated feature then passes through the C3k2 module and the ELA attention module sequentially, forming the second output branch for detecting medium-sized targets. Similarly, the output of the ELA module in the second branch is again downsampled via convolution, concatenated with the deepest feature previously retained at the bottom, and finally passes through the C3k2 module and the ELA attention module sequentially, forming the third output branch for detecting larger targets.
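The data flow of the two fusion paths can be checked with a small shape trace. The sketch below tracks only the spatial side length of each feature map and omits channel counts; it assumes 2× upsampling for CARAFE, stride-2 convolution for downsampling, and backbone outputs of 80, 40, and 20 pixels per side (the sizes that follow from a 640×640 input at strides 8, 16, and 32). All function names are hypothetical.

```python
def trace_fusion_resolutions(shallow=80, mid=40, deep=20):
    """Follow spatial side lengths through both fusion paths (channels omitted)."""

    def carafe_up(size):
        return size * 2  # assumed 2x upsampling

    def conv_down(size):
        return size // 2  # assumed stride-2 convolution

    def concat(a, b):
        # Concatenation along channels needs matching spatial sizes.
        assert a == b, "spatial sizes must match before concatenation"
        return a

    # Upsampling fusion path (deep to shallow).
    first_c3k2 = concat(carafe_up(deep), mid)           # then C3k2
    branch_small = concat(carafe_up(first_c3k2), shallow)  # then C3k2 and ELA

    # Downsampling fusion path (shallow to deep).
    branch_medium = concat(conv_down(branch_small), first_c3k2)  # then C3k2 and ELA
    branch_large = concat(conv_down(branch_medium), deep)        # then C3k2 and ELA

    return branch_small, branch_medium, branch_large


branches = trace_fusion_resolutions()
```

Under these assumptions every concatenation lines up, and the three output branches keep the resolutions of the three backbone feature maps.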

[0028] (3) Detection head: The three branches output by the feature fusion network (i.e., the shallow, middle, and deep features, each enhanced by an ELA module) are connected independently and in parallel to three detection heads, which output the location, category, and bounding box predictions for the small defects.

[0029] The MobileNet v3 module is shown in Figure 3. To extract features from the acquired metal component defect images, the normalized metal image is input into the MobileNet v3 backbone network. Because the surface of metal components in industrial settings often carries complex background noise, such as uneven reflection and oil stains, and the edge features of small defects are easily submerged by that noise, the backbone network uses bottleneck blocks for targeted feature expansion and compression. Specifically, during dimensionality-reducing compression, the bottleneck block filters out large areas of smooth, reflective metal background and non-directional oil stain noise; during dimensionality expansion, it preserves the high-frequency texture features of defect edges that are not submerged by noise.
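To illustrate why expansion and compression through a bottleneck keeps the backbone lightweight, the sketch below counts the convolution weights of a MobileNet-style bottleneck (1×1 expansion, depthwise k×k convolution, 1×1 compression) and compares it with a standard k×k convolution. The channel numbers 24 and 72 are hypothetical, and biases, normalization layers, and any squeeze-and-excitation parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def bottleneck_params(c_in, c_exp, c_out, k=3):
    """Weights in an expand / depthwise / compress bottleneck (bias ignored)."""
    expand = c_in * c_exp        # 1x1 convolution widening the channels
    depthwise = k * k * c_exp    # one k x k filter per expanded channel
    project = c_exp * c_out      # 1x1 convolution compressing the channels
    return expand + depthwise + project


# Hypothetical block: 24 channels in and out, expanded to 72 inside.
bottleneck = bottleneck_params(24, 72, 24)   # 1728 + 648 + 1728 = 4104
narrow_conv = standard_conv_params(24, 24)   # 9 * 24 * 24 = 5184
wide_conv = standard_conv_params(72, 72)     # 9 * 72 * 72 = 46656
```

With these illustrative numbers the bottleneck filters spatially at 72 channels while using fewer weights than a plain 3×3 convolution at 24 channels.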

[0030] After the backbone network is replaced, the input image is downsampled by 8x, 16x, and 32x to obtain three feature maps at different resolutions. These three feature maps correspond to minute defects of different morphologies on the metal surface:

High-resolution feature map (80×80×24): preserves rich spatial details and is used to capture tiny pits or fine scratches on metal surfaces that have a very low pixel ratio and are easily overlooked.

Medium-resolution feature map (40×40×48): used to identify medium-scale surface scratches or localized wear areas.

Low-resolution feature map (20×20×576): with its large receptive field, it is used to capture long, thin, penetrating metal cracks with global contextual relevance.
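The three spatial sizes follow directly from the 640×640 input and the three downsampling factors, as the short calculation below shows. The 16-pixel defect width is a hypothetical value chosen only to show how little of a coarse feature map a small defect occupies.

```python
def feature_map_sizes(input_size=640, strides=(8, 16, 32)):
    """Spatial side length of each feature map for the given strides."""
    return [input_size // s for s in strides]


def cells_covered(defect_px, stride):
    """How many feature-map cells a defect of `defect_px` pixels spans."""
    return defect_px / stride


sizes = feature_map_sizes()                                   # [80, 40, 20]
coverage = [cells_covered(16, s) for s in (8, 16, 32)]        # [2.0, 1.0, 0.5]
```

A 16-pixel defect spans two cells at stride 8 but only half a cell at stride 32, which is why the high-resolution map is the one assigned to the smallest defects.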

[0031] Through the multi-scale feature extraction that is deeply bound to the physical characteristics of metal defects, the network can adaptively match various small defects of different scales in complex metal backgrounds, solving the problem that conventional lightweight networks easily filter out the scarce contour information of small defects during downsampling.

[0032] The ELA part uses the ELA attention mechanism, shown in Figure 4. Tiny defects on metal surfaces (such as scratches and cracks) often exhibit elongated shapes and strong directionality. Conventional two-dimensional pooling easily blends these subtle features with the surrounding reflective background. As shown in Figure 4, ELA data processing involves three paths working together. First, in the X-direction and Y-direction attention paths, the input features undergo independent one-dimensional average pooling in the horizontal and vertical directions, respectively. This reduces dimensionality and compresses background noise such as oil stains on the metal surface while preserving the coordinate distribution information of slender metal defects in two orthogonal one-dimensional directions.

[0033] Secondly, the dimensionality-reduced feature vectors are subjected to one-dimensional convolution to capture the local cross-channel interaction features of the defects, followed by normalization to stabilize the feature distribution; the sigmoid activation function then generates the X- and Y-direction attention weights $a^{x}$ and $a^{y}$, mapped to the (0,1) interval. The calculation formulas are shown in equations (1) and (2), where $z^{x}$ and $z^{y}$ denote the pooled feature vectors of the X- and Y-direction paths, $F_{x}$ and $F_{y}$ the one-dimensional convolutions, and $\mathrm{Norm}$ the normalization:

$$a^{x} = \mathrm{Sigmoid}\left(\mathrm{Norm}\left(F_{x}(z^{x})\right)\right) \tag{1}$$

$$a^{y} = \mathrm{Sigmoid}\left(\mathrm{Norm}\left(F_{y}(z^{y})\right)\right) \tag{2}$$

In a physical sense, these two weight vectors are equivalent to predicting the probability distribution of the elongated metal defect in the horizontal and vertical directions.

[0034] Finally, as shown at the output (Output) in Figure 4, the feature map is transmitted through the direct path of the original input and is weighted by multiplication with the generated attention weights, as shown in equation (3), where $X$ is the input feature map, $a^{x}$ and $a^{y}$ are the X- and Y-direction attention weights, and $Y$ is the output:

$$Y = X \times a^{x} \times a^{y} \tag{3}$$

This data fusion process amplifies the defect response values at the intersection of the horizontal and vertical coordinates in the feature map (assigning a weight close to 1) and suppresses non-defect background noise (assigning a weight close to 0), thus achieving targeted feature enhancement for weak and slender defects.
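The three ELA paths described above (directional pooling, weight generation, and reweighting of the original features) can be sketched for a single channel in plain Python. This is a simplified illustration, not the patented implementation: the fixed smoothing kernel stands in for learned one-dimensional convolution weights, the zero-mean unit-variance step stands in for the unspecified normalization layer, and which pooled vector is called the X or Y direction is an assumption.

```python
import math


def conv1d_same(vec, kernel):
    """1-D convolution with zero padding so the output keeps the input length."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(vec)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(vec):
                acc += weight * vec[j]
        out.append(acc)
    return out


def normalize(vec, eps=1e-5):
    """Zero-mean, unit-variance scaling (stand-in for the normalization layer)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def ela_single_channel(x, kernel):
    """Apply directional attention to one H x W feature map (list of rows)."""
    height, width = len(x), len(x[0])
    # 1-D average pooling: one value per row and one value per column.
    z_rows = [sum(row) / width for row in x]
    z_cols = [sum(x[i][j] for i in range(height)) / height for j in range(width)]
    # 1-D convolution, normalization, then sigmoid give weights in (0, 1).
    a_rows = [sigmoid(v) for v in normalize(conv1d_same(z_rows, kernel))]
    a_cols = [sigmoid(v) for v in normalize(conv1d_same(z_cols, kernel))]
    # Reweight the original features with both directional weights.
    return [[x[i][j] * a_rows[i] * a_cols[j] for j in range(width)]
            for i in range(height)]


# A faint background (0.1) with a vertical line defect (1.0) in column 2.
feature = [[1.0 if j == 2 else 0.1 for j in range(5)] for _ in range(5)]
enhanced = ela_single_channel(feature, kernel=[0.25, 0.5, 0.25])
```

In this toy input the column containing the line receives a larger weight than the background columns, so the defect response is attenuated less than the background.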

[0035] After deep features are extracted by a lightweight network, conventional upsampling (such as bilinear interpolation) easily produces jagged or blurred edges and broken features when enlarging minute defects. The CARAFE operator, through its dual branches of kernel prediction and feature reconstruction, achieves pixel-level reconstruction of the edges of minute metal defects. Its data processing flow is shown in Figure 5. In the kernel prediction branch (left side of Figure 5), the input features are first processed by two convolutions to extract the abstract local semantic content of the current metal surface (i.e., to determine whether the current region is a smooth metal background or a sharp defect edge); the channel dimensions are then adjusted, and Softmax normalization is applied to dynamically generate an attention weight matrix that matches the local defect contour.

[0036] In the feature recombination branch (right side of Figure 5), the input features are first enlarged by nearest neighbor interpolation; local feature blocks containing minute defects are then split out in the spatial dimension by a feature splitting operation, and the feature shape is adjusted so that the dimensions align.

[0037] During the fusion and output phase, the matrix multiplication module at the bottom of Figure 5 performs a dot product fusion between the attention weight matrix generated on the left and the local feature blocks on the right. This calculation does not mechanically copy pixels; instead, it reallocates the weights of local pixels based on the defect direction captured by the kernel prediction branch, and completes the upsampling after dimensionality adjustment. This process reconstructs the sharp edge contours of tiny point defects or fine cracks, compensating for the loss of detail caused by the lightweight backbone network.
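The reassembly step, in which each upsampled pixel is a softmax-weighted sum over a local block of the input, can be sketched for one channel as follows. This is a simplified illustration under stated assumptions: in CARAFE the kernel scores come from the convolutional kernel prediction branch, whereas here `kernel_logits` is supplied directly as a hypothetical input, and the scale of 2, the 3×3 neighbourhood, and the zero padding are illustrative choices.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def carafe_upsample(x, kernel_logits, scale=2, k=3):
    """Upsample one H x W map by content-dependent reassembly.

    kernel_logits[i2][j2] holds k*k raw scores for output pixel (i2, j2);
    a real network would predict them from the input features.
    """
    height, width = len(x), len(x[0])
    radius = k // 2
    out = [[0.0] * (width * scale) for _ in range(height * scale)]
    for i2 in range(height * scale):
        for j2 in range(width * scale):
            weights = softmax(kernel_logits[i2][j2])   # normalized reassembly kernel
            i, j = i2 // scale, j2 // scale            # source location in the input
            acc = 0.0
            for n in range(k):
                for m in range(k):
                    ii, jj = i + n - radius, j + m - radius
                    if 0 <= ii < height and 0 <= jj < width:  # zero padding outside
                        acc += weights[n * k + m] * x[ii][jj]
            out[i2][j2] = acc
    return out


# Scores peaked on the centre tap make the operator behave like
# nearest-neighbour upsampling; other scores blend in neighbouring pixels.
peaked = [0.0] * 9
peaked[4] = 50.0
logits = [[peaked for _ in range(4)] for _ in range(4)]
upsampled = carafe_upsample([[1.0, 2.0], [3.0, 4.0]], logits)
```

Because the weights differ per output pixel, an edge pixel can draw mostly from the defect side of its neighbourhood rather than averaging across the edge, which is the behaviour the text contrasts with bilinear interpolation.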

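[0038] S3. Perform data augmentation on the dataset and train the target detection network using the augmented dataset, using the spatial intersection over union (SIoU) loss as the localization loss function.

The model training parameters and data augmentation parameters are set as follows: the number of training epochs is 100, the optimizer is stochastic gradient descent (SGD), the initial learning rate is 0.01 with a cosine annealing learning rate schedule, and the batch size is 16. The SIoU loss is used as the localization loss function to improve localization accuracy. SIoU is calculated by equations (4)-(9), written here in the standard SIoU form that matches the symbol definitions below:

$$L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2} \tag{4}$$

$$\Delta = \sum_{t \in \{x, y\}} \left(1 - e^{-\gamma \rho_{t}}\right) \tag{5}$$

$$\rho_{x} = \left(\frac{b_{cx}^{gt} - b_{cx}}{C_{w}}\right)^{2}, \quad \rho_{y} = \left(\frac{b_{cy}^{gt} - b_{cy}}{C_{h}}\right)^{2}, \quad \gamma = 2 - \Lambda \tag{6}$$

$$\Lambda = 1 - 2\sin^{2}\left(\arcsin\left(\frac{c_{h}}{\sigma}\right) - \frac{\pi}{4}\right) \tag{7}$$

$$\sigma = \sqrt{\left(b_{cx}^{gt} - b_{cx}\right)^{2} + \left(b_{cy}^{gt} - b_{cy}\right)^{2}}, \quad c_{h} = \max\left(b_{cy}^{gt}, b_{cy}\right) - \min\left(b_{cy}^{gt}, b_{cy}\right) \tag{8}$$

$$\Omega = \sum_{t \in \{w, h\}} \left(1 - e^{-\omega_{t}}\right)^{\theta}, \quad \omega_{w} = \frac{\left|w - w^{gt}\right|}{\max\left(w, w^{gt}\right)}, \quad \omega_{h} = \frac{\left|h - h^{gt}\right|}{\max\left(h, h^{gt}\right)} \tag{9}$$

where $IoU$ is the intersection over union of the predicted bounding box and the ground truth bounding box; $\Delta$ is the distance cost, which quantifies the spatial offset between the centers of the predicted box and the ground truth box; $\Omega$ is the shape cost, which quantifies the difference in aspect ratio between the predicted box and the ground truth box; $\sigma$ is the distance between the center points of the ground truth box and the predicted box; $c_{h}$ is the height difference between those center points; $b_{cx}^{gt}$ and $b_{cy}^{gt}$ are the x and y coordinates of the center point of the ground truth box; $b_{cx}$ and $b_{cy}$ are the x and y coordinates of the center point of the predicted box; $w$, $h$, $w^{gt}$, and $h^{gt}$ are the width and height of the predicted box and the ground truth box, respectively; and $\theta$ is the parameter that adjusts the degree of emphasis on the shape loss. In this form, $\Lambda$ is the angle cost, and $C_{w}$ and $C_{h}$ are the width and height of the smallest box enclosing both bounding boxes.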
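The following sketch computes an SIoU-style loss for a single pair of boxes from the quantities described above (IoU, distance cost, shape cost). It follows the commonly published SIoU formulation rather than anything confirmed by the patent text; the box format `(cx, cy, w, h)`, the default `theta=4.0`, and the small `eps` guard are assumptions.

```python
import math


def siou_loss(pred, gt, theta=4.0, eps=1e-9):
    """SIoU-style loss for two boxes given as (cx, cy, w, h)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2

    # Intersection over union.
    inter_w = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    inter_h = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    enclose_w = max(p_x2, g_x2) - min(p_x1, g_x1)
    enclose_h = max(p_y2, g_y2) - min(p_y1, g_y1)

    # Angle cost from the direction of the centre offset.
    sigma = math.hypot(gx - px, gy - py) + eps
    c_h = abs(gy - py)
    sin_alpha = min(1.0, c_h / sigma)
    angle_cost = 1.0 - 2.0 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost: centre offset relative to the enclosing box.
    gamma = 2.0 - angle_cost
    rho_x = ((gx - px) / (enclose_w + eps)) ** 2
    rho_y = ((gy - py) / (enclose_h + eps)) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: relative width and height mismatch.
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1.0 - iou + (distance_cost + shape_cost) / 2.0


perfect = siou_loss((10, 10, 4, 4), (10, 10, 4, 4))   # close to 0
shifted = siou_loss((12, 10, 4, 4), (10, 10, 4, 4))   # penalized for the offset
```

A perfectly matched box gives a loss near zero, while a horizontal shift lowers the IoU and adds a distance cost, so the loss grows even though the box shape is unchanged.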

[0039] For the data augmentation parameters, hue is randomly perturbed by ±1.5%, and saturation and brightness are each randomly adjusted by ±50%; geometrically, images undergo a random rotation of ±45°, a random translation of ±10%, and a random scaling with ±50% gain; in addition, vertical flipping, horizontal flipping, image blending, and segmentation copy-paste are each applied with a probability of 50%. An example of the augmented images is shown in Figure 6.
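The augmentation ranges listed above can be collected into a configuration and sampled once per training image, as in the sketch below. The dictionary keys, the function name, and the reading of "±50%" as a multiplicative gain between 0.5 and 1.5 are assumptions made for illustration; the sketch only draws the random parameters and does not transform any image.

```python
import random

# Ranges taken from the embodiment; the key names are hypothetical.
AUGMENTATION = {
    "hue": 0.015,          # +/- 1.5 %
    "saturation": 0.5,     # +/- 50 %
    "brightness": 0.5,     # +/- 50 %
    "rotate_deg": 45.0,    # +/- 45 degrees
    "translate": 0.10,     # +/- 10 % of the image size
    "scale": 0.5,          # +/- 50 % gain
    "p_flip_vertical": 0.5,
    "p_flip_horizontal": 0.5,
    "p_blend": 0.5,
    "p_copy_paste": 0.5,
}


def sample_augmentation(rng, cfg=AUGMENTATION):
    """Draw one set of augmentation parameters for a single image."""

    def symmetric(limit):
        return rng.uniform(-limit, limit)

    return {
        "hue_shift": symmetric(cfg["hue"]),
        "saturation_gain": 1.0 + symmetric(cfg["saturation"]),
        "brightness_gain": 1.0 + symmetric(cfg["brightness"]),
        "rotate_deg": symmetric(cfg["rotate_deg"]),
        "translate_x": symmetric(cfg["translate"]),
        "translate_y": symmetric(cfg["translate"]),
        "scale_gain": 1.0 + symmetric(cfg["scale"]),
        "flip_vertical": rng.random() < cfg["p_flip_vertical"],
        "flip_horizontal": rng.random() < cfg["p_flip_horizontal"],
        "blend": rng.random() < cfg["p_blend"],
        "copy_paste": rng.random() < cfg["p_copy_paste"],
    }


params = sample_augmentation(random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampled augmentations reproducible across runs.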

[0040] S4. The trained improved YOLO model is compared with the original YOLO model by performing target detection on the defect images to verify the training results, which are shown in Figure 7. The results show that the improved model outperforms the original YOLO model on the metal defect detection task: the model's accuracy increases, the false positive rate decreases, missed detections are significantly reduced, and mAP50 increases from 0.818 to 0.912.

[0041] The remaining technical features in the above embodiments can be flexibly selected by those skilled in the art to meet different specific practical needs according to actual circumstances. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention should be within the protection scope of the appended claims. In the above description, numerous specific details have been set forth to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that these specific details are not necessary to implement the present invention. In other instances, to avoid obscuring the present invention, well-known techniques, such as specific construction details, operating conditions, and other technical conditions, have not been specifically described.

[0042] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. Furthermore, those skilled in the art will recognize that, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A nondestructive testing method for minute defects based on lightweight networks and data augmentation, characterized in that it includes the following steps:

S1. Prepare a sample with minute defects, collect image data of metal defects containing uneven reflection or oil stain background noise and preprocess them to construct a dataset of minute defects. The minute defects include local damage with small feature size or slender shape.

S2. Construct an improved target detection network, comprising a backbone network, a feature fusion network, and a detection head; wherein, the backbone network employs a lightweight neural network, which, while filtering out background noise, extracts multi-scale feature layers of the minute defects; the feature fusion network integrates a local attention mechanism and a feature reconstruction upsampling operator; wherein, the local attention mechanism is an efficient local attention mechanism, which performs one-dimensional average pooling along the horizontal and vertical directions to reduce the feature map size, and captures local features through one-dimensional convolution to generate directional attention weights and fuses them with the original features in a weighted manner; the feature reconstruction upsampling operator includes a kernel prediction branch and a feature reconstruction branch, which dynamically generates a reconstruction kernel that matches the local defect contour to fuse and reconstruct the split local feature blocks.

S3. Perform data augmentation on the dataset and train the target detection network using the augmented dataset, using spatial intersection-union ratio as the localization loss function.

S4. Use the trained object detection network to detect defects in the image of the metal component under test, and output the location and bounding box of the defect.

2. The method for nondestructive testing of minute defects based on lightweight networks and data augmentation according to claim 1, characterized in that: In step S1, the preprocessing of the collected defective image data includes: normalizing the image size to a uniform resolution using bilinear interpolation and removing invalid samples to ensure the consistency of the dataset.

3. The method for nondestructive testing of minute defects based on lightweight networks and data augmentation according to claim 1, characterized in that: In step S2, the lightweight convolutional neural network in the backbone network adopts MobileNet v3; the backbone network performs feature expansion and compression through bottleneck blocks, and extracts three feature layers with different resolutions through downsampling.

4. The method for nondestructive testing of minute defects based on lightweight networks and data augmentation according to claim 1, characterized in that: In step S2, the efficient local attention mechanism performs one-dimensional average pooling along the horizontal and vertical directions to reduce the feature map size, generates horizontal and vertical attention weights through one-dimensional convolution, and then fuses the weights with the original features.

5. The method for nondestructive testing of minute defects based on lightweight networks and data augmentation according to claim 1, characterized in that: In step S2, the kernel prediction branch of the feature recombination upsampling operator generates a recombination kernel, the feature recombination branch splits local feature blocks, and the recombination kernel and local feature blocks are fused by dot product.

6. The method for nondestructive testing of minute defects based on lightweight networks and data augmentation according to claim 1, characterized in that: The data augmentation includes: random perturbation of the color space of the image, random rotation, random translation, random scaling, image flipping, image blending, and segmentation copy-paste augmentation.

7. The method for nondestructive testing of minute defects based on lightweight networks and data augmentation according to claim 1, characterized in that: In step S3, the model training adopts a stochastic gradient descent optimizer combined with a cosine annealing learning rate scheduling strategy; the spatial intersection-union ratio loss function comprehensively considers the intersection-union ratio between the predicted box and the ground truth box, the cost of the center spatial offset distance, and the cost of the difference in width and height ratio.

8. The method for nondestructive testing of minute defects based on lightweight networks and data augmentation according to claim 1, characterized in that: The detection head performs feature decoding and bounding box prediction based on the multi-level feature map output by the feature fusion network, and outputs detection results including defect location bounding boxes and probabilities.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method as described in any one of claims 1 to 8.