A steel surface defect detection method based on PSN-YOLO
By improving the YOLOv8 network into a PSN-YOLO model that introduces dynamic convolution, a multi-scale feature fusion structure and a new bounding box loss function, the method addresses the insufficient recognition ability and high computational overhead of steel surface defect detection models in complex scenarios, and achieves efficient and accurate automatic detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUZHOU UNIVERSITY
- Filing Date
- 2026-04-08
- Publication Date
- 2026-06-16
Figure CN122222992A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of computer vision and industry.
Background Technology
[0002] Steel is an alloyed metallic material with extremely wide applications and an irreplaceable role in modern industry. With the continuous development of modern industry, the requirements for the surface smoothness and quality of steel are increasing across all sectors, making steel surface inspection a current research hotspot. Research and inspection of steel and its surface quality are not only of great significance to industrial development but also contribute to improving product quality and safety.
[0003] The development of steel surface defect detection technology can be roughly divided into three stages. The first stage comprises traditional detection methods, which rely mainly on human experience or on image processing techniques such as threshold segmentation, edge detection, and texture analysis to identify defects. These methods are simple to implement but adapt poorly to changes in lighting, noise interference, and complex surface textures, making it difficult to maintain stable detection accuracy in real industrial environments. The second stage comprises machine vision-based detection methods, which introduce feature extraction and classification algorithms, such as the traditional machine learning models SVM and KNN, to discriminate image features. These methods improve automation to some extent, but they rely on manually designed features, have limited ability to identify small or complex defects, and impose high computational demands on high-resolution images, making it difficult to balance speed and accuracy. The third stage comprises deep learning-based detection methods, which use convolutional neural networks to automatically learn multi-level feature representations and exhibit stronger robustness and generalization ability in complex backgrounds and multi-scale defect scenarios. To address the shortcomings of traditional and machine vision methods, YOLOv8 is adopted as the base model for improvement. This not only enables high-precision end-to-end automatic detection but also reduces computational complexity while maintaining detection performance, thereby meeting the dual requirements of real-time performance and accuracy in industrial settings.
Summary of the Invention
[0004] This invention aims to address the shortcomings of existing steel surface defect detection models in complex industrial scenarios, such as insufficient ability to identify small target defects, limited feature representation capabilities, and high computational costs. To this end, this invention proposes a steel surface defect detection method based on PSN-YOLO, which enables high-precision automatic detection of various types of steel defects.
[0005] The technical solution of this invention is as follows: A PSN-YOLO model is constructed on the basis of the YOLOv8 network, and a ParameterNet dynamic convolution module is introduced into the backbone network to enhance the feature extraction capability; the original SPPF module is replaced with the SPPELAN feature fusion module to improve the multi-scale feature expression capability; at the same time, the NWDLoss loss function is used to replace the original CIoU loss function to reduce the sensitivity of the model to small target position deviation, thereby improving the detection stability and accuracy of small-scale defects.
[0006] The PSN-YOLO model provided by this invention effectively improves the ability to identify minute defects on the surface of steel while ensuring detection accuracy, and also takes into account the computational efficiency of the model. It can be applied to steel surface quality inspection scenarios in actual industrial production, and has positive significance for improving the level of inspection automation and production efficiency.
[0007] Based on the above inventive concept, the present invention adopts the following technical solution:
[0008] A. Dataset Construction: Collect steel surface image data and label six types of defects, namely rolled-in scale, patches, cracks, pitting, inclusions and scratches, to construct a steel surface defect dataset, which is then divided into a training set, a validation set and a test set according to a preset ratio.
[0009] B. ParameterNet Dynamic Convolution Module: In the PSN-YOLO backbone network, the traditional convolution module is replaced with the ParameterNet dynamic convolution structure that introduces a parameter enhancement function. The convolutional expressive power is enhanced by dynamic weight combination, which increases the parameter representation ability while controlling the computational complexity, thereby improving the overall performance of the model.
[0010] C. SPPELAN Feature Fusion Module: The SPPELAN module replaces the original SPPF structure in the backbone network, combining spatial pyramid pooling with local feature enhancement mechanisms to extract target features at different receptive field scales, thereby improving the detection accuracy of multi-scale defects.
[0011] D. NWDLoss Loss Function: The NWDLoss loss function based on normalized Wasserstein distance is adopted to model the target bounding box with a Gaussian distribution. The difference between the predicted box and the real box is measured by the distance between the distributions, which reduces the sensitivity of the loss function to small positional deviations and improves the stability of small target defect detection.
[0012] E. Model Training and Defect Detection: The PSN-YOLO model is trained and validated using the constructed dataset to obtain a trained defect detection model. The surface image of the steel to be detected is input into the model, and the defect category and location information are output to realize the automatic detection of defects on the steel surface.
[0013] Compared with the prior art, the present invention has the following beneficial effects:
[0014] By introducing the ParameterNet dynamic convolution module into the backbone network, the convolution kernel can be adaptively adjusted according to the input features, which improves the model's ability to express complex surface textures and diverse defect morphologies, and improves detection accuracy without significantly increasing the amount of computation.
[0015] By introducing the SPPELAN multi-scale feature fusion structure, the receptive field of the model is expanded and the multi-scale feature expression capability is enhanced, enabling the model to simultaneously focus on large-size and small defects, thus significantly improving the small target detection performance.
[0016] By using NWDLoss instead of the traditional bounding box regression loss function, the impact of small target position deviation on the training process is reduced, and the stability and accuracy of small defect localization are improved.
[0017] The overall PSN-YOLO model achieves high-precision automatic identification of multiple types of defects on steel surfaces while ensuring real-time detection capabilities. It balances detection speed and accuracy, and has good prospects for industrial applications and practical engineering value.
Attached Figure Description
[0018] Figure 1 is a schematic diagram of the steel surface defect detection method provided in an embodiment of the present invention.
[0019] Figure 2 is a schematic diagram of the PSN-YOLO network model structure provided in an embodiment of the present invention.
[0020] Figure 3 is a schematic diagram of the ParameterNet dynamic convolution module structure provided in an embodiment of the present invention.
[0021] Figure 4 is a schematic diagram of the SPPELAN module structure provided in an embodiment of the present invention.
[0022] Figure 5 shows the PR curve of the steel surface defect recognition results obtained with the PSN-YOLO model provided in an embodiment of the present invention.
Detailed Implementation
[0023] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments.
[0024] As shown in Figure 1, in this embodiment, a method for detecting surface defects in steel based on PSN-YOLO specifically includes the following steps:
[0025] A. Dataset construction.
[0026] Surface image data are collected from the steel production line, and the image resolution is uniformly adjusted to 640×640 pixels. Annotation tools are used to create bounding boxes for six types of defects: rolled-in scale, patches, cracks, pitting, inclusions, and scratches, and the annotation results are converted to YOLO format. To improve the model's generalization ability, data augmentation is applied to the training set images, including random flipping, brightness and contrast adjustment, random cropping, and mosaic augmentation. Finally, a dataset containing 1800 images is constructed and divided into training, validation, and test sets in an 8:1:1 ratio.
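The label conversion and the 8:1:1 split described above can be sketched as follows. This is a minimal illustration, not the embodiment's actual tooling: the function names, the file names, the fixed random seed and the assumption that annotations start as pixel-space corner boxes are all hypothetical.

```python
import random

def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w=640, img_h=640):
    """Convert a pixel-space corner box to a YOLO-format label line.

    YOLO format stores the class id followed by the box center and size,
    each normalized to [0, 1] by the image width or height.
    """
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle reproducibly, then cut into train/val/test by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# Hypothetical file names standing in for the 1800 collected images.
names = [f"img_{i:04d}.jpg" for i in range(1800)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))        # 1440 180 180
print(to_yolo_label(2, 160, 320, 480, 400))   # 2 0.500000 0.562500 0.500000 0.125000
```

Augmentation would be applied only to the images in `train`, so that the validation and test sets remain unmodified.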
[0027] B. ParameterNet dynamic convolution module.
[0028] As shown in Figure 2, the improved YOLOv8 network model takes a 640×640 steel surface defect image as input and extracts features through a backbone network containing ParameterNet dynamic convolution modules and the SPPELAN module. As shown in Figure 3, the ParameterNet dynamic convolution module consists of a coefficient generation module, dynamic weight fusion, and a convolution process.
[0029] The dynamic convolution with $M$ dynamic weight tensors is expressed as:
[0030] $$Y = X * W'$$
[0031] $$W' = \sum_{i=1}^{M} \alpha_i W_i$$
[0032] $$\alpha = \mathrm{softmax}(\mathrm{MLP}(\mathrm{Pool}(X)))$$
[0033] where $X$ represents the input features of the convolution operation, $Y$ represents the output of the convolution operation, $*$ indicates a convolution operation with the bias term omitted, $W'$ is the fused weight tensor, $W_i$ is the $i$-th convolutional weight tensor, and $\alpha_i$ is the corresponding dynamic coefficient. $\mathrm{Pool}$ represents global average pooling, and $\mathrm{MLP}$ represents the multilayer perceptron module.
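The three parts named above (coefficient generation, dynamic weight fusion, convolution) can be illustrated with a deliberately small sketch. It assumes the fused-weight form Y = X * Σ α_i W_i, with α produced by global average pooling followed by an MLP and a softmax normalization; the softmax, the single-channel 1-D signal, and the single-linear-layer "MLP" are simplifying assumptions made here for brevity and are not taken from the embodiment.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def conv1d(x, w):
    """Valid 1-D cross-correlation (the 'convolution' used in CNNs), no bias."""
    return [sum(x[t + j] * w[j] for j in range(len(w)))
            for t in range(len(x) - len(w) + 1)]

def coefficients(x, mlp_weights, mlp_biases):
    """Coefficient generation: alpha = softmax(MLP(Pool(x))).

    The MLP is reduced to one linear layer with M outputs (an assumption).
    """
    pooled = sum(x) / len(x)  # global average pooling
    logits = [w * pooled + b for w, b in zip(mlp_weights, mlp_biases)]
    return softmax(logits)

def dynamic_conv1d(x, kernels, mlp_weights, mlp_biases):
    """Fuse the M kernels with input-dependent weights, then convolve once."""
    alpha = coefficients(x, mlp_weights, mlp_biases)
    fused = [sum(a * k[j] for a, k in zip(alpha, kernels))
             for j in range(len(kernels[0]))]  # W' = sum_i alpha_i * W_i
    return conv1d(x, fused), alpha

x = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, 1.0, 0.0]]  # M = 3
y, alpha = dynamic_conv1d(x, kernels,
                          mlp_weights=[0.2, -0.1, 0.3],
                          mlp_biases=[0.0, 0.1, -0.2])
print(alpha, y)
```

Because convolution is linear in the weights, convolving once with the fused kernel gives the same result as mixing the outputs of M separate convolutions, which is why only one convolution pass is needed per input.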
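[0034] For a standard convolutional layer with $C_{in}$ input channels, $C_{out}$ output channels, kernel size $K \times K$ and output feature map size $H_{out} \times W_{out}$, the number of parameters is $C_{out} C_{in} K^2$ and the number of FLOPs is $H_{out} W_{out} C_{out} C_{in} K^2$. The coefficient generation module with hidden dimension $C_{in}$ needs $C_{in}^2 + C_{in} M$ parameters and $C_{in}^2 + C_{in} M$ FLOPs. Dynamic weight fusion is parameter-free and has $M C_{out} C_{in} K^2$ FLOPs. Therefore, the total numbers of parameters and FLOPs for dynamic convolution are $C_{in}^2 + C_{in} M + M C_{out} C_{in} K^2$ and $C_{in}^2 + C_{in} M + M C_{out} C_{in} K^2 + H_{out} W_{out} C_{out} C_{in} K^2$, respectively.
[0035] The parameter ratio of dynamic convolution relative to standard convolution is:
[0036] $$R_{param} = \frac{C_{in}^2 + C_{in} M + M C_{out} C_{in} K^2}{C_{out} C_{in} K^2} = \frac{C_{in}}{C_{out} K^2} + \frac{M}{C_{out} K^2} + M \approx M$$
[0037] The FLOPs ratio is:
[0038] $$R_{flops} = \frac{C_{in}^2 + C_{in} M + M C_{out} C_{in} K^2 + H_{out} W_{out} C_{out} C_{in} K^2}{H_{out} W_{out} C_{out} C_{in} K^2} = \frac{C_{in}}{H_{out} W_{out} C_{out} K^2} + \frac{M}{H_{out} W_{out} C_{out} K^2} + \frac{M}{H_{out} W_{out}} + 1 \approx 1$$
[0039] Therefore, since $M \ll H_{out} W_{out}$, dynamic convolution has approximately $M$ times as many parameters as standard convolution, while the additional FLOPs are negligible.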
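The conclusion above can be checked numerically with a short sketch. The cost model used here is an assumption: a standard layer with C_out·C_in·K² parameters, a coefficient generation module costing C_in² + C_in·M parameters and FLOPs, and a parameter-free weight fusion costing M·C_out·C_in·K² FLOPs. The layer size (64 channels, 3×3 kernel, 80×80 output, M = 4) is likewise only an illustrative choice.

```python
def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and FLOPs of a bias-free standard convolution."""
    params = c_out * c_in * k * k
    flops = h_out * w_out * params
    return params, flops

def dynamic_conv_cost(c_in, c_out, k, h_out, w_out, m):
    """Parameters and FLOPs of dynamic convolution with m weight tensors."""
    std_params, std_flops = standard_conv_cost(c_in, c_out, k, h_out, w_out)
    coeff = c_in * c_in + c_in * m   # coefficient generation (params and FLOPs)
    fusion_flops = m * std_params    # weighted sum of the m weight tensors
    params = coeff + m * std_params  # the m weight tensors dominate
    flops = coeff + fusion_flops + std_flops
    return params, flops

std_p, std_f = standard_conv_cost(64, 64, 3, 80, 80)
dyn_p, dyn_f = dynamic_conv_cost(64, 64, 3, 80, 80, m=4)
r_param = dyn_p / std_p
r_flops = dyn_f / std_f
print(f"parameter ratio: {r_param:.3f}")  # close to M = 4
print(f"FLOPs ratio:     {r_flops:.5f}")  # close to 1
```

For this layer the parameter count grows roughly fourfold while the FLOPs grow by well under 0.1%, because the weight fusion is performed once per input rather than once per output position.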
[0040] C. Introduce the SPPELAN multi-scale feature fusion module into the backbone network to improve the detection capability of defects at different scales.
[0041] As shown in Figure 4, the SPPELAN structure applies three consecutive max pooling operations with residual structures, each using a 5×5 kernel, and concatenates the feature maps obtained before and after each pooling operation. Chaining three small pooling operations keeps the computational cost low, and combining the output of each layer ensures multi-scale fusion while further enlarging the receptive field. SPPELAN also combines SPP with a local attention mechanism, a method that focuses on information in local regions of an image. With local attention, SPPELAN can capture finer target features at different scales, and because each attended region is small, the added computational cost is low. SPPELAN therefore inherits the advantages of SPP while further improving object detection accuracy.
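The pooling cascade can be illustrated with a small single-channel sketch. It covers only the three chained 5×5 max pooling operations (stride 1, "same" padding) and the collection of the four branches that a real network would concatenate along the channel axis; the convolution layers around the pooling, the residual connections and the local attention are omitted, and the function names are hypothetical.

```python
def max_pool_same(img, k=5):
    """k x k max pooling with stride 1; out-of-bounds cells are ignored,
    which is equivalent to padding with negative infinity."""
    h, w = len(img), len(img[0])
    r = k // 2
    return [[max(img[j][i]
                 for j in range(max(0, y - r), min(h, y + r + 1))
                 for i in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]

def spp_branches(feature, k=5, n_pools=3):
    """Return [input, pool1, pool2, pool3]; each pool is applied to the
    previous branch, so the effective windows are 5x5, 9x9 and 13x13."""
    branches = [feature]
    for _ in range(n_pools):
        branches.append(max_pool_same(branches[-1], k))
    return branches

# A single bright pixel in the middle of a 15 x 15 map shows how far each
# branch "sees": the impulse spreads over the branch's effective window.
size = 15
impulse = [[0.0] * size for _ in range(size)]
impulse[size // 2][size // 2] = 1.0
counts = [sum(v > 0 for row in b for v in row) for b in spp_branches(impulse)]
print(counts)  # [1, 25, 81, 169]
```

Three chained 5×5 poolings reach the same 13×13 window as a single large pooling, while each step only scans 25 cells per output position.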
[0042] D. Replace the original bounding box regression loss function with NWDLoss to improve the localization stability of small target defects.
[0043] The improved YOLOv8 network model, with the CIoU loss of YOLOv8 replaced by NWDLoss, is trained on the training set and validated on the validation set.
[0044] The NWDLoss loss function first models each bounding box as a two-dimensional Gaussian distribution. It then uses a new metric, the Normalized Wasserstein Distance (NWD), to measure the similarity between the Gaussian distributions corresponding to two bounding boxes, which effectively reduces the sensitivity to the location of small targets and improves the stability of small target detection.
[0045] For Gaussian distributions $N_a$ and $N_b$, the normalized Wasserstein distance (NWD) is:
[0046] $$NWD(N_a, N_b) = \exp\left(-\frac{\sqrt{W_2^2(N_a, N_b)}}{C}\right)$$
[0047] where $W_2^2(N_a, N_b)$ denotes the (squared) second-order Wasserstein distance between the two two-dimensional Gaussian distributions, and $C$ is a constant closely related to the selected dataset.
[0048] The expression for NWDLoss is:
$$L_{NWD} = 1 - NWD(N_p, N_g)$$
[0049] where $N_p$ is the Gaussian distribution model of the predicted box $p$, and $N_g$ is the Gaussian distribution model of the ground-truth box $g$.
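A minimal sketch of the loss follows, under two assumptions not stated in the text: a box (cx, cy, w, h) is modeled as a Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), for which the squared second-order Wasserstein distance reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2); and the constant C = 12.8 is only an illustrative placeholder, since the text says C depends on the dataset. An IoU function is included purely for contrast.

```python
import math

def wasserstein2_sq(box_a, box_b):
    """Squared 2-Wasserstein distance between the Gaussians of two boxes,
    each given as (cx, cy, w, h) and modeled as N(center, diag(w^2/4, h^2/4))."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    return ((cxa - cxb) ** 2 + (cya - cyb) ** 2
            + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance in (0, 1]; 1 means identical boxes."""
    return math.exp(-math.sqrt(wasserstein2_sq(box_a, box_b)) / c)

def nwd_loss(pred, gt, c=12.8):
    """NWDLoss = 1 - NWD(N_p, N_g)."""
    return 1.0 - nwd(pred, gt, c)

def iou(box_a, box_b):
    """Plain IoU of two (cx, cy, w, h) boxes, for comparison only."""
    ax1, ay1, ax2, ay2 = (box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2,
                          box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2)
    bx1, by1, bx2, by2 = (box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2,
                          box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2)
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * \
            max(0.0, min(ay2, by2) - max(ay1, by1))
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

gt = (100.0, 100.0, 6.0, 6.0)    # a 6 x 6 pixel defect
pred = (106.0, 100.0, 6.0, 6.0)  # same size, shifted 6 pixels to the right
print(iou(pred, gt), nwd(pred, gt), nwd_loss(pred, gt))
```

For this small defect a 6-pixel shift already drives IoU to zero, which gives an IoU-based loss no graded signal, whereas NWD still decreases smoothly with the offset.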
[0050] E. The PSN-YOLO detection model is trained using the dataset and used for the automatic detection of defects on steel surfaces.
[0051] The trained model is applied to steel surface images to obtain the annotation information of the detected defects and output the detection results. The training parameters are set as follows: epochs=200, batch=16, imgsz=640, and an initial learning rate of 0.01.
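The stated settings can be collected into a configuration, sketched below. The parameter names epochs, batch and imgsz match training arguments of the Ultralytics YOLO Python package, but the use of that package is an assumption here, as are the `lr0` key for the initial learning rate and the hypothetical file names passed to `train`; a custom model definition would also need the ParameterNet and SPPELAN modules to be available to the framework.

```python
import math

# Hyperparameters stated in the text; "lr0" as the key for the initial
# learning rate is an assumption based on the Ultralytics naming scheme.
TRAIN_ARGS = {"epochs": 200, "batch": 16, "imgsz": 640, "lr0": 0.01}

def iterations(n_train_images, batch, epochs):
    """Optimizer steps per epoch and in total for a given training set size."""
    per_epoch = math.ceil(n_train_images / batch)
    return per_epoch, per_epoch * epochs

def train(model_yaml, data_yaml):
    """Hypothetical training entry point, assuming the Ultralytics package.

    model_yaml and data_yaml are placeholder paths (for example a custom
    model definition and a dataset description); they are not given in the text.
    """
    from ultralytics import YOLO  # imported lazily; optional dependency
    model = YOLO(model_yaml)
    return model.train(data=data_yaml, **TRAIN_ARGS)

# With 1800 images split 8:1:1, the training set holds 1440 images.
n_train = 1800 * 8 // 10
per_epoch, total = iterations(n_train, TRAIN_ARGS["batch"], TRAIN_ARGS["epochs"])
print(per_epoch, total)  # 90 18000
```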
[0052] Figure 5 shows the precision-recall (PR) curves of the PSN-YOLO model used in this invention for recognizing steel surface defects in images. Experimental results show that the mean average precision (mAP50) of the PSN-YOLO model reaches 78.7%, which is 4.3% higher than that of the original YOLOv8 model. The PSN-YOLO model has 4.6M parameters, 1.6M more than the original YOLOv8 model, and requires 7.8 GFLOPs, 0.3 GFLOPs fewer than the original YOLOv8 model. The PSN-YOLO model thus improves detection accuracy and reduces the computational load, at the cost of 1.6M additional parameters.
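The baseline figures implied by these results can be recovered with simple arithmetic, sketched below. It assumes that the stated 4.3% gain in mAP50 is a difference in percentage points; the text does not say so explicitly, and the dictionary keys are illustrative names.

```python
# Reported PSN-YOLO results and their stated differences from YOLOv8.
psn = {"map50": 78.7, "params_m": 4.6, "gflops": 7.8}
delta = {"map50": +4.3, "params_m": +1.6, "gflops": -0.3}

# Implied baseline values (assuming the mAP50 gain is in percentage points).
baseline = {k: round(psn[k] - delta[k], 1) for k in psn}

# Change relative to the baseline, in percent.
relative = {k: round(100 * delta[k] / baseline[k], 1) for k in psn}

print(baseline)  # {'map50': 74.4, 'params_m': 3.0, 'gflops': 8.1}
print(relative)  # {'map50': 5.8, 'params_m': 53.3, 'gflops': -3.7}
```

Under this reading, the parameter count grows by about half while the FLOPs fall by a few percent, which matches the behavior expected from dynamic convolution: many more weights, almost no extra computation.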
[0053] In summary, addressing the issues of insufficient accuracy in identifying steel surface defects in complex industrial environments, easy omission of small-scale defects, and high computational overhead, this invention proposes a steel surface defect detection method based on PSN-YOLO. This method introduces a ParameterNet dynamic convolution module, a SPPELAN multi-scale feature fusion structure, and an NWDLoss loss function into the YOLOv8 model, achieving a synergistic improvement in feature representation capability and small target detection stability. Compared to baseline models, the proposed PSN-YOLO significantly improves detection accuracy while maintaining computational efficiency, meeting the real-time detection requirements of practical industrial scenarios and demonstrating good stability and practicality in steel surface defect detection tasks. The implementation of this invention contributes to the automated and high-precision detection of steel surface defects, and has positive implications for improving industrial product quality control and production efficiency.
Claims
1. A method for detecting surface defects in steel based on PSN-YOLO, characterized in that it includes the following steps:
A. Annotate and preprocess images of steel surface defects to construct a defect detection dataset;
B. Construct a PSN-YOLO detection model and introduce a ParameterNet dynamic convolution module into the backbone network to enhance feature extraction capabilities;
C. Introduce the SPPELAN multi-scale feature fusion module into the backbone network to improve the detection capability of defects at different scales;
D. Replace the original bounding box regression loss function with NWDLoss to improve the localization stability of small target defects;
E. Train the PSN-YOLO detection model using the dataset and use it for the automatic detection of defects on steel surfaces.
2. The method according to claim 1, wherein bounding box annotations are performed on the surface defect image of steel, and the annotation information is converted into normalized coordinate data in YOLO format.
3. The method according to claim 1, wherein the ParameterNet dynamic convolution module generates dynamic convolution weights by weighted fusion of multiple convolution kernels, wherein the weighting coefficients are generated jointly by global average pooling and multilayer perceptron.
4. The method according to claim 3, wherein the dynamic convolution output of the ParameterNet dynamic convolution module is obtained by linear weighted summation of multiple convolution kernels, so as to control the amount of computation while increasing the parameter expressiveness.
5. The method according to claim 1, wherein the SPPELAN module obtains features at different scales through multiple max pooling operations, and concatenates and fuses the features at each scale to expand the receptive field and enhance the multi-scale feature representation capability.
6. The method according to claim 1, wherein the NWDLoss reduces the impact of small target position deviation on the training process by modeling the predicted box and the ground truth box as a two-dimensional Gaussian distribution and calculating the loss based on the normalized Wasserstein distance.
7. The method according to claim 1, wherein the PSN-YOLO detection model adopts an end-to-end training method to achieve automatic identification of multiple types of steel surface defects.