An industrial surface defect segmentation network and method based on dynamic snake convolution and multi-scale feature selection

By introducing a network architecture with dynamic serpentine convolution and multi-scale feature selection, the problems of blurred boundaries of slender and curved defects and insufficient multi-scale fusion are solved, realizing efficient detection and real-time segmentation of minute defects, and meeting the high-efficiency inspection needs of industrial production lines.

CN122290118A · Pending · Publication Date: 2026-06-26 · HARBIN NAISHI INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
HARBIN NAISHI INTELLIGENT TECH CO LTD
Filing Date
2026-05-26
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies suffer from severe boundary blurring and fracture problems when dealing with slender and curved defects. They are unable to adaptively fuse multi-scale features, have poor detection performance for small defects, and lack real-time performance, making it difficult to meet the high-efficiency detection needs of industrial production lines.

Method used

The traditional convolution kernel is replaced by a dynamic snake convolution module (DSConv Block), combined with a progressive content-aware upsampling module (PCAU) and a gated multi-scale selector module (GMS). The main branch and the auxiliary branch are jointly optimized, and the model is made lightweight through depthwise separable convolution and TensorRT optimization.

Benefits of technology

It significantly improves the edge continuity of slender and curved defects, enhances the detection rate of micro-defects and the segmentation performance of multi-scale defects, and achieves a real-time detection speed of over 200 frames per second, meeting the real-time requirements of industrial production lines.

✦ Generated by Eureka AI based on patent content.

Abstract

The invention provides an industrial surface defect segmentation network and method based on dynamic serpentine convolution and multi-scale feature selection, relating to the fields of computer vision and industrial automation inspection technology. It addresses the shortcomings of existing technologies, such as ineffective handling of slender and curved defect boundaries, inability to adaptively fuse multi-scale features, missed detection of minute defects, and insufficient real-time performance. The method adapts the shape of the convolution kernel through a dynamic serpentine convolution module, addressing the insensitivity of traditional convolution kernels to slender, curved structures; employs a progressive content-aware upsampling module to retain more detail; introduces a gated multi-scale selector module to fuse multi-scale features; uses a dual-branch edge-region segmentation mechanism to improve edge and region segmentation accuracy; and adopts a lightweight inference architecture to meet real-time detection requirements. The method is suitable for efficient segmentation and real-time detection of industrial surface defects.

Description

Technical Field

[0001] This invention relates to the fields of computer vision and industrial automation inspection technology, specifically to a deep learning network architecture and method for pixel-level defect segmentation on the surface of industrial products.

Background Technology

[0002] In industrial production, surface defect detection is a crucial step in ensuring product quality and improving production efficiency, especially in high-precision manufacturing fields such as automotive, aerospace, and electronics. With the rapid development of automation and intelligence, surface defect detection technologies based on computer vision and deep learning have become mainstream solutions. Existing defect segmentation technologies are mainly based on traditional convolutional neural networks (CNNs) and feature fusion methods, using image processing algorithms to identify and locate surface defects.

[0003] Current state of research. Methods based on convolutional neural networks (CNNs): CNNs are currently the core method used in surface defect detection. They learn image features automatically by training on large amounts of labeled image data, thereby achieving automatic defect identification and localization. Common network architectures such as U-Net and ResNet have achieved certain results in defect segmentation tasks. U-Net effectively recovers the spatial information of images through its encoder-decoder structure, while ResNet allows deeper networks through residual connections, improving the network's expressive power and feature extraction performance.

[0004] Feature fusion and multi-scale processing methods: To improve the model's adaptability to defects of different sizes, many studies have employed Feature Pyramid Network (FPN) or skip connection techniques. These methods enhance the ability to identify both large and small defects by fusing feature maps at different scales. However, these methods typically rely on fixed scales and predefined feature map fusion strategies, resulting in relatively rigid performance when dealing with defects with significant size differences.

[0005] Improved loss function and optimization algorithm: In defect segmentation tasks, the design of the loss function is also a crucial factor affecting performance. Many studies have employed loss functions such as Dice Loss and BCE Loss to measure the similarity between the predicted results and the true labels. Furthermore, algorithms such as the AdamW optimizer are widely used in the training process to ensure the model's convergence speed and accuracy.

[0006] Technical problems with existing technologies: Despite the progress made in defect segmentation, several challenges remain. First, existing convolutional neural networks still suffer from significant boundary blurring and breakage issues when processing elongated, curved defects (such as microcracks), failing to accurately extract edge information. Second, while traditional feature pyramids and skip connection methods can fuse multi-scale features, they often fail to achieve dynamic adaptive fusion of features at different scales, resulting in insufficient ability to identify defects with large scale differences. Furthermore, existing upsampling methods may lose spatial information when recovering details, particularly performing poorly on small defects. Finally, traditional network structures typically cannot meet the real-time and efficiency requirements of industrial production lines, struggling to achieve inference speeds exceeding 200 FPS.

[0007] In summary, existing technologies have shortcomings such as inability to effectively handle slender and curved defect boundaries, inability to adaptively fuse multi-scale features, missed detection of minute defects, and insufficient real-time performance.

Summary of the Invention

[0008] To address the shortcomings of existing technologies, such as ineffective handling of slender and curved defect boundaries, inability to adaptively fuse multi-scale features, missed detection of minute defects, and insufficient real-time performance, this invention provides the following technical solution. An industrial surface defect segmentation network and method based on dynamic serpentine convolution and multi-scale feature selection includes: a step of acquiring high-resolution images of automotive parts, performing data augmentation through random scaling, flipping, and HSV color enhancement, and synthesizing different types of defect data; a step of building the encoder based on ResNet-50 and replacing the 3×3 convolutional layer of each Bottleneck with a dynamic serpentine convolution module, in which the dynamically predicted sampling points of the convolution kernel form a curved path; a step of applying a progressive content-aware upsampling module in the decoder, using the same-scale features of the encoder to guide the generation of the upsampling kernel; a step of introducing a gated multi-scale selector module at the fusion point of the encoder and decoder, which combines a channel reweighting unit and a spatial gating unit to adaptively fuse multi-scale features; a step of jointly optimizing a main branch and an auxiliary branch, with the main branch performing region segmentation and the auxiliary branch performing edge detection; and a step of making the model lightweight through depthwise separable convolutions and TensorRT optimization, reducing computational overhead and achieving inference speeds exceeding 200 frames per second.

[0009] Furthermore, in a preferred embodiment, the boundary features of slender, curved defects are extracted by dynamically predicting the sampling points of the convolution kernel.

[0010] Furthermore, in a preferred embodiment, the generation of the upsampling kernel is guided by encoder-scale features, which reduces the loss of high-frequency details.

[0011] Furthermore, in a preferred embodiment, the channel reweighting unit and the spatial gating unit perform adaptive fusion of multi-scale features.

[0012] Furthermore, in a preferred embodiment, defect region segmentation is performed through the main branch, and edge detection is performed through the auxiliary branch.

[0013] Furthermore, in a preferred embodiment, the computational overhead of the model is reduced through depthwise separable convolution and TensorRT optimization.

[0014] Based on the same inventive concept, this invention also provides an industrial surface defect segmentation network and apparatus based on dynamic serpentine convolution and multi-scale feature selection, comprising: a module that acquires high-resolution images of automotive parts, performs data augmentation through random scaling, flipping, and HSV color enhancement, and synthesizes different types of defect data; a module that builds the encoder based on ResNet-50 and replaces the 3×3 convolutional layer of each Bottleneck with a dynamic serpentine convolution module, in which the dynamically predicted sampling points of the convolution kernel form a curved path; a module that applies a progressive content-aware upsampling module in the decoder, using the same-scale features of the encoder to guide the generation of the upsampling kernel; a module that introduces a gated multi-scale selector module at the fusion point of the encoder and decoder, combining a channel reweighting unit and a spatial gating unit to adaptively fuse multi-scale features; a module that jointly optimizes a main branch and an auxiliary branch, with the main branch performing region segmentation and the auxiliary branch performing edge detection; and a module that makes the model lightweight through depthwise separable convolutions and TensorRT optimization, reducing computational overhead and achieving inference speeds exceeding 200 frames per second.

[0015] Based on the same inventive concept, the present invention also provides a computer storage medium for storing a computer program, wherein when the computer program is read by a computer, the computer executes the method described above.

[0016] Based on the same inventive concept, the present invention also provides a computer, including a processor and a storage medium, wherein when the processor reads a computer program stored in the storage medium, the computer executes the method described above.

[0017] Based on the same inventive concept, the present invention also provides a computer program product, which, when executed, implements the method described.

[0018] Compared with the prior art, the advantages of the technical solution provided by the present invention are as follows: This solution significantly improves the ability of existing convolutional neural networks to handle slender and curved defects by introducing a dynamic serpentine convolutional module (DSConv Block). Traditional convolutional kernels struggle to effectively capture the boundary features of slender defects, leading to blurry or broken segmentation results. By dynamically predicting the sampling points of the convolutional kernel to form a continuous curved path, DSConv Block can better adapt to the boundary morphology of minute defects such as cracks. After replacing the traditional 3×3 convolution with DSConv Block in each Bottleneck layer of the encoder, the edge continuity of slender defects such as cracks is significantly improved, with an 8% increase in boundary IoU. Compared to traditional convolutional networks, DSConv Block makes this solution more accurate in segmenting slender defects, effectively solving the problem that traditional networks cannot adapt to complex shape boundaries.

[0019] Existing upsampling methods, especially pooling operations and strided convolutions, often lead to the loss of high-frequency details in images, particularly for the detection of small defects, where the loss of spatial details severely impacts the model's detection capabilities. To address this issue, this scheme introduces a Progressive Content-Aware Upsampling (PCAU) module into the decoder, using features of the same scale as the encoder to guide the generation of the upsampling kernel. In this way, PCAU can effectively reduce the loss of high-frequency details during the upsampling process, thereby improving the detection rate of small defects (pixel area less than 0.1%). Experimental results show that the detection rate of small defects is improved by 15%, and compared with traditional decoder upsampling methods, PCAU significantly improves the adaptability and accuracy for small-sized defects.

[0020] Traditional Feature Pyramid Network (FPN) and skip connection methods, while capable of fusing multi-scale features, typically rely on fixed feature map fusion strategies, making it difficult to adapt the weights of features at different scales. In this scheme, the Gated Multi-Scale Selector (GMS) module achieves intelligent adaptive fusion of cross-scale features by introducing a Channel Reweighting Unit (SEBlock) and a Spatial Gating Unit. The Channel Reweighting Unit calibrates the channel importance of input features, while the Spatial Gating Unit generates a spatial weight map highlighting suspected defect regions. By dynamically selecting and strengthening effective features, this module significantly improves the segmentation performance of multi-scale defects, increasing the mIoU value to 86.5%, a significant improvement compared to SegFormer (82.4%). Compared to existing technologies, the GMS module can more flexibly address the challenges of defects at different scales, avoiding the poor performance of traditional methods when scale differences are large.
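The mIoU figures quoted above are class-averaged intersection-over-union scores. The source does not give its evaluation code, so the following is only a minimal sketch of the standard definition; the function name `mean_iou` and the flat label-list input format are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two flat lists of integer class labels.

    Classes that appear in neither prediction nor ground truth are skipped
    so that they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    if not ious:
        return 0.0
    return sum(ious) / len(ious)


# Four pixels, two classes (0 = background, 1 = defect).
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In this toy case the background class scores 1/2 and the defect class 2/3, so the mean is 7/12.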

[0021] Traditional defect segmentation methods often struggle to balance the accuracy of defect regions and boundaries, especially in detecting low-contrast defects. This proposed solution introduces a dual-branch edge-region segmentation mechanism. The main branch outputs the defect region segmentation result, while the auxiliary branch connects to a lightweight edge detection head, outputting the edge map of the defect. Through this mechanism, the model can not only accurately segment defect regions but also precisely extract edge information, particularly improving the recall rate of low-contrast defects (such as shallow scratches on transparent materials) by 10%. Compared to existing methods, this approach significantly improves the segmentation accuracy of boundaries and regions by jointly optimizing region segmentation and edge detection.

[0022] In existing technologies, deep learning models typically require significant computational resources and time, especially in real-time detection, where many complex networks cannot meet the demands of high frame rates (>200 FPS). To address this issue, this solution employs a lightweight inference architecture, significantly reducing the model's computational load through the use of depthwise separable convolutions and TensorRT optimization. Through channel compression, the solution's parameter count is only 6.1M, greatly reducing computational overhead compared to traditional networks while retaining sufficient accuracy. This enables real-time deployment in industrial production lines, meeting the requirements for efficient inference, achieving high real-time detection speeds, and reducing hardware deployment costs. Compared to traditional networks, this lightweight architecture improves the model's real-time performance and adaptability, making it more practical in real-world applications.
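To make the saving from depthwise separable convolution concrete, the weight counts of the two layer types can be compared directly. This is a back-of-the-envelope sketch only: the 256-channel, 3×3 layer is a hypothetical example, not a layer size taken from the source, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 256 -> 256 channels with a 3 x 3 kernel.
standard = conv_params(256, 256, 3)                  # 589824 weights
separable = depthwise_separable_params(256, 256, 3)  # 67840 weights
reduction = standard / separable                     # roughly 8.7x fewer weights
```

The ratio approaches k² (here 9) as the number of output channels grows, which is why the substitution pays off most in wide layers.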

[0023] It is suitable for efficient segmentation and real-time detection of industrial surface defects, especially in quality control work in fields such as automobile manufacturing, electronics and aerospace.

Detailed Implementation

[0024] To make the advantages and benefits of the technical solution provided by the present invention clearer, the technical solution is described in further detail below. Implementation Method 1: This implementation method provides an industrial surface defect segmentation network and method based on dynamic serpentine convolution and multi-scale feature selection, including: a step of acquiring high-resolution images of automotive parts, performing data augmentation through random scaling, flipping, and HSV color enhancement, and synthesizing different types of defect data; a step of building the encoder based on ResNet-50 and replacing the 3×3 convolutional layer of each Bottleneck with a dynamic serpentine convolution module, in which the dynamically predicted sampling points of the convolution kernel form a curved path; a step of applying a progressive content-aware upsampling module in the decoder, using the same-scale features of the encoder to guide the generation of the upsampling kernel; a step of introducing a gated multi-scale selector module at the fusion point of the encoder and decoder, which combines a channel reweighting unit and a spatial gating unit to adaptively fuse multi-scale features; a step of jointly optimizing a main branch and an auxiliary branch, with the main branch performing region segmentation and the auxiliary branch performing edge detection; and a step of making the model lightweight through depthwise separable convolutions and TensorRT optimization, reducing computational overhead and achieving inference speeds exceeding 200 frames per second.

[0025] Boundary features of slender, curved defects are extracted by dynamically predicting the sampling points of the convolution kernel.

[0026] By guiding the generation of upsampling kernels using encoder-scale features, the loss of high-frequency details is reduced.

[0027] The channel reweighting unit and the spatial gating unit perform adaptive fusion of multi-scale features.

[0028] The main branch is used for defect region segmentation, and the auxiliary branch is used for edge detection.

[0029] The computational overhead of the model is reduced through depthwise separable convolution and TensorRT optimization.

[0030] A network and apparatus for segmenting industrial surface defects based on dynamic serpentine convolution and multi-scale feature selection are also provided, including: a module that acquires high-resolution images of automotive parts, performs data augmentation through random scaling, flipping, and HSV color enhancement, and synthesizes different types of defect data; a module that builds the encoder based on ResNet-50 and replaces the 3×3 convolutional layer of each Bottleneck with a dynamic serpentine convolution module, in which the dynamically predicted sampling points of the convolution kernel form a curved path; a module that applies a progressive content-aware upsampling module in the decoder, using the same-scale features of the encoder to guide the generation of the upsampling kernel; a module that introduces a gated multi-scale selector module at the fusion point of the encoder and decoder, combining a channel reweighting unit and a spatial gating unit to adaptively fuse multi-scale features; a module that jointly optimizes a main branch and an auxiliary branch, with the main branch performing region segmentation and the auxiliary branch performing edge detection; and a module that makes the model lightweight through depthwise separable convolutions and TensorRT optimization, reducing computational overhead and achieving inference speeds exceeding 200 frames per second.

[0031] A computer storage medium is also provided for storing a computer program, which, when read by the computer, executes the method.

[0032] A computer is also provided, including a processor and a storage medium, wherein the computer executes the method when the processor reads a computer program stored in the storage medium.

[0033] A computer program product is also provided, which, when executed, implements the method described.

[0034] Implementation Method Two: This implementation method is a further detailed description of the technical solution provided in Implementation Method One, specifically: Implementing this solution requires initial data preparation and preprocessing. During the data acquisition phase, high-resolution image data of automotive parts is first acquired. The acquired data should cover various surface materials, taking into account the reflectivity of metals, plastics, and other industrial materials, as well as variations in lighting and environment, ensuring coverage of various potential operating conditions. After data acquisition, image enhancement processing is performed, including random scaling, flipping, and HSV color enhancement. These operations help simulate low-contrast scenes, ensuring the model can adapt to different lighting conditions. Simultaneously, synthetic defect data needs to be generated, especially for minute defects (such as microcracks and shallow scratches). These defects may have a pixel area of less than 0.1%, and simulating these defects helps improve the model's ability to detect small defects.
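The three augmentations named above (random scaling, flipping, HSV color enhancement) can be sketched with the Python standard library alone. The source gives no parameter ranges, so the scale range, flip probability, and jitter ranges below are assumptions, as are all function names; images are represented as nested lists of `(r, g, b)` floats in [0, 1] purely for illustration.

```python
import colorsys
import random


def hflip(img):
    """Mirror an image stored as a list of rows."""
    return [row[::-1] for row in img]


def scale_nearest(img, factor):
    """Resize by `factor` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(nw)] for y in range(nh)]


def hsv_jitter(img, dh, ds, dv):
    """Shift hue by dh (wrapping); scale saturation by ds and value by dv (clamped)."""
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h + dh) % 1.0
            s = min(1.0, max(0.0, s * ds))
            v = min(1.0, max(0.0, v * dv))
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        out.append(new_row)
    return out


def augment(img, rng):
    """Apply the three augmentations with assumed (not source-given) ranges."""
    img = scale_nearest(img, rng.uniform(0.5, 1.5))
    if rng.random() < 0.5:
        img = hflip(img)
    # A value factor below 1 darkens the image, imitating a low-contrast scene.
    return hsv_jitter(img, rng.uniform(-0.02, 0.02),
                      rng.uniform(0.7, 1.3), rng.uniform(0.5, 1.2))


image = [[(0.8, 0.4, 0.2)] * 4 for _ in range(4)]
augmented = augment(image, random.Random(0))
```

For segmentation training, the geometric operations (scaling and flipping) would also have to be applied identically to the label mask, while the color jitter applies to the image only.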

[0035] Next, the network model is constructed. The entire network adopts an encoder-decoder structure. The encoder's task is to extract multi-scale features from the image. The network is based on ResNet-50, and on this basis, the 3×3 convolutional layer of each Bottleneck in the encoder is replaced with a dynamic serpentine convolutional module (DSConv Block). This module effectively solves the problem of insufficient extraction capability of traditional convolutional kernels for slender and curved defects (such as microcracks) by predicting the sampling points of the convolutional kernel and forming a continuous serpentine path. Through this innovation, DSConv Block can capture the edge information of slender defects such as cracks, improve the continuity of cracks and other slender defect edges, and significantly improve the boundary IoU value.

[0036] The decoder section uses a symmetrical upsampling path, where each decoder layer employs a Progressive Content-Aware Upsampling (PCAU) module. The PCAU module is designed to guide the generation of upsampling kernels using features of the same scale as the encoder, avoiding the loss of detail caused by pooling or stride convolution during conventional upsampling. Especially in the detection of minute defects, PCAU effectively enhances the ability to recover details and improves the detection rate of minute defects.

[0037] To fuse multi-scale features, a gated multi-scale selector (GMS) module was employed. The GMS module includes a channel reweighting unit (SEBlock) and a spatial gating unit. SEBlock weights the channels of the input features, helping the network automatically identify the importance of different features. The spatial gating unit generates a spatial weight map, highlighting suspected defect areas, allowing the network to focus more intently on these regions. Finally, the gated fusion module selects and strengthens the most effective features. In this way, GMS can dynamically and adaptively fuse features from different scales, significantly improving the segmentation accuracy for various defects, especially demonstrating strong adaptability to defects with large scale differences.

[0038] During model training, the AdamW optimizer was used to optimize the network parameters, with an initial learning rate of 1e-4 and a cosine annealing decay schedule to ensure gradual convergence. The loss combines several terms with fixed weights: 0.6 × Dice Loss to measure the similarity of segmented regions; 0.3 × BAL loss to address class imbalance; and 0.1 × BCE Loss to ensure the model can effectively segment all defective regions. Training was run on an NVIDIA RTX 4090 GPU with a batch size of 16 for 300 epochs to ensure the model could learn and optimize sufficiently. The training data covered various lighting, reflection, and texture variations to enhance the model's robustness in different environments.
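The weighted loss and the learning-rate schedule described in this paragraph can be written out as follows. This is a sketch under stated assumptions: the source does not define the BAL loss, so it is passed in as an already-computed number (`bal_term`) rather than implemented; the minimum learning rate of 0 and the smoothing constants are also assumptions. Predictions are flat lists of probabilities and targets are flat lists of 0/1 labels.

```python
import math


def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient for flat lists of probabilities and 0/1 labels."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def total_loss(pred, target, bal_term):
    """0.6 * Dice + 0.3 * BAL + 0.1 * BCE; BAL is supplied by the caller."""
    return (0.6 * dice_loss(pred, target)
            + 0.3 * bal_term
            + 0.1 * bce_loss(pred, target))


def cosine_lr(epoch, total_epochs=300, base_lr=1e-4, min_lr=0.0):
    """Cosine annealing from base_lr at epoch 0 down to min_lr at total_epochs."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


loss = total_loss([0.9, 0.1, 0.8], [1, 0, 1], bal_term=0.0)
lr_mid = cosine_lr(150)  # halfway through training: half the initial rate
```

Under this schedule the rate starts at 1e-4, passes through 5e-5 at epoch 150, and reaches the assumed minimum at epoch 300.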

[0039] After training, the model enters the deployment and inference phase. First, the trained model weights are converted to ONNX or TensorRT format for deployment on industrial computers or edge computing devices. The deployed model will be able to process image data in real time, outputting defect segmentation masks and edge maps. To ensure the model meets real-time requirements, the entire network architecture has been lightweight optimized. By using depthwise separable convolutions and TensorRT optimization, the model's parameter count is only 6.1M, significantly reducing computational resource consumption compared to traditional deep learning models. Simultaneously, the model can achieve an inference speed of over 200 frames per second, fully meeting the real-time detection needs of industrial production lines.

[0040] In this embodiment, DS-ResNet (Dynamic Snake Convolution Residual Network) is the proprietary name for the encoder part. It uses the classic ResNet (Residual Network) as its backbone architecture, but its core convolutional layers are replaced with Dynamic Snake Convolution modules. This design aims to enable the network to flexibly adapt to the edge contours of defects like a "snake," and is particularly optimized for feature extraction of elongated, curved, low-contrast defects such as microcracks.

[0041] Stages 0-4 (encoder stages 0 to 4) represent the five stages of the DS-ResNet encoder. Each "stage" typically contains a set of residual blocks and undergoes downsampling once (except for the first stage), progressively expanding the receptive field and extracting features at different scales. Stage 0 is usually the initial convolutional layer, and Stage 4 is the deepest layer, containing the most abstract semantic information.

[0042] Content-Gated Decoder is the proprietary name for the decoder portion in this embodiment. "Content gating" refers to the core mechanism of this decoder. It uses a Progressive Content-Aware Upsampling (PCAU) module to dynamically generate upsampling kernels, guided by detailed features (content) from the encoder, achieving intelligent feature map amplification. Simultaneously, through a Gated Multi-Scale Selector (GMS) module, it selectively and adaptively fuses features from the encoder and decoder, like a "gate," suppressing irrelevant background and enhancing defective areas.

[0043] LEVEL 1-4 (decoder levels 1 to 4) represent the four upsampling and fusion levels of the Content-Gated Decoder. LEVEL 1 corresponds to the starting point for upsampling the deepest features, and LEVEL 4 corresponds to the last layer before the final output resolution. Each "Level" contains two core operations: PCAU upsampling and GMS feature fusion.

[0044] This can be understood as: F_guide (guided features) points from DS-ResNet (left side) to Content-Gated Decoder (right side). It refers to feature maps extracted from shallow or intermediate layers of the encoder (such as Stages 0, 1, 2, 3), containing rich spatial details and edge information. These are input into the decoder's PCAU module as "guides" to direct high-quality upsampling of deep semantic features to reconstruct accurate defect boundaries.

[0045] F_low (low-level features) refers to encoder features with the same or similar spatial resolution as the current layer of the decoder. These features are input directly into the GMS module of the corresponding layer and fused, through gating, with the upsampled deep features to supplement the details lost during decoding.

[0046] The data dimensions (C, H, W: number of channels, height, width) are used to describe the shape of the feature map tensor. For example, (64, 56, 56) means that the feature map has 64 channels and a spatial size of 56 pixels × 56 pixels.

[0047] CONV (convolution) is the basic linear operation unit of a convolutional neural network.

[0048] DSCONV (Dynamic Snake Convolution) is the core innovative operator in this implementation. The position of its convolution kernel sampling points can be dynamically predicted and offset according to the input features, forming a continuously curved sampling path to better fit the winding edge of the defect.
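The idea of a continuously curved sampling path can be illustrated for a single one-dimensional kernel laid along the x-axis. The source does not specify how offsets are constrained, so this sketch assumes the accumulation scheme from the published dynamic snake convolution literature: each tap's vertical offset is bounded by `tanh` and accumulated outward from the centre, so adjacent taps never drift more than one pixel apart. The function names and the bilinear read at fractional coordinates are likewise illustrative assumptions, and the offsets, which the real module would predict from the input features, are passed in by hand.

```python
import math


def snake_path(center_x, center_y, offsets):
    """Sampling coordinates for a 1-D kernel with one raw offset per tap.

    `offsets` must have odd length. The centre tap stays fixed (its offset is
    ignored); walking outward, each tap adds a tanh-bounded step to the
    running vertical displacement, which keeps the path continuous.
    """
    k = len(offsets)
    mid = k // 2
    ys = [0.0] * k
    for i in range(mid + 1, k):        # walk right from the centre
        ys[i] = ys[i - 1] + math.tanh(offsets[i])
    for i in range(mid - 1, -1, -1):   # walk left from the centre
        ys[i] = ys[i + 1] + math.tanh(offsets[i])
    return [(center_x + (i - mid), center_y + ys[i]) for i in range(k)]


def bilinear_sample(fmap, x, y):
    """Read a 2-D feature map at fractional (x, y), clamping to the border."""
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
    bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def snake_conv_at(fmap, cx, cy, weights, offsets):
    """Weighted sum of the feature map sampled along the snake path."""
    path = snake_path(cx, cy, offsets)
    return sum(w * bilinear_sample(fmap, x, y)
               for w, (x, y) in zip(weights, path))


path = snake_path(3, 3, [0.3, -0.5, 0.0, 0.8, 1.2])
```

With all offsets at zero the path degenerates to a straight row of taps, i.e. an ordinary 1-D convolution; non-zero offsets bend it toward the defect edge.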

[0049] Batch Normalization (BN) standardizes each batch of data, accelerating network training and improving stability.

[0050] RELU (Rectified Linear Unit) is a commonly used nonlinear activation function.

[0051] MAXPOOL (Max Pooling) takes the maximum value in a local region, achieving downsampling and feature invariance enhancement.

[0052] BTNK (Bottleneck Block) is a basic building block of residual networks that first compresses the channels, then applies a convolution, and finally expands the channels. It is named for its bottleneck-like shape.

[0053] LEVEL N (the Nth decoding layer) refers to any standard processing layer in the decoding path (e.g., N=1,2,3,4). This naming indicates that the decoder is composed of multiple such layers with the same structure but different parameters stacked together.

[0054] PCAU (Progressive Content-Aware Upsampling Module) receives F_low (deep features) and F_guide (detail features), dynamically generates an upsampling kernel using the content information of F_guide, and amplifies the spatial resolution of F_low (e.g., by 2 times) while preserving and reconstructing high-frequency details as much as possible.

[0055] The GMS (Gated Multiscale Selector Module) receives upsampled features from the PCAU module and fuses them with F_guide features. Through internal gating mechanisms (such as channel attention and spatial attention), it dynamically and selectively enhances defect-related features and suppresses irrelevant background, achieving intelligent feature fusion.

[0056] Bilinear interpolation is a classic upsampling method. Here, it is used to perform a fast, fixed-pattern initial upsampling of the compressed F_low features, increasing their spatial size from W_low to the target size W_h (W_guide). This operation does not add new semantic information.
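For reference, fixed-pattern bilinear upsampling of a single-channel map can be sketched as below. The source does not state the coordinate convention, so this sketch assumes corner-aligned sampling (input and output corners coincide); the function name `bilinear_resize` is illustrative.

```python
def bilinear_resize(fmap, out_h, out_w):
    """Resize a 2-D list to (out_h, out_w) with corner-aligned bilinear weights."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for oy in range(out_h):
        # Map the output row back to a fractional input row.
        sy = oy * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for ox in range(out_w):
            sx = ox * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
            bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


upsampled = bilinear_resize([[0.0, 1.0], [2.0, 3.0]], 3, 3)
```

Every output value is a fixed blend of its four nearest inputs, which is why this step adds no new semantic information and is followed by a learned, content-dependent reassembly.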

[0057] CONCAT (channel concatenation) joins the initially upsampled features of the upsampling path with the F_guide features processed by the guide path along the channel dimension, thus fusing the information from both.

[0058] KPM (Kernel Prediction Module) is a lightweight sub-network used to dynamically predict the weights of the upsampling reassembly kernels. It takes the concatenated features described above as input and generates a unique set of filter weights for each location in the output feature map.

[0059] CARAFE (Content-Aware Feature Reassembly) is an advanced upsampling mechanism. It utilizes a dynamic kernel predicted by KPM to perform weighted reassembly of local content on the feature map obtained from initial bilinear upsampling, thereby achieving clearer and more detail-preserving upsampling results than fixed interpolation.

[0060] SEBlock (Squeeze-and-Excitation Module) is a channel attention mechanism. It operates on the two input features separately, adaptively recalibrating the importance weight of each feature channel through "squeeze" (global average pooling) and "excitation" (fully connected layers and activation functions), suppressing noisy channels and enhancing useful channels.

[0061] Sigmoid (S-shaped activation function) maps each pixel value of the single-channel feature map (or scalar) output by the previous convolutional layer to the interval (0, 1), thereby generating a spatial weight map. The values in this map represent the importance of the features at the corresponding locations.

[0062] Elementwise Mul multiplies the generated spatial weight map pixel-by-pixel with the calibrated guide path features. The effect is to use the weight map as a "gating" mechanism to dynamically strengthen defect-related regions in the guide features while weakening background or irrelevant regions.

[0063] Weighted Features refer to the guidance path features after spatial gating weighting.

[0064] Elementwise Add adds the weighted guide path features to the calibrated main path features pixel by pixel. This is the final fusion step, integrating the gated detailed information with the deep semantic information.
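The gating steps in [0061] to [0064] can be illustrated with a minimal sketch in plain Python. It assumes single-channel maps stored as nested lists, and the names `gated_fuse`, `main`, `guide`, and `gate_logits` are hypothetical. The gate logits stand in for the output of the convolutional layer that precedes the Sigmoid, and that layer is not modeled here.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function; output lies in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gated_fuse(main, guide, gate_logits):
    """Fuse two H x W maps: main + sigmoid(gate_logits) * guide.

    main        -- calibrated main path features
    guide       -- calibrated guide path features
    gate_logits -- raw single-channel map that becomes the spatial weight map
    """
    fused = []
    for m_row, g_row, z_row in zip(main, guide, gate_logits):
        row = []
        for m, g, z in zip(m_row, g_row, z_row):
            weight = sigmoid(z)      # spatial weight for this pixel
            weighted = weight * g    # Elementwise Mul: gate the guide feature
            row.append(m + weighted) # Elementwise Add: final fusion
        fused.append(row)
    return fused
```

A large positive logit passes the guide feature through almost unchanged, while a large negative logit suppresses it, which is the selective behavior the gate is meant to provide.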

[0065] First, in the data preparation and preprocessing stage, the system acquires high-resolution image data of automotive parts. To enhance the model's generalization ability, the acquired data should cover various lighting conditions, reflections, and different surface materials. After image acquisition, data augmentation techniques are applied, including random scaling, flipping, and HSV color enhancement to simulate low-contrast scenes. These augmentations help the model adapt to diverse environmental changes and improve its ability to handle different surface defects (such as microcracks and scratches). Synthetic defect data is also an important part of the dataset, especially for micro-defects (such as cracks occupying less than 0.1% of the pixel area), which helps ensure that the model can detect minute industrial surface defects.
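As an illustration of the flipping and HSV color enhancement described above, the following sketch uses only the Python standard library. The function names and the jitter ranges are assumptions chosen for the example and are not values given in this description. Images are represented as rows of RGB tuples with components in [0, 1].

```python
import colorsys
import random

def hsv_jitter(pixel, h_shift=0.0, s_scale=1.0, v_scale=1.0):
    """Shift hue and scale saturation/value of one RGB pixel (floats in [0, 1])."""
    r, g, b = pixel
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + h_shift) % 1.0               # hue is cyclic
    s = min(max(s * s_scale, 0.0), 1.0)   # clamp to the valid range
    v = min(max(v * v_scale, 0.0), 1.0)
    return colorsys.hsv_to_rgb(h, s, v)

def augment(image, rng):
    """Random horizontal flip followed by one HSV jitter shared by all pixels."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    # Assumed ranges: lowering saturation and value darkens the image and
    # shrinks intensity differences, roughly mimicking a low-contrast scene.
    h_shift = rng.uniform(-0.02, 0.02)
    s_scale = rng.uniform(0.5, 1.0)
    v_scale = rng.uniform(0.4, 1.0)
    return [[hsv_jitter(p, h_shift, s_scale, v_scale) for p in row]
            for row in image]
```

Passing a seeded generator such as `random.Random(0)` makes the augmentation reproducible. In a segmentation setting the same flip would also have to be applied to the label mask.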

[0066] The network model adopts a classic encoder-decoder structure. The encoder's role is to extract multi-scale features from the input image. To enhance the recognition of complex defect morphologies (such as cracks and micro-scratches), the 3×3 convolution of each Bottleneck layer in the encoder is replaced with a dynamic serpentine convolution module (DSConv Block). Unlike a traditional convolution kernel, the DSConv Block dynamically predicts the sampling points of the kernel so that they form a curved path, thereby better capturing the boundary information of slender and curved defects. Traditional convolution kernels are not sensitive enough when extracting the boundaries of slender defects such as micro-cracks, whereas the curved-path kernel of the DSConv Block significantly improves the edge continuity of such defects and thereby the segmentation accuracy.
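The idea of sampling points that form a curved path can be sketched as follows. This is a simplified illustration and not the DSConv Block itself. It assumes a horizontal kernel whose points bend vertically, per-step offsets clamped to [-1, 1] and accumulated outward from the center so that neighboring points stay connected, and linear interpolation for fractional rows. All function names and the offset values are hypothetical, and in the network the offsets would come from the offset prediction network.

```python
import math

def snake_path(center, deltas_left, deltas_right):
    """Return sampling points (x, y), ordered left to right.

    center       -- (x, y) integer position of the kernel center
    deltas_left  -- predicted vertical offsets for steps to the left
    deltas_right -- predicted vertical offsets for steps to the right
    """
    cx, cy = center
    clamp = lambda d: max(-1.0, min(1.0, d))
    points = [(cx, float(cy))]
    y = float(cy)
    for step, d in enumerate(deltas_right, start=1):
        y += clamp(d)                 # accumulate so the path stays continuous
        points.append((cx + step, y))
    y = float(cy)
    for step, d in enumerate(deltas_left, start=1):
        y += clamp(d)
        points.insert(0, (cx - step, y))
    return points

def sample(feature, x, y):
    """Linearly interpolate feature[row][col] at column x and fractional row y."""
    h, w = len(feature), len(feature[0])
    if not 0 <= x < w:
        return 0.0
    y0 = math.floor(y)
    frac = y - y0
    at = lambda r: feature[r][x] if 0 <= r < h else 0.0  # zero padding
    return (1.0 - frac) * at(y0) + frac * at(y0 + 1)

def snake_conv_at(feature, center, deltas_left, deltas_right, weights):
    """Weighted sum of the feature map along the curved sampling path."""
    pts = snake_path(center, deltas_left, deltas_right)
    return sum(w * sample(feature, x, y) for w, (x, y) in zip(weights, pts))
```

With all offsets set to zero the path degenerates to a straight horizontal line, which corresponds to one row of an ordinary convolution kernel.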

[0067] The decoder employs a symmetrical upsampling path to ensure a smooth recovery from low to high resolution. Furthermore, each layer of the decoder embeds a Progressive Content-Aware Upsampling (PCAU) module. This module is designed to reduce the loss of high-frequency details that may occur during traditional upsampling, especially in the recovery of minute defects. The PCAU module uses encoder features of the same scale to guide the generation of the upsampling kernel, ensuring that important spatial information, particularly details of minute defects, is not lost during upsampling, thereby improving the model's detection rate for these defects.
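The content-aware reassembly step inside the PCAU module can be sketched as below. The sketch assumes that the feature map has already been upsampled to the target size, that a kernel prediction step has produced k×k logits for every output location, and that these logits are normalized with a softmax before use (a choice borrowed from the CARAFE design, not stated in this description). The names `reassemble` and `kernel_logits` are hypothetical.

```python
import math

def softmax(logits):
    """Normalize logits into non-negative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reassemble(upsampled, kernel_logits, k=3):
    """Apply a different k x k kernel at every location of an H x W map.

    upsampled     -- map after the initial fixed (e.g. bilinear) upsampling
    kernel_logits -- kernel_logits[y][x] is a list of k*k predicted logits
    """
    h, w = len(upsampled), len(upsampled[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            weights = softmax(kernel_logits[y][x])
            acc, i = 0.0, 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate the border
                    xx = min(max(x + dx, 0), w - 1)
                    acc += weights[i] * upsampled[yy][xx]
                    i += 1
            out[y][x] = acc
    return out
```

Uniform logits reduce the operation to local averaging, while a kernel concentrated on one neighbor copies that neighbor. The predicted kernels can therefore smooth flat regions and keep sharp transitions where the guide features indicate detail.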

[0068] To optimize the fusion of multi-scale features, this model introduces a gated multi-scale selector (GMS) module. The GMS module automatically weights and selects input features by combining a channel reweighting unit (SEBlock) and a spatial gating unit. SEBlock adjusts the channel weights based on feature importance, while the spatial gating unit generates a spatial weight map to highlight regions suspected of being defects. Through this adaptive fusion mechanism, the GMS module effectively combines features from different scales, thereby improving the model's performance on defects of varying sizes. During multi-scale defect segmentation, the GMS module significantly improves the model's mIoU value, addressing the problem of insufficient adaptation to scale differences in traditional multi-scale feature fusion methods.
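The channel reweighting performed by SEBlock can be illustrated with a small sketch in plain Python. It assumes two fully connected layers without biases, a ReLU between them, and a sigmoid at the end, which is the usual squeeze-and-excitation layout. The weight matrices `w1` and `w2` are hypothetical stand-ins for learned parameters.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def se_recalibrate(feature, w1, w2):
    """Rescale each channel of a C x H x W feature by a learned importance.

    feature -- nested lists indexed as feature[channel][row][col]
    w1      -- reduction weights, shape (C/r) x C
    w2      -- expansion weights, shape C x (C/r)
    Returns the recalibrated feature and the per-channel scales.
    """
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    scales = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Recalibrate: multiply every value in a channel by that channel's scale.
    out = [[[v * s for v in row] for row in ch]
           for ch, s in zip(feature, scales)]
    return out, scales
```

Channels whose scale is close to 0 are suppressed and channels whose scale is close to 1 pass through almost unchanged.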

[0069] Furthermore, this model introduces a dual-branch edge-region segmentation mechanism. Traditional methods typically focus only on region segmentation, neglecting accurate boundary extraction. In this approach, the main branch outputs the region segmentation results, while the auxiliary branch connects to a lightweight edge detection head specifically responsible for edge map generation. This mechanism improves boundary clarity while ensuring region segmentation accuracy, particularly excelling in handling low-contrast defects (such as shallow scratches on transparent surfaces). Through this joint optimization, the model's recall rate for weak-contrast defects is improved by 10%.
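The auxiliary branch needs an edge map to learn from. This description does not state how such a target is obtained, so the following is only one plausible sketch: it assumes the edge target is derived from the region mask by marking every defect pixel that touches a background pixel in its 4-neighborhood. The function name `edge_from_mask` is hypothetical.

```python
def edge_from_mask(mask):
    """Derive a binary edge map from a binary region mask.

    A pixel is an edge pixel if it belongs to the defect region and at
    least one of its up/down/left/right neighbors inside the image is
    background.
    """
    h, w = len(mask), len(mask[0])
    edge = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                    edge[y][x] = 1
                    break
    return edge
```

For a thin crack nearly every defect pixel is also an edge pixel, so the edge target then covers almost the whole defect.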

[0070] During training, the AdamW optimizer is employed, along with a cosine annealing learning rate decay strategy to ensure gradual model convergence. For the loss function, multiple losses are combined, including Dice Loss, BAL Loss, and BCE Loss, to balance region accuracy and boundary accuracy. On the hardware side, an NVIDIA RTX 4090 GPU is used for training to ensure computational efficiency and speed. The batch size is set to 16, and the model is trained for 300 epochs so that it can fully learn the features in the data.
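The cosine annealing decay can be written in closed form. The sketch below uses the initial learning rate of 1e-4 and the 300 epochs given in the specific embodiment. The minimum learning rate of 0 and the per-epoch (rather than per-iteration) update are assumptions.

```python
import math

def cosine_lr(epoch, total_epochs=300, base_lr=1e-4, min_lr=0.0):
    """Learning rate at a given epoch under cosine annealing.

    lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * epoch / total_epochs))
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)
```

The rate starts at `base_lr`, passes through half of it at the midpoint of training, and approaches `min_lr` at the final epoch.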

[0071] After training, the model enters the deployment and inference phase. The trained model is converted to ONNX or TensorRT format for deployment on industrial computers or edge computing devices. To meet the real-time requirements of industrial production lines, the model employs a lightweight inference architecture. By using depthwise separable convolutions and TensorRT optimization, the model's parameter count is only 6.1M, significantly reducing computational resource consumption compared to traditional deep learning models. Simultaneously, this architecture achieves an inference speed exceeding 200 frames per second, ensuring the real-time detection requirements are met.
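The parameter saving from depthwise separable convolution follows from a simple count. The sketch below compares a standard convolution with its depthwise separable counterpart, ignoring biases. The layer size used in the usage note is an arbitrary example and not a layer from this model.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise
```

For a 3×3 layer with 256 input and 256 output channels, the standard form has 589,824 weights and the separable form has 67,840, a reduction of roughly 8.7 times.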

[0072] Implementation Method 3: This implementation method further describes the technical solution provided above in detail through specific embodiments, specifically: This network employs an encoder-decoder structure, where the encoder extracts multi-scale features, and the decoder restores the resolution and outputs a segmentation mask. The core innovation lies in the collaborative modification of the encoder, decoder, and feature fusion path.

[0073] Dynamic Serpentine Convolution Module (DSConv Block): Enhances the ability to extract slender, curved defect edges.

[0074] The 3×3 convolutional layer in each Bottleneck of the encoder is replaced with a DSConv Block, whose kernel sampling points are dynamically predicted as a continuous, winding path ("serpentine path") generated by a lightweight offset prediction network. This addresses the insensitivity of standard convolutions when extracting features from meandering boundaries and improves the edge continuity of defects such as cracks (boundary IoU improvement of 8%).

[0075] Progressive Content-Aware Upsampling (PCAU): Reduces the loss of high-frequency details during the upsampling process.

[0076] A PCAU module is embedded at the beginning of each layer of the decoder, using encoder features of the same scale to guide the generation of the upsampling kernel. This improves the detection rate of minute defects by 15%.

[0077] Gated Multiscale Selector (GMS): Enables intelligent adaptive fusion of cross-scale features.

[0078] A GMS module is introduced at the encoder-decoder fusion point, which includes: a channel reweighting unit (SEBlock calibrates the importance of input feature channels), a spatial gating unit (generates spatial weight maps of deep semantic features and highlights suspected defect areas), and gated fusion (dynamically selects and strengthens effective features), improving the multi-scale defect mIoU to 86.5%.

[0079] Dual-branch edge-region segmentation mechanism: jointly optimize the accuracy of defect regions and boundaries.

[0080] The main branch outputs the segmentation results, while the auxiliary branch connects to the lightweight edge detection head to output the defect edge map. The recall rate for weak-contrast defects is improved by 10%.

[0081] Lightweight inference architecture: balancing accuracy and real-time requirements.

[0082] The attention module is constructed using depthwise separable convolutions, with channels compressed to 30% of the original structure, and optimized for deployment with TensorRT. The number of parameters is only 6.1M.

[0083] The technical solution provided in this embodiment, as verified by experiments, has the following significant beneficial effects: Improved edge accuracy: The IoU of linear defect boundaries reaches 92.1%, an 8% improvement over UNet.

[0084] Multi-scale adaptability: mIoU reaches 86.5%, which is better than SegFormer (82.4%).

[0085] Robustness: The false alarm rate is reduced by 32% in metallic reflective and low-contrast scenes.
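For reference, IoU and mIoU figures of the kind reported above are commonly computed as sketched below. This description does not specify the evaluation protocol, so the handling of classes that are absent from both prediction and label (skipped here) is an assumption, and the function names are hypothetical.

```python
def iou(pred, target, cls):
    """Intersection over union for one class on H x W label maps.

    Returns None when the class appears in neither map.
    """
    inter = union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += int(p == cls and t == cls)
            union += int(p == cls or t == cls)
    return inter / union if union else None

def mean_iou(pred, target, classes):
    """Average the per-class IoU over the classes that are present."""
    scores = [s for s in (iou(pred, target, c) for c in classes)
              if s is not None]
    return sum(scores) / len(scores) if scores else None
```

For example, with `pred = [[0, 1], [1, 1]]` and `target = [[0, 1], [0, 1]]`, the background IoU is 1/2, the defect IoU is 2/3, and the mean is 7/12.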

[0086] In a specific embodiment: S1: Data preparation and preprocessing; High-resolution images of automotive parts are acquired and augmented by random scaling, flipping, and HSV color enhancement; defect data are synthesized and low-contrast scenes are simulated.

[0087] S2: Network model construction; Encoder: Based on ResNet-50, the 3×3 convolutions in the Bottlenecks of Stages 1-4 are replaced with DSConv Blocks.

[0088] Decoder: Symmetric upsampling path, using PCAU module at each layer.

[0089] Feature fusion: The GMS module connects the encoder and decoder at corresponding levels.

[0090] Output head: The main branch outputs the segmentation mask, and the auxiliary branch outputs the edge map.

[0091] S3: Model training; Optimizer: AdamW (initial learning rate 1e-4, cosine annealing decay).

[0092] Loss function: Total loss = 0.6 * Dice Loss + 0.3 * BAL + 0.1 * BCE Loss.

[0093] Hardware: NVIDIA RTX 4090 GPU, batch size 16, 300 training epochs.
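The weighted total loss of S3 can be sketched as follows. Dice Loss and BCE Loss are written in their common forms over flattened probability and label lists. The BAL term is not defined in this description, so it is passed in as an already computed value (`bal_value`, a hypothetical name). The smoothing constants are assumptions.

```python
import math

def dice_loss(probs, targets, eps=1e-6):
    """1 - Dice coefficient between predicted probabilities and binary labels."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def total_loss(probs, targets, bal_value):
    """Total loss = 0.6 * Dice Loss + 0.3 * BAL + 0.1 * BCE Loss."""
    return (0.6 * dice_loss(probs, targets)
            + 0.3 * bal_value
            + 0.1 * bce_loss(probs, targets))
```

The Dice term carries the largest weight. It is less sensitive than BCE to the imbalance between the few defect pixels and the many background pixels.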

[0094] S4: Model Deployment and Inference; The trained model weights are converted to ONNX or TensorRT format and deployed to industrial computers or edge computing devices.

[0095] The above description of several specific embodiments further details the technical solution provided by the present invention in order to highlight the advantages and benefits of the technical solution provided by the present invention. However, the above-described specific embodiments are not intended to limit the present invention. Any reasonable modifications and improvements to the present invention, combinations of embodiments, and equivalent substitutions based on the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. An industrial surface defect segmentation network and method based on dynamic serpentine convolution and multi-scale feature selection, characterized in that it includes: a step of acquiring high-resolution images of automotive parts, performing data augmentation through random scaling, flipping, and HSV color enhancement, and synthesizing different types of defect data; an encoder built on ResNet-50, in which the 3×3 convolutional layer of each Bottleneck is replaced with a dynamic serpentine convolution module whose dynamically predicted kernel sampling points form a curved path; a progressive content-aware upsampling module applied in the decoder, which uses the same-scale features of the encoder to guide the generation of the upsampling kernel; a gated multi-scale selector module introduced at the fusion point of the encoder and decoder, which combines a channel reweighting unit and a spatial gating unit to perform adaptive multi-scale feature fusion; joint optimization of a main branch and an auxiliary branch, with the main branch performing region segmentation and the auxiliary branch performing edge detection; and a step of using depthwise separable convolutions and TensorRT optimization to make the model lightweight, reduce computational overhead, and achieve inference speeds exceeding 200 frames per second.

2. The industrial surface defect segmentation network and method based on dynamic serpentine convolution and multi-scale feature selection according to claim 1, characterized in that boundary features of slender, curved defects are extracted by dynamically predicting the sampling points of the convolution kernel.

3. The industrial surface defect segmentation network and method based on dynamic serpentine convolution and multi-scale feature selection according to claim 1, characterized in that the loss of high-frequency details is reduced by using same-scale encoder features to guide the generation of the upsampling kernel.

4. The industrial surface defect segmentation network and method based on dynamic serpentine convolution and multi-scale feature selection according to claim 1, characterized in that the channel reweighting unit and the spatial gating unit perform adaptive fusion of multi-scale features.

5. The industrial surface defect segmentation network and method based on dynamic serpentine convolution and multi-scale feature selection according to claim 1, characterized in that the main branch is used for defect region segmentation, and the auxiliary branch is used for edge detection.

6. The industrial surface defect segmentation network and method based on dynamic serpentine convolution and multi-scale feature selection according to claim 1, characterized in that the computational overhead of the model is reduced through depthwise separable convolution and TensorRT optimization.

7. An industrial surface defect segmentation network and device based on dynamic serpentine convolution and multi-scale feature selection, characterized in that it includes: a module that acquires high-resolution images of automotive parts, performs data augmentation through random scaling, flipping, and HSV color enhancement, and synthesizes different types of defect data; an encoder built on ResNet-50, in which the 3×3 convolutional layer of each Bottleneck is replaced with a dynamic serpentine convolution module whose dynamically predicted kernel sampling points form a curved path; a progressive content-aware upsampling module applied in the decoder, which uses the same-scale features of the encoder to guide the generation of the upsampling kernel; a gated multi-scale selector module introduced at the fusion point of the encoder and decoder, which combines a channel reweighting unit and a spatial gating unit to perform adaptive multi-scale feature fusion; a main branch and an auxiliary branch that are jointly optimized, with the main branch performing region segmentation and the auxiliary branch performing edge detection; and a module that uses depthwise separable convolutions and TensorRT optimization to make the model lightweight, reduce computational overhead, and achieve inference speeds exceeding 200 frames per second.

8. A computer storage medium for storing computer programs, characterized in that, when the computer program is read by a computer, the computer executes the method of claim 1.

9. A computer, comprising a processor and a storage medium, characterized in that, when the processor reads the computer program stored in the storage medium, the computer executes the method of claim 1.

10. A computer program product comprising a computer program, characterized in that, when the computer program is executed, it implements the method of claim 1.