A method of grading surface defects of an industrial product

By employing lossless pixel rearrangement and anchor-guided multi-branch modules, combined with salient location information and group probability, the problem of scale differences and grade continuity in traditional networks when processing surface defects in industrial products is solved, achieving efficient defect classification and improving the accuracy and consistency of detection.

CN122244058A · Pending · Publication Date: 2026-06-19 · FITOW (TIANJIN) DETECTION TECH CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
FITOW (TIANJIN) DETECTION TECH CO LTD
Filing Date
2026-05-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Traditional convolutional neural networks struggle to simultaneously capture both minute details and large-area structural changes in industrial product surface defects, and the continuity and ordinal relationships between defect levels are not effectively utilized, making it difficult to guarantee detection efficiency and consistency.

Method used

A lossless pixel rearrangement strategy and anchor-guided multi-branch module are adopted. Through a dual-stream grouping gating module and an adaptive ordered fusion module, explicit decoupling of multi-scale defect information and dynamic feature allocation are achieved. Feature fusion is performed by combining salient location information and group probability to construct a rating mechanism consistent with the severity of defects.

Benefits of technology

It significantly improves the accuracy of defect severity classification, can fully cover surface defects of industrial products across all scales, and enhances the stability and consistency of detection, making it suitable for industrial engineering applications.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122244058A_ABST
Patent Text Reader

Abstract

This invention relates to the field of image processing technology, providing a method for classifying surface defects in industrial products. The method includes: compressing the feature space of a high-resolution image using a pixel rearrangement module; inputting the rearranged feature map into a dual-stream grouping gating module to obtain the image's group probability and a heatmap of defect saliency; dynamically cropping the defect saliency heatmap and the high-resolution image; obtaining low-level, mid-level, and high-level features through an anchor-guided multi-branch module; using the image's group probability as dynamic weights, performing weighted feature fusion of the low-level, mid-level, and high-level features through an adaptive ordered fusion module to obtain global aggregated features; and inputting the global aggregated features into a rank regressor to obtain the surface defect classification for industrial products. This invention can comprehensively cover surface defects across all scales of industrial products, significantly improving the accuracy of defect severity classification and possessing excellent industrial engineering application value.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a method for classifying surface defects in industrial products.

Background Technology

[0002] The surface quality of industrial products directly affects their reliability and market value. Traditional production lines mainly rely on manual inspection of appearance quality. However, under high-intensity and long-term working conditions, manual inspection is easily affected by fatigue and subjective experience differences, making it difficult to guarantee inspection efficiency and consistency.

[0003] The goals of industrial inspection tasks have evolved from early defect presence identification to more complex defect severity rating. This task not only requires models to accurately detect defects but also to classify them into fine-grained levels based on their morphological characteristics and impact, thus providing a basis for product rework, downgrading, or quality grading. However, in real-world industrial scenarios, defect rating tasks still face two key challenges.

[0004] First, surface defects in industrial products vary significantly in both spatial scale and physical morphology. Minor defects typically manifest as tiny scratches or localized texture anomalies, occupying only a small pixel area in high-resolution images and easily weakened or even completely lost during network downsampling. Severe defects, on the other hand, often exhibit large-area structural distortions or irregular cracks, requiring a larger receptive field to effectively capture their overall shape. Traditional convolutional neural networks typically rely on a fixed receptive field and a uniform feature extraction path. This singular receptive field scale and feature aggregation method struggles to simultaneously capture both minute defect details and macroscopic structural changes.

[0005] Second, defect levels often exhibit continuity and sequence. In actual industrial quality standards, defects of different levels are not completely independent categories, but rather a continuous process that gradually changes in severity. For example, defects of adjacent levels are often very similar in appearance, with only minor differences in local features or area proportions. Traditional classification methods typically treat different levels as mutually exclusive categories, thus ignoring the ordinal relationship between levels and easily leading to misjudgments across levels.

Summary of the Invention

[0006] This invention aims to address at least one of the technical problems existing in related technologies. To this end, this invention provides a method for classifying surface defects in industrial products, achieving explicit decoupling of multi-scale defect information. A lossless pixel rearrangement strategy reduces computational overhead while preserving underlying texture details. An anchor-guided multi-branch module performs preliminary analysis of input features to obtain the probability of potential defect region classifications and saliency location information. Based on the saliency location information, features are dynamically assigned to branches with different inductive biases to handle minor texture anomalies and large-scale structural defects respectively. An adaptive fusion strategy unifies and integrates multi-branch features, constructing a defect classification consistent with defect severity in the feature space, thereby achieving a more stable and consistent defect level assessment. This method can comprehensively cover all scales of industrial product surface defects, significantly improving the accuracy of defect severity classification and possessing excellent industrial engineering application value.

[0007] This invention constructs a collaborative mechanism of "spatial localization-semantic calibration" by reusing group probabilities twice, in a dual-stream grouping gating module and an adaptive ordered fusion module. The first use of group probabilities aims to solve the spatial localization problem of industrial defects with huge scale differences, guiding heatmap generation through probability to lock the core region. The second use of group probabilities aims to solve the discrimination problem of continuous defect levels with ordinal relationships, injecting group probabilities as attention weights into the feature fusion process. This dual-guidance mechanism ensures that the model can maintain a high degree of consistency between physical severity and feature representation when dealing with cross-scale, fine-grained defects.

[0008] This invention provides a method for classifying surface defects in industrial products, comprising:
S1: Acquire high-resolution images of surface defects in industrial products, and compress the feature space of the high-resolution images through the pixel rearrangement module to obtain rearranged feature maps;
S2: Input the rearranged feature map into the dual-stream grouping gating module to obtain the group probability of the image and the saliency heatmap of defects;
S3: Dynamically crop the high-resolution image according to the defect saliency heatmap to obtain local image patches containing abnormal features;
S4: The low-level branch of the anchor-guided multi-branch module performs an explicit feature subtraction between the input features, extracted from the local image patches containing abnormal features, and the anchor features to obtain low-level features; the intermediate branch of the anchor-guided multi-branch module extracts features from the rearranged feature map to obtain intermediate features; and the high-level branch of the anchor-guided multi-branch module extracts features from the rearranged feature map to obtain high-level features;
S5: Using the group probability of the image as dynamic weights, the low-level, mid-level, and high-level features are weighted and fused through the adaptive ordered fusion module to obtain the global aggregated features;
S6: Input the global aggregated features into the rank regressor to obtain the surface defect rating of industrial products.

[0009] Furthermore, the pixel rearrangement module rearranges adjacent pixel blocks in the input image space to the channel dimension according to a preset ratio using lossless pixel rearrangement technology.

[0010] Furthermore, the dual-stream grouping gating module includes macro-group gating and local saliency localization. The macro-group gating classifies surface defects of industrial products into low-level perception groups, mid-level perception groups, and high-level perception groups. After the rearranged feature map is aggregated with spatial context through global average pooling, it is input into a multilayer perceptron for dimensionality reduction, and the group probability of the image is output through the Softmax function. The group probability of the image is a three-dimensional probability vector. The local saliency localization generates a saliency heatmap of defects by fusing the rearranged feature maps and the group probabilities of the image.

[0011] Furthermore, the calculation expression for the saliency heatmap of defects is as follows:

$$H = \mathrm{ReLU}\left(\sum_{k} \alpha_k^{c} A^{k}\right), \qquad \alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial p_c}{\partial A_{ij}^{k}}$$

where $H$ is the saliency heatmap of defects; $\mathrm{ReLU}$ is the activation function; $\alpha_k^{c}$ is the channel weight of the $k$-th channel for the $c$-th severity group; $p_c$ is the probability with which the network predicts the current input image to belong to the $c$-th severity group; $A_{ij}^{k}$ is the spatial pixel activation value at coordinates $(i, j)$ of the $k$-th channel feature map extracted by the shared backbone network; and $Z$ is the total number of pixels of the feature map in the spatial dimensions.

[0012] Furthermore, the step in which the low-level branch of the anchor-guided multi-branch module performs an explicit feature subtraction between the input features, extracted from the local image patches containing anomalous features, and the anchor features to obtain low-level features includes: establishing preset reference anchors, and extracting features from the reference anchors with the feature extractor of the low-level branch to obtain reference anchor features; extracting features from the local image patches containing anomalous features with the feature extractor of the low-level branch to obtain cropped patch features; and performing a difference operation on the cropped patch features and the reference anchor features to obtain low-level features. The feature extractor of the low-level branch includes depthwise separable convolutions.

[0013] Furthermore, the feature extractor of the intermediate branch of the anchor-guided multi-branch module takes the rearranged feature map as input, obtains a global view through the backbone network, and appends spatial pyramid pooling to the end of the backbone network. The feature map is compressed over grids of three scales (1×1, 2×2, and 4×4) and the results are concatenated to obtain intermediate features.

[0014] Furthermore, the high-level branch of the anchor-guided multi-branch module extracts the initial features of the rearranged feature map through the initial layer, and then processes the initial features through a deformable convolutional layer combined with a dilated convolutional layer to obtain high-level features.

[0015] Furthermore, the adaptive ordered fusion module employs a soft selection and weighted aggregation mechanism at the feature level, using the group probability of the image as a dynamic and learnable attention weight to perform adaptive calibration of low-level, mid-level, and high-level features at the feature level, and then obtains global aggregated features through feature aggregation.

[0016] Furthermore, the rank regressor comprises a first fully connected layer, a ReLU activation function, and a second fully connected layer connected in sequence.
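The rank regressor structure described above (fully connected layer, ReLU, fully connected layer) can be illustrated with a minimal NumPy sketch. The layer sizes, the random weights, and the function name `rank_regressor` are hypothetical choices made for illustration; the patent text does not specify them.

```python
import numpy as np

def rank_regressor(features, w1, b1, w2, b2):
    """FC -> ReLU -> FC, returning one scalar rating per sample."""
    hidden = np.maximum(features @ w1 + b1, 0.0)  # first fully connected layer + ReLU
    return (hidden @ w2 + b2).squeeze(-1)          # second fully connected layer

# Hypothetical sizes: 8-dimensional global aggregated features, 4 hidden units.
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(8, 4)), np.zeros(4)
w2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
ratings = rank_regressor(rng.normal(size=(5, 8)), w1, b1, w2, b2)  # shape (5,)
```

Producing a single continuous value per image, rather than one score per class, is what lets the output respect the ordering of severity levels.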

[0017] Furthermore, the adaptive ordered fusion module is jointly trained with two objectives: feature space supervision and prediction space supervision. The feature space supervision constrains the global aggregated features through an N-pair multi-margin loss, which uses a local window constraint to select negative samples. The prediction space supervision constrains the final predicted value through a Smooth L1 loss.
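As an illustration of the prediction space term only, the following sketch implements the commonly used Smooth L1 form. The threshold `beta=1.0` and the mean reduction are assumptions, since the text does not give them, and the N-pair feature space loss is not sketched because its exact formulation is not stated.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Quadratic for small errors (|d| < beta), linear for large ones."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    per_sample = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(per_sample.mean())

small_error = smooth_l1([1.5], [1.0])  # quadratic regime
large_error = smooth_l1([4.0], [1.0])  # linear regime
```

The linear regime limits the influence of badly mispredicted samples, while the quadratic regime gives smooth gradients near the correct rating.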

[0018] The above-described one or more technical solutions in the embodiments of the present invention have at least one of the following technical effects: This invention introduces an input preprocessing strategy combining lossless pixel unshuffle and dynamic gating. By mapping from the spatial dimension to the channel dimension, it significantly reduces computational overhead while preserving the fine-grained texture information of the original image and provides a reliable data foundation for subsequent anchoring mechanisms. A multi-branch module guided by anchoring performs multi-branch feature extraction. Lower-level branches enhance the response to subtle texture anomalies, while higher-level branches adapt to structural changes in large-area irregular defects, thereby improving the representation ability of multi-scale defects. An adaptive ordered fusion module uses gating probabilities as dynamic weights to fuse multi-branch features and, combined with ordinal constraints, constructs an arrangement relationship in the feature space consistent with the defect severity, thereby reducing cross-level misjudgments and improving the model's evaluation stability. This invention can comprehensively cover surface defects of industrial products across all scales, significantly improving the accuracy of defect severity classification and possessing excellent industrial engineering application value.

[0019] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Description of the Drawings

[0020] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0021] Figure 1 is a flowchart of a method for classifying surface defects in industrial products provided by the present invention.

[0022] Figure 2 is a schematic diagram of the structure of an industrial product surface defect classification network provided by the present invention.

[0023] Figure 3 is a schematic diagram of the structure of the dual-stream grouping gating module provided by the present invention.

[0024] Figure 4 is a schematic diagram of the structure of the anchor-guided multi-branch module provided by the present invention.

[0025] Figure 5 is a schematic diagram of the adaptive ordered fusion module provided by the present invention.

[0026] Figure 6 is a schematic diagram of the industrial surface defect classification results of an automotive engine production line according to an embodiment of the present invention.

[0027] Figure 7 is a schematic diagram showing a comprehensive visual comparison of the present invention with VMamba, PartMatch, MWR, and ResNet.

Detailed Implementation

[0028] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. Based on the embodiments of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this invention. The following embodiments are used to illustrate this invention but cannot be used to limit the scope of this invention.

[0029] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0030] A method for classifying surface defects in industrial products according to this invention is described below with reference to Figures 1 to 7.

[0031] As shown in Figure 1, a method for classifying surface defects in industrial products includes: S1: Acquire high-resolution images of surface defects in industrial products, and compress the feature space of the high-resolution images through the pixel rearrangement module to obtain rearranged feature maps. Industrial product surfaces refer to the outer surfaces of industrial products that are directly exposed during production, testing, and use, including steel, components, equipment housings, pipes, valves, machine tool parts, etc.

[0032] The overall network structure is shown in Figure 2. The pixel rearrangement module rearranges adjacent pixel blocks in the input image space to the channel dimension according to a preset ratio using lossless pixel rearrangement technology. Specifically, in order to effectively reduce computational overhead while preserving subtle underlying details, the image is first compressed in its feature space using a pixel unshuffle module with a downsampling rate of 4. The computational expression for this mapping process is as follows:

$$F_r = \mathrm{PU}(I, r), \qquad I \in \mathbb{R}^{H \times W \times C}, \qquad F_r \in \mathbb{R}^{\frac{H}{r} \times \frac{W}{r} \times C r^{2}}$$

where $F_r$ is the rearranged feature map; $\mathrm{PU}(\cdot)$ is the lossless pixel rearrangement operation; $I$ is the original image; $r$ is the preset rearrangement ratio; $H$, $W$, and $C$ are the length, width, and number of channels of the image; and $\mathbb{R}$ denotes the real-valued image space. After this operation, the image is compressed to $\frac{H}{r} \times \frac{W}{r}$ in its length and width dimensions. Simultaneously, the number of channels is expanded to 48 (for a three-channel image with $r = 4$, $3 \times 4^{2} = 48$), ensuring zero-loss transformation of the original pixel data at the beginning of the network input. The original image is a high-resolution image of surface defects in industrial products, with dimensions of [missing information].

[0033] In some specific embodiments of the present invention, [missing information].
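The space-to-channel mapping of step S1 can be sketched in NumPy as below. This is an illustrative implementation, not the patent's code: the function names, the height-width-channel layout, and the ordering of the expanded channels are assumptions. The inverse function is included only to demonstrate that no pixel value is lost.

```python
import numpy as np

def pixel_unshuffle(image, r):
    """Move each r x r spatial block into the channel axis: (H, W, C) -> (H/r, W/r, C*r*r)."""
    h, w, c = image.shape
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    blocks = image.reshape(h // r, r, w // r, r, c)   # (i, a, j, b, c)
    blocks = blocks.transpose(0, 2, 1, 3, 4)          # (i, j, a, b, c)
    return blocks.reshape(h // r, w // r, r * r * c)

def pixel_shuffle(features, r):
    """Inverse of pixel_unshuffle, used here only to check losslessness."""
    hr, wr, crr = features.shape
    c = crr // (r * r)
    blocks = features.reshape(hr, wr, r, r, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(hr * r, wr * r, c)

image = np.arange(8 * 8 * 3).reshape(8, 8, 3)   # small stand-in for a high-resolution RGB image
rearranged = pixel_unshuffle(image, 4)           # shape (2, 2, 48)
restored = pixel_shuffle(rearranged, 4)          # identical to the input
```

Because the operation is a pure permutation of entries, the 16-fold reduction in spatial positions comes with no averaging or interpolation, unlike strided convolution or bilinear resizing.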

[0034] S2: Input the rearranged feature map into the dual-stream grouping gating module to obtain the group probability of the image and the saliency heatmap of defects. In defect detection tasks for high-resolution industrial images, minute scratches (such as filamentary defects) with a very small width proportion are prone to feature loss during traditional bilinear interpolation or strided convolution downsampling due to receptive field overlap and averaging operations. To reduce computational burden while maintaining the integrity of the original features, this invention employs a dual-stream grouping gating module to extract the group probability and saliency location information of potential defect regions from the rearranged feature map.

[0035] As shown in Figure 3, the dual-stream grouping gating module includes macro-group gating and local saliency localization. To achieve dynamic allocation of computing resources, macro-group gating categorizes the surface defect levels of industrial products into low-level, mid-level, and high-level perception groups based on the actual pixel scale. In some specific embodiments of the present invention, there are 10 levels of surface defects in industrial products, represented by 1-10. The low-level perception group may include levels 1-3, the mid-level perception group levels 4-7, and the high-level perception group levels 8-10.

[0036] After the rearranged feature map is aggregated with spatial context through global average pooling, it is input into a multilayer perceptron for dimensionality reduction. The Softmax function outputs the group probability of the image, which is a three-dimensional probability vector, calculated as follows:

$$P = [p_{low}, p_{mid}, p_{high}]^{T} = \mathrm{Softmax}\big(\mathrm{MLP}(\mathrm{GAP}(F_r))\big)$$

where $F_r$ is the rearranged feature map; $p_{low}$ is the probability for the low-level branch; $p_{mid}$ is the probability for the intermediate branch; $p_{high}$ is the probability for the high-level branch; $P$ represents the posterior probability that the current image belongs to each scale group; and $T$ denotes the transpose.
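A minimal sketch of the macro-group gating path (global average pooling, multilayer perceptron, Softmax) is given below. The two-layer MLP with a ReLU hidden layer, the hidden width of 16, and the random weights are assumptions for illustration; the text only states that an MLP reduces the dimensionality to three group scores.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max()   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def group_probability(feature_map, w1, b1, w2, b2):
    """feature_map: (H, W, C). Returns [p_low, p_mid, p_high]."""
    pooled = feature_map.mean(axis=(0, 1))         # global average pooling -> (C,)
    hidden = np.maximum(pooled @ w1 + b1, 0.0)     # MLP hidden layer (ReLU assumed)
    return softmax(hidden @ w2 + b2)               # three group probabilities

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(56, 56, 48))        # hypothetical rearranged feature map
w1, b1 = rng.normal(size=(48, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 3)), np.zeros(3)
probs = group_probability(feature_map, w1, b1, w2, b2)
```

The output is a soft assignment: an image near the boundary between two groups receives comparable probabilities for both instead of being forced into one.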

[0037] Local saliency localization generates a saliency heatmap of defects by fusing rearranged feature maps and group probabilities of images.

[0038] This invention employs a probability-based dynamic soft gating mechanism, using the probability as a dynamic weight to weight and fuse features from different branches, effectively avoiding the discrimination errors caused by hard classification when dealing with critical-size defects. While predicting group probabilities, the dual-stream grouping gating module must accurately locate the abnormal region. Local saliency localization borrows the class activation mapping concept of Grad-CAM (Gradient-weighted Class Activation Mapping), calculating and generating a defect saliency heatmap by fusing feature maps and dynamic weights. The calculation expression is:

$$H = \mathrm{ReLU}\left(\sum_{k} \alpha_k^{c} A^{k}\right), \qquad \alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial p_c}{\partial A_{ij}^{k}}$$

where $H$ is the saliency heatmap of defects; $\mathrm{ReLU}$ is the activation function; $\alpha_k^{c}$ is the channel weight of the $k$-th channel for the $c$-th severity group; $p_c$ is the probability with which the network predicts the current input image to belong to the $c$-th severity group; $A_{ij}^{k}$ is the spatial pixel activation value at coordinates $(i, j)$ of the $k$-th channel feature map extracted by the shared backbone network; and $Z$ is the total number of pixels of the feature map in the spatial dimensions.

[0039] The generation of the defect saliency heatmap includes two stages: feature weight evaluation and spatial aggregation. First, the partial derivatives of the group probability $p_c$ with respect to the channel feature map $A^{k}$ are calculated and averaged over the spatial dimensions through global average pooling to obtain the channel weight $\alpha_k^{c}$. Mathematically, $\alpha_k^{c}$ quantifies the overall importance and contribution of the $k$-th channel to $p_c$.

[0040] Using the obtained channel weights $\alpha_k^{c}$, a linear weighted summation is performed over all feature channels to aggregate multidimensional features in the spatial domain. The weighted result is truncated by the nonlinear activation function ReLU to retain only the pixel responses that have a positive excitation effect on the prediction of the current defect group, thereby effectively suppressing the interference of irrelevant backgrounds such as normal machining textures at the feature level.
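The two stages (feature weight evaluation, then spatial aggregation with ReLU truncation) can be sketched as follows. The sketch assumes the gradients of the group probability with respect to the activations have already been computed by an automatic differentiation framework; the function name and the channel-first array layout are illustrative choices.

```python
import numpy as np

def saliency_heatmap(activations, gradients):
    """activations, gradients: (K, H, W); gradients[k, i, j] = d p_c / d A^k_ij."""
    alpha = gradients.mean(axis=(1, 2))                     # stage 1: channel weights via spatial averaging
    weighted = np.tensordot(alpha, activations, axes=1)     # stage 2: sum_k alpha_k * A^k -> (H, W)
    return np.maximum(weighted, 0.0)                        # ReLU keeps only positive evidence

# Two channels: one supports the predicted group, the other opposes it.
activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                        [[0.0, 0.0], [0.0, 1.0]]])
gradients = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
heatmap = saliency_heatmap(activations, gradients)
```

In this toy case the location activated by the opposing channel is zeroed out by the ReLU, which is the background suppression effect described above.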

[0041] The finally generated saliency heatmap accurately reflects the spatial distribution intensity of anomalous features. Based on the region of highest numerical response in the heatmap, the precise coordinates are obtained by computing its geometric center. These coordinates serve as the reference anchor points for cropping the original high-resolution image, ensuring that the system can accurately extract local image patches containing core defects from the redundant background.

[0042] The core task of the dual-stream group gating module is not to directly extract the final features, but to output two independent dynamic gating signals through prospective analysis: one is to predict the probability of the image belonging to different severity groups, which is used to guide the feature fusion of subsequent branches; the other is to adaptively generate saliency localization information, which is used to guide the dynamic cropping of local image patches.

[0043] The application of group probabilities here essentially serves as "spatial navigation." By fusing group probabilities with rearranged feature maps, global group priors can be used to correct local saliency localization, ensuring that the system can accurately remove irrelevant pixels and pinpoint the core defect regions that truly affect the severity rating when facing industrial images with high background interference.

[0044] S3: Dynamically crop the high-resolution image according to the defect saliency heatmap to obtain local image patches containing abnormal features. The saliency location information, together with the original high-resolution image, drives the dynamic cropping operation, accurately extracting local image patches containing anomalous features from the original image. This avoids pixel loss caused by image scaling.
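One possible way to realize the dynamic cropping is sketched below: threshold the heatmap to find its strongest region, take the geometric center, map it to original-image coordinates, and cut a fixed-size patch at native resolution. The relative threshold of 0.8, the fixed patch size, and the function name are assumptions; the text does not specify how the highest-response region is delimited.

```python
import numpy as np

def crop_around_peak(image, heatmap, patch_size, scale, rel_threshold=0.8):
    """Crop a patch_size x patch_size window centered on the heatmap's strongest region.

    scale is the ratio between image and heatmap resolution; patch_size must not
    exceed the image size.
    """
    mask = heatmap >= rel_threshold * heatmap.max()         # region of highest response
    rows, cols = np.nonzero(mask)
    center_y = int((rows.mean() + 0.5) * scale)             # geometric center in image coordinates
    center_x = int((cols.mean() + 0.5) * scale)
    top = int(np.clip(center_y - patch_size // 2, 0, image.shape[0] - patch_size))
    left = int(np.clip(center_x - patch_size // 2, 0, image.shape[1] - patch_size))
    return image[top:top + patch_size, left:left + patch_size], (top, left)

image = np.arange(256).reshape(16, 16)     # stand-in for the high-resolution image
heatmap = np.zeros((4, 4))
heatmap[3, 3] = 1.0                         # strongest response in the bottom-right cell
patch, origin = crop_around_peak(image, heatmap, patch_size=4, scale=4)
```

The patch is sliced directly from the original pixels, so no resampling is involved; clamping keeps the window inside the image when the peak lies near a border.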

[0045] S4: The low-level branch of the anchor-guided multi-branch module performs an explicit feature subtraction between the input features, extracted from the local image patches containing abnormal features, and the anchor features to obtain low-level features; the intermediate branch of the anchor-guided multi-branch module extracts features from the rearranged feature map to obtain intermediate features; and the high-level branch of the anchor-guided multi-branch module extracts features from the rearranged feature map to obtain high-level features. Due to the vast differences in the physical dimensions of industrial defects, a single fixed network cannot accurately extract features of all types. Therefore, an anchor-guided multi-branch module is introduced, which achieves complete isolation of feature extraction at the physical level through three independent computational branches (low-level, mid-level, and high-level branches).

[0046] As shown in Figure 4, the anchor-guided multi-branch module includes a low-level branch, an intermediate branch, and a high-level branch. The low-level branch receives the local image patches containing anomalous features output by the dynamic cropping, while the intermediate and high-level branches receive the rearranged feature maps, aiming to capture medium-scale semantic features and large-area global structural changes at different receptive fields.

[0047] The step in which the low-level branch of the anchor-guided multi-branch module performs an explicit feature subtraction between the input features, extracted from the local image patches containing anomalous features, and the anchor features to obtain low-level features includes: establishing preset reference anchors, and extracting features from the reference anchors with the feature extractor of the low-level branch to obtain reference anchor features; extracting features from the local image patches containing anomalous features with the feature extractor of the low-level branch to obtain cropped patch features; and performing a difference operation on the cropped patch features and the reference anchor features to obtain low-level features. The feature extractor of the low-level branch includes depthwise separable convolutions.

[0048] Specifically, to address the problem that minute defects are easily masked by the background, the low-level branch introduces a contrastive learning mechanism based on feature difference. It takes the local image patches containing anomalous features as input and calculates the numerical difference between the current image patch features and the preset reference anchors (the anchor settings are missing from the original text). One anchor serves as a typical reference sample with the lowest severity within the low-level defect range, and the other as a typical reference sample with the highest severity within the low-level defect range. This difference process is expressed as: [formula not preserved in the source text], whose terms are, in order: the low-level features; the low-level branch feature extractor; the mean operation; and the feature difference operator, which is an element-wise subtraction at corresponding spatial locations. By calculating the absolute difference with the upper and lower bound reference anchors, the network can highlight microscopic anomalies and local detail signals to the greatest extent.
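Because the formula itself is not preserved, the following is only one plausible reading of the description: take the element-wise absolute difference between the patch features and each of the two anchor features, then average the two differences. How the mean is applied is an assumption, as are the function and argument names; the inputs are taken to be features already produced by the same low-level extractor.

```python
import numpy as np

def low_level_features(patch_feat, anchor_min_feat, anchor_max_feat):
    """Average of the absolute differences to the lowest- and highest-severity anchors (assumed form)."""
    diff_to_min = np.abs(patch_feat - anchor_min_feat)   # element-wise difference to the lower-bound anchor
    diff_to_max = np.abs(patch_feat - anchor_max_feat)   # element-wise difference to the upper-bound anchor
    return (diff_to_min + diff_to_max) / 2.0

patch_feat = np.array([1.0, 5.0])
anchor_min_feat = np.array([0.0, 0.0])
anchor_max_feat = np.array([2.0, 2.0])
f_low = low_level_features(patch_feat, anchor_min_feat, anchor_max_feat)
```

Differencing against known references removes the shared background response, so what remains is dominated by how the patch deviates from typical mild defects.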

[0049] The feature extractor of the intermediate branch of the anchor-guided multi-branch module appends spatial pyramid pooling to the end of the backbone network.

[0050] Specifically, for regional moderate defects with area fluctuations, local cropping would lose range information. Therefore, the feature extractor of the intermediate branch takes the 224×224 rearranged feature map as input and obtains a global view through the backbone network (ResNetBackbone). To eliminate the numerical alignment problem caused by changes in defect area, this branch incorporates a Spatial Pyramid Pooling (SPP) module at the end of the extraction network. The feature map is compressed over grids of three scales (1×1, 2×2, and 4×4) and the results are concatenated to obtain intermediate-level features. This combination of multiple receptive fields ensures that the network can stably extract specific feature vectors containing complete morphological information.
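The pyramid pooling step can be sketched as below. The text does not say which pooling operator compresses each grid cell; max pooling is assumed here, as in the original SPP design, and the channel-first layout and function name are illustrative.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, grids=(1, 2, 4)):
    """feature_map: (C, H, W) with H, W >= max(grids). Returns a vector of length C * (1 + 4 + 16)."""
    _, h, w = feature_map.shape
    pooled = []
    for g in grids:
        row_edges = np.linspace(0, h, g + 1).astype(int)   # cell boundaries along the height
        col_edges = np.linspace(0, w, g + 1).astype(int)   # cell boundaries along the width
        for i in range(g):
            for j in range(g):
                cell = feature_map[:, row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))        # max pooling per channel (assumed operator)
    return np.concatenate(pooled)

square = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
tall = np.arange(2 * 12 * 8, dtype=float).reshape(2, 12, 8)
vec_square = spatial_pyramid_pool(square)   # length 2 * 21 = 42
vec_tall = spatial_pyramid_pool(tall)       # same length despite a different input size
```

The output length depends only on the channel count and the grid sizes, which is why the branch yields aligned feature vectors regardless of how large the defect region is.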

[0051] The high-level branch of the anchor-guided multi-branch module extracts initial features from the rearranged feature map through the initial layer, and then obtains high-level features by combining a deformable convolutional layer with a dilated convolutional layer.

[0052] For large, severe defects with irregular contours, the high-level branch likewise receives the rearranged feature map. After initial processing in the initial layer (Stem Layer), the initial features are input into a deformable convolution (Offset Learning). By dynamically superimposing learned positional offsets, the feature extraction range is made to closely fit the irregular edges. The deformable convolution is combined with atrous (dilated) convolution (Dilation=2, 4), which multiplies the observation range without changing the feature map size, accurately capturing the macroscopic proportional relationship of large-area defects and finally outputting the high-level features of macroscopic defects.

[0053] At this point, the anchor-guided multi-branch module has output specific feature vectors for low-level, mid-level, and high-level defects, respectively. However, these isolated features still require further processing and cannot be directly equated to the final rating value.
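The effect of dilation on the observation range can be illustrated with a naive single-channel sketch. The 3×3 kernel size, zero padding, and function names are assumptions, and the deformable offset learning is not modeled here.

```python
import numpy as np

def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv2d_same(x, kernel, dilation):
    """Naive dilated cross-correlation with zero padding; output has the same shape as x (odd kernels)."""
    k = kernel.shape[0]
    pad = dilation * (k - 1) // 2
    padded = np.pad(x, pad)
    out = np.zeros(x.shape, dtype=float)
    for a in range(k):
        for b in range(k):
            rows = slice(a * dilation, a * dilation + x.shape[0])
            cols = slice(b * dilation, b * dilation + x.shape[1])
            out += kernel[a, b] * padded[rows, cols]        # each tap samples a shifted copy of the input
    return out

x = np.zeros((9, 9))
x[4, 4] = 1.0                                               # single impulse at the center
response = dilated_conv2d_same(x, np.ones((3, 3)), dilation=2)
```

With dilation 2 a 3×3 kernel spans 5 pixels and with dilation 4 it spans 9, while the number of weights and the output resolution stay the same.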

[0054] S5: Using the group probability of the image as dynamic weight, the low-level, mid-level and high-level features are weighted and fused through the adaptive ordered fusion module to obtain the global aggregated features; To overcome the feature scale conflict and semantic misalignment problems that may be caused by simple averaging or splicing at the score level in traditional multi-branch models, this invention designs a mechanism for soft selection and weighted aggregation at the feature level.

[0055] The group probability of the image is not used for hard branch selection, but as a dynamic, learnable attention weight. As shown in Figure 5, within the adaptive ordered fusion module, the features output by each branch of the anchor-guided multi-branch module are multiplied element-wise by their corresponding dynamic weights. This operation achieves adaptive calibration at the feature level, ensuring that the branch features most relevant to the defect scale of the current sample are activated with high weights. Because the group probability inherently encodes a progressive relationship of defect severity, this operation implicitly injects the ordinal prior of physical space into the feature aggregation process.

[0056] The weighted multi-scale features are then aggregated through a summation operation to generate a unified, information-rich global aggregated feature. This feature fusion process is encapsulated in the "Adaptive Ordered Fusion" module, ensuring that multi-source information is integrated consistently in both numerical and semantic terms. The calculation expression for the global aggregated feature is as follows:

$$F_{g} = \sum_{k=1}^{3} p_{k} \, F_{k}$$

where $F_{g}$ is the global aggregated feature, $p_{k}$ is the group probability of the image for the $k$-th severity group, and $F_{k}$ is the feature output by the corresponding branch (low-level, mid-level, or high-level).

[0057] The adaptive ordered fusion module incorporates a gated summation mechanism, using the group probabilities output by the gated module as dynamic weights. Since these group probabilities correspond to a strictly progressive hierarchy of defect severity levels, this weighting method naturally injects ordinal priors into the feature aggregation process. The module performs a linear weighted summation of the corresponding features output by the multi-branch modules.
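The gated summation can be sketched in a few lines of Python. The example assumes that the three branch features have already been brought to a common dimension and represents them as plain lists; the function name is hypothetical.

```python
def fuse_features(group_probs, branch_features):
    """Linear weighted sum of branch features, weighted by group probability.

    group_probs: probabilities for the low-, mid-, and high-level groups.
    branch_features: one feature vector per branch, all of equal length
        (an assumption of this sketch).
    """
    if len(group_probs) != len(branch_features):
        raise ValueError("one probability is required per branch")
    fused = [0.0] * len(branch_features[0])
    for prob, feature in zip(group_probs, branch_features):
        for i, value in enumerate(feature):
            fused[i] += prob * value  # soft selection, no hard argmax
    return fused


# A sample judged most likely to belong to the low-level group.
fused = fuse_features([0.7, 0.2, 0.1],
                      [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because every branch contributes in proportion to its probability, a sample near a group boundary yields a blend of two branches rather than an abrupt switch between them.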

[0058] Here the group probabilities are turned from "spatial navigation signals" into "semantic calibration weights." In existing technologies, such classification probabilities are typically used only in the final output, whereas this invention uses them as dynamic weights in feature fusion. The beneficial effect is that the final synthesized global aggregated feature contains the feature components that best fit the physical scale of the sample. This design not only makes effective use of the group knowledge already learned by the network, but also, by guiding the network along two different dimensions, forces it to learn a feature space with a strict ordinal arrangement, thereby fundamentally addressing the technical problem that defects of adjacent levels are easily misjudged.

[0059] S6: Input the global aggregated features into the rank regressor to obtain the surface defect level of the industrial product. The rank regressor consists of a first fully connected layer, a ReLU activation function, and a second fully connected layer connected in sequence. This lightweight regressor maps the global aggregated features to a continuous predicted rank, i.e., the surface defect level of the industrial product.

[0060] The fused global aggregated features are finally input into the Rank Regressor, which outputs an accurate defect rating result (Predicted Rank, y).
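The structure of the rank regressor (fully connected layer, ReLU, fully connected layer) can be sketched without any framework. The weights below are arbitrary illustrative values, not trained parameters, and the layer sizes are assumptions.

```python
def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def rank_regressor(features, w1, b1, w2, b2):
    """FC -> ReLU -> FC, returning a single continuous rank."""
    hidden = relu(linear(features, w1, b1))
    return linear(hidden, w2, b2)[0]


# Illustrative weights: 2 inputs -> 2 hidden units -> 1 output.
predicted_rank = rank_regressor(
    [1.0, -2.0],
    w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    w2=[[2.0, 3.0]], b2=[0.5],
)
```

The second hidden unit receives a negative input and is zeroed by the ReLU, so only the first unit contributes to the output in this example.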

[0061] To simultaneously ensure the discriminability of the feature space and the accuracy of the regression values, the adaptive ordered fusion module employs both feature space supervision and prediction space supervision for dual-objective joint training.

[0062] Feature space supervision: The global aggregated features are constrained by the multi-margin N-pair loss (MMNP). MMNP is a contrastive learning loss designed for ordinal classification that directly constrains the global aggregated features, so that features of defects of the same level cluster together in the manifold space while features of different levels repel each other. The repulsion margin is strictly proportional to the difference in defect severity, which builds a highly discriminative ordinal structure in the feature space.

[0063] Prediction space supervision: The predicted value of the final output is constrained by the smooth L1 loss to make it as close as possible to the true ranking.
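For reference, the smooth L1 loss on a single prediction can be written as below. The transition point `beta=1.0` is an assumption; the text does not give its value.

```python
def smooth_l1(predicted, target, beta=1.0):
    """Quadratic for small errors, linear for large ones."""
    diff = abs(predicted - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


small_error = smooth_l1(4.5, 4.0)   # inside the quadratic zone
large_error = smooth_l1(6.0, 4.0)   # inside the linear zone
```

The linear zone limits the gradient contributed by large cross-level errors, while the quadratic zone keeps the loss smooth near the true rank.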

[0064] Both loss functions use the real labels as supervision signals and jointly drive the entire network to perform end-to-end optimization, thereby achieving accurate and robust defect severity assessment.

[0065] To enable the network to accurately distinguish between different levels in continuous rating tasks, a joint objective consisting of three constraints was adopted during the training phase, rather than a single numerical regression error. The core of this objective is to force the features learned by the network to satisfy the following in the embedding space: the greater the difference in the physical severity of defects, the greater the distance between their corresponding feature vectors should be, thereby ensuring the orderliness and distinguishability between levels.

[0066] Traditional ranking and comparison methods often select negative samples at random from the entire dataset. This directly pairs minute defects with large-area severe defects, so that minute features are stretched and distorted on the numerical scale and local fine-grained discrimination is impaired. To avoid this problem, this invention introduces a local window constraint (the MWR concept) into the multi-margin contrastive loss. For each category or group, negative samples are selected only within its own group and at adjacent boundary levels, which excludes comparisons between samples that span large physical differences and reduces the gradient conflicts and feature distortion caused by extreme spans.

[0067] The following quantities are defined: the feature of the current input sample (the anchor); the positive sample features, which share the level of the anchor; the negative sample features, which come from a different level; and the dynamic cumulative margin between the true level and the rank of a negative sample. The loss is computed for each branch of the anchor-guided multi-branch module. Its loss function is expressed in terms of the positive sample set, the negative sample set, a maximum function, and a function that computes the similarity between two features.

[0068] The local window constraint is reflected in the construction of the negative sample set, which includes only samples within the current group and at adjacent levels. When minor defects are compared, this screening strategy uses only samples from the same group or adjacent levels as comparison objects and excludes samples representing macroscopically severe defects. The fine numerical scale of minor features in the feature space is thereby maintained, and feature distortion caused by cross-scale alignment is avoided.
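The screening strategy can be sketched as a filter over candidate samples. The mapping from levels to groups used here (levels 0 to 3, 4 to 6, and 7 to 9) and the window width of one level are hypothetical choices made for illustration; the text does not specify them.

```python
# Hypothetical mapping of severity levels 0-9 to three groups.
LEVEL_TO_GROUP = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 2, 8: 2, 9: 2}


def select_negatives(anchor_level, candidates, level_to_group=LEVEL_TO_GROUP,
                     window=1):
    """Keep negatives from the anchor's group or from adjacent levels.

    candidates: list of (sample_id, level) pairs.
    """
    anchor_group = level_to_group[anchor_level]
    negatives = []
    for sample_id, level in candidates:
        if level == anchor_level:
            continue  # same level: a positive, not a negative
        same_group = level_to_group[level] == anchor_group
        adjacent = abs(level - anchor_level) <= window
        if same_group or adjacent:
            negatives.append(sample_id)
    return negatives


candidates = [("s%d" % level, level) for level in range(10)]
window_negatives = select_negatives(3, candidates)
```

For an anchor at level 3, this keeps the remaining samples of its own group plus the neighbouring level 4 across the group boundary, and drops all samples at levels 5 to 9.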

[0069] To accommodate the fluctuations in physical manifestation caused by defects of varying severity, dynamic margins (stretching intervals) are set for different target levels. A smaller base margin is used for minor defects, so that similar, subtle features cluster tightly in space; a larger margin is used for severe macroscopic defects, which are prone to significant morphological changes, to accommodate the large numerical deviations caused by morphological differences. While preserving fine-grained consistency, this mechanism provides sufficient representational flexibility for large-scale deformations.
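One possible reading of a contrastive loss with level-dependent cumulative margins is sketched below. The hinge form, the cosine similarity, the averaging over pairs, and the margin values are all assumptions of this sketch and should not be taken as the exact loss of the invention.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cumulative_margin(level_a, level_b, base_margins):
    """Sum the per-step base margins lying between two levels."""
    low, high = sorted((level_a, level_b))
    return sum(base_margins[low:high])


def margin_contrastive_loss(anchor, anchor_level, positives, negatives,
                            base_margins):
    """Hinge loss: a negative must be less similar than a positive by a margin.

    negatives: list of (feature, level) pairs.
    """
    total, pairs = 0.0, 0
    for positive in positives:
        sim_pos = cosine_similarity(anchor, positive)
        for negative, negative_level in negatives:
            margin = cumulative_margin(anchor_level, negative_level,
                                       base_margins)
            sim_neg = cosine_similarity(anchor, negative)
            total += max(0.0, sim_neg - sim_pos + margin)
            pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical base margins for the nine steps between levels 0..9:
# small for minor defects, larger for severe ones.
BASE_MARGINS = [0.05, 0.05, 0.05, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2]
```

With these illustrative values, two levels apart among minor defects gives a margin of 0.1, while two levels apart among severe defects gives 0.4, so the required separation grows with both the level gap and the severity.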

[0070] Finally, the joint loss of the entire network is defined as the weighted sum of the gated classification loss, the probability-weighted multi-branch contrastive loss, and the regression loss. The calculation expression is:

$$L_{total} = \lambda_{1} \, L_{CE}(g, p) + \lambda_{2} \sum_{k=1}^{3} p_{k} \, L_{MMNP}^{(k)} + \lambda_{3} \, L_{SL1}(\hat{y}, y)$$

where $L_{total}$ is the joint loss of the entire network; $L_{CE}$ is the cross-entropy loss that evaluates group classification accuracy and $\lambda_{1}$ is its hyperparameter; $g$ is the true group label for the severity of the defect in the input image; $p$ is the group probability of the image; $\lambda_{2}$ is the hyperparameter of the gated, probability-weighted contrastive term; $p_{k}$ is the probability with which the network predicts the current input image to belong to the $k$-th severity group; $L_{MMNP}^{(k)}$ is the contrastive loss of branch $k$, with $k=1$ the low-level branch, $k=2$ the intermediate branch, and $k=3$ the high-level branch; $L_{SL1}$ is the smooth L1 regression loss that measures the deviation of the final score and $\lambda_{3}$ is its hyperparameter; $\hat{y}$ is the predicted rank; and $y$ is the true label for the severity of the defect.

[0071] Specifically, the model uses the group probabilities output by the gating module as dynamic weights to weight the contrastive loss of each branch. This joint optimization mechanism ensures that, while gating and qualitative analysis remain accurate, each branch learns through soft weighting ordered features that are both discriminative and hierarchically arranged. By shifting the tedious search process otherwise required during inference into structural constraints at the training stage, the network's ability to rate complex industrial defects quantitatively is significantly improved.
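The combination of the three terms can be sketched as follows. The hyperparameter names and their default values of 1.0 are assumptions; the per-branch contrastive losses and the regression loss are passed in as precomputed numbers.

```python
import math


def cross_entropy(group_probs, true_group):
    """Negative log-probability of the true group (clamped for safety)."""
    return -math.log(max(group_probs[true_group], 1e-12))


def joint_loss(group_probs, true_group, branch_contrastive_losses,
               regression_loss, lambda_cls=1.0, lambda_con=1.0,
               lambda_reg=1.0):
    """Weighted sum of gating, probability-weighted contrastive, and
    regression terms."""
    gating = cross_entropy(group_probs, true_group)
    contrastive = sum(p * loss for p, loss
                      in zip(group_probs, branch_contrastive_losses))
    return (lambda_cls * gating
            + lambda_con * contrastive
            + lambda_reg * regression_loss)


total = joint_loss(
    group_probs=[0.5, 0.25, 0.25],
    true_group=0,
    branch_contrastive_losses=[0.2, 0.4, 0.8],
    regression_loss=0.125,
)
```

In this example the low-level branch carries half of the contrastive weight, so its loss dominates the second term even though the high-level branch has the largest raw loss.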

[0072] To evaluate the practical effectiveness of this invention in assessing the severity of industrial defects, an industrial surface defect dataset derived from a real automotive engine production line was constructed, primarily containing scratches on engine surfaces. All images were acquired in a real industrial inspection environment to ensure the data reflects the complexities of real-world scenarios, such as variations in lighting, textured background interference, and the distribution of defects in different morphologies. The dataset contains 4253 high-resolution surface images. The samples include various types of defects, such as minor scratches, localized discoloration, and large-scale structural damage. In real-world industrial inspection scenarios, the visual morphology of defects varies considerably; even within the same severity level, their size, shape, and location can differ significantly. Each defect region in the images was labeled with a severity level of 0–9. The labeling was completed by three senior quality inspectors, taking into account the defect's geometric dimensions (length, area), depth, and contrast. Level 0 corresponds to extremely minor surface marks, while level 9 represents severe structural damage requiring scrapping. For experimental partitioning, the entire dataset was randomly divided into training, validation, and test sets in a 7:1:2 ratio to ensure independence between model training and evaluation.
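A random 7:1:2 partition of the 4253 images can be sketched as below. The fixed seed and the rule of assigning the rounding remainder to the test set are assumptions; the text does not report the exact subset sizes.

```python
import random


def split_dataset(items, ratios=(7, 1, 2), seed=0):
    """Shuffle and split into train, validation, and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train_ids, val_ids, test_ids = split_dataset(range(4253))
```

Each image index appears in exactly one subset, which keeps model training and evaluation independent.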

[0073] To analyze the relationship between defect severity and spatial scale, the pixel area of the defect regions in the dataset was computed and compared with the corresponding severity levels. The segmentation areas are shown in Figure 6. As can be seen, there is no strict linear relationship between defect severity and pixel area. While more severe defects generally have larger areas, real-world data still contain considerable overlap. For example, some small defects with high contrast or significant structural damage may be rated as high-severity, while some large regional anomalies may be rated as medium-severity. Relying on a single scale feature is therefore insufficient to determine defect severity accurately in industrial scenarios: defects of the same severity may differ significantly in scale, and defects of similar size may correspond to different severity levels. The model consequently needs to consider both local detail features and overall structural information to obtain stable rating results. This invention introduces a grouped multi-branch structure in the network design and processes defect features at different scales through different branches, thereby improving the model's ability to represent complex defects.

[0074] This invention is implemented on the PyTorch framework and was trained and tested on an NVIDIA RTX 3090 GPU. During training, the batch size was set to 32 and the model was trained for 100 epochs. The optimizer was SGD with a momentum of 0.9 and a weight decay of 1×10⁻⁴. The initial learning rate was 0.01 and was gradually decayed with a cosine annealing schedule.
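The cosine annealing schedule can be written in closed form. The sketch below uses the stated initial learning rate of 0.01 and 100 epochs; the minimum learning rate of 0 and per-epoch updates are assumptions.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=100, base_lr=0.01, min_lr=0.0):
    """Learning rate at a given epoch under cosine annealing."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


schedule = [cosine_annealing_lr(epoch) for epoch in range(101)]
```

The rate starts at 0.01, falls to half that value at the midpoint, and approaches the minimum at the final epoch, decaying slowly at both ends and fastest in the middle.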

[0075] In terms of network structure, the input image is first losslessly downsampled through the PixelUnshuffle operation, with a resampling rate set to... The feature extraction network of each branch is initialized with ImageNet pre-trained weights to accelerate model convergence. In terms of loss function design, the model employs a multi-task joint optimization strategy (gated classification loss, multi-branch MMNP contrastive loss, and smooth L1 regression loss) to improve both the grouping and decoupling ability of the model and its level prediction accuracy. For the task of assessing the severity of industrial defects, defect levels have a natural ordinal relationship, so traditional classification accuracy alone cannot fully reflect the true performance of the model. This invention therefore constructs a comprehensive evaluation system along three dimensions: classification consistency, numerical error, and ranking consistency. Classification consistency includes accuracy (Acc) and tolerance accuracy (Acc±1). Accuracy measures the proportion of samples whose predicted level exactly matches the true label. Tolerance accuracy reflects engineering error tolerance requirements: a prediction is counted as correct if it differs from the true label by at most one level, which is closer to the actual acceptance standard.

[0076] Numerical errors include mean absolute error (MAE) and mean squared error (MSE). MAE reflects the expected value of the absolute deviation between the predicted level and the actual level; MSE is more sensitive to large misjudgments across levels through squared penalties.

[0077] Ranking consistency includes Pearson linear correlation coefficient (PLCC) and Spearman rank correlation coefficient (SRCC). PLCC measures the degree of linear correlation between the predicted results and the true rank; SRCC only focuses on the relative ranking between ranks, intuitively reflecting whether the model has learned the monotonically increasing trend of defect severity.
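The six metrics can be computed with a few helper functions, sketched below in plain Python. The sketch assumes that predictions have already been rounded to integer levels for Acc and Acc±1, and it handles ties in SRCC by average ranks; both are assumptions about details the text leaves open.

```python
import math


def accuracy(pred, true, tolerance=0):
    """Share of predictions within `tolerance` levels (0: Acc, 1: Acc±1)."""
    return sum(abs(p - t) <= tolerance for p, t in zip(pred, true)) / len(true)


def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mse(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


true_levels = [0, 1, 2, 3, 4]
pred_levels = [0, 2, 2, 5, 4]
```

For these illustrative labels, three of five predictions are exact and four of five fall within one level, which shows how Acc±1 can be much higher than Acc when most errors are between adjacent levels.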

[0078] To verify the effectiveness of this invention, a variety of representative deep learning models were selected for comparative experiments, including general visual networks, fine-grained recognition models, and ordinal learning models. General visual networks included ResNet-50, ConvNeXt-T, Swin-T, and the state-space model VMamba; fine-grained recognition models included NTS-Net, TransFG, and PartMatch, which typically improve detail recognition capabilities through local region modeling; and ordinal learning models included CORAL and MWR, which explicitly consider the order relationship between categories in the training objective. The performance comparison based on the industrial defect dataset is shown in Table 1.

[0079] Table 1 Performance based on industrial defect dataset

[0080] As shown in Table 1, general-purpose visual networks are severely limited on ordinal classification tasks. For example, the classic ResNet-50 achieves an accuracy of only 30.5% and an MAE as high as 1.63. Even the state-space model VMamba, with its stronger feature representation capability, reaches an accuracy of only 41.3%. This indicates that general-purpose visual networks struggle to handle continuous defect levels with extremely subtle differences. Fine-grained recognition models, through local region modeling, show significant performance improvements: PartMatch achieves a grouping accuracy of 95.1%, an accuracy of 62.4%, and an MAE of 0.72. However, since these methods do not explicitly model the physical order relationship between levels, its SRCC of 0.835, which measures ranking consistency, still leaves room for improvement. Conversely, the ordinal learning model MWR achieves an SRCC of 0.845, demonstrating good ranking stability, but its accuracy of 46.6% remains relatively low because its single feature extraction architecture has difficulty capturing mixed-scale defects. This invention (CFG-Net) achieves the best performance on all evaluation metrics, with Acc reaching 65.7% and Acc±1 reaching 95.5%. This means that for the large majority of samples the prediction error of this invention stays within one level. Regarding numerical errors, the MAE of this invention is reduced to 0.45 and the MSE to 0.58. Furthermore, PLCC (0.912) and SRCC (0.925) both exceed 0.9, indicating that the multi-branch decoupling architecture of this invention not only identifies local details accurately but also preserves the monotonically increasing property of defect severity.

[0081] To analyze the contribution of each module in this invention, ablation experiments were conducted as shown in Table 2. Core components were removed or replaced one at a time and the resulting performance degradation was observed.

[0082] Table 2 Ablation Experiment Results of the Present Invention

[0083] In Table 2, the baseline model, which uses standard scaling, hard Argmax classification, and center cropping, shows a sharp drop in overall accuracy to 49.4%, with an MSE of 2.56. Its recognition rate for the weak defect group (G1) is only 40.5%, so fine-grained discrimination is largely lost. Replacing the lossless pixel unshuffle operation with traditional resize downsampling (Variant A) reduces the overall accuracy to 58.3% and increases the MSE to 0.92, with a recognition accuracy of 48.2% for the G1 group. This demonstrates that traditional interpolation-based scaling easily loses high-frequency details, while lossless pixel unshuffle plays a crucial role in preserving subtle defect features. Changing the probability-based dynamic soft gating to a hard classification strategy (Hard Argmax, Variant B) results in an overall accuracy of 62.8%. This indicates that, by preserving the probability distribution information, the dynamic soft gating mechanism mitigates the errors caused by forcing boundary samples into a single group and contributes to a smooth transition in the feature fusion stage. Variant C removes saliency localization and replaces it with conventional center cropping, which lowers the overall accuracy to 56.5%. The classification performance of G3 and G1 is also significantly affected, reflecting the positive role of anchor-guided dynamic cropping in accurately locating abnormal regions and reducing background interference. Therefore, the superior performance of this invention does not depend on a single operator, but on the synergy of three mechanisms: lossless fidelity, soft probabilistic gating, and anchor-guided multi-branch decoupling.

[0084] This invention is compared visually with VMamba, PartMatch, MWR, and ResNet. As shown in Figure 7, the ResNet model suffers from severe "out-of-focus" behavior: its high-response regions often deviate completely from the defect itself and instead produce strong false activations on normal machining textures or image edges. While VMamba can roughly cover the defect area, it exhibits a severe "receptive field diffusion" problem, with its heatmap spilling over large areas of the normal metallic background and failing to exclude background interference. PartMatch is sensitive to extremely small high-contrast features, but its response exhibits obvious "generalization" and "point-like aggregation." Because it lacks feature extraction branches targeting macroscopic deformations, this model activates only small localized regions when faced with long scratches or defects of appreciable area, and fails to depict the full physical contour of the defect. Although MWR introduces hierarchical order constraints in the back end, its front-end feature extraction lacks explicit background suppression, so its heatmap shows a scattered, speckled spatial distribution that is highly susceptible to interference from surrounding machining textures and produces multiple false activations. The activation region of this invention shows high spatial accuracy and morphological adaptability. As can be seen in Figure 7, whether the defect is a thin, thread-like scratch or an irregularly shaped local abrasion, the heatmap of this invention closely fits the real physical contour of the defect and avoids edge overflow and local breakpoints. The saliency localization and dynamic cropping mechanism at the front end of this invention reduces the interference of redundant background and alleviates the tendency of general networks to lose focus; the anchor-guided multi-branch structure, through differentiated receptive fields and feature subtraction operations, compensates for the limitations of single models in macroscopic and microscopic feature extraction, enabling the network to focus more completely on abnormal regions.

[0085] This invention introduces an input preprocessing strategy that combines lossless pixel unshuffle with dynamic gating. By mapping from the spatial dimension to the channel dimension, it significantly reduces computational overhead while preserving the fine-grained texture information of the original image, and it provides a reliable data foundation for the subsequent anchoring mechanism. The anchor-guided multi-branch module performs multi-branch feature extraction: the low-level branch enhances the response to subtle texture anomalies, while the high-level branch adapts to the structural changes of large-area irregular defects, thereby improving the representation of multi-scale defects. The adaptive ordered fusion module uses the gating probabilities as dynamic weights to fuse the multi-branch features and, combined with ordinal constraints, builds an arrangement in the feature space that is consistent with defect severity, thereby reducing cross-level misjudgments and improving the stability of the model's evaluation. This invention can cover surface defects of industrial products across all scales, significantly improves the accuracy of defect severity classification, and has excellent value for industrial engineering applications.
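The lossless nature of the spatial-to-channel mapping can be illustrated on a single-channel image. In the sketch below, the factor `r=2` is a hypothetical value chosen for the example, and the channel ordering follows the common pixel-unshuffle convention; the inverse function shows that no pixel value is discarded.

```python
def pixel_unshuffle(image, r):
    """Rearrange an H x W image into r*r channels of size (H/r) x (W/r)."""
    h, w = len(image), len(image[0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    return [[[image[y * r + i][x * r + j] for x in range(w // r)]
             for y in range(h // r)]
            for i in range(r) for j in range(r)]


def pixel_shuffle(channels, r):
    """Inverse of pixel_unshuffle: rebuild the original image."""
    hh, ww = len(channels[0]), len(channels[0][0])
    image = [[0] * (ww * r) for _ in range(hh * r)]
    for index, channel in enumerate(channels):
        i, j = divmod(index, r)
        for y in range(hh):
            for x in range(ww):
                image[y * r + i][x * r + j] = channel[y][x]
    return image


original = [[row * 4 + col for col in range(4)] for row in range(4)]
rearranged = pixel_unshuffle(original, 2)   # 4 channels of 2 x 2
restored = pixel_shuffle(rearranged, 2)     # identical to `original`
```

Unlike interpolation-based resizing, the spatial resolution is reduced by the factor r while every original pixel value is retained in one of the r² channels.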

[0086] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for classifying surface defects in industrial products, characterized in that it comprises:
S1: Acquire high-resolution images of surface defects in industrial products, and compress the feature space of the high-resolution images through the pixel rearrangement module to obtain rearranged feature maps;
S2: Input the rearranged feature map into the dual-stream grouping gating module to obtain the group probability of the image and the saliency heatmap of defects;
S3: Perform dynamic cropping using the saliency heatmap of defects and the high-resolution image to obtain local image patches containing abnormal features;
S4: The low-level branch of the anchor-guided multi-branch module subtracts the explicit anchor features from the input features of the local image patches containing abnormal features to obtain low-level features; the intermediate branch of the anchor-guided multi-branch module extracts features from the rearranged feature map to obtain intermediate features; the high-level branch of the anchor-guided multi-branch module extracts features from the rearranged feature map to obtain high-level features;
S5: Using the group probability of the image as dynamic weights, the low-level, mid-level and high-level features are weighted and fused through the adaptive ordered fusion module to obtain the global aggregated features;
S6: Input the global aggregated features into the rank regressor to obtain the surface defect rating of industrial products.

2. The method for classifying surface defects in industrial products according to claim 1, characterized in that the pixel rearrangement module uses lossless pixel rearrangement to move adjacent pixel blocks of the input image from the spatial dimensions to the channel dimension according to a preset ratio.

3. The method for classifying surface defects in industrial products according to claim 1, characterized in that the dual-stream grouping gating module includes macro-group gating and local saliency localization; the macro-group gating divides surface defects of industrial products into a low-level perception group, a mid-level perception group, and a high-level perception group; the rearranged feature map, after its spatial context is aggregated through global average pooling, is input into a multilayer perceptron for dimensionality reduction, and the group probability of the image is output through the Softmax function; the group probability of the image is a three-dimensional probability vector; the local saliency localization generates the saliency heatmap of defects by fusing the rearranged feature map with the group probability of the image.

4. The method for classifying surface defects in industrial products according to claim 1, characterized in that the formula for calculating the saliency heatmap of defects is defined in terms of the following quantities: the saliency heatmap of defects; an activation function; the channel weight of each channel for each severity group; the probability with which the network predicts the current input image to belong to each severity group; the spatial pixel activation value, at given coordinates, of each channel feature map extracted by the shared backbone network; and the total number of pixels of the feature map in the spatial dimension.

5. The method for classifying surface defects in industrial products according to claim 1, characterized in that the low-level branch of the anchor-guided multi-branch module subtracting the explicit anchor features from the input features of the local image patches containing abnormal features to obtain the low-level features comprises: establishing a preset anchor point, and extracting features from the anchor point through the feature extractor of the low-level branch to obtain anchor features; extracting features from the local image patches containing abnormal features through the feature extractor of the low-level branch to obtain cropped patch features; performing a difference operation on the cropped patch features and the anchor features to obtain the low-level features; wherein the feature extractor of the low-level branch includes depthwise separable convolutions.

6. The method for classifying surface defects in industrial products according to claim 1, characterized in that the feature extractor of the intermediate branch of the anchor-guided multi-branch module takes the rearranged feature map as input, obtains a global view through the backbone network, and appends spatial pyramid pooling to the end of the backbone network; the feature map is pooled over grids of three scales (1×1, 2×2 and 4×4) and the results are concatenated to obtain the intermediate features.

7. The method for classifying surface defects in industrial products according to claim 1, characterized in that the high-level branch of the anchor-guided multi-branch module extracts initial features from the rearranged feature map through the initial layer, and then extracts high-level features by combining deformable convolutional layers with dilated convolutional layers.

8. The method for classifying surface defects in industrial products according to claim 1, characterized in that the adaptive ordered fusion module employs a mechanism of soft selection and weighted aggregation at the feature level; it uses the group probability of the image as a dynamic, learnable attention weight to perform adaptive calibration of the low-level, mid-level, and high-level features, and then obtains the global aggregated features through feature aggregation.

9. The method for classifying surface defects in industrial products according to claim 1, characterized in that the rank regressor consists of a first fully connected layer, a ReLU activation function, and a second fully connected layer connected in sequence.

10. The method for classifying surface defects in industrial products according to claim 1, characterized in that the adaptive ordered fusion module is trained jointly with two objectives, feature space supervision and prediction space supervision; the feature space supervision constrains the global aggregated features through the multi-margin N-pair loss, which uses a local window constraint to select negative samples; the prediction space supervision constrains the final output predicted value through the smooth L1 loss.