A method and apparatus for identifying surface defects in parts.

By combining a pre-trained defect recognition model with a hierarchical semantic knowledge graph and a multi-scale semantic alignment network, the problem of insufficient accuracy and reliability in the detection of surface defects of parts is solved, and high-precision identification and localization of minute defects is achieved.

CN121937445B · Active · Publication Date: 2026-06-30 · HEFEI UNIV OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HEFEI UNIV OF TECH
Filing Date: 2026-03-27
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing neural network models struggle to simultaneously and effectively cover and accurately distinguish multiple subtle defects on the surface of parts, especially when the defects are highly similar to the background texture. This can easily lead to false detections and missed detections, failing to meet the demands of high-precision and high-reliability industrial inspection.

Method used

A pre-trained defect recognition model is used to extract multi-scale feature maps through a visual encoder. It is then combined with a hierarchical semantic knowledge graph for dynamic association and anomaly sampling. A semantic perception network and a multi-scale semantic alignment network are used for global visual feature analysis and pixel-level feature analysis, outputting image-level anomaly judgment and pixel-level localization results.

Benefits of technology

It significantly improves the detection accuracy and reliability of complex and diverse defects, and can more accurately identify various subtle defects that are highly similar to normal textures and have complex shapes.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention provides a method and apparatus for identifying surface defects in parts. The method includes: inputting a part image into a pre-trained defect identification model for feature extraction to obtain a multi-scale feature map; inputting a preset hierarchical semantic knowledge graph into the defect identification model, and dynamically associating the multi-scale feature map with the hierarchical semantic knowledge graph to update the hierarchical semantic knowledge graph; performing anomaly sampling on the updated hierarchical semantic knowledge graph to obtain anomaly semantic vectors with different degrees of anomaly, forming an anomaly semantic vector set; performing global visual feature analysis on the multi-scale feature map based on the anomaly semantic vector set to obtain an image-level anomaly score for the part image; and performing pixel feature analysis on the multi-scale feature map based on the anomaly semantic vector set to obtain a pixel-level anomaly score for the part image. The method and apparatus provided by this invention can improve the detection accuracy of surface defects in parts.

Description

Technical Field

[0001] This invention relates to the field of industrial visual inspection, and more particularly to a method and apparatus for identifying surface defects in parts.

Background Technology

[0002] In the field of industrial visual inspection, automatic detection of surface defects in parts is crucial for ensuring product quality. Parts such as bearings, gears, and blades, after undergoing various processing techniques, exhibit complex and diverse surface defect morphologies, including cracks, scratches, pits, oxide spots, ablation spots, and various texture anomalies. These defects vary greatly in shape, size, contrast, and texture continuity, and are often mixed with the regular processing texture of the parts themselves, posing a significant challenge to the accuracy of inspection technology.

[0003] Automated inspection methods based on deep learning are already used in industry. However, surface defects on parts are so numerous in type and so subtle in morphology that existing neural network models struggle to cover and accurately distinguish all of them simultaneously. The problem is most severe when defects closely resemble normal textures, which easily leads to false positives and false negatives and fails to meet the high-precision, high-reliability requirements of industrial inspection. Existing methods therefore leave room for improvement.

Summary of the Invention

[0004] This invention provides a method and apparatus for identifying surface defects of parts, which can solve the technical problem of insufficient detection accuracy and reliability caused by the subtle features, numerous types of defects, and easy confusion with background textures.

[0005] The present invention provides a method for identifying surface defects in a part, comprising:

[0006] The part image is input into a pre-trained defect recognition model, and the visual encoder of the defect recognition model extracts features from the part image to obtain a multi-scale feature map.

[0007] The preset hierarchical semantic knowledge graph is input into the defect recognition model. Through the semantic perception network of the defect recognition model, the multi-scale feature map is dynamically associated with the hierarchical semantic knowledge graph to update the hierarchical semantic knowledge graph. The hierarchical semantic knowledge graph includes multiple nodes at multiple levels, and each node has a corresponding preset initial prompt vector.

[0008] The semantic sampling network of the defect identification model is used to sample the updated hierarchical semantic knowledge graph to obtain abnormal semantic vectors with different degrees of abnormality, forming an abnormal semantic vector set.

[0009] The first output branch of the multi-scale semantic alignment network of the defect identification model performs global visual feature analysis on the multi-scale feature map based on the set of abnormal semantic vectors to obtain an image-level anomaly score for the part image; the second output branch of the multi-scale semantic alignment network performs pixel feature analysis on the multi-scale feature map based on the set of abnormal semantic vectors to obtain a pixel-level anomaly score for the part image.

[0010] In one embodiment of the present invention, the step of dynamically associating the multi-scale feature map with the hierarchical semantic knowledge graph to update the hierarchical semantic knowledge graph includes:

[0011] The feature maps at each scale of the multi-scale feature map are linearly projected based on different initial cue vectors to obtain the corresponding visual features.

[0012] Based on the attention mechanism, the initial prompt vectors of each node in the hierarchical semantic knowledge graph are fused with the corresponding visual features to obtain the updated semantic prompt vectors of the hierarchical semantic knowledge graph.

[0013] In one embodiment of the present invention, the nodes include product nodes, attribute nodes, and defect nodes. The step of performing anomaly sampling on the updated hierarchical semantic knowledge graph to obtain anomaly semantic vectors with different degrees of anomaly includes:

[0014] Cluster analysis is performed on the semantic hint vectors of product nodes and attribute nodes in the updated hierarchical semantic knowledge graph to obtain normal aggregate vectors;

[0015] Cluster analysis is performed on the semantic hint vectors of defect nodes in the updated hierarchical semantic knowledge graph to obtain defect aggregation vectors;

[0016] Based on different preset random mixing coefficients, the normal aggregated vector and the defect aggregated vector are weighted and fused to obtain multiple abnormal semantic vectors with different degrees of abnormality; wherein, the random mixing coefficients satisfy the Beta distribution.

[0017] In one embodiment of the present invention, the first output branch of the multi-scale semantic alignment network of the defect recognition model performs global visual feature analysis on the multi-scale feature map based on the set of abnormal semantic vectors to obtain an image-level anomaly score for the part image; the second output branch of the multi-scale semantic alignment network performs pixel feature analysis on the multi-scale feature map based on the set of abnormal semantic vectors to obtain a pixel-level anomaly score for the part image, including:

[0018] Using the calibration branch in the multi-scale semantic alignment network of the defect identification model, semantic calibration is performed on the multi-scale feature maps based on the set of abnormal semantic vectors to obtain calibration feature maps of the corresponding scales.

[0019] Through the first output branch of the multi-scale semantic alignment network, global visual feature analysis is performed on the smallest-scale calibration feature map based on the set of abnormal semantic vectors to obtain the image-level anomaly score of the part image;

[0020] The pixel-level anomaly score of the part image is obtained by performing pixel feature analysis on the calibration feature maps at each scale based on the set of anomaly semantic vectors through the second output branch of the multi-scale semantic alignment network.

[0021] In one embodiment of the present invention, the step of performing semantic calibration on multi-scale feature maps based on the set of abnormal semantic vectors to obtain calibration feature maps at corresponding scales includes:

[0022] For feature maps at each scale:

[0023] A dimensionality-reduction projection is applied to the feature map to obtain the corresponding compressed feature map;

[0024] Perform a depthwise separable convolution operation on the compressed feature map to obtain the corresponding first feature map;

[0025] The set of abnormal semantic vectors is converted into corresponding channel attention weights, and the first feature map is modulated channel by channel to obtain the second feature map.

[0026] A dimension-raising projection is applied to the second feature map to obtain the third feature map;

[0027] The third feature map is added to the original feature map to obtain the corresponding calibration feature map.
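The calibration steps listed above can be sketched in NumPy as follows. This is a minimal illustration, not the trained network of the embodiment: the layer sizes, the random parameters, and in particular the way the anomaly semantic vector set is turned into channel attention weights (mean of the set, then a linear map and a sigmoid) are assumptions made for the example.

```python
import numpy as np

def conv1x1(x, weight):
    """Pointwise projection: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def depthwise3x3(x, kernels):
    """Depthwise 3x3 convolution with zero padding; kernels is (C, 3, 3)."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernels[:, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def calibrate(feature_map, anomaly_vectors, params):
    """Residual semantic calibration of one (C, H, W) feature map."""
    compressed = conv1x1(feature_map, params["down"])        # dimensionality reduction
    first = conv1x1(depthwise3x3(compressed, params["dw"]),  # depthwise step
                    params["pw"])                            # pointwise step
    pooled = anomaly_vectors.mean(axis=0)                    # assumption: average the set
    gate = 1.0 / (1.0 + np.exp(-(params["gate"] @ pooled)))  # channel attention weights
    second = first * gate[:, None, None]                     # channel-wise modulation
    third = conv1x1(second, params["up"])                    # dimension-raising projection
    return feature_map + third                               # residual addition

rng = np.random.default_rng(0)
C, C_r, d = 32, 8, 16  # hypothetical channel, reduced-channel, and semantic dimensions
params = {
    "down": rng.standard_normal((C_r, C)) * 0.1,
    "dw": rng.standard_normal((C_r, 3, 3)) * 0.1,
    "pw": rng.standard_normal((C_r, C_r)) * 0.1,
    "gate": rng.standard_normal((C_r, d)) * 0.1,
    "up": rng.standard_normal((C, C_r)) * 0.1,
}
feature_map = rng.standard_normal((C, 14, 14))
anomaly_vectors = rng.standard_normal((5, d))
calibrated = calibrate(feature_map, anomaly_vectors, params)
print(calibrated.shape)
```

Because the last step is a residual addition, the calibrated map keeps the shape of the input feature map, so the same routine can be applied independently at every scale.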

[0028] In one embodiment of the present invention, the step of performing global visual feature analysis on the smallest-scale calibration feature map based on the set of anomalous semantic vectors to obtain an image-level anomaly score for the part image includes:

[0029] Global average pooling is performed on the smallest-scale calibration feature map to generate global visual features;

[0030] Calculate the cosine similarity between the global visual features and each anomaly semantic vector to obtain the corresponding image-level anomaly prediction value;

[0031] The mean of all image-level anomaly predictions is calculated to obtain the image-level anomaly score of the part image.
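The three steps above (global average pooling, cosine similarity, mean) can be sketched as follows. The sketch assumes that the channel dimension of the smallest-scale calibration feature map already matches the dimension of the anomaly semantic vectors; the function name and array layout are illustrative only.

```python
import numpy as np

def image_anomaly_score(calibrated_map, anomaly_vectors, eps=1e-12):
    """calibrated_map: (d, H, W); anomaly_vectors: (M, d).

    Returns the image-level anomaly score and the per-vector predictions.
    """
    global_feature = calibrated_map.mean(axis=(1, 2))  # global average pooling
    g = global_feature / (np.linalg.norm(global_feature) + eps)
    a = anomaly_vectors / (np.linalg.norm(anomaly_vectors, axis=1, keepdims=True) + eps)
    predictions = a @ g                                # cosine similarity per vector
    return float(predictions.mean()), predictions

rng = np.random.default_rng(0)
score, predictions = image_anomaly_score(
    rng.standard_normal((16, 7, 7)), rng.standard_normal((8, 16))
)
print(score, predictions.shape)
```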

[0032] In one embodiment of the present invention, the step of performing pixel feature analysis on calibration feature maps at various scales based on the set of abnormal semantic vectors to obtain pixel-level anomaly scores for the part image includes:

[0033] The calibration feature maps at various scales are aligned and fused to obtain a fused feature map;

[0034] For each anomalous semantic vector, the cosine similarity between the fused feature map and the anomalous semantic vector is calculated pixel by pixel to generate the corresponding two-dimensional matrix;

[0035] Pixel aggregation is performed on all two-dimensional matrices to obtain pixel-level anomaly scores for the part images.
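A minimal sketch of the pixel-level branch described above is given below. The alignment method (nearest-neighbour upsampling to the largest resolution), the fusion rule (averaging), and the pixel aggregation (mean over the anomaly semantic vectors) are all assumptions chosen for simplicity; the text does not fix them. The sketch also assumes square maps that share one channel dimension.

```python
import numpy as np

def upsample_nearest(x, size):
    """Resize (d, h, w) to (d, size, size); size must be a multiple of h and w."""
    d, h, w = x.shape
    return x.repeat(size // h, axis=1).repeat(size // w, axis=2)

def pixel_anomaly_scores(calibrated_maps, anomaly_vectors, eps=1e-12):
    """calibrated_maps: list of (d, h_s, h_s) arrays; anomaly_vectors: (M, d)."""
    size = max(m.shape[1] for m in calibrated_maps)
    # Align every scale to the largest resolution, then fuse by averaging.
    fused = np.mean([upsample_nearest(m, size) for m in calibrated_maps], axis=0)
    fused = fused / (np.linalg.norm(fused, axis=0, keepdims=True) + eps)
    a = anomaly_vectors / (np.linalg.norm(anomaly_vectors, axis=1, keepdims=True) + eps)
    # One two-dimensional similarity matrix per anomaly semantic vector.
    similarity_maps = np.einsum("md,dhw->mhw", a, fused)
    return similarity_maps.mean(axis=0)  # pixel aggregation over the vector set

rng = np.random.default_rng(0)
maps = [rng.standard_normal((16, s, s)) for s in (56, 28, 14, 7)]
score_map = pixel_anomaly_scores(maps, rng.standard_normal((8, 16)))
print(score_map.shape)
```

In practice the score map would typically be resized to the input image resolution before thresholding; that step is omitted here.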

[0036] In one embodiment of the present invention, the defect identification model is trained and updated based on a contrast loss function, a regression loss function, and a segmentation loss function;

[0037] The contrast loss function is calculated based on a preset hierarchical semantic knowledge graph, the hierarchical semantic knowledge graph updated from the training part images, and the global visual features of the training part images.

[0038] The regression loss function is calculated based on the abnormal semantic vector of the part images used for training, different random mixing coefficients, and the corresponding image-level anomaly scores.

[0039] The segmentation loss function is calculated based on the pixel-level anomaly scores of the training part images and the true defect masks of the training part images.

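[0040] In one embodiment of the present invention, the defect identification model is trained and updated based on a loss function, wherein the loss function is $\mathcal{L} = \lambda_1 \mathcal{L}_{con} + \lambda_2 \mathcal{L}_{reg} + \lambda_3 \mathcal{L}_{seg}$, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the preset weighting coefficients, and $\mathcal{L}_{con}$, $\mathcal{L}_{reg}$, and $\mathcal{L}_{seg}$ are the contrastive loss function, the regression loss function, and the segmentation loss function, respectively.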
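The combination of the three loss terms with preset weighting coefficients can be sketched as a plain weighted sum. The function name and the default weights below are hypothetical; the individual loss values are assumed to have been computed elsewhere.

```python
def total_loss(contrastive, regression, segmentation, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the contrastive, regression, and segmentation losses."""
    w_con, w_reg, w_seg = weights  # preset weighting coefficients (illustrative values)
    return w_con * contrastive + w_reg * regression + w_seg * segmentation

print(total_loss(0.5, 0.2, 0.1, weights=(1.0, 2.0, 3.0)))
```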

[0041] The present invention also provides a device for identifying surface defects of a part, comprising:

[0042] The feature extraction module is used to input the part image into the pre-trained defect recognition model, and extract features from the part image through the visual encoder of the defect recognition model to obtain a multi-scale feature map.

[0043] The semantic association module is used to input a preset hierarchical semantic knowledge graph into the defect recognition model, and dynamically associate the multi-scale feature map with the hierarchical semantic knowledge graph through the semantic perception network of the defect recognition model to update the hierarchical semantic knowledge graph; wherein, the hierarchical semantic knowledge graph includes multiple nodes at multiple levels, and each node has a corresponding preset initial prompt vector;

[0044] The semantic sampling module is used to perform anomaly sampling on the updated hierarchical semantic knowledge graph through the semantic sampling network of the defect identification model, so as to obtain anomaly semantic vectors with different degrees of anomaly and form an anomaly semantic vector set.

[0045] The defect processing module is used to perform global visual feature analysis on the multi-scale feature map based on the set of abnormal semantic vectors through the first output branch of the multi-scale semantic alignment network of the defect recognition model to obtain an image-level anomaly score of the part image; and to perform pixel feature analysis on the multi-scale feature map based on the set of abnormal semantic vectors through the second output branch of the multi-scale semantic alignment network to obtain a pixel-level anomaly score of the part image.

[0046] The beneficial effects of this invention are as follows: By introducing a pre-defined hierarchical semantic knowledge graph and associating it with image features, prior knowledge such as the category, attributes, and defects of parts is effectively integrated into the model, enabling the model to more accurately identify and distinguish various subtle defects that are highly similar to normal textures and have complex shapes. Through sampling of anomalous semantics and multi-scale semantic alignment analysis, coverage and accurate evaluation of defects of different degrees and scales are achieved, while outputting image-level anomaly judgment and pixel-level localization results, thereby significantly improving the detection accuracy and reliability of complex and diverse defects.

Attached Figure Description

[0047] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application. It is obvious that the drawings described below are merely some embodiments of this application, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.

[0048] In the drawings:

[0049] Figure 1 is a flowchart of a method for identifying surface defects in a part according to one embodiment of the present invention;

[0050] Figure 2 is a schematic diagram of a defect identification model provided in one embodiment of the present invention;

[0051] Figure 3 is a schematic diagram of a part surface defect identification device provided in one embodiment of the present invention.

Detailed Implementation

[0052] The following specific examples illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments. Various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. In the absence of conflict, the following embodiments and features in the embodiments can be combined with each other.

[0053] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. The drawings only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.

[0054] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. However, it will be apparent to those skilled in the art that embodiments of the invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the invention.

[0055] This invention discloses a method for identifying surface defects in parts, applicable to automated visual inspection of the surface quality of various parts, including bearings, gears, and blades, after machining. These parts exhibit a wide variety of complex and varied defects, such as cracks, scratches, pits, and oxide spots, which are highly similar to normal machining features in size, appearance, and texture, making accurate identification and location difficult using traditional automated inspection methods. The identification method in this embodiment solves the technical challenges of insufficient detection accuracy and reliability caused by the subtle and diverse characteristics of defects, which are easily confused with background textures.

[0056] Referring to Figures 1 and 2, the identification method includes the following step: S10, inputting the part image into a pre-trained defect identification model, and extracting features from the part image through the visual encoder of the defect identification model to obtain a multi-scale feature map. The defect identification model may include a visual encoder 110, a semantic perception network 120, a semantic sampling network 130, and a multi-scale semantic alignment network 140.

[0057] After the part image is obtained, it can be input into the pre-trained defect recognition model, where the visual encoder 110 processes it. The visual encoder 110 used in this embodiment can be a visual encoder released in OpenCLIP (e.g., a Swin Transformer backbone), which has hierarchical feature extraction capabilities.

[0058] The visual encoder 110 performs multi-level, multi-scale feature encoding on the input part image. Specifically, the visual encoder 110 outputs multiple feature maps at different scales during processing, which together constitute the multi-scale feature map. In this embodiment, feature maps at four scales are extracted, corresponding to the outputs of the four stages within the visual encoder 110, with spatial resolutions of 1/4, 1/8, 1/16, and 1/32 of the original size of the part image, respectively. Each feature map retains the two-dimensional spatial structure of the part image, i.e., it has height and width dimensions, providing a foundation for subsequent spatial analysis and calibration. The multi-scale feature map can be represented as $\{F_s\}_{s=1}^{4}$, where $s$ is the scale index.

[0059] From the multi-scale feature maps obtained above, features of image patches can be further extracted. Specifically, each feature map is spatially divided into several image patches, and the feature tensor corresponding to each image patch is flattened into a feature vector. The feature vectors of all image patches at a given scale are collected to form the feature vector set for that scale, so that the multi-scale feature map yields a corresponding multi-scale feature vector set. The multi-scale feature vector set captures cross-scale visual information from global overview to local details, providing rich visual feature input for subsequent semantic association and anomaly analysis. The multi-scale feature vector set can be represented as $\{f_{s,i}\}$, where $s$ is the scale index and $i$ is the image patch index.
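The shapes involved in steps [0058] and [0059] can be illustrated with a short NumPy sketch. It does not run a real encoder: the feature maps are random placeholders, the channel widths (96, 192, 384, 768) are assumed values typical of a small hierarchical backbone, and each spatial location is treated as one image patch.

```python
import numpy as np

def feature_map_shapes(height, width,
                       channels=(96, 192, 384, 768),  # assumed channel widths
                       strides=(4, 8, 16, 32)):       # 1/4, 1/8, 1/16, 1/32 resolution
    """Return (C, H/stride, W/stride) for each of the four encoder stages."""
    return [(c, height // s, width // s) for c, s in zip(channels, strides)]

def to_patch_vectors(feature_map):
    """Flatten a (C, H, W) feature map into (H*W, C) patch feature vectors."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T

rng = np.random.default_rng(0)
shapes = feature_map_shapes(224, 224)
feature_maps = [rng.standard_normal(shape) for shape in shapes]  # stand-ins for encoder outputs
patch_sets = [to_patch_vectors(fm) for fm in feature_maps]

for s, tokens in enumerate(patch_sets, start=1):
    print(f"scale {s}: {tokens.shape[0]} patches of dimension {tokens.shape[1]}")
```

For a 224 x 224 input, the four scales contain 3136, 784, 196, and 49 patch vectors respectively, which shows how the coarser scales summarize larger regions of the part image.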

[0060] Referring to Figure 1, the identification method also includes the following step: S20, inputting the preset hierarchical semantic knowledge graph into the defect identification model, and dynamically associating the multi-scale feature map with the hierarchical semantic knowledge graph through the semantic perception network of the defect identification model to update the hierarchical semantic knowledge graph; wherein the hierarchical semantic knowledge graph includes multiple nodes at multiple levels, and each node has a corresponding preset initial prompt vector.

[0061] The hierarchical semantic knowledge graph is pre-built. Its purpose is to encode and organize professional knowledge about parts in the industrial field, including categories, attributes, and defects, in a hierarchical, machine-understandable form. The construction process of the hierarchical semantic knowledge graph does not involve training on image data; instead, it initializes nodes and their relationships based on knowledge definitions and semantic priors from pre-trained language models (such as CLIP Text Encoder).

[0062] When constructing a hierarchical semantic knowledge graph, the first step is to collect and define the set of nodes. Depending on the specific application scenario, such as in the automotive parts manufacturing field, product categories might include "bearings," "gears," and "flanges." Simultaneously, attributes related to defect detection need to be defined. These attributes are functional surfaces or critical parts of the parts; for example, for gears, attributes include "tooth surface" and "end face"; for bearings, they might include "outer raceway surface" and "inner raceway surface." Finally, all possible defects need to be listed in detail, such as "cracks," "scratches," "dents," and "rust." These categories, attributes, and defects constitute the nodes for constructing the hierarchical semantic knowledge graph.

[0063] Secondly, the attribution and association relationships between products, attributes, and defects are defined, and a directed graph structure is constructed accordingly. This directed graph structure contains three types of nodes: product nodes representing specific part types, attribute nodes representing specific regions of a part, and defect nodes representing specific defect forms. The connection rules are as follows: each product node points to all attribute nodes related to it, indicating that the part has these regions; each attribute node in turn points to all defect nodes that may appear on that region. For example, the product node "gear" is connected to attribute nodes such as "tooth surface" and "end face", while the attribute node "tooth surface" is connected to defect nodes such as "crack" and "scratch". The initial weight of each connecting edge is set to 1 to indicate that the association exists.
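The directed product, attribute, and defect structure described above can be sketched with plain Python dictionaries. The node identifiers and the nested `spec` format are hypothetical conventions for this example, and the entries beyond the "gear" and "tooth surface" connections named in the text are illustrative pairings only.

```python
def build_knowledge_graph(spec):
    """spec maps product -> {attribute -> [defects]}.

    Returns (nodes, edges): nodes maps node id -> node type, and edges maps
    (source id, target id) -> initial weight.
    """
    nodes, edges = {}, {}
    for product, attributes in spec.items():
        p = ("product", product)
        nodes[p] = "product"
        for attribute, defects in attributes.items():
            a = ("attribute", product, attribute)
            nodes[a] = "attribute"
            edges[(p, a)] = 1.0          # product -> attribute
            for defect in defects:
                d = ("defect", product, attribute, defect)
                nodes[d] = "defect"
                edges[(a, d)] = 1.0      # attribute -> defect
    return nodes, edges

spec = {
    "gear": {"tooth surface": ["crack", "scratch"], "end face": ["dent"]},
    "bearing": {"outer raceway surface": ["rust"]},
}
nodes, edges = build_knowledge_graph(spec)
print(len(nodes), "nodes,", len(edges), "edges")
```

Keying attribute and defect nodes by their parent product keeps "tooth surface of a gear" distinct from a same-named region on another part, which matches the product-specific wording of the prompt templates; sharing nodes across products would be an equally valid design.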

[0064] Subsequently, a high-dimensional, learnable semantic representation, i.e., an initial prompt vector, is initialized for each node in the directed graph structure. Each node is first assigned an independent learnable soft prompt vector $e_k \in \mathbb{R}^{d}$, a parameter vector that can be optimized through gradient descent during training; here, $\mathbb{R}$ is the real number field, and $d$, the dimension of the learnable soft prompt vector, is a positive integer hyperparameter.

[0065] In addition, each node's label is converted into a complete natural language sentence using a pre-defined text template chosen according to the node's type. For product nodes, the template "A photo of a [Product]" is used, for example, "A photo of a gear." For attribute nodes, the template "The [Attribute] of a [Product]" is used, for example, "The tooth surface of a gear." For defect nodes, the template "A [Product] with [Defect] on its [Attribute]" is used, for example, "A gear with abrasion on its tooth surface." These sentences are then fed into a pre-trained text encoder, such as CLIP's text encoder, to obtain a fixed text feature vector for each node.
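The three text templates can be expressed as a small helper. The function name and argument layout are hypothetical; the sketch only builds the sentences and does not call a text encoder.

```python
def node_sentence(node_type, product, attribute=None, defect=None):
    """Build the natural language sentence for a knowledge graph node."""
    if node_type == "product":
        return f"A photo of a {product}."
    if node_type == "attribute":
        return f"The {attribute} of a {product}."
    if node_type == "defect":
        return f"A {product} with {defect} on its {attribute}."
    raise ValueError(f"unknown node type: {node_type}")

print(node_sentence("product", "gear"))
print(node_sentence("attribute", "gear", attribute="tooth surface"))
print(node_sentence("defect", "gear", attribute="tooth surface", defect="abrasion"))
```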

[0066] Finally, the learnable soft prompt vector of each node is fused with its corresponding fixed text feature vector to form the initial prompt vector for that node. The fusion can be achieved through vector addition or concatenation. The learnable soft prompt vector is fine-tuned during subsequent training to adapt to the specific defect detection task, while the fixed text feature vector provides stable and rich general semantic priors, ensuring the rationality and interpretability of the model's semantic space. Ultimately, the hierarchical semantic knowledge graph is formally defined by the set of all nodes, the set of directed edges between nodes, and the initial prompt vector corresponding to each node.
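The two fusion options named above, addition and concatenation, can be sketched as follows. The vectors are random placeholders: in the described method the text feature would come from a frozen text encoder and the soft vector would be a trainable parameter, neither of which is modelled here.

```python
import numpy as np

def initial_prompt_vector(soft_vector, text_feature, mode="add"):
    """Fuse a learnable soft vector with a fixed text feature vector."""
    if mode == "add":
        if soft_vector.shape != text_feature.shape:
            raise ValueError("addition requires vectors of equal dimension")
        return soft_vector + text_feature
    if mode == "concat":
        return np.concatenate([soft_vector, text_feature])
    raise ValueError(f"unknown fusion mode: {mode}")

rng = np.random.default_rng(0)
soft = rng.standard_normal(8) * 0.02   # small random start for the trainable part
text = rng.standard_normal(8)          # stand-in for a frozen text embedding
print(initial_prompt_vector(soft, text, "add").shape)
print(initial_prompt_vector(soft, text, "concat").shape)
```

Addition keeps the prompt dimension equal to the text feature dimension, whereas concatenation doubles it, so any later projection into the attention feature space would need to account for the chosen mode.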

[0067] When performing step S20, step S20 includes the following steps:

[0068] S21. Perform linear projection of the feature maps at each scale in the multi-scale feature map based on different initial cue vectors to obtain the corresponding visual features.

[0069] S22. Based on the attention mechanism, the initial prompt vectors of each node in the hierarchical semantic knowledge graph are fused with the corresponding visual features to obtain the updated semantic prompt vectors of the hierarchical semantic knowledge graph.

[0070] The semantic perception network 120 can be a network structure based on a multi-head cross-attention mechanism (e.g., OpenCLIP Swin-B). The semantic perception network 120 receives the multi-scale feature maps from the visual encoder 110. Each feature map has a corresponding feature vector set, in which each feature vector corresponds to an image patch in the part image. To enable interaction and matching with the initial prompt vectors of nodes in the hierarchical semantic knowledge graph, the feature vectors of these image patches need to be mapped to a unified semantic feature space to obtain the visual feature of each image patch. The semantic perception network 120 incorporates a learnable linear projection matrix $W \in \mathbb{R}^{d \times C}$, where $C$ is the dimension of the feature vectors in the multi-scale feature vector set, and $d$ is the dimension of the attention feature space, which is equal to the dimension of the learnable soft prompt vector.

[0071] For the feature vector sets corresponding to feature maps at each scale in the multi-scale feature map, the semantic perception network 120 transforms the feature vector of each image patch using a learnable linear projection matrix. Specifically, each original feature vector is multiplied by the learnable linear projection matrix to obtain a new vector with the same dimension as the initial cue vector. This new vector is the visual feature corresponding to that image patch.

[0072] The formula for calculating visual features is as follows:

[0073] $v_{s,i} = W f_{s,i}$

[0074] where $v_{s,i}$ is the visual feature obtained by linearly projecting the feature vector of the $i$-th image patch at the $s$-th scale; $W$ is the linear projection matrix; and $f_{s,i}$ is the feature vector of the $i$-th image patch at the $s$-th scale in the multi-scale feature vector set.

[0075] Through this process, feature vectors from different scales and locations are uniformly transformed into the same vector space as the hierarchical semantic knowledge graph. The learnable linear projection matrix is randomly initialized when the model is created and is continuously optimized during training through gradient backpropagation, so that it learns the mapping most conducive to associating visual features with the initial prompt vectors.
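The projection of patch feature vectors into the shared semantic space can be sketched in a few lines. The dimensions and the random matrix are placeholders; a trained model would use learned weights, and feature maps with different channel widths would each need a projection of matching input size.

```python
import numpy as np

rng = np.random.default_rng(0)
C, d, num_patches = 96, 64, 49           # hypothetical dimensions

W = rng.standard_normal((d, C)) * 0.02   # random initialization of the projection matrix
patch_features = rng.standard_normal((num_patches, C))

# Each row of visual_features is W applied to the corresponding patch feature vector.
visual_features = patch_features @ W.T
print(visual_features.shape)
```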

[0076] The semantic awareness network 120 begins by calculating the relevance between each visual feature obtained through linear projection and the initial cue vector of each node in the hierarchical semantic knowledge graph. These nodes include product nodes, attribute nodes, and defect nodes, each carrying its own initial cue vector. The mechanism for calculating relevance is an attention mechanism.

[0077] For any visual feature of an image patch, the semantic perception network 120 will pair it with the initial cue vector of each node to calculate the corresponding relevance.

[0078] The formula for calculating relevance is as follows:

[0079]

[0080] where $\alpha_{s,i,k}$ is the relevance between the visual feature of the $i$-th image patch at the $s$-th scale and the initial prompt vector of the $k$-th node; the formula uses learnable parameter vectors; $\top$ denotes transpose; $\odot$ denotes element-wise multiplication; $p_k$ is the initial prompt vector of the $k$-th node; $\|$ denotes vector concatenation; the softmax function normalizes along the image patch ($i$) and scale ($s$) dimensions; and $d$ is the dimension of the attention feature space.
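To make the normalization over patches and scales concrete, the following sketch computes relevance weights with a scaled dot-product score. This scoring function is an assumption made for illustration and is simpler than the formula of this embodiment, which also involves learnable parameter vectors, element-wise products, and concatenation.

```python
import numpy as np

def relevance(visual_features, prompt_vectors):
    """visual_features: (N, d) for all patches of all scales; prompt_vectors: (K, d).

    Returns an (N, K) matrix whose columns each sum to 1, i.e. a softmax
    taken over every patch at every scale for each node.
    """
    d = visual_features.shape[1]
    scores = visual_features @ prompt_vectors.T / np.sqrt(d)  # assumed scoring rule
    scores = scores - scores.max(axis=0, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
alpha = relevance(rng.standard_normal((245, 64)), rng.standard_normal((9, 64)))
print(alpha.shape, alpha.sum(axis=0))
```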

[0081] The semantic awareness network 120 performs feature fusion operations based on the calculated relevance to obtain an updated hierarchical semantic knowledge graph. In the updated hierarchical semantic knowledge graph, the representation of each node is no longer a static, generic initial cue vector, but rather a semantic cue vector incorporating specific visual information from the part image.

[0082] Semantic prompt vectors are generated by weighted aggregation. Taking a defect node as an example, the semantic perception network 120 traverses the visual features of all image patches and computes a weighted sum of them, using as weights the relevance between the defect node's initial prompt vector and the visual feature of each image patch. The weighted sum yields a semantic prompt vector that integrates all visual features related to the defect node across the entire part image, and thus incorporates visual pattern information in the part image that may indicate the defect. For attribute nodes and product nodes, the semantic perception network 120 applies the same logic, with the initial prompt vectors of the corresponding attribute nodes or product nodes used in the calculation, thereby obtaining their respective semantic prompt vectors.

[0083] The formula for calculating semantic cue vectors is as follows:

[0084] $\tilde{p}_k = \sigma\left(\sum_{s}\sum_{i} \alpha_{s,i,k}\, v_{s,i}\right)$

[0085] where $\tilde{p}_k$ is the semantic prompt vector of the $k$-th node; $\sigma$ is a non-linear activation function (e.g., GELU or ReLU); $\alpha_{s,i,k}$ is the relevance between the visual feature of the $i$-th image patch at the $s$-th scale and the initial prompt vector of the $k$-th node; $v_{s,i}$ is the visual feature obtained by linearly projecting the feature vector of the $i$-th image patch at the $s$-th scale; and $\sum_{s}\sum_{i} \alpha_{s,i,k}\, v_{s,i}$ is the aggregated visual feature obtained by weighted summation of all $v_{s,i}$ with $\alpha_{s,i,k}$ as weights, representing the visual features from the entire part image that are associated with the initial prompt vector of the $k$-th node.

[0086] Ultimately, all nodes with their updated semantic cue vectors, together with the hierarchical relationships between them, constitute the updated hierarchical semantic knowledge graph.

[0087] Referring to Figure 1, the identification method further includes the following step: S30, performing anomaly sampling on the updated hierarchical semantic knowledge graph through the semantic sampling network of the defect identification model to obtain anomaly semantic vectors with different degrees of anomaly, which form an anomaly semantic vector set.

[0088] Step S30 includes the following steps: performing cluster analysis on the semantic cue vectors of the product nodes and attribute nodes in the updated hierarchical semantic knowledge graph to obtain a normal aggregation vector; and performing cluster analysis on the semantic cue vectors of the defect nodes in the updated hierarchical semantic knowledge graph to obtain a defect aggregation vector.

[0089] Product nodes and attribute nodes describe the visual features and inherent properties that a part image should possess when it is defect-free. Through the semantic sampling network 130 of the defect recognition model (e.g., a mathematical operation module based on reparameterized Gaussian sampling and Beta distribution sampling), element-wise averaging can be performed on the semantic cue vectors of product nodes and attribute nodes in the updated hierarchical semantic knowledge graph, and element-wise averaging can also be performed on the semantic cue vectors of all defect nodes. This averaging operation effectively integrates the information of each node and resists interference from noise in individual nodes, thus obtaining statistically representative normal aggregation vectors and defect aggregation vectors. The normal aggregation vector represents the comprehensive visual semantic features of the part in its intact state; the defect aggregation vector represents the common semantics of various defect features. Before performing the element-wise averaging operation, a normalization operation can be introduced to preprocess the semantic cue vectors of the nodes to ensure consistent scale of the semantic cue vectors, enhancing the stability of the aggregation process and the robustness of numerical computation.

[0090] The formula for calculating a normal aggregate vector is as follows:

[0091] $$z_{n} = \frac{1}{1 + N_{a}}\left(\mathrm{Norm}(t_{p}) + \sum_{j=1}^{N_{a}} \mathrm{Norm}(t_{a_{j}})\right)$$

[0092] The formula for calculating the defect aggregation vector is as follows:

[0093] $$z_{d} = \frac{1}{N_{d}}\sum_{j=1}^{N_{d}} \mathrm{Norm}(t_{d_{j}})$$

[0094] where $z_{n}$ is the normal aggregation vector; $z_{d}$ is the defect aggregation vector; $\mathrm{Norm}(\cdot)$ is a normalization operation; $t_{p}$ is the semantic cue vector of the product node associated with the part image; $N_{a}$ is the total number of attribute nodes associated with the product node; $t_{a_{j}}$ is the semantic cue vector of the $j$-th attribute node associated with the product node; $N_{d}$ is the total number of defect nodes associated with the product node; and $t_{d_{j}}$ is the semantic cue vector of the $j$-th defect node associated with the product node.
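
As an illustration of the normalize-then-average aggregation described above, the following sketch computes both aggregation vectors. The normalization is assumed to be L2 normalization, which the text does not specify, and the helper names and toy vectors are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm (assumed normalization choice)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean_of_normalized(vectors):
    """Element-wise mean of the normalized input vectors."""
    normed = [l2_normalize(v) for v in vectors]
    n = len(normed)
    return [sum(col) / n for col in zip(*normed)]

# Toy semantic cue vectors for one product node and its associated nodes.
product_cue = [3.0, 4.0]
attribute_cues = [[1.0, 0.0], [0.0, 2.0]]
defect_cues = [[0.0, 5.0], [0.0, 1.0]]

# Normal aggregation: product node plus all attribute nodes.
normal_vec = mean_of_normalized([product_cue] + attribute_cues)
# Defect aggregation: all defect nodes.
defect_vec = mean_of_normalized(defect_cues)
```

Normalizing first means a node with a large-magnitude cue vector, such as `[0.0, 5.0]`, contributes no more to the average than one with a small magnitude.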

[0095] Step S30 further includes the following step: performing weighted fusion of the normal aggregation vector and the defect aggregation vector according to different preset random mixing coefficients to obtain multiple anomaly semantic vectors with different degrees of anomaly, wherein the random mixing coefficients follow a Beta distribution.

[0096] First, a random mixing coefficient $\lambda$ taking values in the interval [0, 1] is defined. The random mixing coefficient $\lambda$ is sampled from a predefined Beta distribution whose hyperparameters $\alpha$ and $\beta$ can be set based on prior knowledge. For example, both can be set to 0.5, so that the sampled values of $\lambda$ tend to fall near 0 and 1 while still covering intermediate values, thereby simulating random but targeted sampling among the normal state, the severely defective state, and intermediate transitional states.

[0097] After a random mixing coefficient is generated, it is used to perform linear interpolation between the normal aggregation vector and the defect aggregation vector. The interpolation starts at the defect aggregation vector and ends at the normal aggregation vector, using the random mixing coefficient as the weight of the weighted summation, and yields an intermediate semantic vector. When the random mixing coefficient is close to 0, the intermediate semantic vector represents a semantic state close to a severe defect; when the random mixing coefficient is close to 1, it represents a semantic state close to an intact state; when the random mixing coefficient takes an intermediate value, it corresponds to a state between the two, with a certain degree of anomaly. This intermediate semantic vector is a preliminary sample of the anomaly semantic vector.

[0098] However, to increase sampling diversity, explore neighborhood information in the semantic space, and avoid overly concentrated sampling points, this preliminary sample is perturbed. Specifically, the intermediate semantic vector obtained by linear interpolation is treated as the mean vector of a Gaussian distribution. The final anomaly semantic vector is then generated by random sampling from a multivariate Gaussian distribution centered on this mean vector, whose covariance matrix is the identity matrix scaled by a preset, fixed, small variance. The calculation formula is as follows: $$z \sim \mathcal{N}\left(\lambda z_{n} + (1 - \lambda)\, z_{d},\ \sigma_{0}^{2} I\right)$$ where $z$ is the anomaly semantic vector; $\lambda$ is the random mixing coefficient; $z_{n}$ is the normal aggregation vector; $z_{d}$ is the defect aggregation vector; and $\sigma_{0}^{2} I$ is the preset covariance matrix described above.

[0099] By independently sampling the random mixing coefficient multiple times and repeating the above linear interpolation and Gaussian perturbation steps, a controllable number of anomaly semantic vectors covering different hypotheses from normal to anomalous can be obtained, forming the anomaly semantic vector set.
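
The sampling procedure in step S30 (Beta-distributed mixing coefficient, linear interpolation, Gaussian perturbation) can be sketched as follows. The Beta hyperparameters default to the 0.5 example given in the text; the standard deviation `sigma`, the seed, and the function name `sample_anomaly_vectors` are hypothetical choices made only for this illustration.

```python
import random

def sample_anomaly_vectors(normal_vec, defect_vec, count,
                           alpha=0.5, beta=0.5, sigma=0.01, seed=0):
    """Return `count` pairs (mixing coefficient, anomaly semantic vector)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        # Mixing coefficient in [0, 1]; Beta(0.5, 0.5) favors values near 0 and 1.
        lam = rng.betavariate(alpha, beta)
        # Linear interpolation: lam = 0 gives the defect vector, lam = 1 the normal vector.
        mean = [lam * n + (1.0 - lam) * d for n, d in zip(normal_vec, defect_vec)]
        # Isotropic Gaussian perturbation around the interpolated mean.
        vec = [rng.gauss(m, sigma) for m in mean]
        samples.append((lam, vec))
    return samples

anomaly_set = sample_anomaly_vectors([1.0, 0.0], [0.0, 1.0], count=8)
```

Keeping each mixing coefficient alongside its vector is convenient because the regression loss described later compares the predicted score with that coefficient.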

[0100] Referring to Figure 1, the identification method further includes the following step: S40, performing global visual feature analysis on the multi-scale feature map based on the anomaly semantic vector set, through the first output branch of the multi-scale semantic alignment network of the defect identification model, to obtain the image-level anomaly score of the part image; and performing pixel feature analysis on the multi-scale feature map based on the anomaly semantic vector set, through the second output branch of the multi-scale semantic alignment network, to obtain the pixel-level anomaly score of the part image.

[0101] Step S40 includes the following steps: S41, using the calibration branch in the multi-scale semantic alignment network of the defect identification model, semantic calibration is performed on the multi-scale feature maps based on the set of abnormal semantic vectors to obtain calibration feature maps for the corresponding scales. Specifically, S41 includes:

[0102] For the feature map at each scale:

[0103] Perform dimensionality reduction projection on the feature map to obtain the corresponding compressed feature map;

[0104] Perform a depthwise separable convolution operation on the compressed feature map to obtain the corresponding first feature map;

[0105] Convert the set of anomaly semantic vectors into corresponding channel attention weights, and modulate the first feature map channel by channel to obtain the second feature map;

[0106] Perform dimensionality increase projection on the second feature map to obtain the third feature map;

[0107] Add the third feature map to the feature map to obtain the corresponding calibration feature map.

[0108] The calibration branch (e.g., a Feature Pyramid Network, FPN) in the multi-scale semantic alignment network of the defect identification model can perform the dimensionality reduction projection. Its input is the multi-scale feature maps, denoted $\{F^{s}\}_{s=1}^{S}$, where $F^{s}$ is the feature map at the $s$-th scale and $S$ is the total number of scales, for example, 4. For the feature map at each scale, the goal of the dimensionality reduction projection is to map it from the high-dimensional visual feature space to a low-dimensional manifold space, which significantly reduces the computational overhead of subsequent steps and facilitates the interaction of features at different scales within the low-dimensional manifold space. The dimensionality reduction projection can be achieved using a learnable projection matrix $W_{\mathrm{down}}$, whose initial values are random numbers drawn from a Gaussian distribution.

[0109] After obtaining the compressed feature map, in order to effectively extract local spatial details (such as edges, textures, blemishes, and other defect-sensitive local patterns) within the low-dimensional manifold space, the semantic interaction layer of the calibration branch can utilize depthwise separable convolution to process the compressed feature map. Depthwise separable convolution performs spatial convolution operations independently on each channel of the compressed feature map and pads pixels to maintain the spatial size of the compressed feature map. This step requires fusing spatial neighborhood information within each channel to capture local visual patterns without mixing information across channels. After depthwise separable convolution processing, the compressed feature map is transformed into a first feature map with the same spatial size and the same number of channels.

[0110] The semantic interaction layer is also used to modulate the first feature map using anomaly semantic vectors. These anomaly semantic vectors encode semantic information about a specific degree of anomalousness. To influence low-dimensional visual features with these anomaly semantic vectors, they need to be converted into channel attention weights that match the channel dimensions of the first feature map. This conversion is achieved through a small multilayer perceptron (MLP) network consisting of two fully connected layers with a non-linear activation function (such as ReLU) in between.

[0111] The specific process is as follows: the anomalous semantic vector is input into the MLP. The first layer of the MLP maps the anomalous semantic vector to an intermediate dimension. After ReLU activation, the second layer maps it to the target dimension, ultimately outputting a vector of the target dimension. This vector of the target dimension represents the channel attention weights derived from the anomalous semantic vector, with each element corresponding to the modulation coefficient of a channel in the first feature map. Next, channel-wise modulation processing is performed: each channel of the first feature map is element-wise multiplied with its corresponding channel attention weight.

[0112] Through this channel-by-channel multiplication operation, the degree of anomalousness carried by the anomalous semantic vector is injected into the visual features, guiding the network to focus on feature channels that are more relevant to the current anomalous hypothesis, resulting in a second feature map. The second feature map is the result of combining visual local details with semantic global guidance.

[0113] The calibration branch also needs to map the processed second feature map back to the original high-dimensional space so that it can be residually connected to the feature map at the corresponding scale. This can be accomplished through a dimensionality increase projection, which uses another learnable projection matrix $W_{\mathrm{up}}$. The dimensions of $W_{\mathrm{down}}$ (the dimensionality reduction projection matrix) and $W_{\mathrm{up}}$ are transposes of each other, and $W_{\mathrm{up}}$ is initialized to all zeros. The dimensionality increase projection yields the third feature map. The third feature map encodes the adjustment that should be made to the multi-scale feature map under the current anomaly semantic hypothesis, so as to enhance the features related to the hypothesized anomaly and suppress irrelevant features.

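[0114] The third feature map $\Delta F^{s}$ is calculated as follows:

[0115] $$\Delta F^{s} = W_{\mathrm{up}}\left(\sigma\big(\mathrm{DWConv}(W_{\mathrm{down}} F^{s})\big) \odot \mathrm{MLP}(z)\right)$$

[0116] where $W_{\mathrm{down}}$ and $W_{\mathrm{up}}$ are both learnable projection matrices whose dimensions are transposes of each other; $F^{s}$ is the feature map at the $s$-th scale of the multi-scale feature maps; $\sigma$ is a non-linear activation function; $\mathrm{DWConv}$ is the depthwise separable convolution; $\odot$ is element-wise multiplication; and $\mathrm{MLP}(z)$ is the target vector, with the same channel dimension as the first feature map, generated from the anomaly semantic vector $z$ by a small multilayer perceptron (MLP).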

[0117] The final step of the calibration branch is residual fusion, which involves element-wise addition of the calculated third feature map to the feature map at the corresponding scale. This addition operation is channel-aligned and spatially corresponding. Through element-wise addition, the original multi-scale feature maps are adjusted to corresponding scale calibration feature maps. This residual connection method ensures the stability of the training process and avoids the loss of original feature information or gradient vanishing problems that may occur due to the introduction of the calibration branch. Finally, for each scale, its corresponding calibration feature map is obtained.

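[0118] The calibration feature map $\hat{F}^{s}$ is calculated as follows:

[0119] $$\hat{F}^{s} = F^{s} + \Delta F^{s}$$

[0120] where $\Delta F^{s}$ is the third feature map and $F^{s}$ is the feature map at the corresponding scale of the multi-scale feature maps.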
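
The calibration branch of step S41 can be sketched end to end in plain Python. This is only an illustration under several assumptions: feature maps are stored as `fmap[y][x]` lists of channel values, the convolution is a 3x3 depthwise convolution with zero padding, ReLU is applied after the convolution as the non-linear activation, and all function and parameter names are hypothetical.

```python
def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def project(fmap, matrix):
    """Apply the same linear projection to every pixel's channel vector."""
    return [[matvec(matrix, px) for px in row] for row in fmap]

def depthwise_conv3x3(fmap, kernels):
    """Per-channel 3x3 convolution with zero padding (spatial size preserved)."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ch in range(c):
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += kernels[ch][dy + 1][dx + 1] * fmap[yy][xx][ch]
                out[y][x][ch] = acc
    return out

def channel_weights(anomaly_vec, w1, w2):
    """Two-layer MLP with ReLU: anomaly semantic vector -> channel weights."""
    hidden = [max(0.0, v) for v in matvec(w1, anomaly_vec)]
    return matvec(w2, hidden)

def calibrate(fmap, anomaly_vec, w_down, kernels, w1, w2, w_up):
    compressed = project(fmap, w_down)              # dimensionality reduction
    first = depthwise_conv3x3(compressed, kernels)  # local spatial details
    # Assumed placement of the non-linear activation (ReLU here).
    first = [[[max(0.0, v) for v in px] for px in row] for row in first]
    weights = channel_weights(anomaly_vec, w1, w2)
    # Channel-by-channel modulation by the semantic attention weights.
    second = [[[v * a for v, a in zip(px, weights)] for px in row] for row in first]
    third = project(second, w_up)                   # dimensionality increase
    # Residual fusion with the original feature map.
    return [[[f + t for f, t in zip(fp, tp)] for fp, tp in zip(frow, trow)]
            for frow, trow in zip(fmap, third)]

identity_kernel = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]
calibrated = calibrate([[[1.0, 2.0]]], [1.0], [[1.0, 1.0]], identity_kernel,
                       [[1.0]], [[2.0]], [[1.0], [0.0]])
```

When `w_up` is all zeros, as the text specifies for its initial value, the branch returns the input feature map unchanged, so training starts from the unmodified visual features.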

[0121] Step S40 further includes the following step: S42, using the first output branch in the multi-scale semantic alignment network, performing global visual feature analysis on the smallest-scale calibration feature map based on the anomaly semantic vector set to obtain the image-level anomaly score of the part image. Specifically, S42 includes:

[0122] Global average pooling is performed on the smallest-scale calibration feature map to generate global visual features;

[0123] Calculate the cosine similarity between global visual features and each anomaly semantic vector to obtain the corresponding image-level anomaly prediction value;

[0124] The mean of all image-level anomaly predictions is calculated to obtain the image-level anomaly score for the part image.

[0125] Among the calibration feature maps at multiple scales, the smallest-scale calibration feature map corresponds to the output of the deepest network layer and has the smallest spatial size. Although its spatial detail is relatively sparse, each of its feature vectors incorporates contextual information from a large area of the image, giving it the strongest semantic abstraction ability and a global receptive field. Therefore, for image-level anomaly detection, the first output branch of the multi-scale semantic alignment network 140 (e.g., a network composed of Global Average Pooling (GAP), Linear, and Cosine Similarity layers) uses this smallest-scale calibration feature map as input.

[0126] Specifically, the selected calibration feature map is subjected to global average pooling (GAP) to generate a global visual feature. Global average pooling is a spatial aggregation operation that averages the calibration feature map along both the height and width dimensions. This operation effectively aggregates visual cues scattered across two-dimensional space that are related to potential defects, forming a comprehensive global visual feature. This global visual feature contains overall information about whether an anomaly exists in the part image and what kind of anomaly it might be.

[0127] After the global visual features are obtained, the first output branch, through its fully connected layer (Linear), matches them against each anomaly semantic vector to evaluate the degree of fit between the global visual features and these anomaly semantic vectors. To measure how close the global visual features are to each anomaly semantic vector in the semantic space, the cosine similarity calculation layer of the first output branch uses cosine similarity as the metric. Cosine similarity is the cosine of the angle between the directions of two vectors, and its value is taken as the image-level anomaly prediction value. The image-level anomaly prediction value represents the probability or matching score that the global visual features are considered anomalous under the current anomaly semantic vector. A higher value indicates that the direction of the global visual features is highly consistent with the anomaly semantic vector, so the image very likely matches the description of that anomaly semantic vector; conversely, a lower value indicates that the direction of the global visual features does not match the anomaly semantic vector.

[0128] For each anomaly semantic vector, a corresponding image-level anomaly prediction value is calculated, reflecting the score of the global visual features under that anomaly semantic vector. To obtain a comprehensive final judgment and avoid misjudgment caused by the randomness of a single sample, the first output branch aggregates all image-level anomaly prediction values and takes their arithmetic mean as the image-level anomaly score. The image-level anomaly score is a scalar representing the overall probability estimate that defects are present in the part image. The higher the image-level anomaly score, the higher the confidence that surface defects are present; conversely, a lower score indicates that the part image is more likely to be consistent with the normal state.
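
Step S42 (global average pooling, cosine similarity against each anomaly semantic vector, arithmetic mean) can be sketched as follows. The sketch omits the Linear layer and assumes that the channel dimension of the feature map already equals the dimension of the anomaly semantic vectors; the function names are hypothetical.

```python
import math

def global_average_pool(fmap):
    """Average the channel vectors over all spatial positions of fmap[y][x]."""
    pixels = [px for row in fmap for px in row]
    n = len(pixels)
    return [sum(col) / n for col in zip(*pixels)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb) if na > 0 and nb > 0 else 0.0

def image_level_score(fmap, anomaly_vectors):
    """Mean cosine similarity between the pooled feature and each vector."""
    g = global_average_pool(fmap)
    predictions = [cosine(g, z) for z in anomaly_vectors]
    return sum(predictions) / len(predictions)

# A 1x2 feature map with two channels and two anomaly semantic vectors.
score = image_level_score([[[1.0, 0.0], [1.0, 0.0]]], [[1.0, 0.0], [0.0, 1.0]])
```

Here the pooled feature is `[1.0, 0.0]`, which matches the first vector exactly and is orthogonal to the second, so the averaged score is 0.5.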

[0129] Step S40 further includes the following step: S43, using the second output branch of the multi-scale semantic alignment network, performing pixel feature analysis on the calibration feature maps at each scale based on the anomaly semantic vector set to obtain the pixel-level anomaly score of the part image. Specifically, S43 includes:

[0130] The calibration feature maps at various scales are aligned and fused to obtain a fused feature map;

[0131] For each anomalous semantic vector, the cosine similarity between the fused feature map and the anomalous semantic vector is calculated pixel by pixel to generate the corresponding two-dimensional matrix;

[0132] Pixel aggregation is performed on all two-dimensional matrices to obtain pixel-level anomaly scores for the part images.

[0133] The second output branch of the multi-scale semantic alignment network 140 (e.g., a network consisting of an FPN Decoder and a Pixel-wise Cosine Similarity Map) can transform the calibration feature maps at the various scales into a spatial probability map that indicates the likelihood of an anomaly at each pixel in the image, i.e., the pixel-level anomaly score. This process is performed separately for each anomaly semantic vector.

[0134] First, to fuse multi-scale information effectively, the Feature Pyramid Network Decoder (FPN Decoder) can bring calibration feature maps of different scales to the same spatial resolution through upsampling or downsampling. For example, the spatial size at an intermediate or other specific scale can be selected as the target resolution. Calibration feature maps whose spatial size is larger than the target resolution are downsampled using bilinear interpolation or strided convolution; calibration feature maps whose spatial size is smaller than the target resolution are upsampled using bilinear interpolation, nearest neighbor interpolation, or transposed convolution. This alignment ensures that the feature maps from all scales correspond strictly on the spatial grid.

[0135] Secondly, after the calibration feature maps of all scales have been brought to the same resolution, they are concatenated along the channel dimension to form an intermediate feature map with a large number of channels. To optimize the feature combination and control dimensionality, a 1x1 convolution is then typically applied to this concatenated intermediate feature map. This convolutional layer learns a cross-channel weighting and recombination mechanism that adaptively filters and fuses complementary information from different scales: for example, fine edge and texture details from the higher-resolution, shallower feature maps (crucial for defect boundary localization), and high-level semantic context from the lower-resolution, deeper feature maps (crucial for understanding the type and overall morphology of defects).

[0136] After dimensionality reduction and fusion by the 1x1 convolution, a single fused feature map is output. The number of channels of this fused feature map is reduced to a preset dimension, while its spatial resolution equals the previously chosen target resolution. The fused feature map integrates multi-scale information, combining high-resolution spatial detail with deep semantic representation capability.
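
The alignment and fusion steps can be illustrated with a small sketch. It substitutes nearest neighbor index selection for both upsampling and downsampling (the text also allows bilinear interpolation), and models the 1x1 convolution as one weight matrix applied to each pixel's concatenated channel vector. The names `resize_nearest`, `fuse`, and `fusion_matrix` are hypothetical.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest neighbor resize of an H x W grid of channel vectors."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [list(fmap[y * in_h // out_h][x * in_w // out_w]) for x in range(out_w)]
        for y in range(out_h)
    ]

def fuse(calibrated_maps, target_hw, fusion_matrix):
    """Align all maps to target_hw, concatenate channels, apply a 1x1 conv."""
    h, w = target_hw
    aligned = [resize_nearest(m, h, w) for m in calibrated_maps]
    fused = []
    for y in range(h):
        row = []
        for x in range(w):
            # Channel-wise concatenation of all scales at this pixel.
            concat = [v for m in aligned for v in m[y][x]]
            # 1x1 convolution: one linear map shared by every pixel.
            row.append([sum(wt * v for wt, v in zip(wrow, concat))
                        for wrow in fusion_matrix])
        fused.append(row)
    return fused

fine = [[[1.0], [2.0]], [[3.0], [4.0]]]   # 2x2 map, one channel
coarse = [[[10.0]]]                        # 1x1 map, one channel
fused_map = fuse([fine, coarse], (2, 2), [[1.0, 1.0]])
```

In the example the coarse map is replicated to 2x2 and the fusion matrix simply sums the two channels, so every fused pixel equals the fine value plus 10.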

[0137] Then, for each anomaly semantic vector, the pixel-wise cosine similarity map layer treats each spatial location in the fused feature map as an independent feature vector and calculates the cosine similarity between the anomaly semantic vector and the feature vector at that location. The cosine similarity is calculated in the same way as in the first output branch and is not described again here. After the cosine similarity has been calculated for all spatial locations in the fused feature map, the second output branch obtains a two-dimensional matrix in which each pixel value is a scalar between -1 and +1.

[0138] Each pixel value in the two-dimensional matrix represents the degree of matching between the local visual features of the corresponding location in the image and the anomalous semantic vector. A higher pixel value indicates that the features of that pixel region are related to the anomalous semantic vector, and it is likely a defect; conversely, a lower value suggests a more normal background. Therefore, each two-dimensional matrix provides a spatial prediction of where defects might be located in the image from a single anomalous perspective (the anomalous semantic vector).

[0139] To obtain a comprehensive final localization result and avoid inaccurate or missed localization caused by the randomness of a single sample, pixel aggregation is performed over all two-dimensional matrices. The second output branch can accumulate the two-dimensional matrices pixel by pixel and average them; that is, for each pixel position, the arithmetic mean of the values of all two-dimensional matrices at that position is calculated. This is equivalent to pixel-level average pooling across all two-dimensional matrices.

[0140] This pixel aggregation operation ultimately yields a pixel-level anomaly score. Each pixel value in this score is the average score of the matching results of different anomaly semantic vectors at that pixel. This average value comprehensively considers the model's observations at multiple different anomaly points (from minor to severe) in the semantic space, and can smooth out noise or bias caused by a specific sampling.
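
Step S43 after fusion (one cosine similarity map per anomaly semantic vector, followed by a pixel-wise mean) can be sketched as follows. The fused feature map is assumed to be stored as `fused[y][x]` channel vectors whose dimension matches the anomaly semantic vectors; the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb) if na > 0 and nb > 0 else 0.0

def pixel_level_scores(fused, anomaly_vectors):
    """Average the per-vector cosine similarity maps pixel by pixel."""
    # One two-dimensional similarity matrix per anomaly semantic vector.
    maps = [[[cosine(px, z) for px in row] for row in fused]
            for z in anomaly_vectors]
    h, w, k = len(fused), len(fused[0]), len(maps)
    # Arithmetic mean over all matrices at each pixel position.
    return [[sum(m[y][x] for m in maps) / k for x in range(w)]
            for y in range(h)]

fused = [[[1.0, 0.0], [0.0, 1.0]]]          # 1x2 map, two channels
score_map = pixel_level_scores(fused, [[1.0, 0.0], [1.0, 0.0]])
```

Both anomaly semantic vectors in the example point along the first channel, so the first pixel scores 1.0 and the second scores 0.0.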

[0141] Finally, surface defects in the part image can be determined based on the image-level and pixel-level anomaly scores. The specific process is as follows:

[0142] First, an overall assessment is made based on the image-level anomaly score. The calculated image-level anomaly score is compared with a preset score threshold. If the image-level anomaly score is lower than the score threshold, the part image is determined to be defect-free, and the process ends; if the image-level anomaly score is higher than or equal to the score threshold, the part image is determined to have surface defects, and the process proceeds to the next step of localization analysis.

[0143] For a part image determined to have defects, the value of each pixel in the pixel-level anomaly score is then compared with a preset pixel threshold to generate a binary defect region mask. Pixels with values higher than the pixel threshold are marked as defects, while those with values lower than the threshold are marked as background.

[0144] Finally, the defect region mask image undergoes post-processing and information extraction. For example, morphological operations are used to filter out noise, and connected component analysis is used to identify independent defect instances. The output includes a final conclusion on whether a defect exists, as well as quantitative information such as the location, quantity, and contour of the defect, thus completing the detection and localization of surface defects on the part.
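
The decision procedure in the preceding paragraphs (image-level threshold, pixel-level threshold, connected component analysis) can be sketched as follows. The morphological filtering step is omitted, 4-connectivity is an assumption, and the function names and threshold values are hypothetical.

```python
from collections import deque

def connected_components(mask):
    """Group defect pixels (value 1) into 4-connected regions of (y, x) pairs."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                queue, region = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    region.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions

def detect_defects(image_score, pixel_scores, score_threshold, pixel_threshold):
    """Image-level decision first; localization only if the image is defective."""
    if image_score < score_threshold:
        return {"defective": False, "mask": None, "regions": []}
    # Binary defect region mask: 1 for defect pixels, 0 for background.
    mask = [[1 if v > pixel_threshold else 0 for v in row] for row in pixel_scores]
    return {"defective": True, "mask": mask, "regions": connected_components(mask)}

scores = [[0.9, 0.1, 0.8],
          [0.9, 0.1, 0.1]]
result = detect_defects(0.7, scores, score_threshold=0.5, pixel_threshold=0.5)
```

The example yields two defect instances: a two-pixel region in the first column and a single pixel in the top-right corner.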

[0145] The defect identification model in the above identification method is trained and updated based on a contrastive loss function, a regression loss function, and a segmentation loss function. The contrastive loss function is calculated based on a pre-defined hierarchical semantic knowledge graph, the updated hierarchical semantic knowledge graph of the training part images, and the global visual features of the training part images. The regression loss function is calculated based on the abnormal semantic vectors of the training part images, different random mixing coefficients, and the corresponding image-level anomaly scores. The segmentation loss function is calculated based on the pixel-level anomaly scores of the training part images and the true defect mask of the training part images.

[0146] First, the purpose of the contrastive loss function is to guide the model to learn more discriminative visual and semantic feature representations. The contrastive loss function is calculated based on three key inputs: a pre-defined hierarchical semantic knowledge graph, a hierarchical semantic knowledge graph updated using the currently used part image, and global visual features extracted from the same part image. The goal of the contrastive loss function is to narrow the distance in feature space between the semantic cue vectors of normal nodes (product nodes, attribute nodes) in the updated hierarchical semantic knowledge graph and the global visual features of the current part image, while simultaneously widening the distance between the semantic cue vectors of defective nodes and the global visual features. This contrastive learning strategy enhances the discriminative power of the feature representations.

[0147] The contrastive loss function $L_{\mathrm{con}}$ is calculated as follows:

[0148] $$L_{\mathrm{con}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k \in P_{i}} w_{k}\,\log\frac{\exp\left(\cos(g_{i}, t_{k})/\tau\right)}{\sum_{j \in V}\exp\left(\cos(g_{i}, t_{j})/\tau\right)}$$

[0149] where $N$ is the total number of part images used for training; $k$ indexes the $k$-th node in the hierarchical semantic knowledge graph; $P_{i}$ is the node set corresponding to the $i$-th part image used for training, obtained by indexing the ground-truth defect mask (GroundTruth) of that part image in the preset hierarchical semantic knowledge graph (for example, if the ground-truth defect of the part image is a crack, the node set is (gear, surface, crack)); $w_{k}$ is the hierarchical importance weight of the $k$-th node, for example, 0.5 for a product node and 1.0 for a defect node, which prompts the model to prioritize fine-grained defects; $\exp$ is the exponential function; $\cos$ is the cosine similarity function; $g_{i}$ is the global visual feature of the $i$-th part image used for training; $t_{k}$ is the semantic cue vector of the $k$-th node; $\tau$ is the temperature coefficient; $j$ indexes the $j$-th node in the hierarchical semantic knowledge graph; $V$ is the set of all nodes in the hierarchical semantic knowledge graph; and $t_{j}$ is the semantic cue vector of the $j$-th node.
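
A minimal sketch of a weighted contrastive loss consistent with the symbol definitions above is given below. It assumes an InfoNCE-style form (a weighted negative log-softmax of temperature-scaled cosine similarities over all nodes), which is an assumption rather than the confirmed formula; the temperature default and all names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb) if na > 0 and nb > 0 else 0.0

def contrastive_loss(global_feats, node_cues, positive_sets, node_weights, tau=0.07):
    """global_feats: one global visual feature per training image.
    node_cues: semantic cue vectors of all nodes in the graph.
    positive_sets: for each image, indices of its ground-truth nodes.
    node_weights: hierarchical importance weight per node.
    """
    total = 0.0
    for g, positives in zip(global_feats, positive_sets):
        logits = [cosine(g, t) / tau for t in node_cues]
        # Log of the softmax denominator, computed stably.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        for k in positives:
            total -= node_weights[k] * (logits[k] - log_denom)
    return total / len(global_feats)

loss = contrastive_loss(
    global_feats=[[1.0, 0.0]],
    node_cues=[[1.0, 0.0], [0.0, 1.0]],
    positive_sets=[[0]],
    node_weights=[1.0, 1.0],
    tau=1.0,
)
```

The loss shrinks as the global feature aligns with its ground-truth nodes and moves away from all other nodes, and nodes with larger weights contribute more to the gradient.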

[0150] Secondly, the regression loss function is used to supervise the prediction accuracy of image-level anomaly scores. Its calculation involves the anomaly semantic vectors generated during the sampling process of the training part images, different random mixing coefficients, and the corresponding image-level anomaly scores. The regression loss function measures the difference between the final image-level anomaly score output by the model and the random mixing coefficients used to generate these anomaly semantic vectors. Specifically, the random mixing coefficients represent the theoretical degree of anomaly in the defect aggregate vector when the normal aggregate vector and the defect aggregate vector are fused. The regression loss function requires that the score trend predicted by the model should be positively correlated with this theoretical degree of anomaly; that is, the higher the random mixing coefficients, the higher the predicted image-level anomaly score should be. The regression loss function ensures the logical consistency of the model's quantitative prediction of anomaly severity.

[0151] The regression loss function $L_{\mathrm{reg}}$ is calculated as follows:

[0152] $$L_{\mathrm{reg}} = \frac{1}{N}\sum_{i=1}^{N}\left\| s(g_{i}, z_{i}) - \lambda_{i} \right\|_{2}^{2}$$

[0153] where $N$ is the total number of part images used for training; $g_{i}$ is the global visual feature of the $i$-th part image used for training; $z_{i}$ is the anomaly semantic vector corresponding to the $i$-th part image used for training; $s(g_{i}, z_{i})$ is the image-level anomaly score calculated from the global visual feature of the $i$-th part image used for training and the corresponding anomaly semantic vector; $\lambda_{i}$ is the random mixing coefficient sampled when generating the anomaly semantic vector; and $\|\cdot\|_{2}^{2}$ is the square of the L2 norm (i.e., the square of the Euclidean norm).
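
As described above, the regression loss compares each predicted image-level anomaly score with the random mixing coefficient that produced the corresponding anomaly semantic vector. A minimal sketch, assuming a mean squared difference and using a hypothetical function name:

```python
def regression_loss(scores, mixing_coeffs):
    """Mean squared difference between predicted scores and mixing coefficients."""
    if len(scores) != len(mixing_coeffs):
        raise ValueError("scores and mixing_coeffs must have the same length")
    return sum((s - lam) ** 2 for s, lam in zip(scores, mixing_coeffs)) / len(scores)

# Two training samples: predicted scores paired with their mixing coefficients.
reg = regression_loss([0.9, 0.2], [1.0, 0.0])
```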

[0154] Finally, a segmentation loss function is used to optimize the accuracy of pixel-level defect localization. Its inputs are the pixel-level anomaly score predicted by the model for the training part image and the corresponding ground-truth defect mask. The ground-truth defect mask is a binary image in which pixels in the defect region have a value of 1 and pixels in the background region have a value of 0. The segmentation loss function measures the pixel-level difference between the pixel-level anomaly score and the ground-truth defect mask, for example, using pixel-wise binary cross-entropy loss or Dice loss. Minimizing this loss drives the model to adjust the network parameters so that the generated pixel-level anomaly scores not only have higher response values in the defect region, but also approximate the ground-truth defect label as closely as possible in shape and spatial extent, thereby improving the accuracy of defect segmentation.

[0155] The segmentation loss function $L_{\mathrm{seg}}$ is calculated as follows:

[0156] $$L_{\mathrm{seg}} = \frac{1}{N}\sum_{i=1}^{N}\left(L_{\mathrm{bce}}(M_{i}, Y_{i}) + L_{\mathrm{dice}}(M_{i}, Y_{i})\right)$$

[0157] $$L_{\mathrm{bce}}(M, Y) = -\frac{1}{P}\sum_{p=1}^{P}\left[y_{p}\log m_{p} + (1 - y_{p})\log(1 - m_{p})\right]$$

[0158] $$L_{\mathrm{dice}}(M, Y) = 1 - \frac{2\sum_{p=1}^{P} y_{p} m_{p} + \epsilon}{\sum_{p=1}^{P} y_{p} + \sum_{p=1}^{P} m_{p} + \epsilon}$$

[0159] where $N$ is the total number of part images used for training; $M_{i}$ is the pixel-level anomaly score of the $i$-th part image used for training; $Y_{i}$ is the ground-truth defect mask of the $i$-th part image used for training; $P$ is the total number of pixels in the pixel-level anomaly score; $p$ indexes the $p$-th pixel in the pixel-level anomaly score or the ground-truth defect mask; $y_{p}$ is the value of the $p$-th pixel in the ground-truth defect mask; $m_{p}$ is the value of the $p$-th pixel in the pixel-level anomaly score; $\epsilon$ is a smoothing term that prevents division by zero (for example, …); $L_{\mathrm{bce}}$ is the binary cross-entropy loss function, which measures the pixel-by-pixel classification error between the pixel-level anomaly score and the ground-truth defect mask; and $L_{\mathrm{dice}}$ is the Dice loss function, which measures the degree of overlap between the region indicated by the pixel-level anomaly score and the region indicated by the ground-truth defect mask. $L_{\mathrm{bce}}$ and $L_{\mathrm{dice}}$ are used together to balance pixel-by-pixel classification accuracy against overall region segmentation quality.
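
The combination of binary cross-entropy and Dice loss described above can be sketched as follows. The sketch assumes that the two terms are summed with equal weight and that the pixel-level anomaly scores have already been mapped into [0, 1]; the clipping constant, the smoothing value `eps`, and the function names are hypothetical.

```python
import math

def bce_loss(pred, target, clip=1e-7):
    """Pixel-wise binary cross-entropy, averaged over all pixels."""
    total, count = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, clip), 1.0 - clip)  # avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count

def dice_loss(pred, target, eps=1e-6):
    """One minus the smoothed Dice overlap between prediction and mask."""
    inter = sum(p * t for prow, trow in zip(pred, target)
                for p, t in zip(prow, trow))
    psum = sum(p for row in pred for p in row)
    tsum = sum(t for row in target for t in row)
    return 1.0 - (2.0 * inter + eps) / (psum + tsum + eps)

def segmentation_loss(preds, targets):
    """Average of (BCE + Dice) over all training images."""
    per_image = [bce_loss(p, t) + dice_loss(p, t) for p, t in zip(preds, targets)]
    return sum(per_image) / len(per_image)

perfect = segmentation_loss([[[1.0, 0.0]]], [[[1, 0]]])
missed = segmentation_loss([[[0.0, 0.0]]], [[[1, 1]]])
```

The binary cross-entropy term penalizes each pixel independently, while the Dice term depends on the overlap of whole regions, which helps when defect pixels are a small fraction of the image.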

[0160] The defect recognition model is trained and updated based on a loss function, which is a weighted sum of contrastive, regression, and segmentation loss functions. During training, the gradient of the loss function with respect to all trainable parameters of the model is calculated using the backpropagation algorithm, and the parameters are iteratively updated using an optimizer (such as Adam) until the model performance converges. This multi-task loss joint optimization mechanism ensures that the defect recognition model achieves good performance in three tasks: image-level anomaly detection, anomaly severity quantification, and pixel-level defect localization.

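[0161] The loss function $\mathcal{L}$ is calculated as follows:

[0162] $$\mathcal{L} = \lambda_1 \mathcal{L}_{con} + \lambda_2 \mathcal{L}_{reg} + \lambda_3 \mathcal{L}_{seg}$$

[0163] where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are preset weighting coefficients, for example, 1, 5, and 10 respectively; and $\mathcal{L}_{con}$, $\mathcal{L}_{reg}$, and $\mathcal{L}_{seg}$ are the contrastive loss function, the regression loss function, and the segmentation loss function, respectively.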
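The weighted sum of the three losses can be illustrated with a short worked example. The weights 1, 5, and 10 are the example values given in the text; the individual loss values and the function name `total_loss` are hypothetical and chosen only to show the arithmetic.

```python
import math

def total_loss(l_con, l_reg, l_seg, weights=(1.0, 5.0, 10.0)):
    """Weighted sum of the contrastive, regression, and segmentation losses."""
    w_con, w_reg, w_seg = weights
    return w_con * l_con + w_reg * l_reg + w_seg * l_seg

# Hypothetical per-batch loss values, for illustration only.
value = total_loss(l_con=0.4, l_reg=0.1, l_seg=0.05)
print(math.isclose(value, 1.4))  # True: 1*0.4 + 5*0.1 + 10*0.05
```

With these example weights, a segmentation loss of 0.05 contributes as much to the total as a regression loss of 0.1, so the larger coefficients compensate for losses that are numerically smaller.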

[0164] As can be seen, in the above scheme, by introducing a pre-defined hierarchical semantic knowledge graph and associating it with image features, prior knowledge such as the category, attributes, and defects of parts is effectively integrated into the model, enabling the model to more accurately identify and distinguish various subtle defects that are highly similar to normal textures and have complex shapes. Through sampling of anomalous semantics and multi-scale semantic alignment analysis, coverage and accurate evaluation of defects of different degrees and scales are achieved, while outputting image-level anomaly judgment and pixel-level localization results, thereby significantly improving the detection accuracy and reliability of complex and diverse defects.

[0165] Referring to Figure 3, the present invention also discloses a device for identifying surface defects of parts, to which the above identification method can be applied. The identification device may include a feature extraction module 210, a semantic association module 220, a semantic sampling module 230, and a defect processing module 240.

[0166] The feature extraction module 210 is used to input the part image into the pre-trained defect recognition model, and extract features from the part image through the visual encoder of the defect recognition model to obtain a multi-scale feature map.

[0167] The semantic association module 220 is used to input the preset hierarchical semantic knowledge graph into the defect recognition model. Through the semantic perception network of the defect recognition model, the multi-scale feature map is dynamically associated with the hierarchical semantic knowledge graph to update the hierarchical semantic knowledge graph. The hierarchical semantic knowledge graph includes multiple nodes at multiple levels, and each node has a corresponding preset initial prompt vector.

[0168] The semantic sampling module 230 is used to perform anomaly sampling on the updated hierarchical semantic knowledge graph through the semantic sampling network of the defect recognition model, so as to obtain abnormal semantic vectors with different degrees of abnormality and form an abnormal semantic vector set.

[0169] The defect processing module 240 is used to perform global visual feature analysis on the multi-scale feature map based on the set of abnormal semantic vectors through the first output branch of the multi-scale semantic alignment network of the defect recognition model to obtain the image-level anomaly score of the part image; and to perform pixel feature analysis on the multi-scale feature map based on the set of abnormal semantic vectors through the second output branch of the multi-scale semantic alignment network to obtain the pixel-level anomaly score of the part image.
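The data flow between the four modules can be sketched as a simple pipeline. This is only an illustration of the wiring described above: the class name `DefectIdentifier`, the field names, the data layout, and the stand-in lambda stages are all hypothetical placeholders, not the networks of the defect recognition model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]
FeatureMaps = List[List[Vector]]  # assumed layout: per scale, a list of per-pixel feature vectors

@dataclass
class DefectIdentifier:
    extract_features: Callable[[object], FeatureMaps]                        # module 210
    associate: Callable[[FeatureMaps, dict], dict]                           # module 220
    sample_anomalies: Callable[[dict], List[Vector]]                         # module 230
    score: Callable[[FeatureMaps, List[Vector]], Tuple[float, List[float]]]  # module 240

    def run(self, image, knowledge_graph):
        """Chain the four modules and return (image-level score, pixel-level scores)."""
        feature_maps = self.extract_features(image)
        updated_graph = self.associate(feature_maps, knowledge_graph)
        anomaly_vectors = self.sample_anomalies(updated_graph)
        return self.score(feature_maps, anomaly_vectors)

# Stand-in stages that return fixed values, used only to exercise the wiring.
identifier = DefectIdentifier(
    extract_features=lambda image: [[[1.0, 0.0]], [[0.0, 1.0]]],
    associate=lambda feats, graph: dict(graph, updated=True),
    sample_anomalies=lambda graph: [[0.5, 0.5]],
    score=lambda feats, vectors: (0.5, [0.5]),
)
image_score, pixel_scores = identifier.run(image=None, knowledge_graph={"nodes": []})
print(image_score, pixel_scores)
```

Note that the scoring stage receives both the feature maps from module 210 and the anomaly vectors from module 230, matching the description of module 240.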

[0170] For specific limitations regarding the identification device, refer to the limitations of the identification method above, which are not repeated here. Each module in the identification device may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, the hardware of the electronic device, or stored in software form in the memory of the electronic device, so that the operations corresponding to each module can be called and executed.

[0171] The above embodiments are merely illustrative of the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or alter the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or alterations made by those skilled in the art without departing from the spirit and technical concept disclosed in the present invention should still be covered by the claims of the present invention.

Claims

1. A method for identifying surface defects of a part, characterized in that the method comprises: inputting a part image into a pre-trained defect recognition model, and extracting features from the part image through the visual encoder of the defect recognition model to obtain a multi-scale feature map; inputting a preset hierarchical semantic knowledge graph into the defect recognition model, and dynamically associating, through the semantic perception network of the defect recognition model, the multi-scale feature map with the hierarchical semantic knowledge graph to update the hierarchical semantic knowledge graph, wherein the hierarchical semantic knowledge graph includes multiple nodes at multiple levels, and each node has a corresponding preset initial prompt vector; performing anomaly sampling on the updated hierarchical semantic knowledge graph through the semantic sampling network of the defect recognition model to obtain abnormal semantic vectors with different degrees of abnormality, forming an abnormal semantic vector set; performing, through the first output branch of the multi-scale semantic alignment network of the defect recognition model, global visual feature analysis on the multi-scale feature map based on the abnormal semantic vector set to obtain an image-level anomaly score of the part image; and performing, through the second output branch of the multi-scale semantic alignment network, pixel feature analysis on the multi-scale feature map based on the abnormal semantic vector set to obtain a pixel-level anomaly score of the part image.

2. The method for identifying surface defects of a part according to claim 1, characterized in that the step of dynamically associating the multi-scale feature map with the hierarchical semantic knowledge graph to update the hierarchical semantic knowledge graph comprises: linearly projecting the feature map at each scale of the multi-scale feature map based on the different initial prompt vectors to obtain the corresponding visual features; and fusing, based on an attention mechanism, the initial prompt vector of each node in the hierarchical semantic knowledge graph with the corresponding visual features to obtain the updated semantic prompt vectors of the hierarchical semantic knowledge graph.

3. The method for identifying surface defects of a part according to claim 1, characterized in that the nodes include product nodes, attribute nodes, and defect nodes, and the step of performing anomaly sampling on the updated hierarchical semantic knowledge graph to obtain abnormal semantic vectors with different degrees of abnormality comprises: performing cluster analysis on the semantic prompt vectors of the product nodes and attribute nodes in the updated hierarchical semantic knowledge graph to obtain a normal aggregated vector; performing cluster analysis on the semantic prompt vectors of the defect nodes in the updated hierarchical semantic knowledge graph to obtain a defect aggregated vector; and performing weighted fusion of the normal aggregated vector and the defect aggregated vector based on different preset random mixing coefficients to obtain multiple abnormal semantic vectors with different degrees of abnormality, wherein the random mixing coefficients follow a Beta distribution.

4. The method for identifying surface defects of a part according to claim 1, characterized in that the steps of performing, through the first output branch of the multi-scale semantic alignment network of the defect recognition model, global visual feature analysis on the multi-scale feature map based on the abnormal semantic vector set to obtain the image-level anomaly score of the part image, and performing, through the second output branch of the multi-scale semantic alignment network, pixel feature analysis on the multi-scale feature map based on the abnormal semantic vector set to obtain the pixel-level anomaly score of the part image, comprise: performing, through the calibration branch of the multi-scale semantic alignment network of the defect recognition model, semantic calibration on the multi-scale feature map based on the abnormal semantic vector set to obtain calibration feature maps at the corresponding scales; performing, through the first output branch of the multi-scale semantic alignment network, global visual feature analysis on the smallest-scale calibration feature map based on the abnormal semantic vector set to obtain the image-level anomaly score of the part image; and performing, through the second output branch of the multi-scale semantic alignment network, pixel feature analysis on the calibration feature maps at each scale based on the abnormal semantic vector set to obtain the pixel-level anomaly score of the part image.

5. The method for identifying surface defects of a part according to claim 4, characterized in that the step of performing semantic calibration on the multi-scale feature map based on the abnormal semantic vector set to obtain calibration feature maps at the corresponding scales comprises, for the feature map at each scale: performing dimensionality-reduction projection on the feature map to obtain the corresponding compressed feature map; performing a depthwise separable convolution operation on the compressed feature map to obtain the corresponding first feature map; converting the abnormal semantic vector set into corresponding channel attention weights, and modulating the first feature map channel by channel to obtain a second feature map; performing dimension-raising projection on the second feature map to obtain a third feature map; and adding the third feature map to the feature map to obtain the corresponding calibration feature map.

6. The method for identifying surface defects of a part according to claim 4, characterized in that the step of performing global visual feature analysis on the smallest-scale calibration feature map based on the abnormal semantic vector set to obtain the image-level anomaly score of the part image comprises: performing global average pooling on the smallest-scale calibration feature map to generate a global visual feature; calculating the cosine similarity between the global visual feature and each abnormal semantic vector to obtain the corresponding image-level anomaly prediction value; and calculating the mean of all image-level anomaly prediction values to obtain the image-level anomaly score of the part image.

7. The method for identifying surface defects of a part according to claim 4, characterized in that the step of performing pixel feature analysis on the calibration feature maps at each scale based on the abnormal semantic vector set to obtain the pixel-level anomaly score of the part image comprises: aligning and fusing the calibration feature maps at each scale to obtain a fused feature map; for each abnormal semantic vector, calculating the cosine similarity between the fused feature map and the abnormal semantic vector pixel by pixel to generate the corresponding two-dimensional matrix; and performing pixel aggregation on all the two-dimensional matrices to obtain the pixel-level anomaly score of the part image.

8. The method for identifying surface defects of a part according to claim 1, characterized in that the defect recognition model is trained and updated based on a contrastive loss function, a regression loss function, and a segmentation loss function; the contrastive loss function is calculated based on the preset hierarchical semantic knowledge graph, the hierarchical semantic knowledge graph updated from the training part images, and the global visual features of the training part images; the regression loss function is calculated based on the abnormal semantic vectors of the training part images, the different random mixing coefficients, and the corresponding image-level anomaly scores; and the segmentation loss function is calculated based on the pixel-level anomaly scores of the training part images and the true defect masks of the training part images.

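9. The method for identifying surface defects of a part according to claim 8, characterized in that the defect recognition model is trained and updated based on a loss function, wherein the loss function is: $\mathcal{L} = \lambda_1 \mathcal{L}_{con} + \lambda_2 \mathcal{L}_{reg} + \lambda_3 \mathcal{L}_{seg}$, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are preset weighting coefficients, and $\mathcal{L}_{con}$, $\mathcal{L}_{reg}$, and $\mathcal{L}_{seg}$ are the contrastive loss function, the regression loss function, and the segmentation loss function, respectively.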

10. A device for identifying surface defects of a part, characterized in that the device comprises: a feature extraction module, used to input a part image into a pre-trained defect recognition model, and to extract features from the part image through the visual encoder of the defect recognition model to obtain a multi-scale feature map; a semantic association module, used to input a preset hierarchical semantic knowledge graph into the defect recognition model, and to dynamically associate the multi-scale feature map with the hierarchical semantic knowledge graph through the semantic perception network of the defect recognition model to update the hierarchical semantic knowledge graph, wherein the hierarchical semantic knowledge graph includes multiple nodes at multiple levels, and each node has a corresponding preset initial prompt vector; a semantic sampling module, used to perform anomaly sampling on the updated hierarchical semantic knowledge graph through the semantic sampling network of the defect recognition model, so as to obtain abnormal semantic vectors with different degrees of abnormality and form an abnormal semantic vector set; and a defect processing module, used to perform global visual feature analysis on the multi-scale feature map based on the abnormal semantic vector set through the first output branch of the multi-scale semantic alignment network of the defect recognition model to obtain an image-level anomaly score of the part image, and to perform pixel feature analysis on the multi-scale feature map based on the abnormal semantic vector set through the second output branch of the multi-scale semantic alignment network to obtain a pixel-level anomaly score of the part image.
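For illustration only, the Beta-distributed mixing of claim 3 and the mean cosine similarity of claim 6 can be sketched in plain Python. Several choices here are assumptions not fixed by the claims: an element-wise mean stands in for the cluster analysis, the mixing coefficient is taken as the weight on the defect vector, and the Beta parameters `alpha=2.0` and `beta=2.0` and all function names are hypothetical.

```python
import math
import random

def mean_vector(vectors):
    """Stand-in for cluster analysis (assumption): element-wise mean of the vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sample_anomaly_vectors(normal_vec, defect_vec, count, alpha=2.0, beta=2.0, rng=random):
    """Mix the normal and defect aggregated vectors with Beta-distributed coefficients."""
    samples = []
    for _ in range(count):
        lam = rng.betavariate(alpha, beta)  # mixing coefficient in [0, 1]
        # Assumed convention: lam weights the defect vector, (1 - lam) the normal vector.
        mixed = [(1.0 - lam) * n + lam * d for n, d in zip(normal_vec, defect_vec)]
        samples.append((mixed, lam))
    return samples

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def image_level_score(global_feature, anomaly_vectors):
    """Mean cosine similarity between the pooled feature and each anomaly vector."""
    similarities = [cosine(global_feature, v) for v in anomaly_vectors]
    return sum(similarities) / len(similarities)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
normal = mean_vector([[1.0, 0.0], [0.8, 0.2]])
defect = mean_vector([[0.0, 1.0], [0.2, 0.8]])
mixed = sample_anomaly_vectors(normal, defect, count=4, rng=rng)
vectors = [vector for vector, _ in mixed]
score = image_level_score([0.1, 0.9], vectors)
print(len(vectors), -1.0 <= score <= 1.0)
```

Because each coefficient is drawn independently, the resulting vectors range from nearly normal to nearly defective, which is what allows a single set to cover different degrees of abnormality.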