Defect detection method, system, electronic device, storage medium, and program product
By performing structural consistency determination and data diversion before the defect detection model is trained, the method addresses the low training efficiency and poor stability of existing approaches and enables efficient training and adaptive improvement of defect detection models in industrial manufacturing.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: ZHEJIANG YIFANG HANGCHUANG TECHNOLOGY CO LTD
- Filing Date: 2026-05-28
- Publication Date: 2026-06-23
AI Technical Summary
Existing defect detection models in industrial manufacturing do not assess the structural consistency of the original images before training, so a large share of the computation is spent on non-informative data and training efficiency is low. This also undermines the model's convergence stability and makes it difficult to adapt to process changes and diverse products.
By introducing a structural consistency determination and data diversion mechanism based on reference images, the original images are registered, aligned, and diverted before model training; interfering data is removed, defect candidate samples with structural anomalies are retained, and a multi-layered data governance architecture is constructed.
This optimizes the data the model receives, reduces training costs, improves model stability and adaptability, and enables unknown defects to be identified while reducing manual annotation costs.
Figure CN122265296A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of image data processing technology, and more specifically, to a defect detection method, system, electronic device, storage medium, and program product.
Background Technology
[0002] In the field of industrial manufacturing quality control, automated defect detection technology based on machine vision has become a key means to ensure product yield. With the development of deep learning technology, current mainstream solutions typically use data-driven models such as convolutional neural networks to identify defects in images collected during the production process. The implementation process of such solutions generally includes: first, collecting a large number of production images and building a training dataset; then, training the detection model using labeled data; and finally, directly inputting newly collected images into the trained model for defect judgment.
[0003] However, existing technical solutions typically feed all raw image data collected during production directly into the downstream model to be trained. Because there is no governance mechanism that determines the structural consistency of the raw images and diverts the data before the model sees it, a large amount of defect-free interference data enters model training directly. The model therefore carries a large invalid computational load, which reduces training efficiency and convergence stability.
Summary of the Invention
[0004] The purpose of this application is to provide a defect detection method, system, electronic device, storage medium, and program product to solve the above-mentioned problems.
[0005] In a first aspect, embodiments of this application provide a defect detection method, comprising: acquiring an original image and a reference image; wherein the reference image is used to characterize the structural benchmark of a normal product; performing registration and alignment processing on the original image and the reference image; before inputting the original image into a downstream defect detection model to be trained, performing a structural consistency determination on the original image based on the registration and alignment results; and splitting the original image based on the structural consistency determination results; wherein the known defect data obtained after splitting is used to train the downstream defect detection model.
[0006] In this scheme, a structural consistency determination and data diversion mechanism based on the reference image is introduced before model training, so that the original images are quality-managed and screened hierarchically at the model's data entry point, which improves the input data of the downstream defect detection model. In addition, interfering data whose structure matches the reference image is identified and removed in advance, which reduces the computation the downstream model spends on defect-free samples and lowers sample cleaning costs during training. Furthermore, because the structural consistency determination identifies out-of-distribution anomalies that do not match existing defect features, unknown defects can be discovered and labeled initially without prior labels. Finally, because the structural consistency determination is performed upstream and decoupled from the downstream training process, interfering data is kept out of model training, which improves the stability and controllability of defect detection model training.
[0007] In one implementation of the first aspect, the step of splitting the original image based on the structural consistency determination result includes: if the structural consistency determination result indicates that the original image is consistent with the reference image, then the original image is split into interference data; if the structural consistency determination result indicates that the original image has structural anomalies, then the original image is split into defect data; extracting defect features from the defect data and matching the defect features with existing defect features; if the matching result indicates that the defect features match the existing defect features, then the defect data is split into known defect data; otherwise, the defect data is split into unknown defect data.
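The three-way diversion in [0007] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the application does not specify how defect features are extracted or matched, so the sketch assumes each defect is already summarized as a feature vector, that each known defect type is represented by a single prototype vector, and that matching uses cosine similarity against a hypothetical threshold `match_threshold`. The names `Diversion` and `divert` are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional

import numpy as np


@dataclass
class Diversion:
    stream: str                         # "interference", "known_defect" or "unknown_defect"
    matched_type: Optional[str] = None  # name of the matched known defect type, if any


def divert(is_consistent: bool,
           defect_feature: Optional[np.ndarray],
           known_features: Dict[str, np.ndarray],
           match_threshold: float = 0.8) -> Diversion:
    """Consistent image -> interference; anomalous image -> known or unknown defect."""
    if is_consistent:
        return Diversion("interference")
    if defect_feature is None:
        raise ValueError("an anomalous image needs a defect feature vector")
    query = defect_feature / (np.linalg.norm(defect_feature) + 1e-12)
    best_type, best_sim = None, -1.0
    for name, prototype in known_features.items():
        # Cosine similarity between the extracted feature and a known-defect prototype
        # (an assumed matching rule; the application leaves the matcher open).
        sim = float(query @ (prototype / (np.linalg.norm(prototype) + 1e-12)))
        if sim > best_sim:
            best_type, best_sim = name, sim
    if best_type is not None and best_sim >= match_threshold:
        return Diversion("known_defect", best_type)
    return Diversion("unknown_defect")
```

Only the `known_defect` stream would be forwarded for training; the `unknown_defect` stream is the one retained for open-set analysis.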
[0008] In this scheme, a hierarchical analysis of the structural consistency determination result diverts each original image, at the data entry point, into one of three fine-grained categories (interference data, known defect data, or unknown defect data), thereby forming a multi-layered data governance architecture. In addition, combining structural anomaly determination with matching against existing defect features separates defect types already covered by the training data from new out-of-distribution defects, supplying cleaned known-defect samples for model training while retaining unknown-defect samples for open-set analysis. Furthermore, diverting and retaining unknown defect data establishes a mechanism for automatically discovering newly emerging defect types without manual prior annotation, which improves the adaptability of the defect detection system to changes in production line processes.
[0009] In one implementation of the first aspect, the step of performing a structural consistency determination on the original image based on the registration and alignment results includes: determining a valid region of interest (ROI) based on the registration and alignment results; wherein the valid ROI is used to characterize the intersection of valid imaging regions in the original image and the reference image after registration and alignment; determining the structural difference response between the original image and the reference image within the valid ROI, and generating a difference map; determining candidate anomalous regions based on the difference map; calculating a structural similarity index between the candidate anomalous regions and the corresponding regions in the reference image, and comparing the structural similarity index with a preset structural consistency threshold to determine whether the original image has structural anomalies relative to the reference image, and obtaining a structural consistency determination result.
[0010] In this scheme, a progressive process is constructed that runs from registration and alignment, through effective region definition and difference map generation, to structural similarity determination. This establishes a complete data governance mechanism, constrained by the structure of the reference image, before the model sees the data, and provides downstream models with structurally verified input data. In addition, because the intersection of effective imaging regions is determined from the registration and alignment results, the structural consistency determination is confined to the visible region where the alignment is reliable, which suppresses interference from edge registration errors and invalid pixel regions. Furthermore, the two-stage process, which first determines candidate abnormal regions from the difference map and then calculates the structural similarity index, progresses from pixel-level recall of differences to region-level structural verification, distinguishing real structural anomalies from pseudo-differences caused by illumination noise. Finally, calculating the structural similarity index between each candidate abnormal region and the corresponding region of the reference image and comparing it with a threshold converts pixel-level grayscale differences into a quantitative indicator of structural consistency, which improves sensitivity to structural deformation defects.
[0011] In one implementation of the first aspect, determining the effective region of interest based on the registration and alignment results includes: extracting the effective pixel region of the reference image and the effective pixel region of the original image respectively; wherein, the effective pixel is a pixel whose gray value is greater than a preset gray value threshold; performing an intersection operation on the effective pixel region of the reference image and the effective pixel region of the original image to obtain an overlapping effective region; performing a morphological closing operation on the overlapping effective region; extracting the largest inscribed rectangle within the overlapping effective region after the morphological closing operation, and determining the region defined by the largest inscribed rectangle as the effective region of interest.
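A numpy-only sketch of the effective-ROI steps in [0011], assuming the original image has already been registered to the reference image and both are single-channel arrays of the same shape. The gray threshold, the square closing kernel size, and all function names are illustrative assumptions; a production version would more likely use a library routine such as OpenCV's `cv2.morphologyEx` with `cv2.MORPH_CLOSE` for the closing step. The largest inscribed rectangle is assumed to be axis-aligned and is found with the standard row-by-row histogram (maximal rectangle) method.

```python
import numpy as np


def _window_op(mask, k, pad_value, op):
    """Apply a k x k square-window binary operation (OR = dilation, AND = erosion)."""
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    out = padded[r:r + h, r:r + w].copy()
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out = op(out, padded[r + dy:r + dy + h, r + dx:r + dx + w])
    return out


def morphological_close(mask, k=5):
    dilated = _window_op(mask, k, False, np.logical_or)
    # Pad with True during erosion so the image border itself does not erode the region.
    return _window_op(dilated, k, True, np.logical_and)


def largest_inscribed_rect(mask):
    """Largest axis-aligned all-True rectangle, via the row-wise histogram method."""
    h, w = mask.shape
    heights = np.zeros(w, dtype=int)
    best = (0, 0, 0, 0, 0)  # (area, top, left, height, width)
    for row in range(h):
        # Column heights of consecutive True pixels ending at this row.
        heights = np.where(mask[row], heights + 1, 0)
        stack = []
        for col in range(w + 1):
            cur = heights[col] if col < w else 0
            while stack and heights[stack[-1]] > cur:
                height = int(heights[stack.pop()])
                left = stack[-1] + 1 if stack else 0
                width = col - left
                if height * width > best[0]:
                    best = (height * width, row - height + 1, left, height, width)
            stack.append(col)
    return best[1:]  # (top, left, height, width)


def effective_roi(original, reference, gray_threshold=10, close_kernel=5):
    """Intersection of valid pixels -> morphological closing -> largest inscribed rectangle."""
    overlap = (original > gray_threshold) & (reference > gray_threshold)
    return largest_inscribed_rect(morphological_close(overlap, close_kernel))
```

The closing step fills small holes (for example a single dark pixel inside the overlap) so that they do not fragment the inscribed rectangle.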
[0012] In this scheme, a multi-step process of grayscale thresholding, intersection, morphological processing, and largest-inscribed-rectangle extraction is applied to the registration and alignment result to establish an effective region of interest that is both reliably aligned and geometrically regular, providing a spatial reference for the subsequent structural difference analysis. In addition, intersecting the effective pixel regions of the reference image and the original image confines the structural consistency determination to the overlapping region in which both images have a valid imaging response, eliminating interference from invalid pixels present in only one image or from misaligned edge regions. Furthermore, the morphological closing operation fills isolated holes inside the overlapping effective region, and the largest inscribed rectangle then yields a continuous, regular comparison region, which prevents fragmented regions from degrading difference map generation and structural similarity calculation.
[0013] In one implementation of the first aspect, determining the structural difference response between the original image and the reference image within the effective region of interest and generating a difference map includes: calculating a grayscale difference index, a gradient difference index, and a high-frequency structural difference index between the original image and the reference image within the effective region of interest; and performing weighted fusion of the grayscale difference index, the gradient difference index, and the high-frequency structural difference index to generate a difference map.
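The weighted fusion in [0013] might look like the following sketch. The application does not define the three indices or their weights, so this assumes: the absolute grayscale difference as the grayscale index, the difference of gradient magnitudes (from `np.gradient`) as the gradient index, the difference of 4-neighbour Laplacian responses as the high-frequency index, per-index normalization to [0, 1], and hypothetical weights (0.4, 0.3, 0.3). Inputs are assumed to be registered crops of the effective region of interest.

```python
import numpy as np


def _normalize(x):
    peak = float(x.max())
    return x / peak if peak > 0 else np.zeros_like(x)


def _gradient_magnitude(img):
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)


def _laplacian(img):
    # 4-neighbour Laplacian used here as a simple high-frequency (texture/detail) response.
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img


def difference_map(original_roi, reference_roi, weights=(0.4, 0.3, 0.3)):
    """Weighted fusion of grayscale, gradient and high-frequency difference indices."""
    o = original_roi.astype(float)
    r = reference_roi.astype(float)
    d_gray = np.abs(o - r)                                             # intensity deviation
    d_grad = np.abs(_gradient_magnitude(o) - _gradient_magnitude(r))  # edge/contour change
    d_high = np.abs(_laplacian(o) - _laplacian(r))                     # texture/detail change
    w_gray, w_grad, w_high = weights
    return (w_gray * _normalize(d_gray)
            + w_grad * _normalize(d_grad)
            + w_high * _normalize(d_high))
```

Normalizing each index before fusion keeps one index (typically the raw grayscale difference) from dominating purely because of its numeric scale.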
[0014] In this scheme, fusing grayscale, gradient, and high-frequency structural difference indices within the effective region of interest builds a multi-dimensional difference response before the model sees the data, giving high recall for different types of structural anomalies. In addition, the grayscale difference index captures overall intensity deviation, the gradient difference index captures changes in edges and contours, and the high-frequency structural difference index captures anomalies in texture detail; because these indices are complementary, together they cover defect features such as abrupt grayscale changes, edge distortion, and texture damage. Furthermore, weighted fusion of these indices condenses the multi-dimensional difference information into a single difference map, which serves as the input for the subsequent statistics-based threshold segmentation and connected component analysis.
[0015] In one implementation of the first aspect, determining candidate abnormal regions based on the difference map includes: determining an adaptive threshold based on the statistical characteristics of the difference distribution within the effective region of interest; determining abnormal pixels based on the adaptive threshold; and performing connected component analysis on the abnormal pixels to obtain candidate abnormal regions.
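One way to realize [0015], sketched under assumptions the application leaves open: the adaptive threshold is taken as the mean plus `k` standard deviations of the difference values inside the effective region of interest (`k` is a hypothetical parameter), and connected components use 8-connectivity with a simple breadth-first search. Regions are reported as bounding boxes `(top, left, bottom, right)` with exclusive bottom/right bounds.

```python
from collections import deque

import numpy as np


def candidate_regions(diff_map, k=3.0, min_area=1):
    """Adaptive (mean + k*std) thresholding followed by 8-connected component labelling."""
    threshold = float(diff_map.mean()) + k * float(diff_map.std())
    abnormal = diff_map > threshold
    h, w = abnormal.shape
    visited = np.zeros((h, w), dtype=bool)
    regions = []
    for y in range(h):
        for x in range(w):
            if not abnormal[y, x] or visited[y, x]:
                continue
            # Breadth-first flood fill collects one connected component.
            queue, pixels = deque([(y, x)]), []
            visited[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and abnormal[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            queue.append((ny, nx))
            if len(pixels) >= min_area:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                regions.append({"bbox": (min(ys), min(xs), max(ys) + 1, max(xs) + 1),
                                "area": len(pixels)})
    return threshold, regions
```

`min_area` is an optional, hypothetical filter for discarding isolated noise pixels before the region-level SSIM check.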
[0016] In this scheme, an adaptive threshold is determined from the statistical characteristics of the difference distribution within the effective region of interest, the difference map is segmented with this dynamic threshold to extract abnormal pixels, and connected component analysis merges contiguous abnormal pixels into candidate abnormal regions. This forms a complete recall path from pixel-level difference responses to region-level anomaly candidates and supplies the candidates for subsequent structural consistency verification. In addition, because the threshold is derived adaptively from the difference distribution within the effective region of interest, abnormal pixel extraction adapts to the different difference distributions produced by different imaging conditions and product types. Furthermore, merging abnormal pixels into candidate abnormal regions and extracting their locations turns discrete pixel-level responses into spatially continuous region-level candidates, on which the structural similarity index can then be calculated for each whole region.
[0017] In one implementation of the first aspect, calculating the structural similarity index between the candidate anomaly region and the corresponding region of the reference image includes: calculating the brightness consistency index, contrast consistency index, and structural consistency index between the candidate anomaly region and the corresponding region of the reference image, respectively; and fusing the brightness consistency index, the contrast consistency index, and the structural consistency index to obtain the structural similarity index between the candidate anomaly region and the corresponding region of the reference image.
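The brightness, contrast, and structure terms in [0017] correspond closely to the three components of the standard structural similarity (SSIM) index. A minimal sketch, assuming SSIM's usual constants (C1 = (0.01L)^2, C2 = (0.03L)^2, C3 = C2/2 for data range L), statistics computed over the whole candidate region rather than a sliding window, fusion by simple multiplication, and a hypothetical consistency threshold of 0.9; none of these choices is stated in the application.

```python
import numpy as np


def region_ssim(candidate, reference, data_range=255.0):
    """Luminance, contrast and structure terms and their product over one region."""
    x = candidate.astype(float).ravel()
    y = reference.astype(float).ravel()
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)   # brightness consistency
    contrast = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)   # contrast consistency
    structure = (cov + c3) / (sd_x * sd_y + c3)                        # structural correlation
    return luminance, contrast, structure, luminance * contrast * structure


def is_structurally_anomalous(ssim_value, consistency_threshold=0.9):
    # Below the (assumed) threshold, the region is treated as a structural anomaly.
    return ssim_value < consistency_threshold
```

An intensity-inverted patch keeps identical mean and spread, so luminance and contrast both score 1, but the structure term turns negative; this is the kind of case where a structure term separates real structural change from a global lighting shift.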
[0018] In this scheme, calculating and fusing the brightness, contrast, and structural consistency indices for each candidate anomaly region establishes a multi-dimensional structural similarity evaluation at the region level. This raises pixel-level differences to a composite consistency index covering brightness, contrast, and structural information, which improves the accuracy and discriminability of structural anomaly determination. In addition, the brightness consistency index evaluates whether the brightness distributions of the candidate anomaly region and the corresponding region of the reference image agree, suppressing interference from light intensity fluctuations or exposure differences so that lighting changes are not misjudged as structural defects. Furthermore, the contrast consistency index and the structural consistency index evaluate, respectively, how well local contrast is preserved and how strongly the structural information is correlated, which separates contrast fluctuations caused by imaging noise from genuine structural deformation or damage and identifies real defect regions with structural damage characteristics.
[0019] In one implementation of the first aspect, the method further includes: based on the structural consistency determination result, annotating the defect data to generate pseudo-labels containing defect location and type identifiers.
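A sketch of a pseudo-label record as described in [0019]. The field names, the bounding-box convention, and the `needs_review` flag (reflecting the manual review step described in [0020]) are hypothetical; the application only states that pseudo-labels contain defect location and type identifiers.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class PseudoLabel:
    image_id: str
    bbox: Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive
    defect_type: str                 # known defect type name, or "unknown"
    ssim: float                      # region SSIM that flagged the anomaly
    source: str = "structural_consistency"
    needs_review: bool = True        # pre-label awaiting manual verification


def make_pseudo_labels(image_id: str, regions: List[dict], defect_type: str) -> List[PseudoLabel]:
    """Turn anomalous regions from the structural consistency step into pseudo-labels."""
    return [PseudoLabel(image_id, tuple(int(v) for v in r["bbox"]), defect_type, float(r["ssim"]))
            for r in regions]


def to_json(labels: List[PseudoLabel]) -> str:
    # Serialize for a review tool or a training-data manifest.
    return json.dumps([asdict(label) for label in labels])
```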
[0020] In this scheme, the structural consistency determination results already computed upstream are reused to label the defect data automatically and generate pseudo-labels containing defect location and type identifiers. Embedding automatic labeling in the preliminary screening process lets later stages reuse the structural analysis results, which reduces manual labeling costs and speeds up construction of the training data. In addition, because the pseudo-labels are generated directly from the anomaly locations and defect classification information in the structural consistency determination results, data cleaning and labeling share a single process and the same defect samples are not processed twice, which improves the overall efficiency of defect data governance. Furthermore, the pseudo-labels serve as pre-labels for subsequent manual review or as a reference for model training, so reliance on fully manual labeling is reduced while a manual verification step is kept, balancing automation against labeling accuracy.
[0021] In a second aspect, embodiments of this application provide a defect detection system, including a preliminary screening module and a downstream defect detection module communicatively connected to it, wherein: the preliminary screening module is configured to acquire an original image and a reference image, the reference image characterizing the structural baseline of a normal product; register and align the original image with the reference image; before the original image is input into the downstream defect detection model to be trained, perform a structural consistency determination on the original image based on the registration and alignment results; split the original image based on the structural consistency determination results, the known defect data obtained after splitting being used to train the downstream defect detection model; and send the known defect data to the downstream defect detection module. The downstream defect detection module is configured to receive the known defect data and perform training.
[0022] In a third aspect, embodiments of this application provide an electronic device, including a processor, a memory, and a communication bus, wherein the processor and the memory communicate with each other through the communication bus; the memory stores computer program instructions executable by the processor, and when the processor reads and executes the instructions, it performs the method provided in the first aspect or any possible implementation of the first aspect.
[0023] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing computer program instructions which, when read and executed by a processor, perform the method provided in the first aspect or any possible implementation thereof.
[0024] In a fifth aspect, embodiments of this application provide a computer program product including a computer program which, when executed by a processor, implements the method provided in the first aspect or any possible implementation of the first aspect.
[0025] Other features and advantages of this application will be set forth in the following description and will be apparent in part from the description or may be learned by practicing embodiments of this application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the written description, claims and drawings.
Attached Figure Description
[0026] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0027] Figure 1 is a flowchart of a prior-art defect detection scheme, as described in the embodiments of this application;
Figure 2 is a schematic flowchart of the defect detection method provided in an embodiment of this application;
Figure 3 is a schematic diagram of the structural consistency determination process provided in an embodiment of this application;
Figure 4 is a flowchart of the defect detection method in a specific application scenario provided in an embodiment of this application;
Figure 5 is a schematic diagram of the architecture of the defect detection system provided in an embodiment of this application;
Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0028] The technical solutions of the embodiments of this application will now be described with reference to the accompanying drawings. The following embodiments are only used to more clearly illustrate the technical solutions of this application, and are therefore merely examples and should not be used to limit the scope of protection of this application.
[0029] As shown in Figure 1, most existing defect detection schemes adopt an end-to-end defect detection architecture based on deep learning. The main process includes: first, acquiring a large number of production images; then, manually labeling known defect categories to construct a training dataset containing a limited number of defect types; training a convolutional neural network or detector model on the labeled dataset to obtain a defect recognition model; and finally, inputting newly acquired images directly into the trained model for inference and outputting the defect judgment result. Although some existing schemes introduce reference or template images as auxiliary information, these images are used only for one-time registration or simple difference calculation, and the output is used directly as the basis for defect judgment.
[0030] The aforementioned solutions typically require large-scale, detailed data labeling before model training. This labeling process is labor-intensive and time-consuming, and when product form or process conditions change, data often needs to be re-collected and re-labeled, resulting in high engineering costs. Furthermore, these solutions usually directly input all collected raw image data into the deep learning model for training or inference without effectively separating interfering data and potentially defective data. This causes the model to bear a large amount of invalid computational load, affecting training stability and the reliability of results.
[0031] In view of this, embodiments of this application provide a defect detection method. The method introduces a structural consistency determination and data diversion mechanism based on a reference image before model training, so that the original images are quality-managed and screened hierarchically at the model's data entry point, improving the input data of the downstream defect detection model. In addition, by identifying and removing in advance the interfering data whose structure matches the reference image, the method reduces the computation the downstream model spends on defect-free samples and lowers sample cleaning costs during training. Furthermore, by using the structural consistency determination to identify out-of-distribution anomalies that do not match existing defect features, the method enables preliminary discovery and labeling of unknown defects without prior labels. Finally, because the structural consistency determination is performed upstream and decoupled from downstream model training, interfering data is kept out of the training process, which improves the stability and controllability of defect detection model training.
[0032] The application scenarios of the above defect detection method include, but are not limited to, the following. (1) Wafer defect detection in semiconductor manufacturing: in this scenario, wafer images are collected as raw data, and reference images are collected from structurally stable areas around the wafer to characterize the structural baseline of a normal product. Because imaging parameters drift systematically as equipment ages during long-term operation of the production line, the method determines structural consistency and diverts the collected data against the reference image before model training. It promptly identifies and removes interference data caused by illumination fluctuations or alignment errors and retains out-of-distribution defect samples caused by process changes, so as to ensure training stability and unknown-defect detection capability of the deep learning model under complex process conditions.
[0033] (2) Flexible manufacturing quality inspection for multiple product models and batches: when the production line frequently switches between product models or adjusts process parameters, each product has a different normal structural baseline, which traditional detection schemes based on fixed thresholds or a single model struggle to accommodate. The above defect detection method introduces the reference image of the corresponding model as a dynamic structural constraint, so that the original images of each product model are checked against the matching baseline at the data entry point. This prevents interference data that does not conform to the current model's baseline from entering the model, while the structural consistency determination captures unknown defect patterns introduced by new processes, improving the adaptability of the quality inspection system to product diversity.
[0034] (3) Large-scale training data construction scenario: When a large amount of unlabeled historical image data is accumulated during long-term operation of the production line, manual annotation would be costly. The above defect detection method sets up a preliminary screening module decoupled from the downstream model before model training. It uses reference alignment and structural consistency judgment to automatically pre-screen the historically accumulated coarse data, retaining only defect candidate samples with structural abnormalities for manual fine annotation, eliminating a large amount of interference data without defects, reducing the workload of manual annotation, and constructing a balanced training dataset containing in-distribution defects and out-of-distribution defects by distinguishing between known and unknown defects, thereby improving the generalization ability and robustness of the downstream model.
[0035] Refer to Figure 2, which is a flowchart of a defect detection method provided in an embodiment of this application. The method can be applied to electronic devices, which may be physical devices such as servers, PCs, tablets, or smartphones, or virtual devices such as virtual machines or containers. The electronic device may be a single device, a combination of multiple devices, or a cluster of many devices. The defect detection method may include the following steps.
Step S110: Obtain the original image and the reference image, where the reference image characterizes the structural baseline of a normal product.
[0036] The aforementioned reference image refers to an image acquired under specific imaging conditions to characterize the structural baseline of a normal, defect-free product. It reflects the baseline attributes of a normal product in terms of geometry, grayscale distribution, and texture features, serving as a reference for subsequent structural consistency assessment. The acquisition of the reference image can be achieved, for example, by selecting a quality-verified defect-free standard sample and acquiring the image under the same imaging environment and equipment parameters as the sample to be inspected, ensuring a comparable geometric imaging relationship between the reference image and the original image. Then, systematic offsets are eliminated through imaging equipment calibration to establish a normal-state baseline image covering different models, batches, and operating conditions, supporting the structural consistency assessment requirements of the original image under different input conditions. Taking wafer defect detection as an example, one way to acquire the aforementioned reference image is to select an area with stable structural features around the wafer for image acquisition. Furthermore, systematic drift caused by equipment aging or environmental changes can be absorbed into the normal baseline through equipment-side imaging parameter calibration, ensuring that the reference image covers the normal-state characterization requirements under different batches and lighting conditions.
[0037] The aforementioned raw images refer to coarse data images collected from the production line during the defect detection model training phase, which have not yet been screened or labeled. These images may include normal, defect-free samples, noisy samples, and various known or unknown defect samples. Raw images can be continuously and automatically acquired using industrial imaging equipment. For example, in wafer defect detection scenarios, a scanning electron microscope can be used to perform area-by-area scanning imaging of wafers on the production line; in multi-model product manufacturing scenarios, industrial vision systems can be used to automatically acquire images of different batches of products.
[0038] Step S120: Register and align the original image with the reference image.
[0039] Because the original image and the reference image may deviate in terms of acquisition time, imaging position, or mechanical positioning, direct pixel-level comparison will introduce false difference responses caused by geometric misalignment, interfering with the identification of real structural anomalies. The purpose of the above registration and alignment process is to establish a geometric correspondence between the original image and the reference image, making them structurally comparable under a unified coordinate system. Through registration and alignment, global displacement and local deformation differences between the two images can be eliminated, ensuring that subsequent difference analysis targets structural features at the same spatial location, rather than artifacts caused by imaging position deviations.
[0040] Furthermore, within the overall defect detection method, the registration and alignment processing provides the spatial reference for the structural consistency determination in the subsequent step S130. Once registration and alignment are complete, the structural constraints derived from the reference image can be applied precisely to the corresponding areas of the original image, so that the data diversion decision reflects actual structural deviations of the product rather than alignment errors introduced during imaging.
[0041] The registration and alignment processing in this application can be performed using at least one of the following implementation methods. The first implementation method: registration and alignment using the frequency-domain phase correlation method combined with the enhanced correlation coefficient method. In this embodiment, the global translation between the original image and the reference image can first be estimated in the frequency domain by phase correlation: a coarse displacement offset is obtained from the peak position of the inverse Fourier transform of the normalized cross-power spectrum of the two images, achieving coarse alignment at the level of structural contours. Starting from this coarse alignment, the enhanced correlation coefficient method can be used for iterative optimization: by maximizing the correlation coefficient over the overlapping area of the two images, fine geometric transformation parameters are estimated to compensate for local deformation and residual displacement, ultimately achieving sub-pixel registration accuracy.
[0042] In the frequency-domain phase correlation method, the peak position of the inverse Fourier transform of the cross-power spectrum characterizes the global translational offset between the two images in the spatial domain. This peak is obtained by applying an inverse Fourier transform to the normalized cross-power spectrum of the two images, which retains the phase difference between the images while suppressing the amplitude information. By the shift property of the Fourier transform, a translation in the spatial domain appears as a linear phase difference in the frequency domain. The inverse-transformed spatial function therefore exhibits an impulse-like peak at the coordinates corresponding to the translation vector, and the peak coordinates directly give the global displacement of the original image relative to the reference image in the horizontal and vertical directions.
[0043] The enhanced correlation coefficient method is an iterative registration method based on a region similarity measure; its core is to estimate fine geometric transformation parameters by maximizing the correlation coefficient over the overlapping region of the two images. The method recasts image registration as an optimization problem in parameter space, using as the objective function the correlation coefficient, which measures the linear correlation between the warped image and the reference image. The geometric transformation parameters are updated iteratively to drive the correlation coefficient toward its maximum. The enhanced correlation coefficient method is strongly robust to local illumination changes and, on top of the coarse alignment, can compensate for residual local deformation and sub-pixel displacement, thereby achieving fine registration with sub-pixel accuracy.
[0044] The second implementation method: registration and alignment based on feature point matching. In this embodiment, salient structural feature points (such as scale-invariant feature transform (SIFT) features or speeded-up robust features (SURF)) can be extracted from the original image and the reference image to establish a sparse correspondence between the two images. A random sample consensus (RANSAC) algorithm is then used to remove mismatched point pairs, and a global geometric transformation matrix (such as an affine or perspective transformation) is computed from the retained, correctly matched point pairs to register and align the images.
[0045] The third implementation method: registration and alignment based on mutual information. In this embodiment, mutual information from information theory can be used as the similarity measure, and the optimal geometric transformation parameters are estimated by optimizing the statistical dependency between the gray-level distributions of the two images. This recasts registration as a mutual information maximization problem, which makes it applicable to image alignment when the gray-level distributions differ nonlinearly or when the images come from different imaging modalities. An iterative search strategy is used to find the spatial transformation that maximizes the mutual information; because mutual information equals the sum of the two marginal entropies minus the joint entropy, this is equivalent, for fixed marginal distributions, to minimizing the joint entropy of the two images.
[0046] The fourth implementation method: registration and alignment based on deep learning. In this embodiment, a convolutional neural network learns an end-to-end geometric transformation mapping from the original image to the reference image. The registration network is trained to predict a spatial transformation grid or displacement field and directly outputs the registered image or the transformation parameters. No explicit design of feature extraction or similarity measurement functions is needed, so the method can adapt to complex deformation and non-rigid registration scenarios and achieve data-driven adaptive alignment.
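As an illustrative sketch only (not the embodiment's own implementation), the integer-pixel phase correlation step can be written with NumPy. The function name `phase_correlation_shift`, the circular-shift model, and the small epsilon that guards against division by zero are assumptions made for this example; the sub-pixel refinement by the enhanced correlation coefficient method is not included.

```python
import numpy as np


def phase_correlation_shift(reference: np.ndarray, moving: np.ndarray) -> tuple:
    """Estimate the integer (dy, dx) translation of `moving` relative to `reference`.

    Assumes moving[y, x] ~ reference[y - dy, x - dx] with circular wrap-around.
    """
    f_ref = np.fft.fft2(reference.astype(float))
    f_mov = np.fft.fft2(moving.astype(float))
    cross_power = f_mov * np.conj(f_ref)
    # Normalize to unit magnitude: keep the phase difference, suppress amplitude
    cross_power /= np.abs(cross_power) + 1e-12
    # The inverse transform of a pure linear phase is an impulse at the shift
    response = np.real(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(response), response.shape)
    # Map peak indices from [0, N) to signed shifts in [-N/2, N/2)
    dy, dx = (int(p) if p < n // 2 else int(p) - n for p, n in zip(peak, response.shape))
    return dy, dx


rng = np.random.default_rng(0)
ref = rng.random((64, 64))
mov = np.roll(ref, shift=(3, -5), axis=(0, 1))
print(phase_correlation_shift(ref, mov))  # (3, -5)
```

The peak gives only whole-pixel offsets, which is why the description treats this step as coarse alignment and leaves the fine, sub-pixel estimate to the iterative correlation-coefficient stage.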
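For reference, one common way to write the objective maximized by enhanced correlation coefficient registration is given below. The notation is an assumption for illustration and is not taken from this application: $\bar{\mathbf{i}}_r$ is the zero-mean vector of reference-image intensities over the overlapping region, $\bar{\mathbf{i}}_w(\mathbf{p})$ is the zero-mean vector of original-image intensities after warping with geometric transformation parameters $\mathbf{p}$, and $\lVert\cdot\rVert$ is the Euclidean norm.

$$
\rho(\mathbf{p}) = \frac{\bar{\mathbf{i}}_r^{\top}\,\bar{\mathbf{i}}_w(\mathbf{p})}{\lVert \bar{\mathbf{i}}_r \rVert \, \lVert \bar{\mathbf{i}}_w(\mathbf{p}) \rVert},
\qquad
\mathbf{p}^{*} = \arg\max_{\mathbf{p}} \rho(\mathbf{p})
$$

Because $\rho$ is unchanged when either image's intensities are multiplied by a positive gain and shifted by an offset, the criterion is insensitive to brightness and contrast differences between the two images. Since the iterative updates generally converge to a local maximum, starting from the phase-correlation estimate helps the optimization reach the correct alignment.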
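A minimal NumPy sketch of the mismatch-rejection step is shown below. It assumes the SIFT or SURF matching has already produced paired coordinates `src` and `dst` (feature extraction is not shown), and it estimates an affine transformation only. The function names, the 2-pixel inlier tolerance, and the iteration count are hypothetical choices for this example.

```python
import numpy as np


def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine matrix A such that dst ~ A @ [x, y, 1]^T."""
    design = np.hstack([src, np.ones((src.shape[0], 1))])  # (n, 3)
    a_t, *_ = np.linalg.lstsq(design, dst, rcond=None)     # (3, 2)
    return a_t.T


def apply_affine(a: np.ndarray, pts: np.ndarray) -> np.ndarray:
    return pts @ a[:, :2].T + a[:, 2]


def ransac_affine(src, dst, n_iters=500, inlier_tol=2.0, seed=0):
    """Reject mismatched point pairs with RANSAC, then refit on the consensus set."""
    rng = np.random.default_rng(seed)
    n = src.shape[0]
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        # Minimal sample: three point pairs determine an affine transform
        sample = rng.choice(n, size=3, replace=False)
        a = fit_affine(src[sample], dst[sample])
        residuals = np.linalg.norm(apply_affine(a, src) - dst, axis=1)
        inliers = residuals < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("no consistent affine model found")
    # Final estimate uses every pair that agrees with the best hypothesis
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

Refitting on the full consensus set, rather than keeping the three-point hypothesis, uses all retained matches and reduces the influence of localization noise in any single point pair.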
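The mutual information criterion can be sketched with a joint gray-level histogram in NumPy. For brevity, an exhaustive search over integer shifts with circular wrap-around stands in for the iterative search strategy; the bin count, the search radius, and the function names are assumptions for illustration only.

```python
import numpy as np


def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """Mutual information (nats) of two equally sized images from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = p_ab > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a * p_b)[nz])))


def best_shift_by_mi(reference, moving, max_shift=5, bins=32):
    """Exhaustive search over integer shifts; returns the (dy, dx) maximizing MI."""
    best_shift, best_mi = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Undo a candidate shift (circular, for simplicity) and score the result
            candidate = np.roll(moving, shift=(-dy, -dx), axis=(0, 1))
            mi = mutual_information(reference, candidate, bins)
            if mi > best_mi:
                best_shift, best_mi = (dy, dx), mi
    return best_shift, best_mi
```

Because mutual information depends only on how predictable one image's gray levels are from the other's, the correct shift is still recovered when the moving image's intensities pass through a nonlinear or inverted mapping, which is the property the description relies on for nonlinear or multimodal gray-level differences.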
[0047] It is understood that the four implementation methods for registration and alignment processing described above are technically independent of each other, and there is no logical order or functional dependency between them. They are all parallel alternatives.
[0048] Step S130: Before inputting the original image into the downstream defect detection model to be trained, perform a structural consistency determination on the original image based on the registration and alignment results.
[0049] The aforementioned downstream defect detection model is the defect identification model that follows the data screening stage, that is, it sits downstream of the preceding data governance process. It is typically implemented with a data-driven architecture such as a convolutional neural network or an object detection network. Its training relies on cleaned and labeled defect sample data: by learning the statistical distribution of defect features in the input images, it establishes a mapping between image features and defect categories or locations.
[0050] The aforementioned structural consistency determination refers to the process of quantitatively evaluating and classifying the structural similarity between the original image and the reference image in a local or global range based on the geometric correspondence after registration and alignment. The structural consistency determination process uses the normal product structure benchmark represented by the reference image as a reference point. By analyzing the degree of deviation of the structural features of the original image at corresponding spatial locations, it identifies whether there are abnormal regions that disrupt the normal structural relationship and outputs a determination result of structural consistency or structural abnormality, serving as the basis for subsequent data diversion and sample selection.
[0051] The purpose of the structural consistency determination in step S130 is to establish a data governance mechanism based on reference structural constraints before the downstream defect detection model comes into contact with the data. By introducing the structural baseline of normal products as a prior constraint, the structural consistency determination can pre-screen the quality of the raw collected data at the data input stage, distinguishing, before model training, between interference data that conforms to the normal structural baseline and defect candidate data with structural anomalies. This prevents large numbers of defect-free samples from entering the downstream training process directly, reducing the model's ineffective computational load and the cost of cleaning the training data.
[0052] An optional implementation of the structural consistency determination in this application is described below. As shown in Figure 3, step S130 may include: Step S131: based on the registration and alignment results, determine the effective region of interest, where the effective region of interest characterizes the intersection of the effective imaging regions of the original image and the reference image after registration and alignment. The effective region of interest (ROI) is the shared visible area, in the unified coordinate system after registration and alignment, in which both the original image and the reference image have effective imaging responses; physically, it is the intersection of the effective imaging ranges of the two images. The ROI excludes invalid pixels present in only one of the images, black borders, occluded areas, and misregistered edge zones, and retains the overlapping imaging range in which both geometric structure and grayscale information correspond reliably. It serves as the spatial constraint for subsequent structural difference analysis and consistency determination. Restricting the computation to the ROI ensures that the pixel correspondences used for the structural consistency determination are based on real, valid imaging data, suppressing spurious difference responses introduced by edge interpolation distortion, misaligned boundaries, or invalid pixels, and thus improving the accuracy and interpretability of structural anomaly detection.
[0053] The effective region of interest can be determined through at least one of the following multiple implementation methods: The first method: a determination method based on grayscale threshold segmentation and morphological processing; Optionally, step S131 may include: extracting the effective pixel region of the reference image and the effective pixel region of the original image respectively; wherein, the effective pixel is a pixel whose gray value is greater than a preset gray value threshold; performing an intersection operation on the effective pixel region of the reference image and the effective pixel region of the original image to obtain an overlapping effective region; performing a morphological closing operation on the overlapping effective region; extracting the largest inscribed rectangle within the overlapping effective region after the morphological closing operation, and determining the region defined by the largest inscribed rectangle as the effective region of interest.
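The thresholding, intersection, and largest-inscribed-rectangle steps can be sketched in NumPy as follows; the morphological closing step is left out of this sketch to keep it short. The largest all-foreground rectangle is found with the standard row-by-row histogram and stack technique. The function names and the threshold value used in practice are hypothetical.

```python
import numpy as np


def largest_inscribed_rectangle(mask: np.ndarray) -> tuple:
    """Largest axis-aligned rectangle containing only True pixels.

    Returns (top, left, height, width); (0, 0, 0, 0) if the mask has no True pixel.
    """
    rows, cols = mask.shape
    heights = np.zeros(cols, dtype=int)
    best, best_area = (0, 0, 0, 0), 0
    for r in range(rows):
        # Run length of consecutive True pixels ending at row r, per column
        heights = np.where(mask[r], heights + 1, 0)
        stack = []  # (start_column, height) with strictly increasing heights
        for c in range(cols + 1):
            h = int(heights[c]) if c < cols else 0  # sentinel flushes the stack
            start = c
            while stack and stack[-1][1] >= h:
                start, sh = stack.pop()
                if sh * (c - start) > best_area:
                    best_area = sh * (c - start)
                    best = (r - sh + 1, start, sh, c - start)
            stack.append((start, h))
    return best


def effective_roi(reference: np.ndarray, original: np.ndarray, gray_threshold: float) -> tuple:
    valid_ref = reference > gray_threshold      # effective pixel region of the reference
    valid_orig = original > gray_threshold      # effective pixel region of the original
    overlap = valid_ref & valid_orig            # overlapping effective region
    return largest_inscribed_rectangle(overlap)
```

For example, if the reference image has a black border on its left edge and the original image is dark along its bottom rows, the returned rectangle is the region that is bright in both, bounded away from each border.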
[0054] The aforementioned effective pixel region refers to the set of pixels in an image that have an effective grayscale response, formed by the actual reflected or transmitted light signals from the product. It is determined by comparing the grayscale value of each pixel with a preset grayscale threshold; pixels with grayscale values higher than the preset threshold are considered to belong to the true imaging area. The effective pixel region excludes low-grayscale backgrounds, black borders, and invalid response areas caused by insufficient lighting, limitations in the imaging device's field of view, blind spots, or occlusions, and can reflect the visible range in the image that actually contains product structural information. In the above scheme, the effective pixel region can serve as a geometric mask for a single image, defining the spatial range within that image that possesses reliable structural information. This provides a basis for performing intersection operations with the effective pixel regions of another image and determining the overlapping range where both images share effective imaging information.
[0055] The aforementioned overlapping effective region is the result of the intersection operation of the effective pixel regions of the reference image and the original image after registration and alignment. It can characterize the common visible range where both images simultaneously possess effective imaging responses in a unified coordinate system. The overlapping effective region includes spatial locations that correspond geometrically and whose grayscale values are all higher than a preset grayscale threshold. This excludes invalid pixels, black borders, occluded areas, or imaging blind spots that exist only in a single image, ensuring that pixels in both images within the overlapping effective region correspond to the true product structure information. It can be understood that within the overlapping effective region, pixels in the original image and the reference image have a one-to-one physical correspondence, and structural analysis based on pixel-level or region-level difference calculations has high reliability. By limiting subsequent structural difference response calculations and structural consistency determinations to this overlapping effective region, false difference responses introduced by single-sided invalid pixels, edge registration residuals, or interpolation artifacts in non-overlapping areas can be effectively suppressed. This avoids misjudging imaging system limitations or registration errors as product structural anomalies, improving the accuracy and interpretability of structural consistency determination.
[0056] The aforementioned morphological closing operation is a composite morphological filtering operation consisting of a cascaded dilation and erosion operation. Its processing flow includes: first, performing a dilation operation on the input image using a preset structuring element to fill tiny holes within the foreground region and connect adjacent broken areas; then, performing an erosion operation on the dilation result using the same structuring element to restore the overall shape and boundary smoothness of the region. This morphological closing operation operates on overlapping effective regions, aiming to eliminate isolated holes and jagged edges caused by imaging noise, local registration deviations, or grayscale thresholding errors. It integrates potentially discretized effective pixel sets into continuous, unbroken connected regions, thereby generating a geometrically complete effective imaging region mask, laying the spatial foundation for subsequent extraction of the largest inscribed rectangle with a regular shape.
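A binary closing with a square structuring element can be sketched in plain NumPy as below. The mask is padded by the element radius before filtering and cropped afterwards so that pixels on the image border are not eroded away; the square element and the function names are assumptions for this example.

```python
import numpy as np


def _neighborhood_views(mask: np.ndarray, radius: int) -> list:
    """All (2r+1)^2 shifted views of `mask`, treating pixels outside it as False."""
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    h, w = mask.shape
    size = 2 * radius + 1
    return [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]


def binary_dilate(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    # Foreground if any pixel under the square element is foreground
    return np.logical_or.reduce(_neighborhood_views(mask, radius))


def binary_erode(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    # Foreground only if every pixel under the square element is foreground
    return np.logical_and.reduce(_neighborhood_views(mask, radius))


def binary_close(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Dilation followed by erosion with the same (2r+1) x (2r+1) square element."""
    if radius < 1:
        raise ValueError("radius must be at least 1")
    padded = np.pad(mask.astype(bool), radius, mode="constant", constant_values=False)
    closed = binary_erode(binary_dilate(padded, radius), radius)
    return closed[radius:-radius, radius:-radius]
```

With radius 1, a one-pixel hole inside the overlapping effective region and a one-pixel gap between two adjacent blocks are both filled, while an isolated square block keeps its original shape.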
[0057] Through a multi-step processing flow of grayscale threshold extraction, intersection, morphological processing, and largest inscribed rectangle extraction, the above scheme establishes an effective region of interest that is reliably aligned and geometrically regular, providing a spatial constraint for subsequent structural difference analysis. Specifically, intersecting the effective pixel regions of the reference image and the original image limits the structural consistency determination to the overlapping region in which both images have effective imaging responses, eliminating the interference of single-sided invalid pixels or misaligned edges on the result. In addition, applying a morphological closing operation to the overlapping effective region to fill isolated internal holes, and then extracting the largest inscribed rectangle, yields a continuous, unbroken, and regular comparison region, which avoids the adverse effects of fragmented regions on difference map generation and structural similarity calculation.
[0058] The second method: determination based on registration confidence maps; In this approach, the transformation confidence map or registration residual map output from the registration and alignment process can be used to determine the effective regions of interest (ROIs). For example, during the registration and alignment process, the similarity metric between the original image and the reference image in local regions is calculated to generate a confidence distribution map characterizing the local registration quality. Connected regions with confidence values higher than a preset confidence threshold in the confidence map are identified as high-confidence registration regions, while low-confidence regions caused by occlusion, abrupt changes in illumination, or missing textures are removed. Then, morphological processing is performed on these high-confidence registration regions to eliminate isolated points, and the processed regions are identified as effective ROIs, ensuring that structural difference analysis is performed only within reliably registered regions.
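One way to sketch a registration confidence map is to compute the normalized cross-correlation of the two registered images over non-overlapping blocks and keep blocks above a threshold. The choice of block-wise normalized cross-correlation as the local similarity metric, the block size, and the threshold are assumptions (the application does not fix them), and the morphological cleanup of isolated points is omitted here.

```python
import numpy as np


def block_ncc_confidence(reference: np.ndarray, registered: np.ndarray, block: int = 8) -> np.ndarray:
    """Normalized cross-correlation per non-overlapping block as a registration-confidence map."""
    rows, cols = reference.shape[0] // block, reference.shape[1] // block
    conf = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            win = (slice(i * block, (i + 1) * block), slice(j * block, (j + 1) * block))
            a = reference[win].astype(float).ravel()
            b = registered[win].astype(float).ravel()
            a -= a.mean()
            b -= b.mean()
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            # Textureless blocks carry no alignment evidence: give them zero confidence
            conf[i, j] = a @ b / denom if denom > 1e-12 else 0.0
    return conf


def high_confidence_mask(conf: np.ndarray, threshold: float, block: int = 8) -> np.ndarray:
    """Expand block-level decisions (conf > threshold) back to pixel resolution."""
    keep = conf > threshold
    return np.repeat(np.repeat(keep, block, axis=0), block, axis=1)
```

Assigning zero confidence to zero-variance blocks mirrors the description's removal of low-confidence regions caused by missing texture, since such blocks cannot confirm that the alignment is correct.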
[0059] The third approach: a determination method based on edge alignment quality assessment; In this approach, the effective region of interest (ROI) can be determined by evaluating the alignment quality of the edge structures in the registered images. For example, edge features are first extracted from the registered reference image and the original image, and the difference or distance transform of the edge maps of the two images is calculated to identify regions with severe edge misalignment. Regions with edge alignment errors less than a preset error threshold are marked as effective aligned regions, while regions with misaligned edges due to local deformation or non-rigid distortion are removed. Connectivity analysis and boundary regularization are performed on the effective aligned regions, and connected regions that satisfy preset area and shape constraints are extracted as effective ROIs, ensuring that subsequent structural comparisons are based on reliable geometric correspondences within an effective imaging range.
[0060] It is understandable that the three implementation methods of the above effective region of interest determination schemes are technically independent of each other, and there is no logical order or functional dependency between them. They are all parallel alternatives.
[0061] Step S132: within the effective region of interest, determine the structural difference response between the original image and the reference image and generate a difference map. The structural difference response quantifies the degree of deviation in structural features between the original image and the reference image at corresponding spatial locations within the effective region of interest. It reflects how far the original image departs from the normal product structural baseline in multiple structural attributes such as grayscale intensity, edge gradient, and high-frequency texture. The structural difference response is not used directly as the final basis for defect determination; instead, it presents pixel-level or local structural deviation information as continuous values, providing the basic data for subsequently identifying potential anomalous regions. The difference map is a two-dimensional spatial distribution map generated from the structural difference response: the structural difference response value at each pixel location within the effective region of interest is taken as the pixel value, forming a data matrix that matches the spatial resolution of the original image. The difference map presents the spatial distribution of structural deviations between the original image and the reference image, visually or numerically; regions with high response values indicate locations where the original image deviates significantly from the reference baseline and serve as the spatial localization basis for subsequently extracting candidate anomalous regions.
[0062] The embodiments of this application can generate the difference map using at least one of the following methods. The first method: generation based on weighted fusion of multi-level structural difference indices. Optionally, step S132 may include: calculating the grayscale difference index, the gradient difference index, and the high-frequency structural difference index between the original image and the reference image within the effective region of interest; and performing weighted fusion of the grayscale difference index, the gradient difference index, and the high-frequency structural difference index to generate the difference map.
[0063] The aforementioned grayscale difference index refers to the deviation measure obtained by comparing the grayscale intensity values of corresponding pixel positions in the original image and the reference image point by point within the effective region of interest. The grayscale difference index can characterize the absolute or relative difference between two images in terms of overall brightness distribution and intensity levels, and is used to capture local brightness changes or overall brightness deviations caused by defects.
[0064] The gradient difference index mentioned above refers to the difference obtained by comparing the gradient magnitude or direction at corresponding positions after calculating the spatial gradients of the original image and the reference image (using first-order partial derivatives or gradient operators such as Sobel and Scharr operators). The gradient difference index can characterize the degree of deviation between two images in terms of edge contours, boundary sharpness, and local structure orientation, and can detect edge distortion, contour blurring, or geometric deformation caused by defects.
[0065] The aforementioned high-frequency structural difference index refers to the difference measure obtained by extracting high-frequency spatial components from the original image and the reference image (which can be achieved using high-pass filters, Laplacian operators, or wavelet transform high-frequency subband decomposition) and comparing the corresponding components. The high-frequency structural difference index can characterize the degree of deviation between two images in terms of texture details, microstructure, and fine-grained surface features, and is used to identify texture damage, micro-cracks, or changes in surface roughness caused by defects.
[0066] The aforementioned difference indicators can be calculated in various forms, including but not limited to: pixel-by-pixel absolute difference, pixel-by-pixel squared difference, normalized cross-correlation difference, log ratio difference, and entropy difference based on information theory. These calculation methods can be implemented through point-to-point operations or statistical operations based on local windows, so that subsequent weighted fusion can generate a unified difference map.
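A NumPy sketch of the weighted-fusion method is given below, under several assumptions that are not taken from the application: the gradient index is the absolute difference of Sobel gradient magnitudes, the high-frequency index is the absolute difference of 3x3 Laplacian responses, each index is scaled to [0, 1] by its maximum inside the ROI before fusion, and the default weights are purely illustrative.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]])


def filter3x3(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """3x3 correlation with edge-replicated borders."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def normalize_in_roi(x: np.ndarray, roi: np.ndarray) -> np.ndarray:
    """Scale to [0, 1] by the maximum inside the ROI; zero outside the ROI."""
    peak = x[roi].max() if roi.any() else 0.0
    return np.where(roi, x / peak, 0.0) if peak > 0 else np.zeros_like(x, dtype=float)


def fused_difference_map(original, reference, roi, weights=(0.4, 0.3, 0.3)):
    # Grayscale index: point-wise absolute intensity difference
    gray = np.abs(original.astype(float) - reference.astype(float))
    # Gradient index: difference of Sobel gradient magnitudes
    grad = np.abs(np.hypot(filter3x3(original, SOBEL_X), filter3x3(original, SOBEL_Y))
                  - np.hypot(filter3x3(reference, SOBEL_X), filter3x3(reference, SOBEL_Y)))
    # High-frequency index: difference of Laplacian (high-pass) responses
    high_freq = np.abs(filter3x3(original, LAPLACIAN) - filter3x3(reference, LAPLACIAN))
    w_gray, w_grad, w_hf = weights
    return (w_gray * normalize_in_roi(gray, roi)
            + w_grad * normalize_in_roi(grad, roi)
            + w_hf * normalize_in_roi(high_freq, roi))
```

Normalizing each index before fusion keeps the weights meaningful when the raw indices have very different numeric ranges; without it, the index with the largest range would dominate regardless of its weight.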
[0067] When performing weighted fusion, the weights of the grayscale difference index, the gradient difference index, and the high-frequency structural difference index can be fixed values preset from expert experience. For example, technicians can preset the weight ratio of the three indices according to the distribution characteristics of the various defects and the imaging conditions in the target detection scenario: in scenarios with stable lighting but demanding edge detection requirements, the weight of the gradient difference index can be increased, while in surface inspection scenarios with rich textures, the weight of the high-frequency structural difference index can be increased. In this way, difference map generation can be adapted to the defect feature distribution of a specific production line.
[0068] The aforementioned weights can also be determined using optimization search methods based on calibrated datasets. For example, using a pre-labeled defect sample dataset as a validation set, hyperparameter optimization algorithms such as Bayesian optimization, grid search, or random search can be employed to find the optimal weight combination that achieves the best overall defect recall and precision within a pre-defined weight parameter space. This approach transforms weight determination into a data-driven optimization problem, allowing the generation of difference maps to adapt to the statistical characteristics of defect distribution on the actual production line.
[0069] The weights can also follow an adaptive dynamic adjustment strategy based on image content. For example, the contribution of each difference index in a local region can be calculated dynamically from the local structural characteristics of the current original image and reference image, such as local texture complexity, edge density, or gray-level uniformity. In texture-rich regions, the weight of the high-frequency structural difference index can be increased automatically; in edge-dominated regions, the weight of the gradient difference index can be increased; and in flat regions, the weight of the gray-level difference index can be increased. This gives the difference map generation process an adaptive response to different local structural characteristics.
[0070] By fusing multi-level structural difference indices (gray-level, gradient, and high-frequency structural differences) within the effective region of interest, the above scheme constructs a multi-dimensional difference response mechanism before the model interacts with the data, enabling high-recall detection of different types of structural anomalies. The gray-level difference index captures overall intensity deviations, the gradient difference index captures changes in edge contours, and the high-frequency structural difference index captures anomalies in texture detail; these complementary indices together cover different types of defect features, such as abrupt gray-level changes, edge distortion, and texture destruction. Finally, weighted fusion of the multi-level difference indices into a unified difference map turns the multi-dimensional difference information into a single response map, providing the data foundation for subsequent threshold segmentation and connected component analysis based on statistical distributions.
[0071] The second method: generation based on pixel-level statistical feature differences. In this approach, local statistical feature differences between corresponding pixel locations in the original and reference images are calculated within the effective region of interest, including local mean differences, local variance differences, and local energy differences. The local mean difference reflects deviations in brightness distribution, the local variance difference reflects changes in contrast, and the local energy difference reflects changes in texture activity. A difference vector is constructed from these statistical features, and the degree of deviation between the two images in the feature space is measured using the Mahalanobis distance or the Euclidean distance, generating a difference map based on statistical feature differences.
[0072] The third method: generation based on deep feature extraction. In this approach, a pre-trained convolutional neural network can be used to extract multi-scale deep feature maps of the original and reference images within the effective region of interest. The cosine similarity or Euclidean distance between corresponding feature channels is then calculated to generate a difference map based on high-level semantic feature differences. This approach uses the abstract features learned by the deep network to represent structural information, enabling it to capture high-level structural deviations that are difficult to describe with traditional handcrafted features and giving it stronger representation capability for complex deformations and subtle structural changes.
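The statistical-feature method can be sketched in NumPy as below. Two choices here are assumptions rather than details from the application: "local energy" is taken to be the local mean of the squared gradient magnitude (a texture-activity measure), and the covariance for the Mahalanobis distance is supplied by the caller, for example estimated from feature differences of defect-free image pairs.

```python
import numpy as np


def box_mean(img: np.ndarray, radius: int) -> np.ndarray:
    """Mean over a (2r+1) x (2r+1) window with edge-replicated borders."""
    h, w = img.shape
    k = 2 * radius + 1
    padded = np.pad(img.astype(float), radius, mode="edge")
    acc = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (k * k)


def local_features(img: np.ndarray, radius: int) -> np.ndarray:
    """Per-pixel [local mean, local variance, local gradient energy], shape (h, w, 3)."""
    img = img.astype(float)
    mean = box_mean(img, radius)
    var = np.maximum(box_mean(img ** 2, radius) - mean ** 2, 0.0)
    gy, gx = np.gradient(img)
    energy = box_mean(gx ** 2 + gy ** 2, radius)  # texture activity (assumed definition)
    return np.stack([mean, var, energy], axis=-1)


def statistical_difference_map(original, reference, radius=2, cov=None):
    diff = local_features(original, radius) - local_features(reference, radius)
    if cov is None:
        return np.linalg.norm(diff, axis=-1)               # Euclidean distance
    inv_cov = np.linalg.inv(cov)
    sq = np.einsum("...i,ij,...j->...", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))                     # Mahalanobis distance
```

The Mahalanobis option matters in practice because the three features live on very different scales; with the plain Euclidean distance, whichever feature has the largest numeric range tends to dominate the difference map.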
[0073] It is understandable that the three implementation methods of the above difference mapping generation scheme are technically independent of each other, and there is no logical order or functional dependency between them. They are all parallel alternatives.
[0074] Step S133: Based on the difference mapping, determine the candidate anomaly regions; The aforementioned candidate anomaly regions are a set of spatial regions suspected of containing structural anomalies, extracted from significant response locations in the difference map. They are characterized by their spatial extent and geometric boundaries in the original image using bounding box coordinates or pixel masks. As a high-recall preliminary screening result in the structural anomaly detection process, candidate anomaly regions can characterize potential anomaly candidate locations, which may include false positive responses caused by imaging noise, illumination fluctuations, or slight registration deviations. These candidate anomaly regions provide target objects for subsequent refined structural determination based on structural similarity indices, thus focusing structural analysis on locally suspicious regions and avoiding intensive computation across the entire effective region of interest.
[0075] The embodiments of this application can determine candidate abnormal regions using at least one of the following methods: The first method: a determination method based on adaptive threshold segmentation and connected component analysis; Optionally, step S133 may include: determining an adaptive threshold based on the statistical characteristics of the difference distribution within the effective region of interest; identifying anomalous pixels based on the adaptive threshold; and performing connected component analysis on the anomalous pixels to obtain candidate anomalous regions. For example, this implementation may involve: performing statistical distribution analysis on the difference mapping within the effective region of interest, calculating its grayscale histogram or cumulative distribution function to characterize the probability distribution characteristics of the difference values; and, based on the statistical distribution, using the percentile thresholding method to select a specific quantile as a segmentation threshold, so that difference response pixels exceeding this threshold are identified as anomalous pixels. This implementation ensures that the threshold is adaptively adjusted according to the overall level of the difference distribution.
[0076] The above scheme can also use an adaptive thresholding method based on statistical moments. For example, the mean and standard deviation of the difference map are calculated, and the mean plus a preset multiple of the standard deviation is used as the adaptive threshold. The threshold thus follows changes in the dispersion of the difference distribution, maintaining stable recall of anomalous areas under different imaging conditions or product types.
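Both adaptive thresholding rules (the percentile rule and the mean-plus-k-standard-deviations rule) can be sketched in a few lines of NumPy. Statistics are computed only over difference values inside the effective region of interest; the default percentile of 99 and multiple of 3 are illustrative assumptions, not values from the application.

```python
import numpy as np


def percentile_threshold(diff_map: np.ndarray, roi: np.ndarray, q: float = 99.0) -> float:
    """Threshold at the q-th percentile of the difference values inside the ROI."""
    return float(np.percentile(diff_map[roi], q))


def moment_threshold(diff_map: np.ndarray, roi: np.ndarray, k: float = 3.0) -> float:
    """Threshold = mean + k * standard deviation of the difference values inside the ROI."""
    values = diff_map[roi]
    return float(values.mean() + k * values.std())


def anomalous_pixels(diff_map: np.ndarray, roi: np.ndarray, threshold: float) -> np.ndarray:
    # Only pixels inside the ROI can be flagged as anomalous
    return roi & (diff_map > threshold)
```

The percentile rule flags a roughly fixed fraction of ROI pixels, which keeps recall high but always produces some candidates; the moment rule instead scales with the spread of the distribution, so a clean image with a tight distribution yields few or no anomalous pixels.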
[0077] The aforementioned connected component analysis is a binary image region labeling algorithm based on pixel spatial adjacency relationships. Its processing object is the set of anomalous pixels extracted after threshold segmentation. The connected component analysis algorithm can traverse and search for foreground pixels in a binary image according to preset connectivity criteria (e.g., four-connected or eight-connected neighborhood relationships), identifying spatially adjacent anomalous pixels as a set of pixels belonging to the same connected region, and assigning a unique identifier to each independent connected region. Through connected component analysis, discrete and isolated anomalous pixels can be clustered and integrated into spatially continuous geometric regions. Simultaneously, geometric attribute parameters of each connected region (such as region area, bounding rectangle, centroid coordinates, and region boundary contour) can be extracted, thereby transforming pixel-level difference responses into region-level candidate anomalous objects. This provides processing units with clear spatial boundaries for subsequent calculation of the overall region-based structural similarity index and structural consistency verification.
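A breadth-first connected component labeling routine that also extracts area, bounding box, and centroid can be sketched in Python with NumPy as follows. The function name, the dictionary layout of the region attributes, and the (top, left, bottom, right) bounding-box convention are hypothetical choices for this example.

```python
from collections import deque

import numpy as np


def connected_components(mask: np.ndarray, connectivity: int = 8):
    """Label connected foreground pixels and collect per-region geometric attributes."""
    if connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    elif connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or labels[y, x]:
                continue
            label = len(regions) + 1
            labels[y, x] = label
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy, dx in offsets:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = label
                        queue.append((ny, nx))
            ys, xs = np.array(pixels).T
            regions.append({
                "label": label,
                "area": len(pixels),
                # (top, left, bottom, right), inclusive pixel coordinates
                "bbox": (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())),
                "centroid": (float(ys.mean()), float(xs.mean())),
            })
    return labels, regions
```

The connectivity choice changes the result: two diagonally touching anomalous pixels form one region under eight-connectivity but two separate regions under four-connectivity.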
[0078] By determining an adaptive threshold from the statistical characteristics of the difference distribution within the effective region of interest, performing dynamic threshold segmentation on the difference map to extract anomalous pixels, and merging continuous anomalous pixels into candidate anomalous regions through connected component analysis, the above scheme establishes a complete recall mechanism from pixel-level difference responses to region-level anomaly candidates, providing candidate objects for subsequent structural consistency verification. In addition, because the threshold is determined adaptively from the statistics of the difference distribution, anomalous pixel extraction adapts to changes in that distribution under different imaging conditions and product types. Furthermore, merging anomalous pixels into candidate anomalous regions and extracting their location information turns discrete pixel-level responses into spatially continuous region-level candidates, laying the foundation for calculating the structural similarity index over each whole region.
[0079] The second method is a determination method based on morphological reconstruction and local maximum detection. In this approach, morphological opening or Gaussian filtering is first applied to the difference map to suppress noise and eliminate isolated high-response points. Significant peak points in the difference map are then identified, through local maximum search or the H-maxima transform, as seed points for region growing. Starting from these seed points, morphological reconstruction or conditional dilation aggregates the pixels whose gray values exceed a preset tolerance and that are connected to the seed points into candidate anomalous regions.
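A minimal sketch of this second method follows, assuming a NumPy difference map. To stay self-contained it substitutes a 3x3 mean filter for Gaussian filtering and a plain local-maximum search with a minimum peak height for the H-maxima transform; `min_peak` and `tolerance` are hypothetical parameters. The reconstruction step is binary morphological reconstruction by repeated conditional dilation: the seed set is dilated and intersected with the above-tolerance mask until it stops changing.

```python
import numpy as np


def _neighbourhood(padded: np.ndarray, h: int, w: int):
    """Yield the nine shifted views that make up each pixel's 3x3 neighbourhood."""
    for dy in range(3):
        for dx in range(3):
            yield padded[dy:dy + h, dx:dx + w]


def mean_filter3(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter (simple stand-in for Gaussian noise suppression)."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    return sum(_neighbourhood(padded, h, w)) / 9.0


def local_maxima(img: np.ndarray, min_peak: float) -> np.ndarray:
    """Pixels equal to their 3x3 maximum and at least min_peak high (seed points)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    neighbourhood_max = np.max(np.stack(list(_neighbourhood(padded, h, w))), axis=0)
    return (img >= neighbourhood_max) & (img >= min_peak)


def dilate3(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3x3 square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return np.any(np.stack(list(_neighbourhood(padded, h, w))), axis=0)


def reconstruct(seeds: np.ndarray, allowed: np.ndarray) -> np.ndarray:
    """Morphological reconstruction by repeated conditional dilation."""
    region = seeds & allowed
    while True:
        grown = dilate3(region) & allowed
        if np.array_equal(grown, region):
            return region
        region = grown


def candidate_regions(diff: np.ndarray, min_peak: float, tolerance: float) -> np.ndarray:
    """Smooth, pick seed peaks, then grow them within the above-tolerance mask."""
    smoothed = mean_filter3(diff)
    seeds = local_maxima(smoothed, min_peak)
    return reconstruct(seeds, smoothed > tolerance)
```

An isolated single-pixel spike is flattened by the smoothing step and never becomes a seed, while a compact blob keeps a high peak and is grown back to its full extent.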
[0080] The third method is a determination method based on superpixel segmentation and region aggregation. In this approach, a superpixel segmentation algorithm (such as SLIC, Simple Linear Iterative Clustering) can be used to divide the effective region of interest into sub-regions (superpixels) with consistent local features. The mean, maximum, or a specific quantile of the difference map within each superpixel is then calculated as that superpixel's abnormal response metric. Superpixels whose response metric exceeds a preset threshold are marked as abnormal, and spatially adjacent abnormal superpixels are aggregated by connectivity into candidate abnormal regions whose boundaries are irregular but whose features are consistent.
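The third method can be sketched as follows. Because a full SLIC implementation is beyond a short example, the hypothetical helper `grid_superpixels` stands in for it by cutting the region into square blocks; any integer label map (for example one produced by a SLIC implementation) can be passed instead. Each superpixel is scored with a reducer (mean by default), superpixels above the threshold are marked abnormal, and 4-adjacent abnormal superpixels are merged with a union-find structure.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np


def grid_superpixels(shape: Tuple[int, int], block: int) -> np.ndarray:
    """Hypothetical stand-in for SLIC: label square blocks 0..K-1."""
    h, w = shape
    n_cols = -(-w // block)  # ceiling division
    rows = np.arange(h)[:, None] // block
    cols = np.arange(w)[None, :] // block
    return rows * n_cols + cols


def aggregate_abnormal_superpixels(
    diff: np.ndarray,
    labels: np.ndarray,
    threshold: float,
    reducer: Callable[[np.ndarray], float] = np.mean,
) -> List[List[int]]:
    """Return groups of spatially adjacent abnormal superpixel labels."""
    n = int(labels.max()) + 1
    scores = np.full(n, -np.inf)
    for k in range(n):
        values = diff[labels == k]
        if values.size:
            scores[k] = float(reducer(values))  # abnormal response metric
    abnormal = scores > threshold

    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Label pairs that touch horizontally or vertically define superpixel adjacency.
    pairs = np.concatenate([
        np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1),
        np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1),
    ])
    for a, b in np.unique(pairs, axis=0):
        a, b = int(a), int(b)
        if a != b and abnormal[a] and abnormal[b]:
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for k in np.flatnonzero(abnormal):
        groups.setdefault(find(int(k)), []).append(int(k))
    return list(groups.values())
```

To use a quantile instead of the mean, pass for example `reducer=lambda v: np.percentile(v, 90)`.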
[0081] It is understandable that the three implementation methods of the above candidate abnormal region determination scheme are technically independent of each other, and there is no logical order or functional dependency between them. They are all parallel alternatives.
[0082] Step S134: Calculate the structural similarity index between the candidate abnormal region and the corresponding region of the reference image, and compare the structural similarity index with the preset structural consistency threshold to determine whether there is a structural abnormality in the original image relative to the reference image, and obtain the structural consistency determination result.
[0083] The aforementioned structural similarity index is a comprehensive metric used to quantitatively evaluate the structural consistency between two images as a whole, or between local regions within two images. Its numerical range is typically normalized to [0,1]. A value closer to 1 indicates higher consistency in structural information between the two images, while a value closer to 0 indicates significant structural deviation. The structural similarity index elevates traditional pixel-level grayscale differences to a structural-level similarity determination, effectively distinguishing between structural anomalies caused by actual product structural damage or deformation and spurious differences caused by illumination fluctuations, imaging noise, or slight registration errors. This provides a quantitative basis for subsequent threshold-based structural consistency classification decisions.
[0084] In the embodiments of this application, the structural similarity index can be calculated in at least one of the following ways. The first method calculates the structural similarity index by decomposing structural information into multiple dimensions. Optionally, step S134 may include: calculating, respectively, the brightness consistency index, contrast consistency index, and structural consistency index between the candidate anomaly region and the corresponding region of the reference image; and fusing these three indices to obtain the structural similarity index between the candidate anomaly region and the corresponding region of the reference image.
[0085] The aforementioned brightness consistency index is a metric used to quantitatively evaluate the similarity in overall grayscale intensity between two images or local regions of an image. The brightness consistency index can be based on the arithmetic or weighted average of pixel grayscale values within a region, reflecting the consistency of illumination intensity, exposure conditions, or overall brightness distribution by comparing the mean difference between a candidate region in the original image and the corresponding region in the reference image. The brightness consistency index is highly sensitive to linear illumination changes; when two regions exhibit only an overall brightness shift but maintain consistent texture structure, this index can effectively capture and quantify such deviations.
[0086] The contrast consistency index mentioned above characterizes the similarity of two images, or of local regions of two images, in terms of grayscale dynamic range and local contrast. It is typically calculated from the standard deviation or variance of pixel grayscale values within the region, which reflects the texture sharpness, edge sharpness, and richness of detail of a local image region. The contrast consistency index is independent of the overall brightness level and focuses on whether local grayscale fluctuations are consistent between the two regions. When a defect raises or lowers local contrast, this index sensitively reflects the loss of structural fidelity in the contrast dimension.
[0087] The aforementioned structural consistency index is a metric used to measure the correlation between two images or local regions of images in terms of normalized structural patterns and relative pixel positions. Its calculation is typically based on the covariance or normalized cross-correlation coefficient of pixel grayscale values within the region, reflecting the structural morphological similarity between the two regions after removing the effects of brightness and contrast. This index is highly sensitive to local geometric deformation, structural damage, or changes in texture patterns, and is a key indicator for distinguishing between real structural anomalies and illumination noise. It can maintain a stable assessment of structural fidelity under conditions of varying brightness and contrast.
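The three indices above correspond closely to the luminance, contrast, and structure terms of the widely used SSIM formulation. A minimal sketch, assuming both regions are equal-sized grayscale NumPy arrays and computing each term once over the whole region rather than in a sliding window. The stabilizing constants use the common defaults K1 = 0.01, K2 = 0.03 and C3 = C2 / 2; these are assumptions here, not values specified by this application.

```python
from typing import Tuple

import numpy as np


def consistency_indices(x: np.ndarray, y: np.ndarray, data_range: float = 255.0,
                        k1: float = 0.01, k2: float = 0.03) -> Tuple[float, float, float]:
    """Brightness, contrast and structure consistency of two equal-sized regions."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = float(((x - mx) * (y - my)).mean())
    brightness = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)  # compares means
    contrast = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)     # compares standard deviations
    structure = (sxy + c3) / (sx * sy + c3)                      # normalized covariance
    return float(brightness), float(contrast), float(structure)
```

A pure brightness offset between the two regions lowers only the first index, while the contrast and structure indices stay at 1, which is how the decomposition separates illumination changes from structural damage.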
[0088] It is understandable that, in addition to the aforementioned brightness consistency index, contrast consistency index, and structural consistency index, consistency indices can be used in various alternative forms to adapt to the judgment requirements of different application scenarios. For example, the normalized cross-correlation coefficient can be used to measure the degree of linear matching between two regions in grayscale waveform patterns; the mutual information index based on information theory can be used to evaluate the information overlap between two regions in statistical dependency, which is suitable for nonlinear grayscale transformation scenarios; the phase correlation index based on the frequency domain can be used to evaluate the positional offset consistency of structures; or the cosine similarity based on deep convolutional features can be used to measure the directional consistency between two regions in the high-level semantic feature space to capture more abstract structural similarity.
[0089] The brightness consistency index, contrast consistency index, and structural consistency index can be fused by multiplicative fusion, weighted summation, or nonlinear fusion. Multiplicative fusion multiplies the three indices directly to generate the final structural similarity index. Weighted summation assigns a weight coefficient to each index according to how much the application scenario emphasizes consistency in that dimension, and generates a comprehensive index by linear weighted summation. Nonlinear fusion either transforms the indices with exponential, logarithmic, or power functions before fusing them, or uses a machine learning model (such as a support vector machine or a neural network) to learn the nonlinear mapping from the multidimensional indices to a comprehensive similarity score, accommodating the interactions between indices and the dynamic weight adjustment that complex scenarios require.
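The three fusion modes can be sketched as below, assuming the indices have already been computed. The weighted geometric mean stands in as one example of applying a logarithmic transform before fusion; the exponents, the weights, and the clipping of negative values to zero (which keeps fractional powers and logarithms real) are illustrative assumptions.

```python
import math
from typing import Sequence


def multiplicative_fusion(brightness: float, contrast: float, structure: float,
                          alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Product of the three indices, each raised to an importance exponent."""
    # Clip at 0 so fractional exponents of a negative structure term stay real.
    b, c, s = (max(v, 0.0) for v in (brightness, contrast, structure))
    return (b ** alpha) * (c ** beta) * (s ** gamma)


def weighted_sum_fusion(indices: Sequence[float], weights: Sequence[float]) -> float:
    """Linear weighted average; weights express per-dimension emphasis."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, indices)) / total


def log_domain_fusion(indices: Sequence[float], weights: Sequence[float],
                      eps: float = 1e-12) -> float:
    """Nonlinear example: weighted geometric mean computed via a log transform."""
    total = sum(weights)
    log_sum = sum(w * math.log(max(v, eps)) for w, v in zip(weights, indices))
    return math.exp(log_sum / total)
```

Multiplicative and log-domain fusion let any single low index pull the result down sharply, whereas the weighted sum lets a strong dimension partly compensate for a weak one.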
[0090] By calculating the brightness consistency index, contrast consistency index, and structural consistency index separately and then fusing them, the above scheme establishes a multi-dimensional structural similarity evaluation mechanism at the level of candidate anomaly regions. Pixel-level differences are elevated to a comprehensive consistency index covering brightness, contrast, and structural information, improving the accuracy and discriminability of structural anomaly detection. In addition, the brightness consistency index evaluates whether the brightness distributions of the candidate anomaly region and the corresponding reference region agree, which suppresses interference from light intensity fluctuations or exposure differences and avoids misjudging illumination changes as structural defects. Furthermore, the contrast consistency index and structural consistency index evaluate how well local contrast is retained and how strongly the structural information is correlated, which distinguishes contrast fluctuations caused by imaging noise from real structural deformation or damage, so that real defect regions with structural damage characteristics are identified accurately.
[0091] The second method calculates the structural similarity index based on local feature point matching. In this approach, local feature points and their descriptors can be extracted from the candidate anomaly region and the corresponding region of the reference image using algorithms such as the scale-invariant feature transform (SIFT) or speeded-up robust features (SURF). Feature correspondences between the two regions are established through nearest-neighbor distance ratio matching or brute-force matching. The structural similarity index is then calculated from the number of matched feature points, the consistency of their spatial distribution, or their geometric transformation consistency (such as the inlier rate of a homography or affine transformation estimated by random sample consensus, RANSAC). A higher structural similarity index indicates a stronger correspondence between the key structural features of the two regions and therefore higher structural consistency.
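The matching step of this method can be sketched as follows. The sketch assumes descriptors have already been extracted (for example by a SIFT implementation) into NumPy arrays with one row per keypoint; it applies the nearest-neighbor distance ratio test and uses the fraction of matched candidate keypoints as the similarity score. The ratio of 0.75 is a common choice assumed here, and RANSAC-based geometric verification is omitted for brevity.

```python
from typing import List, Tuple

import numpy as np


def ratio_test_matches(desc_a: np.ndarray, desc_b: np.ndarray,
                       ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Match each descriptor in desc_a to desc_b using the distance ratio test."""
    if len(desc_a) == 0 or len(desc_b) < 2:
        return []
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    matches = []
    for i in range(len(desc_a)):
        best, second = order[i, 0], order[i, 1]
        # Keep the match only if the best candidate is clearly closer than the runner-up.
        if dists[i, best] < ratio * dists[i, second]:
            matches.append((i, int(best)))
    return matches


def match_similarity(desc_a: np.ndarray, desc_b: np.ndarray, ratio: float = 0.75) -> float:
    """Fraction of candidate-region keypoints with a reliable match in the reference."""
    if len(desc_a) == 0:
        return 0.0
    return len(ratio_test_matches(desc_a, desc_b, ratio)) / len(desc_a)
```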
[0092] The third method calculates the structural similarity index based on deep convolutional features. In this approach, a pre-trained convolutional neural network (such as a VGG network or a residual network) can be used to extract multi-scale deep feature maps of the candidate anomaly region and the corresponding region of the reference image. Calculating the cosine similarity or Euclidean distance between corresponding feature maps yields pixel-wise feature similarity maps, which are then aggregated by global average pooling or weighted aggregation into a structural similarity index representing the consistency of the two regions in the high-level semantic feature space. This approach uses the abstract structural representations learned by deep networks to capture subtle structural deviations and complex texture changes that traditional hand-crafted metrics struggle to describe.
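The aggregation step of this method can be sketched as below, assuming the feature maps have already been extracted from the same layers of a pre-trained network and are passed in as NumPy arrays of shape (channels, height, width). At each spatial position the cosine similarity between the two channel vectors forms the pixel-wise similarity map; global average pooling reduces each map to a score, and per-layer scores are averaged with optional weights. The network itself and the layer weights are outside the scope of this sketch.

```python
from typing import Optional, Sequence

import numpy as np


def feature_similarity_map(fa: np.ndarray, fb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cosine similarity of the channel vectors at each spatial position.

    fa, fb: feature maps of shape (C, H, W) from the same network layer.
    """
    dot = np.sum(fa * fb, axis=0)
    norms = np.linalg.norm(fa, axis=0) * np.linalg.norm(fb, axis=0)
    return dot / (norms + eps)


def deep_structural_similarity(maps_a: Sequence[np.ndarray], maps_b: Sequence[np.ndarray],
                               weights: Optional[Sequence[float]] = None) -> float:
    """Global-average-pool each layer's similarity map, then average across layers."""
    scores = [float(feature_similarity_map(a, b).mean()) for a, b in zip(maps_a, maps_b)]
    if weights is None:
        weights = [1.0] * len(scores)
    return float(np.dot(scores, weights) / np.sum(weights))
```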
[0093] The fourth method calculates the structural similarity index based on the normalized cross-correlation or the Pearson correlation coefficient. In this approach, the gray values of the candidate anomaly region and of the corresponding region of the reference image are treated as two sequences of random variables, and their normalized cross-correlation or linear correlation coefficient is calculated to quantify how linearly correlated the two regions are in gray-level trends and waveform patterns. This coefficient lies in the range [-1, 1]; the closer its absolute value is to 1, the higher the structural consistency.
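A minimal sketch of the normalized cross-correlation described above, assuming two equal-sized grayscale regions as NumPy arrays. Returning 0.0 for a perfectly flat region, where the coefficient is undefined, is an assumption of this sketch; the absolute value of the result can serve as the consistency score.

```python
import numpy as np


def normalized_cross_correlation(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """Zero-mean normalized cross-correlation (Pearson coefficient) of two regions."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a0 = a - a.mean()
    b0 = b - b.mean()
    denom = np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2))
    if denom < eps:
        return 0.0  # a flat region has no gray-level trend to correlate
    return float(np.sum(a0 * b0) / denom)
```

Because the means are subtracted and the result is normalized by both energies, any linear gray-level change `b = g * a + o` with `g > 0` still yields 1.0, which makes the measure insensitive to gain and offset differences in exposure.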
[0094] It is understandable that the four implementation methods of the above structural similarity index calculation scheme are technically independent of each other, and there is no logical order or functional dependency between them. They are all parallel alternatives.
[0095] Steps S131 to S134 above form a progressive technical process running from registration and alignment, through effective region definition and difference map generation, to structural similarity determination. Before the model ever sees the data, this process applies a complete data governance mechanism based on the structural constraints of the reference image, giving downstream models a structurally validated input data foundation. In addition, because the intersection of effective imaging regions is determined from the registration and alignment results, structural consistency determination is limited to the visible area with a reliable alignment relationship, which suppresses interference from edge registration errors and invalid pixel areas. Furthermore, the two-stage approach of first determining candidate abnormal regions from the difference map and then calculating the structural similarity index achieves a hierarchical progression from pixel-level difference recall to region-level structural verification, distinguishing real structural anomalies from pseudo-differences caused by illumination and noise. Finally, calculating the structural similarity index between each candidate abnormal region and the corresponding region of the reference image and comparing it with a threshold turns pixel-level grayscale differences into a quantitative indicator of structural consistency, improving sensitivity to structural deformation defects.
[0096] It is understood that, in addition to the implementation methods described above, the embodiments of this application may also employ any of the following implementation methods for structural consistency determination. The first implementation method is based on statistical hypothesis testing. In this embodiment, the grayscale distribution or feature distribution of the candidate anomaly region and that of the corresponding region of the reference image can be regarded as two statistical samples, and a statistical method such as the chi-square test, the Kolmogorov-Smirnov test, or the two-sample t-test is used to test whether the two regions come from the same distribution. If the test statistic exceeds the critical value corresponding to the preset significance level, the two regions are determined to differ significantly in distribution, indicating a structural anomaly; otherwise, the structures are determined to be consistent.
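Taking the Kolmogorov-Smirnov option as an example, the test can be sketched in NumPy as follows, assuming the grayscale values of both regions are treated as independent samples. The critical value uses the standard large-sample approximation c(α)·sqrt((n + m)/(n·m)) with c(α) = sqrt(-ln(α/2)/2); for very small regions an exact table or a statistics library would be more appropriate.

```python
import numpy as np


def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float).ravel())
    b = np.sort(np.asarray(b, dtype=float).ravel())
    grid = np.concatenate([a, b])  # the gap is maximal at one of the sample points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_critical_value(n: int, m: int, alpha: float = 0.05) -> float:
    """Large-sample critical value of the two-sample KS statistic."""
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))
    return float(c_alpha * np.sqrt((n + m) / (n * m)))


def ks_structural_anomaly(region: np.ndarray, ref_region: np.ndarray,
                          alpha: float = 0.05) -> bool:
    """Reject 'same distribution' when the statistic exceeds the critical value."""
    d = ks_statistic(region, ref_region)
    return d > ks_critical_value(np.size(region), np.size(ref_region), alpha)
```

Neighbouring pixels are correlated in practice, so the independence assumption behind the critical value is only approximate; the significance level may need adjusting empirically.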
[0097] The second implementation method is based on a feature-space distance metric. In this embodiment, low-level or high-level feature vectors (such as local binary pattern features, histogram of oriented gradients features, or deep features extracted by a pre-trained neural network) can be extracted from the candidate abnormal region and the corresponding region of the reference image, and the distance between the two feature vectors in the feature space, such as the Euclidean, Mahalanobis, or cosine distance, is calculated. If the distance is greater than a preset distance threshold, the two regions are determined to be far apart in the feature space and a structural anomaly is present; otherwise, the structures are determined to be consistent.
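As an illustration of this embodiment, the sketch below uses a simplified, single-cell histogram of gradient orientations (a HOG-like descriptor without cells or block normalization) as the feature vector and the cosine distance as the metric. The descriptor simplification, the 8 orientation bins, the 0.2 distance threshold, and treating two flat regions as identical are assumptions for demonstration.

```python
import numpy as np


def orientation_histogram(region: np.ndarray, bins: int = 8) -> np.ndarray:
    """Single-cell, HOG-like descriptor: gradient-magnitude-weighted orientation histogram."""
    gy, gx = np.gradient(np.asarray(region, dtype=float))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation, folded into [0, pi]
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist


def cosine_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-12) -> float:
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0 and nv == 0:
        return 0.0  # two flat regions are treated as identical
    return 1.0 - float(np.dot(u, v) / (nu * nv + eps))


def is_feature_distance_anomaly(region: np.ndarray, ref_region: np.ndarray,
                                threshold: float = 0.2) -> bool:
    """Structural anomaly if the two descriptors are far apart in feature space."""
    d = cosine_distance(orientation_histogram(region), orientation_histogram(ref_region))
    return d > threshold
```

Swapping in a Mahalanobis distance would additionally require a covariance matrix estimated from normal samples, which is why the simpler cosine distance is used here.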
[0098] The third implementation method is based on a machine learning classifier. In this embodiment, a binary classification model (such as a support vector machine, a random forest, or gradient-boosted trees) can be trained on labeled normal and abnormal samples. The feature difference vector, or the joint feature vector, of the candidate abnormal region and the corresponding region of the reference image is input into the classifier, which outputs the probability or decision score that the region belongs to the normal or the abnormal category. If the probability of the abnormal category exceeds a preset threshold, a structural abnormality is determined to exist; otherwise, the structure is determined to be consistent.
[0099] The fourth implementation method is based on deep learning reconstruction error. In this embodiment, an autoencoder or generative adversarial network trained only on normal samples can take the corresponding region of the reference image as input or condition and reconstruct or generate an expected image of the normal structure. The pixel-level reconstruction error or perceptual loss between this reconstruction and the candidate abnormal region is then calculated. If the error exceeds a preset error threshold, the candidate region cannot be accurately reproduced by the normal structure model, and a structural abnormality is determined; otherwise, the structure is determined to be consistent.
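To keep the example short and dependency-free, the sketch below swaps the autoencoder or GAN for principal component analysis, which behaves like a linear autoencoder: it is fitted only on normal patches, and a patch whose structure lies outside the learned normal subspace reconstructs poorly. The class name, the number of components, and how the error threshold is chosen are assumptions; the conditioning on the reference region described above is not modeled here.

```python
import numpy as np


class LinearReconstructionModel:
    """PCA fitted on normal patches only; acts as a linear autoencoder."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, normal_patches: np.ndarray) -> "LinearReconstructionModel":
        x = np.asarray(normal_patches, dtype=float).reshape(len(normal_patches), -1)
        self.mean_ = x.mean(axis=0)
        # Rows of vt are principal directions, ordered by explained variance.
        _, _, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def reconstruction_error(self, patch: np.ndarray) -> float:
        """Mean squared error between a patch and its projection onto the normal subspace."""
        centered = np.asarray(patch, dtype=float).ravel() - self.mean_
        recon = self.components_.T @ (self.components_ @ centered)
        return float(np.mean((centered - recon) ** 2))

    def is_anomalous(self, patch: np.ndarray, threshold: float) -> bool:
        return self.reconstruction_error(patch) > threshold
```

A deep autoencoder replaces the linear projection with a learned nonlinear encoder and decoder, but the decision rule (error above a preset threshold means a structural abnormality) stays the same.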
[0100] It is understood that the four implementation methods for determining structural consistency described above are technically independent of each other, and there is no logical order or functional dependency between them. They are all parallel alternatives.
[0101] Step S140: Divert the original image based on the structural consistency determination result, wherein the known defect data obtained from the diversion is used to train the downstream defect detection model.
[0102] It is understandable that known defect data that has passed structural consistency assessment is a high-quality training sample that has undergone pre-screening. It can provide reliable supervision signals and accurate feature learning objects for downstream defect detection models, enabling the model to establish an accurate mapping relationship from image features to specific defect categories during training. Using known defect data for model training allows the defect detection model to learn the feature distribution patterns and discrimination boundaries of existing defect types. Through iterative optimization with a large number of identically distributed samples, it forms a high-sensitivity recognition capability for known defect patterns, improving the model's detection accuracy and recall rate for common defect types in real-world production environments.
[0103] It is understandable that existing technical solutions typically build defect recognition capability with deep learning models trained on labeled data covering a limited set of known defect types. Because the distribution and coverage of the training data are limited to the defect categories that have already been labeled, when a new defect pattern not covered by the training samples appears during production, the model struggles to map its image features to a defect category. The model can neither reliably identify and retain such samples as a newly emerging defect type nor effectively distinguish them from imaging noise or normal texture fluctuations, and often misclassifies them as normal samples or discards them outright. As a result, new defect samples are never added to the training dataset, which prevents the model from continuously learning and adapting to the real abnormal patterns produced by process changes on the production line. Therefore, this application provides the following solution. Optionally, step S140 may include: if the structural consistency determination result indicates that the original image is consistent with the reference image, diverting the original image into interference data; if the structural consistency determination result indicates that the original image has a structural anomaly, diverting the original image into defect data; extracting the defect features of the defect data and matching them against existing defect features; and, if the matching result indicates that the defect features match existing defect features, diverting the defect data into known defect data, and otherwise diverting it into unknown defect data.
[0104] The aforementioned interference data refers to original image data that, after structural consistency assessment, is deemed consistent with the reference image in brightness, contrast, and structural information. It represents the imaging state of a normal, defect-free product, or contains only slight noise fluctuations that do not damage the product's structural integrity. Interference data is removed directly during diversion and does not enter the training flow of the downstream defect detection model. This prevents the model from spending computational resources on samples that contain no defect information, and prevents a large number of normal samples from diluting the distribution of defect features in the training data.
[0105] The aforementioned known defect data refers to image data that, after structural consistency assessment, is identified as having structural anomalies and whose defect features match the defect patterns recorded in the existing defect feature library. This data represents the existing defect types covered by the training data distribution. This type of data has clear defect type identifiers and precise location annotations. As cleaned, high-quality training samples, it is directly used for supervised learning of downstream defect detection models, enabling the models to establish accurate mappings from image features to specific known defect categories, thus improving their ability to identify common defect patterns.
[0106] The aforementioned unknown defect data refers to image data that, after structural consistency assessment, is identified as having structural anomalies but whose defect features do not match existing defect features. This represents novel defects not covered by the training data distribution or abnormal patterns caused by process changes. Separating and retaining this type of data allows for the establishment of an automatic discovery and accumulation mechanism for out-of-distribution defects without interfering with downstream models' learning of known defects. This provides a source of novel defect samples for subsequent model iteration and optimization, while avoiding contamination of the model's decision boundary by mixing out-of-distribution samples into the training data, thus ensuring the training stability and generalization ability of the known defect detection model.
[0107] The matching process between the aforementioned defect features and existing defect features is achieved by quantitatively analyzing the similarity between the attribute features of the candidate defect region and the features recorded in the existing defect feature library. For example, the implementation method is as follows: extract multi-dimensional defect features from the defect data identified by structural consistency determination. These features may include geometric morphological features (such as region area, aspect ratio, and contour complexity), gray-scale statistical features (such as local mean, variance, and energy), and texture structure features. Then, the extracted feature vectors are compared with the feature templates or cluster centers of each known defect type in the existing defect feature library. This feature library is constructed based on historically accumulated labeled defect samples, characterizes the distribution characteristics of existing defect types, and associates them with corresponding defect codes. By calculating distance metrics (such as Euclidean distance and Mahalanobis distance) or similarity metrics (such as cosine similarity) in the feature space, the degree of matching between the current defect feature and a specific known defect type is evaluated. If the matching degree meets the preset threshold condition, the defect feature is determined to match the existing defect feature, and the corresponding defect data is classified as known defect data and assigned the corresponding defect code. Conversely, if the matching degree with all existing defect types is lower than the threshold, it is determined to be unknown defect data and classified into the out-of-sample defect category for open set analysis.
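The diversion logic of step S140, together with the feature-library matching described above, can be sketched as follows. The sketch assumes the structural consistency result is already available as a boolean, that defect features have been standardized to comparable scales, and that the existing defect feature library is a mapping from defect code to a template vector (cluster center). The codes, feature values, and the Euclidean distance threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

import numpy as np


@dataclass
class DiversionResult:
    category: str                      # "interference", "known_defect" or "unknown_defect"
    defect_code: Optional[str] = None  # set only for known defects


def divert_sample(is_structural_anomaly: bool,
                  defect_features: Sequence[float],
                  library: Dict[str, Sequence[float]],
                  max_distance: float) -> DiversionResult:
    """Three-way diversion: interference, known defect, or unknown defect."""
    if not is_structural_anomaly:
        return DiversionResult("interference")  # consistent with the reference image
    features = np.asarray(defect_features, dtype=float)
    best_code, best_dist = None, float("inf")
    for code, template in library.items():
        dist = float(np.linalg.norm(features - np.asarray(template, dtype=float)))
        if dist < best_dist:
            best_code, best_dist = code, dist
    if best_code is not None and best_dist <= max_distance:
        return DiversionResult("known_defect", best_code)
    return DiversionResult("unknown_defect")  # kept for open-set analysis
```

Requiring the nearest template to lie within `max_distance`, rather than always assigning the nearest code, is what allows out-of-distribution defects to fall through to the unknown category.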
[0108] By hierarchically analyzing the structural consistency determination results and separating the original images into interference data, known defect data, and unknown defect data at the point of data entry, the above scheme achieves a three-level, fine-grained data governance architecture. In addition, by combining structural anomaly determination with matching against existing defect features, it distinguishes defect types already covered by the training data from novel out-of-distribution defects, providing cleaned known defect samples for model training while retaining unknown defect samples for open-set analysis. Furthermore, by separating and retaining unknown defect data, it establishes a mechanism for automatically discovering newly emerging defect types without prior manual annotation, improving the adaptability of the defect detection system to changes in production line processes.
[0109] It can be understood that, to ensure that novel defect samples emerging during production are included in the training dataset rather than prematurely rejected by overly strict judgment thresholds, the embodiments of this application can employ a recall-oriented hyperparameter configuration strategy. By lowering the anomaly judgment threshold or increasing sensitivity to structural differences, the structural consistency determination stage lets more boundary samples and weak anomalies into the candidate set, so that the early forms of novel defects are preserved and included in subsequent analysis, providing a sample basis for continuous model learning.
[0110] Under a strategy that prioritizes recall, the grayscale difference threshold can be lowered, making the judgment process more sensitive to subtle changes in grayscale levels. This allows for the capture of early forms of novel defects with indistinct grayscale features, ensuring that such defect samples are not missed and enter the subsequent data processing flow. Similarly, the gradient difference threshold can be lowered to enhance the perception of subtle structural changes. By relaxing the requirements for the magnitude of edge gradient changes, it becomes possible to capture early structural defects such as slight linewidth contraction, expansion, or minute cracks, ensuring that such structural defects are included in the candidate set at an early stage.
[0111] Furthermore, by tightening the structural consistency criterion, that is, narrowing the tolerance on the structural similarity index within which a region is still judged normal, slight local structural mismatches between the original image and the reference image can be identified as anomalies. This improves sensitivity to local structural deformation and novel deformation-related defects, ensuring that such defects are detected and included in the sample library. The minimum connected region area threshold can also be lowered so that early forms of novel defects with small footprints, such as tiny particles, initial point defects, or crack initiation sites, are preserved in the candidate set. Relaxing the minimum pixel area required of an anomaly region ensures that minute defects are captured at their initial stage, providing the model with complete early defect samples.
[0112] Optionally, the above defect detection method may further include: labeling defect data based on structural consistency determination results, and generating pseudo-labels containing defect location and type identifiers.
[0113] The above scheme can automatically annotate defect data based on the structural consistency determination results, generating pseudo-labels containing the spatial location and type identifier of defects. This annotation process reuses the spatial coordinates of candidate abnormal regions extracted in the previous structural consistency determination step as defect location information, and infers the defect type attribute based on the deviation of the structural similarity index or the matching relationship with the existing defect feature library, thereby forming structured pseudo-label data containing bounding box coordinates and category codes.
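One possible pseudo-label record is sketched below, assuming each candidate region carries the bounding rectangle from connected component analysis, its structural similarity index, and, when matched, a code from the existing defect feature library. The field names, the `UNKNOWN` placeholder for unmatched defects, and the `needs_review` flag supporting the manual review loop are illustrative assumptions rather than a format defined by this application.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Mapping, Tuple


@dataclass
class PseudoLabel:
    image_id: str
    bbox: Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max)
    category: str                    # matched defect code or "UNKNOWN"
    similarity: float                # structural similarity index of the region
    needs_review: bool = True        # pending manual verification


def make_pseudo_labels(image_id: str, regions: Iterable[Mapping],
                       unknown_code: str = "UNKNOWN") -> List[PseudoLabel]:
    """Reuse region geometry and matching results as pre-annotations."""
    return [
        PseudoLabel(
            image_id=image_id,
            bbox=tuple(int(v) for v in r["bbox"]),
            category=r.get("defect_code") or unknown_code,
            similarity=float(r["similarity"]),
        )
        for r in regions
    ]


def to_json(labels: List[PseudoLabel]) -> str:
    """Serialize pseudo-labels, e.g. for import into an annotation review tool."""
    return json.dumps([asdict(label) for label in labels])
```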
[0114] Because this automated annotation process shares core processing steps with the defect detection process, such as reference alignment, effective region of interest definition, and structural consistency determination, it eliminates the need to construct complex feature extraction and classification mechanisms separately for the annotation task, enabling batch pre-labeling of samples. The generated pseudo-labels can serve as pre-labeled data for the downstream defect detection model training phase, reducing the time and labor costs of manual annotation from scratch. Simultaneously, they can also serve as a preliminary screening basis for the manual review process, prompting quality inspectors to pay attention to structurally abnormal areas and their inferred types. Through manual verification and correction, a human-machine collaborative data annotation closed loop is formed, ensuring both annotation efficiency and the reliability of annotation quality.
[0115] By reusing the results of the upstream structural consistency determination, the above solution automatically labels defect data and generates pseudo-labels containing defect location and type identifiers. It embeds automatic labeling into the initial screening stage of data governance, reuses the structural analysis results downstream, reduces manual labeling costs, and accelerates training data construction. In addition, because pseudo-labels are generated directly from the structural anomaly locations and defect classification information contained in the structural consistency determination results, data cleaning and labeling share a single process, the same defect samples are not processed twice, and the overall efficiency of defect data governance improves. Furthermore, because the pseudo-labels serve as pre-labeling results for subsequent manual review or as a reference for model training, the cost of relying entirely on manual labeling is reduced while the manual verification step is retained, balancing labeling automation against labeling accuracy.
[0116] It is understandable that after long-term operation of the production line, due to machine aging and systematic drift of process parameters, systematic offsets in geometric position or grayscale characteristics may occur between the reference images acquired in the early stages and the images of normal products actually produced. This offset causes the reference image to gradually deviate from its ability to represent the structural baseline of normal products, failing to accurately reflect the state of normal products under current production conditions, leading to a deterioration in the structural consistency between the reference image and the actual normal product. Furthermore, when there is a systematic deviation between the reference image and the actual normal state, the effectiveness of registration and alignment processing will be directly affected, manifested as a reduction in the effective region of interest or a decrease in registration accuracy. This will cause subsequent structural difference response calculations and structural consistency judgments to be based on a misaligned baseline, potentially introducing spurious difference responses caused by alignment errors rather than actual structural anomalies, thereby interfering with the accurate separation of interfering and defective data and reducing the reliability of data governance. To ensure that the reference image continuously and effectively represents the current structural baseline of normal products, the reference image can be maintained and updated through machine calibration or periodic re-acquisition. This process absorbs the systematic drift generated during production line operation into the normal baseline, keeping the dynamic update of the reference image synchronized with the actual state of the production line, thereby maintaining the stability and accuracy of the structural constraints based on the reference image in the long-term operation of the data governance process.
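One way to decide when the reference image should be re-acquired is to monitor the residual geometric and grayscale offsets measured between recent known-good images and the reference. The sketch below is an assumption-laden illustration: the function `reference_needs_refresh`, its tolerances `max_shift_px` and `max_gray_offset`, and the use of medians are hypothetical choices, not part of the application.

```python
import numpy as np


def reference_needs_refresh(shifts_px, gray_offsets,
                            max_shift_px=2.0, max_gray_offset=8.0):
    """Flag systematic drift of the production line relative to the reference.

    shifts_px: (N, 2) residual (dy, dx) shifts measured between recent
        normal-product images and the reference (e.g. by phase correlation).
    gray_offsets: N mean-intensity differences (image minus reference).
    Medians are used so that a single outlier does not trigger a refresh;
    only a consistent (systematic) offset does.
    """
    shifts = np.asarray(shifts_px, dtype=float)
    offsets = np.asarray(gray_offsets, dtype=float)
    median_shift = np.median(shifts, axis=0)
    shift_mag = float(np.hypot(*median_shift))
    gray_drift = float(abs(np.median(offsets)))
    return shift_mag > max_shift_px or gray_drift > max_gray_offset
```

When the function returns `True`, the reference would be recalibrated or re-acquired as described above.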
[0117] As shown in Figure 4, to facilitate understanding of the working principle of the above defect detection method, an embodiment of this application further provides a specific example of applying the method in a particular scenario. In this scenario, the defect detection method mainly includes the following steps. Step 1: Input raw data and the reference baseline. Raw images obtained by scanning electron microscopy during the production process are acquired, together with the corresponding reference images. The reference images represent the structural baseline of normal products. They are acquired by selecting a stable area around the wafer on the scanning electron microscope platform and performing imaging parameter calibration to absorb systematic drift, ensuring that they cover the normal-state characteristics of different models, batches, and lighting conditions.
[0118] Step 2: Perform registration and alignment. To establish structural comparability in a unified coordinate system, the original image and the reference image are registered and aligned in two progressive stages: coarse alignment and fine alignment. In the coarse alignment stage, the global displacement between the original and reference images is estimated with the frequency-domain phase correlation method: the translation transformation matrix is obtained from the peak position of the inverse Fourier transform of the normalized cross-power spectrum of the two images, achieving preliminary alignment at the level of structural contours. In the fine alignment stage, the enhanced correlation coefficient (ECC) method iteratively refines the coarse result. By maximizing the correlation coefficient over the overlapping area of the two images, it estimates fine geometric transformation parameters that compensate for local deformation and residual displacement, achieving sub-pixel registration accuracy.
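The coarse alignment stage can be sketched with NumPy alone: normalize the cross-power spectrum to keep only phase, invert it, and read the translation from the peak. This is a minimal integer-pixel sketch; the function name `phase_correlation_shift` is hypothetical, and the ECC fine stage is not shown (OpenCV provides `cv2.findTransformECC` for that purpose).

```python
import numpy as np


def phase_correlation_shift(reference, moving):
    """Estimate the integer (dy, dx) that aligns `moving` to `reference`.

    Applying np.roll(moving, shift, axis=(0, 1)) with the returned shift
    undoes a pure circular translation between the two images.
    """
    f_ref = np.fft.fft2(reference.astype(float))
    f_mov = np.fft.fft2(moving.astype(float))
    cross = f_ref * np.conj(f_mov)
    cross /= np.abs(cross) + 1e-12          # normalize: keep phase only
    corr = np.fft.ifft2(cross).real         # ideally a single sharp peak
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    shift = []
    for p, n in zip(peak, corr.shape):
        # peaks beyond the midpoint correspond to negative (wrapped) shifts
        shift.append(int(p) - n if p > n // 2 else int(p))
    return tuple(shift)
```

`np.roll` wraps pixels around the border, which is only appropriate for illustration; a production pipeline would warp with a translation matrix and zero fill, which is also what makes the effective region of interest in Step 3 smaller than the full frame.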
[0119] Step 3: Determine the effective region of interest. The effective region of interest (ROI) is determined from the registration and alignment results and represents the intersection of the effective imaging regions of the original image and the reference image after registration and alignment. Specifically, valid pixels are identified in both the reference image and the registered and aligned original image, where pixels whose grayscale values are greater than a preset grayscale threshold T₀ are regarded as valid pixels. The valid pixel regions of the reference image and the original image are extracted separately, and an intersection operation is performed on the two valid pixel regions to obtain the overlapping valid region. A morphological closing operation is performed on the overlapping valid region to fill isolated internal holes, and the largest inscribed rectangle is extracted from the closed overlapping valid region. The region defined by this rectangle is taken as the effective region of interest, which limits the spatial range of subsequent difference analysis and structural determination.
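The ROI steps above can be sketched in NumPy: threshold both images, intersect the masks, apply a 3x3 closing, then find the largest all-valid axis-aligned rectangle with the row-histogram and stack method. The names `close3`, `largest_inscribed_rectangle`, and `valid_roi`, the 3x3 structuring element, and the default `t0=10` are illustrative assumptions.

```python
import numpy as np


def _dilate3(mask):
    """3x3 binary dilation; pixels outside the array count as False."""
    h, w = mask.shape
    p = np.pad(mask, 1, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out


def _erode3(mask):
    """3x3 binary erosion; pixels outside the array count as False."""
    h, w = mask.shape
    p = np.pad(mask, 1, constant_values=False)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]
    return out


def close3(mask):
    """Morphological closing (3x3). A 1-px False margin is added first so the
    closing fills interior holes without growing the region into invalid borders."""
    padded = np.pad(mask, 1, constant_values=False)
    return _erode3(_dilate3(padded))[1:-1, 1:-1]


def largest_inscribed_rectangle(mask):
    """Largest all-True rectangle as (top, left, bottom, right), exclusive
    bottom/right, or None if the mask is empty."""
    h, w = mask.shape
    heights = np.zeros(w, dtype=int)
    best_area, best = 0, None
    for row in range(h):
        heights = np.where(mask[row], heights + 1, 0)  # column run lengths
        stack = []  # column indices with strictly increasing heights
        for col in range(w + 1):
            cur = heights[col] if col < w else 0
            while stack and heights[stack[-1]] >= cur:
                height = int(heights[stack.pop()])
                left = stack[-1] + 1 if stack else 0
                area = height * (col - left)
                if area > best_area:
                    best_area = area
                    best = (row - height + 1, left, row + 1, col)
            stack.append(col)
    return best


def valid_roi(original, reference, t0=10):
    """Valid-pixel masks, intersection, closing, largest inscribed rectangle."""
    overlap = (original > t0) & (reference > t0)
    return largest_inscribed_rectangle(close3(overlap))
```

For example, if the original image is invalid in its two left columns and the reference is invalid in its bottom row, the ROI is the remaining overlap, with any single-pixel hole inside it filled by the closing.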
[0120] Step 4: Generate the difference map. Within the effective ROI, the structural difference response between the original image and the reference image is determined and a difference map is generated. Specifically, a grayscale difference index, a gradient difference index, and a high-frequency structural difference index are calculated and then fused by weighting into a unified difference response map, i.e., the difference map. The grayscale difference index characterizes pixel intensity deviations, the gradient difference index characterizes changes in edge contours, and the high-frequency structural difference index characterizes deviations in texture detail. The difference map is then analyzed statistically within the ROI and segmented with a percentile threshold or an adaptive threshold: pixels whose difference response values are greater than or equal to the threshold T are marked as abnormal pixels, where T can be determined from the statistical characteristics of the difference distribution within the effective ROI. Connected component analysis is performed on the abnormal pixels to merge contiguous abnormal pixels into candidate abnormal regions and extract their bounding box coordinates, which gives the location information of the candidate abnormal regions in the difference map.
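A minimal sketch of Step 4, assuming both images are already cropped to the effective ROI. The concrete index definitions (absolute intensity difference, difference of gradient magnitudes, difference of 3x3 high-pass residuals), the weights `(0.4, 0.3, 0.3)`, and the adaptive threshold T = mean + k·std with k = 3 are all assumptions; the text only names the three indices and allows a percentile or adaptive threshold. Function names are hypothetical.

```python
import numpy as np
from collections import deque


def _box_blur3(img):
    """3x3 mean filter with edge replication (low-pass for the high-frequency index)."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    acc = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            acc += p[dy:dy + h, dx:dx + w]
    return acc / 9.0


def _normalize(x):
    """Scale an index to [0, 1] by its maximum so the weights are comparable."""
    m = x.max()
    return x / m if m > 0 else x


def difference_map(orig, ref, weights=(0.4, 0.3, 0.3)):
    """Weighted fusion of grayscale, gradient and high-frequency differences."""
    orig = orig.astype(float)
    ref = ref.astype(float)
    gray = np.abs(orig - ref)                                   # intensity deviation
    gy_o, gx_o = np.gradient(orig)
    gy_r, gx_r = np.gradient(ref)
    grad = np.abs(np.hypot(gx_o, gy_o) - np.hypot(gx_r, gy_r))  # edge contour change
    hf = np.abs((orig - _box_blur3(orig)) - (ref - _box_blur3(ref)))  # texture detail
    w_gray, w_grad, w_hf = weights
    return (w_gray * _normalize(gray) + w_grad * _normalize(grad)
            + w_hf * _normalize(hf))


def candidate_regions(diff, k=3.0):
    """Adaptive threshold T = mean + k * std, then 4-connected components.
    Returns T and bounding boxes (top, left, bottom, right), exclusive bottom/right."""
    t = diff.mean() + k * diff.std()
    abnormal = (diff >= t) & (diff > 0)  # exact zeros are never abnormal
    h, w = abnormal.shape
    seen = np.zeros_like(abnormal)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not abnormal[y, x] or seen[y, x]:
                continue
            seen[y, x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and abnormal[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom + 1, right + 1))
    return float(t), boxes
```

A percentile threshold would replace the first line of `candidate_regions` with `np.percentile(diff, q)`; mean plus k standard deviations is used here because it adapts to how much of the ROI actually differs.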
[0121] Step 5: Perform the structural consistency determination. The structural similarity index (SSIM) between each candidate abnormal region and the corresponding region of the reference image is calculated and compared with a preset structural similarity threshold to determine whether the original image has a structural anomaly relative to the reference image. Specifically, taking each candidate abnormal region as a unit, a brightness consistency index, a contrast consistency index, and a structural consistency index are calculated between the candidate abnormal region and the corresponding local region of the reference image, and the three indices are fused into the structural similarity index according to SSIM = brightness consistency × contrast consistency × structural consistency. The value typically lies in the range [0, 1], and the closer it is to 1, the more consistent the structures are. If the calculated structural similarity index is less than the preset threshold τ, the candidate region is structurally inconsistent with the reference image, and the original image is determined to have a structural anomaly relative to the reference image; otherwise, the difference is determined to be a normal structure or a pseudo-difference caused by noise. In this way, pixel-level differences are converted into structure-level judgments.
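The three-factor SSIM can be sketched as a single-window computation over each candidate region. The stabilizing constants C1 = (0.01L)², C2 = (0.03L)², C3 = C2/2 follow the standard SSIM definition; using one window per region (instead of a sliding Gaussian window), the data range L = 255, and the threshold `tau=0.8` are assumptions. `region_ssim` and `is_structurally_abnormal` are hypothetical names.

```python
import numpy as np


def region_ssim(a, b, data_range=255.0):
    """Single-window SSIM = luminance x contrast x structure over two patches."""
    a = a.astype(float)
    b = b.astype(float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    c3 = c2 / 2
    mu_a, mu_b = a.mean(), b.mean()
    sd_a, sd_b = a.std(), b.std()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    luminance = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    contrast = (2 * sd_a * sd_b + c2) / (sd_a ** 2 + sd_b ** 2 + c2)
    structure = (cov + c3) / (sd_a * sd_b + c3)  # negative if anti-correlated
    return float(luminance * contrast * structure)


def is_structurally_abnormal(orig, ref, box, tau=0.8):
    """Compare one candidate region (top, left, bottom, right) with the reference."""
    top, left, bottom, right = box
    score = region_ssim(orig[top:bottom, left:right], ref[top:bottom, left:right])
    return score < tau, score
```

Identical patches score 1. Because the structure term is a correlation coefficient, a region whose pattern is inverted relative to the reference can score below 0, which the threshold test still classifies correctly as abnormal.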
[0122] Step 6: Split and tag the data. Based on the structural consistency determination result, the original image is split at the input stage. If the result indicates that the original image is consistent with the reference image, the original image is split into interference data and removed directly. If the result indicates that the original image has a structural anomaly, it is split into defect data; defect features are then extracted from the defect data and matched against existing defect features. If the defect features match existing defect features, the defect data is split into known defect data; otherwise, it is split into unknown defect data and marked as an out-of-sample defect. The structural consistency determination results are shared outputs used by two workflows: reference-aligned structural defect detection and reference-guided automatic annotation. The Reference-Aligned Structural Defect Detection (RASDD) workflow is a rule-based structural analysis workflow that uses the structural consistency determination results directly to make the final defect/non-defect decision; its hyperparameters can be set manually or determined on a calibration dataset through Bayesian optimization or grid search. The Reference-Guided Auto Labeling (RGAL) workflow uses the structural consistency determination results to generate pseudo-labels containing defect location and type identifiers, which serve as the basis for subsequent manual review or training data management.
[0123] Step 7: Adapt to downstream tasks. The known defect data obtained after splitting is used to train the downstream defect detection model. This includes annotating the known defect data with bounding boxes, assigning defect codes according to the defect code library of each process layer, and training a dedicated image detection model for that layer. At the same time, the defect data is automatically annotated based on the structural consistency determination results to generate pseudo-labels containing defect location and type identifiers. These pseudo-labels serve as the basis for training data governance or as a reference for manual review and are used in the data labeling stage of the downstream defect detection model, so that defect detection and automatic annotation work together within a shared process.
[0124] As shown in Figure 5, based on the same inventive concept, this application further provides a defect detection system 200, including a preliminary screening module 210 and a downstream defect detection module 220 communicatively connected to the preliminary screening module 210, wherein: the preliminary screening module 210 is configured to acquire an original image and a reference image, wherein the reference image is used to characterize the structural baseline of a normal product; register and align the original image and the reference image; before the original image is input into the downstream defect detection model to be trained, perform a structural consistency determination on the original image based on the registration and alignment results; split the original image based on the structural consistency determination result, wherein the known defect data obtained after splitting is used to train the downstream defect detection model; and send the known defect data to the downstream defect detection module 220. The downstream defect detection module 220 is configured to receive the known defect data and perform training.
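The diversion rules of Step 6 and the grid-search option for tuning the RASDD threshold can be sketched as follows. Cosine similarity as the defect-feature matching rule, `match_threshold=0.9`, F1 as the calibration objective, and the names `route_sample` and `grid_search_tau` are assumptions for illustration; the feature extractor itself is not shown.

```python
import numpy as np


def route_sample(is_abnormal, feature=None, known_features=None, match_threshold=0.9):
    """Return (split, defect_code), where split is "interference", "known" or "unknown"."""
    if not is_abnormal:
        return "interference", None  # consistent with the reference: removed
    f = np.asarray(feature, dtype=float)
    f = f / (np.linalg.norm(f) + 1e-12)
    best_code, best_sim = None, -1.0
    for code, ref_feat in (known_features or {}).items():
        r = np.asarray(ref_feat, dtype=float)
        sim = float(f @ (r / (np.linalg.norm(r) + 1e-12)))  # cosine similarity
        if sim > best_sim:
            best_code, best_sim = code, sim
    if best_sim >= match_threshold:
        return "known", best_code
    return "unknown", None  # flagged as an out-of-sample defect


def grid_search_tau(scores, labels, candidates):
    """Choose the SSIM threshold tau that maximizes F1 on a labeled calibration set.
    labels[i] is True for truly defective samples; a sample is predicted
    defective when its SSIM score is below tau."""
    best_tau, best_f1 = None, -1.0
    for tau in candidates:
        pred = [s < tau for s in scores]
        tp = sum(p and y for p, y in zip(pred, labels))
        fp = sum(p and not y for p, y in zip(pred, labels))
        fn = sum(y and not p for p, y in zip(pred, labels))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_tau, best_f1 = tau, f1
    return best_tau, best_f1
```

Bayesian optimization, the other option named above, would replace the exhaustive loop over `candidates` with a surrogate-model search over the same objective.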
[0125] It can be understood that the preliminary screening module 210 can implement any of the functions of the defect detection method provided in the embodiments of this application. For how the preliminary screening module 210 implements each function and its working principle, refer to the method embodiments; the details are not repeated in the system embodiment.
[0126] Figure 6 is a schematic diagram of an electronic device provided in an embodiment of this application. Referring to Figure 6, the electronic device 300 includes a processor 310, a memory 320, and a communication interface 330, which are interconnected and communicate with each other via a communication bus 340 and/or another form of connection mechanism (not shown).
[0127] There may be one or more memories 320 (only one is shown in the figure), each of which may be, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc. The processor 310 and any other components may access the memory 320 to read and/or write data.
[0128] There may be one or more processors 310 (only one is shown in the figure), each of which may be an integrated circuit chip with signal processing capability. The processor 310 may be a general-purpose processor, including a central processing unit (CPU), a microcontroller unit (MCU), a network processor (NP), or another conventional processor; it may also be a special-purpose processor, including a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
[0129] There may be one or more communication interfaces 330 (only one is shown in the figure), which can be used to communicate directly or indirectly with other devices to exchange data. For example, the communication interface 330 may be an Ethernet interface; a mobile communication network interface, such as an interface for 3G, 4G, or 5G networks; or another type of interface capable of transmitting and receiving data.
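The division of labor between the two modules can be illustrated with a small sketch. The class names, the `split` callable (assumed to wrap registration, the structural consistency determination, and the diversion step), and the `train` placeholder are hypothetical stand-ins for modules 210 and 220, not an implementation from the application.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DownstreamDefectDetectionModule:
    """Stand-in for module 220: collects known defect data for training."""
    training_pool: List[Any] = field(default_factory=list)

    def receive(self, samples: List[Any]) -> None:
        self.training_pool.extend(samples)

    def train(self) -> int:
        # Placeholder: a real module would fit the detection model here.
        return len(self.training_pool)


@dataclass
class PreliminaryScreeningModule:
    """Stand-in for module 210; `split` returns "interference", "known" or "unknown"."""
    split: Callable[[Any], str]
    downstream: DownstreamDefectDetectionModule

    def process(self, samples: List[Any]) -> Dict[str, List[Any]]:
        buckets: Dict[str, List[Any]] = {"interference": [], "known": [], "unknown": []}
        for sample in samples:
            buckets[self.split(sample)].append(sample)
        # Only known defect data is forwarded to the downstream module for training.
        self.downstream.receive(buckets["known"])
        return buckets
```

Keeping the split logic behind a single callable mirrors the system description: the screening module owns all pre-training data governance, and the downstream module only ever sees known defect data.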
[0130] One or more computer program instructions may be stored in the memory 320, and the processor 310 may read and run these computer program instructions to implement the defect detection method provided in the embodiments of this application and other desired functions.
[0131] It can be understood that the structure shown in Figure 6 is merely illustrative; the electronic device 300 may include more or fewer components than shown in Figure 6, or may have a configuration different from that shown in Figure 6. The components shown in Figure 6 may be implemented in hardware, software, or a combination thereof. For example, the electronic device 300 may be a single server (or another device with computing capability), a combination of multiple servers, a cluster of many servers, and so on, and may be either a physical device or a virtual device.
[0132] This application also provides a computer-readable storage medium storing computer program instructions. When these computer program instructions are read and executed by a processor, the defect detection method provided in this application is performed. For example, the computer-readable storage medium may be implemented as the memory 320 in the electronic device 300 shown in Figure 6, or as a separate storage product (such as a USB flash drive or a portable hard drive).
[0133] This application also provides a computer program product, which includes computer program instructions. When these computer program instructions are read and executed by a processor, the defect detection method provided in this application is performed. For example, the computer program instructions may be stored in the memory 320 of the electronic device 300 shown in Figure 6, or in a separate storage product (such as a USB flash drive or a portable hard drive).
[0134] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A defect detection method, characterized in that the method includes: acquiring an original image and a reference image, wherein the reference image is used to characterize the structural baseline of a normal product; registering and aligning the original image and the reference image; before inputting the original image into a downstream defect detection model to be trained, performing a structural consistency determination on the original image based on the registration and alignment results; and splitting the original image based on the structural consistency determination result, wherein the known defect data obtained after splitting is used to train the downstream defect detection model.
2. The defect detection method according to claim 1, characterized in that splitting the original image based on the structural consistency determination result includes: if the structural consistency determination result indicates that the original image is consistent with the reference image, splitting the original image into interference data; if the structural consistency determination result indicates that the original image has a structural anomaly, splitting the original image into defect data; extracting defect features from the defect data and matching the defect features with existing defect features; and if the matching result indicates that the defect features match the existing defect features, splitting the defect data into known defect data; otherwise, splitting the defect data into unknown defect data.
3. The defect detection method according to claim 1, characterized in that performing a structural consistency determination on the original image based on the registration and alignment results includes: determining an effective region of interest based on the registration and alignment results, wherein the effective region of interest characterizes the intersection of the effective imaging regions of the original image and the reference image after registration and alignment; determining, within the effective region of interest, the structural difference response between the original image and the reference image, and generating a difference map; determining candidate abnormal regions based on the difference map; and calculating the structural similarity index between each candidate abnormal region and the corresponding region of the reference image, and comparing the structural similarity index with a preset structural consistency threshold to determine whether the original image has a structural anomaly relative to the reference image, thereby obtaining the structural consistency determination result.
4. The defect detection method according to claim 3, characterized in that determining the effective region of interest based on the registration and alignment results includes: extracting the effective pixel regions of the reference image and the original image respectively, wherein effective pixels are pixels whose gray values are greater than a preset gray value threshold; performing an intersection operation on the effective pixel region of the reference image and the effective pixel region of the original image to obtain an overlapping effective region; performing a morphological closing operation on the overlapping effective region; and extracting the largest inscribed rectangle within the overlapping effective region after the morphological closing operation, and determining the region defined by the largest inscribed rectangle as the effective region of interest.
5. The defect detection method according to claim 3, characterized in that determining, within the effective region of interest, the structural difference response between the original image and the reference image and generating a difference map includes: calculating, within the effective region of interest, a grayscale difference index, a gradient difference index, and a high-frequency structural difference index between the original image and the reference image; and performing weighted fusion of the grayscale difference index, the gradient difference index, and the high-frequency structural difference index to generate the difference map.
6. The defect detection method according to claim 5, characterized in that determining candidate abnormal regions based on the difference map includes: determining an adaptive threshold based on the statistical characteristics of the difference distribution within the effective region of interest; determining abnormal pixels based on the adaptive threshold; and performing connected component analysis on the abnormal pixels to obtain the candidate abnormal regions.
7. The defect detection method according to any one of claims 3 to 6, characterized in that calculating the structural similarity index between the candidate abnormal region and the corresponding region of the reference image includes: calculating a brightness consistency index, a contrast consistency index, and a structural consistency index between the candidate abnormal region and the corresponding region of the reference image; and fusing the brightness consistency index, the contrast consistency index, and the structural consistency index to obtain the structural similarity index between the candidate abnormal region and the corresponding region of the reference image.
8. The defect detection method according to any one of claims 1 to 6, characterized in that the method further includes: labeling the defect data based on the structural consistency determination result to generate pseudo-labels containing defect location and type identifiers.
9. A defect detection system, characterized in that it includes a preliminary screening module and a downstream defect detection module communicatively connected to the preliminary screening module, wherein: the preliminary screening module is configured to acquire an original image and a reference image, wherein the reference image is used to characterize the structural baseline of a normal product; register and align the original image and the reference image; before the original image is input into a downstream defect detection model to be trained, perform a structural consistency determination on the original image based on the registration and alignment results; split the original image based on the structural consistency determination result, wherein the known defect data obtained after splitting is used to train the downstream defect detection model; and send the known defect data to the downstream defect detection module; and the downstream defect detection module is configured to receive the known defect data and perform training.
10. An electronic device, characterized in that it comprises: a processor, a memory, and a communication bus, wherein the processor and the memory communicate with each other via the communication bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the method according to any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 8.
12. A computer program product, characterized in that the computer program product includes a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 8.