An AI vision-based silicone product surface defect detection method
By fusing an improved EdgeViT model with a Shearlet feature pyramid, and by combining multimodal features with sparse matrix decomposition, the method addresses illumination interference and incomplete feature extraction in silicone surface defect detection, achieving high-precision, real-time defect detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- SHENZHEN BEYOUJIA ELECTRONIC TECH CO LTD
- Filing Date
- 2026-03-26
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for detecting defects on silicone surfaces struggle to effectively separate illumination interference from defect texture when faced with highly reflective properties, resulting in a high false negative rate. Furthermore, methods based on convolutional neural networks neglect the advantages of phase information in the frequency domain and multi-scale geometric transformations, leading to incomplete feature extraction and weak resistance to illumination interference.
The method improves the EdgeViT model and combines multimodal features. It uses illumination-invariant phase features and a Shearlet feature pyramid, constructs a sparse representation matrix through multi-scale Shearlet transformation and polarization feature fusion, and combines non-negative matrix factorization with an attention mechanism to capture the long-distance dependencies of defect textures, thereby achieving sparse feature reconstruction and enhanced anti-interference capability.
It effectively enhances the high-precision identification capability of small and complex defects on silicone surfaces, improves the real-time performance and anti-light interference capability of the detection system, and realizes efficient detection of diverse defects.
Figure CN122243985A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer vision technology, and in particular to a method for detecting surface defects in silicone products based on AI vision.
Background Technology
[0002] With the rapid development of industrial automated production lines, silicone products are increasingly widely used in medical, electronic, and consumer goods fields, placing higher demands on the accuracy and efficiency of surface defect detection. Existing silicone surface detection methods mainly rely on traditional image processing algorithms or simple convolutional neural network models. Traditional image processing methods struggle to effectively separate illumination interference from defect textures when dealing with the highly reflective properties of silicone surfaces, resulting in a high false negative rate. While convolutional neural network-based methods improve feature extraction capabilities, they primarily rely on local pixel correlations in the spatial domain for feature learning, neglecting the robustness of phase information in the frequency domain to illumination changes and the advantages of multi-scale geometric transformations in representing singular textures. Silicone surface defects often exhibit characteristics such as variable scale, low contrast, and complex background noise. Single-modal feature extraction methods are insufficient to comprehensively describe the essential attributes of defects, easily leading to feature information redundancy or loss. Furthermore, conventional sparse coding methods often suffer from poor basis vector orthogonality and slow convergence speed when processing high-dimensional feature matrices, limiting the accuracy of defect feature reconstruction and the real-time performance of the detection system.
[0003] Therefore, how to provide a method for detecting surface defects in silicone products based on AI vision is a problem that urgently needs to be solved by those skilled in the art.
Summary of the Invention
[0004] This invention proposes an AI vision-based method for detecting surface defects in silicone products. It improves the EdgeViT model by performing global context modeling and local feature enhancement on the fusion features of illumination-invariant phase features and Shearlet feature pyramids. An optimized attention mechanism captures the long-distance dependencies of defect textures. Based on this, the enhanced features are expanded into feature matrices, and basis vector matrices and coefficient matrices are randomly initialized. The Euclidean distance between the feature matrix and the reconstructed matrix is calculated as the objective function. Gradient descent is used to iteratively update the basis vector matrix and coefficient matrix, and non-negativity constraints are combined to extract the basis vector matrix representing the defect texture and the coefficient matrix representing the activation weights. This mechanism, through the improved EdgeViT model's deep extraction of multimodal features and the decoupling of sparse reconstruction of composite features, effectively enhances the expressive power and anti-interference ability of defect features, achieving high-precision identification of small and complex defects on silicone surfaces. This invention overcomes the limitations of traditional methods, such as incomplete feature extraction, weak resistance to illumination interference, and low recognition accuracy due to high feature coupling, providing an efficient solution for surface defect detection in silicone products.
[0005] A method for detecting surface defects in silicone products based on AI vision, according to an embodiment of the present invention, specifically includes:
[0006] S1. Acquire the original silicone surface image, construct a network flow graph and use the maximum flow minimum cut algorithm to separate the specular reflection and diffuse reflection components, fill the specular area with the diffuse reflection component, and output the reflection suppression image.
[0007] S2. Perform a two-dimensional discrete Fourier transform on the reflection suppression image, calculate the phase consistency value of each frequency component in the frequency domain, extract the illumination-invariant phase features, and reconstruct the output illumination-invariant feature sequence.
[0008] S3. Perform multi-scale Shearlet transform on the reflection suppression image, construct a sparse representation matrix using anisotropic shear wave coefficients, and generate a Shearlet feature pyramid.
[0009] S4. Calculate the polarization features of the reflection suppression image and match them with the Shearlet feature pyramid. Use the mutual information minimization criterion to remove feature redundancy and construct a multimodal decoupled composite feature map.
[0010] S5. Perform non-negative matrix factorization on the multimodal decoupled composite feature map, extract the basis vector matrix and coefficient matrix, reconstruct the features based on the coefficient matrix, and output the defect enhancement feature map.
[0011] S6. After fusing the illumination-invariant feature sequence with the defect enhancement feature map, input the fused features into the improved EdgeViT model. By constructing perspective-invariant geometric descriptors and orientation-anisotropic sparse sampling, global dependencies are captured. Local-global features are reorganized using lattice algebra operations in the formal context space. After nonlinear enhancement of the main semantic component and residual fusion, context dependencies are aggregated. After restoring spatial resolution, the defect mask and semantic category are output.
[0012] S7. Calculate the distribution state entropy based on the defect mask and semantic category, construct the constraint objective function, and solve for the optimal solution of the light source and analyzer by maximizing the information entropy. Iterate in reverse until the convergence condition is met.
[0013] Optionally, S1 specifically includes:
[0014] S11. Construct a network flow graph based on the original silicone surface image, map image pixels to network nodes, calculate the gray-level difference between adjacent pixels as a smoothing penalty term, and calculate edge weights by combining spatial adjacency relationships to generate a weighted undirected graph.
[0015] S12. In the weighted undirected graph, mark the specular reflection pixel as the source and the background pixel as the sink. Use the breadth-first search strategy to iteratively find the augmenting path from the source to the sink. Calculate the minimum residual capacity on the augmenting path and update the edge weights. When there is no augmenting path in the graph, define the set of edges between the set of reachable nodes and the set of unreachable nodes of the source as the minimum cut set. Divide the image pixels into specular reflection components and diffuse reflection components.
[0016] S13. Extract the texture direction information of the diffuse reflection component, calculate the gradient magnitude and color mean of the pixels around the specular reflection area, search for the image block with the smallest pixel value difference in the diffuse reflection component along the gradient direction, copy the searched image block into the interior of the specular reflection area, calculate the weighted average value of the pixel colors on both sides of the filling boundary to correct the pixel value, and output the reflection suppression image.
[0017] Optionally, S2 specifically includes:
[0018] S21. Perform a two-dimensional discrete Fourier transform on the reflection suppression image to convert the image from the spatial domain to the frequency domain, and calculate the local phase information and amplitude information of each frequency component in the frequency domain.
[0019] S22. Calculate the local energy by performing a sine- and cosine-weighted summation on the local phase information, calculate the frequency-weighted sum by accumulating the amplitude information, calculate the ratio of the local energy to the frequency-weighted sum, use the ratio as the phase consistency measurement result to eliminate the influence of light intensity changes, and output the illumination-invariant phase features.
[0020] S23. Perform an inverse Fourier transform on the illumination-invariant phase features to reconstruct the spatial feature map, flatten the spatial feature map into a one-dimensional vector in row scanning order, and output the illumination-invariant feature sequence.
[0021] Optionally, S3 specifically includes:
[0022] S31. Perform Laplacian pyramid multi-scale decomposition on the reflection suppression image to obtain multi-resolution sub-band images, perform directional shearing operation on the multi-resolution sub-band images, and calculate the anisotropic shear wave coefficients of each directional sub-band.
[0023] S32. Calculate the mean and standard deviation of the anisotropic shear wave coefficients, add the mean to the preset multiple of the standard deviation to obtain the statistical threshold, set the coefficients with absolute values less than the statistical threshold to zero, retain the coefficients with absolute values greater than or equal to the statistical threshold, perform non-negative sparse coding on the retained coefficients, and construct a sparse representation matrix.
[0024] S33. The sparse representation matrix is reorganized and feature mapped according to the scale level to generate the Shearlet feature pyramid.
[0025] Optionally, S4 specifically includes:
[0026] S41. Calculate the Stokes parameter vector based on the reflection suppression image, calculate the degree of polarization and polarization angle of the light wave according to the Stokes parameter vector, and generate a polarization feature map by mapping.
[0027] S42. Align the polarization feature map with the Shearlet feature pyramid in terms of spatial scale and channel matching to construct a multimodal feature pair.
[0028] S43. Calculate the joint probability distribution and marginal probability distribution of the multimodal feature pairs, calculate the mutual information value based on the joint probability distribution and the marginal probability distribution, minimize the mutual information value using the gradient descent method to remove feature redundancy, and construct a multimodal decoupled composite feature map.
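For illustration only, the polarization features of S41 and the mutual information value of S43 may be sketched as follows. The sketch assumes that four intensity measurements at analyzer angles of 0°, 45°, 90° and 135° are available for each pixel, which the text above does not specify, and it estimates mutual information from discrete (already quantized) feature values. The function names are hypothetical, and the gradient-descent minimization of S43 is not shown.

```python
import math
from collections import Counter


def stokes_features(i0, i45, i90, i135):
    """Degree and angle of linear polarization from four analyzer intensities (assumed inputs)."""
    s0 = i0 + i90            # total intensity
    s1 = i0 - i90            # horizontal vs. vertical component
    s2 = i45 - i135          # +45 degree vs. -45 degree component
    degree = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    angle = 0.5 * math.atan2(s2, s1)
    return degree, angle


def mutual_information(xs, ys):
    """Mutual information (in nats) from the joint and marginal distributions of two sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Fully horizontally polarized light versus unpolarized light.
polarized = stokes_features(1.0, 0.5, 0.0, 0.5)
unpolarized = stokes_features(0.5, 0.5, 0.5, 0.5)

# Identical feature channels are fully redundant; independent ones share no information.
redundant = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A lower mutual information value indicates less redundancy between the two feature modalities, which is the quantity that S43 drives down.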
[0029] Optionally, S5 specifically includes:
[0030] S51. Expand the multimodal decoupled composite feature map into a feature matrix, randomly initialize the basis vector matrix and coefficient matrix, calculate the Euclidean distance between the feature matrix and the reconstruction matrix as the objective function, calculate the partial derivatives of the objective function with respect to the basis vector matrix and the coefficient matrix respectively, subtract the partial derivatives multiplied by the preset step size from the current basis vector matrix and coefficient matrix to obtain the updated matrix, force the values of elements less than zero in the updated matrix to zero, and output the basis vector matrix representing the defect texture basis and the coefficient matrix representing the activation weight when the objective function converges.
[0031] S52. Perform sparsity constraint and highlight enhancement processing on the coefficient matrix, retain the defect feature response and suppress the background noise response to generate an optimized coefficient matrix.
[0032] S53. Perform matrix multiplication on the basis vector matrix and the optimized coefficient matrix to reconstruct the feature space and output the defect enhancement feature map.
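As a non-limiting sketch, the update rule of S51 (gradient step followed by forcing negative elements to zero) may be written as below. The helper names, the step size of 0.01, the fixed iteration count used in place of a convergence test, and the toy feature matrix are all assumptions made for illustration. The gradients are those of half the squared Euclidean distance, so the constant factor is absorbed into the step size.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def reconstruction_error(v, w, h):
    """Squared Euclidean distance between the feature matrix and its reconstruction."""
    r = matmul(w, h)
    return sum((v[i][j] - r[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def projected_gradient_nmf(v, rank, step=0.01, iterations=2000, seed=0):
    """Factor v into a non-negative basis matrix w and coefficient matrix h."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(rows)]   # random initialization
    h = [[rng.random() for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        r = matmul(w, h)
        err = [[r[i][j] - v[i][j] for j in range(cols)] for i in range(rows)]
        grad_w = matmul(err, transpose(h))     # partial derivative with respect to w
        grad_h = matmul(transpose(w), err)     # partial derivative with respect to h
        # Gradient step, then force elements below zero to zero.
        w = [[max(w[i][k] - step * grad_w[i][k], 0.0) for k in range(rank)]
             for i in range(rows)]
        h = [[max(h[k][j] - step * grad_h[k][j], 0.0) for j in range(cols)]
             for k in range(rank)]
    return w, h


# Toy rank-1 feature matrix (hypothetical data).
features = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
basis, coeffs = projected_gradient_nmf(features, rank=1)
error = reconstruction_error(features, basis, coeffs)
```

The product of `basis` and `coeffs` corresponds to the reconstruction of S53 before any sparsity or enhancement processing of the coefficient matrix.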
[0033] Optionally, the improved EdgeViT model includes a feature embedding layer, a sparse sampling attention layer, a local-global feature reorganization layer, a context-dependent aggregation layer, and a defect prediction head:
[0034] The feature embedding layer is used to map the illumination-invariant feature sequence back to the spatial dimension and concatenate it with the defect enhancement feature map, extract local feature point sets and construct point cluster topology, calculate the cross ratio sequence of point clusters in the projective transformation space to generate perspective-invariant geometric descriptors, and output sequence tensors by concatenating them with the original feature tensors.
[0035] The sparse sampling attention layer is used to receive the sequence tensor to generate query, key, and value tensors, use dual-tree complex wavelet transform to analyze the query tensor to construct a directional energy spectrum to generate an anisotropic sampling mask, perform directional consistency sparse extraction on the key and value tensors, and calculate scaled dot product attention to output a sparse global attention tensor.
[0036] The local-global feature reorganization layer is used to extract local texture features of the sequence tensor to obtain a local detail tensor, map the local detail tensor and the sparse global attention tensor to the formal context space, establish a partial order relationship of feature attributes, extract common features and merge difference features, and output the reorganized context tensor through lattice algebra operations.
[0037] The context-dependent aggregation layer is used to receive the reorganized context tensor, perform eigenvalue decomposition, parse out the main semantic component matrix and the detail component matrix, perform nonlinear enhancement only on the main semantic component matrix, recombine it with the detail component matrix in the spectral domain, and fuse the result with the sequence tensor through a residual connection to output the dependency-enhanced feature tensor.
[0038] The defect prediction head is used to receive the dependency-enhanced feature tensor, restore the spatial resolution using the reconstruction upsampling operator based on operator spectral decomposition, and process it through two independent convolutional branches to output the defect mask tensor and semantic category probability vector.
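The perspective invariance relied on by the feature embedding layer can be illustrated with the cross ratio of four collinear points, which is unchanged by any projective transformation. The sketch below uses one-dimensional coordinates along a line and exact rational arithmetic. The function names, the sliding-window form of the sequence, and the particular transformation coefficients are assumptions for illustration and are not taken from the text above.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) of four collinear points given by 1-D coordinates."""
    return ((a - c) * (b - d)) / ((b - c) * (a - d))


def projective_map(x, p, q, r, s):
    """1-D projective transformation x -> (p*x + q) / (r*x + s), with p*s - q*r != 0."""
    return (p * x + q) / (r * x + s)


def cross_ratio_sequence(coords):
    """Cross ratios of every four consecutive points along a line (hypothetical descriptor)."""
    return [cross_ratio(*coords[i:i + 4]) for i in range(len(coords) - 3)]


points = [Fraction(0), Fraction(1), Fraction(2), Fraction(3), Fraction(5)]
warped = [projective_map(x, 2, 1, 1, 3) for x in points]

before = cross_ratio_sequence(points)
after = cross_ratio_sequence(warped)
```

Because `before` and `after` are equal, a descriptor built from such ratios does not change when the imaging viewpoint changes the projection of the point cluster.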
[0039] Optionally, S7 specifically includes:
[0040] S71. Calculate the set of defect pixel coordinates based on the defect mask, statistically analyze the spatial distribution histogram of pixel coordinates and calculate the spatial position entropy, and simultaneously statistically analyze the frequency probability of each category based on the semantic category and calculate the category distribution entropy. Then, perform a weighted summation of the spatial position entropy and the category distribution entropy and output the distribution state entropy.
[0041] S72. Construct an objective function for light intensity and polarization angle with the distribution state entropy as the dependent variable, set the threshold range of light intensity and the range of polarization angle as constraints, perform global search and iterative optimization in the parameter space defined by the constraints, update the parameters until the distribution state entropy reaches a maximum value, and output the optimal configuration parameters of the light source and the analyzer.
[0042] S73. Adjust the working state of the light source and analyzer in reverse iteration according to the optimal configuration parameters, obtain the updated image data and calculate the distribution state entropy of the current frame in real time. When the rate of change of the distribution state entropy of the current frame is less than a preset threshold, it is determined that the convergence condition is met.
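For illustration, the distribution state entropy of S71, the constrained search of S72 and the convergence test of S73 may be sketched as follows. The equal weights, the coarse spatial grid, the relative form of the rate of change, and the toy objective are assumptions. In the method itself, each evaluation of the objective would involve acquiring a new image under the candidate light intensity and polarization angle, which is not modeled here.

```python
import math
from collections import Counter
from itertools import product


def shannon_entropy(counts):
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def distribution_state_entropy(defect_pixels, categories, image_size, cells=2,
                               w_space=0.5, w_class=0.5):
    """Weighted sum of spatial position entropy and category distribution entropy."""
    cell = image_size / cells
    histogram = Counter((int(row // cell), int(col // cell)) for row, col in defect_pixels)
    spatial_entropy = shannon_entropy(list(histogram.values()))
    class_entropy = shannon_entropy(list(Counter(categories).values()))
    return w_space * spatial_entropy + w_class * class_entropy


def grid_search(objective, intensities, angles):
    """Exhaustive search over the constrained (intensity, angle) parameter space."""
    return max(product(intensities, angles), key=lambda params: objective(*params))


def has_converged(previous, current, tolerance=0.01):
    """True when the relative change of the entropy falls below the preset threshold."""
    return abs(current - previous) < tolerance * max(abs(previous), 1e-12)


entropy = distribution_state_entropy(
    [(0, 0), (0, 32), (32, 0), (32, 32)],
    ["scratch", "scratch", "bubble", "bubble"],
    image_size=64,
)


def toy_objective(intensity, angle):
    # Hypothetical stand-in for "capture an image and compute its entropy".
    return -(intensity - 0.6) ** 2 - ((angle - 45) / 90) ** 2


best = grid_search(toy_objective, [0.2, 0.4, 0.6, 0.8], [0, 45, 90])
```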
[0043] The beneficial effects of this invention are:
[0044] (1) This invention establishes a sparse representation and significant enhancement system for multi-dimensional defect features through multi-modal feature decoupling and non-negative matrix factorization enhancement mechanisms. A sparse representation matrix is constructed using anisotropic shear wave coefficients obtained through Shearlet transform. Redundancy is removed by combining polarization features and mutual information minimization criteria to generate a multi-modal decoupled composite feature map. Based on non-negative matrix factorization, a basis vector matrix representing the defect texture basis and a coefficient matrix representing the activation weights are extracted. Sparsity constraints and highlighting enhancement are applied to the coefficient matrix. This system utilizes frequency domain phase consistency to extract illumination-invariant features. Combining polarization physical features and mathematical transformations, it achieves a mapping from the feature space to the basis vector space, effectively suppressing background noise response and highlighting weak defect features.
[0045] (2) This invention achieves global dependency capture and precise context aggregation of defects on complex surfaces through the improved EdgeViT model and the perspective-invariant geometric descriptor construction mechanism. A point cluster topology is constructed, and the cross-ratio sequence in the projective transformation space is calculated to generate a perspective-invariant geometric descriptor. An anisotropic sampling mask is generated from the directional energy spectrum obtained by analysis with the dual-tree complex wavelet transform. The local detail tensor and the global attention tensor are mapped to the formal context space, a partial order relationship of feature attributes is established, and lattice algebra operations are performed to reorganize the features. This mechanism solves the feature matching problem caused by the imaging distortion of complex surfaces through geometric invariance constraints and formal concept analysis, achieving deep fusion of local texture details and global semantic information.
[0046] (3) This invention establishes an adaptive parameter optimization and steady-state maintenance system for the defect detection system through the construction of distribution state entropy and an active imaging closed-loop control mechanism. Based on the defect mask and semantic category, spatial position entropy and category distribution entropy are calculated, and a constrained objective function with respect to illumination intensity and polarization angle is constructed. The optimal solution for the light source and analyzer is solved by maximizing information entropy and then iteratively adjusted in reverse. This system uses information entropy as a quantitative indicator of imaging quality, dynamically adjusts optical imaging conditions to match the optimal imaging posture for different defect types, ensures that the system is always in the best detection state, and significantly improves the detection rate and robustness for diverse defects.
Attached Figure Description
[0047] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:
[0048] Figure 1 is an overall flowchart of the AI vision-based method for detecting surface defects in silicone products proposed in this invention.
[0049] Figure 2 is a flowchart illustrating the working principle of the improved EdgeViT model in the AI vision-based method for detecting surface defects in silicone products proposed in this invention.
Detailed Implementation
[0050] The invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.
[0051] Referring to Figure 1 and Figure 2, a method for detecting surface defects in silicone products based on AI vision specifically includes:
[0052] S1. Acquire the original silicone surface image, construct a network flow graph and use the maximum flow minimum cut algorithm to separate the specular reflection and diffuse reflection components, fill the specular area with the diffuse reflection component, and output the reflection suppression image.
[0053] S2. Perform a two-dimensional discrete Fourier transform on the reflection suppression image, calculate the phase consistency value of each frequency component in the frequency domain, extract the illumination-invariant phase features, and reconstruct the output illumination-invariant feature sequence.
[0054] S3. Perform multi-scale Shearlet transform on the reflection suppression image, and construct a sparse representation matrix using anisotropic shear wave coefficients to generate a Shearlet feature pyramid.
[0055] S4. Calculate the polarization features of the reflection-suppressed image and match them with the Shearlet feature pyramid. Use the mutual information minimization criterion to remove feature redundancy and construct a multimodal decoupled composite feature map.
[0056] S5. Perform non-negative matrix factorization on the multimodal decoupled composite feature map, extract the basis vector matrix and coefficient matrix, reconstruct the features based on the coefficient matrix, and output the defect enhancement feature map.
[0057] S6. After fusing the illumination-invariant feature sequence with the defect enhancement feature map, input the fused features into the improved EdgeViT model. By constructing a perspective-invariant geometric descriptor and anisotropic sparse sampling, global dependencies are captured. Local-global features are reorganized using lattice algebra operations in the formal context space. After nonlinear enhancement of the main semantic component and residual fusion, context dependencies are aggregated. After restoring the spatial resolution, the defect mask and semantic category are output.
[0058] S7. Calculate the distribution state entropy based on the defect mask and semantic category, construct the constraint objective function, and solve the optimal solution of the light source and analyzer by maximizing the information entropy. Iterate in reverse until the convergence condition is met.
[0059] In this embodiment, S1 specifically includes:
[0060] S11. Read the coordinates and grayscale value of each pixel in the original silicone surface image, uniquely map each pixel to a network node in the network flow graph, and construct a spatial adjacency table based on the four-neighborhood (up, down, left, and right). Calculate the absolute value of the difference in grayscale values between adjacent pixels as the grayscale difference. Raise the mathematical constant e to the power of the negative of the grayscale difference to obtain the exponent value, and divide 1 by the exponent value to obtain the smoothing penalty term. Set the spatial proximity weight coefficient to 0.05 and the grayscale similarity weight coefficient to 10. Divide the spatial proximity weight coefficient by the Euclidean distance between adjacent pixels to obtain the spatial weight, multiply the grayscale similarity weight coefficient by the smoothing penalty term to obtain the grayscale weight, and sum the spatial weight and the grayscale weight to obtain the edge weight. Connect all adjacent nodes bidirectionally to generate a weighted undirected graph.
[0061] S12. Traverse the image pixels and calculate the product of color saturation and brightness for each pixel. Set the saturation threshold to 0.6 and the brightness threshold to 0.85. Filter pixels whose saturation and brightness products are greater than the threshold as specular reflection pixels and mark them as source points. Select the grayscale mean of the 5×5 pixel area at the four corners of the image as background pixels and mark them as sink points. Use the breadth-first search strategy to construct a hierarchical queue and iteratively find the augmenting path from the source point to the sink point. Calculate the minimum value of all edge weights on the augmenting path as the minimum residual capacity. Update the edge weights by subtracting the minimum residual capacity from the current weight of each edge on the path. When there is no augmenting path in the graph, extract the edge set between the set of reachable nodes and the set of unreachable nodes of the source point and define it as the minimum cut set. Divide the corresponding pixel set into specular reflection component and diffuse reflection component.
[0062] S13. Calculate the gradient magnitude and direction angle within the local window of each pixel in the diffuse reflection component to extract texture direction information. Statistically analyze the RGB color mean and gradient direction distribution of the pixels surrounding the edge of the specular reflection region. Set the search radius to 15 pixels and the matching block size to 7×7 pixels. Slide along the gradient direction in the diffuse reflection component to search for the image block whose sum of squared grayscale differences from the pixels at the edge of the specular reflection region is smallest. Copy the best-matching image block to the corresponding position inside the specular reflection region. Set the center weight coefficient to 0.6. Calculate the weighted average of the colors of the pixels inside and outside the filling boundary. Replace the original values of the pixels at the boundary with the weighted average and output the reflection suppression image.
[0063] The reflection suppression process based on graph cut theory proposed in this step is similar to traditional color space threshold segmentation or morphological filtering methods in that it uses image pixels as processing units. It aims to achieve physical separation between specular reflection highlight areas and effective diffuse reflection areas through pixel-level attribute discrimination and region division, and finally restore the texture information of the suppressed areas through image inpainting techniques.
[0064] The difference lies in that this invention breaks away from the limitations of traditional methods that rely solely on single color features (such as RGB thresholding or YCbCr color-space clustering) for hard decision-making or ignore the topological relationships between regions. Instead of treating pixels in isolation, as traditional models do, this invention adds network flow graph construction and edge weight calculation steps, mapping image pixels to network nodes. It constructs a smoothing penalty term using the gray-level difference between adjacent pixels and quantifies edge weights by combining spatial adjacency relationships, rather than using only color distance. In the region separation step, the maximum flow minimum cut algorithm replaces traditional threshold segmentation. It iteratively searches for augmenting paths and updates the residual network through breadth-first search, solving for the globally optimal cut set by minimizing the energy function. In the image restoration step, it uses the gradient direction information of the diffuse component to guide the image patch search and employs weighted color mean correction to fill boundaries, rather than simple neighborhood mean filtering.
[0065] The beneficial effects of the improvements are that by constructing a weighted undirected graph and introducing a smoothing penalty term, this invention can integrate the spatial adjacency relationship between pixels and the gray-level abrupt change characteristics into the reflection component separation process. This breaks through the limitations of traditional methods that are sensitive to the shape of the highlight region and are easily affected by noise, and achieves a leap from local pixel discrimination to global energy optimization. This design significantly improves the boundary accuracy of specular reflection region segmentation and effectively avoids over-segmentation or under-segmentation. The gradient-guided image block filling mechanism not only preserves the local texture structure of the image, but also eliminates repair traces through boundary weighting correction, greatly improving the visual consistency of the reflection suppression image and the accuracy of subsequent feature extraction.
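As a minimal sketch of steps S11 and S12, the edge weight calculation and the breadth-first augmenting-path search may be written as below. The edge weight follows the wording of S11 literally, with grayscale values assumed to be normalized to the range 0 to 1. The function names, the single source and single sink, and the toy capacity graph are assumptions for illustration. The reverse residual edge added in the update is the standard Edmonds-Karp bookkeeping and is not spelled out in S12.

```python
import math
from collections import deque


def edge_weight(gray_a, gray_b, distance=1.0, spatial_coeff=0.05, gray_coeff=10.0):
    """Edge weight per S11: spatial weight plus grayscale weight."""
    gray_diff = abs(gray_a - gray_b)
    exponent_value = math.exp(-gray_diff)      # e raised to the negative difference
    smoothing_penalty = 1.0 / exponent_value   # 1 divided by the exponent value
    return spatial_coeff / distance + gray_coeff * smoothing_penalty


def build_graph(image):
    """Map pixels to nodes (id = row * width + col) and link 4-neighbours both ways."""
    height, width = len(image), len(image[0])
    capacity = {r * width + c: {} for r in range(height) for c in range(width)}
    for r in range(height):
        for c in range(width):
            for dr, dc in ((0, 1), (1, 0)):    # right and down cover each pair once
                rr, cc = r + dr, c + dc
                if rr < height and cc < width:
                    a, b = r * width + c, rr * width + cc
                    capacity[a][b] = capacity[b][a] = edge_weight(image[r][c], image[rr][cc])
    return capacity


def min_cut(capacity, source, sink):
    """Return (maximum flow, set of nodes still reachable from the source)."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    flow = 0.0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:    # breadth-first search for an augmenting path
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:                 # no augmenting path remains
            return flow, set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)   # minimum residual capacity
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] = residual[v].get(u, 0.0) + bottleneck
        flow += bottleneck


graph = build_graph([[0.0, 1.0], [0.0, 1.0]])

# Hand-built toy capacities (hypothetical) to show the cut itself.
toy = {0: {1: 3.0, 2: 2.0}, 1: {0: 3.0, 3: 2.0}, 2: {0: 2.0, 3: 3.0}, 3: {1: 2.0, 2: 3.0}}
flow, source_side = min_cut(toy, source=0, sink=3)
```

The edges that leave `source_side` form the minimum cut set. In the method, the pixels on the source side would be assigned to the specular reflection component and the remaining pixels to the diffuse reflection component.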
[0066] In this embodiment, S2 specifically includes:
[0067] S21. Read the grayscale values of each pixel in the reflection suppression image to construct a two-dimensional input matrix. Convolve the two-dimensional input matrix with the preset cosine transform basis vector and sine transform basis vector respectively to obtain the real part value matrix and the imaginary part value matrix. Calculate the sum of the squares of the real part values and the squares of the imaginary part values, and perform a square root operation on the sum to obtain the amplitude information. Use the arctangent function to calculate the value of the imaginary part value divided by the real part value, and use the obtained angle value as the local phase information.
[0068] S22. Construct a multi-scale, multi-directional Log-Gabor filter bank, setting the number of scales to 4 and the number of directions to 6. Convolve the real part value matrix and the imaginary part value matrix with the filter bank respectively, calculate the even-symmetric and odd-symmetric energy components in each direction at each scale, and take the square root of the sum of the squares of the even-symmetric and odd-symmetric energy components to obtain the local energy value. Accumulate and sum the amplitude information at all scales to obtain the frequency-weighted sum value, calculate the ratio of the local energy value to the frequency-weighted sum value, take the difference between this ratio and the preset noise threshold, truncate the difference to a non-negative number, and output the illumination-invariant phase feature.
[0069] S23. Perform inverse Fourier transform on the real part value matrix and the imaginary part value matrix of the illumination-invariant phase features to map the frequency domain complex data back to the spatial domain and generate a spatial feature map; traverse the value of each pixel in the spatial feature map in a row scanning order from left to right and from top to bottom; construct a one-dimensional array with a length equal to the total number of pixels in the image, and store the extracted values in the corresponding positions of the one-dimensional array in sequence to output the illumination-invariant feature sequence.
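The reason phase information resists illumination changes can be illustrated with a small sketch: scaling the image brightness scales the amplitude of every frequency component but leaves its phase unchanged, and the energy-to-amplitude ratio of S22 is likewise unaffected by scaling. The sketch uses a naive discrete Fourier transform on a toy 2×2 image and omits the Log-Gabor filter bank. The function names and the representation of filter responses as complex numbers (real part even-symmetric, imaginary part odd-symmetric) are assumptions for illustration.

```python
import cmath
import math


def dft2(image):
    """Naive 2-D discrete Fourier transform of a small grayscale image."""
    height, width = len(image), len(image[0])
    spectrum = [[0j] * width for _ in range(height)]
    for u in range(height):
        for v in range(width):
            total = 0j
            for y in range(height):
                for x in range(width):
                    angle = -2.0 * math.pi * (u * y / height + v * x / width)
                    total += image[y][x] * cmath.exp(1j * angle)
            spectrum[u][v] = total
    return spectrum


def amplitude_and_phase(spectrum):
    """Amplitude = sqrt(re^2 + im^2); phase = arctangent of im / re, as in S21."""
    amplitude = [[math.hypot(z.real, z.imag) for z in row] for row in spectrum]
    phase = [[math.atan2(z.imag, z.real) for z in row] for row in spectrum]
    return amplitude, phase


def phase_consistency(responses, noise_threshold=0.0, eps=1e-12):
    """Ratio of local energy to summed amplitude, minus a noise threshold, clipped at zero."""
    even = sum(z.real for z in responses)
    odd = sum(z.imag for z in responses)
    local_energy = math.hypot(even, odd)
    amplitude_sum = sum(abs(z) for z in responses)
    return max(local_energy / (amplitude_sum + eps) - noise_threshold, 0.0)


image = [[1.0, 2.0], [3.0, 5.0]]
brighter = [[2.0 * p for p in row] for row in image]   # same scene, doubled illumination
amp_a, phase_a = amplitude_and_phase(dft2(image))
amp_b, phase_b = amplitude_and_phase(dft2(brighter))

aligned = phase_consistency([1 + 1j, 2 + 2j])          # components in phase
opposed = phase_consistency([1 + 0j, -1 + 0j])         # components cancel
```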
[0070] In this embodiment, S3 specifically includes:
[0071] S31. Read the pixel values of the reflection suppression image, smooth the image using a Gaussian low-pass filter and perform downsampling to construct a Gaussian pyramid; calculate the pixel difference between adjacent layers of the Gaussian pyramid to obtain the multi-resolution sub-band image of the Laplacian pyramid; construct a directional shearing filter bank, perform shear wave transform on the multi-resolution sub-band image, calculate the real and imaginary response values of pixels in each directional sub-band, solve for the square root of the sum of the squares of the real and imaginary parts, and output the anisotropic shear wave coefficients.
[0072] S32. Sum all values in the anisotropic shear wave coefficient set and divide by the total number of coefficients to calculate the mean. Calculate the square of the difference between each coefficient value and the mean, sum the squares, and divide by the total number of coefficients to obtain the variance. Take the square root of the variance to obtain the standard deviation. Set a preset multiplier of 3 and add 3 times the standard deviation to the mean to obtain the statistical threshold. Traverse all shear wave coefficients, set the coefficients whose absolute values are less than the statistical threshold to zero, and retain the coefficients whose absolute values are greater than or equal to the statistical threshold. Perform non-negative matrix factorization on the retained coefficients, extract the product of the basis vector matrix and the coefficient matrix as the non-negative sparse coding result, and construct a sparse representation matrix.
[0073] S33. Read the coefficient data of different scale levels in the sparse representation matrix, and reorganize and stitch them according to the image pyramid level order; perform upsampling operation on the reorganized coefficient data to restore the original image size, and use bilinear interpolation algorithm to fill the pixel gaps; cascade and stitch the feature maps restored at different scales in the channel dimension to generate a Shearlet feature pyramid containing multi-scale edge texture information.
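The statistical thresholding in S32 can be sketched as follows. This is only an illustration under assumptions: the function name is hypothetical, and the population standard deviation (`np.std` with its default normalization) is assumed for the spread of the coefficients.

```python
import numpy as np

def three_sigma_threshold(coeffs, multiplier=3.0):
    """Zero out coefficients whose magnitude is below mean + multiplier * std."""
    coeffs = np.asarray(coeffs, dtype=float)
    mean = coeffs.mean()
    std = coeffs.std()  # assumed: population standard deviation
    threshold = mean + multiplier * std
    kept = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)
    return kept, threshold

# 99 weak background coefficients and one strong edge response.
coeffs = np.concatenate([np.ones(99), [100.0]])
kept, threshold = three_sigma_threshold(coeffs)
```

In this toy input only the single strong coefficient survives, which is the sparsifying behavior the step relies on before the non-negative factorization.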
[0074] In this embodiment, S4 specifically includes:
[0075] S41. Read the light intensity values of the reflection suppression image in four polarization directions: 0°, 45°, 90°, and 135°. Calculate the sum of the light intensity values in the 0° and 90° directions as the Stokes parameter S0 component, calculate the difference between the light intensity values in the 0° and 90° directions as the Stokes parameter S1 component, and calculate the difference between the light intensity values in the 45° and 135° directions as the Stokes parameter S2 component. Calculate the sum of the squares of the S1 component values and the S2 component values, and take the square root of this sum to obtain an intermediate variable. Divide the intermediate variable by the S0 component value to obtain the degree of polarization of the light wave. Calculate half of the arctangent of the S2 component value divided by the S1 component value to obtain the polarization angle. Linearly map the degree of polarization and the polarization angle to the integer range of 0 to 255, respectively, to generate a polarization feature map.
[0076] S42. Read the feature map dimensions at different scales in the Shearlet feature pyramid, perform bilinear interpolation on the polarization feature map, adjust the number of rows and columns of the polarization feature map to align it with the scale of the Shearlet feature pyramid one by one; set the channel stitching rules, copy and expand the size-aligned polarization feature map to match the number of channels of the Shearlet feature pyramid, and stitch the channels according to the pixel positions one by one to construct a multimodal feature pair containing polarization texture information and geometric edge information.
[0077] S43. Construct a two-dimensional histogram by counting the frequency of occurrence of the two types of feature strengths in a multimodal feature pair. Calculate the joint probability distribution by dividing the value of each cell in the two-dimensional histogram by the total number of pixels. Sum the joint probability distribution along the horizontal and vertical axes to obtain the marginal probability distributions. Iterate through all feature values, calculate the logarithm of the joint probability distribution value divided by the product of the two marginal probability distribution values, and sum all the logarithmic values weighted by the joint probability to obtain the mutual information value. Set minimizing the mutual information as the optimization objective, calculate the partial derivative of the mutual information value with respect to the feature value as the gradient, and use the Adam algorithm to iteratively update the feature parameters to reduce the mutual information value, remove redundant information between features, and output a multimodal decoupled composite feature map.
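The Stokes-parameter calculation of S41 can be written compactly. The sketch below is illustrative only: the function names, the small `eps` guard against division by zero, and the use of `np.arctan2` are assumptions, and the polarization angle follows the standard definition of one half of the arctangent of S2 over S1.

```python
import numpy as np

def stokes_features(i0, i45, i90, i135, eps=1e-12):
    """Degree and angle of linear polarization from four intensity images."""
    s0 = i0 + i90
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)  # eps is an assumed guard
    aop = 0.5 * np.arctan2(s2, s1)                  # range [-pi/2, pi/2]
    return dolp, aop

def to_uint8(x, lo, hi):
    """Linearly map values from [lo, hi] to integers 0..255."""
    scaled = (np.asarray(x, dtype=float) - lo) / (hi - lo) * 255.0
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)

# Pixel 0: fully polarized along 0 degrees; pixel 1: unpolarized.
i0 = np.array([1.0, 0.5]); i45 = np.array([0.5, 0.5])
i90 = np.array([0.0, 0.5]); i135 = np.array([0.5, 0.5])
dolp, aop = stokes_features(i0, i45, i90, i135)
dolp_map = to_uint8(dolp, 0.0, 1.0)
aop_map = to_uint8(aop, -np.pi / 2, np.pi / 2)
```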
[0078] The multimodal decoupled composite feature map construction process proposed in this step is similar to traditional multi-scale feature fusion methods in that both aim to integrate image feature information from different sources or at different scales, enhance feature representation capabilities through feature stitching or weighted fusion strategies, and both need to address the spatial alignment and dimensionality matching issues between features.
[0079] The difference lies in that this invention breaks away from the limitations of traditional methods that rely solely on concatenating feature vectors or element-wise weighted sums for information overlay, ignoring statistical correlations and information redundancy between features. Building upon the traditional model's direct feature fusion, this invention adds a mutual information minimization decoupling step. This step calculates the joint probability distribution and marginal probability distribution of multimodal feature pairs, constructing a statistical dependency metric rather than solely relying on the geometric distance of feature values. In the feature fusion step, gradient descent is used to iteratively optimize the mutual information value, actively removing redundant information by minimizing statistical correlation, rather than passively retaining all feature components. Finally, in the feature output step, the multimodal decoupled composite feature map is reconstructed based on the decoupled feature distribution, rather than a high-dimensional redundant feature matrix.
[0080] The beneficial effect of the improvement is that the present invention, through the mutual information minimization criterion, can accurately quantify and eliminate the statistical redundancy between physical optical features and mathematical transformation features from the information theory level. It breaks through the limitations of traditional fusion methods, which suffer from information superposition saturation and waste of computing resources due to the high correlation of features, and realizes the essential transformation from "data stacking" to "information complementarity". This design significantly improves the compactness and discriminativeness of feature expression, effectively avoids the accumulation and amplification of noise in redundant features, and enhances the system's ability to capture small defect features and its anti-interference robustness under complex lighting and texture backgrounds.
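The mutual information value that serves as the decoupling objective can be estimated from a two-dimensional histogram, as in the sketch below. The function name and the bin count are assumptions for illustration, and the sketch covers only the evaluation of the objective, not the Adam-based minimization.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram estimate of I(A;B) in bits for two feature maps."""
    hist, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    joint = hist / hist.sum()                  # joint probability distribution
    pa = joint.sum(axis=1, keepdims=True)      # marginal of a, shape (bins, 1)
    pb = joint.sum(axis=0, keepdims=True)      # marginal of b, shape (1, bins)
    nz = joint > 0                             # skip empty cells (0 * log 0 = 0)
    ratio = joint[nz] / (pa @ pb)[nz]
    return float(np.sum(joint[nz] * np.log2(ratio)))

# Identical features are fully redundant; a full grid of pairs is independent.
same = (np.arange(1024) % 16).astype(float)
mi_redundant = mutual_information(same, same, bins=16)
u = np.repeat(np.arange(4.0), 4)
v = np.tile(np.arange(4.0), 4)
mi_independent = mutual_information(u, v, bins=4)
```

A value near zero indicates that the two modalities carry complementary rather than duplicated information, which is the state the decoupling step aims for.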
[0081] In this embodiment, S5 specifically includes:
[0082] S51. Read the pixel values of the multimodal decoupled composite feature map, and reshape the two-dimensional feature map into a two-dimensional feature matrix according to the row scanning order; randomly generate matrices with values between 0 and 1 as the initial basis vector matrix and the initial coefficient matrix, respectively; calculate the matrix product of the basis vector matrix and the coefficient matrix to obtain the reconstruction matrix, and calculate the sum of squares of the differences between corresponding elements of the feature matrix and the reconstruction matrix as the objective function value; solve the gradient partial derivatives of the objective function with respect to the basis vector matrix and the coefficient matrix, respectively, and subtract the gradient partial derivatives multiplied by the preset step size of 0.01 from the current basis vector matrix and the coefficient matrix to obtain the updated matrix; traverse each element in the updated matrix, force the elements with values less than zero to zero, and perform non-negativity constraint operation; determine whether the difference between the current objective function value and the previous iteration value is less than the preset convergence threshold of 0.0001. If it is less, stop the iteration and output the basis vector matrix representing the defect texture basis and the coefficient matrix representing the activation weight.
[0083] S52. Calculate the absolute values of all elements in the coefficient matrix and arrange them in descending order. Select the value whose rank equals 5% of the total number of elements as the sparsity threshold. Set the elements of the coefficient matrix whose absolute values are less than the sparsity threshold to zero, thereby applying the sparsity constraint. Calculate the mean of the non-zero elements in the coefficient matrix. Multiply the elements whose values are greater than three times the mean by a gain coefficient of 2 for highlight enhancement, and generate the optimized coefficient matrix.
[0084] S53. Read the basis vector matrix and the optimization coefficient matrix, calculate the matrix product of the basis vector matrix and the optimization coefficient matrix to obtain the reconstructed two-dimensional feature matrix; rearrange the values of each row of the two-dimensional feature matrix according to the original image width to the image row pixel values, restore the spatial size of the original image, and output the defect enhancement feature map.
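Steps S51 and S52 can be sketched as a projected gradient factorization followed by a top-5% sparsification. This is a hedged illustration: the function names, the `max_iter` cap, the random seed, and the reading of the 5% rule as "keep values at or above the one ranked at 5% of the element count" are assumptions.

```python
import numpy as np

def nmf_projected_gradient(X, rank, step=0.01, tol=1e-4, max_iter=5000, seed=0):
    """Minimize ||X - W H||^2 by gradient steps, clamping negatives to zero."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))   # initial basis matrix in [0, 1)
    H = rng.random((rank, X.shape[1]))   # initial coefficient matrix in [0, 1)
    prev, loss = np.inf, np.inf
    for _ in range(max_iter):            # max_iter is an assumed safety cap
        R = W @ H - X
        loss = float(np.sum(R ** 2))
        if abs(prev - loss) < tol:       # convergence threshold from S51
            break
        prev = loss
        grad_W, grad_H = 2.0 * R @ H.T, 2.0 * W.T @ R
        W = np.maximum(W - step * grad_W, 0.0)   # non-negativity constraint
        H = np.maximum(H - step * grad_H, 0.0)
    return W, H, loss

def sparsify_and_boost(H, top_fraction=0.05, gain=2.0):
    """Keep the largest-magnitude entries, then amplify strong activations."""
    mags = np.sort(np.abs(H).ravel())[::-1]
    rank = max(int(round(top_fraction * mags.size)), 1)
    threshold = mags[rank - 1]
    out = np.where(np.abs(H) >= threshold, H, 0.0)
    nonzero = out[out != 0]
    if nonzero.size:
        out = np.where(out > 3.0 * nonzero.mean(), out * gain, out)
    return out

X = np.random.default_rng(1).random((6, 5))
W, H, loss = nmf_projected_gradient(X, rank=2)
H_opt = sparsify_and_boost(np.arange(1, 101, dtype=float).reshape(10, 10))
```

The reconstruction of S53 is then the matrix product of the basis matrix and the optimized coefficient matrix, reshaped to the original image width.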
[0085] In this embodiment, the improved EdgeViT model includes a feature embedding layer, a sparse sampling attention layer, a local-global feature reorganization layer, a context-dependent aggregation layer, and a defect prediction head:
[0086] The feature embedding layer reads the illumination-invariant feature sequence, rearranges the sequence values into a two-dimensional matrix according to the width and height of the original image, and restores the spatial dimensions of the illumination-invariant feature map. It then reads the defect enhancement feature map and concatenates the illumination-invariant feature map and the defect enhancement feature map along the channel dimension to construct a dual-channel fusion feature map. The layer iterates through the pixels in the dual-channel fusion feature map, calculating the sum of the squares of the differences between each pixel and its eight neighboring pixels as the response value. A response threshold of 1000 is set, and the coordinates of pixels with response values greater than the threshold are retained as feature points, outputting a local feature point set. The layer then iterates through the local feature point set, calculates the Euclidean distance between any two feature points, and connects feature points whose distance is less than a preset distance threshold of 10 pixels to construct a point cluster topology graph. In the point cluster topology graph, four feature points located on the same straight line are selected. The distance product between the first and third feature points is calculated, followed by the distance product between the second and fourth feature points. The first product is divided by the second product to obtain the cross-ratio value. This process is repeated for all four-point combinations that meet the condition to generate a cross-ratio sequence. The cross-ratio sequence is then sorted by value to construct a perspective-invariant geometric descriptor vector. This vector is then extended to the same dimensions as the dual-channel fusion feature map to construct a descriptor feature tensor. 
Finally, the descriptor feature tensor and the dual-channel fused feature map are concatenated along the channel dimension to output a sequence tensor.
[0087] A sparse sampling attention layer is used to read the numerical values of the sequence tensor. Three weight matrices with different parameters are constructed. The sequence tensor is multiplied by the three weight matrices to generate query tensors, key tensors, and value tensors. A dual-tree complex wavelet transform is performed on the query tensor to decompose it into high-frequency complex coefficient subbands of different scales and directions. The square root of the sum of the squares of the real and imaginary parts of the complex coefficients of each subband is calculated as the directional energy value. The directional energy values are mapped back to the spatial dimension and normalized to the interval of 0 to 1 to generate a directional energy map. The energy retention ratio is set to 0.7. Regions in the directional energy map with values greater than or equal to this ratio are retained, and the values at other positions are set to zero to construct an anisotropic sampling mask. Multiply the anisotropic sampling mask by the key tensor to extract the key tensor feature vectors corresponding to the non-zero positions of the mask, obtaining a sparse key tensor. Multiply the anisotropic sampling mask by the value tensor to extract the value tensor feature vectors corresponding to the non-zero positions of the mask, obtaining a sparse value tensor. Calculate the matrix product between the query tensor and the transpose of the sparse key tensor to obtain the relevance score matrix. Calculate the result of dividing each row value of the relevance score matrix by the maximum value of that row to obtain a normalized relevance matrix. Perform natural exponentiation on each element of the normalized relevance matrix, calculate the sum of the exponentiation results for each row, and divide each natural exponentiation result by the sum of the corresponding rows to obtain the attention weight matrix. 
Calculate the matrix product between the attention weight matrix and the sparse value tensor, and reshape the weighted feature vectors according to the spatial structure of the original sequence tensor to output a sparse global attention tensor.
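The masked attention computation of this layer can be sketched as below, assuming the directional energy map has already been computed and flattened to one value per token. The function name, the flat per-token energy vector, and the guard that replaces a zero row maximum with 1 are assumptions; the dual-tree complex wavelet transform itself is not shown.

```python
import numpy as np

def sparse_masked_attention(q, k, v, energy, keep=0.7):
    """Attend only to tokens whose normalized directional energy >= keep."""
    mask = np.asarray(energy) >= keep            # anisotropic sampling mask
    if not mask.any():
        raise ValueError("mask keeps no positions")
    k_sparse, v_sparse = k[mask], v[mask]        # sparse key and value tensors
    scores = q @ k_sparse.T                      # relevance score matrix
    row_max = scores.max(axis=1, keepdims=True)
    row_max = np.where(row_max == 0.0, 1.0, row_max)  # assumed guard
    normalized = scores / row_max                # divide each row by its maximum
    exp = np.exp(normalized)
    weights = exp / exp.sum(axis=1, keepdims=True)    # attention weights
    return weights @ v_sparse, weights

rng = np.random.default_rng(0)
q, k, v = rng.random((6, 4)), rng.random((6, 4)), rng.random((6, 4))
energy = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3])
out, weights = sparse_masked_attention(q, k, v, energy)
```

Because only three of the six tokens pass the 0.7 energy threshold in this toy input, the score matrix shrinks from 6x6 to 6x3, which is where the computational saving of the sparse sampling comes from.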
[0088] The local-global feature reorganization layer reads the numerical values of the sequence tensor and constructs a Gabor filter bank with 4 scales and 8 directions. The sequence tensor is convolved with the filter bank, and the square root of the squared filter response amplitude is calculated to obtain the texture response value. The maximum texture response value over all scales and directions is extracted to construct the local detail tensor. The local detail tensor and the sparse global attention tensor are read, and for each tensor the average value of all its elements is calculated as the binarization threshold. Each element in the tensor is traversed, and elements with values greater than or equal to the binarization threshold are assigned a logical true value, while elements with values less than the threshold are assigned a logical false value, generating a local detail formal context and a global attention formal context. Priority weights are defined for each attribute dimension in the formal contexts. The logical states of elements at the same position in the two formal contexts are compared. If both are logically true, they are marked as strong common features; if the states differ, they are marked as differential features. Attributes are sorted according to their priority weights to establish a partial order relation between feature attributes. 
Traverse all nodes in the partial order relation, calculate the intersection and union of node pairs with hierarchical relationships in the lattice space, and merge nodes with the same intersection result into an equivalence class, completing the extraction of common features and the merging of differential features. Construct node connection rules: if one equivalence class is contained in another equivalence class, generate a connecting edge, thereby constructing the concept lattice structure. Traverse all nodes in the concept lattice structure, extract the attribute feature vectors corresponding to the nodes, concatenate them in hierarchical order, and output the recombined context tensor.
[0089] The context-dependent aggregation layer reads the numerical values of the reconstructed context tensor, calculates the covariance matrix of the tensor in the channel dimension, and solves for the eigenvalues and eigenvectors of the covariance matrix. The eigenvalues are arranged in descending order, and the eigenvectors corresponding to the top five eigenvalues are selected to construct the main semantic projection matrix. The eigenvectors corresponding to the sixth to last eigenvalues are selected to construct the detail projection matrix. The projection values of the reconstructed context tensor onto the main semantic projection matrix are calculated to obtain the main semantic component matrix, and the projection values of the reconstructed context tensor onto the detail projection matrix are calculated to obtain the detail component matrix. Each element in the main semantic component matrix is traversed, and the cube of the element value is calculated as the augmented value. The difference between the augmented value and the maximum absolute value of the element value is calculated. If the difference is greater than zero, the augmented value is retained; if the difference is less than or equal to zero, the original element value is retained, resulting in the non-linearly augmented main semantic component matrix. The detail component matrix is read, and a two-dimensional fast Fourier transform is performed on it to map the spatial domain data to the frequency domain, obtaining the detail frequency domain matrix. A two-dimensional fast Fourier transform is performed on the nonlinearly enhanced main semantic component matrix to map the spatial domain data to the frequency domain, obtaining the main semantic frequency domain matrix. The complex product of the corresponding positions in the main semantic frequency domain matrix and the detail frequency domain matrix is calculated to achieve spectral domain reconstruction. 
A two-dimensional inverse fast Fourier transform is performed on the product result to map the frequency domain data back to the spatial domain, obtaining the reconstructed feature matrix. The sequence tensor is read, and the element-wise difference between the reconstructed feature matrix and the sequence tensor at corresponding positions is calculated to obtain the residual feature matrix. The element-wise values of the reconstructed feature matrix and the residual feature matrix at corresponding positions are added together to output the dependency-enhanced feature tensor.
[0090] The defect prediction head reads the numerical values of the dependency augmentation feature tensor and constructs an upsampled convolution kernel matrix with a size 4 times that of the input channels. Singular value decomposition is performed on the convolution kernel matrix to obtain a left singular vector matrix, a singular value diagonal matrix, and a right singular vector matrix. The first 50% of the column vectors of the left singular vector matrix are extracted to construct a low-rank left basis matrix, the first 50% of the row vectors of the right singular vector matrix are extracted to construct a low-rank right basis matrix, and the first 50% of the diagonal elements of the singular value diagonal matrix are extracted to construct a low-rank singular value matrix. The product of the low-rank left basis matrix, the low-rank singular value matrix, and the transpose of the low-rank right basis matrix is calculated to obtain the reconstructed low-rank upsampled convolution kernel. The reconstructed low-rank upsampled convolution kernel is used to perform a transpose convolution operation on the dependency augmentation feature tensor with a stride of 2 and padding of 1, outputting an upsampled feature tensor with a spatial resolution magnified by a factor of 2. The first convolutional branch is constructed, with a kernel size of 3x3, a stride of 1, and padding of 1. The kernel weights are initialized, and the upsampled feature tensor is convolved with the kernel to obtain the first feature map. A first layer-by-layer normalization layer is constructed, which calculates the mean and variance of the first feature map along the channel dimension. The mean and variance are used to normalize the first feature map, which is then multiplied by a scaling factor and a bias factor is added to obtain the normalized feature map. A first activation function layer is constructed, which reads each element in the normalized feature map. 
If the element value is greater than zero, it remains unchanged; if the element value is less than or equal to zero, it is set to zero, and the activation feature map is output. A second convolutional layer is constructed, with a kernel size of 1x1, a stride of 1, and padding of 0. The activation feature map is convolved with the kernel, and the defect mask tensor is output. A second convolutional branch is then constructed: the third convolutional kernel is set to a size of 3x3 with a stride of 1 and padding of 1, its weights are initialized, and the upsampled feature tensor is convolved with the third convolutional kernel to obtain the second feature map. A second layer-by-layer normalization layer is then constructed to calculate the mean and variance of the second feature map along the channel dimension. The mean and variance are used to normalize the second feature map, which is then multiplied by a scaling factor, and a bias factor is added to obtain the second normalized feature map. A second activation function layer is constructed, reading each element from the second normalized feature map. If the element value is greater than zero, it remains unchanged; if the element value is less than or equal to zero, it is set to zero, and the second activation feature map is output. A global average pooling layer is constructed to calculate the average value of all pixels in the spatial dimension of the second activation feature map, obtaining the global description vector. A fully connected layer is constructed, a weight matrix is built, and the matrix product of the global description vector and the weight matrix is calculated, outputting the category score vector. 
A normalized exponential function operation is performed on the category score vector, calculating the natural exponent value of each element and dividing it by the sum of the natural exponent values of all elements, outputting the semantic category probability vector.
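The low-rank reconstruction of the upsampling kernel described for the defect prediction head can be illustrated with a truncated singular value decomposition. The function name and the flattening of the kernel into a two-dimensional matrix are assumptions; note that `np.linalg.svd` already returns the right singular vectors as rows, so its first rows are used directly.

```python
import numpy as np

def low_rank_truncate(kernel, keep_ratio=0.5):
    """Rebuild a 2-D kernel matrix from its leading singular components."""
    U, s, Vt = np.linalg.svd(kernel, full_matrices=False)
    r = max(1, int(len(s) * keep_ratio))        # keep the first 50% by default
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
kernel = rng.standard_normal((8, 6))
approx = low_rank_truncate(kernel)              # rank at most 3

# A rank-1 matrix survives truncation unchanged.
rank_one = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 0.25, 2.0])
restored = low_rank_truncate(rank_one)
```

Keeping half of the singular components reduces the parameter count of the transposed convolution while preserving the dominant directions of the kernel.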
[0091] The improved EdgeViT model proposed in this step is similar to the traditional EdgeViT model in that it follows the basic paradigm of the Transformer architecture. That is, the input data is mapped to a high-dimensional feature space through the feature embedding layer, long-distance dependencies are captured by the self-attention mechanism, and features are aggregated and transformed through the feedforward network and residual connection. Finally, the prediction result is output through the task head.
[0092] The difference lies in that this invention breaks away from the limitations of traditional methods, which rely solely on absolute position encoding or regular grid sampling to perceive spatial structure and simply use feedforward networks for feature transformation. Building upon the traditional model's direct use of linear mapping to generate query, key, and value vectors, this invention adds a perspective-invariant geometric descriptor construction step. It generates geometric descriptors from the point cluster topology and the cross-ratio, which is invariant under projective transformation, rather than relying solely on the superposition of position encodings. In the attention calculation step, an anisotropic sampling mask is generated from the directional energy map obtained by the dual-tree complex wavelet transform, and the key and value tensors are sparsely extracted with directional consistency, rather than by regular square window sampling. In the feature reorganization step, formal context lattice algebra operations are introduced to map local details and global attention to the formal concept space for partial order relation reasoning, rather than simple vector concatenation. In the feature enhancement step, the main semantic components are extracted through eigenvalue decomposition and reorganized in the spectral domain after nonlinear enhancement, rather than applying uniform activation along the channel dimension.
[0093] The beneficial effects of the improvements are as follows. Through perspective-invariant geometric descriptors and anisotropic sparse sampling, this invention integrates the topological invariance of projective geometry and multi-directional texture structure into the attention mechanism. This overcomes the slow perception of geometric structures by traditional methods in complex deformation and multi-angle imaging scenarios, and moves from regular visual perception to perspective-invariant geometric reasoning. The introduction of formal context lattice algebra operations enables the model to separate common features and differential features at the logical level, addressing the semantic gap that arises when fusing local details and global semantics. The nonlinear enhancement mechanism of the main semantic component strengthens key defect features and suppresses background noise, improving the geometric robustness and semantic discrimination accuracy of the model in the task of detecting defects on complex silicone surfaces.
[0094] In this embodiment, S7 specifically includes:
[0095] S71. Read the values of the defect mask tensor, traverse the pixel positions where the mask value is greater than zero, extract the horizontal and vertical coordinate values and combine them into coordinate pairs to construct a set of defect pixel coordinates; divide the image space into non-overlapping grid regions, traverse the set of defect pixel coordinates, count the number of defect pixels contained in each grid region, and calculate the proportion of the count in each grid region to the total number of defect pixels as the grid probability value; calculate the logarithm of each grid probability value to the base 2 and multiply it by the negative of the grid probability value, sum the calculation results of all grids to obtain the spatial location entropy. Read the semantic category probability vector, extract the category probability value corresponding to each element in the vector, count the frequency of each category in the current sample as the category probability; calculate the logarithm of each category probability value to the base 2 and multiply it by the negative of the category probability value, sum the calculation results of all categories to obtain the category distribution entropy. Set the weighting coefficient of spatial location entropy to 0.6 and the weighting coefficient of category distribution entropy to 0.4. Calculate the result of multiplying the spatial location entropy value by the weighting coefficient 0.6 and the result of multiplying the category distribution entropy value by the weighting coefficient 0.4. Add the two product results and output the distribution state entropy.
[0096] S72. Using the distribution state entropy value as the dependent variable of the objective function, set the search lower limit for the illumination intensity parameter to 100 lux and the upper limit to 1000 lux, and the search lower limit for the polarization angle parameter to 0 degrees and the upper limit to 180 degrees, constructing a parameter search space containing both the illumination intensity and polarization angle dimensions. Within the parameter search space, divide the grid with fixed step sizes of 5 lux and 1 degree. Traverse all grid nodes, reading the illumination intensity and polarization angle values of the current node; calculate the square of the difference between the current illumination intensity value and the reference illumination intensity value, calculate the square of the difference between the current polarization angle value and the reference polarization angle value, add the two squares and take the square root to obtain the parameter distance value; raise the natural constant e to the power of the negative parameter distance value to obtain the parameter response coefficient; multiply the parameter response coefficient by the baseline distribution state entropy value to obtain the predicted distribution state entropy value of the current node; compare the predicted values of all grid nodes, and select the grid node with the largest value as the initial optimal parameter combination. Calculate the first-order differences of the illumination intensity and polarization angle values at the current optimal parameter combination, setting the learning rate to 0.01. Add the product of the corresponding first-order difference and the learning rate to the illumination intensity value, and add the product of the corresponding first-order difference and the learning rate to the polarization angle value, to obtain the updated parameter values. 
Determine if the updated parameter values exceed the range defined by the search lower and upper limits. If the illumination intensity value is less than the lower limit, assign it the lower limit value; if the illumination intensity value is greater than the upper limit, assign it the upper limit value. Similarly, if the polarization angle value is less than the lower limit, assign it the lower limit value; if the polarization angle value is greater than the upper limit, assign it the upper limit value. Repeat the parameter update and boundary correction steps. Stop iterating when the change in the predicted distribution state entropy value calculated in ten consecutive iterations is less than 0.001. Extract the illumination intensity and polarization angle values of the current iteration, and output the optimal illumination intensity parameters of the light source and the optimal polarization angle parameters of the analyzer.
[0097] S73. Read the optimal illumination intensity parameter of the light source and the optimal polarization angle parameter of the analyzer. Calculate the difference between the optimal illumination intensity parameter and the current actual illumination intensity of the light source. Multiply the difference by the adjustment step coefficient to obtain the illumination intensity adjustment amount. Add the illumination intensity adjustment amount to the current actual illumination intensity of the light source and output the updated light source operating voltage value. Calculate the difference between the optimal polarization angle parameter and the current actual angle of the analyzer. Multiply the difference by the adjustment step coefficient to obtain the polarization angle adjustment amount. Add the polarization angle adjustment amount to the current actual angle of the analyzer and output the updated analyzer rotation angle value. The image acquisition device is triggered to acquire the image data at the current moment. The system iterates through the pixel values in the image data and extracts the coordinates of pixels with mask values greater than zero to construct the defect coordinate set for the current frame. The image space is divided into non-overlapping grid regions. The number of defect pixels in each grid region is counted, and the proportion of each grid region's count to the total number of defect pixels is calculated as the grid probability. For each grid, the base-2 logarithm of the grid probability is multiplied by the negative of the grid probability, and the results of all grids are summed to obtain the spatial position entropy of the current frame. The semantic category probability vector of the current frame is read, the probability value of each category is extracted, the base-2 logarithm of each probability value is calculated and multiplied by the negative of that probability value, and the results of all categories are summed to obtain the category distribution entropy of the current frame. 
The spatial position entropy multiplied by a weighting coefficient of 0.6 and the category distribution entropy multiplied by a weighting coefficient of 0.4 are calculated to output the distribution state entropy of the current frame. Read the distribution state entropy value from the previous moment, calculate the absolute value of the difference between the current frame distribution state entropy and the previous moment distribution state entropy, divide the absolute value by the previous moment distribution state entropy value to obtain the rate of change of distribution state entropy; set a preset threshold of 0.01, compare the calculated rate of change value with the preset threshold value. If the rate of change value is greater than or equal to the preset threshold value, it is determined that the system has not met the convergence condition, and the adjustment and calculation steps continue; if the rate of change value is less than the preset threshold value, it is determined that the convergence condition is met, and the current light source operating voltage value and analyzer rotation angle value are locked.
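The quantities in step S73 can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions: the category distribution entropy is computed here as the standard Shannon entropy, an empty mask is assigned zero entropy, and the 2x2 grid, the sample mask, and all function names are hypothetical choices made for the example.

```python
import math


def spatial_position_entropy(mask, grid_rows, grid_cols):
    """Entropy in bits of how defect pixels (mask > 0) spread over a grid."""
    height, width = len(mask), len(mask[0])
    counts = [[0] * grid_cols for _ in range(grid_rows)]
    for y in range(height):
        for x in range(width):
            if mask[y][x] > 0:
                counts[y * grid_rows // height][x * grid_cols // width] += 1
    total = sum(map(sum, counts))
    if total == 0:
        return 0.0  # assumption: a frame without defect pixels has zero entropy
    probs = [c / total for row in counts for c in row if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def category_entropy(probabilities):
    """Shannon entropy in bits of the semantic category probability vector."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def distribution_state_entropy(mask, class_probs, grid=(2, 2),
                               w_space=0.6, w_class=0.4):
    """Weighted sum with the 0.6 / 0.4 coefficients given in the text."""
    return (w_space * spatial_position_entropy(mask, *grid)
            + w_class * category_entropy(class_probs))


def proportional_step(current, optimal, step_coefficient):
    """Move the actuator part of the way toward the optimal setting."""
    return current + step_coefficient * (optimal - current)


def has_converged(current_entropy, previous_entropy, threshold=0.01):
    """Relative change test; previous_entropy must be non-zero."""
    return abs(current_entropy - previous_entropy) / previous_entropy < threshold


# One defect pixel in each quadrant of a 4x4 mask, two equally likely classes.
mask = [[1, 0, 0, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 1]]
entropy_now = distribution_state_entropy(mask, [0.5, 0.5])
```

With four equally loaded grid cells the spatial term is 2 bits and the category term is 1 bit, so the weighted sum is 0.6 * 2 + 0.4 * 1 = 1.6.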
[0098] Example 1: To verify the feasibility of this invention in the quality inspection of silicone products, the method of this invention was applied to the automated quality inspection production line of a well-known medical silicone catheter manufacturer (hereinafter referred to as "Company S"). Traditional silicone surface defect detection systems usually rely on threshold segmentation based on ordinary machine vision or on conventional convolutional neural network algorithms. Because silicone is inherently highly reflective and has a complex curved surface structure, these methods struggle to accurately identify defects such as microbubbles and scratches against a strong specular reflection background, and they cannot effectively overcome the feature distortion caused by changes in imaging angle, which easily leads to misjudged or missed defects. To solve these problems, Company S decided to adopt the AI vision-based silicone product surface defect detection method proposed in this invention.
[0099] During implementation, Company S first acquired the raw image stream of the silicone catheter surface using a vision acquisition unit equipped with a polarized light source and a high-resolution camera. Based on the raw images, the system constructed a network flow graph, used the maximum flow minimum cut algorithm to separate specular and diffuse reflection components, and filled the specular region with the diffuse reflection component, outputting a reflection-suppressed image that completely eliminated specular interference. Simultaneously, multi-scale Shearlet transform and polarization feature matching were used, combined with the mutual information minimization criterion to remove feature redundancy, constructing a multimodal decoupled composite feature map. Company S's technical team performed feature calibration for common defects in medical catheters, such as micro-pinholes and foreign bodies, as a benchmark for model training.
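The maximum-flow minimum-cut separation mentioned above can be illustrated on a toy one-dimensional "image". The sketch below uses the breadth-first augmenting-path search (Edmonds-Karp) and returns the nodes still reachable from the source, which form the specular side of the cut. The edge weight `255 - |gray difference|`, the terminal capacity of 1000, and the four sample gray levels are assumptions made for the example; the text does not fix these values.

```python
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; returns (max_flow, nodes on the source side of the min cut)."""
    n = len(capacity)
    residual = [row[:] for row in capacity]

    def bfs_parents():
        # Breadth-first search over edges with remaining residual capacity.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    max_flow = 0
    while True:
        parent = bfs_parents()
        if parent[sink] == -1:
            # No augmenting path left: reachable nodes define the minimum cut.
            return max_flow, {v for v in range(n) if parent[v] != -1}
        # Minimum residual capacity along the augmenting path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Update residual capacities in both directions.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        max_flow += bottleneck


# Toy row of pixels: two bright (specular-like) and two dark ones.
gray = [250, 240, 60, 50]
n = len(gray) + 2
SOURCE, SINK = 0, n - 1
capacity = [[0] * n for _ in range(n)]
for i in range(len(gray) - 1):
    weight = 255 - abs(gray[i] - gray[i + 1])  # similar neighbors are costly to separate
    capacity[i + 1][i + 2] = capacity[i + 2][i + 1] = weight
capacity[SOURCE][1] = 1000       # brightest pixel seeded as specular
capacity[n - 2][SINK] = 1000     # darkest pixel seeded as background

flow, source_side = min_cut_source_side(capacity, SOURCE, SINK)
specular_pixels = sorted(v - 1 for v in source_side if v not in (SOURCE, SINK))
```

The cut falls on the weakest link, between the second and third pixels, so the first two pixels are labeled specular and the rest diffuse.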
[0100] Company S extracted the basis vector matrix and the coefficient matrix representing the defect texture basis through non-negative matrix factorization. The coefficient matrix was then subjected to sparsity constraints and highlight enhancement to reconstruct and output an enhanced defect feature map. Next, the illumination-invariant feature sequence and the enhanced defect feature map were fused and input into the improved EdgeViT model. By constructing a perspective-invariant geometric descriptor, using orientation-anisotropic sparse sampling to capture global dependencies, and using formal background grid algebra operations to reorganize local-global features, accurate aggregation of complex surface defect features was achieved. In the core recognition and optimization stage, the system calculated the distribution state entropy based on the output defect mask and semantic category, constructed a constrained objective function, and solved for the optimal light source and analyzer settings by maximizing information entropy. The light source and analyzer were then adjusted iteratively in reverse until the convergence condition was met, achieving adaptive optimal control of the imaging environment.
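The factorization step can be sketched as projected gradient descent in the manner of step S51: take a gradient step on the basis and coefficient matrices, then force negative entries to zero. The sketch below is a simplified illustration under assumptions: it minimizes half the squared Euclidean reconstruction error, runs a fixed number of iterations instead of testing convergence, and the step size, seed, sample matrix, and function names are hypothetical.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Euclidean distance between V and the reconstruction W @ H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rr in zip(v, wh) for x, y in zip(rv, rr))


def nmf_projected_gradient(v, rank, step=0.01, iterations=2000, seed=0):
    """Factor V (rows x cols) into non-negative W (rows x rank) and H (rank x cols)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(rows)]   # basis vectors
    h = [[rng.random() for _ in range(cols)] for _ in range(rank)]   # coefficients
    initial_error = squared_error(v, w, h)
    for _ in range(iterations):
        # Residual R = W @ H - V drives both partial derivatives.
        residual = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(matmul(w, h), v)]
        grad_w = matmul(residual, transpose(h))   # dF/dW = R @ H^T
        grad_h = matmul(transpose(w), residual)   # dF/dH = W^T @ R
        # Gradient step followed by projection onto the non-negative orthant.
        w = [[max(0.0, x - step * g) for x, g in zip(rw, rg)] for rw, rg in zip(w, grad_w)]
        h = [[max(0.0, x - step * g) for x, g in zip(rh, rg)] for rh, rg in zip(h, grad_h)]
    return w, h, initial_error, squared_error(v, w, h)


# Small non-negative feature matrix with two dominant patterns.
features = [[1.0, 0.8, 0.0],
            [0.9, 0.7, 0.1],
            [0.0, 0.1, 1.0]]
basis, coeffs, error_before, error_after = nmf_projected_gradient(features, rank=2)
enhanced = matmul(basis, coeffs)  # reconstruction from basis and coefficients
```

The sparsity constraint and highlight enhancement applied to the coefficient matrix are not modeled here; in the described method they would act on `coeffs` before the final multiplication.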
[0101] During implementation, the technical team at Company S found that, compared with traditional machine vision inspection methods, the method of this invention significantly improves the accuracy and robustness of silicone surface defect detection. Traditional methods are limited by reflective interference and reliance on a single feature type, resulting in poor identification of low-contrast scratches and curved surface deformation defects. In contrast, the method of this invention achieves accurate identification of minute defects and adaptive optimization of imaging parameters through graph-theoretic reflection suppression, perspective-invariant feature reconstruction, and closed-loop active imaging.
[0102] To further verify the actual performance of the method of the present invention, Company S conducted a detailed comparative test between the method of the present invention and the traditional method. The specific performance data is shown in Table 1:
[0103] Table 1. Performance comparison of surface defect detection methods for silicone products at Company S
[0104]
| Metric | Traditional method | Method of the present invention | Change |
| --- | --- | --- | --- |
| Defect identification accuracy (%) | 83.5 | 98.2 | +14.7 percentage points |
| Minute defect detection rate (%) | 76.8 | 97.5 | +20.7 percentage points |
| Reflection-induced false alarm rate (%) | 15.2 | 0.5 | -96.7% |
| Missed defects on curved surfaces (per thousand pieces) | 12.5 | 0.8 | -93.6% |
| Processing time per frame (milliseconds) | 210 | 85 | -59.5% |
| Quality inspection labor cost (ten thousand yuan / year) | 120 | 45 | -62.5% |
| First-pass yield (%) | 92.0 | 99.5 | +7.5 percentage points |
| Customer quality complaint rate (%) | 4.8 | 0.2 | -95.8% |
[0105] As shown in Table 1, the performance of the silicone product surface defect detection system was comprehensively improved after applying the method of this invention. The defect identification accuracy increased from 83.5% with traditional methods to 98.2%, and the detection rate of minute defects increased from 76.8% to 97.5%, significantly improving the accuracy of quality inspection. The false alarm rate due to reflectivity decreased dramatically from 15.2% to 0.5%, effectively solving the problem of false alarms caused by high reflectivity. The processing time for a single frame of data was shortened from 210 milliseconds to 85 milliseconds, meeting the real-time requirements of high-speed production lines. Furthermore, the production line first-pass yield increased from 92.0% to 99.5%, and the labor cost for quality inspection decreased from 1.2 million yuan / year to 450,000 yuan / year, significantly reducing production costs. The customer quality complaint rate decreased from 4.8% to 0.2%, greatly improving the product's market reputation.
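The last column of Table 1 appears to follow two conventions: for the metrics that rose, the figure matches the absolute difference between the two percentages, and for the metrics that fell, it matches the relative reduction. The short check below reproduces the reported figures under that reading; the function names are illustrative only.

```python
def absolute_change(before, after):
    """Difference in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


def relative_change(before, after):
    """Change relative to the starting value, in percent, rounded to one decimal place."""
    return round((after - before) / before * 100, 1)


# (traditional method, method of the invention) as listed in Table 1
accuracy = (83.5, 98.2)
false_alarm_rate = (15.2, 0.5)
frame_time_ms = (210, 85)

accuracy_gain = absolute_change(*accuracy)             # percentage points
false_alarm_drop = relative_change(*false_alarm_rate)  # percent of the original rate
frame_time_drop = relative_change(*frame_time_ms)      # percent of the original time
```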
[0106] Through the method of this invention, Company S has successfully achieved high-precision detection and intelligent quality control of surface defects in silicone products. It effectively solves the identification problems caused by high reflectivity and curved surface distortion, ensures the quality and safety of medical silicone products, significantly improves the automation and intelligence level of the production line, significantly reduces the burden of manual quality inspection, enhances the environmental adaptability and robustness of the detection system, and provides strong technical support for the quality control of high-end silicone products.
[0107] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A method for detecting surface defects in silicone products based on AI vision, characterized in that it includes the following steps:
S1. Acquire the original silicone surface image, construct a network flow graph, use the maximum-flow minimum-cut algorithm to separate the specular reflection and diffuse reflection components, fill the specular area with the diffuse reflection component, and output the reflection suppression image;
S2. Perform a two-dimensional discrete Fourier transform on the reflection suppression image, calculate the phase consistency value of each frequency component in the frequency domain, extract the illumination-invariant phase features, and reconstruct and output the illumination-invariant feature sequence;
S3. Perform a multi-scale Shearlet transform on the reflection suppression image, construct a sparse representation matrix using the anisotropic Shearlet coefficients, and generate a Shearlet feature pyramid;
S4. Calculate the polarization features of the reflection suppression image and match them with the Shearlet feature pyramid, use the mutual information minimization criterion to remove feature redundancy, and construct a multimodal decoupled composite feature map;
S5. Perform non-negative matrix factorization on the multimodal decoupled composite feature map, extract the basis vector matrix and coefficient matrix, reconstruct the features based on the coefficient matrix, and output the defect enhancement feature map;
S6. Fuse the illumination-invariant feature sequence with the defect enhancement feature map and input the result into the improved EdgeViT model; capture global dependencies by constructing perspective-invariant geometric descriptors and performing orientation-anisotropic sparse sampling; reorganize local-global features using formal background grid algebra operations; aggregate context dependencies after nonlinear enhancement of the main semantic component and residual fusion; and, after restoring spatial resolution, output the defect mask and semantic category;
S7. Calculate the distribution state entropy based on the defect mask and semantic category, construct the constrained objective function, solve for the optimal solution of the light source and analyzer by maximizing the information entropy, and iterate in reverse until the convergence condition is met.
2. The method for detecting surface defects of silicone products based on AI vision according to claim 1, characterized in that, S1 specifically includes: S11. Construct a network flow graph based on the original silicone surface image, map image pixels to network nodes, calculate the gray-level difference between adjacent pixels as a smoothing penalty term, and calculate edge weights by combining spatial adjacency relationships to generate a weighted undirected graph. S12. In the weighted undirected graph, mark the specular reflection pixel as the source and the background pixel as the sink. Use the breadth-first search strategy to iteratively find the augmenting path from the source to the sink. Calculate the minimum residual capacity on the augmenting path and update the edge weights. When there is no augmenting path in the graph, define the set of edges between the set of reachable nodes and the set of unreachable nodes of the source as the minimum cut set. Divide the image pixels into specular reflection components and diffuse reflection components. S13. Extract the texture direction information of the diffuse reflection component, calculate the gradient magnitude and color mean of the pixels around the specular reflection area, search for the image block with the smallest pixel value difference in the diffuse reflection component along the gradient direction, copy the searched image block into the interior of the specular reflection area, calculate the weighted average value of the pixel colors on both sides of the filling boundary to correct the pixel value, and output the reflection suppression image.
3. The method for detecting surface defects of silicone products based on AI vision according to claim 1, characterized in that, S2 specifically includes: S21. Perform a two-dimensional discrete Fourier transform on the reflection suppression image to convert the image from the spatial domain to the frequency domain, and calculate the local phase information and amplitude information of each frequency component in the frequency domain. S22. Calculate the local energy by performing a sine- and cosine-weighted summation on the local phase information, calculate the frequency-weighted sum by accumulating the amplitude information, calculate the ratio of the local energy to the frequency-weighted sum, use the ratio as the phase consistency measurement result so as to eliminate the influence of illumination intensity changes, and output the illumination-invariant phase features. S23. Perform an inverse Fourier transform on the illumination-invariant phase features to reconstruct the spatial feature map, flatten the spatial feature map into a one-dimensional vector in row-scanning order, and output the illumination-invariant feature sequence.
4. The method for detecting surface defects of silicone products based on AI vision according to claim 1, characterized in that, S3 specifically includes: S31. Perform Laplacian pyramid multi-scale decomposition on the reflection suppression image to obtain multi-resolution sub-band images, perform a directional shearing operation on the multi-resolution sub-band images, and calculate the anisotropic Shearlet coefficients of each directional sub-band. S32. Calculate the mean and standard deviation of the anisotropic Shearlet coefficients, add a preset multiple of the standard deviation to the mean to obtain the statistical threshold, set the coefficients with absolute values less than the statistical threshold to zero, retain the coefficients with absolute values greater than or equal to the statistical threshold, perform non-negative sparse coding on the retained coefficients, and construct a sparse representation matrix. S33. Reorganize the sparse representation matrix by scale level and apply feature mapping to generate the Shearlet feature pyramid.
5. The method for detecting surface defects of silicone products based on AI vision according to claim 1, characterized in that, S4 specifically includes: S41. Calculate the Stokes parameter vector based on the reflection suppression image, calculate the degree of polarization and polarization angle of the light wave according to the Stokes parameter vector, and generate a polarization feature map by mapping. S42. Align the polarization feature map with the Shearlet feature pyramid in terms of spatial scale and channel matching to construct a multimodal feature pair; S43. Calculate the joint probability distribution and marginal probability distribution of the multimodal feature pairs, calculate the mutual information value based on the joint probability distribution and the marginal probability distribution, minimize the mutual information value using the gradient descent method to remove feature redundancy, and construct a multimodal decoupled composite feature map.
6. The method for detecting surface defects of silicone products based on AI vision according to claim 1, characterized in that, S5 specifically includes: S51. Expand the multimodal decoupled composite feature map into a feature matrix, randomly initialize the basis vector matrix and coefficient matrix, calculate the Euclidean distance between the feature matrix and the reconstruction matrix as the objective function, calculate the partial derivatives of the objective function with respect to the basis vector matrix and the coefficient matrix respectively, subtract the partial derivatives multiplied by the preset step size from the current basis vector matrix and coefficient matrix to obtain the updated matrices, set the elements of the updated matrices that are less than zero to zero, and, when the objective function converges, output the basis vector matrix representing the defect texture basis and the coefficient matrix representing the activation weights. S52. Perform sparsity constraint and highlight enhancement processing on the coefficient matrix, retaining the defect feature response and suppressing the background noise response, to generate an optimized coefficient matrix. S53. Perform matrix multiplication on the basis vector matrix and the optimized coefficient matrix to reconstruct the feature space and output the defect enhancement feature map.
7. The method for detecting surface defects of silicone products based on AI vision according to claim 1, characterized in that the improved EdgeViT model includes a feature embedding layer, a sparse sampling attention layer, a local-global feature reorganization layer, a context-dependent aggregation layer, and a defect prediction head.
The feature embedding layer is used to map the illumination-invariant feature sequence back to the spatial dimension and concatenate it with the defect enhancement feature map, extract local feature point sets and construct a point cluster topology, calculate the cross-ratio sequence of the point clusters in the projective transformation space to generate perspective-invariant geometric descriptors, and concatenate these descriptors with the original feature tensor to output the sequence tensor.
The sparse sampling attention layer is used to receive the sequence tensor and generate query, key, and value tensors, analyze the query tensor with a dual-tree complex wavelet transform to construct a directional energy spectrum and generate an anisotropic sampling mask, perform direction-consistent sparse extraction on the key and value tensors, and calculate scaled dot-product attention to output a sparse global attention tensor.
The local-global feature reorganization layer is used to extract local texture features of the sequence tensor to obtain a local detail tensor, map the local detail tensor and the sparse global attention tensor to the formal background space, establish a partial order relationship of feature attributes, extract common features and merge differing features, and output the reorganized context tensor through grid algebra operations.
The context-dependent aggregation layer is used to receive the reorganized context tensor, perform eigenvalue decomposition to separate the main semantic component matrix from the detail component matrix, apply nonlinear enhancement only to the main semantic component matrix, recombine it with the detail component matrix in the spectral domain, and fuse the result with the sequence tensor through a residual connection to output the dependency-enhanced feature tensor.
The defect prediction head is used to receive the dependency-enhanced feature tensor, restore the spatial resolution using a reconstruction upsampling operator based on operator spectral decomposition, and process the result through two independent convolutional branches to output the defect mask tensor and the semantic category probability vector.
8. The method for detecting surface defects of silicone products based on AI vision according to claim 1, characterized in that, Specifically, S7 includes: S71. Calculate the set of defect pixel coordinates based on the defect mask, statistically analyze the spatial distribution histogram of pixel coordinates and calculate the spatial position entropy, and simultaneously statistically analyze the frequency probability of each category based on the semantic category and calculate the category distribution entropy. Then, perform a weighted summation of the spatial position entropy and the category distribution entropy and output the distribution state entropy. S72. Construct an objective function for light intensity and polarization angle with the distribution state entropy as the dependent variable, set the threshold range of light intensity and the range of polarization angle as constraints, perform global search and iterative optimization in the parameter space defined by the constraints, update the parameters until the distribution state entropy reaches a maximum value, and output the optimal configuration parameters of the light source and the analyzer. S73. Adjust the working state of the light source and analyzer in reverse iteration according to the optimal configuration parameters, obtain the updated image data and calculate the distribution state entropy of the current frame in real time. When the rate of change of the distribution state entropy of the current frame is less than a preset threshold, it is determined that the convergence condition is met.