Method and device for detecting abnormal packaging box based on semantic constraint and geometric prior

Using semantic constraints and geometric priors, and leveraging 3D point cloud and image features for packaging box anomaly detection, this method addresses the low detection efficiency and insufficient accuracy of existing technologies and achieves high-precision packaging box defect detection.

CN122089714B · Active · Publication Date: 2026-06-26 · XIAMEN UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
XIAMEN UNIV OF TECH
Filing Date
2026-04-21
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing methods for detecting defects in packaging boxes suffer from problems such as low detection efficiency, high subjectivity, insufficient understanding of abnormal samples, difficulty in multimodal information fusion, and insufficient defect localization accuracy. In particular, they are unable to meet the high-precision industrial requirements in packaging box inspection.

Method used

A method based on semantic constraints and geometric priors is adopted. By acquiring the 3D point cloud data and image features of the packaging box, the geometric reliability coefficient is calculated using a geometric coding network. An anomaly score map is generated by combining a local scale prediction network, and threshold segmentation and connected component analysis are performed to achieve high-precision defect detection.

Benefits of technology

It enables high-precision and high-reliability detection and location of packaging box anomalies without relying on manually labeled data, thereby improving the detection efficiency and quality of industrial production.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a packaging box anomaly detection method and device based on semantic constraints and geometric priors, and relates to the technical field of defect detection. The method extracts local geometric description information from three-dimensional point cloud data, inputs the information into a geometric perception measurement module to generate a geometric reliability coefficient, and uses the coefficient to adaptively modulate a local scale parameter predicted based on a direction vector to construct a multi-modal anomaly measurement constrained by a geometric prior. The method inputs a sample to be detected into a trained anomaly detection model, calculates local anomaly measurement values of each region through the multi-modal anomaly measurement, generates an anomaly score map reflecting the degree of anomaly, and performs threshold segmentation and connected component analysis to realize detection and positioning of packaging box anomalies. The method can realize high-precision packaging box anomaly detection without relying on a large amount of manually labeled data, and significantly improves the reliability and efficiency of packaging box quality detection.

Description

Technical Field

[0001] This invention relates to the field of defect detection technology, specifically to a method and apparatus for detecting packaging box anomalies based on semantic constraints and geometric priors.

Background Technology

[0002] In modern product packaging production lines, quality control of packaging boxes is a crucial link in ensuring product appearance quality and market competitiveness. During production, transportation, and storage, packaging boxes are prone to various defects such as stains, scratches, dents, printing defects, and color deviations. These defects not only affect the product's visual presentation but may also reduce consumer trust in the brand. Therefore, developing efficient and accurate methods for detecting packaging box defects is of significant practical importance.

[0003] Traditional methods for detecting defects in packaging boxes primarily rely on manual visual inspection, which suffers from low efficiency, high subjectivity, and high labor intensity, making it difficult to meet the real-time inspection needs of large-scale production lines. With the development of machine vision technology, automated inspection methods based on image processing are gradually being introduced into industrial production. However, existing machine learning-based defect detection methods typically require a large number of labeled defect samples for supervised training. In real-world industrial scenarios, acquiring defect samples is costly, and the types and forms of defects are diverse, making it difficult to comprehensively cover all possible anomalies, thus limiting the model's generalization ability and detection accuracy.

[0004] In recent years, unsupervised anomaly detection methods have attracted widespread attention because they require only normal samples for training. These methods learn the distribution characteristics of normal samples and classify samples deviating from normal patterns as anomalies, thus alleviating reliance on defective samples to some extent. However, existing unsupervised anomaly detection methods still face many challenges when applied to packaging box inspection. First, the scarcity of anomaly samples leaves the model with insufficient knowledge of anomaly patterns, limiting detection accuracy. Second, packaging box inspection typically involves multimodal data such as 2D images and 3D point clouds; effectively fusing multimodal information to improve robustness and accuracy remains a key challenge. Furthermore, packaging boxes have regular geometric structures, and existing methods fail to fully utilize this geometric prior knowledge to guide the anomaly detection process, resulting in insufficient ability to identify geometric anomalies such as dents and protrusions. Finally, existing methods still have limitations in the precise localization of anomaly regions, making it difficult to meet the high-precision defect localization requirements of industrial production.

[0005] In view of the above, this application is hereby submitted.

Summary of the Invention

[0006] This invention provides a method and apparatus for detecting packaging box anomalies based on semantic constraints and geometric priors, which can at least partially alleviate the above-mentioned problems.

[0007] To achieve the above objectives, the present invention adopts the following technical solution:

[0008] A method for detecting packaging box anomalies based on semantic constraints and geometric priors, comprising:

[0009] The three-dimensional point cloud data of the packaging box sample to be inspected is obtained, and the local geometric description vector extracted from the three-dimensional point cloud data is input into the geometric coding network to obtain the geometric reliability coefficient.

[0010] Image features and point cloud features of the packaging box sample to be detected are extracted. Based on the feature memory, the direction vectors between the image features, point cloud features and their corresponding normal prototypes are calculated respectively. The direction vectors are then input into a pre-trained local scale prediction network to obtain scale parameter data.

[0011] Local anomaly metrics are calculated based on scale parameter data, an anomaly score map is generated, and threshold segmentation and connected component analysis are performed on the anomaly score map to generate anomaly detection results.

[0012] The present invention also provides a packaging box anomaly detection device based on semantic constraints and geometric priors, which includes:

[0013] The coefficient calculation unit is used to acquire the three-dimensional point cloud data of the packaging box sample to be tested, and input the local geometric description vector extracted from the three-dimensional point cloud data into the geometric coding network to obtain the geometric reliability coefficient.

[0014] The prediction unit is used to extract image features and point cloud features of the packaging box sample to be detected. Based on the feature memory, it calculates the direction vector between the image features, point cloud features and their corresponding normal prototypes, and inputs the direction vector into the pre-trained local scale prediction network to obtain scale parameter data.

[0015] The detection unit is used to calculate local anomaly metrics based on scale parameter data, generate anomaly score maps, and perform threshold segmentation and connected component analysis on the anomaly score maps to generate anomaly detection results.

[0016] In summary, this invention provides a packaging box anomaly detection method based on semantic constraints and geometric priors. It aims to address technical problems in existing technologies such as scarce anomaly samples, difficulties in multimodal information fusion, insufficient utilization of geometric structures, and limited defect localization accuracy, achieving high-precision and high-reliability packaging box anomaly detection and localization. Specifically, local geometric description information is extracted from the 3D point cloud of the packaging box sample to be detected, generating a geometric reliability coefficient to adaptively modulate local scale parameters, and constructing a multimodal anomaly metric constrained by geometric priors. The sample to be detected is input into the model, and the local anomaly metric value is calculated to generate an anomaly score map. Anomaly detection and localization are then achieved through threshold segmentation and connected component analysis.

[0017] Compared with existing technologies, this method offers the following advantages: It guides defect simulation through a large language model and constructs a multimodal detection model based on geometric priors. Using this model, it detects newly acquired images and point cloud data in real time, enabling high-precision defect and anomaly identification of packaging boxes. Based on this, it achieves high-precision anomaly detection in industrial packaging without relying on manual data annotation and manual defect detection. This plays a crucial role in automated production and quality inspection, providing a reliable intelligent detection method for improving industrial product quality and production efficiency. It also reduces data preparation costs and facilitates rapid deployment in real-world production environments.

Attached Figure Description

[0018] Figure 1 is a flowchart illustrating the packaging box anomaly detection method based on semantic constraints and geometric priors provided in the first embodiment of the present invention.

[0019] Figure 2 is an overall flowchart of the packaging box anomaly detection method based on semantic constraints and geometric priors provided in the embodiments of the present invention.

[0020] Figure 3 is a flowchart of the packaging box anomaly detection method based on semantic constraints and geometric priors provided in the embodiments of the present invention.

[0021] Figure 4 is a schematic diagram of a packaging box anomaly detection device based on semantic constraints and geometric priors provided in the second embodiment of the present invention.

Detailed Implementation

[0022] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0023] Referring to Figures 1 to 3, the first embodiment of the present invention discloses a packaging box anomaly detection method based on semantic constraints and geometric priors, which can be executed by a packaging box anomaly detection device based on semantic constraints and geometric priors (hereinafter referred to as the detection device), specifically, by one or more processors within the detection device, to implement the following method:

[0024] S1. Obtain the 3D point cloud data of the packaging box sample to be inspected, and input the local geometric description vector extracted from the 3D point cloud data into the geometric coding network to obtain the geometric reliability coefficient.

[0025] Specifically, step S1 further includes: acquiring the three-dimensional point cloud data of the packaging box sample to be inspected and traversing the points in the data; for the i-th three-dimensional point $p_i$, taking the point as the center and searching for its K nearest neighbors in space to form a local neighborhood point set $\mathcal{N}_i$;

[0026] Performing least-squares plane fitting on the local neighborhood point set $\mathcal{N}_i$, and taking the average Euclidean distance from all nearest neighbors in $\mathcal{N}_i$ to the fitted plane as the plane fitting residual $r_i$ of the region: $r_i = \frac{1}{K} \sum_{p_j \in \mathcal{N}_i} \left| \mathbf{n}_i^{\top} (p_j - \bar{p}_i) \right|$, where $p_j$ is the j-th 3D point, $\bar{p}_i$ is the center point of the neighborhood, $\top$ denotes the transpose, and $\mathbf{n}_i$ is the unit normal vector of the i-th fitted plane; the plane fitting residual is used to measure the smoothness and noise level of the local surface;

[0027] Calculating the unit normal vector at each point within the local neighborhood point set $\mathcal{N}_i$, and taking the deviation or standard deviation of the directions of these unit normal vectors as the normal vector consistency $c_i$, where $\mathbf{n}_j$ is the j-th unit normal vector; the normal vector consistency is used to characterize whether there are edges, corners or geometric abrupt changes in the local region;

[0028] Combining the plane fitting residual $r_i$ and the normal vector consistency $c_i$ into a local geometric description vector $\mathbf{d}_i = [r_i, c_i]$, inputting $\mathbf{d}_i$ into the geometric coding network, and obtaining the corresponding geometric reliability coefficient through activation function mapping: $g_i = \sigma(\mathrm{MLP}_{\mathrm{geo}}(\mathbf{d}_i))$, where $\sigma$ is the activation function and $\mathrm{MLP}_{\mathrm{geo}}$ is the geometric coding network, which comprises a two-layer MLP (multilayer perceptron). A lower geometric reliability coefficient indicates greater geometric instability in the region (e.g., noise points, edges, or sharp features), and thus lower reliability.

[0029] In this embodiment, the geometric reliability coefficient is calculated by inputting the local geometric description vector into a lightweight geometric coding network, which includes two layers of MLP and outputs intermediate features; then, the geometric reliability coefficient is calculated through a modulation function; the lower the geometric reliability coefficient value, the more unstable the geometry of the region and the worse the reliability.

[0030] Specifically, the process begins by acquiring the 3D point cloud data of the packaging box sample to be inspected. The local geometric description vector extracted from this data is then input into a geometric coding network to map and obtain a geometric reliability coefficient. The purpose of this step is to quantify and analyze the local geometric structure of the packaging box surface, providing geometric prior constraints for subsequent anomaly detection. Specifically, after acquiring the 3D point cloud data of the packaging box sample, the process iterates through the i-th 3D point in the data. For each 3D point, its K nearest neighbors in space are searched, forming a local neighborhood point set. By constructing this local neighborhood point set, the local geometric structure information around the point can be effectively captured, laying the foundation for subsequent geometric feature extraction. Next, the local neighborhood point set is subjected to least squares plane fitting processing. The average Euclidean distance from all nearest neighbor points in the local neighborhood point set to the fitting plane is calculated, and this distance is used as the plane fitting residual for the region. The plane fitting residual can effectively measure the flatness and noise level of the local surface. When there are geometric anomalies such as pits, bumps or damage on the surface of the packaging box, the residual value will increase significantly.
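
The neighborhood search and plane fitting residual described in [0030] can be sketched as follows. This is a minimal NumPy illustration rather than the patented implementation: the function name `plane_fit_residuals`, the brute-force neighbor search, the default `k=8`, the use of the neighborhood centroid as the plane's center point, and the PCA-based (orthogonal least-squares) fit are all assumptions made for the example.

```python
import numpy as np

def plane_fit_residuals(points: np.ndarray, k: int = 8) -> np.ndarray:
    """Mean point-to-plane distance over each point's k nearest neighbours.

    points: (N, 3) float array. Returns an (N,) array of residuals.
    """
    n = points.shape[0]
    # Brute-force pairwise squared distances (adequate for small clouds only).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Column 0 is the point itself (distance 0, assuming no duplicates); skip it.
    knn = np.argsort(dist2, axis=1)[:, 1:k + 1]

    residuals = np.empty(n)
    for i in range(n):
        nbrs = points[knn[i]]
        centroid = nbrs.mean(axis=0)          # assumed "center point" of the neighbourhood
        centered = nbrs - centroid
        # Orthogonal least-squares plane: the normal is the eigenvector of the
        # covariance matrix with the smallest eigenvalue (eigh sorts ascending).
        _, eigvecs = np.linalg.eigh(np.cov(centered.T))
        normal = eigvecs[:, 0]
        # Average absolute distance of the neighbours to the fitted plane.
        residuals[i] = np.abs(centered @ normal).mean()
    return residuals
```

On a perfectly flat patch the residual is numerically zero; raising a single point out of the plane increases the residual of the neighborhoods that contain it, which matches the behavior the paragraph describes for pits and bumps.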

[0031] Simultaneously, the unit normal vectors of each point within the local neighborhood point set are calculated, and the deviation or standard deviation of these unit normal vector directions is used as the normal vector consistency. Normal vector consistency effectively characterizes whether there are edges, corners, or geometric abrupt changes in the local region. For a flat packaging box surface, the normal vector directions of each point are basically consistent, and the normal vector consistency value is relatively small; while for regions with edges or geometric distortions, the normal vector directions differ significantly, and the normal vector consistency value increases accordingly. Subsequently, the plane fitting residuals obtained above are combined with the normal vector consistency to form a local geometric description vector, which can comprehensively reflect the geometric characteristics of the local region. This local geometric description vector is input into a geometric encoding network, and the corresponding geometric reliability coefficient is obtained through activation function mapping.
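
A hedged sketch of the normal vector consistency from [0031] is given below. The source only says that a "deviation or standard deviation" of the normal directions is used, so the concrete statistic here (mean angle between each neighbor's normal and the center point's normal) is an assumption, as are the helper names `knn_indices`, `estimate_normals`, and `normal_consistency` and the PCA-based normal estimation.

```python
import numpy as np

def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest points to each point (the point itself included)."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    return np.argsort(dist2, axis=1)[:, :k]

def estimate_normals(points: np.ndarray, knn: np.ndarray) -> np.ndarray:
    """Unit normal per point from a PCA plane fit of its neighbourhood."""
    normals = np.empty_like(points, dtype=float)
    for i, idx in enumerate(knn):
        centered = points[idx] - points[idx].mean(axis=0)
        # The right singular vector with the smallest singular value is the normal.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        normals[i] = vt[-1]
    return normals

def normal_consistency(points: np.ndarray, k: int = 8) -> np.ndarray:
    """Mean angular deviation (radians) of neighbour normals from the centre normal."""
    knn = knn_indices(points, k)
    normals = estimate_normals(points, knn)
    out = np.empty(len(points))
    for i, idx in enumerate(knn):
        # abs() removes the sign ambiguity of PCA normals (n and -n are equivalent).
        cos = np.clip(np.abs(normals[idx] @ normals[i]), 0.0, 1.0)
        out[i] = np.arccos(cos).mean()
    return out
```

With this definition, a flat face yields a value close to zero, while a point on the fold between two faces yields a clearly larger value, in line with the qualitative behavior stated above.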

[0032] In this embodiment, the geometric coding network comprises two layers of multilayer perceptrons (MLPs). Through mapping using this geometric coding network, the original geometric features can be transformed into geometric reliability coefficients with clear physical meaning. The geometric reliability coefficient ranges from 0 to 1; a lower value indicates a more unstable geometric structure in the region, such as the presence of noise points, edges, or sharp features, resulting in lower reliability; conversely, a higher value indicates a smoother and more stable geometric structure in the region, resulting in higher reliability. By introducing the geometric reliability coefficient, this application can adaptively modulate point cloud features, reducing the weight of abnormal responses in geometrically unstable regions, thereby effectively suppressing false detections caused by noise or edges, and significantly improving the accuracy and robustness of anomaly detection. This geometric prior constraint mechanism fully utilizes the regular geometric structure of the packaging box itself, making the detection process more consistent with the physical laws of actual industrial scenarios.
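
The mapping from the local geometric description vector to a reliability coefficient in the range 0 to 1 can be illustrated with a small forward pass. This is a sketch under assumptions: the hidden width of 16, the ReLU between the two layers, the sigmoid output activation, and the random (untrained) weights are not stated in the source and are chosen only so the example runs.

```python
import numpy as np

def init_geo_mlp(hidden: int = 16, seed: int = 0) -> dict:
    """Randomly initialised two-layer MLP; a trained network would replace these weights."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (2, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, 1)), "b2": np.zeros(1),
    }

def geometric_reliability(desc: np.ndarray, params: dict) -> np.ndarray:
    """desc: (N, 2) rows of [plane fitting residual, normal vector consistency]."""
    hidden = np.maximum(desc @ params["W1"] + params["b1"], 0.0)  # layer 1 + ReLU (assumed)
    logits = hidden @ params["W2"] + params["b2"]                 # layer 2
    return (1.0 / (1.0 + np.exp(-logits))).ravel()                # sigmoid maps to (0, 1)

# Example: three regions with increasing geometric irregularity.
descriptors = np.array([[0.0, 0.0], [0.5, 1.2], [3.0, 0.1]])
coefficients = geometric_reliability(descriptors, init_geo_mlp())
```

After training, regions with large residuals or inconsistent normals would be expected to receive coefficients near 0, which is what lets the coefficient down-weight point cloud responses at edges and noisy points.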

[0033] S2. Extract the image features and point cloud features of the packaging box sample to be detected, and calculate the direction vectors between the image features, the point cloud features and their corresponding normal prototypes based on the feature memory. Then, input the direction vectors into the pre-trained local scale prediction network to obtain the scale parameter data. The local scale prediction network is a three-layer fully connected network.

[0034] Specifically, step S2 further includes: extracting image features $f^{I}$ and point cloud features $f^{P}$ from the packaging box sample to be detected using an image encoder and a point cloud encoder, respectively;

[0035] Retrieving, from the image feature memory $\mathcal{M}^{I}$, the prototype feature $m^{I}$ that is closest to the image feature $f^{I}$ in Euclidean distance, and calculating the direction vector $v^{I}$ between the image feature $f^{I}$ and its corresponding normal prototype, where $d^{I} = \| f^{I} - m^{I} \|_2$ is the Euclidean distance between the image feature $f^{I}$ and the prototype feature $m^{I}$ of the corresponding normal prototype, and $\|\cdot\|_2$ is the L2 norm;

[0036] Retrieving, from the point cloud feature memory $\mathcal{M}^{P}$, the prototype feature $m^{P}$ that is closest to the point cloud feature $f^{P}$ in Euclidean distance, and calculating the direction vector $v^{P}$ between the point cloud feature $f^{P}$ and its corresponding normal prototype, where $d^{P} = \| f^{P} - m^{P} \|_2$ is the Euclidean distance between the point cloud feature $f^{P}$ and the prototype feature $m^{P}$ of the corresponding normal prototype;

[0037] Inputting the direction vector $v^{I}$ and the direction vector $v^{P}$ into the local scale prediction network to obtain the basic image scale parameter $w^{I}$ and the basic point cloud scale parameter $w^{P}$, where the calculation uses a three-layer multilayer perceptron and a two-layer multilayer perceptron;

[0038] Constraining the basic point cloud scale parameter $w^{P}$ based on the geometric reliability coefficient $g$ to suppress anomalous responses in unreliable regions, thereby obtaining the point cloud scale parameter $\tilde{w}^{P}$; and generating the scale parameter data based on the basic image scale parameter (that is, the image branch scale parameter remains unchanged) and the point cloud scale parameter.

[0039] In this embodiment, an image encoder and a point cloud encoder are used to extract features from the packaging box sample to be detected, obtaining image features and point cloud features respectively. In this embodiment, the image encoder uses DINOV3 as its backbone network. This network is pre-trained on a large amount of image data through self-supervised learning and can extract visual features with strong discriminative capabilities. The point cloud encoder uses PointMAE as its backbone network and learns the semantic representation of the point cloud through a mask reconstruction task. The features extracted by the above encoders can comprehensively characterize the visual and geometric information of the packaging box surface, laying a solid foundation for subsequent anomaly detection.

[0040] Next, the prototype feature with the closest Euclidean distance to the image feature is retrieved from a pre-built image feature memory, and the direction vector between the image feature and the corresponding normal prototype is calculated. This direction vector effectively characterizes the deviation direction and degree of the image feature to be detected relative to the normal prototype feature, providing key input for subsequent scale parameter prediction. Similarly, the prototype feature with the closest Euclidean distance to the point cloud feature is retrieved from a pre-built point cloud feature memory, and the direction vector between the point cloud feature and the corresponding normal prototype is calculated. By calculating the direction vectors of the image modality and the point cloud modality respectively, this application can comprehensively characterize the differences between the sample to be detected and the normal sample from both visual and geometric dimensions, providing a rich information foundation for anomaly detection through multimodal fusion.
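
The retrieval step in [0040] amounts to a nearest-neighbor lookup in the feature memory followed by a difference of vectors. The sketch below illustrates it for a single modality; the function name `nearest_prototype` and the choice to return the unnormalized difference `query - prototype` as the direction vector are assumptions, since the source does not state whether the direction vector is normalized.

```python
import numpy as np

def nearest_prototype(query: np.ndarray, memory: np.ndarray):
    """Return (index, direction vector, Euclidean distance) of the closest prototype.

    query:  (D,) feature of the sample under test.
    memory: (M, D) normal prototype features stored in the feature memory.
    """
    dist2 = ((memory - query) ** 2).sum(axis=1)
    j = int(np.argmin(dist2))
    direction = query - memory[j]   # assumption: unnormalised deviation from the prototype
    return j, direction, float(np.sqrt(dist2[j]))

# Toy memory with two prototypes; the query lies one unit above the second one.
memory = np.array([[0.0, 0.0], [3.0, 4.0]])
index, direction, distance = nearest_prototype(np.array([3.0, 5.0]), memory)
```

The same routine would be called once on the image feature memory and once on the point cloud feature memory to obtain the two direction vectors.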

[0041] Subsequently, the calculated image orientation vector and point cloud orientation vector are input into a pre-trained local scale prediction network to obtain basic image scale parameters and basic point cloud scale parameters. This local scale prediction network adopts a three-layer fully connected network structure, which can adaptively predict the scale parameters of each modality based on the input orientation vector. Through the learning of this network, the originally isotropic Euclidean distance can be transformed into an anisotropic local distance metric, making the anomaly metric more consistent with the feature distribution characteristics of different regions and modalities, and significantly improving the accuracy of anomaly detection.

[0042] The design of the local scale prediction network takes the correlation between multimodal features into account. In this embodiment, the network architecture includes two parallel branches that process prototype features and direction vectors, respectively. The first branch processes prototype features: its input is the concatenated modality prototype, with dimension 1920; after a Linear→ReLU→Dropout block for dimensionality reduction followed by another Linear layer, a 128-dimensional feature is obtained. The second branch processes direction vectors: its input is the concatenated direction vector, also with dimension 1920, which is likewise transformed into a 128-dimensional feature through two layers.

[0043] Fusion and prediction: The outputs of the two branches are concatenated to form a 256-dimensional feature, which is passed through a Linear→ReLU→Dropout block; a final linear layer then outputs two values, corresponding to the scale parameters of the two modalities. To control the output of the scale parameters, the last layer of the local scale prediction network applies a special activation function to x, the output of the last linear layer of the network. This design constrains the output scale parameter w; in the initial state, the metric degenerates into the Euclidean distance. As the network trains, it learns to predict w based on the geometry of the input, thus transforming the isotropic Euclidean distance into an anisotropic local distance metric.
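
The two-branch layout described in [0042] and [0043] can be sketched as an inference-time forward pass. Several details are assumptions made only for illustration: the hidden width of 512, the weight initialization, the zero-initialized output layer, and in particular the output activation `1 + tanh(x)`, which is a stand-in chosen because it yields a scale of 1 at initialization (so a metric of the form scale × distance reduces to the Euclidean distance); the activation actually used in the patent is not recoverable from the text. Dropout is treated as the identity, as it would be at inference.

```python
import numpy as np

IN_DIM, BRANCH_DIM, FUSED_DIM = 1920, 128, 256   # sizes stated in the text
HIDDEN = 512                                     # assumed hidden width

def make_linear(rng, n_in, n_out, zero=False):
    """Return (weight, bias); zero=True gives an all-zero layer."""
    weight = np.zeros((n_in, n_out)) if zero else rng.normal(0.0, 0.02, (n_in, n_out))
    return weight, np.zeros(n_out)

def init_scale_net(seed=0):
    rng = np.random.default_rng(seed)
    return {
        "proto": (make_linear(rng, IN_DIM, HIDDEN), make_linear(rng, HIDDEN, BRANCH_DIM)),
        "direction": (make_linear(rng, IN_DIM, HIDDEN), make_linear(rng, HIDDEN, BRANCH_DIM)),
        "fuse": make_linear(rng, FUSED_DIM, HIDDEN),
        "head": make_linear(rng, HIDDEN, 2, zero=True),   # assumed zero init
    }

def branch(x, layers):
    """Linear -> ReLU -> (Dropout, identity at inference) -> Linear."""
    (w1, b1), (w2, b2) = layers
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def predict_scales(proto, direction, net):
    """proto, direction: (B, 1920) arrays. Returns (B, 2) scales, one per modality."""
    fused = np.concatenate(
        [branch(proto, net["proto"]), branch(direction, net["direction"])], axis=-1
    )                                                    # (B, 256)
    (wf, bf), (wh, bh) = net["fuse"], net["head"]
    x = np.maximum(fused @ wf + bf, 0.0) @ wh + bh       # (B, 2)
    return 1.0 + np.tanh(x)                              # assumed output activation
```

With the zero-initialized head, every predicted scale equals 1 before training, which mirrors the statement that the metric initially degenerates into the Euclidean distance.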

[0044] Furthermore, after obtaining the basic image scale parameters and basic point cloud scale parameters, this method further constrains the basic point cloud scale parameters based on the geometric reliability coefficient, that is, it combines the geometric reliability coefficient generated by geometric perception with adaptive modulation to obtain the point cloud scale parameters.

[0045] S3. Calculate local anomaly metrics based on the scale parameter data, generate an anomaly score map, and perform threshold segmentation and connected component analysis on the anomaly score map to generate anomaly detection results.

[0046] Specifically, step S3 further includes: combining the basic image scale parameter with the modulated image scaling factor to calculate the local anomaly metric between the image features and their corresponding normal prototypes;

[0047] Combining the point cloud scale parameter with the modulated point cloud scaling factor to calculate the local anomaly metric between the point cloud features and their corresponding normal prototypes, and performing a weighted summation of the image local anomaly metric and the point cloud local anomaly metric to obtain the final local anomaly metric;

[0048] Based on the final local anomaly metric, calculating the set of metric values of the current feature of the packaging box sample to be detected in the k+1 nearest-neighbor local spaces, and then aggregating the set with the minimum operator to obtain the final anomaly score corresponding to the current feature of the packaging box sample to be detected;
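
The scoring steps in [0046] to [0048] can be illustrated with a short sketch. The exact formulas are not recoverable from the text, so the following forms are assumptions: each local metric is taken as scale × Euclidean distance, the geometric reliability coefficient `g` modulates the point cloud scale by simple multiplication, the two modalities are fused with a fixed weight `lam`, and the k+1 candidate prototypes of the two modalities are assumed to be paired by index. The function name `anomaly_score` is hypothetical.

```python
import numpy as np

def anomaly_score(f_img, f_pc, protos_img, protos_pc, w_img, w_pc_base, g, lam=0.5):
    """Fused anomaly score for one feature location.

    f_img, f_pc:            query features, shapes (Di,) and (Dp,).
    protos_img, protos_pc:  (k+1, Di) and (k+1, Dp) nearest normal prototypes.
    w_img, w_pc_base:       (k+1,) predicted scale parameters per candidate.
    g:                      geometric reliability coefficient in [0, 1].
    """
    d_img = np.linalg.norm(protos_img - f_img, axis=1)
    d_pc = np.linalg.norm(protos_pc - f_pc, axis=1)
    w_pc = g * w_pc_base                        # assumed modulation by reliability
    fused = lam * w_img * d_img + (1.0 - lam) * w_pc * d_pc   # weighted summation
    return float(fused.min())                   # minimum operator over the k+1 candidates

score = anomaly_score(
    f_img=np.zeros(2), f_pc=np.zeros(3),
    protos_img=np.array([[3.0, 4.0], [0.0, 1.0]]),
    protos_pc=np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 4.0]]),
    w_img=np.ones(2), w_pc_base=np.ones(2), g=0.5,
)
```

In this toy call the fused metrics of the two candidates are 3.0 and 1.5, so the minimum operator returns 1.5. A lower `g` shrinks the point cloud contribution, which is the intended suppression of responses in geometrically unreliable regions.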

[0049] Map the local anomaly metrics corresponding to all features to the image space or 3D point cloud coordinate system of the original packaging box to generate an anomaly score map that reflects the degree of anomaly in each region of the packaging box.

[0050] The maximum value or weighted average of all local anomaly metrics in the anomaly score map is selected as the final anomaly score S of the packaging box sample to be detected, and the final anomaly score S is compared with the preset safety threshold T.

[0051] When the final anomaly score S is greater than the safety threshold T, the packaging box sample to be tested is determined to have a defect.

[0052] An adaptive thresholding algorithm is used to perform binarization segmentation on the anomaly score map, stripping away the normal background to obtain a preliminary binarized anomaly region map;

[0053] Morphological operations are used to clean up the binarized anomaly region map, eliminating small noise points and making the boundaries of the anomaly regions smoother and more continuous.

[0054] Connected component analysis is performed on the cleaned binary anomaly region map, and spatially adjacent anomaly pixels are aggregated into independent connected blocks. A corresponding bounding box is generated for each extracted anomaly region. Combined with the confidence score of the region, anomaly detection results with bounding boxes and confidence scores are generated, realizing pixel-level anomaly localization. The confidence score is generated from the local anomaly metric.

[0055] Preferably, the local anomaly metrics corresponding to all features are mapped to the image space or three-dimensional point cloud coordinate system of the original packaging box. Specifically, for the image modality, the patch-level anomaly scores are interpolated to the original image resolution through upsampling; for the point cloud modality, the point-by-point anomaly scores are directly mapped back to the three-dimensional space.

[0056] In this embodiment, a local anomaly metric is first obtained by combining point cloud scale parameters, basic image scale parameters, and the Euclidean distance between the features of the sample to be detected and its nearest prototype in the feature memory. Since image feature extraction is based on patches, the directly output local anomaly metric has a much lower spatial resolution than the original image. Therefore, bilinear interpolation and other upsampling algorithms are used to amplify the patch-level anomaly scores and align them to the resolution level of the original image. Simultaneously, to eliminate blocky artifacts caused by interpolation, a Gaussian smoothing kernel is typically superimposed to filter the anomaly scores, resulting in a visually smooth heatmap effect that accurately reflects the anomaly probability of each pixel.
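
The upsampling and smoothing described in [0056] can be sketched in plain NumPy. The function names, the align-corners sampling convention, the kernel radius of three standard deviations, and the edge padding are assumptions for the example; a production system would more likely call an image library for both steps.

```python
import numpy as np

def bilinear_upsample(patch_scores: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly interpolate an (h, w) patch-level score map to (out_h, out_w)."""
    h, w = patch_scores.shape
    ys = np.linspace(0, h - 1, out_h)            # sample rows in patch coordinates
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]                      # vertical interpolation weights
    wx = (xs - x0)[None, :]                      # horizontal interpolation weights
    top = patch_scores[y0][:, x0] * (1 - wx) + patch_scores[y0][:, x1] * wx
    bottom = patch_scores[y1][:, x0] * (1 - wx) + patch_scores[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def gaussian_smooth(img: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Separable Gaussian filter with edge padding; output has the input's shape."""
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

# Example: a 4x4 patch-level map expanded to 16x16 and smoothed into a heatmap.
patch_map = np.arange(16, dtype=float).reshape(4, 4)
heatmap = gaussian_smooth(bilinear_upsample(patch_map, 16, 16), sigma=2.0)
```

Smoothing after interpolation removes the blocky patch boundaries while leaving uniform regions unchanged, because the kernel is normalized to sum to one.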

[0057] Since the point cloud feature encoder extracts feature vectors point-by-point, the system can directly map the local anomaly metrics corresponding to each point back to their exact physical location in the original 3D coordinate system. This generates a 3D point cloud map with an "anomaly weight" attribute, visually displaying the geometric anomaly distribution (such as pits or abnormal protrusions) on the surface of the packaging box. Based on the high-resolution anomaly score map that integrates multimodal information, the maximum value of all local anomaly metrics in the map is extracted. This maximum value represents the degree of anomaly in the "most suspicious" area on the packaging box, and it is used as the final anomaly score S for the sample to be detected. The final anomaly score S is then compared with a pre-set safety threshold T to detect anomalies.

[0058] Upon detecting an anomaly, an adaptive thresholding algorithm is used to segment the continuous anomaly score map, stripping away the normal background to obtain a preliminary binary anomaly region map. Morphological operations are then applied to clean the binary map, making the boundaries of the anomaly regions smoother and more continuous. Connected component analysis is performed on the cleaned binary map, aggregating spatially adjacent anomaly pixels or points into independent connected blocks. Finally, a corresponding bounding box is generated for each extracted anomaly region, and a confidence score (based on the anomaly metric within the region) is attached to achieve anomaly localization.
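
The localization pipeline in [0058] (thresholding, morphological cleaning, connected component analysis, bounding boxes with confidence) can be sketched for the image case as follows. This is an illustrative stand-in: the threshold rule (mean plus a multiple of the standard deviation) replaces the unspecified adaptive thresholding algorithm, the 3×3 opening and 4-connectivity are assumed, the confidence is taken as the mean score inside the region, and all function names are hypothetical.

```python
import numpy as np
from collections import deque

def binarize(score_map: np.ndarray, n_sigma: float = 2.0) -> np.ndarray:
    """Simple data-driven threshold (assumed stand-in for adaptive thresholding)."""
    return score_map > score_map.mean() + n_sigma * score_map.std()

def _neighbourhood_stack(mask: np.ndarray) -> np.ndarray:
    """Stack the nine 3x3-shifted copies of a boolean mask (False outside the image)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])

def opening(mask: np.ndarray) -> np.ndarray:
    """3x3 erosion followed by 3x3 dilation; removes isolated noise pixels."""
    eroded = _neighbourhood_stack(mask).all(axis=0)
    return _neighbourhood_stack(eroded).any(axis=0)

def connected_regions(mask: np.ndarray, score_map: np.ndarray) -> list:
    """4-connected components as dicts with bbox (x_min, y_min, x_max, y_max) and confidence."""
    seen = np.zeros(mask.shape, dtype=bool)
    regions = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        queue, pixels = deque([(int(sy), int(sx))]), []
        while queue:                               # breadth-first flood fill
            y, x = queue.popleft()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                if inside and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        ys, xs = zip(*pixels)
        regions.append({
            "bbox": (min(xs), min(ys), max(xs), max(ys)),
            "confidence": float(np.mean([score_map[p] for p in pixels])),
        })
    return regions
```

A typical call chain is `connected_regions(opening(binarize(score_map)), score_map)`. For point clouds, the same idea would apply with neighbors defined by spatial proximity instead of pixel adjacency.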

[0059] Preferably, in this embodiment, the construction steps of the feature memory are as follows: obtain normal packaging box samples, input the normal packaging box samples into the image encoder and the point cloud encoder, perform multi-scale feature extraction processing, and extract the image features and point cloud features of the normal packaging box samples. The backbone network of the image encoder adopts DINOV3, and the backbone network of the point cloud encoder adopts PointMAE.

[0060] The extracted image features and point cloud features of normal packaging box samples are stored in the feature memory, generating the image feature memory and point cloud feature memory respectively. The most representative normal sample features are then selected through a core set sampling algorithm to form a prototype pattern representing the distribution of normal patterns of packaging boxes.

[0061] In this embodiment, some preparatory work is required before executing this method, such as: pre-constructing a feature memory library based on normal packaging box samples, which includes an image feature memory library and a point cloud feature memory library.

[0062] Specifically, normal samples are input into the encoder for multi-scale feature extraction. The image encoder uses DINOV3 as its backbone network, while the point cloud encoder uses PointMAE. DINOV3 is pre-trained on a large amount of image data through self-supervised learning, enabling it to extract rich visual features. PointMAE learns the semantic representation of the point cloud through a mask reconstruction task. For the input image, DINOV3 outputs multi-scale feature maps; for the input point cloud, PointMAE outputs point-by-point feature vectors.

[0063] For the two modalities, point clouds and RGB images, the extracted point cloud features and image features are stored in their respective feature memories. To avoid an excessively large feature memory and redundant features, a core set sampling algorithm is used to select and retain the most representative normal sample features, thus constructing the feature memory. Core set sampling uses a greedy strategy to iteratively select a subset of samples that maximizes coverage of the original feature space, forming a compact and comprehensive representation of normal patterns. During inference or training, for a query feature extracted at a given location of a sample, the model retrieves the set of 2k+1 nearest feature prototypes from the corresponding memory.
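A greedy core set selection and the subsequent nearest-prototype lookup might look like the sketch below. It assumes the common k-center greedy variant (repeatedly add the feature farthest from the current subset) and Euclidean distance. The function names, the choice of the first feature as the seed, and the toy 2D features are illustrative assumptions.

```python
import math

def greedy_coreset(features, m):
    """Greedy k-center selection: keep adding the feature that is farthest
    from the already selected subset, so the subset covers the feature space."""
    selected = [0]                                   # assumption: seed with the first feature
    min_dist = [math.dist(f, features[0]) for f in features]
    while len(selected) < min(m, len(features)):
        idx = max(range(len(features)), key=min_dist.__getitem__)
        selected.append(idx)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[idx]))
    return [features[i] for i in selected]

def nearest_prototypes(query, memory, n):
    """Return the n prototypes closest to the query in Euclidean distance."""
    return sorted(memory, key=lambda p: math.dist(query, p))[:n]

normal_features = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (0.0, 1.0)]
memory = greedy_coreset(normal_features, m=3)
neighbours = nearest_prototypes((0.95, 0.9), memory, n=2)
```

With real encoder outputs the features would be high-dimensional and the search would normally use an approximate nearest-neighbor index rather than a full sort.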

[0064] Preferably, in this embodiment, the training steps of the local scale prediction network are as follows: a large language model processes preset prompts to generate noise-control parameters, including frequency, anisotropy, octaves, binarization threshold, and morphological kernel;

[0065] An initial Perlin noise matrix is generated in a two-dimensional coordinate space based on frequency and anisotropy. By scaling the distribution along the x and y axes by different proportions, the noise cells are stretched in specific directions to simulate anomalies with directional characteristics.

[0066] Using the octaves parameter, four noise layers with different frequencies and decreasing amplitudes are weighted and superimposed to simulate microscopic irregular edge anomalies; a binarization threshold is used to binarize and truncate the continuous noise field; a morphological kernel is used to perform a closing operation on the binarized mask to fill the internal holes, resulting in the final binarized anomaly mask.

[0067] In the processing of image branches, a texture region is randomly cropped from the image of other normal packaging box samples. Using region replacement technology, the cropped texture is implanted into the specified position of the target normal sample according to the final binarized anomaly mask. Gradient domain boundary fusion technology is used to process the edge of the replacement region so that the implanted anomaly region transitions naturally with the background in terms of brightness, color and texture gradient, generating a visually consistent first synthetic anomaly sample.

[0068] In the processing of point cloud branches, a point set is selected in the point cloud space according to the final binarized anomaly mask, and Z-axis offset or local geometric deformation is applied to maintain spatial consistency with the image modality and generate a second synthetic anomaly sample.

[0069] By combining the first and second synthetic anomalous samples, an anomalous sample set is obtained.

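[0070] A total loss function is constructed based on the abnormal sample set, and the local scale prediction network is trained with this joint total loss: $L_{total} = L_{sep} + \lambda_{bnd} L_{bnd} + \lambda_{con} L_{con} + \lambda_{scale} L_{scale} + \lambda_{align} L_{align}$, where $L_{sep}$ is the separation loss function, $L_{bnd}$ is the boundary loss function, $\lambda_{bnd}$ is the weight of the boundary loss function, $L_{con}$ is the consistency loss function, $\lambda_{con}$ is the weight of the consistency loss function, $L_{scale}$ is the scale constraint loss, $\lambda_{scale}$ is the weight of the scale constraint loss, $L_{align}$ is the cross-modal alignment loss function, and $\lambda_{align}$ is the weight of the cross-modal alignment loss function.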

[0071] In this embodiment, the local scale prediction network needs to be trained before it is invoked. Specifically, the prompt template for the large language model (LLM) is set as follows: "Give a set of possible anomalies on the packaging box. I am using Perlin noise to simulate industrial defects. Please provide the optimal range of the following parameters for the [anomaly name] that may appear on the packaging box: scale (frequency), stretch (anisotropy), threshold (binarization threshold), and octaves. Please output in JSON format." Using the open-domain knowledge of the large language model, semantic mining is performed for the specific industrial category to automatically construct a candidate anomaly library that matches the packaging box material and geometry. For example, the parameters for contamination are: {"scale":[1.0,10.0],"stretch":[0.8,1.2],"threshold":[0.6,0.95],"octaves":[2,5]}. In this way, the parameters that control the noise are generated, including frequency, anisotropy, octaves, binarization threshold, and morphological kernel.
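As an illustration of how such a JSON reply could be consumed, the sketch below parses the example ranges for contamination and draws one concrete parameter set. Uniform sampling within each range and the function name `sample_noise_params` are assumptions; the text only states that the LLM returns ranges.

```python
import json
import random

# The example reply quoted in the text for the "contamination" anomaly.
llm_reply = '{"scale":[1.0,10.0],"stretch":[0.8,1.2],"threshold":[0.6,0.95],"octaves":[2,5]}'

def sample_noise_params(reply: str, rng: random.Random) -> dict:
    """Draw one concrete parameter set from the ranges returned by the LLM."""
    ranges = json.loads(reply)
    params = {}
    for name, (low, high) in ranges.items():
        if name == "octaves":
            params[name] = rng.randint(int(low), int(high))   # integer layer count
        else:
            params[name] = rng.uniform(low, high)             # assumed uniform draw
    return params

params = sample_noise_params(llm_reply, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the synthetic anomaly generation reproducible across training runs.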

[0072] The generation of Perlin noise then involves grid generation, random gradient computation, and smooth interpolation. Frequency determines the grid density: given the range provided by the LLM, the model generates an initial Perlin noise matrix in two-dimensional coordinate space, and a denser grid results in smaller, more fragmented noise patches. Anisotropy determines how the grid varies in specific directions. By scaling the distribution along the X and Y axes at different ratios, the originally circular noise cells are stretched in specific directions; for example, when simulating "scratches," the stretch parameter in one direction can be large. The resulting noise visually resembles smooth clouds, but real industrial defects (such as rust or damage) typically have rough edges. This necessitates the introduction of fractal Brownian motion, mathematically expressed as a weighted superposition of multiple layers of noise with different frequencies and amplitudes. The octaves parameter determines the number of layers in this superposition. A higher number of octaves introduces more high-frequency micro-details, resulting in more realistic jagged and irregular edges in the final synthesized noise. After the frequency, anisotropy, and octaves adjustments, a 2D continuous grayscale noise map is generated. To distinguish which pixels belong to the anomalous region and which belong to the background, the binarization threshold provided by the LLM is used to truncate the continuous noise field. For example, if the threshold is 0.7, pixels with values greater than 0.7 in the noise map are set to 1 (representing anomalous regions), and the remaining pixels are set to 0 (representing the background). After binarization, there may be some scattered holes or spikes within the region. The model uses the morphological kernel parameters to perform a closing operation (dilation followed by erosion) to fill in the internal holes, ultimately obtaining a clean binarized anomaly mask.
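The mask-generation pipeline described above can be sketched in pure Python. This is an illustrative gradient-noise implementation, not the patent's code. The hash-seeded gradients, the halving of amplitude and doubling of frequency per octave, the min-max normalization to [0, 1] before thresholding, and the fixed 3x3 closing kernel are all assumptions.

```python
import math
import random

def _fade(t):
    # Perlin's smoothing polynomial 6t^5 - 15t^4 + 10t^3.
    return t * t * t * (t * (t * 6 - 15) + 10)

def perlin(x, y, seed):
    """Gradient noise at (x, y) with pseudo-random unit gradients on the integer grid."""
    def grad_dot(ix, iy):
        rng = random.Random(hash((ix, iy, seed)))
        angle = rng.uniform(0.0, 2.0 * math.pi)
        return math.cos(angle) * (x - ix) + math.sin(angle) * (y - iy)
    x0, y0 = math.floor(x), math.floor(y)
    u, v = _fade(x - x0), _fade(y - y0)
    top = grad_dot(x0, y0) + u * (grad_dot(x0 + 1, y0) - grad_dot(x0, y0))
    bottom = grad_dot(x0, y0 + 1) + u * (grad_dot(x0 + 1, y0 + 1) - grad_dot(x0, y0 + 1))
    return top + v * (bottom - top)

def anomaly_mask(size, scale, stretch, threshold, octaves, seed=0):
    """Binary mask from anisotropic fractal noise, normalised to [0, 1]."""
    field = []
    for r in range(size):
        row = []
        for c in range(size):
            total, amp, freq = 0.0, 1.0, scale / size
            for o in range(octaves):
                # Dividing x by `stretch` elongates the noise cells along x.
                total += amp * perlin(c * freq / stretch, r * freq, seed + o)
                amp *= 0.5       # decreasing amplitude per layer
                freq *= 2.0      # increasing frequency per layer
            row.append(total)
        field.append(row)
    lo = min(map(min, field))
    hi = max(map(max, field))
    span = (hi - lo) or 1.0
    return [[1 if (v - lo) / span > threshold else 0 for v in row] for row in field]

def _morph(mask, op):
    # 3x3 dilation (op=max) or erosion (op=min); pixels outside the image are ignored.
    h, w = len(mask), len(mask[0])
    return [[op(mask[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]

def closing(mask):
    """Dilation followed by erosion, filling small internal holes."""
    return _morph(_morph(mask, max), min)

mask = closing(anomaly_mask(size=24, scale=4.0, stretch=1.0, threshold=0.7, octaves=4))
```

A `stretch` well above 1 produces elongated, scratch-like regions, while values near 1 give roughly isotropic blobs such as stains.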

[0073] After obtaining the binarized anomaly mask, in the image branch processing, the source image (other normal packaging box images in the training set) is cropped according to the mask, color / brightness enhancement is performed, and then it is pasted to the corresponding position in the target image (the normal sample image currently being processed), and visual fusion is achieved through edge feathering; in the point cloud branch processing, a point set is selected in the point cloud space according to the mask, and Z-axis offset or local geometric deformation is applied; anomaly sample sets of image and point cloud are generated.
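For the point cloud branch, a Z-axis offset under the mask could be applied as in the sketch below. It assumes the mask is aligned with the XY plane of the point cloud and that each mask cell covers a `cell` by `cell` square. The function name and the `depth` parameter are hypothetical.

```python
def apply_z_offset(points, mask, cell, depth):
    """Push points whose (x, y) falls in a masked cell down along Z, forming a dent."""
    out = []
    for x, y, z in points:
        r, c = int(y // cell), int(x // cell)         # mask row and column for this point
        inside = 0 <= r < len(mask) and 0 <= c < len(mask[0]) and mask[r][c]
        out.append((x, y, z - depth if inside else z))
    return out

mask = [[0, 1],
        [0, 0]]
cloud = [(0.5, 0.5, 1.0), (1.5, 0.5, 1.0), (1.5, 1.5, 1.0)]
dented = apply_z_offset(cloud, mask, cell=1.0, depth=0.25)
```

Using the same mask for the image paste and the point offset is what keeps the two synthetic modalities spatially consistent.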

[0074] Finally, based on the anomaly sample set of images and point clouds, a total loss function is constructed to train the local scale prediction network. The total loss function consists of a separation loss function, a boundary loss function, a consistency loss function, a scale constraint loss, and a cross-modal alignment loss function.

[0075] Specifically, the separation loss function aims to increase the distance between normal and abnormal samples in the feature space. The training objective is to minimize the local distance metric for normal samples and maximize it for abnormal samples. Synthesized abnormal samples serve as the target boundary for optimization, forcing the network to move abnormal features away from the normal prototypes in the feature space. The loss is defined in terms of a label that marks each sample as normal or abnormal, the local distance metric between the i-th sample and its nearest prototype, and a margin giving the minimum distance that an abnormal sample should reach.
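One loss with the behavior described here is a hinge-style contrastive term, sketched below. The exact formula is not recoverable from the text, so this form is an assumption, as is the label coding (0 for normal, 1 for abnormal) and the averaging over samples.

```python
def separation_loss(distances, labels, margin):
    """Hinge-style separation: pull normal samples (label 0) toward their
    nearest prototype and push abnormal samples (label 1) out to `margin`."""
    terms = [d if y == 0 else max(0.0, margin - d)
             for d, y in zip(distances, labels)]
    return sum(terms) / len(terms)

# Two normal samples close to their prototypes, one abnormal sample already
# beyond the margin (no penalty) and one still inside it (penalised).
loss = separation_loss([0.2, 0.4, 1.5, 0.5], [0, 0, 1, 1], margin=1.0)
```

Abnormal samples that already exceed the margin contribute nothing, so training effort concentrates on anomalies that still sit close to normal prototypes.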

[0076] To further optimize the classification boundary and reduce the overlap between the normal and abnormal distributions, a boundary loss function is designed. It is defined in terms of the number of normal samples, the number of abnormal samples, an upper bound on the distance of normal samples, and a lower bound on the distance of abnormal samples; both bounds are extreme values computed dynamically during model training from the data used in the current computation.

[0077] The consistency loss function requires that the metric values between a feature point and its K nearest-neighbor prototypes remain continuous, using the nearest-neighbor relationships in the feature memory to maintain the smoothness of the local geometric space. Here, K is the number of nearest-neighbor prototypes retrieved; the loss is defined over the local distance between the i-th sample and its k-th neighboring prototype, together with a scaling factor that controls how far distances within the neighborhood may deviate from the nearest-neighbor distance.

[0078] The scale constraint loss constrains the scale parameters output by the local scale prediction network to prevent parameter divergence. Synthetic anomalies are often more pronounced than real-world anomalies, and the model exploits this by applying asymmetric constraints to the scale parameters during training: for normal samples, the scale parameter is compressed; for synthesized abnormal samples, the scale parameter is pushed to its theoretical upper limit. This teaches the network to greatly amplify the anomaly score when it encounters similar abnormal features. The loss is indexed by modality, where P denotes the point cloud modality and R denotes the image modality.

[0079] The cross-modal alignment loss function ensures that the local scale prediction network learns paired cross-modal relationships. When the matching relationship between a normal image and its point cloud is broken (for example, by forcibly concatenating an image feature with a mismatched point cloud feature to form a negative sample), the model uses the scale parameter corresponding to the synthesized abnormal samples as the target upper limit for the scale parameter in the cross-modal alignment scenario. As a result, the network also assigns high scores to mismatched cross-modal features, treating them as anomalies.

[0080] In summary, this method generates synthetic anomaly samples through the semantic understanding capabilities of a large language model, constructs a multimodal feature memory, and designs a geometry-aware metric module that imposes geometric prior constraints on the anomaly detection process. The method achieves high-precision, high-reliability anomaly detection and localization for packaging boxes without relying on a large amount of manually labeled data, and provides a reliable intelligent inspection method for improving packaging product quality and production efficiency.

[0081] Referring to Figure 4, the second embodiment of the present invention provides a packaging box anomaly detection device based on semantic constraints and geometric priors, which includes:

[0082] The coefficient calculation unit 101 is used to acquire the three-dimensional point cloud data of the packaging box sample to be tested, and input the local geometric description vector extracted from the three-dimensional point cloud data into the geometric coding network to map and obtain the geometric reliability coefficient.
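Claim 1 defines the local geometric description vector as a plane-fitting residual plus a normal-vector consistency term. A rough sketch of those two quantities follows. It rests on several assumptions: the plane is fitted as z = a·x + b·y + c (so surfaces must not be vertical in the sensor frame; a PCA-based fit would lift that restriction), neighbors are found by brute force, and consistency is measured as the mean angle between the center normal and its neighbors' normals. The geometric coding network itself is not shown.

```python
import math

def knn(points, i, k):
    """Indices of the k points nearest to point i, excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def fit_plane(nbrs):
    """Least-squares fit of z = a*x + b*y + c; returns (unit normal, centroid)."""
    n = len(nbrs)
    cx, cy, cz = (sum(p[d] for p in nbrs) / n for d in range(3))
    sxx = sum((p[0] - cx) ** 2 for p in nbrs)
    syy = sum((p[1] - cy) ** 2 for p in nbrs)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in nbrs)
    sxz = sum((p[0] - cx) * (p[2] - cz) for p in nbrs)
    syz = sum((p[1] - cy) * (p[2] - cz) for p in nbrs)
    det = sxx * syy - sxy * sxy
    a = (sxz * syy - syz * sxy) / det
    b = (syz * sxx - sxz * sxy) / det
    norm = math.sqrt(a * a + b * b + 1.0)
    return (-a / norm, -b / norm, 1.0 / norm), (cx, cy, cz)

def plane_residual(nbrs):
    """Mean point-to-plane distance |n^T (p_j - c)| over the neighbourhood."""
    normal, c = fit_plane(nbrs)
    dists = [abs(sum(nk * (pk - ck) for nk, pk, ck in zip(normal, p, c))) for p in nbrs]
    return sum(dists) / len(dists)

def local_descriptor(points, i, k):
    """Return (plane residual, normal consistency) for point i."""
    nbr_idx = knn(points, i, k)
    nbrs = [points[j] for j in nbr_idx]
    n_i, _ = fit_plane(nbrs)
    angles = []
    for j in nbr_idx:
        # Normal at each neighbour, estimated from that neighbour's own k-neighbourhood.
        n_j, _ = fit_plane([points[m] for m in knn(points, j, k)])
        cos = min(1.0, abs(sum(a * b for a, b in zip(n_i, n_j))))
        angles.append(math.acos(cos))
    return plane_residual(nbrs), sum(angles) / len(angles)

flat = [(float(x), float(y), 0.0) for y in range(5) for x in range(5)]
bumped = list(flat)
bumped[7] = (2.0, 1.0, 0.3)                      # raise one point to mimic a bulge
flat_desc = local_descriptor(flat, 12, k=8)
bump_desc = local_descriptor(bumped, 12, k=8)
```

On the flat grid both descriptor components are zero, while the raised point increases both the residual and the spread of normals.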

[0083] The prediction unit 102 is used to extract image features and point cloud features of the packaging box sample to be detected. Based on the feature memory, it calculates the direction vectors between the image features and point cloud features and their corresponding normal prototypes, and inputs the direction vectors into the pre-trained local scale prediction network to obtain scale parameter data.
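The direction vector computed by this unit can be illustrated as the normalized difference between a feature and its nearest normal prototype, following the Euclidean distance and L2 norm named in claim 2. The function name and the toy 2D features are assumptions, and the zero-distance guard is an added safeguard not discussed in the text.

```python
import math

def direction_to_prototype(feature, memory):
    """Unit vector from the nearest normal prototype to the feature, plus the distance."""
    proto = min(memory, key=lambda m: math.dist(feature, m))
    diff = [f - p for f, p in zip(feature, proto)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist == 0.0:
        return [0.0] * len(diff), 0.0      # feature coincides with a prototype
    return [d / dist for d in diff], dist

direction, distance = direction_to_prototype((3.0, 4.0), [(0.0, 0.0), (10.0, 10.0)])
```

The unit vector carries only the direction of the deviation, leaving its magnitude to be weighted by the predicted scale parameter.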

[0084] The detection unit 103 is used to calculate local anomaly metrics based on scale parameter data, generate anomaly score maps, and perform threshold segmentation and connected component analysis on the anomaly score maps to generate anomaly detection results.

[0085] The above description represents the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications are also considered to be within the scope of protection of the present invention.

Claims

1. A packaging box anomaly detection method based on semantic constraints and geometric priors, characterized in that it comprises:
acquiring three-dimensional point cloud data of a packaging box sample to be detected, inputting a local geometric description vector extracted from the three-dimensional point cloud data into a geometric coding network, and obtaining a geometric reliability coefficient by mapping, specifically:
acquiring the three-dimensional point cloud data of the packaging box sample to be detected, traversing each i-th three-dimensional point $p_i$ in the point cloud data, and, taking that point as the center, searching for its K nearest neighbors in space to form a local neighborhood point set $\mathcal{N}_i$;
performing least-squares plane fitting on $\mathcal{N}_i$, and taking the average Euclidean distance from all nearest neighbors in $\mathcal{N}_i$ to the fitted plane as the plane fitting residual $r_i$ of the corresponding region, $r_i = \frac{1}{K} \sum_{p_j \in \mathcal{N}_i} \left| n_i^{\top} (p_j - \bar{p}_i) \right|$, where $p_j$ is the j-th three-dimensional point, $\bar{p}_i$ is the center point of the neighborhood, $\top$ denotes the transpose, and $n_i$ is the unit normal vector of the i-th fitted plane;
computing the unit normal vector at each point in $\mathcal{N}_i$, and taking the deviation or standard deviation of the directions of these unit normal vectors as the normal vector consistency $c_i$, where $n_j$ is the unit normal vector of the j-th fitted plane;
combining the plane fitting residual $r_i$ and the normal vector consistency $c_i$ into the local geometric description vector $g_i = [r_i, c_i]$, inputting $g_i$ into the geometric coding network, and obtaining the corresponding geometric reliability coefficient through activation function mapping, $w_i = \sigma(\mathrm{MLP}_{geo}(g_i))$, where $\sigma$ is the activation function and $\mathrm{MLP}_{geo}$ is the geometric coding network, the geometric coding network comprising two MLP layers;
extracting image features and point cloud features of the packaging box sample to be detected;
calculating, based on the feature memory, the direction vectors between the image features and the point cloud features and their corresponding normal prototypes, inputting the direction vectors into a pre-trained local scale prediction network, and obtaining scale parameter data in combination with the geometric reliability coefficient;
calculating local anomaly metrics based on the scale parameter data, generating an anomaly score map, and performing threshold segmentation and connected component analysis on the anomaly score map to generate anomaly detection results.

2. The packaging box anomaly detection method based on semantic constraints and geometric priors according to claim 1, characterized in that extracting the image features and point cloud features of the packaging box sample to be detected, calculating, based on the feature memory, the direction vectors between the image features and the point cloud features and their corresponding normal prototypes, and inputting the direction vectors into the pre-trained local scale prediction network to obtain the scale parameter data specifically comprises:
extracting an image feature $f^{R}$ and a point cloud feature $f^{P}$ from the packaging box sample to be detected using an image encoder and a point cloud encoder, respectively;
retrieving from the image feature memory $\mathcal{M}^{R}$ the prototype feature $m^{R}$ closest to the image feature $f^{R}$ in Euclidean distance, and calculating the direction vector between the image feature and its normal prototype, $u^{R} = (f^{R} - m^{R}) / \lVert f^{R} - m^{R} \rVert_2$, where $\lVert f^{R} - m^{R} \rVert_2$ is the Euclidean distance between the image feature and the corresponding normal prototype and $\lVert \cdot \rVert_2$ is the L2 norm;
retrieving from the point cloud feature memory $\mathcal{M}^{P}$ the prototype feature $m^{P}$ closest to the point cloud feature $f^{P}$ in Euclidean distance, and calculating the direction vector between the point cloud feature and its normal prototype, $u^{P} = (f^{P} - m^{P}) / \lVert f^{P} - m^{P} \rVert_2$, where $\lVert f^{P} - m^{P} \rVert_2$ is the Euclidean distance between the point cloud feature and the corresponding normal prototype;
inputting the direction vectors $u^{R}$ and $u^{P}$ into the local scale prediction network to obtain the basic image scale parameter and the basic point cloud scale parameter, the local scale prediction network comprising a three-layer multilayer perceptron and a two-layer multilayer perceptron;
constraining the basic point cloud scale parameter with the geometric reliability coefficient to obtain the point cloud scale parameter, and generating the scale parameter data from the basic image scale parameter and the point cloud scale parameter.

3. The packaging box anomaly detection method based on semantic constraints and geometric priors according to claim 2, characterized in that calculating local anomaly metrics based on the scale parameter data, generating an anomaly score map, and performing threshold segmentation and connected component analysis on the anomaly score map to generate anomaly detection results specifically comprises:
calculating the local anomaly metric between the image feature and its corresponding normal prototype by combining the basic image scale parameter with the modulated image scaling factor;
calculating the local anomaly metric between the point cloud feature and its corresponding normal prototype by combining the point cloud scale parameter with the modulated point cloud scaling factor, and performing a weighted summation of the image local anomaly metric and the point cloud local anomaly metric to obtain the final local anomaly metric;
calculating, based on the final local anomaly metric, the set of metrics of the current feature of the packaging box sample to be detected in the k+1 nearest-neighbor local spaces, and aggregating this set with the minimum operator to obtain the final anomaly score corresponding to the current feature;
mapping the local anomaly metrics corresponding to all features to the image space or the three-dimensional point cloud coordinate system of the original packaging box to generate an anomaly score map that reflects the degree of anomaly in each region of the packaging box;
selecting the maximum value or the weighted average of all local anomaly metrics in the anomaly score map as the final anomaly score S of the packaging box sample to be detected, comparing the final anomaly score S with a preset safety threshold T, and, when the final anomaly score S is greater than the safety threshold T, determining that the packaging box sample to be detected has a defect.

4. The packaging box anomaly detection method based on semantic constraints and geometric priors according to claim 3, characterized in that, All local anomaly metrics corresponding to the features are mapped to the image space or 3D point cloud coordinate system of the original packaging box. Specifically, for the image modality, the patch-level anomaly scores are interpolated to the original image resolution through upsampling; for the point cloud modality, the point-by-point anomaly scores are directly mapped back to the 3D space.

5. The packaging box anomaly detection method based on semantic constraints and geometric priors according to claim 3, characterized in that it further comprises: performing binarization segmentation on the anomaly score map with an adaptive thresholding algorithm, stripping away the normal background to obtain a preliminary binarized anomaly region map; cleaning the binarized anomaly region map with morphological operations so that the boundaries of the anomaly regions in the map are smooth and continuous; performing connected component analysis on the cleaned binarized anomaly region map, aggregating spatially adjacent anomalous pixels into independent connected blocks, generating a corresponding bounding box for each extracted anomaly region, and, combined with the confidence score of the region, generating an anomaly detection result with bounding boxes and confidence scores, the confidence score being generated from the local anomaly metric.

6. The packaging box anomaly detection method based on semantic constraints and geometric priors according to claim 1, characterized in that, The steps for constructing the feature memory are as follows: A normal packaging box sample is obtained and input into an image encoder and a point cloud encoder for multi-scale feature extraction processing. The image features and point cloud features of the normal packaging box sample are extracted. The backbone network of the image encoder is DINOV3 and the backbone network of the point cloud encoder is PointMAE. The extracted image features and point cloud features of normal packaging box samples are stored in the feature memory, generating the image feature memory and point cloud feature memory respectively. The most representative normal sample features are then selected through a core set sampling algorithm to form a prototype pattern representing the distribution of normal patterns of packaging boxes.

7. The packaging box anomaly detection method based on semantic constraints and geometric priors according to claim 1, characterized in that the training steps of the local scale prediction network are as follows:
a large language model processes preset prompts to generate noise-control parameters, including frequency, anisotropy, octaves, binarization threshold, and morphological kernel;
an initial Perlin noise matrix is generated in a two-dimensional coordinate space based on frequency and anisotropy, and the distribution is scaled along the x-axis and y-axis by different proportions to simulate anomalies with directional characteristics;
using the octaves parameter, four noise layers with different frequencies and decreasing amplitudes are weighted and superimposed to simulate microscopic irregular edge anomalies; a binarization threshold is used to binarize and truncate the continuous noise field; a morphological kernel is used to perform a closing operation on the binarized mask to fill the internal holes, resulting in the final binarized anomaly mask;
in the image branch, a texture region is randomly cropped from the image of another normal packaging box sample; using region replacement, the cropped texture is implanted at the specified position of the target normal sample according to the final binarized anomaly mask; gradient-domain boundary fusion is applied to the edge of the replaced region so that the implanted anomaly region transitions naturally into the background in brightness, color, and texture gradient, generating a visually consistent first synthetic anomaly sample;
in the point cloud branch, a point set is selected in the point cloud space according to the final binarized anomaly mask, and a Z-axis offset or local geometric deformation is applied to maintain spatial consistency with the image modality, generating a second synthetic anomaly sample;
the first synthetic anomaly sample and the second synthetic anomaly sample are combined to obtain an abnormal sample set.

8. The packaging box anomaly detection method based on semantic constraints and geometric priors according to claim 7, characterized in that it further comprises: constructing a total loss function based on the abnormal sample set, and training the local scale prediction network with the joint total loss function $L_{total} = L_{sep} + \lambda_{bnd} L_{bnd} + \lambda_{con} L_{con} + \lambda_{scale} L_{scale} + \lambda_{align} L_{align}$, where $L_{sep}$ is the separation loss function, $L_{bnd}$ is the boundary loss function, $\lambda_{bnd}$ is the weight of the boundary loss function, $L_{con}$ is the consistency loss function, $\lambda_{con}$ is the weight of the consistency loss function, $L_{scale}$ is the scale constraint loss, $\lambda_{scale}$ is the weight of the scale constraint loss, $L_{align}$ is the cross-modal alignment loss function, and $\lambda_{align}$ is the weight of the cross-modal alignment loss function.

9. A packaging box anomaly detection device based on semantic constraints and geometric priors, characterized in that it comprises:
a coefficient calculation unit, used to acquire three-dimensional point cloud data of a packaging box sample to be detected, input a local geometric description vector extracted from the three-dimensional point cloud data into a geometric coding network, and obtain a geometric reliability coefficient by mapping, specifically:
acquiring the three-dimensional point cloud data of the packaging box sample to be detected, traversing each i-th three-dimensional point $p_i$ in the point cloud data, and, taking that point as the center, searching for its K nearest neighbors in space to form a local neighborhood point set $\mathcal{N}_i$;
performing least-squares plane fitting on $\mathcal{N}_i$, and taking the average Euclidean distance from all nearest neighbors in $\mathcal{N}_i$ to the fitted plane as the plane fitting residual $r_i$ of the corresponding region, $r_i = \frac{1}{K} \sum_{p_j \in \mathcal{N}_i} \left| n_i^{\top} (p_j - \bar{p}_i) \right|$, where $p_j$ is the j-th three-dimensional point, $\bar{p}_i$ is the center point of the neighborhood, $\top$ denotes the transpose, and $n_i$ is the unit normal vector of the i-th fitted plane;
computing the unit normal vector at each point in $\mathcal{N}_i$, and taking the deviation or standard deviation of the directions of these unit normal vectors as the normal vector consistency $c_i$, where $n_j$ is the unit normal vector of the j-th fitted plane;
combining the plane fitting residual $r_i$ and the normal vector consistency $c_i$ into the local geometric description vector $g_i = [r_i, c_i]$, inputting $g_i$ into the geometric coding network, and obtaining the corresponding geometric reliability coefficient through activation function mapping, $w_i = \sigma(\mathrm{MLP}_{geo}(g_i))$, where $\sigma$ is the activation function and $\mathrm{MLP}_{geo}$ is the geometric coding network, the geometric coding network comprising two MLP layers;
a prediction unit, used to extract image features and point cloud features of the packaging box sample to be detected, calculate, based on the feature memory, the direction vectors between the image features and the point cloud features and their corresponding normal prototypes, input the direction vectors into the pre-trained local scale prediction network, and obtain scale parameter data in combination with the geometric reliability coefficient;
a detection unit, used to calculate local anomaly metrics based on the scale parameter data, generate an anomaly score map, and perform threshold segmentation and connected component analysis on the anomaly score map to generate anomaly detection results.