Method and device for detecting defects in a piece of jewelry based on image processing

By combining clamping torque vectors, inertial measurement data, and polarization filtering with attitude quaternion transformation and multi-scale dilated convolution, the method addresses the lack of real-time feedback and precise control in defect detection during traditional jewelry processing, providing efficient defect identification and stable processing quality.

CN122199476A · Pending · Publication Date: 2026-06-12 · SHENZHEN JINCHAO JEWELRY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN JINCHAO JEWELRY CO LTD
Filing Date
2026-03-13
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In traditional jewelry processing, defect detection relies on human experience, making it difficult to achieve real-time feedback and precise control. Furthermore, the strong specular reflection of metal jewelry and the changing posture of the clamping device lead to high false positive and false negative rates.

Method used

Clamping torque vectors, inertial measurement data, and jewelry surface images are acquired, and polarization filtering and global defect feature recognition are applied to the images. An attitude quaternion computed from the torque and inertial data is converted into point cloud data of the clamping device. A geometric perception feature decoupling network separates attitude features from defect features. Multi-scale dilated convolution and fuzzy adaptive control are used to calculate the spindle speed adjustment and the tool compensation.

Benefits of technology

It enables real-time defect detection and feedback control of processing parameters, effectively suppressing defect propagation and improving processing quality stability and detection accuracy.

✦ Generated by Eureka AI based on patent content.


Abstract

The application relates to the technical field of image processing, and discloses a jewelry defect detection method and device based on image processing, wherein the method comprises the following steps: collecting a clamping torque vector, inertial measurement data and a jewelry surface image in a jewelry processing process; performing polarization filtering on the jewelry surface image to obtain a reflection-removed image, and performing global defect feature recognition on the reflection-removed image to obtain a defect position coordinate set; calculating a posture quaternion based on the clamping torque vector and the inertial measurement data, and converting the posture quaternion into target point cloud data of a clamping device to input the target point cloud data into a geometric perception feature decoupling network to calculate a posture geometric feature; and calculating a main shaft rotating speed adjustment amount and a tool compensation amount based on the defect position coordinate set and the posture geometric feature, so that the method realizes feedback control of defect detection and processing parameter regulation, can dynamically optimize a processing track according to real-time defect distribution, effectively suppresses defect expansion and guarantees processing quality stability.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a method and apparatus for detecting jewelry defects based on image processing.

Background Technology

[0002] Jewelry processing demands extremely high standards for processing quality and surface integrity. Traditional jewelry processing relies on manual experience for quality monitoring and parameter adjustment, making it difficult to achieve real-time feedback and precise control of the processing. This leads to frequent problems such as the accumulation of surface defects, severely impacting product yield and processing efficiency. The strong specular reflection of metal jewelry surfaces causes severe light spot interference in images captured by industrial cameras. Traditional image preprocessing methods cannot effectively eliminate specular reflection components, so defect features are submerged in reflection noise. Even minute changes in the posture of the clamping device during processing can shift the relative position of the jewelry. Existing defect detection algorithms cannot distinguish feature changes caused by posture variations from actual defect features, resulting in high false positive and false negative rates.

Summary of the Invention

[0003] This invention provides a jewelry defect detection method and apparatus based on image processing. This invention realizes feedback control of defect detection and processing parameter adjustment, and can dynamically optimize the processing trajectory according to the real-time defect distribution, effectively suppressing defect expansion and ensuring the stability of processing quality.

[0004] In a first aspect, the present invention provides a jewelry defect detection method based on image processing, comprising: collecting a clamping torque vector, inertial measurement data, and a jewelry surface image during jewelry processing; performing polarization filtering on the jewelry surface image to obtain a dereflection image, and performing global defect feature identification on the dereflection image to obtain a defect location coordinate set; calculating an attitude quaternion based on the clamping torque vector and the inertial measurement data, converting the attitude quaternion into target point cloud data of the clamping device, and inputting the target point cloud data into a geometric perception feature decoupling network to calculate attitude geometric features; and calculating a spindle speed adjustment and a tool compensation based on the defect location coordinate set and the attitude geometric features.

[0005] In conjunction with the first aspect, in a first implementation of the first aspect of the present invention, collecting the clamping torque vector, inertial measurement data, and jewelry surface image during jewelry processing includes: collecting the three-axis linear force components and three-axis torque components of the clamping device in real time with a six-axis torque sensor, and combining them to obtain the clamping torque vector; synchronously acquiring the three-axis acceleration and three-axis angular velocity of the clamping device with an inertial measurement unit, and using them as the inertial measurement data; and photographing the jewelry surface during processing with an industrial camera to obtain the jewelry surface image.

[0006] In conjunction with the first aspect, in a second implementation of the first aspect of the present invention, performing polarization filtering on the jewelry surface image to obtain a dereflection image, and performing global defect feature identification on the dereflection image to obtain a defect location coordinate set, includes: calculating the Stokes parameters of the jewelry surface image, separating the specular reflection component and the diffuse reflection component according to the Stokes parameters, and retaining the diffuse reflection component to obtain the dereflection image; decomposing the dereflection image to obtain a target reflection component and a target illuminance component, and reconstructing a purified jewelry image based on the target illuminance component; extracting three feature maps from the purified jewelry image, applying dilated convolutions to expand the receptive field, and performing weighted upsampling fusion to obtain a defect feature map; performing global average pooling on the defect feature map to obtain global defect features; and inputting the global defect features into a detection network to generate multi-scale anchor boxes and perform defect identification, obtaining the defect location coordinate set.

[0007] In conjunction with the first aspect, in a third implementation of the first aspect of the present invention, decomposing the dereflection image to obtain a target reflection component and a target illuminance component, and reconstructing a purified jewelry image based on the target illuminance component, includes: decomposing the dereflection image into an initial reflection component and an initial illuminance component; constructing an objective function that includes a reconstruction loss term and a smoothing constraint loss term, applying the smoothing constraint to the gradient of the initial illuminance component, and minimizing the objective function to obtain the target reflection component and the target illuminance component; and multiplying the target reflection component and the target illuminance component element-wise to obtain the purified jewelry image.
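An objective of this kind is often written in the following form. This is only a sketch: the symbols I (dereflection image), R (reflection component), L (illuminance component), the weight λ, and the choice of squared norms are assumptions for illustration and are not specified in the application.

```latex
\min_{R,\,L}\;
\underbrace{\lVert I - R \odot L \rVert_2^2}_{\text{reconstruction loss}}
\;+\;
\lambda\,\underbrace{\lVert \nabla L \rVert_2^2}_{\text{smoothing constraint on the illuminance gradient}},
\qquad
\hat{I} = R^{*} \odot L^{*}
```

Here \odot denotes element-wise multiplication, R^{*} and L^{*} are the minimizers (the target reflection and target illuminance components), and \hat{I} is the purified jewelry image.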

[0008] In conjunction with the first aspect, in a fourth implementation of the first aspect of the present invention, calculating the attitude quaternion based on the clamping torque vector and the inertial measurement data, converting the attitude quaternion into target point cloud data of the clamping device, and inputting it into a geometric perception feature decoupling network to calculate the attitude geometric features includes: fusing the clamping torque vector and the inertial measurement data with an extended Kalman filter to obtain fused measurement data, and performing quaternion calculation on the fused measurement data to obtain the attitude quaternion; constructing a rotation matrix based on the attitude quaternion, and mapping the key point coordinates of the clamping device to the current attitude through the rotation matrix to obtain the target point cloud data; performing farthest point sampling on the target point cloud data to obtain a center point set, and aggregating local features based on the center point set to obtain global features; and inputting the global features into a geometric perception feature decoupling network that includes a geometric branch and a semantic branch, calculating geometric attention weights and semantic attention weights respectively, and multiplying the geometric attention weights element-wise with the global features to obtain the attitude geometric features.
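The mapping from attitude quaternion to target point cloud can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the (w, x, y, z) component order, and the example key point are assumptions, and the extended Kalman filter that would produce the quaternion is not shown.

```python
import math

def quat_to_rotation_matrix(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix.

    The quaternion is normalized first, so a slightly non-unit input is tolerated.
    """
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def map_key_points(key_points, q):
    """Rotate reference key points of the clamping device into the current attitude."""
    rot = quat_to_rotation_matrix(q)
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) for i in range(3))
        for p in key_points
    ]

# Usage: a 90 degree rotation about the z axis applied to one hypothetical key point.
half_angle = math.radians(90) / 2
attitude = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
target_cloud = map_key_points([(1.0, 0.0, 0.0)], attitude)
```

With this rotation the key point on the x axis ends up on the y axis, up to floating-point rounding.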

[0009] In conjunction with the first aspect, in a fifth implementation of the first aspect of the present invention, performing farthest point sampling on the target point cloud data to obtain a center point set, and aggregating local features based on the center point set to obtain global features, includes: performing farthest point sampling on the target point cloud data to obtain the center point set; calculating the Euclidean distance from each center point in the center point set to the nearest clamping point, multiplying the Euclidean distance by the hardness coefficient of the jewelry material to obtain an adaptive neighborhood radius, and sampling neighborhood points within the adaptive neighborhood radius to obtain a local point set; performing singular value decomposition on the local point set to obtain an optimal rotation matrix and an optimal translation vector, and combining them to obtain a local rigid transformation matrix; and aggregating local features based on the local point set and the local rigid transformation matrix to obtain the global features.
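The sampling and neighborhood steps can be sketched as below. This is an illustrative, standard-library-only version under stated assumptions: the function names, the starting index, and the example coordinates are hypothetical, and the singular value decomposition step that yields the local rigid transformation is deliberately left out.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Return indices of k points, each chosen farthest from those already selected."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen center.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)]
    return chosen

def adaptive_radius(center, clamp_points, hardness_coeff):
    """Distance to the nearest clamping point, scaled by the material hardness coefficient."""
    return hardness_coeff * min(math.dist(center, c) for c in clamp_points)

def local_point_set(points, center, radius):
    """All points lying within the adaptive neighborhood radius of a center point."""
    return [p for p in points if math.dist(p, center) <= radius]

# Usage with a toy point cloud; coordinates and the coefficient 0.5 are made up.
cloud = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
centers = farthest_point_sampling(cloud, k=2)
radius = adaptive_radius(cloud[centers[1]], clamp_points=[(0, 0, 0)], hardness_coeff=0.5)
neighbors = local_point_set(cloud, cloud[centers[1]], radius)
```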

[0010] In conjunction with the first aspect, in a sixth implementation of the first aspect of the present invention, aggregating local features based on the local point set and the local rigid transformation matrix to obtain global features includes: concatenating the coordinates of each local point in the local point set with the corresponding local rigid transformation matrix to form a feature input vector; inputting the feature input vector into a shared multilayer perceptron containing three fully connected layers for feature mapping to obtain a local feature vector; and inputting the local feature vectors into four set abstraction layers, performing max pooling in each set abstraction layer and aggregating the local feature vectors of the local points layer by layer to obtain the global features.
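The following sketch shows the general pattern of a shared multilayer perceptron followed by max pooling. It is an assumption-laden illustration: the weights are random and untrained, the layer widths and the 4x4 homogeneous form of the rigid transformation are made up, and only a single pooling stage is shown rather than the four abstraction layers described above.

```python
import random

def make_layer(n_in, n_out, rng):
    """Random weights for one fully connected layer (illustrative, untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def dense_relu(x, layer):
    """Fully connected layer followed by ReLU."""
    weights, bias = layer
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(weights, bias)]

def shared_mlp(x, layers):
    """Apply the same stack of layers to one point's feature input vector."""
    for layer in layers:
        x = dense_relu(x, layer)
    return x

def aggregate(points, transform, layers):
    """Concatenate each point with the flattened transform, map it, then max-pool over points."""
    flat_t = [v for row in transform for v in row]
    feats = [shared_mlp(list(p) + flat_t, layers) for p in points]
    return [max(channel) for channel in zip(*feats)]

# Usage: three fully connected layers, input size 3 coordinates + 16 matrix entries.
rng = random.Random(0)
layers = [make_layer(19, 16, rng), make_layer(16, 16, rng), make_layer(16, 8, rng)]
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
local_points = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.2), (0.3, 0.9, 0.1)]
pooled = aggregate(local_points, identity, layers)
```

Because the same weights are applied to every point and max pooling ignores ordering, the pooled vector does not depend on the order in which the local points are listed.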

[0011] In conjunction with the first aspect, in a seventh implementation of the first aspect of the present invention, calculating the spindle speed adjustment and tool compensation based on the defect location coordinate set and the attitude geometric features includes: counting the number of defects in the defect location coordinate set and dividing it by the surface area of the jewelry to obtain the defect density; decoding the attitude geometric features into current attitude parameters and calculating their difference from reference attitude parameters to obtain the attitude deviation; and calculating the spindle speed adjustment based on the defect density and the attitude deviation, and calculating the tool compensation based on the distance from each defect in the defect location coordinate set to the tool position.

[0012] In conjunction with the first aspect, in an eighth implementation of the first aspect of the present invention, calculating the spindle speed adjustment based on the defect density and the attitude deviation, and calculating the tool compensation based on the distance from each defect in the defect location coordinate set to the tool position, includes: inputting the defect density and the attitude deviation into a fuzzy inference system for fuzzification and rule matching, and obtaining a proportional parameter adjustment, an integral parameter adjustment, and a derivative parameter adjustment by defuzzification; performing a proportional-integral-derivative (PID) calculation on the defect density based on the three parameter adjustments to obtain the spindle speed adjustment; calculating the distance from the center coordinates of each defect in the defect location coordinate set to the tool position, and assigning each defect a defect category weight according to its category; and calculating a spatial attenuation factor based on the distance of each defect, and calculating the tool compensation based on the spatial attenuation factor and the defect category weight.
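A fuzzy-adjusted PID step of this kind might look like the sketch below. Everything numeric here is an assumption made for illustration: the two-level membership functions, the rule table, the base gains, the normalization of inputs to [0, 1], and the sign convention (a positive defect density produces a negative speed adjustment) are not values given in the application.

```python
def clamp01(x):
    return max(0.0, min(1.0, x))

def fuzzify(x):
    """Two linear membership functions, 'low' and 'high', on a normalized input."""
    x = clamp01(x)
    return {"low": 1.0 - x, "high": x}

# Hypothetical rule table: (density level, deviation level) -> (dKp, dKi, dKd).
RULES = {
    ("low", "low"): (0.0, 0.0, 0.0),
    ("low", "high"): (0.2, 0.0, 0.1),
    ("high", "low"): (0.4, 0.1, 0.0),
    ("high", "high"): (0.6, 0.2, 0.1),
}

def fuzzy_gain_adjust(density_n, deviation_n):
    """Rule matching with min as AND, then weighted-average defuzzification."""
    mu_d, mu_a = fuzzify(density_n), fuzzify(deviation_n)
    num, den = [0.0, 0.0, 0.0], 0.0
    for (level_d, level_a), out in RULES.items():
        strength = min(mu_d[level_d], mu_a[level_a])
        den += strength
        for i in range(3):
            num[i] += strength * out[i]
    return tuple(n / den for n in num)

class FuzzyPID:
    """PID on defect density whose gains are shifted by the fuzzy adjustments."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev = 0.0

    def step(self, density, density_n, deviation_n, dt):
        dkp, dki, dkd = fuzzy_gain_adjust(density_n, deviation_n)
        self.integral += density * dt
        derivative = (density - self.prev) / dt
        self.prev = density
        control = ((self.kp + dkp) * density
                   + (self.ki + dki) * self.integral
                   + (self.kd + dkd) * derivative)
        return -control  # more defects -> slow the spindle down

# Usage with made-up gains and inputs.
controller = FuzzyPID(kp=100.0, ki=10.0, kd=1.0)
speed_adjustment = controller.step(density=0.5, density_n=0.5, deviation_n=0.2, dt=0.02)
```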

[0013] In a second aspect, the present invention provides a jewelry defect detection device based on image processing, comprising: an acquisition module, used to acquire a clamping torque vector, inertial measurement data, and a jewelry surface image during jewelry processing; a global defect feature recognition module, used to perform polarization filtering on the jewelry surface image to obtain a dereflection image, and to perform global defect feature recognition on the dereflection image to obtain a defect location coordinate set; an attitude geometric feature calculation module, used to calculate an attitude quaternion based on the clamping torque vector and the inertial measurement data, convert the attitude quaternion into target point cloud data of the clamping device, and input it into the geometric perception feature decoupling network to calculate attitude geometric features; and an adjustment and compensation module, used to calculate the spindle speed adjustment and tool compensation based on the defect location coordinate set and the attitude geometric features.

[0014] The technical solution provided by this invention uses polarization filtering technology to calculate Stokes parameters to separate specular reflection and diffuse reflection components, effectively eliminating strong reflection interference from the surface of metal jewelry. The clamping posture quaternion is converted into point cloud data and input into a geometric perception feature decoupling network. A dual-branch attention mechanism explicitly separates posture geometric features and defect semantic features, ensuring that defect recognition is unaffected by posture fluctuations. A multi-scale dilated convolutional pyramid structure is employed, capturing multi-scale defect features without increasing the number of parameters through dilated convolutions with different expansion rates. An adaptive weighted fusion mechanism integrates defect information from different levels, improving the detection capability for various types of defects such as fine scratches, chipping, and burns. Based on the detected defect density, defect location coordinate set, and posture geometric features, a fuzzy adaptive proportional-integral-derivative controller collaboratively calculates the spindle speed adjustment and tool compensation, achieving feedback control between defect detection and machining parameter regulation. This dynamically optimizes the machining trajectory according to the real-time defect distribution, effectively suppressing defect propagation and ensuring machining quality stability.

[0015] Other features and advantages of the invention will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the invention. The objects and other advantages of the invention are realized and obtained in accordance with the structures particularly pointed out in the description, claims and drawings.

[0016] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

Brief Description of the Drawings

[0017] Figure 1 is a schematic diagram of an embodiment of the jewelry defect detection method based on image processing in this invention. Figure 2 is a schematic diagram of an embodiment of the jewelry defect detection device based on image processing in this invention.

Detailed Implementation

[0018] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0019] The terms "comprising" and "having," and any variations thereof, used in the embodiments of this invention are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the steps or units listed, but may optionally include other steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or devices.

[0020] To facilitate understanding of this embodiment, the image processing-based jewelry defect detection method disclosed in this invention is described in detail first. As shown in Figure 1, the method includes the following steps: 101. Collect clamping torque vectors, inertial measurement data, and images of the jewelry surface during jewelry processing. In this embodiment, a six-axis torque sensor is installed in the clamping device structure of the jewelry processing equipment. The sensor simultaneously measures three-axis linear force components and three-axis torque components. Its linear force measurement range covers -500 N to +500 N with a minimum resolution of 0.01 N, and its torque range is -50 N·m to +50 N·m with a resolution of 0.001 N·m. This allows the sensor to obtain the real-time clamping force and torque applied to the workpiece by the clamping jaws during processing. The linear forces Fx, Fy, and Fz and the torques Mx, My, and Mz are combined into a six-dimensional clamping torque vector that characterizes the force state and stability of the clamping device during processing. Simultaneously, a nine-axis inertial measurement unit (IMU) deployed near the center of mass of the clamping device synchronously acquires the three-axis acceleration and three-axis angular velocity of the clamping device at a sampling frequency of 500 Hz. The acceleration measurement range is ±16 g and the angular velocity range is ±2000°/s, which reflects the dynamic response caused by spindle rotation, clamping mechanism vibration, and changes in processing load.
For image data acquisition, an industrial CCD camera with a resolution of 2048×2048 pixels is installed 350 mm from the jewelry surface, equipped with a 25 mm fixed-focus lens and a ring LED light source. With a fixed exposure time of 8 ms and an aperture of f/8, the camera continuously acquires images of the workpiece surface during jewelry processing, yielding an image sequence of the jewelry surface at different time points that reflects how the surface state evolves during processing.

[0021] 102. Perform polarization filtering on the jewelry surface image to obtain a dereflection image, and perform global defect feature identification on the dereflection image to obtain the defect location coordinate set; In this embodiment, by installing an adjustable liquid crystal polarizer at the front end of an industrial CCD camera, polarization image sequences of the same jewelry surface at four polarization angles of 0°, 45°, 90°, and 135° are acquired sequentially, and Stokes parameters are calculated accordingly. S0 represents the total light intensity, S1 and S2 reflect the two orthogonal components of linear polarization, and S3 is obtained through circular polarization measurement. The degree of polarization P = √(S1² + S2² + S3²) / S0 is calculated using the Stokes vector S = [S0, S1, S2, S3] to determine the significance of specular reflection areas in the image. When the degree of polarization P exceeds 0.6, it is considered a strong specular reflection area, and the specular reflection component and diffuse reflection component are separated accordingly. The diffuse reflection component is retained as the de-reflection image after removing high-brightness interference. The dereflected image is input into an image illumination decomposition network based on Retinex theory. The network uses a deep convolutional structure to decompose the image into low-frequency illumination component L and high-frequency reflection component R. While preserving the image structure edges, smoothing regularization is applied to the L component. An optimization function minimizes the weighted sum of the image reconstruction error term and the illumination smoothing term, achieving illumination normalization reconstruction and outputting a cleaned jewelry image. The cleaned jewelry image is then input into a feature extraction module with ResNet50 as its backbone. 
Three multi-scale feature maps C3, C4, and C5 are extracted from the 3rd, 4th, and 5th residual blocks, respectively. Dilated convolutions with increasing dilation rates are applied to these three feature maps to expand the receptive field: C3 uses a dilation rate r=2, C4 uses r=4, and C5 uses r=8, covering defect areas of different sizes, from minor scratches and moderate chipping to large-area burns. The processed feature maps are upsampled and fused into a defect feature map of uniform size, with an attention mechanism weighting the features at each scale to improve overall perception capability. Global average pooling is performed on the defect feature map to extract a global feature representation of the whole image, which is then fed as the input vector into the subsequent detection network. The detection network is based on an improved Faster R-CNN architecture. It generates multi-scale anchor boxes of different sizes and aspect ratios through a region proposal network and predicts their foreground probabilities and bounding box offsets. The RoIAlign operation is performed on the candidate regions, and the defect category and location coordinates are output through a fully connected classification and bounding box regression module. High-confidence target regions are selected by non-maximum suppression, and the defect location coordinate set, consisting of defect center coordinates, category labels, and confidence values, is output.
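The polarization step at the start of this paragraph can be illustrated with a small standard-library sketch. The linear Stokes formulas used here are the standard ones for four polarizer angles; the function names and the nested-list image representation are assumptions, and S3 is treated as an optional input defaulting to zero because this sketch has no circular polarization measurement to supply it.

```python
import math

def stokes_linear(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities at polarizer angles 0, 45, 90, 135 degrees."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def degree_of_polarization(s0, s1, s2, s3=0.0):
    """P = sqrt(S1^2 + S2^2 + S3^2) / S0, with dark pixels mapped to 0."""
    if s0 <= 0.0:
        return 0.0
    return math.sqrt(s1 * s1 + s2 * s2 + s3 * s3) / s0

def specular_mask(img0, img45, img90, img135, threshold=0.6):
    """Mark pixels whose degree of polarization exceeds the threshold as specular."""
    mask = []
    for rows in zip(img0, img45, img90, img135):
        mask_row = []
        for pixel in zip(*rows):
            s0, s1, s2 = stokes_linear(*pixel)
            mask_row.append(degree_of_polarization(s0, s1, s2) > threshold)
        mask.append(mask_row)
    return mask

# Usage: a 1x2 image; the first pixel is fully polarized, the second unpolarized.
mask = specular_mask([[1.0, 0.5]], [[0.5, 0.5]], [[0.0, 0.5]], [[0.5, 0.5]])
```

Pixels flagged in the mask would be the ones treated as strong specular reflection areas; how their diffuse component is then recovered is not covered by this sketch.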

[0022] 103. Calculate the attitude quaternion based on the clamping torque vector and inertial measurement data, and convert the attitude quaternion into target point cloud data of the clamping device and input it into the geometric perception feature decoupling network to calculate the attitude geometric features. In this embodiment, the clamping torque vector, along with inertial measurement data such as triaxial acceleration and angular velocity, is input into an extended Kalman filter. The filter uses the quaternion and the angular velocity bias as its state vector. By constructing a nonlinear system model containing state transition equations and observation equations, it fuses and iteratively updates the multi-source data, outputting real-time attitude quaternions that reflect the spatial rotational state of the clamping device during jewelry processing. A three-dimensional rotation matrix is constructed based on the attitude quaternion, and the coordinates of multiple key points of the clamping device are mapped to their positions under the current attitude through the rotation matrix, generating target point cloud data of the clamping device under the current spatial attitude. The point cloud data includes multiple clamping claw endpoints, joint points, and surface feature points. The target point cloud data is sparsified using a farthest point sampling algorithm, which extracts a set of spatially uniform center points. A local neighborhood is constructed around each center point. Within each neighborhood, local structural features are aggregated through rigid transformation matrix estimation and feature normalization. Through successive compression and fusion across the multi-layer abstraction process, a global feature vector is obtained. The global features are input into a geometric perception feature decoupling network that includes geometric and semantic branches. 
Two parallel multilayer perceptrons are used to compute geometric attention weights and semantic attention weights, respectively reflecting the weight distributions related to pose configuration and defect texture in the features. Element-wise multiplication is performed between the geometric attention weights and the global features to extract the feature subspace associated with pose changes, forming a pose geometric feature vector.
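The dual-branch weighting can be sketched as follows. This is a simplified stand-in rather than the network described above: each branch is reduced to a single linear layer with a sigmoid, the weights are random and untrained, and the names and feature size are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention_branch(features, weights, bias):
    """One linear layer plus sigmoid: one attention weight in (0, 1) per feature channel."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]

def decouple(features, geo_params, sem_params):
    """Compute both attention vectors; gate the global features with the geometric one."""
    geo_w = attention_branch(features, *geo_params)
    sem_w = attention_branch(features, *sem_params)
    pose_features = [w * f for w, f in zip(geo_w, features)]
    return pose_features, geo_w, sem_w

def random_params(n, rng):
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    return weights, [0.0] * n

# Usage with a made-up 4-channel global feature vector.
rng = random.Random(1)
global_features = [0.8, -0.3, 1.5, 0.0]
pose_features, geo_w, sem_w = decouple(
    global_features, random_params(4, rng), random_params(4, rng)
)
```

Since each geometric weight lies strictly between 0 and 1, the gated pose features never exceed the original features in magnitude.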

[0023] 104. Calculate the spindle speed adjustment and tool compensation based on the defect location coordinate set and attitude geometry features.

[0024] In this embodiment, the number of defect instances in the defect location coordinate set is counted and divided by the visible surface area, computed from the 3D model of the jewelry, to obtain the defect density for the current processing state. The attitude geometric feature vector extracted by the geometric perception feature decoupling network is decoded into the Euler angle representation of the current clamping device attitude in 3D space and compared item by item with preset reference attitude parameters. The angle differences give the attitude deviation in roll, pitch, and yaw, which reflects how far the clamping device is offset from its ideal configuration during operation. The defect density and attitude deviation are jointly input into the speed regulation mapping model, which calculates the spindle speed adjustment from the magnitude and trend of these values. A higher defect density or a larger attitude deviation produces a larger reduction in spindle speed, lowering the risk of overheating and defect propagation. Simultaneously, taking the current tool position as the reference, the Euclidean distance from each defect point in the defect location coordinate set to the tool center is calculated. Combined with a weighting coefficient for each defect category (e.g., a high weight for burns and a low weight for scratches), a distance decay function gives a spatial influence factor for each defect. These factors are weighted and summed to obtain the path compensation amount associated with the tool position. The spindle speed adjustment is superimposed on the current spindle speed to generate the adjusted target speed, and the tool compensation amount is applied to the tool path planning control module, achieving adaptive control based on real-time vision and attitude feedback.
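The density and compensation calculations can be sketched in a few lines. The category weights, the exponential form of the distance decay function, the decay length, and the scalar form of the compensation are all assumptions chosen for illustration; the application does not specify them.

```python
import math

# Hypothetical category weights: burns weigh most, scratches least.
CATEGORY_WEIGHT = {"burn": 1.0, "chip": 0.6, "scratch": 0.3}

def defect_density(defects, visible_area_mm2):
    """Number of detected defects per unit of visible surface area."""
    return len(defects) / visible_area_mm2

def tool_compensation(defects, tool_xy, decay_mm=5.0):
    """Sum of category weights, each attenuated by exp(-distance / decay_mm)."""
    total = 0.0
    for defect in defects:
        distance = math.dist(defect["center"], tool_xy)
        total += CATEGORY_WEIGHT[defect["category"]] * math.exp(-distance / decay_mm)
    return total

# Usage: one burn directly under the tool and one scratch far away.
defects = [
    {"center": (0.0, 0.0), "category": "burn"},
    {"center": (1000.0, 0.0), "category": "scratch"},
]
density = defect_density(defects, visible_area_mm2=4.0)
compensation = tool_compensation(defects, tool_xy=(0.0, 0.0))
```

In this example the nearby burn dominates the result, while the distant scratch contributes almost nothing.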

[0025] In one specific embodiment, the process of performing step 101 may specifically include the following steps: The three-axis linear force component and three-axis torque component of the clamping device are collected in real time by a six-axis torque sensor, and the three-axis linear force component and three-axis torque component are combined to obtain the clamping torque vector. The triaxial acceleration and triaxial angular velocity of the clamping device are synchronously acquired by the inertial measurement unit, and the triaxial acceleration and triaxial angular velocity are used as inertial measurement data. An industrial camera is used to photograph the surface of jewelry during processing, resulting in an image of the jewelry's surface.

[0026] In this embodiment, a six-axis torque sensor is embedded in the main structural support of the jewelry processing clamping device. The sensor has the ability to simultaneously measure linear forces in three directions and rotational torques in three directions. The measurement range of the linear force components Fx, Fy, and Fz covers -500N to +500N, with a resolution of 0.01N. The measurement range of the torque components Mx, My, and Mz covers -50N·m to +50N·m, with a minimum resolution of 0.001N·m. This allows the sensor to reflect the load distribution, stress state, and force disturbance of the workpiece generated by the clamping device in real time during the processing. The sensor also continuously outputs a six-dimensional vector signal at a frequency of 1kHz through an internal sampling system. The six-dimensional vector is then combined into a clamping torque vector through vector splicing. Simultaneously, a high-sensitivity nine-axis inertial measurement unit (IMU) is installed at the center of mass of the clamping device. Its accelerometer and gyroscope modules are activated to synchronously acquire three-axis linear acceleration and three-axis angular velocity at a frequency of 500Hz. The acceleration measurement range is ±16g, and the angular velocity measurement range is ±2000° / s. The three-axis acceleration and angular velocity reflect the attitude change trend and micro-dynamic response of the clamping device caused by factors such as spindle rotation, workpiece vibration, and load changes. The acquisition process is connected to the master clock timing system via a dedicated hardware trigger interface, ensuring that the IMU data timestamps are consistent with the torque sensor data, and all data are aligned on a unified time axis. An industrial CCD camera is positioned directly in front of the clamping device, equipped with a fixed-focus lens and a uniform ring LED light source. 
During jewelry processing, it continuously acquires images of the jewelry surface. The CCD camera outputs high-resolution grayscale images of 2048×2048 pixels at a frame rate of 50Hz. By fixing the exposure time to 8ms and the aperture setting to f / 8, the images maintain high contrast and clarity even in complex reflective environments. The clamping torque vector, inertial measurement data, and jewelry surface images are combined under a unified time reference to form a multimodal sensing dataset.
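As an illustration only, the vector splicing and time alignment described above can be sketched in Python with NumPy. The function names, the integer-microsecond timestamps, and the nearest-sample alignment rule are assumptions made for this sketch; the embodiment only states that all data share a unified time axis through hardware triggering.

```python
import numpy as np

def clamping_torque_vector(force_xyz, torque_xyz):
    """Splice (Fx, Fy, Fz) and (Mx, My, Mz) into one six-dimensional vector."""
    return np.concatenate([np.asarray(force_xyz, dtype=float),
                           np.asarray(torque_xyz, dtype=float)])

def align_to_reference(ref_times, src_times, src_values):
    """For each reference timestamp, return the source sample nearest in time.

    src_times must be sorted in ascending order and hold at least two samples.
    """
    ref_times = np.asarray(ref_times, dtype=float)
    src_times = np.asarray(src_times, dtype=float)
    # Index of the first source sample that is not earlier than the reference.
    idx = np.clip(np.searchsorted(src_times, ref_times), 1, len(src_times) - 1)
    left = src_times[idx - 1]
    right = src_times[idx]
    # Step back by one where the earlier neighbour is at least as close.
    pick_left = (ref_times - left) <= (right - ref_times)
    return np.asarray(src_values)[idx - pick_left.astype(int)]
```

For example, with torque samples every 1000 µs (1 kHz) and camera frames every 20000 µs (50 Hz), `align_to_reference(frame_times, torque_times, torque_vectors)` would pick one six-dimensional vector per frame.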

[0027] In one specific embodiment, the process of performing step 102 may specifically include the following steps: The Stokes parameters of the jewelry surface image are calculated, the specular reflection component and the diffuse reflection component are separated according to the Stokes parameters, and the diffuse reflection component is retained to obtain the de-reflection image. The de-reflection image is decomposed to obtain the target reflection component and the target illumination component, and the purified jewelry image is reconstructed based on the target illumination component. Three feature maps are extracted from the purified jewelry image, dilated convolutions are applied to expand the receptive field, and weighted upsampling fusion is performed to obtain the defect feature map. Global average pooling is then applied to the defect feature map to obtain the global defect features. The global defect features are input into the detection network to generate multi-scale anchor boxes and perform defect identification, thereby obtaining the defect location coordinate set.

[0028] In this embodiment, an adjustable liquid crystal polarizer is installed at the front end of an industrial camera. Four images with different polarization angles are acquired sequentially at the same time point. The Stokes parameters of the jewelry surface image are calculated to describe the polarization state, and based on this, specular reflection-dominant and diffuse reflection-dominant regions are identified. The specular reflection component is removed, and only the diffuse reflection image is retained as the de-reflection image. The de-reflection image is input into a deep decomposition network based on image illumination modeling. In the deep decomposition network, the input image is decomposed into a target reflection component representing physical reflection attributes and a target illumination component representing light intensity distribution through a convolutional encoder structure. After smoothing the illumination component, the components are recombined to generate a purified jewelry image. The purified jewelry image is input into a multi-scale feature extraction module. Three layers of feature maps are extracted based on the ResNet backbone network, corresponding to shallow texture, mid-level structure, and deep semantic expression, respectively. Increasingly dilated convolution operations are applied to each layer to expand the receptive field layer by layer, enabling the feature network to perceive both minor scratches and large-area defect areas. The feature maps resulting from these three dilated convolution layers are uniformly upsampled to standard resolution and fused into a unified defect feature map using an attention weighting mechanism. Global average pooling is then performed on the defect feature map to compress it into a set of representative global defect feature vectors. These global defect features are input into a detection network based on an improved Faster R-CNN architecture. 
The detection network uses a region proposal network to generate anchor boxes of multiple scales and shapes that cover potential defect regions. Each anchor box is classified and its bounding box is refined. A filtering mechanism then outputs a set of defect location coordinates containing spatial location, defect category, and confidence information.
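The Stokes-parameter step can be illustrated with a minimal Python sketch. It assumes the four polarizer angles are 0°, 45°, 90°, and 135° (the embodiment does not state the angles) and uses the common simplification that specular reflection is the polarized part of the light while diffuse reflection is the unpolarized part. The function names are hypothetical.

```python
import numpy as np

def stokes_parameters(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-angle images (assumed angles)."""
    i0, i45, i90, i135 = (np.asarray(a, dtype=float) for a in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # 0 deg versus 90 deg preference
    s2 = i45 - i135                      # 45 deg versus 135 deg preference
    return s0, s1, s2

def remove_specular(i0, i45, i90, i135):
    """Return (diffuse_estimate, degree_of_linear_polarization) per pixel."""
    s0, s1, s2 = stokes_parameters(i0, i45, i90, i135)
    polarized = np.sqrt(s1 ** 2 + s2 ** 2)          # polarized intensity
    dolp = np.divide(polarized, s0, out=np.zeros_like(s0), where=s0 > 0)
    # Assumption: the unpolarized remainder approximates the diffuse component.
    diffuse = np.clip(s0 - polarized, 0.0, None)
    return diffuse, dolp
```

Pixels with a high degree of linear polarization would correspond to specular-dominant regions; the returned diffuse estimate plays the role of the de-reflection image in this sketch.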

[0029] In one specific embodiment, the process of decomposing the dereflected image to obtain the target reflectance component and the target illuminance component, and reconstructing the purified jewelry image based on the target illuminance component, may specifically include the following steps: The dereflection image is decomposed into an initial reflectance component and an initial illuminance component; Construct an objective function that includes a reconstruction loss term and a smoothing constraint loss term, apply a smoothing constraint to the gradient of the initial illuminance component and minimize the objective function to obtain the target reflection component and the target illuminance component; The purified jewelry image is obtained by performing element-wise multiplication on the target reflectance component and the target illuminance component.

[0030] In this embodiment, the dereflection image is input into an image decomposition network. The image decomposition network employs an end-to-end convolutional-deconvolutional structure, aiming to separate the physical properties of the image. It decomposes the input image into two sub-components: an initial reflection component representing the inherent properties of the object's surface, and an initial illuminance component representing changes in illumination distribution. Structurally, these components capture material features and illumination information respectively, and are jointly modeled by extracting local and global features through layer-by-layer convolution. During image decomposition, optimization constraints are designed to ensure the physical consistency of the decomposition results. Therefore, a joint objective function is constructed, including a reconstruction loss term and an illuminance smoothing constraint term. The reconstruction term ensures that the product of the initial reflection component and the illuminance component can reconstruct the input image, while the smoothing constraint term penalizes and restricts the gradient changes of the initial illuminance component in the spatial domain, preventing drastic changes in the illuminance map while maintaining the illumination trend, effectively suppressing artifacts and edge errors. During network training, the objective function is iteratively minimized, continuously optimizing the network weight parameters, gradually converging, and outputting the target reflection component and the target illuminance component. After optimization, a reflection component with clear structure and enhanced details and an illuminance component with soft texture and natural transition are obtained. 
By performing element-wise multiplication on the reflection and illuminance components, the reflection and illuminance components are fused and reconstructed to output a purified jewelry image with a unified illumination distribution and complete geometric structure.
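The joint objective described above can be written down as a short Python sketch. The mean-squared reconstruction term, the L1 gradient penalty, and the weight `smooth_weight` are assumptions chosen for illustration; the embodiment specifies only that a reconstruction loss term and an illuminance smoothing term are combined.

```python
import numpy as np

def decomposition_loss(image, reflectance, illumination, smooth_weight=0.1):
    """Reconstruction loss plus a smoothness penalty on the illuminance gradient.

    All arrays are 2-D grayscale maps of the same shape.
    smooth_weight is a hypothetical balancing factor.
    """
    image = np.asarray(image, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    illumination = np.asarray(illumination, dtype=float)
    # The element-wise product of the two components should reproduce the input.
    reconstruction = np.mean((reflectance * illumination - image) ** 2)
    # Finite-difference gradients of the illuminance map along both axes.
    grad_rows = np.diff(illumination, axis=0)
    grad_cols = np.diff(illumination, axis=1)
    smoothness = np.mean(np.abs(grad_rows)) + np.mean(np.abs(grad_cols))
    return reconstruction + smooth_weight * smoothness
```

A training loop would minimize this value with respect to the network weights that produce `reflectance` and `illumination`; the purified image is then `reflectance * illumination`.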

[0031] In one specific embodiment, the process of performing step 103 may specifically include the following steps: Extended Kalman filtering is applied to fuse the clamping torque vector and the inertial measurement data to obtain fused measurement data, and quaternion calculation is performed on the fused measurement data to obtain the attitude quaternion. A rotation matrix is constructed based on the attitude quaternion, and the key point coordinates of the clamping device are mapped to the current attitude through the rotation matrix to obtain the target point cloud data. Farthest point sampling is performed on the target point cloud data to obtain the center point set, and local features are aggregated based on the center point set to obtain global features. The global features are input into a geometric perception feature decoupling network that includes a geometric branch and a semantic branch, and geometric attention weights and semantic attention weights are calculated separately. The geometric attention weights are then multiplied element-wise with the global features to obtain the pose geometric features.

[0032] In this embodiment, the three-axis linear force components and three-axis torque components of the six-axis torque sensor, along with the three-axis acceleration and three-axis angular velocity collected by the inertial measurement unit, are input into an extended Kalman filter after time alignment. The Kalman filter, combined with a dynamic prediction and observation update model, estimates and compensates for noise and drift in the sensors, thereby stably outputting the fused state estimate in a dynamic processing environment. After multiple iterations, the output state variables contain quaternion expressions required for attitude. These quaternions serve as the basic expression describing the current spatial attitude of the clamping device and are further converted into rotation matrices to construct three-dimensional transformation relationships. The initial keypoint coordinate set of the clamping device is input into the rotation matrix to achieve a rigid rotational mapping from the initial attitude to the current attitude, obtaining target point cloud data in the current processing state, representing the real-time configuration of the clamping structure in three-dimensional space. A farthest-point sampling operation is performed on the point cloud data to obtain a uniformly distributed and comprehensively information-covered set of center points. Local neighborhood regions are then divided based on these center points. Points within each region are normalized in position and their features are aggregated. Local structural features are extracted through a shared network and passed upwards through multi-layer abstraction operations to form a global feature vector. The global feature vector is input into a geometry-aware feature decoupling network, which contains two parallel branch modules. One branch focuses on modeling pose geometry features, while the other branch represents semantic features related to surface defects. 
Internally, the corresponding geometric attention weights and semantic attention weights are calculated to characterize the dimensional response relationships related to pose or defects in the feature space. Then, by performing element-wise multiplication of the geometric attention weights with the global features, explicit extraction and enhancement of pose-related features are achieved, outputting a pose geometry feature representation.
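The conversion from attitude quaternion to rotation matrix, and the mapping of key points to the current attitude, can be sketched as follows. The scalar-first component order (w, x, y, z) and the function names are assumptions; the embodiment does not state a quaternion convention.

```python
import numpy as np

def quaternion_to_rotation_matrix(q):
    """Rotation matrix for a quaternion given as (w, x, y, z), scalar first."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)   # normalize to a unit quaternion
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def map_keypoints(keypoints, q):
    """Rotate an (N, 3) array of initial key points into the current attitude."""
    rotation = quaternion_to_rotation_matrix(q)
    return np.asarray(keypoints, dtype=float) @ rotation.T
```

For a quarter turn about the z axis, `q = (cos 45°, 0, 0, sin 45°)`, the key point (1, 0, 0) is mapped to (0, 1, 0).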

[0033] In one specific embodiment, the process of performing farthest point sampling on the target point cloud data to obtain a set of center points, and then performing local feature aggregation based on the set of center points to obtain global features, can specifically include the following steps: Farthest point sampling is performed on the target point cloud data to obtain the center point set. The Euclidean distance from each center point in the center point set to the nearest clamping point is calculated, the Euclidean distance is multiplied by the hardness coefficient of the jewelry material to obtain the adaptive neighborhood radius, and neighborhood points within the adaptive neighborhood radius are sampled to obtain the local point set. Singular value decomposition is performed on the local point set to obtain the optimal rotation matrix and the optimal translation vector, and the local rigid transformation matrix is obtained by combining the optimal rotation matrix and the optimal translation vector. Global features are obtained by aggregating local features based on the local point sets and the local rigid transformation matrices.
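The first two steps above (farthest point sampling and the adaptive neighborhood radius) can be sketched in Python. The function names, the choice of starting index, and any numeric hardness coefficient are assumptions; the specification gives no values for the hardness coefficient.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, start=0):
    """Return indices of n_samples points chosen by greedy farthest point sampling."""
    points = np.asarray(points, dtype=float)
    selected = [start]
    # Distance from every point to the nearest already selected point.
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(dist))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)

def adaptive_radius(centers, clamp_points, hardness_coeff):
    """Radius = distance to the nearest clamping point times a hardness coefficient."""
    centers = np.asarray(centers, dtype=float)
    clamp_points = np.asarray(clamp_points, dtype=float)
    d = np.linalg.norm(centers[:, None, :] - clamp_points[None, :, :], axis=2)
    return hardness_coeff * d.min(axis=1)

def local_point_sets(points, centers, radii):
    """Collect, for each center, the points lying within its adaptive radius."""
    points = np.asarray(points, dtype=float)
    return [points[np.linalg.norm(points - c, axis=1) <= r]
            for c, r in zip(centers, radii)]
```

Because the radius scales with the distance to the nearest clamping point, centers far from the clamp gather larger neighborhoods than centers close to it.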

[0034] In this embodiment, farthest point sampling is performed on the target point cloud data to obtain a set of center points evenly distributed in three-dimensional space. Each newly sampled point is the one whose distance to the already selected points is largest, which improves the overall sampling coverage and spatial representativeness. The Euclidean distance is calculated between each center point and the coordinates of the predefined key points of the clamping device. The minimum spatial distance between each center point and the nearest clamping point is extracted and multiplied by the hardness coefficient corresponding to the jewelry material currently being processed. This dynamically generates an adaptive neighborhood radius that fits the current spatial configuration and material properties, so that different materials (such as gold, silver, and platinum) are matched to their mechanical response characteristics during the neighborhood construction stage. Based on the adaptive neighborhood radius of each center point, all points falling within the neighborhood range are retrieved from the point cloud to form the corresponding local point set. Rigid matching analysis is then performed using each local point set as input. By constructing a point-to-point mapping between the original local point coordinate system and the center-point-aligned coordinate system, the optimal rotation matrix and optimal translation vector are solved using singular value decomposition. This minimizes the overall squared error between the transformed local point set and the center point reference position, yielding a local rigid transformation matrix that describes the geometric transformation characteristics of the local region. A point-domain relationship is established between each local point set and its corresponding rigid transformation matrix.
A shared perceptual network is used to normalize, transform, and aggregate points within the neighborhood, extracting high-dimensional feature representations of each local region. Based on the features of all local regions, a holistic global feature vector is constructed through multi-layer abstraction and feature fusion mechanisms to express the global configuration features and geometric pattern distribution of the current clamping device in three-dimensional posture.
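The singular-value-decomposition step can be illustrated with the standard Kabsch procedure, which finds the least-squares rotation and translation between two point sets with known one-to-one correspondence. Treating the two sets as "source" and "target" coordinates of the same local points is an assumption of this sketch; the embodiment describes the two coordinate systems only in general terms.

```python
import numpy as np

def kabsch(source, target):
    """Least-squares rigid transform (R, t) such that R @ source_i + t ~ target_i.

    source and target are (N, 3) arrays with matching row order.
    Returns the rotation, the translation, and the 4x4 homogeneous matrix.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (source - src_centroid).T @ (target - tgt_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_centroid - R @ src_centroid
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return R, t, T
```

The returned 4x4 matrix combines the optimal rotation and translation and corresponds to what the text calls the local rigid transformation matrix.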

[0035] In one specific embodiment, the process of performing local feature aggregation based on local point sets and local rigid transformation matrices to obtain global features can specifically include the following steps: The coordinates of each local point in the local point set are concatenated with the corresponding local rigid transformation matrix to form a feature input vector. The feature input vector is input into a shared multilayer perceptron for feature mapping to obtain a local feature vector. The shared multilayer perceptron contains three fully connected layers. The local feature vectors are passed through four stacked set abstraction layers. In each set abstraction layer, max pooling is performed and the local feature vectors of the local points are aggregated layer by layer to obtain the global features.

[0036] In this embodiment, the three-dimensional coordinates of each local point are concatenated with the rotation and translation parameters of the current local rigid transformation matrix to form an extended feature vector containing spatial position information and local geometric transformation information. This vector simultaneously expresses the point's relative position in its local coordinate system and its transformation characteristics under the influence of surrounding configurations. The feature input vector is input into a multilayer perceptron network with a fixed structure and shared parameters. The multilayer perceptron network consists of three fully connected layers that perform nonlinear feature mapping from the original input space to the intermediate feature space and then to the higher-order semantic space. Each layer is equipped with activation functions and normalization operations to enhance feature representation and convergence stability, resulting in local feature vectors. The local feature vectors of all local points are passed through four stacked set abstraction layers. Each set abstraction layer performs feature aggregation and spatial compression at a different scale. Specifically, a local neighborhood partitioning mechanism selects a point set within a certain range, and max pooling is applied within this range to extract the most representative feature responses, thereby compressing the spatial dimension while retaining the dominant feature information. The set abstraction layers perform feature aggregation in sequence and enlarge the receptive field at each layer, realizing feature abstraction from local to global. As the layers deepen, the network gradually aggregates structural information from multiple local regions to form a unified high-dimensional global feature vector.
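A reduced Python sketch of the feature-mapping step is given below. It builds a 15-dimensional input per point (3 coordinates, 9 rotation entries, 3 translation entries), applies a three-layer shared perceptron, and max-pools over the points. The layer widths, the ReLU activation, and the random initialization are assumptions; normalization and the four stacked abstraction layers are omitted, so only a single pooling stage is shown.

```python
import numpy as np

def build_input(local_points, rotation, translation):
    """Concatenate each point's xyz with the flattened rigid transform (15-D)."""
    pts = np.asarray(local_points, dtype=float)
    transform = np.concatenate([np.asarray(rotation, dtype=float).reshape(-1),
                                np.asarray(translation, dtype=float)])
    return np.hstack([pts, np.tile(transform, (len(pts), 1))])

def init_mlp(sizes, seed=0):
    """Random weights for a fully connected stack; sizes like [15, 32, 64, 128]."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    return weights, biases

def shared_mlp(features, weights, biases):
    """Apply the same layers to every point (row) independently, with ReLU."""
    h = features
    for W, b in zip(weights, biases):
        h = np.maximum(h @ W + b, 0.0)
    return h

def max_pool(features):
    """Channel-wise maximum over all points: one vector for the whole region."""
    return features.max(axis=0)
```

Because the same weights are applied to every point and the pooling is a maximum, the pooled vector does not depend on the order in which the points are listed.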

[0037] In one specific embodiment, the process of performing step 104 may specifically include the following steps: The defect density is obtained by counting the number of defects in the defect location coordinate set and dividing by the surface area of the jewelry. The attitude deviation is obtained by decoding the current attitude parameters and the reference attitude parameters based on the attitude geometric features. The spindle speed adjustment is calculated based on the defect density and attitude deviation, and the tool compensation is calculated based on the distance from each defect to the tool position in the defect location coordinate set.

[0038] In this embodiment, the defect location coordinate set is statistically processed to extract the number of valid defect instances. The visible surface area of the corresponding jewelry workpiece under the current processing viewpoint is calculated using depth map information or a 3D point cloud model. The defect density is calculated as the ratio of the number of defects to the surface area. Defect density serves as a quantitative indicator of the degree of defects in the current processing area and is used to dynamically adjust processing strategies and risk control levels. Simultaneously, the attitude geometric feature vector extracted by the geometric perception feature decoupling network is passed through the attitude decoding module to reconstruct the Euler angle attitude parameters of the current clamping device in 3D space. These parameters are compared with a preset reference attitude to obtain the attitude deviation, which reflects how far the current attitude departs from the ideal attitude benchmark. The defect density and attitude deviation are input into the spindle control logic for speed adjustment calculation. Higher defect density or greater attitude deviation results in a larger adjustment magnitude, and the adjustment generally reduces the spindle speed to limit heat accumulation and defect propagation, avoiding the risk of microcracks or burns in the material caused by high-speed processing. Using the current three-dimensional position of the tool in the machining path as the reference coordinates, the Euclidean distance from each defect point in the defect location coordinate set to the tool position is calculated. Weight coefficients are set according to the defect type, and the distances are weighted and superimposed through an exponential decay function to obtain the overall tool path compensation amount.
This reflects whether the current path needs to temporarily adjust the cutting depth or feed strategy to avoid high-risk defect-dense areas, and is applied in real time during CNC path generation or interpolation.
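The defect density and the exponentially weighted tool compensation can be sketched as follows. The category weights, the decay length, the units, and the function names are all hypothetical; the embodiment specifies only a ratio of count to area and a weighted superposition through an exponential decay function.

```python
import numpy as np

def defect_density(defects, surface_area_mm2):
    """Number of detected defects divided by the visible surface area."""
    return len(defects) / surface_area_mm2

def tool_compensation(defects, tool_position, class_weights, decay_length_mm=1.0):
    """Sum of category weight times exp(-distance / decay_length) over all defects.

    defects is a list of (center_xyz, category) pairs.
    class_weights and decay_length_mm are assumed tuning values.
    """
    tool_position = np.asarray(tool_position, dtype=float)
    total = 0.0
    for center, category in defects:
        distance = np.linalg.norm(np.asarray(center, dtype=float) - tool_position)
        total += class_weights[category] * np.exp(-distance / decay_length_mm)
    return float(total)
```

With a scratch at the tool position (weight 0.2) and a crack 5 mm away (weight 1.0) and a decay length of 5 mm, the sketch yields 0.2 + exp(-1), so nearby and severe defects dominate the result.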

[0039] In one specific embodiment, the process of calculating the spindle speed adjustment based on defect density and attitude deviation, and calculating the tool compensation based on the distance from each defect to the tool position in the defect location coordinate set, can specifically include the following steps: The defect density and attitude deviation are input into the fuzzy inference system for fuzzification and rule matching, and the proportional parameter adjustment, integral parameter adjustment, and derivative parameter adjustment are obtained by defuzzification. The spindle speed adjustment is obtained by performing proportional-integral-derivative calculations on the defect density based on the proportional parameter adjustment, integral parameter adjustment, and derivative parameter adjustment. The distance from the center coordinates of each defect in the defect location coordinate set to the tool position is calculated, and a defect category weight is assigned to each defect according to its defect category. The spatial attenuation factor is calculated based on the distance of each defect, and the tool compensation amount is calculated based on the spatial attenuation factor and the defect category weight.

[0040] In this embodiment, the defect density index and attitude deviation are input into the fuzzy inference system and fuzzification is performed respectively. The defect density is divided into three fuzzy sets: low, medium, and high. The attitude deviation is divided into three levels: small, medium, and large. After fuzzification, rule matching is performed based on a preset fuzzy rule base. The fuzzy rule base is set according to experience and process feedback. For example, under the condition of high defect density and large attitude deviation, the spindle response speed is reduced to reduce machining risk. The Mamdani-type inference method combined with the center of gravity method is used for defuzzification to obtain the dynamic adjustment of the three parameters of the spindle controller: proportional, integral, and derivative. The parameter adjustment is applied to the PID structure in the current control loop. The proportional, integral, and derivative calculations are performed on the defect density to obtain the spindle speed adjustment, reflecting the comprehensive response capability of the process system to the current risk state. To achieve adaptive adjustment of the local machining path, a spatial modeling mechanism is introduced. The distance between the center coordinate of each defect instance in the defect location coordinate set and the current three-dimensional position of the tool is calculated. Combined with the semantic classification label of the defect, different defect category weights are assigned to different defect categories, with scratches having the lowest weight and burns and cracks having the highest weights. The spatial attenuation factor is calculated based on the distance from each defect to the tool. The attenuation factor is mapped by an exponential function or a Gaussian function. The closer the distance, the stronger the influence. 
The attenuation factor is then combined with the defect category weight to construct a spatial weighted model. The influence of all defects is superimposed and integrated to obtain the toolpath compensation amount.
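A compact Python sketch of Mamdani-style inference with center-of-gravity defuzzification, followed by a PID update on the defect density, is given below. The triangular membership functions, the 3x3 rule table, the normalization of both inputs to [0, 1], and the use of a single fuzzy output to scale all three gain adjustments are assumptions made to keep the example short; the embodiment states that the rule base is set from experience and process feedback.

```python
import numpy as np

# Triangular fuzzy sets on a normalized [0, 1] axis: (left foot, peak, right foot).
# Index 0 = low/small, 1 = medium, 2 = high/large. All values are assumed.
LEVELS = [(-0.5, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)]

# Hypothetical rule base: RULES[density_level][deviation_level] gives output level.
RULES = [[0, 1, 2],
         [1, 1, 2],
         [2, 2, 2]]

def tri(x, a, b, c):
    """Triangular membership value of x for feet a, c and peak b."""
    x = np.asarray(x, dtype=float)
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fuzzy_severity(density, deviation, n=201):
    """Mamdani inference (min/max) with centroid defuzzification on [0, 1]."""
    density = float(np.clip(density, 0.0, 1.0))
    deviation = float(np.clip(deviation, 0.0, 1.0))
    u = np.linspace(0.0, 1.0, n)
    aggregated = np.zeros_like(u)
    for i, d_set in enumerate(LEVELS):
        for j, a_set in enumerate(LEVELS):
            strength = min(float(tri(density, *d_set)), float(tri(deviation, *a_set)))
            clipped = np.minimum(strength, tri(u, *LEVELS[RULES[i][j]]))
            aggregated = np.maximum(aggregated, clipped)
    return float(np.sum(u * aggregated) / np.sum(aggregated))

class AdaptivePID:
    """PID on the defect density whose gains are shifted by the fuzzy output."""

    def __init__(self, kp, ki, kd, max_delta=(0.5, 0.1, 0.05)):
        self.base = np.array([kp, ki, kd], dtype=float)
        self.max_delta = np.array(max_delta, dtype=float)  # assumed gain ranges
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, density, deviation, dt):
        """Return the magnitude of the spindle speed adjustment for one cycle."""
        kp, ki, kd = self.base + fuzzy_severity(density, deviation) * self.max_delta
        error = density
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative
```

In this sketch the returned value is a magnitude; following the text, a controller would typically apply it as a reduction of the spindle speed.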

[0041] The image processing-based jewelry defect detection method in the embodiments of the present invention has been described above. The image processing-based jewelry defect detection device in the embodiments of the present invention is described below. Referring to Figure 2, one embodiment of the image processing-based jewelry defect detection device of the present invention includes: an acquisition module 201, used to acquire the clamping torque vector, inertial measurement data, and jewelry surface images during jewelry processing; a global defect feature recognition module 202, used to perform polarization filtering on the jewelry surface image to obtain a dereflection image, and to perform global defect feature recognition on the dereflection image to obtain a set of defect location coordinates; an attitude geometry feature calculation module 203, used to calculate attitude quaternions based on the clamping torque vector and inertial measurement data, convert the attitude quaternions into target point cloud data of the clamping device, and input the target point cloud data into the geometric perception feature decoupling network to calculate attitude geometry features; and an adjustment and compensation module 204, used to calculate the spindle speed adjustment and tool compensation based on the defect location coordinate set and the attitude geometry features.

[0042] Through the collaborative efforts of the aforementioned components, the Stokes parameters are calculated using polarization filtering technology to separate specular reflection and diffuse reflection components, effectively eliminating strong reflection interference from the surface of metal jewelry. The clamping posture quaternion is converted into point cloud data and input into the geometric perception feature decoupling network. A dual-branch attention mechanism is used to explicitly separate posture geometric features and defect semantic features, ensuring that defect recognition is unaffected by posture fluctuations. A multi-scale dilated convolutional pyramid structure is adopted, capturing multi-scale defect features without increasing the number of parameters through dilated convolutions with different dilation rates. An adaptive weighted fusion mechanism integrates defect information from different levels, improving the detection capability for various types of defects such as fine scratches, chipping, and burns. Based on the detected defect density, defect location coordinate set, and posture geometric features, a fuzzy adaptive proportional-integral-derivative controller is used to collaboratively calculate the spindle speed adjustment and tool compensation, realizing feedback control of defect detection and machining parameter adjustment. This allows for dynamic optimization of the machining trajectory based on real-time defect distribution, effectively suppressing defect propagation and ensuring machining quality stability.

[0043] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0044] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0045] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for detecting jewelry defects based on image processing, characterized in that the method comprises: collecting a clamping torque vector, inertial measurement data, and a jewelry surface image during jewelry processing; performing polarization filtering on the jewelry surface image to obtain a dereflection image, and performing global defect feature recognition on the dereflection image to obtain a defect location coordinate set; calculating an attitude quaternion based on the clamping torque vector and the inertial measurement data, converting the attitude quaternion into target point cloud data of the clamping device, and inputting the target point cloud data into a geometric perception feature decoupling network to calculate attitude geometric features; and calculating a spindle speed adjustment and a tool compensation based on the defect location coordinate set and the attitude geometric features.

2. The image processing-based jewelry defect detection method according to claim 1, characterized in that, The acquisition of clamping torque vectors, inertial measurement data, and jewelry surface images during the jewelry processing includes: The three-axis linear force component and three-axis torque component of the clamping device are collected in real time by a six-axis torque sensor, and the three-axis linear force component and three-axis torque component are combined to obtain the clamping torque vector; The triaxial acceleration and triaxial angular velocity of the clamping device are synchronously acquired by the inertial measurement unit, and the triaxial acceleration and triaxial angular velocity are used as inertial measurement data. An industrial camera is used to photograph the surface of jewelry during processing, resulting in an image of the jewelry's surface.

3. The image processing-based jewelry defect detection method according to claim 1, characterized in that, performing polarization filtering on the jewelry surface image to obtain a dereflection image, and performing global defect feature recognition on the dereflection image to obtain a defect location coordinate set, includes: calculating the Stokes parameters of the jewelry surface image, separating the specular reflection component and the diffuse reflection component according to the Stokes parameters, and retaining the diffuse reflection component to obtain the dereflection image; decomposing the dereflection image to obtain the target reflection component and the target illumination component, and reconstructing the purified jewelry image based on the target illumination component; extracting three feature maps from the purified jewelry image, applying dilated convolutions to expand the receptive field, and performing weighted upsampling fusion to obtain the defect feature map; performing global average pooling on the defect feature map to obtain the global defect features; and inputting the global defect features into the detection network to generate multi-scale anchor boxes and perform defect identification to obtain the defect location coordinate set.

4. The image processing-based jewelry defect detection method according to claim 3, characterized in that, The step of decomposing the dereflected image to obtain the target reflectance component and the target illuminance component, and reconstructing the purified jewelry image based on the target illuminance component, includes: The dereflection image is decomposed into an initial reflection component and an initial illuminance component; Construct an objective function that includes a reconstruction loss term and a smoothing constraint loss term, apply a smoothing constraint to the gradient of the initial illuminance component and minimize the objective function to obtain the target reflection component and the target illuminance component; The target reflection component and the target illuminance component are multiplied element-wise to obtain the purified jewelry image.

5. The image processing-based jewelry defect detection method according to claim 1, characterized in that, calculating the attitude quaternion based on the clamping torque vector and the inertial measurement data, converting the attitude quaternion into target point cloud data of the clamping device, and inputting the target point cloud data into a geometric perception feature decoupling network to calculate attitude geometric features, includes: fusing the clamping torque vector and the inertial measurement data using an extended Kalman filter to obtain fused measurement data, and performing quaternion calculation on the fused measurement data to obtain the attitude quaternion; constructing a rotation matrix based on the attitude quaternion, and mapping the key point coordinates of the clamping device to the current attitude through the rotation matrix to obtain the target point cloud data; performing farthest point sampling on the target point cloud data to obtain a set of center points, and aggregating local features based on the set of center points to obtain global features; inputting the global features into a geometric perception feature decoupling network that includes a geometric branch and a semantic branch, and calculating geometric attention weights and semantic attention weights respectively; and multiplying the geometric attention weights element-wise with the global features to obtain the pose geometric features.

6. The image processing-based jewelry defect detection method according to claim 5, characterized in that the step of performing farthest point sampling on the target point cloud data to obtain a center point set, and aggregating local features based on the center point set to obtain global features, comprises: performing farthest point sampling on the target point cloud data to obtain the center point set; calculating the Euclidean distance from each center point in the center point set to the nearest clamping point, multiplying the Euclidean distance by the hardness coefficient of the jewelry material to obtain an adaptive neighborhood radius, and sampling neighborhood points within the adaptive neighborhood radius to obtain a local point set; performing singular value decomposition on the local point set to obtain an optimal rotation matrix and an optimal translation vector, and combining the optimal rotation matrix and the optimal translation vector to obtain a local rigid transformation matrix; and performing local feature aggregation based on the local point set and the local rigid transformation matrix to obtain the global features.
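
As an illustration only, the sampling, adaptive-radius, and rigid-transformation steps of this claim can be sketched as follows. The claim does not state which reference point set the singular value decomposition aligns the local point set to; the sketch assumes a corresponding reference set `dst` with the same number of points and uses the standard Kabsch procedure. All function names are hypothetical.

```python
import numpy as np

def farthest_point_sampling(points, n_centers, start=0):
    """Return indices of n_centers points chosen by greedy farthest point sampling."""
    points = np.asarray(points, dtype=float)
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(n_centers - 1):
        nxt = int(np.argmax(dist))  # point farthest from all centers chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def adaptive_radius(center, clamp_points, hardness_coeff):
    """Distance to the nearest clamping point, scaled by the hardness coefficient."""
    nearest = np.min(np.linalg.norm(np.asarray(clamp_points, dtype=float) - center, axis=1))
    return nearest * hardness_coeff

def local_point_set(points, center, radius):
    """All points within the given radius of the center."""
    points = np.asarray(points, dtype=float)
    return points[np.linalg.norm(points - center, axis=1) <= radius]

def rigid_transform(src, dst):
    """4x4 rigid transform aligning src to dst via SVD (Kabsch); dst is an assumed reference."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = c_dst - R @ c_src
    return T
```

The reflection guard keeps the result a proper rotation when the two point sets are noisy or nearly planar.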

7. The image processing-based jewelry defect detection method according to claim 6, characterized in that the step of performing local feature aggregation based on the local point set and the local rigid transformation matrix to obtain the global features comprises: concatenating the coordinates of each local point in the local point set with the corresponding local rigid transformation matrix to form a feature input vector; inputting the feature input vector into a shared multilayer perceptron for feature mapping to obtain a local feature vector, wherein the shared multilayer perceptron contains three fully connected layers; and inputting the local feature vectors into four set abstraction layers, wherein max pooling is performed in each set abstraction layer and the local feature vectors of the local points are aggregated layer by layer to obtain the global features.
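
As an illustration only, the concatenation, shared multilayer perceptron, and max pooling of this claim can be sketched as follows. The layer widths, the ReLU activation, the flattening of a 4x4 transformation matrix into 16 values, and the randomly initialized (untrained) weights are all assumptions. The sketch shows a single pooling stage rather than the four stacked abstraction layers recited in the claim.

```python
import numpy as np

def build_feature_inputs(local_points, rigid_transform):
    """Concatenate each point's xyz with the flattened 4x4 transform (3 + 16 = 19 values)."""
    local_points = np.asarray(local_points, dtype=float)
    flat = np.asarray(rigid_transform, dtype=float).reshape(1, -1)
    return np.hstack([local_points, np.repeat(flat, len(local_points), axis=0)])

def init_shared_mlp(dims, seed=0):
    """Create (weight, bias) pairs for consecutive layer sizes; weights are untrained."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def shared_mlp(x, layers):
    """Apply the same fully connected layers (with ReLU) to every point independently."""
    for weight, bias in layers:
        x = np.maximum(x @ weight + bias, 0.0)
    return x

def max_pool(local_features):
    """Aggregate per-point features into one vector by channel-wise maximum."""
    return local_features.max(axis=0)

# Usage: three local points, identity transform, three fully connected layers.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
layers = init_shared_mlp([19, 32, 64, 128])
inputs = build_feature_inputs(points, np.eye(4))
global_feature = max_pool(shared_mlp(inputs, layers))
```

Because the same weights are applied to every point and the pooling is a maximum, the aggregated vector does not depend on the order of the points.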

8. The image processing-based jewelry defect detection method according to claim 1, characterized in that the step of calculating the spindle speed adjustment amount and the tool compensation amount based on the defect position coordinate set and the attitude geometric features comprises: counting the number of defects in the defect position coordinate set and dividing the number by the surface area of the jewelry to obtain a defect density; decoding, based on the attitude geometric features, the difference between the current attitude parameters and the reference attitude parameters to obtain an attitude deviation; and calculating the spindle speed adjustment amount based on the defect density and the attitude deviation, and calculating the tool compensation amount based on the distance from each defect in the defect position coordinate set to the tool position.
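
As an illustration only, the defect density and attitude deviation of this claim can be sketched as follows. The claim does not specify how the attitude parameters are represented or decoded; the sketch assumes both attitudes are available as unit quaternions in (w, x, y, z) order and measures the deviation as the rotation angle between them.

```python
import numpy as np

def defect_density(defect_coords, surface_area):
    """Number of detected defects divided by the jewelry surface area."""
    if surface_area <= 0:
        raise ValueError("surface_area must be positive")
    return len(defect_coords) / surface_area

def attitude_deviation(q_current, q_reference):
    """Rotation angle in radians between two attitudes given as quaternions."""
    a = np.asarray(q_current, dtype=float)
    b = np.asarray(q_reference, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # abs() treats q and -q as the same rotation; clip guards against rounding error.
    dot = abs(float(np.dot(a, b)))
    return 2.0 * float(np.arccos(np.clip(dot, -1.0, 1.0)))

# Usage: three defects on a 1.5 cm^2 surface, attitude rotated 90 degrees about z.
density = defect_density([(0.1, 0.2), (0.4, 0.4), (0.9, 0.3)], surface_area=1.5)
deviation = attitude_deviation([np.sqrt(0.5), 0.0, 0.0, np.sqrt(0.5)], [1.0, 0.0, 0.0, 0.0])
```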

9. The image processing-based jewelry defect detection method according to claim 8, characterized in that the step of calculating the spindle speed adjustment amount based on the defect density and the attitude deviation, and calculating the tool compensation amount based on the distance from each defect in the defect position coordinate set to the tool position, comprises: inputting the defect density and the attitude deviation into a fuzzy inference system for fuzzification and rule matching, and obtaining a proportional parameter adjustment amount, an integral parameter adjustment amount, and a derivative parameter adjustment amount by defuzzification; performing a proportional-integral-derivative calculation on the defect density based on the proportional parameter adjustment amount, the integral parameter adjustment amount, and the derivative parameter adjustment amount to obtain the spindle speed adjustment amount; calculating the distance from the center coordinates of each defect in the defect position coordinate set to the tool position, and assigning a corresponding defect category weight to each defect according to its defect category; and calculating a spatial attenuation factor based on the distance of each defect, and calculating the tool compensation amount based on the spatial attenuation factor and the defect category weight.
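
As an illustration only, the fuzzy adjustment of the proportional-integral-derivative parameters and the distance-weighted tool compensation of this claim can be sketched as follows. The claim does not disclose the membership functions, the rule base, the attenuation law, or the category weights; the two-level linear memberships, the `RULES` table, the exponential attenuation with scale `sigma`, and the inputs normalized to [0, 1] are all assumptions, as is the sign convention of the output.

```python
import math

# Hypothetical rule base: (density level, deviation level) -> (dKp, dKi, dKd).
RULES = {
    ("low", "low"): (0.0, 0.0, 0.0),
    ("low", "high"): (0.2, 0.0, 0.1),
    ("high", "low"): (0.3, 0.1, 0.0),
    ("high", "high"): (0.5, 0.2, 0.2),
}

def fuzzify(x):
    """Linear memberships for a value clamped to [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    return {"low": 1.0 - x, "high": x}

def fuzzy_pid_adjustments(density, deviation):
    """Rule matching with min as AND, defuzzified by a weighted average."""
    mu_d, mu_a = fuzzify(density), fuzzify(deviation)
    total, sums = 0.0, [0.0, 0.0, 0.0]
    for (d_label, a_label), gains in RULES.items():
        strength = min(mu_d[d_label], mu_a[a_label])
        total += strength
        for i, gain in enumerate(gains):
            sums[i] += strength * gain
    return tuple(s / total for s in sums)

class FuzzyPID:
    """PID on the defect density with gains shifted by the fuzzy adjustments."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.previous = 0.0

    def step(self, density, deviation, dt):
        dkp, dki, dkd = fuzzy_pid_adjustments(density, deviation)
        self.integral += density * dt
        derivative = (density - self.previous) / dt
        self.previous = density
        return ((self.kp + dkp) * density + (self.ki + dki) * self.integral
                + (self.kd + dkd) * derivative)

def tool_compensation(defects, tool_position, category_weights, sigma=1.0):
    """Sum of category weights attenuated exponentially with distance to the tool."""
    total = 0.0
    for (x, y), category in defects:
        distance = math.hypot(x - tool_position[0], y - tool_position[1])
        total += category_weights[category] * math.exp(-distance / sigma)
    return total
```

A defect located exactly at the tool position contributes its full category weight, and the contribution decays toward zero as the distance grows.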

10. A jewelry defect detection device based on image processing, characterized in that the device is configured to perform the image processing-based jewelry defect detection method according to any one of claims 1-9, and comprises: an acquisition module, configured to acquire a clamping torque vector, inertial measurement data, and a jewelry surface image during jewelry processing; a global defect feature recognition module, configured to perform polarization filtering on the jewelry surface image to obtain a dereflection image, and to perform global defect feature recognition on the dereflection image to obtain a defect position coordinate set; an attitude geometric feature calculation module, configured to calculate an attitude quaternion based on the clamping torque vector and the inertial measurement data, convert the attitude quaternion into target point cloud data of the clamping device, and input the target point cloud data into a geometric perception feature decoupling network to calculate attitude geometric features; and an adjustment and compensation module, configured to calculate a spindle speed adjustment amount and a tool compensation amount based on the defect position coordinate set and the attitude geometric features.