Motorcycle parts burr removal system based on industrial vision

An industrial vision system combining a multi-angle light source array and a multispectral camera is used to extract the roughness and edge gradient features of burrs on motorcycle parts. The system uses structured light point clouds to obtain three-dimensional coordinates, solving the positioning error problem in existing technologies and achieving precise burr removal.

CN122265271A · Pending · Publication Date: 2026-06-23 · QINGDAO SUNSONG METAL MFG CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: QINGDAO SUNSONG METAL MFG CO LTD
Filing Date: 2026-05-06
Publication Date: 2026-06-23

Application Information

Patent Timeline

06 May 2026: Application
23 Jun 2026: Publication (CN122265271A)

IPC: G06T7/00; G06T7/70; G06V10/145; G06V10/10; G06V10/141; G06V10/44; G06V10/58; G06V10/80; G06V10/82; G06V10/26; G06V10/764; G06V10/766; G06N3/045

AI Tagging

Application Domain: Image analysis; Character and pattern recognition
Technology Topics: Light spot; Engineering
Technical Efficacy Phrases: Eliminate reflective areas; implement extraction

AI Technical Summary

Technical Problem

Existing industrial vision-based motorcycle parts burr removal systems suffer from a lack of depth dimension data in two-dimensional images when dealing with highly reflective metal materials and complex curved surface structures. This causes the robotic arm trajectory to deviate from the actual spatial position of the burr, resulting in spatial positioning errors in the removal operation.

Method used

By employing a multi-angle light source array, a structured light camera, and a multispectral camera, combined with a dynamic illumination angle matrix and a multispectral attention mechanism, the roughness features and edge gradient features of burrs are extracted. The three-dimensional spatial coordinates of the burrs are obtained through structured light point clouds, and the removal trajectory of the robotic arm is planned.

Benefits of technology

It effectively eliminates the interference of reflection from the metal curved surface, realizes the accurate extraction of burr candidate areas and three-dimensional spatial coordinate mapping, and the robotic arm removal trajectory matches the burr shape, eliminating the spatial positioning deviation caused by two-dimensional coordinate mapping.

✦ Generated by Eureka AI based on patent content.

Abstract

The present application relates to the technical field of image recognition, in particular to a motorcycle part burr removing system based on industrial vision. An image processing device generates a non-reflective area by controlling a multi-angle light source array according to a dynamic light angle matrix calculated based on a surface normal vector; the roughness features of a visible light image and the edge gradient features of a near-infrared image are cross-fused in a feature pyramid network through a multispectral attention mechanism to generate a fusion feature map and segment out a burr candidate area; local point cloud sets are extracted in combination with structured light point cloud data, the distance values of the local point cloud to a reference plane are calculated to obtain three-dimensional space coordinates; and an executing mechanism plans a mechanical arm removing track according to the three-dimensional space coordinates. The present scheme eliminates metal reflection interference, obtains three-dimensional space coordinates of the burr, matches the removing track with the actual morphology of the burr, and eliminates spatial positioning deviation.

Description

Technical Field

[0001] This invention relates to the field of image recognition technology, and more specifically to a burr removal system for motorcycle parts based on industrial vision.

Background Technology

[0002] Existing industrial vision-based motorcycle component burr removal systems typically employ a single visible light camera with a fixed light source to acquire two-dimensional images of the component surface. During image processing, the system extracts grayscale transition regions from the image using conventional edge detection operators or identifies abnormal regions using general object detection models. After determining the two-dimensional pixel coordinates of the burr, the system, combined with a pre-calibrated transformation matrix between the camera and the robotic arm, directly maps these coordinates to the execution coordinates of the robotic arm's end effector, controlling a cutter or laser head to physically remove the identified burr area along a fixed trajectory.

[0003] Motorcycle parts are mostly made of highly reflective metals, and their surfaces commonly have complex curved structures such as deep holes and blind grooves. When a fixed light source shines on the surface of these parts, the curved metal surface produces specular reflections. These reflective spots appear as bright areas in two-dimensional images, directly destroying the original edge gradient information of the burrs within that area. This causes conventional edge detection operators or object detection models to be unable to distinguish between normal reflective metal areas and actual burr features. Furthermore, the two-dimensional images lack depth dimension data, and the two-dimensional coordinates extracted by the system under reflective interference cannot reflect the true position of the burrs in three-dimensional physical space. The robotic arm trajectory generated according to the two-dimensional coordinate mapping will deviate from the actual spatial position of the burrs, causing spatial positioning errors in the removal operation.

Summary of the Invention

[0004] The purpose of this invention is to provide a burr removal system for motorcycle parts based on industrial vision, which can effectively solve the problems mentioned in the background art.

[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows: A motorcycle parts burr removal system based on industrial vision includes a vision acquisition device, an image processing device, a structured light point cloud acquisition device, and an execution mechanism. The vision acquisition device includes a multi-angle light source array, a structured light camera, and a multispectral camera. The image processing device is connected to the multi-angle light source array, the structured light camera, and the multispectral camera, respectively. The execution mechanism is connected to the image processing device. The image processing device calculates a dynamic illumination angle matrix based on the surface normal vector of the motorcycle parts. The multi-angle light source array is controlled to generate a non-reflective area according to the dynamic illumination angle matrix. The image processing device inputs the visible light image acquired by the multispectral camera into a pre-trained surface roughness detection network to obtain roughness features, and inputs the near-infrared image into an edge gradient detection network to obtain edge gradient features. The roughness features and the edge gradient features are cross-fused in a feature pyramid network using a multispectral attention mechanism to generate a fused feature map. The burr candidate region is segmented according to the response intensity threshold in the fused feature map. The image processing device calculates the three-dimensional spatial coordinates of the burr candidate region by combining the point cloud data output by the structured light point cloud acquisition device. The execution mechanism includes a robotic arm control system that plans the removal trajectory of a six-axis robotic arm according to the three-dimensional spatial coordinates.

[0006] Preferably, the image processing device acquires the initial point cloud data output by the structured light camera, performs voxel downsampling and statistical filtering on the initial point cloud data to acquire target point cloud data, calculates the normal vector of each spatial point in the target point cloud data, and maps the normal vector to the spherical coordinate system of the multi-angle light source array. Based on the distribution of the normal vector in the spherical coordinate system, calculate the set of illumination angles such that the angle between the incident light rays of each light source in the multi-angle light source array and the normal vector is within a preset non-specular reflection range. Then, spatially map the set of illumination angles according to the physical arrangement of the multi-angle light source array to generate the dynamic illumination angle matrix.
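The mapping of normal vectors into a spherical coordinate system can be illustrated with a small sketch. It assumes, purely for illustration, that the spherical frame of the light source array shares its axes with the point cloud frame, with the polar angle measured from the +z axis; the function name `normal_to_spherical` is hypothetical and does not come from the patent.

```python
import math

def normal_to_spherical(normal):
    """Return (polar, azimuth) in degrees for a 3D normal vector.

    Assumption: polar angle is measured from +z, azimuth from +x toward +y.
    """
    x, y, z = normal
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("normal vector must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    polar = math.degrees(math.acos(max(-1.0, min(1.0, z / r))))
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    return polar, azimuth
```

A per-source illumination angle set would then be chosen from the distribution of these angle pairs; the selection rule itself is not spelled out in this paragraph, so it is left out of the sketch.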

[0007] Preferably, the image processing device performs registration processing on the visible light image and the near-infrared image to obtain a registered image pair. The visible light image in the registered image pair is input into the surface roughness detection network, where multi-scale texture features are extracted through multiple convolutional layers. The near-infrared image in the registered image pair is input into the edge gradient detection network. The Sobel operator layer in the edge gradient detection network extracts horizontal and vertical gradient features, concatenates the horizontal and vertical gradient features to obtain initial edge gradient features, and performs non-maximum suppression processing on the initial edge gradient features to obtain the final edge gradient features.
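The Sobel and non-maximum suppression steps can be sketched on a plain grayscale grid. This is a minimal illustration, not the patent's network layer: the image is a nested list, the two gradient channels are kept as `gx` and `gy`, and the suppression is simplified to compare each pixel only with its two neighbours along the dominant axis (horizontal or vertical) rather than interpolating along the exact gradient direction.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def sobel_gradients(img):
    """Horizontal and vertical Sobel responses; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            sx = sy = 0.0
            for i in range(3):
                for j in range(3):
                    v = img[r + i - 1][c + j - 1]
                    sx += SOBEL_X[i][j] * v
                    sy += SOBEL_Y[i][j] * v
            gx[r][c], gy[r][c] = sx, sy
    return gx, gy

def non_max_suppression(gx, gy):
    """Keep a pixel only if its gradient magnitude is a local maximum
    along the dominant gradient axis (simplified two-direction variant)."""
    h, w = len(gx), len(gx[0])
    mag = [[math.hypot(gx[r][c], gy[r][c]) for c in range(w)] for r in range(h)]
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if abs(gx[r][c]) >= abs(gy[r][c]):
                n1, n2 = mag[r][c - 1], mag[r][c + 1]
            else:
                n1, n2 = mag[r - 1][c], mag[r + 1][c]
            if mag[r][c] > 0 and mag[r][c] >= n1 and mag[r][c] >= n2:
                out[r][c] = mag[r][c]
    return out
```

On a horizontal intensity ramp the suppression thins the three-pixel-wide gradient response down to its single strongest column, which is the effect the paragraph attributes to the refinement step.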

[0008] Preferably, the image processing device inputs the roughness features and the edge gradient features into the bottom-up path of the feature pyramid network and performs multiple downsampling to obtain multiple roughness feature maps and edge gradient feature maps at different scales. In the top-down path of the feature pyramid network, the first attention weight of each channel in the roughness feature map and the second attention weight of each channel in the edge gradient feature map are calculated using the multispectral attention mechanism. The first attention weight and the second attention weight are then combined by matrix multiplication to obtain the cross-spectral joint weight. The roughness feature map and the edge gradient feature map at the corresponding scale are weighted and fused using the cross-spectral joint weight to generate the fused feature map.

[0009] Preferably, the image processing device performs upsampling processing on the fused feature map to obtain a target fused feature map with the same size as the visible light image, and inputs the target fused feature map into a preset classification and regression network. The burr confidence of each pixel in the target fused feature map is output by the fully connected layer in the classification and regression network. Pixels with burr confidence greater than the response intensity threshold are marked as candidate pixels. Connectivity analysis is performed on adjacent candidate pixels to obtain multiple connected regions. The area of the bounding rectangle of each connected region is calculated. Connected regions with bounding rectangle areas less than a preset area threshold are removed to obtain the burr candidate regions.
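The thresholding, connectivity analysis, and bounding-rectangle filter described above can be sketched as follows. The sketch assumes a per-pixel confidence map given as a nested list and uses 4-connectivity, which the patent does not specify; the function and parameter names are hypothetical.

```python
from collections import deque

def burr_candidate_regions(confidence, threshold, min_box_area):
    """Group above-threshold pixels into 4-connected regions and drop
    regions whose bounding rectangle area is below min_box_area."""
    h, w = len(confidence), len(confidence[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0] or confidence[r0][c0] <= threshold:
                continue
            # Breadth-first flood fill from an unvisited candidate pixel.
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w and not seen[nr][nc]
                            and confidence[nr][nc] > threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rows = [p[0] for p in pixels]
            cols = [p[1] for p in pixels]
            box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
            if box_area >= min_box_area:
                regions.append(sorted(pixels))
    return regions
```

Filtering on the bounding rectangle rather than the pixel count follows the wording of the paragraph; it keeps thin, elongated regions that a pixel-count filter of the same value might discard.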

[0010] Preferably, the structured light point cloud acquisition device projects structured light stripes onto the surface of the motorcycle parts, acquires a stripe image containing distorted structured light stripes, performs phase unwrapping on the stripe image according to a phase decoding algorithm to obtain an absolute phase map, and calculates three-dimensional point cloud data based on the absolute phase map and the intrinsic and extrinsic parameter matrices of the structured light point cloud acquisition device. The image processing device maps the two-dimensional pixel coordinates of the burr candidate region in the visible light image to the three-dimensional point cloud data, extracts the local point cloud set corresponding to the two-dimensional pixel coordinates, and performs plane fitting on the local point cloud set to obtain the reference plane. It then calculates the distance from each spatial point in the local point cloud set to the reference plane, and uses the set of point cloud coordinates whose distance values are greater than a preset height threshold as the three-dimensional spatial coordinates.

[0011] Preferably, in the process of calculating the dynamic illumination angle matrix, the image processing device divides the multi-angle light source array into multiple illumination partitions according to the physical arrangement of the array, and calculates the average normal vector for the local normal vector set in each illumination partition. When the angle between the average normal vector and the initial light source incident direction corresponding to the illumination partition falls within the preset non-specular reflection range, the initial light source incident direction remains unchanged. When the angle does not fall within the preset non-specular reflection range, a rotational offset angle is generated within the preset non-specular reflection range with the average normal vector as the axis. The initial light source incident direction is rotated and transformed according to the rotational offset angle to obtain an updated incident direction. The updated incident direction is then written into the dynamic illumination angle matrix.

[0012] Preferably, when acquiring the registered image pair, the image processing device extracts a first set of corner points in the visible light image and a second set of corner points in the near-infrared image, calculates the Hamming distance between each first corner point in the first set and each second corner point in the second set, takes the first corner point and the second corner point with the smallest Hamming distance as a matching corner point pair, and uses a random sample consensus (RANSAC) algorithm to remove incorrect matching corner point pairs to obtain a set of correct matching corner point pairs. The affine transformation matrix is calculated based on the set of correctly matched corner point pairs. The affine transformation matrix is then used to perform a spatial geometric transformation on the near-infrared image, so that the transformed near-infrared image and the visible light image are spatially aligned to obtain the registered image pair.
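The matching and outlier rejection steps can be sketched in a few functions. This is an illustrative sketch under several assumptions: corner descriptors are binary strings stored as Python integers (the patent does not name a descriptor), the affine model is fitted from three point pairs by Cramer's rule, and the iteration count, pixel tolerance, and seed are hypothetical parameters.

```python
import random

def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_corners(desc_vis, desc_nir):
    """Pair each visible-light descriptor with its nearest NIR descriptor."""
    return [(i, min(range(len(desc_nir)), key=lambda k: hamming(d, desc_nir[k])))
            for i, d in enumerate(desc_vis)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule; None if singular."""
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        sol.append(det3(mc) / d)
    return sol

def fit_affine(src, dst):
    """Affine model ((a, b, tx), (c, d, ty)) from three point pairs."""
    m = [[x, y, 1.0] for x, y in src]
    row_x = solve3(m, [p[0] for p in dst])
    row_y = solve3(m, [p[1] for p in dst])
    return None if row_x is None or row_y is None else (row_x, row_y)

def apply_affine(model, pt):
    (a, b, tx), (c, d, ty) = model
    return a * pt[0] + b * pt[1] + tx, c * pt[0] + d * pt[1] + ty

def ransac_affine(src, dst, iters=50, tol=1.0, seed=0):
    """Return the affine model with the largest inlier set and that set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        idx = rng.sample(range(len(src)), 3)
        model = fit_affine([src[i] for i in idx], [dst[i] for i in idx])
        if model is None:  # the three sampled points were collinear
            continue
        inliers = []
        for i, (s, d) in enumerate(zip(src, dst)):
            px, py = apply_affine(model, s)
            if (px - d[0]) ** 2 + (py - d[1]) ** 2 <= tol ** 2:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Here `src` would hold the near-infrared corner coordinates and `dst` the matched visible-light coordinates, so the returned model warps the near-infrared image onto the visible one, as the paragraph describes. A production pipeline would usually refit the model on all inliers by least squares after the consensus step.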

[0013] Preferably, when calculating the cross-spectral joint weight, the image processing device performs global average pooling on the first attention weight to obtain a first global feature vector, and performs global average pooling on the second attention weight to obtain a second global feature vector. The first global feature vector and the second global feature vector are concatenated along the channel dimension to obtain a concatenated feature vector, which is then input into the multilayer perceptron model. The cross-spectral attention coefficients are output through the fully connected layer and the Sigmoid activation function in the multilayer perceptron model. The cross-spectral attention coefficients are then multiplied element-wise with the first attention weight and the second attention weight to obtain the cross-spectral joint weight.
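One possible reading of this computation is sketched below. The tensor shapes are assumptions: both attention weights are taken as `[C][H][W]` nested lists, the multilayer perceptron is reduced to a single fully connected layer with hypothetical parameters `fc_weights` (C rows of 2C values) and `fc_bias`, and the joint weight is formed as the per-channel coefficient times the element-wise product of the two attention weights.

```python
import math

def global_average_pool(maps):
    """Mean over all spatial positions for each channel of a [C][H][W] list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in maps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_spectral_joint_weight(att_rough, att_edge, fc_weights, fc_bias):
    """Return (coefficients, joint_weight) for two [C][H][W] attention maps."""
    # Concatenate the two pooled vectors along the channel dimension (length 2C).
    pooled = global_average_pool(att_rough) + global_average_pool(att_edge)
    # Single fully connected layer followed by a sigmoid (stand-in for the MLP).
    coeffs = [sigmoid(sum(w * v for w, v in zip(row, pooled)) + b)
              for row, b in zip(fc_weights, fc_bias)]
    joint = []
    for k, (ch_r, ch_e) in enumerate(zip(att_rough, att_edge)):
        joint.append([[coeffs[k] * a * b for a, b in zip(row_r, row_e)]
                      for row_r, row_e in zip(ch_r, ch_e)])
    return coeffs, joint
```

With zero weights and bias every coefficient is 0.5, so the joint weight is simply half the product of the two maps; trained parameters would shift that balance per channel.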

[0014] Preferably, when the image processing device performs planar fitting on the local point cloud, it uses the least squares method to construct the initial covariance matrix of the local point cloud, and performs eigenvalue decomposition on the initial covariance matrix to obtain three eigenvalues and three corresponding eigenvectors. The eigenvector corresponding to the smallest eigenvalue among the three eigenvalues is selected as the normal vector of the reference plane, and the coordinates of the geometric center point of the local point cloud are taken as the spatial point on the reference plane. The plane equation of the reference plane is constructed based on the normal vector and the coordinates of the spatial points. The coordinates of the spatial points in the local point cloud are substituted into the plane equation to calculate the distance value. The distance values are sorted according to their numerical values to obtain a height sequence. The extreme points at the beginning and end of the height sequence are removed, and the preset height threshold is recalculated.
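The plane fitting and height test can be sketched without a linear algebra library. Note the substitution: instead of a full eigenvalue decomposition, the sketch runs power iteration on the shifted matrix trace(C)·I − C, whose dominant eigenvector is the eigenvector of the covariance matrix C with the smallest eigenvalue. The function names, iteration count, and start vector are hypothetical, and the rule for recalculating the height threshold is not given in the text, so only the trimming of the sorted height sequence is shown.

```python
import math

def fit_reference_plane(points, iters=200):
    """Return (unit normal, centroid) of the best-fit plane through 3D points."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[sum((p[i] - centroid[i]) * (p[j] - centroid[j]) for p in points) / n
            for j in range(3)] for i in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    if trace == 0.0:
        raise ValueError("all points coincide; plane is undefined")
    # Shifted matrix: eigenvalues become trace - lambda_i, so the smallest
    # eigenvalue of cov turns into the largest one here.
    shifted = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
               for i in range(3)]
    v = [0.3, 0.5, 0.8]  # arbitrary start; must not be orthogonal to the normal
    for _ in range(iters):
        w = [sum(shifted[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centroid

def plane_distance(point, normal, centroid):
    """Unsigned distance from a point to the plane through centroid."""
    return abs(sum((point[i] - centroid[i]) * normal[i] for i in range(3)))

def burr_points(points, height_threshold):
    """Points lying farther from the fitted plane than height_threshold."""
    normal, centroid = fit_reference_plane(points)
    return [p for p in points
            if plane_distance(p, normal, centroid) > height_threshold]

def trimmed_heights(points):
    """Sorted distances with the first and last (extreme) values removed."""
    normal, centroid = fit_reference_plane(points)
    heights = sorted(plane_distance(p, normal, centroid) for p in points)
    return heights[1:-1]
```

Because the burr points themselves take part in the fit, the centroid is pulled slightly toward them; trimming the extremes before setting the threshold, as the paragraph describes, is one way to limit that bias.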

[0015] Compared with the prior art, the beneficial effects of the present invention are as follows: 1. This scheme controls a multi-angle light source array by calculating a dynamic illumination angle matrix, ensuring that the angle between the incident light ray and the surface normal vector is within the non-specular reflection range, thus eliminating reflective areas generated by the curved metal surface. Roughness features extracted from visible light images and edge gradient features extracted from near-infrared images are cross-fused in a feature pyramid network using a multispectral attention mechanism. Utilizing the difference between roughness features and edge gradient features, burr features are separated from the fused feature map, eliminating interference from residual textures in the two-dimensional image on burr edges and achieving the extraction of burr candidate regions.

[0016] 2. This scheme maps the segmented burr candidate regions onto structured light point cloud data, extracts the corresponding local point cloud sets for plane fitting, and obtains the three-dimensional spatial coordinates of the burr candidate regions by calculating the distance from the local point cloud to the reference plane. The three-dimensional spatial coordinates contain the physical information of the burr's height and depth. The robotic arm control system plans the removal trajectory based on the three-dimensional spatial coordinates, ensuring that the feed position of the tool or laser head matches the actual shape of the burr in three-dimensional physical space, eliminating the spatial positioning deviation caused by direct mapping of two-dimensional coordinates.

Attached Figure Description

[0017] Figure 1 is a flowchart of the dynamic illumination angle matrix generation process of the present invention;
Figure 2 is a flowchart of the multispectral image registration and feature extraction process of the present invention;
Figure 3 is a flowchart of the multispectral attention mechanism and feature fusion of the present invention;
Figure 4 is a flowchart of the burr candidate region segmentation process of the present invention;
Figure 5 is a flowchart of the structured light 3D point cloud acquisition process of the present invention;
Figure 6 is a flowchart of the robotic arm trajectory planning and execution process of the present invention.

Detailed Implementation

[0018] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0019] Please refer to Figure 1. This embodiment provides a motorcycle parts burr removal system based on industrial vision, comprising a vision acquisition device, an image processing device, a structured light point cloud acquisition device, and an actuator. The vision acquisition device includes a multi-angle light source array, a structured light camera, and a multispectral camera. The image processing device establishes bidirectional data communication links with the multi-angle light source array, the structured light camera, and the multispectral camera, respectively. The image processing device can send control commands to the multi-angle light source array, the structured light camera, and the multispectral camera, and simultaneously receive status data and acquisition data transmitted back from them. The actuator establishes a bidirectional data communication link with the image processing device, receiving spatial coordinate data and control commands sent by the image processing device, and simultaneously transmitting execution status data back to the image processing device.

[0020] The image processing device acquires spatial point cloud data of the motorcycle component surface, calculates the surface normal vector corresponding to each spatial point on the component surface based on the spatial point cloud data, and calculates a dynamic illumination angle matrix based on the spatial distribution of the surface normal vectors. Specifically, the image processing device calculates the angle between the normal vector of each spatial point on the component surface and the corresponding incident ray, generates incident angle adjustment parameters for each light source based on the constraint of the angle, and spatially maps the adjustment parameters according to the physical arrangement of the multi-angle light source array to generate a dynamic illumination angle matrix. The image processing device sends control commands to the multi-angle light source array according to the dynamic illumination angle matrix to adjust the incident angle of each light source unit in the array, so that the angle between the incident ray of each area on the motorcycle component surface and the surface normal vector at the corresponding position is within a preset non-specular reflection range, generating a non-reflective image acquisition area on the motorcycle component surface.

[0021] The angle is computed as $\theta = \arccos(\mathbf{n} \cdot \mathbf{l})$, where $\theta$ is the angle between the incident ray and the surface normal vector, $\mathbf{n}$ is the unit normal vector of a spatial point on the surface of the motorcycle component, and $\mathbf{l}$ is the unit direction vector of the incident light ray from the corresponding light source unit in the multi-angle light source array. The preset non-specular reflection range is a predefined range of included-angle values. When $\theta$ lies within this range, the incident light from the light source does not produce specular reflection on the surface of the component, which avoids bright reflective areas in the acquired image.
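The angle constraint can be checked with a few lines of code. This sketch assumes the light direction vector points from the surface toward the light source, so that an angle of 0° means illumination along the normal; the range bounds passed to `in_non_specular_range` are placeholders, since the actual preset range is not stated in this paragraph.

```python
import math

def incidence_angle_deg(normal, light_dir):
    """Angle in degrees between a surface normal and a light direction.

    Inputs need not be unit length; they are normalised here.
    """
    dot = sum(a * b for a, b in zip(normal, light_dir))
    norm_n = math.sqrt(sum(a * a for a in normal))
    norm_l = math.sqrt(sum(b * b for b in light_dir))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_n * norm_l)))
    return math.degrees(math.acos(cos_theta))

def in_non_specular_range(normal, light_dir, lo_deg, hi_deg):
    """True if the incidence angle lies within [lo_deg, hi_deg]."""
    return lo_deg <= incidence_angle_deg(normal, light_dir) <= hi_deg
```

For example, a light tilted 45° from the normal passes a hypothetical 30° to 60° range, while a light aligned with the normal does not.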

[0022] The multispectral camera simultaneously acquires visible light and near-infrared images of the motorcycle component surface within the non-reflective image acquisition area, and transmits these images to the image processing device. The image processing device inputs the received visible light images into a pre-trained surface roughness detection network. This network extracts texture features from the visible light images to generate roughness features. The surface roughness detection network employs a multi-layer convolutional neural network structure. Its training dataset contains visible light images of motorcycle metal component surfaces with different processing techniques and roughness levels. Each image in the dataset is labeled with a corresponding measured surface roughness value. The network is trained using supervised learning. The trained network can extract multi-scale texture features related to surface roughness from the visible light images, and the output roughness features reflect the differences in roughness across different areas of the component surface. The roughness of burr areas on the motorcycle component surface differs significantly from that of normally processed surfaces; the texture changes in burr areas are more pronounced, resulting in a higher intensity of the corresponding roughness feature response.

[0023] The image processing device inputs the received near-infrared image into a pre-trained edge gradient detection network. The network extracts edge gradient information from the near-infrared image to generate edge gradient features. The edge gradient detection network includes a gradient calculation layer and a non-maximum suppression layer. The gradient calculation layer uses the directional gradient operator to extract grayscale changes in different directions from the near-infrared image, generating initial gradient features. The non-maximum suppression layer refines the initial gradient features, removing redundant pixels to generate the final edge gradient features. At the boundary between burr areas and normally machined surfaces on motorcycle parts, there are significant grayscale jumps, resulting in higher intensity response of the corresponding edge gradient features. Near-infrared images are not sensitive to texture changes on metal surfaces, effectively eliminating interference from normal machining textures on the edge extraction.

[0024] The image processing device employs a multispectral attention mechanism, inputting the extracted roughness features and edge gradient features into a feature pyramid network. Cross-fusion processing is performed at multiple scale levels within the feature pyramid network to generate fused feature maps corresponding to different scales. The feature pyramid network includes a bottom-up downsampling path and a top-down upsampling path. The bottom-up path downsamples the input roughness and edge gradient features multiple times, generating feature maps at different scales to capture burr features of varying sizes. The top-down path upsamples the high-level feature maps obtained from the downsampling and fuses them with the feature maps of corresponding scales from the bottom-up path. Simultaneously, the multispectral attention mechanism dynamically adjusts the weights of the two features, enhancing the feature response of burr regions in the fused feature map and suppressing the feature response of irrelevant background regions.

[0025] The image processing device sets a response intensity threshold based on the response intensity of each pixel location in the fused feature map, and performs threshold segmentation on the fused feature map. Pixel regions with response intensities greater than the response intensity threshold are segmented to obtain burr candidate regions. The response intensity threshold is a preset feature response threshold value. When the feature response intensity of a pixel location is greater than this threshold, the probability that the pixel location belongs to a burr region meets the screening requirements, and it is included in the range of burr candidate regions.

[0026] The image processing device, combined with the 3D point cloud data output by the structured light point cloud acquisition device, maps the pixel coordinates of the segmented burr candidate region in the 2D image to the spatial coordinate system corresponding to the 3D point cloud data. It then extracts the local point cloud data corresponding to the 2D pixel coordinates and calculates the 3D spatial coordinates of the burr candidate region based on the local point cloud data. The structured light point cloud acquisition device projects structured light stripes and acquires distorted stripe images. Based on a phase decoding algorithm and a pre-calibrated camera intrinsic and extrinsic parameter matrix, it calculates the 3D point cloud data of the motorcycle component surface. This 3D point cloud data contains the 3D coordinate information of each spatial point on the component surface, reflecting the true spatial morphology of the component surface.

[0027] The actuator comprises a robotic arm control system and a six-axis robotic arm. The robotic arm control system receives the three-dimensional spatial coordinates of the burr candidate area output by the image processing device, constructs the spatial contour of the burr area based on the three-dimensional spatial coordinates, plans the removal trajectory of the six-axis robotic arm's end effector in Cartesian space, generates continuous trajectory control commands, and controls the end effector of the six-axis robotic arm to move along the removal trajectory to complete the burr removal operation on the surface of the motorcycle parts. The end effector of the six-axis robotic arm can be a cutting tool or a laser emitter. The appropriate end effector type is selected according to the material and size of the burr. The removal trajectory planned by the robotic arm control system perfectly matches the spatial contour of the burr area, ensuring that the feed position of the end effector is consistent with the actual spatial position of the burr.

[0028] Table 1. Correspondence between non-specular reflection regions and surface normal vector tilt angles.

In Table 1, the tilt angle between the normal vector and the camera optical axis is the angle between the surface normal vector and the camera optical axis, used to characterize the degree of tilt of the component surface. The non-specular reflection incident angle range is the range of angle values between the incident ray and the surface normal vector. When the angle is within this range, specular reflection can be effectively avoided. This table is used in the dynamic illumination angle matrix calculation process to match the corresponding non-specular reflection range for component surface areas with different tilt degrees, providing a constraint basis for adjusting the incident angle of the light source, ensuring that the adjusted incident ray will not produce specular reflection spots on the component surface.

[0029] In this embodiment, by calculating the dynamic illumination angle matrix and controlling the multi-angle light source array, the specular reflection interference on the surface of motorcycle metal parts is eliminated, ensuring the quality of subsequent image acquisition. Roughness features and edge gradient features related to burr characteristics are extracted from visible light and near-infrared images acquired by a multispectral camera, respectively. The complementarity of these two features enhances the recognition capability of burr characteristics. Cross-fusion of the two features is achieved through a multispectral attention mechanism and a feature pyramid network, enhancing the response intensity of the burr region in the fused feature map. The true three-dimensional spatial coordinates of the burr are obtained by mapping structured light three-dimensional point cloud data with two-dimensional burr candidate regions. The robotic arm removal trajectory planned based on the three-dimensional spatial coordinates matches the actual spatial shape of the burr, achieving precise removal of burrs from motorcycle parts.

[0030] In a preferred embodiment, the image processing device acquires the initial point cloud data output by the structured light camera and preprocesses it by voxel downsampling followed by statistical filtering to obtain the target point cloud data. The initial point cloud data is the raw 3D point cloud acquired by the structured light camera, and it contains a large number of redundant points and noisy outliers. Redundant points increase the computational load of subsequent calculations, and noisy outliers reduce the accuracy of normal vector calculations. Preprocessing reduces the redundancy of the point cloud data and removes noisy outliers while preserving the overall contour features of the point cloud, thereby improving the quality of the point cloud data.

[0031] During the voxel downsampling process, the image processing device divides the three-dimensional space occupied by the initial point cloud data into multiple cubic voxel grids of the same size. Each voxel grid corresponds to a fixed volume region in the three-dimensional space. The number of initial points contained in each voxel grid is counted. For each voxel grid that contains points, the average of the coordinates of all points in the grid is calculated and used as the coordinates of the sampling point of that voxel grid. The sampling points of all voxel grids constitute the downsampled point cloud data.

[0032] $$\mathbf{p}_j = \frac{1}{N_j}\sum_{i=1}^{N_j}\mathbf{p}_{j,i}$$

where $\mathbf{p}_j$ is the spatial coordinate of the sampling point in the $j$-th voxel grid, $N_j$ is the number of initial points contained in the $j$-th voxel grid, and $\mathbf{p}_{j,i}$ is the spatial coordinate of the $i$-th initial point within the $j$-th voxel grid. Voxel downsampling reduces the density of the initial point cloud data to a preset range, lowering the computational load of subsequent normal vector calculations while preserving the overall contour features of the component surface.
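The voxel averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `voxel_downsample` and the `voxel_size` parameter are assumptions introduced here, and voxels are indexed simply by flooring each coordinate divided by the voxel edge length.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid.

    `points` is an iterable of (x, y, z) tuples; `voxel_size` is the voxel
    edge length (a hypothetical parameter name chosen for this sketch).
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis.
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))

    sampled = []
    for pts in buckets.values():
        n = len(pts)
        # Centroid of the points inside this voxel.
        sampled.append(tuple(sum(p[axis] for p in pts) / n for axis in range(3)))
    return sampled


cloud = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.5, 0.0, 0.0)]
print(voxel_downsample(cloud, voxel_size=1.0))
```

Empty voxels never appear in the dictionary, which matches the rule that only voxels containing points produce a sampling point.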

[0033] During statistical filtering, for each point in the voxel-downsampled point cloud data the image processing device calculates the Euclidean distances to a preset number of neighboring points and averages them to obtain the average neighborhood distance of that point. The mean and standard deviation of all average neighborhood distances are then calculated, and an outlier threshold is derived from the mean and standard deviation. Points whose average neighborhood distance is greater than the outlier threshold are identified as noise outliers and removed from the point cloud data. The remaining points constitute the target point cloud data.

[0034] $$\bar{d}_i = \frac{1}{k}\sum_{j=1}^{k}\left\lVert \mathbf{p}_i - \mathbf{p}_{i,j} \right\rVert, \qquad T = \mu + \alpha\,\sigma$$

where $\bar{d}_i$ is the average neighborhood distance of the $i$-th point; $k$ is the preset number of neighboring points; $\mathbf{p}_i$ is the spatial coordinate of the $i$-th point; $\mathbf{p}_{i,j}$ is the spatial coordinate of the $j$-th neighboring point of the $i$-th point; $\lVert\cdot\rVert$ is the Euclidean distance operator; $T$ is the outlier threshold; $\mu$ is the mean of the average neighborhood distances of all points; $\sigma$ is the standard deviation of the average neighborhood distances of all points; and $\alpha$ is a preset threshold coefficient. Statistical filtering removes noisy outliers from the point cloud data, improving the accuracy of the target point cloud data and of the subsequent normal vector calculations.
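As an illustration of the statistical filter, the following Python sketch uses a brute-force neighbor search, which is adequate only for small clouds. The threshold form "mean plus coefficient times standard deviation" is an assumption about how the threshold is built from those two statistics, and the names `statistical_filter`, `k` and `alpha` are hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def statistical_filter(points, k=3, alpha=1.0):
    """Drop points whose mean distance to their k nearest neighbours is too large.

    Assumed threshold: mu + alpha * sigma over all average neighbourhood
    distances. The neighbour search is O(n^2) to keep the sketch dependency-free.
    """
    avg_distances = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        avg_distances.append(mean(d[:k]))

    mu = mean(avg_distances)
    sigma = pstdev(avg_distances)
    threshold = mu + alpha * sigma
    return [p for p, d in zip(points, avg_distances) if d <= threshold]


grid = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
noisy = grid + [(50.0, 50.0, 50.0)]
print(len(statistical_filter(noisy)))
```

In practice a spatial index such as a k-d tree would replace the inner loop.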

[0035] The image processing device calculates the normal vector of each spatial point in the target point cloud data. Specifically, for each spatial point in the target point cloud data, a predetermined number of neighboring points are extracted, and a local point cloud set consisting of the point and its neighbors is constructed. The least squares method is used to fit a plane to the local point cloud set to obtain the normal vector of the fitted plane, which is then used as the surface normal vector of the spatial point. The normal vector of the fitted plane has two opposite directions. The image processing device uses the optical center of the structured light camera as a reference point to uniformly adjust the direction of the normal vector to point towards the optical center of the camera, ensuring the consistency of the direction of all normal vectors.
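A possible way to compute one such normal is sketched below with NumPy. It assumes the plane fit is an orthogonal (total) least-squares fit obtained from a singular value decomposition, which is one common reading of "least squares" here; the function name `estimate_normal` and its arguments are hypothetical, and the neighbor search that builds the local point set is left out.

```python
import numpy as np


def estimate_normal(neighbourhood, camera_center):
    """Normal of the best-fit plane through a local point set, facing the camera.

    `neighbourhood` holds the point and its neighbours as (x, y, z) rows;
    `camera_center` is the optical centre used to resolve the sign ambiguity.
    """
    pts = np.asarray(neighbourhood, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    # Flip the normal if it points away from the camera optical centre.
    to_camera = np.asarray(camera_center, dtype=float) - centroid
    if np.dot(normal, to_camera) < 0:
        normal = -normal
    return normal


patch = [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
print(estimate_normal(patch, camera_center=(0, 0, 0)))
```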

[0036] The image processing device maps all calculated surface normal vectors to a spherical coordinate system corresponding to the multi-angle light source array. This spherical coordinate system is established with the optical center of the structured light camera as the origin, the optical axis of the structured light camera as the Z-axis, the horizontal direction of the camera's imaging plane as the X-axis, and the vertical direction as the Y-axis. Normal vectors in the spherical coordinate system are represented by polar angles and azimuth angles. The polar angle is the angle between the normal vector and the Z-axis, and the azimuth angle is the angle between the projection of the normal vector onto the XY plane and the X-axis. Through the mapping to the spherical coordinate system, the direction of the normal vector in three-dimensional space can be converted into angular parameters in the spherical coordinate system, facilitating the subsequent calculation of the illumination angle.
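The conversion from a normal vector to polar and azimuth angles can be sketched as follows. The function name `normal_to_spherical` is an assumption, angles are returned in degrees for readability, and the axes follow the convention stated above (Z along the optical axis).

```python
from math import acos, atan2, degrees, sqrt


def normal_to_spherical(normal):
    """Return (polar, azimuth) in degrees for a 3D normal vector.

    polar   : angle between the normal and the Z-axis (camera optical axis)
    azimuth : angle between the XY-projection of the normal and the X-axis
    """
    x, y, z = normal
    length = sqrt(x * x + y * y + z * z)
    # Clamp to guard against rounding slightly outside [-1, 1].
    polar = acos(max(-1.0, min(1.0, z / length)))
    azimuth = atan2(y, x)
    return degrees(polar), degrees(azimuth)


print(normal_to_spherical((0.0, 1.0, 0.0)))
```

Note that the azimuth is undefined for a normal lying exactly on the Z-axis; `atan2(0, 0)` returns 0 in that case.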

[0037] The image processing device calculates a set of illumination angles based on the distribution of normal vectors in spherical coordinates. Each element in the illumination angle set corresponds to an incident angle adjustment parameter for a light source unit in a multi-angle light source array. The constraint condition for the adjustment parameter is that the angle between the incident ray of the adjusted light source and the normal vector of the corresponding area surface is within a preset non-specular reflection range. The image processing device spatially maps each element in the illumination angle set according to the physical arrangement of the multi-angle light source array, writes the adjustment parameter corresponding to each light source unit into the corresponding position in the matrix, and generates a dynamic illumination angle matrix.

[0038] Furthermore, in calculating the dynamic illumination angle matrix, the image processing device divides the multi-angle light source array into multiple illumination zones according to the physical arrangement of the array. Each illumination zone corresponds to a set of adjacent light source units in the array, and each illumination zone corresponds to a fixed area on the surface of the motorcycle component. For each illumination zone, the image processing device extracts the normal vectors of all spatial points within the corresponding component surface area, forming a local normal vector set. The average value of all normal vectors in the local normal vector set is calculated to obtain the average normal vector corresponding to that illumination zone.

[0039] The image processing device calculates the angle between the average normal vector corresponding to the illumination zone and the initial light source incident direction corresponding to the illumination zone, and determines whether the angle falls within a preset non-specular reflection range. When the angle falls within the preset non-specular reflection range, the initial light source incident direction corresponding to the illumination zone remains unchanged, and the angle parameter corresponding to the initial light source incident direction is written into the corresponding position of the dynamic illumination angle matrix. When the angle does not fall within the preset non-specular reflection range, a rotation offset angle is generated within the preset non-specular reflection range with the average normal vector as the rotation axis. The initial light source incident direction is rotated according to the rotation offset angle to obtain the updated incident direction, and the angle parameter corresponding to the updated incident direction is written into the corresponding position of the dynamic illumination angle matrix.

[0040] In the process of generating the rotation offset angle, the initial incident direction of the light source is rotated around the average normal vector as the axis. The angle between the incident direction and the average normal vector is calculated during the rotation. When the angle falls within the middle position of the preset non-specular reflection range, the rotation stops. This rotation angle is the rotation offset angle. By adjusting the incident direction to the middle position of the non-specular reflection range, the probability of specular reflection can be further reduced. Even if there are slight surface undulations on the component surface, it can be ensured that the angle between the incident light and the normal vector is within the non-specular reflection range.

[0041] Table 2 Correspondence between Point Cloud Preprocessing Parameters and Point Cloud Data Quality In Table 2, the voxel grid size level is positively correlated with the actual size of the voxel grid; the higher the level, the larger the voxel grid size and the lower the point cloud density after downsampling. The number of neighborhood points in the statistical filtering is the number of neighborhood points selected when calculating the average neighborhood distance. The outlier threshold coefficient is the threshold coefficient in the statistical filtering formula. The processed point cloud density level represents the density of the point cloud data after preprocessing; the normal vector calculation error range is the deviation range between the normal vector calculated based on the preprocessed point cloud data and the true normal vector. This table is used in the point cloud preprocessing process to match the corresponding preprocessing parameter combination according to the original density and noise level of the initial point cloud data. While controlling the amount of computation, it ensures the accuracy of normal vector calculation, providing accurate normal vector data for the calculation of the dynamic illumination angle matrix.

[0042] In this embodiment, the quality of point cloud data is improved and the computational load of subsequent calculations is reduced by preprocessing operations such as voxel downsampling and statistical filtering. By mapping to the spherical coordinate system, the normal vector direction is converted into an angle parameter that is easy to calculate. By dividing the illumination into zones and calculating the average normal vector, the dynamic adjustment of the multi-angle light source array is realized, ensuring that the incident light rays in each area of the component surface are within the non-specular reflection range, further eliminating the specular reflection interference of the metal curved surface and ensuring the quality of subsequent image acquisition.

[0043] In a preferred embodiment, referring to Figure 2, the image processing device performs registration processing on the visible light image and near-infrared image acquired by the multispectral camera to obtain a spatially aligned registered image pair. Because the optical centers of the visible light imaging channel and the near-infrared imaging channel of the multispectral camera are misaligned, the acquired visible light image and near-infrared image exhibit a pixel offset in the spatial dimension. Registration processing eliminates this offset, ensuring that the same spatial location corresponds to the same pixel coordinates in both images, thus guaranteeing the accuracy of subsequent feature fusion.

[0044] During the registration process, the image processing device employs a corner detection algorithm to extract the first set of corner points from the visible light image and the second set of corner points from the near-infrared image. The ORB algorithm can be used for corner detection, as it can quickly extract corner features from the image and generate corresponding binary descriptors. The image processing device calculates the Hamming distance between the binary descriptors of each first corner point in the first set and the binary descriptors of each second corner point in the second set. The Hamming distance characterizes the degree of difference between two binary descriptors; the smaller the Hamming distance, the higher the feature similarity between the two corner points. The image processing device selects the first and second corner points with the smallest Hamming distance as matching corner point pairs. After traversing all corner points, an initial set of matching corner point pairs is obtained.
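The nearest-descriptor matching step can be illustrated with plain Python integers standing in for binary descriptors. This is a simplified sketch: real ORB descriptors are 256 bits long, and the names `hamming` and `match_descriptors` are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(first_set, second_set):
    """For each descriptor in first_set, pick the closest one in second_set.

    Returns a list of (index_in_first, index_in_second, hamming_distance).
    """
    matches = []
    for i, d1 in enumerate(first_set):
        # Index of the second-set descriptor with the smallest Hamming distance.
        j = min(range(len(second_set)), key=lambda idx: hamming(d1, second_set[idx]))
        matches.append((i, j, hamming(d1, second_set[j])))
    return matches


visible = [0b1010, 0b0110]
infrared = [0b1010, 0b0111]
print(match_descriptors(visible, infrared))
```

Matches produced this way may still be wrong, which is why an outlier-rejection step is applied afterwards.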

[0045] The initial set of matched corner pairs contains incorrectly matched corner pairs, which can affect the accuracy of subsequent transformation matrix calculations. The image processing device uses a random sample consensus algorithm to process the initial set of matched corner pairs, removing incorrectly matched pairs to obtain a set of correctly matched corner pairs. Specifically, the random sample consensus algorithm randomly selects a minimum sample set from the initial set of matched corner pairs, calculates the initial transformation matrix based on this minimum sample set, counts the number of interior points in the initial set of matched corner pairs that conform to the initial transformation matrix, and after multiple iterations, selects the transformation matrix with the largest number of interior points as the optimal transformation matrix. The matched corner pairs that conform to the optimal transformation matrix are then considered as correctly matched corner pairs, forming the set of correctly matched corner pairs.

[0046] The image processing device calculates an affine transformation matrix based on a set of correctly matched corner pairs. This matrix includes linear and translation parameters, enabling image rotation, scaling, translation, and shearing transformations. Using the calculated affine transformation matrix, the device performs a spatial geometric transformation on the near-infrared image, ensuring complete spatial alignment between the transformed near-infrared image and the visible light image. The transformed near-infrared image and the visible light image then form a registered image pair.

[0047] $$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

where $(x, y)$ are the original pixel coordinates in the near-infrared image; $(x', y')$ are the pixel coordinates after the affine transformation; $a_{11}$, $a_{12}$, $a_{21}$ and $a_{22}$ are the linear transformation parameters, which together encode the rotation, scaling and shearing of the image; and $t_x$ and $t_y$ are the translation parameters, corresponding to the horizontal and vertical translation of the image, respectively. The 3×3 matrix in the formula is the affine transformation matrix.
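To make the affine step concrete, the NumPy sketch below estimates a 3×3 affine matrix from matched point pairs by ordinary least squares and applies it to pixel coordinates. The least-squares estimator and the function names `estimate_affine` and `apply_affine` are assumptions for illustration; the source does not state which solver is used.

```python
import numpy as np


def estimate_affine(src, dst):
    """Least-squares 3x3 affine matrix mapping src (n, 2) onto dst (n, 2)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])      # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape (3, 2)
    return np.vstack([params.T, [0.0, 0.0, 1.0]])


def apply_affine(matrix, points):
    """Transform (n, 2) pixel coordinates with a 3x3 affine matrix."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return (homogeneous @ np.asarray(matrix, dtype=float).T)[:, :2]


nir_corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
vis_corners = [(5, -3), (7, -3), (5, -1), (7, -1)]  # scale 2, shift (5, -3)
M = estimate_affine(nir_corners, vis_corners)
print(np.round(M, 6))
```

At least three non-collinear point pairs are needed for the six affine parameters to be determined.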

[0048] The image processing device inputs the visible light image from the registered image pair into the surface roughness detection network. The surface roughness detection network adopts a ResNet backbone network structure, containing multiple convolutional blocks. Each convolutional block contains a convolutional layer, a batch normalization layer, and a ReLU activation function layer. Through progressive downsampling across multiple convolutional blocks, multi-scale texture features are extracted from the visible light image, generating multi-channel roughness features. During the training process of the surface roughness detection network, the mean squared error loss function is used as the loss function, and the labeled measured surface roughness values are used as supervision labels. The network's weight parameters are updated through a backpropagation algorithm. The trained network can accurately extract texture features related to surface roughness, and the output roughness features can effectively distinguish between burr areas and normally processed surfaces.

[0049] The image processing device inputs the near-infrared image from the registered image pair into an edge gradient detection network. The input of the edge gradient detection network is a Sobel operator layer, which contains horizontal and vertical Sobel convolution kernels. Through convolution operations, the horizontal and vertical gradient features of each pixel in the near-infrared image are extracted. The image processing device concatenates the extracted horizontal and vertical gradient features along the channel dimension to obtain initial edge gradient features. Non-maximum suppression is then applied to the initial edge gradient features to remove redundant pixels and refine the edge width, thus obtaining the final edge gradient features.

[0050] $$G_x(x,y) = (K_x * I)(x,y), \qquad G_y(x,y) = (K_y * I)(x,y), \qquad G(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}$$

where $G_x(x,y)$ is the horizontal gradient value of the near-infrared image at pixel $(x,y)$, which constitutes the horizontal gradient feature; $G_y(x,y)$ is the vertical gradient value at pixel $(x,y)$, which constitutes the vertical gradient feature; $I(x,y)$ is the grayscale value of the near-infrared image at pixel $(x,y)$; $K_x$ is the horizontal Sobel convolution kernel; $K_y$ is the vertical Sobel convolution kernel; $*$ is the two-dimensional convolution operator; and $G(x,y)$ is the gradient magnitude at pixel $(x,y)$. The gradient magnitudes of all pixels together constitute the initial edge gradient features.
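A small NumPy sketch of the Sobel stage is given below. It uses the standard 3×3 Sobel kernels and a "valid" sliding-window filter without padding, both of which are assumptions, since the kernel values and border handling are not spelled out here. Like convolution layers in most deep learning frameworks, the helper computes a cross-correlation; for Sobel kernels this only flips the sign of the gradient and leaves the magnitude unchanged.

```python
import numpy as np

# Standard 3x3 Sobel kernels (assumed; not quoted from the source).
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T


def filter3x3(image, kernel):
    """Valid-mode 3x3 sliding-window filter (cross-correlation, no padding)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return out


def sobel_gradients(image):
    """Return horizontal gradient, vertical gradient and gradient magnitude."""
    img = np.asarray(image, dtype=float)
    gx = filter3x3(img, KX)
    gy = filter3x3(img, KY)
    return gx, gy, np.hypot(gx, gy)


step_edge = np.array([[0, 0, 1, 1]] * 4)  # vertical edge between columns 1 and 2
gx, gy, magnitude = sobel_gradients(step_edge)
print(magnitude)
```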

[0051] Referring to Figure 3, the image processing device inputs the extracted roughness features and edge gradient features into the feature pyramid network. The bottom-up path of the feature pyramid network contains multiple downsampling levels. Each level downsamples its input with a stride-2 convolutional layer, so that the roughness features and edge gradient features are downsampled step by step into multiple roughness feature maps and edge gradient feature maps of different scales. The feature maps of different scales correspond to burr features of different sizes: higher-level feature maps capture the overall features of large burrs, while lower-level feature maps capture the details of small burrs.

[0052] In the top-down path of the feature pyramid network, the image processing device upsamples the feature maps of higher levels, making the size of the upsampled feature maps consistent with the size of the feature maps of the corresponding levels in the bottom-up path. Simultaneously, using a multispectral attention mechanism, it calculates the first attention weight for each channel in the roughness feature map at the corresponding scale, and the second attention weight for each channel in the edge gradient feature map. The first attention weight characterizes the importance of each channel's features in the roughness feature map for burr identification, and the second attention weight characterizes the importance of each channel's features in the edge gradient feature map for burr identification.

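[0053] $$w^{R}_{c} = \Big[\sigma\big(\mathrm{FC}_2(\delta(\mathrm{FC}_1(\mathbf{z}^{R})))\big)\Big]_{c}, \qquad z^{R}_{c} = \frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W} F^{R}_{c}(i,j)$$

where $w^{R}_{c}$ is the first attention weight of the $c$-th channel of the roughness feature map; $\sigma$ is the sigmoid activation function, used to map weight values into the interval from 0 to 1; $\mathrm{FC}_1$ and $\mathrm{FC}_2$ are two fully connected layers used for encoding and decoding features; $\delta$ is the linear rectification (ReLU) activation function; $H$ and $W$ are the height and width of the roughness feature map, respectively; $F^{R}_{c}(i,j)$ is the feature value of the $c$-th channel of the roughness feature map at pixel position $(i,j)$; and $z^{R}_{c}$ is the global feature of the $c$-th channel, obtained by global average pooling over that channel, with $\mathbf{z}^{R}$ the vector of these values over all channels.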

[0054] $$w^{E}_{c} = \Big[\sigma\big(\mathrm{FC}_2(\delta(\mathrm{FC}_1(\mathbf{z}^{E})))\big)\Big]_{c}, \qquad z^{E}_{c} = \frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W} F^{E}_{c}(i,j)$$

where $w^{E}_{c}$ is the second attention weight of the $c$-th channel of the edge gradient feature map, and $F^{E}_{c}(i,j)$ is the feature value of the $c$-th channel of the edge gradient feature map at pixel position $(i,j)$. The remaining parameters are defined as in the first attention weight calculation formula.
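The per-channel attention weights described in the two preceding paragraphs follow the familiar pattern of global average pooling, two fully connected layers and a sigmoid. The NumPy sketch below illustrates that pattern only; the weight matrices `w1` and `w2` are random placeholders rather than trained parameters, biases are omitted, and the function name `channel_attention` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feature_map, w1, w2):
    """Channel attention weights for a (C, H, W) feature map.

    w1: (C_reduced, C) encoding layer, w2: (C, C_reduced) decoding layer.
    Returns a vector of C weights, each strictly between 0 and 1.
    """
    z = feature_map.mean(axis=(1, 2))      # global average pooling per channel
    hidden = np.maximum(w1 @ z, 0.0)       # first fully connected layer + ReLU
    return sigmoid(w2 @ hidden)            # second fully connected layer + sigmoid


rng = np.random.default_rng(0)
roughness_map = rng.normal(size=(4, 8, 8))
w1 = rng.normal(size=(2, 4))
w2 = rng.normal(size=(4, 2))
weights = channel_attention(roughness_map, w1, w2)
reweighted = roughness_map * weights[:, None, None]  # scale each channel
print(weights.shape, reweighted.shape)
```

The same function can be applied to the edge gradient feature map to obtain the second set of weights.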

[0055] The image processing device performs a matrix multiplication operation on the calculated first attention weight and second attention weight to obtain a cross-spectral joint weight. This cross-spectral joint weight is then used to weight and fuse the roughness feature map and edge gradient feature map at the corresponding scale. The weighted roughness feature map and edge gradient feature map are then concatenated along the channel dimension to generate a fused feature map corresponding to that scale. The same fusion method is used for each scale level in the feature pyramid network to generate multiple fused feature maps at different scales.

[0056] $$\mathbf{W}_{cs} = \mathbf{w}^{R} \cdot \mathbf{w}^{E}$$

where $\mathbf{W}_{cs}$ is the cross-spectral joint weight; $\mathbf{w}^{R}$ is the weight vector composed of the first attention weights of all channels; $\mathbf{w}^{E}$ is the weight vector composed of the second attention weights of all channels; and $\cdot$ is the matrix dot product operator.

[0057] Furthermore, in calculating the cross-spectral joint weights, the image processing device performs global average pooling on the first attention weights to obtain a first global feature vector, and performs global average pooling on the second attention weights to obtain a second global feature vector. The first and second global feature vectors are then concatenated along the channel dimension to obtain a concatenated feature vector. The image processing device inputs the concatenated feature vector into a multilayer perceptron model, which contains two fully connected layers and a sigmoid activation function. The first fully connected layer performs dimensionality-reducing encoding of the concatenated feature vector, the second fully connected layer performs dimensionality-increasing decoding of the encoded features, and the sigmoid activation function maps the output values into the interval from 0 to 1 to obtain the cross-spectral attention coefficients. The image processing device multiplies these cross-spectral attention coefficients element-wise with the first and second attention weights, respectively, performing a secondary calibration of the first and second attention weights to obtain the final cross-spectral joint weights. This secondary calibration further raises the weights of feature channels that contribute significantly to burr detection and suppresses the weights of irrelevant feature channels, thereby improving the quality of the fused feature map.

[0058] Table 3. Cross-spectral attention coefficient allocation for multi-scale feature fusion. In Table 3, the feature pyramid levels refer to the different scale levels in the feature pyramid network from the bottom to the top. The bottom feature map has the largest size, corresponding to detailed features, while the top feature map has the smallest size, corresponding to overall features. The number of roughness feature channels is the number of channels in the roughness feature map of the corresponding level; the number of edge gradient feature channels is the number of channels in the edge gradient feature map of the corresponding level; and the range of cross-spectral attention coefficients is the range of values for the cross-spectral attention coefficients output by the multilayer perceptron model of the corresponding level. This table is used in the multi-scale fusion process of the feature pyramid network to match the corresponding attention coefficient allocation strategy for feature maps of different levels. The bottom feature map corresponds to a higher range of attention coefficient values to enhance the detailed features of burrs, while the top feature map corresponds to a slightly lower range of attention coefficient values to take into account the overall features of burrs, ensuring that burr features of different sizes can be effectively enhanced in the fused feature map.

[0059] In this embodiment, spatial pixel offset between the visible light image and the near-infrared image is eliminated through image registration processing, achieving spatial alignment of the two images. A dedicated surface roughness detection network and edge gradient detection network are used to extract roughness features and edge gradient features related to burr characteristics, respectively. The complementarity of these two features eliminates interference from normal processing textures on the component surface. Through a multispectral attention mechanism and a feature pyramid network, the two features are cross-fused at multiple scale levels, effectively enhancing the feature response of the burr region in the fused feature map and providing high-quality feature data for subsequent burr candidate region segmentation.

[0060] In a preferred embodiment, referring to Figure 4, the image processing device upsamples the multiple fused feature maps of different scales output by the feature pyramid network, adjusting the size of all fused feature maps to match the size of the visible light image to obtain the target fused feature map. Specifically, starting from the top-level fused feature map, the size of the feature map is gradually increased through repeated bilinear interpolation upsampling. After each upsampling step, the feature map is superimposed with the fused feature map of the corresponding scale, ultimately generating a target fused feature map whose size is identical to that of the visible light image. The target fused feature map contains burr feature information at different scales.

[0061] The image processing device inputs the target fused feature map into a preset classification and regression network. This network consists of multiple convolutional layers and two fully connected layers. The convolutional layers further extract features from the target fused feature map, and the fully connected layers encode the extracted features, outputting a burr confidence score for each pixel in the target fused feature map. The burr confidence score ranges from 0 to 1. The closer the confidence score is to 1, the higher the probability that the pixel belongs to a burr region.

[0062] $$P_i = \sigma\big(\mathrm{FC}(\mathbf{f}_i)\big)$$

where $P_i$ is the burr confidence score of the $i$-th pixel in the target fused feature map; $\mathbf{f}_i$ is the feature vector at the $i$-th pixel of the target fused feature map; $\mathrm{FC}$ is a fully connected layer; and $\sigma$ is the sigmoid activation function, used to map the output value into the interval from 0 to 1.

[0063] The image processing device sets a response intensity threshold, marking pixels with a burr confidence score greater than the threshold as candidate pixels and pixels with a burr confidence score less than or equal to the threshold as background pixels. The device performs connected component analysis on all adjacent candidate pixels, using the 8-neighborhood connectivity criterion to determine whether adjacent candidate pixels belong to the same connected region. After traversing all candidate pixels, multiple independent connected regions are obtained. The device calculates the area of the bounding rectangle of each connected region and eliminates connected regions whose bounding rectangle area is smaller than a preset area threshold. The remaining connected regions are the burr candidate regions. Connected component analysis and area filtering eliminate small, falsely detected noise regions, improving the accuracy of burr candidate region extraction.
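The thresholding, 8-neighborhood grouping and bounding-rectangle filtering described above can be sketched in plain Python as follows. The function and parameter names (`burr_candidate_regions`, `score_thr`, `min_box_area`) are hypothetical, and the breadth-first flood fill is just one of several ways to label connected components.

```python
from collections import deque


def burr_candidate_regions(confidence, score_thr, min_box_area):
    """Return 8-connected regions of pixels whose confidence exceeds score_thr.

    Regions whose bounding-rectangle area is below min_box_area are dropped.
    `confidence` is a 2D list of per-pixel scores in [0, 1].
    """
    h, w = len(confidence), len(confidence[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or confidence[r][c] <= score_thr:
                continue
            # Breadth-first flood fill over the 8-neighbourhood.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and confidence[ny][nx] > score_thr):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
            if box_area >= min_box_area:
                regions.append(pixels)
    return regions


scores = [
    [0.9, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0],
]
print(burr_candidate_regions(scores, score_thr=0.5, min_box_area=2))
```

In the example, the two diagonal pixels form one region under 8-connectivity, while the isolated pixel is discarded by the area filter.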

[0064] Referring to Figure 5, a structured light point cloud acquisition device projects periodic structured light stripes onto the surface of the motorcycle part; the stripes may be sinusoidal. A structured light camera acquires a stripe image in which the stripes are modulated and distorted by the part surface. The acquired stripe image is then transmitted to the image processing device, which processes it using a phase decoding algorithm. First, the wrapped phase value is extracted from the stripe image; the wrapped phase is confined to a single 2π-wide interval. Then, the multi-frequency heterodyne method is used to unwrap the wrapped phase values, eliminating the phase ambiguity and yielding the absolute phase map.

[0065] $$\Phi(x,y) = \varphi(x,y) + 2\pi\,k(x,y)$$

where $\Phi(x,y)$ is the absolute phase value at pixel $(x,y)$ of the stripe image; $\varphi(x,y)$ is the wrapped phase value at pixel $(x,y)$; and $k(x,y)$ is the phase unwrapping order at pixel $(x,y)$, which characterizes the number of whole periods by which the wrapped phase has been folded.

[0066] The image processing device calculates 3D point cloud data of the motorcycle component surface based on the absolute phase map and the pre-calibrated intrinsic and extrinsic parameter matrices of the structured light point cloud acquisition device. The intrinsic and extrinsic parameter matrices include the intrinsic parameter matrix and distortion coefficients of the structured light camera, as well as the extrinsic parameter matrix between the camera and the projector. The intrinsic parameter matrix characterizes the camera's internal optical parameters, while the extrinsic parameter matrix characterizes the spatial relationship between the camera and the projector. Based on the absolute phase map and the intrinsic and extrinsic parameter matrices, the 3D coordinates of each spatial point on the component surface are calculated using the principle of triangulation. The 3D coordinates of all spatial points constitute the 3D point cloud data.

[0067] The image processing device pre-calibrates the visible light image pixel coordinate system and the three-dimensional point cloud spatial coordinate system, obtaining a transformation matrix between the two coordinate systems. Based on the transformation matrix, the two-dimensional pixel coordinates of the burr candidate region in the visible light image are mapped to the spatial coordinate system corresponding to the three-dimensional point cloud data. All point clouds within the spatial region corresponding to the two-dimensional pixel coordinates are extracted to form a local point cloud set. The local point cloud set contains all three-dimensional spatial point information of the component surface corresponding to the burr candidate region, reflecting the true spatial morphology of the burr region.

[0068] The image processing device performs planar fitting on the local point cloud to obtain a reference plane. This reference plane is the fitted plane of the normally machined surface of the component corresponding to the burr candidate region, and is used as the reference for calculating the burr height. The image processing device calculates the distance values from each spatial point in the local point cloud to the reference plane. These distance values characterize the height of the spatial point relative to the reference plane. The distance values from the point cloud of the normally machined surface to the reference plane are within a small range, while the distance values from the point cloud of the burr region to the reference plane are significantly greater than those of the point cloud of the normally machined surface. The image processing device sets a preset height threshold and uses the set of point cloud coordinates with distance values greater than the preset height threshold as the three-dimensional spatial coordinates corresponding to the burr candidate region.

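[0069] $$\mathbf{C} = \frac{1}{N}\sum_{i=1}^{N}(\mathbf{p}_i - \bar{\mathbf{p}})(\mathbf{p}_i - \bar{\mathbf{p}})^{\mathsf T}, \qquad \bar{\mathbf{p}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{p}_i$$

where $\mathbf{C}$ is the covariance matrix of the local point cloud set; $N$ is the number of points contained in the local point cloud set; $\mathbf{p}_i$ is the spatial coordinate of the $i$-th point in the local point cloud set; and $\bar{\mathbf{p}}$ is the coordinate of the geometric center of the local point cloud set, that is, the average of the coordinates of all its points.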

[0070] The image processing device performs eigenvalue decomposition on the covariance matrix, obtaining three eigenvalues and their corresponding three eigenvectors. The three eigenvalues are sorted in ascending order of value, and the eigenvector corresponding to the smallest eigenvalue is the normal vector direction of the fitting plane. The eigenvector corresponding to the smallest eigenvalue is selected as the normal vector of the reference plane. The image processing device uses the coordinates of the geometric center point of the local point cloud as a spatial point on the reference plane, and constructs the plane equation of the reference plane based on the normal vector of the reference plane and the coordinates of this spatial point.

[0071] The distance from each point to the reference plane is computed as

$$d_i = \left| n^{T}\left(p_i - p_0\right) \right|$$

where $d_i$ is the distance value from the $i$-th point in the local point cloud set to the reference plane; $n$ is the unit normal vector of the reference plane; $p_0$ is the coordinates of a spatial point on the reference plane, i.e., the coordinates of the geometric center point of the local point cloud set; and $p_i$ is the spatial coordinates of the $i$-th point.
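The plane fitting and distance calculation described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: it assumes NumPy is available, normalizes the covariance matrix by the number of points, and uses the hypothetical function names `fit_reference_plane`, `heights_above_plane`, and `burr_points`.

```python
import numpy as np

def fit_reference_plane(points):
    """Fit a plane to an (N, 3) point set via eigen-decomposition of its covariance matrix.

    Returns the geometric center point and the unit normal vector of the plane.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)                 # geometric center point
    centered = pts - centroid
    cov = centered.T @ centered / len(pts)      # 3x3 covariance matrix (1/N normalization assumed)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues are returned in ascending order
    normal = eigvecs[:, 0]                      # eigenvector of the smallest eigenvalue
    return centroid, normal / np.linalg.norm(normal)

def heights_above_plane(points, centroid, normal):
    """Unsigned distance from every point to the plane through `centroid` with unit `normal`."""
    pts = np.asarray(points, dtype=float)
    return np.abs((pts - centroid) @ normal)

def burr_points(points, height_threshold):
    """Return the points whose distance to the fitted reference plane exceeds the threshold."""
    pts = np.asarray(points, dtype=float)
    centroid, normal = fit_reference_plane(pts)
    heights = heights_above_plane(pts, centroid, normal)
    return pts[heights > height_threshold]
```

For example, calling `burr_points` on a flat patch of points with a few raised points returns only the raised points, provided the threshold lies between the residual of the flat surface and the burr height.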

[0072] Furthermore, during the planar fitting process of the local point cloud set, the image processing device sorts the calculated distance values from all point clouds to the reference plane according to their numerical values to obtain a height sequence. Extreme points at both ends of the height sequence are removed, and the preset height threshold is recalculated based on the height sequence after removing extreme points. Specifically, the mean and standard deviation of the height sequence after removing extreme points are calculated, and a new height threshold is calculated based on the mean and standard deviation. This new height threshold is used as the updated preset height threshold for subsequent filtering of burr point clouds. Through the removal of extreme points and the dynamic updating of the height threshold, the interference of noise points in the local point cloud set on the height threshold calculation can be eliminated, improving the accuracy of burr 3D spatial coordinate extraction.
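The threshold update above can be sketched as follows. The text states only that the new threshold is calculated from the mean and standard deviation of the trimmed height sequence; the form `mean + k * std`, the coefficient `k`, and the number of extreme points removed at each end (`trim_count`) are assumptions made for illustration.

```python
from statistics import mean, pstdev

def updated_height_threshold(distances, trim_count=1, k=3.0):
    """Recompute the height threshold from a trimmed height sequence.

    distances  -- point-to-plane distance values of the local point cloud set
    trim_count -- number of extreme points removed at each end (assumed value)
    k          -- multiplier on the standard deviation (assumed value)
    """
    heights = sorted(distances)                          # height sequence
    trimmed = heights[trim_count:len(heights) - trim_count]
    if not trimmed:
        raise ValueError("too few points left after removing extreme points")
    return mean(trimmed) + k * pstdev(trimmed)
```

Because the extreme points are removed before the statistics are computed, a single noise point with a very large distance value does not inflate the threshold.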

[0073] Referring to Figure 6, the robotic arm control system receives the three-dimensional spatial coordinates of the burr candidate region output by the image processing device. Based on the three-dimensional spatial coordinates, it constructs the spatial contour of the burr region, extracts the boundary points of the spatial contour, and plans the removal trajectory of the six-axis robotic arm end effector in Cartesian space based on the boundary points. During trajectory planning, the robotic arm control system uses a fifth-order polynomial interpolation algorithm to interpolate the start point, end point, and intermediate path points of the removal trajectory. This generates a continuous and smooth trajectory curve, so that the velocity and acceleration of the robotic arm end effector remain continuous, without abrupt changes, during the movement.

[0074] The removal trajectory is interpolated as

$$P(s) = P_0 + \left(P_f - P_0\right) f(s)$$

where $P(s)$ is the Cartesian coordinates of the robotic arm's end effector at normalized time $s$; $P_0$ is the starting coordinates of the removal trajectory; $P_f$ is the endpoint coordinates of the removal trajectory; $s$ is the normalized time parameter, with a value ranging from 0 to 1; and $f(s)$ is a fifth-order polynomial interpolation function, which ensures that the position, velocity and acceleration of the trajectory are continuous at the start and end points.
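The interpolation above can be sketched for a single trajectory segment as follows. The text does not give the coefficients of the fifth-order polynomial; the sketch assumes the commonly used blend $10s^3 - 15s^4 + 6s^5$, which has zero first and second derivatives at $s = 0$ and $s = 1$. The function names are hypothetical, and intermediate path points would be handled by chaining such segments.

```python
def quintic_blend(s):
    """Assumed fifth-order blend f(s) = 10 s^3 - 15 s^4 + 6 s^5 on [0, 1].

    f(0) = 0 and f(1) = 1, with zero velocity and acceleration at both ends.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("normalized time must lie in [0, 1]")
    return s ** 3 * (10.0 + s * (-15.0 + 6.0 * s))

def interpolate(p_start, p_end, s):
    """Cartesian position at normalized time s between the start and end coordinates."""
    f = quintic_blend(s)
    return tuple(a + (b - a) * f for a, b in zip(p_start, p_end))

def sample_trajectory(p_start, p_end, steps):
    """Sample steps + 1 evenly spaced positions along one trajectory segment."""
    return [interpolate(p_start, p_end, i / steps) for i in range(steps + 1)]
```

Sampling with a small step count shows the characteristic behavior: positions cluster near the start and end points, where the velocity ramps smoothly from and to zero.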

[0075] Based on the generated removal trajectory, the robotic arm control system calculates the rotation angles of each joint of the six-axis robotic arm, generates control commands for the joint space, and sends these commands to the servo drivers of the six-axis robotic arm. This controls the end effector of the six-axis robotic arm to move along the removal trajectory, completing the deburring operation on the surface of the motorcycle parts. The feed depth of the removal trajectory is matched to the height of the burr, ensuring that the end effector can completely remove the burr without damaging the normally machined surface of the part.

[0076] Table 4. Correspondence table of connectivity parameters for burr candidate regions

In Table 4, the response intensity threshold is the critical value for filtering burr confidence; the connected region area threshold level is positively correlated with the minimum area of the connected region, with a higher level resulting in a larger minimum area threshold; the burr confidence filtering interval is the range of confidence values for candidate pixels; and the candidate region retention ratio is the ratio of the number of connected regions retained after filtering to the initial number of connected regions. This table is used in the burr candidate region segmentation process to match corresponding filtering parameter combinations based on the processing precision of motorcycle parts and the typical size of burrs, eliminating falsely detected small connected regions and low-confidence candidate regions, thus ensuring the accuracy of burr candidate region extraction.
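The filtering that these parameters control can be sketched as follows: pixels whose burr confidence exceeds the response intensity threshold become candidate pixels, adjacent candidates are grouped into connected regions, and regions whose bounding rectangle area falls below the area threshold are discarded. The sketch assumes 4-connectivity and a confidence map given as a list of rows; the function name and parameter names are hypothetical.

```python
from collections import deque

def burr_candidate_regions(confidence, response_threshold, min_box_area):
    """Return bounding boxes (row_min, col_min, row_max, col_max) of retained regions."""
    rows, cols = len(confidence), len(confidence[0])
    candidate = [[confidence[r][c] > response_threshold for c in range(cols)]
                 for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not candidate[r][c] or seen[r][c]:
                continue
            # Breadth-first search over 4-connected candidate pixels (assumed connectivity).
            queue = deque([(r, c)])
            seen[r][c] = True
            r0 = r1 = r
            c0 = c1 = c
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and candidate[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep the region only if its bounding rectangle is large enough.
            if (r1 - r0 + 1) * (c1 - c0 + 1) >= min_box_area:
                regions.append((r0, c0, r1, c1))
    return regions
```

The candidate region retention ratio described above would then be the number of returned regions divided by the number of regions found with `min_box_area` set to 1.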

[0077] In this embodiment, a classification regression network was used to accurately classify burr pixels, and connected component analysis and area screening were used to accurately extract burr candidate regions. High-precision three-dimensional point cloud data of the motorcycle parts surface was obtained through structured light phase decoding and triangulation. The true three-dimensional spatial coordinates of the burr region were obtained through plane fitting and distance calculation of the local point cloud set. The three-dimensional spatial coordinates contain the physical information of the burr's height and depth. The robotic arm removal trajectory planned based on the three-dimensional spatial coordinates perfectly matches the actual spatial shape of the burr, eliminating the spatial positioning deviation caused by direct mapping of two-dimensional coordinates and ensuring the accuracy of the burr removal operation.

Claims

1. A motorcycle parts burr removal system based on industrial vision, characterized in that the system includes a visual acquisition device, an image processing device, a structured light point cloud acquisition device, and an execution mechanism. The visual acquisition device includes a multi-angle light source array, a structured light camera, and a multispectral camera. The image processing device is connected to the multi-angle light source array, the structured light camera, and the multispectral camera, respectively. The execution mechanism is connected to the image processing device. The image processing device calculates a dynamic illumination angle matrix based on the surface normal vector of the motorcycle parts. The multi-angle light source array is controlled to generate a non-reflective area according to the dynamic illumination angle matrix. The image processing device inputs the visible light image acquired by the multispectral camera into a pre-trained surface roughness detection network to obtain roughness features, and inputs the near-infrared image into an edge gradient detection network to obtain edge gradient features. The roughness features and the edge gradient features are cross-fused in a feature pyramid network using a multispectral attention mechanism to generate a fused feature map. The burr candidate region is segmented according to the response intensity threshold in the fused feature map. The image processing device calculates the three-dimensional spatial coordinates of the burr candidate region by combining the point cloud data output by the structured light point cloud acquisition device. The execution mechanism includes a robotic arm control system that plans the removal trajectory of a six-axis robotic arm according to the three-dimensional spatial coordinates.

2. The motorcycle parts burr removal system based on industrial vision according to claim 1, characterized in that the image processing device acquires the initial point cloud data output by the structured light camera, performs voxel downsampling and statistical filtering on the initial point cloud data to acquire target point cloud data, calculates the normal vector of each spatial point in the target point cloud data, and maps the normal vector to the spherical coordinate system of the multi-angle light source array. Based on the distribution of the normal vector in the spherical coordinate system, the image processing device calculates a set of illumination angles such that the angle between the incident light rays of each light source in the multi-angle light source array and the normal vector is within a preset non-specular reflection range. The image processing device then spatially maps the set of illumination angles according to the physical arrangement of the multi-angle light source array to generate the dynamic illumination angle matrix.

3. The motorcycle parts burr removal system based on industrial vision according to claim 1, characterized in that the image processing device performs registration processing on the visible light image and the near-infrared image to obtain a registered image pair. The visible light image in the registered image pair is input into the surface roughness detection network, and multi-scale texture features are extracted through multiple convolutional layers in the surface roughness detection network. The near-infrared image in the registered image pair is input into the edge gradient detection network. The Sobel operator layer in the edge gradient detection network extracts horizontal and vertical gradient features, concatenates the horizontal and vertical gradient features to obtain initial edge gradient features, and performs non-maximum suppression processing on the initial edge gradient features to obtain the final edge gradient features.

4. The motorcycle parts burr removal system based on industrial vision according to claim 3, characterized in that the image processing device inputs the roughness features and the edge gradient features into the bottom-up path of the feature pyramid network and performs multiple downsampling operations to obtain multiple roughness feature maps and edge gradient feature maps at different scales. In the top-down path of the feature pyramid network, the first attention weight of each channel in the roughness feature map and the second attention weight of each channel in the edge gradient feature map are calculated using the multispectral attention mechanism. The first attention weight and the second attention weight are then matrix-multiplied to obtain the cross-spectral joint weight. The roughness feature map and the edge gradient feature map at the corresponding scale are weighted and fused using the cross-spectral joint weight to generate the fused feature map.

5. The motorcycle parts burr removal system based on industrial vision according to claim 4, characterized in that the image processing device upsamples the fused feature map to obtain a target fused feature map with the same size as the visible light image, and inputs the target fused feature map into a preset classification and regression network. The burr confidence of each pixel in the target fused feature map is output by the fully connected layer in the classification and regression network. Pixels with burr confidence greater than the response intensity threshold are marked as candidate pixels. Connectivity analysis is performed on adjacent candidate pixels to obtain multiple connected regions. The area of the bounding rectangle of each connected region is calculated. Connected regions with bounding rectangle areas less than a preset area threshold are removed to obtain the burr candidate regions.

6. The motorcycle parts burr removal system based on industrial vision according to claim 5, characterized in that the structured light point cloud acquisition device projects structured light stripes onto the surface of the motorcycle parts, acquires a stripe image containing distorted structured light stripes, performs phase unwrapping on the stripe image according to a phase decoding algorithm to obtain an absolute phase map, and calculates three-dimensional point cloud data based on the absolute phase map and the intrinsic and extrinsic parameter matrices of the structured light point cloud acquisition device. The image processing device maps the two-dimensional pixel coordinates of the burr candidate region in the visible light image to the three-dimensional point cloud data, extracts the local point cloud set corresponding to the two-dimensional pixel coordinates, and performs plane fitting on the local point cloud set to obtain the reference plane. The image processing device then calculates the distance from each spatial point in the local point cloud set to the reference plane, and uses the set of point cloud coordinates whose distance values are greater than a preset height threshold as the three-dimensional spatial coordinates.

7. The motorcycle parts burr removal system based on industrial vision according to claim 2, characterized in that, in the process of calculating the dynamic illumination angle matrix, the image processing device divides the illumination into multiple illumination partitions according to the physical arrangement of the multi-angle light source array, and calculates the average normal vector for the local normal vector set in each illumination partition. When the angle between the average normal vector and the initial light source incident direction corresponding to the illumination partition falls within the preset non-specular reflection range, the initial light source incident direction remains unchanged. When the angle between the average normal vector and the initial light source incident direction does not fall within the preset non-specular reflection range, the average normal vector is used as the axis and a rotational offset angle is generated within the preset non-specular reflection range. The initial light source incident direction is rotated according to the rotational offset angle to obtain an updated incident direction. The updated incident direction is then written into the dynamic illumination angle matrix.

8. The motorcycle parts burr removal system based on industrial vision according to claim 3, characterized in that, when acquiring the registered image pair, the image processing device extracts a first set of corner points from the visible light image and a second set of corner points from the near-infrared image, calculates the Hamming distance between each first corner point in the first set and each second corner point in the second set, takes the first corner point and the second corner point with the smallest Hamming distance as a matching corner point pair, and uses a random sample consensus (RANSAC) algorithm to remove incorrect matching corner point pairs to obtain a set of correct matching corner point pairs. The affine transformation matrix is calculated based on the set of correct matching corner point pairs. The affine transformation matrix is then used to perform a spatial geometric transformation on the near-infrared image, so that the transformed near-infrared image and the visible light image are spatially aligned to obtain the registered image pair.

9. The motorcycle parts burr removal system based on industrial vision according to claim 4, characterized in that, when calculating the cross-spectral joint weight, the image processing device performs global average pooling on the first attention weight to obtain a first global feature vector, and performs global average pooling on the second attention weight to obtain a second global feature vector. The first global feature vector and the second global feature vector are concatenated along the channel dimension to obtain a concatenated feature vector, which is then input into the multilayer perceptron model. The cross-spectral attention coefficients are output through the fully connected layer and the Sigmoid activation function in the multilayer perceptron model. The cross-spectral attention coefficients are then multiplied element-wise with the first attention weight and the second attention weight to obtain the cross-spectral joint weight.

10. The motorcycle parts burr removal system based on industrial vision according to claim 6, characterized in that, when performing plane fitting on the local point cloud set, the image processing device uses the least squares method to construct the initial covariance matrix of the local point cloud set, and performs eigenvalue decomposition on the initial covariance matrix to obtain three eigenvalues and three corresponding eigenvectors. The eigenvector corresponding to the smallest of the three eigenvalues is selected as the normal vector of the reference plane, and the coordinates of the geometric center point of the local point cloud set are taken as the spatial point on the reference plane. The plane equation of the reference plane is constructed based on the normal vector and the coordinates of the spatial point. The coordinates of each spatial point in the local point cloud set are substituted into the plane equation to calculate the distance value. The distance values are sorted according to their numerical values to obtain a height sequence. The extreme points at the beginning and end of the height sequence are removed, and the preset height threshold is recalculated.
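
The corner point matching step recited in claim 8 can be sketched as follows. This is an illustrative sketch only: it assumes each corner point carries a binary descriptor stored as a Python integer (the claims do not name the descriptor), uses hypothetical function names, and omits the subsequent RANSAC filtering and affine transformation estimation.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as integers."""
    return bin(a ^ b).count("1")

def match_corners(first_desc, second_desc):
    """Pair every first corner point with its nearest second corner point.

    first_desc, second_desc -- lists of integer descriptors (assumed representation)
    Returns a list of (first_index, second_index, distance) tuples.
    """
    if not second_desc:
        raise ValueError("second descriptor set is empty")
    matches = []
    for i, d1 in enumerate(first_desc):
        # Index and distance of the second descriptor closest to d1.
        j, dist = min(
            ((j, hamming(d1, d2)) for j, d2 in enumerate(second_desc)),
            key=lambda pair: pair[1],
        )
        matches.append((i, j, dist))
    return matches
```

The resulting pairs would then be passed to an outlier rejection step, since the nearest descriptor is not guaranteed to be a geometrically correct match.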