A lightweight target detection method and system for a laser tracking system

By introducing a reparameterized multi-scale convolution module and differential convolution, the problems of small target feature loss and complex environmental interference in laser tracking systems are solved, achieving efficient, real-time, and lightweight target detection.

CN122244584A · Pending · Publication Date: 2026-06-19 · UNIV OF SCI & TECH BEIJING

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
UNIV OF SCI & TECH BEIJING
Filing Date
2026-02-13
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In the existing technology, laser tracking systems suffer from loss of small-target features, interference from complex environments, and a conflict between real-time performance and limited hardware resources, making it difficult to meet industrial metrology requirements for small-target detection accuracy and real-time performance.

Method used

A reparameterized multi-scale convolution module with a multi-branch structure is introduced and combined with shallow high-resolution feature feedback and a learnable weighted concatenation mechanism; in addition, differential convolution is used to suppress background specular interference and improve detection robustness.

Benefits of technology

It improves the recall rate and positioning accuracy for distant and tiny SMR targets, reduces the false detection rate, and meets the real-time requirements of laser tracking systems within their hardware resource limits.

✦ Generated by Eureka AI based on patent content.

Figure CN122244584A_ABST

Abstract

This invention provides a lightweight target detection method and system for laser tracking systems, belonging to the fields of precision measurement and computer vision. First, a reparameterized multi-scale convolution module is introduced: during training, a multi-branch structure enriches the scale diversity of feature extraction, and during inference the branches are merged into a single-path convolution to reduce the computational burden. This reduces the number of model parameters while maintaining feature extraction capability, improving the model's operating efficiency on embedded devices. Second, shallow high-resolution feature feedback is introduced and combined with a learnable weighted concatenation mechanism, effectively alleviating the vanishing of small-target features in deep networks and improving the recall and localization accuracy for distant, small SMR targets. Finally, differential convolution based on gradient prior initialization is introduced; by exploiting the distinctive geometric edges and concentric gradient characteristics of SMR targets, it effectively suppresses asymmetric background specular interference in industrial environments, reducing the false detection rate and improving detection robustness.

Description

Technical Field

[0001] This invention relates to the fields of precision measurement and computer vision technology, and in particular to a lightweight target detection method and system for laser tracking systems.

Background Technology

[0002] Laser trackers, as high-precision instruments for measuring large-scale spatial geometry, are widely used in aerospace, automotive manufacturing, and heavy machinery installation. In automated measurement processes, the laser tracking system requires a vision guidance module to quickly and accurately lock onto and track the cooperative target, i.e., the spherically mounted retroreflector (SMR).

[0003] However, existing visual inspection technologies for SMR targets face several challenges in practical industrial applications, including: 1. Small-target feature loss: SMR targets usually appear extremely small in wide field-of-view cameras. After multiple downsampling operations, the spatial information of small targets in the deep feature maps of traditional deep convolutional neural networks is severely attenuated or even lost entirely, resulting in a high false negative rate for distant targets.

[0004] 2. Complex environmental interference: industrial environments contain numerous reflective objects such as machined metal parts and machine supports. The specular highlights produced by these objects are optically very similar to SMR targets. Existing detection algorithms often rely on simple brightness or intensity thresholds, which makes it difficult to distinguish real targets from background artifacts and easily leads to false detections.

[0005] 3. Conflict between real-time performance and hardware resources: the vision guidance module of a laser tracking system typically runs on resource-constrained embedded edge devices. Traditional high-precision two-stage detection algorithms are computationally intensive and cannot meet real-time requirements, while existing general-purpose lightweight algorithms, although fast, often fail to meet the stringent standards of industrial metrology in scenarios with small targets and strong interference.

[0006] Therefore, there is an urgent need for a lightweight target detection method that can maintain extremely low computational cost while effectively solving the problems of feature loss in small targets and specular interference.

Summary of the Invention

[0007] To address the problems in the existing technology, this invention provides a lightweight target detection method and system for laser tracking systems. First, a reparameterized multi-scale convolution module is introduced: during training, a multi-branch structure enriches the scale diversity of feature extraction, and during inference the branches are merged into a single-path convolution to reduce the computational burden. This reduces the number of model parameters while maintaining feature extraction capability, improving the model's operating efficiency on embedded devices. Second, shallow high-resolution feature feedback is introduced and combined with a learnable weighted concatenation mechanism, which effectively alleviates the vanishing of small-target features in deep networks and improves recall and localization accuracy for distant, small SMR targets. Finally, differential convolution based on gradient prior initialization is introduced; by exploiting the distinctive geometric edges and concentric gradient characteristics of SMR targets, it effectively suppresses asymmetric background specular interference in industrial environments, reducing the false detection rate and improving detection robustness. To achieve the above objectives, the technical solution is as follows. In one aspect, the present invention provides a lightweight target detection method for a laser tracking system, the method comprising: S1. Acquire the original image in the laser tracking field of view through an industrial vision sensor, and perform adaptive preprocessing to obtain the standard input tensor dataset of the image; S2. Based on the standard input tensor dataset of the image, obtain a multi-scale feature map dataset through the SMR object detection model; S3. Based on the multi-scale feature map dataset, perform cross-scale fusion through a weighted bidirectional feature pyramid network to obtain the output fused features; S4. Based on the output fused features, parse the target through the detail enhancement detection module to obtain the parsed output fused features; S5. Based on the parsed output fused features, train the initialized target detection model to obtain the trained target detection model; S6. Input the image acquired in real time by the industrial vision sensor into the trained target detection model, and obtain the target detection result through confidence analysis.

[0008] Optionally, in S1, acquiring the original image in the laser tracking field of view through an industrial vision sensor and performing adaptive preprocessing to obtain the standard input tensor dataset of the image includes: S11. Acquire the original image in the laser tracking field of view using the industrial vision sensor to obtain the original field-of-view image; S12. Based on the original field-of-view image, apply stride-aligned letterbox adaptive scaling to obtain the adapted original field-of-view image; S13. Based on the adapted original field-of-view image, obtain the standard input tensor dataset of the image through tensor normalization and dimension rearrangement.

[0009] Optionally, in S12, obtaining the adapted original field-of-view image from the original field-of-view image by stride-aligned letterbox adaptive scaling includes: S121. Based on the original field-of-view image, calculate the global scaling factor and scale the original field-of-view image to obtain the first-processed field-of-view image; S122. Based on the first-processed field-of-view image, calculate the stride-aligned padding amount; S123. Distribute the stride-aligned padding amount evenly to the left and right sides and the top and bottom of the first-processed field-of-view image, and fill the border regions to obtain the adapted original field-of-view image.

[0010] Optionally, in S2, based on the standard input tensor dataset of the image, a multi-scale feature map dataset is obtained through an SMR object detection model, including: S21. Based on the standard input tensor dataset of the image, the tensor dataset after the first processing is obtained through the convolution module and the C2f module. S22. Based on the tensor dataset after the first processing, process it through the C2f-RepNMSC module to obtain the tensor dataset after the second processing; S23. Based on the tensor dataset after the second processing, output feature maps at four different scales to obtain a multi-scale feature map dataset.

[0011] Optionally, in S3, based on the multi-scale feature map dataset, cross-scale fusion is performed through a weighted bidirectional feature pyramid network to obtain the output fused features, including: S31. Based on the multi-scale feature map dataset, the feature map dataset after the first processing is obtained by injecting high-resolution features at layer P2. S32. Based on the feature map dataset after the first processing, a bidirectional feature flow path is constructed to obtain the feature map dataset after the second processing. S33. Based on the feature map dataset after the second processing, the output fused features are obtained through a learnable weighted feature fusion method.

[0012] Optionally, in S4, based on the output fusion features, the target is parsed through the detail enhancement detection module to obtain the parsed output fusion features, including: S41. Based on the output fusion feature, the fusion feature after the first processing is obtained by parallel dual-path feature decoupling. S42. Based on the fusion features after the first processing, the fusion features after the second processing are obtained by parsing using the four-dimensional difference convolution group operator. S43. Based on the fusion features after the second processing, the fusion features after the third processing are extracted using the gradient operator; S44. Based on the fusion features after the third processing, the parsed output fusion features are obtained through dual-head decoupling output.

[0013] Optionally, the four-dimensional difference convolution group includes: a central difference convolution group, an angular difference convolution group, a horizontal difference convolution group, and a vertical difference convolution group.
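This section names the four difference convolution groups without defining them. As an illustration only, the sketch below shows the standard central difference convolution (CDC) from the literature, y(p0) = Σ w(pn)·(x(p0+pn) − x(p0)); it is assumed, not confirmed, that the patent's central difference group follows this form. It also shows the identity that CDC equals an ordinary convolution whose centre tap is reduced by the kernel sum, which is what lets such operators be folded into a plain convolution at inference. The function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def correlate_valid(x: Grid, k: Grid) -> Grid:
    """Plain 2-D cross-correlation ("convolution" in DL frameworks), no padding."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(k[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def central_difference_direct(x: Grid, k: Grid) -> Grid:
    """Assumed CDC form: y(p0) = sum_n w(p_n) * (x(p0 + p_n) - x(p0)), p0 = window centre."""
    kh, kw = len(k), len(k[0])
    ch, cw = kh // 2, kw // 2
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(k[i][j] * (x[r + i][c + j] - x[r + ch][c + cw])
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def cdc_as_vanilla_kernel(k: Grid) -> Grid:
    """Fold CDC into an ordinary kernel by subtracting sum(w) from the centre tap."""
    folded = [row[:] for row in k]
    folded[len(k) // 2][len(k[0]) // 2] -= sum(sum(row) for row in k)
    return folded


kernel = [[1, 2, 1], [0, 1, 0], [-1, 2, 3]]
flat = [[7] * 4 for _ in range(4)]
print(central_difference_direct(flat, kernel))  # uniform patch -> [[0, 0], [0, 0]]
```

Because CDC responds only to intensity differences relative to the window centre, a perfectly uniform patch produces zero output, regardless of its brightness.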

[0014] Optionally, in S5, the initialized target detection model is trained based on the parsed output fusion features to obtain a trained target detection model, including: S51. Based on the fused features of the parsed output, the initial target detection model is trained through a dual-head supervision mechanism to obtain the target detection model of the first stage. S52. Based on the target detection model of the first stage, the target detection model of the second stage is obtained through consistency matching metric. S53. Based on the target detection model of the second stage, the target detection model after training is obtained by iterative calculation using a composite loss function.
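The "consistency matching metric" is not spelled out here. Since the embodiment is built on YOLOv10, the sketch below assumes YOLOv10's consistent matching metric m = s · p^α · IoU^β, where s indicates whether the anchor point lies inside the ground-truth box and p is the classification score. Both heads use the same α and β, so the single one-to-one positive is always the top-ranked one-to-many positive. The values α = 0.5 and β = 6 are illustrative assumptions, and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Candidate = Tuple[bool, float, Box]      # (anchor inside GT?, class score, predicted box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matching_metric(inside: bool, score: float, pred: Box, gt: Box,
                    alpha: float = 0.5, beta: float = 6.0) -> float:
    """m = s * p**alpha * IoU**beta; alpha and beta are assumed illustrative defaults."""
    s = 1.0 if inside else 0.0
    return s * score ** alpha * iou(pred, gt) ** beta


def assign(candidates: Sequence[Candidate], gt: Box,
           topk: int) -> Tuple[List[int], Optional[int]]:
    """Rank with ONE metric: one-to-many keeps the top-k, one-to-one keeps the top-1."""
    metrics = [matching_metric(inside, p, box, gt) for inside, p, box in candidates]
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    positives = [i for i in order if metrics[i] > 0]
    one_to_many = positives[:topk]
    one_to_one = positives[0] if positives else None
    return one_to_many, one_to_one
```

Sharing one metric between the two heads is what keeps their supervision consistent: the sample the one-to-one head is trained on is never one the one-to-many head treats as a lower-ranked match.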

[0015] Optionally, in S6, inputting the image acquired in real time by the industrial vision sensor into the trained target detection model and obtaining the target detection result through confidence analysis includes: S61. Input the images acquired in real time by the industrial vision sensor into the trained target detection model, and obtain the one-to-one feature map dataset through structural reparameterization and branch pruning; S62. Based on the one-to-one feature map dataset, obtain the corresponding class confidence through feature mapping; S63. Compare the class confidence with the confidence threshold to obtain the target detection result.
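Step S63 amounts to thresholding the class confidences of the one-to-one head. The sketch below assumes each prediction carries a box and a list of per-class scores, takes the highest-scoring class, and keeps the prediction if that score reaches the threshold. No NMS step is included, on the assumption that the one-to-one head already yields one prediction per object. The names and the threshold argument are hypothetical; the patent does not state a threshold value.

```python
from typing import List, NamedTuple, Sequence, Tuple

Box = Tuple[float, float, float, float]


class Detection(NamedTuple):
    box: Box
    class_id: int
    confidence: float


def filter_by_confidence(boxes: Sequence[Box], class_scores: Sequence[Sequence[float]],
                         conf_threshold: float) -> List[Detection]:
    """Keep one-to-one predictions whose best class confidence reaches the threshold.

    Assumes no NMS is needed because the one-to-one head gives one prediction per object.
    """
    detections = []
    for box, scores in zip(boxes, class_scores):
        class_id = max(range(len(scores)), key=scores.__getitem__)  # most likely class
        confidence = scores[class_id]
        if confidence >= conf_threshold:
            detections.append(Detection(tuple(box), class_id, confidence))
    return sorted(detections, key=lambda d: d.confidence, reverse=True)
```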

[0016] In another aspect, the present invention provides a lightweight target detection system for a laser tracking system, which applies any of the above lightweight target detection methods for laser tracking systems, the system comprising: an image acquisition module, used to acquire raw images in the laser tracking field of view through industrial vision sensors and perform adaptive preprocessing to obtain the standard input tensor dataset of the images; a multi-scale feature acquisition module, used to obtain a multi-scale feature map dataset from the standard input tensor dataset of the image through the SMR object detection model; a fusion feature acquisition module, used to perform cross-scale fusion on the multi-scale feature map dataset through a weighted bidirectional feature pyramid network to obtain the output fused features; a feature parsing module, used to perform target parsing on the output fused features through the detail enhancement detection module to obtain the parsed output fused features; a model training module, used to train the initialized target detection model based on the parsed output fused features to obtain the trained target detection model; and a result output module, used to input the images acquired in real time by the industrial vision sensor into the trained target detection model and obtain the target detection results through confidence analysis.

[0017] Compared with the prior art, the technical solution of the present invention has at least the following beneficial effects: the above scheme achieves three main improvements. First, by introducing a reparameterized multi-scale convolution module, it enriches the scale diversity of feature extraction during training using a multi-branch structure, and merges it into a single-path convolution during inference to reduce computational burden. This reduces the number of model parameters while maintaining feature extraction capabilities, thus improving the model's efficiency on embedded devices. Second, by introducing shallow high-resolution feature feedback and combining it with a learnable weighted concatenation mechanism, it effectively alleviates the problem of small target feature vanishing in deep networks, improving the recall rate and localization accuracy for distant and tiny SMR targets. Third, by introducing differential convolution based on gradient prior initialization, it effectively suppresses asymmetric background specular interference in industrial environments by utilizing the unique geometric edges and concentric gradient characteristics of SMR targets, reducing false detection rates and improving detection robustness.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of an embodiment of the lightweight target detection method for a laser tracking system according to the present invention; Figure 2 is a flowchart of obtaining the standard input tensor dataset of the image in an embodiment of the method; Figure 3 is a flowchart of obtaining the adapted original field-of-view image in an embodiment of the method; Figure 4 is a flowchart of obtaining the multi-scale feature map dataset in an embodiment of the method; Figure 5 is a flowchart of obtaining the output fused features in an embodiment of the method; Figure 6 is a flowchart of obtaining the parsed output fused features in an embodiment of the method; Figure 7 is a flowchart of obtaining the trained target detection model in an embodiment of the method; Figure 8 is a flowchart of obtaining the target detection result in an embodiment of the method; Figure 9 is a system block diagram of an embodiment of the lightweight target detection system for a laser tracking system according to the present invention.

Detailed Implementation

[0020] The technical solution of the present invention will now be described with reference to the accompanying drawings.

[0021] In embodiments of the present invention, words such as "exemplarily," "for example," etc., are used to indicate that something is an example, illustration, or description. Any embodiment or design described as "exemplary" in the present invention should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of the word "exemplary" is intended to present the concept in a concrete manner. Furthermore, in embodiments of the present invention, "and/or" can mean both, or either one.

[0022] To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description will be given below in conjunction with the accompanying drawings and specific embodiments.

[0023] This embodiment is based on an improved YOLOv10 deep neural network architecture (RBD-YOLOv10n) that has been specifically optimized for the imaging characteristics of small spherically mounted retroreflector (SMR) targets.

[0024] Figure 1 shows the flowchart of an embodiment of the lightweight target detection method for a laser tracking system according to the present invention. The method is implemented by a lightweight target detection system for a laser tracking system and includes: S1. Acquire the original image in the laser tracking field of view through an industrial vision sensor, and perform adaptive preprocessing to obtain the standard input tensor dataset of the image. Specifically, Figure 2 shows the flowchart for obtaining the standard input tensor dataset of the image in this embodiment. Step S1 includes: S11. Acquire the original image in the laser tracking field of view using an industrial vision sensor to obtain the original field-of-view image. Furthermore, a global-shutter industrial vision sensor integrated with the laser tracking system serves as the image acquisition unit. Unlike a rolling-shutter sensor, it avoids the rolling-shutter distortion that fast-moving SMR targets would otherwise produce, ensuring image clarity. The acquired image resolution is [size missing] pixels, and the data format is set to 8-bit RGB.

[0025] S12. Based on the original field-of-view image, apply stride-aligned letterbox adaptive scaling to obtain the adapted original field-of-view image. Furthermore, Figure 3 shows the flowchart for obtaining the adapted original field-of-view image in this embodiment. Step S12 includes: S121. Based on the original field-of-view image, calculate the global scaling factor and scale the original field-of-view image to obtain the first-processed field-of-view image. Specifically, the global scaling factor $r$ is the smaller of the width scaling ratio and the height scaling ratio:

$$r = \min\left(\frac{W}{W_0},\ \frac{H}{H_0}\right) \tag{1}$$

where $W_0 \times H_0$ is the original image size and $W \times H$ is the network input size. The original image is scaled by $r$ to the effective size $W_e \times H_e$, with $W_e = r W_0$ and $H_e = r H_0$ rounded to whole pixels. Because a single factor is applied to both axes, the SMR target keeps its original aspect ratio during scaling, avoiding feature deformation caused by non-uniform stretching.

[0026] S122. Based on the first-processed field-of-view image, calculate the stride-aligned padding amount. Furthermore, the backbone network performs 5 downsampling operations, so its maximum stride is $2^5 = 32$. Because of this structural characteristic, a modulo-32 constraint is introduced when calculating the padding amount: the width padding $d_w$ and height padding $d_h$ are the differences between the network input size $W \times H$ and the effective size $W_e \times H_e$, taken modulo 32:

$$d_w = (W - W_e) \bmod 32 \tag{2}$$

$$d_h = (H - H_e) \bmod 32 \tag{3}$$

Since $W_e + d_w \equiv W \pmod{32}$ (and likewise for the height), the padded image size is an exact multiple of the downsampling stride whenever the network input size is itself a multiple of 32. This prevents the loss of edge information caused by non-integer feature map sizes in deep layers.

[0027] S123. Distribute the stride-aligned padding amounts evenly to the left and right sides and the top and bottom of the first-processed field-of-view image, and fill the border regions to obtain the adapted original field-of-view image.

[0028] Furthermore, the calculated width and height padding amounts $d_w$ and $d_h$ are split evenly between the left and right sides and between the top and bottom of the image, and the border regions are filled with the gray value 114. After this operation, the scaled field-of-view image, which contains the SMR target, is centered within the input tensor, giving a final input image of size [size missing].
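Steps S121 to S123 can be sketched as follows: one scale factor for both axes (Eq. (1)), modulo-32 padding (Eqs. (2) and (3)), an even split of the padding, and a gray-114 border. This is a minimal sketch under stated assumptions: the effective size is rounded with Python's `round`, an odd padding amount puts the extra pixel on the right or bottom, and the resampling itself is left to an image library and not shown. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def letterbox_geometry(orig_w: int, orig_h: int, in_w: int, in_h: int, stride: int = 32):
    """Return scale r, effective size and (left, right, top, bottom) padding."""
    r = min(in_w / orig_w, in_h / orig_h)                 # Eq. (1): one factor for both axes
    eff_w, eff_h = round(orig_w * r), round(orig_h * r)   # effective (scaled) size
    d_w = (in_w - eff_w) % stride                         # Eq. (2)
    d_h = (in_h - eff_h) % stride                         # Eq. (3)
    left, top = d_w // 2, d_h // 2                        # assumed: extra pixel goes right/bottom
    return r, (eff_w, eff_h), (left, d_w - left, top, d_h - top)


def pad_image(img: Sequence[Sequence[Pixel]], pads: Tuple[int, int, int, int],
              fill: Pixel = (114, 114, 114)) -> List[List[Pixel]]:
    """Pad an H x W grid of RGB pixels with a constant gray (114) border."""
    left, right, top, bottom = pads
    width = len(img[0]) + left + right
    body = [[fill] * left + list(row) + [fill] * right for row in img]
    return ([[fill] * width for _ in range(top)] + body
            + [[fill] * width for _ in range(bottom)])


r, eff, pads = letterbox_geometry(1920, 1080, 640, 640)  # hypothetical sizes
print(eff, pads)  # (640, 360) (0, 0, 12, 12)
```

With a hypothetical 1920×1080 frame and 640×640 input, the padded image is 640×384 rather than 640×640: the modulo only pads up to the next stride multiple, which keeps the input as small as possible.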

[0029] S13. Based on the adapted original field of view image, the standard input tensor dataset of the image is obtained through tensor normalization and dimension rearrangement.

[0030] Furthermore, numerical standardization and dimension transformation are performed on the input image. First, pixel intensities are linearly mapped from the 8-bit integer range [0, 255] to a normalized floating-point range, putting all inputs on a consistent numeric scale and accelerating network convergence. The image dimensions are then rearranged from "height-width-channel" (HWC) to the "channel-height-width" (CHW) layout used by deep learning frameworks, and a batch dimension is added to form the standard input tensor dataset of the image.
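A minimal sketch of step S13, assuming the normalized range is [0, 1] (division by 255); the patent text does not state the target interval. Nested Python lists stand in for an array library, and the function name is hypothetical.

```python
from typing import List, Sequence


def to_input_tensor(img_hwc: Sequence[Sequence[Sequence[int]]],
                    scale: float = 255.0) -> List[List[List[List[float]]]]:
    """8-bit HWC image -> float tensor of shape (1, C, H, W); scale=255 is an assumption."""
    h, w, c = len(img_hwc), len(img_hwc[0]), len(img_hwc[0][0])
    chw = [[[img_hwc[y][x][ch] / scale for x in range(w)]   # linear map 0..255 -> 0..1
            for y in range(h)]
           for ch in range(c)]                               # HWC -> CHW
    return [chw]                                             # add batch dimension
```

For example, a 1×2 RGB image `[[[0, 255, 51], [255, 0, 102]]]` becomes a tensor of shape (1, 3, 1, 2) whose first channel row is `[0.0, 1.0]`.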

[0031] S2. Based on the standard input tensor dataset of the image, obtain a multi-scale feature map dataset through the SMR object detection model. Specifically, the standard input tensor dataset of the image is fed into the SMR object detection model, whose overall architecture consists of three cascaded parts: a backbone network, a neck network, and a head network. The backbone extracts multi-level features from the image, the neck performs cross-scale feature fusion, and the head produces the final object predictions. Figure 4 shows the flowchart for obtaining the multi-scale feature map dataset in this embodiment. Step S2 includes: S21. Based on the standard input tensor dataset of the image, obtain the first-processed tensor dataset through the convolution modules and the C2f modules. Furthermore, the standard input tensor dataset is processed sequentially by a series of Conv modules and C2f modules. The Conv modules use a stride of 2 to downsample, reducing the spatial resolution of the feature maps while increasing the number of channels; the C2f modules perform feature extraction, and the backbone outputs feature maps at different scales from layers 4, 6, 8, and 10. To address the loss of small SMR target features in deep networks, this embodiment replaces the standard C2f module at key backbone layers (layers 6 and 8) with a purpose-designed C2f-RepNMSC module.
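The effect of repeated stride-2 Conv modules on spatial size can be checked with the usual convolution output-size formula. This sketch assumes 3×3 kernels with padding 1 (kernel and padding are not stated here), uses a hypothetical input size of 640, and follows the common convention that level P_i has stride 2^i. It also shows why the letterbox step pads to a multiple of 32: a non-aligned size such as 650 no longer maps back exactly after five downsamplings.

```python
from typing import List


def conv_out_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one axis (kernel/padding are assumptions)."""
    return (size + 2 * padding - kernel) // stride + 1


def pyramid_sizes(size: int, num_downsamples: int = 5) -> List[int]:
    """Spatial size after each stride-2 Conv; index i ~ level P_i (stride 2**i)."""
    sizes = [size]
    for _ in range(num_downsamples):
        sizes.append(conv_out_size(sizes[-1]))
    return sizes


print(pyramid_sizes(640))  # [640, 320, 160, 80, 40, 20] -> P2..P5 = 160, 80, 40, 20
print(pyramid_sizes(650))  # [650, 325, 163, 82, 41, 21]; 21 * 32 != 650
```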

[0032] S22. Based on the first-processed tensor dataset, process it through the C2f-RepNMSC module to obtain the second-processed tensor dataset. Furthermore, the C2f-RepNMSC module combines the cross-stage partial structure of CSPNet with the reparameterization idea of RepVGG. Its internal processing is as follows. First, the input feature map passes through a [kernel size missing] convolutional layer that transforms its channels. The resulting features are then split along the channel dimension into two parts: a basic gradient flow branch and a stacked computation branch. The basic gradient flow branch bypasses the intermediate computational units and participates directly in the later feature concatenation; this cross-stage connection preserves the original feature information and creates a direct gradient path from shallow to deep layers.

[0033] The stacked computation branch enters a stack of $n$ RepNMSC modules connected in series, and the features pass through the $n$ cascaded RepNMSC modules (each a BottleNeck structure) in turn. During training, each RepNMSC module extracts features from its input $X$ with three parallel branches: a main branch consisting of a [kernel size missing] convolution and a batch normalization (BN) layer, with output $Y_{\text{main}}$; an auxiliary branch consisting of a [kernel size missing] convolution and a BN layer, with output $Y_{\text{aux}}$; and an identity branch containing only a BN layer (present when the input and output channel counts are equal), with output $Y_{\text{id}}$. The training-phase output $Y$ is the sum of the three:

$$Y = Y_{\text{main}} + Y_{\text{aux}} + Y_{\text{id}} = \mathrm{BN}_{\text{main}}(W_{\text{main}} * X) + \mathrm{BN}_{\text{aux}}(W_{\text{aux}} * X) + \mathrm{BN}_{\text{id}}(X) \tag{4}$$

This multi-branch parallel design enriches the solution space of feature extraction, enabling the model to capture features ranging from local details to global contours.

[0034] During inference, to reduce the computational load, the RepNMSC module uses structural reparameterization to convert the multi-branch structure into an equivalent single-path convolution. First, the parameters of the BN layer in each branch (scale $\gamma$, shift $\beta$, running mean $\mu$, running variance $\sigma^2$, and stability constant $\epsilon$) are folded into the corresponding convolution weights and bias. For any branch $k$, the fused weight $W'_k$ and bias $b'_k$ are:

$$W'_k = \frac{\gamma_k}{\sqrt{\sigma_k^2 + \epsilon}}\, W_k, \qquad b'_k = \beta_k - \frac{\gamma_k \mu_k}{\sqrt{\sigma_k^2 + \epsilon}} \tag{5}$$

Next, kernels of different sizes are spatially aligned: the smaller auxiliary kernel is zero-padded around its edges to the size of the main kernel, and the identity branch is converted into a convolution kernel of the main-kernel size that implements an identity mapping (weight 1 at the centre tap linking each channel to itself, 0 elsewhere). Finally, the parameters of all branches are summed to obtain the inference weight $W_{\text{rep}}$ and bias $b_{\text{rep}}$:

$$W_{\text{rep}} = W'_{\text{main}} + W'_{\text{aux}} + W'_{\text{id}} \tag{6}$$

$$b_{\text{rep}} = b'_{\text{main}} + b'_{\text{aux}} + b'_{\text{id}} \tag{7}$$

Only this single-path convolution is executed during inference, which reduces inference latency while preserving feature extraction capability.
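The equivalence between the three-branch training form (Eq. (4)) and the merged inference kernel (Eqs. (5) to (7)) can be checked numerically. This is a single-channel sketch that assumes a 3×3 main kernel and a 1×1 auxiliary kernel, as in RepVGG; the actual RepNMSC kernel sizes are not reproduced in this text, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[float]]


@dataclass
class BatchNorm:
    """Inference-mode BN parameters for a single channel."""
    gamma: float
    beta: float
    mean: float
    var: float
    eps: float = 1e-5

    def apply(self, g: Grid) -> Grid:
        t = self.gamma / math.sqrt(self.var + self.eps)
        return [[t * (v - self.mean) + self.beta for v in row] for row in g]


def conv_same(x: Grid, k: Grid) -> Grid:
    """Single-channel, stride-1 correlation with zero padding (odd square kernel)."""
    n, p, h, w = len(k), len(k) // 2, len(x), len(x[0])

    def at(r: int, c: int) -> float:
        return x[r][c] if 0 <= r < h and 0 <= c < w else 0.0  # zero padding

    return [[sum(k[i][j] * at(r + i - p, c + j - p) for i in range(n) for j in range(n))
             for c in range(w)] for r in range(h)]


def fuse_bn(k: Grid, bn: BatchNorm) -> Tuple[Grid, float]:
    """Eq. (5): fold BN into the kernel and a bias."""
    t = bn.gamma / math.sqrt(bn.var + bn.eps)
    return [[t * v for v in row] for row in k], bn.beta - bn.mean * t


def pad_to(k: Grid, n: int) -> Grid:
    """Zero-pad a small centred kernel to n x n."""
    off = (n - len(k)) // 2
    out = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(k):
        for j, v in enumerate(row):
            out[i + off][j + off] = v
    return out


def reparameterize(k_main: Grid, bn_main: BatchNorm, k_aux: Grid, bn_aux: BatchNorm,
                   bn_id: BatchNorm) -> Tuple[Grid, float]:
    """Eqs. (6)-(7): merge main, auxiliary and identity branches into one kernel + bias."""
    n = len(k_main)
    identity = pad_to([[1.0]], n)  # identity mapping written as an n x n kernel
    parts = [fuse_bn(k_main, bn_main), fuse_bn(pad_to(k_aux, n), bn_aux), fuse_bn(identity, bn_id)]
    kernel = [[sum(kp[i][j] for kp, _ in parts) for j in range(n)] for i in range(n)]
    return kernel, sum(b for _, b in parts)


def train_forward(x: Grid, k_main: Grid, bn_main: BatchNorm, k_aux: Grid,
                  bn_aux: BatchNorm, bn_id: BatchNorm) -> Grid:
    """Eq. (4): sum of the three branch outputs."""
    a = bn_main.apply(conv_same(x, k_main))
    b = bn_aux.apply(conv_same(x, k_aux))
    c = bn_id.apply(x)
    return [[a[r][q] + b[r][q] + c[r][q] for q in range(len(x[0]))] for r in range(len(x))]
```

Running `conv_same(x, K)` plus the merged bias reproduces `train_forward` up to floating-point rounding, because every branch is linear in `x` once BN uses fixed running statistics.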

[0035] After the $n$ cascaded RepNMSC operations, the output of the stacked computation branch is concatenated (Concat) with the basic gradient flow branch, and a [kernel size missing] convolutional layer then fuses cross-channel information to produce the final output features of the layer. The depth of the stack is set by adjusting the number of stacked modules $n$. This structure expands the effective receptive field of the feature maps and enhances the model's ability to represent small light-spot targets.
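The data flow of C2f-RepNMSC described in S22 can be sketched structurally: channel transform, channel split, n cascaded blocks on one half, concatenation with the untouched half, then a fusion convolution. Feature maps are reduced to toy lists of per-channel values, and `cv1`, `blocks` and `cv2` are hypothetical callables standing in for the real layers. This follows the text literally (only the final stacked output is concatenated); the stock Ultralytics C2f instead concatenates every intermediate block output.

```python
from typing import Callable, List, Sequence

Channels = List[float]                    # toy feature map: one value per channel
Block = Callable[[Channels], Channels]


def c2f_repnmsc_forward(x: Channels, cv1: Block, blocks: Sequence[Block],
                        cv2: Block) -> Channels:
    """Structural sketch of C2f-RepNMSC: split, n cascaded blocks, concat, fuse."""
    y = cv1(x)                            # channel transform
    half = len(y) // 2
    base, stack = y[:half], y[half:]      # basic gradient flow / stacked computation branches
    for block in blocks:                  # n cascaded RepNMSC modules
        stack = block(stack)
    return cv2(base + stack)              # Concat([base, stack]) then fusion conv
```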

[0036] S23. Based on the tensor dataset after the second processing, output feature maps at four different scales to obtain a multi-scale feature map dataset.

[0037] Furthermore, the multi-scale feature map dataset includes feature maps at four different scales, namely P2, P3, P4 and P5 (sizes: [size missing]).

[0038] S3. Based on the multi-scale feature map dataset, perform cross-scale fusion through a weighted bidirectional feature pyramid network to obtain the output fused features. Specifically, Figure 5 shows the flowchart for obtaining the output fused features in this embodiment. Step S3 includes: S31. Based on the multi-scale feature map dataset, obtain the first-processed feature map dataset by injecting high-resolution features at the P2 layer. Furthermore, unlike conventional YOLO-series models that use only three feature levels (P3, P4, and P5), this embodiment explicitly constructs a P2 feature injection channel at the input of the neck network.

[0039] Structural connection: the P2 feature map output by layer 4 of the backbone network (resolution [size missing]) is connected directly to the neck network through a lateral connection, without additional downsampling.

[0040] Information flow: The P2 feature map is first concatenated and fused with the upsampled P3 feature map in the top-down path, and then participates in the aggregation calculation of the bottom-up path.

[0041] This structural design preserves the rich high-resolution geometric edges and texture details of the SMR target in the shallow network, establishes a direct information transmission channel from the shallow to the deep layers, and provides a precise spatial positioning reference for the subsequent detection head.

[0042] S32. Based on the first-processed feature map dataset, construct bidirectional feature flow paths to obtain the second-processed feature map dataset. Furthermore, the neck network builds bidirectional feature transfer paths to strengthen the interaction of multi-scale information. Top-down path: deep semantic features (such as P5) are passed progressively to shallower levels (P4, P3, P2) through nearest-neighbor upsampling, injecting high-level semantic information into shallow features and enhancing the model's ability to classify targets.

[0043] Bottom-up path: shallow detail features (such as P2) are downsampled by convolutions with a stride of 2 and passed progressively to deeper layers (P3, P4, P5), supplementing the deep features with high-resolution spatial information and improving the localization accuracy of small targets at the deep layers.
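The shape bookkeeping of the top-down and bottom-up paths in [0040] to [0043] can be illustrated with a small NumPy sketch. It is an illustration under stated assumptions, not the embodiment's network: nearest-neighbor upsampling is done by repeating rows and columns, the stride-2 convolution is one 3×3 kernel shared across channels, and each fusion node concatenates its two inputs and applies a hypothetical 1×1 projection matrix `proj` back to C channels. The names `upsample_nearest`, `downsample_stride2`, `fuse`, and `bidirectional_pass` are hypothetical.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) map: repeat each row and column."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def downsample_stride2(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """3x3 convolution with stride 2 and zero padding 1; the same kernel is used for every channel."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out_h, out_w = (h + 1) // 2, (w + 1) // 2
    out = np.zeros((c, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = padded[:, 2 * i:2 * i + 3, 2 * j:2 * j + 3]
            out[:, i, j] = (window * kernel).sum(axis=(1, 2))
    return out


def fuse(a: np.ndarray, b: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Concatenate along channels, then project 2C -> C with a 1x1 convolution (matrix `proj`)."""
    cat = np.concatenate([a, b], axis=0)
    return np.einsum("oc,chw->ohw", proj, cat)


def bidirectional_pass(feats, proj, kernel):
    """feats = (P2, P3, P4, P5), each (C, H, W) with H halving from level to level."""
    p2, p3, p4, p5 = feats
    # Top-down: push deep semantics into shallower levels (P5 -> P4 -> P3 -> P2).
    t5 = p5
    t4 = fuse(p4, upsample_nearest(t5), proj)
    t3 = fuse(p3, upsample_nearest(t4), proj)
    t2 = fuse(p2, upsample_nearest(t3), proj)  # P2 is fused with the upsampled P3, as in [0040]
    # Bottom-up: push high-resolution detail back into deeper levels (P2 -> P3 -> P4 -> P5).
    b3 = fuse(t3, downsample_stride2(t2, kernel), proj)
    b4 = fuse(t4, downsample_stride2(b3, kernel), proj)
    b5 = fuse(t5, downsample_stride2(b4, kernel), proj)
    return t2, b3, b4, b5
```

Every level keeps its spatial size through both passes, so a P2 map of 32×32 still reaches the detection head at 32×32, which is what preserves the small-target detail.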

[0044] S33. Based on the feature map dataset after the second processing, the output fused features are obtained through a learnable weighted feature fusion method.

[0045] Furthermore, for any fusion node, let the input feature set be $\{I_i\}$ (e.g., features from the backbone network and features from the upsampling/downsampling paths). The network assigns a learnable scalar weight $w_i$ to each input feature $I_i$.

[0046] To ensure numerical stability and avoid gradient explosion, the weights are normalized with the Fast Normalized Fusion strategy. The weighted concatenation output feature $O$ is calculated as:

$$O = \mathrm{Concat}_i\left(\frac{w_i}{\varepsilon + \sum_j w_j}\, I_i\right) \tag{8}$$

where an activation function applied to the weights guarantees $w_i \ge 0$, and $\varepsilon$ is a small constant that prevents division by zero.
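A minimal NumPy sketch of the weighted concatenation in Eq. (8) follows. It assumes ReLU as the non-negative activation (as in the original fast normalized fusion formulation) and ε = 1e-4; both are assumptions, and `weighted_concat` is a hypothetical name. The normalization needs only one sum and one division, avoiding the exponentials of a softmax.

```python
import numpy as np


def weighted_concat(inputs, raw_weights, eps=1e-4):
    """Fast-normalised weighted concatenation of same-sized (C_i, H, W) feature maps.

    raw_weights holds one learnable scalar per input; ReLU (an assumption here)
    keeps each weight non-negative, and eps guards against division by zero.
    """
    w = np.maximum(np.asarray(raw_weights, dtype=float), 0.0)
    w = w / (eps + w.sum())
    scaled = [wi * x for wi, x in zip(w, inputs)]
    return np.concatenate(scaled, axis=0), w
```

During training, `raw_weights` would be trainable parameters updated by backpropagation; this is the mechanism by which the weight of the P2 input can grow, as described in [0047].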

[0047] During the model backpropagation process, the network automatically updates its weights based on the loss feedback from the SMR object detection task. For the P2 layer feature input containing rich information about small targets, its corresponding weight value will automatically increase, so that the network adaptively assigns higher importance to small target features during the feature fusion stage and suppresses the interference of background noise features.

[0048] S4. Based on the output fused features, the target is parsed through the detail enhancement detection module to obtain the parsed output fused features. Specifically, Figure 6 is a flowchart of obtaining the parsed output fused features in this embodiment of the lightweight target detection method for a laser tracking system of the present invention. In step S4, parsing the target through the detail enhancement detection module based on the output fused features to obtain the parsed output fused features includes: S41. Based on the output fused features, the fused features after the first processing are obtained through parallel dual-path feature decoupling. The output fused features first enter the core component, DEStem, which uses a parallel dual-path decoupled design to preserve semantic information and enhance edge details at the same time. Vanilla path: a standard 3×3 convolution (labeled VC in Figure 4) extracts the target's general semantic features and the background context information.

[0049] Gradient enhancement path: an integrated differential convolution group (labeled DifferenceConv in Figure 4) extracts the target's geometric edges, light-spot gradients, and rotational symmetry features.

[0050] The outputs of the two paths are fused by element-wise addition and then passed through the activation function (Act) to produce the output.
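As a rough single-channel illustration of the DEStem data flow in [0048] to [0050], the sketch below adds a vanilla 3×3 convolution to a central difference convolution and applies an activation. Assumptions: the CDC operator alone stands in for the full DifferenceConv group, ReLU stands in for the unspecified Act, and `conv3x3_same`, `central_difference_conv`, and `de_stem` are hypothetical names. The CDC term uses the identity Σ w_i (x_i − x_c) = Σ w_i x_i − (Σ w_i) x_c, so it can be computed as an ordinary convolution minus a scaled copy of the input.

```python
import numpy as np


def conv3x3_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """3x3 cross-correlation (the deep-learning 'convolution') with zero padding; x is (H, W)."""
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            out += k[di, dj] * p[di:di + h, dj:dj + w]
    return out


def central_difference_conv(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """sum_i k_i * (x_i - x_c), rewritten as conv(x, k) - (sum of k) * x_c."""
    return conv3x3_same(x, k) - k.sum() * x


def de_stem(x, k_vanilla, k_diff, act=lambda t: np.maximum(t, 0.0)):
    """Vanilla path + gradient path, fused by element-wise addition, then activation."""
    vanilla = conv3x3_same(x, k_vanilla)           # general semantics / context
    gradient = central_difference_conv(x, k_diff)  # local intensity contrast
    return act(vanilla + gradient)
```

Because the difference term vanishes on flat regions, the gradient path only contributes where intensity changes, which is the behavior [0049] relies on.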

[0051] S42. Based on the fusion features after the first processing, the fusion features after the second processing are obtained by parsing using the four-dimensional difference convolution group operator. Furthermore, the four-dimensional difference convolution group includes: a central difference convolution group, an angular difference convolution group, a horizontal difference convolution group, and a vertical difference convolution group.

[0052] Furthermore, in the gradient enhancement path, this implementation example constructs a differential convolution group containing four different types of operators to analyze the SMR target from four physical dimensions.

[0053] For any center pixel $x_c$ on the input feature map, define its neighborhood as $\mathcal{R}$; let $x_i$ denote the $i$-th pixel in the neighborhood and $w_i$ the corresponding convolution weight.

[0054] (1) Central Difference Convolution (CDC)

[0055] Because an SMR target images as a high-brightness spot, CDC is used to capture the contrast of each pixel's intensity relative to the center. It is calculated as:

$$y = \sum_{x_i \in \mathcal{R}} w_i\,(x_i - x_c) \tag{9}$$

This operator suppresses background regions of uniform intensity and markedly enhances the features of a circular spot with a bright center.

[0056] (2) Angular Difference Convolution (ADC)

[0057] Because the SMR target is physically a spherical reflector whose image is rotation-invariant, ADC is used to capture the rate of change of pixel intensity with angle. It is calculated as:

$$y = \sum_{x_i \in \mathcal{R}} w_i\,(x_i - x_i') \tag{10}$$

where $x_i'$ denotes the pixel adjacent to $x_i$ in the clockwise direction within the neighborhood. For an ideally circular SMR spot, the differences between angularly adjacent pixels approach zero and the response is stable, whereas irregular metallic reflective noise produces a strong response.

[0058] (3) Horizontal Difference Convolution (HDC)

[0059] HDC captures the horizontal gradient change at the edge of an SMR target:

$$y = \sum_{x_i \in \mathcal{R}} w_i\,(x_i - x_i^{h}) \tag{11}$$

where $x_i^{h}$ denotes the pixel on the opposite side of $x_i$ in the horizontal direction (e.g., the top-left pixel corresponds to the top-right pixel).

[0060] (4) Vertical Difference Convolution (VDC)

[0061] VDC captures the vertical gradient change at the edge of an SMR target:

$$y = \sum_{x_i \in \mathcal{R}} w_i\,(x_i - x_i^{v}) \tag{12}$$

where $x_i^{v}$ denotes the pixel on the opposite side of $x_i$ in the vertical direction.
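The four difference operators in Eqs. (9) to (12) differ only in which partner pixel is subtracted, so they can be written as one generic pixel-difference convolution. The sketch below is a direct, loop-based NumPy reading of those equations on a single channel, evaluated at interior pixels only. The clockwise ordering starting at the top-left neighbor, the exclusion of the center pixel from ADC (it has no clockwise neighbor), and all function names are assumptions made for illustration.

```python
import numpy as np

# The eight neighbours of the centre, listed clockwise from the top-left (assumed order).
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def pixel_difference_conv(x, w, partner):
    """y = sum_i w_i * (x_i - x_partner(i)) over a 3x3 window, interior pixels only.

    `partner(dy, dx)` returns the offset of the pixel subtracted from the pixel at
    offset (dy, dx), or None to skip that position.
    """
    h, wd = x.shape
    y = np.zeros((h - 2, wd - 2))
    for r in range(1, h - 1):
        for c in range(1, wd - 1):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    q = partner(dy, dx)
                    if q is None:
                        continue
                    acc += w[dy + 1, dx + 1] * (x[r + dy, c + dx] - x[r + q[0], c + q[1]])
            y[r - 1, c - 1] = acc
    return y


def cdc(x, w):  # Eq. (9): subtract the centre pixel
    return pixel_difference_conv(x, w, lambda dy, dx: (0, 0))


def hdc(x, w):  # Eq. (11): subtract the horizontally mirrored pixel
    return pixel_difference_conv(x, w, lambda dy, dx: (dy, -dx))


def vdc(x, w):  # Eq. (12): subtract the vertically mirrored pixel
    return pixel_difference_conv(x, w, lambda dy, dx: (-dy, dx))


def adc(x, w):  # Eq. (10): subtract the clockwise neighbour; the centre has none and is skipped
    def clockwise_next(dy, dx):
        if (dy, dx) == (0, 0):
            return None
        return RING[(RING.index((dy, dx)) + 1) % len(RING)]
    return pixel_difference_conv(x, w, clockwise_next)
```

Note that with identical weights on all ring positions the ADC sum telescopes to zero for any input, so it is the learned, non-uniform weights that let ADC separate a concentric spot from an asymmetric glint.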

[0062] S43. Based on the fused features after the second processing, the fused features after the third processing are extracted using gradient operators. To give the network a clear physical meaning, accelerate convergence, and avoid the disordered feature extraction that random initialization can cause, this embodiment initializes the weights of the four types of differential convolutions with four preset gradient operator matrices, forming a strong inductive bias:

CDC initialization matrix: the Laplacian operator, which highlights the central singularity (Eq. 13).

HDC initialization matrix: the Sobel-X operator, which extracts edge gradients in the horizontal direction (Eq. 14).

VDC initialization matrix: the Sobel-Y operator, which extracts edge gradients in the vertical direction (Eq. 15).

ADC initialization matrix: a diagonal gradient operator, which supplements gradient information beyond the horizontal and vertical directions and enhances the capture of tangential features at the edges of circular SMRs (Eq. 16).

S44. Based on the fused features after the third processing, the parsed output fused features are obtained through dual-head decoupled output.
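The exact matrices of Eqs. (13) to (16) are not reproduced in this text, so the sketch below uses textbook forms as stand-ins: the 4-neighbor Laplacian, the standard Sobel-X and Sobel-Y kernels, and one common diagonal Sobel variant for ADC. All four matrices, the optional noise term, and the name `init_difference_group` are assumptions; the sketch only shows how fixed priors can seed every output channel of each difference convolution.

```python
import numpy as np

LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)   # assumed 4-neighbour form
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)   # horizontal gradient
SOBEL_Y = SOBEL_X.T.copy()                                               # vertical gradient
DIAGONAL = np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], dtype=float)   # hypothetical diagonal operator

PRIORS = {"cdc": LAPLACIAN, "hdc": SOBEL_X, "vdc": SOBEL_Y, "adc": DIAGONAL}


def init_difference_group(out_channels: int, noise_std: float = 0.0, seed: int = 0):
    """Return {operator: (out_channels, 3, 3) kernel bank} seeded from the gradient priors.

    A small noise_std (hypothetical) breaks the symmetry between channels so they
    can specialise during training; noise_std=0 reproduces the priors exactly.
    """
    rng = np.random.default_rng(seed)
    bank = {}
    for name, prior in PRIORS.items():
        tiled = np.tile(prior, (out_channels, 1, 1))
        bank[name] = tiled + noise_std * rng.standard_normal(tiled.shape)
    return bank
```

Every prior sums to zero, so each operator starts out blind to flat background and responsive only to intensity structure.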

[0063] Furthermore, the feature map enhanced by DEHead is fed into the decoupled detection branches: Classification Branch (Cls Branch): outputs N_class channels and is used to predict the confidence of the target class.

[0064] Regression Branch (Reg Branch): Outputs 4×N_reg channels and is used to predict bounding box offsets.

[0065] Meanwhile, the head network integrates a distribution focal loss (DFL) module, which further improves the localization accuracy for blurred SMR edges by transforming bounding box regression into a probability distribution estimation problem.

[0066] S5. Based on the parsed output fused features, the initialized target detection model is trained to obtain the trained target detection model. Specifically, Figure 7 is a flowchart of obtaining the trained target detection model in this embodiment of the lightweight target detection method for a laser tracking system of the present invention. In step S5, training the initialized target detection model based on the parsed output fused features to obtain the trained target detection model includes: S51. Based on the parsed output fused features, the initialized target detection model is trained through a dual-head supervision mechanism to obtain the first-stage target detection model. The network head is designed as convolutional layers with shared parameters followed by two independent branches: a one-to-many branch and a one-to-one branch.

[0067] During training, the one-to-many branch adopts a traditional assignment strategy, where one ground truth is matched with multiple positive sample anchors. This mechanism generates dense supervision signals, promoting rapid convergence of the feature extraction network and enabling it to learn rich texture features.

[0068] Meanwhile, the one-to-one branch employs a bipartite matching strategy, meaning that a real target is matched only with the highest-scoring positive sample anchor, while all others are considered negative samples. This branch aims to simulate NMS-free inference behavior, enabling the network to learn exclusive predictive capabilities for the same target.

[0069] S52. Based on the target detection model of the first stage, the target detection model of the second stage is obtained through consistency matching metric. Furthermore, to ensure alignment of the two branches in the feature space and avoid training oscillations caused by differences in supervision signals, a Consistent Matching Metric is introduced. This metric forces that the "optimal anchor point" selected in a one-to-one branch must be a subset of the "Top-K anchor points" selected in a one-to-many branch, thereby ensuring that the two heads have a consistent understanding of the target.
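To make the consistency requirement concrete, the sketch below scores every anchor for one ground-truth box with an alignment-style metric m = s^α · IoU^β, then selects the top-k anchors for the one-to-many head and the single best anchor for the one-to-one head. The metric form, the values α = 0.5, β = 6.0, k = 10, and all names are assumptions, not values from this embodiment. When both heads share the same α and β, the one-to-one choice is by construction the top-1 of the one-to-many ranking, which satisfies the subset condition in [0069].

```python
import numpy as np


def matching_metric(scores, ious, alpha=0.5, beta=6.0):
    """Alignment metric m = s**alpha * IoU**beta for each anchor (assumed form)."""
    return np.asarray(scores) ** alpha * np.asarray(ious) ** beta


def one_to_many_assign(scores, ious, k=10, alpha=0.5, beta=6.0):
    """Indices of the top-k anchors: dense positives for the one-to-many head."""
    m = matching_metric(scores, ious, alpha, beta)
    return set(np.argsort(-m)[:k].tolist())


def one_to_one_assign(scores, ious, alpha=0.5, beta=6.0):
    """Index of the single best anchor: the only positive for the one-to-one head."""
    return int(np.argmax(matching_metric(scores, ious, alpha, beta)))


def is_consistent(scores, ious, k=10, alpha=0.5, beta=6.0):
    """True if the one-to-one choice lies inside the one-to-many top-k set."""
    best = one_to_one_assign(scores, ious, alpha, beta)
    return best in one_to_many_assign(scores, ious, k, alpha, beta)
```

If the two heads used different α or β, the best one-to-one anchor could fall outside the one-to-many top-k set, which is the kind of supervision mismatch the consistent matching metric is meant to rule out.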

[0070] S53. Based on the target detection model of the second stage, the target detection model after training is obtained by iterative calculation using a composite loss function.

[0071] Furthermore, the composite loss function $\mathcal{L}$ is a weighted combination of the classification loss, the bounding box regression loss, and the distribution focal loss:

$$\mathcal{L} = \lambda_{cls}\,\mathcal{L}_{cls} + \lambda_{box}\,\mathcal{L}_{box} + \lambda_{dfl}\,\mathcal{L}_{dfl} \tag{17}$$

(1) Classification loss ($\mathcal{L}_{cls}$): the difference in class probabilities is computed with binary cross-entropy loss (BCE Loss). To address the extreme imbalance between positive and negative samples, a dynamic weighting factor is introduced, calculated as in Eq. (18).

(2) Bounding box regression loss ($\mathcal{L}_{box}$): to achieve higher localization accuracy, the CIoU (Complete IoU) loss is used, which jointly considers the overlap area, the center-point distance, and aspect-ratio consistency:

$$\mathcal{L}_{box} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v \tag{19}$$

where $IoU$ is the intersection over union, $\rho(b, b^{gt})$ is the Euclidean distance between the center $b$ of the predicted box and the center $b^{gt}$ of the ground-truth box, $c$ is the diagonal length of the smallest enclosing rectangle covering both boxes, $v$ measures aspect-ratio consistency, and $\alpha$ is a balance coefficient.
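For reference, a plain-Python sketch of the CIoU term described in [0071]. It uses the standard CIoU definitions v = (4/π²)(arctan(w^gt/h^gt) − arctan(w/h))² and α = v / ((1 − IoU) + v), which the text names but does not spell out here; boxes are assumed to be (x1, y1, x2, y2) tuples, and `eps` and the function name are illustrative choices.

```python
import math


def ciou_loss(pred, gt, eps=1e-7):
    """CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Overlap term: intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Distance term: squared centre distance over squared enclosing-box diagonal.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio term v and its balance coefficient alpha.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

For two disjoint unit squares whose centers are 2 apart, IoU is 0 and the distance term alone contributes 4/10, giving a loss of 1.4; plain IoU loss would stay at 1 and give no signal about how far apart the boxes are.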

[0072] (3) Distribution focal loss ($\mathcal{L}_{dfl}$)

[0073] To address the localization uncertainty caused by blurred target edges in SMR (spherically mounted retroreflector) scenarios, DFL (Distribution Focal Loss) is used to transform the regression problem into a discrete probability distribution estimation problem. For a given bounding box coordinate $y$, let the two nearest discretization anchors be $y_i$ and $y_{i+1}$ ($y_i \le y \le y_{i+1}$), with predicted probabilities $S_i$ and $S_{i+1}$. The DFL loss is then defined as:

$$\mathcal{L}_{dfl}(S_i, S_{i+1}) = -\big((y_{i+1} - y)\log S_i + (y - y_i)\log S_{i+1}\big) \tag{20}$$

This loss makes the probability distribution predicted by the network concentrate quickly near the true value, improving the ability to locate the boundaries of small targets precisely.
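A plain-Python sketch of the DFL loss in [0073] (Eq. 20), assuming the discretization anchors are the integers 0, 1, …, n−1 and that `probs` is already a softmax output. The expected-value decoder `dfl_decode` follows the usual practice of reading the coordinate back as the mean of the distribution; treating it as this embodiment's decoder is an assumption. The loss is minimized when S_i = y_{i+1} − y and S_{i+1} = y − y_i, i.e., when the distribution's mean lands exactly on y.

```python
import math


def dfl_loss(y: float, probs) -> float:
    """Distribution focal loss for one coordinate; bins are at integers 0..n-1."""
    n = len(probs)
    if not 0.0 <= y <= n - 1:
        raise ValueError("target must lie inside the bin range")
    i = min(int(math.floor(y)), n - 2)  # left anchor y_i; clamp so that y_{i+1} exists
    y_left, y_right = i, i + 1
    return -((y_right - y) * math.log(probs[i]) + (y - y_left) * math.log(probs[i + 1]))


def dfl_decode(probs) -> float:
    """Read the coordinate back as the expected value of the predicted distribution."""
    return sum(k * p for k, p in enumerate(probs))
```

For y = 2.3, the distribution [0, 0, 0.7, 0.3, 0] both decodes to 2.3 and gives a lower loss than spreading the mass evenly over bins 2 and 3.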

[0074] S6. Input the real-time laser-acquired image from the industrial vision sensor into the trained target detection model, and obtain the target detection result through confidence analysis.

[0075] Specifically, Figure 8 is a flowchart of obtaining the target detection result in this embodiment of the lightweight target detection method for a laser tracking system of the present invention. In step S6, inputting the image acquired in real time by the industrial vision sensor into the trained target detection model and obtaining the target detection result through confidence analysis includes: S61. Input the real-time images acquired by the industrial vision sensor into the trained target detection model, and obtain a one-to-one feature map dataset through structural reparameterization and branch pruning. The multi-branch training structure is equivalently merged into a single-path convolutional structure to reduce memory usage and computational cost. At the same time, the one-to-many branch of the head network is physically removed, and only the trained one-to-one branch is retained for forward propagation.
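The equivalence behind merging the multi-branch training structure into a single-path convolution can be checked numerically. The sketch below assumes, for illustration only, a single-channel block with parallel 5×5, 3×3, and 1×1 branches plus biases; the actual RepNMSC branch layout and any batch-normalization folding are not given here. Because convolution is linear, zero-padding each kernel to 5×5 and summing kernels and biases yields one convolution with identical output. Function names are hypothetical.

```python
import numpy as np


def conv_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Odd-sized square cross-correlation with zero padding that keeps the (H, W) shape."""
    size = k.shape[0]
    r = size // 2
    h, w = x.shape
    p = np.pad(x, r)
    out = np.zeros((h, w))
    for di in range(size):
        for dj in range(size):
            out += k[di, dj] * p[di:di + h, dj:dj + w]
    return out


def pad_kernel(k: np.ndarray, size: int) -> np.ndarray:
    """Zero-pad a centred odd kernel up to size x size."""
    r = (size - k.shape[0]) // 2
    return np.pad(k, r)


def merge_branches(kernels, biases):
    """Fold parallel branches into one kernel and one bias (valid because convolution is linear)."""
    size = max(k.shape[0] for k in kernels)
    merged = sum(pad_kernel(k, size) for k in kernels)
    return merged, float(sum(biases))


def multi_branch_forward(x, kernels, biases):
    """Training-time view: run every branch and add the results."""
    return sum(conv_same(x, k) + b for k, b in zip(kernels, biases))
```

An identity shortcut fits the same pattern as a 1×1 kernel equal to 1, so it can be folded in the same way.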

[0076] S62. Based on the one-to-one feature map dataset, the corresponding class confidence is obtained through feature mapping. The feature map, after extraction by the RepNMSC backbone, fusion by W-BiFPN, and enhancement by DEHead, is input into the retained one-to-one head, which maps each grid point on the feature map directly to bounding box coordinates and the corresponding class confidence.

[0077] S63. Based on the corresponding category confidence level, the target detection result is obtained by comparing the confidence level threshold.

[0078] Furthermore, the system applies the confidence threshold set in this embodiment directly to filter the prediction results: predicted bounding boxes whose confidence exceeds the threshold are taken directly as the target detection results. The entire process dispenses with traditional non-maximum suppression (NMS) and its sorting and looping computations, removing the computational bottleneck of the post-processing stage and enabling low-latency, high-precision real-time detection of SMR targets.
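Because the one-to-one head is trained to emit a single prediction per target, post-processing reduces to one vectorized comparison. The NumPy sketch below assumes `scores` holds per-class confidences of shape (N, num_classes), takes each prediction's best class, and keeps predictions above a threshold; the threshold of 0.5 used in the check is purely illustrative, not the value from this embodiment, and `nms_free_filter` is a hypothetical name.

```python
import numpy as np


def nms_free_filter(boxes: np.ndarray, scores: np.ndarray, threshold: float):
    """Keep one-to-one predictions whose best class confidence exceeds `threshold`.

    boxes: (N, 4) box coordinates; scores: (N, num_classes) class confidences.
    No sorting or IoU-based suppression is performed.
    """
    cls = scores.argmax(axis=1)
    conf = scores.max(axis=1)
    keep = conf > threshold
    return boxes[keep], conf[keep], cls[keep]
```

The cost is a single pass over N predictions, independent of how many boxes overlap, which is why removing NMS removes the post-processing bottleneck.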

[0079] Figure 9 is a system block diagram of an embodiment of the lightweight target detection system for a laser tracking system according to the present invention. The present invention provides a lightweight target detection system for a laser tracking system, which implements the lightweight target detection method for a laser tracking system described above. The system includes an image acquisition module, a multi-scale feature acquisition module, a fusion feature acquisition module, a feature parsing module, a model training module, and a result output module. Specifically, the image acquisition module acquires raw images in the laser tracking field of view through industrial vision sensors and performs adaptive preprocessing to obtain the standard input tensor dataset of the images; the multi-scale feature acquisition module obtains a multi-scale feature map dataset from the standard input tensor dataset of the image through the SMR object detection model; the fusion feature acquisition module performs cross-scale fusion on the multi-scale feature map dataset through a weighted bidirectional feature pyramid network to obtain the output fused features; the feature parsing module parses the target through the detail enhancement detection module based on the output fused features to obtain the parsed output fused features; the model training module trains the initialized target detection model based on the parsed output fused features to obtain the trained target detection model; and the result output module inputs the images acquired in real time by the industrial vision sensor into the trained target detection model and obtains the target detection results through confidence analysis.

[0080] This invention provides a lightweight target detection method and system for laser tracking systems. Firstly, by introducing a reparameterized multi-scale convolution module, the multi-branch structure enriches the scale diversity of feature extraction during training, while merging into a single-path convolution during inference to reduce computational burden. This reduces the number of model parameters while maintaining feature extraction capabilities, improving the model's operating efficiency on embedded devices. Secondly, by introducing shallow high-resolution feature feedback and combining it with a learnable weighted concatenation mechanism, the problem of small target feature vanishing in deep networks is effectively alleviated, improving recall and localization accuracy for distant, small SMR targets. Finally, by introducing differential convolution based on gradient prior initialization, the unique geometric edges and concentric gradient characteristics of SMR targets are utilized to effectively suppress asymmetric background specular interference in industrial environments, reducing false detection rate and improving detection robustness.

[0081] It is understood that the present invention has been described through the above embodiments and should not be construed as limiting the implementation and scope of the present invention. Those skilled in the art will recognize that various changes or equivalent substitutions can be made to these features and embodiments without departing from the spirit and scope of the present invention. Furthermore, under the teachings of the present invention, these features and embodiments can be modified to adapt to specific situations and materials without departing from the spirit and scope of the present invention. Therefore, the present invention is not limited to the specific embodiments disclosed herein, and all embodiments falling within the scope of the claims of this application are within the protection scope of the present invention.

Claims

1. A lightweight target detection method for a laser tracking system, characterized in that the method includes: S1. Acquire the original image in the laser tracking field of view through an industrial vision sensor, and perform adaptive preprocessing to obtain the standard input tensor dataset of the image; S2. Based on the standard input tensor dataset of the image, obtain a multi-scale feature map dataset through the SMR object detection model; S3. Based on the multi-scale feature map dataset, perform cross-scale fusion through a weighted bidirectional feature pyramid network to obtain the output fused features; S4. Based on the output fused features, parse the target through the detail enhancement detection module to obtain the parsed output fused features; S5. Based on the parsed output fused features, train the initialized target detection model to obtain the trained target detection model; S6. Input the image acquired in real time in the laser tracking field of view by the industrial vision sensor into the trained target detection model, and obtain the target detection result through confidence analysis.

2. The lightweight target detection method for a laser tracking system according to claim 1, characterized in that, in step S1, acquiring the original image in the laser tracking field of view through an industrial vision sensor and performing adaptive preprocessing to obtain a standard input tensor dataset of the image includes: S11. Acquire the original image in the laser tracking field of view with an industrial vision sensor to obtain the original field-of-view image; S12. Based on the original field-of-view image, obtain an adapted original field-of-view image by stride-aligned Letterbox adaptive scaling; S13. Based on the adapted original field-of-view image, obtain the standard input tensor dataset of the image through tensor normalization and dimension rearrangement.

3. The lightweight target detection method for a laser tracking system according to claim 2, characterized in that, in step S12, obtaining an adapted original field-of-view image from the original field-of-view image through stride-aligned Letterbox adaptive scaling includes: S121. Based on the original field-of-view image, calculate the global scaling factor and scale the original field-of-view image to obtain the field-of-view image after the first processing; S122. Based on the field-of-view image after the first processing, calculate the stride-aligned padding amount; S123. Distribute the padding amount evenly to the left and right sides and to the top and bottom of the field-of-view image after the first processing, and fill the border area to obtain the adapted original field-of-view image.

4. The lightweight target detection method for a laser tracking system according to claim 1, characterized in that, in step S2, obtaining a multi-scale feature map dataset from the standard input tensor dataset of the image through an SMR object detection model includes: S21. Based on the standard input tensor dataset of the image, obtain the tensor dataset after the first processing through the convolution module and the C2f module; S22. Based on the tensor dataset after the first processing, process it through the C2f-RepNMSC module to obtain the tensor dataset after the second processing; S23. Based on the tensor dataset after the second processing, output four feature maps at different scales to obtain the multi-scale feature map dataset.

5. The lightweight target detection method for a laser tracking system according to claim 1, characterized in that, in step S3, performing cross-scale fusion on the multi-scale feature map dataset through a weighted bidirectional feature pyramid network to obtain the output fused features includes: S31. Based on the multi-scale feature map dataset, obtain the feature map dataset after the first processing through high-resolution feature injection at layer P2; S32. Based on the feature map dataset after the first processing, construct a bidirectional feature flow path to obtain the feature map dataset after the second processing; S33. Based on the feature map dataset after the second processing, obtain the output fused features through a learnable weighted feature fusion method.

6. The lightweight target detection method for a laser tracking system according to claim 1, characterized in that, in step S4, parsing the target through the detail enhancement detection module based on the output fused features to obtain the parsed output fused features includes: S41. Based on the output fused features, obtain the fused features after the first processing through parallel dual-path feature decoupling; S42. Based on the fused features after the first processing, obtain the fused features after the second processing by parsing with a four-dimensional difference convolution group operator; S43. Based on the fused features after the second processing, extract the fused features after the third processing using gradient operators; S44. Based on the fused features after the third processing, obtain the parsed output fused features through dual-head decoupled output.

7. The lightweight target detection method for a laser tracking system according to claim 6, characterized in that the four-dimensional difference convolution group includes: a central difference convolution group, an angular difference convolution group, a horizontal difference convolution group, and a vertical difference convolution group.

8. The lightweight target detection method for a laser tracking system according to claim 1, characterized in that, in step S5, training the initialized target detection model based on the parsed output fused features to obtain the trained target detection model includes: S51. Based on the parsed output fused features, train the initialized target detection model through a dual-head supervision mechanism to obtain the first-stage target detection model; S52. Based on the first-stage target detection model, obtain the second-stage target detection model through a consistent matching metric; S53. Based on the second-stage target detection model, obtain the trained target detection model by iterative calculation with a composite loss function.

9. The lightweight target detection method for a laser tracking system according to claim 1, characterized in that, in step S6, inputting the image acquired in real time by the industrial vision sensor into the trained target detection model and obtaining the target detection result through confidence analysis includes: S61. Input the images acquired in real time by the industrial vision sensor into the trained target detection model, and obtain a one-to-one feature map dataset through structural reparameterization and branch pruning; S62. Based on the one-to-one feature map dataset, obtain the corresponding class confidence scores through feature mapping; S63. Based on the corresponding class confidence scores, obtain the target detection result by comparison with the confidence threshold.

10. A lightweight target detection system for a laser tracking system, used to implement the lightweight target detection method for a laser tracking system according to any one of claims 1-9, characterized in that the system includes: an image acquisition module, used to acquire raw images in the laser tracking field of view through industrial vision sensors and perform adaptive preprocessing to obtain the standard input tensor dataset of the images; a multi-scale feature acquisition module, used to obtain a multi-scale feature map dataset from the standard input tensor dataset of the image through the SMR object detection model; a fusion feature acquisition module, used to perform cross-scale fusion on the multi-scale feature map dataset through a weighted bidirectional feature pyramid network to obtain the output fused features; a feature parsing module, used to parse the target through the detail enhancement detection module based on the output fused features to obtain the parsed output fused features; a model training module, used to train the initialized target detection model based on the parsed output fused features to obtain the trained target detection model; and a result output module, used to input the image acquired in real time in the laser tracking field of view by the industrial vision sensor into the trained target detection model and obtain the target detection result through confidence analysis.