PCB through-hole solder joint defect detection method based on improved RT-DETR

By improving the RT-DETR network and adopting a lightweight backbone network and feature fusion module, the problems of solder joint confusion and decreased positioning accuracy in PCB through-hole solder joint defect detection were solved, achieving efficient and accurate defect detection.

CN122199552A · Pending · Publication Date: 2026-06-12 · ZHEJIANG SCI-TECH UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG SCI-TECH UNIV
Filing Date
2026-05-15
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing DETR series detectors based on Transformer suffer from problems such as solder joint confusion, decreased positioning accuracy due to cross-scale changes in defect morphology, and high computational complexity in PCB through-hole solder joint defect detection, making it difficult to meet real-time requirements.

Method used

An improved RT-DETR network is adopted. It builds a lightweight backbone network from a structural reparameterization module, and adds a segment-merge linear attention encoding layer and a lightweight multi-scale fusion neck module, to strengthen the modeling of solder joint arrays, suppress confusion between adjacent targets, and reduce computational complexity.

Benefits of technology

While maintaining high inference speed, it significantly improves detection accuracy, enhances the ability to model slender defects, suppresses confusion between adjacent targets in dense solder joint arrays, and improves the discrimination and positioning accuracy of weak-contrast boundaries.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the fields of computer vision and PCB inspection technology, and discloses a method for detecting PCB through-hole solder joint defects based on an improved RT-DETR. First, a lightweight backbone network based on the ROMix-Rep module is constructed, and structural reparameterization is used to enhance the extraction of geometric features of slender defects with no additional inference overhead. Second, a segment-merge linear attention encoding layer (SMLA-EL) is introduced to explicitly model the row-and-column arrangement prior of the solder joints, effectively suppressing mutual interference between adjacent targets in dense regions. Finally, a lightweight multi-scale fusion neck network (DW-SlimNeck) is designed to align cross-scale features efficiently at extremely low computational complexity while preserving high-frequency edge details. For low-contrast PCB through-hole solder joint defects in dense arrays, this invention significantly improves detection accuracy, precision, and localization stability without significantly increasing computational load.

Description

Technical Field

[0001] This invention belongs to the field of computer vision and PCB intelligent inspection technology, specifically relating to a PCB through-hole solder joint defect detection method based on improved RT-DETR.

Background Technology

[0002] As a core component of electronic devices, the soldering quality of printed circuit boards (PCBs) directly determines the reliability and lifespan of the entire product. Through-hole mounting technology, due to its advantages such as stable connections and high current carrying capacity, is still widely used in the assembly of key components such as connectors and power devices. However, defects such as bridging, solder spikes, solder balls, and incomplete soldering are prone to occur during the soldering process, potentially leading to serious faults such as short circuits and open circuits. Therefore, achieving automated and high-precision detection of through-hole solder joint defects is crucial for ensuring the quality of electronic manufacturing.

[0003] With the development of computer vision technology, deep learning-based object detection methods are increasingly being applied in industrial quality inspection. Convolutional neural network models, such as YOLO and Faster R-CNN (Faster Region-based Convolutional Neural Network), can learn defect features end-to-end, significantly improving detection accuracy. However, these methods still have significant limitations when dealing with arrays of through-hole solder joints with highly regular arrangements. The DETR series detectors based on Transformer achieve global modeling through a self-attention mechanism, avoiding post-processing of non-maximum suppression and providing a new approach to detection tasks. However, directly applying them to through-hole solder joint defect detection still faces three challenges.

[0004] (1) Solder joints are usually arranged in dense, regular arrays and look highly similar, so the model easily confuses adjacent solder joints, producing false detections and duplicate boxes. The global attention mechanism of a general-purpose detector does not embed the structural prior of the row-and-column arrangement, making such dense mutual interference difficult to suppress;

[0005] (2) The morphology of defects varies significantly across scales (such as slender bridging and small-sized solder balls) and has weak contrast with the background. Existing feature fusion networks are prone to losing high-frequency details of small targets and slender defects during multiple upsampling and downsampling processes, resulting in inaccurate cross-scale feature space alignment, which in turn leads to a decrease in positioning accuracy and boundary regression jitter under high thresholds;

[0006] (3) Actual production lines have stringent requirements for testing speed. Many high-performance models have high computational complexity, making it difficult to meet real-time requirements while ensuring high accuracy.

[0007] Therefore, there is an urgent need to improve existing technologies to address the practical challenges of automated inspection of through-hole solder joints.

Summary of the Invention

[0008] The technical problem to be solved by the present invention is to provide a PCB through-hole solder joint defect detection method based on improved RT-DETR, which enhances the modeling ability for slender PCB through-hole solder joint defects, suppresses the confusion of adjacent targets and the problem of repeated prediction boxes in dense solder joint arrays, and lightens the model, so as to effectively detect PCB through-hole solder joint defects.

[0009] To address the aforementioned technical problems, this invention provides a method for detecting solder joint defects in PCB through-holes based on an improved RT-DETR, comprising the following steps:

[0010] The PCB image to be detected is acquired and input into the offline trained PCB through-hole solder joint defect detection network to obtain the defect category and bounding box of the through-hole solder joint;

[0011] The PCB through-hole solder joint defect detection network is an improved RT-DETR network. It includes a lightweight backbone network built from a structural reparameterization module, a segment-merge linear attention encoding layer that replaces the original intra-scale feature interaction module, and a lightweight multi-scale fusion neck module that replaces the original cross-scale feature fusion module.

[0012] As an improvement to the PCB through-hole solder joint defect detection method based on improved RT-DETR of the present invention:

[0013] The lightweight backbone network is an improvement on the ResNet network. It replaces the traditional residual blocks in each stage of the original ResNet network with a structural reparameterization module, which serves as the core feature extraction unit and outputs multi-scale features. At the transitions between processing stages, shallow-feature transitions use a space-to-depth downsampling operation, which rearranges the pixels of local regions of the input feature map into the channel dimension; deep-feature transitions use an anti-aliasing downsampling operation, which combines a low-pass filter with depthwise separable convolution to smooth and downsample the features.

[0014] As a further improvement to the PCB through-hole solder joint defect detection method based on improved RT-DETR of the present invention:

[0015] The structural reparameterization module is the ROMix-Rep module:

[0016] During offline training, the input layer contains four parallel paths for extracting standard local neighborhood features, row structure features along the horizontal direction, column structure features along the vertical direction, and preserving the original input information. Then, the output features of the four paths are added element by element. During online inference, the input layer is equivalently folded from the four parallel paths into a single depthwise separable convolution.

[0017] Then, the features processed by the input layer pass through the efficient channel attention module and two convolutional modules in sequence. The features extracted by the efficient channel attention module and the features output by the convolutional modules are then added element by element.

[0018] As a further improvement to the PCB through-hole solder joint defect detection method based on improved RT-DETR of the present invention:

[0019] The specific operation of the segment-merge linear attention encoding layer is as follows:

[0020] (1) The multi-scale features output by the backbone network are flattened along the spatial dimension into a sequence of length N, and learnable projection matrices map this sequence to the query matrix Q, the key matrix K and the value matrix V. The value matrix is then split evenly along the channel dimension into a same-sign value feature and an opposite-sign value feature;

[0021] (2) Two parallel kernelized linear attention branches operate on the positive and negative parts of the query matrix Q and the key matrix K, computing a same-sign feature and an opposite-sign feature, respectively;

[0022] (3) Based on the same-sign feature and the opposite-sign feature, a lightweight weight estimator generates two gating weight maps. Each feature is multiplied element-wise by its corresponding gating weight map, and the results are concatenated into a fused feature;

[0023] (4) The fused feature passes through the position-aware fusion module, which injects the relative position encoding, and the encoded feature S4 is output.

[0024] As a further improvement to the PCB through-hole solder joint defect detection method based on improved RT-DETR of the present invention:

[0025] The lightweight multi-scale fusion neck module includes a spatial matching convolution module and a cross-stage aggregation unit. Features … (including the encoded feature S4) are each processed by a spatial matching convolution module to output features …, … and …, respectively. Then:

[0026] (1) Feature … is upsampled and aggregated with feature … by a cross-stage aggregation unit, then passed through a spatial matching convolution module to obtain feature …;

[0027] (2) Feature … is upsampled and aggregated with feature … by a cross-stage aggregation unit to obtain feature …;

[0028] (3) Feature … is passed through a depthwise separable convolution and aggregated with feature … by a cross-stage aggregation unit to obtain feature …;

[0029] (4) Feature … is passed through a spatial matching convolution module and aggregated with feature … by a cross-stage aggregation unit to obtain feature …;

[0030] (5) Features …, … and … are flattened along the spatial dimension and concatenated along the sequence-length dimension to output the feature sequence ….
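
The flatten-and-concatenate operation of step (5) can be illustrated with a minimal NumPy sketch. The function name, the channel count, and the three map sizes (80×80, 40×40 and 20×20) are assumptions chosen only for illustration; the text does not state them.

```python
import numpy as np

def flatten_and_concat(feature_maps):
    """Flatten each (C, H, W) map to (H*W, C), then concatenate along sequence length."""
    sequences = [f.reshape(f.shape[0], -1).T for f in feature_maps]
    return np.concatenate(sequences, axis=0)

# Hypothetical feature maps at three scales, all with the same channel count.
channels = 8
maps = [
    np.zeros((channels, 80, 80)),
    np.zeros((channels, 40, 40)),
    np.zeros((channels, 20, 20)),
]
sequence = flatten_and_concat(maps)
print(sequence.shape)  # (80*80 + 40*40 + 20*20, channels)
```

Each row of the resulting sequence corresponds to one spatial location of one scale, which is the form a query-based decoder consumes.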

[0031] As a further improvement to the PCB through-hole solder joint defect detection method based on improved RT-DETR of the present invention:

[0032] The specific operation of the spatial matching convolution module is as follows:

[0033] The input features are passed sequentially through a reparameterized strip convolution and an anti-aliasing downsampling module. The resulting intermediate features are divided into two groups along the channel dimension. The first group, with 1/2 of the channels, is retained directly, while the second group, with the other 1/2 of the channels, is passed through a depthwise separable convolution. The two outputs are then concatenated along the channel dimension and channel-shuffled.

[0034] As a further improvement to the PCB through-hole solder joint defect detection method based on improved RT-DETR of the present invention:

[0035] The cross-stage aggregation unit includes two parallel processing paths:

[0036] The input features in the first branch are processed by a lightweight standard convolution and then used as a residual connection.

[0037] The input features in the second branch pass through a lightweight standard convolution and are then split into two parallel sub-branches: one sub-branch consists of two cascaded spatial matching convolution modules, and the other consists of a reparameterized strip convolution. The features output by the two sub-branches are fused element-wise, concatenated with the output features of the first branch, and then output after passing through a lightweight standard convolution.

[0038] As a further improvement to the PCB through-hole solder joint defect detection method based on improved RT-DETR of the present invention:

[0039] The offline training process of the PCB through-hole solder joint defect detection network includes:

[0040] PCB images covering different soldering processes and different solder joint spacings are collected and, after data augmentation, uniform scaling and defect category labeling, used as the offline training dataset. A set prediction loss function comprising a classification loss and a bounding box regression loss is used to optimize the network end-to-end. After training, the multi-branch structure in the backbone network is folded into a single standard convolution by structural reparameterization, and the model weights are saved.

[0041] The beneficial effects of this invention are mainly reflected in:

[0042] 1. On a self-built through-hole solder joint defect dataset, the RT-DETR-RSL model proposed in this invention achieves excellent performance of 63.0% mAP50-95 and 73.0 FPS. Compared with the baseline RT-DETR-R18 model, it improves accuracy (mAP50-95) by 5.8% while maintaining high inference speed. In cross-scenario validation on the publicly available PKU-Market PCB dataset, the mAP50-95 reaches 60.3%, which is also significantly better than mainstream detectors, demonstrating its strong generalization ability.

[0043] 2. This invention introduces the ROMix-Rep module, which uses multi-directional convolutional branches during the training phase to explicitly enhance the modeling of the geometric orientation of slender defects (such as bridging and solder spikes). During the inference phase, structural reparameterization folds these branches into a single-path convolution. This design effectively improves the discrimination of weak-contrast boundaries and similar targets without increasing the computational load of inference.

[0044] 3. This invention introduces the SMLA-EL module, which combines row-and-column striping and feature sign decomposition with a kernelized linear attention mechanism so that the model explicitly learns and uses the structural prior of the regular solder joint arrangement. It guides the model to focus on the correlation between solder joints in the same row and column and strengthens the contrast between defective and normal regions, thereby directly suppressing the most common problems in dense solder joint arrays, adjacent-target confusion and duplicate prediction boxes, and significantly improving precision.

[0045] 4. This invention introduces the DW-SlimNeck module, employing lightweight units centered on depthwise separable convolutions and short-link bidirectional fusion paths, replacing the computationally complex traditional feature pyramid. This design effectively solves the spatial misalignment problem of multi-scale features during the fusion process with extremely low theoretical computational complexity, stabilizes the spatial co-location consistency of feature transmission, and better preserves high-frequency edge and detail information crucial for localization, thereby significantly improving the model's localization accuracy and stability at high Intersection over Union (IoU) thresholds.

Attached Figure Description

[0046] The specific embodiments of the present invention will be further described in detail below with reference to the accompanying drawings.

[0047] Figure 1 is a schematic diagram of the PCB through-hole solder joint defect detection network (RT-DETR-RSL) of the present invention;

[0048] Figure 2 is a schematic diagram of the ROMix-Rep module, in which (a) is the structure during the offline training phase and (b) is the structure during the online inference phase;

[0049] Figure 3 is a schematic diagram of the structure of the segment-merge linear attention encoding layer (SMLA-EL);

[0050] Figure 4 is a schematic diagram of the spatial matching convolution module (RASM-Conv);

[0051] Figure 5 is a schematic diagram of the structure of the cross-stage aggregation unit (RASM-VoVCSP);

[0052] Figure 6 is a schematic diagram of the DW-SlimNeck structure;

[0053] Figure 7 is a statistical graph of the number of defects in the four categories of the THT-SJ4D dataset;

[0054] Figure 8 is an example image of an original PCB;

[0055] Figure 9 shows the detection results of YOLOv12-n on the PCB image shown in Figure 8;

[0056] Figure 10 shows the detection results of RT-DETR-R18 on the PCB image shown in Figure 8;

[0057] Figure 11 shows the detection results of the RT-DETR-RSL of the present invention on the PCB image shown in Figure 8.

Detailed Implementation

[0058] The present invention will be further described below with reference to specific embodiments, but the scope of protection of the present invention is not limited thereto:

[0059] Example 1: A PCB through-hole solder joint defect detection method based on improved RT-DETR. First, the PCB image is input into a lightweight, reparameterizable backbone network to extract multi-scale features with rich geometric details. Next, the features at key scales are fed into an array-aware coding layer, which suppresses misclassification of dense regions by explicitly modeling the row and column structure prior of the solder joint. Then, a lightweight multi-scale fusion neck network is used to efficiently align and fuse features from different levels to enhance the model's ability to represent small targets and slender defects. Finally, a query-based decoder directly outputs the defect category and accurate bounding box. The entire process is performed end-to-end without the need for non-maximum suppression post-processing.

[0060] Step 1: Improve the RT-DETR network as a PCB through-hole solder joint defect detection network.

[0061] The backbone of the RT-DETR (Real-Time Detection Transformer) network adopts a ResNet-series network, such as ResNet-18 or ResNet-50. The multi-scale features {S3, S4, S5} output by the last three stages of the backbone network serve as input to the efficient hybrid encoder, which transforms them into an image feature sequence through the intra-scale feature interaction (AIFI) module and the cross-scale feature fusion module (CCFM). IoU-aware query selection then selects a fixed number of image features as the initial object queries for the decoder. Finally, the decoder, equipped with auxiliary prediction heads, iteratively refines the object queries to generate bounding boxes and confidence scores.

[0062] The PCB through-hole solder joint defect detection network of this invention (referred to as RT-DETR-RSL) is based on the RT-DETR network, as shown in Figure 1. It replaces the original ResNet backbone network of RT-DETR with a lightweight backbone network (ROMix-Rep Backbone) built from the ROMix-Rep module. It replaces the original intra-scale feature interaction (AIFI) module with a segment-merge linear attention encoding layer (SMLA-EL) to strengthen the encoder's structured relationship modeling. It also replaces the traditional feature pyramid network, i.e., the original cross-scale feature fusion module (CCFM), with a lightweight multi-scale fusion neck module (DW-SlimNeck) for multi-scale feature fusion. The three improved modules are as follows:

[0063] Step 1.1: Lightweight backbone network built based on ROMix-Rep module (ROMix-Rep Backbone)

[0064] The lightweight backbone network (ROMix-Rep Backbone) of this invention retains the hierarchical structure of the original ResNet network with its multiple processing stages (Stage 1 to Stage 4). It replaces the traditional residual blocks in each stage with ROMix-Rep modules, which serve as the core feature extraction units and output multi-scale features. Meanwhile, at the transitions between processing stages, a space-to-depth (SPD) downsampling operation and an anti-aliasing downsampling operation replace the standard strided-convolution downsampling of the original ResNet.

[0065] (1) ROMix-Rep module

[0066] One of the core improvements of this invention is the use of the ROMix-Rep module as the feature extraction unit in the four processing stages (Stage 1 to Stage 4) of the backbone network to extract multi-scale features. The ROMix-Rep module includes two decoupled structural forms: the "training phase" and the "inference phase".

[0067] During the offline training phase, the structure of the ROMix-Rep module is as shown in Figure 2(a); the input layer contains four parallel paths:

[0068] Path 1: A 3×3 depthwise separable convolution (DWConv) and batch normalization (BN) layer are used to extract standard local neighborhood features.

[0069] Path 2: A 1×3 depthwise separable convolution (DWConv) and batch normalization (BN) layer enhance sensitivity to horizontal (row) structures and are specifically designed to capture bridging defects along the solder joint alignment direction.

[0070] Path 3: A 3×1 depthwise separable convolution (DWConv) and batch normalization (BN) layer enhance sensitivity to vertical (column) structures and are specifically designed to capture longitudinal solder spike defects.

[0071] Path 4: An identity branch and a batch normalization (BN) layer preserve the original input information. Note that the batch normalization (BN) layer in the four paths above is a common technique in deep learning networks and, for simplicity, is omitted from Figure 2(a).

[0072] The output features of the four paths are added element-wise, then channel recalibrated through an efficient channel attention (ECA) module, and then channel blending and dimension adjustment are performed sequentially through two 1×1 convolutions.

[0073] Finally, the features extracted by the Efficient Channel Attention (ECA) module and the features output by the 1×1 convolution are added element-wise to obtain the output features of the ROMix-Rep module.

[0074] During the online inference phase, as shown in Figure 2(b), a one-time structural reparameterization operation is performed. Through a mathematical transformation that fuses equivalent convolution kernels and adds the bias terms, the four parallel paths of the input layer from the offline training phase (including the identity branch) are folded into a single 3×3 depthwise separable convolution (DWConv). During inference, the input features pass through this single 3×3 depthwise separable convolution input layer and then directly enter the efficient channel attention (ECA) module, the two 1×1 convolutional layers, and the residual branch, which remain identical to the training phase. As a result, the ROMix-Rep module keeps the same representational capability at inference as in training, while its computational graph is identical to that of a standard depthwise separable convolutional block, adding no inference computational overhead.

[0075] During the offline training phase, the ROMix-Rep module deploys multi-directional (horizontal, vertical, and standard) depthwise convolution branches in parallel to enhance geometric sensitivity to slender defects such as bridging and solder spikes. During the online inference phase, it uses structural reparameterization to fold the multiple branches into a single standard convolution, thereby significantly improving feature extraction and the discrimination of weak-contrast boundaries and similar targets without increasing the computational load of inference.
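
The equivalence behind this folding can be checked numerically. The following is a minimal NumPy sketch, assuming that the batch normalization of each path has already been fused into a per-channel kernel and bias (that fusion step, the function names and the tensor sizes are illustrative assumptions, not the implementation of this invention). The 1×3 and 3×1 depthwise kernels are zero-padded to 3×3, the identity path is expressed as a 3×3 kernel whose only non-zero entry is the center tap, and all kernels and biases are summed.

```python
import numpy as np

def depthwise_conv(x, kernel, bias):
    """Depthwise 'same' cross-correlation with zero padding, stride 1.
    x: (C, H, W); kernel: (C, kh, kw) with odd kh, kw; bias: (C,)."""
    c, h, w = x.shape
    _, kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out + bias[:, None, None]

def fold_branches(k3x3, b3x3, k1x3, b1x3, k3x1, b3x1, id_scale, id_bias):
    """Fold four BN-fused depthwise paths into one 3x3 kernel and one bias."""
    folded = k3x3.copy()
    folded[:, 1:2, :] += k1x3      # 1x3 kernel sits in the middle row
    folded[:, :, 1:2] += k3x1      # 3x1 kernel sits in the middle column
    folded[:, 1, 1] += id_scale    # identity path is a scaled center tap
    return folded, b3x3 + b1x3 + b3x1 + id_bias

rng = np.random.default_rng(0)
C, H, W = 4, 6, 7
x = rng.normal(size=(C, H, W))
k3x3, k1x3, k3x1 = (rng.normal(size=(C, *s)) for s in [(3, 3), (1, 3), (3, 1)])
b3x3, b1x3, b3x1, id_scale, id_bias = (rng.normal(size=C) for _ in range(5))

# Training-time form: four parallel paths added element-wise.
multi_branch = (
    depthwise_conv(x, k3x3, b3x3)
    + depthwise_conv(x, k1x3, b1x3)
    + depthwise_conv(x, k3x1, b3x1)
    + id_scale[:, None, None] * x + id_bias[:, None, None]
)

# Inference-time form: one folded 3x3 depthwise convolution.
folded_kernel, folded_bias = fold_branches(
    k3x3, b3x3, k1x3, b1x3, k3x1, b3x1, id_scale, id_bias
)
single_branch = depthwise_conv(x, folded_kernel, folded_bias)
max_error = np.abs(multi_branch - single_branch).max()
print(max_error)
```

Because convolution is linear in its kernel, the two forms agree up to floating-point rounding, which is why the folding adds no inference cost.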

[0076] (2) Detail-preserving downsampling strategy

[0077] At key downsampling locations in the backbone network (e.g., when transitioning from high-resolution feature maps to low-resolution ones), spatial reorganization strategies replace conventional strided convolutions. These strategies comprise the space-to-depth (SPD) downsampling operation and the anti-aliasing downsampling operation:

[0078] During shallow feature transitions (Stage 1 to Stage 2 and Stage 2 to Stage 3), a space-to-depth (SPD) downsampling operation is employed. It rearranges the pixels of each adjacent 2×2 local region of the input feature map into the channel dimension, reducing the spatial resolution and increasing the number of channels without strided convolution, which effectively preserves local detail information.
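
A space-to-depth rearrangement of this kind can be sketched in a few lines of NumPy. The function name and the channel ordering of the output are assumptions made for illustration; the text does not specify the ordering, and any convolution that may follow the rearrangement is not shown.

```python
import numpy as np

def space_to_depth(x, block=2):
    """Move each block x block spatial patch into the channel dimension.
    x: (C, H, W) with H and W divisible by block.
    Returns an array of shape (C * block * block, H // block, W // block)."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(2, 4, 0, 1, 3)  # (row offset, col offset, C, H/block, W/block)
    return x.reshape(c * block * block, h // block, w // block)

x = np.arange(16, dtype=float).reshape(1, 4, 4)
y = space_to_depth(x)
print(y.shape)  # spatial size halves, channel count grows fourfold
```

Unlike a stride-2 convolution, this operation discards no pixel: every input value appears exactly once in the output.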

[0079] During the transition of deep features (Stage 3 to Stage 4), low-pass filtering (LPF) combined with depthwise separable convolution (DWConv) is used for anti-aliasing downsampling. The operation involves first smoothing the feature map using a low-pass filter, and then performing a depthwise separable convolution with a stride of 2 to suppress aliasing effects that may occur due to downsampling and improve the scale consistency of the features.
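
The effect of smoothing before striding can be illustrated with a minimal NumPy sketch. The 3×3 binomial blur kernel, the function names and the test pattern are assumptions for illustration; the text does not specify which low-pass filter is used, and the pointwise part of a depthwise separable convolution is omitted. The example uses a checkerboard, the highest-frequency pattern a grid can hold: plain stride-2 sampling turns it into a constant image (aliasing), whereas low-pass filtering first removes it.

```python
import numpy as np

def depthwise_conv(x, kernel, stride=1):
    """Depthwise 'same' cross-correlation with zero padding.
    x: (C, H, W); kernel: (C, k, k) with odd k."""
    c, h, w = x.shape
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c, h, w))
    for i in range(k):
        for j in range(k):
            out += kernel[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out[:, ::stride, ::stride]

def antialiased_downsample(x, dw_kernel):
    """Low-pass filter each channel, then apply a stride-2 depthwise convolution."""
    c = x.shape[0]
    blur_1d = np.array([1.0, 2.0, 1.0]) / 4.0          # assumed binomial filter
    blur = np.tile(np.outer(blur_1d, blur_1d), (c, 1, 1))
    smoothed = depthwise_conv(x, blur)
    return depthwise_conv(smoothed, dw_kernel, stride=2)

h = w = 8
checker = ((-1.0) ** np.add.outer(np.arange(h), np.arange(w)))[None]  # (1, 8, 8)
identity_kernel = np.zeros((1, 3, 3))
identity_kernel[:, 1, 1] = 1.0

naive = depthwise_conv(checker, identity_kernel, stride=2)   # stride only
filtered = antialiased_downsample(checker, identity_kernel)  # blur, then stride
print(naive[0])
print(filtered[0])
```

Away from the zero-padded border, the filtered output is zero, whereas the stride-only output is a spurious constant.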

[0080] Step 1.2: Segment-Merge Linear Attention Encoding Layers (SMLA-EL)

[0081] This invention integrates a segment-merge linear attention encoding layer (SMLA-EL) into the encoder of RT-DETR, inserting it into the processing flow of the key-scale features between the backbone and the neck. The multi-scale features output by Stage 2, Stage 3 and Stage 4 of the backbone network are taken as input, and the encoded feature S4 is output after the segment-merge linear attention encoding layer (SMLA-EL). The structure of the SMLA-EL is shown in Figure 3, and the details are as follows:

[0082] (1) Linear projection and value feature segmentation

[0083] First, the input feature map is flattened along the spatial dimensions into a token sequence of length N. The sequence is then projected by three learnable matrices to obtain the query matrix Q, the key matrix K and the value matrix V, each of dimension d.

[0084] Next, the value matrix V is split along the channel dimension into two parts: the value feature used to compute the same-sign part and the value feature used to compute the opposite-sign part.

[0085] (2) Sign decomposition of query and key features

[0086] To ensure the non-negativity of linear attention and to separate features of different polarities, the positive and negative components of the query matrix Q and the key matrix K are extracted. Taking the query matrix Q as an example:

[0087] (1)

[0088] where the two terms denote the positive and negative parts of Q, respectively.

[0089] The positive and negative parts of K are obtained by a similar operation.
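
Taking the standard element-wise definition of the positive and negative parts of a matrix, and writing them as Q⁺ and Q⁻ (notation assumed here for illustration), the decomposition of Q can be expressed as:

```latex
Q^{+} = \max(Q, 0), \qquad Q^{-} = \max(-Q, 0), \qquad Q = Q^{+} - Q^{-}
```

Both parts are non-negative by construction, which is the property the kernelized linear attention relies on. The same definition applied to K gives its two parts.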

[0090] (3) Calculation of dual-branch kernelized linear attention

[0091] The model employs two parallel kernelized linear attention branches to compute the same-sign and opposite-sign feature dependencies. By changing the order of the matrix multiplications, the computational complexity is reduced from quadratic to linear in the sequence length.

[0092] The first branch, the same-sign feature extraction branch, captures feature associations with the same activation polarity. First, the positive and negative parts of the key features are concatenated along the channel dimension in a fixed order, activated by the feature mapping function Φ, and multiplied with the same-sign value features in a first matrix multiplication (Matmul). Then the positive and negative parts of the query features are concatenated in the same order, activated by Φ, and multiplied with the result of the first matrix multiplication in a second matrix multiplication, which outputs the intermediate feature of the first path (the same-sign feature). The vector form of this process is as follows:

[0093] (2)

[0094] Here t denotes the t-th query position being computed, and i and j denote the i-th and j-th key-value positions participating in the weighted summation of the numerator and the normalizing summation of the denominator, respectively. In this embodiment, the feature mapping function Φ is a ReLU-based learnable power mapping: ReLU decomposition is applied to the positive and negative parts of the input respectively, followed by a power transformation with a learnable exponent per channel.

[0095] The second branch, the opposite-sign feature extraction branch, captures feature contrasts with opposite activation polarities (e.g., the high-frequency contrast between defective regions and the normal background). This branch reverses the concatenation order of the positive and negative parts of the key features. After activation by the feature mapping function Φ, the result is multiplied with the opposite-sign value features in a first matrix multiplication. Then the query features, concatenated in the normal order and activated by Φ, are multiplied with the result of this path's first matrix multiplication in a second matrix multiplication, which outputs the intermediate feature of the second path (the opposite-sign feature). The corresponding vector form is as follows:

[0096] (3)
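
For orientation, a kernelized linear attention with a numerator sum over i and a normalizing sum over j, as described above, generally takes the following form. All notation here is assumed for illustration: q_t and k_i are the t-th query row and i-th key row, superscripts + and − mark positive and negative parts, v_i^s and v_i^o are the same-sign and opposite-sign value features, square brackets denote channel concatenation, and the positive-then-negative concatenation order is an assumption.

```latex
\begin{aligned}
o_t^{\mathrm{same}} &=
  \frac{\Phi([q_t^{+}; q_t^{-}]) \sum_{i=1}^{N} \Phi([k_i^{+}; k_i^{-}])^{\top} v_i^{\mathrm{s}}}
       {\Phi([q_t^{+}; q_t^{-}]) \sum_{j=1}^{N} \Phi([k_j^{+}; k_j^{-}])^{\top}}, \\[4pt]
o_t^{\mathrm{opp}} &=
  \frac{\Phi([q_t^{+}; q_t^{-}]) \sum_{i=1}^{N} \Phi([k_i^{-}; k_i^{+}])^{\top} v_i^{\mathrm{o}}}
       {\Phi([q_t^{+}; q_t^{-}]) \sum_{j=1}^{N} \Phi([k_j^{-}; k_j^{+}])^{\top}}
\end{aligned}
```

In the first expression each query part meets the key part of the same sign; reversing the key concatenation in the second pairs each query part with the key part of the opposite sign.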

[0097] This design enables the model to explicitly model "intra-row and intra-column dependencies" and "the comparison between defects and adjacent normal solder joints", effectively suppressing false alarms and duplicate predictions in high-density areas.
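
The two branches and the linear-complexity ordering of the matrix multiplications can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation of this invention: the feature map `phi` uses a fixed power instead of a learnable per-channel exponent, the concatenation order is positive-then-negative, and all function names are hypothetical. A quadratic-cost reference is included only to confirm that reordering the multiplications does not change the result.

```python
import numpy as np

def phi(x, power=2.0):
    """Assumed non-negative feature map: ReLU followed by a fixed power."""
    return np.maximum(x, 0.0) ** power

def split_sign(x):
    """Return the element-wise positive and negative parts of x."""
    return np.maximum(x, 0.0), np.maximum(-x, 0.0)

def linear_attention(q_feat, k_feat, v, eps=1e-6):
    """Compute K^T V first, then multiply by Q: cost is linear in sequence length."""
    kv = k_feat.T @ v                  # first matrix multiplication, (D, Dv)
    k_sum = k_feat.sum(axis=0)         # normalizer terms, (D,)
    return (q_feat @ kv) / (q_feat @ k_sum + eps)[:, None]

def quadratic_reference(q_feat, k_feat, v, eps=1e-6):
    """Same result via the explicit N x N score matrix."""
    scores = q_feat @ k_feat.T
    return (scores @ v) / (scores.sum(axis=1) + eps)[:, None]

def dual_branch_attention(q, k, v):
    q_pos, q_neg = split_sign(q)
    k_pos, k_neg = split_sign(k)
    v_same, v_opp = np.split(v, 2, axis=1)            # split values by channel
    q_cat = phi(np.concatenate([q_pos, q_neg], axis=1))
    k_same = phi(np.concatenate([k_pos, k_neg], axis=1))  # same order as query
    k_opp = phi(np.concatenate([k_neg, k_pos], axis=1))   # reversed order
    same = linear_attention(q_cat, k_same, v_same)
    opp = linear_attention(q_cat, k_opp, v_opp)
    return same, opp

rng = np.random.default_rng(0)
N, d = 10, 4
q = rng.normal(size=(N, d))
k = rng.normal(size=(N, d))
v = rng.normal(size=(N, 2 * d))
same, opp = dual_branch_attention(q, k, v)
print(same.shape, opp.shape)
```

The explicit score matrix in `quadratic_reference` has N × N entries, while `linear_attention` only ever forms a D × Dv matrix, which is what makes the cost linear in N.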

[0098] (4) Gating fusion and location information injection

[0099] The same-sign feature and the opposite-sign feature output by the two branches are fed into a lightweight weight estimator (WE), which adaptively generates two gating weight maps from the input features. Each gating weight map is multiplied element-wise with its corresponding intermediate feature, and the results are concatenated into a fused feature, achieving adaptive weighting and fusion:

[0100] (4)

[0101] where ⊙ denotes element-wise multiplication.

[0102] The fused feature then passes through a position-aware fusion (PAF) module, which constructs relative position encoding terms from the relative offsets between the spatial locations of the fused feature and injects them into the fused feature. The final encoded feature S4 is then output.
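
The gating step can be sketched as follows. The internal structure of the weight estimator is not given in the text, so this sketch assumes a hypothetical one: a linear projection of the concatenated branch outputs followed by a sigmoid. The function names and parameter shapes are likewise assumptions, and the position-aware fusion step is not included.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(o_same, o_opp, w_same, w_opp):
    """Gate each branch output element-wise, then concatenate along channels.
    o_same, o_opp: (N, C) branch outputs.
    w_same, w_opp: (2C, C) parameters of the assumed weight estimator."""
    joint = np.concatenate([o_same, o_opp], axis=1)   # (N, 2C)
    g_same = sigmoid(joint @ w_same)                  # gating map for same-sign path
    g_opp = sigmoid(joint @ w_opp)                    # gating map for opposite-sign path
    return np.concatenate([g_same * o_same, g_opp * o_opp], axis=1)

rng = np.random.default_rng(0)
N, C = 6, 4
o_same = rng.normal(size=(N, C))
o_opp = rng.normal(size=(N, C))
w_same = rng.normal(size=(2 * C, C)) * 0.1
w_opp = rng.normal(size=(2 * C, C)) * 0.1
fused = gated_fusion(o_same, o_opp, w_same, w_opp)
print(fused.shape)  # (N, 2C)
```

Because each gate lies between 0 and 1, the fusion can attenuate either branch per location and per channel but never amplifies it.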

[0103] Step 1.3, Lightweight Multiscale Fusion Neck Module (DW-SlimNeck)

[0104] This invention uses the DW-SlimNeck module to replace the feature pyramid in the original RT-DETR to achieve efficient feature fusion. The DW-SlimNeck module is mainly composed of two core sub-modules: the spatial matching convolution module (RASM-Conv) and the cross-stage aggregation unit (RASM-VoVCSP).

[0105] Step 1.3.1: Construct the Spatial Matching Convolutional Module (RASM-Conv)

[0106] As shown in Figure 4, the spatial matching convolution module (RASM-Conv) first processes the input features with C1 channels sequentially through a reparameterized stripe convolution (Rep-Stripe) and an anti-aliasing downsampling module. The Rep-Stripe convolution combines reparameterizable 1×3 and 3×1 stripe convolutions to enhance the modeling of horizontal and vertical structural features while maintaining low computational cost. The anti-aliasing downsampling module reduces spatial resolution while suppressing spectral aliasing, thus improving the stability of the downsampled features. The resulting intermediate features are divided into two groups in the channel dimension, each with C1 / 2 channels, and the two groups are then processed in parallel by two branches:

[0107] The first branch directly retains the intermediate features of one group of C1 / 2 channels, passing the original information through unchanged. The second branch applies depthwise separable convolution (DSConv) to the intermediate features of the other C1 / 2 channels to further model the spatial dimension. By decomposing the operation into a channel-wise (depthwise) convolution and a pointwise convolution, local spatial correlations are extracted at lower parameter and computational cost.
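
The cost saving from this decomposition can be checked with a short parameter count. The channel count and kernel size below are illustrative values, not figures from the source, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    # One k x k filter per (input channel, output channel) pair.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

c, k = 128, 3  # illustrative values only
std = standard_conv_params(c, c, k)
dsc = depthwise_separable_params(c, c, k)
print(std, dsc, round(std / dsc, 2))
```

For these illustrative sizes the separable form needs roughly one eighth of the weights of a standard 3×3 convolution.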

[0108] The outputs of the two branches are concatenated along the channel dimension, and then cross-branch information mixing and feature recombination are achieved through channel shuffle. After concatenation and shuffle, the output features with C1 channels are obtained.
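
The split, concatenate, and shuffle pattern described above can be sketched in NumPy. The second branch is replaced by a caller-supplied `transform` (an identity stand-in here, not a real depthwise separable convolution), and the function names are hypothetical.

```python
import numpy as np

def channel_shuffle(x, groups=2):
    # x has shape (C, H, W); interleave the channels of the `groups` blocks.
    c, h, w = x.shape
    x = x.reshape(groups, c // groups, h, w)
    x = x.transpose(1, 0, 2, 3)
    return x.reshape(c, h, w)

def split_transform_shuffle(x, transform):
    # First half of the channels passes through untouched; the second half
    # goes through `transform`, a stand-in for the DSConv branch.
    keep, work = np.split(x, 2, axis=0)
    out = np.concatenate([keep, transform(work)], axis=0)
    return channel_shuffle(out, groups=2)

# Tag each channel with its own index so the reordering is visible.
x = np.arange(8, dtype=float).reshape(8, 1, 1) * np.ones((8, 2, 2))
y = split_transform_shuffle(x, transform=lambda t: t)
print(y[:, 0, 0].astype(int).tolist())  # [0, 4, 1, 5, 2, 6, 3, 7]
```

After the shuffle, channels from the pass-through branch and the transformed branch alternate, so later layers see information from both branches in every local channel neighborhood.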

[0109] Step 1.3.2, Cross-Stage Aggregation Unit (RASM-VoVCSP)

[0110] The cross-stage aggregation unit (RASM-VoVCSP) adopts the cross-stage partial (CSP) structure, as shown in Figure 5. For an input feature map with C1 channels, two parallel processing paths are constructed, and different transformations are applied to the same input feature map:

[0111] In the first branch, the input features pass through only a lightweight standard convolution (SC). The resulting feature is used directly for the cross-stage residual connection, preserving the original semantic and spatial structure information for subsequent aggregation.

[0112] In the second branch, the input features also pass through a lightweight standard convolution (SC), and the result is then processed by two parallel sub-branches. In the first sub-branch, the feature passes through a bottleneck path consisting of two spatial matching convolution modules (RASM-Conv), which models richer context and spatial dependencies. The second sub-branch applies a reparameterized stripe convolution (Rep-Stripe) to enhance the feature. The outputs of the two sub-branches are then fused element-wise, which enhances the ability to perceive axial structures.

[0113] Finally, the output of the first branch and the fused output of the second branch are concatenated and fused by a lightweight standard convolution (SC). This design fuses cross-layer information in a single aggregation operation, significantly reducing memory access and computational scheduling overhead. Meanwhile, the residual connections of the Rep-Stripe convolution provide a direction-sensitive gradient flow, enhancing training stability.

[0114] Step 1.3.3, Overall structure of DW-SlimNeck:

[0115] As shown in Figure 6, the multi-scale features output from the backbone network and the feature S4 output by SMLA-EL serve as the inputs to DW-SlimNeck. Each input feature, including feature S1 and feature S4, is processed by a spatial matching convolution module (RASM-Conv) for channel alignment and feature enhancement and is output as a separate feature. Then:

[0116] (1) One feature is upsampled and aggregated with another feature through a cross-stage aggregation unit (RASM-VoVCSP), followed by channel alignment and feature enhancement through a spatial matching convolution module (RASM-Conv), to obtain a new feature;

[0117] (2) A feature is upsampled and combined with another feature through a cross-stage aggregation unit (RASM-VoVCSP) to obtain a new feature;

[0118] (3) A feature undergoes low-cost spatial detail refinement using depthwise separable convolution (DWConv) and is then combined with another feature through a cross-stage aggregation unit (RASM-VoVCSP) to obtain a new feature;

[0119] (4) A feature passes through a spatial matching convolution module (RASM-Conv) and is then combined with another feature through a cross-stage aggregation unit (RASM-VoVCSP) to obtain a new feature;

[0120] (5) The three output features are aligned to the same channel dimension, and each is flattened along the spatial dimensions into a one-dimensional feature sequence. The three flattened sequences are then concatenated along the sequence-length dimension, and the concatenated feature sequence is fed into the subsequent decoder for IoU-aware query selection and target decoding.
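
The flatten-and-concatenate operation in step (5) can be sketched as follows. The strides (8, 16, 32) and the channel count are assumptions chosen for illustration; the source specifies only the 640×640 input resolution.

```python
import numpy as np

def flatten_and_concat(features):
    # Each feature has shape (C, H, W) with the same C.
    # Flatten H x W into one token axis, then join the token sequences.
    seqs = [f.reshape(f.shape[0], -1).T for f in features]  # (H*W, C) each
    return np.concatenate(seqs, axis=0)

# Assumed setup: 640 x 640 input, strides 8/16/32, 64 channels.
c = 64
features = [np.zeros((c, 640 // s, 640 // s)) for s in (8, 16, 32)]
sequence = flatten_and_concat(features)
print(sequence.shape)  # (8400, 64)
```

Under these assumed strides the decoder would receive 80×80 + 40×40 + 20×20 = 8400 tokens, which is why the channel dimensions must be aligned before concatenation.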

[0121] Through this short-link bidirectional path, DW-SlimNeck achieves efficient alignment, fusion, and detail enhancement of multi-scale features with extremely low computational cost, and outputs the processed multi-scale feature sequence to the subsequent decoder.

[0122] Step 2: Model Training

[0123] Step 2.1: Construction of the training dataset

[0124] To comprehensively and objectively evaluate the effectiveness and generalization ability of the RT-DETR-RSL model, this invention uses a self-built high-quality dedicated dataset (THT-SJ4D dataset) for training and validation.

[0125] The self-built THT-SJ4D dataset, whose full name is Through-Hole Solder-Joint 4-Defect Dataset, is designed specifically for through-hole solder joint defect detection. The dataset contains 3525 high-resolution (2048×2048 pixels) color industrial images. Data sources cover more than ten different types of PCBs, including pure through-hole mounting (THT) processes as well as hybrid surface mount technology (SMT) and THT processes, ensuring data diversity. Solder joint pitch covers five common specifications from 1.00 mm to 2.54 mm, to evaluate the model's adaptability to different array densities. All images were acquired using a Hikvision MV-CS050-10GC industrial camera with a 5MP lens under a controlled multi-level-brightness LED ring light. By finely adjusting the light source angle and brightness, subtle exposure variations and interfering highlights between adjacent solder joints were intentionally created to emphasize weak-contrast boundaries and increase the difficulty of distinguishing similar targets within the same class, so that the dataset better reflects the complex working conditions of real production lines. Four defect categories are labeled: bridging, pull-out, solder ball, and missing solder. The number of defects in each category is shown in Figure 7. The acquired images were then augmented using standard color jittering and random horizontal flipping to improve model robustness, uniformly scaled to a resolution of 640×640 pixels, and randomly divided into training, validation, and test sets in a 7:2:1 ratio.
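
A 7:2:1 random split of 3525 images can be sketched as below. Because 3525 is not divisible by 10, some rounding rule is needed; this sketch floors the training and validation counts and assigns the remainder to the test set, which is an assumption, as are the seed and the function name.

```python
import random

def split_dataset(items, ratios=(7, 2, 1), seed=0):
    items = list(items)
    random.Random(seed).shuffle(items)   # reproducible shuffle
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # remainder goes to the test set
    return train, val, test

image_ids = range(3525)
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))  # 2467 705 353
```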

[0126] Step 2.2, Offline Training

[0127] The training parameters were set as follows: AdamW optimizer was used, initial learning rate was set to 0.0001, batch size was 4, and training lasted for 300 epochs. Offline training was performed using a server equipped with an NVIDIA A100 GPU.

[0128] On the self-built THT-SJ4D dataset, the final model achieved a performance of 63% mAP50-95 and 73.0 FPS on the test set, verifying the effectiveness of the specific implementation of the present invention.

[0129] Parameter initialization: The backbone network is initialized with weights pre-trained on the ImageNet dataset (based on the ResNet-18 architecture), while the rest of the network (SMLA-EL, DW-SlimNeck, and decoder, etc.) are randomly initialized.

[0130] Optimization strategy: The AdamW optimizer is used, with an initial learning rate of 1e-4 and a weight decay coefficient of 1e-4. A cosine annealing scheduler is used to adjust the learning rate. The batch size is set to 4, and the total number of training epochs is 300.
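
The learning-rate curve implied by these settings can be sketched with the standard closed-form cosine annealing formula. The minimum learning rate is not stated in the source and is assumed to be zero here; per-epoch (rather than per-iteration) stepping is also an assumption.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=300, base_lr=1e-4, min_lr=0.0):
    # min_lr is an assumed floor; the text gives only the initial rate.
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term

for epoch in (0, 75, 150, 225, 300):
    print(epoch, cosine_annealing_lr(epoch))
```

The rate starts at the initial value of 1e-4, reaches half of it at epoch 150, and decays to the assumed floor at epoch 300.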

[0131] Loss function: Following the end-to-end training paradigm of RT-DETR, a set prediction loss function is used, which includes a classification loss and a bounding box regression loss (a linear combination of L1 loss and generalized IoU loss). The entire pipeline does not require post-processing steps such as non-maximum suppression (NMS).
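
The bounding box regression term can be illustrated with a plain-Python sketch of the generalized IoU and its linear combination with an L1 loss. The weights `w_l1` and `w_giou` are placeholders, since the source does not give the combination coefficients, and boxes are taken in corner format for simplicity.

```python
def box_area(b):
    # b = (x1, y1, x2, y2); empty boxes have zero area.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def generalized_iou(a, b):
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    # Smallest axis-aligned box enclosing both inputs.
    hull = box_area((min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3])))
    return inter / union - (hull - union) / hull

def box_regression_loss(pred, target, w_l1=5.0, w_giou=2.0):
    # w_l1 and w_giou are placeholder weights, not values from the source.
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    return w_l1 * l1 + w_giou * (1.0 - generalized_iou(pred, target))

print(generalized_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(box_regression_loss((0, 0, 2, 2), (0, 0, 2, 2)))
```

Unlike plain IoU, the generalized IoU stays informative for non-overlapping boxes: it becomes more negative as the boxes move apart, so the loss still provides a gradient.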

[0132] Step 2.3: After offline training is completed, structural reparameterization merges the four parallel paths of the ROMix-Rep module (including the Identity branch) into a single 3×3 depthwise separable convolutional layer, which is equivalent to folding them into a single standard convolution. The optimal weights are then saved and used as the weights of the online inference model.
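
The folding step relies on the linearity of convolution: parallel branches applied to the same input can be merged by adding their kernels. The single-channel NumPy sketch below assumes the four paths are a 3×3, a 1×3, a 3×1, and an identity branch, and it omits normalization layers; the function names are hypothetical.

```python
import numpy as np

def conv2d_same(x, kernel):
    # Single-channel 2-D cross-correlation with zero padding ("same" size).
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def merge_branches(k3x3, k1x3, k3x1):
    # Zero-pad the stripe kernels to 3 x 3 and add an identity kernel
    # (a single 1 at the centre); the kernels then simply add up.
    merged = k3x3.copy()
    merged[1:2, :] += k1x3
    merged[:, 1:2] += k3x1
    merged[1, 1] += 1.0
    return merged

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3x3, k1x3, k3x1 = (rng.standard_normal(s) for s in ((3, 3), (1, 3), (3, 1)))

multi_branch = (conv2d_same(x, k3x3) + conv2d_same(x, k1x3)
                + conv2d_same(x, k3x1) + x)
single = conv2d_same(x, merge_branches(k3x3, k1x3, k3x1))
print(np.allclose(multi_branch, single))  # True
```

In a depthwise layer the same merge is applied to each channel independently, so the inference-time network keeps the trained behavior while executing only one convolution per block.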

[0133] Step 3: Online use

[0134] Images of PCB solder joints are acquired using an MV-CS050-10GC camera and uniformly scaled online to a resolution of 640×640 pixels. These images are then input into the online-usable RT-DETR-RSL network obtained in step 2, and inference is performed by loading the model weight file. First, multi-scale features are extracted by the structure-enhanced backbone network. Then, the row-column aware coding layer explicitly models the spatial relationships of the solder joint array. Next, the lightweight multi-scale fusion neck aligns the cross-scale features. Finally, the decoder directly outputs the defect category and localization box, achieving end-to-end detection.

[0135] Experiments

[0136] (1) Evaluation indicators

[0137] To accurately evaluate the model, this invention uses several of the most common object detection evaluation metrics: mAP, AP, and recall. The calculation formulas are as follows:

[0138] $R = \frac{TP}{TP + FN}$ (5)

[0139] $P = \frac{TP}{TP + FP}$ (6)

[0140] Where R is recall and P is model precision. Higher R and P indicate better model performance. TP represents true positives, FP represents false positives, FN represents false negatives, and TN represents true negatives.

[0141] $AP = \int_0^1 P(R)\,dR$ (7)

[0142] $mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i$ (8)

[0143] Where n is 4, the number of categories in the dataset that participated in detection and evaluation; $AP_i$ denotes the average precision of the i-th class; and mAP is the mean of the average precision across all categories.

[0144] Because precision and recall usually trade off against each other, the model's performance must be evaluated comprehensively at different confidence thresholds. AP combines precision at different recall rates to measure the model's detection performance for each class. mAP is the average AP value across all classes and is used as the core metric for evaluating the overall performance of the model; the variants reported are mAP50-95, mAP50, and mAP75.
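
The metrics can be illustrated with a small plain-Python sketch. The all-point interpolation used for AP is one common convention and is an assumption here, as the source does not state which interpolation was used; the inputs are toy values, not results from the experiments.

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(is_tp, num_gt):
    # is_tp: detections of one class sorted by descending confidence,
    # True where the detection matches a ground-truth box.
    tp = fp = 0
    points = []
    for hit in is_tp:
        tp += hit
        fp += not hit
        points.append((tp / num_gt, tp / (tp + fp)))  # (recall, precision)
    ap, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        # Interpolated precision: best precision at this recall or higher.
        best = max(p for _, p in points[i:])
        ap += (recall - prev_recall) * best
        prev_recall = recall
    return ap

def mean_average_precision(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)

# Toy inputs for illustration only.
print(precision_recall(tp=8, fp=2, fn=2))
print(average_precision([True, False, True], num_gt=2))
print(mean_average_precision([1.0, 0.5]))
```

In the common COCO-style convention, mAP50 and mAP75 fix the IoU matching threshold at 0.50 and 0.75, while mAP50-95 averages the result over thresholds from 0.50 to 0.95 in steps of 0.05.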

[0145] (2) Comparative experiment

[0146] To evaluate the performance of the proposed method, the RT-DETR-RSL network model of this invention was compared with a variety of mainstream detectors, including YOLOv11-n, YOLOv12-n, DETR, Deformable-DETR, and the RT-DETR series baseline models, on the self-built THT-SJ4D dataset and the publicly available PKU-Market-PCB dataset. The test results are shown in Table 1 and Table 2, respectively.

[0147] Table 1. Comparison of metrics for various methods on the THT-SJ4D dataset.

[0148]

[0149] As shown in Table 1, RT-DETR-RSL outperforms the compared methods on key metrics: mAP50-95 (63.0%), mAP75 (73.8%), and Precision (94.4%). Compared to RT-DETR-R18 in the same series, it maintains a high frame rate (73.0 FPS) while improving mAP50-95 by 5.8 percentage points. Compared to YOLOv11-n, which has the fastest inference speed, RT-DETR-RSL achieves a significant improvement in accuracy at the cost of a small decrease in inference speed, especially on the mAP75 metric, which measures precise localization.

[0150] Table 2 Comparison of metrics for different methods on the PKU-Market-PCB dataset

[0151]

[0152] As shown in Table 2, on the publicly available PKU-Market-PCB dataset, the RT-DETR-RSL model of this invention achieved excellent performance with mAP50-95 of 60.3%, mAP50 of 98.2%, and mAP75 of 54.3%, while precision and recall reached 98.6% and 98.3%, respectively, and inference speed reached 37.8 FPS. Compared with the baseline model of the same series, RT-DETR-R18, its mAP50-95 is significantly improved by 8.2 percentage points, mAP75 by 6.2 percentage points, and inference speed is improved by 5 FPS. The significant improvement in mAP75 reflects the model's stronger stability in cross-scale feature alignment and boundary regression; the improvement in precision indicates a significant reduction in false detection rate in texture-similar regions; and the preservation of recall shows that the model's ability to detect narrow, low-contrast defects is not affected.

[0153] The PKU-Market-PCB dataset represents a different production line background, lighting environment, and defect morphology distribution compared to the self-built THT-SJ4D dataset. In this cross-scene validation, due to changes in environmental noise and target domain shifts, the overall accuracy (mAP50-95) of mainstream general-purpose detectors (such as YOLOv11, YOLOv12, and Deformable-DETR) generally suffered a significant decline, falling to the 44%~50% range. However, the RT-DETR-RSL model of this invention still maintained a high level of mAP50-95 of 60.3% on this dataset, proving that the features extracted in this invention are not overfitted features of specific pixel distributions in a single dataset, but rather learn the prior physical topology of the solder joint array and the essential geometric morphology of defects. This strong robustness confirms the universality of the proposed lightweight multi-scale fusion network and row-column perception attention mechanism.

[0154] An example of an original PCB image is shown in Figure 8. The detection results of YOLOv12-n, RT-DETR-R18, and RT-DETR-RSL on this same PCB image (Figure 8) are shown in Figures 9-11.

[0155] YOLOv12-n (Figure 9) can detect the major defects, but its response heatmap is scattered in areas with dense solder joints, leading to significant false alarms, such as misclassification of silkscreen areas. RT-DETR-R18 (Figure 10) benefits from its global attention mechanism and shows more concentrated responses, but false positives cannot be completely avoided in backgrounds with high texture similarity. RT-DETR-RSL (Figure 11) exhibits the strongest discrimination capability: its response to defects is continuous and concentrated from root to tip. In areas with dense solder joints, the response boundary is clear, effectively distinguishing adjacent non-contact solder joints while suppressing interference from background elements such as silkscreen printing. This directly verifies the effectiveness of the SMLA-EL module in suppressing false alarms, and of the DW-SlimNeck module in maintaining the structural integrity of slender defects and ensuring positioning stability.

[0156] (3) Ablation test

[0157] To verify the effectiveness of each of the ROMix-Rep, SMLA-EL, and DW-SlimNeck modules, a systematic ablation study was conducted on the THT-SJ4D dataset with RT-DETR-R18 as the baseline. The results are shown in Table 3.

[0158] Table 3 Ablation experiments of RT-DETR-RSL on the THT-SJ4D dataset

[0159]

[0160] Table 4 Ablation experiments on the impact of module combinations on cross-scale sensing and detection performance

[0161]

[0162] The results of the ablation experiment showed that:

[0163] Single module: Introducing any one module improves on the baseline. ROMix-Rep enhances geometric representation capability, increasing recall to 84.2%. SMLA-EL improves overall performance, with mAP50-95 increasing to 60.1%. DW-SlimNeck improves both recall (by 3.2 percentage points) and speed (by 16.5 frames per second).

[0164] Combination effect: Any combination of two modules is complementary and further improves accuracy. When all three modules work together, the proposed model achieves its best performance: mAP50-95 reaches 63.0%, an improvement of 5.8 percentage points over the baseline, while FPS increases to 73.0. To mitigate the impact of randomness, four replicate experiments were conducted under the same experimental conditions. The standard deviation is 0.11 for mAP50-95, 0.18 for mAP75, and 0.13 for FPS, indicating the stability of the proposed model. The designs of the modules targeting "weak boundary localization," "dense mutual interference," and "cross-scale alignment" are each effective and work synergistically.

[0165] Finally, it should be noted that the above examples are merely some specific embodiments of the present invention. Obviously, the present invention is not limited to the above embodiments and many variations are possible. All variations that can be directly derived or conceived by those skilled in the art from the disclosure of the present invention should be considered within the scope of protection of the present invention.

Claims

1. A PCB through-hole solder joint defect detection method based on improved RT-DETR, characterized in that the method comprises: The PCB image to be detected is acquired and input into the offline-trained PCB through-hole solder joint defect detection network to obtain the defect category and bounding box of the through-hole solder joint; The PCB through-hole solder joint defect detection network is based on an improvement of the RT-DETR network. It includes a lightweight backbone network constructed with a structural reparameterization module, a segmentation-merging linear attention coding layer that replaces the original intra-scale feature interaction module, and a lightweight multi-scale fusion neck module that replaces the original cross-scale feature fusion module.

2. The PCB through-hole solder joint defect detection method based on improved RT-DETR according to claim 1, characterized in that: The lightweight backbone network is an improvement on the ResNet network. It uses a structural reparameterization module to replace the traditional residual blocks in each stage of the original ResNet network as the core feature extraction unit to output multi-scale features. Meanwhile, at the transitions between processing stages, a space-to-depth downsampling operation is used for shallow features to rearrange the pixels of local regions of the input feature map and map them to the channel dimension; for deep features, an anti-aliasing downsampling operation is used, which combines a low-pass filter with depthwise separable convolution to perform feature smoothing and downsampling.

3. The PCB through-hole solder joint defect detection method based on improved RT-DETR according to claim 2, characterized in that: The structural reparameterization module is the ROMix-Rep module: During offline training, the input layer contains four parallel paths for extracting standard local neighborhood features, row structure features along the horizontal direction, column structure features along the vertical direction, and preserving the original input information, respectively. The output features of the four paths are then added element by element. During online inference, the input layer is equivalently folded from the four parallel paths into a single depthwise separable convolution. The features processed by the input layer then pass through the efficient channel attention module and two convolutional modules in sequence. The features extracted by the efficient channel attention module and the features output by the convolutional modules are then added element by element.

4. The PCB through-hole solder joint defect detection method based on improved RT-DETR according to claim 3, characterized in that: The specific operation of the segmentation-merging linear attention coding layer is as follows: (1) the multi-scale features output by the backbone network are flattened into a sequence of length N in the spatial dimension, and the query matrix, key matrix, and value matrix are obtained by projection with learnable matrices; the value matrix is then divided equally along the channel dimension into two value features; (2) two parallel kernelized linear attention branches compute the same-sign feature and the opposite-sign feature, respectively, from the positive and negative parts of the query matrix Q and the key matrix K; (3) based on the same-sign feature and the opposite-sign feature, a lightweight weight estimator generates two gating weight maps; the same-sign feature and the opposite-sign feature are then each multiplied element-wise with the corresponding gating weight map, and the results are concatenated into a fused feature; (4) the fused feature is passed through the position-aware fusion module, which injects relative position encoding, to output the encoded feature S4.

5. The PCB through-hole solder joint defect detection method based on improved RT-DETR according to claim 4, characterized in that: The lightweight multi-scale fusion neck module includes a spatial matching convolution module and a cross-stage aggregation unit. The input features, including feature S4, are each processed by a spatial matching convolution module and output as separate features. Then: (1) one feature is upsampled and aggregated with another feature through a cross-stage aggregation unit, and the result is passed through a spatial matching convolution module to obtain a new feature; (2) a feature is upsampled and combined with another feature through a cross-stage aggregation unit to obtain a new feature; (3) a feature undergoes depthwise separable convolution and is combined with another feature through a cross-stage aggregation unit to obtain a new feature; (4) a feature passes through a spatial matching convolution module and is combined with another feature through a cross-stage aggregation unit to obtain a new feature; (5) the three output features are flattened in the spatial dimension and concatenated in the sequence length dimension to output the feature sequence.

6. The PCB through-hole solder joint defect detection method based on improved RT-DETR according to claim 5, characterized in that: The specific operation of the spatial matching convolution module is as follows: The input features are sequentially passed through a reparameterized stripe convolution and an anti-aliasing downsampling module. The resulting intermediate features are divided into two groups in the channel dimension. The intermediate features of the first group, with half of the channels, are retained directly, while the intermediate features of the second group, with the other half of the channels, are passed through a depthwise separable convolution. The two outputs are then concatenated and shuffled in the channel dimension.

7. The PCB through-hole solder joint defect detection method based on improved RT-DETR according to claim 6, characterized in that: The cross-stage aggregation unit includes two parallel processing paths: The input features in the first branch are processed by a lightweight standard convolution and then used as a residual connection. The input features in the second branch pass through a lightweight standard convolution and are then processed by two parallel sub-branches: one sub-branch consists of two spatial matching convolution modules connected in series, and the other sub-branch consists of a reparameterized stripe convolution. The features output from the two sub-branches are fused element-wise, concatenated with the output features of the first branch, and then output after passing through a lightweight standard convolution.

8. The PCB through-hole solder joint defect detection method based on improved RT-DETR according to claim 7, characterized in that: The offline training process of the PCB through-hole solder joint defect detection network includes: PCB images with different welding processes and different solder joint spacings are collected and, after data augmentation, uniform scaling, and defect category labeling, are used as the offline training dataset. A set prediction loss function including a classification loss and a bounding box regression loss is used to optimize the network end-to-end. After training, the multi-branch structure in the backbone network is folded into a single standard convolution using structural reparameterization, and the model weights are saved.