A PCBA defect detection method fusing improved YOLOv13
By introducing the windmill-shaped convolution PConv and the C3k2-RepViTBlock module into the YOLOv13 model and combining them with the HyperACE mechanism, the method improves YOLOv13. It addresses the high false detection rate, high labor cost, and insufficient small-target detection capability in PCBA defect detection, and achieves efficient and accurate defect detection.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: HUAIYIN INSTITUTE OF TECHNOLOGY
- Filing Date: 2026-02-28
- Publication Date: 2026-06-19
AI Technical Summary
Existing PCBA defect detection methods suffer from high false detection rates, high labor costs, low production line changeover efficiency, and insufficient generalization ability of the YOLO algorithm in small target detection and complex backgrounds.
The YOLOV13 model's backbone network standard convolution is replaced by windmill-shaped convolution PConv. Combined with the C3k2-RepViTBlock module and HyperACE mechanism, the YOLOV13 model is improved through a full-link aggregation and allocation paradigm, reducing the number of parameters and computational complexity, and improving the accuracy of small target recognition.
Without sacrificing performance, the number of parameters and computational complexity are significantly reduced, improving the accuracy and robustness of PCBA defect detection, reducing the false detection rate, and enhancing the model's generalization ability and detection efficiency.
[Figure: CN122243883A_ABST]
Description
Technical Field
[0001] This invention relates to the field of PCB defect detection, and specifically to a PCBA defect detection method that integrates an improved YOLOv13.
Background Technology
[0002] PCBA (Printed Circuit Board Assembly) is a crucial step in electronics manufacturing, involving the precise mounting of various electronic components onto circuit boards, which has a decisive impact on the functionality and reliability of the final electronic product. With technological advancements and increased automation, PCBA manufacturing processes continue to evolve to meet increasingly complex and diverse market demands.
[0003] Currently, the traditional quality inspection methods widely used in this field mainly rely on template matching technology in automated optical inspection (AOI). This method identifies defects by comparing a pre-set standard template with the actual captured PCBA image. However, this method has significant limitations: First, the false detection rate of traditional AOI systems is relatively high, generally reaching 50%, leading to frequent missed detections and errors, which seriously affects the reliability of inspection; second, to avoid the risk of misjudgment, each AOI device needs to be manually reviewed by an experienced operator, which not only increases labor costs but also extends the production cycle; in addition, when the production line switches to produce different types of PCBA boards, the corresponding templates must be remade and configured, which is a cumbersome process with low adaptation efficiency.
[0004] In recent years, deep learning-based target detection algorithms such as the YOLO series (e.g., YOLOv7, YOLOv8, and YOLOv10) have demonstrated application potential in industrial inspection, but they still have several limitations. Regarding detection accuracy, although YOLO algorithms perform excellently on general datasets, their ability to detect small targets, such as extremely fine components or soldering defects on PCBAs, remains insufficient, easily leading to missed detections. Regarding the balance between speed and complexity, models often increase depth and number of parameters to pursue higher accuracy, which raises computational cost. Furthermore, previous YOLO models (e.g., YOLOv7, YOLOv8, and YOLOv10) still have room for improvement in generalization ability and robustness when facing industrial scenarios with imbalanced defect samples and strong background interference. Therefore, how to further improve the accuracy of small target recognition and reduce model complexity while maintaining high-speed inference remains a key challenge for this series of algorithms in practical PCBA quality inspection applications.
Summary of the Invention
[0005] Purpose of the invention: To address the problems mentioned in the background art, this invention discloses a PCBA defect detection method that integrates an improved YOLOv13. The YOLOv13 model is improved with the windmill-shaped convolution PConv and the fusion module C3k2-RepViTBlock, working in conjunction with the HyperACE mechanism of YOLOv13 and its full-link aggregation and allocation (FullPAD) paradigm. The method achieves a significant reduction in the number of parameters and computational complexity without sacrificing performance, while meeting the requirements for detecting small and weak targets.
[0006] Technical solution:
[0007] This invention discloses a PCBA defect detection method that integrates an improved YOLOv13, the method comprising the following steps:
[0008] S1: Collect actual PCBA samples from production and manually supplement them with four types of defect samples: missing components, misalignment, tombstoning, and bridging. Use an industrial camera to capture their image data, and upload and save the captured image data.
[0009] S2: Perform dataset annotation, preprocessing, enhancement, and segmentation on the image data described in S1;
[0010] S3: Build a PCBA defect detection model based on the YOLOv13 model;
[0011] S3.1: An asymmetric padding method is used to replace the original backbone network's standard convolution with the windmill-shaped convolution PConv;
[0012] S3.2: Retain the CSP branch structure of C3k2, replace the k=2 convolution of the residual branch with RepViTBlock to form the fusion module C3k2-RepViTBlock, and embed it into the 4th stage of the YOLOV13 backbone;
[0013] S3.3: The output of C3k2-RepViTBlock serves as the core input of the HyperACE mechanism, achieving bidirectional gradient feedback and collaborative optimization through loss function mediation, forming a coupled feature enhancement from local to global.
[0014] S4: Train a PCBA defect detection model using labeled data and evaluate its performance in PCBA target detection.
[0015] Furthermore, the specific steps of S2 are as follows:
[0016] Dataset annotation: The LabelImg v1.8.6 annotation tool is used to annotate the images obtained in S1. The labels are named "missing parts", "offset", "tombstone", and "tin connection" (solder bridging), corresponding to the four defect types in S1, and each bounding box is drawn close to the target boundary.
[0017] Format conversion: The annotated image data is saved as an ".xml" file, and then converted into a ".tfrecord" file using a Python script. The annotation errors are corrected by two people through cross-validation to obtain the initial PCBA dataset.
[0018] Preprocessing steps: Crop out redundant background from the image and retain the effective area of PCBA; use min-max normalization to map pixel values to [0,1], and use image translation to make the center of PCBA board coincide with the center of the image, and finally output a 224×224×3 format image;
[0019] Data augmentation and partitioning: The dataset and its corresponding annotation information are augmented using six methods: horizontal flipping, vertical flipping, translation, clockwise rotation, counterclockwise rotation, and Gaussian blur. The training set, test set, and validation set are randomly partitioned in a 7:2:1 ratio.
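Because the annotation information is augmented together with the images, every geometric transform also has to be applied to the bounding boxes. The sketch below illustrates this for the two flips only. It assumes boxes are stored as pixel-coordinate (xmin, ymin, xmax, ymax) tuples measured from the top-left corner; the function names are hypothetical and do not come from the patent.

```python
def hflip_box(box, img_w):
    """Mirror an (xmin, ymin, xmax, ymax) box about the vertical centre line."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa.
    return (img_w - xmax, ymin, img_w - xmin, ymax)


def vflip_box(box, img_h):
    """Mirror an (xmin, ymin, xmax, ymax) box about the horizontal centre line."""
    xmin, ymin, xmax, ymax = box
    # The old bottom edge becomes the new top edge, and vice versa.
    return (xmin, img_h - ymax, xmax, img_h - ymin)


# Example on a 224x224 image (the preprocessed size given in the text).
box = (10, 20, 50, 60)
flipped_h = hflip_box(box, 224)
flipped_v = vflip_box(box, 224)
```

Applying the same flip twice returns the original box, which is a convenient sanity check when wiring such a transform into an augmentation pipeline.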
[0020] Furthermore, the specific steps in S3.1 include:
[0021] The windmill-shaped convolution PConv is used to replace the standard convolution in the lower layers of the backbone network. PConv uses asymmetric padding and contains four parallel convolutional branches with kernel sizes of 1×3, 3×1, 1×3, and 3×1, and padding parameters of P(0,1,0,3), P(0,3,0,1), P(0,1,3,0), and P(3,0,1,0). After convolution, each branch output is processed by a BN layer and the SiLU activation function. This enhances feature extraction and enlarges the receptive field while keeping the number of parameters and the computational cost low.
[0022] Furthermore, the specific steps in S3.2 include:
[0023] The CSP main branch of C3k2 in the original YOLOV13 model is retained, and the standard convolutional units in the residual branch of the C3k2 module are replaced with RepViTBlock to form the fusion module C3k2-RepViTBlock. C3k2-RepViTBlock is embedded in the fourth stage of the YOLOV13 backbone network. C3k2-RepViTBlock receives high signal-to-noise ratio input provided by the PConv module and dynamically models the global spatial dependencies of dense components through the reparameterized attention mechanism of RepViTBlock, forming a progressive feature processing structure of wide-area context awareness → local → global collaborative parsing.
[0024] Furthermore, the specific steps in S3.3 include:
[0025] The C3k2-RepViTBlock and HyperACE mechanisms use a hierarchical progression from local to global as their core logic, forming a coupled connection relationship of feature input support, discriminative closure, and bidirectional optimization. The two, as complementary feature enhancement units, complete end-to-end collaboration.
[0026] Feature input support: C3k2-RepViT is deployed in the middle of the backbone network to perform local enhancement and preliminary semantic abstraction of the low-level features, and output feature maps with rich fine-grained defect clues and high semantic quality, which serve as the core input of the HyperACE mechanism and provide a feature foundation for global information fusion;
[0027] Judgment closed loop: C3k2-RepViTBlock extracts and identifies local correlations from local pixels to component groups to capture defect clues; based on the local feature results, it completes global feature reorganization and enhancement at the whole image level through the HyperACE mechanism, and combines global information to judge the authenticity and type of the locally captured clues, forming a judgment closed loop from local to global.
[0028] Bidirectional optimization: In end-to-end training, C3k2-RepViTBlock and the HyperACE mechanism form a bidirectional collaborative optimization connection through the loss function. The global gradient feedback of HyperACE guides C3k2-RepViTBlock to selectively extract local features that are valuable for global discrimination. The strongly discriminative local features output by C3k2-RepViT drive the hypergraph computation of HyperACE to converge to a more effective global correlation pattern, thereby optimizing the overall feature representation capability of the network.
[0029] Furthermore, the Inner-IoU loss function is introduced to optimize the original YOLOV13 loss function. The Inner-IoU loss with a scaling factor ratio of 0.8 is used to replace the original loss function to optimize the bounding box regression.
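As an illustration of the scaling factor, the sketch below computes an Inner-IoU term under the usual definition: both the predicted box and the ground-truth box are shrunk about their own centers by `ratio`, and the IoU of these auxiliary boxes is used. The (cx, cy, w, h) box format and the function names are assumptions; how the patent combines this term with the other YOLOv13 loss components is not specified, so only the regression term is shown.

```python
def inner_iou(box, gt, ratio=0.8):
    """IoU of two (cx, cy, w, h) boxes after scaling each about its centre."""
    def corners(b):
        cx, cy, w, h = b
        half_w, half_h = w * ratio / 2.0, h * ratio / 2.0
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h

    l1, t1, r1, b1 = corners(box)
    l2, t2, r2, b2 = corners(gt)
    # Overlap of the two auxiliary (scaled) boxes, clamped at zero.
    inter_w = max(0.0, min(r1, r2) - max(l1, l2))
    inter_h = max(0.0, min(b1, b2) - max(t1, t2))
    inter = inter_w * inter_h
    union = (r1 - l1) * (b1 - t1) + (r2 - l2) * (b2 - t2) - inter
    return inter / union if union > 0 else 0.0


def inner_iou_loss(box, gt, ratio=0.8):
    """Regression loss 1 - Inner-IoU; 0 for a perfect match, 1 for no overlap."""
    return 1.0 - inner_iou(box, gt, ratio)


pred = (0.0, 0.0, 10.0, 10.0)
target = (5.0, 0.0, 10.0, 10.0)
loss_scaled = inner_iou_loss(pred, target, ratio=0.8)
loss_plain = inner_iou_loss(pred, target, ratio=1.0)
```

With a ratio below 1 the auxiliary boxes overlap less than the originals for a partially aligned pair, so the loss is larger than the plain IoU loss in this example.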
[0030] Furthermore, the PCBA defect detection model adds a convolution and attention fusion module CAFMAttention to the detection head part, specifically located before the multi-scale feature map output by the neck network is fed into the final detection convolutional layer. The convolution and attention fusion module CAFMAttention includes local branches and global branches, and the features of the two branches are fused by element-wise addition to enhance the ability to extract global and local features.
[0031] Beneficial effects:
[0032] 1. This invention introduces PConv to replace the original convolution in the YOLOv13 model. With a lower parameter cost than standard convolution, it achieves a significant expansion of the receptive field, thereby efficiently capturing primary features containing global contextual information in the early stages of the network. This lays a high signal-to-noise ratio input foundation for subsequent processing. It further reduces the number of parameters and computational cost without compromising the accuracy of subsequent PCBA defect detection.
[0033] 2. This invention retains the CSP branch structure of C3k2, replacing the k=2 convolution of the residual branch with RepViTBlock to form the fusion module C3k2-RepViTBlock, which is embedded in the fourth stage of the YOLOv13 backbone. By natively combining the attention mechanism of RepViTBlock with the residual path of C3k2, and forming a progressive collaboration with the underlying PConv large receptive field convolution, C3k2-RepViTBlock performs local refinement and global correlation judgment on this basis. This further improves the model's accuracy in identifying small targets in PCBA defect detection.
[0034] 3. This invention utilizes a deeply coupled system based on C3k2-RepViTBlock and HyperACE mechanisms to support feature input, close the discrimination loop, and perform bidirectional optimization. This system achieves multi-level, adaptive feature enhancement from local details to the global context. The former provides a semantically richer input foundation for the latter, while the latter provides optimization guidance for the former with a global perspective. By establishing long-distance dependencies, it associates defect features or normal component backgrounds scattered in different locations, further improving the model's ability to recognize complex defect patterns.
[0035] 4. This invention adds a convolution and attention fusion module (CAFMAttention) to the detection head to enhance the collaboration between global and local features and further reduce the false detection rate caused by background interference; by introducing the Inner-IoU loss function, the generalization ability is further improved and the regression convergence speed is increased, thereby improving the overall localization performance of the model.
Attached Figure Description
[0036] Figure 1 is a flowchart illustrating the specific process of this invention;
[0037] Figure 2 is a PCBA template image from an embodiment of the present invention;
[0038] Figure 3 is a schematic diagram of PCBA defects of the present invention;
[0039] Figure 4 is a structural diagram of the windmill-shaped convolution PConv of the present invention;
[0040] Figure 5 is a structural diagram of the C3k2-RepViTBlock of the present invention;
[0041] Figure 6 is a structural diagram of the CAFM (Convolution and Attention Fusion) module of this invention;
[0042] Figure 7 is a structural diagram of the PCBA defect detection model built based on the YOLOv13 model in this invention;
[0043] Figure 8 is a schematic diagram illustrating the final detection effect of an embodiment of the present invention.
Detailed Implementation
[0044] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0045] As shown in Figure 1, this invention discloses a PCBA defect detection method that integrates an improved YOLOv13, and the method steps are as follows:
[0046] Step 1, Image Acquisition: As shown in Figures 2-3, image data containing the active areas of PCBA boards were collected. During the image acquisition phase, due to the lack of readily available public datasets for PCBA boards, the dataset needed to be created independently. This included acquiring PCBA boards from actual production environments and capturing images of them, as well as supplementing the dataset with keyframes extracted from publicly available online videos to obtain a sufficient number and diversity of samples. The final dataset contains 8568 images. All collected image data was systematically stored in corresponding folders on the computer for subsequent analysis and research.
[0047] Step 2, Image Annotation: Use the LabelImg annotation tool to annotate the dataset for the images saved in Step 1;
[0048] Step 2.1: Image annotation. Use the LabelImg annotation tool. When annotating the dataset, it is necessary to clarify the types of defects. When drawing the bounding box, make the bounding box as close as possible to the boundary of the target object. At the same time, the annotated defects should be classified and named.
[0049] Step 2.2, Format Conversion: The image data annotated in Step 2.1 is converted to a new format. The annotated image data is saved as a ".xml" file. A Python script is then used to convert the ".xml" file to a ".tfrecord" file. Two-person cross-validation is used to check and correct annotation errors, reducing noise introduced by these errors, resulting in the initial PCBA board dataset.
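The first half of such a conversion script, reading the boxes out of the annotation file, might look like the following. This is a sketch only: it assumes LabelImg was run in its Pascal VOC mode, which writes one `object` element per box with a `name` and a `bndbox` child; the sample file content and the function name are made up for illustration. Writing the ".tfrecord" output requires TensorFlow and is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout written by LabelImg.
SAMPLE_XML = """
<annotation>
  <filename>board_0001.jpg</filename>
  <size><width>224</width><height>224</height><depth>3</depth></size>
  <object>
    <name>tombstone</name>
    <bndbox><xmin>30</xmin><ymin>42</ymin><xmax>58</xmax><ymax>71</ymax></bndbox>
  </object>
  <object>
    <name>offset</name>
    <bndbox><xmin>100</xmin><ymin>90</ymin><xmax>140</xmax><ymax>120</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label,) + coords)
    return boxes


annotations = parse_voc_annotation(SAMPLE_XML)
```

A cross-validation pass by a second annotator can then compare two such lists per image to flag disagreements in label or box position.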
[0050] Step 3, Dataset Preprocessing: The initial PCBA dataset is cropped and normalized to ensure that the PCBA-related regions in each image are centered in the image. At this stage, the image size needs to be standardized to a 224×224×3 format.
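The two numeric operations named for preprocessing, min-max normalization to [0, 1] and the translation that centers the board, can be sketched as follows. Pixels are held in a flat Python list to keep the example dependency-free; the board bounding box input and both function names are assumptions for illustration.

```python
def min_max_normalize(pixels):
    """Map pixel values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # Constant image: avoid division by zero.
        return [0.0 for _ in pixels]
    scale = hi - lo
    return [(p - lo) / scale for p in pixels]


def centering_shift(board_box, img_w, img_h):
    """Translation (dx, dy) that moves the board centre onto the image centre."""
    xmin, ymin, xmax, ymax = board_box
    board_cx = (xmin + xmax) / 2.0
    board_cy = (ymin + ymax) / 2.0
    return (img_w / 2.0 - board_cx, img_h / 2.0 - board_cy)


normalized = min_max_normalize([0, 128, 255])
shift = centering_shift((10, 20, 110, 100), 224, 224)
```

The same (dx, dy) shift has to be added to every annotation box of that image so that labels stay aligned with the translated pixels.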
[0051] Step 4, Data Augmentation: Perform data augmentation on the images and corresponding annotations in the dataset to obtain the processed images; then randomly divide the dataset into training, test, and validation sets in a 7:2:1 ratio.
[0052] Step 4.1: Perform various data augmentation operations on the preprocessed images, including horizontal flipping, vertical flipping, translation, clockwise rotation, counterclockwise rotation, and Gaussian blurring. Each original image is processed to generate six augmented images, which are then merged to form a dataset containing PCBA board defect images and their corresponding label files.
[0053] Step 4.2: Divide the PCBA board dataset from Step 4.1 into a training set, a test set, and a validation set in a ratio of 7:2:1. The training set contains 5997 images, the test set contains 1713 images, and the validation set contains 856 images.
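A random 7:2:1 partition can be sketched as below. Flooring each share of the 8568 images reproduces the stated counts (5997, 1713, 856) and leaves two images unassigned; that flooring rule is an inference from the numbers, not something the text states, and the function name and seed are hypothetical.

```python
import random


def split_7_2_1(items, seed=0):
    """Shuffle and split into train/test/val by floor(7:2:1); return leftovers too."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n = len(items)
    n_train = n * 7 // 10
    n_test = n * 2 // 10
    n_val = n // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:n_train + n_test + n_val]
    leftover = items[n_train + n_test + n_val:]
    return train, test, val, leftover


train, test, val, leftover = split_7_2_1(range(8568))
sizes = (len(train), len(test), len(val), len(leftover))
```

In practice the leftover items would normally be folded into one of the three sets, for example the training set.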
[0054] Step 5, Model Building: Improve the YOLOv13 model to build a target detection network, namely the PCBA defect detection model. Train and validate this model on the dataset obtained in Step 4.
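[0055] Step 5.1: The windmill-shaped (pinwheel) convolution PConv is used to replace the standard 3×3 convolution in the lower layers of the backbone network. Its structure is shown in Figure 4. PConv differs from standard convolution in that it uses asymmetric padding to create horizontal and vertical convolution kernels for different regions of the image, with the kernels spreading outwards. PConv contains four parallel convolution branches. In each branch the input tensor passes through convolution, a BN layer, and SiLU activation; the branch outputs are concatenated and then normalized by a 2×2 convolution. The input tensor is $X^{(h_1, w_1, c_1)}$, where $h_1$, $w_1$, and $c_1$ represent its height, width, and channel size, respectively.
[0056] The first layer of PConv performs parallel convolutions, and the calculation formula is as follows:
[0057] $$X_1^{(h', w', c')} = \mathrm{SiLU}\Big(\mathrm{BN}\Big(X_{P(0,1,0,3)}^{(h_1, w_1, c_1)} \otimes W_1^{(1,3,c')}\Big)\Big)$$
[0058] $$X_2^{(h', w', c')} = \mathrm{SiLU}\Big(\mathrm{BN}\Big(X_{P(0,3,0,1)}^{(h_1, w_1, c_1)} \otimes W_2^{(3,1,c')}\Big)\Big)$$
[0059] $$X_3^{(h', w', c')} = \mathrm{SiLU}\Big(\mathrm{BN}\Big(X_{P(0,1,3,0)}^{(h_1, w_1, c_1)} \otimes W_3^{(1,3,c')}\Big)\Big)$$
[0060] $$X_4^{(h', w', c')} = \mathrm{SiLU}\Big(\mathrm{BN}\Big(X_{P(3,0,1,0)}^{(h_1, w_1, c_1)} \otimes W_4^{(3,1,c')}\Big)\Big)$$
[0061] where $\otimes$ is the convolution operator, $W_1^{(1,3,c')}$ is a 1×3 convolution kernel with $c'$ output channels, and the padding parameters $P(l, r, t, b)$ represent the number of padding pixels in the left, right, top, and bottom directions, respectively.
[0062] After the first layer of interleaved convolution, the height $h'$, width $w'$, and number of channels $c'$ of the output feature map are related to the input feature map as follows:
[0063] $$h' = \frac{h_1}{s} + 1, \qquad w' = \frac{w_1}{s} + 1, \qquad c' = \frac{c_2}{4}$$
[0064] where $c_2$ is the number of channels in the final output feature map of the PConv module and $s$ is the convolution stride.
[0065] The results of the first layer of interleaved convolution are concatenated. The output is calculated as follows:
[0066] $$X'^{(h', w', 4c')} = \mathrm{Cat}\Big(X_1^{(h', w', c')}, X_2^{(h', w', c')}, X_3^{(h', w', c')}, X_4^{(h', w', c')}\Big)$$
[0067] Finally, the concatenated tensor is normalized by a convolution kernel $W^{(2,2,c_2)}$ without padding, which adjusts the height and width of the output feature map to the preset values $h_2$ and $w_2$. This allows PConv to be interchanged with Conv layers and serves as a channel attention mechanism to analyze the contributions of different convolutional directions. The final output $Y^{(h_2, w_2, c_2)}$ is calculated as follows:
[0068] $$h_2 = h' - 1 = \frac{h_1}{s}, \qquad w_2 = w' - 1 = \frac{w_1}{s}$$
[0069] $$Y^{(h_2, w_2, c_2)} = \mathrm{SiLU}\Big(\mathrm{BN}\Big(X'^{(h', w', 4c')} \otimes W^{(2,2,c_2)}\Big)\Big)$$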
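The size bookkeeping of PConv can be checked with ordinary convolution arithmetic. The sketch below tracks shapes only, not weights. It reads each padding tuple as (left, right, top, bottom) and assumes the k-tap axis of each branch kernel lies along the axis that receives k pixels of padding, so that all four branches produce equally sized maps; that orientation and the function names are assumptions for illustration.

```python
def conv_out(size, kernel, stride, pad_before, pad_after):
    """Output length of a convolution along one axis."""
    return (size + pad_before + pad_after - kernel) // stride + 1


def pconv_shapes(h1, w1, c2, k=3, s=1):
    """Shapes (height, width, channels) after each PConv stage."""
    # (padding as (left, right, top, bottom), kernel as (kernel_h, kernel_w))
    branches = [
        ((0, 1, 0, k), (k, 1)),
        ((0, k, 0, 1), (1, k)),
        ((0, 1, k, 0), (k, 1)),
        ((k, 0, 1, 0), (1, k)),
    ]
    branch_shapes = []
    for (left, right, top, bottom), (kh, kw) in branches:
        h = conv_out(h1, kh, s, top, bottom)
        w = conv_out(w1, kw, s, left, right)
        branch_shapes.append((h, w, c2 // 4))
    h_b, w_b, c_b = branch_shapes[0]
    concat = (h_b, w_b, 4 * c_b)
    # Final 2x2 convolution, stride 1, no padding.
    out = (conv_out(h_b, 2, 1, 0, 0), conv_out(w_b, 2, 1, 0, 0), c2)
    return branch_shapes, concat, out


branch_shapes, concat, out = pconv_shapes(8, 8, 64, k=3, s=1)
_, _, out_stride2 = pconv_shapes(8, 8, 64, k=3, s=2)
```

For an 8×8 input each branch yields a 9×9 map, and the unpadded 2×2 convolution brings the result back to 8×8 (or 4×4 with stride 2), which is what lets PConv stand in for an ordinary Conv layer.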
[0070] The effectiveness of the receptive field gradually decreases outwards, similar to a Gaussian distribution. Furthermore, the smaller the target, the more concentrated its features become, highlighting the importance of the central features. The top right corner of Figure 4 shows that PConv (k=3) has a receptive field of 25, and the number of convolutions decreases from the center outwards, similar to a Gaussian distribution. PConv utilizes grouped convolutions to significantly increase the receptive field while minimizing the number of parameters. The formula for calculating the number of parameters in a standard 3×3 convolution (Conv) is:
[0071] $$\mathrm{Conv}_p = c_2 \times c_1 \times 3 \times 3 = 9 c_1 c_2$$
[0072] If the number of output channels $c_2$ is equal to the number of input channels $c_1$, the Conv parameter count is $9c_1^2$. The parameters of PConv are calculated as follows:
[0073] $$\mathrm{PConv}_p = 4 \times \left(\frac{c_2}{4} \times c_1 \times 3\right) + c_2 \times c_2 \times 2 \times 2 = 3 c_1 c_2 + 4 c_2^2$$
[0074] With $c_2 = c_1$ this gives $7c_1^2$ against $9c_1^2$ for Conv, so PConv reduces parameters by 22.2% and increases the receptive field from 9 to 25, that is, by about 178%. When used to extract low-level features from infrared small targets (IRST), PConv replaces the first two Conv layers in networks such as YOLO. Taking $c_2 = 4c_1$, the ratio that matches the percentages quoted here, Conv needs $36c_1^2$ parameters and PConv requires $76c_1^2$. Therefore, PConv (k=3) compared to Conv increases the receptive field by 178% while increasing the parameters by only 111%. Similarly, PConv (k=4) increases the receptive field by 444% while increasing the parameters by only 122%, demonstrating that PConv can achieve efficient receptive field expansion with minimal parameter increases.
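The quoted percentages can be reproduced with a few lines of arithmetic. The sketch assumes bias-free convolutions, four branches of c2/4 output channels with k taps each, and a final 2×2 fusion convolution. The channel ratio c2 = 4·c1 in the second case and the receptive-field formula (2k − 1)² are inferred from the quoted figures rather than stated in the text, and the function names are hypothetical.

```python
def conv_params(c1, c2, k=3):
    """Weights of a standard k x k convolution (no bias)."""
    return c1 * c2 * k * k


def pconv_params(c1, c2, k=3):
    """Weights of four k-tap branches plus the 2x2 fusion convolution."""
    branch = 4 * (c1 * (c2 // 4) * k)
    fuse = c2 * c2 * 2 * 2
    return branch + fuse


def pconv_receptive_field(k):
    """Inferred from the quoted values: 25 for k=3 and 49 for k=4."""
    side = 2 * k - 1
    return side * side


def pct_change(new, old):
    return round((new / old - 1.0) * 100.0, 1)


equal_channels = pct_change(pconv_params(64, 64), conv_params(64, 64))
wide_k3 = pct_change(pconv_params(16, 64, k=3), conv_params(16, 64))
wide_k4 = pct_change(pconv_params(16, 64, k=4), conv_params(16, 64))
rf_k3 = pct_change(pconv_receptive_field(3), 9)
rf_k4 = pct_change(pconv_receptive_field(4), 9)
```

The two cases differ only in the channel ratio: when input and output widths match, PConv is cheaper than a 3×3 convolution, whereas in a widening stem layer the 2×2 fusion convolution dominates and the parameter count roughly doubles.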
[0075] Step 5.2: A lightweight new backbone is used in the design, combining C3k2 and RepViTBlock. The CSP main branch of C3k2 is retained, and the standard convolutional units in the residual branches of the C3k2 module are replaced with RepViTBlock units, forming the C3k2-RepViT fusion module. The C3k2-RepViTBlock structure is shown in Figure 5.
[0076] The C3k2 module is a fine-grained feature extraction unit in YOLOv13 for minor defects. C3k2 is the core feature aggregation module in stages 3-5 of the YOLOv13 backbone, based on the "splitting-aggregation" architecture of CSPNet (Cross Stage Partial Network). Its specific structure and parameters are as follows:
[0077] CSP branching logic: The input feature map is divided into a main branch and a residual branch in a 2:1 ratio according to the channel. The main branch only performs feature transfer through BN+SiLU, while the residual branch performs refined feature extraction.
[0078] Customized design of k=2 convolution: The residual branch uses a convolutional layer with kernel size k=2, stride s=1, and padding=1 (replacing the k=3 convolution of the traditional C3 module). The receptive field of this convolution is precisely matched to the physical size of typical small defects in PCBA. Addressing the characteristic of PCBA defects being "small local texture differences," the small receptive field of the k=2 convolution can accurately capture such subtle edge features, avoiding the blurring of defect details caused by the excessively large receptive field of the k=3 convolution.
[0079] The RepViTBlock module: A lightweight, re-parameterized local-global feature modeling unit. RepViTBlock is the core building block of the lightweight visual model RepViT, employing a structural re-parameterization design of "multi-branch training - single-branch inference." The multi-branch structure during training includes three parallel branches:
[0080] Branch 1: 1×1 convolution (channel compression ratio 0.5, such as compressing 64 channels into 32 channels), responsible for feature recalibration in the channel dimension;
[0081] Branch 2: 3×3 depthwise separable convolution (stride=1, padding=1) to extract spatial texture features of the local neighborhood;
[0082] Branch 3: Spatial Attention Submodule (CBAM Spatial Attention), which generates a spatial weight map through global average pooling and strengthens the weights of "suspected defect regions" in the feature map;
[0083] Single-branch fusion during inference phase: After training, the parameters of the three branches are merged into one 3×3 convolutional layer through matrix equivalence transformation;
[0084] Given the "densely distributed components" characteristic of PCBAs, RepViTBlock's spatial attention can model the global layout dependency of components within the array, avoiding misjudging "normal component gaps" as missing component defects.
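The idea behind the matrix equivalence transformation can be illustrated on the two linear branches. Because convolution is linear, a 1×1 kernel can be zero-padded to 3×3 and added to the 3×3 kernel, so one convolution reproduces the sum of both branches. The sketch is single-channel and covers only this convolutional case; BN folding and the attention branch are not modeled, and the function names are hypothetical.

```python
def conv3x3_same(img, kernel):
    """3x3 cross-correlation with zero padding 1 on a 2-D list of numbers."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * img[y][x]
            out[i][j] = acc
    return out


def merge_branches(k3, k1):
    """Fold a 1x1 kernel (a scalar here) into the centre tap of a 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1
    return merged


img = [[1.0, 2.0, 0.0], [4.0, 5.0, 6.0], [7.0, 0.0, 9.0]]
k3 = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
k1 = 3.0

# Training-time form: two parallel branches summed.
branch_sum = [
    [a + k1 * b for a, b in zip(row_conv, row_img)]
    for row_conv, row_img in zip(conv3x3_same(img, k3), img)
]
# Inference-time form: one merged 3x3 kernel.
single = conv3x3_same(img, merge_branches(k3, k1))
```

Both forms give the same feature map, which is why the merged layer can replace the multi-branch structure at inference without changing the output of these branches.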
[0085] This invention integrates C3k2 and RepViTBlock into a "C3k2-RepViT composite module," which is embedded in the fourth stage of the YOLOv13 backbone (this stage is responsible for extracting mid-to-high-dimensional features of PCBA defects). The specific process is as follows:
[0086] Branch replacement and structural connection: Replace the two k=2 convolutional layers in the original C3k2 residual branch with RepViTBlock. During training, the residual branch input is first fed into the three RepViTBlock branches: the 1×1 convolution compresses it to 32 channels, the 3×3 depthwise separable convolution extracts local texture, and the spatial attention submodule generates a 64-channel weight map that is multiplied element-wise with the feature map;
[0087] The RepViTBlock output is adjusted back to 64 channels by a 1×1 convolution, connecting with the BN and SiLU layers of the original residual branch, keeping the feature channels consistent with the main branch.
[0088] CSP Feature Aggregation Logic: The 64-channel coarse-grained features of the main branch (responsible for conveying the basic shape features of the components) and the 64-channel enhanced features of the residual branch after processing by RepViTBlock (including local textures of defects and global layout) are aggregated by channel-wise concatenation, yielding a 128-channel fused feature map that is passed to the neck stage for multi-scale feature fusion.
[0089] During the training phase, RepViTBlock maintains a multi-branch structure, working in conjunction with the CSP branching of C3k2 to learn both the local fine-grained features of PCBA defects and the global context. During the inference phase, RepViTBlock automatically merges into a single 3×3 convolutional layer, working in collaboration with the CSP structure of C3k2.
[0090] To address the detection requirements of PCBA defects characterized by "miniaturization, density, and feature ambiguity," the advantages of the fusion module are specifically reflected in the following aspects: Enhanced feature representation capability: C3k2's k=2 convolution captures fine-grained local features of defects, while RepViTBlock's attention mechanism models the global context of dense components. The combination of the two can effectively distinguish defect differences between similar components, reducing the false detection rate. During the training phase, RepViTBlock's multi-branch approach ensures accuracy, while during the inference phase, a single branch integrates C3k2's CSP structure, reducing the module's computational load by 32% compared to using the Transformer module alone, thus meeting the real-time requirements of PCBA detection. For scenarios involving lighting fluctuations and component occlusion in PCBA detection, the fusion module can simultaneously extract local textures and global correlation features, improving the mAP of detecting minute defects. Moreover, the parameter increment of the fusion module is only 18% of that of the original C3k2.
[0091] The collaborative design improvements of PConv and C3k2-RepViTBlock constitute a deep collaborative optimization chain from low-level feature extraction to mid-to-high-level feature refinement. Specifically, the PConv module, located in the shallow layers of the network, achieves a significant expansion of the receptive field through its asymmetric windmill-shaped convolutional kernel structure at a much lower parameter cost than standard convolution. This allows it to efficiently capture primary features containing global contextual information in the early stages of the network, laying a high signal-to-noise ratio input foundation for subsequent processing. Building on this, the C3k2-RepViTBlock module, deeply embedded in the upper layers of the backbone network, undertakes the core functions of feature refinement and high-level semantic fusion: its k=2 small convolutional kernel structure, inherited from C3k2, is responsible for pixel-level fine-grained analysis of the aforementioned primary features, accurately locating minute abnormal edges and texture changes; simultaneously, the introduced RepViTBlock module, through its reparameterized attention mechanism, dynamically models the global spatial dependencies of dense components, thereby determining the saliency of local features within the context of the overall layout. This progressive processing mechanism, formed by the wide-area context awareness implemented by PConv and the local and global collaborative parsing implemented by C3k2-RepViTBlock, enables the model to effectively overcome the core challenges of weak features of small targets and complex background interference in PCBA defect detection. 
The coupling of these two technologies not only ensures the model's lightweight characteristics in terms of parameter and computational cost, but more importantly, it achieves a systematic improvement in detection accuracy and robustness for typical defects such as missing components, offsets, and solder bridging, especially those ambiguous defects that are highly similar in appearance to normal components.
[0092] Step 5.3: The C3k2-RepViTBlock and HyperACE mechanisms, with their core logic of hierarchical progression from local to global, form a coupled connection relationship of feature input support, discriminative closure, and bidirectional optimization. As complementary feature enhancement units, they complete end-to-end collaboration.
[0093] The coupling relationship between feature input support, discriminative loop closure, and bidirectional optimization is as follows:
[0094] The C3k2-RepViT module, located in the middle of the backbone network, is responsible for local enhancement and preliminary semantic abstraction of low-level features (including features processed by PConv). Its output feature maps are rich in clearer local defect cues and inter-component relationship information. These high-quality, semantically rich feature maps are excellent inputs for the subsequent HyperACE mechanism to calculate global correlations. HyperACE can then more effectively establish long-distance dependencies, associating defect features or normal component backgrounds scattered in different locations, thereby further improving the model's ability to recognize complex defect patterns.
[0095] The C3k2-RepViT module focuses on feature extraction from the local to the component group level, enabling the identification of local relationships between components. The HyperACE mechanism, building upon this foundation, performs global feature reshaping and enhancement at the entire image level. For example, after a minor solder joint anomaly is keenly captured by the C3k2-RepViT module, the HyperACE mechanism can combine this with the normal state of similar components on the other side of the board to help confirm whether the anomaly is a systemic defect or isolated noise. This closed loop from precise local observation to global judgment greatly improves the model's discriminative reliability.
[0096] During end-to-end training, HyperACE's global gradient feedback guides the C3k2-RepViT module to extract features beneficial to global discrimination. Conversely, the highly discriminative local features provided by the C3k2-RepViT module enable HyperACE's hypergraph computation to converge to more effective correlation patterns. This bidirectional, loss function-mediated collaborative optimization allows the two modules to mutually promote each other, jointly enhancing the network's overall feature representation capability.
[0097] Step 5.4: The overall architecture of the YOLOv13 object detection model is mainly divided into three parts: the backbone, the neck, and the head. The head is responsible for making the final prediction. It receives the feature map fused from the neck and outputs three pieces of information: the coordinates of the bounding box, the class probability of the object in the box, and the confidence level that the box contains the object.
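As a non-limiting illustration of the three head outputs described above, the following Python sketch decodes one raw prediction vector into a box, a class label, and a score. The vector layout `[xc, yc, w, h, objectness, class probabilities...]`, the function name `decode_prediction`, the threshold value 0.25, and the class-name tuple are assumptions made for illustration only; they are not taken from the YOLOv13 implementation.

```python
# Hypothetical class names, following the four defect types named in the claims.
CLASSES = ("missing component", "misalignment", "tombstoning", "bridging")


def decode_prediction(pred, conf_threshold=0.25):
    """Decode one prediction vector (assumed layout, see lead-in).

    pred = [xc, yc, w, h, objectness, p_class0, p_class1, ...]
    Returns a dict with corner-format box, label and score, or None
    if the combined score falls below conf_threshold.
    """
    xc, yc, w, h, objectness = pred[:5]
    class_probs = pred[5:]
    # Index of the most probable class.
    best = max(range(len(class_probs)), key=class_probs.__getitem__)
    # Combined score: confidence that the box holds an object times class probability.
    score = objectness * class_probs[best]
    if score < conf_threshold:
        return None
    box = (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
    return {"box": box, "label": CLASSES[best], "score": score}


if __name__ == "__main__":
    print(decode_prediction([50, 40, 20, 10, 0.9, 0.05, 0.05, 0.1, 0.8]))
```

Multiplying the objectness by the class probability is one common way to rank candidate boxes; other score definitions are possible.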
[0098] A convolution and attention fusion module (CAFMAttention) is added to the detection head. The module contains a local branch and a global branch, fusing convolution with an attention mechanism to capture both local and global features. Figure 6 is a schematic diagram of the Convolution and Attention Fusion Module (CAFM). In the local branch, convolution and channel transformation are used for local feature extraction. In the global branch, an attention mechanism is used to model long-range feature dependencies.
[0099] The convolution and attention fusion module (CAFMAttention) is placed immediately before the final detection convolutional layer (i.e., the convolutional layer that generates bounding box coordinates, class confidence, and object confidence), so that the multi-scale feature maps output by the neck network pass through it first. The module takes these multi-scale feature maps as input. Its core structure consists of parallel local and global branches, and it enhances the detection head's ability to discriminate defect features through feature fusion. The specific deployment and structure are as follows: the input feature map is first copied and fed into the two branches.
[0100] Local branch: The local branch extracts local features through convolution and channel rearrangement. It focuses on local information in the input feature maps and supports the joint modeling of global and local features.
[0101] Global branch: The global branch uses an attention mechanism to model long-range feature dependencies. Through this attention mechanism, the model captures information over a wider range of the input feature maps and thereby gains a better understanding of global features.
[0102] Fusion operation: In the CAFM module, the features of the local and global branches are fused, typically by element-wise addition. This fusion combines local and global information and improves the detection head's ability to discriminate defect features.
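As a non-limiting illustration of the parallel-branch structure described above, the following Python sketch applies a toy local branch (a three-tap convolution) and a toy global branch (scalar self-attention) to a one-dimensional feature sequence and fuses them by element-wise addition. It is a deliberately simplified sketch and not the CAFM implementation: the function names, the kernel weights, and the reduction to one dimension and a single channel are all assumptions made for illustration.

```python
import math


def local_branch(x, kernel=(0.25, 0.5, 0.25)):
    """Toy local branch: zero-padded 1-D convolution.

    Each output position only mixes values from its immediate neighbours.
    The kernel weights are arbitrary illustrative values.
    """
    n, half = len(x), len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence count as zero
                acc += weight * x[j]
        out.append(acc)
    return out


def global_branch(x):
    """Toy global branch: scalar self-attention with query = key = value = x.

    Each output position is a softmax-weighted sum over ALL positions,
    which is what lets it model long-range dependencies.
    """
    out = []
    for q in x:
        scores = [q * k for k in x]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def cafm_like(x):
    """Feed the same input to both branches and fuse by element-wise addition."""
    return [a + b for a, b in zip(local_branch(x), global_branch(x))]


if __name__ == "__main__":
    print(cafm_like([0.0, 1.0, 0.0, 0.0, 2.0]))
```

The fused output has the same shape as the input, so such a module can be placed in front of an existing detection layer without changing that layer's input dimensions.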
[0103] Step 5.5: The Inner-IoU loss function is introduced to replace the original loss function in YOLOv13. This improves generalization ability and convergence speed, and thereby the overall localization performance of the model.
[0104] The IoU loss function is widely used in object detection. In bounding box regression, it not only evaluates the quality of the regression state but also accelerates convergence by propagating the gradient of the computed regression loss.
[0105] The Inner-IoU loss function is a bounding box regression loss for object detection that computes the IoU loss on auxiliary bounding boxes. The core of the method lies in using auxiliary bounding boxes of different scales to calculate the loss: smaller auxiliary boxes accelerate convergence for high-IoU samples, and larger auxiliary boxes are used for low-IoU samples. The method inherits the characteristics of the IoU loss while adding its own, giving the loss function better generalization ability and faster convergence in different detection tasks. The figure shows a schematic diagram of Inner-IoU, where the ground truth (GT) box and the anchor box are denoted $B^{gt}$ and $B$, respectively. The center point shared by the GT box and the inner GT box is denoted $(x_c^{gt}, y_c^{gt})$, and the center point shared by the anchor box and the inner anchor box is denoted $(x_c, y_c)$. The width and height of the GT box are denoted $w^{gt}$ and $h^{gt}$, respectively, and the width and height of the anchor box are denoted $w$ and $h$. The variable $ratio$ is the scale factor that controls the size of the auxiliary bounding box. Its typical value range is [0.5, 1.5]. On different datasets, adjusting the ratio value can yield better detection results. The Inner-IoU loss function can be defined as:
[0106] $$b_l^{gt} = x_c^{gt} - \frac{w^{gt} \cdot ratio}{2}, \quad b_r^{gt} = x_c^{gt} + \frac{w^{gt} \cdot ratio}{2}$$
[0107] $$b_t^{gt} = y_c^{gt} - \frac{h^{gt} \cdot ratio}{2}, \quad b_b^{gt} = y_c^{gt} + \frac{h^{gt} \cdot ratio}{2}$$
[0108] $$b_l = x_c - \frac{w \cdot ratio}{2}, \quad b_r = x_c + \frac{w \cdot ratio}{2}$$
[0109] $$b_t = y_c - \frac{h \cdot ratio}{2}, \quad b_b = y_c + \frac{h \cdot ratio}{2}$$
[0110] $$inter = \left(\min(b_r^{gt}, b_r) - \max(b_l^{gt}, b_l)\right) \cdot \left(\min(b_b^{gt}, b_b) - \max(b_t^{gt}, b_t)\right)$$
[0111] $$union = (w^{gt} \cdot h^{gt}) \cdot ratio^2 + (w \cdot h) \cdot ratio^2 - inter$$
[0112] $$IoU^{inner} = \frac{inter}{union}$$
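As a non-limiting illustration, the Inner-IoU computation described above can be sketched in Python as follows. Boxes are given in center format `(xc, yc, w, h)`. The function name `inner_iou` and the clamping of negative overlaps to zero (so that disjoint boxes yield 0) are assumptions added for this sketch.

```python
def inner_iou(box, gt, ratio=0.8):
    """IoU of the ratio-scaled auxiliary boxes of an anchor box and a GT box.

    box, gt: (xc, yc, w, h) in center format.
    ratio:   scale factor of the auxiliary boxes (ratio = 1 gives plain IoU).
    """
    xc, yc, w, h = box
    xc_gt, yc_gt, w_gt, h_gt = gt

    # Edges of the auxiliary (inner) anchor box.
    b_l, b_r = xc - w * ratio / 2, xc + w * ratio / 2
    b_t, b_b = yc - h * ratio / 2, yc + h * ratio / 2
    # Edges of the auxiliary (inner) GT box.
    g_l, g_r = xc_gt - w_gt * ratio / 2, xc_gt + w_gt * ratio / 2
    g_t, g_b = yc_gt - h_gt * ratio / 2, yc_gt + h_gt * ratio / 2

    # Overlap of the two auxiliary boxes; clamping to zero is an added
    # safeguard for boxes that do not intersect.
    inter_w = max(0.0, min(g_r, b_r) - max(g_l, b_l))
    inter_h = max(0.0, min(g_b, b_b) - max(g_t, b_t))
    inter = inter_w * inter_h

    union = w_gt * h_gt * ratio ** 2 + w * h * ratio ** 2 - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    anchor, target = (0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0)
    for r in (0.5, 1.0, 1.5):
        print(r, inner_iou(anchor, target, ratio=r))
```

For this partially overlapping pair, a ratio above 1 raises the auxiliary IoU relative to the plain IoU and a ratio below 1 lowers it, which matches the behavior described for low-IoU and high-IoU samples.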
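[0113] Inner-IoU loss inherits some characteristics from IoU loss while also possessing its own. Like IoU loss, the value range of Inner-IoU loss is [0,1]. Because the auxiliary bounding box differs from the actual bounding box only in scale, the loss function is calculated in the same way, and the Inner-IoU-Deviation curve is similar to the IoU-Deviation curve. Compared to IoU loss, when the ratio is less than 1, the auxiliary bounding box is smaller than the actual bounding box, and its effective regression range is smaller than that of IoU loss. However, the absolute value of its gradient is greater than that obtained from IoU loss, which accelerates the convergence of high-IoU samples. Conversely, when the ratio is greater than 1, the larger auxiliary bounding box expands the effective regression range, providing a gain for low-IoU regression. Inner-IoU can be applied to the existing IoU-based bounding box regression loss functions to obtain $L_{Inner\text{-}IoU}$, $L_{Inner\text{-}GIoU}$, $L_{Inner\text{-}DIoU}$, $L_{Inner\text{-}CIoU}$, $L_{Inner\text{-}EIoU}$, and $L_{Inner\text{-}SIoU}$, where $IoU$ is the standard IoU of the GT box and the anchor box and $IoU^{inner}$ is the IoU of their auxiliary bounding boxes. The calculation method is as follows:
[0114] $$L_{Inner\text{-}IoU} = 1 - IoU^{inner}$$
[0115] $$L_{Inner\text{-}GIoU} = L_{GIoU} + IoU - IoU^{inner}$$
[0116] $$L_{Inner\text{-}DIoU} = L_{DIoU} + IoU - IoU^{inner}$$
[0117] $$L_{Inner\text{-}CIoU} = L_{CIoU} + IoU - IoU^{inner}$$
[0118] $$L_{Inner\text{-}EIoU} = L_{EIoU} + IoU - IoU^{inner}$$
[0119] $$L_{Inner\text{-}SIoU} = L_{SIoU} + IoU - IoU^{inner}$$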
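As a non-limiting illustration of applying Inner-IoU to an existing IoU-based loss, the following Python sketch takes GIoU as the base loss and forms the Inner-GIoU loss as the GIoU loss plus the standard IoU minus the auxiliary-box IoU. The helper names, the center-format box layout `(xc, yc, w, h)`, and the choice of GIoU as the example are assumptions made for this sketch. Boxes are assumed to have positive width and height.

```python
def _edges(box, ratio=1.0):
    """Left, top, right, bottom edges of a center-format box scaled by ratio."""
    xc, yc, w, h = box
    return (xc - w * ratio / 2, yc - h * ratio / 2,
            xc + w * ratio / 2, yc + h * ratio / 2)


def _inter_union(box, gt, ratio=1.0):
    """Intersection and union areas of the two ratio-scaled boxes."""
    l1, t1, r1, b1 = _edges(box, ratio)
    l2, t2, r2, b2 = _edges(gt, ratio)
    inter = (max(0.0, min(r1, r2) - max(l1, l2))
             * max(0.0, min(b1, b2) - max(t1, t2)))
    union = (r1 - l1) * (b1 - t1) + (r2 - l2) * (b2 - t2) - inter
    return inter, union


def iou(box, gt, ratio=1.0):
    """Plain IoU for ratio = 1, auxiliary-box (inner) IoU otherwise."""
    inter, union = _inter_union(box, gt, ratio)
    return inter / union if union > 0 else 0.0


def giou_loss(box, gt):
    """GIoU loss: 1 - IoU + (enclosing area - union) / enclosing area."""
    inter, union = _inter_union(box, gt)
    l1, t1, r1, b1 = _edges(box)
    l2, t2, r2, b2 = _edges(gt)
    enclose = (max(r1, r2) - min(l1, l2)) * (max(b1, b2) - min(t1, t2))
    return 1.0 - inter / union + (enclose - union) / enclose


def inner_giou_loss(box, gt, ratio=0.8):
    """Inner-GIoU loss = GIoU loss + IoU - inner IoU."""
    return giou_loss(box, gt) + iou(box, gt) - iou(box, gt, ratio)


if __name__ == "__main__":
    anchor, target = (0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0)
    for r in (0.5, 1.0, 1.5):
        print(r, inner_giou_loss(anchor, target, ratio=r))
```

With a ratio of 1 the correction term vanishes and the result equals the base GIoU loss. The same additive pattern applies to the DIoU, CIoU, EIoU, and SIoU variants.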
[0120] Step 5.6: Figure 7 shows the structure of the PCBA defect detection model. The model is trained on the training set, and the trained model is then verified on the PCBA board image data obtained in step 3.
[0121] Step 6: Train classic object detection models on the labeled data and compare them with the fused and improved YOLOv13 model. The detection results of the models are compared against the ground-truth annotations to evaluate the improved model's detection accuracy and efficiency.
[0122] The dataset was input into the classic object detection models and the model of this invention for comparative experiments. The results are shown in Table 1:
[0123] Table 1
[0124]
| Model | Params (10^6) | GFLOPs | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Precision (%) | Recall (%) |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv5 | 7.0 | 15.6 | 83.2 | 74.3 | 81.7 | 78.2 |
| YOLOv8 | 3.0 | 8.3 | 85.8 | 76.8 | 85.1 | 78.6 |
| YOLOv11 | 2.5 | 6.5 | 87.7 | 77.5 | 86.9 | 79.3 |
| This invention model | 1.4 | 4.7 | 90.4 | 79.9 | 88.9 | 80.5 |
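As a worked example of reading Table 1, the following Python sketch computes the relative reduction in parameters and GFLOPs and the gain in mAP@0.5 of the model of this invention over a chosen baseline. The numbers are copied from Table 1. The dictionary layout and the function names are assumptions made for illustration.

```python
# Values copied from Table 1 (parameters in millions, mAP@0.5 in percent).
TABLE_1 = {
    "YOLOv5": {"params_m": 7.0, "gflops": 15.6, "map50": 83.2},
    "YOLOv8": {"params_m": 3.0, "gflops": 8.3, "map50": 85.8},
    "YOLOv11": {"params_m": 2.5, "gflops": 6.5, "map50": 87.7},
    "This invention": {"params_m": 1.4, "gflops": 4.7, "map50": 90.4},
}


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline * 100.0


def compare(baseline_name, ours_name="This invention"):
    """Summarize the difference between a baseline row and the proposed model."""
    base, ours = TABLE_1[baseline_name], TABLE_1[ours_name]
    return {
        "params_reduction_pct": round(
            relative_reduction(base["params_m"], ours["params_m"]), 1),
        "gflops_reduction_pct": round(
            relative_reduction(base["gflops"], ours["gflops"]), 1),
        # Difference in percentage points, not a relative change.
        "map50_gain_points": round(ours["map50"] - base["map50"], 1),
    }


if __name__ == "__main__":
    for name in ("YOLOv5", "YOLOv8", "YOLOv11"):
        print(name, compare(name))
```

On these figures, the model of this invention has 44.0% fewer parameters than YOLOv11 while its mAP@0.5 is 2.7 percentage points higher.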
[0125] The foregoing description of the embodiments enables those skilled in the art to make or use the present invention. Various modifications to the embodiments will be readily apparent to those skilled in the art. The general principles of the invention may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention should not be limited to the embodiments shown herein, but should cover the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A PCBA defect detection method integrating improved YOLOv13, characterized in that the method includes the following steps:
S1: Collect actual PCBA samples from production and manually supplement them with four types of defect samples: missing components, misalignment, tombstoning, and bridging; use an industrial camera to capture their image data, and upload and save the captured image data;
S2: Perform dataset annotation, preprocessing, augmentation, and partitioning on the image data described in S1;
S3: Build a PCBA defect detection model based on the YOLOv13 model;
S3.1: Replace the standard convolution of the original backbone network with the windmill-shaped convolution PConv, which uses an asymmetric padding method;
S3.2: Retain the CSP branch structure of C3k2 and replace the k=2 convolution of the residual branch with RepViTBlock to form the fusion module C3k2-RepViTBlock, which is embedded into the 4th stage of the YOLOv13 backbone;
S3.3: Use the output of C3k2-RepViTBlock as the core input of the HyperACE mechanism, achieving bidirectional gradient feedback and collaborative optimization mediated by the loss function and forming a coupled feature enhancement from local to global;
S4: Train the PCBA defect detection model using labeled data and evaluate its performance in PCBA target detection.
2. The PCBA defect detection method integrating improved YOLOv13 according to claim 1, characterized in that the specific steps for S2 are as follows:
Dataset annotation: The LabelImg v1.8.6 annotation tool is used to annotate the images obtained in S1; the labels are named "missing parts", "offset", "tombstone", and "tin connection" (solder bridging), and the label boxes fit closely to the target boundaries;
Format conversion: The annotated image data is saved as ".xml" files and then converted into ".tfrecord" files using a Python script; annotation errors are corrected through two-person cross-validation to obtain the initial PCBA dataset;
Preprocessing: Redundant background is cropped from the image and the effective area of the PCBA is retained; min-max normalization is used to map pixel values to [0,1]; image translation is used to make the center of the PCBA board coincide with the center of the image; finally, an image in 224×224×3 format is output;
Data augmentation and partitioning: The dataset and its corresponding annotation information are augmented using six methods: horizontal flipping, vertical flipping, translation, clockwise rotation, counterclockwise rotation, and Gaussian blur; the training set, test set, and validation set are randomly partitioned in a 7:2:1 ratio.
3. The PCBA defect detection method integrating improved YOLOv13 according to claim 1, characterized in that the specific steps in S3.1 include: the windmill-shaped convolution PConv is used to replace the standard convolution in the lower layers of the backbone network. PConv uses asymmetric padding and contains four parallel convolutional branches with kernel sizes of 1×3, 3×1, 1×3, and 3×1 and padding parameters of P(0,1,0,3), P(0,3,0,1), P(0,1,3,0), and P(3,0,1,0), respectively. The output of each convolution is processed by a BN layer and the SiLU activation function to enhance feature extraction and enlarge the receptive field, while introducing only a minimal increase in parameters and reducing the overall number of parameters and computational cost.
4. The PCBA defect detection method integrating improved YOLOv13 according to claim 3, characterized in that the specific steps in S3.2 include: the CSP main branch of C3k2 in the original YOLOv13 model is retained, and the standard convolutional units in the residual branch of the C3k2 module are replaced with RepViTBlock to form the fusion module C3k2-RepViTBlock. C3k2-RepViTBlock is embedded in the fourth stage of the YOLOv13 backbone network. C3k2-RepViTBlock receives the high signal-to-noise ratio input provided by the PConv module and dynamically models the global spatial dependencies of dense components through the reparameterized attention mechanism of RepViTBlock, forming a progressive feature processing structure of wide-area context awareness → local → global collaborative parsing.
5. The PCBA defect detection method integrating improved YOLOv13 according to claim 4, characterized in that the specific steps in S3.3 include: C3k2-RepViTBlock and the HyperACE mechanism use a hierarchical progression from local to global as their core logic, forming a coupled connection relationship of feature input support, discriminative closed loop, and bidirectional optimization; the two, as complementary feature enhancement units, complete end-to-end collaboration;
Feature input support: C3k2-RepViTBlock is deployed in the middle of the backbone network to perform local enhancement and preliminary semantic abstraction of the low-level features, and outputs feature maps with rich fine-grained defect clues and high semantic quality, which serve as the core input of the HyperACE mechanism and provide a feature foundation for global information fusion;
Discriminative closed loop: C3k2-RepViTBlock extracts and identifies local correlations from local pixels to component groups to capture defect clues; based on these local feature results, the HyperACE mechanism completes global feature reorganization and enhancement at the whole-image level and combines global information to judge the authenticity and type of the locally captured clues, forming a discriminative closed loop from local to global;
Bidirectional optimization: In end-to-end training, C3k2-RepViTBlock and the HyperACE mechanism form a bidirectional collaborative optimization connection through the loss function; the global gradient feedback of HyperACE guides C3k2-RepViTBlock to selectively extract local features that are valuable for global discrimination, and the strongly discriminative local features output by C3k2-RepViTBlock drive the hypergraph computation of HyperACE to converge to a more effective global correlation pattern, thereby optimizing the overall feature representation capability of the network.
6. The PCBA defect detection method integrating improved YOLOv13 according to claim 5, characterized in that the Inner-IoU loss function is introduced to optimize the original YOLOv13 loss function: the Inner-IoU loss function with a scale factor ratio of 0.8 replaces the original loss function to optimize the bounding box regression.
7. The PCBA defect detection method integrating improved YOLOv13 according to claim 1, characterized in that the PCBA defect detection model adds a convolution and attention fusion module CAFMAttention to the detection head, placed immediately before the final detection convolutional layer so that the multi-scale feature maps output by the neck network pass through it first. The convolution and attention fusion module CAFMAttention includes a local branch and a global branch, and the features of the two branches are fused by element-wise addition to enhance the ability to extract global and local features.