Weld intelligent defect detection model training method, detection method and electronic equipment

By combining the YOLOv8 object detection network and the PPO reinforcement learning model, weld defect detection is optimized, solving the problem of high false alarm rate in weld defect detection in complex industrial scenarios, and achieving high-precision weld defect detection with low false alarm rate.

CN122199516A · Pending · Publication Date: 2026-06-12 · CHONGQING UNIV +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHONGQING UNIV
Filing Date: 2026-04-21
Publication Date: 2026-06-12

Application Information

Patent Timeline

21 Apr 2026: Application
12 Jun 2026: Publication (CN122199516A)

IPC: G06T7/00; G06V10/774; G06N3/0464; G06N3/092; G06V10/40

AI Tagging

Application Domain

Image analysis; Character and pattern recognition


AI Technical Summary

⚠Technical Problem

Existing weld defect detection methods have a high false alarm rate in complex industrial scenarios, requiring manual verification, and traditional methods have limited false alarm suppression capabilities.

⚗Method used

The YOLOv8 object detection network is combined with the PPO reinforcement learning model. By constructing a reinforcement learning environment, the object detection model is trained using a sample set of weld seam X-ray images. An experience pool is built to train the PPO reinforcement learning model, optimize the adaptive decision-making of candidate boxes, delete false positive boxes and retain reliable candidate boxes.

🎯Benefits of technology

It significantly reduces the number of false alarms, improves detection accuracy and recall, maintains high detection precision and reliability, and reduces the false alarm rate of weld X-ray images in complex industrial scenarios.

✦ Generated by Eureka AI based on patent content.


Abstract

The present application belongs to the technical field of machine vision, and provides a welding seam intelligent defect detection model training method, a detection method and an electronic device. The training method comprises: training a target detection network using a welding seam X-ray image sample set to obtain a target detection model; obtaining a preliminary detection result of each welding seam X-ray image using the target detection model; constructing a state of a reinforcement learning environment using each welding seam X-ray image and the preliminary detection result thereof; calculating an immediate reward value of each action in a set of actions performed in each state according to a reward calculation rule; forming a triple consisting of each state, the action performed in the state and the immediate reward value of the action; a plurality of triples form an experience pool; training a PPO reinforcement learning model using the experience pool; and sequentially connecting the target detection model and a policy network of the PPO reinforcement learning model to obtain a welding seam intelligent defect detection model. The present application can effectively reduce the number of false positives while maintaining a high recall rate.


Description

Technical Field

[0001] This invention relates to the field of machine vision technology, and in particular to a training method, a detection method, and electronic equipment for an intelligent weld defect detection model.

Background Technology

[0002] Currently, in industrial settings such as boiler piping, weld quality inspection typically relies on weld X-ray images for defect identification. Traditional weld defect detection methods involve experienced inspectors visually interpreting weld X-ray images, identifying and judging defects such as porosity, lack of fusion, and cracks based on their experience. This method is highly dependent on operators, labor-intensive, inefficient, and easily affected by subjective factors.

[0003] With the development of deep learning and object detection technologies, automatic weld defect detection methods based on convolutional neural networks have been introduced. These methods apply general object detection models such as YOLO (You Only Look Once, a real-time object detection network) to weld X-ray images, which improves detection efficiency to some extent and reduces the burden of manual interpretation. However, in complex industrial scenarios, weld regions exhibit complex backgrounds, uneven contrast, numerous artifacts, diverse defect shapes, and significant size variations. At weld edges and in noisy areas, the object detection model easily generates numerous candidate boxes unrelated to actual defects. Defect-free images are also problematic: even in a defect-free weld X-ray image, the object detection model may still output incorrect defect candidate boxes. Both effects lead to a high false positive rate and reduce accuracy, so each detection result must be manually reviewed and filtered.

[0004] To address the high false alarm rate, related technologies reduce false positives by adjusting detection thresholds and applying simple confidence screening. However, these solutions have limited false alarm suppression capability and still require manual review and screening of the target detection results one by one.

Summary of the Invention

[0005] This application aims to solve at least the technical problems existing in the prior art and provides a training method, a detection method, and electronic equipment for an intelligent weld defect detection model.

[0006] In a first aspect, this application provides a training method for an intelligent weld defect detection model. The method includes: acquiring a sample set of weld X-ray images; training a target detection network using the sample set to obtain a target detection model; using the target detection model to identify defects in multiple weld X-ray images to obtain preliminary detection results for each weld X-ray image; constructing a state of a reinforcement learning environment using each weld X-ray image and its preliminary detection results; calculating the immediate reward value of each action in the action set executed in each state according to reward calculation rules; forming a triplet with each state, the action executed in that state, and the immediate reward value of the action executed, and multiple triplets forming an experience pool; constructing a PPO reinforcement learning model and training the PPO reinforcement learning model using the experience pool; and sequentially connecting the target detection model and the policy network of the trained PPO reinforcement learning model to obtain the intelligent weld defect detection model.

[0007] In a second aspect, this application provides a method for intelligent weld defect detection based on target detection and PPO reinforcement learning. The method includes: inputting an X-ray image of the weld to be tested; using the target detection model of the intelligent weld defect detection model to identify defects in the X-ray image of the weld to be tested and obtain a preliminary detection result; constructing test states based on the preliminary detection result and the X-ray image of the weld to be tested; and, if one or more test states are constructed, inputting each test state into the policy network of the PPO reinforcement learning model of the intelligent weld defect detection model to obtain the action probability distribution of that test state, and performing a delete action or a keep action on the candidate box corresponding to that test state according to the action probability distribution. The intelligent weld defect detection model is trained according to the training method described in the first aspect of this application.

[0008] In a third aspect, this application provides a computer device including a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the method described in the first or second aspect of this application.

[0009] In a fourth aspect, this application provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the method described in the first or second aspect of this application.

[0010] In a fifth aspect, this application provides an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the method described in the first or second aspect of this application.

[0011] The beneficial technical effects of this application are as follows:

1. In the training process of the intelligent weld defect detection model, the target detection network is first trained using a sample set of weld X-ray images to obtain the target detection model, and the target detection model is used to obtain preliminary detection results for the weld X-ray images. Based on the preliminary detection results of multiple weld X-ray images, the action set, and the reward calculation rules, multiple triples are constructed to form an experience pool. One candidate box corresponds to one triple, and the state of the triple includes the weld X-ray image and the candidate box information. The experience pool is used to train the PPO (Proximal Policy Optimization) reinforcement learning model, enabling it to learn an adaptive decision-making mechanism that deletes false positive candidate boxes from the preliminary detection results and retains reliable ones, which gives it a significant advantage in suppressing false positives. Finally, the target detection model and the policy network of the PPO reinforcement learning model are connected to obtain the trained intelligent weld defect detection model, which can effectively reduce the number of false positives while maintaining a high recall rate.

2. The intelligent weld defect detection method provided in this application, based on the trained intelligent weld defect detection model, first performs preliminary detection on the input weld X-ray image using the trained target detection model. The PPO reinforcement learning model then adaptively judges each predicted box according to its policy, deleting false positives and retaining reliable candidate boxes, and finally outputs the defect detection result optimized by reinforcement learning. For weld X-ray images in complex industrial scenarios, this method effectively filters out erroneous candidate boxes in the preliminary detection results caused by background noise, artifacts, or non-defect structures, significantly reducing the number of false positives and improving the detection accuracy of weld X-ray images. Furthermore, while deleting false positives, it minimizes the deletion of true positives (TPs), maintaining the integrity of defect identification. The high recall rate is therefore preserved, and the reliability of the detection results improves.

Attached Figure Description

[0012] Figure 1 is a flowchart of a preferred embodiment of the intelligent weld defect detection model training method of the present invention.
Figure 2 is a schematic diagram of the specific process of the intelligent weld defect detection model training method in an example of the present invention.
Figure 3 is a schematic diagram of the intelligent weld defect detection model structure in a preferred embodiment of the present invention.
Figure 4 is a schematic diagram of the policy network structure in a preferred embodiment of the present invention.
Figure 5 is a schematic diagram of the evaluation network structure in a preferred embodiment of the present invention.
Figure 6 is a flowchart of a preferred embodiment of the intelligent weld defect detection method of the present invention.
Figure 7 is a schematic diagram of an experimental effect comparison in one example of the present invention.
Figure 8 is a schematic diagram of the structure of an electronic device in a preferred embodiment of the present invention.

Detailed Implementation

[0013] Embodiments of the present invention are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.

[0014] The execution entity of the intelligent weld defect detection model training method or the intelligent weld defect detection method based on target detection and PPO reinforcement learning provided in this application includes, but is not limited to, at least one of the following electronic devices that can be configured to execute the method provided in the embodiments of this application: a server, a terminal, or a computer. In other words, the intelligent weld defect detection model training method can be executed by software or hardware installed on a terminal device or a server device, and the software can be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster. The server can be an independent server or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

[0015] This application provides a method for training an intelligent weld defect detection model. In a preferred embodiment, as shown in Figure 1 and Figure 2, the method includes the following steps. Step A1: Obtain a sample set of weld X-ray images, train a target detection network using the sample set of weld X-ray images, and obtain a target detection model.

[0016] For example, and without limitation, X-ray images of the weld seams under inspection are acquired using an industrial digital radiographic inspection system. Generally, such a system includes a constant-potential X-ray generator and a linear array or flat-panel digital detector. The X-rays emitted by the constant-potential X-ray generator penetrate the weld seam under inspection and form an X-ray image of the weld seam on the linear array or flat-panel digital detector. Weld X-ray images are generally grayscale images, which convert the intensity differences of the rays after they penetrate the material into different brightness levels. The thicker or denser the weld, the more rays it absorbs, and the darker the weld X-ray image appears (or brighter, depending on the image polarity setting, although darker generally represents a thicker area). Figure 7 shows X-ray images of welds in industrial settings such as boiler pipes. They are taken with double-wall double-image (DWDI) exposure technology, in which an X-ray beam penetrates two layers of pipe wall at a certain angle and forms two staggered elliptical images on the linear array or flat-panel digital detector. This allows a relatively long area of the weld to be inspected in a single exposure.

[0017] For example, weld defect types include, but are not limited to, porosity, cracks, lack of fusion, undercut, and burn-through. Porosity forms when gas fails to escape during weld metal solidification, and appears on weld X-ray images as isolated or dense black spots of regular shape (mostly circular or elliptical). Cracks appear as thin black lines, which can be straight, curved, or branching. Lack of fusion appears as continuous or discontinuous black lines and is divided into bevel lack of fusion and interlayer lack of fusion. Bevel lack of fusion appears as a black line parallel to the base metal on the sidewall of the weld bevel. Interlayer lack of fusion appears between weld beads in multi-layer welds, usually as a short black line that is generally straight and extends along the weld direction; its position is fixed, and unlike a crack it does not branch or change direction. Undercut appears as continuous or intermittent recessed black lines or bands at the junction of the weld and the base metal. In images, it appears as a sudden inward indentation of the weld outline; the reduced thickness of this area makes it appear darker than normal weld metal. Burn-through appears as a localized, circular or elliptical, deep black (or extremely bright) region.

[0018] It should be noted that the weld X-ray image sample set can be pre-stored in a database, and the execution entity of this method reads the weld X-ray image sample set from the database. The weld X-ray image sample set includes multiple weld X-ray image samples. Each weld X-ray image sample includes one weld X-ray image and its true label. The true label can be annotated by experts and includes one or more true defect annotation boxes or an empty box, where an empty box indicates that the weld X-ray image does not contain weld defects. The true defect annotation boxes cover the type of weld defect in the weld X-ray image and the bounding box (i.e., annotation box) information.

[0019] In one example, 38,994 weld X-ray images were acquired, preprocessed, and labeled with ground truth, generating 38,994 weld X-ray image samples. These samples were then divided into a training set and a validation set in a 9:1 ratio. Preprocessing includes, but is not limited to, pixel value normalization and size adjustment.

The target detection network can be from the YOLO family. YOLOv8 (an existing target detection network) can be used; specifically, variants such as YOLOv8m or YOLOv8l can be selected to balance accuracy and speed. YOLOv8 employs an anchor-free design, and its backbone network and Feature Pyramid Network (FPN) can effectively extract multi-scale features of weld images, from macroscopic morphology to microscopic defects.

The target detection network is trained using the training set. During training, a loss function is calculated from the detection results (preliminary detection results) output by the network and the true labels of the sample weld X-ray images. The loss function may be, but is not limited to, a weighted sum of an existing bounding box regression loss and a defect category prediction loss. The network parameters are updated using Stochastic Gradient Descent (SGD) based on the loss function value. Training stops when a stopping condition is met, and the set of network parameter values at which the loss function value is minimized is saved. The stopping condition may be, but is not limited to, reaching the preset maximum number of training iterations or the loss function value falling below a preset loss threshold. The saved parameter values are then loaded into the target detection network to obtain the trained target detection model.

The trained target detection model is validated using the validation set. If validation succeeds, the validated model becomes the final target detection model. If validation fails, deep learning parameter tuning methods, such as changing the learning rate or the training optimizer, are applied and the model is trained again on the training set. In this example, YOLOv8 was fully trained on the training set, and the model parameters were updated over multiple iterations to obtain the optimal target detection model capable of performing preliminary detection of weld defects. The candidate boxes output by the trained target detection model are used as input to the subsequent reinforcement learning process.
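The stopping rule and "save the parameters with the lowest loss" behaviour described above can be sketched as follows. This is a minimal illustration, not the applicant's training code: `train_with_stopping`, `step_fn`, and the toy loss sequence are hypothetical names and values, and a real run would call an actual YOLOv8 training step instead of the stub.

```python
from typing import Callable, Optional, Tuple


def train_with_stopping(
    step_fn: Callable[[int], Tuple[float, dict]],
    max_iters: int,
    loss_threshold: float,
) -> Tuple[Optional[dict], float, int]:
    """Run step_fn until a stopping condition holds.

    Returns the parameters with the lowest loss seen, that loss,
    and the number of iterations actually run.
    """
    best_params: Optional[dict] = None
    best_loss = float("inf")
    iters_run = 0
    for i in range(max_iters):          # stop condition 1: max iterations
        loss, params = step_fn(i)
        iters_run = i + 1
        if loss < best_loss:            # keep the minimum-loss parameters
            best_loss, best_params = loss, dict(params)
        if loss < loss_threshold:       # stop condition 2: loss threshold
            break
    return best_params, best_loss, iters_run


# Toy stand-in for a real training step (hypothetical loss values).
toy_losses = [0.9, 0.5, 0.7, 0.2, 0.05, 0.3]


def fake_step(i: int) -> Tuple[float, dict]:
    return toy_losses[i], {"iteration": i}


best_params, best_loss, iters_run = train_with_stopping(
    fake_step, max_iters=len(toy_losses), loss_threshold=0.1
)
```

In this toy run the loop ends as soon as the loss drops below the threshold, so the final value in the sequence is never evaluated.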

[0020] Step A2: Use the target detection model to identify defects in multiple weld X-ray images and obtain preliminary detection results for each weld X-ray image.

[0021] For example, 2,601 weld X-ray images containing real weld defects and 3,187 weld X-ray images without weld defects were selected as detection samples to comprehensively cover weld imaging under different working conditions. The preliminary detection results of the weld X-ray images without weld defects were mainly used to obtain false positive bounding boxes, which serve as the negative samples required for subsequent PPO reinforcement learning training. The trained target detection model was applied to the selected weld X-ray images to obtain preliminary detection results. A preliminary detection result includes information on one or more candidate bounding boxes, namely the defect category and the coordinates, width, and height of each candidate bounding box.

[0022] Step A3 involves constructing the state of the reinforcement learning environment using each weld X-ray image and its preliminary detection results. The preliminary detection result for each weld X-ray image includes one or more candidate bounding boxes or is empty. The candidate bounding box information includes the defect category, coordinates, width, and height. An empty preliminary detection result indicates that the target detection model did not detect any weld defects in the weld X-ray image. The defect category can be numerically represented, with different values representing different defect types. Therefore, to enable the subsequent PPO reinforcement learning model to more accurately determine the authenticity of each candidate bounding box and learn decision-making experience, preferably, each weld X-ray image and each candidate bounding box information in the preliminary detection result of that weld X-ray image constitute a state, i.e., the state of the reinforcement learning environment.
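As an illustration of the state construction in step A3, the sketch below pairs one image with each candidate box of its preliminary detection result, so that an empty result yields no states. The class and field names (`CandidateBox`, `State`, `image_id`) are hypothetical, the image is represented only by an identifier, and the interpretation of `x` and `y` as box coordinates is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CandidateBox:
    category: int   # numeric defect category
    x: float        # box coordinate (assumed convention)
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class State:
    image_id: str        # stand-in for the weld X-ray image itself
    box: CandidateBox    # one candidate box from the preliminary result


def build_states(image_id: str, boxes: Sequence[CandidateBox]) -> List[State]:
    """One state per candidate box; an empty detection result gives no states."""
    return [State(image_id=image_id, box=b) for b in boxes]


states = build_states(
    "weld_0001",
    [CandidateBox(0, 0.42, 0.31, 0.05, 0.04), CandidateBox(2, 0.70, 0.55, 0.10, 0.02)],
)
no_states = build_states("weld_0002", [])
```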

[0023] Step A4: Calculate the immediate reward value for each action in the action set performed in each state according to the reward calculation rules. Specifically, the action set includes a keep (retain) action and a delete action. Preferably, step A4 includes:
Step A41: Set a true/false label for each candidate box in the preliminary detection results of the weld X-ray image in this state. The true/false label is either true positive (TP) or false positive (FP). A true positive (TP) indicates that the preliminary detection result output by the target detection model is correct and consistent with the true label. A false positive (FP) indicates that the preliminary detection result output by the target detection model is a false detection that does not exist in the true label.
Step A42: Calculate the immediate reward value for performing each action in the state based on the true/false label of the candidate box in the state, according to the reward calculation rules.

[0024] The reward calculation rules include: when the candidate box of a state is marked as a true positive, executing the "keep" action results in a positive immediate reward value, while executing the "delete" action results in a negative immediate reward value; when the candidate box of a state is marked as a false positive, executing the "keep" action results in a negative immediate reward value, while executing the "delete" action results in a positive immediate reward value. For example, the immediate reward values are set as follows: TP: +2 for the "keep" action, -6 for the "delete" action; FP: +3 for the "delete" action, -3 for the "keep" action. This strengthens the penalty for missed detections and suppresses false positives, thereby guiding PPO to maximize precision while maintaining recall. PPO stands for Proximal Policy Optimization.
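The example reward values in this paragraph can be written as a small lookup table. The sketch below uses the values given above (TP: +2 keep, -6 delete; FP: +3 delete, -3 keep); the names `REWARDS` and `immediate_reward` and the string encodings of labels and actions are illustrative assumptions.

```python
# Reward table using the example values from the text.
REWARDS = {
    ("TP", "keep"): 2.0,
    ("TP", "delete"): -6.0,   # deleting a real defect is penalised most heavily
    ("FP", "delete"): 3.0,
    ("FP", "keep"): -3.0,
}


def immediate_reward(label: str, action: str) -> float:
    """Return the immediate reward for taking `action` on a box labeled `label`."""
    try:
        return REWARDS[(label, action)]
    except KeyError:
        raise ValueError(f"unknown label/action pair: {label!r}, {action!r}")


r_keep_tp = immediate_reward("TP", "keep")
r_delete_tp = immediate_reward("TP", "delete")
r_delete_fp = immediate_reward("FP", "delete")
r_keep_fp = immediate_reward("FP", "keep")
```

The asymmetry is the point of the design: wrongly deleting a true positive costs twice as much as wrongly keeping a false positive, which biases the policy toward preserving recall.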

[0025] Step A5: Form a triple from each state, the action performed in that state, and the immediate reward value for that action. Multiple triples constitute the experience pool. Specifically, the triple format is "State, Action, Reward", where the state is the state constructed in step A3, the action is the action performed in that state, and the reward is the immediate reward value for the performed action. For example, as shown in Figure 2, all triples are combined into a JSON (JavaScript Object Notation) file for training the PPO reinforcement learning model.
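A possible JSON layout for such an experience pool is sketched below. The text does not specify the file schema, so the keys (`state`, `image`, `box`, `action`, `reward`), the file name, and the box fields are all hypothetical; only the state, action, reward structure comes from the description.

```python
import json
from typing import Dict, List


def make_triple(image_path: str, box: Dict[str, float], action: str, reward: float) -> Dict:
    """Build one state/action/reward record (hypothetical schema)."""
    return {
        "state": {"image": image_path, "box": box},
        "action": action,
        "reward": reward,
    }


pool: List[Dict] = [
    make_triple(
        "weld_0001.png",
        {"category": 0, "x": 0.42, "y": 0.31, "width": 0.05, "height": 0.04},
        "keep",
        2.0,
    ),
    make_triple(
        "weld_0001.png",
        {"category": 2, "x": 0.70, "y": 0.55, "width": 0.10, "height": 0.02},
        "delete",
        3.0,
    ),
]

# Serialise the pool; writing pool_json to disk would give the JSON file.
pool_json = json.dumps(pool, indent=2)
restored = json.loads(pool_json)
```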

[0026] Step A6: Construct the PPO reinforcement learning model and train it using the experience pool. As shown in Figure 3, the PPO reinforcement learning model includes a policy network and an evaluation network. The policy network takes the state of a triple as input and outputs an action probability distribution. Based on this distribution, the agent executes a "keep" or "delete" action on the candidate box of that state, which determines whether the candidate box is included in the final detection result. This achieves adaptive optimization of the preliminary detection results and, in particular, effective filtering of false positives. The evaluation network also takes the state of a triple as input and outputs an estimate of the state value function for that state. Training the PPO reinforcement learning model on the experience pool allows it to learn from human experience.

[0027] Step A7: As shown in Figure 3, the target detection model and the policy network of the trained PPO reinforcement learning model are connected in sequence to obtain the intelligent weld defect detection model. In the intelligent weld defect detection model, the target detection model and the policy network are connected in series.
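The series connection of detector and policy network can be sketched as a two-stage function. Everything here is a stand-in: `detector` and `policy` are hypothetical callables replaced by stubs, and choosing the action with the higher probability is an assumption, since the text only says the action follows the action probability distribution.

```python
from typing import Callable, Dict, List, Tuple

Box = Dict[str, float]


def detect_defects(
    image: str,
    detector: Callable[[str], List[Box]],
    policy: Callable[[str, Box], Tuple[float, float]],
) -> List[Box]:
    """Stage 1: detector proposes candidate boxes. Stage 2: policy keeps or deletes each."""
    kept: List[Box] = []
    for box in detector(image):
        p_keep, p_delete = policy(image, box)
        if p_keep >= p_delete:      # assumed rule: take the more probable action
            kept.append(box)
    return kept


# Stubs standing in for the trained target detection model and policy network.
def stub_detector(image: str) -> List[Box]:
    return [
        {"category": 0, "x": 0.1, "y": 0.2, "width": 0.05, "height": 0.05, "score": 0.9},
        {"category": 1, "x": 0.5, "y": 0.5, "width": 0.02, "height": 0.30, "score": 0.3},
        {"category": 0, "x": 0.8, "y": 0.1, "width": 0.04, "height": 0.04, "score": 0.6},
    ]


def stub_policy(image: str, box: Box) -> Tuple[float, float]:
    p_keep = box["score"]           # toy rule only, not a learned policy
    return p_keep, 1.0 - p_keep


final_boxes = detect_defects("weld_0001.png", stub_detector, stub_policy)
```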

[0028] In a preferred embodiment, each of the multiple weld X-ray images is assigned a true label, which is one or more true defect annotation boxes or an empty box. The step of setting a true/false label for each candidate box in the preliminary detection result of the weld X-ray image in this state includes determining whether each candidate box has a position-matched true defect annotation box in the true label. If a position-matched true defect annotation box exists, the intersection over union (IoU) of the candidate box and that annotation box is calculated: if the IoU is greater than or equal to the IoU threshold, the candidate box is labeled as a true positive; if the IoU is less than the IoU threshold, the candidate box is labeled as a false positive. If no position-matched true defect annotation box exists, the candidate box is labeled as a false positive.

[0029] In this implementation, if the geometric center of a true defect annotation box in the true label is located within the neighborhood of the geometric center of a candidate box, the true defect annotation box is considered to match the position of the current candidate box; otherwise, the positions are considered not to match. By judging position matching and comparing the IoU with the IoU threshold, accurate true/false labels can be assigned to each candidate box in the preliminary detection results, thereby improving the policy accuracy of the PPO reinforcement learning model.
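The labeling rule of the two paragraphs above can be sketched as follows. Several details are assumptions because the text does not fix them: boxes are given as (x, y, width, height) with (x, y) the top-left corner, the "neighborhood" is a Euclidean radius, the default radius and IoU threshold values are placeholders, and when several annotation boxes match in position the one with the highest IoU is used.

```python
import math
from typing import Sequence, Tuple

BoxXYWH = Tuple[float, float, float, float]  # (x, y, width, height), top-left origin


def iou(a: BoxXYWH, b: BoxXYWH) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center(box: BoxXYWH) -> Tuple[float, float]:
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def label_candidate(
    cand: BoxXYWH,
    truths: Sequence[BoxXYWH],
    radius: float = 10.0,         # assumed neighborhood size
    iou_threshold: float = 0.5,   # assumed IoU threshold
) -> str:
    """Return 'TP' or 'FP' for one candidate box."""
    cx, cy = center(cand)
    matched = [
        t for t in truths
        if math.hypot(center(t)[0] - cx, center(t)[1] - cy) <= radius
    ]
    if not matched:
        return "FP"               # no position-matched annotation box
    best = max(iou(cand, t) for t in matched)
    return "TP" if best >= iou_threshold else "FP"


good = label_candidate((10, 10, 20, 20), [(12, 12, 20, 20)])
far = label_candidate((10, 10, 20, 20), [(200, 200, 20, 20)])
low_overlap = label_candidate((10, 10, 20, 20), [(18, 18, 4, 4)])
defect_free = label_candidate((10, 10, 20, 20), [])
```

Note that every candidate box detected in a defect-free image (an empty true label) is labeled FP, which is how such images supply negative samples.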

[0030] In a preferred embodiment, as shown in Figure 4, the policy network includes a first feature extraction module and a decision output head. The first feature extraction module includes a first CNN network, a first fully connected layer, a first MLP network, and a first concatenation unit. The first CNN network extracts features from the weld X-ray image of the state input to the PPO reinforcement learning model. The first fully connected layer maps the features extracted by the first CNN network to a fixed-dimensional first image feature vector. The first MLP network extracts a first candidate box structured feature vector from the candidate box information of the state input to the PPO reinforcement learning model. The first concatenation unit concatenates the first image feature vector and the first candidate box structured feature vector at the feature level to obtain a first state vector. The decision output head includes one or more third fully connected layers that map the first state vector to an action probability distribution. CNN stands for Convolutional Neural Network, and MLP stands for Multi-Layer Perceptron.

[0031] In this embodiment, the input to the policy network is the state s, which is multimodal and can be represented as s = {I, v}, where I is the normalized weld X-ray image and v is the candidate box information, for example an 8-dimensional vector including normalized coordinates, width, height, area, and candidate box attribute/category information. Specifically, the first CNN network includes a convolutional neural network and an activation function unit connected in sequence. The activation function unit may be, but is not limited to, the ReLU activation function. The decision output head includes two fully connected layers (the third fully connected layers) and a softmax function processing unit. The activation function of each fully connected layer can be the ReLU activation function. The softmax function processing unit maps the vector output by the two fully connected layers to an action probability distribution, that is, the probability of the delete action and the probability of the keep action. ReLU stands for Rectified Linear Unit; the softmax function is a normalized exponential function.
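The last stage of the decision output head, the softmax mapping from a two-element output vector to action probabilities, can be illustrated in plain Python. The ordering of the two actions and the example logits are assumptions made for illustration; the convolutional and fully connected layers that would produce the logits are omitted.

```python
import math
from typing import Dict, List, Sequence

ACTIONS = ("delete", "keep")   # assumed output ordering


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalized exponential; subtracting the max keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_distribution(logits: Sequence[float]) -> Dict[str, float]:
    """Map the head's two raw outputs to a probability per action."""
    return dict(zip(ACTIONS, softmax(logits)))


uniform = action_distribution([0.0, 0.0])
skewed = action_distribution([-1.0, 2.0])   # hypothetical logits favouring "keep"
```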

[0032] In this embodiment, more preferably, the evaluation network uses the same feature extraction network structure as the policy network and processes the same state to obtain a second state vector. As shown in Figure 5, the evaluation network includes a second feature extraction module and an evaluation output head. The second feature extraction module includes a second CNN network, a second fully connected layer, a second MLP network, and a second concatenation unit. The second CNN network extracts features from the weld X-ray image of the state input to the PPO reinforcement learning model. The second fully connected layer maps the features extracted by the second CNN network to a fixed-dimensional second image feature vector. The second MLP network extracts a second candidate box structured feature vector from the candidate box information of the state input to the PPO reinforcement learning model. The second concatenation unit concatenates the second image feature vector and the second candidate box structured feature vector at the feature level to obtain the second state vector. The evaluation output head includes one or more fourth fully connected layers that map the second state vector to a state value function estimate. The state value function estimate is used for the calculation of the maximum advantage function value and for the policy update, thereby achieving stable training and convergence of the candidate box selection decision.

[0033] In this embodiment, the second CNN network includes a convolutional neural network and an activation function unit connected in sequence. The activation function unit is not limited to the ReLU activation function. Specifically, the evaluation output head includes two fully connected layers and a scalar mapping unit. The activation function of each fully connected layer can be the ReLU activation function. The scalar mapping unit is used to map the vector output by the two fully connected layers to a single scalar, i.e., the estimated state value function. The scalar mapping unit is not limited to the sigmoid function.

[0034] In a preferred embodiment, multiple training trajectory samples and multiple validation trajectory samples are collected from the experience pool. Each training or validation trajectory sample corresponds to all the triples generated from one preliminary detection result: the preliminary detection result contains multiple candidate boxes, each candidate box generates one triple, and the triples generated from all candidate boxes in the preliminary detection result are combined in sequence to form a training trajectory sample or a validation trajectory sample. In step A6, the PPO reinforcement learning model is trained using the experience pool. Specifically, the following steps are executed iteratively until the trained PPO reinforcement learning model converges on the multiple validation trajectory samples. Step A61: Input the same triple from the training trajectory samples into both the policy network and the evaluation network; the policy network outputs the action probability distribution of the triple, and the evaluation network outputs the state value function estimate of the triple. More precisely, it is the state of the triple that is input into both networks. Let this triple be the t-th triple and its state be $s_t$, where $s_t = \{I, v_t\}$, I is the weld X-ray image, $v_t$ is the information of the t-th candidate box in the preliminary detection results of the weld X-ray image, and t is a positive integer.
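The grouping of triples into trajectory samples described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each state is represented as a dictionary with a hypothetical `"image_id"` key, and the 80/20 split, the random seed, and the function names are not taken from the specification.

```python
import random

def build_trajectories(triples):
    """Group (state, action, reward) triples by source image, keeping order.

    All triples that come from the candidate boxes of one preliminary
    detection result form one trajectory sample. The "image_id" key is a
    hypothetical way of identifying that detection result.
    """
    grouped = {}
    for state, action, reward in triples:
        grouped.setdefault(state["image_id"], []).append((state, action, reward))
    return list(grouped.values())

def split_trajectories(trajectories, val_fraction=0.2, seed=0):
    """Split trajectory samples into training and validation sets.

    val_fraction and seed are illustrative assumptions.
    """
    shuffled = list(trajectories)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]
```

Splitting at the level of whole trajectories, rather than individual triples, keeps all candidate boxes of one image on the same side of the training / validation boundary.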

[0035] Step A62: Execute an action based on the action probability distribution and obtain the immediate reward value of the action from the input triples. Calculate the maximum advantage function value based on the immediate reward value rt and the estimated state value function. Update the network parameters of the policy network based on the maximum advantage function value.

[0036] Specifically, the temporal difference residual (TD error) is calculated first as follows: $\delta_t = r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t)$, where $\delta_t$ is the temporal difference residual, $\gamma$ is the discount factor, $V_\phi(s_t)$ is the state value function estimate obtained by the evaluation network from the state of the t-th triple in the training trajectory sample, and $V_\phi(s_{t+1})$ is the state value function estimate obtained by the evaluation network from the state of the (t+1)-th triple in the training trajectory sample.

[0037] Then, the maximum advantage function value $A_t$ is calculated according to the following equation: $A_t = \delta_t + \gamma \cdot \lambda \cdot A_{t+1}$, where $\lambda$ is the smoothing coefficient and $\cdot$ denotes multiplication.
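The two recursions above can be illustrated with a short Python sketch. The boundary conditions are assumptions not stated in the specification: the value estimate after the last triple of a trajectory and the advantage beyond the last step are both taken as 0. The function names are hypothetical.

```python
def td_residuals(rewards, values, gamma):
    """delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    rewards[t] and values[t] belong to the t-th triple of one trajectory.
    The value after the last triple is assumed to be 0.
    """
    deltas = []
    for t, r in enumerate(rewards):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(r + gamma * next_v - values[t])
    return deltas

def advantages(deltas, gamma, lam):
    """A_t = delta_t + gamma * lam * A_{t+1}, computed from the last step backwards.

    The advantage beyond the last step is assumed to be 0.
    """
    adv = [0.0] * len(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv
```

Because $A_t$ depends on $A_{t+1}$, the loop runs over the trajectory in reverse, so each advantage is available before the step that needs it.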

[0038] Finally, the network parameters of the policy network, i.e., its learnable parameters, are updated by gradient descent based on the maximum advantage function value $A_t$.

[0039] In this embodiment, preferably, to ensure training stability, a clipped surrogate objective is used to limit the magnitude of network parameter updates.
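For reference, the clipped surrogate objective of standard PPO can be sketched for a single sample as below. The specification does not give the formula or the clipping range, so the standard form min(r·A, clip(r, 1−ε, 1+ε)·A) and the value ε = 0.2 are assumptions; `clipped_surrogate` is a hypothetical name.

```python
import math

def clipped_surrogate(new_logp, old_logp, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    new_logp / old_logp are log-probabilities of the executed action under
    the current and the data-collecting policy. eps = 0.2 is an assumed
    clipping range, not a value taken from the specification.
    """
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)

def policy_loss(samples, eps=0.2):
    """Negative mean of the clipped objective over (new_logp, old_logp, A_t) samples."""
    values = [clipped_surrogate(n, o, a, eps) for n, o, a in samples]
    return -sum(values) / len(values)
```

Once the probability ratio leaves the interval [1−ε, 1+ε] in the direction favoured by the advantage, the objective stops growing, which is what bounds the size of each parameter update.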

[0040] Step A63: Calculate the mean squared error loss between the maximum advantage function value and the estimated state value function value, and update the network parameters of the evaluation network based on the mean squared error loss.

[0041] This invention also discloses an intelligent weld defect detection method based on target detection and PPO reinforcement learning. In a preferred embodiment, as shown in Figure 6, the method includes: Step B1: Input the X-ray image of the weld to be tested. Step B2: Use the target detection model of the intelligent weld defect detection model to identify defects in the X-ray image of the weld to be tested and obtain preliminary detection results. Step B3: Construct test states based on the preliminary detection results and the X-ray image of the weld to be tested. If the preliminary detection results are empty, no test state can be constructed and the detection ends; if the preliminary detection results are not empty, that is, there is at least one candidate box, each candidate box generates one test state, and a test state includes the X-ray image of the weld to be tested and the information of one candidate box. Step B4: If at least one test state is constructed, each test state is input into the policy network of the PPO reinforcement learning model of the intelligent weld defect detection model to obtain the action probability distribution of the test state, and the delete action or the retain action is performed on the candidate box corresponding to the test state according to the action probability distribution. The intelligent weld defect detection model is trained according to the steps of the above intelligent weld defect detection model training method.
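Steps B3 and B4 can be sketched in Python as follows. This is an illustration only: `policy` stands for any callable that returns [P(delete), P(retain)] for a test state, the dictionary representation of the state is hypothetical, and choosing the more probable action (rather than sampling from the distribution) is an assumption.

```python
def filter_candidates(image, candidates, policy, retain_index=1):
    """Keep the candidate boxes for which the policy prefers the retain action.

    image      : the weld X-ray image under test (any object)
    candidates : preliminary detection results, one entry per candidate box
    policy     : hypothetical callable, state -> [P(delete), P(retain)]
    """
    if not candidates:
        # Empty preliminary result: no test state can be built, detection ends.
        return []
    kept = []
    for box in candidates:
        state = {"image": image, "box": box}      # one test state per box
        probs = policy(state)
        # Assumption: act greedily on the action probability distribution.
        action = max(range(len(probs)), key=probs.__getitem__)
        if action == retain_index:
            kept.append(box)
    return kept
```

In use, `candidates` would come from the target detection model of step B2, and the returned list is the final detection result after false positive boxes have been deleted.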

[0042] Experimental verification: Table 1 shows the test results, on the unified test set (644 weld X-ray images), of the intelligent weld defect detection model trained according to the above training method.

[0043] Table 1. Test results of the intelligent defect detection model on the unified test set.

[0044] The results show that, after adopting the scheme of the present invention, false positives are significantly reduced and precision is improved by 31.18 percentage points; recall remains basically unchanged; and overall performance (F1, the harmonic mean of precision and recall) is improved by 20.11 percentage points. This demonstrates the significant technical effect of the present invention, which can achieve high-precision, low-false-alarm weld defect detection in industrial scenarios.
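The relation between these metrics can be made concrete with a small Python sketch. The formulas are the standard definitions of precision, recall, and F1; the function names and the counts in the usage note are illustrative and are not data from the experiments.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With made-up counts, a detector with 8 true positives, 8 false positives, and 2 missed defects has a precision of 0.5 and a recall of 0.8. Deleting 6 of the false positive boxes without touching any true positive raises precision to 0.8 and leaves recall at 0.8, which is the mechanism by which the candidate box filtering improves F1.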

[0045] In one application example of the present invention, X-ray images of weld seams of small-diameter steel pipes used in industrial boilers are examined. In Figure 7, the first row shows the ground-truth labels of the original images, the second row shows the candidate boxes marked using only the preliminary detection results of the object detection model, and the third row shows the detection results obtained with the method of this invention. In the first column of Figure 7, the YOLOv8 object detection model generated two false "other defect" boxes, which were successfully removed by the PPO solution of this invention while the true positive examples were retained. In the second column, the image has no real defects, but the YOLOv8 object detection model generated multiple false positive boxes, all of which were deleted by the PPO solution of this invention.

[0046] In the third column, the image contains no real defects, yet the YOLOv8 object detection model generates false positive boxes for "other defect" and "porosity." The PPO solution of this invention removes most of these false positives, leaving only one residual false positive box. These comparisons demonstrate a significant reduction in false positives while the ability to detect true positives is maintained.

[0047] The present invention also discloses a computer device, including a memory, a processor, and a computer program stored in the memory, characterized in that the processor executes the computer program to implement the steps of a weld intelligent defect detection model training method or a weld intelligent defect detection method based on target detection and PPO reinforcement learning.

[0048] This invention combines the YOLOv8 object detection model with the PPO reinforcement learning algorithm to form an intelligent defect detection method for weld X-ray images. In complex industrial scenarios, through adaptive decision optimization of candidate boxes, this invention can significantly reduce the false alarm rate, improve detection precision, and achieve stable and reliable defect identification while maintaining a high recall rate. Its technical effects are specifically reflected in the following aspects: First, compared with existing deep learning-based weld defect detection methods, this invention has a significant advantage in suppressing false positives. Experimental results show that, on a unified test dataset, the precision of the original YOLOv8 model is only 0.4674, while the precision of this invention, after introducing PPO reinforcement learning and using optimized reward functions and adapted hyperparameters, can reach up to 0.7792. This result demonstrates that this invention can effectively filter out false detection boxes caused by background noise, artifacts, or non-defect structures, significantly reducing the number of false alarms and improving detection precision.

[0049] Secondly, this invention maintains a high recall rate while significantly reducing false positives. Experiments show that, under different training configurations, the recall rate of this invention remains around 0.8126, without a significant decrease compared to the original YOLOv8 model (0.8150). This indicates that this invention minimizes the mistaken deletion of true positives (TPs) while deleting false positives, thus preserving the integrity of defect identification.

[0050] Third, the overall performance index F1 score of this invention is significantly improved, which can more comprehensively reflect the balance between precision and recall. Experimental results show that the F1 score of the traditional YOLOv8 model is 0.5944, while the F1 score of the method of this invention is improved to 0.7955, an improvement of 33.9%. This result indicates that this invention significantly enhances the overall detection performance while maintaining a high recall rate.

[0051] The present invention also discloses a computer program product, including a computer program that, when executed by a processor, implements the steps of the above-mentioned intelligent weld defect detection model training method or the intelligent weld defect detection method based on target detection and PPO reinforcement learning provided by the present invention. The computer program product should be understood as a software product that mainly implements its solution through a computer program, such as a program product integrated in the cloud or a software library.

[0052] The present invention also discloses an electronic device, in one embodiment of which the electronic device includes at least one processor; and a memory communicatively connected to the at least one processor; wherein, The memory stores a computer program that can be executed by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to execute the weld intelligent defect detection model training method or the weld intelligent defect detection method based on target detection and PPO reinforcement learning provided by the present invention.

[0053] Figure 8 is a structural schematic of an electronic device that implements the weld intelligent defect detection model training method or the weld intelligent defect detection method based on target detection and PPO reinforcement learning, according to an embodiment of the present invention. The electronic device may include a processor 10, a memory 11, a communication bus 12, and a communication interface 13. It may also include a computer program stored in the memory 11 and executable on the processor 10, such as a program for the weld intelligent defect detection model training method or for the weld intelligent defect detection method based on target detection and PPO reinforcement learning.

[0054] In some embodiments, the processor 10 may be composed of integrated circuits, such as a single packaged integrated circuit or multiple integrated circuits with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control unit of the electronic device, connecting various components of the entire electronic device through various interfaces and lines. It executes programs or modules stored in the memory 11 (e.g., executing a weld seam intelligent defect detection model training method or a weld seam intelligent defect detection method based on target detection and PPO reinforcement learning), and calls data stored in the memory 11 to perform various functions of the electronic device and process data.

[0055] The memory 11 includes at least one type of readable storage medium, including flash memory, portable hard drive, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 11 can be an internal storage unit of an electronic device, such as a portable hard drive. In other embodiments, the memory 11 can be an external storage device of the electronic device, such as a plug-in portable hard drive, SmartMediaCard (SMC), SecureDigital (SD) card, FlashCard, etc. Furthermore, the memory 11 can include both internal and external storage units of the electronic device. The memory 11 can be used not only to store application software and various types of data installed on the electronic device, such as the code of a weld intelligent defect detection model training method or a weld intelligent defect detection method program based on target detection and PPO reinforcement learning, but also to temporarily store data that has been output or will be output.

[0056] The communication bus 12 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This bus can be divided into an address bus, a data bus, a control bus, etc. The bus is configured to enable communication between the memory 11 and at least one processor 10, etc.

[0057] Communication interface 13 is used for communication between the aforementioned electronic device and other devices, including a network interface and a user interface. Optionally, the network interface may include a wired interface and / or a wireless interface (such as a Wi-Fi interface, Bluetooth interface, etc.), typically used to establish communication connections between the electronic device and other electronic devices. The user interface may be a display, an input unit (such as a keyboard), or optionally, a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an OLED (Organic Light-Emitting Diode) touchscreen, etc. The display may also be appropriately referred to as a screen or display unit, used to display information processed in the electronic device and to display a visual user interface.

[0058] Figure 8 shows only an electronic device having certain components; those skilled in the art will understand that the structure shown in Figure 8 does not constitute a limitation on the electronic device, which may include fewer or more components than shown, combine certain components, or have a different arrangement of components.

[0059] For example, although not shown, the electronic device may also include a power supply (such as a battery) to power various components. Preferably, the power supply can be logically connected to at least one processor 10 via a power management device, thereby enabling functions such as charging management, discharging management, and power consumption management. The power supply may also include one or more DC or AC power supplies, recharging devices, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components. The electronic device may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be elaborated further here.

[0060] It should be understood that the embodiments are for illustrative purposes only, and that the scope of the patent application is not limited to this structure.

[0061] Furthermore, if the modules / units integrated into the electronic device are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. The computer-readable storage medium can be volatile or non-volatile. For example, a computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).

[0062] In the description of this specification, references to terms such as "an embodiment," "some embodiments," "example," "specific example," "an implementation," "a preferred implementation," or "some examples" indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0063] Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A training method for an intelligent weld defect detection model, characterized in that the method includes: obtaining a sample set of weld X-ray images, and training a target detection network using the sample set of weld X-ray images to obtain a target detection model; identifying defects in multiple weld X-ray images using the target detection model to obtain a preliminary detection result for each weld X-ray image; constructing states of a reinforcement learning environment using each weld X-ray image and its preliminary detection result; calculating, according to reward calculation rules, the immediate reward value of each action in an action set performed in each state; grouping each state, the action performed in that state, and the immediate reward value of the action into a triplet, multiple triplets constituting an experience pool; constructing a PPO reinforcement learning model and training it using the experience pool; and connecting the target detection model and the policy network of the trained PPO reinforcement learning model in sequence to obtain the intelligent weld defect detection model.

2. The method for training a smart weld defect detection model according to claim 1, characterized in that, The preliminary detection results of each weld X-ray image include more than one candidate bounding box; the state of constructing a reinforcement learning environment using each weld X-ray image and its preliminary detection results includes: Each weld X-ray image and the preliminary detection results of that weld X-ray image are combined into a state. The candidate box information includes the defect category and the coordinates, width and height of the candidate box.

3. The method for training a smart weld defect detection model according to claim 2, characterized in that the action set includes a retain action and a delete action; the calculation of the immediate reward value for each action in the action set in each state according to the reward calculation rules includes: setting a true / false flag for each candidate box in the preliminary detection results of the weld X-ray image of the state, the true / false flag being either true positive or false positive; and calculating, according to the reward calculation rules, the immediate reward value of each action performed in the state based on the true / false flag of the candidate box of the state. The reward calculation rules include: when the true / false flag of the candidate box of a state is a true positive example, executing the retain action gives a positive immediate reward value and executing the delete action gives a negative immediate reward value; when the true / false flag of the candidate box of a state is a false positive example, executing the retain action gives a negative immediate reward value and executing the delete action gives a positive immediate reward value.

4. The method for training a smart weld defect detection model according to claim 3, characterized in that, among the multiple weld X-ray images, each weld X-ray image is assigned a true label, which is either one or more real defect annotation boxes or empty; the setting of a true / false flag for each candidate box in the preliminary detection results of the weld X-ray image of the state includes: determining, for each candidate box in the preliminary detection results of the weld X-ray image of the state, whether a matching real defect annotation box exists in the true label; if a matching real defect annotation box exists, calculating the intersection over union (IoU) of the candidate box and its matching real defect annotation box, setting the candidate box as a true positive example if the IoU is greater than or equal to the IoU threshold, and setting the candidate box as a false positive example if the IoU is less than the IoU threshold; and if no matching real defect annotation box exists, setting the candidate box as a false positive example.

5. The method for training a weld intelligent defect detection model according to any one of claims 1-4, characterized in that, The PPO reinforcement learning model includes a policy network and an evaluation network; The policy network includes a first feature extraction module and a decision output head. The first feature extraction module includes a first CNN network, a first fully connected layer, a first MLP network, and a first concatenation unit. The first CNN network is used to extract features from the weld X-ray image of the state of the input PPO reinforcement learning model. The first fully connected layer is used to map the features extracted by the first CNN network into a first image feature vector of fixed dimension. The first MLP network is used to extract the first candidate box structured feature vector from the candidate box information of the state of the input PPO reinforcement learning model. The first concatenation unit is used to concatenate the first image feature vector and the first candidate box structured feature vector at the feature layer to obtain a first state vector. The decision output head includes one or more third fully connected layers to map the first state vector into an action probability distribution. The evaluation network includes a second feature extraction module and an evaluation output head. The second feature extraction module includes a second CNN network, a second fully connected layer, a second MLP network, and a second concatenation unit. The second CNN network is used to extract features from the weld X-ray image of the state of the input PPO reinforcement learning model. The second fully connected layer is used to map the features extracted by the second CNN network into a fixed-dimensional second image feature vector. The second MLP network is used to extract the second candidate box structured feature vector from the candidate box information of the state of the input PPO reinforcement learning model. 
The second concatenation unit is used to concatenate the second image feature vector and the second candidate box structured feature vector at the feature layer to obtain the second state vector. The evaluation output head includes one or more fourth fully connected layers, which are used to map the second state vector into a state value function estimate.

6. The method for training a smart weld defect detection model according to claim 5, characterized in that, Multiple training trajectory samples and multiple validation trajectory samples are collected from the experience pool. Each training trajectory sample and validation trajectory sample corresponds to all triples generated from a preliminary detection result. The PPO reinforcement learning model is trained using the experience pool, and the following steps are executed iteratively until the trained PPO reinforcement learning model converges on the multiple validation trajectory samples: The same triplet from the training trajectory samples is input into the policy network and the evaluation network respectively. The policy network outputs the action probability distribution of the triplet, and the evaluation network outputs the state value function estimate of the triplet. The action is executed based on the action probability distribution and the immediate reward value of the action is obtained from the input triple. The maximum advantage function value is calculated based on the immediate reward value and the state value function estimate. The network parameters of the policy network are updated based on the maximum advantage function value. Calculate the mean squared error loss between the maximum advantage function value and the state value function estimate, and update the network parameters of the evaluation network based on the mean squared error loss.

7. A method for intelligent weld defect detection based on target detection and PPO reinforcement learning, characterized in that, The method includes: Input the X-ray image of the weld to be tested; The target detection model of the intelligent weld defect detection model is used to identify defects in the X-ray image of the weld under test and obtain preliminary detection results. The test state is constructed based on the preliminary test results and the X-ray image of the weld to be tested; If more than one test state is constructed, each test state is input into the policy network of the PPO reinforcement learning model of the weld intelligent defect detection model to obtain the action probability distribution of the test state, and the deletion action or retention action is performed on the candidate box corresponding to the test state according to the action probability distribution. The intelligent weld defect detection model is trained according to the intelligent weld defect detection model training method described in any one of claims 1-6.

8. A computer device comprising a memory, a processor, and a computer program stored in the memory, characterized in that, The processor executes the computer program to implement the steps of the weld intelligent defect detection model training method according to any one of claims 1-6.

9. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by the processor, it implements the steps of the weld intelligent defect detection model training method according to any one of claims 1-6.

10. An electronic device, characterized in that, The electronic device includes: At least one processor; and, A memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the weld intelligent defect detection model training method as described in any one of claims 1-6.
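
For illustration only, the true / false marking of claim 4 and the reward calculation rule of claim 3 can be sketched in Python as follows. The claims do not specify how a matching annotation box is chosen, the threshold value, or the reward magnitudes; here a candidate box is treated as matched to whichever annotation box overlaps it most, the threshold is assumed to be 0.5, and the rewards are assumed to be +1 and -1. All function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(candidate, gt_boxes, iou_threshold=0.5):
    """True positive if some annotation box reaches the IoU threshold.

    An empty true label (no annotation boxes) always gives a false positive.
    The threshold 0.5 is an assumption.
    """
    return any(iou(candidate, gt) >= iou_threshold for gt in gt_boxes)

def reward(action, true_positive, magnitude=1.0):
    """Positive reward for retaining a true positive or deleting a false
    positive; negative reward otherwise. The magnitude is an assumption."""
    correct = (action == "retain") == true_positive
    return magnitude if correct else -magnitude
```

Each (state, action, reward) triplet of the experience pool could then be produced by calling `reward` for both actions on every candidate box.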