A state space model-based steel pipe weld detection agent
By using a state-space model-based intelligent agent for steel pipe weld inspection, the problems of low inspection efficiency and insufficient accuracy in existing technologies are solved, achieving efficient and accurate steel pipe weld inspection. It has real-time processing capabilities and adaptability, making it suitable for complex industrial scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SIAS UNIV
- Filing Date
- 2026-03-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing steel pipe weld inspection technologies suffer from low inspection efficiency, poor repeatability, susceptibility to human factors, and difficulty in effectively capturing long-distance contextual semantic relationships by existing deep learning models. They also have high computational complexity and cannot form an adaptive intelligent inspection process, making it difficult to meet the needs of complex industrial scenarios.
A state-space model-based intelligent agent for steel pipe weld detection is adopted. It employs a hierarchical closed-loop architecture and combines multi-scale visual perception, serialized encoding, and dual-branch cognition and decision-making core modules to achieve synchronous perception, accurate identification, and automatic segmentation of steel pipe welds. It possesses autonomy, responsiveness, and adaptability, simulating the closed-loop cognitive reasoning process of human experts.
It achieves efficient and accurate steel pipe weld inspection, has real-time processing capabilities, significantly improves inspection accuracy and robustness under complex backgrounds, can proactively assess the uncertainty of inspection results and perform adaptive re-inspection, and meets the real-time requirements of industrial production lines.
Figure CN122243955A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the interdisciplinary field of industrial nondestructive testing and artificial intelligence, and specifically relates to an intelligent agent for steel pipe weld inspection based on a state-space model.
Background Technology
[0002] The quality of steel pipe welds directly determines the safe service performance and long-term reliability of steel pipe structures. Developing efficient and accurate non-destructive testing (NDT) technologies has always been a core issue in industrial quality control and safe production systems. Currently widely used testing technologies fall broadly into two categories. The first category relies mainly on manual image evaluation or traditional image processing algorithms (such as edge detection and threshold segmentation). These methods are inefficient, have poor repeatability, and are easily influenced by the operator's experience and subjective judgment. When faced with complex industrial backgrounds, various pseudo-defects (such as oil stains, rust, and mechanical scratches), and real, minute defects (such as porosity, slag inclusions, and lack of fusion), they show insufficient robustness and limited generalization ability. The second category consists of deep learning-based automatic detection methods, represented by Convolutional Neural Networks (CNNs) and Vision Transformers. Although these methods significantly raise the level of automation, they still have structural limitations. CNNs are limited by their local receptive field, which makes it difficult to capture long-distance contextual semantic relationships in weld images; their global recognition of linearly extended defects (such as cracks) is especially weak. Vision Transformers rely on self-attention to achieve global information interaction, but the computational complexity of self-attention grows with the square of the sequence length (the number of image tokens, which itself grows with the input resolution). This results in a heavy computational burden when processing high-resolution industrial images, making it difficult to meet the real-time response requirements of actual production lines. Furthermore, their accuracy in extracting subtle local features is often insufficient. More fundamentally, existing detection models are mostly passive, single-pass, feed-forward "static" structures that can only perform end-to-end classification or segmentation. They lack the sequential decision-making and interaction capabilities of human inspection experts, such as "active observation, dynamic focusing, comprehensive judgment." This limitation prevents existing methods from forming a closed-loop, adaptive intelligent inspection process, severely restricting their in-depth application and performance improvement in complex and ever-changing real-world industrial scenarios.
[0003] In recent years, as artificial intelligence has evolved from perceptual intelligence to cognitive decision-making intelligence, intelligent agent technology has come to be regarded as an important path toward advanced industrial intelligence. An intelligent agent (Wang Q, Ni S, Liu H, et al. AutoPatent: A Multi-Agent Framework for Automatic Patent Generation. 2025, https://doi.org/10.48550/arXiv.2412.09796) generally refers to an autonomous system capable of perceiving its environment, planning autonomously, calling tools, and executing actions to achieve specific goals. It uses large-scale pre-trained models as its cognitive foundation, integrating capabilities such as perception, understanding, planning, memory, execution, and tool invocation. It is characterized by autonomy, responsiveness, interactivity, and adaptability, and can automatically handle complex multi-step tasks. Introducing the intelligent agent architecture into industrial inspection is expected to simulate and even surpass the closed-loop cognitive reasoning process of human experts, promoting a systematic shift in inspection modes from "one-way static recognition" to "active interactive exploration," which is of great significance for building a more intelligent, flexible, and reliable next-generation inspection system. There is currently no published literature on the introduction of intelligent agents into the inspection of steel pipe welds.
Summary of the Invention
[0004] To overcome the shortcomings of the prior art, the present invention aims to provide a state-space model-based intelligent agent for steel pipe weld inspection. With the state-space model as the cognitive core, it achieves synchronous perception, accurate identification and automatic segmentation of internal and surface defects of steel pipe welds by deeply integrating multimodal data such as digital X-ray images and phased array ultrasonic scanning, thus providing an integrated intelligent inspection solution for complex industrial scenarios.
[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows: A state-space model-based intelligent agent for steel pipe weld inspection adopts a hierarchical closed-loop system architecture, which is divided into a perception layer, a cognition layer, a decision layer and an application layer. It achieves dynamic interaction and autonomous inspection with industrial scenarios through a multi-scale visual perception module, a serialization encoding module, a dual-branch cognition and decision core module based on the state-space model and a decision fusion and execution module.
[0006] The multi-scale visual perception module, corresponding to the perception layer, receives raw weld images from digital X-ray or phased array ultrasonic scanning; extracts multi-scale visual features through a lightweight feature pyramid network; and outputs feature maps including three levels: C1, C2, and C3, which respectively carry low-level edge information, mid-level structural features, and high-level semantic information, serving as the basic input for subsequent serialization encoding.
[0007] The serialization encoding module, corresponding to the cognitive layer, transforms the two-dimensional feature map into a one-dimensional sequence suitable for processing by the state-space model. Considering the linear extension of the weld seam, a main serialization path along the weld seam direction is designed, supplemented by a lateral context supplementation mechanism to enhance the continuity of feature expression. Specifically, it includes a weld seam trajectory localization unit and a feature expansion and enhancement unit.
[0008] The weld trajectory positioning unit, based on the C1 feature map, outputs the weld centerline coordinate sequence through a pre-trained U-Net network.
[0009] The feature expansion and enhancement unit unfolds the C3 feature map along the centerline trajectory $P$ to obtain the main feature sequence $S_0$. To enhance the contextual representation capability of the features, $W$ adjacent feature points are added on each side of $S_0$, forming an enhanced feature sequence $S = [S_{\text{left}}, S_0, S_{\text{right}}]$, which is used to transmit global structural information and local details.
[0010] The dual-branch cognition and decision-making core module based on the state-space model corresponds to the decision-making layer and serves as the decision-making center of the agent. It adopts a dual-branch state-space model structure to achieve parallel processing of recognition and decision-making; specifically, it includes a defect recognition branch and a decision-focusing branch.
[0011] The defect identification branch performs temporal modeling of the enhanced feature sequence through multiple stacked selective state space model blocks, dynamically focusing on the context information most relevant to the defect, effectively distinguishing real defects from background noise interference, and outputting the defect category and its confidence level corresponding to each sequence position.
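To make the idea of a gated sequential scan concrete, here is a minimal scalar sketch in Python. It is not the selective state-space block of the invention: the update rule, the scalar weights `w_hh`, `w_xh`, `w_hg`, `w_xg`, and the function name `gated_scan` are all assumptions chosen for illustration. What it does show is that a gated recurrence visits each sequence position once, so its cost grows linearly with the sequence length.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_scan(xs: List[float],
               w_hh: float = 0.5, w_xh: float = 1.0,
               w_hg: float = 0.5, w_xg: float = 1.0,
               h0: float = 0.0) -> List[float]:
    """Run a scalar gated recurrence over xs and return every hidden state.

    ASSUMED update rule (illustrative only, not taken from the source):
        gate = sigmoid(w_hg * h + w_xg * x)    # how much history to keep
        cand = tanh(w_hh * h + w_xh * x)       # candidate new state
        h    = gate * h + (1 - gate) * cand
    """
    h = h0
    states = []
    for x in xs:  # one pass over the sequence: O(len(xs)) work
        gate = sigmoid(w_hg * h + w_xg * x)
        cand = math.tanh(w_hh * h + w_xh * x)
        h = gate * h + (1.0 - gate) * cand
        states.append(h)
    return states


# Toy sequence: a flat background with a short spike in the middle.
sequence = [0.0] * 5 + [2.0, 2.0] + [0.0] * 5
for position, state in enumerate(gated_scan(sequence)):
    print(position, round(state, 4))
```

Because the gate depends on the current input, the state reacts to the spike and then decays, which is the intuition behind "dynamically focusing" on defect-relevant context.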
[0012] The decision-focusing branch operates in parallel with the defect identification branch. It generates uncertainty scores for each position by calculating the dynamic range and information entropy of the feature sequence. When the agent detects changes in device parameters or increased environmental noise, it automatically initiates the threshold calibration process, i.e., issues a re-inspection command, triggering the system to perform secondary refined feature extraction and judgment on the corresponding area in the original high-resolution image, simulating the magnified inspection behavior of experts on suspicious areas.
[0013] The aforementioned decision fusion and execution module corresponds to the decision layer and the application layer. It is responsible for integrating the output of the cognitive layer and controlling the closed-loop execution and final output of the detection process; specifically, it: 1) receives the re-inspection instruction from the decision-focusing branch and triggers the re-inspection process for the suspicious area; 2) adaptively fuses the preliminary results of the defect identification branch with the refined judgment results of the re-inspection area to generate a final inspection report that includes defect type, pixel-level localization, size measurement, and comprehensive confidence; and 3) supports result visualization, alarm output, and human-computer interaction, completing a full closed loop from perception to execution.
[0014] The aforementioned intelligent agent for steel pipe weld inspection further integrates welding process knowledge and prior domain rules of defect morphology, and embeds a self-supervised feedback mechanism to dynamically optimize model parameters during continuous operation, thereby achieving autonomous improvement of inspection performance and scene adaptation, thus providing a high-precision and high-efficiency intelligent solution for steel pipe weld quality inspection in complex industrial environments.
[0015] Compared with the prior art, the beneficial effects of the present invention are mainly reflected in the following three aspects: 1. High detection efficiency and strong system scalability: Owing to the linear computational complexity of the state-space model, the system can process high-resolution industrial images in real time, effectively meeting the timeliness requirements of online inspection on the production line; the intelligent agent of this invention is easy to deploy on edge computing devices and has good scalability and engineering adaptability.
[0016] 2. High detection accuracy and good environmental robustness: The state-space model has dynamic selective attention and long sequence dependency modeling capabilities, which can accurately capture the subtle features of weld defects and their correlation patterns in long-distance contexts, significantly suppress interference from complex backgrounds and false defects, and effectively improve the detection rate and recognition accuracy of typical defects such as micro-porosity, lack of fusion and linear cracks.
[0017] 3. Human-like intelligent decision-making behavior: By introducing a "decision-focusing" autonomous feedback mechanism, the system can proactively assess the uncertainty of detection results and adaptively refine and re-examine suspicious areas, simulating the closed-loop cognitive logic of human experts' "observation, focusing, judgment." This not only improves the overall reliability of detection but also enhances the system's adaptability under varying operating conditions and the interpretability of the decision-making process.
Attached Figure Description
[0018] Figure 1 is a schematic diagram of the overall architecture of the intelligent agent of the present invention.
[0019] Figure 2 is a structural diagram of the dual-branch cognition and decision-making core module based on the state-space model of the present invention.
[0020] Figure 3 is a schematic diagram of the main serialization path along the weld seam in the serialization encoding module of the present invention.
[0021] Figure 4 is a comparison chart showing the effect of triggering "re-inspection" in the decision-focusing branch of the present invention.
Detailed Implementation
[0022] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be described in detail below with reference to embodiments and accompanying drawings. The following description is for illustrative purposes only and does not constitute a limitation on the scope of protection.
[0023] Example: This example uses the spiral weld of an oil and gas pipeline (such as X80 steel grade, pipe diameter Φ1219mm) as the inspection object. The intelligent agent described in this invention is deployed at the online inspection station of the production line to realize automated and intelligent closed-loop inspection of internal and surface defects of the weld.
[0024] As shown in Figure 1, a state-space model-based intelligent agent for steel pipe weld inspection adopts a hierarchical closed-loop system architecture. As shown in Figure 2, the intelligent agent for steel pipe weld inspection of this invention is divided into four layers: perception layer, cognition layer, decision-making layer, and application layer. Through the collaborative work of a multi-scale visual perception module, a serialization encoding module, a dual-branch cognition and decision-making core module based on a state-space model, and a decision fusion and execution module, it achieves dynamic interaction with industrial scenarios and autonomous inspection. The entire system follows an autonomous operation logic of "perception → cognition → decision-making → application → feedback," forming a complete closed-loop inspection agent. The agent uses a state-space model as the core that drives its cognition and decision-making capabilities. After system startup, it receives multimodal images (X-ray images and ultrasonic scan images) acquired and spatially registered by an integrated "digital X-ray + phased array ultrasound" synchronous scanning system, and then enters the autonomous inspection process. After inspection, the system achieves closed-loop feedback optimization through an adaptive parameter update module, enabling the model to evolve iteratively during continuous operation.
[0025] The multi-scale visual perception module, corresponding to the perception layer, receives raw weld images from digital X-ray or phased array ultrasonic scanning. X-ray features focus on internal defect detection, while ultrasonic features focus on surface defect identification. Multi-scale visual features are extracted through a lightweight feature pyramid network and fused to improve the coverage of all types of defects. The output includes feature maps at three levels: C1, C2, and C3, which respectively carry low-level edge information, mid-level structural features, and high-level semantic information, serving as the basic input for subsequent serialization encoding.
[0026] In this embodiment, the weld seam image (e.g., an X-ray image with a resolution of 2048×1024) acquired by the industrial acquisition device is first input to the multi-scale visual perception module. This module is based on a lightweight feature pyramid network and extracts feature maps at multiple scales that contain local details and global semantic information. Specifically, this embodiment outputs feature maps at three levels, C1, C2, and C3: the C1 feature map (size 512×256×16) is used for weld seam trajectory localization; the C2 feature map (size 256×128×32) is used for mid-level structural feature extraction and, by fusing the edge details of C1 with the high-level semantics of C3, enhances the contour and texture representation of the weld seam area; and the C3 feature map (size 128×64×64) contains high-level semantic information.
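The three feature-map sizes in this embodiment are consistent with downsampling strides of 4, 8, and 16 relative to the 2048×1024 input. The short Python sketch below checks that arithmetic. The stride values and the `pyramid_shapes` helper are assumptions for illustration: the text gives only the resulting sizes, not how the pyramid is built.

```python
from typing import Dict, Tuple


def pyramid_shapes(width: int, height: int,
                   levels: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int, int]]:
    """Return (width, height, channels) per level for the given (stride, channels)."""
    shapes = {}
    for name, (stride, channels) in levels.items():
        if width % stride or height % stride:
            raise ValueError(f"input size is not divisible by stride {stride}")
        shapes[name] = (width // stride, height // stride, channels)
    return shapes


# ASSUMED strides (4, 8, 16); channel counts are the ones quoted in the text.
LEVELS = {"C1": (4, 16), "C2": (8, 32), "C3": (16, 64)}

for name, shape in pyramid_shapes(2048, 1024, LEVELS).items():
    print(name, shape)
```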
[0027] The serialization encoding module, corresponding to the cognitive layer, transforms the two-dimensional feature map into a one-dimensional sequence suitable for processing by the state-space model. Considering the linear extension of the weld seam, a main serialization path along the weld seam direction is designed, supplemented by a lateral context supplementation mechanism to enhance the continuity of feature expression. Specifically, it includes a weld seam trajectory localization unit and a feature expansion and enhancement unit.
[0028] The working principle of the serialization encoding module in this embodiment is shown in Figure 3. This module converts the two-dimensional C3 feature map into a one-dimensional feature sequence suitable for state-space model processing. Considering the linear extension of the steel pipe weld, the module adopts a main serialization path along the weld direction. The specific process is as follows. Weld trajectory localization unit: based on the C1 feature map, it locates the weld centerline using a pre-trained VM-UNet and outputs a coordinate sequence $P = [(x_1, y_1), (x_2, y_2), \ldots, (x_{128}, y_{128})]$. Feature expansion and enhancement unit: as shown in Figure 3, the C3 feature map is unfolded along the centerline trajectory $P$ to obtain the main feature sequence $S_0$ of length $L = 128$. To enhance the contextual representation capability of the features, $W = 8$ adjacent feature points are added on each side of $S_0$, forming an enhanced feature sequence $S = [S_{\text{left}}, S_0, S_{\text{right}}]$ with a total length $L' = L + 2W = 144$, where $S_{\text{left}}$ and $S_{\text{right}}$ are the left and right neighbor features, each 8 feature points wide. This design enables the model to perceive contextual information about the weld edge region, improving the robustness of subsequent defect identification.
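The serialization step can be sketched in plain Python as follows. The text fixes the lengths (128 main points plus 8 context points per side, 144 in total) but does not say how the neighbor features are sampled, so the choice made here is an assumption: each context point is the mean, taken over the whole centerline, of the features at a fixed lateral offset. The helper names and the toy two-channel feature map are likewise hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed as fmap[row][col] -> channel vector
Path = List[Tuple[int, int]]     # centerline points as (x, y) = (col, row)


def unfold_along_centerline(fmap: FeatureMap, path: Path) -> List[Vector]:
    """Main sequence S_0: one feature vector per centerline point."""
    return [list(fmap[y][x]) for x, y in path]


def lateral_context(fmap: FeatureMap, path: Path, width: int, side: int) -> List[Vector]:
    """ASSUMED context rule: mean feature at lateral offsets 1..width.

    side = -1 samples rows above the centerline, side = +1 rows below.
    Offsets that fall outside the map are clamped to the nearest row.
    """
    rows, channels = len(fmap), len(fmap[0][0])
    context = []
    for k in range(1, width + 1):
        acc = [0.0] * channels
        for x, y in path:
            yy = min(max(y + side * k, 0), rows - 1)
            for c in range(channels):
                acc[c] += fmap[yy][x][c]
        context.append([v / len(path) for v in acc])
    return context


def build_enhanced_sequence(fmap: FeatureMap, path: Path, width: int) -> List[Vector]:
    """Concatenate [S_left, S_0, S_right]; length is len(path) + 2 * width."""
    s0 = unfold_along_centerline(fmap, path)
    s_left = lateral_context(fmap, path, width, side=-1)[::-1]  # nearest offset next to S_0
    s_right = lateral_context(fmap, path, width, side=+1)
    return s_left + s0 + s_right


# Toy C3-sized map (64 rows x 128 cols) with 2 channels instead of 64.
toy_map = [[[float(r), float(c)] for c in range(128)] for r in range(64)]
centerline = [(x, 32) for x in range(128)]  # a straight, horizontal weld
enhanced = build_enhanced_sequence(toy_map, centerline, width=8)
print(len(enhanced))
```

Other sampling rules (for example, context taken only near the sequence ends) would satisfy the same length constraint; the averaging rule is used here only because it is simple to verify.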
[0029] The dual-branch cognition and decision-making core module based on the state-space model corresponds to the decision-making layer and serves as the decision-making center of the agent. It adopts a dual-branch state-space model structure to process recognition and decision-making in parallel; specifically, it includes a defect recognition branch and a decision-focusing branch. In this embodiment, the enhanced feature sequence $S$ is fed into this module, whose core mechanism integrates selective scanning (S6) with parameter optimization.
Defect identification branch: This branch consists of 8 stacked selective state-space model blocks, each of which uses a state transition equation with a gating mechanism (the equation itself is missing from this text). Its terms are: the state vector at time step $t$ (the hidden state of the model at the current time step); the input vector at time step $t$ (the input data at the current time step); four weight matrices (parameters learned during training, corresponding to the mappings "state → state", "input → state", "state → gating", and "input → gating"); tanh(·), the hyperbolic tangent activation function, whose output range is (−1, 1); a gate activation with output range [0, 1], used to generate the "gating vector" that controls the proportion of information transmitted; and element-wise multiplication of two vectors of the same dimension. This equation balances the contributions of "historical state information" and "current input information" through the gating mechanism, mitigating the vanishing and exploding gradient problems of traditional recurrent networks and capturing long-term dependencies in sequence data more efficiently. By performing temporal modeling on the sequence $S$, the branch can dynamically focus on the contextual information most relevant to the defect semantics, distinguish real defects from background noise, and output the defect category (such as cracks, porosity, or inclusions) and its preliminary confidence level for each sequence position.
Decision-focusing branch: This branch operates in parallel with the defect identification branch. Its core function is to assess the uncertainty of the detection process, providing a basis for proactive decision-making. By calculating the dynamic range and information entropy of the feature sequence, it generates an uncertainty score $U$ for each position: $U = (\max(S) - \min(S)) + H(S)$, where $H(S)$ is the information entropy of the sequence, computed from the normalized confidence. When $U$ at a position exceeds a preset threshold (e.g., 0.7), the system marks the area as a "suspicious area" and automatically generates a re-inspection instruction.
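The uncertainty score, dynamic range plus information entropy compared against a threshold such as 0.7, can be illustrated with a small Python sketch. Several details are assumptions, because the text does not pin them down: the dynamic range is taken over the feature values at one position, the entropy is Shannon entropy with natural logarithm over class confidences normalized to sum to 1, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(confidences: Sequence[float]) -> float:
    """Shannon entropy (natural log) of confidences normalized to sum to 1."""
    total = sum(confidences)
    if total <= 0:
        raise ValueError("confidences must have a positive sum")
    probs = [c / total for c in confidences]
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty(features: Sequence[float], confidences: Sequence[float]) -> float:
    """U = (max - min of the features) + entropy of the normalized confidences."""
    return (max(features) - min(features)) + entropy(confidences)


def suspicious_positions(seq_features: Sequence[Sequence[float]],
                         seq_confidences: Sequence[Sequence[float]],
                         threshold: float = 0.7) -> List[int]:
    """Indices whose uncertainty exceeds the threshold (re-inspection candidates)."""
    return [i for i, (f, c) in enumerate(zip(seq_features, seq_confidences))
            if uncertainty(f, c) > threshold]


# Two toy positions: a confident, flat one and an ambiguous, high-contrast one.
features = [[0.10, 0.12, 0.11], [0.10, 0.60, 0.30]]
confidences = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25]]
print(suspicious_positions(features, confidences))
```

In this toy input only the second position is flagged: its class confidences are nearly uniform (high entropy) and its feature values vary widely (large dynamic range).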
[0030] The aforementioned decision fusion and execution module corresponds to the decision layer and application layer. It is responsible for integrating the output of the cognitive layer and controlling the closed-loop execution and final output of the detection process. As shown in Figure 2, this module specifically includes: 1) Re-inspection trigger: receives a re-inspection instruction from the decision-focusing branch and triggers a re-inspection process for the suspicious area; 2) Result fusion: adaptively fuses the preliminary results of the defect identification branch with the refined judgment results of the re-inspection area (for example, by weighted averaging) to generate a final inspection report that includes defect type, pixel-level localization, size measurement, and comprehensive confidence; 3) Human-computer interaction and output: supports result visualization, alarm output, and human-computer interaction, completing a full closed loop from perception to execution.
[0031] In this embodiment, the decision fusion and execution module receives a re-inspection instruction from the decision-focusing branch, then schedules and executes a refined detection loop. For each marked suspicious region, the system performs the following operations: 1) it crops the corresponding image patch from the original high-resolution image; 2) it sends the image patch back to the multi-scale visual perception module for targeted local secondary feature extraction; 3) it feeds the extracted fine features back into the dual-branch cognition and decision-making core module based on the state-space model for fine-grained judgment, obtaining the re-inspection confidence. The module then fuses the preliminary identification results with all re-inspection results. As shown in Figure 4, if the initial confidence level of a certain area is 0.72 and the confidence level rises to 0.92 after refined re-inspection, the system computes a final confidence with a weighted fusion strategy. With the example weights given (final confidence = 0.6 × initial confidence + 0.4 × re-inspection confidence), these values yield 0.6 × 0.72 + 0.4 × 0.92 = 0.80; the stated final confidence index of 0.85 would require different weights or an additional term. After integrating all information, the module generates a final inspection report containing defect type, pixel-level location, geometric dimensions, and overall confidence level, and provides alarms or visual markers through a human-computer interaction interface, thus completing the closed loop of the entire inspection task.
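The weighted fusion step is simple enough to check directly. The sketch below implements the example rule (0.6 × initial + 0.4 × re-inspection) as a convex combination; the function name `fuse_confidence` and the parameter `w_initial` are hypothetical. For the quoted inputs of 0.72 and 0.92, these example weights evaluate to 0.80.

```python
def fuse_confidence(initial: float, recheck: float, w_initial: float = 0.6) -> float:
    """Weighted average of the initial and re-inspection confidences.

    w_initial is the weight of the initial result; the re-inspection
    result receives 1 - w_initial, so the weights always sum to 1.
    """
    if not 0.0 <= w_initial <= 1.0:
        raise ValueError("w_initial must lie in [0, 1]")
    return w_initial * initial + (1.0 - w_initial) * recheck


final = fuse_confidence(0.72, 0.92)
print(round(final, 2))
```

Keeping the weights tied to a single parameter guarantees that the fused value stays between the two inputs, which makes the reported confidence easier to interpret.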
[0032] To verify the effectiveness of this intelligent agent architecture, this embodiment constructs and trains a multimodal fusion detection model based on a state-space model.
[0033] Data Preparation and Model Training: The model was trained using 8500 precisely spatially registered X-ray-ultrasound image pairs and their pixel-level defect annotations. The dataset was randomly divided into training, validation, and test sets in an 8:1:1 ratio. Training was performed using the AdamW optimizer with an initial learning rate of 3e-4, for a total of 300 training epochs. The loss function used was binary cross-entropy.
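For 8500 image pairs, an 8:1:1 random split gives 6800 training, 850 validation, and 850 test samples. A minimal sketch of such a split using only the Python standard library follows; the seed value and the `split_indices` helper are assumptions, since the text does not describe how the shuffle was performed.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, ratios: Sequence[int] = (8, 1, 1),
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices 0..n-1 and split them into train/val/test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(8500)
print(len(train_idx), len(val_idx), len(test_idx))
```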
[0034] Performance Comparison Analysis: The trained model was evaluated on an independent test set and compared with several mainstream baseline methods. Key performance metrics are shown in Table 1 below.
Table 1. Performance comparison of different methods on the test set

| Method | Mean Intersection over Union (mIoU) | Mean Precision (mPre) | Mean Recall (mRec) | F1 score | Crack defect recall |
| --- | --- | --- | --- | --- | --- |
| Single digital X-ray model (VM-UNet) | 78.3% | 85.1% | 73.2% | 78.7% | 41.5% |
| Single phased array ultrasound model (VM-UNet) | 76.8% | 79.8% | 75.9% | 77.8% | 88.7% |
| Traditional cascaded multimodal UNet | 81.5% | 84.3% | 79.1% | 81.6% | 83.4% |
| Method of the present invention | 87.9% | 90.2% | 86.1% | 88.1% | 92.3% |

Experimental results show that the method of the present invention is superior to the comparison methods in all key indicators, especially the recall rate of 92.3% for high-risk cracks, which demonstrates the effectiveness of the multimodal fusion and agent decision-making mechanism.
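The F1 column in Table 1 can be cross-checked from the precision and recall columns. The sketch below assumes that F1 is the harmonic mean of the mean precision and the mean recall, which the text does not state explicitly; under that assumption all four reported values are reproduced to one decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (mean precision %, mean recall %, reported F1 %) taken from Table 1.
TABLE_1 = {
    "Single digital X-ray model (VM-UNet)": (85.1, 73.2, 78.7),
    "Single phased array ultrasound model (VM-UNet)": (79.8, 75.9, 77.8),
    "Traditional cascaded multimodal UNet": (84.3, 79.1, 81.6),
    "Method of the present invention": (90.2, 86.1, 88.1),
}

for method, (precision, recall, reported) in TABLE_1.items():
    computed = round(f1_score(precision, recall), 1)
    print(f"{method}: computed {computed}, reported {reported}")
```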
[0035] The trained optimal model parameters are solidified into an inference engine and integrated into the industrial control computer of the synchronous scanning system, forming a complete online inspection pipeline of "synchronous acquisition → automatic registration → intelligent inference → report generation". Actual testing shows that the total cycle time for a single inspection can be controlled within 10 seconds, fully meeting the real-time requirements of online inspection of weld seams in oil and gas pipelines.
[0036] This invention deeply integrates the state-space model with an intelligent agent architecture, providing an integrated intelligent solution for synchronous and accurate detection of volumetric and planar defects. It significantly improves the comprehensiveness, automation, and reliability of detection, and has important practical engineering significance for ensuring the safe operation of pipelines.
[0037] The above description is merely a specific embodiment of the present invention and does not limit the scope of patent protection of the present invention. Any equivalent structural or procedural modifications made based on the description and drawings of the present invention, or direct or indirect applications in other related technical fields, are similarly included within the scope of patent protection of the present invention.
Claims
1. A state-space model-based intelligent agent for inspecting steel pipe welds, characterized in that: The system adopts a hierarchical closed-loop architecture, which is divided into a perception layer, a cognition layer, a decision-making layer and an application layer. It achieves dynamic interaction and autonomous detection with industrial scenarios through a multi-scale visual perception module, a serialization encoding module, a dual-branch cognition and decision-making core module based on a state space model and a decision fusion and execution module.
2. The intelligent agent according to claim 1, characterized in that: The multi-scale visual perception module, corresponding to the perception layer, receives raw weld images from digital X-ray or phased array ultrasonic scanning; extracts multi-scale visual features through a lightweight feature pyramid network; and outputs feature maps including three levels: C1, C2, and C3, which respectively carry low-level edge information, mid-level structural features, and high-level semantic information, serving as the basic input for subsequent serialization encoding.
3. The intelligent agent according to claim 2, characterized in that: The serialization encoding module, corresponding to the cognitive layer, transforms the two-dimensional feature map into a one-dimensional sequence suitable for processing by the state-space model. In view of the structural characteristics of the linear extension of the weld, a main serialization path along the weld direction is designed, supplemented by a lateral context supplementation mechanism to enhance the continuity of feature expression. Specifically, it includes: weld seam trajectory positioning unit and feature unfolding and enhancement unit.
4. The intelligent agent according to claim 3, characterized in that: The weld trajectory positioning unit, based on the C1 feature map, outputs the weld centerline coordinate sequence through a pre-trained U-Net network.
5. The intelligent agent according to claim 3, characterized in that: The feature expansion and enhancement unit unfolds the C3 feature map along the centerline trajectory $P$ to obtain the main feature sequence $S_0$. To enhance the contextual representation capability of the features, $W$ adjacent feature points are added on each side of $S_0$, forming an enhanced feature sequence $S = [S_{\text{left}}, S_0, S_{\text{right}}]$, which is used to transmit global structural information and local details.
6. The intelligent agent according to claim 1, characterized in that: The dual-branch cognition and decision-making core module based on the state-space model corresponds to the decision-making layer and serves as the decision-making center of the agent. It adopts a dual-branch state-space model structure to achieve parallel processing of recognition and decision-making; specifically, it includes a defect recognition branch and a decision-focusing branch.
7. The intelligent agent according to claim 6, characterized in that: The defect identification branch performs temporal modeling of the enhanced feature sequence through multiple stacked selective state space model blocks, dynamically focusing on the context information most relevant to the defect, effectively distinguishing real defects from background noise interference, and outputting the defect category and its confidence level corresponding to each sequence position.
8. The intelligent agent according to claim 6, characterized in that: The decision-focusing branch operates in parallel with the defect identification branch. It generates uncertainty scores for each position by calculating the dynamic range and information entropy of the feature sequence. When the agent detects changes in device parameters or increased environmental noise, it automatically initiates the threshold calibration process, i.e., issues a re-inspection command, triggering the system to perform secondary refined feature extraction and judgment on the corresponding area in the original high-resolution image, simulating the magnified inspection behavior of experts on suspicious areas.
9. The intelligent agent according to claim 1, characterized in that: The aforementioned decision fusion and execution module corresponds to the decision layer and the application layer. It is responsible for integrating the output of the cognitive layer and controlling the closed-loop execution and final output of the detection process; specifically, it: 1) receives the re-inspection instruction from the decision-focusing branch and triggers the re-inspection process for the suspicious area; 2) adaptively fuses the preliminary results of the defect identification branch with the refined judgment results of the re-inspection area to generate a final inspection report that includes defect type, pixel-level localization, size measurement, and comprehensive confidence; and 3) supports result visualization, alarm output, and human-computer interaction, completing a full closed loop from perception to execution.
10. The intelligent agent according to any one of claims 1-9, characterized in that: The aforementioned intelligent agent for steel pipe weld inspection further integrates welding process knowledge and prior domain rules of defect morphology, and embeds a self-supervised feedback mechanism to dynamically optimize model parameters during continuous operation, thereby achieving autonomous improvement of inspection performance and scene adaptation.