A system for constructing a multimodal dataset for a split-type in-pipe detector

By constructing a split-type pipeline internal detector multimodal dataset system, the problem of insufficient multimodal datasets was solved, and the effective integration and analysis of multimodal information was realized, thereby improving the efficiency and accuracy of pipeline inspection.

CN122241273A · Pending · Publication Date: 2026-06-19 · SINOMACH SENSING TECH CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SINOMACH SENSING TECH CO LTD
Filing Date: 2026-05-14
Publication Date: 2026-06-19

Application Information

Patent Timeline

14 May 2026: Application
19 Jun 2026: Publication (CN122241273A)

IPC: G06F18/23; G06F18/241; G06F18/213; G06N3/0895; G06N3/0499

AI Technical Summary

⚠Technical Problem

Existing technologies lack high-quality, cross-equipment, and cross-operating-condition multimodal pipeline inspection datasets, which prevents the full release of the collaborative value of multimodal information. Furthermore, multimodal deep learning lacks mature analytical experience to support it, resulting in low pipeline inspection efficiency.

⚗Method used

A multimodal dataset system for split-type in-pipe detectors is constructed, including modules for data reading, reconstruction, bounding, feature mapping, and clustering. The multimodal dataset is generated through benchmark data alignment, signal threshold comparison, multilayer perceptron training, and multi-level label constraints.

🎯Benefits of technology

It enables the effective integration and analysis of multimodal data, improves the efficiency and accuracy of pipeline inspection, ensures the quality and consistency of the dataset, and supports the application of multimodal deep learning.

✦ Generated by Eureka AI based on patent content.

Abstract

This application provides a system for constructing a multimodal dataset for a split-type in-pipe detector, relating to the field of data processing technology. The system includes: a data reading module configured to acquire multimodal data; a bounding module configured to mark a data signal as an abnormal signal point if it exceeds a preset signal threshold, and to merge abnormal signal points into abnormal signal regions; a feature mapping and clustering module configured to map data signal segments to modal features, concatenate the modal features to obtain region features, map the region features to a contrast space for contrastive learning training, and map the region features to a category space for clustering training; and a dataset construction module configured to establish a multi-level label constraint system and input it, along with several single datasets, into a working condition adaptive framework to generate a multimodal dataset. The system thereby addresses the current lack of a dedicated system for constructing multimodal datasets for in-pipe detectors.

Description

Technical Field

[0001] This application relates to the field of data processing technology, and in particular to a system for constructing a multimodal dataset of a split-type in-pipe detector.

Background Technology

[0002] Pipeline transportation is a crucial method for large-scale oil and gas transport. During long-term operation, pipelines are prone to defects such as corrosion, cracks, and pinholes. Allowing these defects to develop can lead to pipeline failure, causing media leaks or even explosions, posing a significant threat to the safety of the pipeline route. To avoid these catastrophic consequences, regular pipeline inspections are necessary to ensure timely repair and maintenance of any defects.

[0003] Axially excited magnetic flux leakage detectors (MFLs) are the primary equipment for detecting pipeline defects, effectively detecting metal loss. However, they are limited in detecting axially oriented narrow cracks, weld abnormalities, mechanical damage, and corrosion pits. To meet increasingly complex inspection needs, internal detectors based on different detection principles have been put into use, such as circumferentially excited magnetic flux leakage detectors (TFIs), stress detectors, and ultrasonic detectors. While this change provides a more diverse perspective for pipeline non-destructive testing, it also brings challenges in data analysis. On the one hand, the data generated by new inspection equipment (such as TFIs and stress detectors) lacks mature analytical experience, making manual processing inefficient. On the other hand, the potential correlation information between multimodal data obtained from multiple detectors has not been effectively developed, resulting in the underutilization of the synergistic value of multimodal information.

[0004] The development of artificial intelligence technology, especially multimodal deep learning, has laid the technological foundation for the efficient analysis and processing of multimodal internal detection data. However, multimodal deep learning has high requirements for both the quantity and quality of data. Although internal detection data is no longer scarce after years of accumulation, high-quality, labeled datasets that are cross-device and cross-condition remain extremely limited. This leads to incompatibility between a large amount of internal detection data and mature multimodal analysis tools, which is the main reason why efficient and reliable multimodal internal detection models are still lacking.

Summary of the Invention

[0005] This application provides a system for constructing a multimodal dataset of split-type in-pipe detectors to solve the technical problem of the lack of a dedicated system for constructing multimodal datasets of in-pipe detectors.

[0006] This application provides a system for constructing a multimodal dataset of a split-type in-pipe detector, including: a data reading module, a data reconstruction module, a bounding module, a feature mapping and clustering module, and a dataset construction module.

The data reading module is configured to: acquire multimodal data collected by a split-type in-pipe detector.

The data reconstruction module is configured to: determine the baseline data in the multimodal data, the baseline data being the pipeline inspection data collected by one of the in-pipe detectors in the multimodal data; and, based on the baseline data, align the remaining modal data along the mileage dimension so that the number of pipe segment samples in the remaining modal data is equal to the number of pipe segment samples in the baseline data, thus obtaining the target multimodal data. The remaining modal data refers to the modal data other than the baseline data.

The bounding module is configured to: compare the data signal corresponding to the target multimodal data with a preset signal threshold; if the data signal is greater than the preset signal threshold, mark the data signal as an abnormal signal point; and cluster adjacent abnormal signal points based on the eight-connection relationship, merging the abnormal signal points into an abnormal signal region.

The feature mapping and clustering module is configured to: configure an encoder for each modal data to map the data signal segments of the abnormal signal region under each modal data into modal features; concatenate the modal features and use a multilayer perceptron to obtain the region features; use a multilayer perceptron to map the region features to a contrast space for contrastive learning training, which is used to ensure that the region features are not affected by noise and modal differences; use a multilayer perceptron to map the region features to a category space for clustering training; and store the model parameters of the encoder and the multilayer perceptron after training is completed.

The dataset construction module is configured to: establish a multi-level label constraint system, which includes pipeline defect category, data acquisition principle, hardware configuration, pipeline structural characteristic parameters, and pipeline inspection equipment operating parameters; and input the label constraint system and several single datasets into the working condition adaptive framework to generate a multimodal dataset.
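The feature path described above (per-modality features, concatenation, a fusion multilayer perceptron, then a projection into a contrast space) can be sketched as a forward pass. This is a minimal sketch under stated assumptions, not the application's implementation: the encoders are replaced by random vectors, the weights are untrained, and the InfoNCE-style loss is an assumed choice, since the text does not name a contrastive loss. All function names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Two-layer perceptron with a ReLU in between (biases omitted)."""
    return np.maximum(x @ w1, 0.0) @ w2

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss: row i of z_a should match row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

n_regions, d_modal, d_region, d_contrast = 8, 16, 24, 8

# Stand-ins for the encoder outputs of two modalities (assumption: random).
feat_mod1 = rng.normal(size=(n_regions, d_modal))
feat_mod2 = rng.normal(size=(n_regions, d_modal))
concat = np.concatenate([feat_mod1, feat_mod2], axis=1)    # concatenated modal features

# Untrained fusion MLP and projection head (assumption: random weights).
fuse_w1 = rng.normal(size=(2 * d_modal, 32))
fuse_w2 = rng.normal(size=(32, d_region))
proj_w1 = rng.normal(size=(d_region, 16))
proj_w2 = rng.normal(size=(16, d_contrast))

# Two views of the same regions: clean, and with small additive noise.
region_a = mlp(concat, fuse_w1, fuse_w2)
region_b = mlp(concat + 0.01 * rng.normal(size=concat.shape), fuse_w1, fuse_w2)
z_a = mlp(region_a, proj_w1, proj_w2)
z_b = mlp(region_b, proj_w1, proj_w2)

loss_matched = info_nce(z_a, z_b)                          # views paired correctly
loss_shuffled = info_nce(z_a, np.roll(z_b, 1, axis=0))     # views paired wrongly
```

Minimizing such a loss during training would pull the two views of each region together in the contrast space, which is one way to pursue the stated goal of region features that are insensitive to noise.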

[0007] In some embodiments, the data reading module is further configured to: Read multimodal data using a one-dimensional array data type; Based on the arrangement rules or number of channels when the multimodal data is saved, the multimodal data is reconstructed, and the data type of the multimodal data is converted into a first two-dimensional array; The multimodal data is transposed to convert the data type of the multimodal data into a second two-dimensional array.

[0008] In some embodiments, the data reconstruction module is further configured to: Based on the circumferential weld alignment list of the multimodal data, the pipe section information of the multimodal data is extracted and stored in the pipe section information dictionary; the pipe section information includes: the pipe section length, the number of pipe section samples, and the global index of the pipe section interval corresponding to the multimodal data; Virtual welds are added before the first weld and after the last weld of the pipe section corresponding to the multimodal data to form virtual pipe sections, and the pipe section information of the multimodal data corresponding to the virtual pipe sections is stored in the pipe section information dictionary; Based on the pipe section information dictionary, the remaining multimodal data is sequentially truncated and resampled using interpolation or sampling methods to make the number of pipe section samples in the remaining modal data equal to the number of pipe section samples in the reference data, thus obtaining the data to be spliced. The data to be spliced is spliced sequentially to obtain the target multimodal data.

[0009] In some embodiments, the data reconstruction module is further configured to: When the number of pipe section samples in the data to be spliced is less than the number of pipe section samples in the reference data, and the difference between the number of pipe section samples in the data to be spliced and the number of pipe section samples in the reference data is greater than a preset ratio, then the cubic spline interpolation method is used for resampling. When the number of pipe section samples in the data to be spliced is less than the number of pipe section samples in the reference data, and the difference between the number of pipe section samples in the data to be spliced and the number of pipe section samples in the reference data is less than or equal to a preset ratio, then linear interpolation is used for resampling. If the number of pipe section samples in the data to be spliced is greater than the number of pipe section samples in the reference data, then the sampling method is used for resampling.

[0010] In some embodiments, the system further includes: The reconstruction result verification module is configured as follows: The number of pipe section samples in the target multimodal data is compared with the number of pipe section samples in the benchmark data to generate a sample number comparison result; Variational mode decomposition technology is used to decompose the data signal corresponding to the target multimodal data into several time-domain components; The time-domain component is subjected to frequency domain transformation to obtain the frequency domain features corresponding to the time-domain component; The waveform similarity is calculated by using dynamic time warping technology on the time-domain components. The similarity of the frequency domain feature energy distribution corresponding to the frequency domain feature is compared using the maximum information coefficient to obtain a frequency domain feature similarity report; Based on the sample quantity comparison results, waveform similarity, and frequency domain feature similarity report, a reconstruction result verification report is obtained.

[0011] In some embodiments, the bounding module is further configured to: divide the target multimodal data into windows of preset length, and apply Gaussian smoothing to the data signals corresponding to the target multimodal data along the length and width of each window.

[0012] In some embodiments, the bounding module is further configured to: if a first abnormal signal region contains only abnormal signal points corresponding to a single modal data, remove the first abnormal signal region; if a second abnormal signal region contains abnormal signal points corresponding to multiple modal data, retain the second abnormal signal region; and, based on the second abnormal signal region, generate a bounding result file. The bounding result file includes: the mileage range, clock direction, mode, abnormal intensity statistics, and cross-modal matching identifier of the abnormal signal region.
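The cross-modal filtering and the bounding result record can be illustrated as follows. This is a minimal sketch, assuming each abnormal signal region is already available as a list of points tagged with modality, mileage, clock position, and signal value. The field names, the `build_bounding_result` helper, and the choice of max and mean as intensity statistics are hypothetical and do not come from the application.

```python
import json

def to_record(points):
    """Summarize one abnormal signal region as a result-file record."""
    miles = [p["mileage_m"] for p in points]
    clocks = [p["clock"] for p in points]
    values = [p["value"] for p in points]
    modes = sorted({p["modality"] for p in points})
    return {
        "mileage_range_m": (min(miles), max(miles)),
        "clock_range": (min(clocks), max(clocks)),
        "modes": modes,
        "intensity": {"max": max(values), "mean": sum(values) / len(values)},
        "cross_modal_match": len(modes) > 1,
    }

def build_bounding_result(regions):
    """Keep only regions supported by more than one modality."""
    records = [to_record(points) for points in regions]
    return [rec for rec in records if rec["cross_modal_match"]]

regions = [
    # Region seen by a single modality only: removed.
    [{"modality": "MFL", "mileage_m": 3.1, "clock": 6.0, "value": 4.2}],
    # Region seen by two modalities: retained.
    [
        {"modality": "MFL", "mileage_m": 12.4, "clock": 3.0, "value": 5.0},
        {"modality": "TFI", "mileage_m": 12.6, "clock": 3.5, "value": 7.0},
    ],
]

result = build_bounding_result(regions)
result_text = json.dumps(result, indent=2)   # content that could be written to a file
```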

[0013] In some embodiments, the system further includes a visualization module configured to: extract segments from the corresponding modal data based on the mileage range and clock direction information in the bounding result file; and send the extracted segments to a display device for display in a set format. The set format includes: line chart, pseudo-color image, or grayscale image.

[0014] The display device is used to add the tag data corresponding to the extracted segment to the category space.

[0015] In some embodiments, the clustering training includes an unsupervised clustering mode and a weakly supervised clustering mode, and the feature mapping and clustering module is further configured to: when none of the region features have labeled data, perform unsupervised clustering mode training using the KMeans or HDBSCAN algorithm combined with the silhouette coefficient; and when some of the region features have labeled data, perform weakly supervised clustering mode training using the constrained KMeans algorithm.

[0016] In some embodiments, the dataset building module is further configured to: The label constraint system and several of the single datasets are input into the feature extractor to make the modal features tend to be distributed uniformly in a unified space.

[0017] This application provides a system for constructing a multimodal dataset of a split-type in-pipe detector, comprising: a data reading module, a data reconstruction module, a bounding module, a feature mapping and clustering module, and a dataset construction module; the data reading module is configured to: acquire multimodal data collected by the split-type in-pipe detector; the data reconstruction module is configured to: determine the baseline data in the multimodal data, the baseline data being pipeline detection data collected by one type of in-pipe detector in the multimodal data; based on the baseline data, align the remaining modal data in the mileage dimension so that the number of pipe segment samples in the remaining modal data is equal to the number of pipe segment samples in the baseline data, thereby obtaining the target multimodal data; the remaining modal data is the modal data other than the baseline data in the multimodal data; the bounding module is configured to: compare the data signal corresponding to the target multimodal data with a preset signal threshold; if the data signal is greater than the preset signal threshold, mark the data signal as an abnormal signal point; and cluster adjacent abnormal signal points based on an eight-connection relationship, merging the abnormal signal points into abnormal signal regions; the feature mapping and clustering module is configured to: configure an encoder for each modal data to map the data signal segments of the abnormal signal region under each modal data into modal features; concatenate the modal features and use a multilayer perceptron to obtain region features; use the multilayer perceptron to map the region features to a contrast space for contrastive learning training, the contrastive learning training being used to make the region features unaffected by noise and modal differences; use the multilayer perceptron to map the region features to a category space for clustering training; store the model parameters of the encoder and the multilayer perceptron after training; the dataset construction module is configured to: establish a multi-level label constraint system, the label constraint system including: pipeline defect category, data acquisition principle, hardware configuration, pipeline structural feature parameters, and pipeline inspection equipment operating parameters; and input the label constraint system and several single datasets into the working condition adaptive framework to generate a multimodal dataset, thereby providing a dedicated system for constructing multimodal datasets for in-pipe inspection.

Attached Figure Description

[0018] To more clearly illustrate the technical solution of this application, the drawings used in the embodiments will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of the operation of the system for constructing the multimodal dataset of the split-type in-pipe detector in this application; Figure 2 is a flowchart of the operation of the reconstruction result verification module in this application; Figure 3 is a flowchart of the operation of the visualization module in this application; Figure 4 is a line chart corresponding to a segment extracted based on the bounding result file in this application.

[0020] Explanation of reference numerals in the attached figures: 1-Data reading module; 2-Data reconstruction module; 3-Bounding module; 4-Feature mapping and clustering module; 5-Dataset construction module; 6-Reconstruction result verification module; 7-Visualization module.

Detailed Implementation

[0021] To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this application.

[0022] Because existing technologies lack a dedicated system for constructing multimodal datasets for in-pipe inspection, this application provides a system for constructing multimodal datasets for split-type in-pipe detectors, described below. A split-type in-pipe detector is a multi-unit internal detector built by separating modules that use different detection principles, such as ultrasonic and magnetic flux leakage modules. The modules are separated because their structural requirements (such as spatial layout, magnetic field, and sound wave conduction conditions) differ greatly or are even mutually exclusive, and because the equipment size must meet the size restrictions of pipeline components (such as the minimum bend size).

[0023] The detector types involved are as follows. Diameter measuring detectors are internal detectors that use mechanical or optical diameter measuring tools to detect pipe geometric deformations (such as dents and ellipticity). MFL (magnetic flux leakage) detectors are axially excited magnetic flux leakage detectors, typically used to detect pipe metal loss (such as corrosion and mechanical damage). TFI (transverse field inspection) detectors are circumferentially excited magnetic flux leakage detectors, mostly used to detect axial defects (such as axial cracks), complementing MFL detectors. Eddy current detectors use the principle of electromagnetic induction to detect surface and near-surface defects (such as cracks and corrosion) in pipes. Ultrasonic detectors use ultrasonic waves to measure wall thickness or detect internal defects. Stress detectors are internal detectors that monitor pipe stress distribution or strain state (such as welding stress and external load effects).

[0024] Figure 1 is a flowchart of the operation of the system for constructing the multimodal dataset of the split-type in-pipe detector in this application.

[0025] This application provides a system for constructing a multimodal dataset of a split-type in-pipe detector, including: Data reading module 1, data reconstruction module 2, bounding module 3, feature mapping and clustering module 4, dataset construction module 5.

[0026] The data reading module 1 is configured to: acquire multimodal data collected by a split-type in-pipe detector. Multimodal data refers to a data set drawn from multiple different modes (types, sources, or forms). In the field of in-pipe inspection, multimodal data usually refers to data obtained by different detection principles (such as magnetic flux leakage, ultrasonic, eddy current, and stress) that reflects different characteristics of the pipeline (such as metal loss, wall thickness, and cracks).

[0027] For example, in practical work, to ensure the efficiency and reliability of data recorded by detectors inside pipelines, the raw data is usually stored in binary format. However, during the data processing stage, it needs to be converted into a calculable decimal numerical form for easier calculation. Therefore, before formal processing, the raw data must first be read and the format converted.

[0028] The data reading module 1 is further configured as follows: Multimodal data is read using a one-dimensional array data type; the multimodal data is reconstructed according to the arrangement rules or number of channels when the multimodal data is saved, and the data type of the multimodal data is converted into a first two-dimensional array; the multimodal data is transposed, and the data type of the multimodal data is converted into a second two-dimensional array.

[0029] Specifically, in this application, the above operations are implemented by the data reading module 1. For the original binary data file (such as a .bint file), according to the preset data type (such as float16, float32, int16, int32, etc.), it is read into a one-dimensional array (shape 1×m) using Python's numpy.fromfile; then, the one-dimensional array is reconstructed according to the file's saved arrangement rules or the number of channels (let's say n) to obtain a first two-dimensional array (shape k×n, where k×n=m); finally, the resulting two-dimensional array is transposed to obtain the final second two-dimensional array (shape n×k). This second two-dimensional array is the final output and can be directly used for subsequent analysis and algorithm calculations.
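The read, reshape, and transpose sequence can be sketched with NumPy as follows. This is a minimal sketch, assuming the file stores samples interleaved by channel (all n channel values for the first sample, then the second, and so on); the actual arrangement rule depends on how the detector saved the file. The function name `read_detector_file` and the demo file are hypothetical.

```python
import os
import tempfile

import numpy as np

def read_detector_file(path, dtype, n_channels):
    """Read a raw binary file and return an array of shape (n_channels, k)."""
    flat = np.fromfile(path, dtype=dtype)            # one-dimensional array, m values
    if flat.size % n_channels != 0:
        raise ValueError("value count is not a multiple of the channel count")
    first = flat.reshape(-1, n_channels)             # first 2-D array, shape (k, n)
    return first.T                                   # second 2-D array, shape (n, k)

# Demo with a synthetic file: n = 4 channels, k = 6 samples per channel.
n, k = 4, 6
samples = np.arange(k * n, dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    samples.tofile(demo_path)
    data = read_detector_file(demo_path, np.float32, n)
```

Each row of `data` then holds one channel along the mileage direction, which is the layout the later alignment steps operate on in this sketch.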

[0030] For example, the data collected by the individual units of a split-type internal detector can differ in total mileage. To address this, this application proposes a data reconstruction module 2. Its core function is to take one reliable modal data, such as axially excited magnetic flux leakage detection data (MFL data), as the baseline data and to align the data from the other detector units in the mileage dimension, so that the number of samples in each modal data remains consistent. The data from the other units (hereinafter referred to as multimodal data) includes, for example, circumferentially excited magnetic flux leakage detection data (TFI data), stress detection data (stress data), and ultrasonic detection data (ultrasonic data).

[0031] The data reconstruction module 2 is configured as follows: A baseline data is determined within the multimodal data; the baseline data is pipeline inspection data collected by one type of pipeline detector within the multimodal data. The baseline data refers to the more reliable modal data within the multimodal data, and is not limited thereto.

[0032] Based on the baseline data, the remaining multimodal data is aligned along the mileage dimension so that the number of pipe segment samples in the remaining modal data is equal to the number of pipe segment samples in the baseline data, thus obtaining the target multimodal data; the remaining modal data refers to the modal data other than the baseline data.

[0033] For example, by aligning the remaining multimodal data along the mileage dimension, pipeline detection data obtained from different detection principles or multiple sources can be matched and synchronized based on a unified mileage length benchmark, ensuring that each modal data corresponds accurately in the same mileage coordinate system, thereby achieving effective integration and analysis of multimodal information.

[0034] Specifically, the data reconstruction module 2 is further configured as follows: Based on the circumferential weld alignment list of the multimodal data, the pipe section information of the multimodal data is extracted and stored in the pipe section information dictionary; the pipe section information includes: the pipe section length, the number of pipe section samples, and the global index of the pipe section interval corresponding to the multimodal data.

[0035] For example, based on the circumferential weld alignment list between each modal data (modal data other than the reference data in the multimodal data) and the reference data, the pipe section information of each modality is extracted, and the length of each pipe section in each modal data, the number of samples contained therein, the global index of the pipe section interval, and the number of samples of the corresponding pipe section in the reference data are recorded and stored in the pipe section information dictionary. It is worth noting that the acquisition of the circumferential weld alignment list is not within the scope of this application and can be obtained manually or by a specific algorithm, and is not limited here.

[0036] Virtual welds are added before the first weld and after the last weld of the pipe section corresponding to the multimodal data to form virtual pipe sections, and the pipe section information of the multimodal data corresponding to the virtual pipe sections is stored in the pipe section information dictionary. In order to avoid local deformation of the first and last welds due to large differences in mileage between the first and last sections of the pipeline, the algorithm adds virtual welds before the first weld and after the last weld to form two virtual pipe sections, and adds the corresponding information to the pipe section information dictionary.

[0037] It is worth noting that the insertion position of the virtual weld should be ensured to be within the effective mileage range, usually 0.2m before (after) the first (last) weld.
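One possible shape for the pipe section information dictionary, including the two virtual pipe sections, is sketched below. It assumes the circumferential weld alignment list is given as matching weld positions expressed in sample indices for one modality and for the reference data, and that each data stream has a constant sample spacing. The function name, the dictionary keys, and the demo numbers are hypothetical; only the 0.2 m margin comes from the text.

```python
def build_section_dict(welds, ref_welds, n_total, ref_n_total,
                       step_m, ref_step_m, margin_m=0.2):
    """Build per-section records from aligned weld sample indices."""
    if len(welds) != len(ref_welds):
        raise ValueError("weld lists must be aligned one to one")

    def with_virtual(weld_idx, total, step):
        pad = int(round(margin_m / step))             # margin in samples
        first = max(0, weld_idx[0] - pad)             # clamp to valid range
        last = min(total, weld_idx[-1] + pad)
        return [first] + list(weld_idx) + [last]

    w = with_virtual(welds, n_total, step_m)
    r = with_virtual(ref_welds, ref_n_total, ref_step_m)

    sections = {}
    for i in range(len(w) - 1):
        start, stop = w[i], w[i + 1]
        sections[i] = {
            "length_m": (stop - start) * step_m,
            "n_samples": stop - start,
            "global_index": (start, stop),
            "ref_n_samples": r[i + 1] - r[i],
            "virtual": i in (0, len(w) - 2),          # first and last are virtual
        }
    return sections

sections = build_section_dict(
    welds=[100, 600, 1100], ref_welds=[110, 620, 1130],
    n_total=1300, ref_n_total=1400, step_m=0.002, ref_step_m=0.002,
)
```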

[0038] Based on the pipe section information dictionary, the remaining multimodal data is sequentially truncated and resampled using interpolation or sampling methods to make the number of pipe section samples in the remaining modal data equal to the number of pipe section samples in the reference data, thus obtaining the data to be spliced.

[0039] Specifically, the data reconstruction module 2 is further configured as follows: When the number of pipe section samples in the data to be spliced is less than the number of pipe section samples in the reference data, and the difference between the number of pipe section samples in the data to be spliced and the number of pipe section samples in the reference data is greater than a preset ratio, then the cubic spline interpolation method is used for resampling.

[0040] When the number of pipe section samples in the data to be spliced is less than the number of pipe section samples in the reference data, and the difference between the number of pipe section samples in the data to be spliced and the number of pipe section samples in the reference data is less than or equal to a preset ratio, then linear interpolation is used for resampling.

[0041] If the number of pipe section samples in the data to be spliced is greater than the number of pipe section samples in the reference data, then the sampling method is used for resampling.

[0042] The data to be spliced is spliced sequentially to obtain the target multimodal data.

[0043] Specifically, based on the obtained pipe segment information dictionary, the data (data to be stitched) of each pipe segment to be aligned is sequentially extracted in a loop, and resampling is performed through interpolation or sampling to ensure that the number of samples in each modality is consistent with the number of samples in the reference modality, until all pipe segments have been resampled. To ensure the reliability of resampling, cubic spline interpolation is used when the number of samples in the pipe segment to be aligned is less than that in the reference pipe segment and the difference exceeds 5%; linear interpolation is used if the difference does not exceed 5%; and sampling can be performed directly when the number of samples in the pipe segment to be aligned is greater than that in the reference pipe segment. Finally, the resampled data from each pipe segment are stitched together in order to obtain the final mileage aligned data (target multimodal data).
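The per-section resampling rule can be sketched with NumPy and SciPy. This is a minimal sketch under stated assumptions: the "difference" is taken as the shortfall relative to the reference sample count, and the "sampling" branch is implemented as picking evenly spaced samples, neither of which the text specifies. The function name `resample_section` is hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_section(segment, n_ref, ratio=0.05):
    """Resample a (channels, n) pipe-section array to (channels, n_ref)."""
    segment = np.atleast_2d(np.asarray(segment, dtype=float))
    n = segment.shape[1]
    if n == n_ref:
        return segment.copy()
    if n > n_ref:
        # More samples than the reference: keep evenly spaced samples.
        idx = np.round(np.linspace(0, n - 1, n_ref)).astype(int)
        return segment[:, idx]
    old_x = np.linspace(0.0, 1.0, n)
    new_x = np.linspace(0.0, 1.0, n_ref)
    if (n_ref - n) / n_ref > ratio:
        # Shortfall above the preset ratio: cubic spline interpolation.
        return CubicSpline(old_x, segment, axis=1)(new_x)
    # Shortfall within the preset ratio: linear interpolation per channel.
    return np.stack([np.interp(new_x, old_x, ch) for ch in segment])

def wave(n):
    theta = np.linspace(0.0, 2.0 * np.pi, n)
    return np.vstack([np.sin(theta), np.cos(theta)])

sparse, near, dense = wave(80), wave(97), wave(130)   # 20% short, 3% short, surplus
aligned = [resample_section(s, 100) for s in (sparse, near, dense)]
target = np.concatenate(aligned, axis=1)              # stitch sections in order
```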

[0044] Figure 2 is a flowchart of the operation of the reconstruction result verification module 6 in this application.

[0045] The system also includes: Reconstruction result verification module 6, wherein the reconstruction result verification module 6 is configured as follows: The number of pipe section samples in the target multimodal data is compared with the number of pipe section samples in the reference data to generate a sample number comparison result. By verifying the number of samples in the reconstruction result (target multimodal data) within the pipe section interval, it is ensured that each modal data and the reference data have a consistent sample size under a unified mileage benchmark, thereby eliminating structural deviations caused by improper truncation or splicing.

[0046] Variational mode decomposition (VMD) is used to decompose the data signal corresponding to the target multimodal data into several time-domain components; frequency domain transformation is performed on the time-domain components to obtain the frequency-domain features corresponding to the time-domain components; dynamic time warping is used to calculate the waveform similarity of the time-domain components to obtain the waveform similarity; the similarity of the frequency band energy distribution corresponding to the frequency-domain features is compared using the maximum information coefficient to obtain a frequency-domain feature similarity report.

[0047] Specifically, to verify the information consistency and structural matching degree before and after data alignment, this application conducts a consistency analysis of data features based on mode decomposition and dynamic time warping (DTW) techniques. First, variational mode decomposition is used to decompose the signals of each pipe section into several independent time-domain components, and frequency domain transformation is performed on each component to obtain the corresponding frequency-domain features. Then, DTW is used in the time domain to calculate the waveform similarity of each component, and the similarity of frequency band energy distribution is compared in the frequency domain using the maximum information coefficient (MIC). By comparing the similarity of different components and the overall data, the information consistency and structural matching degree are comprehensively evaluated.
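The time-domain part of this consistency check can be illustrated with a plain dynamic time warping distance. The sketch below is an assumption-laden example: it uses the absolute difference as the local cost and converts the distance to a similarity with `1 / (1 + d)`, neither of which is specified in this application, and it omits the VMD and MIC steps.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def dtw_similarity(a, b):
    """Map the DTW distance to (0, 1]; this mapping is an illustrative choice."""
    return 1.0 / (1.0 + dtw_distance(a, b))
```

A component that was only stretched by resampling, for instance `[1, 2, 3]` against `[1, 2, 2, 3]`, has a distance of zero, whereas mileage drift or sampling distortion raises the distance.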

[0048] Based on the sample quantity comparison results, waveform similarity, and frequency domain feature similarity report, a reconstruction result verification report is obtained.

[0049] For example, the reconstruction result verification report can effectively identify problems such as mileage drift, sampling distortion, or intermodal synchronization errors that may occur during the reconstruction process. The execution logs of the entire verification process, sample quantity comparison results, feature consistency indicators, and related data verification results are all written into a separate reconstruction result verification report, facilitating subsequent traceability, review, and comparative analysis, thereby continuously ensuring the reliability and traceability of the reconstruction results.

[0050] For example, in defect-free pipe sections, the internal detection signal typically exhibits strong stability; however, in areas where components are located or defects exist, the signal shows significant fluctuations and disturbances. Based on this data characteristic, this application constructs a non-stationary region autonomous bounding module to achieve automatic identification and location of abnormal signals. This module fully utilizes the statistical characteristics of the signal itself, completing the bounding task without relying on a complex training process, thus effectively solving the problems of low efficiency and strong subjectivity in manual abnormal region bounding. First, the detection data of different modalities are divided into sliding windows along the time (or mileage) direction to evaluate the signal stationarity within a local range; then, non-stationary sampling points are marked within each window, and adjacent abnormal points are aggregated into a complete abnormal region; finally, the final abnormal bounding result is determined through cross-modal feature matching, realizing an automated process from local detection to overall confirmation.

[0051] The bounding module 3 is further configured to: divide the target multimodal data into windows of a preset length, and apply Gaussian smoothing to the data signals corresponding to the target multimodal data along the length and width of each window.

[0052] Specifically, the modal data is divided into fixed-length windows to analyze signal stability within a local area; within each window, the data signals corresponding to the target multimodal data are Gaussian smoothed along both the horizontal and vertical directions to suppress random noise and highlight trend changes.

[0053] The bounding module 3 is configured as follows: The data signal corresponding to the target multimodal data is compared with a preset signal threshold. If the data signal is greater than the preset signal threshold, the data signal is marked as an abnormal signal point. Specifically, the smoothed data signal is compared with the preset signal threshold, and sampling points that still deviate significantly from the preset signal threshold are marked as abnormal signal points.

[0054] Adjacent abnormal signal points are clustered based on eight-connectivity: spatially continuous abnormal signal points are merged into a single abnormal signal region, which eliminates interference from isolated noise points and forms a complete candidate box.
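The thresholding and eight-connectivity clustering steps can be sketched as a breadth-first connected-component search over a two-dimensional window (rows as mileage samples, columns as circumferential channels). The function name, the output keys, and the `min_points` cutoff for discarding isolated points are assumptions made for illustration; wrap-around between the first and last channel is not handled here.

```python
from collections import deque


def anomalous_regions(grid, threshold, min_points=2):
    """Group above-threshold points into 8-connected regions with bounding boxes."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or grid[r0][c0] <= threshold:
                continue
            seen[r0][c0] = True
            queue, points = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                points.append((r, c))
                # Visit all eight neighbours (the centre is already marked seen).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and not seen[rr][cc]
                                and grid[rr][cc] > threshold):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            if len(points) >= min_points:  # drop isolated noise points
                rs = [p[0] for p in points]
                cs = [p[1] for p in points]
                regions.append({
                    "rows": (min(rs), max(rs)),
                    "channels": (min(cs), max(cs)),
                    "n_points": len(points),
                })
    return regions
```

Diagonal neighbours count as connected, so a defect signature that runs obliquely across channels stays in one candidate box instead of being split.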

[0055] The bounding module 3 is further configured to: If there is only one abnormal signal point corresponding to a single modal data in the first abnormal signal region, the first abnormal signal region is removed; if there are multiple abnormal signal points corresponding to multiple modal data in the second abnormal signal region, the second abnormal signal region is retained.

[0056] Based on the second anomalous signal region, a bounding result file is generated; the bounding result file includes: the mileage range, clock direction, mode, anomalous intensity statistics, and cross-modal matching identifier of the anomalous signal region.

[0057] Specifically, candidate anomalous regions obtained from different modalities are feature-matched based on mileage location and clock direction. False anomalous signal regions (first anomalous signal regions) appearing only in a single modality are eliminated, while second anomalous signal regions consistently confirmed across multiple modalities are retained as the final bounding result. The bounding result is saved as a bounding result file in the form of a structured JSON file. This bounding result file records detailed information for each anomalous region, including its mileage range, clock direction, associated modality, anomalous intensity statistics, and cross-modal matching identifier, ensuring accurate location and backtracking in subsequent processing.

[0058] In the feature matching stage, since the modal data has been aligned by mileage, efficient matching and confirmation can be achieved simply by setting a threshold based on mileage and clock direction.
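A minimal sketch of this threshold-based cross-modal confirmation is given below. The record keys (`modality`, `start_m`, `end_m`, `clock_h`), the tolerance values, and the example candidates are all hypothetical; the application only states that the bounding result is stored as structured JSON with mileage range, clock direction, modality, intensity statistics, and a cross-modal matching identifier.

```python
import json


def clock_gap(a, b):
    """Smallest gap between two clock positions on a 12-hour dial."""
    d = abs(a - b) % 12.0
    return min(d, 12.0 - d)


def matches(p, q, mileage_tol=0.5, clock_tol=1.0):
    """True when two candidate regions agree in mileage and clock direction."""
    # Gap between the two mileage intervals; negative when they overlap.
    gap = max(p["start_m"], q["start_m"]) - min(p["end_m"], q["end_m"])
    return gap <= mileage_tol and clock_gap(p["clock_h"], q["clock_h"]) <= clock_tol


def confirm(candidates, **tol):
    """Keep only regions that are confirmed by at least one other modality."""
    kept = []
    for p in candidates:
        partners = sorted({
            q["modality"] for q in candidates
            if q["modality"] != p["modality"] and matches(p, q, **tol)
        })
        if partners:
            kept.append({**p, "matched_modalities": partners})
    return kept


# Made-up candidates: two agree across modalities, one appears in MFL only.
candidates = [
    {"modality": "MFL_axial", "start_m": 100.0, "end_m": 100.4, "clock_h": 3.0},
    {"modality": "eddy_current", "start_m": 100.2, "end_m": 100.6, "clock_h": 3.5},
    {"modality": "MFL_axial", "start_m": 250.0, "end_m": 250.2, "clock_h": 11.5},
]
bounding_result_json = json.dumps(confirm(candidates), indent=2)
```

Because the modalities already share one mileage axis, the matching reduces to two scalar comparisons per candidate pair, which is why no learned matcher is needed at this stage.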

[0059] For example, the bounding module 3, by combining signal stationarity features with local statistical detection strategies, achieves autonomous bounding of abnormal regions without the need for model training. This not only improves annotation efficiency but also enhances the reliability and cross-modal consistency of the bounding results. The results from the bounding module 3 have two main subsequent processing approaches: First, they can be directly input into the feature mapping and clustering module 4 to automatically construct and cluster the feature space under unsupervised conditions, thereby achieving feature extraction and pattern discovery of abnormal regions and providing suggested labels; second, they can be manually reviewed and finely annotated via the visualization module 7 to improve data quality and correct potential false detections.

[0060] Figure 3 is a flowchart of the operation of the visualization module 7 in this application.

[0061] The system also includes a visualization module 7. Based on the JSON file saved by the bounding module 3, the visualization module 7 reads the bounding records one by one, extracts the exact segments from the original data file of the corresponding modality according to the mileage and clock direction information in each record, and presents them in the window page as line graphs, pseudo-color images, or grayscale images. Because data from multiple modalities are displayed simultaneously on one page, signal patterns and abnormal characteristics can be identified manually and intuitively. This module also integrates an interactive annotation function that allows operators to delete obvious false detection areas, or to add semantic tags (such as flanges, valves, and welds) to identifiable typical component areas, thereby forming labeled data.

[0062] For example, in the field of pipeline inspection, feature annotation refers to the precise marking of pipe sections (such as welds, elbows, tees, etc.) and their defect features (such as corrosion, cracks, metal loss, etc.) in pipeline inspection data to identify key parts and abnormal information, and support defect identification and condition assessment.

[0063] Specifically, the visualization module 7 is configured as follows: Based on the mileage range and clock direction information in the bounding result file, segments are extracted from the corresponding modal data; the extracted segments are sent to a display device for display in a set format, including: line graph, pseudo-color image, or grayscale image. The display device is then used to add the label data corresponding to the extracted segments to the category space.

[0064] For example, the results of manual review and annotation can be used not only to directly train supervised models, but also as reference information with prior constraints to be introduced into the feature mapping and clustering module 4 based on contrastive learning, enabling weakly supervised training. This process can retain the exploration capabilities of unsupervised learning while incorporating human knowledge to guide feature space optimization, effectively improving the discriminative power of feature representation and the reliability of clustering results, thus laying a more solid foundation for subsequent defect identification and classification.

[0065] For example, Figure 4 shows a line graph corresponding to a segment extracted from the bounding result file in this application. The line graph is first used for manual bounding in the bounding module 3. The display device can save signal visualization graphs (PNG) and signal information (Excel). The signal visualization graph is the line graph corresponding to the extracted segment, and the signal information is a signal segment that may contain pipeline defects, selected through manual screening.

[0066] For example, by sequentially displaying the signal segments recorded in the Excel file, the location and nature of defects can be manually determined. Each signal segment corresponds to a sub-image representing a specific mode of signal, such as eddy current signal, axial component of flux leakage, radial component of flux leakage, or circumferential component of flux leakage. The nature of the defect can be determined using the flux leakage signal, while the location of the defect can be determined using the eddy current signal. The display device can also show the mileage and center channel number of the involved mode, as well as the pipeline defect number in the Excel file, by displaying the sub-image file name corresponding to the signal segment, facilitating subsequent manual verification. The display device can also input the manual judgment results, which can be directly entered into the Excel spreadsheet for later use.

[0067] For example, the feature mapping and clustering module 4 implements unsupervised clustering based on contrastive learning. On the one hand, it distinguishes data and assigns pseudo-labels; on the other hand, it completes the pre-training of feature extraction, laying the foundation for subsequent feature alignment through domain adaptation methods. At the same time, it retains a weakly supervised interface so as to introduce supervision signals when obtaining some manual annotations, further improving the discriminative ability of features.

[0068] Specifically, the feature mapping and clustering module 4 is configured as follows. The input data includes multimodal signal data and optional manual annotations: the former is the signal segment of each anomalous region in each modality, and the latter is the category label (such as corrosion, crack, or flange) and confidence level for some anomalous regions (it can be empty, indicating unsupervised mode). In the data preprocessing and sample pair construction stage, mean-variance standardization is applied to each modal signal independently to eliminate dimensional differences. Then, for each anomalous region, random augmented views are generated by adding noise, truncating, and randomly masking some modal channels to form positive sample pairs. Views of other regions in the same batch are used as negative samples to support the subsequent contrastive learning task.

[0069] An encoder is configured for each modal data to map the data signal segments of the abnormal signal region under each modal data into modal features; for the feature mapping network structure, a multi-branch fusion encoder is adopted, and a dedicated temporal convolutional (TCN) encoder is configured for each modality to map the signal segments into modal feature vectors.

[0070] The modal features are concatenated and a multilayer perceptron is used to obtain regional features. After the modal features are concatenated along the feature dimension, a fusion multilayer perceptron (MLP) produces a unified regional feature representation. During the contrastive learning stage, a projection head consisting of a two-layer MLP maps the regional features to the contrastive space (a low-dimensional vector) for similarity calculation.

[0071] The region features are mapped to a contrast space using a multilayer perceptron for contrastive learning training. This contrastive learning training is used to ensure that the region features are unaffected by noise and modal differences. Based on the improved SimCLR framework, unsupervised contrastive learning training is conducted to adapt to multimodal temporal data: cosine similarity is used to calculate sample similarity in the projection space; the loss function is InfoNCE (NT-Xent) loss, which makes two views of the same region closer to each other in the feature space, and views of different regions farther apart; positive and negative sample pairs are constructed within the batch during training, and the Adam optimizer is used with an adjustable temperature coefficient to control the distribution sharpness; the training objective is to learn universal region features that are unaffected by noise and modal differences.
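For reference, the standard NT-Xent (InfoNCE) loss named above can be written out in a few lines. The sketch below is the textbook SimCLR form on already-projected vectors, not the improved variant of this application; the function name and the default temperature are assumptions, and a real training loop would use an autodiff framework with the Adam optimizer rather than plain Python lists.

```python
import math


def _cos(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss; z1[i] and z2[i] are two views of the same region."""
    z = list(z1) + list(z2)
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same region
        logits = [_cos(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_logit = _cos(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over all other samples in the batch.
        mx = max(logits)
        log_denom = mx + math.log(sum(math.exp(l - mx) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)
```

A lower temperature sharpens the softmax over the batch, so hard negatives dominate the gradient; this is the "distribution sharpness" that the adjustable temperature coefficient controls.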

[0072] For example, contrastive learning is a self-supervised machine learning method. Its core idea is to allow the model to learn the essential feature representation of the data by bringing similar samples (positive sample pairs) closer together and pushing dissimilar samples (negative sample pairs) further apart, without relying on manual annotation.

[0073] Using a multilayer perceptron, the region features are mapped to a category space for cluster training; the cluster training includes unsupervised clustering mode and weakly supervised clustering mode.

[0074] The feature mapping and clustering module 4 is further configured as follows: When none of the region features have labeled data, unsupervised clustering mode training is performed using the KMeans or HDBSCAN algorithm combined with silhouette coefficients; when some of the region features have labeled data, weakly supervised clustering mode training is performed using the constrained KMeans algorithm.

[0075] Specifically, in the weakly supervised interface design, while maintaining the unsupervised backbone, an optional supervision signal path is added, including: a classification head, which connects to a lightweight classifier MLP after the Encoder output to map region features to the category space; a label input, which provides category labels and confidence scores for manually labeled regions; a supervised loss, which calculates the cross-entropy loss for labeled samples; joint training, where the total loss is a weighted combination of unsupervised contrastive loss and supervised classification loss, with the weights dynamically adjusted according to the proportion of labeled samples—only contrastive loss (λ=1) is used when there are no labels; when there are labels, λ is gradually reduced to introduce supervision signals to improve clustering purity and class separability; and mode switching, where the training process can flexibly switch between unsupervised (introducing high-confidence pseudo-labels) and weakly supervised modes, with a unified interface.
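One possible reading of this joint objective is sketched below. The application states only that the contrastive weight λ equals 1 without labels and is gradually reduced as labels become available; the linear schedule `1 - labeled_fraction`, the lower bound `floor`, the form `λ·L_con + (1 - λ)·L_sup`, and the confidence weighting of the cross-entropy are all assumptions introduced for illustration.

```python
import math


def contrastive_weight(labeled_fraction, floor=0.5):
    """Weight of the contrastive loss; equals 1.0 when nothing is labeled."""
    if not 0.0 <= labeled_fraction <= 1.0:
        raise ValueError("labeled_fraction must lie in [0, 1]")
    return max(floor, 1.0 - labeled_fraction)


def cross_entropy(probs, label, confidence=1.0):
    """Cross-entropy of one labeled region, scaled by annotator confidence."""
    return -confidence * math.log(probs[label])


def joint_loss(l_contrastive, supervised_losses, n_regions):
    """Total loss = lam * L_con + (1 - lam) * mean(L_sup over labeled regions)."""
    lam = contrastive_weight(len(supervised_losses) / n_regions)
    l_sup = (sum(supervised_losses) / len(supervised_losses)
             if supervised_losses else 0.0)
    return lam * l_contrastive + (1.0 - lam) * l_sup
```

With an empty `supervised_losses` list the function returns the contrastive loss unchanged, so the same call site serves both the unsupervised and the weakly supervised mode.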

[0076] The unsupervised clustering and weakly supervised augmented clustering sections are as follows: After training, the projection head is removed, and an encoder is used to extract feature vectors from all abnormal regions. Unsupervised clustering is performed using KMeans or HDBSCAN algorithms, and the clustering quality can be evaluated using metrics such as silhouette coefficient. The weakly supervised augmentation method involves: if category labels exist, label information (such as constrained KMeans) is introduced during the clustering initialization stage to encourage similar samples to cluster more easily; the labels can be used to adjust cluster centers or merge / split clusters to conform to the category structure perceived by humans; after clustering, semantic labels (such as corrosion, cracks, flanges, welds, etc.) are assigned to each cluster based on domain knowledge.
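The idea of introducing label information at the clustering initialization stage can be illustrated with a seeded variant of KMeans, shown below as a simple stand-in for the constrained KMeans mentioned above. It assumes that every cluster has at least one annotated region and that the number of clusters equals the number of annotated classes; the function name and return format are hypothetical.

```python
def seeded_kmeans(points, labels, n_iter=20):
    """KMeans seeded by labels; labels[i] is a class index or None."""
    classes = sorted({l for l in labels if l is not None})
    dim = len(points[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # Initial centres are the means of the annotated regions of each class.
    centers = [mean([p for p, l in zip(points, labels) if l == c])
               for c in classes]
    assign = []
    for _ in range(n_iter):
        new_assign = [
            # Annotated regions stay pinned to the cluster of their class.
            classes.index(l) if l is not None
            else min(range(len(centers)), key=lambda k: dist2(p, centers[k]))
            for p, l in zip(points, labels)
        ]
        if new_assign == assign:
            break
        assign = new_assign
        centers = [mean([p for p, a in zip(points, assign) if a == k])
                   for k in range(len(centers))]
    return assign, centers
```

Pinning the annotated regions keeps each cluster anchored to a human-defined category, so the semantic label of a cluster is known as soon as clustering finishes.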

[0077] The model parameters of the encoder and the multilayer perceptron are stored after training; the model parameters of the trained feature mapper (Encoder and fused MLP) are saved; this model can be directly used as a pre-trained feature extractor for subsequent tasks (such as defect classification, domain adaptation, etc.), requiring only a few layers to be fine-tuned on new data, significantly reducing development costs and cycle; if more annotations are obtained in the future, weakly supervised or fully supervised training can be continued based on this pre-trained model to achieve model iterative upgrades.

[0078] For example, under multiple operating conditions and with multiple devices, pipeline inspection data often exhibits significant distributional differences, leading to "data silos." To achieve high-quality data aggregation and modeling across devices and operating conditions, this application proposes a system for constructing a multimodal dataset of split-type pipeline detectors. This system adopts a two-way collaborative approach from both the data and model sides: on the data side, a structured and measurable multi-level label constraint system is constructed to achieve fine-grained source tracing and feature partitioning of the original data, providing clear prior boundaries for domain division and distribution alignment; on the model side, a feature space alignment algorithm based on adversarial training and distribution metric constraints is designed to eliminate differences between devices and operating conditions through deep feature mapping, ultimately forming a unified semantic feature space, laying the foundation for cross-domain adaptation and large-scale training of multimodal large models. The entire process balances the integrity and reproducibility of the label system with the effectiveness and universality of feature alignment, effectively supporting the transformation from "data silos" to a "data ocean."

[0079] The dataset construction module 5 is configured as follows: A multi-level label constraint system is established; the label constraint system includes: pipeline defect category, data acquisition principle, hardware configuration, pipeline structural characteristic parameters, and pipeline inspection equipment operating parameters.

[0080] Specifically, to provide clear and measurable constraint boundaries during domain adaptation, the dataset construction module 5 first establishes a multi-level label constraint system, recording information in the following dimensions for each set of feature data. First, at the level of basic categories and geometric information, the specific category of the defect or component is clarified; second, the data acquisition principle and hardware configuration of the detection equipment parameters are characterized, including the detection principle, number of detection channels, number of sensors in a single probe, probe spacing, sampling frequency, and resolution; third, pipeline structural feature parameters are incorporated to reflect the physical properties of the object under test, covering the pipe wall thickness, material type, and pipe section type within the monitoring interval; simultaneously, operating status parameters are supplemented to reflect the differences in operating conditions during data acquisition, including the real-time operating speed of the equipment and the magnetization level.
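The label dimensions listed above map naturally onto a fixed record type. The following sketch uses a frozen dataclass; the field names, the units, and the `domain_key` grouping are assumptions made for illustration, and the example values are invented.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LabelRecord:
    # Basic category of the defect or component
    category: str
    # Acquisition principle and hardware configuration
    detection_principle: str
    n_channels: int
    sensors_per_probe: int
    probe_spacing_mm: float
    sampling_frequency_hz: float
    resolution_mm: float
    # Pipeline structural characteristics within the monitored interval
    wall_thickness_mm: float
    material: str
    segment_type: str
    # Operating state during acquisition
    speed_m_per_s: float
    magnetization_level: float

    def domain_key(self):
        """Coarse key for grouping records into domains (hypothetical choice)."""
        return (self.detection_principle, self.n_channels,
                self.material, round(self.wall_thickness_mm, 1))


# Invented example values, for illustration only.
record = LabelRecord("corrosion", "MFL", 96, 4, 8.0, 2000.0, 2.0,
                     9.5, "X65", "straight", 1.8, 0.9)
record_as_dict = asdict(record)
```

Grouping records by a key such as `domain_key` gives the explicit domain boundaries that the subsequent domain adaptation step relies on.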

[0081] The dataset construction module 5 is also configured to: input the label constraint system and several of the single datasets into the feature extractor, so that the modal features tend toward a consistent distribution in a unified space.

[0082] Specifically, the Feature Extractor takes multi-source heterogeneous data with label constraints as input, including multimodal signals, equipment parameters, pipeline structure parameters, and operating status parameters. It employs a multimodal branch fusion network: the pre-trained Encoder from the previous module processes each modal signal, an MLP processes the corresponding scalar parameters, and after the outputs of the two branches are concatenated at a high level, a shared fusion layer generates a unified feature vector. The core objective is to map data from different equipment / operating conditions to a unified feature space, minimizing domain differences. The Domain Discriminator takes the unified feature vector output by the Feature Extractor as input and is structured as an MLP. Its last layer performs binary classification to distinguish the data source domains, aiming to determine the domain affiliation of the feature vector as accurately as possible. To implement the adversarial training mechanism, a gradient reversal layer (GRL) is introduced between the Feature Extractor and the Domain Discriminator. This layer performs an identity mapping during forward propagation, so features pass through unchanged. During backpropagation, it multiplies the gradient by a negative coefficient, causing the feature extractor to optimize in the gradient ascent direction to confuse the domain discriminator, while the domain discriminator optimizes in the gradient descent direction to accurately distinguish the domains. Ultimately, the adversarial game drives the feature extractor to learn a domain-independent unified representation. The feature extractor and the domain discriminator engage in a minimax game: the discriminator minimizes the domain classification error rate, while the feature extractor maximizes the discriminator's error rate, i.e., it eliminates domain-specific features. Through this game, the features learned by the feature extractor tend to have a consistent distribution in a unified space, facilitating subsequent clustering or training of large models.
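The behaviour of the gradient reversal layer can be shown without any deep learning framework. The sketch below uses a deliberately tiny model (a scalar extractor `f = w * x` and a logistic discriminator `sigmoid(v * f)`) with hand-written gradients; the class and function names are hypothetical, and a practical implementation would register the reversal as a custom autograd operation instead.

```python
import math


class GradientReversal:
    """Identity in the forward pass; scales gradients by -coeff going back."""

    def __init__(self, coeff=1.0):
        self.coeff = coeff

    def forward(self, features):
        return list(features)

    def backward(self, grad_from_discriminator):
        return [-self.coeff * g for g in grad_from_discriminator]


def adversarial_step(w, v, x, domain, lr=0.1, coeff=1.0):
    """One SGD step on the domain loss for the toy extractor and discriminator."""
    grl = GradientReversal(coeff)
    f = w * x
    (f_in,) = grl.forward([f])
    p = 1.0 / (1.0 + math.exp(-v * f_in))  # predicted P(domain == 1)
    d_logit = p - domain                   # d(binary cross-entropy) / d(logit)
    grad_v = d_logit * f_in                # discriminator: ordinary descent
    (grad_f,) = grl.backward([d_logit * v])  # reversed before the extractor
    grad_w = grad_f * x
    return w - lr * grad_w, v - lr * grad_v
```

Starting from `w = v = 1` with `x = 1` and domain label 1, one step raises `v` (the discriminator becomes more confident) and lowers `w` (the extractor shrinks the feature that reveals the domain), which is the minimax behaviour described above.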

[0083] The label constraint system and several of the single datasets are input into the working condition adaptive framework to generate a multimodal dataset.

[0084] Specifically, in the joint optimization objective design, this application adopts a multi-component collaborative working condition adaptive framework. The adversarial loss $L_{adv}$ uses the standard domain classification cross-entropy loss and drives the domain discriminator to accurately identify feature sources. At the same time, the Maximum Mean Discrepancy (MMD) is introduced as a distribution difference measure that directly calculates the distribution distance between the source domain and the target domain in the feature space and serves as an additional constraint term $L_{MMD}$; its kernel function is a Gaussian kernel, and the bandwidth parameter is set empirically based on the feature dimension and sample size. The total loss function is $L_{total} = L_{task} + \alpha \cdot L_{adv} + \beta \cdot L_{MMD}$, where $L_{task}$ is the loss of the main or auxiliary task (such as classification), which can be set to zero or reduced to a reconstruction-type self-supervised loss in pure domain adaptation scenarios, and $L_{adv}$ and $L_{MMD}$ are responsible for adversarial alignment and distribution matching, respectively. $\alpha$ and $\beta$ are trade-off coefficients that can be dynamically adjusted based on feedback about the distribution alignment effect on the validation set, thereby achieving a balance between task performance and domain generalization ability.
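The MMD term and the weighted total can be made concrete as follows. This is a minimal sketch using the biased estimator of the squared MMD with a single Gaussian kernel; the function names, the fixed bandwidth, and the default values of the trade-off coefficients are assumptions, since this application sets the bandwidth empirically and adjusts the coefficients dynamically.

```python
import math


def gaussian_kernel(u, v, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def mmd2(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_k(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_k(source, source) + mean_k(target, target)
            - 2.0 * mean_k(source, target))


def total_loss(l_task, l_adv, l_mmd, alpha=1.0, beta=1.0):
    """L_total = L_task + alpha * L_adv + beta * L_MMD."""
    return l_task + alpha * l_adv + beta * l_mmd
```

The estimate is zero when both sets coincide and grows as the two feature clouds move apart, so it gives a direct numerical reading of how far two devices or operating conditions still are from each other in the shared feature space.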

[0085] For example, the single dataset is modal data under a single working condition, such as modal data of different pipe sizes. However, by adopting a multi-component collaborative working condition adaptive framework, the dimensions corresponding to the modal data under different working conditions can be unified to obtain a multimodal dataset.

[0086] For example, the domain mapper module can be further encapsulated as an inference service, receiving new data and its tag metadata in real time and outputting a unified feature vector, thereby achieving online real-time conversion. As a pre-processing unit for multimodal large models, it can be used as a fixed layer or fine-tuned as needed, effectively eliminating distribution shifts caused by new equipment or complex operating conditions, ensuring that the model maintains high performance in "unknown domains". With the help of this unified feature space, data silos originally scattered across different sources, devices, and operating conditions can be transformed into a large-scale training set consistent with multiple sources, i.e., a "data ocean", significantly enhancing the model's generalization ability and cross-scenario adaptability, providing a solid data foundation for subsequent intelligent analysis and applications.

[0087] This application provides a system for constructing a multimodal dataset of a split-type pipeline detector, solving the following technical problems: (1) The problem of mileage alignment of multimodal data of the split-type internal detector. The current technical limitation is that the split-type internal detector suffers from significant deviations in odometer markings between different modal data due to factors such as fluctuations in operating speed (e.g., low-speed redundant sampling, high-speed data loss). These deviations do not accumulate uniformly, so data alignment cannot be achieved through simple scaling or overall mapping.

[0088] This application addresses this problem as follows: (1) It proposes a data reconstruction module 2, which first designates one modality in the multimodal data as the reference data, then adaptively adjusts the sampling strategy (sampling-based reconstruction or interpolation-based compensation) for each interval based on the pipe section interval comparison table of the two modalities, and establishes a unified mileage coordinate across modalities. To ensure the reliability of the reconstruction results, this module adaptively selects linear interpolation or cubic spline interpolation according to the difference in sample count between pipe sections, ensuring the integrity of information while taking running speed into account. At the same time, to avoid abnormal shrinkage at the first and last welds, this module automatically adds virtual welds in reasonable areas and delineates virtual pipe sections to ensure that the data is not distorted. (2) To make the reliability of the reconstruction results directly observable, this application proposes a reconstruction result verification module 6. On the one hand, it verifies the number of samples in each pipe section interval of the reconstruction results; on the other hand, it verifies the consistency of data characteristics within each pipe section interval based on mode decomposition and dynamic time warping, checking the information consistency of each component and of the whole. The above process information, including execution logs and data verification results, is saved in a separate report file so that the reliability of the reconstruction results can be verified at any time.

[0089] (2) Bottlenecks in the efficiency and quality of annotation of massive internal detection data. The current technical shortcomings are as follows: data annotation is a fundamental step in pipeline inspection, so the analysis software of various inspection agencies generally has data annotation capabilities. However, existing annotation work is still predominantly done manually. Manual annotation is extremely inefficient when dealing with data spanning hundreds (or even thousands) of kilometers, and efficiency drops further in projects with numerous inspection channels. Furthermore, manual annotation lacks accuracy, especially for small defects (such as microcracks), which are easily missed. Additionally, manual data annotation relies on experience, making it difficult to annotate emerging multi-source data.

[0090] The specific solutions proposed in this application are as follows: (1) To address the problem of low efficiency in manual data selection, this application proposes a bounding module 3. This module first performs sliding window segmentation on the data of different modalities, then calculates and marks non-stationary sampling points within the window, then clusters points with eight-connectivity relationships into an abnormal region, and finally performs feature matching on the bounding results of each modality to determine the final bounded abnormal region. (2) To address the problem of difficulty in labeling emerging multi-source data, this application proposes a feature mapping and clustering module 4. This module is based on the task of classifying non-stationary region data in multimodal detection, and uses contrastive learning as a tool to process the output of the bounding module 3 and the corresponding equipment operating status (such as operating speed, magnetization level) and pipe information (such as wall thickness) in the corresponding interval, so as to achieve unsupervised clustering of different features. Moreover, the intermediate product of the algorithm, namely the trained feature mapper, can also be used as a pre-trained feature extractor for subsequent algorithm development, which can greatly reduce development costs and development cycle. (3) In particular, this application also proposes a visualization module 7, which provides the function of visualizing the output of the bounding module 3 and can perform manual pre-annotation of the region. Therefore, this module can introduce expert prior knowledge and adjust the pure unsupervised learning task of feature mapping and clustering module 4 into a weakly supervised learning task, thereby further improving the training effect. Moreover, since it is a local visualization, there is no need for a lot of page turning, bounding selection, and annotation operations. 
Therefore, even if manual work is introduced, it will not excessively reduce work efficiency. At the same time, the semantic large model interface of this module also provides a development foundation for providing semantic constraints to develop multimodal large models.

[0091] (3) Multi-source dataset organization and domain adaptation barriers. Current technical limitations: Variations in factors such as detector type, number of detection channels, pipe diameter, pipe material, operating speed, and operating environment lead to significant distribution shifts in data collected from different devices and under different operating conditions. These distributional differences make it extremely difficult to construct high-quality multimodal datasets covering multiple devices and operating conditions, resulting in a severe "data silo" phenomenon. Ultimately, this limits the generalization performance of general models, and the training of large multimodal models is hindered by a lack of sufficient high-quality data.

[0092] This application specifically addresses the aforementioned problems by constructing a general multimodal large-scale dataset construction module 5 based on domain adaptation technology. First, a multi-level label constraint system is built. For feature data, it records not only the category and geometric information (length, width, and depth) but also uses detection equipment parameters (such as detection principle, number of detection channels, number of sensors in a single probe group, and probe spacing), pipe structure parameters of the feature interval (pipe wall thickness, material type, and pipe section type within the monitoring interval), and operating status parameters (real-time operating speed of the equipment, magnetization level) as key constraint dimensions, accurately characterizing the multimodal source of each piece of data. This label system enables fine-grained source tracing and feature partitioning of the data source, providing clear constraint boundaries for subsequent domain-adaptive mapping. Then, a feature space alignment algorithm is constructed based on deep neural network technology. An adversarial training mechanism between the feature extractor and the domain discriminator is built by introducing a gradient reversal layer. Specifically, the feature extractor learns to map multi-source heterogeneous data (from different devices and operating conditions) with label constraints to a unified feature space, while the domain discriminator attempts to distinguish the source domains of these mapped features. Through this adversarial game, the feature extractor is forced to eliminate distribution differences between domains, ultimately making the feature representations of data from different devices and operating conditions consistent with those of the target large dataset. This process achieves fine-grained alignment of feature distributions through joint optimization of the maximum mean discrepancy (MMD) between domains, which is minimized, and the adversarial loss function. In particular, the fully trained domain mapper not only serves the current dataset construction process but can also be integrated as a preprocessing module in a general large model processing flow. The feature mapping and clustering module 4 can perform real-time feature space transformation on newly input data specific to a device and operating condition, mapping it to the standardized feature representation space used by the large model. This significantly improves the adaptability of the multimodal large model to unknown devices and complex operating conditions, achieving a leap from "data silos" to "data oceans."
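To illustrate the gradient reversal idea described above, the following is a minimal sketch under strong simplifying assumptions: the feature extractor is a single scalar weight `w`, the domain discriminator is a single scalar weight `v` followed by a sigmoid, and the gradients are written out by hand. The names `adversarial_step`, `grl_backward`, `lam`, and `lr` are hypothetical and do not come from the application; a real system would use a deep network and an automatic differentiation framework.

```python
import math


def sigmoid(z):
    """Logistic function used by the toy domain discriminator."""
    return 1.0 / (1.0 + math.exp(-z))


def grl_backward(upstream_grad, lam):
    """Gradient reversal: identity in the forward pass, -lam * grad in the backward pass."""
    return -lam * upstream_grad


def adversarial_step(w, v, x, domain, lam=1.0, lr=0.1):
    """One SGD step for a scalar extractor f = w * x and discriminator p = sigmoid(v * f).

    domain is 0 or 1 and identifies which source the sample x came from.
    """
    f = w * x                       # feature extractor, forward pass
    p = sigmoid(v * f)              # discriminator, forward pass (the GRL is an identity here)
    dloss_dlogit = p - domain       # gradient of binary cross-entropy w.r.t. the logit
    grad_v = dloss_dlogit * f       # discriminator descends the domain loss
    grad_f = dloss_dlogit * v       # gradient flowing back toward the extractor
    grad_w = grl_backward(grad_f, lam) * x   # reversed: extractor ascends the domain loss
    return w - lr * grad_w, v - lr * grad_v


new_w, new_v = adversarial_step(w=1.0, v=1.0, x=1.0, domain=1)
print(new_w, new_v)
```

In this toy case the discriminator weight grows (it becomes more confident about the domain) while the extractor weight shrinks (it makes the feature less informative about the domain), which is the opposing behavior the adversarial mechanism relies on.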

[0093] (4) Lack of a dedicated system for constructing multimodal datasets for pipeline inspection. Current technical shortcomings: Although core technologies such as data reconstruction, contrastive learning, and domain adaptation have matured in general fields and in some specialized fields, and have been deployed with some success, they still adapt poorly to the special scenarios of pipeline inspection. For example, the technical modules generally exist in isolation, there is no systematic coupling design between modules that meets the needs of constructing multimodal datasets for pipeline inspection, and no efficient closed-loop integrated system has been established.

[0094] To resolve the aforementioned problems, this application uses modular encapsulation to package the core algorithms into a unified system: data reconstruction and alignment (unifying the mileage benchmark of multi-source heterogeneous data), target detection (delineating pipeline defects and features), contrastive learning (unsupervised or semi-supervised classification of pipeline defects and features), and domain adaptation (obtaining large-scale feature datasets across equipment and operating conditions). Based on this, a software platform for constructing multimodal datasets for split-type in-pipe detectors is further developed. This provides a standardized and scalable technical foundation for constructing both dedicated datasets (adapted to specific operating conditions) and generalized datasets (supporting cross-operating-condition and cross-equipment use), helping to improve the intelligence level and engineering generalization of in-pipe inspection data analysis.

[0095] Existing methods for multimodal data processing and dataset construction in pipeline inspection generally suffer from three major shortcomings. First, the technical modules are isolated: core algorithms such as data reconstruction, contrastive learning, and domain adaptation operate independently and lack a systematic coupling design that meets the needs of multimodal dataset construction, making it difficult to form an efficient closed loop. Second, unsupervised analysis capabilities are lacking: when faced with massive amounts of unlabeled multi-source data, existing methods mostly rely on manual experience or single-modal rules, cannot autonomously perceive and cluster multimodal non-stationary regions, and cannot mine implicit features under complex operating conditions. Third, the concept of cross-operating-condition dataset construction is vague: there is no clear understanding of the distribution shift caused by the interplay of factors such as equipment type, inspection channel, pipeline material, and operating speed, no fine-grained label constraint system to characterize data sources, and no effective domain adaptation mechanism to break down "data silos." As a result, the generalization of general-purpose models is limited and the training of large multimodal models is hindered by insufficient data quality.

[0096] To address the aforementioned issues, this application focuses on the core challenges of multimodal data processing and dataset construction in the field of pipeline inspection. It proposes a system for constructing multimodal datasets for split-type in-pipe detectors, providing an integrated solution from data alignment and intelligent annotation to cross-domain fusion. Its technical rationale is explained step by step below. First, regarding the multimodal mileage deviation caused by speed fluctuations in split-type detectors, existing methods lack dynamic segmentation and adaptive resampling mechanisms and rely only on simple scaling or overall mapping, which easily leads to data distortion. This application instead uses circumferential welds as anchor points to divide the pipeline into sections, adaptively selects linear interpolation or cubic spline interpolation for resampling based on the difference in pipe section sample counts, and introduces virtual welds before the first weld and after the last weld to prevent distortion. Subsequently, dual verification is performed: one check on the sample count, and one check on feature consistency based on mode decomposition and dynamic time warping (DTW). This ensures that mileage alignment is both accurate and traceable, laying a consistent foundation for subsequent multimodal analysis.
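The weld-anchored alignment described above can be sketched as follows. This is a simplified illustration, not the application's implementation: it assumes one-dimensional signals, uses linear interpolation only, and assumes the weld index lists already include the virtual welds at the start and end of the data. The function names `resample_linear` and `align_to_baseline` are hypothetical.

```python
def resample_linear(samples, target_count):
    """Resample a 1-D signal to target_count points by linear interpolation."""
    if target_count == 1 or len(samples) == 1:
        return [samples[0]] * target_count
    scale = (len(samples) - 1) / (target_count - 1)
    out = []
    for i in range(target_count):
        pos = i * scale
        left = min(int(pos), len(samples) - 2)   # keep left + 1 inside the signal
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


def align_to_baseline(signal, weld_idx, baseline_weld_idx):
    """Resample each weld-to-weld section so its length matches the baseline section."""
    aligned = []
    for k in range(len(weld_idx) - 1):
        section = signal[weld_idx[k]:weld_idx[k + 1]]
        target = baseline_weld_idx[k + 1] - baseline_weld_idx[k]
        aligned.extend(resample_linear(section, target))
    return aligned


# Made-up data: the first section has 4 samples but the baseline has 6.
signal = [float(v) for v in range(10)]
aligned = align_to_baseline(signal, weld_idx=[0, 4, 10], baseline_weld_idx=[0, 6, 12])
print(len(aligned))
```

Because each section is resampled independently between its own pair of welds, a speed fluctuation inside one section does not shift the mileage of any other section.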

[0097] Building upon this foundation, this application addresses the low efficiency and poor accuracy of manually annotating massive datasets. Existing methods rely on manual bounding boxes, which is extremely inefficient and prone to missing small features when the data spans hundreds of kilometers, and which faces even greater challenges when annotating newly emerging multi-source data. This application uses a sliding window to autonomously bound non-stationary regions and merges them by 8-connected clustering. It then combines regional correlation analysis with an a priori physical trend model for preliminary quantification and pattern recognition. Furthermore, it generates suggested labels through feature matching and weakly supervised visual interaction, ensuring annotation efficiency while incorporating expert knowledge to improve accuracy. Further, regarding the "data silos" caused by differences in equipment and operating conditions, existing methods lack multi-dimensional label constraints and domain adaptation mechanisms and cannot unify the feature representation of data from different sources. This application constructs a multi-dimensional label system encompassing equipment parameters, pipeline structure, and operating status. Based on adversarial training with a gradient reversal layer and MMD distribution constraints, it maps data from different sources to a unified feature space. This allows small-scale data to be integrated into large-scale standardized datasets through domain adaptation. The trained domain mapper can also serve as a preprocessing module for general large models, eliminating distribution shifts in real time and supporting the leap from "data silos" to "data oceans." The entire solution, in a modular, visual, and incrementally scalable manner, realizes a multimodal dataset construction system that is interactive, efficient, intelligent, and versatile, providing a solid data and technical foundation for the intelligent upgrade of pipeline inspection and the application of large-scale models across scenarios.
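As a rough illustration of the MMD distribution constraint mentioned above, the sketch below computes a biased estimate of the squared MMD between two small feature sets using a Gaussian (RBF) kernel. The kernel choice, the `gamma` value, and the feature values are assumptions made for illustration; the application does not specify them.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Hypothetical 2-D features from two devices (values are made up).
device_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
device_b_raw = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]       # shifted distribution
device_b_aligned = [[0.1, 0.0], [0.0, 0.2], [0.2, 0.1]]   # after an idealized mapping

print(mmd_squared(device_a, device_b_raw))       # larger: the domains differ
print(mmd_squared(device_a, device_b_aligned))   # smaller: the domains overlap
```

A training loop would add a term of this kind to the loss so that the feature extractor is penalized whenever features from different devices or operating conditions drift apart.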

[0098] The above detailed embodiments further illustrate the purpose, technical solution, and beneficial effects of the embodiments of this application. It should be understood that the above are merely specific implementations of the embodiments of this application and are not intended to limit their protection scope. Any modifications, equivalent substitutions, improvements, etc., made on the basis of the technical solutions of the embodiments of this application should be included within that protection scope.

Claims

1. A system for constructing a multimodal dataset of a split-type in-pipe detector, characterized in that it comprises: a data reading module (1), a data reconstruction module (2), a bounding module (3), a feature mapping and clustering module (4), and a dataset construction module (5);
The data reading module (1) is configured to: acquire multimodal data collected by a split-type in-pipe detector;
The data reconstruction module (2) is configured to: determine the baseline data in the multimodal data, the baseline data being the pipeline inspection data collected by one of the in-pipe detectors in the multimodal data; and, based on the baseline data, align the remaining modal data along the mileage dimension so that the number of pipe section samples in the remaining modal data is equal to the number of pipe section samples in the baseline data, thus obtaining the target multimodal data, the remaining modal data being the modal data other than the baseline data;
The bounding module (3) is configured to: compare the data signal corresponding to the target multimodal data with a preset signal threshold; if the data signal is greater than the preset signal threshold, mark the data signal as an abnormal signal point; and cluster adjacent abnormal signal points based on eight-connectivity, merging the abnormal signal points into an abnormal signal region;
The feature mapping and clustering module (4) is configured to: configure an encoder for each modal data to map the data signal segments of the abnormal signal region under each modal data into modal features; concatenate the modal features and use a multilayer perceptron to obtain the region features; map the region features to a contrastive space using a multilayer perceptron for contrastive learning training, the contrastive learning training being used to make the region features robust to noise and modal differences; map the region features to a category space using a multilayer perceptron for clustering training; and store the model parameters of the encoder and the multilayer perceptron after training is completed, the model parameters being used to generate a pre-trained model, and the pre-trained model being used to generate a single dataset based on multimodal data under a single working condition;
The dataset construction module (5) is configured to: establish a multi-level label constraint system, the label constraint system including: pipeline defect category, data acquisition principle, hardware configuration, pipeline structural characteristic parameters, and pipeline inspection equipment operating parameters; and input the label constraint system and several of the single datasets into the working condition adaptive framework to generate a multimodal dataset.
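The thresholding and eight-connected merging performed by module (3) in claim 1 can be sketched as below. This is a minimal illustration under assumptions: the signal is a two-dimensional grid (for example, channel by mileage sample), a single fixed threshold is used, and the function name `find_abnormal_regions` is hypothetical.

```python
from collections import deque


def find_abnormal_regions(grid, threshold):
    """Mark points above threshold and merge 8-connected neighbours into regions."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] <= threshold or seen[r][c]:
                continue
            # Breadth-first search over the eight neighbours of each abnormal point.
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                cr, cc = queue.popleft()
                region.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not seen[nr][nc] and grid[nr][nc] > threshold):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            regions.append(sorted(region))
    return regions


signal = [
    [0.1, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.8, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.7, 0.1, 0.1, 0.1],
]
regions = find_abnormal_regions(signal, threshold=0.5)
print(regions)  # [[(0, 1), (1, 2)], [(3, 0)]]
```

The two points at (0, 1) and (1, 2) touch only diagonally, so they merge under eight-connectivity but would stay separate under four-connectivity.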

2. The system for constructing a multimodal dataset of a split-type in-pipe detector according to claim 1, characterized in that the data reading module (1) is further configured to: read the multimodal data as a one-dimensional array; reconstruct the multimodal data based on the arrangement rule or the number of channels used when the multimodal data was saved, converting the multimodal data into a first two-dimensional array; and transpose the multimodal data to convert it into a second two-dimensional array.
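A minimal sketch of the reshaping and transposition in claim 2 follows. It assumes, purely for illustration, that the saved file interleaves channels sample by sample (sample 0 of every channel, then sample 1 of every channel, and so on); the actual arrangement rule depends on the detector and is not stated in the claim. The function names are hypothetical.

```python
def reshape_by_channels(flat, num_channels):
    """Turn a flat sample-major array into rows of one sample across all channels."""
    if len(flat) % num_channels != 0:
        raise ValueError("array length is not a multiple of the channel count")
    return [flat[i:i + num_channels] for i in range(0, len(flat), num_channels)]


def transpose(matrix):
    """Swap axes so that each row holds one channel along the mileage direction."""
    return [list(column) for column in zip(*matrix)]


flat = [1, 2, 3, 4, 5, 6]                              # two samples, three channels
first_2d = reshape_by_channels(flat, num_channels=3)   # samples x channels
second_2d = transpose(first_2d)                        # channels x samples
print(first_2d)   # [[1, 2, 3], [4, 5, 6]]
print(second_2d)  # [[1, 4], [2, 5], [3, 6]]
```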

3. The system for constructing a multimodal dataset of a split-type in-pipe detector according to claim 1, characterized in that the data reconstruction module (2) is further configured to: extract, based on the circumferential weld alignment list of the multimodal data, the pipe section information of the multimodal data and store it in a pipe section information dictionary, the pipe section information including: the pipe section length, the number of pipe section samples, and the global index of the pipe section interval corresponding to the multimodal data; add virtual welds before the first weld and after the last weld of the pipe section corresponding to the multimodal data to form virtual pipe sections, and store the pipe section information of the multimodal data corresponding to the virtual pipe sections in the pipe section information dictionary; based on the pipe section information dictionary, sequentially truncate and resample the remaining modal data using interpolation or downsampling so that the number of pipe section samples in the remaining modal data is equal to the number of pipe section samples in the baseline data, thus obtaining the data to be spliced; and splice the data to be spliced sequentially to obtain the target multimodal data.
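One possible shape for the pipe section information dictionary of claim 3 is sketched below. The key names, the placement of the virtual welds at the very first and last sample, and the constant sample spacing are all assumptions for illustration and are not specified in the claim.

```python
def build_section_dict(weld_indices, total_samples, sample_spacing_m):
    """Build a pipe section dictionary keyed by section number.

    Virtual welds at index 0 and at total_samples close the sections that lie
    before the first real weld and after the last real weld.
    """
    anchors = sorted(set([0] + list(weld_indices) + [total_samples]))
    sections = {}
    for k in range(len(anchors) - 1):
        start, end = anchors[k], anchors[k + 1]
        sections[k] = {
            "length_m": (end - start) * sample_spacing_m,
            "num_samples": end - start,
            "global_index": (start, end),   # half-open interval into the full signal
            "virtual": start not in weld_indices or end not in weld_indices,
        }
    return sections


sections = build_section_dict(weld_indices=[100, 350], total_samples=500,
                              sample_spacing_m=0.002)
for number, info in sections.items():
    print(number, info)
```

Storing the global index alongside the sample count lets later steps truncate each modality section by section and splice the resampled pieces back in order.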

4. The system for constructing a multimodal dataset of a split-type in-pipe detector according to claim 3, characterized in that the data reconstruction module (2) is further configured as follows: when the number of pipe section samples in the data to be spliced is less than the number of pipe section samples in the baseline data, and the difference between the two numbers is greater than a preset ratio, cubic spline interpolation is used for resampling; when the number of pipe section samples in the data to be spliced is less than the number of pipe section samples in the baseline data, and the difference between the two numbers is less than or equal to the preset ratio, linear interpolation is used for resampling; and when the number of pipe section samples in the data to be spliced is greater than the number of pipe section samples in the baseline data, downsampling is used for resampling.
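The selection rule in claim 4 can be written as a small decision function. Two points here are assumptions rather than claim language: the "difference" is interpreted as a relative difference with respect to the baseline count, and the case of equal counts (not covered by the claim) returns "none". The threshold value 0.1 is a placeholder.

```python
def choose_resampling_method(num_samples, num_baseline, ratio_threshold=0.1):
    """Pick a resampling method for one pipe section from its sample counts."""
    if num_samples == num_baseline:
        return "none"                      # assumption: nothing to resample
    if num_samples > num_baseline:
        return "downsample"
    # Assumption: the difference is measured relative to the baseline count.
    relative_gap = (num_baseline - num_samples) / num_baseline
    return "cubic_spline" if relative_gap > ratio_threshold else "linear"


print(choose_resampling_method(80, 100))    # cubic_spline
print(choose_resampling_method(95, 100))    # linear
print(choose_resampling_method(120, 100))   # downsample
```

A plausible reading of the design is that a large gap needs a smoother interpolant to avoid visible kinks, while a small gap is handled adequately, and more cheaply, by linear interpolation.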

5. The system for constructing a multimodal dataset of a split-type in-pipe detector according to claim 1, characterized in that the system further includes a reconstruction result verification module (6), which is configured to: compare the number of pipe section samples in the target multimodal data with the number of pipe section samples in the baseline data to generate a sample number comparison result; decompose the data signal corresponding to the target multimodal data into several time-domain components using variational mode decomposition; apply a frequency-domain transform to the time-domain components to obtain the corresponding frequency-domain features; calculate the waveform similarity of the time-domain components using dynamic time warping; compare the similarity of the energy distributions of the frequency-domain features using the maximal information coefficient to obtain a frequency-domain feature similarity report; and obtain a reconstruction result verification report based on the sample number comparison result, the waveform similarity, and the frequency-domain feature similarity report.
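The dynamic time warping step in claim 5 can be illustrated with the classic dynamic-programming formulation below. This sketch covers only the DTW distance between two one-dimensional sequences, with absolute difference as the local cost; the variational mode decomposition and maximal information coefficient steps are not shown, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j - 1]
                                     cost[i][j - 1],      # repeat a[i - 1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


baseline = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]   # same shape, locally stretched
print(dtw_distance(baseline, stretched))      # 0.0
print(dtw_distance([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))  # 3.0
```

A distance near zero between a reconstructed component and its baseline counterpart indicates that resampling preserved the waveform shape even where the two were locally stretched relative to each other.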

6. The system for constructing a multimodal dataset of a split-type in-pipe detector according to claim 1, characterized in that the bounding module (3) is further configured to: divide the target multimodal data into windows of preset length, and apply Gaussian smoothing to the data signals corresponding to the target multimodal data along the length and width of each window.
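The two-direction Gaussian smoothing in claim 6 can be sketched as a separable filter: smooth every row of a window, then every column. The kernel radius, the `sigma` value, and the edge handling (edge samples are replicated) are assumptions for illustration, and the window is taken as already cut from the data.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(values, kernel):
    """Convolve a 1-D signal with the kernel, replicating the edge samples."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * values[j]
        out.append(acc)
    return out


def smooth_window(window, sigma=1.0, radius=2):
    """Smooth a 2-D window along its rows, then along its columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows = [smooth_1d(row, kernel) for row in window]
    cols = [smooth_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


window = [[0.0] * 5 for _ in range(5)]
window[2][2] = 1.0                       # a single noise spike
smoothed = smooth_window(window)
print(round(smoothed[2][2], 4))          # spike is spread over its neighbours
```

Smoothing before thresholding reduces the chance that an isolated noise spike is marked as an abnormal signal point.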

7. The system for constructing a multimodal dataset of a split-type in-pipe detector according to claim 1, characterized in that the bounding module (3) is further configured as follows: if a first abnormal signal region contains only one abnormal signal point corresponding to a single modal data, the first abnormal signal region is removed; if a second abnormal signal region contains multiple abnormal signal points corresponding to multiple modal data, the second abnormal signal region is retained; and a bounding result file is generated based on the second abnormal signal region, the bounding result file including: the mileage range, clock direction, modality, anomaly intensity statistics, and cross-modal matching identifier of the abnormal signal region.
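The filtering rule and result record of claim 7 might look like the sketch below. The claim does not say how to treat a region with several points from a single modality; this sketch discards it, which is an assumption. The field names, the modality names `modality_a` and `modality_b`, and the numeric values are all hypothetical.

```python
def summarize_region(points):
    """Return a bounding result record, or None if the region is discarded.

    points is a list of dicts with keys mileage_m, clock_h, modality, intensity.
    """
    modalities = sorted({p["modality"] for p in points})
    # Assumption: keep a region only if it has several points from several modalities.
    if len(points) < 2 or len(modalities) < 2:
        return None
    intensities = [p["intensity"] for p in points]
    return {
        "mileage_range_m": (min(p["mileage_m"] for p in points),
                            max(p["mileage_m"] for p in points)),
        "clock_range_h": (min(p["clock_h"] for p in points),
                          max(p["clock_h"] for p in points)),
        "modalities": modalities,
        "intensity_max": max(intensities),
        "intensity_mean": sum(intensities) / len(intensities),
        "cross_modal_match": True,
    }


region_kept = [
    {"mileage_m": 120.4, "clock_h": 3.0, "modality": "modality_a", "intensity": 0.8},
    {"mileage_m": 120.6, "clock_h": 3.5, "modality": "modality_b", "intensity": 0.6},
]
region_dropped = [
    {"mileage_m": 300.0, "clock_h": 9.0, "modality": "modality_a", "intensity": 0.9},
]
print(summarize_region(region_kept))
print(summarize_region(region_dropped))  # None
```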

8. The system for constructing a multimodal dataset of a split-type in-pipe detector according to claim 7, characterized in that the system further includes a visualization module (7), which is configured to: extract segments from the corresponding modal data based on the mileage range and clock direction information in the bounding result file; and send the extracted segment to a display device for display in a set format, the set format including: line chart, pseudo-color image, or grayscale image; wherein the display device is used to add the label data corresponding to the extracted segment to the category space.

9. The system for constructing a multimodal dataset of a split-type in-pipe detector according to claim 1, characterized in that the clustering training includes an unsupervised clustering mode and a weakly supervised clustering mode, and the feature mapping and clustering module (4) is further configured as follows: when none of the region features have labeled data, unsupervised clustering mode training is performed using the KMeans or HDBSCAN algorithm combined with the silhouette coefficient; when some of the region features have labeled data, the constrained KMeans algorithm is used for weakly supervised clustering mode training.
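The silhouette coefficient named in claim 9 is commonly used to compare candidate clusterings, for example to choose the number of clusters. The sketch below computes the mean silhouette value for a given labeling using Euclidean distance; it is an illustration with made-up points, does not perform the clustering itself, and assumes at least two clusters and distinct points.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a labeling (labels is a list, >= 2 clusters)."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)          # common convention for singleton clusters
            continue
        a = sum(same) / len(same)       # mean distance within the own cluster
        b = min(                        # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = silhouette_score(points, [0, 0, 1, 1])   # compact, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])    # clusters mix distant points
print(good > bad)  # True
```

In practice one would run the clustering for several candidate settings and keep the one with the highest score.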

10. The system for constructing a multimodal dataset of a split-type in-pipe detector according to claim 1, characterized in that the dataset construction module (5) is further configured to: input the label constraint system and several of the single datasets into the feature extractor so that the modal features tend toward a consistent distribution in a unified space.