Leakage detection model construction method, leakage detection method and device
By preprocessing and extracting features from pipeline images and dynamically adjusting the contribution of visual features, the problem of high false negative rate in the detection of minute leaks by static models is solved, and higher detection accuracy is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- YILIAN CLOUD COMPUTING (HANGZHOU) CO LTD
- Filing Date
- 2026-03-03
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a method for constructing a leakage detection model, a leakage detection method, and an apparatus.
Background Technology
[0002] Pipelines inevitably experience aging, corrosion, and weld defects during long-term use. Additionally, loose flanges and improper valve closure are also common occurrences. These factors can all lead to leaks of the media transported within the pipeline (especially hazardous substances such as petroleum and chemicals). Leaks not only waste resources and cause economic losses, but can also trigger serious accidents such as fires and explosions, posing a significant threat to public safety, social security, and the environment. Therefore, developing efficient and reliable pipeline leak detection technologies to achieve timely detection and early warning of leaks, especially early, minor leaks, is of paramount practical importance.
[0003] With the popularization of artificial intelligence technology, static visual detection models built based on deep learning methods have become one of the mainstream technical approaches in the field of pipeline leak detection. These methods typically rely on pre-trained convolutional neural networks (such as CNN architectures of various depths), recurrent neural networks (such as LSTM), or Transformer models. The basic detection principle is as follows: first, the model is trained end-to-end using a large amount of labeled data (leaked and normal samples), enabling the model to learn a fixed mapping relationship from the input image to the leak classification; during deployment, the trained model parameters are frozen, and each frame of input image is processed using the same forward propagation process, outputting the detection result. Within the scope of the training data, these methods have shown some effectiveness for large-scale leaks with distinct features.
[0004] However, existing static models have the following drawbacks when dealing with minor pipeline leaks: the visual features of minor leaks have extremely low signal-to-noise ratios, and the feature extraction patterns of static models are fixed, so the models cannot adaptively adjust their analysis focus and strategies according to the characteristics of these weak signals. This makes it difficult to effectively identify the real leak signals from the complex background noise of pipelines, resulting in a high rate of missed detections.
Summary of the Invention
[0005] This application provides a method for constructing a leak detection model, a leak detection method, and an apparatus to solve the problem of the high missed-detection rate for minor pipeline leaks in the prior art.
[0006] Firstly, this application provides a method for constructing a leakage detection model, the method comprising:
[0007] The original images of the equipment to be inspected are preprocessed to obtain preprocessed images;
[0008] Perform feature extraction on the preprocessed image to obtain a first feature map of the preprocessed image;
[0009] In the preprocessed image, multiple different types of visual attribute values are determined, and the magnitude of the visual attribute values is positively correlated with the likelihood of leakage in the device under test.
[0010] Based on the magnitude of each visual attribute value, target visual features corresponding to each visual attribute value are extracted from the first feature map, and a second feature map is generated based on the target visual features. The larger the visual attribute value, the greater the contribution of the corresponding target visual feature to the second feature map.
[0011] Based on the second feature map, the leakage probability of the device to be detected is determined;
[0012] The leakage detection model is constructed based on the leakage probability.
[0013] In one possible implementation, determining multiple different types of visual attribute values in the preprocessed image includes:
[0014] The visual difference degree of different regions in the preprocessed image, the structural complexity and texture disorder degree of the preprocessed image are determined, and the visual difference degree, the structural complexity and the texture disorder degree are used as the visual attribute values.
[0015] In one possible implementation, determining multiple different types of visual attribute values in the preprocessed image includes:
[0016] The texture inconsistency score, region anomaly index, and dynamic change intensity in the preprocessed image are determined, and the texture inconsistency score, region anomaly index, and dynamic change intensity are used as the visual attribute values.
[0017] In one possible implementation, determining the visual difference in different regions of the preprocessed image, the structural complexity of the preprocessed image, and the texture disorder of the preprocessed image includes:
[0018] The visual difference is determined based on a first ratio of the maximum and minimum pixel values in the preprocessed image, wherein the first ratio is positively correlated with the visual difference.
[0019] Calculate the pixel gradient magnitude of each pixel in the preprocessed image;
[0020] Pixels whose corresponding pixel gradient magnitude is greater than a preset threshold are identified as target pixels;
[0021] The structural complexity is determined based on a second ratio of the number of target pixels to the number of pixels in the preprocessed image, wherein the second ratio is positively correlated with the structural complexity.
[0022] The texture disorder is determined based on the entropy value of grayscale in the preprocessed image, and the entropy value is positively correlated with the texture disorder.
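The three attribute definitions above can be illustrated with a minimal NumPy sketch. It assumes an 8-bit grayscale input; the function names, the gradient threshold of 30, and the small constant `eps` are illustrative assumptions and are not specified in this application.

```python
import numpy as np

def visual_difference(gray, eps=1e-6):
    """First ratio: maximum pixel value over minimum pixel value.
    eps (an assumption) avoids division by zero for images containing 0."""
    g = gray.astype(np.float64)
    return float(g.max() / (g.min() + eps))

def structural_complexity(gray, grad_threshold=30.0):
    """Second ratio: share of pixels whose gradient magnitude exceeds a preset threshold."""
    g = gray.astype(np.float64)
    gy, gx = np.gradient(g)            # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)       # per-pixel gradient magnitude
    return float((magnitude > grad_threshold).mean())

def texture_disorder(gray, bins=256):
    """Shannon entropy (in bits) of the grayscale histogram."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                       # skip empty bins, since log2(0) is undefined
    return float(-(p * np.log2(p)).sum())
```

A uniform image gives a structural complexity and entropy of zero, while an image split evenly between two gray levels gives an entropy of one bit, which matches the intended monotonic relationship.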
[0023] In one possible implementation, performing feature extraction on the preprocessed image to obtain a first feature map of the preprocessed image includes:
[0024] Extract feature maps at multiple spatial scales from the preprocessed image;
[0025] The feature maps of the multiple spatial scales are fused to generate the first feature map.
[0026] In one possible implementation, extracting feature maps at multiple spatial scales from the preprocessed image includes:
[0027] The preprocessed image is convolved with convolution kernels of different sizes to extract feature maps of multiple spatial scales.
[0028] In one possible implementation, the preprocessing of the multiple frames of original images of the device to be detected to obtain preprocessed images includes:
[0029] The original image is subjected to brightness correction and artifact correction to obtain the preprocessed image. The brightness correction is used to adjust the pixel values of the original image to the target range, and the artifact correction is used to remove artifacts caused by the vibration of the device under test in the original image.
[0030] In one possible implementation, the step of extracting target visual features corresponding to each visual attribute value from the first feature map based on the magnitude of each visual attribute value, and generating a second feature map based on the target visual features, includes:
[0031] Based on the magnitude of each of the visual attribute values, convolution kernel parameters are generated for the convolution kernel, and the convolution kernel parameters are used to indicate the contribution of each of the target visual features to the second feature map;
[0032] The first feature map is convolved by the convolution kernel based on the kernel parameters to generate the second feature map.
[0033] Secondly, this application provides a leakage detection method, which is based on a leakage detection model. The leakage detection model is constructed using the method described in the first aspect. The leakage detection method includes:
[0034] Acquire multiple frames of images of the device under test;
[0035] The multi-frame images are input into the leakage detection model, which then outputs the leakage probability of the device under test based on the multi-frame images.
[0036] Thirdly, this application provides a leakage detection model construction device, the device comprising:
[0037] The preprocessing module is used to preprocess multiple frames of original images from the device to be inspected to obtain preprocessed images.
[0038] The feature extraction module is used to perform feature extraction operations on the preprocessed image to obtain a first feature map of the preprocessed image;
[0039] In the preprocessed image, multiple different types of visual attribute values are determined, and the magnitude of the visual attribute values is positively correlated with the likelihood of leakage in the device under test.
[0040] Based on the magnitude of each visual attribute value, target visual features corresponding to each visual attribute value are extracted from the first feature map, and a second feature map is generated based on the target visual features. The larger the visual attribute value, the greater the contribution of the corresponding target visual feature to the second feature map.
[0041] The model building module is used to determine the leakage probability of the device to be detected based on the second feature map.
[0042] The leakage detection model is constructed based on the leakage probability.
[0043] Fourthly, this application provides an electronic device, including: a processor, and a memory communicatively connected to the processor;
[0044] The memory stores computer-executed instructions;
[0045] The processor executes computer execution instructions stored in the memory to implement the method as described in the first aspect, or to implement the method as described in the second aspect.
[0046] Fifthly, this application provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the method described in the first aspect, or to implement the method described in the second aspect.
[0047] The technical effects provided by this application are as follows:
[0048] Compared to traditional static models that use a fixed feature extraction pattern for all images, making them ill-suited for handling minute leaks with extremely low signal-to-noise ratios, this method dynamically adjusts the contribution of each visual attribute value's corresponding visual feature to the second feature map based on the magnitude of that value. This allows the model to primarily rely on visual features corresponding to larger visual attributes when analyzing the second feature map to predict leak probabilities. Since larger visual attribute values indicate a higher likelihood of leaks in the device under test, the model constructed by this method can eliminate interference from irrelevant information and focus on visual features strongly correlated with leaks in the device under test, thereby improving the accuracy of leak identification and reducing the false negative rate.
Attached Figure Description
[0049] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0050] Figure 1 is a first schematic flowchart of the leakage detection model construction method provided in the embodiments of this application;
[0051] Figure 2 is a second schematic flowchart of the leakage detection model construction method provided in the embodiments of this application;
[0052] Figure 3 is a schematic flowchart of the leakage detection method provided in the embodiments of this application;
[0053] Figure 4 is a schematic structural diagram of the leakage detection model construction device provided in the embodiments of this application;
[0054] Figure 5 is a schematic structural diagram of an electronic device provided in an embodiment of this application.
[0055] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.
Detailed Implementation
[0056] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments matching this application. Rather, they are merely examples of apparatuses and methods matching some aspects of this application as detailed in the appended claims.
[0057] The high missed-detection rate of existing technologies for minor pipeline leaks is primarily due to the fixed feature extraction patterns of static models. These models cannot adaptively adjust their analysis focus and strategies based on the characteristics of weak signals, making it difficult to separate genuine leak signals from complex pipeline background noise. Therefore, the inventors of this application considered identifying multiple visual attribute values associated with leaks in an image, where larger visual attribute values indicate a higher likelihood of a leak in the device under test. Then, based on the magnitude of these visual attribute values, visual features corresponding to each value are extracted from the image's feature map, and the visual features corresponding to larger visual attribute values contribute more to the final feature map. The extracted visual features are used to generate the final feature map, which is then used to determine the leak probability of the device under test. The principle is that larger visual attribute values indicate a higher likelihood of leaks; therefore, more visual features corresponding to larger visual attribute values are extracted. This results in a final feature map containing more visual features corresponding to these larger values, allowing the model to focus more on visual features strongly correlated with leaks during inference, thereby improving the ability to identify minor leaks.
[0058] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will now be described with reference to the accompanying drawings.
[0059] First, it should be noted that in this embodiment, when building the model and performing leak detection, multiple consecutive frames of images of the device under test can be acquired, or multiple frames of images of the device under test can be acquired at fixed time intervals. The following explanation uses the acquisition of multiple consecutive frames of images of the device under test as an example.
[0060] Figure 1 is a first schematic flowchart of the leakage detection model construction method provided in the embodiments of this application. This method can be applied to a server. As shown in Figure 1, the method includes:
[0061] S101. Preprocess the multiple frames of original images of the device to be inspected to obtain preprocessed images;
[0062] In this embodiment of the application, the device to be tested can be a pipeline or other devices that need to be leak-detected.
[0063] In this step, the server can first receive a continuous sequence of multiple frames of raw RGB images transmitted from the camera. For example, the frame rate of the image sequence can be 30 fps, and the sequence length can be 10 frames so that the sequence carries temporal information.
[0064] In the preprocessing stage, the server can employ image enhancement methods based on Generative Adversarial Networks (GANs). Specifically, the server can deploy a pre-trained denoising GAN model, trained using a large pipeline image dataset including various lighting conditions, motion blur, and noisy scenes. For each frame, the server inputs the image into the GAN generator and outputs an enhanced image. This process includes: first, dividing the image into blocks (e.g., 256×256 pixel blocks), applying the GAN model for denoising and sharpening, and then stitching the images back together to form a complete image. Simultaneously, the server can integrate an adaptive brightness adjustment module, ensuring consistent lighting by calculating the global histogram of the image and applying histogram equalization. It's important to note that the principle behind using a GAN model for preprocessing leverages its adversarial training mechanism. The generator learns the mapping from noisy or blurred images to sharp images, while the discriminator ensures the realism of the output, effectively handling complex interferences in industrial environments, such as dynamic lighting and vibration artifacts. Histogram equalization enhances image contrast by redistributing pixel intensity.
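The histogram equalization step mentioned above can be sketched as follows. This is a minimal NumPy illustration that assumes an 8-bit grayscale frame; the GAN-based denoising is not shown, and the function name is hypothetical.

```python
import numpy as np

def equalize_histogram(gray):
    """Global histogram equalization for a uint8 grayscale image.

    Pixel intensities are remapped through the normalized cumulative
    histogram so that the output spans the full 0-255 range.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # cumulative count at the darkest occupied level
    total = cdf[-1]
    if total == cdf_min:               # constant image: nothing to redistribute
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = lut.clip(0, 255).astype(np.uint8)
    return lut[gray]                   # apply the lookup table per pixel
```

For example, a frame whose gray levels occupy only 100 to 103 is stretched so that its darkest level maps to 0 and its brightest to 255.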
[0065] Through the above processing, the final output is a preprocessed image sequence with the characteristics of denoising, deblurring, and brightness equalization, which can be used for subsequent feature extraction.
[0066] S102. Perform feature extraction on the preprocessed image to obtain the first feature map of the preprocessed image;
[0067] In this step, the server can use a pre-trained deep convolutional neural network (CNN) model to perform feature extraction on the preprocessed image. Specifically, after the server inputs the preprocessed image into the CNN model, the CNN model extracts multi-level features through multiple convolutional layers and attention mechanisms. The server can then extract the output of the model's intermediate layers (e.g., the feature map of the last convolutional layer), with dimensions H×W×C (H and W are spatial dimensions, and C is the number of channels), as the first feature map.
[0068] S103. Determine multiple different types of visual attribute values in the preprocessed image. The magnitude of the visual attribute values is positively correlated with the possibility of leakage in the device under test.
[0069] In this step, the server can calculate the following visual attribute values from the preprocessed image: texture inconsistency score, region anomaly index, and dynamic change intensity. The specific calculation methods for these visual attribute values are as follows:
[0070] Texture Inconsistency Score: In pipeline leak detection, normal, dry pipe surfaces (such as metal flanges and rubber seals) typically possess relatively stable and consistent texture patterns. However, once a leak occurs, the wetting, adhesion, accumulation, or flow of the leaking substance (such as liquid or gas) disrupts this consistency, causing abnormal changes in the texture of local areas of the pipe surface. These changes may include mottling, water stain patterns, or irregular reflective areas. Therefore, the texture inconsistency score is positively correlated with the likelihood of a leak. The server can use a pre-trained texture analysis network to calculate the texture uniformity of local image regions. Specifically, the image is divided into a grid (e.g., 16×16 blocks), texture features are extracted from each block, and the cosine similarity between these features and the global texture model (representing the normal pipe appearance) is calculated. This yields a texture inconsistency score (1 - cosine similarity). A higher texture inconsistency score indicates a more disordered texture and a higher likelihood of a leak.
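The block-wise score (1 - cosine similarity) can be sketched as below. As an assumption for illustration only, a gray-level histogram stands in for the features of the pre-trained texture analysis network, blocks are taken as 16×16 pixels, and the reference descriptor is computed from a known-normal patch; none of these choices are specified in this application.

```python
import numpy as np

def block_descriptor(block, bins=16):
    """Stand-in texture feature: a coarse gray-level histogram (assumption)."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    return hist.astype(np.float64)

def texture_inconsistency(gray, reference, block=16, bins=16):
    """Return one score (1 - cosine similarity to `reference`) per grid block."""
    h, w = gray.shape
    scores = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            d = block_descriptor(gray[y:y + block, x:x + block], bins)
            denom = np.linalg.norm(d) * np.linalg.norm(reference)
            cos = float(d @ reference / denom) if denom > 0 else 0.0
            scores.append(1.0 - cos)
    return np.array(scores)

# Usage: a normal surface at gray level 100, with one block altered to 200.
normal = np.full((32, 32), 100, dtype=np.uint8)
reference = block_descriptor(normal[:16, :16])
suspect = normal.copy()
suspect[:16, :16] = 200
scores = texture_inconsistency(suspect, reference)
```

In this usage example only the altered block receives a high score, while the unchanged blocks score approximately zero.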
[0071] Region Anomaly Index: The Region Anomaly Index is a dimensionless scalar value used to quantify the degree of deviation between a specific region in an image and the normal pipe background pattern. Its detection principle is as follows: The server first learns the healthy appearance pattern (including color, texture, and structure) of key pipe components (such as flanges and valves) under leak-free conditions. When a new preprocessed image is input, the server can calculate the degree of difference between each local region in the image and the learned normal pattern. This degree of difference is the Region Anomaly Index. The higher the value, the higher the probability of an anomaly in that region (such as wetting, discoloration, frost, or structural deformation caused by a leak), so the index is positively correlated with the probability of a leak. Specifically, the server can use an isolation forest or another unsupervised anomaly detection algorithm to train a normal pipe appearance model on preprocessed images, and then calculate the reconstruction error of each pixel region. The larger the error, the higher the Region Anomaly Index.
[0072] Dynamic Change Intensity: This value quantifies how strongly pixel motion patterns in a continuous image sequence change because of the dynamic behavior (such as flow and diffusion) of leaked substances (such as escaping gas or seeping liquid), as distinct from normal background disturbances. The principle is that leaks in the device under test are often accompanied by continuous and specific microscopic motion patterns of pixels, while environmental disturbances (such as changes in illumination or overall vibration of the device under test) typically manifest as global, irregular, or instantaneous motion patterns of pixels. Therefore, the magnitude of the dynamic change intensity value is positively correlated with the likelihood of a leak. Specifically, for multiple consecutive preprocessed images, the server can use the optical flow method to capture the motion patterns of pixels and take the average inter-frame optical flow magnitude as the dynamic change intensity value. A higher dynamic change intensity value indicates a greater likelihood of fluid disturbance caused by a leak.
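As a rough illustration of an inter-frame motion measure, the sketch below averages absolute frame differences. This is a deliberately simplified stand-in and not the optical flow method described above: a real implementation would estimate a dense flow field and average its magnitude. The function name is hypothetical.

```python
import numpy as np

def dynamic_change_intensity(frames):
    """Mean absolute difference between consecutive frames.

    `frames` is a sequence of equally sized grayscale images. Frame
    differencing is used here only as a proxy for optical-flow magnitude.
    """
    stack = np.asarray(frames, dtype=np.float64)
    if stack.shape[0] < 2:
        return 0.0                              # no motion can be measured from one frame
    diffs = np.abs(np.diff(stack, axis=0))      # (T-1, H, W) inter-frame changes
    return float(diffs.mean())
```

A static sequence yields zero, and the value grows with the amount of frame-to-frame change, which preserves the positive correlation that this attribute is meant to have.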
[0073] S104. Based on the magnitude of each visual attribute value, extract the target visual features corresponding to each visual attribute value from the first feature map, and generate a second feature map based on the target visual features. The larger the visual attribute value, the greater the contribution of the corresponding target visual feature to the second feature map.
[0074] In this step, the server combines the first feature map (from S102) with visual attribute values to generate a second feature map through an attention network. First, the server normalizes each visual attribute value extracted in the previous step to the 0-1 range, then inputs it into a fully connected network to generate a corresponding attention weight map. For example, for each visual attribute value, the server can use a fully connected layer to map the visual attribute value to a weight matrix with the same spatial size as the first feature map through upsampling or broadcasting. Next, the server can apply channel attention and spatial attention to the first feature map, where channel attention adjusts the importance of feature channels based on visual attribute values, and spatial attention emphasizes regions with high visual attribute values. Finally, the second feature map is generated through weighted summation or element-wise multiplication fusion, where regions with higher visual attribute values contribute more to the second feature map.
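The attribute-driven channel attention can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: a single fully connected layer with a sigmoid gate, caller-supplied normalization bounds, and fixed example weights. The spatial attention branch and the learned parameters of the actual network are not shown.

```python
import numpy as np

def normalize_attributes(values, lows, highs):
    """Min-max normalize each attribute to the 0-1 range (bounds are assumed known)."""
    values, lows, highs = (np.asarray(a, dtype=np.float64) for a in (values, lows, highs))
    return np.clip((values - lows) / (highs - lows), 0.0, 1.0)

def channel_attention(first_map, attrs, fc_weight, fc_bias):
    """Scale each channel of an (H, W, C) feature map by an attribute-dependent gate.

    attrs: (K,) normalized attribute values; fc_weight: (K, C); fc_bias: (C,).
    """
    gate = 1.0 / (1.0 + np.exp(-(attrs @ fc_weight + fc_bias)))   # (C,) values in (0, 1)
    return first_map * gate                                        # broadcast over H and W

# Usage with illustrative, non-negative weights: larger attributes give larger gates.
first_map = np.ones((2, 2, 4))
weights, bias = np.ones((3, 4)), np.zeros(4)
low = channel_attention(first_map, normalize_attributes([0, 0, 0], [0, 0, 0], [10, 1, 10]), weights, bias)
high = channel_attention(first_map, normalize_attributes([10, 1, 10], [0, 0, 0], [10, 1, 10]), weights, bias)
```

With non-negative weights, as in this example, a larger attribute value can only increase the gate, so the corresponding features contribute more to the second feature map.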
[0075] S105. Based on the second feature map, determine the leakage probability of the device to be tested;
[0076] In this step, the server can use a time-aware classification network to process the sequence of second feature maps to determine the leakage probability. Specifically, the server can stack the second feature maps of multiple consecutive frames in temporal order and input them into a gated recurrent unit (GRU) or a Transformer encoder for temporal modeling. The GRU network contains hidden layers (e.g., 64 units) and captures temporal dependencies through a gating mechanism; the Transformer encoder uses self-attention to calculate inter-frame relationships. The output of the GRU network or the Transformer encoder is converted into a probability value between 0 and 1 through a fully connected layer and a sigmoid activation function, representing the likelihood of leakage in the device being detected.
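The GRU branch can be sketched in plain NumPy as below. The sketch assumes that each second feature map is reduced to a vector by global average pooling before entering the GRU, and it uses randomly initialized, untrained weights; both choices, and all function and parameter names, are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(in_dim, hidden, rng):
    """Random (untrained) GRU and output-layer parameters, for illustration only."""
    p = {}
    for g in "zrn":                                   # update gate, reset gate, candidate
        p["W" + g] = rng.normal(0, 0.1, (in_dim, hidden))
        p["U" + g] = rng.normal(0, 0.1, (hidden, hidden))
        p["b" + g] = np.zeros(hidden)
    p["w_out"] = rng.normal(0, 0.1, hidden)
    p["b_out"] = 0.0
    return p

def gru_step(x, h, p):
    """One GRU update: gates decide how much of the previous state is kept."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])
    n = np.tanh(x @ p["Wn"] + (r * h) @ p["Un"] + p["bn"])
    return (1.0 - z) * n + z * h

def leak_probability(feature_maps, p):
    """feature_maps: iterable of (H, W, C) arrays in temporal order."""
    h = np.zeros(p["bz"].shape[0])
    for fmap in feature_maps:
        x = fmap.mean(axis=(0, 1))                    # global average pooling -> (C,)
        h = gru_step(x, h, p)
    return float(sigmoid(h @ p["w_out"] + p["b_out"]))  # fully connected layer + sigmoid

rng = np.random.default_rng(0)
params = init_params(in_dim=8, hidden=64, rng=rng)
sequence = rng.normal(size=(10, 4, 4, 8))             # 10 frames of 4x4x8 feature maps
prob = leak_probability(sequence, params)
```

Because the weights here are untrained, `prob` carries no meaning beyond lying in the open interval (0, 1); training would follow the procedure described for S106.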
[0077] S106. Construct a leakage detection model based on leakage probability.
[0078] During the training phase, the server can train the entire model using a prepared large-scale labeled dataset (containing video clips of normal and leaking pipes). The technical principle is to use the backpropagation algorithm and chain rule to propagate the gradient generated by the final loss function (such as binary cross-entropy loss or focal loss) back from the output of S105 to the input of S101, thereby adjusting the model parameters. Specifically, this includes: fine-tuning the pre-trained CNN weights according to the specific needs of the leak detection task, making them more focused on extracting features related to pipe leaks; adjusting the weight parameters in the attention network to more accurately establish the mapping relationship between visual attribute values and feature importance; and updating the generator parameters in the GAN model so that the generated cleaned images are more conducive to subsequent leak identification.
[0079] During the training phase, the server can use the AdamW optimizer. The AdamW optimizer adds weight decay regularization to the Adam algorithm, which can prevent overfitting and improve the model's generalization ability. Simultaneously, to address the class imbalance problem in industrial scenarios where leak samples are far fewer than normal samples, the server can employ a focal loss function. This function reduces the contribution of easily classified samples (mostly normal samples) to the total loss, allowing the model to focus more on difficult-to-classify samples (minor-leak samples) during training, thereby improving the detection rate of small-scale leaks.
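The down-weighting of easy samples can be made concrete with the standard binary focal loss, FL = -α_t (1 - p_t)^γ log(p_t). The sketch below is a NumPy illustration; the default values γ = 2 and α = 0.25 are common choices and are assumptions here, since this application does not state them.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss.

    probs: predicted leak probabilities; labels: 1 for leak, 0 for normal.
    The factor (1 - p_t)**gamma shrinks the loss of well-classified samples.
    """
    p = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(labels)
    p_t = np.where(y == 1, p, 1.0 - p)             # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With γ = 0 and α = 0.5 the expression reduces to half of the ordinary binary cross-entropy, which makes the role of the modulating factor easy to check.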
[0080] After model training, the server can evaluate the trained model using an independent test set. Evaluation metrics can include precision, recall, and F1 score, as well as plotting ROC curves (Receiver Operating Characteristic curves) and calculating AUC (Area Under the Curve). The ROC curve and AUC value can be used to evaluate the model's performance in binary classification problems. These evaluation metrics comprehensively measure the model's performance at different decision thresholds. Simultaneously, the server can analyze the model's performance on difficult samples, such as those with drastic lighting changes or other interference sources, ensuring its environmental robustness. These validation results allow for the selection of a more reasonable final decision threshold for the model (e.g., classifying a probability greater than 0.7 as leakage).
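The threshold-dependent metrics can be computed as in the following minimal sketch, which uses the example threshold of 0.7 from the paragraph above. The function name is hypothetical, and ROC/AUC computation is not shown.

```python
import numpy as np

def classification_metrics(probs, labels, threshold=0.7):
    """Return (precision, recall, F1) when probabilities above `threshold` count as leaks."""
    pred = np.asarray(probs) > threshold
    truth = np.asarray(labels).astype(bool)
    tp = int(np.sum(pred & truth))        # leaks correctly flagged
    fp = int(np.sum(pred & ~truth))       # false alarms
    fn = int(np.sum(~pred & truth))       # missed leaks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Recall is the quantity most directly tied to the missed-detection problem, so sweeping `threshold` and observing recall against precision is one way to choose the final decision threshold.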
[0081] Furthermore, the server can perform lightweight processing on the model and convert it into an efficient inference format (such as ONNX or a TensorRT engine) to fully utilize the hardware acceleration capabilities of GPUs or dedicated AI chips.
[0082] Finally, the server will deploy the optimized model files to the production environment, specifically to a cloud server or edge computing device.
[0083] The method shown in Figure 1 has the following technical effects:
[0084] Compared to traditional static models that use a fixed feature extraction pattern for all images, making them ill-suited for handling minute leaks with extremely low signal-to-noise ratios, this method dynamically adjusts the contribution of each visual attribute value's corresponding visual feature to the second feature map based on the magnitude of that value. This allows the model to primarily rely on visual features corresponding to larger visual attributes when analyzing the second feature map to predict leak probabilities. Since larger visual attribute values indicate a higher likelihood of leaks in the device under test, the model constructed by this method can eliminate interference from irrelevant information and focus on visual features strongly correlated with leaks in the device under test, thereby improving the accuracy of leak identification and reducing the false negative rate.
[0085] Figure 2 is a second schematic flowchart of the leakage detection model construction method provided in the embodiments of this application. As shown in Figure 2, the method includes:
[0086] S201. Perform brightness correction and artifact correction on the original image to obtain a preprocessed image. Brightness correction is used to adjust the pixel values of the original image to the target range, and artifact correction is used to remove artifacts caused by the vibration of the device to be detected in the original image.
[0087] Since the original image may be too bright or too dark due to sudden changes in ambient light, brightness correction is required to standardize the image pixel values and improve the reliability of feature extraction. In this embodiment, a statistical normalization method is used to adjust the image grayscale values to the target range.
[0088] In brightness correction, the server first converts each frame of the image to grayscale to simplify calculations. Then, the server calculates the mean and standard deviation of the pixel values for the entire grayscale image. Next, the server applies a pre-trained U-Net segmentation network to generate a pipe region mask. Specifically, the server feeds the image to the U-Net segmentation network, which outputs a probability map, and a binary mask is generated from the probability map through thresholding. Finally, the mean is subtracted from the image pixel values, the result is divided by the sum of the standard deviation and a small constant, and the quotient is multiplied element-wise with the mask to ensure that brightness correction is performed only on the pipe regions in the image, ultimately outputting a uniformly lit image. It should be noted that the U-Net network is trained using a large number of labeled images during the training phase, where the pipe regions are marked as 1 and the background as 0, thus enabling the generation of the pipe region mask.
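The normalization and masking arithmetic can be sketched as follows. The U-Net itself is not shown; its probability map is treated as a given input, and the 0.5 threshold and the function names are illustrative assumptions.

```python
import numpy as np

def binarize_mask(prob_map, threshold=0.5):
    """Turn a segmentation probability map into a 0/1 pipe-region mask."""
    return (np.asarray(prob_map) > threshold).astype(np.float64)

def masked_brightness_normalize(gray, mask, eps=1e-6):
    """Standardize pixel values with the global mean and standard deviation,
    then keep only the pipe region by element-wise multiplication with the mask."""
    g = gray.astype(np.float64)
    normalized = (g - g.mean()) / (g.std() + eps)   # eps guards against a zero deviation
    return normalized * mask

# Usage: the top-right pixel is background and is zeroed out.
gray = np.array([[0, 2], [4, 6]], dtype=np.uint8)
mask = binarize_mask([[0.9, 0.1], [0.8, 0.7]])
corrected = masked_brightness_normalize(gray, mask)
```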
[0089] Since the device under test may vibrate, causing vibration artifacts in the original image, it is necessary to eliminate these artifacts so that subsequent analysis can focus on the leakage characteristics of the device under test. In artifact correction, the server first selects a frame without leakage and jitter as a reference frame. Then, the server performs a difference operation between the current frame and the reference frame, calculating the absolute difference. Next, the server applies a morphological erosion operation (e.g., using a 3×3 kernel) to the difference result to remove boundary noise. Subsequently, the server uses an adaptive thresholding method (e.g., Otsu's method) to binarize the difference image. Specifically, Otsu's method automatically determines the threshold by minimizing the intra-class variance, marking regions with pixel values greater than the threshold as motion artifacts (mask value 1), and others as 0. Finally, the server blends the reference frame and the current frame according to mask guidance; that is, masked areas use reference frame pixels, and unmasked areas use current frame pixels, ultimately outputting a cleaned image that suppresses vibration artifacts.
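The artifact-correction sequence (difference, erosion, Otsu thresholding, mask-guided blending) can be sketched in NumPy as below. The sketch assumes 8-bit grayscale frames and a caller-supplied reference frame. The Otsu routine maximizes the between-class variance, which is equivalent to minimizing the intra-class variance; function names are hypothetical.

```python
import numpy as np

def erode3x3(img):
    """Grayscale erosion: each pixel becomes the minimum of its 3x3 neighbourhood."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.min(windows, axis=0)

def otsu_threshold(img):
    """Return the 8-bit threshold that maximizes the between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                      # probability of the lower class
    mu = np.cumsum(prob * np.arange(256))        # cumulative mean of the lower class
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def suppress_vibration_artifacts(current, reference):
    """Replace pixels flagged as motion artifacts with the reference-frame pixels."""
    diff = np.abs(current.astype(np.int16) - reference.astype(np.int16)).astype(np.uint8)
    diff = erode3x3(diff)                        # remove thin boundary noise
    mask = diff > otsu_threshold(diff)           # True marks motion artifacts
    return np.where(mask, reference, current), mask

# Usage: a 6x6 disturbance on a flat frame; erosion shrinks the mask to 4x4.
reference = np.full((16, 16), 100, dtype=np.uint8)
current = reference.copy()
current[4:10, 4:10] = 200
cleaned, mask = suppress_vibration_artifacts(current, reference)
```

Note that erosion also trims the outer ring of a genuine artifact region, so the one-pixel border of the disturbance in this example is not replaced; that is the trade-off for removing boundary noise.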
[0090] S202. Extract feature maps of multiple spatial scales from the preprocessed image, and fuse the feature maps of multiple spatial scales to generate the first feature map.
[0091] In this step, the server can process the preprocessed image using three parallel convolutional paths, each focusing on features at a different spatial granularity:
[0092] Local detail resolver: Uses small convolutional kernels (e.g., 3×3) with a stride of 1 and "same" padding to capture pixel-level subtle changes (such as faint reflections). The convolution is followed by a ReLU activation function and a batch normalization layer, resulting in an output feature map with 64 channels.
[0093] Mid-range structure parser: Uses a medium-sized convolutional kernel (e.g., 5×5), stride of 1, and "same" padding, focusing on structural information such as edges and contours. It is also followed by ReLU and batch normalization, with 64 output channels.
[0094] Global region parser: Uses large convolutional kernels (e.g., 7×7), a stride of 1, and "same" padding to detect large-area abnormal regions. The processed output is a feature map with 64 channels.
[0095] The output feature maps of the three paths described above have the same height and width, while the number of channels of each path is set independently. The server concatenates the three feature maps along the channel dimension to form a comprehensive feature tensor. Subsequently, the server can apply a 1×1 convolution as a feature combiner to perform information fusion and dimensionality reduction on the concatenated tensor: the convolution has a kernel size of 1×1 and a stride of 1, and its number of output channels can be adjusted (e.g., kept at 64 or further reduced). The result is a structured multi-granularity visual context feature, i.e., the first feature map, which integrates visual information from different spatial scales from micro to macro. By capturing leakage features at different spatial levels, such as pixel-level reflections (local), edge contours (mid-range), and anomalous regions (global), this step avoids the features that would be missed if only a single spatial scale were extracted, provides rich material for subsequent model decisions, and enhances the model's sensitivity to weak leakage.
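The three parallel paths and the 1×1 feature combiner can be illustrated with a small NumPy sketch. The weights here are random stand-ins for trained parameters, batch normalization is omitted for brevity, and the names `conv2d_same` and `multi_scale_features` are hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 'same' convolution (cross-correlation form).
    x: (C, H, W); w: (O, C, k, k) with odd k."""
    out_ch, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((out_ch, h, wd))
    for dy in range(k):
        for dx in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out

def multi_scale_features(image, rng, branch_channels=64, out_channels=64):
    x = image[None, :, :]                           # single-channel input, (1, H, W)
    branches = []
    for k in (3, 5, 7):                             # local, mid-range, global paths
        w = rng.standard_normal((branch_channels, 1, k, k)) * 0.1
        branches.append(np.maximum(conv2d_same(x, w), 0.0))  # convolution + ReLU
    stacked = np.concatenate(branches, axis=0)      # (3 * branch_channels, H, W)
    w_fuse = rng.standard_normal((out_channels, stacked.shape[0], 1, 1)) * 0.1
    return conv2d_same(stacked, w_fuse)             # 1x1 fusion and reduction

rng = np.random.default_rng(0)
first_feature_map = multi_scale_features(rng.random((32, 32)), rng)
```

Because every path uses a stride of 1 and "same" padding, the three outputs share the input height and width, which is what makes concatenation along the channel dimension possible.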
[0096] S203. Determine the visual difference degree of different regions in the preprocessed image, the structural complexity and texture disorder degree of the preprocessed image, and use the visual difference degree, structural complexity and texture disorder degree as visual attribute values. The magnitude of the visual attribute value is positively correlated with the possibility of leakage in the device under test.
[0097] Visual dissimilarity measures the apparent difference between a local region and the global background in a preprocessed image. In leak detection, even minor leaks can cause subtle changes in color, brightness, or reflectivity on the pipe surface (such as liquid wetting or gas adhesion). Visual dissimilarity, calculated using statistical methods, quantifies these changes; a higher visual dissimilarity value indicates a more significant difference between the local region and the background, and a higher probability of a leak. The server first segments the preprocessed image into multiple overlapping or non-overlapping local blocks (e.g., 16×16 pixel blocks). Then, for each block, its difference from the global image is calculated. Specifically, a histogram contrast method can be used, which calculates the Bhattacharyya distance between the grayscale histogram of each block and the grayscale histogram of the entire image. The Bhattacharyya distance is derived by comparing the similarity of the histogram probability distributions; a higher Bhattacharyya distance indicates a more significant difference. Finally, the average Bhattacharyya distance of all blocks is taken as the visual dissimilarity. This method can capture local color or brightness changes caused by leaks.
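A minimal sketch of the histogram contrast method follows, assuming non-overlapping 16×16 blocks, 8-bit grayscale input, and the common definition of the Bhattacharyya distance as the negative logarithm of the Bhattacharyya coefficient. The function names and the clipping constant `eps` are assumptions of the example.

```python
import numpy as np

def gray_histogram(pixels, bins=256):
    """Normalized grayscale histogram (probability distribution)."""
    hist = np.bincount(pixels.ravel(), minlength=bins).astype(np.float64)
    return hist / hist.sum()

def bhattacharyya_distance(p, q, eps=1e-12):
    bc = np.sum(np.sqrt(p * q))              # Bhattacharyya coefficient in [0, 1]
    bc = min(max(bc, eps), 1.0)              # guard against log(0) and rounding
    return float(-np.log(bc))

def visual_dissimilarity(gray, block=16):
    """Mean distance between each block histogram and the global histogram."""
    global_hist = gray_histogram(gray)
    h, w = gray.shape
    distances = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = gray[y:y + block, x:x + block]
            distances.append(bhattacharyya_distance(gray_histogram(patch), global_hist))
    return float(np.mean(distances))

uniform = np.full((32, 32), 100, dtype=np.uint8)
patched = uniform.copy()
patched[0:16, 0:16] = 200                    # one block differs from the background
score_uniform = visual_dissimilarity(uniform)
score_patched = visual_dissimilarity(patched)
```

For the uniform image every block histogram equals the global histogram, so the score is zero; introducing a single deviating block raises it.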
[0098] Structural complexity is used to quantify the density and clutter of anomalous linear structures or edge clusters in a preprocessed image. Leakage events (such as crack propagation or interface leakage) often introduce additional edges and contours. These anomalies can be quantified by calculating structural complexity through edge detection and contour analysis. A higher structural complexity value indicates a more complex structure and a higher risk of leakage. The server can use the Canny edge detection algorithm for calculation: first, the image is smoothed using a Gaussian filter; then, the gradient magnitude and direction are calculated, and a binary edge map is generated through non-maximum suppression and double thresholding. Next, the ratio of the total number of edge pixels to the total number of pixels in the image is calculated as the edge density. Furthermore, contour detection algorithms (such as chain code-based methods) can be used to extract edge contours and calculate the total length and number of contours. This, combined with the edge density, generates a comprehensive score (such as a weighted average) to more comprehensively reflect structural complexity.
[0099] Texture disorder measures the degree of disorder and randomness in the texture of a preprocessed image. Leaking substances (such as gas disturbances or liquid spread) introduce additional texture variations, distinguishing them from the stable background texture of the pipe. Texture disorder, calculated using filtering and entropy, quantifies this disorder; a higher texture disorder value indicates a more random texture and a more pronounced leak. The server can use Gabor filters with multiple scales and orientations (e.g., 4 scales, 6 orientations) to filter the preprocessed image, obtaining a set of filtered responses. Then, the energy (i.e., the sum of squared pixel values) of each response map is calculated, and the Shannon entropy of these energies is used as the texture disorder. A high entropy value indicates strong texture randomness, which is related to the disturbance caused by the leak. Furthermore, the variance of Local Binary Patterns (LBP) can be combined to enhance the sensitivity to local texture variations.
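The Gabor-based variant can be sketched as below. This is a simplified illustration: the wavelengths, the kernel support, the choice of sigma, the normalization of the energies into a probability distribution before taking the entropy, and the use of circular FFT filtering are all assumptions of the example rather than details given in this application.

```python
import numpy as np

def gabor_kernel(wavelength, theta):
    """Real-valued Gabor kernel with an isotropic Gaussian envelope."""
    sigma = 0.5 * wavelength
    half = int(wavelength)
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    return kernel - kernel.mean()            # zero mean: flat regions give no response

def filter_fft(image, kernel):
    """Circular convolution via FFT, adequate for an energy estimate."""
    kf = np.fft.rfft2(kernel, s=image.shape)
    return np.fft.irfft2(np.fft.rfft2(image) * kf, s=image.shape)

def texture_disorder(image, wavelengths=(2, 4, 8, 16), n_orient=6, min_energy=1e-9):
    image = image.astype(np.float64)
    energies = []
    for lam in wavelengths:                  # 4 scales
        for i in range(n_orient):            # 6 orientations
            response = filter_fft(image, gabor_kernel(lam, np.pi * i / n_orient))
            energies.append(np.sum(response ** 2))
    energies = np.asarray(energies)
    total = energies.sum()
    if total < min_energy:                   # no texture at all
        return 0.0
    p = energies / total
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))    # Shannon entropy in bits

rng = np.random.default_rng(0)
stripes = np.tile(np.sin(2 * np.pi * np.arange(64) / 8.0), (64, 1))
noise = rng.random((64, 64))
stripe_score = texture_disorder(stripes)
noise_score = texture_disorder(noise)
```

A regular stripe pattern concentrates its energy in a few filters and yields a low entropy, whereas random noise spreads energy over all scales and orientations and yields a higher one.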
[0100] Optionally, determining the visual dissimilarity of different regions in the preprocessed image, the structural complexity of the preprocessed image, and the texture disorder of the preprocessed image includes:
[0101] The visual difference is determined based on the first ratio of the maximum and minimum pixel values in the preprocessed image, and the first ratio is positively correlated with the visual difference.
[0102] Calculate the pixel gradient magnitude of each pixel in the preprocessed image;
[0103] Pixels whose corresponding pixel gradient magnitude is greater than a preset threshold are identified as target pixels.
[0104] The structural complexity is determined based on a second ratio of the number of target pixels to the number of pixels in the preprocessed image, and the second ratio is positively correlated with the structural complexity.
[0105] The texture disorder is determined based on the entropy value of grayscale in the preprocessed image, and the entropy value is positively correlated with the texture disorder.
[0106] For visual dissimilarity: the server first scans all pixels of the preprocessed image to find the maximum and minimum grayscale values. The server then calculates the ratio of the maximum value to the minimum value, which is the first ratio; to avoid division by zero, the denominator can be the sum of the minimum value and a small constant. The first ratio reflects the image contrast intensity: the larger the first ratio, the wider the range of pixel values and the more obvious the visual dissimilarity. Since the first ratio is positively correlated with visual dissimilarity, the server can use the calculated first ratio as the output value of visual dissimilarity.
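By way of example only, the first ratio can be computed as follows; the function name and the value of the small constant `eps` are assumptions.

```python
import numpy as np

def first_ratio(gray, eps=1e-6):
    """Ratio of the maximum grayscale value to (minimum + eps)."""
    g = gray.astype(np.float64)
    return float(g.max() / (g.min() + eps))

high_contrast = np.array([[10, 50], [120, 200]], dtype=np.uint8)
low_contrast = np.array([[100, 110], [105, 108]], dtype=np.uint8)
r_high = first_ratio(high_contrast)   # about 20
r_low = first_ratio(low_contrast)     # about 1.1
```

When the minimum grayscale value is 0, the result is governed almost entirely by `eps`, so an implementation may prefer a larger constant to keep the ratio in a bounded range.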
[0107] For structural complexity, the server can use a gradient operator (such as the Sobel operator) to calculate the gradient magnitude of each pixel. Specifically, the server can first apply the horizontal and vertical Sobel convolution kernels to the image and then calculate the gradient magnitude as the square root of the sum of the squares of the two directional responses, obtaining a gradient magnitude map of the same size as the image.
[0108] The server compares the value of each pixel in the gradient magnitude map with a preset threshold T. If the gradient magnitude of a pixel is greater than T, the pixel is marked as a target pixel (value 1); otherwise, it is marked as a non-target pixel (value 0), thus generating a binary mask image. The server can use an adaptive thresholding method to set the preset threshold to improve environmental robustness. For example, the Otsu algorithm can be used to automatically calculate the threshold. The Otsu algorithm can find the threshold T that minimizes the intra-class variance by analyzing the histogram of the gradient magnitude map, thus adapting to different image conditions.
[0109] Next, the server counts the total number of target pixels (i.e., the number of pixels with a value of 1 in the mask image) and calculates the ratio of this number to the total number of pixels in the preprocessed image (height × width). The second ratio is calculated as: number of target pixels / total number of pixels.
[0110] Since the second ratio is positively correlated with structural complexity, the server can use this ratio as the output value of structural complexity.
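The calculation of the second ratio can be sketched as follows. For simplicity the sketch takes the preset threshold T as a parameter instead of deriving it with the Otsu algorithm, and it replicates border pixels when applying the Sobel kernels; both choices, as well as the function names, are assumptions of the example.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Correlate a 2-D image with a 3x3 kernel using edge-replicated borders."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def structural_complexity(gray, threshold):
    gx = filter3x3(gray, SOBEL_X)              # horizontal gradient
    gy = filter3x3(gray, SOBEL_Y)              # vertical gradient
    magnitude = np.sqrt(gx ** 2 + gy ** 2)     # gradient magnitude map
    target = magnitude > threshold             # binary mask of target pixels
    return float(target.sum() / target.size)   # second ratio

step = np.zeros((8, 8), dtype=np.uint8)
step[:, 4:] = 100                              # one vertical edge
ratio_step = structural_complexity(step, threshold=100.0)
ratio_flat = structural_complexity(np.full((8, 8), 50, dtype=np.uint8), threshold=100.0)
```

In the step image the gradient magnitude exceeds the threshold only in the two columns adjacent to the edge, so 16 of the 64 pixels are target pixels and the second ratio is 0.25; the flat image gives 0.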
[0111] For texture disorder, the server first calculates a grayscale histogram of the preprocessed image. For example, assuming the image is 8-bit grayscale with a grayscale range of 0 to 255, the server can count the number of pixels at each grayscale level and generate a histogram array. Next, the server normalizes the histogram array into a probability distribution, where the probability of each grayscale level is the number of occurrences of that grayscale level divided by the total number of pixels in the preprocessed image.
[0112] Subsequently, the server calculates Shannon entropy based on the probability distribution. This entropy value can quantify the uncertainty of information; the higher the entropy value, the more disordered the texture.
[0113] Since entropy is positively correlated with texture disorder, the server can use the calculated entropy as the output value of texture disorder.
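The histogram entropy can be illustrated with the sketch below, which assumes 8-bit grayscale input and a base-2 logarithm (the base is not specified in this application); the function name is hypothetical.

```python
import numpy as np

def gray_entropy(gray, levels=256):
    """Shannon entropy (in bits) of the grayscale histogram."""
    counts = np.bincount(gray.ravel(), minlength=levels)
    p = counts / gray.size        # normalize counts into probabilities
    p = p[p > 0]                  # 0 * log(0) is treated as 0
    return float(-np.sum(p * np.log2(p)))

flat = np.full((4, 4), 7, dtype=np.uint8)
two_level = np.array([[0, 255], [0, 255]], dtype=np.uint8)
four_level = np.array([[0, 64], [128, 255]], dtype=np.uint8)
h_flat = gray_entropy(flat)
h_two = gray_entropy(two_level)
h_four = gray_entropy(four_level)
```

A single grayscale level gives an entropy of 0 bits, two equally frequent levels give 1 bit, and four equally frequent levels give 2 bits, which matches the intuition that more evenly spread grayscale values indicate more disorder.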
[0114] The three visual attribute values output in this step are all scalars, and the server can cache these values in memory for calculation in subsequent steps.
[0115] S204. Based on the magnitude of each visual attribute value, generate the convolution kernel parameters of the convolution kernel. The convolution kernel parameters are used to indicate the contribution of each target visual feature to the second feature map. The convolution kernel performs a convolution operation on the first feature map based on the convolution kernel parameters to generate the second feature map. The larger the visual attribute value, the greater the contribution of the corresponding target visual feature to the second feature map.
[0116] In this step, the server obtains three visual attribute values: visual dissimilarity, structural complexity, and texture disorder. These attribute values quantify the physical properties of the image related to leakage.
[0117] Visual difference reflects the apparent difference between a local area and the background; the larger the value, the more obvious the color or brightness change caused by leakage.
[0118] Structural complexity measures the density of abnormal edges or contours; a higher value indicates a more significant structural anomaly introduced by the leak.
[0119] Texture disorder assesses the disorder of the texture; a higher value indicates a stronger random disturbance caused by the flow of leaked material.
[0120] Therefore, the model needs to focus on the visual features corresponding to these visual attribute values.
[0121] In this step, the server first combines three visual attribute values (visual dissimilarity, structural complexity, and texture disorder) into a vector, which serves as the input to the dynamic weight decision-maker. The dynamic weight decision-maker can be a multilayer perceptron (MLP), which has the following structure: an input layer with 3 neurons (corresponding to the three attribute values), a hidden layer with 16 neurons using the ReLU activation function, and an output layer with 6 neurons.
[0122] The MLP implements a non-linear mapping function that is applied through forward propagation. This mapping function transforms the three attribute values into a 6-dimensional dynamically recombined weight vector. Each component of this weight vector corresponds to the contribution of a different feature dimension, and a Softmax function applied at the output layer ensures that the sum of all weights is 1, forming a normalized probability distribution.
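The forward pass of such a dynamic weight decision-maker (3 inputs, 16 hidden ReLU neurons, 6 Softmax outputs) might look as follows. The parameters are randomly initialized stand-ins for trained values, and the class name `DynamicWeightMLP` and the sample attribute values are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)             # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

class DynamicWeightMLP:
    """3 -> 16 (ReLU) -> 6 (Softmax); untrained, for illustration only."""

    def __init__(self, rng, n_in=3, n_hidden=16, n_out=6):
        self.w1 = rng.standard_normal((n_hidden, n_in)) * 0.5
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.standard_normal((n_out, n_hidden)) * 0.5
        self.b2 = np.zeros(n_out)

    def __call__(self, attributes):
        hidden = np.maximum(self.w1 @ attributes + self.b1, 0.0)
        return softmax(self.w2 @ hidden + self.b2)

rng = np.random.default_rng(0)
mlp = DynamicWeightMLP(rng)
# (visual dissimilarity, structural complexity, texture disorder)
weights = mlp(np.array([0.8, 0.25, 5.1]))
```

Since the three attributes can have very different numeric ranges, an implementation may wish to rescale them before feeding them to the network; this application does not specify such a step.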
[0123] During training, the parameters (weights and biases) of the MLP can be optimized using the backpropagation algorithm and the Adam optimizer. The goal is to minimize the overall leakage classification loss, enabling the MLP to learn an intelligent mapping method: when a certain visual attribute value (such as structural complexity) is large, it indicates that the structure in the current image is abnormally significant. The MLP will automatically increase the components related to structural features in the weight vector, thereby strengthening the contribution of these features in subsequent fusion.
[0124] For example, if an image has high texture disorder (indicating strong leakage perturbation), the weights output by the MLP will favor texture-related features; likewise, when the structural complexity is high, the components related to structural features increase. Conversely, if an attribute value is small, the corresponding weights decrease accordingly. This mapping relationship is data-driven and is optimized through end-to-end training.
[0125] In the second feature map generation stage, the server can use a dynamic Kernel Generation Network (KGN) to generate the second feature map. Specifically, the KGN takes the dynamically recombined weight vector obtained earlier as its conditional input, and its structure is a two-layer fully connected network: the first layer maps the 6-dimensional weight vector to 128-dimensional hidden features (using ReLU activation), and the second layer outputs the convolutional kernel parameters. The number of convolutional kernel parameters is determined by the number of output channels, the number of input channels, and the kernel size (e.g., a 3×3 kernel). The forward propagation of the KGN generates a set of convolutional kernel parameters, which are reshaped into a four-dimensional tensor (e.g., of size number of output channels × number of input channels × 3 × 3). Then, the server convolves the first feature map with the generated convolutional kernel, using a standard convolution with a stride of 1 and "same" padding, and outputs the recombined feature map, which is the second feature map. It should be noted that the larger a component of the dynamically recombined weight vector, the more strongly the generated convolution kernel parameters are tailored to strengthening the corresponding features. For example, if the visual difference value is large, the convolution kernel generated by the KGN will focus on amplifying the differences in details; if the structural complexity value is large, the convolution kernel will enhance the fusion of edge information. That is, visual features with high weights dominate the feature recombination process, and the larger the visual attribute value, the greater the contribution of its corresponding visual feature to the second feature map.
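The kernel generation and the subsequent convolution can be illustrated by the following sketch. The KGN parameters are random stand-ins for trained values, and 8 input and output channels are used instead of 64 only to keep the example small; the function names are hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 'same' convolution. x: (C, H, W); w: (O, C, k, k), odd k."""
    out_ch, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((out_ch, h, wd))
    for dy in range(k):
        for dx in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out

def generate_kernel(weight_vec, params, out_ch, in_ch, k=3):
    """Two fully connected layers: 6 -> 128 (ReLU) -> out_ch * in_ch * k * k."""
    w1, b1, w2, b2 = params
    hidden = np.maximum(w1 @ weight_vec + b1, 0.0)
    flat = w2 @ hidden + b2
    return flat.reshape(out_ch, in_ch, k, k)   # four-dimensional kernel tensor

rng = np.random.default_rng(0)
in_ch, out_ch, k = 8, 8, 3
n_params = out_ch * in_ch * k * k
params = (rng.standard_normal((128, 6)) * 0.1, np.zeros(128),
          rng.standard_normal((n_params, 128)) * 0.1, np.zeros(n_params))

weight_vec = np.full(6, 1.0 / 6.0)             # example dynamically recombined weights
kernel = generate_kernel(weight_vec, params, out_ch, in_ch, k)
first_feature_map = rng.standard_normal((in_ch, 16, 16))
second_feature_map = conv2d_same(first_feature_map, kernel)
```

Unlike a static convolution layer, the kernel here is recomputed for every input from its weight vector, so two images with different visual attribute values are filtered with different kernels.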
[0126] Since a larger visual attribute value corresponds to a greater contribution of the corresponding feature, the S204 method can automatically focus on weak leak-related signals. For example, when a pipe has a minor leak (such as a leak hole area accounting for less than 5%), the visual difference may only increase slightly, but the mapping of MLP and KGN amplifies these subtle changes, enhancing them in the second feature map. This image content adaptive mechanism overcomes the insensitivity of traditional static models to weak features. Experimental data shows that the F1 score of this invention reaches 0.973, a significant improvement over baseline models (such as SVM or 3D-CNN), and the false negative rate is greatly reduced. Furthermore, the S204 method employs a dynamic weight decision mechanism, enabling the model to distinguish between real leak features and environmental interference. For example, in scenarios of sudden changes in lighting or equipment vibration, visual attribute values may fluctuate due to noise, but the MLP, through training, can learn to ignore non-leakage-related attribute changes (such as a temporary increase in edge density caused by vibration). Only when the visual attribute value remains large and consistent with the leak pattern will the weight increase significantly, thereby reducing the false positive rate and exhibiting higher reliability in complex industrial environments.
[0127] S205. Based on the second feature map, determine the leakage probability of the device to be tested;
[0128] Specifically, in this step, the server can stack the second feature maps of multiple consecutive frames in chronological order and input the resulting sequence into a Convolutional Long Short-Term Memory (ConvLSTM) network. The ConvLSTM can use 3×3 convolutional kernels with 64 hidden channels. This ConvLSTM network can process the feature map at each time step through gating mechanisms (input gate, forget gate, output gate) and cell states, capturing the dynamic evolution patterns of leakage (such as gas diffusion or liquid spread). The final hidden state of the ConvLSTM network contains spatiotemporal features.
[0129] The server reduces the dimensionality of the ConvLSTM network output through a fully connected layer and then applies the Softmax function over the output classes (for example, leak and no leak), taking the value for the leak class, which lies between 0 and 1, as the leakage probability. The closer the leakage probability is to 1, the greater the possibility of leakage.
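A compact sketch of a ConvLSTM pass followed by a classification head is given below. It assumes a standard ConvLSTM cell without peephole connections, global average pooling before the fully connected layer, and a two-class Softmax whose second output is taken as the leakage probability; these choices, the reduced channel counts, and the random parameters are assumptions of the example.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 'same' convolution. x: (C, H, W); w: (O, C, k, k), odd k."""
    out_ch, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((out_ch, h, wd))
    for dy in range(k):
        for dx in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, w, b):
    """One time step. w: (4 * Ch, Cin + Ch, 3, 3); b: (4 * Ch,)."""
    gates = conv2d_same(np.concatenate([x, h], axis=0), w) + b[:, None, None]
    i, f, o, g = np.split(gates, 4, axis=0)       # input, forget, output, candidate
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next

def leak_probability(frames, w, b, w_fc, b_fc):
    ch = w.shape[0] // 4
    _, height, width = frames[0].shape
    h = np.zeros((ch, height, width))
    c = np.zeros_like(h)
    for x in frames:                              # chronological order
        h, c = convlstm_step(x, h, c, w, b)
    pooled = h.mean(axis=(1, 2))                  # global average pooling (assumption)
    logits = w_fc @ pooled + b_fc                 # two logits: [no leak, leak]
    e = np.exp(logits - logits.max())
    return float((e / e.sum())[1])

rng = np.random.default_rng(0)
c_in, c_hid = 4, 8
frames = [rng.standard_normal((c_in, 12, 12)) for _ in range(5)]
w = rng.standard_normal((4 * c_hid, c_in + c_hid, 3, 3)) * 0.1
b = np.zeros(4 * c_hid)
w_fc = rng.standard_normal((2, c_hid))
b_fc = np.zeros(2)
p_leak = leak_probability(frames, w, b, w_fc, b_fc)
```

With untrained parameters the resulting probability has no diagnostic meaning; the sketch only shows how the sequence of second feature maps is reduced to a single value between 0 and 1.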
[0130] S206. Construct a leakage detection model based on leakage probability.
[0131] This step can be referred to in the aforementioned embodiments, and will not be repeated here.
[0132] Figure 3 is a schematic flowchart of a leakage detection method provided in an embodiment of this application. The method is based on a leakage detection model, which is constructed using the aforementioned model construction method. The leakage detection method includes:
[0133] S301. Acquire multiple frames of images of the device to be tested;
[0134] Specifically, a leak detection inference service can run on the server, which continuously monitors video streams from on-site cameras.
[0135] S302. Input multiple consecutive frames of images into the leakage detection model, and the leakage detection model outputs the leakage probability of the device under test based on the multiple consecutive frames of images.
[0136] For each newly received continuous image sequence (e.g., 5 frames), the inference service calls the leak detection model constructed by the aforementioned method to automatically execute the complete process from S101 to S105 and outputs the leak probability. The server can also be configured with alarm logic. When the leak probability exceeds a threshold, an alarm signal is automatically triggered, and the control system is activated to take emergency measures to address the pipeline leak.
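The inference loop with alarm logic can be sketched as follows. The sketch assumes a sliding window of 5 frames and an alarm threshold of 0.5, neither of which is fixed by this application, and `dummy_model` is a hypothetical stand-in for the constructed leakage detection model.

```python
from collections import deque

def monitor(frames, model, window=5, threshold=0.5):
    """Yield (frame_index, probability, alarm) once `window` frames are buffered."""
    buffer = deque(maxlen=window)             # keeps only the most recent frames
    for idx, frame in enumerate(frames):
        buffer.append(frame)
        if len(buffer) == window:
            probability = model(list(buffer))
            yield idx, probability, probability > threshold

def dummy_model(sequence):
    # Hypothetical stand-in: each "frame" is already a per-frame score.
    return sum(sequence) / len(sequence)

stream = [0.1, 0.1, 0.2, 0.2, 0.9, 0.9, 0.9, 0.9]
events = list(monitor(stream, dummy_model))
alarms = [alarm for _, _, alarm in events]
```

In this toy stream the alarm is raised only after several high-scoring frames have entered the window, which illustrates how evaluating a sequence of frames, instead of a single frame, delays but stabilizes the decision.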
[0137] Figure 4 is a schematic diagram of the leakage detection model construction device provided in the embodiments of this application. As shown in Figure 4, the device 40 includes:
[0138] The preprocessing module 401 is used to preprocess multiple consecutive frames of original images of the device to be inspected to obtain preprocessed images;
[0139] The feature extraction module 402 is used to perform a first feature extraction operation on the preprocessed image to obtain a first feature map of the preprocessed image;
[0140] to determine, in the preprocessed image, multiple different types of visual attribute values, where the magnitude of the visual attribute values is positively correlated with the likelihood of leakage in the device under test;
[0141] and, based on the magnitude of each visual attribute value, to extract the target visual features corresponding to each visual attribute value from the first feature map and generate a second feature map based on the target visual features, where the larger the visual attribute value, the greater the contribution of the corresponding target visual feature to the second feature map.
[0142] The model building module 403 is used to determine the leakage probability of the device to be detected based on the second feature map,
[0143] and to construct a leak detection model based on the leak probability.
[0144] The apparatus provided in this embodiment can execute the method provided in the above method embodiment. Its implementation principle and technical effect are similar, and will not be described in detail here.
[0145] Figure 5 is a schematic diagram of an electronic device provided in an embodiment of this application. As shown in Figure 5, the electronic device 50 provided in this embodiment includes at least one processor 501 and a memory 502. Optionally, the device 50 further includes a communication component 503. The processor 501, memory 502, and communication component 503 are connected via a bus 504.
[0146] In a specific implementation, the at least one processor 501 executes the computer-executable instructions stored in the memory 502, causing the at least one processor 501 to perform the above-described method.
[0147] The specific implementation process of processor 501 can be found in the above method embodiments, and its implementation principle and technical effect are similar. It will not be repeated here.
[0148] In the above embodiments, it should be understood that the processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this invention can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules within the processor.
[0149] The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk storage device.
[0150] The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, the buses shown in the accompanying drawings are not limited to a single bus or a single type of bus.
[0151] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.
[0152] This application also provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement the above-described method.
[0153] The aforementioned readable storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The readable storage medium can be any available medium accessible to a general-purpose or special-purpose computer.
[0154] An exemplary readable storage medium is coupled to a processor, enabling the processor to read information from and write information to the readable storage medium. Of course, the readable storage medium can also be a component of the processor. The processor and the readable storage medium can reside in an Application Specific Integrated Circuit (ASIC). Alternatively, the processor and the readable storage medium can exist as discrete components in the device.
[0155] The division of units is merely a logical functional division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or other forms.
[0156] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0157] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0158] If a function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0159] Those skilled in the art will understand that all or part of the steps of the above-described method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above-described method embodiments; and the aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.
[0160] Finally, it should be noted that other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed herein, and is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.
Claims
1. A method for constructing a leakage detection model, characterized in that the method includes: the original images of the equipment to be inspected are preprocessed to obtain preprocessed images; feature extraction is performed on the preprocessed image to obtain a first feature map of the preprocessed image; in the preprocessed image, multiple different types of visual attribute values are determined, and the magnitude of the visual attribute values is positively correlated with the likelihood of leakage in the device under test; the visual attribute values are combined into a vector and input into a multilayer perceptron, a nonlinear mapping function is learned through forward propagation, and the visual attribute values are converted into a dynamically recombined weight vector; the dynamically recombined weight vector is used as a conditional input to the dynamic kernel generation network, and a set of convolutional kernel parameters is generated by the forward propagation of the dynamic kernel generation network; the convolutional kernel parameters are reshaped into a four-dimensional tensor, and the convolutional kernel parameters are used to indicate the contribution of each target visual feature to the second feature map; the first feature map is convolved with the generated convolution kernel, and the recombined feature map is output as the second feature map, wherein the larger the visual attribute value, the greater the contribution of the corresponding target visual feature to the second feature map; based on the second feature map, the leakage probability of the device to be detected is determined; the leakage detection model is constructed based on the leakage probability; wherein the determination of multiple different types of visual attribute values in the preprocessed image includes: determining the visual difference degree of different regions in the preprocessed image, the structural complexity of the preprocessed image, and the texture disorder degree, and using the visual difference degree, the structural complexity, and the texture disorder degree as the visual attribute values; or, alternatively, determining the texture inconsistency score, region anomaly index, and dynamic change intensity in the preprocessed image, and using the texture inconsistency score, region anomaly index, and dynamic change intensity as the visual attribute values.
2. The method according to claim 1, characterized in that, Determining the visual difference in different regions of the preprocessed image, the structural complexity of the preprocessed image, and the texture disorder of the preprocessed image includes: The visual difference is determined based on a first ratio of the maximum and minimum pixel values in the preprocessed image, wherein the first ratio is positively correlated with the visual difference. Calculate the pixel gradient magnitude of each pixel in the preprocessed image; Pixels whose corresponding pixel gradient magnitude is greater than a preset threshold are identified as target pixels; The structural complexity is determined based on a second ratio of the number of target pixels to the number of pixels in the preprocessed image, wherein the second ratio is positively correlated with the structural complexity. The texture disorder is determined based on the entropy value of grayscale in the preprocessed image, and the entropy value is positively correlated with the texture disorder.
3. The method according to claim 1, characterized in that, The step of performing feature extraction on the preprocessed image to obtain a first feature map of the preprocessed image includes: Extract feature maps at multiple spatial scales from the preprocessed image; The feature maps of the multiple spatial scales are fused to generate the first feature map.
4. The method according to claim 3, characterized in that extracting feature maps at multiple spatial scales from the preprocessed image includes: convolving the preprocessed image with convolution kernels of different sizes to extract the feature maps at the multiple spatial scales.
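The multi-scale extraction and fusion in claims 3 and 4 can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`conv2d_same`, `box_kernel`, `multi_scale_features`, `fuse`) are hypothetical, fixed averaging kernels stand in for the learned convolution kernels of a trained model, the kernel sizes 1, 3 and 5 are arbitrary, and element-wise averaging is just one possible fusion rule, since the claims do not name one.

```python
def conv2d_same(image, kernel):
    """Zero-padded 2D convolution (cross-correlation form); output has the input size."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - r, x + j - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out

def box_kernel(size):
    """Averaging kernel used as a stand-in for a learned kernel (assumption)."""
    return [[1.0 / (size * size)] * size for _ in range(size)]

def multi_scale_features(image, sizes=(1, 3, 5)):
    """One feature map per kernel size, i.e. per spatial scale."""
    return [conv2d_same(image, box_kernel(s)) for s in sizes]

def fuse(feature_maps):
    """Fuse the per-scale maps by element-wise averaging (assumed fusion rule)."""
    n = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(fm[y][x] for fm in feature_maps) / n for x in range(w)]
            for y in range(h)]
```

Because every map keeps the input resolution, the fused result can be formed directly without resampling; a design that downsamples per scale would need an alignment step before fusion.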
5. The method according to claim 1, characterized in that preprocessing the multiple frames of original images of the device to be detected to obtain the preprocessed images includes: performing brightness correction and artifact correction on the original images to obtain the preprocessed images, wherein the brightness correction adjusts the pixel values of the original images to a target range, and the artifact correction removes artifacts in the original images caused by vibration of the device to be detected.
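As a rough illustration of the preprocessing in claim 5, the sketch below rescales each frame to a target range and then suppresses frame-to-frame disturbances. The claim does not say how either correction is performed, so both choices here are assumptions: min-max rescaling serves as the brightness correction, and a per-pixel temporal median across frames serves as a simple stand-in for vibration artifact removal. The function names are hypothetical.

```python
from statistics import median

def brightness_correct(frame, lo=0.0, hi=1.0):
    """Linearly rescale pixel values into the target range [lo, hi] (assumed method)."""
    pixels = [p for row in frame for p in row]
    p_min, p_max = min(pixels), max(pixels)
    span = p_max - p_min
    if span == 0:
        # A constant frame carries no contrast; map it to the lower bound.
        return [[lo for _ in row] for row in frame]
    return [[lo + (p - p_min) * (hi - lo) / span for p in row] for row in frame]

def suppress_vibration(frames):
    """Per-pixel temporal median over the frames.

    This is an assumed stand-in for the artifact correction: short-lived
    disturbances in a minority of frames are rejected by the median.
    """
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]

def preprocess(frames):
    """Brightness-correct every frame, then combine them into one preprocessed image."""
    return suppress_vibration([brightness_correct(f) for f in frames])
```

Collapsing the frames into a single image is a simplification made for the example; a pipeline that must keep one preprocessed image per frame would apply a sliding temporal window instead.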
6. A leakage detection method, characterized in that the method is based on a leakage detection model constructed using the method according to any one of claims 1-5, and the leakage detection method includes: acquiring multiple frames of images of the device to be detected; and inputting the multiple frames of images into the leakage detection model, which outputs the leakage probability of the device to be detected based on the multiple frames of images.
7. A leakage detection model construction device, characterized in that the device includes: a preprocessing module, configured to preprocess multiple frames of original images of the device to be detected to obtain preprocessed images; a feature extraction module, configured to perform a feature extraction operation on the preprocessed image to obtain a first feature map of the preprocessed image; wherein multiple different types of visual attribute values are determined in the preprocessed image, the magnitude of each visual attribute value being positively correlated with the likelihood of leakage in the device to be detected; the visual attribute values are combined into a vector and input into a multilayer perceptron, a nonlinear mapping function is learned through forward propagation, and the visual attribute values are converted into a dynamic recombination weight vector; the dynamic recombination weight vector is used as a conditional input to a dynamic kernel generation network, and a set of convolution kernel parameters is generated by forward propagation of the dynamic kernel generation network; the convolution kernel parameters are reshaped into a four-dimensional tensor, the convolution kernel parameters indicating the contribution of each target visual feature to a second feature map; the first feature map is convolved with the generated convolution kernels, and the recombined feature map is output as the second feature map, wherein the larger the visual attribute value, the greater the contribution of the corresponding target visual feature to the second feature map; and a model building module, configured to determine the leakage probability of the device to be detected based on the second feature map and to construct the leakage detection model based on the leakage probability; wherein the feature extraction module is further configured to determine the visual difference degree of different regions in the preprocessed image, the structural complexity of the preprocessed image, and the texture disorder degree of the preprocessed image, and to use the visual difference degree, the structural complexity, and the texture disorder degree as the visual attribute values; or to determine the texture inconsistency score, the region anomaly index, and the dynamic change intensity in the preprocessed image, and to use the texture inconsistency score, the region anomaly index, and the dynamic change intensity as the visual attribute values.
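The data flow recited in claim 7 (attribute vector, multilayer perceptron, dynamic kernel generation network, four-dimensional kernel tensor, convolution, leakage probability) can be traced with a small untrained sketch. Everything specific in it is an assumption: the layer sizes, the ReLU and sigmoid activations, a single linear layer as the kernel generation network, the pooled-sigmoid probability head, and all function names. The weights are random, so the sketch shows tensor shapes and wiring only; the claimed relationship between larger attribute values and larger feature contributions would have to come from training.

```python
import math
import random

def linear(x, weights, bias):
    """Fully connected layer: weights is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def init(rows, cols, rng):
    """Random (weights, bias) pair standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)], [0.0] * rows

def recombination_weights(attrs, mlp):
    """MLP mapping the attribute vector to a dynamic recombination weight vector."""
    (w1, b1), (w2, b2) = mlp
    hidden = [max(0.0, h) for h in linear(attrs, w1, b1)]          # ReLU (assumed)
    return [1 / (1 + math.exp(-z)) for z in linear(hidden, w2, b2)]  # sigmoid (assumed)

def generate_kernels(weight_vec, gen, c_out, c_in, k):
    """Kernel generation network output reshaped to a 4D tensor [c_out][c_in][k][k]."""
    flat = iter(linear(weight_vec, *gen))
    return [[[[next(flat) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]

def conv(feature_map, kernels):
    """Zero-padded convolution of a [c_in][h][w] map with the generated kernels."""
    c_in, h, w = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    k = len(kernels[0][0])
    r = k // 2
    out = []
    for kernel in kernels:
        plane = [[0.0] * w for _ in range(h)]
        for c in range(c_in):
            for y in range(h):
                for x in range(w):
                    for i in range(k):
                        for j in range(k):
                            yy, xx = y + i - r, x + j - r
                            if 0 <= yy < h and 0 <= xx < w:
                                plane[y][x] += feature_map[c][yy][xx] * kernel[c][i][j]
        out.append(plane)
    return out

def leak_probability(second):
    """Global average pooling followed by a sigmoid (assumed probability head)."""
    values = [v for plane in second for row in plane for v in row]
    return 1 / (1 + math.exp(-sum(values) / len(values)))

# Hypothetical sizes: 3 attributes, 4 hidden units, 3 recombination weights.
rng = random.Random(0)
C_IN, C_OUT, K, HIDDEN, D = 2, 2, 3, 4, 3
mlp = (init(HIDDEN, 3, rng), init(D, HIDDEN, rng))
gen = init(C_OUT * C_IN * K * K, D, rng)

attrs = [0.8, 0.5, 0.3]                                     # visual attribute values
first = [[[1.0] * 4 for _ in range(4)] for _ in range(C_IN)]  # first feature map
weights = recombination_weights(attrs, mlp)
kernels = generate_kernels(weights, gen, C_OUT, C_IN, K)
second = conv(first, kernels)                               # second feature map
prob = leak_probability(second)
```

Because the kernels are regenerated from the attribute vector on every forward pass, two images with different attribute values are convolved with different kernels even though the parameters of the perceptron and the generation network stay fixed.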