A pipeline leak detection method and system based on visual spatiotemporal analysis
By combining visual spatiotemporal analysis with adaptive block detection and background difference method, real-time detection of minor leaks in chemical pipelines is achieved, solving the problems of low efficiency and high false detection rate in traditional methods. It is suitable for gas and liquid leak identification in complex environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI UNIV
- Filing Date
- 2023-05-29
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies struggle to accurately identify minute leaks in chemical pipelines and perform real-time detection. Traditional methods rely on manual inspection, which is inefficient and costly. Existing image recognition methods have a high false detection rate in complex environments and struggle to distinguish between leaks and environmental interference.
A vision-based spatiotemporal pipeline leak detection method is adopted. The static and dynamic features of chemical pipeline videos are analyzed by adaptive block detection, background subtraction and inter-frame filtering. The location of leaking gas and droplets is identified by combining a density map prediction network and the YOLOX target detection network.
It enables real-time dynamic detection of minute leaks in chemical pipelines, reducing false detection rates, improving early detection rates, reducing computational costs, and is suitable for complex environments and for detecting any type of gas and liquid leaks.
Figure CN116645350B_ABST
Description
Technical Field
[0001] This invention relates to the fields of image processing and deep learning, and specifically to a method and system for detecting pipeline leaks based on visual spatiotemporal perception.
Background Technology
[0002] Leaks in chemical pipelines often cause environmental pollution, property damage, and personal injury. Leak detection and location are critical tasks in chemical plant maintenance and condition monitoring. Traditional chemical pipeline condition monitoring typically requires professionals to manually detect pipeline faults. The effectiveness of this method is closely related to the expertise of the inspectors and the frequency of inspections, making it very labor-intensive and costly. To achieve remote, safe, rapid, and accurate leak detection and location in large chemical plants, an intelligent leak detection method is essential.
[0003] Intelligent pipeline leak detection methods mainly include ultrasonic methods, acoustic methods, negative pressure wave methods, flow balance methods, and distributed fiber optic leak detection methods. However, in practical applications, ultrasonic methods are prone to missed detections, acoustic methods are costly, negative pressure wave methods are susceptible to external interference, flow balance methods are difficult to locate leaks, and distributed fiber optic leak detection methods are also costly, making them all unsuitable for accurate pipeline leak detection. Furthermore, current research on the accurate location and real-time detection of minute pipeline leaks (seepage) is limited, relying mainly on visual observation or observation using fluorescent tracers, which requires high levels of human skill and expertise, and does not achieve real-time detection.
[0004] Image recognition technology is an automatic detection technology that replaces the human eye. By simulating the human visual system, it not only inherits the accuracy, real-time performance, and sensitivity of human observation but also expands the observation area. In harsh environments like heavy industrial production bases, machine vision systems can reduce the probability of human injury and improve work efficiency. In terms of hardware, a low-cost industrial camera can detect target motion and decompose it frame-by-frame to meet image processing needs. In terms of software and algorithms, with further understanding of deep learning, the accuracy of target detection and image recognition has reached the level required for industrial production. Image recognition also has potential applications in pipeline identification and leak detection. However, for chemical pipelines, there are problems such as complex pipeline layouts, small pipe diameters, inconspicuous leakage characteristics, and easy contamination of pipe and joint surfaces. Therefore, existing image recognition methods cannot accurately identify leaks.
[0005] Video image-based pipeline leak detection algorithms are constantly being proposed. Existing pipeline leak detection methods mainly rely on visual features such as color, shape, transparency, and texture. "Gubbi J, Marusic S, Palaniswami M. Smoke detection in video using wavelets and support vector machines[J]. Fire Safety Journal, 2009, 44(8): 1110-1115." proposes a video smoke detection method based on wavelet transform and SVM. It extracts a total of 60 features, including arithmetic mean, geometric mean, bias, skewness, kurtosis, and entropy, from all sub-band images of the three-level wavelet decomposition to describe smoke. "Cruz H, Eckert M, Meneses J, et al. Efficient forest fire detection index for application in unmanned aerial systems (UASs)[J]. Sensors, 2016, 16(6): 893." works from the perspective of color tone: it extracts suspected smoke areas by comparing the pixel tone distribution of areas containing flames and smoke with that of other areas. "Yuan F. Video-based smoke detection with histogram sequence of LBP and LBPV pyramids[J]. Fire Safety Journal, 2011, 46(3): 132-139." proposes a smoke detection algorithm based on multi-scale features of the Local Binary Pattern (LBP) and Local Binary Pattern Variance (LBPV) pyramids.
[0006] However, in practical applications, relying solely on the static characteristics of pipeline leaks is insufficient to distinguish leaking gases and droplets from similar objects (such as clouds or water mist on lenses), resulting in a high false detection rate.
Summary of the Invention
[0007] To address the shortcomings of existing technologies, the purpose of this invention is to provide a vision-based spatiotemporal method and system for detecting pipeline leaks.
[0008] According to a first aspect of the present invention, a vision-spatiotemporal method for detecting pipe leaks is provided, comprising:
[0009] Use a fixed camera to capture video;
[0010] An adaptive block-segmentation method is used to perform target detection on static single-frame images in the video to obtain static target regions.
[0011] The spatiotemporal dynamic characteristics of the video are analyzed using background subtraction and inter-frame filtering methods to obtain the dynamic target region;
[0012] Based on the positional relationship between the static target area and the dynamic target area, the area that satisfies both spatial prediction and temporal prediction is the actual leakage area.
[0013] Preferably, the step of using an adaptive block segmentation method to perform target detection on static single-frame images in the video to obtain static target regions includes:
[0014] The original image is input into the density map prediction network to obtain a density map that includes the target location and size;
[0015] Based on the density map, a block-based method using a sliding window is employed to obtain the fine detection regions.
[0016] The YOLOX target detection network is used to perform detailed detection on the segmented fine detection regions to obtain detection results;
[0017] The detection results are restored to the original image to obtain the static target area of leaked gas and ground water.
[0018] Preferably, the density map prediction network includes an encoding end and a decoding end;
[0019] The encoding end adopts a VGG network structure with added multidimensional dynamic convolutional block ODConv; the VGG network structure includes convolutional block A1, convolutional block A2, convolutional block A3, convolutional block A4 and multidimensional dynamic convolutional block A5.
[0020] The input to convolutional block A1 is the original image $I \in \mathbb{R}^{H \times W \times 3}$. Convolutional block A1 consists of two 3×3 convolutional layers with 64 channels each, two ReLU activation functions, and one max pooling layer. The ReLU activation function is:
[0021] $\mathrm{ReLU}(x) = \max(0, x)$
[0022] The input of convolutional block A2 is the output features of convolutional block A1. Convolutional block A2 consists of two 3×3 convolutional layers with 128 channels each, two ReLU activation functions, and one max pooling layer.
[0023] The input of convolutional block A3 is the output features of convolutional block A2. Convolutional block A3 consists of two 3×3 convolutional layers with 256 channels each, two ReLU activation functions, and one max pooling layer;
[0024] The input of convolutional block A4 is the output features of convolutional block A3. Convolutional block A4 consists of two 3×3 convolutional layers with 512 channels each and two ReLU activation functions;
[0025] The input of the multidimensional dynamic convolutional block A5 is the output features of convolutional block A4. The multidimensional dynamic convolutional block includes global average pooling, fully connected layers, a ReLU activation function, and a Sigmoid activation function; the Sigmoid activation function is:
[0026] $\mathrm{Sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$
[0027] The decoding end includes a dilated convolutional block B1, a dilated convolutional block B2, a dilated convolutional block B3, a dilated convolutional block B4, a dilated convolutional block B5, and a normal convolutional block B6.
[0028] The input of dilated convolutional block B1 is the output features of the multidimensional dynamic convolutional block A5. Block B1 consists of a 3×3 dilated convolutional layer with 512 channels and a dilation rate of 2, and a ReLU activation function;
[0029] The input of dilated convolutional block B2 is the output features of block B1. Block B2 consists of a 3×3 dilated convolutional layer with 512 channels and a dilation rate of 4, and a ReLU activation function;
[0030] The input of dilated convolutional block B3 is the output features of block B2. Block B3 consists of a 3×3 dilated convolutional layer with 512 channels and a dilation rate of 4, and a ReLU activation function;
[0031] The input of dilated convolutional block B4 is the output features of block B3. Block B4 consists of a 3×3 dilated convolutional layer with 256 channels and a dilation rate of 4, and a ReLU activation function;
[0032] The input of dilated convolutional block B5 is the output features of block B4. Block B5 consists of a 3×3 dilated convolutional layer with 128 channels and a dilation rate of 2, and a ReLU activation function;
[0033] The input of ordinary convolutional block B6 is the output features of block B5, and its output is the density map. Block B6 consists of a 3×3 convolutional layer with 1 channel.
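[0034] Preferably, the true density map of an actual pipeline image is obtained by multiplying a two-dimensional Gaussian kernel by an impulse function, and the density map prediction network is trained using the true density map; the two-dimensional Gaussian kernel and the true density map are expressed as:
[0035] $G_{\sigma_1,\sigma_2}(x,y) = \dfrac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\dfrac{x^2}{2\sigma_1^2}-\dfrac{y^2}{2\sigma_2^2}\right)$
[0036] $D(x,y) = \sum_{i=1}^{N} \delta(x-x_i,\, y-y_i) \times G_{\sigma_1,\sigma_2}(x,y)$
[0037] where $(x_i, y_i)$ is the position of the $i$-th target in the image and $N$ is the number of targets; $\sigma_1$ and $\sigma_2$ of the two-dimensional Gaussian kernel are directly related to the length and width of the target; $\delta(x-x_i,\, y-y_i)$ is the impulse function.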
[0038] Preferably, the step of obtaining the segmented fine detection regions using a sliding window-based segmentation method based on the density map includes:
[0039] In the density map, a window of the target size is slid in a non-overlapping manner, and the sum of all pixel values in each window is calculated. The sum is then compared with a set density threshold.
[0040] If the sum is below the threshold, all pixels in this window are set to "0", otherwise they are set to "1", resulting in a binary mask with values of 0 and 1.
[0041] Pixels with the value "1" in the binary mask are selected and merged into candidate regions using the eight-neighbor method;
[0042] The original image is cropped with reference to the bounding rectangle of the candidate region to obtain the segmented fine detection region.
[0043] Preferably, the step of analyzing the spatiotemporal dynamic features of the video using background subtraction and inter-frame filtering to obtain the dynamic target region includes:
[0044] The background difference method based on the Gaussian mixture model is used to obtain the dynamic target region of the leaking gas.
[0045] Inter-frame filtering is used to obtain the dynamic target region of the leaking droplets.
[0046] Preferably, the step of obtaining the dynamic target region of the leaked gas using the background difference method based on the Gaussian mixture model includes:
[0047] Based on the video, background modeling is performed using a Gaussian mixture model, in which each pixel is described by multiple single models: $P(p) = \{[w_i(x,y,t),\, u_i(x,y,t),\, \sigma_i(x,y,t)^2]\}$, $i = 1, 2, \ldots, K$, where $K$ is the number of single models in the Gaussian mixture model. Each single Gaussian model is determined by its weight, mean, and variance; $w_i(x,y,t)$ is the weight of the $i$-th model, satisfying:
[0048] $\sum_{i=1}^{K} w_i(x,y,t) = 1$
[0049] $u_i(x,y,t)$ is the mean of the pixel at $(x,y)$ in the $i$-th model, and $\sigma_i(x,y,t)^2$ is the variance of that pixel;
[0050] Foreground detection and parameter updates are performed on the Gaussian mixture model:
[0051] If the pixel value at $(x,y)$ in the newly read video frame satisfies $|I(x,y,t) - u_i(x,y,t)| \le \lambda \cdot \sigma_i(x,y,t)$, where $\lambda$ is a constant, the new pixel is considered to match the model and is judged to be background, that is, part of the image other than the leaking gas; otherwise, the new pixel is judged to be foreground, that is, one of the leaking-gas pixels.
[0052] If the new pixel is background, the weight, mean, and variance of the single model that matches it are adjusted. The weight increment is $dw = \alpha(1 - w_i(x,y,t-1))$, where the parameter $\alpha$ is the update rate; the new weight is $w_i(x,y,t) = w_i(x,y,t-1) + dw = w_i(x,y,t-1) + \alpha(1 - w_i(x,y,t-1))$; the new mean is $u_i(x,y,t) = (1-\alpha) \times u_i(x,y,t-1) + \alpha \times I(x,y,t)$; the new variance is $\sigma_i(x,y,t)^2 = (1-\alpha) \times \sigma_i(x,y,t-1)^2 + \alpha \times [I(x,y,t) - u_i(x,y,t)]^2$. Finally, the weights are normalized: $w_i(x,y,t) = w_i(x,y,t) \big/ \sum_{j=1}^{K} w_j(x,y,t)$.
[0053] If the new pixel is foreground, a new single model is added. The weight of the new model is a fixed value, its mean is the new pixel value, and its variance is also a fixed value.
[0054] Preferably, the step of obtaining the dynamic target region of the leaking droplets using inter-frame filtering includes:
[0055] Calculate the difference between adjacent frames of the video:
[0056] $x_f = I_f - I_{f-1}$
[0057] where $I_f$ and $I_{f-1}$ are the $f$-th and $(f-1)$-th original frames in the $n$-frame sequence, respectively; $x_f$ is the difference frame; $f = 2, \ldots, n$;
[0058] Set a threshold $t_a$ for the difference frames; pixels smaller than the threshold are set to 0 to remove background noise;
[0059] Perform temporal operations on the difference frames to obtain the lines formed by the leaking droplets, including:
[0060] The average of k consecutive thresholded difference frames is taken to obtain a time-averaged frame, where k is the number of time frames. The effect of the movement of the leaking droplets over the k consecutive frames can be observed in the time-averaged frame.
[0061] All video data is converted into a set of time-averaged frames, in which the leaking droplets form lines;
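The differencing, thresholding, and temporal averaging steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: frames are assumed to be grayscale and stored as nested lists, the absolute difference is used (the text only says "difference"), the k frames are grouped into non-overlapping runs, and both function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities


def difference_frames(frames: List[Frame], t_a: float) -> List[Frame]:
    """Difference adjacent frames, then zero every pixel below the threshold t_a."""
    diffs = []
    for prev, cur in zip(frames, frames[1:]):
        # Assumption: absolute difference, so darkening and brightening both count.
        diff = [[abs(c - p) for p, c in zip(prow, crow)]
                for prow, crow in zip(prev, cur)]
        diffs.append([[v if v >= t_a else 0.0 for v in row] for row in diff])
    return diffs


def time_averaged_frames(diffs: List[Frame], k: int) -> List[Frame]:
    """Average each non-overlapping run of k consecutive difference frames."""
    averaged = []
    for start in range(0, len(diffs) - k + 1, k):
        group = diffs[start:start + k]
        height, width = len(group[0]), len(group[0][0])
        averaged.append([[sum(f[r][c] for f in group) / k for c in range(width)]
                         for r in range(height)])
    return averaged


# A droplet (intensity 100) falling one row per frame down column 1 of a 4x3 frame.
frames = [[[100.0 if (r == f and c == 1) else 0.0 for c in range(3)]
           for r in range(4)] for f in range(4)]
averaged = time_averaged_frames(difference_frames(frames, t_a=10.0), k=3)
```

In this toy sequence, column 1 of `averaged[0]` is non-zero in every row, which is the "line" that a falling droplet leaves in a time-averaged frame.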
[0062] Vertical neighborhood filtering is performed using the vertical characteristics of the lines formed by leaking droplets, including:
[0063] Let $v$ be the horizontal position of a pixel in a line. Count the number of pixels in the vertical direction within the horizontal range $[v-a, v+a]$ of that pixel, where $a$ is the number of neighboring pixels considered to the left and right of the pixel.
[0064] Compare this count with a set threshold for the number of adjacent pixels. If the count is less than the threshold, the pixel is considered a noise pixel and is removed; otherwise, it is retained.
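One possible reading of the vertical neighborhood filter is sketched below. It assumes that "the number of pixels in the vertical direction" means the count of non-zero pixels, over all rows, in the columns from v-a to v+a; the function name and the parameter `min_count` (the set threshold) are hypothetical.

```python
from typing import List


def vertical_neighborhood_filter(frame: List[List[float]], a: int,
                                 min_count: int) -> List[List[float]]:
    """Keep a non-zero pixel only if columns [v-a, v+a] hold enough non-zero pixels."""
    height, width = len(frame), len(frame[0])
    # Number of non-zero pixels in each column (the vertical direction).
    column_counts = [sum(1 for r in range(height) if frame[r][c]) for c in range(width)]
    filtered = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for v in range(width):
            if frame[r][v]:
                lo, hi = max(0, v - a), min(width - 1, v + a)
                if sum(column_counts[lo:hi + 1]) >= min_count:
                    filtered[r][v] = frame[r][v]  # supported by a vertical line
    return filtered


# A vertical droplet line in column 1 plus one isolated noise pixel at (row 2, column 4).
frame = [[0.0, 1.0, 0.0, 0.0, 0.0] for _ in range(5)]
frame[2][4] = 1.0
cleaned = vertical_neighborhood_filter(frame, a=1, min_count=3)
```

Here the five pixels of the line support each other and survive, while the isolated pixel has only one non-zero pixel in its neighborhood and is removed.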
[0065] Preferably, the step of determining, based on the positional relationship between the static target region and the dynamic target region, that the region satisfying both spatial prediction and temporal prediction is the actual leakage region includes:
[0066] For the leaking gas target, two rectangles are used to represent the static gas region B1 detected in the static single-frame image and the dynamic gas region B2 detected in the spatiotemporal domain.
[0067] Calculate the areas of the intersection and union regions of the two rectangles, and calculate the Intersection over Union (IoU):
[0068] $\mathrm{IoU} = \dfrac{\mathrm{area}(B1 \cap B2)}{\mathrm{area}(B1 \cup B2)}$
[0069] If the IoU is greater than the set value, it is considered that there is a gas leak in the intersection region that simultaneously satisfies spatial prediction and temporal prediction.
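The gas-leak decision can be illustrated with a short sketch. Boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples; the default threshold of 0.5 is a placeholder, since the text only refers to a "set value", and the function names are hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left and bottom-right


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned rectangles."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gas_leak_region(static_b1: Box, dynamic_b2: Box,
                    iou_threshold: float = 0.5) -> Optional[Box]:
    """Return the intersection rectangle when the IoU exceeds the threshold, else None."""
    if iou(static_b1, dynamic_b2) <= iou_threshold:
        return None
    return (max(static_b1[0], dynamic_b2[0]), max(static_b1[1], dynamic_b2[1]),
            min(static_b1[2], dynamic_b2[2]), min(static_b1[3], dynamic_b2[3]))


leak = gas_leak_region((0, 0, 10, 10), (2, 0, 12, 10))   # IoU = 80 / 120
no_leak = gas_leak_region((0, 0, 10, 10), (5, 5, 15, 15))  # IoU = 25 / 175
```

The first pair overlaps strongly and yields the intersection rectangle `(2, 0, 10, 10)`; the second pair overlaps too little and yields `None`.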
[0070] For the leaking droplet target, two rectangles are used to represent the static ground water area B3 detected in the static single-frame image and the dynamic droplet area B4 detected in the spatiotemporal domain, respectively. The upper left corner coordinates of B4 are (x1, y1), and the lower right corner coordinates are (x2, y2). Calculate the centroid coordinates $(x_p, y_p)$ of B3 and the centroid coordinates $(x_c, y_c)$ of B4. If $x1 \le x_p \le x2$ and $y_p \ge y_c$, that is, the leaking droplets are located above the water on the ground, it is determined that there is a droplet leak in the area.
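The droplet rule reduces to two comparisons on box centroids. The sketch below assumes image coordinates in which y grows downward, so that a larger y means lower in the picture; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image coordinates


def centroid(box: Box) -> Tuple[float, float]:
    """Center point of an axis-aligned rectangle."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def is_droplet_leak(water_b3: Box, droplets_b4: Box) -> bool:
    """True if the water centroid lies under the droplet box (x1 <= x_p <= x2, y_p >= y_c)."""
    x1, _, x2, _ = droplets_b4
    x_p, y_p = centroid(water_b3)
    _, y_c = centroid(droplets_b4)
    return x1 <= x_p <= x2 and y_p >= y_c


droplets = (40, 10, 60, 100)  # tall, narrow box around the falling droplets
print(is_droplet_leak((30, 120, 70, 140), droplets))    # water directly below: True
print(is_droplet_leak((100, 120, 140, 140), droplets))  # water off to the side: False
```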
[0071] According to a second aspect of the present invention, a vision-based spatiotemporal pipeline leak detection system is provided, comprising:
[0072] The data module uses a fixed camera to capture video.
[0073] The static target module uses an adaptive block segmentation method to perform target detection on static single-frame images in the video to obtain static target regions.
[0074] The dynamic module uses background subtraction and inter-frame filtering to analyze the spatiotemporal dynamic features of the video and obtain the dynamic target region.
[0075] The comprehensive judgment module determines the positional relationship between the static target area and the dynamic target area. The area that satisfies both spatial prediction and temporal prediction is the actual leakage area.
[0076] Compared with the prior art, the embodiments of the present invention have at least one of the following beneficial effects:
[0077] (1) The vision-spatiotemporal pipeline leak detection method and system provided in this embodiment of the invention comprehensively analyzes the static and dynamic features of chemical pipeline video through adaptive block detection algorithm, background difference method and inter-frame filtering detection method, realizes real-time dynamic detection of minor leaks, greatly reduces the false detection rate of leaks, and improves the early detection rate to avoid serious and dangerous faults; at the same time, there is no need for manual pipeline fault detection, saving manpower and material resources.
[0078] (2) The vision-spatiotemporal pipeline leak detection method and system provided in this embodiment of the invention uses an adaptive block detection method to obtain the location information of leaking gas and ground water in a single frame image. This method not only improves the accuracy of the model, but also reduces the computational cost and improves the detection efficiency.
[0079] (3) The visual spatiotemporal pipeline leak detection method and system provided in this embodiment of the invention uses background difference method and inter-frame filtering method to detect dynamic target area, effectively eliminate background information, accurately identify the dynamic process of pipeline leakage in spatiotemporal domain, overcome the interference of complex environment, and increase robustness; and this invention does not need to consider the physical properties of leaking gas and liquid, so it is applicable to the detection of any type of gas and liquid leak.
[0080] (4) The vision-spatiotemporal pipeline leak detection method and system provided in this embodiment of the invention adopts an independently developed AI-based scene anomaly detection edge device, makes full use of cloud computing and edge computing resources, and uses a cloud platform for management, effectively solving the problems of difficult pipeline leak detection, difficult implementation of deep learning algorithms, difficult deployment of detection equipment, difficult management of edge devices, difficult training of instrument users, and difficult migration of edge devices.
Attached Figure Description
[0081] Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
[0082] Figure 1 is a flowchart of a vision-spatiotemporal-based pipeline leak detection method according to an embodiment of the present invention;
[0083] Figure 2 is a flowchart of an adaptive block segmentation method according to a preferred embodiment of the present invention;
[0084] Figure 3 is a schematic diagram of a multidimensional dynamic convolutional block ODConv according to a preferred embodiment of the present invention;
[0085] Figure 4 is a schematic diagram of an AI edge device according to a specific embodiment of the present invention.
Detailed Implementation
[0086] The present invention will now be described in detail with reference to specific embodiments. These embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention. These all fall within the scope of protection of the present invention.
[0087] Referring to Figure 1, an embodiment of the present invention provides a vision-based spatiotemporal method for detecting leaks in pipelines, comprising:
[0088] S1: capture video using a fixed camera;
[0089] S2: use an adaptive block-segmentation method to perform target detection on the static single-frame images of the video obtained in S1 to obtain the static target region;
[0090] S3: use background subtraction and inter-frame filtering to analyze the spatiotemporal dynamic features of the video obtained in S1 to obtain the dynamic target region;
[0091] S4: determine the positional relationship between the static target region obtained in S2 and the dynamic target region obtained in S3; the region that satisfies both spatial prediction and temporal prediction is the actual leakage region.
[0092] This embodiment uses an adaptive block detection algorithm, background subtraction method, and inter-frame filtering detection method to comprehensively analyze the static and dynamic features of chemical pipeline videos, achieving real-time dynamic detection of minor leaks, greatly reducing the false detection rate of leaks, and improving the early detection rate to avoid serious and dangerous faults; at the same time, it eliminates the need for manual pipeline fault detection, saving manpower and resources.
[0093] Referring to Figure 2, in a preferred embodiment of the present invention, step S2 obtains the static target region through the following process:
[0094] S21: the original image, at reduced resolution, is input into the density map prediction network, which uses a relatively large field of view to perform coarse detection on the entire image and outputs a density map containing the approximate distribution and size of the targets, so as to identify areas where defects may exist;
[0095] S22: based on the density map obtained in S21, a sliding-window block segmentation method is used to obtain the block-based fine detection regions;
[0096] S23: the YOLOX target detection network performs fine detection on the block-based fine detection regions obtained in S22 to obtain the detection results;
[0097] S24: the detection results obtained in S23 are mapped back onto the original image to obtain the static target regions of leaked gas and ground water.
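Step S24 amounts to a coordinate translation from each cropped block back to the full frame. The sketch below assumes that blocks are cropped from the original image without rescaling, so only the crop's top-left offset has to be added; the tuple layout and the function name are hypothetical.

```python
from typing import List, Tuple

Detection = Tuple[int, int, int, int, float]  # (x1, y1, x2, y2, score) in crop coordinates


def restore_to_original(detections: List[Detection],
                        crop_origin: Tuple[int, int]) -> List[Detection]:
    """Shift boxes detected inside a crop by the crop's top-left corner."""
    ox, oy = crop_origin
    return [(x1 + ox, y1 + oy, x2 + ox, y2 + oy, score)
            for x1, y1, x2, y2, score in detections]


# One detection found inside a block whose top-left corner sits at (120, 60).
restored = restore_to_original([(10, 5, 50, 45, 0.9)], crop_origin=(120, 60))
print(restored)  # [(130, 65, 170, 105, 0.9)]
```

If the blocks were resized before being passed to the detector, the boxes would also need to be divided by the resize factor before the offset is added.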
[0098] In a preferred embodiment, step S21 is implemented. In this embodiment, the density map prediction network comprises two parts: an encoding end and a decoding end.
[0099] The encoding end employs a VGG network structure incorporating a multidimensional dynamic convolutional block ODConv. This network consists of five parts: convolutional block A1, convolutional block A2, convolutional block A3, convolutional block A4, and multidimensional dynamic convolutional block A5.
[0100] The input to convolutional block A1 is the original image $I \in \mathbb{R}^{H \times W \times 3}$. Convolutional block A1 consists of two 3×3 convolutional layers with 64 channels each, two ReLU activation functions, and one max pooling layer. The ReLU activation function is:
[0101] $\mathrm{ReLU}(x) = \max(0, x)$
[0102] The input of convolutional block A2 is the output features of convolutional block A1. Convolutional block A2 consists of two 3×3 convolutional layers with 128 channels each, two ReLU activation functions, and one max pooling layer.
[0103] The input of convolutional block A3 is the output features of convolutional block A2. Convolutional block A3 consists of two 3×3 convolutional layers with 256 channels each, two ReLU activation functions, and one max pooling layer;
[0104] The input of convolutional block A4 is the output features of convolutional block A3. Convolutional block A4 consists of two 3×3 convolutional layers with 512 channels each and two ReLU activation functions;
[0105] The multidimensional dynamic convolutional block A5 is shown in Figure 3. Its input is the output features of convolutional block A4. The multidimensional dynamic convolutional block includes global average pooling, fully connected layers, a ReLU activation function, and a Sigmoid activation function. The Sigmoid activation function is:
[0106] $\mathrm{Sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$
[0107] Referring to Figure 3, the multidimensional dynamic convolutional block A5 operates as follows. First, the input features are compressed by channel-wise global average pooling. They then pass through a fully connected layer and an activation function and enter four branches that produce the multidimensional attention coefficients $\alpha_s$, $\alpha_f$, $\alpha_c$, and $\alpha_w$, corresponding to the spatial dimension, input channel dimension, output channel dimension, and overall convolutional kernel dimension, respectively. The attention coefficients of these dimensions are then used to weight the corresponding $n$ convolutional kernels $W$ to obtain the multidimensional convolutional kernel $DW$. Each multidimensional convolutional kernel $DW_i$ is calculated as follows:
[0108] $DW_i = \alpha_{wi} \cdot \alpha_{fi} \cdot \alpha_{ci} \cdot \alpha_{si} \cdot W_i$
[0109] where $i = 1, \ldots, n$. Finally, the input feature $X_4$ is convolved with the multidimensional convolution kernel to obtain the output feature $X_5$.
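The kernel weighting above can be made concrete with a small sketch. It assumes the coefficient shapes used in the published ODConv design (a k×k map for the spatial dimension, one value per input channel, one value per output channel, and one scalar per kernel) and names the arguments by dimension rather than by symbol; summing the n weighted kernels into a single kernel before convolution is likewise taken from the ODConv paper rather than from the text. All function names are hypothetical.

```python
from typing import List

Kernel = List[List[List[List[float]]]]  # indexed [out_channel][in_channel][row][col]


def weight_kernel(w: Kernel,
                  alpha_spatial: List[List[float]],  # k x k, one value per kernel position
                  alpha_in: List[float],             # one value per input channel
                  alpha_out: List[float],            # one value per output channel
                  alpha_kernel: float) -> Kernel:    # scalar for the whole kernel
    """Scale one candidate kernel W_i by its four attention coefficients."""
    return [[[[alpha_kernel * alpha_out[o] * alpha_in[c] * alpha_spatial[u][v] * w[o][c][u][v]
               for v in range(len(w[o][c][u]))]
              for u in range(len(w[o][c]))]
             for c in range(len(w[o]))]
            for o in range(len(w))]


def combine_kernels(weighted: List[Kernel]) -> Kernel:
    """Element-wise sum of the n weighted kernels (assumed combination step)."""
    first = weighted[0]
    return [[[[sum(k[o][c][u][v] for k in weighted)
               for v in range(len(first[o][c][u]))]
              for u in range(len(first[o][c]))]
             for c in range(len(first[o]))]
            for o in range(len(first))]


# Toy kernel: 2 output channels, 1 input channel, 2x2 spatial size, all ones.
w = [[[[1.0, 1.0], [1.0, 1.0]]], [[[1.0, 1.0], [1.0, 1.0]]]]
dw = weight_kernel(w, [[1.0, 0.5], [0.5, 0.25]], [0.5], [1.0, 2.0], 0.5)
combined = combine_kernels([dw, dw])
```

Because every coefficient multiplies every kernel entry it applies to, the result has the same shape as `w`; only the magnitudes change.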
[0110] The decoding end uses five dilated convolutional blocks and one ordinary convolutional block to process the input feature $X_5$: dilated convolutional blocks B1, B2, B3, B4, and B5, and ordinary convolutional block B6.
[0111] The input of dilated convolutional block B1 is the output feature $X_5$ of block A5. Block B1 consists of a 3×3 dilated convolutional layer with 512 channels and a dilation rate of 2, and a ReLU activation function;
[0112] The input of dilated convolutional block B2 is the output features of block B1. Block B2 consists of a 3×3 dilated convolutional layer with 512 channels and a dilation rate of 4, and a ReLU activation function;
[0113] The input of dilated convolutional block B3 is the output features of block B2. Block B3 consists of a 3×3 dilated convolutional layer with 512 channels and a dilation rate of 4, and a ReLU activation function;
[0114] The input of dilated convolutional block B4 is the output features of block B3. Block B4 consists of a 3×3 dilated convolutional layer with 256 channels and a dilation rate of 4, and a ReLU activation function;
[0115] The input of dilated convolutional block B5 is the output features of block B4. Block B5 consists of a 3×3 dilated convolutional layer with 128 channels and a dilation rate of 2, and a ReLU activation function;
[0116] The input of ordinary convolutional block B6 is the output features of block B5, and its output is the density map. Block B6 consists of a 3×3 convolutional layer with 1 channel.
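The layer layout described above can be summarized as data, which makes the output resolution and the reach of each dilated layer easy to check. The sketch assumes 2×2 max pooling with stride 2, convolutions padded to preserve spatial size, and an A5 block that keeps the spatial size unchanged; none of these details is stated in the text, and the names are hypothetical.

```python
from typing import List, Tuple

# (block name, output channels, followed by max pooling?)
ENCODER: List[Tuple[str, int, bool]] = [
    ("A1", 64, True), ("A2", 128, True), ("A3", 256, True), ("A4", 512, False),
]
# (block name, output channels, dilation rate)
DECODER: List[Tuple[str, int, int]] = [
    ("B1", 512, 2), ("B2", 512, 4), ("B3", 512, 4),
    ("B4", 256, 4), ("B5", 128, 2), ("B6", 1, 1),
]


def density_map_size(height: int, width: int, pool_stride: int = 2) -> Tuple[int, int]:
    """Spatial size of the density map: only the pooling layers shrink the image."""
    for _name, _channels, pooled in ENCODER:
        if pooled:
            height, width = height // pool_stride, width // pool_stride
    return height, width


def effective_kernel(kernel: int = 3, dilation: int = 1) -> int:
    """Side length covered by a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel - 1) + 1


print(density_map_size(512, 640))                                  # (64, 80)
print([(name, effective_kernel(3, d)) for name, _, d in DECODER])
```

Under these assumptions the three pooling layers reduce each side by a factor of 8, and a 3×3 kernel with dilation rate 4 covers a 9×9 neighborhood without adding parameters, which is why dilated layers suit the coarse, wide-field pass.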
[0117] In another embodiment of the invention, before the density map prediction network is trained, ground-truth density maps must be generated from the target information in the pipeline images. A density map containing the target size and location is generated using a two-dimensional independent Gaussian distribution function. The two spread parameters of the Gaussian function are adjusted according to the target size to obtain an attention map that better fits the target size. The two-dimensional Gaussian kernel is:
[0118] $G_{\sigma_1,\sigma_2}(x,y) = \dfrac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\dfrac{x^2}{2\sigma_1^2}-\dfrac{y^2}{2\sigma_2^2}\right)$
[0119] where $\sigma_1$ and $\sigma_2$ are directly related to the length and width of the target. The complete expression for the attention map is:
[0120] $D(x,y) = \sum_{i=1}^{N} \delta(x-x_i,\, y-y_i) \times G_{\sigma_1,\sigma_2}(x,y)$
[0121] where $(x_i, y_i)$ is the position of the $i$-th target in the image and $N$ is the number of targets. A density map containing information about the target's position and size is obtained by multiplying the two-dimensional Gaussian kernel by the impulse function.
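A ground-truth density map of this kind can be generated by placing one anisotropic Gaussian at each annotated target. The sketch below assumes that σ1 scales with the box width and σ2 with the box height through a hypothetical factor `ratio` (the text only says they are "directly related" to the target size); the function name and the target tuple layout are also assumptions.

```python
import math
from typing import List, Tuple

Target = Tuple[float, float, float, float]  # (center x, center y, box width, box height)


def ground_truth_density(height: int, width: int, targets: List[Target],
                         ratio: float = 0.25) -> List[List[float]]:
    """Sum of one 2-D independent Gaussian per target, centered on the target."""
    dmap = [[0.0] * width for _ in range(height)]
    for cx, cy, box_w, box_h in targets:
        s1, s2 = ratio * box_w, ratio * box_h  # assumption: sigma proportional to size
        norm = 1.0 / (2.0 * math.pi * s1 * s2)
        for y in range(height):
            for x in range(width):
                dmap[y][x] += norm * math.exp(
                    -((x - cx) ** 2) / (2 * s1 ** 2) - ((y - cy) ** 2) / (2 * s2 ** 2))
    return dmap


# One target centered at (20, 20) that is twice as wide as it is tall.
dmap = ground_truth_density(40, 40, [(20, 20, 16, 8)])
total = sum(sum(row) for row in dmap)
```

Because each Gaussian is normalized, `total` comes out close to the number of targets when they lie well inside the image, and the blob is stretched along the axis on which the target is larger.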
[0122] In a preferred embodiment, step S22 is performed as follows. Based on the density map generated by the density map prediction network described above, the block-based fine detection regions are obtained using a sliding-window block segmentation algorithm. The specific steps are:
[0123] S221: in the density map, a window of the average target size (i.e., 60×60) is slid in a non-overlapping manner. The sum of all pixel values in each window is calculated and compared with a set density threshold, which is set to 180. If the sum is lower than the threshold, all pixels in the window are set to "0"; otherwise they are set to "1", yielding a binary mask with values of 0 and 1.
[0124] S222: the pixels that are "1" in the binary mask are selected and merged into larger candidate regions using the eight-neighbor algorithm.
[0125] S223: the bounding rectangle of each candidate region is used to crop the original image to obtain the block-based fine detection regions.
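Steps S221 to S223 can be sketched as a window-level mask followed by an eight-neighbor flood fill. The sketch works on window cells rather than individual pixels, which is equivalent because every pixel in a window receives the same mask value; it also assumes the density map and the image to be cropped share one coordinate system, so any rescaling between them is left out. The defaults 60 and 180 come from the text; the function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple


def window_mask(density: List[List[float]], win: int = 60,
                threshold: float = 180.0) -> List[List[int]]:
    """S221: one mask cell per non-overlapping window, 1 if the window sum reaches the threshold."""
    h, w = len(density), len(density[0])
    rows, cols = -(-h // win), -(-w // win)  # ceiling division
    return [[1 if sum(density[y][x]
                      for y in range(i * win, min((i + 1) * win, h))
                      for x in range(j * win, min((j + 1) * win, w))) >= threshold else 0
             for j in range(cols)] for i in range(rows)]


def candidate_regions(mask: List[List[int]], win: int, height: int,
                      width: int) -> List[Tuple[int, int, int, int]]:
    """S222 and S223: merge 8-connected cells, return bounding boxes (x1, y1, x2, y2) in pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j] or seen[i][j]:
                continue
            queue, (i0, i1, j0, j1) = deque([(i, j)]), (i, i, j, j)
            seen[i][j] = True
            while queue:
                a, b = queue.popleft()
                i0, i1, j0, j1 = min(i0, a), max(i1, a), min(j0, b), max(j1, b)
                for na in (a - 1, a, a + 1):
                    for nb in (b - 1, b, b + 1):
                        if 0 <= na < rows and 0 <= nb < cols and mask[na][nb] and not seen[na][nb]:
                            seen[na][nb] = True
                            queue.append((na, nb))
            boxes.append((j0 * win, i0 * win, min((j1 + 1) * win, width),
                          min((i1 + 1) * win, height)))
    return boxes


# Toy 8x8 density map with 2x2 windows: two diagonal hot cells and one isolated hot cell.
density = [[0.0] * 8 for _ in range(8)]
density[0][0] = density[2][2] = density[6][6] = 5.0
mask = window_mask(density, win=2, threshold=3.0)
boxes = candidate_regions(mask, win=2, height=8, width=8)
```

The two diagonally adjacent cells merge into one region because eight-neighbor connectivity includes diagonals, so `boxes` holds two rectangles rather than three.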
[0126] This embodiment uses an adaptive block detection method to obtain the location information of leaked gas and ground water in a single frame image. This method not only improves the accuracy of the model, but also reduces the computational cost and improves the detection efficiency.
[0127] In a preferred embodiment of the present invention, step S3 is performed to obtain the dynamic target region. Specifically, this is implemented in two parts:
[0128] S31: for large targets such as leaked gas, the background difference method based on the Gaussian mixture model is used to obtain the dynamic target region.
[0129] S32: for small targets such as leaking droplets, the inter-frame filtering method is used to obtain the dynamic target region.
[0130] In a preferred embodiment, step S31 is implemented. In this embodiment, the background subtraction method based on the Gaussian mixture model consists of two parts: background training and foreground detection and parameter updating. The specific steps are as follows:
[0131] S311, input a video of a chemical pipeline, and perform background modeling using a Gaussian mixture model.
[0132] Specifically, in a Gaussian mixture model, each pixel is described by multiple single models:
[0133] $P(p) = \{[w_i(x,y,t),\ u_i(x,y,t),\ \sigma_i(x,y,t)^2]\}$, $i = 1, 2, \ldots, K$. The value of K is generally between 3 and 5 and represents the number of single models contained in the Gaussian mixture model. Each single Gaussian model is determined by three parameters: weight, mean, and variance. $w_i(x,y,t)$ represents the weight of each single model, satisfying:
[0134] $\sum_{i=1}^{K} w_i(x,y,t) = 1$
[0135] $u_i(x,y,t)$ represents the mean value of the pixel at (x,y) in the i-th single model, and $\sigma_i(x,y,t)^2$ represents the variance of that pixel.
[0136] S312, foreground detection and parameter update.
[0137] Specifically, if the pixel value of the image at (x,y) in the newly read video image sequence satisfies $|I(x,y,t) - u_i(x,y,t)| \le \lambda \cdot \sigma_i(x,y,t)$, the new pixel is considered to match the single model and is determined to be background; otherwise, it is determined to be foreground. Here, λ is a constant, which can be set to 2.5. The camera is stationary, and the video scene contains a large amount of static background; the leaking gas is the moving foreground in the video.
[0138] If the new pixel is background, the weight, mean, and variance of the single model matching the new pixel need to be adjusted. The weight increment is $dw = \alpha\,(1 - w_i(x,y,t-1))$, where the parameter α represents the update rate.
[0139] The new weight is expressed as follows:
[0140] $w_i(x,y,t) = w_i(x,y,t-1) + dw = w_i(x,y,t-1) + \alpha\,(1 - w_i(x,y,t-1))$
[0141] The new mean is expressed as follows:
[0142] $u_i(x,y,t) = (1-\alpha) \times u_i(x,y,t-1) + \alpha \times I(x,y,t)$
[0143] The new variance is expressed as follows:
[0144] $\sigma_i(x,y,t)^2 = (1-\alpha) \times \sigma_i(x,y,t-1)^2 + \alpha \times [I(x,y,t) - u_i(x,y,t)]^2$
[0145] Perform weight normalization:
[0146] $w_i(x,y,t) = \dfrac{w_i(x,y,t)}{\sum_{j=1}^{K} w_j(x,y,t)}$
[0147] If the new pixel is foreground, a new single model is added. The weight of the new single model is a small fixed value, the mean is set to the value of the new pixel, and the variance is a large fixed value.
[0148] In other preferred embodiments of the present invention, if the number of current single models has reached the maximum allowed number, then the single model with the lowest importance in the current multi-model set is removed. The importance calculation formula is as follows:
[0149]
[0150] It should be noted that the foreground refers to the dynamic target area of the leaking gas. If a new pixel is identified as foreground, it means that the pixel is one of the pixels representing the leaking gas; if a new pixel is identified as background, it means that the pixel is part of the image other than the leaking gas.
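The matching test and the parameter update of step S312 can be sketched for a single pixel as follows. This is an illustrative sketch under stated assumptions, not the implementation of this embodiment: the names `Gaussian` and `update_pixel`, the update rate `alpha=0.01`, and the initial weight and variance of a new single model are hypothetical values. The mean update is written with the new pixel value I(x,y,t), the importance of a single model is assumed to be weight divided by standard deviation (the formula is not reproduced in the text above), and weights are renormalized after every update.

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float

def update_pixel(models, value, alpha=0.01, lam=2.5, max_models=5,
                 init_weight=0.05, init_var=900.0):
    """Update the mixture of one pixel in place.
    Returns True if the new pixel value is foreground, False if background."""
    # Matching test: |I - u_i| <= lambda * sigma_i
    match = next((m for m in models
                  if abs(value - m.mean) <= lam * m.var ** 0.5), None)
    if match is not None:
        # Background: adjust weight, mean and variance of the matched model.
        match.weight += alpha * (1.0 - match.weight)
        match.mean = (1.0 - alpha) * match.mean + alpha * value
        match.var = (1.0 - alpha) * match.var + alpha * (value - match.mean) ** 2
    else:
        # Foreground: add a new single model (small weight, large variance).
        if len(models) >= max_models:
            # Assumption: importance = weight / sigma; drop the least important.
            models.remove(min(models, key=lambda m: m.weight / m.var ** 0.5))
        models.append(Gaussian(init_weight, value, init_var))
    # Weight normalization so that the weights sum to 1.
    total = sum(m.weight for m in models)
    for m in models:
        m.weight /= total
    return match is None
```

Calling `update_pixel` once per pixel and per frame yields a foreground mask. The connected foreground pixels then form the dynamic target region of the leaking gas.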
[0151] In a preferred embodiment, step S32 is implemented. This inter-frame filtering method consists of three parts: inter-frame difference calculation, timing operation, and vertical neighborhood filtering. The specific steps are as follows:
[0152] S321, calculate the difference between adjacent frames of the video captured by the fixed camera:
[0153]
[0154] where the f-th and (f-1)-th original frames of the n-frame sequence are subtracted from each other to give the differential frame $x_f$, f = 2, ..., n. Noise is then removed from the differential frame $x_f$: a threshold $t_a$ is set for the differential frames, and pixels smaller than the threshold are set to 0 to remove background noise, where $t_a = 0.5$.
[0155] S322, perform the timing operation to obtain the lines formed by the leaking droplets.
[0156] Specifically, the average is taken over k consecutive differential frames that have been filtered, where k is the number of time frames and k = 5. The resulting frame is the time-averaged frame, in which the effect of the leaking droplet motion over the k consecutive frames can be observed. The video data can then be converted into a set of time-averaged frames, in which the leaking droplets form lines.
[0157] S323, use the vertical characteristics of the leaking droplets to perform vertical neighborhood filtering.
[0158] Specifically, each pixel is surrounded by a vertical band (line) with non-zero values, and a pixel of a leaking droplet has more neighboring pixels within this vertical band. Let v be the position of the pixel on the horizontal axis; the number of pixels in the vertical direction within the horizontal range {v-a, v+a} of the pixel is counted, where a is the number of neighboring pixels to the right and left of the pixel. A pixel of a leaking droplet is assumed to have at least q² neighboring pixels in the vertical band; otherwise, it is considered a noise pixel and is removed as not belonging to the leaking droplet. Here a = 2 and q² = 10.
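Steps S321 to S323 can be sketched in plain Python as follows. The sketch rests on several assumptions that the text above does not fix: the difference is taken as an absolute difference, frames are grayscale nested lists with values scaled so that a threshold of 0.5 is meaningful, the k frames are averaged in non-overlapping groups, the vertical band spans the full image height, and the pixel itself is not counted among its neighbors. The function names and the parameter name `q` (the neighbor-count threshold, 10 in this embodiment) are hypothetical.

```python
def diff_frames(frames, t_a=0.5):
    """S321: difference of adjacent frames; values below t_a are set to 0."""
    out = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [[abs(c - p) for p, c in zip(prow, crow)]
                for prow, crow in zip(prev, cur)]
        out.append([[v if v >= t_a else 0.0 for v in row] for row in diff])
    return out

def time_average(diffs, k=5):
    """S322: average every group of k consecutive filtered differential frames."""
    averaged = []
    for start in range(0, len(diffs) - k + 1, k):
        group = diffs[start:start + k]
        h, w = len(group[0]), len(group[0][0])
        averaged.append([[sum(g[r][c] for g in group) / k for c in range(w)]
                         for r in range(h)])
    return averaged

def vertical_filter(frame, a=2, q=10):
    """S323: keep a non-zero pixel only if the vertical band covering columns
    v-a .. v+a contains at least q other non-zero pixels."""
    h, w = len(frame), len(frame[0])
    col_counts = [sum(1 for r in range(h) if frame[r][c] > 0) for c in range(w)]
    out = [[0.0] * w for _ in range(h)]
    for c in range(w):
        band = sum(col_counts[max(0, c - a):min(w, c + a + 1)])
        for r in range(h):
            if frame[r][c] > 0 and band - 1 >= q:
                out[r][c] = frame[r][c]
    return out
```

Applied in sequence, `vertical_filter(avg)` for each `avg` in `time_average(diff_frames(frames))` keeps the vertical lines traced by falling droplets and discards isolated noise pixels.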
[0159] This embodiment uses background subtraction and inter-frame filtering to detect dynamic target areas, effectively eliminating background information and accurately identifying the dynamic process of pipeline leakage in the spatiotemporal domain. It can overcome interference from complex environments and increase robustness. Furthermore, this embodiment does not need to consider the physical properties of the leaking gas or liquid, so it is applicable to the detection of any type of gas and liquid leakage.
[0160] In a preferred embodiment of the present invention, step S4 is performed to obtain the actual leakage area. The specific process is as follows:
[0161] For the leaking gas target, two rectangles are used to represent the detected gas region B1 in a single frame image and the detected dynamic gas region B2 in the spatiotemporal domain. The areas of the intersection and union regions of the two rectangles are calculated, and the Intersection over Union (IoU) is calculated using the following formula:
[0162] $\mathrm{IoU} = \dfrac{\mathrm{area}(B1 \cap B2)}{\mathrm{area}(B1 \cup B2)}$
[0163] If the IoU is greater than 0.5, the overlap rate between the two regions is considered to exceed 50%, indicating a gas leak in the overlapping region. In this embodiment, the region that simultaneously satisfies both spatial and temporal prediction refers to the intersection of the two rectangles, i.e., the overlapping region.
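The gas-leak judgment can be illustrated with a short sketch. Rectangles are assumed to be given as (x1, y1, x2, y2) corner tuples, and the function names `iou` and `gas_leak_region` are hypothetical; the 0.5 threshold is the value stated above.

```python
def iou(b1, b2):
    """Return (IoU, intersection rectangle or None) for two rectangles
    given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = area1 + area2 - inter
    value = inter / union if union > 0 else 0.0
    return value, ((ix1, iy1, ix2, iy2) if inter > 0 else None)

def gas_leak_region(b1, b2, threshold=0.5):
    """Return the overlapping region if IoU exceeds the threshold, else None."""
    value, overlap = iou(b1, b2)
    return overlap if value > threshold else None
```

For example, a static region (0, 0, 10, 10) and a dynamic region (0, 0, 10, 8) have an IoU of 0.8, so the overlap (0, 0, 10, 8) is reported as the gas leakage region.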
[0164] For the leaking droplet target, two rectangles are used to represent the surface water region B3 detected in a single frame image and the moving droplet region B4 detected in the spatiotemporal domain, respectively. The upper left corner of B4 is (x1, y1), and the lower right corner is (x2, y2). Calculate the centroid coordinates $(x_p, y_p)$ of B3 and the centroid coordinates $(x_c, y_c)$ of B4. If $x1 \le x_p \le x2$ and $y_p \ge y_c$, the leaking droplets are located above the water on the ground, and it can be determined that there is a droplet leak in the area.
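The droplet judgment can be sketched in the same way. The sketch assumes image coordinates in which y increases downward, so that a larger y means a lower position in the image, and rectangles given as (x1, y1, x2, y2); the function names are hypothetical.

```python
def centroid(box):
    """Centroid of a rectangle given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0

def droplet_leak(b3, b4):
    """b3: surface water region, b4: moving droplet region.
    True when the water centroid lies within the horizontal extent of b4
    and is not above the droplet centroid (y grows downward)."""
    xp, yp = centroid(b3)
    xc, yc = centroid(b4)
    x1, _, x2, _ = b4
    return x1 <= xp <= x2 and yp >= yc
```

For example, water at (40, 180, 80, 200) directly beneath droplets at (50, 20, 70, 120) satisfies both conditions, whereas water at (100, 180, 140, 200) does not, because its centroid lies outside the horizontal range of B4.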
[0165] Based on the same inventive concept, other embodiments of the present invention also provide a vision-based spatiotemporal pipeline leak detection system, including a data module, a static target module, a dynamic target module, and a comprehensive judgment module.
[0166] The data module uses a fixed camera to capture video; the static target module uses an adaptive block segmentation method to detect targets in static single-frame images in the video to obtain static target regions; the dynamic target module uses background subtraction and inter-frame filtering to analyze the spatiotemporal dynamic features of the video to obtain dynamic target regions; the comprehensive judgment module judges the positional relationship between the static and dynamic target regions, and the region that satisfies both spatial prediction and temporal prediction is the real leakage area.
[0167] To provide a more in-depth understanding of the technical solution of this invention, a specific embodiment is provided.
[0168] This embodiment uses the publish and subscribe method of the Internet of Things to manage AI edge devices in multiple scenarios. Each AI edge device controls the gimbal to rotate and monitor multiple inspection points, thereby realizing the location of pipeline leaks.
[0169] The acetonitrile chemical plant in Anqing includes areas such as pipelines, valves, flanges, and instrument connections. Among these, the connections between valves and flanges are the most prone to leaks. This embodiment focuses on abnormal target detection in these areas. By developing AI edge devices, it is possible to classify and identify gas leaks, gas or liquid leaks from connections, and liquid drips in real time locally.
[0170] This embodiment uses NVIDIA's JETSON AGX (or TX2) edge AI chip to design an AI edge device, as shown in Figure 4. A real-time parallel working mechanism is designed in the AI edge software system. One path acquires video images in real time and applies the vision-based spatiotemporal pipeline leak detection method described in the above embodiments in its AI model to implement the inference logic, thereby diagnosing abnormal situations in the scene. The other path implements real-time communication with the cloud to transmit the abnormal situations diagnosed by the edge device to the cloud for storage. At the same time, new AI models trained in the cloud can be quickly downloaded and deployed to the edge device, so that the AI model inference logic can be updated flexibly and managed through the cloud platform. The specific implementation is as follows:
[0171] For AI edge devices, model inference is built on the TensorFlow Lite framework to deploy quantized models. The threading module and TensorRT are used to allocate CPU threads in real time based on GPU resources, forming a high-concurrency, multi-threaded working mechanism for rapid processing of large amounts of data. For deployment, docker-compose is used to package the environment and program files required by the edge, enabling one-click deployment on the edge device by running the relevant commands.
[0172] For parallel-executing program units, the MQTT protocol is used to build the Internet of Things (IoT), which significantly improves communication efficiency compared to the traditional HTTP protocol. Through a publish-subscribe model, edge devices, the cloud, and user terminals are connected into a highly efficient communication network.
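The publish-subscribe pattern itself can be illustrated with a minimal in-process sketch. It is not MQTT and does not use a network or a broker service: the class `TinyBroker`, the topic name `plant/edge-01/leak` and the payload fields are hypothetical and serve only to show how a publisher (an edge device) and a subscriber (the cloud or a user terminal) are decoupled through a topic. A real deployment would use an MQTT client library and an MQTT broker instead.

```python
import json
from collections import defaultdict

class TinyBroker:
    """In-process stand-in for a broker: routes payloads by exact topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callback that receives (topic, payload) for this topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Deliver the payload to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(topic, payload)

broker = TinyBroker()
received = []

# The cloud side subscribes to leak events of one edge device.
broker.subscribe("plant/edge-01/leak",
                 lambda topic, payload: received.append(json.loads(payload)))

# The edge device publishes a diagnosed leak as a JSON message.
broker.publish("plant/edge-01/leak",
               json.dumps({"type": "gas", "box": [0, 0, 10, 8]}))
```

The publisher does not need to know which terminals are listening, which is what allows several edge devices, the cloud, and user terminals to be connected through topics.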
[0173] For the cloud, Spring is used as the underlying framework to build the backend, forming an enterprise-level control system with high security, strong scalability, and low resource consumption. The cloud server uses Vue to build the frontend and develop control interface pages, with functions including data visualization, control of multiple AI edge devices, and historical data analysis of diagnostic results.
[0174] Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and those skilled in the art can make various modifications or variations within the scope of the claims, which do not affect the essence of the present invention. The above preferred features can be used in any combination without conflict.
Claims
1. A vision-based spatiotemporal method for detecting leaks in pipelines, characterized in that it includes: using a fixed camera to capture video; using an adaptive block segmentation method to perform target detection on static single-frame images in the video to obtain static target regions; analyzing the spatiotemporal dynamic features of the video using background subtraction and inter-frame filtering to obtain dynamic target regions; and determining, based on the positional relationship between the static target region and the dynamic target region, that the region satisfying both spatial prediction and temporal prediction is the real leakage region. The step of using an adaptive block segmentation method to perform target detection on static single-frame images in the video to obtain static target regions includes: inputting the original image into the density map prediction network to obtain a density map that includes the target location and size; based on the density map, obtaining the segmented fine detection regions using a sliding-window block segmentation method; using the YOLOX target detection network to perform fine detection on the segmented fine detection regions to obtain detection results; and restoring the detection results to the original image to obtain the static target regions of leaked gas and surface water. The step of determining, based on the positional relationship between the static target region and the dynamic target region, that the region satisfying both spatial prediction and temporal prediction is the real leakage region includes: for the leaking gas target, using two rectangles to represent the static gas region B1 detected in the static single-frame image and the dynamic gas region B2 detected in the spatiotemporal domain; calculating the areas of the intersection and union regions of the two rectangles, and calculating the Intersection over Union (IoU): $\mathrm{IoU} = \mathrm{area}(B1 \cap B2) / \mathrm{area}(B1 \cup B2)$; if the IoU is greater than the set value, a gas leak is considered to exist in the intersection region, which simultaneously satisfies spatial prediction and temporal prediction; for the leaking droplet target, using two rectangles to represent the static water accumulation region B3 detected in the static single-frame image and the dynamic droplet region B4 detected in the spatiotemporal domain, where the coordinates of the upper left corner of B4 are (x1, y1) and the coordinates of the lower right corner are (x2, y2); calculating the centroid coordinates $(x_p, y_p)$ of B3 and the centroid coordinates $(x_c, y_c)$ of B4; if $x1 \le x_p \le x2$ and $y_p \ge y_c$, the leaking droplets are located above the water on the ground, and it is determined that there is a droplet leak in the region.
2. The method for detecting pipe leaks based on visual spatiotemporal perception according to claim 1, characterized in that the density map prediction network includes an encoding end and a decoding end; the encoding end adopts a VGG network structure with an added multidimensional dynamic convolutional block (ODConv), including convolutional block A1, convolutional block A2, convolutional block A3, convolutional block A4, and multidimensional dynamic convolutional block A5; the input of convolutional block A1 is the original image, and convolutional block A1 includes: two 3×3 convolutional layers with 64 channels each, two ReLU activation functions, and one max pooling layer, where the ReLU activation function is $\mathrm{ReLU}(x) = \max(0, x)$; convolutional block A2 takes the output features of convolutional block A1 as its input features and includes: two 3×3 convolutional layers with 128 channels each, two ReLU activation functions, and one max pooling layer; convolutional block A3 takes the output features of convolutional block A2 as its input features and includes: two 3×3 convolutional layers with 256 channels each, two ReLU activation functions, and one max pooling layer; convolutional block A4 takes the output features of convolutional block A3 as its input features and includes: two 3×3 convolutional layers with 512 channels each and two ReLU activation functions; multidimensional dynamic convolutional block A5 takes the output features of convolutional block A4 as its input features and includes: global average pooling, fully connected layers, a ReLU activation function, and a Sigmoid activation function, where the Sigmoid activation function is $\mathrm{Sigmoid}(x) = 1 / (1 + e^{-x})$; the decoding end includes a dilated convolutional block B1, a dilated convolutional block B2, a dilated convolutional block B3, a dilated convolutional block B4, a dilated convolutional block B5, and an ordinary convolutional block B6; dilated convolutional block B1 takes the output features of multidimensional dynamic convolutional block A5 as its input features and includes: a 3×3 dilated convolutional layer with 512 channels and a dilation rate of 2, and a ReLU activation function; dilated convolutional block B2 takes the output features of B1 as its input features and includes: a 3×3 dilated convolutional layer with 512 channels and a dilation rate of 4, and a ReLU activation function; dilated convolutional block B3 takes the output features of B2 as its input features and includes: a 3×3 dilated convolutional layer with 512 channels and a dilation rate of 4, and a ReLU activation function; dilated convolutional block B4 takes the output features of B3 as its input features and includes: a 3×3 dilated convolutional layer with 256 channels and a dilation rate of 4, and a ReLU activation function; dilated convolutional block B5 takes the output features of B4 as its input features and includes: a 3×3 dilated convolutional layer with 128 channels and a dilation rate of 2, and a ReLU activation function; ordinary convolutional block B6 takes the output features of B5 as its input features, outputs the density map, and includes: a 3×3 convolutional layer with 1 channel.
3. The method for detecting pipeline leaks based on visual spatiotemporal perception according to claim 2, characterized in that the true density map of the actual pipeline image is obtained by multiplying a two-dimensional Gaussian kernel with an impulse function, and the density map prediction network is trained using this true density map; the true density map is expressed in terms of the targets in the image, a two-dimensional Gaussian kernel whose parameter is directly related to the length and width of the target, and the impulse function.
4. The method for detecting pipeline leaks based on visual spatiotemporal perception according to claim 1, characterized in that the step of obtaining the segmented fine detection regions based on the density map using a sliding-window block segmentation method includes: in the density map, sliding a window of the target size in a non-overlapping manner, calculating the sum of all pixel values in each window, and comparing the sum with a set density threshold; if the sum is below the threshold, all pixels in the window are set to "0", otherwise they are set to "1", resulting in a binary mask with values of 0 and 1; selecting the pixels that are "1" in the binary mask and merging them into candidate regions using the eight-neighbor method; and cropping the original image with reference to the bounding rectangle of each candidate region to obtain the segmented fine detection regions.
5. The method for detecting pipe leaks based on visual spatiotemporal perception according to claim 1, characterized in that analyzing the spatiotemporal dynamic features of the video using background subtraction and inter-frame filtering to obtain the dynamic target region includes: using the background difference method based on the Gaussian mixture model to obtain the dynamic target region of the leaking gas; and using inter-frame filtering to obtain the dynamic target region of the leaking droplets.
6. The method for detecting pipeline leaks based on visual spatiotemporal perception according to claim 5, characterized in that obtaining the dynamic target region of leaked gas using the background difference method based on the Gaussian mixture model includes: based on the video, performing background modeling using a Gaussian mixture model, in which each pixel is described by multiple single models: $P(p) = \{[w_i(x,y,t),\ u_i(x,y,t),\ \sigma_i(x,y,t)^2]\}$, $i = 1, 2, \ldots, K$, where K represents the number of single models in the Gaussian mixture model; each single Gaussian model is determined by its weight, mean, and variance; $w_i(x,y,t)$ represents the weight of each single model, satisfying $\sum_{i=1}^{K} w_i(x,y,t) = 1$; $u_i(x,y,t)$ represents the mean value of the pixel at (x, y) in the i-th single model, and $\sigma_i(x,y,t)^2$ represents the variance of that pixel; performing foreground detection and parameter updating on the Gaussian mixture model: if the pixel value of the image at (x,y) in the newly read video image sequence satisfies $|I(x,y,t) - u_i(x,y,t)| \le \lambda \cdot \sigma_i(x,y,t)$, where λ is a set constant, the new pixel is considered to match the model and is judged to be background, that is, part of the image other than the leaked gas; otherwise, the new pixel is judged to be foreground, that is, one of the leaked gas pixels; if the new pixel is background, the weight, mean, and variance of the single model matching the new pixel are adjusted, where the weight increment is $dw = \alpha\,(1 - w_i(x,y,t-1))$ and the parameter α indicates the update rate; the new weight is $w_i(x,y,t) = w_i(x,y,t-1) + dw$; the new mean is $u_i(x,y,t) = (1-\alpha) \times u_i(x,y,t-1) + \alpha \times I(x,y,t)$; the new variance is $\sigma_i(x,y,t)^2 = (1-\alpha) \times \sigma_i(x,y,t-1)^2 + \alpha \times [I(x,y,t) - u_i(x,y,t)]^2$; finally, weight normalization is performed: $w_i(x,y,t) = w_i(x,y,t) / \sum_{j=1}^{K} w_j(x,y,t)$; if the new pixel is foreground, a new single model is added, whose weight is a fixed value, whose mean is the new pixel, and whose variance is a fixed value.
7. The method for detecting pipeline leaks based on visual spatiotemporal perception according to claim 5, characterized in that obtaining the dynamic target region of leaking droplets using inter-frame filtering includes: calculating the difference between adjacent frames of the video, that is, between the f-th and (f-1)-th original frames of the n-frame sequence, to obtain the differential frames $x_f$, f = 2, ..., n; setting a threshold $t_a$ for the differential frames, and setting pixels smaller than the threshold to 0 to remove background noise; performing a timing operation on the differential frames to obtain the lines formed by the leaking droplets, including: taking the average over k consecutive differential frames that have been filtered to obtain a time-averaged frame, where k is the number of time frames, the effect of the movement of the leaking droplets over the k consecutive frames being observable in the time-averaged frame; converting all video data into a set of time-averaged frames, in which the leaking droplets form lines; and performing vertical neighborhood filtering using the vertical characteristics of the lines formed by the leaking droplets, including: letting v be the position of a pixel of the line on the horizontal axis, counting the number of pixels in the vertical direction within the horizontal range {v-a, v+a} of the pixel, where a is the number of neighboring pixels to the right and left of the pixel; and comparing this number with a set threshold for the number of neighboring pixels: if the number is less than the set threshold, the pixel is considered a noise pixel and is removed; otherwise, it is retained.
8. A vision-based spatiotemporal pipeline leak detection system, used to implement the method of claim 1, characterized in that it includes: a data module, which uses a fixed camera to capture video; a static target module, which uses an adaptive block segmentation method to perform target detection on static single-frame images in the video to obtain static target regions; a dynamic target module, which uses background subtraction and inter-frame filtering to analyze the spatiotemporal dynamic features of the video and obtain the dynamic target region; and a comprehensive judgment module, which determines the positional relationship between the static target region and the dynamic target region, the region that satisfies both spatial prediction and temporal prediction being the actual leakage region.