Water supply network leakage detection method, device, equipment and medium

By combining a self-supervised learning framework with contrastive learning and occlusion completion pre-training tasks, the problem of dependence on labeled data in water supply network leakage detection is solved, achieving efficient and accurate leakage detection, adapting to diverse scenarios, and improving the automation level of the detection system.

CN120296428B · Active · Publication Date: 2026-06-26 · GUANGDONG UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GUANGDONG UNIV OF TECH
Filing Date
2025-04-09
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing methods for detecting leaks in water supply networks rely on a large amount of labeled data, and the low frequency of leak events leads to data scarcity and high labeling costs, which severely restricts the generalization ability of the models.

Method used

We adopt a self-supervised learning framework, and through contrastive learning and masking completion pre-training tasks, we use wavelet time-frequency maps for multi-resolution analysis and combine them with a self-supervised learning model for leak detection, thereby reducing the dependence on labeled data.

Benefits of technology

It improves the detection accuracy and generalization ability of leakage signals, enabling effective learning and recognition even in the absence of labeled data, adapting to diverse real-world scenarios, reducing manual labeling costs, and improving the automation level of leakage detection in water supply networks.

✦ Generated by Eureka AI based on patent content.


Abstract

The application relates to a water supply network leakage detection method and device, equipment and medium, the method comprising: replacing corresponding non-core wavelet time-frequency diagrams in wavelet time-frequency diagrams in training samples with equivalent wavelet time-frequency diagrams, and covering part of the core wavelet time-frequency diagrams in the wavelet time-frequency diagrams as training samples; taking the core wavelet time-frequency diagrams as supervision labels; pre-training a masking completion pre-training model of a water supply network leakage detection model to a convergence state; taking the wavelet time-frequency diagrams in the training samples as training samples, and taking sample labels corresponding to the wavelet time-frequency diagrams as supervision labels; fine-tuning a fine-tuning model to a convergence state to complete training of the water supply network leakage detection model; and inputting vibration sound signals of a water supply pipeline to be detected into the water supply network leakage detection model trained to the convergence state to determine whether the water supply pipeline to be detected is in a leakage state or a non-leakage state. The application can greatly reduce dependence on labeled data and improve generalization ability.

Description

Technical Field

[0001] This application relates to the field of pipeline leakage detection, and in particular to a method, apparatus, electronic equipment and computer-readable storage medium for detecting leakage in water supply networks.

Background Technology

[0002] As a crucial component of urban infrastructure, water supply networks bear the critical responsibility of delivering clean water to users. With the rapid growth in residential water demand and the exacerbation of network aging issues, leak detection has become a prominent challenge in the operation and maintenance of water supply networks. Leaks not only lead to water waste but can also cause water shortages and water pollution, impacting public water use efficiency and safety. Therefore, developing accurate and efficient leak detection technologies is of significant practical importance.

[0003] Currently, researchers have proposed various technical solutions for detecting leaks in water supply networks, mainly including transient wave methods, mass balance methods, ground-penetrating radar methods, acoustic methods, and fiber optic sensing methods. Among them, acoustic methods have become the mainstream technology due to their advantages of high sensitivity, accuracy, and reliability. In recent years, the introduction of machine learning algorithms for feature extraction and pattern recognition of leak sound signals has further improved detection efficiency. For example, patent CN119393688A extracts Mel-frequency cepstral coefficients (MFCCs) from sound signals as input features; patent CN119123338A uses a spectral heatmap as input features. However, existing methods still have significant limitations: firstly, existing sound signal extraction methods are limited by fixed window resolution, making it difficult to capture the non-stationary characteristics of leak signals; secondly, the training of machine learning models relies on a large amount of labeled data, while the frequency of leak events in real-world scenarios is low, resulting in scarce leak data and high labeling costs, which severely restricts the model's generalization ability.

[0004] In summary, training machine learning models in existing technologies relies on a large amount of labeled data, while the frequency of leak events in real-world scenarios is low, resulting in scarce leak data and high labeling costs, which severely restricts the generalization ability of the model. The applicant has made corresponding explorations to address this problem.

Summary of the Invention

[0005] The purpose of this application is to solve the above-mentioned problems by providing a method, device, electronic equipment and computer-readable storage medium for detecting leakage in water supply networks.

[0006] To achieve the various objectives of this application, the following technical solution is adopted:

[0007] A method for detecting leakage in a water supply network, proposed to meet one of the purposes of this application, includes:

[0008] Obtain a sample training set, wherein the sample training set includes multiple training samples, the training samples include wavelet time-frequency maps corresponding to the vibration sound signals of water supply pipes and their corresponding sample labels, the sample labels characterize the corresponding leakage state of the water supply pipes, and the wavelet time-frequency maps include core wavelet time-frequency maps and non-core wavelet time-frequency maps.

[0009] Two wavelet time-frequency images are extracted from the same training sample as positive sample pairs, and wavelet time-frequency images of the same size as the positive sample pairs are extracted from different training samples as negative sample pairs. The positive sample pairs and the negative sample pairs are used to perform contrastive learning self-supervised training on the pre-trained contrastive learning model of the water supply network leakage detection model, so that the pre-trained contrastive learning model is suitable for generating the equivalent wavelet time-frequency images of the negative sample pairs based on the positive sample pairs.

[0010] The equivalent wavelet time-frequency map replaces the corresponding non-core wavelet time-frequency map in the wavelet time-frequency map of the training sample, and part of the core wavelet time-frequency map in the wavelet time-frequency map is masked; the result serves as the training sample. With the core wavelet time-frequency map as the supervision label, the masking and completion pre-training model of the water supply network leakage detection model is pre-trained until it converges.

[0011] Using the wavelet time-frequency map in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency map as supervision labels, the fine-tuning training of the water supply network leakage detection model is carried out until convergence, so as to complete the training of the water supply network leakage detection model.

[0012] The vibration sound signal of the water supply pipeline to be tested is input into the water supply network leakage detection model that has been trained to convergence state, so as to determine whether the water supply pipeline to be tested is in a state of leakage or no leakage, so as to complete the leakage detection of the water supply network.
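The pair-extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: it assumes each wavelet time-frequency map is a list of frequency rows, that a "segment" is a fixed-width crop along the time axis, and that crop offsets are drawn at random. The names `crop` and `sample_pairs` and the `width` parameter are hypothetical.

```python
import random

def crop(tf_map, start, width):
    """Return a time-axis crop: every frequency row keeps columns [start, start + width)."""
    return [row[start:start + width] for row in tf_map]

def sample_pairs(tf_maps, width, rng):
    """Draw one positive pair and one negative pair of equally sized segments.

    Positive pair: two crops taken from the same training sample.
    Negative pair: crops of the same size taken from two different samples.
    Random offsets are an assumption of this sketch.
    """
    anchor_idx = rng.randrange(len(tf_maps))
    other_idx = rng.choice([i for i in range(len(tf_maps)) if i != anchor_idx])

    def random_crop(idx):
        n_time = len(tf_maps[idx][0])
        start = rng.randrange(n_time - width + 1)
        return crop(tf_maps[idx], start, width)

    positive = (random_crop(anchor_idx), random_crop(anchor_idx))
    negative = (random_crop(anchor_idx), random_crop(other_idx))
    return positive, negative

# Toy data: 3 samples, 4 frequency rows, 16 time steps; values encode the sample index.
toy_maps = [[[float(s)] * 16 for _ in range(4)] for s in range(3)]
positive, negative = sample_pairs(toy_maps, width=8, rng=random.Random(0))
```

In a real pipeline the two pairs would be fed to the contrastive pre-training model, whose loss function the application does not name.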

[0013] Optionally, the basic network architecture of the water supply network leakage detection model is a self-supervised learning model, which includes a contrastive learning pre-trained model, an occlusion completion pre-trained model, and a fine-tuning model.

[0014] Optionally, the contrastive learning pre-trained model includes a first encoder and a first decoder, wherein the first encoder includes a convolutional block constructed from a convolutional layer, a batch normalization layer, a ReLU activation function and a max pooling layer, and three residual blocks, wherein each residual block is followed by a self-attention mechanism layer; the first decoder is constructed from a global average pooling layer and a fully connected layer.

[0015] Optionally, the occlusion completion pre-trained model includes a second encoder and a second decoder. The second encoder includes a convolutional block constructed from a convolutional layer, a batch normalization layer, a ReLU activation function, and a max pooling layer, as well as three residual blocks, each of which is followed by a self-attention mechanism layer. The second decoder is constructed from multiple deconvolutional layers.

[0016] Optionally, the weights of the first encoder of the contrastive learning pre-trained model and the weights of the second encoder of the occlusion completion pre-trained model are frozen, and two convolutional blocks, two fully connected layers, and a sigmoid activation function are connected to construct a fine-tuned model of the water supply network leakage detection model.
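The three optional architecture paragraphs above can be summarized as a plain layer inventory. The sketch below only records which layer types appear in which component and which parts are frozen; it is not a trainable network. The dictionary keys, the layer-name strings, and the choice of three deconvolution layers (the text says only "multiple") are assumptions made for illustration.

```python
def encoder_layers():
    """Encoder layout shared by both pre-training models: one convolutional
    block, then three residual blocks, each followed by a self-attention layer."""
    layers = ["conv", "batch_norm", "relu", "max_pool"]
    for _ in range(3):
        layers += ["residual_block", "self_attention"]
    return layers

MODEL_LAYOUT = {
    "contrastive_pretrain": {
        "encoder": encoder_layers(),
        "decoder": ["global_avg_pool", "fully_connected"],
    },
    "completion_pretrain": {
        "encoder": encoder_layers(),
        # The source says "multiple" deconvolutional layers; three is a placeholder.
        "decoder": ["deconv"] * 3,
    },
    "fine_tune": {
        # Both pre-trained encoders are kept fixed during fine-tuning.
        "frozen": ["contrastive_pretrain.encoder", "completion_pretrain.encoder"],
        "head": ["conv_block", "conv_block",
                 "fully_connected", "fully_connected", "sigmoid"],
    },
}
```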

[0017] Optionally, the step of fine-tuning the water supply network leakage detection model to a convergent state by using wavelet time-frequency maps in the training samples as training samples and the corresponding sample labels of the wavelet time-frequency maps as supervision labels includes:

[0018] The contrastive learning pre-trained model and the occlusion completion pre-trained model are trained using a sample dataset with the sample labels removed. Error backpropagation is used to transmit the loss values of the first encoder and the second encoder back to the water supply network leakage detection model. The parameters of the encoder in the structure of the water supply network leakage detection model are adjusted until the water supply network leakage detection model converges.

[0019] Using the wavelet time-frequency graphs in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency graphs as supervision labels, the fine-tuning model is fine-tuned and trained until convergence.
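The supervised fine-tuning stage, in which the pre-trained encoders stay frozen and only the classification head is updated, can be illustrated with a toy example. Everything here is a stand-in: the "encoder" is a fixed linear projection, the head is a single sigmoid unit, and training uses plain gradient descent on binary cross-entropy. The source specifies a sigmoid output but names neither the loss nor the optimizer, so those choices, along with all function names, are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def frozen_encoder(x, weights):
    """Stand-in for a pre-trained encoder: a fixed linear projection, never updated."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p and label y (assumed loss)."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def fine_tune(samples, labels, enc_weights, epochs=200, lr=0.5):
    """Train only the head (w, b); enc_weights are read but never written."""
    feats = [frozen_encoder(x, enc_weights) for x in samples]
    n = len(feats)
    w, b = [0.0] * len(feats[0]), 0.0
    history = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * len(w), 0.0, 0.0
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            loss += bce(p, y)
            for i, fi in enumerate(f):
                grad_w[i] += (p - y) * fi
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        history.append(loss / n)
    return w, b, history

def predict(x, enc_weights, w, b, threshold=0.5):
    """Map the sigmoid output to one of the two states used as sample labels."""
    f = frozen_encoder(x, enc_weights)
    p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
    return "leakage" if p >= threshold else "no leakage"

# Toy labeled set: label 1 means leakage, label 0 means no leakage.
ENC = [[1.0, 0.0], [0.0, 1.0]]
samples = [[1.0, 0.2], [0.8, -0.1], [-1.0, 0.1], [-0.7, -0.3]]
labels = [1, 1, 0, 0]
head_w, head_b, history = fine_tune(samples, labels, ENC)
```

The point of the sketch is the separation of roles: features come from fixed weights, and only the small head consumes the scarce labels.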

[0020] Optionally, the steps for obtaining the sample training set include:

[0021] The vibration sound signals of each water supply pipe in the water supply network are collected, and the leakage status of the water supply pipe is marked to determine the sample label, wherein the sample label represents the state of leakage or the state of no leakage.

[0022] The vibration sound signal of the water supply pipeline is normalized to determine the normalized vibration sound signal of the water supply pipeline. The Morlet wavelet basis function is used to perform continuous wavelet transform on the normalized vibration sound signal of the water supply pipeline to determine the wavelet time-frequency diagram corresponding to the vibration sound signal of the water supply pipeline.

[0023] The sample training set is constructed based on the wavelet time-frequency diagram corresponding to the vibration sound signal of the water supply pipeline and its corresponding sample label.

[0024] A water supply network leakage detection device provided for another purpose of this application includes:

[0025] The training set acquisition module is configured to acquire a sample training set, wherein the sample training set includes multiple training samples, the training samples include wavelet time-frequency maps corresponding to the vibration sound signals of the water supply pipe and their corresponding sample labels, the sample labels represent the corresponding leakage state of the water supply pipe, and the wavelet time-frequency maps include core wavelet time-frequency maps and non-core wavelet time-frequency maps.

[0026] The contrastive learning training module is configured to extract two segments of wavelet time-frequency maps from the same training sample as positive sample pairs, and extract wavelet time-frequency maps of the same size as the positive sample pairs from different training samples as negative sample pairs. The positive sample pairs and the negative sample pairs are used to perform contrastive learning self-supervised training on the pre-trained contrastive learning model of the water supply network leakage detection model, so that the pre-trained contrastive learning model that has converged is suitable for generating the equivalent wavelet time-frequency maps of the negative sample pairs based on the positive sample pairs.

[0027] The masking and completion training module is configured to replace the corresponding non-core wavelet time-frequency map in the wavelet time-frequency map of the training sample with the equivalent wavelet time-frequency map and mask part of the core wavelet time-frequency map in the wavelet time-frequency map as the training sample, and use the core wavelet time-frequency map as the supervision label to pre-train the masking and completion pre-training model of the water supply network leakage detection model to the convergence state.

[0028] The fine-tuning model training module is configured to use the wavelet time-frequency map in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency map as supervision labels to fine-tune the model of the water supply network leakage detection model until convergence, so as to complete the training of the water supply network leakage detection model.

[0029] The pipeline leakage detection module is configured to input the vibration sound signal of the water supply pipeline to be detected into the water supply network leakage detection model that has been trained to a convergent state, so as to determine whether the water supply pipeline to be detected is in a state of leakage or no leakage, thereby completing the leakage detection of the water supply network.
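The masking step performed by the masking and completion training module can be sketched as follows. The example hides a fraction of square patches in a core wavelet time-frequency map and scores a reconstruction only on the hidden cells. The patch size, the mask ratio, the zero fill value, and the restriction of the loss to hidden cells are all assumptions of this sketch; the replacement of non-core maps by equivalent maps is omitted.

```python
import random

def mask_patches(core_map, patch, ratio, rng, fill=0.0):
    """Hide a fraction of non-overlapping patch x patch blocks of the core map.

    Returns (masked copy, boolean mask that is True where values were hidden).
    """
    n_freq, n_time = len(core_map), len(core_map[0])
    blocks = [(f, t) for f in range(0, n_freq, patch)
              for t in range(0, n_time, patch)]
    hidden = rng.sample(blocks, max(1, int(ratio * len(blocks))))
    masked = [row[:] for row in core_map]
    mask = [[False] * n_time for _ in range(n_freq)]
    for f0, t0 in hidden:
        for f in range(f0, min(f0 + patch, n_freq)):
            for t in range(t0, min(t0 + patch, n_time)):
                masked[f][t] = fill
                mask[f][t] = True
    return masked, mask

def masked_mse(reconstruction, target, mask):
    """Mean squared error computed only over the hidden cells."""
    errors = [(reconstruction[f][t] - target[f][t]) ** 2
              for f, row in enumerate(mask)
              for t, is_hidden in enumerate(row) if is_hidden]
    return sum(errors) / len(errors)

# Toy 8 x 8 core map; the unmasked original acts as the supervision label.
core = [[1.0 + f + t for t in range(8)] for f in range(8)]
masked_input, mask = mask_patches(core, patch=4, ratio=0.5, rng=random.Random(0))
```

A completion model would take `masked_input` and be trained to drive `masked_mse(output, core, mask)` toward zero.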

[0030] An electronic device provided for another purpose of this application includes a central processing unit and a memory, the central processing unit being configured to invoke and run a computer program stored in the memory to perform the steps of the water supply network leakage detection method of this application.

[0031] A computer-readable storage medium is provided for another purpose of this application, which stores, in the form of computer-readable instructions, a computer program implemented according to the water supply network leakage detection method, which, when called by a computer, executes the steps included in the corresponding method.

[0032] Compared to existing technologies, this application addresses the problems that existing machine learning models rely on a large amount of labeled data for training, while in real-world scenarios, the frequency of leak events is low, resulting in scarce leak data and high labeling costs, which severely restricts the model's generalization ability. This application provides, but is not limited to, the following beneficial effects:

[0033] Firstly, this application enhances signal capture capabilities through multi-resolution time-frequency analysis. Traditional leakage signal detection methods (such as acoustic methods) often face limitations due to fixed window resolution, failing to effectively capture high-frequency transient features and low-frequency sustained modes in leakage signals. However, this application utilizes continuous wavelet transform (CWT) to perform multi-resolution time-frequency analysis on the vibration and acoustic signals of the water supply network, enabling more accurate identification of leakage signals. This method not only captures high-frequency transient changes during leakage but also tracks sustained leakage features in the low-frequency band, thereby comprehensively improving the detection accuracy of leakage signals.

[0034] Secondly, this application significantly reduces the reliance on labeled data and improves generalization ability. Existing leak detection methods often rely on large amounts of labeled data for training. However, leak events occur infrequently in real-world environments, leading to data scarcity and high labeling costs. To address this issue, this application proposes using unlabeled data for self-supervised learning. Specifically, a self-supervised learning framework is constructed through two pre-training tasks: contrastive learning and occlusion completion. This allows the model to learn the time-frequency correlation characteristics of leak signals using unlabeled data, significantly reducing the reliance on manually labeled data. This approach not only reduces manual costs but also improves the adaptability and generalization ability of the leak detection system in different scenarios.

[0035] Third, this application innovatively combines the dual pre-training tasks of masking completion and contrastive learning, enabling the model to learn the features of leakage signals during the pre-training stage. Through contrastive learning, the model can learn the similarities and differences between positive and negative samples, which helps improve the model's ability to distinguish leakage signals; through the masking completion task, the model can learn to fill in the masked parts of the signal, thereby improving its robustness and accuracy. This self-supervised learning framework enhances the model's adaptability, especially in the absence of a large amount of labeled data, enabling effective learning and recognition.

[0036] Fourth, due to the low frequency of leakage events in real-world scenarios and the difficulty in obtaining leakage data, the annotation workload is large and costly. However, by introducing self-supervised learning without labeled data, this application significantly reduces the need for manually labeled data, making leakage detection more efficient. Even when labeled data is scarce, the model can still be accurately trained and identified, improving the efficiency of water supply network leakage detection.

[0037] Fifth, this application significantly improves the model's generalization ability and adapts to diverse real-world scenarios. By combining different types of training data (including signal data under different pipe materials, pipe ages, and water pressure scenarios) and through the design of a self-supervised learning framework, this application enhances the model's generalization ability in diverse real-world environments. Whether it's a newly built pipe network, an aging pipe network, or a water supply network under different water pressure environments, the trained model can effectively identify leakage signals, ensuring that leakage detection technology can adapt to more complex real-world scenarios.

[0038] Sixth, this application ultimately establishes an intelligent sound signal recognition model for water supply network leakage, which can effectively improve the automation level of water supply network leakage detection. This intelligent detection system can not only monitor the leakage situation of the water supply network in real time, but also predict potential pipeline failures (such as water pipe bursts and other safety accidents) in advance, thereby greatly reducing the probability of safety accidents. This will effectively reduce maintenance costs, improve water supply safety, and avoid greater disasters and economic losses caused by undetected leakage.

[0039] In summary, this application can effectively improve the accuracy and efficiency of water supply network leakage detection, reduce reliance on manually labeled data, enhance the model's generalization ability in different network scenarios, and promote the development of water supply network leakage detection towards intelligence and automation.

Attached Figure Description

[0040] The above and / or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:

[0041] Figure 1 is a flowchart illustrating the water supply network leakage detection method in the embodiments of this application;

[0042] Figure 2 is a schematic diagram illustrating the extraction of acoustic signal features from water supply network leakage using continuous wavelet transform in an embodiment of this application;

[0043] Figure 3 is a schematic diagram of the contrastive learning pre-training task in an embodiment of this application;

[0044] Figure 4 is a schematic diagram of the occlusion completion pre-training task in an embodiment of this application;

[0045] Figure 5 is an exemplary network structure for a self-supervised learning model in the embodiments of this application;

[0046] Figure 6 is an example network structure of the encoder structure of the pre-trained model and the fine-tuned model in the embodiments of this application;

[0047] Figure 7 is a schematic diagram of the water supply network leakage detection device in the embodiments of this application;

[0048] Figure 8 is a schematic diagram of the structure of the computer device in the embodiments of this application.

Detailed Implementation

[0049] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain this application, and should not be construed as limiting this application.

[0050] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms. It should be further understood that the term “comprising” as used in this application means the presence of the stated features, integers, steps, operations, elements, and / or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof. It should be understood that when an element is said to be “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or there may be intermediate elements. Furthermore, “connected” or “coupled” as used herein can include wireless connections or wireless coupling. The term “and / or” as used herein includes all or any units and all combinations of one or more associated listed items.

[0051] Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains. It should also be understood that terms such as those defined in general dictionaries should be understood to have the same meaning as in the context of the prior art, and should not be interpreted in an idealized or overly formal sense unless specifically defined as herein.

[0052] Those skilled in the art will understand that the terms "client," "terminal," and "terminal device" as used herein include both wireless signal receiver devices, which have only a wireless signal receiver without transmission capability, and devices with receiving and transmitting hardware, which are capable of bidirectional communication over a bidirectional communication link. Such devices may include: cellular or other communication devices, such as personal computers or tablets, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices that can combine voice, data processing, fax, and / or data communication capabilities; PDAs (Personal Digital Assistants) that may include radio frequency receivers, pagers, internet / intranet access, web browsers, notebooks, calendars, and / or GPS (Global Positioning System) receivers; and conventional laptops and / or handheld computers or other devices that have and / or include radio frequency receivers. As used herein, "client," "terminal," and "terminal device" can be portable, transportable, installed in a means of transportation (air, sea, and / or land), or suitable and / or configured to operate locally and / or in a distributed manner at any other location on Earth and / or in space. "Client," "terminal," and "terminal device" as used herein can also be a communication terminal, an internet access terminal, or a music / video playback terminal, such as a PDA, a MID (Mobile Internet Device), and / or a mobile phone with music / video playback capabilities, or a smart TV, set-top box, etc.

[0053] The hardware referred to by the names "server," "client," and "service node" in this application is essentially an electronic device with the equivalent capabilities of a personal computer. It is a hardware device with the necessary components revealed by the von Neumann architecture, such as a central processing unit (including an arithmetic logic unit and a control unit), memory, input devices, and output devices. The computer program is stored in its memory, and the central processing unit loads the program stored in the secondary storage into the main memory to run it, execute the instructions in the program, and interact with the input and output devices to complete specific functions.

[0054] It should be noted that the concept of "server" used in this application can also be extended to the case of server clusters. Based on the network deployment principles understood by those skilled in the art, the servers should be logically divided. Physically, these servers can be independent of each other but accessible through interfaces, or they can be integrated into a single physical computer or a computer cluster. Those skilled in the art should understand this flexibility and should not use it to constrain the implementation of the network deployment method in this application.

[0055] One or more of the technical features of this application, unless explicitly specified herein, can be deployed on a server and accessed by a client remotely calling the online service interface provided by the server, or can be directly deployed and run on a client for access.

[0056] Unless otherwise specified, the neural network models referenced or potentially referenced in this application may be deployed on a remote server and invoked remotely on the client, or deployed on a client with the capability to invoke directly. In some embodiments, when running on the client, the corresponding intelligence may be acquired through transfer learning in order to reduce the requirements on the client's hardware resources and avoid excessive consumption of the client's hardware resources.

[0057] Unless otherwise specified, all data involved in this application may be stored remotely on a server or on a local terminal device, as long as it is suitable for use by the technical solution of this application.

[0058] Those skilled in the art will understand that although the various methods in this application are described based on the same concept and thus present commonality among them, they can be performed independently unless otherwise specified. Similarly, the various embodiments disclosed in this application are all based on the same inventive concept; therefore, concepts expressed in the same way, as well as concepts that are appropriately changed for convenience but are expressed differently, should be understood equivalently.

[0059] Unless otherwise expressly stated, the various embodiments disclosed in this application can be combined in a cross-cutting manner to flexibly construct new embodiments, as long as such combination does not depart from the inventive spirit of this application and can meet the needs of the prior art or solve a certain deficiency in the prior art. Those skilled in the art should be aware of such modifications.

[0060] As a crucial component of urban infrastructure, water supply networks bear the critical responsibility of delivering clean water to users. With the rapid growth in residential water demand and the exacerbation of aging network problems, leakage has become a prominent challenge in the operation and maintenance of water supply networks. Leakage not only leads to water waste but can also cause water shortages and water pollution, impacting public water use efficiency and safety. Therefore, developing accurate and efficient leakage detection technologies is of significant practical importance.

[0061] Based on the above exemplary scenario, please refer to Figure 1. In one embodiment of the water supply network leakage detection method of this application, the method includes:

[0062] Step S10: Obtain a sample training set, wherein the sample training set includes multiple training samples, the training samples include wavelet time-frequency maps corresponding to the vibration sound signals of the water supply pipe and their corresponding sample labels, the sample labels represent the corresponding leakage state of the water supply pipe, and the wavelet time-frequency maps include core wavelet time-frequency maps and non-core wavelet time-frequency maps.

[0063] The water supply network leakage detection system in the terminal equipment can acquire a sample training set. This training set includes multiple training samples, each containing a wavelet time-frequency plot corresponding to the water supply pipeline vibration sound signal and its corresponding sample label. The sample label represents the corresponding leakage state of the water supply pipeline. The wavelet time-frequency plot includes a core wavelet time-frequency plot and non-core wavelet time-frequency plots. Specifically, the water supply pipeline vibration sound signal is caused by factors such as water flow, pressure changes, pipeline damage, or leakage. These vibration signals can be collected by sensors (such as accelerometers or microphones), and the sound signals captured by the sensors can reflect the health status of the pipeline. By analyzing these vibration sound signals, the system can identify whether there are abnormalities (such as leakage) in the pipeline. The wavelet time-frequency plot is a time-frequency representation obtained after processing the water supply pipeline vibration sound signal using wavelet transform. The time-frequency plot can simultaneously display the characteristics of the signal in both the time and frequency domains, capturing instantaneous frequency changes, thus helping to identify leaks or other abnormal events. In water supply network leakage detection, wavelet time-frequency plots help analyze the frequency components of vibration and acoustic signals, revealing characteristic changes that may be caused by leakage. The core wavelet time-frequency plot refers to the main or significant frequency components extracted from the signal; it typically represents the most important features of the signal and may include frequency information directly related to leakage. The core portion of the time-frequency plot represents the most critical component of the entire signal and is usually the focus of leakage analysis and detection. The non-core wavelet time-frequency plot contains other frequency components in the signal, which may not be directly related to leakage or are not as crucial for leakage detection. Sample labels refer to the annotation information for each training sample, typically used in supervised learning tasks. In this system, sample labels characterize the leakage status of the water supply pipeline. For example, the label can be "leakage" or "no leakage," meaning each training sample tells the model whether a given vibration and acoustic signal is related to leakage. In this way, the model learns how to predict the health status of the pipeline based on the input signal. Leakage status refers to the current health status of the water supply pipeline, specifically whether a water leak has occurred. By analyzing the wavelet time-frequency graph, the system can determine whether leakage has occurred in the pipeline.

[0064] In some embodiments, the step of obtaining a sample training set includes:

[0065] Step S101: Collect the vibration sound signal of each water supply pipe in the water supply network and mark the leakage status of the water supply pipe to determine the sample label, wherein the sample label represents the state of leakage or the state of no leakage.

[0066] Step S102: Normalize the vibration sound signal of the water supply pipeline to determine the normalized vibration sound signal of the water supply pipeline. Use Morlet wavelet basis function to perform continuous wavelet transform on the normalized vibration sound signal of the water supply pipeline to determine the wavelet time-frequency diagram corresponding to the vibration sound signal of the water supply pipeline.

[0067] Step S103: Construct the sample training set based on the wavelet time-frequency diagram corresponding to the vibration sound signal of the water supply pipeline and its corresponding sample label.

[0068] Specifically, the system can collect vibration sound signals from the water supply pipes in the water supply network and mark the leakage status of each pipe, that is, whether or not it is leaking. More specifically, a suitable number of leakage sound signal identification devices can be deployed on the water supply pipes of the target area. During relatively quiet periods (after 22:00 daily), leakage sound signals are collected from water supply pipes such as ductile iron pipes and PE pipes. During acquisition, a pressure regulating valve on the main pipe is used to adjust the pressure within a range of 0.15 MPa to 0.32 MPa, in small steps (e.g., 0.05 MPa), to simulate different levels of leakage. Each sound signal lasts 5.46 seconds and is sampled at 8000 Hz. The sound signal data is saved in waveform audio file format (WAV) and marked as either leaking or not leaking. In this embodiment, the Z-score adaptive thresholding algorithm is used to remove sound signals that are significantly affected by noise, and 2364 sound signals are finally retained, of which 1836 are marked as leaking and 528 are marked as not leaking.

[0069] Furthermore, the collected original water supply pipeline vibration sound signal is normalized to determine the normalized water supply pipeline vibration sound signal. The normalized water supply pipeline vibration sound signal is then subjected to continuous wavelet transform processing using the Morlet wavelet basis function to obtain the wavelet time-frequency map corresponding to the water supply pipeline vibration sound signal, so as to establish a wavelet time-frequency map dataset.

[0070] Furthermore, the acquired raw sound signal is processed using a maximum-minimum normalization method. The maximum-minimum normalization formula is expressed as:

[0071] $$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

[0072] where $x$ is the original vibration sound signal of the water supply pipeline; $x_{\min}$ is the minimum value of the vibration sound signal of the water supply pipeline; $x_{\max}$ is the maximum value of the vibration sound signal of the water supply pipeline; and $x'$ is the normalized vibration sound signal of the water supply pipeline.
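The maximum-minimum normalization described above can be sketched in a few lines. This is a minimal illustration rather than the applicant's implementation; the function name `min_max_normalize` and the handling of a constant signal are assumptions.

```python
def min_max_normalize(signal):
    """Scale a 1-D signal to the range [0, 1] using min-max normalization."""
    lo, hi = min(signal), max(signal)
    span = hi - lo
    if span == 0:
        # Assumption: a constant signal maps to all zeros to avoid dividing by zero.
        return [0.0 for _ in signal]
    return [(x - lo) / span for x in signal]


normalized = min_max_normalize([-2.0, 0.0, 1.0, 2.0])
print(normalized)  # [0.0, 0.5, 0.75, 1.0]
```

The smallest sample maps to 0 and the largest to 1, so signals recorded at different loudness levels share one amplitude range before the wavelet transform.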

[0073] Furthermore, referring to Figure 2, the normalized vibration sound signal of the water supply pipeline is processed by continuous wavelet transform using the Morlet wavelet basis function to obtain the wavelet time-frequency diagram corresponding to the vibration sound signal of the water supply pipeline. The formula for the continuous wavelet transform is as follows:

[0074] $$W(a,b) = \langle x, \psi_{a,b} \rangle = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt,$$

[0075] where $x(t)$ is the input time-domain vibration sound signal of the water supply pipeline; $\psi(t)$ is the wavelet function and $\psi^{*}$ its complex conjugate; $a$ is the scale parameter, which controls the scaling of the wavelet function; $b$ is the translation parameter, which controls the position of the wavelet on the time axis; $\psi_{a,b}(t)$ is the wavelet function generated by scaling and translation; $\langle x, \psi_{a,b} \rangle$ is the inner product of the time-domain signal and the wavelet function; and $W(a,b)$ is the wavelet transform coefficient of the time-domain signal at scale $a$ and translation position $b$.

[0076] The expression for the Morlet wavelet basis function is:

[0077] $$\psi(t) = \pi^{-1/4}\, e^{i\omega_0 t}\, e^{-t^{2}/2},$$

[0078] where $\psi(t)$ is the Morlet wavelet basis function; $\pi^{-1/4}$ is a normalization factor that ensures wavelet energy normalization; $e^{i\omega_0 t}$ is a complex exponential term; and $e^{-t^{2}/2}$ is a Gaussian attenuation term that limits the duration of the wavelet in the time domain.

[0079] The choice of the scale parameter $a$ determines the range of signal frequencies to which the wavelet function is sensitive. A smaller scale results in a narrower wavelet function, which extracts high-frequency components from the signal more effectively; a larger scale results in a wider wavelet function, which extracts low-frequency components more effectively. In this embodiment, the scale parameter is set from 1 to 32, which yields a feature matrix of shape [32, 8000].
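The continuous wavelet transform with a Morlet basis and scales 1 to 32 can be illustrated with the standard library alone. This is a hedged sketch: the center frequency `OMEGA0 = 6.0`, the truncation `half_width`, the use of the coefficient magnitude, and the short synthetic test signal are all assumptions that the source does not specify. A production system would more likely use a dedicated wavelet library.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed Morlet center frequency; the source does not state a value


def morlet(t):
    """Morlet mother wavelet: pi^(-1/4) * exp(i*w0*t) * exp(-t^2 / 2)."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-t * t / 2.0)


def cwt_morlet(signal, scales, half_width=4.0):
    """Return |W(a, b)| as a list of rows: one row per scale a, one column per shift b."""
    n = len(signal)
    rows = []
    for a in scales:
        # Truncate the wavelet where its Gaussian envelope is negligible.
        reach = int(math.ceil(half_width * a))
        kernel = [morlet(k / a).conjugate() / math.sqrt(a)
                  for k in range(-reach, reach + 1)]
        row = []
        for b in range(n):
            acc = 0j
            for idx, w in enumerate(kernel):
                t = b + idx - reach  # sample index t, so that (t - b) / a = k / a
                if 0 <= t < n:
                    acc += signal[t] * w
            row.append(abs(acc))
        rows.append(row)
    return rows


# Synthetic 64-sample sinusoid standing in for a normalized pipe signal.
test_signal = [math.sin(2 * math.pi * t / 16) for t in range(64)]
tf_map = cwt_morlet(test_signal, scales=range(1, 33))
print(len(tf_map), len(tf_map[0]))  # 32 64
```

Each row of `tf_map` corresponds to one scale, so 32 scales give a matrix with 32 rows and one column per time sample, matching the [32, number of samples] layout of the feature matrix.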

[0080] Furthermore, bilinear interpolation is used to scale the feature matrix extracted by the continuous wavelet transform. Bilinear interpolation can reduce the amount of data while maintaining data quality, thus accelerating the training time of the self-supervised learning model. The final feature matrix has a shape of [32, 1000], and the bilinear interpolation formula is expressed as follows:

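[0081] $$f(x,y) \approx \frac{(x_2-x)(y_2-y)\,f(x_1,y_1) + (x-x_1)(y_2-y)\,f(x_2,y_1) + (x_2-x)(y-y_1)\,f(x_1,y_2) + (x-x_1)(y-y_1)\,f(x_2,y_2)}{(x_2-x_1)(y_2-y_1)},$$

where $(x, y)$ is the position to be interpolated and $f(x_1,y_1)$, $f(x_2,y_1)$, $f(x_1,y_2)$, and $f(x_2,y_2)$ are the values of the feature matrix at the four neighboring grid points.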
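The rescaling step above can be sketched as a plain bilinear resize of a 2-D matrix. This is an illustrative sketch only: the corner-aligned coordinate mapping is an assumption, since the source does not say how output positions map to input positions, and `bilinear_resize` is a hypothetical name.

```python
def bilinear_resize(matrix, out_h, out_w):
    """Resize a 2-D list of numbers to out_h x out_w with bilinear interpolation."""
    in_h, in_w = len(matrix), len(matrix[0])

    def src_coord(i, out_n, in_n):
        # Assumption: corner-aligned mapping from output index to input coordinate.
        return 0.0 if out_n == 1 else i * (in_n - 1) / (out_n - 1)

    out = []
    for i in range(out_h):
        y = src_coord(i, out_h, in_h)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = src_coord(j, out_w, in_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along the time axis first, then along the scale axis.
            top = matrix[y0][x0] * (1 - dx) + matrix[y0][x1] * dx
            bottom = matrix[y1][x0] * (1 - dx) + matrix[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


resized = bilinear_resize([[0.0, 10.0, 20.0]], out_h=1, out_w=5)
print(resized)  # [[0.0, 5.0, 10.0, 15.0, 20.0]]
```

In the embodiment the same operation would be called with `out_h=32` and `out_w=1000` on the [32, 8000] feature matrix, shrinking only the time axis.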

[0082] Step S20: Extract two wavelet time-frequency images from the same training sample as positive sample pairs, and extract wavelet time-frequency images of the same size as the positive sample pairs from different training samples as negative sample pairs. Use the positive sample pairs and the negative sample pairs to perform contrastive learning self-supervised training on the pre-trained contrastive learning model of the water supply network leakage detection model, so that the pre-trained contrastive learning model is suitable for generating the equivalent wavelet time-frequency images of the negative sample pairs based on the positive sample pairs.

[0083] Step S30: Replace the corresponding non-core wavelet time-frequency map in the wavelet time-frequency map of the training sample with the equivalent wavelet time-frequency map and cover part of the core wavelet time-frequency map in the wavelet time-frequency map as the training sample. Use the core wavelet time-frequency map as the supervision label to pre-train the masking and completion pre-training model of the water supply network leakage detection model until the convergence state.

[0084] Step S40: Using the wavelet time-frequency map in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency map as supervision labels, fine-tune the model of the water supply network leakage detection model until it converges, so as to complete the training of the water supply network leakage detection model.

[0085] After the sample training set is obtained, two wavelet time-frequency segments are extracted from the same training sample as a positive sample pair, and wavelet time-frequency segments of the same size as the positive sample pair are extracted from different training samples as negative sample pairs. Using the positive and negative sample pairs, contrastive learning self-supervised training is performed on the contrastive learning pre-trained model of the water supply network leakage detection model, so that the contrastive learning pre-trained model, after convergence, is suitable for generating the equivalent wavelet time-frequency maps of the negative sample pairs based on the positive sample pairs. Referring to Figure 3, specifically, two segments of the wavelet time-frequency map can be extracted from each training sample as a positive sample pair, and segments of the same size as the positive sample pair can be randomly extracted from five other training samples as negative sample pairs.
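The pair construction described above can be sketched as follows. This is a hypothetical illustration: the helper names, the segment width, and the choice to let the two positive segments overlap are assumptions, while the count of five negative samples follows the text.

```python
import random


def crop(tf_map, start, width):
    """Return columns [start, start + width) of a time-frequency map (list of rows)."""
    return [row[start:start + width] for row in tf_map]


def sample_pairs(tf_maps, anchor_idx, width, n_negatives=5, rng=random):
    """Build one positive pair from one sample and same-size negatives from others."""
    n_cols = len(tf_maps[anchor_idx][0])
    # Positive pair: two segments taken from the same training sample.
    s1 = rng.randrange(n_cols - width + 1)
    s2 = rng.randrange(n_cols - width + 1)
    positives = (crop(tf_maps[anchor_idx], s1, width),
                 crop(tf_maps[anchor_idx], s2, width))
    # Negatives: one same-size segment from each of n_negatives other samples.
    others = [i for i in range(len(tf_maps)) if i != anchor_idx]
    negatives = []
    for i in rng.sample(others, n_negatives):
        start = rng.randrange(len(tf_maps[i][0]) - width + 1)
        negatives.append(crop(tf_maps[i], start, width))
    return positives, negatives


# Eight toy maps of shape 4 x 20; sample s holds values s*100 .. s*100 + 19.
maps = [[[float(s * 100 + c) for c in range(20)] for _ in range(4)] for s in range(8)]
(pos_a, pos_b), negs = sample_pairs(maps, anchor_idx=0, width=6, rng=random.Random(0))
print(len(pos_a), len(pos_a[0]), len(negs))  # 4 6 5
```

Passing a seeded `random.Random` instance makes the sampling reproducible, which helps when comparing pre-training runs.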

[0086] The equivalent wavelet time-frequency map then replaces the corresponding non-core wavelet time-frequency map in the training sample, and part of the core wavelet time-frequency map in the training sample is masked. With the core wavelet time-frequency map as the supervision label, the occlusion completion pre-trained model of the water supply network leakage detection model is pre-trained until convergence. Referring to Figure 4, specifically, a traversal method can be used to find the 15 columns with the largest amplitude in the wavelet time-frequency map. Taking each of the selected 15 columns as a center, 3 to 7 columns around the center column are randomly masked, and the partially masked core wavelet time-frequency map is used as the training sample. Finally, the wavelet time-frequency maps in the training samples are used as training samples, and the sample labels corresponding to the wavelet time-frequency maps are used as supervision labels, to fine-tune the fine-tuning model of the water supply network leakage detection model until convergence, thereby completing the training of the water supply network leakage detection model.
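The column masking described above might look like the sketch below. Several details are assumptions because the source leaves them open: a column's amplitude is taken as its largest absolute value, masked entries are set to zero, and the random width is centered on the selected column. The function name `mask_core_columns` is hypothetical.

```python
import random


def mask_core_columns(tf_map, n_core=15, min_w=3, max_w=7, rng=random):
    """Mask 3 to 7 columns around each of the n_core highest-amplitude columns."""
    n_cols = len(tf_map[0])
    # Assumption: a column's amplitude is the largest absolute value it contains.
    amplitude = [max(abs(row[c]) for row in tf_map) for c in range(n_cols)]
    centers = sorted(range(n_cols), key=lambda c: amplitude[c], reverse=True)[:n_core]
    masked_cols = set()
    for c in centers:
        width = rng.randint(min_w, max_w)  # inclusive on both ends
        start = c - width // 2
        masked_cols.update(k for k in range(start, start + width) if 0 <= k < n_cols)
    # Assumption: masked entries are replaced by zero.
    masked = [[0.0 if c in masked_cols else v for c, v in enumerate(row)]
              for row in tf_map]
    return masked, sorted(masked_cols)


# Toy 4 x 100 map with a distinct amplitude in every column.
tf_map = [[float((r + 1) * ((c * 37) % 101)) for c in range(100)] for r in range(4)]
masked, cols = mask_core_columns(tf_map, rng=random.Random(0))
print(len(cols) >= 3)  # True
```

The unmasked original map serves as the reconstruction target, so the model is asked to restore exactly the high-amplitude regions that were hidden.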

[0087] In some embodiments, referring to Figure 5, the basic network architecture of the water supply network leakage detection model is a self-supervised learning model, which includes a contrastive learning pre-trained model, an occlusion completion pre-trained model, and a fine-tuning model.

[0088] In a further embodiment, the contrastive learning pre-trained model includes a first encoder and a first decoder, wherein the first encoder includes a convolutional block constructed from a convolutional layer, a batch normalization layer, a ReLU activation function and a max pooling layer, and three residual blocks, wherein each residual block is followed by a self-attention mechanism layer; the first decoder is constructed from a global average pooling layer and a fully connected layer.

[0089] Specifically, the first encoder first extracts features using convolutional blocks constructed from convolutional layers, batch normalization layers, ReLU activation functions, and max pooling layers. Then, the first encoder increases the number of feature channels sequentially from 16 to 32, 64, and 128 using three residual blocks, with each residual block followed by a self-attention mechanism layer.

[0090] The output of a convolutional layer can be represented by the following formula:

[0091] $$Y = W * X + b,$$

[0092] where $Y$ is the output of the convolutional layer; $X$ is the input wavelet time-frequency map; $W$ is the convolution kernel; and $b$ is a bias term.

[0093] The batch normalization layer normalizes the output of the convolutional layer, using the following formula:

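[0094] $$\hat{Y} = \gamma \cdot \frac{Y - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}} + \beta,$$

[0095] where $Y$ is the output of the convolutional layer; $\hat{Y}$ is the output of the batch normalization layer; $\mu_B$ is the mean of the batch data; $\sigma_B^{2}$ is the variance of the batch data; $\gamma$ is a learnable scaling parameter; $\beta$ is a learnable offset parameter; and $\epsilon$ is a constant that prevents division by zero.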

[0096] The formula for the ReLU activation function is:

[0097] $$A = \mathrm{ReLU}(\hat{Y}) = \max(0, \hat{Y}),$$

[0098] where $A$ is the output of the ReLU activation function, and $\mathrm{ReLU}(\cdot)$ is the ReLU activation function, which sets negative values to 0 and retains positive values.

[0099] The max pooling layer downsamples the feature map using the following formula:

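[0100] $$P(i,j) = \max_{(m,n) \in R} A(i+m,\, j+n),$$

[0101] where $P(i,j)$ is the output value of the pooling operation at position $(i,j)$; $R$ is the pooling window, of size 3×3; $(i,j)$ is the position in the output feature map; $A(i+m, j+n)$ is the activation value of the ReLU-activated input feature map at position $(i+m, j+n)$; and $(m,n)$ is the position within the pooling window.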

[0102] The residual block consists of convolutional layers, batch normalization layers, and ReLU activation functions. For an input $X$, passing it through these functions yields $F(X)$, and the final output of the residual block is $F(X) + X$.

[0103] The formula for the self-attention mechanism is as follows:

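[0104] $$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V, \qquad Q = XW_Q,\; K = XW_K,\; V = XW_V,$$

[0105] where $W_Q$ is the query weight matrix; $W_K$ is the key weight matrix; $W_V$ is the value weight matrix; $X$ is the input matrix; and $\sqrt{d_k}$ is a scaling factor.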

[0106] For the contrastive learning pre-trained model, a first decoder is constructed. The structure of the first decoder consists of a global average pooling layer and a fully connected layer. In this embodiment, contrastive loss is used as the loss function of the decoder, and the formula for the contrastive loss is as follows:

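[0107] $$L = -\log \frac{\exp\!\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\exp\!\left(\mathrm{sim}(z_i, z_j)/\tau\right) + \sum_{k=1}^{N} \exp\!\left(\mathrm{sim}(z_i, z_k)/\tau\right)},$$

[0108] where $z_i$ and $z_j$ are the feature representations of a positive sample pair; $z_k$ is the feature representation of the $k$-th negative sample; $\mathrm{sim}(\cdot,\cdot)$ is the similarity function, here cosine similarity; $\exp$ is the natural exponential function, which converts the similarity into exponential form; $\tau$ is a temperature parameter used to adjust the scale of the similarity; and $N$ is the number of negative samples.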
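A contrastive loss built from cosine similarity, a temperature, and a set of negative samples can be sketched as below. The exact form is an assumption: this sketch uses an InfoNCE-style expression in which the positive pair's term appears in both numerator and denominator, and the temperature value `tau=0.5` is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(z_i, z_j, negatives, tau=0.5):
    """InfoNCE-style loss for one positive pair (z_i, z_j) and N negative features."""
    pos = math.exp(cosine_similarity(z_i, z_j) / tau)
    neg = sum(math.exp(cosine_similarity(z_i, z_k) / tau) for z_k in negatives)
    return -math.log(pos / (pos + neg))


# Aligned positives with opposite negatives should score better than the reverse.
loss_good = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]] * 5)
loss_bad = contrastive_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]] * 5)
print(loss_good < loss_bad)  # True
```

A smaller temperature sharpens the contrast between similar and dissimilar pairs, while a larger one flattens it.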

[0109] In a further embodiment, the occlusion completion pre-trained model includes a second encoder and a second decoder. The second encoder includes a convolutional block constructed from a convolutional layer, a batch normalization layer, a ReLU activation function, and a max pooling layer, as well as three residual blocks, each of which is followed by a self-attention mechanism layer. The second decoder is constructed from multiple deconvolutional layers.

[0110] Specifically, the second encoder first extracts features using convolutional blocks constructed from convolutional layers, batch normalization layers, ReLU activation functions, and max pooling layers. Then, the second encoder increases the number of feature channels sequentially from 16 to 32, 64, and 128 using three residual blocks, with each residual block followed by a self-attention mechanism layer.

[0111] The output of a convolutional layer can be represented by the following formula:

[0112] $$Y = W * X + b,$$

[0113] where $Y$ is the output of the convolutional layer; $X$ is the input wavelet time-frequency map; $W$ is the convolution kernel; and $b$ is a bias term.

[0114] The batch normalization layer normalizes the output of the convolutional layer, using the following formula:

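[0115] $$\hat{Y} = \gamma \cdot \frac{Y - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}} + \beta,$$

[0116] where $Y$ is the output of the convolutional layer; $\hat{Y}$ is the output of the batch normalization layer; $\mu_B$ is the mean of the batch data; $\sigma_B^{2}$ is the variance of the batch data; $\gamma$ is a learnable scaling parameter; $\beta$ is a learnable offset parameter; and $\epsilon$ is a constant that prevents division by zero.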

[0117] The formula for the ReLU activation function is:

[0118] $$A = \mathrm{ReLU}(\hat{Y}) = \max(0, \hat{Y}),$$

[0119] where $A$ is the output of the ReLU activation function, and $\mathrm{ReLU}(\cdot)$ is the ReLU activation function, which sets negative values to 0 and retains positive values.

[0120] The max pooling layer downsamples the feature map using the following formula:

[0121] $$P(i,j) = \max_{(m,n) \in R} A(i+m,\, j+n),$$

[0122] where $P(i,j)$ is the output value of the pooling operation at position $(i,j)$; $R$ is the pooling window, of size 3×3; $(i,j)$ is the position in the output feature map; $A(i+m, j+n)$ is the activation value of the ReLU-activated input feature map at position $(i+m, j+n)$; and $(m,n)$ is the position within the pooling window.

[0123] The residual block consists of convolutional layers, batch normalization layers, and ReLU activation functions. For an input $X$, passing it through these functions yields $F(X)$, and the final output of the residual block is $F(X) + X$.

[0124] The formula for the self-attention mechanism is as follows:

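[0125] $$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V, \qquad Q = XW_Q,\; K = XW_K,\; V = XW_V,$$

[0126] where $W_Q$ is the query weight matrix; $W_K$ is the key weight matrix; $W_V$ is the value weight matrix; $X$ is the input matrix; $\sqrt{d_k}$ is a scaling factor that suppresses the gradient vanishing caused by excessively large dot product values; $\mathrm{softmax}$ is the normalization function, which normalizes the attention score matrix along the row direction to obtain a probability distribution; and $\mathrm{Attention}(Q,K,V)$ is the result of self-attention.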
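The self-attention step, with query, key, and value weight matrices, a scaling factor, and row-wise normalization, can be sketched on small matrices with the standard library. This is a minimal illustration assuming the standard scaled dot-product form; the helper names and the toy identity weights are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Normalize each row into a probability distribution."""
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x, w_q, w_k, w_v):
    """Return (attention output, attention weights) for input matrix x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Scale the dot products by sqrt(d_k) before the row-wise softmax.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
print(len(out), len(out[0]))  # 3 2
```

Each row of `weights` sums to 1, so every output row is a weighted average of the value rows.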

[0127] For the occlusion completion pre-trained model, a second decoder is constructed. This second decoder consists of multiple deconvolutional layers, which are the inverse process of convolutional layers. In this embodiment, the deconvolutional layers of the decoder gradually reduce the number of feature channels from 128 to 64, 32, 16, and finally to 1. Mean squared error loss is used as the loss function of the decoder, where the formula for mean squared error loss is:

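[0128] $$L_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} \left\| f(\tilde{X}_i) - X_i \right\|^{2},$$

[0129] where $n$ is the sample size; $f$ is the combination of the second encoder and the second decoder; $\tilde{X}_i$ is the masked feature matrix; and $X_i$ is the original complete feature matrix.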
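The reconstruction objective can be illustrated with a small mean squared error routine. This is a sketch under one stated assumption: the error is averaged over every matrix element as well as over samples, which differs from a per-sample sum only by a constant factor. The function name is hypothetical.

```python
def mse_loss(reconstructed, original):
    """Mean squared error between reconstructed and original feature matrices."""
    total, count = 0.0, 0
    for rec_map, org_map in zip(reconstructed, original):
        for rec_row, org_row in zip(rec_map, org_map):
            for r, o in zip(rec_row, org_row):
                total += (r - o) ** 2
                count += 1
    # Assumption: average over all elements, not only over the n samples.
    return total / count


original = [[[1.0, 2.0], [3.0, 4.0]]]
reconstructed = [[[1.0, 2.0], [3.0, 2.0]]]
print(mse_loss(reconstructed, original))  # 1.0
```

Here `reconstructed` stands for the decoder output produced from a masked input, and `original` for the unmasked feature matrix used as the supervision label.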

[0130] In a further embodiment, the step of fine-tuning the water supply network leakage detection model to a convergent state by using wavelet time-frequency maps in the training samples as training samples and the corresponding sample labels of the wavelet time-frequency maps as supervision labels includes:

[0131] Step S401: Train the contrastive learning pre-trained model and the occlusion completion pre-trained model using the sample dataset with the sample labels removed. Use backpropagation to propagate the loss values of the first encoder and the second encoder back to the water supply network leakage detection model. Adjust the encoder parameters in the structure of the water supply network leakage detection model until the water supply network leakage detection model converges.

[0132] Step S402: Using the wavelet time-frequency graph in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency graph as supervision labels, fine-tune the fine-tuning model until it converges.

[0133] Specifically, referring to Figure 6, the pre-trained models are trained using the dataset with the labels removed. The loss values of the two encoders are propagated back into the model using backpropagation to adjust the parameters in the model structure until the model converges. The weights of the first encoder of the contrastive learning pre-trained model and the weights of the second encoder of the occlusion completion pre-trained model are then frozen. Two convolutional blocks, two fully connected layers, and a sigmoid activation function are connected to them to construct the fine-tuning model of the water supply network leakage detection model. The fine-tuning model is then fine-tuned using the labeled sample dataset, with the cross-entropy function as the loss function and the model parameters optimized by backpropagation.

[0134] The formula for calculating the cross-entropy function is as follows:

[0135] $$L = -\left[\, y \log(p) + (1 - y) \log(1 - p) \,\right],$$

[0136] where $y$ is the actual label, and $p$ is the probability that the model predicts a sample belongs to the positive class (no leakage).
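The cross-entropy loss for a binary classifier with a sigmoid output can be sketched as follows. Averaging over a batch and clipping probabilities with `eps` are assumptions added for numerical safety; the source only defines the actual label and the predicted probability.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Average binary cross-entropy over a batch of (label, probability) pairs."""
    total = 0.0
    for y, p in zip(labels, probs):
        # Assumption: clip p away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


uncertain = binary_cross_entropy([1, 0], [0.5, 0.5])
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
print(confident < uncertain)  # True
```

A prediction of 0.5 for every sample gives a loss of ln 2, and the loss falls toward zero as predictions approach the true labels.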

[0137] During training, five performance metrics can be used to evaluate the detection accuracy and performance of the model: accuracy, precision, recall, F1 score, and Matthews correlation coefficient (MCC). Together they verify that the model can accurately identify leakage states.

[0138] The formula for performance evaluation metrics is as follows:

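[0139] $$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

[0140] $$\mathrm{Precision} = \frac{TP}{TP + FP},$$

[0141] $$\mathrm{Recall} = \frac{TP}{TP + FN},$$

[0142] $$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

[0143] $$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},$$

[0144] where $TP$ is the number of true positives; $TN$ is the number of true negatives; $FP$ is the number of false positives; and $FN$ is the number of false negatives.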
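The five evaluation metrics can be computed directly from the four confusion-matrix counts. This is a small sketch using the standard definitions; the counts in the example are hypothetical and are not results from the application, and returning 0.0 for an undefined ratio is an assumption.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1 and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumption: report 0.0 when a ratio's denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(round(metrics["accuracy"], 2), round(metrics["recall"], 2))  # 0.85 0.8
```

MCC is useful here because the retained dataset is imbalanced (1836 leaking versus 528 non-leaking signals), and accuracy alone can look high on such data.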

[0145] In summary, the contrastive learning pre-trained model and the occlusion completion pre-trained model are trained using a sample dataset with the sample labels removed. Backpropagation is used to propagate the loss values of the first and second encoders back to the water supply network leakage detection model, adjusting the encoder parameters until the model converges. Wavelet time-frequency maps from the training samples are used as training samples, and the corresponding sample labels are used as supervision labels to fine-tune the fine-tuning model until it converges. Once the water supply network leakage detection model has converged, it can be used to detect whether a water supply pipeline is in a leakage state or a non-leakage state.

[0146] Step S50: Input the vibration sound signal of the water supply pipeline to be detected into the water supply network leakage detection model that has been trained to convergence state, so as to determine whether the water supply pipeline to be detected is in a state of leakage or no leakage, so as to complete the leakage detection of the water supply network.

[0147] After the water supply network leakage detection model is fine-tuned to a convergent state, the vibration sound signal of the water supply pipeline to be detected is input into the water supply network leakage detection model that has been trained to a convergent state, so as to determine whether the water supply pipeline to be detected is in a state of leakage or no leakage, so as to complete the leakage detection of the water supply network.

[0148] Specifically, vibration and acoustic signals from the water supply pipeline to be detected are collected, and wavelet transforms are performed on these signals to generate corresponding wavelet time-frequency maps. These time-frequency maps show the distribution of the signals in time and frequency, providing crucial information for subsequent detection tasks. The generated wavelet time-frequency maps are used as input to a water supply network leakage detection model that has been trained and fine-tuned to convergence. This model has been trained on a large amount of labeled training data and can automatically determine whether the pipeline is in a leakage state based on the input time-frequency maps. After forward inference processing by the model, it outputs a binary classification result: leakage or no leakage. This judgment is based on the mapping relationship between the feature representations learned during training and the leakage state. Based on the model's output, the system can make corresponding decisions. For example, if the model judges "no leakage," the system will record it as a normal state and continue to monitor the pipeline's operation. If the model judges "leakage," the system will trigger an alarm, prompting relevant personnel to conduct further inspection and maintenance. Furthermore, the detection results can be combined with data from other monitoring devices (such as pressure sensors and flow meters) to further verify the accuracy and specific location of the leak. In practical applications, the system will continuously collect new data and periodically retrain or fine-tune the model to adapt to changes in pipeline conditions and new leakage patterns, ensuring the accuracy and reliability of the water supply network leakage detection model.

[0149] As can be seen from the above embodiments, the training of machine learning models in the prior art relies on a large amount of labeled data, while leakage events occur infrequently in real-world scenarios, resulting in scarce leakage data and high labeling costs, which seriously restricts the generalization ability of the model. Compared with the prior art, the present application has the following beneficial effects, including but not limited to:

[0150] Firstly, this application enhances signal capture capabilities through multi-resolution time-frequency analysis. Traditional leakage signal detection methods (such as acoustic methods) often face limitations due to fixed window resolution, failing to effectively capture high-frequency transient features and low-frequency sustained modes in leakage signals. However, this application utilizes continuous wavelet transform (CWT) to perform multi-resolution time-frequency analysis on the vibration and acoustic signals of the water supply network, enabling more accurate identification of leakage signals. This method not only captures high-frequency transient changes during leakage but also tracks sustained leakage features in the low-frequency band, thereby comprehensively improving the detection accuracy of leakage signals.

[0151] Secondly, this application significantly reduces the reliance on labeled data and improves generalization ability. Existing leak detection methods often rely on large amounts of labeled data for training. However, leak events occur infrequently in real-world environments, leading to data scarcity and high labeling costs. To address this issue, this application proposes using unlabeled data for self-supervised learning. Specifically, a self-supervised learning framework is constructed through two pre-training tasks: contrastive learning and occlusion completion. This allows the model to learn the time-frequency correlation characteristics of leak signals using unlabeled data, significantly reducing the reliance on manually labeled data. This approach not only reduces manual costs but also improves the adaptability and generalization ability of the leak detection system in different scenarios.

[0152] Third, this application innovatively combines the dual pre-training tasks of occlusion completion and contrastive learning, enabling the model to learn the features of leakage signals during the pre-training stage. Through contrastive learning, the model learns the similarities and differences between positive and negative samples, which improves its ability to distinguish leakage signals; through the occlusion completion task, the model learns to fill in the masked parts of the signal, which improves its robustness and accuracy. This self-supervised learning framework enhances the model's adaptability, enabling effective learning and recognition even in the absence of a large amount of labeled data.

[0153] Fourth, due to the low frequency of leakage events in real-world scenarios and the difficulty in obtaining leakage data, the annotation workload is large and costly. However, by introducing self-supervised learning without labeled data, this application significantly reduces the need for manually labeled data, making leakage detection more efficient. Even when labeled data is scarce, the model can still be accurately trained and identified, improving the efficiency of water supply network leakage detection.

[0154] Fifth, this application significantly improves the model's generalization ability and adapts to diverse real-world scenarios. By combining different types of training data (including signal data under different pipe materials, pipe ages, and water pressure scenarios) and through the design of a self-supervised learning framework, this application enhances the model's generalization ability in diverse real-world environments. Whether it's a newly built pipe network, an aging pipe network, or a water supply network under different water pressure environments, the trained model can effectively identify leakage signals, ensuring that leakage detection technology can adapt to more complex real-world scenarios.

[0155] Sixth, this application ultimately establishes an intelligent sound signal recognition model for water supply network leakage, which can effectively improve the automation level of water supply network leakage detection. This intelligent detection system can not only monitor the leakage situation of the water supply network in real time, but also predict potential pipeline failures (such as water pipe bursts and other safety accidents) in advance, thereby greatly reducing the probability of safety accidents. This will effectively reduce maintenance costs, improve water supply safety, and avoid greater disasters and economic losses caused by undetected leakage.

[0156] In summary, this application can effectively improve the accuracy and efficiency of water supply network leakage detection, reduce reliance on manually labeled data, enhance the model's generalization ability in different network scenarios, and promote the development of water supply network leakage detection towards intelligence and automation.

[0157] Referring to Figure 7, a water supply network leakage detection device, provided to meet one of the purposes of this application, includes a training set acquisition module 1100, a contrastive learning training module 1200, an occlusion completion training module 1300, a fine-tuning model training module 1400, and a pipeline leakage detection module 1500. The training set acquisition module 1100 is configured to acquire a sample training set, wherein the sample training set includes multiple training samples, the training samples include wavelet time-frequency maps corresponding to the vibration sound signals of the water supply pipeline and their corresponding sample labels, the sample labels represent the corresponding leakage state of the water supply pipeline, and the wavelet time-frequency maps include core wavelet time-frequency maps and non-core wavelet time-frequency maps. The contrastive learning training module 1200 is configured to extract two segments of wavelet time-frequency maps from the same training sample as positive sample pairs, to extract wavelet time-frequency maps of the same size as the positive sample pairs from different training samples as negative sample pairs, and to use the positive sample pairs and the negative sample pairs to perform contrastive learning self-supervised training on the contrastive learning pre-trained model of the water supply network leakage detection model, so that the contrastive learning pre-trained model is suitable for generating equivalent wavelet time-frequency maps of the negative sample pairs based on the positive sample pairs. The occlusion completion training module 1300 is configured to replace the corresponding non-core wavelet time-frequency maps in the training samples with the equivalent wavelet time-frequency maps and mask part of the core wavelet time-frequency maps in the wavelet time-frequency maps as training samples, and, with the core wavelet time-frequency maps as supervision labels, to pre-train the occlusion completion pre-trained model of the water supply network leakage detection model to a convergent state. The fine-tuning model training module 1400 is configured to use the wavelet time-frequency maps in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency maps as supervision labels to fine-tune the fine-tuning model of the water supply network leakage detection model to a convergent state, so as to complete the training of the water supply network leakage detection model. The pipeline leakage detection module 1500 is configured to input the vibration sound signal of the water supply pipeline to be detected into the water supply network leakage detection model that has been trained to a convergent state, so as to determine whether the water supply pipeline to be detected is in a leakage state or a non-leakage state, thereby completing the leakage detection of the water supply network.

[0158] Based on any embodiment of this application, referring to Figure 8, another embodiment of this application also provides an electronic device, which can be implemented by a computer device. Figure 8 shows the internal structure of the computer device. This computer device includes a processor, a computer-readable storage medium, a memory, and a network interface connected via a system bus. The computer-readable storage medium stores an operating system, a database, and computer-readable instructions. The database may store control information sequences. When the computer-readable instructions are executed by the processor, they enable the processor to implement a water supply network leakage detection method. The processor of this computer device provides computing and control capabilities, supporting the operation of the entire computer device. The memory of this computer device may store computer-readable instructions. When these computer-readable instructions are executed by the processor, they enable the processor to execute the water supply network leakage detection method of this application. The network interface of this computer device is used for communication with a terminal. Those skilled in the art will understand that the structure shown in Figure 8 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0159] In this embodiment, the processor is used to execute the specific functions of each module in FIG. 7, and the memory stores the program code and various data required to execute the above modules or sub-modules. The network interface is used for data transmission between the user terminal and the server. The memory in this embodiment stores the program code and data required to execute all modules of the water supply network leakage detection device of this application, and the server can call this program code and data to execute the functions of all modules.

[0160] This application also provides a storage medium storing computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the water supply network leakage detection method described in any embodiment of this application.

[0161] This application also provides a computer program product, including a computer program / instructions that, when executed by one or more processors, implement the steps of the water supply network leakage detection method described in any embodiment of this application.

[0162] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments of this application can be implemented by a computer program instructing related hardware. This computer program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. The aforementioned storage medium can be a computer-readable storage medium such as a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM).

[0163] The above describes only some embodiments of this application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of this application, and these improvements and modifications should also be regarded as falling within the scope of protection of this application.

Claims

1. A method for detecting leakage in a water supply network, characterized in that it comprises: obtaining a sample training set, wherein the sample training set includes multiple training samples, each training sample includes a wavelet time-frequency map corresponding to the vibration sound signal of a water supply pipe and its corresponding sample label, the sample label characterizes the leakage state of the water supply pipe, and the wavelet time-frequency map includes a core wavelet time-frequency map and a non-core wavelet time-frequency map; extracting two wavelet time-frequency maps from the same training sample as a positive sample pair, and extracting wavelet time-frequency maps of the same size as the positive sample pair from different training samples as a negative sample pair; using the positive sample pairs and the negative sample pairs to perform contrastive learning self-supervised training on the contrastive learning pre-training model of a preset water supply network leakage detection model, so that the trained contrastive learning pre-training model is suitable for generating, based on the positive sample pairs, the equivalent wavelet time-frequency maps of the negative sample pairs, wherein the basic network architecture of the water supply network leakage detection model is a self-supervised learning model that includes the contrastive learning pre-training model, a masking completion pre-training model, and a fine-tuning model; replacing the corresponding non-core wavelet time-frequency map in the wavelet time-frequency map of the training sample with the equivalent wavelet time-frequency map and masking part of the core wavelet time-frequency map in the wavelet time-frequency map to form a training sample, and, using the core wavelet time-frequency map as the supervision label, pre-training the masking completion pre-training model of the water supply network leakage detection model until it converges;
using the wavelet time-frequency maps in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency maps as supervision labels, fine-tuning the fine-tuning model of the water supply network leakage detection model until it converges, so as to complete the training of the water supply network leakage detection model; and inputting the vibration sound signal of the water supply pipeline to be detected into the water supply network leakage detection model trained to convergence, to determine whether the water supply pipeline to be detected is in a leakage state or a non-leakage state, thereby completing the leakage detection of the water supply network.
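
The positive and negative pair construction recited in claim 1 can be illustrated with a minimal sketch. It assumes each wavelet time-frequency map is a 2-D list (frequency rows by time columns) and that "extracting" means cutting fixed-size patches at random positions; the claim fixes neither the patch size nor the sampling scheme. The names `crop` and `sample_pairs` are hypothetical and do not come from the patent.

```python
import random

def crop(tf_map, height, width, rng):
    """Cut a random height x width patch out of a 2-D time-frequency map."""
    top = rng.randrange(len(tf_map) - height + 1)
    left = rng.randrange(len(tf_map[0]) - width + 1)
    return [row[left:left + width] for row in tf_map[top:top + height]]

def sample_pairs(tf_maps, height, width, seed=0):
    """Build one positive and one negative pair per training sample.

    Positive pair: two patches cut from the same sample.
    Negative pair: two equally sized patches cut from two different samples.
    Requires at least two training samples (assumption of this sketch).
    """
    rng = random.Random(seed)
    positives, negatives = [], []
    for i, tf_map in enumerate(tf_maps):
        anchor = crop(tf_map, height, width, rng)
        positives.append((anchor, crop(tf_map, height, width, rng)))
        # Pick any other sample as the source of the negative patch.
        j = rng.choice([k for k in range(len(tf_maps)) if k != i])
        negatives.append((anchor, crop(tf_maps[j], height, width, rng)))
    return positives, negatives
```

Both patches in every pair have the same size, which matches the claim's requirement that negative pairs be the same size as positive pairs.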

2. The water supply network leakage detection method according to claim 1, characterized in that the contrastive learning pre-training model includes a first encoder and a first decoder; the first encoder includes a convolutional block constructed from a convolutional layer, a batch normalization layer, a ReLU activation function, and a max pooling layer, as well as three residual blocks, each residual block being followed by a self-attention mechanism layer; and the first decoder is constructed from a global average pooling layer and a fully connected layer.
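
As a rough illustration of the first decoder in claim 2 (a global average pooling layer followed by a fully connected layer), the sketch below works on plain nested lists. The encoder is omitted, the feature maps are assumed to be laid out as channels of 2-D maps, and the function names and weight layout are hypothetical; a real implementation would normally use a deep learning framework.

```python
def global_average_pool(feature_maps):
    """Collapse each channel (a 2-D map) to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]

def fully_connected(vector, weights, bias):
    """Compute y = W x + b, with one weight row per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]

def first_decoder(feature_maps, weights, bias):
    """Global average pooling followed by a single fully connected layer."""
    return fully_connected(global_average_pool(feature_maps), weights, bias)
```

The pooling step makes the output length independent of the spatial size of the encoder's feature maps, so the same fully connected layer can serve patches of different sizes.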

3. The water supply network leakage detection method according to claim 1, characterized in that the masking completion pre-training model includes a second encoder and a second decoder; the second encoder includes a convolutional block constructed from a convolutional layer, a batch normalization layer, a ReLU activation function, and a max pooling layer, as well as three residual blocks, each residual block being followed by a self-attention mechanism layer; and the second decoder is constructed from multiple deconvolution layers.

4. The method for detecting leakage in a water supply network according to any one of claims 2 to 3, characterized in that the weights of the first encoder of the contrastive learning pre-training model and the weights of the second encoder of the masking completion pre-training model are frozen, and the frozen encoders are combined with two convolutional blocks, two fully connected layers, and a sigmoid activation function to construct the fine-tuning model of the water supply network leakage detection model.
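
The classification end of the fine-tuning model in claim 4 can be sketched as two fully connected layers followed by a sigmoid. Several details here are assumptions rather than claim content: the two frozen encoders' outputs are fused by concatenation, the two convolutional blocks are omitted, a ReLU sits between the fully connected layers, and a 0.5 threshold turns the probability into a leakage decision. All names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dense(x, weights, bias):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def leak_probability(feat_contrastive, feat_masking, w1, b1, w2, b2):
    """Fuse the two frozen encoders' features and map them to a probability."""
    fused = list(feat_contrastive) + list(feat_masking)  # concatenation (assumed)
    hidden = [max(0.0, h) for h in dense(fused, w1, b1)]  # ReLU (assumed)
    (logit,) = dense(hidden, w2, b2)  # second layer has a single output unit
    return sigmoid(logit)

def classify(prob, threshold=0.5):
    """Map the sigmoid output to the two states named in the claims."""
    return "leakage" if prob >= threshold else "no leakage"
```

Because the encoder weights are frozen, only `w1`, `b1`, `w2`, and `b2` (and, in the full model, the two convolutional blocks) would be updated during fine-tuning.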

5. The water supply network leakage detection method according to claim 4, characterized in that the step of fine-tuning the water supply network leakage detection model until it converges, using the wavelet time-frequency maps in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency maps as supervision labels, includes: training the contrastive learning pre-training model and the masking completion pre-training model using the sample data set with the sample labels removed; using error backpropagation to transmit the loss values of the first encoder and the second encoder back to the water supply network leakage detection model, and adjusting the encoder parameters in the structure of the water supply network leakage detection model until the water supply network leakage detection model converges; and, using the wavelet time-frequency maps in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency maps as supervision labels, fine-tuning the fine-tuning model until it converges.

6. The method for detecting leakage in a water supply network according to claim 1, characterized in that the step of obtaining a sample training set includes: collecting the vibration sound signal of each water supply pipe in the water supply network and marking the leakage status of the water supply pipe to determine the sample label, wherein the sample label represents a leakage state or a non-leakage state; normalizing the vibration sound signal of the water supply pipe to obtain a normalized vibration sound signal; performing a continuous wavelet transform on the normalized vibration sound signal using the Morlet wavelet basis function to obtain the wavelet time-frequency map corresponding to the vibration sound signal of the water supply pipe; and constructing the sample training set from the wavelet time-frequency maps corresponding to the vibration sound signals of the water supply pipes and their corresponding sample labels.
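
The preprocessing in claim 6 (normalization followed by a continuous wavelet transform with a Morlet basis) can be sketched in pure Python as below. This is a direct, unoptimized evaluation intended for short signals. The z-score normalization, the center frequency parameter `omega0 = 6.0`, and the function names are assumptions of this sketch; the claim only states that the signal is normalized and that a Morlet wavelet is used.

```python
import cmath
import math

def normalize(signal):
    """Z-score normalization (assumed; the claim only says 'normalized')."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)
    if std == 0.0:
        return [0.0] * n  # constant signal: nothing to scale
    return [(x - mean) / std for x in signal]

def morlet(t, omega0=6.0):
    """Complex Morlet wavelet, without the small admissibility correction."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)

def morlet_cwt(signal, frequencies, fs, omega0=6.0):
    """Return |CWT| with one row per analysed frequency and one column per sample."""
    dt = 1.0 / fs
    n = len(signal)
    scalogram = []
    for f in frequencies:
        scale = omega0 / (2.0 * math.pi * f)  # scale whose centre frequency is f
        # Conjugated, scaled wavelet evaluated at every possible sample lag.
        kernel = {lag: morlet(lag * dt / scale, omega0).conjugate()
                  for lag in range(-(n - 1), n)}
        row = []
        for b in range(n):
            acc = sum(signal[k] * kernel[k - b] for k in range(n))
            row.append(abs(acc) * dt / math.sqrt(scale))
        scalogram.append(row)
    return scalogram
```

For a pure tone, the row whose analysed frequency matches the tone carries the largest magnitude away from the signal edges, which is the property that makes the time-frequency map useful as a model input. For long recordings an FFT-based implementation would be the practical choice.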

7. A water supply network leakage detection device, characterized in that it comprises: a training set acquisition module, configured to acquire a sample training set, wherein the sample training set includes multiple training samples, each training sample includes a wavelet time-frequency map corresponding to the vibration sound signal of a water supply pipe and its corresponding sample label, the sample label represents the leakage state of the water supply pipe, and the wavelet time-frequency map includes a core wavelet time-frequency map and a non-core wavelet time-frequency map; a contrastive learning training module, configured to extract two wavelet time-frequency maps from the same training sample as a positive sample pair, to extract wavelet time-frequency maps of the same size as the positive sample pair from different training samples as a negative sample pair, and to use the positive sample pairs and the negative sample pairs to perform contrastive learning self-supervised training on the contrastive learning pre-training model of a preset water supply network leakage detection model, so that the trained contrastive learning pre-training model is suitable for generating, based on the positive sample pairs, the equivalent wavelet time-frequency maps of the negative sample pairs, wherein the basic network architecture of the water supply network leakage detection model is a self-supervised learning model that includes the contrastive learning pre-training model, a masking completion pre-training model, and a fine-tuning model;
a masking completion training module, configured to replace the corresponding non-core wavelet time-frequency map in the wavelet time-frequency map of the training sample with the equivalent wavelet time-frequency map and mask part of the core wavelet time-frequency map in the wavelet time-frequency map to form a training sample, and to use the core wavelet time-frequency map as the supervision label to pre-train the masking completion pre-training model of the water supply network leakage detection model to a converged state; a fine-tuning model training module, configured to use the wavelet time-frequency maps in the training samples as training samples and the sample labels corresponding to the wavelet time-frequency maps as supervision labels to fine-tune the fine-tuning model of the water supply network leakage detection model until it converges, so as to complete the training of the water supply network leakage detection model; and a pipeline leakage detection module, configured to input the vibration sound signal of the water supply pipeline to be detected into the water supply network leakage detection model trained to a converged state, to determine whether the water supply pipeline to be detected is in a leakage state or a non-leakage state, thereby completing the leakage detection of the water supply network.
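
How the masking completion training module might assemble one input/label pair can be sketched as follows. The sketch assumes the core region is given as a set of frequency-row indices, that masked cells are set to zero, and that the equivalent map has already been produced by the contrastive learning pre-training model. None of these details are fixed by the claim, and `build_masked_sample` is a hypothetical name.

```python
import random

def build_masked_sample(tf_map, core_rows, equivalent_map, mask_ratio=0.25, seed=0):
    """Return (model_input, target, masked_cells) for masking completion pre-training.

    tf_map         -- original 2-D time-frequency map (rows x columns)
    core_rows      -- row indices treated as the core region (assumed representation)
    equivalent_map -- same-shaped map that replaces the non-core rows
    """
    rng = random.Random(seed)
    core = set(core_rows)
    n_cols = len(tf_map[0])
    # Core rows are copied; non-core rows come from the equivalent map.
    model_input = [
        list(row) if r in core else list(equivalent_map[r])
        for r, row in enumerate(tf_map)
    ]
    # Hide a fraction of the core cells by zeroing them.
    core_cells = [(r, c) for r in sorted(core) for c in range(n_cols)]
    masked_cells = rng.sample(core_cells, int(len(core_cells) * mask_ratio))
    for r, c in masked_cells:
        model_input[r][c] = 0.0
    # The unmasked core region is the supervision label.
    target = [list(tf_map[r]) for r in sorted(core)]
    return model_input, target, masked_cells
```

The model is then trained to reconstruct `target` from `model_input`, so no leakage labels are needed at this stage.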

8. An electronic device comprising a central processing unit and a memory, characterized in that the central processing unit is used to invoke and run a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implementing the method according to any one of claims 1 to 6, which, when invoked and run by a computer, executes the steps included in the corresponding method.