A fault diagnosis method and device for a water pump unit
By preprocessing and segmenting the acoustic and vibration signals of the water pump unit, and combining a deep diagnostic model and a multi-index early warning mechanism, the problems of signal non-stationarity and false alarms or missed alarms in early fault diagnosis of the water pump unit are solved, and timely identification and early warning of the operating status of the water pump unit are realized.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHONGSHUI SANLI DATA TECH CO LTD
- Filing Date
- 2026-05-13
- Publication Date
- 2026-06-12
AI Technical Summary
In existing technologies for intelligent fault diagnosis of pump units based on acoustic signatures and vibrations, there are problems such as signal non-stationarity caused by operating conditions and noise interference, insufficient feature extraction and model generalization capabilities, and false alarms and missed alarms in early fault warnings.
Acoustic and/or vibration signals are acquired during the operation of the water pump unit, preprocessed, and segmented into multiple time window segments, which are then input into a deep diagnostic model. The permutation entropy, the effective value (root mean square), and the anomaly score are then standardized using Z-scores computed against health baseline data. Threshold determination is performed on the standardized early warning indicators within a sliding window, and fault warning information is output.
It enables timely identification and early warning of the operating status of water pump units, reduces the problems of insufficient adaptability to complex operating conditions and delayed early warning, and improves the accuracy and reliability of fault diagnosis.
Figure CN122196843A_ABST
Description
Technical Field
[0001] This application relates to the field of industrial equipment fault diagnosis technology, and in particular to a fault diagnosis method and apparatus for a water pump unit.
Background Technology
[0002] With the increasing demand for intelligent operation and maintenance of industrial equipment, water pump units, as key power equipment in water supply, chemical, metallurgical, and energy scenarios, directly affect the continuity and safety of the system. During long-term operation, water pumps are prone to faults such as bearing wear, rotor imbalance, coupling eccentricity, impeller damage, cavitation, and loosening. These faults often progress from minor abnormalities to serious failures. To achieve timely assessment and early warning of the water pump's health status, engineering practices typically involve deploying vibration sensors and acoustic sensors to collect vibration / acoustic signals during pump operation, and then using this data for condition identification and fault diagnosis.
[0003] In related technologies, pump fault diagnosis often employs feature extraction and classification methods based on vibration or acoustic signals. For example, it calculates statistical features such as root mean square, kurtosis, and peak factor in the time domain, or performs spectral analysis, envelope demodulation, wavelet decomposition, and empirical mode decomposition in the frequency / time-frequency domain. These are then combined with threshold criteria, expert rules, and traditional machine learning classifiers to achieve fault identification and alarms. However, the actual operating conditions of pumps are complex and variable, with factors such as load fluctuations, flow rate changes, installation differences, pipeline resonance, and environmental noise interference, causing vibration / acoustic signals to exhibit non-stationary and strong noise characteristics. On the one hand, fixed-parameter filtering and feature engineering are sensitive to changes in operating conditions, often resulting in unstable features and insufficient generalization ability across equipment / scenes. On the other hand, early fault signals often have weak amplitudes and are submerged by normal fluctuations, making it easy for single indicators or single threshold strategies to lead to missed detections. Furthermore, the high cost of on-site data annotation and the scarcity of fault samples limit model training and iterative updates, thus affecting diagnostic accuracy and early warning reliability.
[0004] Therefore, in the intelligent fault diagnosis of water pump units, signal non-stationarity caused by operating conditions and noise interference, insufficient feature extraction and model generalization capabilities, and the coexistence of false alarms and missed alarms in early fault warning have become urgent problems to be solved.
Summary of the Invention
[0005] This application provides a fault diagnosis method and apparatus for water pump units, aiming to solve the problems in the existing technology of intelligent fault diagnosis of water pump units, such as signal non-stationarity caused by operating conditions and noise interference, insufficient feature extraction and model generalization ability, and the coexistence of false alarms and missed alarms in early fault warning.
[0006] A first aspect provides a fault diagnosis method for a water pump unit, the method comprising:
acquiring time-series signals collected during the operation of the water pump unit, wherein the time-series signals include acoustic signature signals and / or vibration signals;
preprocessing the time-series signals to obtain a preprocessed signal;
dividing the preprocessed signal into multiple time window signal segments according to a preset time window;
inputting the time window signal segments into the deep diagnostic model to obtain diagnostic results, which include component category and / or health status;
determining a permutation entropy index and an effective value index based on the preprocessed signal, wherein the effective value index is the root mean square of the preprocessed signal within the time window signal segment;
determining an anomaly score based on the preprocessed signal and / or intermediate features of the deep diagnostic model, wherein the anomaly score is output by an isolation forest model;
determining, based on health baseline data, the mean and standard deviation of each of the anomaly score, the permutation entropy index, and the effective value index, and performing Z-score standardization on the anomaly score, permutation entropy index, and effective value index obtained in real time to obtain standardized early warning indicators;
performing threshold determination on the standardized early warning indicators within a sliding window, and obtaining fault early warning information when at least two of the standardized early warning indicators meet their corresponding threshold conditions.
[0007] Optionally, in the above scheme, the preprocessing of the time series signal to obtain a preprocessed signal includes: The time series signal is cleaned to obtain a cleaned time series signal. Based on the power spectral density of the time series signal after data cleaning, the filtering parameters are determined; Based on the filtering parameters, the time series signal after data cleaning is subjected to adaptive high-frequency suppression filtering to obtain the preprocessed signal.
[0008] Optionally, in the above scheme, determining the filtering parameters based on the power spectral density of the cleaned time-series signal includes: Calculate the power spectral density of the time series signal after data cleaning; Based on the energy distribution of the power spectral density, the cutoff frequency parameter and / or filter threshold are determined as the filter parameters.
[0009] Optionally, in the above scheme, dividing the preprocessed signal into multiple time window signal segments according to a preset time window includes: The number of sampling points per time window is determined based on the preset time window length and the preset sampling rate, and indicates the number of sampling points contained in each time window signal segment; The number of step sampling points is determined based on the preset step size and the preset sampling rate, and indicates the offset between the starting positions of two adjacent segments; Using the number of step sampling points as the sliding step, runs of consecutive sampling points equal in length to the number of sampling points per time window are sequentially extracted from the preprocessed signal to form multiple initial time window signal segments, wherein two adjacent initial time window signal segments overlap; The multiple initial time window signal segments are normalized to obtain the multiple time window signal segments.
[0010] Optionally, in the above scheme, the deep diagnostic model includes a feature extraction subnetwork, a feature enhancement subnetwork, a temporal modeling subnetwork, and a classification subnetwork; The feature extraction subnetwork is used to perform one-dimensional temporal convolution on the time window signal segment to output temporal features; The feature enhancement subnetwork is used to perform multi-scale fusion of the temporal features and output enhanced features; The temporal modeling subnetwork is used to perform temporal modeling on the enhanced features and output a temporal representation; The classification subnetwork is used to output the diagnostic results based on the temporal representation.
[0011] Optionally, in the above scheme, the feature enhancement subnetwork performs multi-scale interactive fusion of the temporal features and outputs the enhanced features, including: The first convolution operation and the second convolution operation are performed in parallel on the temporal features to obtain the first scale features and the second scale features, wherein the kernel size of the first convolution operation is smaller than the kernel size of the second convolution operation. The first scale feature and the second scale feature are subjected to nonlinear activation processing to obtain the first activation feature and the second activation feature; Perform element-wise interaction operations on the first activation feature and the second activation feature to obtain the interaction feature; The enhanced feature is obtained by fusing the interactive feature with the first activation feature and / or the second activation feature and then performing a third convolution operation.
[0012] Optionally, in the above scheme, the temporal modeling sub-network includes a Long Short-Term Memory (LSTM) network; the temporal modeling sub-network performs temporal modeling on the enhanced features and outputs a temporal representation, including: The enhanced features obtained from the same water pump unit within a continuous time window signal segment are arranged in chronological order to form an enhanced feature sequence. The enhanced feature sequence is input into the Long Short-Term Memory (LSTM) network, which outputs the temporal representation.
[0013] Optionally, in the above scheme, the training of the deep diagnostic model includes disentangled representation learning based on the time domain and the frequency domain, together with momentum contrastive learning, and includes: Encoding networks are constructed in the time domain and the frequency domain respectively to encode the time window signal segment, outputting a trend feature representation and a periodic feature representation respectively; At least two data augmentation views are generated from the same time window signal segment; A query encoding network and a key encoding network with the same structure are provided, one data augmentation view is input into the query encoding network to obtain a query representation, and the other data augmentation view is input into the key encoding network to obtain a key representation; The parameters of the key encoding network are updated using a momentum update method, and a contrastive learning loss is calculated based on the query representation and the key representation to update the parameters of the query encoding network; After the momentum contrastive learning is completed, the deep diagnostic model is initialized or updated using the parameters of the encoding network, and supervised training is performed on the deep diagnostic model based on labeled samples.
[0014] A second aspect provides a fault diagnosis device for a water pump unit, the device comprising:
Acquisition module: used to acquire time-series signals collected during the operation of the water pump unit, the time-series signals including acoustic signature signals and / or vibration signals;
Preprocessing module: used to preprocess the time-series signals to obtain a preprocessed signal;
Segmentation module: used to segment the preprocessed signal according to a preset time window to obtain multiple time window signal segments;
Diagnostic module: used to input the time window signal segments into the deep diagnostic model to obtain diagnostic results, which include component category and / or health status;
Output module: used to generate early warning indicators based on the preprocessed signal and / or the diagnostic results;
First determining module: used to determine the permutation entropy index and the effective value index based on the preprocessed signal, wherein the effective value index is the root mean square of the preprocessed signal within the time window signal segment;
Second determining module: used to determine an anomaly score based on the preprocessed signal and / or intermediate features of the deep diagnostic model, wherein the anomaly score is output by an isolation forest model;
Third determining module: used to determine, based on the health baseline data, the mean and standard deviation of each of the anomaly score, the permutation entropy index, and the effective value index, and to perform Z-score standardization on the anomaly score, permutation entropy index, and effective value index obtained in real time to obtain standardized early warning indicators;
Early warning module: used to perform threshold determination on the standardized early warning indicators within a sliding window, and to obtain fault early warning information when at least two of the standardized early warning indicators meet their corresponding threshold conditions.
[0015] Compared with the prior art, this application has at least the following beneficial effects: Based on further analysis of existing technical problems, this application recognizes that existing technologies for intelligent fault diagnosis of water pump units suffer from signal non-stationarity caused by operating conditions and noise interference, insufficient feature extraction and model generalization capabilities, and the coexistence of false alarms and missed alarms in early fault warning. This application addresses these problems by acquiring time-series signals, containing acoustic and / or vibration signals, collected during the operation of the water pump unit. Before the diagnostic stage, the time-series signals are preprocessed to form preprocessed signals more suitable for analysis. The preprocessed signals are then divided into multiple time window signal segments according to a preset time window, thereby transforming the originally continuous signal, which is easily affected by fluctuations in operating conditions, into analysis units that are clearly bounded in the time domain, consistent in length, and processable window by window. On this basis, the time window signal segments are input into a deep diagnostic model to obtain diagnostic results including component categories and / or health status. The judgment of the water pump unit's operating status therefore no longer relies on human experience or single threshold features; instead, the model learns and discriminates signal patterns within the time window and outputs structured diagnostic information.
[0016] This application generates early warning indicators based on the preprocessed signals and / or the diagnostic results, and performs threshold determination on the early warning indicators to obtain fault early warning information. This allows the early warning trigger to utilize both the abnormal operating symptoms reflected by the preprocessed signals and the component category or health status information represented by the diagnostic results for constraint and confirmation. In this way, it enables timely identification and early warning output of fault signs during the fault diagnosis and maintenance of water pump units, solving the problems of existing technologies where water pump unit fault diagnosis relies on human experience, has insufficient adaptability to complex operating conditions, and suffers from delayed early warnings that make it difficult to detect and deal with potential faults in a timely manner.
Attached Figure Description
[0017] Figure 1 is a schematic flowchart of a fault diagnosis method for a water pump unit provided in one embodiment of this application; Figure 2 is a schematic diagram of an adaptive threshold high-frequency filter provided in one embodiment of this application; Figure 3 is a schematic diagram of an interactive convolution module provided in one embodiment of this application; Figure 4 is a schematic diagram of a deep learning-based water pump acoustic signature (voiceprint) recognition model provided in one embodiment of this application.
Detailed Implementation
[0018] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0019] In one embodiment, as shown in Figure 1, a fault diagnosis method for a water pump unit is provided, including the following steps:
acquiring time-series signals collected during the operation of the water pump unit, wherein the time-series signals include acoustic signature signals and / or vibration signals;
preprocessing the time-series signals to obtain a preprocessed signal;
dividing the preprocessed signal into multiple time window signal segments according to a preset time window;
inputting the time window signal segments into the deep diagnostic model to obtain diagnostic results, which include component category and / or health status;
determining a permutation entropy index and an effective value index based on the preprocessed signal, wherein the effective value index is the root mean square of the preprocessed signal within the time window signal segment;
determining an anomaly score based on the preprocessed signal and / or intermediate features of the deep diagnostic model, wherein the anomaly score is output by an isolation forest model;
determining, based on health baseline data, the mean and standard deviation of each of the anomaly score, the permutation entropy index, and the effective value index, and performing Z-score standardization on the anomaly score, permutation entropy index, and effective value index obtained in real time to obtain standardized early warning indicators;
performing threshold determination on the standardized early warning indicators within a sliding window, and obtaining fault early warning information when at least two of the standardized early warning indicators meet their corresponding threshold conditions.
[0020] In one implementation, during the operation of the water pump unit, acoustic signature signals are acquired via an acoustic acquisition device and / or vibration signals are acquired via a vibration acquisition device. The acoustic acquisition device can be a microphone or an acoustic sensor, and the vibration acquisition device can be an accelerometer or a vibration velocity sensor. The acquisition end samples the signals at a preset sampling rate to obtain a sequence of sampling points arranged in chronological order, forming a time-series signal. The time-series signal can be a single-channel signal or a multi-channel signal; when acoustic signature signals and vibration signals are acquired simultaneously, their respective time-series signals can be generated and kept time-aligned.
[0021] In one implementation, the time-series signal is preprocessed to obtain a preprocessed signal. The preprocessing includes at least processing of outlier sampling points or signal segments, and suppression of high-frequency noise. The preprocessed signal is then divided into multiple time-window segments according to a preset time window. Each time-window segment contains sampling points of the preprocessed signal within a certain continuous time range. Multiple time-window segments cover the continuous time range of the preprocessed signal, and overlap between adjacent time windows is permitted.
[0022] In one implementation, a time window signal segment is input into a deep diagnostic model to obtain diagnostic results. The diagnostic results include component categories and / or health status, where component categories characterize the type of component associated with the fault, and health status characterizes the operating state category of the pump unit within that time frame. The diagnostic results can be category labels or outputs containing category confidence scores.
[0023] In one implementation, early warning indicators are generated based on preprocessed signals and / or diagnostic results. These indicators may include signal statistics or complexity metrics calculated from the preprocessed signals, or health scores or anomaly probability metrics obtained from the diagnostic results. Subsequently, a threshold determination is performed on the early warning indicators; if a preset threshold condition is met, fault early warning information is output. The fault early warning information may include an early warning identifier, trigger time or trigger time window range, early warning level, and diagnostic results associated with the early warning.
[0024] In one implementation, the permutation entropy index and the effective value index are determined based on the preprocessed signal. The permutation entropy index is obtained by statistically analyzing the permutation patterns of the preprocessed signal within a time window and calculating its complexity; it is used to characterize the change in the degree of order of the signal sequence. The effective value index is the root mean square of the preprocessed signal within the time window; it is used to characterize the signal energy level or vibration intensity level.
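As an illustrative, non-limiting sketch, the two indicators of this paragraph could be computed per time window as follows. The embedding order `order`, the delay `delay`, and the normalization by log(order!) are assumptions chosen for illustration; the application does not specify them.

```python
import math
import numpy as np

def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Permutation entropy of one time window signal segment.

    `order` and `delay` are illustrative defaults, not values from the application.
    """
    x = np.asarray(x, dtype=float)
    n = x.size - (order - 1) * delay
    if n <= 0:
        raise ValueError("segment too short for the chosen order and delay")
    # Row i holds the embedded vector (x[i], x[i+delay], ..., x[i+(order-1)*delay]).
    idx = np.arange(order) * delay + np.arange(n)[:, None]
    # The argsort of each row is its ordinal (permutation) pattern.
    patterns = np.argsort(x[idx], axis=1, kind="stable")
    # Relative frequency of each distinct pattern.
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    h = float(-np.sum(p * np.log(p)))
    if normalize:
        h /= math.log(math.factorial(order))  # scale to the range [0, 1]
    return h

def effective_value(x):
    """Effective value index: root mean square of the segment."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))
```

With normalization, a perfectly ordered segment (for example a monotonic ramp) gives a value of 0, and a segment of white noise gives a value close to 1.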
[0025] In one implementation, anomaly scores are output via an isolation forest model. The input for generating the anomaly scores can come from a preprocessed signal or from intermediate features of a deep diagnostic model. These intermediate features can be selected as temporal features output by the feature extraction subnetwork, enhanced features output by the feature enhancement subnetwork, or temporal representations output by the temporal modeling subnetwork. The selected input features are then fed into the isolation forest model to obtain the anomaly scores for the corresponding time windows.
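By way of example only, the anomaly score could be obtained with the `IsolationForest` class of scikit-learn, which is one available implementation of an isolation forest and is not named in the application. The sketch assumes that each time window has already been reduced to a feature vector (a row of `features`), that the model is fitted on windows from healthy operation, and that `n_estimators=100` is an acceptable setting.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def fit_isolation_forest(healthy_features, seed=0):
    """Fit an isolation forest on per-window feature vectors from healthy operation.

    healthy_features: array of shape (n_windows, n_features). Fitting on healthy
    data only is an assumption of this sketch.
    """
    model = IsolationForest(n_estimators=100, random_state=seed)
    model.fit(np.asarray(healthy_features, dtype=float))
    return model

def anomaly_scores(model, features):
    """One anomaly score per time window; larger means more anomalous.

    scikit-learn's score_samples returns lower values for more abnormal samples,
    so the sign is flipped here.
    """
    return -model.score_samples(np.asarray(features, dtype=float))
```

The feature vectors may be statistics of the preprocessed signal or intermediate features of the deep diagnostic model, as described in this paragraph.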
[0026] In one implementation, health baseline data is used to establish a standardized reference. Health baseline data can be taken from historical periods in which healthy operation of the equipment has been confirmed, or from the initial healthy operation phase after commissioning. The mean and standard deviation of the anomaly scores, permutation entropy indicators, and effective value indicators corresponding to this baseline data are calculated and used to perform Z-score standardization on the real-time indicators, thereby obtaining standardized early warning indicators and making the different indicators comparable.
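The standardization described here amounts to z = (x - mean) / std for each indicator, with the mean and standard deviation taken from the health baseline. A minimal sketch follows; the column layout (anomaly score, permutation entropy index, effective value index), the use of the sample standard deviation (`ddof=1`), and the small floor `eps` are assumptions made for illustration.

```python
import numpy as np

def fit_baseline(baseline, eps=1e-12):
    """Mean and standard deviation of each indicator over the health baseline.

    baseline: array of shape (n_windows, 3) with columns assumed to be
    (anomaly score, permutation entropy index, effective value index).
    """
    baseline = np.asarray(baseline, dtype=float)
    mean = baseline.mean(axis=0)
    std = baseline.std(axis=0, ddof=1)
    # Guard against division by zero when an indicator is constant in the baseline.
    return mean, np.maximum(std, eps)

def standardize(indicators, mean, std):
    """Z-score standardization of real-time indicators against the baseline."""
    return (np.asarray(indicators, dtype=float) - mean) / std
```

For instance, with a baseline mean of 0.5 and standard deviation of 0.1 for the anomaly score, a real-time anomaly score of 0.7 maps to a standardized value of 2.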
[0027] In one implementation, threshold determination is performed on standardized early warning indicators within a sliding window. The sliding window can contain multiple consecutive time window signal segments, and the threshold conditions can be set separately for the anomaly score, permutation entropy index, and effective value index. The determination rule is: when at least two of the standardized early warning indicators meet their respective threshold conditions, fault early warning information is output. The fault early warning information may include the triggering time window range, triggering indicator items, early warning level, and optional diagnostic result reference information.
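One possible reading of this determination rule is sketched below. The application does not state how an indicator "meets its threshold condition" within the sliding window; the sketch assumes that an indicator meets its condition when the absolute value of its standardized value exceeds its threshold in at least `min_hits` of the last `window` time window signal segments. The parameters `window`, `min_hits`, and the use of the absolute value are all illustrative assumptions.

```python
import numpy as np

def early_warning(z, thresholds, window=5, min_hits=3, min_indicators=2):
    """Multi-indicator joint triggering over a sliding window.

    z: array (n_windows, n_indicators) of standardized early warning indicators.
    thresholds: one threshold per indicator.
    Returns a boolean array with True where a fault early warning is raised.
    """
    z = np.asarray(z, dtype=float)
    exceed = np.abs(z) > np.asarray(thresholds, dtype=float)
    warnings = np.zeros(len(z), dtype=bool)
    for t in range(window - 1, len(z)):
        # Number of exceedances of each indicator inside the current sliding window.
        hits = exceed[t - window + 1 : t + 1].sum(axis=0)
        satisfied = hits >= min_hits
        # Warn only when at least `min_indicators` indicators meet their condition.
        warnings[t] = satisfied.sum() >= min_indicators
    return warnings
```

Under these assumptions, an isolated excursion of a single indicator does not raise a warning; at least two indicators must exceed their thresholds persistently within the same sliding window.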
[0028] This embodiment involves two parts: "generating early warning indicators" and "obtaining fault early warning information by threshold determination". The permutation entropy indicator and the effective value indicator come from the preprocessed signal, and the anomaly score comes from the isolation forest model. The health baseline data is used to establish a standardized reference, the sliding window is used to impose a continuity constraint in time on the determination, and the requirement that "at least two conditions must be met" forms a multi-indicator joint triggering logic.
[0029] This embodiment constructs anomaly scores, permutation entropy indicators, and effective value indicators simultaneously and performs standardization processing. Then, within a sliding window, it uses the judgment rule of "at least two of them satisfying the threshold condition" to output fault warning information, which can form a warning mechanism jointly triggered by multiple sources of indicators, reducing the impact of occasional fluctuations of a single indicator on the warning output.
[0030] In this embodiment, "preprocessed signal" refers to the signal after cleaning and filtering the original time series signal; "time window signal segment" is a set of segments extracted from the preprocessed signal according to a preset time window; "diagnostic result" is the output of the deep diagnostic model after analyzing the time window signal segment; "early warning indicator" is one or more indicator variables used to trigger early warning judgment; and "threshold judgment" is the process of comparing the early warning indicator with a preset threshold condition to determine whether to output fault early warning information.
[0031] This embodiment establishes a closed-loop process of data acquisition, preprocessing, segmentation, deep diagnosis, and early warning determination, so that diagnostic results and fault early warning information are output from the acoustic and / or vibration signals of the water pump unit within a single unified pipeline, facilitating online monitoring and early warning.
[0032] In this embodiment, the preprocessing of the time series signal to obtain a preprocessed signal includes: The time series signal is cleaned to obtain a cleaned time series signal. Based on the power spectral density of the time series signal after data cleaning, the filtering parameters are determined; Based on the filtering parameters, the time series signal after data cleaning is subjected to adaptive high-frequency suppression filtering to obtain the preprocessed signal.
[0033] In one implementation, data cleaning of time series signals may include: identifying and removing abnormal signal segments during the acquisition process, such as all-zero segments caused by sensor disconnection, truncated segments caused by sampling saturation, and abrupt segments caused by acquisition jitter; median filtering or amplitude limiting can be used for short-term spike noise; and outliers that deviate significantly from the normal range can be replaced or interpolated to complete the data, thereby obtaining the cleaned time series signal.
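A minimal sketch of such a cleaning step is given below. It covers only two of the cases listed: removal of long all-zero runs and replacement of isolated outliers by linear interpolation. The minimum run length `min_zero_run`, the robust amplitude limit based on the median absolute deviation, and the factor `limit_sigma` are assumptions chosen for illustration and are not taken from the application. The sketch also assumes that at least some valid samples remain.

```python
import numpy as np

def _long_runs(mask, min_len):
    """Keep only the runs of True in `mask` that are at least `min_len` samples long."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])  # alternating start, stop
    out = np.zeros_like(mask)
    for start, stop in zip(edges[::2], edges[1::2]):
        if stop - start >= min_len:
            out[start:stop] = True
    return out

def clean_signal(x, min_zero_run=20, limit_sigma=5.0):
    """Remove all-zero segments and interpolate over amplitude outliers."""
    x = np.asarray(x, dtype=float).copy()
    # Long runs of exact zeros are treated as sensor disconnection.
    dropped = _long_runs(x == 0.0, min_zero_run)
    valid = x[~dropped]
    # Robust amplitude limit from the median absolute deviation (MAD).
    centre = np.median(valid)
    mad = max(float(np.median(np.abs(valid - centre))), 1e-12)
    limit = limit_sigma * 1.4826 * mad
    outlier = (np.abs(x - centre) > limit) & ~dropped
    # Replace outliers by linear interpolation between neighbouring good samples.
    good = ~dropped & ~outlier
    x[outlier] = np.interp(np.flatnonzero(outlier), np.flatnonzero(good), x[good])
    # Disconnected segments are removed rather than filled in.
    return x[~dropped]
```

Truncated (saturated) segments and median filtering of short spikes, which this paragraph also mentions, could be handled in the same structure but are omitted here for brevity.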
[0034] In one implementation, filtering parameters are determined based on the power spectral density of the cleaned time-series signal: first, spectral analysis is performed on the cleaned time-series signal to obtain a power spectral density curve reflecting the energy distribution of different frequency components; then, based on the boundary characteristics between the main energy concentration region and the high-frequency noise-dominated region in the power spectral density curve, the parameters used for filtering are determined. The filtering parameters may include a cutoff frequency parameter and / or a filtering threshold. The cutoff frequency parameter is used to limit the frequency range through which the filter passes, and the filtering threshold is used to limit the strength of high-frequency suppression or the gating condition.
[0035] In one implementation, adaptive high-frequency suppression filtering is performed on the cleaned time-series signal based on filtering parameters: when the filtering parameters include a cutoff frequency parameter, a low-pass filter or a band-stop filter can be constructed, and the filter is configured with this cutoff frequency parameter; when the filtering parameters include a filtering threshold, gated suppression or attenuation processing can be performed on the high-frequency band, and the suppression conditions and suppression intensity are controlled by the filtering threshold. The preprocessed signal is then output after the above filtering process.
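For illustration only, the cutoff-frequency case could be realized as a frequency-domain low-pass operation, as sketched below. The brick-wall response (zeroing every component above the cutoff) is a simplification chosen to keep the example short; it stands in for whatever low-pass design is actually used, and the threshold-gated variant is not shown. `cutoff_hz` is assumed to have been determined from the power spectral density beforehand.

```python
import numpy as np

def suppress_high_frequencies(x, fs, cutoff_hz):
    """Low-pass the cleaned signal by zeroing spectral components above cutoff_hz.

    x: cleaned time-series signal; fs: sampling rate in Hz.
    A brick-wall response is an assumption of this sketch, not the application's design.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    gain = np.where(freqs <= cutoff_hz, 1.0, 0.0)
    return np.fft.irfft(spectrum * gain, n=x.size)
```

Because `cutoff_hz` is recomputed from the current signal rather than fixed in advance, the same function yields different pass bands under different operating conditions, which is the sense of "adaptive" used in this embodiment.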
[0036] This embodiment divides the preprocessing process into two stages: "data cleaning" and "adaptive determination of filtering parameters based on power spectral density and filtering". The "adaptive" means that the filtering parameters are determined by the power spectral density characteristics of the current signal, rather than fixed parameters, so that the filtering process can automatically adjust for different operating conditions and different noise spectrum characteristics.
[0037] This embodiment introduces data cleaning and adaptive high-frequency suppression filtering based on power spectral density to reduce the impact of abnormal sampling and high-frequency noise on subsequent windowing and depth diagnostic model inputs, thereby improving the effectiveness and stability of the preprocessed signal.
[0038] In this embodiment, determining the filtering parameters based on the power spectral density of the cleaned time-series signal includes: Calculate the power spectral density of the time series signal after data cleaning; Based on the energy distribution of the power spectral density, the cutoff frequency parameter and / or filter threshold are determined as the filter parameters.
[0039] In one implementation, the power spectral density of the time series signal after data cleaning can be calculated by segmented analysis and averaging: the signal is divided into multiple short segments of a preset length, each segment is processed by a window function and then subjected to spectral analysis to obtain the frequency domain energy distribution of each segment, and then the energy distribution of each segment is averaged to obtain the overall power spectral density curve, so as to reduce the impact of random fluctuations on the spectrum estimation.
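The segmented analysis and averaging described here corresponds to what is commonly called Welch's method. A minimal sketch using only NumPy is shown below; the Hann window, the 50% overlap between short segments, and the segment length `seg_len` are assumptions, since the application mentions only "a window function" and "a preset length".

```python
import numpy as np

def averaged_psd(x, fs, seg_len=1024, overlap=0.5):
    """One-sided power spectral density estimated by averaging windowed segments."""
    x = np.asarray(x, dtype=float)
    step = max(1, int(seg_len * (1.0 - overlap)))
    window = np.hanning(seg_len)
    scale = fs * np.sum(window ** 2)  # density scaling
    spectra = []
    for start in range(0, x.size - seg_len + 1, step):
        seg = x[start:start + seg_len]
        seg = (seg - seg.mean()) * window  # remove the mean, then apply the window
        spectra.append(np.abs(np.fft.rfft(seg)) ** 2 / scale)
    if not spectra:
        raise ValueError("signal is shorter than one segment")
    psd = np.mean(spectra, axis=0)  # averaging reduces random fluctuations
    # Fold negative frequencies into the one-sided estimate (not DC or Nyquist).
    if seg_len % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    return freqs, psd
```

The returned `freqs` and `psd` arrays form the power spectral density curve from which the filtering parameters are subsequently determined.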
[0040] In one implementation, the cutoff frequency parameter can be determined based on the energy distribution of the power spectral density by: identifying the boundary point between the energy concentration region and the energy rapid decay region in the power spectral density curve, and using the frequency corresponding to the boundary point as the cutoff frequency parameter; or by selecting the corresponding frequency as the cutoff frequency parameter at the position where the energy accumulation reaches a preset proportion according to the energy accumulation ratio rule.
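The energy accumulation ratio rule can be sketched as follows. The preset proportion `energy_ratio` (0.95 here) is an illustrative value and is not specified in the application; `freqs` and `psd` are assumed to come from a prior power spectral density estimate.

```python
import numpy as np

def cutoff_from_energy(freqs, psd, energy_ratio=0.95):
    """Frequency at which the cumulative spectral energy first reaches energy_ratio."""
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    total = psd.sum()
    if total <= 0:
        raise ValueError("power spectral density has no energy")
    cumulative = np.cumsum(psd) / total
    # First bin whose cumulative share is at least energy_ratio.
    idx = int(np.searchsorted(cumulative, energy_ratio))
    return float(freqs[min(idx, freqs.size - 1)])
```

A higher `energy_ratio` retains more of the spectrum and moves the cutoff frequency upward; a lower value suppresses more of the high-frequency band.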
[0041] In one implementation, determining the filtering threshold based on the energy distribution of the power spectral density can be achieved by statistically analyzing typical levels of the power spectral density in the high-frequency range, for example, by determining a threshold based on the mean, standard deviation, or quantiles. This threshold is then used as the filtering threshold to determine which frequency components need to be suppressed and how much suppression is needed during subsequent high-frequency gating suppression. The cutoff frequency parameter and the filtering threshold can be used individually or in combination as filtering parameters.
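As an illustrative sketch of the two rules described above, the following code derives a cutoff frequency from the energy accumulation ratio and a filtering threshold from the mean and standard deviation of the high-frequency band. The function names, the 95% ratio and the multiplier k = 3 are assumptions made for the example only.

```python
import numpy as np

def cutoff_by_energy_ratio(freqs, psd, ratio=0.95):
    """Lowest frequency at which the cumulative PSD energy reaches `ratio`."""
    cumulative = np.cumsum(psd)
    cumulative = cumulative / cumulative[-1]
    idx = int(np.searchsorted(cumulative, ratio))
    return float(freqs[min(idx, len(freqs) - 1)])

def high_band_threshold(freqs, psd, cutoff, k=3.0):
    """Threshold = mean + k * std of the PSD values above the cutoff."""
    band = psd[freqs > cutoff]
    if band.size == 0:
        return float("inf")                     # no high band to analyze
    return float(band.mean() + k * band.std())

# Synthetic spectrum: strong energy up to 20 Hz, weak floor above it.
freqs = np.arange(0.0, 101.0)
psd = np.where(freqs <= 20.0, 1.0, 0.01)
cutoff = cutoff_by_energy_ratio(freqs, psd, ratio=0.95)
threshold = high_band_threshold(freqs, psd, cutoff)
```

A quantile of the high-band PSD could replace the mean and standard deviation rule without changing the structure of the sketch.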
[0042] In this embodiment, the implementation path of "determining filter parameters" is as follows: first, obtain the power spectral density curve, and then determine the cutoff frequency parameter and / or filter threshold by analyzing the energy distribution characteristics of the power spectral density, so that the acquisition of filter parameters has a reproducible rule basis.
[0043] This embodiment determines the cutoff frequency parameter and / or filter threshold based on the energy distribution of the power spectral density, which makes the source of the filter parameter clear and has adaptive adjustability, thereby enhancing the interpretability and consistency of the high-frequency suppression filter configuration.
[0044] In this embodiment, the step of dividing the preprocessed signal according to a preset time window to obtain multiple time window signal segments includes: The number of sampling points in a time window is determined based on the preset time window length and the preset sampling rate, wherein the number of sampling points in a time window is used to indicate the number of sampling points contained in each time window signal segment; The number of step sampling points is determined based on the preset step size and the preset sampling rate, wherein the number of step sampling points is used to indicate the offset of the starting position between two adjacent segments; Using the number of sampling points of the specified step size as the sliding step size, consecutive sampling points corresponding to the number of sampling points of the specified time window are sequentially extracted from the preprocessed signal to form multiple initial time window signal segments; wherein, there is an overlapping interval between two adjacent initial time window signal segments; The multiple initial time window signal segments are normalized to obtain multiple time window signal segments.
[0045] In one implementation, the preset sampling rate is configured by the acquisition end and is used to represent the number of sampling points acquired per unit time. The preset time window length is used to represent the duration covered by each time window. The number of sampling points in a time window is determined based on the preset time window length and the preset sampling rate. The number of sampling points in a time window is used to indicate the number of sampling points that each signal segment in a time window should contain, i.e., to limit the length of each segment.
[0046] In one implementation, a preset step size is used to represent the time interval between two adjacent segments. The number of sampling points for the step size is determined based on the preset step size and the preset sampling rate. The number of sampling points for the step size is used to indicate the offset of the starting position between two adjacent segments, that is, the number of sampling points that the starting point moves backward each time a segment is extracted from the preprocessed signal.
[0047] In one implementation, the sliding step size is the number of sampling points per step, and consecutive sampling points corresponding to the number of sampling points per time window are sequentially extracted from the preprocessed signal to form multiple initial time window signal segments. Since the length of the time window is usually greater than the step size, there is an overlap between adjacent initial time window signal segments. If the end of the preprocessed signal is insufficient to form a complete segment, the incomplete segment at the end can be discarded, or zero-padding or mirror padding can be used to complete it, so that the length of each initial time window signal segment is consistent.
[0048] In one implementation, multiple initial time window signal segments are normalized separately to obtain multiple time window signal segments. The normalization process can be performed by removing the mean at the segment level and scaling according to the fluctuation amplitude, or by scaling according to the maximum amplitude value, so that different segments are more consistent in amplitude scale.
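The conversion from durations to sampling points, the sliding extraction and the segment-level normalization can be sketched as follows. This is a minimal example under assumptions: the function names are hypothetical, an incomplete tail is simply discarded (padding is not shown), and the small constant eps is added only to avoid division by zero.

```python
import numpy as np

def sliding_segments(signal, fs, window_s, step_s):
    """Cut a 1-D signal into overlapping, equal-length segments."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(window_s * fs))     # sampling points per time window
    step = int(round(step_s * fs))      # sampling points per step
    if win < 1 or step < 1:
        raise ValueError("window and step must cover at least one sample")
    if len(signal) < win:
        return np.empty((0, win))
    n_seg = (len(signal) - win) // step + 1   # incomplete tail is discarded
    return np.stack([signal[i * step:i * step + win] for i in range(n_seg)])

def normalize_segments(segments, method="zscore", eps=1e-12):
    """Normalize each segment independently (one row per segment)."""
    segments = np.asarray(segments, dtype=float)
    if method == "zscore":              # remove mean, scale by fluctuation
        mean = segments.mean(axis=1, keepdims=True)
        std = segments.std(axis=1, keepdims=True)
        return (segments - mean) / (std + eps)
    if method == "maxabs":              # scale by maximum amplitude
        peak = np.abs(segments).max(axis=1, keepdims=True)
        return segments / (peak + eps)
    raise ValueError(f"unknown method: {method}")

# 10 samples at 10 Hz, 0.4 s window, 0.2 s step: 4 points per window, 2 per step.
initial = sliding_segments(np.arange(10.0), fs=10, window_s=0.4, step_s=0.2)
segments = normalize_segments(initial, method="zscore")
```

With a window of 4 points and a step of 2 points, adjacent segments share 2 sampling points, which is the overlapping interval described in the text.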
[0049] In this embodiment, the "number of sampling points in the time window" determines how many sampling points each segment contains, thus determining the segment length; the "number of sampling points in the step size" determines how many sampling points the segmentation starting point moves each time, thus determining the degree of overlap and update frequency between segments. The "initial time window signal segment" is the segmented, unnormalized segment, while the "time window signal segment" is the normalized set of segments, which is convenient as a unified format input for the deep diagnostic model.
[0050] This embodiment converts the time window length and step size into the number of sampling points based on the sampling rate and divides them in a sliding manner, which can stably obtain time window signal segments with overlapping intervals; combined with segment-level normalization processing, it can improve the consistency of segment input scale, which is convenient for subsequent deep diagnostic model processing.
[0051] In this embodiment, the deep diagnostic model includes a feature extraction subnetwork, a feature enhancement subnetwork, a temporal modeling subnetwork, and a classification subnetwork; The feature extraction subnetwork is used to perform one-dimensional temporal convolution on the time window signal segment to output temporal features; The feature enhancement subnetwork is used to perform multi-scale fusion of the temporal features and output enhanced features; The temporal modeling subnetwork is used to perform temporal modeling on the enhanced features and output a temporal representation; The classification subnetwork is used to output the diagnostic results based on the temporal representation.
[0052] In one implementation, the deep diagnostic model is functionally divided into a feature extraction subnetwork, a feature enhancement subnetwork, a temporal modeling subnetwork, and a classification subnetwork. A time-window signal segment is input to the feature extraction subnetwork, which extracts local temporal patterns from the segment using one-dimensional temporal convolution, outputting temporal features. These temporal features can be represented as a feature sequence or feature map arranged over time.
[0053] The feature enhancement subnetwork receives temporal features, performs multi-scale fusion on them, and outputs enhanced features. Multi-scale fusion can be achieved through parallel convolutional branches, feature aggregation with different convolutional kernel sizes or different receptive fields, so that the enhanced features contain information from different time scales.
[0054] The temporal modeling subnetwork receives enhanced features, performs temporal modeling on these features, and outputs a temporal representation. This temporal representation is used to aggregate and express dynamic changes within a time window or consecutive time windows, and can be a fixed-length vector or an aggregated representation result. The classification subnetwork receives the temporal representation and outputs diagnostic results, including component category and / or health status.
[0055] This embodiment clarifies the module division and data flow of the deep diagnostic model: time window signal segments are processed through feature extraction to obtain temporal features, then through feature enhancement to obtain enhanced features, and finally through temporal modeling to obtain temporal representations. The diagnostic results are then obtained through a classification sub-network. The output objects of each sub-network correspond hierarchically in name and function, which helps to describe the model structure, training methods, and inference processes separately in the specification.
[0056] This embodiment modularizes the deep diagnostic model into a structure of feature extraction, multi-scale enhancement, temporal modeling, and classification output. This makes the model processing chain clear, the implementation path well-defined, and facilitates configuration and expansion for representation learning and classification training at different stages.
[0057] In this embodiment, the feature enhancement subnetwork performs multi-scale interactive fusion of the temporal features and outputs the enhanced features, including: The first convolution operation and the second convolution operation are performed in parallel on the temporal features to obtain the first scale features and the second scale features, wherein the kernel size of the first convolution operation is smaller than the kernel size of the second convolution operation. The first scale feature and the second scale feature are subjected to nonlinear activation processing to obtain the first activation feature and the second activation feature; Perform element-wise interaction operations on the first activation feature and the second activation feature to obtain the interaction feature; The enhanced feature is obtained by fusing the interactive feature with the first activation feature and / or the second activation feature and then performing a third convolution operation.
[0058] In one implementation, the process of multi-scale interactive fusion of temporal features by the feature enhancement subnetwork includes: performing a first convolution operation and a second convolution operation in parallel on the temporal features to obtain first-scale features and second-scale features. The first convolution operation uses a smaller kernel size to extract local detail features within a shorter time range; the second convolution operation uses a larger kernel size to extract overall structural features within a longer time range.
[0059] Subsequently, nonlinear activation processing is applied to the first-scale features and the second-scale features respectively to obtain the first activated features and the second activated features. Then, element-wise interaction operations are performed on the first activated features and the second activated features to obtain the interaction features. Element-wise interaction operations can employ element-wise multiplication or element-wise gating to ensure that information from the two scales is coupled at the same location.
[0060] In one implementation, the interactive features are fused with the first activation features and / or the second activation features before a third convolution operation is performed. The fusion method can be channel concatenation followed by convolution, or weighted summation followed by convolution. The third convolution operation is used to re-encode and reorganize the fused features, outputting enhanced features, which serve as input to the temporal modeling sub-network.
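The data flow of the multi-scale interactive fusion can be illustrated with a framework-free sketch. The kernel sizes (3 and 7), the ReLU activation, channel concatenation as the fusion method and all function names are assumptions for illustration; in practice the weights would be learned rather than random.

```python
import numpy as np

def conv1d_same(x, weight):
    """1-D convolution (cross-correlation form) with 'same' zero padding.

    x: (C_in, T); weight: (C_out, C_in, K) with odd K; returns (C_out, T).
    """
    c_out, _, k = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    out = np.zeros((c_out, t))
    for j in range(k):
        # Contribution of kernel tap j, summed over all input channels.
        out += weight[:, :, j] @ xp[:, j:j + t]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def interactive_enhance(feats, w_small, w_large, w_fuse):
    """Parallel small/large kernels, element-wise interaction, fused conv."""
    a1 = relu(conv1d_same(feats, w_small))      # first activation feature
    a2 = relu(conv1d_same(feats, w_large))      # second activation feature
    inter = a1 * a2                             # element-wise interaction
    fused = np.concatenate([inter, a1, a2], axis=0)   # channel concatenation
    return conv1d_same(fused, w_fuse)           # third convolution

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 32))            # 4 channels, 32 time steps
enhanced = interactive_enhance(
    feats,
    rng.standard_normal((8, 4, 3)) * 0.1,       # small kernel, K = 3
    rng.standard_normal((8, 4, 7)) * 0.1,       # large kernel, K = 7
    rng.standard_normal((8, 24, 1)) * 0.1,      # fusion over 3 * 8 channels
)
```

Because both branches use "same" padding, the two activation features are aligned in time, so the element-wise product couples information from the two scales at the same location.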
[0061] In this embodiment, "element-by-element interactive operation" is used to establish the correspondence between features of different scales, so that multi-scale information is not only superimposed but also reflects interaction; "execute third convolution operation after fusion" is used to integrate the interactive information and the original scale information into a single enhanced feature representation, thereby maintaining the consistency of the input format of the subsequent temporal modeling sub-network.
[0062] This embodiment integrates multi-scale parallel convolution, nonlinear activation, element-wise interaction, and fused third convolution, enabling the enhanced features to simultaneously contain information from different time scales and their interactions, thereby improving the expressive power and structural integrity of the enhanced features.
[0063] In this embodiment, the temporal modeling sub-network includes a Long Short-Term Memory (LSTM) network; the temporal modeling sub-network performs temporal modeling on the enhanced features and outputs a temporal representation, including: The enhanced features obtained from the same water pump unit within a continuous time window signal segment are arranged in chronological order to form an enhanced feature sequence. The enhanced feature sequence is input into the Long Short-Term Memory (LSTM) network, which outputs the temporal representation.
[0064] In one implementation, the temporal modeling subnetwork includes a Long Short-Term Memory (LSTM) network. For the "enhanced features obtained within consecutive time window signal segments," the enhanced feature sequence can be formed as follows: multiple consecutive time window signal segments are selected in chronological order, processed by a feature extraction subnetwork and a feature enhancement subnetwork respectively to obtain corresponding enhanced features, and then these enhanced features are arranged in chronological order to form an enhanced feature sequence.
[0065] After the enhanced feature sequence is input into the LSTM, the LSTM recursively models the sequence information and outputs a temporal representation. The temporal representation can be taken from the output of the LSTM at the last time step, or it can be obtained by aggregating the outputs of multiple LSTM time steps. The temporal representation is then input into the classification subnetwork to output the diagnostic result.
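The recursive modeling and the two ways of obtaining the temporal representation (last time step, or aggregation over time steps) can be sketched with a plain forward pass of a single LSTM layer. The weight layout (input, forget, cell, output gates stacked in that order), the mean as the aggregation and the function names are assumptions for illustration; a trained model would supply the weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(seq, W, U, b):
    """Run one LSTM layer over a sequence.

    seq: (T, D) enhanced feature sequence; W: (4H, D); U: (4H, H); b: (4H,).
    Returns the hidden state at every time step, shape (T, H).
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:hidden])                  # input gate
        f = sigmoid(z[hidden:2 * hidden])        # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell state
        o = sigmoid(z[3 * hidden:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def temporal_representation(seq, W, U, b, mode="last"):
    """Fixed-length vector: last hidden state, or mean over all time steps."""
    states = lstm_forward(seq, W, U, b)
    return states[-1] if mode == "last" else states.mean(axis=0)

rng = np.random.default_rng(0)
sequence = rng.standard_normal((5, 6))   # 5 consecutive windows, 6 features each
W = rng.standard_normal((12, 6)) * 0.1
U = rng.standard_normal((12, 3)) * 0.1
b = np.zeros(12)
representation = temporal_representation(sequence, W, U, b, mode="last")
```

Either output has length H regardless of how many consecutive windows are supplied, which keeps the input format of the classification subnetwork fixed.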
[0066] In one implementation, the number of consecutive time windows can be a preset value. To improve temporal resolution, shorter time windows and smaller step sizes can be used to form denser sequence inputs; to improve stability, longer time windows or longer sequence lengths can be used for modeling.
[0067] In this embodiment, "continuous time window signal segments" emphasizes arranging the enhanced features corresponding to multiple adjacent time windows into a sequence in chronological order, thereby enabling the LSTM input to contain dynamic evolution information across windows. "Temporal representation" is the output of the LSTM after modeling the enhanced feature sequence, used to provide the classification subnetwork with representation information containing time dependencies.
[0068] This embodiment constructs enhanced features within a continuous time window into a sequence and inputs it into an LSTM for time series modeling. This can encode the changing trends and dependencies across time windows into the time series representation, providing richer time-related information for diagnostic results output.
[0069] In this embodiment, the training of the deep diagnostic model includes disentangled representation learning based on the time domain and frequency domain and momentum contrastive learning, and further includes: Encoding networks are constructed in the time domain and frequency domain respectively to encode the time window signal segment and output a trend feature representation and a periodic feature representation respectively; At least two data augmentation views are generated from the same time window signal segment. The query encoding network and the key encoding network are encoding networks with the same structure. One data augmentation view is input into the query encoding network to obtain a query representation, and the other data augmentation view is input into the key encoding network to obtain a key representation. The parameters of the key encoding network are updated using a momentum update method, and a contrastive learning loss is calculated based on the query representation and the key representation to update the parameters of the query encoding network. After completing the momentum contrastive learning, the deep diagnostic model is initialized or updated using the parameters of the encoding network, and supervised training is performed on the deep diagnostic model based on labeled samples.
[0070] In one implementation, encoding networks are constructed in both the time and frequency domains. The time-domain encoding network encodes the time-domain sequence of the signal segment within the time window, outputting a trend feature representation. This trend feature representation characterizes the overall trend information of the signal's slow changes over time. The frequency-domain encoding network encodes the frequency-domain representation of the signal segment within the time window, outputting a periodic feature representation. The frequency-domain representation can be obtained from the time-window signal segment through spectral analysis, while the periodic feature representation characterizes the periodic structural information in the signal related to rotation or periodic excitation. By constructing time-domain and frequency-domain encoding networks separately, trend information and periodic information are learned and expressed in different representation channels.
[0071] In one implementation, at least two data augmentation views are generated from signal segments within the same time window. These data augmentation views can be obtained by applying different random augmentation operations to the same segment, such as adding noise of varying amplitudes, performing random cropping at different locations, applying amplitude scaling, or time perturbation. The query encoding network and the key encoding network are structurally identical encoding networks, used to process the different augmentation views and output representation vectors respectively.
[0072] One data augmentation view is input into the query encoding network to obtain the query representation, and the other data augmentation view is input into the key encoding network to obtain the key representation. A momentum update method is used to update the parameters of the key encoding network, meaning the key encoding network updates its parameters more slowly than the query encoding network, and the historical state of the query encoding network parameters is used as an update reference. A contrastive learning loss is calculated based on the query and key representations, and this loss is used to update the query encoding network parameters, making the representations of the same original segment more similar across different augmentation views and making the representations of different segments more distinguishable.
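The momentum update and the contrastive loss can be sketched as follows. The embodiment does not name a specific loss, so the InfoNCE form with in-batch negatives used here is an assumption, as are the momentum coefficient, the temperature and the function names.

```python
import numpy as np

def momentum_update(key_params, query_params, m=0.999):
    """key <- m * key + (1 - m) * query for every parameter array.

    The key network therefore changes more slowly than the query network.
    """
    for name, q in query_params.items():
        key_params[name] = m * key_params[name] + (1.0 - m) * q
    return key_params

def info_nce_loss(queries, keys, temperature=0.07):
    """Contrastive loss (assumed InfoNCE form).

    queries, keys: (N, D). Row i of each is a view of the same original
    segment (positive pair); other rows of `keys` act as negatives.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Hypothetical single-parameter networks, for illustration only.
query_net = {"w": np.ones(2)}
key_net = {"w": np.zeros(2)}
momentum_update(key_net, query_net, m=0.9)    # key_net["w"] moves 10% toward query
```

The loss is small when each query is closest to the key of its own segment and large otherwise, which is what drives representations of the same segment together and those of different segments apart.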
[0073] After momentum contrastive learning is completed, the deep diagnostic model is initialized or updated using the parameters of the encoding network. Initialization or updating can be achieved by: loading the convolutional layer parameters from the encoding network into the feature extraction and / or feature enhancement subnetworks of the deep diagnostic model as initial parameters, or by incorporating the encoding network as part of the deep diagnostic model and sharing its parameters. Subsequently, the deep diagnostic model is trained under supervision based on labeled samples, enabling the classification subnetwork to output component categories and / or health status as diagnostic results.
[0074] In this embodiment, "disentangled representation learning" outputs trend feature representations and periodic feature representations through time-domain encoding networks and frequency-domain encoding networks, respectively, so that the two types of structural information are learned through different channels during the training phase; "momentum contrastive learning" forms a stable contrastive learning training process through the dual encoding structure of the query encoding network and the key encoding network and the momentum update strategy; "initialization or update and supervised training" is used to transfer the representation capabilities learned in the contrastive learning phase to the deep diagnostic model.
[0075] This embodiment employs time-domain and frequency-domain disentangled representation learning combined with a momentum contrastive learning training strategy to obtain a more complete representation of time window signal segments before supervised training. Then, by initializing or updating the deep diagnostic model using the encoding network parameters and performing supervised training, the adaptability of the deep diagnostic model to trend and periodic structural features and its training stability can be enhanced.
[0076] In one embodiment, a deep learning-based method for water pump acoustic signature recognition and fault diagnosis is provided, including: A one-dimensional temporal convolution feature extraction module, as shown in Figure 4, in which a one-dimensional convolutional network is used to automatically extract representative local features from the original water pump acoustic / vibration time series signal, effectively capturing the time series pattern in the signal.
[0077] The time-frequency domain disentanglement and dual-domain momentum contrastive learning module, as the core mechanism for model training, aims to decouple trend features from periodic features in signals. This module constructs encoders in both the time and frequency domains and utilizes a momentum contrastive learning framework to train the model on unlabeled data. This enables the model to comprehensively analyze the dual-domain degradation information of signals and obtain more robust feature representations.
[0078] The interactive convolutional feature enhancement module captures local fine-grained features and global dependencies of the signal by using convolutional kernels of different sizes in parallel, and uses interactive operations between feature maps (such as element-wise multiplication) to fuse information, ultimately integrating them into a unified feature representation with stronger expressive power.
[0079] The Long Short-Term Memory Network (LSTM) temporal modeling module receives sequence data after feature extraction and enhancement. Through its internal gating mechanism (forget gate, input gate, and output gate), it models the long-term dependencies in the time series, accurately capturing the dynamic evolution pattern of the acoustic features of water pump components.
[0080] The three-indicator joint early warning strategy, as a post-processing and decision-making mechanism, comprehensively considers the state of three indicators: anomaly score (based on the Isolation Forest algorithm), permutation entropy (measuring sequence complexity), and effective value (measuring signal energy). A fault warning is triggered only when at least two indicators simultaneously exceed their adaptive thresholds set based on Z-scores within a continuous time window, thereby significantly reducing the false alarm rate.
[0081] As a further solution in this embodiment: In the data preprocessing stage, an adaptive threshold high-frequency filter is used to process the original signal. As shown in Figure 2, this filter calculates the signal power spectrum and dynamically learns the frequency threshold to filter out high-frequency noise while retaining important trends and low-frequency components, thus improving the quality of subsequent feature extraction. Furthermore, by utilizing transfer learning techniques, the model knowledge trained on specific pump components or operating conditions is transferred to new components or scenarios to enhance the model's generalization ability and reduce the computational cost of repeated training.
[0082] In another aspect of this embodiment, a method for water pump acoustic signature recognition and fault diagnosis based on deep learning includes the following steps: Step 1: Data Acquisition and Preprocessing. Raw vibration signals during operation are acquired using vibration sensors installed on the water pump unit. The signals are then preprocessed, including data cleaning to remove outliers and noise reduction using an adaptive threshold high-frequency filter. This filter dynamically adjusts its cutoff frequency based on the signal power spectrum to effectively preserve key components of the signal.
[0083] Step 2: Time window slicing and formatting. The preprocessed continuous vibration signal is sliced into time windows of fixed length to form a series of equal-length multidimensional time series segments. Each segment contains data from multiple sensors within a specific time period, serving as the basic input unit for the model.
[0084] Step 3: Dual-domain feature extraction and enhancement. The time window data is input into the feature extraction network. First, local features are initially extracted using a one-dimensional temporal convolutional network. Then, the features are fed into an interactive convolutional module. As shown in Figure 3, this module extracts fine-grained local features and global context features through parallel small convolutional kernels and large convolutional kernels, respectively, and then performs interactive fusion to obtain an enhanced feature sequence.
[0085] Step 4: Temporal Feature Modeling and Component Classification. The enhanced feature sequence obtained in Step 3 is input into a Long Short-Term Memory (LSTM) network. The LSTM network utilizes its gating mechanism to learn long-term dependencies in the sequence, capturing the dynamic changes in voiceprint features. Finally, the final output of the LSTM network is fed into a fully connected layer for classification, identifying and outputting the pump component type or health status corresponding to the current time window.
[0086] Step 5: Joint Early Warning and Fault Diagnosis. Three indicators are calculated from the continuous output or intermediate features of the model: anomaly score, permutation entropy, and effective value. The mean and variance of each indicator are calculated using data collected while the bearing is in a healthy condition as a benchmark. During the monitoring phase, the Z-score of each indicator is calculated in real time, and thresholds are set. A sliding window is used to scan the time axis. When the Z-scores of at least two indicators simultaneously exceed their thresholds within the same window, a joint early warning is triggered, indicating a potential component failure.
[0087] In this embodiment, the vibration acceleration sensor collects the acoustic signals of key components such as the water pump bearing. The signal path is: component vibration → sensor → signal conditioner → data acquisition card → host computer.
[0088] The acquired raw signal is preprocessed by the data processing program in the host computer, including the following steps: the signal is denoised by using an adaptive threshold high-frequency filter. First, the power spectrum of the signal is calculated to identify the dominant frequency components. Then, high-frequency noise with power below θ is filtered out by a learnable threshold θ, while retaining the trend and the main low-frequency components.
[0089] The filtered digital signal is sliced into time-window segments to form fixed-length multidimensional time series segments. The sliced data is then input into a voiceprint recognition model, which performs the following operations sequentially: initial feature extraction is performed using a one-dimensional CNN; feature enhancement is achieved through the interactive convolutional module, in which features are processed in parallel using Conv1 (small kernel) and Conv2 (large kernel), interaction is performed after activation through element-wise multiplication (A1 = (Conv1(S′)) ⊙ (Conv2(S′)), A2 = (Conv2(S′)) ⊙ (Conv1(S′))), and, after merging, the enhanced features are output via Conv3.
[0090] The enhanced feature sequence is input into the LSTM network. The LSTM models the temporal dependency through its forget gate, input gate, and output gate, and finally outputs features for component state classification.
[0091] The system obtains preliminary classification or state identification results of components from the fully connected layer. Simultaneously, based on model output or features over a period of time, the system calculates the three indicators required for the joint early warning strategy. The anomaly score is calculated using the Isolation Forest algorithm.
[0092] Calculate the permutation entropy of a time series.
[0093] Calculate the effective value of the signal within the window.
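As an illustrative sketch of the two signal-based indicators, the following code computes permutation entropy from ordinal patterns and the effective value as the root mean square of a window. The embedding order of 3, the delay of 1, normalization to the range 0 to 1 and the function names are assumptions, since the embodiment does not fix these parameters.

```python
import math
from collections import Counter

def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal patterns found in `series`.

    Ties are broken by position (stable sort), which is one common convention.
    """
    n = len(series) - (order - 1) * delay
    if n <= 0:
        raise ValueError("series too short for the chosen order and delay")
    patterns = Counter()
    for i in range(n):
        window = [series[i + j * delay] for j in range(order)]
        # Ordinal pattern: positions of the window sorted by their values.
        patterns[tuple(sorted(range(order), key=window.__getitem__))] += 1
    entropy = -sum((c / n) * math.log(c / n) for c in patterns.values())
    if normalize:
        entropy /= math.log(math.factorial(order))   # maximum possible entropy
    return entropy

def effective_value(segment):
    """Root mean square of the samples in one time window."""
    return math.sqrt(sum(v * v for v in segment) / len(segment))

pe_regular = permutation_entropy([1, 2, 3, 4, 5, 6, 7, 8])   # single pattern
rms_value = effective_value([1.0, -1.0, 1.0, -1.0])
```

A strictly increasing series contains only one ordinal pattern, so its permutation entropy is zero; more irregular series yield values closer to one.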
[0094] The system standardizes the indicators calculated in real time into Z-scores, using the mean and variance of each indicator learned from healthy-state data. A threshold is set; when the Z-scores of at least two indicators within a sliding window exceed the threshold, a fault warning is generated.
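The Z-score standardization and the joint decision can be sketched as follows. Several details are assumptions for illustration: the indicator names, the threshold of 3, the window of 5 samples, counting only upward exceedances, and the reading that an indicator counts when it exceeds the threshold at least once within the sliding window.

```python
import statistics
from collections import deque

def fit_baseline(healthy):
    """healthy: dict of indicator name -> values recorded in a healthy state."""
    return {name: (statistics.fmean(v), statistics.pstdev(v))
            for name, v in healthy.items()}

def z_scores(sample, baseline, eps=1e-12):
    """Standardize one sample (dict of indicator values) against the baseline."""
    return {name: (sample[name] - mu) / (sd + eps)
            for name, (mu, sd) in baseline.items()}

def joint_warning(samples, baseline, threshold=3.0, window=5, min_indicators=2):
    """Return one flag per sample: True when at least `min_indicators`
    indicators exceeded the threshold within the trailing window."""
    recent = deque(maxlen=window)
    flags = []
    for sample in samples:
        z = z_scores(sample, baseline)
        recent.append({name for name, value in z.items() if value > threshold})
        exceeded = set().union(*recent)
        flags.append(len(exceeded) >= min_indicators)
    return flags

healthy = {name: [0.0, 1.0, 0.0, 1.0]
           for name in ("anomaly_score", "permutation_entropy", "effective_value")}
baseline = fit_baseline(healthy)
normal = {"anomaly_score": 0.5, "permutation_entropy": 0.5, "effective_value": 0.5}
stream = [
    normal,
    {**normal, "effective_value": 3.0},        # one indicator high: no warning
    {**normal, "permutation_entropy": 3.0},    # second indicator within window
    normal,
]
flags = joint_warning(stream, baseline, window=2)
```

Requiring agreement between at least two indicators means that a transient spike in a single indicator does not by itself raise a warning.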
[0095] To verify the effectiveness of the method, the diagnostic results of this method were compared with those of human experience-based diagnosis or high-precision laboratory analysis. The comparison results show that the method described in this embodiment can significantly reduce the false alarm rate while ensuring a high recall rate, and achieve timely early warning of early faults. Through transfer learning, the model trained by this method can quickly adapt to different models or operating conditions of water pump units, demonstrating strong generalization ability.
[0096] This embodiment constructs an end-to-end intelligent diagnostic framework that integrates "time-frequency decoupled feature learning", "multi-scale feature interaction enhancement" and "multi-index dynamic joint decision-making" to solve the three core problems of insufficient feature extraction, weak model generalization ability and low early warning reliability in existing water pump acoustic signature diagnostic methods.
[0097] This embodiment constructs a high-precision, highly robust, low-dependency, and practical intelligent water pump acoustic signature fault diagnosis system through a series of collaborative technological innovations. Specifically, the benefits and overall technical effects of each improvement are as follows: Time-frequency domain disentanglement and dual-domain momentum contrastive learning: The advantage of this improvement is that it enables a deeper and more essential feature representation of the voiceprint signal. It can automatically learn and separate trend and periodic features related to component degradation from unlabeled or minimally labeled data, significantly improving the feature representation capability and adaptability to complex working conditions, while reducing reliance on large amounts of manually labeled data and lowering data costs.
[0098] Interactive convolution module: The benefit of this improvement is that it enhances the model's ability to capture multi-scale features of the voiceprint signal. By fusing local fine-grained features and global contextual dependencies, the model can not only keenly perceive small local abrupt changes in the signal (such as early fault shocks), but also understand its evolution pattern throughout the entire time series, thereby improving the comprehensiveness and accuracy of feature extraction.
[0099] The three-indicator joint early warning strategy significantly improves the reliability of fault early warning. By integrating three indicators with different physical meanings—anomaly score, permutation entropy, and effective value—for joint decision-making, it effectively filters out false alarms caused by single indicators due to on-site noise, transient interference, etc., achieving a high fault detection rate with a low false alarm rate, making the early warning results more credible.
[0100] Adaptive threshold high-frequency filter: The advantage of this improvement is that it enables data-driven intelligent preprocessing. It can dynamically adjust the filtering threshold based on the specific signal's spectral characteristics, thereby removing high-frequency noise while better preserving low-frequency and trend components useful for diagnosis, thus improving the input signal quality of subsequent feature extraction modules.
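One possible reading of this data-driven filtering is sketched below, assuming NumPy is available. The cutoff frequency is chosen as the lowest frequency below which a given fraction of the spectral energy lies, and everything above it is removed. The 95% energy fraction, the periodogram estimate of the power spectral density, the brick-wall FFT filter, and the function names are all assumptions made for illustration; the application states only that the filtering parameters follow from the energy distribution of the power spectral density.

```python
import numpy as np


def energy_cutoff_hz(signal, fs, energy_fraction=0.95):
    """Lowest frequency below which `energy_fraction` of the energy lies."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                          # ignore the DC offset
    power = np.abs(np.fft.rfft(x)) ** 2       # unnormalized periodogram
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    total = power.sum()
    if total == 0.0:                          # constant signal: nothing to keep
        return 0.0
    cumulative = np.cumsum(power) / total
    idx = int(np.searchsorted(cumulative, energy_fraction))
    return float(freqs[min(idx, freqs.size - 1)])


def suppress_high_frequencies(signal, fs, energy_fraction=0.95):
    """Zero every FFT bin above the data-driven cutoff (brick-wall low-pass)."""
    x = np.asarray(signal, dtype=float)
    cutoff = energy_cutoff_hz(x, fs, energy_fraction)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[freqs > cutoff] = 0.0
    return np.fft.irfft(spectrum, n=x.size), cutoff
```

In practice a smoother filter (for example a Butterworth low-pass with the same cutoff) may be preferable to a brick-wall filter, which can introduce ringing around transients.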
[0101] Application of transfer learning technology: The benefit of this improvement is a significant increase in the model's versatility and deployment efficiency. It allows a diagnostic model trained on one component or operating condition to be quickly transferred to new, data-scarce components or scenarios, avoiding the huge computational and time costs of repeated training and accelerating the engineering implementation of the model.
[0102] In one embodiment, a fault diagnosis device for a water pump unit is provided, comprising the following modules:
Acquisition module: used to acquire time-series signals collected during the operation of the water pump unit, the time-series signals including acoustic signature signals and / or vibration signals;
Preprocessing module: used to preprocess the time-series signals to obtain a preprocessed signal;
Segmentation module: used to segment the preprocessed signal according to a preset time window to obtain multiple time window signal segments;
Diagnostic module: used to input the time window signal segments into the deep diagnostic model to obtain diagnostic results, which include component category and / or health status;
Output module: used to generate early warning indicators based on the preprocessed signal and / or the diagnostic results;
First determination module: used to determine the permutation entropy index and the effective value index based on the preprocessed signal, wherein the effective value index is the root mean square of the preprocessed signal within the time window signal segment;
Second determination module: used to determine an anomaly score based on the preprocessed signal and / or intermediate features of the deep diagnostic model, wherein the anomaly score is output by an isolation forest model;
Third determination module: used to determine, based on health baseline data, the mean and standard deviation of each of the anomaly score, the permutation entropy index, and the effective value index, and to perform Z-score standardization on the anomaly score, the permutation entropy index, and the effective value index obtained in real time to obtain standardized early warning indicators;
Early warning module: used to perform threshold determination on the standardized early warning indicators within a sliding window range, and to obtain fault early warning information when at least two of the standardized early warning indicators meet the corresponding threshold conditions.
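The standardization and joint decision performed by the third determination module and the early warning module can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `JointEarlyWarning`, the two-sided test on the absolute Z-score, and the rule that an indicator counts as triggered when it exceeds its threshold at least `min_hits` times within the sliding window are hypothetical choices. The application specifies only Z-score standardization against a health baseline, threshold determination within a sliding window, and a warning when at least two indicators meet their conditions.

```python
from collections import deque
from statistics import mean, pstdev


class JointEarlyWarning:
    """Z-score each indicator against a healthy baseline, then vote."""

    def __init__(self, baseline, thresholds, window=5, min_hits=3,
                 min_indicators=2):
        # baseline: indicator name -> list of values recorded in healthy state
        self.stats = {k: (mean(v), pstdev(v)) for k, v in baseline.items()}
        self.thresholds = thresholds          # indicator name -> |z| limit
        self.min_hits = min_hits              # assumed per-window hit count
        self.min_indicators = min_indicators  # "at least two" in the text
        self.history = {k: deque(maxlen=window) for k in baseline}

    def zscore(self, name, value):
        mu, sigma = self.stats[name]
        return (value - mu) / sigma if sigma > 0 else 0.0

    def update(self, sample):
        """sample: indicator name -> real-time value. True means warning."""
        triggered = 0
        for name, value in sample.items():
            exceeded = abs(self.zscore(name, value)) > self.thresholds[name]
            self.history[name].append(exceeded)
            if sum(self.history[name]) >= self.min_hits:
                triggered += 1
        return triggered >= self.min_indicators
```

Requiring repeated exceedances inside the window suppresses one-off spikes, and requiring agreement between indicators suppresses disturbances that affect only one of them.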
[0103] The specific implementation details of each module can be found in the above description of the fault diagnosis method for water pump units, and will not be repeated here.
[0104] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
Claims
1. A fault diagnosis method for a water pump unit, characterized in that the method comprises:
Acquiring time-series signals collected during the operation of the water pump unit, wherein the time-series signals include acoustic signature signals and / or vibration signals;
Preprocessing the time-series signals to obtain a preprocessed signal;
Dividing the preprocessed signal into multiple time window signal segments according to a preset time window;
Inputting the time window signal segments into a deep diagnostic model to obtain diagnostic results, which include component category and / or health status;
Determining a permutation entropy index and an effective value index based on the preprocessed signal, wherein the effective value index is the root mean square of the preprocessed signal within the time window signal segment;
Determining an anomaly score based on the preprocessed signal and / or intermediate features of the deep diagnostic model, wherein the anomaly score is output by an isolation forest model;
Determining, based on health baseline data, the mean and standard deviation of each of the anomaly score, the permutation entropy index, and the effective value index, and performing Z-score standardization on the anomaly score, the permutation entropy index, and the effective value index obtained in real time to obtain standardized early warning indicators;
Performing threshold determination on the standardized early warning indicators within a sliding window range, and obtaining fault early warning information when at least two of the standardized early warning indicators meet the corresponding threshold conditions.
2. The method according to claim 1, characterized in that, The preprocessing of the time series signal to obtain a preprocessed signal includes: The time series signal is cleaned to obtain a cleaned time series signal. Based on the power spectral density of the time series signal after data cleaning, the filtering parameters are determined; Based on the filtering parameters, the time series signal after data cleaning is subjected to adaptive high-frequency suppression filtering to obtain the preprocessed signal.
3. The method according to claim 2, characterized in that, The step of determining the filtering parameters based on the power spectral density of the cleaned time series signal includes: Calculate the power spectral density of the time series signal after data cleaning; Based on the energy distribution of the power spectral density, the cutoff frequency parameter and / or filter threshold are determined as the filter parameters.
4. The method according to claim 1, characterized in that the step of dividing the preprocessed signal into multiple time window signal segments according to a preset time window includes:
Determining the number of time window sampling points based on the preset time window length and the preset sampling rate, wherein the number of time window sampling points indicates the number of sampling points contained in each time window signal segment;
Determining the number of step sampling points based on the preset step size and the preset sampling rate, wherein the number of step sampling points indicates the offset between the starting positions of two adjacent segments;
Using the number of step sampling points as the sliding stride, sequentially extracting from the preprocessed signal runs of consecutive sampling points whose length equals the number of time window sampling points, to form multiple initial time window signal segments, wherein two adjacent initial time window signal segments overlap;
Normalizing the multiple initial time window signal segments to obtain the multiple time window signal segments.
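The segmentation steps of claim 4 can be sketched as follows. The function name `segment_signal` is hypothetical, and per-segment zero-mean, unit-variance scaling is an assumed choice of normalization, since the claim does not state which normalization is used.

```python
from statistics import mean, pstdev


def segment_signal(signal, fs, window_s, step_s):
    """Split a signal into overlapping, individually normalized windows.

    fs: sampling rate in Hz; window_s and step_s: lengths in seconds.
    """
    win = int(round(window_s * fs))    # number of time window sampling points
    step = int(round(step_s * fs))     # number of step sampling points
    if win <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    segments = []
    for start in range(0, len(signal) - win + 1, step):
        seg = signal[start:start + win]
        mu, sigma = mean(seg), pstdev(seg)
        # Assumed normalization: z-score within the segment.
        segments.append(
            [(x - mu) / sigma if sigma > 0 else 0.0 for x in seg]
        )
    return segments
```

Adjacent segments overlap whenever `step_s` is smaller than `window_s`; trailing samples that do not fill a complete window are dropped in this sketch.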
5. The method according to claim 1, characterized in that, The deep diagnostic model includes a feature extraction subnetwork, a feature enhancement subnetwork, a temporal modeling subnetwork, and a classification subnetwork. The feature extraction subnetwork is used to perform one-dimensional temporal convolution on the time window signal segment to output temporal features; The feature enhancement subnetwork is used to perform multi-scale fusion of the temporal features and output enhanced features; The temporal modeling subnetwork is used to perform temporal modeling on the enhanced features and output a temporal representation; The classification subnetwork is used to output the diagnostic results based on the temporal representation.
6. The method according to claim 5, characterized in that, The feature enhancement subnetwork performs multi-scale interactive fusion of the temporal features and outputs the enhanced features, including: The first convolution operation and the second convolution operation are performed in parallel on the temporal features to obtain the first scale features and the second scale features, wherein the kernel size of the first convolution operation is smaller than the kernel size of the second convolution operation. The first scale feature and the second scale feature are subjected to nonlinear activation processing to obtain the first activation feature and the second activation feature; Perform element-wise interaction operations on the first activation feature and the second activation feature to obtain the interaction feature; The enhanced feature is obtained by fusing the interactive feature with the first activation feature and / or the second activation feature and then performing a third convolution operation.
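The data flow of claim 6 can be illustrated with a single-channel NumPy sketch. It is not a trained network: the kernels are passed in as fixed arrays, and ReLU as the activation, element-wise multiplication as the interaction, and addition with the first activation feature as the fusion are all assumptions chosen for illustration. Note that `np.convolve` flips the kernel, which differs from the cross-correlation used in most deep learning frameworks unless the kernel is symmetric.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def interactive_conv(features, small_kernel, large_kernel, fuse_kernel):
    """Two parallel convolutions, element-wise interaction, then fusion."""
    f = np.asarray(features, dtype=float)
    if len(small_kernel) >= len(large_kernel):
        raise ValueError("first kernel must be smaller than the second")
    # Parallel branches at two scales, each followed by an activation.
    a1 = relu(np.convolve(f, small_kernel, mode="same"))
    a2 = relu(np.convolve(f, large_kernel, mode="same"))
    # Element-wise interaction (assumed: multiplication).
    interaction = a1 * a2
    # Fuse with the first activation feature (assumed: addition),
    # then apply the third convolution.
    fused = interaction + a1
    return np.convolve(fused, fuse_kernel, mode="same")
```

With a small kernel the first branch responds to sharp local events, while the larger kernel aggregates context, so the product emphasizes positions where both scales respond.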
7. The method according to claim 5, characterized in that, The temporal modeling subnetwork includes a Long Short-Term Memory (LSTM) network; the temporal modeling subnetwork performs temporal modeling on the enhanced features and outputs a temporal representation, including: The enhanced features obtained from the same water pump unit within a continuous time window signal segment are arranged in chronological order to form an enhanced feature sequence. The enhanced feature sequence is input into the Long Short-Term Memory (LSTM) network, which outputs the temporal representation.
8. The method according to claim 5, characterized in that the training of the deep diagnostic model includes disentangled representation learning based on the time domain and the frequency domain and momentum contrastive learning, and includes:
Constructing encoding networks in the time domain and the frequency domain respectively to encode the time window signal segment and to output a trend feature representation and a periodic feature representation respectively;
Generating at least two data augmentation views from the same time window signal segment, wherein the query encoding network and the key encoding network are encoding networks with the same structure, one data augmentation view is input into the query encoding network to obtain a query representation, and the other data augmentation view is input into the key encoding network to obtain a key representation;
Updating the parameters of the key encoding network using a momentum update method, and calculating a contrastive learning loss based on the query representation and the key representation to update the parameters of the query encoding network;
After completing the momentum contrastive learning, initializing or updating the deep diagnostic model using the parameters of the encoding network, and performing supervised training on the deep diagnostic model based on labeled samples.
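The two core operations of the momentum contrastive learning in claim 8 can be sketched in plain Python. The momentum update is written as an exponential moving average of the query parameters. The loss is shown as InfoNCE over cosine similarities with a set of negative keys, which is an assumption: the claim refers only to "a contrastive learning loss" and does not mention negative samples. The function names, the momentum coefficient, and the temperature are illustrative.

```python
import math


def momentum_update(key_params, query_params, m=0.999):
    """Key encoder update: key <- m * key + (1 - m) * query."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """Assumed loss: cross-entropy of picking the positive key."""
    logits = [cosine(query, positive_key) / temperature]
    logits += [cosine(query, n) / temperature for n in negative_keys]
    # Numerically stable log-sum-exp over all candidate keys.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]
```

Only the query encoder receives gradients from the loss; the key encoder follows it slowly through `momentum_update`, which keeps the key representations consistent across training steps.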
9. A fault diagnosis device for a water pump unit, characterized in that it comprises:
Acquisition module: used to acquire time-series signals collected during the operation of the water pump unit, the time-series signals including acoustic signature signals and / or vibration signals;
Preprocessing module: used to preprocess the time-series signals to obtain a preprocessed signal;
Segmentation module: used to segment the preprocessed signal according to a preset time window to obtain multiple time window signal segments;
Diagnostic module: used to input the time window signal segments into the deep diagnostic model to obtain diagnostic results, which include component category and / or health status;
Output module: used to generate early warning indicators based on the preprocessed signal and / or the diagnostic results;
First determination module: used to determine the permutation entropy index and the effective value index based on the preprocessed signal, wherein the effective value index is the root mean square of the preprocessed signal within the time window signal segment;
Second determination module: used to determine an anomaly score based on the preprocessed signal and / or intermediate features of the deep diagnostic model, wherein the anomaly score is output by an isolation forest model;
Third determination module: used to determine, based on health baseline data, the mean and standard deviation of each of the anomaly score, the permutation entropy index, and the effective value index, and to perform Z-score standardization on the anomaly score, the permutation entropy index, and the effective value index obtained in real time to obtain standardized early warning indicators;
Early warning module: used to perform threshold determination on the standardized early warning indicators within a sliding window range, and to obtain fault early warning information when at least two of the standardized early warning indicators meet the corresponding threshold conditions.