A fault detection method for engineering machinery based on Dirichlet distribution weight fitting
By fitting the weight matrix of a memory autoencoder with a Dirichlet distribution, the method avoids the autoencoder overfitting that causes misjudgment in existing techniques and detects faults in engineering machinery efficiently. The reported detection accuracy is 100% on the Case Western Reserve University bearing dataset.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2023-09-20
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies cannot effectively avoid overfitting in memory autoencoders during fault detection in engineering machinery, which can lead to misjudgments of abnormal samples.
During the testing phase, instead of using a decoder, the weight matrix is fitted using a Dirichlet distribution, and the anomaly score is obtained using the maximum log-likelihood. A threshold is then set to determine whether the rotating machinery has malfunctioned.
It effectively avoids overfitting of the autoencoder and detects faults in engineering machinery accurately, with a reported detection accuracy of 100%.
Figure CN117494016B_ABST
Description
Technical Field
[0001] This invention belongs to the field of intelligent fault diagnosis of engineering machinery and specifically concerns a fault detection method for engineering machinery based on Dirichlet distribution weight fitting. The method fits the weight matrix of a memory autoencoder with a Dirichlet distribution. In the testing phase it uses neither the decoder nor the mean square error as the anomaly score, which in theory avoids overfitting of the autoencoder. Experiments show that the method can effectively detect whether engineering machinery has developed a fault.
Background Technology
[0002] Mechanical systems play an indispensable role in industrialization, with rotating machinery accounting for the majority. A key component of rotating machinery is the gearbox. Due to the harsh industrial environment and its enclosed working conditions, gearbox maintenance is difficult, leading to frequent gearbox failures in rotating machinery, each potentially resulting in significant financial and productivity losses. Therefore, researching and developing data-driven methods and condition monitoring technologies to achieve rapid, reliable, and high-quality automatic diagnostics is essential.
[0003] The memory autoencoder learns the typical features of normal samples through an added memory module. During the testing phase, the latent variables of both normal and abnormal samples can only be composed from these typical features of normal samples. Therefore, when the latent variables are passed through the decoder, the reconstructed output is closer to normal samples. This ensures that abnormal data has a relatively large reconstruction error and avoids the drawback that an autoencoder with too much expressive power can also reconstruct abnormal samples well.
[0004] However, it is difficult to select a network that captures the latent features of the data effectively without over-expressing them, so overfitting remains hard to avoid. Therefore, this invention proposes to skip the decoder stage and directly analyze the weight matrix obtained for each data sample to obtain the anomaly score. Experiments show that this invention can effectively detect whether engineering machinery has malfunctioned.
[0005] This invention provides a fault detection method for engineering machinery based on Dirichlet distribution weight fitting. When rotating machinery malfunctions, its vibration data differs markedly from the data recorded under normal operating conditions. The method can therefore be used to detect faults in the rotating equipment of engineering machinery.
Summary of the Invention
[0006] Based on the above-mentioned problem background, the present invention provides a fault detection method for engineering machinery based on Dirichlet distribution weight fitting. Specifically, in the testing phase, a decoder is not used. Instead, the weight matrix is fitted using the Dirichlet distribution. An anomaly score is obtained based on the maximum log-likelihood. By setting a threshold, it can be determined whether the rotating part has an anomaly.
[0007] The specific technical solution of this invention is as follows:
[0008] A fault detection method for engineering machinery based on Dirichlet distribution weight fitting, the method comprising the following steps:
[0009] S1. In the data preprocessing stage, vibration signals of engineering machinery components are collected by accelerometers. The complete signal recorded under variable working conditions is segmented. Each segment is normalized and then subjected to a continuous wavelet transform to obtain a coefficient matrix. The coefficient matrix is visualized as a continuous wavelet transform time-frequency diagram, which is used as input data.
[0010] S2. In the network initialization phase, both the encoder and the decoder adopt a two-dimensional convolutional neural network structure, and the network parameters and the size of the memory module are initialized. The memory module is a learnable parameterized matrix, i.e., the memory module matrix.
[0011] S3. In the network training phase, the normal samples are input into the encoder to obtain latent variables. The similarity between the latent variables and the memory module is calculated and passed through Softmax to obtain the weight matrix. The memory module matrix is then weighted by the weight matrix and summed to obtain new latent variables, which are passed through the decoder to obtain the reconstructed signal. The information entropy of the weight matrix and the mean square error between the signals before and after reconstruction are used as the loss function for training.
[0012] S4. In the network testing phase, the weight matrix obtained from normal data is fitted with the Dirichlet distribution, and the log-likelihood function is defined as the anomaly score. A threshold is set appropriately to detect faults in rotating machinery: samples with an anomaly score greater than the threshold are considered fault signals, while those with a score below the threshold are considered normal signals.
[0013] Furthermore, the data preprocessing process in S1 is as follows:
[0014] For each segmented signal $x_k$, the max-min normalization is performed as follows:
[0015] $$\tilde{x}_k = \frac{x_k - \mathrm{mean}(x_k)}{\max(x_k) - \min(x_k)} \tag{1}$$
[0016] where $\tilde{x}_k$ is the normalized data of the input $x_k$, $\mathrm{mean}(x_k)$ is the mean of the input signal, and $\max(x_k)$ and $\min(x_k)$ are the maximum and minimum values of the input signal, respectively. The continuous wavelet transform of the normalized signal is performed as follows:
[0017] $$w_k(s, \tau) = \frac{1}{\sqrt{s}} \int_{-\infty}^{+\infty} \tilde{x}_k(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{2}$$
[0018] where $s$ and $\tau$ are the scale and translation parameters, respectively, $t$ is time, $\psi(\cdot)$ is the wavelet basis function with complex conjugate $\psi^{*}$, and $w_k$ is the resulting two-dimensional wavelet time-frequency matrix.
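The preprocessing in S1 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the Ricker (Mexican hat) wavelet, the scale list and the direct-summation discretization are all hypothetical choices, since the source does not name a wavelet basis or a discretization.

```python
import math

def center_minmax_normalize(x):
    """Subtract the mean and divide by the range (max - min) of the segment."""
    mean = sum(x) / len(x)
    span = max(x) - min(x)
    if span == 0:
        return [0.0 for _ in x]  # constant segment: nothing to scale
    return [(v - mean) / span for v in x]

def ricker(u):
    """Real-valued Ricker wavelet; an assumed stand-in for the wavelet basis psi."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)

def cwt(x, scales, dt=1.0):
    """Discretized continuous wavelet transform by direct summation.

    Rows index the scale s, columns index the translation tau (in samples).
    """
    n = len(x)
    coeffs = []
    for s in scales:
        row = []
        for tau in range(n):
            acc = sum(x[t] * ricker((t - tau) * dt / s) for t in range(n))
            row.append(acc * dt / math.sqrt(s))
        coeffs.append(row)
    return coeffs

# Toy segment: a short sine wave standing in for a vibration signal.
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
normalized = center_minmax_normalize(signal)
matrix = cwt(normalized, scales=[1.0, 2.0, 4.0])
```

The direct summation costs O(n^2) per scale, which is acceptable for a toy segment; for segments of several thousand points an FFT-based transform from a signal-processing library would be the more practical choice.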
[0019] Furthermore, the network training process in S3 is as follows:
[0020] First, let $Z$ be the latent variable obtained by the encoder from the input signal $x_k$, with $C$ channels, and let $z_i$ denote the latent variable vector of the $i$-th pixel. The memory module matrix $m$ has size $N \times C$, where $N$ is a given hyperparameter. The similarity between the latent variable vector of the $i$-th pixel and the memory module matrix is calculated as follows:
[0021] $$w_i = [\langle z_i, m_1\rangle, \ldots, \langle z_i, m_j\rangle, \ldots, \langle z_i, m_N\rangle] \tag{3}$$
[0022] where $w_i$ is the similarity vector between the latent variable $z_i$ and the memory module, and $\langle z_i, m_j\rangle$ is the inner product of the latent variable $z_i$ and the memory vector $m_j$, which serves as the similarity. Softmax normalization is then applied to $w_i$ as follows:
[0023] $$\hat{w}_{ij} = \frac{\exp(w_{ij})}{\sum_{j'=1}^{N} \exp(w_{ij'})} \tag{4}$$
[0024] where $w_{ij}$ is the similarity between the $i$-th latent variable and the $j$-th memory vector, and $\hat{w}_{ij}$ is the corresponding normalized weight. The memory vectors are weighted by the normalized weights and accumulated to obtain the new latent variable:
[0025] $$\hat{z}_i = \sum_{j=1}^{N} \hat{w}_{ij}\, m_j \tag{5}$$
[0026] Information entropy is introduced as a constraint as follows:
[0027] $$E(\hat{w}_i) = -\sum_{j=1}^{N} \hat{w}_{ij} \log \hat{w}_{ij} \tag{6}$$
[0028] where $E(\hat{w}_i)$ is the information entropy of $\hat{w}_i$. The final loss function $L$ is:
[0029] $$L = \frac{1}{K} \sum_{k=1}^{K} \left( \lVert x_k - \hat{x}_k \rVert_2^2 + \alpha\, E(W_k) \right) \tag{7}$$
[0030] where $K$ is the batch size, $\alpha$ is the loss balancing parameter, $W_k$ and $\hat{x}_k$ are the weight matrix and the reconstructed sample of the input $x_k$, and $E(W_k)$ is the information entropy of the weight matrix, accumulated over its rows $\hat{w}_i$.
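The memory addressing and loss terms described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, plain Python lists stand in for tensors, the entropy is summed over pixels (an assumption about how the per-pixel entropies are aggregated), and the default balancing value 0.02 is the one quoted in the embodiment.

```python
import math

def softmax(v):
    """Numerically stable softmax of a list of similarities."""
    peak = max(v)
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def address_memory(z, memory):
    """z: T latent vectors of length C; memory: N memory vectors of length C.

    Returns the T x N weight matrix and the T new latent vectors.
    """
    weights, new_latents = [], []
    for z_i in z:
        # Inner-product similarity between z_i and every memory vector.
        sims = [sum(a * b for a, b in zip(z_i, m_j)) for m_j in memory]
        w_hat = softmax(sims)
        # New latent vector: memory vectors weighted by w_hat and summed.
        z_hat = [sum(w * m_j[c] for w, m_j in zip(w_hat, memory))
                 for c in range(len(z_i))]
        weights.append(w_hat)
        new_latents.append(z_hat)
    return weights, new_latents

def entropy(w_hat, eps=1e-12):
    """Information entropy of one normalized weight vector."""
    return -sum(w * math.log(w + eps) for w in w_hat)

def sample_loss(x, x_rec, weights, alpha=0.02):
    """Mean square reconstruction error plus alpha times the summed entropy."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)
    return mse + alpha * sum(entropy(w) for w in weights)

# One latent vector that is strongly aligned with the first memory vector.
weights, new_latents = address_memory([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Minimizing the entropy term pushes each weight vector toward a few dominant memory vectors, which is the sparsity effect the entropy constraint is meant to produce.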
[0031] Furthermore, the specific steps in S4 for fitting the weight matrix using the Dirichlet distribution to obtain the anomaly scores are as follows:
[0032] Assume the latent variable $Z$ obtained by the encoder has size $T \times C$, where $T$ is the size of the latent variable (the number of latent variable vectors) and $C$ is the number of channels, so that the weight matrix $W$ of each data sample has size $T \times N$. The anomaly score $n_s$ is defined from the maximum log-likelihood of the weight matrix:
[0033]
[0034] where $W_i$ is the vector of normalized weights between the $i$-th latent variable vector and the memory module, and $P(W_i \mid x)$ is the conditional probability of obtaining that weight vector for a given sample $x$. It is assumed that $P(W_i \mid x) \sim \mathrm{Dir}(\alpha_i)$ and $x \sim P_X(x)$, where $P_X(x)$ represents the distribution of normal data, and it is further assumed that the Dirichlet distributions of the individual rows of the weight matrix are independent;
[0035] The parameter $\alpha_i$ of a single Dirichlet distribution is estimated as follows:
[0036] Let $S_i$ be the set of weight vectors obtained from the $i$-th latent variable vector and the memory module over all training samples, where $S_{ij}$ is the weight vector obtained for the $j$-th training sample and $S_i$ has size $n$. The initial value of the iteration is as follows:
[0037]
[0038] where $\Psi(\cdot)$ is the Digamma function, defined as follows:
[0039] $$\Psi(x) = \frac{d}{dx} \ln \Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)} \tag{10}$$
[0040] The parameter $\alpha_i$ is then updated iteratively as follows until convergence:
[0041]
[0042] The estimated parameters are then substituted into the probability density of the Dirichlet distribution to compute the anomaly score, which yields:
[0043]
[0044] By selecting an appropriate threshold, it is possible to detect whether rotating machinery has malfunctioned.
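One way to realize the Dirichlet fitting and scoring step is sketched below in standard-library Python. Several parts are assumptions rather than the source's formulas: the update rule is the widely used fixed-point iteration for Dirichlet maximum likelihood (Minka's method), the starting point uses moment matching, the score is taken as the negative summed log-density so that larger values mean "more anomalous" (consistent with the thresholding rule in S4), and all function names are hypothetical.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def trigamma(x):
    """Derivative of digamma for x > 0, computed the same way."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))

def inverse_digamma(y, iters=5):
    """Solve digamma(x) = y by Newton's method from a standard starting guess."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y - digamma(1.0))
    for _ in range(iters):
        x -= (digamma(x) - y) / trigamma(x)
    return x

def fit_dirichlet(samples, max_iter=1000, tol=1e-8):
    """Maximum-likelihood Dirichlet parameters from rows on the simplex."""
    n, k = len(samples), len(samples[0])
    mean_log = [sum(math.log(s[j]) for s in samples) / n for j in range(k)]
    # Moment-matching start (an assumed choice of initial value).
    mean = [sum(s[j] for s in samples) / n for j in range(k)]
    second = sum(s[0] ** 2 for s in samples) / n
    precision = (mean[0] - second) / (second - mean[0] ** 2)
    alpha = [m * precision for m in mean]
    for _ in range(max_iter):
        total = digamma(sum(alpha))
        new_alpha = [inverse_digamma(total + ml) for ml in mean_log]
        if max(abs(a - b) for a, b in zip(alpha, new_alpha)) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha

def dirichlet_log_pdf(w, alpha):
    """Log-density of a Dirichlet distribution at the weight vector w."""
    return (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, w)))

def anomaly_score(weight_matrix, alphas):
    """Negative summed log-density over rows, treating rows as independent."""
    return -sum(dirichlet_log_pdf(w, a) for w, a in zip(weight_matrix, alphas))

def sample_dirichlet(alpha):
    """Draw one Dirichlet sample by normalizing gamma variates."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

random.seed(0)
train = [sample_dirichlet([2.0, 5.0, 3.0]) for _ in range(2000)]
alpha_hat = fit_dirichlet(train)
typical = anomaly_score([[0.2, 0.5, 0.3]], [alpha_hat])
atypical = anomaly_score([[0.98, 0.01, 0.01]], [alpha_hat])
```

In practice, softmax weights can be extremely close to zero, so clipping them to a small positive floor before taking logarithms is advisable. One Dirichlet distribution would be fitted per row index $i$ of the weight matrix, using the weight vectors of all normal training samples at that index.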
[0045] Compared with the prior art, the beneficial effects of the present invention include:
[0046] 1) In the testing phase, this invention uses the Dirichlet distribution to fit the weight matrix and obtains the anomaly score from the maximum log-likelihood, which effectively avoids the overfitting that may occur in the autoencoder.
[0047] 2) This invention requires only normal vibration data for training in order to accurately identify abnormal vibration data of engineering machinery.
Attached Figure Description
[0048] Figure 1 is a flowchart of the present invention;
[0049] Figure 2 is a time-frequency plot reconstructed from test data;
[0050] Figure 3 is a time-frequency plot reconstructed from training data;
[0051] Figure 4 is the curve of the reconstruction loss decreasing during training.
Detailed Implementation
[0052] To more clearly and completely illustrate the technical solution of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and examples:
[0053] This example uses bearing data provided by the Case Western Reserve University (CWRU) laboratory, selecting vibration data from the drive end. The fault size for the three fault types is 7 mil (0.007 inch), the sampling frequency $f_s$ is 12000 Hz, the motor speed is 1772 rpm, and the data length of each sample is 4096 points, resulting in a total of 1000 normal samples and 200 samples for each type of fault. 800 normal samples were used as the training set, and the remaining normal samples and the fault samples were combined to form the test set.
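The sample construction and split described in this paragraph can be sketched as follows. The window step (and therefore any overlap between windows), the function names and the label convention (0 for normal, 1 for fault) are assumptions for illustration; the source states only the sample length and the sample counts.

```python
def segment(signal, length=4096, step=None):
    """Cut a long recording into fixed-length windows.

    step defaults to length (no overlap); a smaller step gives overlapping
    windows, which is one way to obtain more samples from one recording.
    """
    step = length if step is None else step
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, step)]

def split_dataset(normal, faults, n_train=800):
    """Train on normal samples only; test on held-out normal plus all fault samples."""
    train = normal[:n_train]
    test = [(x, 0) for x in normal[n_train:]]
    for samples in faults.values():
        test.extend((x, 1) for x in samples)
    return train, test

# Placeholder samples with the counts quoted in the text: 1000 normal, 3 x 200 fault.
normal = [[0.0] for _ in range(1000)]
faults = {name: [[1.0] for _ in range(200)] for name in ("fault_a", "fault_b", "fault_c")}
train, test = split_dataset(normal, faults)
```

With these counts the training set holds 800 normal samples and the test set holds 200 normal and 600 fault samples.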
[0054] S1. In the data preprocessing stage, in a specific embodiment of the present invention, the max-min normalization of the original vibration data is performed as follows:
[0055] $$\tilde{x}_k = \frac{x_k - \mathrm{mean}(x_k)}{\max(x_k) - \min(x_k)} \tag{1}$$
[0056] where $\tilde{x}_k$ is the normalized data of the input $x_k$, $\mathrm{mean}(x_k)$ is the mean of the input signal, and $\max(x_k)$ and $\min(x_k)$ are the maximum and minimum values of the input signal, respectively. The continuous wavelet transform of the normalized signal is performed as follows:
[0057] $$w_k(s, \tau) = \frac{1}{\sqrt{s}} \int_{-\infty}^{+\infty} \tilde{x}_k(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{2}$$
[0058] where $s$ and $\tau$ are the scale and translation parameters, respectively, $t$ is time, $\psi(\cdot)$ is the wavelet basis function with complex conjugate $\psi^{*}$, and $w_k$ is the resulting two-dimensional wavelet time-frequency matrix.
[0059] First, all input data are cropped and standardized. Cropping ensures that all input data are of the same size, and standardization ensures that all input data approximately follow the same Gaussian distribution, which is beneficial for network training.
[0060] S2. In the network initialization phase, both the encoder and the decoder adopt a two-dimensional convolutional neural network structure, and the network parameters and the size of the memory module are initialized. The memory module is a learnable parameterized matrix.
[0061] S3. In the network training phase, after setting the network parameters and the size of the memory modules, the network training process is as follows:
[0062] First, let $Z$ be the latent variable obtained by the encoder from the input signal $x_k$, with $C$ channels, and let $z_i$ denote the latent variable vector of the $i$-th pixel. The memory module matrix $m$ has size $N \times C$, where $N$ is a given hyperparameter. In this example, $N$ is chosen to be 50. The similarity between the latent variable vector of the $i$-th pixel and the memory module matrix is calculated as follows:
[0063] $$w_i = [\langle z_i, m_1\rangle, \ldots, \langle z_i, m_j\rangle, \ldots, \langle z_i, m_N\rangle] \tag{3}$$
[0064] where $w_i$ is the similarity vector between the latent variable $z_i$ and the memory module, and $\langle z_i, m_j\rangle$ is the inner product of the latent variable $z_i$ and the memory vector $m_j$, which serves as the similarity. Softmax normalization is then applied to $w_i$ as follows:
[0065] $$\hat{w}_{ij} = \frac{\exp(w_{ij})}{\sum_{j'=1}^{N} \exp(w_{ij'})} \tag{4}$$
[0066] where $w_{ij}$ is the similarity between the $i$-th latent variable and the $j$-th memory vector, and $\hat{w}_{ij}$ is the corresponding normalized weight. The memory vectors are weighted by the normalized weights and accumulated to obtain the new latent variable:
[0067] $$\hat{z}_i = \sum_{j=1}^{N} \hat{w}_{ij}\, m_j \tag{5}$$
[0068] To obtain sparser weight parameters, information entropy is introduced as a constraint as follows:
[0069] $$E(\hat{w}_i) = -\sum_{j=1}^{N} \hat{w}_{ij} \log \hat{w}_{ij} \tag{6}$$
[0070] where $E(\hat{w}_i)$ is the information entropy of $\hat{w}_i$. The final loss function $L$ is:
[0071] $$L = \frac{1}{K} \sum_{k=1}^{K} \left( \lVert x_k - \hat{x}_k \rVert_2^2 + \alpha\, E(W_k) \right) \tag{7}$$
[0072] where $K$ is the batch size and $\alpha$ is the loss balancing parameter, fixed at 0.02 in this example; $W_k$ and $\hat{x}_k$ are the weight matrix and the reconstructed sample of the input $x_k$, and $E(W_k)$ is the information entropy of the weight matrix, accumulated over its rows $\hat{w}_i$.
[0073] S4. In the network testing phase, the steps for fitting the weights with the Dirichlet distribution and obtaining the anomaly scores are as follows:
[0074] Assume the latent variable $Z$ obtained by the encoder has size $T \times C$, where $T$ is the size of the latent variable (the number of latent variable vectors) and $C$ is the number of channels, so that the weight matrix $W$ of each data sample has size $T \times N$. The anomaly score $n_s$ is defined from the maximum log-likelihood of the weight matrix:
[0075]
[0076] where $W_i$ is the vector of normalized weights between the $i$-th latent variable vector and the memory module, and $P(W_i \mid x)$ is the conditional probability of obtaining that weight vector for a given sample $x$. It is assumed that $P(W_i \mid x) \sim \mathrm{Dir}(\alpha_i)$ and $x \sim P_X(x)$, where $P_X(x)$ represents the distribution of normal data, and it is further assumed that the Dirichlet distributions of the individual rows of the weight matrix are independent;
[0077] The parameter $\alpha_i$ of a single Dirichlet distribution is estimated as follows:
[0078] Let $S_i$ be the set of weight vectors obtained from the $i$-th latent variable vector and the memory module over all training samples, where $S_{ij}$ is the weight vector obtained for the $j$-th training sample and $S_i$ has size $n$. The initial value of the iteration is as follows:
[0079]
[0080] where $\Psi(\cdot)$ is the Digamma function, defined as follows:
[0081] $$\Psi(x) = \frac{d}{dx} \ln \Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)} \tag{10}$$
[0082] The parameter $\alpha_i$ is then updated iteratively as follows until convergence:
[0083]
[0084] The estimated parameters are then substituted into the probability density of the Dirichlet distribution to compute the anomaly score, which yields:
[0085]
[0086] Therefore, by selecting an appropriate threshold, it is possible to detect whether a rotating machine has malfunctioned: a sample whose anomaly score is greater than the threshold is a fault signal, and otherwise it is a normal signal.
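The source does not say how the threshold is chosen. One common heuristic, shown here purely as an assumption, is to take a high quantile of the anomaly scores of normal training samples; the function names and the 0.99 quantile are hypothetical.

```python
import math

def choose_threshold(normal_scores, quantile=0.99):
    """Return the score below which roughly `quantile` of normal samples fall."""
    ordered = sorted(normal_scores)
    index = int(math.ceil(quantile * len(ordered))) - 1
    index = min(max(index, 0), len(ordered) - 1)
    return ordered[index]

def is_fault(score, threshold):
    """Decision rule from the text: a score above the threshold is a fault."""
    return score > threshold

# Toy anomaly scores of normal training samples.
normal_scores = [float(v) for v in range(1, 101)]
threshold = choose_threshold(normal_scores)
```

A lower quantile flags more samples as faults (fewer missed faults, more false alarms), so the value would normally be tuned on held-out normal data.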
[0087] To demonstrate the effectiveness of this method, AUC and f1-score were selected as evaluation metrics, and ordinary autoencoders and other anomaly detection methods were used for comparison. The experimental results are as follows:
[0088]
[0089] Experiments show that the fault detection accuracy of this invention is 100% on the Case Western Reserve University dataset; this invention can therefore effectively detect whether engineering machinery has malfunctioned.
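For reference, the two evaluation metrics named above, AUC and F1-score, can be computed from anomaly scores and labels as sketched below. The function names, the label convention (1 for fault, 0 for normal) and the toy numbers are illustrative assumptions and are unrelated to the reported experimental results.

```python
def auc(scores, labels):
    """Area under the ROC curve as the fraction of (fault, normal) pairs
    in which the fault sample receives the higher score; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(predictions, labels):
    """Harmonic mean of precision and recall for the fault class."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy scores: two normal samples (label 0) and two fault samples (label 1).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
predictions = [s > 0.3 for s in scores]
toy_auc = auc(scores, labels)
toy_f1 = f1_score(predictions, labels)
```

AUC is independent of the threshold, whereas the F1-score depends on the specific threshold used to turn scores into fault decisions.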
[0090] The above are preferred embodiments of the present invention. Any changes made to the technical solution of the present invention that do not exceed the scope of the technical solution of the present invention shall fall within the protection scope of the present invention.
Claims
1. A method for fault detection in engineering machinery based on Dirichlet distribution weight fitting, characterized in that the method includes the following steps:
S1. In the data preprocessing stage, vibration signals of engineering machinery components are collected by accelerometers. The complete signal recorded under variable working conditions is segmented. Each segment is normalized and then subjected to a continuous wavelet transform to obtain a coefficient matrix. The coefficient matrix is visualized as a continuous wavelet transform time-frequency diagram, which is used as input data.
S2. In the network initialization phase, both the encoder and the decoder adopt a two-dimensional convolutional neural network structure, and the network parameters and the size of the memory module are initialized. The memory module is a learnable parameterized matrix, i.e., the memory module matrix.
S3. In the network training phase, the normal samples are input into the encoder to obtain latent variables. The similarity between the latent variables and the memory module is calculated and passed through Softmax to obtain the weight matrix. The memory module matrix is then weighted by the weight matrix and summed to obtain new latent variables, which are passed through the decoder to obtain the reconstructed signal. The information entropy of the weight matrix and the mean square error between the signals before and after reconstruction are used as the loss function for training.
S4. In the network testing phase, the weight matrix obtained from the normal data is fitted with the Dirichlet distribution, and the log-likelihood function is defined as the anomaly score. A threshold is set appropriately to detect faults in the rotating machinery: samples with an anomaly score greater than the threshold are considered fault signals, while those with a score below the threshold are considered normal signals.
The specific steps for obtaining anomaly scores by fitting the weight matrix using the Dirichlet distribution are as follows:
Assume the latent variable $Z$ obtained by the encoder has size $T \times C$, where $T$ is the size of the latent variable (the number of latent variable vectors) and $C$ is the number of channels, so that the weight matrix $W$ of each data sample has size $T \times N$. The anomaly score $n_s$ is defined from the maximum log-likelihood of the weight matrix: (8)
where $W_i$ is the vector of normalized weights between the $i$-th latent variable vector and the memory module, and $P(W_i \mid x)$ is the conditional probability of obtaining that weight vector for a given sample $x$. It is assumed that $P(W_i \mid x) \sim \mathrm{Dir}(\alpha_i)$ and $x \sim P_X(x)$, where $P_X(x)$ represents the distribution of normal data, and it is further assumed that the Dirichlet distributions of the individual rows of the weight matrix are independent;
The parameter $\alpha_i$ of a single Dirichlet distribution is estimated as follows: let $S_i$ be the set of weight vectors obtained from the $i$-th latent variable vector and the memory module over all training samples, where $S_{ij}$ is the weight vector obtained for the $j$-th training sample and $S_i$ has size $n$. The initial value of the iteration is as follows: (9)
where $\Psi(\cdot)$ is the Digamma function, defined as follows:
$$\Psi(x) = \frac{d}{dx} \ln \Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)} \tag{10}$$
The parameter $\alpha_i$ is then updated iteratively as follows until convergence: (11)
The estimated parameters are then substituted into the probability density of the Dirichlet distribution to compute the anomaly score, which yields: (12)
By selecting an appropriate threshold, it is possible to detect whether rotating machinery has malfunctioned.
2. The method for fault detection of engineering machinery based on Dirichlet distribution weight fitting according to claim 1, characterized in that the data preprocessing process in S1 is as follows:
For each segmented signal $x_k$, the max-min normalization is performed as follows:
$$\tilde{x}_k = \frac{x_k - \mathrm{mean}(x_k)}{\max(x_k) - \min(x_k)} \tag{1}$$
where $\tilde{x}_k$ is the normalized data of the input $x_k$, $\mathrm{mean}(x_k)$ is the mean of the input signal, and $\max(x_k)$ and $\min(x_k)$ are the maximum and minimum values of the input signal, respectively. The continuous wavelet transform of the normalized signal is performed as follows:
$$w_k(s, \tau) = \frac{1}{\sqrt{s}} \int_{-\infty}^{+\infty} \tilde{x}_k(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{2}$$
where $s$ and $\tau$ are the scale and translation parameters, respectively, $t$ is time, $\psi(\cdot)$ is the wavelet basis function with complex conjugate $\psi^{*}$, and $w_k$ is the resulting two-dimensional wavelet time-frequency matrix.
3. The method for fault detection of engineering machinery based on Dirichlet distribution weight fitting according to claim 1, characterized in that the network training process in S3 is as follows:
First, let $Z$ be the latent variable obtained by the encoder from the input signal $x_k$, with $C$ channels, and let $z_i$ denote the latent variable vector of the $i$-th pixel. The memory module matrix $m$ has size $N \times C$, where $N$ is a given hyperparameter. The similarity between the latent variable vector of the $i$-th pixel and the memory module matrix is calculated as follows:
$$w_i = [\langle z_i, m_1\rangle, \ldots, \langle z_i, m_j\rangle, \ldots, \langle z_i, m_N\rangle] \tag{3}$$
where $w_i$ is the similarity vector between the latent variable $z_i$ and the memory module, and $\langle z_i, m_j\rangle$ is the inner product of the latent variable $z_i$ and the memory vector $m_j$, which serves as the similarity. Softmax normalization is then applied to $w_i$ as follows:
$$\hat{w}_{ij} = \frac{\exp(w_{ij})}{\sum_{j'=1}^{N} \exp(w_{ij'})} \tag{4}$$
where $w_{ij}$ is the similarity between the $i$-th latent variable and the $j$-th memory vector, and $\hat{w}_{ij}$ is the corresponding normalized weight. The memory vectors are weighted by the normalized weights and accumulated to obtain the new latent variable:
$$\hat{z}_i = \sum_{j=1}^{N} \hat{w}_{ij}\, m_j \tag{5}$$
Information entropy is introduced as a constraint as follows:
$$E(\hat{w}_i) = -\sum_{j=1}^{N} \hat{w}_{ij} \log \hat{w}_{ij} \tag{6}$$
where $E(\hat{w}_i)$ is the information entropy of $\hat{w}_i$. The final loss function $L$ is:
$$L = \frac{1}{K} \sum_{k=1}^{K} \left( \lVert x_k - \hat{x}_k \rVert_2^2 + \alpha\, E(W_k) \right) \tag{7}$$
where $K$ is the batch size, $\alpha$ is the loss balancing parameter, $W_k$ and $\hat{x}_k$ are the weight matrix and the reconstructed sample of the input $x_k$, and $E(W_k)$ is the information entropy of the weight matrix, accumulated over its rows $\hat{w}_i$.