Equipment parts fault diagnosis method based on abnormal data processing and noise suppression

By using an adaptive weighted cross-entropy loss function and a collaborative attention module in a deep residual shrinkage network, noise and abnormal data from complex equipment are processed, improving the accuracy and automation of fault diagnosis and resolving the impact of noise and abnormal data on diagnostic results.

CN122241239A · Pending · Publication Date: 2026-06-19 · ZHEJIANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG UNIV
Filing Date
2026-05-19
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Data collected under the harsh working conditions of complex equipment contain severe noise and abnormal data, which lowers the fault diagnosis accuracy of deep learning models. Furthermore, existing methods have low automation and rely on expert knowledge.

Method used

A deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU is trained using an adaptive weighted cross-entropy loss function. Abnormal samples are identified by this network and then repaired by a reconstruction model, so that abnormal data are filtered out and processed before diagnosis, thereby improving the accuracy of fault diagnosis.

Benefits of technology

It significantly improves the accuracy and reliability of fault diagnosis for complex equipment components, effectively suppresses data noise interference, and enhances the automation level of fault diagnosis.

Abstract

This application discloses a fault diagnosis method for equipment components based on anomaly data processing and noise suppression, belonging to the field of fault diagnosis technology. The method includes: acquiring a real-time sensor monitoring dataset; using, as an anomaly sample identification model, a deep residual shrinkage network that incorporates a collaborative attention module and globally parameterized ReLU and is trained with an adaptive weighted cross-entropy loss function, to filter out the anomalous real-time sensor monitoring samples; reconstructing the anomaly data using an anomaly point mask matrix combined with a self-attention-based anomaly data reconstruction model; and inputting the non-anomaly samples and the reconstructed samples into a fault diagnosis model trained with historical non-anomaly and reconstructed samples to obtain the diagnostic result for each sample. The method identifies and reconstructs anomaly data, suppresses noise interference, and improves the accuracy and reliability of fault diagnosis. It is highly automated and does not rely on expert knowledge, making it suitable for real-time fault diagnosis of complex equipment components.

Description

Technical Field

[0001] This application relates to the field of fault diagnosis technology, and in particular to a fault diagnosis method for equipment components based on abnormal data processing and noise suppression.

Background Technology

[0002] With the rapid development of intelligent manufacturing technology, the stable operation of complex equipment plays a crucial role in the efficient production of various industries. However, many pieces of equipment operate under harsh conditions, making their components prone to failure. This not only leads to production interruptions and increased maintenance costs, but can also cause serious safety accidents. The rapid development of technologies such as sensing and monitoring makes it easy to collect large amounts of data, promoting the application of deep learning methods in fault diagnosis. However, the harsh working environment of complex equipment often results in a large amount of noise in the collected data. Noise interference obscures valuable fault features, reducing the feature extraction capability of deep learning models and thus affecting the results of fault diagnosis.

[0003] To address the issue of data noise, existing research includes signal processing-based methods, deep learning-based methods, and combinations of both, with deep residual shrinkage networks and their improved versions being representative examples. However, due to extreme operating conditions and unexpected abnormal noise, the acquired data may contain aperiodic transient pulses. These abnormal samples shift the data distribution in the feature space, thus degrading fault diagnosis performance. Most existing methods do not consider outlier contamination in the data, making it difficult to achieve high fault diagnosis accuracy. A very small number of existing methods employ signal processing to address the abnormal data problem, but these methods have low automation levels and rely heavily on expert knowledge and human resources.

[0004] Therefore, solving the problem of noise and abnormal data affecting the accuracy of fault diagnosis in existing technologies, in order to improve the accuracy of fault diagnosis of complex equipment components, is an urgent problem to be solved in this field.

Summary of the Invention

[0005] The purpose of this application is to provide a fault diagnosis method for equipment components based on abnormal data processing and noise suppression, which can effectively improve the accuracy of fault diagnosis for complex equipment components.

[0006] To achieve the above objectives, this application provides the following solution: A method for fault diagnosis of equipment components based on anomaly data processing and noise suppression includes the following steps: Obtain the real-time sensor monitoring dataset; the real-time sensor monitoring dataset includes several real-time sensor monitoring samples.

[0007] The trained anomaly identification model is used to determine whether there are any anomalies in each real-time sensor monitoring sample and to filter out the real-time sensor monitoring anomaly samples. The anomaly identification model is a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU, trained using an adaptive weighted cross-entropy loss function.

[0008] For any real-time sensor monitoring anomaly sample and its corresponding anomaly point mask matrix, the real-time sensor monitoring anomaly sample is reconstructed using a trained anomaly data reconstruction model to obtain a real-time sensor monitoring reconstructed sample. The anomaly data reconstruction model is a self-attention-based anomaly data reconstruction model trained using historical non-anomaly samples from sensor monitoring after random masking. The anomaly point mask matrix is a mask matrix obtained by masking the identified anomaly data points.

[0009] Real-time sensor monitoring reconstructed samples and real-time sensor monitoring non-abnormal samples are input into the trained fault diagnosis model to obtain their respective fault diagnosis results. The fault diagnosis model is a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU, trained using historical sensor monitoring non-abnormal samples and historical sensor monitoring reconstructed samples.

[0010] According to the specific embodiments provided in this application, the following technical effects are disclosed: This application provides a fault diagnosis method for equipment components based on anomaly data processing and noise suppression. A trained anomaly sample identification model determines whether real-time sensor monitoring samples contain anomalies, filtering out abnormal samples. The adaptive weighted cross-entropy loss function used in training the anomaly sample identification model effectively addresses the class imbalance problem caused by the low proportion of anomaly samples. A collaborative attention module simultaneously captures key features in the temporal and channel dimensions of the data samples, while a globally parameterized ReLU activation function module enhances the network's nonlinear fitting ability. The combination of these three significantly improves the accuracy of anomaly sample identification. Subsequently, a trained anomaly data reconstruction model reconstructs the real-time sensor monitoring anomaly samples, obtaining reconstructed samples. The self-attention-based model structure accurately captures the temporal dependencies of the data, and the random mask training method allows the model to fully learn the distribution patterns of non-anomaly data, ensuring that the reconstructed samples conform to the temporal distribution of non-anomaly data. Finally, the reconstructed samples and non-anomaly samples are input into a trained fault diagnosis model to obtain the corresponding fault diagnosis results. The network structure of this model effectively suppresses noise interference in the data, significantly improving the accuracy and reliability of fault diagnosis for complex equipment components.

Description of the Drawings

[0011] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0012] Figure 1 is a flowchart of a fault diagnosis method for equipment components based on abnormal data processing and noise suppression, provided in an embodiment of this application.

[0013] Figure 2 is a flowchart of the process of acquiring historical fault datasets and training the various models in the fault diagnosis method for equipment components based on abnormal data processing and noise suppression, provided in an embodiment of this application.

[0014] Figure 3 is a schematic diagram of the network structure of the deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU in the fault diagnosis method for equipment components based on abnormal data processing and noise suppression, provided in an embodiment of this application.

[0015] Figure 4 is a schematic diagram of the network structure of the residual block in the fault diagnosis method for equipment components based on abnormal data processing and noise suppression, provided in an embodiment of this application.

[0016] Figure 5 is a schematic diagram of the network structure of the collaborative attention module in the fault diagnosis method for equipment components based on abnormal data processing and noise suppression, provided in an embodiment of this application.

[0017] Figure 6 is a schematic diagram of the network structure of the globally parameterized ReLU activation function module in the fault diagnosis method for equipment components based on abnormal data processing and noise suppression, provided in an embodiment of this application.

[0018] Figure 7 is a schematic diagram of the network structure of the abnormal data reconstruction model in the fault diagnosis method for equipment components based on abnormal data processing and noise suppression, provided in an embodiment of this application.

Detailed Implementation

[0019] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0020] With the rapid development of intelligent manufacturing technology, the stable operation of complex equipment plays a crucial role in the efficient production of various industries. However, many pieces of equipment operate under harsh conditions, making their components prone to failure. This not only leads to production interruptions and increased maintenance costs, but may also cause safety accidents in severe cases. Therefore, accurate fault diagnosis of complex equipment components can improve the operational safety of equipment, prevent major accidents, and ensure personal safety.

[0021] With the rapid development of technologies such as sensors, it is easier to collect large amounts of data, promoting the application of deep learning methods in the field of fault diagnosis, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and autoencoders (AEs). However, the harsh working environment of complex equipment often results in a large amount of noise in the collected data. Noise interference can obscure valuable fault features, leading to a decrease in the feature extraction capability of deep learning models, thereby affecting the results of fault diagnosis.

[0022] To address the issue of data noise, existing research includes signal processing-based methods, deep learning-based methods, and combinations of both, with deep residual shrinkage networks and their improved versions being representative examples. However, due to extreme operating conditions and unexpected anomalies, the acquired data may contain aperiodic transient pulses. These anomalous samples shift the data distribution in the feature space, thus degrading fault diagnosis performance. Most existing methods do not consider outlier contamination in the data, making it difficult to achieve high fault diagnosis accuracy. A very small number of existing methods employ signal processing to address the anomaly problem, but these methods have low automation and rely heavily on expert knowledge and human resources.

[0023] Therefore, in order to effectively address both noise interference and abnormal data contamination in the monitoring data of complex equipment components and improve the accuracy of fault diagnosis, this application proposes a fault diagnosis method for equipment components based on abnormal data processing and noise suppression.

[0024] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0025] This application provides a method for diagnosing equipment component faults based on abnormal data processing and noise suppression. In one exemplary embodiment, as shown in Figure 1, the method includes the following steps: A1. Obtain the real-time sensor monitoring dataset; the real-time sensor monitoring dataset includes several real-time sensor monitoring samples.

[0026] A2. Using the trained anomaly identification model, determine whether there are anomalies in each real-time sensor monitoring sample, and filter out the real-time sensor monitoring anomaly samples; the anomaly identification model is obtained by training a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU using an adaptive weighted cross-entropy loss function.

[0027] A3. For any real-time sensor monitoring abnormal sample and its corresponding abnormal point mask matrix, the real-time sensor monitoring abnormal sample is reconstructed using a trained abnormal data reconstruction model to obtain a real-time sensor monitoring reconstructed sample. The abnormal data reconstruction model is a self-attention-based abnormal data reconstruction model trained using historical sensor monitoring non-abnormal samples after random masking. The abnormal point mask matrix is a mask matrix obtained by masking the identified abnormal data points.

[0028] A4. Input the real-time sensor monitoring reconstructed samples and the real-time sensor monitoring non-abnormal samples into the trained fault diagnosis model to obtain the corresponding fault diagnosis results. The fault diagnosis model is obtained by training a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU using historical sensor monitoring non-abnormal samples and historical sensor monitoring reconstructed samples.

[0029] Specifically, before performing the application steps A1-A4 above, the method also includes the process of acquiring historical fault datasets and training each model. In a specific implementation, as shown in Figure 2, the process includes the following steps: B1. Obtain the historical fault dataset, add noise to all historical sensor monitoring non-abnormal samples, and add aperiodic transient pulses to a preset percentage of historical sensor monitoring non-abnormal samples to obtain the perturbed historical fault dataset; the historical fault dataset includes several historical sensor monitoring non-abnormal samples and the anomaly judgment label and fault category label corresponding to each historical sensor monitoring non-abnormal sample; the perturbed historical fault dataset includes a preset percentage of historical sensor monitoring abnormal samples.

[0030] In this embodiment, the LW gearbox dataset was used to verify the proposed method. The test bench corresponding to the LW gearbox dataset includes components such as a single-stage reduction gearbox, a servo motor, and sensors. The test bench has three triaxial accelerometers (PCB-356A16), a sampling frequency of 5 kHz, an input shaft speed of 1500 r/min, and loads of 0 N·m, 2 N·m, 4 N·m, 6 N·m, 8 N·m, and 10 N·m. Since the test bench has three triaxial accelerometers, there are a total of nine sensing and monitoring features that can be used for fault diagnosis, and nine columns of data were used in each data file. The description of the LW gearbox dataset used is shown in Table 1.

[0031] Table 1 LW Gearbox Dataset

[0032] During data preprocessing, the sliding window length and the sliding step size were both set to 1024, so that adjacent windows do not overlap. A portion of the vibration signals was selected for sample division, and 4 dB of Gaussian white noise was added to all samples to simulate noisy signals.
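The preprocessing in this paragraph can be sketched as follows. This is a minimal illustration rather than the applicants' code: it assumes that "4 dB" refers to the signal-to-noise ratio of each sample, and it uses a synthetic nine-channel array in place of the LW gearbox recordings. The function names `sliding_windows` and `add_awgn` are hypothetical.

```python
import numpy as np

def sliding_windows(signal, window=1024, step=1024):
    """Split a (T, C) signal into windows of shape (N, window, C)."""
    n = (signal.shape[0] - window) // step + 1
    return np.stack([signal[i * step: i * step + window] for i in range(n)])

def add_awgn(samples, snr_db, rng):
    """Add white Gaussian noise so that each sample has the requested SNR (dB).

    Assumption: the 4 dB figure in the text is an SNR, computed per sample.
    """
    signal_power = np.mean(samples ** 2, axis=(1, 2), keepdims=True)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return samples + rng.standard_normal(samples.shape) * np.sqrt(noise_power)

rng = np.random.default_rng(0)
raw = rng.standard_normal((5000, 9))      # synthetic stand-in: 1 s at 5 kHz, 9 channels
windows = sliding_windows(raw)            # window = step = 1024, so no overlap
noisy = add_awgn(windows, snr_db=4.0, rng=rng)
print(windows.shape, noisy.shape)
```

With a 5000-point record, four complete windows fit and the remaining 904 points are discarded.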

[0033] B2. Based on the historical fault dataset after perturbation, a training set, validation set, and test set are obtained according to a preset ratio. For each operating condition, the number of samples in each category in the training set, validation set, and test set are 40, 20, and 20, respectively, resulting in a total of 1920 samples. The training set is used to train the model, the validation set is used to select the optimal model, and the test set is used to test the performance of the optimal model. In the training set, validation set, and test set, the proportion of samples containing aperiodic transient pulses is 5%, i.e., these samples are anomalous samples.
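As an illustration of how a preset 5% of samples could be turned into anomalous samples, the sketch below injects a few short impulses at random positions and records which points were disturbed. The pulse shape (an exponentially decaying spike), its width and amplitude, the number of pulses per sample, and the mask convention (1 = normal point, 0 = anomalous point) are all assumptions made for this example; the source does not specify them.

```python
import numpy as np

def inject_transient_pulses(samples, fraction=0.05, pulses_per_sample=3,
                            width=8, amplitude=5.0, seed=0):
    """Add aperiodic impulses to a fraction of samples shaped (N, L, C).

    Returns the perturbed samples, a per-sample anomaly label, and a
    point-level mask (assumed convention: 1 = normal, 0 = anomalous).
    """
    rng = np.random.default_rng(seed)
    n, length, _ = samples.shape
    perturbed = samples.copy()
    is_anomalous = np.zeros(n, dtype=bool)
    point_mask = np.ones(samples.shape, dtype=np.int8)
    chosen = rng.choice(n, size=max(1, round(fraction * n)), replace=False)
    decay = np.exp(-np.arange(width) / 2.0)            # hypothetical pulse envelope
    for i in chosen:
        is_anomalous[i] = True
        scale = amplitude * samples[i].std(axis=0)     # per-channel pulse height
        starts = rng.choice(length - width, size=pulses_per_sample, replace=False)
        for start in starts:                           # random, hence aperiodic
            sign = rng.choice([-1.0, 1.0])
            perturbed[i, start:start + width] += sign * decay[:, None] * scale[None, :]
            point_mask[i, start:start + width] = 0
    return perturbed, is_anomalous, point_mask

rng = np.random.default_rng(1)
clean = rng.standard_normal((40, 1024, 9))             # synthetic stand-in samples
perturbed, is_anomalous, point_mask = inject_transient_pulses(clean)
print(int(is_anomalous.sum()), "of", len(clean), "samples marked anomalous")
```

The point-level mask produced here plays the role of the abnormal point mask matrix that is later passed to the reconstruction model together with the sample.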

[0034] B3. Based on the training and validation sets, an adaptive weighted cross-entropy loss function is used to train a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU to obtain a trained anomaly identification model. Historical non-anomalous samples and historical anomalous samples from sensor monitoring are used as inputs to the anomaly identification model, and the output of the model is the judgment of whether each input sample is anomalous. During model training, the loss of the identification model is calculated from the judgment output by the model and the anomaly label corresponding to the sample. Subsequently, the performance of the trained anomaly identification model is tested and evaluated using samples from the test set, identifying the anomalous samples in the test set.

[0035] In one exemplary embodiment, the anomaly identification model and the fault diagnosis model have identical structural compositions; both are deep residual shrinkage network structures with collaborative attention modules and globally parameterized ReLU. As shown in Figure 3, the network includes a first convolutional (Conv) layer, several residual blocks, a first batch normalization (BN) layer, a first globally parameterized ReLU (GPReLU) module, a first global average pooling (GAP) layer, and a first fully connected (FC) layer, connected sequentially. In this embodiment, the number of residual blocks is ... .

[0036] In a further expanded embodiment, as shown in Figure 4, the residual block includes a second batch normalization layer, a second globally parameterized ReLU module, a second convolutional layer, a third batch normalization layer, a third globally parameterized ReLU module, a third convolutional layer, a second global average pooling layer, a second fully connected layer, a fourth batch normalization layer, a fourth globally parameterized ReLU module, a third fully connected layer, a first Sigmoid activation function layer, a first channel-wise multiplication module, a soft thresholding function calculation unit, a collaborative attention module, and a first element-wise addition module. The second batch normalization layer, the second globally parameterized ReLU module, the second convolutional layer, the third batch normalization layer, the third globally parameterized ReLU module, and the third convolutional layer are connected sequentially to process the input features. The absolute value of every element of the output features of the third convolutional layer is taken, and the result is input into the second global average pooling layer. The output of the second global average pooling layer is then input into both the second fully connected layer and the first channel-wise multiplication module. The output of the second fully connected layer passes through the fourth batch normalization layer, the fourth globally parameterized ReLU module, the third fully connected layer, and the first Sigmoid activation function layer in sequence before being input into the first channel-wise multiplication module. The output of the first channel-wise multiplication module and the output of the third convolutional layer are used together as the input to the soft thresholding function calculation unit. The output of the collaborative attention module and the identity mapping of the input features of the residual block are input together into the first element-wise addition module for calculation.
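The thresholding path of the residual block can be illustrated with a small numerical sketch. It assumes the usual soft thresholding function, sign(x)·max(|x| − τ, 0), and replaces the small fully connected sub-network (FC, BN, GPReLU, FC, Sigmoid) with a fixed vector `scale` in (0, 1); both the function names and the sample values are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink values toward zero by tau and zero out anything within [-tau, tau]."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def shrink(features, scale):
    """Channel-wise shrinkage of features shaped (C, L).

    `scale` (C,) stands in for the Sigmoid output of the threshold sub-network.
    """
    avg_abs = np.abs(features).mean(axis=1)   # absolute value + global average pooling
    tau = scale * avg_abs                     # channel-wise multiplication gives the threshold
    return soft_threshold(features, tau[:, None]), tau

features = np.array([[0.1, -0.2, 3.0, -0.1],    # small values act as noise here
                     [1.0, -1.0, 1.0, -1.0]])
shrunk, tau = shrink(features, scale=np.array([0.5, 0.5]))
print(tau)      # per-channel thresholds
print(shrunk)   # small-magnitude entries of the first channel become zero
```

Because the Sigmoid output lies in (0, 1), each threshold stays below the channel's mean absolute value, so a channel is never zeroed out entirely.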

[0037] The structure of the collaborative attention module is shown in Figure 5. The collaborative attention module includes two parallel branches with identical structures. The first branch includes a third global average pooling layer and a first standard deviation pooling (SDP) layer connected in parallel, followed by a fourth convolutional layer, a second sigmoid activation function layer, and a second channel-wise multiplication module. The second branch includes a fourth global average pooling layer and a second standard deviation pooling layer connected in parallel, followed by a fifth convolutional layer, a third sigmoid activation function layer, and a third channel-wise multiplication module.

[0038] The first branch processes the input features of the collaborative attention module. The outputs of the third global average pooling layer and the first standard deviation pooling layer of the first branch are ... and ..., respectively, from which ... is calculated, where ... and ... are learnable parameters greater than 0 and less than 1. The calculated result is transposed and input into the fourth convolutional layer, and the output of the fourth convolutional layer is transposed and input into the second sigmoid activation function layer. The output of the second sigmoid activation function layer and the input features of the collaborative attention module are input into the second channel-wise multiplication module, whose output is ... .

[0039] The second branch transposes the input features of the collaborative attention module before processing them. The outputs of the fourth global average pooling layer and the second standard deviation pooling layer of the second branch are ... and ..., respectively, from which ... is calculated, where ... and ... are learnable parameters greater than 0 and less than 1. The calculated result is transposed and input into the fifth convolutional layer, and the output of the fifth convolutional layer is transposed and input into the third sigmoid activation function layer. The output of the third sigmoid activation function layer and the transposed input features of the collaborative attention module are input into the third channel-wise multiplication module. The output of the third channel-wise multiplication module is then transposed to give ... . The output features of the collaborative attention module are: ... .

[0040] The kernel size ... of the two convolutional layers in the collaborative attention module is defined as follows: ... .

[0041] where ... is the kernel size of the two convolutional layers in the collaborative attention module; ... is the length of the vector input to the convolutional layer, which is also equal to the number of channels in the input and output features of the collaborative attention module; ... denotes the odd number that is less than or equal to, and closest to, the enclosed value; and ... and ... are hyperparameters.

[0042] In this embodiment, all globally parameterized ReLU modules have the same structure. As shown in Figure 6, a globally parameterized ReLU module includes two parallel branches with identical structures. The first branch includes a fifth global average pooling layer and a sixth global average pooling layer in parallel. The fifth global average pooling layer takes ... as input and the sixth global average pooling layer takes ... as input, where ... denotes the input features of the globally parameterized ReLU module. The outputs of the fifth and sixth global average pooling layers are concatenated and then input into the fourth fully connected layer, which is followed in sequence by the fifth batch normalization layer, the first ReLU activation function layer, the fifth fully connected layer, the sixth batch normalization layer, and the fourth Sigmoid activation function layer. The second branch includes a seventh global average pooling layer and an eighth global average pooling layer in parallel. The seventh global average pooling layer takes ... as input and the eighth global average pooling layer takes ... as input. The outputs of the seventh and eighth global average pooling layers are concatenated and then input into the sixth fully connected layer, which is followed in sequence by the seventh batch normalization layer, the second ReLU activation function layer, the seventh fully connected layer, the eighth batch normalization layer, and the fifth Sigmoid activation function layer. The output features of the globally parameterized ReLU module are calculated using the following formula: (The formula is not provided in the original text.)

[0043] where ... denotes the output features of the globally parameterized ReLU module, ... denotes the input features of the globally parameterized ReLU module, and ... and ... are the output features of the fourth and fifth Sigmoid activation function layers, respectively.

[0044] The adaptive weighted cross-entropy loss function is shown in the following equation: .

[0045] where ... is the adaptive weighted cross-entropy loss; ... is the batch size set during training; ... is the result obtained for the ...-th sample after it passes through the deep learning model and the Softmax operation; ... is the probability of the correct category in ...; ... is the true label of the ...-th sample; ... is a hyperparameter used to handle the class imbalance problem; ... is a hyperparameter that makes the model focus more on samples that are prone to misclassification; and ... is a hyperparameter that varies with the number of iterations.

[0046] The formulas for the hyperparameters ..., ... and ... are as follows: ... .

[0047] .

[0048] .

[0049] where ... is the total number of samples in the category with the most samples; ... is the total number of samples in the category to which the ...-th sample belongs; ... is the number of misclassified samples, in the current batch, of the category to which the ...-th sample belongs; ... is the number of samples, in the current batch, of the category to which the ...-th sample belongs; ... is the initial value of the hyperparameter ...; and ... is the current iteration number.
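Since the formulas of the adaptive weighted loss are not reproduced in this text, the sketch below shows only the simplest ingredient that the definitions above point to: a cross-entropy in which each sample is weighted by a class-imbalance factor. The specific weight used here (largest class count divided by the count of the sample's own class) is an assumption chosen for illustration, and the misclassification-based and iteration-dependent terms are deliberately left out. It should not be read as the applicants' loss.

```python
import numpy as np

def class_weighted_cross_entropy(probs, labels, class_counts):
    """Cross-entropy with a per-sample class-imbalance weight.

    probs:        (B, K) Softmax outputs
    labels:       (B,) integer class labels
    class_counts: (K,) number of training samples per class
    Assumed weight: max(class_counts) / class_counts[label].
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    counts = np.asarray(class_counts, dtype=float)
    weights = counts.max() / counts[labels]             # rare classes weigh more
    p_true = probs[np.arange(len(labels)), labels]      # probability of the correct class
    return float(np.mean(-weights * np.log(p_true)))

probs = [[0.9, 0.1],     # a confident, correct prediction for the majority class
         [0.5, 0.5]]     # an uncertain prediction for the minority class
labels = [0, 1]
loss = class_weighted_cross_entropy(probs, labels, class_counts=[95, 5])
print(loss)
```

With a 95:5 class ratio, similar to the 5% anomaly proportion in this embodiment, the minority-class sample receives a weight of 19 and dominates the batch loss.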

[0050] In an exemplary embodiment, the structural hyperparameters of the deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU are shown in Table 2. In Table 2, the first number in parentheses within the network layer names represents the number of convolutional kernels, the second number represents the kernel size, and "/2" indicates a stride of 2. The number of neurons in each of the four fully connected layers of the globally parameterized ReLU activation function module is equal to the number of channels in the module's input features. The hyperparameters ..., ... and ... are equal to 2, 1.5 and 1, respectively.

[0051] Table 2. Network structure hyperparameters of the anomaly sample identification model

[0052] In this embodiment, the batch size is 32, the optimizer is Adam, the learning rate is 0.0025, the learning rate adjustment strategy is CosineAnnealingLR, and the number of iterations is 50. The model with the highest accuracy on the validation set is selected to identify anomalous samples in the test set to test the model's performance. The identified anomalous and non-anomalous samples are then used in subsequent steps. To reduce the influence of random factors, a total of 10 experiments were conducted, successfully identifying anomalous samples in the test set.
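For reference, the learning-rate schedule named above can be written out in closed form. The sketch assumes that the 50 iterations are epochs, that the annealing period equals 50, and that the minimum learning rate is 0; none of these three settings is stated in the text. The expression is the closed form of cosine annealing without restarts.

```python
import math

def cosine_annealing_lr(epoch, base_lr=0.0025, t_max=50, eta_min=0.0):
    """Learning rate at a given epoch under cosine annealing without restarts.

    Assumptions: t_max = 50 and eta_min = 0 (not specified in the source).
    """
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

schedule = [cosine_annealing_lr(epoch) for epoch in range(50)]
for epoch in (0, 25, 49):
    print(epoch, schedule[epoch])
```

Under these assumptions the rate starts at 0.0025, passes through half that value at epoch 25, and approaches zero by the final epoch.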

[0053] B4. After randomly masking each historical non-abnormal sample from the training and validation sets, the self-attention-based anomalous data reconstruction model is trained to obtain a trained anomalous data reconstruction model. Then, historical anomalous samples from the training, validation, and test sets are input into the trained anomalous data reconstruction model to achieve anomalous data reconstruction. The masked samples are used as input to the anomalous data reconstruction model, and the output of the model is the reconstructed sample.
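A minimal sketch of the random masking used to prepare training inputs, together with the step that merges a model output back into the sample so that only masked points are replaced. The mask ratio of 0.2, the convention 1 = kept and 0 = masked, concatenation along the feature axis, and the placeholder `x_hat` standing in for the model output are all assumptions of this example.

```python
import numpy as np

def random_mask(shape, mask_ratio, rng):
    """Return a 0/1 matrix; assumed convention: 1 = kept point, 0 = masked point."""
    return (rng.random(shape) >= mask_ratio).astype(float)

def replace_masked(x, x_hat, mask):
    """Keep observed points from x and take masked points from x_hat
    (element-wise, i.e. Hadamard, products)."""
    return mask * x + (1.0 - mask) * x_hat

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 9))                       # one non-abnormal sample (synthetic)
mask = random_mask(x.shape, mask_ratio=0.2, rng=rng)     # 0.2 is a hypothetical ratio
model_input = np.concatenate([x * mask, mask], axis=-1)  # masked sample + mask, shape (1024, 18)
x_hat = np.zeros_like(x)                                 # placeholder for the model's output
merged = replace_masked(x, x_hat, mask)
print(model_input.shape, float((1 - mask).mean()))
```

At inference time the same merge would use the abnormal point mask matrix in place of the random mask, so that only the identified abnormal points are overwritten.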

[0054] Specifically, in this embodiment, as shown in Figure 7, the abnormal data reconstruction model includes an eighth fully connected layer, a ninth fully connected layer, a second element-wise addition module, a first dropout layer, several stacked self-attention modules, a tenth fully connected layer, an eleventh fully connected layer, a twelfth fully connected layer, a thirteenth fully connected layer, a third element-wise addition module, a fourteenth fully connected layer, a third ReLU activation function layer, a fifteenth fully connected layer, a sixteenth fully connected layer, a seventeenth fully connected layer, an eighteenth fully connected layer, and a sixth Sigmoid activation function layer. Among the stacked self-attention modules, except for the first self-attention module and the ...-th self-attention module, each self-attention module uses the output of the previous self-attention module as its own input. ... is the number of stacked self-attention modules, i.e., the anomalous data reconstruction model contains ... self-attention modules in total; in this embodiment, ... .

[0055] The self-attention module comprises, in sequence, a first layer normalization (LN) layer, a diagonal masked multi-head attention (DiagMaskedMHA) module, a second dropout layer, a fourth element-wise addition module, a second layer normalization layer, a nineteenth fully connected layer, a fourth ReLU activation function layer, a twentieth fully connected layer, a third dropout layer, and a fifth element-wise addition module. The output of the second dropout layer and the input of the self-attention module are added in the fourth element-wise addition module. The output of the fourth element-wise addition module is input to both the second layer normalization layer and the fifth element-wise addition module. The output of the fifth element-wise addition module is the output of the self-attention module. The attention weights output by the ...-th self-attention module are one of the outputs of the diagonal masked multi-head attention module in that self-attention module.
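The idea behind diagonally masked attention can be shown with a single-head sketch. It assumes the common formulation in which the diagonal of the score matrix is set to negative infinity before the Softmax, so that a time step cannot attend to itself and must be estimated from the other time steps. Projection matrices, multiple heads, and dropout are omitted, and the function name is hypothetical.

```python
import numpy as np

def diag_masked_attention(q, k, v):
    """Single-head scaled dot-product attention with the diagonal masked out.

    q, k, v: arrays of shape (T, d) with T >= 2.
    Returns the attended output (T, d) and the attention weights (T, T).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    np.fill_diagonal(scores, -np.inf)                 # forbid attending to the same time step
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise Softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))                       # 6 time steps, 4 features (synthetic)
out, w = diag_masked_attention(x, x, x)
print(np.round(np.diag(w), 3))                        # zero self-attention on the diagonal
```

The returned weight matrix corresponds to the attention weights that the module exposes alongside its main output.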

[0056] The input sample to be reconstructed, …, is concatenated with the mask matrix …, and the result serves as the input to the eighth fully connected layer. The output of the eighth fully connected layer is transposed and then input to the ninth fully connected layer. The output of the ninth fully connected layer and the output of the first positional encoding are added in the second element-wise addition module. The output of the second element-wise addition module passes through the first dropout layer and is then input to the first self-attention module. The output of the …-th self-attention module serves as the input to the tenth fully connected layer; the output of the tenth fully connected layer is transposed and then input to the eleventh fully connected layer, whose output is the feature … . The feature … is used to replace data points of the input sample, which yields the initially reconstructed sample. The specific calculation formula is as follows: …, where … denotes the Hadamard product. The initially reconstructed sample … is concatenated with the mask matrix … and input to the twelfth fully connected layer. The output of the twelfth fully connected layer is transposed and then input to the thirteenth fully connected layer. The output of the thirteenth fully connected layer and the output of the second positional encoding are added in the third element-wise addition module. The output of the third element-wise addition module serves as the input to the …-th self-attention module, and the output of the …-th self-attention module serves as the input to the fourteenth fully connected layer. The output of the fourteenth fully connected layer, after passing through the third ReLU activation function layer, serves as the input to the fifteenth fully connected layer; the output of the fifteenth fully connected layer is transposed and then input to the sixteenth fully connected layer, whose output is the feature … . The attention weights output by the …-th self-attention module are averaged and transposed according to the following formula and then used as the input to the seventeenth fully connected layer: … .

[0057] where … is the number of heads in the self-attention module (in this embodiment, …), … is the attention weight of the …-th head, of size …, and … denotes the attention weights after averaging.

[0058] The output of the seventeenth fully connected layer is concatenated with the mask matrix …, transposed, and then fed sequentially into the eighteenth fully connected layer and the sixth Sigmoid activation function layer. The output of the sixth Sigmoid activation function layer is transposed to obtain the combined weights … . The feature … and the feature … are combined, weighted by the combined weights …, to obtain the feature …; the formula is … . The feature … is used to replace data points of the input sample, which yields the final reconstructed sample; the formula is … .
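The replacement and weighted-combination steps can be sketched as follows. Because the formulas are not reproduced in this text, the sketch rests on explicit assumptions: the mask element is 1 for a masked (anomalous) data point and 0 otherwise, replacement keeps unmasked points and substitutes the feature at masked points, and the weighted combination is an element-wise convex combination. The function names and toy values are hypothetical, and the second feature is a fixed array here rather than the output of the second stack of self-attention modules.

```python
import numpy as np

def replace_masked(x, mask, feature):
    """Keep x where mask == 0 and take feature where mask == 1 (assumed convention)."""
    return (1 - mask) * x + mask * feature

def weighted_combination(feat_a, feat_b, eta):
    """Element-wise convex combination with weights eta in [0, 1] (assumed form)."""
    return (1 - eta) * feat_a + eta * feat_b

x = np.array([[1.0, 2.0, 3.0]])      # one feature channel, three time steps
mask = np.array([[0.0, 1.0, 0.0]])   # the middle data point is masked
feat_1 = np.full_like(x, 9.0)        # stand-in for the first feature
feat_2 = np.full_like(x, 5.0)        # stand-in for the second feature
eta = np.full_like(x, 0.25)          # stand-in for the combined weights

x_initial = replace_masked(x, mask, feat_1)          # initial reconstruction
feat_3 = weighted_combination(feat_1, feat_2, eta)   # 0.75 * 9 + 0.25 * 5 = 8
x_final = replace_masked(x, mask, feat_3)            # final reconstruction
```

Under these assumptions only the masked position changes: `x_initial` is `[[1, 9, 3]]` and `x_final` is `[[1, 8, 3]]`.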

[0059] The input sample … and the mask matrix … are defined as follows. For the training and validation sets used to train the self-attention-based abnormal data reconstruction model, all input samples are randomly masked.

[0060] For abnormal samples that need reconstruction, the Z-score method is used to initially identify anomalous data points in each feature channel of the sample. All data points between the first and last anomalous data points in each feature channel of the sample (including the first and last anomalous data points themselves) are considered anomalous data points, and they are all masked. The formula for the Z-score method is as follows: $z_t = (x_t - \mu) / \sigma$.

[0061] where $x_t$ is the data point at the $t$-th time step of the current feature channel in the sample, $\mu$ is the mean of all data points in the current feature channel of the sample, $\sigma$ is the standard deviation of all data points in the current feature channel of the sample, and $z_t$ is the result of Z-score standardization; when …, the data point $x_t$ is considered an anomalous data point.

[0062] The elements of the mask matrix … are defined as follows: … .

[0063] where … is the data point at the …-th time step of the …-th feature channel of the input sample …, and … is the corresponding element of the mask matrix for that sample.
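The Z-score screening and the construction of the anomaly point mask can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the threshold value 3.0 and the use of the absolute Z-score are assumptions (the threshold condition is not reproduced in this text), and the convention that a mask element of 1 marks a masked data point is also assumed. The function name `anomaly_mask` and the toy sample are hypothetical.

```python
import numpy as np

def anomaly_mask(sample, threshold=3.0):
    """Build a mask for a sample of shape (feature_channels, time_steps).

    Assumptions: a point is flagged when |z| > threshold (threshold is a guess),
    and a mask value of 1 means "masked / anomalous".
    """
    mask = np.zeros(sample.shape, dtype=int)
    for c, channel in enumerate(sample):
        std = channel.std()
        if std == 0:
            continue  # a constant channel has no outliers under the Z-score rule
        z = (channel - channel.mean()) / std
        flagged = np.flatnonzero(np.abs(z) > threshold)
        if flagged.size:
            # Mask everything from the first to the last flagged point, inclusive.
            mask[c, flagged[0]:flagged[-1] + 1] = 1
    return mask

sample = np.zeros((2, 50))
sample[0, [5, 8]] = 10.0               # two spikes in channel 0
sample[1] = np.linspace(0.0, 1.0, 50)  # smooth ramp in channel 1, no outliers
mask = anomaly_mask(sample)
```

In the toy sample the spikes at time steps 5 and 8 are flagged, so time steps 5 through 8 of channel 0 are all masked, including the two unflagged points between them, while channel 1 stays unmasked.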

[0064] In this embodiment, the loss function for abnormal data reconstruction used when training the abnormal data reconstruction model is given by the following equation: … .

[0065] where … is the abnormal data reconstruction loss, … is the mean absolute error loss function, and $q$ is a hyperparameter; in this embodiment, … .

[0066] The mean absolute error loss function is given by the following equation: … .

[0067] where …, … and … are all inputs; … are the elements of …; … is the number of feature channels in a sample; … is the length of the sample in the time dimension; and the subscripts … and … denote the element positions in the feature dimension and the time dimension, respectively.
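Since the mean absolute error equation is not reproduced in this text, the following NumPy sketch shows only one plausible reading of it: a mean absolute error that takes three inputs (an estimate, a target, and a mask) and averages the absolute error over the positions selected by the mask. This masked form, the function name `masked_mae`, and the toy values are assumptions, not the formula of the application.

```python
import numpy as np

def masked_mae(estimate, target, mask):
    """Mean absolute error over positions where mask == 1 (assumed form).

    All three inputs have shape (feature_channels, time_steps).
    """
    denom = max(float(mask.sum()), 1.0)  # guard against an all-zero mask
    return float((np.abs(estimate - target) * mask).sum() / denom)

estimate = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 0.0], [0.0, 4.0]])
mask = np.array([[0.0, 1.0], [1.0, 0.0]])
loss = masked_mae(estimate, target, mask)  # (|2 - 0| + |3 - 0|) / 2 = 2.5
```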

[0068] In this embodiment, the batch size is 32, the optimizer is Adam, the learning rate is 0.005, the learning rate adjustment strategy is CosineAnnealingLR, and the number of iterations is 60. To reduce the influence of random factors, a total of 10 experiments were conducted. The trained abnormal data reconstruction model can reconstruct the anomalous data points in abnormal samples.
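The learning rate schedule implied by these settings can be sketched in plain Python using the closed-form cosine annealing rule without restarts. The initial learning rate 0.005 and the 60 iterations come from the text; the annealing period of 60 steps, the minimum learning rate of 0, and one scheduler step per iteration are assumptions, and `cosine_annealing_lr` is a hypothetical helper rather than a library call.

```python
import math

def cosine_annealing_lr(step, base_lr=0.005, t_max=60, eta_min=0.0):
    """Closed-form cosine annealing: decays from base_lr to eta_min over t_max steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

# Learning rate at the start of each of the 60 iterations, plus the end point.
schedule = [cosine_annealing_lr(s) for s in range(61)]
```

Under these assumptions the learning rate starts at 0.005, passes through 0.0025 at the midpoint, and reaches 0 after the last iteration.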

[0069] B5. Using the trained anomaly data reconstruction model, reconstruct the anomaly data for each historical sensor monitoring anomaly sample in the training and validation sets, obtaining their respective corresponding historical sensor monitoring reconstructed samples. Simultaneously, use the trained anomaly data reconstruction model to reconstruct the anomaly data for each historical sensor monitoring anomaly sample in the test set. Use the reconstructed samples for subsequent steps.

[0070] B6. Based on the historical sensor monitoring non-abnormal samples and the historical sensor monitoring reconstructed samples in the training and validation sets, train a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU to obtain a trained fault diagnosis model, and input the samples from the test set into the trained fault diagnosis model to obtain fault diagnosis results. The historical sensor monitoring non-abnormal samples and the historical sensor monitoring reconstructed samples serve as the inputs to the fault diagnosis model, and the output of the fault diagnosis model is the fault type corresponding to each input sample. During model training, the loss of the fault diagnosis model is calculated from the model's output and the fault category label of each sample.

[0071] Specifically, in this embodiment, the structural hyperparameters of the deep residual shrinkage network with collaborative attention module and global parameterized ReLU used in the fault diagnosis model are shown in Table 3. The number of neurons in the four fully connected layers of the global parameterized ReLU activation function module is equal to the number of channels of the module's input features.

[0072] Table 3 Network structure hyperparameters of the fault diagnosis model

[0073] In this embodiment, the batch size was 32, the optimizer was Adam, the learning rate was 0.002, the learning rate adjustment strategy was CosineAnnealingLR, the loss function was cross-entropy loss, and the number of iterations was 100. The model with the highest accuracy on the validation set was selected for subsequent fault diagnosis performance testing and evaluation. To reduce the influence of random factors, a total of 10 experiments were conducted, and the resulting fault diagnosis accuracy was 94.67% ± 1.00%.

[0074] The above embodiments represent an optimization result of the present invention on the LW gearbox dataset, but the specific implementation of the present invention is not limited to the above embodiments. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention; at the same time, for those skilled in the art, there will be changes in specific implementation methods and application scope based on the ideas of the present invention. In addition, those skilled in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when the computer program is executed, it can include the processes of the embodiments of the above methods.

[0075] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0076] This document uses specific examples to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the methods and core ideas of this application. Furthermore, those skilled in the art will recognize that, based on the ideas of this application, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A method for fault diagnosis of equipment components based on abnormal data processing and noise suppression, characterized in that it comprises: obtaining a real-time sensor monitoring dataset, the real-time sensor monitoring dataset including several real-time sensor monitoring samples; using a trained abnormal sample identification model to determine whether each real-time sensor monitoring sample is abnormal, and filtering out the real-time sensor monitoring abnormal samples, wherein the abnormal sample identification model is obtained by training a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU using an adaptive weighted cross-entropy loss function; for any real-time sensor monitoring abnormal sample and its corresponding anomaly point mask matrix, reconstructing the real-time sensor monitoring abnormal sample using a trained abnormal data reconstruction model to obtain a real-time sensor monitoring reconstructed sample, wherein the abnormal data reconstruction model is a self-attention-based abnormal data reconstruction model trained using randomly masked historical sensor monitoring non-abnormal samples, and the anomaly point mask matrix is a mask matrix obtained by masking the identified anomalous data points; and inputting the real-time sensor monitoring reconstructed samples and the real-time sensor monitoring non-abnormal samples into a trained fault diagnosis model to obtain their respective fault diagnosis results, wherein the fault diagnosis model is obtained by training a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU using historical sensor monitoring non-abnormal samples and historical sensor monitoring reconstructed samples.

2. The equipment component fault diagnosis method based on abnormal data processing and noise suppression according to claim 1, characterized in that it further comprises: obtaining a historical fault dataset and perturbing the samples in the historical fault dataset according to a preset rule to obtain a perturbed historical fault dataset, wherein the historical fault dataset includes several historical sensor monitoring non-abnormal samples together with the anomaly judgment label and the fault category label corresponding to each historical sensor monitoring non-abnormal sample, and the perturbed historical fault dataset includes a preset proportion of historical sensor monitoring abnormal samples; dividing the perturbed historical fault dataset into a training set, a validation set, and a test set according to a preset ratio, wherein the training set is used to train the model, the validation set is used to select the optimal model, and the test set is used to test the performance of the optimal model; based on the training and validation sets, training a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU using an adaptive weighted cross-entropy loss function to obtain the trained abnormal sample identification model, wherein the historical sensor monitoring non-abnormal samples and the historical sensor monitoring abnormal samples serve as the inputs to the abnormal sample identification model, and the output of the abnormal sample identification model is the judgment of whether each input sample is abnormal; randomly masking each historical sensor monitoring non-abnormal sample in the training and validation sets and training the self-attention-based abnormal data reconstruction model to obtain the trained abnormal data reconstruction model, wherein the masked sample serves as the input to the abnormal data reconstruction model, and the output of the abnormal data reconstruction model is the reconstructed sample; using the trained abnormal data reconstruction model to reconstruct the abnormal data of each historical sensor monitoring abnormal sample in the training set and the validation set to obtain the corresponding historical sensor monitoring reconstructed samples; and, based on the historical sensor monitoring non-abnormal samples and the historical sensor monitoring reconstructed samples in the training and validation sets, training a deep residual shrinkage network with a collaborative attention module and globally parameterized ReLU to obtain the trained fault diagnosis model, wherein the historical sensor monitoring non-abnormal samples and the historical sensor monitoring reconstructed samples serve as the inputs to the fault diagnosis model, and the output of the fault diagnosis model is the fault type corresponding to each input sample.

3. The equipment component fault diagnosis method based on abnormal data processing and noise suppression according to claim 1, characterized in that the abnormal sample identification model and the fault diagnosis model have the same structure, each including a first convolutional layer, several residual blocks, a first batch normalization layer, a first global parameterized ReLU module, a first global average pooling layer, and a first fully connected layer connected in sequence.

4. The equipment component fault diagnosis method based on abnormal data processing and noise suppression according to claim 3, characterized in that the residual block includes a second batch normalization layer, a second global parameterized ReLU module, a second convolutional layer, a third batch normalization layer, a third global parameterized ReLU module, a third convolutional layer, a second global average pooling layer, a second fully connected layer, a fourth batch normalization layer, a fourth global parameterized ReLU module, a third fully connected layer, a first Sigmoid activation function layer, a first channel-wise multiplication module, a soft thresholding function calculation unit, a collaborative attention module, and a first element-wise addition module; the second batch normalization layer, the second global parameterized ReLU module, the second convolutional layer, the third batch normalization layer, the third global parameterized ReLU module, and the third convolutional layer are connected in sequence to process the input features; the absolute values of all elements of the output features of the third convolutional layer are input into the second global average pooling layer; the output of the second global average pooling layer is input into both the second fully connected layer and the first channel-wise multiplication module; the output of the second fully connected layer passes sequentially through the fourth batch normalization layer, the fourth global parameterized ReLU module, the third fully connected layer, and the first Sigmoid activation function layer before being input into the first channel-wise multiplication module; the output of the first channel-wise multiplication module and the output of the third convolutional layer serve as the inputs to the soft thresholding function calculation unit; and the output of the collaborative attention module and the identity mapping of the input features of the residual block are jointly input into the first element-wise addition module for calculation.

5. The equipment component fault diagnosis method based on abnormal data processing and noise suppression according to claim 4, characterized in that, The collaborative attention module includes two parallel branches with identical structures. The first branch includes a third global average pooling layer and a first standard deviation pooling layer, followed by a fourth convolutional layer, a second sigmoid activation function layer, and a second channel-wise multiplication module connected in sequence. The second branch includes a fourth global average pooling layer and a second standard deviation pooling layer, followed by a fifth convolutional layer, a third sigmoid activation function layer, and a third channel-wise multiplication module connected in sequence. The first branch processes the input features of the collaborative attention module, while the second branch transposes the input features of the collaborative attention module before processing them. The output features of the collaborative attention module are calculated based on the transpose of the output of the second branch and the output of the first branch.

6. The equipment component fault diagnosis method based on abnormal data processing and noise suppression according to claim 4, characterized in that the first, second, third, and fourth global parameterized ReLU modules have identical structures; the first global parameterized ReLU module includes two parallel branches with identical structures; the first branch of the first global parameterized ReLU module includes a fifth global average pooling layer and a sixth global average pooling layer in parallel, the fifth global average pooling layer taking … as input and the sixth global average pooling layer taking … as input, where … denotes the input features of the first global parameterized ReLU module; the outputs of the fifth and sixth global average pooling layers are concatenated and then input into the fourth fully connected layer, which is followed in sequence by the fifth batch normalization layer, the first ReLU activation function layer, the fifth fully connected layer, the sixth batch normalization layer, and the fourth Sigmoid activation function layer; the second branch of the first global parameterized ReLU module includes a seventh global average pooling layer and an eighth global average pooling layer in parallel, the seventh global average pooling layer taking … as input and the eighth global average pooling layer taking … as input; the outputs of the seventh and eighth global average pooling layers are concatenated and then input into the sixth fully connected layer, which is followed in sequence by the seventh batch normalization layer, the second ReLU activation function layer, the seventh fully connected layer, the eighth batch normalization layer, and the fifth Sigmoid activation function layer; and the output features of the first global parameterized ReLU module are calculated from the outputs of the fourth and fifth Sigmoid activation function layers and the input features of the first global parameterized ReLU module.

7. The equipment component fault diagnosis method based on abnormal data processing and noise suppression according to claim 1, characterized in that the abnormal data reconstruction model includes an eighth fully connected layer, a ninth fully connected layer, a second element-wise addition module, a first dropout layer, several stacked self-attention modules, a tenth fully connected layer, an eleventh fully connected layer, a twelfth fully connected layer, a thirteenth fully connected layer, a third element-wise addition module, a fourteenth fully connected layer, a third ReLU activation function layer, a fifteenth fully connected layer, a sixteenth fully connected layer, a seventeenth fully connected layer, an eighteenth fully connected layer, and a sixth Sigmoid activation function layer; among the stacked self-attention modules, every module except the first self-attention module and the …-th self-attention module takes the output of the preceding self-attention module as its input, … being the number of stacked self-attention modules; the input sample to be reconstructed, …, is concatenated with the mask matrix …, and the result serves as the input to the eighth fully connected layer; the output of the eighth fully connected layer is transposed and then input to the ninth fully connected layer; the output of the ninth fully connected layer and the output of the first positional encoding are added in the second element-wise addition module; the output of the second element-wise addition module passes through the first dropout layer and is then input to the first self-attention module; the output of the …-th self-attention module serves as the input to the tenth fully connected layer; the output of the tenth fully connected layer is transposed and then input to the eleventh fully connected layer, whose output is the feature …; the feature … is used to replace data points of the input sample to obtain the initially reconstructed sample …; the initially reconstructed sample … is concatenated with the mask matrix … and input to the twelfth fully connected layer; the output of the twelfth fully connected layer is transposed and then input to the thirteenth fully connected layer; the output of the thirteenth fully connected layer and the output of the second positional encoding are added in the third element-wise addition module; the output of the third element-wise addition module serves as the input to the …-th self-attention module, and the output of the …-th self-attention module serves as the input to the fourteenth fully connected layer; the output of the fourteenth fully connected layer, after passing through the third ReLU activation function layer, serves as the input to the fifteenth fully connected layer; the output of the fifteenth fully connected layer is transposed and then input to the sixteenth fully connected layer, whose output is the feature …; the attention weights output by the …-th self-attention module are averaged and transposed and then used as the input to the seventeenth fully connected layer; the output of the seventeenth fully connected layer is concatenated with the mask matrix …, transposed, and then fed sequentially into the eighteenth fully connected layer and the sixth Sigmoid activation function layer; the output of the sixth Sigmoid activation function layer is transposed to obtain the combined weights …; the feature … and the feature … are combined, weighted by the combined weights …, to obtain the feature …; and the feature … is used to replace data points of the input sample to obtain the final reconstructed sample … .

8. The equipment component fault diagnosis method based on abnormal data processing and noise suppression according to claim 7, characterized in that the loss function for abnormal data reconstruction used when training the abnormal data reconstruction model is given by the following equation: …; where … is the abnormal data reconstruction loss, … is the mean absolute error loss function, and … is a hyperparameter; the mean absolute error loss function is given by the following equation: …; where …, … and … are all inputs; … are the elements of …; … is the number of feature channels in a sample; … is the length of the sample in the time dimension; and the subscripts … and … denote the element positions in the feature dimension and the time dimension, respectively.

9. The equipment component fault diagnosis method based on abnormal data processing and noise suppression according to claim 7, characterized in that the self-attention module comprises, in sequence, a first layer normalization layer, a diagonal masked multi-head attention module, a second dropout layer, a fourth element-wise addition module, a second layer normalization layer, a nineteenth fully connected layer, a fourth ReLU activation function layer, a twentieth fully connected layer, a third dropout layer, and a fifth element-wise addition module; the output of the second dropout layer and the input of the self-attention module are added in the fourth element-wise addition module; the output of the fourth element-wise addition module is input to both the second layer normalization layer and the fifth element-wise addition module; the output of the fifth element-wise addition module is the output of the self-attention module; and the attention weights output by the …-th self-attention module are one of the outputs of the diagonal masked multi-head attention module in that self-attention module.

10. The equipment component fault diagnosis method based on abnormal data processing and noise suppression according to claim 1, characterized in that the adaptive weighted cross-entropy loss function is given by the following equation: …; where … is the adaptive weighted cross-entropy loss; … is the batch size set during training; … is the result obtained for the …-th sample … after it passes through the deep learning model and the Softmax operation; … is the probability of the correct category in …; … is the true label of the …-th sample; … is a hyperparameter used to handle class imbalance; … is a hyperparameter that makes the model focus more on samples that are prone to misclassification; … is a hyperparameter that varies with the number of iterations; …, … and … are calculated as follows: …; …; …; where … is the total number of samples in the category with the most samples; … is the total number of samples in the category to which the …-th sample belongs; … is the number of misclassified samples, in the current batch, of the category to which the …-th sample belongs; … is the number of samples, in the current batch, of the category to which the …-th sample belongs; … is the initial value of the hyperparameter …; and … is the current iteration number.