Interpretable, minimally simplified intelligent data augmentation method, system, and medium applicable to fault diagnosis of high-end equipment
By constructing an interpretable encoder and decoder based on a two-layer residual stacking structure and combining them with a Transformer-based generation model to produce high-quality simulated vibration signals, the invention addresses the unreliability of fault diagnosis in high-end equipment under small-sample and class-imbalance conditions, achieving efficient and interpretable data augmentation and fault diagnosis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING TECH UNIV
- Filing Date
- 2026-04-13
- Publication Date
- 2026-06-30
Figure CN122020005B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of high-end equipment data augmentation and intelligent diagnostic technology, specifically to an interpretable, minimally simplified intelligent data augmentation method, system, and medium applicable to fault diagnosis of high-end equipment.
Background Technology
[0002] Vibration signal-based intelligent fault diagnosis methods are widely used in high-end equipment. In actual industrial scenarios, high-end equipment mostly operates in a healthy state, making it difficult to continuously acquire and systematically accumulate fault samples. At the same time, the equipment experiences complex operating conditions such as multiple loads, multiple speeds, and noise disturbances, making it difficult for minority class samples to cover the true distribution. Training datasets therefore generally exhibit small sample sizes and class imbalance, which leads to unstable decision boundaries for the minority classes in diagnostic models, increased misclassification rates, and reduced diagnostic reliability.
[0003] To address the problem of imbalanced data, existing technologies typically expand the minority class by generating simulated fault samples and constructing a balanced dataset. Patent CN112649198B proposes a fault sample generation approach based on generative adversarial networks (GANs), which uses conditional generation to target minority class samples, improving diagnostic training under imbalanced data conditions. Patent CN119848668A proposes a sample augmentation and diagnosis scheme based on a denoising diffusion probabilistic model, enhancing samples through a diffusion generation mechanism to improve diagnostic performance. To improve the interpretability of the generation process, patent CN120296522A proposes a scheme that integrates mechanism modeling and interpretable generation constraints, attempting to improve the consistency and interpretability cues of generated samples by introducing mechanism and constraint terms.
[0004] However, the existing solutions mentioned above still have shortcomings under the complex operating conditions of high-end equipment. First, solutions based on generative adversarial networks are usually sensitive to the training configuration under small-sample and strong-noise conditions, so generation stability and diversity are easily affected. Second, solutions based on denoising diffusion models generally involve multi-step reverse sampling processes with large sampling overhead, and it is difficult for them to achieve class consistency and controllable generation under multi-class conditions. Third, existing "interpretable generation" solutions mostly focus on providing interpretability cues through network constraints or loss constraints, and it is still difficult for them to form a low-dimensional, readable, parameterized and traceable generative representation for vibration signals.
[0005] Furthermore, from an engineering deployment perspective, intelligent diagnosis of high-end equipment often faces constraints such as limited computing power, short iteration cycles, and real-time requirements in the field. Data augmentation modules also need to be easy to train, reproduce, and deploy. Existing generation methods often introduce complex generative adversarial structures or multi-stage denoising and sampling processes, leading to unstable training processes, lengthy sampling chains, hyperparameter sensitivity, or high inference overhead, which hinders rapid deployment in practical diagnostic systems. Therefore, while ensuring generation quality and category consistency, constructing a simplified data generation architecture with a simpler structure, more stable training, and more efficient sampling is also of great significance.
Summary of the Invention
[0006] The purpose of this invention is to provide an interpretable, minimally simplified intelligent data augmentation method, system, and medium suitable for fault diagnosis of high-end equipment, so as to solve the poor reliability of fault diagnosis models that results from the uninterpretable generation process, unstable training, and low sampling efficiency of existing intelligent data augmentation methods under conditions of small sample size, class imbalance, and limited engineering computing power.
[0007] To achieve the above objectives, the present invention adopts the following technical solution:
[0008] An interpretable, minimally simplified intelligent data augmentation method applicable to fault diagnosis of high-end equipment includes the following steps:
[0009] S1: Obtain vibration signal data of different types of bearings used in high-end equipment under different loads or speeds and corresponding to different fault types; perform sample segmentation and standardization on the vibration signal data and divide it into training set and test set;
[0010] S2: Construct an interpretable encoder and interpretable decoder based on a two-layer residual stacking structure; train the interpretable encoder and interpretable decoder using a joint objective function based on reconstruction error and sparse regularization; after training, freeze the parameters of the interpretable encoder and interpretable decoder, and map the vibration signal data in the training set into a low-dimensional, readable and parameterized set of interpretable variables through the interpretable encoder.
[0011] S3: Construct a simplified data generation model and train it using the set of interpretable variables; use the trained simplified data generation model to sample and generate new interpretable variables for the minority fault types;
[0012] S4: Input the new interpretable variables sampled for the minority fault types into the parameter-frozen interpretable decoder and translate them back to obtain simulated vibration signals in the time domain; mix the simulated vibration signals with the vibration signal data samples in the training set to construct a balanced training dataset;
[0013] S5: Train the classifier on the balanced training dataset, and use the trained classifier to diagnose faults on the test set, and output the diagnosis results.
[0014] Further refinements of the above technical solution include the following:
[0015] The interpretable encoder is a dual-head encoder structure that outputs the contribution coefficients of each physical component in the vibration signal and the corresponding physical parameter variables.
[0016] Further, in step S2, the double-layer residual stacking structure represents the vibration signal as a two-level fixed residual mechanism consisting of a steady-state background component and an impact transient component; the first level fits and reconstructs the background steady-state component in the vibration signal based on a first function package, and the second level fits and reconstructs the impact transient component in the residual after the first level reconstruction based on a second function package; the final residual statistic after the second level reconstruction characterizes the noise intensity in the signal.
[0017] Preferably, the interpretable variables are composed of the contribution coefficients and physical parameters of the background steady-state components, the contribution coefficients and physical parameters of the impact transient components, and the statistics of noise intensity.
[0018] Furthermore, in step S2, the joint objective function based on reconstruction error and sparse regularization is expressed as:
[0019] $\mathcal{L} = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert a \rVert_1$;
[0020] where $\mathcal{L}$ is the joint objective function, $x$ is the original vibration signal, $\hat{x}$ is the reconstructed signal, $a$ is the contribution coefficients of the components, and $\lambda$ is the regularization coefficient.
[0021] Furthermore, in step S3, the simplified data generation model is a conditional generation backbone network constructed using the Transformer module.
[0022] Furthermore, the training of the simplified data generation model specifically includes adding noise to the interpretable variables to construct noisy inputs. The network takes the noisy variables, time steps, and category conditions as inputs, is trained to predict the clean variables, and constructs the supervision signal and loss function in the form of velocities to optimize the parameters of the backbone network.
[0023] Preferably, the training set is constructed as an imbalanced dataset, including healthy state samples and faulty state samples, with the number of healthy state samples exceeding the number of faulty state samples; the balanced training dataset has a balanced number of healthy state samples and faulty state samples.
[0024] This invention also proposes a system for an interpretable, minimally simplified intelligent data augmentation method suitable for fault diagnosis of high-end equipment, comprising:
[0025] The data acquisition and preprocessing module is used to acquire vibration signal data of various types of bearings used in high-end equipment under different loads and speeds, corresponding to different fault types; the vibration signal data is segmented into samples, standardized, and divided into training and test sets.
[0026] An interpretable space building module is used to construct interpretable encoders and interpretable decoders based on a two-layer residual stacking structure; the interpretable encoders and interpretable decoders are trained by a joint objective function based on reconstruction error and sparse regularization; after training, the parameters of the interpretable encoders and interpretable decoders are frozen, and the vibration signal data in the training set are mapped into a set of low-dimensional, readable and parameterized interpretable variables through the interpretable encoder;
[0027] The simplified data generation model building module is used to build a simplified data generation model and train it using the set of interpretable variables; the trained simplified data generation model is used to sample and generate new interpretable variables for the minority fault types.
[0028] The balanced training dataset construction module is used to input the new interpretable variables sampled for the minority fault types into the parameter-frozen interpretable decoder and translate them back to obtain simulated vibration signals in the time domain; the simulated vibration signals are then mixed with the vibration signal data samples in the training set to construct the balanced training dataset.
[0029] The fault diagnosis module is used to train a classifier on the balanced training dataset, and to use the trained classifier to diagnose faults on the test set, and output the diagnosis results.
[0030] The present invention also proposes a computer-readable storage medium storing a computer program that enables a computer to execute the interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment as described above.
[0031] Compared with the prior art, the beneficial effects of the present invention are:
[0032] The present invention provides an interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment. By constructing an interpretable encoder and decoder based on physical mechanisms, the vibration signals corresponding to different fault types of high-end equipment are mapped into low-dimensional, readable, and parameterized interpretable variables. The interpretable encoder and decoder are trained using a joint objective function of reconstruction error and sparse regularization, so that the differences between different fault categories and signal characteristics are clearly characterized at the variable level. At the same time, the samples are conditionally generated in the interpretable variable space and can be translated back into time-domain signals by the decoder with frozen parameters. The corresponding physical components and specific parameters can be traced dimension by dimension, which enhances the interpretability, traceability, and controllability of the data augmentation process.
[0033] This invention also avoids the adversarial training of complex models by constructing a simplified data generation model and training it by directly predicting clean variables with noisy inputs, making the model training process more stable and with better convergence. At the same time, the sampling stage uses numerical updates on discrete-time grids to reduce sampling computation overhead, improve generation efficiency, and facilitate engineering deployment.
[0034] Furthermore, this invention corrects the problem of imbalanced sample numbers in the training data by mixing the generated, high-quality simulated samples with the original imbalanced samples to construct a balanced training dataset. This effectively alleviates the problems of blurred model discrimination boundaries and insufficient learning of minority class features due to the scarcity of minority class samples, thereby improving the stability and effectiveness of high-end equipment fault diagnosis models under complex working conditions.
Attached Figure Description
[0035] Figure 1: A flowchart illustrating the interpretable, minimally simplified intelligent data augmentation method applicable to fault diagnosis of high-end equipment according to the present invention.
[0036] Figure 2: A three-dimensional visualization of the workflow of the interpretable, minimally simplified intelligent data augmentation method applicable to fault diagnosis of high-end equipment according to the present invention.
[0037] Figure 3: A modular structure diagram of the double-layer residual stacking structure of the present invention.
[0038] Figure 4: A schematic diagram of the construction of the simplified data generation model of the present invention, wherein (a) is the simplified generation backbone network, (b) is the module structure of the Transformer, and (c) is the scaling and bias parameters of the modulation normalization layer.
[0039] Figure 5: Time-domain and frequency-domain waveforms of the simulated fault signals generated by the present invention and of the original signals on the three datasets.
[0040] Figure 6: Fault diagnosis results of the method of the present invention and two comparative models under five imbalanced tasks on dataset 1, where M1, M2 and M3 denote comparative model 1, comparative model 2 and the method of the present invention, respectively.
[0041] Figure 7: Fault diagnosis results of the method of the present invention and two comparative models under five imbalanced tasks on dataset 2, where M1, M2 and M3 denote comparative model 1, comparative model 2 and the method of the present invention, respectively.
[0042] Figure 8: Fault diagnosis results of the method of the present invention and two comparative models under five imbalanced tasks on dataset 3, where M1, M2 and M3 denote comparative model 1, comparative model 2 and the method of the present invention, respectively.
[0043] Figure 9: Confusion matrices of the fault diagnosis results of comparative model 1, comparative model 2 and the method of the present invention on dataset 1 for task 5 of Table 4, where (a), (b) and (c) denote comparative model 1, comparative model 2 and the method of the present invention, respectively.
[0044] Figure 10: Confusion matrices of the fault diagnosis results of comparative model 1, comparative model 2 and the method of the present invention on dataset 2 for task 5 of Table 4, where (a), (b) and (c) denote comparative model 1, comparative model 2 and the method of the present invention, respectively.
[0045] Figure 11: Confusion matrices of the fault diagnosis results of comparative model 1, comparative model 2 and the method of the present invention on dataset 3 for task 5 of Table 4, where (a), (b) and (c) denote comparative model 1, comparative model 2 and the method of the present invention, respectively.
Detailed Implementation
[0046] The present invention will be further described in detail below through specific embodiments, but it should not be construed as limiting the scope of the subject matter of the present invention to the following embodiments. All technologies implemented based on the above content of the present invention fall within the scope of the present invention.
[0047] In one embodiment, this invention proposes an interpretable, minimally simplified intelligent data augmentation method suitable for fault diagnosis of high-end equipment. As shown in Figure 1 and Figure 2, the method includes the following steps:
[0048] S1: Obtain vibration signal data of different types of bearings used in high-end equipment under different loads or speeds and corresponding to different fault types; perform sample segmentation and standardization on the vibration signal data and divide it into training set and test set;
[0049] S2: Construct an interpretable encoder and interpretable decoder based on a two-layer residual stacking structure; train the interpretable encoder and interpretable decoder using a joint objective function based on reconstruction error and sparse regularization; after training, freeze the parameters of the interpretable encoder and interpretable decoder, and map the vibration signal data in the training set into a low-dimensional, readable and parameterized set of interpretable variables through the interpretable encoder.
[0050] S3: Construct a simplified data generation model and train it using the set of interpretable variables; use the trained simplified data generation model to sample and generate new interpretable variables for the minority fault types;
[0051] S4: Input the new interpretable variables sampled for the minority fault types into the parameter-frozen interpretable decoder and translate them back to obtain simulated vibration signals in the time domain; mix the simulated vibration signals with the vibration signal data samples in the training set to construct a balanced training dataset;
[0052] S5: Train the classifier on the balanced training dataset, and use the trained classifier to diagnose faults on the test set, and output the diagnosis results.
[0053] In step S1, data acquisition specifically involves collecting vibration signal data of different types of bearings used in high-end equipment under different speeds or load conditions. The bearings include, but are not limited to, small bearings, medium bearings, and extra-large slewing bearings. The collected data covers different fault types and their corresponding operating conditions, and each segment of vibration signal is labeled with a corresponding fault category label to form an original set of fault vibration signals.
[0054] The continuous vibration signal is segmented into samples of fixed length, abnormal segments are removed, and the samples are standardized or normalized. The samples are then divided into training set and test set. The training set usually exhibits an imbalanced distribution, with the number of healthy state samples significantly exceeding the number of samples in each fault state. This training set is used for subsequent interpretable space construction and data generation modeling. The test set is used for final fault diagnosis verification.
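As an illustration of the segmentation and standardization described above, the following Python sketch cuts a continuous signal into fixed-length samples, standardizes each sample, and splits the result into training and test sets. The function names, the non-overlapping windows, the per-segment z-score, and the 30% test fraction are assumptions made for illustration and are not specified by the invention; removal of abnormal segments is omitted.

```python
import numpy as np

def segment_and_standardize(signal, segment_length, eps=1e-8):
    """Cut a 1-D signal into non-overlapping fixed-length segments and z-score each one."""
    signal = np.asarray(signal, dtype=float)
    n_segments = len(signal) // segment_length            # the incomplete tail is dropped
    segments = signal[: n_segments * segment_length].reshape(n_segments, segment_length)
    mean = segments.mean(axis=1, keepdims=True)
    std = segments.std(axis=1, keepdims=True)
    return (segments - mean) / (std + eps)                # per-segment standardization (assumed)

def split_train_test(segments, labels, test_fraction=0.3, seed=0):
    """Randomly split segments and their fault labels into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(segments))
    n_test = int(round(test_fraction * len(segments)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return segments[train_idx], labels[train_idx], segments[test_idx], labels[test_idx]
```

In practice the split would be applied per fault category so that the training set keeps the intended imbalance between healthy and faulty samples.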
[0055] In step S2, the processed vibration data is mapped into low-dimensional, readable, and parameterized interpretable variables to characterize class differences and noise characteristics.
[0056] First, an interpretable encoder and an interpretable decoder are constructed. The interpretable encoder is a dual-head encoder structure that outputs the contribution coefficients of each physical component in the vibration signal and the corresponding physical parameter variables.
[0057] Let the input original vibration segment be $x$. The component contribution coefficients are output by the weight head and constrained by the hyperbolic tangent function to be continuously differentiable signed weights, expressed as:
[0058] $a = \tanh\big(f_a(h)\big), \quad h = f_b(x)$ (1);
[0059] where $a$ is the contribution coefficients of each physical component, $f_a$ is the neural network mapping function of the weight head, and $h$ is the high-level feature embedding obtained after the vibration segment $x$ passes through the shared backbone network $f_b$.
[0060] The normalized parameters are output by the parameter head, expressed as:
[0061] $u = f_p(h) \in [0, 1]^{P}$ (2);
[0062] where $u$ is the normalized physical parameters, $P$ is the total dimension of the physical parameters, and $f_p$ is the neural network mapping function of the parameter head.
[0063] The normalized parameters are then projected onto a preset range of physical parameters, expressed as:
[0064] $p = p_{\min} + u \odot (p_{\max} - p_{\min})$ (3);
[0065] where $p$ is the variable obtained by linearly projecting the normalized parameter $u$ onto the preset range of actual physical parameters, and $p_{\min}$ and $p_{\max}$ are the preset minimum and maximum values of each physical parameter.
[0066] This yields the core variables output by the interpretable encoder, $a$ and $p$.
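The dual-head mapping of equations (1) to (3) can be sketched as follows. This is a minimal illustration, not the patented network: each head is reduced to a single linear layer, the normalization of the parameter head is assumed to be a sigmoid, and all names (`dual_head_outputs`, `W_a`, `W_p`, and so on) are hypothetical.

```python
import numpy as np

def dual_head_outputs(h, W_a, b_a, W_p, b_p, p_min, p_max):
    """Map a feature embedding h to signed weights a and range-limited parameters p."""
    # Weight head: tanh keeps every contribution coefficient in [-1, 1], as in Eq. (1).
    a = np.tanh(W_a @ h + b_a)
    # Parameter head: normalized parameters in [0, 1], as in Eq. (2).
    # The sigmoid is an assumption; the source only states that u is normalized.
    u = 1.0 / (1.0 + np.exp(-(W_p @ h + b_p)))
    # Linear projection onto the preset physical range, as in Eq. (3).
    p = p_min + u * (p_max - p_min)
    return a, u, p
```

Because `p` is obtained by a fixed linear projection, every generated value stays inside the preset physical range and can be read directly in physical units.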
[0067] The double-layer residual stacking structure represents the vibration signal as a two-level fixed residual mechanism consisting of a steady-state background component and an impact transient component, as shown in Figure 3. The first level fits and reconstructs the background steady-state component of the vibration signal based on the first function package, expressed as:
[0068] $\hat{x}^{(1)}(t) = \sum_{n} a^{(1)}_{n}\,\phi_{n}(t;\, p_{n}), \quad r^{(1)}(t) = x(t) - \hat{x}^{(1)}(t)$ (4);
[0069] where $\hat{x}^{(1)}(t)$ is the reconstructed signal of the background steady-state component; $\phi_n$ are the preset basis functions used to model the physical components of the vibration signal, with the sum taken over the first function package; $p_n$ are the physical parameters corresponding to the basis functions; $a^{(1)}_n$ is the contribution coefficient of the $n$-th basis function in the first level; $r^{(1)}(t)$ is the residual signal after the first-level reconstruction; $x(t)$ is the original vibration segment; and $t$ is the time step.
[0070] The second level, based on the second function package, fits and reconstructs the impact transient component in the residual $r^{(1)}(t)$ left by the first level, further reconstructing the impact-related components, expressed as:
[0071] $\hat{x}^{(2)}(t) = \sum_{n} a^{(2)}_{n}\,\phi_{n}(t;\, p_{n}), \quad r^{(2)}(t) = r^{(1)}(t) - \hat{x}^{(2)}(t)$ (5);
[0072] where $\hat{x}^{(2)}(t)$ is the reconstructed signal of the impact transient component, with the sum taken over the second function package; $a^{(2)}_n$ is the contribution coefficient of the $n$-th basis function in the second level; and $r^{(2)}(t)$ is the final residual signal after the second-level reconstruction.
[0073] The final reconstruction of the original vibration signal is expressed as:
[0074] $\hat{x}(t) = \hat{x}^{(1)}(t) + \hat{x}^{(2)}(t)$ (6);
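The two-level residual mechanism can be illustrated with a short Python sketch. The concrete basis functions are assumptions chosen for illustration only: a sinusoid stands in for the first function package and an exponentially decaying oscillation for the second; the invention does not disclose its basis functions in this passage, and the names `harmonic`, `impulse`, and `two_level_decode` are hypothetical.

```python
import numpy as np

def harmonic(t, freq, phase):
    """Assumed steady-state basis: a sinusoid with frequency and phase parameters."""
    return np.sin(2 * np.pi * freq * t + phase)

def impulse(t, onset, decay, freq):
    """Assumed transient basis: a decaying oscillation that starts at `onset`."""
    tau = np.clip(t - onset, 0.0, None)
    return np.where(t >= onset, np.exp(-decay * tau) * np.sin(2 * np.pi * freq * tau), 0.0)

def two_level_decode(x, t, a1, p1, a2, p2):
    """Reconstruct x in two fixed stages and return the residual of each stage."""
    x_bg = sum(a * harmonic(t, *p) for a, p in zip(a1, p1))    # level 1: steady-state background
    r1 = x - x_bg                                              # first-level residual
    x_imp = sum(a * impulse(t, *p) for a, p in zip(a2, p2))    # level 2: impact transients
    r2 = r1 - x_imp                                            # final residual (noise-like)
    x_hat = x_bg + x_imp                                       # overall reconstruction
    return x_hat, r1, r2
```

With exact coefficients and parameters, the final residual `r2` contains only what neither level explains, which is why its statistics can serve as a noise-intensity descriptor.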
[0075] The noise intensity is characterized by the statistics of the second-level residual, expressed as:
[0076] $s = \mathrm{std}\big(r^{(2)}\big)$ (7);
[0077] where $s$ is the statistical vector of noise intensity, $\mathrm{std}(\cdot)$ computes the standard deviation of the final residual, and $r^{(2)}$ is the final residual signal after the second-level reconstruction.
[0078] Finally, the interpretable variables are constructed, expressed as:
[0079] $z = \big[a^{(1)},\, a^{(2)},\, p,\, s\big] \in \mathbb{R}^{d}$ (8);
[0080] where $z$ is the interpretable variable of the final output, $a^{(1)}$ is the first-level contribution coefficients, $a^{(2)}$ is the second-level contribution coefficients, $p$ is the key physical parameters, $s$ is the noise intensity statistic, and $d$ is the dimension of the interpretable variable, $d = 17$.
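A small sketch of how such a 17-dimensional interpretable variable might be packed and read back dimension by dimension is given below. The source fixes only the total dimension d = 17; the split into 4 + 4 + 8 + 1 entries, the use of a single standard deviation as the noise statistic, and the names `LAYOUT`, `pack`, and `unpack` are hypothetical.

```python
import numpy as np

# Hypothetical layout: only the total d = 17 comes from the source.
LAYOUT = {"a1": 4, "a2": 4, "p": 8, "s": 1}

def pack(a1, a2, p, r2):
    """Concatenate coefficients, parameters and the noise statistic into one vector z."""
    s = np.array([np.std(r2)])                 # noise intensity from the final residual
    z = np.concatenate([a1, a2, p, s])
    if z.size != sum(LAYOUT.values()):
        raise ValueError("component sizes do not match the assumed layout")
    return z

def unpack(z):
    """Split z back into named groups so each dimension can be traced to its meaning."""
    out, start = {}, 0
    for name, size in LAYOUT.items():
        out[name] = z[start:start + size]
        start += size
    return out
```

Keeping a fixed, named layout is what makes a generated vector traceable: any entry of `z` can be mapped back to a specific coefficient, physical parameter, or noise statistic.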
[0081] The interpretable encoder and interpretable decoder are trained using a joint objective function of reconstruction error and sparse regularization, expressed as:
[0082] $\mathcal{L} = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert a \rVert_1$ (9);
[0083] where $\mathcal{L}$ is the joint objective function, $x$ is the original vibration signal, $\hat{x}$ is the vibration signal reconstructed by the interpretable decoder from the interpretable variable $z$, $a$ is the contribution coefficients of all physical components, and $\lambda$ is the regularization coefficient. The sparse regularization term $\lVert a \rVert_1$ encourages a small number of key components to dominate the representation, thereby improving the readability and separability of $z$.
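The effect of combining a reconstruction error with a sparsity penalty can be seen in a few lines of Python. The squared L2 reconstruction error, the L1 penalty, and the default weight `lam` are assumptions for illustration; the source states only that the objective joins a reconstruction error with a sparse regularization term on the contribution coefficients.

```python
import numpy as np

def joint_objective(x, x_hat, a, lam=1e-3):
    """Reconstruction error plus a sparsity penalty on the contribution coefficients."""
    reconstruction = np.sum((x - x_hat) ** 2)   # assumed squared L2 error
    sparsity = np.sum(np.abs(a))                # assumed L1 penalty
    return reconstruction + lam * sparsity

# Two coefficient vectors that reconstruct equally well: the sparser one scores lower.
x = np.array([1.0, 2.0, 3.0])
x_hat = np.array([1.0, 2.0, 2.0])
dense = joint_objective(x, x_hat, np.array([0.5, -0.5, 0.5]), lam=0.1)
sparse = joint_objective(x, x_hat, np.array([0.5, 0.0, 0.0]), lam=0.1)
```

When two explanations fit the signal equally well, the penalty favors the one that uses fewer active components, which is the readability property the objective is meant to promote.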
[0084] After training, the parameters of the interpretable encoder and interpretable decoder are frozen, and the training set samples are mapped to a set of interpretable variables $\{(z_i, y_i)\}$, where $y_i$ is the fault category label. The freezing strategy ensures that any generated interpretable variable $z$ can be translated back into a time-domain signal by the same decoder and traced back, dimension by dimension, to the corresponding physical components and parameters.
[0085] In step S3, a simplified data generation backbone network is constructed that uses only the Transformer module and directly predicts clean data. This network is trained on the data in the interpretable space to establish a conditional generation model in the interpretable variable space, learn the conditional distribution of each fault category, and perform sampling-based generation.
[0086] The interpretable variable $z$ is serialized into a sequence of tokens. After linear embedding and the addition of positional information, the sequence is fed into a stack of Transformer encoder blocks. The time step $t$ is injected into the Transformer encoder as a condition to achieve time-conditional modeling. The Transformer output is passed through a linear prediction head to obtain a direct prediction $\hat{z}$ of the clean interpretable variable, which is used in subsequent training and sampling.
[0087] For each training pair $(z, y)$, a time step $t$ and Gaussian noise $\epsilon$ are sampled to construct a noisy input, expressed as:
[0088] $z_t = t\,z + (1 - t)\,\epsilon$ (10);
[0089] where $z_t$ is the noisy interpretable variable, $t$ is the time step, $\epsilon \sim \mathcal{N}(0, I)$ is standard Gaussian noise, and $y$ is the fault category.
[0090] The target velocity is defined as:
[0091] $v = \dfrac{z - z_t}{1 - t}$ (11);
[0092] $z_t$, $t$ and $y$ are input into the Transformer-only simplified generation backbone network to obtain the clean predictor variable, expressed as: $\hat{z} = f_\theta(z_t, t, y)$ (12);
[0093] where $\hat{z}$ is the predictor variable and $f_\theta$ is the simplified generation backbone network;
[0094] The predicted velocity $\hat{v} = (\hat{z} - z_t)/(1 - t)$ is obtained from equation (11) by substituting $\hat{z}$ for $z$, and $\lVert \hat{v} - v \rVert_2^2$ is used as the training loss function to update the network parameters $\theta$.
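The training objective can be sketched in Python as follows: the network predicts the clean variable, and the loss is computed between velocities. The sketch assumes the linear interpolation z_t = t z + (1 - t) ε, uniform sampling of t, a small margin `eps_t` that keeps 1 - t away from zero, and a mean-squared velocity error; `predict_clean` is a hypothetical stand-in for the Transformer backbone, and none of these choices is confirmed by the source.

```python
import numpy as np

def velocity_loss(predict_clean, z, y, rng, eps_t=1e-3):
    """Clean-variable prediction trained with a velocity-space loss (sketch)."""
    batch = z.shape[0]
    t = rng.uniform(0.0, 1.0 - eps_t, size=(batch, 1))   # assumed uniform time sampling
    noise = rng.standard_normal(z.shape)
    z_t = t * z + (1.0 - t) * noise                      # noisy input
    v_target = (z - z_t) / (1.0 - t)                     # target velocity, equal to z - noise
    z_pred = predict_clean(z_t, t, y)                    # network output: clean variable
    v_pred = (z_pred - z_t) / (1.0 - t)                  # same mapping applied to the prediction
    return np.mean((v_pred - v_target) ** 2)
```

A predictor that returns the true clean variable yields zero loss, so the velocity-space loss and the clean-variable prediction target are consistent with each other.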
[0095] In the sampling stage, the sample is initialized from noise, $z_{t_0} \sim \mathcal{N}(0, I)$, and an increasing time series $0 = t_0 < t_1 < \dots < t_K = 1$ is set. At each time step $t_k$, $z_{t_k}$, $t_k$ and $y$ are input into the Transformer-only simplified generation backbone network to obtain the network's direct prediction of the clean variable, $\hat{z}_k = f_\theta(z_{t_k}, t_k, y)$. The predicted velocity is then computed, and an Euler update gives the next sample, expressed as:
[0096] $z_{t_{k+1}} = z_{t_k} + (t_{k+1} - t_k)\,\hat{v}_k, \quad \hat{v}_k = \dfrac{\hat{z}_k - z_{t_k}}{1 - t_k}$ (13);
[0097] where $z_{t_{k+1}}$ is the variable value at the next time step computed by the Euler update; the process iterates from $t_0$ to $t_K$, and $z_{t_K}$ is the generated synthetic interpretable variable; $z_{t_k}$ is the noisy variable estimate at the $k$-th time step; and $\hat{v}_k$ is the predicted velocity computed by the network at time step $t_k$ under the category condition $y$;
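The Euler sampling loop can be sketched as below. It assumes a uniform time grid from 0 (pure noise) to 1 (clean variable) and the velocity mapping v = (ẑ - z) / (1 - t); `predict_clean` is again a hypothetical stand-in for the trained backbone, and `n_steps` is an illustrative parameter.

```python
import numpy as np

def euler_sample(predict_clean, y, dim, n_steps, rng):
    """Generate interpretable variables for the labels y by Euler integration (sketch)."""
    ts = np.linspace(0.0, 1.0, n_steps + 1)              # increasing grid t_0 = 0, ..., t_K = 1
    z = rng.standard_normal((len(y), dim))               # noise initialization
    for t_k, t_next in zip(ts[:-1], ts[1:]):
        t = np.full((len(y), 1), t_k)
        z_pred = predict_clean(z, t, y)                  # direct prediction of the clean variable
        v = (z_pred - z) / (1.0 - t_k)                   # predicted velocity; t_k < 1 on this grid
        z = z + (t_next - t_k) * v                       # Euler update
    return z
```

Each step costs one forward pass, so the number of grid points directly sets the sampling cost; the last grid point t_K = 1 is never used as a denominator because the loop stops one step earlier.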
[0098] The sampled interpretable variables are input into the parameter-frozen interpretable decoder and translated back to obtain simulated vibration signals in the time domain; the simulated vibration signals are then mixed with the imbalanced vibration signal data samples in the training set to construct a balanced training dataset.
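The construction of the balanced training dataset can be sketched as follows. The rule of topping every minority class up to the size of the largest class is an assumption, and `generate` is a hypothetical callable that stands for sampling interpretable variables for a class and decoding them into time-domain signals.

```python
import numpy as np

def synthetic_counts(labels):
    """Number of simulated samples each minority class needs to match the largest class."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    return {int(c): int(target - n) for c, n in zip(classes, counts) if n < target}

def build_balanced(x_real, y_real, generate):
    """Mix real samples with generated minority-class samples into one balanced set."""
    xs, ys = [x_real], [y_real]
    for cls, n in synthetic_counts(y_real).items():
        xs.append(generate(cls, n))                      # n simulated signals for class cls
        ys.append(np.full(n, cls))
    return np.concatenate(xs), np.concatenate(ys)
```

For example, with 10 healthy samples and 3 and 2 samples of two fault types, the sketch requests 7 and 8 simulated signals respectively, giving 10 samples per class.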
[0099] To verify the improvement effect of the proposed method on fault diagnosis performance under different sample generation intensities, the classifier was trained on the balanced training dataset and fault diagnosis was performed on the test set to output the diagnosis results.
[0100] This invention also proposes a system for an interpretable, minimally simplified intelligent data augmentation method suitable for fault diagnosis of high-end equipment, comprising:
[0101] The data acquisition and preprocessing module is used to acquire vibration signal data of various types of bearings used in high-end equipment under different loads and speeds, corresponding to different fault types; the vibration signal data is segmented into samples, standardized, and divided into training and test sets.
[0102] An interpretable space building module is used to construct interpretable encoders and interpretable decoders based on a two-layer residual stacking structure; the interpretable encoders and interpretable decoders are trained by a joint objective function based on reconstruction error and sparse regularization; after training, the parameters of the interpretable encoders and interpretable decoders are frozen, and the vibration signal data in the training set are mapped into a set of low-dimensional, readable and parameterized interpretable variables through the interpretable encoder;
[0103] The simplified data generation model building module is used to build a simplified data generation model and train it using the set of interpretable variables; the trained simplified data generation model is used to sample and generate new interpretable variables for the minority fault types.
[0104] The balanced training dataset construction module is used to input the new interpretable variables sampled for the minority fault types into the parameter-frozen interpretable decoder and translate them back to obtain simulated vibration signals in the time domain; the simulated vibration signals are then mixed with the vibration signal data samples in the training set to construct the balanced training dataset.
[0105] The fault diagnosis module is used to train a classifier on the balanced training dataset, and to use the trained classifier to diagnose faults on the test set, and output the diagnosis results.
[0106] The present invention also proposes a computer-readable storage medium storing a computer program that enables a computer to execute the above-described interpretable, minimally simplified intelligent data augmentation method applicable to fault diagnosis of high-end equipment.
[0107] To further understand the technical solution of the present invention, a detailed description is provided in conjunction with specific embodiments:
[0108] This embodiment uses the bearing dataset from Huazhong University of Science and Technology, the bogie bearing dataset from Beijing Jiaotong University, and the extra-large slewing bearing dataset from Nanjing Tech University (referred to as dataset 1, dataset 2, and dataset 3, respectively) to conduct verification experiments. The three datasets cover bearings of different sizes, different fault types, and different operating conditions such as speed and load. They are preprocessed and combined for verification according to the experimental settings of this invention to test the applicability of its data augmentation and fault diagnosis in different types of high-end equipment bearing scenarios.
[0109] The collected vibration signals are preprocessed, including segmenting continuous vibration signals into samples of fixed length, removing abnormal segments, and standardizing or normalizing the samples. The samples are then divided into training and test sets. The training set usually exhibits an imbalanced class distribution and is used for subsequent interpretable space construction and data generation modeling, while the test set is used for final fault diagnosis verification.
[0110] The three datasets are preprocessed as follows:
[0111] Dataset 1: The data comes from the bearing dataset released by Huazhong University of Science and Technology. The experiment used a triaxial accelerometer to collect axial (X) and radial (Y, Z) vibration signals at a sampling frequency of 25.6 kHz. Multi-condition data were constructed by adjusting the motor speed. Five states were selected for the experiment: healthy, moderate inner ring damage, moderate outer ring damage, severe outer ring damage, and severe inner ring damage. The data classification is shown in Table 1.
[0112] Table 1. Detailed information on the bearing dataset from Huazhong University of Science and Technology.
[0113]
[0114] Dataset 2: The data comes from the publicly available bogie drive system fault simulation dataset from Beijing Jiaotong University. This data was collected on a subway train bogie drive system test bench, covering various operating conditions and fault states of key components such as motors, gearboxes, and axle boxes. Seven states were selected for verification: healthy, and damage to the inner and outer races and rolling elements in the axle box and gearbox. Data partitioning and labeling are shown in Table 2.
[0115] Table 2. Detailed information on the Beijing Jiaotong University bogie dataset.
[0116]
[0117] Dataset 3: The data comes from the measured vibration data of the extra-large slewing bearing test bench of Nanjing University of Technology. It has engineering characteristics such as low speed, heavy load, and a strong noise background. The sampling frequency is 2048 Hz. Four states are selected for the experiment: healthy, single bolt fracture, multiple bolt fracture, and raceway damage. The training set and test set are divided according to the proportions given in Table 3.
[0118] Table 3. Detailed information on the dataset of extra-large slewing bearings from Nanjing University of Technology.
[0119]
[0120] An interpretable encoder and interpretable decoder are constructed based on a two-layer residual stacking structure. The interpretable encoder and interpretable decoder are trained by a joint objective function based on reconstruction error and sparse regularization. After training, the parameters of the interpretable encoder and interpretable decoder are frozen, and the vibration signal data in the training set are mapped into a low-dimensional, readable and parameterized set of interpretable variables through the interpretable encoder.
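The two-level residual idea and the joint objective can be illustrated with a small sketch. It makes several assumptions: the background and transient reconstructions are passed in as ready-made lists rather than fitted from function packages, the reconstruction error is taken as a sum of squared differences, and the sparse regularization is taken as an L1 penalty on the contribution coefficients. The names `two_level_residual` and `joint_objective` and all numeric values are hypothetical.

```python
import math

def two_level_residual(x, background, transient):
    """Level 1 removes the steady-state background; level 2 removes the impact
    transient from the level-1 residual. The standard deviation of what remains
    serves as a noise-intensity statistic."""
    residual_1 = [xi - bi for xi, bi in zip(x, background)]
    residual_2 = [ri - ti for ri, ti in zip(residual_1, transient)]
    x_hat = [bi + ti for bi, ti in zip(background, transient)]
    mean = sum(residual_2) / len(residual_2)
    noise_std = math.sqrt(sum((r - mean) ** 2 for r in residual_2) / len(residual_2))
    return x_hat, noise_std

def joint_objective(x, x_hat, coeffs, lam):
    """Squared reconstruction error plus lam times the L1 norm of the
    contribution coefficients (both choices are assumptions of this sketch)."""
    recon = sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))
    sparsity = sum(abs(c) for c in coeffs)
    return recon + lam * sparsity

# Toy values chosen only to exercise the functions.
x = [1.0, 2.0, 3.0, 4.0]
background = [1.0, 1.0, 1.0, 1.0]
transient = [0.0, 1.0, 2.0, 2.0]
x_hat, noise_std = two_level_residual(x, background, transient)
loss = joint_objective(x, x_hat, coeffs=[0.5, -0.5, 0.0], lam=0.1)
```

A larger `lam` pushes more contribution coefficients toward zero, which tends to keep the set of active components small and easier to read.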
[0121] Structure of the Transformer-only generative backbone network: as shown in Figure 4(a), the noisy interpretable variables are first padded to the required dimension and then divided into vector blocks of a fixed patch size. After a token sequence is obtained through linear embedding, it is input into a stacked Transformer encoder for global modeling. Finally, a clean variable estimate is output directly through a linear prediction head, and a v-loss function is constructed from this estimate to supervise training. Further, as shown in Figure 4(b), each Transformer block consists of conditional modulation normalization (AdaLN), multi-head self-attention (MHSA), and a feedforward network (MLP). Both sub-layers employ residual connections to improve training stability and enhance the ability to represent coupling between parameter dimensions. Meanwhile, as shown in Figure 4(c), the time step t and the category y are mapped to conditional embeddings, and the scaling and bias parameters of the normalization layer in each block are modulated using the AdaLN and FiLM methods. This achieves conditional generation without introducing additional complex components, enabling the model to learn each fault category and generate traceable synthetic variables. Table 4 shows the structural parameters.
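The padding, patching, and conditional modulation steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the names `pad_and_patch`, `layer_norm`, and `adaln_modulate` are hypothetical, zero padding is assumed, and the modulation convention `norm(x) * (1 + scale) + shift` is one common choice that the text does not confirm. In a trained model, `scale` and `shift` would be produced from the embeddings of the time step t and the category y; here they are passed in directly.

```python
import math

def pad_and_patch(variables, patch_size, pad_value=0.0):
    """Pad a flat variable vector to a multiple of patch_size, then split it
    into consecutive patches (the tokens before linear embedding)."""
    padded = list(variables)
    remainder = len(padded) % patch_size
    if remainder:
        padded += [pad_value] * (patch_size - remainder)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]

def layer_norm(token, eps=1e-6):
    """Normalize one token to zero mean and unit variance, without learned parameters."""
    mean = sum(token) / len(token)
    var = sum((v - mean) ** 2 for v in token) / len(token)
    return [(v - mean) / math.sqrt(var + eps) for v in token]

def adaln_modulate(token, scale, shift):
    """Apply condition-dependent scale and shift after normalization
    (assumed convention: norm(x) * (1 + scale) + shift)."""
    normed = layer_norm(token)
    return [n * (1.0 + s) + b for n, s, b in zip(normed, scale, shift)]

# Hypothetical 5-dimensional interpretable variable vector, patch size 2.
patches = pad_and_patch([0.1, 0.2, 0.3, 0.4, 0.5], patch_size=2)
modulated = adaln_modulate([1.0, 3.0], scale=[1.0, 1.0], shift=[0.5, 0.5])
```

With `scale` and `shift` both zero, `adaln_modulate` reduces to plain layer normalization, so the condition only changes the block's behavior when the embeddings are nonzero.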
[0122] Table 4. Structural parameters of the minimally simplified generative backbone network.
[0123]
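The training target, in which the network predicts clean variables while the loss is expressed in velocity form, can be sketched as below. The sketch rests on assumptions that the text does not specify: a linear interpolation path z_t = t·x + (1 − t)·ε between noise ε and clean variables x, a target velocity v = x − ε, and a predicted velocity derived from the clean estimate as (x_pred − z_t) / (1 − t). The function names and the clamp `min_denom` are hypothetical.

```python
def interpolate(x, eps, t):
    """Noisy input on an assumed linear path: z_t = t * x + (1 - t) * eps."""
    return [t * xi + (1.0 - t) * ei for xi, ei in zip(x, eps)]

def v_loss(x_pred, x, eps, t, min_denom=1e-3):
    """Mean squared error between the velocity implied by the clean-variable
    prediction and the target velocity x - eps."""
    z_t = interpolate(x, eps, t)
    denom = max(1.0 - t, min_denom)  # avoid division by zero as t approaches 1
    v_pred = [(pi - zi) / denom for pi, zi in zip(x_pred, z_t)]
    v_true = [xi - ei for xi, ei in zip(x, eps)]
    return sum((a - b) ** 2 for a, b in zip(v_pred, v_true)) / len(x)

# Toy values: a perfect clean-variable prediction should give (near) zero loss.
x = [1.0, -2.0]
eps = [0.5, 0.5]
t = 0.3
loss_perfect = v_loss(x, x, eps, t)
loss_biased = v_loss([1.7, -1.3], x, eps, t)
```

Under these assumptions, an error in the clean estimate is amplified by 1 / (1 − t) in the velocity, so errors at time steps close to the clean end are weighted more heavily.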
[0124] Five progressive tasks were set up according to Table 5: starting with no generated samples added, the samples generated by each generative model were gradually mixed into the original training set until 150 generated samples had been added to each class to form a balanced dataset. For each task, a two-layer convolutional neural network classifier was trained on the mixed training set of "original training data + generated data", and the experiment was repeated 5 times.
[0125] Table 5. Progressive task division.
[0126]
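The progressive mixing of generated samples into the training set can be sketched as follows. The function name `mix_generated`, the class names, the original class sizes, and the intermediate schedule values are placeholders for illustration and are not the values of Table 5; only the first step (no generated samples) and the last step (150 generated samples) follow the description.

```python
def mix_generated(original, generated, n_per_class):
    """Return a new training set in which up to n_per_class generated samples
    are appended to each class that has generated samples available."""
    mixed = {}
    for label, samples in original.items():
        extra = generated.get(label, [])[:n_per_class]
        mixed[label] = list(samples) + list(extra)
    return mixed

# Placeholder data: strings stand in for vibration samples.
original = {"healthy": ["real"] * 160, "inner_ring": ["real"] * 10}
generated = {"inner_ring": ["simulated"] * 150}

# Hypothetical schedule for five progressive tasks.
schedule = [0, 40, 80, 120, 150]
class_sizes = []
for n in schedule:
    mixed = mix_generated(original, generated, n)
    class_sizes.append({label: len(s) for label, s in mixed.items()})
```

In this toy setting the last task yields 160 samples in both classes, which is the balanced condition under which the classifier is finally trained.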
[0127] As shown in Figure 5, the simulated fault vibration signals generated by this invention on datasets 1, 2, and 3 maintain a high degree of consistency with the corresponding original signals in both time-domain waveform morphology and frequency-domain energy distribution. They effectively reproduce fault characteristics and reflect reasonable operating-condition and noise disturbances, demonstrating the feasibility and effectiveness of the method. Specifically, in the time domain, the generated signals preserve the overall amplitude scale, waveform fluctuation trend, and typical impact/ringing characteristics of the original signals (such as the combination of periodic impacts and subsequent decaying ringing in dataset 1, the superposition of random impacts and background noise in dataset 2, and the small-amplitude fluctuations and local abrupt changes under the low-speed, heavy-load background in dataset 3). Their waveform contours and key transient morphologies are consistent with the original signals. In the frequency domain, the generated signals preserve the main spectral peak positions, dominant frequency band range, and energy concentration regions of the original signals (such as the coexistence of low-frequency components and high-frequency energy clusters in dataset 1, the main spectral peaks over a broadband background in dataset 2, and the low-frequency dominance with characteristic peaks in dataset 3). Furthermore, the generated spectrum matches the original spectrum in peak structure and overall energy decay trend.
[0128] Therefore, by performing conditional generation within the interpretable variable space and back-translating the result through the interpretable decoder, this invention can generate high-quality simulated samples consistent with the original fault characteristics for different types of bearings and different operating conditions. This avoids simply copying the original samples while maintaining the authenticity and usability of the fault information, providing a reliable data foundation for subsequently building a balanced dataset and improving the fault diagnosis effect.
[0129] To study the performance of this invention in data generation and fault diagnosis, two representative data generation models based on different approaches were selected for experimental comparison: a data generation model based on a Generative Adversarial Network (GAN) and a data generation model based on a Denoising Diffusion Probabilistic Model (DDPM), referred to as Comparative Model 1 and Comparative Model 2, respectively.
[0130] As shown in Figures 6, 7, and 8, under the five imbalanced task scenarios across datasets 1, 2, and 3, the diagnostic accuracy of all three methods increased overall with the addition of generated samples, indicating that generating samples to fill in the minority class effectively alleviates the underrepresentation problem caused by small, imbalanced samples. However, this invention consistently achieves higher and more stable diagnostic results across all three datasets, with its advantage gradually widening as the tasks progress. Specifically:
[0131] In Dataset 1, the overall accuracy in Task 1 was relatively low. From Task 2 to Task 3, the accuracy of this invention improved from approximately 59.5% to approximately 75.0%, consistently higher than that of Comparative Model 2 (from approximately 50.3% to approximately 68.2%) and Comparative Model 1 (from approximately 47.8% to approximately 64.1%). When the number of generated samples was further increased in Tasks 4 and 5, the accuracy of this invention reached approximately 90.4% and approximately 98.3%, respectively, showing not only the largest improvement but also less fluctuation across repeated experiments. In contrast, Comparative Model 2 improved to approximately 87.9%–93.8%, and Comparative Model 1 improved to approximately 76.7%–86.0%.
[0132] In Dataset 2, the accuracy of each method was low under the extremely small sample conditions of Task 1; in Tasks 2 and 3, the advantage of this invention began to widen significantly, with its accuracy increasing to about 59.0%~61.4% and 76.2%~77.4%, respectively; it maintained a stable lead in Tasks 4 and 5, reaching about 89.2%~90.4% in Task 4 and about 96.6%~97.8% in Task 5.
[0133] In the noisy scenario of Dataset 3, the accuracy of each method in Task 1 is close to random levels, approximately 9.8%~12.2%, with the present invention achieving 10.0%~10.7%. As the tasks progress, the advantage of the present invention becomes steadily more apparent: 33.2%~35.4% in Task 2; 50.6%~53.5% in Task 3, compared to 33.9%~36.3% for Comparative Model 1 and 38.3%~40.3% for Comparative Model 2; 62.2%~64.9% in Task 4, compared to 39.8%~46.5% for Comparative Model 1 and 48.7%~54.0% for Comparative Model 2; and 71.1%~75.1% in Task 5, compared to 47.8%~52.9% for Comparative Model 1 and 57.6%~61.3% for Comparative Model 2.
[0134] The results show that the diagnostic benefit of high-quality generated samples becomes more significant as samples are gradually supplemented. The novel interpretable, minimally simplified intelligent data augmentation method of the present invention, which is applicable to fault diagnosis of high-end equipment, maintains a higher performance ceiling and smaller fluctuations under different data scales and noise conditions.
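The accuracy ranges reported for the repeated experiments can be summarized with a few lines of code. This is a minimal sketch: the function name `summarize_runs` is hypothetical, and the five accuracy values are placeholders, not results from the experiments described here.

```python
import statistics

def summarize_runs(accuracies):
    """Summarize repeated runs by mean, sample standard deviation, and range."""
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0,
        "min": min(accuracies),
        "max": max(accuracies),
    }

# Placeholder accuracies for five repetitions of one task (not measured values).
runs = [0.90, 0.92, 0.91, 0.89, 0.93]
summary = summarize_runs(runs)
```

Reporting the minimum and maximum alongside the mean corresponds to the "lower~upper" ranges used in the paragraphs above, and the standard deviation gives a single number for the fluctuation between runs.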
[0135] As shown in Figures 9, 10, and 11, under Task 5 conditions, the confusion matrices for all three datasets indicate that the results of this invention are more concentrated on the main diagonal, with fewer misclassified entries. Compared with Comparative Model 1 and Comparative Model 2, it distinguishes each fault category more effectively, demonstrating superior diagnostic consistency and reliability. This is because the novel interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment provided by this invention performs structured modeling within a low-dimensional interpretable variable space, making the generation process focus more on components and parameters related to fault discrimination. Furthermore, it uses double-layer residual stacking to suppress background disturbances and noise. At the same time, it employs a minimally simplified generative backbone network using only Transformers, resulting in more stable training and avoiding the instability of adversarial training and the additional complexity of multi-step sampling. Therefore, it achieves superior augmentation and diagnostic effects under different task intensities across the three datasets.
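How concentrated a confusion matrix is on its main diagonal can be checked with a short sketch. The function names and the toy labels below are hypothetical and are unrelated to the results in the figures; rows are taken as true classes and columns as predicted classes.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count samples by (true class, predicted class)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def diagonal_share(matrix):
    """Fraction of all samples on the main diagonal (overall accuracy)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def per_class_recall(matrix):
    """Recall of each true class: diagonal entry divided by the row sum."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
share = diagonal_share(cm)
recall = per_class_recall(cm)
```

Per-class recall is useful here because, with imbalanced data, a high overall diagonal share can still hide poor recognition of a minority fault class.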
[0136] The novel interpretable and minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment provided by this invention can achieve controllable sample generation and balanced dataset construction for different types of bearing data under the conditions of small sample size and class imbalance of vibration data of high-end equipment. It achieves higher and more stable fault diagnosis performance in three typical datasets and five imbalanced task settings. At the same time, by constructing a low-dimensional, readable, parameterized interpretable variable space, the generated samples have traceable components and parameter interpretations at the variable level. Combined with a minimally simplified generation backbone network using only Transformer, the training stability and engineering deployment efficiency of the generation process are improved, further enhancing the reliability and usability of intelligent fault diagnosis for high-end equipment.
[0137] In the embodiments disclosed in this application, a computer storage medium may be a tangible medium that may contain or store programs for use by or in conjunction with an instruction execution system, apparatus, or device. The computer storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of computer storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0138] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any simple modifications, equivalent substitutions, and improvements made by those skilled in the art to the above embodiments without departing from the scope of the technical solution of the present invention, based on the technical essence of the present invention, shall still fall within the protection scope of the technical solution of the present invention.
Claims
1. An interpretable, minimally simplified intelligent data augmentation method applicable to fault diagnosis of high-end equipment, characterized in that it includes the following steps: S1: Obtain vibration signal data of different types of bearings used in high-end equipment under different loads or speeds and corresponding to different fault types; perform sample segmentation and standardization on the vibration signal data and divide it into a training set and a test set; S2: Construct an interpretable encoder and an interpretable decoder based on a two-layer residual stacking structure; train the interpretable encoder and interpretable decoder using a joint objective function based on reconstruction error and sparse regularization; after training, freeze the parameters of the interpretable encoder and interpretable decoder, and map the vibration signal data in the training set into a low-dimensional, readable, and parameterized set of interpretable variables through the interpretable encoder; S3: Construct a minimally simplified data generation model and train it using the set of interpretable variables; use the trained minimally simplified data generation model to sample and generate new interpretable variables for the minority fault types; S4: Input the new interpretable variables sampled for the minority fault types into the interpretable decoder with frozen parameters for back-translation to obtain simulated vibration signals in the time domain; mix the simulated vibration signals with the vibration signal data samples in the training set to construct a balanced training dataset; S5: Train a classifier on the balanced training dataset, use the trained classifier to diagnose faults on the test set, and output the diagnosis results.
2. The interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment according to claim 1, characterized in that: The interpretable encoder is a dual-head encoder structure that outputs the contribution coefficients of each physical component in the vibration signal and the corresponding physical parameter variables.
3. The interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment according to claim 2, characterized in that: In step S2, the double-layer residual stacking structure represents the vibration signal as a two-level fixed residual mechanism consisting of a steady-state background component and an impact transient component. The first level fits and reconstructs the background steady-state component in the vibration signal based on the first function package, and the second level fits and reconstructs the impact transient component in the residual after the first level reconstruction based on the second function package. The final residual statistic after the second level reconstruction characterizes the noise intensity in the signal.
4. The interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment according to claim 3, characterized in that: the interpretable variables consist of the contribution coefficients and physical parameters of the background steady-state components, the contribution coefficients and physical parameters of the impact transient components, and the statistics of noise intensity.
5. The interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment according to claim 1, characterized in that: in step S2, the joint objective function based on reconstruction error and sparse regularization is specifically expressed as: $\mathcal{L} = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert a \rVert_1$; where $\mathcal{L}$ is the joint objective function, $x$ is the original vibration signal, $\hat{x}$ is the reconstructed signal, $a$ is the contribution coefficient of the components, and $\lambda$ is the regularization coefficient.
6. The interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment according to claim 1, characterized in that: in step S3, the minimally simplified data generation model is a conditional generation backbone network constructed using the Transformer module.
7. The interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment according to claim 6, characterized in that: the training of the minimally simplified data generation model specifically includes adding noise to the interpretable variables to construct noisy inputs; the network takes the noisy variables, time steps, and category conditions as inputs, adopts a training method aimed at predicting clean variables, and constructs the supervision signal and loss function in velocity form to optimize the parameters of the conditional generation backbone network.
8. The interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment according to claim 1, characterized in that: The training set is constructed as an imbalanced dataset, including healthy state samples and faulty state samples, with the number of healthy state samples exceeding the number of faulty state samples. The number of healthy state samples and fault state samples in the balanced training dataset is balanced.
9. A system for an interpretable, minimally simplified intelligent data augmentation method suitable for fault diagnosis of high-end equipment, characterized in that it includes: a data acquisition and preprocessing module, used to acquire vibration signal data of various types of bearings used in high-end equipment under different loads and speeds and corresponding to different fault types, and to segment the vibration signal data into samples, standardize it, and divide it into a training set and a test set; an interpretable space construction module, used to construct an interpretable encoder and an interpretable decoder based on a two-layer residual stacking structure, to train the interpretable encoder and interpretable decoder using a joint objective function based on reconstruction error and sparse regularization, and, after training, to freeze the parameters of the interpretable encoder and interpretable decoder and map the vibration signal data in the training set into a low-dimensional, readable, and parameterized set of interpretable variables through the interpretable encoder; a minimally simplified data generation model construction module, used to construct a minimally simplified data generation model, to train it using the set of interpretable variables, and to use the trained minimally simplified data generation model to sample and generate new interpretable variables for the minority fault types; a balanced training dataset construction module, used to input the new interpretable variables sampled for the minority fault types into the interpretable decoder with frozen parameters for back-translation to obtain simulated vibration signals in the time domain, and to mix the simulated vibration signals with the vibration signal data samples in the training set to construct the balanced training dataset; and a fault diagnosis module, used to train a classifier on the balanced training dataset, to use the trained classifier to diagnose faults on the test set, and to output the diagnosis results.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that: The computer program causes the computer to execute the interpretable, minimally simplified intelligent data augmentation method for fault diagnosis of high-end equipment as described in any one of claims 1-8.