High-speed train bearing fault diagnosis method based on a CNN-BiLSTM-Attention hybrid model
This CNN-BiLSTM-Attention hybrid model for high-speed train bearing fault diagnosis addresses the heavy dependence on expert experience, the low accuracy, and the poor generalization ability of existing technologies. It achieves high-precision, robust, and interpretable fault diagnosis and is applicable to the intelligent operation and maintenance of high-speed train bearings and other rotating machinery.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANTONG UNIV
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-19

Figure CN122241073A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of high-speed train running system operation and maintenance technology, and specifically to a high-speed train bearing fault diagnosis method based on deep learning.
Background Technology
[0002] Bearings are the core rotating components of the high-speed train running system, and their operating status directly affects the safety of high-speed train operation. High-speed trains operate under high speed and high load conditions for extended periods, which significantly increases the bearing failure rate. If bearing faults are not identified in time, they can easily lead to train operation failures or even safety accidents. Therefore, accurate and real-time fault diagnosis of high-speed train bearings is of great engineering significance.
[0003] Currently, high-speed train bearing fault diagnosis relies heavily on manual analysis based on expert experience or traditional signal processing techniques. These methods require manual extraction of vibration signal features, are highly dependent on expert experience, and are difficult to adapt to complex and ever-changing operating environments. In scenarios with background noise interference and dynamic changes in operating conditions, the accuracy of fault feature extraction is low, resulting in poor diagnostic performance.
[0004] In recent years, deep-learning-based intelligent fault diagnosis methods have gradually been applied to bearing fault diagnosis, improving diagnostic efficiency through the automatic feature extraction capability of deep learning models. However, they still have several technical shortcomings. First, vibration signals are affected by background noise and changes in operating conditions, and a single deep learning model (such as a Convolutional Neural Network (CNN) or a Long Short-Term Memory (LSTM) network) cannot fully capture both the spatial features and the temporal dependencies of the signal, so fault feature extraction is incomplete. Second, fault samples of high-speed train bearings are scarce in actual engineering practice, so existing deep learning models generalize poorly and perform poorly in cross-condition diagnosis. Third, most existing deep learning models are "black box" structures that cannot explain the diagnostic process, which makes it difficult to gain engineers' trust and limits their practical engineering application. Fourth, some models require complex feature engineering before training, which makes the diagnostic process cumbersome and poorly automated and unable to meet the needs of real-time monitoring.
[0005] To address these technical issues, there is an urgent need for a high-speed train bearing fault diagnosis method that achieves multi-level feature extraction, accurately models time-series dependencies, focuses on key fault information, and offers high generalization ability and interpretability, so as to overcome the shortcomings of existing technologies and improve the accuracy and robustness of bearing fault diagnosis.
Summary of the Invention
[0006] Purpose of the invention: To address the shortcomings of the existing technologies described above, this invention proposes a high-speed train bearing fault diagnosis method based on a CNN-BiLSTM-Attention hybrid model. By comprehensively extracting the spatial and temporal features of the bearing vibration signal, the method achieves high-precision, robust, and highly interpretable diagnosis of high-speed train bearing faults.
[0007] Technical solution: A high-speed train bearing fault diagnosis method based on a CNN-BiLSTM-Attention hybrid model, comprising four steps: data acquisition and preprocessing, CNN-BiLSTM-Attention hybrid model construction, model training and optimization, and fault diagnosis inference. The hybrid model is composed of a CNN module, a BiLSTM module, an Attention module, and a fully connected classification layer connected in sequence. It adopts an end-to-end learning approach and directly outputs the fault type from the original vibration signal input.
[0008] The hybrid model takes the preprocessed vibration signal tensor as input, extracts local spatial features through the CNN module, captures bidirectional temporal dependencies through the BiLSTM module, focuses on key fault information through the Attention module, and outputs the fault category probability distribution through the fully connected classification layer to determine the bearing fault type.
[0009] Furthermore, the data acquisition and preprocessing steps include the following specific steps:
[0010] 1.1: Collect vibration signals of high-speed train bearings in four states: normal, inner ring fault, outer ring fault, and rolling element fault. Set the sampling rate to 32 kHz and construct the original dataset.
[0011] 1.2: The original vibration signal is preprocessed by denoising, normalization, and sequence segmentation. The processed signal is converted into tensor data of dimension (batch_size, seq_length, 1) for use as the model input and divided into a training set, a validation set, and a test set: the training set is used for model training, the validation set for hyperparameter optimization, and the test set for verifying model performance.
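To make the tensor layout concrete, the following Python/NumPy sketch shows normalization and sequence segmentation turning one recording into windows of shape (num_windows, seq_length, 1). It is illustrative only: the window length `seq_length=1024`, the hop `step=512`, and normalizing the whole recording before segmentation are assumptions rather than values from this document, and denoising is omitted. The helper names `min_max_normalize` and `segment` are hypothetical.

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Map a 1-D signal to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x, dtype=np.float32)
    return ((x - lo) / (hi - lo)).astype(np.float32)

def segment(signal: np.ndarray, seq_length: int, step: int) -> np.ndarray:
    """Cut a 1-D signal into windows of shape (num_windows, seq_length, 1).

    Windows start every `step` samples; a trailing partial window is dropped.
    The signal must contain at least `seq_length` samples.
    """
    starts = range(0, len(signal) - seq_length + 1, step)
    windows = np.stack([signal[s:s + seq_length] for s in starts])
    return windows[..., np.newaxis]  # add the single channel axis

# One second of synthetic data at 32 kHz, standing in for a real recording.
rng = np.random.default_rng(0)
raw = rng.normal(size=32_000)
x = segment(min_max_normalize(raw), seq_length=1024, step=512)
print(x.shape)  # (61, 1024, 1)
```

Stacking the windows from several recordings along the first axis gives the (batch_size, seq_length, 1) batches described above.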
[0012] Furthermore, in the CNN-BiLSTM-Attention hybrid model:
[0013] The CNN module consists of a Conv1D convolutional layer, a ReLU activation function layer, and a MaxPooling1D pooling layer connected in sequence. Exploiting the local connectivity, weight sharing, and downsampling characteristics of CNNs, it performs convolution and pooling operations on the input vibration signal tensor, automatically extracts the local spatial features and multi-scale frequency components of the vibration signal, and outputs a feature map. Specifically, the Conv1D convolutional layer generates initial features by sliding a kernel over local windows of the signal, the ReLU activation function layer introduces nonlinearity so that the features can represent complex patterns, and the MaxPooling1D pooling layer reduces the dimensionality of the features, preserving key information while reducing computational cost.
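A minimal NumPy sketch of the CNN module's forward pass for a single sample (no batch axis) follows. It assumes "valid" padding (no zero padding), since the padding mode is not specified, and borrows the layer sizes given in Example 1 (64 kernels of size 3, stride 1; max pooling of size 2, stride 2) only for the usage line. As in deep-learning libraries, `conv1d` computes a cross-correlation. All function names are hypothetical.

```python
import numpy as np

def conv1d(x, w, b, stride=1):
    """Valid 1-D convolution (cross-correlation).

    x: (seq_len, in_ch), w: (kernel, in_ch, out_ch), b: (out_ch,)
    returns (out_len, out_ch) with out_len = (seq_len - kernel) // stride + 1
    """
    k = w.shape[0]
    out_len = (x.shape[0] - k) // stride + 1
    out = np.empty((out_len, w.shape[2]))
    for t in range(out_len):
        patch = x[t * stride:t * stride + k]               # (kernel, in_ch)
        out[t] = np.tensordot(patch, w, axes=([0, 1], [0, 1])) + b
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2, stride=2):
    """Keep the maximum of each pooling window, per channel."""
    out_len = (x.shape[0] - size) // stride + 1
    return np.stack([x[t * stride:t * stride + size].max(axis=0)
                     for t in range(out_len)])

def cnn_module(x, w, b):
    """Conv1D -> ReLU -> MaxPooling1D, as in the module described above."""
    return max_pool1d(relu(conv1d(x, w, b)))

rng = np.random.default_rng(0)
feature_map = cnn_module(rng.normal(size=(1024, 1)),
                         rng.normal(size=(3, 1, 64)) * 0.1, np.zeros(64))
print(feature_map.shape)  # (511, 64)
```

With a 1024-sample input, the convolution yields 1022 positions and pooling roughly halves this to 511, so the feature map has shape (511, 64).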
[0014] The BiLSTM module consists of a forward LSTM unit, a backward LSTM unit, and a vector concatenation layer, and takes the feature map output by the CNN module as input. The forward LSTM unit captures the historical temporal information of the signal, and the backward LSTM unit captures its future temporal information. The vector concatenation layer joins the outputs of the forward and backward units into a temporal hidden state of dimension (batch, seq, hidden*2). Each LSTM unit contains a forget gate, an input gate, a cell state (candidate) unit, and an output gate; this gating system dynamically controls the input, forgetting, and output of information, alleviating the vanishing-gradient problem and enabling accurate modeling of long-range temporal dependencies.
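The gating and concatenation described above can be sketched in NumPy for one sample without a batch axis. The gate ordering (input, forget, candidate, output) and the single bias vector per gate are assumptions. The backward unit is implemented by running an ordinary LSTM over the time-reversed sequence and reversing its outputs again, so both directions are aligned per time step before concatenation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b):
    """Run one LSTM direction over x of shape (seq_len, in_dim).

    W: (in_dim, 4*h), U: (h, 4*h), b: (4*h,); assumed gate order i, f, g, o.
    Returns the hidden states, shape (seq_len, h).
    """
    h_dim = U.shape[0]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:h_dim])                 # input gate
        f = sigmoid(z[h_dim:2 * h_dim])        # forget gate
        g = np.tanh(z[2 * h_dim:3 * h_dim])    # candidate cell state
        o = sigmoid(z[3 * h_dim:])             # output gate
        c = f * c + i * g                      # additive cell-state update
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(x, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states: (seq_len, 2*h)."""
    h_fwd = lstm_forward(x, *fwd_params)
    h_bwd = lstm_forward(x[::-1], *bwd_params)[::-1]  # re-align to original time order
    return np.concatenate([h_fwd, h_bwd], axis=-1)
```

With a 64-channel CNN feature map and 128 hidden units per direction, the output has 256 features per time step, matching the (batch, seq, hidden*2) layout.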
[0015] The Attention module takes the temporal hidden state output by the BiLSTM module as input and mimics the way humans allocate attention. Through four steps (projection generation, similarity calculation, weight normalization, and information aggregation), it dynamically assigns weights to the features at different time steps, increasing the weights of features strongly correlated with the fault and suppressing the weights of noisy and redundant features. It finally outputs a context vector containing the key contextual semantics of the fault. Specifically, projection generation maps the features into different semantic spaces through linear transformations; similarity calculation measures feature correlation with a dot product that is scaled (typically by the square root of the projection dimension) so that large scores do not push the Softmax into regions with vanishing gradients; weight normalization converts the scores into probabilistic attention weights with the Softmax function; and information aggregation produces the context vector as a weighted sum.
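The four attention steps can be sketched as attention pooling in NumPy. Two assumptions should be flagged: a learned query vector `q` scores each time step (the document does not say how queries are formed), and the scores use the scaled dot product described in this paragraph, whereas Example 1 specifies an additive attention mechanism. The returned `alpha` holds the per-time-step attention weights that could be inspected for interpretability.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, W_k, W_v, q):
    """Scaled dot-product attention pooling over time.

    H: (seq_len, d) BiLSTM hidden states; W_k, W_v: (d, d_k); q: (d_k,) query.
    Returns the context vector (d_k,) and attention weights (seq_len,).
    """
    K = H @ W_k                                # 1. projection generation
    V = H @ W_v
    scores = K @ q / np.sqrt(q.shape[0])       # 2. similarity, scaled by sqrt(d_k)
    alpha = softmax(scores)                    # 3. weight normalization
    context = alpha @ V                        # 4. information aggregation
    return context, alpha
```

Because `alpha` sums to one over time, large weights mark the signal segments that dominate the context vector.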
[0016] Fully connected classification layer: Composed of a Dense fully connected layer and a Softmax activation function layer, it takes the context vector output by the Attention module as input, integrates features through the Dense layer, and uses the Softmax activation function to map the features to a fault category probability distribution. The category corresponding to the maximum probability is the fault type of the bearing, thus realizing fault classification. The fault types include four categories: normal, inner ring fault, outer ring fault, and rolling element fault.
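For illustration, the classification head can be sketched as below, using the two Dense layers (64 and 4 neurons) listed in Example 1. The ReLU activation on the 64-unit layer and the class order in `CLASSES` are assumptions not stated in the document; `classify` is a hypothetical name.

```python
import numpy as np

CLASSES = ("Normal", "IR", "OR", "B")   # assumed order of the output neurons

def classify(context, W1, b1, W2, b2):
    """Dense(64) -> Dense(4) -> Softmax; return (predicted class, probabilities)."""
    hidden = np.maximum(context @ W1 + b1, 0.0)   # Dense(64), ReLU assumed
    logits = hidden @ W2 + b2                      # Dense(4)
    e = np.exp(logits - logits.max())              # stable softmax
    probs = e / e.sum()
    return CLASSES[int(np.argmax(probs))], probs   # maximum-probability class
```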
[0017] Furthermore, the model training and optimization steps include the following specific steps:
[0018] 3.1: Using the public Case Western Reserve University (CWRU) bearing dataset as the source-domain base training set, combined with the constructed high-speed train bearing vibration signal dataset, the CNN-BiLSTM-Attention hybrid model is trained. The cross-entropy loss function is used as the model's loss function and adaptive moment estimation (Adam) as the optimizer, with an appropriate learning rate, batch size, and number of training epochs. The model's weights and biases are updated iteratively through forward propagation and backpropagation until the loss function converges to its minimum.
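The loss and optimizer named in step 3.1 can be sketched in NumPy: `cross_entropy` computes the mean softmax cross-entropy from raw logits (shifting by the row maximum for numerical stability), and `Adam` applies the standard bias-corrected Adam update with its usual default hyperparameters. This is a from-scratch sketch for a single parameter array, not the training code used in the experiments.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

class Adam:
    """Bias-corrected Adam update for one parameter array."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, param, grad):
        if self.m is None:
            self.m = np.zeros_like(param)
            self.v = np.zeros_like(param)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad        # first moment
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2   # second moment
        m_hat = self.m / (1 - self.b1 ** self.t)                # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy usage: minimise (w - 3)^2, whose gradient is 2 * (w - 3).
opt = Adam(lr=0.1)
w = np.array([0.0])
for _ in range(300):
    w = opt.step(w, 2 * (w - 3.0))
```

On this toy objective, the first Adam step moves `w` by roughly the learning rate, a characteristic of the bias-corrected update.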
[0019] 3.2: The validation set is used to monitor the model's performance during training, and early stopping is used to prevent overfitting: if the loss on the validation set does not decrease for several consecutive epochs, training stops and the optimal model parameters are saved.
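Early stopping reduces to a small amount of bookkeeping. The class below is a hypothetical sketch; Example 1 uses a patience of 10 epochs, and `min_delta` (the smallest decrease that counts as an improvement) is an added assumption that defaults to zero.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def update(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch   # save (checkpoint) the model parameters here
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience

# Usage with a short, made-up loss curve and patience 3.
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```

Here the best loss is reached at epoch 2, and training stops at epoch 5 after three epochs without improvement.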
[0020] 3.3: To address the problem of scarce fault samples in actual engineering, a transfer learning strategy is adopted to transfer the model parameters trained in the source domain to the target domain (the actual operation scenario of high-speed trains). The model is then fine-tuned through a domain adaptation strategy to improve the model's cross-condition diagnostic capabilities.
[0021] Furthermore, the fault diagnosis inference step includes the following:
[0022] The high-speed train bearing vibration signal, which is collected and preprocessed in real time, is input into the trained and optimized CNN-BiLSTM-Attention hybrid model. Through the sequential processing of the CNN module, BiLSTM module, Attention module and fully connected classification layer of the model, the bearing fault category probability distribution is output. Based on the probability distribution, the real-time operating status of the bearing is determined, realizing online and real-time diagnosis of high-speed train bearing faults.
[0023] Beneficial Effects: The high-speed train bearing fault diagnosis method based on the CNN-BiLSTM-Attention hybrid model of this invention has the following significant beneficial effects compared with the prior art:
[0024] 1. This invention extracts local spatial features of vibration signals with a CNN module, captures bidirectional long-term temporal dependencies with a BiLSTM module, and focuses on key fault information with an Attention module. Working together, the three modules learn the multi-dimensional fault patterns in bearing vibration signals across the spatial, temporal, and key-information dimensions, improving the completeness and accuracy of fault feature extraction. Experimental results show that the model achieves a diagnostic accuracy of 0.9850, an F1 score of 0.9849, and an AUC score of 0.9995, and its accuracy in recognizing the normal state reaches 1.0000, outperforming traditional machine learning models such as Random Forest, SVM, Gradient Boosting, and MLP, as well as single deep learning models. It can accurately distinguish the normal state from the three fault states: inner race, outer race, and rolling element.
[0025] 2. The Attention module dynamically suppresses interference from background noise and redundant information, enabling the model to maintain high diagnostic performance under complex noise and dynamically changing working conditions. The model also supports a transfer learning strategy that transfers knowledge from the source domain to the target domain, alleviating the scarcity of fault samples in actual engineering and significantly improving cross-condition diagnostic capability. In cross-validation, the model reached an accuracy of 97.91% and performed stably across different data splits.
[0026] 3. The Attention mechanism introduced in this invention outputs the attention weight of the features at each time step, clearly identifying the key fault signal segments the model focuses on during diagnosis. This overcomes the "black box" defect of traditional deep learning models, makes the diagnostic process interpretable, increases engineers' trust in the model, and favors engineering application.
[0027] 4. The model of this invention adopts an end-to-end learning approach, from the input of the original vibration signal to the output of the fault type. It eliminates the need for complex manual feature engineering, simplifies the fault diagnosis process, reduces reliance on expert experience, improves the degree of automation in diagnosis, and meets the engineering requirements for real-time monitoring of bearing faults in high-speed trains.
[0028] 5. The method of this invention is specifically designed for fault diagnosis of high-speed train bearings and is adapted to the high-speed and high-load operating conditions of high-speed trains, providing reliable technical support for intelligent fault diagnosis systems for high-speed train bearings; at the same time, this method can be extended to fault diagnosis scenarios of other rotating machinery such as fans, motors, and gearboxes, providing a new technical path for intelligent operation and maintenance of rotating machinery.
Attached Figure Description
[0029] Figure 1 is a diagram of the CNN-BiLSTM-Attention hybrid model architecture.
[0030] Figures 2-6 are the fault diagnosis confusion matrices of the RandomForest, SVM, GradientBoosting, MLP, and CNN-BiLSTM-Attention models, respectively.
[0031] Figure 7 is a performance comparison chart of the models.
[0032] Figures 8-12 compare the multi-class ROC curves of the RandomForest, SVM, GradientBoosting, MLP, and CNN-BiLSTM-Attention models, respectively.
[0033] Figure 13 is the normalized confusion matrix from cross-validation of the CNN-BiLSTM-Attention model.
Detailed Implementation
[0034] The technical solution of the present invention will be described in detail below with specific experimental data and implementation steps, so that those skilled in the art can better understand and implement the present invention. It should be noted that the following embodiments are only used to explain the present invention and are not intended to limit the scope of protection of the present invention.
[0035] Example 1: Model Training and Performance Validation Experiment
[0036] 1. Experimental Dataset Setup
[0037] The source domain dataset is the publicly available CWRU bearing dataset, which includes four types of samples covering common bearing fault types: Normal, Inner Ring Fault (IR), Outer Ring Fault (OR), and Rolling Element Fault (B). The target domain dataset simulates the actual operating data of high-speed train bearings, with vibration signals of the four states collected at a sampling rate of 32 kHz. After denoising, normalization, and sequence segmentation, the original signals are converted into tensor data of shape (batch_size, seq_length, 1) and divided into training, validation, and test sets in a 7:2:1 ratio.
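A 7:2:1 split can be sketched as a seeded random permutation of the sample indices, as below. Random shuffling without stratification, the seed, and the name `split_dataset` are assumptions. Because overlapping windows cut from the same recording are nearly identical, splitting by recording rather than by window may give a more honest test score when segmentation uses overlap.

```python
import numpy as np

def split_dataset(x, y, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle samples and split them into train/validation/test (7:2:1 by default)."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_train = int(round(ratios[0] * len(x)))
    n_val = int(round(ratios[1] * len(x)))
    parts = np.split(idx, [n_train, n_train + n_val])   # the test set gets the remainder
    return [(x[p], y[p]) for p in parts]

# Usage with 1000 dummy windows of shape (1, 1) and four balanced labels.
x = np.arange(1000, dtype=float).reshape(1000, 1, 1)
y = np.arange(1000) % 4
(x_train, y_train), (x_val, y_val), (x_test, y_test) = split_dataset(x, y)
```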
2. Model Construction Parameters
[0039] Build the CNN-BiLSTM-Attention hybrid model shown in Figure 1 with the following configuration. The Conv1D convolutional layer in the CNN module has 64 kernels, a kernel size of 3, and a stride of 1; the MaxPooling1D pooling layer has a pooling size of 2 and a stride of 2. The BiLSTM module has a hidden dimension of 128, with one forward LSTM unit and one backward LSTM unit, and a dropout rate of 0.2 to prevent overfitting. The Attention module uses an additive attention mechanism. The fully connected classification layer contains two Dense layers with 64 and 4 neurons respectively, and the probability distribution over the four states is finally output through the Softmax activation function.
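As a worked check of the configuration above, the following sketch traces tensor shapes and counts trainable parameters per layer. Assumptions: "valid" padding in the convolution, an LSTM with one bias vector per gate (as in Keras; PyTorch keeps two, which adds 4 x hidden parameters per direction), an attention context vector with the same width as the BiLSTM output, and an illustrative `seq_length` of 1024. Attention parameters are excluded because they depend on the additive-attention variant. All function names are hypothetical.

```python
def conv_params(kernel, in_ch, filters):
    return kernel * in_ch * filters + filters

def lstm_params(in_dim, hidden):
    # four gates, each with input weights, recurrent weights and one bias vector
    return 4 * (hidden * (in_dim + hidden) + hidden)

def dense_params(in_dim, units):
    return in_dim * units + units

def trace(seq_length, filters=64, kernel=3, pool=2, hidden=128, n_classes=4):
    conv_len = seq_length - kernel + 1          # stride 1, "valid" padding
    pooled_len = (conv_len - pool) // pool + 1  # pool size 2, stride 2
    shapes = {
        "input": (seq_length, 1),
        "conv1d": (conv_len, filters),
        "maxpool1d": (pooled_len, filters),
        "bilstm": (pooled_len, 2 * hidden),     # forward and backward concatenated
        "context": (2 * hidden,),               # attention-weighted sum over time
        "dense_64": (64,),
        "softmax": (n_classes,),
    }
    params = {
        "conv1d": conv_params(kernel, 1, filters),
        "bilstm": 2 * lstm_params(filters, hidden),
        "dense_64": dense_params(2 * hidden, 64),
        "dense_out": dense_params(64, n_classes),
    }
    return shapes, params

shapes, params = trace(1024)
```

For a 1024-sample input this gives a (511, 64) feature map and a (511, 256) BiLSTM output; the listed layers hold 256 + 197,632 + 16,448 + 260 = 214,596 parameters.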
3. Model Training Parameters
[0041] The cross-entropy loss function was used as the loss function and Adam as the optimizer, with a learning rate of 0.001, a batch size of 32, and 100 training epochs. Early stopping was applied: training stopped if the validation loss did not decrease for 10 consecutive epochs, and the optimal model parameters were saved. A transfer learning strategy was also adopted: the model parameters trained on the source-domain CWRU dataset were transferred to the target-domain high-speed train bearing dataset and fine-tuned for 50 epochs with a learning rate of 0.0001.
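The two training phases in this paragraph can be captured in a small configuration sketch. `PhaseConfig` and `transfer_parameters` are hypothetical names; using batch size 32 during fine-tuning and keeping every layer trainable are assumptions, since the document gives only the fine-tuning learning rate and epoch count. Parameters are represented as a plain dictionary to keep the sketch framework-agnostic.

```python
import copy
from dataclasses import dataclass

@dataclass(frozen=True)
class PhaseConfig:
    """Hyperparameters for one training phase."""
    learning_rate: float
    epochs: int
    batch_size: int = 32

# Values from Example 1; reusing batch size 32 for fine-tuning is an assumption.
PRETRAIN = PhaseConfig(learning_rate=1e-3, epochs=100)  # source domain (CWRU)
FINETUNE = PhaseConfig(learning_rate=1e-4, epochs=50)   # target domain (train bearings)

def transfer_parameters(source_params):
    """Initialise the target-domain model from the source-domain weights.

    A deep copy keeps the saved source model unchanged while the copy is
    fine-tuned; every layer is assumed to stay trainable.
    """
    return copy.deepcopy(source_params)
```

The ten-times-smaller fine-tuning learning rate limits how far the transferred weights move away from the source-domain solution.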
4. Evaluation Metrics
[0043] Accuracy, precision, recall, F1 score, and AUC score were selected as the model performance evaluation metrics. Accuracy, precision, recall, and F1 score are calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN); the AUC score is the area under the ROC curve and is used to measure the generalization ability of the model.
[0044] The metrics are defined as follows:
[0045] Accuracy: $\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
[0046] Precision: $\mathrm{Precision} = \dfrac{TP}{TP + FP}$
[0047] Recall rate: $\mathrm{Recall} = \dfrac{TP}{TP + FN}$
[0048] F1 score: $F_1 = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
[0049] AUC score: the area under the ROC curve, which measures the model's generalization ability.
[0050] TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative) denote the numbers of samples in each of the four classification outcomes.
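The formulas above apply per class in a multi-class setting. The NumPy sketch below derives TP, FP, and FN for each class from a confusion matrix (rows are true labels, columns are predictions) and reports overall accuracy as the fraction of correctly classified samples, together with macro-averaged precision, recall, and F1. Macro averaging is an assumption; the document does not state which averaging was used for Table 1.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy and macro-averaged precision/recall/F1 from a confusion matrix.

    cm[i, j] = number of samples with true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp     # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp     # belonging to the class but predicted as another
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    return {
        "accuracy": tp.sum() / cm.sum(),   # trace over total
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }

# Made-up example: 200 samples, 3 misclassified between IR and OR.
m = metrics_from_confusion([[50, 0, 0, 0],
                            [0, 48, 2, 0],
                            [0, 1, 49, 0],
                            [0, 0, 0, 50]])
```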
[0051] 5. Experimental Results and Analysis
[0052] With the source domain data divided into training and test sets, five candidate diagnostic models were applied to the source domain diagnostic task: RandomForest, SVM, GradientBoosting, and MLP models were compared with the CNN-BiLSTM-Attention model to demonstrate that the CNN-BiLSTM-Attention model is the optimal diagnostic model. The diagnostic results are evaluated as follows:
[0053] Figures 2-6 show the confusion matrices of the models (RandomForest, SVM, GradientBoosting, MLP, and CNN-BiLSTM-Attention) in the rolling bearing fault classification task. The confusion matrix is an important tool for evaluating classification models: its rows represent the true labels and its columns the predicted labels, so each entry gives the number of samples of a true class that were predicted as the corresponding column class. The figures show that the models differ in how well they classify the fault types (IR, OR, B) and the normal state. The RandomForest model misclassifies samples in certain categories, for example predicting some IR samples as OR; the SVM model identifies the OR category relatively well but also makes a few misclassifications; the GradientBoosting model shows varying degrees of misclassification across categories; and the MLP model also misclassifies samples when distinguishing between fault types.
[0054] In contrast, the confusion matrix of the CNN-BiLSTM-Attention model is more concentrated on the diagonal, indicating that the model distinguishes the fault types and the normal state more accurately and captures the feature information in the data more effectively, reducing misclassification. This can be attributed to the local feature extraction capability of the CNN, the bidirectional capture of temporal information by the BiLSTM, and the focusing effect of the Attention mechanism on key information, which together give the hybrid model superior performance in the rolling bearing fault classification task.
[0055] Figure 7 compares the performance of the five models. RandomForest, SVM, CNN-BiLSTM-Attention, and GradientBoosting perform very well in the source domain fault diagnosis task, with their evaluation metrics (including accuracy, precision, recall, and F1 score) all at a high level of roughly 0.95 or above, indicating that these models effectively learn the fault feature patterns of the source domain data. The MLP (Multilayer Perceptron) model performs significantly worse, with a metric of approximately 0.45, possibly because of underfitting caused by an unsuitable network structure or insufficient parameter optimization. Among the five models, CNN-BiLSTM-Attention outperforms the others on all evaluation metrics.
[0056] Figures 8-12 show the multi-class ROC curves of the models (CNN-BiLSTM-Attention, RandomForest, SVM, GradientBoosting, and MLP) in the rolling bearing fault classification task. Each ROC curve plots the true positive rate (TPR) on the vertical axis against the false positive rate (FPR) on the horizontal axis across decision thresholds, giving an intuitive view of a model's classification performance. The larger the area under the curve (AUC), the better the model performance.
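AUC values like those in Figures 8-12 can be computed without plotting the curve, because the area under the ROC curve equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one (ties counting one half). The sketch below uses that identity in a one-vs-rest fashion, treating each class's predicted probability as its score; the quadratic pairwise comparison is chosen for clarity rather than speed, and each class must have at least one positive and one negative sample.

```python
import numpy as np

def binary_auc(scores, is_positive):
    """AUC as P(score of a random positive > score of a random negative)."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[is_positive], scores[~is_positive]
    greater = (pos[:, None] > neg[None, :]).sum()   # correctly ordered pairs
    ties = (pos[:, None] == neg[None, :]).sum()     # tied pairs count one half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def one_vs_rest_auc(probs, labels):
    """Per-class AUC for an (n_samples, n_classes) probability matrix."""
    return [binary_auc(probs[:, k], labels == k) for k in range(probs.shape[1])]
```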
[0057] A useful property of the ROC curve is that it does not change when the proportion of positive and negative samples in the test set changes. This matters in practice: class imbalance is common in real-world datasets, with significantly more negative samples than positive samples (or vice versa), and the distribution of positive and negative samples in the test data can also change over time. Figures 8-12 show that both the traditional and the deep learning models identify the fault types (B, IR, OR) and the "normal" class quite accurately (AUC generally above 0.95). Among them, CNN-BiLSTM-Attention performs better in identifying the "normal" class and discriminates more finely between fault classes.
[0058] The figures also show that the ROC curves of the models follow different trends. Although the RandomForest curve shows a certain degree of discrimination, its AUC value is not the highest, indicating some limitations in fault classification. The SVM curve is relatively stable, but there is still room for improvement in the classification performance for some categories. The GradientBoosting curve shows good classification ability but still falls short of the ideal. The MLP curve indicates average performance on this classification task.
[0059] In summary, the experimental results show that the CNN-BiLSTM-Attention model has the best classification performance in this multi-class fault diagnosis task and distinguishes more accurately between the "normal" class and the various faults, demonstrating higher diagnostic accuracy and robustness. Although traditional machine learning models are effective, they are slightly inferior in peak accuracy and fine-grained class discrimination.
[0060] Table 1 Performance Comparison of Source Domain Fault Diagnosis Models
[0061]
[0062] Table 1 further confirms the model ranking. CNN-BiLSTM-Attention ranks first, with an F1 score of 0.9849 and an AUC of 0.9995, indicating excellent classification performance and generalization ability. RandomForest follows closely in second place, with all metrics above 0.98. GradientBoosting and SVM rank third and fourth respectively, with stable performance. MLP ranks last, with all metrics significantly lower than those of the other models, most notably an accuracy of only 0.7554, indicating clear difficulty in learning from the source domain data.
[0063] In summary, the CNN-BiLSTM-Attention model outperforms the other four models on all evaluation metrics. Therefore, the CNN-BiLSTM-Attention model is selected as the optimal diagnostic model for the source domain diagnostic task.
[0064] Table 2 Performance of CNN-BiLSTM-Attention by Category
[0065]
[0066] Table 2 shows that the performance of CNN-BiLSTM-Attention varies across fault categories. The Normal category performs best, with a perfect 1.0000 in precision, recall, and F1 score. The IR category performs second best, with a precision of 0.99 but a slightly lower recall of 0.97. The OR (outer ring fault) and B (rolling element fault) categories perform relatively weaker, with the B category's metrics around 0.98. This difference may be related to the feature complexity and sample distribution of the different fault types: the Normal category is the easiest to identify, while some fault types may have overlapping features that make classification more difficult.
[0067] To evaluate the stability and generalization ability of the model, cross-validation was performed on the CNN-BiLSTM-Attention model. As shown in Figure 13, a cross-validation accuracy of 97.91% was obtained, indicating that the model performs well on average across different data splits.
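For reference, k-fold cross-validation can be sketched as below. The number of folds behind the 97.91% figure is not stated, so `k=5` is an assumption, and `evaluate_fold` is a hypothetical callback that would train the model on the training indices and return its accuracy on the held-out fold.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

def cross_val_accuracy(n_samples, evaluate_fold, k=5):
    """Mean of the per-fold accuracies returned by evaluate_fold(train, val)."""
    return float(np.mean([evaluate_fold(tr, va)
                          for tr, va in k_fold_indices(n_samples, k)]))
```

Averaging over folds is what makes the reported figure a measure of performance across different data splits rather than on one particular split.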
[0068] According to the experimental results, CNN-BiLSTM-Attention shows a significant advantage in bearing fault diagnosis tasks, which can be mainly attributed to the following factors:
[0069] 1. CNNs automatically learn local spatial patterns in vibration signals (such as the temporal waveform of fault impacts and the energy distribution in specific frequency bands) through sliding convolution kernels combined with multi-scale structures (such as Inception modules or multi-scale convolution kernels), overcoming the shortcomings of traditional manual feature extraction, which relies on experience and generalizes poorly.
[0070] 2. BiLSTM, composed of a forward LSTM and a backward LSTM, simultaneously captures the bidirectional (past and future) temporal dependencies of vibration signals, which matters because bearing fault vibrations show clear periodic temporal patterns; this compensates for the limited ability of a standalone CNN to mine long-term temporal information.
[0071] 3. Attention mechanisms (such as channel attention and self-attention) dynamically weight the features extracted by the CNN and BiLSTM, strengthening features strongly related to faults (key frequency bands, typical impact segments) and suppressing redundant or noisy features, so that the model concentrates on core fault information.
[0072] 4. The model adopts a progressive fusion structure of "CNN extracting spatial features → BiLSTM mining temporal features → attention mechanism weighting key features", which can fully learn multi-dimensional fault modes of "spatial-temporal-key information" from the original signal. Therefore, it can still maintain high diagnostic performance under complex working conditions such as variable load and strong noise.
[0073] The fault diagnosis model based on CNN-BiLSTM-Attention demonstrates superior performance and robustness, providing reliable technical support for intelligent fault diagnosis systems for high-speed train bearings. By appropriately selecting models, optimizing features, and improving ensemble strategies, the diagnostic accuracy and practicality of the system can be further enhanced.
[0074] Example 2: Real-time diagnosis of bearing faults in high-speed trains
[0075] 1. Real-time data acquisition: Vibration sensors are installed at the bearings of the high-speed train running system to acquire bearing vibration signals in real time. The sampling rate is set to 32 kHz and signals are acquired once per second, achieving continuous acquisition of vibration signals.
[0076] 2. Real-time data preprocessing: The collected vibration signals are preprocessed in real time. First, background noise is removed by wavelet denoising. Then, normalization is performed to map the signal amplitude to the [0,1] interval. Finally, sequence segmentation is performed to generate tensor data with dimension (1,seq_length,1), which is used as the real-time input of the model.
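The wavelet denoising step is not specified further, so the sketch below makes its choices explicit: a multi-level Haar transform (3 levels), a noise estimate from the median absolute deviation of the finest detail coefficients, and soft thresholding with the universal threshold sigma * sqrt(2 ln n). All three are assumptions. The result is then min-max normalized to [0, 1] and reshaped to (1, seq_length, 1) as described above. The input length must be divisible by 2^levels, and the function names are hypothetical.

```python
import numpy as np

def haar_denoise(x, levels=3):
    """Multi-level Haar wavelet denoising with soft thresholding."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = x, []
    for _ in range(levels):                      # forward orthonormal Haar transform
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    sigma = np.median(np.abs(details[0])) / 0.6745   # robust noise estimate
    thr = sigma * np.sqrt(2 * np.log(n))              # universal threshold
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    for d in reversed(details):                  # inverse transform, coarsest level first
        rec = np.empty(2 * len(approx))
        rec[0::2] = (approx + d) / np.sqrt(2)
        rec[1::2] = (approx - d) / np.sqrt(2)
        approx = rec
    return approx

def to_model_input(x, levels=3):
    """Denoise, min-max normalize to [0, 1], and reshape to (1, seq_length, 1)."""
    y = haar_denoise(x, levels)
    span = y.max() - y.min()
    y = (y - y.min()) / span if span > 0 else np.zeros_like(y)
    return y.reshape(1, -1, 1)
```

The Haar wavelet keeps the sketch dependency-free; smoother wavelets from a dedicated wavelet library would typically preserve impact waveforms better.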
[0077] 3. Fault diagnosis reasoning: The preprocessed real-time vibration signal tensor is input into the CNN-BiLSTM-Attention hybrid model trained and optimized in Example 1. The model sequentially extracts local spatial features through the CNN module, captures bidirectional temporal dependencies through the BiLSTM module, and focuses on key fault information through the Attention module. Finally, the fully connected classification layer outputs the probability distribution of four states: normal, inner ring fault, outer ring fault, and rolling element fault.
[0078] 4. Diagnostic Result Output: The category corresponding to the maximum value in the probability distribution is selected as the real-time operating status of the bearing. The diagnostic results are transmitted to the high-speed train operation and maintenance monitoring platform. If a fault condition is detected, an early warning signal is immediately issued to remind the operation and maintenance personnel to carry out maintenance, thereby realizing real-time diagnosis and early warning of bearing faults in high-speed trains.
[0079] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A high-speed train bearing fault diagnosis method based on a CNN-BiLSTM-Attention hybrid model, characterized in that it comprises: S1. Data Acquisition and Preprocessing: Acquire vibration signals of high-speed train bearings in four states, preprocess them, convert them into tensor data, and divide the dataset. S2. Hybrid Model Construction: Construct a CNN-BiLSTM-Attention hybrid model consisting of a CNN module, a BiLSTM module, an Attention module, and a fully connected classification layer connected in sequence, which respectively perform local spatial feature extraction, bidirectional temporal dependency modeling, focusing on key fault information, and fault category probability output. S3. Model Training and Optimization: Train the model using the source domain dataset, optimize the hyperparameters using the validation set, use a transfer learning strategy to transfer knowledge from the source domain to the target domain, and save the optimal model parameters. S4. Fault Diagnosis Inference: Input the vibration signal collected and preprocessed in real time into the optimal model, output the fault category probability distribution, and determine the real-time operating status of the bearing.
2. The method according to claim 1, characterized in that the CNN module consists of a Conv1D convolutional layer, a ReLU activation function layer, and a MaxPooling1D pooling layer connected in sequence, extracts the local spatial features and multi-scale frequency components of the vibration signal through convolution and pooling operations, and outputs a feature map.
3. The method according to claim 1, characterized in that the BiLSTM module consists of a forward LSTM unit, a backward LSTM unit, and a vector concatenation layer; the forward and backward LSTM units capture the historical and future temporal information of the signal, respectively, and their outputs are concatenated into the temporal hidden state.
4. The method according to claim 1, characterized in that the Attention module takes the output of the BiLSTM module as input and performs four operations: projection generation, similarity calculation, weight normalization, and information aggregation, dynamically assigning weights to features at different time steps and outputting a context vector containing the key contextual semantics of the fault.
5. The method according to claim 1, characterized in that the fully connected classification layer consists of a Dense fully connected layer and a Softmax activation function layer, which maps the context vector to a probability distribution over four categories: normal, inner ring fault, outer ring fault, and rolling element fault; the category corresponding to the maximum probability is the fault type.
6. The method according to claim 1, characterized in that, in step S3, the cross-entropy loss function is used as the model loss function, Adam is used as the optimizer, and early stopping is used to prevent the model from overfitting; to address the scarcity of fault samples, transfer learning and domain adaptation strategies are used to improve the model's cross-condition diagnostic capability.
7. The method according to claim 1, characterized in that, in step S4, real-time diagnosis of bearing faults in high-speed trains is achieved, the vibration signal sampling rate is 32 kHz, and the entire process from signal acquisition to output of the diagnostic result is automated without the need for manual feature engineering.
8. A high-speed train bearing fault diagnosis system for implementing the high-speed train bearing fault diagnosis method based on the CNN-BiLSTM-Attention hybrid model according to any one of claims 1-7, characterized in that it comprises a vibration signal acquisition module, a data preprocessing module, a model training module, a fault diagnosis inference module, and a diagnosis result output module, connected in sequence to realize real-time acquisition, preprocessing, diagnosis, and early warning of bearing faults.