High-speed train secondary suspension system hybrid neural network fault diagnosis method and system

By employing a hybrid neural network approach, combining time-frequency convolutional networks, sparse attention mechanisms, and inverted embedding Transformers, the problem of fault diagnosis in the secondary suspension system of high-speed trains under complex environments was solved, achieving high-precision fault identification and condition assessment.

CN122196659A · Pending · Publication Date: 2026-06-12 · SOUTHWEST JIAOTONG UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SOUTHWEST JIAOTONG UNIV
Filing Date: 2026-02-10
Publication Date: 2026-06-12

Application Information


IPC: G06F18/241; G06F18/10; G06F18/213; G06F18/25; G06F18/2431; G06N3/0442; G06N3/0455; G06N3/0464; G06N3/048; G06N3/084; G06N3/082; G01M17/10; G01H17/00

AI Tagging

Application Domain

Subsonic/sonic/ultrasonic wave measurement Biological models


AI Technical Summary

Technical Problem

Existing technologies are insufficient to effectively diagnose multi-mode, highly nonlinear faults in the secondary suspension system of high-speed trains under complex service environments, leading to a decrease in diagnostic accuracy and stability.

Method used

A hybrid neural network approach is adopted, combining time-frequency convolutional network (TFN), native sparse attention mechanism (NSA) and inverted embedding Transformer (iTransformer), to achieve fault diagnosis through time-frequency feature extraction, attention calculation and feature fusion.

Benefits of technology

It improves the accuracy and stability of fault diagnosis in the secondary suspension system of high-speed trains, and can effectively extract and fuse local and global features, and uncover complex dependencies between sensor channels.



Abstract

The application provides a high-speed train secondary suspension system hybrid neural network fault diagnosis method and system, including collecting sensor original vibration signals under six working conditions of the suspension system, pre-processing the signals, extracting features through a time-frequency convolution network, and outputting a feature sequence; performing attention calculation on the feature sequence, executing three attention mechanisms, finally learning three gating weights through a linear layer, normalizing the gating weights through Softmax, and performing weighted summation on the outputs of the three attention mechanisms to obtain a fused feature sequence; normalizing the feature sequence after attention fusion, modeling the feature dimension as a sequence by using reverse embedding, splicing a CLS token, and then mining the relationship between different feature channels through a Transformer; extracting a feature vector corresponding to the CLS token, inputting the feature vector into a fully connected classifier, and outputting a fault prediction category. The technical scheme based on the application improves the accuracy of high-speed train secondary suspension system fault diagnosis.


Description

Technical Field

[0001] This invention relates to the fields of fault diagnosis and deep learning technology, and particularly to a hybrid neural network fault diagnosis method and system for a high-speed train secondary suspension system.

Background Technology

[0002] The secondary suspension system of a high-speed train is a key elastic component connecting the car body and the bogies. Its main function is to attenuate residual vibrations not filtered by the primary suspension and to suppress yaw, roll, and pitch movements of the car body during operation. The health of this system directly affects operational safety and passenger comfort. Performance degradation or malfunction can lead to train instability or derailment, and can even affect other systems such as braking and power supply. Real-time, accurate monitoring of its status is therefore crucial. Current monitoring and diagnostic technologies fall mainly into two categories:
(1) Model-based methods: A vehicle system dynamics model (such as a force/vibration model of the secondary suspension) is first built from the principles of mechanics and vibration. Faults are then judged by analyzing the residual between the theoretical value calculated by the model and the data actually measured by the sensors. These methods offer strong physical interpretability and were the focus of early research.
(2) Data-driven methods: With the development of sensing and storage technologies, these methods do not require a precise mechanism model of the system. Instead, they perform multi-domain feature mining directly on monitoring signals collected throughout the equipment lifecycle. Through time-domain statistical analysis, frequency-domain harmonic analysis, and nonlinear dynamic feature extraction algorithms, they extract statistical, frequency-domain, and nonlinear features that characterize the equipment's operating state, thereby achieving identification, classification, and quantitative assessment of the equipment's health status. Among these, deep learning methods can automatically extract high-level features through neural networks and adapt well to multiple faults, compound faults, and complex operating conditions. They have become the rapidly developing mainstream direction in this field.

[0003] Due to factors such as large-scale rail networks, complex geographical and climatic conditions, long-distance continuous high-speed operation, and wheel-rail excitation, the service conditions of high-speed train secondary suspension systems are extremely complex. Model-based methods have the advantage of strong interpretability and clearly reveal the physical correlation between faults and system dynamic responses; however, their application is limited by modeling accuracy, hardware cost, and computational efficiency. Faced with the multi-mode, strongly nonlinear fault characteristics exhibited by secondary suspension systems under complex service environments, existing single models or simple hybrid architectures struggle to fully exploit the deep correlations and time-frequency characteristics within the data, so diagnostic accuracy and stability decrease under noise interference and varying operating conditions.

Summary of the Invention

[0004] Based on the above problems, we propose a hybrid neural network fault diagnosis method and system for the secondary suspension system of high-speed trains.

[0005] This invention relates to a hybrid neural network fault diagnosis method for a high-speed train secondary suspension system, which, in one embodiment, includes the following steps:
Step S1: Collect the raw vibration signals from the sensors under six working conditions of the suspension system and preprocess the signals.
Step S2: Perform feature extraction on the preprocessed signal using a time-frequency convolutional network (TFN), including:
Step S21: The core complex convolutional layer of the time-frequency convolutional network converts the 1-channel original vibration signal into a 32-channel time-frequency feature map.
Step S22: Input the time-frequency feature map into the backbone CNN. The backbone CNN is composed of multiple standard convolutional blocks and pooling layers stacked alternately. Each standard convolutional block includes at least a convolutional layer, a batch normalization layer, and a GELU activation function.
Step S23: Unify the feature length processed by the backbone CNN to a fixed value through an adaptive average pooling layer and output the result as a feature sequence.
Step S3: Perform attention calculation on the feature sequence. Generate three tensors, query Q, key K, and value V, from the feature sequence through a linear projection layer, and then execute three attention mechanisms (compression, selection, and sliding window) on the three tensors in parallel. Finally, learn three gating weights through a linear layer, normalize the gating weights with Softmax, and perform a weighted sum of the outputs of the three attention mechanisms to obtain the fused feature sequence.
Step S4: Normalize the feature sequence after attention fusion, use inverted embedding to model the feature dimension as a sequence, concatenate a CLS token at the beginning of the sequence, and then mine the relationships between different feature channels through a 2-layer Transformer.
Step S5: Extract the feature vector corresponding to the CLS token and input it into the fully connected classifier to output the fault prediction category, completing the fault diagnosis of the secondary suspension system of the high-speed train. The fault prediction results include normal state, air spring deflation, secondary lateral damper failure, and anti-roll torsion bar failure.
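The length unification of step S23 can be illustrated with a minimal NumPy sketch of adaptive average pooling. The function name `adaptive_avg_pool1d` and the bin rule (floor for the start index, ceiling for the end index) are assumptions that mirror common deep-learning frameworks; the patent text does not specify the bin boundaries.

```python
import numpy as np

def adaptive_avg_pool1d(x, out_len):
    """Average-pool the last axis of x into exactly out_len bins (hypothetical helper)."""
    in_len = x.shape[-1]
    bins = []
    for i in range(out_len):
        start = (i * in_len) // out_len                        # floor of bin start
        end = ((i + 1) * in_len + out_len - 1) // out_len      # ceiling of bin end
        bins.append(x[..., start:end].mean(axis=-1))
    return np.stack(bins, axis=-1)

# Two feature maps with different lengths end up with the same fixed length.
feats_a = np.random.default_rng(0).normal(size=(2, 128, 100))
feats_b = np.random.default_rng(1).normal(size=(2, 128, 77))
out_a = adaptive_avg_pool1d(feats_a, 64)   # [Batch, channels, 64]
out_b = adaptive_avg_pool1d(feats_b, 64)   # [Batch, channels, 64]
```

In this sketch the pooled axis is last, so a transpose would still be needed to obtain a [Batch, Time_Steps, d_model] layout.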

[0006] In one embodiment, the six operating conditions in step S1 include normal operating condition, air spring failure condition of position 1, air spring failure condition of the whole vehicle, removal of position 1 and position 3 secondary lateral shock absorbers, removal of position 1 secondary lateral shock absorber, and removal of position 1 anti-roll torsion bar. The signal preprocessing in step S1 includes, but is not limited to, adding Gaussian noise to the original vibration signal obtained by sampling; adjusting the data distribution to a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation through Z-score standardization; and obtaining the training set and test set by using sliding window sampling.
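The three preprocessing operations named above can be sketched as follows. This is an illustration only: the function name `preprocess`, the toy window length and stride, the seed, and the reading of the noise level as a fraction of the signal's standard deviation are all assumptions, not values taken from this paragraph.

```python
import numpy as np

def preprocess(signal, window=256, stride=64, noise_ratio=0.01, seed=0):
    """Noise injection, Z-score standardization, sliding-window sampling (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumption: the noise standard deviation is noise_ratio times the signal's.
    noisy = signal + rng.normal(0.0, noise_ratio * signal.std(), size=signal.shape)
    # Z-score: subtract the mean and divide by the standard deviation.
    z = (noisy - noisy.mean()) / noisy.std()
    # Sliding windows of fixed length taken every `stride` points.
    n = (len(z) - window) // stride + 1
    windows = np.stack([z[i * stride: i * stride + window] for i in range(n)])
    return z, windows

raw = 3.0 * np.sin(np.linspace(0.0, 40.0 * np.pi, 1000)) + 5.0   # toy signal
z, windows = preprocess(raw)
```

Each row of `windows` is one sample; splitting the rows into training and test sets would follow this step.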

[0007] In one embodiment, in step S2, the time-frequency convolutional network uses the STFT kernel function; the filter of the core complex convolutional layer is a complex filter, and the learning process of the filter frequency parameters is implemented through the backpropagation algorithm, which is adaptively adjusted according to the characteristics of the fault samples in the training set.
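A rough NumPy sketch of an STFT-style complex filter bank follows. It assumes each filter is a Hann window multiplied by a complex exponential at one normalized frequency; the window type, the kernel length, and the names `stft_kernels` and `tfconv` are hypothetical. In the patent the frequency parameters are learned by backpropagation, whereas here they are fixed to keep the example self-contained.

```python
import numpy as np

def stft_kernels(freqs, kernel_len=65):
    """One complex filter per normalized frequency (cycles per sample)."""
    n = np.arange(kernel_len) - kernel_len // 2
    window = np.hanning(kernel_len)                       # assumed window type
    return window[None, :] * np.exp(2j * np.pi * freqs[:, None] * n[None, :])

def tfconv(signal, freqs, kernel_len=65):
    """Convolve a 1-channel signal with each complex filter and take the magnitude."""
    kernels = stft_kernels(freqs, kernel_len)
    out = np.stack([np.convolve(signal, k, mode="same") for k in kernels])
    return np.abs(out)                                    # [n_filters, length]

fs = 1000.0                                               # sampling rate in Hz
freqs = np.linspace(0.0, 0.5, 32, endpoint=False)         # 32 filters -> 32 channels
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * freqs[6] * fs * t)              # tone centred on filter 6
tf_map = tfconv(tone, freqs)
peak_channel = int(tf_map.mean(axis=1).argmax())
```

The 1-channel input becomes a 32-channel map, and the channel whose frequency matches the tone carries the most energy, which is what gives the filters an explicit physical meaning.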

[0008] In one embodiment, in step S3, the learning process of the gating weights maps the features of each time step to three weight values through a linear layer. The three weight values correspond to the output weights of the three attention mechanisms (compression, selection, and sliding window), respectively. After Softmax normalization, the weights sum to 1. The selection attention mechanism calculates the complete dot-product similarity matrix between query Q and key K, ranks the keys for each query token by Top-K importance based on the original similarity values, and then performs weighted aggregation over the key-value pairs corresponding to the ranking results, so that importance is assessed from the similarity between complete tokens.
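The gating step can be sketched in NumPy as below. The names `gated_fusion`, `w_gate`, and `b_gate` are hypothetical, the weights are random stand-ins for learned parameters, and the three branch outputs are placeholders for the compression, selection, and sliding-window results.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(x, branch_outputs, w_gate, b_gate):
    """x: [B, T, D]; branch_outputs: three arrays of shape [B, T, D]; w_gate: [D, 3]."""
    gates = softmax(x @ w_gate + b_gate, axis=-1)          # [B, T, 3], sums to 1 per step
    stacked = np.stack(branch_outputs, axis=-1)            # [B, T, D, 3]
    fused = (stacked * gates[:, :, None, :]).sum(axis=-1)  # weighted sum of branches
    return fused, gates

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 64, 128))
branches = [rng.normal(size=x.shape) for _ in range(3)]    # placeholder branch outputs
w_gate = rng.normal(size=(128, 3)) * 0.1
b_gate = np.zeros(3)
fused, gates = gated_fusion(x, branches, w_gate, b_gate)
```

Because the gates are computed per time step, each position in the sequence can weight the three branches differently.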

[0009] In one embodiment, in step S4, the inverted embedding process first performs dimensional permutation on the feature tensor, converting the feature dimension into a sequence dimension, and then maps the time series of each feature to the model dimension through a linear embedding layer; each encoder layer of the two-layer Transformer encoder includes a multi-head self-attention mechanism, layer normalization, a feedforward neural network, and residual connections connected in sequence.
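A minimal sketch of the permute-then-embed step, with a CLS token placed at the front, might look like this. The shapes are toy values chosen for readability, and `inverted_embed`, `w_embed`, `b_embed`, and `cls_token` are hypothetical names with random stand-ins for learned parameters.

```python
import numpy as np

def inverted_embed(x, w_embed, b_embed, cls_token):
    """x: [B, T, C] -> tokens [B, C + 1, d_model], one token per feature plus CLS."""
    x_inv = np.transpose(x, (0, 2, 1))              # [B, C, T]: features become the sequence
    tokens = x_inv @ w_embed + b_embed              # each feature's time series -> d_model
    cls = np.broadcast_to(cls_token, (x.shape[0], 1, cls_token.shape[-1]))
    return np.concatenate([cls, tokens], axis=1)    # CLS token first

rng = np.random.default_rng(0)
B, T, C, d_model = 2, 16, 8, 32                     # toy sizes, not the patent's
x = rng.normal(size=(B, T, C))
w_embed = rng.normal(size=(T, d_model)) * 0.1
b_embed = np.zeros(d_model)
cls_token = rng.normal(size=(d_model,))
tokens = inverted_embed(x, w_embed, b_embed, cls_token)
```

Self-attention applied to `tokens` then relates features to each other rather than time steps, and the first token is the one later passed to the classifier.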

[0010] The present invention provides a hybrid neural network fault diagnosis method and system for a high-speed train secondary suspension system, which, compared with the prior art, has at least the following advantages: By leveraging the complementary strengths and synergistic effects of convolutional neural networks, attention mechanisms, and Transformers, the accuracy of fault diagnosis in the secondary suspension system of high-speed trains is improved. The invention achieves progressive extraction and fusion from local to global and from time-frequency features to channel relationship features: the front-end time-frequency convolutional network (TFN) extracts physically meaningful local features; the intermediate native sparse attention (NSA) module filters and enhances features; and the back-end inverted embedding Transformer (iTransformer) focuses on mining the complex dependencies between multi-sensor channels, thus achieving layer-by-layer refinement and enhancement of the feature representation.

[0011] This invention uses a time-frequency convolutional network (TFN) as a time-frequency feature extractor, combines it with a natively trainable sparse attention mechanism (NSA) that integrates three attention mechanisms for attention calculation, and finally inputs the features into the iTransformer (inverted embedding Transformer) model to explore the correlations between features in depth; the classification layer then outputs the classification result.

Attached Figure Description

[0012] Figure 1 is a flowchart illustrating the overall technical process of the present invention.
Figure 2 is a diagram of the TFN-NSA-iTransformer architecture.
Figure 3 shows the trend charts for the training and validation sets of each model in the ablation experiment.
Figure 4 is a comparison chart showing the accuracy of different models in the comparative experiment.

Detailed Implementation

[0013] A hybrid neural network fault diagnosis system for a high-speed train secondary suspension system, in one embodiment, includes the following modules:
The data acquisition and preprocessing module collects raw vibration signals from sensors under six operating conditions of the suspension system and preprocesses the signals.
The time-frequency convolutional network module extracts features from the preprocessed signal using a time-frequency convolutional network (TFN). This includes first converting the 1-channel raw vibration signal into a 32-channel time-frequency feature map using the core complex convolutional layer of the TFN; then inputting the time-frequency feature map into the backbone CNN, which is composed of multiple standard convolutional blocks and pooling layers stacked alternately, where each standard convolutional block includes at least a convolutional layer, a batch normalization layer, and a GELU activation function; and finally using an adaptive average pooling layer to unify the feature length of the backbone CNN to a fixed value and output it as a feature sequence.
The native sparse attention mechanism (NSA) module performs attention calculations on the feature sequence. It generates three tensors, query Q, key K, and value V, from the feature sequence through a linear projection layer, and then executes three attention mechanisms (compression, selection, and sliding window) on the three tensors in parallel. Finally, it learns three gating weights through a linear layer, normalizes the gating weights through Softmax, and performs a weighted sum of the outputs of the three attention mechanisms to obtain the fused feature sequence.
The iTransformer module normalizes the feature sequence after attention fusion, uses inverted embedding to model the feature dimension as a sequence, concatenates a CLS token at the beginning of the sequence, and then mines the relationships between different feature channels through a two-layer Transformer.
The classification output module extracts the feature vector corresponding to the CLS token, inputs it into a fully connected classifier, and outputs the fault prediction category to complete the fault diagnosis of the secondary suspension system of the high-speed train. The fault prediction results include normal state, air spring deflation, secondary lateral damper failure, and anti-roll torsion bar failure.

[0014] In one embodiment, the six operating conditions in the data acquisition and preprocessing module include the normal operating condition, air spring failure at position 1, air spring failure of the whole vehicle, removal of the position 1 and position 3 secondary lateral dampers, removal of the position 1 secondary lateral damper, and removal of the position 1 anti-roll torsion bar. The signal preprocessing in the data acquisition and preprocessing module includes, but is not limited to, adding Gaussian noise to the sampled raw vibration signal; adjusting the data distribution to a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation through Z-score standardization; and obtaining the training set and test set by using sliding window sampling.

[0015] In one embodiment, in the time-frequency convolutional network module, the time-frequency convolutional network adopts the STFT kernel function; the filter of the core complex convolutional layer is a complex filter, and the learning process of the filter frequency parameters is implemented through the backpropagation algorithm, which is adaptively adjusted according to the characteristics of fault samples in the training set.

[0016] In one embodiment, in the native sparse attention mechanism NSA module, the learning process of the gating weights maps the features of each time step to three weight values through a linear layer. The three weight values correspond to the output weights of the three attention mechanisms: compression, selection, and sliding window, respectively. After Softmax normalization, the sum of the weights is 1. The selected attention mechanism calculates the complete dot product similarity matrix between query Q and key K, sorts the similarity vectors corresponding to each query token, and selects the K key-value pairs with the highest similarity for weighted aggregation, thereby achieving precise importance filtering at the token level.
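The Top-K selection described above can be sketched in NumPy as follows. The scaling by the square root of the dimension and the Softmax over the selected scores are assumptions (the text says only "weighted aggregation"), and `topk_selection_attention` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_selection_attention(q, k, v, top_k):
    """q, k, v: [B, T, D]. Each query attends only to its top_k most similar keys."""
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)        # full [B, T, T] similarity
    idx = np.argsort(scores, axis=-1)[..., -top_k:]         # indices of the top_k keys
    top_scores = np.take_along_axis(scores, idx, axis=-1)   # [B, T, top_k]
    weights = softmax(top_scores, axis=-1)
    batch = np.arange(q.shape[0])[:, None, None]
    top_v = v[batch, idx]                                   # gathered values [B, T, top_k, D]
    return (weights[..., None] * top_v).sum(axis=-2)        # [B, T, D]

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 2, 16, 8))
out = topk_selection_attention(q, k, v, top_k=4)
```

When `top_k` equals the sequence length, this reduces to ordinary dense attention, which is a convenient sanity check.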

[0017] In one embodiment, in the iTransformer module, the inverted embedding process first performs dimensional permutation on the feature tensor, converting the feature dimension into a sequence dimension, and then maps the time series of each feature to the model dimension through a linear embedding layer; each encoder layer of the two-layer Transformer encoder includes a multi-head self-attention mechanism, layer normalization, a feedforward neural network, and residual connections connected in sequence.

[0018] The above-mentioned technical features can be combined in various suitable ways or replaced by equivalent technical features, as long as the purpose of the present invention can be achieved.

[0019] As shown in Figure 1, the specific steps are as follows: Data was collected through a simulated fault test of the secondary suspension system of a high-speed train on a rolling vibration test bench. Based on vehicle dynamics calculations, the set of sensors that best characterizes the fault features of the secondary suspension system was selected. Data collected under normal operating conditions was denoted as Label 0; data collected with the air spring at position 1 deflated was denoted as Label 1; data collected with the air springs of the whole vehicle deflated was denoted as Label 2; data collected with the lateral dampers at positions 1 and 3 removed was denoted as Label 3; data collected with the lateral damper at position 1 removed was denoted as Label 4; and data collected with the anti-roll torsion bar at position 1 removed was denoted as Label 5. The sensor signals under the six operating conditions were stored as raw data.

[0020] The sensor sampling frequency was 1000 Hz, with a data acquisition time of 60 s per operating condition, corresponding to 60,000 data points per label. To simulate the random interference that sensor data may contain during actual high-speed train operation and to improve the robustness of the model, 1% Gaussian noise was added to the sample data. To reduce the impact of outlier data, the sample data was standardized using Z-score by subtracting the mean and dividing by the standard deviation, adjusting the data distribution to a mean of 0 and a standard deviation of 1. To ensure consistent length, all sequences were uniformly truncated to the first 59,904 data points. A sliding window sampling method was used to obtain a total of 2,718 samples, each with a length of 2,048. The ratio of training set, validation set, and test set was 6:2:2.

As shown in Figure 2, after the dataset is prepared, the original vibration signals in the training data are first processed by the time-frequency convolutional network (TFN). First, a core complex convolutional layer (TFconv) replaces the randomly initialized convolutional kernels of a CNN, using filters with explicit physical meaning and learnable frequency parameters to convert the 1-channel input into a 32-channel time-frequency feature map. Then, a backbone CNN consisting of alternating stacks of multiple standard convolutional blocks (Conv1d + BatchNorm1d + GELU activation) and pooling layers (MaxPool1d) gradually expands the receptive field and compresses the sequence length. Finally, an adaptive average pooling layer unifies the feature map length to a fixed value, with an output size of [Batch, Time_Steps=64, d_model=128]. Next, attention calculation is performed. The feature sequence is first passed through a linear projection layer to generate three tensors: query (Q), key (K), and value (V), each with dimensions of [Batch, 64, 128]. Then, three attention mechanisms (compression, selection, and sliding window) are executed in parallel. Finally, a linear layer learns three gating weights for each time step's features. After Softmax normalization, the outputs of the three attention mechanisms are weighted and summed to obtain a fused feature sequence with dimensions [Batch, 64, 128]. After the attention outputs are normalized, inverted embedding is used to model the feature dimensions as a sequence. A learnable CLS token is prepended to the sequence, making the sequence length 65. Then, a two-layer Transformer is used to mine the complex dependencies between different feature channels. Finally, the feature vector corresponding to the [CLS] token is taken, and a fully connected classifier outputs the predicted class.
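The sample count quoted above can be checked with a short calculation. The window stride is not stated in the text; a stride of 128 points is an inferred assumption that happens to reproduce the stated total, under the further assumption that each of the six labels contributes one truncated sequence.

```python
def num_windows(seq_len, window, stride):
    """Number of full windows that fit when sliding by `stride` points."""
    return (seq_len - window) // stride + 1

SEQ_LEN = 59_904      # truncated length per label (from the text)
WINDOW = 2_048        # sample length (from the text)
STRIDE = 128          # assumed; not given in the text
N_LABELS = 6          # operating conditions (from the text)

per_label = num_windows(SEQ_LEN, WINDOW, STRIDE)
total = N_LABELS * per_label
print(per_label, total)
```

Under these assumptions each label yields 453 windows, for 2,718 samples in total. The truncated length 59,904 is also an exact multiple of 128, which is consistent with this stride.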

[0021] The NSA module includes three attention mechanisms: compressed attention, selection attention, and sliding window attention. Selection attention directly calculates the original QK similarity matrix, assessing importance based on complete token-to-token similarity, avoiding the block partitioning and index mapping overhead of compressed attention, and achieving accurate token-level importance assessment. Alternatively, it calculates the complete Query-Key dot product similarity matrix, directly ranking each query token by Top-K importance based on the original similarity values, and then collecting the corresponding key-value pairs for local weighted aggregation based on the ranking results, avoiding intermediate compression and index mapping calculations based on block partitioning.
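Of the three branches, the sliding-window one is the simplest to illustrate. The sketch below restricts each query to keys within a fixed distance on either side; the symmetric (non-causal) band, the scaling factor, and the name `sliding_window_attention` are assumptions, since the text does not give the window definition.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sliding_window_attention(q, k, v, window):
    """q, k, v: [B, T, D]. Query i sees only keys j with |i - j| <= window."""
    t, d = q.shape[1], q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)         # [B, T, T]
    pos = np.arange(t)
    band = np.abs(pos[:, None] - pos[None, :]) <= window     # [T, T] boolean band
    scores = np.where(band, scores, -np.inf)                 # mask keys outside the band
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 2, 16, 8))
out, weights = sliding_window_attention(q, k, v, window=2)
```

Keys outside the band receive exactly zero weight, so this branch captures only local context and leaves long-range structure to the other two branches.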

[0022] As shown in Figures 3-4, ablation experiments were constructed to verify the effectiveness of each sub-module of the TFN-NSA-iTransformer hybrid neural network. In these experiments, the proposed method was compared with the variants Without TFconv, Without NSA, Without inverted Embedding, TFN, and iTransformer. The experiments demonstrated that each module effectively improved the model's accuracy. To evaluate the effectiveness of the proposed method, TFN-NSA-iTransformer was compared with RF, SVM, CNN-LSTM, CNN-Transformer, WCCN-BiLSTM, and TimeMIL. Each experiment was run five times and the average value was taken. The results show that the proposed model has advantages in both accuracy and convergence speed.

[0023] Compared to traditional model-based methods and existing single-structure models, this technology achieves progressive extraction and fusion from local to global, and from time-frequency features to channel relationship features, through a hierarchical architecture design. The time-frequency convolutional network (TFN), as the front-end module, is specifically optimized for the time-frequency characteristics of vibration signals and extracts physically meaningful local features. The intermediate native sparse attention (NSA) module filters and enhances features through a multi-scale attention mechanism, highlighting fault-related information. The back-end inverted embedding Transformer (iTransformer) focuses on uncovering complex dependencies between multiple sensor channels. This hierarchical, specialized design allows each module to fully leverage its strengths, while the overall architecture achieves layer-by-layer refinement and enhancement of feature representations.

[0024] This invention improves the accuracy of fault diagnosis in the secondary suspension system of high-speed trains by leveraging the complementary advantages and synergistic effects of convolutional neural networks, attention mechanisms, and Transformers.

[0025] Since TFN is used as a feature extractor, the STFT kernel function, which has a simpler structure and a fixed time-frequency resolution, is selected. Because it is a classification task rather than a prediction and generation task, the positional encoding of NSA is removed, the block partitioning is changed to non-overlapping fixed blocks, the attention branch is simplified by adaptive average pooling, and token-level importance is calculated based on the original QK similarity to avoid redundant calculation of block index mapping.
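One possible reading of the simplified compression branch is sketched below: keys and values are averaged over non-overlapping fixed blocks, and every query then attends to the resulting block summaries. Plain block means stand in for adaptive average pooling (the two coincide when the sequence length is a multiple of the block size); the block size and the name `compressed_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compressed_attention(q, k, v, block):
    """q, k, v: [B, T, D]. Keys and values are averaged over non-overlapping blocks."""
    b, t, d = k.shape
    n_blocks = t // block                                    # drop any incomplete tail block
    k_c = k[:, :n_blocks * block].reshape(b, n_blocks, block, d).mean(axis=2)
    v_c = v[:, :n_blocks * block].reshape(b, n_blocks, block, d).mean(axis=2)
    scores = q @ np.swapaxes(k_c, -1, -2) / np.sqrt(d)       # [B, T, n_blocks]
    return softmax(scores, axis=-1) @ v_c                    # [B, T, D]

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 2, 64, 128))
out = compressed_attention(q, k, v, block=8)                 # 64 keys -> 8 block summaries
```

No positional encoding is added in this sketch, in line with the note above that it was removed for the classification task.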

[0026] While the invention has been described herein with reference to specific embodiments, it should be understood that these embodiments are merely examples of the principles and applications of the invention. Therefore, it should be understood that many modifications can be made to the exemplary embodiments, and other arrangements can be designed without departing from the spirit and scope of the invention as defined by the appended claims. It should be understood that different dependent claims and features described herein can be combined in ways different from those described in the original claims. It is also understood that features described in conjunction with individual embodiments can be used in other described embodiments.

Claims

1. A hybrid neural network fault diagnosis method for a high-speed train secondary suspension system, characterized in that it includes the following steps:
Step S1: Collect the raw vibration signals from the sensors under six working conditions of the suspension system and preprocess the signals;
Step S2: Perform feature extraction on the preprocessed signal using a time-frequency convolutional network (TFN), including:
Step S21: The core complex convolutional layer of the time-frequency convolutional network converts the 1-channel original vibration signal into a 32-channel time-frequency feature map;
Step S22: Input the time-frequency feature map into the backbone CNN. The backbone CNN is composed of multiple standard convolutional blocks and pooling layers stacked alternately. Each standard convolutional block includes at least a convolutional layer, a batch normalization layer, and a GELU activation function;
Step S23: Unify the feature length processed by the backbone CNN to a fixed value through an adaptive average pooling layer and output the result as a feature sequence;
Step S3: Perform attention calculation on the feature sequence; generate three tensors, query Q, key K, and value V, from the feature sequence through a linear projection layer, and then execute three attention mechanisms (compression, selection, and sliding window) on the three tensors in parallel. Finally, learn three gating weights through a linear layer, normalize the gating weights with Softmax, and perform a weighted sum of the outputs of the three attention mechanisms to obtain the fused feature sequence;
Step S4: Normalize the feature sequence after attention fusion, use inverted embedding to model the feature dimension as a sequence, concatenate a CLS token at the beginning of the sequence, and then mine the relationships between different feature channels through a 2-layer Transformer;
Step S5: Extract the feature vector corresponding to the CLS token and input it into the fully connected classifier to output the fault prediction category, completing the fault diagnosis of the secondary suspension system of the high-speed train. The fault prediction results include normal state, air spring deflation, secondary lateral damper failure, and anti-roll torsion bar failure.

2. The high-speed train secondary suspension system hybrid neural network fault diagnosis method according to claim 1, characterized in that, The six operating conditions in step S1 include normal operating condition, air spring failure in position 1, air spring failure in the whole vehicle, removal of position 1 and position 3 secondary lateral shock absorbers, removal of position 1 secondary lateral shock absorber, and removal of position 1 anti-roll torsion bar. The signal preprocessing in step S1 includes, but is not limited to, adding Gaussian noise to the original vibration signal obtained by sampling; adjusting the data distribution to a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation through Z-score standardization; and obtaining the training set and test set by using sliding window sampling.

3. The high-speed train secondary suspension system hybrid neural network fault diagnosis method according to claim 2, characterized in that, In step S2, the time-frequency convolutional network uses the STFT kernel function; the filter of the core complex convolutional layer is a complex filter, and the learning process of the filter frequency parameters is implemented through the backpropagation algorithm, which is adaptively adjusted according to the characteristics of the fault samples in the training set.

4. The high-speed train secondary suspension system hybrid neural network fault diagnosis method according to claim 1, characterized in that, In step S3, the learning process of the gate weights maps the features of each time step to three weight values through a linear layer. The three weight values correspond to the output weights of the three attention mechanisms: compression, selection, and sliding window. After Softmax normalization, the sum of the weights is 1. The selected attention mechanism calculates the complete dot product similarity matrix between query Q and key K, sorts the similarity vectors corresponding to each query token, and selects the K key-value pairs with the highest similarity for weighted aggregation, thereby achieving precise importance filtering at the token level.

5. The high-speed train secondary suspension system hybrid neural network fault diagnosis method according to claim 1, characterized in that, in step S4, the inverted embedding process first permutes the dimensions of the feature tensor so that the feature dimension becomes the sequence dimension, and then maps the time series of each feature to the model dimension through a linear embedding layer; each encoder layer of the two-layer Transformer encoder comprises a multi-head self-attention mechanism, layer normalization, a feedforward neural network, and a residual connection, connected in sequence.
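
The inverted embedding step might look like the sketch below, assuming a fused feature tensor of shape (batch, time, features). The shapes, the random weights, and the names `inverted_embedding`, `W`, `b`, and `cls_token` are hypothetical. The CLS token is included because the system claim describes it being placed at the start of the sequence, and the two encoder layers are omitted.

```python
import numpy as np

def inverted_embedding(x, W, b, cls_token):
    """x: (B, T, F); W: (T, d_model); b, cls_token: (d_model,).

    Returns (B, F + 1, d_model): one token per feature channel, CLS first.
    """
    x_inv = np.transpose(x, (0, 2, 1))      # (B, F, T): features become the sequence axis
    tokens = x_inv @ W + b                  # embed each feature's time series to d_model
    cls = np.broadcast_to(cls_token, (x.shape[0], 1, W.shape[1]))
    return np.concatenate([cls, tokens], axis=1)

rng = np.random.default_rng(0)
B, T, F, d_model = 2, 64, 32, 128           # illustrative sizes only
x = rng.normal(size=(B, T, F))
W = rng.normal(size=(T, d_model)) * 0.02
b = np.zeros(d_model)
cls_token = rng.normal(size=d_model)
seq = inverted_embedding(x, W, b, cls_token)
print(seq.shape)  # (2, 33, 128)
```

With this layout, self-attention in the encoder compares feature channels with one another rather than time steps, which matches the stated aim of mining relationships between channels.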

6. A hybrid neural network fault diagnosis system for a high-speed train secondary suspension system, characterized in that it comprises:
a data acquisition and preprocessing module, which collects raw vibration signals from sensors under six operating conditions of the suspension system and preprocesses the signals;
a time-frequency convolutional network module, which extracts features from the preprocessed signal using a time-frequency convolutional network (TFN): the core complex convolutional layer of the TFN first converts the 1-channel raw vibration signal into a 32-channel time-frequency feature map; the time-frequency feature map is then input into a backbone CNN composed of multiple standard convolutional blocks and pooling layers stacked alternately, each standard convolutional block including at least a convolutional layer, a batch normalization layer, and a GELU activation function; finally, an adaptive average pooling layer unifies the feature length output by the backbone CNN to a fixed value and outputs the result as a feature sequence;
a native sparse attention (NSA) module, which performs attention calculations on the feature sequence: it generates three tensors, query Q, key K, and value V, from the feature sequence through a linear projection layer; applies three attention mechanisms, namely compression, selection, and sliding window, to the three tensors in parallel; learns three gate weights through a linear layer and normalizes them through Softmax; and computes a weighted sum of the outputs of the three attention mechanisms to obtain the fused feature sequence;
an iTransformer module, which normalizes the fused feature sequence, uses inverted embedding to model the feature dimension as a sequence, prepends a CLS token to the sequence, and then mines the relationships between different feature channels through a two-layer Transformer encoder; and
a classification output module, which extracts the feature vector corresponding to the CLS token, inputs it into a fully connected classifier, and outputs the predicted fault category to complete the fault diagnosis of the secondary suspension system of the high-speed train, the fault prediction results including normal state, air spring deflation, secondary lateral damper failure, and anti-roll torsion bar failure.
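
The adaptive average pooling step, which brings backbone outputs of varying length to one fixed length, can be illustrated as below. The binning rule used here (bin i spans floor(i·L/out) to ceil((i+1)·L/out)) is a common convention and an assumption of this sketch; the claim does not state which rule or which target length is used.

```python
import numpy as np

def adaptive_avg_pool1d(x, out_len):
    """x: (C, L) -> (C, out_len) by averaging over near-equal, possibly overlapping bins."""
    L = x.shape[-1]
    i = np.arange(out_len)
    starts = (i * L) // out_len                 # floor(i * L / out_len)
    ends = -((-(i + 1) * L) // out_len)         # ceil((i + 1) * L / out_len)
    return np.stack([x[..., s:e].mean(axis=-1) for s, e in zip(starts, ends)], axis=-1)

feat_a = np.arange(10, dtype=float).reshape(1, 10)   # a length-10 feature map
feat_b = np.arange(14, dtype=float).reshape(1, 14)   # a length-14 feature map
print(adaptive_avg_pool1d(feat_a, 5).shape)  # (1, 5)
print(adaptive_avg_pool1d(feat_b, 5).shape)  # (1, 5)
```

Whatever the input length, the output length is the same, so the downstream attention and Transformer modules always receive a feature sequence of fixed size.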

7. The high-speed train secondary suspension system hybrid neural network fault diagnosis system according to claim 6, characterized in that the six operating conditions in the data acquisition and preprocessing module are: normal operation; air spring failure at position 1; air spring failure across the whole vehicle; removal of the secondary lateral shock absorbers at positions 1 and 3; removal of the secondary lateral shock absorber at position 1; and removal of the anti-roll torsion bar at position 1. The preprocessing performed by the data acquisition and preprocessing module includes, but is not limited to: adding Gaussian noise to the sampled raw vibration signal; applying Z-score standardization, that is, subtracting the mean and dividing by the standard deviation so that the data have a mean of 0 and a standard deviation of 1; and obtaining the training set and test set by sliding-window sampling.

8. The high-speed train secondary suspension system hybrid neural network fault diagnosis system according to claim 6, characterized in that, in the time-frequency convolutional network module, the time-frequency convolutional network uses an STFT kernel function; the filters of the core complex convolutional layer are complex-valued, and their frequency parameters are learned through the backpropagation algorithm, so that they adapt to the characteristics of the fault samples in the training set.

9. The high-speed train secondary suspension system hybrid neural network fault diagnosis system according to claim 6, characterized in that, in the native sparse attention (NSA) module, the gate weights are learned by mapping the features of each time step to three weight values through a linear layer; the three weight values are the output weights of the compression, selection, and sliding-window attention mechanisms, respectively, and after Softmax normalization they sum to 1. The selection attention mechanism calculates the complete dot-product similarity matrix between query Q and key K directly, assesses importance based on the similarity between complete tokens, ranks the keys of each query token by Top-K importance using the original similarity values, and then performs weighted aggregation over the key-value pairs corresponding to the ranking results.

10. The high-speed train secondary suspension system hybrid neural network fault diagnosis system according to claim 6, characterized in that, in the iTransformer module, the inverted embedding process first permutes the dimensions of the feature tensor so that the feature dimension becomes the sequence dimension, and then maps the time series of each feature to the model dimension through a linear embedding layer; each encoder layer of the two-layer Transformer encoder comprises a multi-head self-attention mechanism, layer normalization, a feedforward neural network, and a residual connection, connected in sequence.