A method for cross-modal monitoring data-driven train running gear fault diagnosis
By designing a parallel hybrid attention mechanism and an adaptive confidence gating unit, the shortcomings of single-modal data and the defects of multi-modal fusion strategies in train running gear fault diagnosis are solved, achieving high-precision and robust fault diagnosis, which is suitable for fault identification of key components of train running gear.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XINGHUI INTELLIGENT TECHNOLOGY (CHANGZHOU) CO LTD
- Filing Date
- 2026-03-03
- Publication Date
- 2026-06-19
AI Technical Summary
Existing train running gear fault diagnosis methods rely on single-modal data, which can easily lead to missed or misdiagnosed cases due to missing key features. Furthermore, multimodal fusion strategies lack adaptive adjustment capabilities, affecting the accuracy and robustness of diagnosis.
The design incorporates a parallel hybrid attention mechanism and an adaptive confidence gating unit. By forcibly preserving intramodal autocorrelation features and dynamically adjusting the fusion weights based on signal quality, cross-modal feature interaction enhancement is achieved.
It significantly improves the accuracy and robustness of fault diagnosis in train running gear, reduces the misdiagnosis rate, adapts to signal quality fluctuations under complex operating conditions, and is easy to deploy in train on-board terminals.
Abstract
Description
Technical Field
[0001] This invention relates to the field of rail transit operation and maintenance technology, and in particular to a method for fault diagnosis of train running gear driven by cross-modal monitoring data.
Background Technology
[0002] As a core transmission component ensuring safe train operation, the health of the train's running gear directly impacts operational safety. With continuously increasing train speeds, critical components such as gears and bearings in the running gear are subjected to harsh conditions of high speed, heavy load, and intense wheel-rail impact, making them highly susceptible to early damage such as pitting and spalling. If such faults are not identified promptly and accurately, they can lead to serious derailments, causing significant loss of life and property. Therefore, achieving efficient and reliable fault diagnosis of the running gear is a crucial step in ensuring the safe and stable operation of the rail transit system.
[0003] Traditional fault diagnosis methods mainly rely on human experience to analyze monitoring signals, which is inefficient, highly subjective, and difficult to handle the massive and complex data generated under high-speed operation. In recent years, intelligent diagnostic technology based on deep learning, with its powerful automatic feature extraction and pattern recognition capabilities, has gradually replaced traditional methods and become a research hotspot and mainstream direction in the field of train operation and maintenance.
[0004] Existing intelligent diagnostic methods for train running gear fall into two categories: single-modal diagnosis and multimodal fusion diagnosis. Single-modal methods rely on one type of monitoring data, such as vibration acceleration signals or acoustic emission signals. Vibration signals reflect mechanical impacts, but they are easily masked by strong background noise such as wheel-rail impacts and structural vibrations during high-speed train operation. Acoustic emission signals are more sensitive to early, weak faults, but they attenuate rapidly and produce an enormous data volume, which makes them difficult to use alone as a reliable diagnostic basis. Single-modal data often observes only one aspect of the equipment's condition, resulting in perception blind spots. For example, vibration signals are insensitive to high-frequency stress waves, while acoustic emission signals struggle to capture the vibration modes and frequency characteristics of the overall structure. Relying on a single signal for diagnosis can therefore easily lead to missed or incorrect diagnoses because key features are absent.
[0005] To overcome the limitations of single-modal information, multimodal fusion methods have emerged, aiming to make the state information more complete by combining the advantages of different sensors. Current mainstream technologies mostly use feature splicing (concatenation) or cross-modal attention mechanisms for information fusion, but these still face several challenges. On the one hand, cross-modal attention mechanisms often overemphasize the features common to the modalities, potentially weakening the unique key information of each modality. For example, the rotational-frequency periodicity that is unique to vibration signals is easily diluted during fusion. On the other hand, common splicing or weighted fusion methods usually preset fixed weights. They neither account for the different sensitivity of each fault type to each modality nor adapt to the way sensor signals fluctuate with operating conditions during actual train operation. As a result, the fusion strategy lacks adaptive adjustment capability, which limits the robustness and accuracy of the diagnostic model. Therefore, a novel intelligent fusion diagnostic method that adaptively balances modal contributions while fully retaining and mining the unique key fault information of each modality is of significant theoretical and practical importance for improving the accuracy, robustness, and engineering practicality of train running gear fault diagnosis.
Summary of the Invention
[0006] To address the above issues, this invention designs a parallel hybrid attention mechanism that forcibly retains the autocorrelation features within a mode before cross-modal interaction, preventing unique key features from being ignored while enhancing cross-modal feature interaction. It also designs an adaptive confidence gating unit to replace traditional feature splicing, enabling the model to automatically adjust the fusion weights based on the real-time quality of the signal. This significantly improves the accuracy of fault diagnosis for key components of the train running gear without increasing complex physical constraints.
[0007] According to an embodiment of the present invention, a method for fault diagnosis of train running gear driven by cross-modal monitoring data is provided.
[0008] In a first aspect of the invention, a method for fault diagnosis of train running gear driven by cross-modal monitoring data is provided. The method includes:
Step S01: Collect historical vibration acceleration signals and acoustic emission signals of key components of the train running gear, intercept and standardize the signals, and construct a dual-modal sample library containing training and test sets;
Step S02: Construct a fault diagnosis model, which includes a dual-stream feature extractor, a parallel hybrid attention module, an adaptive confidence gating unit, and a fault classifier connected in sequence;
Step S03: Using the training set data, perform supervised training of the fault diagnosis model by minimizing the cross-entropy loss function to obtain the trained fault diagnosis model, then input the test set data into the trained fault diagnosis model to output the fault category;
Step S04: Deploy the fault diagnosis model, collect vibration acceleration signals and acoustic emission signals of key components of the train running gear, and after intercepting and standardizing the signals, input them into the fault diagnosis model for fault category classification.
[0009] Furthermore, the vibration acceleration signal mentioned in step S01 is acquired by an acceleration sensor installed on a key component of the train running gear, and the acoustic emission signal is acquired by an acoustic emission sensor installed on a key component of the train running gear.
[0010] Furthermore, the signal truncation and standardization preprocessing described in step S01 includes: slicing the continuous time series signal into fixed lengths and using the Z-score standardization method to eliminate dimensional differences.
[0011] Further, in step S02, the fault diagnosis model inputs the preprocessed vibration acceleration signal and acoustic emission signal into a dual-stream feature extractor to extract vibration features and acoustic emission features. After obtaining vibration hybrid enhancement features and acoustic emission hybrid enhancement features through a parallel hybrid attention module, the model inputs them into an adaptive confidence gating unit for weighted fusion to obtain fused features. The fused features are then input into a fault classifier to obtain the fault probability distribution.
[0012] Furthermore, the dual-stream feature extractor includes a structurally symmetrical vibration feature extraction branch and an acoustic emission feature extraction branch, which are used to extract deep feature representations from the original vibration acceleration signal and acoustic emission signal, respectively. Both branches are constructed using a one-dimensional convolutional neural network, and each branch contains several convolutional blocks. Each convolutional block consists of a one-dimensional convolutional layer, a batch normalization layer, and a ReLU activation function.
[0013] Furthermore, the parallel hybrid attention module performs intra-modal self-attention calculation and cross-modal mutual attention calculation in parallel on the two branches, and performs residual fusion of the two calculation results to output vibration hybrid enhancement features and acoustic emission hybrid enhancement features. The specific steps are as follows:
Step S021: Linearly map the vibration signal feature $F_v$ to a query vector $Q_v$, a key vector $K_v$, and a value vector $V_v$;
Step S022: Multiply the query vector $Q_v$ by the transpose of the key vector $K_v$ and divide by $\sqrt{d}$ for scaling, where $d$ is the feature dimension; convert the scores into a probability distribution, i.e., attention weights, using the SoftMax function; multiply the attention weight matrix by the value vector $V_v$ to obtain the intra-modal self-attention enhancement feature $F_v^{self} = \mathrm{SoftMax}\left(Q_v K_v^{\top} / \sqrt{d}\right) V_v$;
Step S023: Calculate the cross-modal mutual attention enhancement feature $F_v^{cross}$ via the cross-modal flow;
Step S024: Concatenate $F_v^{self}$ and $F_v^{cross}$, apply a 1×1 convolution transformation, and join the result to the original input $F_v$ through a residual connection to obtain the vibration signal enhancement feature $\hat{F}_v$;
Step S025: Perform the symmetric operations of steps S021-S024 on the acoustic emission signal feature $F_a$ to obtain the acoustic emission signal enhancement feature $\hat{F}_a$.
[0014] Furthermore, the adaptive confidence gating unit generates dynamic gating weights through the evaluation network, and performs weighted fusion of vibration signal enhancement features and acoustic emission signal enhancement features based on these weights, outputting the final fused features.
[0015] In a second aspect of the invention, an apparatus for fault diagnosis of train running gear driven by cross-modal monitoring data is provided. The apparatus includes:
Signal acquisition module: used to acquire historical vibration acceleration signals and acoustic emission signals of key components of the train running gear, perform signal interception and standardization preprocessing, and construct a dual-modal sample library containing training and test sets;
Model building module: used to build the fault diagnosis model, including a dual-stream feature extractor, a parallel hybrid attention module, an adaptive confidence gating unit, and a fault classifier connected in sequence;
Model training module: used to perform supervised training of the fault diagnosis model by minimizing the cross-entropy loss function using training set data, to obtain the trained fault diagnosis model, input test set data into the trained fault diagnosis model, and output the fault category;
Model application module: used to deploy the fault diagnosis model, collect vibration acceleration signals and acoustic emission signals of key components of the train running gear, and after intercepting and standardizing the signals, input them into the fault diagnosis model for fault category classification.
[0016] In a third aspect of the invention, an electronic device is provided. The electronic device includes a memory and a processor, the memory storing a computer program, the processor executing the program to implement the method according to a first aspect of the invention.
[0017] In a fourth aspect of the invention, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the method according to a first aspect of the invention.
[0018] This invention designs a parallel hybrid attention mechanism to forcibly retain the autocorrelation features within a mode before cross-modal interaction, and prevents unique key features from being ignored while enhancing cross-modal feature interaction. It also designs an adaptive confidence gating unit to replace the traditional feature splicing, enabling the model to automatically adjust the fusion weights according to the real-time quality of the signal, thereby significantly improving the fault diagnosis accuracy of key components of the train running gear without increasing complex physical constraints.
[0019] It should be understood that the description in the Summary of the Invention is not intended to limit the key or essential features of the embodiments of the present invention, nor is it intended to restrict the scope of the invention. Other features of the invention will become readily apparent from the following description.
[0020] The beneficial effects of this invention are:
1. It retains both intramodal self-attention and cross-modal mutual attention, which not only enables feature complementarity between modalities, but also ensures that the model can still lock onto the periodic features of the vibration signal of the train running gear when that signal is subject to strong noise interference, thus enhancing the completeness of the features.
2. The introduced confidence gating mechanism can dynamically adjust the fusion weights in real time according to the input signal quality, and can adaptively cope with signal quality fluctuations. When the reliability of a certain modality's signal decreases, the mechanism can automatically reduce its weight and increase the dependence on reliable modalities, thereby maintaining the robustness of diagnosis under complex working conditions and significantly reducing the misdiagnosis rate.
3. This method does not require complex cause-effect graph construction or manually designed signal processing rules. It can achieve high-precision diagnosis through end-to-end deep learning and is easy to deploy on train on-board terminals.
Attached Figure Description
[0021] The above and other features, advantages, and aspects of the various embodiments of the present invention will become more apparent from the accompanying drawings and the following detailed description. Wherein:
Figure 1 shows a flowchart of a method for fault diagnosis of train running gear driven by cross-modal monitoring data according to an embodiment of the present invention;
Figure 2 shows a schematic diagram of the fault diagnosis model network structure according to an embodiment of the present invention;
Figure 3 shows a schematic diagram of the vibration branch structure of the parallel hybrid attention module according to an embodiment of the present invention;
Figure 4 shows a schematic diagram of the fault diagnosis verification model structure for applying the fault diagnosis model to a test set according to an embodiment of the present invention;
Figure 5 shows a block diagram of a device for fault diagnosis of train running gear driven by cross-modal monitoring data according to an embodiment of the present invention;
Figure 6 shows a schematic diagram of a device for fault diagnosis of train running gear driven by cross-modal monitoring data according to an embodiment of the present invention.
Detailed Implementation
[0022] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0023] According to an embodiment of the present invention, a method for fault diagnosis of train running gear driven by cross-modal monitoring data is proposed. By designing a parallel hybrid attention mechanism, the autocorrelation features within the mode are forcibly retained before cross-modal interaction, and unique key features are prevented from being ignored while cross-modal feature interaction enhancement is performed. An adaptive confidence gating unit is designed to replace the traditional feature splicing, enabling the model to automatically adjust the fusion weights according to the real-time quality of the signal, thereby significantly improving the fault diagnosis accuracy of key components of the train running gear without increasing complex physical constraints.
[0024] The principles and spirit of the present invention will be explained in detail below with reference to several representative embodiments.
[0025] Figure 1 is a schematic flowchart of a method for fault diagnosis of train running gear driven by cross-modal monitoring data, according to an embodiment of the present invention. The method includes:
Step S01: Collect historical vibration acceleration signals and acoustic emission signals of key components of the train running gear, intercept and standardize the signals, and construct a dual-modal sample library containing training and test sets;
Step S02: Construct a fault diagnosis model, which includes a dual-stream feature extractor, a parallel hybrid attention module, an adaptive confidence gating unit, and a fault classifier connected in sequence;
Step S03: Using the training set data, perform supervised training of the fault diagnosis model by minimizing the cross-entropy loss function to obtain the trained fault diagnosis model, then input the test set data into the trained fault diagnosis model to output the fault category;
Step S04: Deploy the fault diagnosis model, collect vibration acceleration signals and acoustic emission signals of key components of the train running gear, and after intercepting and standardizing the signals, input them into the fault diagnosis model for fault category classification.
[0026] It should be noted that although the operation of the method of the present invention has been described in a specific order in the above embodiments and figures, this does not require or imply that the operations must be performed in that specific order, or that all the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and / or one step may be broken down into multiple steps.
[0027] To provide a clearer explanation of the above-mentioned method for fault diagnosis of train running gear driven by cross-modal monitoring data, a specific embodiment will be used for illustration below. However, it is worth noting that this embodiment is only for better illustrating the present invention and does not constitute an improper limitation of the present invention.
[0028] The following specific example further illustrates the method of train running gear fault diagnosis driven by cross-modal monitoring data:
Step S01: Collect historical vibration acceleration signals and acoustic emission signals of key components of the train running gear, intercept and standardize the signals, and construct a dual-modal sample library containing training and test sets.
[0029] Specifically, vibration acceleration signals and acoustic emission signals are synchronously acquired using acceleration sensors and acoustic emission sensors installed on key components of the train's running gear. Preprocessing includes slicing the continuous time-series signals into fixed lengths and using Z-score normalization to eliminate dimensional differences.
[0030] In this embodiment, vibration acceleration and acoustic emission signals of the axle box bearing, a key component of a train's running gear, were collected on a test bench at speeds of 400-700 rpm. The dataset included four health states: normal, inner race fault, outer race fault, and roller fault, labeled with numbers 0-3 respectively. Overlapping slices were applied to the collected continuous signals, with a sample length of 2048 points. Z-score normalization was performed on each sample. With 600 samples per class, the processed samples were randomly divided into a training set (70%) and a test set (30%).
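The preprocessing of this embodiment (overlapping slices of 2048 points, per-sample Z-score, random 70%/30% split) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the slice step of 1024 points, the random seed, and the function names are assumptions, since the text does not state the overlap ratio. The same index split would be applied to the vibration and acoustic emission windows so that the two modalities stay paired.

```python
import numpy as np

def slice_signal(signal, length=2048, step=1024):
    """Cut a 1-D signal into overlapping windows (step < length gives overlap)."""
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step : i * step + length] for i in range(n)])

def zscore(samples, eps=1e-8):
    """Standardize every window (row) to zero mean and unit standard deviation."""
    mean = samples.mean(axis=1, keepdims=True)
    std = samples.std(axis=1, keepdims=True)
    return (samples - mean) / (std + eps)

def split_indices(n, train_ratio=0.7, seed=0):
    """Random train/test index split; reuse the same indices for both modalities."""
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(round(train_ratio * n))
    return idx[:cut], idx[cut:]

# Synthetic stand-in for one recorded channel (real sensor data is not available here).
rng = np.random.default_rng(0)
raw = rng.normal(loc=3.0, scale=5.0, size=2048 + 1024 * 9)
windows = zscore(slice_signal(raw))
train_idx, test_idx = split_indices(len(windows))
```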
[0031] Step S02: Construct a fault diagnosis model, which includes a dual-stream feature extractor, a parallel hybrid attention module, an adaptive confidence gating unit, and a fault classifier connected in sequence.
[0032] As shown in Figure 2, the preprocessed vibration acceleration signal and acoustic emission signal are input into a dual-stream feature extractor to extract vibration features and acoustic emission features. After the parallel hybrid attention module produces the vibration hybrid enhancement features and acoustic emission hybrid enhancement features, these are input into an adaptive confidence gating unit for weighted fusion to obtain fused features. The fused features are then input into a fault classifier to obtain the fault probability distribution.
[0033] The dual-stream feature extractor comprises a structurally symmetrical vibration feature extraction branch and an acoustic emission feature extraction branch, which are used to extract deep feature representations from the original vibration acceleration signal and acoustic emission signal, respectively. Both branches are constructed using a one-dimensional convolutional neural network (1D-CNN), and each branch contains several convolutional blocks. Each convolutional block consists of a one-dimensional convolutional layer, a batch normalization layer, and a ReLU activation function.
[0034] In this embodiment, the vibration feature extraction branch consists of five one-dimensional convolutional layers, each followed by a batch normalization layer and a ReLU activation function. It takes a vibration acceleration signal sample $x_v$ from the sample library as input and outputs the vibration signal feature $F_v$. The acoustic emission feature extraction branch likewise consists of five one-dimensional convolutional layers, each followed by a batch normalization layer and a ReLU activation function. It takes an acoustic emission signal sample $x_a$ from the sample library as input and outputs the acoustic emission signal feature $F_a$.
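A minimal NumPy sketch of one such branch, stacking five Conv1D, batch normalization, and ReLU blocks, is given below for illustration. The channel widths, kernel size of 7, stride of 2, and weight initialization are assumptions, because the embodiment only fixes the number of layers. Batch normalization is simplified to per-channel standardization without learned scale and shift.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d(x, w, b, stride=2):
    """Valid 1-D convolution. x: (B, C_in, L), w: (C_out, C_in, K), b: (C_out,)."""
    k = w.shape[-1]
    windows = sliding_window_view(x, k, axis=2)[:, :, ::stride, :]  # (B, C_in, L_out, K)
    return np.einsum("bclk,ock->bol", windows, w) + b[None, :, None]

def batch_norm(x, eps=1e-5):
    """Per-channel standardization over batch and length (no learned scale/shift)."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conv_block(x, w, b):
    return np.maximum(batch_norm(conv1d(x, w, b)), 0.0)  # Conv -> BN -> ReLU

def make_branch(rng, channels=(1, 16, 32, 64, 64, 64), kernel=7):
    """Five blocks; widths and kernel size are illustrative assumptions."""
    params = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        scale = np.sqrt(2.0 / (c_in * kernel))
        params.append((rng.normal(scale=scale, size=(c_out, c_in, kernel)), np.zeros(c_out)))
    return params

def extract(x, params):
    for w, b in params:
        x = conv_block(x, w, b)
    return x

rng = np.random.default_rng(0)
vib_branch, ae_branch = make_branch(rng), make_branch(rng)  # symmetric structure, separate weights
x_v = rng.normal(size=(2, 1, 2048))  # two synthetic vibration samples
x_a = rng.normal(size=(2, 1, 2048))  # two synthetic acoustic emission samples
f_v, f_a = extract(x_v, vib_branch), extract(x_a, ae_branch)
```

With these assumed settings, each 2048-point sample is reduced to a feature map of 64 channels by 59 positions.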
[0035] The parallel hybrid attention module is used to enhance the extracted deep features. For any branch, the module performs intramodal self-attention calculation and cross-modal mutual attention calculation in parallel, and performs residual fusion of the two calculation results to output hybrid enhanced features.
[0036] Specifically, the vibration signal feature $F_v$ is linearly mapped to a query vector $Q_v$, a key vector $K_v$, and a value vector $V_v$; the acoustic emission signal feature $F_a$ is linearly mapped to a query vector $Q_a$, a key vector $K_a$, and a value vector $V_a$.
[0037] In this embodiment, as shown in Figure 3, the query vector $Q_v$ is multiplied by the transpose of the key vector $K_v$, and the product is divided by $\sqrt{d}$ for scaling. The scores are then converted into a probability distribution, i.e., attention weights, using the SoftMax function. The attention weight matrix is multiplied by the value vector $V_v$ to obtain the intramodal self-attention enhancement feature $F_v^{self}$:
$$F_v^{self} = \mathrm{SoftMax}\left(\frac{Q_v K_v^{\top}}{\sqrt{d}}\right) V_v$$
In the formula, $\mathrm{SoftMax}(\cdot)$ is the activation function and $d$ is the feature dimension.
[0038] The cross-modal mutual attention enhancement feature $F_v^{cross}$ is computed via the cross-modal flow, using the acoustic emission signal to supplement weak high-frequency fault information:
$$F_v^{cross} = \mathrm{SoftMax}\left(\frac{Q_v K_a^{\top}}{\sqrt{d}}\right) V_a$$
[0039] $F_v^{self}$ and $F_v^{cross}$ are concatenated, transformed by a 1×1 convolution, and joined to the original input $F_v$ through a residual connection to obtain the vibration signal enhancement feature $\hat{F}_v$. This step uses a learnable balance parameter $\alpha$, whose initial value is set to 0.5; $\mathrm{Conv}(\cdot)$ denotes the convolution operation.
[0040] Similarly, performing the symmetric operation on the acoustic emission branch yields the acoustic emission signal enhancement feature $\hat{F}_a$.
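The self-attention, cross-modal attention, and residual steps described above can be sketched in NumPy as follows. Several details are assumptions made only for illustration: features are laid out as (positions, channels); the cross-modal flow uses the branch's own query with the other modality's key and value; the 1×1 convolution is written as the equivalent per-position linear map; and the balance parameter `alpha` scales the convolved branch before the residual addition. The text does not fix where the balance parameter enters, so that placement in particular is hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: SoftMax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

class Branch:
    """Projection weights of one modality (hypothetical container)."""
    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.wq = rng.normal(scale=s, size=(d, d))
        self.wk = rng.normal(scale=s, size=(d, d))
        self.wv = rng.normal(scale=s, size=(d, d))
        self.w_mix = rng.normal(scale=s, size=(2 * d, d))  # 1x1 conv == per-position linear map
        self.alpha = 0.5  # balance parameter, initial value 0.5 (placement assumed)

    def project(self, f):
        return f @ self.wq, f @ self.wk, f @ self.wv

def hybrid_attention(own, other, f_own, f_other):
    q, k, v = own.project(f_own)
    _, k_o, v_o = other.project(f_other)
    self_feat = attention(q, k, v)       # intra-modal self-attention
    cross_feat = attention(q, k_o, v_o)  # cross-modal mutual attention
    mixed = np.concatenate([self_feat, cross_feat], axis=-1) @ own.w_mix
    return f_own + own.alpha * mixed     # residual connection to the original input

rng = np.random.default_rng(0)
d = 64
vib, ae = Branch(d, rng), Branch(d, rng)
f_v = rng.normal(size=(59, d))  # synthetic vibration feature (positions, channels)
f_a = rng.normal(size=(59, d))  # synthetic acoustic emission feature
enh_v = hybrid_attention(vib, ae, f_v, f_a)
enh_a = hybrid_attention(ae, vib, f_a, f_v)  # symmetric operation on the other branch
```

Because both attention results are computed from the same input before mixing, the intra-modal term cannot be overwritten by the cross-modal term, which is the property the parallel design is meant to provide.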
[0041] An adaptive confidence gating unit is used to evaluate the confidence of each modal enhancement feature. The evaluation network generates dynamic gating weights, and the enhancement features of vibration and acoustic emission are weighted and fused based on these weights to output the final fused features.
[0042] Specifically, an evaluation network composed of global average pooling (GAP) and a multilayer perceptron (MLP) calculates the gating weights of the two branches respectively. The vibration enhancement feature $\hat{F}_v$ is input into the evaluation network to calculate the confidence weight of the vibration modality:
$$g_v = \sigma\left(\mathrm{MLP}\left(\mathrm{GAP}(\hat{F}_v)\right)\right)$$
In the formula, $\sigma$ is the Sigmoid activation function, whose output range is (0,1). The acoustic emission enhancement feature $\hat{F}_a$ is input into the evaluation network to calculate the confidence weight of the acoustic emission modality:
$$g_a = \sigma\left(\mathrm{MLP}\left(\mathrm{GAP}(\hat{F}_a)\right)\right)$$
The calculated weights are used to perform a weighted summation of the enhancement features to obtain the final fused feature $F_{fuse}$:
$$F_{fuse} = g_v \hat{F}_v + g_a \hat{F}_a$$
[0043] Gated fusion can automatically suppress the weights of interfered modes based on signal quality.
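A minimal sketch of the gating unit is shown below: global average pooling, a small MLP, a Sigmoid, and a weighted sum of the two enhanced features. The hidden width of 32, one scalar weight per modality, a separate evaluation network per branch, and the absence of weight normalization are all assumptions. The weights here are random, so the gates carry no meaning until trained; whether a degraded modality is actually down-weighted depends on training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_evaluator(rng, d, hidden=32):
    """Parameters of a two-layer MLP that maps a pooled feature to one logit."""
    w1 = rng.normal(scale=np.sqrt(1.0 / d), size=(d, hidden))
    w2 = rng.normal(scale=np.sqrt(1.0 / hidden), size=hidden)
    return w1, np.zeros(hidden), w2, 0.0

def confidence(feat, params):
    """feat: (positions, channels). Returns a scalar confidence in (0, 1)."""
    w1, b1, w2, b2 = params
    pooled = feat.mean(axis=0)                    # global average pooling over positions
    hidden = np.maximum(pooled @ w1 + b1, 0.0)    # MLP hidden layer with ReLU
    return float(sigmoid(hidden @ w2 + b2))       # Sigmoid keeps the gate in (0, 1)

def gated_fusion(enh_v, enh_a, eval_v, eval_a):
    g_v = confidence(enh_v, eval_v)
    g_a = confidence(enh_a, eval_a)
    return g_v * enh_v + g_a * enh_a, g_v, g_a    # weighted summation

rng = np.random.default_rng(0)
d = 64
enh_v = rng.normal(size=(59, d))  # synthetic enhanced vibration feature
enh_a = rng.normal(size=(59, d))  # synthetic enhanced acoustic emission feature
eval_v, eval_a = init_evaluator(rng, d), init_evaluator(rng, d)
fused, g_v, g_a = gated_fusion(enh_v, enh_a, eval_v, eval_a)
```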
[0044] The fault classifier is used to map fused features to a fault category space and output a fault probability distribution.
[0045] The fault classifier consists of fully connected layers and a softmax activation function, with fused features as input. The output is a predicted probability vector for the fault category.
[0046] In this embodiment, the fault classifier is a fully connected network with two layers, whose output dimensions are 128 and 4 respectively. A ReLU activation function is applied after the first fully connected layer, and a Softmax activation function is applied after the last fully connected layer. The model input is the final fused feature. The output is a four-dimensional vector giving the predicted probability of the input sample for each of the four health states: normal, inner race fault, outer race fault, and roller fault.
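The two-layer classifier of this embodiment (128 units with ReLU, then 4 units with Softmax) can be sketched as below. How the fused feature map is reduced to a vector before the first fully connected layer is not stated, so average pooling over positions is used here as an assumption, as are the input width of 64 and the random initialization.

```python
import numpy as np

LABELS = ("normal", "inner race fault", "outer race fault", "roller fault")

def init_classifier(rng, d_in, hidden=128, n_classes=4):
    w1 = rng.normal(scale=np.sqrt(2.0 / d_in), size=(d_in, hidden))
    w2 = rng.normal(scale=np.sqrt(1.0 / hidden), size=(hidden, n_classes))
    return w1, np.zeros(hidden), w2, np.zeros(n_classes)

def classify(fused, params):
    """fused: (positions, channels). Returns a probability vector over the 4 classes."""
    w1, b1, w2, b2 = params
    x = fused.mean(axis=0)                 # assumed pooling to a fixed-length vector
    h = np.maximum(x @ w1 + b1, 0.0)       # first fully connected layer + ReLU
    logits = h @ w2 + b2                   # second fully connected layer
    e = np.exp(logits - logits.max())      # Softmax, shifted for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
params = init_classifier(rng, d_in=64)
probs = classify(rng.normal(size=(59, 64)), params)
predicted = LABELS[int(np.argmax(probs))]
```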
[0047] Step S03: Using the training set data, perform supervised training of the fault diagnosis model by minimizing the cross-entropy loss function to obtain the trained fault diagnosis model. Input the test set data into the trained fault diagnosis model and output the fault category.
[0048] Minimizing the model's objective loss function jointly optimizes the parameters of the fault classifier, the adaptive confidence gating unit, the parallel hybrid attention module, and the dual-stream feature extractor.
[0049] The optimization algorithm is root mean square propagation (RMSProp), with a learning rate of 0.001. Model training is stopped after 300 iterations.
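To make the training rule concrete, the sketch below applies cross-entropy loss and the RMSProp update with a learning rate of 0.001 for 300 iterations. For brevity it trains only a linear Softmax layer on synthetic four-class data instead of the full model. The decay rate of 0.9 and the epsilon value are common defaults assumed here, not values given in the text.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    """Mean negative log-likelihood of the true classes."""
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()

def rmsprop_step(param, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    """RMSProp: scale each gradient by a running RMS of its recent magnitudes."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

# Synthetic 4-class data: the feature whose index equals the label is boosted.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 16))
y = rng.integers(0, 4, size=200)
x[np.arange(200), y] += 3.0

w = np.zeros((16, 4))
cache = np.zeros_like(w)
losses = []
for _ in range(300):                               # 300 iterations, as in the embodiment
    p = softmax(x @ w)
    losses.append(cross_entropy(p, y))
    grad = x.T @ (p - np.eye(4)[y]) / len(y)       # gradient of the loss w.r.t. w
    w, cache = rmsprop_step(w, grad, cache)
```

In the full model the same update would be applied to every trainable parameter, with gradients obtained by backpropagation through all four modules.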
[0050] The structure of the fault diagnosis verification model applied to the test set samples is shown in Figure 4.
[0051] Table 1 reports the diagnostic accuracy of the fault diagnosis model.
[0052] Table 1 Diagnostic Results
As shown in the table, the diagnostic accuracy of the method of this invention reached 97.22%, while the diagnostic accuracy of the multimodal fusion diagnostic method DCMFC was only 93.19%. The diagnostic accuracies of the single-modal method and the simple feature splicing method were both below 90%, significantly lower than that of the method of this invention. The diagnostic results verify that the method of this invention, with its ability to lock onto key features through parallel hybrid attention and its ability to automatically suppress noisy modalities through adaptive gating, can be effectively applied to the fault diagnosis of key components of train running gear.
[0053] Step S04: Deploy the fault diagnosis model, collect vibration acceleration signals and acoustic emission signals of key components of the train running gear, and after intercepting and standardizing the signals, input them into the fault diagnosis model for fault category classification.
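At deployment time, each newly collected pair of windows goes through the same standardization and is then passed to the trained model. The sketch below shows that flow with a hypothetical `dummy_model` standing in for the trained network; the function names and the returned confidence value are illustrative assumptions, while the label numbering 0-3 follows the embodiment.

```python
import numpy as np

LABELS = {0: "normal", 1: "inner race fault", 2: "outer race fault", 3: "roller fault"}

def prepare(window, eps=1e-8):
    """Z-score one intercepted window, matching the training-time preprocessing."""
    window = np.asarray(window, dtype=float)
    return (window - window.mean()) / (window.std() + eps)

def diagnose(model, vib_window, ae_window):
    """Run the model on one vibration / acoustic emission pair and name the result."""
    probs = model(prepare(vib_window), prepare(ae_window))
    k = int(np.argmax(probs))
    return LABELS[k], float(probs[k])

def dummy_model(vib, ae):
    """Hypothetical stand-in for the trained fault diagnosis model."""
    return np.array([0.05, 0.80, 0.10, 0.05])

label, confidence = diagnose(dummy_model, np.arange(2048), np.arange(2048))
```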
[0054] In summary, the method of this invention, through the construction of a dual-stream feature extractor, simultaneously acquires deep features of vibration and acoustic emission signals, effectively compensating for the inherent limitation of insufficient information in single-modal monitoring data when characterizing complex states of train running gear. The further constructed parallel hybrid attention module, by simultaneously executing intra-modal self-enhancement and cross-modal complementarity strategies, deeply mines and locks the inherent periodic features of each modality while achieving refined semantic associations between heterogeneous modalities, ensuring feature completeness. The designed adaptive confidence gating unit, through real-time confidence evaluation of each modal enhancement feature, realizes dynamic weight allocation and weighted fusion based on signal quality, effectively suppressing the negative interference of inferior modalities on diagnostic results in noisy environments. This invention significantly improves the fault identification accuracy and robustness of the model under complex background noise, providing an efficient cross-modal collaborative sensing solution for the reliable operation and maintenance of train running gear.
[0055] Based on the same inventive concept, this invention also proposes a device for fault diagnosis of train running gear driven by cross-modal monitoring data. The implementation of this device can be found in the implementation of the method described above, and repeated details will not be elaborated further. As shown in Figure 5, the device 100 includes:
Signal acquisition module 101: used to acquire historical vibration acceleration signals and acoustic emission signals of key components of the train running gear, perform signal interception and standardization preprocessing, and construct a dual-modal sample library containing training and test sets;
Model building module 102: used to build a fault diagnosis model, including a dual-stream feature extractor, a parallel hybrid attention module, an adaptive confidence gating unit, and a fault classifier connected in sequence;
Model training module 103: used to perform supervised training on the fault diagnosis model by minimizing the cross-entropy loss function using the training set data to obtain the trained fault diagnosis model, input the test set data into the trained fault diagnosis model, and output the fault category;
Model application module 104: used to deploy the fault diagnosis model, collect vibration acceleration signals and acoustic emission signals of key components of the train running gear, and after intercepting and standardizing the signals, input them into the fault diagnosis model for fault category classification.
[0056] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the described module can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0057] As shown in Figure 6, the device includes a central processing unit (CPU), which can perform various appropriate actions and processes based on computer program instructions stored in read-only memory (ROM) or loaded from storage units into random access memory (RAM). The RAM can also store various programs and data required for device operation. The CPU, ROM, and RAM are interconnected via a bus. Input/output (I/O) interfaces are also connected to the bus.
[0058] Multiple components in the device are connected to the I / O interface, including: input units such as keyboards and mice; output units such as various types of displays and speakers; storage units such as disks and optical discs; and communication units such as network interface cards (NICs), modems, and wireless transceivers. The communication unit allows the device to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0059] The processing unit executes the various methods and processes described above, such as method steps S01 to S04. For example, in some embodiments, method steps S01 to S04 may be implemented as a computer software program tangibly contained in a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and / or installed on the device via ROM and / or a communication unit. When the computer program is loaded into RAM and executed by the CPU, one or more steps of method steps S01 to S04 described above may be performed. Alternatively, in other embodiments, the CPU may be configured to execute method steps S01 to S04 by any other suitable means (e.g., by means of firmware).
[0060] The functions described above in this document can be performed at least in part by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used include, without limitation: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), and so on.
[0061] The program code used to implement the methods of the present invention can be written in any combination of one or more programming languages. This program code can be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code can be executed entirely on the machine, partially on the machine, as a standalone software package partially on the machine and partially on a remote machine, or entirely on a remote machine or server.
[0062] In the context of this invention, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media can include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0063] Furthermore, although the operations are described in a specific order, this should not be understood as requiring that such operations be performed in the specific order shown or in sequential order, or as requiring that all illustrated operations be performed to achieve the desired result. In certain environments, multitasking and parallel processing may be advantageous. Similarly, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the invention. Certain features described in the context of separate embodiments may also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation may also be implemented individually or in any suitable sub-combination in multiple implementations.
[0064] Although the subject matter has been described using language specific to structural features and / or methodological logic, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely illustrative examples of implementing the claims.
Claims
1. A method for fault diagnosis of train running gear driven by cross-modal monitoring data, characterized in that, The method includes: Step S01: Collect historical vibration acceleration signals and acoustic emission signals of key components of the train running gear, truncate and standardize the signals, and construct a dual-modal sample library containing training and test sets; Step S02: Construct a fault diagnosis model, which includes a dual-stream feature extractor, a parallel hybrid attention module, an adaptive confidence gating unit, and a fault classifier connected in sequence; Step S03: Using the training set data, perform supervised training of the fault diagnosis model by minimizing the cross-entropy loss function to obtain the trained fault diagnosis model, then input the test set data into the trained fault diagnosis model to output the fault category; Step S04: Deploy the fault diagnosis model, collect vibration acceleration signals and acoustic emission signals of key components of the train running gear, and, after truncating and standardizing the signals, input them into the fault diagnosis model for fault category classification.
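As an illustration of the training objective named in step S03, the following is a minimal sketch of a cross-entropy loss computed from classifier logits. The three-class logits, the labels, and the function name `cross_entropy` are hypothetical values chosen for the example; the text above does not specify the number of fault categories or the optimizer.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.size), labels].mean()

# Hypothetical classifier outputs for three samples and three fault categories.
logits = np.array([[4.0, 0.5, -1.0],
                   [0.2, 3.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 1, 2])

loss = cross_entropy(logits, labels)
# A classifier that outputs equal scores for every class gives log(number of classes).
uniform_loss = cross_entropy(np.zeros((3, 3)), labels)
print(loss, uniform_loss)
```

Supervised training lowers this value by adjusting the model parameters so that the probability assigned to the labeled fault category increases.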
2. The method for fault diagnosis of train running gear driven by cross-modal monitoring data according to claim 1, characterized in that, The vibration acceleration signal mentioned in step S01 is acquired by an acceleration sensor installed on a key component of the train running gear, and the acoustic emission signal is acquired by an acoustic emission sensor installed on a key component of the train running gear.
3. The method for fault diagnosis of train running gear driven by cross-modal monitoring data according to claim 1, characterized in that, The preprocessing of signal truncation and standardization described in step S01 includes: slicing the continuous time series signal into fixed lengths and using the Z-score standardization method to eliminate dimensional differences.
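The preprocessing in claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the segments do not overlap, the incomplete tail is discarded, and Z-score statistics are computed per segment. The claim fixes none of these choices, and the window length of 1024 and the synthetic signals are hypothetical.

```python
import numpy as np

def slice_and_standardize(signal, window_length, eps=1e-8):
    """Cut a 1-D signal into non-overlapping fixed-length segments and z-score each one."""
    signal = np.asarray(signal, dtype=np.float64)
    n_segments = signal.size // window_length            # incomplete tail is dropped (assumption)
    segments = signal[: n_segments * window_length].reshape(n_segments, window_length)
    mean = segments.mean(axis=1, keepdims=True)
    std = segments.std(axis=1, keepdims=True)
    return (segments - mean) / (std + eps)                # eps guards against constant segments

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two sensors, deliberately on very different scales.
vibration = 5.0 + 3.0 * rng.standard_normal(10_000)
acoustic = 1e-3 * rng.standard_normal(10_000)

vib_segments = slice_and_standardize(vibration, window_length=1024)
ae_segments = slice_and_standardize(acoustic, window_length=1024)
print(vib_segments.shape, ae_segments.shape)              # (9, 1024) (9, 1024)
```

After standardization both modalities have zero mean and approximately unit variance per segment, so neither dominates the other purely because of its measurement unit.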
4. The method for fault diagnosis of train running gear driven by cross-modal monitoring data according to claim 1, characterized in that, The fault diagnosis model described in step S02 inputs the preprocessed vibration acceleration signal and acoustic emission signal into a dual-stream feature extractor to extract vibration features and acoustic emission features. After obtaining vibration hybrid enhancement features and acoustic emission hybrid enhancement features through a parallel hybrid attention module, the model inputs them into an adaptive confidence gating unit for weighted fusion to obtain fused features. The fused features are then input into a fault classifier to obtain the fault probability distribution.
5. The method for fault diagnosis of train running gear driven by cross-modal monitoring data according to claim 1, characterized in that, The dual-stream feature extractor includes a structurally symmetrical vibration feature extraction branch and an acoustic emission feature extraction branch, which are used to extract deep feature representations from the original vibration acceleration signal and acoustic emission signal, respectively. Both branches are constructed using a one-dimensional convolutional neural network, and each branch contains several convolutional blocks. Each convolutional block consists of a one-dimensional convolutional layer, a batch normalization layer, and a ReLU activation function.
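The convolutional block of claim 5 (one-dimensional convolution, batch normalization, ReLU) can be sketched in plain NumPy as below. This is only an illustrative forward pass: the channel counts, kernel size, number of blocks, and random weights are hypothetical, and the batch normalization shown uses batch statistics without a learned scale or shift.

```python
import numpy as np

def conv1d(x, weight, bias):
    """Valid 1-D cross-correlation. x: (batch, in_ch, length); weight: (out_ch, in_ch, kernel)."""
    kernel = weight.shape[-1]
    # windows has shape (batch, in_ch, out_length, kernel)
    windows = np.lib.stride_tricks.sliding_window_view(x, kernel, axis=-1)
    return np.einsum("bilk,oik->bol", windows, weight) + bias[None, :, None]

def batch_norm(x, eps=1e-5):
    """Normalize each channel over the batch and length axes (no learned scale or shift)."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conv_block(x, weight, bias):
    """One block: Conv1D -> BatchNorm -> ReLU."""
    return np.maximum(batch_norm(conv1d(x, weight, bias)), 0.0)

def make_branch(rng, channels, kernel):
    """Random parameters for a stack of blocks, e.g. channels=[1, 8, 16] gives two blocks."""
    return [(0.1 * rng.standard_normal((c_out, c_in, kernel)), np.zeros(c_out))
            for c_in, c_out in zip(channels[:-1], channels[1:])]

def run_branch(x, params):
    for weight, bias in params:
        x = conv_block(x, weight, bias)
    return x

rng = np.random.default_rng(0)
# Structurally symmetric branches: same layout, separate weights.
vib_branch = make_branch(rng, channels=[1, 8, 16], kernel=5)
ae_branch = make_branch(rng, channels=[1, 8, 16], kernel=5)

vib_features = run_branch(rng.standard_normal((4, 1, 128)), vib_branch)
ae_features = run_branch(rng.standard_normal((4, 1, 128)), ae_branch)
print(vib_features.shape, ae_features.shape)              # (4, 16, 120) (4, 16, 120)
```

Each unpadded convolution with kernel size 5 shortens the sequence by 4 samples, which is why two blocks reduce the length from 128 to 120.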
6. The method for fault diagnosis of train running gear driven by cross-modal monitoring data according to claim 5, characterized in that, The parallel hybrid attention module performs intra-modal self-attention calculation and cross-modal mutual attention calculation in parallel on the two branches, and fuses the results of the two calculations through a residual connection to output vibration hybrid enhancement features and acoustic emission hybrid enhancement features. The specific steps are as follows: Step S021: Linearly map the vibration feature to a query vector, a key vector, and a value vector; Step S022: Multiply the query vector by the transpose of the key vector and divide the result by a scaling factor; convert the resulting scores into a probability distribution, i.e., the attention weights, using the SoftMax function; multiply the attention weight matrix by the value vector to obtain the intra-modal self-attention enhancement feature; Step S023: Calculate the cross-modal mutual attention enhancement feature via the cross-modal flow; Step S024: Concatenate the intra-modal self-attention enhancement feature and the cross-modal mutual attention enhancement feature, apply a 1×1 convolution to the concatenated result, and add it to the original input through a residual connection to obtain the vibration signal enhancement feature; Step S025: Perform the symmetric operations of steps S021-S024 on the acoustic emission feature to obtain the acoustic emission signal enhancement feature.
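Steps S021 to S025 can be sketched as follows. Several points are assumptions made for illustration only: the scaling factor is taken to be the square root of the key dimension, as in standard scaled dot-product attention; in the cross-modal flow the query is assumed to come from the branch being enhanced and the key and value from the other modality; and the 1×1 convolution is written as the equivalent per-position linear map. All weights are random and the names (`hybrid_enhance`, `w_mix`, and so on) are hypothetical.

```python
import numpy as np

def softmax(scores):
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)

def attention(query_src, context_src, w_q, w_k, w_v):
    """Scaled dot-product attention; inputs are (positions, dim) feature matrices."""
    q = query_src @ w_q                                     # S021: linear maps to Q, K, V
    k = context_src @ w_k
    v = context_src @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))       # S022: scale, then SoftMax
    return weights @ v, weights

def hybrid_enhance(own, other, params):
    self_feat, _ = attention(own, own, *params["self"])     # intra-modal self-attention
    cross_feat, _ = attention(own, other, *params["cross"]) # S023: cross-modal attention
    merged = np.concatenate([self_feat, cross_feat], axis=-1)
    # S024: a 1x1 convolution acts as the same linear map at every position,
    # followed by the residual connection to the original input.
    return own + merged @ params["w_mix"]

def init_params(rng, dim):
    def proj():
        return tuple(rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
    return {"self": proj(), "cross": proj(),
            "w_mix": rng.standard_normal((2 * dim, dim)) / np.sqrt(2 * dim)}

rng = np.random.default_rng(0)
dim = 16
vib_feat = rng.standard_normal((120, dim))                  # stand-in vibration features
ae_feat = rng.standard_normal((90, dim))                    # stand-in acoustic emission features

vib_enhanced = hybrid_enhance(vib_feat, ae_feat, init_params(rng, dim))
ae_enhanced = hybrid_enhance(ae_feat, vib_feat, init_params(rng, dim))   # S025: symmetric pass
_, cross_weights = attention(vib_feat, ae_feat, *init_params(rng, dim)["cross"])
print(vib_enhanced.shape, ae_enhanced.shape, cross_weights.shape)
```

Because the residual connection adds the result back to the branch's own input, each enhanced feature keeps the sequence length and dimension of its own modality even when the two modalities have different lengths.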
7. The method for fault diagnosis of train running gear driven by cross-modal monitoring data according to claim 6, characterized in that, The adaptive confidence gating unit generates dynamic gating weights through an evaluation network, and performs weighted fusion of the vibration signal enhancement features and the acoustic emission signal enhancement features based on these weights, outputting the final fused features.
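One possible form of the gating in claim 7 is sketched below. The structure of the evaluation network is not given in the claim, so everything here is an assumption: each enhanced feature is average-pooled over positions, a small per-modality network with one tanh hidden layer produces a scalar confidence score, and a SoftMax over the two scores yields the gating weights. The weights are random and untrained, so the printed gates carry no information about signal quality.

```python
import numpy as np

def confidence_score(feature, w_hidden, w_out):
    """Hypothetical evaluation network: pool over positions, then a tiny two-layer scorer."""
    pooled = feature.mean(axis=0)                 # (dim,)
    hidden = np.tanh(pooled @ w_hidden)           # (hidden,)
    return float(hidden @ w_out), pooled

def gated_fusion(vib_feat, ae_feat, params):
    score_v, pooled_v = confidence_score(vib_feat, *params["vib"])
    score_a, pooled_a = confidence_score(ae_feat, *params["ae"])
    scores = np.array([score_v, score_a])
    gates = np.exp(scores - scores.max())         # SoftMax over the two modalities
    gates /= gates.sum()
    fused = gates[0] * pooled_v + gates[1] * pooled_a
    return fused, gates

def init_eval(rng, dim, hidden):
    return (rng.standard_normal((dim, hidden)) / np.sqrt(dim),
            rng.standard_normal(hidden) / np.sqrt(hidden))

rng = np.random.default_rng(0)
dim, hidden = 16, 8
params = {"vib": init_eval(rng, dim, hidden), "ae": init_eval(rng, dim, hidden)}

vib_enhanced = rng.standard_normal((120, dim))    # stand-in enhanced features
ae_enhanced = rng.standard_normal((90, dim))

fused, gates = gated_fusion(vib_enhanced, ae_enhanced, params)
print(fused.shape, gates)
```

In a trained model the evaluation network would be optimized jointly with the classifier, so that a noisy modality receives a smaller gate and contributes less to the fused feature.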
8. A device for fault diagnosis of train running gear driven by cross-modal monitoring data, characterized in that, The device implements the method as described in any one of claims 1 to 7, comprising: Signal acquisition module: used to acquire historical vibration acceleration signals and acoustic emission signals of key components of the train running gear, perform signal truncation and standardization preprocessing, and construct a dual-modal sample library containing training and test sets; Model building module: used to build a fault diagnosis model, including a dual-stream feature extractor, a parallel hybrid attention module, an adaptive confidence gating unit, and a fault classifier connected in sequence; Model training module: used to perform supervised training of the fault diagnosis model by minimizing the cross-entropy loss function using the training set data to obtain the trained fault diagnosis model, input the test set data into the trained fault diagnosis model, and output the fault category; Model application module: used to deploy the fault diagnosis model, collect vibration acceleration signals and acoustic emission signals of key components of the train running gear, and, after truncating and standardizing the signals, input them into the fault diagnosis model for fault category classification.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the program, it implements the method as described in any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the program is executed by the processor, it implements the method as described in any one of claims 1 to 7.