Railcar safety state real-time monitoring and early warning system based on deep learning

By using multimodal data acquisition and deep learning methods, the railcar operation data is converted into three-channel color images. Combined with a dual-stream feature enhancement network, this solves the problems of insufficient real-time performance and feature extraction capabilities in traditional railcar safety monitoring systems. It enables high-precision early fault identification and automated early warning, thereby improving the safety and operational efficiency of rail transit.

CN120697816B · Active · Publication Date: 2026-06-26 · JIANGSU FLYING SHUTTLE INTELLIGENT CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
JIANGSU FLYING SHUTTLE INTELLIGENT CO LTD
Filing Date
2025-07-28
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Traditional railcar safety monitoring systems suffer from poor real-time performance and weak feature extraction capabilities, making it difficult to meet the requirements for early fault detection and response speed in high-speed and intelligent rail transit. In particular, they struggle to extract key features from high-dimensional, nonlinear, and time-dependent railcar operation data.

Method used

By employing multimodal data acquisition, cross-modal feature transformation, and deep learning methods, time-series data is encoded into three-channel color images through Markov transition fields, recurrence plots, and Gramian angular fields. Combined with a dual-stream feature enhancement network, feature learning and few-shot classification are performed to achieve high-precision safety status identification and early warning.

Benefits of technology

It significantly improves the accuracy of fault identification and the timeliness of early warning response, provides intelligent data-driven rail transit safety assurance, and has strong generalization ability and robustness under small sample conditions.

✦ Generated by Eureka AI based on patent content.


Abstract

The application relates to the field of intelligent driving of rail transit, and discloses a real-time monitoring and early warning system for the safety state of a rail car based on deep learning, which comprises the following modules: a multi-modal data acquisition module, which is used for deploying sensors on key components of the rail car and collecting time series data including vibration, temperature, pressure and operation parameters; a cross-modal feature conversion module, which is used for encoding the collected time series data into a three-channel color image; a safety state evaluation module, which is used for deep modeling and feature learning on the three-channel color image, real-time prediction of a current state classification result of the rail car, and obtaining of a safety state evaluation result of the rail car; and an early warning and intelligent decision module, which is used for performing abnormality detection and alarm prompting on the rail car when potential risks exist according to the safety state evaluation result. The application realizes high-precision early fault identification and automatic early warning.

Description

Technical Field

[0001] This invention relates to the field of intelligent driving technology for rail transit, and in particular to a real-time monitoring and early warning system for the safety status of rail vehicles based on deep learning.

Background Technology

[0002] As rail transit systems develop towards higher speeds and greater intelligence, the operational safety of rail vehicles has become a critical guarantee for urban rail and trunk transportation systems. Traditional rail vehicle safety monitoring systems typically rely on fixed-point monitoring or periodic manual inspections, which suffer from problems such as insufficient monitoring coverage, poor real-time performance, and delayed early warnings, making it difficult to meet the requirements for early fault detection and response speed under long-term continuous operation of rail vehicles.

[0003] In recent years, the rapid development of sensor technology, multimodal data acquisition, and edge computing devices has enabled rail vehicles to collect various types of data, such as vibration, temperature, pressure, and speed, in real time during operation. However, effectively integrating these heterogeneous time-series data, mining potential operational anomalies, and utilizing deep learning models to achieve high-precision safety status identification and early warning remains a significant challenge in the field of intelligent monitoring. In particular, traditional data processing methods struggle to extract key features from rail vehicle operation data due to its high dimensionality, nonlinearity, and time-series dependence, resulting in low fault diagnosis accuracy. Furthermore, existing technologies generally employ only a single-channel encoding of the time-series data, such as a Gramian angular field transform, which fails to capture state transitions and nonlinear features, and they rely on fully supervised training, making them ill-suited for small-sample fault scenarios.

[0004] Therefore, there is an urgent need for a data modeling and intelligent recognition method with stronger feature expression and discrimination capabilities to improve the safety assurance capabilities of rail vehicles throughout their entire life cycle.

Summary of the Invention

[0005] This invention aims to provide a real-time monitoring and early warning system for the safety status of rail vehicles based on deep learning, which solves the problems of poor real-time performance and weak feature extraction capabilities of traditional monitoring methods, and achieves high-precision early fault identification and automated early warning.

[0006] To achieve the above objectives, the following technical solution is adopted:

[0007] A real-time monitoring and early warning system for the safety status of rail vehicles based on deep learning, comprising:

[0008] The multimodal data acquisition module is used to deploy sensors on key components of the railcar and collect time-series data including vibration, temperature, pressure, and operating parameters.

[0009] The cross-modal feature conversion module is used to encode the collected time-series data into a three-channel color image, including a Markov transition field channel constructed based on state transition probability, a recurrence plot channel constructed based on state similarity, and a Gramian angular field channel constructed based on angle relationship.

[0010] The safety status assessment module is used to perform deep modeling and feature learning on the three-channel color images, predict the current status classification result of the railcar in real time, and obtain the safety status assessment result of the railcar.

[0011] The early warning and intelligent decision-making module is used to detect anomalies and issue alarms when there are potential risks to the railcar, based on the safety status assessment results.

[0012] Furthermore, the multimodal data acquisition module specifically includes:

[0013] Vibration sensors, temperature sensors, pressure sensors, and speed and acceleration sensors are deployed on wheels, axles, motors, braking systems, and vehicle body structures, and the data collected forms a multimodal raw data stream.

[0014] The multimodal raw data stream is preprocessed by the edge device to form a data stream with a unified format.

[0015] Furthermore, the cross-modal feature conversion module performs the following processing:

[0016] The preprocessed time series data is discretized into a state sequence, and a Markov transition field is constructed to capture state transition information as the first channel.

[0017] A recurrence plot is constructed by setting a similarity threshold, representing the nonlinear dynamic characteristics of the time series as the second channel;

[0018] Normalized data is mapped to polar coordinate angles, and a Gramian angular field is constructed to represent local temporal correlation as the third channel;

[0019] The three channels are fused into a 224×224×3 three-channel color image.

[0020] Furthermore, the safety status assessment module includes:

[0021] The feature extraction unit extracts the temporal image features of the three-channel color images through a shared feature extractor, performs dimensionality compression, and outputs a high-dimensional feature vector.

[0022] The dual-stream processing unit includes a main classification branch and an auxiliary contrastive learning branch. For the high-dimensional feature vectors output by the shared feature extractor, the main classification branch receives the feature vectors of same-class samples in the support set and calculates their arithmetic mean to generate a prototype vector for each category; the auxiliary contrastive learning branch optimizes the feature space by pulling together the features of same-class samples and pushing apart the features of different-class samples, generating an optimized feature space structure.

[0023] The meta-learning training unit is used to model the railcar safety assessment problem as an N-way K-shot few-shot classification problem. It constructs few-shot classification tasks and trains the model end to end, optimizing the network parameters through a weighted fusion of the classification loss and the contrastive loss, and outputs the parameters of the shared feature extractor at training convergence.

[0024] Furthermore, the safety status assessment module also includes:

[0025] A real-time classification decision unit, used for rapid prediction using only the primary classification branch in new tasks, specifically includes:

[0026] The trained and optimized shared feature extractor is used to map real-time sensor data into the feature space;

[0027] Real-time query sample images are converted into high-dimensional feature vectors, and their Euclidean distances to the prototype vectors of each category are computed;

[0028] The prototype category with the smallest distance is selected as the current state classification result, including the safety state label and the class probability distribution vector.

[0029] When an abnormal state is detected, an early warning message is automatically generated, which includes the type of abnormality, the location of the abnormality, and suggestions for handling the situation.
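The distance comparison performed by the real-time classification decision unit can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify_by_prototype` is hypothetical, and turning distances into a class probability vector with a softmax over negative distances is an assumption, since the text does not state how the probabilities are obtained.

```python
import numpy as np

def classify_by_prototype(query, prototypes):
    """Return (label, probabilities) for one query feature vector.

    query:      array of shape (D,)
    prototypes: array of shape (N, D), one prototype per category
    """
    query = np.asarray(query, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    # Euclidean distance from the query to every category prototype
    dist = np.linalg.norm(prototypes - query, axis=1)
    # Assumption: softmax over negative distances gives the class probabilities
    logits = -dist
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # The category with the smallest distance is the predicted state
    return int(np.argmin(dist)), probs
```

Because the softmax is monotone in the negative distance, the most probable category is always the nearest prototype, so the label and the probability vector stay consistent.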

[0030] Furthermore, the shared feature extractor includes:

[0031] A four-layer cascaded convolutional module is used to map the input three-channel color image into a 512-dimensional temporal image feature vector; wherein, each convolutional module in the four-layer cascaded convolutional module contains a convolutional layer, a batch normalization layer and a ReLU activation function, used to extract time-frequency domain features at different scales.

[0032] Furthermore, the four-layer cascaded convolutional module specifically includes:

[0033] The first convolutional module performs a 3×3 convolution operation to generate a 64-channel feature map. After ReLU activation and 2×2 max pooling, it outputs a 112×112×64-dimensional feature map, which is used to extract the basic edge features of the track vehicle's state transition.

[0034] The second convolution module performs group convolution operations to generate 128-channel feature maps, which are then batch normalized to output 56×56×128-dimensional features to enhance the ability to capture nonlinear dynamic behavior features.

[0035] The third convolution module performs 3×3 dilated convolution with a dilation rate of 2 to generate a 256-channel feature map, which is then output as a 28×28×256-dimensional feature map through spatial pyramid pooling, and is used to fuse multi-scale temporal structure features.

[0036] The fourth convolutional module performs global average pooling for dimensionality reduction and maps it to a 512-dimensional temporal image feature vector through a fully connected layer.

[0037] Furthermore, the main classification branch adopts a few-shot meta-learning mechanism to model the safety status of the railcar as an N-way K-shot support set task;

[0038] The operations performed by the auxiliary contrastive learning branch include:

[0039] Label the training batch samples with similar / dissimilar relationship tags;

[0040] Construct positive and negative sample pairs and apply spatial constraints: apply a feature distance reduction constraint to sample pairs of the same class; apply a feature distance increase constraint to sample pairs of different classes, and set a minimum interval threshold. When the distance between samples of different classes is less than the minimum interval threshold, a separation penalty is triggered.

[0041] Output the optimized feature space structure.
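The pull and push constraints described for the auxiliary branch can be illustrated with a margin-based pairwise contrastive loss. The text does not give the loss formula, so this form is a stand-in chosen for illustration; the function name `pairwise_contrastive_loss` and the default `margin` value are assumptions.

```python
import numpy as np

def pairwise_contrastive_loss(features, labels, margin=1.0):
    """Mean loss over all unordered sample pairs in a batch.

    Same-class pairs are penalized by their squared distance (pull together).
    Different-class pairs are penalized only when closer than `margin`
    (separation penalty below the minimum interval threshold).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (B, B)
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    # Keep each unordered pair once (upper triangle, no diagonal)
    iu = np.triu_indices(len(labels), k=1)
    d, s = dist[iu], same[iu]
    pull = d ** 2
    push = np.maximum(0.0, margin - d) ** 2
    return float(np.where(s, pull, push).mean())
```

Different-class pairs that are already farther apart than the margin contribute zero, so the penalty concentrates on pairs that violate the minimum interval.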

[0042] Furthermore, the operations performed by the meta-learning training unit include:

[0043] Few-shot tasks of N classes with K samples per class are sampled from historical data, each task containing a support set and a query set;

[0044] The training process jointly optimizes the cross-entropy loss of the main classification branch and the supervised contrastive loss of the auxiliary branch, and forms the final loss function through weighted fusion;

[0045] The parameters of the shared feature extractor are iteratively updated to enable it to simultaneously possess classification and discrimination capabilities as well as feature space structure optimization capabilities.
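A minimal sketch of episode sampling and loss fusion, under stated assumptions: historical data is held as a dictionary mapping each class label to a list of samples, the query set size `n_query` per class is a free parameter, and the fusion is a simple weighted sum with a hypothetical weight. None of these specifics are given in the text.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, n_query, rng=None):
    """Draw one N-way K-shot task: a support set and a disjoint query set."""
    rng = rng or random.Random()
    # Pick N classes, then K support and n_query query samples per class
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        items = rng.sample(data_by_class[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query

def fused_loss(ce_loss, contrastive_loss, weight=0.5):
    """Weighted fusion of the two branch losses (weight is an assumption)."""
    return ce_loss + weight * contrastive_loss
```

Sampling support and query items in a single draw per class guarantees that no sample appears in both sets of the same task.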

[0046] Furthermore, the early warning and intelligent decision-making module includes:

[0047] The graded early warning unit is used to match the preset risk level according to the safety status assessment results, generate early warning information including the anomaly type, the location of occurrence, and the predicted consequences; and trigger graphical interface warnings and voice broadcasts through the human-computer interaction interface.

[0048] The response strategy generation unit is used to automatically generate handling suggestions based on the abnormal risk level and operational priority, including at least one of deceleration operation, station parking and maintenance, and emergency braking; and to send linkage control commands to the vehicle control system.

[0049] The closed-loop optimization unit is used to record early warning information and response execution results to the system log; and to update the expert rule base and model training dataset based on the log data.

[0050] Compared with the prior art, the present invention achieves the following beneficial effects:

[0051] 1. This invention proposes a time-series imaging method based on the Markov transition field (MTF), recurrence plot (RP), and Gramian angular field (GAF) to encode raw time-series data from railcar sensors into three-channel color images. This method can effectively capture the dynamic amplitude changes, nonlinear evolution behavior, and local temporal structure during railcar operation, thereby enhancing feature representation capabilities and providing a unified high-dimensional input for deep neural networks.

[0052] 2. This invention constructs a dual-stream feature enhancement network structure, including a main classification branch and an auxiliary contrastive learning branch, and shares a multi-scale feature extractor. This structure can achieve efficient few-shot classification through a prototype network, and can also make full use of the auxiliary branch to structurally optimize the feature embedding space, thereby improving the model's discriminative performance for complex multi-class operating states.

[0053] 3. This invention proposes an auxiliary supervised contrastive learning branch, which enhances the model's ability to learn semantic boundaries between categories by constructing positive and negative sample pairs and introducing supervisory signals. This method can effectively shorten the feature distance between samples in the same operating state and widen the embedding distance between samples in different states, thereby improving the model's accuracy in recognizing minor anomalies and critical states.

[0054] 4. This invention designs a supervised contrastive loss function and a final weighted loss function structure. By jointly optimizing classification accuracy and embedding space structure during model training, it guides the feature extractor to learn more discriminative embedding representations. The weighted loss integrates cross-entropy loss and contrastive loss, enabling the model to maintain strong generalization ability and robustness even with limited sample conditions.

[0055] In summary, by combining time-series imaging, a prototype learning structure, and a supervised contrastive learning mechanism, this invention constructs an efficient and scalable railcar safety status monitoring system, which significantly improves the accuracy of fault identification and the timeliness of early warning response, providing intelligent and data-driven technical support for the safe operation of rail transit.

[0056] It should be understood that the description in the Summary of the Invention is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to restrict the scope of the invention. Other features of the invention will become readily apparent from the following description.

Description of the Drawings

[0057] The above and other features, advantages, and aspects of the various embodiments of the present invention will become more apparent from the accompanying drawings and the following detailed description. The drawings are provided for a better understanding of the invention and are not intended to limit the invention. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:

[0058] Figure 1 is a schematic diagram of the modules of a real-time monitoring and early warning system for the safety status of a railcar based on deep learning, according to an embodiment of the present invention.

[0059] Figure 2 is a schematic diagram of the overall architecture of a deep learning-based real-time monitoring and early warning system for the safety status of rail vehicles according to an embodiment of the present invention.

[0060] Figure 3 is a schematic diagram of the dual-stream network structure of a real-time monitoring and early warning system for the safety status of a railcar based on deep learning, according to an embodiment of the present invention.

Detailed Implementation

[0061] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0062] Furthermore, the term "and / or" in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this article generally indicates that the preceding and following related objects have an "or" relationship.

[0063] Figure 1 is a schematic diagram of the modules of a real-time monitoring and early warning system for the safety status of a railcar based on deep learning, according to an embodiment of the present invention. Figure 2 is a schematic diagram of the overall architecture of the system according to an embodiment of the present invention. As shown in Figures 1 and 2, a real-time monitoring and early warning system 100 for the safety status of rail vehicles based on deep learning includes:

[0064] The multimodal data acquisition module 110 is used to deploy sensors on key components of the railcar and collect time-series data including vibration, temperature, pressure, and operating parameters.

[0065] Furthermore, the multimodal data acquisition module 110 specifically includes:

[0066] Multiple types of high-precision sensors are deployed in key components such as wheels, axles, motors, braking systems, and vehicle body structures to construct a comprehensive, multimodal sensing system. Specifically, this includes:

[0067] Vibration sensors: installed on the wheels, axles and chassis structure to monitor in real time whether there are abnormal vibrations, bearing wear or track unevenness in the structure;

[0068] Temperature sensors: deployed in motors, cables, brake pads, etc., to monitor heating trends and provide early warning of possible overload or short circuit faults;

[0069] Pressure sensors: used to monitor the working status of hydraulic braking and suspension systems and determine whether there is leakage or abnormal pressure;

[0070] Speed and acceleration sensors: used to collect operating parameter data of the railcar, analyze the motion state of the railcar during operation, and identify dangerous behaviors such as sudden acceleration / deceleration and slippage;

[0071] The time-series signals collected by the aforementioned sensors form a multimodal raw data stream. This data will be preprocessed by edge devices (such as noise reduction, normalization, and time synchronization), standardized in format, and uploaded to the backend system to provide high-quality data input for subsequent deep learning models.

[0072] The cross-modal feature conversion module 120 is used to encode the acquired time-series data into a three-channel color image, including a Markov transition field channel constructed based on state transition probability, a recurrence plot channel constructed based on state similarity, and a Gramian angular field channel constructed based on angle relationship.

[0073] To achieve efficient identification and early warning of the safety status of rail vehicles, it is necessary to perform cross-modal feature transformation on the large amount of time-series data collected by sensors (such as wheel and axle vibration, motor temperature, braking pressure, and running speed), and encode it into image form to adapt to the input structure of deep learning networks. To this end, a time-series imaging method based on Markov transition fields (MTF), recurrence plots (RP), and Gramian angular fields (GAF) is proposed. The specific execution process of the cross-modal feature transformation module 120 is as follows:

[0074] (1) Constructing the Markov transition field (MTF)

[0075] The preprocessed time series data is discretized into a state sequence. For example, equal-width binning is used to discretize the preprocessed data into 10 states, and a Markov transition field is constructed to capture state transition information as the first channel.

[0076] MTF is used to capture the dynamic changes in the operating status of key components of a railcar (such as vibration amplitude, temperature fluctuations, etc.), encoding the state transition probabilities of a one-dimensional time series into the R channel of the image.

[0077] $$M_{ij} = P(s_i \to s_j)$$

[0078] Here, $M_{ij}$ is the element in the i-th row and j-th column of the Markov transition field matrix and represents the conditional probability that the railcar transitions from discrete state $s_i$ to discrete state $s_j$; $P(s_i \to s_j)$ is the statistical probability of a transition from $s_i$ to $s_j$, reflecting the trend of state change; and $s_i$ is the state value obtained by normalizing and discretizing the original sensor data.

[0079] This method enhances the "state transition" information during the operation of the railcar, and is particularly effective in detecting abnormal states (such as sudden failures).
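The MTF construction can be sketched in a few lines of NumPy. This is an illustrative sketch under assumptions: transition probabilities are estimated from first-order counts between consecutive samples, and the function name `markov_transition_field` and its `n_bins` parameter are hypothetical. Only the equal-width binning into 10 states comes from the text.

```python
import numpy as np

def markov_transition_field(x, n_bins=10):
    """Map a 1-D series of length T to a T x T Markov transition field."""
    x = np.asarray(x, dtype=float)
    # Equal-width binning: each sample gets a state index in 0..n_bins-1
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    states = np.digitize(x, edges[1:-1])
    # Count transitions between consecutive states
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (states[:-1], states[1:]), 1)
    # Row-normalize counts into transition probabilities (empty rows stay 0)
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Entry (i, j) is the probability of moving from the state at time i
    # to the state at time j
    return probs[states[:, None], states[None, :]]
```

The result has one row and column per time point rather than per state, which is what lets it be treated as an image channel.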

[0080] (2) Constructing the recurrence plot (RP)

[0081] A recurrence plot is constructed by setting a similarity threshold, representing the nonlinear dynamic characteristics of the time series as the second channel.

[0082] RP is used to reveal the nonlinear dynamic characteristics of railcar operation data and forms the G channel of the image. It represents the repeatability of system states and trajectory similarity along the time axis.

[0083] $$R_{ij} = \Theta\left(\epsilon - \lVert x_i - x_j \rVert\right)$$

[0084] Here, $R_{ij}$ is the value in the i-th row and j-th column of the recurrence plot and indicates whether the states at time points i and j are sufficiently similar. $\Theta(\cdot)$ is a step function that outputs 1 if the expression within the parentheses is positive, and 0 otherwise. The similarity threshold $\epsilon$ adjusts the sensitivity to changes in the condition of the railcar; it can be set to 1.5 times the standard deviation of the time series (i.e., $\epsilon = 1.5\sigma$) to adapt to the data fluctuation range under different operating conditions of the railcar. $\lVert x_i - x_j \rVert$ is the Euclidean distance between the states at time points i and j. For multimodal sensor data, this can be expanded to:

[0085] $$\lVert x_i - x_j \rVert = \sqrt{\sum_{k} \left(x_i^{(k)} - x_j^{(k)}\right)^2}$$

[0086] Here, k indexes the dimensions of the time series; $x_i^{(k)}$ is the reading at time point i in the k-th sensor channel (e.g., the k-th vibration axis or temperature sensor), and $x_j^{(k)}$ is the reading at time point j in the same channel.

[0087] This diagram helps to uncover the differences in behavior patterns of the railcar system under normal and abnormal operating conditions.
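A thresholded recurrence plot with the 1.5σ rule can be sketched as below. The function name `recurrence_plot` and the `eps_scale` parameter are hypothetical, and computing σ over all channels at once for multichannel input is an assumption; the text only specifies the threshold for a single time series.

```python
import numpy as np

def recurrence_plot(x, eps_scale=1.5):
    """Binary T x T recurrence plot for a series of shape (T,) or (T, K)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a single channel as K = 1
    # Euclidean distance between the states at every pair of time points
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Threshold: eps_scale times the standard deviation of the data
    eps = eps_scale * x.std()
    # Step function: 1 where (eps - distance) is positive, 0 otherwise
    return (eps - dist > 0).astype(float)
```

For the series `[0, 1, 0, 5]` the threshold is about 3.09, so time points 0 and 1 are marked as recurrent while the outlier at index 3 recurs only with itself.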

[0088] (3) Constructing the Gramian angular field (GAF)

[0089] Normalized data is mapped to polar coordinate angles, and a Gramian angular field is constructed to represent local temporal correlation as the third channel.

[0090] GAF expresses the correlation structure of a railcar time-series signal within a local time range and encodes the angular information between time points into the B channel of the image:

[0091] $$G_{ij} = \cos(\phi_i + \phi_j)$$

[0092] Here, $G_{ij}$ is the value at position (i,j) in the Gramian angular field matrix and represents the angular relationship between time points i and j. The angular representation $\phi_i$ of the time series data is defined as follows:

[0093] $$\phi_i = \arccos(\tilde{x}_i), \qquad \tilde{x}_i = \frac{x_i - \min(X)}{\max(X) - \min(X)}$$

[0094] Here, $\tilde{x}_i$ is the normalized value of the railcar sensor data at the i-th time point. X is the complete time series being normalized (such as a velocity fluctuation sequence or temperature change sequence). min(X) and max(X) are the minimum and maximum values in time series X, used to normalize the data so that all data points are mapped to the same scale range. arccos(·) is the inverse cosine function, which maps the normalized data to an angle within the range [0,π] in polar coordinates.

[0095] This method can effectively capture periodic or trend changes in the operating state, which helps to improve the model's ability to perceive subtle fault characteristics.
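The GAF encoding can be sketched as follows. Two choices here are assumptions rather than details confirmed by the text: min-max normalization to [0, 1], and the summation form cos(φ_i + φ_j) rather than a difference form. The function name `gramian_angular_field` is hypothetical.

```python
import numpy as np

def gramian_angular_field(x):
    """Map a 1-D series of length T to a T x T Gramian angular field."""
    x = np.asarray(x, dtype=float)
    # Min-max normalization using min(X) and max(X); constant series map to 0
    span = x.max() - x.min()
    x_norm = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    # Polar encoding: each normalized value becomes an angle via arccos
    phi = np.arccos(np.clip(x_norm, 0.0, 1.0))
    # Summation form: entry (i, j) is cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])
```

With this normalization the angles fall in [0, π/2], so the field values span [-1, 1] and the matrix is symmetric.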

[0096] (4) Constructing a three-channel fused image

[0097] Finally, the MTF, RP, and GAF images are fused as three channels into a single 224×224×3 three-channel color image, forming a unified data input format:

[0098] $$I = [\mathrm{MTF},\ \mathrm{RP},\ \mathrm{GAF}] \in \mathbb{R}^{224 \times 224 \times 3}$$

[0099] The fused color image uses its three channels to encode, respectively, the dynamic amplitude variations, nonlinear behavior patterns, and local temporal structure of the time series. The R channel uses the Markov transition field to represent the dynamic amplitude changes of the time series. The G channel uses the recurrence plot to represent the nonlinear characteristics of the time series. The B channel uses the Gramian angular field to represent the local temporal relationship of the time series.

[0100] This invention achieves complementarity through the aforementioned three-channel design: MTF is highly sensitive to abrupt faults (such as bearing fracture), RP can identify nonlinear oscillations (such as abnormal wheel-rail friction), and GAF can capture periodic degradation (such as motor wear). The fusion of these three channels covers all fault modes of the railcar. In contrast, existing technologies only use a single Gramian angular field transformation of the time series data, which cannot capture state transitions and nonlinear features, and rely on fully supervised training, making it difficult to adapt to small-sample fault scenarios.

[0101] Through the above visualization process, the original multi-dimensional time-series data of the track vehicle is uniformly transformed into image data, which not only preserves the key features of the signal, but also enhances the spatial structure expression capability of the data, providing strong input support for subsequent safety status identification and fault early warning.
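The fusion step can be sketched as stacking three square matrices into one image. The text does not say how a T×T matrix is brought to 224×224 or how channel ranges are aligned, so nearest-neighbor index sampling and per-channel rescaling to [0, 1] are assumptions, as are the names `resize_nearest` and `fuse_channels`.

```python
import numpy as np

def resize_nearest(img, size=224):
    """Resize a square 2-D array to size x size by nearest-neighbor sampling."""
    idx = np.arange(size) * img.shape[0] // size
    return img[np.ix_(idx, idx)]

def fuse_channels(mtf, rp, gaf, size=224):
    """Stack MTF, RP and GAF as the R, G and B channels of one image."""
    channels = []
    for ch in (mtf, rp, gaf):
        ch = resize_nearest(np.asarray(ch, dtype=float), size)
        # Assumption: rescale each channel to [0, 1] so ranges are comparable
        lo, hi = ch.min(), ch.max()
        ch = (ch - lo) / (hi - lo) if hi > lo else np.zeros_like(ch)
        channels.append(ch)
    return np.stack(channels, axis=-1)  # shape (size, size, 3)
```

Rescaling matters mainly for the GAF channel, whose raw values lie in [-1, 1] while the MTF and RP channels are already in [0, 1].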

[0102] The safety status assessment module 130 is used to perform deep modeling and feature learning on the three-channel color image, predict the current status classification result of the railcar in real time, and obtain the safety status assessment result of the railcar.

[0103] The core objective of the safety status assessment module 130 is to perform deep modeling and feature learning on the multi-source data collected during railcar operation, after it has been converted into three-channel color images, so as to assess the safety status of the railcar in real time and to perform accurate anomaly detection and alarm prompting when potential risks are discovered.

[0104] To achieve accurate safety status determination, this invention designs a dual-stream feature enhancement network, including a main classification branch and an auxiliary contrastive learning branch, both of which share a multi-scale feature extractor. Figure 3 is a diagram of the dual-stream network structure according to an embodiment of the present invention.

[0105] Furthermore, the safety status assessment module 130 includes:

[0106] The feature extraction unit 131 extracts the temporal image features of the three-channel color image through a shared feature extractor, performs dimensionality compression, and outputs a high-dimensional feature vector.

[0107] A shared feature extractor 1311 is used to embed image-based temporal data input into an M-dimensional feature space, specifically mapping a 224×224×3 three-channel color image into a 512-dimensional temporal image feature vector. Its structure includes four cascaded convolutional modules, each containing a convolutional layer, a batch normalization layer, and a ReLU activation function to extract time-frequency domain features at different scales. The four cascaded convolutional modules specifically include:

[0108] The first convolutional module performs a 3×3 convolution operation to generate a 64-channel feature map. After ReLU activation and 2×2 max pooling, it outputs a 112×112×64-dimensional feature map, which is used to extract the basic edge features of the track vehicle's state transition.

[0109] The second convolution module performs group convolution operations to generate 128-channel feature maps, which are then batch normalized to output 56×56×128-dimensional features to enhance the ability to capture nonlinear dynamic behavior features.

[0110] The third convolution module performs 3×3 dilated convolution with a dilation rate of 2 (i.e., the kernel element spacing is 2) to generate a 256-channel feature map, which is then output as a 28×28×256-dimensional feature map through spatial pyramid pooling, and is used to fuse multi-scale temporal structure features.

[0111] The fourth convolutional module performs global average pooling for dimensionality reduction and maps it to a 512-dimensional temporal image feature vector through a fully connected layer.
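
The spatial sizes stated for the four modules can be checked with the standard convolution output-size formula. The following Python sketch is only a consistency check: the strides and paddings are assumptions chosen so that the stated sizes 112, 56, and 28 are reproduced, because the description does not specify them, and `conv_out` and `extractor_shapes` are hypothetical helper names. Spatial pyramid pooling in the third module is not modeled.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution or pooling window along one axis."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def extractor_shapes(size=224):
    """Trace (height, width, channels) through the four modules.

    Strides and paddings are assumed values, not taken from the description.
    """
    shapes = [(size, size, 3)]
    # Module 1: 3x3 convolution (stride 1, padding 1), then 2x2 max pooling.
    s = conv_out(size, kernel=3, padding=1)
    s = conv_out(s, kernel=2, stride=2)
    shapes.append((s, s, 64))
    # Module 2: group convolution, assumed 3x3 with stride 2 and padding 1.
    s = conv_out(s, kernel=3, stride=2, padding=1)
    shapes.append((s, s, 128))
    # Module 3: 3x3 convolution with dilation 2, assumed stride 2 and padding 2.
    s = conv_out(s, kernel=3, stride=2, padding=2, dilation=2)
    shapes.append((s, s, 256))
    # Module 4: global average pooling plus a fully connected layer gives 512 values.
    shapes.append((512,))
    return shapes


for shape in extractor_shapes():
    print(shape)
```

Under these assumed settings the traced sizes match the 112×112×64, 56×56×128, and 28×28×256 feature maps named in the description; other stride and padding combinations could produce the same sizes.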

[0112] The dual-stream processing unit 132, as shown in Figure 3, includes a main classification branch and an auxiliary contrastive learning branch. For the high-dimensional feature vectors output by the shared feature extractor 1311, the main classification branch receives the feature vectors of same-class samples in the support set and calculates their arithmetic mean to generate a prototype vector for each category. The auxiliary contrastive learning branch optimizes the feature space by aggregating the features of same-class samples and separating the features of different-class samples, and generates an optimized feature space structure.

[0113] The main classification branch is implemented based on a prototype network and employs a few-shot learning mechanism. It models the safety status of the railcar as an N-class, K-sample support set task. Let the sample set of a certain class n in the support set be $S_n$. Its prototype vector is defined as:

[0114] $$c_n = \frac{1}{|S_n|} \sum_{(x_u, y_u) \in S_n} f_\theta(x_u)$$

[0115] where $c_n$ is the prototype embedding vector of category n, representing the central feature of the category; $S_n$ is the support set of class n, containing all training samples belonging to class n, mathematically represented as $S_n = \{(x_u, y_u) \mid y_u = n\}$; $|S_n|$ is the number of samples of category n in the support set (for example, if category "normal" has 3 samples, then $|S_{\text{normal}}| = 3$, and if category "minor anomaly" has 2 samples, then $|S_{\text{minor anomaly}}| = 2$); $x_u$ is the u-th sample in the support set; $y_u$ is the category label of sample $x_u$; and $f_\theta(\cdot)$ is the feature mapping applied to input samples by the shared feature extractor.

[0116] Assume the railcar has three safety states: normal (category 0), minor anomaly (category 1), and severe anomaly (category 2), and that feature vectors are two-dimensional for illustration. The support set sample distribution is as follows: for category normal (n=0), the sample size is $|S_0| = 3$, and the sample feature vectors are [0.2, -0.1], [0.3, 0.0], [0.1, 0.1]; for category minor anomaly (n=1), the sample size is $|S_1| = 2$, and the sample feature vectors are [1.8, 0.5], [2.0, 0.3]; for category severe anomaly (n=2), the sample size is $|S_2| = 2$, and the sample feature vectors are [3.2, -1.0], [3.5, -0.8]. The prototype vectors are calculated as follows:

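[0117] Normal class prototype (n=0):

[0118] $$c_0 = \frac{1}{3}\left([0.2, -0.1] + [0.3, 0.0] + [0.1, 0.1]\right) = [0.2, 0.0]$$

[0119] Minor anomaly class prototype (n=1):

[0120] $$c_1 = \frac{1}{2}\left([1.8, 0.5] + [2.0, 0.3]\right) = [1.9, 0.4]$$

[0121] Severe anomaly class prototype (n=2):

[0122] $$c_2 = \frac{1}{2}\left([3.2, -1.0] + [3.5, -0.8]\right) = [3.35, -0.9]$$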
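
The prototype vectors for this example can be computed with a few lines of Python. The sketch below takes the support-set feature vectors from paragraph [0116] as given and averages them per class; the helper name `prototype` and the dictionary layout are illustrative assumptions, not part of the disclosed system.

```python
def prototype(vectors):
    """Arithmetic mean of equally sized feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Support-set features from the worked example, keyed by class index.
support = {
    0: [[0.2, -0.1], [0.3, 0.0], [0.1, 0.1]],   # normal
    1: [[1.8, 0.5], [2.0, 0.3]],                # minor anomaly
    2: [[3.2, -1.0], [3.5, -0.8]],              # severe anomaly
}

prototypes = {n: prototype(vectors) for n, vectors in support.items()}
for n, c in prototypes.items():
    print(n, [round(v, 2) for v in c])
```

In a real deployment the inputs would be the 512-dimensional outputs of the shared feature extractor rather than two-dimensional toy vectors.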

[0123] The auxiliary supervised contrastive learning branch enhances the model's discriminative ability by constructing positive and negative sample pairs, pulling the feature representations of same-class samples closer together and pushing those of different-class samples farther apart. The operations performed by the auxiliary contrastive branch include: labeling the training batch samples with same-class / different-class relationship labels; constructing positive and negative sample pairs and applying spatial constraints, namely a feature distance reduction constraint on same-class sample pairs and a feature distance increase constraint on different-class sample pairs, with a minimum margin threshold such that a separation penalty is triggered when the distance between different-class samples is less than this threshold; and outputting the optimized feature space structure.

[0124] Specifically, the supervised contrastive loss of the auxiliary supervised contrastive learning branch is represented as follows:

[0125] $$L_{con} = \sum_{(x_i, x_j)} \left[ y_{ij}\, d_{ij}^{2} + (1 - y_{ij}) \max\left(0,\; m - d_{ij}\right)^{2} \right]$$

[0126] where $L_{con}$ is the supervised contrastive loss function, used to measure the model's ability to aggregate same-class samples and separate different-class samples in the current training batch; $(x_i, x_j)$ is a training sample pair, where $x_i$ and $x_j$ are two samples taken from the training set; $y_{ij}$ is the label indicator value of the sample pair: if $x_i$ and $x_j$ belong to the same category (i.e., the same operating state of the railcar), then $y_{ij} = 1$; otherwise $y_{ij} = 0$; $f_\theta(\cdot)$ is the feature mapping applied to input samples by the shared feature extractor; $d_{ij} = \lVert f_\theta(x_i) - f_\theta(x_j) \rVert_2$ is the Euclidean distance between samples $x_i$ and $x_j$ in the feature space, reflecting the degree of similarity between their representations; and $m$ is the margin hyperparameter, also known as the minimum margin threshold (ranging from 0.5 to 1.0), which sets the minimum distance that samples of different categories should maintain in the feature space to prevent feature overlap. Optionally, m = 1.0. When the distance between samples of different classes is less than the margin $m$, a penalty term is generated, prompting the model to increase the inter-class distance. The term $y_{ij}\, d_{ij}^{2}$ encourages same-class samples to be as close as possible in the feature space; the term $(1 - y_{ij}) \max(0, m - d_{ij})^{2}$ penalizes cases where samples of different classes are too close to each other in the feature space.
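
The behaviour of the two loss terms can be illustrated with a minimal Python sketch. It assumes the common squared-distance form of the pairwise margin loss and averages over all unordered pairs in a batch; both choices are assumptions made for illustration, and `contrastive_loss` is a hypothetical helper name.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(features, labels, margin=1.0):
    """Average pairwise margin loss over all unordered pairs in a batch.

    Assumption: squared-distance form, averaged over the number of pairs.
    """
    total, pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            d = euclidean(features[i], features[j])
            if labels[i] == labels[j]:
                total += d ** 2                     # pull same-class pairs together
            else:
                total += max(0.0, margin - d) ** 2  # penalize close different-class pairs
            pairs += 1
    return total / pairs if pairs else 0.0


# Two normal samples and one minor-anomaly sample that sits too close to them.
batch = [[0.2, -0.1], [0.3, 0.0], [0.6, 0.1]]
print(contrastive_loss(batch, labels=[0, 0, 1], margin=1.0))
```

A different-class pair farther apart than the margin contributes nothing, so the penalty only acts on pairs that violate the minimum margin threshold.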

[0127] The meta-learning training unit 133 is used to model the railcar safety assessment problem as an N-way K-shot few-shot classification problem, construct few-shot classification tasks and train the model end to end, optimize the network parameters through a weighted fusion of the classification loss and the contrastive loss, and output the optimal parameters of the shared feature extractor once training converges.

[0128] During the meta-training phase:

[0129] The railcar safety assessment problem is modeled as an N-way K-shot small-sample classification problem. Historical fault data is split into thousands of N-class K-sample tasks, where N represents the number of railcar safety state categories (e.g., 5 categories: normal, mechanical fault, electrical fault, braking anomaly, track anomaly), and K represents the number of support samples for each category (3-5). By randomly combining different fault segments from historical data, thousands of training tasks are constructed to simulate real-world small-sample scenarios, enabling the model to quickly adapt with only a small number of new fault samples, overcoming the limitation of traditional models requiring massive amounts of labeled data. In the meta-training phase, support sets and query sets are sampled from different safety state categories (e.g., normal, minor anomaly, severe anomaly) to perform end-to-end training on the model.

[0130] The operations performed by the meta-learning training unit include: sampling few-shot tasks of N classes and K samples per class (e.g., 5 classes with 3 samples each) from historical data, with each task containing a support set and a query set; jointly optimizing the cross-entropy loss of the main classification branch and the supervised contrastive loss of the auxiliary branch during training, and forming the final loss function through weighted fusion; and iteratively updating the parameters of the shared feature extractor so that it acquires both classification capability and feature space structure optimization capability.
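
The task sampling step can be sketched in Python as follows. This is a minimal illustration, assuming historical samples are already grouped by class; the function name `sample_episode`, the `n_query` parameter (query samples per class), and the placeholder sample names are assumptions not stated in the description.

```python
import random


def sample_episode(data_by_class, n_way, k_shot, n_query, rng=random):
    """Sample one N-way K-shot task: returns (support, query) lists of (sample, label)."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Draw disjoint support and query samples for this class.
        picked = rng.sample(data_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query


# Placeholder history: 10 hypothetical samples for each of the 5 example classes.
history = {
    label: [f"{label}_{i}" for i in range(10)]
    for label in ["normal", "mechanical", "electrical", "braking", "track"]
}
support, query = sample_episode(history, n_way=5, k_shot=3, n_query=2,
                                rng=random.Random(0))
print(len(support), len(query))
```

Calling the function repeatedly with fresh random draws yields the many training tasks described for the meta-training phase.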

[0131] In the N-class K-sample task:

[0132] Specifically, the overall training objective is to minimize the following weighted loss function:

[0133] $$L_{total} = \lambda_1 L_{CE} + \lambda_2 L_{con}$$

[0134] where $L_{total}$ is the final loss function; $\lambda_1$ and $\lambda_2$ are weight coefficients that balance the importance of the classification branch and the contrastive learning branch; optionally, $\lambda_1$ is 0.6 and $\lambda_2$ is 0.4, a ratio determined by cross-validation to balance classification accuracy and feature space separability; $L_{con}$ is the supervised contrastive loss of the auxiliary contrastive learning branch; and $L_{CE}$ is the cross-entropy loss of the main classification branch, used to optimize the parameters $\theta$ of the shared feature extractor, expressed as:

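[0135] $$L_{CE} = -\sum_{(x_q, y_q) \in Q} \log p(y = y_q \mid x_q)$$

[0136] where $Q$ is the query set of the task and $p(y = n \mid x_q)$ is the probability that query sample $x_q$ belongs to category $n$. Through continuous iterative training, the optimized feature extractor parameters $\theta^{*}$ are obtained: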

[0137] $$\theta^{*} = \arg\min_{\theta}\; \mathbb{E}_{T \sim p(T)} \left[ L_{total}(T; \theta) \right]$$

[0138] where $\theta^{*}$ denotes the optimal parameters of the trained and optimized shared feature extractor, used to initialize the model during the meta-testing phase; $\theta$ denotes the current parameter variables of the shared feature extractor, which are continuously updated during training to optimize model performance; $T$ is a few-shot classification task constructed from the historical operation data of the railcar, including a support set and a query set, and represents one meta-training iteration; $p(T)$ is the sampling distribution of training tasks, from which tasks are drawn in each round of meta-training to ensure that the model generalizes well; $\mathbb{E}_{T \sim p(T)}[\cdot]$ is the expected value over all possible training tasks T, representing the average loss under the task distribution; and $L_{total}(T; \theta)$ is the loss function value calculated on the current task T by the model with parameters $\theta$, which takes into account both classification accuracy and the discriminative ability of the feature embedding.
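
The per-task loss can be sketched in Python as below. The sketch assumes that class probabilities are obtained from a softmax over negative Euclidean distances to the prototypes, as is usual for prototype networks, and that the cross-entropy is averaged over the query set; neither detail is stated in the description. The function names and the contrastive loss value `con=0.1` are hypothetical.

```python
import math


def log_probabilities(query, prototypes):
    """Log-softmax over negative Euclidean distances to each prototype (assumed form)."""
    dists = [math.dist(query, c) for c in prototypes]
    base = min(dists)  # shift by the smallest distance for numerical stability
    log_norm = math.log(sum(math.exp(base - d) for d in dists))
    return [base - d - log_norm for d in dists]


def cross_entropy(queries, labels, prototypes):
    """Mean negative log-probability of the true class over the query set."""
    losses = [-log_probabilities(q, prototypes)[y] for q, y in zip(queries, labels)]
    return sum(losses) / len(losses)


def total_loss(ce, con, lambda1=0.6, lambda2=0.4):
    """Weighted fusion of the classification and contrastive losses."""
    return lambda1 * ce + lambda2 * con


# Prototypes from the worked example in [0116]; query features are hypothetical.
prototypes = [[0.2, 0.0], [1.9, 0.4], [3.35, -0.9]]
queries, labels = [[0.25, 0.05], [3.1, -0.7]], [0, 2]
ce = cross_entropy(queries, labels, prototypes)
print(total_loss(ce, con=0.1))
```

Averaging this value over many sampled tasks approximates the expectation that the meta-training objective minimizes with respect to the extractor parameters.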

[0139] The real-time classification decision unit 134 is used for rapid prediction in new tasks using only the main classification branch. Specifically, it includes: mapping real-time collected sensor data into the feature space using the trained and optimized shared feature extractor; converting real-time query sample images into high-dimensional feature vectors and computing their Euclidean distances to the prototype vectors of each category; selecting the category of the prototype with the smallest distance as the current state classification result, which includes a safety status label and a class probability distribution vector; and automatically generating early warning information containing the abnormality type, the location of occurrence, and handling suggestions when an abnormal state is detected.

[0140] During the meta-testing phase:

[0141] The real-time classification decision unit 134 uses the optimal parameters obtained through meta-training. In the new task, only the main classification branch is used for fast prediction. For a given query sample $x_q$, the distance between it and the prototype vector of each class is calculated, and its class is predicted as:

[0142] $$\hat{y} = \arg\min_{n}\; d\left(f_{\theta^{*}}(x_q),\; c_n\right)$$

[0143] where $\hat{y}$ is the final prediction result, i.e., the safety status category into which query sample $x_q$ is classified (such as normal operation, minor anomaly, or severe anomaly); $x_q$ is the query sample to be classified, which comes from real-time collected railcar operation data (such as imaged sensor signals); $f_{\theta^{*}}(\cdot)$ is the feature embedding mapping performed by the shared feature extractor with the optimal parameters $\theta^{*}$ obtained by meta-training, which converts the input data into a high-dimensional feature vector; $c_n$ is the prototype vector of category n, representing the average feature representation of that railcar safety status level (normal, minor anomaly, severe anomaly), obtained by passing the support samples of that category through the optimized shared feature extractor; $d(\cdot, \cdot)$ is the similarity metric function, which uses Euclidean distance to measure the distance between the query sample and the prototype of each category in the embedding space; and $\arg\min_{n}$ indicates that, among all known categories n, the category whose prototype vector is closest to the query sample is selected as the final predicted category.
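
The nearest-prototype decision can be sketched in Python as follows. The prototypes are those of the worked example in [0116]; the query feature vector is hypothetical, and deriving the class probability vector from a softmax over negative distances is an assumption, since the description does not say how the probabilities are computed.

```python
import math

LABELS = ["normal", "minor anomaly", "severe anomaly"]


def classify(query_feature, prototypes):
    """Return (label index, probability vector) for one embedded query sample."""
    dists = [math.dist(query_feature, c) for c in prototypes]
    predicted = min(range(len(dists)), key=dists.__getitem__)
    # Assumed: probabilities come from a softmax over negative distances.
    weights = [math.exp(dists[predicted] - d) for d in dists]
    total = sum(weights)
    return predicted, [w / total for w in weights]


prototypes = [[0.2, 0.0], [1.9, 0.4], [3.35, -0.9]]
index, probs = classify([3.1, -0.7], prototypes)
print(LABELS[index], [round(p, 2) for p in probs])
```

Because only distances to a handful of stored prototypes are computed, the decision step stays cheap once the query image has been embedded.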

[0144] The safety status labels include: Normal, Minor Anomaly, and Severe Anomaly. The class probability distribution vector is a three-element probability array. An example of the class probability distribution vector format is [0.02, 0.15, 0.83], indicating an 83% probability of a severe anomaly. For a normal status level, the fault type is "None," and the recommended action is "Continue monitoring." For a minor anomaly status level, the fault type is, for example, wheel and axle imbalance, and the recommended action is "Reduce speed to 80 km/h and inspect at the next stop." For a severe anomaly status level, the fault type is, for example, brake pad overheating, and the recommended action is "Stop immediately and activate the cooling system."
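
The mapping from a class probability vector to a warning record can be sketched in Python as below. The fault types and recommended actions are the examples given in the paragraph above; binding one fixed fault type to each status level, the record field names, and the location string are simplifying assumptions for illustration.

```python
# Fault types and actions are the examples given in the description.
WARNING_TABLE = {
    "normal": ("None", "Continue monitoring."),
    "minor anomaly": ("wheel and axle imbalance",
                      "Reduce speed to 80 km/h and inspect at the next stop."),
    "severe anomaly": ("brake pad overheating",
                       "Stop immediately and activate the cooling system."),
}
LABELS = ["normal", "minor anomaly", "severe anomaly"]


def build_warning(probabilities, location="unknown"):
    """Pick the most probable status and return a warning record, or None if normal."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    label = LABELS[index]
    if label == "normal":
        return None
    fault_type, action = WARNING_TABLE[label]
    return {"status": label, "probability": probabilities[index],
            "fault_type": fault_type, "location": location, "action": action}


# The location string is a hypothetical placeholder.
print(build_warning([0.02, 0.15, 0.83], location="bogie 2 brake unit"))
```

In practice the fault type would come from additional diagnosis rather than a fixed table, so the lookup here stands in only for the format of the generated warning.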

[0145] By jointly optimizing classification and the feature space through the dual-stream network, this invention improves accuracy in few-shot scenarios, and its meta-learning framework enables rapid adaptation to new fault types from only a few samples, which is a significant advantage over traditional supervised learning that requires large amounts of labeled data.

[0146] The early warning and intelligent decision-making module 140 is used to detect anomalies and issue alarms when there are potential risks to the railcar, based on the safety status assessment results.

[0147] After assessing the current operating status of the railcar and detecting anomalies, the system enters the early warning and intelligent decision support phase. When the anomaly risk reaches a set threshold, the system automatically generates an early warning message and notifies the driver or control center via a graphical interface or voice broadcast. The warning includes the anomaly type, location, predicted consequences, and response suggestions. Based on the railcar's operational tasks and priorities, the system can automatically generate scheduling suggestions, such as: recommending deceleration and entering low-power mode; recommending stopping for maintenance at the nearest station; and linking with the control system to trigger the emergency braking mechanism. All early warnings and response suggestions are recorded in the system log, serving as the basis for subsequent model optimization and expert rule updates, forming a data-driven closed-loop intelligent early warning system. Through this module, the railcar safety monitoring system can not only detect current anomalies and assist in formulating reasonable countermeasures, but also effectively improve the safety and operational efficiency of the rail transit system.

[0148] Specifically, the early warning and intelligent decision-making module 140 includes:

[0149] The graded early warning unit 141 is used to match the preset risk level according to the safety status assessment results and generate early warning information including the type of abnormality, the location of occurrence and the predicted consequences; and to trigger graphical interface warnings and voice broadcasts through the human-computer interaction interface.

[0150] The response strategy generation unit 142 is used to automatically generate handling suggestions based on the abnormal risk level and operational priority, including at least one of deceleration operation, station parking and maintenance, and emergency braking; and to send linkage control commands to the vehicle control system.

[0151] The closed-loop optimization unit 143 is used to record early warning information and response execution results to the system log; and to update the expert rule base and model training dataset based on the log data.

[0152] Optionally, the closed-loop optimization unit 143 also expands the support set samples based on false positive / false negative samples in the logs through a semi-supervised algorithm and updates the prototype vector database once a month; the expert rule base uses a decision tree model to dynamically adjust the risk level threshold.

[0153] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.

[0154] It should also be noted that, in the embodiments of this application, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0155] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined in the embodiments of this application may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown in this application, but is to be accorded the widest scope consistent with the principles and novel features disclosed in the embodiments of this application.

Claims

1. A real-time monitoring and early warning system for the safety status of railcars based on deep learning, characterized in that it comprises:
a multimodal data acquisition module, used to deploy sensors on key components of the railcar and collect time-series data including vibration, temperature, pressure, and operating parameters;
a cross-modal feature conversion module, used to encode the collected time-series data into a three-channel color image, comprising a Markov transition field channel constructed based on state transition probability, a recurrence plot (recursive graph) channel constructed based on state similarity, and a Gramian angular field (Gram angle field) channel constructed based on angular relationships;
a safety status assessment module, used to perform deep modeling and feature learning on the three-channel color image, predict the current status classification result of the railcar in real time, and obtain the safety status assessment result of the railcar;
an early warning and intelligent decision-making module, used to detect anomalies and issue alarms when there are potential risks to the railcar, based on the safety status assessment result;
wherein the multimodal data acquisition module specifically includes: vibration sensors, temperature sensors, pressure sensors, and speed and acceleration sensors deployed on the wheels, axles, motors, braking systems, and vehicle body structures, the collected data forming a multimodal raw data stream; the multimodal raw data stream is preprocessed by an edge device to form a data stream in a unified format;
the cross-modal feature conversion module performs the following processing: the preprocessed time-series data is discretized into a state sequence, and a Markov transition field is constructed to capture state transition information as the first channel; a recurrence plot is constructed by setting a similarity threshold, representing the nonlinear dynamic characteristics of the time series as the second channel; the normalized data is mapped to polar coordinate angles, and a Gramian angular field is constructed to represent local temporal correlation as the third channel; the three channels are fused into a 224×224×3 three-channel color image;
the safety status assessment module includes:
a feature extraction unit, which extracts the temporal image features of the three-channel color image through a shared feature extractor, performs dimensionality compression, and outputs a high-dimensional feature vector;
a dual-stream processing unit, which includes a main classification branch and an auxiliary contrastive learning branch; for the high-dimensional feature vectors output by the shared feature extractor, the main classification branch receives the feature vectors of same-class samples in the support set and calculates their arithmetic mean to generate a prototype vector for each category; the auxiliary contrastive learning branch optimizes the feature space by aggregating the features of same-class samples and separating the features of different-class samples, generating an optimized feature space structure;
the main classification branch is implemented based on a prototype network and employs a few-shot learning mechanism; it models the safety status of the railcar as an N-class, K-sample support set task, where the sample set of a certain class n in the support set is denoted as $S_n$, and its prototype vector is defined as: $$c_n = \frac{1}{|S_n|} \sum_{(x_u, y_u) \in S_n} f_\theta(x_u)$$ where $c_n$ is the prototype embedding vector of category n, representing the central feature of the category; $S_n$ is the support set of class n, containing all training samples belonging to class n, mathematically represented as $S_n = \{(x_u, y_u) \mid y_u = n\}$; $|S_n|$ is the number of samples of category n in the support set; if category "normal" has 3 samples, then $|S_{\text{normal}}| = 3$; if category "minor anomaly" has 2 samples, then $|S_{\text{minor anomaly}}| = 2$; $x_u$ is the u-th sample in the support set; $y_u$ is the category label of sample $x_u$; $f_\theta(\cdot)$ is the feature mapping applied to input samples by the shared feature extractor;
the auxiliary contrastive learning branch is used to pull the feature representations of same-class samples closer together and push those of different-class samples farther apart by constructing positive and negative sample pairs;
a meta-learning training unit, used to model the railcar safety assessment problem as an N-way K-shot few-shot classification problem, construct few-shot classification tasks and train the model end to end, optimize the network parameters through a weighted fusion of the classification loss and the contrastive loss, and output the optimal parameters of the shared feature extractor once training converges.

2. The real-time monitoring and early warning system for the safety status of railcars based on deep learning according to claim 1, characterized in that the safety status assessment module further includes a real-time classification decision unit, used for rapid prediction in new tasks using only the main classification branch, which specifically includes: mapping real-time collected sensor data into the feature space using the trained and optimized shared feature extractor; converting real-time query sample images into high-dimensional feature vectors and computing their Euclidean distances to the prototype vectors of each category; selecting the category of the prototype with the smallest distance as the current state classification result, which includes the safety status label and the class probability distribution vector; and, when an abnormal state is detected, automatically generating an early warning message that includes the abnormality type, the location of occurrence, and handling suggestions.

3. The real-time monitoring and early warning system for the safety status of railcars based on deep learning according to claim 2, characterized in that the shared feature extractor includes four cascaded convolutional modules, used to map the input three-channel color image into a 512-dimensional temporal image feature vector; wherein each of the four cascaded convolutional modules contains a convolutional layer, a batch normalization layer, and a ReLU activation function, used to extract time-frequency domain features at different scales.

4. The real-time monitoring and early warning system for the safety status of railcars based on deep learning according to claim 3, characterized in that the four cascaded convolutional modules specifically include: a first convolutional module, which performs a 3×3 convolution operation to generate a 64-channel feature map and, after ReLU activation and 2×2 max pooling, outputs a 112×112×64 feature map, used to extract the basic edge features of the railcar's state transitions; a second convolutional module, which performs group convolution operations to generate a 128-channel feature map that is batch normalized to output a 56×56×128 feature map, enhancing the ability to capture nonlinear dynamic behavior features; a third convolutional module, which performs 3×3 dilated convolution with a dilation rate of 2 to generate a 256-channel feature map that is output as a 28×28×256 feature map through spatial pyramid pooling, used to fuse multi-scale temporal structure features; and a fourth convolutional module, which performs global average pooling for dimensionality reduction and maps the result to a 512-dimensional temporal image feature vector through a fully connected layer.

5. The real-time monitoring and early warning system for the safety status of railcars based on deep learning according to claim 4, characterized in that the main classification branch adopts a few-shot meta-learning mechanism to model the safety status of the railcar as an N-class K-sample support set task; and the operations performed by the auxiliary contrastive learning branch include: labeling the training batch samples with same-class / different-class relationship labels; constructing positive and negative sample pairs and applying spatial constraints, namely applying a feature distance reduction constraint to same-class sample pairs, applying a feature distance increase constraint to different-class sample pairs, and setting a minimum margin threshold, such that a separation penalty is triggered when the distance between different-class samples is less than the minimum margin threshold; and outputting the optimized feature space structure.

6. The real-time monitoring and early warning system for the safety status of railcars based on deep learning according to claim 5, characterized in that the operations performed by the meta-learning training unit include: sampling few-shot tasks of N classes and K samples per class from historical data, each task containing a support set and a query set; jointly optimizing the cross-entropy loss of the main classification branch and the supervised contrastive loss of the auxiliary branch during training, and forming the final loss function through weighted fusion; and iteratively updating the parameters of the shared feature extractor so that it acquires both classification capability and feature space structure optimization capability.

7. The real-time monitoring and early warning system for the safety status of railcars based on deep learning according to claim 1, characterized in that the early warning and intelligent decision-making module includes: a graded early warning unit, used to match the preset risk level according to the safety status assessment result, generate early warning information including the abnormality type, the location of occurrence, and the predicted consequences, and trigger graphical interface warnings and voice broadcasts through the human-computer interaction interface; a response strategy generation unit, used to automatically generate handling suggestions based on the anomaly risk level and operational priority, including at least one of deceleration, stopping at a station for maintenance, and emergency braking, and to send linkage control commands to the vehicle control system; and a closed-loop optimization unit, used to record early warning information and response execution results in the system log, and to update the expert rule base and the model training dataset based on the log data.