Rail transit weak current system operation and maintenance management method

By combining deep learning and graph neural networks with reinforcement learning, a closed-loop operation and maintenance management system was constructed to solve the problems of single fault prediction, inaccurate location, and insufficient decision optimization in the operation and maintenance of rail transit low-voltage systems. This system achieves accurate fault prediction, intelligent location, and dynamic decision-making, thereby improving the safety and reliability of the system.

CN121504442B · Active · Publication Date: 2026-06-26 · SHANGHAI CREC COMM SIGNAL TESTING

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SHANGHAI CREC COMM SIGNAL TESTING
Filing Date: 2026-01-13
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing operation and maintenance management methods for rail transit low-voltage systems suffer from limitations such as simplistic fault prediction, lack of systematic fault location, insufficient optimization of operation and maintenance decisions, and lack of closed-loop optimization mechanisms. These issues result in low operation and maintenance efficiency, high costs, and an inability to meet the demands for high safety and high reliability.

Method used

By employing deep learning, graph neural networks, and reinforcement learning algorithms, a full-process, closed-loop operation and maintenance management system for rail transit low-voltage systems is constructed. By integrating deep learning models to capture long-term dependencies in equipment operation data, combining graph neural networks for fault propagation analysis, and utilizing reinforcement learning to optimize operation and maintenance decisions, dynamic and intelligent operation and maintenance strategies are formed.

Benefits of technology

It has achieved more accurate fault prediction, intelligent fault location, and dynamic operation and maintenance decision-making, which has improved the overall safety and reliability of rail transit low-voltage systems and reduced the fault rate and the risk of operation interruption.

✦ Generated by Eureka AI based on patent content.

Abstract

The application provides an operation and maintenance management method for a rail transit weak current system, comprising the following steps: S1: data acquisition and preprocessing; S2: fault probability and remaining useful life (RUL) prediction based on an LSTM / Transformer with an attention mechanism; S3: fault propagation analysis and root cause localization based on a graph neural network (GNN); and S4: operation and maintenance decision optimization based on reinforcement learning. The application fuses deep learning, graph neural network, and reinforcement learning algorithms to construct a full-process, closed-loop operation and maintenance management system for rail transit weak current systems.

Description

Technical Field

[0001] This invention relates to the field of rail transit operation and maintenance technology, and in particular to a method for operation and maintenance management of rail transit low-voltage systems.

Background Technology

[0002] The low-voltage system of rail transit is the core infrastructure to ensure the safe and efficient operation of rail transit. It covers multiple subsystems such as signal control, communication transmission, power supply monitoring, and environmental monitoring. It has a large number of devices, complex topological relationships, and strong time-series of operating data. Its operation and maintenance management level directly affects the reliability and operational efficiency of the rail transit system.

[0003] The existing operation and maintenance management methods for rail transit low-voltage systems have the following core problems:

[0004] Single dimension of fault prediction: Traditional fault prediction relies on threshold judgment or simple time series analysis, which cannot capture the long-term dependencies in equipment operation data, making it difficult to accurately predict the future failure probability and remaining service life of equipment. Warning signals lack a graded basis, which easily leads to missed alarms or false alarms.

[0005] Lack of systematic fault location: Fault location relies solely on human experience or the fault characteristics of a single device, without considering the physical / logical connections between devices in the low-voltage system. This makes it impossible to quantify the fault propagation path, resulting in large deviations in fault root cause location and a high risk of secondary faults.

[0006] Insufficient optimization of operation and maintenance decisions: Operation and maintenance decisions (maintenance timing, spare parts allocation, personnel arrangement, etc.) are mostly based on static rules or manual judgment, without dynamic optimization based on real-time equipment status, resource inventory and historical operation and maintenance data, which easily leads to "over-maintenance" or "under-maintenance", resulting in high operation and maintenance costs and low resource utilization.

[0007] Lack of closed-loop optimization mechanism: After the operation and maintenance strategy is executed, no data feedback is generated, the model parameters and decision rules cannot be updated iteratively according to the actual execution effect, and it is difficult to adapt to changes in system status (such as equipment aging and topology adjustment).

[0008] The aforementioned problems result in low efficiency and high cost of existing operation and maintenance management methods, which cannot meet the high safety and high reliability operation and maintenance requirements of rail transit low-voltage systems.

Summary of the Invention

[0009] This invention provides a method for the operation and maintenance management of low-voltage systems in rail transit. By integrating deep learning, graph neural networks and reinforcement learning algorithms, a full-process, closed-loop operation and maintenance management system for low-voltage systems in rail transit is constructed.

[0010] To achieve the above objectives, the present invention adopts the following technical solution:

[0011] A method for operation and maintenance management of low-voltage electrical systems in rail transit includes:

[0012] S1: Collect the equipment runtime sequence data, equipment static attribute set, and historical operation and maintenance data of the rail transit low-voltage system; preprocess the equipment runtime sequence data to obtain standardized time sequence window data;

[0013] S2: Input standardized time-series window data into a long short-term memory network or a Transformer time-series model that incorporates attention mechanisms to capture long-term dependencies in device operation data and output device failure probability and remaining service life.

[0014] S3: Based on the equipment static attribute set, equipment failure probability and remaining service life, and combined with the rail transit weak current system topology database, construct a dynamic topology map of the equipment, input the dynamic topology map into the graph neural network to propagate the fault signal, calculate the abnormal score of each equipment, and output the fault root cause equipment by combining the frequency of fault root causes in historical operation and maintenance data.

[0015] S4: Based on the equipment failure probability and remaining service life, the fault root cause device, and the anomaly scores of each device, a Markov decision process model is constructed by combining the spare parts inventory data from the spare parts management system and the maintenance personnel headcount data from the human resources management system. The optimal operation and maintenance decision is output by combining a deep Q-network and a policy gradient algorithm.

[0016] In this specification, the method further includes S5: execute the optimal operation and maintenance decision; record the equipment operation data, the actual operation and maintenance costs, and the changes in failure probability and remaining service life after execution; and feed these data back to S2, S3, and S4 to update the parameters of the long short-term memory network, Transformer time-series model, graph neural network, Markov decision process model, deep Q-network, and policy gradient algorithm, respectively, completing the iterative optimization of the models.

[0017] In this specification, the preprocessing of the device runtime sequence data in S1 includes the following steps executed sequentially: outlier handling, missing value filling, normalization, and time window partitioning. Outlier handling uses the 3σ criterion to determine outliers and fills them with the mean of the time neighborhood and the device neighborhood. Missing value filling uses linear interpolation to fill missing data caused by communication interruption. Normalization uses min-max normalization to eliminate the dimensional differences of different operating parameters.

[0018] In this specification, the fusion attention mechanism described in S2 is specifically an additive attention mechanism. By calculating the attention score at each time step, the attention score is normalized to obtain the attention weight. Then, the temporal features output by the Long Short-Term Memory Network or Transformer temporal model are weighted and fused with the attention weight to enhance the feature contribution of key time steps in the fault latency period.

[0019] In this specification, step S2 further includes a step of generating a graded early warning signal based on the output device failure probability. The graded early warning signal includes three levels: low risk, medium risk, and high risk. Low risk corresponds to a failure probability of 0 to 0.3, medium risk corresponds to a failure probability of 0.3 to 0.7, and high risk corresponds to a failure probability of 0.7 to 1. Step S3 further includes a step of correcting the graded early warning signal by combining the abnormal scores of each device. The early warning level of highly abnormal devices is increased by one level.
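The grading rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the handling of the shared boundaries (0.3 counted as medium, 0.7 counted as high) is an assumption because the stated ranges overlap at their endpoints, and the cut-off for a "highly abnormal" device (`high_anomaly=0.5`) is an assumed value that the text does not specify.

```python
LEVELS = ["low", "medium", "high"]


def warning_level(p_fail: float) -> str:
    """Map a failure probability in [0, 1] to a risk level."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("failure probability must lie in [0, 1]")
    if p_fail < 0.3:
        return "low"
    if p_fail < 0.7:  # assumption: 0.3 itself counts as medium
        return "medium"
    return "high"  # assumption: 0.7 itself counts as high


def corrected_level(p_fail: float, anomaly_score: float,
                    high_anomaly: float = 0.5) -> str:
    """Raise the level by one step for highly abnormal devices (capped at high)."""
    level = warning_level(p_fail)
    if anomaly_score >= high_anomaly:  # assumed threshold, not from the source
        level = LEVELS[min(LEVELS.index(level) + 1, len(LEVELS) - 1)]
    return level
```

For example, a device with failure probability 0.2 and anomaly score 0.9 would be promoted from low to medium under these assumptions.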

[0020] In this specification, the device dynamic topology graph described in S3 is a timestamped dynamic topology graph. Its adjacency matrix is assigned different initial weights according to the type of connection between devices (power supply connection, communication connection, control command connection, or data interaction connection), and the adjacency matrix is updated every 10 minutes according to the real-time connection status of the devices.

[0021] In this specification, the step of outputting the root cause device in S3 also includes a candidate root cause set screening and verification process: First, select devices with anomaly scores ≥ 0.5 as the candidate root cause set, calculate the root cause probability of each device in the candidate set, and select the device corresponding to the maximum probability as the initial root cause device; if the historical root cause proportion of the initial root cause device is < 70%, then expand the candidate set to devices with anomaly scores ≥ 0.3, recalculate the root cause probability, and determine the root cause device.
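The two-stage screening described above might look as follows. This is a sketch under stated assumptions: the text does not say how the root cause probability is computed, so the code assumes it is proportional to the anomaly score multiplied by the device's historical root cause proportion; the function and parameter names are hypothetical.

```python
def locate_root_cause(anomaly, hist_share,
                      primary=0.5, fallback=0.3, min_share=0.70):
    """Pick a root cause device.

    anomaly:    {device: anomaly score}
    hist_share: {device: historical root cause proportion in [0, 1]}
    """
    def best(threshold):
        candidates = [d for d, s in anomaly.items() if s >= threshold]
        if not candidates:
            return None
        # Assumption: probability is proportional to score * historical share.
        weights = {d: anomaly[d] * hist_share.get(d, 0.0) for d in candidates}
        if sum(weights.values()) == 0:
            return max(candidates, key=anomaly.get)
        return max(candidates, key=weights.get)

    first = best(primary)
    if first is not None and hist_share.get(first, 0.0) >= min_share:
        return first
    # Historical share below 70%: widen the candidate set and re-rank.
    second = best(fallback)
    return second if second is not None else first
```

With this weighting, widening the candidate set can change the result when a moderately anomalous device has a strong history of being the root cause.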

[0022] In this specification, the state space of the Markov decision process model described in S4 includes the fault root cause device number, the anomaly score of each device, the remaining service life of the device, the spare parts inventory vector, the number of maintenance personnel, and the cumulative operation and maintenance cost of the device; the action space includes five types of operation and maintenance actions: immediate maintenance, delayed monitoring, device replacement, spare parts allocation + linkage maintenance, and linkage maintenance.

[0023] In this specification, the process of outputting the optimal operation and maintenance decision in S4, which combines the Markov decision process model, the deep Q-network, and the policy gradient algorithm, is as follows. First, a deep Q-network is built based on the state space and action space of the constructed Markov decision process model. The deep Q-network is trained on operation and maintenance trajectory samples collected through an experience replay pool, and it outputs the Q-value of each state-action combination. Then, a policy function of the policy gradient algorithm is constructed based on the Markov decision process model. The Q-value output by the deep Q-network is used as a reward enhancement term to modify the cumulative reward of the policy gradient algorithm, and the policy gradient algorithm is trained to obtain the probability distribution over the operation and maintenance actions. Next, the real-time equipment failure probability, remaining service life, root cause equipment, anomaly scores of each device, spare parts inventory, and number of maintenance personnel are input into the trained policy function, and the action with the maximum probability is selected as the core operation and maintenance action. The maintenance timing is determined by combining the spare parts transportation time, the nominal life of the equipment, and the rail transit shutdown maintenance window. Maintenance personnel are allocated according to the equipment maintenance qualification requirements, and a spare parts allocation plan is formulated according to the spare parts inventory. Together, the maintenance timing, spare parts allocation plan, and personnel arrangement plan form the optimal operation and maintenance decision.
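To make the "Q-value as reward enhancement term" idea concrete, the following sketch computes discounted returns in which each step reward is augmented by a scaled Q-value, and selects the core action as the arg-max of a softmax policy. The discount factor `gamma`, the scaling coefficient `beta`, and the additive form of the enhancement are assumptions; the text states only that the Q-value modifies the cumulative reward. All names are hypothetical.

```python
import math

ACTIONS = [
    "immediate maintenance",
    "delayed monitoring",
    "device replacement",
    "spare parts allocation + linkage maintenance",
    "linkage maintenance",
]


def enhanced_returns(rewards, q_values, gamma=0.99, beta=0.1):
    """Discounted returns with each reward augmented by beta * Q(s, a)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for k in range(len(rewards) - 1, -1, -1):
        running = rewards[k] + beta * q_values[k] + gamma * running
        returns[k] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def core_action(logits):
    """Return the action with the highest policy probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACTIONS[best], probs[best]
```

The enhanced returns would then weight the log-probability gradients in a standard policy gradient update; that training loop is omitted here.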

[0024] In this specification, the iterative optimization cycle of the models in S5 is dynamically adjusted according to the system's operating status: under normal operating conditions the models are iterated once a month, and during periods of high failure rate, such as high temperature and heavy rain, they are iterated once a week. During iteration, the model structure is fixed and only the weight parameters are updated. If the core indicators (fault prediction accuracy and root cause location accuracy) decrease after an iteration, the parameters are rolled back and the learning rate is adjusted before retraining the model.

[0025] In summary, the present invention has at least the following beneficial effects:

[0026] Accurate Fault Prediction: By combining the LSTM / Transformer time series model with the attention mechanism, the long-term dependencies of equipment operation data are effectively captured. The output fault probability and remaining service life have high accuracy. The graded early warning signal can accurately reflect the risk level of the equipment and provide a reliable basis for operation and maintenance intervention.

[0027] Intelligent fault location: Based on graph neural network, a device topology relationship graph is constructed, the fault propagation path is quantified and the anomaly score is calculated. Combined with historical fault data, the root cause device of the fault is accurately located, avoiding the positioning deviation caused by human experience and reducing the fault troubleshooting time.

[0028] Dynamic operation and maintenance decision-making: The operation and maintenance decision-making is modeled as a Markov decision process, and the deep Q network and policy gradient algorithm are integrated to achieve bidirectional interactive optimization. The output operation and maintenance strategy can be adapted to dynamic conditions such as real-time equipment status, spare parts inventory, and personnel configuration, balancing operation and maintenance security, cost and equipment life goals.

[0029] Closed-loop operation and maintenance system: Through feedback and iteration of operation and maintenance strategy execution data, the parameters of fault prediction, fault location and operation and maintenance decision model are continuously optimized, so that the operation and maintenance methods can adapt to changes in system state and maintain high operation and maintenance efficiency in the long term.

[0030] System reliability enhancement: Intelligent management of the entire process from fault prediction and location to decision optimization effectively reduces the fault incidence and the scope of fault impact, improves the overall safety and reliability of rail transit low-voltage systems, and reduces the risk of operational interruption due to equipment failure.

Attached Figure Description

[0031] Figure 1 is a schematic diagram of the operation and maintenance management method for the low-voltage electrical system of rail transit involved in this invention.

[0032] Figure 2 is a schematic diagram of the equipment failure probability and remaining service life prediction process involved in this invention.

[0033] Figure 3 is a schematic diagram of the fault root cause localization process involved in this invention.

[0034] Figure 4 is a schematic diagram of the operation and maintenance decision optimization process involved in this invention.

Detailed Implementation

[0035] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0036] As shown in Figure 1, this embodiment provides a method for operation and maintenance management of a rail transit low-voltage system, including:

[0037] S1: Data Acquisition and Preprocessing

[0038] 1.1 Data Acquisition:

[0039] 1.1.1 Device runtime sequence data $x_{i,t}^{\tau}$, where: t is the timestamp of data collection (unit: minutes; the collection frequency is once per minute, matching the real-time status update requirements of S4) and represents the cumulative collection time; $\tau$ is the time step within the time window (values range from 1 to 60, corresponding to a 1-hour sliding window used for time-series feature extraction in S2); the number of time steps in a single cycle is 1; i is the equipment number (values from 1 to N, where N is the total number of devices in the rail transit low-voltage system, such as signal controllers, communication modules, and power supply units); $x_{i,t}^{\tau}$ is the runtime parameter vector of device i at time t and time step $\tau$, with dimension 8, specifically: voltage, current, operating temperature, response latency, packet loss rate, CPU utilization, memory usage, and fan speed;

[0040] Data source: Distributed sensor network of rail transit low-voltage system (such as voltage sensors, temperature sensors, industrial gateways). All sensors are connected to the system SCADA platform to ensure the real-time performance (delay ≤ 1 second) and accuracy (error ≤ ±0.5%) of data acquisition.

[0041] Data acquisition rules: Core devices (such as signal control modules) acquire data once per minute, while non-core devices (such as ordinary communication nodes) acquire data once every 5 minutes.

[0042] 1.1.2 Equipment Static Attribute Set:

[0043] The static attribute set of device i comprises: the model of device i (e.g., signal controller model SJ-2023, communication module model TX-400); the installation location of device i (represented by three-dimensional coordinates and a line number, such as Line1-Station5-Platform2-01, used for the S3 topology graph construction); the nominal design life of device i (in hours, e.g., 87600 hours for a power supply unit), which provides basic data for the S4 reward function; the manufacturer of device i; and the installation date of device i, used to calculate the real-time service duration (the current time minus the installation date, in hours).

[0044] Data source: PLM (Product Lifecycle Management) database for rail transit low-voltage system equipment, updated monthly to ensure that the attribute set is consistent with the actual system.

[0045] 1.1.3 Historical Operation and Maintenance Dataset, which comprises: the k-th fault record of device i (k = 1, ..., $K_i$, where $K_i$ is the number of historical failures of device i), including the time of failure, the fault type (e.g., abnormal voltage, excessive temperature, response timeout, communication interruption, hardware damage), the troubleshooting measures, and the fault root cause label (indicating whether the fault was caused by device i itself or propagated from an associated device j); the cumulative maintenance cost of device i up to time t (unit: yuan), which provides data for the cumulative cost term in the S4 state space; and the historical state transition records of devices i and j at time t, which provide samples for the state transition probabilities of the S4 MDP;

[0046] Data sources: Rail transit low-voltage system operation and maintenance work order database and cost management database. Each work order must include all fields such as "failure time, handling personnel, spare parts consumption, time consumed, and cost" to ensure that the data can support S4 reward function calculation and state transition probability modeling.

[0047] 1.2 Data Preprocessing:

[0048] 1.2.1 Outlier Handling: Outliers (such as voltage spikes or out-of-range temperatures caused by sensor malfunction) are handled using the "3σ criterion + neighborhood interpolation":

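[0049] 1. Calculate the mean $\mu_{i,p}$ and standard deviation $\sigma_{i,p}$ of each operating parameter p of device i over the T collected timestamps, where $x_{i,t,p}$ denotes the value of parameter p of device i at time t:

[0050] $\mu_{i,p} = \frac{1}{T}\sum_{t=1}^{T} x_{i,t,p}$;

[0051] $\sigma_{i,p} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(x_{i,t,p}-\mu_{i,p}\right)^{2}}$;

[0052] 2. Identify outliers: if $\left|x_{i,t,p}-\mu_{i,p}\right| > 3\sigma_{i,p}$, then $x_{i,t,p}$ is marked as an outlier;

[0053] 3. Outlier imputation: Outliers are imputed using the mean of the time neighborhood (t ± 5 minutes) and the device neighborhood (i ± 3 devices of the same type).

[0054] $\hat{x}_{i,t,p} = \operatorname{mean}\left(\{x_{i,t',p} : 0 < |t'-t| \le 5\} \cup \{x_{j,t,p} : 0 < |j-i| \le 3\}\right)$;

[0055] The imputed value $\hat{x}_{i,t,p}$ must be within the valid range of the parameter; otherwise, it is rejected.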
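The outlier step can be illustrated with a short Python sketch. It assumes a population standard deviation and a single pooled mean over both neighborhoods (the text says only "the mean of the time neighborhood and the device neighborhood"); `three_sigma_outliers`, `impute`, and their parameters are hypothetical names.

```python
from statistics import mean, pstdev


def three_sigma_outliers(series):
    """Indices whose value deviates from the mean by more than 3 sigma."""
    mu, sigma = mean(series), pstdev(series)
    return [t for t, x in enumerate(series) if abs(x - mu) > 3 * sigma]


def impute(series, peers, t, half_window=5):
    """Replace series[t] by the pooled mean of its temporal neighbours
    (t +/- half_window samples) and the same-type peer devices at time t.

    peers: list of series from neighbouring devices of the same type.
    """
    lo = max(0, t - half_window)
    hi = min(len(series), t + half_window + 1)
    temporal = [series[k] for k in range(lo, hi) if k != t]
    spatial = [peer[t] for peer in peers]
    return mean(temporal + spatial)


# Usage: one voltage spike in an otherwise flat signal.
voltage = [10.0] * 20 + [100.0] + [10.0] * 20
for idx in three_sigma_outliers(voltage):
    voltage[idx] = impute(voltage, peers=[[10.0] * 41] * 3, t=idx)
```

A production version would also need to exclude neighbours that are themselves flagged as outliers and to apply the valid-range check before accepting the imputed value.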

[0056] 1.2.2 Missing Value Imputation: For data loss caused by communication interruptions, linear interpolation is performed after outlier handling to fill the gaps along the time dimension and restore continuity:

[0057] $x_{i,t,p} = x_{i,t_1,p} + \frac{t-t_1}{t_2-t_1}\left(x_{i,t_2,p}-x_{i,t_1,p}\right)$;

[0058] where $x_{i,t_1,p}$ and $x_{i,t_2,p}$ are the nearest valid data before and after t ($t_1 < t < t_2$) after outlier processing. This ensures that there are no missing values within the time window, meeting the requirement of the S2 LSTM / Transformer model for continuous time-series input.
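As an illustration of the gap filling described here, the sketch below linearly interpolates missing samples (represented as `None`) between the nearest valid neighbours. Leaving leading or trailing gaps unfilled is an assumption, since the text does not cover gaps with a valid value on only one side; `fill_missing` is a hypothetical name.

```python
def fill_missing(series):
    """Linearly interpolate None entries between the nearest valid samples."""
    filled = list(series)
    known = [k for k, x in enumerate(series) if x is not None]
    for t, x in enumerate(series):
        if x is not None:
            continue
        left = max((k for k in known if k < t), default=None)
        right = min((k for k in known if k > t), default=None)
        if left is None or right is None:
            continue  # assumption: edge gaps are left for later handling
        frac = (t - left) / (right - left)
        filled[t] = series[left] + frac * (series[right] - series[left])
    return filled
```

For instance, `fill_missing([1.0, None, None, 4.0])` yields values close to `[1.0, 2.0, 3.0, 4.0]`.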

[0059] 1.2.3 Normalization: To eliminate the dimensional differences between different operating parameters (e.g., voltage is in V, temperature is in °C), min-max normalization is used, as shown in the following formula:

[0060] $\tilde{x}_{i,t,p} = \frac{x_{i,t,p}-\min(x_{i,p})}{\max(x_{i,p})-\min(x_{i,p})}$;

[0061] where $x_{i,p}$ denotes the valid operating data of parameter p of device i over all timestamps and all time-series windows. After normalization, $\tilde{x}_{i,t,p} \in [0,1]$, which ensures gradient stability during S2 model training.

[0062] 1.2.4 Time Series Window Partitioning: To adapt to the input of the S2 time-series prediction model, the preprocessed data is divided into sliding windows of fixed length (window length = 60, step size = 1). Each window corresponds to the input features of one timestamp t, and the output is the standardized time-series window data.
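Normalization and window partitioning can be sketched together for a single parameter channel. Mapping a constant channel to 0.0 is an assumption made to avoid division by zero; the function names are hypothetical.

```python
def min_max(series):
    """Scale one parameter channel to [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]  # assumption: constant channel maps to 0
    return [(x - lo) / (hi - lo) for x in series]


def sliding_windows(series, length=60, step=1):
    """Split a series into overlapping fixed-length windows."""
    return [series[s:s + length]
            for s in range(0, len(series) - length + 1, step)]


# Usage: 100 minutes of data give 41 one-hour windows at step 1.
windows = sliding_windows(min_max([float(v) for v in range(100)]))
```

Each window would then be stacked across the 8 parameter channels to form one 60 x 8 model input.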

[0063] 1.3 Output and Transmission: The standardized time-series window data (classified by device number i) is transmitted to the model input layer of S2; the device static attribute set (including the real-time service duration) is transmitted to the graph node feature construction stage of S3; the historical operation and maintenance dataset (including fault records, cumulative costs, and state transition records) is transmitted to model training in S2 (fault labels), root cause localization in S3 (fault root cause labels), and MDP modeling in S4 (state transition probabilities and reward functions), respectively.

[0064] S2: Failure Probability and Remaining Lifetime (RUL) Prediction Based on LSTM / Transformer + Attention Mechanism

[0065] 2.1 Model Construction: The core objective of this step is to capture the long-term dependencies of equipment operation data based on the standardized time-series data of S1, and to output the failure probability and remaining service life with timestamp t. These provide the core quantitative indicators for fault propagation analysis in S3 and operation and maintenance decision-making in S4. The model adopts an "LSTM or Transformer (one of the two) + additive attention mechanism" architecture to ensure accurate extraction of long-term time-series features. Specific construction details are as follows:

[0066] 2.1.1 LSTM Core Unit: The LSTM unit addresses the vanishing gradient problem of traditional RNNs through input gates, forget gates, output gates, and memory units, and suits the long-term temporal dependencies of rail transit low-voltage equipment operation data (for example, the fault latency period caused by equipment aging can last several days). The core formulas and parameter definitions are as follows:

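[0067] Input gate: controls the input weight of the current time-series features, formula:

[0068] $\mathbf{i}_{\tau} = \sigma\left(x_{\tau}W_{xi} + \mathbf{h}_{\tau-1}W_{hi} + b_{i}\right)$;

[0069] Forget gate: controls the retention weight of historical memory, formula:

[0070] $\mathbf{f}_{\tau} = \sigma\left(x_{\tau}W_{xf} + \mathbf{h}_{\tau-1}W_{hf} + b_{f}\right)$;

[0071] Memory unit: stores the long-term memory of temporal features, formula:

[0072] $\mathbf{c}_{\tau} = \mathbf{f}_{\tau}\odot\mathbf{c}_{\tau-1} + \mathbf{i}_{\tau}\odot\tanh\left(x_{\tau}W_{xc} + \mathbf{h}_{\tau-1}W_{hc} + b_{c}\right)$;

[0073] Output gate: controls the output weight of the memory unit, formula:

[0074] $\mathbf{o}_{\tau} = \sigma\left(x_{\tau}W_{xo} + \mathbf{h}_{\tau-1}W_{ho} + b_{o}\right)$;

[0075] Hidden state: outputs the temporal features at the current time step, formula:

[0076] $\mathbf{h}_{\tau} = \mathbf{o}_{\tau}\odot\tanh\left(\mathbf{c}_{\tau}\right)$;

[0077] where $x_{\tau}$: the normalized 8-dimensional runtime parameter vector of device i at time t and time step $\tau$ (row vector); $\sigma$: sigmoid activation function (output in [0,1], suited to gating weights); $\tanh$: hyperbolic tangent activation function (output in [-1,1], suited to the memory unit); $W_{xi}$ / $W_{hi}$: input / hidden layer weight matrices of the input gate (dimensions 8×128 / 128×128, where 8 is the dimension of the runtime parameters and 128 is the dimension of the LSTM hidden layer); $W_{xf}$ / $W_{hf}$: input / hidden layer weight matrices of the forget gate (dimensions 8×128 / 128×128); $W_{xc}$ / $W_{hc}$: input / hidden layer weight matrices of the memory unit (dimensions 8×128 / 128×128); $W_{xo}$ / $W_{ho}$: input / hidden layer weight matrices of the output gate (dimensions 8×128 / 128×128); $b_{i}, b_{f}, b_{c}, b_{o}$: bias vector of each gate (dimension 128); $\odot$: Hadamard product (element-wise multiplication); $\mathbf{h}_{\tau}$: the hidden state of device i at time t and time step $\tau$; $\mathbf{c}_{\tau}$: the corresponding state of the memory unit (initial value 0).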
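A single LSTM time step with the gate structure listed above can be written in plain Python as follows. This is a didactic sketch, not the patented implementation: it uses the row-vector convention implied by the stated 8×128 and 128×128 weight shapes, stores parameters in a hypothetical dictionary keyed by gate name, and makes no attempt at efficiency.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(x, h, W, U, b):
    """Row-vector affine map: x @ W + h @ U + b."""
    return [
        sum(x[k] * W[k][j] for k in range(len(x)))
        + sum(h[k] * U[k][j] for k in range(len(h)))
        + b[j]
        for j in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. params[g] = (W, U, b) for g in 'i', 'f', 'o', 'c'."""
    i = [sigmoid(z) for z in affine(x, h_prev, *params["i"])]  # input gate
    f = [sigmoid(z) for z in affine(x, h_prev, *params["f"])]  # forget gate
    o = [sigmoid(z) for z in affine(x, h_prev, *params["o"])]  # output gate
    g = [math.tanh(z) for z in affine(x, h_prev, *params["c"])]  # candidate
    c = [f[j] * c_prev[j] + i[j] * g[j] for j in range(len(c_prev))]
    h = [o[j] * math.tanh(c[j]) for j in range(len(c))]
    return h, c
```

Running the step over the 60 time steps of a window, starting from zero hidden and memory states, produces the sequence of hidden states that the attention layer consumes.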

[0078] 2.1.2 Transformer Alternative: If the device's operating data exhibits multi-dimensional interactive characteristics (such as the coupling of latency and packet loss rate in a communication module), the LSTM can be replaced with a Transformer model, whose core is the self-attention layer. The formula and parameter definitions are as follows:

[0079] $\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$;

[0080] where X is the 60×8 input window; $Q = XW_Q$ (query matrix, dimension 60×64; $W_Q$ is a weight matrix of dimension 8×64); $K = XW_K$ (key matrix, dimension 60×64; $W_K$ is a weight matrix of dimension 8×64); $V = XW_V$ (value matrix, dimension 60×64; $W_V$ is a weight matrix of dimension 8×64); $d_k = 64$: the dimension of the key matrix, used to scale the attention scores and avoid excessively large values. Output: the attention result after a residual connection and layer normalization, of dimension 60×64, representing global temporal features.
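The self-attention layer can be sketched with standard scaled dot-product attention in plain Python. The sketch covers only the attention computation itself; the residual connection and layer normalization mentioned in the text are omitted, and the helper names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a window X (T x d)."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]  # transpose of K
    scores = matmul(Q, K_t)               # T x T similarity matrix
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights
```

With the shapes given in the text, `X` would be 60 x 8 and each projection matrix 8 x 64, giving a 60 x 64 output.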

[0081] 2.1.3 Additive Attention Mechanism: To highlight the key time-step features of the fault latency period (such as the inflection point of a slow temperature rise), an additive attention mechanism is applied to the global features output by the LSTM / Transformer. The formulas and parameter definitions are as follows:

[0082] Attention score: quantifies the feature importance of each time step, formula:

[0083] $e_{\tau} = \tanh\left(\mathbf{h}_{\tau}W_{a} + b_{a}\right)v_{a}$, where $\mathbf{h}_{\tau}$ is the hidden state at time step $\tau$ and $W_{a}$, $b_{a}$, $v_{a}$ are learnable attention parameters;

[0084] If the Transformer is used, $\mathbf{h}_{\tau}$ is replaced with the Transformer output at time step $\tau$;

[0085] Weight normalization: ensures that the attention weights sum to 1, formula: $\alpha_{\tau} = \frac{\exp(e_{\tau})}{\sum_{k=1}^{60}\exp(e_{k})}$;

[0086] Global feature fusion: a weighted summation yields the core temporal feature of device i at time t, formula:

[0087] $z_{i,t} = \sum_{\tau=1}^{60}\alpha_{\tau}\mathbf{h}_{\tau}$;

[0088] where $z_{i,t}$ has dimension 128 and is the core input for the subsequent failure probability and RUL prediction.
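The score, normalization, and fusion steps can be sketched as below, using the common additive (Bahdanau-style) scoring form. The exact parameterization of the score function is an assumption, as are the parameter names `W`, `b`, and `v`.

```python
import math


def additive_attention(H, W, b, v):
    """Fuse hidden states H (T x n) into one vector.

    W: n x a projection, b: length-a bias, v: length-a scoring vector.
    Returns (fused vector of length n, attention weights of length T).
    """
    scores = []
    for h in H:
        proj = [math.tanh(sum(h[k] * W[k][j] for k in range(len(h))) + b[j])
                for j in range(len(b))]
        scores.append(sum(vj * pj for vj, pj in zip(v, proj)))
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    fused = [sum(alphas[t] * H[t][k] for t in range(len(H)))
             for k in range(len(H[0]))]
    return fused, alphas
```

When the scoring vector is all zeros, every time step receives the same weight and the fused feature reduces to the plain average of the hidden states, which is a convenient sanity check.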

[0089] 2.1.4 Prediction Head Construction: Based on the global feature $z_{i,t}$ produced by the attention fusion, a dual-output prediction head is constructed so that the outputs are quantitative indicators with timestamp t that directly match the input requirements of S3 and S4:

[0090] Fault probability prediction head: outputs the fault probability for time t+1 (the next acquisition cycle), formula:

[0091] $P_{i,t} = \sigma\left(z_{i,t}W_{P} + b_{P}\right)$;

[0092] where $W_{P}$ is the weight matrix (dimension 128×1), $b_{P}$ is the bias term (dimension 1), and the sigmoid function $\sigma$ ensures that the output $P_{i,t} \in [0,1]$;

[0093] RUL prediction head: outputs the remaining lifetime of device i at time t (in hours), formula:

[0094] $RUL_{i,t} = \phi\left(z_{i,t}W_{R} + b_{R}\right)$;

[0095] where $W_{R}$ is the weight matrix (dimension 128×1), $b_{R}$ is the bias term (dimension 1), and $\phi$ is a non-negative activation function (e.g., ReLU) that ensures the RUL is non-negative (consistent with its physical meaning).

[0096] 2.2 Model Training: The model is trained using supervised learning, based on historical operation and maintenance data from S1. The fault labels in the training process serve as supervisory signals to ensure that the prediction results accurately reflect actual operation and maintenance scenarios. The specific training process is as follows:

[0097] 2.2.1 Dataset Partitioning: The preprocessed time-series data is divided in chronological order into a training set (70%), a validation set (15%), and a test set (15%) to avoid data leakage (e.g., training with future data). Partitioning rule: the training set precedes the validation set in time, and the validation set precedes the test set.
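A chronological 70 / 15 / 15 split can be sketched as follows. Samples are assumed to be `(timestamp, payload)` tuples, and integer percentages are used to avoid floating-point rounding at the boundaries; both choices, and the function name, are illustrative.

```python
def chronological_split(samples, train_pct=70, val_pct=15):
    """Split (timestamp, payload) samples by time, never by shuffling."""
    ordered = sorted(samples, key=lambda s: s[0])
    n = len(ordered)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test
```

Because the data is sorted before slicing, every training timestamp precedes every validation timestamp, which in turn precedes every test timestamp.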

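[0098] 2.2.2 Loss Function and Optimizer: A combined loss function is used to jointly optimize the prediction accuracy of the failure probability and the RUL. The formula is as follows:

[0099] $L = \lambda_{1}L_{P} + \lambda_{2}L_{RUL}$;

[0100] $L_{P}$: the cross-entropy loss of the failure probability (suited to binary classification), formula:

[0101] $L_{P} = -\frac{1}{M}\sum_{i,t}\left[y_{i,t}\log P_{i,t} + \left(1-y_{i,t}\right)\log\left(1-P_{i,t}\right)\right]$;

[0102] $y_{i,t}$: the supervision label (1 if device i fails at time t, 0 otherwise); $P_{i,t}$: the predicted failure probability; M: the number of samples;

[0103] $L_{RUL}$: the mean squared error loss of the RUL (suited to regression), formula:

[0104] $L_{RUL} = \frac{1}{M}\sum_{i,t}\left(RUL_{i,t}-RUL_{i,t}^{\mathrm{true}}\right)^{2}$;

[0105] $RUL_{i,t}^{\mathrm{true}}$: the actual RUL (calculated from the equipment manual and the service duration); $\lambda_{1} = 0.6$, $\lambda_{2} = 0.4$: weight allocation (the failure probability is more critical for operation and maintenance decisions); optimizer: Adam, learning rate = 0.001, with weight decay; number of iterations = 200 rounds; batch size per round = 32.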
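The combined objective can be illustrated with a few lines of Python using the 0.6 / 0.4 weighting stated in the text. Clipping probabilities with a small `eps` to avoid `log(0)` is an implementation assumption, and the function name is hypothetical.

```python
import math


def combined_loss(p_pred, y_true, rul_pred, rul_true,
                  w_prob=0.6, w_rul=0.4, eps=1e-12):
    """Weighted sum of binary cross-entropy and mean squared error.

    Returns (total, bce, mse) so each term can be monitored separately.
    """
    n = len(p_pred)
    bce = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for p, y in zip(p_pred, y_true)
    ) / n
    mse = sum((a - b) ** 2 for a, b in zip(rul_pred, rul_true)) / n
    return w_prob * bce + w_rul * mse, bce, mse
```

Since the RUL is expressed in hours, its squared error can be orders of magnitude larger than the cross-entropy term; whether the RUL is rescaled before the loss is computed is not stated in the text.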

[0106] 2.2.3 Model Evaluation and Tuning: After training, the following metrics are used to evaluate model performance. If the performance does not meet the criteria, the hyperparameters (such as the LSTM hidden layer dimension and the attention mechanism fusion coefficient) are adjusted: Fault probability prediction: AUC (area under the curve) ≥ 0.95, precision ≥ 0.9, recall ≥ 0.85; RUL prediction: MAE (mean absolute error) ≤ 24 hours, RMSE (root mean square error) ≤ 48 hours.

[0107] 2.3 Generation of Tiered Early Warning Signals: Based on the predicted failure probability, tiered early warning signals are generated to meet the rapid identification needs of operation and maintenance personnel. Early warning levels and triggering rules: Low risk (failure probability 0 to 0.3): the warning color is blue, indicating "normal operation, routine monitoring"; Medium risk (failure probability 0.3 to 0.7): the warning color is yellow, indicating "abnormal trend, strengthen monitoring (frequency increased to once every 5 minutes)"; High risk (failure probability 0.7 to 1): the warning color is red, indicating "high probability of failure, maintenance strategy needs to be evaluated immediately".

[0108] 2.4 Output and Transmission: The device failure probability and remaining service life with timestamp t, classified by equipment number, are transmitted to the graph node feature construction stage in S3 and to the MDP state space in S4; the graded early warning signals (classified by risk level) are transmitted to the fault propagation analysis stage in S3 (to correct the anomaly score weights); the optimal model parameters are stored temporarily for model iteration and optimization in S5. The process for predicting equipment failure probability and remaining useful life is shown in Figure 2.

[0109] S3: Fault Propagation Analysis and Root Cause Localization Based on Graph Neural Networks (GNNs)

[0110] 3.1 Graph Model Construction: The core objective of this step is to construct a device association graph of the low-voltage system based on the failure probability and RUL prediction results of S2, to locate the fault root cause device by propagating the fault signal through the GNN, and to output the anomaly score of each device with timestamp t. This provides the core target objects and quantitative indicators of anomaly severity for the operation and maintenance decisions of S4. The fault root cause localization process is shown in Figure 3. The graph model construction must be combined with the physical / logical topology of the equipment to ensure that the fault propagation path conforms to the actual system. The specific construction details are as follows:

[0111] 3.1.1 Definition of Dynamic Topology Graph: Construct a dynamic topology graph $G_t = (V_t, E_t)$ with timestamp t, where the node set $V_t = \{v_1, \dots, v_N\}$: each node corresponds one-to-one with a device number of S1, and the dynamism is reflected in the updating of node features over time t; the edge set $E_t = \{e_{ij,t}\}$ characterizes the association between devices i and j at time t: $e_{ij,t} = 1$ indicates the existence of a physical / logical connection (such as a power supply link or a signal transmission link), and $e_{ij,t} = 0$ indicates no connection; Connection types: power supply connection, communication connection, control command connection, and data interaction connection, each with a different initial weight (power supply connection weight = 0.8, communication connection weight = 0.6, control command connection weight = 0.9, data interaction connection weight = 0.5); Adjacency matrix $A_t$: $A_{ij,t} = w_{type} \cdot e_{ij,t}$ ($w_{type}$ is the connection type weight), with self-loops ($A_{ii,t} = 1$) added so that nodes can capture their own features; Dynamic update rules: the adjacency matrix is updated every 10 minutes (to match the data acquisition frequency of S1); if communication between devices i and j is interrupted, then $e_{ij,t} = 0$, so that the graph model fits the real-time system state.
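The construction of the weighted adjacency matrix in 3.1.1 can be illustrated with a minimal Python sketch. The connection-type weights are the ones listed above; the function name `build_adjacency`, the 0-based device indices, the treatment of links as undirected, and the choice to keep the largest weight when two devices share several connection types are all assumptions made for illustration and are not stated in the text.

```python
# Connection-type weights as listed in 3.1.1.
TYPE_WEIGHTS = {"power": 0.8, "communication": 0.6, "control": 0.9, "data": 0.5}


def build_adjacency(n_devices, links, down_links=frozenset()):
    """Build the weighted adjacency matrix A_t with self-loops.

    links: iterable of (i, j, connection_type) with 0-based device indices.
    down_links: set of frozenset({i, j}) pairs whose connection is interrupted
                at the current update (their edge value is set to 0).
    """
    adj = [[0.0] * n_devices for _ in range(n_devices)]
    for i in range(n_devices):
        adj[i][i] = 1.0  # self-loop so a node keeps its own features
    for i, j, kind in links:
        if frozenset((i, j)) in down_links:
            continue  # interrupted connection: edge stays at 0
        weight = TYPE_WEIGHTS[kind]
        # Assumption: undirected links, strongest connection type wins.
        adj[i][j] = max(adj[i][j], weight)
        adj[j][i] = max(adj[j][i], weight)
    return adj


if __name__ == "__main__":
    links = [(0, 1, "power"), (1, 2, "communication")]
    for row in build_adjacency(3, links, down_links={frozenset((1, 2))}):
        print(row)
```

Calling the function again at each 10-minute update with the current `down_links` set reproduces the dynamic update rule in this simplified form.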

[0112] 3.1.2 Node Feature Construction: The node feature vector $x_t^{(i)}$ integrates the static attributes of S1 and the prediction results of S2 to characterize the degree of anomaly and the basic attributes of the equipment. The formula is as follows:

[0113] $x_t^{(i)} = [\,P_t^{(i)},\ RUL_t^{(i)},\ \tilde{T}_t^{(i)},\ \mathrm{onehot}(m_i),\ \mathrm{onehot}(l_i)\,]$;

[0114] $P_t^{(i)}$, $RUL_t^{(i)}$: failure probability and remaining useful life predicted by S2 (core anomaly features); $\tilde{T}_t^{(i)}$: normalized real-time service duration; $\mathrm{onehot}(m_i)$, $\mathrm{onehot}(l_i)$: one-hot encodings of the equipment model and installation location (dimensions $N_m$ and $N_l$ respectively, where $N_m$ is the total number of models and $N_l$ is the total number of locations); Feature dimension: $d = 3 + N_m + N_l$, which keeps the dimensions consistent and matches the GNN input.

[0115] 3.2 GCN Model Construction: Graph Convolutional Network (GCN) is adopted as the core model to aggregate node neighborhood features and quantify the propagation path of faults between devices. The formulas and parameter definitions are as follows:

[0116] 3.2.1 Convolutional Layer Calculation: GCN uses two convolutional layers (experiments have verified that two layers can balance feature extraction and computational efficiency), with the following formula for each layer:

[0117] $H^{(l)} = \mathrm{ReLU}\left(\tilde{D}_t^{-1/2}\,\tilde{A}_t\,\tilde{D}_t^{-1/2}\,H^{(l-1)}\,W^{(l)} + b^{(l)}\right)$;

[0118] l: convolutional layer index (l = 1, 2); $H^{(0)} = X_t$: initial node feature matrix (dimension N × d, where d is the node feature dimension); $\tilde{A}_t = A_t + I$: adjacency matrix with self-loops added (I is the identity matrix), ensuring that nodes can capture their own features; $\tilde{D}_t$: the degree matrix, with $\tilde{D}_{ii,t} = \sum_j \tilde{A}_{ij,t}$, used to normalize the adjacency matrix and avoid gradient explosion; $W^{(1)}$: first layer weight matrix (dimension d × 64); $W^{(2)}$: second layer weight matrix (dimension 64 × 32); $b^{(l)}$: bias vectors for each layer (dimensions 64 and 32); the ReLU activation function enhances the model's nonlinear fitting ability; Output: $H_t^{(2)}$ (dimension N × 32), the anomaly representation of each node at time t after fusing neighborhood features.
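As an illustration of the two-layer convolution described in 3.2.1, the following is a minimal pure-Python sketch of a symmetric-normalized GCN forward pass. It assumes the standard propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W + b), takes an adjacency matrix without self-loops as input, and uses tiny toy dimensions instead of 64 and 32; the helper names (`matmul`, `normalized_adjacency`, `gcn_layer`, `gcn_forward`) are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for an adjacency matrix without self-loops."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[inv_sqrt[i] * a_tilde[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def gcn_layer(a_hat, h, w, b):
    """One graph convolution layer: ReLU(a_hat @ h @ w + b)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v + bias) for v, bias in zip(row, b)] for row in z]


def gcn_forward(adj, x, params):
    """Apply the layers in params (a list of (W, b) pairs) in order."""
    a_hat = normalized_adjacency(adj)
    h = x
    for w, b in params:
        h = gcn_layer(a_hat, h, w, b)
    return h


if __name__ == "__main__":
    adj = [[0.0, 1.0], [1.0, 0.0]]           # two connected devices
    x = [[1.0], [3.0]]                        # one toy feature per device
    params = [([[1.0, -1.0]], [0.0, 0.0]),    # toy layer 1: 1 -> 2
              ([[1.0], [1.0]], [0.5])]        # toy layer 2: 2 -> 1
    print(gcn_forward(adj, x, params))
```

In this toy graph both devices end up with the same representation, because each node averages its own feature with that of its only neighbour; this is the neighbourhood fusion the text refers to.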

[0119] 3.3 Anomaly Score and Fault Root Cause Location: Based on the node features output by the GCN, the anomaly score and the fault root cause probability are calculated to locate the core faulty equipment. The specific formulas and process are as follows:

[0120] 3.3.1 Anomaly Score Calculation: The anomaly score quantifies the degree of anomaly of device i at time t, fusing fault probability, RUL, and neighborhood anomaly characteristics. The formula is:

[0121] ;

[0122] The formula uses the first dimension of the GCN output for device i (corresponding to the failure probability dimension) and a failure probability weight that raises the anomaly score of high-probability equipment; a higher anomaly score indicates a more severe equipment anomaly.

[0123] 3.3.2 Fault Root Cause Probability Calculation: The probability that device i is the fault root cause is calculated by combining the anomaly score with historical fault data, namely the historical root cause frequency of device i up to time t (from S1). The higher this frequency, the greater the root cause probability weight.

[0124] 3.3.3 Fault Root Cause Location Process

[0125] 1. Screening for highly abnormal equipment: select the devices with high anomaly scores as the candidate root cause set;

[0126] 2. Calculate the root cause probability of the candidate set: for each device in the candidate set, calculate its fault root cause probability;

[0127] 3. Root cause identification: select the device with the maximum root cause probability as the fault root cause device $d_t^{root}$ (the subscript t indicates real-time updates);

[0128] 4. Verification: if the identified device accounts for ≥70% of historical fault root causes, the localization result is confirmed; otherwise, the candidate set is expanded and the calculation is repeated.
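The four localization steps can be sketched as follows. This is only an illustration under several assumptions: candidates are screened with a score threshold (`threshold=0.7` is a hypothetical value), the candidate set is expanded by lowering that threshold in steps of `step`, and the 70% check is read as "the share of historical faults whose root cause was this device". The anomaly scores and root cause probabilities are taken as given inputs rather than computed.

```python
def locate_root_cause(scores, root_probs, hist_share,
                      threshold=0.7, confirm_share=0.70, step=0.1):
    """Return (device_id, confirmed) following steps 1-4 of 3.3.3.

    scores, root_probs, hist_share: dicts keyed by device id.
    threshold and step are hypothetical; confirm_share is the 70% rule.
    """
    thr = threshold
    while True:
        # Step 1: screen highly abnormal devices.
        candidates = [d for d, s in scores.items() if s >= thr]
        # Steps 2-3: pick the candidate with the largest root cause probability.
        best = max(candidates, key=lambda d: root_probs[d]) if candidates else None
        # Step 4: verify against the historical root cause share.
        if best is not None and hist_share.get(best, 0.0) >= confirm_share:
            return best, True
        if thr <= 0.0:
            return best, False  # every device was considered, nothing confirmed
        thr = max(0.0, thr - step)  # expand the candidate set and recalculate


if __name__ == "__main__":
    scores = {"PSU-1": 0.9, "SW-2": 0.75, "CAM-3": 0.4}
    root_probs = {"PSU-1": 0.5, "SW-2": 0.3, "CAM-3": 0.2}
    hist_share = {"PSU-1": 0.8}
    print(locate_root_cause(scores, root_probs, hist_share))
```

Returning the unconfirmed best candidate when the whole device set has been exhausted is a design choice of this sketch; the text does not say what happens in that case.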

[0129] 3.4 Model Training: Supervised Root Cause Localization Optimization

[0130] The model is trained using supervised learning, with the root cause labels in S1 as supervision signals. The specific process is as follows:

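[0131] 3.4.1 Loss Function and Optimizer: The cross-entropy loss function is used to optimize the prediction accuracy of the root cause probability. The formula is: $L_{CE} = -\frac{1}{N_t}\sum_{m=1}^{N_t}\sum_{i=1}^{N} y_m^{(i)} \log \hat{p}_m^{(i)}$; where $N_t$ is the number of historical fault samples at time t, $y_m^{(i)}$ is the root cause label for sample m (1 if device i is the root cause, 0 otherwise), and $\hat{p}_m^{(i)}$ is the predicted root cause probability of device i for sample m; Optimizer: SGD optimizer, learning rate = 0.01, momentum = 0.9, number of iterations = 100 rounds.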

[0132] 3.4.2 Model Evaluation: Evaluation indicator: root cause localization accuracy ≥ 90% (i.e., the matching rate between the localization result and the actual root cause of the fault). If this is not met, the number of GCN layers or the adjacency matrix weights are adjusted.

[0133] 3.5 Output and Transmission: The root cause device (with timestamp t) is transmitted to the MDP state space of S4; the device anomaly scores with timestamp t (classified by device number) are transmitted to the MDP state space of S4; the corrected tiered early warning signal (integrating the anomaly score; high-anomaly devices are upgraded by one warning level) is transmitted to the operation and maintenance monitoring platform; the optimal GCN parameters are temporarily stored for model iteration and optimization in S5.

[0134] S4: Operation and maintenance decision optimization based on reinforcement learning

[0135] 4.1 Markov Decision Process (MDP) Modeling: This step first models the operation and maintenance decision-making problem of the rail transit low-voltage system as a Markov decision process. The core objective is to transform discrete decisions such as "maintenance timing, spare parts allocation, and personnel arrangement" into quantifiable and optimizable sequential decision problems, providing a decision framework for the subsequent Deep Q-Network (DQN) and Policy Gradient (PG) algorithms. The MDP quadruple is defined as $(\mathcal{S}, \mathcal{A}, P, R)$. The detailed definitions and construction logic of each element are as follows:

[0136] 4.1.1 State Space: The state space $\mathcal{S}$ is a comprehensive quantification of the current operation and maintenance scenario and includes 6 core dimensions, specifically defined as: $s_t = (d_t^{root},\ S_t^{(i)},\ RUL_t^{(i)},\ I_t,\ M_t,\ C_t)$;

[0137] $d_t^{root}$: the fault root cause device number (integer, value range [1, N], where N is the total number of devices in the system), taken directly from the fault root cause location result output by step S3; it is the core target of operation and maintenance decision-making;
$S_t^{(i)}$: the anomaly score of device i at time t (floating-point, range [0,1]), derived from the GNN fault propagation analysis results in step S3; the subscript t indicates the time dimension and the superscript (i) indicates the device number; it characterizes the degree of anomaly of related devices other than the root cause device;
$RUL_t^{(i)}$: the remaining useful life of device i at time t (floating-point, in hours), derived from the prediction result of the LSTM / Transformer + attention mechanism in step S2; the subscript t distinguishes the predicted values at different times;
$I_t$: the spare parts inventory vector at time t (dimension K, where K is the total number of spare parts types), $I_t = [I_{t,1}, \dots, I_{t,K}]$, where $I_{t,k}$ is the inventory quantity (integer) of the k-th type of spare part at time t; the data source is the spare parts management database of the rail transit weak current system, and it is the core basis for determining whether spare parts are available for allocation;
$M_t$: the number of maintenance personnel available for scheduling at time t (integer, value range [0, M], where M is the total number of people in the operation and maintenance team); the data source is the human resources management system, and it constrains whether a maintenance action has the personnel to be carried out;
$C_t$: the historical cumulative operation and maintenance cost of the device at time t (floating-point, unit: yuan), used to bring the cost dimension into decision optimization; the subscript t distinguishes the cumulative value at different times.

[0138] 4.1.2 Action Space: The action space $\mathcal{A}$ covers all feasible decision actions in the operation and maintenance scenario of rail transit low-voltage systems. There are 5 core actions, specifically defined as follows: $\mathcal{A} = \{a_1, a_2, a_3, a_4, a_5\}$;
$a_1$: Immediate maintenance (for the root cause device). Execution rule: at time t+1, dispatch currently available personnel for on-site troubleshooting and repair, without spare parts replacement. Applicable scenarios: the root cause device is highly anomalous and still has remaining useful life;
$a_2$: Delayed monitoring (for the root cause device and related abnormal devices). Execution rule: temporarily suspend maintenance actions, increase the monitoring frequency from the usual once per hour to once every 10 minutes, and continuously collect the device runtime sequence data defined by S1 until one of the two preset trigger thresholds (one of them expressed in hours) is reached. Applicable scenarios: the root cause device is anomalous and the inventory of core spare parts is insufficient;
$a_3$: Replace equipment (for the root cause device). Execution rule: take a spare part of the same model from the inventory for replacement; the old equipment enters the maintenance process after replacement, and the new equipment re-collects initial operating data. Applicable scenarios: the root cause device is extremely anomalous or its lifespan is exhausted, and corresponding spare parts are available;
$a_4$: Spare parts allocation + coordinated maintenance (for the root cause device and highly associated devices). Execution rule: first allocate the missing spare parts from the regional spare parts warehouse to the site (with a preset spare parts transportation time ranging from 1 to 4 hours); once the spare parts arrive, maintain the root cause device and the associated devices simultaneously. Applicable scenarios: a core node device (such as the signal control module) and multiple related devices are malfunctioning;
$a_5$: Joint maintenance (for high-risk equipment across the entire system). Execution rule: integrate the maintenance needs of all current high-risk equipment (high failure probability equipment as defined by S2), schedule personnel and spare parts in a unified way, and execute the maintenance actions in batches. Applicable scenarios: the number of high-risk devices in the system is ≥3, and batch operations are allowed during maintenance windows (such as nighttime shutdown periods).

[0139] 4.1.3 State Transition Probability: The state transition probability $P(s_{t+1} \mid s_t, a_t)$ characterizes the probability that, being in state $s_t$ at time t and executing action $a_t$, the system transitions to state $s_{t+1}$ at time t+1. It is calculated using the following formula:

[0140] $P(s_{t+1} \mid s_t, a_t) = \dfrac{N(s_t, a_t, s_{t+1}) + \alpha}{\sum_{s'} N(s_t, a_t, s') + \alpha\,|\mathcal{S}'|}$;

[0141] $N(s_t, a_t, s_{t+1})$: the number of samples "state $s_t$ → execute action $a_t$ → transition to state $s_{t+1}$" in the historical operation and maintenance data (source: the maintenance work order database defined in S1); $\alpha$: smoothing coefficient, used to avoid a probability of 0 when historical samples are missing; $|\mathcal{S}'|$: the number of all states that can be reached after executing $a_t$; core function: this probability quantifies the impact of different decision actions on the operation and maintenance scenario and is the core basis on which the subsequent DQN and PG algorithms "predict the benefits of a decision", ensuring that decision optimization fits the evolution of the actual operation and maintenance scenario.
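A count-based estimate with additive smoothing, as described in 4.1.3, might look like the sketch below. It assumes that states and actions are hashable labels and that the history is a list of `(state, action, next_state)` tuples; the function name `transition_prob` and the default `alpha=1.0` are hypothetical, since the value of the smoothing coefficient is not given here.

```python
from collections import Counter


def transition_prob(records, s, a, s_next, n_next_states, alpha=1.0):
    """Smoothed estimate of P(s_next | s, a) from historical work orders.

    records: iterable of (state, action, next_state) tuples.
    n_next_states: number of states reachable after executing the action.
    alpha: smoothing coefficient (hypothetical default).
    """
    counts = Counter(records)
    # Number of historical samples that start from (s, a), whatever the outcome.
    total = sum(c for (s0, a0, _), c in counts.items() if (s0, a0) == (s, a))
    return (counts[(s, a, s_next)] + alpha) / (total + alpha * n_next_states)


if __name__ == "__main__":
    history = [("high", "a1", "normal")] * 3 + [("high", "a1", "high")]
    print(transition_prob(history, "high", "a1", "normal", n_next_states=2))
    # A state-action pair never seen before still gets a non-zero probability.
    print(transition_prob(history, "low", "a2", "normal", n_next_states=2))
```

With smoothing, the probabilities over the reachable next states still sum to 1, and unseen transitions fall back to a uniform distribution.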

[0142] 4.1.4 Reward Function: The reward function is the core guide of the reinforcement learning algorithms. It quantifies the benefit of executing action $a_t$ in state $s_t$ by integrating three core operational goals: safety, cost, and equipment lifespan. The formula and parameter definitions are as follows:

[0143] ;

[0144] $P_t^{(root)}$: the failure probability of the root cause device at time t (floating-point, range [0,1]), derived from the prediction result of step S2; the subscript t denotes the time and the superscript (root) restricts it to the root cause device. The three weight coefficients are the security weight (range [0,1], default 0.5), the cost weight (range [0,1], default 0.3), and the equipment lifespan weight (range [0,1], default 0.2); they sum to 1 and can be dynamically adjusted according to operation and maintenance priority (e.g., raised to as much as 0.6 during holidays). The cost of a single operation and maintenance action (floating-point, unit: yuan) comes from the cost database, and its calculation rule differs by action: immediate maintenance is based on the staff hourly rate, with a default maintenance time of 2 hours; delayed monitoring is based on the cost of high-frequency monitoring, with a default monitoring duration of 12 hours; equipment replacement includes spare parts costs, with a default replacement time of 4 hours; spare parts allocation + coordinated maintenance includes spare parts transportation costs, with a default coordinated maintenance time of 6 hours; joint maintenance depends on the number of high-risk devices and the spare parts consumed, with a default batch maintenance time of 1.5 hours per device. The nominal design life of the root cause device (floating-point, unit: hours) comes from the equipment manual, with the superscript restricting it to the root cause device. Core function: the reward function condenses the three objectives of "reducing failure probability (improving safety), controlling operation and maintenance costs, and extending equipment life" into a single value; a positive reward (> 0) indicates that the benefit of the decision action outweighs its cost, and a negative reward (< 0) indicates that the decision action is not worthwhile, which gives the subsequent DQN and PG algorithms a clear evaluation criterion for the "optimal policy search".

[0145] 4.2 Construction of Deep Q-Network (DQN): The core function of the DQN algorithm is to quantify the long-term benefit (i.e., the Q-value) of executing action $a_t$ in state $s_t$, providing value guidance for the policy gradient algorithm. Its model construction, training process, and core formulas are as follows:

[0146] 4.2.1 DQN Network Structure Design: DQN uses a 3-layer fully connected perceptron (MLP) as its core network. The input is a combination of "state-action" features, and the output is the Q-value of this combination. The network structure and parameter definitions are as follows:

[0147] Input layer: dimension is $d_s + 1$, where $d_s = 6$ (state space dimension), with an additional 1 dimension for action encoding (actions $a_1$ to $a_5$ are encoded as one-hot vectors [1,0,0,0,0] to [0,0,0,0,1]). The input feature vector is $x_t = s_t \oplus a_t$ ($\oplus$ denotes feature concatenation);

[0148] Hidden layer 1: number of neurons = 128, activation function is ReLU, output is $h_1 = \mathrm{ReLU}(x_t W_1 + b_1)$, where $W_1$ is the weight matrix from the input layer to hidden layer 1 (dimension $(d_s + 1) \times 128$) and $b_1$ is the bias vector (dimension 128);

[0149] Hidden layer 2: number of neurons = 64, activation function is ReLU, output is $h_2 = \mathrm{ReLU}(h_1 W_2 + b_2)$, where $W_2$ is the weight matrix from hidden layer 1 to hidden layer 2 (dimension 128 × 64) and $b_2$ is the bias vector (dimension 64);

[0150] Output layer: number of neurons = 1, no activation function, output is the Q-value $Q(s_t, a_t; \theta) = h_2 W_3 + b_3$, where $\theta = \{W_1, b_1, W_2, b_2, W_3, b_3\}$ denotes all trainable parameters of the DQN, $W_3$ is the weight matrix from hidden layer 2 to the output layer (dimension 64 × 1), and $b_3$ is the bias term (dimension 1).

[0151] 4.2.2 Target Q-value Calculation: To avoid overfitting in Q-value estimation, DQN introduces a "target network" with parameters $\theta^-$. The target network has the same structure as the main network, but its parameters are synchronized (copied from the main network) only every 100 training rounds. The formula for the target Q-value is as follows:

[0152] $y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-)$;

[0153] $\gamma$: discount factor (value 0.95), used to balance "immediate reward" and "future reward".

[0154] $\max_{a'} Q(s_{t+1}, a'; \theta^-)$: the maximum Q-value over all possible actions at time t+1, representing the optimal long-term benefit that can be obtained in the future after executing $a_t$; core function: the target Q-value combines "immediate reward" and "optimal future benefit", which keeps the algorithm from focusing only on short-term benefits and ensures that decision optimization meets the core goal of "lowest long-term operation and maintenance cost and highest security".
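The target Q-value is a one-line computation once the target network's Q-values for the next state are available. The sketch below assumes the standard form "immediate reward plus discounted maximum next Q-value"; the `done` flag for terminal steps is an addition that the text does not mention, and `dqn_target` is a hypothetical helper name.

```python
GAMMA_DQN = 0.95  # discount factor stated in 4.2.2


def dqn_target(reward, next_q_values, done=False, gamma=GAMMA_DQN):
    """Target y = r + gamma * max_a' Q(s', a'; theta^-).

    next_q_values: Q-values of the five actions in the next state,
                   as produced by the target network.
    done: assumption of this sketch; a terminal step has no future term.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    # Hypothetical Q-values for a_1 .. a_5 in the next state.
    print(dqn_target(1.0, [0.2, 2.0, 1.1, -0.4, 0.9]))
```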

[0155] 4.2.3 DQN Training Process: DQN uses an "experience replay buffer" to store historical operation and maintenance trajectories to avoid training instability caused by sample correlation. The specific training process is as follows:

[0156] 1. Initialization: experience replay pool $\mathcal{D}$ (with a fixed capacity), main network parameters $\theta$ (random initialization), target network parameters $\theta^- = \theta$, learning rate = 0.001 (Adam optimizer);

[0157] 2. Trajectory Acquisition: within the MDP framework, select actions with an $\epsilon$-greedy strategy ($\epsilon$ decays linearly from 0.9 to 0.1) and collect the trajectory $\{(s_t, a_t, r_t, s_{t+1})\}_{t=1}^{T}$ (T is the length of a single trajectory, defaulting to 50 steps); the tuple of each step is stored in $\mathcal{D}$;

[0158] 3. Batch sampling: randomly sample a batch of size B = 64 from $\mathcal{D}$;

[0159] 4. Loss Calculation: the mean squared error loss is used to quantify the Q-value estimation error, as shown in the following formula:

[0160] $L(\theta) = \frac{1}{B}\sum_{b=1}^{B}\left(y_b - Q(s_b, a_b; \theta)\right)^2$;

[0161] where $y_b$ is the target Q-value of the b-th sample;

[0162] 5. Parameter Update: minimize $L(\theta)$ with the Adam optimizer to update the main network parameters $\theta$;

[0163] 6. Target network synchronization: after every 100 rounds of training, execute $\theta^- \leftarrow \theta$, so that the target network slowly tracks the main network;

[0164] 7. Convergence Criterion: when the loss $L(\theta)$ has declined for ten consecutive rounds with fluctuations below a preset threshold, stop training and output the optimal main network parameters $\theta^*$.
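Steps 1 to 3 of the training process rely on three small building blocks: a linearly decaying exploration rate, epsilon-greedy action selection, and a bounded replay pool. The sketch below shows one possible form of each using only the standard library. The decay horizon `total_steps`, the capacity passed to `ReplayBuffer`, and all function and class names are assumptions; the network itself and the Adam update are deliberately left out.

```python
import random
from collections import deque


def epsilon_at(step, total_steps, eps_start=0.9, eps_end=0.1):
    """Linear decay of epsilon from eps_start to eps_end over total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + (eps_end - eps_start) * frac


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


class ReplayBuffer:
    """Fixed-capacity experience pool; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def push(self, transition):
        self.data.append(transition)  # transition = (s, a, r, s_next)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks sample correlation.
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


if __name__ == "__main__":
    rng = random.Random(0)
    pool = ReplayBuffer(capacity=1000)  # hypothetical capacity
    for t in range(50):                 # one trajectory of 50 steps
        eps = epsilon_at(t, total_steps=50)
        action = select_action([0.1, 0.9, 0.3, 0.0, 0.2], eps, rng)
        pool.push((t, action, 0.0, t + 1))
    print(len(pool.sample(64, rng)))
```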

[0165] 4.3 Policy Gradient (PG) Construction: The core function of the policy gradient algorithm is to directly optimize the "policy function" (i.e., the mapping rule between "state" and "action"), and output directly executable operational decisions. Its model construction, training process, and core formulas are as follows:

[0166] 4.3.1 Policy Function Design: The policy function $\pi(a \mid s_t; \phi)$ represents the probability of selecting action a in state $s_t$. A softmax layer outputs the probability distribution, ensuring that the probabilities of all actions sum to 1. The specific formula and network structure are as follows:

[0167] Input layer: dimension is $d_s = 6$, and the input feature is $s_t$ (no action encoding is required, which distinguishes it from DQN);

[0168] Hidden layer 1: number of neurons = 128, activation function is ReLU, output is $h_1^{\pi} = \mathrm{ReLU}(s_t W_1^{\pi} + b_1^{\pi})$, where $W_1^{\pi}$ is the weight matrix from the input layer to hidden layer 1 (dimension 6 × 128) and $b_1^{\pi}$ is the bias vector (dimension 128);

[0169] Hidden layer 2: number of neurons = 64, activation function is ReLU, output is $h_2^{\pi} = \mathrm{ReLU}(h_1^{\pi} W_2^{\pi} + b_2^{\pi})$, where $W_2^{\pi}$ is the weight matrix from hidden layer 1 to hidden layer 2 (dimension 128 × 64) and $b_2^{\pi}$ is the bias vector (dimension 64);

[0170] Output layer: number of neurons = 5 (consistent with the action space dimension), activation function is softmax, output is:

[0171] $\pi(a_k \mid s_t; \phi) = \dfrac{\exp(z_k)}{\sum_{j=1}^{5} \exp(z_j)},\quad z = h_2^{\pi} W_3^{\pi} + b_3^{\pi}$;

[0172] where $\phi$ denotes all trainable parameters of PG, $W_3^{\pi}$ is the weight matrix from hidden layer 2 to the output layer (dimension 64 × 5), and $b_3^{\pi}$ is the output bias vector (dimension 5). For example, $\pi(a_1 \mid s_t; \phi) = 0.7$ indicates that in state $s_t$ the probability of choosing "immediate maintenance" is 70%.
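The softmax output layer can be illustrated in isolation. The sketch below converts five output-layer logits into action probabilities and picks the most probable action; subtracting the maximum logit before exponentiating is a standard numerical-stability step, not something the text prescribes, and the names `softmax`, `most_likely_action` and `ACTIONS` are hypothetical.

```python
import math

ACTIONS = (
    "immediate maintenance",
    "delayed monitoring",
    "replace equipment",
    "spare parts allocation + coordinated maintenance",
    "joint maintenance",
)


def softmax(logits):
    """Turn output-layer logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely_action(logits):
    """Return (action name, probability) for the highest-probability action."""
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    return ACTIONS[k], probs[k]


if __name__ == "__main__":
    logits = [2.0, 0.5, 0.1, -1.0, 0.0]  # hypothetical output-layer values
    print(softmax(logits))
    print(most_likely_action(logits))
```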

[0173] 4.3.2 Cumulative Reward Calculation: The policy gradient algorithm uses the "cumulative reward of a single trajectory" as its optimization guide. The cumulative reward formula is as follows:

[0174] $G_t = \sum_{k=t}^{T} \gamma_{PG}^{\,k-t}\, r_k$;

[0175] $\gamma_{PG}$: discount factor (value 0.9), slightly lower than that of DQN, placing more weight on short-term gains; T: total number of steps of a single trajectory; core function: the cumulative reward quantifies the total benefit of "executing the action sequence starting from time t" and is the core basis on which the policy gradient algorithm "increases the probability of high-quality actions and reduces the probability of low-quality actions".
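The cumulative reward of every step in a trajectory can be computed in a single backward pass, using the recursion "return at t = reward at t + discount × return at t+1". The sketch below assumes the usual discounted-sum definition of the return; `discounted_returns` is a hypothetical helper name.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return [G_1, ..., G_T] with G_t = sum_{k >= t} gamma^(k - t) * r_k."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the following step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Hypothetical three-step trajectory with a reward of 1 at every step.
    print(discounted_returns([1.0, 1.0, 1.0]))
```

For the three-step example the returns are approximately 2.71, 1.9 and 1.0, which shows how earlier steps accumulate the discounted value of later rewards.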

[0176] 4.3.3 PG Training Process: PG adopts the "Monte Carlo policy gradient" method, which directly updates the policy parameters through cumulative rewards. The specific process is as follows:

[0177] 1. Initialization: policy network parameters $\phi$ (random initialization), learning rate = 0.0005 (SGD optimizer);

[0178] 2. Trajectory Acquisition: within the MDP framework, select actions with the current policy $\pi(a \mid s; \phi)$ and collect M = 100 trajectories; each trajectory contains $(s_t, a_t, r_t)$ for t = 1 to T;

[0179] 3. Cumulative Reward Calculation: for each step of each trajectory, calculate the cumulative reward $G_t$;

[0180] 4. Loss Calculation: the "policy gradient loss" is used. Maximizing the policy gradient objective increases the probability of high-quality actions; taking the negative sign turns it into a minimization problem. The formula is as follows:

[0181] $L(\phi) = -\frac{1}{M}\sum_{m=1}^{M}\sum_{t=1}^{T} \log \pi(a_t^{(m)} \mid s_t^{(m)}; \phi)\, G_t^{(m)}$;

[0182] where $\log \pi(a_t^{(m)} \mid s_t^{(m)}; \phi)$ is the log probability of the chosen action and the cumulative reward $G_t^{(m)}$ acts as its weight: positive rewards increase the probability of the corresponding action, while negative rewards decrease it;

[0183] 5. Parameter Update: minimize $L(\phi)$ with the SGD optimizer to update the policy parameters $\phi$;

[0184] 6. Convergence Judgment: when the cumulative reward of the optimal action (the action with the highest probability) output by the policy has improved for 20 consecutive rounds with a fluctuation of less than 0.01, training stops and the optimal policy parameters $\phi^*$ are output.
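The loss in step 4 can be evaluated directly from the probabilities the policy assigned to the actions it actually took. The sketch below assumes the Monte Carlo (REINFORCE-style) form, averaged over trajectories; the averaging over M and the input layout (one list of `(probability, return)` pairs per trajectory) are assumptions, and gradient computation is omitted.

```python
import math


def pg_loss(trajectories):
    """Negative return-weighted log-likelihood, averaged over trajectories.

    trajectories: list of trajectories; each trajectory is a list of
    (prob_of_chosen_action, cumulative_reward) pairs, one per step.
    """
    total = sum(math.log(p) * g for traj in trajectories for p, g in traj)
    return -total / len(trajectories)


if __name__ == "__main__":
    # Raising the probability of an action with positive return lowers the loss.
    print(pg_loss([[(0.5, 1.0)]]))
    print(pg_loss([[(0.7, 1.0)]]))
```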

[0185] 4.4 Algorithm Fusion: To address the issues of DQN ("only estimating value without direct decision output") and PG ("large variance and unstable training"), this scheme designs a bidirectional interaction mechanism between DQN and PG to achieve collaborative optimization of "value-guided strategy and strategy feedback of value." The specific interaction process and formulas are as follows:

[0186] 4.4.1 DQN Value Enhancement of PG: Reducing Policy Training Variance

[0187] The Q-value output by DQN is used as a "reward enhancement term" to correct the cumulative reward of PG, thereby reducing the variance of the training process. The corrected cumulative reward formula is as follows:

[0188] $\tilde{G}_t = G_t + \lambda\, Q(s_t, a_t; \theta^*)$;

[0189] $\lambda$: fusion coefficient (value 0.1), used to balance "actual cumulative reward" and "DQN estimated value";

[0190] $Q(s_t, a_t; \theta^*)$: the Q-value output by the trained DQN, representing the long-term value of action $a_t$.

[0191] The revised PG loss formula:

[0192] $L'(\phi) = -\frac{1}{M}\sum_{m=1}^{M}\sum_{t=1}^{T} \log \pi(a_t^{(m)} \mid s_t^{(m)}; \phi)\, \tilde{G}_t^{(m)}$;

[0193] Core function: After introducing DQN value estimation, the training variance of PG is reduced by about 30% (as verified by experiments), avoiding policy oscillations caused by random rewards for a single trajectory and improving training stability.

[0194] 4.4.2 PG Policy Feedback to DQN: The policy probability output by PG is used as the "weight of future action selection" to correct the target Q-value of DQN, so that DQN does not focus only on the "maximum Q-value action" while ignoring other possible actions. The formula for the corrected target Q-value is as follows:

[0195] $y'_t = r_t + \gamma \sum_{a' \in \mathcal{A}} \pi(a' \mid s_{t+1}; \phi^*)\, Q(s_{t+1}, a'; \theta^-)$;

[0196] $\sum_{a' \in \mathcal{A}} \pi(a' \mid s_{t+1}; \phi^*)\, Q(s_{t+1}, a'; \theta^-)$: the expected Q-value under the PG optimal policy, which replaces the "maximum Q-value" in the original target Q-value.

[0197] Core function: The revised DQN target Q value is more in line with the actual execution rules of the strategy, avoids overly optimistic Q value estimation, and improves the accuracy of value estimation.
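The effect of replacing the maximum with a policy-weighted expectation can be seen in a small sketch. It assumes the corrected target takes the form "reward plus discounted expected next Q-value under the policy"; `expected_target` is a hypothetical helper, and the Q-values and probabilities in the example are made up.

```python
def expected_target(reward, next_q_values, next_policy_probs, gamma=0.95):
    """Target y' = r + gamma * sum_a' pi(a' | s') * Q(s', a')."""
    expected_q = sum(p * q for p, q in zip(next_policy_probs, next_q_values))
    return reward + gamma * expected_q


if __name__ == "__main__":
    q_next = [0.2, 2.0, 1.1, -0.4, 0.9]       # hypothetical target-network Q-values
    pi_next = [0.1, 0.6, 0.2, 0.05, 0.05]     # hypothetical policy probabilities
    print(expected_target(1.0, q_next, pi_next))
    # For comparison: the original max-based target for the same step.
    print(1.0 + 0.95 * max(q_next))
```

Because an expectation can never exceed the maximum, the corrected target is at most as large as the max-based one, which matches the stated aim of avoiding overly optimistic estimates.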

[0198] 4.4.3 Integrated Training Process

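[0199] 1. Pre-training: first, train DQN independently until convergence and output $\theta^*$; then substitute $\theta^*$ into the corrected cumulative reward, train PG until convergence, and output $\phi^*$;

[0200] 2. Interactive iteration: substitute $\phi^*$ into the corrected target Q-value and retrain DQN (50 iterations) to update $\theta^*$; then substitute the updated $\theta^*$ into the corrected cumulative reward and retrain PG (50 iterations) to update $\phi^*$;

[0201] 3. Termination Condition: when the parameter changes of $\theta^*$ and $\phi^*$ are both below a preset threshold, the fusion iteration stops and the final $\theta^*$ and $\phi^*$ are output.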

[0202] 4.5 Optimal Strategy Output: Executable Operation and Maintenance Decisions

[0203] Based on the optimal policy parameters $\phi^*$ obtained from fusion training, the optimal operation and maintenance decision is output. The specific process is as follows:

[0204] 1. Input the current state $s_t$ (real-time data on the root cause device, anomaly scores, RUL, inventory, personnel, etc.);

[0205] 2. Substitute it into the policy function to calculate the probabilities of all actions: $\pi(a_k \mid s_t; \phi^*)$, k = 1, ..., 5;

[0206] 3. Select the action with the highest probability as the optimal action: $a^* = \arg\max_{a_k} \pi(a_k \mid s_t; \phi^*)$;

[0207] 4. Based on $a^*$, output the specific operation and maintenance execution details:

[0208] If $a^* = a_1$, output: "Maintenance timing: time t+1; personnel allocation: the two highest-priority maintenance personnel; no spare parts required";

[0209] If $a^* = a_2$, output: "Monitoring duration: 12 hours; monitoring frequency: once every 10 minutes; maintenance trigger threshold: either of the two preset thresholds (one of them expressed in hours)";

[0210] If $a^* = a_3$, output: "Spare parts allocation: retrieve a spare part of the same model (by part number) from the inventory; replacement timing: nighttime downtime (23:00-04:00); personnel allocation: 3 maintenance personnel";

[0211] If $a^* = a_4$, output: "Spare parts allocation: retrieve the spare parts (by part number) from the regional spare parts warehouse; transportation time: 2 hours; maintenance timing: after the spare parts arrive (time t+2); personnel allocation: 4 maintenance personnel; list of linked maintenance equipment";

[0212] If $a^* = a_5$, output: "Maintenance window: 00:00-05:00 the next day; personnel allocation: all available personnel (M people); spare parts allocation and maintenance equipment list".

[0213] 4.6 Output and Transmission

[0214] The optimal operation and maintenance decision (including the maintenance timing, the list of spare parts to be allocated, and the number of personnel to be scheduled) is transmitted to step S5 as the input for "Operation and Maintenance Strategy Execution and Model Iteration Optimization"; at the same time, the fused parameters $\theta^*$ and $\phi^*$ are temporarily stored for model iteration updates in step S5. The operation and maintenance decision optimization process is shown in Figure 4.

[0215] 4.7 Summary of Core Algorithm Contributions

[0216] Markov Decision Process (MDP): It transforms complex operation and maintenance decision problems into a structured sequential decision framework, realizes the quantitative mapping of "state-action-reward", and provides a unified decision context for subsequent reinforcement learning algorithms;

[0217] Deep Q-Networks (DQN): Accurately estimates the long-term value of each "state-action" combination, solving the "curse of dimensionality" problem of traditional Q-learning and providing a reliable value guide for policy optimization;

[0218] Policy Gradient (PG): Directly optimizes the decision policy and outputs the probability distribution of executable actions, making up for the shortcomings of DQN that "only estimates value and does not make direct decisions".

[0219] Two-way interactive fusion mechanism: By reducing the training variance of PG through DQN and correcting the value estimation bias of DQN through PG, the advantages of the two algorithms are complemented. The final output operation and maintenance decision has the characteristics of "optimal long-term benefits" and "high execution stability". Compared with a single algorithm, the operation and maintenance cost is reduced by about 15% and the fault response efficiency is improved by about 20% (verified by simulation).

[0220] S5: Operation and Maintenance Strategy Execution and Model Iteration Optimization

[0221] 5.1 Execution of Operation and Maintenance Strategy: The core objective of this step is to execute the optimal operation and maintenance decisions output by S4. It records all data during the execution process to provide feedback samples for model iteration and optimization. The specific execution process and data recording are as follows:

[0222] 5.1.1 Decision Analysis and Resource Scheduling: Based on the optimal action $a^*$ output by S4, parse the specific execution rules and schedule the corresponding resources (spare parts, personnel). The execution details of all actions are enumerated below:

[0223] If $a^* = a_1$ (immediate maintenance):

[0224] 1. Resource scheduling: dispatch the two highest-priority available maintenance personnel (who are qualified to repair this equipment); no spare parts are required;

[0225] 2. Timing of execution: time t+1 (the next data collection cycle, i.e., 1 minute later);

[0226] 3. Execution process: arrive at the equipment installation location → collect real-time operating data → troubleshoot fault points (such as checking the power supply link if the voltage is abnormal) → repair faults → restart equipment → verify operating status;

[0227] If $a^* = a_2$ (delayed monitoring):

[0228] 1. Monitoring configuration: the monitoring frequency of the root cause device and its associated devices is changed from once per minute to once every 10 minutes;

[0229] 2. Triggering condition: set the two trigger thresholds (one of them expressed in hours); when a threshold is reached, the corresponding maintenance action is triggered automatically;

[0230] 3. Monitoring duration: 12 hours by default. Regular monitoring resumes if no threshold is triggered.

[0231] If $a^* = a_3$ (equipment replacement):

[0232] 1. Resource scheduling: retrieve a spare part of the same model (by part number) from the inventory and dispatch 3 maintenance personnel (including 1 equipment replacement specialist);

[0233] 2. Timing of execution: choose the nighttime shutdown period (23:00-04:00) to avoid affecting operations;

[0234] 3. Execution process: power off → remove old equipment → install new spare parts → parameter debugging → power-on test → enter new equipment information into the PLM database;

[0235] If $a^* = a_4$ (spare parts allocation + coordinated maintenance):

[0236] 1. Resource scheduling: retrieve the spare parts (by part number) from the regional spare parts warehouse, dispatch four maintenance personnel, and notify the persons in charge of maintaining the related equipment at the same time;

[0237] 2. Timing of execution: after the spare parts arrive, i.e., time t plus the spare parts transportation time (1 to 4 hours);

[0238] 3. Execution process: spare parts in place → synchronous power off (root cause equipment + related equipment) → maintenance / replacement → batch power-on test → verify whether the fault propagation path has been eliminated;

[0239] If $a^* = a_5$ (joint maintenance):

[0240] 1. Resource scheduling: schedule all available personnel (M people) and retrieve the spare parts covering all high-risk equipment;

[0241] 2. Timing of execution: select the next day's service suspension window (00:00-05:00);

[0242] 3. Execution process: divide into maintenance teams → perform batch maintenance on high-risk equipment → unified testing → generate batch maintenance reports.

[0243] 5.1.2 Execution Data Recording: Record all data after the decision is executed to form the core sample for closed-loop feedback. Specific recording content includes:

[0244] Post-execution device operation data: collected for 24 hours with the sampling frequency increased to once per minute;

[0245] Actual operation and maintenance costs: personnel costs + spare parts costs + transportation costs, providing a basis for adjusting the S4 reward function;

[0246] Failure probability / RUL variation: the change in failure probability and in RUL before and after execution, quantifying the effectiveness of decision implementation;

[0247] Fault clearance status: 1 indicates that the fault has been eliminated and 0 indicates that it has not, providing a supervision signal for model iteration.

[0248] 5.2 Model Iterative Optimization: Parameter Updates Based on Closed-Loop Feedback

[0249] Based on the execution data, the model parameters of S2, S3, and S4 are iteratively updated to ensure that the model continuously adapts to changes in system state. The specific iteration process is as follows:

[0250] 5.2.1 S2 Prediction Model Iteration

[0251] 1. Sample Supplementation: add the newly recorded post-execution samples to the training set of S2;

[0252] 2. Retraining: keep the model structure fixed and update only the weight parameters, for 50 training rounds (fewer than in the initial training, to improve efficiency);

[0253] 3. Evaluation and Adjustment: if the model AUC decreases by ≥0.02 after the update, roll back the parameters, adjust the learning rate (to 0.0005), and retrain.
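The retrain, evaluate, and roll back sequence of 5.2.1 can be sketched as follows. The sketch is framework-agnostic: `train` and `evaluate_auc` are hypothetical callables supplied by the caller, and parameters are represented as a plain dictionary purely for illustration. The 50 rounds, the 0.02 AUC threshold, and the 0.0005 learning rate come from the text; the initial learning rate is not given in this section and is therefore a required argument.

```python
import copy
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def retrain_with_rollback(
    params: Params,
    train: Callable[[Params, float, int], Params],
    evaluate_auc: Callable[[Params], float],
    lr: float,
    epochs: int = 50,
    max_auc_drop: float = 0.02,
    fallback_lr: float = 0.0005,
) -> Tuple[Params, float]:
    """Update weights only; roll back and retrain if AUC degrades too much."""
    baseline_auc = evaluate_auc(params)
    snapshot = copy.deepcopy(params)  # kept so the update can be undone

    candidate = train(copy.deepcopy(params), lr, epochs)
    new_auc = evaluate_auc(candidate)

    if baseline_auc - new_auc >= max_auc_drop:
        # Roll back to the snapshot and retrain with the adjusted learning rate.
        candidate = train(copy.deepcopy(snapshot), fallback_lr, epochs)
        new_auc = evaluate_auc(candidate)
    return candidate, new_auc
```

The source does not say what happens if the second attempt also degrades the AUC, so the sketch simply returns the second result together with its AUC and leaves that decision to the caller.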

[0254] 5.2.2 S3 GCN Model Iteration

[0255] 1. Sample Supplementation: add the fault cleared status and the new fault propagation path records to the training set of S3;

[0256] 2. Adjacency Matrix Update: update the connection weights of the adjacency matrix based on the device connection status after execution (e.g., communication restored);

[0257] 3. Retraining: update the GCN weight parameters, for 30 training rounds;

[0258] 4. Evaluation and adjustment: If the root cause localization accuracy drops by ≥0.05, increase the number of GCN layers (up to 3 layers) and retrain.
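The adjacency matrix update and the layer adjustment rule of 5.2.2 can be illustrated with the short sketch below. The function names are hypothetical, the graph is assumed to be undirected (so weights are written symmetrically), and the concrete weight values are left to the caller because this section does not state them.

```python
from typing import Dict, List, Tuple


def update_adjacency(
    adj: List[List[float]],
    link_weights: Dict[Tuple[int, int], float],
) -> List[List[float]]:
    """Return a copy of adj with the weights of the reported links overwritten."""
    new_adj = [row[:] for row in adj]
    for (i, j), weight in link_weights.items():
        new_adj[i][j] = weight
        new_adj[j][i] = weight  # assumption: connections are symmetric
    return new_adj


def next_gcn_layers(
    current_layers: int,
    accuracy_before: float,
    accuracy_after: float,
    max_drop: float = 0.05,
    max_layers: int = 3,
) -> int:
    """Add one GCN layer (capped at max_layers) if localization accuracy fell."""
    if accuracy_before - accuracy_after >= max_drop:
        return min(current_layers + 1, max_layers)
    return current_layers
```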

[0259] 5.2.3 S4 Reinforcement Learning Model Iteration

[0260] 1. Sample Supplementation: add the state transition records to the MDP training set of S4;

[0261] 2. Reward function adjustment: adjust the weighting coefficients of the reward function based on the actual cost (for example, if the cost exceeds the budget, increase the weight of the cost term);

[0262] 3. Fusion Model Update: with the DQN / PG network structure fixed, update the parameters for 20 training rounds;

[0263] 4. Strategy Validation: output 10 sets of decisions using the updated model and check the cumulative reward; the target is an improvement of ≥0.1, and if the target is not met, adjust the fusion coefficient.
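The strategy validation step can be sketched as a simple comparison of cumulative rewards. Two points are assumptions rather than statements from the source: the improvement is read as the absolute difference between the mean cumulative reward of the updated model and that of the previous model, and the function name `strategy_passes` is hypothetical. How the fusion coefficient should be adjusted on failure is not specified, so the sketch only reports whether the target is met.

```python
from statistics import mean
from typing import Sequence


def strategy_passes(
    baseline_rewards: Sequence[float],
    updated_rewards: Sequence[float],
    min_gain: float = 0.1,
    n_decisions: int = 10,
) -> bool:
    """Check whether the updated model improves the mean cumulative reward enough."""
    if len(updated_rewards) != n_decisions:
        raise ValueError(f"expected {n_decisions} decisions from the updated model")
    gain = mean(updated_rewards) - mean(baseline_rewards)
    # False means the target is not met and the fusion coefficient needs adjusting.
    return gain >= min_gain
```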

[0264] 5.3 Closed-loop verification: After iterative optimization, S1 to S4 are re-executed to verify the core operation and maintenance indicators, ensuring the correct direction of model iteration. The evaluation indicators are enumerated as follows:

[0265] Fault prediction accuracy: ;

[0266] Root cause localization accuracy: ;

[0267] Operation and maintenance cost reduction rate: ;

[0268] Fault response time: Hours (high-risk fault) Hours (medium risk fault);

[0269] System reliability: .

[0270] If all indicators meet the requirements, the iteration is completed; if not, repeat steps S5.1 to S5.3 until the indicators meet the requirements. The iteration cycle is once a month by default, and shortened to once a week during high failure periods (such as high temperature or heavy rain).
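The verification loop and the iteration cycle can be sketched as follows. The indicator targets are not legible in this section, so they are passed in by the caller rather than hard-coded. The names `indicators_met`, `run_until_met`, and `iteration_interval`, the 30-day approximation of "once a month", and the `max_rounds` guard are assumptions added for illustration.

```python
from datetime import timedelta
from typing import Callable, Dict, Iterable, Optional


def iteration_interval(high_failure_period: bool) -> timedelta:
    """Monthly by default (approximated as 30 days), weekly in high failure periods."""
    return timedelta(weeks=1) if high_failure_period else timedelta(days=30)


def indicators_met(
    measured: Dict[str, float],
    targets: Dict[str, float],
    lower_is_better: Iterable[str] = (),
) -> bool:
    """True when every indicator reaches its target (e.g. response time: lower is better)."""
    lower = set(lower_is_better)
    for name, target in targets.items():
        value = measured[name]
        ok = value <= target if name in lower else value >= target
        if not ok:
            return False
    return True


def run_until_met(
    run_round: Callable[[], Dict[str, float]],
    targets: Dict[str, float],
    lower_is_better: Iterable[str] = (),
    max_rounds: int = 10,
) -> Optional[int]:
    """Repeat steps 5.1 to 5.3 (wrapped in run_round) until all indicators pass."""
    lower = tuple(lower_is_better)
    for round_no in range(1, max_rounds + 1):
        if indicators_met(run_round(), targets, lower):
            return round_no
    return None  # targets not reached within the guard limit
```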

[0271] 5.4 Output and Application: The optimal parameters of the iterated S2, S3, and S4 models are used to replace the original model parameters for the next round of operation and maintenance decisions; the operation and maintenance effect report (including fault elimination rate, cost changes, and equipment reliability) is submitted to the operation and maintenance management platform; the optimized operation and maintenance strategy library is supplemented with new decision cases to improve the generalization ability of the model.

Claims

1. A method for operation and maintenance management of a rail transit low-voltage system, characterized in that it comprises: S1: collecting equipment runtime sequence data, an equipment static attribute set, and historical operation and maintenance data of the rail transit low-voltage system, and preprocessing the equipment runtime sequence data to obtain standardized time-series window data; S2: inputting the standardized time-series window data into a long short-term memory network or a Transformer time-series model that incorporates an attention mechanism, to capture long-term dependencies in the equipment operation data and output the equipment failure probability and remaining service life; S3: based on the equipment static attribute set, the equipment failure probability, and the remaining service life, and in combination with the rail transit low-voltage system topology database, constructing a dynamic topology graph of the equipment, inputting the dynamic topology graph into a graph neural network to propagate fault signals, calculating the anomaly score of each piece of equipment, and outputting the fault root cause equipment in combination with the frequency of fault root causes in the historical operation and maintenance data; S4: based on the equipment failure probability and remaining service life, the fault root cause equipment, and the anomaly score of each piece of equipment, constructing a Markov decision process model in combination with the spare parts inventory data of the spare parts management system and the maintenance personnel headcount data of the human resources management system, and outputting the optimal operation and maintenance decision by combining a deep Q network and a policy gradient algorithm. The process by which S4 combines the Markov decision process model, the deep Q network, and the policy gradient algorithm to output the optimal operation and maintenance decision is as follows: first, a deep Q network is built based on the state space and action space of the constructed Markov decision process model. 
The deep Q network is trained by collecting operation and maintenance trajectory samples through experience replay pool and outputting the Q value of each state-action combination. Then, based on the Markov decision process model, a policy function for the policy gradient algorithm is constructed. The Q value output by the deep Q network is used as a reward enhancement term to modify the cumulative reward of the policy gradient algorithm. The policy gradient algorithm is trained to obtain the probability distribution of each maintenance action. Finally, the real-time equipment failure probability, remaining service life, root cause equipment, abnormal scores of each equipment, spare parts inventory, and number of maintenance personnel are input into the trained policy function. The action corresponding to the maximum probability is selected as the core maintenance action. The maintenance timing is determined by combining the spare parts transportation time, the nominal life of the equipment, and the rail transit shutdown maintenance window. Maintenance personnel are allocated according to the equipment maintenance qualification requirements, and a spare parts allocation plan is formulated according to the spare parts inventory. Finally, the optimal maintenance decision including maintenance timing, spare parts allocation plan, and personnel arrangement plan is formed. The state space of the Markov decision process model includes the fault root cause device number, the anomaly score of each device, the remaining service life of the device, the spare parts inventory vector, the number of maintenance personnel, and the cumulative operation and maintenance cost of the device; the action space includes five types of operation and maintenance actions: immediate maintenance, delayed monitoring, device replacement, spare parts allocation + coordinated maintenance, and coordinated maintenance. 
Here, the deep Q network and the policy gradient algorithm form a two-way interaction mechanism. In the first direction, the Q-value output by the deep Q network is used as a reward enhancement term to correct the cumulative reward of the policy gradient algorithm, thereby reducing the variance of the training process. The corrected cumulative reward is computed from the following quantities: the cumulative reward; the fusion coefficient, which balances the actual cumulative reward against the value estimated by the deep Q network; and the optimal Q-value output by the trained deep Q network, which represents the long-term value of the optimal action output by the optimal operation and maintenance decision and is evaluated with the optimal main network parameters for the given state and action. The corrected loss of the policy gradient algorithm is computed from the log probability of the chosen action and the corrected cumulative reward. In the second direction, the policy probabilities output by the policy gradient algorithm are used as weights to correct the target Q-value of the deep Q network, so that the deep Q network does not focus only on the action with the maximum Q-value while ignoring other possible actions; in the corrected target Q-value, the expected Q-value under the optimal policy of the policy gradient algorithm replaces the maximum Q-value of the original target Q-value. The fusion training process is as follows: (1) Pre-training: first train the deep Q network independently until it converges and output its optimal parameters; then substitute its Q-values into the corrected cumulative reward, train the policy gradient algorithm until it converges, and output its optimal parameters; (2) Interactive iteration: substitute the policy probabilities into the corrected target Q-value, retrain the deep Q network, and update its parameters; then substitute the updated Q-values into the corrected cumulative reward, retrain the policy gradient algorithm, and update its parameters; (3) Termination condition: when the parameter changes of the deep Q network and of the policy gradient algorithm are both less than a threshold, the fusion iteration stops and the final parameters of both are output.

2. The operation and maintenance management method for rail transit low-voltage systems according to claim 1, characterized in that it further comprises S5: executing the optimal operation and maintenance decision; recording the equipment operation data, the actual operation and maintenance cost, and the changes in failure probability and remaining service life after execution; and feeding these records back to S2, S3, and S4 to update the parameters of the long short-term memory network or Transformer time-series model, of the graph neural network, and of the Markov decision process model, deep Q network, and policy gradient algorithm, respectively, thereby completing the iterative optimization of the models.

3. The operation and maintenance management method for rail transit low-voltage systems according to claim 1, characterized in that, The preprocessing of the device runtime sequence data in S1 includes the following steps executed sequentially: outlier handling, missing value filling, normalization, and time window partitioning. Outlier handling uses the 3σ criterion to determine outliers and fills them with the mean of the time neighborhood and the device neighborhood. Missing value filling uses linear interpolation to fill missing data caused by communication interruption. Normalization uses min-max normalization to eliminate the dimensional differences of different operating parameters.

4. The operation and maintenance management method for rail transit low-voltage systems according to claim 1, characterized in that, The fusion attention mechanism described in S2 is specifically an additive attention mechanism. It calculates the attention score at each time step, normalizes the attention score to obtain the attention weight, and then uses the attention weight to weight and fuse the temporal features output by the Long Short-Term Memory Network or the Transformer temporal model to enhance the feature contribution of key time steps in the fault latency period.

5. The operation and maintenance management method for rail transit low-voltage systems according to claim 1, characterized in that, S2 also includes a step of generating a graded early warning signal based on the output device failure probability. The graded early warning signal includes three levels: low risk, medium risk, and high risk. Low risk corresponds to a failure probability of 0 to 0.3, medium risk corresponds to a failure probability of 0.3 to 0.7, and high risk corresponds to a failure probability of 0.7 to 1. S3 also includes a step of correcting the graded early warning signal by combining the abnormal scores of each device. The early warning level of highly abnormal devices is increased by one level.

6. The operation and maintenance management method for rail transit low-voltage systems according to claim 1, characterized in that the device dynamic topology graph described in S3 is a timestamped dynamic topology graph. Its adjacency matrix is assigned different initial weights according to the type of power supply connection, communication connection, control command connection, and data interaction connection between devices, and the adjacency matrix is updated every 10 minutes according to the real-time connection status of the devices.

7. The operation and maintenance management method for rail transit low-voltage systems according to claim 1, characterized in that the step in S3 of outputting the fault root cause device further includes a candidate root cause set screening and verification process: first, select the devices with an anomaly score ≥ 0.5 as the candidate root cause set, calculate the fault root cause probability of each device in the candidate set, and select the device with the maximum probability as the initial fault root cause device; if the historical fault root cause proportion of that device is less than 70%, expand the candidate set to devices with anomaly scores ≥ 0.3, recalculate the fault root cause probability, and determine the fault root cause device.

8. The operation and maintenance management method for rail transit low-voltage systems according to claim 2, characterized in that, In S5, the model iteration and optimization cycle is dynamically adjusted according to the system's operating status. Under normal operating conditions, it is iterated once a month, while during periods of high failure rate due to high temperatures or heavy rain, it is iterated once a week. During iteration, the model structure is fixed and only the weight parameters are updated. If the core indicators of fault prediction accuracy and root cause location accuracy decrease after iteration, the parameters are rolled back and the learning rate is adjusted to retrain the model.