A Method and System for Identifying Abnormal Wear Conditions of Pantograph Carbon Sliding Plates Based on Random Forest

By using a random forest-based approach and constructing a model based on the operating data of the pantograph carbon sliding plate, abnormal wear can be identified and analyzed. This solves the problems of high identification cost and low accuracy in existing technologies, and improves the safety and reliability of railway transportation.

CN118898034B · Active · Publication Date: 2026-06-30 · XI AN JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
XI AN JIAOTONG UNIV
Filing Date
2024-07-26
Publication Date
2026-06-30

Abstract

This invention discloses a method and system for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest, belonging to the field of fault diagnosis and intelligent operation and maintenance. The method collects characteristic data of pantograph carbon sliding plate wear; preprocesses the characteristic data, and establishes an input matrix and a target vector based on the preprocessed characteristic data; uses the input matrix and target vector as training data to train a random forest regression model, and fine-tunes the parameters of the random forest regression model through cross-validation to obtain a parametric model of abnormal wear of the pantograph carbon sliding plate; based on the feature importance index of the parametric model of abnormal wear of the pantograph carbon sliding plate, it identifies the key causes of abnormal wear of the pantograph carbon sliding plate. This method can identify the wear condition of the pantograph carbon sliding plate and analyze the key factors under abnormal wear conditions, playing a significant role in maintaining the safe operation of trains.

Description

Technical Field

[0001] This invention relates to the field of fault diagnosis and intelligent operation and maintenance, specifically to a method and system for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest.

Background Technology

[0002] In modern railway transportation systems, the health of the pantograph carbon sliding plate is crucial for ensuring safe train operation. As a key component of the pantograph, the carbon sliding plate directly contacts the overhead contact line to transmit electrical energy. However, under complex operating and environmental factors such as temperature variation, humidity, train speed, current, and voltage, the carbon sliding plate is prone to abnormal wear. This not only disrupts normal train operation but can also lead to safety accidents, resulting in significant economic losses and social impact. Therefore, timely and accurate identification of abnormal wear conditions of the carbon sliding plate and analysis of their key causes are of great significance for improving the safety and reliability of railway transportation.

[0003] With the development of big data and machine learning technologies, fault diagnosis and intelligent operation and maintenance of pantograph carbon sliding plates are gradually being implemented. Compared with traditional periodic maintenance, condition-based intelligent maintenance is performed according to the actual condition of the carbon sliding plate, which avoids unnecessary inspections and replacements and thereby reduces maintenance costs. However, existing research on intelligent and automated maintenance of pantograph carbon sliding plates often focuses on only a single train, analyzes the importance of the influencing factors only qualitatively, and requires long training times. In practical applications, this not only increases the cost of identifying and analyzing abnormal wear but may also reduce the accuracy of maintenance decisions.

Summary of the Invention

[0004] To address the issues of high complexity, large data volume, and limited analyzable samples in pantograph carbon sliding plate wear data, this invention provides a method and system for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest. By analyzing data collected during operation of the pantograph carbon sliding plate, including parameters such as train speed, lifting force, contact temperature, current, voltage, and arc energy, the random forest model can identify the occurrence of abnormal wear and further analyze the key factors affecting wear, thus helping to ensure the safety and reliability of train operation.

[0005] This invention is achieved through the following technical solution:

[0006] A method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest includes the following steps:

[0007] Step 1: Collect characteristic data on the wear of the pantograph carbon sliding plate;

[0008] Step 2: Preprocess the feature data, and establish the input matrix and target vector based on the preprocessed feature data;

[0009] Step 3: Use the input matrix and target vector as training data to train the random forest regression model, and fine-tune the parameters of the random forest regression model through cross-validation to obtain the parameter model of abnormal wear of the pantograph carbon slide plate.

[0010] Step 4: Identify the key causes of abnormal wear of the pantograph carbon sliding plate based on the feature importance index of the parameter model.

[0011] Preferably, the feature data in step 1 includes vehicle speed, lifting force, contact temperature, current and voltage, arc energy, and pull-out value.

[0012] Preferably, the feature data refers to the feature data of normal and abnormal wear of the pantograph carbon sliding plate of the same train or / and trains of the same model.

[0013] Preferably, the preprocessing of the feature data in step 2 includes missing value processing and normalization processing.

[0014] Preferably, the method for handling missing values is as follows:

[0015] Linear interpolation is used to fill in missing values in the feature data; normalization is used to standardize the feature data.

[0016] Preferably, the training method for the parameter model of abnormal wear of the pantograph carbon slide plate in step 3 is as follows:

[0017] S3.1 Construct multiple decision trees. Each decision tree randomly selects a subset of features from all feature data, determines the split point based on Gini impurity, and continuously splits the feature set into subsets starting from the root node until the stopping condition is met.

[0018] S3.2 Construct a random forest regression model based on the outputs of all decision trees, and use a vote over those outputs to determine the prediction result for each sample.

[0019] S3.3 The number of decision trees, the maximum depth of the decision trees, and the minimum number of samples required to split a node are used as key parameters. Cross-validation is used to fine-tune the key parameters of the random forest regression model, and the fine-tuned parameters are evaluated using the comprehensive performance metric of the random forest regression model.

[0020] S3.4 Repeat steps S3.1-S3.3 until the comprehensive performance measurement meets the requirements, and obtain the parameter model of abnormal wear of the pantograph carbon slide plate.

[0021] Preferably, the decision tree splitting method described in S3.1 is as follows:

[0022] Each possible value of a feature in the feature set is taken as a potential split point. The Gini impurity after splitting at the split point is calculated. A greedy algorithm is used to select the split point with the largest decrease in Gini impurity as the optimal split point. The optimal split point is taken as the optimal split threshold for the feature. After comparing the potential split points of all features, the feature with the largest decrease in Gini impurity and the corresponding optimal split threshold are selected. The feature subset is split according to the optimal split threshold.

[0023] Preferably, the Gini impurity G(t) is calculated as follows:

[0024] $$G(t) = 1 - \sum_{i} p_i^{2}$$

[0025] Where G(t) represents the Gini impurity at node t, and $p_i$ represents the proportion of samples in node t that belong to the i-th class.

[0026] Preferably, in step 4, the reduction in Gini impurity for each feature at all node splits is accumulated, and the average value is taken over all decision trees in the random forest to calculate the average reduction in Gini impurity. The feature importance index is then determined based on the average reduction in impurity, and the calculation method is as follows:

[0027] $$\mathrm{Imp}(A) = \frac{1}{|T_A|} \sum_{T \in T_A} \sum_{t \in T} \frac{N_t}{N}\, \Delta i(s_t, t)$$

[0028] Where Imp(A) represents the importance of a given feature A, $T_A$ is the set of all decision trees containing feature A, $N_t$ is the number of samples in node t, N is the total number of samples, $\Delta i(s_t, t)$ represents the decrease in impurity before and after node t splits, and $s_t$ is the split point at which feature A is split at node t.

[0029] A system for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest includes:

[0030] The data acquisition module is used to collect characteristic data on the wear of the pantograph's carbon sliding plate.

[0031] The data processing module is used to preprocess the feature data and establish the input matrix and target vector based on the preprocessed feature data.

[0032] The training module is used to train the random forest regression model by taking the input matrix and the target vector as training data, and to fine-tune the parameters of the random forest regression model by cross-validation to obtain the parameter model of abnormal wear of the pantograph carbon slide plate.

[0033] The diagnostic module is used to identify the key causes of abnormal wear of the pantograph carbon sliding plate based on the feature importance index of the parameter model.

[0034] Compared with the prior art, the present invention has the following beneficial technical effects:

[0035] This invention discloses a method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest. First, on the basis of preliminary research, factors related to abnormal wear of the pantograph carbon sliding plate, such as train speed, lifting force (contact force), contact temperature, current and voltage, arc energy, and pull-out value, are selected as input features. After sufficient feature data have been collected, the data are preprocessed to construct an input matrix. Second, a random forest model is built, the collected data are used to train and validate the model, and the model's hyperparameters are tuned so that it identifies abnormal wear conditions of the pantograph carbon sliding plate. Finally, the Gini impurity computed during random forest classification is used to evaluate the importance of each feature, identifying and analyzing the factors that significantly affect abnormal wear. This method identifies the wear condition of the pantograph carbon sliding plate and analyzes the key factors under abnormal wear conditions, playing a significant role in maintaining train operation safety.

Description of the Drawings

[0036] Figure 1 is a flowchart of the method for identifying abnormal wear of the pantograph carbon sliding plate according to the present invention;

[0037] Figure 2 is a flowchart illustrating the training process for the parameter model of abnormal wear of the pantograph carbon sliding plate according to the present invention.

Detailed Implementation

[0038] The present invention will now be described in further detail with reference to the accompanying drawings. These descriptions are intended to explain the invention and not to limit it.

[0039] A method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest includes the following steps:

[0040] S1. Collect characteristic data of normal and abnormal wear of the pantograph carbon sliding plate of the same train or the same model;

[0041] The characteristic data includes factors strongly correlated with abnormal wear of the pantograph carbon sliding plate, such as vehicle speed, lifting force (contact force), contact temperature, current and voltage, arc energy, and pull-out value.

[0042] Characteristic data of the train were collected from both the normal operation of the train and the abnormal wear of the pantograph carbon sliding plate. When collecting characteristic data, the sampling frequency should be kept as consistent as possible, and the number of data points during normal wear and abnormal wear of the pantograph carbon sliding plate should be kept as equal as possible to complete the construction of the dataset.

[0043] S2. Construct a dataset based on feature data, and after preprocessing and normalizing the dataset, construct the input matrix and target vector;

[0044] The methods for preprocessing and normalizing the training dataset are as follows:

[0045] Because the collected data, such as vehicle speed, lifting force (contact force), contact temperature, current and voltage, arc energy, and pull-out value, generally change continuously over time, linear interpolation is used to fill in missing values after data acquisition. For each feature, if missing values exist, linear interpolation is used to estimate each missing value from the two nearest non-missing values. The basic formula for linear interpolation is:

[0046] $$y = y_1 + \frac{(x - x_1)(y_2 - y_1)}{x_2 - x_1}$$

[0047] Where y is the estimate of the missing value; x is the position of the missing value, which in this method is generally the time point at which the value is missing; $x_1$ and $x_2$ are the positions of the non-missing values immediately before and after the missing value; and $y_1$ and $y_2$ are the non-missing values at those two positions.
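
As an illustration only, the gap filling described above could be sketched in Python as follows. The function name `fill_missing_linear`, the use of NaN to mark a missing reading, and the choice to leave leading or trailing gaps unfilled are assumptions made for this sketch; the text does not specify them.

```python
import math

def fill_missing_linear(times, values):
    """Fill NaN entries by linear interpolation between the nearest
    non-missing neighbours before and after each gap.

    Gaps at the very start or end (no neighbour on one side) are left
    as NaN here; the source text does not say how to treat them.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if not before or not after:
            continue  # cannot interpolate without both neighbours
        i1, i2 = before[-1], after[0]
        x1, x2 = times[i1], times[i2]
        y1, y2 = values[i1], values[i2]
        # y = y1 + (x - x1) * (y2 - y1) / (x2 - x1)
        filled[i] = y1 + (times[i] - x1) * (y2 - y1) / (x2 - x1)
    return filled

# Illustrative placeholder readings, not measured data.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
temperature = [20.0, float("nan"), float("nan"), 26.0, 27.0]
print(fill_missing_linear(times, temperature))  # [20.0, 22.0, 24.0, 26.0, 27.0]
```

Interpolating against the time stamps rather than the sample index keeps the estimate sensible if the sampling interval is not perfectly uniform.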

[0048] In addition, to ensure the accuracy of the model, all numerical features are normalized to the range 0 to 1. The normalization formula is:

[0049] $$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

[0050] Where $X_{norm}$ is the normalized value, X is the original value, and $X_{min}$ and $X_{max}$ are the minimum and maximum values of all data for this feature, respectively.
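
A minimal sketch of this min-max normalization for a single feature column is given below. The function name and the handling of a constant column (mapped to 0.0 to avoid division by zero) are assumptions of the sketch, not part of the described method.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] using (X - X_min) / (X_max - X_min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature carries no information, map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Illustrative placeholder speeds, not measured data.
speeds = [40.0, 60.0, 80.0]
print(min_max_normalize(speeds))  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum would be taken from the training data and reused for validation and test data, so that all subsets share one scale.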

[0051] The process of constructing the input matrix and target vector based on the normalized training dataset is as follows:

[0052] Each row of the input matrix represents a sample, and each column represents a feature. Therefore, the columns of the input matrix hold the measured values of the sample's vehicle speed, lifting force, contact temperature, current and voltage, arc energy, and pull-out value. The target vector is the label for each sample, representing the wear state, specifically 0 for normal wear and 1 for abnormal wear.
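
The layout of the input matrix and target vector can be sketched as follows. The record format, the key names in `FEATURES`, and the sample values are hypothetical and chosen only to show the row and column arrangement; "current and voltage" is kept as a single column because the text counts six features.

```python
# Hypothetical column names for the six features listed in the text.
FEATURES = [
    "speed", "lifting_force", "contact_temperature",
    "current_voltage", "arc_energy", "pull_out_value",
]

def build_dataset(records):
    """Return (X, y): X has one row per sample and one column per feature,
    y holds the wear label (0 = normal wear, 1 = abnormal wear)."""
    X = [[rec[name] for name in FEATURES] for rec in records]
    y = [rec["abnormal"] for rec in records]
    return X, y

# Placeholder records with already-normalized values, not measured data.
records = [
    {"speed": 0.3, "lifting_force": 0.5, "contact_temperature": 0.2,
     "current_voltage": 0.4, "arc_energy": 0.1, "pull_out_value": 0.6, "abnormal": 0},
    {"speed": 0.9, "lifting_force": 0.7, "contact_temperature": 0.8,
     "current_voltage": 0.6, "arc_energy": 0.9, "pull_out_value": 0.4, "abnormal": 1},
]
X, y = build_dataset(records)
print(len(X), len(X[0]), y)  # 2 6 [0, 1]
```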

[0053] S3. Use the input matrix and target vector as training data to train the random forest regression model, and combine cross-validation to fine-tune the key parameters of the regression model to ensure that appropriate parameter settings are selected, and train the parameter model of abnormal wear of the pantograph carbon slide plate.

[0054] The training method for the random forest regression model is as follows:

[0055] Random forest regression, as an ensemble learning method, improves the accuracy and generalization ability of a model by combining multiple decision trees. A random forest regression model contains multiple decision trees, each predicting the target vector based on the input feature matrix. The final output of the random forest is the prediction result for each sample, obtained by voting on the outputs of all decision trees. Generally, when the input data is fixed, the decision-making process calculated by the decision trees is relatively fixed. Therefore, two additional random factors, bootstrap sampling and random feature selection, are introduced to increase the model's diversity.
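
The two sources of randomness mentioned above can be sketched with the Python standard library. This is only an illustration: the function names are hypothetical, and the subset size of 2 out of 6 features (roughly the square root of the feature count, a common default) is an assumption, since the text does not state how many features each tree receives.

```python
import random

def bootstrap_sample(n_samples, rng):
    """Draw n_samples row indices with replacement (bootstrap sampling)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature (column) indices without replacement."""
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
for tree_id in range(3):
    rows = bootstrap_sample(8, rng)          # rows used to grow this tree
    cols = random_feature_subset(6, 2, rng)  # features this tree may split on
    print(tree_id, rows, cols)
```

Because each tree sees a different resampled set of rows and a different subset of columns, the trees disagree in useful ways, which is what makes the vote more robust than any single tree.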

[0056] The random forest regression model contains multiple decision trees. Each decision tree acts as a recursive process, starting from the root node and continuously splitting the data subset into smaller subsets until a stopping condition is met. In this method, the classification criterion for each node in the decision tree is Gini impurity. The calculation process for Gini impurity G(t) is as follows:

[0057] $$G(t) = 1 - \sum_{i} p_i^{2}$$

[0058] Where G(t) represents the Gini impurity at node t, and $p_i$ represents the proportion of samples in node t that belong to the i-th class.
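
For reference, the Gini impurity of a node can be computed from the class labels of the samples it contains, using the standard definition of one minus the sum of squared class proportions. The function name below is illustrative.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_i p_i^2, where p_i is the
    proportion of samples in the node belonging to class i."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity([0, 0, 1, 1]))  # 0.5  (evenly mixed node)
print(gini_impurity([1, 1, 1, 1]))  # 0.0  (pure node)
```

With the two wear classes used here (0 and 1), the impurity ranges from 0 for a pure node to 0.5 for an evenly mixed one.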

[0059] The key parameters of the random forest are optimized using cross-validation. This method employs k-fold cross-validation, and its specific process is as follows:

[0060] First, the dataset D is uniformly divided into k subsets of equal size. Then, k rounds of training and validation are performed, each round using one subset for validation and the remaining k - 1 subsets for training. Finally, the performance metrics $M_i$ obtained in the k validation rounds are averaged to give the overall performance metric M:

[0061] $$M = \frac{1}{k} \sum_{i=1}^{k} M_i$$

[0062] Each time key parameters are adjusted, the above process is executed, and the parameters at the optimal result are taken during several parameter tuning processes to construct a complete random forest decision process.
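
The k-fold procedure can be sketched without any library as shown below. The names `k_fold_indices` and `cross_validate`, the contiguous (unshuffled) fold assignment, and the `train_and_score` callback are assumptions of this sketch; a real run would shuffle the samples first and plug in actual model training and scoring.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(n_samples, k, train_and_score):
    """Run k train/validate rounds and return M = (1/k) * sum of M_i.

    train_and_score(train_idx, val_idx) is a caller-supplied function
    that fits a model on train_idx and returns its metric on val_idx.
    """
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return sum(scores) / k

# Dummy scorer used only to exercise the loop: the training-set fraction.
mean_score = cross_validate(10, 5, lambda tr, va: len(tr) / (len(tr) + len(va)))
print(mean_score)
```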

[0063] S4. Evaluate the model performance using the test dataset constructed from the collected data, and evaluate the model's effectiveness in predicting normal and abnormal wear using metrics such as accuracy, recall, F1 score, and confusion matrix.

[0064] The performance of random forests can be evaluated using accuracy, precision, recall, and F1 score as evaluation metrics, calculated as follows:

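[0065] Accuracy:

[0066] $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

[0067] In this context, TP stands for True Positives, TN stands for True Negatives, FP stands for False Positives, and FN stands for False Negatives.

[0068] Precision:

[0069] $$\text{Precision} = \frac{TP}{TP + FP}$$

[0070] Recall:

[0071] $$\text{Recall} = \frac{TP}{TP + FN}$$

[0072] F1 score:

[0073] $$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$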
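
The four metrics can be computed directly from the confusion-matrix counts, as in the sketch below. The function name, the returned dictionary, the zero-denominator fallbacks, and the example counts are illustrative assumptions; abnormal wear (label 1) is treated as the positive class.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Any ratio with a zero denominator is reported as 0.0 (an assumption
    of this sketch, not something the text specifies).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Placeholder counts, not results from the patent.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For abnormal-wear detection, recall is usually the metric to watch most closely, since a missed abnormal case (a false negative) is more costly than an unnecessary inspection.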

[0074] S5. Based on the parametric model of abnormal wear of the pantograph carbon slide plate after training, the average Gini impurity reduction is calculated by accumulating the reduction in Gini impurity for each feature at all node splits and averaging it across all decision trees in the random forest. The feature importance index is then determined based on this average Gini impurity reduction, and the contribution of each feature to the model's prediction results is analyzed according to this index. Finally, the key causes of abnormal wear of the pantograph carbon slide plate are identified based on the model's feature importance score.

[0075] The feature importance index is calculated based on the reduction in average impurity, and the calculation formula is as follows:

[0076] $$\mathrm{Imp}(A) = \frac{1}{|T_A|} \sum_{T \in T_A} \sum_{t \in T} \frac{N_t}{N}\, \Delta i(s_t, t)$$

[0077] Imp(A) represents the importance of a given feature A, $T_A$ is the set of all trees containing feature A, $N_t$ is the number of samples in node t, N is the total number of samples, $\Delta i(s_t, t)$ represents the decrease in impurity before and after node t splits, and $s_t$ is the split point at which feature A is split at node t.
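
To make the quantities in this definition concrete, the sketch below computes the weighted impurity decrease of one split and then averages the per-tree totals. It rests on two assumptions that the text does not spell out: the impurity decrease is taken as the parent's Gini impurity minus the size-weighted Gini impurity of the two children, and the average is taken over the trees that contain feature A. All function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_i p_i^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity_decrease(parent, left, right, n_total):
    """(N_t / N) * delta_i for one split of node t into left and right."""
    n_t = len(parent)
    delta = (gini(parent)
             - (len(left) / n_t) * gini(left)
             - (len(right) / n_t) * gini(right))
    return (n_t / n_total) * delta

def feature_importance(per_tree_decreases):
    """Average, over the trees that use feature A, of the summed
    weighted decreases at the nodes that split on A."""
    return sum(sum(d) for d in per_tree_decreases) / len(per_tree_decreases)

# A root node of 6 samples split perfectly into two pure children.
root = weighted_impurity_decrease([0, 0, 0, 1, 1, 1], [0, 0, 0], [1, 1, 1], n_total=6)
print(root)                                       # 0.5
print(feature_importance([[0.5], [0.25, 0.25]]))  # 0.5
```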

[0078] Example 1

[0079] A method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forests was proposed. This method was implemented using data from the 2-1 sliding plate of Xi'an Metro Line 4 collected in 2021. The specific method is as follows:

[0080] S1. After preliminary research, six characteristic factors that are strongly correlated with abnormal wear of the pantograph carbon sliding plate were selected: vehicle speed, lifting force (contact force), contact temperature, current and voltage, arc energy, and pull-out value. A characteristic dataset was constructed.

[0081] S2.1 Because the collected data such as vehicle speed, lifting force (contact force), contact temperature, current and voltage, arc energy, and pull-out value generally change continuously over time, missing values in the dataset are filled using linear interpolation:

[0082] $$y = y_1 + \frac{(x - x_1)(y_2 - y_1)}{x_2 - x_1}$$

[0083] S2.2 Although random forests are not sensitive to the magnitude of features, in some cases normalizing the data can help improve the stability and performance of the model. All data in the dataset are normalized as follows: $$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

[0084] S2.3 After data preprocessing, the dataset is used to construct an input matrix based on the collected N samples. Each row of the input matrix represents a sample, and each column represents a feature. The target vector is the label of each sample. Therefore, the shape of the input matrix is N×6, and the shape of the target vector is N×1. Wear-out states are divided into only two types: normal wear-out and abnormal wear-out. Therefore, 0 represents the normal wear-out state, and 1 represents the abnormal wear-out state. The sample size for the normal wear-out state is the same as that for the abnormal wear-out state, both being 0.

[0085] S3.1 Construct multiple decision trees. Each decision tree randomly selects a subset of features from all feature data and searches for the optimal splitting feature within this subset. Each decision tree operates through the following steps:

[0086] Step 1: First, sort all possible values of each feature in the feature subset. Then, iterate through these feature values, treating each feature value as a potential split point, dividing the feature subset into two parts: one part containing samples less than or equal to the value, and the other part containing samples greater than the value. For each potential split point, calculate the Gini impurity G(t) after splitting. Then, use a greedy algorithm to select the split point that maximizes the decrease in Gini impurity as the optimal split point. This split point is the optimal split threshold for that feature. Finally, after comparing the potential split points of all features, select the feature with the largest decrease in Gini impurity and its corresponding split point.

[0087] Step 2: After selecting the best feature and its split point, the feature subset is divided into two subsets at that split point, thus completing the process of adding a branch downwards in the decision tree.

[0088] Step 3: Repeat Step 1 and Step 2 for each split subset, further dividing each subset into smaller subsets. Each split corresponds to a node in the decision tree. This recursive process continues until the decision tree reaches its maximum depth or the data points in a node belong to the same category. At this point, the node becomes a leaf node, and its label is determined by the most frequent category in that node.

[0089] Step 4: To prevent overfitting, pruning is performed after the decision tree is built.

[0090] Step 5: The input matrix after data preprocessing is input starting from the root node of the decision tree. The corresponding branch is selected according to the decision tree's decision process until a leaf node is reached. The label of the leaf node at this point is the classification and recognition result of the input sample.
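
The greedy split search of Step 1 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, the decrease is measured as the parent Gini impurity minus the size-weighted impurity of the two children, and splits that leave one side empty are skipped. Tree growth, pruning, and prediction (Steps 2 to 5) are not shown.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_i p_i^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try every observed value of one feature as a split point and
    return (threshold, decrease) with the largest Gini decrease."""
    parent_gini, n = gini(labels), len(labels)
    best = (None, 0.0)
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        if not left or not right:
            continue  # a split must put samples on both sides
        child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        decrease = parent_gini - child
        if decrease > best[1]:
            best = (threshold, decrease)
    return best

def best_split(X, y, feature_indices):
    """Compare the best threshold of each candidate feature and return
    (feature_index, threshold, decrease) for the overall winner."""
    best = (None, None, 0.0)
    for j in feature_indices:
        column = [row[j] for row in X]
        threshold, decrease = best_threshold(column, y)
        if threshold is not None and decrease > best[2]:
            best = (j, threshold, decrease)
    return best

# Placeholder data: feature 0 separates the two classes, feature 1 does not.
X = [[0.2, 0.9], [0.4, 0.1], [0.6, 0.8], [0.8, 0.2]]
y = [0, 0, 1, 1]
print(best_split(X, y, [0, 1]))  # (0, 0.4, 0.5)
```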

[0091] S3.2. Combine and integrate the outputs of all decision trees to construct a random forest algorithm. Based on the outputs of all decision trees, a vote is performed to obtain the prediction result for each sample, which is then used as the final output of the random forest algorithm.
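
The voting step can be sketched in a few lines. The function name and the input layout (one list of predicted labels per tree) are assumptions; the text also does not say how ties are broken, so this sketch simply takes the first label that reaches the highest count, and using an odd number of trees avoids ties for two classes.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """tree_predictions[t][i] is the label tree t predicts for sample i.
    Returns, for each sample, the label predicted by the most trees."""
    n_samples = len(tree_predictions[0])
    result = []
    for i in range(n_samples):
        votes = Counter(tree[i] for tree in tree_predictions)
        result.append(votes.most_common(1)[0][0])
    return result

# Three trees, three samples (placeholder predictions).
predictions = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
print(majority_vote(predictions))  # [0, 1, 1]
```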

[0092] S3.3 After constructing the structure, the Random Forest algorithm still requires tuning three parameters: the number of decision trees (n_estimators), the maximum tree depth (max_depth), and the minimum number of samples per split (min_samples_split). Cross-validation is used to fine-tune these key parameters. The implementation process is as follows:

[0093] First, the dataset D is uniformly divided into k subsets of equal size. Then, k rounds of training and validation are performed, each round using one subset for validation and the remaining k - 1 subsets for training. Finally, the performance metrics $M_i$ obtained in the k validation rounds are averaged to obtain the overall performance metric M of the model;

[0094] Repeat the above process until the model's performance evaluation metrics meet the requirements.
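
One possible way to carry out this tuning is with scikit-learn's `GridSearchCV`, whose `RandomForestClassifier` exposes the same three parameter names. This is a sketch under several assumptions: a classifier is used in place of the "regression model" named in the text, because the labels are 0/1 and the trees split on Gini impurity; the data are synthetic stand-ins generated only so the example runs; and the candidate values in `param_grid`, the choice of k = 5, and the F1 scoring are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
# Synthetic stand-in: 120 samples x 6 normalized features (NOT real wear data).
X = rng.random((120, 6))
y = (X[:, 4] + 0.2 * X[:, 0] > 0.6).astype(int)  # arbitrary synthetic labels

# Candidate values are illustrative; the text does not list any.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, 5],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(
    RandomForestClassifier(criterion="gini", random_state=0),
    param_grid,
    cv=5,          # k-fold cross-validation with k = 5
    scoring="f1",  # metric averaged over the k folds
)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

`best_score_` is the mean of the per-fold scores for the winning parameter combination, which corresponds to the overall metric M described above.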

[0095] S3.4. The established random forest algorithm is comprehensively evaluated using four metrics: accuracy, precision, recall, and F1 score.

[0096] S4. After the complete input matrix is fed into the trained random forest model, the wear state of the pantograph carbon sliding plate corresponding to each sample can be classified and identified. Furthermore, based on the average reduction in impurity, the trained random forest model can calculate the contribution of each of the six features, namely vehicle speed, lifting force (contact force), contact temperature, current and voltage, arc energy, and pull-out value, to the classification and identification of the wear state of the pantograph carbon sliding plate.
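
As a hedged illustration of this ranking step, scikit-learn's `RandomForestClassifier` exposes impurity-based importances through its `feature_importances_` attribute, normalized to sum to 1. The feature names below are hypothetical labels for the six columns, and the data are synthetic: the label is deliberately driven by one column so the ranking is easy to check, which says nothing about the real causes of abnormal wear.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names for the six features listed in the text.
FEATURES = [
    "speed", "lifting_force", "contact_temperature",
    "current_voltage", "arc_energy", "pull_out_value",
]

rng = np.random.default_rng(0)
# Synthetic stand-in data (NOT real wear data): label depends on column 4 only.
X = rng.random((200, 6))
y = (X[:, 4] > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=50, criterion="gini", random_state=0)
model.fit(X, y)

# Rank features by mean decrease in Gini impurity, largest first.
ranking = sorted(zip(FEATURES, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances tend to favor features with many distinct values, so checking the ranking against held-out data is a sensible precaution before acting on it.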

[0097] Referring to Figure 1, the present invention describes a method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest. The method mainly includes a data preparation module, a pantograph carbon sliding plate wear condition identification module, and a key cause analysis module for abnormal wear. It can identify abnormal wear conditions of the pantograph carbon sliding plate while simultaneously analyzing the key causes of these abnormal conditions.

[0098] Referring to Figure 2, the random forest algorithm in this invention contains multiple decision trees. Each decision tree is built by a recursive process that starts from the root node and continuously splits the dataset into subsets until the stopping condition is met.

[0099] This invention provides a method for identifying abnormal wear conditions of pantograph carbon sliding plate wear data based on random forest. The method incorporates randomness and integrates multiple models during computation. Besides accurately identifying the wear condition of a single train, it can also be extended to the analysis and predictive maintenance of wear conditions for other trains, demonstrating strong generalization ability. The random forest algorithm can automatically process various types of data, exhibiting good handling capabilities for high-dimensional data and complex interactions between features. Furthermore, random forest supports parallel computing, enabling efficient training when processing large-scale datasets. Therefore, the random forest algorithm effectively addresses the challenges of high complexity and large data volume in pantograph carbon sliding plate wear data.

[0100] Compared with traditional single models or simple statistical methods, the random forest algorithm reduces the risk of overfitting by integrating multiple decision trees and improves the model's prediction accuracy on unknown data. While identifying the wear status of the pantograph carbon slide plate, the random forest algorithm can also calculate the importance score of each feature, and then analyze which factors have a significant impact on the abnormal wear of the carbon slide plate, which helps to reduce the occurrence of abnormal wear in a targeted manner.

[0101] The above content is only for illustrating the technical concept of the present invention and should not be construed as limiting the scope of protection of the present invention. Any modifications made to the technical solution based on the technical concept proposed in this invention shall fall within the scope of protection of the claims of this invention.

Claims

1. A method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest, characterized in that it includes the following steps: Step 1: Collect characteristic data on the wear of the pantograph carbon sliding plate; the characteristic data includes vehicle speed, lifting force, contact temperature, current, voltage, arc energy, and pull-out value; Step 2: Preprocess the feature data, and establish the input matrix and target vector based on the preprocessed feature data; Step 3: Use the input matrix and target vector as training data to train the random forest regression model, and fine-tune the parameters of the random forest regression model through cross-validation to obtain the parameter model of abnormal wear of the pantograph carbon sliding plate. The training method for the parameter model of abnormal wear of the pantograph carbon sliding plate is as follows: S3.1 Construct multiple decision trees. Each decision tree randomly selects a subset of features from all feature data, determines the split point based on Gini impurity, and continuously splits the feature set into subsets starting from the root node until the stopping condition is met. The decision tree splitting method is as follows: Each possible value of a feature in the feature set is taken as a potential split point. The Gini impurity after splitting at the split point is calculated. A greedy algorithm is used to select the split point with the largest decrease in Gini impurity as the best split point. The best split point is taken as the best split threshold for the feature. After comparing the potential split points of all features, the feature with the largest decrease in Gini impurity and the corresponding best split threshold are selected. The feature subset is split according to the best split threshold. S3.2 Construct a random forest regression model based on the output results of all decision trees, and vote on the output results of all decision trees to obtain the prediction result for each sample; S3.3 The number of decision trees, the maximum depth of the decision trees, and the minimum number of samples required to split a node are used as key parameters. Cross-validation is used to fine-tune the key parameters of the random forest regression model, and the fine-tuned parameters are evaluated using the comprehensive performance metric of the random forest regression model. S3.4 Repeat steps S3.1-S3.3 until the comprehensive performance metric meets the requirements, and obtain the parameter model of abnormal wear of the pantograph carbon sliding plate; Step 4: Identify the key causes of abnormal wear of the pantograph carbon sliding plate based on the feature importance index of the parameter model.

2. The method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest according to claim 1, characterized in that, The characteristic data refers to the characteristic data of normal and abnormal wear of the pantograph carbon sliding plate of the same train or / and trains of the same model.

3. The method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest according to claim 1, characterized in that, The preprocessing of the feature data in step 2 includes handling missing values and normalization.

4. The method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest according to claim 3, characterized in that, The method for handling missing values is as follows: Linear interpolation is used to fill in missing values in the feature data; normalization is used to standardize the feature data.

5. The method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest according to claim 1, characterized in that, The Gini impurity G(t) is calculated as follows: $$G(t) = 1 - \sum_{i} p_i^{2}$$ where G(t) represents the Gini impurity at node t, and $p_i$ is the proportion of samples in node t that belong to the i-th class.

6. The method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest according to claim 1, characterized in that, In step 4, the reduction in Gini impurity for each feature at all node splits is accumulated, and the average value is taken over all decision trees in the random forest to calculate the average reduction in Gini impurity. The feature importance index is determined based on the average reduction in impurity, and the calculation method is as follows: $$\mathrm{Imp}(A) = \frac{1}{|T_A|} \sum_{T \in T_A} \sum_{t \in T} \frac{N_t}{N}\, \Delta i(s_t, t)$$ where Imp(A) is the importance of a given feature A, $T_A$ is the set of all decision trees containing feature A, $N_t$ is the number of samples in node t, N is the total number of samples, $\Delta i(s_t, t)$ is the reduction in impurity before and after node t splits, and $s_t$ is the split point at which feature A is split at node t.

7. A system for implementing the method for identifying abnormal wear conditions of pantograph carbon sliding plates based on random forest as described in any one of claims 1-6, characterized in that it includes: The data acquisition module is used to collect characteristic data on the wear of the pantograph carbon sliding plate. The data processing module is used to preprocess the feature data and establish the input matrix and target vector based on the preprocessed feature data. The training module is used to train the random forest regression model by taking the input matrix and the target vector as training data, and to fine-tune the parameters of the random forest regression model by cross-validation to obtain the parameter model of abnormal wear of the pantograph carbon sliding plate. The diagnostic module is used to identify the key causes of abnormal wear of the pantograph carbon sliding plate based on the feature importance index of the parameter model.