A method and system for analyzing and evaluating the operation state of a power grid regulation system
By constructing a state prediction model based on feedforward neural networks and gradient boosting tree algorithm, the problem of the inability to obtain the operating status of the power grid control system in a timely and accurate manner is solved, realizing accurate prediction and efficient evaluation of the operating status of the power grid control system, and improving operation and maintenance efficiency and system stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA ELECTRIC POWER RESEARCH INSTITUTE CO LTD
- Filing Date
- 2019-07-29
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, the operating status of power grid control systems cannot be obtained in a timely and accurate manner, resulting in low work efficiency of operation and maintenance centers. Non-intelligent sensing methods reduce the efficiency of system operation and maintenance.
Deep learning technology is used to construct a state prediction model using feedforward neural networks and gradient boosting tree algorithms. By acquiring real-time operating data of the power grid control system, its operating state is analyzed and evaluated. This includes gradient boosting tree binary classification and multi-class classification algorithms, combined with feedforward neural networks for secondary cross-validation, and the category with the highest probability is selected as the abnormal state.
It enables accurate prediction of the operating status of the power grid control system, improves the working efficiency of the operation and maintenance center and the power grid control system, ensures the accuracy and timeliness of the results, and is suitable for diverse monitoring data types and scalability requirements.
Figure CN110571792B_ABST
Description
Technical Field
[0001] This invention relates to the fields of computer deep learning and power system dispatch automation, specifically to a method and system for analyzing and evaluating the operating status of a power grid control system.
Background Technology
[0002] The smart grid control system is the nerve center of smart grid operation. The operation and maintenance center can remotely view the screens of systems in various regions and perform operation and maintenance by centrally monitoring the basic data of the smart grid control system at the provincial level and above. This enables the center to quickly resolve system anomalies and faults, promptly identify potential system hazards, and provide technical support for the stable operation of the smart grid control system.
[0003] Currently, the centralized operation and maintenance center monitors the power grid control systems in multiple regions daily, acquiring a large amount of monitoring data. This data specifically includes: application status, process status, node status, link status, I/O, CPU usage, disk usage, and memory usage. Previously, due to the lack of an intelligent analysis and evaluation system, problems with the control system were often only discovered retrospectively through manual methods, such as telephone notifications. The inability to obtain timely and accurate information about the control system's operational status caused difficulties for the operation and maintenance center. This non-intelligent sensing method reduced the efficiency of system operation and maintenance work, and also decreased the efficiency of the control system itself.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, such as the inability of maintenance personnel to predict the operational status of control systems and the inability to promptly detect problems, this invention provides a method for analyzing and evaluating the operational status of power grid control systems. This method is based on data mining and deep learning, and specifically implemented using techniques such as feedforward neural networks, gradient boosting tree algorithms, and multi-classifiers. The method for analyzing and evaluating the operational status of power grid control systems provided by this invention includes:
[0005] Obtain real-time operational data from the power grid control system;
[0006] Based on the real-time operating data and the pre-built state prediction model, the operating status of the power grid control system is analyzed and evaluated to obtain the operating status of the power grid control system.
[0007] The state prediction model is constructed based on historical operating state data of the power grid control system, using a feedforward neural network and gradient boosting tree algorithm.
[0008] Preferably, the construction of the state prediction model includes:
[0009] Based on the historical operating status data of the power grid control system, the gradient boosting tree binary classification algorithm is used to divide the status corresponding to the operating data of the power grid control system into two categories: system normal and system abnormal.
[0010] Based on the historical operating status data corresponding to the system anomaly, a gradient boosting tree multi-classification algorithm is used to further classify the specific situation of the system anomaly to obtain multi-classification results.
[0011] Based on the historical operating status data corresponding to the system anomaly, a feedforward neural network is used to perform secondary cross-validation on the multi-classification results, and the category with the highest probability is selected as the specific anomaly state in the system anomaly.
[0012] The historical operating status data includes: operating data and corresponding status; the status includes: system normal and system abnormal; the system abnormal includes: system crash, CPU spike, process deadlock, and network outage.
[0013] Preferably, based on the historical operating status data of the power grid control system, a gradient boosting tree binary classification algorithm is used to classify the status corresponding to the operating data of the power grid control system into two major categories: system normal and system abnormal, including:
[0014] The historical operating status data of the power grid control system is divided into training samples and validation samples;
[0015] Based on the first loss function set according to the difference between the predicted probability value and the actual probability value of the power grid control system state, the gradient boosting tree binary classification algorithm is used to train the training samples to obtain a binary classification tree.
[0016] Based on the set probability transformation formula, the probability that the state corresponding to each running data in the binary classification tree is system normal and system abnormal is calculated.
[0017] The state corresponding to each running data is determined based on the probability that the system is normal and the system is abnormal, respectively.
[0018] The trained binary classification tree is validated using validation samples. When the error meets the requirements, the training ends and the corresponding status of the power grid control system operation data is divided into two categories: system normal and system abnormal; otherwise, the training is repeated.
[0019] Preferably, the first loss function, set based on the difference between the predicted probability value and the true probability value of the power grid control system state, is used to train the training samples using a gradient boosting tree binary classification algorithm to obtain a binary classification tree, including:
[0020] Based on minimizing the first loss function, an optimal prediction probability function is constructed.
[0021] Calculate the predicted probability value of all training samples based on the current optimal prediction probability value function;
[0022] Based on the predicted probability values and true probability values of all training samples, the pseudo residual of the first loss function is obtained;
[0023] Use all the running data and corresponding pseudo residuals in the current training sample as the training data for the next tree, and fit them into a new classification tree;
[0024] The optimal prediction probability function is updated based on the current pseudo residual, and the new classification tree is iteratively calculated based on the updated optimal prediction probability function. When the value of the first loss function is minimized, the loop ends, and the optimal prediction function is updated based on the minimum value of the first loss function.
[0025] The binary classification tree is obtained by summing the optimal predictions from all iterations.
[0026] Preferably, the first loss function is as shown in the following equation:
[0027] L(y,f(x))=ln(1+exp(-2yf(x)))
[0028] In the formula: x: training sample in the training set; y: the real state corresponding to the training sample; where y = {0, 1}; f(x): the predicted value of the running state corresponding to the training sample; exp: an exponential function with the natural constant e as the base.
[0029] Preferably, the probability transformation formula includes:
[0030] When y = 1, the formula for the normal probability transformation of the system is as follows:
[0031]
[0032] In the formula: P(y=1|x): the probability that the system is normal corresponding to the training sample x; F(x): binary classification tree;
[0033] When y = 0, the system anomaly probability conversion formula is as follows:
[0034]
[0035] In the formula: P(y=0|x): the probability of a system anomaly corresponding to the training sample x.
[0036] Preferably, based on the historical operating status data corresponding to the system anomaly, a gradient boosting tree multi-classification algorithm is used to further classify the specific circumstances of the system anomaly to obtain a multi-classification result, including:
[0037] The historical operating status data corresponding to the system anomalies are divided into a training set and a validation set;
[0038] Based on the set second loss function, the gradient boosting tree multi-classification algorithm is used to train the training set to obtain a multi-classification tree;
[0039] Based on the multi-classification tree, the corresponding probabilities of the sample data being classified into each category are obtained;
[0040] The sample data is converted into the corresponding category based on the highest probability among each category to obtain multi-classification results;
[0041] The trained multi-class tree is validated using a validation set. When the error meets the requirements, the training ends, and the specific situations of the power grid control system anomalies are reclassified to obtain multi-class results; otherwise, the training is repeated.
[0042] Preferably, the step of training the training set using a gradient boosting tree multi-classification algorithm based on a set second loss function to obtain a multi-classification tree includes:
[0043] Based on minimizing the second loss function, a classification function is constructed;
[0044] Based on the classification function, predict the probability that all sample nodes in the training set belong to each category;
[0045] Calculate the pseudo residual value for all classifications based on the true class and predicted class probability of all sample nodes in the training set;
[0046] A new multi-class tree is fitted based on all sample nodes and pseudo-residual values in the training set;
[0047] The classification function is updated based on the current pseudo residual value, and the sum of the values of each leaf node region is calculated iteratively on the new multi-class tree based on the updated classification function. When the minimum value of the second loss function is obtained, the loop ends and the classification function is updated based on the minimum value of the second loss function.
[0048] Based on all the classification functions during the iteration process, a multi-classification tree is obtained.
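The probability and pseudo-residual steps of one multi-class boosting iteration can be sketched as follows. This is a minimal illustration, assuming the usual softmax link between per-class scores and class probabilities and a residual equal to the true-class indicator minus the predicted probability; the function names are hypothetical and the tree-fitting step is left out.

```python
import math
from typing import List


def class_probabilities(scores: List[float]) -> List[float]:
    """Turn per-class scores f_k(x) into probabilities (assumed softmax link)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_residuals(true_class: int, scores: List[float]) -> List[float]:
    """Per-class residual: true-class indicator minus predicted probability."""
    probs = class_probabilities(scores)
    return [(1.0 if k == true_class else 0.0) - p for k, p in enumerate(probs)]


# Four anomaly classes, all scores zero before the first iteration.
print(pseudo_residuals(1, [0.0, 0.0, 0.0, 0.0]))  # [-0.25, 0.75, -0.25, -0.25]
```

A new tree per class would then be fitted to these residuals, which is the part this sketch omits.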
[0049] Preferably, the step of using a feedforward neural network to perform secondary cross-validation on the multi-classification results based on the historical operating status data corresponding to the system anomaly includes:
[0050] A softmax function is added to the hidden layer of the feedforward neural network model to generate a classifier;
[0051] Based on the operation data in the historical operation status data and the set probability calculation formula, a multi-dimensional vector group is obtained;
[0052] Based on the multidimensional vector group and classifier, a probability distribution composed of the probabilities of each category corresponding to each group of running data is obtained;
[0053] The category with the highest probability in the probability distribution is selected as the classification result corresponding to the running data.
[0054] The multi-classification results are then subjected to secondary cross-validation based on the classification results corresponding to each group of operational data.
[0055] Preferably, the expression for each element in the multidimensional vector group is as shown in the following equation:
[0056] σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k), j = 1, 2, ..., K
[0057] In the formula: σ(z)_j: the j-th element of the vector σ(z); z_k: the k-th value of the K-dimensional vector z; K: the number of categories.
[0058] Preferably, the probability of the running data corresponding to each category is calculated using the following formula:
[0059]
[0060] In the formula: P(s=v|x): the probability that the classification result s belongs to class v; N: the total number of classes; x′: the derivative with respect to the training data x; x^T: the weight value T set for the training data x; w_v: the proportion of category v; w_n: the proportion of the n-th category.
[0061] Based on the same inventive concept, the present invention also provides an analysis and evaluation system for the operating status of a power grid control system, comprising:
[0062] The acquisition module is used to acquire real-time operating data of the power grid control system.
[0063] The evaluation module is used to analyze and evaluate the operating status of the power grid control system based on the real-time operating data and the pre-built state prediction model, so as to obtain the operating status of the power grid control system.
[0064] The state prediction model is constructed based on historical operating state data of the power grid control system, using a feedforward neural network and gradient boosting tree algorithm.
[0065] Preferably, the system further includes: a construction module for constructing a state prediction model;
[0066] The building module includes:
[0067] The binary classification unit is used to classify the status corresponding to the power grid control system operation data into two categories: system normal and system abnormal, based on the historical operating status data of the power grid control system and using the gradient boosting tree binary classification algorithm.
[0068] The multi-classification unit is used to further classify the specific situation of the system anomaly based on the historical operating status data corresponding to the system anomaly, and to obtain the multi-classification result by using the gradient boosting tree multi-classification algorithm.
[0069] The verification unit is used to perform secondary cross-validation on the multi-classification results based on the historical operating status data corresponding to the system anomaly, using a feedforward neural network, and select the category with the highest probability as the specific anomaly state in the system anomaly.
[0070] The historical operating status data includes: operating data and corresponding status; the status includes: system normal and system abnormal; the system abnormal includes: system crash, CPU spike, process deadlock, and network outage.
[0071] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0072] 1. The technical solution provided by this invention obtains real-time operating data of the power grid control system; based on the real-time operating data and a pre-built state prediction model, the operating state of the power grid control system is analyzed and evaluated to obtain the operating state of the power grid control system; the state prediction model is constructed based on the historical operating state data of the power grid control system using a feedforward neural network and a gradient boosting tree algorithm. This invention utilizes massive historical operating state data of the power grid control system to analyze and evaluate various operating state data in the control system, and makes accurate predictions of the real-time operating state of the control system, significantly improving the work efficiency of the operation and maintenance center and the power grid control system, while also preventing problems from occurring in the power grid control system.
[0073] 2. The technical solution provided by this invention selects a feedforward neural network and a gradient boosting tree algorithm, which can provide both online and offline computation. Offline computation is performed on massive historical state data of the power grid control system, and online computation is then performed based on the offline computation results and the real-time operating status data of the control system to analyze and evaluate the current operating status of the system, ensuring the accuracy of the final result.
[0074] 3. The technical solution provided by this invention performs secondary verification of the output results based on the softmax function added to the hidden layer in the feedforward neural network model. It has a strong fitting ability and avoids the problem of excessive difference between the calculated value and the true value to the greatest extent, thus further ensuring the accuracy of the results.
[0075] 4. The technical solution provided by this invention, based on the gradient boosting tree (GBDT) binary classification algorithm, can accurately classify the operating status of the power grid control system into two categories: normal and abnormal, according to historical operating status data; the multi-classification algorithm can further analyze and evaluate the abnormal system category in more detail, thereby improving the accuracy of the assessment of the operating status of the power grid control system.
[0076] 5. In the technical solution provided by this invention, the model based on feedforward neural network and perceptron can continuously iterate to generate new learners and continuously train and iterate on samples; the trained learner can be used multiple times and significantly improves the overall computational efficiency; it is suitable for situations where there are a large number of power grid control systems to be monitored and the types of monitoring data reflecting the system's operating status are diverse, greatly improving the timeliness and scalability of the analysis and evaluation system.
Attached Figure Description
[0077] Figure 1 is a flowchart of the method for analyzing and evaluating the operating status of a power grid control system provided by the present invention;
[0078] Figure 2 is a flowchart of the overall business process of the method for analyzing and evaluating the operating status of a power grid control system according to the present invention;
[0079] Figure 3 is a flowchart of the calculation model of the method for analyzing and evaluating the operating status of a power grid control system according to the present invention.
Detailed Implementation
[0080] To better understand this invention, the following description, in conjunction with the accompanying drawings and examples, will further illustrate the invention.
[0081] Example 1
[0082] As shown in Figure 1, this invention provides a method for analyzing and evaluating the operational status of a control system based on deep learning technology, comprising:
[0083] Step S1: Obtain real-time operating data of the power grid control system;
[0084] Step S2: Analyze and evaluate the operating status of the power grid control system based on the real-time operating data and the pre-built state prediction model to obtain the operating status of the power grid control system;
[0085] The state prediction model is constructed based on historical operating state data of the power grid control system, using a feedforward neural network and gradient boosting tree algorithm.
[0086] As shown in Figures 2 and 3, the specific implementation process comprises four stages: data acquisition and verification, state prediction model construction, data analysis, and calculation results with analysis and evaluation. This method can effectively predict and analyze the state of the scheduling system, minimizing the occurrence of major problems in the control system and better achieving intelligent and automated system operation monitoring. Furthermore, this method uses neural networks as its technical foundation, so neuron configurations can be adjusted in real time according to business needs and actual business data, which gives it strong adaptability to the external environment.
[0087] This method analyzes and models the operational data of the control system monitored by the system, uses historical monitoring data as the basis for calculation, establishes a corresponding regression learning task with continuous data, and finally obtains the classification-based predictive analysis results. This method can also be continuously corrected and improved during the implementation process to improve the accuracy of the analysis and evaluation results.
[0088] Step S1: Obtain real-time operating data of the power grid control system;
[0089] Obtain real-time operational data of the power grid control system using existing technological means.
[0090] Step S2: Based on the real-time operating data and the pre-built state prediction model, analyze and evaluate the operating state of the power grid control system to obtain the operating state of the power grid control system, specifically including:
[0091] 1. Obtain and verify historical operating status data of the control system. Obtain operating status data of the control system through the monitoring system.
[0092] 2. Establish a state prediction model for the operation status of the control system.
[0093] 3. Use the gradient boosting tree binary classification algorithm (GBDT binary classification algorithm) to perform binary classification on the state value calculation results, that is, divide the operating state of the control system into two categories: system normal and system abnormal.
[0094] 4. Based on the results of step 3, the samples in the system anomaly class are further refined, and the gradient boosting tree multi-classification algorithm is applied to further classify the specific situations of system anomalies.
[0095] 5. Use the Softmax function to calculate the probability of each anomaly classification result in the output, and select the result with the highest probability of hit as the evaluation result of the power grid control system operation status.
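The flow of steps 3 to 5 can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the three model callables are hypothetical stand-ins for the trained binary GBDT, the multi-class GBDT, and the softmax classifier, and the way the two multi-class results are reconciled (an agreement flag) is an assumption, since the text does not spell out how the secondary check combines them.

```python
from typing import Callable, Dict, Sequence

Probs = Dict[str, float]


def evaluate_state(
    x: Sequence[float],
    is_abnormal: Callable[[Sequence[float]], bool],
    gbdt_probs: Callable[[Sequence[float]], Probs],
    nn_probs: Callable[[Sequence[float]], Probs],
) -> dict:
    """Run the staged evaluation on one record of operating data."""
    # Step 3: binary decision, system normal vs. system abnormal.
    if not is_abnormal(x):
        return {"state": "system normal", "confirmed": True}
    # Step 4: most probable anomaly class according to the multi-class GBDT.
    p_gbdt = gbdt_probs(x)
    gbdt_label = max(p_gbdt, key=p_gbdt.get)
    # Step 5: second opinion from the softmax classifier (assumed agreement check).
    p_nn = nn_probs(x)
    nn_label = max(p_nn, key=p_nn.get)
    return {"state": gbdt_label, "confirmed": gbdt_label == nn_label}


# Usage with stub models standing in for the trained ones.
stub_binary = lambda x: x[0] > 90
stub_gbdt = lambda x: {"system crash": 0.1, "CPU spike": 0.7,
                       "process deadlock": 0.1, "network outage": 0.1}
stub_nn = lambda x: {"system crash": 0.2, "CPU spike": 0.6,
                     "process deadlock": 0.1, "network outage": 0.1}
print(evaluate_state([95.0], stub_binary, stub_gbdt, stub_nn))
```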
[0096] Step 1: Data Acquisition and Verification Phase, which specifically includes:
[0097] The operation parameters of the scheduling and control system are collected and aggregated from the historical database of the monitoring system in the centralized operation and maintenance center.
[0098] After obtaining the computational data, the format of the computational data is validated. If the computational data conforms to the set computational data format, it is added to the computational dataset; otherwise, the computational data is discarded.
[0099] The operating parameters include: application status, process status, node status, link status, I/O, CPU usage, disk usage, and memory usage.
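The format check described in this step might look like the sketch below. The text names the eight operating parameters but not their keys, types, or value ranges, so the field names, the string type for status fields, and the 0 to 100 range for usage fields are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical record layout (assumed keys, types, and ranges).
STATUS_FIELDS = ("application_status", "process_status", "node_status", "link_status")
USAGE_FIELDS = ("io", "cpu_usage", "disk_usage", "memory_usage")


def is_valid_record(record: Dict[str, Any]) -> bool:
    """Return True if the record matches the assumed calculation data format."""
    for name in STATUS_FIELDS:
        if not isinstance(record.get(name), str):
            return False
    for name in USAGE_FIELDS:
        value = record.get(name)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not 0 <= value <= 100:
            return False
    return True


def build_dataset(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep conforming records in the calculation dataset and discard the rest."""
    return [r for r in records if is_valid_record(r)]
```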
[0100] Step 2: Establish a neural network model for analyzing and predicting the operating status of the power grid control system based on the calculation dataset, as follows:
[0101] This study selects feedforward neural networks and perceptrons as the technical solution. Typical neural networks consist of parallel layers: an input layer, hidden layers, and an output layer. Neurons within a single layer are not interconnected; however, adjacent layers are generally fully connected. This solution employs unsupervised learning, meaning it provides only input and allows the neural network to find patterns in the data. The dataset used for training the neural network in this solution consists of all historical data on the operational status of a local power grid control system collected by a monitoring system. The hidden layers use the softmax function as their activation function.
[0102] Step 3: Read the historical data files of the operating status of all power grid control systems and apply the gradient boosting tree (GBDT) classification algorithm. The GBDT algorithm is called through the ensemble module of the machine learning library sklearn. To enable cross-validation and model reuse on the training sample set, the model is saved with joblib. The idea of GBDT can be explained with a simple example. Suppose a person is 30 years old. The first round fits the age with 20, leaving a loss of 10 years. The second round fits the remaining loss with 6, leaving a gap of 4 years. The third round fits the remaining gap with 3, leaving a gap of only 1 year. If the iteration rounds are not yet complete, the process continues, and the error of the fitted age decreases with each round. Finally, summing the values fitted in every round gives the model's output.
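The age example can be traced in a few lines. This is a toy illustration of the additive idea only: the per-round fits (20, 6, 3) are taken from the example as given rather than learned, and the function name is hypothetical.

```python
from typing import List, Tuple


def boost_age(target: float, fits: List[float]) -> Tuple[float, List[float]]:
    """Add up successive fits and record the remaining gap after each round."""
    prediction = 0.0
    gaps = []
    for fit in fits:
        prediction += fit  # each round adds a fit of the remaining gap
        gaps.append(target - prediction)
    return prediction, gaps


prediction, gaps = boost_age(30, [20, 6, 3])
print(prediction, gaps)  # 29.0 [10.0, 4.0, 1.0]
```

In a real GBDT each round's fit comes from a tree trained on the current residuals, not from a fixed list.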
[0103] Generally speaking, GBDT, as a linear regression model, can be simplified as follows:
[0104] z = f(x) = w^T x + b (1)
[0105] In the formula: z is the predicted value calculated by the model, x is the calculated sample, and w^T and b are parameters. In a binary classification task, the predicted output needs to be labeled y = {0, 1}, so the real value z must be converted to {0, 1}. However, a function that takes only the values {0, 1} is discontinuous, so a "substitute function" is needed that approximates y to a certain extent and is monotonic and differentiable. Therefore, the log-odds function shown in the following formula is chosen as the substitute function:
[0106] y = 1 / (1 + exp(-z)) (2)
[0107] Formula (2) transforms the value of z into a y value close to 0 or 1, thereby achieving binary classification.
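A minimal sketch of this conversion, assuming the substitute function has the standard logistic form y = 1 / (1 + e^(-z)) and that the class is read off with a 0.5 threshold (the threshold is an assumption, not stated in the text):

```python
import math


def log_odds_function(z: float) -> float:
    """Standard logistic function, the assumed form of formula (2)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def to_label(z: float, threshold: float = 0.5) -> int:
    """Map the real-valued prediction z to a class label in {0, 1}."""
    return 1 if log_odds_function(z) >= threshold else 0
```

Large positive z values map close to 1 and large negative values close to 0, which is what makes the smooth function usable as a stand-in for the discontinuous label.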
[0108] The historical data of the power grid control system's operating status are divided into a training set and a validation set. In step 3, 80% will be used as the training set and 20% as the validation set. However, since the sample outputs are not continuous but discrete values, it is impossible to directly fit the class output error from the output class. Therefore, a log-likelihood loss function similar to logistic regression is used to solve this problem. That is, the difference between the predicted probability value and the true probability value of the class is used to fit the loss, and binary classification is performed on the historical data of the power grid control system's operating status.
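The 80/20 division can be sketched as below. Shuffling before the cut and the fixed seed are assumptions; the text only gives the proportions.

```python
import random
from typing import List, Sequence, Tuple


def split_samples(samples: Sequence, train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the samples and cut it into training and validation sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, validation = split_samples(range(10))
print(len(train), len(validation))  # 8 2
```

The same helper would cover a 75/25 division by passing `train_fraction=0.75`.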
[0109] Input: A training dataset T consisting of historical data on the operating status of all power grid control systems, T = {(x_1, y_1), (x_2, y_2), ..., (x_I, y_I)}, where I is the number of samples and i = 1, 2, ..., I indexes them. The log-likelihood is chosen as the loss function to represent the degree of difference between the prediction and the actual data. For binary classification, the loss function formula is:
[0110] L(y,f(x))=ln(1+exp(-2yf(x))) (3)
[0111] In formula (3), x is the calculated data in the training set sample T and y is the state value corresponding to x, where y = {0,1}; f(x) is the predicted value for x; exp denotes the exponential function with the natural constant e as the base.
[0112] Output: Binary classification tree F(x)
[0113] (1) Initialization: Construct a function f0(x) that minimizes the loss function to achieve the optimal predicted value:
[0114]
[0115] In formula (4), P(y=1|x) means the probability of the predicted value y=1 when the calculated data x is in the training set sample T; similarly, P(y=-1|x) can be obtained.
[0116] (2) Calculate the negative gradient of the loss function for all training samples in the current model, i.e., the residual value. For the logarithmic loss function, this means calculating its approximate residual value, called the pseudo residual, denoted as r_mi:
[0117]
[0118] In formula (5), m represents the number of iterations, i.e. the number of weak learners generated. Each iteration will generate a new classification tree; m = 1, 2, 3, ... M; i represents the sample, i = 1, 2, ... I.
[0119] (3) Use the data (x_i, r_mi), i = 1, 2, ..., I, as training data for the next tree and fit a new classification tree, where r_mi is given by formula (5) and I is the number of samples. For each leaf node region, the sum of the values is calculated so as to minimize the loss function, giving the minimizing value c_mj:
[0120]
[0121] In formula (6), R_mj is the j-th leaf node region of the m-th tree, where m is the iteration index and j indexes the leaf nodes of each tree, j = 1, 2, ..., J.
[0122] Updating f(x) yields:
[0123]
[0124] Based on this result, the final classification tree F(x) can be obtained:
[0125]
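The initialization, pseudo-residual, leaf-value, and update steps can be put together in a compact sketch. It rests on several assumptions that are not in the text: labels are encoded as +1 (system normal) and -1 (system abnormal), which is the encoding under which the loss in formula (3) and the reference to P(y=-1|x) are consistent; each tree is a depth-1 stump on a single feature; leaf values use the usual one-step Newton approximation; and both classes and at least two distinct feature values are present. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left leaf value, right leaf value)


def leaf_value(residuals: List[float]) -> float:
    """One Newton step for a leaf: sum(r) / sum(|r| * (2 - |r|))."""
    den = sum(abs(r) * (2.0 - abs(r)) for r in residuals)
    return sum(residuals) / den if den > 1e-12 else 0.0


def fit_stump(xs: List[float], rs: List[float]) -> Stump:
    """Fit a depth-1 tree to the pseudo residuals by least squares."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, leaf_value(left), leaf_value(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def fit_gbdt(xs: List[float], ys: List[int], rounds: int = 10):
    """Boost stumps on labels in {-1, +1}; returns (f0, list of stumps)."""
    p = sum(1 for y in ys if y == 1) / len(ys)
    f0 = 0.5 * math.log(p / (1.0 - p))  # initial half log-odds
    f = [f0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Pseudo residual: negative gradient of ln(1 + exp(-2 y f)).
        rs = [2.0 * y / (1.0 + math.exp(2.0 * y * fi)) for y, fi in zip(ys, f)]
        t, cl, cr = fit_stump(xs, rs)
        stumps.append((t, cl, cr))
        f = [fi + (cl if x <= t else cr) for x, fi in zip(xs, f)]
    return f0, stumps


def predict_proba(x: float, f0: float, stumps: List[Stump]) -> float:
    """Probability of 'system normal' (y = +1), assumed form 1 / (1 + exp(-2 F(x)))."""
    F = f0 + sum(cl if x <= t else cr for t, cl, cr in stumps)
    return 1.0 / (1.0 + math.exp(-2.0 * F))


# Toy data: low CPU usage labelled normal (+1), high usage abnormal (-1).
cpu = [10.0, 20.0, 30.0, 70.0, 80.0, 90.0]
labels = [1, 1, 1, -1, -1, -1]
f0, stumps = fit_gbdt(cpu, labels)
print(predict_proba(20.0, f0, stumps) > 0.5, predict_proba(80.0, f0, stumps) < 0.5)
```

A production model would use multi-feature trees from a library rather than hand-written stumps; the sketch is meant only to make the sequence of steps concrete.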
[0126] Since the loss is fitted using the difference between the predicted probability value and the true probability value of the category, the probability must be converted into a category at the end, as shown in formulas (9) and (10). The final output compares the probability values of the categories, and the category with the higher probability is predicted as that category.
[0127] In this method, the output categories are divided into two main categories: system normal and system abnormal.
[0128] Let y = 1, in which case the system is in normal state:
[0129]
[0130] Let y = 0, in which case the system is in an abnormal state:
[0131]
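For reference, the expressions that the standard two-class gradient boosting derivation yields for the loss in formula (3) are collected below. They are a reconstruction from the surrounding definitions, assuming labels encoded as +1 for system normal and -1 for system abnormal (the abnormal case written as y = 0 in the text), and should not be read as a copy of the original formula images.

```latex
\begin{aligned}
f_0(x) &= \tfrac{1}{2}\,\ln\frac{P(y=1\mid x)}{P(y=-1\mid x)} && (4)\\
r_{mi} &= -\left[\frac{\partial L\big(y_i, f(x_i)\big)}{\partial f(x_i)}\right]_{f=f_{m-1}}
        = \frac{2\,y_i}{1+\exp\!\big(2\,y_i\,f_{m-1}(x_i)\big)} && (5)\\
c_{mj} &= \frac{\sum_{x_i\in R_{mj}} r_{mi}}{\sum_{x_i\in R_{mj}} |r_{mi}|\,\big(2-|r_{mi}|\big)} && (6)\\
f_m(x) &= f_{m-1}(x) + \sum_{j=1}^{J} c_{mj}\, I(x\in R_{mj}) && (7)\\
F(x) &= f_0(x) + \sum_{m=1}^{M}\sum_{j=1}^{J} c_{mj}\, I(x\in R_{mj}) && (8)\\
P(y=1\mid x) &= \frac{1}{1+\exp\!\big(-2F(x)\big)} && (9)\\
P(y=-1\mid x) &= \frac{1}{1+\exp\!\big(2F(x)\big)} && (10)
\end{aligned}
```

Here I(·) is the indicator function. Formula (5) follows by differentiating ln(1 + exp(-2yf)) with respect to f, and formulas (9) and (10) sum to 1, so comparing them amounts to checking the sign of F(x).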
[0132] Step 4: Further analyze and predict the specific circumstances of system anomalies. Extract the training data from the binary classification algorithm in Step 3 that results in "system anomaly". To more accurately analyze and predict specific system anomalies, select 75% of this data as the training set and 25% as the validation set. Use the GBDT multi-class classification algorithm, as follows:
[0133] Input: Training data from step 3 where the binary classification algorithm outputs "system anomaly". Log-likelihood is then chosen as the loss function. For multi-class classification, the log-loss function formula is:
[0134]
[0135] In formula (11), K is the total number of categories and k indexes them, k = 0, 1, ..., K; y_k indicates whether the sample belongs to the k-th category, y_k = {0, 1}, where 1 represents yes and 0 represents no; P_k(x) represents the probability that sample x belongs to the k-th class.
[0136] Output: Multi-class classification tree F(x)
[0137] (1) Initialize f(x):
[0138]
[0139] In formula (12), f_k represents the classification function, k represents the category, k = 0, 1, ..., K, and f_k0 represents the function that has not yet been iterated.
[0140] (2) Calculate the probability P(x) that all sample nodes belong to each category for m = 1, 2, 3, ..., M:
[0141]
[0142] In formula (13), m represents the number of iterations, i.e., the number of weak learners generated. m = 1, 2, 3, ..., M.
[0143] (3) Calculate the pseudo residual values for all categories k = 1, 2, ..., K.
[0144]
[0145] In formula (14), i indexes the samples, i = 1, 2, ..., N; P_k(x_i) represents the probability that the i-th sample belongs to class k, 0 ≤ P_k(x_i) ≤ 1; the label y_k of the i-th sample indicates whether the i-th sample belongs to the k-th category.
[0146] (4) Fit a classification tree to the probability pseudo-residuals, calculate the sum of values for each leaf node region, and find the value that minimizes the loss function.
[0147]
[0148] In formula (15), the pseudo-residual term is the value calculated for the category to which each sample belongs; R_mj is the leaf node region of the m-th tree, where m is the iteration index and j indexes the leaf nodes of each tree, j = 1, 2, ..., J; K represents the total number of categories.
[0149] Updating f(x) yields:
[0150]
[0151] (5) Obtain the final multi-class tree F_Mk(x):
[0152]
[0153] The final result can be used to obtain the probability P_Mk(x) that a sample is classified into the k-th class:
[0154]
[0155] Finally, the probabilities are converted into categories, as shown below:
[0156]
[0157] In the formula for the final output category, c(k, k′) is the joint cost when the true value is k′ and the predicted category is k. That is, the category with the highest probability is the predicted category.
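Steps (1) to (5) can be sketched in plain Python as follows. This is a simplified illustration under several assumptions that are not taken from the original formulas: each weak learner is a one-split tree (a stump), the leaf value uses the Friedman-style estimate commonly paired with the multi-class log loss, and `learning_rate`, `n_rounds`, and the toy data are hypothetical.

```python
import math

def softmax(scores):
    """Convert per-class scores f_k(x) into probabilities P_k(x)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_stump(X, residuals):
    """Least-squares choice of one (feature, threshold) split; assumes some feature varies."""
    best = None
    for feat in range(len(X[0])):
        for thr in sorted({row[feat] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[feat] <= thr]
            right = [r for row, r in zip(X, residuals) if row[feat] > thr]
            err = sum((r - sum(left) / len(left)) ** 2 for r in left)
            err += sum((r - sum(right) / len(right)) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, feat, thr)
    return best[1], best[2]

def leaf_value(residuals, n_classes):
    """Assumed Friedman-style leaf estimate for the multi-class log loss."""
    num = sum(residuals)
    den = sum(abs(r) * (1.0 - abs(r)) for r in residuals)
    if den < 1e-12:
        return 0.0
    return (n_classes - 1) / n_classes * num / den

def train(X, y, n_classes, n_rounds=10, learning_rate=0.5):
    scores = [[0.0] * n_classes for _ in X]            # step (1): start from zero scores
    model = []
    for _ in range(n_rounds):
        probs = [softmax(s) for s in scores]           # step (2): class probabilities
        trees = []
        for k in range(n_classes):
            # step (3): pseudo-residual = indicator(label == k) - P_k(x)
            res = [(1.0 if label == k else 0.0) - p[k] for label, p in zip(y, probs)]
            feat, thr = fit_stump(X, res)              # step (4): fit a tree to the residuals
            left = learning_rate * leaf_value(
                [r for row, r in zip(X, res) if row[feat] <= thr], n_classes)
            right = learning_rate * leaf_value(
                [r for row, r in zip(X, res) if row[feat] > thr], n_classes)
            trees.append((feat, thr, left, right))
            for row, s in zip(X, scores):              # update f_k(x) for every sample
                s[k] += left if row[feat] <= thr else right
        model.append(trees)
    return model

def predict(model, row, n_classes):
    scores = [0.0] * n_classes
    for trees in model:                                # step (5): sum all trees per class
        for k, (feat, thr, left, right) in enumerate(trees):
            scores[k] += left if row[feat] <= thr else right
    probs = softmax(scores)
    return max(range(n_classes), key=probs.__getitem__)  # category with highest probability

# Toy data: one feature (a usage percentage) and three hypothetical anomaly classes.
X = [[10.0], [20.0], [50.0], [60.0], [90.0], [95.0]]
y = [0, 0, 1, 1, 2, 2]
model = train(X, y, n_classes=3)
predictions = [predict(model, row, 3) for row in X]
print(predictions)
```

In practice a library implementation with deeper trees and a held-out validation set (the 25% split described in Step 4) would replace the stump learner used here.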
[0158] Step 5: Using a neural network together with the training and validation sets from Step 4, perform secondary cross-validation on the multi-classification results from Step 4. The softmax function mentioned in Step 2 is used as the new classifier. It "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z), such that each element lies in the range (0, 1) and all elements sum to 1. For example, let z = {47, 58, 92}, where 47 represents CPU usage, 58 represents memory usage, and 92 represents disk usage. Applying the following formula to each element of z gives σ(z) ≈ {2.9 × 10^-20, 1.7 × 10^-15, 1.0}; softmax preserves the order of its inputs, so the largest element (disk usage) receives the largest probability:
[0159]
[0160] In formula (20): σ(z)_j is the j-th element of the vector σ(z); z_k is the k-th element of the K-dimensional vector z; e is the natural constant; k indexes the dimensions of the vector, k = 1, 2, ..., K; K is the number of categories.
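A minimal Python sketch of the softmax mapping, assuming the standard definition σ(z)_j = e^(z_j) / Σ_k e^(z_k), is shown below for the usage vector from the example. Subtracting the maximum before exponentiating is a common numerical-stability step, and the rescaling of the percentages to fractions is an assumption added for illustration.

```python
import math

def softmax(z):
    """sigma(z)_j = exp(z_j) / sum_k exp(z_k), shifted by max(z) for numerical stability."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

raw = softmax([47, 58, 92])            # CPU, memory, disk usage as raw percentages
scaled = softmax([0.47, 0.58, 0.92])   # the same values expressed as fractions
print(raw)
print([round(p, 3) for p in scaled])
```

On raw percentages the largest element takes almost all of the probability mass, which is why inputs are often rescaled before softmax is applied; in both cases the outputs lie in (0, 1), sum to 1, and keep the ordering of the inputs.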
[0161] The softmax classifier can be used to solve multi-class classification problems by mapping the outputs of multiple neurons to the (0,1) interval. This method can be understood as performing multi-class classification based on probability. Assume the last fully connected layer of the neural network model outputs a multi-dimensional vector set:
[0162]
[0163] Each entry represents an element of a multidimensional vector, namely the K-th element of the I-th multidimensional vector x. First, softmax is used to transform the group of logit vectors into a probability distribution. The probability P(s = v | x) that the sample vector x belongs to class v is:
[0164]
[0165] In the formula: P(s = v | x): the probability that the classification result s belongs to class v; N: the total number of classes; x′: the derivative with respect to the training data x; x^T: the weight value T set for the training data x; w_v: the proportion of category v; w_n: the proportion of the n-th category.
[0166] Then, the category with the highest probability value is selected as the classification result. Thus, the highest-probability result obtained from the output of the neural network is the final result, which represents the analysis and prediction of the operating status of the power grid control system. In this way the result obtained in Step 4 is calibrated and verified, so that the operating status of the control system can be analyzed and predicted accurately.
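One way to picture this secondary check is sketched below. The text does not specify how disagreements between the two models are handled, so the sketch only takes the highest-probability category from the network's softmax output and reports how often it agrees with the GBDT category; the function name and the sample values are hypothetical.

```python
def cross_check(gbdt_labels, nn_probabilities):
    """Compare GBDT categories with the arg-max category of each softmax output."""
    nn_labels = [max(range(len(p)), key=p.__getitem__) for p in nn_probabilities]
    matches = [g == n for g, n in zip(gbdt_labels, nn_labels)]
    agreement = sum(matches) / len(matches)
    return nn_labels, agreement

gbdt_labels = [0, 2, 1, 1]
nn_probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],   # the network disagrees with GBDT on this sample
]
nn_labels, agreement = cross_check(gbdt_labels, nn_probabilities)
print(nn_labels, agreement)
```

A low agreement rate would suggest retraining one or both models before relying on the combined result.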
[0167] The present invention provides improvements in the following aspects:
[0168] 1. In the technical solution provided by this invention, the existing historical data of the operation status of the control system are preprocessed based on neural networks and GBDT to establish a state prediction model, which can accurately and quickly classify and predict the data of the operation status. At the same time, the neural network can change its neuron configuration according to different needs, and GBDT can be continuously iterated according to needs. This ensures both the accuracy and speed of the analysis and evaluation of the operation status of the control system, as well as the application flexibility and scalability of the calculation model.
[0169] 2. In the technical solution provided by this invention, the GBDT binary classification algorithm and multi-classification algorithm can accurately classify the operating status of the power grid control system; the binary classification algorithm can effectively and accurately analyze massive historical data and distinguish between the two major categories of operating status: normal system and abnormal system. On this basis, the multi-classification algorithm can make a more detailed division of the specific operating status of the system. Through the analysis and calculation of massive historical data, the current operating status of the power grid control system can be accurately assessed.
[0170] 3. In the technical solution provided by this invention, the model based on a feedforward neural network and perceptrons can continuously generate new learners and continuously train and iterate on samples; the selected activation function helps ensure the accuracy of the output results, and a learner that has already been trained significantly improves computational efficiency and reduces computation time in subsequent calculations. The model is suitable for situations where a large number of power grid control systems must be monitored and the types of monitoring data reflecting the system's operating status are diverse, which improves the timeliness and scalability of the analysis and evaluation system.
[0171] 4. In the technical solution provided by this invention, the softmax function is used to perform secondary verification on the output results of the GBDT algorithm, which safeguards the accuracy of the final analysis and evaluation results for the control system's operating status.
[0172] Example 2
[0173] Based on the same inventive concept, embodiments of the present invention also provide an analysis and evaluation system for the operating status of a power grid control system, comprising:
[0174] The acquisition module is used to acquire real-time operating data of the power grid control system.
[0175] The evaluation module is used to analyze and evaluate the operating status of the power grid control system based on the real-time operating data and the pre-built state prediction model, so as to obtain the operating status of the power grid control system.
[0176] The state prediction model is constructed based on historical operating state data of the power grid control system, using a feedforward neural network and gradient boosting tree algorithm.
[0177] In this embodiment, the system further includes: a construction module for constructing a state prediction model;
[0178] The construction module includes:
[0179] The binary classification unit is used to classify the status corresponding to the power grid control system operation data into two categories: system normal and system abnormal, based on the historical operating status data of the power grid control system and using the gradient boosting tree binary classification algorithm.
[0180] The multi-classification unit is used to further classify the specific situation of the system anomaly based on the historical operating status data corresponding to the system anomaly, and to obtain the multi-classification result by using the gradient boosting tree multi-classification algorithm.
[0181] The verification unit is used to perform secondary cross-validation on the multi-classification results based on the historical operating status data corresponding to the system anomaly, using a feedforward neural network, and select the category with the highest probability as the specific anomaly state in the system anomaly.
[0182] The historical operating status data includes: operating data and corresponding status; the status includes: system normal and system abnormal; the system abnormal includes: system crash, CPU spike, process deadlock, network outage, etc.
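The module structure described in this embodiment can be outlined as in the sketch below. All class names, field names, and the toy rules are hypothetical; in particular, the text does not state how the verification unit participates when real-time data is evaluated, so applying it as a final check here is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Features = Sequence[float]

@dataclass
class StatePredictionModel:
    """Output of the construction module; the three callables are placeholders."""
    is_abnormal: Callable[[Features], bool]     # binary classification unit
    anomaly_type: Callable[[Features], str]     # multi-classification unit
    verify: Callable[[Features, str], str]      # verification unit

class AcquisitionModule:
    """Acquires real-time operating data from a caller-supplied source."""
    def __init__(self, source: Callable[[], Features]):
        self.source = source

    def acquire(self) -> Features:
        return self.source()

class EvaluationModule:
    """Evaluates the operating status with the pre-built state prediction model."""
    def __init__(self, model: StatePredictionModel):
        self.model = model

    def evaluate(self, data: Features) -> str:
        if not self.model.is_abnormal(data):
            return "system normal"
        candidate = self.model.anomaly_type(data)
        return self.model.verify(data, candidate)

# Toy rules standing in for trained models: the first feature is CPU usage in percent.
toy_model = StatePredictionModel(
    is_abnormal=lambda d: d[0] > 90.0,
    anomaly_type=lambda d: "CPU spike",
    verify=lambda d, candidate: candidate,
)
acquisition = AcquisitionModule(source=lambda: [97.0, 40.0, 55.0])
evaluation = EvaluationModule(toy_model)
print(evaluation.evaluate(acquisition.acquire()))
```

Keeping the three units behind plain callables lets trained GBDT and neural-network models be swapped in without changing the evaluation flow.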
[0183] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0184] This application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each process and/or box of the flowcharts and/or block diagrams, and combinations of processes and/or boxes in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.
[0185] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more processes of the flowchart and/or one or more boxes of the block diagram.
[0186] The above are merely embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention are included within the scope of the claims of the present invention pending approval.
Claims
1. A method for analyzing and evaluating the operating state of a power grid control system, characterized by comprising: obtaining real-time operating data from the power grid control system; and analyzing and evaluating the operating status of the power grid control system based on the real-time operating data and a pre-built state prediction model, so as to obtain the operating status of the power grid control system. The state prediction model is constructed based on historical operating status data of the power grid control system, using a feedforward neural network and a gradient boosting tree algorithm. The construction of the state prediction model includes: Based on the historical operating status data of the power grid control system, the gradient boosting tree binary classification algorithm is used to divide the status corresponding to the operating data of the power grid control system into two categories: system normal and system abnormal. Based on the historical operating status data corresponding to the system anomaly, a gradient boosting tree multi-classification algorithm is used to further classify the specific situation of the system anomaly to obtain multi-classification results. Based on the historical operating status data corresponding to the system anomaly, a feedforward neural network is used to perform secondary cross-validation on the multi-classification results, and the category with the highest probability is selected as the specific anomaly state in the system anomaly. The historical operating status data includes: operating data and corresponding status; the status includes: system normal and system abnormal; the system abnormal includes: crash, high CPU, process deadlock, network failure; the operating data includes: application status, process status, node status, link status, I/O, CPU usage, disk usage, and memory usage.
The process of using a feedforward neural network to perform secondary cross-validation on the multi-classification results based on the historical operational status data corresponding to the system anomalies includes: A softmax function is added to the hidden layer of the feedforward neural network model to generate a classifier; Based on the operation data in the historical operation status data and the set probability calculation formula, a multi-dimensional vector group is obtained; Based on the multidimensional vector group and classifier, a probability distribution composed of the probabilities of each category corresponding to each group of running data is obtained; The category with the highest probability in the probability distribution is selected as the classification result corresponding to the running data. The multi-classification results are subjected to secondary cross-validation based on the classification results corresponding to each group of operational data. Based on the historical operational status data corresponding to the system anomaly, a gradient boosting tree multi-classification algorithm is used to further classify the specific circumstances of the system anomaly, obtaining multi-classification results, including: The historical operating status data corresponding to the system anomalies are divided into a training set and a validation set; Based on the set second loss function, the gradient boosting tree multi-classification algorithm is used to train the training set to obtain a multi-classification tree; Based on the multi-classification tree, the corresponding probabilities of the sample data being classified into each category are obtained; The sample data is converted into the corresponding category based on the highest probability among each category to obtain multi-classification results; The trained multi-class tree is validated using a validation set. 
When the error meets the requirements, the training ends, and the specific situations of the power grid control system anomalies are reclassified to obtain the multi-class results; otherwise, the training is repeated. The probability of the running data corresponding to each category is calculated using the following formula: In the formula: P(s = v | x): the probability that the classification result s belongs to class v; N: the total number of categories; x′: the derivative with respect to the running data x in the training set; x^T: the weight value T set for the running data x in the training set; w_v: the proportion of the v-th category; w_n: the proportion of the n-th category.
2. The method as described in claim 1, characterized in that, Based on the historical operating status data of the power grid control system, a gradient boosting tree binary classification algorithm is used to classify the operating status corresponding to the power grid control system data into two main categories: system normal and system abnormal. The historical operating status data of the power grid control system is divided into training samples and validation samples; Based on the first loss function set according to the difference between the predicted probability value and the actual probability value of the power grid control system state, the gradient boosting tree binary classification algorithm is used to train the training samples to obtain a binary classification tree. Based on the set probability transformation formula, the probability that the state corresponding to each running data in the binary classification tree is system normal and system abnormal is calculated. The state corresponding to the running data is determined based on the probability that the system is normal and the system is abnormal, respectively. The trained binary classification tree is validated using validation samples. When the error meets the requirements, the training ends and the corresponding status of the power grid control system operation data is divided into two categories: system normal and system abnormal; otherwise, the training is repeated.
3. The method of claim 2, wherein, The first loss function, set based on the difference between the predicted probability value and the actual probability value of the power grid control system state, is used to train the training samples using a gradient boosting tree binary classification algorithm to obtain a binary classification tree, including: Based on minimizing the first loss function, an optimal prediction probability function is constructed. Calculate the predicted probability value of all training samples based on the current optimal prediction probability value function; Based on the predicted probability values and true probability values of all training samples, the pseudo residual of the first loss function is obtained; Use all the running data and corresponding pseudo residuals in the current training sample as the training data for the next tree, and fit them into a new classification tree; The optimal prediction probability function is updated based on the current pseudo residual, and the new classification tree is iteratively calculated based on the updated optimal prediction probability function. When the value of the first loss function is minimized, the loop ends, and the optimal prediction probability function is updated based on the minimum value of the first loss function. The binary classification tree is obtained by summing the optimal prediction probability values from all iterations.
4. The method as described in claim 2, characterized in that the first loss function is shown in the following equation: In the formula, the terms denote, in order: the training samples in the training set; the true state corresponding to the training sample; the predicted value of the running state corresponding to the training sample; and the exponential function with the natural constant e as its base.
5. The method of claim 4, wherein the probability transformation formula includes: when y = 1, the system is normal, and the system normal probability conversion formula is shown in the following equation: In the formula, the terms denote: the probability that the system is normal for the training sample; and the binary classification tree. When y = 0, the system is abnormal, and the system abnormality probability conversion formula is shown in the following equation: In the formula, the term denotes the probability of system abnormality corresponding to the training sample.
6. The method of claim 1, wherein, The step involves training the training set using a gradient boosting tree multi-classification algorithm based on a predefined second loss function to obtain a multi-classification tree, including: Based on minimizing the second loss function, a classification function is constructed; Based on the classification function, predict the probability that all sample nodes in the training set belong to each category; Calculate the pseudo residual value for all classifications based on the true class and predicted class probability of all sample nodes in the training set; A new multi-class tree is fitted based on all sample nodes and pseudo residuals in the training set; The classification function is updated based on the current pseudo residual value, and the sum of the values of each leaf node region is calculated iteratively on the new multi-class tree based on the updated classification function. When the minimum value of the second loss function is obtained, the loop ends and the classification function is updated based on the minimum value of the second loss function. Based on all the classification functions during the iteration process, a multi-classification tree is obtained.
7. The method of claim 1, wherein the expression for each element in the multidimensional vector group is shown in the following equation: In the formula: σ(z)_j: the j-th element of the vector σ(z); z_k: the k-th element of the K-dimensional vector z; K: the number of categories.
8. An analysis and evaluation system for the operating state of a power grid control system, characterized by comprising: an acquisition module, used to acquire real-time operating data of the power grid control system; and an evaluation module, used to analyze and evaluate the operating status of the power grid control system based on the real-time operating data and a pre-built state prediction model, so as to obtain the operating status of the power grid control system. The state prediction model is constructed based on historical operating status data of the power grid control system, using a feedforward neural network and a gradient boosting tree algorithm. The system also includes a construction module for constructing the state prediction model. The construction module includes: The binary classification unit is used to classify the status corresponding to the power grid control system operation data into two categories: system normal and system abnormal, based on the historical operating status data of the power grid control system and using the gradient boosting tree binary classification algorithm. The multi-classification unit is used to further classify the specific situation of the system anomaly based on the historical operating status data corresponding to the system anomaly, and to obtain the multi-classification result by using the gradient boosting tree multi-classification algorithm. The verification unit is used to perform secondary cross-validation on the multi-classification results based on the historical operating status data corresponding to the system anomaly, using a feedforward neural network, and select the category with the highest probability as the specific anomaly state in the system anomaly.
The historical operating status data includes: operating data and corresponding status; the status includes: system normal and system abnormal; the system abnormal includes: crash, high CPU, process deadlock, network failure; the operating data includes: application status, process status, node status, link status, I / O, CPU usage, disk usage, and memory usage. The process of using a feedforward neural network to perform secondary cross-validation on the multi-classification results based on the historical operational status data corresponding to the system anomalies includes: A softmax function is added to the hidden layer of the feedforward neural network model to generate a classifier; Based on the operation data in the historical operation status data and the set probability calculation formula, a multi-dimensional vector group is obtained; Based on the multidimensional vector group and classifier, a probability distribution composed of the probabilities of each category corresponding to each group of running data is obtained; The category with the highest probability in the probability distribution is selected as the classification result corresponding to the running data. The multi-classification results are subjected to secondary cross-validation based on the classification results corresponding to each group of operational data. 
Based on the historical operational status data corresponding to the system anomaly, a gradient boosting tree multi-classification algorithm is used to further classify the specific circumstances of the system anomaly, obtaining multi-classification results, including: The historical operating status data corresponding to the system anomalies are divided into a training set and a validation set; Based on the set second loss function, the gradient boosting tree multi-classification algorithm is used to train the training set to obtain a multi-classification tree; Based on the multi-classification tree, the corresponding probabilities of the sample data being classified into each category are obtained; The sample data is converted into the corresponding category based on the highest probability among each category to obtain multi-classification results; The trained multi-class tree is validated using a validation set. When the error meets the requirements, the training ends, and the specific situations of the power grid control system anomalies are reclassified to obtain the multi-class results; otherwise, the training is repeated. The probability of the running data corresponding to each category is calculated using the following formula: In the formula: P(s = v | x): the probability that the classification result s belongs to class v; N: the total number of categories; x′: the derivative with respect to the running data x in the training set; x^T: the weight value T set for the running data x in the training set; w_v: the proportion of the v-th category; w_n: the proportion of the n-th category.