A method and system for early warning of faults in a synchronous condenser excitation system

By constructing a fault early warning method for synchronous condenser excitation systems that integrates multi-scale time-series networks and preset temporal attention, the invention addresses fault diagnosis when fault samples are scarce, enables accurate early fault warning, and improves the stability and operational reliability of the power grid.

CN120951186B · Active · Publication Date: 2026-06-30 · SOUTHEAST UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
SOUTHEAST UNIV
Filing Date
2025-06-16
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies struggle to accurately monitor the state of a synchronous condenser's excitation system in the absence of fault samples, leading to difficulties in fault diagnosis and impacting the stable operation of the power grid.

Method used

A fault early warning method for a synchronous condenser excitation system is constructed by integrating multi-scale temporal networks and preset temporal attention. Multi-branch temporal convolutional networks and long short-term memory networks extract multi-scale temporal features, and preset temporal attention is introduced for feature fusion. A residual threshold set with the exponentially weighted moving average (EWMA) method provides early fault warning.

Benefits of technology

It improves the accuracy of reactive power prediction in the excitation system, enables reliable early warning of faults, ensures stable operation of the synchronous condenser, and reduces maintenance costs.

✦ Generated by Eureka AI based on patent content.

Figure CN120951186B_ABST

Abstract

This invention proposes a method and system for early warning of faults in a synchronous condenser excitation system, belonging to the field of excitation system fault early warning technology. First, the invention constructs a multi-scale time-series network model incorporating preset temporal attention. Multi-scale temporal features are extracted by a multi-branch temporal convolutional network and a long short-term memory network connected in parallel, and dynamic feature fusion is achieved with a temporal attention mechanism guided by prior knowledge. Next, the network model is trained on normal operating condition data to characterize the normal operating state of the excitation system. Finally, residual analysis using an exponentially weighted moving average method provides early fault warning for the excitation system. By combining a multi-scale time-series network with preset temporal attention, the invention significantly improves the model's prediction accuracy for the normal operating state of the excitation system, so that early fault characteristics can be captured sensitively and warned of in advance.

Description

Technical Field

[0001] This invention relates to the field of excitation system fault early warning technology, and specifically to a method and system for early warning of faults in a synchronous condenser excitation system that integrates multi-scale time-series networks and preset temporal attention.

Background Technology

[0002] With the rapid development of ultra-high-voltage direct current (UHVDC) transmission, the risk of voltage instability caused by DC commutation failure and insufficient reactive power reserves in the DC receiving-end grid is becoming increasingly prominent. As the proportion of renewable energy generation continues to increase, the system's short-circuit capacity is decreasing, and DC faults in the sending-end grid can cause transient overvoltages, easily leading to large-scale disconnection of renewable energy units from the grid. Considering the grid instability issues brought about by UHVDC transmission and renewable energy grid integration, synchronous condensers with large-capacity dynamic reactive power output and the ability to improve system short-circuit capacity have once again attracted the attention of the power industry.

[0003] The excitation system, as the core control component of a synchronous condenser, plays a crucial role in enabling the condenser to provide reactive power support rapidly. If a fault in the excitation system is not quickly located and eliminated, it poses a significant risk to the safe and stable operation of the power grid. In the past, fault diagnosis of the excitation system relied mainly on regular manual maintenance, which limited the information available from the monitoring points. As the performance of microprocessor-based excitation devices has continued to improve, these devices now store switching and analog quantities and also integrate functions such as short-time waveform recording, limiter actions, and fault alarms, which help maintenance personnel analyze fault causes. However, because the mapping between fault symptoms and causes is complex, maintenance personnel cannot quickly locate the fault source from the abnormal action information displayed by the monitoring points and their limited experience alone, which increases the likelihood of misdiagnosis or missed diagnosis. This significantly increases maintenance costs, extends maintenance cycles, and affects the stable operation of the synchronous condenser and the power grid.

[0004] Degradation of excitation system equipment shows weak symptoms that change slowly, so traditional waveform recording mechanisms are ineffective for monitoring it. Furthermore, fault samples are scarce because of the redundant switching mechanism of the excitation system and the short time the SCADA system has been in service, which renders label-based fault detection methods ineffective. It is therefore necessary to construct a time-series network model that represents normal operating conditions, predicts the reactive power of the excitation system, and monitors changes in the residual, thereby providing early warning of faults in the synchronous condenser excitation system.

Summary of the Invention

[0005] The technical problem this invention aims to solve is to provide a fault early warning method and system for synchronous condenser excitation systems that integrates multi-scale temporal networks and preset temporal attention. When fault samples are insufficient, a temporal network model that accurately represents the normal operating state is constructed. Preset temporal attention assigns higher weights to input data at key time steps, improving the model's prediction accuracy for the reactive power of the excitation system. This provides a reliable basis for early fault warning in the synchronous condenser excitation system and helps ensure the stable and safe operation of the synchronous condenser.

[0006] To solve the above technical problems, the present invention adopts the following technical solution:

[0007] This invention first proposes a fault early warning method for a synchronous condenser excitation system that integrates multi-scale temporal networks and preset temporal attention, including the following steps:

[0008] S1. Collect SCADA data containing reactive power and related state variables under normal operating conditions of the synchronous condenser excitation system, perform data preprocessing on the raw data; determine the input time window length and output prediction step size, construct the dataset using the sliding window method, and divide it into training set, validation set, and test set.

[0009] S2. Construct a multi-scale temporal network model that integrates pre-set temporal attention. Extract multi-scale temporal features by connecting multi-branch temporal convolutional networks and long short-term memory networks in parallel, and introduce pre-set temporal attention to achieve dual-dimensional feature fusion.

[0010] S3. Train the network model on normal operating condition data, using the mean absolute error (MAE) as the loss function and the gradient descent method to update the network weights and biases.

[0011] S4. Calculate the residual between the reactive power predicted by the network and the measured reactive power, process the residual with the exponentially weighted moving average (EWMA) method, and determine the residual threshold; then feed real-time SCADA data into the network and use residual analysis to provide early fault warning for the excitation system.

[0012] As a preferred embodiment of the method of this invention, the multi-branch temporal convolutional network extracts temporal features at different scales using exponentially increasing dilation factors, and compresses and concatenates the outputs of the residual blocks using 1×1 convolutions. Let $\mathbf{H}_n \in \mathbb{R}^{D_n \times T}$ denote the output of the n-th residual block, whose i-th feature at time step t is $h_{i,t}$. After 1×1 convolution, the compressed feature $\mathbf{X}'_n \in \mathbb{R}^{(D_n/4) \times T}$ has, for its s-th feature at the t-th time step, the value $x'_{s,t}$:

[0013] $x'_{s,t} = \sum_{i=1}^{D_n} w_{s,i}\, h_{i,t} + b_s$

[0014] where $w_{s,i}$ are the weight parameters of the convolution kernel, which linearly map input feature i to output feature s; $b_s$ is the bias term; $s \in \{1, 2, \ldots, D_n/4\}$ is the output feature dimension index; and $t \in \{1, 2, \ldots, T\}$ is the time step index.
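The 1×1 convolution described in [0012] to [0014] maps the D_n input features at each time step linearly onto D_n/4 output features and leaves the time axis untouched. The following is a minimal pure-Python sketch of that computation, not the patent's implementation; the names `conv1x1`, `h`, `weight`, and `bias` and the feature-major list layout are assumptions made for illustration.

```python
from typing import List

# Feature-major layout: matrix[feature][time_step] (an assumption of this sketch).
Matrix = List[List[float]]


def conv1x1(h: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """1x1 convolution: x'[s][t] = sum_i weight[s][i] * h[i][t] + bias[s].

    h is D_n x T, weight is D_out x D_n and bias has length D_out.
    Each time step is transformed independently, so T is preserved.
    """
    d_in, steps = len(h), len(h[0])
    if any(len(row) != d_in for row in weight) or len(bias) != len(weight):
        raise ValueError("weight must be D_out x D_n and bias must have length D_out")
    return [
        [sum(weight[s][i] * h[i][t] for i in range(d_in)) + bias[s] for t in range(steps)]
        for s in range(len(weight))
    ]


# Toy example: D_n = 8 input features over T = 3 steps, compressed to D_n / 4 = 2.
h = [[float(i + t) for t in range(3)] for i in range(8)]
weight = [
    [1.0] * 4 + [0.0] * 4,  # output feature 0: sum of input features 0..3
    [0.0] * 4 + [0.5] * 4,  # output feature 1: half the sum of input features 4..7
]
bias = [0.0, 1.0]
compressed = conv1x1(h, weight, bias)
print(compressed)  # [[6.0, 10.0, 14.0], [12.0, 14.0, 16.0]]
```

In the PyTorch environment listed in [0099], the same operation would typically be written as `torch.nn.Conv1d(d_n, d_n // 4, kernel_size=1)`.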

[0015] The 1×1-compressed features of the N residual blocks, $\mathbf{X}'_1, \ldots, \mathbf{X}'_N$, are concatenated along the feature dimension to obtain the concatenated feature matrix $\mathbf{X}_{cat}$:

[0016] $\mathbf{X}_{cat} = \mathrm{Concat}(\mathbf{X}'_1, \mathbf{X}'_2, \ldots, \mathbf{X}'_N)$

[0017] Here, Concat(·) denotes the feature concatenation operation. On this basis, a global residual connection is introduced, expressed as follows:

[0018]

[0019] where $D_{out}$ is the feature dimension of the output feature matrix of the multi-branch temporal convolutional network.
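The concatenation in [0015] and [0016] stacks the compressed residual-block outputs along the feature axis, so the result has as many features as all blocks combined while the time axis is unchanged. A minimal sketch, assuming a feature-major list-of-lists layout; the function name and the three toy branches are hypothetical. The global residual connection of [0017] and [0018] is not shown, because its expression is not reproduced in this text.

```python
from typing import List

Matrix = List[List[float]]  # matrix[feature][time_step]


def concat_features(blocks: List[Matrix]) -> Matrix:
    """Concatenate compressed residual-block outputs along the feature axis."""
    steps = len(blocks[0][0])
    if any(len(row) != steps for block in blocks for row in block):
        raise ValueError("all blocks must cover the same number of time steps")
    # Rows of block 1 first, then block 2, and so on; each row is copied.
    return [list(row) for block in blocks for row in block]


# Three hypothetical branches, each already compressed to 2 features x 4 steps.
branches = [[[float(b)] * 4, [b + 0.5] * 4] for b in range(3)]
x_cat = concat_features(branches)
print(len(x_cat), len(x_cat[0]))  # 6 4
```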

[0020] As a preferred embodiment of the method of the present invention, the temporal attention weights are initialized under the guidance of prior knowledge, with initial weight $w_{pre,t}$ given by:

[0021]

[0022] where $b_{pre,1}$ and $b_{pre,2}$ are trainable coefficients whose initial values are set from domain knowledge so that the initial weights decrease smoothly from the current time step back to earlier (historical) time steps. Dynamic temporal attention weights $w_{stat,t}$ are then generated from the statistical features of the sequence:

[0023] $w_{stat,t} = \sigma(\mathbf{W}_{stat}\,\mathbf{x}_{stat,t} + b_{stat})$

[0024] where $\mathbf{x}_{stat,t} \in \mathbb{R}^{D_s}$ is the statistical feature vector of reactive power at time t, $\mathbf{W}_{stat}$ is the weight matrix, $b_{stat}$ is the bias vector, and $D_s$ is the dimension of the statistical feature vector. The prior-knowledge weights $w_{pre}$ and the statistics-adjusted weights $w_{stat}$ are summed and normalized with the Softmax function to obtain the final temporal attention weights $\lambda_{final}$:

[0025] $w_{final} = w_{pre} + w_{stat}$

[0026] $\lambda_{final} = \mathrm{Softmax}(w_{final})$
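The temporal-attention steps in [0020] to [0026] can be sketched in pure Python as below. Because the prior-weight expression in [0021] is not reproduced in this text, `prior_weights` uses a hypothetical exponential decay controlled by `b1` and `b2` (standing in for b_pre,1 and b_pre,2). σ is assumed to be the logistic sigmoid, and the final weighted sum over time follows the description of the fused feature vector given later in [0029]. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(values: Sequence[float]) -> List[float]:
    peak = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def prior_weights(steps: int, b1: float, b2: float) -> List[float]:
    # HYPOTHETICAL prior: weight decays exponentially with distance from the
    # current step t = T. The patent's own expression is not reproduced here.
    return [b1 * math.exp(-b2 * (steps - t)) for t in range(1, steps + 1)]


def temporal_attention(
    features: Sequence[Sequence[float]],  # T x D, time-major
    stats: Sequence[Sequence[float]],     # T x D_s statistical features
    w_stat: Sequence[float],              # length D_s, one score per time step
    b_stat: float,
    b1: float = 1.0,
    b2: float = 0.3,
) -> Tuple[List[float], List[float]]:
    steps = len(features)
    w_pre = prior_weights(steps, b1, b2)
    w_dyn = [sigmoid(sum(w * x for w, x in zip(w_stat, s)) + b_stat) for s in stats]
    lam_final = softmax([p + q for p, q in zip(w_pre, w_dyn)])
    # Weighted sum over the time axis gives the fused feature vector.
    dim = len(features[0])
    fused = [sum(lam_final[t] * features[t][d] for t in range(steps)) for d in range(dim)]
    return lam_final, fused


# Toy usage: T = 5 steps, D = 2 features, D_s = 2 statistics per step.
feats = [[2.0, -1.0]] * 5
stats = [[0.1 * t, 0.0] for t in range(5)]
lam, fused = temporal_attention(feats, stats, w_stat=[0.5, 0.5], b_stat=0.0)
```

With these toy inputs the most recent step receives the largest weight, which mirrors the intended smooth decay toward historical steps.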

[0027] After fusion along the temporal dimension, fully connected layers compute feature attention weights to achieve dynamic fusion along the feature dimension:

[0028] $w_{att} = \sigma\big(\mathbf{W}_{att2} \cdot \mathrm{ReLU}(\mathbf{W}_{att1}\mathbf{X}_{fused} + b_{att1}) + b_{att2}\big)$

[0029] where $\mathbf{W}_{att1}$ and $\mathbf{W}_{att2}$ are trainable weight matrices, $b_{att1}$ and $b_{att2}$ are the corresponding bias vectors, $\mathrm{ReLU}(x) = \max(x, 0)$ is the rectified linear unit, $\mathbf{X}_{fused} \in \mathbb{R}^{D_{fused}}$ is the fused feature vector obtained by weighted summation along the time dimension, and $D_{fused}$ is the feature dimension output by the parallel network. The feature attention weights are normalized with the Softmax function, and the normalized weights $\lambda_{att}$ are multiplied element-wise with the fused features $\mathbf{X}_{fused}$ to enhance or suppress individual features, yielding the weighted feature vector $\mathbf{X}_{con}$:

[0030] $\lambda_{att} = \mathrm{Softmax}(w_{att})$

[0031] $\mathbf{X}_{con} = \lambda_{att} \odot \mathbf{X}_{fused}$

[0032] Here, ⊙ denotes element-wise multiplication.
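The feature-attention step in [0027] to [0032] amounts to two dense layers, a Softmax over features, and an element-wise product. This is a minimal pure-Python sketch under stated assumptions: σ is taken to be the logistic sigmoid, weights are passed as plain nested lists, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # matrix[output_unit][input_unit]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(values: Sequence[float]) -> List[float]:
    peak = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weight: Matrix, x: Sequence[float], bias: Sequence[float]) -> List[float]:
    """Fully connected layer: weight @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def feature_attention(x_fused, w_att1, b_att1, w_att2, b_att2) -> Tuple[List[float], List[float]]:
    hidden = [max(v, 0.0) for v in dense(w_att1, x_fused, b_att1)]  # ReLU
    w_att = [sigmoid(v) for v in dense(w_att2, hidden, b_att2)]     # one score per feature
    lam_att = softmax(w_att)                                        # normalize over features
    x_con = [a * x for a, x in zip(lam_att, x_fused)]               # element-wise product
    return lam_att, x_con


# Toy usage: D_fused = 3 features, hidden width 2. All-zero weights give
# identical scores, so the attention is uniform (1/3 per feature).
x_fused = [1.0, 2.0, 3.0]
zeros_1 = [[0.0] * 3 for _ in range(2)]
zeros_2 = [[0.0] * 2 for _ in range(3)]
lam_att, x_con = feature_attention(x_fused, zeros_1, [0.0, 0.0], zeros_2, [0.0, 0.0, 0.0])
```

Because the sigmoid scores lie in (0, 1) before the Softmax, the resulting feature weights stay fairly close to uniform unless the scores differ substantially; whether that matches the intended behaviour depends on the trained weights.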

[0033] As a preferred embodiment of the method of the present invention, the loss function $L_{MAE}$ is designed as:

[0034] $L_{MAE} = \dfrac{1}{N_{S1} T_{out}} \sum_{i=1}^{N_{S1}} \sum_{t=1}^{T_{out}} \left| \hat{y}_{i,t} - y_{i,t} \right|$

[0035] where $N_{S1}$ is the sample batch size, $T_{out}$ is the output prediction step size, and $\hat{y}_{i,t}$ and $y_{i,t}$ are the predicted and measured reactive power values of the i-th sample at time step t, respectively.
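The loss in [0033] to [0035] is the mean absolute error averaged over the samples in a batch and the prediction steps. A minimal pure-Python sketch follows (the function name is illustrative); in the PyTorch setup listed in [0099], `torch.nn.L1Loss()` with its default `reduction='mean'` computes the same average over all elements.

```python
from typing import Sequence


def mae_loss(pred: Sequence[Sequence[float]], meas: Sequence[Sequence[float]]) -> float:
    """Mean absolute error over a batch of N samples x T_out prediction steps."""
    if len(pred) != len(meas) or any(len(p) != len(m) for p, m in zip(pred, meas)):
        raise ValueError("pred and meas must both be N x T_out")
    n_samples, t_out = len(pred), len(pred[0])
    total = sum(abs(p - m) for row_p, row_m in zip(pred, meas) for p, m in zip(row_p, row_m))
    return total / (n_samples * t_out)


# Two samples, two prediction steps each.
pred = [[1.0, 2.0], [3.0, 4.0]]
meas = [[1.5, 2.0], [2.0, 4.0]]
print(mae_loss(pred, meas))  # 0.375
```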

[0036] As a preferred embodiment of the method of the present invention, the statistical characteristics of the residual sequence under normal operating conditions are analyzed to set the early warning threshold. The measured reactive power $y_t$ is compared with the predicted value $\hat{y}_t$ to generate the residual sequence $e_t$, and the corresponding EWMA value $Z_t$ is calculated:

[0037] $e_t = y_t - \hat{y}_t$

[0038] $Z_t = \lambda_{EW} e_t + (1 - \lambda_{EW}) Z_{t-1}$

[0039] where $Z_{t-1}$ and $Z_t$ are the EWMA values at times t-1 and t, and $\lambda_{EW}$ is the EWMA smoothing coefficient, which controls the weight given to historical data. The initial value $Z_0$ is the mean of the residual sequence under normal operating conditions. The upper and lower limits of the EWMA control chart are:

[0040]

[0041]

[0042] where UCL and LCL are the upper and lower limits of the EWMA control chart, respectively, $\sigma_{EW}$ is the standard deviation of the original residual sequence, and $k_{EW}$ is the EWMA control limit coefficient.
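The EWMA smoothing and control limits in [0036] to [0042] can be sketched as follows. The limit formulas of [0040] and [0041] are not reproduced in this text, so `control_limits` assumes the common asymptotic EWMA limits $Z_0 \pm k_{EW}\,\sigma_{EW}\sqrt{\lambda_{EW}/(2-\lambda_{EW})}$ centred on $Z_0$; it also assumes the recursion $Z_t = \lambda_{EW} e_t + (1-\lambda_{EW}) Z_{t-1}$ and uses the sample standard deviation. All three are assumptions to be checked against the original equations; the function names are illustrative.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def ewma(residuals: Sequence[float], lam: float, z0: float) -> List[float]:
    """Z_t = lam * e_t + (1 - lam) * Z_{t-1}, starting from Z_0 = z0."""
    z, out = z0, []
    for e in residuals:
        z = lam * e + (1.0 - lam) * z
        out.append(z)
    return out


def control_limits(normal_residuals: Sequence[float], lam: float, k: float) -> Tuple[float, float, float]:
    """Return (Z_0, LCL, UCL) using ASSUMED asymptotic EWMA limits."""
    z0 = statistics.mean(normal_residuals)      # Z_0: mean of normal residuals
    sigma = statistics.stdev(normal_residuals)  # sigma_EW (sample std, an assumption)
    half_width = k * sigma * math.sqrt(lam / (2.0 - lam))
    return z0, z0 - half_width, z0 + half_width


normal = [0.1, -0.1, 0.2, -0.2, 0.0]
z0, lcl, ucl = control_limits(normal, lam=0.2, k=3.0)
print(ewma([1.0, 1.0, 1.0], lam=0.5, z0=0.0))  # [0.5, 0.75, 0.875]
```

A smaller smoothing coefficient makes the chart more sensitive to small, persistent shifts in the residual, which is the kind of slow degradation described in [0004].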

[0043] After the residual warning threshold is determined, real-time SCADA data undergo the same preprocessing as in the offline phase. The processed data are input into the trained time-series network model for reactive power prediction, and the residual between the predicted and measured values is calculated in real time. When the residual exceeds the warning threshold several consecutive times, an abnormal alarm is triggered, providing early warning of excitation system faults.
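The alarm rule in [0043] fires only after repeated, consecutive threshold violations. The required number of consecutive violations is not specified in this text, so `n_consecutive` below is a hypothetical setting, and the function name is illustrative. The sketch checks whichever monitored sequence is compared against the limits (for example, the EWMA-smoothed residual) and reports the index at which each run of violations first reaches the required length.

```python
from typing import Iterable, List


def consecutive_alarms(values: Iterable[float], lcl: float, ucl: float, n_consecutive: int = 3) -> List[int]:
    """Indices at which a run of out-of-limit values reaches n_consecutive."""
    alarms: List[int] = []
    run = 0
    for idx, v in enumerate(values):
        run = run + 1 if (v > ucl or v < lcl) else 0  # reset on any in-limit value
        if run == n_consecutive:                       # fire once per run
            alarms.append(idx)
    return alarms


monitored = [0.0, 0.5, 0.6, 0.7, 0.0, 0.8, 0.9]
print(consecutive_alarms(monitored, lcl=-0.4, ucl=0.4))  # [3]
```

Requiring a run of violations trades a short detection delay for fewer false alarms from isolated noisy samples.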

[0044] Meanwhile, this invention proposes a fault early warning system for a synchronous condenser excitation system, comprising:

[0045] The data acquisition and processing unit collects SCADA data containing reactive power and related state variables under normal operating conditions of the synchronous condenser excitation system, preprocesses the raw data, determines the input time window length and output prediction step size, constructs the dataset using the sliding window method, and divides it into training, validation, and test sets.

[0046] The neural network model building unit constructs a multi-scale temporal network model that integrates preset temporal attention. It extracts multi-scale temporal features with a multi-branch temporal convolutional network and a long short-term memory network connected in parallel, and introduces preset temporal attention to achieve dual-dimensional feature fusion. The network model is trained on normal operating condition data with the mean absolute error (MAE) as the loss function, and the gradient descent method is used to update the network weights and biases.

[0047] The residual threshold setting unit calculates the residual between the predicted and measured reactive power outputs of the network, processes the residual using the exponentially weighted moving average method, and then determines the residual threshold.

[0048] The early warning output unit selects real-time SCADA data as input to the trained time-series network model for reactive power prediction and calculates the residual between the predicted and measured values. When the residual exceeds the early warning threshold multiple times consecutively, an abnormal alarm is triggered, thus completing the early warning of excitation system faults.

[0049] Finally, the present invention also provides a computer-readable storage medium storing computer instructions for causing the computer to perform the steps of the method of the present invention.

[0050] The present invention adopts the above technical solution and has the following technical effects compared with the prior art:

[0051] (1) The multi-branch temporal convolutional network model proposed in this invention uses a multi-branch parallel structure to extract local temporal features of different receptive fields, and uses 1×1 convolution to compress and fuse the features, thereby enhancing the feature expression ability while controlling the computational complexity, and achieving a balance between the ability to model complex temporal data and computational efficiency.

[0052] (2) This invention performs dual-dimensional feature fusion of time dimension and feature dimension on the deep features output by the temporal network, thereby improving the model's ability to model complex temporal relationships.

[0053] (3) This invention combines prior-knowledge-guided weight initialization with a data-driven dynamic adjustment mechanism. The initial decay trend of the temporal attention weights is preset from domain expertise and then dynamically corrected using real-time statistical features. This balances long-term dependencies in the time-series features against the capture of short-term key information, helps avoid overfitting to the training samples, and thus improves the stability and generalization of model training.

Attached Figure Description

[0054] Figure 1 is a flowchart of the implementation process of the present invention.

[0055] Figure 2 shows the overall framework of the temporal network model in an embodiment of the invention.

[0056] Figure 3 compares the reactive power prediction curves of different network models in an embodiment of the present invention.

[0057] Figure 4 shows the residual sequence output by the time-series network model for fault data in an embodiment of the present invention.

Detailed Implementation

[0058] The technical solutions in the embodiments of the present invention will be further described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The following embodiments are only used to more clearly illustrate the technical solutions of the present invention, and should not be used to limit the scope of protection of the present invention.

[0059] Example 1: This example proposes a fault early warning method for a synchronous condenser excitation system that integrates multi-scale temporal networks and preset temporal attention (PTA). As shown in Figure 1, the specific steps are as follows:

[0060] Step 1: Collect SCADA data of the synchronous condenser excitation system under normal operating conditions, including reactive power and related state variables. Use Spearman correlation analysis to calculate the correlation coefficient between each state variable and reactive power, select the state variables whose absolute correlation coefficient exceeds 0.5, and combine them with the reactive power data to form the original dataset. Preprocess this dataset by outlier removal, missing value imputation, moving average filtering, and normalization. Because different units of measurement cause large differences in the distributions of the input data, which affects model training, min-max normalization is applied to the filtered data, linearly scaling all values to the unit interval [0, 1]:

[0061] $x_{norm} = \dfrac{x_{filter} - x_{min}}{x_{max} - x_{min}}$

[0062] where $x_{norm}$ is the normalized value, and $x_{max}$ and $x_{min}$ are the maximum and minimum values of the filtered data $x_{filter}$, respectively.
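Step 1 can be sketched as follows: Spearman rank correlation (computed as the Pearson correlation of average ranks, so ties are handled) keeps the state variables with an absolute coefficient above 0.5, and min-max normalization maps a series onto [0, 1]. This is a pure-Python sketch with hypothetical names and toy data; in the Pandas environment listed in [0099], `DataFrame.corr(method='spearman')` provides the same correlation. A constant series cannot be min-max normalized, so the sketch raises an error for it.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return 0.0 if sx == 0.0 or sy == 0.0 else cov / (sx * sy)


def select_variables(candidates: Dict[str, Sequence[float]], target: Sequence[float],
                     threshold: float = 0.5) -> List[str]:
    """Keep state variables whose |rho| with reactive power exceeds the threshold."""
    return [name for name, series in candidates.items() if abs(spearman(series, target)) > threshold]


def min_max(series: Sequence[float]) -> List[float]:
    """x_norm = (x - x_min) / (x_max - x_min), mapping the series onto [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in series]


q = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy reactive power series
candidates = {"a": [10.0, 20.0, 30.0, 40.0, 50.0],
              "b": [5.0, 4.0, 3.0, 2.0, 1.0],
              "c": [3.0, 1.0, 4.0, 5.0, 2.0]}
print(select_variables(candidates, q))  # ['a', 'b']
print(min_max([2.0, 4.0, 6.0]))         # [0.0, 0.5, 1.0]
```

In practice, the minimum and maximum should come from the training data and be reused unchanged for real-time data, consistent with applying "the same preprocessing operations as in the offline phase" in [0043].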

[0063] Step 2: Select reactive power as the target variable to be predicted, calculate time-series statistical features such as its mean and rate of change, and combine them with the original reactive power and other key state parameters to form a multi-dimensional input feature. The historical statistical features of reactive power include the mean feature $\mu_t$ and the slope feature $\beta_t$, constructed as follows:

[0064] $\mu_t = \dfrac{1}{T - t + 1} \sum_{k=t}^{T} x_k$

[0065]

[0066] where $x_t$ is the reactive power value at time t, and $\mu_t$ and $\beta_t$ are the mean and slope of the reactive power over the time window from time t to time T, respectively; when t = T, $\beta_T = 0$. During sample construction, samples are generated from the preprocessed time-series data by a sliding window: the window width is T = 20 time steps, the window sliding step is 5, and the output is the reactive power over the next $T_{out} = 5$ time steps. Each sample consists of an input feature matrix covering T = 20 time steps and $D_x = 11$ input features, together with the corresponding output sequence of $T_{out}$ reactive power values. A total of 4020 normal-operation data points were used for sliding-window sample construction, yielding 800 samples, which were divided into training, validation, and test sets in a 7:1:2 ratio. The training set was used to iteratively optimize the network weight parameters, the validation set to tune hyperparameters and prevent overfitting, and the test set to evaluate the model's predictive performance and generalization ability.
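The sample counts in Step 2 can be checked with a short sketch. Assuming each sample takes 20 input steps followed by the next 5 target steps and the window advances 5 steps at a time, 4020 time steps yield exactly 800 samples, matching the figure above, and a 7:1:2 split gives 560, 80, and 160 samples. The helper names and the position of reactive power in column 0 (`target_col`) are assumptions. `trailing_means` computes the mean feature over the window from step t to its last step T; the slope feature is omitted because its expression is not reproduced in this text.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[List[float]], List[float]]


def sliding_windows(rows: Sequence[Sequence[float]], window: int = 20, horizon: int = 5,
                    stride: int = 5, target_col: int = 0) -> List[Sample]:
    """Cut time-major rows (one row of D_x features per step) into (X, y) samples.

    X holds `window` consecutive rows; y holds the target column for the
    following `horizon` steps. target_col = 0 for reactive power is an assumption.
    """
    samples: List[Sample] = []
    for start in range(0, len(rows) - window - horizon + 1, stride):
        x = [list(r) for r in rows[start:start + window]]
        y = [r[target_col] for r in rows[start + window:start + window + horizon]]
        samples.append((x, y))
    return samples


def trailing_means(q: Sequence[float]) -> List[float]:
    """Mean feature: average of q over the window from step t to the last step T."""
    return [sum(q[t:]) / (len(q) - t) for t in range(len(q))]


def split_counts(n: int, ratios: Tuple[int, int, int] = (7, 1, 2)) -> Tuple[int, int, int]:
    """Train/validation/test sizes; the remainder after flooring goes to the test set."""
    total = sum(ratios)
    n_train, n_val = n * ratios[0] // total, n * ratios[1] // total
    return n_train, n_val, n - n_train - n_val


rows = [[float(t)] + [0.0] * 10 for t in range(4020)]  # 4020 steps, D_x = 11 (toy values)
samples = sliding_windows(rows)
print(len(samples), split_counts(len(samples)))  # 800 (560, 80, 160)
print(trailing_means([1.0, 2.0, 3.0, 4.0]))      # [2.5, 3.0, 3.5, 4.0]
```

Because consecutive windows overlap, a chronological split (rather than a random shuffle) keeps nearly identical samples from appearing in both the training and test sets; the source does not state which split order was used.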

[0067] Step 3: Construct the multi-scale temporal network model with preset temporal attention shown in Figure 2. In the parallel feature extraction module, a multi-branch temporal convolutional network extracts local temporal features, while an LSTM network models long-term dependencies in the time series through its gating units; together they construct a multi-scale temporal feature space.
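The multi-branch temporal convolutional network relies on dilated causal convolutions whose dilation grows exponentially (1, 2, 4 in [0099]) to widen the receptive field. The sketch below is a generic dilated causal convolution with left zero padding, not the patent's network, and the function names are illustrative. `receptive_field` assumes one dilated convolution per residual block, so a single layer with kernel size k and dilation d sees (k - 1)·d + 1 steps, and three stacked layers with k = 3 see 15 steps. A residual block containing two convolutions, as in many TCN designs, would widen this further.

```python
from typing import List, Sequence


def dilated_causal_conv(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_j kernel[j] * x[t - j * dilation], treating x[<0] as zero.

    Only current and past samples are used (causal), and the output keeps
    the input length because of the implicit left zero padding.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked dilated convs, one per entry in `dilations`."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


impulse = [1.0] + [0.0] * 9
print(dilated_causal_conv(impulse, [1.0, 1.0, 1.0], dilation=2))
print([receptive_field(3, [d]) for d in (1, 2, 4)], receptive_field(3, [1, 2, 4]))  # [3, 5, 9] 15
```

The impulse response shows the dilation directly: with dilation 2, the impulse at step 0 reappears at steps 2 and 4, so a kernel of size 3 spans 5 steps.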

[0068] The multi-branch temporal convolutional network extracts temporal features at different scales using exponentially increasing dilation factors, and compresses and concatenates the outputs of the residual blocks using 1×1 convolutions. Let $\mathbf{H}_n \in \mathbb{R}^{D_n \times T}$ denote the output of the n-th residual block, whose i-th feature at time step t is $h_{i,t}$. After 1×1 convolution, the compressed feature $\mathbf{X}'_n \in \mathbb{R}^{(D_n/4) \times T}$ has, for its s-th feature at the t-th time step, the value $x'_{s,t}$:

[0069] $x'_{s,t} = \sum_{i=1}^{D_n} w_{s,i}\, h_{i,t} + b_s$

[0070] where $w_{s,i}$ are the weight parameters of the convolution kernel, which linearly map input feature i to output feature s; $b_s$ is the bias term; $s \in \{1, 2, \ldots, D_n/4\}$ is the output feature dimension index; and $t \in \{1, 2, \ldots, T\}$ is the time step index.

[0071] The 1×1-compressed features of the N residual blocks, $\mathbf{X}'_1, \ldots, \mathbf{X}'_N$, are concatenated along the feature dimension to obtain the concatenated feature matrix $\mathbf{X}_{cat}$:

[0072] $\mathbf{X}_{cat} = \mathrm{Concat}(\mathbf{X}'_1, \mathbf{X}'_2, \ldots, \mathbf{X}'_N)$

[0073] Here, Concat(·) denotes the feature concatenation operation. On this basis, a global residual connection is introduced, expressed as follows:

[0074]

[0075] where $D_{out}$ is the feature dimension of the output feature matrix of the multi-branch temporal convolutional network.

[0076] Step 4: Perform dual-dimensional feature fusion (along the temporal and feature dimensions) on the deep features output by the parallel network. The temporal attention weights are initialized under the guidance of prior knowledge, with initial weight $w_{pre,t}$ given by:

[0077]

[0078] where $b_{pre,1}$ and $b_{pre,2}$ are trainable coefficients whose initial values are set from domain knowledge so that the initial weights decrease smoothly from the current time step back to earlier (historical) time steps. Dynamic temporal attention weights $w_{stat,t}$ are then generated from the statistical features of the sequence:

[0079] $w_{stat,t} = \sigma(\mathbf{W}_{stat}\,\mathbf{x}_{stat,t} + b_{stat})$

[0080] where $\mathbf{x}_{stat,t} \in \mathbb{R}^{D_s}$ is the statistical feature vector of reactive power at time t, $\mathbf{W}_{stat}$ is the weight matrix, $b_{stat}$ is the bias vector, and $D_s$ is the dimension of the statistical feature vector. The prior-knowledge weights $w_{pre}$ and the statistics-adjusted weights $w_{stat}$ are summed and normalized with the Softmax function to obtain the final temporal attention weights $\lambda_{final}$:

[0081] $w_{final} = w_{pre} + w_{stat}$

[0082] $\lambda_{final} = \mathrm{Softmax}(w_{final})$

[0083] After fusion along the temporal dimension, fully connected layers compute feature attention weights to achieve dynamic fusion along the feature dimension:

[0084] $w_{att} = \sigma\big(\mathbf{W}_{att2} \cdot \mathrm{ReLU}(\mathbf{W}_{att1}\mathbf{X}_{fused} + b_{att1}) + b_{att2}\big)$

[0085] where $\mathbf{W}_{att1}$ and $\mathbf{W}_{att2}$ are trainable weight matrices, $b_{att1}$ and $b_{att2}$ are the corresponding bias vectors, $\mathrm{ReLU}(x) = \max(x, 0)$ is the rectified linear unit, $\mathbf{X}_{fused} \in \mathbb{R}^{D_{fused}}$ is the fused feature vector obtained by weighted summation along the time dimension, and $D_{fused}$ is the feature dimension output by the parallel network. The feature attention weights are normalized with the Softmax function, and the normalized weights $\lambda_{att}$ are multiplied element-wise with the fused features $\mathbf{X}_{fused}$ to enhance or suppress individual features, yielding the weighted feature vector $\mathbf{X}_{con}$:

[0086] $\lambda_{att} = \mathrm{Softmax}(w_{att})$

[0087] $\mathbf{X}_{con} = \lambda_{att} \odot \mathbf{X}_{fused}$

[0088] Here, ⊙ denotes element-wise multiplication.

[0089] Step 5: Design the loss function $L_{MAE}$ as:

[0090] $L_{MAE} = \dfrac{1}{N_{S1} T_{out}} \sum_{i=1}^{N_{S1}} \sum_{t=1}^{T_{out}} \left| \hat{y}_{i,t} - y_{i,t} \right|$

[0091] where $N_{S1}$ is the sample batch size, $T_{out}$ is the output prediction step size, and $\hat{y}_{i,t}$ and $y_{i,t}$ are the predicted and measured reactive power values of the i-th sample at time step t, respectively.

[0092] Step 6: Analyze the statistical characteristics of the residual sequence under normal operating conditions and set the early warning threshold. The measured reactive power $y_t$ is compared with the predicted value $\hat{y}_t$ to generate the residual sequence $e_t$, and the corresponding EWMA value $Z_t$ is calculated:

[0093] $e_t = y_t - \hat{y}_t$

[0094] $Z_t = \lambda_{EW} e_t + (1 - \lambda_{EW}) Z_{t-1}$

[0095] where $Z_{t-1}$ and $Z_t$ are the EWMA values at times t-1 and t, and $\lambda_{EW}$ is the EWMA smoothing coefficient, which controls the weight given to historical data. The initial value $Z_0$ is the mean of the residual sequence under normal operating conditions. The upper and lower limits of the EWMA control chart are:

[0096]

[0097]

[0098] where UCL and LCL are the upper and lower limits of the EWMA control chart, respectively, $\sigma_{EW}$ is the standard deviation of the original residual sequence, and $k_{EW}$ is the EWMA control limit coefficient.

[0099] This embodiment uses Python for simulation testing, requiring the following Python libraries and modules to be imported: Pandas, NumPy, PyTorch, Matplotlib, torch.nn, and torch.optim. Regarding network model parameter configuration, considering both prediction accuracy and training time, the model training parameters are set as follows: batch size of 32, maximum number of iterations of 200, Adam algorithm selected as the optimizer, and learning rate set to 0.001. The main network structure consists of a single-layer LSTM network and a 3-branch multi-path temporal convolutional network. The LSTM has 64 hidden units, and the kernel size of each residual block in the multi-path temporal convolutional network is set to 3, with 32 kernels in total. A progressively increasing dilation factor (1, 2, 4) is used to expand the receptive field.
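The training configuration in [0099] can be recorded in one place, for example as a frozen dataclass. The field names below are hypothetical, and reading "maximum number of iterations of 200" as 200 epochs is an assumption. With the 560 training samples implied by the 7:1:2 split of 800 samples, a batch size of 32 gives 18 mini-batches per epoch, the last one partial.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MstnPtaConfig:
    """Hyperparameters stated in [0099]; field names are illustrative."""
    batch_size: int = 32
    max_epochs: int = 200          # "maximum number of iterations" read as epochs (assumption)
    optimizer: str = "Adam"
    learning_rate: float = 1e-3
    lstm_layers: int = 1
    lstm_hidden_units: int = 64
    tcn_branches: int = 3
    tcn_kernel_size: int = 3
    tcn_num_kernels: int = 32
    dilations: Tuple[int, ...] = (1, 2, 4)

    def batches_per_epoch(self, n_train: int) -> int:
        """Number of mini-batches, counting a final partial batch."""
        return math.ceil(n_train / self.batch_size)


cfg = MstnPtaConfig()
print(cfg.batches_per_epoch(560))  # 18
```

In PyTorch, these values would map onto calls such as `torch.nn.LSTM(input_size, cfg.lstm_hidden_units, num_layers=cfg.lstm_layers, batch_first=True)` and `torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)`.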

[0100] To further illustrate the advantages of the proposed MSTN-PTA model in reactive power prediction and its effectiveness in early fault warning, it is compared with several time-series network models, including LSTM and CNN-LSTM (cascaded structure). The model parameters are as follows: the LSTM has 64 hidden units, and the CNN part uses a 3-layer structure with 32 convolution kernels of size 3×1 per layer. For fairness, the comparative experiments used a uniform training configuration: the number of iterations, training batch size, gradient optimization algorithm, and loss function were all the same as for the proposed model. Table 1 summarizes the prediction performance of each model, and Figure 3 compares the reactive power prediction curves of the different models.

[0101] Table 1. Overall Comparison of Predictive Performance of Different Models

[0102]

[0103]

[0104] The model incorporating a CNN feature extraction module has lower prediction error than the LSTM model, indicating that convolutional neural networks can effectively extract deep features from time-series data and thereby improve prediction accuracy. Figure 3 shows that the MSTN-PTA model fits best and accurately captures the temporal features, whereas the predictions of the other models deviate considerably from the measured reactive power in some time periods. In summary, by combining a multi-scale parallel network structure with a feature fusion strategy based on preset temporal attention, the proposed model achieves better prediction accuracy and offers a feasible solution for reactive power prediction of excitation systems.

[0105] To verify the method's ability to warn of incipient faults, Figure 4 shows the residual distribution for data that include a fault, where the fault period corresponds to sample indices 400 to 700. Predicting the reactive power of the excitation system with the MSTN-PTA model and combining the prediction with residual analysis makes it possible to warn of potential faults even when the regulator's traditional recording mechanism is not triggered, while maintaining a low false alarm rate under normal operating conditions, which significantly improves the reliability of condition monitoring.

[0106] Example 2: This example proposes a fault early warning system for a synchronous condenser excitation system, comprising the following units:

[0107] The data acquisition and processing unit collects SCADA data containing reactive power and related state variables under normal operating conditions of the synchronous condenser excitation system, preprocesses the raw data, determines the input time window length and output prediction step size, constructs the dataset using the sliding window method, and divides it into training, validation, and test sets.

[0108] The neural network model building unit constructs a multi-scale temporal network model that integrates preset temporal attention. It extracts multi-scale temporal features with a multi-branch temporal convolutional network and a long short-term memory network connected in parallel, and introduces preset temporal attention to achieve dual-dimensional feature fusion. The network model is trained on normal operating condition data with the mean absolute error (MAE) as the loss function, and the gradient descent method is used to update the network weights and biases.

[0109] The residual threshold setting unit calculates the residual between the predicted and measured reactive power outputs of the network, processes the residual using the exponentially weighted moving average method, and then determines the residual threshold.

[0110] The early warning output unit selects real-time SCADA data as input to the trained time-series network model for reactive power prediction and calculates the residual between the predicted and measured values. When the residual exceeds the early warning threshold multiple times consecutively, an abnormal alarm is triggered, thus completing the early warning of excitation system faults.

[0111] Example 3: This example proposes a computer-readable storage medium storing a computer program thereon. When the computer program is executed by a processor, it implements the steps of the method described in this invention, which will not be repeated here.

[0112] It should be noted that the processing flows of Embodiments 2 and 3 correspond to the specific steps of the method provided in the embodiments of the present invention, and have the corresponding functional modules and beneficial effects of the method. Technical details not described in detail in this embodiment can be found in the method provided in the embodiments of the present invention.

[0113] The program code used to implement the methods of this application may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that, when executed by the processor or controller, it implements the functions/operations specified in the flowcharts and/or block diagrams. The program code may be executed entirely on a machine, partially on a machine, partially on a machine as a standalone software package and partially on a remote machine, or entirely on a remote machine or server.

[0114] In the context of this application, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

[0115] In the description of this specification, references to terms such as "an embodiment," "example," "specific example," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0116] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the claimed invention.

Claims

1. A fault early warning method for a synchronous condenser excitation system integrating multi-scale temporal networks and preset temporal attention, characterized in that it comprises the following steps:
S1. Collect SCADA data containing reactive power and related state variables of the synchronous condenser excitation system under normal operating conditions, and preprocess the raw data; determine the input time window length and the output prediction step size, construct the dataset using the sliding window method, and divide it into a training set, a validation set, and a test set.
S2. Construct a multi-scale temporal network model that integrates preset temporal attention: extract multi-scale temporal features by connecting a multi-branch temporal convolutional network and a long short-term memory network in parallel, and introduce preset temporal attention to achieve dual-dimensional feature fusion.
S3. Train the network model using normal operating condition data, with the mean absolute error as the loss function, and update the network weights and biases using the gradient descent method.
S4. Calculate the residual between the network's predicted reactive power output and the measured value, process the residual using the exponentially weighted moving average method, and then determine the residual threshold; select real-time SCADA data as network input and use residual analysis to achieve early fault warning of the excitation system.
In step S2, the multi-branch temporal convolutional network uses exponentially increasing dilation factors to extract temporal features at different scales, and uses 1×1 convolution to compress the output of each residual block before concatenation.
The compressed features $Z^{(1)}, Z^{(2)}, \dots, Z^{(K)}$ obtained by applying 1×1 convolution to the outputs of the $K$ residual blocks are concatenated along the feature dimension to obtain the concatenated feature matrix:

$$H_{\mathrm{cat}} = \mathrm{Concat}\left(Z^{(1)}, Z^{(2)}, \dots, Z^{(K)}\right),$$

where $\mathrm{Concat}(\cdot)$ denotes the feature concatenation operation. On this basis, a global residual connection is introduced to obtain the output feature matrix $H_{\mathrm{TCN}}$ of the multi-branch temporal convolutional network, where $D_{\mathrm{out}}$ is the output feature dimension and $T$ is the number of time steps of the input sequence. Step S2 introduces preset temporal attention to achieve dual-dimensional feature fusion. Specifically, a prior-knowledge-guided method is used to initialize the temporal attention weights: the initial weight $w_{\mathrm{pre}}(t)$ of time step $t$ is a function of $t$ and $T$ parameterized by $b_{\mathrm{pre},1}$ and $b_{\mathrm{pre},2}$, which are trainable coefficients whose initial values are set from domain knowledge, so that the initial weights decrease smoothly from the current time step toward historical time steps. Dynamic temporal attention weights are then generated from the statistical features of the sequence:

$$w_{\mathrm{stat}}(t) = \sigma\left(W_s \mathbf{s}_t + b_s\right),$$

where $\sigma(\cdot)$ is the sigmoid function, $\mathbf{s}_t \in \mathbb{R}^{D_s}$ is the statistical feature vector of reactive power at time $t$, $W_s$ is the weight matrix, $b_s$ is the bias vector, and $D_s$ is the dimension of the statistical feature vector. The prior-knowledge weights $w_{\mathrm{pre}}$ and the statistics-adjusted weights $w_{\mathrm{stat}}$ are superimposed and normalized with the Softmax function to obtain the final temporal attention weights $\alpha_t$:

$$w(t) = w_{\mathrm{pre}}(t) + w_{\mathrm{stat}}(t), \qquad \alpha_t = \frac{\exp\left(w(t)\right)}{\sum_{k=1}^{T} \exp\left(w(k)\right)}, \quad t = 1, \dots, T.$$

2. The method according to claim 1, characterized in that, in step S2, after the feature fusion along the time dimension is completed, a fully connected layer is further used to calculate feature attention weights, achieving dynamic fusion along the feature dimension.
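Its mathematical expression is:

$$\mathbf{e} = W_{\mathrm{att2}}\,\mathrm{ReLU}\left(W_{\mathrm{att1}} \mathbf{h}_{\mathrm{fused}} + \mathbf{b}_{\mathrm{att1}}\right) + \mathbf{b}_{\mathrm{att2}},$$

where $W_{\mathrm{att1}}$ and $W_{\mathrm{att2}}$ are trainable weight matrices, $\mathbf{b}_{\mathrm{att1}}$ and $\mathbf{b}_{\mathrm{att2}}$ are the corresponding bias vectors, $\mathrm{ReLU}(x) = \max(x, 0)$ is the rectified linear unit function, $\mathbf{h}_{\mathrm{fused}} \in \mathbb{R}^{D_{\mathrm{fused}}}$ is the fused feature vector obtained by weighted summation along the time dimension, and $D_{\mathrm{fused}}$ is the feature dimension output by the parallel network. The feature attention weights are normalized with the Softmax function, and the normalized weights $\mathbf{a}$ are multiplied element-wise with the corresponding fused features $\mathbf{h}_{\mathrm{fused}}$ to enhance or suppress individual features, giving the weighted feature vector $\mathbf{h}_{\mathrm{w}}$:

$$\mathbf{a} = \mathrm{Softmax}(\mathbf{e}), \qquad \mathbf{h}_{\mathrm{w}} = \mathbf{a} \odot \mathbf{h}_{\mathrm{fused}},$$

where $\odot$ denotes element-wise multiplication.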
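The two fusion stages of claims 1 and 2 (temporal attention followed by feature attention) can be sketched in plain Python as below. Assumptions: the explicit form of the prior weights is not specified here, so `w_pre` is passed in as precomputed values; `W_s` is taken as a single row with a scalar bias so that `w_stat(t)` is one number per time step; the fused vector is the attention-weighted sum of the per-step features `H[t]`, as claim 2 describes; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(W: Sequence[Sequence[float]], x: Sequence[float],
           b: Sequence[float]) -> Vector:
    """Compute W @ x + b for a matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def temporal_fusion(H: Sequence[Sequence[float]], w_pre: Sequence[float],
                    S: Sequence[Sequence[float]], W_s: Sequence[float],
                    b_s: float) -> Vector:
    """Fuse per-step features H[t] using alpha = Softmax(w_pre + w_stat)."""
    # Statistics-driven weight for each time step: sigmoid(W_s . s_t + b_s).
    w_stat = [sigmoid(sum(w * s for w, s in zip(W_s, s_t)) + b_s) for s_t in S]
    alpha = softmax([p + q for p, q in zip(w_pre, w_stat)])
    dim = len(H[0])
    # Weighted sum along the time dimension gives h_fused.
    return [sum(alpha[t] * H[t][d] for t in range(len(H))) for d in range(dim)]


def feature_attention(h_fused: Sequence[float],
                      W1: Sequence[Sequence[float]], b1: Sequence[float],
                      W2: Sequence[Sequence[float]], b2: Sequence[float]) -> Vector:
    """a = Softmax(W2 ReLU(W1 h + b1) + b2); return a * h element-wise."""
    hidden = [max(v, 0.0) for v in affine(W1, h_fused, b1)]
    a = softmax(affine(W2, hidden, b2))
    return [ai * hi for ai, hi in zip(a, h_fused)]
```

When the priors are equal and the statistics term is the same at every step, the temporal weights are uniform and `temporal_fusion` reduces to a plain average over time steps; a larger prior on recent steps shifts the fused vector toward the latest features before the statistics branch has learned anything.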

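3. The method according to claim 1, characterized in that, in step S3, the loss function $\mathcal{L}$ is designed as:

$$\mathcal{L} = \frac{1}{N T_{\mathrm{out}}} \sum_{i=1}^{N} \sum_{t=1}^{T_{\mathrm{out}}} \left| \hat{y}_{i,t} - y_{i,t} \right|,$$

where $N$ is the sample batch size, $T_{\mathrm{out}}$ is the output prediction step size, and $\hat{y}_{i,t}$ and $y_{i,t}$ are the predicted and measured reactive power values of the $i$-th sample at time step $t$, respectively.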
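The loss in claim 3 is the mean absolute error over a batch of `N` samples and `T_out` prediction steps. Below is a plain-Python sketch of that loss, its subgradient with respect to the predictions, and a basic gradient-descent update. Taking the subgradient as 0 at zero error is a common convention rather than something the source specifies, and the function names are hypothetical; in practice a deep-learning framework would compute these gradients automatically during step S3.

```python
from typing import List, Sequence

Batch = Sequence[Sequence[float]]


def mae_loss(pred: Batch, meas: Batch) -> float:
    """Mean of |y_hat[i][t] - y[i][t]| over N samples and T_out steps."""
    n = len(pred)
    t_out = len(pred[0])
    total = sum(abs(p - m)
                for row_p, row_m in zip(pred, meas)
                for p, m in zip(row_p, row_m))
    return total / (n * t_out)


def mae_grad(pred: Batch, meas: Batch) -> List[List[float]]:
    """Subgradient of mae_loss w.r.t. each prediction: sign(error) / (N * T_out)."""
    scale = 1.0 / (len(pred) * len(pred[0]))
    # (p > m) - (p < m) yields 1, 0, or -1; zero error contributes 0.
    return [[((p > m) - (p < m)) * scale for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(pred, meas)]


def sgd_step(params: Sequence[float], grads: Sequence[float],
             lr: float) -> List[float]:
    """Plain gradient-descent update: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]
```

Because each error contributes linearly rather than quadratically, the MAE gradient has bounded magnitude, which is one common reason for preferring it when SCADA measurements contain occasional outliers.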

4. The method according to claim 1, characterized in that, in step S4, statistical analysis is performed on the residual sequence under normal operating conditions to set the residual early warning threshold: the measured reactive power values $y_t$ are compared with the predicted values $\hat{y}_t$ to generate the residual sequence $e_t$, and the corresponding EWMA values $E_t$ are calculated:

$$e_t = y_t - \hat{y}_t, \qquad E_t = \lambda e_t + (1 - \lambda) E_{t-1},$$

where $e_t$ and $E_t$ are the residual and the EWMA value at time $t$, respectively, and $\lambda$ is the EWMA smoothing coefficient, which controls the weight given to historical data. The initial EWMA value is $E_0 = \mu$, where $\mu$ is the mean of the residual sequence under normal operating conditions. The upper and lower limits of the EWMA control chart are:

$$\mathrm{UCL} = \mu + L\sigma\sqrt{\frac{\lambda}{2 - \lambda}}, \qquad \mathrm{LCL} = \mu - L\sigma\sqrt{\frac{\lambda}{2 - \lambda}},$$

where UCL and LCL are the upper and lower limits of the EWMA control chart, respectively, $\sigma$ is the standard deviation of the original residual sequence, and $L$ is the EWMA control limit coefficient.
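The residual statistics of claim 4 can be sketched as follows. Assumptions: the residual is taken as measured minus predicted, the limits use the standard steady-state EWMA form μ ± Lσ√(λ/(2 − λ)), σ is the sample standard deviation of the normal-condition residuals, and the function names are illustrative only.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def residuals(measured: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """e_t = y_t - y_hat_t."""
    return [y - y_hat for y, y_hat in zip(measured, predicted)]


def ewma(res: Sequence[float], lam: float, e0: float) -> List[float]:
    """E_t = lam * e_t + (1 - lam) * E_{t-1}, starting from E_0 = e0."""
    out: List[float] = []
    prev = e0
    for e in res:
        prev = lam * e + (1.0 - lam) * prev
        out.append(prev)
    return out


def control_limits(normal_res: Sequence[float], lam: float,
                   L: float) -> Tuple[float, float, float]:
    """Return (mu, LCL, UCL) with limits mu -/+ L * sigma * sqrt(lam / (2 - lam))."""
    mu = statistics.fmean(normal_res)
    sigma = statistics.stdev(normal_res)  # sample standard deviation
    half_width = L * sigma * math.sqrt(lam / (2.0 - lam))
    return mu, mu - half_width, mu + half_width
```

A typical call would be `mu, lcl, ucl = control_limits(normal_residuals, lam, L)` followed by `ewma(residuals(measured, predicted), lam, mu)`, which starts the chart at $E_0 = \mu$ as the claim states. Because √(λ/(2 − λ)) < 1 whenever λ < 1, the EWMA band is narrower than a raw ±Lσ band, so a small but persistent drift in the residual can cross the limits earlier than it would on the raw residual.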

5. The method according to claim 1, characterized in that, in step S4, after the residual warning threshold is determined, the SCADA data acquired in real time undergo the same preprocessing as in the offline stage and are then input into the trained time-series network model for reactive power prediction. The residual between the predicted and measured values is calculated in real time, and when the residual continuously exceeds the warning threshold, an abnormal alarm is triggered, providing early warning of excitation system faults.

6. The method according to claim 1, characterized in that, for the output $H^{(n)}$ of the $n$-th residual block, the feature after 1×1 convolution compression is $Z^{(n)}$, and the value $z^{(n)}_{s,t}$ of its $s$-th feature dimension at the $t$-th time step is expressed as:

$$z^{(n)}_{s,t} = \sum_{i=1}^{D_n} w^{(n)}_{s,i}\, h^{(n)}_{i,t} + b^{(n)}_{s},$$

where $w^{(n)}_{s,i}$ are the weight parameters of the convolution kernel, realizing a linear mapping from input feature $i$ to output feature $s$; $h^{(n)}_{i,t}$ is the value of the $i$-th input feature dimension at the $t$-th time step; $b^{(n)}_{s}$ is the bias term; $s$ is the output feature dimension index; $t = 1, \dots, T$ is the time step index, with $T$ the number of time steps of the input sequence; and $D_n$ is the feature dimension output by the corresponding residual block.
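The 1×1 convolution of claim 6 is a per-time-step linear map across feature channels. The sketch below evaluates the formula directly on a feature-major (D_n × T) list of lists and then concatenates several compressed block outputs along the feature dimension, as claim 1 describes. The data layout and the function names are assumptions made for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv1x1(H: Sequence[Sequence[float]], W: Sequence[Sequence[float]],
            b: Sequence[float]) -> Matrix:
    """z[s][t] = sum_i W[s][i] * H[i][t] + b[s].

    H has shape (D_n, T), W has shape (D_out, D_n), b has length D_out;
    the result has shape (D_out, T). Each time step is mapped independently.
    """
    d_n = len(H)
    T = len(H[0])
    return [[sum(W[s][i] * H[i][t] for i in range(d_n)) + b[s]
             for t in range(T)]
            for s in range(len(W))]


def concat_features(blocks: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Concatenate (D_k, T) matrices along the feature axis into (sum D_k, T)."""
    T = len(blocks[0][0])
    out: Matrix = []
    for Z in blocks:
        if any(len(row) != T for row in Z):
            raise ValueError("all blocks must share the same number of time steps")
        out.extend([float(v) for v in row] for row in Z)
    return out
```

Because the same weights are applied at every time step, the compression reduces the channel count without mixing information across time, so the causal structure of each dilated block is preserved.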

7. A fault early warning system for a synchronous condenser excitation system, characterized in that it comprises:
a data acquisition and processing unit, which collects SCADA data containing reactive power and related state variables of the synchronous condenser excitation system under normal operating conditions, preprocesses the raw data, determines the input time window length and the output prediction step size, constructs the dataset using the sliding window method, and divides it into a training set, a validation set, and a test set;
a neural network model building unit, which constructs a multi-scale temporal network model that integrates pre-set temporal attention, extracts multi-scale temporal features by connecting a multi-branch temporal convolutional network and a long short-term memory network in parallel, and introduces pre-set temporal attention to achieve dual-dimensional feature fusion; the network model is trained using normal operating condition data, with the mean absolute error as the loss function, and the network weights and biases are updated using the gradient descent method;
a residual threshold setting unit, which calculates the residual between the network's predicted reactive power output and the measured value, processes the residual using the exponentially weighted moving average method, and then determines the residual threshold; and
an early warning result output unit, which feeds real-time SCADA data into the trained time-series network model to predict reactive power and calculates the residual between the predicted and measured values; when the residual exceeds the early warning threshold several times in a row, an abnormal alarm is triggered, completing the early warning of excitation system faults.
In the neural network model building unit, the multi-branch temporal convolutional network uses exponentially increasing dilation factors to extract temporal features at different scales, and uses 1×1 convolution to compress the output of each residual block before concatenation. The compressed features $Z^{(1)}, Z^{(2)}, \dots, Z^{(K)}$ obtained from the $K$ residual blocks are concatenated along the feature dimension to obtain the concatenated feature matrix:

$$H_{\mathrm{cat}} = \mathrm{Concat}\left(Z^{(1)}, Z^{(2)}, \dots, Z^{(K)}\right),$$

where $\mathrm{Concat}(\cdot)$ denotes the feature concatenation operation. On this basis, a global residual connection is introduced to obtain the output feature matrix $H_{\mathrm{TCN}}$ of the multi-branch temporal convolutional network, where $D_{\mathrm{out}}$ is the output feature dimension and $T$ is the number of time steps of the input sequence. The neural network model building unit introduces pre-set temporal attention to achieve dual-dimensional feature fusion. Specifically, a prior-knowledge-guided method is used to initialize the temporal attention weights: the initial weight $w_{\mathrm{pre}}(t)$ of time step $t$ is a function of $t$ and $T$ parameterized by $b_{\mathrm{pre},1}$ and $b_{\mathrm{pre},2}$, which are trainable coefficients whose initial values are set from domain knowledge, so that the initial weights decrease smoothly from the current time step toward historical time steps. Dynamic temporal attention weights are then generated from the statistical features of the sequence:

$$w_{\mathrm{stat}}(t) = \sigma\left(W_s \mathbf{s}_t + b_s\right),$$

where $\sigma(\cdot)$ is the sigmoid function, $\mathbf{s}_t \in \mathbb{R}^{D_s}$ is the statistical feature vector of reactive power at time $t$, $W_s$ is the weight matrix, $b_s$ is the bias vector, and $D_s$ is the dimension of the statistical feature vector. The prior-knowledge weights $w_{\mathrm{pre}}$ and the statistics-adjusted weights $w_{\mathrm{stat}}$ are superimposed and normalized with the Softmax function to obtain the final temporal attention weights $\alpha_t$:

$$w(t) = w_{\mathrm{pre}}(t) + w_{\mathrm{stat}}(t), \qquad \alpha_t = \frac{\exp\left(w(t)\right)}{\sum_{k=1}^{T} \exp\left(w(k)\right)}, \quad t = 1, \dots, T.$$

8. A computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to perform the method according to any one of claims 1-6.