A medium-voltage vacuum circuit breaker mechanical fault diagnosis method based on a Transformer framework algorithm

By using a dynamic block and dual-path encoder method based on the Transformer framework algorithm, the problem of insufficient time sequence correlation analysis between multiple current signals in the mechanical fault diagnosis of medium-voltage circuit breakers is solved, realizing in-depth and intelligent diagnosis of the health status of circuit breakers and improving the comprehensiveness and accuracy of fault identification.

CN122241476A · Pending · Publication Date: 2026-06-19 · STATE GRID SHANGHAI MUNICIPAL ELECTRIC POWER CO +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: STATE GRID SHANGHAI MUNICIPAL ELECTRIC POWER CO
Filing Date: 2026-03-20
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing mechanical fault diagnosis methods for medium-voltage circuit breakers lack dynamic temporal correlation analysis between multiple current signals, making it difficult to adapt to complex and ever-changing fault modes. Furthermore, traditional methods struggle to effectively handle long-term sequence data and complex dependencies between multiple variables, resulting in insufficient diagnostic accuracy.

Method used

A method based on the Transformer framework algorithm is adopted, which divides the time sequence of multi-channel current into adaptive blocks through a dynamic block model, and uses a dual-path encoder for feature extraction. Combining physical constraints, temporal order, and causal indication, cross-channel feature extraction and aggregation are realized to obtain high-order features for fault category prediction.

Benefits of technology

It significantly improves the comprehensiveness and accuracy of fault identification, enhances the ability to conduct in-depth and intelligent diagnosis of the health status of circuit breakers, and strengthens the sensitivity to local transient fault characteristics and early warning capabilities.



Abstract

This invention relates to a mechanical fault diagnosis method for medium-voltage vacuum circuit breakers based on the Transformer framework algorithm, belonging to the field of power grid automation technology. It solves the problems of incomplete feature extraction and insufficient diagnostic accuracy in existing circuit breaker mechanical fault technologies. Specific steps include: acquiring the time sequence of multiple currents during the opening and closing process of the medium-voltage vacuum circuit breaker; using a dynamic block model to divide the time sequence of multiple currents into multiple event sequence stages according to current characteristics, obtaining several adaptive blocks corresponding to each current; inputting the block embedding features extracted from each adaptive block of each current through a linear projection layer into a Transformer-based dual-path encoder for mechanical fault feature extraction, obtaining high-order features corresponding to each current; aggregating the high-order features of the multiple currents and performing fault category prediction to obtain the diagnostic result of the circuit breaker mechanical fault, reducing false alarms and missed faults caused by missing information.

Description

Technical Field

[0001] This invention relates to the field of power grid automation technology, and in particular to a method for diagnosing mechanical faults in medium-voltage vacuum circuit breakers based on the Transformer framework algorithm.

Background Technology

[0002] With rapid socio-economic development and continuous industrial advancement, the demand for electricity keeps growing, driving unprecedented investment in electrical equipment. The safe and stable operation of that equipment is therefore of paramount importance. Medium-voltage circuit breakers (12 kV to 40.5 kV) are among the most widely and extensively deployed electrical equipment in the power supply and distribution sector, and are used throughout the power, industrial, transportation, and military sectors.

[0003] Circuit breakers have complex mechanical structures, are prone to various types of mechanical failure, and can lead to extremely serious concurrent accidents. Currently, the detection methods commonly used in China and abroad are mainly offline periodic maintenance and online real-time monitoring. Offline periodic maintenance is primarily preventive and indiscriminate, and plays some role in preventing equipment failures.

[0004] In online monitoring, traditional methods typically acquire the trip coil and closing coil currents in real time and then diagnose faults from extracted time-domain current features. However, this feature extraction is limited in scope and expressive power, and neglects the circuit breaker's energy storage motor current, blocking coil current, and main circuit total current. Furthermore, existing methods lack systematic analysis of the dynamic temporal correlations between multiple current signals, which makes them ill-suited to complex and varied fault modes; they perform particularly poorly on multivariate and non-stationary current signals. Moreover, current diagnostic models are often based on shallow machine learning methods or simple neural networks, which struggle to handle long sequential data and complex dependencies between multiple variables, resulting in weak generalization.

Summary of the Invention

[0005] Based on the above analysis, the present invention aims to provide a mechanical fault diagnosis method for medium-voltage vacuum circuit breakers based on the Transformer framework algorithm, in order to solve the problems of incomplete extraction of mechanical fault features and insufficient diagnostic accuracy in existing technologies.

[0006] The objective of this invention is mainly achieved through the following technical solution. This invention provides a method for diagnosing mechanical faults in medium-voltage vacuum circuit breakers based on the Transformer framework algorithm, comprising the following steps: obtaining the time sequences of multiple currents during the opening and closing process of a medium-voltage vacuum circuit breaker; using a dynamic block model to divide the time sequences of the multiple currents into multiple event sequence stages according to the current characteristics, yielding several adaptive blocks for each current; inputting the block embedding features, extracted from each adaptive block of each current through a linear projection layer, into a Transformer-based dual-path encoder to extract mechanical fault features, thereby obtaining the high-order features corresponding to each current; and aggregating the high-order features of the multiple currents and then performing fault category prediction to obtain the diagnostic result for the circuit breaker's mechanical fault.

[0007] Furthermore, the dual-path encoder includes a main encoding module and a relationship processing module, wherein: the relationship processing module receives the block embedding features of each current, obtains the cross-channel relationships between the block embedding features through a shared multilayer perceptron network, and outputs the relationship control parameters; the main encoding module is a multi-channel parallel Transformer encoder in which, in conjunction with the relationship control parameters, the Transformer encoder of each channel extracts features from the block embedding features of its corresponding current to obtain the high-order features of that current. The high-order features fuse the information of all block embedding features in their own channel and, through the relationship control parameters, the information of the block embedding features of the other channels.

[0008] Furthermore, the cross-channel relationships include physical constraint, temporal sequence, and causal indication relationships, wherein: the physical constraint relationship is learned with event sequence labels as supervision; the temporal sequence relationship is learned with the time intervals between events during normal operation of the circuit breaker as supervision; the causal indication relationship is learned with the propagation direction of known faults as supervision; and the three relationships are fused and encoded into an association weight matrix and a channel importance bias vector, which together form the relationship control parameters.

[0009] Furthermore, the Transformer encoder of any channel obtains the high-order features through the following process: the input block embedding features are linearly projected to obtain the query, key, and value vectors; the conditional attention score is obtained by element-wise multiplying the dot-product matrix of the query and key vectors with the association weight matrix and combining the result with the channel importance bias vector; and the high-order features are obtained by multi-head attention aggregation based on the conditional attention score and the value vectors.
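The conditional attention described above can be illustrated with a minimal single-head sketch in plain Python. Several details are assumptions made only for illustration and are not stated in the patent: the association weight matrix `A` is taken as an N×N matrix over block pairs, the bias vector `b` is added per key position, the dot products are scaled by the square root of the feature dimension, and the scores are normalized with a softmax. The function name `conditional_attention` is hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_attention(Q, K, V, A, b):
    """Single-head sketch: softmax((Q K^T / sqrt(d)) * A + b) V.

    Q, K, V: lists of N vectors (queries, keys, values).
    A: N x N association weight matrix (assumed shape).
    b: length-N bias vector added per key position (assumed shape).
    """
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            dot = sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d)
            # Element-wise modulation by A, then the additive bias.
            scores.append(dot * A[i][j] + b[j])
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out
```

With `A` filled with ones and `b` filled with zeros, the sketch reduces to ordinary scaled dot-product attention, which makes it easy to see what the relationship control parameters add.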

[0010] Furthermore, obtaining the diagnostic result for the mechanical fault of the circuit breaker includes: globally aggregating the high-order features of each current using attention-weighted global pooling to obtain a global fusion feature; and using a classifier to predict the mechanical fault category of the circuit breaker from the global fusion feature, yielding the diagnostic result.
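As a rough illustration of attention-weighted global pooling, the sketch below scores each channel's high-order feature vector, normalizes the scores with a softmax, and returns the weighted sum. The scoring vector `score_weights` stands in for a learned parameter and, like the function name `attention_pool`, is a hypothetical choice; the patent does not specify how the pooling weights are computed.

```python
import math


def attention_pool(features, score_weights):
    """Fuse per-channel feature vectors into one vector.

    features: list of equal-length vectors, one per current channel.
    score_weights: hypothetical learned scoring vector (same length).
    Returns (fused_vector, attention_weights).
    """
    # One scalar relevance score per channel.
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(features[0])
    fused = [sum(a * f[c] for a, f in zip(alphas, features))
             for c in range(dim)]
    return fused, alphas
```

The fused vector would then be passed to the classifier; the returned weights also indicate which current channel contributed most to a given diagnosis.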

[0011] Furthermore, the adaptive blocks are obtained through the following process: the current sequence is sliced using a fixed window length and step size to obtain the positions of several initial block sequences; current characteristic analysis is performed on each initial block sequence to detect the local extreme points within the block; with each initial block sequence and its corresponding local extreme points as input, the dynamic block model predicts the event sequence stage of each initial block sequence during the opening or closing process, yielding the center offset and width adjustment for the position of that block sequence; and the position of the corresponding initial block sequence is adjusted based on the center offset and width adjustment to obtain the adaptive block.

[0012] Furthermore, the training sample set for the dynamic block model is constructed through the following process: the historical current time series of each channel is evenly divided into blocks to obtain the positions of several training blocks; the key phase points of each event sequence stage in the historical current time series are obtained by global analysis, and the interval between two adjacent key phase points is defined as an ideal block, giving the position of each ideal block; the overlap between each training block and all ideal blocks is calculated, and the adjustment amount of the training block, comprising the center offset and the width adjustment, is obtained from the position of the ideal block with the highest overlap and the position of the training block; and the adjustment amount is used as the label of the corresponding training block to construct the training sample set.

[0013] Furthermore, the multiple currents include the trip coil current, the closing coil current, the energy storage motor current, the blocking (lockout) coil current, and the main circuit total current.

[0014] Furthermore, the event sequence stages corresponding to the trip coil current include coil energization, core movement, and coil de-energization; the event sequence stages corresponding to the closing coil current include coil energization, core movement, current holding, and coil de-energization; the event sequence stages corresponding to the energy storage motor current include motor start-up, energy storage operation, and motor stop; the event sequence stages corresponding to the blocking coil current include blocking signal establishment, signal holding, and signal return; and the event sequence stages corresponding to the main circuit total current include current zero value, current establishment, steady-state load, and current cutoff.

[0015] Furthermore, the synchronously collected currents are preprocessed as follows to obtain the time series of each current: each current is denoised using a Butterworth filter to obtain the denoised current; and amplitude normalization and time alignment are performed on the denoised current to obtain the time series.
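The denoising and normalization steps can be sketched as follows. This is a minimal illustration rather than the patented implementation: the filter order (second order), the low-pass type, the bilinear-transform design, and the function names `butterworth_lowpass` and `min_max_normalize` are all assumptions, since the text only names a Butterworth filter and amplitude normalization. Time alignment is omitted here.

```python
import math


def butterworth_lowpass(samples, cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass (bilinear transform, direct form I).

    cutoff_hz and fs_hz are illustrative parameters; the filter order is assumed.
    """
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b1 = 2.0 * b0
    b2 = b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    x1 = x2 = y1 = y2 = 0.0  # filter state starts at rest
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def min_max_normalize(samples):
    """Map samples to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0 for _ in samples]
    return [(s - lo) / (hi - lo) for s in samples]
```

In practice a library routine (for example a filter-design function from a signal-processing package) would normally replace the hand-written biquad; it is written out here only so the sketch has no dependencies.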

[0016] Compared with the prior art, the present invention can achieve at least one of the following beneficial effects: 1. By synchronously collecting multiple key current signals during the opening and closing processes of a medium-voltage vacuum circuit breaker, a multi-dimensional perception system covering the complete operation chain of "energy preparation - command execution - mechanical locking - main circuit on / off" is constructed. A dynamic block strategy is introduced to convert the synchronously extracted current field time series into an adaptive block form, and a dual-path encoder based on Transformer is combined for feature extraction, realizing in-depth and intelligent diagnosis of the health status of the circuit breaker. This method significantly improves the comprehensiveness, accuracy, and early warning ability of fault identification, and has important engineering application value.

[0017] 2. The constructed dual-path encoder is based on the Transformer architecture of "channel-independent encoding + relationship-aware injection", breaking through the limitations of traditional methods in modeling time series relationships, and achieving accurate characterization of cross-channel and long-distance causal dependencies, laying a foundation for subsequent fault diagnosis.

[0018] 3. The introduced dynamic block strategy has two benefits. On the one hand, its dynamic adjustment mechanism ensures that transient key events such as the "closing coil current spike" and the "starting inrush current of the energy storage motor" are completely contained in a single dynamic block, avoiding the feature fragmentation caused by fixed blocking. On the other hand, it improves the computational efficiency of subsequent feature extraction: the original sequence of length T is converted into a sequence of N dynamic blocks, reducing the self-attention computational complexity of the Transformer from O(T²) to O(N²) (N is the number of blocks, N << T) and greatly improving the feasibility of processing long sequences while preserving accuracy.
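To give a sense of scale for the reduction from O(T²) to O(N²), the following worked arithmetic uses illustrative numbers. The 10 kHz sampling rate and a 100 ms operation window are consistent with figures given later in the description; the block count N = 20 is purely an assumption for this example.

```latex
\[
\begin{aligned}
T &= f_s \cdot t_{\mathrm{op}} = 10\,\mathrm{kHz} \times 100\,\mathrm{ms} = 1000, \qquad N = 20 \ \text{(assumed)} \\
\frac{T^2}{N^2} &= \frac{1000^2}{20^2} = \frac{10^6}{400} = 2500
\end{aligned}
\]
```

Under these assumptions the number of pairwise attention scores per channel drops by a factor of about 2500.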

[0019] In the present invention, the above technical solutions can also be combined with each other to achieve more preferred combination schemes. Other features and advantages of the present invention will be described in the subsequent specification; some advantages will be apparent from the specification or can be understood by implementing the present invention. The objectives and other advantages of the present invention can be realized and obtained from the content specifically pointed out in the specification and the drawings.

Description of the Drawings

[0020] The drawings are only for the purpose of showing specific embodiments and are not considered as a limitation to the present invention. Throughout the drawings, the same reference signs represent the same components.

[0021] Figure 1 is a flowchart of the mechanical fault diagnosis method for vacuum circuit breakers in an embodiment of the present invention;
Figure 2 is a general block diagram of the mechanical fault diagnosis method for vacuum circuit breakers in an embodiment of the present invention;
Figure 3 is a structural block diagram of the dynamic block division strategy according to an embodiment of the present invention;
Figure 4 is a schematic diagram of the structure of a dual-path encoder according to an embodiment of the present invention.

Detailed Implementation

[0022] Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form part of this application and are used together with the embodiments of the present invention to illustrate the principles of the present invention, but are not intended to limit the scope of the present invention.

[0023] Example 1: A specific embodiment of the present invention discloses a method for diagnosing mechanical faults in medium-voltage vacuum circuit breakers based on the Transformer framework algorithm. As shown in Figure 1, it includes the following steps:
Step S1: Obtain the time sequences of multiple currents during the opening and closing process of the medium-voltage vacuum circuit breaker;
Step S2: Using a dynamic block model, divide the time sequences of the multiple currents into multiple event sequence stages according to the current characteristics, yielding several adaptive blocks for each current;
Step S3: Input the block embedding features, extracted from each adaptive block of each current through the linear projection layer, into the Transformer-based dual-path encoder to extract mechanical fault features and obtain the high-order features corresponding to each current;
Step S4: After aggregating the high-order features of the multiple currents, perform fault category prediction to obtain the diagnostic result for the circuit breaker's mechanical fault.

[0024] The above method adaptively divides the synchronously acquired multi-current sequences into blocks using a dynamic block model. Combined with the use of a Transformer-based dual-path encoder to extract features from the adaptive blocks of each current, the complexity is reduced while the sensitivity to local transient fault features is significantly enhanced. After fusing the extracted features, fault feature classification is performed, which improves the accuracy and completeness of circuit breaker fault diagnosis.

[0025] Specifically, in step S1, in order to obtain comprehensive and high-quality input data, the present invention abandons the traditional approach of monitoring only a single or dual signal and constructs a multi-channel synchronous acquisition system to extract the current timing sequence within the opening or closing cycle after preprocessing the multi-channel current signals.

[0026] Taking five-channel synchronous acquisition as an example, during circuit breaker operation the trip (opening) coil current, the closing coil current, the operating current of the energy storage motor, the blocking (interlocking) coil current, and the main circuit total current are acquired in parallel. These five currents together depict the complete operation chain from energy preparation (energy storage motor) and command execution (opening / closing coils), through mechanical interlocking, to the final on / off state of the main circuit, laying the foundation for analyzing faults in any link.

[0027] To ensure the fidelity and synchronization of the multi-channel current signals, all five current channels utilize synchronous data acquisition cards with a sampling rate of no less than 10kHz to capture transient processes at the millisecond or even microsecond level. Preprocessing of the acquired five current signals includes both analog and digital conditioning. Analog conditioning includes signal isolation and preliminary filtering to ensure the safety of subsequent circuits and suppress high-frequency noise. Digital conditioning, performed in an embedded processor or host computer, mainly involves three steps: First, an adjustable cutoff frequency Butterworth filter is applied to effectively eliminate power frequency interference and irrelevant high-frequency noise, retaining the main frequency band reflecting mechanical motion characteristics (typically DC to several kilohertz). Then, amplitude normalization is performed, independently calculating the baseline value and typical fluctuation range of each of the five current channels within a calibration period. A minimum-maximum scaling normalization method is used to map all data to a unified numerical range, eliminating scale differences caused by different sensor ratios or physical dimensions, allowing the model to focus on learning waveform morphology and relative changes. Finally, precise time alignment is performed, using the pulse of the opening or closing command issued in the control system as the absolute time reference, covering a complete "command-response-steady-state" operation cycle. The end of this cycle is not simply set to a fixed duration, but is adaptively determined based on the characteristics of the current signal itself: for example, when the current in all channels decays and stabilizes at the standby noise level, and the duration exceeds a preset threshold, the system determines the end of the operation cycle. 
For different types of circuit breakers (such as vacuum circuit breakers and SF6 circuit breakers), their typical mechanical action times differ. A reasonable sample time length can be automatically matched according to the equipment model or determined through learning (e.g., 80-150 milliseconds for spring-operated mechanisms), thereby ensuring that each timing sequence completely encapsulates an independent opening or closing event in the time dimension. Furthermore, a small time delay compensation is applied to the data of each channel to ensure that the timing relationship of cross-channel events (such as motor starting and coil current rise) is accurately maintained at the data level.
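The adaptive end-of-cycle rule described above (all channels settled at the standby noise level for longer than a preset duration) can be sketched as a simple scan. The function name `find_cycle_end`, the absolute-value test against a single `noise_level`, and the use of a sample count rather than a time in milliseconds are assumptions made for illustration.

```python
def find_cycle_end(channels, noise_level, min_quiet_samples):
    """Locate the end of an operation cycle.

    channels: list of equal-rate sample lists, one per current channel.
    Returns the first index of the earliest run of `min_quiet_samples`
    consecutive samples in which every channel stays within `noise_level`,
    or None if no such run exists.
    """
    length = min(len(ch) for ch in channels)
    quiet_run = 0
    for t in range(length):
        if all(abs(ch[t]) <= noise_level for ch in channels):
            quiet_run += 1
            if quiet_run >= min_quiet_samples:
                return t - min_quiet_samples + 1
        else:
            quiet_run = 0  # any active channel restarts the quiet period
    return None
```

At a 10 kHz sampling rate, for instance, a 5 ms quiet threshold would correspond to `min_quiet_samples = 50`; both numbers here are only examples.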

[0028] It should be noted that, as shown in Figure 2, this invention employs a diagnostic model based on the Transformer framework algorithm for mechanical fault diagnosis of medium-voltage vacuum circuit breakers. However, directly inputting long raw time-series data into the Transformer makes the computational complexity grow quadratically with the sequence length, and uniform attention distribution may obscure key local features. Therefore, a dynamic adaptive block-segmentation strategy is introduced in the embedding layer of the diagnostic model. The core idea of this strategy is to intelligently segment a one-dimensional continuous current sequence into a series of "semantic fragments" (blocks) of varying lengths. Each block should contain, as completely as possible, one physical action stage of the current sequence during the opening or closing cycle, providing an optimal input structure for subsequent deep encoding and classification.

[0029] Specifically, in step S2, the adaptive blocks corresponding to each current are obtained with the dynamic block model through the following steps:
S21. Slice the current sequence with a fixed window length and step size to obtain the positions of several initial block sequences;
S22. Perform current characteristic analysis on each initial block sequence to detect the local extreme points within the block;
S23. With each initial block sequence and its corresponding local extreme points as input, use the dynamic block model to predict the event sequence stage of each initial block sequence during the opening or closing process, and obtain the center offset and width adjustment for the position of that block sequence;
S24. Adjust the position of the corresponding initial block sequence based on the center offset and width adjustment to obtain the corresponding adaptive block.

[0030] For example, for a single-channel sequence of length $T$, sliding slicing is first performed with a fixed window length $P_s$ and an overlapping step size $S$ (usually $S = P_s / 2$) to obtain a series of initial blocks. The overlap avoids cutting continuous features at fixed boundaries. Feature analysis is then performed on each initial block to detect local extrema. Each initial block sequence is concatenated with its corresponding local extrema and input into the trained dynamic block model. The outputs are the predicted center offset $\Delta C$ and width adjustment $\Delta W$, from which the center of the adjusted adaptive block, $C_{new} = C_{init} + \Delta C$, and its width, $W_{new} = W_{init} + \Delta W$, are calculated.
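The slicing and adjustment arithmetic of this example can be sketched as below. The predicted offsets would come from the trained dynamic block model, which is not reproduced here; the functions `initial_blocks` and `adjust_block`, the clamping to the sequence boundaries, and the minimum width of one sample are illustrative assumptions.

```python
def initial_blocks(seq_len, window, step):
    """Return (center, width) pairs for fixed windows sliding over the sequence."""
    blocks = []
    start = 0
    while start + window <= seq_len:
        blocks.append((start + window / 2.0, float(window)))
        start += step
    return blocks


def adjust_block(center, width, delta_c, delta_w, seq_len):
    """Apply C_new = C_init + dC and W_new = W_init + dW.

    Returns integer (start, end) sample indices, clamped to [0, seq_len].
    Clamping and the minimum width of 1 are assumptions, not from the patent.
    """
    new_center = center + delta_c
    new_width = max(1.0, width + delta_w)
    start = max(0, int(round(new_center - new_width / 2.0)))
    end = min(seq_len, int(round(new_center + new_width / 2.0)))
    return start, end
```

For a 16-sample sequence with a window of 8 and a step of 4, `initial_blocks(16, 8, 4)` yields three blocks centered at 4, 8, and 12; shifting the middle one by one sample and narrowing it by two gives the index range (6, 12).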

[0031] It should be noted that the dynamic block model is obtained by training an adjustment network (Adjuster Net), which learns how to adjust the coarse initial blocks into semantic blocks that are precisely aligned with the physical operation phases of the circuit breaker. As shown in Figure 3, the Adjuster Net is trained as follows:
S231. Divide the historical current time series of each channel into even blocks to obtain the positions of several training blocks, each comprising an initial center position $C_{init}$ and width $W_{init}$;
S232. Obtain the key phase points of each event sequence stage in the historical current time series through global analysis, define the interval between two adjacent key phase points as an ideal block, and obtain the position of each ideal block. Specifically, waveform analysis is performed on the complete operational sample sequence of each current, which avoids the limited field of view of a single block. High-precision algorithms are used to detect extreme points and inflection points in the entire sequence, and, based on the circuit breaker physical model, these are interpreted as key phase points with clear temporal significance, such as "starting point," "peak point," and "platform start point." The interval between two adjacent key phase points is defined as an ideal block, which corresponds exactly to one event sequence stage (i.e., one physical action stage) in the opening and closing process.

[0032] For example, the event sequence stages corresponding to the trip coil current include coil energization, core movement, and coil de-energization; the event sequence stages corresponding to the closing coil current include coil energization, core movement, current holding, and coil de-energization; the event sequence stages corresponding to the energy storage motor current include motor start-up, energy storage operation, and motor stop; the event sequence stages corresponding to the blocking coil current include blocking signal establishment, signal holding, and signal return / reset; and the event sequence stages corresponding to the main circuit total current include current zero value, current establishment, steady-state load, and current cutoff.

[0033] S233. Label the training blocks based on the boundary set of all ideal blocks to construct a training sample set; wherein the labels include center offset and width adjustment.

[0034] Specifically, for each training block, its overlap with all ideal blocks is calculated, and the center $C_{ideal}$ and width $W_{ideal}$ of the ideal block with the highest overlap are taken as the ground truth. The positional difference from the training block gives the adjustment amount of the training block, i.e., the center offset $\Delta C = C_{ideal} - C_{init}$ and the width adjustment $\Delta W = W_{ideal} - W_{init}$.
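The labeling rule can be sketched as follows, with blocks represented as (center, width) pairs. The patent speaks only of "overlap"; measuring it as intersection over union is an assumption here, as are the helper names `interval`, `iou`, and `adjustment_label`.

```python
def interval(center, width):
    """Convert a (center, width) block to its (start, end) interval."""
    return center - width / 2.0, center + width / 2.0


def iou(block_a, block_b):
    """Intersection over union of two (center, width) blocks (assumed overlap measure)."""
    a0, a1 = interval(*block_a)
    b0, b1 = interval(*block_b)
    inter = max(0.0, min(a1, b1) - max(a0, b0))
    union = (a1 - a0) + (b1 - b0) - inter
    return inter / union if union > 0 else 0.0


def adjustment_label(train_block, ideal_blocks):
    """Return (delta_c, delta_w) relative to the best-overlapping ideal block."""
    best = max(ideal_blocks, key=lambda ideal: iou(train_block, ideal))
    return best[0] - train_block[0], best[1] - train_block[1]
```

For a training block (10, 8) and ideal blocks (5, 6) and (12, 10), the second ideal block overlaps more, so the label is a center offset of 2 and a width adjustment of 2.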

[0035] S234. Perform feature analysis on each training block to detect local extrema (these points may coincide with the global phase point). Use the local extrema as reference information, and take the training samples and the corresponding reference information as input to perform supervised training on the Adjuster Net.

[0036] Furthermore, the reference information also includes statistical features, anomaly markers, etc. A rapid initial screening based on a rule base is used to mark "suspected anomalies." The generated preliminary anomaly markers enrich the understanding of phase points within the block, providing the Adjuster Net with additional fault context information, enabling it to assign different weights to suspicious regions during adjustment. Simultaneously, these preliminary anomaly markers can also serve as auxiliary verification, with these features and markers together constituting the input reference information for the Adjuster Net, helping the diagnostic network understand the local context of the current adaptive block.

[0037] The above method encapsulates each stage reflecting current characteristics during the opening and closing process of each current path within an independent block structure. After dynamic adjustment, the sequence is transformed into a series of segments with optimized positions and lengths. Compared to sequence processing with fixed windows or fixed blocks, this method can dynamically adjust the analysis granularity based on the internal characteristics of the signal, avoiding the impact on early fault identification and location due to the loss of key local information (such as transient fault characteristics).

[0038] Specifically, in step S3, before using the Transformer model for feature extraction, a learnable linear projection layer is first used to map the data of all time points of each adaptive block of each current path into a high-dimensional vector to obtain the block embedding feature, or block embedding for short.

[0039] Furthermore, to preserve the temporal order between segments, a learnable positional encoding is added to each block embedding. Thus, the original long sequence is transformed into a shorter, but more information-dense, sequence of block embeddings, which greatly reduces the computational burden on subsequent Transformer models.
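A minimal sketch of block embedding is given below. Because adaptive blocks vary in length while a linear projection expects a fixed input size, the sketch first resamples each block to a fixed number of points by linear interpolation; this resampling step is an assumption, since the patent does not say how variable-length blocks are fed to the projection layer. The weight matrix, bias, and positional vector stand in for learned parameters, and the function names are hypothetical.

```python
def resample(block, length):
    """Linearly interpolate a block to `length` points (length must be >= 2)."""
    if len(block) == 1:
        return [float(block[0])] * length
    out = []
    for i in range(length):
        pos = i * (len(block) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(block) - 1)
        frac = pos - lo
        out.append(block[lo] * (1.0 - frac) + block[hi] * frac)
    return out


def embed_block(block, weight, bias, position_vec):
    """Project a block to an embedding and add its positional encoding.

    weight: d x L matrix (list of rows), bias and position_vec: length d.
    All three are placeholders for learned parameters.
    """
    fixed = resample(block, len(weight[0]))
    return [sum(w * x for w, x in zip(row, fixed)) + b + p
            for row, b, p in zip(weight, bias, position_vec)]
```

Each adaptive block thus becomes one d-dimensional token, so a channel with N blocks presents the encoder with a sequence of length N rather than the original T samples.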

[0040] Typically, the root cause of a circuit breaker failure is hidden in the timing logic and physical constraints across different channels. For example, the mechanical sequence of the operating mechanism dictates that the current decay of the energy storage motor (indicating the end of energy pre-storage) must precede the rise of the closing coil current (indicating the start of electromagnetic force drive). If the model only learns the waveform of a single current, it may detect "insufficient closing current amplitude" but cannot trace the root cause to either "insufficient energy storage" or "coil aging."

[0041] To model this cross-channel dependency, which embodies physical common sense and fault causality, this invention abandons simple splicing or fully connected fusion and designs a Transformer-based dual-path encoder structure. As shown in Figure 4, the dual-path encoder includes a main encoding module based on the Transformer framework and a parallel relationship processing module. The main encoding module is a multi-channel Transformer encoder used for parallel processing of the block embeddings of the multiple currents. The relationship processing module obtains the potential correlations between channels through a shared multilayer perceptron (MLP) network. This dual-path encoder simultaneously captures the unique timing patterns of each current signal and the complex interactions between them.

[0042] On the one hand, the block embedding features of each current are input into the relationship processing module, and the cross-channel relationship between each block embedding feature is obtained through the shared MLP, and the relationship control parameters are output.

[0043] During training, the shared MLP learns cross-channel relationships through multi-task learning, including physical constraints, temporal sequences, and causal indicative relationships. The physical constraints are learned using event sequence labels as supervision; the temporal sequences are learned using the time intervals between events during normal circuit breaker operation; and the causal indicative relationships are learned using the propagation direction of known faults.

[0044] For example, firstly, all block embeddings of the five current paths are labeled to construct a training sample set. The labels include event sequence labels (such as "coil energized" and "core movement" for the tripping coil), the true value of the actual time interval between events (such as a 10–30 millisecond window from the end of energy storage to the start of closing), and known fault causal chain labels (such as "insufficient energy storage current leads to abnormal closing current"). Then, the labeled samples are concatenated across channels to form a comprehensive input, which is fed into a shared MLP for multi-task branch learning. The branches include an event sequence prediction branch, a time regression branch, and a causal differentiation branch.

[0045] The event sequence prediction branch uses event label sequences as supervision to learn the fixed action sequence that circuit breaker operation must follow, thus completing the learning of physical constraints. This task uses the cross-entropy loss function to drive the model to master the hard rule that "actions must occur in physical order" by judging whether the input event sequence is legal (normal operation) or illegal (abnormal operation). For example, after learning the "closing command issued" event, if the "energy storage motor stopped" label appears first and then the "closing coil current increased" label appears, this is a normal sequence; otherwise, it is abnormal. The time regression branch uses the time interval between events in normal circuit breaker operation as supervision. Under the premise of satisfying the physical sequence, it learns the temporal relationship between actions, that is, the reasonable delay dependency between each action. This task uses the mean squared error loss function to learn the typical reasonable time window between each action (for example, the time from the end of energy storage to the start of closing should be within 10-30 milliseconds), thereby enabling quantitative assessment of whether the observed time relationship deviates from the normal state. The causal differentiation branch uses the known fault propagation direction as supervision to learn the causal indication relationship, that is, to learn the fault propagation direction and asymmetric dependency. This is achieved by designing asymmetric attention modulation. The task employs contrastive loss or triplet loss, enabling the model to learn that specific anomalous patterns in certain channels will "cause" characteristic responses in subsequent channels. 
For example, the model will discover that the feature of "insufficient energy storage current amplitude" (occurring at time t in channel i) has a strong and unidirectional predictive ability for the feature of "too short closing current duration" (occurring at time t+Δt in channel j), but the reverse is not true.
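The three supervised branches described above can be illustrated with a minimal sketch of a combined training loss. It assumes a binary legal/illegal target for the event sequence branch, interval targets in milliseconds for the time regression branch, and the triplet form of the causal loss. The function names, the loss weights, and the margin are hypothetical and do not come from the source.

```python
import math

def sequence_loss(prob_legal, is_legal):
    """Binary cross-entropy for the event sequence prediction branch."""
    eps = 1e-12
    p = min(max(prob_legal, eps), 1.0 - eps)
    return -math.log(p) if is_legal else -math.log(1.0 - p)

def interval_loss(pred_ms, true_ms):
    """Mean squared error for the time regression branch (intervals in ms)."""
    return sum((p - t) ** 2 for p, t in zip(pred_ms, true_ms)) / len(pred_ms)

def causal_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: pull the anchor toward the causally linked sample and
    push it away from the unrelated one. The margin is an assumed value."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def multitask_loss(prob_legal, is_legal, pred_ms, true_ms,
                   anchor, positive, negative, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three branch losses (weights are assumptions)."""
    w_seq, w_time, w_causal = weights
    return (w_seq * sequence_loss(prob_legal, is_legal)
            + w_time * interval_loss(pred_ms, true_ms)
            + w_causal * causal_loss(anchor, positive, negative))
```

In practice the three terms would be computed from the outputs of the shared MLP branches over a batch of labeled samples; the scalar version here only shows how the losses combine.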

[0046] Finally, a shared hidden layer fuses and encodes the physical constraints, temporal sequences, and causal indications learned by the three task branches into a unified high-dimensional representation. An independent generation branch maps this representation to a three-dimensional tensor (attention head × channel i × channel j), which is the relation weight matrix α. In this matrix, physical constraints ensure that the fixed order of events is encoded as the basic pattern of inter-channel attention; temporal sequences are expressed as higher weights for cross-channel interactions that fall within a reasonable time window; and causal indications are embedded in asymmetric form, thus characterizing the direction and intensity of fault propagation. Another dedicated branch generates a channel-level global importance parameter, the channel importance bias β, based on the fusion and evaluation of the multi-source information (physical, temporal, and causal). The relation weight matrix α and the channel importance bias β corresponding to each Transformer attention head constitute the relation control parameters. These parameters are injected in real time into the attention calculation of each Transformer layer in the main encoding module.

[0047] On the other hand, the block embeddings of each current path are input into the corresponding channel's Transformer encoder in the main encoding module for feature extraction. Each encoder stack consists of L standard encoding layers, each containing a multi-head self-attention mechanism and a feedforward neural network. In this path, by incorporating the relation control parameters, the higher-order features obtained by each channel not only focus on information within that channel but also incorporate information from the block embedding features of the other channels. For the Transformer encoder of any channel, the higher-order features of the corresponding current path are obtained as follows: S31. The input embedded features are linearly projected to obtain the query vector, key vector, and value vector; S32. When attention is calculated in each Transformer layer, the standard self-attention score is recalibrated with the relation control parameters using a Hadamard product and an addition. This transforms the learned physical constraints, temporal knowledge, and causal knowledge into control instructions that guide the model to focus on the most relevant cross-channel evidence.

[0048] For example, first, the standard dot product matrix of the query vector and key vector within the channel is calculated to capture the channel's own temporal dependencies. Then, this dot product matrix is multiplied element-wise (i.e., the Hadamard product) with the relation weight matrix α_i. This multiplication is crucial: it acts as an "association filter," recalibrating each attention score according to the learned cross-channel dependency strength. Finally, the channel importance bias vector β_i is added to obtain the conditional attention score, expressed as $A_i = \frac{Q K^{\top}}{\sqrt{d}} \odot \alpha_i + \beta_i$, where $A_i$ is the conditional attention score of the i-th Transformer layer, Q and K are the query vector and key vector, respectively, and d is the vector dimension. Through the conditional attention score, when the model analyzes the "block of closing current at time t," it not only considers which blocks in its own channel history are similar (self-attention), but also uses the knowledge encoded in α_i and β_i to selectively strengthen the attention given to the "block of energy storage motor current at time t−Δt."
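The recalibration in step S32 (dot product, Hadamard product with α_i, then addition of β_i) can be sketched for a single attention head as follows. Two points are assumptions rather than statements from the source: the 1/√d scaling follows standard scaled dot-product attention, and α_i and β_i are taken to be already expanded to the block level, so that `alpha` has one weight per query/key block pair and `beta` has one bias per key block. The function name is hypothetical.

```python
import math

def conditional_scores(Q, K, alpha, beta):
    """Compute (Q K^T / sqrt(d)) * alpha + beta element-wise for one head.

    Q, K  : lists of block vectors (n_blocks x d)
    alpha : n_blocks x n_blocks relation weights (assumed block-level)
    beta  : per-key-block bias of length n_blocks (assumed block-level)
    """
    d = len(Q[0])
    scale = math.sqrt(d)
    scores = []
    for m, q in enumerate(Q):
        row = []
        for n, k in enumerate(K):
            dot = sum(a * b for a, b in zip(q, k)) / scale
            # Hadamard product with alpha, then add the importance bias
            row.append(dot * alpha[m][n] + beta[n])
        scores.append(row)
    return scores
```

With `alpha` set to all ones and `beta` to zeros, the function reduces to ordinary scaled dot-product scores, which makes the effect of the relation control parameters easy to isolate.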

[0049] S33. Perform multi-head attention aggregation based on conditional attention scores and value vectors to obtain the higher-order features.
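Step S33 can be illustrated with a minimal sketch that turns per-head conditional attention scores into higher-order features. It assumes the usual softmax normalization over key blocks and concatenation of head outputs; the final output projection of a standard Transformer layer is omitted, and all function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, V):
    """Softmax each score row and take the weighted sum of value vectors."""
    out = []
    for row in scores:
        w = softmax(row)
        out.append([sum(w[n] * V[n][j] for n in range(len(V)))
                    for j in range(len(V[0]))])
    return out

def multi_head(scores_per_head, V_per_head):
    """Concatenate the per-head outputs along the feature axis."""
    heads = [attend(s, v) for s, v in zip(scores_per_head, V_per_head)]
    n_blocks = len(heads[0])
    return [[x for h in heads for x in h[m]] for m in range(n_blocks)]
```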

[0050] Through this dual-path encoding design, the model achieves conditional, flexible, and interpretable cross-channel information fusion while maintaining the independence of feature extraction for each channel. This enables the model to capture complex, nonlinear conditional correlations, such as "when the falling edge slope of the energy storage current is below the threshold η, its indicative power for the rising edge delay of the subsequent closing current is significantly enhanced." These diagnostic rules are automatically learned and applied through a data-driven approach. Therefore, this design not only improves the model's accuracy in classifying known faults but also enhances its generalization ability to unseen abnormal combinations that conform to the same physical logic.

[0051] Specifically, in step S4, the multiple high-order features containing deep temporal patterns obtained from dual-path encoding are aggregated and used for fault classification prediction, yielding the diagnostic result of the circuit breaker's mechanical fault. The specific process includes: S41. The higher-order features of each current are globally aggregated using attention-weighted global pooling to obtain global fusion features. For example, to aggregate the five high-order features into a fixed-length global fusion feature, an attention-weighted global pooling module is designed. This module calculates an attention score for each high-order feature at every time step throughout the entire opening and closing operation cycle. For instance, when identifying a "closing bounce" fault, the model assigns extremely high attention weights to the time steps immediately following contact closure and the subsequent few milliseconds. Finally, the high-order features from all time steps are weighted and summed according to their attention scores to obtain the final global fusion feature.
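The pooling in step S41 can be sketched as follows. The sketch assumes that the per-step attention score is a dot product between each feature vector and a scoring vector `w`, which would be learned in practice; the source does not specify the scoring function. For simplicity, the features of all channels and time steps are treated as one flat list.

```python
import math

def attention_pool(features, w):
    """Attention-weighted global pooling.

    features : list of feature vectors, one per (channel, time step)
    w        : scoring vector (hypothetical; learned in a real model)
    Returns the pooled fixed-length vector and the attention weights.
    """
    scores = [sum(f_j * w_j for f_j, w_j in zip(f, w)) for f in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(weights[t] * features[t][j] for t in range(len(features)))
              for j in range(dim)]
    return pooled, weights
```

The returned weights are the quantities one would inspect to see which time steps dominate a given diagnosis, such as the steps just after contact closure for a closing bounce.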

[0052] S42. Use a classifier to predict the mechanical fault category of the circuit breaker based on the global fusion features to obtain the diagnostic result. To improve interpretability, the output diagnostic result can include the extracted global fusion features as key features in addition to the fault type. For example, if the output is a mechanical jamming fault, the key feature is the closing current rise time.

[0053] For example, the obtained global fusion features are fed into the final multilayer perceptron classifier. This classifier consists of multiple fully connected layers, using activation functions such as ReLU to introduce non-linearity and employing Dropout layers to prevent overfitting. Finally, a Softmax layer outputs a probability distribution for a specific fault category. The category with the highest probability is used as the diagnostic result, and this probability value can also serve as the confidence level of the diagnosis, providing decision-making reference for operations and maintenance personnel.
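The output stage described above can be sketched as a softmax over the classifier's final-layer outputs, followed by selection of the most probable category. The fully connected layers, ReLU, and Dropout are omitted; `logits` stands for the hypothetical output of the last fully connected layer.

```python
import math

def diagnose(logits, labels):
    """Return the most probable fault label and its softmax probability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Usage with two of the label names used in this document
label, confidence = diagnose(
    [0.0, 2.0],
    ["Normal_Close", "Fault_Energy Storage Spring Fatigue_Close"],
)
```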

[0054] It should be noted that the entire model is trained end-to-end on a large labeled dataset. The training data should, as far as possible, cover the operational records of the target circuit breaker model under various health and fault states. As shown in Figure 2, when training the fault diagnosis model, each sample is represented as a matrix of dimension T×C, where T represents the number of time steps (i.e., the number of sampling points) within the sample, and C represents the number of channels (e.g., 5). This matrix is the direct input for subsequent pattern learning by the deep model. Each row (corresponding to a time point) represents a "multi-current snapshot" consisting of the current values from 5 different channels (opening coil, closing coil, energy storage motor, locking coil, and main circuit) at that specific moment. Each column (corresponding to a channel) represents the complete sequence of current changes over time for a single path throughout the entire operating cycle (from t0 to the end of the operation), i.e., the "current evolution trajectory" of that channel. To construct a labeled dataset that can be used for supervised learning, each sample matrix needs to be associated with an accurate state label. 
The labels originate from multiple sources: first, based on the substation's historical ledgers and maintenance records, current data recorded before and after a fault are correlated with the final fault handling report; second, simulation tests are conducted during equipment maintenance, in which typical faults are deliberately created (such as adjusting spring pre-compression to simulate insufficient energy storage, and loosening screws to simulate mechanical jamming) and the corresponding data are collected; third, based on multiphysics simulation software, a high-fidelity digital model of the circuit breaker is established to simulate the current response of each branch under various normal and fault conditions, generating large amounts of accurate simulation data to expand sample diversity. Ultimately, a labeling system is constructed that includes a set of "normal" states and several refined fault categories, such as "Normal_Open," "Normal_Close," "Fault_Energy Storage Spring Fatigue_Close," "Fault_A Phase Contact Wear_Open," and "Fault_Control Circuit Poor Contact_Failure to Operate." On this basis, the dataset is divided into a training set, a validation set, and a test set, either in chronological order or by stratified sampling, in order to rigorously evaluate the generalization performance of the model, avoid overfitting to a specific time period, and ensure that the diagnostic system can reliably judge unknown failures that occur in actual deployment.
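The chronological division mentioned above can be sketched as follows. The 70/15/15 split ratios and the sample fields `time` and `label` are assumptions for illustration; the source does not specify them.

```python
def chronological_split(samples, train_frac=0.7, val_frac=0.15):
    """Split samples by recording time so that the test set is strictly
    later than the training set (ratios are assumed values)."""
    ordered = sorted(samples, key=lambda s: s["time"])
    n = len(ordered)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test

# Usage: 20 operation records listed in reverse time order
records = [{"time": t, "label": "Normal_Close"} for t in range(19, -1, -1)]
train_set, val_set, test_set = chronological_split(records)
```

Splitting by time rather than at random keeps later operations out of the training set, which is what allows the test score to reflect performance on future faults.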

[0055] To quantify the diagnostic effectiveness of this invention, on the same test set, the baseline model using only the two-channel opening/closing signals achieved a detection rate of 72.3% for "energy storage spring fatigue" type faults. With the five-channel signal approach of this solution, the detection rate increased to 96.8%: the improved completeness of fault feature coverage yields a gain of 24.5 percentage points and significantly reduces false negatives caused by missing information. Furthermore, the effectiveness of the dynamic segmentation strategy was verified on a fault containing an "inter-turn short circuit in the coil" (manifested as an abnormal oscillation in the current waveform lasting approximately 2 ms). With a fixed 32 ms segmentation method, there was a 75% probability that the abnormal oscillation was divided into two blocks, leading to a model detection sensitivity of only 68%. With the dynamic segmentation strategy, the probability that the abnormal oscillation was completely contained within a single "focused" block increased to 95%, and the detection sensitivity correspondingly increased to 93%. This demonstrates the "focusing" and "preservation" capabilities of the dynamic segmentation strategy for localized micro-faults.

[0056] Compared with existing technologies, this embodiment provides a mechanical fault diagnosis method for medium-voltage vacuum circuit breakers based on the Transformer framework algorithm. A dynamic block embedding layer and a deep Transformer encoder jointly form a powerful automatic feature extractor that extracts features from multiple currents, improving the ability to capture the interaction and coordinated change patterns between current signals before and after a fault. The dynamic block strategy adaptively adjusts the input information, and the dual-path encoding perceives multi-level dependencies, ultimately forming a highly abstract and discriminative representation in the global feature vector. This significantly improves the completeness of fault features and enables a comprehensive and accurate diagnosis of the circuit breaker's operating status.

[0057] Embodiment 2 A specific embodiment of the present invention discloses a mechanical fault diagnosis system for medium-voltage vacuum circuit breakers based on the Transformer framework algorithm, comprising: a preprocessing module, used to obtain the timing sequence of multiple currents during the opening and closing process of the medium-voltage vacuum circuit breaker; a dynamic block module, used to divide the time sequence of the multiple currents into multiple event sequence stages according to the current characteristics and obtain several adaptive blocks corresponding to each current; a feature extraction module, used to extract the block embedding features of each adaptive block of each current through the linear projection layer and input them into the Transformer-based dual-path encoder to extract mechanical fault features and obtain the high-order features corresponding to each current; and a mechanical fault diagnosis module, used to aggregate the higher-order features of the multiple currents, predict the fault category, and obtain the diagnostic result of the circuit breaker's mechanical fault.

[0058] The system can perform mechanical fault diagnosis of medium-voltage vacuum circuit breakers according to any of the methods described in Embodiment 1. Related aspects can be referenced from each other, and are not repeated in this embodiment.

[0059] In practical deployment, this system can be integrated as a software module into existing circuit breaker online monitoring devices or substation edge computing platforms. The deployed system continuously monitors the synchronous five-channel current signal stream from the data acquisition unit. For each circuit breaker operation event, the system automatically triggers the diagnostic process: real-time data undergoes the same preprocessing (filtering, normalization, alignment) as in the training phase and is then input into the loaded diagnostic model for forward inference. The system outputs diagnostic results and confidence levels within milliseconds. Intermediate results can also be inspected: for example, the position and length of each segment after dynamic segmentation can be visualized to observe which time regions the model focuses on, and the attention weights generated by the relationship-aware path can be analyzed to understand which channel interactions the model prioritizes during decision-making. This information is of significant reference value for maintenance personnel when verifying diagnostic results and locating the root cause of faults. Diagnostic results, raw waveforms, and intermediate analysis data can be stored locally or uploaded to a cloud platform to build a fault case library, providing data support for continuous iterative optimization of the model and full lifecycle management of circuit breakers.

[0060] Compared with existing technologies, this embodiment provides a medium-voltage vacuum circuit breaker mechanical fault diagnosis system based on the Transformer framework algorithm. It constructs a closed-loop intelligent system with "cloud-edge" collaboration, achieving continuous evolution of diagnostic capabilities and feasibility for engineering applications. In addition to outputting diagnostic conclusions, the system also provides a certain degree of interpretability. Lightweight deployment at the edge ensures real-time diagnosis and low latency (single inference time <50ms), meeting the power system's requirements for rapid fault response. The centralized data pool, training platform, and knowledge base in the cloud support continuous model iteration and the accumulation of expert experience. This breaks the limitation of traditional diagnostic systems that are "deployed once and never updated," forming a positive feedback loop of "data collection -> model optimization -> knowledge accumulation -> performance improvement." When the system encounters low-confidence cases or entirely new fault modes at the edge, it can trigger manual annotation and incremental model training in the cloud. The updated model can be redeployed, enabling the entire system's diagnostic capabilities to continuously evolve with equipment operation, providing a scalable technical foundation for predictive maintenance and full lifecycle health management of circuit breakers.
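The edge-side trigger for manual annotation can be sketched as a simple confidence gate. The class name, the 0.8 threshold, and the return value `needs_annotation` are hypothetical; the source states only that low-confidence cases can trigger annotation and incremental training in the cloud.

```python
from dataclasses import dataclass, field

@dataclass
class EdgeRouter:
    """Accept confident diagnoses; queue the rest for cloud annotation."""
    threshold: float = 0.8                        # assumed confidence cut-off
    pending: list = field(default_factory=list)   # case IDs awaiting labels

    def handle(self, case_id, label, confidence):
        if confidence < self.threshold:
            self.pending.append(case_id)
            return "needs_annotation"
        return label

# Usage: one confident and one low-confidence operation event
router = EdgeRouter()
first = router.handle("op-1", "Normal_Close", 0.95)
second = router.handle("op-2", "Normal_Close", 0.40)
```

The queued case IDs would then be uploaded with their raw waveforms so that the cloud side can label them and retrain the model incrementally.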

[0061] Those skilled in the art will understand that all or part of the processes of the methods described in the above embodiments can be implemented by a computer program instructing related hardware, and the program can be stored in a computer-readable storage medium. The computer-readable storage medium may be a disk, optical disk, read-only memory, or random access memory, etc.

[0062] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for diagnosing mechanical faults in medium-voltage vacuum circuit breakers based on the Transformer framework algorithm, characterized in that it includes the following steps: obtaining the timing sequence of multiple currents during the opening and closing process of a medium-voltage vacuum circuit breaker; dividing the time sequence of the multiple currents into multiple event sequence stages according to the current characteristics using a dynamic block model, to obtain several adaptive blocks corresponding to each current; inputting the block embedding features, extracted from each adaptive block of each current path through the linear projection layer, into the Transformer-based dual-path encoder to extract mechanical fault features, thereby obtaining the high-order features corresponding to each current path; and, after aggregating the higher-order features of the multiple currents, performing fault category prediction to obtain the diagnostic result of the circuit breaker's mechanical fault.

2. The method according to claim 1, characterized in that the dual-path encoder includes one main encoding module and one relationship processing module, wherein: the relationship processing module receives the block embedding features of each current path, obtains the cross-channel relationship between the block embedding features through a shared multilayer perceptron network, and obtains the relationship control parameters; the main encoding module is a multi-channel parallel Transformer encoder; in conjunction with the relationship control parameters, the Transformer encoder of each channel extracts features from the block embedding features of its corresponding current to obtain the higher-order features of that current; and the higher-order features fuse the information of all block embedding features in this channel and, through the relationship control parameters, the information of the block embedding features of the other channels.

3. The method according to claim 2, characterized in that the cross-channel relationships include physical constraint relationships, temporal sequence relationships, and causal indication relationships, wherein: the physical constraint relationship is learned under supervision using event sequence labels; the temporal sequence relationship is learned under supervision using the time intervals between events during normal operation of the circuit breaker; the causal indication relationship is learned under supervision using the propagation direction of known faults; and the physical constraint relationship, temporal sequence relationship, and causal indication relationship are fused and encoded into an association weight matrix and a channel importance bias vector to obtain the relationship control parameters.

4. The method according to claim 3, characterized in that the higher-order features are obtained by the Transformer encoder of any channel based on the following process: the input embedded features are linearly projected to obtain the query vector, key vector, and value vector; the conditional attention score is obtained by element-wise multiplying the dot product matrix of the query vector and the key vector with the association weight matrix and combining the result with the channel importance bias vector; and the higher-order features are obtained by performing multi-head attention aggregation based on the conditional attention score and the value vector.

5. The method according to claim 1, characterized in that obtaining the diagnostic result of the circuit breaker's mechanical fault includes: globally aggregating the higher-order features of each current path using attention-weighted global pooling to obtain global fusion features; and using a classifier to predict the mechanical fault category of the circuit breaker based on the global fusion features, to obtain the diagnostic result.

6. The method according to any one of claims 1-5, characterized in that the adaptive blocks are obtained based on the following process: the current sequence is sliced using a fixed window length and step size to obtain the positions of several initial block sequences; current characteristic analysis is performed on each of the initial block sequences to detect local extreme points within the blocks; using each initial block sequence and its corresponding local extreme points as input, the dynamic block model predicts the event sequence stage of each initial block sequence during the opening or closing process, thereby obtaining the center offset and width adjustment of the corresponding block sequence position; and the position of the corresponding initial block sequence is adjusted based on the center offset and width adjustment to obtain the adaptive block.

7. The method according to claim 6, characterized in that the training sample set of the dynamic block model is constructed based on the following process: the historical current time series of each path is evenly divided into blocks to obtain the positions of several training blocks; through global analysis, the key phase points of each event sequence stage in the historical current time series are obtained, and the interval between two adjacent key phase points is defined as an ideal block, to obtain the position of each ideal block; the overlap between each training block and all ideal blocks is calculated; based on the position of the ideal block with the highest overlap and the position of the corresponding training block, the adjustment amount of the training block, including the center offset and the width adjustment, is obtained; and the adjustment amount is used as the label of the corresponding training block to construct the training sample set.

8. The method according to any one of claims 1-5 and 7, characterized in that the multiple currents include the trip coil current, the closing coil current, the energy storage motor current, the lockout coil current, and the total current of the main circuit.

9. The method according to claim 8, characterized in that the event sequence stages corresponding to the trip coil current include coil energization, core movement, and coil de-energization; the event sequence stages corresponding to the closing coil current include coil energization, core movement, current holding, and coil de-energization; the event sequence stages corresponding to the energy storage motor current include motor start-up, energy storage operation, and motor stop; the event sequence stages corresponding to the lockout coil current include lockout signal establishment, signal holding, and signal return; and the event sequence stages corresponding to the total current of the main circuit include current zero value, current establishment, steady-state load, and current cutoff.

10. The method according to claim 9, characterized in that the synchronously acquired currents are preprocessed as follows to obtain the timing sequence of each current: a Butterworth filter is used to denoise each current to obtain the denoised current; and the amplitude of the denoised current is normalized and the time is aligned to obtain the timing sequence.
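The label construction of claim 7 and the position adjustment of claim 6 can be illustrated with a minimal sketch. It assumes that "overlap" is measured as the intersection-over-union of two time intervals and that both adjustments are plain differences (ideal minus training block); neither choice is specified in the source, and all function names are hypothetical. Blocks are (start, end) pairs in milliseconds.

```python
def iou(a, b):
    """Intersection-over-union of two intervals (assumed overlap measure)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def adjustment_label(block, ideal_blocks):
    """Label a training block with the offsets toward its best ideal block."""
    best = max(ideal_blocks, key=lambda ib: iou(block, ib))
    center_offset = (best[0] + best[1]) / 2 - (block[0] + block[1]) / 2
    width_adjust = (best[1] - best[0]) - (block[1] - block[0])
    return center_offset, width_adjust

def apply_adjustment(block, center_offset, width_adjust):
    """Shift and resize an initial block to obtain the adaptive block."""
    center = (block[0] + block[1]) / 2 + center_offset
    width = (block[1] - block[0]) + width_adjust
    return center - width / 2, center + width / 2

# Usage: a fixed 32 ms block against three ideal event-stage blocks
offset, widen = adjustment_label((0.0, 32.0),
                                 [(0.0, 10.0), (10.0, 40.0), (40.0, 64.0)])
adaptive = apply_adjustment((0.0, 32.0), offset, widen)
```

At inference time the center offset and width adjustment would come from the dynamic block model's prediction rather than from the ideal blocks, which are available only for labeled historical data.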