Transformer fault diagnosis method based on dynamic correlation modeling and attention fusion
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ELECTRIC POWER RES INST OF GUANGXI POWER GRID CO LTD
- Filing Date
- 2026-02-03
- Publication Date
- 2026-06-19
AI Technical Summary
Existing transformer fault diagnosis methods struggle to balance changes in gas coupling relationships and single-channel characteristics, failing to capture both long-term slow trends and short-term abrupt changes simultaneously. Furthermore, deep learning methods suffer from gradient vanishing or exploding problems.
A method based on dynamic association modeling and attention fusion is adopted. The time series features are extracted by the TimesNet model, a query matrix is constructed and the structural coupling relationship is analyzed by the DyGNN model, and the feature fusion is combined with the cross-attention mechanism to achieve accurate diagnosis of transformer faults.
It improves the accuracy of fault diagnosis, can adaptively focus on the most diagnostically valuable feature combinations under different time backgrounds, adapts to changes in operating conditions, and enhances the ability to express complex time-series evolution processes and characterize feature synergy.
Figure CN122241338A_ABST
Description
Technical Field
[0001] This invention relates to the technical field of transformer fault diagnosis, and in particular to a transformer fault diagnosis method based on dynamic correlation modeling and attention fusion.
Background Technology
[0002] Currently, dissolved gas analysis (DGA) has become the most widely used transformer condition monitoring and fault diagnosis technology due to its non-invasiveness, high sensitivity, and good fault indication capabilities. By detecting the concentration changes and ratios of various gases in insulating oil, it can help determine potential abnormal phenomena such as thermal faults, arc discharges, and local overheating inside the transformer.
[0003] Existing DGA-based fault diagnosis methods fall broadly into three types: (1) empirical rule-based methods, such as the Duval triangle method and the IEC ratio method; (2) traditional machine learning methods, such as support vector machines and random forests; and (3) deep learning methods, such as convolutional neural networks and long short-term memory networks, which have been applied to DGA data modeling in recent years to uncover latent relationships between time-series features and gases. Empirical rule-based methods rely heavily on expert knowledge and fixed rules, which makes them difficult to adapt to changing operating conditions and gas evolution patterns and limits their diagnostic accuracy. Traditional machine learning methods achieve a degree of automatic discrimination through model learning and have shown some success in static feature extraction, but they typically struggle to model the temporal dynamics of gas concentrations and the complex nonlinear coupling between gases at the same time, and they are highly sensitive to changes in operating conditions and to data noise. Deep learning methods are limited by their network structures and suffer from the following drawbacks. First, local recursive structures struggle to capture global evolution trends and cannot identify the signal patterns of long-period, slowly developing faults. Second, deep network structures may lead to vanishing or exploding gradients, affecting model stability and generalization ability. Third, existing methods generally use static features or single gas ratios as model inputs, ignoring the time-varying nature of the structural relationships between multiple gases and thus limiting the model's ability to learn complex coupled features.
Regarding the aforementioned related technologies, the inventors believe that, in transformer DGA fault diagnosis, they cannot account for both changes in gas coupling relationships and single-channel features, and therefore fail to capture long-term slow trends and short-term abrupt changes at the same time.
Summary of the Invention
[0004] To address the problem that existing technologies for transformer DGA fault diagnosis struggle to balance long-term slow trends with short-term abrupt changes due to variations in gas coupling relationships and single-channel features, this application provides a transformer fault diagnosis method based on dynamic correlation modeling and attention fusion. Through improvements in temporal representation capabilities, structural coupling expression, and feature fusion mechanisms, this method can adaptively focus on the most diagnostically valuable feature combinations under different time contexts, thereby improving the accuracy of fault identification.
[0005] The above-mentioned inventive objective of this application is achieved through the following technical solutions: A transformer fault diagnosis method based on dynamic correlation modeling and attention fusion, the method comprising: Real-time acquisition of transformer DGA data and encoding and feature classification are performed, and a multi-level feature system of the transformer DGA data is constructed based on the encoding and feature classification results. Based on the multi-level feature system, the time-series features of the transformer DGA data are extracted using a pre-trained TimesNet model, and a query matrix of the transformer DGA data is constructed. The structural coupling relationship between the transformer DGA data is analyzed by a pre-trained DyGNN model, and the key matrix and value matrix of the transformer DGA data are generated according to specific feature dimensions based on the structural coupling relationship. The query matrix, the key matrix, and the value matrix are aggregated. The fault category distribution of the transformer DGA data is analyzed based on the aggregation results, and the final fault type prediction result is output based on the analysis results.
[0006] In a preferred embodiment, this application can be further configured as follows: Extracting the time-series features of the transformer DGA data using a pre-trained TimesNet model according to the multi-level feature system, and constructing a query matrix for the transformer DGA data, specifically includes: According to the encoding order of the multi-level feature system, the transformer DGA data is converted into local block tensors corresponding to multiple time-scale sliding windows through the multi-scale sliding window in the TimesNet model. The periodicity and local variation features of the local block tensor within each time scale sliding window are extracted by the periodic-aware convolutional structure in the TimesNet model to obtain the periodic encoding features corresponding to each time scale. The periodic coding features across all time scales are concatenated and fused to generate the final time-series coding of the transformer DGA data.
[0007] In a preferred embodiment, this application can be further configured as follows: the step of converting the transformer DGA data into local block tensors corresponding to multiple time-scale sliding windows through the multi-scale sliding window in the TimesNet model, according to the encoding order of the multi-level feature system, includes: The transformer DGA data is converted into an input sequence of a set length according to the encoding order of the multi-level feature system. The expression of the input sequence is as follows:
$$X = [x_1, x_2, \dots, x_L] \in \mathbb{R}^{L \times N} \tag{1}$$
where $X$ represents the input sequence of the set length; $x_1$ to $x_L$ represent the feature vectors in the input sequence, composed of transformer DGA data; $L$ indicates the set length; and $N$ indicates the number of feature types. The transformer DGA data is then converted into local block tensors corresponding to multiple time-scale sliding windows using the multi-scale sliding window in the TimesNet model. The expression of the local block tensor is as follows:
$$P_k = \mathrm{Unfold}(X, w_k, s_k) \tag{2}$$
where $P_k$ represents the local block tensor; $\mathrm{Unfold}(\cdot)$ indicates the operation of constructing sliding window blocks; $w_k$ is the length of the sliding window at the $k$-th time scale; and $s_k$ indicates the sliding step of the sliding window block.
[0008] In a preferred embodiment, this application can be further configured as follows: extracting the periodicity and local variation features of the local block tensor within each time-scale sliding window through the periodicity-aware convolutional structure in the TimesNet model, to obtain the periodic encoding features corresponding to each time scale, specifically includes: The convolution operation is performed along the time dimension and the feature dimension using the periodicity-aware convolutional structure in the TimesNet model. The convolution expression of the periodicity-aware convolutional structure is as follows:
$$H_k = \mathrm{Concat}\big(\mathrm{Conv}_{\mathrm{time}}(P_k),\ \mathrm{Conv}_{\mathrm{feat}}(P_k)\big) \tag{3}$$
where $H_k$ represents the periodic encoding obtained by concatenating the convolution results over the time and feature dimensions, used to characterize the periodicity and local variation features of the local block tensor $P_k$; $\mathrm{Conv}_{\mathrm{time}}(P_k)$ represents the convolution of the local block tensor along the time dimension; and $\mathrm{Conv}_{\mathrm{feat}}(P_k)$ represents the convolution of the local block tensor along the feature dimension.
[0009] In a preferred embodiment, this application can be further configured such that: the step of extracting the periodicity and local variation features of the local block tensor within each time-scale sliding window through the periodicity-aware convolutional structure in the TimesNet model, to obtain the periodic encoding features corresponding to each time scale, further includes: Explicit positional encoding is added to the periodicity-aware convolutional structure and combined with a periodic weighting function to enhance periodicity perception. The convolution expression for the periodicity-aware convolutional structure with the periodic weighting function is given as formula (4), whose terms are: the periodic encoding obtained after adding explicit positional encoding and the periodic weighting function; the ReLU activation function; element-wise multiplication; the periodic frequency parameter of the $k$-th scale; the phase shift; and batch normalization.
[0010] In a preferred embodiment, this application can be further configured such that: concatenating and fusing the periodic encoding features across all time scales to generate the final time-series encoding of the transformer DGA data specifically includes: Average pooling is performed on the periodic encoding features at each time scale, and all the average-pooled periodic encoding features are concatenated and fused to generate the final time-series encoding of the transformer DGA data. The expression of the final time-series encoding is as follows:
$$Z = \mathrm{Concat}\big(\mathrm{AvgPool}(H_1), \dots, \mathrm{AvgPool}(H_K)\big) \tag{5}$$
where $Z$ represents the final time-series encoding output by the TimesNet model; $\mathrm{AvgPool}(\cdot)$ represents the average pooling mechanism of the TimesNet model; $H_k$ is the periodic encoding feature at the $k$-th time scale; and $k = 1, \dots, K$ indicates the time scale.
[0011] In a preferred embodiment, this application can be further configured as follows: analyzing the structural coupling relationship between the transformer DGA data using a pre-trained DyGNN model, and generating the key matrix and value matrix of the transformer DGA data according to specific feature dimensions based on the structural coupling relationship, specifically includes: The connectivity relationships between gas features in the transformer DGA data are analyzed, a graph structure of the transformer DGA data is constructed based on these connectivity relationships, and node features for input to the DyGNN model are generated based on this graph structure. The expression for the graph structure is as follows:
$$G = (V, E) \tag{6}$$
where each node in the node set $V$ corresponds to a gas feature, and the edge set $E$ indicates the connection relationships between gas features. The perceptron of the DyGNN model is used to activate the node features, generating dynamic edge weights between node features that have feature connections. The expression for the dynamic edge weights is as follows:
$$A_{ij} = \mathrm{MLP}(x_i, x_j) = \sigma\big(W\,[x_i \,\Vert\, x_j] + b\big) \tag{7}$$
where $A_{ij}$ is the entry of the adjacency matrix for the node pair $(i, j)$ formed by node feature $x_i$ and node feature $x_j$, used to represent the dynamic edge weight; $\mathrm{MLP}(x_i, x_j)$ is the activation process of the perceptron on node features $x_i$ and $x_j$; $\Vert$ denotes feature splicing; $W$ and $b$ are learnable parameters; and $\sigma(\cdot)$ is the activation function. Based on the dynamic edge weights and corresponding node features, the graph attention network in the DyGNN model performs feature interaction aggregation and outputs aggregated structural features to characterize the structural coupling relationships. The expression of the structural features is as follows:
$$S = [h_1^{(l)}, h_2^{(l)}, \dots, h_N^{(l)}] \in \mathbb{R}^{N \times d} \tag{8}$$
where $S$ represents the structural features used to characterize the structural coupling relationships within the graph structure corresponding to the transformer DGA data; $h_1^{(l)}$ to $h_N^{(l)}$ represent the feature interaction aggregation representations of nodes $1$ to $N$ at layer $l$; $N$ indicates the number of feature types; and $d$ indicates the feature dimension.
[0012] In a preferred embodiment, this application can be further configured as follows: the step of analyzing the structural coupling relationship between the transformer DGA data using a pre-trained DyGNN model, and generating the key matrix and value matrix of the transformer DGA data according to specific feature dimensions based on the structural coupling relationship, further includes: The expression for the single-layer network update process during feature interaction aggregation in the graph attention network is as follows:
$$h_i^{(l+1)} = \sum_{j=1}^{N} \alpha_{ij}\, W h_j^{(l)} \tag{9}$$
where $h_i^{(l+1)}$ indicates the feature interaction aggregation representation of node $i$ at layer $l+1$; $h_j^{(l)}$ is the feature interaction aggregation representation of node $j$ at layer $l$; $W \in \mathbb{R}^{d' \times d}$ is a linear mapping matrix, with $d'$ and $d$ representing different feature dimensions; $\alpha_{ij}$ represents the normalized attention weight of the node pair $(i, j)$, which depends on the edge weight $A_{ij}$; and $N$ indicates the number of nodes in layer $l$. The expression for the normalized attention weights is as follows:
$$\alpha_{ij} = \frac{\exp\big(\sigma(A_{ij})\big)}{\sum_{k=1}^{N} \exp\big(\sigma(A_{ik})\big)} \tag{10}$$
where $\sigma(\cdot)$ is the activation function and $k$ ranges over each node in layer $l$.
[0013] In a preferred example, this application can be further configured as follows: aggregating the query matrix, the key matrix, and the value matrix, analyzing the fault category distribution of the transformer DGA data based on the aggregation results, and outputting the final fault type prediction result based on the analysis results, specifically includes: The cross-attention weights among the query matrix, the key matrix, and the value matrix are calculated, and these cross-attention weights are concatenated and fused with the temporal features output by the TimesNet model to obtain the final fused feature. The expression for the final fused feature is as follows:
$$F = \mathrm{Concat}\big(Z,\ \mathrm{Attn}(Q, K, V)\big) \tag{11}$$
where $F$ represents the final fused feature, concatenated from the temporal features and the cross-attention weights and used to characterize the aggregation result; $Z$ represents the temporal features output by the TimesNet model; and $\mathrm{Attn}(Q, K, V)$ represents the cross-attention weights among the query matrix $Q$, the key matrix $K$, and the value matrix $V$. Based on the aggregation results, the fault category distribution of the transformer DGA data is analyzed, the fault type of the transformer DGA data is predicted based on the fault category distribution, and the final fault type prediction result is output. The fault category distribution expression is as follows:
$$\hat{y} = \mathrm{softmax}\big(W_c \cdot \mathrm{GAP}(F) + b_c\big) \in \mathbb{R}^{C} \tag{12}$$
where $\hat{y}$ represents the predicted distribution of fault categories; $C$ is the number of fault types; $W_c$ and $b_c$ are the parameters of the fully connected layer; and $\mathrm{GAP}(F)$ indicates the global average pooling operation that maps $F$ to a single vector.
[0014] In a preferred example, this application can be further configured as follows: aggregating the query matrix, the key matrix, and the value matrix, analyzing the fault category distribution of the transformer DGA data based on the aggregation results, and outputting the final fault type prediction result based on the analysis results, further includes: The calculation expression for the cross-attention weights is as follows:
$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \tag{13}$$
where $Q \in \mathbb{R}^{m \times d_k}$, $K \in \mathbb{R}^{n \times d_k}$, and $V \in \mathbb{R}^{n \times d_v}$, with $m$, $n$, $d_k$, and $d_v$ being the corresponding feature dimensions; $Q K^{\top}$ is a similarity matrix representing cross-structural associations between gas features; and $\sqrt{d_k}$ is a scaling factor used to prevent gradient instability caused by excessively large inner products.
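The fusion and classification steps of formulas (11) to (13) can be illustrated with a minimal sketch in plain Python. It assumes scaled dot-product attention with a row-wise softmax, uses the temporal features directly as the query matrix without a learned projection, and uses toy values throughout; the names `cross_attention`, `classify`, `fc_weight`, and `fc_bias` are hypothetical and do not come from the application.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (p x q) and b (q x r)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cross_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    scale = math.sqrt(len(q[0]))
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def classify(temporal: Matrix, attended: Matrix,
             fc_weight: Matrix, fc_bias: List[float]) -> List[float]:
    """Concatenate, global-average-pool, then apply a linear layer and softmax."""
    fused = [t + a for t, a in zip(temporal, attended)]          # row-wise concat
    pooled = [sum(col) / len(fused) for col in zip(*fused)]      # average over rows
    logits = [sum(w * p for w, p in zip(row, pooled)) + bias
              for row, bias in zip(fc_weight, fc_bias)]
    return softmax(logits)


# Toy inputs (assumed): temporal features act as queries, structural features
# supply keys and values.
temporal = [[1.0, 0.0], [0.0, 1.0]]
structural_keys = [[1.0, 1.0], [1.0, 1.0]]
structural_values = [[2.0, 0.0], [0.0, 2.0]]
attended = cross_attention(temporal, structural_keys, structural_values)

# Hypothetical fully connected layer for C = 3 fault types.
fc_weight = [[0.5, 0.0, 0.0, 0.0],
             [0.0, 0.5, 0.0, 0.0],
             [0.0, 0.0, 0.5, 0.5]]
fc_bias = [0.0, 0.0, 0.0]
probabilities = classify(temporal, attended, fc_weight, fc_bias)
predicted_class = probabilities.index(max(probabilities))
```

Because both key rows are identical in this toy input, each query attends equally to both value rows, which makes the attention output easy to check by hand.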
[0015] In summary, this application includes at least one of the following beneficial technical effects:
1. This application introduces a TimesNet encoder based on periodic awareness and sliding windows to characterize feature change trajectories at multiple scales. Compared with existing deep learning methods, which mostly rely on single-scale convolution or recursive structures and struggle to capture long-term trends and local mutations at the same time, the solution of this application improves the ability to express complex temporal evolution processes and is suitable for continuous identification of the early stage of fault evolution, the stable stage, and the critical transition moment.
2. This application constructs a dynamic graph structure based on the state of input samples, uses a lightweight perceptron to generate edge weights between nodes on demand, and leverages graph neural networks to aggregate the local dependencies and global effects between features, thereby enhancing the model's ability to characterize the synergistic effects of multiple types of fault features. This addresses the problem that traditional methods generally ignore the structural coupling between multiple gas features or rely solely on fixed ratio rules for feature combination, which makes it difficult for them to adapt to the impact of changing operating conditions on feature relationships.
3. This application uses a cross-attention mechanism with temporal features as queries and structural features as key-value pairs to achieve complementary enhancement between features. This enables the model to adaptively focus on the feature combinations with the most diagnostic value in different time contexts. Compared with most existing methods, which simply weight features from different sources and lack cross-channel information filtering mechanisms, the solution in this application improves the accuracy of fault identification.
Attached Figure Description
[0016] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the accompanying drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. In all the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, the elements or parts are not necessarily drawn to scale.
[0017] Figure 1 is a flowchart of the data processing for the fault diagnosis model in this embodiment.
[0018] Figure 2 is a flowchart illustrating the implementation of the transformer fault diagnosis method based on dynamic correlation modeling and attention fusion in this embodiment.
[0019] Figure 3 is a flowchart illustrating the implementation of step S20 of the transformer fault diagnosis method in this embodiment.
[0020] Figure 4 is a flowchart illustrating the implementation of step S201 of the transformer fault diagnosis method in this embodiment.
[0021] Figure 5 is a flowchart illustrating the implementation of step S30 of the transformer fault diagnosis method in this embodiment.
[0022] Figure 6 is a flowchart illustrating the implementation of step S40 of the transformer fault diagnosis method in this embodiment.
Detailed Implementation
[0023] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0024] It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
[0025] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.
[0026] It should also be further understood that the term "and / or" as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.
[0027] In one embodiment, the TimesNet model and the DyGNN model are combined to construct a fault diagnosis model that integrates dynamic graph neural networks and TimesNet. The transformer fault diagnosis method of this embodiment is then executed through the fault diagnosis model to perform transformer fault diagnosis. The data processing flow of the fault diagnosis model is shown in Figure 1. This application discloses a transformer fault diagnosis method based on dynamic correlation modeling and attention fusion, which improves on existing methods in three aspects: temporal representation capability, structural coupling expression, and feature fusion mechanism. The implementation flow of this method is shown in Figure 2, and the specific steps are as follows:
S10: Acquire transformer DGA data in real time, perform encoding and feature classification, and construct a multi-level feature system for the transformer DGA data based on the encoding and feature classification results.
[0028] Currently, transformer online monitoring systems are widely deployed in substations, integrating various sensors and signal acquisition devices to acquire key operating parameters and status information in real time. Online DGA monitoring devices are typically installed on the oil conservator side or in the oil circulation loop of the power transformer, connected to the transformer body. Core components include an infrared spectroscopy detection unit, a trace gas extraction device, a gas separation and quantification module, and a data acquisition and communication module. Auxiliary acquisition units include temperature and humidity sensors installed on the transformer body or the outer shell of the insulating oil container to monitor environmental and oil temperatures; current and voltage transformers are used to collect transformer operating current, voltage, and other condition information.
[0029] In this embodiment, the collected transformer DGA data can be divided into three categories according to their characteristics: gas concentration data, derived feature data, and operating condition data. Gas concentration data refers to the key gas components dissolved in the transformer insulating oil; derived feature data, including the ratios and differences of specific gases, serves as the input of the empirical rule judgment model; and operating condition data covers quantities such as temperature, current, and voltage.
[0030] DGA diagnosis rests on the regular differences in the composition ratios and generation rates of dissolved gases under different fault types. To comprehensively characterize gas evolution and enhance the model's fault discrimination capability, a multi-level feature system is constructed from three perspectives: initial concentration, ratio combination, and dynamic change rate. The first layer is the initial gas concentration features (F1–F8): the concentrations of seven common single gases in DGA (H2, CH4, C2H6, C2H4, C2H2, CO, CO2) and the total hydrocarbon concentration (THG), reflecting the gas release intensity of thermal and electrical faults inside the transformer. The second layer is the typical gas ratio features (F9–F12): based on IEC standards and the Duval triangle, four sets of sensitive ratio features are constructed to identify fault types. The third layer is the time-difference dynamic features (F13–F15): the rates of change of key gas concentrations are used to capture gas generation trends and abrupt change signals, which helps identify the evolution stage of a fault. In addition, operating condition features such as temperature, current, and voltage are also considered. The multi-level feature system includes codes, feature names representing feature types, and descriptions, and is shown in the following transformer DGA and operating status feature coding table:
Table 1. Transformer DGA and Operating Status Characteristic Coding Table
S20: Based on the multi-level feature system, extract the time-series features of the transformer DGA data through a pre-trained TimesNet model, and construct a query matrix for the transformer DGA data.
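The three feature layers of step S10 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the text does not list which four ratios or which three rate-of-change gases make up F9–F15, so the ratios used here (C2H2/C2H4, CH4/H2, C2H4/C2H6, CO2/CO) and the rate gases (H2, C2H2, total hydrocarbons) are hypothetical choices, and total hydrocarbons is assumed to be the sum of CH4, C2H6, C2H4, and C2H2.

```python
from typing import Dict, List

GASES = ["H2", "CH4", "C2H6", "C2H4", "C2H2", "CO", "CO2"]
HYDROCARBONS = ["CH4", "C2H6", "C2H4", "C2H2"]  # assumed THG composition


def total_hydrocarbons(sample: Dict[str, float]) -> float:
    """Sum of the hydrocarbon gas concentrations (assumed definition)."""
    return sum(sample[gas] for gas in HYDROCARBONS)


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is not positive."""
    return numerator / denominator if denominator > 0 else 0.0


def build_features(curr: Dict[str, float], prev: Dict[str, float],
                   dt_hours: float) -> List[float]:
    """Build a 15-element feature vector ordered F1..F15."""
    # Layer 1 (F1-F8): seven gas concentrations plus total hydrocarbons.
    layer1 = [curr[gas] for gas in GASES] + [total_hydrocarbons(curr)]
    # Layer 2 (F9-F12): four gas ratios (hypothetical selection).
    layer2 = [
        safe_ratio(curr["C2H2"], curr["C2H4"]),
        safe_ratio(curr["CH4"], curr["H2"]),
        safe_ratio(curr["C2H4"], curr["C2H6"]),
        safe_ratio(curr["CO2"], curr["CO"]),
    ]
    # Layer 3 (F13-F15): rates of change per hour (hypothetical gas selection).
    layer3 = [
        (curr["H2"] - prev["H2"]) / dt_hours,
        (curr["C2H2"] - prev["C2H2"]) / dt_hours,
        (total_hydrocarbons(curr) - total_hydrocarbons(prev)) / dt_hours,
    ]
    return layer1 + layer2 + layer3


# Toy readings in arbitrary concentration units, taken 24 hours apart.
previous = {"H2": 90.0, "CH4": 45.0, "C2H6": 20.0, "C2H4": 35.0,
            "C2H2": 5.0, "CO": 200.0, "CO2": 990.0}
current = {"H2": 100.0, "CH4": 50.0, "C2H6": 20.0, "C2H4": 40.0,
           "C2H2": 10.0, "CO": 200.0, "CO2": 1000.0}
features = build_features(current, previous, 24.0)
```

Keeping the output in a fixed F1 to F15 order matters because the later steps consume the features according to the encoding order of the multi-level feature system.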
[0031] Specifically, as shown in Figure 3, step S20 includes: S201: Following the encoding order of the multi-level feature system, the transformer DGA data is converted into local block tensors corresponding to multiple time-scale sliding windows through the multi-scale sliding window in the TimesNet model.
[0032] Specifically, as shown in Figure 4, step S201 includes: S2011: Convert the transformer DGA data into an input sequence of a set length according to the encoding order of the multi-level feature system. The expression of the input sequence is as follows:
$$X = [x_1, x_2, \dots, x_L] \in \mathbb{R}^{L \times N} \tag{1}$$
where $X$ represents the input sequence of the set length; $x_1$ to $x_L$ represent the feature vectors in the input sequence, composed of transformer DGA data; $L$ indicates the set length; and $N$ indicates the number of feature types.
[0033] S2012: The transformer DGA data is converted into local block tensors corresponding to multiple time-scale sliding windows using the multi-scale sliding window in the TimesNet model. The expression for the local block tensor is as follows:
$$P_k = \mathrm{Unfold}(X, w_k, s_k) \tag{2}$$
where $P_k$ represents the local block tensor built from the input sequence $X$; $\mathrm{Unfold}(\cdot)$ indicates the operation of constructing sliding window blocks; $w_k$ indicates the length of the sliding window at the $k$-th time scale, used to extract evolutionary patterns; and $s_k$ indicates the sliding step of the sliding window block.
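The window construction of step S2012 can be sketched in plain Python. The sketch assumes that the sequence is cut into overlapping windows of a given length with a given step for each time scale; the function names `unfold` and `multi_scale_blocks`, the window lengths, and the steps are illustrative assumptions rather than values from the application.

```python
from typing import Dict, List, Tuple

Block = List[List[float]]  # one window: rows are time steps, columns are features


def unfold(sequence: Block, window: int, stride: int) -> List[Block]:
    """Cut a (time x feature) sequence into overlapping windows."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    if window > len(sequence):
        raise ValueError("window is longer than the sequence")
    last_start = len(sequence) - window
    return [sequence[start:start + window]
            for start in range(0, last_start + 1, stride)]


def multi_scale_blocks(sequence: Block,
                       scales: List[Tuple[int, int]]) -> Dict[int, List[Block]]:
    """Build one list of windows per (window length, stride) pair."""
    return {window: unfold(sequence, window, stride) for window, stride in scales}


# Toy sequence: 12 time steps with 2 features per step.
sequence = [[float(t), float(2 * t)] for t in range(12)]
# Hypothetical scales: a short window for abrupt changes, a long one for slow trends.
blocks = multi_scale_blocks(sequence, [(4, 2), (8, 4)])
```

With these toy settings the short scale yields five windows and the long scale yields two, so the same sequence is viewed at two temporal resolutions.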
[0034] In this embodiment, taking DGA data as an example, one sliding window length is used for periods in which the gas change trend is relatively gentle, and another for periods in which gases exhibit periodic impact characteristics during abrupt partial-discharge changes. These two types of behavior are modeled separately using the multi-scale sliding window in the TimesNet model, generating multiple local block tensors.
[0035] S202: The periodicity and local variation features of local block tensors within the sliding window of each time scale are extracted by the periodicity-aware convolutional structure in the TimesNet model to obtain the periodic encoding features corresponding to each time scale.
[0036] Specifically, the convolution operation is performed along the time dimension and the feature dimension using the periodicity-aware convolutional structure in the TimesNet model. The convolution expression of the periodicity-aware convolutional structure is as follows:
$$H_k = \mathrm{Concat}\big(\mathrm{Conv}_{\mathrm{time}}(P_k),\ \mathrm{Conv}_{\mathrm{feat}}(P_k)\big) \tag{3}$$
where $H_k$ represents the periodic encoding obtained by concatenating the convolution results over the time and feature dimensions, used to characterize the periodicity and local variation features of the local block tensor $P_k$; $\mathrm{Conv}_{\mathrm{time}}(P_k)$ represents the convolution of the local block tensor along the time dimension; and $\mathrm{Conv}_{\mathrm{feat}}(P_k)$ represents the convolution of the local block tensor along the feature dimension. Each time scale corresponds to one periodic convolutional block. In this embodiment, the periodicity-aware convolutional structure is a two-dimensional depthwise separable convolutional network.
[0037] In this embodiment, to enhance periodicity awareness, explicit positional encoding is further added to the periodicity-aware convolutional structure and combined with a periodic weighting function. The convolution expression for the periodicity-aware convolutional structure with the periodic weighting function is given as formula (4), whose terms are: the periodic encoding obtained after adding explicit positional encoding and the periodic weighting function; the ReLU activation function; element-wise multiplication; the periodic frequency parameter of the $k$-th scale; the phase shift; and batch normalization.
[0038] S203: Concatenate and fuse the periodic coding features across all time scales to generate the final time-series coding of the transformer DGA data.
[0039] Specifically, average pooling is performed on the periodic encoding features at each time scale, and all the average-pooled periodic encoding features are concatenated and fused to generate the final time-series encoding of the transformer DGA data. The expression for the final time-series encoding is as follows:
$$Z = \mathrm{Concat}\big(\mathrm{AvgPool}(H_1), \dots, \mathrm{AvgPool}(H_K)\big) \tag{5}$$
where $Z$ represents the final time-series encoding output by the TimesNet model; $\mathrm{AvgPool}(\cdot)$ represents the average pooling mechanism of the TimesNet model; $H_k$ is the periodic encoding feature at the $k$-th time scale; and $k = 1, \dots, K$ indicates the time scale.
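The pooling and fusion of step S203 can be illustrated with a short plain-Python sketch. It assumes that each time scale produces an encoding laid out as rows of positions and columns of channels, that average pooling is taken over the rows, and that fusion is simple concatenation; the function names and toy values are hypothetical.

```python
from typing import List

Encoding = List[List[float]]  # rows: positions within a scale, columns: channels


def average_pool(encoding: Encoding) -> List[float]:
    """Average each channel over all positions of one time scale."""
    count = len(encoding)
    return [sum(column) / count for column in zip(*encoding)]


def fuse_scales(encodings: List[Encoding]) -> List[float]:
    """Concatenate the pooled vectors of all time scales into one encoding."""
    fused: List[float] = []
    for encoding in encodings:
        fused.extend(average_pool(encoding))
    return fused


# Toy encodings for two time scales with different shapes.
short_scale = [[1.0, 2.0], [3.0, 4.0]]
long_scale = [[0.0, 0.0, 6.0], [2.0, 4.0, 0.0], [4.0, 2.0, 0.0]]
time_series_encoding = fuse_scales([short_scale, long_scale])
```

Pooling before concatenation lets scales with different numbers of windows contribute fixed-length vectors to the fused encoding.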
[0040] In this embodiment, to accurately identify the evolution characteristics of dissolved gas concentration sequences, TimesNet is introduced as the main time-series encoder to construct the TimesNet model. Through a structural design that integrates multi-scale sliding windows and periodicity-aware convolution, it can capture the long-term trends and periodic oscillations of different transformer features over time. The training process of the TimesNet model includes: first, the input to the TimesNet model is set to a sequence of length L, as shown in formula (1); the input structure and the multi-scale sliding window are then constructed, different window lengths are selected, and multiple local block tensors are built from the sequence as shown in formula (2).
[0041] For each selected time window, a periodicity-aware convolutional structure is introduced to extract periodic and local variation features within the sliding window. The periodicity-aware convolutional structure is a two-dimensional depthwise separable convolutional network that performs convolution operations along the time dimension and the feature dimension, as shown in Equation (3). To enhance the periodicity-awareness capability, explicit positional encoding is further added. The periodic weighting function is shown in Equation (4); finally, the periodic coding features under all time scales are spliced and fused to output the final time-series code as shown in Equation (5).
[0042] S30: Analyze the structural coupling relationship between transformer DGA data using a pre-trained DyGNN model, and generate the key matrix and value matrix of transformer DGA data according to specific feature dimensions based on the structural coupling relationship.
[0043] Specifically, the interrelationships between features are influenced by multiple factors such as operating conditions, fault types, and aging stages, so their coupling status may change over different time periods. Therefore, a dynamic graph structure representation mechanism is introduced into the structural branch, treating each type of gas feature as a node in a graph structure. Edge weights are generated adaptively, thereby expressing the coupling structure among the features in real time. As shown in Figure 5, the structural coupling relationship analysis process in step S30 includes: S301: Analyze the connectivity relationships between gas features in the transformer DGA data, construct a graph structure for the transformer DGA data based on these connectivity relationships, and generate node features for input to the DyGNN model based on the graph structure. The expression for the graph structure is as follows:
$$G = (V, E) \tag{6}$$
where each node in the node set $V$ corresponds to a gas feature, and the edge set $E$ indicates the connection relationships between gas features.
[0044] S302: The perceptron of the DyGNN model is used to activate the node features, generating dynamic edge weights between node features that are connected. The expression for the dynamic edge weights is as follows: $A_{ij} = \mathrm{MLP}(x_i, x_j) = \sigma(W[x_i \Vert x_j] + b)$ (7) where $(i, j)$ is the node pair formed by node feature $x_i$ and node feature $x_j$; $A_{ij}$ is the entry of the adjacency matrix $A$ that represents the dynamic edge weight; $\mathrm{MLP}(x_i, x_j)$ denotes the perceptron's activation of node features $x_i$ and $x_j$; $[\cdot \Vert \cdot]$ denotes feature concatenation; $W$ and $b$ are learnable parameters; and $\sigma$ is the activation function.
[0045] The adjacency matrix $A$ generated in this embodiment is symmetric, and its numerical stability can be ensured through softmax normalization.
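As a minimal sketch of step S302, the following computes edge scores with a one-layer perceptron over concatenated node features, symmetrizes them, and applies a row-wise softmax. Several points are assumptions rather than details from this embodiment: the sigmoid activation, the averaging used to enforce symmetry, the row-wise direction of the softmax, and all names and toy values (`edge_scores`, `row_softmax`, `x`, `w`, `b`).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def edge_scores(x, w, b):
    """Symmetric raw scores from a linear map plus activation on [x_i || x_j]."""
    n = len(x)
    raw = [[sigmoid(sum(wk * v for wk, v in zip(w, x[i] + x[j])) + b)
            for j in range(n)] for i in range(n)]
    # Assumption: average both directions so that score(i, j) == score(j, i).
    return [[0.5 * (raw[i][j] + raw[j][i]) for j in range(n)]
            for i in range(n)]


def row_softmax(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Three gas-feature nodes with 2-dimensional features (made-up values).
x = [[0.2, 0.1], [0.9, 0.4], [0.5, 0.5]]
w = [0.3, -0.2, 0.1, 0.4]   # weights for the concatenated 4-vector
b = 0.05
scores = edge_scores(x, w, b)
adjacency = row_softmax(scores)
```

In this sketch the pre-softmax scores are exactly symmetric and every row of `adjacency` sums to 1; a row-wise softmax does not in general preserve symmetry, so how the two properties are combined in practice is left open here.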
[0046] S303: Based on the dynamic edge weights and the corresponding node features, feature interaction aggregation is performed through the graph attention network in the DyGNN model, and the aggregated structural features are output to characterize the structural coupling relationship. The structural feature expression is as follows: $H_{\mathrm{struct}} = [h_1^{(l)}, h_2^{(l)}, \dots, h_N^{(l)}] \in \mathbb{R}^{N \times d}$ (8) where $H_{\mathrm{struct}}$ represents the structural features used to characterize the structural coupling relationships within the graph structure corresponding to the transformer DGA data; $h_1^{(l)}$ to $h_N^{(l)}$ are the feature interaction aggregation representations of nodes 1 to $N$ in layer $l$; $N$ is the number of feature types; and $d$ is the feature dimension.
[0047] In this embodiment, after the adjacency matrix $A$ and the node features $x$ are obtained, a graph attention network is used for feature interaction aggregation. The expression for the single-layer update process of the graph attention network during feature interaction aggregation is as follows: $h_i^{(l+1)} = \sum_{j=1}^{N_l} \alpha_{ij} W^{(l)} h_j^{(l)}$ (9) where $h_i^{(l+1)}$ is the feature interaction aggregation representation of node $i$ in layer $l+1$; $h_j^{(l)}$ is the feature interaction aggregation representation of node $j$ in layer $l$; $W^{(l)} \in \mathbb{R}^{d' \times d}$ is a linear mapping matrix, with $d'$ and $d$ denoting different feature dimensions; $\alpha_{ij}$ is the normalized attention weight of node pair $(i, j)$, which depends on the edge weight $A_{ij}$; and $N_l$ is the number of nodes in layer $l$.
[0048] The normalized attention weights are given by expression (10). In that expression, an activation function is applied to a scalar attention score, which is generated by taking the dot product of the transpose of a learnable parameter vector with the concatenated node feature vectors, and the normalization runs over the nodes of layer $l$.
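One way to picture the single-layer update (9) together with edge-dependent attention weights (10) is the sketch below. The exact way the edge weights enter the attention score is not recoverable from the text, so adding the logarithm of the edge weight to the score is purely an assumption, as are the LeakyReLU activation, the absence of an output nonlinearity, and the names `gat_layer`, `matvec`, and `leaky_relu`.

```python
import math


def matvec(W, v):
    return [sum(wk * vk for wk, vk in zip(row, v)) for row in W]


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def gat_layer(h, A, W, a):
    """One aggregation step: new h_i = sum_j alpha_ij * (W h_j).

    h: node features, A: positive edge weights, W: linear map,
    a: attention parameter vector for the concatenated [W h_i || W h_j].
    """
    n = len(h)
    Wh = [matvec(W, hj) for hj in h]
    new_h, alphas = [], []
    for i in range(n):
        # Scalar score a^T [W h_i || W h_j]; log(A_ij) is an assumed way
        # of making the attention depend on the dynamic edge weight.
        e = [leaky_relu(sum(ak * v for ak, v in zip(a, Wh[i] + Wh[j])))
             + math.log(A[i][j]) for j in range(n)]
        top = max(e)
        exps = [math.exp(v - top) for v in e]
        total = sum(exps)
        alpha = [v / total for v in exps]
        alphas.append(alpha)
        new_h.append([sum(alpha[j] * Wh[j][k] for j in range(n))
                      for k in range(len(Wh[0]))])
    return new_h, alphas


# Toy graph: 3 nodes, 2-dimensional features (made-up values).
h = [[0.2, 0.1], [0.9, 0.4], [0.5, 0.5]]
A = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
W = [[1.0, 0.5], [-0.5, 1.0]]
a = [0.1, -0.2, 0.3, 0.4]
h_next, alpha = gat_layer(h, A, W, a)
```

Stacking the rows of `h_next` gives a matrix with one row per gas feature, which is the shape described for the structural features in expression (8).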
[0049] The DyGNN model training process in this embodiment includes: First, with gas features as nodes and the connections between features as edges, the graph structure is constructed from the feature vectors as shown in formula (6). Then a lightweight perceptron is used to generate the edge weights between node pairs in the graph structure, which are represented by the adjacency matrix $A$; to ensure computational efficiency, the perceptron contains only a linear mapping and an activation function, and the edge weight expression is shown in formula (7). Next, feature interaction aggregation is performed through a graph attention network: the single-layer update process in the graph attention network is shown in formula (9), and the normalized attention weights, which are set according to the edge weights, are shown in formula (10). Finally, the aggregated structural features are output, and the output result is shown in formula (8).
[0050] S40: Aggregate the query matrix, key matrix, and value matrix, analyze the fault category distribution of transformer DGA data based on the aggregation results, and output the final fault type prediction result based on the analysis results.
[0051] Since transformer monitoring data contains both time-series evolution information and structural coupling characteristics, a single channel cannot fully reflect the diagnostic signals during the fault process. Therefore, this embodiment introduces a cross-attention fusion mechanism to achieve complementary advantages between time-series branches and structural branches, thereby further improving the accuracy of judgment.
[0052] In the cross-attention mechanism, the temporal representation of feature changes is selected as the query, and the output of the graph structure branch is used as the key-value pairs. Specifically, the evolution trajectory of features over time directly reflects the fault characteristics during transformer operation, and its trend changes and periodic oscillations are often highly correlated with specific abnormal states. The feature representation extracted by the time-series branch therefore has a clear direction and can be regarded as an active focus on key features in the diagnostic task. Setting it as the query means that the model, based on the current time-series behavior, issues a "focus request" to the structural branch in order to filter the most relevant associated features out of the structural information. In contrast, the structural branch learns the dependencies between features through a graph neural network; it has global expressive power but cannot distinguish which features deserve focus at different time stages. Setting the output of the structural branch as the key-value pairs makes it a source of potential associated information that responds adaptively to the current time-series state, which helps avoid indiscriminate interference from structural information and improves the accuracy of the fusion.
[0053] Specifically, as shown in Figure 6, step S40 includes: S401: Calculate the cross-attention weights among the query matrix, key matrix, and value matrix, then concatenate and fuse these cross-attention weights with the temporal features output by the TimesNet model to obtain the final fused feature. The expression for the final fused feature is as follows: $H_{\mathrm{fuse}} = \mathrm{Concat}(H_{\mathrm{time}}, \mathrm{Attn}(Q, K, V))$ (11) where $H_{\mathrm{fuse}}$ represents the final fused feature, concatenated from the temporal features and the cross-attention weights and used to characterize the aggregation result; $H_{\mathrm{time}}$ represents the temporal features output by the TimesNet model; and $\mathrm{Attn}(Q, K, V)$ represents the cross-attention weights among the query matrix $Q$, the key matrix $K$, and the value matrix $V$.
[0054] In this embodiment, in order to preserve the original temporal features and improve training stability, the final fused feature is obtained by concatenating the original temporal vector with the cross-attention result in the manner of a residual connection.
[0055] S402: Analyze the fault category distribution of the transformer DGA data based on the aggregation results, predict the fault type of the transformer DGA data based on the fault category distribution, and output the final fault type prediction result.
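[0056] The fault category distribution expression is as follows: $\hat{y} = \mathrm{Softmax}(W_c \cdot \mathrm{GAP}(H_{\mathrm{fuse}}) + b_c) \in \mathbb{R}^{C}$ (12) where $\hat{y}$ represents the predicted distribution of fault categories; $C$ is the number of fault types; $W_c$ and $b_c$ are the parameters of the fully connected layer; and $\mathrm{GAP}(\cdot)$ denotes the global average pooling operation that maps $H_{\mathrm{fuse}}$ to a single vector.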
[0057] Specifically, in this embodiment, the final fault type prediction result is output through a fully connected layer and a classifier.
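The classification stage of step S402 can be sketched as global average pooling followed by a fully connected layer and a softmax classifier. The softmax choice, the function names (`global_average_pool`, `fully_connected`, `predict`), and the toy weights below are assumptions for illustration only; the embodiment states only that a fully connected layer and a classifier are used.

```python
import math


def global_average_pool(F):
    """Average the rows of the fused feature matrix into one vector."""
    rows = len(F)
    return [sum(row[k] for row in F) / rows for k in range(len(F[0]))]


def fully_connected(W, b, v):
    return [sum(wk * vk for wk, vk in zip(row, v)) + bk
            for row, bk in zip(W, b)]


def softmax(z):
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(F, W, b):
    """Return the class distribution and the index of the most likely class."""
    probs = softmax(fully_connected(W, b, global_average_pool(F)))
    label = max(range(len(probs)), key=probs.__getitem__)
    return probs, label


# Toy fused feature (2 rows x 2 dims) and C = 3 hypothetical fault classes.
F = [[0.2, 0.8], [0.4, 0.6]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
probs, label = predict(F, W, b)
```

Here `probs` plays the role of the predicted fault category distribution and `label` the final fault type; mapping indices to named fault types would require a label table that this sketch does not define.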
[0058] In this embodiment, a cross-attention mechanism is introduced to calculate the attention weight of the query towards the structural information. The weight calculation expression for the cross-attention mechanism is as follows: $\mathrm{Attn}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$ (13) where $Q$, $K$, and $V$ are the query, key, and value matrices with their corresponding feature dimensions; $QK^{\top}$ is a similarity matrix representing the cross-structural associations between gas features; and $\sqrt{d_k}$ is a scaling factor used to prevent gradient instability caused by excessively large inner products. The output of the cross-attention mechanism is the aggregated feature representation $\mathrm{Attn}(Q, K, V)$.
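The scaled dot-product cross-attention of expression (13) and the concatenation-based fusion of expression (11) can be sketched as follows. The sketch assumes that the query rows are the temporal features themselves and that keys and values come directly from the structural branch, with no learned projections; `cross_attention`, `fuse`, and the toy matrices are hypothetical.

```python
import math


def cross_attention(Q, K, V):
    """Softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


def fuse(H_time, attended):
    """Concatenate each temporal row with its attended structural row."""
    return [h + a for h, a in zip(H_time, attended)]


# Toy temporal features (queries) and structural features (keys/values).
H_time = [[0.5, 0.1], [0.2, 0.9]]
H_struct = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attended = cross_attention(H_time, H_struct, H_struct)
H_fuse = fuse(H_time, attended)
```

Keeping the temporal row unchanged in the first half of each fused row mirrors the residual-style concatenation described for the final fused feature, so the original temporal information survives even when the attention output is uninformative.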
[0059] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0060] Those skilled in the art will recognize that the units of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application of the technical solution and the constraints involved. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the invention.
[0061] In the embodiments provided by the present invention, it should be understood that the division of units is only a logical functional division. In actual implementation, there may be other division methods, such as multiple units can be combined into one unit, one unit can be split into multiple units, or some features can be ignored.
[0062] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0063] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.
[0064] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention, and they should all be covered within the scope of the claims and specification of the present invention.
Claims
1. A transformer fault diagnosis method based on dynamic correlation modeling and attention fusion, characterized in that the method includes: acquiring transformer DGA data in real time, performing encoding and feature classification, and constructing a multi-level feature system of the transformer DGA data based on the encoding and feature classification results; based on the multi-level feature system, extracting the time-series features of the transformer DGA data using a pre-trained TimesNet model and constructing a query matrix of the transformer DGA data; analyzing the structural coupling relationship between the transformer DGA data using a pre-trained DyGNN model, and generating the key matrix and value matrix of the transformer DGA data according to specific feature dimensions based on the structural coupling relationship; and aggregating the query matrix, the key matrix, and the value matrix, analyzing the fault category distribution of the transformer DGA data based on the aggregation results, and outputting the final fault type prediction result based on the analysis results.
2. The transformer fault diagnosis method based on dynamic correlation modeling and attention fusion according to claim 1, characterized in that the step of extracting the time-series features of the transformer DGA data using a pre-trained TimesNet model according to the multi-level feature system and constructing a query matrix for the transformer DGA data specifically includes: according to the encoding order of the multi-level feature system, converting the transformer DGA data into local block tensors corresponding to multiple time-scale sliding windows through the multi-scale sliding window in the TimesNet model; extracting the periodicity and local variation features of the local block tensor within each time-scale sliding window through the periodicity-aware convolutional structure in the TimesNet model to obtain the periodic encoding features corresponding to each time scale; and concatenating and fusing the periodic encoding features across all time scales to generate the final time-series encoding of the transformer DGA data.
3. The transformer fault diagnosis method based on dynamic correlation modeling and attention fusion according to claim 2, characterized in that the step of converting the transformer DGA data into local block tensors corresponding to multiple time-scale sliding windows through the multi-scale sliding window in the TimesNet model, according to the encoding order of the multi-level feature system, includes: converting the transformer DGA data into an input sequence of a set length according to the encoding order of the multi-level feature system, the expression of the input sequence being as follows: $X = [x_1, x_2, \dots, x_T] \in \mathbb{R}^{T \times N}$ (1) where $X$ represents the input sequence of the set length; $x_1$ to $x_T$ represent the feature vectors in the input sequence, composed of transformer DGA data; $T$ is the set length; and $N$ is the number of feature types; and converting the transformer DGA data into local block tensors corresponding to multiple time-scale sliding windows using the multi-scale sliding window in the TimesNet model, the expression of the local block tensor being as follows: $P_k = \mathrm{Patch}(X; w_k, s)$ (2) where $P_k$ represents a local block tensor; $\mathrm{Patch}(\cdot)$ denotes the operation of constructing sliding window blocks; $w_k$ is the duration of the sliding window at the given time scale; and $s$ is the sliding stride of the sliding window block.
4. The transformer fault diagnosis method based on dynamic correlation modeling and attention fusion according to claim 2, characterized in that the step of extracting the periodicity and local variation features of the local block tensor within each time-scale sliding window using the periodicity-aware convolutional structure in the TimesNet model to obtain the periodic encoding features corresponding to each time scale specifically includes: performing convolution along the time dimension and the feature dimension using the periodicity-aware convolutional structure in the TimesNet model, the convolution expression of the periodicity-aware convolutional structure being as follows: $Z_k = \mathrm{Concat}(\mathrm{Conv}_{t}(P_k), \mathrm{Conv}_{f}(P_k))$ (3) where $Z_k$ represents the periodic encoding obtained by concatenating the convolution results over the time and feature dimensions, used to characterize the periodicity and local variation features of the local block tensor; $\mathrm{Conv}_{t}(P_k)$ represents the convolution of the local block tensor along the time dimension; and $\mathrm{Conv}_{f}(P_k)$ represents the convolution of the local block tensor along the feature dimension.
5. The transformer fault diagnosis method based on dynamic correlation modeling and attention fusion according to claim 4, characterized in that the step of extracting the periodicity and local variation features of the local block tensor within each time-scale sliding window using the periodicity-aware convolutional structure in the TimesNet model to obtain the periodic encoding features corresponding to each time scale also includes: adding explicit positional encoding to the periodicity-aware convolutional structure and combining it with a periodic weighting function to enhance periodicity awareness. The convolution expression of the periodicity-aware convolutional structure with the periodic weighting function is given by expression (4), whose terms are: the periodic encoding obtained after adding the explicit positional encoding and the periodic weighting function; the ReLU activation function; element-wise multiplication; the periodic frequency parameter at the $k$-th scale; the phase shift; and batch normalization.
6. The transformer fault diagnosis method based on dynamic correlation modeling and attention fusion according to claim 5, characterized in that the step of concatenating and fusing the periodic encoding features across all time scales to generate the final time-series encoding of the transformer DGA data specifically includes: performing average pooling on the periodic encoding features at each time scale, and concatenating and fusing all the average-pooled periodic encoding features to generate the final time-series encoding of the transformer DGA data, the expression of the final time-series encoding being as follows: $H_{\mathrm{time}} = \mathrm{Concat}_{k}\left(\mathrm{AvgPool}(\tilde{Z}_k)\right)$ (5) where $H_{\mathrm{time}}$ represents the final time-series encoding output by the TimesNet model; $\mathrm{AvgPool}(\cdot)$ represents the average pooling mechanism of the TimesNet model; $\tilde{Z}_k$ is the periodic encoding feature at time scale $k$; and $k$ indexes the time scales.
7. The transformer fault diagnosis method based on dynamic correlation modeling and attention fusion according to claim 1, characterized in that the step of analyzing the structural coupling relationship between the transformer DGA data using a pre-trained DyGNN model, and generating the key matrix and value matrix of the transformer DGA data according to specific feature dimensions based on the structural coupling relationship, specifically includes: analyzing the connectivity relationships between gas features in the transformer DGA data, constructing a graph structure of the transformer DGA data based on these connectivity relationships, and generating the node features to be input to the DyGNN model based on the graph structure, the expression for the graph structure being as follows: $G = (V, E)$ (6) where each node in the node set $V$ corresponds to a gas feature, and the edge set $E$ indicates the connection relationships between gas features; activating the node features with the perceptron of the DyGNN model to generate dynamic edge weights between node features that are connected, the expression for the dynamic edge weights being as follows: $A_{ij} = \mathrm{MLP}(x_i, x_j) = \sigma(W[x_i \Vert x_j] + b)$ (7) where $(i, j)$ is the node pair formed by node feature $x_i$ and node feature $x_j$; $A_{ij}$ is the entry of the adjacency matrix $A$ that represents the dynamic edge weight; $\mathrm{MLP}(x_i, x_j)$ denotes the perceptron's activation of node features $x_i$ and $x_j$; $[\cdot \Vert \cdot]$ denotes feature concatenation; $W$ and $b$ are learnable parameters; and $\sigma$ is the activation function; and, based on the dynamic edge weights and the corresponding node features, performing feature interaction aggregation through the graph attention network in the DyGNN model and outputting aggregated structural features to characterize the structural coupling relationship, the expression of the structural features being as follows: $H_{\mathrm{struct}} = [h_1^{(l)}, h_2^{(l)}, \dots, h_N^{(l)}] \in \mathbb{R}^{N \times d}$ (8) where $H_{\mathrm{struct}}$ represents the structural features used to characterize the structural coupling relationships within the graph structure corresponding to the transformer DGA data; $h_1^{(l)}$ to $h_N^{(l)}$ are the feature interaction aggregation representations of nodes 1 to $N$ in layer $l$; $N$ is the number of feature types; and $d$ is the feature dimension.
8. The transformer fault diagnosis method based on dynamic correlation modeling and attention fusion according to claim 7, characterized in that the step of analyzing the structural coupling relationship between the transformer DGA data using a pre-trained DyGNN model, and generating the key matrix and value matrix of the transformer DGA data according to specific feature dimensions based on the structural coupling relationship, further includes: the expression for the single-layer update process of the graph attention network during feature interaction aggregation being as follows: $h_i^{(l+1)} = \sum_{j=1}^{N_l} \alpha_{ij} W^{(l)} h_j^{(l)}$ (9) where $h_i^{(l+1)}$ is the feature interaction aggregation representation of node $i$ in layer $l+1$; $h_j^{(l)}$ is the feature interaction aggregation representation of node $j$ in layer $l$; $W^{(l)} \in \mathbb{R}^{d' \times d}$ is a linear mapping matrix, with $d'$ and $d$ denoting different feature dimensions; $\alpha_{ij}$ is the normalized attention weight of node pair $(i, j)$, which depends on the edge weight $A_{ij}$; and $N_l$ is the number of nodes in layer $l$; the normalized attention weights being given by expression (10), in which an activation function is applied to the attention score and the normalization runs over the nodes of layer $l$.
9. The transformer fault diagnosis method based on dynamic correlation modeling and attention fusion according to claim 1, characterized in that the step of aggregating the query matrix, the key matrix, and the value matrix, analyzing the fault category distribution of the transformer DGA data based on the aggregation results, and outputting the final fault type prediction result based on the analysis results specifically includes: calculating the cross-attention weights among the query matrix, the key matrix, and the value matrix, and concatenating and fusing these cross-attention weights with the temporal features output by the TimesNet model to obtain the final fused feature, the expression for the final fused feature being as follows: $H_{\mathrm{fuse}} = \mathrm{Concat}(H_{\mathrm{time}}, \mathrm{Attn}(Q, K, V))$ (11) where $H_{\mathrm{fuse}}$ represents the final fused feature, concatenated from the temporal features and the cross-attention weights and used to characterize the aggregation result; $H_{\mathrm{time}}$ represents the temporal features output by the TimesNet model; and $\mathrm{Attn}(Q, K, V)$ represents the cross-attention weights among the query matrix $Q$, the key matrix $K$, and the value matrix $V$; and, based on the aggregation results, analyzing the fault category distribution of the transformer DGA data, predicting the fault type of the transformer DGA data based on the fault category distribution, and outputting the final fault type prediction result, the fault category distribution expression being as follows: $\hat{y} = \mathrm{Softmax}(W_c \cdot \mathrm{GAP}(H_{\mathrm{fuse}}) + b_c) \in \mathbb{R}^{C}$ (12) where $\hat{y}$ represents the predicted distribution of fault categories; $C$ is the number of fault types; $W_c$ and $b_c$ are the parameters of the fully connected layer; $\mathrm{GAP}(\cdot)$ denotes the global average pooling operation that maps $H_{\mathrm{fuse}}$ to a single vector; $N$ denotes the number of feature types; and $d$ denotes the feature dimension.
10. The transformer fault diagnosis method based on dynamic correlation modeling and attention fusion according to claim 9, characterized in that the step of aggregating the query matrix, the key matrix, and the value matrix, analyzing the fault category distribution of the transformer DGA data based on the aggregation results, and outputting the final fault type prediction result based on the analysis results also includes: the calculation expression for the cross-attention weights being as follows: $\mathrm{Attn}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$ (13) where $Q$, $K$, and $V$ are the query, key, and value matrices with their corresponding feature dimensions; $QK^{\top}$ is a similarity matrix representing the cross-structural associations between gas features; $\sqrt{d_k}$ is a scaling factor used to prevent gradient instability caused by excessively large inner products; and $N$ denotes the number of feature types.