A method for building a multi-modal fusion power transformer fault diagnosis model
By using a multimodal fusion power transformer fault diagnosis model, the problems of insufficient robustness of single physical quantity detection and difficulty in training data-driven models are solved. This enables efficient diagnosis of complex faults, reduces energy consumption and false alarm rate, extends transformer life, and supports root cause analysis of faults.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- MAANSHAN POWER SUPPLY COMPANY STATE GRID ANHUI ELECTRIC POWER
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-26
AI Technical Summary
Existing power transformer fault diagnosis technologies rely on the detection of a single physical quantity, which is susceptible to interference and lacks diagnostic robustness. They cannot meet the refined requirements of complex faults. Traditional methods have high false alarm and false negative rates, and data-driven models are difficult to train and generalize effectively.
A multimodal fusion fault diagnosis model is adopted that combines vibration signals with oscillatory wave response signals. Features are fused by a graph convolutional network and a cross-modal Transformer network. Unsupervised diagnosis is performed with a variational autoencoder and a knowledge graph. Model training combines Bayesian optimization with transfer learning, and a transformer fault knowledge graph is constructed for fault analysis.
It improves diagnostic accuracy for complex faults, reduces false alarm and missed detection rates, extends transformer lifespan, supports root cause analysis of faults, and lowers energy consumption and equipment wear, including those incurred by data acquisition experiments.
Figure CN122286484A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the technical field of fault diagnosis models, and specifically to a method for building a multimodal fusion power transformer fault diagnosis model.
Background Technology
[0002] As a core piece of equipment in the power grid, the health status of power transformers directly affects the safe and stable operation of the grid. Existing fault diagnosis technologies mainly rely on the detection of single physical quantities, such as monitoring transformer tank vibration signals to reflect mechanical defects like winding deformation and core loosening. However, vibration signals are easily affected by load current fluctuations, cooling system operation, and environmental noise, leading to insufficient stability in feature extraction and inadequate diagnostic robustness. Furthermore, vibration signals have low sensitivity to early insulation defects (such as partial discharge), failing to meet the needs of refined diagnosis of complex faults. Timely repair of early faults can prevent their escalation (e.g., partial discharge developing into insulation breakdown), preventing premature equipment failure due to severe faults and extending the transformer's design life.
[0003] The oscillating wave detection method, which is based on electrical characteristics, can effectively detect hidden defects such as insulation aging and partial discharge by injecting high-frequency excitation signals and analyzing the resonant response of the windings. Its technical bottlenecks, however, are also prominent. Conventional methods judge faults from a single parameter, the resonant frequency offset, and ignore the inherent correlation among the multi-dimensional characteristics of the damped oscillation, such as the amplitude decay rate and changes in the quality factor. Furthermore, the inherent resonance characteristics of different transformer models vary significantly, so traditional fixed-threshold judgment cannot adapt dynamically to individual equipment differences or to parameter drift caused by long-term aging, resulting in high false alarm and false negative rates. False alarms trigger unnecessary maintenance shutdowns, increasing energy consumption through equipment downtime and the activation of standby equipment. False negatives can cause partial power outages and load fluctuations in the grid; coping with these fluctuations requires activating peak-shaving units, whose operating efficiency is far lower than that of conventional thermal power units or new energy units, which increases energy consumption and carbon emissions. Although recent research has attempted to integrate vibration and electrical characteristics, existing solutions mostly use simple feature concatenation or linear weighting and fail to establish a deep physical coupling model between mechanical vibration modes and electrical resonant responses. As a result, diagnostic accuracy is insufficient for complex scenarios such as combined winding-insulation deterioration and multi-point latent faults.
For example, existing technologies have disclosed applications of multimodal data to the fault diagnosis of wind turbines or electromechanical actuators, involving general feature-weighting fusion algorithms, such as adaptive weighting based on Bayesian optimization, and brain-inspired computing paradigms based on biomimetic pulse coding and spiking neural networks (SNNs). While these solutions provide useful references, applying their general algorithmic ideas directly to power transformers without targeted, original improvements is easily regarded as an "obvious combination" of existing technologies, which creates obstacles to obtaining a patent grant. Furthermore, samples of transformer faults, especially early-stage and severe complex faults, are extremely scarce. Simulating transformer faults not only damages equipment but also consumes energy continuously to maintain the experimental conditions, which makes it difficult to train traditional data-driven models effectively and to make them generalize. Existing technologies offer no universal, energy-efficient, and environmentally friendly solution to this problem.
Summary of the Invention
[0004] (I) Technical Problems to Be Solved
To address the shortcomings of existing technologies, this invention provides a method for building a multimodal fusion power transformer fault diagnosis model, which solves the technical problems described in the background section.
[0005] (II) Technical Solution
To achieve the above objectives, the present invention provides the following technical solution: a method for building a multimodal fusion power transformer fault diagnosis model, comprising the following steps:
S1: Acquire and preprocess the multimodal signals of the transformer. The multimodal signals include a vibration signal and an oscillating wave response signal. During preprocessing, the vibration signal is bandpass filtered and separated into intrinsic mode functions (IMFs) by empirical mode decomposition (EMD) to remove high-frequency noise and low-frequency interference; preprocessing also includes feature extraction from the vibration signal and the oscillating wave response signal.
S2: Abstract the internal structure of the transformer into a graph model whose nodes include the windings, the core, and the insulation structure. Edges represent physical connections, and node attributes include the vibration features and the oscillation wave features.
S3: Use a Graph Convolutional Network (GCN) as the base model to fuse features on the graph data and learn the physical coupling relationships among the internal components of the transformer. The propagation rule of a GCN layer can be expressed as:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\right)$$
where $H^{(l)}$ is the node feature matrix of the $l$-th layer, $\tilde{A} = A + I$ is the adjacency matrix with self-loops, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $\sigma$ is the activation function, and $W^{(l)}$ is a learnable weight matrix.
S4: Apply positional encoding to the multimodal feature sequences.
S5: Use the multi-head self-attention mechanism of a cross-modal Transformer network to dynamically capture deep correlations between the features of the different modalities (vibration and oscillation wave). The self-attention is calculated as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
[0006] where Q, K, and V are the query, key, and value matrices, respectively, obtained by linear transformation of the input feature vectors, and $d_k$ is the dimension of the key vectors. This mechanism lets the model assign dynamic weights to each feature automatically, highlighting the feature information most critical for fault diagnosis under a specific time sequence and operating condition.
S6: Use multi-head self-attention. To capture feature associations from different representation subspaces, the model uses a multi-head self-attention (MHA) mechanism, which projects the input features into multiple subspaces, computes attention in each subspace in parallel, and concatenates the results:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
where $h$ is the number of attention heads, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the independent weight matrices of head $i$, and $W^{O}$ is the final output weight matrix.
S7: Build a surrogate model to predict performance under different combinations. Training of the surrogate model comprises two phases. Phase 1: pre-train the model on a dataset containing fault data to learn general mechanical and electrical anomaly patterns. Phase 2: fine-tune the pre-trained model on a small amount of high-value, power-transformer-specific fault data, using a loss function with a regularization term of the following form:
$$\mathcal{L} = \mathcal{L}_{CE} + \lambda \sum_{l \in \Omega} \left\lVert \theta_l - \theta_l^{pre} \right\rVert_2^2$$
where $\mathcal{L}_{CE}$ is the cross-entropy loss, $\lambda$ is the transfer regularization coefficient, $\Omega$ is the set of frozen layers, and $\theta_l$ and $\theta_l^{pre}$ are the weights after fine-tuning and after pre-training, respectively. A physics-data dual-driven optimization objective is also introduced and implemented with a Bayesian optimization algorithm; during the fine-tuning phase, the loss function optimized by the Bayesian approach is defined over a feature-space vector, a weight vector, and the actual fault label.
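[0007] Preferably, the multi-head attention mechanism is as follows:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
where $h$ is the number of attention heads, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the independent weight matrices of head $i$, and $W^{O}$ is the final output weight matrix.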
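[0008] Preferably, the positional encoding formula is as follows:
$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
where $pos$ is the position of the feature in the sequence, $i$ is the dimension index of the feature vector, and $d_{model}$ is the model dimension.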
[0009] Preferably, a VAE model is trained using only healthy and known-fault data. The VAE learns the probability distribution of the data in a low-dimensional latent space, and its core is optimizing the Evidence Lower Bound (ELBO):
$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{KL}\left(q_\phi(z \mid x) \,\Vert\, p(z)\right)$$
where $x$ is the input data; $z$ is the latent variable; $p_\theta(x \mid z)$ is the decoder distribution, representing the ability to reconstruct the original data from the latent space; $q_\phi(z \mid x)$ is the approximate posterior of the encoder; $p(z)$ is the prior of the latent variable (usually a standard Gaussian); and $D_{KL}$ is the Kullback-Leibler divergence, which measures the difference between two probability distributions. Anomaly score calculation: in the diagnostic phase, the anomaly score of new test data is calculated on the trained VAE model and can be measured by the reconstruction error:
$$\mathrm{Score}(x) = \lVert x - \hat{x} \rVert_2^2$$
where $x$ is the input data and $\hat{x}$ is the data reconstructed by the model from the latent space. Because the model is trained only on healthy and known-fault data, its reconstruction error increases significantly when unknown-fault or abnormal data are input; setting a reconstruction error threshold therefore enables unsupervised identification of unknown faults.
[0010] Preferably, the knowledge graph of the VAE model is constructed as follows: the internal components of the transformer, fault types, vibration spectra, and attenuation coefficients are defined as entities, and the relationships among them, including "located in", "generates", and "indicates", are defined as relations, forming a set of triples from which a transformer fault knowledge graph is constructed.
[0011] Preferably, knowledge graph embedding and reasoning for the VAE model are performed as follows: the TransE model embeds the knowledge graph, mapping each entity and relation to a low-dimensional vector space. The core idea of TransE is that for any correct triple $(h, r, t)$, the head entity vector $\mathbf{h}$ plus the relation vector $\mathbf{r}$ should approximately equal the tail entity vector $\mathbf{t}$, i.e., $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. The scoring function of the model is:
$$f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{L1/L2}$$
where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the vector representations of the head entity, relation, and tail entity, respectively, and $\lVert \cdot \rVert_{L1/L2}$ denotes the L1 or L2 norm.
[0012] Preferably, the VAE model is optimized as follows: the TransE model is trained by minimizing a margin-based ranking loss:
$$\mathcal{L} = \sum_{(h, r, t) \in S} \; \sum_{(h', r, t') \in S'_{(h, r, t)}} \left[\gamma + f(h, r, t) - f(h', r, t')\right]_+$$
where $S$ is the set of correct triples, $S'_{(h, r, t)}$ is the set of incorrect triples generated by corrupting the correct triple $(h, r, t)$, $\gamma$ is the margin between positive and negative samples, and $[x]_+ = \max(0, x)$. This loss drives the scores of correct triples below those of incorrect triples, improving the model's inference ability. Fault severity assessment and remaining life prediction: for identified faults, the system combines the TransE distance score with the VAE reconstruction error to assess fault severity. For time-series data, the system can further integrate an SNN-LSTM hybrid model for remaining useful life (RUL) prediction, whose output combines the hidden-layer state vector of the SNN and the cell state vector of the LSTM at the corresponding time steps.
[0013] To improve the accuracy and completeness of fault diagnosis, the unsupervised anomaly identification of the VAE is combined with the supervised classification of the surrogate model to form a closed-loop "initial screening → classification → verification → update" process. The specific steps are as follows:
I. VAE anomaly screening: quickly distinguish among "healthy", "known fault candidate", and "unknown fault candidate".
1. Input the multimodal signal $X_{real}$, acquired and preprocessed in real time, into the VAE model and calculate the reconstruction error:
$$E_{rec} = \lVert X_{real} - \hat{X}_{real} \rVert_2^2$$
[0014] where $X_{real}$ is the multimodal signal acquired and preprocessed in real time, and $\hat{X}_{real}$ is the VAE decoder's reconstruction of $X_{real}$.
[0015] The minimum distance between the latent variable $Z_{real}$ and the latent means $U_k$ of all fault types in the VAE distribution library is:
$$d_{min} = \min_k \lVert Z_{real} - U_k \rVert_2$$
where $Z_{real}$ is the low-dimensional latent variable obtained by mapping the real-time signal $X_{real}$ through the VAE encoder, and $U_k$ is the mean latent variable of the $k$-th state in the VAE distribution library.
2. The initial screening result is determined according to the following rules:
[0016] Healthy-data reconstruction error threshold: the maximum VAE reconstruction error over all healthy samples in the VAE training phase, used as the threshold that distinguishes "healthy" from "abnormal". Known-fault reconstruction error threshold: the average VAE reconstruction error over all known-fault samples in the VAE training phase, used as the threshold that distinguishes "known faults" from "unknown faults". Latent-space distance threshold: an empirically tuned value, usually the 95th percentile of the latent distances of all known-fault samples, which ensures that a "known fault candidate" signal lies sufficiently close to the distribution of some fault type.
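The text lists the three screening thresholds but does not spell out how they are combined, so the following is only a minimal sketch under an assumed decision order: an error at or below the healthy threshold means "healthy"; otherwise an error at or below the known-fault threshold together with a minimum latent distance at or below the distance threshold means "known fault candidate"; everything else is an "unknown fault candidate". The function name, argument names, and the latent means used below are hypothetical.

```python
import numpy as np

def screen_signal(recon_error, z_real, fault_means, tau_health, tau_fault, tau_dist):
    """Three-way VAE screening under an ASSUMED rule order (not specified in the source).

    fault_means maps a fault-type index k to its latent mean U_k.
    Returns (label, k), where k is the nearest fault type for known candidates, else None.
    """
    # Euclidean distance from the real-time latent vector Z_real to every U_k
    dists = {k: float(np.linalg.norm(z_real - u)) for k, u in fault_means.items()}
    k_nearest = min(dists, key=dists.get)
    if recon_error <= tau_health:
        return "healthy", None
    if recon_error <= tau_fault and dists[k_nearest] <= tau_dist:
        return "known_fault_candidate", k_nearest
    return "unknown_fault_candidate", None

# Illustrative latent means for two fault types (made-up values)
fault_means = {1: np.array([0.0, 0.0]), 2: np.array([3.0, 0.0])}
result = screen_signal(0.3, np.array([2.9, 0.1]), fault_means,
                       tau_health=0.2, tau_fault=0.5, tau_dist=0.5)
```

Returning the nearest fault index only for known candidates keeps the hand-off to the surrogate model explicit; unknown candidates go to manual labeling instead.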
[0017] For the "known fault candidate" signals screened by VAE, the adaptability of the surrogate model to transformer-specific faults is used to achieve accurate classification of fault types and quantification of severity.
[0018] II. Accurate classification using the surrogate model: output "fault type + severity score".
1. Input the features of the $X_{real}$ signals screened as "known fault candidates" into the fine-tuned surrogate model.
2. Using the general anomaly patterns learned in the pre-training phase and the transformer-specific fault patterns adapted in the fine-tuning phase, the surrogate model outputs two core results. Result 1: fault type $k$, the fault type identifier, which corresponds one-to-one with the index $k$ of $U_k$ in the VAE distribution library and specifies the particular fault type.
[0019] Result 2: fault severity score $S_{sev}$, which quantifies the severity of the fault based on the feature-severity mapping learned by the surrogate model. It ranges from 0 to 10, where 0 means no fault and 10 means an extremely severe fault.
[0020] III. Result verification and correction: ensuring the reliability of the classification results. Verifying the surrogate model's classification against the latent-space distribution of the VAE avoids misjudgments caused by the surrogate model overfitting small samples, which improves diagnostic reliability.
[0021] KL divergence calculation: for the fault type $k$ output by the surrogate model, calculate the Kullback-Leibler (KL) divergence $D_k$ between the latent variables of the real-time signal and the latent distribution of the $k$-th fault type, which measures the difference between the two distributions.
[0022] Here $\Sigma_k$ is the variance matrix of the latent variables of type-$k$ faults in the VAE distribution library, obtained statistically from the latent variables of samples of that type during VAE training and reflecting the dispersion of that fault type's features. If $D_k \le \tau_D$ (empirical threshold $\tau_D = 0.5$), the classification result is valid and the output is "fault type + severity + RUL prediction"; if $D_k > \tau_D$, the classification result is questionable and is corrected by knowledge graph reasoning.
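The KL formula itself does not survive in this text, so the check can only be sketched under assumptions: both the real-time signal's encoder posterior $N(\mu_{real}, \mathrm{diag}(\sigma^2_{real}))$ and the type-$k$ distribution $N(U_k, \mathrm{diag}(\sigma^2_k))$ are taken to be diagonal Gaussians, the KL direction is real-time relative to type $k$, and the standard closed form for diagonal Gaussians is used. Function names are hypothetical; the 0.5 threshold is the one stated above.

```python
import numpy as np

def gaussian_kl_diag(mu_p, var_p, mu_q, var_q):
    """Closed-form KL( N(mu_p, diag(var_p)) || N(mu_q, diag(var_q)) )."""
    mu_p, var_p, mu_q, var_q = (np.asarray(a, dtype=float) for a in (mu_p, var_p, mu_q, var_q))
    return float(0.5 * np.sum(var_p / var_q
                              + (mu_q - mu_p) ** 2 / var_q
                              - 1.0
                              + np.log(var_q / var_p)))

def verify_classification(mu_real, var_real, u_k, var_k, tau_d=0.5):
    """Return D_k and whether the surrogate model's fault type k is accepted (D_k <= tau_d)."""
    d_k = gaussian_kl_diag(mu_real, var_real, u_k, var_k)
    return d_k, d_k <= tau_d

# Real-time latent posterior one unit away from U_k in one dimension, unit variances
d_k, accepted = verify_classification([1.0], [1.0], [0.0], [1.0])
```

A rejected result (`accepted` is False) would be passed to the knowledge graph reasoning step for correction.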
[0023] IV. Dynamic Model Updates: The "unknown fault candidate" data is manually labeled and added to the VAE training set, and the VAE is retrained to update the distribution library. Adding new samples to the surrogate model augmentation dataset triggers incremental fine-tuning of the surrogate model, enabling continuous evolution of both models.
[0024] (III) Beneficial Effects
This invention provides a method for building a multimodal fusion-based fault diagnosis model for power transformers, which has the following beneficial effects: (1) This method uses a cross-modal Transformer for dynamic feature fusion and captures the correlation between vibration and electrical features through a multi-head attention mechanism, overcoming the limitations of single-modal diagnosis (such as the insensitivity of vibration signals to insulation defects) and improving diagnostic accuracy for composite faults (such as "winding looseness + partial discharge"). Early diagnosis of insulation faults allows timely repair, which prevents fault expansion, keeps equipment from being scrapped prematurely due to serious faults, extends the design life of transformers, and reduces the resource consumption and carbon emissions of manufacturing caused by frequent equipment replacement.
[0025] (2) This method uses multi-modal sensors to synchronously collect vibration and oscillation wave signals. It is a non-invasive detection method that does not require disassembling the transformer or shutting down the power supply. This avoids the extra energy consumption of the power grid during the shutdown period and the high energy consumption of the detection equipment caused by traditional offline disassembly and detection.
[0026] (3) This method introduces a VAE (variational autoencoder), which can detect anomalies after training on healthy data alone. It generates realistic fault data by sampling from the latent space without real experiments and can identify fault types that do not appear in the training set, reducing the need for traditional models to collect data through "fault simulation experiments" (such as artificially creating winding short circuits or insulation aging).
[0027] (4) This method adopts two-stage transfer learning to reduce the amount of data required during training. It uses general industrial equipment data for pre-training and only requires a small amount of transformer-specific data for fine-tuning. This greatly reduces the number of fault simulation experiments designed specifically for transformers, reduces experimental energy consumption and equipment wear, and improves cross-equipment knowledge transfer to enhance the model's generalization.
[0028] (5) This method constructs a transformer fault knowledge graph, which can provide a logical chain for fault diagnosis and supports root cause analysis.
[0029] (6) This method uses the dynamic time warping (DTW) algorithm to match real-time signals with fault templates, which handles time-series signals of unequal length and reduces fault mode matching error.
Attached Figure Description
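DTW matching of sequences with unequal lengths can be illustrated with the classic dynamic-programming recurrence. This is a minimal sketch: the absolute-difference cost, the template names, and the feature values are hypothetical, and the confidence calculation shown in Figure 6 is not reproduced.

```python
import math

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

def match_template(signal, templates):
    """Return the name of the closest fault template and all DTW distances."""
    scores = {name: dtw_distance(signal, tpl) for name, tpl in templates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical feature sequences of unequal length
templates = {"winding_looseness": [0.0, 1.0, 2.0, 1.0, 0.0],
             "partial_discharge": [0.0, 0.0, 3.0, 0.0, 0.0]}
best, scores = match_template([0.0, 1.0, 1.0, 2.0, 1.0, 0.0], templates)
```

Because the real-time sequence repeats a value, a point-by-point distance would not even be defined here, while DTW aligns the repeated sample to a single template point at zero cost.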
[0030] Figure 1: Schematic diagram of the overall architecture of the power transformer fault diagnosis system.
[0031] Figure 2: Flowchart of multimodal signal acquisition and preprocessing.
[0032] Figure 3: Schematic diagram of the transformer graph data model based on physical topology.
[0033] Figure 4: Schematic diagram of the cross-modal Transformer network structure.
[0034] Figure 5: Flowchart of unsupervised diagnosis and data augmentation with the variational autoencoder (VAE).
[0035] Figure 6: Flowchart of DTW-based fault mode matching and confidence calculation.
[0036] Figure 7: Flowchart of fault diagnosis and remaining life prediction based on multi-algorithm fusion.
Detailed Implementation
[0037] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0038] This invention proposes a power transformer fault diagnosis method and system based on the fusion of vibration and oscillatory wave multimodal data, aiming to solve three problems of existing technologies: overly generic algorithms, difficulty meeting the non-obviousness requirement for a patent grant, and low diagnostic accuracy with small-sample data. Its innovations include unsupervised fault diagnosis and data augmentation based on generative models, feature representation and fusion based on cross-modal Transformer networks, fault mode reasoning based on knowledge graph embedding (KGE), and physics-data dual-driven Bayesian optimization. The method of the present invention includes the following steps.
Step 1: Multimodal signal acquisition and preprocessing. The mechanical vibration signal and the oscillation wave response signal of the power transformer are collected synchronously during operation. Both signals are preprocessed by bandpass filtering and empirical mode decomposition (EMD) denoising, and time-domain, frequency-domain, and oscillation wave attenuation features (such as the attenuation coefficient α, the quality factor Q, and the resonant frequency offset) are extracted.
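The attenuation parameters named in Step 1 can be illustrated on an already filtered response. This minimal sketch assumes a single-mode, lightly damped oscillation $x(t) = A e^{-\alpha t}\sin(2\pi f_0 t)$ and estimates the parameters by the logarithmic decrement between successive positive peaks; the logarithmic-decrement technique is a choice made here, not one stated in the source, and bandpass filtering and EMD are omitted. The function name and signal values are hypothetical.

```python
import numpy as np

def damping_parameters(signal, fs):
    """Estimate resonant frequency f0 (Hz), attenuation coefficient alpha (1/s) and
    quality factor Q of x(t) = A * exp(-alpha * t) * sin(2 * pi * f0 * t) sampled at fs (Hz)."""
    x = np.asarray(signal, dtype=float)
    mid = x[1:-1]
    # Indices of positive local maxima
    peaks = np.nonzero((mid > x[:-2]) & (mid >= x[2:]) & (mid > 0))[0] + 1
    if peaks.size < 2:
        raise ValueError("need at least two positive peaks")
    period = np.mean(np.diff(peaks)) / fs          # seconds between successive peaks
    f0 = 1.0 / period
    amps = x[peaks]
    delta = np.mean(np.log(amps[:-1] / amps[1:]))  # logarithmic decrement ln(A_n / A_{n+1})
    alpha = delta * f0                             # since A_{n+1} = A_n * exp(-alpha * period)
    q_factor = np.pi / delta                       # light-damping approximation Q = pi * f0 / alpha
    return f0, alpha, q_factor

# Synthetic response: f0 = 1 kHz, alpha = 200 1/s, sampled at 100 kHz for 10 ms
fs = 100_000.0
t = np.arange(0.0, 0.01, 1.0 / fs)
x = np.exp(-200.0 * t) * np.sin(2 * np.pi * 1000.0 * t)
f0, alpha, q = damping_parameters(x, fs)
```

The resonant frequency offset would then be `f0` minus a baseline resonant frequency recorded for the same transformer; that baseline is not given in the source.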
[0039] 1. Constructing a Graph Data Model: Based on the physical connections of components such as the transformer's internal windings, core, and insulation structure, a graph model reflecting its topology is constructed. The vibration features and oscillation wave features extracted in step one are used as node attributes of the graph, achieving an innovative transformation from traditional vector representation to a graph data structure.
[0040] 2. Feature Representation and Fusion: A Graph Convolutional Network (GCN) is used as the base model to fuse features on the graph data and learn the physical coupling relationships among the internal components of the transformer. The propagation rule of a GCN layer can be expressed as:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\right)$$
where $H^{(l)}$ is the node feature matrix of the $l$-th layer, $\tilde{A} = A + I$ is the adjacency matrix with self-loops, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $\sigma$ is the activation function, and $W^{(l)}$ is a learnable weight matrix. The GCN effectively aggregates graph structure information, laying the foundation for the subsequent fusion.
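A single GCN propagation step can be sketched directly from the rule above. The four-node component graph, its edges, the feature width, and the random weights are hypothetical illustrations (the actual topology is given only schematically in Figure 3), and ReLU is assumed as the activation $\sigma$.

```python
import numpy as np

def gcn_layer(H, A, W, activation=lambda x: np.maximum(x, 0.0)):
    """One GCN step: activation(D^-1/2 (A + I) D^-1/2 H W), ReLU by default."""
    A_tilde = A + np.eye(A.shape[0])            # adjacency with self-loops
    deg = A_tilde.sum(axis=1)                   # node degrees of A_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return activation(A_hat @ H @ W)

# Hypothetical component graph: 0 = HV winding, 1 = LV winding, 2 = core, 3 = insulation
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
H0 = rng.standard_normal((4, 3))   # e.g. 3 vibration/oscillation-wave features per node
W0 = rng.standard_normal((3, 2))   # learnable weights (random stand-ins here)
H1 = gcn_layer(H0, A, W0)          # fused node features, shape (4, 2)
```

With no edges, the normalized adjacency reduces to the identity and the layer degenerates to a per-node linear map, which is a quick sanity check that the normalization is wired correctly.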
[0041] 3. Cross-modal Transformer Network: This invention further introduces a cross-modal Transformer network as a more advanced fusion scheme. The network uses a multi-head self-attention mechanism to dynamically capture deep correlations between the features of different modalities (vibration and oscillation wave). The self-attention is calculated as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
where Q, K, and V are the query, key, and value matrices, respectively, obtained from the input feature vectors through linear transformation, and $d_k$ is the dimension of the key vectors. This mechanism lets the model assign dynamic weights to each feature automatically, highlighting the feature information most critical for fault diagnosis under a specific time sequence and operating condition.
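Scaled dot-product attention can be sketched in a few lines. The cross-modal arrangement here, where vibration tokens supply the queries and oscillation-wave tokens supply the keys and values, is an assumption for illustration; token counts, widths, and the random projection matrices are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V; also returns the weights."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(1)
vib = rng.standard_normal((5, 8))   # 5 vibration feature tokens of width 8
osc = rng.standard_normal((7, 8))   # 7 oscillation-wave feature tokens of width 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
# Assumed cross-modal arrangement: vibration queries attend to oscillation-wave keys/values
out, w = attention(vib @ Wq, osc @ Wk, osc @ Wv)
```

Each row of `w` is a probability distribution over the oscillation-wave tokens, which is the "dynamic weight" each vibration feature assigns to the other modality.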
[0042] 4. Multi-head Self-Attention: To capture feature associations from different representation subspaces, the model employs a multi-head self-attention (MHA) mechanism, which projects the input features into multiple subspaces, performs attention calculations in parallel within each subspace, and concatenates the results:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
where $h$ is the number of attention heads, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the independent weight matrices of head $i$, and $W^{O}$ is the final output weight matrix. This parallel computation captures richer multimodal correlations.
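The multi-head formulas map directly onto a loop over per-head projection matrices. This is a minimal sketch with hypothetical sizes ($d_{model} = 16$, $h = 4$) and random weights standing in for learned ones; the self-attention case is shown by passing the same token matrix as both query and key/value input.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V"""
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head_attention(X_q, X_kv, W_q, W_k, W_v, W_o):
    """Concat(head_1..head_h) W_o with head_i = Attention(X_q W_q[i], X_kv W_k[i], X_kv W_v[i])."""
    heads = [attention(X_q @ wq, X_kv @ wk, X_kv @ wv)
             for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o

d_model, h = 16, 4
d_k = d_model // h
rng = np.random.default_rng(2)
X = rng.standard_normal((6, d_model))          # 6 fused feature tokens
W_q = [rng.standard_normal((d_model, d_k)) for _ in range(h)]
W_k = [rng.standard_normal((d_model, d_k)) for _ in range(h)]
W_v = [rng.standard_normal((d_model, d_k)) for _ in range(h)]
W_o = rng.standard_normal((h * d_k, d_model))
Y = multi_head_attention(X, X, W_q, W_k, W_v, W_o)   # self-attention: X attends to X
```

Splitting $d_{model}$ evenly across heads keeps the concatenated width equal to $d_{model}$, so the output has the same shape as the input tokens.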
[0043] 5. Temporal Information Encoding: To preserve temporal information, this invention applies positional encoding to the feature sequence before it is input into the Transformer network. The positional encoding formula is as follows:
$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
where $pos$ is the position of the feature in the sequence, $i$ is the dimension index of the feature vector, and $d_{model}$ is the model dimension. This allows the model to recognize the relative positions of different features on the time axis.
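The sinusoidal encoding can be computed as a lookup table that is added to the feature sequence. This minimal sketch assumes an even model dimension, as the paired sine and cosine columns require; the function name is hypothetical.

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    """Sinusoidal table: PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    pos = np.arange(num_positions)[:, None]          # column of positions
    two_i = np.arange(0, d_model, 2)[None, :]        # even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(10, 8)   # e.g. added to a (10, 8) feature sequence
```

Usage would be `features + positional_encoding(len(features), d_model)` before the attention layers.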
[0044] Step 3: Physics-Data Dual-Driven Bayesian Optimization and Transfer Learning. This invention employs Bayesian optimization to adjust the algorithm parameters adaptively and uses a two-stage transfer learning framework to overcome the problem of small-sample data.
[0045] 1. Pre-training stage: The model is pre-trained using a public dataset containing a large amount of fault data of general industrial equipment (such as electromechanical actuators mentioned in CN120143610A) to learn general mechanical and electrical anomaly patterns.
[0046] 2. Fine-tuning Phase: The pre-trained model is fine-tuned using a small amount of high-value, power-transformer-specific fault data. At this stage, a loss function incorporating a regularization term is adopted, with the following form:
$$\mathcal{L} = \mathcal{L}_{CE} + \lambda \sum_{l \in \Omega} \left\lVert \theta_l - \theta_l^{pre} \right\rVert_2^2$$
where $\mathcal{L}_{CE}$ is the cross-entropy loss, $\lambda$ is the transfer regularization coefficient, $\Omega$ is the set of frozen layers, and $\theta_l$ and $\theta_l^{pre}$ are the weights after fine-tuning and after pre-training, respectively.
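The fine-tuning loss can be sketched as cross-entropy plus a penalty that keeps the selected layers close to their pre-trained values. The squared-L2 form of the penalty, the layer names, and the numeric values are assumptions for illustration; a real implementation would compute this inside a training framework so gradients flow through it.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy L_CE from raw logits using a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def transfer_loss(logits, labels, weights, pretrained, omega, lam):
    """L_CE + lam * sum over layers in omega of ||theta_l - theta_l_pre||_2^2 (assumed form)."""
    reg = sum(float(np.sum((weights[name] - pretrained[name]) ** 2)) for name in omega)
    return cross_entropy(logits, labels) + lam * reg

# Hypothetical layer names and shapes
pretrained = {"gcn1": np.zeros((2, 2)), "head": np.zeros((2, 3))}
weights = {"gcn1": pretrained["gcn1"] + 0.1, "head": pretrained["head"] + 5.0}
logits = np.zeros((2, 3))            # uniform predictions over 3 fault classes
labels = np.array([0, 2])
loss = transfer_loss(logits, labels, weights, pretrained, omega={"gcn1"}, lam=0.5)
```

Only layers listed in `omega` are penalized, so the classification head here can move freely while `gcn1` contributes 0.5 × 0.04 to the loss on top of the ln 3 cross-entropy.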
[0047] This invention further introduces a physics-data dual-driven optimization objective implemented with a Bayesian optimization algorithm. During the fine-tuning phase, the loss function optimized by the Bayesian approach is defined over the feature-space vector, the weight vector, and the actual fault label.
[0048] Step 4: Unsupervised fault diagnosis based on generative models; This invention innovatively introduces an unsupervised fault diagnosis method based on variational autoencoders (VAEs) to address unknown and scarce fault types.
[0049] 1. Model Training: Train a VAE model using only healthy and known-fault data. The VAE learns the probability distribution of the data in a low-dimensional latent space, and its core is optimizing the Evidence Lower Bound (ELBO):
$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{KL}\left(q_\phi(z \mid x) \,\Vert\, p(z)\right)$$
where $x$ is the input data; $z$ is the latent variable; $p_\theta(x \mid z)$ is the decoder distribution, representing the ability to reconstruct the original data from the latent space; $q_\phi(z \mid x)$ is the approximate posterior of the encoder; $p(z)$ is the prior of the latent variable (usually a standard Gaussian); and $D_{KL}$ is the Kullback-Leibler divergence, which measures the difference between two probability distributions.
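Maximizing the ELBO is usually implemented as minimizing its negative. This minimal sketch assumes a diagonal Gaussian encoder posterior, a unit-variance Gaussian decoder (so the reconstruction term is half the squared error, up to a constant), and a standard-normal prior, which gives the familiar closed-form KL term; the encoder and decoder networks themselves are omitted and the function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z ~ q(z|x) = N(mu, diag(exp(log_var))) as z = mu + sigma * eps."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def negative_elbo(x, x_hat, mu, log_var):
    """-ELBO for a unit-variance Gaussian decoder and N(0, I) prior (additive constants dropped)."""
    recon = 0.5 * np.sum((x - x_hat) ** 2)                        # -log p(x|z) up to a constant
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var)  # KL(q(z|x) || N(0, I))
    return float(recon + kl)

rng = np.random.default_rng(0)
x = np.array([0.2, -0.1, 0.4])
mu, log_var = np.array([1.0, 0.0]), np.zeros(2)
z = reparameterize(mu, log_var, rng)
loss = negative_elbo(x, x, mu, log_var)   # perfect reconstruction: only the KL term remains
```

The reparameterization keeps sampling differentiable with respect to `mu` and `log_var` when the same expression is written in an autodiff framework.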
[0050] Anomaly score calculation: During the diagnostic phase, the anomaly score of new test data is calculated on the trained VAE model and can be measured by the reconstruction error:
$$\mathrm{Score}(x) = \lVert x - \hat{x} \rVert_2^2$$
where $x$ is the input data and $\hat{x}$ is the data reconstructed by the model from the latent space. Because the model is trained only on healthy and known-fault data, its reconstruction error increases significantly when unknown-fault or abnormal data are input. Setting a reconstruction error threshold therefore enables unsupervised identification of unknown faults.
[0051] Data augmentation: The VAE model is used as a data generator, sampling and decoding from its learned latent space to generate new, realistic fault data samples to augment the scarce fault dataset. This provides rich training data for subsequent model training, greatly improving the model's generalization ability in small-sample scenarios.
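Generating synthetic samples amounts to drawing latent vectors and passing them through the decoder. This sketch assumes sampling around one fault type's latent mean $U_k$ with a per-dimension variance (rather than from the prior) so that the generated samples belong to a chosen fault type; the linear-plus-tanh decoder is a hypothetical stand-in for the trained VAE decoder, and all numbers are illustrative.

```python
import numpy as np

def augment_fault_samples(decode, u_k, var_k, n, rng):
    """Sample z ~ N(u_k, diag(var_k)) around a fault type's latent mean and decode it."""
    z = u_k + np.sqrt(var_k) * rng.standard_normal((n, u_k.size))
    return decode(z)

# Hypothetical stand-in for a trained decoder: latent dim 2 -> feature dim 5
rng = np.random.default_rng(3)
W_dec = rng.standard_normal((2, 5))
decode = lambda z: np.tanh(z @ W_dec)

u_k = np.array([0.5, -1.0])       # latent mean of one fault type (illustrative)
var_k = np.array([0.05, 0.05])    # per-dimension latent variance (illustrative)
synthetic = augment_fault_samples(decode, u_k, var_k, n=100, rng=rng)   # shape (100, 5)
```

Smaller variances keep the synthetic samples close to the class prototype; larger ones add diversity at the risk of drifting toward other fault types.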
[0052] Step 5: Fault mode reasoning based on knowledge graph embedding (KGE); 1. Knowledge Graph Construction: Define transformer internal components (such as windings and core), fault types (such as inter-turn short circuits), and diagnostic indicators (such as vibration spectrum and attenuation coefficient) as entities. Define the relationships between them (such as "located in", "generated", and "indicated") as relations, forming a set of triples (entity_head, relation, entity_tail) to construct a transformer fault knowledge graph.
[0053] 2. Knowledge Graph Embedding and Reasoning: The TransE model is used to embed the knowledge graph, mapping each entity and relation to a low-dimensional vector space. The core idea of TransE is that for any correct triple $(h, r, t)$, the head entity vector $\mathbf{h}$ plus the relation vector $\mathbf{r}$ should approximately equal the tail entity vector $\mathbf{t}$, i.e., $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. The scoring function of the model is:
$$f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{L1/L2}$$
where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the vector representations of the head entity, relation, and tail entity, respectively, and $\lVert \cdot \rVert_{L1/L2}$ denotes the L1 or L2 norm.
[0054] 3. Model Optimization: The TransE model is trained by minimizing a margin-based ranking loss:
$$\mathcal{L} = \sum_{(h, r, t) \in S} \; \sum_{(h', r, t') \in S'_{(h, r, t)}} \left[\gamma + f(h, r, t) - f(h', r, t')\right]_+$$
where $S$ is the set of correct triples, $S'_{(h, r, t)}$ is the set of incorrect triples generated by corrupting the correct triple $(h, r, t)$, $\gamma$ is the margin between positive and negative samples, and $[x]_+ = \max(0, x)$. This loss ensures that correct triples score lower than incorrect ones, thereby improving the model's inference ability.
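The TransE scoring function, negative sampling by corruption, and the margin ranking loss fit together as follows. This is a minimal sketch: the entity and relation names are hypothetical examples in the spirit of the knowledge graph described above, the 2-D embeddings are hand-picked rather than trained, and the gradient updates and embedding normalization of a real training loop are not shown.

```python
import random
import numpy as np

def transe_score(h, r, t, norm=1):
    """f(h, r, t) = ||h + r - t|| (L1 by default); lower means a more plausible triple."""
    return float(np.linalg.norm(h + r - t, ord=norm))

def corrupt(triple, entities, rng):
    """Negative sample: replace the head or the tail with a different random entity."""
    h, r, t = triple
    if rng.random() < 0.5:
        h = rng.choice([e for e in entities if e != h])
    else:
        t = rng.choice([e for e in entities if e != t])
    return (h, r, t)

def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Sum over paired samples of max(0, margin + f(positive) - f(negative))."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    return float(np.sum(np.maximum(0.0, margin + pos - neg)))

# Hypothetical hand-picked 2-D embeddings (not trained)
entity = {"inter_turn_short_circuit": np.array([1.0, 0.0]),
          "winding": np.array([1.0, 1.0]),
          "core": np.array([2.0, 2.0])}
relation = {"located_in": np.array([0.0, 1.0])}

def score(triple):
    h, r, t = triple
    return transe_score(entity[h], relation[r], entity[t])

positive = ("inter_turn_short_circuit", "located_in", "winding")
negative = corrupt(positive, list(entity), random.Random(0))
loss = margin_ranking_loss([score(positive)], [score(negative)], margin=1.0)
```

With these embeddings the correct triple scores 0 and every corrupted triple scores at least the margin, so the loss is already zero; during training, the loss is positive exactly for pairs that violate this ordering.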
[0055] Fault Severity Assessment and Remaining Life Prediction: For identified faults, the system combines the TransE distance score with the VAE reconstruction error to assess fault severity. For time-series data, the system can further integrate an SNN-LSTM hybrid model for remaining useful life (RUL) prediction, whose output combines the hidden-layer state vector of the SNN and the cell state vector of the LSTM at the corresponding time steps.
[0056] To improve the accuracy and completeness of fault diagnosis, the unsupervised anomaly identification of the VAE is combined with the supervised classification of the surrogate model to form a closed-loop "initial screening → classification → verification → update" process. The specific steps are as follows:
I. VAE anomaly screening: quickly distinguish among "healthy", "known fault candidate", and "unknown fault candidate".
1. Input the multimodal signal $X_{real}$, acquired and preprocessed in real time, into the VAE model and calculate the reconstruction error:
$$E_{rec} = \lVert X_{real} - \hat{X}_{real} \rVert_2^2$$
[0057] X real Refers to multimodal signals acquired in real time and preprocessed, x^ real It is a VAE decoder for X real The reconstruction output.
[0058] Also calculate the minimum distance between the latent variable $Z_{\mathrm{real}}$ and the latent means $U_k$ of all fault types in the VAE distribution library, $d_{\min} = \min_k \lVert Z_{\mathrm{real}} - U_k \rVert$, where $Z_{\mathrm{real}}$ is the low-dimensional latent variable obtained by mapping the real-time signal $X_{\mathrm{real}}$ through the VAE encoder, and $U_k$ is the mean latent variable of the k-th state in the VAE distribution library. 2. The initial screening result is determined by the following rules, where $\tau_H$, $\tau_K$, and $\tau_d$ are the thresholds defined in the next paragraph: if $err < \tau_H$, the signal is judged healthy; if $\tau_H \le err \le \tau_K$ and $d_{\min} \le \tau_d$, it is a known fault candidate; if $err > \tau_K$ or $d_{\min} > \tau_d$, it is an unknown fault candidate.
[0059] $\tau_H$ is the maximum reconstruction error threshold for healthy data: the reconstruction errors of all healthy samples are computed during VAE training, and their maximum is taken as the threshold separating "healthy" from "abnormal". $\tau_K$ is the average reconstruction error threshold for known fault data: the reconstruction errors of all known fault samples are computed during VAE training, and their average is taken as the threshold separating "known faults" from "unknown faults". $\tau_d$ is the latent-space distance threshold, an empirically tuned value usually taken as the 95th percentile of the latent distances of all known fault samples, which ensures that a "known fault candidate" signal lies sufficiently close to the distribution of some fault type.
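A minimal sketch of the threshold calibration in [0059] and the three-way screening of step I (healthy / known fault candidate / unknown fault candidate, with the comparison rules as stated in claim 8). It assumes per-sample reconstruction errors and latent distances from the training phase are available as arrays; the squared-L2 reconstruction error and the Euclidean latent distance are assumptions, since neither metric is fixed in the text, and all function names are hypothetical. `tau_h` is the maximum healthy reconstruction error, `tau_k` the mean known-fault reconstruction error, and `tau_d` the 95th-percentile known-fault latent distance.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Assumed metric: squared L2 distance between a signal and its VAE reconstruction."""
    return float(np.sum((np.asarray(x) - np.asarray(x_hat)) ** 2))

def min_latent_distance(z, centroids):
    """d_min = min_k ||z - U_k|| (Euclidean, assumed) and the index k that attains it."""
    d = np.linalg.norm(np.asarray(centroids) - np.asarray(z), axis=1)
    k = int(np.argmin(d))
    return float(d[k]), k

def calibrate_thresholds(err_healthy, err_known, dist_known, pct=95):
    """tau_h: max healthy error; tau_k: mean known-fault error; tau_d: percentile of known-fault distances."""
    return (float(np.max(err_healthy)),
            float(np.mean(err_known)),
            float(np.percentile(dist_known, pct)))

def screen(err, d_min, tau_h, tau_k, tau_d):
    """Three-way initial screening of one real-time sample."""
    if err < tau_h:
        return "healthy"
    if err <= tau_k and d_min <= tau_d:
        return "known_fault_candidate"
    return "unknown_fault_candidate"  # here err > tau_k or d_min > tau_d
```

Only "known_fault_candidate" samples would be passed on to the surrogate model; "unknown_fault_candidate" samples would raise the alarm and be stored for labeling.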
[0060] For the "known fault candidate" signals screened out by the VAE, the surrogate model, which has been adapted to transformer-specific faults, is used to classify the fault type accurately and quantify its severity.
[0061] II. Accurate Classification with the Surrogate Model: outputting "fault type + severity score". 1. Input the $X_{\mathrm{real}}$ of each "known fault candidate", together with its features, into the fine-tuned surrogate model. 2. Using the "general anomaly patterns" learned in the pre-training phase and the "transformer-specific fault patterns" adapted in the fine-tuning phase, the surrogate model outputs two core results. Result 1: the fault type K, an identifier that corresponds one-to-one with the index k of $U_k$ in the VAE distribution library and specifies the particular type of fault.
[0062] Result 2: the fault severity score $S_{\mathrm{sev}}$, a metric quantifying fault severity that is computed from the feature-severity mapping learned by the surrogate model, with a value range of 0 to 10, where 0 represents no fault and 10 represents an extremely severe fault.
[0063] III. Result Verification and Correction: ensuring the reliability of the classification results. Verifying the surrogate model's classification against the latent-space distribution of the VAE avoids misjudgments caused by the surrogate model overfitting to small samples, thereby improving the reliability of the diagnosis.
[0064] KL divergence calculation: For the fault type K output by the surrogate model, calculate the Kullback-Leibler (KL) divergence between the latent variable of the real-time signal and the latent distribution of the K-th fault type, denoted $D_K$, to measure the difference between the two distributions.
[0065] Here $\Sigma_K$ is the covariance matrix of the latent variables of the K-th fault type in the VAE distribution library, obtained statistically from the latent variables of samples of that type during VAE training and reflecting the dispersion of that fault type's features. If the KL divergence $D_K \le \tau_D$ (empirical threshold 0.5), the classification result is valid, and "fault type + severity + RUL prediction" is output; otherwise the classification result is questionable and is corrected by knowledge-graph reasoning.
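To illustrate the verification step, the sketch below computes the KL divergence (written `D_K` here) in closed form and applies the 0.5 threshold from [0065]. It assumes that both the real-time latent distribution and the stored K-th fault distribution are Gaussians with diagonal covariance; the text does not spell out the exact KL expression, so this is one plausible instance rather than the applicant's formula, and `verify_classification` is a hypothetical name.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ), summed over latent dimensions."""
    mu_q, var_q, mu_p, var_p = (np.asarray(a, dtype=float) for a in (mu_q, var_q, mu_p, var_p))
    return 0.5 * float(np.sum(np.log(var_p / var_q)
                              + (var_q + (mu_q - mu_p) ** 2) / var_p
                              - 1.0))

def verify_classification(mu_real, var_real, U_K, var_K, tau_D=0.5):
    """Return D_K and whether the surrogate model's fault type K is accepted (D_K <= tau_D)."""
    D_K = kl_diag_gaussians(mu_real, var_real, U_K, var_K)
    return D_K, D_K <= tau_D
```

A rejected result (`False`) would then be handed to the knowledge-graph reasoning for correction.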
[0066] IV. Dynamic Model Update: "Unknown fault candidate" data are manually labeled and added to the VAE training set, and the VAE is retrained to update the distribution library. The same samples are added to the surrogate model's augmentation dataset, triggering incremental fine-tuning of the surrogate model, so that both models evolve continuously.
[0067] The SNN-LSTM hybrid model can effectively capture time-series features, enabling long-term trend analysis of equipment health status.
[0068] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0069] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A method for building a multimodal fusion power transformer fault diagnosis model, characterized in that it comprises the following steps: S1: acquiring and preprocessing multimodal signals of the transformer, the multimodal signals including a vibration signal and an oscillating wave response signal; in preprocessing, the vibration signal is bandpass filtered and decomposed into intrinsic mode functions by empirical mode decomposition to remove high-frequency noise and low-frequency interference, and the preprocessing further includes feature extraction from the vibration signal and the oscillating wave response signal; S2: abstracting the internal structure of the transformer into a graph model, whose nodes include the windings, core, and insulation structure, whose edges represent physical connections, and whose node attributes include vibration features and oscillating wave features; S3: using a graph convolutional network (GCN) as the base model to fuse features from the graph data and learn the physical coupling relationships between the components inside the transformer, the propagation rule of a graph convolutional layer being $H^{(l+1)} = \sigma\bigl(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\bigr)$, where $H^{(l)}$ is the node feature matrix of the l-th layer, $\tilde{A}$ is the adjacency matrix with self-loops, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $\sigma$ is the activation function, and $W^{(l)}$ is a learnable weight matrix;
S4: applying positional encoding to the multimodal signal feature sequences; S5: using the multi-head self-attention mechanism of a cross-modal Transformer network to dynamically capture deep correlations between the features of different modalities (vibration and oscillating waves), with self-attention computed as $\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\bigl(QK^{\top}/\sqrt{d_k}\bigr)V$, where Q, K, and V are the query, key, and value matrices obtained by linear transformations of the input feature vectors and $d_k$ is the dimension of the key vectors; this mechanism lets the model automatically assign dynamic weights to each feature, highlighting the feature information most critical for fault diagnosis under specific time sequences and operating conditions; S6: employing multi-head self-attention (MHA) to capture feature associations from different representation subspaces: MHA splits the input features into multiple subspaces, computes attention in each subspace in parallel, and concatenates the results: $\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)W^{O}$, $\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$, where h is the number of attention heads, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the independent weight matrices of each head, and $W^{O}$ is the final output weight matrix; S7: building a surrogate model to predict performance under different combinations, the surrogate model being trained in two phases: Phase One: pre-training the model on a dataset containing fault data to learn general mechanical and electrical anomaly patterns; Phase Two: fine-tuning the pre-trained model with a small amount of high-value fault data specific to power transformers, using a loss function with a regularization term of the form $L = L_{\mathrm{CE}} + \lambda\sum_{i\in\Omega}\lVert\theta_i - \theta_i^{\mathrm{pre}}\rVert_2^2$, where $L_{\mathrm{CE}}$ is the cross-entropy loss, $\lambda$ is the transfer regularization coefficient, $\Omega$ is the set of frozen layers, and $\theta_i$ and $\theta_i^{\mathrm{pre}}$ are the fine-tuned and pre-trained weights, respectively.
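The S3 propagation rule can be sketched with NumPy for the three-node graph described in S2. The edge list, the four-dimensional node features, the random weights, and the ReLU activation are illustrative assumptions, not values from the patent.

```python
import numpy as np

def gcn_layer(H, A, W, activation=lambda x: np.maximum(x, 0.0)):
    """One GCN layer: sigma(D~^{-1/2} (A + I) D~^{-1/2} H W); ReLU by default (assumed)."""
    A_tilde = A + np.eye(A.shape[0])           # add self-loops
    deg = A_tilde.sum(axis=1)                  # node degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    return activation(A_hat @ H @ W)

# Hypothetical 3-node graph: 0 = winding, 1 = core, 2 = insulation.
edges = [(0, 1), (0, 2)]  # assumed physical connections
A = np.zeros((3, 3))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
H0 = rng.normal(size=(3, 4))  # e.g. 2 vibration + 2 oscillating-wave features per node
W0 = rng.normal(size=(4, 8))  # learnable weights (random here)
H1 = gcn_layer(H0, A, W0)     # fused node features, shape (3, 8)
```

Stacking several such layers lets each component's features absorb information from physically connected components further away in the graph.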
An optimization objective driven by both physics and data is further introduced and implemented with a Bayesian optimization algorithm; during the fine-tuning phase, the model's loss function is a Bayesian optimization loss $L_{\mathrm{BO}}$ defined over the feature-space vector x, the weight vector w, and the actual fault label y.
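Phase Two of S7 combines cross-entropy with a transfer regularization term over the layer set Ω, weighted by the coefficient λ. A minimal sketch follows, assuming a squared-L2 penalty between fine-tuned and pre-trained weights (an L2-SP-style choice) and weights stored as dictionaries keyed by hypothetical layer names. The Bayesian optimization loss is not sketched, since its exact form is not specified.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class; probs has shape (n_samples, n_classes)."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    labels = np.asarray(labels)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

def transfer_regularized_loss(probs, labels, finetuned, pretrained, omega, lam=0.01):
    """L = L_CE + lam * sum over layers in omega of ||theta_i - theta_i_pre||^2 (squared L2 assumed)."""
    penalty = sum(float(np.sum((np.asarray(finetuned[name]) - np.asarray(pretrained[name])) ** 2))
                  for name in omega)
    return cross_entropy(probs, labels) + lam * penalty
```

Layers outside `omega` are unconstrained by the penalty, so they can adapt freely to the small transformer-specific dataset.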
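2. The method of claim 1, wherein the multi-head attention mechanism is: $\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)W^{O}$; $\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$, where h is the number of attention heads, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the independent weight matrices of each head, and $W^{O}$ is the final output weight matrix.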
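A NumPy sketch of the scaled dot-product attention and the multi-head attention of claim 2, arranged cross-modally by assumption: queries come from vibration feature tokens, keys and values from oscillating-wave feature tokens. Token counts, dimensions, and the random weights are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o):
    """Concat(head_1, ..., head_h) W^O with head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)."""
    heads = [attention(Q @ wq, K @ wk, V @ wv) for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o

# Cross-modal arrangement (assumed): vibration tokens query oscillating-wave tokens.
rng = np.random.default_rng(0)
d_model, h = 8, 2
d_head = d_model // h
vib = rng.normal(size=(5, d_model))  # 5 vibration feature tokens
osc = rng.normal(size=(7, d_model))  # 7 oscillating-wave feature tokens
W_q = [rng.normal(size=(d_model, d_head)) for _ in range(h)]
W_k = [rng.normal(size=(d_model, d_head)) for _ in range(h)]
W_v = [rng.normal(size=(d_model, d_head)) for _ in range(h)]
W_o = rng.normal(size=(h * d_head, d_model))
fused = multi_head_attention(vib, osc, osc, W_q, W_k, W_v, W_o)  # shape (5, 8)
```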
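3. The method of claim 1, wherein the positional encoding formulas are: $PE_{(pos,2i)} = \sin\bigl(pos/10000^{2i/d_{\mathrm{model}}}\bigr)$; $PE_{(pos,2i+1)} = \cos\bigl(pos/10000^{2i/d_{\mathrm{model}}}\bigr)$, where pos is the position of the feature in the sequence, i is the dimension index of the feature vector, and $d_{\mathrm{model}}$ is the model dimension.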
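The sinusoidal positional encoding of claim 3 can be computed as follows, assuming the standard base of 10000; `positional_encoding` is a hypothetical helper name, and its output would be added element-wise to a feature sequence of shape `(seq_len, d_model)`.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions (base 10000 assumed)."""
    pos = np.arange(seq_len)[:, None]          # position of each feature in the sequence
    two_i = np.arange(0, d_model, 2)[None, :]  # even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle[:, : d_model // 2])  # slice handles odd d_model
    return pe
```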
4. The method of claim 1, wherein a VAE model is trained using only healthy and known-fault data. The VAE learns the probability distribution of the data in a low-dimensional latent space, its core being optimization of the evidence lower bound (ELBO): $\mathcal{L}(\theta,\phi;x) = \mathbb{E}_{q_\phi(z\mid x)}\bigl[\log p_\theta(x\mid z)\bigr] - D_{\mathrm{KL}}\bigl(q_\phi(z\mid x)\,\Vert\,p(z)\bigr)$, where x is the input data; z is the latent variable; $p_\theta(x\mid z)$ is the decoder's probability distribution, representing the ability to reconstruct the original data from the latent space; $q_\phi(z\mid x)$ is the encoder's approximate posterior; p(z) is the prior of the latent variable (usually a standard Gaussian); and $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence, which measures the difference between two probability distributions. Anomaly score calculation: in the diagnostic phase, the anomaly score of new test data is computed on the trained VAE and can be measured by the reconstruction error $\mathrm{score}(x) = \lVert x - \hat{x}\rVert^2$, where x is the input data and $\hat{x}$ is the model's reconstruction of it from the latent space. Because the model is trained only on healthy and known-fault data, its reconstruction error rises significantly when unknown-fault or abnormal data is input; setting a reconstruction error threshold therefore enables unsupervised identification of unknown faults.
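The ELBO objective and anomaly score of claim 4 can be sketched in NumPy under common assumptions: a diagonal Gaussian encoder output parameterized by `mu` and `logvar`, a standard-normal prior (for which the KL term has a closed form), and a unit-variance Gaussian decoder whose negative log-likelihood reduces, up to constants, to half the squared reconstruction error. The encoder and decoder networks themselves are omitted, and all function names are hypothetical.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form D_KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * float(np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar)))

def negative_elbo(x, x_hat, mu, logvar):
    """-ELBO, assuming a unit-variance Gaussian decoder (additive constants dropped)."""
    recon = 0.5 * float(np.sum((x - x_hat) ** 2))  # -log p(x|z) up to a constant
    return recon + kl_to_standard_normal(mu, logvar)

def reparameterize(mu, logvar, rng):
    """Reparameterization trick z = mu + sigma * eps; in an autodiff framework gradients flow through mu and logvar."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def anomaly_score(x, x_hat):
    """Reconstruction error used as the anomaly score (squared L2 assumed)."""
    return float(np.sum((x - x_hat) ** 2))
```

Minimizing `negative_elbo` during training is equivalent to maximizing the ELBO; at diagnosis time only `anomaly_score` is compared with the threshold.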
5. The method according to claim 4, characterized in that the knowledge graph of the VAE model is constructed as follows: the internal components of the transformer, fault types, vibration spectra, and attenuation coefficients are defined as entities; the relations between them, including location, generation, and indication, are defined, forming a set of triples from which a transformer fault knowledge graph is constructed.
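The entity and triple structure of claim 5 can be represented as below before training a TransE embedding. All entity and relation names are hypothetical examples of the listed categories (components, fault types, vibration spectra, attenuation coefficients; location, generation, indication), not entries taken from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str

# Hypothetical example triples; relations mirror "location", "generation", "indication".
TRIPLES = [
    Triple("winding_deformation", "located_in", "winding"),
    Triple("core_looseness", "located_in", "core"),
    Triple("winding_deformation", "generates", "vibration_spectrum_shift"),
    Triple("core_looseness", "generates", "vibration_harmonic_rise"),
    Triple("attenuation_coefficient_change", "indicates", "winding_deformation"),
]

def index_triples(triples):
    """Map entity and relation names to integer ids for an embedding model such as TransE."""
    entities = sorted({t.head for t in triples} | {t.tail for t in triples})
    relations = sorted({t.relation for t in triples})
    ent_id = {name: i for i, name in enumerate(entities)}
    rel_id = {name: i for i, name in enumerate(relations)}
    ids = [(ent_id[t.head], rel_id[t.relation], ent_id[t.tail]) for t in triples]
    return ent_id, rel_id, ids

ent_id, rel_id, id_triples = index_triples(TRIPLES)
```

Sorting the names gives a deterministic id assignment, so embeddings trained in one run can be matched to entities in another.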
6. The method according to claim 5, characterized in that the knowledge graph embedding and reasoning of the VAE model is as follows: the TransE model is used to embed the knowledge graph, mapping each entity and relation into a low-dimensional vector space. The core idea of TransE is that, for any correct triple (h, r, t), the sum of the head entity vector h and the relation vector r should approximately equal the tail entity vector t, i.e. $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. The scoring function of the model is $f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{L_1/L_2}$, where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the vector representations of the head entity, relation, and tail entity, respectively, and $\lVert\cdot\rVert_{L_1/L_2}$ denotes the L1 or L2 norm.
7. The method of claim 6, wherein the optimization of the VAE model is as follows: the TransE model is trained by minimizing a margin-based ranking loss, $L = \sum_{(h,r,t)\in S}\sum_{(h',r,t')\in S'} \max\bigl(0,\ \gamma + f(h,r,t) - f(h',r,t')\bigr)$, where S is the set of correct triples, S' is the set of incorrect triples generated by corrupting the correct triples, and the margin $\gamma$ is the required separation between the scores of positive and negative samples; this loss drives the scores of correct triples below those of incorrect triples, improving the model's inference ability. Fault severity assessment and remaining life prediction: for identified faults, the system combines the distance score from the TransE model with the reconstruction error of the VAE to assess fault severity. For time-series data, the system can further integrate an SNN-LSTM hybrid model for remaining useful life (RUL) prediction, whose output formula combines the hidden-layer state vector of the SNN with the cell state vector of the LSTM at time t.
8. The method of claim 1, wherein, to improve the accuracy and completeness of fault diagnosis, the unsupervised anomaly identification of the VAE is combined with the supervised classification of the surrogate model, with the following specific steps: Step 1: VAE anomaly screening. The multimodal signal $X_{\mathrm{real}}$, acquired in real time and preprocessed, is input into the VAE model to calculate the reconstruction error err between $X_{\mathrm{real}}$ and $\hat{X}_{\mathrm{real}}$, the reconstruction of $X_{\mathrm{real}}$ output by the VAE decoder, and the minimum distance $d_{\min} = \min_k\lVert Z_{\mathrm{real}} - U_k\rVert$ between the latent variable $Z_{\mathrm{real}}$ and the latent means of all fault types in the VAE distribution library, where $Z_{\mathrm{real}}$ is the low-dimensional latent variable obtained by mapping $X_{\mathrm{real}}$ through the VAE encoder and $U_k$ is the latent mean of state k in the VAE distribution library. If $err < \tau_H$, the initial screening result is a healthy state; the surrogate model is not invoked, and the diagnosis "device healthy" is output directly. If $\tau_H \le err \le \tau_K$ and $d_{\min} \le \tau_d$, the result is a known fault candidate, and the data are passed to the surrogate model for accurate classification based on the distribution characteristics of known faults. If $err > \tau_K$ or $d_{\min} > \tau_d$, the result is an unknown fault candidate, which triggers an "unknown fault alarm" while the original signal and feature data are recorded. Here $\tau_H$ is the maximum reconstruction error threshold for healthy data, taken as the maximum reconstruction error over all healthy samples during VAE training, and separates "healthy" from "abnormal"; $\tau_K$ is the average reconstruction error threshold for known fault data, taken as the average reconstruction error over all known fault samples during VAE training, and separates "known faults" from "unknown faults"; $\tau_d$ is the latent-space distance threshold, typically taken as the 95th percentile of the latent distances of all known fault samples, which ensures that a "known fault candidate" signal is sufficiently close to the distribution of some fault type.
Step 2: accurate classification by the surrogate model. The $X_{\mathrm{real}}$ corresponding to a known fault candidate in Step 1, together with its features, is input into the fine-tuned surrogate model, which, using the general anomaly patterns learned in the pre-training phase and the transformer-specific fault patterns adapted in the fine-tuning phase, outputs the following core results. Result 1: the fault type K, an identifier corresponding one-to-one with the index k of $U_k$ in the VAE distribution library, which specifies the particular category of fault. Result 2: the fault severity score $S_{\mathrm{sev}}$, an index quantifying fault severity that is computed from the feature-severity mapping learned by the surrogate model, with a value range of 0 to 10, where 0 represents no fault and 10 represents an extremely severe fault. Step 3: result verification and correction. The classification result of the surrogate model is verified against the latent-space distribution of the VAE.
KL divergence calculation: for the fault type K output by the surrogate model, the KL divergence of the real-time signal's latent variable from the latent distribution of the K-th fault type is calculated and denoted $D_K$, measuring the difference between the two distributions, where $\Sigma_K$ is the covariance matrix of the latent variables of fault type K in the VAE distribution library. If $D_K$ does not exceed the empirical threshold $\tau_D$, the classification result is valid and "fault type + severity + RUL prediction" is output; otherwise the classification result is questionable and is corrected by knowledge-graph reasoning. Step 4: dynamic model update. The unknown fault candidate data from Step 1 are manually labeled and added to the VAE training set, and the VAE is retrained to update the distribution library; the same data are added as new samples to the surrogate model's augmentation dataset, triggering incremental fine-tuning of the surrogate model.