A switch cabinet operation state monitoring method based on multi-modal fusion features
A multimodal fusion feature monitoring method combines infrared thermal imaging, partial discharge ultra-high frequency signals, low-frequency vibration waveforms, and acoustic emission signals with wavelet packet decomposition and graph convolutional networks. It addresses the limited accuracy and poor early fault identification of traditional monitoring methods under complex operating conditions, enabling accurate identification and visual diagnosis of switchgear status.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INNER MONGOLIA YULI ELECTRIC CO LTD
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional switchgear monitoring methods rely on single-mode signals, making it difficult to accurately identify potential faults under complex operating conditions. They are particularly susceptible to noise interference in environments with high humidity, high salt spray, strong electromagnetic interference, or background harmonics, and they struggle to capture hidden anomalies such as micro-arc discharges, resulting in delayed early warnings and a high false alarm rate. Existing technologies also struggle to effectively integrate multi-mode information.
A multimodal fusion feature monitoring method is adopted to acquire infrared thermal imaging, partial discharge ultra-high frequency signals, low frequency vibration waveforms and acoustic emission signals. Features are extracted by wavelet packet decomposition, entropy analysis and spectral energy distribution. Combined with multi-scale attention mechanism and graph convolutional network, a dynamic weight map between modes is constructed, and a deep fusion feature vector sequence is output. An anomaly recognition subnetwork is constructed and a state diagnosis map is generated.
The method improves the accuracy and robustness of switchgear operation status identification and enables early detection of minor faults. It also offers interpretability and visualization capabilities, assisting maintenance personnel in anomaly tracing and precise repair.
Figure CN122241581A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of switchgear operation status monitoring technology, and specifically to a switchgear operation status monitoring method based on multimodal fusion features.
Background Technology
[0002] With the continuous improvement of the intelligence level of power systems, switchgear has become one of the key electrical devices, and real-time monitoring of its operating status plays a vital role in ensuring the safe and stable operation of the power grid. Traditional switchgear monitoring methods mostly rely on single-mode sensor data, such as temperature, current, voltage, or partial discharge, and use threshold settings for alarms or trend judgments. However, under complex operating conditions, such as high humidity, high salt spray, strong electromagnetic interference, or background harmonics, single-mode signals are easily distorted by noise interference, leading to significant deviations in monitoring results and making it difficult to identify potential fault risks in a timely and accurate manner.
[0003] More seriously, when hidden, non-thermal anomalies occur inside the switchgear, such as micro-arc discharge, localized insulation carbonization, and the initiation of contact oxidation fatigue cracks, traditional methods lack the ability to characterize deep physical damage mechanisms and struggle to capture early signs of their evolution. This leads to delayed warnings and a high false alarm rate, especially under low load or intermittent operation. Existing technologies also struggle to effectively fuse multimodal information, exhibiting problems such as difficulty in timing alignment, complex scale coupling, and redundant feature dimensions, limiting fine-grained modeling and accurate identification of switchgear conditions.
Summary of the Invention
[0004] The purpose of this invention is to provide a method for monitoring the operating status of switchgear based on multimodal fusion features, so as to overcome the shortcomings of the prior art.
[0005] To achieve the above objective, the present invention provides the following technical solution: a method for monitoring the operating status of switchgear based on multimodal fusion features, comprising:
Acquiring infrared thermal imaging data, partial discharge ultra-high frequency signals, low-frequency vibration waveforms and acoustic emission signals collected during the operation of the switchgear, and performing time synchronization and spatial registration to construct the original multimodal time series dataset;
Performing wavelet packet decomposition, entropy analysis, and spectral energy distribution extraction on the multimodal time series dataset to obtain a primary feature matrix characterizing the thermo-electric-acoustic-vibration co-evolution characteristics;
Inputting the primary feature matrix into a graph convolutional network based on a multi-scale attention mechanism to construct a dynamic weight graph between modalities and output a deep fused feature vector sequence with temporal dependence and modal complementarity;
Constructing an anomaly recognition subnetwork based on the deep fusion feature vector sequence, extracting short-term feature fragments using a sliding window method, and applying contrastive learning to generate a state embedding space that can measure the evolution trend of minor damage;
Constructing, in the state embedding space, a central boundary threshold model from historical health samples to dynamically measure the distance to the current operating state vector, identify potential anomalies, and output early warning signals;
If a potential anomaly is identified, locating the key modal channels and anomaly occurrence areas by reverse tracing based on modal contribution to form a status diagnosis map, performing semantic annotation using an expert knowledge base, and outputting a switchgear operation health report.
[0006] Preferably, the step of obtaining the primary feature matrix characterizing the thermo-electric-acoustic-vibration co-evolution characteristics includes: Wavelet packet decomposition is performed on the synchronously registered infrared thermal images, partial discharge ultra-high frequency signals, low-frequency vibration waveforms and acoustic emission signals to extract the energy distribution characteristics of the sub-signals of each mode in different frequency bands; Based on the energy entropy, permutation entropy and sample entropy values of each frequency band sub-signal, a multi-scale entropy feature vector is constructed; A short-time Fourier transform is performed on the wavelet-reconstructed sub-signals to extract the spectral energy density distribution and frequency centroid; Entropy features and spectral features are concatenated and fused by time window to construct a primary feature matrix for multimodal collaborative expression.
[0007] Preferably, the step of constructing the inter-modal dynamic weight map includes: Construct a static topology graph with modal feature channels as graph nodes, and define an adjacency matrix to represent the initial connection relationships between different modes; A multi-scale attention mechanism is introduced, which calculates attention weight vectors at the global scale, local window scale, and intramodal channel scale, and dynamically adjusts the graph edge weights. The primary feature matrix is mapped to the graph space, and features are propagated through a multi-layer graph convolutional network, fusing structural information and dynamic weights. Residual connections and layer normalization operations are added after each graph convolutional layer to output a dynamic intermodal weight graph containing intermodal interaction dependencies.
[0008] Preferably, the step of outputting a deep fused feature vector sequence with temporal dependence and modal complementarity includes: The fused feature maps processed by the graph convolutional network are reorganized according to the time window order to construct a preliminary temporal feature sequence. A bidirectional long short-term memory network is introduced to perform temporal modeling on the preliminary temporal feature sequence, extracting forward and backward temporal dependency information respectively, and fusing them into a bidirectional contextual feature representation; At each time point, a channel attention mechanism is applied to automatically assign representation weights to each modal feature at that time point. The fused temporal features are subjected to layer normalization and linear transformation to output a deep fused feature vector sequence.
[0009] Preferably, the step of constructing an anomaly recognition subnetwork based on the deep fusion feature vector sequence includes: The deep fusion feature vector sequence is segmented according to a fixed-length sliding time window to construct a set of short-time state feature segments; Each short-time state feature segment is input into a contrastive coding network with a dual-tower structure, in which one tower models the current running state and the other tower models historical health samples, and the output is an embedding vector pair; A contrastive loss function is used to optimize the embedding vector pairs, so that the distances between normal states shrink and the distances between abnormal states and normal states grow, thus constructing a discriminative state embedding space; A central boundary threshold model is introduced into the state embedding space, and a multidimensional spherical boundary is constructed with all historical healthy embedding vectors as references to determine whether the current feature embedding has exceeded the boundary and to output the anomaly judgment result.
[0010] Preferably, the step of generating a state embedding space that can measure the evolution trend of minute damage includes: The deep fusion feature vector sequence is divided into segments in chronological order using a fixed-length sliding window. The window length is set to 10 time steps and the step size is 5 time steps, and short-time feature fragments are extracted. For each short-term feature segment, construct positive sample pairs and negative sample pairs, where positive sample pairs come from historical healthy states or adjacent windows, and negative sample pairs contain a combination of potential abnormal states and healthy states; The sample pairs are input into a feature encoder network with shared parameters, the embedding vectors are extracted, and the cosine distance is used as a similarity metric. The embedding space structure is optimized based on the contrastive loss function, which clusters similar state embedding vectors and separates different state embedding vectors, generating a state embedding space that can measure the evolution trend of minor damage.
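The window segmentation and cosine-distance comparison described in this step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patented encoder: the function names, the pairwise form of the contrastive loss, and the `margin` value are assumptions, since the text only states that a contrastive loss with cosine distance is used.

```python
import math

def sliding_windows(sequence, length=10, step=5):
    """Split a sequence of feature vectors into overlapping fixed-length segments."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def contrastive_loss(distance, is_positive, margin=0.5):
    """Pairwise contrastive loss (assumed form; margin is a hypothetical value).

    Positive pairs are pulled together; negative pairs are pushed apart
    until their distance exceeds the margin.
    """
    if is_positive:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

# 25 time steps with window length 10 and step 5 give windows starting at 0, 5, 10, 15.
segments = sliding_windows(list(range(25)))
```

In a trained system the embeddings passed to `cosine_distance` would come from the shared-parameter encoder; here any two equal-length vectors can be used to try the functions.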
[0011] Preferably, the step of dynamically measuring the distance to the current running state vector includes: Collect the embedding vectors of multiple historical health state segments, calculate their center point vectors, and use them as the embedding mean representation of the health state. The maximum distance between all healthy embedding vectors and the center point is calculated using Euclidean distance and set as the boundary threshold radius to form a multidimensional spherical boundary model. Calculate the distance between the embedded vector generated from the current running state segment and the center point, and compare it with the boundary radius; If the distance exceeds the boundary radius, it is determined to be an out-of-bounds abnormal state, and an early warning signal is triggered and the abnormal detection result at the corresponding time and location is output.
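The spherical boundary model in this step reduces to a centroid and a maximum Euclidean radius. A minimal sketch, assuming embeddings are plain lists of floats and using hypothetical function names:

```python
import math

def fit_boundary(healthy_embeddings):
    """Return (center, radius) of the spherical boundary around healthy embeddings."""
    n = len(healthy_embeddings)
    dim = len(healthy_embeddings[0])
    # Center point: element-wise mean of all historical healthy embeddings.
    center = [sum(e[k] for e in healthy_embeddings) / n for k in range(dim)]
    # Radius: largest Euclidean distance from any healthy embedding to the center.
    radius = max(math.dist(e, center) for e in healthy_embeddings)
    return center, radius

def is_out_of_bounds(embedding, center, radius):
    """True when the current state embedding lies outside the healthy boundary."""
    return math.dist(embedding, center) > radius

healthy = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center, radius = fit_boundary(healthy)
```

Because the radius is the maximum healthy distance, a single outlier in the healthy set enlarges the boundary; that sensitivity follows directly from the rule as stated in the text.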
[0012] Preferably, the steps for forming a state diagnostic map include: The deep fusion feature vectors corresponding to the abnormal state embedding vectors are extracted, and inverse gradient sensitivity analysis is applied to their input channels to obtain the contribution score of each modal channel to the abnormal output. The contribution scores are normalized, and the top two key modal channels with the highest contribution are selected. The corresponding original feature segments are backtracked under the key modalities, and the significant feature mutation regions are located by combining the temporal position. The modal channels, time segments, and spatial sampling positions are correlated and mapped to generate a state diagnosis map.
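The normalization and top-two selection of modal contribution scores can be sketched as below. The scores are assumed to be already available from the gradient sensitivity analysis; taking absolute values before normalizing, the channel names, and the example numbers are illustrative assumptions.

```python
def top_modal_channels(scores, k=2):
    """Normalize contribution scores to sum to 1 and return the k largest channels."""
    total = sum(abs(s) for s in scores.values())
    normalized = {name: abs(s) / total for name, s in scores.items()}
    ranked = sorted(normalized, key=normalized.get, reverse=True)
    return ranked[:k], normalized

# Hypothetical sensitivity scores for the four modal channels.
example_scores = {"thermal": 0.1, "uhf": 0.6, "vibration": 0.05, "acoustic": 0.25}
key_channels, normalized = top_modal_channels(example_scores)
```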
[0013] The technical effects and advantages provided by the present invention in the above technical solution are as follows: 1. This invention constructs a multimodal time-series dataset by fusing four types of modal information: infrared thermal imaging, partial discharge ultra-high frequency signals, low-frequency vibration waveforms, and acoustic emission signals. Multidimensional features are extracted using wavelet packet decomposition, entropy analysis, and spectral energy extraction methods. Furthermore, a multi-scale attention mechanism and graph convolutional networks are combined to achieve dynamic weight modeling and deep fusion representation among modalities. This technical solution addresses the problems of strong signal interference, delayed anomaly identification, and weak feature representation capabilities in traditional single-modal monitoring methods under complex operating conditions, improving the accuracy and robustness of switchgear operating status identification and the early detection of minor faults.
[0014] 2. This invention constructs a state embedding space based on contrastive learning, employs a central boundary threshold model to quantify the operating state, and after identifying potential anomalies, performs reverse tracing based on modal contribution to accurately locate key modal channels and anomaly occurrence areas. Finally, it generates a state diagnostic map and outputs a health report by combining it with an expert knowledge base. This method possesses high interpretability, visualization, and traceability capabilities, effectively assisting maintenance personnel in anomaly tracing, risk assessment, and precise maintenance, thereby improving the intelligence and automation level of substation condition monitoring.
Attached Figure Description
[0015] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this invention. For those skilled in the art, other drawings can be obtained based on these drawings.
[0016] Figure 1 is a flowchart of the method of the present invention.
Detailed Implementation
[0017] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0018] Referring to Figure 1, in this embodiment a switchgear operation status monitoring method based on multimodal fusion features includes: Infrared thermal imaging data, partial discharge ultra-high frequency signals, low-frequency vibration waveforms and acoustic emission signals collected during the operation of the switchgear are acquired, and time synchronization and spatial registration are performed to construct the original multimodal time series dataset.
[0019] To achieve comprehensive, microscopic, and multi-dimensional monitoring of the switchgear's operating status, this invention conducts joint monitoring of characteristics from four physical domains: thermal, electrical, acoustic, and vibration. It also performs time synchronization and spatial registration of the data to construct a unified original multimodal time-series dataset. Specifically, this includes: Infrared thermal imaging data acquisition: An uncooled infrared thermal imager (e.g., FLIR A series, wavelength range 8–14 μm, thermal sensitivity <50 mK) is selected and mounted on a bracket 1.5 meters directly in front of the switch cabinet, facing the main busbar area of the cabinet perpendicularly. The imager's acquisition frequency is set to 10 frames per second, and the image resolution is 640×480. To avoid interference from environmental heat sources, the image acquisition area is limited to three main areas: contact connection points, cable outlets, and disconnect switches. The temperature matrix data of each pixel is recorded in real time.
[0020] Partial discharge ultra-high frequency (UHF) signal acquisition: A UHF electromagnetic wave sensor with a passband of 300 MHz to 1.5 GHz is deployed inside the partial discharge window of the switchgear and mounted on the inner wall of the metal casing using a magnetic bracket. The acquisition system uses a high-speed data acquisition card (such as NI PXIe-5162), with a sampling rate of 500 MS/s and a 1 ms acquisition time window. Trigger thresholds and event tracking mechanisms are configured to achieve accurate time-based capture and energy assessment of partial discharge pulse events.
[0021] Low-frequency vibration waveform acquisition: An IEPE-type triaxial accelerometer (such as PCB 356A45) is installed on the exterior of the switchgear closing spring mechanism housing. The sensor is magnetically attached to the equipment surface. The sampling frequency is set to 5 kHz, and each acquisition lasts 10 s. The focus is on monitoring changes in impact vibration during mechanism operation to identify non-standard actions caused by wear, corrosion, or misalignment.
[0022] Acoustic emission signal acquisition: A broadband piezoelectric ceramic acoustic emission sensor (such as RS AE-921) with a frequency response range of 100 kHz to 1 MHz is selected and fixed to the switchgear support frame using a coupling agent. The signal passes through a low-noise preamplifier and is then fed to a high-speed acquisition card (sampling rate 10 MHz). The acquisition window is set to 2 ms and is mainly used to detect transient energy release events before breakdown.
[0023] Since various sensors have different sampling frequencies and acquisition periods, this invention uses a high-precision synchronous clock module (supporting the IEEE 1588 PTP protocol) as a unified time reference for the entire system: each frame of thermal imaging data is accompanied by a UTC timestamp; UHF and acoustic emission signals are locked by triggering the built-in clock of the sampling card; soft synchronization between vibration and thermal images is achieved through a trigger circuit; finally, all data are uniformly mapped to a time axis aligned to the millisecond level to ensure the consistency of the timing of events.
[0024] In addition, to address the differences in temporal resolution among the modalities, a sliding window resampling and interpolation alignment mechanism is adopted: taking the infrared image frame rate as the reference, high-frequency data (UHF, acoustic emission) are downsampled by calculating the energy envelope features within the corresponding time window, ensuring that the modal feature vectors lie on a uniform time grid.
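The downsampling of high-rate signals to the infrared frame rate can be sketched as one energy value per frame period. This is a simplified illustration that assumes a continuous sample stream and a non-overlapping window equal to one frame period; the function name and the toy numbers are hypothetical, and interpolation for gaps between triggered acquisitions is not shown.

```python
def energy_envelope(samples, sample_rate, frame_rate=10):
    """Reduce a high-rate signal to one energy value per thermal-image frame period."""
    window = int(sample_rate // frame_rate)   # samples per frame period
    n_frames = len(samples) // window         # incomplete trailing window is dropped
    return [sum(x * x for x in samples[i * window:(i + 1) * window])
            for i in range(n_frames)]

# Toy example: 10 samples at 10 Hz, reference frame rate 2 frames per second.
envelope = energy_envelope([1, 1, 1, 1, 1, 2, 2, 2, 2, 2], sample_rate=10, frame_rate=2)
```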
[0025] To accurately map the various modal signals onto the switchgear structural model, a three-dimensional spatial registration model is constructed. The specific method is as follows: A simplified 3D model is created based on the switchgear structural drawings, and the spatial coordinates of key components (such as the main busbar, center isolator, and wiring terminals) are defined; Each sensor's spatial position (XYZ coordinates) and orientation are recorded during deployment; For image data, the camera view is transformed to the view of the structural model, and thermal imaging pixels are mapped to the corresponding physical components through homography matrix matching; For the other sensors (UHF, vibration, acoustic emission), the sensor installation position is used as the center to anchor the modal output features to the corresponding position of the structural model, forming a unified spatial feature distribution map.
[0026] Finally, after the above steps, a structured original multimodal time series dataset is constructed in the following form: $D = \{T, L, M_1, M_2, M_3, M_4\}$, where $T$ is a unified timestamp sequence, $L$ is a spatial location anchor matrix, and $M_1$ to $M_4$ are the feature vector sequences of the thermal imaging, UHF, vibration, and acoustic emission modes, respectively.
[0027] This dataset provides a complete underlying data foundation for subsequent multi-scale feature extraction, graph model construction, and state recognition.
[0028] Wavelet packet decomposition, entropy analysis, and spectral energy distribution extraction are performed on the multimodal time series dataset to obtain a primary feature matrix characterizing the thermo-electric-acoustic-vibration co-evolutionary properties.
[0029] After constructing the multimodal time-series dataset, this embodiment performs time-frequency domain analysis and information entropy extraction on infrared thermal imaging data, partial discharge ultra-high frequency signals, low-frequency vibration waveforms, and acoustic emission signals to obtain a primary feature matrix reflecting the co-evolution characteristics of multi-source physical processes involving heat, electricity, sound, and vibration. The specific steps are as follows: For each modal data, wavelet packet decomposition is used to divide the signal into multiple frequency bands. Let the input signal be $S(t)$, and select a Daubechies-class orthogonal wavelet basis (preferably db4) as the mother wavelet, decomposing the signal to the third level and obtaining $2^3 = 8$ frequency band sub-signal components, denoted $S_i(t)$, $i = 1, \dots, 8$. For each sub-signal component, its energy is calculated as $E_i = \sum_{n=1}^{N} |S_i(n)|^2$, where $N$ is the number of samples of the sub-signal.
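[0030] Then, the energies of all frequency bands are normalized to construct an energy distribution vector $P = [p_1, \dots, p_8]$, where $p_i = E_i / \sum_{j=1}^{8} E_j$ and $E_i$ is the energy of the $i$-th frequency band sub-signal. This vector describes the energy distribution characteristics of the modal signal at different frequencies.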
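The band energy and normalization steps can be illustrated as below. The sketch assumes the eight sub-signals have already been produced by a wavelet packet decomposition (not shown here) and are passed in as lists of samples; the function names and toy values are hypothetical.

```python
def band_energies(sub_signals):
    """Energy of each frequency band sub-signal: sum of squared samples."""
    return [sum(x * x for x in s) for s in sub_signals]

def energy_distribution(energies):
    """Normalize band energies so that they sum to 1."""
    total = sum(energies)
    if total == 0:
        raise ValueError("all band energies are zero; distribution is undefined")
    return [e / total for e in energies]

# Toy input: eight bands, only the first two carry energy.
bands = [[1.0, 1.0], [2.0, 0.0]] + [[0.0, 0.0]] * 6
energies = band_energies(bands)
distribution = energy_distribution(energies)
```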
[0031] For each frequency band sub-signal obtained from wavelet packet decomposition, statistical information entropy indices are further extracted, including energy entropy, permutation entropy, and sample entropy: Energy entropy measures the degree of unevenness in the energy distribution and is expressed as $H_E = -\sum_{i=1}^{8} p_i \log p_i$, where $p_i = E_i / \sum_{j=1}^{8} E_j$ represents the energy proportion of the $i$-th frequency band, also known as the energy probability distribution value; Permutation entropy measures the change pattern of the local structure of a signal. The original sub-signal is divided into a sequence of reconstructed vectors with an embedding dimension of $m = 3$ and a delay of $\tau = 1$. The frequency of each ordinal pattern (the relative ordering of the elements in a reconstructed vector) is counted to construct a probability distribution, and its entropy value is computed.
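Energy entropy and permutation entropy as described above can be sketched in a few lines. The natural logarithm is an assumption (the text does not state the base), and the function names are hypothetical.

```python
import math
from collections import Counter

def energy_entropy(p):
    """Shannon entropy of an energy distribution; zero-probability bands contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def permutation_entropy(signal, m=3, tau=1):
    """Entropy of the ordinal-pattern distribution (embedding dimension m, delay tau)."""
    patterns = Counter()
    for i in range(len(signal) - (m - 1) * tau):
        window = [signal[i + j * tau] for j in range(m)]
        # Ordinal pattern: the index order that sorts the window (ties keep position order).
        pattern = tuple(sorted(range(m), key=window.__getitem__))
        patterns[pattern] += 1
    total = sum(patterns.values())
    return -sum((c / total) * math.log(c / total) for c in patterns.values())

uniform_entropy = energy_entropy([0.25, 0.25, 0.25, 0.25])   # maximal for four bands
triangle_entropy = permutation_entropy([1, 2, 3, 2, 1])       # three distinct patterns
```

A monotone signal produces a single ordinal pattern and therefore a permutation entropy of zero, which is a convenient sanity check.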
[0032] Sample entropy is used to measure signal complexity. The embedding dimension is set to $m = 2$, and the tolerance threshold is set to 0.2 times the standard deviation of the signal. The proportions of pairs of reconstructed vectors whose distance is less than this tolerance are calculated at dimensions $m$ and $m + 1$, and the negative logarithm of the ratio of the two proportions is used as the sample entropy.
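A compact sample entropy sketch follows. It uses the common formulation with Chebyshev distance between templates and the same number of templates at lengths m and m + 1; the distance metric, the use of the population standard deviation, and the function name are assumptions not stated in the text.

```python
import math
import statistics

def sample_entropy(signal, m=2, r_factor=0.2):
    """Sample entropy: -ln(A / B) with tolerance r = r_factor * std(signal)."""
    n = len(signal)
    r = r_factor * statistics.pstdev(signal)

    def count_matches(length):
        # Use n - m templates for both lengths so the two counts are comparable.
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates.
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) < r:
                    count += 1
        return count

    b = count_matches(m)        # matching pairs at length m
    a = count_matches(m + 1)    # matching pairs at length m + 1
    if a == 0 or b == 0:
        return float("inf")     # undefined ratio: treat as maximally irregular
    return -math.log(a / b)

periodic = sample_entropy([0.0, 1.0] * 10)   # perfectly regular signal
```

The double loop is quadratic in the signal length, which is acceptable for short windows but would need a faster implementation for long records.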
[0033] Each type of entropy value is concatenated to form an entropy feature vector of length 3, which is used to reflect the complexity change of the modal signal under the current time window.
[0034] A short-time Fourier transform is performed on each frequency band sub-signal obtained from wavelet packet decomposition, using a Hamming window with a window length of 128 points and a sliding step of 64 points. The spectrum $F(f, t)$ of the signal within each window is calculated.
[0035] Based on the obtained spectrum results, the following two types of frequency domain features are extracted: Spectral energy density: within each time window, the integral of the power spectrum over the frequency range $[f_1, f_2]$ is computed, reflecting the energy concentration of the signal within that specific frequency band; Centroid frequency: the frequency at the center of gravity of the spectrum, calculated as $f_c = \frac{\sum_k f_k P(f_k)}{\sum_k P(f_k)}$, where $k$ is the frequency index and $P(f_k)$ is the power spectral density at the corresponding frequency.
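The two spectral features can be illustrated with a small standard-library sketch that frames a signal (128-point Hamming window, 64-point step), computes a one-sided power spectrum with a naive DFT, and derives the band energy and centroid frequency. The band limits, the test tone, and all function names are illustrative assumptions; a production implementation would use an FFT.

```python
import cmath
import math

def stft_frames(signal, window=128, hop=64):
    """Split a signal into overlapping frames for short-time analysis."""
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, hop)]

def power_spectrum(frame):
    """One-sided power spectrum of a Hamming-windowed frame (naive DFT)."""
    n = len(frame)
    x = [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, v in enumerate(frame)]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def band_energy(power, sample_rate, n, f_low, f_high):
    """Discrete approximation of the power spectrum integral over [f_low, f_high]."""
    return sum(p for k, p in enumerate(power) if f_low <= k * sample_rate / n <= f_high)

def centroid_frequency(power, sample_rate, n):
    """Power-weighted mean frequency of the spectrum."""
    freqs = [k * sample_rate / n for k in range(len(power))]
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)

# Toy signal: a 100 Hz tone sampled at 1280 Hz (bin spacing 10 Hz for 128 points).
fs = 1280
tone = [math.sin(2 * math.pi * 100 * t / fs) for t in range(256)]
frames = stft_frames(tone)
spectra = [power_spectrum(f) for f in frames]
centroids = [centroid_frequency(p, fs, 128) for p in spectra]
```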
[0036] This step is used to extract the frequency shift and energy change trends of the signal over a local time period, which helps to identify non-steady-state disturbances or latent anomalies.
[0037] The obtained entropy feature vector (energy entropy, permutation entropy, sample entropy) and the obtained spectral feature vector (spectral energy density, frequency centroid) are horizontally concatenated along the time window dimension to construct a local modal feature vector of length 5.
[0038] For each modality, the sliding window width is set to 1 second and the sliding step size is 0.5 seconds. The above-mentioned fusion features are extracted window by window, and finally the feature sequence of the modality over the entire observation period is constructed.
[0039] The feature sequences of the four modalities are aligned along the time dimension and then concatenated along the feature dimension to form a primary feature matrix of size $T \times 20$, where $T$ is the number of time windows and 20 indicates that each time step combines the 5-dimensional feature representations of the four modalities.
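Assembling the primary feature matrix amounts to concatenating the four 5-dimensional vectors of each time window into one 20-dimensional row. A minimal sketch with hypothetical function names, including a helper that counts how many 1 s windows with a 0.5 s step fit into an observation period:

```python
def count_windows(duration_s, width_s=1.0, step_s=0.5):
    """Number of full sliding windows that fit into the observation period."""
    return int((duration_s - width_s) / step_s) + 1

def build_primary_matrix(thermal, uhf, vibration, acoustic):
    """Concatenate per-window feature vectors of the four modalities row by row."""
    sequences = [thermal, uhf, vibration, acoustic]
    n_windows = len(thermal)
    if any(len(s) != n_windows for s in sequences):
        raise ValueError("all modalities must cover the same time windows")
    return [thermal[t] + uhf[t] + vibration[t] + acoustic[t] for t in range(n_windows)]

# Toy data: 3 time windows, 5 features per modality.
seq = [[float(t)] * 5 for t in range(3)]
matrix = build_primary_matrix(seq, seq, seq, seq)
```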
[0040] This feature matrix serves as the input basis for subsequent multimodal deep feature fusion and fault mode recognition, possessing excellent multi-domain physical meaning representation capabilities and modal complementarity.
[0041] The primary feature matrix is input into a graph convolutional network based on a multi-scale attention mechanism to construct a dynamic weight graph between modalities, and outputs a deep fused feature vector sequence with temporal dependence and modal complementarity after fusion.
[0042] To further model the structural dependencies and interactive coupling characteristics among multimodal features, this invention, based on graph modeling theory, introduces a combination of multi-scale attention mechanisms and graph convolutional networks to construct an intermodal dependency graph with dynamic weight update capabilities. The specific implementation process is as follows: Each modal feature channel in the primary feature matrix is regarded as a node in the graph structure. There are four types of modes: infrared thermal imaging, partial discharge ultra-high frequency signal, low frequency vibration waveform and acoustic emission signal. Each mode corresponds to one graph node, forming a graph structure with four nodes.
[0043] The initial graph structure is represented as an undirected graph with adjacency matrix $A_0 = [a_{ij}] \in \mathbb{R}^{4 \times 4}$, where: if there is a known physical coupling between two modes (such as thermo-electric or electro-acoustic), the corresponding element is $a_{ij} = 1$; otherwise $a_{ij} = 0$; the diagonal elements are $a_{ii} = 1$, indicating a self-loop.
[0044] This adjacency matrix is used to define the static topology and provide an initial connection template for subsequent dynamic graph convolution.
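A static adjacency matrix of this kind can be built from a list of coupled mode pairs. In the sketch below the entries 1 and 0 for coupled and uncoupled pairs, the mode names, and the particular coupling list are assumptions made for illustration; the UHF channel is treated as the electrical mode.

```python
MODES = ["thermal", "uhf", "vibration", "acoustic"]

def static_adjacency(couplings):
    """Symmetric 4x4 adjacency matrix with self-loops on the diagonal."""
    index = {name: i for i, name in enumerate(MODES)}
    n = len(MODES)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in couplings:
        a[index[u]][index[v]] = 1.0
        a[index[v]][index[u]] = 1.0   # undirected graph: mirror the edge
    return a

# Hypothetical couplings: thermo-electric and electro-acoustic.
adjacency = static_adjacency([("thermal", "uhf"), ("uhf", "acoustic")])
```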
[0045] To endow the graph structure with adaptive modeling capabilities for time-varying dependencies, a three-scale attention mechanism is introduced to dynamically adjust the edge weights in the adjacency matrix. The three scales are: Global scale attention: A global context vector is constructed from the global average pooling result of node features across all time windows. Its global dependency distribution is then extracted through two fully connected layers to generate a global attention coefficient matrix between nodes, $A_g$; Local window scale attention: Within a sliding time window, the dot product attention function is used to calculate the similarity of node features within each window, generating a locally temporally related attention matrix $A_l$; Modal channel scale attention: A channel attention mechanism is applied to the feature channels within each modal node, a nested activation function network is constructed using the variance and mean between channels, and the node self-attention coefficient matrix $A_c$ is output.
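[0046] Finally, the three types of attention matrices are weighted and fused to construct a dynamic adjacency matrix $A_{dyn} = \lambda_1 A_g + \lambda_2 A_l + \lambda_3 A_c$, where $A_g$, $A_l$, and $A_c$ are the global, local window, and modal channel attention matrices, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the normalized weight coefficients, set to 0.4, 0.4, and 0.2 respectively, which can be fine-tuned based on the model's validation performance during training.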
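The weighted fusion of the three attention matrices is an element-wise linear combination. A minimal sketch with the stated weights 0.4, 0.4, and 0.2; the function name and the toy 2×2 matrices are hypothetical.

```python
def fuse_attention(a_global, a_local, a_channel, weights=(0.4, 0.4, 0.2)):
    """Element-wise weighted sum of three equally sized attention matrices."""
    w_g, w_l, w_c = weights
    n = len(a_global)
    return [[w_g * a_global[i][j] + w_l * a_local[i][j] + w_c * a_channel[i][j]
             for j in range(n)]
            for i in range(n)]

# Toy 2x2 matrices standing in for the 4x4 attention matrices.
a_g = [[1.0, 1.0], [1.0, 1.0]]
a_l = [[0.5, 0.5], [0.5, 0.5]]
a_c = [[1.0, 0.0], [0.0, 1.0]]
dynamic = fuse_attention(a_g, a_l, a_c)
```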
[0047] The fused feature vectors extracted from each modality in the primary feature matrix at each time window are mapped to the corresponding graph nodes as initial input features $H^{(0)} \in \mathbb{R}^{T \times 4 \times d}$, where $T$ is the number of time windows, 4 is the number of modal nodes, and $d$ is the feature dimension per modality.
[0048] A graph convolution algorithm based on Chebyshev polynomial approximation is adopted to perform convolution operations on the graph structure within each time window. The convolution formula is $H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})$, where $H^{(l)}$ is the input graph feature matrix of the $l$-th layer; $\hat{A}$ is the normalized dynamic adjacency matrix; $W^{(l)}$ is a learnable weight matrix; and $\sigma$ is a non-linear activation function, preferably ReLU. Through multi-layer stacked graph convolution, interaction patterns and long-term dependency features between different modalities are captured layer by layer.
[0049] To improve the stability and training efficiency of deep networks, a residual connection is introduced after the output of each graph convolutional layer, i.e. $\tilde{H}^{(l+1)} = H^{(l+1)} + H^{(l)}$, where $H^{(l)}$ and $H^{(l+1)}$ are the input and output of the graph convolutional layer and $\tilde{H}^{(l+1)}$ is the output feature matrix of the $(l+1)$-th layer obtained after the residual connection. Subsequently, layer normalization is applied to the residual output, standardizing the feature vector of each node to zero mean and unit variance to keep the values of each channel stable and suppress vanishing gradients.
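One graph convolutional layer with residual connection and layer normalization, applied to the node features of a single time window, might look like the following sketch. The symmetric normalization of the adjacency matrix, the square weight matrix (needed so the residual sum is well defined), the absence of learnable scale and shift in the layer normalization, and all names are assumptions; the text only says the adjacency matrix is normalized.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(a):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (assumed normalization scheme)."""
    deg = [sum(row) for row in a]
    n = len(a)
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def layer_norm(row, eps=1e-5):
    """Standardize one node's feature vector to zero mean and unit variance."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]

def gcn_layer(h, a_hat, w):
    """ReLU(A_hat @ H @ W), then residual addition of H, then per-node layer norm."""
    conv = [[max(0.0, x) for x in row] for row in matmul(matmul(a_hat, h), w)]
    residual = [[c + x for c, x in zip(rc, rh)] for rc, rh in zip(conv, h)]
    return [layer_norm(row) for row in residual]

# Toy example: 4 modal nodes, 3 features per node, identity weights.
adj = [[1.0, 1.0, 0.0, 0.0],
       [1.0, 1.0, 0.0, 1.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 1.0, 0.0, 1.0]]
features = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0], [-1.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
output = gcn_layer(features, normalize_adjacency(adj), identity)
```

Stacking several such layers corresponds to calling `gcn_layer` repeatedly with a separate weight matrix per layer.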
[0050] The final output feature matrix $H_{out}$ is the feature representation with intermodal interaction dependencies after fusing graph structure information. It can be regarded as a dynamic weight graph between modalities and provides graph structure context for subsequent time series modeling.
[0051] To further enhance the expressive power of multimodal fusion features in the temporal dimension, extract potential dynamic evolution patterns, and strengthen the complementary relationships between modalities, this embodiment performs temporal structure reconstruction and feature enhancement modeling on the dynamic weight map between modalities processed by the graph convolutional network. The specific steps are as follows: The fused feature map $H_{out} \in \mathbb{R}^{T \times 4 \times d}$ output by the aforementioned graph convolutional network is reorganized, where $T$ is the number of time windows, 4 is the number of modalities, and $d$ is the fusion feature dimension of each modality.
[0052] Following the time window order, the node feature vectors of the four modalities at each time point are concatenated to construct a temporal feature matrix $S \in \mathbb{R}^{T \times 4d}$. Each row represents the fused representation of all modalities within a time window. This temporal feature sequence serves as the input for subsequent temporal modeling and provides a global representation that aggregates all modalities.
[0053] In order to capture the cause-and-effect dependencies of the switchgear's operating status in the time dimension, the preliminary temporal feature sequence S is input into a bidirectional long short-term memory network for time modeling.
[0054] The bidirectional long short-term memory network consists of two long short-term memory networks running in opposite directions: the forward network processes the sequence in ascending chronological order and extracts the dependence of the current state on past states; the backward network processes the sequence in reverse chronological order and extracts the potential influence of future states on the current state. The forward output sequence $\overrightarrow{H} \in \mathbb{R}^{T \times h}$ and the backward output sequence $\overleftarrow{H} \in \mathbb{R}^{T \times h}$ are concatenated along the feature dimension to obtain the fused bidirectional contextual feature representation $H^{bi} \in \mathbb{R}^{T \times 2h}$, where h is the hidden state dimension of each unidirectional long short-term memory network; in this embodiment, h = 64. This step enhances the ability of the feature sequence to model nonlinear evolution patterns and delayed fault symptoms.
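As a non-limiting illustration, the bidirectional processing and concatenation can be sketched with a plain NumPy LSTM. The gate ordering (input, forget, candidate, output), the random initialization, and the sizes T = 6 and input dimension 32 are assumptions made for the sketch; only the hidden dimension h = 64 follows this embodiment. A trained network would use learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(X, Wx, Wh, b):
    """Run a single-direction LSTM over X of shape (T, d_in); return hidden states (T, h)."""
    T, h = X.shape[0], Wh.shape[0]
    h_t, c_t = np.zeros(h), np.zeros(h)
    out = np.zeros((T, h))
    for t in range(T):
        gates = X[t] @ Wx + h_t @ Wh + b           # all four gates at once, shape (4h,)
        i, f, g, o = np.split(gates, 4)
        c_t = sigmoid(f) * c_t + sigmoid(i) * np.tanh(g)
        h_t = sigmoid(o) * np.tanh(c_t)
        out[t] = h_t
    return out

def bilstm(X, fwd_params, bwd_params):
    """Concatenate forward states with time-realigned backward states: (T, 2h)."""
    fwd = lstm_forward(X, *fwd_params)
    bwd = lstm_forward(X[::-1], *bwd_params)[::-1]  # process reversed, then flip back
    return np.concatenate([fwd, bwd], axis=1)

def init_params(rng, d_in, h):
    """Hypothetical random initialization of (Wx, Wh, b)."""
    return (rng.normal(scale=0.1, size=(d_in, 4 * h)),
            rng.normal(scale=0.1, size=(h, 4 * h)),
            np.zeros(4 * h))

rng = np.random.default_rng(0)
T, d_in, h = 6, 32, 64
S = rng.normal(size=(T, d_in))
fwd_params = init_params(rng, d_in, h)
bwd_params = init_params(rng, d_in, h)
H_bi = bilstm(S, fwd_params, bwd_params)            # shape (T, 2h) = (6, 128)
```

The backward half of the last row depends only on the last input, which is a quick check that the reversed sequence has been realigned correctly.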
[0055] To further explore the differential contributions of different modal features to the final state determination at different time points, a channel attention mechanism is applied to the bidirectional contextual feature representation. The specific method is as follows: For each time point t, the corresponding fused feature vector h(t) is fed into the channel attention computation network. This network consists of two fully connected layers with a ReLU activation function in between, and outputs a weight coefficient vector α(t) along the channel dimension, with one weight per channel. α(t) and h(t) are multiplied element-wise to obtain the weighted time point feature vector, thereby achieving adaptive adjustment of the importance between modalities.
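As a non-limiting illustration, the channel attention step can be sketched as two fully connected layers with a ReLU in between, followed by element-wise weighting. The sigmoid applied to the second layer's output and the bottleneck width of 16 are assumptions; the embodiment does not state how the weights are bounded. Parameter values are random placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(H_bi, W1, b1, W2, b2):
    """Return (weighted features, attention weights), both of shape (T, C)."""
    hidden = np.maximum(H_bi @ W1 + b1, 0.0)   # first fully connected layer + ReLU
    alpha = sigmoid(hidden @ W2 + b2)          # assumption: sigmoid gate, one weight per channel
    return alpha * H_bi, alpha                 # element-wise multiplication

rng = np.random.default_rng(0)
T, C, r = 5, 128, 16                           # C = 2h = 128; r is a hypothetical bottleneck width
H_bi = rng.normal(size=(T, C))
W1, b1 = rng.normal(scale=0.1, size=(C, r)), np.zeros(r)
W2, b2 = rng.normal(scale=0.1, size=(r, C)), np.zeros(C)
weighted, alpha = channel_attention(H_bi, W1, b1, W2, b2)
```

Because the same two layers are applied independently at every time step, the weights α(t) can differ from one time point to the next.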
[0056] This attention mechanism improves the discriminative power and robustness of modal features in dynamic scenes without increasing time complexity.
[0057] The temporal feature sequence obtained after channel attention weighting is standardized by layer normalization, which normalizes the mean and variance of the vector at each time step to eliminate scale differences between channels and improve the stability of model training.
[0058] Subsequently, a linear transformation layer (i.e., a fully connected network layer) is used to map the feature vector at each time step to a unified output dimension df, outputting the final deep fused feature vector sequence. In this embodiment, df=128.
[0059] This feature sequence possesses both temporal evolution modeling capabilities and multimodal collaborative expression capabilities, and can serve as the input basis for subsequent anomaly detection networks.
[0060] Based on the deep fusion feature vector sequence, an anomaly recognition subnetwork is constructed. Short-term feature fragments are extracted using a sliding window approach, and contrastive learning is applied to them to generate a state embedding space that can measure the evolution trend of minor damage.
[0061] To achieve high-sensitivity identification of potential minute anomalies in multimodal fusion features, this embodiment constructs an anomaly identification sub-network based on a deep fusion feature vector sequence, and combines contrastive learning to generate a state embedding space, thereby realizing quantitative modeling and discrimination of the evolution trend of minute damage. The specific steps are as follows: The deep fusion feature vector sequence F is divided into segments using a fixed-length sliding time window. The window length is set to 10 time steps, and the sliding step size is set to 5 time steps, forming a set of short-time state feature segments {Ft}, where each segment Ft represents the running state within a continuous time period.
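As a non-limiting illustration, the sliding-window segmentation with a window length of 10 time steps and a step size of 5 time steps can be sketched as follows. The function name and the sequence length of 30 are hypothetical; trailing time steps that do not fill a complete window are dropped in this sketch, which is an assumption.

```python
import numpy as np

def sliding_segments(F, window=10, step=5):
    """Split F of shape (T, d_f) into overlapping segments of shape (n, window, d_f)."""
    if F.shape[0] < window:
        raise ValueError("sequence is shorter than one window")
    starts = range(0, F.shape[0] - window + 1, step)
    return np.stack([F[s:s + window] for s in starts])

F = np.arange(30 * 128, dtype=float).reshape(30, 128)   # 30 time steps, d_f = 128
segments = sliding_segments(F)                          # windows start at 0, 5, 10, 15, 20
```

With a step of half the window length, each time step other than those at the two ends of the sequence appears in two consecutive segments.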
[0062] Each short-term state feature segment is input into a dual-tower contrastive coding network, where both towers share the same coding network structure and parameters. The current running state segment serves as the input to the first input tower, and historical healthy sample segments serve as the input to the second input tower. Each input tower consists of two layers of one-dimensional convolutional networks (kernel size 3, channels 64) and a fully connected layer, outputting a fixed-length embedding vector. They form an embedding vector pair (z1, z2).
[0063] The contrastive loss function is applied to optimize the above embedding vector pairs. Cosine similarity is used as the similarity metric in the embedding space. The similarity of positive sample pairs should be maximized, and the similarity of negative sample pairs should be minimized. The contrastive loss function is defined as follows: for positive sample pairs, the loss is $L_{pos} = 1 - \cos(z_1, z_2)$; for negative sample pairs, the loss is $L_{neg} = \max(0, \cos(z_1, z_2) - m)$, where m is the preset similarity margin threshold, with a value of 0.3.
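As a non-limiting illustration, a cosine-similarity contrastive loss with margin m = 0.3 can be sketched as follows. The sketch assumes the common cosine-margin form, in which a positive pair is penalized by one minus its similarity and a negative pair is penalized only when its similarity exceeds m; the function names are hypothetical.

```python
import numpy as np

def cosine_similarity(z1, z2, eps=1e-12):
    """Cosine of the angle between two embedding vectors."""
    return float(z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2) + eps))

def contrastive_loss(z1, z2, is_positive, m=0.3):
    """Assumed cosine-margin form: 1 - cos for positives, max(0, cos - m) for negatives."""
    s = cosine_similarity(z1, z2)
    return 1.0 - s if is_positive else max(0.0, s - m)

z_a = np.array([1.0, 0.0])
z_b = np.array([0.0, 1.0])
loss_pos_same = contrastive_loss(z_a, z_a, is_positive=True)    # identical pair, close to 0
loss_neg_same = contrastive_loss(z_a, z_a, is_positive=False)   # cos = 1, so loss is 1 - m
loss_neg_orth = contrastive_loss(z_a, z_b, is_positive=False)   # cos = 0 < m, so no penalty
```

Under this form, a negative pair stops contributing to the loss once its cosine similarity falls below the margin.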
[0064] By training the encoder network using this loss function, the embedding vectors between normal states converge, and the spacing between the embedding vectors of abnormal states and normal states increases, thereby constructing a state embedding space with a discriminative structure.
[0065] In the constructed state embedding space, the embedding vectors {zh} of all historical health state segments are collected, and a multidimensional spherical boundary model is constructed around them based on Euclidean distance. The embedding center c is the mean of the vector set, and the threshold radius r is the maximum distance from c to any embedding vector in the training set. When a new embedding vector zcur satisfies $\|z_{cur} - c\|_2 > r$, the current state is determined to be abnormal, and the anomaly determination result is output.
[0066] Based on the same deep fusion feature vector sequence F, time segmentation is performed again using a sliding window method, with a window length of 10 time steps and a step size of 5 time steps, generating a short-time feature fragment set {Ft}.
[0067] For each fragment Ft, construct positive and negative sample pairs: Positive sample pairs are selected from consecutive time periods (such as Ft and Ft+1) or from different historical healthy samples; Negative sample pairs are generated by combining segments containing abnormal behavior, identified through manual annotation or external diagnostic information, with normal segments.
[0068] The sample pairs are input into a feature encoder network with shared parameters. The network structure is the same as the aforementioned dual-tower structure, and it outputs two embedding vectors. The similarity between the vectors is measured using cosine distance, calculated as follows: $d(z_1, z_2) = 1 - \frac{z_1 \cdot z_2}{\|z_1\| \, \|z_2\|}$. By training with a contrastive loss function, the encoder learns a state embedding space in which similar states cluster and different states are separated. By tracking how the distance between state embedding vectors changes over time, this embedding space can be used to measure the state evolution caused by minor damage during the operation of the switchgear, thereby achieving early fault identification and trend prediction.
[0069] In the state embedding space, a central boundary threshold model is constructed by combining historical health samples to dynamically measure the distance to the current operating state vector, identify potential anomalies, and output early warning signals.
[0070] To achieve accurate anomaly identification and real-time early warning of the current switchgear operating status, after the aforementioned state embedding space is constructed, this embodiment further constructs a center boundary threshold model based on the embedding vectors of historical health status segments, and uses Euclidean distance to determine the embedding position. The specific steps are as follows: Multiple short-time state segments are selected from the historical operating states known to be "healthy". Their embedding vectors are extracted using the contrastive coding network and denoted as {zh(1), zh(2), ..., zh(N)}, where each embedding vector zh(i) represents the position of the corresponding time segment in the state embedding space. The health state center point vector c is calculated by averaging all health embedding vectors element-wise: $c = \frac{1}{N} \sum_{i=1}^{N} z_h(i)$. The center point vector serves as the mean representation of the health status in the embedding space and is used for subsequent distance measurement and boundary construction.
[0071] Using the center point vector c as the center of the sphere, the Euclidean distance between each healthy state embedding vector and the center point is calculated. The Euclidean distance is defined as: $d_i = \|z_h(i) - c\|_2$. The maximum value among all distances is selected as the boundary threshold radius r, i.e.: $r = \max_{1 \le i \le N} d_i$. A multidimensional spherical boundary model with c as the center and r as the radius is thereby constructed to delineate the normal range of the health state embedding space.
[0072] For the current running state segment to be tested, the embedding vector zcur is extracted using the shared contrastive coding network. The Euclidean distance between this embedding vector and the health center point c is calculated as: $d_{cur} = \|z_{cur} - c\|_2$. This distance is compared with the boundary radius r to determine whether the current state is within the normal operating range.
[0073] If the distance between the current state embedding vector and the health center point satisfies $\|z_{cur} - c\|_2 > r$, the current state is determined to be an out-of-bounds abnormal state, indicating a potential operational anomaly or early signs of damage.
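As a non-limiting illustration, the center boundary threshold model can be sketched as follows: the center is the element-wise mean of the healthy embeddings, the radius is the largest Euclidean distance from the center to any healthy embedding, and a new embedding is flagged when its distance exceeds the radius. The function names and the two-dimensional toy embeddings are hypothetical.

```python
import numpy as np

def fit_boundary(Z_healthy):
    """Return (center c, radius r) from healthy embeddings of shape (N, dim)."""
    c = Z_healthy.mean(axis=0)
    r = np.linalg.norm(Z_healthy - c, axis=1).max()
    return c, r

def is_abnormal(z_cur, c, r):
    """True when the embedding lies outside the spherical boundary."""
    return bool(np.linalg.norm(z_cur - c) > r)

Z_healthy = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
c, r = fit_boundary(Z_healthy)                        # c = (0, 0), r = 1
inside = is_abnormal(np.array([0.5, 0.5]), c, r)      # distance about 0.707, within boundary
outside = is_abnormal(np.array([2.0, 0.0]), c, r)     # distance 2, out of bounds
```

Because the radius is the maximum training distance, a single outlying healthy sample enlarges the boundary; this follows directly from the definition of r given above.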
[0074] The system records the location of the corresponding time segment and outputs an early warning signal, which can be invoked by the upper-level monitoring module. Simultaneously, this status can be marked as a sample awaiting confirmation for subsequent expert review or model updates, realizing an online closed-loop feedback mechanism.
[0075] If a potential anomaly is identified, the key modal channels and anomaly occurrence areas are located by reverse tracing based on modal contribution, forming a status diagnosis map. Semantic annotation is then performed using an expert knowledge base, and a switchgear operation health report is output.
[0076] After identifying abnormal operating conditions and outputting warning signals, this embodiment further pinpoints the source of the anomaly and provides visual diagnostic evidence: the deep fusion features corresponding to the embedding vectors are combined with inverse gradient sensitivity analysis and modal mapping to track key modal channels and abnormal regions, ultimately generating a status diagnosis map and a switchgear health report. The specific steps are as follows: Obtain the embedding vector $z_{abn}$ corresponding to the state segment judged as abnormal, and trace it back to its input at the front end of the contrastive encoder network, i.e., the deep fusion feature matrix of that segment, $F_{abn} \in \mathbb{R}^{10 \times d}$, where 10 is the time window length and d is the channel dimension after fusion.
[0077] The feature matrix is input into the trained contrastive encoder network. The backbone parameters are frozen, and only the loss function for anomaly detection is retained. The gradient sensitivity of each input channel to the anomaly detection result is calculated, and the mean absolute value of the gradient for each channel is used as the modality contribution score. The contribution of each channel $c_i$ is defined as: $G(c_i) = \frac{1}{10} \sum_{t=1}^{10} \left| \frac{\partial L}{\partial F_{abn}^{(t)}(c_i)} \right|$, where L represents the contrastive loss function, $F_{abn}(c_i)$ is the feature of the i-th channel, and t indexes the 10 time steps of the segment.
[0078] The contribution scores of all channels are normalized. Based on the normalized scores, each channel is mapped to its modal category (infrared thermal imaging, partial discharge ultra-high frequency signal, low frequency vibration waveform, or acoustic emission signal) according to the modal channel affiliation, and the average contribution of all channels in each modal category is calculated.
[0079] The two modalities with the highest average contribution are selected as key modal channels for subsequent backtracking and localization of abnormal features.
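As a non-limiting illustration, the aggregation of channel-level gradient sensitivities into modality-level contributions can be sketched as follows. The gradient array is taken as given (in practice it would come from automatic differentiation of the frozen encoder); normalization by the sum of scores is an assumption, since the normalization formula is not specified here, and the modality labels, channel assignment and gradient values are hypothetical.

```python
import numpy as np

MODALITIES = ["infrared", "pd_uhf", "vibration", "acoustic"]   # hypothetical labels

def modality_contributions(grad, channel_to_modality):
    """grad: (10, d) gradients of the loss w.r.t. the segment; returns (averages, top-2 names)."""
    scores = np.abs(grad).mean(axis=0)                 # mean absolute gradient per channel
    scores = scores / (scores.sum() + 1e-12)           # assumption: sum-to-one normalization
    avg = {name: float(scores[channel_to_modality == k].mean())
           for k, name in enumerate(MODALITIES)}
    top2 = sorted(avg, key=avg.get, reverse=True)[:2]  # two modalities with highest average
    return avg, top2

grad = np.zeros((10, 8))
grad[:, 0:2] = 4.0      # infrared channels
grad[:, 2:4] = -3.0     # partial discharge channels (sign is discarded by abs)
grad[:, 4:6] = 1.0      # vibration channels
grad[:, 6:8] = 0.5      # acoustic emission channels
channel_to_modality = np.repeat(np.arange(4), 2)       # two channels per modality
avg, top2 = modality_contributions(grad, channel_to_modality)
```

In this toy case the infrared and partial discharge channels dominate, so they would be the two key modal channels carried into the backtracking step.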
[0080] Within the identified key modes, the primary feature segments corresponding to the anomalous segments are traced back to extract their original multimodal time-series data and frequency domain entropy spectra. Assuming the segment spans a time window from t1 to t10, the degree of abrupt change in the corresponding features across the key modal channels is analyzed at each time point.
[0081] The mutation index $D_t$ is defined as the absolute difference of the gradient of the time series at time point t, where c is the key channel index. The significance of a mutation is determined by a dynamic threshold θ, which is set to twice the historical standard deviation of the channel. If $D_t > θ$, the time point is determined to be an abnormal mutation point, and its time location and feature type are recorded.
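As a non-limiting illustration, mutation point detection on one key channel can be sketched as follows. The sketch assumes that the mutation index is the absolute first difference between consecutive time points, which is one possible reading of the definition above; the threshold of twice the historical standard deviation follows this embodiment. The function name and sample values are hypothetical.

```python
import numpy as np

def mutation_points(x, hist_std):
    """Return (indices of abrupt changes, mutation index D) for a 1-D channel series x."""
    D = np.abs(np.diff(x))                 # assumption: first difference as the gradient
    theta = 2.0 * hist_std                 # dynamic threshold: twice the historical std
    return np.flatnonzero(D > theta) + 1, D   # +1 maps each difference to its later time point

x = np.array([1.0, 1.1, 0.9, 3.0, 3.1, 3.0, 2.9, 3.0, 3.1, 3.0])   # 10-step segment
points, D = mutation_points(x, hist_std=0.2)                        # theta = 0.4
```

Here only the jump from 0.9 to 3.0 exceeds the threshold, so the fourth time point (index 3) is recorded as an abnormal mutation point.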
[0082] By mapping key modal channels, significant abrupt change time points, and sensor spatial arrangement locations, a multi-dimensional information matrix containing time, modality, frequency band, and sampling coordinates is constructed. This matrix is used to generate a condition diagnosis map, which is labeled with: abnormal modes and their contribution scores, time abrupt change intervals, raw data change trends, and the physical spatial location of suspected fault locations. The graph information is input into a preset expert knowledge base for semantic matching. It is compared with existing anomaly type feature word library and structural response template, such as tags like "thermal non-uniform expansion", "high frequency partial discharge concentration" and "low frequency vibration anomaly". Finally, a switchgear operation health report is generated, which includes anomaly type description, anomaly confidence level, key modes, suggested inspection parts and priority ranking.
[0083] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application.
Claims
1. A method for monitoring the operating status of switchgear based on multimodal fusion features, characterized in that it includes: Infrared thermal imaging data, partial discharge ultra-high frequency signals, low frequency vibration waveforms and acoustic emission signals collected during the operation of the switchgear are acquired, and time synchronization and spatial registration are performed to construct the original multimodal time series dataset; Wavelet packet decomposition, entropy analysis, and spectral energy distribution extraction are performed on the multimodal time series dataset to obtain a primary feature matrix characterizing the thermo-electric-acoustic-vibration co-evolution characteristics; The primary feature matrix is input into a graph convolutional network based on a multi-scale attention mechanism to construct a dynamic weight graph between modalities and output a fused deep feature vector sequence with temporal dependence and modal complementarity; Based on the deep fusion feature vector sequence, an anomaly recognition subnetwork is constructed, short-term feature fragments are extracted using a sliding window method, and contrastive learning is applied to them to generate a state embedding space that can measure the evolution trend of minor damage; In the state embedding space, a central boundary threshold model is constructed by combining historical health samples to dynamically measure the distance to the current operating state vector, identify potential anomalies, and output early warning signals; If a potential anomaly is identified, the key modal channels and anomaly occurrence areas are located by reverse tracing based on modal contribution, forming a status diagnosis map, semantic annotation is then performed using an expert knowledge base, and a switchgear operation health report is output.
2. The method for monitoring the operating status of switchgear based on multimodal fusion features according to claim 1, characterized in that: The steps for obtaining the primary feature matrix characterizing the thermo-electric-acoustic-vibration co-evolution characteristics include: Wavelet packet decomposition is performed on the synchronously registered infrared thermal image, partial discharge ultra-high frequency signal, low frequency vibration waveform and acoustic emission signal to extract the energy distribution characteristics of the sub-signals of each modality in different frequency bands; Based on the energy entropy, permutation entropy and sample entropy values of each frequency band sub-signal, a multi-scale entropy feature vector is constructed; Short-time Fourier transform is performed on the wavelet-reconstructed sub-signals to extract the spectral energy density distribution and frequency centroid; Entropy features and spectral features are concatenated and fused according to time windows to construct a primary feature matrix for multimodal collaborative expression.
3. The method for monitoring the operating status of switchgear based on multimodal fusion features according to claim 1, characterized in that: The steps for constructing the intermodal dynamic weight graph include: Construct a static topology graph with modal feature channels as graph nodes, and define an adjacency matrix to represent the initial connection relationships between different modes; A multi-scale attention mechanism is introduced, which calculates attention weight vectors at the global scale, local window scale, and intramodal channel scale, and dynamically adjusts the graph edge weights. The primary feature matrix is mapped to the graph space, and features are propagated through a multi-layer graph convolutional network, fusing structural information and dynamic weights. Residual connections and layer normalization operations are added after each graph convolutional layer to output a dynamic intermodal weight graph containing intermodal interaction dependencies.
4. The method for monitoring the operating status of switchgear based on multimodal fusion features according to claim 3, characterized in that: The steps for generating a deep fused feature vector sequence with temporal dependence and modal complementarity after output fusion include: The fused feature maps processed by the graph convolutional network are reorganized according to the time window order to construct a preliminary temporal feature sequence. A bidirectional long short-term memory network is introduced to perform temporal modeling on the preliminary temporal feature sequence, extracting forward and backward temporal dependency information respectively, and fusing them into a bidirectional contextual feature representation; At each time point, a channel attention mechanism is applied to automatically assign representation weights to each modal feature at that time point. The fused temporal features are subjected to layer normalization and linear transformation to output a deep fused feature vector sequence.
5. The method for monitoring the operating status of switchgear based on multimodal fusion features according to claim 1, characterized in that: The steps for constructing an anomaly detection subnetwork based on the deep fusion feature vector sequence include: The deep fusion feature vector sequence is segmented according to a fixed-length sliding time window to construct a set of short-time state feature segments; Each short-term state feature fragment is input into a contrastive coding network with a dual-tower structure. One tower is used for modeling the current running state, and the other tower is used for modeling historical health samples. The output is an embedding vector pair. The contrastive loss function is used to optimize the embedded vector pairs, so that the distance between normal states converges and the distance between abnormal states and normal states diffuses, thus constructing a discriminative state embedding space. A central boundary threshold model is introduced into the state embedding space. A multidimensional spherical boundary is constructed with all historical healthy embedding vectors as references to identify whether the current feature embedding has exceeded the boundary and output the anomaly judgment result.
6. The method for monitoring the operating status of switchgear based on multimodal fusion features according to claim 5, characterized in that: The step of generating a state embedding space that can measure the evolution trend of minor damage includes: The deep fusion feature vector sequence is divided into segments in chronological order using a fixed-length sliding window. The window length is set to 10 time steps and the step size is 5 time steps, and short-time feature fragments are extracted. For each short-term feature segment, construct positive sample pairs and negative sample pairs, where positive sample pairs come from historical healthy states or adjacent windows, and negative sample pairs contain a combination of potential abnormal states and healthy states; The sample pairs are input into a feature encoder network with shared parameters, the embedding vectors are extracted, and the cosine distance is used as a similarity metric. The embedding space structure is optimized based on the contrastive loss function, which clusters similar state embedding vectors and separates different state embedding vectors, generating a state embedding space that can measure the evolution trend of minor damage.
7. The method for monitoring the operating status of switchgear based on multimodal fusion features according to claim 1, characterized in that: The step of dynamically measuring the distance to the current running state vector includes: Collect the embedding vectors of multiple historical health state segments, calculate their center point vectors, and use them as the embedding mean representation of the health state. The maximum distance between all healthy embedding vectors and the center point is calculated using Euclidean distance and set as the boundary threshold radius to form a multidimensional spherical boundary model. Calculate the distance between the embedded vector generated from the current running state segment and the center point, and compare it with the boundary radius; If the distance exceeds the boundary radius, it is determined to be an out-of-bounds abnormal state, and an early warning signal is triggered and the abnormal detection result at the corresponding time and location is output.
8. The method for monitoring the operating status of switchgear based on multimodal fusion features according to claim 7, characterized in that: The steps to form a condition diagnostic atlas include: The deep fusion feature vectors corresponding to the abnormal state embedding vectors are extracted, and inverse gradient sensitivity analysis is applied to their input channels to obtain the contribution score of each modal channel to the abnormal output. The contribution scores are normalized, and the top two key modal channels with the highest contribution are selected. The corresponding original feature segments are backtracked under the key modalities, and the significant feature mutation regions are located by combining the temporal position. The modal channels, time segments, and spatial sampling positions are correlated and mapped to generate a state diagnosis map.