A multi-source heterogeneous data fusion method and system for electrical equipment

By unifying the collection and alignment of multi-source data, extracting lightweight domain-invariant features, and employing two-stage attention fusion, the problems of poor adaptability and weak cross-scenario generalization in multi-source heterogeneous data fusion are solved. This achieves accurate aggregation of multi-modal features and interpretability of diagnostic results, thereby improving the accuracy of reflecting the operating status of electrical equipment.

CN122241590A · Pending · Publication Date: 2026-06-19 · STATE GRID JIANGSU ELECTRIC POWER CO LTD NANJING POWER SUPPLY COMPANY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
STATE GRID JIANGSU ELECTRIC POWER CO LTD NANJING POWER SUPPLY COMPANY
Filing Date
2026-03-20
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to effectively integrate multi-source heterogeneous data, exhibit poor adaptability, weak cross-scenario generalization ability, and lack interpretability in diagnostic results, leading to inaccurate reflection of equipment operating status.

Method used

By employing unified acquisition and alignment of multi-source data, lightweight domain-invariant feature extraction, two-stage attention fusion, and cross-modal consistency constraints, and through multi-level de-interference, dual-channel attention, cross-modal collaborative modeling, and adaptive weight prediction, we achieve accurate aggregation of multi-modal features and dynamic adjustment of modal contribution weights.

Benefits of technology

It improves the accuracy of multimodal fusion, enhances cross-scenario adaptability, optimizes model structure and deployment adaptability, and improves the interpretability and accuracy of diagnosis.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122241590A_ABST
Patent Text Reader

Abstract

A method and system for multi-source heterogeneous data fusion for electrical equipment are disclosed. The method acquires and preprocesses multi-source heterogeneous data generated during the operation of electrical equipment. The data of each modality is input into its corresponding feature extraction network to obtain initial features, and interference is removed from the initial features of each modality. In the same dimensional space, dual-channel attention enhancement is applied to the de-interference features of the modalities, followed by cross-modal collaborative modeling. The resulting collaborative features are mapped to a unified semantic space and semantically aligned. The semantically aligned collaborative features are concatenated and input into a weight prediction network, which outputs basic fusion weights for the modalities. These weights are combined with the quality score of each modality to fuse the semantically aligned multi-modal features into multi-source heterogeneous fused features. The application can accurately aggregate multi-modal features, improving cross-scenario adaptability and equipment fault diagnosis accuracy.

Description

Technical Field

[0001] This invention belongs to the field of intelligent sensing technology for power equipment, specifically relating to a method and system for multi-source heterogeneous data fusion for electrical equipment.

Background Technology

[0002] Current equipment monitoring involves multi-source heterogeneous data, including infrared thermal images, current / voltage waveforms, ambient temperature and humidity, acoustic signals, and visible light images. These data originate from different sensors or monitoring devices and differ significantly across multiple dimensions, such as sampling frequency, dimensional structure, and physical meaning. Due to these data type differences, the extracted features are often heterogeneous in the feature space, making effective fusion difficult. Therefore, single-modal data is typically insufficient to comprehensively depict the actual operating state of the equipment and is easily affected by environmental and operating conditions, leading to insufficient feature stability.

[0003] To address the aforementioned issues, existing technologies have proposed numerous solutions. For example, patent application CN111062633A proposes a power transmission and transformation line and equipment condition assessment system. This system collects multi-source data, including infrared, visible light, electrical quantities, and meteorological data, and performs cleaning and fusion processing to achieve a comprehensive assessment of equipment operating status. Patent application CN113344026A proposes a substation equipment anomaly identification and location method. This method combines multi-dimensional monitoring data acquired by inspection robots to perform fusion analysis and location of abnormal states, improving the efficiency and accuracy of equipment anomaly identification. Patent application CN118859037A proposes a power equipment fault analysis method based on multi-source data fusion. This method combines multi-source data such as voltage, current, and temperature with intelligent diagnostic algorithms to achieve equipment condition monitoring and fault early warning.

[0004] However, in the complex operating environment found on site, existing technology for multi-source heterogeneous data fusion of power equipment still has shortcomings: (1) It is difficult to effectively fuse data of different modalities, so the fused features fail to accurately reflect the actual operating status of the equipment. (2) It is difficult to adaptively optimize the fusion strategy according to different operating and environmental conditions. (3) It adapts poorly to different equipment, operating conditions, and scenarios, so its generalization ability fails to meet the needs of actual applications.

Summary of the Invention

[0005] To address the challenges of multi-source heterogeneous data fusion, poor adaptability of traditional fusion methods, weak cross-scenario generalization, and lack of interpretability in diagnostic results in existing technologies, this invention provides a method and system for attention fusion of multi-source heterogeneous data in electrical equipment. Through unified acquisition and alignment of multi-source data, extraction of lightweight domain-invariant features, two-stage attention fusion, and cross-modal consistency constraints, it achieves accurate aggregation of multi-modal features, dynamically adjusts the contribution weights of each modality, improves the model's cross-scenario adaptability and diagnostic accuracy, and outputs interpretable health scores and fault cause analyses, providing accurate basis for electrical equipment operation and maintenance decisions.

[0006] The first aspect of this application discloses a method for fusing multi-source heterogeneous data for electrical equipment, which adopts the following technical solution:

Multi-source heterogeneous data generated by electrical equipment during operation is acquired synchronously and preprocessed; the multi-source heterogeneous data includes first modal data and second modal data; the preprocessed data is input into the corresponding feature extraction networks to extract the first modal initial features and the second modal initial features.

Interference is removed from the initial features separately: for the first modal initial features, a spatial-domain de-interference mechanism calculates and outputs the first modal features based on spatial attention weights; for the second modal initial features, a joint time-frequency-domain de-interference mechanism corrects the features at the time-domain and frequency-domain levels respectively and then fuses them to generate the second modal features.

In the same dimensional space, dual-channel attention enhancement is applied to the de-interference features of the modalities, followed by cross-modal collaborative modeling; that is, the attention-enhanced features of one modality are mapped to the semantic space of the other modality through a mapping network to generate collaborative features. The collaborative features are then mapped to a unified semantic space and semantically aligned.

The semantically aligned collaborative features are concatenated and input into the weight prediction network, which outputs the basic fusion weights of the modalities. Combined with the quality score of each modality, the semantically aligned multimodal features are fused to obtain the multi-source heterogeneous fusion features. The quality scores are calculated by the quality evaluation network after the de-interference step.

[0007] Furthermore, the joint time-frequency-domain de-interference mechanism includes: The spatial attention mechanism is invoked to obtain the temporal attention weights of the second modal initial features. The second modal initial features are weighted by these weights and then passed through a residual connection to obtain the time-domain de-interference features. A Fourier transform is applied to the second modal initial features to obtain a complex frequency-domain representation. The real and imaginary parts of this representation are extracted, and their corresponding frequency-domain filtering weights are generated by a neural network. The filtering weights are multiplied element-wise with the real and imaginary parts respectively to reconstruct the complex frequency-domain representation. An inverse Fourier transform is applied to the reconstructed representation to obtain the frequency-domain de-interference features. The time-domain and frequency-domain de-interference features are fused to obtain the second modal features.

[0008] Furthermore, the steps of the cross-modal collaborative modeling include: The attention-enhanced features of the first modality are mapped to the semantic space of the second modality through multiple fully connected layers to obtain the mapping information of the first modality; the attention-enhanced features of the second modality are mapped to the semantic space of the first modality through multiple fully connected layers to obtain the mapping information of the second modality. The mapping information of the second modality is fused with the attention-enhanced features of the first modality through a residual connection to generate the collaborative features of the first modality. The mapping information of the first modality is fused with the attention-enhanced features of the second modality through a residual connection to generate the collaborative features of the second modality.

[0009] Furthermore, the weight prediction network outputs basic fusion weights for the modalities through the following steps: The semantically aligned collaborative features are concatenated along the feature dimension, and the concatenated features are input into a two-layer fully connected weight prediction network. The basic fusion weight vector is output through softmax. The weight prediction network is trained and optimized based on the prediction results of the concatenated features in the equipment fault diagnosis and classification task.

[0010] Furthermore, fusing the semantically aligned multimodal features includes: Based on the quality scores of the de-interference features of the modalities, the basic fusion weights are quality-weighted and normalized to generate the modal contribution weight of each modality. Based on the modal contribution weights, the semantically aligned multimodal features are fused by element-level weighting to obtain the final multi-source heterogeneous fusion feature representation.
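
The quality-weighted fusion described in [0009] and [0010] can be illustrated with a minimal sketch. It assumes that "quality-weighted and normalized" means multiplying each softmax weight by the modality's quality score and rescaling the products to sum to 1; the text does not give the exact formula, so this is one plausible reading. The function names and the sample numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax, as used for the basic fusion weights."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def modal_contribution_weights(base_weights, quality_scores):
    """Assumed rule: scale each base weight by its quality score, then renormalize."""
    scaled = [w * q for w, q in zip(base_weights, quality_scores)]
    total = sum(scaled)
    if total == 0.0:
        # All modalities judged worthless: fall back to uniform weights.
        return [1.0 / len(scaled)] * len(scaled)
    return [value / total for value in scaled]


def fuse(features, weights):
    """Element-level weighted sum of equally sized per-modality feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


# Hypothetical example: equal base weights, infrared judged more reliable.
base = softmax([0.0, 0.0])                      # two modalities, equal logits
weights = modal_contribution_weights(base, [0.9, 0.3])
fused = fuse([[1.0, 2.0], [3.0, 6.0]], weights)
```

With equal base weights, the contribution weights are driven entirely by the quality scores, so a degraded modality is down-weighted even when the weight prediction network is indifferent.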

[0011] Furthermore, the steps for obtaining the quality score include: The quality assessment network includes a first fully connected layer, a nonlinear activation layer, and a second fully connected layer connected in sequence. The first fully connected layer reduces the dimensionality of the input feature vector, the second fully connected layer outputs a one-dimensional quality score, and the Sigmoid function constrains the quality score to the range of 0 to 1. The quality assessment network is trained and optimized based on the prediction results of the de-interference features in the equipment fault diagnosis and classification task.

[0012] Furthermore, the multi-source heterogeneous fusion features are used to identify equipment fault states, outputting the probability distribution over the operating states of the power equipment and calculating the classification confidence level. The quality weighting factor is calculated based on the modal contribution weights and quality scores of the modalities. The health score of the power equipment is calculated based on the probability that the equipment is in normal operating condition, the quality weighting factor, and the classification confidence level.

[0013] The second aspect of this application discloses a multi-source heterogeneous data fusion system for electrical equipment, which implements the multi-source heterogeneous data fusion method described in the first aspect of this application. The system includes:

A multimodal data processing module, used to synchronously acquire multi-source heterogeneous data generated by electrical equipment during operation and preprocess it; the multi-source heterogeneous data includes first modal data and second modal data; the preprocessed data is input into the corresponding feature extraction networks to extract the first modal initial features and the second modal initial features.

A de-interference module, used to remove interference from the initial features separately. For the first modal initial features, a spatial-domain de-interference mechanism is adopted, and the first modal features are calculated and output based on spatial attention weights. For the second modal initial features, a joint time-frequency-domain de-interference mechanism is adopted, and the features are corrected at the time-domain and frequency-domain levels respectively and then fused to generate the second modal features.

A cross-modal collaboration module, used to perform cross-modal collaborative modeling after applying dual-channel attention enhancement to the de-interference features of the modalities in the same dimensional space; that is, to map the attention-enhanced features of one modality to the semantic space of the other modality through a mapping network to generate collaborative features, and to map the collaborative features to a unified semantic space and semantically align them.

A feature fusion module, used to concatenate the semantically aligned collaborative features and input them into the weight prediction network, output the basic fusion weights of the modalities, and combine them with the quality score of each modality to fuse the semantically aligned multimodal features into multi-source heterogeneous fusion features; the quality scores are calculated by the quality evaluation network after the de-interference step.

[0014] Compared with the prior art, this invention achieves high-precision integration of multimodal features through unified acquisition of multi-source heterogeneous data, refined preprocessing, lightweight domain-invariant feature extraction, and two-stage attention fusion, and has the following beneficial effects: Improve the accuracy of multimodal fusion: through multi-level de-interference, dual-channel attention, cross-modal knowledge distillation, and adaptive weight prediction mechanisms, interference noise is suppressed, fault information is highlighted, and modal conflicts are resolved.

[0015] Enhance cross-scenario generalization capability: Extract domain-invariant features through feature decoupling, and combine dynamic domain adaptation module to adaptively adjust alignment strength, so that the model maintains stable performance under different devices and working conditions.

[0016] Optimize model structure and deployment adaptability: Reduce the number of parameters by using depthwise separable convolution and multi-scale feature fusion, and reduce computational complexity by combining lightweight network design to adapt to the deployment requirements of terminal devices.

[0017] Improve diagnostic interpretability: Output modal quality scores and contribution weights through quality assessment and weight prediction mechanisms, and combine them with health score indicators to provide a quantitative assessment basis for equipment health status.

[0018] Expand the scope of application: the multi-level de-interference mechanism handles multiple interference types, the multi-modal fusion mechanism handles both structural and non-structural faults, and the cross-domain adaptability suits different electrical equipment scenarios.

Description of the Drawings

[0019] Figure 1 is a schematic diagram of the multi-source heterogeneous data attention fusion process provided in this embodiment; Figure 2 is a schematic diagram of the initial feature extraction provided in an embodiment.

Detailed Implementation

[0020] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of this invention. The embodiments described in this application are merely some embodiments of this invention, and not all embodiments. Based on the spirit of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the protection scope of this invention.

[0021] As an embodiment of this application, a specific implementation of the multi-source heterogeneous data fusion method for electrical equipment is disclosed. The execution flow of the method embodiment is shown in Figure 1.

[0022] S1: As one implementation method of the embodiment, multi-source heterogeneous raw data generated during the operation of electrical equipment are acquired synchronously and standardized.

[0023] To address the differences in spatial structure, load characteristics, and fault modes across various operating scenarios such as charging piles, metering boxes, distribution area gateways, and user station rooms, this invention employs a unified data acquisition system to synchronously collect multi-source data characterizing the equipment's operating status. The collected data includes at least infrared image data and vibration signal data, used to reflect the equipment's thermal distribution characteristics and mechanical operating status, respectively.

[0024] In one specific implementation, referring to Figure 2, infrared image data is acquired using an infrared thermal imager as the first modality data, and vibration signal data is collected using a vibration sensor as the second modality data. All data is stamped with a uniform timestamp during acquisition to keep the different modalities consistent in the time dimension and to avoid misalignment or semantic drift during cross-modal fusion.

[0025] In the specific implementation process, infrared images are acquired by infrared thermal imagers and stored in RGB three-channel format. The image resolution is configured according to the equipment type and installation environment to depict the temperature distribution of the equipment surface and key parts. Vibration signals are collected by vibration sensors and continuously recorded during equipment operation using a fixed sampling frequency to form complete time-series data.

[0026] After data acquisition, the data of each modality is preprocessed to eliminate differences in units and scale and to improve data quality. For infrared image data, the raw pixel values are first normalized:

[0027] Here, the quantity being normalized is the pixel value of the image at a given position and channel. Normalization constrains the image data to a uniform range, reducing the impact of different devices and environmental conditions. Subsequently, the processed image data is converted into a tensor form that the network can accept directly, serving as input for the subsequent feature extraction module.
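
As an illustration of the image preprocessing step, the sketch below scales pixel values into a uniform range and rearranges the image into a channel-first layout. The normalization formula is not recoverable from the text, so dividing by 255 (8-bit RGB storage) is an assumption, as are the channel-first tensor layout and the function names.

```python
def normalize_pixels(image_hwc, max_value=255.0):
    """Scale every pixel to [0, 1]. Dividing by max_value is an assumed rule."""
    return [[[value / max_value for value in pixel] for pixel in row]
            for row in image_hwc]


def to_channel_first(image_hwc):
    """Rearrange a height x width x channel image into channel x height x width."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [[[image_hwc[y][x][c] for x in range(width)]
             for y in range(height)]
            for c in range(channels)]


# Hypothetical 1 x 2 RGB image with 8-bit values.
image = [[[0, 128, 255], [255, 0, 64]]]
tensor = to_channel_first(normalize_pixels(image))
```

A min-max rule computed per image would be an equally plausible reading; the point is only that every input reaches the network in the same numeric range.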

[0028] For vibration signal data, the original time-series signal is first standardized to eliminate the influence of vibration amplitude differences under different acquisition conditions. The processing method is as follows:

$$\hat{x}(t) = \frac{x(t) - \mu}{\sigma}$$

[0029] where $x(t)$ is the vibration signal value at time $t$, and $\mu$ and $\sigma$ are the mean and standard deviation of the vibration signal, respectively. Subsequently, the standardized vibration signal is split into segments of fixed length so that each segment has a uniform time scale, facilitating subsequent modeling and analysis by the feature extraction network.
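
The standardization and segmentation of the vibration signal can be sketched as follows. The use of the population standard deviation, non-overlapping windows, and dropping an incomplete trailing window are assumptions; the text only specifies mean and standard deviation scaling and fixed-length segments. The sample signal is hypothetical.

```python
import math


def standardize(signal):
    """Z-score standardization: subtract the mean, divide by the standard deviation."""
    count = len(signal)
    mean = sum(signal) / count
    std = math.sqrt(sum((value - mean) ** 2 for value in signal) / count)
    if std == 0.0:
        # A constant signal carries no amplitude information.
        return [0.0] * count
    return [(value - mean) / std for value in signal]


def segment(signal, length):
    """Split into non-overlapping windows of equal length, dropping the remainder."""
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, length)]


raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # mean 5, standard deviation 2
standardized = standardize(raw)
segments = segment(standardized, 3)
```

Here the eight samples yield two complete segments of three samples each; the last two samples are discarded under the stated assumption.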

[0030] Through the above unified acquisition and preprocessing steps, step S1 finally outputs infrared image data and vibration signal data that are time-synchronized, have consistent numerical ranges, and have standardized data structures. This provides standardized input for multimodal feature extraction and fusion processing in step S2, thereby ensuring the data consistency and engineering feasibility of the overall method.

[0031] S2: As one implementation method of the embodiment, a multi-source heterogeneous feature extraction method.

[0032] As an embodiment of the present invention, the execution flow is shown in Figure 2. The preprocessed multi-source heterogeneous data output in step S1 undergoes feature extraction. This multi-source heterogeneous data includes infrared image data and vibration time-series data. By constructing a backbone network feature extraction framework, discriminative feature representations are learned from the data of each modality, providing stable and reliable input for subsequent feature quality assessment, attention enhancement, and anomaly detection.

[0033] 2.1: In the specific implementation process, the infrared image data output from step S1 is input into the backbone structure of a two-dimensional convolutional neural network. Through multi-layer convolution, nonlinear mapping, and downsampling operations, local texture information and high-level semantic features in the image are extracted level by level. In parallel, the preprocessed vibration time-series data is input into the backbone structure of a one-dimensional convolutional neural network. Through temporal convolution and pooling operations, the temporal variation pattern and running state features in the vibration signal are extracted. The above feature extraction process can be represented as follows:

[0034] Here, the two outputs are the initial image features and the initial vibration features, respectively; the two mappings are the feature extraction networks corresponding to the image modality and the vibration modality, respectively; and the remaining terms are the corresponding intermediate features.

[0035] 2.2: To reduce the interference of complex operating environment, background noise and multiple operating conditions on feature expression, a multi-level de-interference mechanism is introduced in the backbone network feature extraction process to adaptively modulate features of different modes.

[0036] (1) For the initial image features, a spatial-domain de-interference method is adopted: first, the initial image features extracted by convolution are reshaped into a four-dimensional tensor $F_{img}$. Then a depthwise separable convolution (groups=256, kernel size 3×3) performs spatial convolution on the four-dimensional tensor to obtain a spatial response map; batch normalization and ReLU activation are applied to the spatial response map, followed by channel mixing via a 1×1 convolution, and finally the spatial attention weights are generated by the Sigmoid activation function.

[0037] Furthermore, the spatial attention weights are multiplied element-wise with the four-dimensional tensor, and the result is added to the original features via a residual connection to obtain the de-interference image features $F'_{img}$:

$$F'_{img} = F_{img} + A_s \odot F_{img}$$

where $A_s$ is the spatial attention weight map and $\odot$ denotes the Hadamard product. Spatial attention suppresses regional responses irrelevant to the critical structure of the equipment and highlights spatial information in key areas.
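
A minimal sketch of the residual spatial gating step is given below for a single-channel feature map. The `response_map` argument is a stand-in for the output of the depthwise separable convolution, batch normalization, ReLU, and 1×1 convolution chain, which is not implemented here; the function names and sample values are hypothetical.

```python
import math


def sigmoid(value):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def spatial_deinterference(feature_map, response_map):
    """Residual gating: output = F + sigmoid(R) * F, element by element."""
    gated = []
    for feature_row, response_row in zip(feature_map, response_map):
        gated.append([f + sigmoid(r) * f
                      for f, r in zip(feature_row, response_row)])
    return gated


features = [[1.0, 2.0], [3.0, 4.0]]
# Assumed responses: strongly positive marks a key region, strongly negative
# marks background that the attention should not amplify.
response = [[0.0, 10.0], [-10.0, 0.0]]
result = spatial_deinterference(features, response)
```

Because of the residual connection, a location with attention weight near 0 keeps its original value rather than being zeroed, while a location with weight near 1 is roughly doubled.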

[0038] (2) For the initial vibration features $F_{vib}$, a joint time-domain and frequency-domain de-interference method is adopted: (2.1) Time-domain de-interference: first, the initial vibration features extracted by convolution are expanded into a three-dimensional tensor. Then a depthwise separable one-dimensional convolution (groups=256, kernel size 3) performs temporal convolution on the features to obtain the temporal response; batch normalization and ReLU activation are applied to the temporal response, followed by channel mixing via a 1×1 convolution. Finally, the Sigmoid activation function generates the temporal attention weights $A_t$.

[0039] Furthermore, the temporal attention weights are multiplied element-wise with the original features, and the time-domain de-interference features $F_t$ are obtained through a residual connection:

$$F_t = F_{vib} + A_t \odot F_{vib}$$

(2.2) In parallel, frequency-domain de-interference is performed: a fast Fourier transform (FFT) of the initial vibration features $F_{vib}$ yields the complex frequency-domain representation $X$. Its real part $\mathrm{Re}(X)$ and imaginary part $\mathrm{Im}(X)$ are extracted, and the frequency-domain filtering weights $W_r$ and $W_i$ are each generated by a two-layer fully connected network. The filtering weights are then multiplied element-wise with the real and imaginary parts respectively to reconstruct the complex frequency-domain representation:

$$\hat{X} = W_r \odot \mathrm{Re}(X) + j\,\bigl(W_i \odot \mathrm{Im}(X)\bigr)$$

where $j$ is the imaginary unit.

[0040] An inverse fast Fourier transform (IFFT) is applied to $\hat{X}$ to obtain the frequency-domain de-interference features $F_f$.

[0041] (2.3) After the time-domain and frequency-domain de-interference, the time-domain de-interference features $F_t$ and the frequency-domain de-interference features $F_f$ are averaged to obtain the final vibration features:

$$F'_{vib} = \tfrac{1}{2}\,(F_t + F_f)$$

where $F'_{vib}$ denotes the de-interference vibration features. Reducing the influence of random noise and non-stationary components on the initial vibration features improves their ability to represent the actual operating state of the equipment.
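
The frequency-domain branch and the final averaging can be sketched with a direct discrete Fourier transform in place of an FFT, which gives the same result for short sequences. The filter weights here are fixed inputs rather than outputs of a two-layer network, and keeping only the real part after the inverse transform is an assumption, since the text does not say how a complex result is handled. All names are hypothetical.

```python
import cmath


def dft(signal):
    """Direct discrete Fourier transform (same result as an FFT, just slower)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def frequency_deinterference(signal, real_weights, imag_weights):
    """Weight real and imaginary parts separately, rebuild the spectrum, invert."""
    spectrum = dft(signal)
    filtered = [complex(wr * s.real, wi * s.imag)
                for s, wr, wi in zip(spectrum, real_weights, imag_weights)]
    # Assumption: keep the real part of the inverse transform.
    return [value.real for value in idft(filtered)]


def average_fusion(time_features, freq_features):
    """Element-wise mean of the time-domain and frequency-domain branches."""
    return [(a + b) / 2.0 for a, b in zip(time_features, freq_features)]


signal = [1.0, 0.0, -1.0, 0.0]
ones = [1.0] * len(signal)
zeros = [0.0] * len(signal)
passthrough = frequency_deinterference(signal, ones, ones)   # filter leaves signal intact
suppressed = frequency_deinterference(signal, zeros, zeros)  # filter removes everything
fused = average_fusion(signal, passthrough)
```

Weights of 1 everywhere reproduce the input and weights of 0 remove it entirely; learned weights between these extremes attenuate individual frequency bins.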

[0042] 2.3: To reduce the interference of complex operating environment, background noise and multiple operating conditions on feature representation, a multi-scale feature modeling structure is introduced on the basis of the de-interference features. The features are abstracted and integrated from different spatial and temporal scales to form a feature representation that is sensitive to both local anomalies and overall trends.

[0043] Specifically, multi-scale feature pyramid modeling is performed on the image features and vibration features after the interference is resolved, in order to enhance the ability to characterize anomaly patterns at different scales.

[0044] Furthermore, through fully connected mapping, the features of the different modalities are compressed into the same dimensional space, resulting in the infrared image feature vector $f_{ir}$ and the vibration signal feature vector $f_{vib}$:

$$f_{ir} \in \mathbb{R}^{B \times D}, \quad f_{vib} \in \mathbb{R}^{B \times D}$$

[0045] where $B$ is the batch size and $D$ is the feature dimension.

[0046] Through the above-described multi-source heterogeneous feature extraction process based on the backbone network, a multimodal feature representation with unified structure and stable semantics is output, providing direct input for feature quality evaluation and attention enhancement in the subsequent step S3, thereby ensuring the overall method has clear hierarchical structure and scalability in function.

[0047] S3: As one implementation of the example, a feature stabilization method for feature quality assessment and attention enhancement.

[0048] 3.1: In practical applications, due to factors such as environmental noise, changes in operating conditions, and differences in sensor states, the amount of effective information contained in different modal features within the same time slice varies significantly. If feature fusion is performed directly without distinguishing feature reliability, low-quality features can easily interfere with the decision-making results, thereby reducing the overall accuracy and stability of anomaly detection. To address these issues, this invention introduces feature quality assessment and attention enhancement mechanisms after feature extraction to further optimize multimodal features.

[0049] In the specific implementation process, the infrared image feature vector and the vibration signal feature vector output in step S2 are each assessed individually. This step calculates a dynamic, learnable "quality score" for each modality's features, quantifying their effectiveness in the current sample.

[0050] Two independent fully connected quality evaluation networks are configured, one for the infrared image feature vector and one for the vibration signal feature vector. For example, the first fully connected layer maps the feature dimension from 256 to 128 and applies a non-linear transformation to obtain the hidden representation. The second fully connected layer maps the hidden representation to a one-dimensional quality score and constrains it to the [0,1] interval using a sigmoid function. The quality scores are denoted $q_{ir}$ (corresponding to the infrared image feature vector) and $q_{vib}$ (corresponding to the vibration signal feature vector).
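
The structure of one quality evaluation network can be sketched as below: a 256-to-128 layer, a nonlinearity, a 128-to-1 layer, and a sigmoid. ReLU is assumed as the nonlinearity, and the weights are random and untrained, so the scores only demonstrate the shape and range of the output, not a meaningful quality estimate. The class and variable names are hypothetical.

```python
import math
import random


def linear(vector, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


class QualityNet:
    """FC(in_dim -> hidden_dim), ReLU (assumed), FC(hidden_dim -> 1), sigmoid."""

    def __init__(self, in_dim=256, hidden_dim=128, seed=0):
        rng = random.Random(seed)  # untrained placeholder weights
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]]
        self.b2 = [0.0]

    def score(self, feature):
        hidden = [max(0.0, h) for h in linear(feature, self.w1, self.b1)]
        logit = linear(hidden, self.w2, self.b2)[0]
        return 1.0 / (1.0 + math.exp(-logit))


# One independent network per modality, as the paragraph describes.
infrared_net = QualityNet(seed=1)
vibration_net = QualityNet(seed=2)
feature = [0.01 * (i % 10) for i in range(256)]   # hypothetical 256-dim feature
q_ir = infrared_net.score(feature)
q_vib = vibration_net.score(feature)
```

In the described method these weights would be learned end to end through the fault diagnosis loss rather than drawn at random.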

[0051] In a further embodiment, the quality assessment network is trained using the accuracy achieved by the feature vectors in the fault diagnosis task. Specifically: the quality assessment network, together with the feature extraction network, attention enhancement network, and fusion network, constitutes a complete fault diagnosis network and is jointly optimized through end-to-end training. This training process follows the logic below: (1) Forward propagation stage: the calculated quality scores and the corresponding feature vectors are fed into the subsequent feature fusion stage to generate fused features, from which the predicted device status is output.

[0052] (2) Backpropagation stage: Based on the prediction results output by the forward propagation stage, the cross-entropy loss function and gradient propagation are used for optimization.

[0053] 3.2: After the feature quality assessment, a dual-channel attention enhancement mechanism is introduced to adaptively enhance the feature vectors of the infrared image and the vibration signal. This mechanism consists of a multi-head attention mechanism and a channel attention mechanism, which are computed in parallel. Specifically: for any modality's feature vector, the multi-head attention and channel attention mechanisms are invoked separately and their outputs are fused to obtain a feature vector enhanced by dual-channel attention. In one example, the fusion of the two attention outputs includes a dimension compression (squeeze) operation, which removes unnecessary dimensions of size 1 from a tensor so that the output shape remains [B,D].

[0054] The feature vectors enhanced by dual-channel attention are denoted as the infrared attention-enhanced features and the vibration attention-enhanced features.

[0055] Through the above processing, step S3, while maintaining consistency in feature dimensions, outputs attention-enhanced multimodal features and corresponding feature quality scores. This makes the feature representations formed by the same device under different operating conditions more concentrated and stable, and effectively reduces the impact of environmental noise and non-state factors on feature expression. Step S3 does not perform feature fusion or anomaly detection; its output serves as an important input for multimodal feature fusion and device anomaly detection in step S4, thereby ensuring the overall method's structural hierarchy is clear and its functionality is decoupled.

[0056] S4: As one implementation of the embodiment, multi-source heterogeneous features are adaptively fused through cross-modal knowledge distillation.

[0057] Cross-modal collaborative modeling and adaptive fusion are performed on the infrared and vibration attention-enhanced features output by S3. Specifically: 4.1: Each attention-enhanced feature is mapped to the semantic space of the other modality through two fully connected layers. The infrared attention-enhanced feature is mapped to the vibration semantic space, and the vibration attention-enhanced feature is mapped to the infrared semantic space. In the infrared-to-vibration mapping, the first linear transformation applies its weight matrix and bias vector to the infrared attention-enhanced feature; the second linear transformation applies a second weight matrix and bias vector to the result of the first transformation and activation, and its output is the feature mapped to the vibration semantic space. The vibration-to-infrared mapping has the same structure, with its own weight matrices and bias vectors for the first and second linear transformations, and its output is the feature mapped to the infrared semantic space.

[0058] Furthermore, the mapping information is supplemented into the attention-enhanced features through residual connections, yielding the infrared collaborative feature and the vibration collaborative feature. By constructing an inter-modal feature mapping network, knowledge transfer and information compensation between different modalities are achieved. This introduces supplementary information related to the device state from the other modality while maintaining the discriminative characteristics of each modality, thereby reducing the impact of single-modality anomalies or noise on the overall feature expression.
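The mapping and residual steps of 4.1 can be written compactly as below. The notation is assumed for illustration, since the original symbols are not reproduced in this text: A_ir and A_vib denote the attention-enhanced features, M the mapped features, C the collaborative features, and phi the activation between the two linear layers. The pairing of each residual (a modality's own enhanced feature plus the information mapped from the other modality) follows claim 3; the plain unweighted sum is an assumption.

```latex
\begin{aligned}
M_{ir \to vib} &= W^{ir}_{2}\,\phi\!\left(W^{ir}_{1} A_{ir} + b^{ir}_{1}\right) + b^{ir}_{2} \\
M_{vib \to ir} &= W^{vib}_{2}\,\phi\!\left(W^{vib}_{1} A_{vib} + b^{vib}_{1}\right) + b^{vib}_{2} \\
C_{ir} &= A_{ir} + M_{vib \to ir} \\
C_{vib} &= A_{vib} + M_{ir \to vib}
\end{aligned}
```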

[0059] 4.2: Features from different modalities are mapped to a unified, interpretable semantic space, so that expressions of the same state lie closer together and expressions of different states lie farther apart, which improves stability across operating conditions. Specifically: a learnable semantic token matrix is defined, which can be viewed as K basic, interpretable "device state prototypes." The collaborative features of the two modalities are each mapped to the unified semantic space through a shared two-layer fully connected network. In this mapping, the first fully connected layer and the second fully connected layer each have their own weight matrix and bias vector, and both modalities use the same parameters.

[0060] Furthermore, the two mapped features serve as query vectors, and the semantic token matrix serves simultaneously as keys and values; the weighted representations of the features on the semantic prototypes are computed through multi-head attention. Aligned semantic features are then obtained through residual connections. This application uses shared semantic tokens and a unified mapping network to bring features that come from different modalities but correspond to the same device state closer together in the semantic space. Features from different states remain well separated under the classification loss and alignment constraints, thereby alleviating the cross-condition feature distribution shift caused by different devices, loads and environmental conditions.
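The prototype attention and residual step can be illustrated with a small sketch. It makes several simplifying assumptions: a single attention head instead of the multi-head attention named in the text, no learned query/key/value projections, and hand-picked token values. The names `softmax` and `align_to_prototypes` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def align_to_prototypes(z, tokens):
    """Attend from query z over the K prototype tokens, then add a residual.

    Single-head scaled dot-product attention is used here as a simplification.
    """
    d = len(z)
    scores = [sum(a * b for a, b in zip(z, t)) / math.sqrt(d) for t in tokens]
    weights = softmax(scores)  # how strongly z matches each prototype
    attended = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]
    aligned = [zi + ai for zi, ai in zip(z, attended)]  # residual connection
    return aligned, weights


# K = 3 illustrative prototypes in a 4-dimensional semantic space (values are made up).
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
z_ir = [2.0, 0.1, 0.0, 0.3]   # infrared feature after the shared mapping network
z_vib = [1.8, 0.0, 0.2, 0.1]  # vibration feature after the shared mapping network
aligned_ir, w_ir = align_to_prototypes(z_ir, tokens)
aligned_vib, w_vib = align_to_prototypes(z_vib, tokens)
```

Because both modalities attend over the same token matrix, two features describing the same device state tend to put most of their attention weight on the same prototype, which is the alignment effect the paragraph describes.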

[0061] 4.3: In a further embodiment, to avoid bias in the fusion result caused by environmental factors or changes in operating conditions, an adaptive weight prediction mechanism is introduced in the feature fusion stage. Specifically: the aligned semantic features are concatenated along the feature dimension, the concatenated feature is input into a two-layer fully connected weight prediction network, and a basic fusion weight for each modality is output through softmax. To explain further, the weight prediction network is trained end to end in the same way as the quality assessment network: based on the device status prediction results, it is trained using the loss function and gradient propagation. It gradually learns a weight allocation strategy that improves classification performance, so that the modalities that contribute more to the classification task receive higher weights.

[0062] The basic fusion weights are then weighted by the quality scores obtained from S3 and normalized, yielding the modal contribution weights of the infrared and vibration modalities.
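One plausible reading of "quality-weighted and normalized" is to multiply each basic fusion weight by its modality's quality score and divide by the sum. The sketch below assumes exactly that form; the function name `contribution_weights`, the small `eps` guard and the example numbers are illustrative and are not taken from the application.

```python
def contribution_weights(base_weights, quality_scores, eps=1e-8):
    """Scale each basic fusion weight by its quality score, then renormalize."""
    weighted = [w * q for w, q in zip(base_weights, quality_scores)]
    total = sum(weighted) + eps  # eps avoids division by zero if all scores are 0
    return [v / total for v in weighted]


# Equal basic weights, but the infrared features are judged to be of higher quality.
alpha_ir, alpha_vib = contribution_weights([0.5, 0.5], [0.9, 0.3])
```

With equal basic weights and quality scores of 0.9 and 0.3, the infrared modality ends up with roughly three quarters of the total contribution.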

[0063] Finally, based on the modal contribution weights, the multimodal features are fused by element-wise weighting, resulting in the final multi-source heterogeneous fused feature representation. Through the joint mechanism of cross-modal knowledge distillation, semantic alignment, adaptive weight prediction and element-level fusion, step S4 effectively suppresses the interference of low-quality modalities on the fusion result while making full use of complementary multimodal information, and obtains a fused feature representation that is semantically consistent, structurally stable and well suited to discrimination. This fused feature and its corresponding modal contribution weights serve as important inputs for equipment health assessment and fault diagnosis in step S5.

[0064] S5: As one implementation of the embodiment, the device operating status is determined. The fused features output by S4 can be used to identify equipment fault states. In some reference embodiments, a deep learning model can be trained to output the probability distribution over the operating states that the device may be in, including the normal operating state and various abnormal operating states. Those skilled in the art can achieve this with existing conventional techniques such as LSTM, which will not be elaborated here.
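The element-wise weighted fusion can be sketched as a weighted sum of the aligned feature vectors. Here each modality has one scalar contribution weight applied to every element, which is an assumption; the application may use per-element weights. The function name `fuse` and the numbers are illustrative.

```python
def fuse(features, weights):
    """Weighted element-wise sum of equally sized feature vectors."""
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


aligned_ir = [1.0, 2.0, 3.0]    # illustrative aligned infrared feature
aligned_vib = [3.0, 2.0, 1.0]   # illustrative aligned vibration feature
fused = fuse([aligned_ir, aligned_vib], [0.75, 0.25])
```

Because the contribution weights sum to 1, the fused vector stays on the same scale as its inputs, and a modality with a low weight has a proportionally small influence on each element.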

[0065] After multimodal feature extraction and cross-modal adaptive fusion are completed, step S5 determines the device's operating status. Specifically, the fused feature vector obtained in step S4 is input into the state discrimination model, which outputs the discrimination scores and the probability distribution corresponding to the possible operating states of the device. The state discrimination model employs a fully connected classifier: it maps the fused feature to a logit for each operating state and normalizes the logits with softmax to obtain the probability distribution. The operating status categories include the normal operating status and various abnormal operating statuses (e.g., normal operation, slight overheating, severe overheating, insulation aging, partial discharge abnormality, mechanical loosening, etc.).
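A fully connected classifier followed by softmax can be sketched as below. The three-state list, the weight values and the two-dimensional fused feature are made up for illustration; in the described method the weights would be learned and the feature would come from step S4.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weight, bias):
    """Map a fused feature to one logit per state, then normalize with softmax."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weight, bias)]
    return softmax(logits)


STATES = ["normal operation", "slight overheating", "severe overheating"]
weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical learned weights
bias = [0.0, 0.0, 0.0]
probs = classify([2.0, 0.5], weight, bias)
predicted = STATES[probs.index(max(probs))]
```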

[0066] In further explanation, the state discrimination model can be constructed and implemented using existing deep learning methods. The focus of this application is on feature processing, and the specific process of constructing and training the state discrimination model will not be elaborated here.

[0067] Furthermore, the equipment fault identification phase yields the probability distribution of the equipment over the operating status categories. Based on this, a device health assessment index is constructed. Specifically: to evaluate the certainty of the model's predictions, this application introduces information entropy as the basis for measuring classification confidence. The classification confidence score is calculated from the information entropy of the output probability distribution.

[0068] The classification confidence score has a range of [0,1]; a larger value indicates that the model is more confident in the current prediction. Each term of the distribution is the probability, predicted by the state discrimination model, that the device is in the corresponding state class. The larger the information entropy, the more uniform the probability distribution (i.e., the model predicts similar probabilities for all categories), and the higher the model's prediction uncertainty.
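A common way to turn entropy into a confidence score in [0,1] is to normalize it by its maximum, log K, and subtract the result from 1. The sketch below assumes this form; the exact expression used in the application is not reproduced in this text, and the function name `classification_confidence` is hypothetical.

```python
import math


def classification_confidence(probs):
    """Return 1 - H(p) / log(K): 1 for a one-hot distribution, 0 for a uniform one."""
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two classes")
    # Terms with p == 0 contribute nothing to the entropy (limit of p * log p).
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(k)


confident = classification_confidence([1.0, 0.0, 0.0, 0.0])
uncertain = classification_confidence([0.25, 0.25, 0.25, 0.25])
typical = classification_confidence([0.7, 0.1, 0.1, 0.1])
```

A uniform distribution has the largest entropy and therefore the lowest confidence, matching the relationship stated in the paragraph above.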

[0069] Furthermore, a quality weighting factor is calculated from the modal contribution weights and the feature quality scores of the infrared and vibration features used in the fault identification task.

[0070] Furthermore, the health score is expressed in terms of the probability that the device is in normal operating condition and the quality-weighted confidence score, with two weighting coefficients that balance between them. This design ensures that even when the probability of the normal state is relatively high, poor data quality or low classification confidence (that is, a lower quality weighting factor or a lower classification confidence score) lowers the overall health score, thus avoiding overly optimistic estimates based on a single probability value.
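The behavior described for the health score can be illustrated with a simple linear combination. Everything specific here is an assumption made for illustration: the quality weighting factor is taken as the contribution-weighted sum of the quality scores, the quality-weighted confidence as the product of that factor and the classification confidence, and the coefficients `lam1` and `lam2` are arbitrary values.

```python
def quality_factor(contrib_weights, quality_scores):
    """Assumed form: contribution-weighted average of the per-modality quality scores."""
    return sum(a * q for a, q in zip(contrib_weights, quality_scores))


def health_score(p_normal, quality, confidence, lam1=0.6, lam2=0.4):
    """Assumed form: lam1 * P(normal) + lam2 * (quality factor * confidence)."""
    return lam1 * p_normal + lam2 * quality * confidence


# Same normal-state probability and confidence, different data quality.
good = health_score(0.9, quality_factor([0.75, 0.25], [0.9, 0.3]), 0.8)
poor = health_score(0.9, quality_factor([0.5, 0.5], [0.2, 0.2]), 0.8)
```

With the same normal-state probability of 0.9, the low-quality case scores lower than the high-quality case, which is the guard against over-optimistic estimates that the paragraph describes.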

[0071] Ultimately, the system outputs the device operating status judgment results (i.e., status categories and their probability distributions) and the corresponding health assessment indicators, providing a reliable basis for device anomaly identification, operating status assessment and subsequent operation and maintenance strategy formulation.

[0072] As an embodiment of this application, a multi-source heterogeneous data fusion system for electrical equipment is disclosed. The system implements the multi-source heterogeneous data fusion method described above and includes: a multimodal data processing module, used to synchronously acquire multi-source heterogeneous data generated by electrical equipment during operation and preprocess it, the multi-source heterogeneous data including first modal data and second modal data, and to input the preprocessed multi-source heterogeneous data into the corresponding feature extraction networks to extract the initial features of the first modality and the initial features of the second modality; a de-interference module, used to perform de-interference on the initial features separately, wherein for the initial features of the first modality a spatial domain de-interference mechanism is adopted and the first modality features are calculated and output based on spatial attention weights, and for the initial features of the second modality a time-frequency domain joint de-interference mechanism is adopted and the features are corrected at the time domain and frequency domain levels respectively and then fused to generate the second modality features; a cross-modal collaboration module, used to perform cross-modal collaborative modeling after dual-channel attention enhancement has been applied to the de-interference features of the multiple modalities in the same dimensional space, that is, to map the attention-enhanced features of one modality to the semantic space of another modality through a mapping network to generate collaborative features, and to map the multiple collaborative features to a unified semantic space and perform semantic alignment; and a feature fusion module, used to concatenate the semantically aligned collaborative features and input them into the weight prediction network, output the basic fusion weights of the different modalities, and combine them with the quality score corresponding to each modality to fuse the semantically aligned multimodal features into multi-source heterogeneous fusion features, the quality scores being calculated by the quality evaluation network after the de-interference step.

[0073] As an embodiment of this application, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the computer program is loaded onto the processor, it implements the multi-source heterogeneous data fusion method described above.

[0074] As an embodiment of this application, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the multi-source heterogeneous data fusion method described above.

[0075] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the protection scope of the claims of the present invention.

Claims

1. A method for fusing multi-source heterogeneous data for electrical equipment, characterized in that the method includes: synchronously acquiring multi-source heterogeneous data generated by electrical equipment during operation and preprocessing it, the multi-source heterogeneous data including first modal data and second modal data; inputting the preprocessed multi-source heterogeneous data into the corresponding feature extraction networks to extract the initial features of the first modality and the initial features of the second modality; performing de-interference on the initial features separately, wherein for the initial features of the first modality a spatial domain de-interference mechanism is used to calculate and output the first modality features based on spatial attention weights, and for the initial features of the second modality a time-frequency domain joint de-interference mechanism is used to correct the features at the time domain and frequency domain levels respectively and then fuse them to generate the second modality features; in the same dimensional space, performing cross-modal collaborative modeling after dual-channel attention enhancement has been applied to the de-interference features of the multiple modalities, that is, mapping the attention-enhanced features of one modality to the semantic space of another modality through a mapping network to generate collaborative features, and mapping the multiple collaborative features to a unified semantic space for semantic alignment; and concatenating the semantically aligned collaborative features and inputting them into the weight prediction network, which outputs the basic fusion weights of the different modalities, and fusing the semantically aligned multimodal features in combination with the quality score corresponding to each modality to obtain multi-source heterogeneous fusion features, the quality scores being calculated by the quality evaluation network after the de-interference step.

2. The method for multi-source heterogeneous data fusion for electrical equipment according to claim 1, characterized in that the time-frequency domain joint de-interference mechanism includes: invoking the spatial attention mechanism to obtain the temporal attention weights of the initial features of the second modality, weighting the initial features of the second modality, and passing the result through residual connections to obtain the time-domain de-interference features; subjecting the initial features of the second modality to a Fourier transform to obtain a complex frequency domain representation; extracting the real and imaginary parts of the complex frequency domain representation and generating their corresponding frequency domain filtering weights by a neural network; multiplying the filtering weights element-wise with the real and imaginary parts respectively to reconstruct the complex frequency domain representation; subjecting the reconstructed complex frequency domain representation to an inverse Fourier transform to obtain the frequency-domain de-interference features; and fusing the time-domain and frequency-domain de-interference features to obtain the second modality features.

3. The method for multi-source heterogeneous data fusion for electrical equipment according to claim 1, characterized in that, The steps of the cross-modal collaborative modeling include: For the attention enhancement features of the first modality, they are mapped to the semantic space of the second modality through multiple fully connected layers to obtain the mapping information of the first modality; for the attention enhancement features of the second modality, they are mapped to the semantic space of the first modality through multiple fully connected layers to obtain the mapping information of the second modality. The mapping information of the second modality is fused with the attention-enhanced features of the first modality through residual connections to generate the collaborative features of the first modality; The mapping information of the first modality is fused with the attention-enhanced features of the second modality through residual connections to generate collaborative features of the second modality.

4. The method for multi-source heterogeneous data fusion for electrical equipment according to claim 1, characterized in that the step in which the weight prediction network outputs basic fusion weights for different modalities includes: concatenating the aligned semantic features along the feature dimension, inputting the concatenated features into a two-layer fully connected weight prediction network, and outputting the basic fusion weight vector through softmax; the weight prediction network is trained and optimized based on the prediction results of the concatenated features in the equipment fault diagnosis and classification task.

5. The method for multi-source heterogeneous data fusion for electrical equipment according to claim 1, characterized in that the fusion of semantically aligned multimodal features includes: based on the quality scores of the de-interference features of the different modalities, quality-weighting and normalizing the basic fusion weights to generate modal contribution weights for the different modalities; and, based on the modal contribution weights, fusing the semantically aligned multimodal features by element-level weighting to obtain the final multi-source heterogeneous fusion feature representation.

6. The method for multi-source heterogeneous data fusion for electrical equipment according to claim 1, characterized in that the steps for obtaining the quality score include: the quality assessment network includes a first fully connected layer, a nonlinear activation layer, and a second fully connected layer connected in sequence; the first fully connected layer is used to reduce the dimensionality of the input feature vector, the second fully connected layer is used to output a one-dimensional quality score, and the value of the quality score is constrained to the range of 0 to 1 by the sigmoid function; the quality assessment network is trained and optimized based on the prediction results of the de-interference features in the equipment fault diagnosis and classification task.

7. The method for multi-source heterogeneous data fusion for electrical equipment according to claim 1, characterized in that, The multi-source heterogeneous fusion features are used to identify equipment fault states, outputting the probability distribution of various operating states of power equipment, and calculating the classification confidence. The quality weighting factor is calculated based on the modal contribution weights and quality scores of different modes; The health score of power equipment is calculated based on the probability that the equipment is in normal operating condition, the quality weighting factor, and the classification confidence level.

8. A multi-source heterogeneous data fusion system for electrical equipment, executing the multi-source heterogeneous data fusion method as described in any one of claims 1-7, characterized in that the system includes: a multimodal data processing module, used to synchronously acquire multi-source heterogeneous data generated by electrical equipment during operation and preprocess it, the multi-source heterogeneous data including first modal data and second modal data, and to input the preprocessed multi-source heterogeneous data into the corresponding feature extraction networks to extract the initial features of the first modality and the initial features of the second modality; a de-interference module, used to perform de-interference on the initial features separately, wherein for the initial features of the first modality a spatial domain de-interference mechanism is adopted and the first modality features are calculated and output based on spatial attention weights, and for the initial features of the second modality a time-frequency domain joint de-interference mechanism is adopted and the features are corrected at the time domain and frequency domain levels respectively and then fused to generate the second modality features; a cross-modal collaboration module, used to perform cross-modal collaborative modeling after dual-channel attention enhancement has been applied to the de-interference features of the multiple modalities in the same dimensional space, that is, to map the attention-enhanced features of one modality to the semantic space of another modality through a mapping network to generate collaborative features, and to map the multiple collaborative features to a unified semantic space and perform semantic alignment; and a feature fusion module, used to concatenate the semantically aligned collaborative features and input them into the weight prediction network, output the basic fusion weights of the different modalities, and combine them with the quality score corresponding to each modality to fuse the semantically aligned multimodal features into multi-source heterogeneous fusion features, the quality scores being calculated by the quality evaluation network after the de-interference step.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the computer program is loaded into the processor, it implements the multi-source heterogeneous data fusion method according to any one of claims 1-7.

10. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by the processor, it implements the multi-source heterogeneous data fusion method according to any one of claims 1-7.